
Review Criteria
- Strong DevOps / Cloud Engineer Profiles
- Must have 3+ years of experience as a DevOps / Cloud Engineer
- Must have strong expertise in cloud platforms – AWS / Azure / GCP (any one or more)
- Must have strong hands-on experience in Linux administration and system management
- Must have hands-on experience with containerization and orchestration tools such as Docker and Kubernetes
- Must have experience in building and optimizing CI/CD pipelines using tools like GitHub Actions, GitLab CI, or Jenkins
- Must have hands-on experience with Infrastructure-as-Code tools such as Terraform, Ansible, or CloudFormation
- Must be proficient in scripting languages such as Python or Bash for automation
- Must have experience with monitoring and alerting tools like Prometheus, Grafana, ELK, or CloudWatch
- Top tier Product-based company (B2B Enterprise SaaS preferred)
Preferred
- Experience in multi-tenant SaaS infrastructure scaling.
- Exposure to AI/ML pipeline deployments or iPaaS / reverse ETL connectors.
Role & Responsibilities
We are seeking a DevOps Engineer to design, build, and maintain scalable, secure, and resilient infrastructure for our SaaS platform and AI-driven products. The role will focus on cloud infrastructure, CI/CD pipelines, container orchestration, monitoring, and security automation, enabling rapid and reliable software delivery.
Key Responsibilities:
- Design, implement, and manage cloud-native infrastructure (AWS/Azure/GCP).
- Build and optimize CI/CD pipelines to support rapid release cycles.
- Manage containerization & orchestration (Docker, Kubernetes).
- Own infrastructure-as-code (Terraform, Ansible, CloudFormation).
- Set up and maintain monitoring & alerting frameworks (Prometheus, Grafana, ELK, etc.).
- Drive cloud security automation (IAM, SSL, secrets management).
- Partner with engineering teams to embed DevOps into SDLC.
- Troubleshoot production issues and drive incident response.
- Support multi-tenant SaaS scaling strategies.
Ideal Candidate
- 3–6 years' experience as DevOps/Cloud Engineer in SaaS or enterprise environments.
- Strong expertise in AWS, Azure, or GCP.
- Strong expertise in Linux administration.
- Hands-on with Kubernetes, Docker, CI/CD tools (GitHub Actions, GitLab, Jenkins).
- Proficient in Terraform/Ansible/CloudFormation.
- Strong scripting skills (Python, Bash).
- Experience with monitoring stacks (Prometheus, Grafana, ELK, CloudWatch).
- Strong grasp of cloud security best practices.
Interested candidates are requested to email their resumes with the subject line "Application for [Job Title]".
Only applications received via email will be reviewed. Applications through other channels will not be considered.
Job Description
The client’s department, Digital People Solutions (DPS), offers a sophisticated portfolio of IT applications that provides a strong foundation for professional and efficient People & Organization (P&O) and Business Management, both globally and locally. The client is a well-known German company listed on the DAX-40 index, which comprises the 40 largest and most liquid companies on the Frankfurt Stock Exchange.
We are seeking talented DevOps Engineers with a focus on the Elastic Stack (ELK) to join our dynamic DPS team. In this role, you will be responsible for refining and advising on the further development of an existing monitoring solution based on the Elastic Stack (ELK). You will independently handle tasks related to architecture, setup, technical migration, and documentation.
The current application landscape features multiple Java web services running on JEE application servers, primarily hosted on AWS, and integrated with various systems such as SAP, other services, and external partners. DPS is committed to delivering the best digital work experience for the client's employees and customers alike.
Responsibilities:
Install, set up, and automate rollouts using Ansible/CloudFormation for all stages (Dev, QA, Prod) in the AWS Cloud for components such as Elasticsearch, Kibana, Metricbeat, APM Server, APM agents, and interface configuration.
Create and develop regular "Default Dashboards" for visualizing metrics from various sources like Apache Webserver, application servers and databases.
Improve and fix bugs in installation and automation routines.
Monitor CPU usage, security findings, and AWS alerts.
Develop and extend "Default Alerting" for issues like OOM errors, datasource issues, and LDAP errors.
Monitor storage space and create concepts for expanding the Elastic landscape in AWS Cloud and Elastic Cloud Enterprise (ECE).
Implement machine learning, uptime monitoring including SLA, JIRA integration, security analysis, anomaly detection, and other useful ELK Stack features.
Integrate data from AWS CloudWatch.
Document all relevant information and train involved personnel in the used technologies.
Requirements:
Experience with Elastic Stack (ELK) components and related technologies.
Proficiency in automation tools like Ansible and CloudFormation.
Strong knowledge of AWS Cloud services.
Experience in creating and managing dashboards and alerts.
Familiarity with IAM roles and rights management.
Ability to document processes and train team members.
Excellent problem-solving skills and attention to detail.
Skills & Requirements
Elastic Stack (ELK), Elasticsearch, Kibana, Logstash, Beats, APM, Ansible, CloudFormation, AWS Cloud, AWS CloudWatch, IAM roles, AWS security, Automation, Monitoring, Dashboard creation, Alerting, Anomaly detection, Machine learning integration, Uptime monitoring, JIRA integration, Apache Webserver, JEE application servers, SAP integration, Database monitoring, Troubleshooting, Performance optimization, Documentation, Training, Problem-solving, Security analysis.
About Hive
Hive is the leading provider of cloud-based AI solutions for content understanding, trusted by the world’s largest, fastest-growing, and most innovative organizations. The company empowers developers with a portfolio of best-in-class, pre-trained AI models, serving billions of customer API requests every month. Hive also offers turnkey software applications powered by proprietary AI models and datasets, enabling breakthrough use cases across industries. Together, Hive’s solutions are transforming content moderation, brand protection, sponsorship measurement, context-based ad targeting, and more.
Hive has raised over $120M in capital from leading investors, including General Catalyst, 8VC, Glynn Capital, Bain & Company, Visa Ventures, and others. We have over 250 employees globally in our San Francisco, Seattle, and Delhi offices. Please reach out if you are interested in joining the future of AI!
About Role
Our unique machine learning needs led us to open our own data centers, with an emphasis on distributed high-performance computing integrating GPUs. Even with these data centers, we maintain a hybrid infrastructure and use public clouds when they are the right fit. As we continue to commercialize our machine learning models, we also need to grow our DevOps and Site Reliability team to maintain the reliability of our enterprise SaaS offering for our customers. Our ideal candidate is someone who is able to thrive in an unstructured environment and takes automation seriously. You believe there is no task that can’t be automated and no server scale too large. You take pride in optimizing performance at scale in every part of the stack and never manually performing the same task twice.
Responsibilities
● Create tools and processes for deploying and managing hardware for Private Cloud Infrastructure.
● Improve workflows of developer, data, and machine learning teams
● Manage integration and deployment tooling
● Create and maintain monitoring and alerting tools and dashboards for various services, and audit infrastructure
● Manage a diverse array of technology platforms, following best practices and procedures
● Participate in on-call rotation and root cause analysis
Requirements
● 5-10 years of previous experience working directly with Software Engineering teams as a developer, DevOps Engineer, or Site Reliability Engineer.
● Experience with infrastructure as a service, distributed systems, and software design at a high level.
● Comfortable working on Linux infrastructures (Debian) via the CLI
● Able to learn quickly in a fast-paced environment.
● Able to debug, optimize, and automate routine tasks
● Able to multitask, prioritize, and manage time efficiently independently
● Can communicate effectively across teams and management levels
● Degree in computer science, or similar, is an added plus!
Technology Stack
● Operating Systems - Linux/Debian Family/Ubuntu
● Configuration Management - Chef
● Containerization - Docker
● Container Orchestrators - Mesosphere/Kubernetes
● Scripting Languages - Python/Ruby/Node/Bash
● CI/CD Tools - Jenkins
● Network hardware - Arista/Cisco/Fortinet
● Hardware - HP/SuperMicro
● Storage - Ceph, S3
● Database - Scylla, Postgres, Pivotal GreenPlum
● Message Brokers - RabbitMQ
● Logging/Search - ELK Stack
● AWS - VPC/EC2/IAM/S3
● Networking - TCP/IP, ICMP, SSH, DNS, HTTP, SSL/TLS, storage systems, RAID, distributed file systems, NFS/iSCSI/CIFS
Who we are
We are a group of ambitious individuals who are passionate about creating a revolutionary AI company. At Hive, you will have a steep learning curve and an opportunity to contribute to one of the fastest-growing AI start-ups in San Francisco. The work you do here will have a noticeable and direct impact on the development of the company.
Thank you for your interest in Hive, and we hope to meet you soon.
Requirements:
● Should have at least 2 years of DevOps experience
● Should have experience with Kubernetes
● Should have experience with Terraform/Helm
● Should have experience in building scalable server-side systems
● Should have experience in cloud infrastructure and designing databases
● Having experience with NodeJS/TypeScript/AWS is a bonus
● Having experience with WebRTC is a bonus
- Responsible for the entire infrastructure including Production (both bare metal and AWS).
- Manage and maintain the production systems and operations including SysAdmin, DB activities.
- Improve tools and processes, automate manual efforts, and maintain the health of the system.
- Champion best practices, CI/CD, and metrics-driven development
- Optimise the company's computing architecture
- Conduct systems tests for security, performance, and availability
- Maintain security of the system
- Develop and maintain design and troubleshooting documentation
- 7+ years of experience in DevOps/Technical Operations
- Extensive experience with scripting languages such as Shell, Python, etc.
- Experience in developing and maintaining CI/CD processes for SaaS applications using tools such as Jenkins
- Hands-on experience with configuration management tools such as Puppet, SaltStack, Ansible, etc.
- Hands-on experience building and managing VMs and containers using tools such as Kubernetes, Docker, etc.
- Hands-on experience in designing, building, and maintaining cloud-based applications on AWS, Azure, GCP, etc.
- Knowledge of Databases (MySQL, NoSQL)
- Knowledge of security/ethical hacking
- Have experience with Elasticsearch, Kibana, Logstash
- Have experience with Cassandra, Hadoop, or Spark
- Have experience with Mongo, Hive
What will you do?
- Set up and manage applications with automation, DevOps, and CI/CD tools.
- Deploy, maintain, and monitor infrastructure and services.
- Automate code and infrastructure deployments.
- Tune, optimize and keep systems up to date.
- Design and implement deployment strategies.
- Set up infrastructure on cloud platforms such as AWS, Azure, Google Cloud, IBM Cloud, DigitalOcean, etc., as required.
- Experience working on Linux based infrastructure
- Strong hands-on knowledge of setting up production, staging, and dev environments on AWS/GCP/Azure
- Strong hands-on knowledge of technologies like Terraform, Docker, Kubernetes
- Strong understanding of continuous testing environments such as Travis CI, CircleCI, Jenkins, etc.
- Configuring and managing databases such as MySQL and MongoDB
- Excellent troubleshooting skills
- Working knowledge of various tools, open-source technologies, and cloud services
- Awareness of critical concepts in DevOps and Agile principles
We are looking for a full-time remote DevOps Engineer who has worked with CI/CD automation, big data pipelines, and cloud infrastructure to solve complex technical challenges at scale that will reshape the healthcare industry for generations. You will get the opportunity to be involved in the latest tech in big data engineering, novel machine learning pipelines, and highly scalable backend development. Successful candidates will work in a team of highly skilled and experienced developers, data scientists, and the CTO.
Job Requirements
- Experience deploying, automating, maintaining, and improving complex services and pipelines
- Strong understanding of DevOps tools/process/methodologies
- Experience with AWS CloudFormation and AWS CLI is essential
- The ability to work to project deadlines efficiently and with minimum guidance
- A positive attitude and enjoyment of working within a globally distributed team
Skills
- Highly proficient working with CI/CD and automating infrastructure provisioning
- Deep understanding of the AWS Cloud platform and hands-on experience setting up and maintaining large-scale implementations
- Experience with JavaScript/TypeScript, Node, Python and Bash/Shell Scripting
- Hands-on experience with Docker and container orchestration
- Experience setting up and maintaining big data pipelines, serverless stacks, and container infrastructure
- An interest in healthcare and medical sectors
- Technical degree with 4+ years of infrastructure and automation experience

2. Has done infrastructure coding and configuration using CloudFormation/Terraform and understands both clearly
3. Deep understanding of microservice design and awareness of centralized caching (Redis) and centralized configuration (Consul/ZooKeeper)
4. Hands-on experience working with containers and their orchestration using Kubernetes
5. Hands-on experience with Linux and Windows operating systems
6. Worked on NoSQL databases like Cassandra, Aerospike, Mongo or Couchbase, and on central logging, monitoring, and caching using stacks like ELK (Elastic) on the cloud, Prometheus, etc.
7. Has good knowledge of network security, security architecture, and secure SDLC practices






