



You will be responsible for
1. Setting up, maintaining cloud (AWS/GCP/Azure) and kubernetes cluster and automating
their operation
2. All operational aspects of devtron platform including maintenance, upgrades,
automation.
3. Providing kubernetes expertise to facilitate smooth and fast customer onboarding on
devtron platform
Responsibilities:
1. Manage devtron platform on multiple kubernetes clusters
2. Designing and embedding industry best practices for online services including disaster
recovery, business continuity, monitoring/alerting, and service health measurement
3. Providing operational support for day to day activities involving the deployment of
services
4. Identify opportunities for improving the security, reliability, and scalability of the platform
5. Facilitate smooth and fast customer onboarding on devtron platform
6. Drive customer engagement
Requirements:
● Bachelor's Degree in Computer Science or a related field.
● 2+ years working as a devops engineer
● Proficient in 1 or more programming languages (e.g. Python, Go, Ruby).
● Familiar with shell scripts, Linux commands, network fundamentals
● Understanding of large scale distributed systems
● Basic understanding of cloud computing (AWS/GCP/Azure)
Preferred Qualifications:
● Great analytical and interpersonal skills
● Passion for creating efficient, reliable, reusable programs/scripts.
● Excited about technology, have a strong interest in learning about and playing with the
latest technologies and doing POC.
● Strong customer focus, ownership, urgency and drive.
● Knowledge and experience with cloud native tools like prometheus, kubernetes, docker,
grafana.

About Devtron Inc.
About
Devtron is open-source DevOps platform that is specifically designed for Kubernetes. The platform offers a range of features including CI, CD, security, debugging, and cost optimization, all accessible through an intuitive user interface. With Devtron, customers can easily debug applications, monitor events, and check configurations, all from a single screen without the need to switch to cloud watch. The platform also provides metrics that measure deployment frequencies, as well as a single-pane view for application debugging, which helps to increase the stability of applications.
Devtron is founded in 2019 by Nishant Kumar, Prashant Ghildiyal, and Rajesh Razdan, Devtron is headquartered in Del Mar, California.
Similar jobs
About the Company:
Gruve is an innovative Software Services startup dedicated to empowering Enterprise Customers in managing their Data Life Cycle. We specialize in Cyber Security, Customer Experience, Infrastructure, and advanced technologies such as Machine Learning and Artificial Intelligence. Our mission is to assist our customers in their business strategies utilizing their data to make more intelligent decisions. As an well-funded early-stage startup, Gruve offers a dynamic environment with strong customer and partner networks.
Why Gruve:
At Gruve, we foster a culture of innovation, collaboration, and continuous learning. We are committed to building a diverse and inclusive workplace where everyone can thrive and contribute their best work. If you’re passionate about technology and eager to make an impact, we’d love to hear from you.
Gruve is an equal opportunity employer. We welcome applicants from all backgrounds and thank all who apply; however, only those selected for an interview will be contacted.
Position summary:
We are seeking a Staff Engineer – DevOps with 8-12 years of experience in designing, implementing, and optimizing CI/CD pipelines, cloud infrastructure, and automation frameworks. The ideal candidate will have expertise in Kubernetes, Terraform, CI/CD, Security, Observability, and Cloud Platforms (AWS, Azure, GCP). You will play a key role in scaling and securing our infrastructure, improving developer productivity, and ensuring high availability and performance.
Key Roles & Responsibilities:
- Design, implement, and maintain CI/CD pipelines using tools like Jenkins, GitLab CI/CD, ArgoCD, and Tekton.
- Deploy and manage Kubernetes clusters (EKS, AKS, GKE) and containerized workloads.
- Automate infrastructure provisioning using Terraform, Ansible, Pulumi, or CloudFormation.
- Implement observability and monitoring solutions using Prometheus, Grafana, ELK, OpenTelemetry, or Datadog.
- Ensure security best practices in DevOps, including IAM, secrets management, container security, and vulnerability scanning.
- Optimize cloud infrastructure (AWS, Azure, GCP) for performance, cost efficiency, and scalability.
- Develop and manage GitOps workflows and infrastructure-as-code (IaC) automation.
- Implement zero-downtime deployment strategies, including blue-green deployments, canary releases, and feature flags.
- Work closely with development teams to optimize build pipelines, reduce deployment time, and improve system reliability.
Basic Qualifications:
- A bachelor’s or master’s degree in computer science, electronics engineering or a related field
- 8-12 years of experience in DevOps, Site Reliability Engineering (SRE), or Infrastructure Automation.
- Strong expertise in CI/CD pipelines, version control (Git), and release automation.
- Hands-on experience with Kubernetes (EKS, AKS, GKE) and container orchestration.
- Proficiency in Terraform, Ansible for infrastructure automation.
- Experience with AWS, Azure, or GCP services (EC2, S3, IAM, VPC, Lambda, API Gateway, etc.).
- Expertise in monitoring/logging tools such as Prometheus, Grafana, ELK, OpenTelemetry, or Datadog.
- Strong scripting and automation skills in Python, Bash, or Go.
Preferred Qualifications
- Experience in FinOps Cloud Cost Optimization) and Kubernetes cluster scaling.
- Exposure to serverless architectures and event-driven workflows.
- Contributions to open-source DevOps projects.
Hi,
We are looking for candidate with experience in DevSecOps.
Please find below JD for your reference.
Responsibilities:
Execute shell scripts for seamless automation and system management.
Implement infrastructure as code using Terraform for AWS, Kubernetes, Helm, kustomize, and kubectl.
Oversee AWS security groups, VPC configurations, and utilize Aviatrix for efficient network orchestration.
Contribute to Opentelemetry Collector for enhanced observability.
Implement microsegmentation using AWS native resources and Aviatrix for commercial routes.
Enforce policies through Open Policy Agent (OPA) integration.
Develop and maintain comprehensive runbooks for standard operating procedures.
Utilize packet tracing for network analysis and security optimization.
Apply OWASP tools and practices for robust web application security.
Integrate container vulnerability scanning tools seamlessly within CI/CD pipelines.
Define security requirements for source code repositories, binary repositories, and secrets managers in CI/CD pipelines.
Collaborate with software and platform engineers to infuse security principles into DevOps teams.
Regularly monitor and report project status to the management team.
Qualifications:
Proficient in shell scripting and automation.
Strong command of Terraform, AWS, Kubernetes, Helm, kustomize, and kubectl.
Deep understanding of AWS security practices, VPC configurations, and Aviatrix.
Familiarity with Opentelemetry for observability and OPA for policy enforcement.
Experience in packet tracing for network analysis.
Practical application of OWASP tools and web application security.
Integration of container vulnerability scanning tools within CI/CD pipelines.
Proven ability to define security requirements for source code repositories, binary repositories, and secrets managers in CI/CD pipelines.
Collaboration expertise with DevOps teams for security integration.
Regular monitoring and reporting capabilities.
Site Reliability Engineering experience.
Hands-on proficiency with source code management tools, especially Git.
Cloud platform expertise (AWS, Azure, or GCP) with hands-on experience in deploying and managing applications.
Please send across your updated profile.
We are hiring for a Lead DevOps Engineer in Cloud domain with hands on experience in Azure / GCP.
- Expertise in managing Cloud / VMWare resources and good exposure on Dockers/Kubernetes
- Working knowledge of operating systems( Unix, Linux, IBM AIX)
- Experience in installation, configuration and managing apache webserver, Tomcat/Jboss
- Good understanding of JVM, troubleshooting and performance tuning through thread dump and log analysis
-Strong expertise in Dev Ops tools:
- Deployment (Chef/Puppet/Ansible /Nebula/Nolio)
- SCM (TFS, GIT, ClearCase)
- Build tools (Ant,Maven, Make, Gradle)
- Artifact repositories (Nexes, JFrog ArtiFactory)
- CI tools (Jenkins, TeamCity),
- Experienced in scripting languages: Python, Ant, Bash and Shell
What will be required of you?
- Responsible for implementation and support of application/web server infrastructure for complex business applications
- Server configuration management, release management, deployments, automation & troubleshooting
- Set-up and configure Development, Staging, UAT and Production server environment for projects and install/configure all dependencies using the industry best practices
- Manage Code Repositories
- Manage, Document, Control and Innovate Development and Release procedure.
- Configure automated deployment on multiple environment
- Hands-on working experience of Azure or GCP.
- Knowledge Transfer the implementation to support team and until such time support any production issues
Key Responsibilities:
- Work with the development team to plan, execute and monitor deployments
- Capacity planning for product deployments
- Adopt best practices for deployment and monitoring systems
- Ensure the SLAs for performance, up time are met
- Constantly monitor systems, suggest changes to improve performance and decrease costs.
- Ensure the highest standards of security
Key Competencies (Functional):
- Proficiency in coding in atleast one scripting language - bash, Python, etc
- Has personally managed a fleet of servers (> 15)
- Understand different environments production, deployment and staging
- Worked in micro service / Service oriented architecture systems
- Has worked with automated deployment systems – Ansible / Chef / Puppet.
- Can write MySQL queries
About Hive
Hive is the leading provider of cloud-based AI solutions for content understanding,
trusted by the world’s largest, fastest growing, and most innovative organizations. The
company empowers developers with a portfolio of best-in-class, pre-trained AI models, serving billions of customer API requests every month. Hive also offers turnkey software applications powered by proprietary AI models and datasets, enabling breakthrough use cases across industries. Together, Hive’s solutions are transforming content moderation, brand protection, sponsorship measurement, context-based ad targeting, and more.
Hive has raised over $120M in capital from leading investors, including General Catalyst, 8VC, Glynn Capital, Bain & Company, Visa Ventures, and others. We have over 250 employees globally in our San Francisco, Seattle, and Delhi offices. Please reach out if you are interested in joining the future of AI!
About Role
Our unique machine learning needs led us to open our own data centers, with an
emphasis on distributed high performance computing integrating GPUs. Even with these data centers, we maintain a hybrid infrastructure with public clouds when the right fit. As we continue to commercialize our machine learning models, we also need to grow our DevOps and Site Reliability team to maintain the reliability of our enterprise SaaS offering for our customers. Our ideal candidate is someone who is
able to thrive in an unstructured environment and takes automation seriously. You believe there is no task that can’t be automated and no server scale too large. You take pride in optimizing performance at scale in every part of the stack and never manually performing the same task twice.
Responsibilities
● Create tools and processes for deploying and managing hardware for Private Cloud Infrastructure.
● Improve workflows of developer, data, and machine learning teams
● Manage integration and deployment tooling
● Create and maintain monitoring and alerting tools and dashboards for various services, and audit infrastructure
● Manage a diverse array of technology platforms, following best practices and
procedures
● Participate in on-call rotation and root cause analysis
Requirements
● Minimum 5 - 10 years of previous experience working directly with Software
Engineering teams as a developer, DevOps Engineer, or Site Reliability
Engineer.
● Experience with infrastructure as a service, distributed systems, and software design at a high-level.
● Comfortable working on Linux infrastructures (Debian) via the CLIAble to learn quickly in a fast-paced environment.
● Able to debug, optimize, and automate routine tasks
● Able to multitask, prioritize, and manage time efficiently independently
● Can communicate effectively across teams and management levels
● Degree in computer science, or similar, is an added plus!
Technology Stack
● Operating Systems - Linux/Debian Family/Ubuntu
● Configuration Management - Chef
● Containerization - Docker
● Container Orchestrators - Mesosphere/Kubernetes
● Scripting Languages - Python/Ruby/Node/Bash
● CI/CD Tools - Jenkins
● Network hardware - Arista/Cisco/Fortinet
● Hardware - HP/SuperMicro
● Storage - Ceph, S3
● Database - Scylla, Postgres, Pivotal GreenPlum
● Message Brokers: RabbitMQ
● Logging/Search - ELK Stack
● AWS: VPC/EC2/IAM/S3
● Networking: TCP / IP, ICMP, SSH, DNS, HTTP, SSL / TLS, Storage systems,
RAID, distributed file systems, NFS / iSCSI / CIFS
Who we are
We are a group of ambitious individuals who are passionate about creating a revolutionary AI company. At Hive, you will have a steep learning curve and an opportunity to contribute to one of the fastest growing AI start-ups in San Francisco. The work you do here will have a noticeable and direct impact on the
development of the company.
Thank you for your interest in Hive and we hope to meet you soon
Kutumb is the first and largest communities platform for Bharat. We are growing at an exponential trajectory. More than 1 Crore users use Kutumb to connect with their community. We are backed by world-class VCs and angel investors. We are growing and looking for exceptional Infrastructure Engineers to join our Engineering team.
More on this here - https://kutumbapp.com/why-join-us.html">https://kutumbapp.com/why-join-us.html
We’re excited if you have:
- Recent experience designing and building unified observability platforms that enable companies to use the sometimes-overwhelming amount of available data (metrics, logs, and traces) to determine quickly if their application or service is operating as desired
- Expertise in deploying and using open-source observability tools in large-scale environments, including Prometheus, Grafana, ELK (ElasticSearch + Logstash + Kibana), Jaeger, Kiali, and/or Loki
- Familiarity with open standards like OpenTelemetry, OpenTracing, and OpenMetrics
- Familiarity with Kubernetes and Istio as the architecture on which the observability platform runs, and how they integrate and scale. Additionally, the ability to contribute improvements back to the joint platform for the benefit of all teams
- Demonstrated customer engagement and collaboration skills to curate custom dashboards and views, and identify and deploy new tools, to meet their requirements
- The drive and self-motivation to understand the intricate details of a complex infrastructure environment
- Using CICD tools to automatically perform canary analysis and roll out changes after passing automated gates (think Argo & keptn)
- Hands-on experience working with AWS
- Bonus points for knowledge of ETL pipelines and Big data architecture
- Great problem-solving skills & takes pride in your work
- Enjoys building scalable and resilient systems, with a focus on systems that are robust by design and suitably monitored
- Abstracting all of the above into as simple of an interface as possible (like Knative) so developers don't need to know about it unless they choose to open the escape hatch
What you’ll be doing:
- Design and build automation around the chosen tools to make onboarding new services easy for developers (dashboards, alerts, traces, etc)
- Demonstrate great communication skills in working with technical and non-technical audiences
- Contribute new open-source tools and/or improvements to existing open-source tools back to the CNCF ecosystem
Tools we use:
Kops, Argo, Prometheus/ Loki/ Grafana, Kubernetes, AWS, MySQL/ PostgreSQL, Apache Druid, Cassandra, Fluentd, Redis, OpenVPN, MongoDB, ELK
What we offer:
- High pace of learning
- Opportunity to build the product from scratch
- High autonomy and ownership
- A great and ambitious team to work with
- Opportunity to work on something that really matters
- Top of the class market salary and meaningful ESOP ownership
- Experience using AWS (that’s just common sense)
- Experience designing and building web environments on AWS, which includes working with services like EC2, ELB, RDS, and S3
- Experience building and maintaining cloud-native applications
- A solid background in Linux/Unix and Windows server system administration
- Experience using https://www.simplilearn.com/tutorials/devops-tutorial/devops-tools" target="_blank">DevOps tools in a cloud environment, such as Ansible, Artifactory, https://www.simplilearn.com/tutorials/docker-tutorial/what-is-docker-container" target="_blank">Docker, GitHub, https://www.simplilearn.com/tutorials/jenkins-tutorial/what-is-jenkins" target="_blank">Jenkins, https://www.simplilearn.com/tutorials/kubernetes-tutorial/what-is-kubernetes" target="_blank">Kubernetes, Maven, and Sonar Qube
- Experience installing and configuring different application servers such as JBoss, Tomcat, and WebLogic
- Experience using monitoring solutions like CloudWatch, ELK Stack, and Prometheus
- An understanding of writing Infrastructure-as-Code (IaC), using tools like CloudFormation or Terraform
- Knowledge of one or more of the most-used programming languages available for today’s cloud computing (i.e., SQL data, XML data, R math, Clojure math, Haskell functional, Erlang functional, Python procedural, and Go procedural languages)
- Experience in troubleshooting distributed systems
- Proficiency in script development and scripting languages
- The ability to be a team player
- The ability and skill to train other people in procedural and technical topics
- Strong communication and collaboration skills
As a special aside, an AWS engineer who works in DevOps should also have experience with:
- The theory, concepts, and real-world application of Continuous Delivery (CD), which requires familiarity with tools like AWS CodeBuild, AWS CodeDeploy, and AWS CodePipeline
- An understanding of automation
About the compnay:
Our Client is a B2B2C tech Web3 startup founded by founders - IITB Graduates who are experienced in retail, ecommerce and fintech.
Vision: Our Client aims to change the way that customers, creators, and retail investors interact and transact at brands of all shapes and sizes. Essentially, becoming the Web3 version of brands driven social ecommerce & investment platform.
Role Description
We are looking for a DevOps Engineer responsible for managing cloud technologies, deployment
automation and CI /CD
Key Responsibilities
Building and setting up new development tools and infrastructure
Understanding the needs of stakeholders and conveying this to developers
Working on ways to automate and improve development and release processes
Testing and examining code written by others and analyzing results
Ensuring that systems are safe and secure against cybersecurity threats
Identifying technical problems and developing software updates and ‘fixes’
Working with software developers and software engineers to ensure that development
follows established processes and works as intended
Planning out projects and being involved in project management decisions
Required Skills and Qualifications
BE / MCA / B.Sc-IT / B.Tech in Computer Science or a related field.
4+ years of overall development experience.
Strong understanding of cloud deployment and setup.
Hands-on experience with tools like Jenkins, Gradle etc.
Deploy updates and fixes.
Provide Level 2 technical support.
Build tools to reduce occurrences of errors and improve customer experience.
Perform root cause analysis for production errors.
Investigate and resolve technical issues.
Develop scripts to automate deployment.
Design procedures for system troubleshooting and maintenance.
Proficient with git and git workflows.
Working knowledge of databases and SQL.
Problem-solving attitude.
Collaborative team spirit
Regards
Team Merito

- AWS Cloud, CICD, Serverless setups, Monitoring Setup
- Performance setup, scalability in hands experience, Linux expertise, DevOps Operations
- AWS Cloud, CICD, Serverless setups, Monitoring Setup, Performance setup, scalability in hands experience, Linux expertise, DevOps Operations.

