50+ DevOps Jobs in India

Apply to 50+ DevOps jobs on CutShort.io.

DevOps Engineer / Site Reliability Engineer (SRE)

at NeoGenCode Technologies Pvt Ltd

Posted by Akshay Patil

Gurugram

5 - 10 yrs

₹4L - ₹9L / yr

DevOps

Site Reliability Engineer (SRE)

Amazon Web Services (AWS)

Docker

Kubernetes

🚀 Job Title : DevOps Engineer / Site Reliability Engineer (SRE)

Experience Level : 5+ Years

Location : Gurugram Sector 39, Haryana (On-site)

Employment Type : Full-time

About the Role :

We are looking for a proactive DevOps / Site Reliability Engineer (SRE) with around 5 years of hands-on experience designing, automating, and scaling cloud infrastructure and CI/CD delivery pipelines.

In this role, you will bridge the gap between development and operations. You will be responsible for orchestrating containerized applications, automating infrastructure with Infrastructure as Code (IaC), establishing SRE best practices (SLIs, SLOs, SLAs), and ensuring maximum uptime, resiliency, and operational efficiency across multi-cloud environments (AWS/Azure/GCP).

Mandatory Skills :

AWS, Kubernetes, Docker, Terraform, Ansible, Jenkins, GitLab CI/CD, GitHub Actions, Python, Bash, CI/CD, Infrastructure as Code (IaC), Grafana, Prometheus, ELK, New Relic, CloudWatch, SRE, SLI/SLO/SLA, Linux

Key Responsibilities :

1. Cloud Infrastructure & Infrastructure as Code (IaC) :

Provision, configure, and maintain scalable, high-availability infrastructure on multi-cloud platforms, primarily AWS (EC2, VPC, IAM, S3, RDS, Route53, ALB/ASG, Lambda, EBS).
Build, deploy, and manage Infrastructure as Code (IaC) using Terraform, Ansible, and CloudFormation to enforce consistency and eliminate configuration drift.
Carry out disaster recovery (DR) planning, implement automated failover / failback mechanisms, and run chaos engineering exercises to validate system resiliency.

2. CI/CD, Automation & Development :

Design, maintain end-to-end, and optimize robust CI/CD pipelines using Jenkins, GitLab CI, and GitHub Actions.
Automate release pipelines, versioning, branching strategies, and approval gates using Groovy, Python, and Bash scripting.
Integrate automated code quality and security scanning tools (SonarQube, Black Duck, or Fortify) directly into delivery pipelines.
Develop custom tools, scripts, or microservices (e.g., Python / Node.js) to automate manual operational tasks and reduce toil.

3. Containerization & Orchestration :

Onboard and orchestrate containerized microservices utilizing Docker and Kubernetes (including Helm charts).
Ensure high availability, auto-scaling, resource management, and fault tolerance for Kubernetes pod deployments.

4. Observability, SRE & Incident Management :

Drive Site Reliability Engineering (SRE) maturity by establishing, tracking, and reporting SLIs, SLOs, and SLAs with cross-functional engineering teams.
Build, configure, and manage full-stack observability tools : Grafana, Prometheus, New Relic, Elasticsearch / Logstash / Kibana (ELK), Sentry, and AWS CloudWatch.
Set up real-time alerting, custom metric dashboards, and automated log rotation / pruning scripts.
Handle production incidents, lead Root Cause Analysis (RCA) investigations, and implement preventive measures to reduce Mean Time to Resolution (MTTR).

Required Qualifications & Skills :

Education : Bachelor’s Degree in Electronics and Communication Engineering, Computer Science, or a related technical field.
Experience : ~5 years of experience in DevOps, SRE, or Cloud System Administration roles.
Cloud & Infrastructure : Hands-on experience with AWS (core services such as EC2, S3, VPC, RDS, IAM, Lambda, Auto Scaling) and exposure to Azure / GCP.
CI/CD & Version Control : Proficiency with Jenkins, GitLab CI, GitHub Actions, and Git workflows.
Containerization : Core proficiency in Docker and Kubernetes cluster management / onboarding.
Infrastructure as Code : Expertise in Ansible, Terraform, or AWS CloudFormation.
Scripting & Languages : Strong hands-on automation skills with Python and Bash, and foundational knowledge of Node.js, Java, or C++.
Observability & Logging : Strong experience with Grafana, Prometheus, New Relic, ELK stack, or Splunk.
Database & SQL : Familiarity with relational databases (MySQL, RDS) for monitoring setup and operational analytics.

Soft Skills & Competencies :

Excellent problem-solving and root-cause identification skills, with a chaos engineering mindset.
Strong written and verbal communication skills in English.
Comfortable working in Agile cross-functional environments and collaborating across development, security, and operations teams.
Innate drive to reduce manual toil and automate repetitive processes.

Senior DevOps Engineer Lead

at Amura Health

Posted by Swathi S

Chennai

7 - 12 yrs

₹30L - ₹55L / yr

Amazon Web Services (AWS)

Python

CI/CD

DevOps

Platform as a Service (PaaS)

Amura’s Vision

We believe that the most under-appreciated route to releasing untapped human potential is to build a healthier body and, through it, a better brain. This allows us to do more of everything that is important to each one of us.

Billions of healthier brains, sitting in healthier bodies, can take up more complex problems that defy solutions today, including many existential threats, and solve them in just a few decades.

Billions of healthier brains will make the world richer beyond what we can imagine today. The surplus wealth, combined with better human capabilities, will lead us to a new renaissance, giving us a richer and more beautiful culture.

These healthier brains will be equipped with deeper intellect and will be less acrimonious, more magnanimous, and kinder in their outlook on the world, resulting in a world better than at any previous time.

We find this vision of the future exhilarating. Our hopes and dreams are to create this future as quickly as possible and ensure that it is widely distributed and optimized to maximize all forms of human excellence.

Role Overview

We are looking for a highly skilled Senior DevOps Engineer (AI-Native Infrastructure & Platform Engineering) with deep expertise in AWS cloud infrastructure, automation, AI infrastructure operations, and modern DevOps/SRE practices.

This role goes beyond traditional DevOps and requires a seasoned specialist capable of building and operating AI-ready infrastructure platforms that support high-throughput APIs, LLM/AI workloads, GPU-based compute, data-intensive systems, real-time inference pipelines, and scalable ML platforms.

You will be responsible for architecting, automating, securing, and optimizing highly scalable and cost-efficient cloud environments that enable high-velocity engineering and AI teams. This is an ideal position for someone who combines technical ownership, an automation-first mindset, and a passion for developer productivity and platform reliability.

Key Responsibilities

Cloud Infrastructure & Platform Engineering (AWS)

Architect, deploy, and manage highly scalable and secure infrastructure on AWS.
Design cloud platforms supporting AI/ML workloads, data pipelines, real-time APIs, and high-concurrency backend systems.
Hands-on expertise with key AWS services including EC2, ECS/EKS, Lambda, RDS, DynamoDB, S3, VPC, CloudFront, IAM, CloudWatch, and GPU-enabled instances.
Build and maintain Infrastructure-as-Code (IaC) using Terraform, CloudFormation, or AWS CDK.
Design multi-AZ and multi-region architectures for high availability and disaster recovery (HA/DR).
Build reusable platform templates and shared infrastructure modules.

AI/ML Infrastructure & MLOps

Build and maintain infrastructure for LLM applications, AI inference workloads, model serving platforms, vector databases, and feature stores.
Support GPU-based workloads and optimize compute/storage usage.
Enable scalable deployment patterns for AI applications using Kubernetes/EKS.
Collaborate with Data Science and ML Engineering teams on model deployment, training/tuning of models, CI/CD for ML systems, experiment environments, and reproducibility.
Support orchestration and deployment of AI workflows and inference services while implementing observability and reliability for AI pipelines.

CI/CD, Automation & Developer Productivity

Build and maintain CI/CD pipelines using GitHub Actions, GitLab CI, Jenkins, or AWS CodePipeline.
Automate deployments, environment provisioning, and release workflows.
Build self-service developer platforms, preview environments, and reusable deployment workflows to improve developer productivity.
Implement automated patching, scaling, backups, cleanup workflows, and drift detection.

Containers, Kubernetes & Platform Reliability

Manage Docker-based environments and containerized applications, and optimize workloads using Kubernetes (EKS) or ECS/Fargate.
Manage autoscaling, cluster health, node pools, ingress, service mesh, and workload isolation.
Optimize infrastructure for performance, resilience, and cost-efficiency.
Implement progressive deployment strategies including blue/green, canary, and rolling deployments.

Observability, Incident Response & SRE Practices

Implement observability stacks using CloudWatch, Prometheus, Grafana, ELK, Datadog, OpenTelemetry, or New Relic.
Build actionable dashboards and intelligent alerting systems while defining and tracking SLIs, SLOs, and SLAs.
Lead incident response, root cause analysis, and blameless postmortems to reduce operational toil and improve MTTR.

FinOps, Cost Governance & Security

Continuously monitor and optimize cloud costs (compute utilization, storage lifecycle, GPU usage, and data transfer) using AWS Cost Explorer, Budgets, Trusted Advisor, CloudHealth, or Kubecost.
Implement AWS security best practices for IAM, VPCs, security groups, NACLs, and encryption, and manage keys and secrets using KMS, SSM Parameter Store, or Vault.
Build secure CI/CD pipelines with automated security checks, least-privilege access, and audit logging, and ensure compliance readiness for ISO 27001, SOC 2, and GDPR.

Collaboration, Leadership & Platform Culture

Work closely with engineering, AI/ML, QA, product, and operations teams to drive a DevOps, SRE, GitOps, and automation-first culture.
Mentor junior DevOps and Platform Engineers while creating and maintaining detailed runbooks, architecture diagrams, and platform documentation.

Skills & Qualifications

Must-Have:

7+ years of experience in DevOps, SRE, Platform Engineering, or Cloud Infrastructure Engineering.
Strong expertise in AWS cloud architecture and services, and a deep understanding of Kubernetes (EKS), containers, and cloud-native systems.
Strong Infrastructure-as-Code expertise using Terraform, CloudFormation, or CDK.
Strong Linux administration, networking, DNS, routing, and load balancing knowledge.
Strong scripting/programming experience in Python, Bash, or Go (preferred).
Experience with CI/CD automation, GitOps workflows, and observability platforms supporting scalable production systems.

Preferred / Nice-to-Have:

Experience with AI/ML infrastructure, MLOps, model serving, vector databases, GPU orchestration, and inference optimization.
Familiarity with Kafka, Redis, SQS, and event-driven systems.
Exposure to platform engineering, internal developer platforms, and tools like ArgoCD, Flux, Helm, and OpenTelemetry.
AWS Certifications: Solutions Architect, DevOps Engineer, or SysOps Administrator.
Knowledge of distributed systems and large-scale platform operations.

Here are answers to some questions you may have

Where is your office?

Chennai (Velachery)

Work Model

Work from Office – because great stories are built in person!

Do you have an online presence?

https://amura.ai (we are @AmuraHealth on all social media)

Cloud Operations Engineer

at VY SYSTEMS PRIVATE LIMITED

Posted by Banu S

Bengaluru (Bangalore)

6 - 9 yrs

₹5L - ₹20L / yr

Cloud Operations Engineer

DevOps

Jenkins

Docker

Kubernetes

* You have strong experience supporting cloud environments on the AWS platform.

* You have hands-on experience with cloud platforms, Docker, Kubernetes, Jenkins, Terraform, Artifactory, infrastructure automation, CI/CD pipelines, monitoring, and operational support in enterprise environments.

* You have hands-on experience with infrastructure automation, scripting, and configuration management.

* You understand monitoring, logging, alerting, and incident management practices in cloud operations.

* You have experience working with containers, orchestration platforms, and deployment pipelines.

* You are comfortable supporting Linux and/or Windows server environments in cloud-based ecosystems.

* You understand identity and access management, security controls, patching, and operational compliance practices.

* You have experience with disaster recovery, backup processes, and platform resilience planning.

* You have the technical depth required to support and guide Cloud Operations activities while contributing to the overall effectiveness of the function from the India side.

* You have experience supporting DevOps or Site Reliability Engineering practices.

* You have exposure to cost management and cloud optimization tools.

* You are comfortable presenting operational updates, recommendations, and risk considerations to stakeholders.

* You work well in team-based environments and contribute to shared goals.

* You have effective documentation skills for processes, runbooks, and support knowledge.

* You have experience working in regulated or enterprise-scale environments.

* You have familiarity with service management tools and structured change management processes.

* You demonstrate the potential to take on broader operational ownership and support the leadership of the Cloud Operations function from the India side.

* You are comfortable coordinating with multiple stakeholders and contributing to operational decision-making in a global delivery model.