11+ Root cause analysis Jobs in Delhi, NCR and Gurgaon | Root cause analysis Job openings in Delhi, NCR and Gurgaon
Apply to 11+ Root cause analysis Jobs in Delhi, NCR and Gurgaon on CutShort.io. Explore the latest Root cause analysis Job opportunities across top companies like Google, Amazon & Adobe.
Required Skills: Advanced AWS Infrastructure Expertise, CI/CD Pipeline Automation, Monitoring, Observability & Incident Management, Security, Networking & Risk Management, Infrastructure as Code & Scripting
Criteria:
- 5+ years of DevOps/SRE experience in cloud-native, product-based companies (B2C scale preferred)
- Strong hands-on AWS expertise across core and advanced services (EC2, ECS/EKS, Lambda, S3, CloudFront, RDS, VPC, IAM, ELB/ALB, Route53)
- Proven experience designing high-availability, fault-tolerant cloud architectures for large-scale traffic
- Strong experience building & maintaining CI/CD pipelines (Jenkins mandatory; GitHub Actions/GitLab CI a plus)
- Prior experience running production-grade microservices deployments and automated rollout strategies (Blue/Green, Canary)
- Hands-on experience with monitoring & observability tools (Grafana, Prometheus, ELK, CloudWatch, New Relic, etc.)
- Solid hands-on experience with MongoDB in production, including performance tuning, indexing & replication
- Strong scripting skills (Bash, Shell, Python) for automation
- Hands-on experience with IaC (Terraform, CloudFormation, or Ansible)
- Deep understanding of networking fundamentals (VPC, subnets, routing, NAT, security groups)
- Strong experience in incident management, root cause analysis & production firefighting
Description
Role Overview
Company is seeking an experienced Senior DevOps Engineer to design, build, and optimize cloud infrastructure on AWS, automate CI/CD pipelines, implement monitoring and security frameworks, and proactively identify scalability challenges. This role requires someone who has hands-on experience running infrastructure at B2C product scale, ideally in media/OTT or high-traffic applications.
Key Responsibilities
1. Cloud Infrastructure — AWS (Primary Focus)
- Architect, deploy, and manage scalable infrastructure using AWS services such as EC2, ECS/EKS, Lambda, S3, CloudFront, RDS, ELB/ALB, VPC, IAM, Route53, etc.
- Optimize cloud cost, resource utilization, and performance across environments.
- Design high-availability, fault-tolerant systems for streaming workloads.
2. CI/CD Automation
- Build and maintain CI/CD pipelines using Jenkins, GitHub Actions, or GitLab CI.
- Automate deployments for microservices, mobile apps, and backend APIs.
- Implement blue/green and canary deployments for seamless production rollouts.
3. Observability & Monitoring
- Implement logging, metrics, and alerting using tools like Grafana, Prometheus, ELK, CloudWatch, New Relic, etc.
- Perform proactive performance analysis to minimize downtime and bottlenecks.
- Set up dashboards for real-time visibility into system health and user traffic spikes.
4. Security, Compliance & Risk Highlighting
• Conduct frequent risk assessments and identify vulnerabilities in:
o Cloud architecture
o Access policies (IAM)
o Secrets & key management
o Data flows & network exposure
• Implement security best practices including VPC isolation, WAF rules, firewall policies, and SSL/TLS management.
5. Scalability & Reliability Engineering
- Analyze traffic patterns for OTT-specific load variations (weekends, new releases, peak hours).
- Identify scalability gaps and propose solutions across:
- o Microservices
- o Caching layers
- o CDN distribution (CloudFront)
- o Database workloads
- Perform capacity planning and load testing to ensure readiness for 10x traffic growth.
6. Database & Storage Support
- Administer and optimize MongoDB for high-read/low-latency use cases.
- Design backup, recovery, and data replication strategies.
- Work closely with backend teams to tune query performance and indexing.
7. Automation & Infrastructure as Code
- Implement IaC using Terraform, CloudFormation, or Ansible.
- Automate repetitive infrastructure tasks to ensure consistency across environments.
Required Skills & Experience
Technical Must-Haves
- 5+ years of DevOps/SRE experience in cloud-native, product-based companies.
- Strong hands-on experience with AWS (core and advanced services).
- Expertise in Jenkins CI/CD pipelines.
- Solid background working with MongoDB in production environments.
- Good understanding of networking: VPCs, subnets, security groups, NAT, routing.
- Strong scripting experience (Bash, Python, Shell).
- Experience handling risk identification, root cause analysis, and incident management.
Nice to Have
- Experience with OTT, video streaming, media, or any content-heavy product environments.
- Familiarity with containers (Docker), orchestration (Kubernetes/EKS), and service mesh.
- Understanding of CDN, caching, and streaming pipelines.
Personality & Mindset
- Strong sense of ownership and urgency—DevOps is mission critical at OTT scale.
- Proactive problem solver with ability to think about long-term scalability.
- Comfortable working with cross-functional engineering teams.
Why Join company?
• Build and operate infrastructure powering millions of monthly users.
• Opportunity to shape DevOps culture and cloud architecture from the ground up.
• High-impact role in a fast-scaling Indian OTT product.
REVIEW CRITERIA:
MANDATORY:
- Strong Senior/Lead DevOps Engineer Profile
- Must have 8+ years of hands-on experience in DevOps engineering, with a strong focus on AWS cloud infrastructure and services (EC2, VPC, EKS, RDS, Lambda, CloudFront, etc.).
- Must have strong system administration expertise (installation, tuning, troubleshooting, security hardening)
- Must have solid experience in CI/CD pipeline setup and automation using tools such as Jenkins, GitHub Actions, or similar
- Must have hands-on experience with Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, or Ansible
- Must have strong database expertise across MongoDB and Snowflake (administration, performance optimization, integrations)
- Must have experience with monitoring and observability tools such as Prometheus, Grafana, ELK, CloudWatch, or Datadog
- Must have good exposure to containerization and orchestration using Docker and Kubernetes (EKS)
- Must be currently working in an AWS-based environment (AWS experience must be in the current organization)
- Its an IC role
PREFERRED:
- Must be proficient in scripting languages (Bash, Python) for automation and operational tasks.
- Must have strong understanding of security best practices, IAM, WAF, and GuardDuty configurations.
- Exposure to DevSecOps and end-to-end automation of deployments, provisioning, and monitoring.
- Bachelor’s or Master’s degree in Computer Science, Information Technology, or related field.
- Candidates from NCR region only (No outstation candidates).
ROLES AND RESPONSIBILITIES:
We are seeking a highly skilled Senior DevOps Engineer with 8+ years of hands-on experience in designing, automating, and optimizing cloud-native solutions on AWS. AWS and Linux expertise are mandatory. The ideal candidate will have strong experience across databases, automation, CI/CD, containers, and observability, with the ability to build and scale secure, reliable cloud environments.
KEY RESPONSIBILITIES:
Cloud & Infrastructure as Code (IaC)-
- Architect and manage AWS environments ensuring scalability, security, and high availability.
- Implement infrastructure automation using Terraform, CloudFormation, and Ansible.
- Configure VPC Peering, Transit Gateway, and PrivateLink/Connect for advanced networking.
CI/CD & Automation:
- Build and maintain CI/CD pipelines (Jenkins, GitHub, SonarQube, automated testing).
- Automate deployments, provisioning, and monitoring across environments.
Containers & Orchestration:
- Deploy and operate workloads on Docker and Kubernetes (EKS).
- Implement IAM Roles for Service Accounts (IRSA) for secure pod-level access.
- Optimize performance of containerized and microservices applications.
Monitoring & Reliability:
- Implement observability with Prometheus, Grafana, ELK, CloudWatch, M/Monit, and Datadog.
- Establish logging, alerting, and proactive monitoring for high availability.
Security & Compliance:
- Apply AWS security best practices including IAM, IRSA, SSO, and role-based access control.
- Manage WAF, Guard Duty, Inspector, and other AWS-native security tools.
- Configure VPNs, firewalls, and secure access policies and AWS organizations.
Databases & Analytics:
- Must have expertise in MongoDB, Snowflake, Aerospike, RDS, PostgreSQL, MySQL/MariaDB, and other RDBMS.
- Manage data reliability, performance tuning, and cloud-native integrations.
- Experience with Apache Airflow and Spark.
IDEAL CANDIDATE:
- 8+ years in DevOps engineering, with strong AWS Cloud expertise (EC2, VPC, TG, RDS, S3, IAM, EKS, EMR, SCP, MWAA, Lambda, CloudFront, SNS, SES etc.).
- Linux expertise is mandatory (system administration, tuning, troubleshooting, CIS hardening etc).
- Strong knowledge of databases: MongoDB, Snowflake, Aerospike, RDS, PostgreSQL, MySQL/MariaDB, and other RDBMS.
- Hands-on with Docker, Kubernetes (EKS), Terraform, CloudFormation, Ansible.
- Proven ability with CI/CD pipeline automation and DevSecOps practices.
- Practical experience with VPC Peering, Transit Gateway, WAF, Guard Duty, Inspector and advanced AWS networking and security tools.
- Expertise in observability tools: Prometheus, Grafana, ELK, CloudWatch, M/Monit, and Datadog.
- Strong scripting skills (Shell/bash, Python, or similar) for automation.
- Bachelor / Master’s degree
- Effective communication skills
PERKS, BENEFITS AND WORK CULTURE:
- Competitive Salary Package
- Generous Leave Policy
- Flexible Working Hours
- Performance-Based Bonuses
- Health Care Benefits
About the Job
This is a full-time role for a Lead DevOps Engineer at Spark Eighteen. We are seeking an experienced DevOps professional to lead our infrastructure strategy, design resilient systems, and drive continuous improvement in our deployment processes. In this role, you will architect scalable solutions, mentor junior engineers, and ensure the highest standards of reliability and security across our cloud infrastructure. The job location is flexible with preference for the Delhi NCR region.
Responsibilities
- Lead and mentor the DevOps/SRE team
- Define and drive DevOps strategy and roadmaps
- Oversee infrastructure automation and CI/CD at scale
- Collaborate with architects, developers, and QA teams to integrate DevOps practices
- Ensure security, compliance, and high availability of platforms
- Own incident response, postmortems, and root cause analysis
- Budgeting, team hiring, and performance evaluation
Requirements
Technical Skills
- Bachelor's or Master's degree in Computer Science, Engineering, or related field.
- 7+ years of professional DevOps experience with demonstrated progression.
- Strong architecture and leadership background
- Deep hands-on knowledge of infrastructure as code, CI/CD, and cloud
- Proven experience with monitoring, security, and governance
- Effective stakeholder and project management
- Experience with tools like Jenkins, ArgoCD, Terraform, Vault, ELK, etc.
- Strong understanding of business continuity and disaster recovery
Soft Skills
- Cross-functional communication excellence with ability to lead technical discussions.
- Strong mentorship capabilities for junior and mid-level team members.
- Advanced strategic thinking and ability to propose innovative solutions.
- Excellent knowledge transfer skills through documentation and training.
- Ability to understand and align technical solutions with broader business strategy.
- Proactive problem-solving approach with focus on continuous improvement.
- Strong leadership skills in guiding team performance and technical direction.
- Effective collaboration across development, QA, and business teams.
- Ability to make complex technical decisions with minimal supervision.
- Strategic approach to risk management and mitigation.
What We Offer
- Professional Growth: Continuous learning opportunities through diverse projects and mentorship from experienced leaders
- Global Exposure: Work with clients from 20+ countries, gaining insights into different markets and business cultures
- Impactful Work: Contribute to projects that make a real difference, with solutions generating over $1B in revenue
- Work-Life Balance: Flexible arrangements that respect personal wellbeing while fostering productivity
- Career Advancement: Clear progression pathways as you develop skills within our growing organization
- Competitive Compensation: Attractive salary packages that recognize your contributions and expertise
Our Culture
At Spark Eighteen, our culture centers on innovation, excellence, and growth. We believe in:
- Quality-First: Delivering excellence rather than just quick solutions
- True Partnership: Building relationships based on trust and mutual respect
- Communication: Prioritizing clear, effective communication across teams
- Innovation: Encouraging curiosity and creative approaches to problem-solving
- Continuous Learning: Supporting professional development at all levels
- Collaboration: Combining diverse perspectives to achieve shared goals
- Impact: Measuring success by the value we create for clients and users
Apply Here - https://tinyurl.com/t6x23p9b
Job Overview:
We are looking for a seasoned OpenStack Administrator with strong expertise in managing large-scale production environments. The ideal candidate should have hands-on experience with Linux, Kubernetes, and OpenShift, and be capable of performing routine maintenance, upgrades, and troubleshooting in complex cloud infrastructures.
The candidate must also be comfortable working with Red Hat support, managing escalations, and communicating effectively with both internal teams and external clients.
Key Skills & Qualifications:
- Proven experience managing OpenStack infrastructure in production.
- Strong proficiency in Linux system administration (RHEL/CentOS preferred).
- Hands-on experience with Kubernetes and OpenShift.
- Experience with system monitoring, log management, and troubleshooting tools.
- Familiarity with RH support portal, managing cases, and following up on resolutions.
- Excellent problem-solving skills and ability to work under pressure.
- Strong client communication skills and ability to articulate technical issues clearly.
- Proven ability to work in and manage large-scale production environments.
Candidates with OpenStack certification will be preferred.
Role Overview:
As a DevOps Engineer (L2), you will play a key role in designing, implementing, and optimizing infrastructure. You will take ownership of automating processes, improving system reliability, and supporting the development lifecycle.
Key Responsibilities:
- Design and manage scalable, secure, and highly available cloud infrastructure.
- Lead efforts in implementing and optimizing CI/CD pipelines.
- Automate repetitive tasks and develop robust monitoring solutions.
- Ensure the security and compliance of systems, including IAM, VPCs, and network configurations.
- Troubleshoot complex issues across development, staging, and production environments.
- Mentor and guide L1 engineers on best practices.
- Stay updated on emerging DevOps tools and technologies.
- Manage cloud resources efficiently using Infrastructure as Code (IaC) tools like Terraform and AWS CloudFormation.
Qualifications:
- Bachelor’s degree in Computer Science, IT, or a related field.
- Proven experience with CI/CD pipelines and tools like Jenkins, GitLab, or Azure DevOps.
- Advanced knowledge of cloud platforms (AWS, Azure, or GCP) with hands-on experience in deployments, migrations, and optimizations.
- Strong expertise in containerization (Docker) and orchestration tools (Kubernetes).
- Proficiency in scripting languages like Python, Bash, or PowerShell.
- Deep understanding of system security, networking, and load balancing.
- Strong analytical skills and problem-solving mindset.
- Certifications (e.g., AWS Certified Solutions Architect, Kubernetes Administrator) are a plus.
What We Offer:
- Opportunity to work with a cutting-edge tech stack in a product-first company.
- Collaborative and growth-oriented environment.
- Competitive salary and benefits.
- Freedom to innovate and contribute to impactful projects.
Key Responsibilities
• As a part of the DevOps team, you will be responsible for the configuration, optimization, documentation, and support of the CI/CD components.
• Creating and managing build and release pipelines with Azure DevOps and Jenkins.
• Assist in planning and reviewing application architecture and design to promote an efficient deployment process.
• Troubleshoot server performance issues & handle the continuous integration system.
• Automate infrastructure provisioning using ARM Templates and Terraform.
• Monitor and Support deployment, Cloud-based and On-premises Infrastructure.
• Diagnose and develop root-cause solutions for failures and performance issues in the production environment.
• Deploy and manage Infrastructure for production applications
• Configure security best practices for application and infrastructure
Essential Requirements
• Good hands-on experience with cloud platforms like Azure, AWS & GCP. (Preferably Azure)
• Strong knowledge of CI/CD principles.
• Strong work experience with CI/CD implementation tools like Azure DevOps, Team City, Octopus Deploy, AWS Code Deploy, and Jenkins.
• Experience of writing automation scripts with PowerShell, Bash, Python, etc.
• GitHub, JIRA, Confluence, and Continuous Integration (CI) system.
• Understanding of secure DevOps practices
Good to Have -
•Knowledge of scripting languages such as PowerShell, Bash
• Experience with project management and workflow tools such as Agile, Jira, Scrum/Kanban, etc.
• Experience with Build technologies and cloud services. (Jenkins, TeamCity, Azure DevOps, Bamboo, AWS Code Deploy)
• Strong communication skills and ability to explain protocol and processes with team and management.
• Must be able to handle multiple tasks and adapt to a constantly changing environment.
• Must have a good understanding of SDLC.
• Knowledge of Linux, Windows server, Monitoring tools, and Shell scripting.
• Self-motivated; demonstrating the ability to achieve in technologies with minimal supervision.
• Organized, flexible, and analytical ability to solve problems creatively
● Manage AWS services and day to day cloud operations.
● Work closely with the development and QA team to make the deployment process
smooth and devise new tools and technologies in order to achieve automation of most
of the components.
● Strengthen the infrastructure in terms of Reliability (configuring HA etc.), Security (cloud
network management, VPC, etc.) and Scalability (configuring clusters, load balancers,
etc.)
● Expert level understanding of DB replication, Sharding (mySQL DB Systems), HA
clusters, Failovers and recovery mechanisms.
● Build and maintain CI-CD (continuous integration/deployment) workflows.
● Having an expert knowledge on AWS EC2, S3, RDS, Cloudfront and other AWS offered
services and products.
● Installation and management of software systems in order to support the development
team e.g. DB installation and administration, web servers, caching and other such
systems.
Requirements:
● B. Tech or Bachelor's in a related field.
● 2-5 years of hands-on experience with AWS cloud services such as EC2, ECS,
Cloudwatch, SQS, S3, CloudFront, route53.
● Experience with setting up CI-CD pipelines and successfully running large scale
systems.
● Experience with source control systems (SVN, GIT etc), Deployment and build
automation tools like Jenkins, Bamboo, Ansible etc.
● Good experience and understanding of Linux/Unix based systems and hands-on
experience working with them with respect to networking, security, administration.
● Atleast 1-2 years of experience with shell/python/perl scripting; having experience with
Bash scripting is an added advantage.
● Experience with automation tasks like, automated backups, configuring fail overs,
automating deployment related process is a must have.
● Good to have knowledge of setting up the ELK stack; Infrastructure as a code services
like Terraform; working and automating processes with AWS SDK/CLI tools with scripts
Our Client is an IT infrastructure services company, focused and specialized in delivering solutions and services on Microsoft products and technologies. They are a Microsoft partner and cloud solution provider. Our Client's objective is to help small, mid-sized as well as global enterprises to transform their business by using innovation in IT, adapting to the latest technologies and using IT as an enabler for business to meet business goals and continuous growth.
With focused and experienced management and a strong team of IT Infrastructure professionals, they are adding value by making IT Infrastructure a robust, agile, secure and cost-effective service to the business. As an independent IT Infrastructure company, they provide their clients with unbiased advice on how to successfully implement and manage technology to complement their business requirements.
- Providing on-call support within a high availability production environment
- Logging issues
- Providing Complex problem analysis and resolution for technical and application issues
- Supporting and collaborating with team members
- Running system updates
- Monitoring and responding to system alerts
- Developing and running system health checks
- Applying industry standard practices across the technology estate
- Performing system reviews
- Reviewing and maintaining infrastructure configuration
- Diagnosing performance issues and network bottlenecks
- Collaborating within geographically distributed teams
- Supporting software development infrastructure by continuous integration and delivery standards
- Working closely with developers and QA teams as part of a customer support centre
- Projecting delivery work, either individually or in conjunction with other teams, external suppliers or contractors
- Ensuring maintenance of the technical environments to meet current standards
- Ensuring compliance with appropriate industry and security regulations
- Providing support to Development and Customer Support teams
- Managing the hosted infrastructure through vendor engagement
- Managing 3rd party software licensing ensuring compliance
- Delivering new technologies as agreed by the business
What you need to have:
- Experience working within a technical operations environment relevant to associated skills stated.
- Be proficient in:
- Linux, zsh/ bash/ similar
- ssh, tmux/ screen/ similar
- vim/ emacs/ similar
- Computer networking
- Have a reasonable working knowledge of:
- Cloud infrastructure, Preferably GCP
- One or more programming/ scripting languages
- Git
- Docker
- Web services and web servers
- Databases, relational and NoSQL
- Some familiarity with:
- Puppet, ansible
- Terraform
- GitHub, CircleCI , Kubernetes
- Scripting language- Shell
- Databases: Cassandra, Postgres, MySQL or CloudSQL
- Agile working practices including scrum and Kanban
- Private & public cloud hosting environments
- Strong technology interests with a positive ‘can do’ attitude
- Be flexible and adaptable to changing priorities
- Be good at planning and organising their own time and able to meet targets and deadlines without supervision
- Excellent written and verbal communication skills.
- Approachable with both colleagues and team members
- Be resourceful and practical with an ability to respond positively and quickly to technical and business challenges
- Be persuasive, articulate and influential, but down to earth and friendly with own team and colleagues
- Have an ability to establish relationships quickly and to work effectively either as part of a team or singularly
- Be customer focused with both internal and external customers
- Be capable of remaining calm under pressure
- Technically minded with good problem resolution skills and systematic manner
- Excellent documentation skills
- Prepared to participate in out of hours support rota
Radical is a platform connecting data, medicine and people -- through machine learning, and usable, performant products. Software has never been the strong suit of the medical industry -- and we are changing that. We believe that the same sophistication and performance that powers our daily needs through millions of consumer applications -- be it your grocery, your food delivery or your movie tickets -- when applied to healthcare, has a massive potential to transform the industry, and positively impact lives of patients and doctors. Radical works with some of the largest hospitals and public health programmes in India, and has a growing footprint both inside the country and abroad.
As a DevOps Engineer at Radical, you will:
Work closely with all stakeholders in the healthcare ecosystem - patients, doctors, paramedics and administrators - to conceptualise and bring to life the ideal set of products that add value to their time
Work alongside Software Developers and ML Engineers to solve problems and assist in architecture design
Work on systems which have an extraordinary emphasis on capturing data that can help build better workflows, algorithms and tools
Work on high performance systems that deal with several million transactions, multi-modal data and large datasets, with a close attention to detail
We’re looking for someone who has:
Familiarity and experience with writing working, well-documented and well-tested scripts, Dockerfiles, Puppet/Ansible/Chef/Terraform scripts.
Proficiency with scripting languages like Python and Bash.
Knowledge of systems deployment and maintainence, including setting up CI/CD and working alongside Software Developers, monitoring logs, dashboards, etc.
Experience integrating with a wide variety of external tools and services
Experience navigating AWS and leveraging appropriate services and technologies rather than DIY solutions (such as hosting an application directly on EC2 vs containerisation, or an Elastic Beanstalk)
It’s not essential, but great if you have:
An established track record of deploying and maintaining systems.
Experience with microservices and decomposition of monolithic architectures
Proficiency in automated tests.
Proficiency with the linux ecosystem
Experience in deploying systems to production on cloud platforms such as AWS
The position is open now, and we are onboarding immediately.
Please write to us with an updated resume, and one thing you would like us to see as part of your application. This one thing can be anything that you think makes you stand apart among candidates.
Radical is based out of Delhi NCR, India, and we look forward to working with you!
We're looking for people who may not know all the answers, but are obsessive about finding them, and take pride in the code that they write. We are more interested in the ability to learn fast, think rigorously and for people who aren’t afraid to challenge assumptions, and take large bets -- only to work hard and prove themselves correct. You're encouraged to apply even if your experience doesn't precisely match the job description. Join us.





