11+ Root cause analysis Jobs in Delhi, NCR and Gurgaon | Root cause analysis Job openings in Delhi, NCR and Gurgaon
Apply to 11+ Root cause analysis Jobs in Delhi, NCR and Gurgaon on CutShort.io. Explore the latest Root cause analysis Job opportunities across top companies like Google, Amazon & Adobe.
Required Skills: Advanced AWS Infrastructure Expertise, CI/CD Pipeline Automation, Monitoring, Observability & Incident Management, Security, Networking & Risk Management, Infrastructure as Code & Scripting
Criteria:
- 5+ years of DevOps/SRE experience in cloud-native, product-based companies (B2C scale preferred)
- Strong hands-on AWS expertise across core and advanced services (EC2, ECS/EKS, Lambda, S3, CloudFront, RDS, VPC, IAM, ELB/ALB, Route53)
- Proven experience designing high-availability, fault-tolerant cloud architectures for large-scale traffic
- Strong experience building & maintaining CI/CD pipelines (Jenkins mandatory; GitHub Actions/GitLab CI a plus)
- Prior experience running production-grade microservices deployments and automated rollout strategies (Blue/Green, Canary)
- Hands-on experience with monitoring & observability tools (Grafana, Prometheus, ELK, CloudWatch, New Relic, etc.)
- Solid hands-on experience with MongoDB in production, including performance tuning, indexing & replication
- Strong scripting skills (Bash, Shell, Python) for automation
- Hands-on experience with IaC (Terraform, CloudFormation, or Ansible)
- Deep understanding of networking fundamentals (VPC, subnets, routing, NAT, security groups)
- Strong experience in incident management, root cause analysis & production firefighting
Description
Role Overview
Company is seeking an experienced Senior DevOps Engineer to design, build, and optimize cloud infrastructure on AWS, automate CI/CD pipelines, implement monitoring and security frameworks, and proactively identify scalability challenges. This role requires someone who has hands-on experience running infrastructure at B2C product scale, ideally in media/OTT or high-traffic applications.
Key Responsibilities
1. Cloud Infrastructure — AWS (Primary Focus)
- Architect, deploy, and manage scalable infrastructure using AWS services such as EC2, ECS/EKS, Lambda, S3, CloudFront, RDS, ELB/ALB, VPC, IAM, Route53, etc.
- Optimize cloud cost, resource utilization, and performance across environments.
- Design high-availability, fault-tolerant systems for streaming workloads.
2. CI/CD Automation
- Build and maintain CI/CD pipelines using Jenkins, GitHub Actions, or GitLab CI.
- Automate deployments for microservices, mobile apps, and backend APIs.
- Implement blue/green and canary deployments for seamless production rollouts.
3. Observability & Monitoring
- Implement logging, metrics, and alerting using tools like Grafana, Prometheus, ELK, CloudWatch, New Relic, etc.
- Perform proactive performance analysis to minimize downtime and bottlenecks.
- Set up dashboards for real-time visibility into system health and user traffic spikes.
4. Security, Compliance & Risk Highlighting
• Conduct frequent risk assessments and identify vulnerabilities in:
o Cloud architecture
o Access policies (IAM)
o Secrets & key management
o Data flows & network exposure
• Implement security best practices including VPC isolation, WAF rules, firewall policies, and SSL/TLS management.
5. Scalability & Reliability Engineering
- Analyze traffic patterns for OTT-specific load variations (weekends, new releases, peak hours).
- Identify scalability gaps and propose solutions across:
- o Microservices
- o Caching layers
- o CDN distribution (CloudFront)
- o Database workloads
- Perform capacity planning and load testing to ensure readiness for 10x traffic growth.
6. Database & Storage Support
- Administer and optimize MongoDB for high-read/low-latency use cases.
- Design backup, recovery, and data replication strategies.
- Work closely with backend teams to tune query performance and indexing.
7. Automation & Infrastructure as Code
- Implement IaC using Terraform, CloudFormation, or Ansible.
- Automate repetitive infrastructure tasks to ensure consistency across environments.
Required Skills & Experience
Technical Must-Haves
- 5+ years of DevOps/SRE experience in cloud-native, product-based companies.
- Strong hands-on experience with AWS (core and advanced services).
- Expertise in Jenkins CI/CD pipelines.
- Solid background working with MongoDB in production environments.
- Good understanding of networking: VPCs, subnets, security groups, NAT, routing.
- Strong scripting experience (Bash, Python, Shell).
- Experience handling risk identification, root cause analysis, and incident management.
Nice to Have
- Experience with OTT, video streaming, media, or any content-heavy product environments.
- Familiarity with containers (Docker), orchestration (Kubernetes/EKS), and service mesh.
- Understanding of CDN, caching, and streaming pipelines.
Personality & Mindset
- Strong sense of ownership and urgency—DevOps is mission critical at OTT scale.
- Proactive problem solver with ability to think about long-term scalability.
- Comfortable working with cross-functional engineering teams.
Why Join company?
• Build and operate infrastructure powering millions of monthly users.
• Opportunity to shape DevOps culture and cloud architecture from the ground up.
• High-impact role in a fast-scaling Indian OTT product.
ROLE & RESPONSIBILITIES:
We are hiring a Senior DevSecOps / Security Engineer with 8+ years of experience securing AWS cloud, on-prem infrastructure, DevOps platforms, MLOps environments, CI/CD pipelines, container orchestration, and data/ML platforms. This role is responsible for creating and maintaining a unified security posture across all systems used by DevOps and MLOps teams — including AWS, Kubernetes, EMR, MWAA, Spark, Docker, GitOps, observability tools, and network infrastructure.
KEY RESPONSIBILITIES:
1. Cloud Security (AWS)-
- Secure all AWS resources consumed by DevOps/MLOps/Data Science: EC2, EKS, ECS, EMR, MWAA, S3, RDS, Redshift, Lambda, CloudFront, Glue, Athena, Kinesis, Transit Gateway, VPC Peering.
- Implement IAM least privilege, SCPs, KMS, Secrets Manager, SSO & identity governance.
- Configure AWS-native security: WAF, Shield, GuardDuty, Inspector, Macie, CloudTrail, Config, Security Hub.
- Harden VPC architecture, subnets, routing, SG/NACLs, multi-account environments.
- Ensure encryption of data at rest/in transit across all cloud services.
2. DevOps Security (IaC, CI/CD, Kubernetes, Linux)-
Infrastructure as Code & Automation Security:
- Secure Terraform, CloudFormation, Ansible with policy-as-code (OPA, Checkov, tfsec).
- Enforce misconfiguration scanning and automated remediation.
CI/CD Security:
- Secure Jenkins, GitHub, GitLab pipelines with SAST, DAST, SCA, secrets scanning, image scanning.
- Implement secure build, artifact signing, and deployment workflows.
Containers & Kubernetes:
- Harden Docker images, private registries, runtime policies.
- Enforce EKS security: RBAC, IRSA, PSP/PSS, network policies, runtime monitoring.
- Apply CIS Benchmarks for Kubernetes and Linux.
Monitoring & Reliability:
- Secure observability stack: Grafana, CloudWatch, logging, alerting, anomaly detection.
- Ensure audit logging across cloud/platform layers.
3. MLOps Security (Airflow, EMR, Spark, Data Platforms, ML Pipelines)-
Pipeline & Workflow Security:
- Secure Airflow/MWAA connections, secrets, DAGs, execution environments.
- Harden EMR, Spark jobs, Glue jobs, IAM roles, S3 buckets, encryption, and access policies.
ML Platform Security:
- Secure Jupyter/JupyterHub environments, containerized ML workspaces, and experiment tracking systems.
- Control model access, artifact protection, model registry security, and ML metadata integrity.
Data Security:
- Secure ETL/ML data flows across S3, Redshift, RDS, Glue, Kinesis.
- Enforce data versioning security, lineage tracking, PII protection, and access governance.
ML Observability:
- Implement drift detection (data drift/model drift), feature monitoring, audit logging.
- Integrate ML monitoring with Grafana/Prometheus/CloudWatch.
4. Network & Endpoint Security-
- Manage firewall policies, VPN, IDS/IPS, endpoint protection, secure LAN/WAN, Zero Trust principles.
- Conduct vulnerability assessments, penetration test coordination, and network segmentation.
- Secure remote workforce connectivity and internal office networks.
5. Threat Detection, Incident Response & Compliance-
- Centralize log management (CloudWatch, OpenSearch/ELK, SIEM).
- Build security alerts, automated threat detection, and incident workflows.
- Lead incident containment, forensics, RCA, and remediation.
- Ensure compliance with ISO 27001, SOC 2, GDPR, HIPAA (as applicable).
- Maintain security policies, procedures, RRPs (Runbooks), and audits.
IDEAL CANDIDATE:
- 8+ years in DevSecOps, Cloud Security, Platform Security, or equivalent.
- Proven ability securing AWS cloud ecosystems (IAM, EKS, EMR, MWAA, VPC, WAF, GuardDuty, KMS, Inspector, Macie).
- Strong hands-on experience with Docker, Kubernetes (EKS), CI/CD tools, and Infrastructure-as-Code.
- Experience securing ML platforms, data pipelines, and MLOps systems (Airflow/MWAA, Spark/EMR).
- Strong Linux security (CIS hardening, auditing, intrusion detection).
- Proficiency in Python, Bash, and automation/scripting.
- Excellent knowledge of SIEM, observability, threat detection, monitoring systems.
- Understanding of microservices, API security, serverless security.
- Strong understanding of vulnerability management, penetration testing practices, and remediation plans.
EDUCATION:
- Master’s degree in Cybersecurity, Computer Science, Information Technology, or related field.
- Relevant certifications (AWS Security Specialty, CISSP, CEH, CKA/CKS) are a plus.
PERKS, BENEFITS AND WORK CULTURE:
- Competitive Salary Package
- Generous Leave Policy
- Flexible Working Hours
- Performance-Based Bonuses
- Health Care Benefits
We are seeking a skilled and proactive Kubernetes Administrator with strong hands-on experience in managing Red Hat OpenShift environments. The ideal candidate will have a solid background in Kubernetes administration, ArgoCD, and Jenkins.
This role demands a self-motivated, quick learner who can confidently manage OpenShift-based infrastructure in production environments, communicate effectively with stakeholders, and escalate issues promptly when needed.
Key Skills & Qualifications
- Strong experience with Red Hat OpenShift and Kubernetes administration (OpenShift or Kubernetes certification a plus).
- Proven expertise in managing containerized workloads on OpenShift platforms.
- Experience with ArgoCD, GitLab CI/CD, and Helm for deployment automation.
- Ability to troubleshoot issues in high-pressure production environments.
- Strong communication and customer-facing skills.
- Quick learner with a positive attitude toward problem-solving.
Role Overview:
As a DevOps Engineer (L2), you will play a key role in designing, implementing, and optimizing infrastructure. You will take ownership of automating processes, improving system reliability, and supporting the development lifecycle.
Key Responsibilities:
- Design and manage scalable, secure, and highly available cloud infrastructure.
- Lead efforts in implementing and optimizing CI/CD pipelines.
- Automate repetitive tasks and develop robust monitoring solutions.
- Ensure the security and compliance of systems, including IAM, VPCs, and network configurations.
- Troubleshoot complex issues across development, staging, and production environments.
- Mentor and guide L1 engineers on best practices.
- Stay updated on emerging DevOps tools and technologies.
- Manage cloud resources efficiently using Infrastructure as Code (IaC) tools like Terraform and AWS CloudFormation.
Qualifications:
- Bachelor’s degree in Computer Science, IT, or a related field.
- Proven experience with CI/CD pipelines and tools like Jenkins, GitLab, or Azure DevOps.
- Advanced knowledge of cloud platforms (AWS, Azure, or GCP) with hands-on experience in deployments, migrations, and optimizations.
- Strong expertise in containerization (Docker) and orchestration tools (Kubernetes).
- Proficiency in scripting languages like Python, Bash, or PowerShell.
- Deep understanding of system security, networking, and load balancing.
- Strong analytical skills and problem-solving mindset.
- Certifications (e.g., AWS Certified Solutions Architect, Kubernetes Administrator) are a plus.
What We Offer:
- Opportunity to work with a cutting-edge tech stack in a product-first company.
- Collaborative and growth-oriented environment.
- Competitive salary and benefits.
- Freedom to innovate and contribute to impactful projects.
Role : Senior Engineer Infrastructure
Key Responsibilities:
● Infrastructure Development and Management: Design, implement, and manage robust and scalable infrastructure solutions, ensuring optimal performance,security, and availability. Lead transition and migration projects, moving legacy systemsto cloud-based solutions.
● Develop and maintain applications and services using Golang.
● Automation and Optimization: Implement automation tools and frameworksto optimize operational processes. Monitorsystem performance, optimizing and modifying systems as necessary.
● Security and Compliance: Ensure infrastructure security by implementing industry best practices and compliance requirements. Respond to and mitigate security incidents and vulnerabilities.
Qualifications:
● Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent practical experience).
● Good understanding of prominent backend languageslike Golang, Python, Node.js, or others.
● In-depth knowledge of network architecture,system security, infrastructure scalability.
● Proficiency with development tools,server management, and database systems.
● Strong experience with cloud services(AWS.), deployment,scaling, and management.
● Knowledge of Azure is a plus
● Familiarity with containers and orchestration services,such as Docker, Kubernetes, etc.
● Strong problem-solving skills and analytical thinking.
● Excellent verbal and written communication skills.
● Ability to thrive in a collaborative team environment.
● Genuine passion for backend development and keen interest in scalable systems.
We are looking for an excellent experienced person in the Dev-Ops field. Be a part of a vibrant, rapidly growing tech enterprise with a great working environment. As a DevOps Engineer, you will be responsible for managing and building upon the infrastructure that supports our data intelligence platform. You'll also be involved in building tools and establishing processes to empower developers to
deploy and release their code seamlessly.
Responsibilities
The ideal DevOps Engineers possess a solid understanding of system internals and distributed systems.
Understanding accessibility and security compliance (Depending on the specific project)
User authentication and authorization between multiple systems,
servers, and environments
Integration of multiple data sources and databases into one system
Understanding fundamental design principles behind a scalable
application
Configuration management tools (Ansible/Chef/Puppet), Cloud
Service Providers (AWS/DigitalOcean), Docker+Kubernetes ecosystem is a plus.
Should be able to make key decisions for our infrastructure,
networking and security.
Manipulation of shell scripts during migration and DB connection.
Monitor Production Server Health of different parameters (CPU Load, Physical Memory, Swap Memory and Setup Monitoring tool to
Monitor Production Servers Health, Nagios
Created Alerts and configured monitoring of specified metrics to
manage their cloud infrastructure efficiently.
Setup/Managing VPC, Subnets; make connection between different zones; blocking suspicious ip/subnet via ACL.
Creating/Managing AMI/Snapshots/Volumes, Upgrade/downgrade
AWS resources (CPU, Memory, EBS)
The candidate would be Responsible for managing microservices at scale maintain the compute and storage infrastructure for various product teams.
Strong Knowledge about Configuration Management Tools like –
Ansible, Chef, Puppet
Extensively worked with Change tracking tools like JIRA and log
Analysis, Maintaining documents of production server error log's
reports.
Experienced in Troubleshooting, Backup, and Recovery
Excellent Knowledge of Cloud Service Providers like – AWS, Digital
Ocean
Good Knowledge about Docker, Kubernetes eco-system.
Proficient understanding of code versioning tools, such as Git
Must have experience working in an automated environment.
Good knowledge of Amazon Web Service Architects like – Amazon EC2, Amazon S3 (Amazon Glacier), Amazon VPC, Amazon Cloud Watch.
Scheduling jobs using crontab, Create SWAP Memory
Proficient Knowledge about Access Management (IAM)
Must have expertise in Maven, Jenkins, Chef, SVN, GitHub, Tomcat, Linux, etc.
Candidate Should have good knowledge about GCP.
EducationalQualifications
B-Tech-IT/M-Tech -/MBA- IT/ BCA /MCA or any degree in the relevant field
EXPERIENCE: 2-6 yr
You need to drive automation for implementing scalable and robust applications. You would indulge your dedication and passion to build server-side optimization ensuring low-latency and high-end performance for the cloud deployed within datacentre. You should have sound knowledge of Open stack and Kubernetes domain.
YOUR ‘OKR’ SUMMARY
OKR means Objective and Key Results.
As a DevOps Engineer, you will understand the overall movement of data in the entire platform, find bottlenecks,define solutions, develop key pieces, write APIs and own deployment of those. You will work with internal and external development teams to discover these opportunities and to solve hard problems. You will also guide engineers in solving the complex problems, developing your acceptance tests for those and reviewing the work and
the test results.
What you will do
- As a DevOps Engineer responsible for systems being used by customer across the globe.
- Set the goals for overall system and divide into goals for the sub-system.
- Guide/motivate/convince/mentor the architects on sub-systems and help them achieving improvements with agility and speed.
- Identify the performance bottleneck and come up with the solution to optimize time and cost taken by build/test system.
- Be a thought leader to contribute to the capacity planning for software/hardware, spanning internal and public cloud, solving the trade-off between turnaround time and utilization.
- Bring in technologies enabling massively parallel systems to improve turnaround time by an order of magnitude.
What you will need
A strong sense of ownership, urgency, and drive. As an integral part of the development team, you will need the following skills to succeed.
- BS or BE/B.Tech or equivalent experience in EE/CS with 10+ years of experience.
- Strong background of Architecting and shipping distributed scalable software product with good understanding of system programming.
- Excellent background of Cloud technologies like: OpenStack, Docker, Kubernetes, Ansible, Ceph is must.
- Excellent understanding of hybrid, multi-cloud architecture and edge computing concepts.
- Ability to identify the bottleneck and come up with solution to optimize it.
- Programming and software development skills in Python, Shell-script along with good understanding of distributed systems and REST APIs.
- Experience in working with SQL/NoSQL database systems such as MySQL, MongoDB or Elasticsearch.
- Excellent knowledge and working experience with Docker containers and Virtual Machines.
- Ability to effectively work across organizational boundaries to maximize alignment and productivity between teams.
- Ability and flexibility to work and communicate effectively in a multi-national, multi-time-zone corporate.
Additional Advantage:
- Deep understanding of technology and passionate about what you do.
- Background in designing high performant scalable software systems with strong focus to optimizehardware cost.
- Solid collaborative and interpersonal skills, specifically a proven ability to effectively guide andinfluence within a dynamic environment.
- Strong commitment to get the most performance out of a system being worked on.
- Prior development of a large software project using service-oriented architecture operating with real time constraints.
What's In It for You?
- You will get a chance to work on cloud-native and hyper-scale products
- You will be working with industry leaders in cloud.
- You can expect a steep learning curve.
- You will get the experience of solving real time problems, eventually you become a problem solver.
Benefits & Perks:
- Competitive Salary
- Health Insurance
- Open Learning - 100% Reimbursement for online technical courses.
- Fast Growth - opportunities to grow quickly and surely
- Creative Freedom + Flat hierarchy
- Sponsorship to all those employees who represent company in events and meet ups.
- Flexible working hours
- 5 days week
- Hybrid Working model (Office and WFH)
Our Hiring Process:
Candidates for this position can expect the hiring process as follows (subject to successful clearing of every round)
- Initial Resume screening call with our Recruiting team
- Next, candidates will be invited to solve coding exercises.
- Next, candidates will be invited for first technical interview
- Next, candidates will be invited for final technical interview
- Finally, candidates will be invited for Culture Plus interview with HR
- Candidates may be asked to interview with the Leadership team
- Successful candidates will subsequently be made an offer via email
As always, the interviews and screening call will be conducted via a mix of telephonic and video call.
So, if you are looking at an opportunity to really make a difference- make it with us…
Coredge.io provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by applicable central, state or local laws.
Exposure to development and implementation practices in a modern systems environment together with exposure to working in a project team particularly with reference to industry methodologies, e.g. Agile, continuous delivery, etc
- At least 3-5 years of experience building and maintaining AWS infrastructure (VPC, EC2, Security Groups, IAM, ECS, CodeDeploy, CloudFront, S3)
- Strong understanding of how to secure AWS environments and meet compliance requirements
- Experience using DevOps methodology and Infrastructure as Code
- Automation / CI/CD tools – Bitbucket Pipelines, Jenkins
- Infrastructure as code – Terraform, Cloudformation, etc
- Strong experience deploying and managing infrastructure with Terraform
- Automated provisioning and configuration management – Ansible, Chef, Puppet
- Experience with Docker, GitHub, Jenkins, ELK and deploying applications on AWS
- Improve CI/CD processes, support software builds and CI/CD of the development departments
- Develop, maintain, and optimize automated deployment code for development, test, staging and production environments





