11+ Root cause analysis Jobs in Delhi, NCR and Gurgaon | Root cause analysis Job openings in Delhi, NCR and Gurgaon
Apply to 11+ Root cause analysis Jobs in Delhi, NCR and Gurgaon on CutShort.io. Explore the latest Root cause analysis Job opportunities across top companies like Google, Amazon & Adobe.
Required Skills: Advanced AWS Infrastructure Expertise, CI/CD Pipeline Automation, Monitoring, Observability & Incident Management, Security, Networking & Risk Management, Infrastructure as Code & Scripting
Criteria:
- 5+ years of DevOps/SRE experience in cloud-native, product-based companies (B2C scale preferred)
- Strong hands-on AWS expertise across core and advanced services (EC2, ECS/EKS, Lambda, S3, CloudFront, RDS, VPC, IAM, ELB/ALB, Route53)
- Proven experience designing high-availability, fault-tolerant cloud architectures for large-scale traffic
- Strong experience building & maintaining CI/CD pipelines (Jenkins mandatory; GitHub Actions/GitLab CI a plus)
- Prior experience running production-grade microservices deployments and automated rollout strategies (Blue/Green, Canary)
- Hands-on experience with monitoring & observability tools (Grafana, Prometheus, ELK, CloudWatch, New Relic, etc.)
- Solid hands-on experience with MongoDB in production, including performance tuning, indexing & replication
- Strong scripting skills (Bash, Shell, Python) for automation
- Hands-on experience with IaC (Terraform, CloudFormation, or Ansible)
- Deep understanding of networking fundamentals (VPC, subnets, routing, NAT, security groups)
- Strong experience in incident management, root cause analysis & production firefighting
Description
Role Overview
Company is seeking an experienced Senior DevOps Engineer to design, build, and optimize cloud infrastructure on AWS, automate CI/CD pipelines, implement monitoring and security frameworks, and proactively identify scalability challenges. This role requires someone who has hands-on experience running infrastructure at B2C product scale, ideally in media/OTT or high-traffic applications.
Key Responsibilities
1. Cloud Infrastructure — AWS (Primary Focus)
- Architect, deploy, and manage scalable infrastructure using AWS services such as EC2, ECS/EKS, Lambda, S3, CloudFront, RDS, ELB/ALB, VPC, IAM, Route53, etc.
- Optimize cloud cost, resource utilization, and performance across environments.
- Design high-availability, fault-tolerant systems for streaming workloads.
2. CI/CD Automation
- Build and maintain CI/CD pipelines using Jenkins, GitHub Actions, or GitLab CI.
- Automate deployments for microservices, mobile apps, and backend APIs.
- Implement blue/green and canary deployments for seamless production rollouts.
3. Observability & Monitoring
- Implement logging, metrics, and alerting using tools like Grafana, Prometheus, ELK, CloudWatch, New Relic, etc.
- Perform proactive performance analysis to minimize downtime and bottlenecks.
- Set up dashboards for real-time visibility into system health and user traffic spikes.
4. Security, Compliance & Risk Highlighting
• Conduct frequent risk assessments and identify vulnerabilities in:
o Cloud architecture
o Access policies (IAM)
o Secrets & key management
o Data flows & network exposure
• Implement security best practices including VPC isolation, WAF rules, firewall policies, and SSL/TLS management.
5. Scalability & Reliability Engineering
- Analyze traffic patterns for OTT-specific load variations (weekends, new releases, peak hours).
- Identify scalability gaps and propose solutions across:
- o Microservices
- o Caching layers
- o CDN distribution (CloudFront)
- o Database workloads
- Perform capacity planning and load testing to ensure readiness for 10x traffic growth.
6. Database & Storage Support
- Administer and optimize MongoDB for high-read/low-latency use cases.
- Design backup, recovery, and data replication strategies.
- Work closely with backend teams to tune query performance and indexing.
7. Automation & Infrastructure as Code
- Implement IaC using Terraform, CloudFormation, or Ansible.
- Automate repetitive infrastructure tasks to ensure consistency across environments.
Required Skills & Experience
Technical Must-Haves
- 5+ years of DevOps/SRE experience in cloud-native, product-based companies.
- Strong hands-on experience with AWS (core and advanced services).
- Expertise in Jenkins CI/CD pipelines.
- Solid background working with MongoDB in production environments.
- Good understanding of networking: VPCs, subnets, security groups, NAT, routing.
- Strong scripting experience (Bash, Python, Shell).
- Experience handling risk identification, root cause analysis, and incident management.
Nice to Have
- Experience with OTT, video streaming, media, or any content-heavy product environments.
- Familiarity with containers (Docker), orchestration (Kubernetes/EKS), and service mesh.
- Understanding of CDN, caching, and streaming pipelines.
Personality & Mindset
- Strong sense of ownership and urgency—DevOps is mission critical at OTT scale.
- Proactive problem solver with ability to think about long-term scalability.
- Comfortable working with cross-functional engineering teams.
Why Join company?
• Build and operate infrastructure powering millions of monthly users.
• Opportunity to shape DevOps culture and cloud architecture from the ground up.
• High-impact role in a fast-scaling Indian OTT product.
Role - IT Cloud Engineer/ Devops
- Proficient in Linux.
- Hands on experience with AWS cloud or Google Cloud.
- Knowledge of container technology like Docker.
- Expertise in scripting languages. (Shell scripting or Python scripting)
- Working knowledge of LAMP/LEMP stack, networking and version control system like Gitlab or Github.
Job Description:
The incumbent would be responsible for:
- Deployment of various infrastructures on Cloud platforms like AWS, GCP, Azure, OVH etc.
- Server monitoring, analysis and troubleshooting.
- Deploying multi-tier architectures using microservices.
- Integration of Container technologies like Docker, Kubernetes etc as per application requirement.
- Automating workflow with python or shell scripting.
- CI and CD integration for application lifecycle management.
- Hosting and managing websites on Linux machines.
- Frontend, backend and database optimization.
- Protecting operations by keeping information confidential.
- Providing information by collecting, analyzing, summarizing development & service issues.
- Prepares & installs solutions by determining and designing system specifications, standards & programming.
Key Responsibilities
- Design, implement, and maintain CI/CD pipelines for backend, frontend, and mobile applications.
- Manage cloud infrastructure using AWS (EC2, Lambda, S3, VPC, RDS, CloudWatch, ECS/EKS).
- Configure and maintain Docker containers and/or Kubernetes clusters.
- Implement and maintain Infrastructure as Code (IaC) using Terraform / CloudFormation.
- Automate build, deployment, and monitoring processes.
- Manage code repositories using Git/GitHub/GitLab, enforce branching strategies.
- Implement monitoring and alerting using tools like Prometheus, Grafana, CloudWatch, ELK, Splunk.
- Ensure system scalability, reliability, and security.
- Troubleshoot production issues and perform root-cause analysis.
- Collaborate with engineering teams to improve deployment and development workflows.
- Optimize infrastructure costs and improve performance.
Required Skills & Qualifications
- 3+ years of experience in DevOps, SRE, or Cloud Engineering.
- Strong hands-on knowledge of AWS cloud services.
- Experience with Docker, containers, and orchestrators (ECS, EKS, Kubernetes).
- Strong understanding of CI/CD tools: GitHub Actions, Jenkins, GitLab CI, or AWS CodePipeline.
- Experience with Linux administration and shell scripting.
- Strong understanding of Networking, VPC, DNS, Load Balancers, Security Groups.
- Experience with monitoring/logging tools: CloudWatch, ELK, Prometheus, Grafana.
- Experience with Terraform or CloudFormation (IaC).
- Good understanding of Node.js or similar application deployments.
- Knowledge of NGINX/Apache and load balancing concepts.
- Strong problem-solving and communication skills.
Preferred/Good to Have
- Experience with Kubernetes (EKS).
- Experience with Serverless architectures (Lambda).
- Experience with Redis, MongoDB, RDS.
- Certification in AWS Solutions Architect / DevOps Engineer.
- Experience with security best practices, IAM policies, and DevSecOps.
- Understanding of cost optimization and cloud cost management.
About the Job
This is a full-time role for a Lead DevOps Engineer at Spark Eighteen. We are seeking an experienced DevOps professional to lead our infrastructure strategy, design resilient systems, and drive continuous improvement in our deployment processes. In this role, you will architect scalable solutions, mentor junior engineers, and ensure the highest standards of reliability and security across our cloud infrastructure. The job location is flexible with preference for the Delhi NCR region.
Responsibilities
- Lead and mentor the DevOps/SRE team
- Define and drive DevOps strategy and roadmaps
- Oversee infrastructure automation and CI/CD at scale
- Collaborate with architects, developers, and QA teams to integrate DevOps practices
- Ensure security, compliance, and high availability of platforms
- Own incident response, postmortems, and root cause analysis
- Budgeting, team hiring, and performance evaluation
Requirements
Technical Skills
- Bachelor's or Master's degree in Computer Science, Engineering, or related field.
- 7+ years of professional DevOps experience with demonstrated progression.
- Strong architecture and leadership background
- Deep hands-on knowledge of infrastructure as code, CI/CD, and cloud
- Proven experience with monitoring, security, and governance
- Effective stakeholder and project management
- Experience with tools like Jenkins, ArgoCD, Terraform, Vault, ELK, etc.
- Strong understanding of business continuity and disaster recovery
Soft Skills
- Cross-functional communication excellence with ability to lead technical discussions.
- Strong mentorship capabilities for junior and mid-level team members.
- Advanced strategic thinking and ability to propose innovative solutions.
- Excellent knowledge transfer skills through documentation and training.
- Ability to understand and align technical solutions with broader business strategy.
- Proactive problem-solving approach with focus on continuous improvement.
- Strong leadership skills in guiding team performance and technical direction.
- Effective collaboration across development, QA, and business teams.
- Ability to make complex technical decisions with minimal supervision.
- Strategic approach to risk management and mitigation.
What We Offer
- Professional Growth: Continuous learning opportunities through diverse projects and mentorship from experienced leaders
- Global Exposure: Work with clients from 20+ countries, gaining insights into different markets and business cultures
- Impactful Work: Contribute to projects that make a real difference, with solutions generating over $1B in revenue
- Work-Life Balance: Flexible arrangements that respect personal wellbeing while fostering productivity
- Career Advancement: Clear progression pathways as you develop skills within our growing organization
- Competitive Compensation: Attractive salary packages that recognize your contributions and expertise
Our Culture
At Spark Eighteen, our culture centers on innovation, excellence, and growth. We believe in:
- Quality-First: Delivering excellence rather than just quick solutions
- True Partnership: Building relationships based on trust and mutual respect
- Communication: Prioritizing clear, effective communication across teams
- Innovation: Encouraging curiosity and creative approaches to problem-solving
- Continuous Learning: Supporting professional development at all levels
- Collaboration: Combining diverse perspectives to achieve shared goals
- Impact: Measuring success by the value we create for clients and users
Apply Here - https://tinyurl.com/t6x23p9b
We are seeking a skilled and proactive Kubernetes Administrator with strong hands-on experience in managing Red Hat OpenShift environments. The ideal candidate will have a solid background in Kubernetes administration, ArgoCD, and Jenkins.
This role demands a self-motivated, quick learner who can confidently manage OpenShift-based infrastructure in production environments, communicate effectively with stakeholders, and escalate issues promptly when needed.
Key Skills & Qualifications
- Strong experience with Red Hat OpenShift and Kubernetes administration (OpenShift or Kubernetes certification a plus).
- Proven expertise in managing containerized workloads on OpenShift platforms.
- Experience with ArgoCD, GitLab CI/CD, and Helm for deployment automation.
- Ability to troubleshoot issues in high-pressure production environments.
- Strong communication and customer-facing skills.
- Quick learner with a positive attitude toward problem-solving.
Required Skills:
- Experience in systems administration, SRE or DevOps focused role
- Experience in handling production support (on-call)
- Good understanding of the Linux operating system and networking concepts.
- Demonstrated competency with the following AWS services: ECS, EC2, EBS, EKS, S3, RDS, ELB, IAM, Lambda.
- Experience with Docker containers and containerization concepts
- Experience with managing and scaling Kubernetes clusters in a production environment
- Experience building scalable infrastructure in AWS with Terraform.
- Strong knowledge of Protocol-level such as HTTP/HTTPS, SMTP, DNS, and LDAP
- Experience monitoring production systems
- Expertise in leveraging Automation / DevOps principles, experience with operational tools, and able to apply best practices for infrastructure and software deployment (Ansible).
- HAProxy, Nginx, SSH, MySQL configuration and operation experience
- Ability to work seamlessly with software developers, QA, project managers, and business development
- Ability to produce and maintain written documentation
You need to drive automation for implementing scalable and robust applications. You would indulge your dedication and passion to build server-side optimization ensuring low-latency and high-end performance for the cloud deployed within datacentre. You should have sound knowledge of Open stack and Kubernetes domain.
YOUR ‘OKR’ SUMMARY
OKR means Objective and Key Results.
As a DevOps Engineer, you will understand the overall movement of data in the entire platform, find bottlenecks,define solutions, develop key pieces, write APIs and own deployment of those. You will work with internal and external development teams to discover these opportunities and to solve hard problems. You will also guide engineers in solving the complex problems, developing your acceptance tests for those and reviewing the work and
the test results.
What you will do
- As a DevOps Engineer responsible for systems being used by customer across the globe.
- Set the goals for overall system and divide into goals for the sub-system.
- Guide/motivate/convince/mentor the architects on sub-systems and help them achieving improvements with agility and speed.
- Identify the performance bottleneck and come up with the solution to optimize time and cost taken by build/test system.
- Be a thought leader to contribute to the capacity planning for software/hardware, spanning internal and public cloud, solving the trade-off between turnaround time and utilization.
- Bring in technologies enabling massively parallel systems to improve turnaround time by an order of magnitude.
What you will need
A strong sense of ownership, urgency, and drive. As an integral part of the development team, you will need the following skills to succeed.
- BS or BE/B.Tech or equivalent experience in EE/CS with 10+ years of experience.
- Strong background of Architecting and shipping distributed scalable software product with good understanding of system programming.
- Excellent background of Cloud technologies like: OpenStack, Docker, Kubernetes, Ansible, Ceph is must.
- Excellent understanding of hybrid, multi-cloud architecture and edge computing concepts.
- Ability to identify the bottleneck and come up with solution to optimize it.
- Programming and software development skills in Python, Shell-script along with good understanding of distributed systems and REST APIs.
- Experience in working with SQL/NoSQL database systems such as MySQL, MongoDB or Elasticsearch.
- Excellent knowledge and working experience with Docker containers and Virtual Machines.
- Ability to effectively work across organizational boundaries to maximize alignment and productivity between teams.
- Ability and flexibility to work and communicate effectively in a multi-national, multi-time-zone corporate.
Additional Advantage:
- Deep understanding of technology and passionate about what you do.
- Background in designing high performant scalable software systems with strong focus to optimizehardware cost.
- Solid collaborative and interpersonal skills, specifically a proven ability to effectively guide andinfluence within a dynamic environment.
- Strong commitment to get the most performance out of a system being worked on.
- Prior development of a large software project using service-oriented architecture operating with real time constraints.
What's In It for You?
- You will get a chance to work on cloud-native and hyper-scale products
- You will be working with industry leaders in cloud.
- You can expect a steep learning curve.
- You will get the experience of solving real time problems, eventually you become a problem solver.
Benefits & Perks:
- Competitive Salary
- Health Insurance
- Open Learning - 100% Reimbursement for online technical courses.
- Fast Growth - opportunities to grow quickly and surely
- Creative Freedom + Flat hierarchy
- Sponsorship to all those employees who represent company in events and meet ups.
- Flexible working hours
- 5 days week
- Hybrid Working model (Office and WFH)
Our Hiring Process:
Candidates for this position can expect the hiring process as follows (subject to successful clearing of every round)
- Initial Resume screening call with our Recruiting team
- Next, candidates will be invited to solve coding exercises.
- Next, candidates will be invited for first technical interview
- Next, candidates will be invited for final technical interview
- Finally, candidates will be invited for Culture Plus interview with HR
- Candidates may be asked to interview with the Leadership team
- Successful candidates will subsequently be made an offer via email
As always, the interviews and screening call will be conducted via a mix of telephonic and video call.
So, if you are looking at an opportunity to really make a difference- make it with us…
Coredge.io provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by applicable central, state or local laws.
Our Client is an IT infrastructure services company, focused and specialized in delivering solutions and services on Microsoft products and technologies. They are a Microsoft partner and cloud solution provider. Our Client's objective is to help small, mid-sized as well as global enterprises to transform their business by using innovation in IT, adapting to the latest technologies and using IT as an enabler for business to meet business goals and continuous growth.
With focused and experienced management and a strong team of IT Infrastructure professionals, they are adding value by making IT Infrastructure a robust, agile, secure and cost-effective service to the business. As an independent IT Infrastructure company, they provide their clients with unbiased advice on how to successfully implement and manage technology to complement their business requirements.
- Providing on-call support within a high availability production environment
- Logging issues
- Providing Complex problem analysis and resolution for technical and application issues
- Supporting and collaborating with team members
- Running system updates
- Monitoring and responding to system alerts
- Developing and running system health checks
- Applying industry standard practices across the technology estate
- Performing system reviews
- Reviewing and maintaining infrastructure configuration
- Diagnosing performance issues and network bottlenecks
- Collaborating within geographically distributed teams
- Supporting software development infrastructure by continuous integration and delivery standards
- Working closely with developers and QA teams as part of a customer support centre
- Projecting delivery work, either individually or in conjunction with other teams, external suppliers or contractors
- Ensuring maintenance of the technical environments to meet current standards
- Ensuring compliance with appropriate industry and security regulations
- Providing support to Development and Customer Support teams
- Managing the hosted infrastructure through vendor engagement
- Managing 3rd party software licensing ensuring compliance
- Delivering new technologies as agreed by the business
What you need to have:
- Experience working within a technical operations environment relevant to associated skills stated.
- Be proficient in:
- Linux, zsh/ bash/ similar
- ssh, tmux/ screen/ similar
- vim/ emacs/ similar
- Computer networking
- Have a reasonable working knowledge of:
- Cloud infrastructure, Preferably GCP
- One or more programming/ scripting languages
- Git
- Docker
- Web services and web servers
- Databases, relational and NoSQL
- Some familiarity with:
- Puppet, ansible
- Terraform
- GitHub, CircleCI , Kubernetes
- Scripting language- Shell
- Databases: Cassandra, Postgres, MySQL or CloudSQL
- Agile working practices including scrum and Kanban
- Private & public cloud hosting environments
- Strong technology interests with a positive ‘can do’ attitude
- Be flexible and adaptable to changing priorities
- Be good at planning and organising their own time and able to meet targets and deadlines without supervision
- Excellent written and verbal communication skills.
- Approachable with both colleagues and team members
- Be resourceful and practical with an ability to respond positively and quickly to technical and business challenges
- Be persuasive, articulate and influential, but down to earth and friendly with own team and colleagues
- Have an ability to establish relationships quickly and to work effectively either as part of a team or singularly
- Be customer focused with both internal and external customers
- Be capable of remaining calm under pressure
- Technically minded with good problem resolution skills and systematic manner
- Excellent documentation skills
- Prepared to participate in out of hours support rota
Position Summary
DevOps is a Department of Horizontal Digital, within which we have 3 different practices.
- Cloud Engineering
- Build and Release
- Managed Services
This opportunity is for Cloud Engineering role who also have some experience with Infrastructure migrations, this will be a complete hands-on job, with focus on migrating clients workloads to the cloud, reporting to the Solution Architect/Team Lead and along with that you are also expected to work on different projects for building out the Sitecore Infrastructure from scratch.
We are Sitecore Platinum Partner and majority of the Infrastructure work that we are doing is for Sitecore.
Sitecore is a .Net Based Enterprise level Web CMS, which can be deployed on On-Prem, IaaS, PaaS and Containers.
So, most of our DevOps work is currently planning, architecting and deploying infrastructure for Sitecore.
Key Responsibilities:
- This role includes ownership of technical, commercial and service elements related to cloud migration and Infrastructure deployments.
- Person who will be selected for this position will ensure high customer satisfaction delivering Infra and migration projects.
- Candidate must expect to work in parallel across multiple projects, along with that candidate must also have a fully flexible approach to working hours.
- Candidate should keep him/herself updated with the rapid technological advancements and developments that are taking place in the industry.
- Along with that candidate should also have a know-how on Infrastructure as a code, Kubernetes, AKS/EKS, Terraform, Azure DevOps, CI/CD Pipelines.
Requirements:
- Bachelor’s degree in computer science or equivalent qualification.
- Total work experience of 6 to 8 Years.
- Total migration experience of 4 to 6 Years.
- Multiple Cloud Background (Azure/AWS/GCP)
- Implementation knowledge of VMs, Vnet,
- Know-how of Cloud Readiness and Assessment
- Good Understanding of 6 R's of Migration.
- Detailed understanding of the cloud offerings
- Ability to Assess and perform discovery independently for any cloud migration.
- Working Exp. on Containers and Kubernetes.
- Good Knowledge of Azure Site Recovery/Azure Migrate/Cloud Endure
- Understanding on vSphere and Hyper-V Virtualization.
- Working experience with Active Directory.
- Working experience with AWS Cloud formation/Terraform templates.
- Working Experience of VPN/Express route/peering/Network Security Groups/Route Table/NAT Gateway, etc.
- Experience of working with CI/CD tools like Octopus, Teamcity, Code Build, Code Deploy, Azure DevOps, GitHub action.
- High Availability and Disaster Recovery Implementations, taking into the consideration of RTO and RPO aspects.
- Candidates with AWS/Azure/GCP Certifications will be preferred.
What you do :
- Developing automation for the various deployments core to our business
- Documenting run books for various processes / improving knowledge bases
- Identifying technical issues, communicating and recommending solutions
- Miscellaneous support (user account, VPN, network, etc)
- Develop continuous integration / deployment strategies
- Production systems deployment/monitoring/optimization
-
Management of staging/development environments
What you know :
- Ability to work with a wide variety of open source technologies and tools
- Ability to code/script (Python, Ruby, Bash)
- Experience with systems and IT operations
- Comfortable with frequent incremental code testing and deployment
- Strong grasp of automation tools (Chef, Packer, Ansible, or others)
- Experience with cloud infrastructure and bare-metal systems
- Experience optimizing infrastructure for high availability and low latencies
- Experience with instrumenting systems for monitoring and reporting purposes
- Well versed in software configuration management systems (git, others)
- Experience with cloud providers (AWS or other) and tailoring apps for cloud deployment
-
Data management skills
Education :
- Degree in Computer Engineering or Computer Science
- 1-3 years of equivalent experience in DevOps roles.
- Work conducted is focused on business outcomes
- Can work in an environment with a high level of autonomy (at the individual and team level)
-
Comfortable working in an open, collaborative environment, reaching across functional.
Our Offering :
- True start-up experience - no bureaucracy and a ton of tough decisions that have a real impact on the business from day one.
-
The camaraderie of an amazingly talented team that is working tirelessly to build a great OS for India and surrounding markets.
Perks :
- Awesome benefits, social gatherings, etc.
- Work with intelligent, fun and interesting people in a dynamic start-up environment.







