2+ Reliability Engineering Jobs in Chennai

We are hiring a Site Reliability Engineer (SRE) to join our high-performance engineering team. In this role, you'll be responsible for driving reliability, performance, scalability, and security across cloud-native systems while bridging the gap between development and operations.
Key Responsibilities
- Design and implement scalable, resilient infrastructure on AWS
- Take ownership of the SRE function – availability, latency, performance, monitoring, incident response, and capacity planning
- Partner with product and engineering teams to improve system reliability, observability, and release velocity
- Set up, maintain, and enhance CI/CD pipelines using Jenkins, GitHub Actions, or AWS CodePipeline
- Conduct load and stress testing, identify performance bottlenecks, and implement optimization strategies
Required Skills & Qualifications
- Proven hands-on experience in cloud infrastructure design (AWS strongly preferred)
- Strong background in DevOps and SRE principles
- Proficiency with performance testing tools like JMeter, Gatling, k6, or Locust
- Deep understanding of cloud security and best practices for reliability engineering
- AWS Certified Solutions Architect – Associate or Professional (preferred)
- Solid problem-solving skills and a proactive approach to systems improvement
Why Join Us?
- Work with cutting-edge technologies in a cloud-native, fast-paced environment
- Collaborate with cross-functional teams driving meaningful impact
- Hybrid work culture with flexibility and autonomy
- Open, inclusive work environment focused on innovation and excellence

Job Title: Site Reliability Engineer (SRE)
Experience: 4+ Years
Work Location: Bangalore / Chennai / Pune / Gurgaon
Work Mode: Hybrid or Onsite (based on project need)
Domain Preference: Candidates with past experience working in shoe/footwear retail brands (e.g., Nike, Adidas, Puma) are highly preferred.
🛠️ Key Responsibilities
- Design, implement, and manage scalable, reliable, and secure infrastructure on AWS.
- Develop and maintain Python-based automation scripts for deployment, monitoring, and alerting.
- Monitor system performance, uptime, and overall health using tools like Prometheus, Grafana, or Datadog.
- Handle incident response and root cause analysis, and ensure proactive remediation of production issues.
- Define and implement Service Level Objectives (SLOs) and Error Budgets in alignment with business requirements.
- Build tools to improve system reliability, automate manual tasks, and enforce infrastructure consistency.
- Collaborate with development and DevOps teams to ensure robust CI/CD pipelines and safe deployments.
- Conduct chaos testing and participate in on-call rotations to maintain 24/7 application availability.
✅ Must-Have Skills
- 4+ years of experience in Site Reliability Engineering or DevOps with a focus on reliability, monitoring, and automation.
- Strong programming skills in Python (mandatory).
- Hands-on experience with AWS cloud services (EC2, S3, Lambda, ECS/EKS, CloudWatch, etc.).
- Expertise in monitoring and alerting tools such as Prometheus, Grafana, Datadog, and CloudWatch.
- Strong background in Linux-based systems and shell scripting.
- Experience implementing infrastructure as code using tools like Terraform or CloudFormation.
- Deep understanding of incident management, SLOs/SLIs, and postmortem practices.
- Prior work experience with footwear or retail brands such as Nike is highly preferred.