Reliability engineering Jobs in Chennai

Apply to 2+ Reliability engineering Jobs in Chennai on CutShort.io. Explore the latest Reliability engineering Job opportunities across top companies like Google, Amazon & Adobe.

SRE

at Deqode

1 recruiter

Posted by Shubham Das

Mumbai, Delhi, Gurugram, Noida, Ghaziabad, Faridabad, Chennai

6 - 9 yrs

₹15L - ₹17L / yr

Amazon Web Services (AWS)

Reliability engineering

We are hiring a Site Reliability Engineer (SRE) to join our high-performance engineering team. In this role, you'll be responsible for driving reliability, performance, scalability, and security across cloud-native systems while bridging the gap between development and operations.

Key Responsibilities

Design and implement scalable, resilient infrastructure on AWS
Take ownership of the SRE function – availability, latency, performance, monitoring, incident response, and capacity planning
Partner with product and engineering teams to improve system reliability, observability, and release velocity
Set up, maintain, and enhance CI/CD pipelines using Jenkins, GitHub Actions, or AWS CodePipeline
Conduct load and stress testing, identify performance bottlenecks, and implement optimization strategies

Required Skills & Qualifications

Proven hands-on experience in cloud infrastructure design (AWS strongly preferred)
Strong background in DevOps and SRE principles
Proficiency with performance testing tools like JMeter, Gatling, k6, or Locust
Deep understanding of cloud security and best practices for reliability engineering
AWS Certified Solutions Architect certification – Associate or Professional (preferred)
Solid problem-solving skills and a proactive approach to systems improvement

Why Join Us?

Work with cutting-edge technologies in a cloud-native, fast-paced environment
Collaborate with cross-functional teams driving meaningful impact
Hybrid work culture with flexibility and autonomy
Open, inclusive work environment focused on innovation and excellence

Site Reliability Engineer

at Deqode

1 recruiter

Posted by Roshni Maji

Pune, Bengaluru (Bangalore), Gurugram, Chennai

4 - 8 yrs

₹7L - ₹26L / yr

SRE

Reliability engineering

Amazon Web Services (AWS)

Python

Job Title: Site Reliability Engineer (SRE)

Experience: 4+ Years

Work Location: Bangalore / Chennai / Pune / Gurgaon

Work Mode: Hybrid or Onsite (based on project need)

Domain Preference: Candidates with past experience working in shoe/footwear retail brands (e.g., Nike, Adidas, Puma) are highly preferred.

🛠️ Key Responsibilities

Design, implement, and manage scalable, reliable, and secure infrastructure on AWS.
Develop and maintain Python-based automation scripts for deployment, monitoring, and alerting.
Monitor system performance, uptime, and overall health using tools like Prometheus, Grafana, or Datadog.
Handle incident response and root cause analysis, and ensure proactive remediation of production issues.
Define and implement Service Level Objectives (SLOs) and Error Budgets in alignment with business requirements.
Build tools to improve system reliability, automate manual tasks, and enforce infrastructure consistency.
Collaborate with development and DevOps teams to ensure robust CI/CD pipelines and safe deployments.
Conduct chaos testing and participate in on-call rotations to maintain 24/7 application availability.

✅ Must-Have Skills

4+ years of experience in Site Reliability Engineering or DevOps with a focus on reliability, monitoring, and automation.
Strong programming skills in Python (mandatory).
Hands-on experience with AWS cloud services (EC2, S3, Lambda, ECS/EKS, CloudWatch, etc.).
Expertise in monitoring and alerting tools like Prometheus, Grafana, Datadog, CloudWatch, etc.
Strong background in Linux-based systems and shell scripting.
Experience implementing infrastructure as code using tools like Terraform or CloudFormation.
Deep understanding of incident management, SLOs/SLIs, and postmortem practices.
Prior work experience with footwear or retail brands such as Nike is highly preferred.
