3+ Reliability engineering Jobs in Hyderabad | Reliability engineering Job openings in Hyderabad

Apply to 3+ Reliability engineering Jobs in Hyderabad on CutShort.io. Explore the latest Reliability engineering Job opportunities across top companies like Google, Amazon & Adobe.

Searce Inc

Lead | Senior Site Reliability Engineer

at Searce Inc

3 recruiters

Posted by Reena Bandekar

Pune, Gurugram, Bengaluru (Bangalore), Hyderabad

5 - 12 yrs

₹15L - ₹28L / yr

DevOps

Kubernetes

Incident management

Observability

Reliability engineering

Lead Cloud Reliability Engineer

Job Responsibilities

● Lead and manage the Cloud Reliability teams to provide strong Managed Services support to end-customers.

● Isolate, troubleshoot and resolve issues reported by CMS clients in their cloud environment

● Drive the communication with the customer providing details about the issue, current steps, next plan of action, ETA

● Gather client's requirements related to use of specific cloud services and provide assistance in setting them up and resolving issues

● Create SOPs and knowledge articles for use by the L1 teams to resolve common issues

● Identify recurring issues, perform root cause analysis and propose/implement preventive actions

● Follow change management procedure to identify, record and implement changes

● Plan and deploy OS and security patches in Windows/Linux environments and upgrade k8s clusters

● Identify the recurring manual activities and contribute to automation

● Provide technical guidance and educate team members on development and operations. Monitor metrics and develop ways to improve.

● System troubleshooting and problem-solving across platform and application domains. Ability to use a wide variety of open-source technologies and cloud services.

● Build, maintain, and monitor configuration standards.

● Ensure critical system security by using best-in-class cloud security solutions.

Qualifications

● 4-7 years of experience in Cloud Infrastructure and Operations domains, with IT operational experience preferably in a global enterprise environment.

● Specialize in one or two cloud deployment platforms: AWS, GCP

● Hands-on experience with AWS/GCP services (EKS, ECS, EC2, VPC, RDS, Lambda, GKE, Compute Engine)

● Understanding of one or more programming languages (Python, JavaScript, Ruby, Java, .Net)

● Logging and Monitoring tools (ELK, Stackdriver, CloudWatch)

● Knowledge of configuration management tools such as Ansible, Terraform, Puppet, Chef

● Experience working with deployment and orchestration technologies (such as Docker, Kubernetes, Mesos)

● Good analytical, communication, problem solving, and learning skills.

● Knowledge of programming against cloud platforms such as Google Cloud Platform, and of lean development methodologies.

● Strong service attitude and a commitment to quality.

● Willingness to work in shifts.

Redpin

Software Engineer II, SRE

at Redpin

1 candid answer

Posted by Shivani Nalavade

Hyderabad

4 - 6 yrs

Best in industry

Reliability engineering

Docker

About the Company

At Redpin we simplify life's most important payments. Buying a new property overseas can be a stressful time, especially when it comes to moving your money. Through our Currencies Direct and TorFX brands we've been helping people do just that for over 25 years. With recent investment we're now on a mission to build a new range of digital products and services that will make moving money internationally for Real Estate purchases even easier.

We’re on a mission to become the solution for Real Estate payments everywhere. To do this, we are transitioning our business from a horizontal FX platform to a verticalized, embedded software company, as we look to the future and Redpin 2.0.

About the Role

At Redpin, we’re passionate about building software that solves problems. We count on our site reliability engineers (SREs) to empower users with a rich feature set, high availability, and stellar performance level to pursue their missions. As we expand customer deployments, we’re seeking an experienced SRE to deliver insights from massive-scale data in real time. Specifically, we’re searching for someone who has fresh ideas and a unique viewpoint, and who enjoys collaborating with a cross-functional team to develop real-world solutions and positive user experiences for every interaction.

What You'll Do

Run the production environment by monitoring availability and taking a holistic view of system health.
Build software and systems to manage platform infrastructure and applications.
Improve reliability, quality, and time-to-market of our suite of software solutions.
Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement.
Provide primary operational support and engineering for multiple large-scale distributed software applications.
Design, implement, and maintain highly available and scalable infrastructure and systems on AWS.
Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding.
Partner with development teams to improve services through rigorous testing and release procedures.
Participate in system design consulting, platform management, and capacity planning.
Create sustainable systems and services through automation and uplifts.
Balance feature development speed and reliability with well-defined service-level objectives.

What You’ll Need

Bachelor’s degree in Computer Science, Software Engineering, or a related field (Master's degree preferred).
4+ years of experience as a Site Reliability Engineer or in a similar role.
Strong knowledge of system architecture, infrastructure design, and best practices.
Proficiency in scripting and automation using languages like Python, Bash, or similar technologies.
Experience with cloud platforms such as AWS, including infrastructure provisioning and management.
Strong understanding of networking principles and protocols.
Experience supporting applications built with Java, Spring Boot, Hibernate JPA, Python, React, and .NET.
Knowledge of API gateway solutions like Kong and Layer 7.
Experience working with databases such as Elastic, SQL Server, and PostgreSQL.
Familiarity with messaging systems like MQ, ActiveMQ, and Kafka.
Proficiency in managing servers such as Tomcat, JBoss, Apache, NGINX, and IIS.
Experience with containerization using EKS (Elastic Kubernetes Service).
Knowledge of CI/CD processes and tools like Jenkins, Artifactory, and Ansible.
Proficiency in monitoring tools such as Coralogix, CloudWatch, Zabbix, Grafana, and Prometheus.
Strong problem-solving and troubleshooting skills with the ability to analyse and resolve complex technical issues.
Excellent communication and collaboration skills to work effectively in a team environment.
Strong attention to detail and ability to prioritize and manage multiple tasks simultaneously.
Self-motivated and able to work independently with minimal supervision.

We welcome people from all backgrounds who seek the opportunity to help build a future where we connect the dots for international property payments. If you have the curiosity, passion, and collaborative spirit, work with us, and let’s move the world of PropTech forward, together.

Redpin, Currencies Direct and TorFX are proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to sex, gender identity, sexual orientation, race, colour, religion, national origin, disability, protected veteran status, age, or any other characteristic protected by law.

Auditoria.ai

Senior Site Reliability Engineer

Auditoria.ai

Agency job

via Options Executive Search Pvt Ltd by Achyut Menon

Hyderabad

6 - 12 yrs

₹30L - ₹40L / yr

DevOps

Reliability engineering

Artificial Intelligence (AI)

Amazon Web Services (AWS)

Docker

+2 more

Job Title: SRE Lead Engineer

Location: Hyderabad, India

Company: Client of Options Executive Search, AI SaaS Product Development Company

We are seeking a DevOps / SRE Lead Engineer to architect and scale our client's multi-tenant SaaS platform with AI/ML at the core.

Our client, a fast-growing AI-powered SaaS company in the FinTech space, is looking for a Site Reliability Engineering (SRE) Lead Engineer to join their dynamic team. This is an opportunity to design and operate large-scale SaaS systems that integrate cutting-edge AI/ML capabilities.

About the Role

As the SRE Lead Engineer, you will be responsible for architecting, building, and maintaining infrastructure that powers a multi-tenant SaaS platform. You’ll drive reliability, scalability, and security, while supporting AI/ML pipelines in production. This is a hands-on role with significant ownership, requiring both technical depth and leadership in site reliability practices.

Key Responsibilities

Architect, design, and deploy end-to-end infrastructure for large-scale, microservices-based SaaS platforms.
Ensure system reliability, scalability, and security for AI/ML model integrations and data pipelines.
Automate environment provisioning and management using Terraform in AWS (EKS-focused).
Implement full-stack observability across applications, networks, and operating systems.
Lead incident management and participate in 24/7 on-call rotation.
Optimize SaaS reliability while enabling REST APIs, SSO integrations (Okta/Auth0), and cloud data services (RDS/MySQL, Elasticsearch).
Define and maintain backup and disaster recovery for critical workloads.

Required Skills & Experience

8+ years in SRE/DevOps roles, managing enterprise SaaS applications in production.
Minimum 1 year of experience with AI/ML infrastructure or model-serving environments.
Strong expertise in AWS cloud, particularly EKS, container orchestration, and Kubernetes.
Hands-on experience with Infrastructure as Code (Terraform), Docker, and scripting (Python, Bash).
Solid Linux OS and networking fundamentals.
Experience in monitoring and observability with ELK, CloudWatch, or similar tools.
Strong track record with microservices, REST APIs, SSO, and cloud databases.

Nice-to-Have Skills

Experience with MLOps and AI/ML pipeline observability.
Cost optimization and security hardening in multi-tenant SaaS.
Prior exposure to FinTech or enterprise finance solutions.

Qualifications

Bachelor’s degree in Computer Science, Engineering, or related discipline.
AWS Certified Solutions Architect (strongly preferred).
Experience in early-stage or high-growth startups is an advantage.

Why Join?

Be at the forefront of AI/ML-powered SaaS innovation in FinTech.
Work with a high-energy, entrepreneurial team building next-gen infrastructure.
Take ownership of mission-critical reliability challenges.
Grow your career in an environment that values impact, adaptability, and innovation.

If you’re passionate about building secure, scalable, and intelligent platforms, we’d love to hear from you. Apply now to be part of our client’s journey in redefining enterprise finance operations.

Get to hear about interesting companies hiring right now

Follow Cutshort

Why apply via Cutshort?

Connect with actual hiring teams and get their fast response. No spam.
