Reliability engineering Jobs in Hyderabad

3+ Reliability engineering Jobs in Hyderabad | Reliability engineering Job openings in Hyderabad

Apply to 3+ Reliability engineering Jobs in Hyderabad on CutShort.io. Explore the latest Reliability engineering Job opportunities across top companies like Google, Amazon & Adobe.

Searce Inc

at Searce Inc

3 recruiters
Reena Bandekar
Posted by Reena Bandekar
Pune, Gurugram, Bengaluru (Bangalore), Hyderabad
5 - 12 yrs
₹15L - ₹28L / yr
DevOps
Kubernetes
Incident management
Observability
Reliability engineering

Lead Cloud Reliability Engineer


Job Responsibilities

● Lead and manage the Cloud Reliability teams to provide strong Managed Services support to end-customers.

● Isolate, troubleshoot and resolve issues reported by CMS clients in their cloud environment

● Drive the communication with the customer providing details about the issue, current steps, next plan of action, ETA

● Gather client's requirements related to use of specific cloud services and provide assistance in setting them up and resolving issues

● Create SOPs and knowledge articles for use by the L1 teams to resolve common issues

● Identify recurring issues, perform root cause analysis and propose/implement preventive actions

● Follow change management procedure to identify, record and implement changes

● Plan and deploy OS and security patches in Windows/Linux environments and upgrade k8s clusters

● Identify the recurring manual activities and contribute to automation

● Provide technical guidance and educate team members on development and operations. Monitor metrics and develop ways to improve.

● System troubleshooting and problem-solving across platform and application domains. Ability to use a wide variety of open-source technologies and cloud services.

● Build, maintain, and monitor configuration standards.

● Ensure critical system security by using best-in-class cloud security solutions.


Qualifications

● 4-7 years experience in Cloud Infrastructure and Operations domains and IT operational experience preferably in a global enterprise environment.

● Specialize in one or two cloud deployment platforms: AWS, GCP

● Hands on experience with AWS/GCP services (EKS, ECS, EC2, VPC, RDS, Lambda, GKE, Compute Engine)

● Understanding of one or more programming languages (Python, JavaScript, Ruby, Java, .Net)

● Logging and Monitoring tools (ELK, Stackdriver, CloudWatch)

● Knowledge of Configuration Management tools such as Ansible, Terraform, Puppet, Chef

● Experience working with deployment and orchestration technologies (such as Docker, Kubernetes, Mesos)

● Good analytical, communication, problem solving, and learning skills.

● Knowledge of programming against cloud platforms such as Google Cloud Platform, and of lean development methodologies.

● Strong service attitude and a commitment to quality.

● Willingness to work in shifts.

Redpin

at Redpin

1 candid answer
Shivani Nalavade
Posted by Shivani Nalavade
Hyderabad
4 - 6 yrs
Best in industry
Reliability engineering
Docker


About the Company


At Redpin we simplify life's most important payments. Buying a new property overseas can be a stressful time, especially when it comes to moving your money. Through our Currencies Direct and TorFX brands we've been helping people do just that for over 25 years. With recent investment we're now on a mission to build a new range of digital products and services that will make moving money internationally for Real Estate purchases even easier.


We’re on a mission to become the solution for Real Estate payments everywhere. To do this, we are transitioning our business from a horizontal FX platform to a verticalized, embedded software company, as we look to the future and Redpin 2.0.




About the Role

At Redpin, we’re passionate about building software that solves problems. We count on our site reliability engineers (SREs) to empower users with a rich feature set, high availability, and stellar performance level to pursue their missions. As we expand customer deployments, we’re seeking an experienced SRE to deliver insights from massive-scale data in real time. Specifically, we’re searching for someone who has fresh ideas and a unique viewpoint, and who enjoys collaborating with a cross-functional team to develop real-world solutions and positive user experiences for every interaction.

 

What You'll Do

  • Run the production environment by monitoring availability and taking a holistic view of system health.
  • Build software and systems to manage platform infrastructure and applications.
  • Improve reliability, quality, and time-to-market of our suite of software solutions.
  • Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement.
  • Provide primary operational support and engineering for multiple large-scale distributed software applications.
  • Design, implement, and maintain highly available and scalable infrastructure and systems on AWS.
  • Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding.
  • Partner with development teams to improve services through rigorous testing and release procedures.
  • Participate in system design consulting, platform management, and capacity planning.
  • Create sustainable systems and services through automation and uplifts.
  • Balance feature development speed and reliability with well-defined service-level objectives.

 

What You’ll Need

  • Bachelor’s degree in Computer Science, Software Engineering, or a related field (Master's degree preferred).
  • 4+ years of experience as a Site Reliability Engineer or in a similar role.
  • Strong knowledge of system architecture, infrastructure design, and best practices.
  • Proficiency in scripting and automation using languages like Python, Bash, or similar technologies.
  • Experience with cloud platforms such as AWS, including infrastructure provisioning and management.
  • Strong understanding of networking principles and protocols.
  • Experience supporting applications built with Java, Spring Boot, Hibernate JPA, Python, React, and .NET technologies.
  • Knowledge of API gateway solutions like Kong and Layer 7.
  • Experience working with databases such as Elastic, SQL Server, and PostgreSQL.
  • Familiarity with messaging systems like MQ, ActiveMQ, and Kafka.
  • Proficiency in managing servers such as Tomcat, JBoss, Apache, NGINX, and IIS.
  • Experience with containerization using EKS (Elastic Kubernetes Service).
  • Knowledge of CI/CD processes and tools like Jenkins, Artifactory, and Ansible.
  • Proficiency in monitoring tools such as Coralogix, CloudWatch, Zabbix, Grafana, and Prometheus.
  • Strong problem-solving and troubleshooting skills with the ability to analyse and resolve complex technical issues.
  • Excellent communication and collaboration skills to work effectively in a team environment.
  • Strong attention to detail and ability to prioritize and manage multiple tasks simultaneously.
  • Self-motivated and able to work independently with minimal supervision.

 

We welcome people from all backgrounds who seek the opportunity to help build a future where we connect the dots for international property payments. If you have the curiosity, passion, and collaborative spirit, work with us, and let’s move the world of PropTech forward, together.


Redpin, Currencies Direct and TorFX are proud to be equal opportunity employers. All qualified applicants will receive consideration for employment without regard to sex, gender identity, sexual orientation, race, colour, religion, national origin, disability, protected veteran status, age, or any other characteristic protected by law.

Auditoria.ai
Hyderabad
6 - 12 yrs
₹30L - ₹40L / yr
DevOps
Reliability engineering
Artificial Intelligence (AI)
Amazon Web Services (AWS)
Docker

Job Title: SRE Lead Engineer

Location: Hyderabad, India

Company: Client of Options Executive Search, AI SaaS Product Development Company


We are seeking a DevOps / SRE Lead Engineer to architect and scale our client's multi-tenant SaaS platform with AI/ML at the core.


Our client, a fast-growing AI-powered SaaS company in the FinTech space, is looking for a Site Reliability Engineering (SRE) Lead Engineer to join their dynamic team. This is an opportunity to design and operate large-scale SaaS systems that integrate cutting-edge AI/ML capabilities.


About the Role


As the SRE Lead Engineer, you will be responsible for architecting, building, and maintaining infrastructure that powers a multi-tenant SaaS platform. You’ll drive reliability, scalability, and security, while supporting AI/ML pipelines in production. This is a hands-on role with significant ownership, requiring both technical depth and leadership in site reliability practices.


Key Responsibilities

  • Architect, design, and deploy end-to-end infrastructure for large-scale, microservices-based SaaS platforms.
  • Ensure system reliability, scalability, and security for AI/ML model integrations and data pipelines.
  • Automate environment provisioning and management using Terraform in AWS (EKS-focused).
  • Implement full-stack observability across applications, networks, and operating systems.
  • Lead incident management and participate in 24/7 on-call rotation.
  • Optimize SaaS reliability while enabling REST APIs, SSO integrations (Okta/Auth0), and cloud data services (RDS/MySQL, Elasticsearch).
  • Define and maintain backup and disaster recovery for critical workloads.


Required Skills & Experience

  • 8+ years in SRE/DevOps roles, managing enterprise SaaS applications in production.
  • Minimum 1 year experience with AI/ML infrastructure or model-serving environments.
  • Strong expertise in AWS cloud, particularly EKS, container orchestration, and Kubernetes.
  • Hands-on experience with Infrastructure as Code (Terraform), Docker, and scripting (Python, Bash).
  • Solid Linux OS and networking fundamentals.
  • Experience in monitoring and observability with ELK, CloudWatch, or similar tools.
  • Strong track record with microservices, REST APIs, SSO, and cloud databases.


Nice-to-Have Skills

  • Experience with MLOps and AI/ML pipeline observability.
  • Cost optimization and security hardening in multi-tenant SaaS.
  • Prior exposure to FinTech or enterprise finance solutions.


Qualifications

  • Bachelor’s degree in Computer Science, Engineering, or related discipline.
  • AWS Certified Solutions Architect (strongly preferred).
  • Experience in early-stage or high-growth startups is an advantage.


Why Join?

  • Be at the forefront of AI/ML-powered SaaS innovation in FinTech.
  • Work with a high-energy, entrepreneurial team building next-gen infrastructure.
  • Take ownership of mission-critical reliability challenges.
  • Grow your career in an environment that values impact, adaptability, and innovation.


If you’re passionate about building secure, scalable, and intelligent platforms, we’d love to hear from you. Apply now to be part of our client’s journey in redefining enterprise finance operations.
