3+ Reliability engineering Jobs in Hyderabad | Reliability engineering Job openings in Hyderabad
Apply to 3+ Reliability engineering Jobs in Hyderabad on CutShort.io. Explore the latest Reliability engineering Job opportunities across top companies like Google, Amazon & Adobe.
🚀 RECRUITING BOND HIRING
Role: CLOUD OPERATIONS & MONITORING ENGINEER - (THE GUARDIAN OF UPTIME)
⚡ THIS IS NOT A MONITORING ROLE
THIS IS A COMMAND ROLE
You don’t watch dashboards.
You control outcomes.
You don’t react to incidents.
You eliminate them before they escalate.
This role powers an AI-driven SaaS + IoT platform where:
---> Uptime is non-negotiable
---> Latency is hunted
---> Failures are never allowed to repeat
Incidents don’t grow.
Problems don’t hide.
Uptime is enforced.
🧠 WHAT YOU’LL OWN
(Real Work. Real Impact.)
🔍 Total Observability
---> Real-time visibility across cloud, application, database & infrastructure
---> High-signal dashboards (Grafana + cloud-native tools)
---> Performance trends tracked before growth breaks systems
🚨 Smart Alerting (No Noise)
---> Alerts that fire only when action is required
---> Zero false positives. Zero alert fatigue
Right signal → right person → right time
⚙ Automation as a Weapon
---> End-to-end automation of operational tasks
---> Standardized logging, metrics & alerting
---> Systems that scale without human friction
🧯 Incident Command & Reliability
---> First responder for critical incidents (on-call rotation)
---> Root cause analysis across network, app, DB & storage
Fix fast — then harden so it never breaks the same way again
📘 Operational Excellence
---> Battle-tested runbooks
---> Documentation that actually works under pressure
Every incident → a stronger platform
🛠️ TECHNOLOGIES YOU’LL MASTER
☁ Cloud: AWS | Azure | Google Cloud
📊 Monitoring: Grafana | Metrics | Traces | Logs
📡 Alerting: Production-grade alerting systems
🌐 Networking: DNS | Routing | Load Balancers | Security
🗄 Databases: Production systems under real pressure
⚙ DevOps: Automation | Reliability Engineering
🎯 WHO WE’RE LOOKING FOR
Engineers who take uptime personally.
You bring:
---> 3+ years in Cloud Ops / DevOps / SRE
---> Live production SaaS experience
---> Deep AWS / Azure / GCP expertise
---> Strong monitoring & alerting experience
---> Solid networking fundamentals
---> Calm, methodical incident response
---> Bonus (Highly Preferred):
---> B2B SaaS + IoT / hybrid platforms
---> Strong automation mindset
---> Engineers who think in systems, not tickets
💼 JOB DETAILS
📍 Bengaluru
🏢 Hybrid (WFH)
💰 (Final CTC depends on experience & interviews)
🌟 WHY THIS ROLE?
Most cloud teams manage uptime. We weaponize it.
Your work won’t just keep systems running — it will keep customers confident, operations flawless, and competitors wondering how it all works so smoothly.
📩 APPLY / REFER : 🔗 Know someone who lives for reliability, observability & cloud excellence?
🚀 RECRUITING BOND HIRING
Role: CLOUD OPERATIONS & MONITORING ENGINEER - (THE GUARDIAN OF UPTIME)
⚡ THIS IS NOT A MONITORING ROLE
THIS IS A COMMAND ROLE
You don’t watch dashboards.
You control outcomes.
You don’t react to incidents.
You eliminate them before they escalate.
This role powers an AI-driven SaaS + IoT platform where:
---> Uptime is non-negotiable
---> Latency is hunted
---> Failures are never allowed to repeat
Incidents don’t grow.
Problems don’t hide.
Uptime is enforced.
🧠 WHAT YOU’LL OWN
(Real Work. Real Impact.)
🔍 Total Observability
---> Real-time visibility across cloud, application, database & infrastructure
---> High-signal dashboards (Grafana + cloud-native tools)
---> Performance trends tracked before growth breaks systems
🚨 Smart Alerting (No Noise)
---> Alerts that fire only when action is required
---> Zero false positives. Zero alert fatigue
Right signal → right person → right time
⚙ Automation as a Weapon
---> End-to-end automation of operational tasks
---> Standardized logging, metrics & alerting
---> Systems that scale without human friction
🧯 Incident Command & Reliability
---> First responder for critical incidents (on-call rotation)
---> Root cause analysis across network, app, DB & storage
Fix fast — then harden so it never breaks the same way again
📘 Operational Excellence
---> Battle-tested runbooks
---> Documentation that actually works under pressure
Every incident → a stronger platform
🛠️ TECHNOLOGIES YOU’LL MASTER
☁ Cloud: AWS | Azure | Google Cloud
📊 Monitoring: Grafana | Metrics | Traces | Logs
📡 Alerting: Production-grade alerting systems
🌐 Networking: DNS | Routing | Load Balancers | Security
🗄 Databases: Production systems under real pressure
⚙ DevOps: Automation | Reliability Engineering
🎯 WHO WE’RE LOOKING FOR
Engineers who take uptime personally.
You bring:
---> 3+ years in Cloud Ops / DevOps / SRE
---> Live production SaaS experience
---> Deep AWS / Azure / GCP expertise
---> Strong monitoring & alerting experience
---> Solid networking fundamentals
---> Calm, methodical incident response
---> Bonus (Highly Preferred):
---> B2B SaaS + IoT / hybrid platforms
---> Strong automation mindset
---> Engineers who think in systems, not tickets
💼 JOB DETAILS
📍 Bengaluru
🏢 Hybrid (WFH)
💰 (Final CTC depends on experience & interviews)
🌟 WHY THIS ROLE?
Most cloud teams manage uptime. We weaponize it.
Your work won’t just keep systems running — it will keep customers confident, operations flawless, and competitors wondering how it all works so smoothly.
📩 APPLY / REFER : 🔗 Know someone who lives for reliability, observability & cloud excellence?
Job Title: SRE Lead Engineer
Location: Hyderabad, India
Company: Client of Options Executive Search, AI Saas Product Development Company
We are seeking a DevOps / SRE Lead Engineer to architect and scale our client's multi-tenant SaaS platform with AI/ML at the core..
Our client, a fast-growing AI-powered SaaS company in the FinTech space, is looking for a Site Reliability Engineering (SRE) Lead Engineer to join their dynamic team. This is an opportunity to design and operate large-scale SaaS systems that integrate cutting-edge AI/ML capabilities.
About the Role
As the SRE Lead Engineer, you will be responsible for architecting, building, and maintaining infrastructure that powers a multi-tenant SaaS platform. You’ll drive reliability, scalability, and security, while supporting AI/ML pipelines in production. This is a hands-on role with significant ownership, requiring both technical depth and leadership in site reliability practices.
Key Responsibilities
- Architect, design, and deploy end-to-end infrastructure for large-scale, microservices-based SaaS platforms.
- Ensure system reliability, scalability, and security for AI/ML model integrations and data pipelines.
- Automate environment provisioning and management using Terraform in AWS (EKS-focused).
- Implement full-stack observability across applications, networks, and operating systems.
- Lead incident management and participate in 24/7 on-call rotation.
- Optimize SaaS reliability while enabling REST APIs, SSO integrations (Okta/Auth0), and cloud data services (RDS/MySQL, Elasticsearch).
- Define and maintain backup and disaster recovery for critical workloads.
Required Skills & Experience
- 8+ years in SRE/DevOps roles, managing enterprise SaaS applications in production.
- Minimum 1 year experience with AI/ML infrastructure or model-serving environments.
- Strong expertise in AWS cloud, particularly EKS, container orchestration, and Kubernetes.
- Hands-on experience with Infrastructure as Code (Terraform), Docker, and scripting (Python, Bash).
- Solid Linux OS and networking fundamentals.
- Experience in monitoring and observability with ELK, CloudWatch, or similar tools.
- Strong track record with microservices, REST APIs, SSO, and cloud databases.
Nice-to-Have Skills
- Experience with MLOps and AI/ML pipeline observability.
- Cost optimization and security hardening in multi-tenant SaaS.
- Prior exposure to FinTech or enterprise finance solutions.
Qualifications
- Bachelor’s degree in Computer Science, Engineering, or related discipline.
- AWS Certified Solutions Architect (strongly preferred).
- Experience in early-stage or high-growth startups is an advantage.
Why Join?
- Be at the forefront of AI/ML-powered SaaS innovation in FinTech.
- Work with a high-energy, entrepreneurial team building next-gen infrastructure.
- Take ownership of mission-critical reliability challenges.
- Grow your career in an environment that values impact, adaptability, and innovation.
If you’re passionate about building secure, scalable, and intelligent platforms, we’d love to hear from you. Apply now to be part of our client’s journey in redefining enterprise finance operations.

