2+ Reliability engineering Jobs in Hyderabad | Reliability engineering Job openings in Hyderabad
About the Company
At Redpin, we simplify life's most important payments. Buying a new property overseas can be a stressful time, especially when it comes to moving your money. Through our Currencies Direct and TorFX brands, we've been helping people do just that for over 25 years. With recent investment, we're now on a mission to build a new range of digital products and services that will make moving money internationally for Real Estate purchases even easier.
We’re on a mission to become the solution for Real Estate payments everywhere. To do this, we are transitioning our business from a horizontal FX platform to a verticalized, embedded software company, as we look to the future and Redpin 2.0.
About the Role
At Redpin, we’re passionate about building software that solves problems. We count on our site reliability engineers (SREs) to give users the rich feature set, high availability, and stellar performance they need to pursue their missions. As we expand customer deployments, we’re seeking an experienced SRE to deliver insights from massive-scale data in real time. Specifically, we’re searching for someone who has fresh ideas and a unique viewpoint, and who enjoys collaborating with a cross-functional team to develop real-world solutions and positive user experiences for every interaction.
What You'll Do
- Run the production environment by monitoring availability and taking a holistic view of system health.
- Build software and systems to manage platform infrastructure and applications.
- Improve reliability, quality, and time-to-market of our suite of software solutions.
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement.
- Provide primary operational support and engineering for multiple large-scale distributed software applications.
- Design, implement, and maintain highly available and scalable infrastructure and systems on AWS.
- Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding.
- Partner with development teams to improve services through rigorous testing and release procedures.
- Participate in system design consulting, platform management, and capacity planning.
- Create sustainable systems and services through automation and uplifts.
- Balance feature development speed and reliability with well-defined service-level objectives.
What You’ll Need
- Bachelor’s degree in Computer Science, Software Engineering, or a related field (Master's degree preferred).
- 4+ years of experience as a Site Reliability Engineer or in a similar role.
- Strong knowledge of system architecture, infrastructure design, and best practices.
- Proficiency in scripting and automation using languages like Python, Bash, or similar technologies.
- Experience with cloud platforms such as AWS, including infrastructure provisioning and management.
- Strong understanding of networking principles and protocols.
- Experience supporting applications built with Java, Spring Boot, Hibernate JPA, Python, React, and .NET technologies.
- Knowledge of API gateway solutions like Kong and Layer 7.
- Experience working with databases such as Elastic, SQL Server, and PostgreSQL.
- Familiarity with messaging systems like MQ, ActiveMQ, and Kafka.
- Proficiency in managing servers such as Tomcat, JBoss, Apache, NGINX, and IIS.
- Experience with containerization using EKS (Elastic Kubernetes Service).
- Knowledge of CI/CD processes and tools like Jenkins, Artifactory, and Ansible.
- Proficiency in monitoring tools such as Coralogix, CloudWatch, Zabbix, Grafana, and Prometheus.
- Strong problem-solving and troubleshooting skills with the ability to analyse and resolve complex technical issues.
- Excellent communication and collaboration skills to work effectively in a team environment.
- Strong attention to detail and ability to prioritize and manage multiple tasks simultaneously.
- Self-motivated and able to work independently with minimal supervision.
We welcome people from all backgrounds who seek the opportunity to help build a future where we connect the dots for international property payments. If you have the curiosity, passion, and collaborative spirit, work with us, and let’s move the world of PropTech forward, together.
Redpin, Currencies Direct and TorFX are proud to be equal opportunity employers. All qualified applicants will receive consideration for employment without regard to sex, gender identity, sexual orientation, race, colour, religion, national origin, disability, protected veteran status, age, or any other characteristic protected by law.
Job Title: SRE Lead Engineer
Location: Hyderabad, India
Company: Client of Options Executive Search, AI SaaS Product Development Company
We are seeking a DevOps / SRE Lead Engineer to architect and scale our client's multi-tenant SaaS platform with AI/ML at the core.
Our client, a fast-growing AI-powered SaaS company in the FinTech space, is looking for a Site Reliability Engineering (SRE) Lead Engineer to join their dynamic team. This is an opportunity to design and operate large-scale SaaS systems that integrate cutting-edge AI/ML capabilities.
About the Role
As the SRE Lead Engineer, you will be responsible for architecting, building, and maintaining infrastructure that powers a multi-tenant SaaS platform. You’ll drive reliability, scalability, and security, while supporting AI/ML pipelines in production. This is a hands-on role with significant ownership, requiring both technical depth and leadership in site reliability practices.
Key Responsibilities
- Architect, design, and deploy end-to-end infrastructure for large-scale, microservices-based SaaS platforms.
- Ensure system reliability, scalability, and security for AI/ML model integrations and data pipelines.
- Automate environment provisioning and management using Terraform in AWS (EKS-focused).
- Implement full-stack observability across applications, networks, and operating systems.
- Lead incident management and participate in a 24/7 on-call rotation.
- Optimize SaaS reliability while enabling REST APIs, SSO integrations (Okta/Auth0), and cloud data services (RDS/MySQL, Elasticsearch).
- Define and maintain backup and disaster recovery for critical workloads.
Required Skills & Experience
- 8+ years in SRE/DevOps roles, managing enterprise SaaS applications in production.
- Minimum of 1 year of experience with AI/ML infrastructure or model-serving environments.
- Strong expertise in AWS cloud, particularly EKS, container orchestration, and Kubernetes.
- Hands-on experience with Infrastructure as Code (Terraform), Docker, and scripting (Python, Bash).
- Solid Linux OS and networking fundamentals.
- Experience in monitoring and observability with ELK, CloudWatch, or similar tools.
- Strong track record with microservices, REST APIs, SSO, and cloud databases.
Nice-to-Have Skills
- Experience with MLOps and AI/ML pipeline observability.
- Cost optimization and security hardening in multi-tenant SaaS.
- Prior exposure to FinTech or enterprise finance solutions.
Qualifications
- Bachelor’s degree in Computer Science, Engineering, or related discipline.
- AWS Certified Solutions Architect (strongly preferred).
- Experience in early-stage or high-growth startups is an advantage.
Why Join?
- Be at the forefront of AI/ML-powered SaaS innovation in FinTech.
- Work with a high-energy, entrepreneurial team building next-gen infrastructure.
- Take ownership of mission-critical reliability challenges.
- Grow your career in an environment that values impact, adaptability, and innovation.
If you’re passionate about building secure, scalable, and intelligent platforms, we’d love to hear from you. Apply now to be part of our client’s journey in redefining enterprise finance operations.


