
Engineering Leader, Cloud Infrastructure.
Bengaluru, Karnataka, India
Do you thrive on solving complex technical problems? Do you want to be at the cutting edge of technology? If so,we’re interested in speaking with you!
Your Impact:
We’re looking for a seasoned engineering leader in the Cloud team that is responsible for building, operating, and maintaining a customer-facing DBaaS service in multiple public clouds (AWS, GCP, and Azure). The service supports unified multiverse management of YugabyteDB, including fault-domain aware provisioning, rolling upgrades, security,
networking, monitoring, and day-2 operations (backups, scaling, billing etc). If you’re a strong leader who exemplifies collaboration, who is driven and thrive in a fast-paced startup environment, and who has a strong desire to build an internet-scale, extensible cloud based service with strong emphasis on simplicity and user experience, this job is for
you.
You Will:
Lead, inspire, and influence to make sure your team is successful
Partner with the recruiting team to attract and retain high-quality and diverse talent
Establish great rapport with other development teams, Product Managers, Sales and Customer Success tomaintain high levels of visibility, efficiency, and collaboration
Ensure teams have appropriate technical direction, leadership and balance between short-term impact andlong term architectural vision.
Occasionally contributing to development tasks such as coding and feature verifications to assist teamswith release commitments, to gain an understanding of the deeply technical product as well as to keepyour technical acumen sharp.
You'll need:
BS/MS degree in CS-or- a related field with 5+ years of engineering management experience leading productive, high-functioning teams
Strong fundamentals in distributed systems design and development
Ability to hire while ensuring a high hiring bar, keep engineers motivated, coach/mentor, and handle performance management
Experience running production services in Public Clouds such as AWS, GCP, and Azure
Experience with running large stateful data systems in the Cloud
Prior knowledge of Cloud architecture and implementation features (multi-tenancy, containerization,orchestration, elastic scalability)
A great track record of shipping features and hitting deadlines consistently; should be able to move fast,build in increments and iterate; have a sense of urgency, aggressive mindset towards achieving results and excellent prioritization skills; able to anticipate future technical needs for the product and craft plans to realize them
Ability to influence the team, peers, and upper management using effective communication and collaborative techniques; focused on building and maintaining a culture of collaboration within the team.

Similar jobs
Key responsibilities
• Design, build, and maintain robust CI/CD pipelines using Azure DevOps Services (Azure Pipelines) and Git-based workflows.
• Implement and manage infrastructure as code (IaC) using ARM templates, Bicep, and/or Terraform for repeatable environment provisioning.
• Containerize applications (Docker) and manage container orchestration platforms such as AKS (Azure Kubernetes Service).
• Automate build, test, release, and rollback processes; integrate automated testing and quality gates into pipelines.
• Monitor and improve platform reliability and observability using logging and monitoring tools (e.g., Azure Monitor, Application Insights, Prometheus, Grafana).
• Drive platform security and compliance through pipeline controls, secrets management (Key Vault / Vault), and secure configuration practices.
• Implement cost-optimization and governance for Azure resources (tags, policies, budgets).
• Troubleshoot build/release failures, production incidents, and performance bottlenecks; perform root-cause analysis and implement permanent fixes.
• Mentor developers in Git workflows, pipeline authoring, best practices for IaC, and cloud-native design.
• Maintain clear documentation: runbooks, deployment playbooks, architecture diagrams, and pipeline templates.
Required skills & experience
• 4+ years hands-on experience working with Azure and cloud-native application delivery.
• Deep experience with Azure DevOps (Repos, Pipelines, Artifacts, Boards).
• Strong IaC skills with Terraform, ARM templates, or Bicep.
• Solid experience with CI/CD design and YAML pipeline authoring.
• Practical knowledge of containerization (Docker) and Kubernetes — preferably AKS.
• Scripting skills: PowerShell, Bash, and/or Python for automation.
• Experience with Git workflows (branching strategies, PRs, code reviews).
• Familiarity with configuration management and secrets management (Azure Key Vault, HashiCorp Vault).
• Understanding of networking, identity (Azure AD), and security fundamentals in Azure.
• Strong troubleshooting, debugging, and incident response skills.
• Good collaboration and communication skills; ability to work across teams.
Certification
AZ-400: Microsoft Certified: DevOps Engineer Expert or AZ-104 or AZ 305 or Terraform Associate.

Location: Bangalore
Experience: 2–5 years
Type: Full-time | On-site
Start: Immediate
Why this role exists
Most systems don’t fail because of one big outage.
They fail because reliability is treated as an afterthought.
Right now, uptime depends too much on individual heroics.
That doesn’t scale.
This role exists to build a reliability system where:
- Uptime is predictable
- Failures are contained
- Escalations don’t depend on leadership
What you’ll do
You will not just monitor systems.
You will own reliability as a product.
1. Drive uptime to production-grade reliability
- Improve system uptime to 99.9% customer-facing SLA within 4 months
- Define and track:
- SLAs / SLOs / error budgets
- Ensure reliability is measured from the customer’s perspective, not internal metrics
2. Build incident response as a system
- Set up a 24/7 incident response rotation across 3 engineers
- Eliminate dependency on leadership (no single escalation point)
- Define:
- Incident severity levels
- Response playbooks
- Escalation protocols
- Ensure fast detection → containment → resolution
3. Contain and fix erratic system behavior
- Identify and resolve:
- Latency spikes
- Downtime incidents
- Integration failures
- Build guardrails to prevent recurrence
- Focus on root cause elimination, not temporary fixes
4. Create continuous reliability feedback loops
- Work closely with engineering teams to:
- Surface recurring failure patterns
- Improve build quality
- Reduce production bugs
- Ensure learnings from incidents directly improve future releases
5. Improve observability and monitoring
- Build dashboards and alerts for:
- System health
- Performance metrics
- Failure signals
- Ensure issues are detected before customers report them
6. Reduce operational fragility
- Remove single points of failure (people, systems, workflows)
- Improve system resilience across:
- Deployments
- Integrations
- Runtime environments
What success looks like
- Uptime reaches 99.9%+ reliably
- Incidents are:
- Detected early
- Contained quickly
- Resolved permanently
- No dependency on a single individual for escalation
- System behavior becomes predictable and stable
- Engineering teams ship with higher reliability confidence
Who you are
- You have 2-5 years of experience in SRE / DevOps / backend systems
- You have worked on production systems with real uptime expectations
- You think in:
- Systems
- Failure modes
- Trade-offs
- You are comfortable debugging live, high-pressure environments
What will make you stand out
- Experience with:
- Distributed systems
- Cloud infrastructure (AWS / Azure / GCP)
- Monitoring & alerting tools
- Have built or improved:
- Incident response systems
- Reliability frameworks
- Strong debugging skills across:
- Infra
- Application
- Integrations
Compensation
₹60,000/month (fixed)
(Aligned with role scope and impact expectations)
Why join
- You will define reliability standards for a production AI platform
- Your work directly impacts:
- Customer trust
- Product performance
- Enterprise readiness
- You will move the system from reactive → predictable
What this role is not
- Not just monitoring dashboards
- Not limited to handling tickets
- Not dependent on escalation to leadership
What this role is
- A builder of reliability systems
- A guardian of uptime and performance
- A multiplier of engineering quality
One question to self-evaluate
Can you build a system where downtime is rare, predictable, and never dependent on a single person?
Experience: 3+ years of experience in Cloud Architecture
About Company:
The company is a global leader in secure payments and trusted transactions. They are at the forefront of the digital revolution that is shaping new ways of paying, living, doing business and building relationships that pass on trust along the entire payments value chain, enabling sustainable economic growth. Their innovative solutions, rooted in a rock-solid technological base, are environmentally friendly, widely accessible and support social transformation.
Cloud Architect / Lead
- Role Overview
- Senior Engineer with a strong background and experience in cloud related technologies and architectures. Can design target cloud architectures to transform existing architectures together with the in-house team. Can actively hands-on configure and build cloud architectures and guide others.
- Key Knowledge
- 3-5+ years of experience in AWS/GCP or Azure technologies
- Is likely certified on one or more of the major cloud platforms
- Strong experience from hands-on work with technologies such as Terraform, K8S, Docker and orchestration of containers.
- Ability to guide and lead internal agile teams on cloud technology
- Background from the financial services industry or similar critical operational experience
Who we are :
Stanza Living is India's largest and fastest growing tech-enabled, managed accommodation company that delivers a hospitality-led living experience to migrant students and young working professionals across India. We have a full-stack business model that focuses on design, development and delivery of daily living solutions tailored to the young consumers' lifestyle. From smartly-planned residences, host of amenities and services for hassle-free living to exclusive community engagement programmes - everything is seamlessly integrated through technology to ensure the highest consumer delight.
Today, we are :
- India's largest managed accommodation company with over 75,000+ beds under management across 25+ cities
- Most capitalized player in the managed accommodation space, backed by global marquee investors - Falcon Edge, Equity International, Sequoia Capital, Matrix Partners, Accel Partners
- Recognized as the Best Real Estate Tech company across the Globe in 2020 by leading analysis agency, Tracxn
- LinkedIn Top Startup to Work for - 2019
Objectives of this role :
- Work in tandem with our engineering team to identify and implement the most optimal cloud-based solutions for the company
- Define and document best practices and strategies regarding application deployment and infrastructure maintenance
- Provide guidance, thought leadership, and mentorship to developer teams to build their cloud competencies
- Ensure application performance, uptime, and scale, maintaining high standards for code quality and thoughtful design
- Manage cloud environments in accordance with company security guidelines
Job Description :
- Excellent understanding of Cloud Platform (AWS)
- Strong knowledge on AWS Services, design, configuration on enterprise systems
- Good knowledge on Kubernetes configuration, Dockers
- Understanding the needs of the business for defining AWS system specifications
- Understand Architecture Requirements and ensure effective support activities
- Evaluation and choosing suitable AWS Service or and suggesting methods for integration
- Overseeing assigned programs and guiding the team members
- Providing assistance when technical problems arise
- Making sure the agreed infrastructure and architecture are implemented
- Addressing the technical concerns, suggestions, and ideas
- Configure Monitoring systems to make sure they meet business goals as well as user requirements
- Excellent knowledge of AWS IaaS Layer
- Ability to lead & implement PS workloads or POCs
- Ensure continual knowledge management
We are looking for a DevOps Engineer (individual contributor) to maintain and build upon our next-generation infrastructure. We aim to ensure that our systems are secure, reliable and high-performing by constantly striving to achieve best-in-class infrastructure and security by:
- Leveraging a variety of tools to ensure all configuration is codified (using tools like Terraform and Flux) and applied in a secure, repeatable way (via CI)
- Routinely identifying new technologies and processes that enable us to streamline our operations and improve overall security
- Holistically monitoring our overall DevOps setup and health to ensure our roadmap constantly delivers high-impact improvements
- Eliminating toil by automating as many operational aspects of our day-to-day work as possible using internally created, third party and/or open-source tools
- Maintain a culture of empowerment and self-service by minimizing friction for developers to understand and use our infrastructure through a combination of innovative tools, excellent documentation and teamwork
Tech stack: Microservices primarily written in JavaScript, Kotlin, Scala, and Python. The majority of our infrastructure sits within EKS on AWS, using Istio. We use Terraform and Helm/Flux when working with AWS and EKS (k8s). Deployments are managed with a combination of Jenkins and Flux. We rely heavily on Kafka, Cassandra, Mongo and Postgres and are increasingly leveraging AWS-managed services (e.g. RDS, lambda).
We are a self organized engineering team with a passion for programming and solving business problems for our customers. We are looking to expand our team capabilities on the DevOps front and are on a lookout for 4 DevOps professionals having relevant hands on technical experience of 4-8 years.
We encourage our team to continuously learn new technologies and apply the learnings in the day to day work even if the new technologies are not adopted. We strive to continuously improve our DevOps practices and expertise to form a solid backbone for the product, customer relationships and sales teams which enables them to add new customers every week to our financing network.
As a DevOps Engineer, you :
- Will work collaboratively with the engineering and customer support teams to deploy and operate our systems.
- Build and maintain tools for deployment, monitoring and operations.
- Help automate and streamline our operations and processes.
- Troubleshoot and resolve issues in our test and production environments.
- Take control of various mandates and change management processes to ensure compliance for various certifications (PCI and ISO 27001 in particular)
- Monitor and optimize the usage of various cloud services.
- Setup and enforce CI/CD processes and practices
Skills required :
- Strong experience with AWS services (EC2, ECS, ELB, S3, SES, to name a few)
- Strong background in Linux/Unix administration and hardening
- Experience with automation using Ansible, Terraform or equivalent
- Experience with continuous integration and continuous deployment tools (Jenkins)
- Experience with container related technologies (docker, lxc, rkt, docker swarm, kubernetes)
- Working understanding of code and script (Python, Perl, Ruby, Java)
- Working understanding of SQL and databases
- Working understanding of version control system (GIT is preferred)
- Managing IT operations, setting up best practices and tuning them from time-totime.
- Ensuring that process overheads do not reduce the productivity and effectiveness of small team. - Willingness to explore and learn new technologies and continuously refactor thetools and processes.
We are looking for an experienced DevOps (Development and Operations) professional to join our growing organization. In this position, you will be responsible for finding and reporting bugs in web and mobile apps & assist Sr DevOps to manage infrastructure projects and processes. Keen attention to detail, problem-solving abilities, and a solid knowledge base are essential.
As a DevOps, you will work in a Kubernetes based microservices environment.
Experience in Microsoft Azure cloud and Kubernetes is preferred, not mandatory.
Ultimately, you will ensure that our products, applications and systems work correctly.
Responsibilities:
- Detect and track software defects and inconsistencies
- Apply quality engineering principals throughout the Agile product lifecycle
- Handle code deployments in all environments
- Monitor metrics and develop ways to improve
- Consult with peers for feedback during testing stages
- Build, maintain, and monitor configuration standards
- Maintain day-to-day management and administration of projects
- Manage CI and CD tools with team
- Follow all best practices and procedures as established by the company
- Provide support and documentation
Required Technical and Professional Expertise
- Minimum 2+ years if DevOps
- Have experience in SaaS infrastructure development and Web Apps
- Experience in delivering microservices at scale; designing microservices solutions
- Proven Cloud experience/delivery of applications on Azure
- Proficient in configuration Management tools such as Ansible or any of Terraform Puppet, Chef, Salt, etc
- Hands-on experience in Networking/network configuration, Application performance monitoring, Container performance, and security.
- Understanding of Kubernetes, Python along with scripting languages like bash/shell
- Good to have experience in Linux internals, Linux packaging, Release Engineering (Branching, versioning, tagging), Artifact repository, Artifactory, Nexus, and CI/CD tooling (Concourse CI, Travis, Jenkins)
- Must be a proactive person
- You love collaborative environments that use agile methodologies to encourage creative design thinking and find innovative ways to develop with cutting edge technologies
- An ambitious individual who can work under their own direction towards agreed targets/goals and with a creative approach to work.
- An intuitive individual with an ability to manage change and proven time management
- Proven interpersonal skills while contributing to team effort by accomplishing related results as needed.
Location – Pune
Experience - 1.5 to 3 YR
Payroll: Direct with Client
Salary Range: 3 to 5 Lacs (depending on existing)
Role and Responsibility
• Good understanding and Experience on AWS CloudWatch for ES2, Amazon Web Services, and Resources, and other sources.
• Collect and Store logs
• Monitor and Store Logs
• Log Analyze
• Configure Alarm
• Configure Dashboard
• Preparation and following of SOP's, Documentation.
• Good understanding AWS in DevOps.
• Experience with AWS services ( EC2, ECS, CloudWatch, VPC, Networking )
• Experience with a variety of infrastructure, application, and log monitoring tools ~ Prometheus, Grafana,
• Familiarity with Docker, Linux, and Linux security
• Knowledge and experience with container-based architectures like Docker
• Experience on performing troubleshooting on AWS service.
• Experience in configuring services in AWS like EC2, S3, ECS
• Experience with Linux system administration and engineering skills on Cloud infrastructure
• Knowledge of Load Balancers, Firewalls, and network switching components
• Knowledge of Internet-based technologies - TCP/IP, DNS, HTTP, SMTP & Networking concepts
• Knowledge of security best practices
• Comfortable 24x7 supporting Production environments
• Strong communication skills

EXP:: 4 - 7 yrs
- Any scripting language:: Python, Scala, shell or bash
- Cloud:: AWS
- Database:: Relational (SQL) & non-relational (NoSQL)
- CI/CD tools and Version controlling
We are looking for a Sr. Engineer DevOps and SysOps, who is responsible for managing AWS and Azure cloud computing. Your primary focus would be to help multiple projects with various cloud service implementation, create and manage CI/CD pipelines for deployment, explore new services on cloud and help projects to implement them.
Technical Requirements & Responsibilities
- Have 4+ years’ experience as a DevOps and SysOps Engineer.
- Apply cloud computing skills to deploy upgrades and fixes on AWS and Azure (GCP is optional / Good to have).
- Design, develop, and implement software integrations based on user feedback.
- Troubleshoot production issues and coordinate with the development team to streamline code deployment.
- Implement automation tools and frameworks (CI/CD pipelines).
- Analyze code and communicate detailed reviews to development teams to ensure a marked improvement in applications and the timely completion of projects.
- Collaborate with team members to improve the company’s engineering tools, systems and procedures, and data security.
- Optimize the company’s computing architecture.
- Conduct systems tests for security, performance, and availability.
- Develop and maintain design and troubleshooting documentation.
- Expert in code deployment tools (Puppet, Ansible, and Chef).
- Can maintain Java / PHP / Ruby on Rail / DotNet web applications.
- Experience in network, server, and application-status monitoring.
- Possess a strong command of software-automation production systems (Jenkins and Selenium).
- Expertise in software development methodologies.
- You have working knowledge of known DevOps tools like Git and GitHub.
- Possess a problem-solving attitude.
- Can work independently and as part of a team.
Soft Skills Requirements
- Strong communication skills
- Agility and quick learner
- Attention to detail
- Organizational skills
- Understanding of the Software development life cycle
- Good Analytical and problem-solving skills
- Self-motivated with the ability to prioritize, meet deadlines, and manage changing priorities
- Should have a high level of energy working as an individual contributor and as a part of team.
- Good command over verbal and written English communication






