
Engineering Leader- Cloud Infrastructure
at Our client company is into Computer software. (YB1)
- Lead, inspire, and influence to make sure your team is successful
- Partner with the recruiting team to attract and retain high-quality and diverse talent
- Establish great rapport with other development teams, Product Managers, Sales, and Customer Success to maintain high levels of visibility, efficiency, and collaboration
- Ensure teams have appropriate technical direction, leadership, and balance between short-term impact and long-term architectural vision.
- Occasionally contributing to development tasks such as coding and feature verifications to assist teams with release commitments, to gain an understanding of the deeply technical product as well as to keep your technical acumen sharp
You'll need:
- BS/MS degree in CS-or- a related field with 5+ years of engineering management experience leading productive, high-functioning teams
- Strong fundamentals in distributed systems design and development
- Ability to hire while ensuring a high hiring bar, keep engineers motivated, coach/mentor, and handle performance management
- Experience running production services in Public Clouds such as AWS, GCP, and Azure
- Experience with running large stateful data systems in the Cloud
- Prior knowledge of Cloud architecture and implementation features (multi-tenancy, containerization, orchestration, elastic scalability)
- A great track record of shipping features and hitting deadlines consistently; should be able to move fast, build in increments and iterate; have a sense of urgency, aggressive mindset towards achieving results and excellent prioritization skills; able to anticipate future technical needs for the product and craft plans to realize them
- Ability to influence the team, peers, and upper management using effective communication and collaborative techniques; focused on building and maintaining a culture of collaboration within the team

Similar jobs

Location: Bangalore
Experience: 2–5 years
Type: Full-time | On-site
Start: Immediate
Why this role exists
Most systems don’t fail because of one big outage.
They fail because reliability is treated as an afterthought.
Right now, uptime depends too much on individual heroics.
That doesn’t scale.
This role exists to build a reliability system where:
- Uptime is predictable
- Failures are contained
- Escalations don’t depend on leadership
What you’ll do
You will not just monitor systems.
You will own reliability as a product.
1. Drive uptime to production-grade reliability
- Improve system uptime to 99.9% customer-facing SLA within 4 months
- Define and track:
- SLAs / SLOs / error budgets
- Ensure reliability is measured from the customer’s perspective, not internal metrics
2. Build incident response as a system
- Set up a 24/7 incident response rotation across 3 engineers
- Eliminate dependency on leadership (no single escalation point)
- Define:
- Incident severity levels
- Response playbooks
- Escalation protocols
- Ensure fast detection → containment → resolution
3. Contain and fix erratic system behavior
- Identify and resolve:
- Latency spikes
- Downtime incidents
- Integration failures
- Build guardrails to prevent recurrence
- Focus on root cause elimination, not temporary fixes
4. Create continuous reliability feedback loops
- Work closely with engineering teams to:
- Surface recurring failure patterns
- Improve build quality
- Reduce production bugs
- Ensure learnings from incidents directly improve future releases
5. Improve observability and monitoring
- Build dashboards and alerts for:
- System health
- Performance metrics
- Failure signals
- Ensure issues are detected before customers report them
6. Reduce operational fragility
- Remove single points of failure (people, systems, workflows)
- Improve system resilience across:
- Deployments
- Integrations
- Runtime environments
What success looks like
- Uptime reaches 99.9%+ reliably
- Incidents are:
- Detected early
- Contained quickly
- Resolved permanently
- No dependency on a single individual for escalation
- System behavior becomes predictable and stable
- Engineering teams ship with higher reliability confidence
Who you are
- You have 2-5 years of experience in SRE / DevOps / backend systems
- You have worked on production systems with real uptime expectations
- You think in:
- Systems
- Failure modes
- Trade-offs
- You are comfortable debugging live, high-pressure environments
What will make you stand out
- Experience with:
- Distributed systems
- Cloud infrastructure (AWS / Azure / GCP)
- Monitoring & alerting tools
- Have built or improved:
- Incident response systems
- Reliability frameworks
- Strong debugging skills across:
- Infra
- Application
- Integrations
Compensation
₹60,000/month (fixed)
(Aligned with role scope and impact expectations)
Why join
- You will define reliability standards for a production AI platform
- Your work directly impacts:
- Customer trust
- Product performance
- Enterprise readiness
- You will move the system from reactive → predictable
What this role is not
- Not just monitoring dashboards
- Not limited to handling tickets
- Not dependent on escalation to leadership
What this role is
- A builder of reliability systems
- A guardian of uptime and performance
- A multiplier of engineering quality
One question to self-evaluate
Can you build a system where downtime is rare, predictable, and never dependent on a single person?
About the Role:
We are looking for a skilled AWS DevOps Engineer to join our Cloud Operations team in Bangalore. This hybrid role is ideal for someone with hands-on experience in AWS and a strong background in application migration from on-premises to cloud environments. You'll play a key role in driving cloud adoption, optimizing infrastructure, and ensuring seamless cloud operations.
Key Responsibilities:
- Manage and maintain AWS cloud infrastructure and services.
- Lead and support application migration projects from on-prem to cloud.
- Automate infrastructure provisioning using Infrastructure as Code (IaC) tools.
- Monitor cloud environments and optimize cost, performance, and reliability.
- Collaborate with development, operations, and security teams to implement DevOps best practices.
- Troubleshoot and resolve infrastructure and deployment issues.
Required Skills:
- 3–5 years of experience in AWS cloud environment.
- Proven experience with on-premises to cloud application migration.
- Strong understanding of AWS core services (EC2, VPC, S3, IAM, RDS, etc.).
- Solid scripting skills (Python, Bash, or similar).
Good to Have:
- Experience with Terraform for Infrastructure as Code.
- Familiarity with Kubernetes for container orchestration.
- Exposure to CI/CD tools like Jenkins, GitLab, or AWS CodePipeline.
Overview
adesso India specialises in optimization of core business processes for organizations. Our focus is on providing state-of-the-art solutions that streamline operations and elevate productivity to new heights.
Comprised of a team of industry experts and experienced technology professionals, we ensure that our software development and implementations are reliable, robust, and seamlessly integrated with the latest technologies. By leveraging our extensive knowledge and skills, we empower businesses to achieve their objectives efficiently and effectively.
Job Description
The client’s department DPS, Digital People Solutions, offers a sophisticated portfolio of IT applications, providing a strong foundation for professional and efficient People & Organization (P&O) and Business Management, both globally and locally, for a well-known German company listed on the DAX-40 index, which includes the 40 largest and most liquid companies on the Frankfurt Stock Exchange
We are seeking talented DevOps-Engineers with focus on Elastic Stack (ELK) to join our dynamic DPS team. In this role, you will be responsible for refining and advising on the further development of an existing monitoring solution based on the Elastic Stack (ELK). You will independently handle tasks related to architecture, setup, technical migration, and documentation.
The current application landscape features multiple Java web services running on JEE application servers, primarily hosted on AWS, and integrated with various systems such as SAP, other services, and external partners. DPS is committed to delivering the best digital work experience for the customers employees and customers alike.
Responsibilities:
Install, set up, and automate rollouts using Ansible/CloudFormation for all stages (Dev, QA, Prod) in the AWS Cloud for components such as Elastic Search, Kibana, Metric beats, APM server, APM agents, and interface configuration.
Create and develop regular "Default Dashboards" for visualizing metrics from various sources like Apache Webserver, application servers and databases.
Improve and fix bugs in installation and automation routines.
Monitor CPU usage, security findings, and AWS alerts.
Develop and extend "Default Alerting" for issues like OOM errors, datasource issues, and LDAP errors.
Monitor storage space and create concepts for expanding the Elastic landscape in AWS Cloud and Elastic Cloud Enterprise (ECE).
Implement machine learning, uptime monitoring including SLA, JIRA integration, security analysis, anomaly detection, and other useful ELK Stack features.
Integrate data from AWS CloudWatch.
Document all relevant information and train involved personnel in the used technologies.
Requirements:
Experience with Elastic Stack (ELK) components and related technologies.
Proficiency in automation tools like Ansible and CloudFormation.
Strong knowledge of AWS Cloud services.
Experience in creating and managing dashboards and alerts.
Familiarity with IAM roles and rights management.
Ability to document processes and train team members.
Excellent problem-solving skills and attention to detail.
Skills & Requirements
Elastic Stack (ELK), Elasticsearch, Kibana, Logstash, Beats, APM, Ansible, CloudFormation, AWS Cloud, AWS CloudWatch, IAM roles, AWS security, Automation, Monitoring, Dashboard creation, Alerting, Anomaly detection, Machine learning integration, Uptime monitoring, JIRA integration, Apache Webserver, JEE application servers, SAP integration, Database monitoring, Troubleshooting, Performance optimization, Documentation, Training, Problem-solving, Security analysis.
Company Overview
Adia Health revolutionizes clinical decision support by enhancing diagnostic accuracy and personalizing care. It modernizes the diagnostic process by automating optimal lab test selection and interpretation, utilizing a combination of expert medical insights, real-world data, and artificial intelligence. This approach not only streamlines the diagnostic journey but also ensures precise, individualized patient care by integrating comprehensive medical histories and collective platform knowledge.
Position Overview
We are seeking a talented and experienced Site Reliability Engineer/DevOps Engineer to join our dynamic team. The ideal candidate will be responsible for ensuring the reliability, scalability, and performance of our infrastructure and applications. You will collaborate closely with development, operations, and product teams to automate processes, implement best practices, and improve system reliability.
Key Responsibilities
- Design, implement, and maintain highly available and scalable infrastructure solutions using modern DevOps practices.
- Automate deployment, monitoring, and maintenance processes to streamline operations and increase efficiency.
- Monitor system performance and troubleshoot issues, ensuring timely resolution to minimize downtime and impact on users.
- Implement and manage CI/CD pipelines to automate software delivery and ensure code quality.
- Manage and configure cloud-based infrastructure services to optimize performance and cost.
- Collaborate with development teams to design and implement scalable, reliable, and secure applications.
- Implement and maintain monitoring, logging, and alerting solutions to proactively identify and address potential issues.
- Conduct periodic security assessments and implement appropriate measures to ensure the integrity and security of systems and data.
- Continuously evaluate and implement new tools and technologies to improve efficiency, reliability, and scalability.
- Participate in on-call rotation and respond to incidents promptly to ensure system uptime and availability.
Qualifications
- Bachelor's degree in Computer Science, Engineering, or related field
- Proven experience (5+ years) as a Site Reliability Engineer, DevOps Engineer, or similar role
- Strong understanding of cloud computing principles and experience with AWS
- Experience of building and supporting complex CI/CD pipelines using Github
- Experience of building and supporting infrastructure as a code using Terraform
- Proficiency in scripting and automating tools
- Solid understanding of networking concepts and protocols
- Understanding of security best practices and experience implementing security controls in cloud environments
- Knowing modern security requirements like SOC2, HIPAA, HITRUST will be a solid advantage.
Role : Principal Devops Engineer
About the Client
It is a Product base company that has to build a platform using AI and ML technology for their transportation and logiticsThey also have a presence in the global market
Responsibilities and Requirements
• Experience in designing and maintaining high volume and scalable micro-services architecture on cloud infrastructure
• Knowledge in Linux/Unix Administration and Python/Shell Scripting
• Experience working with cloud platforms like AWS (EC2, ELB, S3, Auto-scaling, VPC, Lambda), GCP, Azure
• Knowledge in deployment automation, Continuous Integration and Continuous Deployment (Jenkins, Maven, Puppet, Chef, GitLab) and monitoring tools like Zabbix, Cloud Watch Monitoring, Nagios
• Knowledge of Java Virtual Machines, Apache Tomcat, Nginx, Apache Kafka, Microservices architecture, Caching mechanisms
• Experience in enterprise application development, maintenance and operations
• Knowledge of best practices and IT operations in an always-up, always-available service
• Excellent written and oral communication skills, judgment and decision-making skill
- At least 5 year of experience in Cloud technologies-AWS and Azure and developing.
- Experience in implementing DevOps practices and DevOps-tools in areas like CI/CD using Jenkins environment automation, and release automation, virtualization, infra as a code or metrics tracking.
- Hands on experience in DevOps tools configuration in different environments.
- Strong knowledge of working with DevOps design patterns, processes and best practices
- Hand-on experience in Setting up Build pipelines.
- Prior working experience in system administration or architecture in Windows or Linux.
- Must have experience in GIT (BitBucket, GitHub, GitLab)
- Hands-on experience on Jenkins pipeline scripting.
- Hands-on knowledge in one scripting language (Nant, Perl, Python, Shell or PowerShell)
- Configuration level skills in tools like SonarQube (or similar tools) and Artifactory.
- Expertise on Virtual Infrastructure (VMWare or VirtualBox or QEMU or KVM or Vagrant) and environment automation/provisioning using SaltStack/Ansible/Puppet/Chef
- Deploying, automating, maintaining and managing Azure cloud based production systems including monitoring capacity.
- Good to have experience in migrating code repositories from one source control to another.
- Hands-on experience in Docker container and orchestration based deployments like Kubernetes, Service Fabric, Docker swarm.
- Must have good communication skills and problem solving skills
Job description
Problem Statement-Solution
Only 10% of India speaks English and 90% speak over 25 languages and 1000s of dialects. The internet has largely been in English. A good part of India is now getting internet connectivity thanks to cheap smartphones and Jio. The non-English speaking internet users will balloon to about 600 million users out of the total 750 million internet users in India by 2020. This will make the vernacular segment one of the largest segments in the world - almost 2x the size of the US population. The vernacular segment has very few products that they can use on the internet.
One large human need is that of sharing thoughts and connecting with people of the same community on the basis of language and common interests. Twitter serves this need globally but the experience is mostly in English. There’s a large unaddressed need for these vernacular users to express themselves in their mother tongue and connect with others from their community. Koo is a solution to this problem.
About Koo
Koo was founded in March 2020, as a micro-blogging platform in both Indian languages and English, which gives a voice to the millions of Indians who communicate in Indian languages.
Currently available in Assamese, Bengali, English, Hindi, Kannada, Marathi, Tamil and Telugu, Koo enables people from across India to express themselves online in their mother tongues. In a country where under 10% of the population speaks English as a native language, Koo meets the need for a social media platform that can deliver an immersive language experience to an Indian user, thereby enabling them to connect and interact with each other. The recently introduced ‘Talk to Type’ enables users to leverage the voice assistant to share their thoughts without having to type. In August 2021, Koo crossed 10 million downloads, in just 16 months of launch.
Since June 2021, Koo is available in Nigeria.
Founding Team
Koo is founded by veteran internet entrepreneurs - Aprameya Radhakrishna (CEO, Taxiforsure) and Mayank Bidawatka (Co-founder, Goodbox & Coreteam, redBus).
Technology Team & Culture
The technology team comprises sharp coders, technology geeks and guys who have been entrepreneurs or are entrepreneurial and extremely passionate towards technology. Talent is coming from the likes of Google, Walmart, Redbus, Dailyhunt. Anyone being part of a technology team will have a lot to learn from their peers and mentors. Download our android app and take a look at what we’ve built. Technology stack compromises of a wide variety of cutting-edge technologies like Kotlin, Java 15, Reactive Programming, MongoDB, Cassandra, Kubernetes, AWS, NodeJS, Python, ReactJS, Redis, Aerospike, ML, Deep learning etc. We believe in giving a lot of independence and autonomy to ownership-driven individuals.
Technology skill sets required for a matching profile
- Experience between 3 to 7 years in a devops role with mandatory one or more stints at a fast paced startup.
- Mandatory experience with containers, kubernetes (EKS) (setting up from scratch), istio and microservices.
- Sound knowledge of technologies like Terraform, Automation Scripts, Cron jobs etc. Must have worked with goals of putting infra as code like setting up new environments with code.
- Knowledge of industry standards around monitoring, alerting, self healing, high availability, auto scaling etc.
- Exhaustive experience with various cloud technologies (especially on AWS) like SQS, SNS, Elastic Search, Elastic Cache, Elastic Transcoder, VPC, Subnets, Security groups etc.
- Must have set up stable CI-CD pipelines with capabilities to do zero downtime deployments on any one of rolling updates, blue green or canary modes.
- Experience with VPN and LDAP solutions for securely login to infrastructure and proving SSO. 8. Master of deploying and troubleshooting all layers of application from network, frontend, backend and databases (MongoDB, Redis, Postgres, Cassandra, ElasticSearch, Aerospike etc).
Location: Bengaluru
Department: DevOps
We are looking for extraordinary infrastructure engineers to build a world class
cloud platform that scales to millions of users. You must have experience
building key portions of a highly scalable infrastructure using Amazon AWS and
should know EC2, S3, EMR like the back of your hand. You must enjoy working
in a fast-paced startup and enjoy wearing multiple hats to get the job done.
Responsibilities
● Manage AWS server farm Own AWS infrastructure automation and
support.
● Own production deployments in multiple AWS environments
● End-end backend engineering infra charter includes Dev ops,Global
deployment, Security and compliances according to latest practices.
Ability to guide the team in debugging production issues and write
best-of-the breed code.
● Drive “engineering excellence” (defects, productivity through automation,
performance of products etc) through clearly defined metrics.
● Stay current with the latest tools, technology ideas and methodologies;
share knowledge by clearly articulating results and ideas to key decision
makers.
● Hiring, mentoring and retaining a very talented team.
Requirements
● B.S. or M.S in Computer Science or a related field (math, physics,
engineering)
● 5-8 years of experience in maintaining infrastructure system/devops
● Enjoy playing with tech like nginx, haproxy, postgres, AWS, ansible,
docker, nagios, or graphite Deployment automation experience with
Puppet/Chef/Ansible/Salt Stack Work with small, tightly knit product
teams that function cohesively to move as quickly as possible.
● Determination to provide reliable and fault tolerant systems to the
application developers that consume them
● Experience in developing Java/C++ backend systems is a huge plus Be a
strong team player.
Preferred
Deep working knowledge of Linux servers and networked environments
Thorough understanding of distributed systems and the protocols they use,
including TCP/IP, RESTful APIs, SQL, NoSQL. Experience in managing a NoSQL
database (Cassandra) is a huge plus.
● Building and managing multiple application environments on AWS using automation tools like Terraform or
Cloudformation etc.
● Deploy applications with zero downtime via automation with configuration management tools such as Ansible.
● Setting up Infrastructure monitoring tools such as Prometheus, Grafana
● Setting up centralised logging using tools such as ELK.
● Containerisation of applications/microservices.
● Ensure application availability to 99.9% with highly available infrastructure.
● Monitoring performance of applications and databases.
● Ensuring that systems are safe and secure against cyber security threats.
● Working with software developers to ensure that release cycle and deployment processes are followed.
● Evaluating existing applications and platforms, give recommendations for enhancing performance via gap analysis,
identifying the most practical alternative solutions and assisting with modifications.
Skills -
● Strong knowledge of AWS Managed Services such as EC2, RDS, ECS, ECR, S3, Cloudfront, SES, Redshift, Elastic Cache,
AMQP etc.
● Experience in handling production workloads.
● Experience with Nginx web server.
● Experience with NoSql and Sql Databases such as MongoDB, Postgresql etc.
● Experience with Containerisation of applications/micro services using Docker.
● Understanding of system administration in Linux environments.
● Strong Knowledge of Infrastructure as a Code such as Terraform, Cloudformation etc.
● Strong knowledge of configuration management tools such as Ansible, Chef etc.
● Familiarity with tools such as GitLab, Jenkins, Vercel, JIRA etc.
● Proficiency in scripting languages including Bash, Python etc.
● Full understanding of software development lifecycle best practices and agile methodology
● Strong communication and documentation skills.
● An ability to drive to goals and milestones while valuing and maintaining a strong attention to detail
● Excellent judgment, analytical thinking, and problem-solving skills
● Self-motivated individual that possesses excellent time management and organizational skills
As DevOps Engineer, you are responsible to setup and maintain GIT repository, DevOps tools like Jenkins, UCD, Docker, Kubernetes, Jfrog Artifactory, Cloud monitoring tools, Cloud security.
- Setup, configure, and maintain GIT repos, Jenkins, UCD, etc. for multi hosting cloud environments.
- Architect and maintain the server infrastructure in AWS. Build highly resilient infrastructure following industry best practices.
- Working on Docker images and maintaining Kubernetes clusters.
- Develop and maintain the automation scripts using Ansible or other available tools.
- Maintain and monitor cloud Kubernetes Clusters and patching when necessary.
- Working on Cloud security tools to keep applications secured.
- Participate in software development lifecycle, specifically infra design, execution, and debugging required to achieve successful implementation of integrated solutions within the portfolio.
- Required Technical and Professional Expertise.
- Minimum 4-6 years of experience in IT industry.
- Expertise in implementing and managing Devops CI/CD pipeline.
- Experience in DevOps automation tools. And Very well versed with DevOps Frameworks, Agile.
- Working knowledge of scripting using shell, Python, Terraform, Ansible or puppet or chef.
- Experience and good understanding in any of Cloud like AWS, Azure, Google cloud.
- Knowledge of Docker and Kubernetes is required.
- Proficient in troubleshooting skills with proven abilities in resolving complex technical issues.
- Experience with working with ticketing tools.
- Middleware technologies knowledge or database knowledge is desirable.
- Experience and well versed with Jira tool is a plus.
We look forward to connecting with you. As you may take time to review this opportunity, we will wait for a reasonable time of around 3-5 days before we screen the collected applications and start lining up job discussions with the hiring manager. However, we assure you that we will attempt to maintain a reasonable time window for successfully closing this requirement. The candidates will be kept informed and updated on the feedback and application status.








