
Lead DevOps Engineer
at Consumer Internet, Technology & Travel and Tourism Platform
Job Details
- Job Title: Lead DevOps Engineer
- Industry: Consumer Internet, Technology & Travel and Tourism Platform
- Function: IT
- Experience Required: 7-10 years
- Employment Type: Full Time
- Job Location: Bengaluru
- CTC Range: Best in Industry
Criteria:
- Strong Lead DevOps / Infrastructure Engineer Profiles.
- Must have 7+ years of hands-on experience working as a DevOps / Infrastructure Engineer.
- Candidate’s current title must be Lead DevOps Engineer (or an equivalent Lead role) in their current organization.
- Must have minimum 2+ years of team management / technical leadership experience, including mentoring engineers, driving infrastructure decisions, or leading DevOps initiatives.
- Must have strong hands-on experience with Kubernetes (container orchestration) including deployment, scaling, and cluster management.
- Must have experience with Infrastructure as Code (IaC) tools such as Terraform, Ansible, Chef, or Puppet.
- Must have strong scripting and automation experience using Python, Go, Bash, or similar scripting languages.
- Must have working experience with distributed databases or data systems such as MongoDB, Redis, Cassandra, or Elasticsearch.
- Must have strong hands-on experience in Observability & Monitoring, CI/CD architecture, and Networking concepts in production environments.
- (Company) – Must be from B2C Product Companies only.
- (Education) – B.E/ B.Tech
Preferred
- Experience working in microservices architecture and event-driven systems.
- Exposure to cloud infrastructure, scalability, reliability, and cost optimization practices.
- (Skills) – Understanding of programming languages such as Go, Python, or Java.
- (Environment) – Experience working in high-growth startup or large-scale production environments.
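The scripting-and-automation criterion above is commonly exercised in small deployment utilities. A minimal retry-with-exponential-backoff helper is one such idiom; this sketch is illustrative and not taken from the posting:

```python
import time

def retry(fn, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying on exception with exponential backoff.

    Waits base_delay * 2**i seconds after the i-th failure and
    re-raises the last exception once all attempts are exhausted.
    """
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            sleep(base_delay * 2 ** i)

# Example: wrap a flaky post-deploy step, e.g.
# retry(check_health, attempts=3, base_delay=2)   # check_health is hypothetical
```

Injecting `sleep` as a parameter keeps the helper unit-testable without real waits.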
Job Description
As a DevOps Engineer, you will work on building and operating infrastructure at scale, designing and implementing a variety of tools to enable product teams to build and deploy their services independently, improving observability across the board, and designing for security, resiliency, availability, and stability. If the prospect of ensuring system reliability at scale and exploring cutting-edge technology to solve problems excites you, then this is the role for you.
Job Responsibilities:
- Own end-to-end infrastructure, from non-prod to prod environments, including self-managed databases
- Codify our infrastructure
- Do what it takes to keep the uptime above 99.99%
- Understand the bigger picture and sail through the ambiguities
- Scale technology with cost and observability in mind, and manage end-to-end processes
- Understand DevOps philosophy and evangelize the principles across the organization
- Communicate and collaborate across teams to break down silos
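The 99.99% uptime target above implies a concrete error budget. A minimal sketch of that standard SLO arithmetic (illustrative, not part of the posting):

```python
def downtime_budget_minutes(slo=0.9999, period_days=30):
    """Minutes of downtime allowed per period at a given availability SLO."""
    return period_days * 24 * 60 * (1 - slo)

# Four nines over a 30-day month leaves roughly 4.32 minutes of downtime.
```

The budget frames operational decisions: once an incident consumes most of it, risky rollouts are typically paused for the rest of the period.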

Required Skills: Advanced AWS Infrastructure Expertise, CI/CD Pipeline Automation, Monitoring, Observability & Incident Management, Security, Networking & Risk Management, Infrastructure as Code & Scripting
Criteria:
- 5+ years of DevOps/SRE experience in cloud-native, product-based companies (B2C scale preferred)
- Strong hands-on AWS expertise across core and advanced services (EC2, ECS/EKS, Lambda, S3, CloudFront, RDS, VPC, IAM, ELB/ALB, Route53)
- Proven experience designing high-availability, fault-tolerant cloud architectures for large-scale traffic
- Strong experience building & maintaining CI/CD pipelines (Jenkins mandatory; GitHub Actions/GitLab CI a plus)
- Prior experience running production-grade microservices deployments and automated rollout strategies (Blue/Green, Canary)
- Hands-on experience with monitoring & observability tools (Grafana, Prometheus, ELK, CloudWatch, New Relic, etc.)
- Solid hands-on experience with MongoDB in production, including performance tuning, indexing & replication
- Strong scripting skills (Bash, Shell, Python) for automation
- Hands-on experience with IaC (Terraform, CloudFormation, or Ansible)
- Deep understanding of networking fundamentals (VPC, subnets, routing, NAT, security groups)
- Strong experience in incident management, root cause analysis & production firefighting
Description
Role Overview
Company is seeking an experienced Senior DevOps Engineer to design, build, and optimize cloud infrastructure on AWS, automate CI/CD pipelines, implement monitoring and security frameworks, and proactively identify scalability challenges. This role requires someone who has hands-on experience running infrastructure at B2C product scale, ideally in media/OTT or high-traffic applications.
Key Responsibilities
1. Cloud Infrastructure — AWS (Primary Focus)
- Architect, deploy, and manage scalable infrastructure using AWS services such as EC2, ECS/EKS, Lambda, S3, CloudFront, RDS, ELB/ALB, VPC, IAM, Route53, etc.
- Optimize cloud cost, resource utilization, and performance across environments.
- Design high-availability, fault-tolerant systems for streaming workloads.
2. CI/CD Automation
- Build and maintain CI/CD pipelines using Jenkins, GitHub Actions, or GitLab CI.
- Automate deployments for microservices, mobile apps, and backend APIs.
- Implement blue/green and canary deployments for seamless production rollouts.
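The blue/green and canary rollouts above reduce to a simple control loop. A minimal, illustrative sketch of the canary decision logic (the step sizes and error threshold are assumptions, not from the posting):

```python
def run_canary(get_error_rate, steps=(5, 25, 50, 100), max_error_rate=0.01):
    """Shift traffic to a new version in steps, checking the canary's
    error rate at each step. Returns the final canary traffic share:
    100 after a full rollout, 0 after a rollback.
    """
    for pct in steps:
        # In a real pipeline this step would reconfigure the load balancer
        # and then query a metrics backend such as Prometheus.
        if get_error_rate(pct) > max_error_rate:
            return 0  # roll back: send all traffic to the stable version
    return 100
```

In practice `get_error_rate` would observe the canary over a soak window at each step rather than sampling instantaneously.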
3. Observability & Monitoring
- Implement logging, metrics, and alerting using tools like Grafana, Prometheus, ELK, CloudWatch, New Relic, etc.
- Perform proactive performance analysis to minimize downtime and bottlenecks.
- Set up dashboards for real-time visibility into system health and user traffic spikes.
4. Security, Compliance & Risk Highlighting
- Conduct frequent risk assessments and identify vulnerabilities in:
  - Cloud architecture
  - Access policies (IAM)
  - Secrets & key management
  - Data flows & network exposure
- Implement security best practices including VPC isolation, WAF rules, firewall policies, and SSL/TLS management.
5. Scalability & Reliability Engineering
- Analyze traffic patterns for OTT-specific load variations (weekends, new releases, peak hours).
- Identify scalability gaps and propose solutions across:
  - Microservices
  - Caching layers
  - CDN distribution (CloudFront)
  - Database workloads
- Perform capacity planning and load testing to ensure readiness for 10x traffic growth.
6. Database & Storage Support
- Administer and optimize MongoDB for high-read/low-latency use cases.
- Design backup, recovery, and data replication strategies.
- Work closely with backend teams to tune query performance and indexing.
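The backup and recovery design mentioned above is often a grandfather-father-son rotation. A minimal illustrative sketch of the retention rule (the retention counts are assumptions):

```python
from datetime import date, timedelta

def prune_backups(backup_dates, keep_daily=7, keep_weekly=4, keep_monthly=12):
    """Dates to retain under a grandfather-father-son rotation: the newest
    keep_daily backups, plus the newest backup of each of the most recent
    keep_weekly ISO weeks and keep_monthly calendar months.
    """
    dates = sorted(set(backup_dates), reverse=True)  # newest first
    keep = set(dates[:keep_daily])
    weeks, months = {}, {}
    for d in dates:  # setdefault keeps the newest backup in each bucket
        weeks.setdefault(d.isocalendar()[:2], d)
        months.setdefault((d.year, d.month), d)
    keep.update(list(weeks.values())[:keep_weekly])
    keep.update(list(months.values())[:keep_monthly])
    return keep
```

The same rule applies whether the backups are MongoDB dumps, snapshots, or oplog archives; only the listing and deletion calls differ.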
7. Automation & Infrastructure as Code
- Implement IaC using Terraform, CloudFormation, or Ansible.
- Automate repetitive infrastructure tasks to ensure consistency across environments.
Required Skills & Experience
Technical Must-Haves
- 5+ years of DevOps/SRE experience in cloud-native, product-based companies.
- Strong hands-on experience with AWS (core and advanced services).
- Expertise in Jenkins CI/CD pipelines.
- Solid background working with MongoDB in production environments.
- Good understanding of networking: VPCs, subnets, security groups, NAT, routing.
- Strong scripting experience (Bash, Python, Shell).
- Experience handling risk identification, root cause analysis, and incident management.
Nice to Have
- Experience with OTT, video streaming, media, or any content-heavy product environments.
- Familiarity with containers (Docker), orchestration (Kubernetes/EKS), and service mesh.
- Understanding of CDN, caching, and streaming pipelines.
Personality & Mindset
- Strong sense of ownership and urgency—DevOps is mission critical at OTT scale.
- Proactive problem solver with ability to think about long-term scalability.
- Comfortable working with cross-functional engineering teams.
Why Join Company?
- Build and operate infrastructure powering millions of monthly users.
- Opportunity to shape DevOps culture and cloud architecture from the ground up.
- High-impact role in a fast-scaling Indian OTT product.
Job Title : DevOps Engineer
Experience : 3+ Years
Location : Mumbai
Employment Type : Full-time
Job Overview :
We’re looking for an experienced DevOps Engineer to design, build, and manage Kubernetes-based deployments for a microservices data discovery platform.
The ideal candidate has strong hands-on expertise with Helm, Docker, CI/CD pipelines, and cloud networking — and can handle complex deployments across on-prem, cloud, and air-gapped environments.
Mandatory Skills :
✅ Helm, Kubernetes, Docker
✅ Jenkins, ArgoCD, GitOps
✅ Cloud Networking (VPCs, bare metal vs. VMs)
✅ Storage (MinIO, Ceph, NFS, S3/EBS)
✅ Air-gapped & multi-tenant deployments
Key Responsibilities :
- Build and customize Helm charts from scratch.
- Implement CI/CD pipelines using Jenkins, ArgoCD, GitOps.
- Manage containerized deployments across on-prem/cloud setups.
- Work on air-gapped and restricted environments.
- Optimize for scalability, monitoring, and security (Prometheus, Grafana, RBAC, HPA).
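The HPA mentioned above scales on a simple ratio rule. A sketch of the core formula the Kubernetes HorizontalPodAutoscaler documents, desired = ceil(current * currentMetric / targetMetric), with its default 0.1 tolerance band:

```python
import math

def desired_replicas(current, current_metric, target_metric, tolerance=0.1):
    """Core scaling rule of the Kubernetes HorizontalPodAutoscaler:
    desired = ceil(current * currentMetric / targetMetric), with no
    change while the metric ratio stays inside the tolerance band.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to target: avoid flapping
    return math.ceil(current * ratio)
```

The tolerance band is what keeps replica counts stable when utilization hovers near the target.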
Job Title : Senior DevOps Engineer
Location : Remote
Experience Level : 5+ Years
Role Overview :
We are a funded AI startup seeking a Senior DevOps Engineer to design, implement, and maintain a secure, scalable, and efficient infrastructure. In this role, you will focus on automating operations, optimizing deployment processes, and enabling engineering teams to deliver high-quality products seamlessly.
Key Responsibilities:
Infrastructure Scalability & Reliability :
- Architect and manage cloud infrastructure on AWS, GCP, or Azure for high availability, reliability, and cost-efficiency.
- Implement container orchestration using Kubernetes or Docker Compose.
- Utilize Infrastructure as Code (IaC) tools like Pulumi or Terraform to manage and configure infrastructure.
Deployment Automation :
- Design and maintain CI/CD pipelines using GitHub Actions, Jenkins, or similar tools.
- Implement deployment strategies such as canary or blue-green deployments, and create rollback mechanisms to ensure seamless updates.
Monitoring & Observability :
- Leverage tools like OpenTelemetry, Grafana, and Datadog to monitor system health and performance.
- Establish centralized logging systems and create real-time dashboards for actionable insights.
Security & Compliance :
- Securely manage secrets using tools like HashiCorp Vault or Doppler.
- Conduct static code analysis with tools such as SonarQube or Snyk to ensure compliance with security standards.
Collaboration & Team Enablement :
- Mentor and guide team members on DevOps best practices and workflows.
- Document infrastructure setups, incident runbooks, and troubleshooting workflows to enhance team efficiency.
Required Skills :
- Expertise in managing cloud platforms like AWS, GCP, or Azure.
- In-depth knowledge of Kubernetes, Docker, and IaC tools like Terraform or Pulumi.
- Advanced scripting capabilities in Python or Bash.
- Proficiency in CI/CD tools such as GitHub Actions, Jenkins, or similar.
- Experience with observability tools like Grafana, OpenTelemetry, and Datadog.
- Strong troubleshooting skills for debugging production systems and optimizing performance.
Preferred Qualifications :
- Experience in scaling AI or ML-based applications.
- Familiarity with distributed systems and microservices architecture.
- Understanding of agile methodologies and DevSecOps practices.
- Certifications in AWS, Azure, or Kubernetes.
What We Offer :
- Opportunity to work in a fast-paced AI startup environment.
- Flexible remote work culture.
- Competitive salary and equity options.
- Professional growth through challenging projects and learning opportunities.
Must Have -
a. Background working with Startups
b. Good knowledge of Kubernetes & Docker
c. Background working in Azure
What you’ll be doing
- Ensure that our applications and environments are stable, scalable, secure and performing as expected.
- Proactively engage and work in alignment with cross-functional colleagues to understand their requirements, contributing to and providing suitable supporting solutions.
- Develop and introduce systems to aid and facilitate rapid growth, including implementation of deployment policies, design and implementation of new procedures, configuration management, and planning for patches and capacity upgrades
- Observability: ensure suitable levels of monitoring and alerting are in place to keep engineers aware of issues.
- Establish runbooks and procedures to keep outages to a minimum. Jump in before users notice that things are off track, then automate it for the future.
- Automate everything so that nothing is ever done manually in production.
- Identify and mitigate reliability and security risks. Make sure we are prepared for peak times, DDoS attacks and fat fingers.
- Troubleshoot issues across the whole stack - software, applications and network.
- Manage individual project priorities, deadlines, and deliverables as part of a self-organizing team.
- Learn and unlearn every day by exchanging knowledge and new insights, conducting constructive code reviews, and participating in retrospectives.
Requirements
- 2+ years of extensive Linux server administration experience, including patching, packaging (RPM), performance tuning, networking, user management, and security.
- 2+ years of implementing systems that are highly available, secure, scalable, and self-healing on the Azure cloud platform
- Strong understanding of networking, especially in cloud environments, along with a good understanding of CI/CD.
- Prior experience implementing industry standard security best practices, including those recommended by Azure
- Proficiency with Bash and any high-level scripting language.
- Basic working knowledge of observability stacks like ELK, Prometheus, Grafana, SigNoz, etc.
- Proficiency with Infrastructure as Code and Infrastructure Testing, preferably using Pulumi/Terraform.
- Hands-on experience in building and administering VMs and Containers using tools such as Docker/Kubernetes.
- Excellent communication skills, spoken as well as written, with a demonstrated ability to articulate technical problems and projects to all stakeholders.
- At least 5 years of experience in cloud technologies (AWS and Azure) and development.
- Experience implementing DevOps practices and tools in areas like CI/CD using Jenkins, environment automation, release automation, virtualization, infrastructure as code, and metrics tracking.
- Hands on experience in DevOps tools configuration in different environments.
- Strong knowledge of working with DevOps design patterns, processes and best practices
- Hands-on experience in setting up build pipelines.
- Prior working experience in system administration or architecture in Windows or Linux.
- Must have experience in Git (Bitbucket, GitHub, GitLab)
- Hands-on experience on Jenkins pipeline scripting.
- Hands-on knowledge of one scripting language (NAnt, Perl, Python, Shell, or PowerShell)
- Configuration level skills in tools like SonarQube (or similar tools) and Artifactory.
- Expertise in virtual infrastructure (VMware, VirtualBox, QEMU, KVM, or Vagrant) and environment automation/provisioning using SaltStack/Ansible/Puppet/Chef
- Deploying, automating, maintaining, and managing Azure cloud-based production systems, including monitoring and capacity.
- Good to have experience in migrating code repositories from one source control to another.
- Hands-on experience in Docker container and orchestration based deployments like Kubernetes, Service Fabric, Docker swarm.
- Must have good communication skills and problem solving skills
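The Infrastructure as Code and Infrastructure Testing requirement above is often satisfied with policy checks over a parsed plan. A minimal illustrative sketch; the resource types and attribute names here are hypothetical, not a real Terraform or Pulumi provider schema:

```python
def plan_violations(resources):
    """Return policy violations for a parsed infrastructure plan, given
    as {resource_name: attribute_dict}. Attribute names are illustrative.
    """
    violations = []
    for name, attrs in sorted(resources.items()):
        if attrs.get("type") == "bucket" and attrs.get("public_read"):
            violations.append(f"{name}: public read access is forbidden")
        if attrs.get("type") == "vm" and "owner" not in attrs.get("tags", {}):
            violations.append(f"{name}: missing required 'owner' tag")
    return violations
```

Run in CI against the plan output, such checks block insecure or untagged resources before anything is provisioned.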
What we look for:
As a DevOps Developer, you will contribute to a thriving and growing AI Governance Engineering team. You will work in a Kubernetes-based microservices environment to support our bleeding-edge cloud services. This will include custom solutions, as well as open-source DevOps tools (build and deploy automation, monitoring, and data gathering for our software delivery pipeline). You will also contribute to our continuous improvement and continuous delivery while increasing the maturity of our DevOps and agile adoption practices.
Responsibilities:
- Ability to deploy software using orchestrators/scripts/automation on hybrid and public clouds like AWS
- Ability to write shell, Python, or other Unix scripts
- Working Knowledge on Docker & Kubernetes
- Ability to create pipelines using Jenkins or any CI/CD tool and GitOps tool like ArgoCD
- Working knowledge of Git as a source control system and defect tracking system
- Ability to debug and troubleshoot deployment issues
- Ability to use tools for faster resolution of issues
- Excellent communication and soft skills
- Passionate, with the ability to work and deliver in a multi-team environment
- Good team player
- Flexible and quick learner
- Ability to write Dockerfiles, Kubernetes YAML files / Helm charts
- Experience with monitoring tools like Nagios, Prometheus and visualisation tools such as Grafana.
- Ability to write Ansible and Terraform scripts
- Linux system administration experience
- Effective cross-functional leadership skills: working with engineering and operational teams to ensure systems are secure, scalable, and reliable.
- Ability to review deployment and operational environments, i.e., execute initiatives to reduce failure, troubleshoot issues across the entire infrastructure stack, expand monitoring capabilities, and manage technical operations.
- 5+ years of experience in DevOps including automated system configuration, application deployment, and infrastructure-as-code.
- Advanced Linux system administration abilities.
- Real-world experience managing large-scale AWS or GCP environments. Multi-account management a plus.
- Experience with managing production environments on AWS or GCP.
- Solid understanding of CI/CD pipelines using GitHub, CircleCI/Jenkins, JFrog Artifactory/Nexus.
- Experience on any configuration management tools like Ansible, Puppet or Chef is a must.
- Experience in any one of the scripting languages: Shell, Python, etc.
- Experience in containerization using Docker and orchestration using Kubernetes/EKS/GKE is a must.
- Solid understanding of SSL and DNS.
- Experience on deploying and running any open-source monitoring/graphing solution like Prometheus, Grafana, etc.
- Basic understanding of networking concepts.
- Always adhere to security best practices.
- Knowledge of Big Data (Hadoop/Druid) systems administration will be a plus.
- Knowledge of managing and running DBs (MySQL/MariaDB/Postgres) will be an added advantage.
What you get to do
- Work with development teams to build and maintain cloud environments to specifications developed closely with multiple teams. Support and automate the deployment of applications into those environments
- Diagnose and resolve occurring, latent and systemic reliability issues across entire stack: hardware, software, application and network. Work closely with development teams to troubleshoot and resolve application and service issues
- Continuously improve Conviva SaaS services and infrastructure for availability, performance and security
- Implement security best practices – primarily patching of operating systems and applications
- Automate everything. Build proactive monitoring and alerting tools. Provide standards, documentation, and coaching to developers.
- Participate in 12x7 on-call rotations
- Work with third party service/support providers for installations, support related calls, problem resolutions etc.
• At least 4 years of hands-on experience with cloud infrastructure on GCP
• Hands-on experience with Kubernetes is mandatory
• Exposure to configuration management and orchestration tools at scale (e.g. Terraform, Ansible, Packer)
• Knowledge and hands-on experience with DevOps tools (e.g. Jenkins, Groovy, and Gradle)
• Knowledge and hands-on experience with the various platforms (e.g. GitLab, CircleCI, and Spinnaker)
• Familiarity with monitoring and alerting tools (e.g. CloudWatch, ELK stack, Prometheus)
• Proven ability to work independently or as an integral member of a team
Preferable Skills:
• Familiarity with standard IT security practices such as encryption, credentials and key management.
• Proven experience in various coding languages (Java, Python) to support DevOps operations and cloud transformation
• Familiarity and knowledge of the web standards (e.g. REST APIs, web security mechanisms)
• Hands-on experience with GCP
• Experience in performance tuning, service outage management, and troubleshooting.
Attributes:
• Good verbal and written communication skills
• Exceptional leadership, time management, and organizational skills. Ability to operate independently and make decisions with little direct supervision
Job Description
- Experienced in cloud (AWS, DigitalOcean, Google Cloud & Azure) development and system operations.
- Cloud storage services, and prior experience in designing and building infrastructure components in Amazon, DigitalOcean, or other cloud providers.
- Strong Linux experience (Red Hat 6.x/CentOS 5.x & 6.x, Debian).
- Expert knowledge of Git and Jenkins.
- Should have experience in CI/CD automation using tools like Jenkins, Ansible, Puppet/MCollective, Chef, Docker, Kubernetes, Git, Terraform, Packer, HashiCorp Vault, and Python scripting.
- Good working knowledge of scripting languages such as Python and PHP
- Good communication and presentation skills
Preferred Education:
Bachelor's Degree or global equivalent in Computer Science or related field
- Mandatory: Docker, AWS, Linux, Kubernetes or ECS
- Prior experience provisioning and spinning up AWS Clusters / Kubernetes
- Production experience to build scalable systems (load balancers, memcached, master/slave architectures)
- Experience supporting a managed cloud services infrastructure
- Ability to maintain, monitor and optimise production database servers
- Prior work with Cloud Monitoring tools (Nagios, Cacti, CloudWatch etc.)
- Experience with Docker, Kubernetes, Mesos, NoSQL databases (DynamoDB, Cassandra, MongoDB, etc)
- Other Open Source tools used in the infrastructure space (Packer, Terraform, Vagrant, etc.)
- In-depth knowledge on Linux Environment.
- Prior experience leading technical teams through the design and implementation of systems infrastructure projects.
- Working knowledge of Configuration Management (Chef, Puppet, or Ansible preferred) and Continuous Integration tools (Jenkins preferred)
- Experience in handling large production deployments and infrastructure.
- DevOps based infrastructure and application deployments experience.
- Working knowledge of the AWS network architecture including designing VPN solutions between regions and subnets
- Hands-on knowledge with the AWS AMI architecture including the development of machine templates and blueprints
- Should be able to validate that the environment meets all security and compliance controls.
- Good working knowledge of AWS services such as Messaging, Application Services, Migration Services, Cost Management Platform.
- Proven written and verbal communication skills.
- Understands and can serve as the technical team lead to oversee the build of the Cloud environment based on customer requirements.
- Previous NOC experience.
- Client Facing Experience with excellent Customer Communication and Documentation Skills