
We are looking for a skilled DevOps Engineer with strong expertise in Microsoft Azure and Pulumi to join our growing engineering team. The ideal candidate will be responsible for designing, implementing, and maintaining scalable cloud infrastructure and CI/CD pipelines. This role requires hands-on experience with Infrastructure as Code (IaC), automation, and cloud-native DevOps practices while collaborating closely with development and platform teams.
The candidate must be comfortable working in a U.S. time zone overlap, ensuring smooth collaboration with onshore teams.
Key Responsibilities
- Design, implement, and maintain cloud infrastructure on Microsoft Azure using modern DevOps practices.
- Develop and manage Infrastructure as Code (IaC) using Pulumi to automate infrastructure provisioning and management.
- Build, maintain, and optimize CI/CD pipelines to support reliable and frequent deployments.
- Monitor, troubleshoot, and improve cloud infrastructure performance, availability, and security.
- Collaborate with development teams to improve deployment processes and application reliability.
- Implement automation and configuration management to streamline operations.
- Manage containerized applications and orchestration platforms when applicable.
- Ensure adherence to cloud security best practices, governance, and compliance standards.
- Support production environments and resolve infrastructure-related issues in a timely manner.
- Maintain documentation for infrastructure architecture, deployment pipelines, and operational procedures.
Required Skills & Experience
- 4–7 years of experience in DevOps, Cloud Engineering, or related roles.
- Strong hands-on experience with Microsoft Azure cloud services.
- Proven experience using Pulumi for Infrastructure as Code (must-have).
- Experience building and managing CI/CD pipelines using tools like Azure DevOps, GitHub Actions, Jenkins, or similar.
- Strong experience with containerization technologies (Docker, Kubernetes).
- Proficiency with scripting languages such as Python, Bash, or PowerShell.
- Experience with monitoring and logging tools.
- Strong understanding of cloud networking, security, and identity management.
- Experience with Git-based version control systems.
Preferred Qualifications
- Experience with Kubernetes in Azure (AKS).
- Familiarity with observability tools such as Prometheus, Grafana, or similar.
- Experience with cloud cost optimization and performance tuning.
- Knowledge of DevSecOps practices and security automation.
- Azure certifications such as Azure DevOps Engineer Expert or Azure Administrator.

Similar jobs
Required Skills: Advanced AWS Infrastructure Expertise, CI/CD Pipeline Automation, Monitoring, Observability & Incident Management, Security, Networking & Risk Management, Infrastructure as Code & Scripting
Criteria:
- 5+ years of DevOps/SRE experience in cloud-native, product-based companies (B2C scale preferred)
- Strong hands-on AWS expertise across core and advanced services (EC2, ECS/EKS, Lambda, S3, CloudFront, RDS, VPC, IAM, ELB/ALB, Route53)
- Proven experience designing high-availability, fault-tolerant cloud architectures for large-scale traffic
- Strong experience building & maintaining CI/CD pipelines (Jenkins mandatory; GitHub Actions/GitLab CI a plus)
- Prior experience running production-grade microservices deployments and automated rollout strategies (Blue/Green, Canary)
- Hands-on experience with monitoring & observability tools (Grafana, Prometheus, ELK, CloudWatch, New Relic, etc.)
- Solid hands-on experience with MongoDB in production, including performance tuning, indexing & replication
- Strong scripting skills (Bash, Shell, Python) for automation
- Hands-on experience with IaC (Terraform, CloudFormation, or Ansible)
- Deep understanding of networking fundamentals (VPC, subnets, routing, NAT, security groups)
- Strong experience in incident management, root cause analysis & production firefighting
Description
Role Overview
Company is seeking an experienced Senior DevOps Engineer to design, build, and optimize cloud infrastructure on AWS, automate CI/CD pipelines, implement monitoring and security frameworks, and proactively identify scalability challenges. This role requires someone who has hands-on experience running infrastructure at B2C product scale, ideally in media/OTT or high-traffic applications.
Key Responsibilities
1. Cloud Infrastructure — AWS (Primary Focus)
- Architect, deploy, and manage scalable infrastructure using AWS services such as EC2, ECS/EKS, Lambda, S3, CloudFront, RDS, ELB/ALB, VPC, IAM, Route53, etc.
- Optimize cloud cost, resource utilization, and performance across environments.
- Design high-availability, fault-tolerant systems for streaming workloads.
2. CI/CD Automation
- Build and maintain CI/CD pipelines using Jenkins, GitHub Actions, or GitLab CI.
- Automate deployments for microservices, mobile apps, and backend APIs.
- Implement blue/green and canary deployments for seamless production rollouts.
3. Observability & Monitoring
- Implement logging, metrics, and alerting using tools like Grafana, Prometheus, ELK, CloudWatch, New Relic, etc.
- Perform proactive performance analysis to minimize downtime and bottlenecks.
- Set up dashboards for real-time visibility into system health and user traffic spikes.
4. Security, Compliance & Risk Highlighting
• Conduct frequent risk assessments and identify vulnerabilities in:
o Cloud architecture
o Access policies (IAM)
o Secrets & key management
o Data flows & network exposure
• Implement security best practices including VPC isolation, WAF rules, firewall policies, and SSL/TLS management.
5. Scalability & Reliability Engineering
- Analyze traffic patterns for OTT-specific load variations (weekends, new releases, peak hours).
- Identify scalability gaps and propose solutions across:
- o Microservices
- o Caching layers
- o CDN distribution (CloudFront)
- o Database workloads
- Perform capacity planning and load testing to ensure readiness for 10x traffic growth.
6. Database & Storage Support
- Administer and optimize MongoDB for high-read/low-latency use cases.
- Design backup, recovery, and data replication strategies.
- Work closely with backend teams to tune query performance and indexing.
7. Automation & Infrastructure as Code
- Implement IaC using Terraform, CloudFormation, or Ansible.
- Automate repetitive infrastructure tasks to ensure consistency across environments.
Required Skills & Experience
Technical Must-Haves
- 5+ years of DevOps/SRE experience in cloud-native, product-based companies.
- Strong hands-on experience with AWS (core and advanced services).
- Expertise in Jenkins CI/CD pipelines.
- Solid background working with MongoDB in production environments.
- Good understanding of networking: VPCs, subnets, security groups, NAT, routing.
- Strong scripting experience (Bash, Python, Shell).
- Experience handling risk identification, root cause analysis, and incident management.
Nice to Have
- Experience with OTT, video streaming, media, or any content-heavy product environments.
- Familiarity with containers (Docker), orchestration (Kubernetes/EKS), and service mesh.
- Understanding of CDN, caching, and streaming pipelines.
Personality & Mindset
- Strong sense of ownership and urgency—DevOps is mission critical at OTT scale.
- Proactive problem solver with ability to think about long-term scalability.
- Comfortable working with cross-functional engineering teams.
Why Join company?
• Build and operate infrastructure powering millions of monthly users.
• Opportunity to shape DevOps culture and cloud architecture from the ground up.
• High-impact role in a fast-scaling Indian OTT product.
REVIEW CRITERIA:
MANDATORY:
- Strong Hands-On AWS Cloud Engineering / DevOps Profile
- Mandatory (Experience 1): Must have 12+ years of experience in AWS Cloud Engineering / Cloud Operations / Application Support
- Mandatory (Experience 2): Must have strong hands-on experience supporting AWS production environments (EC2, VPC, IAM, S3, ALB, CloudWatch)
- Mandatory (Infrastructure as a code): Must have hands-on Infrastructure as Code experience using Terraform in production environments
- Mandatory (AWS Networking): Strong understanding of AWS networking and connectivity (VPC design, routing, NAT, load balancers, hybrid connectivity basics)
- Mandatory (Cost Optimization): Exposure to cost optimization and usage tracking in AWS environments
- Mandatory (Core Skills): Experience handling monitoring, alerts, incident management, and root cause analysis
- Mandatory (Soft Skills): Strong communication skills and stakeholder coordination skills
ROLE & RESPONSIBILITIES:
We are looking for a hands-on AWS Cloud Engineer to support day-to-day cloud operations, automation, and reliability of AWS environments. This role works closely with the Cloud Operations Lead, DevOps, Security, and Application teams to ensure stable, secure, and cost-effective cloud platforms.
KEY RESPONSIBILITIES:
- Operate and support AWS production environments across multiple accounts
- Manage infrastructure using Terraform and support CI/CD pipelines
- Support Amazon EKS clusters, upgrades, scaling, and troubleshooting
- Build and manage Docker images and push to Amazon ECR
- Monitor systems using CloudWatch and third-party tools; respond to incidents
- Support AWS networking (VPCs, NAT, Transit Gateway, VPN/DX)
- Assist with cost optimization, tagging, and governance standards
- Automate operational tasks using Python, Lambda, and Systems Manager
IDEAL CANDIDATE:
- Strong hands-on AWS experience (EC2, VPC, IAM, S3, ALB, CloudWatch)
- Experience with Terraform and Git-based workflows
- Hands-on experience with Kubernetes / EKS
- Experience with CI/CD tools (GitHub Actions, Jenkins, etc.)
- Scripting experience in Python or Bash
- Understanding of monitoring, incident management, and cloud security basics
NICE TO HAVE:
- AWS Associate-level certifications
- Experience with Karpenter, Prometheus, New Relic
- Exposure to FinOps and cost optimization practices
We are looking for a passionate DevOps Engineer who can support deployment and monitor our Production, QE, and Staging environments performance. Applicants should have a strong understanding of UNIX internals and should be able to clearly articulate how it works. Knowledge of shell scripting & security aspects is a must. Any experience with infrastructure as code is a big plus. The key responsibility of the role is to manage deployments, security, and support of business solutions. Having experience in database applications like Postgres, ELK, NodeJS, NextJS & Ruby on Rails is a huge plus. At VakilSearch. Experience doesn't matter, passion to produce change matters
Responsibilities and Accountabilities:
- As part of the DevOps team, you will be responsible for configuration, optimization, documentation, and support of the infra components of VakilSearch’s product which are hosted in cloud services & on-prem facility
- Design, build tools and framework that support deploying and managing our platform & Exploring new tools, technologies, and processes to improve speed, efficiency, and scalability
- Support and troubleshoot scalability, high availability, performance, monitoring, backup, and restore of different Env
- Manage resources in a cost-effective, innovative manner including assisting subordinates ineffective use of resources and tools
- Resolve incidents as escalated from Monitoring tools and Business Development Team
- Implement and follow security guidelines, both policy and technology to protect our data
- Identify root cause for issues and develop long-term solutions to fix recurring issues and Document it
- Strong in performing production operation activities even at night times if required
- Ability to automate [Scripts] recurring tasks to increase velocity and quality
- Ability to manage and deliver multiple project phases at the same time
I Qualification(s):
- Experience in working with Linux Server, DevOps tools, and Orchestration tools
- Linux, AWS, GCP, Azure, CompTIA+, and any other certification are a value-add
II Experience Required in DevOps Aspects:
- Length of Experience: Minimum 1-4 years of experience
- Nature of Experience:
- Experience in Cloud deployments, Linux administration[ Kernel Tuning is a value add ], Linux clustering, AWS, virtualization, and networking concepts [ Azure, GCP value add ]
- Experience in deployment solutions CI/CD like Jenkins, GitHub Actions [ Release Management is a value add ]
- Hands-on experience in any of the configuration management IaC tools like Chef, Terraform, and CloudFormation [ Ansible & Puppet is a value add ]
- Administration, Configuring and utilizing Monitoring and Alerting tools like Prometheus, Grafana, Loki, ELK, Zabbix, Datadog, etc
- Experience with Containerization and orchestration tools like Docker, and Kubernetes [ Docker swarm is a value add ]Good Scripting skills in at least one interpreted language - Shell/bash scripting or Ruby/Python/Perl
- Experience in Database applications like PostgreSQL, MongoDB & MySQL [DataOps]
- Good at Version Control & source code management systems like GitHub, GIT
- Experience in Serverless [ Lambda/GCP cloud function/Azure function ]
- Experience in Web Server Nginx, and Apache
- Knowledge in Redis, RabbitMQ, ELK, REST API [ MLOps Tools is a value add ]
- Knowledge in Puma, Unicorn, Gunicorn & Yarn
- Hands-on VMWare ESXi/Xencenter deployments is a value add
- Experience in Implementing and troubleshooting TCP/IP networks, VPN, Load Balancing & Web application firewalls
- Deploying, Configuring, and Maintaining Linux server systems ON premises and off-premises
- Code Quality like SonarQube is a value-add
- Test Automation like Selenium, JMeter, and JUnit is a value-add
- Experience in Heroku and OpenStack is a value-add
- Experience in Identifying Inbound and Outbound Threats and resolving it
- Knowledge of CVE & applying the patches for OS, Ruby gems, Node, and Python packages
- Documenting the Security fix for future use
- Establish cross-team collaboration with security built into the software development lifecycle
- Forensics and Root Cause Analysis skills are mandatory
- Weekly Sanity Checks of the on-prem and off-prem environment
III Skill Set & Personality Traits required:
- An understanding of programming languages such as Ruby, NodeJS, ReactJS, Perl, Java, Python, and PHP
- Good written and verbal communication skills to facilitate efficient and effective interaction with peers, partners, vendors, and customers
IV Age Group: 21 – 36 Years
V Cost to the Company: As per industry standards
Sr.DevOps Engineer (5 to 8 yrs. Exp.)
Location: Ahmedabad
- Strong Experience in Infrastructure provisioning in cloud using Terraform & AWS CloudFormation Templates.
- Strong Experience in Serverless Containerization technologies such as Kubernetes, Docker etc.
- Strong Experience in Jenkins & AWS Native CI/CD implementation using code
- Strong Experience in Cloud operational automation using Python, Shell script, AWS CLI, AWS Systems Manager, AWS Lamnda, etc.
- Day to Day AWS Cloud administration tasks
- Strong Experience in Configuration management using Ansible and PowerShell.
- Strong Experience in Linux and any scripting language must required.
- Knowledge of Monitoring tool will be added advantage.
- Understanding of DevOps practices which involves Continuous Integration, Delivery and Deployment.
- Hands on with application deployment process
Key Skills: AWS, terraform, Serverless, Jenkins,Devops,CI/CD,Python,CLI,Linux,Git,Kubernetes
Role: Software Developer
Industry Type: IT-Software, Software Services
FunctionalArea:ITSoftware- Application Programming, Maintenance
Employment Type: Full Time, Permanent
Education: Any computer graduate.
Salary: Best in Industry.Role : Principal Devops Engineer
About the Client
It is a Product base company that has to build a platform using AI and ML technology for their transportation and logiticsThey also have a presence in the global market
Responsibilities and Requirements
• Experience in designing and maintaining high volume and scalable micro-services architecture on cloud infrastructure
• Knowledge in Linux/Unix Administration and Python/Shell Scripting
• Experience working with cloud platforms like AWS (EC2, ELB, S3, Auto-scaling, VPC, Lambda), GCP, Azure
• Knowledge in deployment automation, Continuous Integration and Continuous Deployment (Jenkins, Maven, Puppet, Chef, GitLab) and monitoring tools like Zabbix, Cloud Watch Monitoring, Nagios
• Knowledge of Java Virtual Machines, Apache Tomcat, Nginx, Apache Kafka, Microservices architecture, Caching mechanisms
• Experience in enterprise application development, maintenance and operations
• Knowledge of best practices and IT operations in an always-up, always-available service
• Excellent written and oral communication skills, judgment and decision-making skill
Required qualifications and must have skills
-
5+ years of experience managing a team of 5+ infrastructure software engineers
-
5+ years of experience in building and scaling technical infrastructure
-
5+ years of experience in delivering software
-
Experience leading by influence in multi-team, cross-functional projects
-
Demonstrated experience recruiting and managing technical teams, including performance management and managing engineers
-
Experience with cloud service providers such as AWS, GCP, or Azure
-
Experience with containerization technologies such as Kubernetes and Docker
Nice to have Skills
-
Experience with Hadoop, Hive and Presto
-
Application/infrastructure benchmarking and optimization
-
Familiarity with modern CI/CD practices
-
Familiarity with reliability best practices
Requirements
This person MUST have:
- B.E Computer Science or equivalent
- 2+ Years of hands-on experience troubleshooting/setting up of the Linux environment, who can write shell scripts for any given requirement.
- 1+ Years of hands-on experience setting up/configuring AWS or GCP services from SCRATCH and maintaining them.
- 1+ Years of hands-on experience setting up/configuring Kubernetes & EKS and ensuring high availability of container orchestration.
- 1+ Years of hands-on experience setting up CICD from SCRATCH in Jenkins & Gitlab.
- Experience configuring/maintaining one monitoring tool.
- Excellent verbal & written communication skills.
- Candidates with certifications - AWS, GCP, CKA, etc will be preferred
- Hands-on experience with databases (Cassandra, MongoDB, MySQL, RDS).
Experience:
- Min 3 years of experience as SRE automation engineer building, running, and maintaining production sites. Not looking for candidates who have experience only as L1/L2.
Location:
- Remotely, anywhere in India
Timings:
- The person is expected to deliver with both high speed and high quality as well as work for 40 Hours per week (~6.5 hours per day, 6 days per week) in shifts which will rotate every month.
Position:
- Full time/Direct
- We have great benefits such as PF, medical insurance, 12 annual company holidays, 12 PTO leaves per year, annual increments, Diwali bonus, spot bonuses and other incentives etc.
- We dont believe in locking in people with large notice periods. You will stay here because you love the company. We have only a 15 days notice period.
About the Company
- 💰 Early-stage, ed-tech, funded, growing, growing fast
- 🎯 Mission Driven: Make Indonesia competitive on a global scale
- 🥅 Build the best educational content and technology to advance STEM education
- 🥇 Students-First approach
- 🇮🇩 🇮🇳 Teams in India and Indonesia
Skillset 🧗🏼♀️
- You primarily identify as a DevOps/Infrastructure engineer and are comfortable working with systems and cloud-native services on AWS
- You can design, implement, and maintain secure and scalable infrastructure delivering cloud-based services
- You have experience operating and maintaining production systems in a Linux based public cloud environment
- You are familiar with cloud-native concepts - Containers, Lambdas, Orchestration (ECS, Kubernetes)
- You’re in love with system metrics and strive to help deliver improvements to systems all the time
- You can think in terms of Infrastructure as Code to build tools for automating deployment, monitoring, and operations of the platform
- You can be on-call once every few weeks to provide application support, incident management, and troubleshooting
- You’re fairly comfortable with GIT, AWS CLI, python, docker CLI, in general, all things CLI. Oh! Bash scripting too!
- You have high integrity, and you are reliable
What you can expect from us 👌🏼
☮️ Mentorship, growth, great work culture
- Mentorship and continuous improvement are a part of the team’s DNA. We have a battle-tested robust growth framework. You will have people to look up to and people looking up to you
- We are a people-first, high-trust, high-autonomy team
- We live in the TDD, Pair Programming, First Principles world
🌏 Remote done right
- Distributed does not mean working in isolation, feeling alone, being buried in Zoom calls
- Our leadership team has been WFH for 10+ years now and we know how remote teams work. This will be a place to belong
- A good balance between deep focussed work and collaborative work ⚖️
🖥️ Friendly, humane interview process
- 30-minute alignment check and screening call
- A short take-home coding assignment, no more than 2-3 hours. Time is precious
- Pair programming interview. Collaborate, work together. No sitting behind a desk and judging
- In-depth engineering discussion around your skills and career so far
- System design and architecture interview for seniors
What we ask from you👇🏼
- Bring your software engineering — both individual brilliance and collaborative skills
- Bring your good nature — we're building a team that supports each other
- Be vested or interested in the company vision
Responsibilities
- Building and maintenance of resilient and scalable production infrastructure
- Improvement of monitoring systems
- Creation and support of development automation processes (CI / CD)
- Participation in infrastructure development
- Detection of problems in architecture and proposing of solutions for solving them
- Creation of tasks for system improvements for system scalability, performance and monitoring
- Analysis of product requirements in the aspect of devops
- Managing a team of DevOps, control of task deliveries
- Incident analysis and fixing
Technology stack
Linux, Bash, Salt/Ansible, LXC, libvirt, IPsec, VXLAN, Open vSwitch, OpenVPN, OSPF, BIRD, Cisco NX-OS, Multicast, PIM, LVM, software RAID, LUKS, PostgreSQL, nginx, haproxy, Prometheus, Grafana, Zabbix, GitLab, Capistrano
Skills and Experience
- Understanding of the distributed systems principles
- Understanding of principles for building a resistant network infrastructure
- Experience of Ubuntu Linux administration (Debian-like will be a plus)
- Strong knowledge of Bash
- Experience of working with LXC-containers
- Understanding and experience with infrastructure as a code approach
- Experience of development idempotent Ansible roles
- Experience with relational databases (PostgeSQL), ability to create simple SQL queries
- Experience with git
- Experience with monitoring and metric collect systems (Prometheus, Grafana, Zabbix)
- Understanding of dynamic routing (OSPF)
Preferred experience
- Experience of working with highload zero-downtown environments
- Experience of coding on Python
- Experience of working with IPsec, VXLAN, Open vSwitch
- Knowledge and experience of working with network equipment Cisco
- Experience of working with Cisco NX-OS
- Knowledge of principles of multicast protocols IGMP, PIM
- Experience of setting multicast on Cisco equipment
- Experience of working with Solarflare Onload
- Experience administering Atlassian products









