
What are we looking for??
- You have a good understanding and work experience in AKS, Kubernetes, and EKS.
- You are able to manage multi region clusters for disaster recovery.
- You have a good understanding of AWS stack.
- You have experience of production level in Kubernetes.
- You are comfortable coding/programming and can do so whenever required.
- You have worked with programmable infrastructure in some way - Built a CI/CD pipeline, Provisioned infrastructure programmatically or Provisioned monitoring and logging infrastructure for large sets of machines.
- You love automating things, sometimes even what seems like you can’t automate - such as one of our engineers used Ansible to set up the Ubuntu workstation and runs a playbook every time something has to be installed.
- You don’t throw around words such as “high availability” or “resilient systems” without understanding at least their basics. Because you know that words are easy to talk about but there is a fair amount of work to build such a system in practice.
- You love coaching people - about the 12-factor apps or the latest tool that reduced your time of doing a task by X times and so on. You lead by example when it comes to technical work and community.
- You understand the areas you have worked on very well but, you are curious about many systems that you may not have worked on and want to fiddle with them.
- You know that understanding applications and the runtime technologies gives you a better perspective - you never looked at them as two different things.
What you will be learning and doing?
- You will be working with customers trying to transform their applications and adopt cloud-native technologies. The technologies used will be Kubernetes, Prometheus, Service Mesh, Distributed tracing and public cloud technologies or on-premise infrastructure.
- The problems and solutions are continuously evolving in space but fundamentally you will be solving problems with simplest and scalable automation.
- You will be building open source tools for problems that you think are common across customers and industry. No one ever benefited from re-inventing the wheel, did they?
- You will be hacking around open source projects, understand their capabilities, limitations and apply the right tool for the right job.
- You will be educating the customers - from their operations engineers to developers on scalable ways to build and operate applications in modern cloud-native infrastructure.

About Improving
About
Improving is a leading IT professional services firm committed to helping companies achieve lasting success through modern technology. With core expertise in AI, Data, and Applications, we specialize in transforming legacy systems, building cloud-native platforms, and delivering intelligent, future-ready solutions for today’s complex business needs. Improving’s leaders are equally committed to fostering a great place to work that is inclusive and purpose-centered, empowering Improvers to bring their whole selves to work. Our team is known for its collaborative approach and long-term partnerships that prioritize measurable outcomes. By combining technical excellence with strategic insight, Improving enables all stakeholders to grow, adapt, and lead in an ever-evolving digital landscape.
Tech stack
Similar jobs
JOB DETAILS:
- Job Title: Lead DevOps Engineer
- Industry: Ride-hailing
- Experience: 6-9 years
- Working Days: 5 days/week
- Work Mode: ONSITE
- Job Location: Bangalore
- CTC Range: Best in Industry
Required Skills: Cloud & Infrastructure Operations, Kubernetes & Container Orchestration, Monitoring, Reliability & Observability, Proficiency with Terraform, Ansible etc., Strong problem-solving skills with scripting (Python/Go/Shell)
Criteria:
1. Candidate must be from a product-based or scalable app-based start-ups company with experience handling large-scale production traffic.
2. Minimum 6 yrs of experience working as a DevOps/Infrastructure Consultant
3. Candidate must have 2 years of experience as an lead (handling team of 3 to 4 members at least)
4. Own end-to-end infrastructure right from non-prod to prod environment including self-managed
5. Candidate must have Self experience in database migration from scratch
6. Must have a firm hold on the container orchestration tool Kubernetes
7. Should have expertise in configuration management tools like Ansible, Terraform, Chef / Puppet
8. Understanding programming languages like GO/Python, and Java
9. Working on databases like Mongo/Redis/Cassandra/Elasticsearch/Kafka.
10. Working experience on Cloud platform -AWS
11. Candidate should have Minimum 1.5 years stability per organization, and a clear reason for relocation.
Description
Job Summary:
As a DevOps Engineer at company, you will be working on building and operating infrastructure at scale, designing and implementing a variety of tools to enable product teams to build and deploy their services independently, improving observability across the board, and designing for security, resiliency, availability, and stability. If the prospect of ensuring system reliability at scale and exploring cutting-edge technology to solve problems, excites you, then this is your fit.
Job Responsibilities:
● Own end-to-end infrastructure right from non-prod to prod environment including self-managed DBs
● Codify our infrastructure
● Do what it takes to keep the uptime above 99.99%
● Understand the bigger picture and sail through the ambiguities
● Scale technology considering cost and observability and manage end-to-end processes
● Understand DevOps philosophy and evangelize the principles across the organization
● Strong communication and collaboration skills to break down the silos
Job Requirements:
● B.Tech. / B.E. degree in Computer Science or equivalent software engineering degree/experience
● Minimum 6 yrs of experience working as a DevOps/Infrastructure Consultant
● Must have a firm hold on the container orchestration tool Kubernetes
● Must have expertise in configuration management tools like Ansible, Terraform, Chef / Puppet
● Strong problem-solving skills, and ability to write scripts using any scripting language
● Understanding programming languages like GO/Python, and Java
● Comfortable working on databases like Mongo/Redis/Cassandra/Elasticsearch/Kafka.
What’s there for you?
Company’s team handles everything – infra, tooling, and self-manages a bunch of databases, such as
● 150+ microservices with event-driven architecture across different tech stacks Golang/ java/ node
● More than 100,000 Request per second on our edge gateways
● ~20,000 events per second on self-managed Kafka
● 100s of TB of data on self-managed databases
● 100s of real-time continuous deployment to production
● Self-managed infra supporting
● 100% OSS
Key Responsibilities
DevOps Strategy & Leadership
- Define and execute the end-to-end DevOps strategy for high-frequency trading and fintech platforms.
- Lead, mentor, and scale a high-performing DevOps team focused on automation, reliability, and performance.
- Partner closely with engineering and product leaders to ensure infrastructure strategy supports business and technical goals.
CI/CD & Infrastructure Automation
- Architect, implement, and optimize enterprise-grade CI/CD pipelines for ultra-low-latency trading systems.
- Drive Infrastructure as Code (IaC) adoption using Terraform, Helm, Kubernetes, and advanced automation toolsets.
- Establish robust release management, deployment workflows, and versioning best practices for mission‑critical environments.
Cloud & On‑Prem Infrastructure Management
- Design and manage hybrid infrastructures across AWS, GCP, and on-premise data centers ensuring high availability and fault tolerance.
- Implement sophisticated networking strategies for low-latency workloads including routing optimization and performance tuning.
- Lead multi‑cloud scalability, cost optimization, and environment standardization initiatives.
Performance Monitoring & Optimization
- Oversee large-scale monitoring systems using Prometheus, Grafana, ELK, and related observability tools.
- Implement predictive alerting, automated remediation, and system‑wide health checks for zero‑downtime operations.
- Conduct root-cause analyses and performance tuning for systems processing millions of transactions per second.
Security & Compliance
- Champion DevSecOps practices and embed security across the entire development and deployment lifecycle.
- Ensure adherence to financial regulatory standards (SEBI and global frameworks) with strong audit and compliance mechanisms.
- Lead security automation efforts, vulnerability management, and advanced IAM policy implementation.
Required Skills & Qualifications
- 10+ years of DevOps experience, with 5+ years in a leadership capacity.
- Deep hands-on expertise in CI/CD tools such as Jenkins, GitLab CI/CD, and ArgoCD.
- Strong command of AWS, GCP, and hybrid cloud infrastructures.
- Expert-level knowledge of Kubernetes, Docker, and large-scale container orchestration.
- Advanced proficiency in Terraform, Helm, and overall IaC workflows.
- Strong Linux administration, networking fundamentals (TCP/IP, DNS, Firewalls), and system internals.
- Experience with monitoring and observability platforms (Prometheus, Grafana, ELK).
- Excellent scripting skills in Python, Bash, or Go for automation and tooling.
- Deep understanding of security principles, encryption, IAM, and compliance frameworks.
Good to Have
- Experience with ultra-low-latency or high-frequency trading systems.
- Knowledge of FIX protocol, FPGA acceleration, or network‑level optimizations.
- Familiarity with Redis, Nginx, or other high‑throughput systems.
- Exposure to micro‑second‑level performance tuning or network acceleration technologies.
Why Join Us?
- Be part of a team that consistently raises the bar and delivers exceptional engineering outcomes.
- A culture where innovation, ownership, and bold thinking are valued.
- Exceptional growth opportunities—ideal for someone who thrives in fast-paced, high-impact environments.
- Build systems that influence markets and redefine the fintech landscape.
This isn’t just a role—it’s a challenge, a platform, and a proving ground.
Ready to step up? Apply now.
Type, Location
Full Time @ Anywhere in India
Desired Experience
2+ years
Job Description
What You’ll Do
● Deploy, automate and maintain web-scale infrastructure with leading public cloud vendors such as Amazon Web Services, Digital Ocean & Google Cloud Platform.
● Take charge of DevOps activities for CI/CD with the latest tech stacks.
● Acquire industry-recognized, professional cloud certifications (AWS/Google) in the capacity of developer or architect Devise multi-region technical solutions.
● Implementing the DevOps philosophy and strategy across different domains in organisation.
● Build automation at various levels, including code deployment to streamline release process
● Will be responsible for architecture of cloud services
● 24*7 monitoring of the infrastructure
● Use programming/scripting in your day-to-day work
● Have shell experience - for example Powershell on Windows, or BASH on *nix
● Use a Version Control System, preferably git
● Hands on at least one CLI/SDK/API of at least one public cloud ( GCP, AWS, DO)
● Scalability, HA and troubleshooting of web-scale applications.
● Infrastructure-As-Code tools like Terraform, CloudFormation
● CI/CD systems such as Jenkins, CircleCI
● Container technologies such as Docker, Kubernetes, OpenShift
● Monitoring and alerting systems: e.g. NewRelic, AWS CloudWatch, Google StackDriver, Graphite, Nagios/ICINGA
What you bring to the table
● Hands on experience in Cloud compute services, Cloud Function, Networking, Load balancing, Autoscaling.
● Hands on with GCP/AWS Compute & Networking services i.e. Compute Engine, App Engine, Kubernetes Engine, Cloud Function, Networking (VPC, Firewall, Load Balancer), Cloud SQL, Datastore.
● DBs: Postgresql, MySQL, Elastic Search, Redis, kafka, MongoDB or other NoSQL systems
● Configuration management tools such as Ansible/Chef/Puppet
Bonus if you have…
● Basic understanding of Networking(routing, switching, dns) and Storage
● Basic understanding of Protocol such as UDP/TCP
● Basic understanding of Cloud computing
● Basic understanding of Cloud computing models like SaaS, PaaS
● Basic understanding of git or any other source code repo
● Basic understanding of Databases(sql/no sql)
● Great problem solving skills
● Good in communication
● Adaptive to learning
Key Sills Required for Lead DevOps Engineer
Containerization Technologies
Docker, Kubernetes, OpenShift
Cloud Technologies
AWS/Azure, GCP
CI/CD Pipeline Tools
Jenkins, Azure Devops
Configuration Management Tools
Ansible, Chef,
SCM Tools
Git, GitHub, Bitbucket
Monitoring Tools
New Relic, Nagios, Prometheus
Cloud Infra Automation
Terraform
Scripting Languages
Python, Shell, Groovy
· Ability to decide the Architecture for the project and tools as per the availability
· Sound knowledge required in the deployment strategies and able to define the timelines
· Team handling skills are a must
· Debugging skills are an advantage
· Good to have knowledge of Databases like Mysql, Postgresql
It is advantageous to be familiar with Kafka. RabbitMQ
· Good to have knowledge of Web servers to deploy web applications
· Good to have knowledge of Code quality checking tools like SonarQube and Vulnerability scanning
· Advantage to having experience in DevSecOps
Note: Tools mentioned in bold are a must and others are added advantage
Job description
The ideal candidate is a self-motivated, multi-tasker, and demonstrated team player. You will be a lead developer responsible for the development of new software security policies and enhancements to security on existing products. You should excel in working with large-scale applications and frameworks and have outstanding communication and leadership skills.
Responsibilities
- Consulting with management on the operational requirements of software solutions.
- Contributing expertise on information system options, risk, and operational impact.
- Mentoring junior software developers in gaining experience and assuming DevOps responsibilities.
- Managing the installation and configuration of solutions.
- Collaborating with developers on software requirements, as well as interpreting test stage data.
- Developing interface simulators and designing automated module deployments.
- Completing code and script updates, as well as resolving product implementation errors.
- Overseeing routine maintenance procedures and performing diagnostic tests.
- Documenting processes and monitoring performance metrics.
- Conforming to best practices in network administration and cybersecurity.
Qualifications
- Minimum of 2 years of hands-on experience in software development and DevOps, specifically managing AWS Infrastructure such as EC2s, RDS, Elastic cache, S3, IAM, cloud trail and other services provided by AWS.
- Experience Building a multi-region highly available auto-scaling infrastructure that optimises performance and cost. plan for future infrastructure as well as Maintain & optimise existing infrastructure.
- Conceptualise, architect and build automated deployment pipelines in a CI/CD environment like Jenkins.
- Conceptualise, architect and build a containerised infrastructure using Docker, Mesosphere or similar SaaS platforms.
- Conceptualise, architect and build a secured network utilising VPCs with inputs from the security team.
- Work with developers & QA to institute a policy of Continuous Integration with Automated testing Architect, build and manage dashboards to provide visibility into delivery, production application functional and performance status.
- Work with developers to institute systems, policies and workflows which allow for rollback of deployments Triage release of applications to production environment on a daily basis.
- Interface with developers and triage SQL queries that need to be executed in production environments.
- Assist the developers and on calls for other teams with post mortem, follow up and review of issues affecting production availability.
- Minimum 2 years’ experience in Ansible.
- Must have written playbook to automate provisioning of AWS infrastructure as well as automation of routine maintenance tasks.
- Must have had prior experience automating deployments to production and lower environments.
- Experience with APM tools like New Relic and log management tools.
- Our entire platform is hosted on AWS, comprising of web applications, webservices, RDS, Redis and Elastic Search clusters and several other AWS resources like EC2, S3, Cloud front, Route53 and SNS.
- Essential Functions System Architecture Process Design and Implementation
- Minimum of 2 years scripting experience in Ruby/Python (Preferable) and Shell Web Application Deployment Systems Continuous Integration tools (Ansible)Establishing and enforcing Network Security Policy (AWS VPC, Security Group) & ACLs.
- Establishing and enforcing systems monitoring tools and standards
- Establishing and enforcing Risk Assessment policies and standards
- Establishing and enforcing Escalation policies and standards
We are hiring candidates who are looking to work in a cloud environment and ready to learn and adapt to the evolving technologies.
Linux Administrator Roles & Responsibilities:
- 5+ or more years of professional experience with strong working expertise in Agile environments
- Deep knowledge in managing Linux servers.
- Managing Windows servers(Not Mandatory).
- Manage Web servers (Apache, Nginx).
- Manage Application servers.
- Strong background & experience in any one scripting language (Bash, Python)
- Manage firewall rules.
- Perform root cause analysis for production errors.
- Basic administration of MySQL, MSSQL.
- Ready to learn and adapt to business requirements.
- Manage information security controls with best practises and processes.
- Support business requirements beyond working hours.
- Ensuring highest uptimes of the services.
- Monitoring resource usages.
Skills/Requirements
- Bachelor’s Degree or Diploma in Computer Science, Engineering, Software Engineering or a relevant field.
- Experience with Linux-based infrastructures, Linux/Unix administration.
- Knowledge in managing databases such as My SQL, MS SQL.
- Knowledge of scripting languages such as Python, Bash.
- Knowledge in open-source technologies and cloud services like AWS, Azure is a plus. Candidates willing to learn will be preferred.
- Experience in managing web applications.
- Problem-solving attitude.
- 5+ years experience in the IT industry.
• Design cloud infrastructure that is secure, scalable, and highly available on AWS
• Define infrastructure and deployment requirements
• Provision, configure and maintain AWS cloud infrastructure defined as code
• Ensure configuration and compliance with configuration management tools
• Troubleshoot problems across a wide array of services and functional areas
• Build and maintain operational tools for deployment, monitoring, and analysis of AWS infrastructure and systems
• Perform infrastructure cost analysis and optimization
Qualifications:
• At least 3-5 years of experience building and maintaining AWS infrastructure (VPC, EC2, Security Groups, IAM, ECS, CodeDeploy, CloudFront, S3)
• Strong understanding of how to secure AWS environments and meet compliance requirements
• Expertise on configuration management
• Hands-on experience deploying and managing infrastructure with Terraform
• Solid foundation of networking and Linux administration
• Experience with Docker, GitHub, Jenkins, ELK and deploying applications on AWS
• Ability to learn/use a wide variety of open source technologies and tools
• Strong bias for action and ownership
Responsibilities
- Building and maintenance of resilient and scalable production infrastructure
- Improvement of monitoring systems
- Creation and support of development automation processes (CI / CD)
- Participation in infrastructure development
- Detection of problems in architecture and proposing of solutions for solving them
- Creation of tasks for system improvements for system scalability, performance and monitoring
- Analysis of product requirements in the aspect of devops
- Managing a team of DevOps, control of task deliveries
- Incident analysis and fixing
Technology stack
Linux, Bash, Salt/Ansible, LXC, libvirt, IPsec, VXLAN, Open vSwitch, OpenVPN, OSPF, BIRD, Cisco NX-OS, Multicast, PIM, LVM, software RAID, LUKS, PostgreSQL, nginx, haproxy, Prometheus, Grafana, Zabbix, GitLab, Capistrano
Skills and Experience
- Understanding of the distributed systems principles
- Understanding of principles for building a resistant network infrastructure
- Experience of Ubuntu Linux administration (Debian-like will be a plus)
- Strong knowledge of Bash
- Experience of working with LXC-containers
- Understanding and experience with infrastructure as a code approach
- Experience of development idempotent Ansible roles
- Experience with relational databases (PostgeSQL), ability to create simple SQL queries
- Experience with git
- Experience with monitoring and metric collect systems (Prometheus, Grafana, Zabbix)
- Understanding of dynamic routing (OSPF)
Preferred experience
- Experience of working with highload zero-downtown environments
- Experience of coding on Python
- Experience of working with IPsec, VXLAN, Open vSwitch
- Knowledge and experience of working with network equipment Cisco
- Experience of working with Cisco NX-OS
- Knowledge of principles of multicast protocols IGMP, PIM
- Experience of setting multicast on Cisco equipment
- Experience of working with Solarflare Onload
- Experience administering Atlassian products
Objectives of this Role
Improve reliability, quality, and time-to-market of our suite of software solutions
- Run the production environment by monitoring availability and taking a holistic view of system health
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer - needs, and innovating to continually improve
- Provide primary operational support and engineering for multiple large distributed software applications
- Participate in system design consulting, platform management, and capacity planning
- Languages: Python, Java, Ruby DSL, Bash
- Databases : MySQL, Cassandra , Elastic Search
- Deployment: AWS CloudFormation
Essential Criteria:
- 8 or more years administrating production Linux systems in a 24x7 environment
- 3 or more years’ experience in a DevOps/ SRE role as an engineer or technical lead
- At least 1 year of team leadership experience
- Significant knowledge of Amazon Web Services (CLI/APIs, EC2, EBS, S3, VPCs, IAM, AWS Lambda)
- Experience deploying services into containerized orchestration environments such as Kubernetes
- Experience with infrastructure automation tools like CloudFormation, Terraform, etc.
- Experience with at least one of Python, Bash, Ruby, or equivalent
- Experience creating and managing CI/CD pipeline like Jenkins or Spinnaker
- Familiar with version control using Git
- Solid understanding of common security principles
Nice to Have:
- Preference for hands on experience with Serverless Architecture, Kubernetes and Docker
- Strong experience with open-source configuration management tools
- Managing distributed systems spanning multiple AWS regions / data-centers
- Experience with bootstrapping solutions
- Open source contributor
- We’re committed to client success: There are over 6,200 brand and retail websites in the Bazaarvoice network. Our clients represent some of the world’s leading companies across a wide range of industries including retail, apparel, automotive, consumer electronics and travel.
- We’re leaders in consumer-generated content: Each month, more than one billion consumers view and share authentic consumer-generated content, such as ratings and reviews, curated photos, social posts and videos, about products in our network. Thousands upon thousands or reviews are added to the Bazaarvoice network everyday.
- Our network delivers: Network analytics provide insights that help marketers and advertisers provide more engaging experiences that drive brand awareness, consideration, sales, and loyalty.
- We’re a great place to work: We pride ourselves on our unique culture. Join a company that values passion, innovation, authenticity, generosity, respect, teamwork, and performance.
- Strong Understanding of Linux administration
- Good understanding of using Python or Shell scripting (Automation mindset is key in this role)
- Hands on experience with Implementation of CI/CD Processes
Experience working with one of these cloud platforms (AWS, Azure or Google Cloud) - Experience working with configuration management tools such as Ansible, Chef
Experience in Source Control Management including SVN, Bitbucket and GitHub
Experience with setup & management of monitoring tools like Nagios, Sensu & Prometheus
Troubleshoot and triage development and Production issues - Understanding of micro-services is a plus
Roles & Responsibilities
- Implementation and troubleshooting on Linux technologies related to OS, Virtualization, server and storage, backup, scripting / automation, Performance fine tuning
- LAMP stack skills
- Monitoring tools deployment / management (Nagios, New Relic, Zabbix, etc)
- Infra provisioning using Infra as code mindset
- CI/CD automation










