
Role Overview:
Virtana is looking for a Senior DevOps Engineer to join our R&D Infrastructure team. In this role, you won't just follow conventions — you'll help redefine them. You will own the architecture, build, and day-to-day operations of the GCP-based cloud platform that powers Virtana's SaaS products and the AI-driven observability experience our Global 2000 customers depend on. This is a hands-on senior individual contributor role with meaningful technical leadership scope, working alongside engineers and architects on a unified observability platform.
Work Location: Pune
Job Type: Hybrid
Role Responsibilities:
- GCP Cloud Operations: Develop, deploy, operate, and support production cloud infrastructure primarily on GCP — leveraging GKE, BigTable, BigQuery, Dataflow, Cloud Storage, IAM, and core networking services.
- Reliability & SLAs: Ensure production systems are running at all times with multiple levels of redundancy to meet committed SLAs; lead incident response, root cause analysis, and post-incident reviews.
- Build & Release Automation: Design, implement, and continuously improve scalable CI/CD pipelines and test frameworks leveraged by QA and development teams across the company.
- Infrastructure as Code: Manage large-scale, repeatable deployments using Terraform, Ansible, Puppet, or SaltStack; champion Git-based workflows and version control standards for distributed engineering teams.
- Security & Availability: Own the ongoing maintenance, security, patching, and availability of services in line with strict operational, security, and procedural models.
- Monitoring & Alerting: Plan and deliver high-value monitoring and alerting features to support operations, support, and customer-facing reliability — eating our own dog food with the Virtana Platform wherever possible.
- Capacity & Cost: Forecast capacity, plan upgrades, patches, and migrations, and drive cloud cost efficiency across hybrid and multi-cloud environments.
- Cross-Functional Partnership: Work with development, operations, and support personnel to identify, isolate, and diagnose issues; handle support escalations and drive permanent fixes.
Required Qualifications:
- Bachelor's degree in Computer Science / Engineering or equivalent relevant experience.
- 5–7 years of professional hands-on DevOps / SRE experience supporting production cloud environments.
- Strong, demonstrable production experience on GCP — including GKE, BigTable, BigQuery, Dataflow, IAM, and core GCP networking services.
- Deep, hands-on expertise with container orchestration (Kubernetes) and Docker in production.
- Advanced proficiency with at least one infrastructure-as-code / configuration management tool: Terraform, Ansible, Puppet, or SaltStack.
- Solid understanding of networking, firewalls, load balancers, DNS, and database operations.
- Strong working knowledge of Git-based workflows and version control standards for distributed engineering teams.
- Comfort operating hybrid environments that include both Linux and Windows ecosystems.
- Excellent verbal and written communication skills, with the ability to explain highly technical topics to both technical and non-technical audiences.
- Self-motivated, detail-oriented, and able to work both independently and within a globally distributed team.
Good to Have:
- Strong scripting skills and a demonstrated ability to automate operational toil — Python preferred; Bash, Go, or Groovy a plus.
- Hands-on experience designing and operating CI/CD pipelines with Jenkins (Spinnaker, GitHub Actions, or GitLab CI also welcome).
- Exposure to AWS or other public clouds in addition to GCP.
- Experience operating SaaS platforms built on microservices architectures.

About Virtana
Virtana is the leader in observability for hybrid infrastructure. The AI-powered Virtana Platform delivers a unified view across applications, services, and underlying infrastructure, correlating user impact, service dependencies, performance bottlenecks, and cost drivers in real time. Trusted by Global 2000 enterprises, Virtana helps IT, operations, and platform teams improve efficiency, reduce risk, and make faster, AI-driven decisions across complex, dynamic environments. Learn more at virtana.com.
Virtana is proud to be an Equal Opportunity Employer. We value diversity and are committed to creating an inclusive environment for all employees.
Recent Milestones
- Acquired Zenoss to build a unified observability platform combining deep infrastructure analytics, AI-driven intelligence, and end-to-end service visibility.
- Launched AI Factory Observability (AIFO) to address the performance, cost, risk, and visibility challenges of AI infrastructure across GPU-driven and hybrid environments.
- Recognized as Best Hybrid Cloud Solution (2025) by The Cloud Awards.
- First to extend Agentic AI across the Enterprise Stack, enabling autonomous, full-stack intelligence.
- Patented Full-Stack Cloud Optimization for AI Environments, driving smarter performance and cost efficiency for modern AI workloads.
Candid answers by the company
Virtana builds the industry’s deepest hybrid infrastructure observability platform, giving enterprises unmatched visibility into their complex IT environments, whether on‑premises, in the cloud, or across both.
The Role
As a DevOps Engineer at Blitzy's Pune headquarters, you'll build and operate the infrastructure that powers our AI agents and the applications they produce. You'll work at the intersection of cloud infrastructure, developer tooling, and AI-native systems — designing the pipelines, clusters, and automation that allow Blitzy to ship production-ready software at machine speed. This is a hands-on, high-ownership role for an engineer who moves fast, automates everything, and cares deeply about developer experience and system reliability.
What Success Looks Like
- Kubernetes clusters are running reliably at scale, with clear deployment standards, Helm-managed releases, and minimal manual intervention required from engineering teams.
- CI/CD pipelines are fast, consistent, and trusted — developers ship confidently knowing the automation handles the rest.
- Observability is comprehensive: alerts are actionable, dashboards are meaningful, and incidents are resolved faster because the right data is always available.
- Infrastructure provisioning is fully automated — no snowflake environments, no manual setup, everything reproducible through code.
- AI agent orchestration infrastructure is stable and scalable, directly enabling Blitzy's core product to deliver for enterprise customers.
- Engineering teams notice the difference — developer productivity is measurably higher and infrastructure is no longer a bottleneck to shipping.
Areas of Ownership
- Build and manage Kubernetes clusters supporting AI agent workloads and application deployment at scale.
- Design, implement, and maintain CI/CD pipelines for application and AI service delivery — ensuring speed, reliability, and repeatability.
- Automate infrastructure provisioning and dynamic scaling using Python scripts and Terraform IaC.
- Deploy and manage applications using Helm charts; own packaging standards and release automation.
- Build and maintain comprehensive observability stacks — alerting, distributed tracing, metrics, and logging (e.g., Prometheus, Grafana, Datadog, OpenTelemetry).
- Monitor and maintain production services and APIs; own incident response and drive blameless postmortems.
- Build dedicated infrastructure for AI agent orchestration and management, enabling Blitzy's core autonomous development capabilities.
- Collaborate with engineering teams on deployment strategies and continuously improve developer experience through tooling and automation.
Required Experience
- 5–8 years of DevOps, infrastructure, or platform engineering experience.
- Python proficiency for scripting, automation, and infrastructure tooling.
- Deep Kubernetes expertise — cluster management, workload deployment, scaling, and troubleshooting.
- Hands-on Helm experience for application packaging and release management.
- Proven ability to design and implement CI/CD pipelines across complex, multi-service environments.
- Practical experience with at least one major cloud platform (AWS, GCP, or Azure).
- Terraform proficiency for infrastructure-as-code provisioning and state management.
- Strong Linux administration and containerization fundamentals (Docker, OCI).
What Makes You Stand Out
- CKA (Certified Kubernetes Administrator) certification.
- Familiarity with MLOps tooling such as MLflow, Kubeflow, or similar platforms for AI/ML workload management.
- Experience with microservices architecture and distributed systems design.
- Knowledge of API gateways and service mesh technologies (Istio, Linkerd, or equivalent).
- Prior experience in a high-growth AI or software startup where you moved fast and owned broadly.
- Track record of meaningfully improving developer productivity through platform and tooling investments.
What Makes This Role Different
Most DevOps roles have you maintaining existing systems. At Blitzy, you're building the infrastructure layer for a platform that autonomously writes enterprise software — a genuinely new category of product. You'll work on AI agent orchestration, Kubernetes at scale, and developer tooling that is directly responsible for how fast Blitzy delivers value to Fortune 500 customers. As an early member of the Pune engineering team, you'll have outsized influence over our infrastructure culture and technical direction. High performers are eligible for company equity — giving you real ownership in what you build.
Job Summary:
The Lead IaC Engineer will design, implement, automate, and maintain infrastructure across on-premises and cloud environments. The candidate should have strong hands-on expertise in Chef, Python, and Terraform, along with some AWS and Windows administration knowledge.
- 8-12 years of experience.
- Primary skills: Chef, Python, and Terraform.
- Secondary skills: AWS and Windows administration (cloud is not mandatory).
- Expert troubleshooting skills.
- Expertise in designing highly secure cloud services and cloud infrastructure using AWS (EC2, RDS, S3, ECS, Route53).
- Experience with DevOps tools including Docker, Ansible, and Terraform.
- Experience with monitoring tools such as Datadog and Splunk.
- Experience building and maintaining large-scale infrastructure in AWS, including experience leveraging one or more coding languages for automation.
- Experience providing 24x7 on-call production support.
- Understanding of best practices, industry standards, and repeatable, supportable processes.
- Knowledge and working experience of container-based deployments such as Docker, Terraform, and AWS ECS.
- Knowledge and working experience of TCP/IP, DNS, certificates, and networking concepts.
- Knowledge and working experience of the CI/CD development pipeline and of the CI/CD maturity model (Jenkins).
- Strong core Linux OS skills, shell scripting, and Python scripting.
- Working experience of modern engineering operations duties, including providing the necessary tools and infrastructure to support high-performance Dev and QA teams.
- Database and MySQL administration skills are a plus.
- Prior work in high-load and high-traffic infrastructure is a plus.
- Clear vision of and commitment to providing outstanding customer service.
Job description
We are seeking a highly skilled and experienced Backend Developer to join our team at Infomance. The ideal candidate will have a minimum of 3 years of hands-on experience in developing and managing WordPress websites, deploying sites on AWS, and managing servers through cPanel. This role is critical in ensuring the seamless performance, security, and scalability of our web platform.
Key Responsibilities:
- Develop and maintain the server end of our WordPress website.
- Manage and optimize the AWS server environment to ensure high performance and reliability.
- Deploy and manage WordPress sites on AWS.
- Utilize cPanel for server management, including setting up and configuring domains, email accounts, databases, and other hosting features.
- Collaborate with front-end developers to integrate user-facing elements with server-side logic.
- Ensure the security and integrity of the website and server infrastructure.
- Troubleshoot and resolve website and server issues promptly.
- Implement and maintain automated backup and disaster recovery solutions.
- Monitor server performance and optimize for speed and efficiency.
- Stay updated with the latest industry trends and technologies to ensure our platform remains at the cutting edge.
Required Skills:
1. WordPress Development:
- Proficient in PHP, MySQL, HTML, CSS, and JavaScript.
- Experience in developing custom WordPress themes and plugins.
- Understanding of WordPress core and architecture.
2. AWS (Amazon Web Services):
- Extensive experience with EC2, S3, RDS, and other relevant AWS services.
- Expertise in deploying and managing WordPress sites on AWS.
- Knowledge of AWS security best practices.
3. Server Management:
- Proficiency with cPanel for server management.
- Experience in setting up and configuring domains, email accounts, and databases using cPanel.
- Ability to optimize server performance and ensure reliability.
4. Version Control Systems:
- Experience with Git for version control and collaboration.
- Understanding of best practices for code management and deployment.
5. API Integration:
- Familiarity with RESTful APIs and third-party integrations.
- Ability to implement and manage API connections effectively.
6. Security:
- Strong understanding of web and server security practices.
- Experience in implementing security measures to protect the website and server infrastructure.
7. Problem-Solving:
- Excellent troubleshooting skills.
- Ability to diagnose and resolve website and server issues promptly.
8. Performance Optimization:
- Knowledge of techniques to optimize website speed and efficiency.
- Experience with caching, load balancing, and other performance-enhancing practices.
9. Backup and Recovery:
- Experience in implementing automated backup solutions.
- Ability to manage disaster recovery processes effectively.
10. Collaboration and Communication:
- Strong communication skills for working with cross-functional teams.
- Ability to collaborate effectively with front-end developers and other stakeholders.
Preferred Qualifications:
- Experience with full-stack development on WordPress.
- Knowledge of serverless architectures and microservices.
- Experience with other cloud platforms like Google Cloud or Azure.
- Understanding of DevOps practices and CI/CD pipelines.
- Strong knowledge of Windows and Linux
- Experience working with version control systems such as Git
- Hands-on experience with tools such as Docker, SonarQube, Ansible, Kubernetes, and ELK
- Basic understanding of SQL commands
- Experience working with Azure Cloud DevOps
Objectives:
- Building and setting up new development tools and infrastructure
- Working on ways to automate and improve development and release processes
- Testing code written by others and analyzing results
- Ensuring that systems are safe and secure against cybersecurity threats
- Identifying technical problems and developing software updates and ‘fixes’
- Working with software developers and software engineers to ensure that development follows established processes and works as intended
- Planning out projects and being involved in project management decisions
Daily and Monthly Responsibilities:
- Deploy updates and fixes
- Build tools to reduce occurrences of errors and improve customer experience
- Develop software to integrate with internal back-end systems
- Perform root cause analysis for production errors
- Investigate and resolve technical issues
- Develop scripts to automate visualization
- Design procedures for system troubleshooting and maintenance
Skills and Qualifications:
- Bachelor's degree in Computer Science, Software Engineering, or a relevant field
- 3+ years of experience as a DevOps Engineer or in a similar software engineering role
- Proficient with Git and Git workflows
- Good logical skills and knowledge of programming concepts (OOP, data structures)
- Working knowledge of databases and SQL
- Problem-solving attitude
- Collaborative team spirit
Our Client is an IT infrastructure services company, focused and specialized in delivering solutions and services on Microsoft products and technologies. They are a Microsoft partner and cloud solution provider. Our Client's objective is to help small, mid-sized as well as global enterprises to transform their business by using innovation in IT, adapting to the latest technologies and using IT as an enabler for business to meet business goals and continuous growth.
With focused and experienced management and a strong team of IT Infrastructure professionals, they are adding value by making IT Infrastructure a robust, agile, secure and cost-effective service to the business. As an independent IT Infrastructure company, they provide their clients with unbiased advice on how to successfully implement and manage technology to complement their business requirements.
- Providing on-call support within a high availability production environment
- Logging issues
- Providing complex problem analysis and resolution for technical and application issues
- Supporting and collaborating with team members
- Running system updates
- Monitoring and responding to system alerts
- Developing and running system health checks
- Applying industry standard practices across the technology estate
- Performing system reviews
- Reviewing and maintaining infrastructure configuration
- Diagnosing performance issues and network bottlenecks
- Collaborating within geographically distributed teams
- Supporting software development infrastructure by continuous integration and delivery standards
- Working closely with developers and QA teams as part of a customer support centre
- Delivering project work, either individually or in conjunction with other teams, external suppliers, or contractors
- Ensuring maintenance of the technical environments to meet current standards
- Ensuring compliance with appropriate industry and security regulations
- Providing support to Development and Customer Support teams
- Managing the hosted infrastructure through vendor engagement
- Managing 3rd party software licensing ensuring compliance
- Delivering new technologies as agreed by the business
What you need to have:
- Experience working within a technical operations environment relevant to associated skills stated.
- Be proficient in:
  - Linux, zsh/bash/similar
  - ssh, tmux/screen/similar
  - vim/emacs/similar
  - Computer networking
- Have a reasonable working knowledge of:
  - Cloud infrastructure, preferably GCP
  - One or more programming/scripting languages
  - Git
  - Docker
  - Web services and web servers
  - Databases, relational and NoSQL
- Some familiarity with:
  - Puppet, Ansible
  - Terraform
  - GitHub, CircleCI, Kubernetes
  - Shell scripting
  - Databases: Cassandra, Postgres, MySQL, or CloudSQL
  - Agile working practices including Scrum and Kanban
  - Private and public cloud hosting environments
- Strong technology interests with a positive ‘can do’ attitude
- Be flexible and adaptable to changing priorities
- Be good at planning and organising your own time and able to meet targets and deadlines without supervision
- Excellent written and verbal communication skills.
- Approachable with both colleagues and team members
- Be resourceful and practical with an ability to respond positively and quickly to technical and business challenges
- Be persuasive, articulate and influential, but down to earth and friendly with own team and colleagues
- Have an ability to establish relationships quickly and to work effectively either as part of a team or singularly
- Be customer focused with both internal and external customers
- Be capable of remaining calm under pressure
- Technically minded, with good problem resolution skills and a systematic manner
- Excellent documentation skills
- Prepared to participate in an out-of-hours support rota
As DevOps Engineer, you'll be part of the team building the stage for our Software Engineers to work on, helping to enhance our product performance and reliability.
Responsibilities:
- Build and operate infrastructure to support the website, backend cluster, and ML projects in the organization.
- Helping teams become more autonomous and allowing the Operation team to focus on improving the infrastructure and optimizing processes.
- Delivering system management tooling to the engineering teams.
- Working on your own applications which will be used internally.
- Contributing to open source projects that we are using (or that we may start).
- Be an advocate for engineering best practices in and out of the company.
- Organizing tech talks and participating in meetups and representing Box8 at industry events.
- Sharing pager duty for the rare instances of something serious happening.
- Collaborate with other developers to understand and set up the tooling needed for Continuous Integration/Delivery/Deployment (CI/CD) practices.
Requirements:
- 1+ years of industry experience.
- Scale existing back-end systems to handle ever-increasing amounts of traffic and new product requirements.
- Ruby On Rails or Python and Bash/Shell skills.
- Experience managing complex systems at scale.
- Experience with Docker, rkt or similar container engine.
- Experience with Kubernetes or similar clustering solutions.
- Experience with tools such as Ansible or Chef.
- Understanding of the importance of smart metrics and alerting.
- Hands on experience with cloud infrastructure provisioning, deployment, monitoring (we are on AWS and use ECS, ELB, EC2, Elasticache, Elasticsearch, S3, CloudWatch).
- Experience with relational SQL and NoSQL databases, including Postgres and Cassandra.
- Knowledge of data pipeline and workflow management tools: Azkaban, Luigi, Airflow, etc.
- Experience working on Linux-based servers.
- Managing large-scale, production-grade infrastructure on AWS Cloud.
- Good knowledge of scripting languages like Ruby, Python, or Bash.
- Experience in creating a deployment pipeline from scratch.
- Expertise in any of the CI tools, preferably Jenkins.
- Good knowledge of Docker containers and their usage.
- Experience using infrastructure/application monitoring tools such as CloudWatch, New Relic, or Sensu.
Good to have:
- Knowledge of Ruby on Rails based applications and their deployment methodologies.
- Experience working with container orchestration tools like Kubernetes/ECS/Mesos.
- Extra points for experience with front-end development, New Relic, GCP, Kafka, and Elasticsearch.
