Your Role:
- Serve as a primary point responsible for the overall health, performance, and capacity of one or more of our Internet-facing services
- Gain deep knowledge of our complex applications
- Assist in the roll-out and deployment of new product features and installations to facilitate our rapid iteration and constant growth
- Develop tools to improve our ability to rapidly deploy and effectively monitor custom applications in a large-scale UNIX environment
- Work closely with development teams to ensure that platforms are designed with "operability" in mind.
- Function well in a fast-paced, rapidly-changing environment
- Should be able to lead a team of smart engineers
- Should be able to strategically guide the team to greater automation adoption
Must Have:
- Experience Building/managing DevOps/SRE teams
- Strong in troubleshooting/debugging Systems, Network and Applications
- Strong in Unix/Linux operating systems and Networking
- Working knowledge of Open source technologies in Monitoring, Deployment and incident management
Good to Have:
- Minimum 3+ years of team management experience
- Experience in Containers and orchestration layers like Kubernetes, Mesos/Marathon
- Proven experience in programming & diagnostics in any languages like Go, Python, Java
- Experience in NoSQL/SQL technologies like Cassandra/MySQL/CouchBase etc.
- Experience in BigData technologies like Kafka/Hadoop/Airflow/Spark
- Is a die-hard sports fan

About www.thecatalystiq.com
About
Connect with the team
Similar jobs
DevOps Engineer (Cloud & Infrastructure)
📍 Noida | 🕐 Full-Time | 🧭 Experience: 2–4 years
About TestMu AI
TestMu AI (formerly LambdaTest) is an AI-native platform designed to move software testing beyond simple automation into the era of agentic intelligence. It provides end-to-end AI agents that manage the entire Quality Engineering lifecycle.
- Full-Stack AI Agents: Autonomously plan, author, execute, and analyze tests across the SDLC.
- Comprehensive Coverage: Supports web, mobile, and enterprise applications.
- Real-World Testing: Scale execution across real devices, browsers, and custom environments.
About the Role
This isn't a role for someone who just wants to "maintain" systems. As a DevOps Engineer at TestMu AI, you are the architect of the automated highways that power our AI agents. You will step into a fast-paced environment where you bridge the gap between cloud-native automation and core infrastructure.
You will manage complex CI/CD pipelines, troubleshoot deep-seated Linux issues, and ensure our hybrid-cloud environment (AWS/Azure) is as resilient as the code it runs.
Key Responsibilities: The Pillars of Growth
A. DevOps & Automation (50% Focus)
- Platform Orchestration: Lead the migration to modular, self-healing Terraform and Helm templates.
- Agentic CI/CD: Architect GitHub Actions workflows that treat AI agents as first-class citizens, automating environment promotion and risk scoring.
- Kubernetes Mastery: Advanced management of Docker and K8s clusters to support scalable production workloads.
- Predictive Observability: Use Prometheus, Grafana, and ELK to move from reactive alerts to autonomous anomaly detection.
B. Networking & Data Center Mastery (30% Focus)
- Hybrid Networking: Design and troubleshoot VPCs and subnets in Azure/AWS/GCP, paired with physical VLANs and switches in our data centers.
- Bare-Metal Lifecycle: Automate hardware provisioning, RAID setup, and firmware updates for our real-device cloud.
- Remote Admin: Master out-of-band management (iDRAC, iLO, IPMI) to ensure 100% remote operational capability.
- Core Protocols: Own the lifecycle of DNS, DHCP, Load Balancing, and IPAM across distributed environments.
C. Development & Scripting (20% Focus)
- Backend Integration: Debug and optimize Python or Go code; understanding how logic interacts with system-level resources.
- Advanced Scripting: Write idempotent Bash/Python scripts to automate complex, multi-server operational tasks.
- Agentic Tooling: Support the integration of LLM-based developer tools into DevOps workflows to eliminate "toil".
The Interview Journey
We value your ability to solve problems under pressure more than your ability to memorize documentation.
- Technical Round 1 (DevOps Leads): A live session focused on real-world debugging scenarios and Linux fundamentals.
- Technical Round 2 (Hiring Manager / Pod Lead): An assessment of your architectural thinking, automation strategy, and team alignment.
- Technical Round 3 (SVP Engineering / VP DevOps): Strategic discussion on scalability, infrastructure vision, and technical leadership.
- Final Round (CEO): Mission alignment, cultural fit, and the "big picture" at TestMu AI.
Growth Timeline
This is a high-visibility role. You will receive direct mentorship from our senior engineering leadership. As you master our production environment, you will have a clear path to move into Senior DevOps Engineer or Infrastructure Architect roles as our pods scale.
Perks That Matter
Health Cover: Comprehensive insurance for you and your family.
Fresh Meals: Daily catered meals at the office.
Transport: Safe cab facilities for eligible shifts.
Pod Budgets: Dedicated engagement budgets for team building and offsites.
Key Qualifications :
- At least 2 years of hands-on experience with cloud infrastructure on AWS or GCP
- Exposure to configuration management and orchestration tools at scale (e.g. Terraform, Ansible, Packer)
- Knowledge in DevOps tools (e.g. Jenkins, Groovy, and Gradle)
- Familiarity with monitoring and alerting tools(e.g. CloudWatch, ELK stack, Prometheus)
- Proven ability to work independently or as an integral member of a team
Preferable Skills :
- Familiarity with standard IT security practices such as encryption, credentials and key management
- Proven ability to acquire various coding languages (Java, Python- ) to support DevOps operation and cloud transformation
- Familiarity in web standards (e.g. REST APIs, web security mechanisms)
- Multi-cloud management experience with GCP / Azure
- Experience in performance tuning, services outage management and troubleshooting
Job Responsibilities:
- Managing and maintaining the efficient functioning of containerized applications and systems within an organization
- Design, implement, and manage scalable Kubernetes clusters in cloud or on-premise environments
- Develop and maintain CI/CD pipelines to automate infrastructure and application deployments, and track all automation processes
- Implement workload automation using configuration management tools, as well as infrastructure as code (IaC) approaches for resource provisioning
- Monitor, troubleshoot, and optimize the performance of Kubernetes clusters and underlying cloud infrastructure
- Ensure high availability, security, and scalability of infrastructure through automation and best practices
- Establish and enforce cloud security standards, policies, and procedures Work agile technologies
Primary Requirements:
- Kubernetes: Proven experience in managing Kubernetes clusters (min. 2-3 years)
- Linux/Unix: Proficiency in administering complex Linux infrastructures and services
- Infrastructure as Code: Hands-on experience with CM tools like Ansible, as well as the
- knowledge of resource provisioning with Terraform or other Cloud-based utilities
- CI/CD Pipelines: Expertise in building and monitoring complex CI/CD pipelines to
- manage the build, test, packaging, containerization and release processes of software
- Scripting & Automation: Strong scripting and process automation skills in Bash, Python
- Monitoring Tools: Experience with monitoring and logging tools (Prometheus, Grafana)
- Version Control: Proficient with Git and familiar with GitOps workflows.
- Security: Strong understanding of security best practices in cloud and containerized
- environments.
Skills/Traits that would be an advantage:
- Kubernetes administration experience, including installation, configuration, and troubleshooting
- Kubernetes development experience
- Strong analytical and problem-solving skills
- Excellent communication and interpersonal skills
- Ability to work independently and as part of a team
- Design cloud infrastructure that is secure, scalable, and highly available on AWS, Azure and GCP
- Work collaboratively with software engineering to define infrastructure and deployment requirements
- Provision, configure and maintain AWS, Azure, GCP cloud infrastructure defined as code
- Ensure configuration and compliance with configuration management tools
- Administer and troubleshoot Linux based systems
- Troubleshoot problems across a wide array of services and functional areas
- Build and maintain operational tools for deployment, monitoring, and analysis of AWS, Azure Infrastructure and systems
- Perform infrastructure cost analysis and optimization

- Provides free and subscription-based website and email services hosted and operated at data centres in Mumbai and Hyderabad.
- Serve global audience and customers through sophisticated content delivery networks.
- Operate a service infrastructure using the latest technologies for web services and a very large storage infrastructure.
- Provides virtualized infrastructure, allows seamless migration and the addition of services for scalability.
- Pioneers and earliest adopters of public cloud and NoSQL big data store - since more than a decade.
- Provide innovative internet services with work on multiple technologies like php, java, nodejs, python and c++ to scale our services as per need.
- Has Internet infrastructure peering arrangements with all the major and minor ISPs and telecom service providers.
- Have mail traffic exchange agreements with major Internet services.
Job Details :
- This job position provides competitive professional opportunity both to experienced and aspiring engineers. The company's technology and operations groups are managed by senior professionals with deep subject matter expertise.
- The company believes having an open work environment offering mentoring and learning opportunities with an informal and flexible work culture, which allows professionals to actively participate and contribute to the success of our services and business.
- You will be part of a team that keeps the business running for cloud products and services that are used 24- 7 by the company's consumers and enterprise customers around the world. You will be asked to contribute to operate, maintain and provide escalation support for the company's cloud infrastructure that powers all of cloud offerings.
Job Role :
- As a senior engineer, your role grows as you gain experience in our operations. We facilitate a hands-on learning experience after an induction program, to get you into the role as quickly as possible.
- The systems engineer role also requires candidates to research and recommend innovative and automated approaches for system administration tasks.
- The work culture allows a seamless integration with different product engineering teams. The teams work together and share responsibility to triage in complex operational situations. The candidate is expected to stay updated on best practices and help evolve processes both for resilience of services and compliance.
- You will be required to provide support for both, production and non-production environments to ensure system updates and expected service levels. You will be required to specifically handle 24/7 L2 and L3 oversight for incident responses and have an excellent understanding of the end-to-end support process from client to different support escalation levels.
- The role also requires a discipline to create, update and maintain process documents, based on operation incidents, technologies and tools used in the processes to resolve issues.
QUALIFICATION AND EXPERIENCE :
- A graduate degree or senior diploma in engineering or technology with some or all of the following:
- Knowledge and work experience with KVM, AWS (Glacier, S3, EC2), RabbitMQ, Fluentd, Syslog, Nginx is preferred
- Installation and tuning of Web Servers, PHP, Java servlets, memory-based databases for scalability and performance
- Knowledge of email related protocols such as SMTP, POP3, IMAP along with experience in maintenance and administration of MTAs such as postfix, qmail, etc will be an added advantage
- Must have knowledge on monitoring tools, trend analysis, networking technologies, security tools and troubleshooting aspects.
- Knowledge of analyzing and mitigating security related issues and threats is certainly desirable.
- Knowledge of agile development/SDLC processes and hands-on participation in planning sprints and managing daily scrum is desirable.
- Preferably, programming experience in Shell, Python, Perl or C.
Requirements
You will make an ideal candidate if you have:
-
Experience of building a range of Services in a Cloud Service provider
-
Expert understanding of DevOps principles and Infrastructure as a Code concepts and techniques
-
Strong understanding of CI/CD tools (Jenkins, Ansible, GitHub)
-
Managed an infrastructure that involved 50+ hosts/network
-
3+ years of Kubernetes experience & 5+ years of experience in Native services such as Compute (virtual machines), Containers (AKS), Databases, DevOps, Identity, Storage & Security
-
Experience in engineering solutions on cloud foundation platform using Infrastructure As Code methods (eg. Terraform)
-
Security and Compliance, e.g. IAM and cloud compliance/auditing/monitoring tools
-
Customer/stakeholder focus. Ability to build strong relationships with Application teams, cross functional IT and global/local IT teams
-
Good leadership and teamwork skills - Works collaboratively in an agile environment
-
Operational effectiveness - delivers solutions that align to approved design patterns and security standards
-
Excellent skills in at least one of following: Python, Ruby, Java, JavaScript, Go, Node.JS
-
Experienced in full automation and configuration management
-
A track record of constantly looking for ways to do things better and an excellent understanding of the mechanism necessary to successfully implement change
-
Set and achieved challenging short, medium and long term goals which exceeded the standards in their field
-
Excellent written and spoken communication skills; an ability to communicate with impact, ensuring complex information is articulated in a meaningful way to wide and varied audiences
-
Built effective networks across business areas, developing relationships based on mutual trust and encouraging others to do the same
-
A successful track record of delivering complex projects and/or programmes, utilizing appropriate techniques and tools to ensure and measure success
-
A comprehensive understanding of risk management and proven experience of ensuring own/others' compliance with relevant regulatory processes
Essential Skills :
-
Demonstrable Cloud service provider experience - infrastructure build and configurations of a variety of services including compute, devops, databases, storage & security
-
Demonstrable experience of Linux administration and scripting preferably Red Hat
-
Experience of working with Continuous Integration (CI), Continuous Delivery (CD) and continuous testing tools
-
Experience working within an Agile environment
-
Programming experience in one or more of the following languages: Python, Ruby, Java, JavaScript, Go, Node.JS
-
Server administration (either Linux or Windows)
-
Automation scripting (using scripting languages such as Terraform, Ansible etc.)
-
Ability to quickly acquire new skills and tools
Required Skills :
-
Linux & Windows Server Certification
We are a growth-oriented, dynamic, multi-national startup, so those that are looking for that startup excitement, dynamics, and buzz are here at the right place. Read on -
FrontM (http://www.frontm.com/" target="_blank">www.frontm.com) is an edge AI company with a platform that is redefining how businesses and people in remote and isolated environments (maritime, aviation, mining....) collaborate and drive smart decisions.
Successful candidate will lead the back end architecture working alongside VP of delivery, CTO and CEO
The problem you will be working on:
- Take ownership of AWS cloud infrastructure
- Overlook tech ops with hands-on CI/CD and administration
- Develop Node.js Java and backend system procedures for stability, scale and performance
- Understand FrontM platform roadmap and contribute to planning strategic and tactical capabilities
- Integrate APIs and abstractions for complex requirements
Who you are:
- You are an experienced Cloud Architect and back end developer
- You have experience creating AWS Serverless Lambdas EC2 MongoDB backends
- You have extensive CI/CD and DevOps experience
- You can take ownership of continuous server uptime, maintenance, stability and performance
- You can lead a team of backend developers and architects
- You are a die-hard problem solver and never-say-no person
- You have 10+ years experience
- You are very sound in English language
- You have the ability to initiate and lead teams working with senior management
Additional benefits
- Generous pay package, flexible for the right candidate
- Career development and growth planning
- Entrepreneurial environment that nurtures and promotes innovation
- Multi-national team with an enjoyable culture
We'd love to talk to you if you find this interesting and like to join in on our exciting journey
DevOps Engineer Skills Building a scalable and highly available infrastructure for data science Knows data science project workflows Hands-on with deployment patterns for online/offline predictions (server/serverless)
Experience with either terraform or Kubernetes
Experience of ML deployment frameworks like Kubeflow, MLflow, SageMaker Working knowledge of Jenkins or similar tool Responsibilities Owns all the ML cloud infrastructure (AWS) Help builds out an entirely CI/CD ecosystem with auto-scaling Work with a testing engineer to design testing methodologies for ML APIs Ability to research & implement new technologies Help with cost optimizations of infrastructure.
Knowledge sharing Nice to Have Develop APIs for machine learning Can write Python servers for ML systems with API frameworks Understanding of task queue frameworks like Celery








