
DevOps Engineer (Cloud & Infrastructure)
📍 Noida | 🕐 Full-Time | 🧭 Experience: 2–4 years
About TestMu AI
TestMu AI (formerly LambdaTest) is an AI-native platform designed to move software testing beyond simple automation into the era of agentic intelligence. It provides end-to-end AI agents that manage the entire Quality Engineering lifecycle.
- Full-Stack AI Agents: Autonomously plan, author, execute, and analyze tests across the SDLC.
- Comprehensive Coverage: Supports web, mobile, and enterprise applications.
- Real-World Testing: Scale execution across real devices, browsers, and custom environments.
About the Role
This isn't a role for someone who just wants to "maintain" systems. As a DevOps Engineer at TestMu AI, you are the architect of the automated highways that power our AI agents. You will step into a fast-paced environment where you bridge the gap between cloud-native automation and core infrastructure.
You will manage complex CI/CD pipelines, troubleshoot deep-seated Linux issues, and ensure our hybrid-cloud environment (AWS/Azure) is as resilient as the code it runs.
Key Responsibilities: The Pillars of Growth
A. DevOps & Automation (50% Focus)
- Platform Orchestration: Lead the migration to modular, self-healing Terraform and Helm templates.
- Agentic CI/CD: Architect GitHub Actions workflows that treat AI agents as first-class citizens, automating environment promotion and risk scoring.
- Kubernetes Mastery: Advanced management of Docker and K8s clusters to support scalable production workloads.
- Predictive Observability: Use Prometheus, Grafana, and ELK to move from reactive alerts to autonomous anomaly detection.
B. Networking & Data Center Mastery (30% Focus)
- Hybrid Networking: Design and troubleshoot VPCs and subnets in Azure/AWS/GCP, paired with physical VLANs and switches in our data centers.
- Bare-Metal Lifecycle: Automate hardware provisioning, RAID setup, and firmware updates for our real-device cloud.
- Remote Admin: Master out-of-band management (iDRAC, iLO, IPMI) to ensure 100% remote operational capability.
- Core Protocols: Own the lifecycle of DNS, DHCP, Load Balancing, and IPAM across distributed environments.
C. Development & Scripting (20% Focus)
- Backend Integration: Debug and optimize Python or Go code, with an understanding of how application logic interacts with system-level resources.
- Advanced Scripting: Write idempotent Bash/Python scripts to automate complex, multi-server operational tasks.
- Agentic Tooling: Support the integration of LLM-based developer tools into DevOps workflows to eliminate "toil".
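The "Advanced Scripting" responsibility above is about scripts that can be re-run safely. A minimal sketch of that idempotent pattern, using a hypothetical install directory and marker file:

```shell
#!/usr/bin/env bash
# Idempotent provisioning sketch: safe to re-run on the same host.
# The directory and marker file below are hypothetical examples.
set -euo pipefail

APP_DIR="/tmp/example-agent"      # hypothetical install location
MARKER="$APP_DIR/.provisioned"

# mkdir -p succeeds whether or not the directory already exists.
mkdir -p "$APP_DIR"

# Re-runs detect completed work instead of repeating it.
if [[ -f "$MARKER" ]]; then
  echo "already provisioned, nothing to do"
else
  echo "provisioning $APP_DIR"
  touch "$MARKER"
fi
```

Run it twice and the second invocation only reports that the work is already done; configuration-management tools like Ansible apply the same check-before-change principle at scale.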
The Interview Journey
We value your ability to solve problems under pressure more than your ability to memorize documentation.
- Technical Round 1 (DevOps Leads): A live session focused on real-world debugging scenarios and Linux fundamentals.
- Technical Round 2 (Hiring Manager / Pod Lead): An assessment of your architectural thinking, automation strategy, and team alignment.
- Technical Round 3 (SVP Engineering / VP DevOps): Strategic discussion on scalability, infrastructure vision, and technical leadership.
- Final Round (CEO): Mission alignment, cultural fit, and the "big picture" at TestMu AI.
Growth Timeline
This is a high-visibility role. You will receive direct mentorship from our senior engineering leadership. As you master our production environment, you will have a clear path to move into Senior DevOps Engineer or Infrastructure Architect roles as our pods scale.
Perks That Matter
- Health Cover: Comprehensive insurance for you and your family.
- Fresh Meals: Daily catered meals at the office.
- Transport: Safe cab facilities for eligible shifts.
- Pod Budgets: Dedicated engagement budgets for team building and offsites.

Job Description
Position: SRE Developer / DevOps Engineer
Location: Mumbai
Experience: 3–10 years
About HaystackAnalytics:
HaystackAnalytics is a deep-technology company working in genomics, computing, and data science to create a first-of-its-kind clinical reporting engine in healthcare. We are a new but well-funded company with a tremendous amount of pedigree in the team (IIT founders, IIT & IIM core team). Some of the technologies we have created are global firsts in infectious disease and chronic diagnostics. As a product company creating a large amount of IP, our technology and R&D teams are our crown jewels. With the early success of our products in India, we are now taking them to international shores.
Inviting Passionate Engineers to join a new age enterprise:
At HaystackAnalytics, we rely on our dynamic team of engineers to solve the many challenges and puzzles that come with our rapidly evolving stack that deals with Healthcare and Genomics.
We’re looking for full stack engineers who are passionate problem solvers, ready to work with new technologies and architectures in a forward-thinking organization that’s always pushing boundaries. Here, you will take complete, end-to-end ownership of projects across the entire stack.
Our ideal candidate has experience building enterprise products and an understanding of modern front-end technologies, web frameworks, APIs, databases, distributed computing, back-end languages, caching, security, message-based architectures, and more.
You’ll be joining a small team working at the forefront of new technology, solving the challenges that impact both the front end and back end architecture, and ultimately, delivering amazing global user experiences.
Objectives of this Role:
- Work across the full stack, building highly scalable distributed solutions that enable positive user experiences and measurable business growth
- Ideate and develop new product features in collaboration with domain experts in healthcare and genomics
- Develop state-of-the-art, enterprise-standard front-end and back-end services
- Develop cloud platform services based on container orchestration platform
- Continuously embrace automation for repetitive tasks
- Ensure application performance, uptime, and scale, maintaining high standards of code quality by using clean coding principles and solid design patterns
- Build robust, unit-testable tech modules, automating recurring tasks and processes
- Engage effectively with team members and collaborate to upskill and unblock each other
Frontend Skills
- HTML5
- CSS preprocessors (LESS / SASS)
- ES6 / TypeScript
- Desktop apps (Electron / Tauri)
- Component libraries (Web Components / Radix / Material)
- CSS (Tailwind)
- State management (Redux / Zustand / Recoil)
- Build tools (webpack / Vite / Parcel / Turborepo)
- Frameworks (Next.js)
- Design patterns
- Test automation frameworks (Cypress, Playwright, etc.)
- Functional programming concepts
- Scripting (Bash, Python)
Backend Skills
- Runtimes: Node / Deno / Bun; frameworks: Express / NestJS
- Languages: TypeScript / Python / Rust
- REST / GraphQL
- SOLID design principles
- Storage (MongoDB / object storage / Postgres)
- Caching (Redis / in-memory data grid)
- Pub/sub (Kafka / SQS / SNS / EventBridge / RabbitMQ)
- Container technology (Docker / Kubernetes)
- Cloud (Azure, AWS, OpenShift)
- GitOps
- Automation (Terraform, serverless)
Other Skills
- Innovation and thought leadership
- UI - UX design skills
- Interest in learning new tools, languages, workflows, and philosophies to grow
- Communication
Roles and Responsibilities:
- Data Pipeline Development: Build, deploy, and maintain efficient ETL/ELT pipelines using Azure Data Factory & Azure Synapse Analytics.
- We are only looking for senior candidates with over 5 years of relevant experience and ample client-facing experience.
- Finance/insurance experience is also a must.
- Data Modelling & Warehousing: Design and optimize data models, warehouses, and lakes for structured/unstructured data.
- SQL & Query Optimization: Write complex SQL queries, optimize performance, and manage databases.
- Python Automation: Develop scripts for data processing, automation, and integration using Python (Pandas, NumPy).
Technical Skills:
- Cloud Technologies: Azure Synapse Analytics, Azure Fabric, Azure Databricks, and AWS (good to have)
- Knowledge of Python, PySpark, SQL, and ETL concepts
- A good understanding of insurance operations and KPI reporting is an advantage.
Lead DevSecOps Engineer
Location: Pune, India (In-office) | Experience: 3–5 years | Type: Full-time
Apply here → https://lnk.ink/CLqe2
About FlytBase:
FlytBase is a Physical AI platform powering autonomous drones and robots across industrial sites. Our software enables 24/7 operations in critical infrastructure like solar farms, ports, oil refineries, and more.
We're building intelligent autonomy — not just automation — and security is core to that vision.
What You’ll Own
You’ll be leading and building the backbone of our AI-native drone orchestration platform — used by global industrial giants for autonomous operations.
Expect to:
- Design and manage multi-region, multi-cloud infrastructure (AWS, Kubernetes, Terraform, Docker)
- Own infrastructure provisioning through GitOps, Ansible, Helm, and IaC
- Set up observability stacks (Prometheus, Grafana) and write custom alerting rules
- Build for Zero Trust security — logs, secrets, audits, access policies
- Lead incident response, postmortems, and playbooks to reduce MTTR
- Automate and secure CI/CD pipelines with SAST, DAST, image hardening
- Script your way out of toil using Python, Bash, or LLM-based agents
- Work alongside dev, platform, and product teams to ship secure, scalable systems
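"Scripting your way out of toil" often starts with turning a recurring manual check into a few reusable lines. A sketch of a disk-usage check; the threshold and output format are arbitrary example choices:

```shell
#!/usr/bin/env bash
# Toil-reduction sketch: flag filesystems at or above a usage threshold.
set -euo pipefail

# Reads `df -P`-style output on stdin; prints one WARN line per filesystem
# whose use% is at or above the given limit.
check_usage() {
  local limit="$1"
  awk -v limit="$limit" 'NR > 1 {
    gsub(/%/, "", $5)                 # strip the % sign from the capacity column
    if ($5 + 0 >= limit) printf "WARN %s at %s%% (%s)\n", $1, $5, $6
  }'
}

# Typical use: run against the live system.
df -P | check_usage 90
```

Dropping a script like this into a cron job or an alerting pipeline is what turns a weekly manual sweep into something no one has to remember to do.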
What We’re Looking For:
You’ve probably done a lot of this already:
- 3–5+ years in DevOps / DevSecOps for high-availability SaaS or product infra
- Hands-on with Kubernetes, Terraform, Docker, and cloud-native tooling
- Strong in Linux internals, OS hardening, and network security
- Built and owned CI/CD pipelines, IaC, and automated releases
- Written scripts (Python/Bash) that saved your team hours
- Familiar with SOC 2, ISO 27001, threat detection, and compliance work
Bonus if you’ve:
- Played with LLMs or AI agents to streamline ops, or built bots that monitor, patch, or auto-deploy.
What It Means to Be a Flyter
- AI-native instincts: You don’t just use AI — you think in it. Your terminal window has a co-pilot.
- Ownership without oversight: You own outcomes, not tasks. No one micromanages you here.
- Joy in complexity: Security + infra + scale = your happy place.
- Radical candor: You give and receive sharp feedback early — and grow faster because of it.
- Loops over lines: We prioritize continuous feedback, iteration, and learning over one-way execution or rigid, linear planning.
- H3: Happy. Healthy. High-Performing. We believe long-term performance stems from an environment where you feel emotionally fulfilled, physically well, and deeply motivated.
- Systems > Heroics: We value well-designed, repeatable systems over last-minute firefighting or one-off effort.
Perks:
- Unlimited leave & flexible hours
- Top-tier health coverage
- Budget for AI tools and courses
- International deployments
- ESOPs and a high-agency team culture
Apply here → https://lnk.ink/CLqe2
A challenging opportunity to improve and manage a complex AWS infrastructure built for an ECG wearable, supporting live ECG streaming and patient monitoring.
Hi,
We are looking for candidates with experience in DevSecOps.
Please find the JD below for your reference.
Responsibilities:
Execute shell scripts for seamless automation and system management.
Implement infrastructure as code using Terraform for AWS, Kubernetes, Helm, kustomize, and kubectl.
Oversee AWS security groups, VPC configurations, and utilize Aviatrix for efficient network orchestration.
Contribute to the OpenTelemetry Collector for enhanced observability.
Implement microsegmentation using AWS native resources and Aviatrix for commercial routes.
Enforce policies through Open Policy Agent (OPA) integration.
Develop and maintain comprehensive runbooks for standard operating procedures.
Utilize packet tracing for network analysis and security optimization.
Apply OWASP tools and practices for robust web application security.
Integrate container vulnerability scanning tools seamlessly within CI/CD pipelines.
Define security requirements for source code repositories, binary repositories, and secrets managers in CI/CD pipelines.
Collaborate with software and platform engineers to infuse security principles into DevOps teams.
Regularly monitor and report project status to the management team.
Qualifications:
Proficient in shell scripting and automation.
Strong command of Terraform, AWS, Kubernetes, Helm, kustomize, and kubectl.
Deep understanding of AWS security practices, VPC configurations, and Aviatrix.
Familiarity with OpenTelemetry for observability and OPA for policy enforcement.
Experience in packet tracing for network analysis.
Practical application of OWASP tools and web application security.
Integration of container vulnerability scanning tools within CI/CD pipelines.
Proven ability to define security requirements for source code repositories, binary repositories, and secrets managers in CI/CD pipelines.
Collaboration expertise with DevOps teams for security integration.
Regular monitoring and reporting capabilities.
Site Reliability Engineering experience.
Hands-on proficiency with source code management tools, especially Git.
Cloud platform expertise (AWS, Azure, or GCP) with hands-on experience in deploying and managing applications.
Please send across your updated profile.
Objectives:
- Building and setting up new development tools and infrastructure
- Working on ways to automate and improve development and release processes
- Testing code written by others and analyzing results
- Ensuring that systems are safe and secure against cybersecurity threats
- Identifying technical problems and developing software updates and ‘fixes’
- Working with software developers and software engineers to ensure that development follows established processes and works as intended
- Planning out projects and being involved in project management decisions
Daily and Monthly Responsibilities:
- Deploy updates and fixes
- Build tools to reduce occurrences of errors and improve customer experience
- Develop software to integrate with internal back-end systems
- Perform root cause analysis for production errors
- Investigate and resolve technical issues
- Develop scripts to automate visualization
- Design procedures for system troubleshooting and maintenance
Skills and Qualifications:
- Degree in Computer Science or Software Engineering or BSc in Computer Science, Engineering or relevant field
- 3+ years of experience as a DevOps Engineer or similar software engineering role
- Proficient with git and git workflows
- Good logical skills and knowledge of programming concepts (OOP, data structures)
- Working knowledge of databases and SQL
- Problem-solving attitude
- Collaborative team spirit
- Understanding customer requirements and project KPIs
- Implementing various development, testing, automation tools, and IT infrastructure
- Planning the team structure and activities, and being involved in project management.
- Managing stakeholders and external interfaces
- Setting up tools and required infrastructure
- Defining and setting development, test, release, update, and support processes for DevOps operation
- Have the technical skill to review, verify, and validate the software code developed in the project.
- Troubleshooting techniques and fixing the code bugs
- Monitoring processes throughout the entire lifecycle for adherence, and updating or creating processes to drive improvement and minimize wastage
- Encouraging and building automated processes wherever possible
- Identifying and deploying cybersecurity measures by continuously performing vulnerability assessment and risk management
- Incident management and root cause analysis
- Coordination and communication within the team and with customers
- Selecting and deploying appropriate CI/CD tools
- Striving for continuous improvement and building a continuous integration, continuous delivery, and continuous deployment (CI/CD) pipeline
- Mentoring and guiding the team members
- Monitoring and measuring customer experience and KPIs
- Managing periodic reporting on the progress to the management and the customer
This company is a network of the world's best developers, offering full-time, long-term remote software jobs with better compensation and career growth. We enable our clients to accelerate their cloud offerings and capitalize on the cloud. We have our own IoT/AI platform and provide professional services on it to build custom clouds for our clients' IoT devices. We also build mobile apps and run 24x7 DevOps/site reliability engineering for our clients.
We are looking for very hands-on SRE (Site Reliability Engineering) engineers with 3 to 6 years of experience. The person will be part of a team responsible for designing and implementing automation from scratch for medium- to large-scale cloud infrastructure, and for providing 24x7 services to our North American and European customers. This also includes ensuring ~100% uptime for more than 50 internal sites. The person is expected to deliver with both high speed and high quality, and to work 40 hours per week (~6.5 hours per day, 6 days per week) in shifts that rotate every month.
This person MUST have:
- B.E. in Computer Science or equivalent
- 2+ years of hands-on experience troubleshooting and setting up Linux environments, with the ability to write shell scripts for any given requirement.
- 1+ years of hands-on experience setting up and configuring AWS or GCP services from scratch and maintaining them.
- 1+ years of hands-on experience setting up and configuring Kubernetes & EKS and ensuring high availability of container orchestration.
- 1+ years of hands-on experience setting up CI/CD from scratch in Jenkins & GitLab.
- Experience configuring and maintaining at least one monitoring tool.
- Excellent verbal & written communication skills.
- Candidates with certifications (AWS, GCP, CKA, etc.) will be preferred.
- Hands-on experience with databases (Cassandra, MongoDB, MySQL, RDS).
Experience:
- A minimum of 3 years of experience as an SRE automation engineer building, running, and maintaining production sites. We are not looking for candidates whose experience is limited to L1/L2 support or build & deploy.
Location:
- Remotely, anywhere in India
Timings:
- 40 hours per week (~6.5 hours per day, 6 days per week), in shifts that rotate every month.
Position:
- Full time/Direct
- We have great benefits such as PF, medical insurance, 12 annual company holidays, 12 PTO days per year, annual increments, a Diwali bonus, spot bonuses, and other incentives.
- We don't believe in locking people in with long notice periods. You will stay here because you love the company. Our notice period is only 15 days.
We are a global expert in cloud consulting and service management, focusing exclusively on the cloud DevOps space. In short, we strive to be at the forefront of this era of digital disruption by being dynamic, agile, and cohesive in providing businesses the solutions they need to reach the next level. Our expert team of engineers, programmers, designers, and business development professionals is the foundation of our firm, fused with cutting-edge technology. Nimble IT Consulting is invested in research and analysis of current and upcoming trends, be it technology, business value, or user experience; we dedicate our efforts tirelessly to being at the pinnacle of quality standards, devising solutions that industry leaders not only approve of and follow, but depend on. Read more about us: https://nimbleitconsulting.com
What we are looking for
A DevOps Engineer to join our team and provide consulting services to our clients, below is the technology stack we are interested in
Technical skills
- Expertise in implementing and managing DevOps CI/CD pipelines (using either Jenkins or Azure DevOps)
- At least one AWS or Azure certification
- Terraform scripting
- Hands-on experience with Git, source code management, and release management
- Experience with DevOps automation tools; well versed in DevOps principles and Agile frameworks
- Working knowledge of scripting using shell, Python, Gradle, YAML, Ansible, Puppet, or Chef
- Working knowledge of build systems for various technologies, such as npm and Maven
- Experience with, and a good understanding of, any of the cloud platforms: AWS, Azure, or Google Cloud
- Hands-on knowledge of Docker and Kubernetes is required
- Proficient troubleshooting skills with a proven ability to resolve complex technical issues; experience with ticketing tools (Jira & ServiceNow)
- A programming language such as Java, Go, or Node.js is nice to have
- Work permit for the United Kingdom (Tier 2 visa); the total visa duration will be 5 years (an initial 2 years, then a 3-year extension)
- At the end of the 5 years you will be eligible for British citizenship by applying for Indefinite Leave to Remain in the UK
- Learn new technologies - We won't ever expect you to do the same thing day in, day out; we want to give you the chance to explore the latest techniques to solve challenging technical problems and help you become the best developer you can be.
- Join a growing agile team that is consistently delivering.
- Technical Development Program
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, colour, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
A.P.T Portfolio is a high-frequency trading firm that specialises in quantitative trading and investment strategies. Founded in November 2009, it has been a major liquidity provider in global stock markets.
As a manager, you would be in charge of the DevOps team, and your remit will include the following:
- Private Cloud - Design and maintain a high-performance, reliable network architecture to support HPC applications.
- Scheduling Tool - Implement and maintain an HPC scheduling technology such as Kubernetes, Hadoop YARN, Mesos, HTCondor, or Nomad for processing and scheduling analytical jobs. Implement controls that allow analytical jobs to seamlessly utilize idle capacity on the private cloud.
- Security - Implement best security practices and a data isolation policy between different internal divisions.
- Capacity Sizing - Monitor private-cloud usage and share details with different teams. Plan capacity enhancements on a quarterly basis.
- Storage Solutions - Optimize storage solutions such as NetApp, EMC, and Quobyte for analytical jobs. Monitor their performance daily to identify issues early.
- NFS - Implement and optimize the latest version of NFS for our use case.
- Public Cloud - Drive AWS/Google Cloud utilization in the firm to increase efficiency, improve collaboration, and reduce cost. Maintain the environment for our existing use cases, and explore further areas where the public cloud can be used within the firm.
- Backups - Identify and automate backups of all crucial data, binaries, code, etc., in a secure manner, at intervals warranted by the use case. Ensure that recovery from backup is tested and seamless.
- Access Control - Maintain passwordless access control and improve security over time. Minimize failures of automated jobs due to unsuccessful logins.
- Operating System - Plan, test, and roll out new operating systems for all production, simulation, and desktop environments. Work closely with developers to highlight the performance-enhancement capabilities of new versions.
- Configuration Management - Work closely with the DevOps and development teams to freeze configurations/playbooks for various teams and internal applications. Deploy and maintain standard tools such as Ansible, Puppet, and Chef for the same.
- Data Storage & Security Planning - Maintain tight control of root access on various devices. Ensure root access is rolled back as soon as the desired objective is achieved.
- Audit access logs on devices. Use third-party tools to put a monitoring mechanism in place for early detection of any suspicious activity.
- Third-Party Tools - Maintain all tools used for development and collaboration, including a fault-tolerant environment for Git/Perforce, productivity tools such as Slack/Microsoft Teams, and build tools like Jenkins/Bamboo.
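The backup responsibility above hinges on one principle: a backup only counts once a restore from it has been verified. A minimal sketch, with throwaway temp directories standing in for a real source tree and backup storage:

```shell
#!/usr/bin/env bash
# Backup-and-verify sketch. Paths are hypothetical; a real setup would
# target code/config directories and ship archives off-host.
set -euo pipefail

SRC="$(mktemp -d)"                  # stand-in for a real source directory
DEST="$(mktemp -d)"                 # stand-in for backup storage
echo "important" > "$SRC/config.txt"

# Create a dated, compressed archive of the source tree.
ARCHIVE="$DEST/backup-$(date +%Y%m%d).tar.gz"
tar -czf "$ARCHIVE" -C "$SRC" .

# Test the restore: extract to a scratch directory and compare trees.
RESTORE="$(mktemp -d)"
tar -xzf "$ARCHIVE" -C "$RESTORE"
diff -r "$SRC" "$RESTORE" && echo "restore verified"
```

In production the same loop runs on a schedule, with the diff step replaced by checksum comparison against the live data and an alert on any mismatch.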
Qualifications
- Bachelors or Masters Level Degree, preferably in CSE/IT
- 10+ years of relevant experience in sys-admin function
- Must have strong knowledge of IT infrastructure, Linux, networking, and grid computing.
- Must have a strong grasp of automation and data-management tools.
- Proficient in scripting languages, including Python.
Desirables
- Professional attitude, co-operative and mature approach to work, must be focused, structured and well considered, troubleshooting skills.
- Exhibit a high level of individual initiative and ownership, effectively collaborate with other team members.
APT Portfolio is an equal opportunity employer.
