Role : SRE
Experience : 4 - 8 Years
- Experience in building, deploying and operating cloud solutions on Kubernetes
- Strong expertise in administering and scaling Kubernetes on bare metal; CKA preferred
- Expertise in K8s interfaces (CNI, CSI, CRI) and service meshes
- Hands-on experience in a DevOps or automation development role
- Demonstrable knowledge of TCP/IP, Linux operating system internals, filesystems, disk/storage technologies and storage protocols.
- Experience working with Helm Charts and building out Infrastructure As Code (IaC)
- Experience in writing software to automate orchestration tasks at scale; we commonly use Python, Go, and Shell scripting
- Knowledge of systems (Linux, GNU tooling), networking (OSI model, DNS, routing) and virtualization vs containerization
- Expertise in CI/CD tooling for cloud-based applications, specifically Terraform / CloudFormation, Jenkins and Git
- Experience architecting CNF orchestration with Kubernetes
- Strong understanding of the principles of 12-factor apps and modern containerized microservices
- Plan for reliability by designing systems to work across our multi-region and multi-cloud environments
- Experience developing and using Application & Integration stacks/tools such as Kafka, Spring Cloud, Apache Camel, Kubernetes, Docker, Redis, Knative, and NoSQL

Similar jobs
DevOps Engineer (Cloud & Infrastructure)
📍 Noida | 🕐 Full-Time | 🧭 Experience: 2–4 years
About TestMu AI
TestMu AI (formerly LambdaTest) is an AI-native platform designed to move software testing beyond simple automation into the era of agentic intelligence. It provides end-to-end AI agents that manage the entire Quality Engineering lifecycle.
- Full-Stack AI Agents: Autonomously plan, author, execute, and analyze tests across the SDLC.
- Comprehensive Coverage: Supports web, mobile, and enterprise applications.
- Real-World Testing: Scale execution across real devices, browsers, and custom environments.
About the Role
This isn't a role for someone who just wants to "maintain" systems. As a DevOps Engineer at TestMu AI, you are the architect of the automated highways that power our AI agents. You will step into a fast-paced environment where you bridge the gap between cloud-native automation and core infrastructure.
You will manage complex CI/CD pipelines, troubleshoot deep-seated Linux issues, and ensure our hybrid-cloud environment (AWS/Azure) is as resilient as the code it runs.
Key Responsibilities: The Pillars of Growth
A. DevOps & Automation (50% Focus)
- Platform Orchestration: Lead the migration to modular, self-healing Terraform and Helm templates.
- Agentic CI/CD: Architect GitHub Actions workflows that treat AI agents as first-class citizens, automating environment promotion and risk scoring.
- Kubernetes Mastery: Advanced management of Docker and K8s clusters to support scalable production workloads.
- Predictive Observability: Use Prometheus, Grafana, and ELK to move from reactive alerts to autonomous anomaly detection.
B. Networking & Data Center Mastery (30% Focus)
- Hybrid Networking: Design and troubleshoot VPCs and subnets in Azure/AWS/GCP, paired with physical VLANs and switches in our data centers.
- Bare-Metal Lifecycle: Automate hardware provisioning, RAID setup, and firmware updates for our real-device cloud.
- Remote Admin: Master out-of-band management (iDRAC, iLO, IPMI) to ensure 100% remote operational capability.
- Core Protocols: Own the lifecycle of DNS, DHCP, Load Balancing, and IPAM across distributed environments.
C. Development & Scripting (20% Focus)
- Backend Integration: Debug and optimize Python or Go code, understanding how application logic interacts with system-level resources.
- Advanced Scripting: Write idempotent Bash/Python scripts to automate complex, multi-server operational tasks.
- Agentic Tooling: Support the integration of LLM-based developer tools into DevOps workflows to eliminate "toil".
The Interview Journey
We value your ability to solve problems under pressure more than your ability to memorize documentation.
- Technical Round 1 (DevOps Leads): A live session focused on real-world debugging scenarios and Linux fundamentals.
- Technical Round 2 (Hiring Manager / Pod Lead): An assessment of your architectural thinking, automation strategy, and team alignment.
- Technical Round 3 (SVP Engineering / VP DevOps): Strategic discussion on scalability, infrastructure vision, and technical leadership.
- Final Round (CEO): Mission alignment, cultural fit, and the "big picture" at TestMu AI.
Growth Timeline
This is a high-visibility role. You will receive direct mentorship from our senior engineering leadership. As you master our production environment, you will have a clear path to move into Senior DevOps Engineer or Infrastructure Architect roles as our pods scale.
Perks That Matter
- Health Cover: Comprehensive insurance for you and your family.
- Fresh Meals: Daily catered meals at the office.
- Transport: Safe cab facilities for eligible shifts.
- Pod Budgets: Dedicated engagement budgets for team building and offsites.
Job Title: Senior DevOps Engineer
Location: Gurgaon – Sector 39
Work Mode: 5 Days Onsite
Experience: 5+ Years
About the Role
We are looking for an experienced Senior DevOps Engineer to build, manage, and maintain highly reliable, scalable, and secure infrastructure. The role involves deploying product updates, handling production issues, implementing customer integrations, and leading DevOps best practices across teams.
Key Responsibilities
- Manage and maintain production-grade infrastructure ensuring high availability and performance.
- Deploy application updates, patches, and bug fixes across environments.
- Handle Level-2 support and resolve escalated production issues.
- Perform root cause analysis and implement preventive solutions.
- Build automation tools and scripts to improve system reliability and efficiency.
- Develop monitoring, logging, alerting, and reporting systems.
- Ensure secure deployments following data encryption and cybersecurity best practices.
- Collaborate with development, product, and QA teams for smooth releases.
- Lead and mentor a small DevOps team (3–4 engineers).
Core Focus Areas
Server Setup & Management (60%)
- Hands-on management of bare-metal servers.
- Server provisioning, configuration, and lifecycle management.
- Network configuration including redundancy, bonding, and performance tuning.
Queue Systems – Kafka / RabbitMQ (15%)
- Implementation and management of message queues for distributed systems.
Storage Systems – SAN / NAS (15%)
- Setup and management of enterprise storage systems.
- Ensure backup, recovery, and data availability.
Database Knowledge (5%)
- Working experience with Redis, MySQL/PostgreSQL, MongoDB, Elasticsearch.
- Basic database administration and performance tuning.
Telecom Exposure (Good to Have – 5%)
- Experience with SMS, voice systems, or real-time data processing environments.
Technical Skills Required
- Linux administration & Shell scripting
- CI/CD tools – Jenkins
- Git (GitHub / SVN) and branching strategies
- Docker & Kubernetes
- AWS cloud services
- Ansible for configuration management
- Databases: MySQL, MariaDB, MongoDB
- Web servers: Apache, Tomcat
- Load balancing & HA: HAProxy, Keepalived
- Monitoring tools: Nagios and related observability stacks
Company Overview:
Planview has one mission: to build the future of connected work with market-leading portfolio management and work management solutions. Planview is a recognized innovator and industry leader. Our solutions enable organizations to connect the business from ideas to impact, empowering companies to accelerate the achievement of what matters most. Our solutions span every class of work, resource, and organization to address the varying needs of diverse and distributed teams, departments, and enterprises.
As a Sr CloudOps Engineer II, you will oversee teams of Engineers and be a champion for configuration management, technologies in the cloud, and continuous improvement. You will work closely with global leaders to ensure that our applications, infrastructure, and processes are scalable, secure, and supportable. By leveraging your production experience and development skills you will work hand in hand with Engineers (Dev, DevOps, DBOps) to design and implement solutions that improve delivery of value to customers, reduce costs, and eliminate toil.
Responsibilities (What you will do):
- Guide the professional development of Engineers and support the teams to accomplish business goals
- Work closely with leaders in Israel to align on priorities and architect, deliver, and manage our products
- Build systems that are secure, scalable, and self-healing.
- Manage and improve deployment pipelines.
- Triage and remediate production issues.
- Participate in on-call rotations for escalations.
Qualifications (What you will bring):
- Bachelor's degree in CS or equivalent experience in a related field.
- 2+ years managing Engineering teams.
- 8+ years of experience as a site reliability or platform engineer, preferably in a fast-scaling environment
- 5+ years administering Linux and Windows environments.
- 3+ years programming / scripting experience (e.g., Python, JavaScript, PowerShell)
- Strong technical knowledge of operating systems (Linux and Windows), virtualization, storage systems, networking, and firewall implementations
- Experience maintaining production environments on premises (90%) and in the cloud (10%) (e.g., AWS, Google Cloud, Azure)
- Solid understanding of networking principles and how they apply to data flow and security.
- Experience automating deployments of available cloud-based services (e.g., AWS EC2 / RDS, Docker, Kubernetes)
- Experience managing CI/CD infrastructures, with strong proficiency in platforms like Bitbucket and Jenkins to streamline deployment pipelines and ensure efficient software delivery.
- Management of resources using Infrastructure as Code tools (e.g., CloudFormation, Terraform, Chef)
- Knowledge of observability tools such as LogicMonitor, New Relic, Prometheus, and Coralogix, as well as their implementation.
- Worked within Agile and Lean software development teams.
- Experience working in globally distributed teams.
- Ability to look at the big picture and manage risks.
Role Overview:
As a DevOps Engineer (L2), you will play a key role in designing, implementing, and optimizing infrastructure. You will take ownership of automating processes, improving system reliability, and supporting the development lifecycle.
Key Responsibilities:
- Design and manage scalable, secure, and highly available cloud infrastructure.
- Lead efforts in implementing and optimizing CI/CD pipelines.
- Automate repetitive tasks and develop robust monitoring solutions.
- Ensure the security and compliance of systems, including IAM, VPCs, and network configurations.
- Troubleshoot complex issues across development, staging, and production environments.
- Mentor and guide L1 engineers on best practices.
- Stay updated on emerging DevOps tools and technologies.
- Manage cloud resources efficiently using Infrastructure as Code (IaC) tools like Terraform and AWS CloudFormation.
Qualifications:
- Bachelor’s degree in Computer Science, IT, or a related field.
- Proven experience with CI/CD pipelines and tools like Jenkins, GitLab, or Azure DevOps.
- Advanced knowledge of cloud platforms (AWS, Azure, or GCP) with hands-on experience in deployments, migrations, and optimizations.
- Strong expertise in containerization (Docker) and orchestration tools (Kubernetes).
- Proficiency in scripting languages like Python, Bash, or PowerShell.
- Deep understanding of system security, networking, and load balancing.
- Strong analytical skills and problem-solving mindset.
- Certifications (e.g., AWS Certified Solutions Architect, Kubernetes Administrator) are a plus.
What We Offer:
- Opportunity to work with a cutting-edge tech stack in a product-first company.
- Collaborative and growth-oriented environment.
- Competitive salary and benefits.
- Freedom to innovate and contribute to impactful projects.
Job Description:
• Drive end-to-end automation from GitHub/GitLab/BitBucket to Deployment,
Observability and Enabling the SRE activities
• Guide operations support (setup, configuration, management, troubleshooting) of
digital platforms and applications
• Solid understanding of DevSecOps Workflows that support CI, CS, CD, CM, CT.
• Deploy, configure, and manage SaaS and PaaS cloud platform and applications
• Provide Level 1 (OS, patching) and Level 2 (app server instance troubleshooting)
support
• DevOps programming: writing scripts, building operations/server instance/app/DB
monitoring tools
• Set up / manage continuous build and dev project management
environment: Jenkins X/GitHub Actions/Tekton, Git, Jira
• Designing secure networks, systems, and application architectures
• Collaborating with cross-functional teams to ensure secure product development
• Disaster recovery, network forensics analysis, and pen-testing solutions
• Planning, researching, and developing security policies, standards, and procedures
• Awareness training of the workforce on information security standards, policies, and
best practices
• Installation and use of firewalls, data encryption and other security products and
procedures
• Maturity in understanding compliance, policy and cloud governance and ability to
identify and execute automation.
• At Wesco, we discuss more about solutions than problems. We celebrate innovation
and creativity.
Objectives :
- Building and setting up new development tools and infrastructure
- Working on ways to automate and improve development and release processes
- Testing code written by others and analyzing results
- Ensuring that systems are safe and secure against cybersecurity threats
- Identifying technical problems and developing software updates and ‘fixes’
- Working with software developers and software engineers to ensure that development follows established processes and works as intended
- Planning out projects and being involved in project management decisions
Daily and Monthly Responsibilities :
- Deploy updates and fixes
- Build tools to reduce occurrences of errors and improve customer experience
- Develop software to integrate with internal back-end systems
- Perform root cause analysis for production errors
- Investigate and resolve technical issues
- Develop scripts to automate visualization
- Design procedures for system troubleshooting and maintenance
Skills and Qualifications :
- Degree in Computer Science or Software Engineering, or BSc in Computer Science, Engineering, or a relevant field
- 3+ years of experience as a DevOps Engineer or similar software engineering role
- Proficient with git and git workflows
- Good logical skills and knowledge of programming concepts (OOP, data structures)
- Working knowledge of databases and SQL
- Problem-solving attitude
- Collaborative team spirit
- Good knowledge of at least one language (C#, Java, Python, Go, PHP, Node.js)
- Sufficient experience with application and infrastructure architectures
- Design and plan cloud solution architecture
- Design for security, networking, and compliance
- Analyze and optimize technical and business processes
- Ensure solution and operational reliability
- Manage and provision cloud infrastructure
- Manage IaaS, PaaS, and SaaS solutions
- Design strategies around cloud governance, migration, Cloud operations and DevOps
- Design highly scalable, available, and reliable cloud applications
- Build and test applications
- Deploy applications on cloud
- Integration with cloud services
Certification:
- Architect level certificate of any cloud (AWS, GCP, Azure)
About the job
Our goal
We are reinventing the future of MLOps. The Censius Observability platform gives businesses greater visibility into how their AI makes decisions so they can understand it better. We enable explanations of predictions, continuous monitoring of drift, and fairness assessment in the real world. (TL;DR: build the best ML monitoring tool.)
The culture
We believe in constantly iterating and improving our team culture, just like our product. We have found a good balance between async and sync work: the default is still Notion docs over meetings, but we recognize that, as an early-stage startup, brainstorming together over calls leads to results faster. If you enjoy taking ownership, moving quickly, and writing docs, you will fit right in.
The role:
Our engineering team is growing and we are looking to bring on board a senior software engineer who can help us transition to the next phase of the company. As we roll out our platform to customers, you will be pivotal in refining our system architecture, ensuring the various tech stacks play well with each other, and smoothing the DevOps process.
On the platform, we use Python (ML-related jobs), Golang (core infrastructure), and NodeJS (user-facing). The platform is 100% cloud-native and we use Envoy as a proxy (which will eventually lead to a service-mesh architecture).
By joining our team, you will get exposure to working across a swath of modern technologies while building an enterprise-grade ML platform in the most promising area.
Responsibilities
- Be the bridge between engineering and product teams. Understand long-term product roadmap and architect a system design that will scale with our plans.
- Take ownership of converting product insights into detailed engineering requirements. Break these down into smaller tasks and work with the team to plan and execute sprints.
- Author high-quality, high-performance, and unit-tested code running in a distributed environment using containers.
- Continually evaluate and improve DevOps processes for a cloud-native codebase.
- Review PRs, mentor others and proactively take initiatives to improve our team's shipping velocity.
- Leverage your industry experience to champion engineering best practices within the organization.
Qualifications
Work Experience
- 3+ years of industry experience (2+ years in a senior engineering role), preferably with some past exposure to leading remote development teams.
- Proven track record building large-scale, high-throughput, low-latency production systems, with at least 3 years working with customers, architecting solutions, and delivering end-to-end products.
- Fluency in writing production-grade Go or Python in a microservice architecture with containers/VMs for 3+ years.
- 3+ years of DevOps experience (Kubernetes, Docker, Helm and public cloud APIs)
- Worked with relational (SQL) as well as non-relational databases (Mongo or Couch) in a production environment.
- (Bonus: worked with big data in data lakes/warehouses).
- (Bonus: built an end-to-end ML pipeline)
Skills
- Strong documentation skills. As a remote team, we heavily rely on elaborate documentation for everything we are working on.
- Ability to motivate, mentor, and lead others (we have a flat team structure, but the team would rely upon you to make important decisions)
- Strong independent contributor as well as a team player.
- Working knowledge of ML and familiarity with concepts of MLOps
Benefits
- Competitive Salary
- Work Remotely
- Health insurance
- Unlimited Time Off
- Support for continual learning (free books and online courses)
- Reimbursement for streaming services (think Netflix)
- Reimbursement for gym or physical activity of your choice
- Flex hours
- Leveling Up Opportunities
You will excel in this role if
- You have a product mindset. You understand, care about, and can relate to our customers.
- You take ownership, collaborate, and follow through to the very end.
- You love solving difficult problems, stand your ground, and get what you want from engineers.
- You resonate with our core values of innovation, curiosity, accountability, trust, fun, and social good.

- AWS Cloud, CI/CD, serverless setups, monitoring setup
- Performance setup, hands-on scalability experience, Linux expertise, DevOps operations
The mission of R&D IT Design Infrastructure is to offer a state-of-the-art design environment for the chip hardware designers. The R&D IT design environment is a complex landscape of EDA Applications, High Performance Compute, and Storage environments, consolidated in five regional datacenters. Over 7,000 chip hardware designers, spread across 40+ locations around the world, use this state-of-the-art design environment to design new chips and drive the innovation of the company. The following figures give an idea of the scale: the landscape has 75,000+ cores, 30+ PBytes of data, and serves 2,000+ CAD applications and versions. The operational service teams are globally organized to provide 24/7 support to the chip hardware design and software design projects.
Since the landscape is too complex to manage the traditional way, it is our strategy to transform our R&D IT design infrastructure into “software-defined datacenters”. This transformation entails a different way of working and a different mind-set (DevOps, Site Reliability Engineering) to ensure that our IT services are reliable. That’s why we are looking for a DevOps Linux Engineer to strengthen the team that is building a new on-premises software-defined virtualization and containerization platform (PaaS) for our IT landscape, so that we can manage it with best practices from software engineering and offer the IT service reliability required by our chip hardware design community.
It will be your role to develop and maintain the base Linux OS images that are offered via automation to the customers of the internal (on-premise) cloud platforms.
Your responsibilities as DevOps Linux Engineer:
• Develop and maintain the base RedHat Linux operating system images
• Develop and maintain code to configure and test the base OS image
• Provide input to support the team to design, develop and maintain automation
products with playbooks (YAML) and modules (Python/PowerShell) in tools like
Ansible Tower and ServiceNow
• Test and verify the code produced by the team (including your own) to continuously
improve and refactor
• Troubleshoot and solve incidents on the RedHat Linux operating system
• Work actively with other teams to align on the architecture of the PaaS solution
• Keep the base OS image up to date via patches or make sure patches are available to
the virtual machine owners
• Train team members and others with your extensive automation knowledge
• Work together with ServiceNow developers in your team to provide the best intuitive
end-user experience possible for the virtual machine OS deployments
We are looking for a DevOps engineer/consultant with the following characteristics:
• Master's or Bachelor's degree
• You are a technical, creative, analytical and open-minded engineer who is eager to
learn and not afraid to take initiative.
• Your favorite t-shirt has “Linux” or “RedHat” printed on it at least once.
• Linux guru: You have great knowledge of Linux servers (RedHat), RedHat Satellite 6
and other RedHat products
• Experience in Infrastructure services, e.g., Networking, DNS, LDAP, SMTP
• DevOps mindset: You are a team player who is eager to develop and maintain cool
products to automate/optimize processes in a complex IT infrastructure, and you are
able to build and maintain productive working relationships
• You have great English communication skills, both verbal and written.
• Willingness to work outside business hours to support the platform for critical R&D
applications
Other competences we value, but are not strictly mandatory:
• Experience with agile development methods, like Scrum, and a conviction in their
power to deliver products with immense (business) value.
• “Security” is your middle name, and you are always challenging yourself and your
colleagues to design and develop new solutions that are as secure as possible.
• Mastery of automation and orchestration with tools like Ansible Tower (or
comparable), and comfort with developing new modules in Python or
PowerShell.
• It would be awesome if you are already a true Yoda when it comes to code version
control and branching strategies with Git, and preferably have worked with GitLab
before.
• Experience with automated testing in a CI/CD pipeline with Ansible, Python and tools
like Selenium.
• Enthusiasm for cloud platforms like Azure & AWS.
• Background in and affinity with R&D
