
BENEFITS:
- Competitive salary and stock options
- Premium medical benefits
LOCATION:
- Remote (India)
EDUCATION AND EXPERIENCE:
- Degree in computer science, software engineering, or a related field
- At least 3-5 years of professional work experience as a DevOps / test automation / deployment engineer
- Experience in agile software development methodologies
JOB RESPONSIBILITIES:
- Design and develop a scalable software test framework to automate test procedures
- Perform periodic health checks of existing sites (manually and by developing an automation pipeline)
- Manage per-site software releases
- Generate release notes, user guides, and technical documentation
- Perform upgrades/downgrades (patch management)
- Generate and manage configurations
- Suggest process improvements by interfacing with the product owner
- File and track process-, installation-, and deployment-related issues
- Verify customer-reported issues and translate them into technical tasks for development
REQUIREMENTS:
The following are must-have requirements:
- Automation frameworks like Selenium, etc.
- Scripting languages like Perl, Python, etc.
- Linux systems
- Version control systems like Git
- Issue tracking systems like Jira
- Networking protocols
- Cloud infrastructure
- Ability to work individually and as part of a team with a sense of urgency
- Excellent communication skills in written and verbal English
- Great attention to detail
PREFERRED:
The following are good-to-have requirements:
- Knowledge of any programming language like C, C++, etc.
- Experience managing cloud-based (e.g., AWS, Google Cloud, etc.) and in-house server infrastructure
- Familiarity with machine learning / artificial intelligence infrastructure
- Experience in data visualization and statistics
- Basic knowledge of hardware infrastructure, including routers, switches, and other devices
- Familiarity with web & data security

About the Company:
Gruve is an innovative Software Services startup dedicated to empowering Enterprise Customers in managing their Data Life Cycle. We specialize in Cyber Security, Customer Experience, Infrastructure, and advanced technologies such as Machine Learning and Artificial Intelligence. Our mission is to assist our customers in their business strategies utilizing their data to make more intelligent decisions. As a well-funded early-stage startup, Gruve offers a dynamic environment with strong customer and partner networks.
Why Gruve:
At Gruve, we foster a culture of innovation, collaboration, and continuous learning. We are committed to building a diverse and inclusive workplace where everyone can thrive and contribute their best work. If you’re passionate about technology and eager to make an impact, we’d love to hear from you.
Gruve is an equal opportunity employer. We welcome applicants from all backgrounds and thank all who apply; however, only those selected for an interview will be contacted.
Position summary:
We are seeking a Staff Engineer – DevOps with 8-12 years of experience in designing, implementing, and optimizing CI/CD pipelines, cloud infrastructure, and automation frameworks. The ideal candidate will have expertise in Kubernetes, Terraform, CI/CD, Security, Observability, and Cloud Platforms (AWS, Azure, GCP). You will play a key role in scaling and securing our infrastructure, improving developer productivity, and ensuring high availability and performance.
Key Roles & Responsibilities:
- Design, implement, and maintain CI/CD pipelines using tools like Jenkins, GitLab CI/CD, ArgoCD, and Tekton.
- Deploy and manage Kubernetes clusters (EKS, AKS, GKE) and containerized workloads.
- Automate infrastructure provisioning using Terraform, Ansible, Pulumi, or CloudFormation.
- Implement observability and monitoring solutions using Prometheus, Grafana, ELK, OpenTelemetry, or Datadog.
- Ensure security best practices in DevOps, including IAM, secrets management, container security, and vulnerability scanning.
- Optimize cloud infrastructure (AWS, Azure, GCP) for performance, cost efficiency, and scalability.
- Develop and manage GitOps workflows and infrastructure-as-code (IaC) automation.
- Implement zero-downtime deployment strategies, including blue-green deployments, canary releases, and feature flags.
- Work closely with development teams to optimize build pipelines, reduce deployment time, and improve system reliability.
Basic Qualifications:
- A bachelor’s or master’s degree in computer science, electronics engineering or a related field
- 8-12 years of experience in DevOps, Site Reliability Engineering (SRE), or Infrastructure Automation.
- Strong expertise in CI/CD pipelines, version control (Git), and release automation.
- Hands-on experience with Kubernetes (EKS, AKS, GKE) and container orchestration.
- Proficiency in Terraform and Ansible for infrastructure automation.
- Experience with AWS, Azure, or GCP services (EC2, S3, IAM, VPC, Lambda, API Gateway, etc.).
- Expertise in monitoring/logging tools such as Prometheus, Grafana, ELK, OpenTelemetry, or Datadog.
- Strong scripting and automation skills in Python, Bash, or Go.
Preferred Qualifications:
- Experience in FinOps (Cloud Cost Optimization) and Kubernetes cluster scaling.
- Exposure to serverless architectures and event-driven workflows.
- Contributions to open-source DevOps projects.
The Key Responsibilities Include, But Are Not Limited To:
Help identify and drive speed, performance, scalability, and reliability optimizations based on experience and learnings from production incidents.
Work in an agile DevSecOps environment to create, maintain, monitor, and automate the overall solution deployment.
Understand and explain the effect of product architecture decisions on systems.
Identify issues and/or opportunities for improvements that are common across multiple services/teams.
This role will require weekend deployments.
Skills and Qualifications:
1. 3+ years of experience in a DevOps end-to-end development process with a heavy focus on service monitoring and site reliability engineering work.
2. Advanced knowledge of programming/scripting languages (Bash, Perl, Python, Node.js).
3. Experience in Agile/Scrum enterprise-scale software development, including working with Git, Jira, Confluence, etc.
4. Advanced experience with core microservice technology (RESTful development).
5. Working knowledge of advanced AI/ML tools is a plus.
6. Working knowledge of one or more cloud services: Amazon AWS, Microsoft Azure.
7. Bachelor’s or Master’s degree in Computer Science, or equivalent experience in a related field.
Key Behaviours / Attitudes:
Professional curiosity and a desire to develop a deep understanding of services and technologies.
Experience building and running systems to drive high availability, performance, and operational improvements.
Excellent written and oral communication skills, with the ability to ask pertinent questions and to assess, aggregate, and report the responses.
Ability to quickly grasp and analyze complex and rapidly changing systems.
Soft skills:
1. Self-motivated and self-managing.
2. Excellent communication / follow-up / time management skills.
3. Ability to fulfill role/duties independently within defined policies and procedures.
4. Ability to multitask and balance multiple priorities while maintaining a high level of customer satisfaction.
5. Be able to work in an interrupt-driven environment.
Work with Dori Ai's world-class technology to develop, implement, and support Dori's global infrastructure.
As a member of the IT organization, assist with the analysis of existing complex programs and formulate logic for new complex internal systems. Prepare flowcharts, perform coding, and test/debug programs. Develop conversion and system implementation plans. Recommend changes to development, maintenance, and system standards.
Leading contributor individually and as a team member, providing direction and mentoring to others. Work is non-routine and very complex, involving the application of advanced technical/business skills in a specialized area. BS or equivalent experience in programming on enterprise or department servers or systems.

- Provides free and subscription-based website and email services hosted and operated at data centres in Mumbai and Hyderabad.
- Serves a global audience and customers through sophisticated content delivery networks.
- Operates a service infrastructure using the latest technologies for web services, along with a very large storage infrastructure.
- Provides virtualized infrastructure that allows seamless migration and the addition of services for scalability.
- Has been a pioneer and among the earliest adopters of public cloud and NoSQL big data stores for more than a decade.
- Provides innovative internet services, working with multiple technologies such as PHP, Java, Node.js, Python, and C++ to scale services as needed.
- Has Internet infrastructure peering arrangements with all the major and minor ISPs and telecom service providers.
- Has mail traffic exchange agreements with major Internet services.
Job Details :
- This position provides a competitive professional opportunity to both experienced and aspiring engineers. The company's technology and operations groups are managed by senior professionals with deep subject matter expertise.
- The company believes in having an open work environment that offers mentoring and learning opportunities, with an informal and flexible work culture that allows professionals to actively participate in and contribute to the success of our services and business.
- You will be part of a team that keeps the business running for cloud products and services that are used 24/7 by the company's consumers and enterprise customers around the world. You will help operate, maintain, and provide escalation support for the company's cloud infrastructure, which powers all of its cloud offerings.
Job Role :
- As a senior engineer, your role grows as you gain experience in our operations. We facilitate a hands-on learning experience after an induction program, to get you into the role as quickly as possible.
- The systems engineer role also requires candidates to research and recommend innovative and automated approaches for system administration tasks.
- The work culture allows a seamless integration with different product engineering teams. The teams work together and share responsibility to triage in complex operational situations. The candidate is expected to stay updated on best practices and help evolve processes both for resilience of services and compliance.
- You will be required to provide support for both production and non-production environments to ensure system updates and expected service levels. Specifically, you will be required to handle 24/7 L2 and L3 oversight for incident responses and to have an excellent understanding of the end-to-end support process, from the client through the different support escalation levels.
- The role also requires the discipline to create, update, and maintain process documents based on operational incidents and on the technologies and tools used to resolve issues.
QUALIFICATION AND EXPERIENCE :
- A graduate degree or senior diploma in engineering or technology with some or all of the following:
- Knowledge and work experience with KVM, AWS (Glacier, S3, EC2), RabbitMQ, Fluentd, Syslog, Nginx is preferred
- Installation and tuning of Web Servers, PHP, Java servlets, memory-based databases for scalability and performance
- Knowledge of email-related protocols such as SMTP, POP3, and IMAP, along with experience in the maintenance and administration of MTAs such as postfix, qmail, etc., will be an added advantage
- Must have knowledge of monitoring tools, trend analysis, networking technologies, security tools, and troubleshooting.
- Knowledge of analyzing and mitigating security related issues and threats is certainly desirable.
- Knowledge of agile development/SDLC processes and hands-on participation in planning sprints and managing daily scrum is desirable.
- Preferably, programming experience in Shell, Python, Perl or C.
- Cloud and virtualization-based technologies (Amazon Web Services (AWS), VMware).
- Java Application Server Administration (WebLogic, WildFly, JBoss, Tomcat).
- Docker and Kubernetes (EKS)
- Linux/UNIX Administration (Amazon Linux and RedHat).
- Developing and supporting cloud infrastructure designs and implementations and guiding application development teams.
- Configuration Management tools (Chef, Puppet, or Ansible).
- Log aggregation tools such as Elastic and/or Splunk.
- Automate infrastructure and application deployment-related tasks using Terraform.
- Automate repetitive tasks required to maintain a secure and up-to-date operational environment.
Responsibilities
- Build and support always-available private/public cloud-based software-as-a-service (SaaS) applications.
- Build AWS or other public cloud infrastructure using Terraform.
- Deploy and manage Kubernetes (EKS) based Docker applications in AWS.
- Create custom OS images using Packer.
- Create and revise infrastructure and architectural designs and implementation plans and guide the implementation with operations.
- Liaise between application development, infrastructure support, and tools (IT Services) teams.
- Develop and document Chef recipes and/or Ansible scripts. Provide support throughout the entire deployment lifecycle (development, quality assurance, and production).
- Help developers leverage infrastructure, application, and cloud platform features and functionality; participate in code and design reviews; and support developers by building CI/CD pipelines using Bamboo, Jenkins, or Spinnaker.
- Create knowledge-sharing presentations and documentation to help developers and operations teams understand and leverage the system's capabilities.
- Learn on the job and explore new technologies with little supervision.
- Leverage scripting (BASH, Perl, Ruby, Python) to build required automation and tools on an ad-hoc basis.
Who we have in mind:
- Solid experience in building a solution on AWS or other public cloud services using Terraform.
- Excellent problem-solving skills with a desire to take on responsibility.
- Extensive knowledge of containerized applications and their deployment in Kubernetes.
- Extensive knowledge of the Linux operating system, RHEL preferred.
- Proficiency with shell scripting.
- Experience with Java application servers.
- Experience with Git and Subversion.
- Excellent written and verbal communication skills with the ability to communicate technical issues to non-technical and technical audiences.
- Experience working in a large-scale operational environment.
- Internet and operating system security fundamentals.
- Extensive knowledge of massively scalable systems. Linux operating system/application development desirable.
- Programming in scripting languages such as Python. Other object-oriented languages (C++, Java) are a plus.
- Experience with Configuration Management Automation tools (Chef or Puppet).
- Experience with virtualization, preferably on multiple hypervisors.
- BS/MS in Computer Science or equivalent experience.
- Excellent written and verbal skills.
Education or Equivalent Experience:
- Bachelor's degree or equivalent education in related fields
- Certificates of training in associated fields/equipment
REVOS is a smart micro-mobility platform that works with enterprises across the automotive shared mobility value chain to enable and accelerate their smart vehicle journeys. Founded in 2017, it aims to empower all two- and three-wheeler vehicles through AI-integrated IoT solutions that will make them smart, safe, and connected. We are backed by investors like USV and Prime Venture.
Duties and Responsibilities :
- Automating various tasks in cloud operations, deployment, monitoring, and performance optimization for big data stack.
- Build, release, and configuration management of production systems.
- System troubleshooting and problem-solving across platform and application domains.
- Suggesting architecture improvements, recommending process improvements.
- Evaluate new technology options and vendor products.
- Function well in a fast-paced, rapidly-changing environment
- Communicate effectively with people at all levels of the organization
Qualifications and Required Skills:
- Overall 3+ years of experience in various software engineering roles.
- 3+ years of experience in building applications and tools in any tech stack, preferably deployed on cloud
- The most recent 3 years of experience must be in serverless/cloud-native development on AWS (preferred) or Azure
- Expertise in any programming language (Node.js or Python preferred)
- Must have hands-on experience in using AWS/Azure - SDK/APIs.
- Must have experience in deploying, releasing, and managing production systems
- MCA or a degree in engineering in Computer Science, IT, or Electronics stream
- 2+ years of demonstrable experience leading site reliability and performance in large-scale, high-traffic environments
- 2+ years of hands-on experience as a DevOps engineer
- Strong leadership, communication and interpersonal skills geared to getting things done
- Develop yourself and the talent within your charge, fostering and creating opportunities for the team
- Strong understanding of SRE concepts and the DevOps culture. Set the direction and strategy for your team, and help shape the overall SRE program for the company
- Be able to lead the resolution of complicated technical issues and communicate status updates/RCA to management and customers.
- Own site stability, performance, capacity planning, DevOps recruitment.


DevOps Engineer
Skills:
- Building a scalable and highly available infrastructure for data science
- Knowledge of data science project workflows
- Hands-on experience with deployment patterns for online/offline predictions (server/serverless)
- Experience with either Terraform or Kubernetes
- Experience with ML deployment frameworks like Kubeflow, MLflow, SageMaker
- Working knowledge of Jenkins or a similar tool
Responsibilities:
- Own all the ML cloud infrastructure (AWS)
- Help build out an entire CI/CD ecosystem with auto-scaling
- Work with a testing engineer to design testing methodologies for ML APIs
- Research and implement new technologies
- Help with cost optimization of infrastructure
- Knowledge sharing
Nice to Have:
- Develop APIs for machine learning
- Can write Python servers for ML systems with API frameworks
- Understanding of task queue frameworks like Celery

Radical is a platform connecting data, medicine and people -- through machine learning, and usable, performant products. Software has never been the strong suit of the medical industry -- and we are changing that. We believe that the same sophistication and performance that powers our daily needs through millions of consumer applications -- be it your grocery, your food delivery or your movie tickets -- when applied to healthcare, has a massive potential to transform the industry, and positively impact lives of patients and doctors. Radical works with some of the largest hospitals and public health programmes in India, and has a growing footprint both inside the country and abroad.
As a DevOps Engineer at Radical, you will:
Work closely with all stakeholders in the healthcare ecosystem - patients, doctors, paramedics and administrators - to conceptualise and bring to life the ideal set of products that add value to their time
Work alongside Software Developers and ML Engineers to solve problems and assist in architecture design
Work on systems which have an extraordinary emphasis on capturing data that can help build better workflows, algorithms and tools
Work on high performance systems that deal with several million transactions, multi-modal data and large datasets, with a close attention to detail
We’re looking for someone who has:
Familiarity and experience with writing working, well-documented and well-tested scripts, Dockerfiles, Puppet/Ansible/Chef/Terraform scripts.
Proficiency with scripting languages like Python and Bash.
Knowledge of systems deployment and maintenance, including setting up CI/CD and working alongside Software Developers, monitoring logs, dashboards, etc.
Experience integrating with a wide variety of external tools and services
Experience navigating AWS and leveraging appropriate services and technologies rather than DIY solutions (such as hosting an application directly on EC2 vs containerisation or Elastic Beanstalk)
It’s not essential, but great if you have:
An established track record of deploying and maintaining systems.
Experience with microservices and decomposition of monolithic architectures
Proficiency in automated tests.
Proficiency with the Linux ecosystem
Experience in deploying systems to production on cloud platforms such as AWS
The position is open now, and we are onboarding immediately.
Please write to us with an updated resume, and one thing you would like us to see as part of your application. This one thing can be anything that you think makes you stand apart among candidates.
Radical is based out of Delhi NCR, India, and we look forward to working with you!
We're looking for people who may not know all the answers, but are obsessive about finding them, and take pride in the code that they write. We are more interested in the ability to learn fast and think rigorously, and in people who aren’t afraid to challenge assumptions and take large bets -- only to work hard and prove themselves correct. You're encouraged to apply even if your experience doesn't precisely match the job description. Join us.


