MLOps Engineer
Required Candidate profile :
- 3+ years’ experience in developing continuous integration and deployment (CI/CD) pipelines (e.g. Jenkins, Github Actions) and bringing ML models to CI/CD pipelines
- Candidate with strong Azure expertise
- Exposure of Productionize the models
- Candidate should have complete knowledge of Azure ecosystem, especially in the area of DE
- Candidate should have prior experience in Design, build, test, and maintain machine learning infrastructure to empower data scientists to rapidly iterate on model development
- Develop continuous integration and deployment (CI/CD) pipelines on top of Azure that includes AzureML, MLflow and Azure Devops
- Proficient knowledge of git, Docker and containers, Kubernetes
- Familiarity with Terraform
- E2E production experience with Azure ML, Azure ML Pipelines
- Experience in Azure ML extension for Azure Devops
- Worked on Model Drift (Concept Drift, Data Drift preferable on Azure ML.)
- Candidate will be part of a cross-functional team that builds and delivers production-ready data science projects. You will work with team members and stakeholders to creatively identify, design, and implement solutions that reduce operational burden, increase reliability and resiliency, ensure disaster recovery and business continuity, enable CI/CD, optimize ML and AI services, and maintain it all in infrastructure as code everything-in-version-control manner.
- Candidate with strong Azure expertise
- Candidate should have complete knowledge of Azure ecosystem, especially in the area of DE
- Candidate should have prior experience in Design, build, test, and maintain machine learning infrastructure to empower data scientists to rapidly iterate on model development
- Develop continuous integration and deployment (CI/CD) pipelines on top of Azure that includes AzureML, MLflow and Azure Devops

About Cyphertree Technologies Pvt. Ltd.
About
Connect with the team
Similar jobs
About MyOperator
MyOperator is a Business AI Operator, a category leader that unifies WhatsApp, Calls, and AI-powered chat & voice bots into one intelligent business communication platform. Unlike fragmented communication tools, MyOperator combines automation, intelligence, and workflow integration to help businesses run WhatsApp campaigns, manage calls, deploy AI chatbots, and track performance — all from a single, no-code platform. Trusted by 12,000+ brands including Amazon, Domino's, Apollo, and Razorpay, MyOperator enables faster responses, higher resolution rates, and scalable customer engagement — without fragmented tools or increased headcount.
Job Summary
We are looking for a skilled and motivated DevOps Engineer with 3+ years of hands-on experience in AWS cloud infrastructure, CI/CD automation, and Kubernetes-based deployments. The ideal candidate will have strong expertise in Infrastructure as Code, containerization, monitoring, and automation, and will play a key role in ensuring high availability, scalability, and security of production systems.
Key Responsibilities
- Design, deploy, manage, and maintain AWS cloud infrastructure, including EC2, RDS, OpenSearch, VPC, S3, ALB, API Gateway, Lambda, SNS, and SQS.
- Build, manage, and operate Kubernetes (EKS) clusters and containerized workloads.
- Containerize applications using Docker and manage deployments with Helm charts
- Develop and maintain CI/CD pipelines using Jenkins for automated build and deployment processes
- Provision and manage infrastructure using Terraform (Infrastructure as Code)
- Implement and manage monitoring, logging, and alerting solutions using Prometheus and Grafana
- Write and maintain Python scripts for automation, monitoring, and operational tasks
- Ensure high availability, scalability, performance, and cost optimization of cloud resources
- Implement and follow security best practices across AWS and Kubernetes environments
- Troubleshoot production issues, perform root cause analysis, and support incident resolution
- Collaborate closely with development and QA teams to streamline deployment and release processes
Required Skills & Qualifications
- 3+ years of hands-on experience as a DevOps Engineer or Cloud Engineer.
- Strong experience with AWS services, including:
- EC2, RDS, OpenSearch, VPC, S3
- Application Load Balancer (ALB), API Gateway, Lambda
- SNS and SQS.
- Hands-on experience with AWS EKS (Kubernetes)
- Strong knowledge of Docker and Helm charts
- Experience with Terraform for infrastructure provisioning and management
- Solid experience building and managing CI/CD pipelines using Jenkins
- Practical experience with Prometheus and Grafana for monitoring and alerting
- Proficiency in Python scripting for automation and operational tasks
- Good understanding of Linux systems, networking concepts, and cloud security
- Strong problem-solving and troubleshooting skills
Good to Have (Preferred Skills)
- Exposure to GitOps practices
- Experience managing multi-environment setups (Dev, QA, UAT, Production)
- Knowledge of cloud cost optimization techniques
- Understanding of Kubernetes security best practices
- Experience with log aggregation tools (e.g., ELK/OpenSearch stack)
Language Preference
- Fluency in English is mandatory.
- Fluency in Hindi is preferred.
• Bachelor’s or master’s degree in Computer Engineering,
Computer Science, Computer Applications, Mathematics, Statistics or related technical field or
equivalent practical experience. Relevant experience of at least 3 years in lieu of above if from a
different stream of education.
• Well-versed in DevOps principals & practices and hands-on DevOps
tool-chain integration experience: Release Orchestration & Automation, Source Code & Build
Management, Code Quality & Security Management, Behavior Driven Development, Test Driven
Development, Continuous Integration, Continuous Delivery, Continuous Deployment, and
Operational Monitoring & Management; extra points if you can demonstrate your knowledge with
working examples.
• Hands-on experience with demonstrable working experience with DevOps tools
and platforms viz., Slack, Jira, GIT, Jenkins, Code Quality & Security Plugins, Maven, Artifactory,
Terraform, Ansible/Chef/Puppet, Spinnaker, Tekton, StackStorm, Prometheus, Grafana, ELK,
PagerDuty, VictorOps, etc.
• Well-versed in Virtualization & Containerization; must demonstrate
experience in technologies such as Kubernetes, Istio, Docker, OpenShift, Anthos, Oracle VirtualBox,
Vagrant, etc.
• Well-versed in AWS and/or Azure or and/or Google Cloud; must demonstrate
experience in at least FIVE (5) services offered under AWS and/or Azure or and/or Google Cloud in
any categories: Compute or Storage, Database, Networking & Content Delivery, Management &
Governance, Analytics, Security, Identity, & Compliance (or) equivalent demonstratable Cloud
Platform experience.
• Well-versed with demonstrable working experience with API Management,
API Gateway, Service Mesh, Identity & Access Management, Data Protection & Encryption, tools &
platforms.
• Hands-on programming experience in either core Java and/or Python and/or JavaScript
and/or Scala; freshers passing out of college or lateral movers into IT must be able to code in
languages they have studied.
• Well-versed with Storage, Networks and Storage Networking basics
which will enable you to work in a Cloud environment.
• Well-versed with Network, Data, and
Application Security basics which will enable you to work in a Cloud as well as Business
Applications / API services environment.
• Extra points if you are certified in AWS and/or Azure
and/or Google Cloud.
Job Description
We are seeking a skilled DevOps Specialist to join our global automotive team. As DevOps Specialist, you will be responsible for managing operations, system monitoring, troubleshooting, and supporting automation workflows to ensure operational stability and excellence for enterprise IT projects. You will be providing support for critical application environments for industry leaders in the automotive industry.
Responsibilities:
Daily maintenance tasks on application availability, response times, pro-active incident tracking on system logs and resources monitoring
Incident Management: Monitor and respond to tickets raised by the DevOps team or end-users.
Support users with prepared troubleshooting Maintain detailed incident logs, track SLAs, and prepare root cause analysis reports.
Change & Problem Management: Support scheduled changes, releases, and maintenance activities. Assist in identifying and tracking recurring issues.
Documentation & Communication: Maintain process documentation, runbooks, and knowledge base articles. Provide regular updates to stakeholders on incidents and resolutions.
Tool & Platform Support: Manage and troubleshoot CI/CD tools (e.g., Jenkins, GitLab), container platforms (e.g., Docker, Kubernetes), and cloud services (e.g., AWS, Azure).
Requirements:
DevOps Skillset: Logfile analysis /troubleshooting (ELK Stack), Linux administration, Monitoring (App Dynamics, Checkmk, Prometheus, Grafana), Security (Black Duck, SonarQube, Dependabot, OWASP or similar)
Experience with Docker.
Familiarity with DevOps principles and ticket tools like ServiceNow.
Experience in handling confidential data and safety sensitive systems
Strong analytical, communication, and organizational abilities. Easy to work with.
Optional: Experience with our relevant business domain (Automotive / Manufacturing industry, especially production management systems). Familiarity with IT process frameworks SCRUM, ITIL.
Skills & Requirements
DevOps, Logfile Analysis, Troubleshooting, ELK Stack, Linux Administration, Monitoring, AppDynamics, Checkmk, Prometheus, Grafana, Security, Black Duck, SonarQube, Dependabot, OWASP, Docker, CI/CD, Jenkins, GitLab, Kubernetes, AWS, Azure, ServiceNow, Incident Management, Change Management, Problem Management, Documentation, Communication, Analytical Skills, Organizational Skills, SCRUM, ITIL, Automotive Industry, Manufacturing Industry, Production Management Systems.
The candidate must have 2-3 years of experience in the domain. The responsibilities include:
● Deploying system on Linux-based environment using Docker
● Manage & maintain the production environment
● Deploy updates and fixes
● Provide Level 1 technical support
● Build tools to reduce occurrences of errors and improve customer experience
● Develop software to integrate with internal back-end systems
● Perform root cause analysis for production errors
● Investigate and resolve technical issues
● Develop scripts to automate visualization
● Design procedures for system troubleshooting and maintenance
● Experience working on Linux-based infrastructure
● Excellent understanding of MERN Stack, Docker & Nginx (Good to have Node Js)
● Configuration and managing databases such as Mongo
● Excellent troubleshooting
● Experience of working with AWS/Azure/GCP
● Working knowledge of various tools, open-source technologies, and cloud services
● Awareness of critical concepts in DevOps and Agile principles
● Experience of CI/CD Pipeline
- Preferred experience in development associated with Kafka or big data technologies understand essential Kafka components like Zookeeper, Brokers, and optimization of Kafka clients applications (Producers & Consumers). -
Experience with Automation of Infrastructure, Testing , DB Deployment Automation, Logging/Monitoring/alerting
- AWS services experience on CloudFormation, ECS, Elastic Container Registry, Pipelines, Cloudwatch, Glue, and other related services.
- AWS Elastic Kubernetes Services (EKS) - Kubernetes and containers managing and auto-scaling -
Good knowledge and hands-on experiences with various AWS services like EC2, RDS, EKS, S3, Lambda, API, Cloudwatch, etc.
- Good and quick with log analysis to perform Root Cause Analysis (RCA) on production deployments and container errors on cloud watch.
Working on ways to automate and improve deployment and release processes.
- High understanding of the Serverless architecture concept. - Good with Deployment automation tools and Investigating to resolve technical issues.
technical issues. - Sound knowledge of APIs, databases, and container-based ETL jobs.
- Planning out projects and being involved in project management decisions. Soft Skills
- Adaptability
- Collaboration with different teams
- Good communication skills
- Team player attitude
- Work towards improving the following 4 verticals - scalability, availability, security, and cost, for company's workflows and products.
- Help in provisioning, managing, optimizing cloud infrastructure in AWS (IAM, EC2, RDS, CloudFront, S3, ECS, Lambda, ELK etc.)
- Work with the development teams to design scalable, robust systems using cloud architecture for both 0-to-1 and 1-to-100 products.
- Drive technical initiatives and architectural service improvements.
- Be able to predict problems and implement solutions that detect and prevent outages.
- Mentor/manage a team of engineers.
- Design solutions with failure scenarios in mind to ensure reliability.
- Document rigorously to keep track of all changes/upgrades to the infrastructure and as well share knowledge with the rest of the team
- Identify vulnerabilities during development with actionable information to empower developers to remediate vulnerabilities
- Automate the build and testing processes to consistently integrate code
- Manage changes to documents, software, images, large web sites, and other collections of code, configuration, and metadata among disparate teams
Our client is a call management solutions company, which helps small to mid-sized businesses use its virtual call center to manage customer calls and queries. It is an AI and cloud-based call operating facility that is affordable as well as feature-optimized. The advanced features offered like call recording, IVR, toll-free numbers, call tracking, etc are based on automation and enhances the call handling quality and process, for each client as per their requirements. They service over 6,000 business clients including large accounts like Flipkart and Uber.
- Being involved in Configuration Management, Web Services Architectures, DevOps Implementation, Build & Release Management, Database management, Backups, and Monitoring.
- Ensuring reliable operation of CI/ CD pipelines
- Orchestrate the provisioning, load balancing, configuration, monitoring and billing of resources in the cloud environment in a highly automated manner
- Logging, metrics and alerting management.
- Creating Docker files
- Creating Bash/ Python scripts for automation.
- Performing root cause analysis for production errors.
What you need to have:
- Proficient in Linux Commands line and troubleshooting.
- Proficient in AWS Services. Deployment, Monitoring and troubleshooting applications in AWS.
- Hands-on experience with CI tooling preferably with Jenkins.
- Proficient in deployment using Ansible.
- Knowledge of infrastructure management tools (Infrastructure as cloud) such as terraform, AWS cloudformation etc.
- Proficient in deployment of applications behind load balancers and proxy servers such as nginx, apache.
- Scripting languages: Bash, Python, Groovy.
- Experience with Logging, Monitoring, and Alerting tools like ELK(Elastic-search, Logstash, Kibana), Nagios. Graylog, splunk Prometheus, Grafana is a plus.
- Cloud and virtualization-based technologies (Amazon Web Services (AWS), VMWare).
- Java Application Server Administration (Weblogic, WidlFfy, JBoss, Tomcat).
- Docker and Kubernetes (EKS)
- Linux/UNIX Administration (Amazon Linux and RedHat).
- Developing and supporting cloud infrastructure designs and implementations and guiding application development teams.
- Configuration Management tools (Chef or Puppet or ansible).
- Log aggregations tools such as Elastic and/or Splunk.
- Automate infrastructure and application deployment-related tasks using terraform.
- Automate repetitive tasks required to maintain a secure and up-to-date operational environment.
Responsibilities
- Build and support always-available private/public cloud-based software-as-a-service (SaaS) applications.
- Build AWS or other public cloud infrastructure using Terraform.
- Deploy and manage Kubernetes (EKS) based docker applications in AWS.
- Create custom OS images using Packer.
- Create and revise infrastructure and architectural designs and implementation plans and guide the implementation with operations.
- Liaison between application development, infrastructure support, and tools (IT Services) teams.
- Development and documentation of Chef recipes and/or ansible scripts. Support throughout the entire deployment lifecycle (development, quality assurance, and production).
- Help developers leverage infrastructure, application, and cloud platform features and functionality participate in code and design reviews, and support developers by building CI/CD pipelines using Bamboo, Jenkins, or Spinnaker.
- Create knowledge-sharing presentations and documentation to help developers and operations teams understand and leverage the system's capabilities.
- Learn on the job and explore new technologies with little supervision.
- Leverage scripting (BASH, Perl, Ruby, Python) to build required automation and tools on an ad-hoc basis.
Who we have in mind:
- Solid experience in building a solution on AWS or other public cloud services using Terraform.
- Excellent problem-solving skills with a desire to take on responsibility.
- Extensive knowledge in containerized application and deployment in Kubernetes
- Extensive knowledge of the Linux operating system, RHEL preferred.
- Proficiency with shell scripting.
- Experience with Java application servers.
- Experience with GiT and Subversion.
- Excellent written and verbal communication skills with the ability to communicate technical issues to non-technical and technical audiences.
- Experience working in a large-scale operational environment.
- Internet and operating system security fundamentals.
- Extensive knowledge of massively scalable systems. Linux operating system/application development desirable.
- Programming in scripting languages such as Python. Other object-oriented languages (C++, Java) are a plus.
- Experience with Configuration Management Automation tools (chef or puppet).
- Experience with virtualization, preferably on multiple hypervisors.
- BS/MS in Computer Science or equivalent experience.
- Excellent written and verbal skills.
Education or Equivalent Experience:
- Bachelor's degree or equivalent education in related fields
- Certificates of training in associated fields/equipment’s
Role : DevOps Engineer
Experience : 1 to 2 years and 2 to 5 Years as DevOps Engineer (2 Positions)
Location : Bangalore. 5 Days Working.
Education Qualification : Tech/B.E for Tier-1/Tier-2/Tier-3 Colleges or equivalent institutes
Skills :- DevOps Engineering, Ruby On Rails or Python and Bash/Shell skills, Docker, rkt or similar container engine, Kubernetes or similar clustering solutions
As DevOps Engineer, you'll be part of the team building the stage for our Software Engineers to work on, helping to enhance our product performance and reliability.
Responsibilities:
- Build & operate infrastructure to support website, backed cluster, ML projects in the organization.
- Helping teams become more autonomous and allowing the Operation team to focus on improving the infrastructure and optimizing processes.
- Delivering system management tooling to the engineering teams.
- Working on your own applications which will be used internally.
- Contributing to open source projects that we are using (or that we may start).
- Be an advocate for engineering best practices in and out of the company.
- Organizing tech talks and participating in meetups and representing Box8 at industry events.
- Sharing pager duty for the rare instances of something serious happening. ∙ Collaborate with other developers to understand & setup tooling needed for Continuous Integration/Delivery/Deployment (CI/CD) practices.
Requirements:
- 1+ Years Of Industry Experience Scale existing back end systems to handle ever increasing amounts of traffic and new product requirements.
- Ruby On Rails or Python and Bash/Shell skills.
- Experience managing complex systems at scale.
- Experience with Docker, rkt or similar container engine.
- Experience with Kubernetes or similar clustering solutions.
- Experience with tools such as Ansible or Chef Understanding of the importance of smart metrics and alerting.
- Hands on experience with cloud infrastructure provisioning, deployment, monitoring (we are on AWS and use ECS, ELB, EC2, Elasticache, Elasticsearch, S3, CloudWatch).
- Experience with relational SQL and NoSQL databases, including Postgres and Cassandra.
- Knowledge of data pipeline and workflow management tools: Azkaban, Luigi, Airflow, etc.
- Experience in working on linux based servers.
- Managing large scale production grade infrastructure on AWS Cloud.
- Good Knowledge on scripting languages like ruby, python or bash.
- Experience in creating in deployment pipeline from scratch.
- Expertise in any of the CI tools, preferably Jenkins.
- Good knowledge of docker containers and its usage.
- Using Infra/App Monitoring tools like, CloudWatch/Newrelic/Sensu.
Good to have:
- Knowledge of Ruby on Rails based applications and its deployment methodologies.
- Experience working on Container Orchestration tools like Kubernetes/ECS/Mesos.
- Extra Points For Experience With Front-end development NewRelic GCP Kafka, Elasticsearch.
Job Location: Jaipur
Experience Required: Minimum 3 years
About the role:
As a DevOps Engineer for Punchh, you will be working with our developers, SRE, and DevOps teams implementing our next generation infrastructure. We are looking for a self-motivated, responsible, team player who love designing systems that scale. Punchh provides a rich engineering environment where you can be creative, learn new technologies, solve engineering problems, all while delivering business objectives. The DevOps culture here is one with immense trust and responsibility. You will be given the opportunity to make an impact as there are no silos here.
Responsibilities:
- Deliver SLA and business objectives through whole lifecycle design of services through inception to implementation.
- Ensuring availability, performance, security, and scalability of AWS production systems
- Scale our systems and services through continuous integration, infrastructure as code, and gradual refactoring in an agile environment.
- Maintain services once a project is live by monitoring and measuring availability, latency, and overall system and application health.
- Write and maintain software that runs the infrastructure that powers the Loyalty and Data platform for some of the world’s largest brands.
- 24x7 in shifts on call for Level 2 and higher escalations
- Respond to incidents and write blameless RCA’s/postmortems
- Implement and practice proper security controls and processes
- Providing recommendations for architecture and process improvements.
- Definition and deployment of systems for metrics, logging, and monitoring on platform.
Must have:
- Minimum 3 Years of Experience in DevOps.
- BS degree in Computer Science, Mathematics, Engineering, or equivalent practical experience.
- Strong inter-personal skills.
- Must have experience in CI/CD tooling such as Jenkins, CircleCI, TravisCI
- Must have experience in Docker, Kubernetes, Amazon ECS or Mesos
- Experience in code development in at least one high-level programming language fromthis list: python, ruby, golang, groovy
- Proficient in shell scripting, and most importantly, know when to stop scripting and start developing.
- Experience in creation of highly automated infrastructures with any Configuration Management tools like: Terraform, Cloudformation or Ansible.
- In-depth knowledge of the Linux operating system and administration.
- Production experience with a major cloud provider such Amazon AWS.
- Knowledge of web server technologies such as Nginx or Apache.
- Knowledge of Redis, Memcache, or one of the many in-memory data stores.
- Experience with various load balancing technologies such as Amazon ALB/ELB, HA Proxy, F5.
- Comfortable with large-scale, highly-available distributed systems.
Good to have:
- Understanding of Web Standards (REST, SOAP APIs, OWASP, HTTP, TLS)
- Production experience with Hashicorp products such as Vault or Consul
- Expertise in designing, analyzing troubleshooting large-scale distributed systems.
- Experience in an PCI environment
- Experience with Big Data distributions from Cloudera, MapR, or Hortonworks
- Experience maintaining and scaling database applications
- Knowledge of fundamental systems engineering principles such as CAP Theorem, Concurrency Control, etc.
- Understanding of the network fundamentals: OSI, TCI/IP, topologies, etc.
- Understanding of Auditing of Infrastructure and help org. to control Infrastructure costs.
- Experience in Kafka, RabbitMQ or any messaging bus.









