
MLOps Lead Engineer
at an IT solutions company specializing in application lifecycle management.
- Automate and maintain ML and Data pipelines at scale
- Collaborate with Data Scientists and Data Engineers on feature development teams to containerize and build out deployment pipelines for new modules
- Maintain and expand our on-prem deployments of Spark clusters
- Design, build, and optimize application containerization and orchestration with Docker and Kubernetes on AWS or Azure
- 5 years of IT experience in data-driven or AI technology products
- Understanding of ML Model Deployment and Lifecycle
- Extensive experience with Apache Airflow for MLOps workflow automation
- Experience in building and automating data pipelines
- Experience working with Spark cluster architecture
- Extensive experience with Unix/Linux environments
- Experience with standard CI/CD concepts and technologies for build and deployment pipelines using Jenkins
- Strong experience in Python and PySpark and building required automation (using standard technologies such as Docker, Jenkins, and Ansible).
- Experience with Kubernetes or Docker Swarm
- Working technical knowledge of current systems software, protocols, and standards, including firewalls, Active Directory, etc.
- Basic knowledge of Multi-tier architectures: load balancers, caching, web servers, application servers, and databases.
- Experience with various virtualization technologies and multi-tenant, private and hybrid cloud environments.
- Hands-on software and hardware troubleshooting experience.
- Experience documenting and maintaining configuration and process information.
- Basic knowledge of machine learning frameworks: TensorFlow, Caffe/Caffe2, PyTorch
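The pipeline-automation skills listed above center on one idea: running tasks in dependency order. A minimal sketch in plain Python (no Airflow dependency; the task names and bodies are hypothetical placeholders) shows the core mechanism behind workflow tools like Apache Airflow:

```python
# Dependency-ordered task execution: the core idea behind workflow
# orchestrators such as Apache Airflow. Task names are hypothetical.
from graphlib import TopologicalSorter

def extract():
    return "raw"

def transform():
    return "features"

def train():
    return "model"

# Each task maps to the set of tasks that must finish before it runs.
dag = {
    "extract": set(),
    "transform": {"extract"},
    "train": {"transform"},
}
tasks = {"extract": extract, "transform": transform, "train": train}

def run(dag, tasks):
    """Execute every task after all of its dependencies have completed."""
    order = list(TopologicalSorter(dag).static_order())
    results = {name: tasks[name]() for name in order}
    return order, results

order, results = run(dag, tasks)
print(order)  # tasks in a dependency-respecting order
```

A real Airflow DAG adds scheduling, retries, and distributed workers on top of exactly this ordering guarantee.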


We are looking for a highly skilled Linux-focused DevOps Engineer with strong troubleshooting capabilities. The ideal candidate must have hands-on experience in Linux systems, OpenStack, Kubernetes, and Ansible automation. This role requires deep debugging skills and structured root cause analysis.
Key Responsibilities
• Troubleshoot complex infrastructure and application issues through deep log analysis.
• Manage and optimize Linux-based production environments.
• Deploy and manage workloads in Kubernetes clusters.
• Provision and manage infrastructure using OpenStack.
• Automate configuration management using Ansible.
• Diagnose networking issues including DNS, routing, and firewall configurations.
• Develop automation scripts using Bash or Python.
• Participate in production incident handling and RCA documentation.
Required Skills & Experience
• Strong Linux internals knowledge (systemd, processes, memory, I/O).
• Experience analyzing system logs (/var/log, journalctl, dmesg).
• Hands-on Kubernetes production troubleshooting.
• OpenStack VM provisioning and networking knowledge.
• Ansible playbook and role development experience.
• Strong scripting skills (Bash or Python).
Good to Have
• Terraform exposure.
• Monitoring tools such as Prometheus, Grafana, ELK.
• Experience in multi-tenant environments.
• On-call production support experience.
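The log-analysis and scripting requirements above typically meet in first-pass triage scripts. A minimal sketch (the log format, service names, and messages are hypothetical) that counts ERROR lines per service to spot where a failure burst originated, the kind of quick scan that precedes a formal RCA:

```python
# First-pass log triage: count ERROR lines per service to locate a
# failure burst. Log format and service names are hypothetical.
import re
from collections import Counter

LOG = """\
2024-05-01T10:00:01 nginx ERROR upstream timed out
2024-05-01T10:00:02 app INFO request served
2024-05-01T10:00:03 app ERROR connection refused to db:5432
2024-05-01T10:00:04 app ERROR connection refused to db:5432
2024-05-01T10:00:05 db WARN slow query 2.1s
"""

def error_counts(log_text):
    """Return a Counter of ERROR lines keyed by service name."""
    counts = Counter()
    for line in log_text.splitlines():
        # Assumed layout: "<timestamp> <service> <LEVEL> <message>"
        m = re.match(r"\S+\s+(\S+)\s+ERROR\b", line)
        if m:
            counts[m.group(1)] += 1
    return counts

print(error_counts(LOG).most_common())
```

In practice the same pattern is pointed at `journalctl` output or files under /var/log rather than an inline string.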
At TechBiz Global, we provide recruitment services to top clients from our portfolio. We are currently seeking four DevOps Support Engineers to join one of our clients' teams in India, with a start date by 20 July. If you're looking for an exciting opportunity to grow in an innovative environment, this could be the perfect fit for you.
Job requirements
Key Responsibilities:
- Monitor and troubleshoot AWS and/or Azure environments to ensure optimal performance and availability.
- Respond promptly to incidents and alerts, investigating and resolving issues efficiently.
- Perform basic scripting and automation tasks to streamline cloud operations (e.g., Bash, Python).
- Communicate clearly and fluently in English with customers and internal teams.
- Collaborate closely with the Team Lead, following Standard Operating Procedures (SOPs) and escalation workflows.
- Work in a rotating shift schedule, including weekends and nights, ensuring continuous support coverage.
Shift Details:
- Engineers rotate through morning, evening, and night shifts, including weekends, typically working 4–5 shifts per week to cover 24/7 support evenly across the team.
- Rotation ensures no single engineer is always working nights or weekends; the load is shared fairly among the team.
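The fair-rotation idea above can be sketched as a simple round-robin assignment (engineer names, a three-shift day, and the cycle length are all assumptions for illustration):

```python
# Round-robin shift rotation: cycle engineers through shifts so no one
# is pinned to nights or weekends. Names and labels are hypothetical.
from itertools import cycle

engineers = ["A", "B", "C", "D"]
shifts_per_day = ["morning", "evening", "night"]

def build_rota(engineers, shifts, days):
    """Assign one engineer per shift per day, rotating through the pool."""
    pool = cycle(engineers)
    rota = []
    for day in range(days):
        for shift in shifts:
            rota.append((day, shift, next(pool)))
    return rota

rota = build_rota(engineers, shifts_per_day, days=4)
# Over a 4-day cycle with 4 engineers and 3 shifts, every engineer
# works exactly 3 shifts and each covers the night shift once.
```

Real rosters add constraints (rest periods, leave, swaps), but the fairness property comes from the same cycling.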
Qualifications:
- 2–5 years of experience in DevOps or cloud support roles.
- Strong familiarity with AWS and/or Azure cloud environments.
- Experience with CI/CD tools such as GitHub Actions or Jenkins.
- Proficiency with monitoring tools like Datadog, CloudWatch, or similar.
- Basic scripting skills in Bash, Python, or comparable languages.
- Excellent communication skills in English.
- Comfortable and willing to work in a shift-based support role, including night and weekend shifts.
- Prior experience in a shift-based support environment is preferred.
What We Offer:
- Remote work opportunity — work from anywhere in India with a stable internet connection.
- Comprehensive training program including:
- Shadowing existing processes to gain hands-on experience.
- Learning internal tools, Standard Operating Procedures (SOPs), ticketing systems, and escalation paths to ensure smooth onboarding and ongoing success.
Job Description
Role Overview:
We're looking for a passionate DevOps engineer with a minimum of 10 years' experience across all levels, who will work closely with the development teams in an Agile setup to continuously improve, support, secure, and operate our production and test environments. We believe in automating our infrastructure as much as possible and pursuing challenging problems in a sustainable and repeatable way.
Our Toolchain
- Ansible, Docker, Kubernetes, Terraform, GitLab, Jenkins, Fastlane, New Relic, Datadog, SonarQube, IaC
- Apache, Nginx, Linux, Ubuntu, Microservices, Python, Shell, Bash, Helm
- Selenium, JMeter, Slack, Jira, SAST, OSSEC, OWASP
- Node.js, PHP, Golang, MySQL, MongoDB, Firebase, Redis, Elasticsearch
- VPC, API Gateway, Cognito, DocumentDB, ECS, Lambda, Route53, ACM, S3, EC2, IAM
You'll need:
- Production experience with distributed/scalable systems consisting of multiple microservices and/or high-traffic web applications
- Experience with configuration management systems such as Ansible, Chef, Puppet
- Extensive knowledge of the Linux operating system
- Troubleshooting skills that range from diagnosis to solution for Dev team issues
- Knowledge of how the web works and HTTP fundamentals
- Knowledge of IP networking, DNS, load balancing, and firewalling
Bonus points, if you have:
- Experience in agile development and delivery process.
- Good knowledge of at least one programming language. Tecstub uses, for example, Node.js and PHP
- Experience in containerizing applications and deployment to production (Docker, Kubernetes)
- Experience in building modern Terraform infrastructures in cloud environments (AWS, GCP, etc...)
- Experience analyzing application and database performance with monitoring tools (New Relic, Datadog, ClusterControl, etc.)
- Experience with SQL databases like MySQL, NoSQL stores, real-time data stores like Redis, or anything in between.
- Experience being part of the engineering team that built the platform.
- Knowledge of good security practices, including network security, system hardening, secure software, and compliance.
- Familiarity with automated build pipelines / continuous integration using GitLab, Jenkins, and Kubernetes/Docker; with this setup, we're deploying to production twice per day!
Interview Process:
The entire interview process would take approximately 10 Days.
- HR Screening Call (15 minutes)
- Technical Interview Round Level 1 (30 Minutes)
- Technical Interview Round Level 2 (60 minutes)
- Final Interview Round (60 minutes)
- Offer
About Tecstub:
Tecstub is a renowned global provider of comprehensive digital commerce solutions for some of the world's largest enterprises. With offices in North America and Asia-Pacific, our team offers end-to-end solutions such as strategic Solution Consulting, eCommerce website and application development, and support & maintenance services that are tailored to meet our clients' unique business goals. We are dedicated to delivering excellence by working as an extended partner, providing next-generation solutions that are sustainable, scalable, and future-proof. Our passionate and driven team of professionals has over a decade of experience in the industry and is committed to helping our clients stay ahead of the competition.
We value our employees and strive to create a positive work environment that promotes work-life balance and personal growth. As part of our commitment to our team, we offer a range of benefits to ensure our employees are supported and motivated.
- A 5-day work week that promotes work-life balance and allows our employees to take care of personal responsibilities while excelling in their professional roles.
- 30 annual paid leaves that can be utilized for various personal reasons, such as regional holidays, sick leaves, or any other personal needs. We believe that taking time off is essential for overall well-being and productivity.
- Additional special leaves for birthdays, maternity and paternity events to ensure that our employees can prioritize their personal milestones without any added stress.
- Health insurance coverage of 3 lakhs sum insured for our employees, spouse, and children, to provide peace of mind and security for their health needs.
- Vouchers and gifts for important life events such as birthdays and anniversaries, to celebrate our employees' milestones and show appreciation for their contributions to the company.
- A dedicated learning and growth budget for courses and certifications, to support our employees' career aspirations and encourage professional development.
- Company outings to celebrate our successes together and promote a sense of camaraderie among our team members. We believe that celebrating achievements is an important part of building a positive work culture.
Skills
AWS, Terraform, Kubernetes, GitHub, Apache, Bash, Docker, Ansible, Git, Microservices, Ubuntu, GitLab, CI/CD, Apache Server, Nginx, Node.js
Roles and Responsibilities
- Primary stakeholder collaborating with Dir Engineering on software/infrastructure architecture, monitoring/alerting framework and all other architectural level technical issues
- Design and manage implementation of Silvermine's high-performance, scalable, extensible, and resilient microservices application stack, based on the existing, partially migrated monolithic application and on new product development. Includes:
- Utilizing either ECS Fargate (no EC2 clusters) or EKS as the orchestration framework – to be tested up to a minimum of 100k concurrent users
- Exploring, designing and implementing use of on demand compute (Lambda) where appropriate
- Scalable and redundant data architecture supporting microservices design principles
- A scalable reverse proxy layer to isolate microservices from managing network connections
- Utilizing CDN capabilities to offload origin load via an intelligent caching strategy
- Leveraging best in breed AWS service offerings to enable team to focus on application stack instead of application scaffolding while minimizing operational complexity and cost
- Monitoring and optimizing the stack for:
- Security and monitoring
- Leverage AWS and 3rd-party services to monitor the application stack and data; secure them from DDoS attacks and security breaches; and alert the team in the event of an incident
- Using APM and logging tools:
- Monitor application stack and infrastructure component performance
- Proactively detect, triage and mitigate stack performance issues
- Alert upon exception events
- Provide triaging tools for debugging and Root Cause Analysis.
- Enhance the CI/CD pipeline to support automated testing, a resilient deployment model (e.g., blue-green, canary), and 100% rollback support (including the data layer)
- Develop a comprehensive, supportable, repeatable IaC implementation using CloudFormation or Terraform
- Take a leadership role and exhibit expertise in the development of standards, architectural governance, design patterns, best practices and optimization of existing architecture.
- Partner with teams and leaders to provide strategic consultation for business process design/optimization, creating strategic technology road maps, performing rapid prototyping and implementing technical solutions to accelerate the fulfillment of the business strategic vision.
- Staying up to date on emerging technologies (AI, Automation, Cloud etc.) and trends with a clear focus on productivity, ease of use and fit-for-purpose, by researching, testing, and evaluating.
- Providing POCs and product implementation guidelines.
- Applying imagination and innovation by creating, inventing, and implementing new or better approaches, alternatives and breakthrough ideas that are valued by customers within the function.
- Assessing current state of solutions, defining future state needs, identifying gaps and recommending new technology solutions and strategic business execution improvements.
- Overseeing and facilitating the evaluation and selection of technology and product standards, and the design of standard configurations/implementation patterns.
- Partnering with other architects and solution owners to create standards and set strategies for the enterprise.
- Communicating directly with business colleagues on applying digital workplace technologies to solve identified business challenges.
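Among the responsibilities above, the canary deployment model with rollback support can be sketched as a small decision loop (the traffic steps, threshold, and metric source are hypothetical assumptions; real systems read error rates from an APM or load balancer):

```python
# Sketch of canary rollout logic: shift traffic to the new version in
# steps, rolling back if the canary's error rate exceeds a threshold.
# Steps, threshold, and the metric callback are hypothetical.

def canary_rollout(get_error_rate, steps=(10, 25, 50, 100), threshold=0.01):
    """Advance canary traffic weight step by step.

    Returns ("promoted", 100) if every step stays healthy, or
    ("rolled_back", weight) at the first unhealthy step.
    """
    for weight in steps:
        # In production this would query monitoring after shifting traffic.
        if get_error_rate(weight) > threshold:
            return ("rolled_back", weight)
    return ("promoted", 100)

# Healthy canary: error rate stays low at every traffic step.
print(canary_rollout(lambda w: 0.001))                        # ('promoted', 100)
# Faulty canary: errors spike once it receives real traffic.
print(canary_rollout(lambda w: 0.05 if w >= 25 else 0.001))   # ('rolled_back', 25)
```

Blue-green is the degenerate case of this loop with a single 0-to-100 step; the data-layer rollback the posting mentions requires additionally keeping schema changes backward compatible.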
Skills Required:
- Good mentorship skills to coach and guide the team on AWS DevOps.
- Jenkins, Python, Pipeline as Code, CloudFormation templates, and Terraform.
- Experience with Docker, containers, Lambda, and Fargate is a must
- Experience with CI/CD and Release management
- Strong proficiency in PowerShell scripting
- Demonstrable expertise in Java
- Familiarity with REST APIs
Qualifications:
- Minimum of 5 years of relevant experience in DevOps.
- Bachelors or Masters in Computer Science or equivalent degree.
- AWS certifications are an added advantage
Cloud Software Engineer
Notice Period: 45 days / Immediate Joining
Banyan Data Services (BDS) is a US-based infrastructure services company headquartered in San Jose, California. It provides full-stack managed services to support business applications and data infrastructure, delivering data solutions and services on bare metal, on-premises, and all cloud platforms. Our engagement service is built on standard DevOps practice and the SRE model.
We offer you an opportunity to join our rocket-ship startup, run by a world-class executive team. We are looking for candidates who aspire to be part of the cutting-edge solutions and services we offer, which address next-gen data evolution challenges, and who are willing to use their experience in areas directly related to infrastructure services, Software as a Service, and cloud services to create a niche in the market.
Roles and Responsibilities
· A wide variety of engineering projects, including data visualization, web services, data engineering, web portals, SDKs, and integrations across numerous languages, frameworks, and cloud platforms
· Apply continuous delivery practices to deliver high-quality software and value as early as possible.
· Work in collaborative teams to build new experiences
· Participate in the entire cycle of software consulting and delivery from ideation to deployment
· Integrating multiple software products across cloud and hybrid environments
· Developing processes and procedures for software applications migration to the cloud, as well as managed services in the cloud
· Migrating existing on-premises software applications to cloud leveraging a structured method and best practices
Desired Candidate Profile : *** freshers can also apply ***
· 2+ years of experience with one or more development languages such as Java, Python, or Spark.
· 1+ year of experience with private/public/hybrid cloud model design, implementation, orchestration, and support.
· Certification in, or completed training on, any one of the cloud environments: AWS, GCP, Azure, Oracle Cloud, or DigitalOcean.
· Strong problem-solvers who are comfortable in unfamiliar situations, and can view challenges through multiple perspectives
· Driven to develop technical skills for oneself and team-mates
· Hands-on experience with cloud computing and/or traditional enterprise datacentre technologies, i.e., network, compute, storage, and virtualization.
· Possess at least one cloud-related certification from AWS, Azure, or equivalent
· Ability to write high-quality, well-tested code and comfort with Object-Oriented or functional programming patterns
· Past experience quickly learning new languages and frameworks
· Ability to work with a high degree of autonomy and self-direction
www.banyandata.com
About the company:
Tathastu, the next-generation innovation lab, is Future Group's initiative to provide a new-age retail experience, combining the physical with the digital and enhancing it with data. We are creating next-generation consumer interactions by combining AI/ML, data science, and emerging technologies with consumer platforms.
The E-Commerce vertical under Tathastu has developed online consumer platforms for Future Group's portfolio of retail brands - Easy day, Big Bazaar, Central, Brand factory, aLL, Clarks, Coverstory. Backed by our network of offline stores, we have built a new retail platform that merges our online and offline retail streams. We use data to power all our decisions across our products and build internal tools to help us scale our impact with a small, closely-knit team.
Our widespread store network, robust logistics, and technology capabilities have made it possible to launch a ‘2-Hour Delivery Promise’ on every product across fashion, food, FMCG, and home products for orders placed online through the Big Bazaar mobile app and portal. This makes Big Bazaar the first retailer in the country to offer instant home delivery on almost every consumer product ordered online.
Job Responsibilities:
- You’ll streamline and automate the software development and infrastructure management processes and play a crucial role in executing high-impact initiatives and continuously improving processes to increase the effectiveness of our platforms.
- You’ll translate complex use cases into discrete technical solutions in platform architecture, design and coding, functionality, usability, and optimization.
- You will drive automation in repetitive tasks, configuration management, and deliver comprehensive automated tests to debug/troubleshoot Cloud AWS-based systems and BigData applications.
- You’ll continuously discover, evaluate, and implement new technologies to maximize the development and operational efficiency of the platforms.
- You’ll determine the metrics that will define technical and operational success and constantly track such metrics to fine-tune the technology stack of the organization.
Experience: 4 to 8 Yrs
Qualification: B.Tech / MCA
Required Skills:
- Experience with Linux/UNIX systems administration and Amazon Web Services (AWS).
- Infrastructure as Code (Terraform), Kubernetes and container orchestration, web servers (Nginx, Apache), application servers (Tomcat, Node.js), document stores, and relational databases (AWS RDS MySQL).
- Site Reliability Engineering patterns and visibility/performance/availability monitoring (CloudWatch, Prometheus)
- Background in and happy to work hands-on with technical troubleshooting and performance tuning.
- Supportive and collaborative personality - ability to influence and drive progress with your peers
Our Technology Stack:
- Docker/Kubernetes
- Cloud (AWS)
- Python/GoLang Programming
- Microservices
- Automation Tools
As part of the engineering team, you would be expected to have deep technology expertise with a passion for building highly scalable products. This is a unique opportunity where you can impact the lives of people across 150+ countries!
Responsibilities
• Collaborate in large-scale systems design discussions.
• Deploying and maintaining in-house/customer systems ensuring high availability,
performance and optimal cost.
• Automate build pipelines. Ensuring right architecture for CI/CD
• Work with engineering leaders to ensure cloud security
• Develop standard operating procedures for various facets of Infrastructure
services (CI/CD, Git Branching, SAST, Quality gates, Auto Scaling)
• Perform & automate regular backups of servers & databases. Ensure rollback and
restore capabilities are real-time and zero-downtime.
• Lead the entire DevOps charter for ONE Championship. Mentor other DevOps
engineers. Ensure industry standards are followed.
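The backup responsibility above hinges on one detail that is easy to skip: verifying that a backup can actually be restored. A minimal sketch (paths and file contents are hypothetical; a real setup adds scheduling, retention, and off-site copies):

```python
# Automated backup with integrity verification: copy a file, then
# compare checksums so a later restore can be trusted. Paths and the
# sample "dump" content are hypothetical.
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256(path):
    """Hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def backup(src, dst):
    """Copy src to dst (preserving metadata) and verify the copy."""
    shutil.copy2(src, dst)
    return sha256(src) == sha256(dst)

# Demo against a temporary file standing in for a real database dump.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "db_dump.sql"
    src.write_text("CREATE TABLE users (id INT);")
    ok = backup(src, Path(tmp) / "db_dump.sql.bak")
    print("backup verified:", ok)
```

An unverified backup is only a hope; checksum comparison (or better, a periodic test restore) is what turns it into a rollback capability.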
Requirements
• Overall 5+ years of experience as a DevOps Engineer/Site Reliability Engineer
• B.E/B.Tech in CS or equivalent streams from an institute of repute
• Experience in Azure is a must. AWS experience is a plus
• Experience in Kubernetes, Docker, and containers
• Proficiency in developing and deploying fully automated environments using
Puppet/Ansible and Terraform
• Experience with monitoring tools like Nagios/Icinga, Prometheus, Alertmanager,
New Relic
• Good knowledge of source code control (git)
• Expertise in Continuous Integration and Continuous Deployment setup using Azure
Pipeline or Jenkins
• Strong experience in programming languages. Python is preferred
• Experience in scripting and unit testing
• Basic knowledge of SQL & NoSQL databases
• Strong Linux fundamentals
• Experience in SonarQube, Locust & Browserstack is a plus
- Proficient in Java, Node or Python
- Experience with New Relic, Splunk, SignalFx, Datadog, etc.
- Monitoring and alerting experience
- Full stack development experience
- Hands-on with building and deploying micro services in Cloud (AWS/Azure)
- Experience with Terraform for Infrastructure as Code
- Should have experience troubleshooting live production systems using monitoring/log analytics tools
- Should have experience leading a team (2 or more engineers)
- Experienced using Jenkins or similar deployment pipeline tools
- Understanding of distributed architectures
Job Location: Jaipur
Experience Required: Minimum 3 years
About the role:
As a DevOps Engineer for Punchh, you will be working with our developers, SRE, and DevOps teams implementing our next-generation infrastructure. We are looking for a self-motivated, responsible team player who loves designing systems that scale. Punchh provides a rich engineering environment where you can be creative, learn new technologies, and solve engineering problems, all while delivering business objectives. The DevOps culture here is one of immense trust and responsibility. You will be given the opportunity to make an impact, as there are no silos here.
Responsibilities:
- Deliver SLA and business objectives through whole-lifecycle design of services, from inception to implementation.
- Ensuring availability, performance, security, and scalability of AWS production systems
- Scale our systems and services through continuous integration, infrastructure as code, and gradual refactoring in an agile environment.
- Maintain services once a project is live by monitoring and measuring availability, latency, and overall system and application health.
- Write and maintain software that runs the infrastructure that powers the Loyalty and Data platform for some of the world’s largest brands.
- Participate in 24x7 on-call shifts for Level 2 and higher escalations
- Respond to incidents and write blameless RCAs/postmortems
- Implement and practice proper security controls and processes
- Providing recommendations for architecture and process improvements.
- Definition and deployment of systems for metrics, logging, and monitoring on the platform.
Must have:
- Minimum 3 Years of Experience in DevOps.
- BS degree in Computer Science, Mathematics, Engineering, or equivalent practical experience.
- Strong interpersonal skills.
- Must have experience in CI/CD tooling such as Jenkins, CircleCI, TravisCI
- Must have experience in Docker, Kubernetes, Amazon ECS or Mesos
- Experience in code development in at least one high-level programming language from this list: Python, Ruby, Golang, Groovy
- Proficient in shell scripting, and most importantly, know when to stop scripting and start developing.
- Experience in creating highly automated infrastructures with configuration management tools such as Terraform, CloudFormation, or Ansible.
- In-depth knowledge of the Linux operating system and administration.
- Production experience with a major cloud provider such as Amazon AWS.
- Knowledge of web server technologies such as Nginx or Apache.
- Knowledge of Redis, Memcache, or one of the many in-memory data stores.
- Experience with various load balancing technologies such as Amazon ALB/ELB, HAProxy, F5.
- Comfortable with large-scale, highly-available distributed systems.
Good to have:
- Understanding of Web Standards (REST, SOAP APIs, OWASP, HTTP, TLS)
- Production experience with Hashicorp products such as Vault or Consul
- Expertise in designing, analyzing, and troubleshooting large-scale distributed systems.
- Experience in a PCI environment
- Experience with Big Data distributions from Cloudera, MapR, or Hortonworks
- Experience maintaining and scaling database applications
- Knowledge of fundamental systems engineering principles such as CAP Theorem, Concurrency Control, etc.
- Understanding of network fundamentals: OSI, TCP/IP, topologies, etc.
- Understanding of infrastructure auditing and helping the organization control infrastructure costs.
- Experience in Kafka, RabbitMQ or any messaging bus.
