
As a MLOps Engineer in QuantumBlack you will:
Develop and deploy technology that enables data scientists and data engineers to build, productionize and deploy machine learning models following best practices. Work to set the standards for SWE and
DevOps practices within multi-disciplinary delivery teams
Choose and use the right cloud services, DevOps tooling and ML tooling for the team to be able to produce high-quality code that allows your team to release to production.
Build modern, scalable, and secure CI/CD pipelines to automate development and deployment
workflows used by data scientists (ML pipelines) and data engineers (Data pipelines)
Shape and support next generation technology that enables scaling ML products and platforms. Bring
expertise in cloud to enable ML use case development, including MLOps
Our Tech Stack-
We leverage AWS, Google Cloud, Azure, Databricks, Docker, Kubernetes, Argo, Airflow, Kedro, Python,
Terraform, GitHub actions, MLFlow, Node.JS, React, Typescript amongst others in our projects
Key Skills:
• Excellent hands-on expert knowledge of cloud platform infrastructure and administration
(Azure/AWS/GCP) with strong knowledge of cloud services integration, and cloud security
• Expertise setting up CI/CD processes, building and maintaining secure DevOps pipelines with at
least 2 major DevOps stacks (e.g., Azure DevOps, Gitlab, Argo)
• Experience with modern development methods and tooling: Containers (e.g., docker) and
container orchestration (K8s), CI/CD tools (e.g., Circle CI, Jenkins, GitHub actions, Azure
DevOps), version control (Git, GitHub, GitLab), orchestration/DAGs tools (e.g., Argo, Airflow,
Kubeflow)
• Hands-on coding skills Python 3 (e.g., API including automated testing frameworks and libraries
(e.g., pytest) and Infrastructure as Code (e.g., Terraform) and Kubernetes artifacts (e.g.,
deployments, operators, helm charts)
• Experience setting up at least one contemporary MLOps tooling (e.g., experiment tracking,
model governance, packaging, deployment, feature store)
• Practical knowledge delivering and maintaining production software such as APIs and cloud
infrastructure
• Knowledge of SQL (intermediate level or more preferred) and familiarity working with at least
one common RDBMS (MySQL, Postgres, SQL Server, Oracle)

Similar jobs
Job Description:
Infilect is a GenAI company pioneering the use of Image Recognition in Consumer Packaged Goods retail.
We are looking for a Senior DevOps Engineer to be responsible and accountable for the smooth running of our Cloud, AI workflows, and AI-based Computer Systems. Furthermore, the candidate will supervise the implementation and maintenance of the company’s computing needs including the in-house GPU & AI servers along with AI workloads.
Responsibilities
- Understanding and automating AI based deployment an AI based workflows
- Implementing various development, testing, automation tools, and IT infrastructure
- Manage Cloud, computer systems and other IT assets.
- Strive for continuous improvement and build continuous integration, continuous development, and constant deployment pipeline (CI/CD Pipeline)
- Design, develop, implement, and coordinate systems, policies, and procedures for Cloud and on-premise systems
- Ensure the security of data, network access, and backup systems
- Act in alignment with user needs and system functionality to contribute to organizational policy
- Identify problematic areas, perform RCA and implement strategic solutions in time
- Preserve assets, information security, and control structures
- Handle monthly/annual cloud budget and ensure cost effectiveness
Requirements and skills
- Well versed in automation tools such as Docker, Kubernetes, Puppet, Ansible etc.
- Working Knowledge of Python, SQL database stack or any full-stack with relevant tools.
- Understanding agile development, CI/CD, sprints, code reviews, Git and GitHub/Bitbucket workflows
- Well versed with ELK stack or any other logging, monitoring and analysis tools
- Proven working experience of 2+ years as an DevOps/Tech lead/IT Manager or relevant positions
- Excellent knowledge of technical management, information analysis, and of computer hardware/software systems
- Hands-on experience with computer networks, network administration, and network installation
- Knowledge in ISO/SOC Type II implementation with be a
- BE/B.Tech/ME/M.Tech in Computer Science, IT, Electronics or a similar field
Requirements:
• Previously help a DevOps Engineer or System Engineer role
• 4+ years of production Linux system admin experience in high traffic environment
• 1+ years of experience with Amazon AWS and related services (instances, ELB,
EBS, S3, etc.) and abstractions on top of AWS.
• Strong understanding of network fundamentals, IP and related services (DNS, VPN, firewalls, etc.) and
security concerns.
• Experience in running Docker and Kubernetes clusters in production.
• Love automating mundane tasks and make developers life easy
• Must be able to code in, at a minimum, Python (or Ruby) and Bash.
• Non-trivial production experience with Saltstack and/or Puppet, Composer,Jenkins, GIT
• Agile software development best practices - continuous integration, releases,branches, etc.
• Experience with modern monitoring tools; capacity planning.
• Some experience with MySQL, PostgreSQL, ElasticSearch, Node.js, and PHP is a plus.
• Self-motivated, fast learner, detail-oriented, team player with a sense of humor
Experience in managing CI/CD using Jenkins.
We are a boutique IT services & solutions firm headquartered in the Bay Area with offices in India. Our offering includes custom-configured hybrid cloud solutions backed by our managed services. We combine best in class DevOps and IT infrastructure management practices, to manage our clients Hybrid Cloud Environments.
In addition, we build and deploy our private cloud solutions using Open stack to provide our clients with a secure, cost effective and scale able Hybrid Cloud solution. We work with start-ups as well as enterprise clients.
This is an exciting opportunity for an experienced Cloud Engineer to work on exciting projects and have an opportunity to expand their knowledge working on adjacent technologies as well.
Must have skills
• Provisioning skills on IaaS cloud computing for platforms such as AWS, Azure, GCP.
• Strong working experience in AWS space with various AWS services and implementations (i.e. VPCs, SES, EC2, S3, Route 53, Cloud Front, etc.)
• Ability to design solutions based on client requirements.
• Some experience with various network LAN/WAN appliances like (Cisco routers and ASA systems, Barracuda, Meraki, SilverPeak, Palo Alto, Fortinet, etc.)
• Understanding of networked storage like (NFS / SMB / iSCSI / Storage GW / Windows Offline)
• Linux / Windows server installation, maintenance, monitoring, data backup and recovery, security, and administration.
• Good knowledge of TCP/IP protocol & internet technologies.
• Passion for innovation and problem solving, in a start-up environment.
• Good communication skills.
Good to have
• Remote Monitoring & Management.
• Familiarity with Kubernetes and Containers.
• Exposure to DevOps automation scripts & experience with tools like Git, bash scripting, PowerShell, AWS Cloud Formation, Ansible, Chef or Puppet will be a plus.
• Architect / Practitioner certification from OEM with hands-on capabilities.
What you will be working on
• Trouble shoot and handle L2/ L3 tickets.
• Design and architect Enterprise Cloud systems and services.
• Design, Build and Maintain environments primarily in AWS using EC2, S3/Storage, CloudFront, VPC, ELB, Auto Scaling, Direct Connect, Route53, Firewall, etc.
• Build and deploy in GCP/ Azure as needed.
• Architect cloud solutions keeping performance, cost and BCP considerations in mind.
• Plan cloud migration projects as needed.
• Collaborate & work as part of a cohesive team.
• Help build our private cloud offering on Open stack.
• Hands-on experience in Azure.
• Build and maintain CI/CD tools and pipelines.
• Designing and managing highly scalable, reliable, and fault-tolerant infrastructure & networking that forms the backbone of distributed systems at RARA Now.
• Continuously improve code quality, product execution, and customer delight.
• Communicate, collaborate and work effectively across distributed teams in a global environment.
• Operate to strengthen teams across their product with their knowledge base
• Contribute to improving team relatedness, and help build a culture of camaraderie.
• Continuously refactor applications to ensure high-quality design
• Pair with team members on functional and non-functional requirements and spread design philosophy and goals across the team
• Excellent bash, and scripting fundamentals and hands-on with scripting in programming languages such as Python, Ruby, Golang, etc.
• Good understanding of distributed system fundamentals and ability to troubleshoot issues in a larger distributed infrastructure
• Working knowledge of the TCP/IP stack, internet routing, and load balancing
• Basic understanding of cluster orchestrators and schedulers (Kubernetes)
• Deep knowledge of Linux as a production environment, and container technologies. e.g., Docker, Infrastructure as Code such as Terraform, and K8s administration at large scale.
• Have worked on production distributed systems and have an understanding of microservices architecture, RESTful services, and CI/CD.
Experience and Education
• Bachelor’s degree in engineering or equivalent.
Work experience
• 4+ years of infrastructure and operations management
Experience at a global scale.
• 4+ years of experience in operations management, including monitoring, configuration management, automation, backup, and recovery.
• Broad experience in the data center, networking, storage, server, Linux, and cloud technologies.
• Broad knowledge of release engineering: build, integration, deployment, and provisioning, including familiarity with different upgrade models.
• Demonstratable experience with executing, or being involved of, a complete end-to-end project lifecycle.
Skills
• Excellent communication and teamwork skills – both oral and written.
• Skilled at collaborating effectively with both Operations and Engineering teams.
• Process and documentation oriented.
• Attention to details. Excellent problem-solving skills.
• Ability to simplify complex situations and lead calmly through periods of crisis.
• Experience implementing and optimizing operational processes.
• Ability to lead small teams: provide technical direction, prioritize tasks to achieve goals, identify dependencies, report on progress.
Technical Skills
• Strong fluency in Linux environments is a must.
• Good SQL skills.
• Demonstratable scripting/programming skills (bash, python, ruby, or go) and the ability to develop custom tool integrations between multiple systems using their published API’s / CLI’s.
• L3, load balancer, routing, and VPN configuration.
• Kubernetes configuration and management.
• Expertise using version control systems such as Git.
• Configuration and maintenance of database technologies such as Cassandra, MariaDB, Elastic.
• Designing and configuration of open-source monitoring systems such as Nagios, Grafana, or Prometheus.
• Designing and configuration of log pipeline technologies such as ELK (Elastic Search Logstash Kibana), FluentD, GROK, rsyslog, Google Stackdriver.
• Using and writing modules for Infrastructure as Code tools such as Ansible, Terraform, helm, customize.
• Strong understanding of virtualization and containerization technologies such as VMware, Docker, and Kubernetes.
• Specific experience with Google Cloud Platform or Amazon EC2 deployments and virtual machines.c
- 3+ years experience leading a team of DevOps engineers
- 8+ years experience managing DevOps for large engineering teams developing cloud-native software
- Strong in networking concepts
- In-depth knowledge of AWS and cloud architectures/services.
- Experience within the container and container orchestration space (Docker, Kubernetes)
- Passion for CI/CD pipeline using tools such as Jenkins etc.
- Familiarity with config management tools like Ansible Terraform etc
- Proven record of measuring and improving DevOps metrics
- Familiarity with observability tools and experience setting them up
- Passion for building tools and productizing services that empower development teams.
- Excellent knowledge of Linux command-line tools and ability to write bash scripts.
- Strong in Unix / Linux administration and management,
KEY ROLES/RESPONSIBILITIES:
- Own and manage the entire cloud infrastructure
- Create the entire CI/CD pipeline to build and release
- Explore new technologies and tools and recommend those that best fit the team and organization
- Own and manage the site reliability
- Strong decision-making skills and metric-driven approach
- Mentor and coach other team members
Job Description:
Mandatory Skills:
Should have strong working experience with Cloud technologies like AWS and Azure.
Should have strong working experience with CI/CD tools like Jenkins and Rundeck.
Must have experience with configuration management tools like Ansible.
Must have working knowledge on tools like Terraform.
Must be good at Scripting Languages like shell scripting and python.
Should be expertise in DevOps practices and should have demonstrated the ability to apply that knowledge across diverse projects and teams.
Preferable skills:
Experience with tools like Docker, Kubernetes, Puppet, JIRA, gitlab and Jfrog.
Experience in scripting languages like groovy.
Experience with GCP
Summary & Responsibilities:
Write build pipelines and IaaC (ARM templates, terraform or cloud formation).
Develop ansible playbooks to install and configure various products.
Implement Jenkins and Rundeck jobs( and pipelines).
Must be a self-starter and be able to work well in a fast paced, dynamic environment
Work independently and resolve issues with minimal supervision.
Strong desire to learn new technologies and techniques
Strong communication (written / verbal ) skills
Qualification:
Bachelor's degree in Computer Science or equivalent.
4+ years of experience in DevOps and AWS.
2+ years of experience in Python, Shell scripting and Azure.
Responsibilities
- Building and maintenance of resilient and scalable production infrastructure
Improvement of monitoring systems
- Improvement of monitoring systems
- Creation and support of development automation processes (CI / CD)
- Participation in infrastructure development
- Detection of problems in architecture and proposing of solutions for solving them
- Creation of tasks for system improvements for system scalability, performance and monitoring
- Analysis of product requirements in the aspect of devops
- Incident analysis and fixing
Skills and Experience
- Understanding of the distributed systems principles
- Understanding of principles for building a resistant network infrastructure
- Experience of Ubuntu Linux administration (Debian-like will be a plus)
- Strong knowledge of Bash
- Experience of working with LXC-containers
- Understanding and experience with infrastructure as a code approach
- Experience of development idempotent Ansible roles
- Experience of working with git
Preferred experience
Experience with relational databases (PostgeSQL), ability to create simple SQL queries
- Experience with monitoring and metric collect systems (Prometheus, Grafana, Zabbix)
- Understanding of dynamic routing (OSPF)
- Knowledge and experience of working with network equipment Cisco
- Experience of working with Cisco NX-OS
- Experience of working with IPsec, VXLAN, Open vSwitch
- Knowledge of principles of multicast protocols IGMP, PIM
- Experience of setting multicast on Cisco equipment
- Experience administering Atlassian products

2. Has done Infrastructure coding using Cloudformation/Terraform and Configuration also understands it very clearly
3. Deep understanding of the microservice design and aware of centralized Caching(Redis),centralized configuration(Consul/Zookeeper)
4. Hands-on experience of working on containers and its orchestration using Kubernetes
5. Hands-on experience of Linux and Windows Operating System
6. Worked on NoSQL Databases like Cassandra, Aerospike, Mongo or
Couchbase, Central Logging, monitoring and Caching using stacks like ELK(Elastic) on the cloud, Prometheus, etc.
7. Has good knowledge of Network Security, Security Architecture and Secured SDLC practices







