CLOUDSUFI is a Data Science and Product Engineering organization building Products and Solutions for Technology and Enterprise industries. We firmly believe in the power of data to transform businesses and make better decisions. We combine unmatched experience in business processes with cutting-edge infrastructure and cloud services. We partner with our customers to monetize their data and make enterprise data dance.
Job type - Full-time / Contract
Summary
We are looking for a Senior MLOps Engineer to support the AI CoE in building and scaling machine learning operations. This position requires both strategic oversight and direct involvement in MLOps infrastructure design, automation, and optimization. The person will lead a team while collaborating with various stakeholders to manage machine learning pipelines and model deployments in GCP / AWS / Azure. A key part of this role is also managing data and models using data cataloging tools, ensuring that they are well-documented, versioned, and accessible for reuse and auditing.
Job Description:
⮚ Deploy models to production in GCP and own the model maintenance, monitoring and support activities
⮚ Split time between high-level strategy and hands-on technical implementation
⮚ Architect, build, and maintain scalable MLOps pipelines, with a focus on GCP / AWS / Azure services such as Vertex AI, GKE, Cloud Storage, and BigQuery; stay up to date with the latest trends and advancements in MLOps
⮚ Implement and optimize CI/CD pipelines for machine learning model deployment, ensuring minimal downtime and streamlined processes
⮚ Work closely with data scientists and data engineers to ensure efficient data processing pipelines, model training, testing, and deployment
⮚ Manage data catalog tools for model and dataset versioning, lineage tracking, and governance. Ensure that all models and datasets are properly documented and discoverable
⮚ Develop automated systems for model monitoring, logging, and performance tracking in production environments
⮚ Lead the integration of data cataloging tools (e.g., OpenMetadata), ensuring the traceability and versioning of both datasets and models
Required Experience:
⮚ Bachelor’s degree in Computer Science, Engineering or similar quantitative disciplines
⮚ 4+ years of professional experience in MLOps or similar roles
⮚ Candidate should be able to write code for ML
⮚ Excellent analytical and problem-solving skills for technical challenges related to MLOps
⮚ Excellent English proficiency, presentation, and communication skills
⮚ Proven experience in deploying, monitoring, and managing machine learning models on GCP / AWS / Azure
⮚ Hands-on experience with data catalog tools
⮚ Expert in GCP / AWS / Azure services such as Vertex AI, GKE, BigQuery, Cloud Build, Endpoints, etc. for building scalable ML infrastructure (official GCP / AWS / Azure certifications are a huge plus)
⮚ Experience with model serving frameworks (e.g., TensorFlow Serving, TorchServe) and MLOps tools like Kubeflow, MLflow, or TFX