EMR jobs

24+ EMR Jobs in India

Apply to 24+ EMR Jobs on CutShort.io. Find your next job, effortlessly. Browse EMR Jobs and apply today!

Data Engineer (AWS)

at Flycatch infotech PVT LTD

Posted by Nidhun K santhosh

Remote only

5 - 8 yrs

₹12L - ₹14L / yr

EMR

Apache Spark

PostgreSQL

Apache Iceberg

DynamoDB

Job Description – Data Engineer (AWS)

Experience: 5–8 Years

Key Skills

AWS EMR
Apache Spark
PostgreSQL
DynamoDB
Apache Iceberg
Strong Communication Skills

Job Responsibilities

Design, develop, and maintain scalable data engineering solutions on AWS
Build and optimize big data processing pipelines using Apache Spark and AWS EMR
Work with PostgreSQL and DynamoDB for data storage and management
Implement and manage data lake solutions using Apache Iceberg
Ensure data quality, performance optimization, and reliability of data pipelines
Collaborate with cross-functional teams to understand business and technical requirements
Troubleshoot and resolve data processing and performance issues

Requirements

5–8 years of experience in Data Engineering
Strong hands-on experience with AWS EMR and Apache Spark
Experience working with PostgreSQL and DynamoDB
Good understanding of Apache Iceberg and modern data lake architectures
Strong analytical, problem-solving, and communication skills

Apply here: https://lnkd.in/gY2rUu9Y

Principal Data Scientist

Healthcare Industry

Agency job

via Peak Hire Solutions by Dharati Thakkar

Bengaluru (Bangalore)

6 - 10 yrs

₹25L - ₹30L / yr

MLOps

Generative AI

Python

Natural Language Processing (NLP)

Machine Learning (ML)

JOB DETAILS:

* Job Title: Principal Data Scientist

* Industry: Healthcare

* Salary: Best in Industry

* Experience: 6-10 years

* Location: Bengaluru

Preferred Skills: Generative AI, NLP & ASR, Transformer Models, Cloud Deployment, MLOps

Criteria:

Candidate must have 7+ years of experience in ML, Generative AI, NLP, ASR, and LLMs (preferably healthcare).
Candidate must have strong Python skills with hands-on experience in PyTorch/TensorFlow and transformer model fine-tuning.
Candidate must have experience deploying scalable AI solutions on AWS/Azure/GCP with MLOps, Docker, and Kubernetes.
Candidate must have hands-on experience with LangChain, OpenAI APIs, vector databases, and RAG architectures.
Candidate must have experience integrating AI with EHR/EMR systems, ensuring HIPAA/HL7/FHIR compliance, and leading AI initiatives.

Job Description

Principal Data Scientist

(Healthcare AI | ASR | LLM | NLP | Cloud | Agentic AI)

Job Details

Designation: Principal Data Scientist (Healthcare AI, ASR, LLM, NLP, Cloud, Agentic AI)
Location: Hebbal Ring Road, Bengaluru
Work Mode: Work from Office
Shift: Day Shift
Reporting To: SVP
Compensation: Best in the industry (for suitable candidates)

Educational Qualifications

Ph.D. or Master’s degree in Computer Science, Artificial Intelligence, Machine Learning, or a related field
Technical certifications in AI/ML, NLP, or Cloud Computing are an added advantage

Experience Required

7+ years of experience solving real-world problems using:
Natural Language Processing (NLP)
Automatic Speech Recognition (ASR)
Large Language Models (LLMs)
Machine Learning (ML)
Preferably within the healthcare domain
Experience in Agentic AI, cloud deployments, and fine-tuning transformer-based models is highly desirable

Role Overview

This position is part of company, a healthcare division of Focus Group specializing in medical coding and scribing.

We are building a suite of AI-powered, state-of-the-art web and mobile solutions designed to:

Reduce administrative burden in EMR data entry
Improve provider satisfaction and productivity
Enhance quality of care and patient outcomes

Our solutions combine cutting-edge AI technologies with live scribing services to streamline clinical workflows and strengthen clinical decision-making.

The Principal Data Scientist will lead the design, development, and deployment of cognitive AI solutions, including advanced speech and text analytics for healthcare applications. The role demands deep expertise in generative AI, classical ML, deep learning, cloud deployments, and agentic AI frameworks.

Key Responsibilities

AI Strategy & Solution Development

Define and develop AI-driven solutions for speech recognition, text processing, and conversational AI
Research and implement transformer-based models (Whisper, LLaMA, GPT, T5, BERT, etc.) for speech-to-text, medical summarization, and clinical documentation
Develop and integrate Agentic AI frameworks enabling multi-agent collaboration
Design scalable, reusable, and production-ready AI frameworks for speech and text analytics

Model Development & Optimization

Fine-tune, train, and optimize large-scale NLP and ASR models
Develop and optimize ML algorithms for speech, text, and structured healthcare data
Conduct rigorous testing and validation to ensure high clinical accuracy and performance
Continuously evaluate and enhance model efficiency and reliability

Cloud & MLOps Implementation

Architect and deploy AI models on AWS, Azure, or GCP
Deploy and manage models using containerization, Kubernetes, and serverless architectures
Design and implement robust MLOps strategies for lifecycle management

Integration & Compliance

Ensure compliance with healthcare standards such as HIPAA, HL7, and FHIR
Integrate AI systems with EHR/EMR platforms
Implement ethical AI practices, regulatory compliance, and bias mitigation techniques

Collaboration & Leadership

Work closely with business analysts, healthcare professionals, software engineers, and ML engineers
Implement LangChain, OpenAI APIs, vector databases (Pinecone, FAISS, Weaviate), and RAG architectures
Mentor and lead junior data scientists and engineers
Contribute to AI research, publications, patents, and long-term AI strategy

Required Skills & Competencies

Expertise in Machine Learning, Deep Learning, and Generative AI
Strong Python programming skills
Hands-on experience with PyTorch and TensorFlow
Experience fine-tuning transformer-based LLMs (GPT, BERT, T5, LLaMA, etc.)
Familiarity with ASR models (Whisper, Canary, wav2vec, DeepSpeech)
Experience with text embeddings and vector databases
Proficiency in cloud platforms (AWS, Azure, GCP)
Experience with LangChain, OpenAI APIs, and RAG architectures
Knowledge of agentic AI frameworks and reinforcement learning
Familiarity with Docker, Kubernetes, and MLOps best practices
Understanding of FHIR, HL7, HIPAA, and healthcare system integrations
Strong communication, collaboration, and mentoring skills

Lead II - SE - AWS, Apache Spark (PySpark/Scala), Apache Kafka

Global digital transformation solutions provider.

Agency job

via Peak Hire Solutions by Dharati Thakkar

Hyderabad

5 - 8 yrs

₹11L - ₹20L / yr

PySpark

Apache Kafka

Data architecture

Amazon Web Services (AWS)

EMR

JOB DETAILS:

* Job Title: Lead II - Software Engineering - AWS, Apache Spark (PySpark/Scala), Apache Kafka

* Industry: Global digital transformation solutions provider

* Salary: Best in Industry

* Experience: 5-8 years

* Location: Hyderabad

Job Summary

We are seeking a skilled Data Engineer to design, build, and optimize scalable data pipelines and cloud-based data platforms. The role involves working with large-scale batch and real-time data processing systems, collaborating with cross-functional teams, and ensuring data reliability, security, and performance across the data lifecycle.

Key Responsibilities

ETL Pipeline Development & Optimization

Design, develop, and maintain complex end-to-end ETL pipelines for large-scale data ingestion and processing.
Optimize data pipelines for performance, scalability, fault tolerance, and reliability.

Big Data Processing

Develop and optimize batch and real-time data processing solutions using Apache Spark (PySpark/Scala) and Apache Kafka.
Ensure fault-tolerant, scalable, and high-performance data processing systems.

Cloud Infrastructure Development

Build and manage scalable, cloud-native data infrastructure on AWS.
Design resilient and cost-efficient data pipelines adaptable to varying data volume and formats.

Real-Time & Batch Data Integration

Enable seamless ingestion and processing of real-time streaming and batch data sources (e.g., AWS MSK).
Ensure consistency, data quality, and a unified view across multiple data sources and formats.

Data Analysis & Insights

Partner with business teams and data scientists to understand data requirements.
Perform in-depth data analysis to identify trends, patterns, and anomalies.
Deliver high-quality datasets and present actionable insights to stakeholders.

CI/CD & Automation

Implement and maintain CI/CD pipelines using Jenkins or similar tools.
Automate testing, deployment, and monitoring to ensure smooth production releases.

Data Security & Compliance

Collaborate with security teams to ensure compliance with organizational and regulatory standards (e.g., GDPR, HIPAA).
Implement data governance practices ensuring data integrity, security, and traceability.

Troubleshooting & Performance Tuning

Identify and resolve performance bottlenecks in data pipelines.
Apply best practices for monitoring, tuning, and optimizing data ingestion and storage.

Collaboration & Cross-Functional Work

Work closely with engineers, data scientists, product managers, and business stakeholders.
Participate in agile ceremonies, sprint planning, and architectural discussions.

Skills & Qualifications

Mandatory (Must-Have) Skills

AWS Expertise

Hands-on experience with AWS Big Data services such as EMR, Managed Apache Airflow, Glue, S3, DMS, MSK, and EC2.
Strong understanding of cloud-native data architectures.

Big Data Technologies

Proficiency in PySpark or Scala Spark and SQL for large-scale data transformation and analysis.
Experience with Apache Spark and Apache Kafka in production environments.

Data Frameworks

Strong knowledge of Spark DataFrames and Datasets.

ETL Pipeline Development

Proven experience in building scalable and reliable ETL pipelines for both batch and real-time data processing.

Database Modeling & Data Warehousing

Expertise in designing scalable data models for OLAP and OLTP systems.

Data Analysis & Insights

Ability to perform complex data analysis and extract actionable business insights.
Strong analytical and problem-solving skills with a data-driven mindset.

CI/CD & Automation

Basic to intermediate experience with CI/CD pipelines using Jenkins or similar tools.
Familiarity with automated testing and deployment workflows.

Good-to-Have (Preferred) Skills

Knowledge of Java for data processing applications.
Experience with NoSQL databases (e.g., DynamoDB, Cassandra, MongoDB).
Familiarity with data governance frameworks and compliance tooling.
Experience with monitoring and observability tools such as AWS CloudWatch, Splunk, or Dynatrace.
Exposure to cost optimization strategies for large-scale cloud data platforms.

Skills: big data, scala spark, apache spark, ETL pipeline development

******

Notice period - 0 to 15 days only

Job stability is mandatory

Location: Hyderabad

Note: If a candidate is a short joiner, based in Hyderabad, and fits within the approved budget, we will proceed with an offer

F2F Interview: 14th Feb 2026

3 days in office, Hybrid model.

MLOps Engineer

AdTech Industry

Agency job

via Peak Hire Solutions by Dharati Thakkar

Noida

7 - 12 yrs

₹40L - ₹80L / yr

Machine Learning (ML)

Apache Spark

Apache Airflow

Python

Amazon Web Services (AWS)

Review Criteria:

Strong MLOps profile
8+ years of DevOps experience and 4+ years in MLOps / ML pipeline automation and production deployments
4+ years hands-on experience in Apache Airflow / MWAA managing workflow orchestration in production
4+ years hands-on experience in Apache Spark (EMR / Glue / managed or self-hosted) for distributed computation
Must have strong hands-on experience across key AWS services including EKS/ECS/Fargate, Lambda, Kinesis, Athena/Redshift, S3, and CloudWatch
Must have hands-on Python for pipeline & automation development
4+ years of AWS cloud experience at recent companies
(Company) - Product companies preferred; Exception for service company candidates with strong MLOps + AWS depth

Preferred:

Hands-on in Docker deployments for ML workflows on EKS / ECS
Experience with ML observability (data drift / model drift / performance monitoring / alerting) using CloudWatch / Grafana / Prometheus / OpenSearch.
Experience with CI / CD / CT using GitHub Actions / Jenkins.
Experience with JupyterHub/Notebooks, Linux, scripting, and metadata tracking for ML lifecycle.
Understanding of ML frameworks (TensorFlow / PyTorch) for deployment scenarios.

Job Specific Criteria:

CV attachment is mandatory
Please provide your CTC breakup (fixed + variable)
Are you okay with an F2F round?
Has the candidate filled in the Google form?

Role & Responsibilities:

We are looking for a Senior MLOps Engineer with 8+ years of experience building and managing production-grade ML platforms and pipelines. The ideal candidate will have strong expertise across AWS, Airflow/MWAA, Apache Spark, Kubernetes (EKS), and automation of ML lifecycle workflows. You will work closely with data science, data engineering, and platform teams to operationalize and scale ML models in production.

Key Responsibilities:

Design and manage cloud-native ML platforms supporting training, inference, and model lifecycle automation.
Build ML/ETL pipelines using Apache Airflow / AWS MWAA and distributed data workflows using Apache Spark (EMR/Glue).
Containerize and deploy ML workloads using Docker, EKS, ECS/Fargate, and Lambda.
Develop CI/CT/CD pipelines integrating model validation, automated training, testing, and deployment.
Implement ML observability: model drift, data drift, performance monitoring, and alerting using CloudWatch, Grafana, Prometheus.
Ensure data governance, versioning, metadata tracking, reproducibility, and secure data pipelines.
Collaborate with data scientists to productionize notebooks, experiments, and model deployments.

Ideal Candidate:

8+ years in MLOps/DevOps with strong ML pipeline experience.
Strong hands-on experience with AWS:
Compute/Orchestration: EKS, ECS, EC2, Lambda
Data: EMR, Glue, S3, Redshift, RDS, Athena, Kinesis
Workflow: MWAA/Airflow, Step Functions
Monitoring: CloudWatch, OpenSearch, Grafana
Strong Python skills and familiarity with ML frameworks (TensorFlow/PyTorch/Scikit-learn).
Expertise with Docker, Kubernetes, Git, CI/CD tools (GitHub Actions/Jenkins).
Strong Linux, scripting, and troubleshooting skills.
Experience enabling reproducible ML environments using Jupyter Hub and containerized development workflows.

Education:

Master’s degree in Computer Science, Machine Learning, Data Engineering, or a related field.

MLOps Engineer

AdTech Industry

Agency job

via Peak Hire Solutions by Dharati Thakkar

Noida

8 - 12 yrs

₹60L - ₹80L / yr

Apache Airflow

Apache Spark

AWS CloudFormation

DevOps

MLOps

Review Criteria:

Strong MLOps profile
8+ years of DevOps experience and 4+ years in MLOps / ML pipeline automation and production deployments
4+ years hands-on experience in Apache Airflow / MWAA managing workflow orchestration in production
4+ years hands-on experience in Apache Spark (EMR / Glue / managed or self-hosted) for distributed computation
Must have strong hands-on experience across key AWS services including EKS/ECS/Fargate, Lambda, Kinesis, Athena/Redshift, S3, and CloudWatch
Must have hands-on Python for pipeline & automation development
4+ years of AWS cloud experience at recent companies
(Company) - Product companies preferred; Exception for service company candidates with strong MLOps + AWS depth

Preferred:

Hands-on in Docker deployments for ML workflows on EKS / ECS
Experience with ML observability (data drift / model drift / performance monitoring / alerting) using CloudWatch / Grafana / Prometheus / OpenSearch.
Experience with CI / CD / CT using GitHub Actions / Jenkins.
Experience with JupyterHub/Notebooks, Linux, scripting, and metadata tracking for ML lifecycle.
Understanding of ML frameworks (TensorFlow / PyTorch) for deployment scenarios.

Job Specific Criteria:

CV attachment is mandatory
Please provide your CTC breakup (fixed + variable)
Are you okay with an F2F round?
Has the candidate filled in the Google form?

Role & Responsibilities:

Key Responsibilities:

Design and manage cloud-native ML platforms supporting training, inference, and model lifecycle automation.
Build ML/ETL pipelines using Apache Airflow / AWS MWAA and distributed data workflows using Apache Spark (EMR/Glue).
Containerize and deploy ML workloads using Docker, EKS, ECS/Fargate, and Lambda.
Develop CI/CT/CD pipelines integrating model validation, automated training, testing, and deployment.
Implement ML observability: model drift, data drift, performance monitoring, and alerting using CloudWatch, Grafana, Prometheus.
Ensure data governance, versioning, metadata tracking, reproducibility, and secure data pipelines.
Collaborate with data scientists to productionize notebooks, experiments, and model deployments.

Ideal Candidate:

8+ years in MLOps/DevOps with strong ML pipeline experience.
Strong hands-on experience with AWS:
Compute/Orchestration: EKS, ECS, EC2, Lambda
Data: EMR, Glue, S3, Redshift, RDS, Athena, Kinesis
Workflow: MWAA/Airflow, Step Functions
Monitoring: CloudWatch, OpenSearch, Grafana
Strong Python skills and familiarity with ML frameworks (TensorFlow/PyTorch/Scikit-learn).
Expertise with Docker, Kubernetes, Git, CI/CD tools (GitHub Actions/Jenkins).
Strong Linux, scripting, and troubleshooting skills.
Experience enabling reproducible ML environments using Jupyter Hub and containerized development workflows.

Education:

Master’s degree in Computer Science, Machine Learning, Data Engineering, or a related field.

Review Criteria:

Strong MLOps profile
8+ years of DevOps experience and 4+ years in MLOps / ML pipeline automation and production deployments
4+ years hands-on experience in Apache Airflow / MWAA managing workflow orchestration in production
4+ years hands-on experience in Apache Spark (EMR / Glue / managed or self-hosted) for distributed computation
Must have strong hands-on experience across key AWS services including EKS/ECS/Fargate, Lambda, Kinesis, Athena/Redshift, S3, and CloudWatch
Must have hands-on Python for pipeline & automation development
4+ years of experience in AWS cloud, gained at recent companies
(Company) - Product companies preferred; exception for service company candidates with strong MLOps + AWS depth

Preferred:

Hands-on in Docker deployments for ML workflows on EKS / ECS
Experience with ML observability (data drift / model drift / performance monitoring / alerting) using CloudWatch / Grafana / Prometheus / OpenSearch.
Experience with CI / CD / CT using GitHub Actions / Jenkins.
Experience with JupyterHub/Notebooks, Linux, scripting, and metadata tracking for ML lifecycle.
Understanding of ML frameworks (TensorFlow / PyTorch) for deployment scenarios.

Job Specific Criteria:

CV Attachment is mandatory
Please provide the CTC breakup (Fixed + Variable).
Are you okay with a face-to-face (F2F) round?
Has the candidate filled in the Google form?

Lead II - Data Engineering - Python - Databricks, PySpark, Python

Global digital transformation solutions provider.

Agency job

via Peak Hire Solutions by Dharati Thakkar

Bengaluru (Bangalore)

7 - 9 yrs

₹15L - ₹28L / yr

databricks

Python

SQL

PySpark

Amazon Web Services (AWS)

+9 more

Role Proficiency:

This role requires proficiency in developing data pipelines, including coding and testing for ingesting, wrangling, transforming, and joining data from various sources. The ideal candidate should be adept in ETL tools like Informatica, Glue, Databricks, and DataProc, with strong coding skills in Python, PySpark, and SQL. This position demands independence and proficiency across various data domains. Expertise in data warehousing solutions such as Snowflake, BigQuery, Lakehouse, and Delta Lake is essential, including the ability to calculate processing costs and address performance issues. A solid understanding of DevOps and infrastructure needs is also required.

Skill Examples:

Proficiency in SQL, Python, or other programming languages used for data manipulation.
Experience with ETL tools such as Apache Airflow, Talend, Informatica, AWS Glue, Dataproc, and Azure ADF.
Hands-on experience with cloud platforms like AWS, Azure, or Google Cloud, particularly with data-related services (e.g. AWS Glue, BigQuery).
Conduct tests on data pipelines and evaluate results against data quality and performance specifications.
Experience in performance tuning.
Experience in data warehouse design and cost improvements.
Apply and optimize data models for efficient storage, retrieval, and processing of large datasets.
Communicate and explain design/development aspects to customers.
Estimate time and resource requirements for developing/debugging features/components.
Participate in RFP responses and solutioning.
Mentor team members and guide them in relevant upskilling and certification.

Knowledge Examples:

Knowledge of various ETL services used by cloud providers, including Apache PySpark, AWS Glue, GCP DataProc/Dataflow, Azure ADF, and ADLF.
Proficient in SQL for analytics and windowing functions.
Understanding of data schemas and models.
Familiarity with domain-related data.
Knowledge of data warehouse optimization techniques.
Understanding of data security concepts.
Awareness of patterns, frameworks, and automation practices.

Additional Comments:

# of Resources: 22
Role(s): Technical Role
Location(s): India
Planned Start Date: 1/1/2026
Planned End Date: 6/30/2026

Project Overview:

Role Scope / Deliverables: We are seeking a highly skilled Data Engineer with strong experience in Databricks, PySpark, Python, SQL, and AWS to join our data engineering team on or before the first week of December 2025.

The candidate will be responsible for designing, developing, and optimizing large-scale data pipelines and analytics solutions that drive business insights and operational efficiency.

Design, build, and maintain scalable data pipelines using Databricks and PySpark.

Develop and optimize complex SQL queries for data extraction, transformation, and analysis.

Implement data integration solutions across multiple AWS services (S3, Glue, Lambda, Redshift, EMR, etc.).

Collaborate with analytics, data science, and business teams to deliver clean, reliable, and timely datasets.

Ensure data quality, performance, and reliability across data workflows.

Participate in code reviews, data architecture discussions, and performance optimization initiatives.

Support migration and modernization efforts for legacy data systems to modern cloud-based solutions.

Key Skills:

Hands-on experience with Databricks, PySpark & Python for building ETL/ELT pipelines.

Proficiency in SQL (performance tuning, complex joins, CTEs, window functions).

Strong understanding of AWS services (S3, Glue, Lambda, Redshift, CloudWatch, etc.).

Experience with data modeling, schema design, and performance optimization.

Familiarity with CI/CD pipelines, version control (Git), and workflow orchestration (Airflow preferred).

Excellent problem-solving, communication, and collaboration skills.

Skills: Databricks, PySpark & Python, SQL, AWS Services

Must-Haves

Python/PySpark (5+ years), SQL (5+ years), Databricks (3+ years), AWS Services (3+ years), ETL tools (Informatica, Glue, DataProc) (3+ years)

Hands-on experience with Databricks, PySpark & Python for ETL/ELT pipelines.

Proficiency in SQL (performance tuning, complex joins, CTEs, window functions).

Strong understanding of AWS services (S3, Glue, Lambda, Redshift, CloudWatch, etc.).

Experience with data modeling, schema design, and performance optimization.

Familiarity with CI/CD pipelines, Git, and workflow orchestration (Airflow preferred).

******

Notice period - Immediate to 15 days

Location: Bangalore
