ETL jobs

50+ ETL Jobs in India

Apply to 50+ ETL Jobs on CutShort.io. Find your next job, effortlessly. Browse ETL Jobs and apply today!

Senior Data Engineer

at Sonatype

Posted by Reshika Mendiratta

Hyderabad

5 - 8 yrs

Up to ₹28L / yr (Varies)

Java

ETL

Spring

databricks

SQL

+5 more

Who We Are

At Sonatype, we help organizations build better, more secure software by enabling them to understand and control their software supply chains. Our products are trusted by thousands of engineering teams globally, providing critical insights into dependency health, license risk, and software security. We’re passionate about empowering developers—and we back it with data.

The Opportunity

We’re looking for a Data Engineer with full stack expertise to join our growing Data Platform team. This role blends data engineering, microservices, and full-stack development to deliver end-to-end services that power analytics, machine learning, and advanced search across Sonatype.

You will design and build data-driven microservices and workflows using Java, Python, and Spring Batch, implement frontends for data workflows, and deploy everything through CI/CD pipelines into AWS ECS/Fargate. You’ll also ensure services are monitorable, debuggable, and reliable at scale, while clearly documenting designs with Mermaid-based sequence and dataflow diagrams.

This is a hands-on engineering role for someone who thrives at the intersection of data systems, full-stack development, ML, and cloud-native platforms.

What You’ll Do

Design, build, and maintain data pipelines, ETL/ELT workflows, and scalable microservices.
Develop complex web scraping (Playwright) and real-time pipelines (Kafka/Queues/Flink).
Develop end-to-end microservices with backend (Java 5+ years, Python 5+ years, Spring Batch 2+ years) and frontend (React or any other framework).
Deploy, publish, and operate services in AWS ECS/Fargate using CI/CD pipelines (Jenkins, GitOps).
Architect and optimize data storage models in SQL (MySQL, PostgreSQL) and NoSQL stores.
Implement web scraping and external data ingestion pipelines.
Enable Databricks and PySpark-based workflows for large-scale analytics.
Build advanced data search capabilities (fuzzy matching, vector similarity search, semantic retrieval).
Apply ML techniques (scikit-learn, classification algorithms, predictive modeling) to data-driven solutions.
Implement observability, debugging, monitoring, and alerting for deployed services.
Create Mermaid sequence diagrams, flowcharts, and dataflow diagrams to document system architecture and workflows.
Drive best practices in full-stack data service development, including architecture, testing, and documentation.
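
As an illustration of two of the search techniques named in the list above (fuzzy matching and vector similarity), here is a minimal sketch using only Python's standard library. The function names and sample strings are hypothetical and chosen for illustration; this is not Sonatype's implementation, and a production system would more likely rely on a dedicated search or vector index.

```python
import math
from difflib import SequenceMatcher


def fuzzy_ratio(a: str, b: str) -> float:
    """Case-insensitive similarity score in [0, 1] between two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_fuzzy_match(query: str, candidates: list[str]) -> str:
    """Return the candidate string most similar to the query."""
    return max(candidates, key=lambda c: fuzzy_ratio(query, c))


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        # A zero vector has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_u * norm_v)
```

For example, `best_fuzzy_match("log4j-cor", ["log4j-core", "slf4j-api", "guava"])` picks the closest package name for a mistyped query, and `cosine_similarity` compares two embedding vectors regardless of their magnitude.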

What We’re Looking For

Minimum Qualifications

2+ years of experience as a Data Engineer or in a backend software engineering role
Strong programming skills in Python, Scala, or Java
Hands-on experience with HBase or similar NoSQL columnar stores
Hands-on experience with distributed data systems like Spark, Kafka, or Flink
Proficient in writing complex SQL and optimizing queries for performance
Experience building and maintaining robust ETL/ELT pipelines in production
Familiarity with workflow orchestration tools (Airflow, Dagster, or similar)
Understanding of data modeling techniques (star schema, dimensional modeling, etc.)
Familiarity with CI/CD pipelines (Jenkins or similar)
Ability to visualize and communicate architectures using Mermaid diagrams

Bonus Points

Experience working with Databricks, dbt, Terraform, or Kubernetes
Familiarity with streaming data pipelines or real-time processing
Exposure to data governance frameworks and tools
Experience supporting data products or ML pipelines in production
Strong understanding of data privacy, security, and compliance best practices

Why You’ll Love Working Here

Data with purpose: Work on problems that directly impact how the world builds secure software
Modern tooling: Leverage the best of open-source and cloud-native technologies
Collaborative culture: Join a passionate team that values learning, autonomy, and impact

Data Engineer - Validation & Quality

Tech AI startup in Bangalore

Agency job

via Recruit Square by Priyanka choudhary

Remote only

4 - 8 yrs

₹12L - ₹18L / yr

pandas

NumPy

MLOps

SQL

ETL

+1 more

Data Engineer – Validation & Quality

Responsibilities

Build rule-based and statistical validation frameworks using Pandas / NumPy.
Implement contradiction detection, reconciliation, and anomaly flagging.
Design and compute confidence metrics for each evidence record.
Automate schema compliance, sampling, and checksum verification across data sources.
Collaborate with the Kernel to embed validation results into every output artifact.
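
The schema-compliance and checksum items above can be illustrated with a minimal sketch. The posting names Pandas / NumPy; plain Python is used here only to keep the example self-contained. The schema, field names, and helper functions are hypothetical and are not taken from the employer's codebase.

```python
import hashlib

# Hypothetical schema: field name -> required Python type.
EXPECTED_SCHEMA = {"id": int, "source": str, "value": float}


def schema_violations(record: dict) -> list[str]:
    """List the ways a record departs from EXPECTED_SCHEMA."""
    problems = []
    for field, expected_type in EXPECTED_SCHEMA.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type for {field}")
    for field in record:
        if field not in EXPECTED_SCHEMA:
            problems.append(f"unexpected field: {field}")
    return problems


def sha256_checksum(payload: bytes) -> str:
    """Hex-encoded SHA-256 digest of the payload."""
    return hashlib.sha256(payload).hexdigest()


def checksum_matches(payload: bytes, expected_hex: str) -> bool:
    """True when the payload hashes to the checksum the source published."""
    return sha256_checksum(payload) == expected_hex
```

A record that passes returns an empty list from `schema_violations`, so the result can be attached directly to an output artifact as its validation report.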

Requirements

5+ years in data engineering, data quality, or MLOps validation.
Strong SQL optimization and ETL background.
Familiarity with data lineage, DQ frameworks, and regulatory standards (SOC 2 / GDPR).

Senior Data Engineer/Developer

at Blutic India Pvt Ltd

Posted by SURBHI Varshney

Remote only

10 - 15 yrs

₹18L - ₹20L / yr

SQL

ETL

Python

REDSHIFT

SNOWFLAKE

+2 more

We are looking for a Senior Data Engineer/Developer with 10+ years of experience to be a key contributor to our data-driven initiatives. This role is primarily focused on development: designing and building data models, writing complex SQL, developing ETL processes, and contributing to our data architecture. The secondary focus is applying your deep database knowledge to performance tuning, query optimization, and DBA-related support activities in AWS environments (RDS, Redshift, SQL Server, Snowflake). The ideal candidate is a builder who understands how to get the most out of a database platform.

Key Responsibilities

Data Development & Engineering (Primary Focus):

· Design & Development: Architect, design, and implement efficient, scalable, and sustainable data models and database schemas.

· Advanced SQL Programming: Write sophisticated, highly-optimized SQL code for complex business logic, data retrieval, and manipulation within MySQL RDS, SQL Server, and AWS Redshift.

· Data Pipeline & ETL Development: Collaborate with engineering teams to design, build, and maintain robust ETL processes and data pipeline integrations.

· Automation & Scripting: Utilize Python as a primary tool for scripting, automation, data processing, and enhancing platform capabilities.

· CI/CD Ownership: Own and enhance CI/CD pipelines for database deployments, schema migrations, and automated testing, ensuring smooth and reliable releases.

· Solution Collaboration: Collaborate with application engineering teams to deliver scalable, secure, and performant data solutions and APIs.

Database Administration & Optimization (Secondary Focus):

· Performance Tuning: Proactively identify and resolve performance bottlenecks, including slow-running queries, indexing strategies, and resource contention. Use tools like SQL Sentry for deep diagnostics.

· Operational Support: Perform essential DBA activities such as supporting backup/recovery strategies, contributing to high-availability designs, and assisting with patch management plans.

· AWS Data Management: Administer and optimize AWS RDS and Redshift instances, leveraging knowledge of DB Clusters (Read Replicas, Multi-AZ) for development and testing.

· Monitoring & Reliability: Monitor data platform health using Amazon CloudWatch, xMatters, and other tools to ensure high availability and reliability, tackling issues as they arise.

Architecture & Mentorship:

· Contribute to architectural decisions and infrastructure modernization efforts on AWS and Snowflake.

· Provide technical guidance and mentorship to other developers on best practices in database design and SQL.

Required Qualifications & Experience

· 10+ years of experience in a data engineering, database development, or software development role with a heavy focus on data.

· Expert-level SQL programming skills with extensive experience in MySQL (AWS RDS), Microsoft SQL Server, Redshift, and Snowflake.

· Strong development skills in Python for Lambda / Glue development on

· Hands-on experience designing and optimizing for AWS Redshift

· Proven experience in performance tuning and optimization of complex queries and data models.

· Solid understanding of ETL concepts, processes, and tools.

· Experience with CI/CD tools (e.g., Bitbucket Pipelines, Jenkins) for automating database deployments.

· Experience managing production data environments and troubleshooting platform issues.

· Excellent written and verbal communication skills, with the ability to work effectively in a remote team.

Preferred Skills (Nice-to-Have)

· Understanding of Data Governance

· Experience with Snowflake, particularly around architecture, agents, data sharing, security, and performance.

· Knowledge of infrastructure-as-code (IaC) tools like CloudFormation.

Work Schedule & Conditions

· This is a 100% remote, long-term opportunity.

· The standard work week will be Wednesday through Sunday.

· Your designated days off will be Monday and Tuesday.

· You must be willing to work partially overlapping hours with Eastern Standard Time (EST) to ensure collaboration with the team and support during core business hours.