
Data Modeler JD:
1. Understand and translate business needs into dimensional models supporting long-term solutions
2. Experience building models in ERwin or similar tools
3. Experience with and understanding of dimensional data models, Customer 360, and entity-relationship models
4. Work with the Development team to implement data strategies, build data flows and develop conceptual data models.
5. Create logical and physical data models using best practices to ensure high data quality and reduced redundancy
6. Optimize and update logical and physical data models to support new and existing projects
7. Maintain conceptual, logical, and physical data models along with corresponding metadata
8. Develop best practices for standard naming conventions and coding practices to ensure consistency of data models
9. Recommend opportunities for reuse of data models in new environments
10. Perform reverse engineering of physical data models from databases and SQL scripts
11. Evaluate models and physical databases for variances and discrepancies
12. Validate business data objects for accuracy and completeness
13. Analyze data-related system integration challenges and propose appropriate solutions
14. Develop data models according to company standards
15. Guide System Analysts, Engineers, Programmers and others on project limitations and capabilities, performance requirements and interfaces
16. Good to have: home appliance/retail domain knowledge and Azure Synapse
Job Functions: Information Technology
Employment Type - Full-time
Thank you!

Job Title: Senior Data Engineer
Experience: 8 to 11 years
Location: Remote
Notice: Immediate or max 1 month
Role: Permanent Role
Skill set: Google Cloud Platform, BigQuery, Java, Python, Airflow, Dataflow, Apache Beam.
Experience required:
5 years of experience in software design and development, with 4 years of experience in the data engineering field, is preferred.
2 years of hands-on experience in GCP cloud data implementation suites such as BigQuery, Pub/Sub, Dataflow/Apache Beam, Airflow/Composer, Cloud Storage, etc.
Strong experience and understanding of very large-scale data architecture, solutions, and operationalization of data warehouses, data lakes, and analytics platforms.
At least 1 year of software development experience using Java or Python is mandatory.
Extensive hands-on experience working with data using SQL and Python.
Must have: GCP, BigQuery, Airflow, Dataflow, Python, Java.
GCP knowledge is a must
Java as the programming language (preferred)
BigQuery, Pub/Sub, Dataflow/Apache Beam, Airflow/Composer, Cloud Storage
Python
Good communication skills
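Purely as an illustration of the stack listed above (not part of the JD; the project, topic, bucket, and table names are placeholders), a minimal Dataflow/Apache Beam streaming sketch reading JSON events from Pub/Sub into BigQuery might look like this:

```python
# Minimal Apache Beam streaming sketch: Pub/Sub -> BigQuery (placeholder names).
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    streaming=True,
    project="my-gcp-project",            # placeholder project
    runner="DataflowRunner",             # use "DirectRunner" for local testing
    temp_location="gs://my-bucket/tmp",  # placeholder bucket
    region="us-central1",
)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
            topic="projects/my-gcp-project/topics/events")
        | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "my-gcp-project:analytics.events",      # assumes the table already exists
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )
    )
```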
Senior Data Engineer
Responsibilities:
● Clean, prepare and optimize data at scale for ingestion and consumption by machine learning models
● Drive the implementation of new data management projects and re-structure of the current data architecture
● Implement complex automated workflows and routines using workflow scheduling tools (see the Airflow sketch after this list)
● Build continuous integration, test-driven development and production deployment frameworks
● Drive collaborative reviews of design, code, test plans and dataset implementation performed by other data engineers in support of maintaining data engineering standards
● Anticipate, identify and solve issues concerning data management to improve data quality
● Design and build reusable components, frameworks and libraries at scale to support machine learning products
● Design and implement product features in collaboration with business and Technology stakeholders
● Analyze and profile data for the purpose of designing scalable solutions
● Troubleshoot complex data issues and perform root cause analysis to proactively resolve product and operational issues
● Mentor and develop other data engineers in adopting best practices
● Able to influence and communicate effectively, both verbally and in writing, with team members and business stakeholders
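As an illustrative aside on the workflow-scheduling responsibility above (the DAG name, schedule, and task logic are assumptions, not taken from the role), a minimal Airflow 2.x DAG sketch:

```python
# Minimal Airflow DAG sketch: a daily two-step workflow (placeholder task logic).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Placeholder: pull raw data from a source system.
    print("extracting data")


def transform():
    # Placeholder: clean and prepare the extracted data.
    print("transforming data")


with DAG(
    dag_id="example_daily_pipeline",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    extract_task >> transform_task
```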
Qualifications:
● 8+ years of experience developing scalable Big Data applications or solutions on distributed platforms
● Experience in Google Cloud Platform (GCP) and good to have other cloud platform tools
● Experience working with Data warehousing tools, including DynamoDB, SQL, and Snowflake
● Experience architecting data products on streaming, serverless, and microservices architectures and platforms
● Experience with Spark (Scala/Python/Java) and Kafka
● Work experience using Databricks (Data Engineering and Delta Lake components)
● Experience working with Big Data platforms, including Dataproc, Databricks, etc.
● Experience working with distributed technology tools including Spark, Presto, Databricks, Airflow
● Working knowledge of Data warehousing, Data modeling
● Experience working in Agile and Scrum development processes
● Bachelor's degree in Computer Science, Information Systems, Business, or other relevant subject area
Role:
Senior Data Engineer
Total No. of Years:
8+ years of relevant experience
To be onboarded by:
Immediate
Notice Period:
Skills (Mandatory / Desirable, min to max years of project experience):
- GCP Exposure: Mandatory, 3 to 7 years
- BigQuery, Dataflow, Dataproc, AI Building Blocks, Looker, Cloud Data Fusion, Dataprep, Spark and PySpark: Mandatory, 5 to 9 years
- Relational SQL: Mandatory, 4 to 8 years
- Shell scripting language: Mandatory, 4 to 8 years
- Python/Scala language: Mandatory, 4 to 8 years
- Airflow/Kubeflow workflow scheduling tool: Mandatory, 3 to 7 years
- Kubernetes: Desirable, 1 to 6 years
- Scala: Mandatory, 2 to 6 years
- Databricks: Desirable, 1 to 6 years
- Google Cloud Functions: Mandatory, 2 to 6 years
- GitHub source control tool: Mandatory, 4 to 8 years
- Machine Learning: Desirable, 1 to 6 years
- Deep Learning: Desirable, 1 to 6 years
- Data structures and algorithms: Mandatory, 4 to 8 years
Responsibilities:
- Be the analytical expert in Kaleidofin, managing ambiguous problems by using data to execute sophisticated quantitative modeling and deliver actionable insights.
- Develop comprehensive skills including project management, business judgment, analytical problem solving and technical depth.
- Become an expert on data and trends, both internal and external to Kaleidofin.
- Communicate key state of the business metrics and develop dashboards to enable teams to understand business metrics independently.
- Collaborate with stakeholders across teams to drive data analysis for key business questions, communicate insights and drive the planning process with company executives.
- Automate scheduling and distribution of reports and support auditing and value realization.
- Partner with enterprise architects to define and ensure that proposed Business Intelligence solutions adhere to an enterprise reference architecture.
- Design robust data-centric solutions and architecture that incorporates technology and strong BI solutions to scale up and eliminate repetitive tasks.
- Experience leading development efforts through all phases of SDLC.
- 2+ years "hands-on" experience designing Analytics and Business Intelligence solutions.
- Experience with Quicksight, PowerBI, Tableau and Qlik is a plus.
- Hands on experience in SQL, data management, and scripting (preferably Python).
- Strong data visualisation design skills, data modeling and inference skills.
- Hands-on experience in managing small teams.
- Financial services experience preferred, but not mandatory.
- Strong knowledge of architectural principles, tools, frameworks, and best practices.
- Excellent communication and presentation skills to communicate and collaborate with all levels of the organisation.
- Preferred candidates with less than 30 days notice period.
Senior Data Scientist-Job Description
The Senior Data Scientist is a creative problem solver who utilizes statistical/mathematical principles and modelling skills to uncover new insights that will significantly and meaningfully impact business decisions and actions. They apply their data science expertise in identifying, defining, and executing state-of-the-art techniques for academic opportunities and business objectives in collaboration with other Analytics team members. The Senior Data Scientist will execute analyses and outputs spanning test design and measurement, predictive analytics, multivariate analysis, data/text mining, pattern recognition, artificial intelligence, and machine learning.
Key Responsibilities:
- Perform the full range of data science activities including test design and measurement, predictive/advanced analytics, data mining, and analytic dashboards.
- Extract, manipulate, analyse & interpret data from various corporate data sources developing advanced analytic solutions, deriving key observations, findings, insights, and formulating actionable recommendations.
- Generate clearly understood and intuitive data science / advanced analytics outputs.
- Provide thought leadership and recommendations on business process improvement, analytic solutions to complex problems.
- Participate in best practice sharing and communication platform for advancement of the data science discipline.
- Coach and collaborate with other data scientists and data analysts.
- Present impact, insights, outcomes & recommendations to key business partners and stakeholders.
- Comply with established Service Level Agreements to ensure timely, high quality deliverables with value-add recommendations, clearly articulated key findings and observations.
Qualification:
- Bachelor's Degree (B.A./B.S.) or Master’s Degree (M.A./M.S.) in Computer Science, Statistics, Mathematics, Machine Learning, Physics, or similar degree
- 5+ years of experience in data science in a digitally advanced industry focusing on strategic initiatives, marketing and/or operations.
- Advanced knowledge of best-in-class analytic software tools and languages: Python, SQL, R, SAS, Tableau, Excel, PowerPoint.
- Expertise in statistical methods, statistical analysis, data visualization, and data mining techniques.
- Experience in test design, Design of Experiments, A/B testing, and measurement science (a minimal A/B test sketch follows after this list).
- Strong influencing skills to drive a robust testing agenda and data-driven decision making for process improvements.
- Strong critical thinking skills to track down complex data and engineering issues, evaluate different algorithmic approaches, and analyse data to solve problems.
- Experience in partnering with IT, marketing operations & business operations to deploy predictive analytic solutions.
- Ability to translate/communicate complex analytical/statistical/mathematical concepts to a non-technical audience.
- Strong written and verbal communications skills, as well as presentation skills.
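Purely as an illustration of the A/B testing item above (the data is synthetic; nothing here comes from the role), a minimal two-sample t-test sketch in Python with SciPy:

```python
# Minimal A/B test sketch: two-sample t-test on synthetic control/treatment data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)

# Synthetic per-user spend for two experiment arms (placeholder data).
control = rng.normal(loc=100.0, scale=15.0, size=500)
treatment = rng.normal(loc=103.0, scale=15.0, size=500)

# Welch's t-test (does not assume equal variances).
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)

print(f"lift: {treatment.mean() - control.mean():.2f}")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Difference is statistically significant at the 5% level.")
```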
Job Title: Data Warehouse/Redshift Admin
Location: Remote
Job Description
AWS Redshift Cluster Planning
AWS Redshift Cluster Maintenance
AWS Redshift Cluster Security
AWS Redshift Cluster monitoring (see the sketch below)
Experience managing day to day operations of provisioning, maintaining backups, DR and monitoring of AWS RedShift/RDS clusters
Hands-on experience with Query Tuning in high concurrency environment
Expertise setting up and managing AWS Redshift
AWS certifications preferred (e.g., AWS Certified SysOps Administrator)
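As an illustrative aside for the monitoring item above (the cluster endpoint and credentials are placeholders, not from the posting), a small Python/psycopg2 sketch that lists currently running queries from Redshift's stv_recents system table:

```python
# Minimal Redshift monitoring sketch: list currently running queries (placeholder connection).
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",  # placeholder endpoint
    port=5439,
    dbname="analytics",
    user="admin",
    password="***",
)

with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT pid, user_name, starttime, duration, TRIM(query) AS query_text
        FROM stv_recents
        WHERE status = 'Running'
        ORDER BY starttime;
    """)
    for pid, user_name, starttime, duration, query_text in cur.fetchall():
        print(pid, user_name, starttime, duration, query_text[:80])

conn.close()
```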
About us
SteelEye is the only regulatory compliance technology and data analytics firm that offers transaction reporting, record keeping, trade reconstruction, best execution and data insight in one comprehensive solution. The firm’s scalable secure data storage platform offers encryption at rest and in flight and best-in-class analytics to help financial firms meet regulatory obligations and gain competitive advantage.
The company has a highly experienced management team and a strong board, who have decades of technology and management experience and worked in senior positions at many leading international financial businesses. We are a young company that shares a commitment to learning, being smart, working hard and being honest in all we do and striving to do that better each day. We value all our colleagues equally and everyone should feel able to speak up, propose an idea, point out a mistake and feel safe, happy and be themselves at work.
Being part of a start-up can be as exciting as it is challenging. You will be part of the SteelEye team not just because of your talent but also because of your entrepreneurial flair, which we thrive on at SteelEye. This means we want you to be curious, contribute, ask questions and share ideas. We encourage you to get involved in helping shape our business.
What you will do
- Deliver plugins for our Python-based ETL pipelines.
- Deliver python services for provisioning and managing cloud infrastructure.
- Design, Develop, Unit Test, and Support code in production.
- Deal with challenges associated with large volumes of data.
- Manage expectations with internal stakeholders and context switch between multiple deliverables as priorities change.
- Thrive in an environment that uses AWS and Elasticsearch extensively.
- Keep abreast of technology and contribute to the evolution of the product.
- Champion best practices and provide mentorship.
What we're looking for
- Python 3.
- Python libraries used for data (such as pandas, numpy).
- AWS.
- Elasticsearch.
- Performance tuning.
- Object Oriented Design and Modelling.
- Delivering complex software, ideally in a FinTech setting.
- CI/CD tools.
- Knowledge of design patterns.
- Sharp analytical and problem-solving skills.
- Strong sense of ownership.
- Demonstrable desire to learn and grow.
- Excellent written and oral communication skills.
- Mature collaboration and mentoring abilities.
What will you get?
- This is an individual contributor role. So if you are someone who loves to code, solve complex problems, and build amazing products without worrying about anything else, this is the role for you.
- You will have the chance to learn from the best in the business who have worked across the world and are technology geeks.
- A company that always appreciates ownership and initiative. If you are someone who is full of ideas, this role is for you.
- Handling the survey scripting process using survey software platforms such as Toluna, QuestionPro, and Decipher.
- Mining large & complex data sets using SQL, Hadoop, NoSQL or Spark.
- Delivering complex consumer data analysis using software such as R, Python, and Excel.
- Working on basic statistical analysis such as t-tests and correlation.
- Performing more complex data analysis using machine learning techniques such as (see the sketch after this list):
- Classification
- Regression
- Clustering
- Text Analysis
- Neural Networks
- Creating interactive dashboards using software such as Tableau or any other tool you are able to use.
- Working on statistical and mathematical modelling and the application of ML and AI algorithms.
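An illustrative sketch of the classification item above (synthetic data, not tied to any client work), using scikit-learn:

```python
# Minimal classification sketch: logistic regression on synthetic data with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic "survey respondent" features and a binary target (placeholder data).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

preds = model.predict(X_test)
print(f"test accuracy: {accuracy_score(y_test, preds):.3f}")
```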
What you need to have:
- Bachelor's or Master's degree in a highly quantitative field (CS, machine learning, mathematics, statistics, economics) or equivalent experience.
- This is an opportunity for someone who is eager to prove their data analytics skills with one of the biggest FMCG market players.
Mid / Senior Big Data Engineer
Job Description:
Role: Big Data Engineer
Number of open positions: 5
Location: Pune
At Clairvoyant, we're building a thriving big data practice to help enterprises enable and accelerate the adoption of big data and cloud services. In the big data space, we lead and serve as innovators, troubleshooters, and enablers. The big data practice at Clairvoyant focuses on solving our customers' business problems by delivering products designed with best-in-class engineering practices and a commitment to keeping the total cost of ownership to a minimum.
Must Have:
- 4-10 years of experience in software development.
- At least 2 years of relevant work experience on large scale Data applications.
- Strong coding experience in Java is mandatory
- Good aptitude, strong problem-solving abilities and analytical skills, and the ability to take ownership as appropriate
- Should be able to handle coding, debugging, performance tuning, and deploying apps to production.
- Should have good working experience with:
  - Hadoop ecosystem (HDFS, Hive, YARN, file formats like Avro/Parquet)
  - Kafka
  - J2EE frameworks (Spring/Hibernate/REST)
  - Spark Streaming or any other streaming technology (see the streaming sketch after this list)
- Ability to work on the sprint stories to completion along with Unit test case coverage.
- Experience working in Agile Methodology
- Excellent communication and coordination skills
- Knowledge of (and preferably hands-on experience with) UNIX environments and different continuous integration tools.
- Must be able to integrate quickly into the team and work independently towards team goals
- Take the complete responsibility of the sprint stories' execution
- Be accountable for the delivery of the tasks in the defined timelines with good quality.
- Follow the processes for project execution and delivery.
- Follow agile methodology
- Work with the team lead closely and contribute to the smooth delivery of the project.
- Understand/define the architecture and discuss its pros and cons with the team.
- Take part in brainstorming sessions and suggest improvements to the architecture/design.
- Work with other team leads to get the architecture/design reviewed.
- Work with the clients and counterparts (in the US) on the project.
- Keep all the stakeholders updated about the project/task status/risks/issues if there are any.
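As an aside, not part of the requirements (the role asks for Java, but the same pattern applies; the broker, topic, and schema are assumptions), a minimal PySpark Structured Streaming sketch consuming a Kafka topic:

```python
# Minimal Spark Structured Streaming sketch: Kafka topic -> parsed JSON -> console sink.
# Requires the spark-sql-kafka connector package on the classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Hypothetical event schema.
schema = StructType([
    StructField("user_id", StringType()),
    StructField("event_type", StringType()),
])

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "events")                      # placeholder topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("event"))
    .select("event.*")
)

query = (
    events.writeStream
    .format("console")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```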
Experience: 4 to 9 years
Keywords: java, scala, spark, software development, hadoop, hive
Locations: Pune
3+ years of experience in deployment, monitoring, tuning, and administration of high concurrency MySQL production databases.
- Solid understanding of writing optimized SQL queries on MySQL databases
- Understanding of AWS, VPC, networking, security groups, IAM, and roles.
- Expertise in scripting in Python or Shell/Powershell
- Must have experience in large scale data migrations
- Excellent communication skills.
- We are looking for an experienced data engineer to join our team.
- The preprocessing involves ETL tasks using PySpark and AWS Glue, staging data in Parquet format on S3, and Athena (a minimal sketch follows below).
To succeed in this data engineering position, you should care about well-documented, testable code and data integrity. We have DevOps engineers who can help with AWS permissions.
We would like to build a consistent data lake with staged, ready-to-use data, and to build up various scripts that will serve as blueprints for additional data ingestion and transforms.
If you enjoy setting up something which many others will rely on, and have the relevant ETL expertise, we’d like to work with you.
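A minimal sketch of the preprocessing described above, assuming a plain PySpark job (an AWS Glue job would wrap the same logic in a GlueContext); the bucket, path, and column names are placeholders:

```python
# Minimal ETL sketch: read raw CSV from S3, clean it, and stage it as Parquet for Athena.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

spark = SparkSession.builder.appName("staging-etl-sketch").getOrCreate()

# Placeholder S3 locations.
raw_path = "s3://my-raw-bucket/orders/*.csv"
staged_path = "s3://my-data-lake/staged/orders/"

raw = spark.read.option("header", True).csv(raw_path)

staged = (
    raw
    .dropDuplicates(["order_id"])                        # hypothetical key column
    .withColumn("order_date", to_date(col("order_date")))
    .filter(col("order_id").isNotNull())
)

# Partitioned Parquet output; Athena can query it once a table or crawler is defined over staged_path.
staged.write.mode("overwrite").partitionBy("order_date").parquet(staged_path)
```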
Responsibilities
- Analyze and organize raw data
- Build data pipelines
- Prepare data for predictive modeling
- Explore ways to enhance data quality and reliability
- Potentially, collaborate with data scientists to support various experiments
Requirements
- Previous experience as a data engineer with the above technologies

