Spark jobs

50+ Spark Jobs in India

Apply to 50+ Spark Jobs on CutShort.io. Find your next job, effortlessly. Browse Spark Jobs and apply today!

Spark jobs in other cities

Jobs by Category

Fullstack Developer Jobs Backend Developer Jobs Frontend Developer Jobs Android Developer Jobs iOS Developer Jobs DevOps Jobs Data Science Jobs

Business Developer Jobs Digital Marketing Jobs Sales Jobs

UX Designer Jobs Graphic Designer Jobs

Jobs by Location

Startup Jobs in Bangalore Startup Jobs in Pune Startup Jobs in Delhi All Startup jobs

Collections

Funded Startup Jobs Product Startup Jobs

Data Architect (Dremio Lakehouse)

AI company

Agency job

via Peak Hire Solutions by Dhara Thakkar

Bengaluru (Bangalore)

5 - 10 yrs

₹20L - ₹45L / yr

Data architecture

Data engineering

SQL

Data modeling

GCS

+21 more

Review Criteria

Strong Dremio / Lakehouse Data Architect profile
5+ years of experience in Data Architecture / Data Engineering, with minimum 3+ years hands-on in Dremio
Strong expertise in SQL optimization, data modeling, query performance tuning, and designing analytical schemas for large-scale systems
Deep experience with cloud object storage (S3 / ADLS / GCS) and file formats such as Parquet, Delta, Iceberg along with distributed query planning concepts
Hands-on experience integrating data via APIs, JDBC, Delta/Parquet, object storage, and coordinating with data engineering pipelines (Airflow, DBT, Kafka, Spark, etc.)
Proven experience designing and implementing lakehouse architecture including ingestion, curation, semantic modeling, reflections/caching optimization, and enabling governed analytics
Strong understanding of data governance, lineage, RBAC-based access control, and enterprise security best practices
Excellent communication skills with ability to work closely with BI, data science, and engineering teams; strong documentation discipline
Candidates must come from enterprise data modernization, cloud-native, or analytics-driven companies

Preferred

Preferred (Nice-to-have) – Experience integrating Dremio with BI tools (Tableau, Power BI, Looker) or data catalogs (Collibra, Alation, Purview); familiarity with Snowflake, Databricks, or BigQuery environments

Job Specific Criteria

CV Attachment is mandatory
How many years of experience you have with Dremio?
Which is your preferred job location (Mumbai / Bengaluru / Hyderabad / Gurgaon)?
Are you okay with 3 Days WFO?
Virtual Interview requires video to be on, are you okay with it?

Role & Responsibilities

You will be responsible for architecting, implementing, and optimizing Dremio-based data lakehouse environments integrated with cloud storage, BI, and data engineering ecosystems. The role requires a strong balance of architecture design, data modeling, query optimization, and governance enablement in large-scale analytical environments.

Design and implement Dremio lakehouse architecture on cloud (AWS/Azure/Snowflake/Databricks ecosystem).
Define data ingestion, curation, and semantic modeling strategies to support analytics and AI workloads.
Optimize Dremio reflections, caching, and query performance for diverse data consumption patterns.
Collaborate with data engineering teams to integrate data sources via APIs, JDBC, Delta/Parquet, and object storage layers (S3/ADLS).
Establish best practices for data security, lineage, and access control aligned with enterprise governance policies.
Support self-service analytics by enabling governed data products and semantic layers.
Develop reusable design patterns, documentation, and standards for Dremio deployment, monitoring, and scaling.
Work closely with BI and data science teams to ensure fast, reliable, and well-modeled access to enterprise data.

Ideal Candidate

Bachelor’s or master’s in computer science, Information Systems, or related field.
5+ years in data architecture and engineering, with 3+ years in Dremio or modern lakehouse platforms.
Strong expertise in SQL optimization, data modeling, and performance tuning within Dremio or similar query engines (Presto, Trino, Athena).
Hands-on experience with cloud storage (S3, ADLS, GCS), Parquet/Delta/Iceberg formats, and distributed query planning.
Knowledge of data integration tools and pipelines (Airflow, DBT, Kafka, Spark, etc.).
Familiarity with enterprise data governance, metadata management, and role-based access control (RBAC).
Excellent problem-solving, documentation, and stakeholder communication skills.

Review Criteria

Strong Dremio / Lakehouse Data Architect profile
5+ years of experience in Data Architecture / Data Engineering, with minimum 3+ years hands-on in Dremio
Strong expertise in SQL optimization, data modeling, query performance tuning, and designing analytical schemas for large-scale systems
Deep experience with cloud object storage (S3 / ADLS / GCS) and file formats such as Parquet, Delta, Iceberg along with distributed query planning concepts
Hands-on experience integrating data via APIs, JDBC, Delta/Parquet, object storage, and coordinating with data engineering pipelines (Airflow, DBT, Kafka, Spark, etc.)
Proven experience designing and implementing lakehouse architecture including ingestion, curation, semantic modeling, reflections/caching optimization, and enabling governed analytics
Strong understanding of data governance, lineage, RBAC-based access control, and enterprise security best practices
Excellent communication skills with ability to work closely with BI, data science, and engineering teams; strong documentation discipline
Candidates must come from enterprise data modernization, cloud-native, or analytics-driven companies

Preferred

Preferred (Nice-to-have) – Experience integrating Dremio with BI tools (Tableau, Power BI, Looker) or data catalogs (Collibra, Alation, Purview); familiarity with Snowflake, Databricks, or BigQuery environments

Job Specific Criteria

CV Attachment is mandatory
How many years of experience you have with Dremio?
Which is your preferred job location (Mumbai / Bengaluru / Hyderabad / Gurgaon)?
Are you okay with 3 Days WFO?
Virtual Interview requires video to be on, are you okay with it?

Role & Responsibilities

Design and implement Dremio lakehouse architecture on cloud (AWS/Azure/Snowflake/Databricks ecosystem).
Define data ingestion, curation, and semantic modeling strategies to support analytics and AI workloads.
Optimize Dremio reflections, caching, and query performance for diverse data consumption patterns.
Collaborate with data engineering teams to integrate data sources via APIs, JDBC, Delta/Parquet, and object storage layers (S3/ADLS).
Establish best practices for data security, lineage, and access control aligned with enterprise governance policies.
Support self-service analytics by enabling governed data products and semantic layers.
Develop reusable design patterns, documentation, and standards for Dremio deployment, monitoring, and scaling.
Work closely with BI and data science teams to ensure fast, reliable, and well-modeled access to enterprise data.

Ideal Candidate

Bachelor’s or master’s in computer science, Information Systems, or related field.
5+ years in data architecture and engineering, with 3+ years in Dremio or modern lakehouse platforms.
Strong expertise in SQL optimization, data modeling, and performance tuning within Dremio or similar query engines (Presto, Trino, Athena).
Hands-on experience with cloud storage (S3, ADLS, GCS), Parquet/Delta/Iceberg formats, and distributed query planning.
Knowledge of data integration tools and pipelines (Airflow, DBT, Kafka, Spark, etc.).
Familiarity with enterprise data governance, metadata management, and role-based access control (RBAC).
Excellent problem-solving, documentation, and stakeholder communication skills.

Senior Data Engineer

at Tecblic Private LImited

Posted by Priya Khatri

Ahmedabad

5 - 6 yrs

₹5L - ₹15L / yr

Windows Azure

Python

SQL

Data Warehouse (DWH)

Data modeling

+5 more

Job Description: Data Engineer

Location: Ahmedabad

Experience: 5 to 6 years

Employment Type: Full-Time

We are looking for a highly motivated and experienced Data Engineer to join our team. As a Data Engineer, you will play a critical role in designing, building, and optimizing data pipelines that ensure the availability, reliability, and performance of our data infrastructure. You will collaborate closely with data scientists, analysts, and cross-functional teams to provide timely and efficient data solutions.

Responsibilities

● Design and optimize data pipelines for various data sources

● Design and implement efficient data storage and retrieval mechanisms

● Develop data modelling solutions and data validation mechanisms

● Troubleshoot data-related issues and recommend process improvements

● Collaborate with data scientists and stakeholders to provide data-driven insights and solutions

● Coach and mentor junior data engineers in the team

Skills Required:

● Minimum 4 years of experience in data engineering or related field

● Proficient in designing and optimizing data pipelines and data modeling

● Strong programming expertise in Python

● Hands-on experience with big data technologies such as Hadoop, Spark, and Hive

● Extensive experience with cloud data services such as AWS, Azure, and GCP

● Advanced knowledge of database technologies like SQL, NoSQL, and data warehousing

● Knowledge of distributed computing and storage systems

● Familiarity with DevOps practices and power automate and Microsoft Fabric will be an added advantage

● Strong analytical and problem-solving skills with outstanding communication and collaboration abilities

Qualifications

Bachelor's degree in Computer Science, Data Science, or a Computer related field

Job Description: Data Engineer

Location: Ahmedabad

Experience: 5 to 6 years

Employment Type: Full-Time

Responsibilities

● Design and optimize data pipelines for various data sources

● Design and implement efficient data storage and retrieval mechanisms

● Develop data modelling solutions and data validation mechanisms

● Troubleshoot data-related issues and recommend process improvements

● Collaborate with data scientists and stakeholders to provide data-driven insights and solutions

● Coach and mentor junior data engineers in the team

Skills Required:

● Minimum 4 years of experience in data engineering or related field

● Proficient in designing and optimizing data pipelines and data modeling

● Strong programming expertise in Python

● Hands-on experience with big data technologies such as Hadoop, Spark, and Hive

● Extensive experience with cloud data services such as AWS, Azure, and GCP

● Advanced knowledge of database technologies like SQL, NoSQL, and data warehousing

● Knowledge of distributed computing and storage systems

● Familiarity with DevOps practices and power automate and Microsoft Fabric will be an added advantage

● Strong analytical and problem-solving skills with outstanding communication and collaboration abilities

Qualifications

Bachelor's degree in Computer Science, Data Science, or a Computer related field

Sr. Data Scientist (AI/ML, Deep learning)

AI Industry

Agency job

via Peak Hire Solutions by Dhara Thakkar

Mumbai, Bengaluru (Bangalore), Hyderabad, Gurugram

5 - 12 yrs

₹20L - ₹46L / yr

Data Science

Artificial Intelligence (AI)

Machine Learning (ML)

Generative AI

Deep Learning

+14 more

Review Criteria

Strong Senior Data Scientist (AI/ML/GenAI) Profile
5+ years of experience in designing, developing, and deploying Machine Learning / Deep Learning (ML/DL) systems in production
Must have strong hands-on experience in Python and deep learning frameworks such as PyTorch, TensorFlow, or JAX.
1+ years of experience in fine-tuning Large Language Models (LLMs) using techniques like LoRA/QLoRA, and building RAG (Retrieval-Augmented Generation) pipelines.
Must have experience with MLOps and production-grade systems including Docker, Kubernetes, Spark, model registries, and CI/CD workflows

Preferred

Prior experience in open-source GenAI contributions, applied LLM/GenAI research, or large-scale production AI systems
Preferred (Education) – B.S./M.S./Ph.D. in Computer Science, Data Science, Machine Learning, or a related field.

Job Specific Criteria

CV Attachment is mandatory
Which is your preferred job location (Mumbai / Bengaluru / Hyderabad / Gurgaon)?
Are you okay with 3 Days WFO?
Virtual Interview requires video to be on, are you okay with it?

Role & Responsibilities

Company is hiring a Senior Data Scientist with strong expertise in AI, machine learning engineering (MLE), and generative AI. You will play a leading role in designing, deploying, and scaling production-grade ML systems — including large language model (LLM)-based pipelines, AI copilots, and agentic workflows. This role is ideal for someone who thrives on balancing cutting-edge research with production rigor and loves mentoring while building impact-first AI applications.

Responsibilities:

Own the full ML lifecycle: model design, training, evaluation, deployment
Design production-ready ML pipelines with CI/CD, testing, monitoring, and drift detection
Fine-tune LLMs and implement retrieval-augmented generation (RAG) pipelines
Build agentic workflows for reasoning, planning, and decision-making
Develop both real-time and batch inference systems using Docker, Kubernetes, and Spark
Leverage state-of-the-art architectures: transformers, diffusion models, RLHF, and multimodal pipelines
Collaborate with product and engineering teams to integrate AI models into business applications
Mentor junior team members and promote MLOps, scalable architecture, and responsible AI best practices

Ideal Candidate

5+ years of experience in designing, deploying, and scaling ML/DL systems in production
Proficient in Python and deep learning frameworks such as PyTorch, TensorFlow, or JAX
Experience with LLM fine-tuning, LoRA/QLoRA, vector search (Weaviate/PGVector), and RAG pipelines
Familiarity with agent-based development (e.g., ReAct agents, function-calling, orchestration)
Solid understanding of MLOps: Docker, Kubernetes, Spark, model registries, and deployment workflows
Strong software engineering background with experience in testing, version control, and APIs
Proven ability to balance innovation with scalable deployment
B.S./M.S./Ph.D. in Computer Science, Data Science, or a related field
Bonus: Open-source contributions, GenAI research, or applied systems at scale

Review Criteria

Strong Senior Data Scientist (AI/ML/GenAI) Profile
5+ years of experience in designing, developing, and deploying Machine Learning / Deep Learning (ML/DL) systems in production
Must have strong hands-on experience in Python and deep learning frameworks such as PyTorch, TensorFlow, or JAX.
1+ years of experience in fine-tuning Large Language Models (LLMs) using techniques like LoRA/QLoRA, and building RAG (Retrieval-Augmented Generation) pipelines.
Must have experience with MLOps and production-grade systems including Docker, Kubernetes, Spark, model registries, and CI/CD workflows

Preferred

Prior experience in open-source GenAI contributions, applied LLM/GenAI research, or large-scale production AI systems
Preferred (Education) – B.S./M.S./Ph.D. in Computer Science, Data Science, Machine Learning, or a related field.

Job Specific Criteria

CV Attachment is mandatory
Which is your preferred job location (Mumbai / Bengaluru / Hyderabad / Gurgaon)?
Are you okay with 3 Days WFO?
Virtual Interview requires video to be on, are you okay with it?

Role & Responsibilities

Responsibilities:

Own the full ML lifecycle: model design, training, evaluation, deployment
Design production-ready ML pipelines with CI/CD, testing, monitoring, and drift detection
Fine-tune LLMs and implement retrieval-augmented generation (RAG) pipelines
Build agentic workflows for reasoning, planning, and decision-making
Develop both real-time and batch inference systems using Docker, Kubernetes, and Spark
Leverage state-of-the-art architectures: transformers, diffusion models, RLHF, and multimodal pipelines
Collaborate with product and engineering teams to integrate AI models into business applications
Mentor junior team members and promote MLOps, scalable architecture, and responsible AI best practices

Ideal Candidate

5+ years of experience in designing, deploying, and scaling ML/DL systems in production
Proficient in Python and deep learning frameworks such as PyTorch, TensorFlow, or JAX
Experience with LLM fine-tuning, LoRA/QLoRA, vector search (Weaviate/PGVector), and RAG pipelines
Familiarity with agent-based development (e.g., ReAct agents, function-calling, orchestration)
Solid understanding of MLOps: Docker, Kubernetes, Spark, model registries, and deployment workflows
Strong software engineering background with experience in testing, version control, and APIs
Proven ability to balance innovation with scalable deployment
B.S./M.S./Ph.D. in Computer Science, Data Science, or a related field
Bonus: Open-source contributions, GenAI research, or applied systems at scale

Data Engineer

at Intineri infosol Pvt Ltd

2 candid answers

Posted by Shivani Pandey

Remote only

6 - 8 yrs

₹5L - ₹10L / yr

PySpark

Spark

Snowflake

Data Transformation Tool (DBT)

Airflow

+2 more

About the Role:

We are seeking an experienced Data Engineer to lead and execute the migration of existing Databricks-based pipelines to Snowflake. The role requires strong expertise in PySpark/Spark, Snowflake, DBT, and Airflow with additional exposure to DevOps and CI/CD practices. The candidate will be responsible for re-architecting data

pipelines, ensuring data consistency, scalability, and performance in Snowflake, and enabling robust automation and monitoring across environments.

Key Responsibilities

Databricks to Snowflake Migration

· Analyze and understand existing pipelines and frameworks in Databricks (PySpark/Spark).

· Re-architect pipelines for execution in Snowflake using efficient SQL-based processing.

· Translate Databricks notebooks/jobs into Snowflake/DBT equivalents.

· Ensure a smooth transition with data consistency, performance, and scalability.

Snowflake

· Hands-on experience with storage integrations, staging (internal/external), Snowpipe, tables/views, COPY INTO, CREATE OR ALTER, and file formats.

· Implement RBAC (role-based access control), data governance, and performance tuning.

· Design and optimize SQL queries for large-scale data processing.

DBT (with Snowflake)

· Implement and manage models, macros, materializations, and SQL execution within DBT.

· Use DBT for modular development, version control, and multi-environment deployments.

Airflow (Orchestration)

· Design and manage DAGs to automate workflows and ensure reliability.

· Handle task dependencies, error recovery, monitoring, and integrations (Cosmos, Astronomer, Docker).

DevOps & CI/CD

· Develop and manage CI/CD pipelines for Snowflake and DBT using GitHub Actions, Azure DevOps, or equivalent.

· Manage version-controlled environments and ensure smooth promotion of changes across dev, test, and prod.

Monitoring & Observability

· Implement monitoring, alerting, and logging for data pipelines.

· Build self-healing or alert-driven mechanisms for critical/severe issue detection.

· Ensure system reliability and proactive issue resolution.

Required Skills & Qualifications

· 5+ years of experience in data engineering with focus on cloud data platforms.

· Strong expertise in:

· Databricks (PySpark/Spark) – analysis, transformations, dependencies.

· Snowflake – architecture, SQL, performance tuning, security (RBAC).

· DBT – modular model development, macros, deployments.

· Airflow – DAG design, orchestration, and error handling.

· Experience in CI/CD pipeline development (GitHub Actions, Azure DevOps).

· Solid understanding of data modeling, ETL/ELT processes, and best practices.

· Excellent problem-solving, communication, and stakeholder collaboration skills.

Good to Have

· Exposure to Docker/Kubernetes for orchestration.

· Knowledge of Azure Data Services (ADF, ADLS) or similar cloud tools.

· Experience with data governance, lineage, and metadata management.

Education

· Bachelor’s / Master’s degree in Computer Science, Engineering, or related field.

About the Role:

pipelines, ensuring data consistency, scalability, and performance in Snowflake, and enabling robust automation and monitoring across environments.

Key Responsibilities

Databricks to Snowflake Migration

· Analyze and understand existing pipelines and frameworks in Databricks (PySpark/Spark).

· Re-architect pipelines for execution in Snowflake using efficient SQL-based processing.

· Translate Databricks notebooks/jobs into Snowflake/DBT equivalents.

· Ensure a smooth transition with data consistency, performance, and scalability.

Snowflake

· Hands-on experience with storage integrations, staging (internal/external), Snowpipe, tables/views, COPY INTO, CREATE OR ALTER, and file formats.

· Implement RBAC (role-based access control), data governance, and performance tuning.

· Design and optimize SQL queries for large-scale data processing.

DBT (with Snowflake)

· Implement and manage models, macros, materializations, and SQL execution within DBT.

· Use DBT for modular development, version control, and multi-environment deployments.

Airflow (Orchestration)

· Design and manage DAGs to automate workflows and ensure reliability.

· Handle task dependencies, error recovery, monitoring, and integrations (Cosmos, Astronomer, Docker).

DevOps & CI/CD

· Develop and manage CI/CD pipelines for Snowflake and DBT using GitHub Actions, Azure DevOps, or equivalent.

· Manage version-controlled environments and ensure smooth promotion of changes across dev, test, and prod.

Monitoring & Observability

· Implement monitoring, alerting, and logging for data pipelines.

· Build self-healing or alert-driven mechanisms for critical/severe issue detection.

· Ensure system reliability and proactive issue resolution.

Required Skills & Qualifications

· 5+ years of experience in data engineering with focus on cloud data platforms.

· Strong expertise in:

· Databricks (PySpark/Spark) – analysis, transformations, dependencies.

· Snowflake – architecture, SQL, performance tuning, security (RBAC).

· DBT – modular model development, macros, deployments.

· Airflow – DAG design, orchestration, and error handling.

· Experience in CI/CD pipeline development (GitHub Actions, Azure DevOps).

· Solid understanding of data modeling, ETL/ELT processes, and best practices.

· Excellent problem-solving, communication, and stakeholder collaboration skills.

Good to Have

· Exposure to Docker/Kubernetes for orchestration.

· Knowledge of Azure Data Services (ADF, ADLS) or similar cloud tools.

· Experience with data governance, lineage, and metadata management.

Education

· Bachelor’s / Master’s degree in Computer Science, Engineering, or related field.

Sr. Big Data Engineer

at Inncircles

Posted by Gangadhar M

Hyderabad

3 - 5 yrs

Best in industry

PySpark

Spark

Python

ETL

Amazon EMR

+7 more

We are looking for a highly skilled Sr. Big Data Engineer with 3-5 years of experience in

building large-scale data pipelines, real-time streaming solutions, and batch/stream

processing systems. The ideal candidate should be proficient in Spark, Kafka, Python, and

AWS Big Data services, with hands-on experience in implementing CDC (Change Data

Capture) pipelines and integrating multiple data sources and sinks.

Responsibilities

Design, develop, and optimize batch and streaming data pipelines using Apache Spark and Python.
Build and maintain real-time data ingestion pipelines leveraging Kafka and AWS Kinesis.
Implement CDC (Change Data Capture) pipelines using Kafka Connect, Debezium or similar frameworks.
Integrate data from multiple sources and sinks (databases, APIs, message queues, file systems, cloud storage).
Work with AWS Big Data ecosystem: Glue, EMR, Kinesis, Athena, S3, Lambda, Step Functions.
Ensure pipeline scalability, reliability, and performance tuning of Spark jobs and EMR clusters.
Develop data transformation and ETL workflows in AWS Glue and manage schema evolution.
Collaborate with data scientists, analysts, and product teams to deliver reliable and high-quality data solutions.
Implement monitoring, logging, and alerting for critical data pipelines.
Follow best practices for data security, compliance, and cost optimization in cloud environments.

Required Skills & Experience

Programming: Strong proficiency in Python (PySpark, data frameworks, automation).
Big Data Processing: Hands-on experience with Apache Spark (batch & streaming).
Messaging & Streaming: Proficient in Kafka (brokers, topics, partitions, consumer groups) and AWS Kinesis.
CDC Pipelines: Experience with Debezium / Kafka Connect / custom CDC frameworks.
AWS Services: AWS Glue, EMR, S3, Athena, Lambda, IAM, CloudWatch.
ETL/ELT Workflows: Strong knowledge of data ingestion, transformation, partitioning, schema management.
Databases: Experience with relational databases (MySQL, Postgres, Oracle) and NoSQL (MongoDB, DynamoDB, Cassandra).
Data Formats: JSON, Parquet, Avro, ORC, Delta/Iceberg/Hudi.
Version Control & CI/CD: Git, GitHub/GitLab, Jenkins, or CodePipeline.
Monitoring/Logging: CloudWatch, Prometheus, ELK/Opensearch.
Containers & Orchestration (nice-to-have): Docker, Kubernetes, Airflow/Step
Functions for workflow orchestration.

Preferred Qualifications

Bachelor’s or Master’s degree in Computer Science, Data Engineering, or related field.
Experience in large-scale data lake / lake house architectures.
Knowledge of data warehousing concepts and query optimisation.
Familiarity with data governance, lineage, and cataloging tools (Glue Data Catalog, Apache Atlas).
Exposure to ML/AI data pipelines is a plus.

Tools & Technologies (must-have exposure)

Big Data & Processing: Apache Spark, PySpark, AWS EMR, AWS Glue
Streaming & Messaging: Apache Kafka, Kafka Connect, Debezium, AWS Kinesis
Cloud & Storage: AWS (S3, Athena, Lambda, IAM, CloudWatch)
Programming & Scripting: Python, SQL, Bash
Orchestration: Airflow / Step Functions
Version Control & CI/CD: Git, Jenkins/CodePipeline
Data Formats: Parquet, Avro, ORC, JSON, Delta, Iceberg, Hudi

We are looking for a highly skilled Sr. Big Data Engineer with 3-5 years of experience in

building large-scale data pipelines, real-time streaming solutions, and batch/stream

processing systems. The ideal candidate should be proficient in Spark, Kafka, Python, and

AWS Big Data services, with hands-on experience in implementing CDC (Change Data

Capture) pipelines and integrating multiple data sources and sinks.

Responsibilities

Design, develop, and optimize batch and streaming data pipelines using Apache Spark and Python.
Build and maintain real-time data ingestion pipelines leveraging Kafka and AWS Kinesis.
Implement CDC (Change Data Capture) pipelines using Kafka Connect, Debezium or similar frameworks.
Integrate data from multiple sources and sinks (databases, APIs, message queues, file systems, cloud storage).
Work with AWS Big Data ecosystem: Glue, EMR, Kinesis, Athena, S3, Lambda, Step Functions.
Ensure pipeline scalability, reliability, and performance tuning of Spark jobs and EMR clusters.
Develop data transformation and ETL workflows in AWS Glue and manage schema evolution.
Collaborate with data scientists, analysts, and product teams to deliver reliable and high-quality data solutions.
Implement monitoring, logging, and alerting for critical data pipelines.
Follow best practices for data security, compliance, and cost optimization in cloud environments.

Required Skills & Experience

Programming: Strong proficiency in Python (PySpark, data frameworks, automation).
Big Data Processing: Hands-on experience with Apache Spark (batch & streaming).
Messaging & Streaming: Proficient in Kafka (brokers, topics, partitions, consumer groups) and AWS Kinesis.
CDC Pipelines: Experience with Debezium / Kafka Connect / custom CDC frameworks.
AWS Services: AWS Glue, EMR, S3, Athena, Lambda, IAM, CloudWatch.
ETL/ELT Workflows: Strong knowledge of data ingestion, transformation, partitioning, schema management.
Databases: Experience with relational databases (MySQL, Postgres, Oracle) and NoSQL (MongoDB, DynamoDB, Cassandra).
Data Formats: JSON, Parquet, Avro, ORC, Delta/Iceberg/Hudi.
Version Control & CI/CD: Git, GitHub/GitLab, Jenkins, or CodePipeline.
Monitoring/Logging: CloudWatch, Prometheus, ELK/Opensearch.
Containers & Orchestration (nice-to-have): Docker, Kubernetes, Airflow/Step
Functions for workflow orchestration.

Preferred Qualifications

Bachelor’s or Master’s degree in Computer Science, Data Engineering, or related field.
Experience in large-scale data lake / lake house architectures.
Knowledge of data warehousing concepts and query optimisation.
Familiarity with data governance, lineage, and cataloging tools (Glue Data Catalog, Apache Atlas).
Exposure to ML/AI data pipelines is a plus.

Tools & Technologies (must-have exposure)

Big Data & Processing: Apache Spark, PySpark, AWS EMR, AWS Glue
Streaming & Messaging: Apache Kafka, Kafka Connect, Debezium, AWS Kinesis
Cloud & Storage: AWS (S3, Athena, Lambda, IAM, CloudWatch)
Programming & Scripting: Python, SQL, Bash
Orchestration: Airflow / Step Functions
Version Control & CI/CD: Git, Jenkins/CodePipeline
Data Formats: Parquet, Avro, ORC, JSON, Delta, Iceberg, Hudi

VP - Data Architect (B2B SaaS)

Technology Industry

Agency job

via Peak Hire Solutions by Dhara Thakkar

Delhi

10 - 15 yrs

₹105L - ₹140L / yr

Data engineering

Apache Spark

Apache

Apache Kafka

Java

+25 more

MANDATORY:

Super Quality Data Architect, Data Engineering Manager / Director Profile
Must have 12+ YOE in Data Engineering roles, with at least 2+ years in a Leadership role
Must have 7+ YOE in hands-on Tech development with Java (Highly preferred) or Python, Node.JS, GoLang
Must have strong experience in large data technologies, tools like HDFS, YARN, Map-Reduce, Hive, Kafka, Spark, Airflow, Presto etc.
Strong expertise in HLD and LLD, to design scalable, maintainable data architectures.
Must have managed a team of at least 5+ Data Engineers (Read Leadership role in CV)
Product Companies (Prefers high-scale, data-heavy companies)

PREFERRED:

Must be from Tier - 1 Colleges, preferred IIT
Candidates must have spent a minimum 3 yrs in each company.
Must have recent 4+ YOE with high-growth Product startups, and should have implemented Data Engineering systems from an early stage in the Company

ROLES & RESPONSIBILITIES:

Lead and mentor a team of data engineers, ensuring high performance and career growth.
Architect and optimize scalable data infrastructure, ensuring high availability and reliability.
Drive the development and implementation of data governance frameworks and best practices.
Work closely with cross-functional teams to define and execute a data roadmap.
Optimize data processing workflows for performance and cost efficiency.
Ensure data security, compliance, and quality across all data platforms.
Foster a culture of innovation and technical excellence within the data team.

IDEAL CANDIDATE:

10+ years of experience in software/data engineering, with at least 3+ years in a leadership role.
Expertise in backend development with programming languages such as Java, PHP, Python, Node.JS, GoLang, JavaScript, HTML, and CSS.
Proficiency in SQL, Python, and Scala for data processing and analytics.
Strong understanding of cloud platforms (AWS, GCP, or Azure) and their data services.
Strong foundation and expertise in HLD and LLD, as well as design patterns, preferably using Spring Boot or Google Guice
Experience in big data technologies such as Spark, Hadoop, Kafka, and distributed computing frameworks.
Hands-on experience with data warehousing solutions such as Snowflake, Redshift, or BigQuery
Deep knowledge of data governance, security, and compliance (GDPR, SOC2, etc.).
Experience in NoSQL databases like Redis, Cassandra, MongoDB, and TiDB.
Familiarity with automation and DevOps tools like Jenkins, Ansible, Docker, Kubernetes, Chef, Grafana, and ELK.
Proven ability to drive technical strategy and align it with business objectives.
Strong leadership, communication, and stakeholder management skills.

PREFERRED QUALIFICATIONS:

Experience in machine learning infrastructure or MLOps is a plus.
Exposure to real-time data processing and analytics.
Interest in data structures, algorithm analysis and design, multicore programming, and scalable architecture.
Prior experience in a SaaS or high-growth tech company.

MANDATORY:

Super Quality Data Architect, Data Engineering Manager / Director Profile
Must have 12+ YOE in Data Engineering roles, with at least 2+ years in a Leadership role
Must have 7+ YOE in hands-on Tech development with Java (Highly preferred) or Python, Node.JS, GoLang
Must have strong experience in large data technologies, tools like HDFS, YARN, Map-Reduce, Hive, Kafka, Spark, Airflow, Presto etc.
Strong expertise in HLD and LLD, to design scalable, maintainable data architectures.
Must have managed a team of at least 5+ Data Engineers (Read Leadership role in CV)
Product Companies (Prefers high-scale, data-heavy companies)

PREFERRED:

Must be from Tier - 1 Colleges, preferred IIT
Candidates must have spent a minimum 3 yrs in each company.
Must have recent 4+ YOE with high-growth Product startups, and should have implemented Data Engineering systems from an early stage in the Company

ROLES & RESPONSIBILITIES:

Lead and mentor a team of data engineers, ensuring high performance and career growth.
Architect and optimize scalable data infrastructure, ensuring high availability and reliability.
Drive the development and implementation of data governance frameworks and best practices.
Work closely with cross-functional teams to define and execute a data roadmap.
Optimize data processing workflows for performance and cost efficiency.
Ensure data security, compliance, and quality across all data platforms.
Foster a culture of innovation and technical excellence within the data team.

IDEAL CANDIDATE:

10+ years of experience in software/data engineering, with at least 3+ years in a leadership role.
Expertise in backend development with programming languages such as Java, PHP, Python, Node.JS, GoLang, JavaScript, HTML, and CSS.
Proficiency in SQL, Python, and Scala for data processing and analytics.
Strong understanding of cloud platforms (AWS, GCP, or Azure) and their data services.
Strong foundation and expertise in HLD and LLD, as well as design patterns, preferably using Spring Boot or Google Guice
Experience in big data technologies such as Spark, Hadoop, Kafka, and distributed computing frameworks.
Hands-on experience with data warehousing solutions such as Snowflake, Redshift, or BigQuery
Deep knowledge of data governance, security, and compliance (GDPR, SOC2, etc.).
Experience in NoSQL databases like Redis, Cassandra, MongoDB, and TiDB.
Familiarity with automation and DevOps tools like Jenkins, Ansible, Docker, Kubernetes, Chef, Grafana, and ELK.
Proven ability to drive technical strategy and align it with business objectives.
Strong leadership, communication, and stakeholder management skills.

PREFERRED QUALIFICATIONS:

Experience in machine learning infrastructure or MLOps is a plus.
Exposure to real-time data processing and analytics.
Interest in data structures, algorithm analysis and design, multicore programming, and scalable architecture.
Prior experience in a SaaS or high-growth tech company.

Senior Data Engineer

at Wissen Technology

4 recruiters

Posted by Robin Silverster

Bengaluru (Bangalore)

5 - 11 yrs

₹10L - ₹35L / yr

Python

Spark

Apache Kafka

Snow flake schema

databricks

+1 more

Required Skills:

· 8+ years of being a practitioner in data engineering or a related field.

· Proficiency in programming skills in Python

· Experience with data processing frameworks like Apache Spark or Hadoop.

· Experience working on Databricks.

· Familiarity with cloud platforms (AWS, Azure) and their data services.

· Experience with data warehousing concepts and technologies.

· Experience with message queues and streaming platforms (e.g., Kafka).

· Excellent communication and collaboration skills.

· Ability to work independently and as part of a geographically distributed team.

Required Skills:

· 8+ years of being a practitioner in data engineering or a related field.

· Proficiency in programming skills in Python

· Experience with data processing frameworks like Apache Spark or Hadoop.

· Experience working on Databricks.

· Familiarity with cloud platforms (AWS, Azure) and their data services.

· Experience with data warehousing concepts and technologies.

· Experience with message queues and streaming platforms (e.g., Kafka).

· Excellent communication and collaboration skills.

· Ability to work independently and as part of a geographically distributed team.

PySpark/Scala Developer

at Tata Consultancy Services

2 recruiters

Agency job

via Risk Resources LLP hyd by Jhansi Padiy

Chennai, Hyderabad, Kolkata, Delhi, Pune, Bengaluru (Bangalore)

4 - 10 yrs

₹6L - ₹30L / yr

Scala

PySpark

Spark

Amazon Web Services (AWS)

Job Title: PySpark/Scala Developer

Functional Skills: Experience in Credit Risk/Regulatory risk domain

Technical Skills: Spark ,PySpark, Python, Hive, Scala, MapReduce, Unix shell scripting

Good to Have Skills: Exposure to Machine Learning Techniques

Job Description:

5+ Years of experience with Developing/Fine tuning and implementing programs/applications

Using Python/PySpark/Scala on Big Data/Hadoop Platform.

Roles and Responsibilities:

a) Work with a Leading Bank’s Risk Management team on specific projects/requirements pertaining to risk Models in

consumer and wholesale banking

b) Enhance Machine Learning Models using PySpark or Scala

c) Work with Data Scientists to Build ML Models based on Business Requirements and Follow ML Cycle to Deploy them all

the way to Production Environment

d) Participate Feature Engineering, Training Models, Scoring and retraining

e) Architect Data Pipeline and Automate Data Ingestion and Model Jobs

Skills and competencies:

Required:

· Strong analytical skills in conducting sophisticated statistical analysis using bureau/vendor data, customer performance

Data and macro-economic data to solve business problems.

· Working experience in languages PySpark & Scala to develop code to validate and implement models and codes in

Credit Risk/Banking

· Experience with distributed systems such as Hadoop/MapReduce, Spark, streaming data processing, cloud architecture.

Familiarity with machine learning frameworks and libraries (like scikit-learn, SparkML, tensorflow, pytorch etc.
Experience in systems integration, web services, batch processing
Experience in migrating codes to PySpark/Scala is big Plus
The ability to act as liaison conveying information needs of the business to IT and data constraints to the business

applies equal conveyance regarding business strategy and IT strategy, business processes and work flow

· Flexibility in approach and thought process

· Attitude to learn and comprehend the periodical changes in the regulatory requirement as per FED

Job Title: PySpark/Scala Developer

Functional Skills: Experience in Credit Risk/Regulatory risk domain

Technical Skills: Spark ,PySpark, Python, Hive, Scala, MapReduce, Unix shell scripting

Good to Have Skills: Exposure to Machine Learning Techniques

Job Description:

5+ Years of experience with Developing/Fine tuning and implementing programs/applications

Using Python/PySpark/Scala on Big Data/Hadoop Platform.

Roles and Responsibilities:

a) Work with a Leading Bank’s Risk Management team on specific projects/requirements pertaining to risk Models in

consumer and wholesale banking

b) Enhance Machine Learning Models using PySpark or Scala

c) Work with Data Scientists to Build ML Models based on Business Requirements and Follow ML Cycle to Deploy them all

the way to Production Environment

d) Participate Feature Engineering, Training Models, Scoring and retraining

e) Architect Data Pipeline and Automate Data Ingestion and Model Jobs

Skills and competencies:

Required:

· Strong analytical skills in conducting sophisticated statistical analysis using bureau/vendor data, customer performance

Data and macro-economic data to solve business problems.

· Working experience in languages PySpark & Scala to develop code to validate and implement models and codes in

Credit Risk/Banking

· Experience with distributed systems such as Hadoop/MapReduce, Spark, streaming data processing, cloud architecture.

Familiarity with machine learning frameworks and libraries (like scikit-learn, SparkML, tensorflow, pytorch etc.
Experience in systems integration, web services, batch processing
Experience in migrating codes to PySpark/Scala is big Plus
The ability to act as liaison conveying information needs of the business to IT and data constraints to the business

applies equal conveyance regarding business strategy and IT strategy, business processes and work flow

· Flexibility in approach and thought process

· Attitude to learn and comprehend the periodical changes in the regulatory requirement as per FED

PySpark/Scala Developer

at Tata Consultancy Services

2 recruiters

Agency job

via Risk Resources LLP hyd by susmitha o

Bengaluru (Bangalore), Hyderabad, Pune, Delhi, Kolkata, Chennai

5 - 8 yrs

₹7L - ₹30L / yr

Scala

Python

PySpark

Apache Hive

Spark

+3 more

Skills and competencies:

Required:

· Strong analytical skills in conducting sophisticated statistical analysis using bureau/vendor data, customer performance

Data and macro-economic data to solve business problems.

· Working experience in languages PySpark & Scala to develop code to validate and implement models and codes in

Credit Risk/Banking

· Experience with distributed systems such as Hadoop/MapReduce, Spark, streaming data processing, cloud architecture.

Familiarity with machine learning frameworks and libraries (like scikit-learn, SparkML, tensorflow, pytorch etc.
Experience in systems integration, web services, batch processing
Experience in migrating codes to PySpark/Scala is big Plus
The ability to act as liaison conveying information needs of the business to IT and data constraints to the business

applies equal conveyance regarding business strategy and IT strategy, business processes and work flow

· Flexibility in approach and thought process

· Attitude to learn and comprehend the periodical changes in the regulatory requirement as per FED

Skills and competencies:

Required:

· Strong analytical skills in conducting sophisticated statistical analysis using bureau/vendor data, customer performance

Data and macro-economic data to solve business problems.

· Working experience in languages PySpark & Scala to develop code to validate and implement models and codes in

Credit Risk/Banking

· Experience with distributed systems such as Hadoop/MapReduce, Spark, streaming data processing, cloud architecture.

Familiarity with machine learning frameworks and libraries (like scikit-learn, SparkML, tensorflow, pytorch etc.
Experience in systems integration, web services, batch processing
Experience in migrating codes to PySpark/Scala is big Plus
The ability to act as liaison conveying information needs of the business to IT and data constraints to the business

applies equal conveyance regarding business strategy and IT strategy, business processes and work flow

· Flexibility in approach and thought process

· Attitude to learn and comprehend the periodical changes in the regulatory requirement as per FED

Databricks - Solution Consultant

at Aceis Services

2 candid answers

Posted by Anushi Mishra

Remote only

2 - 10 yrs

₹8.6L - ₹30.2L / yr

CI/CD

Apache Spark

PySpark

MLOps

Machine Learning (ML)

+6 more

We are hiring freelancers to work on advanced Data & AI projects using Databricks. If you are passionate about cloud platforms, machine learning, data engineering, or architecture, and want to work with cutting-edge tools on real-world challenges, this is the opportunity for you!

✅ Key Details

Work Type: Freelance / Contract
Location: Remote
Time Zones: IST / EST only
Domain: Data & AI, Cloud, Big Data, Machine Learning
Collaboration: Work with industry leaders on innovative projects

🔹 Open Roles

1. Databricks – Senior Consultant

Skills: Data Warehousing, Python, Java, Scala, ETL, SQL, AWS, GCP, Azure
Experience: 6+ years

2. Databricks – ML Engineer

Skills: CI/CD, MLOps, Machine Learning, Spark, Hadoop
Experience: 4+ years

3. Databricks – Solution Architect

Skills: Azure, GCP, AWS, CI/CD, MLOps
Experience: 7+ years

4. Databricks – Solution Consultant

Skills: SQL, Spark, BigQuery, Python, Scala
Experience: 2+ years

✅ What We Offer

Opportunity to work with top-tier professionals and clients
Exposure to cutting-edge technologies and real-world data challenges
Flexible remote work environment aligned with IST / EST time zones
Competitive compensation and growth opportunities

📌 Skills We Value

✅ Key Details

Work Type: Freelance / Contract
Location: Remote
Time Zones: IST / EST only
Domain: Data & AI, Cloud, Big Data, Machine Learning
Collaboration: Work with industry leaders on innovative projects

🔹 Open Roles

1. Databricks – Senior Consultant

Skills: Data Warehousing, Python, Java, Scala, ETL, SQL, AWS, GCP, Azure
Experience: 6+ years

2. Databricks – ML Engineer

Skills: CI/CD, MLOps, Machine Learning, Spark, Hadoop
Experience: 4+ years

3. Databricks – Solution Architect

Skills: Azure, GCP, AWS, CI/CD, MLOps
Experience: 7+ years

4. Databricks – Solution Consultant

Skills: SQL, Spark, BigQuery, Python, Scala
Experience: 2+ years

✅ What We Offer

Opportunity to work with top-tier professionals and clients
Exposure to cutting-edge technologies and real-world data challenges
Flexible remote work environment aligned with IST / EST time zones
Competitive compensation and growth opportunities

📌 Skills We Value

Senior Data Engineer

It is a global technology consultancy

Agency job

via Scaling Theory by DivyaSri Rajendran

Bengaluru (Bangalore)

4.5 - 10 yrs

₹15L - ₹30L / yr

Spark

Scala

Hadoop

Amazon Web Services (AWS)

Role overview:

Must have About 5 - 11 years and at least 3 years relevant experience with Bigdata.
Must have Experience in building highly scalable business applications, which involve implementing large complex business flows and dealing with huge amounts of data.
Must have experience in Hadoop, Hive, Spark with Scala with good experience in performance tuning and debugging issues.
Good to have any stream processing Spark/Java Kafka.
Must have experience in design and development of Big data projects.
Good knowledge in Functional programming and OOP concepts, SOLID principles, design patterns for developing scalable applications.
Familiarity with build tools like Maven.
Must have experience with any RDBMS and at least one SQL database preferably PostgresSQL
Must have experience writing unit and integration tests using scaliest
Must have experience using any versioning control system - Git
Must have experience with CI / CD pipeline – Jenkins is a plus
Basic hands-on experience in one of the cloud provider (AWS/Azure) is a plus
Databricks Spark certification is a plus.

What would you do here:

As a Software Development Engineer 2 you will be responsible for expanding and optimising our data and data pipeline architecture as well as optimising data flow and collection for cross-functional teams. The ideal candidate is an experienced data pipeline design and data wrangler who enjoys optimising data systems and building them from the ground up. The Data Engineer will lead our software developers on data initiatives and will ensure optimal data delivery architecture is consistent throughout ongoing projects. They must be self-directed and comfortable supporting the data needs of multiple teams, systems and products. The right candidate will be excited by the prospect of optimising or even re-designing our company’s data architecture to support our next generation of products and data initiatives.

Responsibilities:

•Create and maintain optimal data pipeline architecture

•Assemble large complex data sets that meet functional / non-functional business requirements.

•Identify design and implement internal process improvements: automating manual processes optimising data delivery, coordinating to re-design infrastructure for greater scalability etc.

•Work with stakeholders including the Executive Product Data and Design teams to assist with data-related technical issues and support their data infrastructure needs.

•Keep our data separated and secure

•Work with data and analytics experts to strive for greater functionality in our data systems.

- Support PROD systems

Role overview:

Must have About 5 - 11 years and at least 3 years relevant experience with Bigdata.
Must have Experience in building highly scalable business applications, which involve implementing large complex business flows and dealing with huge amounts of data.
Must have experience in Hadoop, Hive, Spark with Scala with good experience in performance tuning and debugging issues.
Good to have any stream processing Spark/Java Kafka.
Must have experience in design and development of Big data projects.
Good knowledge in Functional programming and OOP concepts, SOLID principles, design patterns for developing scalable applications.
Familiarity with build tools like Maven.
Must have experience with any RDBMS and at least one SQL database preferably PostgresSQL
Must have experience writing unit and integration tests using scaliest
Must have experience using any versioning control system - Git
Must have experience with CI / CD pipeline – Jenkins is a plus
Basic hands-on experience in one of the cloud provider (AWS/Azure) is a plus
Databricks Spark certification is a plus.

What would you do here:

Responsibilities:

•Create and maintain optimal data pipeline architecture

•Assemble large complex data sets that meet functional / non-functional business requirements.

•Identify design and implement internal process improvements: automating manual processes optimising data delivery, coordinating to re-design infrastructure for greater scalability etc.

•Work with stakeholders including the Executive Product Data and Design teams to assist with data-related technical issues and support their data infrastructure needs.

•Keep our data separated and secure

•Work with data and analytics experts to strive for greater functionality in our data systems.

- Support PROD systems

Big Data Engineer

empowers digital transformation for innovative and high grow

Agency job

via Hirebound by Jebin Joy

Pune

4 - 12 yrs

₹12L - ₹30L / yr

Hadoop

Spark

Apache Kafka

ETL

Java

+2 more

To be successful in this role, you should possess

• Collaborate closely with Product Management and Engineering leadership to devise and build the

right solution.

• Participate in Design discussions and brainstorming sessions to select, integrate, and maintain Big

Data tools and frameworks required to solve Big Data problems at scale.

• Design and implement systems to cleanse, process, and analyze large data sets using distributed

processing tools like Akka and Spark.

• Understanding and critically reviewing existing data pipelines, and coming up with ideas in

collaboration with Technical Leaders and Architects to improve upon current bottlenecks

• Take initiatives, and show the drive to pick up new stuff proactively, and work as a Senior

Individual contributor on the multiple products and features we have.

• 3+ years of experience in developing highly scalable Big Data pipelines.

• In-depth understanding of the Big Data ecosystem including processing frameworks like Spark,

Akka, Storm, and Hadoop, and the file types they deal with.

• Experience with ETL and Data pipeline tools like Apache NiFi, Airflow etc.

• Excellent coding skills in Java or Scala, including the understanding to apply appropriate Design

Patterns when required.

• Experience with Git and build tools like Gradle/Maven/SBT.

• Strong understanding of object-oriented design, data structures, algorithms, profiling, and

optimization.

• Have elegant, readable, maintainable and extensible code style.

You are someone who would easily be able to

• Work closely with the US and India engineering teams to help build the Java/Scala based data

pipelines

• Lead the India engineering team in technical excellence and ownership of critical modules; own

the development of new modules and features

• Troubleshoot live production server issues.

• Handle client coordination and be able to work as a part of a team, be able to contribute

independently and drive the team to exceptional contributions with minimal team supervision

• Follow Agile methodology, JIRA for work planning, issue management/tracking

Additional Project/Soft Skills:

• Should be able to work independently with India & US based team members.

• Strong verbal and written communication with ability to articulate problems and solutions over phone and emails.

• Strong sense of urgency, with a passion for accuracy and timeliness.

• Ability to work calmly in high pressure situations and manage multiple projects/tasks.

• Ability to work independently and possess superior skills in issue resolution.

• Should have the passion to learn and implement, analyze and troubleshoot issues

To be successful in this role, you should possess

• Collaborate closely with Product Management and Engineering leadership to devise and build the

right solution.

• Participate in Design discussions and brainstorming sessions to select, integrate, and maintain Big

Data tools and frameworks required to solve Big Data problems at scale.

• Design and implement systems to cleanse, process, and analyze large data sets using distributed

processing tools like Akka and Spark.

• Understanding and critically reviewing existing data pipelines, and coming up with ideas in

collaboration with Technical Leaders and Architects to improve upon current bottlenecks

• Take initiatives, and show the drive to pick up new stuff proactively, and work as a Senior

Individual contributor on the multiple products and features we have.

• 3+ years of experience in developing highly scalable Big Data pipelines.

• In-depth understanding of the Big Data ecosystem including processing frameworks like Spark,

Akka, Storm, and Hadoop, and the file types they deal with.

• Experience with ETL and Data pipeline tools like Apache NiFi, Airflow etc.

• Excellent coding skills in Java or Scala, including the understanding to apply appropriate Design

Patterns when required.

• Experience with Git and build tools like Gradle/Maven/SBT.

• Strong understanding of object-oriented design, data structures, algorithms, profiling, and

optimization.

• Have elegant, readable, maintainable and extensible code style.

You are someone who would easily be able to

• Work closely with the US and India engineering teams to help build the Java/Scala based data

pipelines

• Lead the India engineering team in technical excellence and ownership of critical modules; own

the development of new modules and features

• Troubleshoot live production server issues.

• Handle client coordination and be able to work as a part of a team, be able to contribute

independently and drive the team to exceptional contributions with minimal team supervision

• Follow Agile methodology, JIRA for work planning, issue management/tracking

Additional Project/Soft Skills:

• Should be able to work independently with India & US based team members.

• Strong verbal and written communication with ability to articulate problems and solutions over phone and emails.

• Strong sense of urgency, with a passion for accuracy and timeliness.

• Ability to work calmly in high pressure situations and manage multiple projects/tasks.

• Ability to work independently and possess superior skills in issue resolution.

• Should have the passion to learn and implement, analyze and troubleshoot issues

Data Engineer

at Pluginlive

1 recruiter

Posted by Harsha Saggi

Chennai, Mumbai

4 - 6 yrs

₹10L - ₹20L / yr

Python

SQL

NOSQL Databases

Data architecture

Data modeling

+7 more

Role Overview:

We are seeking a talented and experienced Data Architect with strong data visualization capabilities to join our dynamic team in Mumbai. As a Data Architect, you will be responsible for designing, building, and managing our data infrastructure, ensuring its reliability, scalability, and performance. You will also play a crucial role in transforming complex data into insightful visualizations that drive business decisions. This role requires a deep understanding of data modeling, database technologies (particularly Oracle Cloud), data warehousing principles, and proficiency in data manipulation and visualization tools, including Python and SQL.

Responsibilities:

Design and implement robust and scalable data architectures, including data warehouses, data lakes, and operational data stores, primarily leveraging Oracle Cloud services.
Develop and maintain data models (conceptual, logical, and physical) that align with business requirements and ensure data integrity and consistency.
Define data governance policies and procedures to ensure data quality, security, and compliance.
Collaborate with data engineers to build and optimize ETL/ELT pipelines for efficient data ingestion, transformation, and loading.
Develop and execute data migration strategies to Oracle Cloud.
Utilize strong SQL skills to query, manipulate, and analyze large datasets from various sources.
Leverage Python and relevant libraries (e.g., Pandas, NumPy) for data cleaning, transformation, and analysis.
Design and develop interactive and insightful data visualizations using tools like [Specify Visualization Tools - e.g., Tableau, Power BI, Matplotlib, Seaborn, Plotly] to communicate data-driven insights to both technical and non-technical stakeholders.
Work closely with business analysts and stakeholders to understand their data needs and translate them into effective data models and visualizations.
Ensure the performance and reliability of data visualization dashboards and reports.
Stay up-to-date with the latest trends and technologies in data architecture, cloud computing (especially Oracle Cloud), and data visualization.
Troubleshoot data-related issues and provide timely resolutions.
Document data architectures, data flows, and data visualization solutions.
Participate in the evaluation and selection of new data technologies and tools.

Qualifications:

Bachelor's or Master's degree in Computer Science, Data Science, Information Systems, or a related field.
Proven experience (typically 5+ years) as a Data Architect, Data Modeler, or similar role.
Deep understanding of data warehousing concepts, dimensional modeling (e.g., star schema, snowflake schema), and ETL/ELT processes.
Extensive experience working with relational databases, particularly Oracle, and proficiency in SQL.
Hands-on experience with Oracle Cloud data services (e.g., Autonomous Data Warehouse, Object Storage, Data Integration).
Strong programming skills in Python and experience with data manipulation and analysis libraries (e.g., Pandas, NumPy).
Demonstrated ability to create compelling and effective data visualizations using industry-standard tools (e.g., Tableau, Power BI, Matplotlib, Seaborn, Plotly).
Excellent analytical and problem-solving skills with the ability to interpret complex data and translate it into actionable insights.
Strong communication and presentation skills, with the ability to effectively communicate technical concepts to non-technical audiences.
Experience with data governance and data quality principles.
Familiarity with agile development methodologies.
Ability to work independently and collaboratively within a team environment.

Application Link- https://forms.gle/km7n2WipJhC2Lj2r5

Role Overview:

Responsibilities:

Design and implement robust and scalable data architectures, including data warehouses, data lakes, and operational data stores, primarily leveraging Oracle Cloud services.
Develop and maintain data models (conceptual, logical, and physical) that align with business requirements and ensure data integrity and consistency.
Define data governance policies and procedures to ensure data quality, security, and compliance.
Collaborate with data engineers to build and optimize ETL/ELT pipelines for efficient data ingestion, transformation, and loading.
Develop and execute data migration strategies to Oracle Cloud.
Utilize strong SQL skills to query, manipulate, and analyze large datasets from various sources.
Leverage Python and relevant libraries (e.g., Pandas, NumPy) for data cleaning, transformation, and analysis.
Design and develop interactive and insightful data visualizations using tools like [Specify Visualization Tools - e.g., Tableau, Power BI, Matplotlib, Seaborn, Plotly] to communicate data-driven insights to both technical and non-technical stakeholders.
Work closely with business analysts and stakeholders to understand their data needs and translate them into effective data models and visualizations.
Ensure the performance and reliability of data visualization dashboards and reports.
Stay up-to-date with the latest trends and technologies in data architecture, cloud computing (especially Oracle Cloud), and data visualization.
Troubleshoot data-related issues and provide timely resolutions.
Document data architectures, data flows, and data visualization solutions.
Participate in the evaluation and selection of new data technologies and tools.

Qualifications:

Bachelor's or Master's degree in Computer Science, Data Science, Information Systems, or a related field.
Proven experience (typically 5+ years) as a Data Architect, Data Modeler, or similar role.
Deep understanding of data warehousing concepts, dimensional modeling (e.g., star schema, snowflake schema), and ETL/ELT processes.
Extensive experience working with relational databases, particularly Oracle, and proficiency in SQL.
Hands-on experience with Oracle Cloud data services (e.g., Autonomous Data Warehouse, Object Storage, Data Integration).
Strong programming skills in Python and experience with data manipulation and analysis libraries (e.g., Pandas, NumPy).
Demonstrated ability to create compelling and effective data visualizations using industry-standard tools (e.g., Tableau, Power BI, Matplotlib, Seaborn, Plotly).
Excellent analytical and problem-solving skills with the ability to interpret complex data and translate it into actionable insights.
Strong communication and presentation skills, with the ability to effectively communicate technical concepts to non-technical audiences.
Experience with data governance and data quality principles.
Familiarity with agile development methodologies.
Ability to work independently and collaboratively within a team environment.

Application Link- https://forms.gle/km7n2WipJhC2Lj2r5

Data Engineer – GCP + Spark + DBT

at NeoGenCode Technologies Pvt Ltd

2 candid answers

Posted by Akshay Patil

Bengaluru (Bangalore)

8 - 12 yrs

₹15L - ₹22L / yr

Data engineering

Google Cloud Platform (GCP)

Data Transformation Tool (DBT)

Google Dataform

BigQuery

+6 more

Job Title : Data Engineer – GCP + Spark + DBT

Location : Bengaluru (On-site at Client Location | 3 Days WFO)

Experience : 8 to 12 Years

Level : Associate Architect

Type : Full-time

Job Overview :

We are looking for a seasoned Data Engineer to join the Data Platform Engineering team supporting a Unified Data Platform (UDP). This role requires hands-on expertise in DBT, GCP, BigQuery, and PySpark, with a solid foundation in CI/CD, data pipeline optimization, and agile delivery.

Mandatory Skills : GCP, DBT, Google Dataform, BigQuery, PySpark/Spark SQL, Advanced SQL, CI/CD, Git, Agile Methodologies.

Key Responsibilities :

Design, build, and optimize scalable data pipelines using BigQuery, DBT, and PySpark.
Leverage GCP-native services like Cloud Storage, Pub/Sub, Dataproc, Cloud Functions, and Composer for ETL/ELT workflows.
Implement and maintain CI/CD for data engineering projects with Git-based version control.
Collaborate with cross-functional teams including Infra, Security, and DataOps for reliable, secure, and high-quality data delivery.
Lead code reviews, mentor junior engineers, and enforce best practices in data engineering.
Participate in Agile sprints, backlog grooming, and Jira-based project tracking.

Must-Have Skills :

Strong experience with DBT, Google Dataform, and BigQuery
Hands-on expertise with PySpark/Spark SQL
Proficient in GCP for data engineering workflows
Solid knowledge of SQL optimization, Git, and CI/CD pipelines
Agile team experience and strong problem-solving abilities

Nice-to-Have Skills :

Familiarity with Databricks, Delta Lake, or Kafka
Exposure to data observability and quality frameworks (e.g., Great Expectations, Soda)
Knowledge of MDM patterns, Terraform, or IaC is a plus

Job Title : Data Engineer – GCP + Spark + DBT

Location : Bengaluru (On-site at Client Location | 3 Days WFO)

Experience : 8 to 12 Years

Level : Associate Architect

Type : Full-time

Job Overview :

Mandatory Skills : GCP, DBT, Google Dataform, BigQuery, PySpark/Spark SQL, Advanced SQL, CI/CD, Git, Agile Methodologies.

Key Responsibilities :

Design, build, and optimize scalable data pipelines using BigQuery, DBT, and PySpark.
Leverage GCP-native services like Cloud Storage, Pub/Sub, Dataproc, Cloud Functions, and Composer for ETL/ELT workflows.
Implement and maintain CI/CD for data engineering projects with Git-based version control.
Collaborate with cross-functional teams including Infra, Security, and DataOps for reliable, secure, and high-quality data delivery.
Lead code reviews, mentor junior engineers, and enforce best practices in data engineering.
Participate in Agile sprints, backlog grooming, and Jira-based project tracking.

Must-Have Skills :

Strong experience with DBT, Google Dataform, and BigQuery
Hands-on expertise with PySpark/Spark SQL
Proficient in GCP for data engineering workflows
Solid knowledge of SQL optimization, Git, and CI/CD pipelines
Agile team experience and strong problem-solving abilities

Nice-to-Have Skills :

Familiarity with Databricks, Delta Lake, or Kafka
Exposure to data observability and quality frameworks (e.g., Great Expectations, Soda)
Knowledge of MDM patterns, Terraform, or IaC is a plus

Data Engineer

at VyTCDC

Posted by Gobinath Sundaram

Bengaluru (Bangalore)

5 - 8 yrs

₹4L - ₹25L / yr

Data engineering

Python

Spark

🛠️ Key Responsibilities

Design, build, and maintain scalable data pipelines using Python and Apache Spark (PySpark or Scala APIs)
Develop and optimize ETL processes for batch and real-time data ingestion
Collaborate with data scientists, analysts, and DevOps teams to support data-driven solutions
Ensure data quality, integrity, and governance across all stages of the data lifecycle
Implement data validation, monitoring, and alerting mechanisms for production pipelines
Work with cloud platforms (AWS, GCP, or Azure) and tools like Airflow, Kafka, and Delta Lake
Participate in code reviews, performance tuning, and documentation

🎓 Qualifications

Bachelor’s or Master’s degree in Computer Science, Engineering, or related field
3–6 years of experience in data engineering with a focus on Python and Spark
Experience with distributed computing and handling large-scale datasets (10TB+)
Familiarity with data security, PII handling, and compliance standards is a plus

🛠️ Key Responsibilities

Design, build, and maintain scalable data pipelines using Python and Apache Spark (PySpark or Scala APIs)
Develop and optimize ETL processes for batch and real-time data ingestion
Collaborate with data scientists, analysts, and DevOps teams to support data-driven solutions
Ensure data quality, integrity, and governance across all stages of the data lifecycle
Implement data validation, monitoring, and alerting mechanisms for production pipelines
Work with cloud platforms (AWS, GCP, or Azure) and tools like Airflow, Kafka, and Delta Lake
Participate in code reviews, performance tuning, and documentation

🎓 Qualifications

Bachelor’s or Master’s degree in Computer Science, Engineering, or related field
3–6 years of experience in data engineering with a focus on Python and Spark
Experience with distributed computing and handling large-scale datasets (10TB+)
Familiarity with data security, PII handling, and compliance standards is a plus

senior data engineer

at KJBN labs

2 candid answers

Posted by sakthi ganesh

Bengaluru (Bangalore)

4 - 7 yrs

₹10L - ₹30L / yr

Hadoop

Apache Kafka

Spark

Python

Java

+8 more

Senior Data Engineer Job Description

Overview

The Senior Data Engineer will design, develop, and maintain scalable data pipelines and

infrastructure to support data-driven decision-making and advanced analytics. This role requires deep

expertise in data engineering, strong problem-solving skills, and the ability to collaborate with

cross-functional teams to deliver robust data solutions.

Key Responsibilities

Data Pipeline Development: Design, build, and optimize scalable, secure, and reliable data

pipelines to ingest, process, and transform large volumes of structured and unstructured data.

Data Architecture: Architect and maintain data storage solutions, including data lakes, data

warehouses, and databases, ensuring performance, scalability, and cost-efficiency.

Data Integration: Integrate data from diverse sources, including APIs, third-party systems,

and streaming platforms, ensuring data quality and consistency.

Performance Optimization: Monitor and optimize data systems for performance, scalability,

and cost, implementing best practices for partitioning, indexing, and caching.

Collaboration: Work closely with data scientists, analysts, and software engineers to

understand data needs and deliver solutions that enable advanced analytics, machine

learning, and reporting.

Data Governance: Implement data governance policies, ensuring compliance with data

security, privacy regulations (e.g., GDPR, CCPA), and internal standards.

Automation: Develop automated processes for data ingestion, transformation, and validation

to improve efficiency and reduce manual intervention.

Mentorship: Guide and mentor junior data engineers, fostering a culture of technical

excellence and continuous learning.

Troubleshooting: Diagnose and resolve complex data-related issues, ensuring high

availability and reliability of data systems.

Required Qualifications

Education: Bachelor’s or Master’s degree in Computer Science, Engineering, Data Science,

or a related field.

Experience: 5+ years of experience in data engineering or a related role, with a proven track

record of building scalable data pipelines and infrastructure.

Technical Skills:

Proficiency in programming languages such as Python, Java, or Scala.

Expertise in SQL and experience with NoSQL databases (e.g., MongoDB, Cassandra).

Strong experience with cloud platforms (e.g., AWS, Azure, GCP) and their data services

(e.g., Redshift, BigQuery, Snowflake).

Hands-on experience with ETL/ELT tools (e.g., Apache Airflow, Talend, Informatica) and

data integration frameworks.

Familiarity with big data technologies (e.g., Hadoop, Spark, Kafka) and distributed

systems.

Knowledge of containerization and orchestration tools (e.g., Docker, Kubernetes) is a

plus.

Soft Skills:

Excellent problem-solving and analytical skills.

Strong communication and collaboration abilities.

Ability to work in a fast-paced, dynamic environment and manage multiple priorities.

Certifications (optional but preferred): Cloud certifications (e.g., AWS Certified Data Analytics,

Google Professional Data Engineer) or relevant data engineering certifications.

Preferred Qualifica

Experience with real-time data processing and streaming architectures.

Familiarity with machine learning pipelines and MLOps practices.

Knowledge of data visualization tools (e.g., Tableau, Power BI) and their integration with data

pipelines.

Experience in industries with high data complexity, such as finance, healthcare, or

e-commerce.

Work Environment

Location: Hybrid/Remote/On-site (depending on company policy).

Team: Collaborative, cross-functional team environment with data scientists, analysts, and

business stakeholders.

Hours: Full-time, with occasional on-call responsibilities for critical data systems.