Spark Jobs in Bangalore (Bengaluru)

Apply to 50+ Spark Jobs in Bangalore (Bengaluru) on CutShort.io. Explore the latest Spark Job opportunities across top companies like Google, Amazon & Adobe.

Senior Data Engineer

at Wissen Technology

4 recruiters

Posted by Robin Silverster

Bengaluru (Bangalore)

8 - 11 yrs

₹10L - ₹35L / yr

Python

Spark

Apache Kafka

Snowflake schema

databricks

+1 more

Required Skills:

· 8+ years of hands-on experience in data engineering or a related field.

· Proficiency in Python programming.

· Experience with data processing frameworks like Apache Spark or Hadoop.

· Experience working on Databricks.

· Familiarity with cloud platforms (AWS, Azure) and their data services.

· Experience with data warehousing concepts and technologies.

· Experience with message queues and streaming platforms (e.g., Kafka).

· Excellent communication and collaboration skills.

· Ability to work independently and as part of a geographically distributed team.

Big Data Developer

at Hashone Careers

2 candid answers

Posted by Madhavan I

Bengaluru (Bangalore), Hyderabad, Pune

5 - 8 yrs

₹14L - ₹25L / yr

Spark

Scala

Jenkins

Job Description

Experience - 5 - 8 yrs

Mode of Working - Hybrid (3 days WFO, 2 days WFH)

Location - Bangalore / Pune / Hyderabad.

Big Data Developer

Focus: Building applications and solutions

Writes code and develops software applications that process big data
Creates data-driven applications and analytical tools
Focuses on implementing business logic and algorithms
Builds backend services and APIs to facilitate secure and efficient data exchange

Key Responsibilities:

Develop data processing applications using Spark, Hadoop
Write MapReduce jobs and data transformation logic
Implement machine learning models and analytics solutions
Code optimisation and debugging
Hands-on experience with Databricks
Hands-on experience with Airflow
Hands-on experience with any CI/CD tool (Jenkins preferred)

Primary Skills:

Programming languages (Scala, Python)
Spark
Data structures and algorithms
Application development
Software development best practices

Skills

SPARK, SCALA, JENKINS, CI/CD, Databricks, AIRFLOW

PySpark/Scala Developer

at Tata Consultancy Services

2 recruiters

Agency job

via Risk Resources LLP hyd by Jhansi Padiy

Chennai, Hyderabad, Kolkata, Delhi, Pune, Bengaluru (Bangalore)

4 - 10 yrs

₹6L - ₹30L / yr

Scala

PySpark

Spark

Amazon Web Services (AWS)

Job Title: PySpark/Scala Developer

Functional Skills: Experience in Credit Risk/Regulatory risk domain

Technical Skills: Spark, PySpark, Python, Hive, Scala, MapReduce, Unix shell scripting

Good to Have Skills: Exposure to Machine Learning Techniques

Job Description:

5+ years of experience developing, fine-tuning, and implementing programs/applications using Python/PySpark/Scala on a Big Data/Hadoop platform.

Roles and Responsibilities:

a) Work with a leading bank’s Risk Management team on specific projects/requirements pertaining to risk models in consumer and wholesale banking

b) Enhance machine learning models using PySpark or Scala

c) Work with data scientists to build ML models based on business requirements and follow the ML cycle to deploy them all the way to the production environment

d) Participate in feature engineering, model training, scoring, and retraining

e) Architect data pipelines and automate data ingestion and model jobs

Skills and competencies:

Required:

· Strong analytical skills in conducting sophisticated statistical analysis using bureau/vendor data, customer performance data, and macro-economic data to solve business problems.

· Working experience in PySpark and Scala to develop code to validate and implement models in Credit Risk/Banking

· Experience with distributed systems such as Hadoop/MapReduce, Spark, streaming data processing, cloud architecture.

· Familiarity with machine learning frameworks and libraries (like scikit-learn, SparkML, TensorFlow, PyTorch, etc.)

· Experience in systems integration, web services, batch processing

· Experience in migrating code to PySpark/Scala is a big plus

· The ability to act as a liaison, conveying the information needs of the business to IT and data constraints to the business; applies equal conveyance regarding business strategy and IT strategy, business processes and workflow

· Flexibility in approach and thought process

· Attitude to learn and comprehend the periodic changes in regulatory requirements as per FED

PySpark/Scala Developer

at Tata Consultancy Services

2 recruiters

Agency job

via Risk Resources LLP hyd by susmitha o

Bengaluru (Bangalore), Hyderabad, Pune, Delhi, Kolkata, Chennai

5 - 8 yrs

₹7L - ₹30L / yr

Scala

Python

PySpark

Apache Hive

Spark

+3 more

Skills and competencies:

Required:

· Strong analytical skills in conducting sophisticated statistical analysis using bureau/vendor data, customer performance data, and macro-economic data to solve business problems.

· Working experience in PySpark and Scala to develop code to validate and implement models in Credit Risk/Banking

· Experience with distributed systems such as Hadoop/MapReduce, Spark, streaming data processing, cloud architecture.

· Familiarity with machine learning frameworks and libraries (like scikit-learn, SparkML, TensorFlow, PyTorch, etc.)

· Experience in systems integration, web services, batch processing

· Experience in migrating code to PySpark/Scala is a big plus

· The ability to act as a liaison, conveying the information needs of the business to IT and data constraints to the business; applies equal conveyance regarding business strategy and IT strategy, business processes and workflow

· Flexibility in approach and thought process

· Attitude to learn and comprehend the periodic changes in regulatory requirements as per FED

Software Engineer (Backend)

at NeoGenCode Technologies Pvt Ltd

2 candid answers

Posted by Shivank Bhardwaj

Bengaluru (Bangalore)

4 - 8 yrs

₹5L - ₹20L / yr

Object Oriented Programming (OOPs)

Java

Spark

Microservices

CI/CD

+6 more

Job Description

We are seeking a highly skilled and experienced Backend Engineer to join our dynamic and fast-paced development team in Bangalore. The ideal candidate will have expertise in Java development, particularly in Java 8 or above, and extensive hands-on experience with Apache Spark, Spark Streaming, and Spring Boot for developing scalable and high-performance microservices. The candidate must also have strong problem-solving skills, a deep understanding of distributed computing, and experience with cloud technologies (Azure).

Key Responsibilities

Design, develop, and maintain highly scalable microservices and optimized RESTful APIs using Spring Boot in Java 8 or above.
Write efficient and maintainable Spark and Spark Streaming code for processing large-scale data in real-time.
Implement Java 8 advanced features such as Functional Interfaces, Lambda Expressions, Streams, Parallel Streams, CompletableFuture, and Concurrency API improvements.
Work with relational (SQL) and non-relational (Cosmos DB) databases for data modeling and optimization.
Utilize Maven for building and deploying artifacts to the snapshot repository.
Collaborate with cross-functional teams, including Product, Business, Automation, and other stakeholders, to define, design, and deliver new features.
Follow Agile SCRUM methodologies for software development and actively participate in sprint planning and retrospective meetings.
Maintain version control using Git and ensure best practices for code collaboration and peer code reviews.
Implement CI/CD pipelines using tools such as Jenkins and GitHub Actions to automate build and deployment processes.
Work with Azure Cloud Technologies to build and deploy cloud-based applications.
Apply software design patterns and best practices in backend development to enhance system architecture and scalability.
Troubleshoot and debug applications, ensuring high performance, security, and scalability.
Keep up to date with the latest industry trends, tools, and technologies to continuously improve development processes.

Minimum Qualifications

BS/MS in Computer Science or equivalent.
4+ years of industry experience in developing highly scalable microservices and optimized RESTful APIs using Spring Boot in Java 8 or above.
3+ years of experience in version control tools like Git.
3+ years of experience working in an Agile SCRUM environment.
Strong understanding of software design patterns and distributed computing concepts.
Solid experience in relational and non-relational databases (SQL and Cosmos DB).
Experience with Maven for building and managing dependencies.
Knowledge of CI/CD workflows and experience with Jenkins and GitHub Actions.
Prior enterprise experience in working with Azure Cloud Technologies.
Proven ability to work collaboratively with cross-functional teams to deliver high-quality product features.
Strong problem-solving skills, debugging techniques, and ability to troubleshoot complex issues efficiently.

Preferred Qualifications

Experience with Kafka or other messaging queues for real-time data processing.
Exposure to Docker, Kubernetes, and container orchestration tools.
Hands-on experience with NoSQL databases like MongoDB, Cassandra, or DynamoDB.
Experience with performance optimization techniques for backend applications.
Knowledge of test-driven development (TDD) and unit testing frameworks like JUnit.

Data Engineer - L2

at CoffeeBeans

2 candid answers

Posted by Nikita Sinha

Bengaluru (Bangalore), Pune

5 - 7 yrs

Up to ₹22L / yr (Varies)

Python

SQL

ETL

Data modeling

Spark

+6 more

Role Overview

We're looking for experienced Data Engineers who can independently design, build, and manage scalable data platforms. You'll work directly with clients and internal teams to develop robust data pipelines that support analytics, AI/ML, and operational systems.

You’ll also play a mentorship role and help establish strong engineering practices across our data projects.

Key Responsibilities

Design and develop large-scale, distributed data pipelines (batch and streaming)
Implement scalable data models, warehouses/lakehouses, and data lakes
Translate business requirements into technical data solutions
Optimize data pipelines for performance and reliability
Ensure code is clean, modular, tested, and documented
Contribute to architecture, tooling decisions, and platform setup
Review code/design and mentor junior engineers

Must-Have Skills

Strong programming skills in Python and advanced SQL
Solid grasp of ETL/ELT, data modeling (OLTP & OLAP), and stream processing
Hands-on experience with frameworks like Apache Spark, Flink, etc.
Experience with orchestration tools like Airflow
Familiarity with CI/CD pipelines and Git
Ability to debug and scale data pipelines in production

Preferred Skills

Experience with cloud platforms (AWS preferred, GCP or Azure also fine)
Exposure to Databricks, dbt, or similar tools
Understanding of data governance, quality frameworks, and observability
Certifications (e.g., AWS Data Analytics, Solutions Architect, Databricks) are a bonus

What We’re Looking For

Problem-solver with strong analytical skills and attention to detail
Fast learner who can adapt across tools, tech stacks, and domains
Comfortable working in fast-paced, client-facing environments
Willingness to travel within India when required

Senior Data Engineer

It is a global technology consultancy

Agency job

via Scaling Theory by DivyaSri Rajendran

Bengaluru (Bangalore)

4.5 - 10 yrs

₹15L - ₹30L / yr

Spark

Scala

Hadoop

Amazon Web Services (AWS)

Role overview:

Must have about 5 - 11 years of experience, with at least 3 years of relevant experience in Big Data.
Must have experience in building highly scalable business applications, which involve implementing large complex business flows and dealing with huge amounts of data.
Must have experience in Hadoop, Hive, and Spark with Scala, with good experience in performance tuning and debugging issues.
Good to have: any stream processing experience (Spark/Java, Kafka).
Must have experience in design and development of Big Data projects.
Good knowledge of functional programming and OOP concepts, SOLID principles, and design patterns for developing scalable applications.
Familiarity with build tools like Maven.
Must have experience with any RDBMS and at least one SQL database, preferably PostgreSQL.
Must have experience writing unit and integration tests using ScalaTest.
Must have experience using any version control system, such as Git.
Must have experience with CI/CD pipelines; Jenkins is a plus.
Basic hands-on experience with one of the cloud providers (AWS/Azure) is a plus.
Databricks Spark certification is a plus.

What would you do here:

As a Software Development Engineer 2, you will be responsible for expanding and optimising our data and data pipeline architecture, as well as optimising data flow and collection for cross-functional teams. The ideal candidate is an experienced data pipeline designer and data wrangler who enjoys optimising data systems and building them from the ground up. The Data Engineer will lead our software developers on data initiatives and will ensure optimal data delivery architecture is consistent throughout ongoing projects. They must be self-directed and comfortable supporting the data needs of multiple teams, systems and products. The right candidate will be excited by the prospect of optimising or even re-designing our company’s data architecture to support our next generation of products and data initiatives.

Responsibilities:

•Create and maintain optimal data pipeline architecture

•Assemble large complex data sets that meet functional / non-functional business requirements.

•Identify, design, and implement internal process improvements: automating manual processes, optimising data delivery, coordinating to re-design infrastructure for greater scalability, etc.

•Work with stakeholders including the Executive, Product, Data, and Design teams to assist with data-related technical issues and support their data infrastructure needs.

•Keep our data separated and secure

•Work with data and analytics experts to strive for greater functionality in our data systems.

•Support PROD systems

Data Engineer – GCP + Spark + DBT

at NeoGenCode Technologies Pvt Ltd

2 candid answers

Posted by Akshay Patil

Bengaluru (Bangalore)

8 - 12 yrs

₹15L - ₹22L / yr

Data engineering

Google Cloud Platform (GCP)

Data Transformation Tool (DBT)

Google Dataform

BigQuery

+6 more

Job Title : Data Engineer – GCP + Spark + DBT

Location : Bengaluru (On-site at Client Location | 3 Days WFO)

Experience : 8 to 12 Years

Level : Associate Architect

Type : Full-time

Job Overview :

We are looking for a seasoned Data Engineer to join the Data Platform Engineering team supporting a Unified Data Platform (UDP). This role requires hands-on expertise in DBT, GCP, BigQuery, and PySpark, with a solid foundation in CI/CD, data pipeline optimization, and agile delivery.

Mandatory Skills : GCP, DBT, Google Dataform, BigQuery, PySpark/Spark SQL, Advanced SQL, CI/CD, Git, Agile Methodologies.

Key Responsibilities :

Design, build, and optimize scalable data pipelines using BigQuery, DBT, and PySpark.
Leverage GCP-native services like Cloud Storage, Pub/Sub, Dataproc, Cloud Functions, and Composer for ETL/ELT workflows.
Implement and maintain CI/CD for data engineering projects with Git-based version control.
Collaborate with cross-functional teams including Infra, Security, and DataOps for reliable, secure, and high-quality data delivery.
Lead code reviews, mentor junior engineers, and enforce best practices in data engineering.
Participate in Agile sprints, backlog grooming, and Jira-based project tracking.

Must-Have Skills :

Strong experience with DBT, Google Dataform, and BigQuery
Hands-on expertise with PySpark/Spark SQL
Proficient in GCP for data engineering workflows
Solid knowledge of SQL optimization, Git, and CI/CD pipelines
Agile team experience and strong problem-solving abilities

Nice-to-Have Skills :

Familiarity with Databricks, Delta Lake, or Kafka
Exposure to data observability and quality frameworks (e.g., Great Expectations, Soda)
Knowledge of MDM patterns, Terraform, or IaC is a plus

Data Engineer

at VyTCDC

Posted by Gobinath Sundaram

Bengaluru (Bangalore)

5 - 8 yrs

₹4L - ₹25L / yr

Data engineering

Python

Spark

🛠️ Key Responsibilities

Design, build, and maintain scalable data pipelines using Python and Apache Spark (PySpark or Scala APIs)
Develop and optimize ETL processes for batch and real-time data ingestion
Collaborate with data scientists, analysts, and DevOps teams to support data-driven solutions
Ensure data quality, integrity, and governance across all stages of the data lifecycle
Implement data validation, monitoring, and alerting mechanisms for production pipelines
Work with cloud platforms (AWS, GCP, or Azure) and tools like Airflow, Kafka, and Delta Lake
Participate in code reviews, performance tuning, and documentation

🎓 Qualifications

Bachelor’s or Master’s degree in Computer Science, Engineering, or related field
3–6 years of experience in data engineering with a focus on Python and Spark
Experience with distributed computing and handling large-scale datasets (10TB+)
Familiarity with data security, PII handling, and compliance standards is a plus

Senior Data Engineer

at KJBN labs

2 candid answers

Posted by sakthi ganesh

Bengaluru (Bangalore)

4 - 7 yrs

₹10L - ₹30L / yr

Hadoop

Apache Kafka

Spark

Python

Java

+8 more

Senior Data Engineer Job Description

Overview

The Senior Data Engineer will design, develop, and maintain scalable data pipelines and infrastructure to support data-driven decision-making and advanced analytics. This role requires deep expertise in data engineering, strong problem-solving skills, and the ability to collaborate with cross-functional teams to deliver robust data solutions.

Key Responsibilities

Data Pipeline Development: Design, build, and optimize scalable, secure, and reliable data pipelines to ingest, process, and transform large volumes of structured and unstructured data.

Data Architecture: Architect and maintain data storage solutions, including data lakes, data warehouses, and databases, ensuring performance, scalability, and cost-efficiency.

Data Integration: Integrate data from diverse sources, including APIs, third-party systems, and streaming platforms, ensuring data quality and consistency.

Performance Optimization: Monitor and optimize data systems for performance, scalability, and cost, implementing best practices for partitioning, indexing, and caching.

Collaboration: Work closely with data scientists, analysts, and software engineers to understand data needs and deliver solutions that enable advanced analytics, machine learning, and reporting.

Data Governance: Implement data governance policies, ensuring compliance with data security, privacy regulations (e.g., GDPR, CCPA), and internal standards.

Automation: Develop automated processes for data ingestion, transformation, and validation to improve efficiency and reduce manual intervention.

Mentorship: Guide and mentor junior data engineers, fostering a culture of technical excellence and continuous learning.

Troubleshooting: Diagnose and resolve complex data-related issues, ensuring high availability and reliability of data systems.

Required Qualifications

Education: Bachelor’s or Master’s degree in Computer Science, Engineering, Data Science, or a related field.

Experience: 5+ years of experience in data engineering or a related role, with a proven track record of building scalable data pipelines and infrastructure.

Technical Skills:

Proficiency in programming languages such as Python, Java, or Scala.

Expertise in SQL and experience with NoSQL databases (e.g., MongoDB, Cassandra).

Strong experience with cloud platforms (e.g., AWS, Azure, GCP) and their data services (e.g., Redshift, BigQuery, Snowflake).

Hands-on experience with ETL/ELT tools (e.g., Apache Airflow, Talend, Informatica) and data integration frameworks.

Familiarity with big data technologies (e.g., Hadoop, Spark, Kafka) and distributed systems.

Knowledge of containerization and orchestration tools (e.g., Docker, Kubernetes) is a plus.

Soft Skills:

Excellent problem-solving and analytical skills.

Strong communication and collaboration abilities.

Ability to work in a fast-paced, dynamic environment and manage multiple priorities.

Certifications (optional but preferred): Cloud certifications (e.g., AWS Certified Data Analytics, Google Professional Data Engineer) or relevant data engineering certifications.

Preferred Qualifications

Experience with real-time data processing and streaming architectures.

Familiarity with machine learning pipelines and MLOps practices.

Knowledge of data visualization tools (e.g., Tableau, Power BI) and their integration with data pipelines.

Experience in industries with high data complexity, such as finance, healthcare, or e-commerce.

Work Environment

Location: Hybrid/Remote/On-site (depending on company policy).

Team: Collaborative, cross-functional team environment with data scientists, analysts, and business stakeholders.

Hours: Full-time, with occasional on-call responsibilities for critical data systems.