PySpark Jobs in Chennai

29+ PySpark Jobs in Chennai | PySpark Job openings in Chennai

Apply to 29+ PySpark Jobs in Chennai on CutShort.io. Explore the latest PySpark Job opportunities across top companies like Google, Amazon & Adobe.

Senior Data Engineer_AWS

at Ganit Business Solutions

3 recruiters

Agency job

via hirezyai by HR Hirezyai

Bengaluru (Bangalore), Chennai, Mumbai

5.5 - 12 yrs

₹15L - ₹25L / yr

Amazon Web Services (AWS)

PySpark

SQL

Roles & Responsibilities

Data Engineering Excellence: Design and implement data pipelines using formats like JSON, Parquet, CSV, and ORC, utilizing batch and streaming ingestion.
Cloud Data Migration Leadership: Lead cloud migration projects, developing scalable Spark pipelines.
Medallion Architecture: Implement Bronze, Silver, and gold tables for scalable data systems.
Spark Code Optimization: Optimize Spark code to ensure efficient cloud migration.
Data Modeling: Develop and maintain data models with strong governance practices.
Data Cataloging & Quality: Implement cataloging strategies with Unity Catalog to maintain high-quality data.
Delta Live Table Leadership: Lead the design and implementation of Delta Live Tables (DLT) pipelines for secure, tamper-resistant data management.
Customer Collaboration: Collaborate with clients to optimize cloud migrations and ensure best practices in design and governance.

Educational Qualifications

Experience: Minimum 5 years of hands-on experience in data engineering, with a proven track record in complex pipeline development and cloud-based data migration projects.
Education: Bachelor’s or higher degree in Computer Science, Data Engineering, or a related field.
Skills
Must-have: Proficiency in Spark, SQL, Python, and other relevant data processing technologies. Strong knowledge of Databricks and its components, including Delta Live Table (DLT) pipeline implementations. Expertise in on-premises to cloud Spark code optimization and Medallion Architecture.

Good to Have

Familiarity with AWS services (experience with additional cloud platforms like GCP or Azure is a plus).

Soft Skills

Excellent communication and collaboration skills, with the ability to work effectively with clients and internal teams.
Certifications
AWS/GCP/Azure Data Engineer Certification.

Roles & Responsibilities

Data Engineering Excellence: Design and implement data pipelines using formats like JSON, Parquet, CSV, and ORC, utilizing batch and streaming ingestion.
Cloud Data Migration Leadership: Lead cloud migration projects, developing scalable Spark pipelines.
Medallion Architecture: Implement Bronze, Silver, and gold tables for scalable data systems.
Spark Code Optimization: Optimize Spark code to ensure efficient cloud migration.
Data Modeling: Develop and maintain data models with strong governance practices.
Data Cataloging & Quality: Implement cataloging strategies with Unity Catalog to maintain high-quality data.
Delta Live Table Leadership: Lead the design and implementation of Delta Live Tables (DLT) pipelines for secure, tamper-resistant data management.
Customer Collaboration: Collaborate with clients to optimize cloud migrations and ensure best practices in design and governance.

Educational Qualifications

Experience: Minimum 5 years of hands-on experience in data engineering, with a proven track record in complex pipeline development and cloud-based data migration projects.
Education: Bachelor’s or higher degree in Computer Science, Data Engineering, or a related field.
Skills
Must-have: Proficiency in Spark, SQL, Python, and other relevant data processing technologies. Strong knowledge of Databricks and its components, including Delta Live Table (DLT) pipeline implementations. Expertise in on-premises to cloud Spark code optimization and Medallion Architecture.

Good to Have

Familiarity with AWS services (experience with additional cloud platforms like GCP or Azure is a plus).

Soft Skills

Excellent communication and collaboration skills, with the ability to work effectively with clients and internal teams.
Certifications
AWS/GCP/Azure Data Engineer Certification.

PySpark/Scala Developer

at Tata Consultancy Services

2 recruiters

Agency job

via Risk Resources LLP hyd by Jhansi Padiy

Chennai, Hyderabad, Kolkata, Delhi, Pune, Bengaluru (Bangalore)

4 - 10 yrs

₹6L - ₹30L / yr

Scala

PySpark

Spark

Amazon Web Services (AWS)

Job Title: PySpark/Scala Developer

Functional Skills: Experience in Credit Risk/Regulatory risk domain

Technical Skills: Spark ,PySpark, Python, Hive, Scala, MapReduce, Unix shell scripting

Good to Have Skills: Exposure to Machine Learning Techniques

Job Description:

5+ Years of experience with Developing/Fine tuning and implementing programs/applications

Using Python/PySpark/Scala on Big Data/Hadoop Platform.

Roles and Responsibilities:

a) Work with a Leading Bank’s Risk Management team on specific projects/requirements pertaining to risk Models in

consumer and wholesale banking

b) Enhance Machine Learning Models using PySpark or Scala

c) Work with Data Scientists to Build ML Models based on Business Requirements and Follow ML Cycle to Deploy them all

the way to Production Environment

d) Participate Feature Engineering, Training Models, Scoring and retraining

e) Architect Data Pipeline and Automate Data Ingestion and Model Jobs

Skills and competencies:

Required:

· Strong analytical skills in conducting sophisticated statistical analysis using bureau/vendor data, customer performance

Data and macro-economic data to solve business problems.

· Working experience in languages PySpark & Scala to develop code to validate and implement models and codes in

Credit Risk/Banking

· Experience with distributed systems such as Hadoop/MapReduce, Spark, streaming data processing, cloud architecture.

Familiarity with machine learning frameworks and libraries (like scikit-learn, SparkML, tensorflow, pytorch etc.
Experience in systems integration, web services, batch processing
Experience in migrating codes to PySpark/Scala is big Plus
The ability to act as liaison conveying information needs of the business to IT and data constraints to the business

applies equal conveyance regarding business strategy and IT strategy, business processes and work flow

· Flexibility in approach and thought process

· Attitude to learn and comprehend the periodical changes in the regulatory requirement as per FED

Job Title: PySpark/Scala Developer

Functional Skills: Experience in Credit Risk/Regulatory risk domain

Technical Skills: Spark ,PySpark, Python, Hive, Scala, MapReduce, Unix shell scripting

Good to Have Skills: Exposure to Machine Learning Techniques

Job Description:

5+ Years of experience with Developing/Fine tuning and implementing programs/applications

Using Python/PySpark/Scala on Big Data/Hadoop Platform.

Roles and Responsibilities:

a) Work with a Leading Bank’s Risk Management team on specific projects/requirements pertaining to risk Models in

consumer and wholesale banking

b) Enhance Machine Learning Models using PySpark or Scala

c) Work with Data Scientists to Build ML Models based on Business Requirements and Follow ML Cycle to Deploy them all

the way to Production Environment

d) Participate Feature Engineering, Training Models, Scoring and retraining

e) Architect Data Pipeline and Automate Data Ingestion and Model Jobs

Skills and competencies:

Required:

· Strong analytical skills in conducting sophisticated statistical analysis using bureau/vendor data, customer performance

Data and macro-economic data to solve business problems.

· Working experience in languages PySpark & Scala to develop code to validate and implement models and codes in

Credit Risk/Banking

· Experience with distributed systems such as Hadoop/MapReduce, Spark, streaming data processing, cloud architecture.

Familiarity with machine learning frameworks and libraries (like scikit-learn, SparkML, tensorflow, pytorch etc.
Experience in systems integration, web services, batch processing
Experience in migrating codes to PySpark/Scala is big Plus
The ability to act as liaison conveying information needs of the business to IT and data constraints to the business

applies equal conveyance regarding business strategy and IT strategy, business processes and work flow

· Flexibility in approach and thought process

· Attitude to learn and comprehend the periodical changes in the regulatory requirement as per FED

PySpark/Scala Developer

at Tata Consultancy Services

2 recruiters

Agency job

via Risk Resources LLP hyd by susmitha o

Bengaluru (Bangalore), Hyderabad, Pune, Delhi, Kolkata, Chennai

5 - 8 yrs

₹7L - ₹30L / yr

Scala

Python

PySpark

Apache Hive

Spark

+3 more

Skills and competencies:

Required:

· Strong analytical skills in conducting sophisticated statistical analysis using bureau/vendor data, customer performance

Data and macro-economic data to solve business problems.

· Working experience in languages PySpark & Scala to develop code to validate and implement models and codes in

Credit Risk/Banking

· Experience with distributed systems such as Hadoop/MapReduce, Spark, streaming data processing, cloud architecture.

Familiarity with machine learning frameworks and libraries (like scikit-learn, SparkML, tensorflow, pytorch etc.
Experience in systems integration, web services, batch processing
Experience in migrating codes to PySpark/Scala is big Plus
The ability to act as liaison conveying information needs of the business to IT and data constraints to the business

applies equal conveyance regarding business strategy and IT strategy, business processes and work flow

· Flexibility in approach and thought process

· Attitude to learn and comprehend the periodical changes in the regulatory requirement as per FED

Skills and competencies:

Required:

· Strong analytical skills in conducting sophisticated statistical analysis using bureau/vendor data, customer performance

Data and macro-economic data to solve business problems.

· Working experience in languages PySpark & Scala to develop code to validate and implement models and codes in

Credit Risk/Banking

· Experience with distributed systems such as Hadoop/MapReduce, Spark, streaming data processing, cloud architecture.

Familiarity with machine learning frameworks and libraries (like scikit-learn, SparkML, tensorflow, pytorch etc.
Experience in systems integration, web services, batch processing
Experience in migrating codes to PySpark/Scala is big Plus
The ability to act as liaison conveying information needs of the business to IT and data constraints to the business

applies equal conveyance regarding business strategy and IT strategy, business processes and work flow

· Flexibility in approach and thought process

· Attitude to learn and comprehend the periodical changes in the regulatory requirement as per FED

Solution/Technical Architect (Databricks)

at Quintica

Posted by Nitin D

Remote, Bengaluru (Bangalore), Pune, Chennai, Nagpur

5 - 15 yrs

₹20L - ₹30L / yr

databricks

PySpark

Apache Spark

CI/CD

Data engineering

Technical Architect (Databricks)

10+ Years Data Engineering Experience with expertise in Databricks
3+ years of consulting experience
Completed Data Engineering Professional certification & required classes
Minimum 2-3 projects delivered with hands-on experience in Databricks
Completed Apache Spark Programming with Databricks, Data Engineering with Databricks, Optimizing Apache Spark™ on Databricks
Experience in Spark and/or Hadoop, Flink, Presto, other popular big data engines
Familiarity with Databricks multi-hop pipeline architecture

Sr. Data Engineer (Databricks)

5+ Years Data Engineering Experience with expertise in Databricks
Completed Data Engineering Associate certification & required classes
Minimum 1 project delivered with hands-on experience in development on Databricks
Completed Apache Spark Programming with Databricks, Data Engineering with Databricks, Optimizing Apache Spark™ on Databricks
SQL delivery experience, and familiarity with Bigquery, Synapse or Redshift
Proficient in Python, knowledge of additional databricks programming languages (Scala)

Technical Architect (Databricks)

10+ Years Data Engineering Experience with expertise in Databricks
3+ years of consulting experience
Completed Data Engineering Professional certification & required classes
Minimum 2-3 projects delivered with hands-on experience in Databricks
Completed Apache Spark Programming with Databricks, Data Engineering with Databricks, Optimizing Apache Spark™ on Databricks
Experience in Spark and/or Hadoop, Flink, Presto, other popular big data engines
Familiarity with Databricks multi-hop pipeline architecture

Sr. Data Engineer (Databricks)

5+ Years Data Engineering Experience with expertise in Databricks
Completed Data Engineering Associate certification & required classes
Minimum 1 project delivered with hands-on experience in development on Databricks
Completed Apache Spark Programming with Databricks, Data Engineering with Databricks, Optimizing Apache Spark™ on Databricks
SQL delivery experience, and familiarity with Bigquery, Synapse or Redshift
Proficient in Python, knowledge of additional databricks programming languages (Scala)

AWS Data Engineer

at VyTCDC

Posted by Gobinath Sundaram

Chennai, Bengaluru (Bangalore), Hyderabad, Mumbai, Pune, Noida

4 - 6 yrs

₹3L - ₹21L / yr

AWS Data Engineer

Amazon Web Services (AWS)

Python

PySpark

databricks

+1 more

Key Responsibilities

Design and implement ETL/ELT pipelines using Databricks, PySpark, and AWS Glue
Develop and maintain scalable data architectures on AWS (S3, EMR, Lambda, Redshift, RDS)
Perform data wrangling, cleansing, and transformation using Python and SQL
Collaborate with data scientists to integrate Generative AI models into analytics workflows
Build dashboards and reports to visualize insights using tools like Power BI or Tableau
Ensure data quality, governance, and security across all data assets
Optimize performance of data pipelines and troubleshoot bottlenecks
Work closely with stakeholders to understand data requirements and deliver actionable insights

🧪 Required Skills

Skill AreaTools & TechnologiesCloud PlatformsAWS (S3, Lambda, Glue, EMR, Redshift)Big DataDatabricks, Apache Spark, PySparkProgrammingPython, SQLData EngineeringETL/ELT, Data Lakes, Data WarehousingAnalyticsData Modeling, Visualization, BI ReportingGen AI IntegrationOpenAI, Hugging Face, LangChain (preferred)DevOps (Bonus)Git, Jenkins, Terraform, Docker

📚 Qualifications

Bachelor's or Master’s degree in Computer Science, Data Science, or related field
3+ years of experience in data engineering or data analytics
Hands-on experience with Databricks, PySpark, and AWS
Familiarity with Generative AI tools and frameworks is a strong plus
Strong problem-solving and communication skills

🌟 Preferred Traits

Analytical mindset with attention to detail
Passion for data and emerging technologies
Ability to work independently and in cross-functional teams
Eagerness to learn and adapt in a fast-paced environment

Key Responsibilities

Design and implement ETL/ELT pipelines using Databricks, PySpark, and AWS Glue
Develop and maintain scalable data architectures on AWS (S3, EMR, Lambda, Redshift, RDS)
Perform data wrangling, cleansing, and transformation using Python and SQL
Collaborate with data scientists to integrate Generative AI models into analytics workflows
Build dashboards and reports to visualize insights using tools like Power BI or Tableau
Ensure data quality, governance, and security across all data assets
Optimize performance of data pipelines and troubleshoot bottlenecks
Work closely with stakeholders to understand data requirements and deliver actionable insights

🧪 Required Skills

📚 Qualifications

Bachelor's or Master’s degree in Computer Science, Data Science, or related field
3+ years of experience in data engineering or data analytics
Hands-on experience with Databricks, PySpark, and AWS
Familiarity with Generative AI tools and frameworks is a strong plus
Strong problem-solving and communication skills

🌟 Preferred Traits

Analytical mindset with attention to detail
Passion for data and emerging technologies
Ability to work independently and in cross-functional teams
Eagerness to learn and adapt in a fast-paced environment

AWS Data Engineer

at Deqode

1 recruiter

Posted by Alisha Das

Bengaluru (Bangalore), Mumbai, Pune, Chennai, Gurugram

5.6 - 7 yrs

₹10L - ₹28L / yr

Amazon Web Services (AWS)

Python

PySpark

SQL

Job Summary:

As an AWS Data Engineer, you will be responsible for designing, developing, and maintaining scalable, high-performance data pipelines using AWS services. With 6+ years of experience, you’ll collaborate closely with data architects, analysts, and business stakeholders to build reliable, secure, and cost-efficient data infrastructure across the organization.

Key Responsibilities:

Design, develop, and manage scalable data pipelines using AWS Glue, Lambda, and other serverless technologies
Implement ETL workflows and transformation logic using PySpark and Python on AWS Glue
Leverage AWS Redshift for warehousing, performance tuning, and large-scale data queries
Work with AWS DMS and RDS for database integration and migration
Optimize data flows and system performance for speed and cost-effectiveness
Deploy and manage infrastructure using AWS CloudFormation templates
Collaborate with cross-functional teams to gather requirements and build robust data solutions
Ensure data integrity, quality, and security across all systems and processes

Required Skills & Experience:

6+ years of experience in Data Engineering with strong AWS expertise
Proficient in Python and PySpark for data processing and ETL development
Hands-on experience with AWS Glue, Lambda, DMS, RDS, and Redshift
Strong SQL skills for building complex queries and performing data analysis
Familiarity with AWS CloudFormation and infrastructure as code principles
Good understanding of serverless architecture and cost-optimized design
Ability to write clean, modular, and maintainable code
Strong analytical thinking and problem-solving skills

Job Summary:

Key Responsibilities:

Design, develop, and manage scalable data pipelines using AWS Glue, Lambda, and other serverless technologies
Implement ETL workflows and transformation logic using PySpark and Python on AWS Glue
Leverage AWS Redshift for warehousing, performance tuning, and large-scale data queries
Work with AWS DMS and RDS for database integration and migration
Optimize data flows and system performance for speed and cost-effectiveness
Deploy and manage infrastructure using AWS CloudFormation templates
Collaborate with cross-functional teams to gather requirements and build robust data solutions
Ensure data integrity, quality, and security across all systems and processes

Required Skills & Experience:

6+ years of experience in Data Engineering with strong AWS expertise
Proficient in Python and PySpark for data processing and ETL development
Hands-on experience with AWS Glue, Lambda, DMS, RDS, and Redshift
Strong SQL skills for building complex queries and performing data analysis
Familiarity with AWS CloudFormation and infrastructure as code principles
Good understanding of serverless architecture and cost-optimized design
Ability to write clean, modular, and maintainable code
Strong analytical thinking and problem-solving skills

ETL Automation Tester

at E2E Infoware Management Services

Posted by Monika S

Bengaluru (Bangalore), Pune, Chennai

5 - 12 yrs

₹5L - ₹25L / yr

PySpark

Automation

SQL

Skill Name: ETL Automation Testing

Location: Bangalore, Chennai and Pune

Experience: 5+ Years

Required:

Experience in ETL Automation Testing

Strong experience in Pyspark.

Skill Name: ETL Automation Testing

Location: Bangalore, Chennai and Pune

Experience: 5+ Years

Required:

Experience in ETL Automation Testing

Strong experience in Pyspark.

AWS Data Engineer

at Deqode

1 recruiter

Posted by Roshni Maji

Pune, Bengaluru (Bangalore), Gurugram, Chennai, Mumbai

5 - 7 yrs

₹6L - ₹20L / yr

Amazon Web Services (AWS)

Amazon Redshift

AWS Glue

Python

PySpark

Position: AWS Data Engineer

Experience: 5 to 7 Years

Location: Bengaluru, Pune, Chennai, Mumbai, Gurugram

Work Mode: Hybrid (3 days work from office per week)

Employment Type: Full-time

About the Role:

We are seeking a highly skilled and motivated AWS Data Engineer with 5–7 years of experience in building and optimizing data pipelines, architectures, and data sets. The ideal candidate will have strong experience with AWS services including Glue, Athena, Redshift, Lambda, DMS, RDS, and CloudFormation. You will be responsible for managing the full data lifecycle from ingestion to transformation and storage, ensuring efficiency and performance.

Key Responsibilities:

Design, develop, and optimize scalable ETL pipelines using AWS Glue, Python/PySpark, and SQL.
Work extensively with AWS services such as Glue, Athena, Lambda, DMS, RDS, Redshift, CloudFormation, and other serverless technologies.
Implement and manage data lake and warehouse solutions using AWS Redshift and S3.
Optimize data models and storage for cost-efficiency and performance.
Write advanced SQL queries to support complex data analysis and reporting requirements.
Collaborate with stakeholders to understand data requirements and translate them into scalable solutions.
Ensure high data quality and integrity across platforms and processes.
Implement CI/CD pipelines and best practices for infrastructure as code using CloudFormation or similar tools.

Required Skills & Experience:

Strong hands-on experience with Python or PySpark for data processing.
Deep knowledge of AWS Glue, Athena, Lambda, Redshift, RDS, DMS, and CloudFormation.
Proficiency in writing complex SQL queries and optimizing them for performance.
Familiarity with serverless architectures and AWS best practices.
Experience in designing and maintaining robust data architectures and data lakes.
Ability to troubleshoot and resolve data pipeline issues efficiently.
Strong communication and stakeholder management skills.

Position: AWS Data Engineer

Experience: 5 to 7 Years

Location: Bengaluru, Pune, Chennai, Mumbai, Gurugram

Work Mode: Hybrid (3 days work from office per week)

Employment Type: Full-time

About the Role:

Key Responsibilities:

Design, develop, and optimize scalable ETL pipelines using AWS Glue, Python/PySpark, and SQL.
Work extensively with AWS services such as Glue, Athena, Lambda, DMS, RDS, Redshift, CloudFormation, and other serverless technologies.
Implement and manage data lake and warehouse solutions using AWS Redshift and S3.
Optimize data models and storage for cost-efficiency and performance.
Write advanced SQL queries to support complex data analysis and reporting requirements.
Collaborate with stakeholders to understand data requirements and translate them into scalable solutions.
Ensure high data quality and integrity across platforms and processes.
Implement CI/CD pipelines and best practices for infrastructure as code using CloudFormation or similar tools.

Required Skills & Experience:

Strong hands-on experience with Python or PySpark for data processing.
Deep knowledge of AWS Glue, Athena, Lambda, Redshift, RDS, DMS, and CloudFormation.
Proficiency in writing complex SQL queries and optimizing them for performance.
Familiarity with serverless architectures and AWS best practices.
Experience in designing and maintaining robust data architectures and data lakes.
Ability to troubleshoot and resolve data pipeline issues efficiently.
Strong communication and stakeholder management skills.

AWS Data Engineer

at Deqode

1 recruiter

Posted by Roshni Maji

Bengaluru (Bangalore), Pune, Mumbai, Chennai, Gurugram

5 - 7 yrs

₹5L - ₹19L / yr

Python

PySpark

Amazon Web Services (AWS)

aws

Amazon Redshift

+1 more

Position: AWS Data Engineer

Experience: 5 to 7 Years

Location: Bengaluru, Pune, Chennai, Mumbai, Gurugram

Work Mode: Hybrid (3 days work from office per week)

Employment Type: Full-time

About the Role:

Key Responsibilities:

Design, develop, and optimize scalable ETL pipelines using AWS Glue, Python/PySpark, and SQL.
Work extensively with AWS services such as Glue, Athena, Lambda, DMS, RDS, Redshift, CloudFormation, and other serverless technologies.
Implement and manage data lake and warehouse solutions using AWS Redshift and S3.
Optimize data models and storage for cost-efficiency and performance.
Write advanced SQL queries to support complex data analysis and reporting requirements.
Collaborate with stakeholders to understand data requirements and translate them into scalable solutions.
Ensure high data quality and integrity across platforms and processes.
Implement CI/CD pipelines and best practices for infrastructure as code using CloudFormation or similar tools.

Required Skills & Experience:

Strong hands-on experience with Python or PySpark for data processing.
Deep knowledge of AWS Glue, Athena, Lambda, Redshift, RDS, DMS, and CloudFormation.
Proficiency in writing complex SQL queries and optimizing them for performance.
Familiarity with serverless architectures and AWS best practices.
Experience in designing and maintaining robust data architectures and data lakes.
Ability to troubleshoot and resolve data pipeline issues efficiently.
Strong communication and stakeholder management skills.

Position: AWS Data Engineer

Experience: 5 to 7 Years

Location: Bengaluru, Pune, Chennai, Mumbai, Gurugram

Work Mode: Hybrid (3 days work from office per week)

Employment Type: Full-time

About the Role:

Key Responsibilities:

Design, develop, and optimize scalable ETL pipelines using AWS Glue, Python/PySpark, and SQL.
Work extensively with AWS services such as Glue, Athena, Lambda, DMS, RDS, Redshift, CloudFormation, and other serverless technologies.
Implement and manage data lake and warehouse solutions using AWS Redshift and S3.
Optimize data models and storage for cost-efficiency and performance.
Write advanced SQL queries to support complex data analysis and reporting requirements.
Collaborate with stakeholders to understand data requirements and translate them into scalable solutions.
Ensure high data quality and integrity across platforms and processes.
Implement CI/CD pipelines and best practices for infrastructure as code using CloudFormation or similar tools.

Required Skills & Experience:

Strong hands-on experience with Python or PySpark for data processing.
Deep knowledge of AWS Glue, Athena, Lambda, Redshift, RDS, DMS, and CloudFormation.
Proficiency in writing complex SQL queries and optimizing them for performance.
Familiarity with serverless architectures and AWS best practices.
Experience in designing and maintaining robust data architectures and data lakes.
Ability to troubleshoot and resolve data pipeline issues efficiently.
Strong communication and stakeholder management skills.

Data Engineer - AWS

at Deqode

1 recruiter

Posted by Shraddha Katare

Bengaluru (Bangalore), Pune, Chennai, Mumbai, Gurugram

5 - 7 yrs

₹5L - ₹19L / yr

Amazon Web Services (AWS)

Python

PySpark

SQL

redshift

Profile: AWS Data Engineer

Mode- Hybrid

Experience- 5+7 years

Locations - Bengaluru, Pune, Chennai, Mumbai, Gurugram

Roles and Responsibilities

Design and maintain ETL pipelines using AWS Glue and Python/PySpark
Optimize SQL queries for Redshift and Athena
Develop Lambda functions for serverless data processing
Configure AWS DMS for database migration and replication
Implement infrastructure as code with CloudFormation
Build optimized data models for performance
Manage RDS databases and AWS service integrations
Troubleshoot and improve data processing efficiency
Gather requirements from business stakeholders
Implement data quality checks and validation
Document data pipelines and architecture
Monitor workflows and implement alerting
Keep current with AWS services and best practices

Required Technical Expertise:

Python/PySpark for data processing
AWS Glue for ETL operations
Redshift and Athena for data querying
AWS Lambda and serverless architecture
AWS DMS and RDS management
CloudFormation for infrastructure
SQL optimization and performance tuning

Profile: AWS Data Engineer

Mode- Hybrid

Experience- 5+7 years

Locations - Bengaluru, Pune, Chennai, Mumbai, Gurugram

Roles and Responsibilities

Design and maintain ETL pipelines using AWS Glue and Python/PySpark
Optimize SQL queries for Redshift and Athena
Develop Lambda functions for serverless data processing
Configure AWS DMS for database migration and replication
Implement infrastructure as code with CloudFormation
Build optimized data models for performance
Manage RDS databases and AWS service integrations
Troubleshoot and improve data processing efficiency
Gather requirements from business stakeholders
Implement data quality checks and validation
Document data pipelines and architecture
Monitor workflows and implement alerting
Keep current with AWS services and best practices

Required Technical Expertise:

Python/PySpark for data processing
AWS Glue for ETL operations
Redshift and Athena for data querying
AWS Lambda and serverless architecture
AWS DMS and RDS management
CloudFormation for infrastructure
SQL optimization and performance tuning

AWS Data Engineer

at Deqode

1 recruiter

Posted by Alisha Das

Pune, Mumbai, Bengaluru (Bangalore), Chennai

4 - 7 yrs

₹5L - ₹15L / yr

Amazon Web Services (AWS)

Python

PySpark

Glue semantics

Amazon Redshift

+1 more

Job Overview:

We are seeking an experienced AWS Data Engineer to join our growing data team. The ideal candidate will have hands-on experience with AWS Glue, Redshift, PySpark, and other AWS services to build robust, scalable data pipelines. This role is perfect for someone passionate about data engineering, automation, and cloud-native development.

Key Responsibilities:

Design, build, and maintain scalable and efficient ETL pipelines using AWS Glue, PySpark, and related tools.
Integrate data from diverse sources and ensure its quality, consistency, and reliability.
Work with large datasets in structured and semi-structured formats across cloud-based data lakes and warehouses.
Optimize and maintain data infrastructure, including Amazon Redshift, for high performance.
Collaborate with data analysts, data scientists, and product teams to understand data requirements and deliver solutions.
Automate data validation, transformation, and loading processes to support real-time and batch data processing.
Monitor and troubleshoot data pipeline issues and ensure smooth operations in production environments.

Required Skills:

5 to 7 years of hands-on experience in data engineering roles.
Strong proficiency in Python and PySpark for data transformation and scripting.
Deep understanding and practical experience with AWS Glue, AWS Redshift, S3, and other AWS data services.
Solid understanding of SQL and database optimization techniques.
Experience working with large-scale data pipelines and high-volume data environments.
Good knowledge of data modeling, warehousing, and performance tuning.

Preferred/Good to Have:

Experience with workflow orchestration tools like Airflow or Step Functions.
Familiarity with CI/CD for data pipelines.
Knowledge of data governance and security best practices on AWS.

Job Overview:

Key Responsibilities:

Design, build, and maintain scalable and efficient ETL pipelines using AWS Glue, PySpark, and related tools.
Integrate data from diverse sources and ensure its quality, consistency, and reliability.
Work with large datasets in structured and semi-structured formats across cloud-based data lakes and warehouses.
Optimize and maintain data infrastructure, including Amazon Redshift, for high performance.
Collaborate with data analysts, data scientists, and product teams to understand data requirements and deliver solutions.
Automate data validation, transformation, and loading processes to support real-time and batch data processing.
Monitor and troubleshoot data pipeline issues and ensure smooth operations in production environments.

Required Skills:

5 to 7 years of hands-on experience in data engineering roles.
Strong proficiency in Python and PySpark for data transformation and scripting.
Deep understanding and practical experience with AWS Glue, AWS Redshift, S3, and other AWS data services.
Solid understanding of SQL and database optimization techniques.
Experience working with large-scale data pipelines and high-volume data environments.
Good knowledge of data modeling, warehousing, and performance tuning.

Preferred/Good to Have:

Experience with workflow orchestration tools like Airflow or Step Functions.
Familiarity with CI/CD for data pipelines.
Knowledge of data governance and security best practices on AWS.

Data Engineer

at ZeMoSo Technologies

11 recruiters

Agency job

via TIGI HR Solution Pvt. Ltd. by Vaidehi Sarkar

Mumbai, Bengaluru (Bangalore), Hyderabad, Chennai, Pune

4 - 8 yrs

₹10L - ₹15L / yr

Data engineering

Python

SQL

Data Warehouse (DWH)

Amazon Web Services (AWS)

+3 more

Work Mode: Hybrid

Need B.Tech, BE, M.Tech, ME candidates - Mandatory

Must-Have Skills:

● Educational Qualification :- B.Tech, BE, M.Tech, ME in any field.

● Minimum of 3 years of proven experience as a Data Engineer.

● Strong proficiency in Python programming language and SQL.

● Experience in DataBricks and setting up and managing data pipelines, data warehouses/lakes.

● Good comprehension and critical thinking skills.

● Kindly note Salary bracket will vary according to the exp. of the candidate -

- Experience from 4 yrs to 6 yrs - Salary upto 22 LPA

- Experience from 5 yrs to 8 yrs - Salary upto 30 LPA

- Experience more than 8 yrs - Salary upto 40 LPA

Work Mode: Hybrid

Need B.Tech, BE, M.Tech, ME candidates - Mandatory

Must-Have Skills:

● Educational Qualification :- B.Tech, BE, M.Tech, ME in any field.

● Minimum of 3 years of proven experience as a Data Engineer.

● Strong proficiency in Python programming language and SQL.

● Experience in DataBricks and setting up and managing data pipelines, data warehouses/lakes.

● Good comprehension and critical thinking skills.

● Kindly note Salary bracket will vary according to the exp. of the candidate -

- Experience from 4 yrs to 6 yrs - Salary upto 22 LPA

- Experience from 5 yrs to 8 yrs - Salary upto 30 LPA

- Experience more than 8 yrs - Salary upto 40 LPA

GCP Senior Data Engineer

at Xebia IT Architects

2 recruiters

Posted by Vijay S

Bengaluru (Bangalore), Gurugram, Pune, Hyderabad, Chennai, Bhopal, Jaipur

10 - 15 yrs

₹30L - ₹40L / yr

Spark

Google Cloud Platform (GCP)

Python

Apache Airflow

PySpark

+1 more

We are looking for a Senior Data Engineer with strong expertise in GCP, Databricks, and Airflow to design and implement a GCP Cloud Native Data Processing Framework. The ideal candidate will work on building scalable data pipelines and help migrate existing workloads to a modern framework.

Shift: 2 PM 11 PM
Work Mode: Hybrid (3 days a week) across Xebia locations
Notice Period: Immediate joiners or those with a notice period of up to 30 days

Key Responsibilities:

Design and implement a GCP Native Data Processing Framework leveraging Spark and GCP Cloud Services.
Develop and maintain data pipelines using Databricks and Airflow for transforming Raw → Silver → Gold data layers.
Ensure data integrity, consistency, and availability across all systems.
Collaborate with data engineers, analysts, and stakeholders to optimize performance.
Document standards and best practices for data engineering workflows.

Required Experience:

7-8 years of experience in data engineering, architecture, and pipeline development.
Strong knowledge of GCP, Databricks, PySpark, and BigQuery.
Experience with Orchestration tools like Airflow, Dagster, or GCP equivalents.
Understanding of Data Lake table formats (Delta, Iceberg, etc.).
Proficiency in Python for scripting and automation.
Strong problem-solving skills and collaborative mindset.

⚠️ Please apply only if you have not applied recently or are not currently in the interview process for any open roles at Xebia.

Looking forward to your response!

Best regards,

Vijay S

Assistant Manager - TAG

https://www.linkedin.com/in/vijay-selvarajan/

Shift: 2 PM 11 PM
Work Mode: Hybrid (3 days a week) across Xebia locations
Notice Period: Immediate joiners or those with a notice period of up to 30 days

Key Responsibilities:

Design and implement a GCP Native Data Processing Framework leveraging Spark and GCP Cloud Services.
Develop and maintain data pipelines using Databricks and Airflow for transforming Raw → Silver → Gold data layers.
Ensure data integrity, consistency, and availability across all systems.
Collaborate with data engineers, analysts, and stakeholders to optimize performance.
Document standards and best practices for data engineering workflows.

Required Experience:

7-8 years of experience in data engineering, architecture, and pipeline development.
Strong knowledge of GCP, Databricks, PySpark, and BigQuery.
Experience with Orchestration tools like Airflow, Dagster, or GCP equivalents.
Understanding of Data Lake table formats (Delta, Iceberg, etc.).
Proficiency in Python for scripting and automation.
Strong problem-solving skills and collaborative mindset.

⚠️ Please apply only if you have not applied recently or are not currently in the interview process for any open roles at Xebia.

Looking forward to your response!

Best regards,

Vijay S

Assistant Manager - TAG

https://www.linkedin.com/in/vijay-selvarajan/

AWS Data Engineer (Contractual)

at Forward Eye Technologies

Posted by Jaya S

Bengaluru (Bangalore), Mumbai, Delhi, Gurugram, Pune, Hyderabad, Ahmedabad, Chennai

3 - 7 yrs

₹8L - ₹15L / yr

AWS Lambda

Amazon S3

Amazon VPC

Amazon EC2

Amazon Redshift

+3 more

Technical Skills:

Ability to understand and translate business requirements into design.
Proficient in AWS infrastructure components such as S3, IAM, VPC, EC2, and Redshift.
Experience in creating ETL jobs using Python/PySpark.
Proficiency in creating AWS Lambda functions for event-based jobs.
Knowledge of automating ETL processes using AWS Step Functions.
Competence in building data warehouses and loading data into them.

Responsibilities:

Understand business requirements and translate them into design.
Assess AWS infrastructure needs for development work.
Develop ETL jobs using Python/PySpark to meet requirements.
Implement AWS Lambda for event-based tasks.
Automate ETL processes using AWS Step Functions.
Build data warehouses and manage data loading.
Engage with customers and stakeholders to articulate the benefits of proposed solutions and frameworks.

Technical Skills:

Ability to understand and translate business requirements into design.
Proficient in AWS infrastructure components such as S3, IAM, VPC, EC2, and Redshift.
Experience in creating ETL jobs using Python/PySpark.
Proficiency in creating AWS Lambda functions for event-based jobs.
Knowledge of automating ETL processes using AWS Step Functions.
Competence in building data warehouses and loading data into them.

Responsibilities:

Understand business requirements and translate them into design.
Assess AWS infrastructure needs for development work.
Develop ETL jobs using Python/PySpark to meet requirements.
Implement AWS Lambda for event-based tasks.
Automate ETL processes using AWS Step Functions.
Build data warehouses and manage data loading.
Engage with customers and stakeholders to articulate the benefits of proposed solutions and frameworks.

Data Engineer

one-to-one, one-to-many, and many-to-many

Agency job

via The Hub by Sridevi Viswanathan

Chennai

5 - 10 yrs

₹1L - ₹15L / yr

AWS CloudFormation

Python

PySpark

AWS Lambda

5-7 years of experience in Data Engineering with solid experience in design, development and implementation of end-to-end data ingestion and data processing system in AWS platform.

2-3 years of experience in AWS Glue, Lambda, Appflow, EventBridge, Python, PySpark, Lake House, S3, Redshift, Postgres, API Gateway, CloudFormation, Kinesis, Athena, KMS, IAM.

Experience in modern data architecture, Lake House, Enterprise Data Lake, Data Warehouse, API interfaces, solution patterns, standards and optimizing data ingestion.

Experience in build of data pipelines from source systems like SAP Concur, Veeva Vault, Azure Cost, various social media platforms or similar source systems.

Expertise in analyzing source data and designing a robust and scalable data ingestion framework and pipelines adhering to client Enterprise Data Architecture guidelines.

Proficient in design and development of solutions for real-time (or near real time) stream data processing as well as batch processing on the AWS platform.

Work closely with business analysts, data architects, data engineers, and data analysts to ensure that the data ingestion solutions meet the needs of the business.

Troubleshoot and provide support for issues related to data quality and data ingestion solutions. This may involve debugging data pipeline processes, optimizing queries, or troubleshooting application performance issues.

Experience in working in Agile/Scrum methodologies, CI/CD tools and practices, coding standards, code reviews, source management (GITHUB), JIRA, JIRA Xray and Confluence.

Experience or exposure to design and development using Full Stack tools.

Strong analytical and problem-solving skills, excellent communication (written and oral), and interpersonal skills.

Bachelor's or master's degree in computer science or related field.