PySpark Jobs in Chennai

29+ PySpark Jobs in Chennai | PySpark Job openings in Chennai

Apply to 29+ PySpark Jobs in Chennai on CutShort.io. Explore the latest PySpark Job opportunities across top companies like Google, Amazon & Adobe.

Data Engineer

at Moative

Posted by Eman Khan

Chennai

3 - 5 yrs

₹10L - ₹25L / yr

Python

PySpark

Scala

Data engineering

ETL

About Moative

Moative, an Applied AI company, designs and builds transformational AI solutions for traditional industries in energy, utilities, healthcare & life sciences, and more. Through Moative Labs, we build AI micro-products and launch AI startups with partners in vertical markets that align with our theses.

Our Past: We have built and sold two companies, one of which was an AI company. Our founders and leaders are Math PhDs, Ivy League University Alumni, Ex-Googlers, and successful entrepreneurs.

Our Team: Our team of 20+ employees consists of data scientists, AI/ML engineers, and mathematicians, including Ph.D.s, from top engineering and research institutes such as IITs, CERN, IISc, and UZH. Our team includes academicians, IBM Research Fellows, and former founders.

Work you’ll do

As a Data Engineer, you will work on data architecture, large-scale processing systems, and data flow management. You will build and maintain optimal data architecture and data pipelines, assemble large, complex data sets, and ensure that data is readily available to data scientists, analysts, and other users. In close collaboration with ML engineers, data scientists, and domain experts, you’ll deliver robust, production-grade solutions that directly impact business outcomes. Ultimately, you will be responsible for developing and implementing systems that optimize the organization’s data use and data quality.

Responsibilities

Create and maintain optimal data architecture and data pipelines on cloud infrastructure (such as AWS, Azure, or GCP)
Assemble large, complex data sets that meet functional and non-functional business requirements
Identify, design, and implement internal process improvements
Build the pipeline infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources
Support development of analytics that utilize the data pipeline to provide actionable insights into key business metrics
Work with stakeholders to assist with data-related technical issues and support their data infrastructure needs

Who you are

You are a passionate and results-oriented engineer who understands the importance of data architecture and data quality to impact solution development, enhance products, and ultimately improve business applications. You thrive in dynamic environments and are comfortable navigating ambiguity. You possess a strong sense of ownership and are eager to take initiative, advocating for your technical decisions while remaining open to feedback and collaboration.

You have experience in developing and deploying data pipelines to support real-world applications. You have a good understanding of data structures and are excellent at writing clean, efficient code to extract, create and manage large data sets for analytical uses. You have the ability to conduct regular testing and debugging to ensure optimal data pipeline performance. You are excited at the possibility of contributing to intelligent applications that can directly impact business services and make a positive difference to users.

Skills & Requirements

3+ years of hands-on experience as a data engineer, data architect or similar role, with a good understanding of data structures and data engineering.
Solid knowledge of cloud infrastructure and data-related services on AWS (EC2, EMR, RDS, Redshift) and/or Azure.
Advanced knowledge of SQL, including writing complex queries, stored procedures, views, etc.
Strong experience with data pipeline and workflow management tools (such as Luigi or Airflow).
Experience with common relational SQL, NoSQL, and graph databases.
Strong experience with programming languages and frameworks such as Python, PySpark, and Scala.
Practical experience with basic DevOps concepts: CI/CD, containerization (Docker, Kubernetes), etc.
Experience with big data tools (Spark, Kafka, etc.) and stream processing.
Excellent communication skills to collaborate with colleagues from both technical and business backgrounds, discuss and convey ideas and findings effectively.
Ability to analyze complex problems, think critically for troubleshooting and develop robust data solutions.
Ability to identify and tackle issues efficiently and proactively, conduct thorough research and collaborate to find long-term, scalable solutions.

Working at Moative

Moative is a young company, but we believe strongly in thinking long-term while acting with urgency. Our ethos is rooted in innovation, efficiency, and high-quality outcomes. We believe the future of work is AI-augmented and boundaryless. Here are some of our guiding principles:

Think in decades. Act in hours. As an independent company, our moat is time. While our decisions are for the long-term horizon, our execution will be fast – measured in hours and days, not weeks and months.
Own the canvas. Throw yourself in to build, fix or improve – anything that isn’t done right, irrespective of who did it. Be selfish about improving across the organization – because once the rot sets in, we waste years in surgery and recovery.
Use data or don’t use data. Use data where you ought to but not as a ‘cover-my-back’ political tool. Be capable of making decisions with partial or limited data. Get better at intuition and pattern-matching. Whichever way you go, be mostly right about it.
Avoid work about work. Process creeps on purpose, unless we constantly question it. We are deliberate about committing to rituals that take time away from the actual work. We truly believe that a meeting that could be an email, should be an email and you don’t need a person with the highest title to say that out loud.
High revenue per person. We work backwards from this metric. Our default is to automate instead of hiring. We multi-skill our people so they own more outcomes, rather than hiring someone who has less to do. We don’t like the squatting and hoarding that come in the form of hiring for growth. High revenue per person comes from high-quality work from everyone. We demand it.

If this role and our work are of interest to you, please apply. We encourage you to apply even if you believe you do not meet all the requirements listed above.

That said, you should demonstrate that you are in the 90th percentile or above. This may mean that you have studied at top-notch institutions, won intellectually demanding competitions, built something of your own, or been rated as an outstanding performer by your current or previous employers.

The position is based out of Chennai. Our work currently involves significant in-person collaboration and we expect you to work out of our offices in Chennai.

PySpark/Scala Developer

at Tata Consultancy Services

2 recruiters

Agency job

via Risk Resources LLP hyd by Jhansi Padiy

Chennai, Hyderabad, Kolkata, Delhi, Pune, Bengaluru (Bangalore)

4 - 10 yrs

₹6L - ₹30L / yr

Scala

PySpark

Spark

Amazon Web Services (AWS)

Job Title: PySpark/Scala Developer

Functional Skills: Experience in Credit Risk/Regulatory risk domain

Technical Skills: Spark, PySpark, Python, Hive, Scala, MapReduce, Unix shell scripting

Good to Have Skills: Exposure to Machine Learning Techniques

Job Description:

5+ years of experience developing, fine-tuning, and implementing programs/applications using Python/PySpark/Scala on Big Data/Hadoop platforms.

Roles and Responsibilities:

a) Work with a leading bank’s Risk Management team on specific projects/requirements pertaining to risk models in consumer and wholesale banking

b) Enhance Machine Learning Models using PySpark or Scala

c) Work with data scientists to build ML models based on business requirements, and follow the ML lifecycle to deploy them all the way to the production environment

d) Participate in feature engineering, model training, scoring, and retraining

e) Architect Data Pipeline and Automate Data Ingestion and Model Jobs

Skills and competencies:

Required:

· Strong analytical skills in conducting sophisticated statistical analysis using bureau/vendor data, customer performance data, and macroeconomic data to solve business problems.

· Working experience in PySpark and Scala, developing code to validate and implement models in credit risk/banking

· Experience with distributed systems such as Hadoop/MapReduce, Spark, streaming data processing, cloud architecture.

· Familiarity with machine learning frameworks and libraries (such as scikit-learn, Spark ML, TensorFlow, PyTorch, etc.)

· Experience in systems integration, web services, and batch processing

· Experience migrating code to PySpark/Scala is a big plus

· The ability to act as a liaison, conveying the information needs of the business to IT and data constraints to the business, giving equal attention to business strategy and IT strategy, business processes, and workflow

· Flexibility in approach and thought process

· Willingness to learn and understand periodic changes in regulatory requirements as per the FED

PySpark/Scala Developer

at Tata Consultancy Services

2 recruiters

Agency job

via Risk Resources LLP hyd by susmitha o

Bengaluru (Bangalore), Hyderabad, Pune, Delhi, Kolkata, Chennai

5 - 8 yrs

₹7L - ₹30L / yr

Scala

Python

PySpark

Apache Hive

Spark

Skills and competencies:

Required:

· Strong analytical skills in conducting sophisticated statistical analysis using bureau/vendor data, customer performance data, and macroeconomic data to solve business problems.

· Working experience in PySpark and Scala, developing code to validate and implement models in credit risk/banking

· Experience with distributed systems such as Hadoop/MapReduce, Spark, streaming data processing, cloud architecture.

· Familiarity with machine learning frameworks and libraries (such as scikit-learn, Spark ML, TensorFlow, PyTorch, etc.)

· Experience in systems integration, web services, and batch processing

· Experience migrating code to PySpark/Scala is a big plus

· The ability to act as a liaison, conveying the information needs of the business to IT and data constraints to the business, giving equal attention to business strategy and IT strategy, business processes, and workflow

· Flexibility in approach and thought process

· Willingness to learn and understand periodic changes in regulatory requirements as per the FED

Solution/Technical Architect (Databricks)

at Quintica

Posted by Nitin D

Remote, Bengaluru (Bangalore), Pune, Chennai, Nagpur

5 - 15 yrs

₹20L - ₹30L / yr

databricks

PySpark

Apache Spark

CI/CD

Data engineering

Technical Architect (Databricks)

10+ years of data engineering experience, with expertise in Databricks
3+ years of consulting experience
Completed Data Engineering Professional certification & required classes
Minimum 2-3 projects delivered with hands-on experience in Databricks
Completed Apache Spark Programming with Databricks, Data Engineering with Databricks, Optimizing Apache Spark™ on Databricks
Experience in Spark and/or Hadoop, Flink, Presto, and other popular big data engines
Familiarity with Databricks multi-hop pipeline architecture

Sr. Data Engineer (Databricks)

5+ years of data engineering experience, with expertise in Databricks
Completed Data Engineering Associate certification & required classes
Minimum 1 project delivered with hands-on experience in development on Databricks
Completed Apache Spark Programming with Databricks, Data Engineering with Databricks, Optimizing Apache Spark™ on Databricks
SQL delivery experience, and familiarity with BigQuery, Synapse, or Redshift
Proficient in Python, with knowledge of additional Databricks programming languages (Scala)

AWS Data Engineer

at VyTCDC

Posted by Gobinath Sundaram

Chennai, Bengaluru (Bangalore), Hyderabad, Mumbai, Pune, Noida

4 - 6 yrs

₹3L - ₹21L / yr

AWS Data Engineer

Amazon Web Services (AWS)

Python

PySpark

databricks

Key Responsibilities

Design and implement ETL/ELT pipelines using Databricks, PySpark, and AWS Glue
Develop and maintain scalable data architectures on AWS (S3, EMR, Lambda, Redshift, RDS)
Perform data wrangling, cleansing, and transformation using Python and SQL
Collaborate with data scientists to integrate Generative AI models into analytics workflows
Build dashboards and reports to visualize insights using tools like Power BI or Tableau
Ensure data quality, governance, and security across all data assets
Optimize performance of data pipelines and troubleshoot bottlenecks
Work closely with stakeholders to understand data requirements and deliver actionable insights

🧪 Required Skills

Cloud Platforms: AWS (S3, Lambda, Glue, EMR, Redshift)
Big Data: Databricks, Apache Spark, PySpark
Programming: Python, SQL
Data Engineering: ETL/ELT, Data Lakes, Data Warehousing
Analytics: Data Modeling, Visualization, BI Reporting
Gen AI Integration: OpenAI, Hugging Face, LangChain (preferred)
DevOps (Bonus): Git, Jenkins, Terraform, Docker

📚 Qualifications

Bachelor's or Master’s degree in Computer Science, Data Science, or related field
3+ years of experience in data engineering or data analytics
Hands-on experience with Databricks, PySpark, and AWS
Familiarity with Generative AI tools and frameworks is a strong plus
Strong problem-solving and communication skills

🌟 Preferred Traits

Analytical mindset with attention to detail
Passion for data and emerging technologies
Ability to work independently and in cross-functional teams
Eagerness to learn and adapt in a fast-paced environment

AWS Data Engineer

at Deqode

1 recruiter

Posted by Alisha Das

Bengaluru (Bangalore), Mumbai, Pune, Chennai, Gurugram

5.6 - 7 yrs

₹10L - ₹28L / yr

Amazon Web Services (AWS)

Python

PySpark

SQL

Job Summary:

As an AWS Data Engineer, you will be responsible for designing, developing, and maintaining scalable, high-performance data pipelines using AWS services. With 6+ years of experience, you’ll collaborate closely with data architects, analysts, and business stakeholders to build reliable, secure, and cost-efficient data infrastructure across the organization.

Key Responsibilities:

Design, develop, and manage scalable data pipelines using AWS Glue, Lambda, and other serverless technologies
Implement ETL workflows and transformation logic using PySpark and Python on AWS Glue
Leverage AWS Redshift for warehousing, performance tuning, and large-scale data queries
Work with AWS DMS and RDS for database integration and migration
Optimize data flows and system performance for speed and cost-effectiveness
Deploy and manage infrastructure using AWS CloudFormation templates
Collaborate with cross-functional teams to gather requirements and build robust data solutions
Ensure data integrity, quality, and security across all systems and processes

Required Skills & Experience:

6+ years of experience in Data Engineering with strong AWS expertise
Proficient in Python and PySpark for data processing and ETL development
Hands-on experience with AWS Glue, Lambda, DMS, RDS, and Redshift
Strong SQL skills for building complex queries and performing data analysis
Familiarity with AWS CloudFormation and infrastructure as code principles
Good understanding of serverless architecture and cost-optimized design
Ability to write clean, modular, and maintainable code
Strong analytical thinking and problem-solving skills

ETL Automation Tester

at E2E Infoware Management Services

Posted by Monika S

Bengaluru (Bangalore), Pune, Chennai

5 - 12 yrs

₹5L - ₹25L / yr

PySpark

Automation

SQL

Skill Name: ETL Automation Testing

Location: Bangalore, Chennai and Pune

Experience: 5+ Years

Required:

Experience in ETL Automation Testing

Strong experience in PySpark.

AWS Data Engineer

at Deqode

1 recruiter

Posted by Roshni Maji

Pune, Bengaluru (Bangalore), Gurugram, Chennai, Mumbai

5 - 7 yrs

₹6L - ₹20L / yr

Amazon Web Services (AWS)

Amazon Redshift

AWS Glue

Python

PySpark

Position: AWS Data Engineer

Experience: 5 to 7 Years

Location: Bengaluru, Pune, Chennai, Mumbai, Gurugram

Work Mode: Hybrid (3 days work from office per week)

Employment Type: Full-time

About the Role:

We are seeking a highly skilled and motivated AWS Data Engineer with 5–7 years of experience in building and optimizing data pipelines, architectures, and data sets. The ideal candidate will have strong experience with AWS services including Glue, Athena, Redshift, Lambda, DMS, RDS, and CloudFormation. You will be responsible for managing the full data lifecycle from ingestion to transformation and storage, ensuring efficiency and performance.

Key Responsibilities:

Design, develop, and optimize scalable ETL pipelines using AWS Glue, Python/PySpark, and SQL.
Work extensively with AWS services such as Glue, Athena, Lambda, DMS, RDS, Redshift, CloudFormation, and other serverless technologies.
Implement and manage data lake and warehouse solutions using AWS Redshift and S3.
Optimize data models and storage for cost-efficiency and performance.
Write advanced SQL queries to support complex data analysis and reporting requirements.
Collaborate with stakeholders to understand data requirements and translate them into scalable solutions.
Ensure high data quality and integrity across platforms and processes.
Implement CI/CD pipelines and best practices for infrastructure as code using CloudFormation or similar tools.

Required Skills & Experience:

Strong hands-on experience with Python or PySpark for data processing.
Deep knowledge of AWS Glue, Athena, Lambda, Redshift, RDS, DMS, and CloudFormation.
Proficiency in writing complex SQL queries and optimizing them for performance.
Familiarity with serverless architectures and AWS best practices.
Experience in designing and maintaining robust data architectures and data lakes.
Ability to troubleshoot and resolve data pipeline issues efficiently.
Strong communication and stakeholder management skills.

AWS Data Engineer

at Deqode

1 recruiter

Posted by Roshni Maji

Bengaluru (Bangalore), Pune, Mumbai, Chennai, Gurugram

5 - 7 yrs

₹5L - ₹19L / yr

Python

PySpark

Amazon Web Services (AWS)

aws

Amazon Redshift

Position: AWS Data Engineer

Experience: 5 to 7 Years

Location: Bengaluru, Pune, Chennai, Mumbai, Gurugram

Work Mode: Hybrid (3 days work from office per week)

Employment Type: Full-time

About the Role:

Key Responsibilities:

Design, develop, and optimize scalable ETL pipelines using AWS Glue, Python/PySpark, and SQL.
Work extensively with AWS services such as Glue, Athena, Lambda, DMS, RDS, Redshift, CloudFormation, and other serverless technologies.
Implement and manage data lake and warehouse solutions using AWS Redshift and S3.
Optimize data models and storage for cost-efficiency and performance.
Write advanced SQL queries to support complex data analysis and reporting requirements.
Collaborate with stakeholders to understand data requirements and translate them into scalable solutions.
Ensure high data quality and integrity across platforms and processes.
Implement CI/CD pipelines and best practices for infrastructure as code using CloudFormation or similar tools.

Required Skills & Experience:

Strong hands-on experience with Python or PySpark for data processing.
Deep knowledge of AWS Glue, Athena, Lambda, Redshift, RDS, DMS, and CloudFormation.
Proficiency in writing complex SQL queries and optimizing them for performance.
Familiarity with serverless architectures and AWS best practices.
Experience in designing and maintaining robust data architectures and data lakes.
Ability to troubleshoot and resolve data pipeline issues efficiently.
Strong communication and stakeholder management skills.

Data Engineer - AWS

at Deqode

1 recruiter

Posted by Shraddha Katare

Bengaluru (Bangalore), Pune, Chennai, Mumbai, Gurugram

5 - 7 yrs

₹5L - ₹19L / yr

Amazon Web Services (AWS)

Python

PySpark

SQL

redshift

Profile: AWS Data Engineer

Mode: Hybrid

Experience: 5-7 years

Locations: Bengaluru, Pune, Chennai, Mumbai, Gurugram

Roles and Responsibilities

Design and maintain ETL pipelines using AWS Glue and Python/PySpark
Optimize SQL queries for Redshift and Athena
Develop Lambda functions for serverless data processing
Configure AWS DMS for database migration and replication
Implement infrastructure as code with CloudFormation
Build optimized data models for performance
Manage RDS databases and AWS service integrations
Troubleshoot and improve data processing efficiency
Gather requirements from business stakeholders
Implement data quality checks and validation
Document data pipelines and architecture
Monitor workflows and implement alerting
Keep current with AWS services and best practices

Required Technical Expertise:

Python/PySpark for data processing
AWS Glue for ETL operations
Redshift and Athena for data querying
AWS Lambda and serverless architecture
AWS DMS and RDS management
CloudFormation for infrastructure
SQL optimization and performance tuning

AWS Data Engineer

at Deqode

1 recruiter

Posted by Alisha Das

Pune, Mumbai, Bengaluru (Bangalore), Chennai

4 - 7 yrs

₹5L - ₹15L / yr

Amazon Web Services (AWS)

Python

PySpark

Glue semantics

Amazon Redshift

+1 more

Job Overview:

We are seeking an experienced AWS Data Engineer to join our growing data team. The ideal candidate will have hands-on experience with AWS Glue, Redshift, PySpark, and other AWS services to build robust, scalable data pipelines. This role is perfect for someone passionate about data engineering, automation, and cloud-native development.

Key Responsibilities:

Design, build, and maintain scalable and efficient ETL pipelines using AWS Glue, PySpark, and related tools.
Integrate data from diverse sources and ensure its quality, consistency, and reliability.
Work with large datasets in structured and semi-structured formats across cloud-based data lakes and warehouses.
Optimize and maintain data infrastructure, including Amazon Redshift, for high performance.
Collaborate with data analysts, data scientists, and product teams to understand data requirements and deliver solutions.
Automate data validation, transformation, and loading processes to support real-time and batch data processing.
Monitor and troubleshoot data pipeline issues and ensure smooth operations in production environments.

Required Skills:

5 to 7 years of hands-on experience in data engineering roles.
Strong proficiency in Python and PySpark for data transformation and scripting.
Deep understanding of, and practical experience with, AWS Glue, Amazon Redshift, S3, and other AWS data services.
Solid understanding of SQL and database optimization techniques.
Experience working with large-scale data pipelines and high-volume data environments.
Good knowledge of data modeling, warehousing, and performance tuning.

Preferred/Good to Have:

Experience with workflow orchestration tools like Airflow or Step Functions.
Familiarity with CI/CD for data pipelines.
Knowledge of data governance and security best practices on AWS.

Data Engineer

at ZeMoSo Technologies

11 recruiters

Agency job

via TIGI HR Solution Pvt. Ltd. by Vaidehi Sarkar

Mumbai, Bengaluru (Bangalore), Hyderabad, Chennai, Pune

4 - 8 yrs

₹10L - ₹15L / yr

Data engineering

Python

SQL

Data Warehouse (DWH)

Amazon Web Services (AWS)

+3 more

Work Mode: Hybrid

A B.Tech, BE, M.Tech, or ME degree is mandatory.

Must-Have Skills:

● Educational qualification: B.Tech, BE, M.Tech, or ME in any field.

● Minimum of 3 years of proven experience as a Data Engineer.

● Strong proficiency in Python programming language and SQL.

● Experience with Databricks and with setting up and managing data pipelines and data warehouses/lakes.

● Good comprehension and critical thinking skills.

● Please note that the salary bracket varies with the candidate's experience:

- Experience from 4 to 6 years: salary up to 22 LPA

- Experience from 5 to 8 years: salary up to 30 LPA

- Experience of more than 8 years: salary up to 40 LPA

GCP Senior Data Engineer

at Xebia IT Architects

2 recruiters

Posted by Vijay S

Bengaluru (Bangalore), Gurugram, Pune, Hyderabad, Chennai, Bhopal, Jaipur

10 - 15 yrs

₹30L - ₹40L / yr

Spark

Google Cloud Platform (GCP)

Python

Apache Airflow

PySpark

+1 more

We are looking for a Senior Data Engineer with strong expertise in GCP, Databricks, and Airflow to design and implement a GCP Cloud Native Data Processing Framework. The ideal candidate will work on building scalable data pipelines and help migrate existing workloads to a modern framework.

Shift: 2 PM to 11 PM
Work Mode: Hybrid (3 days a week) across Xebia locations
Notice Period: Immediate joiners or those with a notice period of up to 30 days

Key Responsibilities:

Design and implement a GCP Native Data Processing Framework leveraging Spark and GCP Cloud Services.
Develop and maintain data pipelines using Databricks and Airflow for transforming Raw → Silver → Gold data layers.
Ensure data integrity, consistency, and availability across all systems.
Collaborate with data engineers, analysts, and stakeholders to optimize performance.
Document standards and best practices for data engineering workflows.

Required Experience:

7-8 years of experience in data engineering, architecture, and pipeline development.
Strong knowledge of GCP, Databricks, PySpark, and BigQuery.
Experience with orchestration tools such as Airflow, Dagster, or GCP equivalents.
Understanding of data lake table formats (Delta, Iceberg, etc.).
Proficiency in Python for scripting and automation.
Strong problem-solving skills and collaborative mindset.

⚠️ Please apply only if you have not applied recently or are not currently in the interview process for any open roles at Xebia.

Looking forward to your response!

Best regards,

Vijay S

Assistant Manager - TAG

https://www.linkedin.com/in/vijay-selvarajan/

AWS Data Engineer (Contractual)

at Forward Eye Technologies

Posted by Jaya S

Bengaluru (Bangalore), Mumbai, Delhi, Gurugram, Pune, Hyderabad, Ahmedabad, Chennai

3 - 7 yrs

₹8L - ₹15L / yr

AWS Lambda

Amazon S3

Amazon VPC

Amazon EC2

Amazon Redshift

+3 more

Technical Skills:

Ability to understand and translate business requirements into design.
Proficient in AWS infrastructure components such as S3, IAM, VPC, EC2, and Redshift.
Experience in creating ETL jobs using Python/PySpark.
Proficiency in creating AWS Lambda functions for event-based jobs.
Knowledge of automating ETL processes using AWS Step Functions.
Competence in building data warehouses and loading data into them.

Responsibilities:

Understand business requirements and translate them into design.
Assess AWS infrastructure needs for development work.
Develop ETL jobs using Python/PySpark to meet requirements.
Implement AWS Lambda for event-based tasks.
Automate ETL processes using AWS Step Functions.
Build data warehouses and manage data loading.
Engage with customers and stakeholders to articulate the benefits of proposed solutions and frameworks.

Data Engineer

one-to-one, one-to-many, and many-to-many

Agency job

via The Hub by Sridevi Viswanathan

Chennai

5 - 10 yrs

₹1L - ₹15L / yr

AWS CloudFormation

Python

PySpark

AWS Lambda

5-7 years of experience in data engineering, with solid experience in the design, development, and implementation of end-to-end data ingestion and data processing systems on the AWS platform.

2-3 years of experience with AWS Glue, Lambda, AppFlow, EventBridge, Python, PySpark, Lake House, S3, Redshift, Postgres, API Gateway, CloudFormation, Kinesis, Athena, KMS, and IAM.

Experience in modern data architecture, Lake House, Enterprise Data Lake, Data Warehouse, API interfaces, solution patterns, standards and optimizing data ingestion.

Experience building data pipelines from source systems such as SAP Concur, Veeva Vault, Azure Cost, various social media platforms, or similar systems.

Expertise in analyzing source data and designing a robust, scalable data ingestion framework and pipelines that adhere to the client's Enterprise Data Architecture guidelines.

Proficiency in designing and developing solutions for real-time (or near-real-time) stream processing as well as batch processing on the AWS platform.

Work closely with business analysts, data architects, data engineers, and data analysts to ensure that the data ingestion solutions meet the needs of the business.

Troubleshoot and provide support for issues related to data quality and data ingestion solutions. This may involve debugging data pipeline processes, optimizing queries, or troubleshooting application performance issues.

Experience working with Agile/Scrum methodologies, CI/CD tools and practices, coding standards, code reviews, source control (GitHub), Jira, Jira Xray, and Confluence.

Experience with, or exposure to, design and development using full-stack tools.

Strong analytical and problem-solving skills, excellent communication (written and oral), and interpersonal skills.

Bachelor's or master's degree in computer science or related field.