PySpark Jobs in Pune

47+ PySpark Jobs in Pune | PySpark Job openings in Pune

Apply to 47+ PySpark Jobs in Pune on CutShort.io. Explore the latest PySpark Job opportunities across top companies like Google, Amazon & Adobe.

Azure Data Engineer

at Deqode

1 recruiter

Posted by Shraddha Katare

Pune, Gurugram, Jaipur, Bhopal

5 - 8 yrs

₹10L - ₹18L / yr

Data engineering

databricks

Data Structures

Python

PySpark

Job Description -

Position: Senior Data Engineer (Azure)

Experience - 6+ Years

Mode - Hybrid

Location - Gurgaon, Pune, Jaipur, Bangalore, Bhopal

Key Responsibilities:

Data Processing on Azure: Azure Data Factory, Azure Stream Analytics, Event Hubs, Azure Databricks, Data Migration Service, Data Pipeline.
Provisioning, configuring, and developing Azure solutions (ADB, ADF, ADW, etc.).
Design and implement scalable data models and migration strategies.
Work on distributed big data batch or streaming pipelines (Kafka or similar).
Develop data integration and transformation solutions for structured and unstructured data.
Collaborate with cross-functional teams for performance tuning and optimization.
Monitor data workflows and ensure compliance with data governance and quality standards.
Contribute to continuous improvement through automation and DevOps practices.

Required Skills & Experience:

6–10 years of experience as a Data Engineer.
Strong proficiency in Azure Databricks, PySpark, Python, SQL, and Azure Data Factory.
Experience in Data Modelling, Data Migration, and Data Warehousing.
Good understanding of database structure principles and schema design.
Hands-on experience using MS SQL Server, Oracle, or similar RDBMS platforms.
Experience in DevOps tools (Azure DevOps, Jenkins, Airflow, Azure Monitor) – good to have.
Knowledge of distributed data processing and real-time streaming (Kafka/Event Hub).
Familiarity with visualization tools like Power BI or Tableau.
Strong analytical, problem-solving, and debugging skills.
Self-motivated, detail-oriented, and capable of managing priorities effectively.

Databricks Admin

One of the reputed clients in India

Agency job

via Evalutech Prospect Services Private Limited by HR Evalutech

Bengaluru (Bangalore), Mumbai, Delhi, Gurugram, Noida, Hyderabad, Pune

6 - 8 yrs

₹12L - ₹13L / yr

Amazon Web Services (AWS)

Python

PySpark

Our client is looking to hire a Databricks Admin immediately.

This is a PAN-INDIA bulk hiring drive.

Minimum of 6-8+ years of experience with Databricks, PySpark/Python, and AWS.

AWS is a must.

A notice period of 15-30 days is preferred.

Share profiles at hr at etpspl dot com

Please share our email with friends or colleagues who are looking for a job.

Data Engineer

at Wissen Technology

4 recruiters

Posted by Gagandeep Kaur

Bengaluru (Bangalore), Mumbai, Pune

4 - 7 yrs

Best in industry

Python

PySpark

pandas

Airflow

Data engineering

Wissen Technology is hiring for Data Engineer

About Wissen Technology: At Wissen Technology, we deliver niche, custom-built products that solve complex business challenges across industries worldwide. Founded in 2015, our core philosophy is built around a strong product engineering mindset—ensuring every solution is architected and delivered right the first time. Today, Wissen Technology has a global footprint with 2000+ employees across offices in the US, UK, UAE, India, and Australia. Our commitment to excellence translates into delivering 2X impact compared to traditional service providers. How do we achieve this? Through a combination of deep domain knowledge, cutting-edge technology expertise, and a relentless focus on quality. We don’t just meet expectations—we exceed them by ensuring faster time-to-market, reduced rework, and greater alignment with client objectives. We have a proven track record of building mission-critical systems across industries, including financial services, healthcare, retail, manufacturing, and more. Wissen stands apart through its unique delivery models. Our outcome-based projects ensure predictable costs and timelines, while our agile pods provide clients the flexibility to adapt to their evolving business needs. Wissen leverages its thought leadership and technology prowess to drive superior business outcomes. Our success is powered by top-tier talent. Our mission is clear: to be the partner of choice for building world-class custom products that deliver exceptional impact—the first time, every time.

Job Summary: Wissen Technology is hiring a Data Engineer with expertise in Python, Pandas, Airflow, and Azure Cloud Services. The ideal candidate will have strong communication skills and experience with Kubernetes.

Experience: 4-7 years

Notice Period: Immediate to 15 days

Location: Pune, Mumbai, Bangalore

Mode of Work: Hybrid

Key Responsibilities:

Develop and maintain data pipelines using Python and Pandas.
Implement and manage workflows using Airflow.
Utilize Azure Cloud Services for data storage and processing.
Collaborate with cross-functional teams to understand data requirements and deliver solutions.
Ensure data quality and integrity throughout the data lifecycle.
Optimize and scale data infrastructure to meet business needs.

Qualifications and Required Skills:

Proficiency in Python (Must Have).
Strong experience with Pandas (Must Have).
Expertise in Airflow (Must Have).
Experience with Azure Cloud Services.
Good communication skills.

Good to Have Skills:

Experience with PySpark.
Knowledge of Kubernetes.

Wissen Sites:

Website: http://www.wissen.com
LinkedIn: https://www.linkedin.com/company/wissen-technology
Wissen Leadership: https://www.wissen.com/company/leadership-team/
Wissen Live: https://www.linkedin.com/company/wissen-technology/posts/feedView=All
Wissen Thought Leadership: https://www.wissen.com/articles/

Hiring: Azure Databricks

at Wissen Technology

4 recruiters

Posted by Bipasha Rath

Mumbai, Bengaluru (Bangalore), Pune

3 - 7 yrs

Best in industry

Python

pandas

PySpark

Experience: 3–7 Years

Locations: Pune / Bangalore / Mumbai

Notice Period: Immediate joiners only

Employment Type: Full-time

🛠️ Key Skills (Mandatory):

Python: Strong coding skills for data manipulation and automation.
PySpark: Experience with distributed data processing using Spark.
SQL: Proficient in writing complex queries for data extraction and transformation.
Azure Databricks: Hands-on experience with notebooks, Delta Lake, and MLflow.

Interested candidates, please share your resume with the details below.

Total Experience -

Relevant Experience in Python, PySpark, SQL, Azure Databricks -

Current CTC -

Expected CTC -

Notice period -

Current Location -

Desired Location -

DATA ENGINEER

at Wissen Technology

4 recruiters

Posted by Janane Mohanasankaran

Bengaluru (Bangalore), Pune, Mumbai

7 - 12 yrs

Best in industry

Python

pandas

PySpark

SQL

Data engineering

Wissen Technology is hiring for Data Engineer

About Wissen Technology: At Wissen Technology, we deliver niche, custom-built products that solve complex business challenges across industries worldwide. Founded in 2015, our core philosophy is built around a strong product engineering mindset, ensuring every solution is architected and delivered right the first time. Today, Wissen Technology has a global footprint with 2000+ employees across offices in the US, UK, UAE, India, and Australia. Our commitment to excellence translates into delivering 2X impact compared to traditional service providers. How do we achieve this? Through a combination of deep domain knowledge, cutting-edge technology expertise, and a relentless focus on quality. We don’t just meet expectations; we exceed them by ensuring faster time-to-market, reduced rework, and greater alignment with client objectives. We have a proven track record of building mission-critical systems across industries, including financial services, healthcare, retail, manufacturing, and more. Wissen stands apart through its unique delivery models. Our outcome-based projects ensure predictable costs and timelines, while our agile pods provide clients the flexibility to adapt to their evolving business needs. Wissen leverages its thought leadership and technology prowess to drive superior business outcomes. Our success is powered by top-tier talent. Our mission is clear: to be the partner of choice for building world-class custom products that deliver exceptional impact, the first time, every time.

Job Summary: Wissen Technology is hiring a Data Engineer with a strong background in Python, data engineering, and workflow optimization. The ideal candidate will have experience with Delta Tables and Parquet and be proficient in Pandas and PySpark.

Experience: 7+ years

Location: Pune, Mumbai, Bangalore

Mode of Work: Hybrid

Key Responsibilities:

Develop and maintain data pipelines using Python (Pandas, PySpark).
Optimize data workflows and ensure efficient data processing.
Work with Delta Tables and Parquet for data storage and management.
Collaborate with cross-functional teams to understand data requirements and deliver solutions.
Ensure data quality and integrity throughout the data lifecycle.
Implement best practices for data engineering and workflow optimization.

Qualifications and Required Skills:

Proficiency in Python, specifically with Pandas and PySpark.
Strong experience in data engineering and workflow optimization.
Knowledge of Delta Tables and Parquet.
Excellent problem-solving skills and attention to detail.
Ability to work collaboratively in a team environment.
Strong communication skills.

Good to Have Skills:

Experience with Databricks.
Knowledge of Apache Spark, DBT, and Airflow.
Advanced Pandas optimizations.
Familiarity with PyTest/DBT testing frameworks.

Wissen Sites:

Website: http://www.wissen.com
LinkedIn: https://www.linkedin.com/company/wissen-technology
Wissen Leadership: https://www.wissen.com/company/leadership-team/
Wissen Live: https://www.linkedin.com/company/wissen-technology/posts/feedView=All
Wissen Thought Leadership: https://www.wissen.com/articles/

Wissen | Driving Digital Transformation

A technology consultancy that drives digital innovation by connecting strategy and execution, helping global clients to strengthen their core technology.

PySpark/Scala Developer

at Tata Consultancy Services

2 recruiters

Agency job

via Risk Resources LLP hyd by Jhansi Padiy

Chennai, Hyderabad, Kolkata, Delhi, Pune, Bengaluru (Bangalore)

4 - 10 yrs

₹6L - ₹30L / yr

Scala

PySpark

Spark

Amazon Web Services (AWS)

Job Title: PySpark/Scala Developer

Functional Skills: Experience in Credit Risk/Regulatory risk domain

Technical Skills: Spark, PySpark, Python, Hive, Scala, MapReduce, Unix shell scripting

Good to Have Skills: Exposure to Machine Learning Techniques

Job Description:

5+ years of experience developing, fine-tuning, and implementing programs/applications using Python/PySpark/Scala on a Big Data/Hadoop platform.

Roles and Responsibilities:

a) Work with a leading bank’s Risk Management team on specific projects/requirements pertaining to risk models in consumer and wholesale banking

b) Enhance machine learning models using PySpark or Scala

c) Work with data scientists to build ML models based on business requirements and follow the ML lifecycle to deploy them all the way to the production environment

d) Participate in feature engineering, model training, scoring, and retraining

e) Architect data pipelines and automate data ingestion and model jobs

Skills and competencies:

Required:

· Strong analytical skills in conducting sophisticated statistical analysis using bureau/vendor data, customer performance data, and macroeconomic data to solve business problems.

· Working experience in PySpark and Scala, developing code to validate and implement models in Credit Risk/Banking

· Experience with distributed systems such as Hadoop/MapReduce, Spark, streaming data processing, and cloud architecture.

· Familiarity with machine learning frameworks and libraries (e.g., scikit-learn, Spark ML, TensorFlow, PyTorch)

· Experience in systems integration, web services, and batch processing

· Experience migrating code to PySpark/Scala is a big plus

· Ability to act as a liaison, conveying the business’s information needs to IT and data constraints to the business, with equal attention to business strategy and IT strategy, business processes, and workflow

· Flexibility in approach and thought process

· Willingness to learn and understand periodic changes in regulatory requirements as per the FED

PySpark/Scala Developer

at Tata Consultancy Services

2 recruiters

Agency job

via Risk Resources LLP hyd by susmitha o

Bengaluru (Bangalore), Hyderabad, Pune, Delhi, Kolkata, Chennai

5 - 8 yrs

₹7L - ₹30L / yr

Scala

Python

PySpark

Apache Hive

Spark

Skills and competencies:

Required:

· Strong analytical skills in conducting sophisticated statistical analysis using bureau/vendor data, customer performance data, and macroeconomic data to solve business problems.

· Working experience in PySpark and Scala, developing code to validate and implement models in Credit Risk/Banking

· Experience with distributed systems such as Hadoop/MapReduce, Spark, streaming data processing, and cloud architecture.

· Familiarity with machine learning frameworks and libraries (e.g., scikit-learn, Spark ML, TensorFlow, PyTorch)

· Experience in systems integration, web services, and batch processing

· Experience migrating code to PySpark/Scala is a big plus

· Ability to act as a liaison, conveying the business’s information needs to IT and data constraints to the business, with equal attention to business strategy and IT strategy, business processes, and workflow

· Flexibility in approach and thought process

· Willingness to learn and understand periodic changes in regulatory requirements as per the FED

AWS Data Engineer

at Deqode

1 recruiter

Posted by Shraddha Katare

Pune, Bengaluru (Bangalore)

5 - 8 yrs

₹5L - ₹13L / yr

Amazon Web Services (AWS)

databricks

PySpark

SQL

Profile: AWS Data Engineer

Mandatory skills: AWS + Databricks + PySpark + SQL

Location: Bangalore/Pune/Hyderabad/Chennai/Gurgaon

Notice Period: Immediate

Key Requirements:

Design, build, and maintain scalable data pipelines to collect, process, and store data from multiple datasets.
Optimize data storage solutions for better performance, scalability, and cost-efficiency.
Develop and manage ETL/ELT processes to transform data as per schema definitions, apply slicing and dicing, and make it available for downstream jobs and other teams.
Collaborate closely with cross-functional teams to understand system and product functionalities, speed up feature development, and capture evolving data requirements.
Engage with stakeholders to gather requirements and create curated datasets for downstream consumption and end-user reporting.
Automate deployment and CI/CD processes using GitHub workflows, identifying areas to reduce manual, repetitive work.
Ensure compliance with data governance policies, privacy regulations, and security protocols.
Utilize cloud platforms like AWS and work on Databricks for data processing with S3 storage.
Work with distributed systems and big data technologies such as Spark, SQL, and Delta Lake.
Integrate with SFTP to push data securely from Databricks to remote locations.
Analyze and interpret Spark query execution plans to fine-tune queries for faster and more efficient processing.
Strong problem-solving and troubleshooting skills in large-scale distributed systems.

Data Scientist

at Data Axle

2 candid answers

Posted by Eman Khan

Remote, Pune

4 - 9 yrs

Best in industry

Machine Learning (ML)

Python

SQL

PySpark

XGBoost

About Data Axle:

Data Axle Inc. has been an industry leader in data, marketing solutions, sales, and research for over 50 years in the USA. Data Axle now has an established strategic global centre of excellence in Pune. This centre delivers mission-critical data services to its global customers, powered by its proprietary cloud-based technology platform and by leveraging proprietary business and consumer databases.

Data Axle Pune is pleased to have achieved certification as a Great Place to Work!

Roles & Responsibilities:

We are looking for a Data Scientist to join the Data Science Client Services team to continue our success of identifying high quality target audiences that generate profitable marketing return for our clients. We are looking for experienced data science, machine learning and MLOps practitioners to design, build and deploy impactful predictive marketing solutions that serve a wide range of verticals and clients. The right candidate will enjoy contributing to and learning from a highly talented team and working on a variety of projects.

We are looking for a Senior Data Scientist who will be responsible for:

Ownership of design, implementation, and deployment of machine learning algorithms in a modern Python-based cloud architecture
Design or enhance ML workflows for data ingestion, model design, model inference and scoring
Oversight on team project execution and delivery
If senior, establish peer-review guidelines for high-quality coding to support junior team members’ skill growth, cross-training, and team efficiency
Visualize and publish model performance results and insights to internal and external audiences

Qualifications:

Masters in a relevant quantitative, applied field (Statistics, Econometrics, Computer Science, Mathematics, Engineering)
Minimum of 3.5 years of work experience in the end-to-end lifecycle of ML model development and deployment into production within a cloud infrastructure (Databricks is highly preferred)
Proven ability to manage the output of a small team in a fast-paced environment and to lead by example in the fulfilment of client requests
Exhibit deep knowledge of core mathematical principles relating to data science and machine learning (ML Theory + Best Practices, Feature Engineering and Selection, Supervised and Unsupervised ML, A/B Testing, etc.)
Proficiency in Python and SQL required; PySpark/Spark experience a plus
Ability to conduct productive peer reviews and maintain proper code structure in GitHub
Proven experience developing, testing, and deploying various ML algorithms (neural networks, XGBoost, Bayes, and the like)
Working knowledge of modern CI/CD methods

This position description is intended to describe the duties most frequently performed by an individual in this position.

It is not intended to be a complete list of assigned duties but to describe a position level.

Solution/Technical Architect (Databricks)

at Quintica

Posted by Nitin D

Remote, Bengaluru (Bangalore), Pune, Chennai, Nagpur

5 - 15 yrs

₹20L - ₹30L / yr

databricks

PySpark

Apache Spark

CI/CD

Data engineering

Technical Architect (Databricks)

10+ Years Data Engineering Experience with expertise in Databricks
3+ years of consulting experience
Completed Data Engineering Professional certification & required classes
Minimum 2-3 projects delivered with hands-on experience in Databricks
Completed Apache Spark Programming with Databricks, Data Engineering with Databricks, Optimizing Apache Spark™ on Databricks
Experience with Spark and/or Hadoop, Flink, Presto, or other popular big data engines
Familiarity with Databricks multi-hop pipeline architecture

Sr. Data Engineer (Databricks)

5+ Years Data Engineering Experience with expertise in Databricks
Completed Data Engineering Associate certification & required classes
Minimum 1 project delivered with hands-on experience in development on Databricks
Completed Apache Spark Programming with Databricks, Data Engineering with Databricks, Optimizing Apache Spark™ on Databricks
SQL delivery experience and familiarity with BigQuery, Synapse, or Redshift
Proficient in Python; knowledge of additional Databricks-supported programming languages (Scala)

Data Engineer

at Wissen Technology

4 recruiters

Posted by Annie Varghese

Pune, Mumbai, Bengaluru (Bangalore)

3 - 8 yrs

Best in industry

snowflake

Apache Airflow

ETL

Python

PySpark

+1 more

Job Summary:

We are looking for a highly skilled and experienced Data Engineer with deep expertise in Airflow, dbt, Python, and Snowflake. The ideal candidate will be responsible for designing, building, and managing scalable data pipelines and transformation frameworks to enable robust data workflows across the organization.

Key Responsibilities:

Design and implement scalable ETL/ELT pipelines using Apache Airflow for orchestration.
Develop modular and maintainable data transformation models using dbt.
Write high-performance data processing scripts and automation using Python.
Build and maintain data models and pipelines on Snowflake.
Collaborate with data analysts, data scientists, and business teams to deliver clean, reliable, and timely data.
Monitor and optimize pipeline performance and troubleshoot issues proactively.
Follow best practices in version control, testing, and CI/CD for data projects.

Must-Have Skills:

Strong hands-on experience with Apache Airflow for scheduling and orchestrating data workflows.
Proficiency in dbt (data build tool) for building scalable and testable data models.
Expert-level skills in Python for data processing and automation.
Solid experience with Snowflake, including SQL performance tuning, data modeling, and warehouse management.
Strong understanding of data engineering best practices including modularity, testing, and deployment.

Good to Have:

Experience working with cloud platforms (AWS/GCP/Azure).
Familiarity with CI/CD pipelines for data (e.g., GitHub Actions, GitLab CI).
Exposure to modern data stack tools (e.g., Fivetran, Stitch, Looker).
Knowledge of data security and governance best practices.

Note: One face-to-face (F2F) round is mandatory; as per the process, you will need to visit the office for it.

AWS Data Engineer

at VyTCDC

Posted by Gobinath Sundaram

Chennai, Bengaluru (Bangalore), Hyderabad, Mumbai, Pune, Noida

4 - 6 yrs

₹3L - ₹21L / yr

AWS Data Engineer

Amazon Web Services (AWS)

Python

PySpark

databricks

+1 more

Key Responsibilities

Design and implement ETL/ELT pipelines using Databricks, PySpark, and AWS Glue
Develop and maintain scalable data architectures on AWS (S3, EMR, Lambda, Redshift, RDS)
Perform data wrangling, cleansing, and transformation using Python and SQL
Collaborate with data scientists to integrate Generative AI models into analytics workflows
Build dashboards and reports to visualize insights using tools like Power BI or Tableau
Ensure data quality, governance, and security across all data assets
Optimize performance of data pipelines and troubleshoot bottlenecks
Work closely with stakeholders to understand data requirements and deliver actionable insights

🧪 Required Skills

Cloud Platforms: AWS (S3, Lambda, Glue, EMR, Redshift)
Big Data: Databricks, Apache Spark, PySpark
Programming: Python, SQL
Data Engineering: ETL/ELT, Data Lakes, Data Warehousing
Analytics: Data Modeling, Visualization, BI Reporting
Gen AI Integration: OpenAI, Hugging Face, LangChain (preferred)
DevOps (Bonus): Git, Jenkins, Terraform, Docker

📚 Qualifications

Bachelor's or Master’s degree in Computer Science, Data Science, or related field
3+ years of experience in data engineering or data analytics
Hands-on experience with Databricks, PySpark, and AWS
Familiarity with Generative AI tools and frameworks is a strong plus
Strong problem-solving and communication skills

🌟 Preferred Traits

Analytical mindset with attention to detail
Passion for data and emerging technologies
Ability to work independently and in cross-functional teams
Eagerness to learn and adapt in a fast-paced environment

Python developer

at Wissen Technology

4 recruiters

Posted by Praffull Shinde

Pune, Mumbai, Bengaluru (Bangalore)

4 - 8 yrs

₹14L - ₹26L / yr

Python

PySpark

Django

Flask

RESTful APIs

+3 more

Job title - Python developer

Exp – 4 to 6 years

Location – Pune/Mumbai/Bengaluru

Job Description:

Requirements:

Proven experience as a Python Developer
Strong knowledge of core Python and PySpark concepts
Experience with web frameworks such as Django or Flask
Good exposure to any cloud platform (GCP Preferred)
CI/CD exposure required
Solid understanding of RESTful APIs and how to build them
Experience working with databases like Oracle DB and MySQL
Ability to write efficient SQL queries and optimize database performance
Strong problem-solving skills and attention to detail
Strong SQL programming (stored procedures, functions)
Excellent communication and interpersonal skills

Roles and Responsibilities

Design, develop, and maintain data pipelines and ETL processes using PySpark
Work closely with data scientists and analysts to provide them with clean, structured data.
Optimize data storage and retrieval for performance and scalability.
Collaborate with cross-functional teams to gather data requirements.
Ensure data quality and integrity through data validation and cleansing processes.
Monitor and troubleshoot data-related issues to ensure data pipeline reliability.
Stay up to date with industry best practices and emerging technologies in data engineering.

AWS Data Engineer

at Deqode

1 recruiter

Posted by Alisha Das

Bengaluru (Bangalore), Mumbai, Pune, Chennai, Gurugram

5.6 - 7 yrs

₹10L - ₹28L / yr

Amazon Web Services (AWS)

Python

PySpark

SQL

Job Summary:

As an AWS Data Engineer, you will be responsible for designing, developing, and maintaining scalable, high-performance data pipelines using AWS services. With 6+ years of experience, you’ll collaborate closely with data architects, analysts, and business stakeholders to build reliable, secure, and cost-efficient data infrastructure across the organization.

Key Responsibilities:

Design, develop, and manage scalable data pipelines using AWS Glue, Lambda, and other serverless technologies
Implement ETL workflows and transformation logic using PySpark and Python on AWS Glue
Leverage AWS Redshift for warehousing, performance tuning, and large-scale data queries
Work with AWS DMS and RDS for database integration and migration
Optimize data flows and system performance for speed and cost-effectiveness
Deploy and manage infrastructure using AWS CloudFormation templates
Collaborate with cross-functional teams to gather requirements and build robust data solutions
Ensure data integrity, quality, and security across all systems and processes

Required Skills & Experience:

6+ years of experience in Data Engineering with strong AWS expertise
Proficient in Python and PySpark for data processing and ETL development
Hands-on experience with AWS Glue, Lambda, DMS, RDS, and Redshift
Strong SQL skills for building complex queries and performing data analysis
Familiarity with AWS CloudFormation and infrastructure as code principles
Good understanding of serverless architecture and cost-optimized design
Ability to write clean, modular, and maintainable code
Strong analytical thinking and problem-solving skills

ETL Automation Tester

at E2E Infoware Management Services

Posted by Monika S

Bengaluru (Bangalore), Pune, Chennai

5 - 12 yrs

₹5L - ₹25L / yr

PySpark

Automation

SQL

Skill Name: ETL Automation Testing

Location: Bangalore, Chennai and Pune

Experience: 5+ Years

Required:

Experience in ETL Automation Testing

Strong experience in PySpark.

Senior Data Engineer

at Wissen Technology

4 recruiters

Posted by Vishakha Walunj

Bengaluru (Bangalore), Pune, Mumbai

7 - 12 yrs

Best in industry

PySpark

databricks

SQL

Python

Required Skills:

Hands-on experience with Databricks and PySpark
Proficiency in SQL, Python, and Spark.
Understanding of data warehousing concepts and data modeling.
Experience with CI/CD pipelines and version control (e.g., Git).
Fundamental knowledge of at least one cloud platform, preferably Azure or GCP.

Good to Have:

BigQuery
Experience with performance tuning and data governance.

AWS Data Engineer

at Deqode

1 recruiter

Posted by Roshni Maji

Pune, Bengaluru (Bangalore), Gurugram, Chennai, Mumbai

5 - 7 yrs

₹6L - ₹20L / yr

Amazon Web Services (AWS)

Amazon Redshift

AWS Glue

Python

PySpark

Position: AWS Data Engineer

Experience: 5 to 7 Years

Location: Bengaluru, Pune, Chennai, Mumbai, Gurugram

Work Mode: Hybrid (3 days work from office per week)

Employment Type: Full-time

About the Role:

We are seeking a highly skilled and motivated AWS Data Engineer with 5–7 years of experience in building and optimizing data pipelines, architectures, and data sets. The ideal candidate will have strong experience with AWS services including Glue, Athena, Redshift, Lambda, DMS, RDS, and CloudFormation. You will be responsible for managing the full data lifecycle from ingestion to transformation and storage, ensuring efficiency and performance.

Key Responsibilities:

Design, develop, and optimize scalable ETL pipelines using AWS Glue, Python/PySpark, and SQL.
Work extensively with AWS services such as Glue, Athena, Lambda, DMS, RDS, Redshift, CloudFormation, and other serverless technologies.
Implement and manage data lake and warehouse solutions using AWS Redshift and S3.
Optimize data models and storage for cost-efficiency and performance.
Write advanced SQL queries to support complex data analysis and reporting requirements.
Collaborate with stakeholders to understand data requirements and translate them into scalable solutions.
Ensure high data quality and integrity across platforms and processes.
Implement CI/CD pipelines and best practices for infrastructure as code using CloudFormation or similar tools.

Required Skills & Experience:

Strong hands-on experience with Python or PySpark for data processing.
Deep knowledge of AWS Glue, Athena, Lambda, Redshift, RDS, DMS, and CloudFormation.
Proficiency in writing complex SQL queries and optimizing them for performance.
Familiarity with serverless architectures and AWS best practices.
Experience in designing and maintaining robust data architectures and data lakes.
Ability to troubleshoot and resolve data pipeline issues efficiently.
Strong communication and stakeholder management skills.

AWS Data Engineer

at Deqode

1 recruiter

Posted by Roshni Maji

Bengaluru (Bangalore), Pune, Mumbai, Chennai, Gurugram

5 - 7 yrs

₹5L - ₹19L / yr

Python

PySpark

Amazon Web Services (AWS)

aws

Amazon Redshift

+1 more

Position: AWS Data Engineer

Experience: 5 to 7 Years

Location: Bengaluru, Pune, Chennai, Mumbai, Gurugram

Work Mode: Hybrid (3 days work from office per week)

Employment Type: Full-time

Key Responsibilities:

Design, develop, and optimize scalable ETL pipelines using AWS Glue, Python/PySpark, and SQL.
Work extensively with AWS services such as Glue, Athena, Lambda, DMS, RDS, Redshift, CloudFormation, and other serverless technologies.
Implement and manage data lake and warehouse solutions using AWS Redshift and S3.
Optimize data models and storage for cost-efficiency and performance.
Write advanced SQL queries to support complex data analysis and reporting requirements.
Collaborate with stakeholders to understand data requirements and translate them into scalable solutions.
Ensure high data quality and integrity across platforms and processes.
Implement CI/CD pipelines and best practices for infrastructure as code using CloudFormation or similar tools.

Required Skills & Experience:

Strong hands-on experience with Python or PySpark for data processing.
Deep knowledge of AWS Glue, Athena, Lambda, Redshift, RDS, DMS, and CloudFormation.
Proficiency in writing complex SQL queries and optimizing them for performance.
Familiarity with serverless architectures and AWS best practices.
Experience in designing and maintaining robust data architectures and data lakes.
Ability to troubleshoot and resolve data pipeline issues efficiently.
Strong communication and stakeholder management skills.

ETL Developer

at Deqode

1 recruiter

Posted by Mokshada Solanki

Bengaluru (Bangalore), Mumbai, Pune, Gurugram

4 - 5 yrs

₹4L - ₹20L / yr

SQL

Amazon Web Services (AWS)

Migration

PySpark

ETL

Job Summary:

Seeking a seasoned SQL + ETL Developer with 4+ years of experience in managing large-scale datasets and cloud-based data pipelines. The ideal candidate is hands-on with MySQL, PySpark, AWS Glue, and ETL workflows, with proven expertise in AWS migration and performance optimization.

Key Responsibilities:

Develop and optimize complex SQL queries and stored procedures to handle large datasets (100+ million records).
Build and maintain scalable ETL pipelines using AWS Glue and PySpark.
Work on data migration tasks in AWS environments.
Monitor and improve database performance; automate key performance indicators and reports.
Collaborate with cross-functional teams to support data integration and delivery requirements.
Write shell scripts for automation and manage ETL jobs efficiently.

Required Skills:

Strong experience with MySQL, complex SQL queries, and stored procedures.
Hands-on experience with AWS Glue, PySpark, and ETL processes.
Good understanding of AWS ecosystem and migration strategies.
Proficiency in shell scripting.
Strong communication and collaboration skills.

Nice to Have:

Working knowledge of Python.
Experience with AWS RDS.

Data Engineer - AWS

at Deqode

1 recruiter

Posted by Shraddha Katare

Bengaluru (Bangalore), Pune, Chennai, Mumbai, Gurugram

5 - 7 yrs

₹5L - ₹19L / yr

Amazon Web Services (AWS)

Python

PySpark

SQL

redshift

Profile: AWS Data Engineer

Mode: Hybrid

Experience: 5-7 years

Locations: Bengaluru, Pune, Chennai, Mumbai, Gurugram

Roles and Responsibilities

Design and maintain ETL pipelines using AWS Glue and Python/PySpark
Optimize SQL queries for Redshift and Athena
Develop Lambda functions for serverless data processing
Configure AWS DMS for database migration and replication
Implement infrastructure as code with CloudFormation
Build optimized data models for performance
Manage RDS databases and AWS service integrations
Troubleshoot and improve data processing efficiency
Gather requirements from business stakeholders
Implement data quality checks and validation
Document data pipelines and architecture
Monitor workflows and implement alerting
Keep current with AWS services and best practices

Required Technical Expertise:

Python/PySpark for data processing
AWS Glue for ETL operations
Redshift and Athena for data querying
AWS Lambda and serverless architecture
AWS DMS and RDS management
CloudFormation for infrastructure
SQL optimization and performance tuning

AWS Data Engineer

at Deqode

1 recruiter

Posted by Alisha Das

Pune, Mumbai, Bengaluru (Bangalore), Chennai

4 - 7 yrs

₹5L - ₹15L / yr

Amazon Web Services (AWS)

Python

PySpark

Glue semantics

Amazon Redshift

+1 more

Job Overview:

We are seeking an experienced AWS Data Engineer to join our growing data team. The ideal candidate will have hands-on experience with AWS Glue, Redshift, PySpark, and other AWS services to build robust, scalable data pipelines. This role is perfect for someone passionate about data engineering, automation, and cloud-native development.

Key Responsibilities:

Design, build, and maintain scalable and efficient ETL pipelines using AWS Glue, PySpark, and related tools.
Integrate data from diverse sources and ensure its quality, consistency, and reliability.
Work with large datasets in structured and semi-structured formats across cloud-based data lakes and warehouses.
Optimize and maintain data infrastructure, including Amazon Redshift, for high performance.
Collaborate with data analysts, data scientists, and product teams to understand data requirements and deliver solutions.
Automate data validation, transformation, and loading processes to support real-time and batch data processing.
Monitor and troubleshoot data pipeline issues and ensure smooth operations in production environments.

Required Skills:

5 to 7 years of hands-on experience in data engineering roles.
Strong proficiency in Python and PySpark for data transformation and scripting.
Deep understanding and practical experience with AWS Glue, AWS Redshift, S3, and other AWS data services.
Solid understanding of SQL and database optimization techniques.
Experience working with large-scale data pipelines and high-volume data environments.
Good knowledge of data modeling, warehousing, and performance tuning.

Preferred/Good to Have:

Experience with workflow orchestration tools like Airflow or Step Functions.
Familiarity with CI/CD for data pipelines.
Knowledge of data governance and security best practices on AWS.

ETL Developer

at Deqode

1 recruiter

Posted by Shraddha Katare

Pune, Mumbai, Bengaluru (Bangalore), Gurugram

4 - 6 yrs

₹5L - ₹10L / yr

ETL

SQL

Amazon Web Services (AWS)

PySpark

KPI

Role - ETL Developer

Work Mode - Hybrid

Experience - 4+ years

Location - Pune, Gurgaon, Bengaluru, Mumbai

Required Skills - AWS, AWS Glue, PySpark, ETL, SQL

Required Skills:

4+ years of hands-on experience in MySQL, including SQL queries and procedure development
Experience in Pyspark, AWS, AWS Glue
Experience in AWS ,Migration
Experience with automated scripting and tracking KPIs/metrics for database performance
Proficiency in shell scripting and ETL.
Strong communication skills and a collaborative team player
Knowledge of Python and AWS RDS is a plus

Data Engineer

at ZeMoSo Technologies

11 recruiters

Agency job

via TIGI HR Solution Pvt. Ltd. by Vaidehi Sarkar

Mumbai, Bengaluru (Bangalore), Hyderabad, Chennai, Pune

4 - 8 yrs

₹10L - ₹15L / yr

Data engineering

Python

SQL

Data Warehouse (DWH)

Amazon Web Services (AWS)

+3 more

Work Mode: Hybrid

A B.Tech, BE, M.Tech, or ME degree is mandatory.

Must-Have Skills:

● Educational qualification: B.Tech, BE, M.Tech, or ME in any field.

● Minimum of 3 years of proven experience as a Data Engineer.

● Strong proficiency in Python and SQL.

● Experience with Databricks and with setting up and managing data pipelines and data warehouses/lakes.

● Good comprehension and critical thinking skills.

● Please note that the salary bracket varies with the candidate's experience:

- Experience from 4 yrs to 6 yrs - Salary up to 22 LPA

- Experience from 5 yrs to 8 yrs - Salary up to 30 LPA

- Experience more than 8 yrs - Salary up to 40 LPA

Data Engineer

at Deqode

1 recruiter

Posted by Alisha Das

Bengaluru (Bangalore), Delhi, Gurugram, Noida, Ghaziabad, Faridabad, Mumbai, Pune, Hyderabad, Indore, Jaipur, Kolkata

4 - 5 yrs

₹2L - ₹18L / yr

Python

PySpark

We are looking for a skilled and passionate Data Engineer with a strong foundation in Python programming and hands-on experience working with APIs, AWS cloud, and modern development practices. The ideal candidate will have a keen interest in building scalable backend systems and working with big data tools like PySpark.

Key Responsibilities:

Write clean, scalable, and efficient Python code.
Work with Python libraries such as PySpark for data processing.
Design, develop, update, and maintain APIs (RESTful).
Deploy and manage code using GitHub CI/CD pipelines.
Collaborate with cross-functional teams to define, design, and ship new features.
Work on AWS cloud services for application deployment and infrastructure.
Perform basic database design and work with MySQL or DynamoDB.
Debug and troubleshoot application issues and performance bottlenecks.

Required Skills & Qualifications:

4+ years of hands-on experience with Python development.
Proficiency in Python fundamentals and a strong problem-solving approach.
Experience with AWS Cloud services (EC2, Lambda, S3, etc.).
Good understanding of API development and integration.
Knowledge of GitHub and CI/CD workflows.
Experience in working with PySpark or similar big data frameworks.
Basic knowledge of MySQL or DynamoDB.
Excellent communication skills and a team-oriented mindset.

Nice to Have:

Experience in containerization (Docker/Kubernetes).
Familiarity with Agile/Scrum methodologies.

GCP Senior Data Engineer

at Xebia IT Architects

2 recruiters

Posted by Vijay S

Bengaluru (Bangalore), Gurugram, Pune, Hyderabad, Chennai, Bhopal, Jaipur

10 - 15 yrs

₹30L - ₹40L / yr

Spark

Google Cloud Platform (GCP)

Python

Apache Airflow

PySpark

+1 more

We are looking for a Senior Data Engineer with strong expertise in GCP, Databricks, and Airflow to design and implement a GCP Cloud Native Data Processing Framework. The ideal candidate will work on building scalable data pipelines and help migrate existing workloads to a modern framework.

Shift: 2 PM to 11 PM
Work Mode: Hybrid (3 days a week) across Xebia locations
Notice Period: Immediate joiners or those with a notice period of up to 30 days

Key Responsibilities:

Design and implement a GCP Native Data Processing Framework leveraging Spark and GCP Cloud Services.
Develop and maintain data pipelines using Databricks and Airflow for transforming Raw → Silver → Gold data layers.
Ensure data integrity, consistency, and availability across all systems.
Collaborate with data engineers, analysts, and stakeholders to optimize performance.
Document standards and best practices for data engineering workflows.

Required Experience:

7-8 years of experience in data engineering, architecture, and pipeline development.
Strong knowledge of GCP, Databricks, PySpark, and BigQuery.
Experience with orchestration tools like Airflow, Dagster, or GCP equivalents.
Understanding of Data Lake table formats (Delta, Iceberg, etc.).
Proficiency in Python for scripting and automation.
Strong problem-solving skills and collaborative mindset.

⚠️ Please apply only if you have not applied recently and are not currently in the interview process for any open roles at Xebia.

Looking forward to your response!

Best regards,

Vijay S

Assistant Manager - TAG

https://www.linkedin.com/in/vijay-selvarajan/

AWS Data Engineer

at Deqode

1 recruiter

Posted by Shraddha Katare

Pune

2 - 5 yrs

₹3L - ₹10L / yr

PySpark

Amazon Web Services (AWS)

AWS Lambda

SQL

Data engineering

+2 more

Here is the Job Description -

Location - Viman Nagar, Pune

Mode - 5 days a week

Required Tech Skills:

● Strong skills in PySpark and Python

● Good understanding of data structures

● Good at SQL query writing and optimization

● Strong fundamentals of object-oriented programming (OOP)

● Good understanding of AWS Cloud and big data

● Experience with data lakes, AWS Glue, Athena, S3, Kinesis, and SQL/NoSQL databases

Data Engineer

at NeoGenCode Technologies Pvt Ltd

2 candid answers

Posted by Akshay Patil

Pune

4 - 8 yrs

₹1L - ₹12L / yr

PySpark

Data engineering

Big Data

Hadoop

Spark

+4 more

Job Description:

Job Title: Data Engineer

Location: Pune (Hybrid Work Model)

Experience Required: 4 to 8 years

Role Overview:

We are seeking talented and driven Data Engineers to join our team in Pune. The ideal candidate will have a strong background in data engineering with expertise in Python, PySpark, and SQL. You will be responsible for designing, building, and maintaining scalable data pipelines and systems that empower our business intelligence and analytics initiatives.

Key Responsibilities:

Develop, optimize, and maintain ETL pipelines and data workflows.
Design and implement scalable data solutions using Python, PySpark, and SQL.
Collaborate with cross-functional teams to gather and analyze data requirements.
Ensure data quality, integrity, and security throughout the data lifecycle.
Monitor and troubleshoot data pipelines to ensure reliability and performance.
Work on hybrid data environments involving on-premises and cloud-based systems.
Assist in the deployment and maintenance of big data solutions.

Required Skills and Qualifications:

Bachelor’s degree in Computer Science, Information Technology, or related field.
4 to 8 years of experience in Data Engineering or related roles.
Proficiency in Python and PySpark for data processing and analysis.
Strong SQL skills with experience in writing complex queries and optimizing performance.
Familiarity with data pipeline tools and frameworks.
Knowledge of cloud platforms such as AWS, Azure, or GCP is a plus.
Excellent problem-solving and analytical skills.
Strong communication and teamwork abilities.

Preferred Qualifications:

Experience with big data technologies like Hadoop, Hive, or Spark.
Familiarity with data visualization tools and techniques.
Knowledge of CI/CD pipelines and DevOps practices in a data engineering context.

Work Model:

This position follows a hybrid work model, with candidates expected to work from the Pune office as per business needs.

Why Join Us?

Opportunity to work with cutting-edge technologies.
Collaborative and innovative work environment.
Competitive compensation and benefits.
Clear career progression and growth opportunities.

Azure Data Engineer

at TVARIT GmbH

2 candid answers

Posted by Shivani Kawade

Remote, Pune

2 - 4 yrs

₹8L - ₹20L / yr

Python

PySpark

ETL

databricks

Azure

+6 more

TVARIT GmbH develops and delivers artificial intelligence (AI) solutions for the manufacturing, automotive, and process industries. With its software products, TVARIT enables its customers to make intelligent, well-founded decisions, e.g., in predictive maintenance, increasing OEE (overall equipment effectiveness), and predictive quality. We have renowned reference customers, competent technology, a strong research team from renowned universities, and a renowned AI award (e.g., EU Horizon 2020), which make TVARIT one of the most innovative AI companies in Germany and Europe.

We are looking for a self-motivated person with a positive "can-do" attitude and excellent oral and written communication skills in English.

We are seeking a skilled and motivated Data Engineer from the manufacturing industry with over two years of experience to join our team. As a data engineer, you will be responsible for designing, building, and maintaining the infrastructure required for the collection, storage, processing, and analysis of large and complex data sets. The ideal candidate will have a strong foundation in ETL pipelines and Python, with additional experience in Azure and Terraform being a plus. This role requires a proactive individual who can contribute to our data infrastructure and support our analytics and data science initiatives.

Skills Required

Experience in the manufacturing industry (metal industry is a plus)
2+ years of experience as a Data Engineer
Experience in data cleaning & structuring and data manipulation
ETL Pipelines: Proven experience in designing, building, and maintaining ETL pipelines.
Python: Strong proficiency in Python programming for data manipulation, transformation, and automation.
Experience in SQL and data structures
Knowledge of big data technologies such as Apache Spark, Flink, and Hadoop, as well as NoSQL databases.
Knowledge of at least one cloud platform, such as AWS, Azure, or Google Cloud Platform.
Proficiency in data management and data governance
Strong analytical and problem-solving skills.
Excellent communication and teamwork abilities.

Nice To Have

Azure: Experience with Azure data services (e.g., Azure Data Factory, Azure Databricks, Azure SQL Database).
Terraform: Knowledge of Terraform for infrastructure as code (IaC) to manage cloud resources.

Data Engineer

at Wissen Technology

4 recruiters

Posted by Sukanya Mohan

Pune, Bengaluru (Bangalore)

5 - 10 yrs

Best in industry

Amazon Web Services (AWS)

EMR

Python

GLUE

SQL

+1 more

Greetings! Wissen Technology is hiring for the position of Data Engineer.

Please find the Job Description for your Reference:

Design, develop, and maintain data pipelines on AWS EMR (Elastic MapReduce) to support data processing and analytics.
Implement data ingestion processes from various sources including APIs, databases, and flat files.
Optimize and tune big data workflows for performance and scalability.
Collaborate with data scientists, analysts, and other stakeholders to understand data requirements and deliver solutions.
Manage and monitor EMR clusters, ensuring high availability and reliability.
Develop ETL (Extract, Transform, Load) processes to cleanse, transform, and store data in data lakes and data warehouses.
Implement data security best practices to ensure data is protected and compliant with relevant regulations.
Create and maintain technical documentation related to data pipelines, workflows, and infrastructure.
Troubleshoot and resolve issues related to data processing and EMR cluster performance.

Qualifications:

Bachelor’s degree in Computer Science, Information Technology, or a related field.
5+ years of experience in data engineering, with a focus on big data technologies.
Strong experience with AWS services, particularly EMR, S3, Redshift, Lambda, and Glue.
Proficiency in programming languages such as Python, Java, or Scala.
Experience with big data frameworks and tools such as Hadoop, Spark, Hive, and Pig.
Solid understanding of data modeling, ETL processes, and data warehousing concepts.
Experience with SQL and NoSQL databases.
Familiarity with CI/CD pipelines and version control systems (e.g., Git).
Strong problem-solving skills and the ability to work independently and collaboratively in a team environment.

Sr. Data Engineer (Data Warehouse-Snowflake)

at IntraEdge

1 recruiter

Posted by Karishma Shingote

Pune

5 - 11 yrs

₹5L - ₹15L / yr

SQL

snowflake

Enterprise Data Warehouse (EDW)

Python

PySpark

Sr. Data Engineer (Data Warehouse-Snowflake)

Experience: 5+ years

Location: Pune (Hybrid)

As a Senior Data Engineer with Snowflake expertise, you are a curious, innovative subject matter expert who mentors young professionals. You play a key role in converting the vision and data strategy into data solutions and delivering them. With your knowledge, you will help create data-driven thinking within the organization, not just within data teams but also in the wider stakeholder community.

Skills Preferred

Advanced written, verbal, and analytic skills, and demonstrated ability to influence and facilitate sustained change. Ability to convey information clearly and concisely to all levels of staff and management about programs, services, best practices, strategies, and organizational mission and values.
Proven ability to focus on priorities, strategies, and vision.
Very good understanding of Data Foundation initiatives, such as Data Modelling, Data Quality Management, Data Governance, Data Maturity Assessments, and Data Strategy, in support of the key business stakeholders.
Actively deliver the roll-out and embedding of Data Foundation initiatives in support of the key business programs, advising on the technology and using leading market-standard tools.
Coordinate the change management process, incident management and problem management process.
Ensure traceability of requirements from Data through testing and scope changes, to training and transition.
Drive implementation efficiency and effectiveness across the pilots and future projects to minimize cost, increase speed of implementation, and maximize value delivery.

Knowledge Preferred

Extensive knowledge of and hands-on experience with Snowflake and its components, such as users/groups, data store/warehouse management, external stages/tables, semi-structured data, Snowpipe, etc.
Implement and manage CI/CD for migrating and deploying Snowflake code to higher environments.
Proven experience with Snowflake access control and authentication, data security, data sharing, the Snowflake extension for VS Code, replication and failover, and SQL optimization. The analytical ability to quickly troubleshoot and debug development and production issues is key to success in this role.
Proven technology champion in working with relational databases and data warehouses and in query authoring (SQL), with working familiarity with a variety of databases.
Highly experienced in building and optimizing complex queries. Good at manipulating, processing, and extracting value from large, disconnected datasets.
Your experience in handling big data sets and big data technologies will be an asset.
Proven champion with in-depth knowledge of at least one of the following scripting languages: Python, SQL, or PySpark.

Primary responsibilities

You will be an asset to our team, bringing deep technical skills and capabilities to become a key part of projects that define the data journey in our company, and you will be keen to engage, network, and innovate in collaboration with company-wide teams.
Collaborate with the data and analytics team to develop and maintain a data model and data governance infrastructure using a range of different storage technologies that enables optimal data storage and sharing using advanced methods.
Support the development of processes and standards for data mining, data modeling and data protection.
Design and implement continuous process improvements for automating manual processes and optimizing data delivery.
Assess and report on the unique data needs of key stakeholders and troubleshoot any data-related technical issues through to resolution.
Work to improve data models that support business intelligence tools, improve data accessibility and foster data-driven decision making.
Ensure traceability of requirements from Data through testing and scope changes, to training and transition.
Manage and lead technical design and development activities for implementation of large-scale data solutions in Snowflake to support multiple use cases (transformation, reporting and analytics, data monetization, etc.).
Translate advanced business data, integration and analytics problems into technical approaches that yield actionable recommendations, across multiple, diverse domains; communicate results and educate others through design and build of insightful presentations.
Exhibit strong knowledge of the Snowflake ecosystem and clearly articulate the value proposition of cloud modernization/transformation to a wide range of stakeholders.

Relevant work experience

Bachelor's degree in a Science, Technology, Engineering, Mathematics, or Computer Science discipline or equivalent, with 7+ years of experience in enterprise-wide data warehousing, governance, policies, procedures, and implementation.

Aptitude for working with data, interpreting results, business intelligence and analytic best practices.

Business understanding

Good knowledge and understanding of the consumer and industrial products sector and IoT.

Good functional understanding of solutions supporting business processes.

Must-Have Skills

Snowflake 5+ years
Data warehousing technologies (overall) 5+ years
SQL 5+ years
Data warehouse designing experience 3+ years
Experience with cloud and on-prem hybrid models in data architecture
Knowledge of Data Governance and strong understanding of data lineage and data quality
Programming & Scripting: Python, PySpark
Database technologies such as traditional RDBMSs (MS SQL Server, Oracle, MySQL, PostgreSQL)

Nice to have

Demonstrated experience in modern enterprise data integration platforms such as Informatica
AWS cloud services: S3, Lambda, Glue, Kinesis, API Gateway, EC2, EMR, RDS, and Redshift
Good understanding of Data Architecture approaches
Experience in designing and building streaming data ingestion, analysis, and processing pipelines using Kafka, Kafka Streams, Spark Streaming, StreamSets, and similar cloud-native technologies.
Experience implementing operational concerns for a data platform, such as monitoring, security, and scalability
Experience working in DevOps, Agile, Scrum, Continuous Delivery and/or Rapid Application Development environments
Exposure to building mock-ups and proofs of concept across different capabilities/tool sets
Experience working with structured, semi-structured, and unstructured data, extracting information, and identifying linkages across disparate data sets

Sr. Data Engineer (Data Warehouse-Snowflake)

Experience: 5+yrs

Location: Pune (Hybrid)

Skills Preferred

Advanced written, verbal, and analytic skills, and demonstrated ability to influence and facilitate sustained change. Ability to convey information clearly and concisely to all levels of staff and management about programs, services, best practices, strategies, and organizational mission and values.
Proven ability to focus on priorities, strategies, and vision.
Very Good understanding in Data Foundation initiatives, like Data Modelling, Data Quality Management, Data Governance, Data Maturity Assessments and Data Strategy in support of the key business stakeholders.
Actively deliver the roll-out and embedding of Data Foundation initiatives in support of the key business programs advising on the technology and using leading market standard tools.
Coordinate the change management process, incident management and problem management process.
Ensure traceability of requirements from Data through testing and scope changes, to training and transition.
Drive implementation efficiency and effectiveness across the pilots and future projects to minimize cost, increase speed of implementation and maximize value delivery

Knowledge Preferred

Extensive knowledge and hands on experience with Snowflake and its different components like User/Group, Data Store/ Warehouse management, External Stage/table, working with semi structured data, Snowpipe etc.
Implement and manage CI/CD for migrating and deploying codes to higher environments with Snowflake codes.
Proven experience with Snowflake Access control and authentication, data security, data sharing, working with VS Code extension for snowflake, replication, and failover, optimizing SQL, analytical ability to troubleshoot and debug on development and production issues quickly is key for success in this role.
Proven technology champion in working with relational, Data warehouses databases, query authoring (SQL) as well as working familiarity with a variety of databases.
Highly Experienced in building and optimizing complex queries. Good with manipulating, processing, and extracting value from large, disconnected datasets.
Your experience in handling big data sets and big data technologies will be an asset.
Proven champion with in-depth knowledge of any one of the scripting languages: Python, SQL, Pyspark.

Primary responsibilities

You will be an asset in our team bringing deep technical skills and capabilities to become a key part of projects defining the data journey in our company, keen to engage, network and innovate in collaboration with company wide teams.
Collaborate with the data and analytics team to develop and maintain a data model and data governance infrastructure using a range of different storage technologies that enables optimal data storage and sharing using advanced methods.
Support the development of processes and standards for data mining, data modeling and data protection.
Design and implement continuous process improvements for automating manual processes and optimizing data delivery.
Assess and report on the unique data needs of key stakeholders and troubleshoot any data-related technical issues through to resolution.
Work to improve data models that support business intelligence tools, improve data accessibility and foster data-driven decision making.
Ensure traceability of requirements from Data through testing and scope changes, to training and transition.
Manage and lead technical design and development activities for implementation of large-scale data solutions in Snowflake to support multiple use cases (transformation, reporting and analytics, data monetization, etc.).
Translate advanced business data, integration and analytics problems into technical approaches that yield actionable recommendations, across multiple, diverse domains; communicate results and educate others through design and build of insightful presentations.
Exhibit strong knowledge of the Snowflake ecosystem and can clearly articulate the value proposition of cloud modernization/transformation to a wide range of stakeholders.

Relevant work experience

Aptitude for working with data, interpreting results, business intelligence and analytic best practices.

Business understanding

Good knowledge and understanding of Consumer and industrial products sector and IoT.

Good functional understanding of solutions supporting business processes.

Skill Must have

Snowflake 5+ years
Data warehousing technologies (overall) 5+ years
SQL 5+ years
Data warehouse design experience 3+ years
Experience with hybrid cloud and on-prem models in data architecture
Knowledge of data governance and a strong understanding of data lineage and data quality
Programming & Scripting: Python, PySpark
Database technologies such as traditional RDBMS (MS SQL Server, Oracle, MySQL, PostgreSQL)

Nice to have

Demonstrated experience in modern enterprise data integration platforms such as Informatica
AWS cloud services: S3, Lambda, Glue, Kinesis, API Gateway, EC2, EMR, RDS and Redshift
Good understanding of Data Architecture approaches
Experience in designing and building streaming data ingestion, analysis and processing pipelines using Kafka, Kafka Streams, Spark Streaming, StreamSets and similar cloud-native technologies.
Experience implementing operational concerns for a data platform, such as monitoring, security, and scalability
Experience working in DevOps, Agile, Scrum, Continuous Delivery and/or Rapid Application Development environments
Exposure to building mock-ups and proofs of concept across different capabilities/tool sets
Experience working with structured, semi-structured, and unstructured data, extracting information, and identifying linkages across disparate data sets

AWS Data Engineer (Contractual)

at Forward Eye Technologies

Posted by Jaya S

Bengaluru (Bangalore), Mumbai, Delhi, Gurugram, Pune, Hyderabad, Ahmedabad, Chennai

3 - 7 yrs

₹8L - ₹15L / yr

AWS Lambda

Amazon S3

Amazon VPC

Amazon EC2

Amazon Redshift

Technical Skills:

Ability to understand and translate business requirements into design.
Proficient in AWS infrastructure components such as S3, IAM, VPC, EC2, and Redshift.
Experience in creating ETL jobs using Python/PySpark.
Proficiency in creating AWS Lambda functions for event-based jobs.
Knowledge of automating ETL processes using AWS Step Functions.
Competence in building data warehouses and loading data into them.

Responsibilities:

Understand business requirements and translate them into design.
Assess AWS infrastructure needs for development work.
Develop ETL jobs using Python/PySpark to meet requirements.
Implement AWS Lambda for event-based tasks.
Automate ETL processes using AWS Step Functions.
Build data warehouses and manage data loading.
Engage with customers and stakeholders to articulate the benefits of proposed solutions and frameworks.

Senior Data Engineer (L2)

at Publicis Sapient

10 recruiters

Posted by Mohit Singh

Bengaluru (Bangalore), Pune, Hyderabad, Gurugram, Noida

5 - 11 yrs

₹20L - ₹36L / yr

PySpark

Data engineering

Big Data

Hadoop

Spark

Publicis Sapient Overview:

Job Summary:

As Senior Associate L2 in Data Engineering, you will translate client requirements into technical designs and implement components for data engineering solutions. You will utilize a deep understanding of data integration and big data design principles to create custom solutions or implement packaged solutions, and you will independently drive design discussions to ensure the overall health of the solution.

The role requires a hands-on technologist with a strong programming background in Java / Scala / Python and experience in data ingestion, integration and wrangling, computation, and analytics pipelines, as well as exposure to Hadoop ecosystem components. You are also required to have hands-on knowledge of at least one of the AWS, GCP or Azure cloud platforms.

Role & Responsibilities:

Your role is focused on Design, Development and delivery of solutions involving:

• Data Integration, Processing & Governance

• Data Storage and Computation Frameworks, Performance Optimizations

• Analytics & Visualizations

• Infrastructure & Cloud Computing

• Data Management Platforms

• Implement scalable architectural models for data processing and storage

• Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time mode

• Build functionality for data analytics, search and aggregation

Experience Guidelines:

Mandatory Experience and Competencies:

# Competency

1. Overall 5+ years of IT experience, with 3+ years in data-related technologies

2. Minimum 2.5 years of experience in Big Data technologies and working exposure to related data services on at least one cloud platform (AWS / Azure / GCP)

3. Hands-on experience with the Hadoop stack (HDFS, Sqoop, Kafka, Pulsar, NiFi, Spark, Spark Streaming, Flink, Storm, Hive, Oozie, Airflow) and other components required to build end-to-end data pipelines.

4. Strong experience in at least one of the programming languages Java, Scala or Python (Java preferred)

5. Hands-on working knowledge of NoSQL and MPP data platforms like HBase, MongoDB, Cassandra, AWS Redshift, Azure SQL DW, GCP BigQuery, etc.

6. Well-versed in, with working knowledge of, data platform services on at least one cloud platform, including IAM and data security

Preferred Experience and Knowledge (Good to Have):

# Competency

1. Good knowledge of traditional ETL tools (Informatica, Talend, etc.) and database technologies (Oracle, MySQL, SQL Server, Postgres), with hands-on experience

2. Knowledge of data governance processes (security, lineage, catalog) and tools like Collibra, Alation, etc.

3. Knowledge of distributed messaging frameworks like ActiveMQ / RabbitMQ / Solace, search & indexing, and microservices architectures

4. Performance tuning and optimization of data pipelines

5. CI/CD – Infra provisioning on cloud, auto build & deployment pipelines, code quality

6. Cloud data specialty and other related Big Data technology certifications

Personal Attributes:

• Strong written and verbal communication skills

• Articulation skills

• Good team player

• Self-starter who requires minimal oversight

• Ability to prioritize and manage multiple tasks

• Process orientation and the ability to define and set up processes

Senior Data Engineer (L1)

at Publicis Sapient

10 recruiters

Posted by Mohit Singh

Bengaluru (Bangalore), Gurugram, Pune, Hyderabad, Noida

4 - 10 yrs

Best in industry

PySpark

Data engineering

Big Data

Hadoop

Spark

Publicis Sapient Overview:

Job Summary:

As Senior Associate L1 in Data Engineering, you will create technical designs and implement components for data engineering solutions. You will utilize a deep understanding of data integration and big data design principles to create custom solutions or implement packaged solutions, and you will independently drive design discussions to ensure the overall health of the solution.

The role requires a hands-on technologist with a strong programming background in Java / Scala / Python and experience in data ingestion, integration and wrangling, computation, and analytics pipelines, as well as exposure to Hadoop ecosystem components. Hands-on knowledge of at least one of the AWS, GCP or Azure cloud platforms is preferred.

Role & Responsibilities:

Job Title: Senior Associate L1 – Data Engineering

Your role is focused on Design, Development and delivery of solutions involving:

• Data Ingestion, Integration and Transformation

• Data Storage and Computation Frameworks, Performance Optimizations

• Analytics & Visualizations

• Infrastructure & Cloud Computing

• Data Management Platforms

• Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time mode

• Build functionality for data analytics, search and aggregation

Experience Guidelines:

Mandatory Experience and Competencies:

# Competency

1. Overall 3.5+ years of IT experience, with 1.5+ years in data-related technologies

2. Minimum 1.5 years of experience in Big Data technologies

3. Strong experience in at least one of the programming languages Java, Scala or Python (Java preferred)

4. Hands-on working knowledge of NoSQL and MPP data platforms like HBase, MongoDB, Cassandra, AWS Redshift, Azure SQL DW, GCP BigQuery, etc.

Preferred Experience and Knowledge (Good to Have):

# Competency

1. Good knowledge of traditional ETL tools (Informatica, Talend, etc.) and database technologies (Oracle, MySQL, SQL Server, Postgres), with hands-on experience

2. Knowledge of data governance processes (security, lineage, catalog) and tools like Collibra, Alation, etc.

3. Knowledge of distributed messaging frameworks like ActiveMQ / RabbitMQ / Solace, search & indexing, and microservices architectures

4. Performance tuning and optimization of data pipelines

5. CI/CD – Infra provisioning on cloud, auto build & deployment pipelines, code quality

6. Working knowledge of data platform services on at least one cloud platform, including IAM and data security

7. Cloud data specialty and other related Big Data technology certifications

Personal Attributes:

• Strong written and verbal communication skills

• Articulation skills

• Good team player

• Self-starter who requires minimal oversight

• Ability to prioritize and manage multiple tasks

• Process orientation and the ability to define and set up processes

Data engineer

at Mitibase

Posted by Vaidehi Ghangurde

Pune

2 - 4 yrs

₹6L - ₹8L / yr

Vue.js

AngularJS (1.x)

React.js

Angular (2+)

Javascript

· The Objective:

You will play a crucial role in designing, implementing, and maintaining our data infrastructure, running tests and updating the systems.

· Job function and requirements

o Expert in Python, Pandas and NumPy, with knowledge of Python web frameworks such as Django and Flask.

o Able to integrate multiple data sources and databases into one system.

o Basic understanding of frontend technologies like HTML, CSS, JavaScript.

o Able to build data pipelines.

o Strong unit-testing and debugging skills.

o Understanding of the fundamental design principles behind a scalable application.

o Good understanding of RDBMS databases such as MySQL or PostgreSQL.

o Able to analyze and transform raw data.

· About us

Mitibase helps companies find the most relevant warm prospects every month, and then helps their teams act on them with automation. We do so by automatically tracking key accounts and contacts for job changes and relationship triggers, and surfacing them as warm leads in your sales pipeline.

Big Data developer

one of the world's leading multinational investment banks

Agency job

via HiyaMee by Lithin Raj

Pune

5 - 9 yrs

₹5L - ₹15L / yr

PySpark

Data engineering

Big Data

Hadoop

Spark

This role is for a developer with strong core application or system programming skills in Scala and Java, and good exposure to concepts and/or technology across the broader spectrum. Enterprise Risk Technology covers a variety of existing systems and green-field projects.
Full-stack Hadoop development experience with Scala.
Full-stack Java development experience covering Core Java (including JDK 1.8) and a good understanding of design patterns.
Requirements:
• Strong hands-on development in Java technologies.
• Strong hands-on development in Hadoop technologies like Spark and Scala, and experience with Avro.
• Participation in product feature design and documentation
• Requirement break-up, ownership and implementation.
• Product BAU deliveries and Level 3 production defect fixes.
Qualifications & Experience
• Degree holder in a numerate subject
• Hands-on experience with Hadoop, Spark, Scala, Impala, Avro and messaging such as Kafka
• Experience across a core compiled language – Java
• Proficiency in Java-related frameworks like Spring, Hibernate, JPA
• Hands-on experience with JDK 1.8 and a strong skill set covering Collections and Multithreading, with experience working on distributed applications.
• Strong hands-on development track record with end-to-end development cycle involvement
• Good exposure to computational concepts
• Good communication and interpersonal skills
• Working knowledge of risk and derivatives pricing (optional)
• Proficiency in SQL (PL/SQL), data modelling.
• Understanding of Hadoop architecture and the Scala programming language is good to have.

Data Engineer

consulting & implementation services in the area of Oil & Gas, Mining and Manufacturing Industry

Agency job

via Jobdost by Sathish Kumar

Ahmedabad, Hyderabad, Pune, Delhi

5 - 7 yrs

₹18L - ₹25L / yr

AWS Lambda

AWS Simple Notification Service (SNS)

AWS Simple Queuing Service (SQS)

Python

PySpark

Data Engineer

Required skill set: AWS GLUE, AWS LAMBDA, AWS SNS/SQS, AWS ATHENA, SPARK, SNOWFLAKE, PYTHON

Mandatory Requirements 

Experience in AWS Glue
Experience in Apache Parquet 
Proficient in AWS S3 and data lake 
Knowledge of Snowflake
Understanding of file-based ingestion best practices.
Scripting languages: Python & PySpark

CORE RESPONSIBILITIES

Create and manage cloud resources in AWS 
Ingest data from different sources that expose data through different technologies, such as RDBMS, REST HTTP APIs, flat files, streams, and time-series data from various proprietary systems. Implement data ingestion and processing with the help of Big Data technologies.
Process and transform data using technologies such as Spark and cloud services. You will need to understand your part of the business logic and implement it using the language supported by the base data platform.
Develop automated data quality checks to make sure the right data enters the platform, and verify the results of the calculations.
Develop an infrastructure to collect, transform, combine and publish/distribute customer data.
Define process improvement opportunities to optimize data collection, insights and displays.
Ensure data and results are accessible, scalable, efficient, accurate, complete and flexible 
Identify and interpret trends and patterns from complex data sets 
Construct a framework utilizing data visualization tools and techniques to present consolidated analytical and actionable results to relevant stakeholders. 
Key participant in regular Scrum ceremonies with the agile teams  
Proficient at developing queries, writing reports and presenting findings 
Mentor junior members and bring best industry practices

 QUALIFICATIONS

5-7+ years’ experience as a data engineer in consumer finance or an equivalent industry (consumer loans, collections, servicing, optional products, and insurance sales)
Strong background in math, statistics, computer science, data science or a related discipline
Advanced knowledge of at least one language: Java, Scala, Python, C#
Production experience with: HDFS, YARN, Hive, Spark, Kafka, Oozie / Airflow, Amazon Web Services (AWS), Docker / Kubernetes, Snowflake  
Proficient with
Data mining/programming tools (e.g. SAS, SQL, R, Python)
Database technologies (e.g. PostgreSQL, Redshift, Snowflake and Greenplum)
Data visualization (e.g. Tableau, Looker, MicroStrategy)
Comfortable learning about and deploying new technologies and tools. 
Organizational skills and the ability to handle multiple projects and priorities simultaneously and meet established deadlines. 
Good written and oral communication skills and ability to present results to non-technical audiences 
Knowledge of business intelligence and analytical tools, technologies and techniques.

Familiarity and experience in the following is a plus: 

AWS certification
Spark Streaming 
Kafka Streaming / Kafka Connect 
ELK Stack 
Cassandra / MongoDB 
CI/CD: Jenkins, GitLab, Jira, Confluence and other related tools

Data Engineer

at GradMener Technology Pvt. Ltd.

Posted by Soni Jagwani

Pune, Chennai

5 - 9 yrs

₹15L - ₹20L / yr

Scala

PySpark

Spark

SQL Azure

Hadoop

5+ years of experience in a Data Engineering role on cloud environment

Must have good experience in Scala/PySpark (preferably in a Databricks environment)

Extensive experience with Transact-SQL.
Experience in Databricks/Spark.

Strong experience in data warehouse projects
Expertise in database development projects with ETL processes.
Manage and maintain data engineering pipelines

Develop batch processing, streaming and integration solutions
Experienced in building and operationalizing large-scale enterprise data solutions and applications

Using one or more of Azure data and analytics services in combination with custom solutions
Azure Data Lake, Azure SQL DW (Synapse), and SQL Database products or equivalent products from other cloud services providers

In-depth understanding of data management (e.g. permissions, security, and monitoring).
Cloud repositories, e.g. Azure, GitHub, Git
Experience in an agile environment (Azure DevOps preferred).

Good to have

Manage source data access security
Automate Azure Data Factory pipelines
Continuous Integration/Continuous deployment (CICD) pipelines, Source Repositories
Experience in implementing and maintaining CICD pipelines
Power BI understanding, Delta Lakehouse architecture
Knowledge of software development best practices.
Excellent analytical and organization skills.
Effective working in a team as well as working independently.
Strong written and verbal communication skills.
Expertise in database development projects and ETL processes.

Data Architect (SG0601)

at EnterpriseMinds

2 recruiters

Posted by phani kalyan

Pune

9 - 14 yrs

₹20L - ₹40L / yr

Spark

Hadoop

Big Data

Data engineering

PySpark

Job Id: SG0601

Hi,

Enterprise Minds is looking for a Data Architect for its Pune location.

Req Skills:
Python, PySpark, Hadoop, Java, Scala

Big data developer

Persistent System Ltd

Agency job

via Milestone Hr Consultancy by Haina khan

Pune, Bengaluru (Bangalore), Hyderabad

4 - 9 yrs

₹8L - ₹27L / yr

Python

PySpark

Amazon Web Services (AWS)

Spark

Scala

Greetings.

We have an urgent requirement for a Data Engineer/Sr Data Engineer at a reputed MNC.

Exp: 4-9yrs

Location: Pune/Bangalore/Hyderabad

Skills: Candidates should have either Python + AWS, PySpark + AWS, or Spark + Scala.

Big data Developer

at Persistent Systems

1 recruiter

Agency job

via Milestone Hr Consultancy by Haina khan

Pune, Bengaluru (Bangalore), Hyderabad, Nagpur

4 - 9 yrs

₹4L - ₹15L / yr

Spark

Hadoop

Big Data

Data engineering

PySpark

Greetings.

We have an urgent requirement for Big Data Developer profiles at a reputed MNC.

Location: Pune/Bangalore/Hyderabad/Nagpur
Experience: 4-9yrs

Skills: PySpark + AWS, or Spark + Scala + AWS, or Python + AWS

Big Data Engineer

Hiring for one of the MNC for India location

Agency job

via Natalie Consultants by Rahul Kumar

Gurugram, Pune, Bengaluru (Bangalore), Delhi, Noida, Ghaziabad, Faridabad

2 - 9 yrs

₹8L - ₹20L / yr

Python

Hadoop

Big Data

Spark

Data engineering

Key Responsibilities (Data Developer: Python, Spark)

Exp: 2 to 9 yrs

Development of data platforms, integration frameworks, processes, and code.

Develop and deliver APIs in Python or Scala for Business Intelligence applications built using a range of web languages

Develop comprehensive automated tests for features via end-to-end integration tests, performance tests, acceptance tests and unit tests.

Elaborate stories in a collaborative agile environment (SCRUM or Kanban)

Familiarity with cloud platforms like GCP, AWS or Azure.

Experience with large data volumes.

Familiarity with writing rest-based services.

Experience with distributed processing and systems

Experience with Hadoop / Spark toolsets

Experience with relational database management systems (RDBMS)

Experience with Data Flow development

Knowledge of Agile and associated development techniques including:

Pyspark Lead/Pyspark Dev

at Virtusa

2 recruiters

Agency job

via Response Informatics by Anupama Lavanya Uppala

Chennai, Bengaluru (Bangalore), Mumbai, Hyderabad, Pune

3 - 10 yrs

₹10L - ₹25L / yr

PySpark

Python

Minimum 1 year of relevant experience in PySpark (mandatory)
Hands-on experience in developing, testing, deploying, maintaining and improving data integration pipelines in an AWS cloud environment is an added plus
Ability to play a lead role and independently manage a 3-5 member PySpark development team
EMR, Python and PySpark are mandatory.
Knowledge and awareness of working with AWS cloud technologies like Apache Spark, Glue, Kafka, Kinesis, Lambda, S3, Redshift and RDS

Data Engineer

Cloud infrastructure solutions and support company (SE1)

Agency job

via Multi Recruit by Ranjini A R

Pune

2 - 6 yrs

₹12L - ₹16L / yr

SQL

ETL

Data engineering

Big Data

Java

+2 more

Design, create, test, and maintain data pipeline architecture in collaboration with the Data Architect.
Build the infrastructure required for extraction, transformation, and loading of data from a wide variety of data sources using Java, SQL, and Big Data technologies.
Support the translation of data needs into technical system requirements. Support in building complex queries required by the product teams.
Build data pipelines that clean, transform, and aggregate data from disparate sources
Develop, maintain and optimize ETLs to increase data accuracy, data stability, data availability, and pipeline performance.
Engage with Product Management and Business to deploy and monitor products/services on cloud platforms.
Stay up to date with advances in data persistence and big data technologies, and run pilots to design a data architecture that scales with the growing consumer-experience data sets.
Handle data integration, consolidation, and reconciliation activities for digital consumer / medical products.

Job Qualifications:

Bachelor’s or master’s degree in Computer Science, Information Management, Statistics, or a related field
5+ years of experience in the Consumer or Healthcare industry in an analytical role, with a focus on building data pipelines, querying and analyzing data, and clearly presenting analyses to members of the data science team.
Technical expertise in data models and data mining.
Hands-on knowledge of programming languages such as Java, Python, R, and Scala.
Strong knowledge of big data tools such as Snowflake, AWS Redshift, Hadoop, MapReduce, etc.
Knowledge of tools such as AWS Glue, S3, AWS EMR, streaming data pipelines, and Kafka/Kinesis is desirable.
Hands-on knowledge of SQL and NoSQL database design.
Knowledge of CI/CD for building and hosting solutions.
AWS certification is an added advantage.
Strong knowledge of visualization tools such as Tableau or QlikView is an added advantage.
A team player capable of working across cross-functional teams to implement project requirements. Experience in technical requirements gathering and documentation.
Ability to work effectively and independently in a fast-paced agile environment with tight deadlines.
A flexible, pragmatic, and collaborative team player with the innate ability to engage with data architects, analysts, and scientists.

Data Engineer For Python

at A2Tech Consultants

3 recruiters

Posted by Dhaval B

Pune

4 - 12 yrs

₹6L - ₹15L / yr

Data engineering

Data Engineer

ETL

Spark

Apache Kafka

+5 more

We are looking for a smart candidate with:

Strong Python coding and OOP skills
Should have worked on big data product architecture
Should have worked with at least one SQL-based database, such as MySQL or PostgreSQL, and at least one NoSQL database, such as Cassandra or Elasticsearch
Hands-on experience with Spark APIs such as RDD, DataFrame, and Dataset
Experience developing ETL for data products
Working knowledge of performance optimization, optimal resource utilization, parallelism, and tuning of Spark jobs
Working knowledge of file formats: CSV, JSON, XML, Parquet, ORC, Avro
Good to have: working knowledge of at least one analytical database, such as Druid, MongoDB, Apache Hive, etc.
Experience handling real-time data feeds (working knowledge of Apache Kafka or a similar tool is good to have)

Key Skills:

Python and Scala (Optional), Spark / PySpark, Parallel programming

Big Data Lead Architecture

at DataMetica

7 recruiters

Posted by Nikita Aher

Pune, Hyderabad

7 - 12 yrs

₹12L - ₹33L / yr

Big Data

Hadoop

Spark

Apache Spark

Apache Hive

+3 more

Job description

Role : Lead Architecture (Spark, Scala, Big Data/Hadoop, Java)

Primary Location : India-Pune, Hyderabad

Experience : 7 - 12 Years

Management Level: 7

Joining Time: Immediate Joiners are preferred

Attend requirements gathering workshops, estimation discussions, design meetings and status review meetings
Experience in solution design and solution architecture for the data engineering model, to build and implement Big Data projects on-premises and in the cloud.
Align the architecture with business requirements and stabilize the developed solution
Ability to build prototypes to demonstrate the technical feasibility of your vision
Professional experience facilitating and leading solution design, architecture and delivery planning activities for data intensive and high throughput platforms and applications
Ability to benchmark systems, analyze system bottlenecks, and propose solutions to eliminate them
Able to help programmers and project managers with the design, planning, and governance of implementation projects of any kind.
Develop, construct, test, and maintain architectures, and run sprints for the development and rollout of functionality
Data analysis and code development experience, ideally in Spark, Hive, Hadoop, Java, Python, and PySpark
Execute projects of various types, i.e., design, development, implementation, and migration of functional analytics models/business logic across architecture approaches
Work closely with Business Analysts to understand the core business problems and deliver efficient IT solutions for the product
Deploy sophisticated analytics programs or code using any cloud application.

Perks and Benefits We Provide!

Working with highly technical, passionate, mission-driven people
Subsidized Meals & Snacks
Flexible Schedule
Approachable leadership
Access to various learning tools and programs
Pet Friendly
Certification Reimbursement Policy
Check out more about us on our website below!

www.datametica.com

Sr Data Engineer

at Infogain

Agency job

via Technogen India PvtLtd by RAHUL BATTA

Bengaluru (Bangalore), Pune, Noida, NCR (Delhi | Gurgaon | Noida)

7 - 10 yrs

₹20L - ₹25L / yr

Data engineering

Python

SQL

Spark

PySpark

+10 more

Sr. Data Engineer:

Core Skills: Data Engineering, Big Data, PySpark, Spark SQL, and Python

Candidates with a prior Palantir Foundry or Clinical Trial Data Model background are preferred

Major accountabilities:

Responsible for data engineering, Foundry data pipeline creation, Foundry analysis and reporting, Slate application development, reusable code development and management, and integrating internal or external systems with Foundry for high-quality data ingestion.
Has a good understanding of the Foundry platform landscape and its capabilities
Performs the data analysis required to troubleshoot data-related issues and assists in resolving them.
Defines company data assets (data models) and the PySpark and Spark SQL jobs that populate those data models.
Designs data integrations and a data quality framework.
Designs and implements integrations with internal and external systems and the F1 AWS platform using Foundry Data Connector or Magritte Agent
Collaborates with data scientists, data analysts, and technology teams to document and leverage their understanding of the Foundry integration with different data sources; actively participates in agile work practices
Coordinates with the Quality Engineer to ensure that all quality controls, naming conventions, and best practices have been followed

Desired Candidate Profile :

Strong data engineering background
Experience with Clinical Data Model is preferred
Experience in

SQL Server, Postgres, Cassandra, Hadoop, and Spark for distributed data storage and parallel computing
Java and Groovy for our back-end applications and data integration tools
Python for data processing and analysis
Cloud infrastructure based on AWS EC2 and S3

7+ years of IT experience, 2+ years of experience on the Palantir Foundry platform, and 4+ years of experience on Big Data platforms
5+ years of Python and PySpark development experience
Strong troubleshooting and problem-solving skills
BTech or master's degree in computer science or a related technical field
Experience designing, building, and maintaining big data pipeline systems
Hands-on experience with the Palantir Foundry platform and Foundry custom app development
Able to design and implement data integrations between Palantir Foundry and external apps based on the Foundry data connector framework
Hands-on experience in programming languages, primarily Python, R, Java, and Unix shell scripting
Hands-on experience with AWS/Azure cloud platforms and stacks
Strong in API-based architecture and concepts; able to do quick PoCs using API integration and development
Knowledge of machine learning and AI
Skill and comfort working in a rapidly changing environment with dynamic objectives and iteration with users.

Demonstrated ability to continuously learn, work independently, and make decisions with minimal supervision

Azure Data Engineer

at Fragma Data Systems

8 recruiters

Posted by Evelyn Charles

Remote, Bengaluru (Bangalore), Hyderabad, Chennai, Mumbai, Pune

8 - 15 yrs

₹16L - ₹28L / yr

PySpark

SQL Azure

azure synapse

Windows Azure

Azure Data Engineer

+3 more

Technology Skills:

Building and operationalizing large-scale enterprise data solutions and applications using one or more Azure data and analytics services in combination with custom solutions: Azure Synapse/Azure SQL DWH, Azure Data Lake, Azure Blob Storage, Spark, HDInsight, Databricks, Cosmos DB, Event Hubs/IoT Hub.
Experience migrating on-premises data warehouses to data platforms on the Azure cloud.
Designing and implementing data engineering, ingestion, and transformation functions

Good to Have:

Experience with Azure Analysis Services
Experience in Power BI
Experience with third-party solutions such as Attunity/StreamSets and Informatica
Experience with PreSales activities (Responding to RFPs, Executing Quick POCs)
Capacity Planning and Performance Tuning on Azure Stack and Spark.

Get to hear about interesting companies hiring right now

Follow Cutshort

Why apply via Cutshort?

Connect with actual hiring teams and get their fast response. No spam.