PySpark jobs

50+ PySpark Jobs in India

Apply to 50+ PySpark Jobs on CutShort.io. Find your next job, effortlessly. Browse PySpark Jobs and apply today!

Databricks Admin

One of the reputed clients in India

Agency job

via Evalutech Prospect Services Private Limited by HR Evalutech

Bengaluru (Bangalore), Mumbai, Delhi, Gurugram, Noida, Hyderabad, Pune

6 - 8 yrs

₹12L - ₹13L / yr

Amazon Web Services (AWS)

Python

PySpark

Our client is looking to hire a Databricks Admin immediately.

This is pan-India bulk hiring.

Minimum of 6-8+ years of experience with Databricks, PySpark/Python, and AWS.

AWS is a must-have.

A notice period of 15-30 days is preferred.

Share profiles at hr at etpspl dot com

Please refer/share our email with friends and colleagues who are looking for a job.

Data Engineer

at Moative

3 candid answers

Posted by Eman Khan

Chennai

3 - 5 yrs

₹10L - ₹25L / yr

Python

PySpark

Scala

Data engineering

ETL

About Moative

Moative, an Applied AI company, designs and builds transformation AI solutions for traditional industries in energy, utilities, healthcare & lifesciences, and more. Through Moative Labs, we build AI micro-products and launch AI startups with partners in vertical markets that align with our theses.

Our Past: We have built and sold two companies, one of which was an AI company. Our founders and leaders are Math PhDs, Ivy League University Alumni, Ex-Googlers, and successful entrepreneurs.

Our Team: Our team of 20+ employees consists of data scientists, AI/ML engineers, and mathematicians, including PhDs, from top engineering and research institutes such as IITs, CERN, IISc, and UZH. Our team includes academicians, IBM Research Fellows, and former founders.

Work you’ll do

As a Data Engineer, you will work on data architecture, large-scale processing systems, and data flow management. You will build and maintain optimal data architecture and data pipelines, assemble large, complex data sets, and ensure that data is readily available to data scientists, analysts, and other users. In close collaboration with ML engineers, data scientists, and domain experts, you’ll deliver robust, production-grade solutions that directly impact business outcomes. Ultimately, you will be responsible for developing and implementing systems that optimize the organization’s data use and data quality.

Responsibilities

Create and maintain optimal data architecture and data pipelines on cloud infrastructure (such as AWS/Azure/GCP)
Assemble large, complex data sets that meet functional / non-functional business requirements
Identify, design, and implement internal process improvements
Build the pipeline infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources
Support development of analytics that utilize the data pipeline to provide actionable insights into key business metrics
Work with stakeholders to assist with data-related technical issues and support their data infrastructure needs

Who you are

You are a passionate and results-oriented engineer who understands the importance of data architecture and data quality to impact solution development, enhance products, and ultimately improve business applications. You thrive in dynamic environments and are comfortable navigating ambiguity. You possess a strong sense of ownership and are eager to take initiative, advocating for your technical decisions while remaining open to feedback and collaboration.

You have experience in developing and deploying data pipelines to support real-world applications. You have a good understanding of data structures and are excellent at writing clean, efficient code to extract, create and manage large data sets for analytical uses. You have the ability to conduct regular testing and debugging to ensure optimal data pipeline performance. You are excited at the possibility of contributing to intelligent applications that can directly impact business services and make a positive difference to users.

Skills & Requirements

3+ years of hands-on experience as a data engineer, data architect or similar role, with a good understanding of data structures and data engineering.
Solid knowledge of cloud infrastructure and data-related services on AWS (EC2, EMR, RDS, Redshift) and/or Azure.
Advanced knowledge of SQL, including writing complex queries, stored procedures, views, etc.
Strong experience with data pipeline and workflow management tools (such as Luigi, Airflow).
Experience with common relational SQL, NoSQL and Graph databases.
Strong experience with scripting languages: Python, PySpark, Scala, etc.
Practical experience with basic DevOps concepts: CI/CD, containerization (Docker, Kubernetes), etc.
Experience with big data tools (Spark, Kafka, etc.) and stream processing.
Excellent communication skills to collaborate with colleagues from both technical and business backgrounds, discuss and convey ideas and findings effectively.
Ability to analyze complex problems, think critically for troubleshooting and develop robust data solutions.
Ability to identify and tackle issues efficiently and proactively, conduct thorough research and collaborate to find long-term, scalable solutions.

Working at Moative

Moative is a young company, but we believe strongly in thinking long-term, while acting with urgency. Our ethos is rooted in innovation, efficiency and high-quality outcomes. We believe the future of work is AI-augmented and boundaryless. Here are some of our guiding principles:

Think in decades. Act in hours. As an independent company, our moat is time. While our decisions are for the long-term horizon, our execution will be fast – measured in hours and days, not weeks and months.
Own the canvas. Throw yourself in to build, fix or improve – anything that isn’t done right, irrespective of who did it. Be selfish about improving across the organization – because once the rot sets in, we waste years in surgery and recovery.
Use data or don’t use data. Use data where you ought to but not as a ‘cover-my-back’ political tool. Be capable of making decisions with partial or limited data. Get better at intuition and pattern-matching. Whichever way you go, be mostly right about it.
Avoid work about work. Process creeps on purpose, unless we constantly question it. We are deliberate about committing to rituals that take time away from the actual work. We truly believe that a meeting that could be an email, should be an email and you don’t need a person with the highest title to say that out loud.
High revenue per person. We work backwards from this metric. Our default is to automate instead of hiring. We multi-skill our people to own more outcomes than hiring someone who has less to do. We don’t like squatting and hoarding that comes in the form of hiring for growth. High revenue per person comes from high quality work from everyone. We demand it.

If this role and our work are of interest to you, please apply. We encourage you to apply even if you believe you do not meet all the requirements listed above.

That said, you should demonstrate that you are in the 90th percentile or above. This may mean that you have studied in top-notch institutions, won competitions that are intellectually demanding, built something of your own, or been rated as an outstanding performer by your current or previous employers.

The position is based out of Chennai. Our work currently involves significant in-person collaboration and we expect you to work out of our offices in Chennai.

Sr. Big Data Engineer

at Inncircles

Posted by Gangadhar M

Hyderabad

3 - 5 yrs

Best in industry

PySpark

Spark

Python

ETL

Amazon EMR

We are looking for a highly skilled Sr. Big Data Engineer with 3-5 years of experience in building large-scale data pipelines, real-time streaming solutions, and batch/stream processing systems. The ideal candidate should be proficient in Spark, Kafka, Python, and AWS Big Data services, with hands-on experience in implementing CDC (Change Data Capture) pipelines and integrating multiple data sources and sinks.

Responsibilities

Design, develop, and optimize batch and streaming data pipelines using Apache Spark and Python.
Build and maintain real-time data ingestion pipelines leveraging Kafka and AWS Kinesis.
Implement CDC (Change Data Capture) pipelines using Kafka Connect, Debezium or similar frameworks.
Integrate data from multiple sources and sinks (databases, APIs, message queues, file systems, cloud storage).
Work with AWS Big Data ecosystem: Glue, EMR, Kinesis, Athena, S3, Lambda, Step Functions.
Ensure pipeline scalability, reliability, and performance tuning of Spark jobs and EMR clusters.
Develop data transformation and ETL workflows in AWS Glue and manage schema evolution.
Collaborate with data scientists, analysts, and product teams to deliver reliable and high-quality data solutions.
Implement monitoring, logging, and alerting for critical data pipelines.
Follow best practices for data security, compliance, and cost optimization in cloud environments.

Required Skills & Experience

Programming: Strong proficiency in Python (PySpark, data frameworks, automation).
Big Data Processing: Hands-on experience with Apache Spark (batch & streaming).
Messaging & Streaming: Proficient in Kafka (brokers, topics, partitions, consumer groups) and AWS Kinesis.
CDC Pipelines: Experience with Debezium / Kafka Connect / custom CDC frameworks.
AWS Services: AWS Glue, EMR, S3, Athena, Lambda, IAM, CloudWatch.
ETL/ELT Workflows: Strong knowledge of data ingestion, transformation, partitioning, schema management.
Databases: Experience with relational databases (MySQL, Postgres, Oracle) and NoSQL (MongoDB, DynamoDB, Cassandra).
Data Formats: JSON, Parquet, Avro, ORC, Delta/Iceberg/Hudi.
Version Control & CI/CD: Git, GitHub/GitLab, Jenkins, or CodePipeline.
Monitoring/Logging: CloudWatch, Prometheus, ELK/OpenSearch.
Containers & Orchestration (nice-to-have): Docker, Kubernetes, Airflow/Step Functions for workflow orchestration.

Preferred Qualifications

Bachelor’s or Master’s degree in Computer Science, Data Engineering, or related field.
Experience in large-scale data lake / lakehouse architectures.
Knowledge of data warehousing concepts and query optimisation.
Familiarity with data governance, lineage, and cataloging tools (Glue Data Catalog, Apache Atlas).
Exposure to ML/AI data pipelines is a plus.

Tools & Technologies (must-have exposure)

Big Data & Processing: Apache Spark, PySpark, AWS EMR, AWS Glue
Streaming & Messaging: Apache Kafka, Kafka Connect, Debezium, AWS Kinesis
Cloud & Storage: AWS (S3, Athena, Lambda, IAM, CloudWatch)
Programming & Scripting: Python, SQL, Bash
Orchestration: Airflow / Step Functions
Version Control & CI/CD: Git, Jenkins/CodePipeline
Data Formats: Parquet, Avro, ORC, JSON, Delta, Iceberg, Hudi

Technical Lead - Python FastAPI

at Technoidentity

Posted by Human Resources

Hyderabad

6 - 12 yrs

₹20L - ₹35L / yr

Python

FastAPI

PySpark

Supercharge Your Career as a Technical Lead - Python at Technoidentity!

Are you ready to solve people challenges that fuel business growth? At Technoidentity, we’re a Data+AI product engineering company that has been building cutting-edge solutions in the FinTech domain for over 13 years, and we’re expanding globally. It’s the perfect time to join our team of tech innovators and leave your mark!

At Technoidentity, we’re a Data + AI product engineering company trusted to deliver scalable and modern enterprise solutions. Join us as a Senior Python Developer and Technical Lead, where you'll guide high-performing engineering teams, design complex systems, and deliver clean, scalable backend solutions using Python and modern data technologies. Your leadership will directly shape the architecture and execution of enterprise projects, with added strength in understanding database logic including PL/SQL and PostgreSQL/AlloyDB.

What’s in it for You?

• Modern Python Stack – Python 3.x, FastAPI, Pandas, NumPy, SQLAlchemy, PostgreSQL/AlloyDB, PL/pgSQL.

• Tech Leadership – Drive technical decision-making, mentor developers, and ensure code quality and scalability.

• Scalable Projects – Architect and optimize data-intensive backend services for high-throughput and distributed systems.

• Engineering Best Practices – Enforce clean architecture, code reviews, testing strategies, and SDLC alignment.

• Cross-Functional Collaboration – Lead conversations across engineering, QA, product, and DevOps to ensure delivery excellence.

What Will You Be Doing?

Technical Leadership

• Lead a team of developers through design, code reviews, and technical mentorship.

• Set architectural direction and ensure scalability, modularity, and code quality.

• Work with stakeholders to translate business goals into robust technical solutions.

Backend Development & Data Engineering

• Design and build clean, high-performance backend services using FastAPI and Python best practices.

• Handle row- and column-level data transformation using Pandas and NumPy.

• Apply data wrangling, cleansing, and preprocessing techniques across microservices and pipelines.

Database & Performance Optimization

• Write performant queries, procedures, and triggers using PostgreSQL and PL/pgSQL.

• Understand legacy logic in PL/SQL and participate in rewriting or modernizing it for PostgreSQL-based systems.

• Tune both backend and database performance, including memory, indexing, and query optimization.

Parallelism & Communication

• Implement multithreading, multiprocessing, and parallel data flows in Python.

• Integrate Kafka, RabbitMQ, or Pub/Sub systems for real-time and async message processing.

Engineering Excellence

• Drive adherence to Agile, Git-based workflows, CI/CD, and DevOps pipelines.

• Promote testing (unit/integration), monitoring, and observability for all backend systems.

• Stay current with Python ecosystem evolution and introduce tools that improve productivity and performance.

What Makes You the Perfect Fit?

• 6–10 years of proven experience in Python development, with strong expertise in designing and delivering scalable backend solutions.

Python Developer

at Wissen Technology

4 recruiters

Posted by Nishita Bangera

Bengaluru (Bangalore)

4 - 8 yrs

Best in industry

Python

SQL

PySpark

Django

Key Responsibilities

Develop and maintain Python-based applications.
Design and optimize SQL queries and databases.
Collaborate with cross-functional teams to define, design, and ship new features.
Write clean, maintainable, and efficient code.
Troubleshoot and debug applications.
Participate in code reviews and contribute to team knowledge sharing.

Qualifications and Required Skills

Strong proficiency in Python programming.
Experience with SQL and database management.
Experience with web frameworks such as Django or Flask.
Knowledge of front-end technologies like HTML, CSS, and JavaScript.
Familiarity with version control systems like Git.
Strong problem-solving skills and attention to detail.
Excellent communication and teamwork skills.

Good to Have Skills

Experience with cloud platforms like AWS or Azure.
Knowledge of containerization technologies like Docker.
Familiarity with continuous integration and continuous deployment (CI/CD) pipelines.

Data Engineer

at Wissen Technology

4 recruiters

Posted by Gagandeep Kaur

Bengaluru (Bangalore), Mumbai, Pune

4 - 7 yrs

Best in industry

Python

PySpark

pandas

Airflow

Data engineering

Wissen Technology is hiring for Data Engineer

About Wissen Technology: At Wissen Technology, we deliver niche, custom-built products that solve complex business challenges across industries worldwide. Founded in 2015, our core philosophy is built around a strong product engineering mindset—ensuring every solution is architected and delivered right the first time. Today, Wissen Technology has a global footprint with 2000+ employees across offices in the US, UK, UAE, India, and Australia. Our commitment to excellence translates into delivering 2X impact compared to traditional service providers. How do we achieve this? Through a combination of deep domain knowledge, cutting-edge technology expertise, and a relentless focus on quality. We don’t just meet expectations—we exceed them by ensuring faster time-to-market, reduced rework, and greater alignment with client objectives. We have a proven track record of building mission-critical systems across industries, including financial services, healthcare, retail, manufacturing, and more. Wissen stands apart through its unique delivery models. Our outcome-based projects ensure predictable costs and timelines, while our agile pods provide clients the flexibility to adapt to their evolving business needs. Wissen leverages its thought leadership and technology prowess to drive superior business outcomes. Our success is powered by top-tier talent. Our mission is clear: to be the partner of choice for building world-class custom products that deliver exceptional impact—the first time, every time.

Job Summary: Wissen Technology is hiring a Data Engineer with expertise in Python, Pandas, Airflow, and Azure Cloud Services. The ideal candidate will have strong communication skills and experience with Kubernetes.

Experience: 4-7 years

Notice Period: Immediate to 15 days

Location: Pune, Mumbai, Bangalore

Mode of Work: Hybrid

Key Responsibilities:

Develop and maintain data pipelines using Python and Pandas.
Implement and manage workflows using Airflow.
Utilize Azure Cloud Services for data storage and processing.
Collaborate with cross-functional teams to understand data requirements and deliver solutions.
Ensure data quality and integrity throughout the data lifecycle.
Optimize and scale data infrastructure to meet business needs.

Qualifications and Required Skills:

Proficiency in Python (Must Have).
Strong experience with Pandas (Must Have).
Expertise in Airflow (Must Have).
Experience with Azure Cloud Services.
Good communication skills.

Good to Have Skills:

Experience with PySpark.
Knowledge of Kubernetes.

Wissen Sites:

Website: http://www.wissen.com
LinkedIn: https://www.linkedin.com/company/wissen-technology
Wissen Leadership: https://www.wissen.com/company/leadership-team/
Wissen Live: https://www.linkedin.com/company/wissen-technology/posts/feedView=All
Wissen Thought Leadership: https://www.wissen.com/articles/

Hiring: Azure Databricks

at Wissen Technology

4 recruiters

Posted by Bipasha Rath

Mumbai, Bengaluru (Bangalore), Pune

3 - 7 yrs

Best in industry

Python

pandas

PySpark

Experience: 3–7 Years

Locations: Pune / Bangalore / Mumbai

Notice Period: Immediate joiners only

Employment Type: Full-time

🛠️ Key Skills (Mandatory):

Python: Strong coding skills for data manipulation and automation.
PySpark: Experience with distributed data processing using Spark.
SQL: Proficient in writing complex queries for data extraction and transformation.
Azure Databricks: Hands-on experience with notebooks, Delta Lake, and MLflow.

Interested candidates please share resume with details below.

Total Experience -

Relevant Experience in Python, PySpark, SQL, Azure Databricks -

Current CTC -

Expected CTC -

Notice period -

Current Location -

Desired Location -

DATA ENGINEER

at Wissen Technology

4 recruiters

Posted by Janane Mohanasankaran

Bengaluru (Bangalore), Pune, Mumbai

7 - 12 yrs

Best in industry

Python

pandas

PySpark

SQL

Data engineering

Wissen Technology is hiring for Data Engineer

About Wissen Technology: At Wissen Technology, we deliver niche, custom-built products that solve complex business challenges across industries worldwide. Founded in 2015, our core philosophy is built around a strong product engineering mindset, ensuring every solution is architected and delivered right the first time. Today, Wissen Technology has a global footprint with 2000+ employees across offices in the US, UK, UAE, India, and Australia. Our commitment to excellence translates into delivering 2X impact compared to traditional service providers. How do we achieve this? Through a combination of deep domain knowledge, cutting-edge technology expertise, and a relentless focus on quality. We don’t just meet expectations; we exceed them by ensuring faster time-to-market, reduced rework, and greater alignment with client objectives. We have a proven track record of building mission-critical systems across industries, including financial services, healthcare, retail, manufacturing, and more. Wissen stands apart through its unique delivery models. Our outcome-based projects ensure predictable costs and timelines, while our agile pods provide clients the flexibility to adapt to their evolving business needs. Wissen leverages its thought leadership and technology prowess to drive superior business outcomes. Our success is powered by top-tier talent. Our mission is clear: to be the partner of choice for building world-class custom products that deliver exceptional impact, the first time, every time.

Job Summary:Wissen Technology is hiring a Data Engineer with a strong background in Python, data engineering, and workflow optimization. The ideal candidate will have experience with Delta Tables, Parquet, and be proficient in Pandas and PySpark.

Experience:7+ years

Location:Pune, Mumbai, Bangalore

Mode of Work:Hybrid

Key Responsibilities:

Develop and maintain data pipelines using Python (Pandas, PySpark).
Optimize data workflows and ensure efficient data processing.
Work with Delta Tables and Parquet for data storage and management.
Collaborate with cross-functional teams to understand data requirements and deliver solutions.
Ensure data quality and integrity throughout the data lifecycle.
Implement best practices for data engineering and workflow optimization.

Qualifications and Required Skills:

Proficiency in Python, specifically with Pandas and PySpark.
Strong experience in data engineering and workflow optimization.
Knowledge of Delta Tables and Parquet.
Excellent problem-solving skills and attention to detail.
Ability to work collaboratively in a team environment.
Strong communication skills.

Good to Have Skills:

Experience with Databricks.
Knowledge of Apache Spark, DBT, and Airflow.
Advanced Pandas optimizations.
Familiarity with PyTest/DBT testing frameworks.

Wissen Sites:

Website: http://www.wissen.com
LinkedIn: https://www.linkedin.com/company/wissen-technology
Wissen Leadership: https://www.wissen.com/company/leadership-team/
Wissen Live: https://www.linkedin.com/company/wissen-technology/posts/feedView=All
Wissen Thought Leadership: https://www.wissen.com/articles/

Wissen | Driving Digital Transformation

A technology consultancy that drives digital innovation by connecting strategy and execution, helping global clients to strengthen their core technology.

PySpark/Scala Developer

at Tata Consultancy Services

2 recruiters

Agency job

via Risk Resources LLP hyd by Jhansi Padiy

Chennai, Hyderabad, Kolkata, Delhi, Pune, Bengaluru (Bangalore)

4 - 10 yrs

₹6L - ₹30L / yr

Scala

PySpark

Spark

Amazon Web Services (AWS)

Job Title: PySpark/Scala Developer

Functional Skills: Experience in Credit Risk/Regulatory risk domain

Technical Skills: Spark, PySpark, Python, Hive, Scala, MapReduce, Unix shell scripting

Good to Have Skills: Exposure to Machine Learning Techniques

Job Description:

5+ years of experience developing, fine-tuning, and implementing programs/applications using Python/PySpark/Scala on a Big Data/Hadoop platform.

Roles and Responsibilities:

a) Work with a leading bank’s Risk Management team on specific projects/requirements pertaining to risk models in consumer and wholesale banking

b) Enhance Machine Learning models using PySpark or Scala

c) Work with Data Scientists to build ML models based on business requirements and follow the ML cycle to deploy them all the way to the production environment

d) Participate in feature engineering, model training, scoring, and retraining

e) Architect data pipelines and automate data ingestion and model jobs

Skills and competencies:

Required:

· Strong analytical skills in conducting sophisticated statistical analysis using bureau/vendor data, customer performance data, and macro-economic data to solve business problems

· Working experience with PySpark and Scala to develop code that validates and implements models in Credit Risk/Banking

· Experience with distributed systems such as Hadoop/MapReduce, Spark, streaming data processing, and cloud architecture

· Familiarity with machine learning frameworks and libraries (such as scikit-learn, SparkML, TensorFlow, PyTorch, etc.)

· Experience in systems integration, web services, and batch processing

· Experience in migrating code to PySpark/Scala is a big plus

· Ability to act as a liaison, conveying the information needs of the business to IT and data constraints to the business, with equal attention to business strategy and IT strategy, business processes, and workflow

· Flexibility in approach and thought process

· Willingness to learn and understand the periodic changes in regulatory requirements as per the FED

PySpark/Scala Developer

at Tata Consultancy Services

2 recruiters

Agency job

via Risk Resources LLP hyd by susmitha o

Bengaluru (Bangalore), Hyderabad, Pune, Delhi, Kolkata, Chennai

5 - 8 yrs

₹7L - ₹30L / yr

Scala

Python

PySpark

Apache Hive

Spark

+3 more

Skills and competencies:

Required:

· Strong analytical skills in conducting sophisticated statistical analysis using bureau/vendor data, customer performance data, and macro-economic data to solve business problems

· Working experience with PySpark and Scala to develop code that validates and implements models in Credit Risk/Banking

· Experience with distributed systems such as Hadoop/MapReduce, Spark, streaming data processing, and cloud architecture

· Familiarity with machine learning frameworks and libraries (such as scikit-learn, SparkML, TensorFlow, PyTorch, etc.)

· Experience in systems integration, web services, and batch processing

· Experience in migrating code to PySpark/Scala is a big plus

· Ability to act as a liaison, conveying the information needs of the business to IT and data constraints to the business, with equal attention to business strategy and IT strategy, business processes, and workflow

· Flexibility in approach and thought process

· Willingness to learn and understand the periodic changes in regulatory requirements as per the FED

AWS Data Engineer

at Deqode

1 recruiter

Posted by Shraddha Katare

Pune, Bengaluru (Bangalore)

5 - 8 yrs

₹5L - ₹13L / yr

Amazon Web Services (AWS)

databricks

PySpark

SQL

Profile: AWS Data Engineer

Mandatory skills: AWS + Databricks + PySpark + SQL

Location: Bangalore/Pune/Hyderabad/Chennai/Gurgaon

Notice Period: Immediate

Key Requirements:

Design, build, and maintain scalable data pipelines to collect, process, and store data from multiple datasets.
Optimize data storage solutions for better performance, scalability, and cost-efficiency.
Develop and manage ETL/ELT processes to transform data as per schema definitions, apply slicing and dicing, and make it available for downstream jobs and other teams.
Collaborate closely with cross-functional teams to understand system and product functionalities, accelerate feature development, and capture evolving data requirements.
Engage with stakeholders to gather requirements and create curated datasets for downstream consumption and end-user reporting.
Automate deployment and CI/CD processes using GitHub workflows, identifying areas to reduce manual, repetitive work.
Ensure compliance with data governance policies, privacy regulations, and security protocols.
Utilize cloud platforms like AWS and work on Databricks for data processing with S3 Storage.
Work with distributed systems and big data technologies such as Spark, SQL, and Delta Lake.
Integrate with SFTP to push data securely from Databricks to remote locations.
Analyze and interpret Spark query execution plans to fine-tune queries for faster and more efficient processing.
Strong problem-solving and troubleshooting skills in large-scale distributed systems.

Data Engineer

at Data Axle

2 candid answers

Posted by Nikita Sinha

Pune

3 - 6 yrs

Upto ₹28L / yr (Varies)

Python

PySpark

SQL

Amazon Web Services (AWS)

databricks

+1 more

• Data Pipeline Development: Design and implement scalable data pipelines using PySpark and Databricks on AWS cloud infrastructure

• ETL/ELT Operations: Extract, transform, and load data from various sources using Python, SQL, and PySpark for batch and streaming data processing

• Databricks Platform Management: Develop and maintain data workflows, notebooks, and clusters in Databricks environment for efficient data processing

• AWS Cloud Services: Utilize AWS services including S3, Glue, EMR, Redshift, Kinesis, and Lambda for comprehensive data solutions

• Data Transformation: Write efficient PySpark scripts and SQL queries to process large-scale datasets and implement complex business logic

• Data Quality & Monitoring: Implement data validation, quality checks, and monitoring solutions to ensure data integrity across pipelines

• Collaboration: Work closely with data scientists, analysts, and other engineering teams to support analytics and machine learning initiatives

• Performance Optimization: Monitor and optimize data pipeline performance, query efficiency, and resource utilization in Databricks and AWS environments

Required Qualifications:

• Experience: 3+ years of hands-on experience in data engineering, ETL development, or related field

• PySpark Expertise: Strong proficiency in PySpark for large-scale data processing and transformations

• Python Programming: Solid Python programming skills with experience in data manipulation libraries (pandas, etc.)

• SQL Proficiency: Advanced SQL skills including complex queries, window functions, and performance optimization

• Databricks Experience: Hands-on experience with Databricks platform, including notebook development, cluster management, and job scheduling

• AWS Cloud Services: Working knowledge of core AWS services (S3, Glue, EMR, Redshift, IAM, Lambda)

• Data Modeling: Understanding of dimensional modeling, data warehousing concepts, and ETL best practices

• Version Control: Experience with Git and collaborative development workflows

Preferred Qualifications:

• Education: Bachelor's degree in Computer Science, Engineering, Mathematics, or related technical field

• Advanced AWS: Experience with additional AWS services like Athena, QuickSight, Step Functions, and CloudWatch

• Data Formats: Experience working with various data formats (JSON, Parquet, Avro, Delta Lake)

• Containerization: Basic knowledge of Docker and container orchestration

• Agile Methodology: Experience working in Agile/Scrum development environments

• Business Intelligence Tools: Exposure to BI tools like Tableau, Power BI, or Databricks SQL Analytics

Technical Skills Summary:

Core Technologies:

PySpark & Spark SQL
Python (pandas, boto3)
SQL (PostgreSQL, MySQL, Redshift)
Databricks (notebooks, clusters, jobs, Delta Lake)

AWS Services:

S3, Glue, EMR, Redshift
Lambda, Athena
IAM, CloudWatch

Development Tools:

Git/GitHub
CI/CD pipelines, Docker
Linux/Unix command line