PySpark Jobs in Hyderabad

Apply to 45+ PySpark Jobs in Hyderabad on CutShort.io. Explore the latest PySpark Job opportunities across top companies like Google, Amazon & Adobe.

GCP Data Engineer

at 10XScale.ai

Posted by Naveen Balne

Hyderabad, Bengaluru (Bangalore)

8 - 16 yrs

₹10L - ₹50L / yr

Google Cloud Platform (GCP)

Python

PySpark

Google BigQuery

Data engineering

+4 more

GCP Data Engineer

Experience: 8 – 15 yrs

Grade: C2/D1

Skill: GCP + Python/PySpark

NP: Immediate joiners

8+ years of hands-on experience in Python.
8+ years of hands-on experience in Data Engineering.
5+ years of hands-on experience in GCP BigQuery.
Experience in building scalable data pipelines and automation frameworks.
Experience migrating data and pipelines from SQL Server to GCP BigQuery.
Familiarity with CI/CD tools and Agile methodologies.
Good understanding of Data Governance, Data Quality, Metadata, and Lineage.
Expertise in data model design in BigQuery.
Expertise in writing optimal BigQuery SQL and stored procedures.
Expertise in Google Cloud Storage (GCS), Pub/Sub, Cloud Composer, Apache Airflow (DAGs), Dataflow, Dataproc, Dataplex, and Cloud Run.
Expertise in Vertex AI and Feature Store.
Expertise in Spark and Apache Beam is desirable.

Email resumes to: naveenkb @ 10xscale.ai

Senior Data Engineer (Dataform, BigQuery)

AI Industry

Agency job

via Peak Hire Solutions by Dharati Thakkar

Mumbai, Bengaluru (Bangalore), Hyderabad, Gurugram

6 - 10 yrs

₹32L - ₹42L / yr

ETL

SQL

Google Cloud Platform (GCP)

Data engineering

ELT

+17 more

Role & Responsibilities:

We are looking for a strong Data Engineer to join our growing team. The ideal candidate brings solid ETL fundamentals, hands-on pipeline experience, and cloud platform proficiency, with a preference for GCP / BigQuery expertise.

Responsibilities:

Design, build, and maintain scalable data pipelines and ETL/ELT workflows
Work with Dataform or dbt to implement transformation logic and data models
Develop and optimize data solutions on GCP (BigQuery, GCS) or AWS/Azure
Support data migration initiatives and data mesh architecture patterns
Collaborate with analysts, scientists, and business stakeholders to deliver reliable data products
Apply data governance and quality best practices across the data lifecycle
Troubleshoot pipeline issues and drive proactive monitoring and resolution

Ideal Candidate:

Strong Data Engineer Profile
Must have 6+ years of hands-on experience in Data Engineering, with strong ownership of end-to-end data pipeline development.
Must have strong experience in ETL/ELT pipeline design, transformation logic, and data workflow orchestration.
Must have hands-on experience with any one of the following: Dataform, dbt, or BigQuery, with practical exposure to data transformation, modeling, or cloud data warehousing.
Must have working experience on any cloud platform: GCP (preferred), AWS, or Azure, including object storage (GCS, S3, ADLS).
Must have strong SQL skills with experience in writing complex queries and optimizing performance.
Must have programming experience in Python and/or SQL for data processing.
Must have experience in building and maintaining scalable data pipelines and troubleshooting data issues.
Exposure to data migration projects and/or data mesh architecture concepts.
Experience with Spark / PySpark or large-scale data processing frameworks.
Experience working in product-based companies or data-driven environments.
Bachelor’s or Master’s degree in Computer Science, Engineering, or related field.

NOTE:

There will be an interview drive on 28th and 29th March 2026. Shortlisted candidates are expected to be available on these dates. Only immediate joiners will be considered.

Lead I - Data Engineering (Python, AWS Glue, PySpark, Terraform)

Global Digital Transformation Solutions Provider

Agency job

via Peak Hire Solutions by Dharati Thakkar

Hyderabad

5 - 7 yrs

₹15L - ₹21L / yr

Python

Terraform

PySpark

Amazon Web Services (AWS)

Job Details

- Job Title: Lead I - Data Engineering (Python, AWS Glue, PySpark, Terraform)

- Industry: Global digital transformation solutions provider

- Domain: Information technology (IT)

- Experience Required: 5-7 years

- Employment Type: Full Time

- Job Location: Hyderabad

- CTC Range: Best in Industry

Job Description

Data Engineer with AWS, Python, Glue, Terraform, Step Functions, and Spark

Skills: Python, AWS Glue, PySpark, and Terraform (all are mandatory)

******

Notice period - 0 to 15 days only

Job stability is mandatory

Location: Hyderabad

Lead II - SE - AWS, Apache Spark (PySpark/Scala), Apache Kafka

Global digital transformation solutions provider.

Agency job

via Peak Hire Solutions by Dharati Thakkar

Hyderabad

5 - 8 yrs

₹11L - ₹20L / yr

PySpark

Apache Kafka

Data architecture

Amazon Web Services (AWS)

EMR

+32 more

JOB DETAILS:

* Job Title: Lead II - Software Engineering - AWS, Apache Spark (PySpark/Scala), Apache Kafka

* Industry: Global digital transformation solutions provider

* Salary: Best in Industry

* Experience: 5-8 years

* Location: Hyderabad

Job Summary

We are seeking a skilled Data Engineer to design, build, and optimize scalable data pipelines and cloud-based data platforms. The role involves working with large-scale batch and real-time data processing systems, collaborating with cross-functional teams, and ensuring data reliability, security, and performance across the data lifecycle.

Key Responsibilities

ETL Pipeline Development & Optimization

Design, develop, and maintain complex end-to-end ETL pipelines for large-scale data ingestion and processing.
Optimize data pipelines for performance, scalability, fault tolerance, and reliability.

Big Data Processing

Develop and optimize batch and real-time data processing solutions using Apache Spark (PySpark/Scala) and Apache Kafka.
Ensure fault-tolerant, scalable, and high-performance data processing systems.

Cloud Infrastructure Development

Build and manage scalable, cloud-native data infrastructure on AWS.
Design resilient and cost-efficient data pipelines that adapt to varying data volumes and formats.

Real-Time & Batch Data Integration

Enable seamless ingestion and processing of real-time streaming and batch data sources (e.g., AWS MSK).
Ensure consistency, data quality, and a unified view across multiple data sources and formats.

Data Analysis & Insights

Partner with business teams and data scientists to understand data requirements.
Perform in-depth data analysis to identify trends, patterns, and anomalies.
Deliver high-quality datasets and present actionable insights to stakeholders.

CI/CD & Automation

Implement and maintain CI/CD pipelines using Jenkins or similar tools.
Automate testing, deployment, and monitoring to ensure smooth production releases.

Data Security & Compliance

Collaborate with security teams to ensure compliance with organizational and regulatory standards (e.g., GDPR, HIPAA).
Implement data governance practices ensuring data integrity, security, and traceability.

Troubleshooting & Performance Tuning

Identify and resolve performance bottlenecks in data pipelines.
Apply best practices for monitoring, tuning, and optimizing data ingestion and storage.

Collaboration & Cross-Functional Work

Work closely with engineers, data scientists, product managers, and business stakeholders.
Participate in agile ceremonies, sprint planning, and architectural discussions.

Skills & Qualifications

Mandatory (Must-Have) Skills

AWS Expertise

Hands-on experience with AWS Big Data services such as EMR, Managed Workflows for Apache Airflow (MWAA), Glue, S3, DMS, MSK, and EC2.
Strong understanding of cloud-native data architectures.

Big Data Technologies

Proficiency in PySpark or Scala Spark and SQL for large-scale data transformation and analysis.
Experience with Apache Spark and Apache Kafka in production environments.

Data Frameworks

Strong knowledge of Spark DataFrames and Datasets.

ETL Pipeline Development

Proven experience in building scalable and reliable ETL pipelines for both batch and real-time data processing.

Database Modeling & Data Warehousing

Expertise in designing scalable data models for OLAP and OLTP systems.

Data Analysis & Insights

Ability to perform complex data analysis and extract actionable business insights.
Strong analytical and problem-solving skills with a data-driven mindset.

CI/CD & Automation

Basic to intermediate experience with CI/CD pipelines using Jenkins or similar tools.
Familiarity with automated testing and deployment workflows.

Good-to-Have (Preferred) Skills

Knowledge of Java for data processing applications.
Experience with NoSQL databases (e.g., DynamoDB, Cassandra, MongoDB).
Familiarity with data governance frameworks and compliance tooling.
Experience with monitoring and observability tools such as AWS CloudWatch, Splunk, or Dynatrace.
Exposure to cost optimization strategies for large-scale cloud data platforms.

Skills: big data, scala spark, apache spark, ETL pipeline development

******

Notice period - 0 to 15 days only

Job stability is mandatory

Location: Hyderabad

Note: If a candidate can join at short notice, is based in Hyderabad, and fits within the approved budget, we will proceed with an offer.

F2F Interview: 14th Feb 2026

3 days in office, Hybrid model.

Data Architect (Dremio Lakehouse)

AI-First Company

Agency job

via Peak Hire Solutions by Dharati Thakkar

Bengaluru (Bangalore), Mumbai, Hyderabad, Gurugram

5 - 17 yrs

₹30L - ₹45L / yr

Data engineering

Data architecture

SQL

Data modeling

GCS

+47 more

ROLES AND RESPONSIBILITIES:

You will be responsible for architecting, implementing, and optimizing Dremio-based data lakehouse environments integrated with cloud storage, BI, and data engineering ecosystems. The role requires a strong balance of architecture design, data modeling, query optimization, and governance enablement in large-scale analytical environments.

Design and implement Dremio lakehouse architecture on cloud (AWS/Azure/Snowflake/Databricks ecosystem).
Define data ingestion, curation, and semantic modeling strategies to support analytics and AI workloads.
Optimize Dremio reflections, caching, and query performance for diverse data consumption patterns.
Collaborate with data engineering teams to integrate data sources via APIs, JDBC, Delta/Parquet, and object storage layers (S3/ADLS).
Establish best practices for data security, lineage, and access control aligned with enterprise governance policies.
Support self-service analytics by enabling governed data products and semantic layers.
Develop reusable design patterns, documentation, and standards for Dremio deployment, monitoring, and scaling.
Work closely with BI and data science teams to ensure fast, reliable, and well-modeled access to enterprise data.

IDEAL CANDIDATE:

Bachelor’s or Master’s in Computer Science, Information Systems, or related field.
5+ years in data architecture and engineering, with 3+ years in Dremio or modern lakehouse platforms.
Strong expertise in SQL optimization, data modeling, and performance tuning within Dremio or similar query engines (Presto, Trino, Athena).
Hands-on experience with cloud storage (S3, ADLS, GCS), Parquet/Delta/Iceberg formats, and distributed query planning.
Knowledge of data integration tools and pipelines (Airflow, dbt, Kafka, Spark, etc.).
Familiarity with enterprise data governance, metadata management, and role-based access control (RBAC).
Excellent problem-solving, documentation, and stakeholder communication skills.

PREFERRED:

Experience integrating Dremio with BI tools (Tableau, Power BI, Looker) and data catalogs (Collibra, Alation, Purview).
Exposure to Snowflake, Databricks, or BigQuery environments.
Experience in high-tech, manufacturing, or enterprise data modernization programs.

Databricks Admin

A reputed client in India

Agency job

via Evalutech Prospect Services Private Limited by HR Evalutech

Bengaluru (Bangalore), Mumbai, Delhi, Gurugram, Noida, Hyderabad, Pune

6 - 8 yrs

₹12L - ₹13L / yr

Amazon Web Services (AWS)

Python

PySpark

Our client is looking to hire a Databricks Admin immediately.

This is pan-India bulk hiring.

Minimum of 6-8 years of experience with Databricks, PySpark/Python, and AWS.

AWS experience is a must.

A notice period of 15-30 days is preferred.

Share profiles at hr at etpspl dot com

Please share our email with friends or colleagues who are looking for a job.

PySpark/Scala Developer

at Tata Consultancy Services

2 recruiters

Agency job

via Risk Resources LLP hyd by Jhansi Padiy

Chennai, Hyderabad, Kolkata, Delhi, Pune, Bengaluru (Bangalore)

4 - 10 yrs

₹6L - ₹30L / yr

Scala

PySpark

Spark

Amazon Web Services (AWS)

Job Title: PySpark/Scala Developer

Functional Skills: Experience in Credit Risk/Regulatory risk domain

Technical Skills: Spark, PySpark, Python, Hive, Scala, MapReduce, Unix shell scripting

Good to Have Skills: Exposure to Machine Learning Techniques

Job Description:

5+ years of experience developing, fine-tuning, and implementing programs/applications using Python/PySpark/Scala on a Big Data/Hadoop platform.

Roles and Responsibilities:

a) Work with a leading bank’s Risk Management team on specific projects and requirements pertaining to risk models in consumer and wholesale banking

b) Enhance machine learning models using PySpark or Scala

c) Work with data scientists to build ML models based on business requirements and follow the ML cycle to deploy them all the way to the production environment

d) Participate in feature engineering, model training, scoring, and retraining

e) Architect data pipelines and automate data ingestion and model jobs

Skills and competencies:

Required:

· Strong analytical skills in conducting sophisticated statistical analysis using bureau/vendor data, customer performance data, and macro-economic data to solve business problems

· Working experience in PySpark and Scala to develop code that validates and implements models in Credit Risk/Banking

· Experience with distributed systems such as Hadoop/MapReduce, Spark, streaming data processing, and cloud architecture

· Familiarity with machine learning frameworks and libraries (such as scikit-learn, SparkML, TensorFlow, and PyTorch)

· Experience in systems integration, web services, and batch processing

· Experience in migrating code to PySpark/Scala is a big plus

· Ability to act as a liaison, conveying the information needs of the business to IT and data constraints to the business, with equal attention to business strategy and IT strategy, business processes, and workflow

· Flexibility in approach and thought process

· Willingness to learn and keep up with periodic changes in regulatory requirements as per the FED

PySpark/Scala Developer

at Tata Consultancy Services

2 recruiters

Agency job

via Risk Resources LLP hyd by susmitha o

Bengaluru (Bangalore), Hyderabad, Pune, Delhi, Kolkata, Chennai

5 - 8 yrs

₹7L - ₹30L / yr

Scala

Python

PySpark

Apache Hive

Spark

+3 more

Skills and competencies:

Required:

· Strong analytical skills in conducting sophisticated statistical analysis using bureau/vendor data, customer performance data, and macro-economic data to solve business problems

· Working experience in PySpark and Scala to develop code that validates and implements models in Credit Risk/Banking

· Experience with distributed systems such as Hadoop/MapReduce, Spark, streaming data processing, and cloud architecture

· Familiarity with machine learning frameworks and libraries (such as scikit-learn, SparkML, TensorFlow, and PyTorch)

· Experience in systems integration, web services, and batch processing

· Experience in migrating code to PySpark/Scala is a big plus

· Ability to act as a liaison, conveying the information needs of the business to IT and data constraints to the business, with equal attention to business strategy and IT strategy, business processes, and workflow

· Flexibility in approach and thought process

· Willingness to learn and keep up with periodic changes in regulatory requirements as per the FED

AWS Data Engineer

at VyTCDC

Posted by Gobinath Sundaram

Chennai, Bengaluru (Bangalore), Hyderabad, Mumbai, Pune, Noida

4 - 6 yrs

₹3L - ₹21L / yr

AWS Data Engineer

Amazon Web Services (AWS)

Python

PySpark

databricks

+1 more

Key Responsibilities

Design and implement ETL/ELT pipelines using Databricks, PySpark, and AWS Glue
Develop and maintain scalable data architectures on AWS (S3, EMR, Lambda, Redshift, RDS)
Perform data wrangling, cleansing, and transformation using Python and SQL
Collaborate with data scientists to integrate Generative AI models into analytics workflows
Build dashboards and reports to visualize insights using tools like Power BI or Tableau
Ensure data quality, governance, and security across all data assets
Optimize performance of data pipelines and troubleshoot bottlenecks
Work closely with stakeholders to understand data requirements and deliver actionable insights

🧪 Required Skills

Skill Area: Tools & Technologies
Cloud Platforms: AWS (S3, Lambda, Glue, EMR, Redshift)
Big Data: Databricks, Apache Spark, PySpark
Programming: Python, SQL
Data Engineering: ETL/ELT, Data Lakes, Data Warehousing
Analytics: Data Modeling, Visualization, BI Reporting
Gen AI Integration: OpenAI, Hugging Face, LangChain (preferred)
DevOps (Bonus): Git, Jenkins, Terraform, Docker

📚 Qualifications

Bachelor's or Master’s degree in Computer Science, Data Science, or related field
3+ years of experience in data engineering or data analytics
Hands-on experience with Databricks, PySpark, and AWS
Familiarity with Generative AI tools and frameworks is a strong plus
Strong problem-solving and communication skills

🌟 Preferred Traits

Analytical mindset with attention to detail
Passion for data and emerging technologies
Ability to work independently and in cross-functional teams
Eagerness to learn and adapt in a fast-paced environment

AWS data engineer

at Tekit Software solution Pvt Ltd

1 candid answer

Posted by himanshi Tripathi

Hyderabad, Bengaluru (Bangalore)

8 - 10 yrs

₹15L - ₹27L / yr

Amazon Web Services (AWS)

Python

PySpark

SQL

🔍 Job Description:

We are looking for an experienced and highly skilled Technical Lead to guide the development and enhancement of a large-scale Data Observability solution built on AWS. This platform is pivotal in delivering monitoring, reporting, and actionable insights across the client's data landscape.

The Technical Lead will drive end-to-end feature delivery, mentor junior engineers, and uphold engineering best practices. The position reports to the Programme Technical Lead / Architect and involves close collaboration to align on platform vision, technical priorities, and success KPIs.

🎯 Key Responsibilities:

Lead the design, development, and delivery of features for the data observability solution.
Mentor and guide junior engineers, promoting technical growth and engineering excellence.
Collaborate with the architect to align on platform roadmap, vision, and success metrics.
Ensure high quality, scalability, and performance in data engineering solutions.
Contribute to code reviews, architecture discussions, and operational readiness.

🔧 Primary Must-Have Skills (Non-Negotiable):

5+ years in Data Engineering or Software Engineering roles.
3+ years in a technical team or squad leadership capacity.
Deep expertise in AWS Data Services: Glue, EMR, Kinesis, Lambda, Athena, S3.
Advanced programming experience with PySpark, Python, and SQL.
Proven experience in building scalable, production-grade data pipelines on cloud platforms.

Data Engineer

at ZeMoSo Technologies

11 recruiters

Agency job

via TIGI HR Solution Pvt. Ltd. by Vaidehi Sarkar

Mumbai, Bengaluru (Bangalore), Hyderabad, Chennai, Pune

4 - 8 yrs

₹10L - ₹15L / yr

Data engineering

Python

SQL

Data Warehouse (DWH)

Amazon Web Services (AWS)

+3 more

Work Mode: Hybrid

Need B.Tech, BE, M.Tech, ME candidates - Mandatory

Must-Have Skills:

● Educational Qualification: B.Tech, BE, M.Tech, or ME in any field.

● Minimum of 3 years of proven experience as a Data Engineer.

● Strong proficiency in Python programming language and SQL.

● Experience with Databricks and with setting up and managing data pipelines and data warehouses/lakes.

● Good comprehension and critical thinking skills.

● Please note that the salary bracket varies with the candidate's experience:

- 4 to 6 yrs of experience: salary up to 22 LPA

- 5 to 8 yrs of experience: salary up to 30 LPA

- More than 8 yrs of experience: salary up to 40 LPA

Data Engineer

at Deqode

1 recruiter

Posted by Alisha Das

Bengaluru (Bangalore), Delhi, Gurugram, Noida, Ghaziabad, Faridabad, Mumbai, Pune, Hyderabad, Indore, Jaipur, Kolkata

4 - 5 yrs

₹2L - ₹18L / yr

Python

PySpark

We are looking for skilled and passionate Data Engineers with a strong foundation in Python programming and hands-on experience working with APIs, AWS cloud, and modern development practices. The ideal candidate will have a keen interest in building scalable backend systems and working with big data tools like PySpark.

Key Responsibilities:

Write clean, scalable, and efficient Python code.
Work with Python frameworks such as PySpark for data processing.
Design, develop, update, and maintain APIs (RESTful).
Deploy and manage code using GitHub CI/CD pipelines.
Collaborate with cross-functional teams to define, design, and ship new features.
Work on AWS cloud services for application deployment and infrastructure.
Basic database design and interaction with MySQL or DynamoDB.
Debugging and troubleshooting application issues and performance bottlenecks.

Required Skills & Qualifications:

4+ years of hands-on experience with Python development.
Proficient in Python basics with a strong problem-solving approach.
Experience with AWS Cloud services (EC2, Lambda, S3, etc.).
Good understanding of API development and integration.
Knowledge of GitHub and CI/CD workflows.
Experience in working with PySpark or similar big data frameworks.
Basic knowledge of MySQL or DynamoDB.
Excellent communication skills and a team-oriented mindset.

Nice to Have:

Experience in containerization (Docker/Kubernetes).
Familiarity with Agile/Scrum methodologies.

GCP Senior Data Engineer

at Xebia IT Architects

2 recruiters

Posted by Vijay S

Bengaluru (Bangalore), Gurugram, Pune, Hyderabad, Chennai, Bhopal, Jaipur

10 - 15 yrs

₹30L - ₹40L / yr

Spark

Google Cloud Platform (GCP)

Python

Apache Airflow

PySpark

+1 more

We are looking for a Senior Data Engineer with strong expertise in GCP, Databricks, and Airflow to design and implement a GCP Cloud Native Data Processing Framework. The ideal candidate will work on building scalable data pipelines and help migrate existing workloads to a modern framework.

Shift: 2 PM to 11 PM
Work Mode: Hybrid (3 days a week) across Xebia locations
Notice Period: Immediate joiners or those with a notice period of up to 30 days

Key Responsibilities:

Design and implement a GCP Native Data Processing Framework leveraging Spark and GCP Cloud Services.
Develop and maintain data pipelines using Databricks and Airflow for transforming Raw → Silver → Gold data layers.
Ensure data integrity, consistency, and availability across all systems.
Collaborate with data engineers, analysts, and stakeholders to optimize performance.
Document standards and best practices for data engineering workflows.

Required Experience:

7-8 years of experience in data engineering, architecture, and pipeline development.
Strong knowledge of GCP, Databricks, PySpark, and BigQuery.
Experience with Orchestration tools like Airflow, Dagster, or GCP equivalents.
Understanding of Data Lake table formats (Delta, Iceberg, etc.).
Proficiency in Python for scripting and automation.
Strong problem-solving skills and collaborative mindset.

⚠️ Please apply only if you have not applied recently or are not currently in the interview process for any open roles at Xebia.

Looking forward to your response!

Best regards,

Vijay S

Assistant Manager - TAG

https://www.linkedin.com/in/vijay-selvarajan/

Data Engineer

at Indigrators solutions

Posted by Afzal Mohammed

Hyderabad

5 - 8 yrs

₹18L - ₹24L / yr

Python

PySpark

Palantir Foundry

Palantir

Foundry

Job Description

Job Title: Data Engineer

Location: Hyderabad, India

Job Type: Full Time

Experience: 5 – 8 Years

Working Model: On-Site (No remote or work-from-home options available)

Work Schedule: Mountain Time Zone (3:00 PM to 11:00 PM IST)

Role Overview

The Data Engineer will be responsible for designing and implementing scalable backend systems, leveraging Python and PySpark to build high-performance solutions. The role requires a proactive and detail-orientated individual who can solve complex data engineering challenges while collaborating with cross-functional teams to deliver quality results.

Key Responsibilities

Develop and maintain backend systems using Python and PySpark.
Optimise and enhance system performance for large-scale data processing.
Collaborate with cross-functional teams to define requirements and deliver solutions.
Debug, troubleshoot, and resolve system issues and bottlenecks.
Follow coding best practices to ensure code quality and maintainability.
Utilise tools like Palantir Foundry for data management workflows (good to have).

Qualifications

Strong proficiency in Python backend development.
Hands-on experience with PySpark for data engineering.
Excellent problem-solving skills and attention to detail.
Good communication skills for effective team collaboration.
Experience with Palantir Foundry or similar platforms is a plus.

Preferred Skills

Experience with large-scale data processing and pipeline development.
Familiarity with agile methodologies and development tools.
Ability to optimise and streamline backend processes effectively.

Data Engineer

at Frisco Analytics Pvt Ltd

Posted by Cedrick Mariadas

Bengaluru (Bangalore), Hyderabad

5 - 8 yrs

₹15L - ₹20L / yr

databricks

Apache Spark

Python

SQL

MySQL

+3 more

We are actively seeking a self-motivated Data Engineer with expertise in Azure cloud and Databricks, and a thorough understanding of Delta Lake and Lakehouse architecture. The ideal candidate should excel in developing scalable data solutions, crafting platform tools, and integrating systems, while demonstrating proficiency in cloud-native database solutions and distributed data processing.

Key Responsibilities:

Contribute to the development and upkeep of a scalable data platform, incorporating tools and frameworks that leverage Azure and Databricks capabilities.
Exhibit proficiency in RDBMS databases such as MySQL and SQL Server, with emphasis on their integration in applications and pipeline development.
Design and maintain high-caliber code, including data pipelines and applications, utilizing Python, Scala, and PHP.
Implement effective data processing solutions via Apache Spark, optimizing Spark applications for large-scale data handling.
Optimize data storage using formats like Parquet and Delta Lake to ensure efficient data accessibility and reliable performance.
Demonstrate understanding of Hive Metastore, Unity Catalog Metastore, and the operational dynamics of external tables.
Collaborate with diverse teams to convert business requirements into precise technical specifications.

Requirements:

Bachelor’s degree in Computer Science, Engineering, or a related discipline.
Demonstrated hands-on experience with Azure cloud services and Databricks.
Proficient programming skills in Python, Scala, and PHP.
In-depth knowledge of SQL, NoSQL databases, and data warehousing principles.
Familiarity with distributed data processing and external table management.
Insight into enterprise data solutions for PIM, CDP, MDM, and ERP applications.
Exceptional problem-solving acumen and meticulous attention to detail.

Additional Qualifications:

Acquaintance with data security and privacy standards.
Experience in CI/CD pipelines and version control systems, notably Git.
Familiarity with Agile methodologies and DevOps practices.
Competence in technical writing for comprehensive documentation.

AWS Data Engineer (Contractual)

at Forward Eye Technologies

Posted by Jaya S

Bengaluru (Bangalore), Mumbai, Delhi, Gurugram, Pune, Hyderabad, Ahmedabad, Chennai

3 - 7 yrs

₹8L - ₹15L / yr

AWS Lambda

Amazon S3

Amazon VPC

Amazon EC2

Amazon Redshift

+3 more

Technical Skills:

Ability to understand and translate business requirements into design.
Proficient in AWS infrastructure components such as S3, IAM, VPC, EC2, and Redshift.
Experience in creating ETL jobs using Python/PySpark.
Proficiency in creating AWS Lambda functions for event-based jobs.
Knowledge of automating ETL processes using AWS Step Functions.
Competence in building data warehouses and loading data into them.

Responsibilities:

Understand business requirements and translate them into design.
Assess AWS infrastructure needs for development work.
Develop ETL jobs using Python/PySpark to meet requirements.
Implement AWS Lambda for event-based tasks.
Automate ETL processes using AWS Step Functions.
Build data warehouses and manage data loading.
Engage with customers and stakeholders to articulate the benefits of proposed solutions and frameworks.

Senior Data Engineer (L2)

at Publicis Sapient

10 recruiters

Posted by Mohit Singh

Bengaluru (Bangalore), Pune, Hyderabad, Gurugram, Noida

5 - 11 yrs

₹20L - ₹36L / yr

PySpark

Data engineering

Big Data

Hadoop

Spark

+7 more

Publicis Sapient Overview:

As Senior Associate L2 in Data Engineering, you will translate client requirements into technical design and implement components for data engineering solutions. You will use a deep understanding of data integration and big data design principles to create custom solutions or implement package solutions, and you will independently drive design discussions to ensure the health of the overall solution.

Job Summary:

As Senior Associate L2 in Data Engineering, you will translate client requirements into technical design and implement components for data engineering solutions. You will use a deep understanding of data integration and big data design principles to create custom solutions or implement package solutions, and you will independently drive design discussions to ensure the health of the overall solution.

The role requires a hands-on technologist with a strong programming background in Java, Scala, or Python, experience in data ingestion, integration, wrangling, computation, and analytics pipelines, and exposure to Hadoop ecosystem components. You are also required to have hands-on knowledge of at least one of the AWS, GCP, or Azure cloud platforms.

Role & Responsibilities:

Your role is focused on Design, Development and delivery of solutions involving:

• Data Integration, Processing & Governance

• Data Storage and Computation Frameworks, Performance Optimizations

• Analytics & Visualizations

• Infrastructure & Cloud Computing

• Data Management Platforms

• Implement scalable architectural models for data processing and storage

• Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time mode

• Build functionality for data analytics, search and aggregation

Experience Guidelines:

Mandatory Experience and Competencies:

1. Overall 5+ years of IT experience, with 3+ years in data-related technologies

2. Minimum 2.5 years of experience in Big Data technologies and working exposure to related data services on at least one cloud platform (AWS / Azure / GCP)

3. Hands-on experience with the Hadoop stack: HDFS, Sqoop, Kafka, Pulsar, NiFi, Spark, Spark Streaming, Flink, Storm, Hive, Oozie, Airflow, and other components required in building end-to-end data pipelines

4. Strong experience in at least one of the programming languages Java, Scala, or Python; Java preferred

5. Hands-on working knowledge of NoSQL and MPP data platforms like HBase, MongoDB, Cassandra, AWS Redshift, Azure SQLDW, GCP BigQuery, etc.

6. Working knowledge of data platform related services on at least one cloud platform, IAM, and data security

Preferred Experience and Knowledge (Good to Have):

1. Good knowledge of traditional ETL tools (Informatica, Talend, etc.) and database technologies (Oracle, MySQL, SQL Server, Postgres), with hands-on experience

2. Knowledge of data governance processes (security, lineage, catalog) and tools like Collibra, Alation, etc.

3. Knowledge of distributed messaging frameworks like ActiveMQ / RabbitMQ / Solace, search & indexing, and microservices architectures

4. Performance tuning and optimization of data pipelines

5. CI/CD: infra provisioning on cloud, auto build & deployment pipelines, code quality

6. Cloud data specialty and other related Big Data technology certifications

Personal Attributes:

• Strong written and verbal communication skills

• Articulation skills

• Good team player

• Self-starter who requires minimal oversight

• Ability to prioritize and manage multiple tasks

• Process orientation and the ability to define and set up processes

Data Scientist with Apache Pyspark

at Kanerika Software

3 candid answers

2 recruiters

Posted by Meenakshi Ramagiri

RIYADH (Saudi Arabia), Hyderabad

6 - 12 yrs

₹10L - ₹15L / yr

Data Science

Machine Learning (ML)

Natural Language Processing (NLP)

Computer Vision

recommendation algorithm

+2 more

Job Description

Responsibilities:

- Collaborate with stakeholders to understand business objectives and requirements for AI/ML projects.

- Conduct research and stay up-to-date with the latest AI/ML algorithms, techniques, and frameworks.

- Design and develop machine learning models, algorithms, and data pipelines.

- Collect, preprocess, and clean large datasets to ensure data quality and reliability.

- Train, evaluate, and optimize machine learning models using appropriate evaluation metrics.

- Implement and deploy AI/ML models into production environments.

- Monitor model performance and propose enhancements or updates as needed.

- Collaborate with software engineers to integrate AI/ML capabilities into existing software systems.

- Perform data analysis and visualization to derive actionable insights.

- Stay informed about emerging trends and advancements in the field of AI/ML and apply them to improve existing solutions.

Strong experience in Apache PySpark is a must.

Requirements:

- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.

- Proven experience of 3-5 years as an AI/ML Engineer or a similar role.

- Strong knowledge of machine learning algorithms, deep learning frameworks, and data science concepts.

- Proficiency in programming languages such as Python, Java, or C++.

- Experience with popular AI/ML libraries and frameworks, such as TensorFlow, Keras, PyTorch, or scikit-learn.

- Familiarity with cloud platforms, such as AWS, Azure, or GCP, and their AI/ML services.

- Solid understanding of data preprocessing, feature engineering, and model evaluation techniques.

- Experience in deploying and scaling machine learning models in production environments.

- Strong problem-solving skills and ability to work on multiple projects simultaneously.

- Excellent communication and teamwork skills.

Preferred Skills:

- Experience with natural language processing (NLP) techniques and tools.

- Familiarity with big data technologies, such as Hadoop, Spark, or Hive.

- Knowledge of containerization technologies like Docker and orchestration tools like Kubernetes.

- Understanding of DevOps practices for AI/ML model deployment.

- Apache PySpark