Apache Spark Jobs in Hyderabad

15+ Apache Spark Jobs in Hyderabad | Apache Spark Job openings in Hyderabad

Apply to 15+ Apache Spark Jobs in Hyderabad on CutShort.io. Explore the latest Apache Spark Job opportunities across top companies like Google, Amazon & Adobe.

Data Architect (Dremio Lakehouse)

AI Industry

Agency job

via Peak Hire Solutions by Dhara Thakkar

Mumbai, Bengaluru (Bangalore), Hyderabad, Gurugram

5 - 17 yrs

₹34L - ₹45L / yr

Dremio

Data engineering

Business Intelligence (BI)

Tableau

PowerBI

+51 more

Review Criteria:

Strong Dremio / Lakehouse Data Architect profile
5+ years of experience in Data Architecture / Data Engineering, with minimum 3+ years hands-on in Dremio
Strong expertise in SQL optimization, data modeling, query performance tuning, and designing analytical schemas for large-scale systems
Deep experience with cloud object storage (S3 / ADLS / GCS) and file formats such as Parquet, Delta, Iceberg along with distributed query planning concepts
Hands-on experience integrating data via APIs, JDBC, Delta/Parquet, object storage, and coordinating with data engineering pipelines (Airflow, DBT, Kafka, Spark, etc.)
Proven experience designing and implementing lakehouse architecture including ingestion, curation, semantic modeling, reflections/caching optimization, and enabling governed analytics
Strong understanding of data governance, lineage, RBAC-based access control, and enterprise security best practices
Excellent communication skills with ability to work closely with BI, data science, and engineering teams; strong documentation discipline
Candidates must come from enterprise data modernization, cloud-native, or analytics-driven companies

Preferred:

Experience integrating Dremio with BI tools (Tableau, Power BI, Looker) or data catalogs (Collibra, Alation, Purview); familiarity with Snowflake, Databricks, or BigQuery environments

Role & Responsibilities:

You will be responsible for architecting, implementing, and optimizing Dremio-based data lakehouse environments integrated with cloud storage, BI, and data engineering ecosystems. The role requires a strong balance of architecture design, data modeling, query optimization, and governance enablement in large-scale analytical environments.

Design and implement Dremio lakehouse architecture on cloud (AWS/Azure/Snowflake/Databricks ecosystem).
Define data ingestion, curation, and semantic modeling strategies to support analytics and AI workloads.
Optimize Dremio reflections, caching, and query performance for diverse data consumption patterns.
Collaborate with data engineering teams to integrate data sources via APIs, JDBC, Delta/Parquet, and object storage layers (S3/ADLS).
Establish best practices for data security, lineage, and access control aligned with enterprise governance policies.
Support self-service analytics by enabling governed data products and semantic layers.
Develop reusable design patterns, documentation, and standards for Dremio deployment, monitoring, and scaling.
Work closely with BI and data science teams to ensure fast, reliable, and well-modeled access to enterprise data.

Ideal Candidate:

Bachelor’s or Master’s in Computer Science, Information Systems, or related field.
5+ years in data architecture and engineering, with 3+ years in Dremio or modern lakehouse platforms.
Strong expertise in SQL optimization, data modeling, and performance tuning within Dremio or similar query engines (Presto, Trino, Athena).
Hands-on experience with cloud storage (S3, ADLS, GCS), Parquet/Delta/Iceberg formats, and distributed query planning.
Knowledge of data integration tools and pipelines (Airflow, DBT, Kafka, Spark, etc.).
Familiarity with enterprise data governance, metadata management, and role-based access control (RBAC).
Excellent problem-solving, documentation, and stakeholder communication skills.

Preferred:

Experience integrating Dremio with BI tools (Tableau, Power BI, Looker) and data catalogs (Collibra, Alation, Purview).
Exposure to Snowflake, Databricks, or BigQuery environments.
Experience in high-tech, manufacturing, or enterprise data modernization programs.

Lead II - SE - AWS, Apache Spark (PySpark/Scala), Apache Kafka

Global digital transformation solutions provider.

Agency job

via Peak Hire Solutions by Dhara Thakkar

Hyderabad

5 - 8 yrs

₹11L - ₹20L / yr

PySpark

Apache Kafka

Data architecture

Amazon Web Services (AWS)

EMR

+32 more

JOB DETAILS:

* Job Title: Lead II - Software Engineering - AWS, Apache Spark (PySpark/Scala), Apache Kafka

* Industry: Global digital transformation solutions provider

* Salary: Best in Industry

* Experience: 5-8 years

* Location: Hyderabad

Job Summary

We are seeking a skilled Data Engineer to design, build, and optimize scalable data pipelines and cloud-based data platforms. The role involves working with large-scale batch and real-time data processing systems, collaborating with cross-functional teams, and ensuring data reliability, security, and performance across the data lifecycle.

Key Responsibilities

ETL Pipeline Development & Optimization

Design, develop, and maintain complex end-to-end ETL pipelines for large-scale data ingestion and processing.
Optimize data pipelines for performance, scalability, fault tolerance, and reliability.

Big Data Processing

Develop and optimize batch and real-time data processing solutions using Apache Spark (PySpark/Scala) and Apache Kafka.
Ensure fault-tolerant, scalable, and high-performance data processing systems.

Cloud Infrastructure Development

Build and manage scalable, cloud-native data infrastructure on AWS.
Design resilient and cost-efficient data pipelines adaptable to varying data volume and formats.

Real-Time & Batch Data Integration

Enable seamless ingestion and processing of real-time streaming and batch data sources (e.g., AWS MSK).
Ensure consistency, data quality, and a unified view across multiple data sources and formats.

Data Analysis & Insights

Partner with business teams and data scientists to understand data requirements.
Perform in-depth data analysis to identify trends, patterns, and anomalies.
Deliver high-quality datasets and present actionable insights to stakeholders.

CI/CD & Automation

Implement and maintain CI/CD pipelines using Jenkins or similar tools.
Automate testing, deployment, and monitoring to ensure smooth production releases.

Data Security & Compliance

Collaborate with security teams to ensure compliance with organizational and regulatory standards (e.g., GDPR, HIPAA).
Implement data governance practices ensuring data integrity, security, and traceability.

Troubleshooting & Performance Tuning

Identify and resolve performance bottlenecks in data pipelines.
Apply best practices for monitoring, tuning, and optimizing data ingestion and storage.

Collaboration & Cross-Functional Work

Work closely with engineers, data scientists, product managers, and business stakeholders.
Participate in agile ceremonies, sprint planning, and architectural discussions.

Skills & Qualifications

Mandatory (Must-Have) Skills

AWS Expertise

Hands-on experience with AWS Big Data services such as EMR, Managed Apache Airflow, Glue, S3, DMS, MSK, and EC2.
Strong understanding of cloud-native data architectures.

Big Data Technologies

Proficiency in PySpark or Scala Spark and SQL for large-scale data transformation and analysis.
Experience with Apache Spark and Apache Kafka in production environments.

Data Frameworks

Strong knowledge of Spark DataFrames and Datasets.

ETL Pipeline Development

Proven experience in building scalable and reliable ETL pipelines for both batch and real-time data processing.

Database Modeling & Data Warehousing

Expertise in designing scalable data models for OLAP and OLTP systems.

Data Analysis & Insights

Ability to perform complex data analysis and extract actionable business insights.
Strong analytical and problem-solving skills with a data-driven mindset.

CI/CD & Automation

Basic to intermediate experience with CI/CD pipelines using Jenkins or similar tools.
Familiarity with automated testing and deployment workflows.

Good-to-Have (Preferred) Skills

Knowledge of Java for data processing applications.
Experience with NoSQL databases (e.g., DynamoDB, Cassandra, MongoDB).
Familiarity with data governance frameworks and compliance tooling.
Experience with monitoring and observability tools such as AWS CloudWatch, Splunk, or Dynatrace.
Exposure to cost optimization strategies for large-scale cloud data platforms.

Skills: big data, scala spark, apache spark, ETL pipeline development

******

Notice period - 0 to 15 days only

Job stability is mandatory

Location: Hyderabad

Note: If a candidate is a short joiner, based in Hyderabad, and fits within the approved budget, we will proceed with an offer

F2F Interview: 14th Feb 2026

3 days in office, Hybrid model.

Data Architect (Dremio Lakehouse)

AI-First Company

Agency job

via Peak Hire Solutions by Dhara Thakkar

Bengaluru (Bangalore), Mumbai, Hyderabad, Gurugram

5 - 17 yrs

₹30L - ₹45L / yr

Data engineering

Data architecture

SQL

Data modeling

GCS

+47 more

ROLES AND RESPONSIBILITIES:

You will be responsible for architecting, implementing, and optimizing Dremio-based data Lakehouse environments integrated with cloud storage, BI, and data engineering ecosystems. The role requires a strong balance of architecture design, data modeling, query optimization, and governance enablement in large-scale analytical environments.

Design and implement Dremio lakehouse architecture on cloud (AWS/Azure/Snowflake/Databricks ecosystem).
Define data ingestion, curation, and semantic modeling strategies to support analytics and AI workloads.
Optimize Dremio reflections, caching, and query performance for diverse data consumption patterns.
Collaborate with data engineering teams to integrate data sources via APIs, JDBC, Delta/Parquet, and object storage layers (S3/ADLS).
Establish best practices for data security, lineage, and access control aligned with enterprise governance policies.
Support self-service analytics by enabling governed data products and semantic layers.
Develop reusable design patterns, documentation, and standards for Dremio deployment, monitoring, and scaling.
Work closely with BI and data science teams to ensure fast, reliable, and well-modeled access to enterprise data.

IDEAL CANDIDATE:

Bachelor’s or Master’s in Computer Science, Information Systems, or related field.
5+ years in data architecture and engineering, with 3+ years in Dremio or modern lakehouse platforms.
Strong expertise in SQL optimization, data modeling, and performance tuning within Dremio or similar query engines (Presto, Trino, Athena).
Hands-on experience with cloud storage (S3, ADLS, GCS), Parquet/Delta/Iceberg formats, and distributed query planning.
Knowledge of data integration tools and pipelines (Airflow, DBT, Kafka, Spark, etc.).
Familiarity with enterprise data governance, metadata management, and role-based access control (RBAC).
Excellent problem-solving, documentation, and stakeholder communication skills.

PREFERRED:

Experience integrating Dremio with BI tools (Tableau, Power BI, Looker) and data catalogs (Collibra, Alation, Purview).
Exposure to Snowflake, Databricks, or BigQuery environments.
Experience in high-tech, manufacturing, or enterprise data modernization programs.

Sr. Big Data Engineer

at Inncircles

Posted by Gangadhar M

Hyderabad

3 - 5 yrs

Best in industry

PySpark

Spark

Python

ETL

Amazon EMR

+7 more

We are looking for a highly skilled Sr. Big Data Engineer with 3-5 years of experience in building large-scale data pipelines, real-time streaming solutions, and batch/stream processing systems. The ideal candidate should be proficient in Spark, Kafka, Python, and AWS Big Data services, with hands-on experience in implementing CDC (Change Data Capture) pipelines and integrating multiple data sources and sinks.

Responsibilities

Design, develop, and optimize batch and streaming data pipelines using Apache Spark and Python.
Build and maintain real-time data ingestion pipelines leveraging Kafka and AWS Kinesis.
Implement CDC (Change Data Capture) pipelines using Kafka Connect, Debezium or similar frameworks.
Integrate data from multiple sources and sinks (databases, APIs, message queues, file systems, cloud storage).
Work with AWS Big Data ecosystem: Glue, EMR, Kinesis, Athena, S3, Lambda, Step Functions.
Ensure pipeline scalability, reliability, and performance tuning of Spark jobs and EMR clusters.
Develop data transformation and ETL workflows in AWS Glue and manage schema evolution.
Collaborate with data scientists, analysts, and product teams to deliver reliable and high-quality data solutions.
Implement monitoring, logging, and alerting for critical data pipelines.
Follow best practices for data security, compliance, and cost optimization in cloud environments.

Required Skills & Experience

Programming: Strong proficiency in Python (PySpark, data frameworks, automation).
Big Data Processing: Hands-on experience with Apache Spark (batch & streaming).
Messaging & Streaming: Proficient in Kafka (brokers, topics, partitions, consumer groups) and AWS Kinesis.
CDC Pipelines: Experience with Debezium / Kafka Connect / custom CDC frameworks.
AWS Services: AWS Glue, EMR, S3, Athena, Lambda, IAM, CloudWatch.
ETL/ELT Workflows: Strong knowledge of data ingestion, transformation, partitioning, schema management.
Databases: Experience with relational databases (MySQL, Postgres, Oracle) and NoSQL (MongoDB, DynamoDB, Cassandra).
Data Formats: JSON, Parquet, Avro, ORC, Delta/Iceberg/Hudi.
Version Control & CI/CD: Git, GitHub/GitLab, Jenkins, or CodePipeline.
Monitoring/Logging: CloudWatch, Prometheus, ELK/OpenSearch.
Containers & Orchestration (nice-to-have): Docker, Kubernetes, Airflow/Step Functions for workflow orchestration.

Preferred Qualifications

Bachelor’s or Master’s degree in Computer Science, Data Engineering, or related field.
Experience in large-scale data lake / lake house architectures.
Knowledge of data warehousing concepts and query optimisation.
Familiarity with data governance, lineage, and cataloging tools (Glue Data Catalog, Apache Atlas).
Exposure to ML/AI data pipelines is a plus.

Tools & Technologies (must-have exposure)

Big Data & Processing: Apache Spark, PySpark, AWS EMR, AWS Glue
Streaming & Messaging: Apache Kafka, Kafka Connect, Debezium, AWS Kinesis
Cloud & Storage: AWS (S3, Athena, Lambda, IAM, CloudWatch)
Programming & Scripting: Python, SQL, Bash
Orchestration: Airflow / Step Functions
Version Control & CI/CD: Git, Jenkins/CodePipeline
Data Formats: Parquet, Avro, ORC, JSON, Delta, Iceberg, Hudi

Data Engineer

top MNC

Agency job

via Vy Systems by thirega thanasekaran

Bengaluru (Bangalore), Chennai, Hyderabad, Coimbatore, Kochi (Cochin), Thrissur, Thiruvananthapuram, Kozhikode (Calicut), Kasaragod

5 - 12 yrs

₹5L - ₹9L / yr

Data engineering

databricks

Azure Synapse

Apache Spark

Job Summary:

Seeking an experienced Senior Data Engineer to lead data ingestion, transformation, and optimization initiatives using the modern Apache and Azure data stack. The role involves working on scalable pipelines, large-scale distributed systems, and data lake management.

Core Responsibilities:

· Build and manage high-volume data pipelines using Spark/Databricks.

· Implement ELT frameworks using Azure Data Factory/Synapse Pipelines.

· Optimize large-scale datasets in Delta/Iceberg formats.

· Implement robust data quality, monitoring, and governance layers.

· Collaborate with Data Scientists, Analysts, and Business stakeholders.

Technical Stack:

· Big Data: Apache Spark, Kafka, Hive, Airflow, Hudi/Iceberg

· Cloud: Azure (Synapse, ADF, ADLS Gen2), Databricks, AWS (Glue/S3)

· Languages: Python, Scala, SQL

· Storage Formats: Delta Lake, Iceberg, Parquet, ORC

· CI/CD: Azure DevOps, Terraform (infra as code), Git

Senior Data Engineer (Apache Stack + Databricks/Synapse)

Share your CV to

Thirega@ vysystems dot com - WhatsApp - 91Five0033Five2Three

Data Engineer

at Frisco Analytics Pvt Ltd

Posted by Cedrick Mariadas

Bengaluru (Bangalore), Hyderabad

5 - 8 yrs

₹15L - ₹20L / yr

databricks

Apache Spark

Python

SQL

MySQL

+3 more

We are actively seeking a self-motivated Data Engineer with expertise in Azure cloud and Databricks, with a thorough understanding of Delta Lake and Lake-house Architecture. The ideal candidate should excel in developing scalable data solutions, crafting platform tools, and integrating systems, while demonstrating proficiency in cloud-native database solutions and distributed data processing.

Key Responsibilities:

Contribute to the development and upkeep of a scalable data platform, incorporating tools and frameworks that leverage Azure and Databricks capabilities.
Exhibit proficiency in RDBMS databases such as MySQL and SQL Server, emphasizing their integration in applications and pipeline development.
Design and maintain high-caliber code, including data pipelines and applications, utilizing Python, Scala, and PHP.
Implement effective data processing solutions via Apache Spark, optimizing Spark applications for large-scale data handling.
Optimize data storage using formats like Parquet and Delta Lake to ensure efficient data accessibility and reliable performance.
Demonstrate understanding of Hive Metastore, Unity Catalog Metastore, and the operational dynamics of external tables.
Collaborate with diverse teams to convert business requirements into precise technical specifications.

Requirements:

Bachelor’s degree in Computer Science, Engineering, or a related discipline.
Demonstrated hands-on experience with Azure cloud services and Databricks.
Proficient programming skills in Python, Scala, and PHP.
In-depth knowledge of SQL, NoSQL databases, and data warehousing principles.
Familiarity with distributed data processing and external table management.
Insight into enterprise data solutions for PIM, CDP, MDM, and ERP applications.
Exceptional problem-solving acumen and meticulous attention to detail.

Additional Qualifications:

Acquaintance with data security and privacy standards.
Experience in CI/CD pipelines and version control systems, notably Git.
Familiarity with Agile methodologies and DevOps practices.
Competence in technical writing for comprehensive documentation.

Senior Data Engineer (L2)

at Publicis Sapient

10 recruiters

Posted by Mohit Singh

Bengaluru (Bangalore), Pune, Hyderabad, Gurugram, Noida

5 - 11 yrs

₹20L - ₹36L / yr

PySpark

Data engineering

Big Data

Hadoop

Spark

+7 more

Publicis Sapient Overview:

As Senior Associate L1 in Data Engineering, you will translate client requirements into technical design and implement components for data engineering solutions. You will use a deep understanding of data integration and big data design principles to create custom solutions or implement package solutions. You will independently drive design discussions to ensure the necessary health of the overall solution.

Job Summary:

As Senior Associate L2 in Data Engineering, you will translate client requirements into technical design and implement components for data engineering solutions. You will use a deep understanding of data integration and big data design principles to create custom solutions or implement package solutions. You will independently drive design discussions to ensure the necessary health of the overall solution.

The role requires a hands-on technologist with a strong programming background in Java / Scala / Python, experience in data ingestion, integration and wrangling, computation, and analytics pipelines, and exposure to Hadoop ecosystem components. You are also required to have hands-on knowledge of at least one of the AWS, GCP, or Azure cloud platforms.

Role & Responsibilities:

Your role is focused on Design, Development and delivery of solutions involving:

• Data Integration, Processing & Governance

• Data Storage and Computation Frameworks, Performance Optimizations

• Analytics & Visualizations

• Infrastructure & Cloud Computing

• Data Management Platforms

• Implement scalable architectural models for data processing and storage

• Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time mode

• Build functionality for data analytics, search and aggregation

Experience Guidelines:

Mandatory Experience and Competencies:

# Competency

1. Overall 5+ years of IT experience with 3+ years in data-related technologies

2. Minimum 2.5 years of experience in Big Data technologies and working exposure to related data services on at least one cloud platform (AWS / Azure / GCP)

3. Hands-on experience with the Hadoop stack – HDFS, Sqoop, Kafka, Pulsar, NiFi, Spark, Spark Streaming, Flink, Storm, Hive, Oozie, Airflow and other components required in building end-to-end data pipelines

4. Strong experience in at least one of the programming languages Java, Scala, Python; Java preferable

5. Hands-on working knowledge of NoSQL and MPP data platforms like HBase, MongoDB, Cassandra, AWS Redshift, Azure SQLDW, GCP BigQuery etc.

6. Well-versed in, with working knowledge of, data platform related services on at least one cloud platform, IAM and data security

Preferred Experience and Knowledge (Good to Have):

# Competency

1. Good knowledge of traditional ETL tools (Informatica, Talend, etc.) and database technologies (Oracle, MySQL, SQL Server, Postgres) with hands-on experience

2. Knowledge of data governance processes (security, lineage, catalog) and tools like Collibra, Alation etc.

3. Knowledge of distributed messaging frameworks like ActiveMQ / RabbitMQ / Solace, search & indexing, and microservices architectures

4. Performance tuning and optimization of data pipelines

5. CI/CD – Infra provisioning on cloud, auto build & deployment pipelines, code quality

6. Cloud data specialty and other related Big Data technology certifications

Personal Attributes:

• Strong written and verbal communication skills

• Articulation skills

• Good team player

• Self-starter who requires minimal oversight

• Ability to prioritize and manage multiple tasks

• Process orientation and the ability to define and set up processes

Senior Data Engineering Role - Google Cloud Platform with Spark

A LEADING US BASED MNC

Agency job

via Zeal Consultants by Zeal Consultants

Bengaluru (Bangalore), Hyderabad, Delhi, Gurugram

5 - 10 yrs

₹14L - ₹15L / yr

Google Cloud Platform (GCP)

Spark

PySpark

Apache Spark

"DATA STREAMING"

Data Engineering : Senior Engineer / Manager

As Senior Engineer / Manager in Data Engineering, you will translate client requirements into technical design and implement components for data engineering solutions. You will use a deep understanding of data integration and big data design principles to create custom solutions or implement package solutions. You will independently drive design discussions to ensure the necessary health of the overall solution.

Must Have skills :

1. GCP

2. Spark streaming : Live data streaming experience is desired.

3. Any one coding language: Java / Python / Scala

Skills & Experience :

- Overall experience of MINIMUM 5+ years with Minimum 4 years of relevant experience in Big Data technologies

- Hands-on experience with the Hadoop stack - HDFS, Sqoop, Kafka, Pulsar, NiFi, Spark, Spark Streaming, Flink, Storm, Hive, Oozie, Airflow and other components required in building end-to-end data pipelines. Working knowledge of real-time data pipelines is an added advantage.

- Strong experience in at least one of the programming languages Java, Scala, Python; Java preferable

- Hands-on working knowledge of NoSQL and MPP data platforms like HBase, MongoDB, Cassandra, AWS Redshift, Azure SQLDW, GCP BigQuery etc.

- Well-versed in, with working knowledge of, data platform related services on GCP

- Bachelor's degree and 6 to 12 years of work experience, or any combination of education, training and/or experience that demonstrates the ability to perform the duties of the position

Your Impact :

- Data Ingestion, Integration and Transformation

- Data Storage and Computation Frameworks, Performance Optimizations

- Analytics & Visualizations

- Infrastructure & Cloud Computing

- Data Management Platforms

- Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time

- Build functionality for data analytics, search and aggregation

Data Lake Engineer

at [x]cube LABS

2 candid answers

1 video

Posted by Krishna kandregula

Hyderabad

2 - 6 yrs

₹8L - ₹20L / yr

ETL

Informatica

Data Warehouse (DWH)

PowerBI

DAX

+12 more

Create and manage ETL/ELT pipelines based on requirements.
Build PowerBI dashboards and manage the datasets needed.
Work with stakeholders to identify data structures needed in the future and perform any transformations, including aggregations.
Build data cubes for real-time visualisation needs and CXO dashboards.

Required Tech Skills

Microsoft PowerBI & DAX
Python, Pandas, PyArrow, Jupyter Notebooks, Apache Spark
Azure Synapse, Azure Databricks, Azure HDInsight, Azure Data Factory

Data Engineer

at Accolite Digital

Posted by Nitesh Parab

Bengaluru (Bangalore), Hyderabad, Gurugram, Delhi, Noida, Ghaziabad, Faridabad

4 - 8 yrs

₹5L - ₹15L / yr

ETL

Informatica

Data Warehouse (DWH)

SSIS

SQL Server Integration Services (SSIS)

+10 more

Job Title: Data Engineer

Job Summary: As a Data Engineer, you will be responsible for designing, building, and maintaining the infrastructure and tools necessary for data collection, storage, processing, and analysis. You will work closely with data scientists and analysts to ensure that data is available, accessible, and in a format that can be easily consumed for business insights.

Responsibilities:

Design, build, and maintain data pipelines to collect, store, and process data from various sources.
Create and manage data warehousing and data lake solutions.
Develop and maintain data processing and data integration tools.
Collaborate with data scientists and analysts to design and implement data models and algorithms for data analysis.
Optimize and scale existing data infrastructure to ensure it meets the needs of the business.
Ensure data quality and integrity across all data sources.
Develop and implement best practices for data governance, security, and privacy.
Monitor data pipeline performance and errors, and troubleshoot issues as needed.
Stay up-to-date with emerging data technologies and best practices.

Requirements:

Bachelor's degree in Computer Science, Information Systems, or a related field.

Experience with ETL tools like Matillion, SSIS, Informatica.

Experience with SQL and relational databases such as SQL Server, MySQL, PostgreSQL, or Oracle.

Experience in writing complex SQL queries

Strong programming skills in languages such as Python, Java, or Scala.

Experience with data modeling, data warehousing, and data integration.

Strong problem-solving skills and ability to work independently.

Excellent communication and collaboration skills.

Familiarity with big data technologies such as Hadoop, Spark, or Kafka.

Familiarity with data warehouse / data lake technologies like Snowflake or Databricks.

Familiarity with cloud computing platforms such as AWS, Azure, or GCP.

Familiarity with Reporting tools

Teamwork/ growth contribution

Helping the team conduct interviews and identify the right candidates
Adhering to timelines
Timely status communication and upfront communication of any risks
Teach, train, and share knowledge with peers.
Good Communication skills
Proven abilities to take initiative and be innovative
Analytical mind with a problem-solving aptitude

Good to have :

Master's degree in Computer Science, Information Systems, or a related field.

Experience with NoSQL databases such as MongoDB or Cassandra.

Familiarity with data visualization and business intelligence tools such as Tableau or Power BI.

Knowledge of machine learning and statistical modeling techniques.

If you are passionate about data and want to work with a dynamic team of data scientists and analysts, we encourage you to apply for this position.

Data Engineer

at Hammoq

1 recruiter

Posted by Nikitha Muthuswamy

Remote, Indore, Ujjain, Hyderabad, Bengaluru (Bangalore)

5 - 8 yrs

₹5L - ₹15L / yr

pandas

NumPy

Data engineering

Data Engineer

Apache Spark

+6 more

Does analytics to extract insights from raw historical data of the organization.
Generates usable training dataset for any/all MV projects with the help of Annotators, if needed.
Analyses user trends, and identifies their biggest bottlenecks in Hammoq Workflow.
Tests the short/long term impact of productized MV models on those trends.
Skills - NumPy, Pandas, Apache Spark, PySpark, ETL (mandatory).

Big Data Spark Lead

at DataMetica

1 video

7 recruiters

Posted by Sumangali Desai

Pune, Hyderabad

7 - 12 yrs

₹7L - ₹20L / yr

Apache Spark

Big Data

Spark

Scala

Hadoop

+3 more

We at Datametica Solutions Private Limited are looking for a Big Data Spark Lead who has a passion for cloud and knowledge of different on-premise and cloud data implementations in the field of Big Data and Analytics, including but not limited to Teradata, Netezza, Exadata, Oracle, Cloudera, Hortonworks and the like.
Ideal candidates should have technical experience in migrations and the ability to help customers get value from Datametica's tools and accelerators.

Job Description
Experience : 7+ years
Location : Pune / Hyderabad
Skills :

Drive and participate in requirements gathering workshops, estimation discussions, design meetings and status review meetings
Participate and contribute in Solution Design and Solution Architecture for implementing Big Data Projects on-premise and on cloud
Hands-on technical experience in design, coding, development and management of large Hadoop implementations
Proficient in SQL, Hive, Pig, Spark SQL, Shell Scripting, Kafka, Flume, and Sqoop on large Big Data and Data Warehousing projects, with a Java, Python or Scala based Hadoop programming background
Proficient with various development methodologies like waterfall, agile/scrum and iterative
Good Interpersonal skills and excellent communication skills for US and UK based clients

About Us!
A global Leader in the Data Warehouse Migration and Modernization to the Cloud, we empower businesses by migrating their Data/Workload/ETL/Analytics to the Cloud by leveraging Automation.

We have expertise in transforming legacy Teradata, Oracle, Hadoop, Netezza, Vertica, Greenplum along with ETLs like Informatica, Datastage, AbInitio & others, to cloud-based data warehousing with other capabilities in data engineering, advanced analytics solutions, data management, data lake and cloud optimization.

Datametica is a key partner of the major cloud service providers - Google, Microsoft, Amazon, Snowflake.

We have our own products!
Eagle – Data warehouse Assessment & Migration Planning Product
Raven – Automated Workload Conversion Product
Pelican - Automated Data Validation Product, which helps automate and accelerate data migration to the cloud.

Why join us!
Datametica is a place to innovate, bring new ideas to life and learn new things. We believe in building a culture of innovation, growth and belonging. Our people and their dedication over these years are the key factors in achieving our success.

Benefits we Provide!
Working with Highly Technical and Passionate, mission-driven people
Subsidized Meals & Snacks
Flexible Schedule
Approachable leadership
Access to various learning tools and programs
Pet Friendly
Certification Reimbursement Policy

Check out more about us on our website below!
www.datametica.com

Data Engineer

at SpringML

1 video

2 recruiters

Posted by Kayal Vizhi

Hyderabad

4 - 11 yrs

₹8L - ₹20L / yr

Big Data

Hadoop

Apache Spark

Spark

Data Structures

+3 more

SpringML is looking to hire a top-notch Senior Data Engineer who is passionate about working with data and using the latest distributed frameworks to process large datasets. As an Associate Data Engineer, your primary role will be to design and build data pipelines. You will focus on helping client projects with data integration, data prep and implementing machine learning on datasets. In this role, you will work on some of the latest technologies, collaborate with partners on early wins, take a consultative approach with clients, interact daily with executive leadership, and help build a great company. Chosen team members will be part of the core team and play a critical role in scaling up our emerging practice.

RESPONSIBILITIES:

Ability to work as a member of a team assigned to design and implement data integration solutions.
Build Data pipelines using standard frameworks in Hadoop, Apache Beam and other open-source solutions.
Learn quickly – ability to understand and rapidly comprehend new areas – functional and technical – and apply detailed and critical thinking to customer solutions.
Propose design solutions and recommend best practices for large scale data analysis

SKILLS:

B.Tech degree in computer science, mathematics or other relevant fields.
4+ years of experience in ETL, Data Warehouse, Visualization and building data pipelines.
Strong programming skills – experience and expertise in one of the following: Java, Python, Scala, C.
Proficient in big data/distributed computing frameworks such as Apache Spark and Kafka.
Experience with Agile implementation methodologies

Snowflake with Spark-ETL Developer

service based company

Agency job

via Myna Solutions by Preethi M

Hyderabad

5 - 9 yrs

₹12L - ₹14L / yr

ETL

Snowflake

Data Warehouse (DWH)

Datawarehousing

Apache Spark

+4 more

Overall 4 – 8 years of experience in DW / BI technologies.
Minimum 2 years of work experience on Snowflake and Azure storage.
Minimum 3 years of development experience with ETL tools.
Strong SQL skills in other databases like Oracle, SQL Server, DB2 and Teradata.
Good to have Hadoop and Spark experience.
Good conceptual knowledge of data warehousing and its various methodologies.
Working knowledge of any scripting language such as UNIX / Shell.
Good Presentation and communication skills.
Should be flexible with the overlapping working hours.
Should be able to work independently and be proactive.
Good understanding of Agile development cycle.

Python Developer (Data Engineer)

at Milestone Hr Consultancy

2 recruiters

Posted by Jyoti Sharma

Remote, Hyderabad

3 - 8 yrs

₹6L - ₹16L / yr

Python

Django

Data engineering

Apache Hive

Apache Spark

We are currently looking for passionate Data Engineers to join our team and mission. In this role, you will help doctors from across the world improve care and save lives by helping extract insights and predict risk. Our Data Engineers ensure that data are ingested and prepared, ready for insights and intelligence to be derived from them. We’re looking for smart individuals to join our incredibly talented team, that is on a mission to transform healthcare.

As a Data Engineer you will be engaged in some or all of the following activities:

• Implement, test and deploy distributed data ingestion, data processing and feature engineering systems computing on large volumes of Healthcare data using a variety of open source and proprietary technologies.
• Design data architectures and schemas optimized for analytics and machine learning.
• Implement telemetry to monitor the performance and operations of data pipelines.
• Develop tools and libraries to implement and manage data processing pipelines, including ingestion, cleaning, transformation, and feature computation.
• Work with large data sets, and integrate diverse data sources, data types and data structures.
• Work with Data Scientists, Machine Learning Engineers and Visualization Engineers to understand data requirements, and translate them into production-ready data pipelines.
• Write and automate unit, functional, integration and performance tests in a Continuous Integration environment.
• Take initiative to find solutions to technical challenges for healthcare data.

You are a great match if you have some or all of the following skills and qualifications.

• Strong understanding of database design and feature engineering to support Machine Learning and analytics.
• At least 3 years of industry experience building, testing and deploying large-scale, distributed data processing systems.
• Proficiency in working with multiple data processing tools and query languages (Python, Spark, SQL, etc.).
• Excellent understanding of distributed computing concepts and Big Data technologies (Spark, Hive, etc.).
• Proficiency in performance tuning and optimization of data processing pipelines.
• Attention to detail and focus on software quality, with experience in software testing.
• Strong cross discipline communication skills and teamwork.
• Demonstrated clear and thorough logical and analytical thinking, as well as problem solving skills.
• Bachelor or Masters in Computer Science or related field.

Skill - Apache Spark, Python, Hive

Skill Description - Skill 1 – Spark; Skill 2 – Python; Skill 3 – Hive, SQL

Responsibility - Sr. data engineer
