Experience: 5-6+ years
Must Have
Job Description: Data Engineer with experience in the following areas.
Location: PAN India
About Virtusa
Data Engineer
We are building a cutting-edge data science department to serve the older adult community and marketplace.
We are currently seeking talented and highly motivated Data Engineers to lead the development of our discovery and support platform. The successful candidate will join a small, global team of data-focused associates that has successfully built and maintained a best-in-class traditional data warehouse, Kimball-based and founded on SQL Server. The successful candidate will lead the conversion of the existing data structure into an AWS-focused big data framework and assist in identifying and pipelining existing and augmented data sets into this environment. The successful candidate must be able to lead and assist in architecting and constructing the AWS foundation and initial data ports.
Specific responsibilities will be to:
- Lead and assist in designing, deploying, and maintaining robust methods for data management and analysis, primarily using the AWS cloud.
- Develop computational methods for integrating multiple data sources.
- Provide computational tools to ensure trustworthy data sources and facilitate reproducible analysis.
- Provide leadership around architecting, designing, and building the target AWS data environment (e.g., data lake and data warehouse).
- Work with on-staff subject-matter experts to evaluate existing data sources, the data warehouse, ETL ports, existing stovepipe data sources, and available augmentation data sets.
- Implement methods for execution of high-throughput assays and the subsequent acquisition, management, and analysis of the resulting data.
- Assist in the communication of complex scientific, software, and data concepts and results.
- Assist in the identification and hiring of additional data engineer associates.
Job Requirements:
- Master’s Degree (or equivalent experience) in computer science, data science, or a scientific field relevant to healthcare in the United States.
- Extensive experience in the use of a high-level programming language (e.g., Python or Scala) and relevant AWS services.
- Experience in AWS cloud services like S3, Glue, Lake Formation, Athena, and others.
- Experience in creating and managing Data Lakes and Data Warehouses.
- Experience with big data tools like Hadoop, Hive, Talend, Apache Spark, Kafka.
- Advanced SQL scripting.
- Database Management Systems (for example, Oracle, MySQL or MS SQL Server)
- Hands-on experience with data transformation tools, data processing, and data modeling in a big data environment.
- Understanding of the basics of distributed systems.
- Experience working and communicating with subject-matter experts.
- The ability to work independently as well as to collaborate on multidisciplinary, global teams, in startup fashion, with traditional data-warehouse-skilled data associates and with business teams unfamiliar with data science techniques.
- Strong communication, data presentation, and visualization skills.
Sr Software Engineer - Python
The Energy Exemplar (EE) data team is looking for an experienced Python Developer (Data Engineer) to join our Pune office. As a dedicated Data Engineer on our Research team, you will apply data engineering expertise, work very closely with the core data team to identify different data sources for specific energy markets and create an automated data pipeline. The pipeline will then incrementally pull the data from its sources and maintain a dataset, which in turn provides tremendous value to hundreds of EE customers.
At EE, you’ll have access to vast amounts of energy-related data from our sources. Our data pipelines are curated and supported by engineering teams. We also offer many company-sponsored classes and conferences that focus on data engineering and data platforms. There’s great growth opportunity for data engineering at EE.
Responsibilities
- Develop, test and maintain architectures, such as databases and large-scale processing systems using high-performance data pipelines.
- Recommend and implement ways to improve data reliability, efficiency, and quality.
- Identify performant features and make them universally accessible to our teams across EE.
- Work together with data analysts and data scientists to wrangle the data and provide quality datasets and insights for business-critical decisions.
- Take end-to-end responsibility for the development, quality, testing, and production readiness of the services you build.
- Define and evangelize Data Engineering standards and best practices to ensure engineering excellence at every stage of the development cycle.
- Act as a resident expert for data engineering, feature engineering, and exploratory data analysis.
- Experience with Agile methodologies; acting as Scrum Master would be an added plus.
Qualifications
- 6+ years of professional experience in developing data pipelines for large-scale, complex datasets from varieties of data sources.
- Data Engineering expertise with strong experience working with Python, Beautiful Soup, Selenium, regular expressions, and web scraping.
- Best practices in Python development: docstrings, type hints, unit testing, etc.
- Experience working with cloud-based data technologies such as Azure Data Lake, Azure Data Factory, and Azure Databricks is desirable.
- Moderate coding skills. SQL or similar required. C# or other languages strongly preferred.
- Outstanding communication and collaboration skills. You can learn from and teach others.
- Strong drive for results. You have a proven record of shepherding experiments to create successful shipping products/services.
- A Bachelor's or Master's degree in Computer Science or Engineering with coursework in Python, Big Data, and Data Engineering is highly desirable.
- Work in collaboration with the application team and integration team to design, create, and maintain optimal data pipeline architecture and data structures for Data Lake/Data Warehouse.
- Work with stakeholders including the Sales, Product, and Customer Support teams to assist with data-related technical issues and support their data analytics needs.
- Assemble large, complex data sets from third-party vendors to meet business requirements.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL, Elasticsearch, MongoDB, and AWS technology.
- Streamline existing and introduce enhanced reporting and analysis solutions that leverage complex data sources derived from multiple internal systems.
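The scraping stack named above (Beautiful Soup, Selenium, regular expressions) generally comes down to pulling structured values out of markup. Below is a minimal sketch of that idea using only the Python standard library's `html.parser` instead of Beautiful Soup, so it is self-contained; the `td class="price"` markup and the `PriceParser` name are invented for illustration:

```python
from html.parser import HTMLParser

class PriceParser(HTMLParser):
    """Collects the text inside <td class="price"> cells."""
    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "td" and ("class", "price") in attrs:
            self._in_price = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_price = False

    def handle_data(self, data):
        if self._in_price:
            self.prices.append(float(data))

html = '<table><tr><td class="price">42.5</td><td class="price">17.0</td></tr></table>'
parser = PriceParser()
parser.feed(html)
print(parser.prices)  # [42.5, 17.0]
```

In a production pipeline the same pattern would sit behind a fetch step (requests or Selenium for JavaScript-heavy pages), with the parsed rows landed into the warehouse tables described above.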
Requirements
- 5+ years of experience in a Data Engineer role.
- Proficiency in Linux.
- Must have SQL knowledge and experience working with relational databases, query authoring (SQL), as well as familiarity with databases including MySQL, MongoDB, Cassandra, and Athena.
- Must have experience with Python/Scala.
- Must have experience with Big Data technologies like Apache Spark.
- Must have experience with Apache Airflow.
- Experience with data pipeline and ETL tools like AWS Glue.
- Experience working with AWS cloud services: EC2, S3, RDS, Redshift.
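ETL pipelines of the kind listed above typically pull from a source incrementally using a watermark (a last-seen id or timestamp) rather than re-reading everything on each run. Below is a minimal sketch using an in-memory SQLite database standing in for the relational sources and warehouse targets named in the requirements; the table names and the toy transform are invented for illustration:

```python
import sqlite3

# Source and target share one in-memory DB purely for illustration;
# in practice these would be separate systems (e.g., MySQL -> Redshift).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events(id INTEGER PRIMARY KEY, amount REAL, ts TEXT);
    CREATE TABLE events_clean(id INTEGER PRIMARY KEY, amount_usd REAL);
    INSERT INTO events VALUES (1, 10.0, '2024-01-01'), (2, 20.0, '2024-01-02');
""")

def run_incremental_load(conn, last_seen_id):
    """Extract rows past the watermark, transform them, load them,
    and return the new watermark."""
    rows = conn.execute(
        "SELECT id, amount FROM events WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    # Toy transform: apply a fixed conversion rate.
    cleaned = [(rid, round(amount * 1.1, 2)) for rid, amount in rows]
    conn.executemany("INSERT INTO events_clean VALUES (?, ?)", cleaned)
    return rows[-1][0] if rows else last_seen_id

watermark = run_incremental_load(conn, last_seen_id=0)
print(watermark)  # 2
```

Re-running the load with the stored watermark moves no rows, which is what makes the pipeline safe to schedule repeatedly (e.g., from Airflow or AWS Glue).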
Data Engineer_Scala
Job Description:
We are looking for a Big Data Engineer who has worked across the entire ETL stack: someone who has ingested data in batch and live-stream formats, transformed large volumes of data daily, built a data warehouse to store the transformed data, and integrated different visualization dashboards and applications with the data stores. The primary focus will be on choosing optimal solutions for these purposes, then implementing, maintaining, and monitoring them.
Responsibilities:
- Develop, test, and implement data solutions based on functional / non-functional business requirements.
- You would be required to code in Scala and PySpark daily, on cloud as well as on-prem infrastructure.
- Build data models to store the data in the most optimized manner.
- Identify, design, and implement process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Implement the ETL process and optimal data pipeline architecture.
- Monitor performance and advise on any necessary infrastructure changes.
- Create data tools for analytics and data scientist team members that assist them in building and optimizing our product into an innovative industry leader.
- Work with data and analytics experts to strive for greater functionality in our data systems.
- Proactively identify potential production issues and recommend and implement solutions.
- Must be able to write quality code and build secure, highly available systems.
- Create design documents that describe the functionality, capacity, architecture, and process.
- Review peers' code and pipelines before deploying to production, checking for optimization issues and code standards.
Skill Sets:
- Good understanding of optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and ‘big data’ technologies.
- Proficient understanding of distributed computing principles
- Experience working with batch-processing/real-time systems using various open-source technologies like NoSQL, Spark, Pig, Hive, and Apache Airflow.
- Experience implementing complex projects dealing with considerable data sizes (petabyte scale).
- Knowledge of optimization techniques (performance, scalability, monitoring, etc.).
- Experience with integration of data from multiple data sources
- Experience with NoSQL databases, such as HBase, Cassandra, MongoDB, etc.
- Knowledge of various ETL techniques and frameworks, such as Flume
- Experience with various messaging systems, such as Kafka or RabbitMQ
- Creation of DAGs for data engineering
- Expert at Python/Scala programming, especially for data engineering/ETL purposes.
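Tools like Apache Airflow express a pipeline as a DAG of tasks and run each task only after its dependencies complete. Below is a minimal sketch of that ordering idea using the standard library's `graphlib` rather than Airflow itself, so it is self-contained; the task names are invented for illustration:

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on, mirroring how an
# Airflow DAG orders extract -> transform -> load.
dag = {
    "extract_orders": set(),
    "extract_users": set(),
    "transform": {"extract_orders", "extract_users"},
    "load_warehouse": {"transform"},
}

# static_order() yields the tasks in an execution order that respects
# every dependency edge; a scheduler would run them in this sequence
# (or in parallel where no edge connects them).
order = list(TopologicalSorter(dag).static_order())
print(order)
```

A real Airflow DAG adds scheduling, retries, and backfills on top, but the dependency-ordering core is the same topological sort shown here.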
Qualifications
- 5+ years of professional experience in experiment design and applied machine learning predicting outcomes in large-scale, complex datasets.
- Proficiency in Python, Azure ML, or other statistics/ML tools.
- Proficiency in deep neural networks and Python-based frameworks.
- Proficiency in Azure Databricks, Hive, and Spark.
- Proficiency in deploying models into production (Azure stack).
- Moderate coding skills. SQL or similar required. C# or other languages strongly preferred.
- Outstanding communication and collaboration skills. You can learn from and teach others.
- Strong drive for results. You have a proven record of shepherding experiments to create successful shipping products/services.
- Experience with prediction in adversarial (energy) environments highly desirable.
- Understanding of the model development ecosystem across platforms, including development, distribution, and best practices, highly desirable.
As a dedicated Data Scientist on our Research team, you will apply data science and your machine learning expertise to enhance our intelligent systems to predict and provide proactive advice. You’ll work with the team to identify and build features, create experiments, vet ML models, and ship successful models that provide value additions for hundreds of EE customers.
At EE, you’ll have access to vast amounts of energy-related data from our sources. Our data pipelines are curated and supported by engineering teams (so you won't have to do much data engineering - you get to do the fun stuff.) We also offer many company-sponsored classes and conferences that focus on data science and ML. There’s great growth opportunity for data science at EE.
With less concentration on enforcing how to do a particular task, we believe in giving people the opportunity to think outside the box and come up with their own innovative solutions to problems.
You will primarily be developing, managing, and executing multiple prospect campaigns as part of the Prospect Marketing Journey to ensure the best conversion and retention rates. Below are the roles, responsibilities, and skillsets we are looking for; if you feel these resonate with you, please get in touch with us by applying to this role.
Roles and Responsibilities:
• You'd be responsible for the development and maintenance of applications built with Enterprise Java and distributed technologies.
• You'd collaborate with developers, product managers, business analysts, and business users in conceptualizing, estimating, and developing new software applications and enhancements.
• You'd assist in the definition, development, and documentation of software objectives, business requirements, deliverables, and specifications in collaboration with multiple cross-functional teams.
• Assist in the design and implementation process for new products, research and create POC for possible solutions.
Skillset:
• Bachelor's or Master's degree in a technology-related field preferred.
• Overall experience of 2-3 years with Big Data technologies.
• Hands on experience with Spark (Java/ Scala)
• Hands on experience with Hive, Shell Scripting
• Knowledge of HBase and Elasticsearch
• Development experience in Java/Python is preferred
• Familiar with profiling, code coverage, logging, common IDEs, and other development tools.
• Demonstrated verbal and written communication skills and the ability to interface with Business, Analytics, and IT organizations.
• Ability to work effectively in a short-cycle, team-oriented environment, managing multiple priorities and tasks.
• Ability to identify non-obvious solutions to complex problems
- We are looking for a Data Engineer to build the next-generation mobile applications for our world-class fintech product.
- The candidate will be responsible for expanding and optimising our data and data pipeline architecture, as well as optimising data flow and collection for cross-functional teams.
- The ideal candidate is an experienced data pipeline builder and data wrangler who enjoys optimising data systems and building them from the ground up.
- Looking for a person with a strong ability to analyse data and provide valuable insights to the product and business teams to solve daily business problems.
- You should be able to work in a high-volume environment and have outstanding planning and organisational skills.
Qualifications for Data Engineer
- Working SQL knowledge and experience working with relational databases, query authoring (SQL) as well as working familiarity with a variety of databases.
- Experience building and optimising ‘big data’ data pipelines, architectures, and data sets.
- Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
- Strong analytic skills related to working with unstructured datasets. Build processes supporting data transformation, data structures, metadata, dependency and workload management.
- Experience supporting and working with cross-functional teams in a dynamic environment.
- Looking for a candidate with 2-3 years of experience in a Data Engineer role, who is a CS graduate or has an equivalent experience.
What we're looking for?
- Experience with big data tools: Hadoop, Spark, Kafka and other alternate tools.
- Experience with relational SQL and NoSQL databases, including MySQL/Postgres and MongoDB.
- Experience with data pipeline and workflow management tools: Luigi, Airflow.
- Experience with AWS cloud services: EC2, EMR, RDS, Redshift.
- Experience with stream-processing systems: Storm, Spark-Streaming.
- Experience with object-oriented/object function scripting languages: Python, Java, Scala.
Location: Chennai- Guindy Industrial Estate
Duration: Full time role
Company: Mobile Programming (https://www.mobileprogramming.com/)
Client Name: Samsung
We are looking for a Data Engineer to join our growing team of analytics experts. The hire will be responsible for expanding and optimizing our data and data pipeline architecture, as well as optimizing data flow and collection for cross-functional teams. The ideal candidate is an experienced data pipeline builder and data wrangler who enjoys optimizing data systems and building them from the ground up. The Data Engineer will support our software developers, database architects, data analysts, and data scientists on data initiatives and will ensure optimal data delivery architecture is consistent throughout ongoing projects. They must be self-directed and comfortable supporting the data needs of multiple teams, systems, and products.
Responsibilities for Data Engineer
- Create and maintain optimal data pipeline architecture.
- Assemble large, complex data sets that meet functional / non-functional business requirements.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS big data technologies.
- Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other key business performance metrics.
- Work with stakeholders including the Executive, Product, Data, and Design teams to assist with data-related technical issues and support their data infrastructure needs.
- Create data tools for analytics and data scientist team members that assist them in building and optimizing our product into an innovative industry leader.
- Work with data and analytics experts to strive for greater functionality in our data systems.
Qualifications for Data Engineer
- Experience building and optimizing big data ETL pipelines, architectures, and data sets.
- Advanced working SQL knowledge and experience working with relational databases, query authoring (SQL) as well as working familiarity with a variety of databases.
- Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
- Strong analytic skills related to working with unstructured datasets.
- Build processes supporting data transformation, data structures, metadata, dependency, and workload management.
- A successful history of manipulating, processing, and extracting value from large disconnected datasets.
- Working knowledge of message queuing, stream processing, and highly scalable ‘big data’ data stores.
- Strong project management and organizational skills.
- Experience supporting and working with cross-functional teams in a dynamic environment.
We are looking for a candidate with 3-6 years of experience in a Data Engineer role who has attained a graduate degree in Computer Science, Statistics, Informatics, Information Systems, or another quantitative field. They should also have experience using the following software/tools:
- Experience with big data tools: Spark, Kafka, HBase, Hive, etc.
- Experience with relational SQL and NoSQL databases
- Experience with AWS cloud services: EC2, EMR, RDS, Redshift
- Experience with stream-processing systems: Storm, Spark-Streaming, etc.
- Experience with object-oriented/object function scripting languages: Python, Java, Scala, etc.
Skills: Big Data, AWS, Hive, Spark, Python, SQL