Amazon EMR Jobs in Hyderabad


Apply to 3+ Amazon EMR Jobs in Hyderabad on CutShort.io. Explore the latest Amazon EMR Job opportunities across top companies like Google, Amazon & Adobe.

Inncircles
Posted by Gangadhar M
Hyderabad
3 - 5 yrs
Best in industry
PySpark
Spark
Python
ETL
Amazon EMR
+7 more


We are looking for a highly skilled Sr. Big Data Engineer with 3-5 years of experience in building large-scale data pipelines, real-time streaming solutions, and batch/stream processing systems. The ideal candidate should be proficient in Spark, Kafka, Python, and AWS Big Data services, with hands-on experience in implementing CDC (Change Data Capture) pipelines and integrating multiple data sources and sinks.


Responsibilities

  • Design, develop, and optimize batch and streaming data pipelines using Apache Spark and Python.
  • Build and maintain real-time data ingestion pipelines leveraging Kafka and AWS Kinesis.
  • Implement CDC (Change Data Capture) pipelines using Kafka Connect, Debezium or similar frameworks.
  • Integrate data from multiple sources and sinks (databases, APIs, message queues, file systems, cloud storage).
  • Work with AWS Big Data ecosystem: Glue, EMR, Kinesis, Athena, S3, Lambda, Step Functions.
  • Ensure pipeline scalability, reliability, and performance tuning of Spark jobs and EMR clusters.
  • Develop data transformation and ETL workflows in AWS Glue and manage schema evolution.
  • Collaborate with data scientists, analysts, and product teams to deliver reliable and high-quality data solutions.
  • Implement monitoring, logging, and alerting for critical data pipelines.
  • Follow best practices for data security, compliance, and cost optimization in cloud environments.
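For context on the kind of batch and streaming pipelines described above, here is a minimal, illustrative PySpark Structured Streaming sketch that reads JSON events from Kafka and writes Parquet to S3. All names (broker, topic, schema fields, bucket) are hypothetical placeholders, and the Spark Kafka connector package is assumed to be available on the cluster; this is not a description of the team's actual pipelines.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("orders-stream-example").getOrCreate()

# Assumed shape of each JSON message on the (hypothetical) "orders" topic.
schema = StructType([
    StructField("order_id", StringType()),
    StructField("status", StringType()),
    StructField("updated_at", TimestampType()),
])

# Read the raw event stream from Kafka (placeholder broker and topic).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "orders")
       .load())

# Parse the binary Kafka value into typed columns.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("data"))
          .select("data.*"))

# Sink to S3 as Parquet with a checkpoint so the stream can resume safely.
query = (events.writeStream
         .format("parquet")
         .option("path", "s3://example-bucket/orders/")
         .option("checkpointLocation", "s3://example-bucket/checkpoints/orders/")
         .outputMode("append")
         .start())

query.awaitTermination()
```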


Required Skills & Experience

  • Programming: Strong proficiency in Python (PySpark, data frameworks, automation).
  • Big Data Processing: Hands-on experience with Apache Spark (batch & streaming).
  • Messaging & Streaming: Proficient in Kafka (brokers, topics, partitions, consumer groups) and AWS Kinesis.
  • CDC Pipelines: Experience with Debezium / Kafka Connect / custom CDC frameworks.
  • AWS Services: AWS Glue, EMR, S3, Athena, Lambda, IAM, CloudWatch.
  • ETL/ELT Workflows: Strong knowledge of data ingestion, transformation, partitioning, schema management.
  • Databases: Experience with relational databases (MySQL, Postgres, Oracle) and NoSQL (MongoDB, DynamoDB, Cassandra).
  • Data Formats: JSON, Parquet, Avro, ORC, Delta/Iceberg/Hudi.
  • Version Control & CI/CD: Git, GitHub/GitLab, Jenkins, or CodePipeline.
  • Monitoring/Logging: CloudWatch, Prometheus, ELK/Opensearch.
  • Containers & Orchestration (nice-to-have): Docker, Kubernetes, Airflow/Step Functions for workflow orchestration.
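As an illustration of the CDC skill set above, registering a Debezium source connector with a Kafka Connect worker is typically a single REST call. The sketch below is hypothetical: the Connect URL, database details, and exact property names (which vary across Debezium versions) are placeholders, not an actual configuration.

```python
import requests

# Hypothetical Kafka Connect worker endpoint.
CONNECT_URL = "http://kafka-connect:8083/connectors"

connector = {
    "name": "orders-cdc",
    "config": {
        # Property names follow recent Debezium releases; older versions differ.
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql-host",
        "database.port": "3306",
        "database.user": "cdc_user",
        "database.password": "example-password",
        "database.server.id": "5400",
        "topic.prefix": "orders_db",
        "database.include.list": "orders",
        "schema.history.internal.kafka.bootstrap.servers": "broker:9092",
        "schema.history.internal.kafka.topic": "schema-changes.orders",
    },
}

# Kafka Connect's REST API returns 201 Created once the connector is registered.
response = requests.post(CONNECT_URL, json=connector)
response.raise_for_status()
print(response.json())
```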


Preferred Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Data Engineering, or related field.
  • Experience in large-scale data lake / lake house architectures.
  • Knowledge of data warehousing concepts and query optimisation.
  • Familiarity with data governance, lineage, and cataloging tools (Glue Data Catalog, Apache Atlas).
  • Exposure to ML/AI data pipelines is a plus.


Tools & Technologies (must-have exposure)

  • Big Data & Processing: Apache Spark, PySpark, AWS EMR, AWS Glue
  • Streaming & Messaging: Apache Kafka, Kafka Connect, Debezium, AWS Kinesis
  • Cloud & Storage: AWS (S3, Athena, Lambda, IAM, CloudWatch)
  • Programming & Scripting: Python, SQL, Bash
  • Orchestration: Airflow / Step Functions
  • Version Control & CI/CD: Git, Jenkins/CodePipeline
  • Data Formats: Parquet, Avro, ORC, JSON, Delta, Iceberg, Hudi
Multinational Company providing energy & Automation digital

Agency job
via Jobdost by Sathish Kumar
Hyderabad
7 - 12 yrs
₹12L - ₹24L / yr
Spark
Hadoop
Big Data
Data engineering
PySpark
+5 more

Skills

  • At least 7 years of hands-on experience with Hadoop.
  • At least 2 years of hands-on experience with AWS EMR, S3, and other AWS services and dashboards.
  • At least 2 years of solid experience with the Spark framework.
  • Good understanding of the Hadoop ecosystem, including Hive, MapReduce, Spark, and Zeppelin.
  • Responsible for troubleshooting and recommendations for Spark and MapReduce jobs; able to debug issues using existing logs.
  • Responsible for implementation and ongoing administration of Hadoop infrastructure, including monitoring, tuning, and troubleshooting.
  • Triage production issues as they occur, together with other operational teams.
  • Hands-on experience troubleshooting incidents: formulating theories, testing hypotheses, and narrowing down possibilities to find the root cause.
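For illustration of the EMR-side work described above, a Spark step can be submitted to a running cluster and its state polled with boto3 roughly as sketched below; the cluster ID, region, script location, and tuning value are hypothetical placeholders.

```python
import boto3

# Placeholder cluster ID, region, and script path; illustrative only.
emr = boto3.client("emr", region_name="us-east-1")

step = {
    "Name": "nightly-aggregation",
    "ActionOnFailure": "CONTINUE",
    "HadoopJarStep": {
        "Jar": "command-runner.jar",
        "Args": [
            "spark-submit",
            "--deploy-mode", "cluster",
            "--conf", "spark.sql.shuffle.partitions=400",
            "s3://example-bucket/jobs/aggregate.py",
        ],
    },
}

# Submit the Spark step to a running cluster, then poll its state.
submitted = emr.add_job_flow_steps(JobFlowId="j-EXAMPLECLUSTER", Steps=[step])
step_id = submitted["StepIds"][0]

state = emr.describe_step(ClusterId="j-EXAMPLECLUSTER", StepId=step_id)
print(state["Step"]["Status"]["State"])
```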
Genesys
Posted by Manojkumar Ganesh
Chennai, Hyderabad
4 - 10 yrs
₹10L - ₹40L / yr
ETL
Data Warehousing
Business Intelligence (BI)
Big Data
PySpark
+6 more

Join our team

We're looking for an experienced and passionate Data Engineer to join our team. Our vision is to empower Genesys to leverage data to drive better customer and business outcomes. Our batch and streaming solutions turn vast amounts of data into useful insights. If you’re interested in working with the latest big data technologies, using industry-leading BI analytics and visualization tools, and bringing the power of data to our customers’ fingertips, then this position is for you!

Our ideal candidate thrives in a fast-paced environment, enjoys the challenge of highly complex business contexts (that are typically being defined in real-time), and, above all, is passionate about data and analytics.

What you'll get to do


  • Work in an agile development environment, constantly shipping and iterating.
  • Develop high quality batch and streaming big data pipelines.
  • Interface with our Data Consumers, gathering requirements, and delivering complete data solutions.
  • Own the design, development, and maintenance of datasets that drive key business decisions.
  • Support, monitor and maintain the data models
  • Adopt and define the standards and best practices in data engineering including data integrity, performance optimization, validation, reliability, and documentation.
  • Keep up-to-date with advances in big data technologies and run pilots to design the data architecture to scale with the increased data volume using cloud services.
  • Triage many possible courses of action in a high-ambiguity environment, making use of both quantitative analysis and business judgment.

Your experience should include

  • Bachelor’s degree in CS or related technical field.
  • 5+ years of experience in data modelling, data development, and data warehousing.
  • Experience working with Big Data technologies (Hadoop, Hive, Spark, Kafka, Kinesis).
  • Experience with large scale data processing systems for both batch and streaming technologies (Hadoop, Spark, Kinesis, Flink).
  • Experience in programming using Python, Java or Scala.
  • Experience with data orchestration tools (Airflow, Oozie, Step Functions).
  • Solid understanding of database technologies including NoSQL and SQL.
  • Strong SQL skills (experience with the Snowflake cloud data warehouse is a plus).
  • Work experience with Talend is a plus.
  • Track record of delivering reliable data pipelines with solid test infrastructure, CICD, data quality checks, monitoring, and alerting.
  • Strong organizational and multitasking skills with ability to balance competing priorities.
  • Excellent communication (verbal and written) and interpersonal skills and an ability to effectively communicate with both business and technical teams.
  • An ability to work in a fast-paced environment where continuous innovation is occurring, and ambiguity is the norm.
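As a rough illustration of the orchestration experience listed above, a minimal Airflow DAG chaining three placeholder tasks might look like the sketch below; the DAG id, schedule, and task bodies are hypothetical, and parameter names differ slightly across Airflow versions.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder task bodies; a real pipeline would trigger Spark/Glue/EMR work.
def extract():
    print("pull data from the source system")

def transform():
    print("apply transformations")

def load():
    print("write results to the warehouse")

# Minimal daily DAG chaining the three steps (Airflow 2.4+ uses `schedule`;
# older releases use `schedule_interval`).
with DAG(
    dag_id="example_batch_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load
```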


Good to have

  • Experience with AWS big data technologies: S3, EMR, Kinesis, Redshift, and Glue.