Athena jobs

12+ athena Jobs in India

Apply to 12+ athena Jobs on CutShort.io. Find your next job, effortlessly. Browse athena Jobs and apply today!

Sr. Big Data Engineer

at Inncircles

Posted by Gangadhar M

Hyderabad

3 - 5 yrs

Best in industry

PySpark

Spark

Python

ETL

Amazon EMR

+7 more

We are looking for a highly skilled Sr. Big Data Engineer with 3-5 years of experience in

building large-scale data pipelines, real-time streaming solutions, and batch/stream

processing systems. The ideal candidate should be proficient in Spark, Kafka, Python, and

AWS Big Data services, with hands-on experience in implementing CDC (Change Data

Capture) pipelines and integrating multiple data sources and sinks.

Responsibilities

Design, develop, and optimize batch and streaming data pipelines using Apache Spark and Python.
Build and maintain real-time data ingestion pipelines leveraging Kafka and AWS Kinesis.
Implement CDC (Change Data Capture) pipelines using Kafka Connect, Debezium or similar frameworks.
Integrate data from multiple sources and sinks (databases, APIs, message queues, file systems, cloud storage).
Work with AWS Big Data ecosystem: Glue, EMR, Kinesis, Athena, S3, Lambda, Step Functions.
Ensure pipeline scalability, reliability, and performance tuning of Spark jobs and EMR clusters.
Develop data transformation and ETL workflows in AWS Glue and manage schema evolution.
Collaborate with data scientists, analysts, and product teams to deliver reliable and high-quality data solutions.
Implement monitoring, logging, and alerting for critical data pipelines.
Follow best practices for data security, compliance, and cost optimization in cloud environments.

Required Skills & Experience

Programming: Strong proficiency in Python (PySpark, data frameworks, automation).
Big Data Processing: Hands-on experience with Apache Spark (batch & streaming).
Messaging & Streaming: Proficient in Kafka (brokers, topics, partitions, consumer groups) and AWS Kinesis.
CDC Pipelines: Experience with Debezium / Kafka Connect / custom CDC frameworks.
AWS Services: AWS Glue, EMR, S3, Athena, Lambda, IAM, CloudWatch.
ETL/ELT Workflows: Strong knowledge of data ingestion, transformation, partitioning, schema management.
Databases: Experience with relational databases (MySQL, Postgres, Oracle) and NoSQL (MongoDB, DynamoDB, Cassandra).
Data Formats: JSON, Parquet, Avro, ORC, Delta/Iceberg/Hudi.
Version Control & CI/CD: Git, GitHub/GitLab, Jenkins, or CodePipeline.
Monitoring/Logging: CloudWatch, Prometheus, ELK/Opensearch.
Containers & Orchestration (nice-to-have): Docker, Kubernetes, Airflow/Step
Functions for workflow orchestration.

Preferred Qualifications

Bachelor’s or Master’s degree in Computer Science, Data Engineering, or related field.
Experience in large-scale data lake / lake house architectures.
Knowledge of data warehousing concepts and query optimisation.
Familiarity with data governance, lineage, and cataloging tools (Glue Data Catalog, Apache Atlas).
Exposure to ML/AI data pipelines is a plus.

Tools & Technologies (must-have exposure)

Big Data & Processing: Apache Spark, PySpark, AWS EMR, AWS Glue
Streaming & Messaging: Apache Kafka, Kafka Connect, Debezium, AWS Kinesis
Cloud & Storage: AWS (S3, Athena, Lambda, IAM, CloudWatch)
Programming & Scripting: Python, SQL, Bash
Orchestration: Airflow / Step Functions
Version Control & CI/CD: Git, Jenkins/CodePipeline
Data Formats: Parquet, Avro, ORC, JSON, Delta, Iceberg, Hudi

We are looking for a highly skilled Sr. Big Data Engineer with 3-5 years of experience in

building large-scale data pipelines, real-time streaming solutions, and batch/stream

processing systems. The ideal candidate should be proficient in Spark, Kafka, Python, and

AWS Big Data services, with hands-on experience in implementing CDC (Change Data

Capture) pipelines and integrating multiple data sources and sinks.

Responsibilities

Design, develop, and optimize batch and streaming data pipelines using Apache Spark and Python.
Build and maintain real-time data ingestion pipelines leveraging Kafka and AWS Kinesis.
Implement CDC (Change Data Capture) pipelines using Kafka Connect, Debezium or similar frameworks.
Integrate data from multiple sources and sinks (databases, APIs, message queues, file systems, cloud storage).
Work with AWS Big Data ecosystem: Glue, EMR, Kinesis, Athena, S3, Lambda, Step Functions.
Ensure pipeline scalability, reliability, and performance tuning of Spark jobs and EMR clusters.
Develop data transformation and ETL workflows in AWS Glue and manage schema evolution.
Collaborate with data scientists, analysts, and product teams to deliver reliable and high-quality data solutions.
Implement monitoring, logging, and alerting for critical data pipelines.
Follow best practices for data security, compliance, and cost optimization in cloud environments.

Required Skills & Experience

Programming: Strong proficiency in Python (PySpark, data frameworks, automation).
Big Data Processing: Hands-on experience with Apache Spark (batch & streaming).
Messaging & Streaming: Proficient in Kafka (brokers, topics, partitions, consumer groups) and AWS Kinesis.
CDC Pipelines: Experience with Debezium / Kafka Connect / custom CDC frameworks.
AWS Services: AWS Glue, EMR, S3, Athena, Lambda, IAM, CloudWatch.
ETL/ELT Workflows: Strong knowledge of data ingestion, transformation, partitioning, schema management.
Databases: Experience with relational databases (MySQL, Postgres, Oracle) and NoSQL (MongoDB, DynamoDB, Cassandra).
Data Formats: JSON, Parquet, Avro, ORC, Delta/Iceberg/Hudi.
Version Control & CI/CD: Git, GitHub/GitLab, Jenkins, or CodePipeline.
Monitoring/Logging: CloudWatch, Prometheus, ELK/Opensearch.
Containers & Orchestration (nice-to-have): Docker, Kubernetes, Airflow/Step
Functions for workflow orchestration.

Preferred Qualifications

Bachelor’s or Master’s degree in Computer Science, Data Engineering, or related field.
Experience in large-scale data lake / lake house architectures.
Knowledge of data warehousing concepts and query optimisation.
Familiarity with data governance, lineage, and cataloging tools (Glue Data Catalog, Apache Atlas).
Exposure to ML/AI data pipelines is a plus.

Tools & Technologies (must-have exposure)

Big Data & Processing: Apache Spark, PySpark, AWS EMR, AWS Glue
Streaming & Messaging: Apache Kafka, Kafka Connect, Debezium, AWS Kinesis
Cloud & Storage: AWS (S3, Athena, Lambda, IAM, CloudWatch)
Programming & Scripting: Python, SQL, Bash
Orchestration: Airflow / Step Functions
Version Control & CI/CD: Git, Jenkins/CodePipeline
Data Formats: Parquet, Avro, ORC, JSON, Delta, Iceberg, Hudi

Sr. Data Engineer

at NeoGenCode Technologies Pvt Ltd

2 candid answers

Posted by Akshay Patil

Noida

5 - 10 yrs

₹2L - ₹10L / yr

Looker

SQL

Python

ETL

Amazon Web Services (AWS)

+6 more

Job Title : Sr. Data Engineer

Experience : 5+ Years

Location : Noida (Hybrid – 3 Days in Office)

Shift Timing : 2-11 PM

Availability : Immediate

Job Description :

We are seeking a Senior Data Engineer to design, develop, and optimize data solutions.
The role involves building ETL pipelines, integrating data into BI tools, and ensuring data quality while working with SQL, Python (Pandas, NumPy), and cloud platforms (AWS/GCP).
You will also develop dashboards using Looker Studio and work with AWS services like S3, Lambda, Glue ETL, Athena, RDS, and Redshift.
Strong debugging, collaboration, and communication skills are essential.

Job Title : Sr. Data Engineer

Experience : 5+ Years

Location : Noida (Hybrid – 3 Days in Office)

Shift Timing : 2-11 PM

Availability : Immediate

Job Description :

We are seeking a Senior Data Engineer to design, develop, and optimize data solutions.
The role involves building ETL pipelines, integrating data into BI tools, and ensuring data quality while working with SQL, Python (Pandas, NumPy), and cloud platforms (AWS/GCP).
You will also develop dashboards using Looker Studio and work with AWS services like S3, Lambda, Glue ETL, Athena, RDS, and Redshift.
Strong debugging, collaboration, and communication skills are essential.

AWS Data Engineer

at Mactores Cognition Private Limited

Posted by Disha Thakar

Remote only

2 - 15 yrs

₹6L - ₹40L / yr

Amazon Web Services (AWS)

PySpark

athena

Data engineering

As AWS Data Engineer, you are a full-stack data engineer that loves solving business problems. You work with business leads, analysts, and data scientists to understand the business domain and engage with fellow engineers to build data products that empower better decision-making. You are passionate about the data quality of our business metrics and the flexibility of your solution that scales to respond to broader business questions.

If you love to solve problems using your skills, then come join the Team Mactores. We have a casual and fun office environment that actively steers clear of rigid "corporate" culture, focuses on productivity and creativity, and allows you to be part of a world-class team while still being yourself.

What you will do?

Write efficient code in - PySpark, Amazon Glue
Write SQL Queries in - Amazon Athena, Amazon Redshift
Explore new technologies and learn new techniques to solve business problems creatively
Collaborate with many teams - engineering and business, to build better data products and services
Deliver the projects along with the team collaboratively and manage updates to customers on time

What are we looking for?

1 to 3 years of experience in Apache Spark, PySpark, Amazon Glue
2+ years of experience in writing ETL jobs using pySpark, and SparkSQL
2+ years of experience in SQL queries and stored procedures
Have a deep understanding of all the Dataframe API with all the transformation functions supported by Spark 2.7+

You will be preferred if you have

Prior experience in working on AWS EMR, Apache Airflow
Certifications AWS Certified Big Data – Specialty OR Cloudera Certified Big Data Engineer OR Hortonworks Certified Big Data Engineer
Understanding of DataOps Engineering

Life at Mactores

We care about creating a culture that makes a real difference in the lives of every Mactorian. Our 10 Core Leadership Principles that honor Decision-making, Leadership, Collaboration, and Curiosity drive how we work.

1. Be one step ahead

2. Deliver the best

3. Be bold

4. Pay attention to the detail

5. Enjoy the challenge

6. Be curious and take action

7. Take leadership

8. Own it

9. Deliver value

10. Be collaborative

We would like you to read more details about the work culture on https://mactores.com/careers

The Path to Joining the Mactores Team

At Mactores, our recruitment process is structured around three distinct stages:

Pre-Employment Assessment:

You will be invited to participate in a series of pre-employment evaluations to assess your technical proficiency and suitability for the role.

Managerial Interview: The hiring manager will engage with you in multiple discussions, lasting anywhere from 30 minutes to an hour, to assess your technical skills, hands-on experience, leadership potential, and communication abilities.

HR Discussion: During this 30-minute session, you'll have the opportunity to discuss the offer and next steps with a member of the HR team.

At Mactores, we are committed to providing equal opportunities in all of our employment practices, and we do not discriminate based on race, religion, gender, national origin, age, disability, marital status, military status, genetic information, or any other category protected by federal, state, and local laws. This policy extends to all aspects of the employment relationship, including recruitment, compensation, promotions, transfers, disciplinary action, layoff, training, and social and recreational programs. All employment decisions will be made in compliance with these principles.

What you will do?

Write efficient code in - PySpark, Amazon Glue
Write SQL Queries in - Amazon Athena, Amazon Redshift
Explore new technologies and learn new techniques to solve business problems creatively
Collaborate with many teams - engineering and business, to build better data products and services
Deliver the projects along with the team collaboratively and manage updates to customers on time

What are we looking for?

1 to 3 years of experience in Apache Spark, PySpark, Amazon Glue
2+ years of experience in writing ETL jobs using pySpark, and SparkSQL
2+ years of experience in SQL queries and stored procedures
Have a deep understanding of all the Dataframe API with all the transformation functions supported by Spark 2.7+

You will be preferred if you have

Prior experience in working on AWS EMR, Apache Airflow
Certifications AWS Certified Big Data – Specialty OR Cloudera Certified Big Data Engineer OR Hortonworks Certified Big Data Engineer
Understanding of DataOps Engineering

Life at Mactores

1. Be one step ahead

2. Deliver the best

3. Be bold

4. Pay attention to the detail

5. Enjoy the challenge

6. Be curious and take action

7. Take leadership

8. Own it

9. Deliver value

10. Be collaborative

We would like you to read more details about the work culture on https://mactores.com/careers

The Path to Joining the Mactores Team

At Mactores, our recruitment process is structured around three distinct stages:

Pre-Employment Assessment:

You will be invited to participate in a series of pre-employment evaluations to assess your technical proficiency and suitability for the role.

HR Discussion: During this 30-minute session, you'll have the opportunity to discuss the offer and next steps with a member of the HR team.

Big Data Engineer

at Altimetrik

8 recruiters

Agency job

via SOT-Science of talent Acquisition consulting services Pvt Ltd by Accounts SOT

Hyderabad

5 - 8 yrs

₹12L - ₹25L / yr

PySpark

Data engineering

Big Data

Hadoop

Spark

+6 more

Big Data Engineer: 5+ yrs.
Immediate Joiner

Expertise in building AWS Data Engineering pipelines with AWS Glue -> Athena -> Quick sight
Experience in developing lambda functions with AWS Lambda
Expertise with Spark/PySpark – Candidate should be hands on with PySpark code and should be able to do transformations with Spark
Should be able to code in Python and Scala.
Snowflake experience will be a plus
We can start keeping Hadoop and Hive requirements as good to have or understanding of is enough rather than keeping it as a desirable requirement.

Big Data Engineer: 5+ yrs.
Immediate Joiner

Expertise in building AWS Data Engineering pipelines with AWS Glue -> Athena -> Quick sight
Experience in developing lambda functions with AWS Lambda
Expertise with Spark/PySpark – Candidate should be hands on with PySpark code and should be able to do transformations with Spark
Should be able to code in Python and Scala.
Snowflake experience will be a plus
We can start keeping Hadoop and Hive requirements as good to have or understanding of is enough rather than keeping it as a desirable requirement.

Python developer

at cirruslabs

Posted by shivani sagar

Hyderabad

3 - 6 yrs

₹8L - ₹16L / yr

Python

Django

Flask

AWS Lambda

Amazon Web Services (AWS)

+16 more

Role Specific Responsibilities
 Interfaces with other processes and/or business functions to ensure they can leverage the
benefits provided by the AWS Platform process
 Responsible for managing the configuration of all IaaS assets across the platforms
 Hands-on python experience
 Manages the entire AWS platform(Python, Flask, RESTAPI, serverless framework) and
recommend those that best meet the organization's requirements
 Has a good understanding of the various AWS services, particularly: S3, Athena, Python code,
Glue, Lambda, Cloud Formation, and other AWS serverless resources.
 AWS Certification is Plus
 Knowledge of best practices for IT operations in an always-on, always-available service model
 Responsible for the execution of the process controls, ensuring that staff comply with process
and data standards
Qualifications
 Bachelor’s degree in Computer Science, Business Information Systems or relevant experience and
accomplishments
 3 to 6 years of experience in the IT field
 AWS Python developer
 AWS, Serverless/Lambda, Middleware.

 Strong AWS skills including Data Pipeline, S3, RDS, Redshift with familiarity with other components
like - Lambda, Glue, Step functions, CloudWatch
 Must have created REST API with AWS Lambda.
 Python relevant exp 3 years
 Good to have Experience working on projects and problem solving with large scale multivendor
teams.
 Good to have knowledge on Agile Development
 Good knowledge on SDLC.
 Hands on AWS Databases, (RDS, etc)
 Good to have Unit testing exp.
 Good to have CICD working knowledge.
 Decent communication, as there will be client interaction and documentation.

Education (degree): Bachelor’s degree in Computer Science, Business Information Systems or relevant
experience and accomplishments
Years of Experience: 3-6 years
Technical Skills
 Linux/Unix system administration
 Continuous Integration/Continuous Delivery tools like Jenkins
 Cloud provisioning and management – Azure, AWS, GCP
 Ansible, Chef, or Puppet
 Python, PowerShell & BASH
Job Details
 JOB TITLE/JOB CODE: AWS Python Develop[er, III-Sr. Analyst
 RC: TBD
 PREFERRED LOCATION: HYDERABAD, IND
 POSITION REPORTS TO: Manager USI T&I Cloud Managed Platform
 CAREER LEVEL: 3
Work Location:
Hyderabad

Big Data Developer

Urgent Openings with one of our client

Agency job

via SOT-Science of talent Acquisition consulting services Pvt Ltd by Pavithra Rajasekaran

Hyderabad

3 - 7 yrs

₹3L - ₹10L / yr

Spark

Hadoop

Big Data

Data engineering

PySpark

+4 more

Experience : 3 to 7 Years
Number of Positions : 20

Job Location : Hyderabad

Notice : 30 Days

1. Expertise in building AWS Data Engineering pipelines with AWS Glue -> Athena -> Quick sight

2. Experience in developing lambda functions with AWS Lambda

3. Expertise with Spark/PySpark – Candidate should be hands on with PySpark code and should be able to do transformations with Spark

4. Should be able to code in Python and Scala.

5. Snowflake experience will be a plus

Hadoop and Hive requirements as good to have or understanding of is enough.

Experience : 3 to 7 Years
Number of Positions : 20

Job Location : Hyderabad

Notice : 30 Days

1. Expertise in building AWS Data Engineering pipelines with AWS Glue -> Athena -> Quick sight

2. Experience in developing lambda functions with AWS Lambda

3. Expertise with Spark/PySpark – Candidate should be hands on with PySpark code and should be able to do transformations with Spark

4. Should be able to code in Python and Scala.

5. Snowflake experience will be a plus

Hadoop and Hive requirements as good to have or understanding of is enough.

Scala Developer

at Astegic

3 recruiters

Posted by Nikita Pasricha

Remote only

5 - 7 yrs

₹8L - ₹15L / yr

Data engineering

SQL

Relational Database (RDBMS)

Big Data

Scala

+14 more

WHAT YOU WILL DO:

● Create and maintain optimal data pipeline architecture.
● Assemble large, complex data sets that meet functional / non-functional business requirements.
● Identify, design, and implement internal process improvements: automating manual processes,

optimizing data delivery, re-designing infrastructure for greater scalability, etc.
● Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide

variety of data sources using Spark,Hadoop and AWS 'big data' technologies.(EC2, EMR, S3, Athena).
● Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition,

operational efficiency and other key business performance metrics.
● Work with stakeholders including the Executive, Product, Data and Design teams to assist with

data-related technical issues and support their data infrastructure needs.
● Keep our data separated and secure across national boundaries through multiple data centers and AWS

regions.
● Create data tools for analytics and data scientist team members that assist them in building and

optimizing our product into an innovative industry leader.
● Work with data and analytics experts to strive for greater functionality in our data systems.

REQUIRED SKILLS & QUALIFICATIONS:

● 5+ years of experience in a Data Engineer role.
● Advanced working SQL knowledge and experience working with relational databases, query authoring

(SQL) as well as working familiarity with a variety of databases.
● Experience building and optimizing 'big data' data pipelines, architectures and data sets.
● Experience performing root cause analysis on internal and external data and processes to answer

specific business questions and identify opportunities for improvement.
● Strong analytic skills related to working with unstructured datasets.
● Build processes supporting data transformation, data structures, metadata, dependency and workload

management.
● A successful history of manipulating, processing and extracting value from large disconnected datasets.
● Working knowledge of message queuing, stream processing, and highly scalable 'big data' data stores.
● Strong project management and organizational skills.
● Experience supporting and working with cross-functional teams in a dynamic environment
● Experience with big data tools: Hadoop, Spark, Pig, Vetica, etc.
● Experience with AWS cloud services: EC2, EMR, S3, Athena
● Experience with Linux
● Experience with object-oriented/object function scripting languages: Python, Java, Shell, Scala, etc.

PREFERRED SKILLS & QUALIFICATIONS:

● Graduate degree in Computer Science, Statistics, Informatics, Information Systems or another quantitative field.

WHAT YOU WILL DO:

● Create and maintain optimal data pipeline architecture.
● Assemble large, complex data sets that meet functional / non-functional business requirements.
● Identify, design, and implement internal process improvements: automating manual processes,

optimizing data delivery, re-designing infrastructure for greater scalability, etc.
● Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide

variety of data sources using Spark,Hadoop and AWS 'big data' technologies.(EC2, EMR, S3, Athena).
● Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition,

operational efficiency and other key business performance metrics.
● Work with stakeholders including the Executive, Product, Data and Design teams to assist with

data-related technical issues and support their data infrastructure needs.
● Keep our data separated and secure across national boundaries through multiple data centers and AWS

regions.
● Create data tools for analytics and data scientist team members that assist them in building and

optimizing our product into an innovative industry leader.
● Work with data and analytics experts to strive for greater functionality in our data systems.

REQUIRED SKILLS & QUALIFICATIONS:

● 5+ years of experience in a Data Engineer role.
● Advanced working SQL knowledge and experience working with relational databases, query authoring

(SQL) as well as working familiarity with a variety of databases.
● Experience building and optimizing 'big data' data pipelines, architectures and data sets.
● Experience performing root cause analysis on internal and external data and processes to answer

specific business questions and identify opportunities for improvement.
● Strong analytic skills related to working with unstructured datasets.
● Build processes supporting data transformation, data structures, metadata, dependency and workload

management.
● A successful history of manipulating, processing and extracting value from large disconnected datasets.
● Working knowledge of message queuing, stream processing, and highly scalable 'big data' data stores.
● Strong project management and organizational skills.
● Experience supporting and working with cross-functional teams in a dynamic environment
● Experience with big data tools: Hadoop, Spark, Pig, Vetica, etc.
● Experience with AWS cloud services: EC2, EMR, S3, Athena
● Experience with Linux
● Experience with object-oriented/object function scripting languages: Python, Java, Shell, Scala, etc.

PREFERRED SKILLS & QUALIFICATIONS:

● Graduate degree in Computer Science, Statistics, Informatics, Information Systems or another quantitative field.

Machine Learning Engineer

at Carsome

3 recruiters

Posted by Piyush Palkar

Remote, Kuala Lumpur

2 - 5 yrs

₹20L - ₹30L / yr

Python

Amazon Web Services (AWS)

Django

Flask

TensorFlow

+2 more

Carsome is a growing startup that is utilising data to improve the experience of second hand car shoppers. This involves developing, deploying & maintaining machine learning models that are used to improve our customers' experience. We are looking for candidates who are aware of the machine learning project lifecycle and can help managing ML deployments.

Responsibilities: - Write and maintain production level code in Python for deploying machine learning models - Create and maintain deployment pipelines through CI/CD tools (preferribly GitLab CI) - Implement alerts and monitoring for prediction accuracy and data drift detection - Implement automated pipelines for training and replacing models - Work closely with with the data science team to deploy new models to production Required Qualifications: - Degree in Computer Science, Data Science, IT or a related discipline. - 2+ years of experience in software engineering or data engineering. - Programming experience in Python - Experience in data profiling, ETL development, testing and implementation - Experience in deploying machine learning models

Good to have: - Experience in AWS resources for ML and data engineering (SageMaker, Glue, Athena, Redshift, S3) - Experience in deploying TensorFlow models - Experience in deploying and managing ML Flow

Event & Unstructured Data

They provide both wholesale and retail funding. PM1

Agency job

via Multi Recruit by Sapna Deb

Mumbai

5 - 7 yrs

₹20L - ₹25L / yr

AWS KINESYS

Data engineering

AWS Lambda

DynamoDB

data pipeline

+11 more

Key responsibility is to design & develop a data pipeline for real-time data integration, processing, executing of the model (if required), and exposing output via MQ / API / No-SQL DB for consumption
Provide technical expertise to design efficient data ingestion solutions to store & process unstructured data, such as Documents, audio, images, weblogs, etc
Developing API services to provide data as a service
Prototyping Solutions for complex data processing problems using AWS cloud-native solutions
Implementing automated Audit & Quality assurance Checks in Data Pipeline
Document & maintain data lineage from various sources to enable data governance
Coordination with BIU, IT, and other stakeholders to provide best-in-class data pipeline solutions, exposing data via APIs, loading in down streams, No-SQL Databases, etc

Skills

Programming experience using Python & SQL
Extensive working experience in Data Engineering projects, using AWS Kinesys, AWS S3, DynamoDB, EMR, Lambda, Athena, etc for event processing
Experience & expertise in implementing complex data pipeline
Strong Familiarity with AWS Toolset for Storage & Processing. Able to recommend the right tools/solutions available to address specific data processing problems
Hands-on experience in Unstructured (Audio, Image, Documents, Weblogs, etc) Data processing.
Good analytical skills with the ability to synthesize data to design and deliver meaningful information
Know-how on any No-SQL DB (DynamoDB, MongoDB, CosmosDB, etc) will be an advantage.
Ability to understand business functionality, processes, and flows
Good combination of technical and interpersonal skills with strong written and verbal communication; detail-oriented with the ability to work independently

Functional knowledge

Real-time Event Processing
Data Governance & Quality assurance
Containerized deployment
Linux
Unstructured Data Processing
AWS Toolsets for Storage & Processing
Data Security

Key responsibility is to design & develop a data pipeline for real-time data integration, processing, executing of the model (if required), and exposing output via MQ / API / No-SQL DB for consumption
Provide technical expertise to design efficient data ingestion solutions to store & process unstructured data, such as Documents, audio, images, weblogs, etc
Developing API services to provide data as a service
Prototyping Solutions for complex data processing problems using AWS cloud-native solutions
Implementing automated Audit & Quality assurance Checks in Data Pipeline
Document & maintain data lineage from various sources to enable data governance
Coordination with BIU, IT, and other stakeholders to provide best-in-class data pipeline solutions, exposing data via APIs, loading in down streams, No-SQL Databases, etc

Skills

Programming experience using Python & SQL
Extensive working experience in Data Engineering projects, using AWS Kinesys, AWS S3, DynamoDB, EMR, Lambda, Athena, etc for event processing
Experience & expertise in implementing complex data pipeline
Strong Familiarity with AWS Toolset for Storage & Processing. Able to recommend the right tools/solutions available to address specific data processing problems
Hands-on experience in Unstructured (Audio, Image, Documents, Weblogs, etc) Data processing.
Good analytical skills with the ability to synthesize data to design and deliver meaningful information
Know-how on any No-SQL DB (DynamoDB, MongoDB, CosmosDB, etc) will be an advantage.
Ability to understand business functionality, processes, and flows
Good combination of technical and interpersonal skills with strong written and verbal communication; detail-oriented with the ability to work independently

Functional knowledge

Real-time Event Processing
Data Governance & Quality assurance
Containerized deployment
Linux
Unstructured Data Processing
AWS Toolsets for Storage & Processing
Data Security

ETL specialist (for a startup hedge fund)

at GitHub

4 recruiters

Posted by Nataliia Mediana

Remote only

3 - 8 yrs

$24K - $60K / yr

ETL

PySpark

Data engineering

Data engineer

athena

+9 more

We are a nascent quant hedge fund; we need to stage financial data and make it easy to run and re-run various preprocessing and ML jobs on the data.
- We are looking for an experienced data engineer to join our team.
- The preprocessing involves ETL tasks, using pyspark, AWS Glue, staging data in parquet formats on S3, and Athena

To succeed in this data engineering position, you should care about well-documented, testable code and data integrity. We have devops who can help with AWS permissions.
We would like to build up a consistent data lake with staged, ready-to-use data, and to build up various scripts that will serve as blueprints for various additional data ingestion and transforms.

If you enjoy setting up something which many others will rely on, and have the relevant ETL expertise, we’d like to work with you.

Responsibilities
- Analyze and organize raw data
- Build data pipelines
- Prepare data for predictive modeling
- Explore ways to enhance data quality and reliability
- Potentially, collaborate with data scientists to support various experiments

Requirements
- Previous experience as a data engineer with the above technologies

Data Engineer

Startup Focused on simplifying Buying Intent

Agency job

via Qrata by Blessy Fernandes

Bengaluru (Bangalore)

4 - 9 yrs

₹28L - ₹56L / yr

Big Data

Apache Spark

Spark

Hadoop

ETL

+7 more

5+ years of experience in a Data Engineer role.
 Proficiency in Linux.
 Must have SQL knowledge and experience working with relational databases,
query authoring (SQL) as well as familiarity with databases including Mysql,
Mongo, Cassandra, and Athena.
 Must have experience with Python/Scala.
Must have experience with Big Data technologies like Apache Spark.
 Must have experience with Apache Airflow.
 Experience with data pipeline and ETL tools like AWS Glue.
 Experience working with AWS cloud services: EC2, S3, RDS, Redshift.

AWS Service Developer

at Angel One

4 recruiters

Posted by Andleeb Mujeeb

Remote only

2 - 6 yrs

₹12L - ₹18L / yr

Amazon Web Services (AWS)

PySpark

Python

Scala

Go Programming (Golang)

+19 more

Designation: Specialist - Cloud Service Developer (ABL_SS_600)

Position description:

The person would be primary responsible for developing solutions using AWS services. Ex: Fargate, Lambda, ECS, ALB, NLB, S3 etc.
Apply advanced troubleshooting techniques to provide Solutions to issues pertaining to Service Availability, Performance, and Resiliency
Monitor & Optimize the performance using AWS dashboards and logs
Partner with Engineering leaders and peers in delivering technology solutions that meet the business requirements
Work with the cloud team in agile approach and develop cost optimized solutions

Primary Responsibilities:

Develop solutions using AWS services includiing Fargate, Lambda, ECS, ALB, NLB, S3 etc.

Reporting Team

Reporting Designation: Head - Big Data Engineering and Cloud Development (ABL_SS_414)
Reporting Department: Application Development (2487)

Required Skills:

AWS certification would be preferred
Good understanding in Monitoring (Cloudwatch, alarms, logs, custom metrics, Trust SNS configuration)
Good experience with Fargate, Lambda, ECS, ALB, NLB, S3, Glue, Aurora and other AWS services.
Preferred to have Knowledge on Storage (S3, Life cycle management, Event configuration)
Good in data structure, programming in (pyspark / python / golang / Scala)

Designation: Specialist - Cloud Service Developer (ABL_SS_600)

Position description:

The person would be primary responsible for developing solutions using AWS services. Ex: Fargate, Lambda, ECS, ALB, NLB, S3 etc.
Apply advanced troubleshooting techniques to provide Solutions to issues pertaining to Service Availability, Performance, and Resiliency
Monitor & Optimize the performance using AWS dashboards and logs
Partner with Engineering leaders and peers in delivering technology solutions that meet the business requirements
Work with the cloud team in agile approach and develop cost optimized solutions

Primary Responsibilities:

Develop solutions using AWS services includiing Fargate, Lambda, ECS, ALB, NLB, S3 etc.

Reporting Team

Reporting Designation: Head - Big Data Engineering and Cloud Development (ABL_SS_414)
Reporting Department: Application Development (2487)

Required Skills:

AWS certification would be preferred
Good understanding in Monitoring (Cloudwatch, alarms, logs, custom metrics, Trust SNS configuration)
Good experience with Fargate, Lambda, ECS, ALB, NLB, S3, Glue, Aurora and other AWS services.
Preferred to have Knowledge on Storage (S3, Life cycle management, Event configuration)
Good in data structure, programming in (pyspark / python / golang / Scala)

Get to hear about interesting companies hiring right now

Follow Cutshort

Why apply via Cutshort?

Connect with actual hiring teams and get their fast response. No spam.

Find more jobs

Get to hear about interesting companies hiring right now

Follow Cutshort