Hadoop jobs

50+ Hadoop Jobs in India

Apply to 50+ Hadoop Jobs on CutShort.io. Find your next job, effortlessly. Browse Hadoop Jobs and apply today!

Cloudera Hadoop Administrator

at Cloudtern Solutions

Posted by Hari Priya Budime

Hyderabad

4 - 8 yrs

₹5L - ₹13L / yr

Cloudera

Hadoop

RHEL

Job Title: Cloudera Hadoop Administrator

Location: Hyderabad

Experience: 4+ years

Employment Type: Full Time

Job Summary:

We are looking for a proactive and technically strong Hadoop Administrator who has hands-on experience working with Red Hat Linux systems, basic automation through Bash scripting, and foundational exposure to Hadoop and Cloudera platforms. The ideal candidate has been involved in real-time system recovery, automation, and training initiatives and is eager to expand their skills in Linux system administration and Big Data platforms.

Key Responsibilities:

● Administered and managed Red Hat Enterprise Linux (RHEL 7/8) servers to ensure system stability, performance, and security.

● Automated daily operational tasks using Bash scripting, enhancing system efficiency and reducing manual work.

● Contributed to backup and disaster recovery strategies for critical enterprise systems.

● Played a key role in recovering a business-critical application (RMA) from disaster with minimal downtime.

● Delivered internal training on Linux administration, shell scripting, Logical Volume Management (LVM), and Hadoop basics to junior engineers.

● Participated in Hadoop administration as part of an insourcing project within the organization.

● Gained hands-on experience installing and maintaining Cloudera CDP Private Cloud Base clusters under guidance, supporting cluster performance, security, and reliability.

● Assisted in implementing security practices using Apache Ranger, Kerberos, and Apache Atlas.

Technical Skills:

● Operating Systems: Red Hat Enterprise Linux 7/8

● Scripting: Bash (automation, disk usage monitoring, LVM setup)

● Big Data Tools: Cloudera CDP, Hadoop basics (HDFS, Hive, Spark, Impala)

Senior Data Scientist

at Proximity Works

1 video

5 recruiters

Posted by Eman Khan

Remote only

5 - 10 yrs

₹30L - ₹60L / yr

Python

Data Science

pandas

Scikit-Learn

TensorFlow

+9 more

We’re seeking a highly skilled, execution-focused Senior Data Scientist with a minimum of 5 years of experience. This role demands hands-on expertise in building, deploying, and optimizing machine learning models at scale, while working with big data technologies and modern cloud platforms. You will be responsible for driving data-driven solutions from experimentation to production, leveraging advanced tools and frameworks across Python, SQL, Spark, and AWS. The role requires strong technical depth, problem-solving ability, and ownership in delivering business impact through data science.

Responsibilities

Design, build, and deploy scalable machine learning models into production systems.
Develop advanced analytics and predictive models using Python, SQL, and popular ML/DL frameworks (Pandas, Scikit-learn, TensorFlow, PyTorch).
Leverage Databricks, Apache Spark, and Hadoop for large-scale data processing and model training.
Implement workflows and pipelines using Airflow and AWS EMR for automation and orchestration.
Collaborate with engineering teams to integrate models into cloud-based applications on AWS.
Optimize query performance, storage usage, and data pipelines for efficiency.
Conduct end-to-end experiments, including data preprocessing, feature engineering, model training, validation, and deployment.
Drive initiatives independently with high ownership and accountability.
Stay up to date with industry best practices in machine learning, big data, and cloud-native deployments.

Requirements:

Minimum 5 years of experience in Data Science or Applied Machine Learning.
Strong proficiency in Python, SQL, and ML libraries (Pandas, Scikit-learn, TensorFlow, PyTorch).
Proven expertise in deploying ML models into production systems.
Experience with big data platforms (Hadoop, Spark) and distributed data processing.
Hands-on experience with Databricks, Airflow, and AWS EMR.
Strong knowledge of AWS cloud services (S3, Lambda, SageMaker, EC2, etc.).
Solid understanding of query optimization, storage systems, and data pipelines.
Excellent problem-solving skills, with the ability to design scalable solutions.
Strong communication and collaboration skills to work in cross-functional teams.

Benefits:

Best in class salary: We hire only the best, and we pay accordingly.
Proximity Talks: Meet other designers, engineers, and product geeks — and learn from experts in the field.
Keep on learning with a world-class team: Work with the best in the field, challenge yourself constantly, and learn something new every day.

About Us:

Proximity is the trusted technology, design, and consulting partner for some of the biggest Sports, Media, and Entertainment companies in the world! We’re headquartered in San Francisco and have offices in Palo Alto, Dubai, Mumbai, and Bangalore. Since 2019, Proximity has created and grown high-impact, scalable products used by 370 million daily users, with a total net worth of $45.7 billion among our client companies.

Today, we are a global team of coders, designers, product managers, geeks, and experts. We solve complex problems and build cutting-edge tech, at scale. Our team of Proxonauts is growing quickly, which means your impact on the company’s success will be huge. You’ll have the chance to work with experienced leaders who have built and led multiple tech, product, and design teams.

Big Data Engineer

at Deqode

1 recruiter

Posted by Shraddha Katare

Bengaluru (Bangalore)

5 - 8 yrs

₹5L - ₹20L / yr

Apache Hive

Apache Spark

Python

SQL

Hadoop

+1 more

Profile: Big Data Engineer (System Design)

Experience: 5+ years

Location: Bangalore

Work Mode: Hybrid

About the Role

We're looking for an experienced Big Data Engineer with system design expertise to architect and build scalable data pipelines and optimize big data solutions.

Key Responsibilities

Design, develop, and maintain data pipelines and ETL processes using Python, Hive, and Spark
Architect scalable big data solutions with strong system design principles
Build and optimize workflows using Apache Airflow
Implement data modeling, integration, and warehousing solutions
Collaborate with cross-functional teams to deliver data solutions

Must-Have Skills

5+ years as a Data Engineer with Python, Hive, and Spark
Strong hands-on experience with Java
Advanced SQL and Hadoop experience
Expertise in Apache Airflow
Strong understanding of data modeling, integration, and warehousing
Experience with relational databases (PostgreSQL, MySQL)
System design knowledge
Excellent problem-solving and communication skills

Good to Have

Docker and containerization experience
Knowledge of Apache Beam, Apache Flink, or similar frameworks
Cloud platform experience.

Senior Data Engineer

It is a global technology consultancy

Agency job

via Scaling Theory by DivyaSri Rajendran

Bengaluru (Bangalore)

4.5 - 10 yrs

₹15L - ₹30L / yr

Spark

Scala

Hadoop

Amazon Web Services (AWS)

Role overview:

Must have about 5 - 11 years of experience, including at least 3 years of relevant experience with Big Data.
Must have experience in building highly scalable business applications, which involve implementing large, complex business flows and dealing with huge amounts of data.
Must have experience in Hadoop, Hive, and Spark with Scala, with good experience in performance tuning and debugging issues.
Good to have experience with any stream processing (Spark/Java, Kafka).
Must have experience in design and development of Big Data projects.
Good knowledge of functional programming and OOP concepts, SOLID principles, and design patterns for developing scalable applications.
Familiarity with build tools like Maven.
Must have experience with any RDBMS and at least one SQL database, preferably PostgreSQL.
Must have experience writing unit and integration tests using ScalaTest.
Must have experience using any version control system (e.g., Git).
Must have experience with CI / CD pipelines – Jenkins is a plus.
Basic hands-on experience with one of the cloud providers (AWS/Azure) is a plus.
Databricks Spark certification is a plus.

What would you do here:

As a Software Development Engineer 2, you will be responsible for expanding and optimising our data and data pipeline architecture, as well as optimising data flow and collection for cross-functional teams. The ideal candidate is an experienced data pipeline designer and data wrangler who enjoys optimising data systems and building them from the ground up. The Data Engineer will lead our software developers on data initiatives and will ensure that an optimal data delivery architecture is consistent throughout ongoing projects. They must be self-directed and comfortable supporting the data needs of multiple teams, systems, and products. The right candidate will be excited by the prospect of optimising or even re-designing our company’s data architecture to support our next generation of products and data initiatives.

Responsibilities:

• Create and maintain optimal data pipeline architecture.

• Assemble large, complex data sets that meet functional / non-functional business requirements.

• Identify, design, and implement internal process improvements: automating manual processes, optimising data delivery, coordinating to re-design infrastructure for greater scalability, etc.

• Work with stakeholders, including the Executive, Product, Data, and Design teams, to assist with data-related technical issues and support their data infrastructure needs.

• Keep our data separated and secure.

• Work with data and analytics experts to strive for greater functionality in our data systems.

• Support PROD systems.

Big Data Engineer

empowers digital transformation for innovative and high grow

Agency job

via Hirebound by Jebin Joy

Pune

4 - 12 yrs

₹12L - ₹30L / yr

Hadoop

Spark

Apache Kafka

ETL

Java

+2 more

To be successful in this role, you should possess

• Collaborate closely with Product Management and Engineering leadership to devise and build the right solution.

• Participate in Design discussions and brainstorming sessions to select, integrate, and maintain Big Data tools and frameworks required to solve Big Data problems at scale.

• Design and implement systems to cleanse, process, and analyze large data sets using distributed processing tools like Akka and Spark.

• Understanding and critically reviewing existing data pipelines, and coming up with ideas in collaboration with Technical Leaders and Architects to improve upon current bottlenecks

• Take initiatives, and show the drive to pick up new stuff proactively, and work as a Senior Individual contributor on the multiple products and features we have.

• 3+ years of experience in developing highly scalable Big Data pipelines.

• In-depth understanding of the Big Data ecosystem including processing frameworks like Spark, Akka, Storm, and Hadoop, and the file types they deal with.

• Experience with ETL and Data pipeline tools like Apache NiFi, Airflow etc.

• Excellent coding skills in Java or Scala, including the understanding to apply appropriate Design Patterns when required.

• Experience with Git and build tools like Gradle/Maven/SBT.

• Strong understanding of object-oriented design, data structures, algorithms, profiling, and optimization.

• Have elegant, readable, maintainable and extensible code style.

You are someone who would easily be able to

• Work closely with the US and India engineering teams to help build the Java/Scala based data pipelines

• Lead the India engineering team in technical excellence and ownership of critical modules; own the development of new modules and features

• Troubleshoot live production server issues.

• Handle client coordination and be able to work as a part of a team, be able to contribute independently and drive the team to exceptional contributions with minimal team supervision

• Follow Agile methodology, JIRA for work planning, issue management/tracking

Additional Project/Soft Skills:

• Should be able to work independently with India & US based team members.

• Strong verbal and written communication with ability to articulate problems and solutions over phone and emails.

• Strong sense of urgency, with a passion for accuracy and timeliness.

• Ability to work calmly in high pressure situations and manage multiple projects/tasks.

• Ability to work independently and possess superior skills in issue resolution.

• Should have the passion to learn and implement, analyze and troubleshoot issues

Data Engineer

at Pluginlive

1 recruiter

Posted by Harsha Saggi

Chennai, Mumbai

4 - 6 yrs

₹10L - ₹20L / yr

Python

SQL

NOSQL Databases

Data architecture

Data modeling

+7 more

Role Overview:

We are seeking a talented and experienced Data Architect with strong data visualization capabilities to join our dynamic team in Mumbai. As a Data Architect, you will be responsible for designing, building, and managing our data infrastructure, ensuring its reliability, scalability, and performance. You will also play a crucial role in transforming complex data into insightful visualizations that drive business decisions. This role requires a deep understanding of data modeling, database technologies (particularly Oracle Cloud), data warehousing principles, and proficiency in data manipulation and visualization tools, including Python and SQL.

Responsibilities:

Design and implement robust and scalable data architectures, including data warehouses, data lakes, and operational data stores, primarily leveraging Oracle Cloud services.
Develop and maintain data models (conceptual, logical, and physical) that align with business requirements and ensure data integrity and consistency.
Define data governance policies and procedures to ensure data quality, security, and compliance.
Collaborate with data engineers to build and optimize ETL/ELT pipelines for efficient data ingestion, transformation, and loading.
Develop and execute data migration strategies to Oracle Cloud.
Utilize strong SQL skills to query, manipulate, and analyze large datasets from various sources.
Leverage Python and relevant libraries (e.g., Pandas, NumPy) for data cleaning, transformation, and analysis.
Design and develop interactive and insightful data visualizations using tools such as Tableau, Power BI, Matplotlib, Seaborn, or Plotly to communicate data-driven insights to both technical and non-technical stakeholders.
Work closely with business analysts and stakeholders to understand their data needs and translate them into effective data models and visualizations.
Ensure the performance and reliability of data visualization dashboards and reports.
Stay up-to-date with the latest trends and technologies in data architecture, cloud computing (especially Oracle Cloud), and data visualization.
Troubleshoot data-related issues and provide timely resolutions.
Document data architectures, data flows, and data visualization solutions.
Participate in the evaluation and selection of new data technologies and tools.

Qualifications:

Bachelor's or Master's degree in Computer Science, Data Science, Information Systems, or a related field.
Proven experience (typically 5+ years) as a Data Architect, Data Modeler, or similar role.
Deep understanding of data warehousing concepts, dimensional modeling (e.g., star schema, snowflake schema), and ETL/ELT processes.
Extensive experience working with relational databases, particularly Oracle, and proficiency in SQL.
Hands-on experience with Oracle Cloud data services (e.g., Autonomous Data Warehouse, Object Storage, Data Integration).
Strong programming skills in Python and experience with data manipulation and analysis libraries (e.g., Pandas, NumPy).
Demonstrated ability to create compelling and effective data visualizations using industry-standard tools (e.g., Tableau, Power BI, Matplotlib, Seaborn, Plotly).
Excellent analytical and problem-solving skills with the ability to interpret complex data and translate it into actionable insights.
Strong communication and presentation skills, with the ability to effectively communicate technical concepts to non-technical audiences.
Experience with data governance and data quality principles.
Familiarity with agile development methodologies.
Ability to work independently and collaboratively within a team environment.

Application Link- https://forms.gle/km7n2WipJhC2Lj2r5

Senior Data Engineer

at KJBN labs

2 candid answers

Posted by sakthi ganesh

Bengaluru (Bangalore)

4 - 7 yrs

₹10L - ₹30L / yr

Hadoop

Apache Kafka

Spark

Python

Java

+8 more

Senior Data Engineer Job Description

Overview

The Senior Data Engineer will design, develop, and maintain scalable data pipelines and infrastructure to support data-driven decision-making and advanced analytics. This role requires deep expertise in data engineering, strong problem-solving skills, and the ability to collaborate with cross-functional teams to deliver robust data solutions.

Key Responsibilities

Data Pipeline Development: Design, build, and optimize scalable, secure, and reliable data pipelines to ingest, process, and transform large volumes of structured and unstructured data.

Data Architecture: Architect and maintain data storage solutions, including data lakes, data warehouses, and databases, ensuring performance, scalability, and cost-efficiency.

Data Integration: Integrate data from diverse sources, including APIs, third-party systems, and streaming platforms, ensuring data quality and consistency.

Performance Optimization: Monitor and optimize data systems for performance, scalability, and cost, implementing best practices for partitioning, indexing, and caching.

Collaboration: Work closely with data scientists, analysts, and software engineers to understand data needs and deliver solutions that enable advanced analytics, machine learning, and reporting.

Data Governance: Implement data governance policies, ensuring compliance with data security, privacy regulations (e.g., GDPR, CCPA), and internal standards.

Automation: Develop automated processes for data ingestion, transformation, and validation to improve efficiency and reduce manual intervention.

Mentorship: Guide and mentor junior data engineers, fostering a culture of technical excellence and continuous learning.

Troubleshooting: Diagnose and resolve complex data-related issues, ensuring high availability and reliability of data systems.

Required Qualifications

Education: Bachelor’s or Master’s degree in Computer Science, Engineering, Data Science, or a related field.

Experience: 5+ years of experience in data engineering or a related role, with a proven track record of building scalable data pipelines and infrastructure.

Technical Skills:

Proficiency in programming languages such as Python, Java, or Scala.

Expertise in SQL and experience with NoSQL databases (e.g., MongoDB, Cassandra).

Strong experience with cloud platforms (e.g., AWS, Azure, GCP) and their data services (e.g., Redshift, BigQuery, Snowflake).

Hands-on experience with ETL/ELT tools (e.g., Apache Airflow, Talend, Informatica) and data integration frameworks.

Familiarity with big data technologies (e.g., Hadoop, Spark, Kafka) and distributed systems.

Knowledge of containerization and orchestration tools (e.g., Docker, Kubernetes) is a plus.

Soft Skills:

Excellent problem-solving and analytical skills.

Strong communication and collaboration abilities.

Ability to work in a fast-paced, dynamic environment and manage multiple priorities.

Certifications (optional but preferred): Cloud certifications (e.g., AWS Certified Data Analytics, Google Professional Data Engineer) or relevant data engineering certifications.

Preferred Qualifications

Experience with real-time data processing and streaming architectures.

Familiarity with machine learning pipelines and MLOps practices.

Knowledge of data visualization tools (e.g., Tableau, Power BI) and their integration with data pipelines.

Experience in industries with high data complexity, such as finance, healthcare, or e-commerce.

Work Environment

Location: Hybrid/Remote/On-site (depending on company policy).

Team: Collaborative, cross-functional team environment with data scientists, analysts, and business stakeholders.

Hours: Full-time, with occasional on-call responsibilities for critical data systems.