21+ HDFS Jobs in India

Apply to 21+ HDFS Jobs on CutShort.io. Find your next job, effortlessly. Browse HDFS Jobs and apply today!

at Mobile Programming LLC

1 video
34 recruiters
Posted by Sukhdeep Singh
Chennai
4 - 7 yrs
₹13L - ₹15L / yr
Data Analytics
Data Visualization
PowerBI
Tableau
Qlikview
+10 more

Title: Platform Engineer
Location: Chennai
Work Mode: Hybrid (Remote and Chennai Office)
Experience: 4+ years
Budget: 16 - 18 LPA

Responsibilities:

  • Parse data using Python, create dashboards in Tableau.
  • Utilize Jenkins for Airflow pipeline creation and CI/CD maintenance.
  • Migrate Datastage jobs to Snowflake, optimize performance.
  • Work with HDFS, Hive, Kafka, and basic Spark.
  • Develop Python scripts for data parsing, quality checks, and visualization.
  • Conduct unit testing and web application testing.
  • Implement Apache Airflow and handle production migration.
  • Apply data warehousing techniques for data cleansing and dimension modeling.

Requirements:

  • 4+ years of experience as a Platform Engineer.
  • Strong Python skills, knowledge of Tableau.
  • Experience with Jenkins, Snowflake, HDFS, Hive, and Kafka.
  • Proficient in Unix Shell Scripting and SQL.
  • Familiarity with ETL tools like DataStage and DMExpress.
  • Understanding of Apache Airflow.
  • Strong problem-solving and communication skills.

Note: Only candidates willing to work in Chennai and available for immediate joining will be considered. Budget for this position is 16 - 18 LPA.

at Conviva

1 recruiter
Posted by Anusha Bondada
Bengaluru (Bangalore)
3 - 6 yrs
₹20L - ₹40L / yr
Spark
Hadoop
Big Data
Data engineering
PySpark
+9 more

As Conviva is expanding, we are building products providing deep insights into end-user experience for our customers.

 

Platform and TLB Team

The vision for the TLB team is to build data processing software that works on terabytes of streaming data in real-time. Engineer the next-gen Spark-like system for in-memory computation of large time-series datasets – both Spark-like backend infra and library-based programming model. Build a horizontally and vertically scalable system that analyses trillions of events per day within sub-second latencies. Utilize the latest and greatest big data technologies to build solutions for use cases across multiple verticals. Lead technology innovation and advancement that will have a big business impact for years to come. Be part of a worldwide team building software using the latest technologies and the best of software development tools and processes.

 

What You’ll Do

This is an individual contributor position. Expectations will be on the below lines:

  • Design, build and maintain the stream processing, and time-series analysis system which is at the heart of Conviva’s products
  • Responsible for the architecture of the Conviva platform
  • Build features, enhancements, new services, and bug fixing in Scala and Java on a Jenkins-based pipeline to be deployed as Docker containers on Kubernetes
  • Own the entire lifecycle of your microservice including early specs, design, technology choice, development, unit-testing, integration-testing, documentation, deployment, troubleshooting, enhancements, etc.
  • Lead a team to develop a feature or parts of a product
  • Adhere to the Agile model of software development to plan, estimate, and ship per business priority

 

What you need to succeed

  • 5+ years of work experience in software development of data processing products.
  • Engineering degree in software or equivalent from a premier institute.
  • Excellent knowledge of fundamentals of Computer Science like algorithms and data structures. Hands-on with functional programming and know-how of its concepts
  • Excellent programming and debugging skills on the JVM. Proficient in writing code in Scala/Java/Rust/Haskell/Erlang that is reliable, maintainable, secure, and performant
  • Experience with big data technologies like Spark, Flink, Kafka, Druid, HDFS, etc.
  • Deep understanding of distributed systems concepts and scalability challenges including multi-threading, concurrency, sharding, partitioning, etc.
  • Experience/knowledge of Akka/Lagom framework and/or stream processing technologies like RxJava or Project Reactor will be a big plus. Knowledge of design patterns like event-streaming, CQRS and DDD to build large microservice architectures will be a big plus
  • Excellent communication skills. Willingness to work under pressure. Hunger to learn and succeed. Comfortable with ambiguity. Comfortable with complexity

 

Underpinning the Conviva platform is a rich history of innovation. More than 60 patents represent award-winning technologies and standards, including first-of-its-kind innovations like time-state analytics and AI-automated data modeling that surface actionable insights. By understanding real-world human experiences and having the ability to act within seconds of observation, our customers can solve business-critical issues and focus on growing their business ahead of the competition. Brands whose streaming growth Conviva has helped fuel include DAZN, Disney+, HBO, Hulu, NBCUniversal, Paramount+, Peacock, Sky, Sling TV, Univision and Warner Bros Discovery.

Privately held, Conviva is headquartered in Silicon Valley, California with offices and people around the globe. For more information, visit us at www.conviva.com. Join us to help extend our leadership position in big data streaming analytics to new audiences and markets! 


Hyderabad
4 - 7 yrs
₹14L - ₹25L / yr
Spark
Hadoop
Big Data
Data engineering
PySpark
+5 more

Roles and Responsibilities

Big Data Engineer + Spark Responsibilities

  • At least 3 to 4 years of relevant experience as a Big Data Engineer.
  • Minimum 1 year of relevant hands-on experience with the Spark framework.
  • Minimum 4 years of application development experience using a programming language such as Scala, Java, or Python.
  • Hands-on experience with any major component of the Hadoop ecosystem, such as HDFS, MapReduce, Hive, or Impala.
  • Strong programming experience building applications and platforms using Scala/Java/Python.
  • Experience implementing Spark RDD transformations and actions for business analysis.
  • An efficient interpersonal communicator with sound analytical problem-solving skills and management capabilities.
  • Strives to keep the slope of the learning curve high and adapts quickly to new environments and technologies.
  • Good knowledge of the agile methodology of software development.
at Number Theory

3 recruiters
Posted by Nidhi Mishra
Gurugram
2 - 4 yrs
₹10L - ₹15L / yr
Hadoop
Spark
HDFS
Scala
Java
+2 more
Position Overview: Data Engineer (2+ yrs)
Our company is seeking to hire a skilled software developer to help with the development of our AI/ML platform.
Your duties will primarily revolve around building Platform by writing code in Scala, as well as modifying platform
to fix errors, work on distributed computing, adapt it to new cloud services, improve its performance, or upgrade
interfaces. To be successful in this role, you will need extensive knowledge of programming languages and the
software development life-cycle.

Responsibilities:
  • Analyze, design, develop, troubleshoot and debug the Platform.
  • Write code, guide other team members on best practices, and perform testing and debugging of applications.
  • Specify, design and implement minor changes to existing software architecture. Build highly complex enhancements and resolve complex bugs. Build and execute unit tests and unit plans.
  • Duties and tasks are varied and complex, needing independent judgment. Fully competent in own area of expertise.

Experience:
The candidate should have 2+ years of experience with design and development in Java/Scala. Experience in
algorithms, data structures, databases and the architecture of distributed systems is mandatory.

Required Skills:
1. In-depth knowledge of Hadoop, Spark architecture and its components such as HDFS, YARN and executor, cores and memory param
2. Knowledge of Scala/Java.
3. Extensive experience in developing Spark jobs. Should possess good OOP knowledge and be aware of
enterprise application design patterns.
4. Good knowledge of Unix/Linux.
5. Experience working on large-scale software projects.
6. Keeps an eye out for technological trends and open-source projects that can be used.
7. Knows common programming languages and frameworks.
Product and Service based company
Hyderabad, Ahmedabad
4 - 8 yrs
₹15L - ₹30L / yr
Amazon Web Services (AWS)
Apache
Snow flake schema
Python
Spark
+13 more

Job Description

 

Mandatory Requirements 

  • Experience in AWS Glue

  • Experience in Apache Parquet 

  • Proficient in AWS S3 and data lake 

  • Knowledge of Snowflake

  • Understanding of file-based ingestion best practices.

  • Scripting language - Python & pyspark

CORE RESPONSIBILITIES

  • Create and manage cloud resources in AWS 

  • Data ingestion from different data sources that expose data using different technologies, such as RDBMS, flat files, streams, and time series data based on various proprietary systems. Implement data ingestion and processing with the help of Big Data technologies 

  • Data processing/transformation using various technologies such as Spark and Cloud Services. You will need to understand your part of business logic and implement it using the language supported by the base data platform 

  • Develop automated data quality checks to make sure the right data enters the platform and to verify the results of the calculations 

  • Develop an infrastructure to collect, transform, combine and publish/distribute customer data.

  • Define process improvement opportunities to optimize data collection, insights and displays.

  • Ensure data and results are accessible, scalable, efficient, accurate, complete and flexible 

  • Identify and interpret trends and patterns from complex data sets 

  • Construct a framework utilizing data visualization tools and techniques to present consolidated analytical and actionable results to relevant stakeholders. 

  • Key participant in regular Scrum ceremonies with the agile teams  

  • Proficient at developing queries, writing reports and presenting findings 

  • Mentor junior members and bring best industry practices.

 

QUALIFICATIONS

  • 5-7+ years’ experience as data engineer in consumer finance or equivalent industry (consumer loans, collections, servicing, optional product, and insurance sales) 

  • Strong background in math, statistics, computer science, data science or related discipline

  • Advanced knowledge of one of the following languages: Java, Scala, Python, C# 

  • Production experience with: HDFS, YARN, Hive, Spark, Kafka, Oozie / Airflow, Amazon Web Services (AWS), Docker / Kubernetes, Snowflake  

  • Proficient with:

      • Data mining/programming tools (e.g. SAS, SQL, R, Python)

      • Database technologies (e.g. PostgreSQL, Redshift, Snowflake, and Greenplum)

      • Data visualization (e.g. Tableau, Looker, MicroStrategy)

  • Comfortable learning about and deploying new technologies and tools. 

  • Organizational skills and the ability to handle multiple projects and priorities simultaneously and meet established deadlines. 

  • Good written and oral communication skills and ability to present results to non-technical audiences 

  • Knowledge of business intelligence and analytical tools, technologies and techniques.

Familiarity and experience in the following is a plus: 

  • AWS certification

  • Spark Streaming 

  • Kafka Streaming / Kafka Connect 

  • ELK Stack 

  • Cassandra / MongoDB 

  • CI/CD: Jenkins, GitLab, Jira, Confluence and other related tools

US Based Product Organization
Bengaluru (Bangalore)
10 - 15 yrs
₹25L - ₹45L / yr
Hadoop
HDFS
Apache Hive
Zookeeper
Cloudera
+8 more

Responsibilities :

  • Provide Support Services to our Gold & Enterprise customers using our flagship product suites. This may include assistance provided during the engineering and operations of distributed systems as well as responses for mission-critical systems and production customers.
  • Lead end-to-end delivery and customer success of next-generation features related to scalability, reliability, robustness, usability, security, and performance of the product
  • Lead and mentor others about concurrency, parallelization to deliver scalability, performance, and resource optimization in a multithreaded and distributed environment
  • Demonstrate the ability to actively listen to customers and show empathy to the customer’s business impact when they experience issues with our products


Required Skills:

  • 10+ years of Experience with a highly scalable, distributed, multi-node environment (100+ nodes)
  • Hadoop operation including Zookeeper, HDFS, YARN, Hive, and related components like the Hive metastore, Cloudera Manager/Ambari, etc
  • Authentication and security configuration and tuning (KNOX, LDAP, Kerberos, SSL/TLS, second priority: SSO/OAuth/OIDC, Ranger/Sentry)
  • Java troubleshooting, e.g., collection and evaluation of jstacks, heap dumps
  • Linux, NFS, Windows, including application installation, scripting, basic command line
  • Docker and Kubernetes configuration and troubleshooting, including Helm charts, storage options, logging, and basic kubectl CLI
  • Experience working with scripting languages (Bash, PowerShell, Python)
  • Working knowledge of application, server, and network security management concepts
  • Familiarity with virtual machine technologies
  • Knowledge of databases like MySQL and PostgreSQL
  • Certification on any of the leading Cloud providers (AWS, Azure, GCP ) and/or Kubernetes is a big plus
at Acceldata

5 recruiters
Posted by Richa Kukar
Bengaluru (Bangalore)
6 - 10 yrs
Best in industry
SRE
Reliability engineering
Site reliability
Hadoop
HDFS
+1 more

Senior SRE - Acceldata (IC3 Level)


About the Job


You will join a team of highly skilled engineers who are responsible for delivering Acceldata’s support services. Our Site Reliability Engineers are trained to be active listeners and demonstrate empathy when customers encounter product issues. In our fun and collaborative environment, Site Reliability Engineers develop strong business, interpersonal and technical skills to deliver high-quality service to our valued customers.


When you arrive for your first day, we’ll want you to have:

  • Solid skills in troubleshooting to repair failed products or processes on a machine or a system using a logical, systematic search for the source of a problem in order to solve it, and make the product or process operational again
  • A strong ability to understand the feelings of our customers as we empathize with them on the issue at hand
  • A strong desire to increase your product and technology skillset and your confidence in supporting our products, so you can help our customers succeed

In this position you will…

  • Provide Support Services to our Gold & Enterprise customers using our flagship Acceldata Pulse, Flow & Torch product suites. This may include assistance provided during the engineering and operations of distributed systems as well as responses for mission-critical systems and production customers.
  • Demonstrate the ability to actively listen to customers and show empathy to the customer’s business impact when they experience issues with our products
  • Participate in the queue management and coordination process by owning customer escalations, managing the unassigned queue.
  • Be involved with and work on other support related activities - Performing POC & assisting Onboarding deployments of Acceldata & Hadoop distribution products.
  • Triage, diagnose and escalate customer inquiries when applicable during their engineering and operations efforts.
  • Collaborate and share solutions with both customers and the Internal team.
  • Investigate product related issues both for particular customers and for common trends that may arise
  • Study and understand critical system components and large cluster operations
  • Differentiate between issues that arise in operations, user code, or product
  • Coordinate enhancement and feature requests with product management and Acceldata engineering team.
  • Be flexible about working in shifts.
  • Participate in a Rotational weekend on-call roster for critical support needs.
  • Participate as a designated or dedicated engineer for specific customers. Aspects of this engagement translate to building long-term successful relationships with customers, leading weekly status calls, and occasional visits to customer sites

In this position, you should have…

  • A strong desire and aptitude to become a well-rounded support professional. Acceldata Support considers the service we deliver as our core product.
  • A positive attitude towards feedback and continual improvement
  • A willingness to give direct feedback to and partner with management to improve team operations
  • A tenacity to bring calm and order to the often stressful situations of customer cases
  • A mental capability to multi-task across many customer situations simultaneously
  • Bachelor's degree in Computer Science or Engineering, or equivalent experience. Master’s degree is a plus
  • 2+ years of experience with at least one of the following cloud platforms: Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform (GCP), including managing and supporting cloud infrastructure on any of the three. Knowledge of Kubernetes and Docker is a must.
  • Strong troubleshooting skills (for example, TCP/IP, DNS, file system, load balancing, database, Java)
  • Excellent communication skills in English (written and verbal)
  • Prior enterprise support experience in a technical environment strongly preferred

Strong Hands-on Experience Working With Or Supporting The Following

  • 8-12 years of Experience with a highly-scalable, distributed, multi-node environment (50+ nodes)
  • Hadoop operation including Zookeeper, HDFS, YARN, Hive, and related components like the Hive metastore, Cloudera Manager/Ambari, etc
  • Authentication and security configuration and tuning (KNOX, LDAP, Kerberos, SSL/TLS, second priority: SSO/OAuth/OIDC, Ranger/Sentry)
  • Java troubleshooting, e.g., collection and evaluation of jstacks, heap dumps

You might also have…

  • Linux, NFS, Windows, including application installation, scripting, basic command line
  • Docker and Kubernetes configuration and troubleshooting, including Helm charts, storage options, logging, and basic kubectl CLI
  • Experience working with scripting languages (Bash, PowerShell, Python)
  • Working knowledge of application, server, and network security management concepts
  • Familiarity with virtual machine technologies
  • Knowledge of databases like MySQL and PostgreSQL
  • Certification on any of the leading Cloud providers (AWS, Azure, GCP ) and/or Kubernetes is a big plus

The right person in this role has an opportunity to make a huge impact at Acceldata and add value to our future decisions. If this position has piqued your interest and you have what we described - we invite you to apply! An adventure in data awaits.

Learn more at https://www.acceldata.io/about-us



Bengaluru (Bangalore)
8 - 15 yrs
₹25L - ₹60L / yr
Data engineering
Big Data
Spark
Apache Kafka
Cassandra
+20 more
Responsibilities

● Able to contribute to the gathering of functional requirements, developing technical
specifications, and test case planning
● Demonstrating technical expertise, and solving challenging programming and design
problems
● 60% hands-on coding with architecture ownership of one or more products
● Ability to articulate architectural and design options, and educate development teams and
business users
● Resolve defects/bugs during QA testing, pre-production, production, and post-release
patches
● Mentor and guide team members
● Work cross-functionally with various Bidgely teams including product management, QA/QE,
various product lines, and/or business units to drive forward results

Requirements
● BS/MS in computer science or equivalent work experience
● 8-12 years’ experience designing and developing applications in Data Engineering
● Hands-on experience with Big data EcoSystems.
● Past experience with Hadoop, HDFS, MapReduce, YARN, AWS Cloud, EMR, S3, Spark, Cassandra,
Kafka, Zookeeper
● Expertise with any of the following Object-Oriented Languages (OOD): Java/J2EE, Scala,
Python
● Ability to lead and mentor technical team members
● Expertise with the entire Software Development Life Cycle (SDLC)
● Excellent communication skills: Demonstrated ability to explain complex technical issues to
both technical and non-technical audiences
● Expertise in the Software design/architecture process
● Expertise with unit testing & Test-Driven Development (TDD)
● Business Acumen - strategic thinking & strategy development
● Experience on Cloud or AWS is preferable
● Have a good understanding and ability to develop software, prototypes, or proofs of
concepts (POC's) for various Data Engineering requirements.
● Experience with Agile Development, SCRUM, or Extreme Programming methodologies
AI-powered cloud-based SaaS solution
Bengaluru (Bangalore)
2 - 10 yrs
₹15L - ₹50L / yr
Data engineering
Big Data
Data Engineer
Big Data Engineer
Hibernate (Java)
+18 more
Responsibilities

● Able to contribute to the gathering of functional requirements, developing technical
specifications, and project & test planning
● Demonstrating technical expertise, and solving challenging programming and design
problems
● Roughly 80% hands-on coding
● Generate technical documentation and PowerPoint presentations to communicate
architectural and design options, and educate development teams and business users
● Resolve defects/bugs during QA testing, pre-production, production, and post-release
patches
● Work cross-functionally with various Bidgely teams including product management,
QA/QE, various product lines, and/or business units to drive forward results

Requirements
● BS/MS in computer science or equivalent work experience
● 2-4 years’ experience designing and developing applications in Data Engineering
● Hands-on experience with Big data EcoSystems.
● Hadoop, HDFS, MapReduce, YARN, AWS Cloud, EMR, S3, Spark, Cassandra, Kafka,
Zookeeper
● Expertise with any of the following Object-Oriented Languages (OOD): Java/J2EE, Scala,
Python
● Strong leadership experience: Leading meetings, presenting if required
● Excellent communication skills: Demonstrated ability to explain complex technical
issues to both technical and non-technical audiences
● Expertise in the Software design/architecture process
● Expertise with unit testing & Test-Driven Development (TDD)
● Experience on Cloud or AWS is preferable
● Have a good understanding and ability to develop software, prototypes, or proofs of
concepts (POC's) for various Data Engineering requirements.
AI Based SAAS company
Agency job
via wrackle by Naveen Taalanki
Bengaluru (Bangalore)
12 - 22 yrs
₹50L - ₹99L / yr
Engineering Management
Engineering Manager
Engineering head
Technical Architecture
Technical lead
+20 more

Location: Bangalore

Function: Software Engineering → Backend Development

 

We are looking for an extraordinary and dynamic Director of Engineering to be part of our Engineering team in Bangalore. You must have a good record of architecting scalable solutions, hiring and mentoring talented teams and working with product managers to build great products. You must be highly analytical and a good problem solver. You will be part of a highly energetic and innovative team that believes nothing is impossible with some creativity and hard work.

 

Responsibilities:

  • Own the overall solution design and implementation for backend systems. This includes requirement analysis, scope discussion, design, architecture, implementation, delivery and resolving production issues related to engineering.
  • Own the technology roadmap of our products from a core backend engineering perspective.
  • Guide the team in debugging production issues and writing best-of-breed code.
  • Drive engineering excellence (defects, productivity through automation, performance of products etc) through clearly defined metrics.
  • Stay current with the latest tools, technology ideas and methodologies; share knowledge by clearly articulating results and ideas to key decision makers.
  • Hiring, mentoring, and retaining a very talented team.

 

Requirements:

  • 12 - 20 years of strong experience in product development.
  • Strong experience in building data-engineering-intensive backends (NoSQL DBs, HDFS, Kafka, Cassandra, Elasticsearch, Spark etc).
  • Excellent track record of designing and delivering system architecture, implementation and deployment of successful solutions in a customer-facing role
  • Strong in problem solving and analytical skills.
  • Ability to influence decision making through data and be metric driven.
  • Strong understanding of non-functional requirements like security, test automation etc.
  • Fluency in Java, Spring, Hibernate, J2EE, REST Services.
  • Ability to hire, mentor and retain best-of-breed engineers.
  • Exposure to Agile development methodologies.
  • Ability to collaborate across teams and strong interpersonal skills.
  • SAAS experience a plus.

 

India's best Short Video App
Bengaluru (Bangalore)
4 - 12 yrs
₹25L - ₹50L / yr
Data engineering
Big Data
Spark
Apache Kafka
Apache Hive
+26 more
What Makes You a Great Fit for The Role?

You’re awesome at and will be responsible for
 
Extensive programming experience with cross-platform development of one of the following Java/SpringBoot, Javascript/Node.js, Express.js or Python
3-4 years of experience in big data analytics technologies like Storm, Spark/Spark streaming, Flink, AWS Kinesis, Kafka streaming, Hive, Druid, Presto, Elasticsearch, Airflow, etc.
3-4 years of experience in building high performance RPC services using different high performance paradigms: multi-threading, multi-processing, asynchronous programming (nonblocking IO), reactive programming
3-4 years of experience working with high throughput, low latency databases and cache layers like MongoDB, HBase, Cassandra, DynamoDB, ElastiCache (Redis + Memcache)
Experience with designing and building high scale app backends and micro-services leveraging cloud native services on AWS like proxies, caches, CDNs, messaging systems, Serverless compute(e.g. lambda), monitoring and telemetry.
Strong understanding of distributed systems fundamentals around scalability, elasticity, availability, fault-tolerance.
Experience in analysing and improving the efficiency, scalability, and stability of distributed systems and backend micro services.
5-7 years of strong design/development experience in building massively large scale, high throughput low latency distributed internet systems and products.
Good experience in working with Hadoop and Big Data technologies like HDFS, Pig, Hive, Storm, HBase, Scribe, Zookeeper and NoSQL systems etc.
Agile methodologies, Sprint management, Roadmap, Mentoring, Documenting, Software architecture.
Liaison with Product Management, DevOps, QA, Client and other teams
 
Your Experience Across The Years in the Roles You’ve Played
 
Have a total of 5 - 7 or more years of experience, with 2-3 years in a startup.
Have a B.Tech, M.Tech or equivalent academic qualification from a premier institute.
Experience in Product companies working on Internet-scale applications is preferred
Thoroughly aware of cloud computing infrastructure on AWS leveraging cloud native service and infrastructure services to design solutions.
Follow Cloud Native Computing Foundation leveraging mature open source projects including understanding of containerisation/Kubernetes.
 
You are passionate about learning or growing your expertise in some or all of the following
Data Pipelines
Data Warehousing
Statistics
Metrics Development
 
We Value Engineers Who Are
 
Customer-focused: We believe that doing what’s right for the creator is ultimately what will drive our business forward.
Obsessed with Quality: Your Production code just works & scales linearly
Team players. You believe that more can be achieved together. You listen to feedback and also provide supportive feedback to help others grow/improve.
Pragmatic: We do things quickly to learn what our creators desire. You know when it’s appropriate to take shortcuts that don’t sacrifice quality or maintainability.
Owners: Engineers at Chingari know how to positively impact the business.
at netmedscom

3 recruiters
Posted by Vijay Hemnath
Chennai
2 - 5 yrs
₹6L - ₹25L / yr
Big Data
Hadoop
Apache Hive
Scala
Spark
+12 more

We are looking for an outstanding Big Data Engineer with experience setting up and maintaining a Data Warehouse and Data Lakes for an organization. This role will collaborate closely with the Data Science team and help the team build and deploy machine learning and deep learning models on big data analytics platforms.

Roles and Responsibilities:

  • Develop and maintain scalable data pipelines and build out new integrations and processes required for optimal extraction, transformation, and loading of data from a wide variety of data sources using 'Big Data' technologies.
  • Develop programs in Scala and Python as part of data cleaning and processing.
  • Assemble large, complex data sets that meet functional / non-functional business requirements and foster data-driven decision making across the organization.
  • Design and develop distributed, high volume, high velocity multi-threaded event processing systems.
  • Implement processes and systems to validate data and monitor data quality, ensuring production data is always accurate and available for key stakeholders and the business processes that depend on it.
  • Perform root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
  • Provide high operational excellence guaranteeing high availability and platform stability.
  • Closely collaborate with the Data Science team and help the team build and deploy machine learning and deep learning models on big data analytics platforms.

Skills:

  • Experience with Big Data pipeline, Big Data analytics, Data warehousing.
  • Experience with SQL/No-SQL, schema design and dimensional data modeling.
  • Strong understanding of Hadoop Architecture, HDFS ecosystem and experience with Big Data technology stack such as HBase, Hadoop, Hive, MapReduce.
  • Experience in designing systems that process structured as well as unstructured data at large scale.
  • Experience in AWS/Spark/Java/Scala/Python development.
  • Should have strong skills in PySpark (Python & Spark). Ability to create, manage and manipulate Spark DataFrames. Expertise in Spark query tuning and performance optimization.
  • Experience in developing efficient software code/frameworks for multiple use cases leveraging Python and big data technologies.
  • Prior exposure to streaming data sources such as Kafka.
  • Should have knowledge of shell scripting and Python scripting.
  • High proficiency in database skills (e.g., Complex SQL), for data preparation, cleaning, and data wrangling/munging, with the ability to write advanced queries and create stored procedures.
  • Experience with NoSQL databases such as Cassandra / MongoDB.
  • Solid experience in all phases of Software Development Lifecycle - plan, design, develop, test, release, maintain and support, decommission.
  • Experience with DevOps tools (GitHub, Travis CI, and JIRA) and methodologies (Lean, Agile, Scrum, Test Driven Development).
  • Experience building and deploying applications on on-premise and cloud-based infrastructure.
  • Good understanding of the machine learning landscape and concepts.

 

Qualifications and Experience:

Engineering graduates and postgraduates, preferably in Computer Science, from premier institutions, with 3-5 years of proven work experience as a Big Data Engineer or in a similar role.

Certifications:

Good to have at least one of the Certifications listed here:

    AZ 900 - Azure Fundamentals

    DP 200, DP 201, DP 203, AZ 204 - Data Engineering

    AZ 400 - DevOps Certification

MNC

at MNC

Agency job
via Fragma Data Systems by Harpreet kour
Bengaluru (Bangalore)
5 - 9 yrs
₹16L - ₹20L / yr
Apache Hadoop
Hadoop
Apache Hive
HDFS
SSL
+1 more
  • Responsibilities
         - Responsible for implementation and ongoing administration of Hadoop infrastructure.
         - Aligning with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
         - Working with data delivery teams to set up new Hadoop users. This job includes setting up Linux users, setting up Kerberos principals and testing HDFS, Hive, Pig and MapReduce access for the new users.
         - Cluster maintenance as well as creation and removal of nodes using tools such as Ganglia, Nagios, Cloudera Manager Enterprise and Dell OpenManage.
         - Performance tuning of Hadoop clusters and Hadoop MapReduce routines.
         - Screen Hadoop cluster job performance and carry out capacity planning.
         - Monitor Hadoop cluster connectivity and security.
         - Manage and review Hadoop log files.
         - File system management and monitoring.
         - Diligently teaming with the infrastructure, network, database, application and business intelligence teams to guarantee high data quality and availability.
         - Collaboration with application teams to install operating system and Hadoop updates, patches and version upgrades when required.
        
    Qualifications
         - Bachelor's degree in Information Technology, Computer Science or other relevant fields.
         - General operational expertise such as good troubleshooting skills and an understanding of systems capacity, bottlenecks, and the basics of memory, CPU, OS, storage, and networks.
         - Hadoop skills like HBase, Hive, Pig, Mahout.
         - Ability to deploy a Hadoop cluster, add and remove nodes, keep track of jobs, monitor critical parts of the cluster, configure NameNode high availability, schedule and configure it, and take backups.
         - Good knowledge of Linux, as Hadoop runs on Linux.
         - Familiarity with open source configuration management and deployment tools such as Puppet or Chef, and Linux scripting.
         Nice to Have
         - Knowledge of troubleshooting core Java applications is a plus.

Dremio

at Dremio

4 recruiters
Kiran B
Posted by Kiran B
Hyderabad, Bengaluru (Bangalore)
15 - 20 yrs
Best in industry
Java
Data Structures
Algorithms
Multithreading
Problem solving
+7 more

About the Role

The Dremio India team owns the DataLake Engine along with the Cloud Infrastructure and services that power it. With a focus on next-generation data analytics supporting modern table formats like Iceberg and Delta Lake, open source initiatives such as Apache Arrow and Project Nessie, and hybrid-cloud infrastructure, this team provides various opportunities to learn, deliver, and grow in your career. We are looking for technical leaders with passion and experience in architecting and delivering high-quality distributed systems at massive scale.

Responsibilities & ownership

  • Lead end-to-end delivery and customer success of next-generation features related to scalability, reliability, robustness, usability, security, and performance of the product
  • Lead and mentor others about concurrency, parallelization to deliver scalability, performance and resource optimization in a multithreaded and distributed environment
  • Propose and promote strategic company-wide tech investments taking care of business goals, customer requirements, and industry standards
  • Lead the team to solve complex, unknown and ambiguous problems, and customer issues cutting across team and module boundaries with technical expertise, and influence others
  • Review and influence designs of other team members 
  • Design and deliver architectures that run optimally on public clouds like GCP, AWS, and Azure
  • Partner with other leaders to nurture innovation and engineering excellence in the team
  • Drive priorities with others to facilitate timely accomplishments of business objectives
  • Perform RCA of customer issues and drive investments to avoid similar issues in future
  • Collaborate with Product Management, Support, and field teams to ensure that customers are successful with Dremio
  • Proactively suggest learning opportunities about new technology and skills, and be a role model for constant learning and growth

Requirements

  • B.S./M.S./equivalent in Computer Science or a related technical field, or equivalent experience
  • Fluency in Java/C++ with 15+ years of experience developing production-level software
  • Strong foundation in data structures, algorithms, multi-threaded and asynchronous programming models and their use in developing distributed and scalable systems
  • 8+ years experience in developing complex and scalable distributed systems and delivering, deploying, and managing microservices successfully
  • Subject matter expert in one or more of: query processing or optimization, distributed systems, concurrency, microservice-based architectures, data replication, networking, storage systems
  • Experience in taking company-wide initiatives, convincing stakeholders, and delivering them
  • Expert in solving complex, unknown and ambiguous problems spanning across teams and taking initiative in planning and delivering them with high quality
  • Ability to anticipate and propose plan/design changes based on changing requirements 
  • Passion for quality, zero downtime upgrades, availability, resiliency, and uptime of the platform
  • Passion for learning and delivering using latest technologies
  • Hands-on experience working on projects on AWS, Azure, and GCP
  • Experience with containers and Kubernetes for orchestration and container management in private and public clouds (AWS, Azure, and GCP)
  • Understanding of distributed file systems such as S3, ADLS or HDFS
  • Excellent communication skills and affinity for collaboration and teamwork

 

MNC

at MNC

Agency job
via Fragma Data Systems by Harpreet kour
Bengaluru (Bangalore)
3 - 6 yrs
₹6L - ₹15L / yr
Apache Hadoop
Hadoop
HDFS
Apache Sqoop
Apache Flume
+5 more
1. Design and development of data ingestion pipelines.
2. Perform data migration and conversion activities.
3. Develop and integrate software applications using suitable development methodologies and standards, applying standard architectural patterns, taking into account critical performance characteristics and security measures.
4. Collaborate with Business Analysts, Architects and Senior Developers to establish the physical application framework (e.g. libraries, modules, execution environments).
5. Perform end to end automation of ETL process for various datasets that are being ingested into the big data platform.
Indium Software

at Indium Software

16 recruiters
Ivarajneasan S K
Posted by Ivarajneasan S K
Chennai
9 - 14 yrs
₹12L - ₹18L / yr
Apache Hadoop
Hadoop
Cloudera
HDFS
MapReduce
+2 more
Deploying and maintaining a Hadoop cluster, adding and removing nodes using cluster monitoring tools like Ganglia, Nagios or Cloudera Manager, configuring NameNode high availability, and keeping track of all running Hadoop jobs.

Good understanding of, or hands-on experience with, Kafka administration / Apache Kafka streaming.

Implementing, managing, and administering the overall Hadoop infrastructure.

Taking care of the day-to-day running of Hadoop clusters.

A Hadoop administrator will have to work closely with the database team, network team, BI team, and application teams to make sure that all the big data applications are highly available and performing as expected.

When working with the open source Apache distribution, Hadoop admins have to set up all the configuration files manually: core-site, hdfs-site, yarn-site and mapred-site. However, when working with a popular Hadoop distribution like Hortonworks, Cloudera or MapR, the configuration files are set up on startup and the Hadoop admin need not configure them manually.

The Hadoop admin is responsible for capacity planning and for estimating the requirements for lowering or increasing the capacity of the Hadoop cluster.

The Hadoop admin is also responsible for deciding the size of the Hadoop cluster based on the data to be stored in HDFS.

Ensure that the Hadoop cluster is up and running all the time.

Monitoring the cluster connectivity and performance.

Manage and review Hadoop log files.

Backup and recovery tasks

Resource and security management

Troubleshooting application errors and ensuring that they do not occur again.
Bengaluru (Bangalore)
4 - 9 yrs
₹15L - ₹30L / yr
Big Data
Hadoop
Data processing
Python
Data engineering
+3 more

REQUIREMENT:

  •  Previous experience of working in large scale data engineering
  •  4+ years of experience working in data engineering and/or backend technologies with cloud experience (any) is mandatory.
  •  Previous experience of architecting and designing backend for large scale data processing.
  •  Familiarity with and experience of working in different technologies related to data engineering – different database technologies, Hadoop, Spark, Storm, Hive etc.
  •  Hands-on and have the ability to contribute a key portion of data engineering backend.
  •  Self-inspired and motivated to drive for exceptional results.
  •  Familiarity and experience working with different stages of data engineering – data acquisition, data refining, large scale data processing, efficient data storage for business analysis.
  •  Familiarity and experience working with different DB technologies and how to scale them.

RESPONSIBILITY:

  •  End to end responsibility to come up with data engineering architecture, design, development and then implementation of it.
  •  Build data engineering workflow for large scale data processing.
  •  Discover opportunities in data acquisition.
  •  Bring industry best practices for data engineering workflow.
  •  Develop data set processes for data modelling, mining and production.
  •  Take additional tech responsibilities for driving an initiative to completion
  •  Recommend ways to improve data reliability, efficiency and quality
  •  Goes out of their way to reduce complexity.
  •  Humble and outgoing - engineering cheerleaders.
GeakMinds Technologies Pvt Ltd
John Richardson
Posted by John Richardson
Chennai
1 - 5 yrs
₹1L - ₹6L / yr
Hadoop
Big Data
HDFS
Apache Sqoop
Apache Flume
+2 more
  • Looking for Big Data Engineer with 3+ years of experience.
  • Hands-on experience with MapReduce-based platforms, like Pig, Spark, Shark.
  • Hands-on experience with data pipeline tools like Kafka, Storm, Spark Streaming.
  • Store and query data with Sqoop, Hive, MySQL, HBase, Cassandra, MongoDB, Drill, Phoenix, and Presto.
  • Hands-on experience in managing Big Data on a cluster with HDFS and MapReduce.
  • Handle streaming data in real time with Kafka, Flume, Spark Streaming, Flink, and Storm.
  • Experience with Azure cloud, Cognitive Services, Databricks is preferred.
Pion Global Solutions LTD
Sheela P
Posted by Sheela P
Mumbai
3 - 100 yrs
₹4L - ₹15L / yr
Spark
Big Data
Hadoop
HDFS
Apache Sqoop
+2 more
Looking for Big Data developers in Mumbai.
Accion Labs

at Accion Labs

14 recruiters
Neha Mayekar
Posted by Neha Mayekar
Mumbai
5 - 14 yrs
₹8L - ₹18L / yr
HDFS
Hbase
Spark
Flume
hive
+2 more
US-based multinational company. Hands-on Hadoop.
Securonix

at Securonix

1 recruiter
Ramakrishna Murthy
Posted by Ramakrishna Murthy
Pune
3 - 7 yrs
₹10L - ₹15L / yr
HDFS
Apache Flume
Apache HBase
Hadoop
Impala
+3 more
Securonix is a Big Data Security Analytics product company. The only product which delivers real-time behavior analytics (UEBA) on Big Data.