24+ HDFS Jobs in India
Experience: 12-15 Years
Key Responsibilities:
- Client Engagement & Requirements Gathering: Independently engage with client stakeholders to understand data landscapes and requirements, translating them into functional and technical specifications.
- Data Architecture & Solution Design: Architect and implement Hadoop-based Cloudera CDP solutions, including data integration, data warehousing, and data lakes.
- Data Processes & Governance: Develop data ingestion and ETL/ELT frameworks, ensuring robust data governance and quality practices.
- Performance Optimization: Provide SQL expertise and optimize Hadoop ecosystems (HDFS, Ozone, Kudu, Spark Streaming, etc.) for maximum performance.
- Coding & Development: Hands-on coding in relevant technologies and frameworks, ensuring project deliverables meet stringent quality and performance standards.
- API & Database Management: Integrate APIs and manage databases (e.g., PostgreSQL, Oracle) to support seamless data flows.
- Leadership & Mentoring: Guide and mentor a team of data engineers and analysts, fostering collaboration and technical excellence.
Skills Required:
a. Technical Proficiency:
- Extensive experience with Hadoop ecosystem tools and services (HDFS, YARN, Cloudera Manager, Impala, Kudu, Hive, Spark Streaming, etc.).
- Proficiency in Spark and programming languages such as Python and Scala, and a strong grasp of SQL performance tuning.
- ETL tool expertise (e.g., Informatica, Talend, Apache NiFi) and data modelling knowledge.
- API integration skills for effective data flow management.
b. Project Management & Communication:
- Proven ability to lead large-scale data projects and manage project timelines.
- Excellent communication, presentation, and critical thinking skills.
c. Client & Team Leadership:
- Engage effectively with clients and partners, leading onsite and offshore teams.
Secondary Skills: Streaming, Archiving, AWS/Azure/Cloud
Role:
- Should have strong programming and support experience in Java and J2EE technologies
- Should have good experience in Core Java, JSP, Servlets, and JDBC
- Good exposure to Hadoop development (HDFS, MapReduce, Hive, HBase, Spark)
- Should have 2+ years of Java experience and 1+ years of experience in Hadoop
- Should possess good communication skills
- Architectural Leadership:
- Design and architect robust, scalable, and high-performance Hadoop solutions.
- Define and implement data architecture strategies, standards, and processes.
- Collaborate with senior leadership to align data strategies with business goals.
- Technical Expertise:
- Develop and maintain complex data processing systems using Hadoop and its ecosystem (HDFS, YARN, MapReduce, Hive, HBase, Pig, etc.).
- Ensure optimal performance and scalability of Hadoop clusters.
- Oversee the integration of Hadoop solutions with existing data systems and third-party applications.
- Strategic Planning:
- Develop long-term plans for data architecture, considering emerging technologies and future trends.
- Evaluate and recommend new technologies and tools to enhance the Hadoop ecosystem.
- Lead the adoption of big data best practices and methodologies.
- Team Leadership and Collaboration:
- Mentor and guide data engineers and developers, fostering a culture of continuous improvement.
- Work closely with data scientists, analysts, and other stakeholders to understand requirements and deliver high-quality solutions.
- Ensure effective communication and collaboration across all teams involved in data projects.
- Project Management:
- Lead large-scale data projects from inception to completion, ensuring timely delivery and high quality.
- Manage project resources, budgets, and timelines effectively.
- Monitor project progress and address any issues or risks promptly.
- Data Governance and Security:
- Implement robust data governance policies and procedures to ensure data quality and compliance.
- Ensure data security and privacy by implementing appropriate measures and controls.
- Conduct regular audits and reviews of data systems to ensure compliance with industry standards and regulations.
Title: Platform Engineer | Location: Chennai | Work Mode: Hybrid (Remote and Chennai Office) | Experience: 4+ years | Budget: 16-18 LPA
Responsibilities:
- Parse data using Python and create dashboards in Tableau.
- Utilize Jenkins for Airflow pipeline creation and CI/CD maintenance.
- Migrate DataStage jobs to Snowflake and optimize performance.
- Work with HDFS, Hive, Kafka, and basic Spark.
- Develop Python scripts for data parsing, quality checks, and visualization.
- Conduct unit testing and web application testing.
- Implement Apache Airflow and handle production migration (a minimal DAG sketch follows this list).
- Apply data warehousing techniques for data cleansing and dimension modeling.
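To make the Airflow responsibility concrete, here is a minimal sketch of a daily DAG that runs a Python parsing task. It assumes Airflow 2.4+; the DAG id, schedule, and `parse_records` logic are hypothetical placeholders, not details from the posting.

```python
# Minimal Airflow 2.x (2.4+) DAG sketch: a daily pipeline with one Python parsing task.
# DAG id, schedule, and the parsing logic are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def parse_records():
    """Hypothetical parsing step: keep only well-formed CSV rows."""
    raw_rows = ["1,42", "2,17", "bad-row"]
    parsed = [row.split(",") for row in raw_rows if row.count(",") == 1]
    print(f"parsed {len(parsed)} of {len(raw_rows)} rows")


with DAG(
    dag_id="daily_data_parse",        # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="parse_records", python_callable=parse_records)
```

In practice, a Jenkins job would lint, test, and deploy DAG files like this one to the Airflow environment as part of CI/CD.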
Requirements:
- 4+ years of experience as a Platform Engineer.
- Strong Python skills, knowledge of Tableau.
- Experience with Jenkins, Snowflake, HDFS, Hive, and Kafka.
- Proficient in Unix Shell Scripting and SQL.
- Familiarity with ETL tools like DataStage and DMExpress.
- Understanding of Apache Airflow.
- Strong problem-solving and communication skills.
Note: Only candidates willing to work in Chennai and available for immediate joining will be considered. Budget for this position is 16 - 18 LPA.
As Conviva is expanding, we are building products providing deep insights into end-user experience for our customers.
Platform and TLB Team
The vision for the TLB team is to build data processing software that works on terabytes of streaming data in real-time. Engineer the next-gen Spark-like system for in-memory computation of large time-series datasets – both Spark-like backend infra and library-based programming model. Build a horizontally and vertically scalable system that analyses trillions of events per day within sub-second latencies. Utilize the latest and greatest big data technologies to build solutions for use cases across multiple verticals. Lead technology innovation and advancement that will have a big business impact for years to come. Be part of a worldwide team building software using the latest technologies and the best of software development tools and processes.
What You’ll Do
This is an individual contributor position. Expectations will be on the below lines:
- Design, build, and maintain the stream processing and time-series analysis system that is at the heart of Conviva’s products
- Responsible for the architecture of the Conviva platform
- Build features, enhancements, new services, and bug fixing in Scala and Java on a Jenkins-based pipeline to be deployed as Docker containers on Kubernetes
- Own the entire lifecycle of your microservice including early specs, design, technology choice, development, unit-testing, integration-testing, documentation, deployment, troubleshooting, enhancements, etc.
- Lead a team to develop a feature or parts of a product
- Adhere to the Agile model of software development to plan, estimate, and ship per business priority
What you need to succeed
- 5+ years of work experience in software development of data processing products.
- Engineering degree in software or equivalent from a premier institute.
- Excellent knowledge of fundamentals of Computer Science like algorithms and data structures. Hands-on with functional programming and know-how of its concepts
- Excellent programming and debugging skills on the JVM. Proficient in writing code in Scala/Java/Rust/Haskell/Erlang that is reliable, maintainable, secure, and performant
- Experience with big data technologies like Spark, Flink, Kafka, Druid, HDFS, etc.
- Deep understanding of distributed systems concepts and scalability challenges including multi-threading, concurrency, sharding, partitioning, etc.
- Experience/knowledge of Akka/Lagom framework and/or stream processing technologies like RxJava or Project Reactor will be a big plus. Knowledge of design patterns like event-streaming, CQRS and DDD to build large microservice architectures will be a big plus
- Excellent communication skills. Willingness to work under pressure. Hunger to learn and succeed. Comfortable with ambiguity. Comfortable with complexity
Underpinning the Conviva platform is a rich history of innovation. More than 60 patents represent award-winning technologies and standards, including first-of-its-kind innovations like time-state analytics and AI-automated data modeling that surface actionable insights. By understanding real-world human experiences and having the ability to act within seconds of observation, our customers can solve business-critical issues and focus on growing their business ahead of the competition. Examples of the brands Conviva has helped fuel streaming growth for include: DAZN, Disney+, HBO, Hulu, NBCUniversal, Paramount+, Peacock, Sky, Sling TV, Univision and Warner Bros Discovery.
Privately held, Conviva is headquartered in Silicon Valley, California with offices and people around the globe. For more information, visit us at www.conviva.com. Join us to help extend our leadership position in big data streaming analytics to new audiences and markets!
A multinational company providing energy and automation digital solutions
Roles and Responsibilities
Our company is seeking to hire a skilled software developer to help with the development of our AI/ML platform. Your duties will primarily revolve around building the platform by writing code in Scala, as well as modifying the platform to fix errors, work on distributed computing, adapt it to new cloud services, improve its performance, or upgrade interfaces. To be successful in this role, you will need extensive knowledge of programming languages and the software development life cycle.
Responsibilities:
- Analyze, design, develop, troubleshoot, and debug the platform.
- Write code, guide other team members on best practices, and perform testing and debugging of applications.
- Specify, design, and implement minor changes to existing software architecture. Build highly complex enhancements and resolve complex bugs. Build and execute unit tests and unit test plans.
- Duties and tasks are varied and complex, needing independent judgment. Fully competent in own area of expertise.
Experience:
The candidate should have about 2+ years of experience with design and development in Java/Scala. Experience in algorithms, distributed systems, data structures, databases, and distributed system architectures is mandatory.
Required Skills:
1. In-depth knowledge of Hadoop and Spark architecture and their components such as HDFS, YARN, and executor, core, and memory parameters (see the configuration sketch after this list).
2. Knowledge of Scala/Java.
3. Extensive experience in developing Spark jobs. Should possess good OOP knowledge and be aware of enterprise application design patterns.
4. Good knowledge of Unix/Linux.
5. Experience working on large-scale software projects.
6. Keep an eye out for technological trends and open-source projects that can be used.
7. Knowledge of common programming languages and frameworks.
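To make item 1 concrete, here is a minimal, hypothetical sketch of how executor, core, and memory parameters are commonly set when building a Spark session from PySpark. The values are placeholders to be tuned per cluster and workload, not recommendations from this posting.

```python
# Minimal sketch: common executor/core/memory parameters for a Spark job.
# All values are illustrative placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("example-job")                        # hypothetical app name
    .config("spark.executor.instances", "4")       # number of executors
    .config("spark.executor.cores", "4")           # cores per executor
    .config("spark.executor.memory", "8g")         # heap per executor
    .config("spark.driver.memory", "4g")           # driver heap
    .config("spark.sql.shuffle.partitions", "200") # shuffle parallelism
    .getOrCreate()
)

df = spark.range(1_000_000)  # toy dataset just to exercise the session
print(df.groupBy((df.id % 10).alias("bucket")).count().collect())
spark.stop()
```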
Job Description
Mandatory Requirements
- Experience in AWS Glue
- Experience in Apache Parquet
- Proficient in AWS S3 and data lake
- Knowledge of Snowflake
- Understanding of file-based ingestion best practices.
- Scripting languages - Python & PySpark
CORE RESPONSIBILITIES
- Create and manage cloud resources in AWS.
- Data ingestion from different data sources that expose data using different technologies, such as RDBMS, flat files, streams, and time-series data based on various proprietary systems. Implement data ingestion and processing with the help of big data technologies.
- Data processing/transformation using various technologies such as Spark and cloud services. You will need to understand your part of the business logic and implement it using the language supported by the base data platform.
- Develop automated data quality checks to make sure the right data enters the platform and to verify the results of calculations (a minimal sketch follows this list).
- Develop an infrastructure to collect, transform, combine, and publish/distribute customer data.
- Define process improvement opportunities to optimize data collection, insights, and displays.
- Ensure data and results are accessible, scalable, efficient, accurate, complete, and flexible.
- Identify and interpret trends and patterns from complex data sets.
- Construct a framework utilizing data visualization tools and techniques to present consolidated analytical and actionable results to relevant stakeholders.
- Key participant in regular Scrum ceremonies with the agile teams.
- Proficient at developing queries, writing reports, and presenting findings.
- Mentor junior members and bring best industry practices.
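As an illustration of the ingestion and data quality responsibilities above, here is a minimal, hypothetical PySpark sketch that reads a Parquet dataset from S3, applies a simple completeness check, and writes the curated output. The bucket, paths, column name, and threshold are assumptions for illustration only.

```python
# Minimal sketch: ingest Parquet data from S3 and run a basic quality check
# before publishing. Bucket, paths, column name, and threshold are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("ingest-with-quality-check").getOrCreate()

df = spark.read.parquet("s3://example-bucket/raw/customers/")   # hypothetical path

total = df.count()
null_ids = df.filter(F.col("customer_id").isNull()).count()     # hypothetical key column

# Simple rule: fail the load if more than 1% of rows are missing the key.
if total == 0 or null_ids / total > 0.01:
    raise ValueError(f"quality check failed: {null_ids}/{total} rows missing customer_id")

df.write.mode("overwrite").parquet("s3://example-bucket/curated/customers/")
spark.stop()
```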
QUALIFICATIONS
- 5-7+ years’ experience as a data engineer in consumer finance or an equivalent industry (consumer loans, collections, servicing, optional products, and insurance sales)
- Strong background in math, statistics, computer science, data science, or a related discipline
- Advanced knowledge of one of the following languages: Java, Scala, Python, C#
- Production experience with: HDFS, YARN, Hive, Spark, Kafka, Oozie / Airflow, Amazon Web Services (AWS), Docker / Kubernetes, Snowflake
- Proficient with:
  - Data mining/programming tools (e.g., SAS, SQL, R, Python)
  - Database technologies (e.g., PostgreSQL, Redshift, Snowflake, and Greenplum)
  - Data visualization tools (e.g., Tableau, Looker, MicroStrategy)
- Comfortable learning about and deploying new technologies and tools.
- Organizational skills and the ability to handle multiple projects and priorities simultaneously and meet established deadlines.
- Good written and oral communication skills and the ability to present results to non-technical audiences.
- Knowledge of business intelligence and analytical tools, technologies, and techniques.
Familiarity and experience in the following is a plus:
- AWS certification
- Spark Streaming
- Kafka Streams / Kafka Connect
- ELK Stack
- Cassandra / MongoDB
- CI/CD: Jenkins, GitLab, Jira, Confluence, and other related tools
Responsibilities :
- Provide Support Services to our Gold & Enterprise customers using our flagship product suites. This may include assistance provided during the engineering and operations of distributed systems as well as responses for mission-critical systems and production customers.
- Lead end-to-end delivery and customer success of next-generation features related to scalability, reliability, robustness, usability, security, and performance of the product
- Lead and mentor others about concurrency, parallelization to deliver scalability, performance, and resource optimization in a multithreaded and distributed environment
- Demonstrate the ability to actively listen to customers and show empathy to the customer’s business impact when they experience issues with our products
Required Skills:
- 10+ years of experience with a highly scalable, distributed, multi-node environment (100+ nodes)
- Hadoop operation including Zookeeper, HDFS, YARN, Hive, and related components like the Hive metastore, Cloudera Manager/Ambari, etc
- Authentication and security configuration and tuning (KNOX, LDAP, Kerberos, SSL/TLS, second priority: SSO/OAuth/OIDC, Ranger/Sentry)
- Java troubleshooting, e.g., collection and evaluation of jstacks, heap dumps
- Linux, NFS, Windows, including application installation, scripting, basic command line
- Docker and Kubernetes configuration and troubleshooting, including Helm charts, storage options, logging, and basic kubectl CLI
- Experience working with scripting languages (Bash, PowerShell, Python)
- Working knowledge of application, server, and network security management concepts
- Familiarity with virtual machine technologies
- Knowledge of databases like MySQL and PostgreSQL
- Certification on any of the leading cloud providers (AWS, Azure, GCP) and/or Kubernetes is a big plus
Senior SRE - Acceldata (IC3 Level)
About the Job
You will join a team of highly skilled engineers who are responsible for delivering Acceldata’s support services. Our Site Reliability Engineers are trained to be active listeners and demonstrate empathy when customers encounter product issues. In our fun and collaborative environment, Site Reliability Engineers develop strong business, interpersonal, and technical skills to deliver high-quality service to our valued customers.
When you arrive for your first day, we’ll want you to have:
- Solid troubleshooting skills: using a logical, systematic search for the source of a problem in order to repair failed products or processes and make them operational again
- A strong ability to understand the feelings of our customers as we empathize with them on the issue at hand
- A strong desire to increase your product and technology skillset and your confidence supporting our products so you can help our customers succeed
In this position you will…
- Provide Support Services to our Gold & Enterprise customers using our flagship Acceldata Pulse, Flow & Torch product suites. This may include assistance provided during the engineering and operations of distributed systems as well as responses for mission-critical systems and production customers.
- Demonstrate the ability to actively listen to customers and show empathy to the customer’s business impact when they experience issues with our products
- Participate in the queue management and coordination process by owning customer escalations and managing the unassigned queue.
- Be involved with and work on other support-related activities, such as performing POCs and assisting with onboarding deployments of Acceldata and Hadoop distribution products.
- Triage, diagnose and escalate customer inquiries when applicable during their engineering and operations efforts.
- Collaborate and share solutions with both customers and the Internal team.
- Investigate product related issues both for particular customers and for common trends that may arise
- Study and understand critical system components and large cluster operations
- Differentiate between issues that arise in operations, user code, or product
- Coordinate enhancement and feature requests with product management and Acceldata engineering team.
- Be flexible about working in shifts.
- Participate in a rotational weekend on-call roster for critical support needs.
- Participate as a designated or dedicated engineer for specific customers. Aspects of this engagement translate to building long-term successful relationships with customers, leading weekly status calls, and occasional visits to customer sites.
In this position, you should have…
- A strong desire and aptitude to become a well-rounded support professional. Acceldata Support considers the service we deliver as our core product.
- A positive attitude towards feedback and continual improvement
- A willingness to give direct feedback to and partner with management to improve team operations
- A tenacity to bring calm and order to the often stressful situations of customer cases
- A mental capability to multi-task across many customer situations simultaneously
- Bachelor’s degree in Computer Science or Engineering or equivalent experience. A Master’s degree is a plus.
- 2+ years of experience with at least one of the following cloud platforms: Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform (GCP), including experience managing and supporting a cloud infrastructure on any of the three platforms. Knowledge of Kubernetes and Docker is also a must.
- Strong troubleshooting skills (e.g., TCP/IP, DNS, file systems, load balancing, databases, Java)
- Excellent communication skills in English (written and verbal)
- Prior enterprise support experience in a technical environment strongly preferred
Strong Hands-on Experience Working With Or Supporting The Following
- 8-12 years of experience with a highly scalable, distributed, multi-node environment (50+ nodes)
- Hadoop operation including Zookeeper, HDFS, YARN, Hive, and related components like the Hive metastore, Cloudera Manager/Ambari, etc
- Authentication and security configuration and tuning (KNOX, LDAP, Kerberos, SSL/TLS, second priority: SSO/OAuth/OIDC, Ranger/Sentry)
- Java troubleshooting, e.g., collection and evaluation of jstacks, heap dumps
You might also have…
- Linux, NFS, Windows, including application installation, scripting, basic command line
- Docker and Kubernetes configuration and troubleshooting, including Helm charts, storage options, logging, and basic kubectl CLI
- Experience working with scripting languages (Bash, PowerShell, Python)
- Working knowledge of application, server, and network security management concepts
- Familiarity with virtual machine technologies
- Knowledge of databases like MySQL and PostgreSQL
- Certification on any of the leading cloud providers (AWS, Azure, GCP) and/or Kubernetes is a big plus
The right person in this role has an opportunity to make a huge impact at Acceldata and add value to our future decisions. If this position has piqued your interest and you have what we described - we invite you to apply! An adventure in data awaits.
Learn more at https://www.acceldata.io/about-us
AI-powered cloud-based SaaS solution provider
● Able to contribute to the gathering of functional requirements, developing technical specifications, and test case planning
● Demonstrating technical expertise, and solving challenging programming and design problems
● 60% hands-on coding with architecture ownership of one or more products
● Ability to articulate architectural and design options, and educate development teams and business users
● Resolve defects/bugs during QA testing, pre-production, production, and post-release patches
● Mentor and guide team members
● Work cross-functionally with various Bidgely teams including product management, QA/QE, various product lines, and/or business units to drive forward results
Requirements
● BS/MS in computer science or equivalent work experience
● 8-12 years’ experience designing and developing applications in Data Engineering
● Hands-on experience with big data ecosystems.
● Past experience with Hadoop, HDFS, MapReduce, YARN, AWS Cloud, EMR, S3, Spark, Cassandra, Kafka, and Zookeeper
● Expertise with any of the following object-oriented languages (OOD): Java/J2EE, Scala, Python
● Ability to lead and mentor technical team members
● Expertise with the entire Software Development Life Cycle (SDLC)
● Excellent communication skills: Demonstrated ability to explain complex technical issues to both technical and non-technical audiences
● Expertise in the Software design/architecture process
● Expertise with unit testing & Test-Driven Development (TDD)
● Business Acumen - strategic thinking & strategy development
● Experience on Cloud or AWS is preferable
● Have a good understanding and ability to develop software, prototypes, or proofs of concept (POCs) for various Data Engineering requirements.
● Experience with Agile Development, SCRUM, or Extreme Programming methodologies
● Able to contribute to the gathering of functional requirements, developing technical specifications, and project & test planning
● Demonstrating technical expertise, and solving challenging programming and design problems
● Roughly 80% hands-on coding
● Generate technical documentation and PowerPoint presentations to communicate architectural and design options, and educate development teams and business users
● Resolve defects/bugs during QA testing, pre-production, production, and post-release patches
● Work cross-functionally with various Bidgely teams including: product management, QA/QE, various product lines, and/or business units to drive forward results
Requirements
● BS/MS in computer science or equivalent work experience
● 2-4 years’ experience designing and developing applications in Data Engineering
● Hands-on experience with big data ecosystems.
● Hadoop, HDFS, MapReduce, YARN, AWS Cloud, EMR, S3, Spark, Cassandra, Kafka, Zookeeper
● Expertise with any of the following object-oriented languages (OOD): Java/J2EE, Scala, Python
● Strong leadership experience: Leading meetings, presenting if required
● Excellent communication skills: Demonstrated ability to explain complex technical issues to both technical and non-technical audiences
● Expertise in the Software design/architecture process
● Expertise with unit testing & Test-Driven Development (TDD)
● Experience on Cloud or AWS is preferable
● Have a good understanding and ability to develop software, prototypes, or proofs of concept (POCs) for various Data Engineering requirements.
Location: Bangalore
Function: Software Engineering → Backend Development
We are looking for an extraordinary and dynamic Director of Engineering to be part of its Engineering team in Bangalore. You must have a good record of architecting scalable solutions, hiring and mentoring talented teams and working with product managers to build great products. You must be highly analytical and a good problem solver. You will be part of a highly energetic and innovative team that believes nothing is impossible with some creativity and hard work.
Responsibilities:
- Own the overall solution design and implementation for backend systems. This includes requirement analysis, scope discussion, design, architecture, implementation, delivery and resolving production issues related to engineering.
- Owner of the technology roadmap of our products from a core backend engineering perspective.
- Ability to guide the team in debugging production issues and write best-of-breed code.
- Drive engineering excellence (defects, productivity through automation, performance of products, etc.) through clearly defined metrics.
- Stay current with the latest tools, technology ideas and methodologies; share knowledge by clearly articulating results and ideas to key decision makers.
- Hiring, mentoring, and retaining a very talented team.
Requirements:
- 12 - 20 years of strong experience in product development.
- Strong experience in building data-engineering-intensive backends (NoSQL DBs, HDFS, Kafka, Cassandra, Elasticsearch, Spark, etc.).
- Excellent track record of designing and delivering system architecture, implementation, and deployment of successful solutions in a customer-facing role.
- Strong problem-solving and analytical skills.
- Ability to influence decision making through data and be metric driven.
- Strong understanding of non-functional requirements like security, test automation etc.
- Fluency in Java, Spring, Hibernate, J2EE, REST Services.
- Ability to hire, mentor and retain best-of-breed engineers.
- Exposure to Agile development methodologies.
- Ability to collaborate across teams and strong interpersonal skills.
- SaaS experience is a plus.
We are looking for an outstanding Big Data Engineer with experience setting up and maintaining Data Warehouses and Data Lakes for an organization. This role will closely collaborate with the Data Science team and help them build and deploy machine learning and deep learning models on big data analytics platforms.
Roles and Responsibilities:
- Develop and maintain scalable data pipelines and build out new integrations and processes required for optimal extraction, transformation, and loading of data from a wide variety of data sources using 'Big Data' technologies.
- Develop programs in Scala and Python as part of data cleaning and processing.
- Assemble large, complex data sets that meet functional and non-functional business requirements, fostering data-driven decision making across the organization.
- Responsible for designing and developing distributed, high-volume, high-velocity, multi-threaded event processing systems.
- Implement processes and systems to validate data and monitor data quality, ensuring production data is always accurate and available for key stakeholders and business processes that depend on it.
- Perform root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
- Provide high operational excellence, guaranteeing high availability and platform stability.
- Closely collaborate with the Data Science team and help them build and deploy machine learning and deep learning models on big data analytics platforms.
Skills:
- Experience with Big Data pipeline, Big Data analytics, Data warehousing.
- Experience with SQL/No-SQL, schema design and dimensional data modeling.
- Strong understanding of Hadoop architecture and the HDFS ecosystem, and experience with a big data technology stack such as HBase, Hadoop, Hive, MapReduce.
- Experience in designing systems that process structured as well as unstructured data at large scale.
- Experience in AWS/Spark/Java/Scala/Python development.
- Should have strong skills in PySpark (Python and Spark). Ability to create, manage, and manipulate Spark DataFrames. Expertise in Spark query tuning and performance optimization (a minimal sketch follows this skills list).
- Experience in developing efficient software code/frameworks for multiple use cases leveraging Python and big data technologies.
- Prior exposure to streaming data sources such as Kafka.
- Should have knowledge on Shell Scripting and Python scripting.
- High proficiency in database skills (e.g., Complex SQL), for data preparation, cleaning, and data wrangling/munging, with the ability to write advanced queries and create stored procedures.
- Experience with NoSQL databases such as Cassandra / MongoDB.
- Solid experience in all phases of Software Development Lifecycle - plan, design, develop, test, release, maintain and support, decommission.
- Experience with DevOps tools (GitHub, Travis CI, and JIRA) and methodologies (Lean, Agile, Scrum, Test Driven Development).
- Experience building and deploying applications on on-premise and cloud-based infrastructure.
- Having a good understanding of machine learning landscape and concepts.
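To illustrate the PySpark skills called out above, here is a minimal, hypothetical sketch that creates a DataFrame, manipulates it, and inspects the physical plan, which is a typical first step when tuning a query. Column names and data are illustrative only.

```python
# Minimal PySpark sketch: create a DataFrame, transform it, and inspect the plan.
# Column names and rows are illustrative placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("pyspark-dataframe-demo").getOrCreate()

events = spark.createDataFrame(
    [("u1", "click", 3), ("u1", "view", 1), ("u2", "click", 7)],
    ["user_id", "event_type", "count"],
)

clicks_per_user = (
    events.filter(F.col("event_type") == "click")
          .groupBy("user_id")
          .agg(F.sum("count").alias("clicks"))
)

clicks_per_user.cache()    # reuse across actions instead of recomputing
clicks_per_user.explain()  # inspect the physical plan when tuning
clicks_per_user.show()
spark.stop()
```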
Qualifications and Experience:
Engineering graduates and postgraduates, preferably in Computer Science from premier institutions, with 3-5 years of proven work experience as a Big Data Engineer or in a similar role.
Certifications:
Good to have at least one of the Certifications listed here:
AZ-900 - Azure Fundamentals
DP-200, DP-201, DP-203, AZ-204 - Data Engineering
AZ-400 - DevOps Certification
Responsibilities
- Responsible for implementation and ongoing administration of Hadoop infrastructure.
- Aligning with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
- Working with data delivery teams to set up new Hadoop users. This includes setting up Linux users, setting up Kerberos principals, and testing HDFS, Hive, Pig, and MapReduce access for the new users (a minimal onboarding sketch follows this list).
- Cluster maintenance as well as creation and removal of nodes using tools like Ganglia, Nagios, Cloudera Manager Enterprise, Dell OpenManage, and other tools.
- Performance tuning of Hadoop clusters and Hadoop MapReduce routines.
- Screen Hadoop cluster job performance and capacity planning.
- Monitor Hadoop cluster connectivity and security.
- Manage and review Hadoop log files.
- File system management and monitoring.
- Diligently teaming with the infrastructure, network, database, application, and business intelligence teams to guarantee high data quality and availability.
- Collaboration with application teams to install operating system and Hadoop updates, patches, and version upgrades when required.
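To make the user-onboarding bullet concrete, here is a minimal, hypothetical Python sketch that wraps the usual CLI steps on a kerberized cluster: create the Kerberos principal, export a keytab, and smoke-test HDFS access as the new user. The user name, realm, and keytab path are placeholders; a real rollout would also create the Linux account and set HDFS permissions and quotas, and creating the HDFS home directory typically requires superuser privileges.

```python
# Minimal onboarding sketch for a kerberized Hadoop cluster. All names and paths
# are hypothetical placeholders; run the kadmin steps on the KDC as an admin.
import subprocess

USER = "analyst1"                     # hypothetical new user
REALM = "EXAMPLE.COM"                 # hypothetical Kerberos realm
KEYTAB = f"/etc/security/keytabs/{USER}.keytab"

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Create the principal and export its keytab.
run(["kadmin.local", "-q", f"addprinc -randkey {USER}@{REALM}"])
run(["kadmin.local", "-q", f"ktadd -k {KEYTAB} {USER}@{REALM}"])

# Authenticate as the new user and smoke-test HDFS access.
run(["kinit", "-kt", KEYTAB, f"{USER}@{REALM}"])
run(["hdfs", "dfs", "-mkdir", "-p", f"/user/{USER}"])
run(["hdfs", "dfs", "-ls", f"/user/{USER}"])
```

A similar smoke test against Hive, Pig, and MapReduce (for example, running a trivial query or sample job as the new principal) would complete the access verification.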
Qualifications
- Bachelor's degree in Information Technology, Computer Science, or other relevant fields
- General operational expertise such as good troubleshooting skills and an understanding of system capacity, bottlenecks, and the basics of memory, CPU, OS, storage, and networks.
- Hadoop skills like HBase, Hive, Pig, Mahout
- Ability to deploy a Hadoop cluster, add and remove nodes, keep track of jobs, monitor critical parts of the cluster, configure NameNode high availability, schedule and configure jobs, and take backups.
- Good knowledge of Linux, as Hadoop runs on Linux.
- Familiarity with open source configuration management and deployment tools such as Puppet or Chef, and Linux scripting.
Nice to Have
- Knowledge of troubleshooting core Java applications is a plus.
About the Role
The Dremio India team owns the DataLake Engine along with the cloud infrastructure and services that power it. With a focus on next-generation data analytics supporting modern table formats like Iceberg and Delta Lake, open source initiatives such as Apache Arrow and Project Nessie, and hybrid-cloud infrastructure, this team provides many opportunities to learn, deliver, and grow in your career. We are looking for technical leaders with passion and experience in architecting and delivering high-quality distributed systems at massive scale.
Responsibilities & ownership
- Lead end-to-end delivery and customer success of next-generation features related to scalability, reliability, robustness, usability, security, and performance of the product
- Lead and mentor others about concurrency, parallelization to deliver scalability, performance and resource optimization in a multithreaded and distributed environment
- Propose and promote strategic company-wide tech investments taking care of business goals, customer requirements, and industry standards
- Lead the team to solve complex, unknown and ambiguous problems, and customer issues cutting across team and module boundaries with technical expertise, and influence others
- Review and influence designs of other team members
- Design and deliver architectures that run optimally on public clouds like GCP, AWS, and Azure
- Partner with other leaders to nurture innovation and engineering excellence in the team
- Drive priorities with others to facilitate timely accomplishments of business objectives
- Perform RCA of customer issues and drive investments to avoid similar issues in future
- Collaborate with Product Management, Support, and field teams to ensure that customers are successful with Dremio
- Proactively suggest learning opportunities about new technology and skills, and be a role model for constant learning and growth
Requirements
- B.S./M.S/Equivalent in Computer Science or a related technical field or equivalent experience
- Fluency in Java/C++ with 15+ years of experience developing production-level software
- Strong foundation in data structures, algorithms, multi-threaded and asynchronous programming models and their use in developing distributed and scalable systems
- 8+ years of experience in developing complex and scalable distributed systems and delivering, deploying, and managing microservices successfully
- Subject Matter Expert in one or more of query processing or optimization, distributed systems, concurrency, microservice-based architectures, data replication, networking, storage systems
- Experience in taking company-wide initiatives, convincing stakeholders, and delivering them
- Expert in solving complex, unknown and ambiguous problems spanning across teams and taking initiative in planning and delivering them with high quality
- Ability to anticipate and propose plan/design changes based on changing requirements
- Passion for quality, zero downtime upgrades, availability, resiliency, and uptime of the platform
- Passion for learning and delivering using latest technologies
- Hands-on experience working on projects on AWS, Azure, and GCP
- Experience with containers and Kubernetes for orchestration and container management in private and public clouds (AWS, Azure, and GCP)
- Understanding of distributed file systems such as S3, ADLS or HDFS
- Excellent communication skills and affinity for collaboration and teamwork
2. Perform data migration and conversion activities.
3. Develop and integrate software applications using suitable development methodologies and standards, applying standard architectural patterns, taking into account critical performance characteristics and security measures.
4. Collaborate with Business Analysts, Architects and Senior Developers to establish the physical application framework (e.g. libraries, modules, execution environments).
5. Perform end-to-end automation of ETL processes for various datasets that are being ingested into the big data platform.
Good understanding of, or hands-on experience with, Kafka administration / Apache Kafka Streaming.
Implementing, managing, and administering the overall Hadoop infrastructure.
Takes care of the day-to-day running of Hadoop clusters.
A Hadoop administrator will have to work closely with the database team, network team, BI team, and application teams to make sure that all the big data applications are highly available and performing as expected.
If working with the open source Apache distribution, Hadoop admins have to manually set up all the configuration files - core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml. However, when working with a popular Hadoop distribution like Hortonworks, Cloudera, or MapR, the configuration files are set up on startup and the Hadoop admin need not configure them manually.
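For reference, here is a minimal sketch of what such hand-edited site files contain on an Apache-distribution cluster; the hostname, port, and values are illustrative placeholders only.

```xml
<!-- core-site.xml (illustrative fragment; hostname and port are placeholders) -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>

<!-- hdfs-site.xml (illustrative fragment) -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```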
The Hadoop admin is responsible for capacity planning and estimating the requirements for lowering or increasing the capacity of the Hadoop cluster.
The Hadoop admin is also responsible for deciding the size of the Hadoop cluster based on the data to be stored in HDFS.
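As a back-of-the-envelope illustration of that sizing exercise, the sketch below estimates raw HDFS capacity from the expected data volume, the replication factor, and a headroom allowance. All numbers are assumptions for illustration, not figures from this posting.

```python
# Rough HDFS sizing sketch; every number here is an illustrative assumption.
data_tb = 100          # logical data expected in HDFS, in TB
replication = 3        # HDFS replication factor
headroom = 0.25        # fraction of capacity kept free for growth and temp data

raw_tb = data_tb * replication / (1 - headroom)
print(f"raw capacity needed: {raw_tb:.0f} TB")   # 100 * 3 / 0.75 = 400 TB
```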
Ensure that the Hadoop cluster is up and running at all times.
Monitoring the cluster connectivity and performance.
Manage and review Hadoop log files.
Backup and recovery tasks
Resource and security management
Troubleshooting application errors and ensuring that they do not occur again.
REQUIREMENT:
- Previous experience of working in large-scale data engineering.
- 4+ years of experience working in data engineering and/or backend technologies; cloud experience (any) is mandatory.
- Previous experience of architecting and designing backends for large-scale data processing.
- Familiarity and experience working with different technologies related to data engineering - different database technologies, Hadoop, Spark, Storm, Hive, etc.
- Hands-on and have the ability to contribute a key portion of data engineering backend.
- Self-inspired and motivated to drive for exceptional results.
- Familiarity and experience working with different stages of data engineering – data acquisition, data refining, large scale data processing, efficient data storage for business analysis.
- Familiarity and experience working with different DB technologies and how to scale them.
RESPONSIBILITY:
- End-to-end responsibility for data engineering architecture, design, development, and implementation.
- Build data engineering workflow for large scale data processing.
- Discover opportunities in data acquisition.
- Bring industry best practices for data engineering workflow.
- Develop data set processes for data modelling, mining and production.
- Take additional tech responsibilities for driving an initiative to completion
- Recommend ways to improve data reliability, efficiency and quality
- Goes out of their way to reduce complexity.
- Humble and outgoing - engineering cheerleaders.