4+ HDFS Jobs in Hyderabad | HDFS Job openings in Hyderabad
Secondary Skills: Streaming, Archiving, AWS / Azure / Cloud
Role:
· Should have strong programming and support experience in Java, J2EE technologies
· Should have good experience in Core Java, JSP, Servlets, JDBC
· Good exposure to Hadoop development (HDFS, MapReduce, Hive, HBase, Spark)
· Should have 2+ years of Java experience and 1+ years of experience in Hadoop
· Should possess good communication skills
Multinational Company providing energy & Automation digital
Roles and Responsibilities
Job Description
Mandatory Requirements
- Experience in AWS Glue
- Experience in Apache Parquet
- Proficient in AWS S3 and data lake
- Knowledge of Snowflake
- Understanding of file-based ingestion best practices
- Scripting languages: Python and PySpark
CORE RESPONSIBILITIES
- Create and manage cloud resources in AWS
- Ingest data from different sources that expose data through different technologies, such as RDBMS, flat files, streams, and time-series data based on various proprietary systems. Implement data ingestion and processing with the help of Big Data technologies
- Process and transform data using various technologies such as Spark and cloud services. You will need to understand your part of the business logic and implement it using the language supported by the base data platform
- Develop automated data quality checks to make sure the right data enters the platform and to verify the results of the calculations
- Develop an infrastructure to collect, transform, combine and publish/distribute customer data
- Define process improvement opportunities to optimize data collection, insights and displays
- Ensure data and results are accessible, scalable, efficient, accurate, complete and flexible
- Identify and interpret trends and patterns from complex data sets
- Construct a framework utilizing data visualization tools and techniques to present consolidated analytical and actionable results to relevant stakeholders
- Participate as a key member in regular Scrum ceremonies with the agile teams
- Proficient at developing queries, writing reports and presenting findings
- Mentor junior members and bring industry best practices
QUALIFICATIONS
- 5-7+ years’ experience as a data engineer in consumer finance or an equivalent industry (consumer loans, collections, servicing, optional product, and insurance sales)
- Strong background in math, statistics, computer science, data science or a related discipline
- Advanced knowledge of one of the following languages: Java, Scala, Python, C#
- Production experience with: HDFS, YARN, Hive, Spark, Kafka, Oozie / Airflow, Amazon Web Services (AWS), Docker / Kubernetes, Snowflake
- Proficient with:
  - Data mining/programming tools (e.g. SAS, SQL, R, Python)
  - Database technologies (e.g. PostgreSQL, Redshift, Snowflake, and Greenplum)
  - Data visualization (e.g. Tableau, Looker, MicroStrategy)
- Comfortable learning about and deploying new technologies and tools
- Organizational skills and the ability to handle multiple projects and priorities simultaneously and meet established deadlines
- Good written and oral communication skills and the ability to present results to non-technical audiences
- Knowledge of business intelligence and analytical tools, technologies and techniques
Familiarity and experience in the following is a plus:
- AWS certification
- Spark Streaming
- Kafka Streaming / Kafka Connect
- ELK Stack
- Cassandra / MongoDB
- CI/CD: Jenkins, GitLab, Jira, Confluence, and other related tools
About the Role
The Dremio India team owns the DataLake Engine along with the Cloud Infrastructure and services that power it. With a focus on next-generation data analytics supporting modern table formats like Iceberg and Delta Lake, open source initiatives such as Apache Arrow and Project Nessie, and hybrid-cloud infrastructure, this team provides various opportunities to learn, deliver, and grow in your career. We are looking for technical leaders with passion and experience in architecting and delivering high-quality distributed systems at massive scale.
Responsibilities & ownership
- Lead end-to-end delivery and customer success of next-generation features related to scalability, reliability, robustness, usability, security, and performance of the product
- Lead and mentor others on concurrency and parallelization to deliver scalability, performance and resource optimization in a multithreaded and distributed environment
- Propose and promote strategic company-wide tech investments taking care of business goals, customer requirements, and industry standards
- Lead the team to solve complex, unknown and ambiguous problems, and customer issues cutting across team and module boundaries with technical expertise, and influence others
- Review and influence designs of other team members
- Design and deliver architectures that run optimally on public clouds like GCP, AWS, and Azure
- Partner with other leaders to nurture innovation and engineering excellence in the team
- Drive priorities with others to facilitate timely accomplishments of business objectives
- Perform root cause analysis (RCA) of customer issues and drive investments to avoid similar issues in the future
- Collaborate with Product Management, Support, and field teams to ensure that customers are successful with Dremio
- Proactively suggest learning opportunities about new technology and skills, and be a role model for constant learning and growth
Requirements
- B.S./M.S. or equivalent in Computer Science or a related technical field, or equivalent experience
- Fluency in Java/C++ with 15+ years of experience developing production-level software
- Strong foundation in data structures, algorithms, multi-threaded and asynchronous programming models and their use in developing distributed and scalable systems
- 8+ years of experience in developing complex and scalable distributed systems and delivering, deploying, and managing microservices successfully
- Subject matter expert in one or more of: query processing or optimization, distributed systems, concurrency, microservice-based architectures, data replication, networking, storage systems
- Experience in taking company-wide initiatives, convincing stakeholders, and delivering them
- Expert in solving complex, unknown and ambiguous problems spanning across teams and taking initiative in planning and delivering them with high quality
- Ability to anticipate and propose plan/design changes based on changing requirements
- Passion for quality, zero downtime upgrades, availability, resiliency, and uptime of the platform
- Passion for learning and delivering using the latest technologies
- Hands-on experience working on projects on AWS, Azure, and GCP
- Experience with containers and Kubernetes for orchestration and container management in private and public clouds (AWS, Azure, and GCP)
- Understanding of distributed file systems such as S3, ADLS or HDFS
- Excellent communication skills and affinity for collaboration and teamwork