28+ Hadoop Jobs in Hyderabad | Hadoop Job openings in Hyderabad
Apply to 28+ Hadoop Jobs in Hyderabad on CutShort.io. Explore the latest Hadoop Job opportunities across top companies like Google, Amazon & Adobe.
Secondary Skills: Streaming, Archiving, AWS / Azure / Cloud
Role:
· Should have strong programming and support experience in Java and J2EE technologies
· Should have good experience in Core Java, JSP, Servlets, JDBC
· Good exposure to Hadoop development (HDFS, MapReduce, Hive, HBase, Spark)
· Should have 2+ years of Java experience and 1+ years of experience in Hadoop
· Should possess good communication skills
The Sr. Analytics Engineer provides technical expertise in needs identification, data modeling, data movement, transformation mapping (source to target), and automation and testing strategies, translating business needs into technical solutions with adherence to established data guidelines and approaches from a business unit or project perspective.
Understands and leverages best-fit technologies (e.g., traditional star schema structures, cloud, Hadoop, NoSQL, etc.) and approaches to address business and environmental challenges.
Provides data understanding and coordinates data-related activities with other data management groups such as master data management, data governance, and metadata management.
Actively participates with other consultants in problem-solving and approach development.
Responsibilities:
Provide a consultative approach with business users, asking questions to understand the business need and deriving the data flow, conceptual, logical, and physical data models based on those needs.
Perform data analysis to validate data models and to confirm the ability to meet business needs.
Assist with and support setting the data architecture direction, ensuring data architecture deliverables are developed, ensuring compliance to standards and guidelines, implementing the data architecture, and supporting technical developers at a project or business unit level.
Coordinate and consult with the Data Architect, project manager, client business staff, client technical staff and project developers in data architecture best practices and anything else that is data related at the project or business unit levels.
Work closely with Business Analysts and Solution Architects to design the data model satisfying the business needs and adhering to Enterprise Architecture.
Coordinate with Data Architects, Program Managers and participate in recurring meetings.
Help and mentor team members to understand the data model and subject areas.
Ensure that the team adheres to best practices and guidelines.
Requirements:
- At least 3 years of strong working knowledge of Spark, Java/Scala/PySpark, Kafka, Git, Unix/Linux, and ETL pipeline design.
- Experience with Spark optimization, tuning, and resource allocation (a brief configuration sketch follows this list).
- Excellent understanding of in-memory distributed computing frameworks like Spark, including parameter tuning and writing optimized workflow sequences.
- Experience with relational databases (e.g., PostgreSQL, MySQL) and NoSQL or analytical databases (e.g., Cassandra, Redshift, BigQuery).
- Familiarity with Docker, Kubernetes, Azure Data Lake/Blob storage, AWS S3, Google Cloud storage, etc.
- Have a deep understanding of the various stacks and components of the Big Data ecosystem.
- Hands-on experience with Python is a huge plus
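To make the Spark tuning expectation above more concrete, here is a minimal, hedged sketch of resource and shuffle configuration in PySpark; the application name, paths, column name, and every numeric value are illustrative assumptions to be adjusted for the actual cluster and data volume.

```python
# A minimal sketch of Spark resource and shuffle tuning for a hypothetical batch job;
# all values here are assumptions, not recommendations for any specific workload.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("tuned-etl-job")                       # hypothetical job name
    .config("spark.executor.memory", "8g")          # per-executor heap
    .config("spark.executor.cores", "4")            # cores per executor
    .config("spark.sql.shuffle.partitions", "400")  # sized to the shuffle volume
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)

df = spark.read.parquet("s3a://example-bucket/events/")  # hypothetical input path
# Repartitioning on the aggregation key before a wide operation reduces shuffle skew.
result = df.repartition(400, "customer_id").groupBy("customer_id").count()
result.write.mode("overwrite").parquet("s3a://example-bucket/aggregates/")
```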
Sigmoid works with a variety of clients, from start-ups to Fortune 500 companies. We are looking for a detail-oriented self-starter to assist our engineering and analytics teams in various roles as a Software Development Engineer.
This position will be part of a growing team working towards building world-class, large-scale Big Data architectures. This individual should have a sound understanding of programming principles, experience programming in Java, Python, or similar languages, and can expect to spend a majority of their time coding.
Location - Bengaluru and Hyderabad
Responsibilities:
● Good development practices
○ Hands-on coder with good experience in programming languages like Java or Python.
○ Hands-on experience with the Big Data stack, e.g., PySpark, HBase, Hadoop, MapReduce, and ElasticSearch.
○ Good understanding of programming principles and development practices such as check-in policy, unit testing, and code deployment.
○ Self-starter, able to grasp new concepts and technologies and translate them into large-scale engineering developments.
○ Excellent experience in application development and support, integration development, and data management.
● Align Sigmoid with key Client initiatives
○ Interface daily with customers across leading Fortune 500 companies to understand strategic requirements
● Stay up to date on the latest technology to ensure the greatest ROI for the customer & Sigmoid
○ Hands-on coder with a good understanding of enterprise-level code
○ Design and implement APIs, abstractions, and integration patterns to solve challenging distributed computing problems
○ Experience in defining technical requirements, data extraction, data transformation, automating jobs, productionizing jobs, and exploring new big data technologies within a parallel processing environment (see the sketch after this list)
● Culture
○ Must be a strategic thinker with the ability to think unconventionally / out of the box.
○ Analytical and data driven orientation.
○ Raw intellect, talent and energy are critical.
○ Entrepreneurial and agile: understands the demands of a private, high-growth company.
○ Ability to be both a leader and hands on "doer".
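As a rough illustration of the extraction, transformation, and productionization work referenced above, here is a minimal PySpark job sketch; the paths, column names, and cleansing rules are illustrative assumptions rather than any specific pipeline.

```python
# A minimal extract-transform-load sketch in PySpark; paths and columns are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-daily-etl").getOrCreate()

# Extract: raw JSON landed by an upstream system (hypothetical location).
raw = spark.read.json("s3a://example-raw/orders/dt=2024-01-01/")

# Transform: basic cleansing and a derived partition column.
clean = (
    raw.dropDuplicates(["order_id"])
       .filter(F.col("amount") > 0)
       .withColumn("order_date", F.to_date("created_at"))
)

# Load: partitioned Parquet for downstream analytics consumers.
clean.write.mode("overwrite").partitionBy("order_date").parquet("s3a://example-curated/orders/")
```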
Qualifications:
- A track record of relevant work experience and a Computer Science or related technical degree is required
- Experience with functional and object-oriented programming; Java is a must
- Hands-on knowledge of MapReduce, Hadoop, PySpark, HBase, and ElasticSearch
- Effective communication skills (both written and verbal)
- Ability to collaborate with a diverse set of engineers, data scientists and product managers
- Comfort in a fast-paced start-up environment
Preferred Qualification:
- Technical knowledge of MapReduce, Hadoop & the GCS stack is a plus.
- Experience in agile methodology
- Experience with database modeling and development, data mining and warehousing.
- Experience in architecture and delivery of enterprise-scale applications, and capable of developing frameworks, design patterns, etc. Should be able to understand and tackle technical challenges, propose comprehensive solutions, and guide junior staff
- Experience working with large, complex data sets from a variety of sources
Job Title: Data Engineer
Job Summary: As a Data Engineer, you will be responsible for designing, building, and maintaining the infrastructure and tools necessary for data collection, storage, processing, and analysis. You will work closely with data scientists and analysts to ensure that data is available, accessible, and in a format that can be easily consumed for business insights.
Responsibilities:
- Design, build, and maintain data pipelines to collect, store, and process data from various sources.
- Create and manage data warehousing and data lake solutions.
- Develop and maintain data processing and data integration tools.
- Collaborate with data scientists and analysts to design and implement data models and algorithms for data analysis.
- Optimize and scale existing data infrastructure to ensure it meets the needs of the business.
- Ensure data quality and integrity across all data sources (a brief pipeline sketch with a simple quality gate follows this list).
- Develop and implement best practices for data governance, security, and privacy.
- Monitor data pipeline performance and errors, and troubleshoot issues as needed.
- Stay up-to-date with emerging data technologies and best practices.
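To illustrate the pipeline and data-quality responsibilities above, here is a minimal sketch of a batch step with a simple quality gate, assuming a small pandas-based job; file paths, column names, and rules are hypothetical.

```python
# A minimal pipeline step with a fail-fast data-quality gate; all names are hypothetical.
import pandas as pd

def load_source(path: str) -> pd.DataFrame:
    """Read a raw extract dropped by a source system."""
    return pd.read_csv(path)

def quality_checks(df: pd.DataFrame) -> None:
    """Reject the batch if it violates simple integrity rules."""
    if df["customer_id"].isna().any():
        raise ValueError("null customer_id found")
    if (df["amount"] < 0).any():
        raise ValueError("negative amounts found")

def run_pipeline(src: str, dest: str) -> None:
    df = load_source(src)
    quality_checks(df)
    df.to_parquet(dest, index=False)  # load into the curated zone (requires pyarrow)

if __name__ == "__main__":
    run_pipeline("raw/orders.csv", "curated/orders.parquet")
```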
Requirements:
Bachelor's degree in Computer Science, Information Systems, or a related field.
Experience with ETL tools such as Matillion, SSIS, or Informatica
Experience with SQL and relational databases such as SQL server, MySQL, PostgreSQL, or Oracle.
Experience in writing complex SQL queries
Strong programming skills in languages such as Python, Java, or Scala.
Experience with data modeling, data warehousing, and data integration.
Strong problem-solving skills and ability to work independently.
Excellent communication and collaboration skills.
Familiarity with big data technologies such as Hadoop, Spark, or Kafka.
Familiarity with data warehouse/Data lake technologies like Snowflake or Databricks
Familiarity with cloud computing platforms such as AWS, Azure, or GCP.
Familiarity with Reporting tools
Teamwork / growth contribution
- Helping the team conduct interviews and identify the right candidates
- Adhering to timelines
- Timely status communication and upfront communication of any risks
- Teach, train, and share knowledge with peers.
- Good Communication skills
- Proven abilities to take initiative and be innovative
- Analytical mind with a problem-solving aptitude
Good to have :
Master's degree in Computer Science, Information Systems, or a related field.
Experience with NoSQL databases such as MongoDB or Cassandra.
Familiarity with data visualization and business intelligence tools such as Tableau or Power BI.
Knowledge of machine learning and statistical modeling techniques.
If you are passionate about data and want to work with a dynamic team of data scientists and analysts, we encourage you to apply for this position.
About Telstra
Telstra is Australia’s leading telecommunications and technology company, with operations in more than 20 countries, including in India, where we’re building a new Innovation and Capability Centre (ICC) in Bangalore.
We’re growing fast, and for you that means many exciting opportunities to develop your career at Telstra. Join us on this exciting journey, and together, we’ll reimagine the future.
Why Telstra?
- We're an iconic Australian company with a rich heritage that's been built over 100 years. Telstra is Australia's leading Telecommunications and Technology Company. We've been operating internationally for more than 70 years.
- International presence spanning over 20 countries.
- We are one of the 20 largest telecommunications providers globally
- At Telstra, the work is complex and stimulating, but with that comes a great sense of achievement. We are shaping tomorrow's modes of communication with our innovation-driven teams.
Telstra offers an opportunity to make a difference to the lives of millions of people by providing the choice of flexibility in work and a rewarding career that you will be proud of!
About the team
Being part of Networks & IT means you'll be part of a team that focuses on extending our network superiority to enable the continued execution of our digital strategy.
With us, you'll be working with world-leading technology and changing the way we do IT to ensure business needs drive priorities, accelerating our digitisation programme.
Focus of the role
Any new engineer who joins the data chapter will mostly work on developing reusable data processing and storage frameworks that can be used across the data platform (a brief sketch appears after the Essential skills list below).
About you
To be successful in the role, you'll bring skills and experience in:
Essential
- Hands-on experience in Spark Core, Spark SQL, SQL/Hive/Impala, Git/SVN/Any other VCS and Data warehousing
- Skilled in the Hadoop ecosystem (HDP/Cloudera/MapR/EMR, etc.)
- Azure data factory/Airflow/control-M/Luigi
- PL/SQL
- Exposure to NoSQL (HBase/Cassandra/GraphDB (Neo4j)/MongoDB)
- File formats (Parquet/ORC/AVRO/Delta/Hudi etc.)
- Kafka/Kinesis/Eventhub
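As a hedged illustration of the reusable-framework focus mentioned above, the sketch below shows a single PySpark entry point for the listed file formats; the paths and options are assumptions, and Avro and Delta support require the corresponding Spark packages.

```python
# A minimal sketch of a reusable ingestion helper; format names map to Spark reader formats.
# Note: "avro" needs the spark-avro package and "delta" needs the Delta Lake package.
from pyspark.sql import SparkSession, DataFrame

SUPPORTED_FORMATS = {"parquet", "orc", "avro", "json", "csv", "delta"}

def read_dataset(spark: SparkSession, path: str, fmt: str, **options) -> DataFrame:
    """Read a dataset in any supported file format through one entry point."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return spark.read.format(fmt).options(**options).load(path)

spark = SparkSession.builder.appName("reusable-ingest").getOrCreate()
# Hypothetical usage: the same helper serves Parquet and CSV sources alike.
events = read_dataset(spark, "s3a://example/landing/events/", "parquet")
refs = read_dataset(spark, "s3a://example/landing/refs/", "csv", header="true")
```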
Highly Desirable
Experience and knowledge of the following:
- Spark Streaming
- Cloud exposure (Azure/AWS/GCP)
- Azure data offerings - ADF, ADLS2, Azure Databricks, Azure Synapse, Eventhubs, CosmosDB etc.
- Presto/Athena
- Azure DevOps
- Jenkins/ Bamboo/Any similar build tools
- Power BI
- Prior experience building, or working in a team that builds, reusable frameworks
- Data modelling.
- Data Architecture and design principles. (Delta/Kappa/Lambda architecture)
- Exposure to CI/CD
- Code Quality - Static and Dynamic code scans
- Agile SDLC
If you've got a passion to innovate, want to succeed as part of a great team, and are looking for the next step in your career, we'd welcome you to apply!
___________________________
We’re committed to building a diverse and inclusive workforce in all its forms. We encourage applicants from diverse gender, cultural and linguistic backgrounds and applicants who may be living with a disability. We also offer flexibility in all our roles, to ensure everyone can participate.
To learn more about how we support our people, including accessibility adjustments we can provide you through the recruitment process, visit tel.st/thrive.
Multinational company providing energy & automation digital solutions
Urgent openings with one of our clients
Experience : 3 to 7 Years
Number of Positions : 20
Job Location : Hyderabad
Notice : 30 Days
1. Expertise in building AWS data engineering pipelines with AWS Glue -> Athena -> QuickSight (a brief Glue job sketch follows this list)
2. Experience in developing Lambda functions with AWS Lambda
3. Expertise with Spark/PySpark – the candidate should be hands-on with PySpark code and able to do transformations with Spark
4. Should be able to code in Python and Scala.
5. Snowflake experience will be a plus
Hadoop and Hive are good to have; a working understanding is enough.
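For context on item 1, here is a minimal sketch of an AWS Glue PySpark job that reads a catalogued table and writes Parquet that Athena (and, in turn, QuickSight) can query; the database, table, and bucket names are hypothetical assumptions.

```python
# A minimal AWS Glue job sketch; database, table, and S3 paths are hypothetical.
import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the raw table registered in the Glue Data Catalog.
orders = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="raw_orders"
)

# Write curated Parquet to S3; Athena queries it via the catalog, QuickSight visualizes it.
glue_context.write_dynamic_frame.from_options(
    frame=orders,
    connection_type="s3",
    connection_options={"path": "s3://example-curated/orders/"},
    format="parquet",
)
job.commit()
```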
We are hiring for a Tier 1 MNC for a software developer role with good knowledge of Spark, Hadoop, and Scala.
Location: Bangalore/Pune/Hyderabad/Nagpur
4-5 years of overall experience in software development.
- Experience on Hadoop (Apache/Cloudera/Hortonworks) and/or other Map Reduce Platforms
- Experience on Hive, Pig, Sqoop, Flume and/or Mahout
- Experience on NoSQL – HBase, Cassandra, MongoDB
- Hands-on experience with Spark development; knowledge of Storm, Kafka, Scala
- Good knowledge of Java
- Good background in configuration management/ticketing systems like Maven/Ant/JIRA, etc.
- Knowledge of any data integration and/or EDW tools is a plus
- Good to have knowledge of using Python/Perl/Shell
Please note - HBase, Hive, and Spark are a must.
Responsibilities:
* 3+ years of Data Engineering Experience - Design, develop, deliver and maintain data infrastructures.
* SQL specialist – strong knowledge of and seasoned experience with SQL queries
* Languages: Python
* Good communicator, shows initiative, works well with stakeholders.
* Experience working closely with data analysts, providing the data they need and guiding them on issues.
* Solid ETL experience with Hadoop/Hive/PySpark/Presto/SparkSQL
* Solid communication and articulation skills
* Able to handle stakeholders independently with minimal intervention from the reporting manager.
* Develop strategies to solve problems in logical yet creative ways.
* Create custom reports and presentations accompanied by strong data visualization and storytelling
We would be excited if you have:
* Excellent communication and interpersonal skills
* Ability to meet deadlines and manage project delivery
* Excellent report-writing and presentation skills
* Critical thinking and problem-solving capabilities
CORE RESPONSIBILITIES
- Create and manage cloud resources in AWS
- Data ingestion from different data sources that expose data using different technologies, such as RDBMS, REST HTTP APIs, flat files, streams, and time-series data from various proprietary systems; implement data ingestion and processing with the help of Big Data technologies
- Data processing/transformation using various technologies such as Spark and cloud services. You will need to understand your part of the business logic and implement it using the languages supported by the base data platform
- Develop automated data quality checks to make sure the right data enters the platform, and verify the results of the calculations (a brief sketch follows this list)
- Develop an infrastructure to collect, transform, combine and publish/distribute customer data.
- Define process improvement opportunities to optimize data collection, insights and displays.
- Ensure data and results are accessible, scalable, efficient, accurate, complete and flexible
- Identify and interpret trends and patterns from complex data sets
- Construct a framework utilizing data visualization tools and techniques to present consolidated analytical and actionable results to relevant stakeholders.
- Key participant in regular Scrum ceremonies with the agile teams
- Proficient at developing queries, writing reports and presenting findings
- Mentor junior members and bring best industry practices
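As a hedged sketch of the automated data-quality check mentioned in the responsibilities above, the snippet below rejects a batch when too many rows violate basic rules; the paths, columns, and the 1% threshold are illustrative assumptions.

```python
# A minimal PySpark data-quality gate; paths, columns, and threshold are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-gate").getOrCreate()
df = spark.read.parquet("s3a://example-platform/ingested/loans/")

total = df.count()
bad = df.filter(F.col("loan_id").isNull() | (F.col("principal") < 0)).count()

# Reject the batch if more than 1% of rows violate the rules.
if total == 0 or bad / total > 0.01:
    raise RuntimeError(f"data quality gate failed: {bad}/{total} bad rows")

df.write.mode("append").parquet("s3a://example-platform/curated/loans/")
```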
QUALIFICATIONS
- 5-7+ years’ experience as data engineer in consumer finance or equivalent industry (consumer loans, collections, servicing, optional product, and insurance sales)
- Strong background in math, statistics, computer science, data science or related discipline
- Advanced knowledge of one of these languages: Java, Scala, Python, C#
- Production experience with: HDFS, YARN, Hive, Spark, Kafka, Oozie / Airflow, Amazon Web Services (AWS), Docker / Kubernetes, Snowflake
- Proficient with
- Data mining/programming tools (e.g. SAS, SQL, R, Python)
- Database technologies (e.g. PostgreSQL, Redshift, Snowflake, and Greenplum)
- Data visualization (e.g. Tableau, Looker, MicroStrategy)
- Comfortable learning about and deploying new technologies and tools.
- Organizational skills and the ability to handle multiple projects and priorities simultaneously and meet established deadlines.
- Good written and oral communication skills and ability to present results to non-technical audiences
- Knowledge of business intelligence and analytical tools, technologies and techniques.
Mandatory Requirements
- Experience in AWS Glue
- Experience in Apache Parquet
- Proficient in AWS S3 and data lake
- Knowledge of Snowflake
- Understanding of file-based ingestion best practices (a brief sketch follows this list).
- Scripting languages - Python & PySpark
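As a rough illustration of file-based ingestion into an S3 data lake, the sketch below converts a raw CSV to Parquet and lands it under a Hive-style date partition; the bucket, dataset, and partition names are hypothetical assumptions.

```python
# A minimal file-based ingestion sketch: raw CSV -> Parquet -> partitioned S3 key.
# Bucket, dataset, and partition values are hypothetical; to_parquet requires pyarrow.
import boto3
import pandas as pd

s3 = boto3.client("s3")

def ingest_file(local_csv: str, bucket: str, dataset: str, load_date: str) -> None:
    df = pd.read_csv(local_csv)
    parquet_path = f"/tmp/{dataset}.parquet"
    df.to_parquet(parquet_path, index=False)  # columnar format for Glue/Athena/Snowflake
    key = f"{dataset}/load_date={load_date}/{dataset}.parquet"  # Hive-style partition
    s3.upload_file(parquet_path, bucket, key)

ingest_file("raw/accounts.csv", "example-data-lake", "accounts", "2024-01-01")
```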
Multinational company providing automation digital solutions
- At least 4 to 7 years of relevant experience as a Big Data Engineer
- Hands-on experience in Scala or Python
- Hands-on experience with major components in the Hadoop ecosystem like HDFS, MapReduce, Hive, Impala.
- Strong programming experience in building applications/platforms using Scala or Python.
- Experienced in implementing Spark RDD transformations and actions to implement business analysis
We specialize in productizing solutions built on new technology.
Our vision is to build engineers with entrepreneurial and leadership mindsets who can create highly impactful products and solutions using technology to deliver immense value to our clients.
We strive to bring innovation and passion to everything we do, whether it is services, products, or solutions.
We are hiring for Java / Spark Developer.
Good IT experience in Java technology and its frameworks
Hands-on experience in Java full-stack development, including a minimum of 2 years in Spark
Experience in writing SQL for any one relational database, e.g., Oracle
Extensive knowledge of HiveQL and Hadoop
Experience with version control through Bitbucket, issue tracking through JIRA, and CI/CD
Skills - Informatica with Big Data Management
1. Minimum 6 to 8 years of experience in Informatica BDM development
2. Experience working on Spark/SQL
3. Develops Informatica mappings/SQL
Job description
Role : Lead Architecture (Spark, Scala, Big Data/Hadoop, Java)
Primary Location : India-Pune, Hyderabad
Experience : 7 - 12 Years
Management Level: 7
Joining Time: Immediate Joiners are preferred
- Attend requirements gathering workshops, estimation discussions, design meetings and status review meetings
- Experience in solution design and solution architecture for the data engineering model, to build and implement Big Data projects on-premises and on the cloud.
- Align architecture with business requirements and stabilize the developed solution
- Ability to build prototypes to demonstrate the technical feasibility of your vision
- Professional experience facilitating and leading solution design, architecture and delivery planning activities for data intensive and high throughput platforms and applications
- Able to benchmark systems, analyse system bottlenecks, and propose solutions to eliminate them
- Able to help programmers and project managers in the design, planning and governance of implementing projects of any kind.
- Develop, construct, test and maintain architectures and run Sprints for development and rollout of functionalities
- Data analysis and code development experience, ideally in Big Data technologies such as Spark, Hive, Hadoop, Java, Python, and PySpark
- Execute projects of various types, i.e., design, development, implementation, and migration of functional analytics models/business logic across architecture approaches
- Work closely with Business Analysts to understand the core business problems and deliver efficient IT solutions of the product
- Deploy sophisticated analytics programs/code using any cloud application.
Perks and Benefits we Provide!
- Working with Highly Technical and Passionate, mission-driven people
- Subsidized Meals & Snacks
- Flexible Schedule
- Approachable leadership
- Access to various learning tools and programs
- Pet Friendly
- Certification Reimbursement Policy
- Check out more about us on our website below!
www.datametica.com
Summary
Our Kafka developer has a combination of technical skills, communication skills, and business knowledge. The developer should be able to work on multiple medium-to-large projects. The successful candidate will have excellent technical skills in Apache/Confluent Kafka and an enterprise data warehouse, preferably GCP BigQuery or an equivalent cloud EDW, and will be able to take oral and written business requirements and develop efficient code to meet set deliverables.
Must Have Skills
- Participate in the development, enhancement and maintenance of data applications both as an individual contributor and as a lead.
- Leading in the identification, isolation, resolution and communication of problems within the production environment.
- Leading developer applying technical skills in Apache/Confluent Kafka (preferred) or AWS Kinesis (optional), and a cloud enterprise data warehouse such as Google BigQuery (preferred), AWS Redshift, or Snowflake (optional)
- Design and recommend the best approach for data movement from different sources to the cloud EDW using Apache/Confluent Kafka
- Performs independent functional and technical analysis for major projects supporting several corporate initiatives.
- Communicate and work with IT partners and the user community at various levels, from senior management to developers to business SMEs, for project definition.
- Works on multiple platforms and multiple projects concurrently.
- Performs coding and unit testing for complex-scope modules and projects
- Provide expertise and hands on experience working on Kafka connect using schema registry in a very high volume environment (~900 Million messages)
- Provide expertise in Kafka brokers, ZooKeeper, KSQL, KStreams, and Kafka Control Center.
- Provide expertise and hands on experience working on AvroConverters, JsonConverters, and StringConverters.
- Provide expertise and hands on experience working on Kafka connectors such as MQ connectors, Elastic Search connectors, JDBC connectors, File stream connector, JMS source connectors, Tasks, Workers, converters, Transforms.
- Provide expertise and hands on experience on custom connectors using the Kafka core concepts and API.
- Working knowledge on Kafka Rest proxy.
- Ensure optimum performance, high availability and stability of solutions.
- Create topics, setup redundancy cluster, deploy monitoring tools, alerts and has good knowledge of best practices.
- Create stubs for producers, consumers, and consumer groups to help onboard applications from different languages/platforms (a brief sketch follows this list). Leverage Hadoop ecosystem knowledge to design and develop capabilities to deliver our solutions using Spark, Scala, Python, Hive, Kafka, and other components of the Hadoop ecosystem.
- Use automation tools for provisioning, such as Jenkins, uDeploy, or relevant technologies
- Ability to perform data related benchmarking, performance analysis and tuning.
- Strong skills in In-memory applications, Database Design, Data Integration.
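As a hedged illustration of the producer/consumer stubs mentioned above, here is a minimal sketch using the confluent-kafka Python client; the broker address, topic, group id, and payload are assumptions, and schema-registry/Avro integration is omitted for brevity.

```python
# Minimal producer and consumer stubs with confluent-kafka; all names are hypothetical.
from confluent_kafka import Producer, Consumer

BROKERS = "localhost:9092"
TOPIC = "example-events"

producer = Producer({"bootstrap.servers": BROKERS})
producer.produce(TOPIC, key="order-1", value=b'{"amount": 42}')
producer.flush()  # block until delivery completes

consumer = Consumer({
    "bootstrap.servers": BROKERS,
    "group.id": "example-consumer-group",
    "auto.offset.reset": "earliest",
})
consumer.subscribe([TOPIC])
msg = consumer.poll(timeout=5.0)
if msg is not None and msg.error() is None:
    print(msg.key(), msg.value())
consumer.close()
```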
Ideal candidates should have technical experience in migrations and the ability to help customers get value from Datametica's tools and accelerators.
Job Description
Experience : 7+ years
Location : Pune / Hyderabad
Skills :
- Drive and participate in requirements gathering workshops, estimation discussions, design meetings and status review meetings
- Participate and contribute in Solution Design and Solution Architecture for implementing Big Data Projects on-premise and on cloud
- Technical hands-on experience in design, coding, development, and managing large Hadoop implementations
- Proficient in SQL, Hive, Pig, Spark SQL, shell scripting, Kafka, Flume, and Sqoop on large Big Data and data warehousing projects, with a Java, Python, or Scala based Hadoop programming background
- Proficient with various development methodologies like waterfall, agile/scrum and iterative
- Good Interpersonal skills and excellent communication skills for US and UK based clients
About Us!
A global Leader in the Data Warehouse Migration and Modernization to the Cloud, we empower businesses by migrating their Data/Workload/ETL/Analytics to the Cloud by leveraging Automation.
We have expertise in transforming legacy Teradata, Oracle, Hadoop, Netezza, Vertica, Greenplum along with ETLs like Informatica, Datastage, AbInitio & others, to cloud-based data warehousing with other capabilities in data engineering, advanced analytics solutions, data management, data lake and cloud optimization.
Datametica is a key partner of the major cloud service providers - Google, Microsoft, Amazon, Snowflake.
We have our own products!
Eagle – Data warehouse Assessment & Migration Planning Product
Raven – Automated Workload Conversion Product
Pelican - Automated Data Validation Product, which helps automate and accelerate data migration to the cloud.
Why join us!
Datametica is a place to innovate, bring new ideas to life, and learn new things. We believe in building a culture of innovation, growth, and belonging. Our people and their dedication over these years are the key factors in achieving our success.
Benefits we Provide!
Working with Highly Technical and Passionate, mission-driven people
Subsidized Meals & Snacks
Flexible Schedule
Approachable leadership
Access to various learning tools and programs
Pet Friendly
Certification Reimbursement Policy
Check out more about us on our website below!
www.datametica.com
at Persistent Systems
Location: Pune/Nagpur/Goa/Hyderabad
Job Requirements:
- 9+ years of total experience, preferably in the big data space.
- Creating Spark applications using Scala to process data.
- Experience in scheduling and troubleshooting/debugging Spark jobs in steps.
- Experience in spark job performance tuning and optimizations.
- Should have experience in processing data using Kafka/Python.
- The individual should have experience and understanding in configuring Kafka topics to optimize performance (a brief sketch follows this list).
- Should be proficient in writing SQL queries to process data in Data Warehouse.
- Hands on experience in working with Linux commands to troubleshoot/debug issues and creating shell scripts to automate tasks.
- Experience on AWS services like EMR.
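As a hedged sketch of programmatic Kafka topic configuration, the snippet below uses the confluent-kafka AdminClient; the topic name, partition count, replication factor, and retention settings are assumptions that would be tuned to the real throughput requirements.

```python
# A minimal topic-configuration sketch with confluent-kafka's AdminClient; values are hypothetical.
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

topic = NewTopic(
    "clickstream-events",             # hypothetical topic name
    num_partitions=12,                # more partitions allow more consumer parallelism
    replication_factor=3,
    config={"retention.ms": "86400000", "compression.type": "lz4"},
)

for name, future in admin.create_topics([topic]).items():
    future.result()                   # raises if creation failed
    print(f"created topic {name}")
```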
• 5+ years’ experience developing and maintaining modern ingestion pipelines using technologies like Spark, Apache NiFi, etc.
• 2+ years’ experience with Healthcare Payors (focusing on Membership, Enrollment, Eligibility, Claims, Clinical)
• Hands-on experience on AWS Cloud and its native components like S3, Athena, Redshift & Jupyter Notebooks
• Strong in Spark Scala & Python pipelines (ETL & Streaming; a brief streaming sketch follows this list)
• Strong experience in metadata management tools like AWS Glue
• Strong experience in coding with languages like Java, Python
• Worked on designing ETL & streaming pipelines in Spark Scala / Python
• Good experience in Requirements gathering, Design & Development
• Working with cross-functional teams to meet strategic goals.
• Experience in high volume data environments
• Critical thinking and excellent verbal and written communication skills
• Strong problem-solving and analytical abilities; should be able to work and deliver individually
• Good to have: AWS Developer certification, Scala coding experience, Postman API, and Apache Airflow or similar scheduler experience
• Nice-to-have experience in healthcare messaging standards like HL7, CCDA, EDI, 834, 835, 837
• Good communication skills
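For the ETL & streaming requirement above, here is a minimal sketch of a Spark Structured Streaming pipeline reading from Kafka in Python; the broker, topic, schema, and output paths are illustrative assumptions, and the Kafka source requires the spark-sql-kafka package.

```python
# A minimal Structured Streaming sketch: Kafka -> parsed JSON -> Parquet; names are hypothetical.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StringType, DoubleType

spark = SparkSession.builder.appName("claims-stream").getOrCreate()

schema = StructType().add("claim_id", StringType()).add("amount", DoubleType())

claims = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "claims")
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("claim"))
    .select("claim.*")
)

query = (
    claims.writeStream.format("parquet")
    .option("path", "s3a://example-lake/claims/")
    .option("checkpointLocation", "s3a://example-lake/_checkpoints/claims/")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```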
We are looking for a Senior Python Developer to produce large scale distributed software solutions. You’ll be part of a cross-functional team that’s responsible for the complete software development life cycle, from conception to deployment.
If you’re also familiar with Agile methodologies, we’d like to meet you.
Responsibilities:
- Work with development teams and product managers to ideate software solutions
- Design client-side and server-side architecture
- Build the front-end of applications through appealing visual design
- Develop and manage well-functioning databases and applications
- Write effective APIs
- Test software to ensure responsiveness and efficiency
- Troubleshoot, debug and upgrade software
- Create security and data protection settings
- Write technical documentation
Requirements
- Proven experience as a Python Developer or in a similar role
- Knowledge of Python, Django, MongoDB, Elasticsearch, and AWS (a minimal API sketch follows this list)
- Excellent communication and teamwork skills
- Great attention to detail
- Organizational skills
- An analytical mind
- Experience with Apache Kafka, HBase, and graph DBs is an added bonus
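As a minimal illustration of the Python/Django API work implied above, the sketch below exposes a single JSON endpoint; the URL, view name, and payload are hypothetical, and a real service would add models, serializers, and tests.

```python
# A minimal Django JSON endpoint sketch; URL, view, and payload are hypothetical.
from django.http import JsonResponse
from django.urls import path

def health(request):
    """Return a small JSON payload; real endpoints would query the database."""
    return JsonResponse({"status": "ok", "service": "example-api"})

urlpatterns = [
    path("api/health/", health),
]
```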
Be Part Of Building The Future
Dremio is the Data Lake Engine company. Our mission is to reshape the world of analytics to deliver on the promise of data with a fundamentally new architecture, purpose-built for the exploding trend towards cloud data lake storage such as AWS S3 and Microsoft ADLS. We dramatically reduce and even eliminate the need for the complex and expensive workarounds that have been in use for decades, such as data warehouses (whether on-premise or cloud-native), structural data prep, ETL, cubes, and extracts. We do this by enabling lightning-fast queries directly against data lake storage, combined with full self-service for data users and full governance and control for IT. The results for enterprises are extremely compelling: 100X faster time to insight; 10X greater efficiency; zero data copies; and game-changing simplicity. And equally compelling is the market opportunity for Dremio, as we are well on our way to disrupting a $25BN+ market.
About the Role
The Dremio India team owns the DataLake Engine along with the Cloud Infrastructure and services that power it. With a focus on next-generation data analytics supporting modern table formats like Iceberg and Delta Lake, open source initiatives such as Apache Arrow and Project Nessie, and hybrid-cloud infrastructure, this team provides various opportunities to learn, deliver, and grow in your career. We are looking for innovative minds with experience in leading and building high-quality distributed systems at massive scale and solving complex problems.
Responsibilities & ownership
- Lead, build, deliver and ensure customer success of next-generation features related to scalability, reliability, robustness, usability, security, and performance of the product.
- Work on distributed systems for data processing with efficient protocols and communication, locking and consensus, schedulers, resource management, low latency access to distributed storage, auto scaling, and self healing.
- Understand and reason about concurrency and parallelization to deliver scalability and performance in a multithreaded and distributed environment.
- Lead the team to solve complex and unknown problems
- Solve technical problems and customer issues with technical expertise
- Design and deliver architectures that run optimally on public clouds like GCP, AWS, and Azure
- Mentor other team members for high quality and design
- Collaborate with Product Management to deliver on customer requirements and innovation
- Collaborate with Support and field teams to ensure that customers are successful with Dremio
Requirements
- B.S./M.S/Equivalent in Computer Science or a related technical field or equivalent experience
- Fluency in Java/C++ with 8+ years of experience developing production-level software
- Strong foundation in data structures, algorithms, multi-threaded and asynchronous programming models, and their use in developing distributed and scalable systems
- 5+ years experience in developing complex and scalable distributed systems and delivering, deploying, and managing microservices successfully
- Hands-on experience in query processing or optimization, distributed systems, concurrency control, data replication, code generation, networking, and storage systems
- Passion for quality, zero downtime upgrades, availability, resiliency, and uptime of the platform
- Passion for learning and delivering using latest technologies
- Ability to solve ambiguous, unexplored, and cross-team problems effectively
- Hands-on experience working on projects on AWS, Azure, and Google Cloud Platform
- Experience with containers and Kubernetes for orchestration and container management in private and public clouds (AWS, Azure, and Google Cloud)
- Understanding of distributed file systems such as S3, ADLS, or HDFS
- Excellent communication skills and affinity for collaboration and teamwork
- Ability to work individually and collaboratively with other team members
- Ability to scope and plan solutions for big problems and mentor others on the same
- Interested and motivated to be part of a fast-moving startup with a fun and accomplished team
Minimum 2 years of work experience on Snowflake and Azure storage.
Minimum 3 years of development experience in ETL Tool Experience.
Strong SQL database skills in other databases like Oracle, SQL Server, DB2 and Teradata
Good to have Hadoop and Spark experience.
Good conceptual knowledge on Data-Warehouse and various methodologies.
Working knowledge of any scripting, such as UNIX / shell
Good presentation and communication skills.
Should be flexible with overlapping working hours.
Should be able to work independently and be proactive.
Good understanding of Agile development cycle.
- 5+ years of experience in a Data Engineer role
- Graduate degree in Computer Science, Statistics, Informatics, Information Systems or another quantitative field.
- Experience with big data tools: Hadoop, Spark, Kafka, etc.
- Experience with relational SQL and NoSQL databases such as Cassandra.
- Experience with AWS cloud services: EC2, EMR, Athena
- Experience with object-oriented/functional scripting languages: Python, Java, C++, Scala, etc.
- Advanced SQL knowledge and experience working with relational databases, query authoring (SQL) as well as familiarity with unstructured datasets.
- Deep problem-solving skills to perform root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.