We are looking for an outstanding Big Data Engineer with experience setting up and maintaining Data Warehouse and Data Lakes for an Organization. This role would closely collaborate with the Data Science team and assist the team build and deploy machine learning and deep learning models on big data analytics platforms.
Roles and Responsibilities:
- Develop and maintain scalable data pipelines and build out new integrations and processes required for optimal extraction, transformation, and loading of data from a wide variety of data sources using 'Big Data' technologies.
- Develop programs in Scala and Python as part of data cleaning and processing.
- Assemble large, complex data sets that meet functional / non-functional business requirements and fostering data-driven decision making across the organization.
- Responsible to design and develop distributed, high volume, high velocity multi-threaded event processing systems.
- Implement processes and systems to validate data, monitor data quality, ensuring production data is always accurate and available for key stakeholders and business processes that depend on it.
- Perform root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
- Provide high operational excellence guaranteeing high availability and platform stability.
- Closely collaborate with the Data Science team and assist the team build and deploy machine learning and deep learning models on big data analytics platforms.
- Experience with Big Data pipeline, Big Data analytics, Data warehousing.
- Experience with SQL/No-SQL, schema design and dimensional data modeling.
- Strong understanding of Hadoop Architecture, HDFS ecosystem and eexperience with Big Data technology stack such as HBase, Hadoop, Hive, MapReduce.
- Experience in designing systems that process structured as well as unstructured data at large scale.
- Experience in AWS/Spark/Java/Scala/Python development.
- Should have Strong skills in PySpark (Python & SPARK). Ability to create, manage and manipulate Spark Dataframes. Expertise in Spark query tuning and performance optimization.
- Experience in developing efficient software code/frameworks for multiple use cases leveraging Python and big data technologies.
- Prior exposure to streaming data sources such as Kafka.
- Should have knowledge on Shell Scripting and Python scripting.
- High proficiency in database skills (e.g., Complex SQL), for data preparation, cleaning, and data wrangling/munging, with the ability to write advanced queries and create stored procedures.
- Experience with NoSQL databases such as Cassandra / MongoDB.
- Solid experience in all phases of Software Development Lifecycle - plan, design, develop, test, release, maintain and support, decommission.
- Experience with DevOps tools (GitHub, Travis CI, and JIRA) and methodologies (Lean, Agile, Scrum, Test Driven Development).
- Experience building and deploying applications on on-premise and cloud-based infrastructure.
- Having a good understanding of machine learning landscape and concepts.
Qualifications and Experience:
Engineering and post graduate candidates, preferably in Computer Science, from premier institutions with proven work experience as a Big Data Engineer or a similar role for 3-5 years.
Good to have at least one of the Certifications listed here:
AZ 900 - Azure Fundamentals
DP 200, DP 201, DP 203, AZ 204 - Data Engineering
AZ 400 - Devops Certification