Job Description
Mandatory Requirements
- Experience with AWS Glue
- Experience with Apache Parquet
- Proficiency with AWS S3 and data lakes
- Knowledge of Snowflake
- Understanding of file-based ingestion best practices
- Scripting languages: Python and PySpark (a minimal ingestion sketch follows this list)
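By way of illustration only, a minimal PySpark sketch of file-based ingestion against these requirements: read Parquet from an S3 data lake, deduplicate, and publish partitioned Parquet to a curated zone. The bucket paths and column names (`order_id`, `event_ts`) are hypothetical, not taken from the posting.

```python
# Minimal file-based ingestion sketch with PySpark and S3 (paths and
# column names are illustrative assumptions, not from the job post).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("parquet-ingestion").getOrCreate()

# Read a raw Parquet dataset from the data lake.
df = spark.read.parquet("s3://example-data-lake/raw/orders/")

# Basic hygiene: drop duplicate business keys, derive a partition column.
clean = (
    df.dropDuplicates(["order_id"])
      .withColumn("event_date", F.to_date("event_ts"))
)

# Publish to the curated zone, partitioned for efficient scans.
(clean.write.mode("overwrite")
      .partitionBy("event_date")
      .parquet("s3://example-data-lake/curated/orders/"))
```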
CORE RESPONSIBILITIES
- Create and manage cloud resources in AWS
- Ingest data from sources that expose it through different technologies, such as RDBMS, flat files, streams, and time-series data from various proprietary systems; implement data ingestion and processing with big data technologies
- Process and transform data using technologies such as Spark and cloud services; you will need to understand your part of the business logic and implement it in the language the underlying data platform supports
- Develop automated data quality checks to ensure only correct data enters the platform and to verify the results of calculations (see the sketch after this list)
- Develop infrastructure to collect, transform, combine, and publish/distribute customer data
- Define process-improvement opportunities to optimize data collection, insights, and displays
- Ensure data and results are accessible, scalable, efficient, accurate, complete, and flexible
- Identify and interpret trends and patterns in complex data sets
- Build a framework that uses data visualization tools and techniques to present consolidated, actionable analytical results to relevant stakeholders
- Participate actively in regular Scrum ceremonies with the agile teams
- Develop queries, write reports, and present findings
- Mentor junior team members and bring in industry best practices
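Since automated quality checks are called out above, here is a hedged sketch of what such a gate might look like in PySpark; the column names (`order_id`, `amount`) and the rules themselves are illustrative assumptions, not part of the job description.

```python
# A sketch of an automated data quality gate: fail the batch before it
# enters the platform if basic expectations are violated. Columns and
# thresholds are illustrative assumptions.
from typing import List

from pyspark.sql import DataFrame, functions as F

def run_quality_checks(df: DataFrame) -> List[str]:
    """Return human-readable failures; an empty list means the batch passes."""
    failures = []
    if df.count() == 0:
        return ["batch is empty"]
    # The business key must never be null.
    null_keys = df.filter(F.col("order_id").isNull()).count()
    if null_keys:
        failures.append(f"{null_keys} rows with null order_id")
    # Amounts must be non-negative.
    negative = df.filter(F.col("amount") < 0).count()
    if negative:
        failures.append(f"{negative} rows with negative amount")
    return failures
```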
QUALIFICATIONS
- 5-7+ years of experience as a data engineer in consumer finance or an equivalent industry (consumer loans, collections, servicing, optional products, and insurance sales)
- Strong background in math, statistics, computer science, data science, or a related discipline
- Advanced knowledge of one of: Java, Scala, Python, C#
- Production experience with HDFS, YARN, Hive, Spark, Kafka, Oozie / Airflow, Amazon Web Services (AWS), Docker / Kubernetes, and Snowflake
- Proficiency with:
  - Data mining/programming tools (e.g. SAS, SQL, R, Python)
  - Database technologies (e.g. PostgreSQL, Redshift, Snowflake, and Greenplum)
  - Data visualization tools (e.g. Tableau, Looker, MicroStrategy)
- Comfort learning and deploying new technologies and tools
- Organizational skills and the ability to handle multiple projects and priorities simultaneously while meeting established deadlines
- Good written and oral communication skills and the ability to present results to non-technical audiences
- Knowledge of business intelligence and analytical tools, technologies, and techniques
Familiarity and experience with the following is a plus:
- AWS certification
- Spark Streaming (a brief Structured Streaming sketch follows this list)
- Kafka Streams / Kafka Connect
- ELK Stack
- Cassandra / MongoDB
- CI/CD: Jenkins, GitLab, Jira, Confluence, and other related tools
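Purely as a hedged illustration of the Spark Streaming and Kafka items above, a minimal Structured Streaming sketch that tails a Kafka topic into the lake. The broker address, topic, and paths are invented, and running it requires the spark-sql-kafka connector package.

```python
# Structured Streaming sketch: Kafka topic -> Parquet in the lake.
# Broker, topic, and paths are illustrative; requires spark-sql-kafka.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("stream-sketch").getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker.example.com:9092")
    .option("subscribe", "orders")
    .load()
)

# Kafka delivers key/value as binary; cast the payload to a string.
decoded = events.selectExpr("CAST(value AS STRING) AS payload")

# Continuously append raw payloads to the lake, with checkpointing for
# exactly-once file output.
query = (
    decoded.writeStream.format("parquet")
    .option("path", "s3://example-data-lake/streaming/orders/")
    .option("checkpointLocation", "s3://example-data-lake/checkpoints/orders/")
    .start()
)
query.awaitTermination()  # block until the stream is stopped
```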
Similar jobs
We are looking for a Snowflake developer for one of our premium clients, for their PAN-India locations.
You will be responsible for designing, building, and maintaining data pipelines that handle real-world data (RWD) at Compile. You will handle both inbound and outbound data deliveries for datasets including Claims, Remittances, EHR, SDOH, etc.
You will
- Work on building and maintaining data pipelines (specifically for RWD).
- Build, enhance, and maintain pipelines in PySpark and Python, and help build analytical insights and datasets.
- Schedule and maintain pipeline jobs for RWD.
- Develop, test, and implement data solutions based on the design.
- Design and implement quality checks on existing and new data pipelines.
- Ensure adherence to the security and compliance requirements of the products.
- Maintain relationships with various data vendors and track changes and issues across vendors and deliveries.
You have
- Hands-on experience with ETL processes (minimum of 5 years).
- Excellent communication skills and the ability to work with multiple vendors.
- High proficiency with Spark and SQL.
- Proficiency in data modeling, validation, quality checks, and data engineering concepts.
- Experience with big-data processing technologies: Databricks, dbt, S3, Delta Lake, Deequ, Griffin, Snowflake, BigQuery.
- Familiarity with version control technologies and CI/CD systems.
- Understanding of scheduling tools like Airflow or Prefect (a minimal Airflow sketch follows this list).
- Minimum of 3 years of experience managing data warehouses.
- Familiarity with healthcare datasets is a plus.
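A minimal sketch of the kind of scheduled pipeline job described above, assuming Airflow 2.x; the DAG id, schedule, and `run_rwd_batch` callable are hypothetical placeholders.

```python
# Minimal Airflow 2.x sketch for a scheduled RWD pipeline job.
# DAG id, schedule, and the callable are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def run_rwd_batch():
    # In a real pipeline this would kick off the PySpark ingestion and
    # transformation steps, then the quality checks.
    print("running RWD batch")

with DAG(
    dag_id="rwd_claims_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # Airflow 2.x; newer versions use `schedule`
    catchup=False,
) as dag:
    PythonOperator(task_id="run_batch", python_callable=run_rwd_batch)
```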
Compile embraces diversity and equal opportunity in a serious way. We are committed to building a team of people from many backgrounds, perspectives, and skills. We know the more inclusive we are, the better our work will be.
Job Description
Position: Sr. Data Engineer – Databricks & AWS
Experience: 4 - 5 Years
Company Profile:
Exponentia.ai is an AI tech organization with a presence across India, Singapore, the Middle East, and the UK. We are an innovative and disruptive organization, working on cutting-edge technology to help our clients transform into the enterprises of the future. We provide artificial intelligence-based products/platforms capable of automated cognitive decision-making to improve productivity, quality, and economics of the underlying business processes. Currently, we are transforming ourselves and rapidly expanding our business.
Exponentia.ai has developed long-term relationships with world-class clients such as PayPal, PayU, SBI Group, HDFC Life, Kotak Securities, Wockhardt and Adani Group amongst others.
One of the top partners of Cloudera (a leading analytics player) and Qlik (a leader in BI technologies), Exponentia.ai was awarded the ‘Innovation Partner Award’ by Qlik in 2017.
Get to know more about us on our website: http://www.exponentia.ai/ and Life @Exponentia.
Role Overview:
· A Data Engineer understands client requirements and develops and delivers data engineering solutions according to the scope.
· The role requires strong skills in developing solutions using the services required for data architectures on Databricks Delta Lake, streaming, AWS, ETL development, and data modeling.
Job Responsibilities
• Design data solutions on Databricks, including Delta Lake, data warehouses, data marts, and other data solutions supporting the organization's analytics needs.
• Apply design best practices for data modeling (logical, physical) and ETL pipelines (streaming and batch) using cloud-based services.
• Design, develop, and manage the pipelining (collection, storage, access), data engineering (data quality, ETL, data modeling), and understanding (documentation, exploration) of the data.
• Interact with stakeholders to understand the data landscape, conduct discovery exercises, and develop and demonstrate proofs of concept.
Technical Skills
• More than 2 years of experience developing data lakes and data marts on the Databricks platform (a small Delta Lake sketch follows this list).
• Proven skills in AWS data lake services such as AWS Glue, S3, Lambda, SNS, and IAM, plus skills in Spark, Python, and SQL.
• Experience with Pentaho.
• Good understanding of developing data warehouses, data marts, etc.
• Good understanding of system architectures and design patterns, with the ability to design and develop applications based on these principles.
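As a rough sketch of the Databricks / Delta Lake work listed above: aggregate raw events into a managed Delta table that data marts can query. It assumes a Databricks runtime where `spark` is preconfigured; the paths, columns, and table names are invented.

```python
# Delta Lake sketch for Databricks (assumes `spark` is provided by the
# runtime; paths, columns, and table names are illustrative).
from pyspark.sql import functions as F

raw = spark.read.json("/mnt/raw/events/")

# Roll raw events up to a daily grain for the data mart.
daily = (
    raw.withColumn("event_date", F.to_date("event_ts"))
       .groupBy("event_date", "event_type")
       .count()
)

# Save as a managed Delta table that downstream marts can query.
daily.write.format("delta").mode("overwrite").saveAsTable("analytics.daily_events")
```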
Personality Traits
• Good collaboration and communication skills
• Excellent problem-solving skills and the ability to structure the right analytical solutions.
• Strong sense of teamwork, ownership, and accountability
• Analytical and conceptual thinking
• Ability to work in a fast-paced environment with tight schedules.
• Good presentation skills with the ability to convey complex ideas to peers and management.
Education:
BE / ME / MS / MCA.
Job Overview
We are looking for a Data Engineer to join our data team to solve critical data-driven business problems. The hire will be responsible for expanding and optimizing the existing end-to-end architecture, including the data pipeline architecture. The Data Engineer will collaborate with software developers, database architects, data analysts, data scientists, and the platform team on data initiatives, and will ensure the data delivery architecture stays consistent and optimal across ongoing projects. The right candidate will have hands-on experience developing hybrid data pipelines that fit the business requirements.
Responsibilities
- Develop, construct, test, and maintain existing and new data-driven architectures.
- Align the architecture with business requirements and provide the solutions that best solve the business problems.
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and Azure ‘big data’ technologies.
- Acquire data from multiple sources across the organization.
- Use programming languages and tools efficiently to collate the data.
- Identify ways to improve data reliability, efficiency, and quality.
- Use data to discover tasks that can be automated.
- Deliver updates to stakeholders based on analytics.
- Set up practices for data reporting and continuous monitoring.
Required Technical Skills
- A graduate in Computer Science or a similar quantitative area.
- 1+ years of relevant work experience as a Data Engineer or in a similar role.
- Advanced SQL knowledge, data modelling, and experience with relational databases and query authoring (SQL), as well as working familiarity with a variety of databases.
- Experience developing and optimizing ETL pipelines, big data pipelines, and data-driven architectures (a short ETL sketch follows this list).
- Must have strong core big-data knowledge and experience programming with Spark in Python/Scala.
- Experience with an orchestration tool like Airflow or similar.
- Experience with Azure Data Factory is good to have.
- Build processes supporting data transformation, data structures, metadata, dependency, and workload management.
- Experience supporting and working with cross-functional teams in a dynamic environment.
- Good understanding of Git workflows and test-case-driven development; experience with CI/CD is good to have.
- Some understanding of Delta tables is good to have.

Experience with the following software/tools would also be an advantage:
- Big data tools: Hadoop, Spark, Hive, etc.
- Relational SQL and NoSQL databases.
- Cloud data services.
- Object-oriented/object-function scripting languages: Python, Scala, etc.
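To ground the ETL-pipeline skills above, a hedged sketch of the classic pattern: extract from a relational source over JDBC, transform in Spark, and load Parquet into Azure Data Lake Storage. All connection details, table names, and paths are placeholders.

```python
# ETL sketch: JDBC extract -> Spark transform -> load to ADLS as Parquet.
# All connection details and names are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Extract: read a table from a relational source over JDBC.
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db.example.com:5432/sales")
    .option("dbtable", "public.orders")
    .option("user", "etl_user")
    .option("password", "***")  # in practice, read from a secret store
    .load()
)

# Transform: keep only completed orders.
completed = orders.filter(orders.status == "completed")

# Load: append Parquet to the curated zone in Azure Data Lake Storage.
completed.write.mode("append").parquet(
    "abfss://curated@examplelake.dfs.core.windows.net/orders/"
)
```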
- Create and maintain optimal data pipeline architecture
- Assemble large, complex data sets that meet business requirements
- Identify, design, and implement internal process improvements, including redesigning infrastructure for greater scalability, optimizing data delivery, and automating manual processes
- Work with Data, Analytics & Tech team to extract, arrange and analyze data
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS technologies
- Build analytical tools that utilize the data pipeline and provide actionable insight into key business performance metrics, including operational efficiency and customer acquisition
- Work closely with all business units and engineering teams to develop a strategy for the long-term data platform architecture
- Work with stakeholders, including the Executive, Product, Data, and Design teams, to support their data infrastructure needs and assist with data-related technical issues
- SQL
- Ruby or Python (Ruby preferred)
- Apache Hadoop-based analytics
- Data warehousing
- Data architecture
- Schema design
- ML
- Prior experience of 2 to 5 years as a Data Engineer.
- Ability to manage and communicate data warehouse plans to internal teams.
- Experience designing, building, and maintaining data processing systems.
- Ability to perform root cause analysis on external and internal processes and data to identify opportunities for improvement and answer questions.
- Excellent analytic skills associated with working on unstructured datasets.
- Ability to build processes that support data transformation, workload management, data structures, dependency, and metadata.
- Building and operationalizing large-scale enterprise data solutions and applications using one or more Azure data and analytics services in combination with custom solutions: Azure Synapse / Azure SQL DWH, Azure Data Lake, Azure Blob Storage, Spark, HDInsight, Databricks, Cosmos DB, Event Hubs / IoT Hub.
- Experience migrating on-premises data warehouses to data platforms on the Azure cloud.
- Designing and implementing data engineering, ingestion, and transformation functions
- Azure Synapse or Azure SQL Data Warehouse
- Spark on Azure (available in HDInsight and Databricks)
● Frame ML / AI use cases that can improve the company’s product
● Implement and develop ML / AI / data-driven rule-based algorithms as software components
● For example, building a chatbot that answers from the relevant FAQ entry, then reinforcing the system with a feedback loop so that the bot improves (a toy sketch of the FAQ-matching step follows)
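A toy sketch of the FAQ-matching step in that example, using TF-IDF similarity from scikit-learn; the FAQ content and function names are made up for illustration.

```python
# Toy FAQ matcher: answer with the FAQ entry most similar to the query.
# The FAQ content is invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

faqs = [
    ("How do I reset my password?", "Use the 'Forgot password' link on the login page."),
    ("How can I contact support?", "Email support@example.com or use the in-app chat."),
]

questions = [question for question, _ in faqs]
vectorizer = TfidfVectorizer().fit(questions)
faq_matrix = vectorizer.transform(questions)

def answer(user_query: str) -> str:
    """Return the answer attached to the most similar FAQ question."""
    similarities = cosine_similarity(vectorizer.transform([user_query]), faq_matrix)
    return faqs[similarities.argmax()][1]

print(answer("I forgot my password"))  # -> the password-reset answer
```

The feedback loop mentioned in the posting would sit on top of this: log which answers users accept, and use that signal to re-rank or retrain the matcher.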
Must have skills:
● Data extraction and ETL
● Python (numpy, pandas, comfortable with OOP)
● Django
● Knowledge of basic Machine Learning / Deep Learning / AI algorithms and ability to
implement them
● Good understanding of SDLC
● Experience deploying an ML / AI model in a mobile / web product
● Soft skills: strong communication skills and critical-thinking ability
Good to have:
● Full stack development experience
Required Qualification:
B.Tech. / B.E. degree in Computer Science or an equivalent software engineering discipline
- Create and maintain optimal data pipeline architecture
- Assemble large, complex data sets that meet functional / non-functional business requirements.
- Author data services using a variety of programming languages
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources, using the Snowflake cloud data warehouse as well as SQL and Azure ‘big data’ technologies
- Work with stakeholders including the Executive, Product, Data and Design teams to assist with data-related technical issues and support their data infrastructure needs.
- Keep our data separated and secure across national boundaries through multiple data centers and Azure regions.
- Create data tools for analytics and data scientist team members that assist them in building and optimizing our product into an innovative industry leader.
- Work with data and analytics experts to strive for greater functionality in our data systems.
- Work in an Agile environment with Scrum teams.
- Ensure data quality and help in achieving data governance.
Basic Qualifications
- 3+ years of experience in a Data Engineer or Software Engineer role
- Undergraduate degree required (Graduate degree preferred) in Computer Science, Statistics, Informatics, Information Systems or another quantitative field.
- Experience using the following software/tools:
- Experience with the Snowflake cloud data warehouse
- Experience with Azure cloud services: ADLS, ADF, ADLA, AAS
- Experience with data pipeline and workflow management tools
- Advanced working SQL knowledge and experience working with relational databases, query authoring (SQL) as well as working familiarity with a variety of databases
- Understanding of data warehouse (DWH) systems and of migration from DWH to data lakes/Snowflake
- Understanding of ELT and ETL patterns and when to use each (a short ELT sketch follows this list); understanding of data models and of transforming data into them
- Strong analytic skills related to working with unstructured datasets
- Build processes supporting data transformation, data structures, metadata, dependency and workload management
- Experience supporting and working with cross-functional teams in a dynamic environment.
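As a hedged sketch of the ELT pattern with Snowflake mentioned above: land raw files first, then transform inside the warehouse with SQL. It uses the snowflake-connector-python package; the credentials, stage, and table names are placeholders.

```python
# ELT sketch with Snowflake: load staged Parquet as-is, then transform in
# the warehouse. Credentials, stage, and table names are illustrative.
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",
    user="etl_user",
    password="***",  # in practice, read from a secret store
    warehouse="LOAD_WH",
    database="ANALYTICS",
)

with conn.cursor() as cur:
    # E + L: copy staged files into a raw table unchanged.
    cur.execute("""
        COPY INTO raw.orders
        FROM @raw_stage/orders/
        FILE_FORMAT = (TYPE = PARQUET)
        MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
    """)
    # T: transform inside Snowflake, where compute scales independently.
    cur.execute("""
        CREATE OR REPLACE TABLE curated.daily_orders AS
        SELECT order_date, COUNT(*) AS order_count
        FROM raw.orders
        GROUP BY order_date
    """)
conn.close()
```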