Sr. Data Engineer
at A leader in Cognitive and Emerging Technologies Business
Job Title: Sr. Data Engineer
Experience: 5 to 8 years
Work Location: Hyderabad (option to work remotely)
Skillset: Python, PySpark, Kafka, Airflow, Sql, NoSql, API Integration,Data pipeline, Big Data, AWS/ GCP/ OCI/Azure
2. Tech Round -I
3. Tech Round - II
4. HR Round
Calling out Python ninjas to showcase their expertise in a stimulating environment, geared towards building cutting-edge products and services. If you have a knack for data processing, scripting and
are excited about delivering a scalable, high-quality data ingestion, API Integration solutions, then we are looking for you!
You will get a chance to work on exciting projects at our state-of-the-art office, grow along with the company and be fruitfully rewarded for your efforts!
● Understanding our data sets and how to bring them together.
● Working with our engineering team to support custom solutions offered to the product development..
● Filling the gap between development, engineering and data ops.
● Creating, maintaining and documenting scripts to support ongoing custom solutions.
● Excellent organizational skills, including attention to precise details
● Strong multitasking skills and ability to work in a fast-paced environment
● 5+ years experience with Python to develop scripts.
● Know your way around RESTFUL APIs.[Able to integrate not necessary to publish]
● You are familiar with pulling and pushing files from SFTP and AWS S3.
● Experience with any Cloud solutions including GCP / AWS / OCI / Azure.
● Familiarity with SQL programming to query and transform data from relational Databases.
● Familiarity to work with Linux (and Linux work environment).
● Excellent written and verbal communication skills
● Extracting, transforming, and loading data into internal databases and Hadoop
● Optimizing our new and existing data pipelines for speed and reliability
● Deploying product build and product improvements
● Documenting and managing multiple repositories of code
● Experience with SQL and NoSQL databases (Casendra, MySQL)
● Hands-on experience in data pipelining and ETL. (Any of these frameworks/tools: Hadoop, BigQuery,
● Hands-on experience in AirFlow
● Understanding of best practices, common coding patterns and good practices around
● storing, partitioning, warehousing and indexing of data
● Experience in reading the data from Kafka topic (both live stream and offline)
● Experience in PySpark and Data frames
● Collaborating across an agile team to continuously design, iterate, and develop big data systems.
● Extracting, transforming, and loading data into internal databases.
● Optimizing our new and existing data pipelines for speed and reliability.
● Deploying new products and product improvements.
● Documenting and managing multiple repositories of code.
- Mandatory - Hands on experience in Python and PySpark.
- Build pySpark applications using Spark Dataframes in Python using Jupyter notebook and PyCharm(IDE).
- Worked on optimizing spark jobs that processes huge volumes of data.
- Hands on experience in version control tools like Git.
- Worked on Amazon’s Analytics services like Amazon EMR, Lambda function etc
- Worked on Amazon’s Compute services like Amazon Lambda, Amazon EC2 and Amazon’s Storage service like S3 and few other services like SNS.
- Experience/knowledge of bash/shell scripting will be a plus.
- Experience in working with fixed width, delimited , multi record file formats etc.
- Hands on experience in tools like Jenkins to build, test and deploy the applications
- Awareness of Devops concepts and be able to work in an automated release pipeline environment.
- Excellent debugging skills.
The data science team is responsible for solving business problems with complex data. Data complexity could be characterized in terms of volume, dimensionality and multiple touchpoints/sources. We understand the data, ask fundamental-first-principle questions, apply our analytical and machine learning skills to solve the problem in the best way possible.
Our ideal candidate
The role would be a client facing one, hence good communication skills are a must.
The candidate should have the ability to communicate complex models and analysis in a clear and precise manner.
The candidate would be responsible for:
- Comprehending business problems properly - what to predict, how to build DV, what value addition he/she is bringing to the client, etc.
- Understanding and analyzing large, complex, multi-dimensional datasets and build features relevant for business
- Understanding the math behind algorithms and choosing one over another
- Understanding approaches like stacking, ensemble and applying them correctly to increase accuracy
Desired technical requirements
- Proficiency with Python and the ability to write production-ready codes.
- Experience in pyspark, machine learning and deep learning
- Big data experience, e.g. familiarity with Spark, Hadoop, is highly preferred
- Familiarity with SQL or other databases.
Data Engineer JD:
- Designing, developing, constructing, installing, testing and maintaining the complete data management & processing systems.
- Building highly scalable, robust, fault-tolerant, & secure user data platform adhering to data protection laws.
- Taking care of the complete ETL (Extract, Transform & Load) process.
- Ensuring architecture is planned in such a way that it meets all the business requirements.
- Exploring new ways of using existing data, to provide more insights out of it.
- Proposing ways to improve data quality, reliability & efficiency of the whole system.
- Creating data models to reduce system complexity and hence increase efficiency & reduce cost.
- Introducing new data management tools & technologies into the existing system to make it more efficient.
- Setting up monitoring and alarming on data pipeline jobs to detect failures and anomalies
What do we expect from you?
- BS/MS in Computer Science or equivalent experience
- 5 years of recent experience in Big Data Engineering.
- Good experience in working with Hadoop and Big Data technologies like HDFS, Pig, Hive, Zookeeper, Storm, Spark, Airflow and NoSQL systems
- Excellent programming and debugging skills in Java or Python.
- Apache spark, python, hands on experience in deploying ML models
- Has worked on streaming and realtime pipelines
- Experience with Apache Kafka or has worked with any of Spark Streaming, Flume or Storm
Data structure & Algorithms
Problem solving + Coding
● B.Tech/Masters in Mathematics, Statistics, Computer Science or another
● 2-3+ years of work experience in ML domain ( 2-5 years experience )
● Hands-on coding experience in Python
● Experience in machine learning techniques such as Regression, Classification,
Predictive modeling, Clustering, Deep Learning stack, NLP
● Working knowledge of Tensorflow/PyTorch
● Experience with distributed computing frameworks: Map/Reduce, Hadoop, Spark
● Experience with databases: MongoDB
- Design and develop strong analytics system and predictive models
- Managing a team of data scientists, machine learning engineers, and big data specialists
- Identify valuable data sources and automate data collection processes
- Undertake pre-processing of structured and unstructured data
- Analyze large amounts of information to discover trends and patterns
- Build predictive models and machine-learning algorithms
- Combine models through ensemble modeling
- Present information using data visualization techniques
- Propose solutions and strategies to business challenges
- Collaborate with engineering and product development teams
- Proven experience as a seasoned Data Scientist
- Good Experience in data mining processes
- Understanding of machine learning and Knowledge of operations research is a value addition
- Strong understanding and experience in R, SQL, and Python; Knowledge base with Scala, Java, or C++ is an asset
- Experience using business intelligence tools (e. g. Tableau) and data frameworks (e. g. Hadoop)
- Strong math skills (e. g. statistics, algebra)
- Problem-solving aptitude
- Excellent communication and presentation skills
- Experience in Natural Language Processing (NLP)
- Strong competitive coding skills
- BSc/BA in Computer Science, Engineering or relevant field; graduate degree in Data Science or other quantitative field is preferred
- Ensure and own Data integrity across distributed systems.
- Extract, Transform and Load data from multiple systems for reporting into BI platform.
- Create Data Sets and Data models to build intelligence upon.
- Develop and own various integration tools and data points.
- Hands-on development and/or design within the project in order to maintain timelines.
- Work closely with the Project manager to deliver on business requirements OTIF (on time in full)
- Understand the cross-functional business data points thoroughly and be SPOC for all data-related queries.
- Work with both Web Analytics and Backend Data analytics.
- Support the rest of the BI team in generating reports and analysis
- Quickly learn and use Bespoke & third party SaaS reporting tools with little documentation.
- Assist in presenting demos and preparing materials for Leadership.
- Strong experience in Datawarehouse modeling techniques and SQL queries
- A good understanding of designing, developing, deploying, and maintaining Power BI report solutions
- Ability to create KPIs, visualizations, reports, and dashboards based on business requirement
- Knowledge and experience in prototyping, designing, and requirement analysis
- Be able to implement row-level security on data and understand application security layer models in Power BI
- Proficiency in making DAX queries in Power BI desktop.
- Expertise in using advanced level calculations on data sets
- Experience in the Fintech domain and stakeholder management.
- Hands-on development/maintenance experience in Tableau: Developing, maintaining, and managing advanced reporting, analytics, dashboards and other BI solutions using Tableau
- Reviewing and improving existing Tableau dashboards and data models/ systems and collaborating with teams to integrate new systems
- Provide support and expertise to the business community to assist with better utilization of Tableau
- Understand business requirements, conduct analysis and recommend solution options for intelligent dashboards in Tableau
- Experience with Data Extraction, Transformation and Load (ETL) – knowledge of how to extract, transform and load data
- Execute SQL data queries across multiple data sources in support of business intelligence reporting needs. Format query results / reports in various ways
- Participates in QA testing, liaising with other project team members and being responsive to client's needs, all with an eye on details in a fast-paced environment
- Performing and documenting data analysis, data validation, and data mapping/design
Key Performance Indicators (Indicate how performance will be measured: indicators, activities…)
KPIs will be outlined in detail in the goal sheet
Ideal Background (State the minimum and desirable education and experience level)
Minimum: Graduation, preferably in Science
· Minimum: 2-3 years’ relevant work experience in the field of reporting and data analytics using Tableau.
· Tableau certifications would be preferred
· Work experience in the regulated medical device / Pharmaceutical industry would be an added advantage, but not mandatory
Minimum: English (written and spoken)
Specific Professional Competencies: Indicate any other soft/technical/professional knowledge and skills requirements