Job DescriptionPosition: Sr Data Engineer – Databricks & AWS
Experience: 4 - 5 Years
Company Profile:
Exponentia.ai is an AI tech organization with a presence across India, Singapore, the Middle East, and the UK. We are an innovative and disruptive organization, working on cutting-edge technology to help our clients transform into the enterprises of the future. We provide artificial intelligence-based products/platforms capable of automated cognitive decision-making to improve productivity, quality, and economics of the underlying business processes. Currently, we are transforming ourselves and rapidly expanding our business.
Exponentia.ai has developed long-term relationships with world-class clients such as PayPal, PayU, SBI Group, HDFC Life, Kotak Securities, Wockhardt and Adani Group amongst others.
One of the top partners of Cloudera (leading analytics player) and Qlik (leader in BI technologies), Exponentia.ai has recently been awarded the ‘Innovation Partner Award’ by Qlik in 2017.
Get to know more about us on our website: http://www.exponentia.ai/ and Life @Exponentia.
Role Overview:
· A Data Engineer understands the client requirements and develops and delivers the data engineering solutions as per the scope.
· The role requires good skills in the development of solutions using various services required for data architecture on Databricks Delta Lake, streaming, AWS, ETL Development, and data modeling.
Job Responsibilities
• Design of data solutions on Databricks including delta lake, data warehouse, data marts and other data solutions to support the analytics needs of the organization.
• Apply best practices during design in data modeling (logical, physical) and ETL pipelines (streaming and batch) using cloud-based services.
• Design, develop and manage the pipelining (collection, storage, access), data engineering (data quality, ETL, Data Modelling) and understanding (documentation, exploration) of the data.
• Interact with stakeholders regarding data landscape understanding, conducting discovery exercises, developing proof of concepts and demonstrating it to stakeholders.
Technical Skills
• Has more than 2 Years of experience in developing data lakes, and datamarts on the Databricks platform.
• Proven skill sets in AWS Data Lake services such as - AWS Glue, S3, Lambda, SNS, IAM, and skills in Spark, Python, and SQL.
• Experience in Pentaho
• Good understanding of developing a data warehouse, data marts etc.
• Has a good understanding of system architectures, and design patterns and should be able to design and develop applications using these principles.
Personality Traits
• Good collaboration and communication skills
• Excellent problem-solving skills to be able to structure the right analytical solutions.
• Strong sense of teamwork, ownership, and accountability
• Analytical and conceptual thinking
• Ability to work in a fast-paced environment with tight schedules.
• Good presentation skills with the ability to convey complex ideas to peers and management.
Education:
BE / ME / MS/MCA.
About Exponentia.ai
Data is the new Oil and AI is the most powerful value accelerator today - this is one of the most important belief we live by. We are an award-winning AI-Tech MNC transforming businesses through AI, ML and Data Analytics Solutions.
We help organizations solve complex business challenges by combining industry experience, data and AI first engineering practices, and our propriety technology solutions to achieve business outcomes at scale.
Our propriety solutions include - OneTAP for Voice Analytics, Intelligent Nudges, Customer Intelligence Platform and Engagely.ai and we specialise in Data (ML, BI, Cloud, DWH & Data Lakes) and AI Solutions (NLP, Conversational Analytics).
Exponentia.ai was founded in the year 2014 and is headquartered in Mumbai, India. We have expanded our services to the UK, Singapore and US. We’ve been honored with the Innovation Award by Qlik & Excellence in Business Process Automation by Automation Anywhere.
Visit our website- www.exponentia.ai to learn more about our products and services.
Similar jobs
- Bring in industry best practices around creating and maintaining robust data pipelines for complex data projects with/without AI component
- programmatically ingesting data from several static and real-time sources (incl. web scraping)
- rendering results through dynamic interfaces incl. web / mobile / dashboard with the ability to log usage and granular user feedbacks
- performance tuning and optimal implementation of complex Python scripts (using SPARK), SQL (using stored procedures, HIVE), and NoSQL queries in a production environment
- Industrialize ML / DL solutions and deploy and manage production services; proactively handle data issues arising on live apps
- Perform ETL on large and complex datasets for AI applications - work closely with data scientists on performance optimization of large-scale ML/DL model training
- Build data tools to facilitate fast data cleaning and statistical analysis
- Ensure data architecture is secure and compliant
- Resolve issues escalated from Business and Functional areas on data quality, accuracy, and availability
- Work closely with APAC CDO and coordinate with a fully decentralized team across different locations in APAC and global HQ (Paris).
You should be
- Expert in structured and unstructured data in traditional and Big data environments – Oracle / SQLserver, MongoDB, Hive / Pig, BigQuery, and Spark
- Have excellent knowledge of Python programming both in traditional and distributed models (PySpark)
- Expert in shell scripting and writing schedulers
- Hands-on experience with Cloud - deploying complex data solutions in hybrid cloud / on-premise environment both for data extraction/storage and computation
- Hands-on experience in deploying production apps using large volumes of data with state-of-the-art technologies like Dockers, Kubernetes, and Kafka
- Strong knowledge of data security best practices
- 5+ years experience in a data engineering role
- Science / Engineering graduate from a Tier-1 university in the country
- And most importantly, you must be a passionate coder who really cares about building apps that can help people do things better, smarter, and faster even when they sleep
- Partners with business stakeholders to translate business objectives into clearly defined analytical projects.
- Identify opportunities for text analytics and NLP to enhance the core product platform, select the best machine learning techniques for the specific business problem and then build the models that solve the problem.
- Own the end-end process, from recognizing the problem to implementing the solution.
- Define the variables and their inter-relationships and extract the data from our data repositories, leveraging infrastructure including Cloud computing solutions and relational database environments.
- Build predictive models that are accurate and robust and that help our customers to utilize the core platform to the maximum extent.
Skills and Qualification
- 12 to 15 yrs of experience.
- An advanced degree in predictive analytics, machine learning, artificial intelligence; or a degree in programming and significant experience with text analytics/NLP. He shall have a strong background in machine learning (unsupervised and supervised techniques). In particular, excellent understanding of machine learning techniques and algorithms, such as k-NN, Naive Bayes, SVM, Decision Forests, logistic regression, MLPs, RNNs, etc.
- Experience with text mining, parsing, and classification using state-of-the-art techniques.
- Experience with information retrieval, Natural Language Processing, Natural Language
- Understanding and Neural Language Modeling.
- Ability to evaluate the quality of ML models and to define the right performance metrics for models in accordance with the requirements of the core platform.
- Experience in the Python data science ecosystem: Pandas, NumPy, SciPy, sci-kit-learn, NLTK, Gensim, etc.
- Excellent verbal and written communication skills, particularly possessing the ability to share technical results and recommendations to both technical and non-technical audiences.
- Ability to perform high-level work both independently and collaboratively as a project member or leader on multiple projects.
Data Engineer JD:
- Designing, developing, constructing, installing, testing and maintaining the complete data management & processing systems.
- Building highly scalable, robust, fault-tolerant, & secure user data platform adhering to data protection laws.
- Taking care of the complete ETL (Extract, Transform & Load) process.
- Ensuring architecture is planned in such a way that it meets all the business requirements.
- Exploring new ways of using existing data, to provide more insights out of it.
- Proposing ways to improve data quality, reliability & efficiency of the whole system.
- Creating data models to reduce system complexity and hence increase efficiency & reduce cost.
- Introducing new data management tools & technologies into the existing system to make it more efficient.
- Setting up monitoring and alarming on data pipeline jobs to detect failures and anomalies
What do we expect from you?
- BS/MS in Computer Science or equivalent experience
- 5 years of recent experience in Big Data Engineering.
- Good experience in working with Hadoop and Big Data technologies like HDFS, Pig, Hive, Zookeeper, Storm, Spark, Airflow and NoSQL systems
- Excellent programming and debugging skills in Java or Python.
- Apache spark, python, hands on experience in deploying ML models
- Has worked on streaming and realtime pipelines
- Experience with Apache Kafka or has worked with any of Spark Streaming, Flume or Storm
Focus Area:
R1 |
Data structure & Algorithms |
R2 |
Problem solving + Coding |
R3 |
Design (LLD) |
- Strong analytical skills with the ability to collect, organize, analyze, and disseminate significant amounts of information with attention to detail and accuracy
- Expertise of SQL/PL-SQL -ability to write procedures and create queries for reporting purpose.
- Must have worked on a reporting tool – Power BI/Tableau etc.
- Strong knowledge of excel/Google Sheets – must have worked with pivot tables, aggregate functions, logical if conditions.
- Strong verbal and written communication skills for coordination with departments.
- An analytical mind and inclination for problem-solving
Strong knowledge in Power BI (DAX + Power Query + Power BI Service + Power BI
Desktop Visualisations) and Azure Data Storages.
Should have experience in Power BI mobile Dashboards.
Strong knowledge in SQL.
Good knowledge of DWH concepts.
Work as an independent contributor at the client location.
Implementing Access Control and impose required Security.
Candidate must have very good communication skills.
- Data pre-processing, data transformation, data analysis, and feature engineering
- Performance optimization of scripts (code) and Productionizing of code (SQL, Pandas, Python or PySpark, etc.)
- Required skills:
- Bachelors in - in Computer Science, Data Science, Computer Engineering, IT or equivalent
- Fluency in Python (Pandas), PySpark, SQL, or similar
- Azure data factory experience (min 12 months)
- Able to write efficient code using traditional, OO concepts, modular programming following the SDLC process.
- Experience in production optimization and end-to-end performance tracing (technical root cause analysis)
- Ability to work independently with demonstrated experience in project or program management
- Azure experience ability to translate data scientist code in Python and make it efficient (production) for cloud deployment
- Total Experience of 7-10 years and should be interested in teaching and research
- 3+ years’ experience in data engineering which includes data ingestion, preparation, provisioning, automated testing, and quality checks.
- 3+ Hands-on experience in Big Data cloud platforms like AWS and GCP, Data Lakes and Data Warehouses
- 3+ years of Big Data and Analytics Technologies. Experience in SQL, writing code in spark engine using python, scala or java Language. Experience in Spark, Scala
- Experience in designing, building, and maintaining ETL systems
- Experience in data pipeline and workflow management tools like Airflow
- Application Development background along with knowledge of Analytics libraries, opensource Natural Language Processing, statistical and big data computing libraries
- Familiarity with Visualization and Reporting Tools like Tableau, Kibana.
- Should be good at storytelling in Technology
Qualification: B.Tech / BE / M.Sc / MBA / B.Sc, Having Certifications in Big Data Technologies and Cloud platforms like AWS, Azure and GCP will be preferred
Primary Skills: Big Data + Python + Spark + Hive + Cloud Computing
Secondary Skills: NoSQL+ SQL + ETL + Scala + Tableau
Selection Process: 1 Hackathon, 1 Technical round and 1 HR round
Benefit: Free of cost training on Data Science from top notch professors
Position Name: Software Developer
Required Experience: 3+ Years
Number of positions: 4
Qualifications: Master’s or Bachelor s degree in Engineering, Computer Science, or equivalent (BE/BTech or MS in Computer Science).
Key Skills: Python, Django, Ngnix, Linux, Sanic, Pandas, Numpy, Snowflake, SciPy, Data Visualization, RedShift, BigData, Charting
Compensation - As per industry standards.
Joining - Immediate joining is preferrable.
Required Skills:
- Strong Experience in Python and web frameworks like Django, Tornado and/or Flask
- Experience in data analytics using standard python libraries using Pandas, NumPy, MatPlotLib
- Conversant in implementing charts using charting libraries like Highcharts, d3.js, c3.js, dc.js and data Visualization tools like Plotly, GGPlot
- Handling and using large databases and Datawarehouse technologies like MongoDB, MySQL, BigData, Snowflake, Redshift.
- Experience in building APIs, Multi-threading for tasks on Linux platform
- Exposure to finance and capital markets will be added advantage.
- Strong understanding of software design principles, algorithms, data structures, design patterns, and multithreading concepts.
- Worked on building highly-available distributed systems on cloud infrastructure or have had exposure to architectural pattern of a large, high-scale web application.
- Strong understanding of software design principles, algorithms, data structures, design patterns, and multithreading concepts.
- Basic understanding of front-end technologies, such as JavaScript, HTML5, and CSS3
Company Description:
Reval Analytical Services is a fully-owned subsidiary of Virtua Research Inc. US. It is a financial services technology company focused on consensus analytics, peer analytics and Web-enabled information delivery. The Company’s unique combination of investment research experience, modeling expertise, and software development capabilities enables it to provide industry-leading financial research tools and services for investors, analysts, and corporate management.
Website: http://www.virtuaresearch.com" target="_blank">www.virtuaresearch.com