Purpose of Job:
We are looking for an exceptionally talented Lead Data Engineer with hands-on experience implementing AWS services to build data pipelines, integrate APIs, and design data warehouses. A candidate with both hands-on and leadership capabilities will be ideal for this position.
Job Responsibilities:
• 6+ years of total experience as a Data Engineer, including 2+ years of experience managing a team
• Minimum 3 years of AWS Cloud experience
• Well versed in languages such as Python, PySpark, SQL, and Node.js
• Extensive experience in the Spark ecosystem, covering both real-time and batch processing
• Experience with AWS Glue, EMR, DMS, Lambda, S3, DynamoDB, Step Functions, Airflow, RDS, Aurora, etc.
• Experience with modern database systems such as Redshift, Presto, and Hive
• Has built data lakes on S3 or Apache Hudi
• Solid understanding of data warehousing concepts
• Good to have: experience with tools such as Kafka or Kinesis
• Good to have: AWS Developer Associate or Solutions Architect Associate certification
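The batch-pipeline work described above generally follows an extract-transform-load shape. As an illustrative sketch only (not from the posting; the column names and validation rule are invented), a toy batch step in pure Python:

```python
import csv
import io

def transform(row):
    # Toy transform step: parse the amount field, drop rows that fail validation.
    try:
        row["amount"] = float(row["amount"])
        return row
    except ValueError:
        return None

def run_pipeline(csv_text):
    # Extract -> transform -> load (here, "load" is simply returning clean rows).
    reader = csv.DictReader(io.StringIO(csv_text))
    return [r for r in map(transform, reader) if r is not None]
```

In a real AWS pipeline the same shape would be distributed, e.g. a Glue or EMR job reading from S3 rather than an in-memory string.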
Qualifications:
At least a bachelor's degree in Science, Engineering, or Applied Mathematics; a master's degree is preferred.
Other Requirements: team management skills, a learning attitude, and ownership.
Job Description: Speech Recognition Engineer
3 years of industry experience
Responsibility:
Develop an ASR engine using frameworks such as ESPnet, Fairseq, Athena, or DeepSpeech, with PyTorch or TensorFlow.
Help define the technologies required for speech-to-text beyond the core engine, and design the integration of these technologies.
Work on improving model accuracy and guide the team on best practices.
Desired experience:
Good understanding of machine learning (ML) tools.
Should be well versed in classical speech processing methodologies such as hidden Markov models (HMMs), Gaussian mixture models (GMMs), artificial neural networks (ANNs), language modeling, etc.
Hands-on experience with current deep learning (DL) techniques used for speech processing, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), long short-term memory (LSTM), and connectionist temporal classification (CTC), is essential.
The candidate should have hands-on experience with end-to-end implementation of ASR tools such as ESPnet, Fairseq, Athena, or DeepSpeech.
Hands-on PyTorch and TensorFlow experience is desirable.
Experience with techniques for resolving issues related to accuracy, noise, confidence scoring, etc.
Ability to implement recipes using scripting languages such as Bash.
Ability to develop applications in Python, C++, or Java.
Knowledge of transformer models such as BERT, ELMo, and ULMFiT.
Experience with CI/CD pipelines and MLOps processes.
Hands-on deployment experience on cloud platforms such as AWS or GCP.
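Connectionist temporal classification (CTC), listed above, is decoded in its simplest (greedy best-path) form by merging repeated symbols and then dropping blanks. A toy illustration only; the symbol alphabet and blank token here are assumptions, not anything from the posting:

```python
def ctc_collapse(path, blank="-"):
    # Greedy CTC decoding: merge consecutive repeats, then remove blank symbols.
    out, prev = [], None
    for sym in path:
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return "".join(out)
```

Note how a blank between two identical symbols preserves a genuine double letter, which is the reason CTC uses blanks at all.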
Location: Bangalore
Desirable Skills: ML; knowledge of speech recognition frameworks such as ESPnet, Fairseq, Athena, or DeepSpeech; hands-on experience with deep learning (DL) techniques such as CNNs, RNNs, and LSTMs; AWS; GCP
Responsibilities:
- Write and maintain production-level code in Python for deploying machine learning models
- Create and maintain deployment pipelines through CI/CD tools (preferably GitLab CI)
- Implement alerts and monitoring for prediction accuracy and data drift detection
- Implement automated pipelines for training and replacing models
- Work closely with the data science team to deploy new models to production
Required Qualifications:
- Degree in Computer Science, Data Science, IT, or a related discipline.
- 2+ years of experience in software engineering or data engineering.
- Programming experience in Python
- Experience in data profiling, ETL development, testing, and implementation
- Experience in deploying machine learning models
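One common metric behind the data drift detection mentioned above is the Population Stability Index (PSI), which compares a live sample's distribution against a training-time baseline. A minimal sketch under stated assumptions: the equal-width binning, the 1e-4 floor, and the conventional 0.25 alert threshold are illustrative choices, not requirements from the posting.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def bin_fractions(sample):
        counts = [0] * bins
        for x in sample:
            # Clamp out-of-range values into the edge bins.
            i = min(max(int((x - lo) / width), 0), bins - 1)
            counts[i] += 1
        # Floor each fraction so the log term stays finite on empty bins.
        return [max(c / len(sample), 1e-4) for c in counts]

    e, a = bin_fractions(expected), bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

In a monitoring job, a PSI above roughly 0.25 is often treated as significant drift and would fire an alert.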
Good to have:
- Experience with AWS resources for ML and data engineering (SageMaker, Glue, Athena, Redshift, S3)
- Experience deploying TensorFlow models
- Experience deploying and managing MLflow
- Work closely with your business to identify issues and use data to propose solutions for effective decision making
- Build algorithms and design experiments to merge, manage, interrogate and extract data to supply tailored reports to colleagues, customers or the wider organisation.
- Create and use advanced machine learning algorithms and statistics: regression, simulation, scenario analysis, modeling, clustering, decision trees, neural networks, etc.
- Query databases using statistical computing languages: R, Python, SQL, etc.
- Visualize/present data through dashboards for data analysis, using Python Dash, Flask, etc.
- Test data mining models to select the most appropriate ones for use on a project
- Work in a POSIX/UNIX environment to run/deploy applications
- Mine and analyze data from company databases to drive optimization and improvement of product development, marketing techniques and business strategies.
- Develop custom data models and algorithms to apply to data sets.
- Use predictive modeling to increase and optimize customer experiences, revenue generation, ad targeting and other business outcomes.
- Assess the effectiveness of data sources and data-gathering techniques and improve data collection methods
- Horizon scan to stay up to date with the latest technology, techniques and methods
- Coordinate with different functional teams to implement models and monitor outcomes.
- Stay curious and enthusiastic about using algorithms to solve problems and enthuse others to see the benefit of your work.
General Expectations:
- Able to create algorithms to extract information from large data sets
- Strong knowledge of Python, R, Java, or another scripting/statistical language to automate data retrieval, manipulation, and analysis.
- Experience with extracting and aggregating data from large data sets using SQL or other tools
- Strong understanding of various NLP, and NLU techniques like Named Entity Recognition, Summarization, Topic Modeling, Text Classification, Lemmatization and Stemming.
- Knowledge and experience in statistical and data mining techniques: GLM/Regression, Random Forest, Boosting, Trees, etc.
- Experience with Python libraries such as Pandas, NumPy, SciPy, Scikit-Learn
- Experience with Jupyter / Pandas / Numpy to manipulate and analyse data
- Knowledge of Machine Learning techniques and their respective pros and cons
- Strong Knowledge of various Data Science Visualization Tools like Tableau, PowerBI, D3, Plotly, etc.
- Experience using web services: Redshift, AWS, S3, Spark, DigitalOcean, etc.
- Proficiency in using query languages, such as SQL, Spark DataFrame API, etc.
- Hands-on experience in HTML, CSS, Bootstrap, JavaScript, AJAX, jQuery and Prototyping.
- Hands-on experience on C#, Javascript, .Net
- Experience in understanding and analyzing data using statistical software (e.g., Python, R, KDB+ and other relevant libraries)
- Experienced in building applications that meet enterprise needs – secure, scalable, loosely coupled design
- Strong knowledge of computer science, algorithms, and design patterns
- Strong oral and written communication, and other soft skills critical to collaborating and engaging with teams
- Proficiency in shell scripting
- Proficiency in automation of tasks
- Proficiency in Pyspark/Python
- Proficiency in writing and understanding Sqoop jobs
- Understanding of Cloudera Manager
- Good understanding of RDBMS
- Good understanding of Excel
- Design AWS data ingestion frameworks and pipelines based on the specific needs driven by the Product Owners and user stories…
- Experience building Data Lake using AWS and Hands-on experience in S3, EKS, ECS, AWS Glue, AWS KMS, AWS Firehose, EMR
- Experience with Apache Spark programming on Databricks
- Experience working with NoSQL databases such as Cassandra, HBase, and Elasticsearch
- Hands-on experience leveraging CI/CD to rapidly build and test application code
- Expertise in Data governance and Data Quality
- Experience working with PCI Data and working with data scientists is a plus
- 4+ years of experience with the following big data frameworks: file formats (Parquet, Avro, ORC), resource management, distributed processing, and RDBMS
- 5+ years of experience on designing and developing Data Pipelines for Data Ingestion or Transformation using AWS technologies
- Use data to develop machine learning models that optimize decision making in Credit Risk, Fraud, Marketing, and Operations
- Implement data pipelines, new features, and algorithms that are critical to our production models
- Create scalable strategies to deploy and execute your models
- Write well designed, testable, efficient code
- Identify valuable data sources and automate collection processes.
- Preprocess structured and unstructured data.
- Analyze large amounts of information to discover trends and patterns.
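Several bullets above mention regression modeling. As an illustrative aside (not part of the role description), ordinary least squares for a single feature has a simple closed form, sketched here in pure Python:

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = a + b*x, via the closed-form solution:
    # slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b
```

Libraries such as scikit-learn generalize this to many features, but the one-dimensional case makes the underlying estimate easy to verify by hand.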
Requirements:
- 2+ years of experience in applied data science or engineering with a focus on machine learning
- Python expertise with good knowledge of machine learning libraries, tools, techniques, and frameworks (e.g. pandas, sklearn, xgboost, lightgbm, logistic regression, random forest classifiers, gradient boosting regressors, etc.)
- Strong quantitative and programming skills with a product-driven sensibility
SpringML is looking to hire a top-notch Senior Data Engineer who is passionate about working with data and using the latest distributed frameworks to process large datasets. In this role, your primary responsibility will be to design and build data pipelines. You will be focused on helping client projects with data integration, data prep, and implementing machine learning on datasets. You will work with some of the latest technologies, collaborate with partners on early wins, take a consultative approach with clients, interact daily with executive leadership, and help build a great company. Chosen team members will be part of the core team and play a critical role in scaling up our emerging practice.
RESPONSIBILITIES:
- Ability to work as a member of a team assigned to design and implement data integration solutions.
- Build data pipelines using standard frameworks such as Hadoop, Apache Beam, and other open-source solutions.
- Learn quickly – ability to understand and rapidly comprehend new areas – functional and technical – and apply detailed and critical thinking to customer solutions.
- Propose design solutions and recommend best practices for large scale data analysis
SKILLS:
- B.Tech degree in computer science, mathematics, or other relevant fields.
- 4+ years of experience in ETL, data warehousing, visualization, and building data pipelines.
- Strong programming skills: experience and expertise in one of the following: Java, Python, Scala, or C.
- Proficiency in big data/distributed computing frameworks such as Apache Spark and Kafka.
- Experience with Agile implementation methodologies
DataWeave provides Retailers and Brands with “Competitive Intelligence as a Service” that enables them to take key decisions that impact their revenue. Powered by AI, we provide easily consumable and actionable competitive intelligence by aggregating and analyzing billions of publicly available data points on the Web to help businesses develop data-driven strategies and make smarter decisions.
Data Science@DataWeave
We, the Data Science team at DataWeave (called Semantics internally), build the core machine learning backend and structured domain knowledge needed to deliver insights through our data products. Our underpinnings are innovation, business awareness, long-term thinking, and pushing the envelope. We are a fast-paced lab within the org, applying the latest research in Computer Vision, Natural Language Processing, and Deep Learning to hard problems in different domains.
How do we work?
It's hard to tell what we love more, problems or solutions! Every day, we choose to address some of the hardest data problems there are. We are in the business of making sense of messy public data on the web, at serious scale!
What do we offer?
- Some of the most challenging research problems in NLP and Computer Vision. Huge text and image datasets that you can play with!
- Ability to see the impact of your work and the value you're adding to our customers almost immediately.
- Opportunity to work on different problems and explore a wide variety of tools to figure out what really excites you.
- A culture of openness. Fun work environment. A flat hierarchy. Organization wide visibility. Flexible working hours.
- Learning opportunities with courses and tech conferences. Mentorship from seniors in the team.
- Last but not least, competitive salary packages and fast-paced growth opportunities.
Who are we looking for?
The ideal candidate is a strong software developer or a researcher with experience building and shipping production grade data science applications at scale. Such a candidate has keen interest in liaising with the business and product teams to understand a business problem, and translate that into a data science problem. You are also expected to develop capabilities that open up new business productization opportunities.
We are looking for someone with 6+ years of relevant experience working on problems in NLP or Computer Vision with a Master's degree (PhD preferred).
Key problem areas
- Preprocessing and feature extraction of noisy and unstructured data, both text and images.
- Keyphrase extraction, sequence labeling, entity relationship mining from texts in different domains.
- Document clustering, attribute tagging, data normalization, classification, summarization, sentiment analysis.
- Image based clustering and classification, segmentation, object detection, extracting text from images, generative models, recommender systems.
- Ensemble approaches for all the above problems using multiple text and image based techniques.
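Many of the problem areas above (document clustering, normalization, classification) start from vectorizing text and measuring similarity. As a toy illustration only (real systems would use TF-IDF or learned embeddings rather than raw token counts), a bag-of-words cosine similarity:

```python
import math
from collections import Counter

def cosine_similarity(doc_a, doc_b):
    # Build bag-of-words vectors from whitespace tokens, then take the
    # cosine of the angle between them: dot(a, b) / (|a| * |b|).
    ca, cb = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0
```

A clustering pipeline would compute this pairwise and group documents whose similarity exceeds some threshold.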
Relevant set of skills
- Have a strong grasp of concepts in computer science, probability and statistics, linear algebra, calculus, optimization, algorithms and complexity.
- Background in one or more of information retrieval, data mining, statistical techniques, natural language processing, and computer vision.
- Excellent coding skills in multiple programming languages, with experience building production-grade systems. Prior experience with Python is a bonus.
- Experience building and shipping machine learning models that solve real world engineering problems. Prior experience with deep learning is a bonus.
- Experience building robust clustering and classification models on unstructured data (text, images, etc). Experience working with Retail domain data is a bonus.
- Ability to process noisy and unstructured data to enrich it and extract meaningful relationships.
- Experience working with a variety of tools and libraries for machine learning and visualization, including NumPy, matplotlib, scikit-learn, Keras, PyTorch, and TensorFlow.
- Use the command line like a pro. Be proficient in Git and other essential software development tools.
- Working knowledge of large-scale computational models such as MapReduce and Spark is a bonus.
- Be a self-starter—someone who thrives in fast paced environments with minimal ‘management’.
- It's a huge bonus if you have some personal projects (including open source contributions) that you work on during your spare time. Show off some of your projects you have hosted on GitHub.
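The MapReduce model mentioned in the skills list boils down to a map, shuffle-by-key, reduce pattern. A toy in-process sketch (illustrative only; real MapReduce or Spark distributes these stages across machines):

```python
from collections import defaultdict

def mapreduce(records, mapper, reducer):
    # Map each record to (key, value) pairs, shuffle by key, then reduce per key.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

# The canonical example: word count.
word_counts = mapreduce(
    ["to be or not to be"],
    mapper=lambda line: [(w, 1) for w in line.split()],
    reducer=lambda key, values: sum(values),
)
```

The same mapper/reducer pair ports almost directly to Hadoop Streaming or Spark's `flatMap`/`reduceByKey`, which is why the pattern is worth internalizing.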
Role and responsibilities
- Understand the business problems we are solving. Build data science capabilities that align with our product strategy.
- Conduct research. Do experiments. Quickly build throw away prototypes to solve problems pertaining to the Retail domain.
- Build robust clustering and classification models in an iterative manner that can be used in production.
- Constantly think scale, think automation. Measure everything. Optimize proactively.
- Take end to end ownership of the projects you are working on. Work with minimal supervision.
- Help scale our delivery, customer success, and data quality teams with constant algorithmic improvements and automation.
- Take initiatives to build new capabilities. Develop business awareness. Explore productization opportunities.
- Be a tech thought leader. Add passion and vibrance to the team. Push the envelope. Be a mentor to junior members of the team.
- Stay on top of latest research in deep learning, NLP, Computer Vision, and other relevant areas.