Data Transformation Services Jobs in Mumbai
Essential Duties and Responsibilities:
- Build data systems and pipelines
- Prepare data for ML modeling
- Combine raw information from different sources
- Conduct complex data analysis and report on results
- Build data systems and pipelines.
Work Experience :
- 3 years of experience working with Node, AI/ML & Data Transformation Tools
- Hands on experience with ETL & Data Visualization tools
- Familiarity with Python (Numpy, Pandas)
- Experience with SQL & NoSQL DBs
Must Have : Python, Data warehouse tool , ETL, SQL/MongoDB, Data modeling, Data transformation, Data visualization
Nice to have: MongoDB/ SQL, Snowflake, Matillion, Node.JS, ML model building
Job Overview
We are looking for a Data Engineer to join our data team to solve data-driven critical
business problems. The hire will be responsible for expanding and optimizing the existing
end-to-end architecture including the data pipeline architecture. The Data Engineer will
collaborate with software developers, database architects, data analysts, data scientists and platform team on data initiatives and will ensure optimal data delivery architecture is
consistent throughout ongoing projects. The right candidate should have hands on in
developing a hybrid set of data-pipelines depending on the business requirements.
Responsibilities
- Develop, construct, test and maintain existing and new data-driven architectures.
- Align architecture with business requirements and provide solutions which fits best
- to solve the business problems.
- Build the infrastructure required for optimal extraction, transformation, and loading
- of data from a wide variety of data sources using SQL and Azure ‘big data’
- technologies.
- Data acquisition from multiple sources across the organization.
- Use programming language and tools efficiently to collate the data.
- Identify ways to improve data reliability, efficiency and quality
- Use data to discover tasks that can be automated.
- Deliver updates to stakeholders based on analytics.
- Set up practices on data reporting and continuous monitoring
Required Technical Skills
- Graduate in Computer Science or in similar quantitative area
- 1+ years of relevant work experience as a Data Engineer or in a similar role.
- Advanced SQL knowledge, Data-Modelling and experience working with relational
- databases, query authoring (SQL) as well as working familiarity with a variety of
- databases.
- Experience in developing and optimizing ETL pipelines, big data pipelines, and datadriven
- architectures.
- Must have strong big-data core knowledge & experience in programming using Spark - Python/Scala
- Experience with orchestrating tool like Airflow or similar
- Experience with Azure Data Factory is good to have
- Build processes supporting data transformation, data structures, metadata,
- dependency and workload management.
- Experience supporting and working with cross-functional teams in a dynamic
- environment.
- Good understanding of Git workflow, Test-case driven development and using CICD
- is good to have
- Good to have some understanding of Delta tables It would be advantage if the candidate also have below mentioned experience using
- the following software/tools:
- Experience with big data tools: Hadoop, Spark, Hive, etc.
- Experience with relational SQL and NoSQL databases
- Experience with cloud data services
- Experience with object-oriented/object function scripting languages: Python, Scala, etc.
This profile will include the following responsibilities:
- Develop Parsers for XML and JSON Data sources/feeds
- Write Automation Scripts for product development
- Build API Integrations for 3rd Party product integration
- Perform Data Analysis
- Research on Machine learning algorithms
- Understand AWS cloud architecture and work with 3 party vendors for deployments
- Resolve issues in AWS environmentWe are looking for candidates with:
Qualification: BE/BTech/Bsc-IT/MCA
Programming Language: Python
Web Development: Basic understanding of Web Development. Working knowledge of Python Flask is desirable
Database & Platform: AWS/Docker/MySQL/MongoDB
Basic Understanding of Machine Learning Models & AWS Fundamentals is recommended.
- Handling Survey Scripting Process through the use of survey software platform such as Toluna, QuestionPro, Decipher.
- Mining large & complex data sets using SQL, Hadoop, NoSQL or Spark.
- Delivering complex consumer data analysis through the use of software like R, Python, Excel and etc such as
- Working on Basic Statistical Analysis such as:T-Test &Correlation
- Performing more complex data analysis processes through Machine Learning technique such as:
- Classification
- Regression
- Clustering
- Text
- Analysis
- Neural Networking
- Creating an Interactive Dashboard Creation through the use of software like Tableau or any other software you are able to use.
- Working on Statistical and mathematical modelling, application of ML and AI algorithms
What you need to have:
- Bachelor or Master's degree in highly quantitative field (CS, machine learning, mathematics, statistics, economics) or equivalent experience.
- An opportunity for one, who is eager of proving his or her data analytical skills with one of the Biggest FMCG market player.
About IDfy
IDfy is ranked amongst the World's Top 100 Regulatory Technology companies for the last two years. IDfy's AI-powered technology solutions help real people unlock real opportunities. We create the confidence required for people and businesses to engage with each other in the digital world. If you have used any major payment wallets, digitally opened a bank account , have used a self-drive car, have played a real-money online game, or hosted people through AirBnB, it's quite likely that your identity has been verified through IDfy at some point.
About the team
- The machine learning team is a closely knit team responsible for building models and services that support key workflows for IDfy.
- Our models are critical for these workflows and as such are expected to perform accurately and with low latency. We use a mix of conventional and hand-crafted deep learning models.
- The team comes from diverse backgrounds and experience. We respect opinions and believe in honest, open communication.
- We work directly with business and product teams to craft solutions for our customers. We know that we are, and function as a platform and not a services company.
About the role
In this role you will:
- Work on all aspects of a production machine learning platform: acquiring data, training and building models, deploying models, building API services for exposing these models, maintaining them in production, and more.
- Work on performance tuning of models
- From time to time work on support and debugging of these production systems
- Work on researching the latest technology in the areas of our interest and applying it to build newer products and enhancement of the existing platform.
- Building workflows for training and production systems
- Contribute to documentation
While the emphasis will be on researching, building and deploying models into production, you will be expected to contribute to aspects mentioned above.
About you
You are a seasoned machine learning engineer (or data scientist). Our ideal candidate is someone with 5+ years of experience in production machine learning.
Must Haves
- You should be experienced in framing and solving complex problems with the application of machine learning or deep learning models.
- Deep expertise in computer vision or NLP with the experience of putting it into production at scale.
- You have experienced that and understand that modelling is only a small part of building and delivering AI solutions and know what it takes to keep a high-performance system up and running.
- Managing a large scale production ML system for at least a couple of years
- Optimization and tuning of models for deployment at scale
- Monitoring and debugging of production ML systems
- An enthusiasm and drive to learn, assimilate and disseminate the state of the art research. A lot of what we are building will require innovative approaches using newly researched models and applications.
- Past experience of mentoring junior colleagues
- Knowledge of and experience in ML Ops and tooling for efficient machine learning processes
Good to Have
- Our stack also includes languages like Go and Elixir. We would love it if you know any of these or take interest in functional programming.
- We use Docker and Kubernetes for deploying our services, so an understanding of this would be useful to have.
- Experience in using any other platform, frameworks, tools.
Other things to keep in mind
- Our goal is to help a significant part of the world’s population unlock real opportunities. This is an opportunity to make a positive impact here, and we hope you like it as much as we do.
Life At IDfy
People at IDfy care about creating value. We take pride in the strong collaborative culture that we have built, and our love for solving challenging problems. Life at IDfy is not always what you’d expect at a tech start-up that’s growing exponentially every quarter. There’s still time and space for balance.
We host regular talks, events and performances around Life, Art, Sports, and Technology; continuously sparking creative neurons in our people to keep their intellectual juices flowing. There’s never a dull day at IDfy. The office environment is casual and it goes beyond just the dress code. We have no conventional hierarchies and believe in an open-door policy where everyone is approachable.
About the role
In this role you will:
- Be working on all aspects of a production machine learning system. You will be acquiring data, training and building models, deploying models, building API services for exposing these models, maintaining them in production, and more.
- Work on performance tuning of models
- From time to time work on support and debugging of these production systems
- Work on researching the latest technology in the areas of our interest and applying it to build newer products and enhancement of the existing platform.
- Building workflows for training and production systems
- Contribute to documentation
About you
- You are a mid-career machine learning engineer (or data scientist). Our ideal candidate is someone with 4-6 years of experience in data science.
Must Haves
- You should be experienced in framing and solving problems with the application of machine learning or deep learning models.
- Knowledge of and experience in computer vision. A large part of our work revolves around computer vision, and you should have worked on this in a production environment.
- You have experienced that and understand that modelling is only a small part of building and delivering AI solutions and know what it takes to keep a high-performance system up and running.
- Our usage of libraries and tooling is oriented around Python, Tensorflow and Pytorch, so we would want you to have a good understanding of and experience in applying these.
- We build our own services, hence we would expect you to have knowledge of writing APIs.
- Enthusiasm and drive to learn and assimilate the state of the art research. A lot of what we are building will require innovative approaches using newly researched models and applications.
Good to Have
- Our stack also includes languages like Ruby, Go and Elixir. We would love it if you know any of these or take interest in functional programming.
- Knowledge of and experience in ML Ops and tooling would be a welcome addition. We use Docker and Kubernetes for deploying our services.
- Experience in using any other platform, frameworks, tools.
About IDfy
IDfy is ranked amongst the World's Top 100 Regulatory Technology companies for the last two years. IDfy's AI-powered technology solutions help identify people accurately, authenticate their credentials, and make sure that no fraud or impersonator enters the system. We create the confidence required for people and businesses to engage with each other in the digital world. If you have used any major payment wallets, or have used a self-drive car, or have played a real-money online game, or hosted people through AirBnB, it's quite likely that your identity has been verified by IDfy at some point.
About the team
- The machine learning team is a self-contained team responsible for building models and services that support key workflows for IDfy.
- Our models are gating criteria for these workflows and as such are expected to perform accurately and quickly. We use a mix of conventional and hand-crafted deep learning models.
- The team comes from diverse backgrounds and experience. We have IIT-ians, ex-bankers and startup founders.
- We work directly with business and product teams to craft solutions for our customers. We know that we are, and function as a platform and not a services company.
Object-oriented languages (e.g. Python, PySpark, Java, C#, C++ ) and frameworks (e.g. J2EE or .NET)
We’re looking to hire someone to help scale Machine Learning and NLP efforts at Episource. You’ll work with the team that develops the models powering Episource’s product focused on NLP driven medical coding. Some of the problems include improving our ICD code recommendations , clinical named entity recognition and information extraction from clinical notes.
This is a role for highly technical machine learning & data engineers who combine outstanding oral and written communication skills, and the ability to code up prototypes and productionalize using a large range of tools, algorithms, and languages. Most importantly they need to have the ability to autonomously plan and organize their work assignments based on high-level team goals.
You will be responsible for setting an agenda to develop and ship machine learning models that positively impact the business, working with partners across the company including operations and engineering. You will use research results to shape strategy for the company, and help build a foundation of tools and practices used by quantitative staff across the company.
What you will achieve:
-
Define the research vision for data science, and oversee planning, staffing, and prioritization to make sure the team is advancing that roadmap
-
Invest in your team’s skills, tools, and processes to improve their velocity, including working with engineering counterparts to shape the roadmap for machine learning needs
-
Hire, retain, and develop talented and diverse staff through ownership of our data science hiring processes, brand, and functional leadership of data scientists
-
Evangelise machine learning and AI internally and externally, including attending conferences and being a thought leader in the space
-
Partner with the executive team and other business leaders to deliver cross-functional research work and models
Required Skills:
-
Strong background in classical machine learning and machine learning deployments is a must and preferably with 4-8 years of experience
-
Knowledge of deep learning & NLP
-
Hands-on experience in TensorFlow/PyTorch, Scikit-Learn, Python, Apache Spark & Big Data platforms to manipulate large-scale structured and unstructured datasets.
-
Experience with GPU computing is a plus.
-
Professional experience as a data science leader, setting the vision for how to most effectively use data in your organization. This could be through technical leadership with ownership over a research agenda, or developing a team as a personnel manager in a new area at a larger company.
-
Expert-level experience with a wide range of quantitative methods that can be applied to business problems.
-
Evidence you’ve successfully been able to scope, deliver and sell your own research in a way that shifts the agenda of a large organization.
-
Excellent written and verbal communication skills on quantitative topics for a variety of audiences: product managers, designers, engineers, and business leaders.
-
Fluent in data fundamentals: SQL, data manipulation using a procedural language, statistics, experimentation, and modeling
Qualifications
-
Professional experience as a data science leader, setting the vision for how to most effectively use data in your organization
-
Expert-level experience with machine learning that can be applied to business problems
-
Evidence you’ve successfully been able to scope, deliver and sell your own work in a way that shifts the agenda of a large organization
-
Fluent in data fundamentals: SQL, data manipulation using a procedural language, statistics, experimentation, and modeling
-
Degree in a field that has very applicable use of data science / statistics techniques (e.g. statistics, applied math, computer science, OR a science field with direct statistics application)
-
5+ years of industry experience in data science and machine learning, preferably at a software product company
-
3+ years of experience managing data science teams, incl. managing/grooming managers beneath you
-
3+ years of experience partnering with executive staff on data topics