As an experienced Data Scientist you’ll join a team of data scientists, analysts, and software engineers
working to push the boundaries of data science in health care. We like to experiment, iterate, and
innovate with technology, from developing new algorithms specific to health care’s challenges, to
bringing the latest machine learning practices and applications developed in other industries into the
health care world. We know that algorithms are only valuable when powered by the right data, so we
focus on fully understanding the problems we need to solve, and truly understanding the data behind
them before launching into solutions – ensuring that the solutions we do land on are impactful and
• Research, conceptualize, and implement analytical approaches and predictive modeling to
evaluate scenarios, predict utilization and clinical outcomes, and recommend actions to impact
• Manage and execute on the entire model development process, including scope definition,
hypothesis formation, data cleaning and preparation, feature selection, model implementation
in production, validation and iteration, using multiple data sources.
• Provide guidance on necessary data and software infrastructure capabilities to deliver a scalable
solution across partners and support the implementation of the team’s algorithms and models
• Contribute to the development and publication of work in major journals and conferences,
showcasing leadership in healthcare data science.
• Work closely with Data Scientists, Machine Learning Engineers, IT teams and
Business stakeholders spread across various locations in the US and India to achieve business
• Provide guidance to other Data Scientists and Machine Learning Engineers
We are looking for candidates who have demonstrated both a strong business sense and a deep understanding of the quantitative foundations of modelling.
• Excellent analytical and problem-solving skills, including the ability to disaggregate issues, identify root causes and recommend solutions
• Experience with statistical programming software (SPSS) and comfort working with large data sets.
• R, Python, SAS & SQL are preferred but not mandatory
• Excellent time management skills
• Good written and verbal communication skills; understanding of both written and spoken English
• Strong interpersonal skills
• Ability to act autonomously, bringing structure and organization to work
• Creative and action-oriented mindset
• Ability to interact in a fluid, demanding and unstructured environment where priorities evolve constantly, and methodologies are regularly challenged
• Ability to work under pressure and deliver on tight deadlines
Qualifications and Experience:
• Graduate degree in: Statistics/Economics/Econometrics/Computer
Science/Engineering/Mathematics/MBA (with a strong quantitative background) or
• Strong track record of work experience in the fields of business intelligence, market
research, and/or advanced analytics
• Knowledge of data collection methods (focus groups, surveys, etc.)
• Knowledge of statistical packages (SPSS, SAS, R, Python, or similar), databases,
and MS Office (Excel, PowerPoint, Word)
• Strong analytical and critical thinking skills
• Industry experience in Consumer Experience/Healthcare a plus
Novelship is seeking a Data Engineer to be based in India or Remote in South East Asia to join our Tech Team.
Brief Description of the Role:
As a Data Engineer, you will be responsible for building and maintaining our Analytics Infrastructure, Data Taxonomy, Data Ingestion and Aggregation to provide Business Intelligence to different teams and to support data-dependent tools like ERP and CRM.
In this role you will:
- Analyze and design ETL solutions to store/fetch data from multiple systems like Postgres, Airtable, Google Analytics and Mixpanel.
- Drive the implementation of new data management projects such as Finance ERP and the restructuring of the current data architecture.
- Participate in the building of a single source of Data Systems and Data Taxonomy projects.
- Engage in problem definition and resolution and collaborate with a diverse group of engineers and business owners from across the company.
- Work with stakeholders including the Strategy, Product and Marketing teams to assist with data-related technical issues, support their data analytics needs and work on data collection and aggregation solutions.
- Act as a technical resource for the Data team and be involved in creating and implementing current and future Analytics projects like data lake design and data warehouse design.
- Ensure quality and consistency of the data in the Data warehouse and follow best data governance practices.
- Analyze large amounts of information to discover trends and patterns to provide Business Intelligence.
- Mine and analyse data from databases to drive optimization and improvement of product development, marketing techniques and business strategies.
- Design and build reusable components, frameworks and libraries at scale to support analytics data products
- Build and maintain optimal data pipeline architecture and data systems. Assemble large, complex data sets that meet functional / non-functional business requirements.
- Identify, design, and implement internal process improvements: automating manual processes, optimising data delivery, re-designing infrastructure for greater scalability, etc.
- Streamline existing reporting and analysis solutions and introduce enhanced ones that leverage complex data sources derived from multiple internal systems
- 2 to 4 years of professional experience as a Data Engineer.
- Proficiency in Python, Scala or R.
- Proficiency in SQL, Relational & Non-Relational Databases.
- Excellent analytical and problem-solving skills.
- Experience with Business Intelligence tools like Data Studio, Power BI and Tableau.
- Experience in Data Cleaning, Creating Data Pipelines, Data Modelling, Storytelling and Dashboarding.
- Bachelor's or Master's degree in Computer Science
● B.Tech/Masters in Mathematics, Statistics, Computer Science or another
● 2-3+ years of work experience in the ML domain (2-5 years of experience)
● Hands-on coding experience in Python
● Experience in machine learning techniques such as Regression, Classification,
Predictive modeling, Clustering, Deep Learning stack, NLP
● Working knowledge of TensorFlow/PyTorch
● Experience with distributed computing frameworks: MapReduce, Hadoop, Spark
● Experience with databases: MongoDB
We are looking for a technically driven "Full-Stack Engineer" for one of our premium clients
• Bachelor's degree in computer science or related field; Master's degree is a plus
• 3+ years of relevant work experience
• Meaningful experience with at least two of the following technologies: Python, Scala, Java
• Strong proven experience on distributed processing frameworks (Spark, Hadoop, EMR) and SQL is very
• Commercial client-facing project experience is helpful, including working in close-knit teams
• Ability to work across structured, semi-structured, and unstructured data, extracting information and
identifying linkages across disparate data sets
• Confirmed ability in clearly communicating complex solutions
• Understanding of information security principles to ensure compliant handling and management of
• Experience and interest in Cloud platforms such as: AWS, Azure, Google Cloud Platform or Databricks
• Extraordinary attention to detail
- Participate in the full machine learning lifecycle, from data collection, cleaning, and preprocessing to training models and deploying them to production.
- Discover data sources, get access to them, ingest them, clean them up, and make them “machine learning ready”.
- Work with data scientists to create and refine features from the underlying data and build pipelines to train and deploy models.
- Partner with data scientists to understand and implement machine learning algorithms.
- Support A/B tests, gather data, perform analysis, and draw conclusions on the impact of your models.
- Work cross-functionally with product managers, data scientists, and product engineers, and communicate results to peers and leaders.
- Mentor junior team members
Who we have in mind:
- Graduate in Computer Science or related field, or equivalent practical experience.
- 4+ years of experience in software engineering with 2+ years of direct experience in the machine learning field.
- Proficiency with SQL, Python, Spark, and basic libraries such as Scikit-learn, NumPy, Pandas.
- Familiarity with deep learning frameworks such as TensorFlow or Keras
- Experience with Computer Vision (OpenCV), NLP frameworks (NLTK, spaCy, BERT).
- Basic knowledge of machine learning techniques (e.g. classification, regression, and clustering).
- Understand machine learning principles (training, validation, etc.)
- Strong hands-on knowledge of data query and data processing tools (e.g. SQL)
- Software engineering fundamentals: version control systems (e.g. Git, GitHub) and workflows, and ability to write production-ready code.
- Experience deploying highly scalable software supporting millions or more users
- Experience building applications on cloud (AWS or Azure)
- Experience working in scrum teams with Agile tools like JIRA
- Strong oral and written communication skills. Ability to explain complex concepts and technical material to non-technical users
Episource has devoted more than a decade to building solutions for risk adjustment to measure healthcare outcomes. As one of the leading companies in healthcare, we have helped numerous clients optimize their medical records, data, and analytics to enable better documentation of care for patients with chronic diseases.
The backbone of our consistent success has been our obsession with data and technology. At Episource, all of our strategic initiatives start with the question - how can data be “deployed”? Our analytics platforms and data lakes ingest huge quantities of data daily to help our clients deliver services. We have also built our own machine learning and NLP platform to infuse added productivity and efficiency into our workflow. Combined, these build a foundation of tools and practices used by quantitative staff across the company.
What’s our poison, you ask? We work with most of the popular frameworks and technologies like Spark, Airflow, Ansible, Terraform, Docker, ELK. For machine learning and NLP, we are big fans of Keras, spaCy, scikit-learn, pandas and NumPy. AWS and serverless platforms help us stitch these together to stay ahead of the curve.
ABOUT THE ROLE:
We’re looking to hire someone to help scale Machine Learning and NLP efforts at Episource. You’ll work with the team that develops the models powering Episource’s product focused on NLP-driven medical coding. Some of the problems include improving our ICD code recommendations, clinical named entity recognition, improving patient health, clinical suspecting and information extraction from clinical notes.
This is a role for highly technical data engineers who combine outstanding oral and written communication skills with the ability to code up prototypes and productionize them using a wide range of tools, algorithms, and languages. Most importantly, they need to be able to autonomously plan and organize their work assignments based on high-level team goals.
You will be responsible for setting an agenda to develop and ship data-driven architectures that positively impact the business, working with partners across the company including operations and engineering. You will use research results to shape strategy for the company and help build a foundation of tools and practices used by quantitative staff across the company.
During the course of a typical day with our team, expect to work on one or more projects around the following:
1. Create and maintain optimal data pipeline architectures for ML
2. Develop a strong API ecosystem for ML pipelines
3. Build CI/CD pipelines for ML deployments using GitHub Actions, Travis, Terraform and Ansible
4. Design and develop distributed, high-volume, high-velocity, multi-threaded event processing systems
5. Apply software engineering best practices across the development lifecycle, including coding standards, code reviews, source management, build processes, testing, and operations
6. Deploy data pipelines in production using Infrastructure-as-Code platforms
7. Design scalable implementations of the models developed by our Data Science teams
8. Big data and distributed ML with PySpark on AWS EMR, and more!
Bachelor’s degree or greater in Computer Science, IT or related fields
Minimum of 5 years of experience in cloud, DevOps, MLOps & data projects
Strong experience with Bash scripting, Unix environments and building scalable/distributed systems
Experience with automation/configuration management using Ansible, Terraform, or equivalent
Very strong experience with AWS and Python
Experience building CI/CD systems
Experience with containerization technologies like Docker, Kubernetes, ECS, EKS or equivalent
Ability to build and manage application and performance monitoring processes