1) Machine learning development using Python or Scala with Spark
2) Knowledge of multiple ML algorithms such as random forest, XGBoost, RNNs, CNNs, transfer learning, etc.
3) Awareness of the typical challenges in machine learning implementations and their respective applications
Good to have
1) Full-stack development or DevOps team experience
2) Cloud services (AWS, Cloudera), SaaS, PaaS
3) Big data tools and frameworks
Quintiles and IMS Health have come together to become IQVIA, The Human Data Science Company™. Inspired by the industry we help, IQVIA is committed to providing solutions that enable life sciences companies to innovate with confidence, maximize their opportunities, and ultimately drive human health outcomes forward.
We provide actionable solutions by tapping into the power of the IQVIA CORE™:
• Domain Expertise. Institutional knowledge and domain expertise across diseases, geographies and scientific methods
• Advanced Analytics. Faster, more precise decision-making generated by advanced analytics designed for healthcare
• Unparalleled Data. One of the world’s largest curated healthcare data sources with innovative privacy protections
• Transformative Technology. Leading technologies to provide real-time access to operations-critical information
- Use statistical methods to analyze data and generate useful business reports and insights
- Analyze publisher- and demand-side data, provide actionable insights to the operations team to improve monetization, and help implement the resulting strategies
- Provide support for ad hoc data requests from the Operations teams and Management
- Use third-party APIs, web scraping, and CSV report processing to build dashboards in Google Data Studio
- Provide support for Analytics Processes monitoring and troubleshooting
- Support in creating reports, dashboards and models
- Independently determine the appropriate approach for new assignments
- Inquisitive, with strong problem-solving skills
- Ability to own projects and work independently once given a direction
- Experience working directly with business users to build reports, dashboards, models and solving business questions with data
- Tools expertise: relational databases and SQL are a must, along with Python
- Familiarity with AWS Athena and Redshift is a plus
- 2-7 years of experience
- UG: B.Tech/B.E.; PG: M.Tech/M.Sc. in Computer Science, Statistics, Maths, or Data Science/Data Analytics
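The statistical-reporting work described above can be illustrated with a minimal, hedged sketch; the revenue figures and metric names below are invented for illustration, and a real report would draw on production data:

```python
import statistics

# Hypothetical daily revenue figures for a publisher (made-up sample data)
daily_revenue = [1200.0, 1350.5, 980.25, 1500.0, 1425.75, 1100.0, 1275.5]

# Summarize the series into a small business report using only the stdlib
report = {
    "mean": round(statistics.mean(daily_revenue), 2),
    "median": round(statistics.median(daily_revenue), 2),
    "stdev": round(statistics.stdev(daily_revenue), 2),
}

for metric, value in report.items():
    print(f"{metric}: {value}")
```

In practice a library such as pandas would replace the hand-built dictionary, but the shape of the work, turning raw numbers into a few decision-ready metrics, is the same.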
It’s no surprise that 6sense is named a top workplace year after year — we have industry-leading technology developed and taken to market by a world-class team. 6sense is Top Rated on Glassdoor with a 4.9/5, and our CEO Jason Zintak was recognized as the #1 CEO in the small & medium business category by Glassdoor’s 2021 Top CEO Employees’ Choice Awards.
In 2021, the company was recognized for having the Best Company for Diversity, Best Company for Women, Best CEO, Best Company Culture, Best Company Perks & Beneﬁts and Happiest Employees from the employee feedback platform Comparably. In addition, 6sense has also won several accolades that demonstrate its reputation as an employer of choice including the Glassdoor Best Place to Work (2022), TrustRadius Tech Cares (2021) and Inc. Best Workplaces (2022, 2021, 2020, 2019).
6sense reinvents the way organizations create, manage, and convert pipeline to revenue. The 6sense Revenue AI captures anonymous buying signals, predicts the right accounts to target at the ideal time, and recommends the channels and messages to boost revenue performance. Removing guesswork, friction and wasted sales effort, 6sense empowers sales, marketing, and customer success teams to signiﬁcantly improve pipeline quality, accelerate sales velocity, increase conversion rates, and grow revenue predictably.
6sense is seeking a Data Engineer to become part of a team designing, developing, and deploying its customer-centric applications.
A Data Engineer at 6sense will have the opportunity to
- Create, validate and maintain optimal data pipelines, assemble large, complex data sets that meet functional / non-functional business requirements.
- Improve our current data pipelines, i.e., improve their performance, remove redundancy, and devise a way to test before-vs.-after behavior prior to rollout.
- Debug any issues that arise in data pipelines, especially performance issues.
- Experiment with new tools and new versions of Hive, Presto, etc.
Required qualifications and must-have skills
- Excellent analytical and problem-solving skills
- 6+ years work experience showing growth as a Data Engineer.
- Strong hands-on experience with Big Data Platforms like Hadoop / Hive / Spark / Presto
- Experience with writing Hive / Presto UDFs in Java
- Strong experience in writing complex, optimized SQL queries across large data sets
- Experience with optimizing queries and underlying storage
- Comfortable with Unix / Linux command line
- BE/BTech/BS or equivalent
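The kind of query optimization named above can be sketched in miniature with SQLite from the Python standard library; the table, index name, and row counts are hypothetical, and production work would involve far larger data sets on Hive or Presto:

```python
import sqlite3

# Toy demonstration: an index turns a full-table scan into an index search.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event_type TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i % 100, "click") for i in range(1000)],
)

query = "SELECT COUNT(*) FROM events WHERE user_id = 42"

# Before indexing: the plan reports a scan over the whole table.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()[3]

conn.execute("CREATE INDEX idx_events_user ON events (user_id)")

# After indexing: the plan searches via the index instead.
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()[3]

print(plan_before)  # e.g. a SCAN over events
print(plan_after)   # e.g. a SEARCH using idx_events_user
```

The same habit, inspecting the query plan before and after a change, carries over to `EXPLAIN` in Hive and Presto, where partitioning and file layout play the role the index plays here.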
Nice to have Skills
- Experience using key-value stores or NoSQL databases
- Good understanding of Docker and container platforms like Mesos and Kubernetes
- Security-first architecture approach
- Application benchmarking and optimization
- Develop robust, scalable, and maintainable machine learning models to solve business problems against large data sets.
- Build methods for document clustering, topic modeling, text classification, named entity recognition, sentiment analysis, and POS tagging.
- Perform elements of data cleaning, feature selection and feature engineering and organize experiments in conjunction with best practices.
- Benchmark, apply, and test algorithms against success metrics. Interpret the results in terms of relating those metrics to the business process.
- Work with development teams to ensure models can be implemented as part of a delivered solution replicable across many clients.
- Knowledge of Machine Learning, NLP, Document Classification, Topic Modeling and Information Extraction with a proven track record of applying them to real problems.
- Experience working with big data systems and big data concepts.
- Ability to provide clear and concise communication both with other technical teams and non-technical domain specialists.
- Strong team player; the ability to make a strong individual contribution while also working as a team toward wider goals is a must in this dynamic environment.
- Experience with noisy and/or unstructured textual data.
- Knowledge of knowledge graphs and NLP, including summarization, topic modeling, etc.
- Strong coding ability with statistical analysis tools in Python or R, and general software development skills (source code management, debugging, testing, deployment, etc.)
- Working knowledge of various text mining algorithms and their use-cases such as keyword extraction, PLSA, LDA, HMM, CRF, deep learning & recurrent ANN, word2vec/doc2vec, Bayesian modeling.
- Strong understanding of text pre-processing and normalization techniques, such as tokenization, POS tagging, and parsing, and how they work at a low level.
- Excellent problem-solving skills.
- Strong verbal and written communication skills
- Masters or higher in data mining or machine learning; or equivalent practical analytics / modelling experience
- Practical experience in using NLP related techniques and algorithms
- Experience with open-source coding and communities is desirable.
- Able to containerize models and associated modules and work in a microservices environment
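The tokenization and normalization steps listed above can be sketched with a tiny rule-based example; the stop-word list and sentence are invented for illustration, and a real pipeline would typically rely on a library such as spaCy or NLTK rather than hand-written rules:

```python
import re

# Minimal stand-in stop-word list (a real one would be far larger)
STOP_WORDS = {"the", "a", "an", "is", "of", "and"}

def tokenize(text: str) -> list[str]:
    """Split text into word tokens, keeping internal apostrophes."""
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)

def normalize(tokens: list[str]) -> list[str]:
    """Lowercase tokens and drop common stop words."""
    return [t.lower() for t in tokens if t.lower() not in STOP_WORDS]

text = "The patient's history is consistent with chronic disease."
tokens = tokenize(text)
print(normalize(tokens))
# ["patient's", 'history', 'consistent', 'with', 'chronic', 'disease']
```

Understanding these low-level steps matters because downstream models (classification, NER, topic modeling) inherit whatever the tokenizer decides a "word" is.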
Job Description
Role: Data/Integration Architect
Experience: 8-10 years
Notice period: under 30 days
Key responsibilities: designing and developing frameworks for batch and real-time jobs on Talend; leading the migration of these jobs from MuleSoft to Talend; maintaining best practices for the team; and conducting code reviews and demos.
Talend Data Fabric: application integration, API integration, and data integration. Knowledge of Talend Management Cloud and of deploying and scheduling jobs using TMC or AutoSys.
Programming Languages - Python/Java
Databases: SQL Server, Other Databases, Hadoop
Should have worked in an Agile environment
Sound communication skills
Should be open to learning new technologies on the job, based on business needs
Awareness of other data/integration platforms like MuleSoft and Camel
Awareness of Hadoop, Snowflake, and S3
Responsibilities for Data Engineer
- Create and maintain optimal data pipeline architecture.
- Assemble large, complex data sets that meet functional / non-functional business requirements.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS ‘big data’ technologies.
- Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency and other key business performance metrics.
- Work with stakeholders including the Executive, Product, Data and Design teams to assist with data-related technical issues and support their data infrastructure needs.
- Keep our data separated and secure across national boundaries through multiple data centers and AWS regions.
- Create data tools for analytics and data scientist team members that assist them in building and optimizing our product into an innovative industry leader.
- Work with data and analytics experts to strive for greater functionality in our data systems.
Qualifications for Data Engineer
- Advanced working SQL knowledge and experience with relational databases, including query authoring and working familiarity with a variety of databases.
- Experience building and optimizing ‘big data’ data pipelines, architectures and data sets.
- Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
- Strong analytic skills related to working with unstructured datasets.
- Experience building processes supporting data transformation, data structures, metadata, dependency management, and workload management.
- A successful history of manipulating, processing and extracting value from large disconnected datasets.
- Working knowledge of message queuing, stream processing, and highly scalable ‘big data’ data stores.
- Strong project management and organizational skills.
- Experience supporting and working with cross-functional teams in a dynamic environment.
- We are looking for a candidate with 5+ years of experience in a Data Engineer role, who has attained a Graduate degree in Computer Science, Statistics, Informatics, Information Systems or another quantitative field. They should also have experience using the following software/tools:
- Experience with big data tools: Hadoop, Spark, Kafka, etc.
- Experience with relational SQL and NoSQL databases, including Postgres and Cassandra.
- Experience with data pipeline and workflow management tools: Azkaban, Luigi, Airflow, etc.
- Experience with AWS cloud services: EC2, EMR, RDS, Redshift
- Experience with stream-processing systems: Storm, Spark-Streaming, etc.
- Experience with object-oriented/object function scripting languages: Python, Java, C++, Scala, etc.
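The extract-transform-load cycle at the heart of the responsibilities above can be sketched end to end with standard-library pieces; the CSV contents, table name, and schema are hypothetical stand-ins for a real source system and warehouse:

```python
import csv
import io
import sqlite3

# Extract: parse rows from a (made-up) CSV source.
raw_csv = io.StringIO("user_id,amount\n1,10.5\n2,\n3,7.25\n")
rows = list(csv.DictReader(raw_csv))

# Transform: drop rows with missing amounts and cast fields to proper types.
clean = [
    (int(r["user_id"]), float(r["amount"]))
    for r in rows
    if r["amount"]
]

# Load: write the cleaned rows into a warehouse-style table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (user_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO purchases VALUES (?, ?)", clean)

total = conn.execute("SELECT SUM(amount) FROM purchases").fetchone()[0]
print(total)  # 17.75
```

In production the same three stages would be orchestrated by a workflow tool such as Airflow, with S3 or Kafka on the extract side and Redshift or Hive on the load side, but the structure of each task is the same.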
Episource has devoted more than a decade to building solutions for risk adjustment to measure healthcare outcomes. As one of the leading companies in healthcare, we have helped numerous clients optimize their medical records, data, and analytics to enable better documentation of care for patients with chronic diseases.
The backbone of our consistent success has been our obsession with data and technology. At Episource, all of our strategic initiatives start with the question - how can data be “deployed”? Our analytics platforms and data lakes ingest huge quantities of data daily, to help our clients deliver services. We have also built our own machine learning and NLP platform to infuse added productivity and efficiency into our workflow. Combined, these build a foundation of tools and practices used by quantitative staff across the company.
What’s our poison you ask? We work with most of the popular frameworks and technologies like Spark, Airflow, Ansible, Terraform, Docker, and ELK. For machine learning and NLP, we are big fans of Keras, spaCy, scikit-learn, pandas, and NumPy. AWS and serverless platforms help us stitch these together to stay ahead of the curve.
ABOUT THE ROLE:
We’re looking to hire someone to help scale Machine Learning and NLP efforts at Episource. You’ll work with the team that develops the models powering Episource’s product focused on NLP driven medical coding. Some of the problems include improving our ICD code recommendations, clinical named entity recognition, improving patient health, clinical suspecting and information extraction from clinical notes.
This is a role for highly technical data engineers who combine outstanding oral and written communication skills with the ability to code up prototypes and productionize them using a large range of tools, algorithms, and languages. Most importantly, they need the ability to autonomously plan and organize their work assignments based on high-level team goals.
You will be responsible for setting an agenda to develop and ship data-driven architectures that positively impact the business, working with partners across the company including operations and engineering. You will use research results to shape strategy for the company and help build a foundation of tools and practices used by quantitative staff across the company.
During the course of a typical day with our team, expect to work on one or more projects around the following:
1. Create and maintain optimal data pipeline architectures for ML
2. Develop a strong API ecosystem for ML pipelines
3. Building CI/CD pipelines for ML deployments using GitHub Actions, Travis, Terraform, and Ansible
4. Designing and developing distributed, high-volume, high-velocity, multi-threaded event-processing systems
5. Applying software engineering best practices across the development lifecycle: coding standards, code reviews, source management, build processes, testing, and operations
6. Deploying data pipelines in production using Infrastructure-as-Code platforms
7. Designing scalable implementations of the models developed by our Data Science teams
8. Big data and distributed ML with PySpark on AWS EMR, and more!
Bachelor’s degree or greater in Computer Science, IT or related fields
Minimum of 5 years of experience in cloud, DevOps, MLOps & data projects
Strong experience with bash scripting, Unix environments, and building scalable/distributed systems
Experience with automation/configuration management using Ansible, Terraform, or equivalent
Very strong experience with AWS and Python
Experience building CI/CD systems
Experience with containerization technologies like Docker, Kubernetes, ECS, EKS or equivalent
Ability to build and manage application and performance monitoring processes
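The performance-monitoring skill listed above often comes down to tracking latency percentiles against a service-level objective. A minimal sketch with the standard library, where the latency samples and the SLO threshold are invented for illustration:

```python
import statistics

# Made-up request latencies in milliseconds, including two slow outliers
latencies_ms = [12, 15, 11, 14, 250, 13, 16, 12, 18, 14,
                15, 13, 12, 17, 16, 14, 13, 15, 12, 300]

# statistics.quantiles with n=100 yields the 1st..99th percentile cut points
percentiles = statistics.quantiles(latencies_ms, n=100)
p50, p95 = percentiles[49], percentiles[94]

SLO_P95_MS = 200  # hypothetical service-level objective
print(f"p50={p50:.1f}ms p95={p95:.1f}ms breach={p95 > SLO_P95_MS}")
```

Percentiles are preferred over averages here because a mean hides tail latency: the two outliers barely move the p50 but dominate the p95, which is exactly what an alerting rule should catch.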
- Partnering with internal business owners (product, marketing, edit, etc.) to understand needs and develop custom analysis to optimize for user engagement and retention
- Good understanding of the underlying business and workings of cross functional teams for successful execution
- Design and develop analyses based on business requirements and challenges.
- Leveraging statistical analysis on consumer research and data mining projects, including segmentation, clustering, factor analysis, multivariate regression, predictive modeling, etc.
- Providing statistical analysis on custom research projects and consulting on A/B testing and other statistical analyses as needed, plus other reports and custom analyses as required.
- Identify and use appropriate investigative and analytical technologies to interpret and verify results.
- Apply and learn a wide variety of tools and languages to achieve results
- Use best practices to develop statistical and/or machine learning techniques to build models that address business needs.
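The A/B-testing consulting mentioned above typically involves a significance test on conversion rates. One common choice is the two-proportion z-test, sketched below; the conversion counts and sample sizes are invented for illustration:

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Return the z statistic for H0: the two conversion rates are equal."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis of no difference
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical experiment: variant B converts 5.2% vs. control's 4.0%
z = two_proportion_z(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(round(z, 2))  # positive z favors B; |z| > 1.96 is significant at the 5% level
```

In practice a statistical package (e.g., `statsmodels`) would supply this test along with p-values and confidence intervals, but the arithmetic underneath is no more than this.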
- 2-4 years of relevant experience in data science.
- Preferred education: Bachelor's degree in a technical field or equivalent experience.
- Experience in advanced analytics, model building, statistical modeling, optimization, and machine learning algorithms.
- Machine learning algorithms: crystal-clear understanding, coding, implementation, error analysis, and model-tuning knowledge of linear regression, logistic regression, SVMs, shallow neural networks, clustering, decision trees, random forest, XGBoost, recommender systems, ARIMA, and anomaly detection; feature selection, hyperparameter tuning, model selection and error analysis, boosting, and ensemble methods.
- Strong in programming languages like Python and in data processing using SQL or equivalent, with the ability to experiment with newer open-source tools.
- Experience in normalizing data to ensure it is homogeneous and consistently formatted to enable sorting, query and analysis.
- Experience designing, developing, implementing and maintaining a database and programs to manage data analysis efforts.
- Experience with big data and cloud computing, e.g., Spark and Hadoop (MapReduce, Pig, Hive).
- Experience in risk and credit score domains preferred.