https://www.artivatic.ai/" target="_blank">Artivatic is a technology startup that uses AI/ML/Deep learning to build intelligent products & solutions for finance, healthcare & insurance businesses. It is based out of Bangalore with 20+ team focus on technology. Artivatic building is cutting edge solutions to enable 750 Million plus people to get insurance, financial access, and health benefits with alternative data sources to increase their productivity, efficiency, automation power, and profitability, hence improving their way of doing business more intelligently & seamlessly. Artivatic offers lending underwriting, credit/insurance underwriting, fraud, prediction, personalization, recommendation, risk profiling, consumer profiling intelligence, KYC Automation & Compliance, automated decisions, monitoring, claims processing, sentiment/psychology behavior, auto insurance claims, travel insurance, disease prediction for insurance and more
companies uncover the 3% of active buyers in their target market. It evaluates
over 100 billion data points and analyzes factors such as buyer journeys, technology
adoption patterns, and other digital footprints to deliver market & sales intelligence.
Its customers have access to the buying patterns and contact information of
more than 17 million companies and 70 million decision makers across the world.
Role – Data Engineer
Work in collaboration with the application team and integration team to
design, create, and maintain optimal data pipeline architecture and data
structures for Data Lake/Data Warehouse.
Work with stakeholders including the Sales, Product, and Customer Support
teams to assist with data-related technical issues and support their data
Assemble large, complex data sets from third-party vendors to meet business
Identify, design, and implement internal process improvements: automating
manual processes, optimizing data delivery, re-designing infrastructure for
greater scalability, etc.
Build the infrastructure required for optimal extraction, transformation, and
loading of data from a wide variety of data sources using SQL, Elasticsearch,
MongoDB, and AWS technology.
Streamline existing and introduce enhanced reporting and analysis solutions
that leverage complex data sources derived from multiple internal systems.
5+ years of experience in a Data Engineer role.
Proficiency in Linux.
Must have SQL knowledge and experience working with relational databases,
query authoring (SQL) as well as familiarity with databases including Mysql,
Mongo, Cassandra, and Athena.
Must have experience with Python/Scala.
Must have experience with Big Data technologies like Apache Spark.
Must have experience with Apache Airflow.
Experience with data pipeline and ETL tools like AWS Glue.
Experience working with AWS cloud services: EC2, S3, RDS, Redshift.
Please note - This is a 100% remote opportunity and you can work from any location.
About the team:
You will be a part of Cactus Labs which is the R&D Cell of Cactus Communications. Cactus Labs is a high impact cell that works to solve complex technical and business problems that help keep us strategically competitive in the industry. We are a multi-cultural team spread across multiple countries. We work in the domain of AI/ML especially with Text (NLP - Natural Language Processing), Language Understanding, Explainable AI, Big Data, AR/VR etc.
The opportunity: Within Cactus Labs you will work with the Big Data team. This team manages Terabytes of data coming from different sources. We are re-orchestrating data pipelines to handle this data at scale and improve visibility and robustness. We operate across all the three Cloud Platforms and leverage the best of them.
In this role, you will get to own a component end to end. You will also get to work on cloud platform and learn to design distributed data processing systems to operate at scale.
- Collaborate with a team of Big Data Engineers, Big Data and Cloud Architects and Domain SMEs to drive the product ahead
- Stay up to date with the progress of in the domain since We work on cutting-edge technologies and are constantly trying new things out
- Build solutions for massive scale. This requires extensive benchmarking to pick the right approach
- Understand the data in and out, and make sense of it. You will at times need to draw conclusions and present it to the business users
- Be independent, self-driven and highly motivated. While you will have the best people to learn from and access to various courses or training materials, we expect you to take charge of your growth and learning.
Expectations from you:
- 1-3 Year of relevant experience in Big Data with pySpark
- Hands on experience of distributed computing and Big Data Ecosystem - Hadoop, HDFS, Spark etc
- Good understanding of data lake and their importance in a Big Data Ecosystem
- Experience of working in the Cloud Environment (AWS, Azure or GCP)
- You like to work without a lot of supervision or micromanagement.
- Above all, you get excited by data. You like to dive deep, mine patterns and draw conclusions. You believe in making data driven decisions and helping the team look for the pattern as well.
- Familiarity with search engines like Elasticsearch and Bigdata warehouses systems like AWS Athena, Google Big Query etc
- Building data pipelines using Airflow
- Experience of working in AWS Cloud Environment
- Knowledge of NLP and ML
● Able contribute to the gathering of functional requirements, developing technical
specifications, and project & test planning
● Demonstrating technical expertise, and solving challenging programming and design
● Roughly 80% hands-on coding
● Generate technical documentation and PowerPoint presentations to communicate
architectural and design options, and educate development teams and business users
● Resolve defects/bugs during QA testing, pre-production, production, and post-release
● Work cross-functionally with various bidgely teams including: product management,
QA/QE, various product lines, and/or business units to drive forward results
● BS/MS in computer science or equivalent work experience
● 2-4 years’ experience designing and developing applications in Data Engineering
● Hands-on experience with Big data Eco Systems.
● Hadoop,Hdfs,Map Reduce,YARN,AWS Cloud, EMR, S3, Spark, Cassandra, Kafka,
● Expertise with any of the following Object-Oriented Languages (OOD): Java/J2EE,Scala,
● Strong leadership experience: Leading meetings, presenting if required
● Excellent communication skills: Demonstrated ability to explain complex technical
issues to both technical and non-technical audiences
● Expertise in the Software design/architecture process
● Expertise with unit testing & Test-Driven Development (TDD)
● Experience on Cloud or AWS is preferable
● Have a good understanding and ability to develop software, prototypes, or proofs of
concepts (POC's) for various Data Engineering requirements.
Must Have Skills:
- Solid Knowledge on DWH, ETL and Big Data Concepts
- Excellent SQL Skills (With knowledge of SQL Analytics Functions)
- Working Experience on any ETL tool i.e. SSIS / Informatica
- Working Experience on any Azure or AWS Big Data Tools.
- Experience on Implementing Data Jobs (Batch / Real time Streaming)
- Excellent written and verbal communication skills in English, Self-motivated with strong sense of ownership and Ready to learn new tools and technologies
- Experience on Py-Spark / Spark SQL
- AWS Data Tools (AWS Glue, AWS Athena)
- Azure Data Tools (Azure Databricks, Azure Data Factory)
- Knowledge about Azure Blob, Azure File Storage, AWS S3, Elastic Search / Redis Search
- Knowledge on domain/function (across pricing, promotions and assortment).
- Implementation Experience on Schema and Data Validator framework (Python / Java / SQL),
- Knowledge on DQS and MDM.
- Independently work on ETL / DWH / Big data Projects
- Gather and process raw data at scale.
- Design and develop data applications using selected tools and frameworks as required and requested.
- Read, extract, transform, stage and load data to selected tools and frameworks as required and requested.
- Perform tasks such as writing scripts, web scraping, calling APIs, write SQL queries, etc.
- Work closely with the engineering team to integrate your work into our production systems.
- Process unstructured data into a form suitable for analysis.
- Analyse processed data.
- Support business decisions with ad hoc analysis as needed.
- Monitoring data performance and modifying infrastructure as needed.
Responsibility: Smart Resource, having excellent communication skills
• Responsible for developing and maintaining applications with PySpark
Technical Knowledge (Must Have)
- Strong experience in SQL / HiveQL/ AWS Athena,
- Strong expertise in the development of data pipelines (snaplogic is preferred).
- Design, Development, Deployment and administration of data processing applications.
- Good Exposure towards AWS and Azure Cloud computing environments.
- Knowledge around BigData, AWS Cloud Architecture, Best practices, Securities, Governance, Metadata Management, Data Quality etc.
- Data extraction through various firm sources (RDBMS, Unstructured Data Sources) and load to datalake with all best practices.
- Knowledge in Python
- Good knowledge in NoSQL technologies (Neo4J/ MongoDB)
- Experience/knowledge in SnapLogic (ETL Technologies)
- Working knowledge on Unix (AIX, Linux), shell scripting
- Experience/knowledge in Data Modeling. Database Development
- Experience/knowledge creation of reports and dashboards in Tableau/ PowerBI
empower healthcare payers, providers and members to quickly process medical data to
make informed decisions and reduce health care costs. You will be focusing on research,
development, strategy, operations, people management, and being a thought leader for
team members based out of India. You should have professional healthcare experience
using both structured and unstructured data to build applications. These applications
include but are not limited to machine learning, artificial intelligence, optical character
recognition, natural language processing, and integrating processes into the overall AI
pipeline to mine healthcare and medical information with high recall and other relevant
metrics. The results will be used dually for real-time operational processes with both
automated and human-based decision making as well as contribute to reducing
healthcare administrative costs. We work with all major cloud and big data vendors
offerings including (Azure, AWS, Google, IBM, etc.) to achieve our goals in healthcare and
The Director, Data Science will have the opportunity to build a team, shape team culture
and operating norms as a result of the fast-paced nature of a new, high-growth
• Strong communication and presentation skills to convey progress to a diverse group of stakeholders
• Strong expertise in data science, data engineering, software engineering, cloud vendors, big data technologies, real-time streaming applications, DevOps and product delivery
• Experience building stakeholder trust and confidence in deployed models especially via application of the algorithmic bias, interpretable machine learning,
data integrity, data quality, reproducible research and reliable engineering 24x7x365 product availability, scalability
• Expertise in healthcare privacy, federated learning, continuous integration and deployment, DevOps support
• Provide mentoring to data scientists and machine learning engineers as well as career development
• Meet project related team members for individual specific needs on a regular basis related to project/product deliverables
• Provide training and guidance for team members when required
• Provide performance feedback when required by leadership
The Experience You’ll Need (Required):
• MS/M.Tech degree or PhD in Computer Science, Mathematics, Physics or related STEM fields
• Significant healthcare data experience including but not limited to usage of claims data
• Delivered multiple data science and machine learning projects over 8+ years with values exceeding $10 Million or more and has worked on platform members exceeding 10 million lives
• 9+ years of industry experience in data science, machine learning, and artificial intelligence
• Strong expertise in data science, data engineering, software engineering, cloud vendors, big data technologies, real time streaming applications, DevOps, and product delivery
• Knows how to solve and launch real artificial intelligence and data science related problems and products along with managing and coordinating the
business process change, IT / cloud operations, meeting production level code standards
• Ownerships of key workflows part of data science life cycle like data acquisition, data quality, and results
• Experience building stakeholder trust and confidence in deployed models especially via application of algorithmic bias, interpretable machine learning,
data integrity, data quality, reproducible research, and reliable engineering 24x7x365 product availability, scalability
• Expertise in healthcare privacy, federated learning, continuous integration and deployment, DevOps support
• 3+ Years of experience managing directly five (5) or more senior level data scientists, machine learning engineers with advanced degrees and directly
made staff decisions
• Very strong understanding of mathematical concepts including but not limited to linear algebra, advanced calculus, partial differential equations, and
statistics including Bayesian approaches at master’s degree level and above
• 6+ years of programming experience in C++ or Java or Scala and data science programming languages like Python and R including strong understanding of
concepts like data structures, algorithms, compression techniques, high performance computing, distributed computing, and various computer architecture
• Very strong understanding and experience with traditional data science approaches like sampling techniques, feature engineering, classification, and
regressions, SVM, trees, model evaluations with several projects over 3+ years
• Very strong understanding and experience in Natural Language Processing,
reasoning, and understanding, information retrieval, text mining, search, with
3+ years of hands on experience
• Experience with developing and deploying several products in production with
experience in two or more of the following languages (Python, C++, Java, Scala)
• Strong Unix/Linux background and experience with at least one of the
following cloud vendors like AWS, Azure, and Google
• Three plus (3+) years hands on experience with MapR \ Cloudera \ Databricks
Big Data platform with Spark, Hive, Kafka etc.
• Three plus (3+) years of experience with high-performance computing like
Dask, CUDA distributed GPU, TPU etc.
• Presented at major conferences and/or published materials
- Previous experience of working in large scale data engineering
- 4+ years of experience working in data engineering and/or backend technologies with cloud experience (any) is mandatory.
- Previous experience of architecting and designing backend for large scale data processing.
- Familiarity and experience of working in different technologies related to data engineering – different database technologies, Hadoop, spark, storm, hive etc.
- Hands-on and have the ability to contribute a key portion of data engineering backend.
- Self-inspired and motivated to drive for exceptional results.
- Familiarity and experience working with different stages of data engineering – data acquisition, data refining, large scale data processing, efficient data storage for business analysis.
- Familiarity and experience working with different DB technologies and how to scale them.
- End to end responsibility to come up with data engineering architecture, design, development and then implementation of it.
- Build data engineering workflow for large scale data processing.
- Discover opportunities in data acquisition.
- Bring industry best practices for data engineering workflow.
- Develop data set processes for data modelling, mining and production.
- Take additional tech responsibilities for driving an initiative to completion
- Recommend ways to improve data reliability, efficiency and quality
- Goes out of their way to reduce complexity.
- Humble and outgoing - engineering cheerleaders.