We are looking for an exceptional Data Scientist who is passionate about data and motivated to build large-scale machine learning solutions. This person will contribute to the analysis of data for insight discovery and to the development of machine learning pipelines that support modelling of terabytes of daily data across various use cases.
Typical persona: Data Science Manager / Architect
Experience: 8+ years of programming/engineering experience (with at least the last 4 years in big data and data science)
- Hands-on Python: Pandas, Scikit-Learn
- Working knowledge of Kafka
- Able to carry out own tasks and help the team resolve problems, logical or technical (25% of the job)
- Good analytical & debugging skills
- Strong communication skills
Desired (in order of priorities):
- Go (Strong advantage)
- Airflow (Strong advantage)
- Familiarity & working experience with more than one type of database: relational, object, columnar, graph, and other unstructured databases
- Data structures, Algorithms
- Experience with multi-threaded and thread sync concepts
- AWS Sagemaker
- Strong experience in Python programming (minimum 4 years)
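Since the role centres on the Pandas and Scikit-Learn stack named above, a minimal sketch of that day-to-day workflow might look like the following; the frame, feature names, and labels are invented purely for illustration:

```python
# Illustrative sketch of the Pandas + Scikit-Learn workflow (invented data).
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# A toy frame standing in for much larger feature data.
df = pd.DataFrame({
    "feature_a": [0.1, 0.4, 0.35, 0.8, 0.9, 0.05, 0.7, 0.2],
    "feature_b": [1.0, 0.2, 0.6, 0.1, 0.05, 0.9, 0.3, 0.8],
    "label":     [0, 1, 1, 1, 1, 0, 1, 0],
})

# Hold out a quarter of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    df[["feature_a", "feature_b"]], df["label"], test_size=0.25, random_state=0
)

# Fit a simple classifier and score it on the held-out rows.
model = LogisticRegression().fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(accuracy)
```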
Fix issues with plugins for our Python-based ETL pipelines
Help with automation of standard workflow
Deliver Python microservices for provisioning and managing cloud infrastructure
Responsible for any refactoring of code
Effectively manage challenges associated with handling large volumes of data working to tight deadlines
Manage expectations with internal stakeholders and context-switch in a fast-paced environment
Thrive in an environment that uses AWS and Elasticsearch extensively
Keep abreast of technology and contribute to the engineering strategy
Champion best development practices and provide mentorship to others
First and foremost, you are a Python developer, experienced with the Python data stack
You love and care about data
Your code is an artistic manifesto, reflecting the elegance of your craft
You feel sparks of joy when a new abstraction or pattern arises from your code
You follow the DRY (Don’t Repeat Yourself) and KISS (Keep It Short and Simple) principles
You are a continuous learner
You have a natural willingness to automate tasks
You have critical thinking and an eye for detail
Excellent ability and experience working to tight deadlines
Sharp analytical and problem-solving skills
Strong sense of ownership and accountability for your work and delivery
Excellent written and oral communication skills
Mature collaboration and mentoring abilities
We are keen to know your digital footprint: community talks, blog posts, certifications, courses you have taken or plan to take, personal projects, and any contributions to open-source communities
Delivering complex software, ideally in a FinTech setting
Experience with CI/CD tools such as Jenkins, CircleCI
Experience with code versioning (git / mercurial / subversion)
Lead Machine Learning (ML)/ NLP Engineer
5+ years of experience
Contify is an AI-enabled Market and Competitive Intelligence (MCI)
software to help professionals make informed decisions. Its B2B SaaS
platform helps leading organizations such as Ericsson, EY, Wipro,
Deloitte, L&T, BCG, MetLife, etc. track information on their competitors,
customers, industries, and topics of interest by continuously monitoring
more than 500,000 sources on a real-time basis. Contify is rapidly growing
with 185+ people across two offices in India. Contify is the winner of
Frost & Sullivan’s Product Innovation Award for Market and
Competitive Intelligence Platforms.
We are looking for a hardworking, aspirational, and innovative
engineer for the Lead ML/ NLP Engineer position. You’ll build
Contify’s ML and NLP capabilities and help us extract value from
unstructured data. Using advanced NLP, ML, and text analytics, you will
develop applications that will extract business insights by analyzing
large amounts of unstructured text, identifying patterns, and
connecting events.
You will be responsible for the full process, from data collection and
pre-processing to training models and deploying them to production.
➔ Understand the business objectives; design and deploy scalable
ML models/ NLP applications to meet those objectives
➔ Use NLP techniques for text representation, semantic analysis, and
information extraction to meet the business objectives in an
efficient manner, along with metrics to measure progress
➔ Extend existing ML libraries and frameworks and use effective text
representations to transform natural language into useful features
➔ Defining and supervising the data collection process, verifying data
quality, and employing data augmentation techniques
➔ Defining the preprocessing or feature engineering to be done on a
given dataset
➔ Analyze the errors of the model and design strategies to overcome them
➔ Research and implement the right algorithms and tools for ML/ NLP problems
➔ Collaborate with engineering and product development teams
➔ Represent Contify at external ML industry events and publish
thought leadership articles
Desired Skills and Experience
To succeed in this role, you should possess outstanding skills in
statistical analysis, machine learning methods, and text representation.
➔ Deep understanding of statistics and of text representation
techniques (such as n-grams, bag of words, sentiment analysis, etc.)
➔ Hands-on experience in feature extraction techniques for text
classification and topic mining
➔ Knowledge of text analytics with a strong understanding of NLP
algorithms and models (GLMs, SVM, PCA, NB, Clustering, DTs)
and their underlying computational and probabilistic statistics
◆ Word embeddings like TF-IDF, Word2Vec, GloVe, FastText, etc.
◆ Language models like BERT, GPT, RoBERTa, XLNet
◆ Neural networks like RNN, GRU, LSTM, Bi-LSTM
◆ Classification algorithms like LinearSVC, SVM, LR
◆ XGB, MultinomialNB, etc.
◆ Other algorithms: PCA, clustering methods, etc.
➔ Excellent knowledge and demonstrable experience in using NLP
packages such as NLTK, Word2Vec, SpaCy, Gensim, Stanford
CoreNLP, and TensorFlow/ PyTorch.
➔ Experience in setting up supervised & unsupervised learning
models including data cleaning, data analytics, feature creation,
model selection & ensemble methods, and performance metrics
➔ Evaluation metrics: Root Mean Squared Error, Confusion Matrix,
F-Score, AUC-ROC, etc.
➔ Understanding of knowledge graphs will be a plus
➔ Education: Bachelors or Masters in Computer Science,
Mathematics, Computational Linguistics or similar field
➔ At least 4 years' experience building Machine Learning & NLP
solutions on open-source platforms such as Scikit-Learn,
TensorFlow, SparkML, etc.
➔ At least 2 years' experience in designing and developing
enterprise-scale NLP solutions in one or more of: Named Entity
Recognition, Document Classification, Feature Extraction, Triplet
Extraction, Clustering, Summarization, Topic Modelling, Dialog
Systems, Sentiment Analysis
➔ Self-starter who can see the big picture, and prioritize your work to
make the largest impact on the business’ and customer’s vision
➔ Being a committer or a contributor to an open-source project is a plus
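As an illustration of several of the techniques listed above (TF-IDF text representation, a LinearSVC classifier, and F-score evaluation), a minimal text-classification sketch might look like this; the tiny corpus and its labels are invented for demonstration only:

```python
# Minimal sketch: TF-IDF features + LinearSVC + macro F-score.
# The corpus below is invented purely for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.svm import LinearSVC

train_texts = [
    "great product, love it",
    "excellent service and support",
    "love the excellent quality",
    "terrible experience, very bad",
    "awful quality, bad support",
    "bad and terrible service",
]
train_labels = [1, 1, 1, 0, 0, 0]  # 1 = positive, 0 = negative

# Represent documents as TF-IDF-weighted unigrams and bigrams.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X_train = vectorizer.fit_transform(train_texts)

# Fit a linear SVM on the sparse TF-IDF matrix.
clf = LinearSVC().fit(X_train, train_labels)

# Evaluate on two unseen documents with a macro-averaged F-score.
test_texts = ["excellent product and great quality", "awful and terrible experience"]
test_labels = [1, 0]
preds = clf.predict(vectorizer.transform(test_texts))
print(f1_score(test_labels, preds, average="macro"))
```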
Contify is a people-oriented company. Emotional intelligence, therefore,
is a must. You should enjoy working in a team environment, supporting
your teammates in pursuit of our common goals, and working with your
colleagues to drive customer value. You strive to not only improve
yourself, but also those around you.
Proficiency in at least one of Java, Scala, or Python
Experience in Big Data technologies (Hadoop/Spark/Hive/Presto, etc.)
Experience in distributed search (Solr/Elasticsearch) and in-memory data grids
(Redis/Ignite); cloud-native apps and Kubernetes are a plus
Experience in building REST services and APIs following best practices of service
abstraction and microservices. Experience in orchestration frameworks is a plus
Experience in Agile methodology and CI/CD - tool integration and automation
Added advantage: being a committer on one of the open-source Big Data
technologies - Spark, Hive, Kafka, YARN, Hadoop/HDFS
o Strong Python development skills, with 7+ years of experience with SQL.
o A bachelor's or master's degree in Computer Science or related areas
o 5+ years of experience in data integration and pipeline development
o Experience implementing Databricks Delta Lake and data lakes
o Expertise designing and implementing data pipelines using modern data engineering approaches and tools: SQL, Python, Delta Lake, Databricks, Snowflake, Spark
o Experience working with multiple file formats (Parquet, Avro, Delta Lake) and APIs
o Experience with AWS Cloud for data integration with S3.
o Hands-on development experience with Python and/or Scala.
o Experience with SQL and NoSQL databases.
o Experience in using data modeling techniques and tools (focused on Dimensional design)
o Experience with micro-service architecture using Docker and Kubernetes
o Experience working with one or more of the public cloud providers: AWS, Azure, or GCP
o Experience in effectively presenting and summarizing complex data to diverse audiences through visualizations and other means
o Excellent verbal and written communications skills and strong leadership capabilities
Azure Data Lake, Data Factory, Databricks, Delta Lake
Ideal candidates should have technical experience in migrations and the ability to help customers get value from Datametica's tools and accelerators.
Experience : 7+ years
Location : Pune / Hyderabad
- Drive and participate in requirements gathering workshops, estimation discussions, design meetings and status review meetings
- Participate and contribute in Solution Design and Solution Architecture for implementing Big Data Projects on-premise and on cloud
- Technical hands-on experience in the design, coding, development, and management of large Hadoop implementations
- Proficient in SQL, Hive, Pig, Spark SQL, Shell Scripting, Kafka, Flume, and Sqoop on large Big Data and Data Warehousing projects, with a Java-, Python-, or Scala-based Hadoop programming background
- Proficient with various development methodologies, like waterfall, agile/scrum, and iterative
- Good interpersonal skills and excellent communication skills for US- and UK-based clients
A global leader in Data Warehouse Migration and Modernization to the Cloud, we empower businesses by migrating their Data/Workload/ETL/Analytics to the Cloud, leveraging Automation.
We have expertise in transforming legacy Teradata, Oracle, Hadoop, Netezza, Vertica, Greenplum along with ETLs like Informatica, Datastage, AbInitio & others, to cloud-based data warehousing with other capabilities in data engineering, advanced analytics solutions, data management, data lake and cloud optimization.
Datametica is a key partner of the major cloud service providers - Google, Microsoft, Amazon, Snowflake.
We have our own products!
Eagle – Data warehouse Assessment & Migration Planning Product
Raven – Automated Workload Conversion Product
Pelican - Automated Data Validation Product, which helps automate and accelerate data migration to the cloud.
Why join us!
Datametica is a place to innovate, bring new ideas to life, and learn new things. We believe in building a culture of innovation, growth, and belonging. Our people and their dedication over the years are the key factors in achieving our success.
Benefits we Provide!
Working with highly technical, passionate, mission-driven people
Subsidized Meals & Snacks
Access to various learning tools and programs
Certification Reimbursement Policy
Check out more about us on our website below!
We are looking for an outstanding Big Data Engineer with experience setting up and maintaining Data Warehouses and Data Lakes for an organization. This role will collaborate closely with the Data Science team and help the team build and deploy machine learning and deep learning models on big data analytics platforms.
Roles and Responsibilities:
- Develop and maintain scalable data pipelines and build out new integrations and processes required for optimal extraction, transformation, and loading of data from a wide variety of data sources using 'Big Data' technologies.
- Develop programs in Scala and Python as part of data cleaning and processing.
- Assemble large, complex data sets that meet functional/non-functional business requirements and foster data-driven decision making across the organization.
- Design and develop distributed, high-volume, high-velocity multi-threaded event processing systems.
- Implement processes and systems to validate data, monitor data quality, ensuring production data is always accurate and available for key stakeholders and business processes that depend on it.
- Perform root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
- Provide high operational excellence guaranteeing high availability and platform stability.
- Collaborate closely with the Data Science team and help the team build and deploy machine learning and deep learning models on big data analytics platforms.
Desired Skills:
- Experience with Big Data pipelines, Big Data analytics, and Data Warehousing.
- Experience with SQL/No-SQL, schema design and dimensional data modeling.
- Strong understanding of Hadoop architecture and the HDFS ecosystem, and experience with a Big Data technology stack such as HBase, Hadoop, Hive, MapReduce.
- Experience in designing systems that process structured as well as unstructured data at large scale.
- Experience in AWS/Spark/Java/Scala/Python development.
- Strong skills in PySpark (Python & Spark). Ability to create, manage, and manipulate Spark DataFrames. Expertise in Spark query tuning and performance optimization.
- Experience in developing efficient software code/frameworks for multiple use cases leveraging Python and big data technologies.
- Prior exposure to streaming data sources such as Kafka.
- Knowledge of shell scripting and Python scripting.
- High proficiency in database skills (e.g., Complex SQL), for data preparation, cleaning, and data wrangling/munging, with the ability to write advanced queries and create stored procedures.
- Experience with NoSQL databases such as Cassandra / MongoDB.
- Solid experience in all phases of Software Development Lifecycle - plan, design, develop, test, release, maintain and support, decommission.
- Experience with DevOps tools (GitHub, Travis CI, and JIRA) and methodologies (Lean, Agile, Scrum, Test Driven Development).
- Experience building and deploying applications on both on-premise and cloud-based infrastructure.
- Having a good understanding of machine learning landscape and concepts.
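As one illustration of the "complex SQL" data-wrangling skills listed above, a window-function query that keeps only the latest record per key is a common pattern. This is a hedged sketch only: `sqlite3` stands in for a warehouse, and the table and column names are invented.

```python
# Sketch of a common complex-SQL pattern: latest record per key via a
# window function. sqlite3 stands in for a warehouse; names are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (user_id TEXT, event_ts INTEGER, status TEXT);
INSERT INTO events VALUES
  ('u1', 100, 'pending'),
  ('u1', 200, 'active'),
  ('u2', 150, 'active'),
  ('u2', 120, 'pending');
""")

# ROW_NUMBER() over a per-user window ranks rows by recency;
# the outer filter keeps only the newest row for each user.
rows = conn.execute("""
SELECT user_id, status FROM (
  SELECT user_id, status,
         ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_ts DESC) AS rn
  FROM events
) WHERE rn = 1
ORDER BY user_id
""").fetchall()
print(rows)
```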
Qualifications and Experience:
Engineering graduates and postgraduates, preferably in Computer Science from premier institutions, with 3-5 years of proven work experience as a Big Data Engineer or in a similar role.
Good to have at least one of the Certifications listed here:
AZ 900 - Azure Fundamentals
DP 200, DP 201, DP 203, AZ 204 - Data Engineering
AZ 400 - DevOps Certification