Data Scientist

at a software product company working on petabyte scale data

Agency job · Pune · 7 - 15 yrs · ₹30L - ₹50L / yr · Full time
Skills
Data Science
Data Scientist
Python
Java
Apache Kafka
pandas
NumPy
Scikit-Learn
Amazon Web Services (AWS)
Go Programming (Golang)
airflow

We are looking for an exceptional Data Scientist who is passionate about data and motivated to build large-scale machine learning solutions. This person will contribute to the analysis of data for insight discovery and to the development of machine learning pipelines that support modelling of terabytes of daily data for various use cases.

 

Typical persona: Data Science Manager / Architect

 

Experience: 8+ years of programming/engineering experience (with at least the last 4 years in big data and data science)

 

Must have:

  • Hands-on Python: Pandas, Scikit-Learn
  • Working knowledge of Kafka
  • Able to carry out own tasks and help the team resolve problems, whether logical or technical (25% of the job)
  • Strong analytical and debugging skills
  • Strong communication skills
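
As a rough illustration of the hands-on Pandas/Scikit-Learn work listed above, here is a minimal train-and-score sketch; the dataset, feature names, and model choice are all invented for illustration:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustrative toy dataset: two numeric features and a binary label.
df = pd.DataFrame({
    "feature_a": [0.1, 0.3, 0.5, 0.7, 0.9, 1.1, 1.3, 1.5],
    "feature_b": [1.0, 0.9, 0.8, 0.7, 0.3, 0.2, 0.1, 0.0],
    "label":     [0, 0, 0, 0, 1, 1, 1, 1],
})

# Hold out a stratified test split, then fit and score a simple model.
X_train, X_test, y_train, y_test = train_test_split(
    df[["feature_a", "feature_b"]], df["label"],
    test_size=0.25, random_state=42, stratify=df["label"],
)
model = LogisticRegression().fit(X_train, y_train)
score = model.score(X_test, y_test)  # mean accuracy on the held-out split
print(score)
```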

Desired (in order of priorities):

  • Go (Strong advantage)
  • Airflow (Strong advantage)
  • Familiarity and working experience with more than one type of database: relational, object, columnar, graph and other unstructured databases
  • Data structures, Algorithms
  • Experience with multi-threaded and thread sync concepts
  • AWS Sagemaker
  • Keras
  • Strong experience in Python programming (minimum 4 years)
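
Airflow is listed as a strong advantage; its core idea, running tasks in dependency order, can be sketched without the library using the standard-library `graphlib` (the task names below are hypothetical):

```python
from graphlib import TopologicalSorter

# A hypothetical four-stage pipeline expressed as task -> {dependencies},
# the same shape an Airflow DAG encodes with operators and >> edges.
dag = {
    "transform": {"extract"},
    "validate":  {"extract"},
    "train":     {"transform", "validate"},
    "publish":   {"train"},
}

# static_order() yields every task only after all of its dependencies.
order = list(TopologicalSorter(dag).static_order())
print(order)
```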

Similar jobs

Python
ETL
Jenkins
CI/CD
pandas
Amazon Web Services (AWS)
AWS Lambda
pipeline
Microservices
Data engineering
Apache Kafka
Bengaluru (Bangalore) · 1 - 8 yrs · ₹10L - ₹40L / yr
Roles & Responsibilities
Expectations of the role
This role reports to the Technical Lead (Support). You will be expected to resolve bugs in the platform identified by customers and internal teams. The role will progress towards SDE-2 in 12-15 months, at which point the developer will work on solving complex problems around scale and building out new features.
 
What will you do?
  • Fix issues with plugins for our Python-based ETL pipelines
  • Help with automation of standard workflow
  • Deliver Python microservices for provisioning and managing cloud infrastructure
  • Responsible for any refactoring of code
  • Effectively manage challenges associated with handling large volumes of data working to tight deadlines
  • Manage expectations with internal stakeholders and context-switch in a fast-paced environment
  • Thrive in an environment that uses AWS and Elasticsearch extensively
  • Keep abreast of technology and contribute to the engineering strategy
  • Champion best development practices and provide mentorship to others
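
The ETL-pipeline work described above boils down to composable extract/transform/load steps; a toy sketch in which the in-memory source and sink stand in for real systems such as S3 or Elasticsearch:

```python
# Each stage is a generator, so records stream through without
# materializing the whole dataset in memory.
def extract(source):
    yield from source

def transform(rows):
    for row in rows:
        if row.get("amount") is not None:          # drop incomplete records
            yield {**row, "amount": round(float(row["amount"]), 2)}

def load(rows, sink):
    sink.extend(rows)
    return len(sink)                               # records written

sink = []
raw = [{"id": 1, "amount": "10.456"}, {"id": 2, "amount": None}]
written = load(transform(extract(raw)), sink)
print(written, sink)  # 1 [{'id': 1, 'amount': 10.46}]
```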
What are we looking for?
  • First and foremost you are a Python developer, experienced with the Python Data stack
  • You love and care about data
  • Your code is an artistic manifesto reflecting how elegant you are in what you do
  • You feel sparks of joy when a new abstraction or pattern arises from your code
  • You follow the DRY (Don't Repeat Yourself) and KISS (Keep It Short and Simple) principles
  • You are a continuous learner
  • You have a natural willingness to automate tasks
  • You have critical thinking and an eye for detail
  • Excellent ability and experience of working to tight deadlines
  • Sharp analytical and problem-solving skills
  • Strong sense of ownership and accountability for your work and delivery
  • Excellent written and oral communication skills
  • Mature collaboration and mentoring abilities
  • We are keen to know your digital footprint (community talks, blog posts, certifications, courses you have participated in or you are keen to, your personal projects as well as any kind of contributions to the open-source communities if any)
Nice to have:
  • Delivering complex software, ideally in a FinTech setting
  • Experience with CI/CD tools such as Jenkins, CircleCI
  • Experience with code versioning (git / mercurial / subversion)
Job posted by
Naveen Taalanki
Machine Learning (ML)
Data Science
Natural Language Processing (NLP)
Text mining
pandas
SpaCy
Gurugram, Delhi, Noida, Ghaziabad, Faridabad · 6 - 12 yrs · ₹30L - ₹50L / yr

Job Description
Lead Machine Learning (ML) / NLP Engineer
5+ years of experience

About Contify
Contify is an AI-enabled Market and Competitive Intelligence (MCI) software platform that helps professionals make informed decisions. Its B2B SaaS platform helps leading organizations such as Ericsson, EY, Wipro, Deloitte, L&T, BCG, MetLife, etc. track information on their competitors, customers, industries, and topics of interest by continuously monitoring over 500,000 sources on a real-time basis. Contify is rapidly growing, with 185+ people across two offices in India. Contify is the winner of Frost and Sullivan's Product Innovation Award for Market and Competitive Intelligence Platforms.

The role
We are looking for a hardworking, aspirational, and innovative engineer for the Lead ML/NLP Engineer position. You'll build Contify's ML and NLP capabilities and help us extract value from unstructured data. Using advanced NLP, ML, and text analytics, you will develop applications that extract business insights by analyzing large amounts of unstructured text, identifying patterns, and connecting events.
Responsibilities:
You will be responsible for all processes, from data collection and pre-processing to training models and deploying them to production.
➔ Understand the business objectives; design and deploy scalable ML models / NLP applications to meet those objectives
➔ Use NLP techniques for text representation, semantic analysis and information extraction to meet the business objectives efficiently, along with metrics to measure progress
➔ Extend existing ML libraries and frameworks and use effective text representations to transform natural language into useful features
➔ Define and supervise the data collection process, verifying data quality and employing data augmentation techniques
➔ Define the preprocessing or feature engineering to be done on a given dataset
➔ Analyze the errors of the model and design strategies to overcome them
➔ Research and implement the right algorithms and tools for ML/NLP tasks
➔ Collaborate with engineering and product development teams
➔ Represent Contify at external ML industry events and publish thought-leadership articles


Desired Skills and Experience
To succeed in this role, you should possess outstanding skills in statistical analysis, machine learning methods, and text representation techniques.
➔ Deep understanding of text representation techniques (such as n-grams, bag of words, sentiment analysis, etc.), statistics and classification algorithms
➔ Hands-on experience in feature extraction techniques for text classification and topic mining
➔ Knowledge of text analytics with a strong understanding of NLP algorithms and models (GLMs, SVM, PCA, NB, clustering, DTs) and their underlying computational and probabilistic statistics
   ◆ Word embeddings like TF-IDF, Word2Vec, GloVe, FastText, etc.
   ◆ Language models like BERT, GPT, RoBERTa, XLNet
   ◆ Neural networks like RNN, GRU, LSTM, Bi-LSTM
   ◆ Classification algorithms like LinearSVC, SVM, LR, XGB, MultinomialNB, etc.
   ◆ Other algorithms: PCA, clustering methods, etc.
➔ Excellent knowledge and demonstrable experience using NLP packages such as NLTK, Word2Vec, SpaCy, Gensim, Stanford CoreNLP, TensorFlow/PyTorch
➔ Experience in setting up supervised and unsupervised learning models, including data cleaning, data analytics, feature creation, model selection & ensemble methods, performance metrics and visualization
➔ Evaluation metrics: Root Mean Squared Error, confusion matrix, F-score, AUC-ROC, etc.
➔ Understanding of knowledge graphs is a plus
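
A minimal sketch of the TF-IDF-plus-LinearSVC combination named above, using Scikit-Learn; the corpus and labels are invented, and real training data would be labelled news text:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy labelled corpus for a two-class document classifier.
texts = [
    "acme acquires beta corp",
    "beta corp confirms the acquisition deal",
    "new phone launched today",
    "phone sales rise sharply this quarter",
]
labels = ["m&a", "m&a", "product", "product"]

# TF-IDF turns text into sparse feature vectors (unigrams + bigrams here).
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(texts)

clf = LinearSVC().fit(X, labels)
pred = clf.predict(vectorizer.transform(["beta corp acquires acme"]))
print(pred[0])
```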


Qualifications
➔ Education: Bachelor's or Master's in Computer Science, Mathematics, Computational Linguistics or a similar field
➔ At least 4 years' experience building Machine Learning & NLP solutions on open-source platforms such as Scikit-Learn, TensorFlow, SparkML, etc.
➔ At least 2 years' experience designing and developing enterprise-scale NLP solutions in one or more of: Named Entity Recognition, Document Classification, Feature Extraction, Triplet Extraction, Clustering, Summarization, Topic Modelling, Dialog Systems, Sentiment Analysis
➔ Self-starter who can see the big picture and prioritize work to make the largest impact on the business's and customers' vision and requirements
➔ Being a committer or a contributor to an open-source project is a plus

Note
Contify is a people-oriented company. Emotional intelligence, therefore,
is a must. You should enjoy working in a team environment, supporting
your teammates in pursuit of our common goals, and working with your
colleagues to drive customer value. You strive to not only improve
yourself, but also those around you.

Job posted by
Gaurang Mahajan

Data Engineer ( Only Immediate)

at StatusNeo

Founded 2020  •  Products & Services  •  100-1000 employees  •  Profitable
Data engineering
Data Engineer
Python
Big Data
Spark
Scala
Remote only · 2 - 15 yrs · ₹2L - ₹70L / yr
  • Proficiency in engineering practices and writing high-quality code, with expertise in at least one of Java, Scala or Python
  • Experience in Big Data technologies (Hadoop/Spark/Hive/Presto/HBase) and streaming platforms (Kafka/NiFi/Storm)
  • Experience in distributed search (Solr/Elasticsearch), in-memory data grids (Redis/Ignite), cloud-native apps and Kubernetes is a plus
  • Experience in building REST services and APIs following best practices of service abstraction and microservices; experience in orchestration frameworks is a plus
  • Experience in Agile methodology and CI/CD: tool integration, automation, configuration management
  • Added advantage: being a committer in one of the open-source Big Data technologies - Spark, Hive, Kafka, YARN, Hadoop/HDFS
Job posted by
Alex P

Machine Learning Engineer

at CES IT

Founded 1996  •  Services  •  1000-5000 employees  •  Profitable
Machine Learning (ML)
Deep Learning
Python
Data modeling
Hyderabad · 7 - 12 yrs · ₹5L - ₹15L / yr
  • A critical-thinking mind that likes to solve complex problems, loves programming, and cherishes working in a fast-paced environment
  • Strong Python development skills, with 7+ years of experience with SQL
  • A bachelor's or master's degree in Computer Science or related areas
  • 5+ years of experience in data integration and pipeline development
  • Experience implementing Databricks Delta Lake and data lakes
  • Expertise designing and implementing data pipelines using modern data engineering approaches and tools: SQL, Python, Delta Lake, Databricks, Snowflake, Spark
  • Experience working with multiple file formats (Parquet, Avro, Delta Lake) and APIs
  • Experience with AWS cloud data integration with S3
  • Hands-on development experience with Python and/or Scala
  • Experience with SQL and NoSQL databases
  • Experience using data modeling techniques and tools (focused on dimensional design)
  • Experience with microservice architecture using Docker and Kubernetes
  • Experience working with one or more of the public cloud providers, i.e. AWS, Azure or GCP
  • Experience effectively presenting and summarizing complex data to diverse audiences through visualizations and other means
  • Excellent verbal and written communication skills and strong leadership capabilities

Skills:
ML
Modelling
Python
SQL
Azure Data Lake, Data Factory, Databricks, Delta Lake
Job posted by
Yash Rathod

Big Data Spark Lead

at Datametica Solutions Private Limited

Founded 2013  •  Products & Services  •  100-1000 employees  •  Profitable
Apache Spark
Big Data
Spark
Scala
Hadoop
MapReduce
Java
Apache Hive
Pune, Hyderabad · 7 - 12 yrs · ₹7L - ₹20L / yr
We at Datametica Solutions Private Limited are looking for a Big Data Spark Lead with a passion for cloud and knowledge of different on-premise and cloud data implementations in the field of Big Data and Analytics, including but not limited to Teradata, Netezza, Exadata, Oracle, Cloudera, Hortonworks and the like.
Ideal candidates should have technical experience in migrations and the ability to help customers get value from Datametica's tools and accelerators.

Job Description
Experience : 7+ years
Location : Pune / Hyderabad
Skills :
  • Drive and participate in requirements-gathering workshops, estimation discussions, design meetings and status review meetings
  • Participate and contribute in solution design and solution architecture for implementing Big Data projects on-premise and on cloud
  • Technical hands-on experience in design, coding, development and managing large Hadoop implementations
  • Proficient in SQL, Hive, Pig, Spark SQL, shell scripting, Kafka, Flume and Sqoop on large Big Data and Data Warehousing projects, with a Java, Python or Scala based Hadoop programming background
  • Proficient with various development methodologies like waterfall, agile/scrum and iterative
  • Good interpersonal skills and excellent communication skills for US- and UK-based clients

About Us!
A global leader in Data Warehouse Migration and Modernization to the Cloud, we empower businesses by migrating their data, workloads, ETL and analytics to the Cloud, leveraging automation.

We have expertise in transforming legacy Teradata, Oracle, Hadoop, Netezza, Vertica, Greenplum along with ETLs like Informatica, Datastage, AbInitio & others, to cloud-based data warehousing with other capabilities in data engineering, advanced analytics solutions, data management, data lake and cloud optimization.

Datametica is a key partner of the major cloud service providers - Google, Microsoft, Amazon, Snowflake.


We have our own products!
Eagle – Data Warehouse Assessment & Migration Planning product
Raven – Automated Workload Conversion product
Pelican – Automated Data Validation product, which helps automate and accelerate data migration to the cloud.

Why join us!
Datametica is a place to innovate, bring new ideas to life and learn new things. We believe in building a culture of innovation, growth and belonging. Our people and their dedication over the years are the key factors in achieving our success.

Benefits we Provide!
Working with Highly Technical and Passionate, mission-driven people
Subsidized Meals & Snacks
Flexible Schedule
Approachable leadership
Access to various learning tools and programs
Pet Friendly
Certification Reimbursement Policy

Check out more about us on our website below!
www.datametica.com
Job posted by
Sumangali Desai

Big Data Engineer

at Netmeds.com

Founded 2015  •  Product  •  500-1000 employees  •  Raised funding
Big Data
Hadoop
Apache Hive
Scala
Spark
Datawarehousing
Machine Learning (ML)
Deep Learning
SQL
Data modeling
PySpark
Python
Amazon Web Services (AWS)
Java
Cassandra
DevOps
HDFS
Chennai · 2 - 5 yrs · ₹6L - ₹25L / yr

We are looking for an outstanding Big Data Engineer with experience setting up and maintaining Data Warehouses and Data Lakes for an organization. This role will closely collaborate with the Data Science team and help the team build and deploy machine learning and deep learning models on big data analytics platforms.

Roles and Responsibilities:

  • Develop and maintain scalable data pipelines and build out new integrations and processes required for optimal extraction, transformation, and loading of data from a wide variety of data sources using 'Big Data' technologies.
  • Develop programs in Scala and Python as part of data cleaning and processing.
  • Assemble large, complex data sets that meet functional / non-functional business requirements and foster data-driven decision making across the organization.
  • Design and develop distributed, high-volume, high-velocity, multi-threaded event processing systems.
  • Implement processes and systems to validate data and monitor data quality, ensuring production data is always accurate and available for key stakeholders and the business processes that depend on it.
  • Perform root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
  • Provide high operational excellence, guaranteeing high availability and platform stability.
  • Closely collaborate with the Data Science team and help the team build and deploy machine learning and deep learning models on big data analytics platforms.

Skills:

  • Experience with Big Data pipeline, Big Data analytics, Data warehousing.
  • Experience with SQL/No-SQL, schema design and dimensional data modeling.
  • Strong understanding of Hadoop architecture and the HDFS ecosystem, and experience with Big Data technology stacks such as HBase, Hadoop, Hive, MapReduce.
  • Experience in designing systems that process structured as well as unstructured data at large scale.
  • Experience in AWS/Spark/Java/Scala/Python development.
  • Strong skills in PySpark (Python & Spark). Ability to create, manage and manipulate Spark DataFrames. Expertise in Spark query tuning and performance optimization.
  • Experience in developing efficient software code/frameworks for multiple use cases leveraging Python and big data technologies.
  • Prior exposure to streaming data sources such as Kafka.
  • Knowledge of shell scripting and Python scripting.
  • High proficiency in database skills (e.g., Complex SQL), for data preparation, cleaning, and data wrangling/munging, with the ability to write advanced queries and create stored procedures.
  • Experience with NoSQL databases such as Cassandra / MongoDB.
  • Solid experience in all phases of Software Development Lifecycle - plan, design, develop, test, release, maintain and support, decommission.
  • Experience with DevOps tools (GitHub, Travis CI, and JIRA) and methodologies (Lean, Agile, Scrum, Test Driven Development).
  • Experience building and deploying applications on on-premise and cloud-based infrastructure.
  • A good understanding of the machine learning landscape and concepts.
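
The multi-threaded event-processing requirement above can be sketched with the standard library alone: a queue fans events out to worker threads and a lock guards the shared result list (Kafka and the real data store are replaced here by in-memory stand-ins):

```python
import queue
import threading

events = queue.Queue()
results = []
lock = threading.Lock()          # guards the shared results list

def worker():
    while True:
        item = events.get()
        if item is None:         # sentinel value: shut this worker down
            events.task_done()
            return
        with lock:               # thread-safe append
            results.append(item * 2)
        events.task_done()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for i in range(10):              # enqueue ten toy events
    events.put(i)
for _ in threads:                # one shutdown sentinel per worker
    events.put(None)
events.join()                    # block until every item is processed
print(sorted(results))           # [0, 2, 4, ..., 18]
```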

 

Qualifications and Experience:

Engineering graduates and postgraduates, preferably in Computer Science from premier institutions, with proven work experience as a Big Data Engineer or in a similar role for 3-5 years.

Certifications:

Good to have at least one of the Certifications listed here:

    AZ 900 - Azure Fundamentals

    DP 200, DP 201, DP 203, AZ 204 - Data Engineering

    AZ 400 - DevOps Certification

Job posted by
Vijay Hemnath

Data Scientist

at Simplilearn Solutions

Founded 2009  •  Product  •  500-1000 employees  •  Profitable
Data Science
R Programming
Python
Scala
Tableau
SQL server
Bengaluru (Bangalore) · 2 - 5 yrs · ₹6L - ₹10L / yr
Simplilearn.com is the world's largest professional certifications company and an Onalytica Top 20 influential brand. With a library of 400+ courses, we've helped 500,000+ professionals advance their careers, delivering $5 billion in pay raises. Simplilearn has over 6500 employees worldwide and our customers include Fortune 1000 companies, top universities, leading agencies and hundreds of thousands of working professionals. We are growing over 200% year on year and having fun doing it.

Description
We are looking for candidates with strong technical skills and a proven track record in building predictive solutions for enterprises. This is a very challenging role and provides an opportunity to work on developing insights-based Ed-Tech software products used by a large set of customers across the globe. It provides an exciting opportunity to work on various advanced analytics and data science problem statements using cutting-edge modern technologies, collaborating with product, marketing and sales teams.

Responsibilities
  • Work on enterprise-level advanced reporting requirements and data analysis.
  • Solve various data science problems: customer engagement, dynamic pricing, lead scoring, NPS improvement, optimization, chatbots, etc.
  • Work on data engineering problems utilizing our tech stack: S3 Datalake, Spark, Redshift, Presto, Druid, Airflow, etc.
  • Collect relevant data from source systems / use crawling and parsing infrastructure to put together data sets.
  • Craft, conduct and analyse A/B experiments to evaluate machine learning models/algorithms.
  • Communicate findings and take algorithms/models to production with ownership.

Desired Skills
  • BE/BTech/MSc/MS in Computer Science or a related technical field.
  • 2-5 years of experience in the advanced analytics discipline with solid data engineering and visualization skills.
  • Strong SQL skills and BI skills using Tableau, and the ability to perform various complex analytics on data.
  • Ability to propose hypotheses and design experiments in the context of specific problems using statistics and ML algorithms.
  • Good overlap with modern data processing frameworks such as AWS Lambda and Spark using Scala or Python.
  • Dedication and diligence in understanding the application domain, collecting/cleaning data and conducting various A/B experiments.
  • A Bachelor's degree in Statistics or prior experience with Ed-Tech is a plus.
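
The A/B-experiment analysis mentioned above often reduces to a two-proportion z-test; a standard-library-only sketch with made-up conversion numbers:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical experiment results: conversions / visitors per variant.
a_conv, a_n = 120, 2400   # control
b_conv, b_n = 150, 2400   # treatment

p_a, p_b = a_conv / a_n, b_conv / b_n
p_pool = (a_conv + b_conv) / (a_n + b_n)          # pooled conversion rate
se = sqrt(p_pool * (1 - p_pool) * (1 / a_n + 1 / b_n))

z = (p_b - p_a) / se                              # test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # two-sided p-value
print(f"z={z:.3f}, p={p_value:.3f}")
```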
Job posted by
Aniket Manhar Nanjee

Lead Data Engineer

at Lymbyc

Founded 2012  •  Product  •  100-500 employees  •  Profitable
Apache Spark
Apache Kafka
Druid Database
Big Data
Apache Sqoop
RESTful APIs
Elasticsearch
Apache Ranger
Apache Atlas
kappa
Bengaluru (Bangalore), Chennai · 4 - 8 yrs · ₹9L - ₹14L / yr
Key skill set: Apache NiFi, Kafka Connect (Confluent), Sqoop, Kylo, Spark, Druid, Presto, RESTful services, Lambda/Kappa architectures

Responsibilities:
  • Build a scalable, reliable, operable and performant big data platform for both streaming and batch analytics
  • Design and implement data aggregation, cleansing and transformation layers

Skills:
  • Around 4+ years of hands-on experience designing and operating large data platforms
  • Experience in big data ingestion, transformation and stream/batch processing technologies using Apache NiFi, Apache Kafka, Kafka Connect (Confluent), Sqoop, Spark, Storm, Hive, etc.
  • Experience in designing and building streaming data platforms in Lambda and Kappa architectures
  • Working experience in one of the NoSQL/OLAP data stores like Druid, Cassandra, Elasticsearch, Pinot, etc.
  • Experience in one of the data warehousing tools like Redshift, BigQuery, Azure SQL Data Warehouse
  • Exposure to other data ingestion, data lake and querying frameworks like Marmaray, Kylo, Drill, Presto
  • Experience in designing and consuming microservices
  • Exposure to security and governance tools like Apache Ranger, Apache Atlas
  • Any contributions to open-source projects are a plus
  • Experience in performance benchmarks will be a plus
Job posted by
Venky Thiriveedhi

Bigdata Lead

at Saama Technologies

Founded 1997  •  Products & Services  •  100-1000 employees  •  Profitable
Hadoop
Spark
Apache Hive
Apache Flume
Java
Python
Scala
MySQL
Game Design
Technical Writing
Pune · 2 - 5 yrs · ₹1L - ₹18L / yr
Description
Deep experience and understanding of Apache Hadoop and surrounding technologies required; experience with Spark, Impala, Hive, Flume, Parquet and MapReduce.
  • Strong understanding of development languages including Java, Python, Scala and shell scripting
  • Expertise in Apache Spark 2.x framework principles and usage
  • Proficient in developing Spark batch and streaming jobs in Python, Scala or Java
  • Proven experience in performance tuning of Spark applications, both from application code and configuration perspectives
  • Proficient in Kafka and its integration with Spark
  • Proficient in Spark SQL and data warehousing techniques using Hive
  • Very proficient in Unix shell scripting and operating on Linux
  • Knowledge of any cloud-based infrastructure
  • Good experience in tuning Spark applications and performance improvements
  • Strong understanding of data profiling concepts and ability to operationalize analyses into design and development activities
  • Experience with best practices of software development: version control systems, automated builds, etc.
  • Experienced in and able to lead all phases of the Software Development Life Cycle on any project (feasibility planning, analysis, development, integration, test and implementation)
  • Capable of working within a team or as an individual
  • Experience creating technical documentation
Job posted by
Sandeep Chaudhary

Machine Learning Engineer

at UnFound

Founded 2017  •  Product  •  20-100 employees  •  Bootstrapped
Machine Learning (ML)
Deep Learning
Natural Language Processing (NLP)
Python
Microservices
Cloud Computing
Java
MongoDB
Mumbai · 1 - 40 yrs · ₹5L - ₹5L / yr
Does the current state of media frustrate you? Do you want to change the way we consume news? Are you a kickass machine learning practitioner and aspiring entrepreneur who has opinions on world affairs as well? If so, continue reading!

We at UnFound are developing a product which simplifies complex and cluttered news into simple themes, removes bias by showing all (and often unheard-of) perspectives, and produces crisp summaries, all with minimal human intervention. We are looking for a passionate and experienced machine learning ENGINEER/INTERN, *preferably* with experience in NLP. We want someone who can take initiative. If you need to be micro-managed, this is NOT the role for you.

  1. Demonstrable background in machine learning, especially NLP, information retrieval, etc.
  2. Hands-on with popular data science frameworks: Python, Jupyter, TensorFlow, PyTorch.
  3. Implementation-ready background in deep learning techniques like word embeddings, CNN, RNN/LSTM, etc.
  4. Experience with productionizing machine learning solutions, especially ML-powered mobile/web apps and bots.
  5. Hands-on experience with AWS and other cloud platforms; GPU experience is strongly preferred.
  6. Thorough understanding of back-end concepts and databases (SQL, Postgres, NoSQL, etc.)
  7. Good Kaggle (or similar) scores; MOOCs (Udacity, Coursera, fast.ai, etc.) preferred.
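
The word-embedding techniques listed above rest on one core operation: comparing vectors by cosine similarity. A pure-Python sketch with tiny made-up vectors (real Word2Vec/GloVe embeddings have hundreds of dimensions and are learned from large corpora):

```python
from math import sqrt

# Toy 4-dimensional "embeddings"; the numbers are invented.
emb = {
    "king":  [0.80, 0.65, 0.10, 0.05],
    "queen": [0.78, 0.70, 0.12, 0.04],
    "apple": [0.05, 0.10, 0.90, 0.70],
}

def cosine(u, v):
    """Cosine of the angle between vectors u and v (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm

# Semantically related words should score higher than unrelated ones.
print(cosine(emb["king"], emb["queen"]) > cosine(emb["king"], emb["apple"]))
```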
Job posted by
Ankur Pandey