5 - 6 Years
Good Knowledge in Relation and Non-Relational Database
To write Complex Queries and Identify problematic queries and provide a Solution
Good Hands on database tools
Experience in Both SQL and NON SQL Database like SQL Server, PostgreSQL, Mango DB, Maria DB. Etc.
Worked on Data Model Preparation & Structuring Database etc.
What is the role?
You will be responsible for building and maintaining highly scalable data infrastructure for our cloud-hosted SAAS product. You will work closely with the Product Managers and Technical team to define and implement data pipelines for customer-facing and internal reports.
- Design and develop resilient data pipelines.
- Write efficient queries to fetch data from the report database.
- Work closely with application backend engineers on data requirements for their stories.
- Designing and developing report APIs for the front end to consume.
- Focus on building highly available, fault-tolerant report systems.
- Constantly improve the architecture of the application by clearing the technical backlog.
- Adopt a culture of learning and development to constantly keep pace with and adopt new technolgies.
What are we looking for?
An enthusiastic individual with the following skills. Please do not hesitate to apply if you do not match all of it. We are open to promising candidates who are passionate about their work and are team players.
- Education - BE/MCA or equivalent
- Overall 8+ years of experience
- Expert level understanding of database concepts and BI.
- Well verse in databases such as MySQL, MongoDB and hands on experience in creating data models.
- Must have designed and implemented low latency data warehouse systems.
- Must have strong understanding of Kafka and related systems.
- Experience in clickhouse database preferred.
- Must have good knowledge of APIs and should be able to build interfaces for frontend engineers.
- Should be innovative and communicative in approach
- Will be responsible for functional/technical track of a project
Whom will you work with?
You will work with a top-notch tech team, working closely with the CTO and product team.
What can you look for?
A wholesome opportunity in a fast-paced environment that will enable you to juggle between concepts, yet maintain the quality on content, interact and share your ideas and have loads of learning while at work. Work with a team of highly talented young professionals and enjoy the benefits.
A fast-growing SaaS commerce company based in Bangalore with offices in Delhi, Mumbai, SF, Dubai, Singapore and Dublin. We have three products in our portfolio: Plum, Empuls and Compass. Works with over 1000 global clients. We help our clients in engaging and motivating their employees, sales teams, channel partners or consumers for better business results.
The Biostrap platform extracts many metrics related to health, sleep, and activity. Many algorithms are designed through research and often based on scientific literature, and in some cases they are augmented with or entirely designed using machine learning techniques. Biostrap is seeking a Data Scientist to design, develop, and implement algorithms to improve existing metrics and measure new ones.
As a Data Scientist at Biostrap, you will take on projects to improve or develop algorithms to measure health metrics, including:
- Research: search literature for starting points of the algorithm
- Design: decide on the general idea of the algorithm, in particular whether to use machine learning, mathematical techniques, or something else.
- Implement: program the algorithm in Python, and help deploy it.
The algorithms and their implementation will have to be accurate, efficient, and well-documented.
- A Master’s degree in a computational field, with a strong mathematical background.
- Strong knowledge of, and experience with, different machine learning techniques, including their theoretical background.
- Strong experience with Python
- Experience with Keras/TensorFlow, and preferably also with RNNs
- Experience with AWS or similar services for data pipelining and machine learning.
- Ability and drive to work independently on an open problem.
- Fluency in English.
o Should be able to work with API, shards etc in Elasticsearch.
o Write parser in Logstash
o Create Dashboards in Kibana
• Mandatory Experience.
o Must have very good understanding of Log Analytics
o Hands on experience in Elasticsearch, logstash & Kibana should be at expert level
o Elasticsearch : Should be able to write Kibana API
o Logstash : Should be able to write parsers.
o Kibana : Create different visualization and dashboards according to the Client needs
o Scripts : Should be able to write scripts in linux.
- Data Steward :
Data Steward will collaborate and work closely within the group software engineering and business division. Data Steward has overall accountability for the group's / Divisions overall data and reporting posture by responsibly managing data assets, data lineage, and data access, supporting sound data analysis. This role requires focus on data strategy, execution, and support for projects, programs, application enhancements, and production data fixes. Makes well-thought-out decisions on complex or ambiguous data issues and establishes the data stewardship and information management strategy and direction for the group. Effectively communicates to individuals at various levels of the technical and business communities. This individual will become part of the corporate Data Quality and Data management/entity resolution team supporting various systems across the board.
- Responsible for data quality and data accuracy across all group/division delivery initiatives.
- Responsible for data analysis, data profiling, data modeling, and data mapping capabilities.
- Responsible for reviewing and governing data queries and DML.
- Accountable for the assessment, delivery, quality, accuracy, and tracking of any production data fixes.
- Accountable for the performance, quality, and alignment to requirements for all data query design and development.
- Responsible for defining standards and best practices for data analysis, modeling, and queries.
- Responsible for understanding end-to-end data flows and identifying data dependencies in support of delivery, release, and change management.
- Responsible for the development and maintenance of an enterprise data dictionary that is aligned to data assets and the business glossary for the group responsible for the definition and maintenance of the group's data landscape including overlays with the technology landscape, end-to-end data flow/transformations, and data lineage.
- Responsible for rationalizing the group's reporting posture through the definition and maintenance of a reporting strategy and roadmap.
- Partners with the data governance team to ensure data solutions adhere to the organization’s data principles and guidelines.
- Owns group's data assets including reports, data warehouse, etc.
- Understand customer business use cases and be able to translate them to technical specifications and vision on how to implement a solution.
- Accountable for defining the performance tuning needs for all group data assets and managing the implementation of those requirements within the context of group initiatives as well as steady-state production.
- Partners with others in test data management and masking strategies and the creation of a reusable test data repository.
- Responsible for solving data-related issues and communicating resolutions with other solution domains.
- Actively and consistently support all efforts to simplify and enhance the Clinical Trial Predication use cases.
- Apply knowledge in analytic and statistical algorithms to help customers explore methods to improve their business.
- Contribute toward analytical research projects through all stages including concept formulation, determination of appropriate statistical methodology, data manipulation, research evaluation, and final research report.
- Visualize and report data findings creatively in a variety of visual formats that appropriately provide insight to the stakeholders.
- Achieve defined project goals within customer deadlines; proactively communicate status and escalate issues as needed.
- Strong understanding of the Software Development Life Cycle (SDLC) with Agile Methodologies
- Knowledge and understanding of industry-standard/best practices requirements gathering methodologies.
- Knowledge and understanding of Information Technology systems and software development.
- Experience with data modeling and test data management tools.
- Experience in the data integration project • Good problem solving & decision-making skills.
- Good communication skills within the team, site, and with the customer
Knowledge, Skills and Abilities
- Technical expertise in data architecture principles and design aspects of various DBMS and reporting concepts.
- Solid understanding of key DBMS platforms like SQL Server, Azure SQL
- Results-oriented, diligent, and works with a sense of urgency. Assertive, responsible for his/her own work (self-directed), have a strong affinity for defining work in deliverables, and be willing to commit to deadlines.
- Experience in MDM tools like MS DQ, SAS DM Studio, Tamr, Profisee, Reltio etc.
- Experience in Report and Dashboard development
- Statistical and Machine Learning models
- Python (sklearn, numpy, pandas, genism)
- Nice to Have:
- 1yr of ETL experience
- Natural Language Processing
- Neural networks and Deep learning
- xperience in keras,tensorflow,spacy, nltk, LightGBM python library
Interaction : Frequently interacts with subordinate supervisors.
Education : Bachelor’s degree, preferably in Computer Science, B.E or other quantitative field related to the area of assignment. Professional certification related to the area of assignment may be required
Experience : 7 years of Pharmaceutical /Biotech/life sciences experience, 5 years of Clinical Trials experience and knowledge, Excellent Documentation, Communication, and Presentation Skills including PowerPoint
We are building a global content marketplace that brings companies and content
creators together to scale up content creation processes across 50+ content verticals and 150+ industries. Over the past 2.5 years, we’ve worked with companies like India Today, Amazon India, Adobe, Swiggy, Dunzo, Businessworld, Paisabazaar, IndiGo Airlines, Apollo Hospitals, Infoedge, Times Group, Digit, BookMyShow, UpGrad, Yulu, YourStory, and 350+ other brands.
Our mission is to become the world’s largest content creation and distribution platform for all kinds of content creators and brands.
We are a 25+ member company and is scaling up rapidly in both team size and our ambition.
If we were to define the kind of people and the culture we have, it would be -
a) Individuals with an Extreme Sense of Passion About Work
b) Individuals with Strong Customer and Creator Obsession
c) Individuals with Extraordinary Hustle, Perseverance & Ambition
We are on the lookout for individuals who are always open to going the extra mile and thrive in a fast-paced environment. We are strong believers in building a great, enduring
a company that can outlast its builders and create a massive impact on the lives of our
employees, creators, and customers alike.
We are fortunate to be backed by some of the industry’s most prolific angel investors - Kunal Bahl and Rohit Bansal (Snapdeal founders), YourStory Media. (Shradha Sharma); Dr. Saurabh Srivastava, Co-founder of IAN and NASSCOM; Slideshare co-founder Amit Ranjan; Indifi's Co-founder and CEO Alok Mittal; Sidharth Rao, Chairman of Dentsu Webchutney; Ritesh Malik, Co-founder and CEO of Innov8; Sanjay Tripathy, former CMO, HDFC Life, and CEO of Agilio Labs; Manan Maheshwari, Co-founder of WYSH; and Hemanshu Jain, Co-founder of Diabeto.
Backed by Lightspeed Venture Partners
● Design, develop, test, deploy, maintain and improve ML models
● Implement novel learning algorithms and recommendation engines
● Apply Data Science concepts to solve routine problems of target users
● Translates business analysis needs into well-defined machine learning problems, and
selecting appropriate models and algorithms
● Create an architecture, implement, maintain and monitor various data source pipelines
that can be used across various different types of data sources
● Monitor performance of the architecture and conduct optimization
● Produce clean, efficient code based on specifications
● Verify and deploy programs and systems
● Troubleshoot, debug and upgrade existing applications
● Guide junior engineers for productive contribution to the development
The ideal candidate must -
ML and NLP Engineer
● 4 or more years of experience in ML Engineering
● Proven experience in NLP
● Familiarity with language generative model - GPT3
● Ability to write robust code in Python
● Familiarity with ML frameworks and libraries
● Hands on experience with AWS services like Sagemaker and Personalize
● Exposure to state of the art techniques in ML and NLP
● Understanding of data structures, data modeling, and software architecture
● Outstanding analytical and problem-solving skills
● Team player, an ability to work cooperatively with the other engineers.
● Ability to make quick decisions in high-pressure environments with limited information.
Roles & Responsibilities
- Proven experience with deploying and tuning Open Source components into enterprise ready production tooling Experience with datacentre (Metal as a Service – MAAS) and cloud deployment technologies (AWS or GCP Architect certificates required)
- Deep understanding of Linux from kernel mechanisms through user space management
- Experience on CI/CD (Continuous Integrations and Deployment) system solutions (Jenkins).
- Using Monitoring tools (local and on public cloud platforms) Nagios, Prometheus, Sensu, ELK, Cloud Watch, Splunk, New Relic etc. to trigger instant alerts, reports and dashboards. Work closely with the development and infrastructure teams to analyze and design solutions with four nines (99.99%) up-time, globally distributed, clustered, production and non-production virtualized infrastructure.
- Wide understanding of IP networking as well as data centre infrastructure
- Expert with software development tools and sourcecode management, understanding, managing issues, code changes and grouping them into deployment releases in a stable and measurable way to maximize production Must be expert at developing and using ansible roles and configuring deployment templates with jinja2.
- Solid understanding of data collection tools like Flume, Filebeat, Metricbeat, JMX Exporter agents.
- Extensive experience operating and tuning the kafka streaming data platform, specifically as a message queue for big data processing
- Strong understanding and must have experience:
- Apache spark framework, specifically spark core and spark streaming,
- Orchestration platforms, mesos and kubernetes,
- Data storage platforms, elasticstack, carbon, clickhouse, cassandra, ceph, hdfs
- Core presentation technologies kibana, and grafana.
- Excellent scripting and programming skills (bash, python, java, go, rust). Must have previous experience with “rust” in order to support, improve in house developed products
Red Hat Certified Architect certificate or equivalent required CCNA certificate required 3-5 years of experience running open source big data platforms
GREETINGS FROM CODEMANTRA !!!
EXCELLENT OPPORTUNITY FOR DATA SCIENCE/AI AND ML ARCHITECT !!!
Skills and Qualifications
*Strong Hands-on experience in Python Programming
*** Working experience with Computer Vision models - Object Detection Model, Image Classification
* Good experience in feature extraction, feature selection techniques and transfer learning
* Working Experience in building deep learning NLP Models for text classification, image analytics-CNN,RNN,LSTM.
* Working Experience in any of the AWS/GCP cloud platforms, exposure in fetching data from various sources.
* Good experience in exploratory data analysis, data visualisation, and other data pre-processing techniques.
* Knowledge in any one of the DL frameworks like Tensorflow, Pytorch, Keras, Caffe Good knowledge in statistics, distribution of data and in supervised and unsupervised machine learning algorithms.
* Exposure to OpenCV Familiarity with GPUs + CUDA Experience with NVIDIA software for cluster management and provisioning such as nvsm, dcgm and DeepOps.
* We are looking for a candidate with 9+ years of relevant experience , who has attained a Graduate degree in Computer Science, Statistics, Informatics, Information Systems or another quantitative field. They should also have experience using the following software/tools: *Experience with big data tools: Hadoop, Spark, Kafka, etc.
*Experience with AWS cloud services: EC2, RDS, AWS-Sagemaker(Added advantage)
*Experience with object-oriented/object function scripting languages in any: Python, Java, C++, Scala, etc.
*Selecting features, building and optimizing classifiers using machine learning techniques
*Data mining using state-of-the-art methods
*Enhancing data collection procedures to include information that is relevant for building analytic systems
*Processing, cleansing, and verifying the integrity of data used for analysis
*Creating automated anomaly detection systems and constant tracking of its performance
*Assemble large, complex data sets that meet functional / non-functional business requirements.
*Secure and manage when needed GPU cluster resources for events
*Write comprehensive internal feedback reports and find opportunities for improvements
*Manage GPU instances/machines to increase the performance and efficiency of the ML/DL model
Pipelines should be optimised to handle both real time data, batch update data and historical data.
Establish scalable, efficient, automated processes for complex, large scale data analysis.
Write high quality code to gather and manage large data sets (both real time and batch data) from multiple sources, perform ETL and store it in a data warehouse.
Manipulate and analyse complex, high-volume, high-dimensional data from varying sources using a variety of tools and data analysis techniques.
Participate in data pipelines health monitoring and performance optimisations as well as quality documentation.
Interact with end users/clients and translate business language into technical requirements.
Acts independently to expose and resolve problems.
Job Requirements :-
2+ years experience working in software development & data pipeline development for enterprise analytics.
2+ years of working with Python with exposure to various warehousing tools
In-depth working with any of commercial tools like AWS Glue, Ta-lend, Informatica, Data-stage, etc.
Experience with various relational databases like MySQL, MSSql, Oracle etc. is a must.
Experience with analytics and reporting tools (Tableau, Power BI, SSRS, SSAS).
Experience in various DevOps practices helping the client to deploy and scale the systems as per requirement.
Strong verbal and written communication skills with other developers and business client.
Knowledge of Logistics and/or Transportation Domain is a plus.
Hands-on with traditional databases and ERP systems like Sybase and People-soft.