- Five+ years experience working in a Big Data Software Development role
- Experience managing and deploying ML models in real world environments
- Bachelor's degree in Computer Science, Mathematics, Statistics, or other analytical fields
- Experience working with Python, Scala, Spark, or other open-source software with data science libraries
- Experience in advanced math and statistics
- Excellent familiarity with the Linux command-line environment
- Able to understand various data structures and common methods in data transformation
- Experience deploying machine learning models
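The data-structure and transformation skills listed above can be illustrated with a tiny pure-Python sketch (the records and fields are made up for illustration):

```python
# A tiny illustration of a common data transformation: aggregating a list
# of records into a dict keyed by one field (pure Python, no libraries).
records = [
    {"user": "a", "bytes": 100},
    {"user": "b", "bytes": 50},
    {"user": "a", "bytes": 25},
]

totals = {}
for rec in records:  # aggregate bytes per user
    totals[rec["user"]] = totals.get(rec["user"], 0) + rec["bytes"]

print(totals)  # {'a': 125, 'b': 50}
```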
About Us:
Docsumo is Document AI software that helps enterprises capture data and analyze customer documents. We convert documents such as invoices, ID cards, and bank statements into actionable data. We work with clients such as PayU, Arbor, and Hitachi, and are backed by Sequoia, Barclays, Techstars, and Better Capital.
As a Senior Machine Learning Engineer, you will work directly with the CTO to develop end-to-end API products for the US market in the information extraction domain.
- You will design and build systems that help Docsumo process visual data, i.e., PDFs and images of documents.
- You'll work in our Machine Intelligence team, a close-knit group of scientists and engineers who incubate new capabilities from whiteboard sketches all the way to finished apps.
- You will get to learn the ins and outs of building core capabilities & API products that can scale globally.
- Should have hands-on experience applying advanced statistical learning techniques to different types of data.
- Should be able to design, build and work with RESTful Web Services in JSON and XML formats. (Flask preferred)
- Should follow Agile principles and processes including (but not limited to) standup meetings, sprints and retrospectives.
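As a rough sketch of the RESTful JSON services mentioned above, a minimal Flask endpoint might look like this (the `/extract` route and its payload are hypothetical, not Docsumo's actual API):

```python
# Minimal sketch of a JSON-returning REST endpoint with Flask.
# The /extract route and the stub response are illustrative only.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/extract", methods=["POST"])
def extract():
    payload = request.get_json(force=True)
    # Echo back a stub "extraction" result for the submitted document name.
    return jsonify({"document": payload.get("document"), "fields": []})

if __name__ == "__main__":
    app.run(port=5000)
```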
Skills / Requirements:
- 3+ years of experience working in machine learning, text processing, data science, information retrieval, deep learning, natural language processing, text mining, regression, classification, etc.
- Must have a full-time degree in Computer Science or a similar field (Statistics/Mathematics)
- Experience working with OpenCV, TensorFlow, and Keras
- Experience working with Python: NumPy, scikit-learn, Matplotlib, pandas
- Familiarity with Version Control tools such as Git
- Theoretical and practical knowledge of SQL / NoSQL databases with hands-on experience in at least one database system.
- Must be self-motivated, flexible, collaborative, with an eagerness to learn
- Designing and implementing fine-tuned, production-ready data/ML pipelines on the Hadoop platform.
- Driving optimization, testing and tooling to improve quality.
- Reviewing and approving high-level & detailed designs to ensure that the solution meets the business needs and aligns with the data & analytics architecture principles and roadmap.
- Understanding business requirements and solution design to develop and implement solutions that adhere to big data architectural guidelines and address business requirements.
- Following proper SDLC (Code review, sprint process).
- Identifying, designing, and implementing internal process improvements: automating manual processes, optimizing data delivery, etc.
- Building robust and scalable data infrastructure (both batch processing and real-time) to support needs from internal and external users.
- Understanding various data security standards and using data security tools to apply and adhere to the required data controls for user access on the Hadoop platform.
- Supporting and contributing to development guidelines and standards for data ingestion.
- Working with the data science and business analytics teams to assist with data ingestion and data-related technical issues.
- Designing and documenting the development & deployment flow.
- Experience in developing REST API services using one of the Scala frameworks.
- Ability to troubleshoot and optimize complex queries on the Spark platform
- Expert in building and optimizing ‘big data’ data/ML pipelines, architectures and data sets.
- Experience modelling unstructured data into structured data designs.
- Experience in Big Data access and storage techniques.
- Experience in doing cost estimation based on the design and development.
- Excellent debugging skills for the technical stack mentioned above, including analyzing server and application logs.
- Highly organized, self-motivated, proactive, and able to propose the best design solutions.
- Good time management and multitasking skills to meet deadlines, working both independently and as part of a team.
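The hands-on database experience asked for above can be sketched with Python's built-in sqlite3 module (the table and rows here are invented for illustration):

```python
# A minimal sketch of hands-on SQL work using Python's built-in sqlite3.
# The invoices table and its rows are made up for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER, vendor TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [(1, "Acme", 120.0), (2, "Acme", 80.0), (3, "Globex", 250.0)],
)
# Aggregate spend per vendor, largest first.
rows = conn.execute(
    "SELECT vendor, SUM(amount) FROM invoices GROUP BY vendor ORDER BY 2 DESC"
).fetchall()
print(rows)  # [('Globex', 250.0), ('Acme', 200.0)]
```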
Job roles and responsibilities:
- Design, develop, test, deploy, maintain and improve ML models/infrastructure and software that uses these models
- Experience writing software in one or more languages such as Python, Scala, R, or similar with strong competencies in data structures, algorithms, and software design
- Experience working with recommendation engines, data pipelines, or distributed machine learning
- Experience working with deep learning frameworks (such as TensorFlow, Keras, Torch, Caffe, Theano)
- Knowledge of data analytics concepts, including big data, data warehouse technical architectures, ETL, and reporting/analytic tools and environments
- Participate in cutting edge research in artificial intelligence and machine learning applications
- Contribute to engineering efforts from planning and organization to execution and delivery to solve complex, real world engineering problems
- Working knowledge of different algorithms and machine learning techniques such as linear & logistic regression, segmentation, decision trees, cluster analysis and factor analysis, time series analysis, K-Nearest Neighbours, the K-Means algorithm, the Random Forest algorithm, NLP (natural language processing), sentiment analysis, various artificial neural networks, Convolutional Neural Networks (CNNs), and Bidirectional Recurrent Neural Networks (BRNNs)
- Demonstrated excellent communication, presentation, and problem-solving skills
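Two of the techniques from the list above, logistic regression (supervised) and K-Means (unsupervised), can be sketched in a few lines with scikit-learn on a synthetic dataset:

```python
# A short sketch of two listed techniques on tiny synthetic data:
# logistic regression (classification) and K-Means (clustering).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Two well-separated 2-D blobs, labelled 0 and 1.
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(3, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = LogisticRegression().fit(X, y)  # supervised: learn labels
acc = clf.score(X, y)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)  # unsupervised
print(acc)
```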
Technical Skills Required:
- GCP native AI/ML services such as Vision, NLP, Document AI, Dialogflow, CCAI, BigQuery, etc.
- Proficiency with a deep learning framework such as TensorFlow or Keras
- Proficiency with Python and basic libraries for machine learning such as scikit-learn and pandas, plus Jupyter notebooks
- Expertise in visualizing and manipulating big datasets
- Ability to select hardware to run an ML model with the required latency
- Good to have MLOps and Kubeflow knowledge
- GCP ML Engineer Certification
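Selecting hardware for a latency target, as mentioned above, starts with measuring inference latency. A minimal sketch, with a NumPy matrix multiply standing in for a real model and a made-up 50 ms budget:

```python
# Minimal sketch of checking an ML model against a latency budget.
# The "model" is a NumPy matmul stand-in; the 50 ms budget is hypothetical.
import time
import numpy as np

weights = np.random.rand(256, 256)
x = np.random.rand(1, 256)

def infer(x):
    return x @ weights

infer(x)  # warm up
start = time.perf_counter()
for _ in range(100):
    infer(x)
latency_ms = (time.perf_counter() - start) / 100 * 1000  # mean per call
print(f"mean latency: {latency_ms:.3f} ms")
budget_met = latency_ms < 50  # e.g. a 50 ms per-request budget
```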
The Biostrap platform extracts many metrics related to health, sleep, and activity. Many algorithms are designed through research and often based on scientific literature, and in some cases they are augmented with or entirely designed using machine learning techniques. Biostrap is seeking a Data Scientist to design, develop, and implement algorithms to improve existing metrics and measure new ones.
As a Data Scientist at Biostrap, you will take on projects to improve or develop algorithms to measure health metrics, including:
- Research: search the literature for starting points for the algorithm
- Design: decide on the general idea of the algorithm, in particular whether to use machine learning, mathematical techniques, or something else.
- Implement: program the algorithm in Python, and help deploy it.
The algorithms and their implementation will have to be accurate, efficient, and well-documented.
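As a hypothetical sketch of the kind of metric algorithm described above (not Biostrap's actual method), here is a heart-rate estimate from a synthetic pulse signal using simple NumPy peak detection:

```python
# Illustrative only: estimate heart rate (BPM) from an idealized pulse
# wave via naive peak detection. Signal and thresholds are made up.
import numpy as np

fs = 50                        # sampling rate, Hz
t = np.arange(0, 10, 1 / fs)   # 10 seconds of signal
bpm_true = 72
s = np.sin(2 * np.pi * (bpm_true / 60) * t)  # idealized pulse wave

# A sample is a peak if it exceeds both neighbours and a 0.5 threshold.
peaks = np.where((s[1:-1] > s[:-2]) & (s[1:-1] > s[2:]) & (s[1:-1] > 0.5))[0] + 1

intervals = np.diff(peaks) / fs  # seconds between beats
bpm_est = 60 / intervals.mean()
print(round(bpm_est))
```

A production algorithm would add filtering, artifact rejection, and validation against reference data, but the structure (signal in, metric out) is the same.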
- A Master’s degree in a computational field, with a strong mathematical background.
- Strong knowledge of, and experience with, different machine learning techniques, including their theoretical background.
- Strong experience with Python
- Experience with Keras/TensorFlow, and preferably also with RNNs
- Experience with AWS or similar services for data pipelining and machine learning.
- Ability and drive to work independently on an open problem.
- Fluency in English.
Do you want to help build real technology for a meaningful purpose? Do you want to contribute to making the world more sustainable and advanced, and to achieving extraordinary precision in analytics?
What is your role?
As a Computer Vision & Machine Learning Engineer at Datasee.AI, you'll be core to the development of our robotic harvesting system's visual intelligence. You'll bring deep computer vision, machine learning, and software expertise while also thriving in a fast-paced, flexible, and energized startup environment. As an early team member, you'll directly build our success, growth, and culture, and you'll hold a significant role with the opportunity to grow it as Datasee.AI grows.
What you’ll do
- You will be working with the core R&D team which drives the computer vision and image processing development.
- Build deep learning models for our data and for object detection on large-scale images.
- Design and implement real-time algorithms for object detection, classification, tracking, and segmentation
- Coordinate and communicate within computer vision, software, and hardware teams to design and execute commercial engineering solutions.
- Automate workflow processes between the fast-paced data delivery systems.
What we are looking for
- 1 to 3+ years of professional experience in computer vision and machine learning.
- Extensive use of Python
- Experience with Python libraries such as OpenCV, TensorFlow, and NumPy
- Familiarity with a deep learning library such as Keras or PyTorch
- Experience with different CNN architectures such as FCN, R-CNN, Fast R-CNN, and YOLO
- Experienced in hyperparameter tuning, data augmentation, data wrangling, model optimization and model deployment
- B.E./M.E/M.Sc. Computer Science/Engineering or relevant degree
- Dockerization, AWS modules, and production-level modelling
- Basic knowledge of GIS fundamentals would be an added advantage
- Experience with Qt, desktop application development, and desktop automation
- Knowledge of satellite image processing, Geographic Information Systems, GDAL, QGIS, and ArcGIS
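One building block behind detection architectures like YOLO and Fast R-CNN is intersection-over-union (IoU) between bounding boxes, used for matching and non-max suppression. A brief sketch with boxes as (x1, y1, x2, y2) tuples:

```python
# Intersection-over-union between two axis-aligned boxes (x1, y1, x2, y2).
# Core to evaluating and post-processing object detectors.
def iou(a, b):
    # Intersection rectangle.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0, ix2 - ix1), max(0, iy2 - iy1)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.1429
```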
Datasee.AI, Inc. is an AI-driven image analytics company offering asset management solutions for industries in the sectors of renewable energy, infrastructure, utilities & agriculture. With core expertise in image processing, computer vision & machine learning, Datasee.AI's solution provides value across the enterprise for all stakeholders through a data-driven approach.
With Sales & Operations based out of the US, Europe & India, Datasee.AI is a team of 32 people located across different geographies and with varied domain expertise and interests.
A focused and happy bunch of people who take tasks head-on and build scalable platforms and products.
• Solid technical / data-mining skills and ability to work with large volumes of data; extract and manipulate large datasets using common tools such as Python, SQL, and other programming/scripting languages to translate data into business decisions/results
• Be data-driven and outcome-focused
• Must have good business judgment with a demonstrated ability to think creatively
• Must be an intuitive, organized analytical thinker, with the ability to perform detailed analysis
• Takes personal ownership; self-starter; ability to drive projects with minimal guidance and focus on high-impact work
• Learns continuously; Seeks out knowledge, ideas and feedback.
• Looks for opportunities to build their own skills, knowledge, and expertise.
• Experience with big data and cloud computing, viz. Spark, Hadoop (MapReduce, PIG, etc.)
• Experience in risk and credit score domains preferred
• Comfortable with ambiguity and frequent context-switching in a fast-paced environment
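Translating data into business decisions, as asked for above, often comes down to a short pandas aggregation. A sketch on synthetic data (the segments and default flags are invented, echoing the risk/credit domain mentioned above):

```python
# A small sketch of manipulating a dataset with pandas to support a
# business decision. The data is synthetic and purely illustrative.
import pandas as pd

df = pd.DataFrame({
    "segment": ["A", "A", "B", "B", "B"],
    "default": [0, 1, 0, 0, 1],
})
# Default rate per customer segment: which segment carries more risk?
rates = df.groupby("segment")["default"].mean()
riskiest = rates.idxmax()
print(rates.to_dict(), riskiest)
```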
As an experienced Data Scientist, you'll join a team of data scientists, analysts, and software engineers working to push the boundaries of data science in health care. We like to experiment, iterate, and innovate with technology, from developing new algorithms specific to health care's challenges, to bringing the latest machine learning practices and applications developed in other industries into the health care world. We know that algorithms are only valuable when powered by the right data, so we focus on fully understanding the problems we need to solve, and truly understanding the data behind them before launching into solutions – ensuring that the solutions we do land on are impactful.
• Research, conceptualize, and implement analytical approaches and predictive modeling to evaluate scenarios, predict utilization and clinical outcomes, and recommend actions to drive impact.
• Manage and execute on the entire model development process, including scope definition, hypothesis formation, data cleaning and preparation, feature selection, model implementation in production, validation and iteration, using multiple data sources.
• Provide guidance on necessary data and software infrastructure capabilities to deliver a scalable solution across partners, and support the implementation of the team's algorithms and models.
• Contribute to development and publication in major journals and conferences, showcasing leadership in healthcare data science.
• Work closely and collaborate with Data Scientists, Machine Learning Engineers, IT teams, and business stakeholders spread across various locations in the US and India to achieve business goals.
• Provide guidance to other Data Scientists and Machine Learning Engineers
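The model development process described above (data preparation, feature selection, model fitting, validation) can be condensed into one scikit-learn pipeline; the data here is a synthetic stand-in for real clinical data:

```python
# Condensed sketch of the model development loop: preparation, feature
# selection, fitting, and cross-validated evaluation (synthetic data).
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))           # 10 candidate features
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # outcome driven by 2 of them

pipe = Pipeline([
    ("scale", StandardScaler()),             # data preparation
    ("select", SelectKBest(f_classif, k=2)), # feature selection
    ("model", LogisticRegression()),         # model implementation
])
scores = cross_val_score(pipe, X, y, cv=5)   # validation
print(scores.mean())
```

Keeping every step inside the pipeline means the same transformations apply at validation and in production, avoiding train/serve skew.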
Want to make every line of code count? Tired of being a small cog in a big machine? Like a fast-paced environment where stuff gets DONE? Wanna grow with a fast-growing company (both career and compensation)? Like to wear different hats? Join ThinkDeeply in our mission to create and apply Enterprise-Grade AI for all types of applications.
Seeking an ML Engineer with high aptitude toward development. Will also consider coders with high aptitude in ML. Years of experience are important, but we are also looking for interest and aptitude. As part of the early engineering team, you will have a chance to make a measurable impact on the future of ThinkDeeply, as well as having a significant amount of responsibility.
Bachelor's/Master's or PhD in Computer Science, or related industry experience
3+ years of industry experience with deep learning frameworks such as PyTorch or TensorFlow
7+ years of industry experience in scripting languages such as Python or R
7+ years in software development, doing at least some level of research/POCs, prototyping, productizing, process improvement, and large-data processing / performance computing
Familiar with non-neural-network methods such as Bayesian methods, SVMs, AdaBoost, Random Forests, etc.
Some experience in setting up large scale training data pipelines.
Some experience in using Cloud services such as AWS, GCP, Azure
Experience in building deep learning models for Computer Vision and Natural Language Processing domains
Experience in productionizing/serving machine learning in industry setting
Understand the principles of developing cloud native applications
Collect, Organize and Process data pipelines for developing ML models
Research and develop novel prototypes for customers
Train, implement and evaluate shippable machine learning models
Deploy and iterate improvements of ML Models through feedback
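The non-neural methods mentioned above (SVM, AdaBoost, Random Forest) can be compared in a few lines with scikit-learn on one synthetic dataset:

```python
# Quick comparison of three non-neural classifiers on synthetic data.
# Default hyperparameters; a real study would tune and cross-validate.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "svm": SVC(),
    "adaboost": AdaBoostClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
}
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
print(scores)
```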