Airflow developer:
Experience: 5 to 10 years, with at least 4 years of relevant experience.
Work location: Hyderabad (Hybrid Model)
Job description:
· Experience in working on Airflow.
· Experience in SQL, Python, and Object-oriented programming.
· Experience in the data warehouse, database concepts, and ETL tools (Informatica, DataStage, Pentaho, etc.).
· Azure experience and exposure to Kubernetes.
· Experience in Azure data factory, Azure Databricks, and Snowflake.
Required Skills: Azure Databricks/Data Factory, Kubernetes/Docker, DAG development, hands-on Python coding.
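As a sketch of the DAG development skill listed above: a minimal Airflow DAG wires tasks into a dependency graph. This assumes Airflow 2.4+; the DAG id, task names, and callables below are hypothetical placeholders, not part of the role's actual pipelines.

```python
# Minimal Airflow DAG sketch: a daily extract -> transform chain.
# Requires Apache Airflow 2.4+; all names here are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Placeholder: pull rows from a source system.
    print("extracting")


def transform():
    # Placeholder: clean and reshape the extracted data.
    print("transforming")


with DAG(
    dag_id="example_etl",           # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    extract_task >> transform_task  # transform runs only after extract succeeds
```

The `>>` operator declares the edge in the task graph; the scheduler then runs each task once per day, in dependency order.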
Responsibilities
- Work on execution and scheduling of all tasks related to assigned projects' deliverable dates
- Optimize and debug existing code to make it scalable and improve performance
- Design, develop, and deliver tested code and machine learning models into production environments
- Work effectively in teams, managing and leading teams
- Provide effective, constructive feedback to the delivery leader
- Manage client expectations and apply an agile mindset when working with machine learning and AI technologies
- Design and prototype data-driven solutions
Eligibility
- Highly experienced in designing, building, and shipping scalable, production-quality machine learning algorithms in Python
- Working knowledge and experience in NLP core components (NER, Entity Disambiguation, etc.)
- In-depth expertise in data munging and storage (experienced with SQL, NoSQL, MongoDB, graph databases)
- Expertise in writing scalable APIs for machine learning models
- Experience with maintaining code logs, task schedulers, and security
- Working knowledge of machine learning techniques, feed-forward, recurrent and convolutional neural networks, entropy models, supervised and unsupervised learning
- Experience with at least one of the following: Keras, TensorFlow, Caffe, or PyTorch
- Minimum 2.5 years of experience as a Python Developer.
- Minimum 2.5 years of experience with a framework such as Django, Flask, or FastAPI
- Minimum 2.5 years of experience with SQL/PostgreSQL
- Minimum 2.5 years of experience with Git/GitLab/Bitbucket
- Minimum 2 years of experience in deployment (CI/CD with Jenkins)
- Minimum 2.5 years of experience with a cloud platform such as AWS, GCP, or Azure
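The "scalable APIs for machine learning models" requirement above can be sketched without any framework: a minimal pure-stdlib WSGI app wrapping a toy scoring function. The model weights and response shape are hypothetical stand-ins; a real service would use Flask or FastAPI for routing, validation, and concurrency.

```python
import json


def predict(features):
    # Toy stand-in for a real model: score = weighted sum of features.
    weights = [0.5, -0.25, 1.0]  # hypothetical "learned" weights
    return sum(w * x for w, x in zip(weights, features))


def app(environ, start_response):
    # Minimal WSGI handler: POST a JSON body {"features": [...]} to any path.
    try:
        size = int(environ.get("CONTENT_LENGTH") or 0)
        data = json.loads(environ["wsgi.input"].read(size) or b"{}")
        score = predict(data["features"])
        payload = json.dumps({"score": score}).encode()
        start_response("200 OK", [("Content-Type", "application/json")])
        return [payload]
    except (KeyError, ValueError):
        start_response("400 Bad Request", [("Content-Type", "application/json")])
        return [b'{"error": "invalid request"}']
```

To serve it locally: `from wsgiref.simple_server import make_server; make_server("", 8000, app).serve_forever()`.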
TOP SKILLS
Python (Language)
Spark Framework
Spark Streaming
Docker/Jenkins/Spinnaker
AWS
Hive Queries
The candidate should be a strong coder.
Preferred: Airflow
Must-have experience:
- Python
- Spark framework and streaming
- Exposure to the machine learning lifecycle (mandatory)
Project:
This is a search-domain project. For any search activity happening on the website, this team creates the corresponding model: data scientists build sorting/scoring models for each search. The team works mostly on the streaming side of data, so the candidate would work extensively on Spark Streaming, and there will be a lot of machine learning work.
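To illustrate the kind of "sorting/scored model for search" described above (the team's actual models are learned by data scientists and are not shown here), a toy TF-IDF-style scorer in plain Python ranks documents for a query; every name in it is a hypothetical stand-in:

```python
import math
from collections import Counter


def score_documents(query, docs):
    """Rank docs for a query with a toy TF-IDF-style score.

    query: string; docs: list of strings. Returns (doc_index, score)
    pairs sorted best-first. A crude stand-in for a learned ranker.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n = len(docs)
    # Smoothed inverse document frequency for each query term.
    idf = {}
    for term in query.lower().split():
        df = sum(1 for toks in tokenized if term in toks)
        idf[term] = math.log((n + 1) / (df + 1)) + 1
    # Score each document: term frequency weighted by idf.
    scores = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        scores.append((i, sum(tf[t] * idf[t] for t in idf)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Rare query terms get a higher idf weight, so documents matching them rise in the ranking; a production system would replace this formula with a trained model fed by streaming features.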
INTERVIEW INFORMATION
3-4 rounds.
1st round based on data engineering batching experience.
2nd round based on data engineering streaming experience.
3rd round based on ML lifecycle (the 3rd round can be a techno-functional round based on previous
feedback; otherwise, a 4th functional round will be held if required).
Object-oriented languages (e.g. Python, PySpark, Java, C#, C++) and frameworks (e.g. J2EE or .NET)
About us
DataWeave provides Retailers and Brands with “Competitive Intelligence as a Service” that enables them to take key decisions that impact their revenue. Powered by AI, we provide easily consumable and actionable competitive intelligence by aggregating and analyzing billions of publicly available data points on the Web to help businesses develop data-driven strategies and make smarter decisions.
Data Science@DataWeave
We, the Data Science team at DataWeave (called Semantics internally), build the core machine learning backend and the structured domain knowledge needed to deliver insights through our data products. Our underpinnings are: innovation, business awareness, long-term thinking, and pushing the envelope. We are a fast-paced lab within the org, applying the latest research in Computer Vision, Natural Language Processing, and Deep Learning to hard problems in different domains.
How we work
It's hard to tell what we love more, problems or solutions! Every day, we choose to address some of the hardest data problems that there are. We are in the business of making sense of messy public data on the web. At serious scale!
What do we offer?
● Some of the most challenging research problems in NLP and Computer Vision. Huge text and image
datasets that you can play with!
● Ability to see the impact of your work and the value you're adding to our customers almost immediately.
● Opportunity to work on different problems and explore a wide variety of tools to figure out what really
excites you.
● A culture of openness. Fun work environment. A flat hierarchy. Organization wide visibility. Flexible
working hours.
● Learning opportunities with courses and tech conferences. Mentorship from seniors in the team.
● Last but not least, competitive salary packages and fast-paced growth opportunities.
Who are we looking for?
The ideal candidate is a strong software developer or a researcher with experience building and shipping production grade data science applications at scale. Such a candidate has keen interest in liaising with the business and product teams to understand a business problem, and translate that into a data science problem.
You are also expected to develop capabilities that open up new business productization opportunities.
We are looking for someone with a Master's degree and 1+ years of experience working on problems in NLP or Computer Vision.
If you have 4+ years of relevant experience with a Master's degree (PhD preferred), you will be considered for a senior role.
Key problem areas
● Preprocessing and feature extraction of noisy and unstructured data -- both text as well as images.
● Keyphrase extraction, sequence labeling, entity relationship mining from texts in different domains.
● Document clustering, attribute tagging, data normalization, classification, summarization, sentiment
analysis.
● Image based clustering and classification, segmentation, object detection, extracting text from images,
generative models, recommender systems.
● Ensemble approaches for all the above problems using multiple text and image based techniques.
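As a toy illustration of the keyphrase-extraction problem area above (real systems use sequence labeling or statistical models, not this), a frequency-based extractor in plain Python; the stopword list is a tiny hypothetical sample:

```python
from collections import Counter

# Tiny hypothetical stopword sample; real pipelines use much larger lists.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}


def top_keyphrases(text, k=3):
    """Return the k most frequent non-stopword unigrams as crude keyphrases."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    counts = Counter(t for t in tokens if t and t not in STOPWORDS)
    return [word for word, _ in counts.most_common(k)]
```

Even this crude baseline shows the shape of the task: normalize noisy text, filter function words, and surface the terms that characterize a document.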
Relevant set of skills
● Have a strong grasp of concepts in computer science, probability and statistics, linear algebra, calculus,
optimization, algorithms and complexity.
● Background in one or more of information retrieval, data mining, statistical techniques, natural language
processing, and computer vision.
● Excellent coding skills on multiple programming languages with experience building production grade
systems. Prior experience with Python is a bonus.
● Experience building and shipping machine learning models that solve real world engineering problems.
Prior experience with deep learning is a bonus.
● Experience building robust clustering and classification models on unstructured data (text, images, etc).
Experience working with Retail domain data is a bonus.
● Ability to process noisy and unstructured data to enrich it and extract meaningful relationships.
● Experience working with a variety of tools and libraries for machine learning and visualization, including
NumPy, Matplotlib, scikit-learn, Keras, PyTorch, and TensorFlow.
● Use the command line like a pro. Be proficient in Git and other essential software development tools.
● Working knowledge of large-scale computational models such as MapReduce and Spark is a bonus.
● Be a self-starter: someone who thrives in fast-paced environments with minimal ‘management’.
● It's a huge bonus if you have some personal projects (including open source contributions) that you work
on during your spare time. Show off some of the projects you have hosted on GitHub.
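The MapReduce model mentioned in the skills above can be sketched in plain Python: map emits (key, value) pairs, shuffle groups them by key, and reduce aggregates each group. Frameworks like Hadoop or Spark distribute these same phases across machines; the word-count example below is the classic single-process illustration.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)


def reduce_phase(pairs):
    # Shuffle: sort then group pairs by key.
    # Reduce: sum the counts within each group.
    counts = {}
    key = itemgetter(0)
    for word, group in groupby(sorted(pairs, key=key), key=key):
        counts[word] = sum(value for _, value in group)
    return counts


def word_count(lines):
    return reduce_phase(map_phase(lines))
```

In a distributed setting, many mappers run `map_phase` on separate input splits, the framework performs the sort/shuffle over the network, and reducers each receive all pairs for a subset of keys.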
Role and responsibilities
● Understand the business problems we are solving. Build data science capabilities that align with our product strategy.
● Conduct research. Do experiments. Quickly build throwaway prototypes to solve problems pertaining to the Retail domain.
● Build robust clustering and classification models in an iterative manner that can be used in production.
● Constantly think scale, think automation. Measure everything. Optimize proactively.
● Take end to end ownership of the projects you are working on. Work with minimal supervision.
● Help scale our delivery, customer success, and data quality teams with constant algorithmic improvements and automation.
● Take initiatives to build new capabilities. Develop business awareness. Explore productization opportunities.
● Be a tech thought leader. Add passion and vibrance to the team. Push the envelope. Be a mentor to junior members of the team.
● Stay on top of the latest research in deep learning, NLP, Computer Vision, and other relevant areas.