- Strong Python Coding skills and OOP skills
- Should have worked on Big Data product Architecture
- Should have worked with any one of the SQL-based databases like MySQL, PostgreSQL and any one of
- NoSQL-based databases such as Cassandra, Elasticsearch etc.
- Hands on experience on frameworks like Spark RDD, DataFrame, Dataset
- Experience on development of ETL for data product
- Candidate should have working knowledge on performance optimization, optimal resource utilization, Parallelism and tuning of spark jobs
- Working knowledge on file formats: CSV, JSON, XML, PARQUET, ORC, AVRO
- Good to have working knowledge with any one of the Analytical Databases like Druid, MongoDB, Apache Hive etc.
- Experience to handle real-time data feeds (good to have working knowledge on Apache Kafka or similar tool)
- Python and Scala (Optional), Spark / PySpark, Parallel programming
About A2Tech Consultants
Similar jobs
Data Scientist – Program Embedded
Job Description:
We are seeking a highly skilled and motivated senior data scientist to support a big data program. The successful candidate will play a pivotal role in supporting multiple projects in this program covering traditional tasks from revenue management, demand forecasting, improving customer experience to testing/using new tools/platforms such as Copilot Fabric for different purpose. The expected candidate would have deep expertise in machine learning methodology and applications. And he/she should have completed multiple large scale data science projects (full cycle from ideation to BAU). Beyond technical expertise, problem solving in complex set-up will be key to the success for this role. This is a data science role directly embedded into the program/projects, stake holder management and collaborations with patterner are crucial to the success on this role (on top of the deep expertise).
What we are looking for:
- Highly efficient in Python/Pyspark/R.
- Understand MLOps concepts, working experience in product industrialization (from Data Science point of view). Experience in building product for live deployment, and continuous development and continuous integration.
- Familiar with cloud platforms such as Azure, GCP, and the data management systems on such platform. Familiar with Databricks and product deployment on Databricks.
- Experience in ML projects involving techniques: Regression, Time Series, Clustering, Classification, Dimension Reduction, Anomaly detection with traditional ML approaches and DL approaches.
- Solid background in statistics, probability distributions, A/B testing validation, univariate/multivariate analysis, hypothesis test for different purpose, data augmentation etc.
- Familiar with designing testing framework for different modelling practice/projects based on business needs.
- Exposure to Gen AI tools and enthusiastic about experimenting and have new ideas on what can be done.
- If they have improved an internal company process using an AI tool, that would be great (e.g. process simplification, manual task automation, auto emails)
- Ideally, 10+ years of experience, and have been on independent business facing roles.
- CPG or retail as a data scientist would be nice, but not number one priority, especially for those who have navigated through multiple industries.
- Being proactive and collaborative would be essential.
Some projects examples within the program:
- Test new tools/platforms such as Copilo, Fabric for commercial reporting. Testing, validation and build trust.
- Building algorithms for predicting trend in category, consumptions to support dashboards.
- Revenue Growth Management, create/understand the algorithms behind the tools (can be built by 3rd parties) we need to maintain or choose to improve. Able to prioritize and build product roadmap. Able to design new solutions and articulate/quantify the limitation of the solutions.
- Demand forecasting, create localized forecasts to improve in store availability. Proper model monitoring for early detection of potential issues in the forecast focusing particularly on improving the end user experience.
● Able contribute to the gathering of functional requirements, developing technical
specifications, and project & test planning
● Demonstrating technical expertise, and solving challenging programming and design
problems
● Roughly 80% hands-on coding
● Generate technical documentation and PowerPoint presentations to communicate
architectural and design options, and educate development teams and business users
● Resolve defects/bugs during QA testing, pre-production, production, and post-release
patches
● Work cross-functionally with various bidgely teams including: product management,
QA/QE, various product lines, and/or business units to drive forward results
Requirements
● BS/MS in computer science or equivalent work experience
● 2-4 years’ experience designing and developing applications in Data Engineering
● Hands-on experience with Big data Eco Systems.
● Hadoop,Hdfs,Map Reduce,YARN,AWS Cloud, EMR, S3, Spark, Cassandra, Kafka,
Zookeeper
● Expertise with any of the following Object-Oriented Languages (OOD): Java/J2EE,Scala,
Python
● Strong leadership experience: Leading meetings, presenting if required
● Excellent communication skills: Demonstrated ability to explain complex technical
issues to both technical and non-technical audiences
● Expertise in the Software design/architecture process
● Expertise with unit testing & Test-Driven Development (TDD)
● Experience on Cloud or AWS is preferable
● Have a good understanding and ability to develop software, prototypes, or proofs of
concepts (POC's) for various Data Engineering requirements.
Requirements
Experience
- 5+ years of professional experience in implementing MLOps framework to scale up ML in production.
- Hands-on experience with Kubernetes, Kubeflow, MLflow, Sagemaker, and other ML model experiment management tools including training, inference, and evaluation.
- Experience in ML model serving (TorchServe, TensorFlow Serving, NVIDIA Triton inference server, etc.)
- Proficiency with ML model training frameworks (PyTorch, Pytorch Lightning, Tensorflow, etc.).
- Experience with GPU computing to do data and model training parallelism.
- Solid software engineering skills in developing systems for production.
- Strong expertise in Python.
- Building end-to-end data systems as an ML Engineer, Platform Engineer, or equivalent.
- Experience working with cloud data processing technologies (S3, ECR, Lambda, AWS, Spark, Dask, ElasticSearch, Presto, SQL, etc.).
- Having Geospatial / Remote sensing experience is a plus.
Data Engineer
Mandatory Requirements
- Experience in AWS Glue
- Experience in Apache Parquet
- Proficient in AWS S3 and data lake
- Knowledge of Snowflake
- Understanding of file-based ingestion best practices.
- Scripting language - Python & pyspark
CORE RESPONSIBILITIES
- Create and manage cloud resources in AWS
- Data ingestion from different data sources which exposes data using different technologies, such as: RDBMS, REST HTTP API, flat files, Streams, and Time series data based on various proprietary systems. Implement data ingestion and processing with the help of Big Data technologies
- Data processing/transformation using various technologies such as Spark and Cloud Services. You will need to understand your part of business logic and implement it using the language supported by the base data platform
- Develop automated data quality check to make sure right data enters the platform and verifying the results of the calculations
- Develop an infrastructure to collect, transform, combine and publish/distribute customer data.
- Define process improvement opportunities to optimize data collection, insights and displays.
- Ensure data and results are accessible, scalable, efficient, accurate, complete and flexible
- Identify and interpret trends and patterns from complex data sets
- Construct a framework utilizing data visualization tools and techniques to present consolidated analytical and actionable results to relevant stakeholders.
- Key participant in regular Scrum ceremonies with the agile teams
- Proficient at developing queries, writing reports and presenting findings
- Mentor junior members and bring best industry practices
QUALIFICATIONS
- 5-7+ years’ experience as data engineer in consumer finance or equivalent industry (consumer loans, collections, servicing, optional product, and insurance sales)
- Strong background in math, statistics, computer science, data science or related discipline
- Advanced knowledge one of language: Java, Scala, Python, C#
- Production experience with: HDFS, YARN, Hive, Spark, Kafka, Oozie / Airflow, Amazon Web Services (AWS), Docker / Kubernetes, Snowflake
- Proficient with
- Data mining/programming tools (e.g. SAS, SQL, R, Python)
- Database technologies (e.g. PostgreSQL, Redshift, Snowflake. and Greenplum)
- Data visualization (e.g. Tableau, Looker, MicroStrategy)
- Comfortable learning about and deploying new technologies and tools.
- Organizational skills and the ability to handle multiple projects and priorities simultaneously and meet established deadlines.
- Good written and oral communication skills and ability to present results to non-technical audiences
- Knowledge of business intelligence and analytical tools, technologies and techniques.
Familiarity and experience in the following is a plus:
- AWS certification
- Spark Streaming
- Kafka Streaming / Kafka Connect
- ELK Stack
- Cassandra / MongoDB
- CI/CD: Jenkins, GitLab, Jira, Confluence other related tools
JOB TITLE - Product Development Engineer - Machine Learning
● Work Location: Hyderabad
● Full-time
Company Description
Phenom People is the leader in Talent Experience Marketing (TXM for short). We’re an early-stage startup on a mission to fundamentally transform how companies acquire talent. As a category creator, our goals are two-fold: to educate talent acquisition and HR leaders on the benefits of TXM and to help solve their recruiting pain points.
Job Responsibilities:
- Design and implement machine learning, information extraction, probabilistic matching algorithms and models
- Research and develop innovative, scalable and dynamic solutions to hard problems
- Work closely with Machine Learning Scientists (PhDs), ML engineers, data scientists and data engineers to address challenges head-on.
- Use the latest advances in NLP, data science and machine learning to enhance our products and create new experiences
- Scale machine learning algorithm that powers our platform to support our growing customer base and increasing data volume
- Be a valued contributor in shaping the future of our products and services
- You will be part of our Data Science & Algorithms team and collaborate with product management and other team members
- Be part of a fast pace, fun-focused, agile team
Job Requirement:
- 4+ years of industry experience
- Ph.D./MS/B.Tech in computer science, information systems, or similar technical field
- Strong mathematics, statistics, and data analytics
- Solid coding and engineering skills preferably in Machine Learning (not mandatory)
- Proficient in Java, Python, and Scala
- Industry experience building and productionizing end-to-end systems
- Knowledge of Information Extraction, NLP algorithms coupled with Deep Learning
- Experience with data processing and storage frameworks like Hadoop, Spark, Kafka etc.
Position Summary
We’re looking for a Machine Learning Engineer to join our team of Phenom. We are expecting the below points to full fill this role.
- Capable of building accurate machine learning models is the main goal of a machine learning engineer
- Linear Algebra, Applied Statistics and Probability
- Building Data Models
- Strong knowledge of NLP
- Good understanding of multithreaded and object-oriented software development
- Mathematics, Mathematics and Mathematics
- Collaborate with Data Engineers to prepare data models required for machine learning models
- Collaborate with other product team members to apply state-of-the-art Ai methods that include dialogue systems, natural language processing, information retrieval and recommendation systems
- Build large-scale software systems and numerical computation topics
- Use predictive analytics and data mining to solve complex problems and drive business decisions
- Should be able to design the accurate ML end-to-end architecture including the data flows, algorithm scalability, and applicability
- Tackle situations where problem is unknown and the Solution is unknown
- Solve analytical problems, and effectively communicate methodologies and results to the customers
- Adept at translating business needs into technical requirements and translating data into actionable insights
- Work closely with internal stakeholders such as business teams, product managers, engineering teams, and customer success teams.
Benefits
- Competitive salary for a startup
- Gain experience rapidly
- Work directly with the executive team
- Fast-paced work environment
About Phenom People
At PhenomPeople, we believe candidates (Job seekers) are consumers. That’s why we’re bringing e-commerce experience to the job search, with a view to convert candidates into applicants. The Intelligent Career Site™ platform delivers the most relevant and personalized job search yet, with a career site optimized for mobile and desktop interfaces designed to integrate with any ATS, tailored content selection like Glassdoor reviews, YouTube videos and LinkedIn connections based on candidate search habits and an integrated real-time recruiting analytics dashboard.
Use Company career sites to reach candidates and encourage them to convert. The Intelligent Career Site™ offers a single platform to serve candidates a modern e-commerce experience from anywhere on the globe and on any device.
We track every visitor that comes to the Company career site. Through fingerprinting technology, candidates are tracked from the first visit and served jobs and content based on their location, click-stream, behavior on site, browser and device to give each visitor the most relevant experience.
Like consumers, candidates research companies and read reviews before they apply for a job. Through our understanding of the candidate journey, we are able to personalize their experience and deliver relevant content from sources such as corporate career sites, Glassdoor, YouTube and LinkedIn.
We give you clear visibility into the Company's candidate pipeline. By tracking up to 450 data points, we build profiles for every career site visitor based on their site visit behavior, social footprint and any other relevant data available on the open web.
Gain a better understanding of Company’s recruiting spending and where candidates convert or drop off from Company’s career site. The real-time analytics dashboard offers companies actionable insights on optimizing source spending and the candidate experience.
Kindly explore about the company phenom (https://www.phenom.com/">https://www.phenom.com/)
Youtube - https://www.youtube.com/c/PhenomPeople">https://www.youtube.com/c/PhenomPeople
LinkedIn - https://www.linkedin.com/company/phenompeople/">https://www.linkedin.com/company/phenompeople/
https://www.phenom.com/">Phenom | Talent Experience Management
- Modeling complex problems, discovering insights, and identifying opportunities through the use of statistical, algorithmic, mining, and visualization techniques
- Experience working with business understanding the requirement, creating the problem statement, and building scalable and dependable Analytical solutions
- Must have hands-on and strong experience in Python
- Broad knowledge of fundamentals and state-of-the-art in NLP and machine learning
- Strong analytical & algorithm development skills
- Deep knowledge of techniques such as Linear Regression, gradient descent, Logistic Regression, Forecasting, Cluster analysis, Decision trees, Linear Optimization, Text Mining, etc
- Ability to collaborate across teams and strong interpersonal skills
Skills
- Sound theoretical knowledge in ML algorithm and their application
- Hands-on experience in statistical modeling tools such as R, Python, and SQL
- Hands-on experience in Machine learning/data science
- Strong knowledge of statistics
- Experience in advanced analytics / Statistical techniques – Regression, Decision trees, Ensemble machine learning algorithms, etc
- Experience in Natural Language Processing & Deep Learning techniques
- Pandas, NLTK, Scikit-learn, SpaCy, Tensorflow
- Analyze and organize raw data
- Build data systems and pipelines
- Evaluate business needs and objectives
- Interpret trends and patterns
- Conduct complex data analysis and report on results
- Build algorithms and prototypes
- Combine raw information from different sources
- Explore ways to enhance data quality and reliability
- Identify opportunities for data acquisition
- Should have experience in Python, Django Micro Service Senior developer with Financial Services/Investment Banking background.
- Develop analytical tools and programs
- Collaborate with data scientists and architects on several projects
- Should have 5+ years of experience as a data engineer or in a similar role
- Technical expertise with data models, data mining, and segmentation techniques
- Should have experience programming languages such as Python
- Hands-on experience with SQL database design
- Great numerical and analytical skills
- Degree in Computer Science, IT, or similar field; a Master’s is a plus
- Data engineering certification (e.g. IBM Certified Data Engineer) is a plus
- Banking Domain
- Assist the team in building Machine learning/AI/Analytics models on open-source stack using Python and the Azure cloud stack.
- Be part of the internal data science team at fragma data - that provides data science consultation to large organizations such as Banks, e-commerce Cos, Social Media companies etc on their scalable AI/ML needs on the cloud and help build POCs, and develop Production ready solutions.
- Candidates will be provided with opportunities for training and professional certifications on the job in these areas - Azure Machine learning services, Microsoft Customer Insights, Spark, Chatbots, DataBricks, NoSQL databases etc.
- Assist the team in conducting AI demos, talks, and workshops occasionally to large audiences of senior stakeholders in the industry.
- Work on large enterprise scale projects end-to-end, involving domain specific projects across banking, finance, ecommerce, social media etc.
- Keen interest to learn new technologies and latest developments and apply them to projects assigned.
Desired Skills |
- Professional Hands-on coding experience in python for over 1 year for Data scientist, and over 3 years for Sr Data Scientist.
- This is primarily a programming/development-
oriented role - hence strong programming skills in writing object-oriented and modular code in python and experience of pushing projects to production is important. - Strong foundational knowledge and professional experience in
- Machine learning, (Compulsory)
- Deep Learning (Compulsory)
- Strong knowledge of At least One of : Natural Language Processing or Computer Vision or Speech Processing or Business Analytics
- Understanding of Database technologies and SQL. (Compulsory)
- Knowledge of the following Frameworks:
- Scikit-learn (Compulsory)
- Keras/tensorflow/pytorch (At least one of these is Compulsory)
- API development in python for ML models (good to have)
- Excellent communication skills.
- Excellent communication skills are necessary to succeed in this role, as this is a role with high external visibility, and with multiple opportunities to present data science results to a large external audience that will include external VPs, Directors, CXOs etc.
- Hence communication skills will be a key consideration in the selection process.
2. Assemble large, complex data sets that meet business requirements
3. Identify, design, and implement internal process improvements
4. Optimize data delivery and re-design infrastructure for greater scalability
5. Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS technologies
6. Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other key business performance metrics
7. Work with internal and external stakeholders to assist with data-related technical issues and support data infrastructure needs
8. Create data tools for analytics and data scientist team members
Skills Required:
1. Working knowledge of ETL on any cloud (Azure / AWS / GCP)
2. Proficient in Python (Programming / Scripting)
3. Good understanding of any of the data warehousing concepts (Snowflake / AWS Redshift / Azure Synapse Analytics / Google Big Query / Hive)
4. In-depth understanding of principles of database structure
5. Good understanding of any of the ETL technologies (Informatica PowerCenter / AWS Glue / Data Factory / SSIS / Spark / Matillion / Talend / Azure)
6. Proficient in SQL (query solving)
7. Knowledge in Change case Management / Version Control – (VSS / DevOps / TFS / GitHub, Bit bucket, CICD Jenkin)