Understand various raw input data formats, build consumers on Kafka/ksqlDB for them, and ingest large volumes of raw data into Flink and Spark (a minimal ingestion sketch follows this list).
Conduct complex data analysis and report on results.
Build aggregation streams for data and convert raw data into logical processing streams.
Build algorithms to integrate multiple sources of data into a unified data model.
Build the unified data model on both SQL and NoSQL databases to act as the data sink.
Communicate designs effectively to the full-stack engineering team for development.
Explore machine learning models that can be fitted on top of the data pipelines.
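By way of illustration, here is a minimal ingestion sketch using PySpark Structured Streaming, assuming a local Kafka broker, a JSON "events" topic, and the spark-sql-kafka connector on the classpath; the topic name, schema, and paths are hypothetical, not from this posting.

    # Minimal sketch: consume a Kafka topic into Spark Structured Streaming.
    # Broker address, topic name, schema, and output paths are assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

    spark = SparkSession.builder.appName("raw-ingest").getOrCreate()

    # Hypothetical schema for the raw JSON events.
    schema = StructType([
        StructField("event_id", StringType()),
        StructField("value", DoubleType()),
        StructField("ts", TimestampType()),
    ])

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "localhost:9092")
           .option("subscribe", "events")
           .load())

    # Kafka delivers bytes; decode and parse the JSON payload into typed columns.
    events = (raw.select(from_json(col("value").cast("string"), schema).alias("e"))
                 .select("e.*"))

    (events.writeStream
           .format("parquet")
           .option("path", "/data/raw/events")
           .option("checkpointLocation", "/data/checkpoints/events")
           .start())

A Flink consumer would play the same role: raw bytes from Kafka are decoded, typed against a schema, and landed somewhere durable before aggregation streams are built on top.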
Mandatory Qualifications & Skills:
Deep knowledge of Scala and Java programming languages is mandatory
Strong background in streaming data frameworks (Apache Flink, Apache Spark) is mandatory
Good understanding of, and hands-on skills with, streaming messaging platforms such as Kafka
Familiarity with R, C and Python is an asset
Analytical mind and business acumen with strong math skills (e.g. statistics, algebra)
Problem-solving aptitude
Excellent communication and presentation skills
Job Description - Data Engineer
About us
Propellor aims to bring marketing analytics and other business workflows to the cloud ecosystem. We work with international clients to make their analytics ambitions come true, deploying the latest tech stack and data science and engineering methods to make their business data insightful and actionable.
What is the role?
This team is responsible for building a Data Platform for many different units. The platform will be built on the cloud, so in this role the individual will organize and orchestrate different data sources and give recommendations on the services that fulfil goals based on the type of data.
Qualifications:
• Experience with Python, SQL, Spark
• Knowledge/notions of JavaScript
• Knowledge of data processing, data modeling, and algorithms
• Strong in data, software, and system design patterns and architecture
• Building and maintaining APIs (a minimal sketch follows the nice-to-have list below)
• Strong soft skills, communication
Nice to have:
• Experience with cloud: Google Cloud Platform, AWS, Azure
• Knowledge of Google Analytics 360 and/or GA4.
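By way of illustration, here is a minimal sketch of the kind of API a data platform might expose for a front end to consume, assuming Flask; the endpoint path and payload are hypothetical, not from this posting.

    # Minimal sketch, assuming Flask; the endpoint path and payload are
    # hypothetical, not from this posting.
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/api/metrics")
    def metrics():
        # A real service would query the data platform here.
        return jsonify({"sessions": 1234, "conversion_rate": 0.042})

    if __name__ == "__main__":
        app.run(port=8000)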
Key Responsibilities
• Design and develop the platform based on a microservices architecture.
• Work on the core backend and ensure it meets the performance benchmarks.
• Work on the front end with ReactJS.
• Design and develop APIs for the front end to consume.
• Constantly improve the architecture of the application by clearing the technical backlog.
• Meet both technical and consumer needs.
• Stay abreast of developments in web applications and programming languages.
What are we looking for?
An enthusiastic individual with the following skills. Please do not hesitate to apply even if you do not match all of them; we are open to promising candidates who are passionate about their work and are team players.
• Education - BE/MCA or equivalent.
• Agnostic/polyglot across multiple tech stacks.
• Worked on open-source and related technologies – NodeJS, ReactJS, MySQL, MongoDB, DynamoDB, and other NoSQL stores.
• Good experience with Front-end technologies like ReactJS.
• Backend exposure – good knowledge of building APIs.
• Worked on serverless technologies.
• Efficient at building microservices that combine the server and the front end.
• Knowledge of cloud architecture.
• Should have sound working experience with relational and columnar databases.
• Should be innovative and communicative in approach.
• Will be responsible for the functional/technical track of a project.
Whom will you work with?
You will closely work with the engineering team and support the Product Team.
The hiring process includes:
a. A written test on Python and SQL
b. 2-3 rounds of interviews
Immediate joiners will be preferred.
Data Engineer_Scala
Job Description:
We are looking for a Big Data Engineer who has worked across the entire ETL stack: someone who has ingested data in batch and live-stream formats, transformed large volumes of data daily, built data warehouses to store the transformed data, and integrated different visualization dashboards and applications with the data stores. The primary focus will be on choosing optimal solutions for these purposes, then implementing, maintaining, and monitoring them.
Responsibilities:
- Develop, test, and implement data solutions based on functional/non-functional business requirements.
- You will be required to code in Scala and PySpark daily, on cloud as well as on-prem infrastructure.
- Build data models to store the data in the most optimized manner.
- Identify, design, and implement process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Implement the ETL process and an optimal data pipeline architecture (a minimal PySpark sketch follows this list).
- Monitor performance and advise on any necessary infrastructure changes.
- Create data tools for analytics and data scientist team members that assist them in building and optimizing our product into an innovative industry leader.
- Work with data and analytics experts to strive for greater functionality in our data systems.
- Proactively identify potential production issues and recommend and implement solutions
- Must be able to write quality code and build secure, highly available systems.
- Create design documents that describe the functionality, capacity, architecture, and process.
- Review peers' code and pipelines before deployment to production, checking for optimization issues and adherence to code standards
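As referenced in the list above, here is a minimal PySpark batch-ETL sketch; the paths, column names, and partitioning key are illustrative assumptions, not from this posting.

    # Minimal PySpark batch-ETL sketch; paths, column names, and the
    # partitioning key are illustrative assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, to_date

    spark = SparkSession.builder.appName("daily-etl").getOrCreate()

    # Extract: read one day's raw JSON drop.
    raw = spark.read.json("/data/raw/2024-01-01/")

    # Transform: deduplicate, derive a typed date column, and filter nulls.
    clean = (raw
             .dropDuplicates(["event_id"])
             .withColumn("event_date", to_date(col("ts")))
             .filter(col("value").isNotNull()))

    # Load: write partitioned Parquet for downstream consumers.
    (clean.write
          .mode("append")
          .partitionBy("event_date")
          .parquet("/data/warehouse/events"))

The same shape applies on cloud or on-prem infrastructure; only the storage URIs change.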
Skill Sets:
- Good understanding of optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and ‘big data’ technologies.
- Proficient understanding of distributed computing principles
- Experience working with batch-processing/real-time systems using various open-source technologies like NoSQL, Spark, Pig, Hive, and Apache Airflow.
- Implemented complex projects dealing with considerable data sizes (petabyte scale).
- Optimization techniques (performance, scalability, monitoring, etc.)
- Experience with integration of data from multiple data sources
- Experience with NoSQL databases, such as HBase, Cassandra, MongoDB, etc.
- Knowledge of various ETL techniques and frameworks, such as Flume
- Experience with various messaging systems, such as Kafka or RabbitMQ
- Creation of DAGs for data engineering (a minimal Airflow sketch follows this list)
- Expert at Python/Scala programming, especially for data engineering/ETL purposes
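As referenced in the list above, here is a minimal Airflow DAG sketch, assuming Airflow 2.x; the task bodies and schedule are illustrative placeholders.

    # Minimal sketch, assuming Airflow 2.x; task bodies and schedule are
    # illustrative placeholders.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        print("pull raw data from the sources")

    def transform():
        print("clean, join, and aggregate")

    with DAG(
        dag_id="daily_etl",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)
        extract_task >> transform_task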
You will:
- Architect and implement modules for ingesting, storing and manipulating large data sets for a variety of cybersecurity use-cases.
- Write code to provide backend support for data-driven UI widgets, web dashboards, workflows, search and API connectors.
- Design and implement high performance APIs between our frontend and backend components, and between different backend components.
- Build production quality solutions that balance complexity and performance
- Participate in the engineering life-cycle at Balbix, including designing high quality UI components, writing production code, conducting code reviews and working alongside our backend infrastructure and reliability teams
- Stay current on the ever-evolving technology landscape of web based UIs and recommend new systems for incorporation in our technology stack.
You are:
- Product-focused and passionate about building truly usable systems
- Collaborative and comfortable working across teams including data engineering, front end, product management, and DevOps
- Responsible and like to take ownership of challenging problems
- A good communicator who facilitates teamwork via good documentation practices
- Comfortable with ambiguity and able to iterate quickly in response to an evolving understanding of customer needs
- Curious about the world and your profession, and a constant learner
You have:
- A BS in Computer Science or a related field
- At least 3 years of experience in the backend web stack (Node.js, MongoDB, Redis, Elasticsearch, Postgres, Java, Python, Docker, Kubernetes, etc.)
- SQL and NoSQL database experience
- Experience building APIs (development experience using GraphQL is a plus)
- Familiarity with issues of web performance, availability, scalability, reliability, and maintainability
- Bring in industry best practices around creating and maintaining robust data pipelines for complex data projects with or without an AI component:
- Programmatically ingest data from several static and real-time sources, incl. web scraping (a minimal scraping sketch follows this list)
- Render results through dynamic interfaces, incl. web/mobile/dashboard, with the ability to log usage and granular user feedback
- Performance-tune and optimally implement complex Python scripts (using Spark), SQL (using stored procedures, Hive), and NoSQL queries in a production environment
- Industrialize ML/DL solutions, deploy and manage production services, and proactively handle data issues arising on live apps
- Perform ETL on large and complex datasets for AI applications; work closely with data scientists on performance optimization of large-scale ML/DL model training
- Build data tools to facilitate fast data cleaning and statistical analysis
- Ensure data architecture is secure and compliant
- Resolve issues escalated from Business and Functional areas on data quality, accuracy, and availability
- Work closely with APAC CDO and coordinate with a fully decentralized team across different locations in APAC and global HQ (Paris).
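As referenced in the list above, here is a minimal web-scraping ingestion sketch, assuming the requests and BeautifulSoup libraries; the URL and the table layout are hypothetical.

    # Minimal sketch, assuming requests and BeautifulSoup; the URL and the
    # table layout are hypothetical.
    import requests
    from bs4 import BeautifulSoup

    resp = requests.get("https://example.com/prices", timeout=30)
    resp.raise_for_status()

    soup = BeautifulSoup(resp.text, "html.parser")
    rows = []
    for tr in soup.select("table tr")[1:]:  # skip the header row
        cells = [td.get_text(strip=True) for td in tr.select("td")]
        if cells:
            rows.append(cells)

    print(f"scraped {len(rows)} rows")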
You should be:
- Expert in structured and unstructured data in traditional and big data environments – Oracle/SQL Server, MongoDB, Hive/Pig, BigQuery, and Spark
- Excellent in Python programming, in both traditional and distributed models (PySpark)
- Expert in shell scripting and writing schedulers
- Hands-on experience with Cloud - deploying complex data solutions in hybrid cloud / on-premise environment both for data extraction/storage and computation
- Hands-on experience in deploying production apps using large volumes of data with state-of-the-art technologies like Docker, Kubernetes, and Kafka
- Strong knowledge of data security best practices
- 5+ years experience in a data engineering role
- Science / Engineering graduate from a Tier-1 university in the country
- And most importantly, you must be a passionate coder who really cares about building apps that help people do things better, smarter, and faster, even while they sleep
- Hands-on experience with any cloud platform
- Microsoft Azure experience
- 3-6 years of relevant work experience in a Data Engineering role.
- Advanced working SQL knowledge: experience with relational databases and query authoring, as well as working familiarity with a variety of databases.
- Experience building and optimizing data pipelines, architectures, and data sets.
- Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
- Strong analytic skills related to working with unstructured datasets.
- A good understanding of Airflow, Spark, NoSQL databases, and Kafka is nice to have.
- Premium Institute Candidates only
About antuit.ai
Antuit.ai is the leader in AI-powered SaaS solutions for Demand Forecasting & Planning, Merchandising and Pricing. We have the industry’s first solution portfolio – powered by Artificial Intelligence and Machine Learning – that can help you digitally transform your Forecasting, Assortment, Pricing, and Personalization solutions. World-class retailers and consumer goods manufacturers leverage antuit.ai solutions, at scale, to drive outsized business results globally with higher sales, margin and sell-through.
Antuit.ai’s executives, comprised of industry leaders from McKinsey, Accenture, IBM, and SAS, and our team of Ph.Ds., data scientists, technologists, and domain experts, are passionate about delivering real value to our clients. Antuit.ai is funded by Goldman Sachs and Zodius Capital.
The Role:
Antuit.ai is interested in hiring a Principal Data Scientist. This person will facilitate standing up a standardization and automation ecosystem for ML product delivery, and will also actively participate in managing the implementation, design, and tuning of the product to meet business needs.
Responsibilities:
Responsibilities include, but are not limited to, the following:
- Manage and provide technical expertise to the delivery team, including recommending solution alternatives, identifying risks, and managing business expectations.
- Design and build reliable, scalable automated processes for large-scale machine learning.
- Use engineering expertise to help design solutions to novel problems in software development, data engineering, and machine learning.
- Collaborate with Business, Technology, and Product teams to stand up the MLOps process.
- Apply your experience in making intelligent, forward-thinking technical decisions to deliver the ML ecosystem, including implementing new standards, architecture designs, and workflow tools.
- Deep dive into complex algorithmic and product issues in production
- Own metrics and reporting for delivery team.
- Set a clear vision for team members and work cohesively to attain it.
- Mentor and coach team members
Qualifications and Skills:
Requirements
- Engineering degree in any stream
- Has at least 7 years of prior experience in building ML driven products/solutions
- Excellent programming skills in any one of C++, Python, or Java.
- Hands-on experience with open-source libraries and frameworks: TensorFlow, PyTorch, MLflow, Kubeflow, etc. (a minimal MLflow sketch follows this list).
- Developed and productized large-scale models/algorithms in prior experience
- Can drive fast prototypes/proofs of concept to evaluate various technologies, frameworks, and performance benchmarks.
- Familiar with software development practices/pipelines (DevOps: Kubernetes, Docker containers, CI/CD tools).
- Good verbal, written and presentation skills.
- Ability to learn new skills and technologies.
- 3+ years working with retail or CPG preferred.
- Experience in forecasting and optimization problems, particularly in the CPG / Retail industry preferred.
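As referenced in the list above, here is a minimal MLflow experiment-tracking sketch; the run name, parameters, and metric value are illustrative placeholders.

    # Minimal sketch, assuming the mlflow library; run name, parameters,
    # and the metric value are illustrative placeholders.
    import mlflow

    with mlflow.start_run(run_name="demand-forecast-baseline"):
        mlflow.log_param("model", "gradient_boosting")
        mlflow.log_param("learning_rate", 0.1)

        # ... model training would happen here ...
        rmse = 12.3  # stand-in for a real evaluation result

        mlflow.log_metric("rmse", rmse)

The same pattern extends to logging models and artifacts, which supports the kind of standardized ML delivery described above.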
Information Security Responsibilities
- Understand and adhere to Information Security policies, guidelines, and procedures, and practice them to protect organizational data and information systems.
- Take part in Information Security training and act accordingly while handling information.
- Report all suspected security and policy breaches to the Infosec team or the appropriate authority (CISO).
EEOC
Antuit.ai is an at-will, equal opportunity employer. We consider applicants for all positions without regard to race, color, religion, national origin or ancestry, gender identity, sex, age (40+), marital status, disability, veteran status, or any other legally protected status under local, state, or federal law.
- Extract and present valuable information from data
- Understand business requirements and generate insights
- Build mathematical models, validate and work with them
- Explain complex topics tailored to the audience
- Validate and follow up on results
- Work with large and complex data sets
- Establish priorities with clear goals and responsibilities to achieve a high level of performance.
- Work in an agile and iterative manner on solving problems
- Proactively evaluate different options and solve problems in innovative ways; develop new solutions or combine existing methods to create new approaches.
- Good understanding of digital and analytics
- Strong communication skills, orally and in writing
Job Overview:
As a Data Scientist, you will work in collaboration with our business and engineering people on creating value from data. The work often requires solving complex problems by turning vast amounts of data into business insights through advanced analytics, modeling, and machine learning. You have a strong foundation in analytics, mathematical modeling, computer science, and math, coupled with a strong business sense. You proactively fetch information from various sources and analyze it to better understand how the business performs. Furthermore, you model and build AI tools that automate certain processes within the company. The solutions produced will be implemented to impact business results.
Primary Responsibilities:
- Develop an understanding of business obstacles, create solutions based on advanced analytics and draw implications for model development
- Combine, explore, and draw insights from data – often large and complex data assets from different parts of the business.
- Design and build explorative, predictive, or prescriptive models, utilizing optimization, simulation, and machine learning techniques (a minimal sketch follows this list)
- Prototype and pilot new solutions and be a part of the aim of ‘productizing’ those valuable solutions that can have an impact at a global scale
- Guide and coach other chapter colleagues to help solve data/technical problems at an operational level, and in methodologies to help improve development processes
- Identify and interpret trends and patterns in complex data sets to enable the business to make data-driven decisions
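As referenced in the list above, here is a minimal predictive-modeling sketch, assuming scikit-learn; the synthetic data stands in for real business data.

    # Minimal sketch, assuming scikit-learn; synthetic data stands in for
    # real business data.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.metrics import mean_absolute_error
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 4))  # four hypothetical features
    y = X @ np.array([1.5, -2.0, 0.5, 0.0]) + rng.normal(scale=0.3, size=500)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = GradientBoostingRegressor().fit(X_train, y_train)
    print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))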
ETL Talend developer
To be considered for a Senior Data Engineer position, a person must have a proven track record of architecting data solutions on current and advanced technical platforms. They must have the leadership ability to lead a team providing data-centric solutions with best practices and modern technologies in mind. They look to build collaborative relationships across all levels of the business and the IT organization. They possess analytic and problem-solving skills, and the ability to research and provide appropriate guidance for synthesizing complex information and extracting business value. They have the intellectual curiosity and ability to deliver solutions with creativity and quality, work effectively with the business and customers to obtain business value for the requested work, and can communicate technical results to both technical and non-technical users using effective storytelling techniques and visualizations. They have a demonstrated ability to perform high-quality work with innovation, both independently and collaboratively.
Data Engineer
We are looking for a Data Engineer who will be responsible for collecting, storing, processing, and analyzing huge sets of data coming from different sources.
Responsibilities
- Working with Big Data tools and frameworks to provide requested capabilities
- Identifying development needs in order to improve and streamline operations
- Developing and managing BI solutions
- Implementing ETL processes and data warehousing
- Monitoring performance and managing infrastructure
Skills
- Proficient understanding of distributed computing principles
- Proficiency with Hadoop and Spark
- Experience building stream-processing systems using solutions such as Kafka and Spark Streaming
- Good knowledge of data querying tools such as SQL and Hive (a minimal sketch follows this list)
- Knowledge of various ETL techniques and frameworks
- Experience with Python/Java/Scala (at least one)
- Experience with cloud services such as AWS or GCP
- Experience with NoSQL databases such as DynamoDB and MongoDB will be an advantage
- Excellent written and verbal communication skills
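As referenced in the list above, here is a minimal sketch of querying a Hive-managed table through Spark SQL, assuming a Spark installation with Hive support; the database and table names are hypothetical.

    # Minimal sketch, assuming a Spark installation with Hive support; the
    # database and table names are hypothetical.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-query")
             .enableHiveSupport()
             .getOrCreate())

    daily = spark.sql("""
        SELECT event_date, COUNT(*) AS events
        FROM warehouse.events
        GROUP BY event_date
        ORDER BY event_date
    """)
    daily.show()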