- Design thinking to really understand the business problem
- Understanding new ways to deliver (agile, DT)
- Being able to do a functional design across S/4HANA and SCP. An understanding of the possibilities around automation/RPA (which should include UiPath, Blue Prism, and Contextor) and how these can be identified and embedded in business processes
- Following on from this, the same is true for AI and ML: what is available in the SAP standard, how it can be enhanced or developed further, and how these technologies can be embedded in the business process. There is no point in understanding only the standard process, or only the AI and ML components; we will need a new type of hybrid SAP practitioner.
Similar jobs
- Work in collaboration with the application team and integration team to design, create, and maintain optimal data pipeline architecture and data structures for Data Lake/Data Warehouse.
- Work with stakeholders including the Sales, Product, and Customer Support teams to assist with data-related technical issues and support their data analytics needs.
- Assemble large, complex data sets from third-party vendors to meet business requirements.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL, Elasticsearch, MongoDB, and AWS technology.
- Streamline existing and introduce enhanced reporting and analysis solutions that leverage complex data sources derived from multiple internal systems.
Requirements
- 5+ years of experience in a Data Engineer role.
- Proficiency in Linux.
- Must have SQL knowledge and experience working with relational databases and query authoring, as well as familiarity with databases including MySQL, MongoDB, Cassandra, and Athena.
- Must have experience with Python/Scala.
- Must have experience with Big Data technologies like Apache Spark.
- Must have experience with Apache Airflow (a minimal DAG sketch follows this list).
- Experience with data pipeline and ETL tools like AWS Glue.
- Experience working with AWS cloud services: EC2, S3, RDS, Redshift.
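As a point of reference for the Airflow and ETL items above, here is a minimal, illustrative Airflow 2.x DAG; the DAG id, task names, schedule, and toy payload are placeholder assumptions, not anything specified in the posting.

```python
# Minimal illustrative Airflow 2.x DAG: a two-step extract -> load pipeline.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Placeholder extract step; a real task would query an API, database, or file drop.
    return [{"id": 1, "amount": 9.99}]


def load(**context):
    rows = context["ti"].xcom_pull(task_ids="extract")
    print(f"would load {len(rows)} rows into the warehouse")


with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task
```

In practice the extract and load callables would be replaced by operators for the specific sources and sinks (for example, Glue jobs, Spark submits, or SQL statements).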
Requirements:
● B.Tech/Masters in Mathematics, Statistics, Computer Science or another quantitative field
● 2-5 years of work experience in the ML domain
● Hands-on coding experience in Python
● Experience in machine learning techniques such as regression, classification, predictive modeling, clustering, the deep learning stack, and NLP.
● Working knowledge of TensorFlow/PyTorch (a toy PyTorch training loop follows this list)
Optional Add-ons:
● Experience with distributed computing frameworks: MapReduce, Hadoop, Spark, etc.
● Experience with databases: MongoDB
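To make the TensorFlow/PyTorch requirement above concrete, below is a toy PyTorch training loop on synthetic data; the network size, data, and hyperparameters are illustrative assumptions rather than part of the role.

```python
# Toy binary-classification training loop in PyTorch on synthetic data.
import torch
from torch import nn

# Synthetic dataset: label is 1 when the feature sum is positive.
X = torch.randn(256, 4)
y = (X.sum(dim=1) > 0).float().unsqueeze(1)

model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

for epoch in range(50):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

accuracy = ((model(X) > 0).float() == y).float().mean().item()
print(f"final loss={loss.item():.3f}, train accuracy={accuracy:.2f}")
```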
Requirements:
● Understanding our data sets and how to bring them together.
● Working with our engineering team to support custom solutions offered to product development.
● Bridging the gap between development, engineering, and data ops.
● Creating, maintaining and documenting scripts to support ongoing custom solutions.
● Excellent organizational skills, including attention to precise details
● Strong multitasking skills and ability to work in a fast-paced environment
● 5+ years experience with Python to develop scripts.
● Know your way around RESTful APIs (able to integrate; publishing not necessary).
● You are familiar with pulling and pushing files via SFTP and AWS S3 (a short transfer sketch appears after this listing).
● Experience with any Cloud solutions including GCP / AWS / OCI / Azure.
● Familiarity with SQL programming to query and transform data from relational Databases.
● Familiarity with Linux (and a Linux working environment).
● Excellent written and verbal communication skills
● Extracting, transforming, and loading data into internal databases and Hadoop
● Optimizing our new and existing data pipelines for speed and reliability
● Deploying product builds and product improvements
● Documenting and managing multiple repositories of code
● Experience with SQL and NoSQL databases (Cassandra, MySQL)
● Hands-on experience in data pipelining and ETL (any of these frameworks/tools: Hadoop, BigQuery, Redshift, Athena)
● Hands-on experience in Airflow
● Understanding of best practices and common coding patterns around storing, partitioning, warehousing, and indexing of data
● Experience in reading data from Kafka topics (both live stream and offline)
● Experience in PySpark and DataFrames (see the PySpark-Kafka sketch after this list)
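As a sketch of the Kafka and PySpark items above, the snippet below does a batch ("offline") read of a Kafka topic into a DataFrame; the topic name and bootstrap servers are placeholders, and the spark-sql-kafka connector package is assumed to be on the classpath.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka_batch_read").getOrCreate()

# Batch read of a Kafka topic into a DataFrame (topic/brokers are placeholders).
df = (
    spark.read.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "events")
    .option("startingOffsets", "earliest")
    .load()
)

# Kafka delivers keys/values as binary; cast to strings before downstream parsing.
values = df.selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
values.show(5, truncate=False)

# For the live-stream case, swap spark.read for spark.readStream and
# write the result out with writeStream instead of show().
```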
Responsibilities:
● Collaborating across an agile team to continuously design, iterate, and develop big data systems.
● Extracting, transforming, and loading data into internal databases.
● Optimizing our new and existing data pipelines for speed and reliability.
● Deploying new products and product improvements.
● Documenting and managing multiple repositories of code.
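For the SFTP/S3 bullet in this listing, here is a short, illustrative transfer sketch using paramiko and boto3; the host, credentials, bucket, and paths are placeholders, and in practice credentials would come from a secrets store rather than being hard-coded.

```python
# Illustrative: pull a file from SFTP, then push it to S3.
import boto3
import paramiko

# Pull from SFTP (host, user, password, and paths are placeholders).
transport = paramiko.Transport(("sftp.example.com", 22))
transport.connect(username="user", password="secret")
sftp = paramiko.SFTPClient.from_transport(transport)
sftp.get("/outbound/report.csv", "/tmp/report.csv")
sftp.close()
transport.close()

# Push to S3 (bucket and key are placeholders; credentials come from the AWS config).
s3 = boto3.client("s3")
s3.upload_file("/tmp/report.csv", "my-bucket", "incoming/report.csv")
```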
Data Engineer - Senior
Cubera is a data company revolutionizing big data analytics and Adtech through data-share-value principles, wherein users entrust their data to us. We refine the art of understanding, processing, extracting, and evaluating the data that is entrusted to us. We are a gateway for brands to increase their lead efficiency as the world moves towards Web3.
What are you going to do?
Design and develop high-performance, scalable solutions that meet the needs of our customers.
Work closely with Product Management, Architects, and cross-functional teams.
Build and deploy large-scale systems in Java/Python.
Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
Create data tools for analytics and data scientist team members that assist them in building and optimizing their algorithms.
Follow best practices that can be adopted in the big data stack.
Use your engineering experience and technical skills to drive features and mentor engineers.
What are we looking for (Competencies):
Bachelor’s degree in computer science, computer engineering, or related technical discipline.
Overall 5 to 8 years of programming experience in Java/Python, including object-oriented design.
Data handling frameworks: Should have a working knowledge of one or more data handling frameworks like Hive, Spark, Storm, Flink, Beam, Airflow, NiFi, etc.
Data Infrastructure: Should have experience in building, deploying, and maintaining applications on popular cloud infrastructure like AWS, GCP, etc.
Data Store: Must have expertise in one of the general-purpose NoSQL data stores like Elasticsearch, MongoDB, Redis, Redshift, etc. (a short MongoDB sketch follows this list).
Strong sense of ownership, focus on quality, responsiveness, efficiency, and innovation.
Ability to work with distributed teams in a collaborative and productive manner.
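As a small illustration of the NoSQL data-store expectation above, here is a pymongo sketch that writes and queries a document; the connection string, database, and collection names are assumptions made for the example.

```python
# Illustrative pymongo usage: insert one document and query it back.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
events = client["analytics"]["events"]

events.insert_one({"user_id": 42, "action": "click", "ts": "2024-06-01T12:00:00Z"})
for doc in events.find({"user_id": 42}).limit(5):
    print(doc)
```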
Benefits:
Competitive Salary Packages and benefits.
A collaborative, lively, and upbeat work environment with young professionals.
Job Category: Development
Job Type: Full Time
Job Location: Bangalore
Job Responsibilities:
- Develop robust, scalable and maintainable machine learning models to answer business problems against large data sets.
- Build methods for document clustering, topic modeling, text classification, named entity recognition, sentiment analysis, and POS tagging.
- Perform elements of data cleaning, feature selection and feature engineering and organize experiments in conjunction with best practices.
- Benchmark, apply, and test algorithms against success metrics. Interpret the results in terms of relating those metrics to the business process.
- Work with development teams to ensure models can be implemented as part of a delivered solution replicable across many clients.
- Knowledge of Machine Learning, NLP, Document Classification, Topic Modeling and Information Extraction with a proven track record of applying them to real problems.
- Experience working with big data systems and big data concepts.
- Ability to provide clear and concise communication both with other technical teams and non-technical domain specialists.
- Strong team player; the ability both to make a strong individual contribution and to work as a team and contribute to wider goals is a must in this dynamic environment.
- Experience with noisy and/or unstructured textual data.
- Knowledge of knowledge graphs and NLP, including summarization, topic modelling, etc. (a minimal topic-modelling sketch follows this listing).
- Strong coding ability with statistical analysis tools in Python or R, and general software development skills (source code management, debugging, testing, deployment, etc.)
- Working knowledge of various text mining algorithms and their use-cases such as keyword extraction, PLSA, LDA, HMM, CRF, deep learning & recurrent ANN, word2vec/doc2vec, Bayesian modeling.
- Strong understanding of text pre-processing and normalization techniques, such as tokenization, POS tagging, and parsing, and how they work at a low level.
- Excellent problem solving skills.
- Strong verbal and written communication skills
- Master's or higher in data mining or machine learning, or equivalent practical analytics/modelling experience
- Practical experience in using NLP related techniques and algorithms
- Experience in open-source coding and communities is desirable.
- Able to containerize models and associated modules and work in a microservices environment
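To ground the topic-modelling and text-mining items in this listing, here is a minimal scikit-learn sketch that fits LDA on the 20 Newsgroups corpus and prints the top terms per topic; the corpus, vectorizer settings, and topic count are illustrative choices, not requirements.

```python
# Minimal topic-modelling sketch: bag-of-words + Latent Dirichlet Allocation.
from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Small slice of a public corpus (downloaded on first use).
docs = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes")).data[:2000]

vectorizer = CountVectorizer(max_df=0.95, min_df=5, stop_words="english")
X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=10, random_state=0)
lda.fit(X)

terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top_terms = [terms[i] for i in topic.argsort()[-8:][::-1]]
    print(f"topic {k}: {', '.join(top_terms)}")
```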
- Minimum 3-4 years of experience with ETL tools, SQL, SSAS & SSIS
- Good understanding of Data Governance, including Master Data Management (MDM) and Data Quality tools and processes
- Knowledge of programming languages, e.g. JSON, Python, R
- Hands on experience of SQL database design
- Experience working with REST API
- Influencing and supporting project delivery through involvement in project/sprint planning and QA
- Working experience with Azure
- Stakeholder management
- Good communication skills
Minimum of 4 years' experience working on DW/ETL projects and expert hands-on working knowledge of ETL tools.
Experience with Data Management & data warehouse development
Star schemas, Data Vaults, RDBMS, and ODS
Change Data capture
Slowly changing dimensions (a minimal Type 2 sketch follows this listing)
Data governance
Data quality
Partitioning and tuning
Data Stewardship
Survivorship
Fuzzy Matching
Concurrency
Vertical and horizontal scaling
ELT, ETL
Spark, Hadoop, MPP, RDBMS
Experience with DevOps architecture, implementation, and operation
Hands-on working knowledge of Unix/Linux
Building complex SQL queries. Expert SQL and data analysis skills; ability to debug and fix data issues.
Complex ETL program design and coding
Experience in shell scripting and batch scripting.
Good communication (oral and written) and interpersonal skills
Work closely with business teams to understand their business needs and participate in requirements gathering, while creating artifacts and seeking business approval.
Help the business define new requirements; participate in end-user meetings to derive and define business requirements; propose cost-effective solutions for data analytics; and familiarize the team with customer needs, specifications, design targets, and techniques to support task performance and delivery.
Propose good designs and solutions and ensure adherence to design and standards best practices.
Review and propose industry-best tools and technologies for ever-changing business rules and data sets. Conduct proofs of concept (POCs) with new tools and technologies to derive convincing benchmarks.
Prepare the plan, design and document the architecture, High-Level Topology Design, Functional Design, and review the same with customer IT managers and provide detailed knowledge to the development team to familiarize them with customer requirements, specifications, design standards and techniques.
Review code developed by other programmers, mentor, guide and monitor their work ensuring adherence to programming and documentation policies.
Work with functional business analysts to ensure that application programs are functioning as defined.
Capture user feedback/comments on the delivered systems and document them for the client's and project manager's review. Review all deliverables before final delivery to the client for quality adherence.
Technologies (Select based on requirement)
Databases - Oracle, Teradata, Postgres, SQL Server, Big Data, Snowflake, or Redshift
Tools – Talend, Informatica, SSIS, Matillion, Glue, or Azure Data Factory
Utilities for bulk loading and extracting
Languages – SQL, PL-SQL, T-SQL, Python, Java, or Scala
JDBC/ODBC, JSON
Data virtualization and data services development
Service Delivery - REST, Web Services
Data Virtualization Delivery – Denodo
ELT, ETL
Cloud certification – Azure
Complex SQL Queries
Data ingestion, data modeling (domain), consumption (RDBMS)
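As a minimal illustration of the slowly-changing-dimensions item in this listing, the pandas sketch below applies a Type 2 change (expire the old row, insert a new current row); the table layout, column names, and dates are assumptions made for the example.

```python
# Minimal SCD Type 2 sketch with pandas: expire changed rows, append new versions.
import pandas as pd

# Current dimension table: one row per customer version.
dim = pd.DataFrame({
    "customer_id": [1, 2],
    "city": ["Pune", "Delhi"],
    "valid_from": pd.to_datetime(["2023-01-01", "2023-01-01"]),
    "valid_to": pd.to_datetime([pd.NaT, pd.NaT]),
    "is_current": [True, True],
})

# Incoming change from the source system: customer 1 moved to Mumbai.
changes = pd.DataFrame({"customer_id": [1], "city": ["Mumbai"]})
load_date = pd.Timestamp("2024-06-01")

# Expire the currently-active rows for changed customers...
expired = dim["customer_id"].isin(changes["customer_id"]) & dim["is_current"]
dim.loc[expired, "valid_to"] = load_date
dim.loc[expired, "is_current"] = False

# ...then append the new versions as the current rows.
new_rows = changes.assign(valid_from=load_date, valid_to=pd.NaT, is_current=True)
dim = pd.concat([dim, new_rows], ignore_index=True)
print(dim)
```

In a warehouse this same logic would typically be expressed as a MERGE/UPSERT inside the ETL tool rather than in pandas.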
- Banking Domain
- Assist the team in building Machine learning/AI/Analytics models on open-source stack using Python and the Azure cloud stack.
- Be part of the internal data science team at fragma data, which provides data science consultation to large organizations such as banks, e-commerce companies, and social media companies on their scalable AI/ML needs in the cloud, and help build POCs and develop production-ready solutions.
- Candidates will be provided with opportunities for training and professional certifications on the job in these areas - Azure Machine learning services, Microsoft Customer Insights, Spark, Chatbots, DataBricks, NoSQL databases etc.
- Assist the team in conducting AI demos, talks, and workshops occasionally to large audiences of senior stakeholders in the industry.
- Work on large enterprise scale projects end-to-end, involving domain specific projects across banking, finance, ecommerce, social media etc.
- Keen interest in learning new technologies and the latest developments and applying them to assigned projects.
Desired Skills
- Professional hands-on coding experience in Python: over 1 year for Data Scientist, and over 3 years for Senior Data Scientist.
- This is primarily a programming/development-oriented role; hence strong programming skills in writing object-oriented and modular code in Python, and experience pushing projects to production, are important.
- Strong foundational knowledge and professional experience in:
- Machine Learning (Compulsory)
- Deep Learning (Compulsory)
- Strong knowledge of at least one of: Natural Language Processing, Computer Vision, Speech Processing, or Business Analytics
- Understanding of Database technologies and SQL. (Compulsory)
- Knowledge of the following Frameworks:
- Scikit-learn (Compulsory)
- Keras/TensorFlow/PyTorch (at least one of these is compulsory)
- API development in Python for ML models (good to have; a minimal serving sketch follows this listing)
- Excellent communication skills are necessary to succeed in this role, as it has high external visibility and offers multiple opportunities to present data science results to a large external audience, including VPs, Directors, CXOs, etc. Communication skills will therefore be a key consideration in the selection process.
SQL, Python, NumPy, Pandas; knowledge of Hive and data warehousing concepts will be a plus.
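As a sketch of the "API development in Python for ML models" item in the desired skills above, here is a minimal Flask service wrapping a scikit-learn classifier; the framework choice, route, and feature format are assumptions for illustration, not requirements from the posting.

```python
# Minimal sketch: serve a scikit-learn model behind a JSON prediction endpoint.
from flask import Flask, jsonify, request
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train a toy model at startup; a real service would load a persisted model instead.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]  # e.g. [5.1, 3.5, 1.4, 0.2]
    prediction = int(model.predict([features])[0])
    return jsonify({"prediction": prediction})


if __name__ == "__main__":
    app.run(port=8000)
```

The endpoint can be exercised by POSTing a JSON body such as {"features": [5.1, 3.5, 1.4, 0.2]} to /predict.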
JD
- Strong analytical skills with the ability to collect, organise, analyse and interpret trends or patterns in complex data sets and provide reports & visualisations.
- Work with management to prioritise business KPIs and information needs; locate and define new process improvement opportunities.
- Technical expertise with data models, database design and development, data mining and segmentation techniques
- Proven success in a collaborative, team-oriented environment
- Working experience with geospatial data will be a plus.