Role: Data Engineer
Company: PayU
Location: Bangalore / Mumbai
Experience: 2-5 yrs
About Company:
PayU is the payments and fintech business of Prosus, a global consumer internet group and one of the largest technology investors in the world. Operating and investing globally in markets with long-term growth potential, Prosus builds leading consumer internet companies that empower people and enrich communities.
The leading online payment service provider in 36 countries, PayU is dedicated to creating a fast, simple and efficient payment process for merchants and buyers. Focused on empowering people through financial services and creating a world without financial borders where everyone can prosper, PayU is one of the biggest investors in the fintech space globally, with investments totalling $700 million to date. PayU also specializes in credit products and services for emerging markets across the globe. We are dedicated to removing risks to merchants, allowing consumers to use credit in ways that suit them and enabling a greater number of global citizens to access credit services.
Our local operations in Asia, Central and Eastern Europe, Latin America, the Middle East, Africa and South East Asia enable us to combine the expertise of high growth companies with our own unique local knowledge and technology to ensure that our customers have access to the best financial services.
India is the biggest market for PayU globally and the company has already invested $400 million in this region in the last 4 years. PayU, in its next phase of growth, is developing a full regional fintech ecosystem providing multiple digital financial services in one integrated experience. We are going to do this through 3 mechanisms: build; co-build/partner; and select strategic investments.
PayU supports 350,000+ merchants and millions of consumers making payments online with over 250 payment methods and 1,800+ payment specialists. The markets in which PayU operates represent a potential consumer base of nearly 2.3 billion people and a huge growth potential for merchants.
Job responsibilities:
- Design infrastructure for data, especially, but not limited to, data consumed in machine learning applications
- Define database architecture needed to combine and link data, and ensure integrity across different sources
- Ensure performance of data systems serving everything from machine learning workloads to customer-facing web and mobile applications built on cutting-edge open source frameworks, highly available RESTful services, and back-end Java-based systems
- Work with large, fast, complex data sets to solve difficult, non-routine analysis problems, applying advanced data handling techniques if needed
- Build data pipelines, including implementing, testing, and maintaining infrastructural components of the data engineering stack.
- Work closely with Data Engineers, ML Engineers and SREs to gather data engineering requirements to prototype, develop, validate and deploy data science and machine learning solutions
Requirements to be successful in this role:
- Strong knowledge of and experience in Python, Pandas, data wrangling, ETL processes, statistics, data visualisation, data modelling, and Informatica.
- Strong experience with scalable compute solutions such as Kafka and Snowflake
- Strong experience with workflow management libraries and tools such as Airflow, AWS Step Functions, etc. (a minimal Airflow sketch follows this list)
- Strong experience with data engineering practices (e.g. data ingestion pipelines and ETL)
- A good understanding of machine learning methods, algorithms, pipelines, testing practices and frameworks
- (Preferred) MEng/MSc/PhD degree in computer science, engineering, mathematics, physics, or equivalent (preference: DS/AI)
- Experience with designing and implementing tools that support sharing of data, code, practices across organizations at scale
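For illustration only, the sketch below shows the kind of Airflow-orchestrated, Pandas-based ETL pipeline the requirements above refer to. The DAG id, file paths, and column name are hypothetical placeholders, not part of the role description.

```python
# Minimal, illustrative Airflow DAG: extract a raw CSV, clean it, and stage it as Parquet.
# All paths and names are hypothetical.
from datetime import datetime

import pandas as pd
from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_orders():
    # Hypothetical source file; in practice this would read from an upstream system.
    df = pd.read_csv("/tmp/orders_raw.csv")
    df.to_parquet("/tmp/orders_staged.parquet")


def load_orders():
    df = pd.read_parquet("/tmp/orders_staged.parquet")
    # Simple wrangling step: drop duplicate order ids before loading downstream.
    df = df.drop_duplicates(subset=["order_id"])
    df.to_parquet("/tmp/orders_clean.parquet")


with DAG(
    dag_id="orders_etl_example",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_orders", python_callable=extract_orders)
    load = PythonOperator(task_id="load_orders", python_callable=load_orders)
    extract >> load
```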
Similar jobs
● Experience with various stream processing and batch processing tools (Kafka, Spark, etc.). Programming with Python.
● Experience with relational and non-relational databases.
● Fairly good understanding of AWS (or any equivalent).
Key Responsibilities
● Design new systems and redesign existing systems to work at scale.
● Care about things like fault tolerance, durability, backups and recovery, performance, maintainability, code simplicity, etc.
● Lead a team of software engineers and help create an environment of ownership and learning.
● Introduce best practices of software development and ensure their adoption across the team.
● Help set and maintain coding standards for the team.
● Proficiency in Linux.
● Must have SQL knowledge and experience working with relational databases, query authoring (SQL), as well as familiarity with databases including MySQL, Mongo, Cassandra, and Athena.
● Must have experience with Python/Scala.
● Must have experience with Big Data technologies like Apache Spark (a small PySpark sketch follows this list).
● Must have experience with Apache Airflow.
● Experience with data pipeline and ETL tools like AWS Glue.
● Experience working with AWS cloud services: EC2, S3, RDS, Redshift.
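As a rough illustration of the Spark-on-AWS stack listed above, here is a minimal PySpark batch job. The S3 bucket, prefixes, and column names are hypothetical.

```python
# Illustrative PySpark batch job: aggregate successful payments per merchant and
# write the result back to S3 as partition-friendly Parquet.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily_payments_batch").getOrCreate()

# Read raw events from S3 (hypothetical bucket/prefix).
events = spark.read.json("s3a://example-bucket/payments/raw/2024-01-01/")

# Keep only successful transactions and aggregate per merchant.
daily_totals = (
    events.filter(F.col("status") == "SUCCESS")
    .groupBy("merchant_id")
    .agg(F.sum("amount").alias("total_amount"), F.count("*").alias("txn_count"))
)

# Write Parquet output for downstream consumption (e.g. Redshift Spectrum or Athena).
daily_totals.write.mode("overwrite").parquet("s3a://example-bucket/payments/daily_totals/")
```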
Your responsibilities:
- Build, improve and extend NLP capabilities
- Research and evaluate different approaches to NLP problems
- Must be able to write well-designed code that produces deliverable results
- Write code that scales and can be deployed to production
- Fundamentals of statistical methods are a must
- Experience in named entity recognition, POS tagging, lemmatization, vector representations of textual data, and neural networks (RNN, LSTM); see the small spaCy sketch after this list
- A solid foundation in Python, data structures, algorithms, and general software development skills.
- Ability to apply machine learning to problems that deal with language
- Engineering ability to build robustly scalable pipelines
- Ability to work in a multi-disciplinary team with a strong product focus
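For illustration, a small spaCy example touching the NER, POS tagging, and lemmatization skills listed above. It assumes the en_core_web_sm model is installed, and the sample sentence is made up.

```python
# Illustrative spaCy snippet: named entities, POS tags, and lemmas for one sentence.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("PayU processed the refund for Acme Corp in Bangalore last Friday.")

# Named entities with their labels.
for ent in doc.ents:
    print(ent.text, ent.label_)

# Token-level POS tags and lemmas.
for token in doc:
    print(token.text, token.pos_, token.lemma_)
```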
Responsibilities
- Work on execution and scheduling of all tasks related to assigned projects' deliverable dates
- Optimize and debug existing code to make it scalable and improve performance
- Design, develop, and deliver tested code and machine learning models into production environments
- Work effectively in teams, managing and leading teams
- Provide effective, constructive feedback to the delivery leader
- Manage client expectations and work with an agile mindset with machine learning and AI technology
- Design and prototype data-driven solutions
Eligibility
- Highly experienced in designing, building, and shipping scalable, production-quality machine learning algorithms in Python applications
- Working knowledge and experience in NLP core components (NER, Entity Disambiguation, etc.)
- In-depth expertise in Data Munging and Storage (Experienced in SQL, NoSQL, MongoDB, Graph Databases)
- Expertise in writing scalable APIs for machine learning models (see the minimal serving example after this list)
- Experience with maintaining code logs, task schedulers, and security
- Working knowledge of machine learning techniques, feed-forward, recurrent and convolutional neural networks, entropy models, supervised and unsupervised learning
- Experience with at least one of the following: Keras, TensorFlow, Caffe, or PyTorch
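As a hedged illustration of "scalable APIs for machine learning models", here is a minimal Flask-based serving sketch. The model file and feature names are hypothetical, and a production service would add input validation, batching, and monitoring.

```python
# Illustrative model-serving API: load a pickled model and expose a /predict endpoint.
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

with open("model.pkl", "rb") as f:  # hypothetical pre-trained scikit-learn model
    model = pickle.load(f)


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()
    # Hypothetical feature names; real services would validate the payload.
    features = [[payload["feature_a"], payload["feature_b"]]]
    prediction = model.predict(features)[0]
    return jsonify({"prediction": float(prediction)})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```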
What you will do:
- Understand the processes, KPIs, and pain points of CaaStle business teams
- Build scalable data products, self-service tools, data cubes to analyze and present data associated with acquisition, retention, product performance, operations, client services, etc.
- Closely partner with data engineering, product, and business teams and participate in requirements capture, research design, data collection, dashboard generation, and translation of results into actionable insights that can add value for business stakeholders
- Leverage advanced analytics to drive key success metrics for business and revenue generation
- Operationalize, implement, and automate changes to drive data-driven decisions
- Attend and play an active role in answering questions from the executive and/or business teams through data mining and analysis
We would love for you to have:
- Education: Advanced degree in Computer Science, Statistics, Mathematics, Engineering, Economics, Business Analytics or related field is required
- Experience: 2-4 years of professional experience
- Proficiency in data visualization/reporting tools (e.g. Tableau, QlikView)
- Experience in A/B testing and measuring the performance of experiments (a small example follows this list)
- Strong proficiency with SQL-based languages. Experience with large-scale data analytics technologies (e.g., Hadoop and Spark)
- Strong analytical skills and business mindset with the ability to translate complex concepts and analysis into clear and concise takeaways to drive insights and strategies
- Excellent communication, social, and presentation skills with meticulous attention to detail
- Programming experience in Python, R, or other languages
- Knowledge of Data mining, statistical modeling approaches, and techniques
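For illustration of the A/B-testing requirement above, a simple two-proportion significance check using statsmodels. The visitor and conversion counts are made-up numbers.

```python
# Illustrative A/B test evaluation: compare conversion rates of control vs. variant.
from statsmodels.stats.proportion import proportions_ztest

conversions = [420, 480]     # conversions in control and variant (made-up)
visitors = [10_000, 10_000]  # visitors exposed to each arm (made-up)

z_stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
# A p-value below the chosen threshold (e.g. 0.05) suggests the variant's
# conversion rate differs from control; otherwise the difference may be noise.
```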
CaaStle is committed to equality of opportunity in employment. It has been and will continue to be the policy of CaaStle to provide full and equal employment opportunities to all employees and candidates for employment without regard to race, color, religion, national or ethnic origin, veteran status, age, sexual orientation, gender identity, or physical or mental disability. This policy applies to all terms, conditions and privileges of employment, such as those pertaining to training, transfer, promotion, compensation and recreational programs.
Job Responsibilities:
- Develop robust, scalable and maintainable machine learning models to answer business problems against large data sets.
- Build methods for document clustering, topic modeling, text classification, named entity recognition, sentiment analysis, and POS tagging.
- Perform elements of data cleaning, feature selection and feature engineering and organize experiments in conjunction with best practices.
- Benchmark, apply, and test algorithms against success metrics. Interpret the results in terms of relating those metrics to the business process.
- Work with development teams to ensure models can be implemented as part of a delivered solution replicable across many clients.
- Knowledge of Machine Learning, NLP, Document Classification, Topic Modeling and Information Extraction with a proven track record of applying them to real problems.
- Experience working with big data systems and big data concepts.
- Ability to provide clear and concise communication both with other technical teams and non-technical domain specialists.
- Strong team player; ability to provide both a strong individual contribution but also work as a team and contribute to wider goals is a must in this dynamic environment.
- Experience with noisy and/or unstructured textual data.
- Knowledge graph and NLP, including summarization, topic modelling, etc.
- Strong coding ability with statistical analysis tools in Python or R, and general software development skills (source code management, debugging, testing, deployment, etc.)
- Working knowledge of various text mining algorithms and their use cases, such as keyword extraction, PLSA, LDA, HMM, CRF, deep learning & recurrent ANN, word2vec/doc2vec, Bayesian modeling (a small topic-modelling sketch follows this list).
- Strong understanding of text pre-processing and normalization techniques, such as tokenization, POS tagging, and parsing, and how they work at a low level.
- Excellent problem solving skills.
- Strong verbal and written communication skills
- Master's or higher in data mining or machine learning, or equivalent practical analytics/modelling experience
- Practical experience in using NLP related techniques and algorithms
- Experience in open source coding and communities desirable.
- Able to containerize models and associated modules and work in a microservices environment
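For illustration of the text-mining techniques above (tokenization plus LDA topic modelling), a tiny scikit-learn sketch on a made-up toy corpus.

```python
# Illustrative topic modelling: tokenize a toy corpus and fit a 2-topic LDA model.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "payment failed card declined refund issued",
    "merchant settlement report generated daily",
    "customer complaint about delayed refund",
    "settlement file uploaded to merchant portal",
]

# Tokenize and build a document-term matrix.
vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(docs)

# Fit LDA and print the top words per topic.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(dtm)
terms = vectorizer.get_feature_names_out()
for idx, topic in enumerate(lda.components_):
    top_terms = [terms[i] for i in topic.argsort()[-4:]]
    print(f"topic {idx}: {top_terms}")
```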
- Gathering project requirements from customers and supporting their requests.
- Creating project estimates and scoping the solution based on clients’ requirements.
- Delivery on key project milestones in line with project plan/budget.
- Establishing individual project plans and working with the team in prioritizing production schedules.
- Communication of milestones with the team and to clients via scheduled work-in-progress meetings
- Designing and documenting product requirements.
- Possess good analytical skills; detail-oriented
- Be familiar with Microsoft applications and have working knowledge of MS Excel
- Knowledge of MIS Reports & Dashboards
- Maintaining strong customer relationships with a positive, can-do attitude
Required Experience: 5-7 Years
Skills: ADF, Azure, SSIS, Python
Job Description
Azure Data Engineer with hands-on SSIS migration and ADF expertise.
Roles & Responsibilities
• Overall, 6+ years’ experience in Cloud Data Engineering, with hands-on experience in ADF (Azure Data Factory), is required.
• Hands-on experience with SSIS to ADF migration is preferred.
• Migration of SQL Server Integration Services (SSIS) workloads to SSIS in ADF (must have done at least one migration).
• Hands-on experience implementing Azure Data Factory frameworks, scheduling, and performance tuning.
• Hands-on experience in migrating SSIS solutions to ADF.
• Hands-on experience on the ADF coding side.
• Hands-on experience with MPP database architecture.
• Hands-on experience in Python (a small example of triggering an ADF pipeline from Python follows this list).
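As a rough sketch of driving ADF from Python, the following assumes the azure-identity and azure-mgmt-datafactory packages are available; the subscription, resource group, factory, and pipeline names are hypothetical placeholders.

```python
# Illustrative only: trigger an ADF pipeline run from Python and poll its status.
import time

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

SUBSCRIPTION_ID = "<subscription-id>"    # placeholder
RESOURCE_GROUP = "rg-data-platform"      # hypothetical
FACTORY_NAME = "adf-example-factory"     # hypothetical
PIPELINE_NAME = "pl_copy_ssis_workload"  # hypothetical

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Kick off a pipeline run.
run = client.pipelines.create_run(RESOURCE_GROUP, FACTORY_NAME, PIPELINE_NAME, parameters={})

# Poll until the run reaches a terminal state.
while True:
    status = client.pipeline_runs.get(RESOURCE_GROUP, FACTORY_NAME, run.run_id).status
    if status not in ("Queued", "InProgress"):
        break
    time.sleep(30)

print(f"Pipeline finished with status: {status}")
```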