- Modeling complex problems, discovering insights, and identifying opportunities through the use of statistical, algorithmic, data-mining, and visualization techniques
- Experience working with the business to understand requirements, create the problem statement, and build scalable and dependable analytical solutions
- Must have strong, hands-on experience in Python
- Broad knowledge of the fundamentals and the state of the art in NLP and machine learning
- Strong analytical & algorithm development skills
- Deep knowledge of techniques such as linear regression, gradient descent, logistic regression, forecasting, cluster analysis, decision trees, linear optimization, text mining, etc. (see the sketch after this list)
- Ability to collaborate across teams and strong interpersonal skills
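As a hedged illustration of the first techniques in the list above, here is a minimal sketch of linear regression fitted by batch gradient descent; the learning rate, iteration count, and toy data are arbitrary choices for the example, not part of the role's requirements.

```python
import numpy as np

def linear_regression_gd(X, y, lr=0.1, n_iters=1000):
    """Fit y ~ X @ w + b by batch gradient descent on mean squared error."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        error = X @ w + b - y
        # Gradients of MSE with respect to w and b
        w -= lr * (2.0 / n_samples) * (X.T @ error)
        b -= lr * (2.0 / n_samples) * error.sum()
    return w, b

# Toy usage: recover w ~ 3, b ~ -2 from noisy synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 3 * X[:, 0] - 2 + rng.normal(scale=0.1, size=200)
print(linear_regression_gd(X, y))
```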
Skills
- Sound theoretical knowledge of ML algorithms and their applications
- Hands-on experience in statistical modeling tools such as R, Python, and SQL
- Hands-on experience in machine learning/data science
- Strong knowledge of statistics
- Experience in advanced analytics/statistical techniques – regression, decision trees, ensemble machine learning algorithms, etc.
- Experience in Natural Language Processing & Deep Learning techniques
- pandas, NLTK, scikit-learn, spaCy, TensorFlow (see the sketch below)
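To make the library list concrete, here is a minimal, hedged sketch of a text classifier built with scikit-learn; the toy documents and labels are invented purely for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy corpus, for illustration only
docs = ["refund my payment", "great product, love it",
        "payment failed again", "excellent service"]
labels = ["complaint", "praise", "complaint", "praise"]

# TF-IDF features feeding a logistic regression classifier
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(docs, labels)
print(clf.predict(["the payment did not go through"]))
```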
1. Core Responsibilities
· Lead solutions for data engineering
· Maintain the integrity of both the design and the data that is held within the architecture
· Champion and educate people in the development and use of data engineering best practice
· Support the Head of Data Engineering and lead by example
· Contribute to the development of database management services and associated processes relating to the delivery of data solutions
· Provide requirements analysis, documentation, development, delivery and maintenance of data platforms.
· Develop database requirements in a structured and logical manner, ensuring delivery is aligned with business prioritisation and best practice
· Design and deliver performance enhancements, application migration processes and version upgrades across a pipeline of BI environments.
· Provide support for the scoping and delivery of BI capability to internal users.
· Identify risks and issues and escalate to Line / Project manager.
· Work with clients, existing asset owners & their service providers, and non-BI development staff to clarify and deliver work-stream objectives in timescales that meet the overall project expectations.
· Develop and maintain documentation in support of all BI processes.
· Proactively identify cost-justifiable improvements to data manipulation processes.
· Research and promote relevant BI tools and processes that contribute to increased efficiency and capability in support of corporate objectives.
· Promote a culture that embraces change, continuous improvement and a ‘can do’ attitude.
· Demonstrate enthusiasm and self-motivation at all times.
· Establish effective working relationships with other internal teams to drive improved efficiency and effective processes.
· Be a champion for high-quality data and for the use of strategic data repositories, the associated relational model, and the Data Warehouse, optimising the delivery of accurate, consistent and reliable business intelligence
· Ensure that you fully understand and comply with the organisation’s Risk Management Policies as they relate to your area of responsibility, and demonstrate in your day-to-day work that you put customers at the heart of everything you do.
· Ensure that you fully understand and comply with the organisation’s Data Governance Policies as they relate to your area of responsibility, and demonstrate in your day-to-day work that you treat data as an important corporate asset which must be protected and managed.
· Maintain the company’s compliance standards and ensure timely completion of all mandatory on-line training modules and attestations.
2. Experience Requirements
· 5 years' Data Engineering / ETL development experience is essential
· 5 years' data design experience in an MI / BI / Analytics environment (Kimball, lakehouse, data lake) is essential
· 5 years' experience of working in a structured Change Management project lifecycle is essential
· Experience of working in a financial services environment is desirable
· Experience of dealing with senior management within a large organisation is desirable
· 5 years' experience of developing in conjunction with large, complex projects and programmes is desirable
· Experience mentoring other members of the team on best practice and internal standards is essential
· Experience with cloud data platforms (Microsoft Azure) is desirable
3. Knowledge Requirements
· A strong knowledge of business intelligence solutions and an ability to translate this into data solutions for the broader business is essential
· Strong demonstrable knowledge of data warehouse methodologies
· Robust understanding of high-level business processes is essential
· Understanding of data migration, including reconciliation, data cleansing and cutover, is desirable
The Merck Data Engineering Team is responsible for designing, developing, testing, and supporting automated end-to-end data pipelines and applications on Merck’s data management and global analytics platform (Palantir Foundry, Hadoop, AWS and other components).
The Foundry platform comprises multiple technology stacks, hosted either on Amazon Web Services (AWS) infrastructure or on-premises in Merck’s own data centers. Developing pipelines and applications on Foundry requires:
• Proficiency in SQL / Java / Python (Python is required; all three are not)
• Proficiency in PySpark for distributed computation (see the sketch after this list)
• Familiarity with Postgres and ElasticSearch
• Familiarity with HTML, CSS, and JavaScript, and basic design/visual competency
• Familiarity with common databases and access layers (e.g. MySQL, Microsoft SQL Server, JDBC); not all are required
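As a hedged sketch of the PySpark proficiency called out above (this is not Merck's actual Foundry code; the S3 paths, schema, and column names are invented), a minimal ingest-and-aggregate job might look like:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("example-pipeline").getOrCreate()

# Hypothetical input location and fields, for illustration only
raw = spark.read.json("s3://example-bucket/raw/events/")

daily_counts = (
    raw.filter(F.col("event_type").isNotNull())
       .withColumn("event_date", F.to_date("event_ts"))
       .groupBy("event_date", "event_type")
       .count()
)

daily_counts.write.mode("overwrite").parquet("s3://example-bucket/curated/event_counts/")
```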
This position will be project based and may work across multiple smaller projects or a single large project utilizing an agile project methodology.
Roles & Responsibilities:
• Develop data pipelines by ingesting various data sources – structured and unstructured – into Palantir Foundry
• Participate in the end-to-end project lifecycle, from requirements analysis to go-live and operations of an application
• Act as a business analyst, developing requirements for Foundry pipelines
• Review code developed by other data engineers, checking it against platform-specific standards, cross-cutting concerns, coding and configuration standards, and the functional specification of the pipeline
• Document technical work in a professional and transparent way, creating high-quality technical documentation
• Work out the best possible balance between technical feasibility and business requirements (the latter can be quite strict)
• Deploy applications on Foundry platform infrastructure with clearly defined checks
• Implement changes and bug fixes via Merck's change management framework and according to system engineering practices (additional training will be provided)
• Set up DevOps projects following Agile principles (e.g. Scrum)
• Besides working on projects, act as third-level support for critical applications; analyze and resolve complex incidents/problems, and debug problems across the full Foundry stack and code based on Python, PySpark, and Java
• Work closely with business users, data scientists/analysts to design physical data models
• 3+ years of experience in big data & data warehousing technologies
• Experience in processing and organizing large data sets
• Experience with big data tool sets such as Airflow and Oozie (see the sketch after this list)
• Experience working with BigQuery, Snowflake, or other MPP systems, as well as Kafka, Azure, GCP, and AWS
• Experience developing in programming languages such as SQL, Python, Java, or Scala
• Experience pulling data from a variety of database systems, such as SQL Server and MariaDB, and from NoSQL databases such as Cassandra
• Experience working with retail, advertising, or media data at large scale
• Experience working with data science engineering and advanced data-insights development
• Strong proponent of quality who strives to impress with their work
• Strong problem-solving skills and ability to navigate complicated database relationships
• Good written and verbal communication skills; demonstrated ability to work with product management and/or business users to understand their needs
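To ground the Airflow requirement above, here is a minimal, hedged DAG sketch (Airflow 2.4+ style; the DAG id, schedule, and task bodies are invented for illustration):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    # Placeholder: pull rows from a source system
    print("extracting...")

def load():
    # Placeholder: write rows to the warehouse
    print("loading...")

with DAG(
    dag_id="example_daily_etl",  # invented id, for illustration only
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task  # extract must finish before load runs
```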
Responsibilities:
- Design and develop robust analytics systems and predictive models
- Manage a team of data scientists, machine learning engineers, and big data specialists
- Identify valuable data sources and automate data collection processes
- Undertake pre-processing of structured and unstructured data
- Analyze large amounts of information to discover trends and patterns
- Build predictive models and machine-learning algorithms
- Combine models through ensemble modeling (see the sketch after this list)
- Present information using data visualization techniques
- Propose solutions and strategies to business challenges
- Collaborate with engineering and product development teams
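For the ensemble-modeling bullet above, a minimal, hedged scikit-learn sketch (the dataset is sklearn's built-in iris set and the model choices are arbitrary for the example):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Soft-voting ensemble combining two different model families
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)
print("test accuracy:", ensemble.score(X_test, y_test))
```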
Requirements:
- Proven experience as a Data Scientist
- Good experience in data mining processes
- Understanding of machine learning; knowledge of operations research is a plus
- Strong understanding of and experience in R, SQL, and Python; knowledge of Scala, Java, or C++ is an asset
- Experience using business intelligence tools (e.g. Tableau) and data frameworks (e.g. Hadoop)
- Strong math skills (e.g. statistics, algebra)
- Problem-solving aptitude
- Excellent communication and presentation skills
- Experience in Natural Language Processing (NLP)
- Strong competitive coding skills
- BSc/BA in Computer Science, Engineering or relevant field; graduate degree in Data Science or other quantitative field is preferred
Company Profile:
Easebuzz is a payment solutions company (a fintech organisation) that enables online merchants to accept, process, and disburse payments through developer-friendly APIs. We are focused on building plug-and-play products, including the payment infrastructure, to solve complete business problems. It is a place where payments, lending, subscriptions, and eKYC all happen at the same time.
We have been consistently profitable and are constantly developing new, innovative products; as a result, we have grown 4x over the past year alone. We are well capitalised and recently closed a fundraise of $4M in March 2021 from prominent VC firms and angel investors. The company is based out of Pune and has a total strength of 180 employees. Easebuzz’s corporate culture is tied to the vision of building a workplace that breeds open communication and minimal bureaucracy. An equal opportunity employer, we welcome and encourage diversity in the workplace. One thing you can be sure of is that you will be surrounded by colleagues who are committed to helping each other grow.
Easebuzz Pvt. Ltd. has a presence in Pune, Bangalore, and Gurugram.
Salary: As per company standards.
Designation: Data Engineering
Location: Pune
Experience with ETL, Data Modeling, and Data Architecture
Design, build, and operationalize large-scale enterprise data solutions and applications using one or more AWS data and analytics services – Spark, EMR, DynamoDB, Redshift, Kinesis, Lambda, Glue – in combination with 3rd parties
Experience with AWS cloud data lakes for the development of real-time or near-real-time use cases
Experience with messaging systems such as Kafka/Kinesis for real-time data ingestion and processing (see the sketch after this list)
Build data pipeline frameworks to automate high-volume and real-time data delivery
Create prototypes and proof-of-concepts for iterative development.
Experience with NoSQL databases, such as DynamoDB, MongoDB etc
Create and maintain optimal data pipeline architecture.
Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS ‘big data’ technologies.
Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency and other key business performance metrics.
Work with stakeholders including the Executive, Product, Data and Design teams to assist with data-related technical issues and support their data infrastructure needs.
Keep our data separated and secure across national boundaries through multiple data centers and AWS regions.
Create data tools for analytics and data scientist team members that assist them in building and optimizing our product into an innovative industry leader.
Evangelize a very high standard of quality, reliability and performance for data models and algorithms that can be streamlined into the engineering and sciences workflow
Build and enhance data pipeline architecture by designing and implementing data ingestion solutions.
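As one hedged illustration of the real-time ingestion mentioned above (the stream name, region, and payload are invented; AWS credentials are assumed to be configured), a producer might publish events to Kinesis like this:

```python
import json

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

def publish_event(event: dict) -> None:
    """Send one JSON event to a hypothetical Kinesis stream."""
    kinesis.put_record(
        StreamName="example-clickstream",  # invented stream name
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=str(event.get("user_id", "unknown")),
    )

publish_event({"user_id": 42, "action": "page_view"})
```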
Employment Type
Full-time
- Experience implementing large-scale ETL processes using Informatica PowerCenter.
- Design high-level ETL process and data flow from the source system to target databases.
- Strong experience with Oracle databases and strong SQL.
- Develop & unit test Informatica ETL processes for optimal performance utilizing best practices.
- Performance tune Informatica ETL mappings and report queries.
- Develop database objects like Stored Procedures, Functions, Packages, and Triggers using SQL and PL/SQL.
- Hands-on Experience in Unix.
- Experience in Informatica Cloud (IICS).
- Work with appropriate leads and review high-level ETL design, source to target data mapping document, and be the point of contact for any ETL-related questions.
- Good understanding of project life cycle, especially tasks within the ETL phase.
- Ability to work independently and multi-task to meet critical deadlines in a rapidly changing environment.
- Excellent communication and presentation skills.
- Experience working effectively in an onsite/offshore delivery model.
About the Company:
It is a Data as a Service company that helps businesses harness the power of data. Our technology fuels some of the most interesting big data projects of the world. We are a small bunch of people working towards shaping the imminent data-driven future by solving some of its fundamental and toughest challenges.
Role: We are looking for an experienced team lead to drive data acquisition projects end to end. In this role, you will be working in the web scraping team with data engineers, helping them solve complex web problems and mentor them along the way. You’ll be adept at delivering large-scale web crawling projects, breaking down barriers for your team and planning at a higher level, and getting into the detail to make things happen when needed.
Responsibilities
- Interface with clients and sales team to translate functional requirements into technical requirements
- Plan and estimate tasks with your team, in collaboration with the delivery managers
- Engineer complex data acquisition projects
- Guide and mentor your team of engineers
- Anticipate issues that might arise and proactively factor them into the design
- Perform code reviews and suggest design changes
Prerequisites
- 5-8 years of relevant experience
- Fluent programming skills; well versed in scripting languages like Python or Ruby
- Solid foundation in data structures and algorithms
- Excellent tech troubleshooting skills
- Good understanding of the web data landscape
- Prior exposure to the DOM and XPath, and hands-on experience with Selenium/automated testing, is a plus (see the sketch after this list)
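As a hedged sketch of the DOM/XPath/Selenium exposure mentioned in the last bullet (the URL and XPath expression are placeholders, and a Selenium-4-compatible Chrome install is assumed):

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
try:
    driver.get("https://example.com")  # placeholder URL
    # Select anchor elements via an XPath query against the DOM
    links = driver.find_elements(By.XPATH, "//a[@href]")
    for link in links[:10]:
        print(link.get_attribute("href"))
finally:
    driver.quit()
```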
Skills and competencies
- Prior experience with team handling and people management is mandatory
- Ability to work independently with little to no supervision
- Extremely high attention to detail
- Ability to juggle between multiple projects