Job Responsibilities
- Design, build & test ETL processes using Python & SQL for the corporate data warehouse
- Inform, influence, support, and execute our product decisions
- Maintain advertising data integrity by working closely with R&D to organize and store data in a format that ensures accuracy and lets the business identify issues quickly.
- Evaluate and prototype new technologies in the area of data processing
- Think quickly, communicate clearly and work collaboratively with product, data, engineering, QA and operations teams
- High energy level, strong team player and good work ethic
- Analyze data, understand business requirements and translate them into logical pipelines & processes
- Identify, analyze & resolve production & development bugs
- Support the release process including completing & reviewing documentation
- Configure data mappings & transformations to orchestrate data integration & validation
- Provide subject matter expertise
- Document solutions, tools & processes
- Create & support test plans with hands-on testing
- Peer-review work developed by other data engineers within the team
- Establish good working relationships & communication channels with relevant departments
Skills and Qualifications we look for
- University degree 2.1 or higher (or equivalent) in a relevant subject. Master’s degree in any data subject will be a strong advantage.
- 4-6 years of experience in data engineering.
- Strong coding ability and software development experience in Python.
- Strong hands-on experience with SQL and Data Processing.
- Google Cloud Platform (Cloud Composer, Dataflow, Cloud Functions, BigQuery, Cloud Storage, Dataproc)
- Good working experience in any one of the ETL tools (Airflow would be preferable).
- Should possess strong analytical and problem solving skills.
- Good-to-have skills: Apache PySpark, CircleCI, Terraform
- Motivated, self-directed, able to work with ambiguity and interested in emerging technologies, agile and collaborative processes.
- Understanding & experience of agile / scrum delivery methodology
About MNC Company - Product Based
heads to solve complex business problems
- Develop statistical and machine-learning-based models/pipelines/methods to improve business processes and engagements
- Conduct sophisticated data mining analyses of large volumes of data and build data science models, as required, as part of the credit and risk underwriting solutions; customer engagement and retention; new business initiatives; business process improvements
- Translate data mining results into a clear business-focused deliverable for decision-makers
- Work with application developers to integrate machine learning algorithms and data mining models into operational systems, leading to automation, productivity gains, and time savings
- Provide the technical direction required to resolve complex issues to ensure the on-time delivery of solutions that meet the business team’s expectations. May need to develop new methods to apply to situations
- Knowledge of how to leverage statistical models in algorithms is a must
- Experience in multivariate analysis; identifying how several parameters can affect retention/behaviour of the customer and identifying actions at different points of the customer lifecycle
- Extensive experience coding in Python and mentoring teams to learn it
- Great understanding of the data science landscape and what tools to leverage for different problems
- A strong structured thinker who can bring structure to any data science problem quickly
- Ability to visualize data stories, fluency with data visualization tools, and ability to present insights as cohesive stories to senior leadership
- Excellent capability to organize large data sets collected from many sources (web APIs and internal databases) to get actionable insights
- Initiate data science programs in the team and collaborate across other data science teams to build a knowledge database
Description:
As a Data Engineering Lead at Company, you will be at the forefront of shaping and managing our data infrastructure with a primary focus on Google Cloud Platform (GCP). You will lead a team of data engineers to design, develop, and maintain our data pipelines, ensuring data quality, scalability, and availability for critical business insights.
Key Responsibilities:
1. Team Leadership:
a. Lead and mentor a team of data engineers, providing guidance, coaching, and performance management.
b. Foster a culture of innovation, collaboration, and continuous learning within the team.
2. Data Pipeline Development (Google Cloud Focus):
a. Design, develop, and maintain scalable data pipelines on Google Cloud Platform (GCP) using services such as BigQuery, Dataflow, and Dataprep.
b. Implement best practices for data extraction, transformation, and loading (ETL) processes on GCP.
3. Data Architecture and Optimization:
a. Define and enforce data architecture standards, ensuring data is structured and organized efficiently.
b. Optimize data storage, processing, and retrieval for maximum performance and cost-effectiveness on GCP.
4. Data Governance and Quality:
a. Establish data governance frameworks and policies to maintain data quality, consistency, and compliance with regulatory requirements.
b. Implement data monitoring and alerting systems to proactively address data quality issues.
5. Cross-functional Collaboration:
a. Collaborate with data scientists, analysts, and other cross-functional teams to understand data requirements and deliver data solutions that drive business insights.
b. Participate in discussions regarding data strategy and provide technical expertise.
6. Documentation and Best Practices:
a. Create and maintain documentation for data engineering processes, standards, and best practices.
b. Stay up-to-date with industry trends and emerging technologies, making recommendations for improvements as needed.
Qualifications
● Bachelor's or Master's degree in Computer Science, Data Engineering, or related field.
● 5+ years of experience in data engineering, with a strong emphasis on Google Cloud Platform.
● Proficiency in Google Cloud services, including BigQuery, Dataflow, Dataprep, and Cloud Storage.
● Experience with data modeling, ETL processes, and data integration.
● Strong programming skills in languages like Python or Java.
● Excellent problem-solving and communication skills.
● Leadership experience and the ability to manage and mentor a team.
- 5+ years of industry experience in administering (including setting up, managing, and monitoring) data processing pipelines (both streaming and batch) using frameworks such as Kafka Streams and PySpark, and streaming databases such as Druid or equivalents such as Hive
- Strong industry expertise with containerization technologies, including Kubernetes (EKS/AKS) and Kubeflow
- Experience with cloud platform services such as AWS, Azure or GCP, especially EKS and managed Kafka
- 5+ years of industry experience in Python
- Experience with popular modern web frameworks such as Spring Boot, Play Framework, or Django
- Experience with scripting languages. Python experience highly desirable. Experience in API development using Swagger
- Implementing automated testing platforms and unit tests
- Proficient understanding of code versioning tools, such as Git
- Familiarity with continuous integration tools such as Jenkins
Responsibilities
- Architect, design, and implement large-scale data processing pipelines using Kafka Streams, PySpark, Fluentd and Druid
- Create custom operators for Kubernetes and Kubeflow
- Develop data ingestion processes and ETLs
- Assist with DevOps operations
- Design and implement APIs
- Identify performance bottlenecks and bugs, and devise solutions to these problems
- Help maintain code quality, organization, and documentation
- Communicate with stakeholders regarding various aspects of the solution
- Mentor team members on best practices
Responsibilities:
- Design and develop robust analytics systems and predictive models
- Manage a team of data scientists, machine learning engineers, and big data specialists
- Identify valuable data sources and automate data collection processes
- Undertake pre-processing of structured and unstructured data
- Analyze large amounts of information to discover trends and patterns
- Build predictive models and machine-learning algorithms
- Combine models through ensemble modeling
- Present information using data visualization techniques
- Propose solutions and strategies to business challenges
- Collaborate with engineering and product development teams
Requirements:
- Proven experience as a seasoned Data Scientist
- Solid experience with data mining processes
- Understanding of machine learning; knowledge of operations research is a plus
- Strong understanding of and experience with R, SQL, and Python; knowledge of Scala, Java, or C++ is an asset
- Experience using business intelligence tools (e.g. Tableau) and data frameworks (e.g. Hadoop)
- Strong math skills (e.g. statistics, algebra)
- Problem-solving aptitude
- Excellent communication and presentation skills
- Experience in Natural Language Processing (NLP)
- Strong competitive coding skills
- BSc/BA in Computer Science, Engineering or relevant field; graduate degree in Data Science or other quantitative field is preferred
Only a solid grounding in computer engineering, Unix, data structures and algorithms would enable you to meet this challenge.
- 7+ years of experience architecting, developing, releasing, and maintaining large-scale big data platforms on AWS or GCP
- Understanding of how Big Data tech and NoSQL stores like MongoDB, HBase/HDFS, and Elasticsearch work together to power applications in analytics, AI and knowledge graphs
- Understanding of how data processing models, data location patterns, disk I/O, network I/O, and shuffling affect large-scale text processing (feature extraction, searching, etc.)
- Expertise with a variety of data processing systems, including streaming, event, and batch (Spark, Hadoop/MapReduce)
- 5+ years of proficiency in configuring and deploying applications on Linux-based systems
- 5+ years of experience with Spark, especially PySpark, for transforming large unstructured text data and creating highly optimized pipelines
- Experience with RDBMS, ETL techniques and frameworks (Sqoop, Flume) and big data querying tools (Pig, Hive)
- Stickler for world-class best practices: uncompromising on engineering quality, understands standards and reference architectures, and is steeped in the Unix philosophy with an appreciation of big data design patterns, orthogonal code design and functional computation models
Mine large volumes of credit behavior data to generate insights around product holdings and monetization opportunities for cross-sell
Use data science to size opportunity and product potential for launch of any new product/pilots
Build propensity models using heuristics and campaign performance to maximize efficiency.
Conduct portfolio analysis and establish key metrics for cross sell partnership
Desired profile/Skills:
2-5 years of experience with a degree in any quantitative discipline such as Engineering, Computer Science, Economics, Statistics or Mathematics
Excellent problem solving and comprehensive analytical skills – ability to structure ambiguous problem statements, perform detailed analysis and derive crisp insights.
Solid experience in using Python and SQL
Prior work experience in the financial services space would be highly valued
Location: Bangalore/ Ahmedabad