Should design and operate data pipe lines.
Build and manage analytics platform using Elastic search, Redshift, Mongo db.
Strong programming fundamentals in Datastructures and algorithms.
About It's a OTT platform
Similar jobs
● Proficiency in Linux.
● Experience working with AWS cloud services: EC2, S3, RDS, Redshift.
● Must have SQL knowledge and experience working with relational databases, query
authoring (SQL) as well as familiarity with databases including Mysql, Mongo, Cassandra,
and Athena.
● Must have experience with Python/Scala.
● Must have experience with Big Data technologies like Apache Spark.
● Must have experience with Apache Airflow.
● Experience with data pipelines and ETL tools like AWS Glue.
companies uncover the 3% of active buyers in their target market. It evaluates
over 100 billion data points and analyzes factors such as buyer journeys, technology
adoption patterns, and other digital footprints to deliver market & sales intelligence.
Its customers have access to the buying patterns and contact information of
more than 17 million companies and 70 million decision makers across the world.
Role – Data Engineer
Responsibilities
Work in collaboration with the application team and integration team to
design, create, and maintain optimal data pipeline architecture and data
structures for Data Lake/Data Warehouse.
Work with stakeholders including the Sales, Product, and Customer Support
teams to assist with data-related technical issues and support their data
analytics needs.
Assemble large, complex data sets from third-party vendors to meet business
requirements.
Identify, design, and implement internal process improvements: automating
manual processes, optimizing data delivery, re-designing infrastructure for
greater scalability, etc.
Build the infrastructure required for optimal extraction, transformation, and
loading of data from a wide variety of data sources using SQL, Elasticsearch,
MongoDB, and AWS technology.
Streamline existing and introduce enhanced reporting and analysis solutions
that leverage complex data sources derived from multiple internal systems.
Requirements
5+ years of experience in a Data Engineer role.
Proficiency in Linux.
Must have SQL knowledge and experience working with relational databases,
query authoring (SQL) as well as familiarity with databases including Mysql,
Mongo, Cassandra, and Athena.
Must have experience with Python/Scala.
Must have experience with Big Data technologies like Apache Spark.
Must have experience with Apache Airflow.
Experience with data pipeline and ETL tools like AWS Glue.
Experience working with AWS cloud services: EC2, S3, RDS, Redshift.
Job Description
Mandatory Requirements
-
Experience in AWS Glue
-
Experience in Apache Parquet
-
Proficient in AWS S3 and data lake
-
Knowledge of Snowflake
-
Understanding of file-based ingestion best practices.
-
Scripting language - Python & pyspark
CORE RESPONSIBILITIES
-
Create and manage cloud resources in AWS
-
Data ingestion from different data sources which exposes data using different technologies, such as: RDBMS, flat files, Streams, and Time series data based on various proprietary systems. Implement data ingestion and processing with the help of Big Data technologies
-
Data processing/transformation using various technologies such as Spark and Cloud Services. You will need to understand your part of business logic and implement it using the language supported by the base data platform
-
Develop automated data quality check to make sure right data enters the platform and verifying the results of the calculations
-
Develop an infrastructure to collect, transform, combine and publish/distribute customer data.
-
Define process improvement opportunities to optimize data collection, insights and displays.
-
Ensure data and results are accessible, scalable, efficient, accurate, complete and flexible
-
Identify and interpret trends and patterns from complex data sets
-
Construct a framework utilizing data visualization tools and techniques to present consolidated analytical and actionable results to relevant stakeholders.
-
Key participant in regular Scrum ceremonies with the agile teams
-
Proficient at developing queries, writing reports and presenting findings
-
Mentor junior members and bring best industry practices.
QUALIFICATIONS
-
5-7+ years’ experience as data engineer in consumer finance or equivalent industry (consumer loans, collections, servicing, optional product, and insurance sales)
-
Strong background in math, statistics, computer science, data science or related discipline
-
Advanced knowledge one of language: Java, Scala, Python, C#
-
Production experience with: HDFS, YARN, Hive, Spark, Kafka, Oozie / Airflow, Amazon Web Services (AWS), Docker / Kubernetes, Snowflake
-
Proficient with
-
Data mining/programming tools (e.g. SAS, SQL, R, Python)
-
Database technologies (e.g. PostgreSQL, Redshift, Snowflake. and Greenplum)
-
Data visualization (e.g. Tableau, Looker, MicroStrategy)
-
Comfortable learning about and deploying new technologies and tools.
-
Organizational skills and the ability to handle multiple projects and priorities simultaneously and meet established deadlines.
-
Good written and oral communication skills and ability to present results to non-technical audiences
-
Knowledge of business intelligence and analytical tools, technologies and techniques.
Familiarity and experience in the following is a plus:
-
AWS certification
-
Spark Streaming
-
Kafka Streaming / Kafka Connect
-
ELK Stack
-
Cassandra / MongoDB
-
CI/CD: Jenkins, GitLab, Jira, Confluence other related tools
Job Description:
The data science team is responsible for solving business problems with complex data. Data complexity could be characterized in terms of volume, dimensionality and multiple touchpoints/sources. We understand the data, ask fundamental-first-principle questions, apply our analytical and machine learning skills to solve the problem in the best way possible.
Our ideal candidate
The role would be a client facing one, hence good communication skills are a must.
The candidate should have the ability to communicate complex models and analysis in a clear and precise manner.
The candidate would be responsible for:
- Comprehending business problems properly - what to predict, how to build DV, what value addition he/she is bringing to the client, etc.
- Understanding and analyzing large, complex, multi-dimensional datasets and build features relevant for business
- Understanding the math behind algorithms and choosing one over another
- Understanding approaches like stacking, ensemble and applying them correctly to increase accuracy
Desired technical requirements
- Proficiency with Python and the ability to write production-ready codes.
- Experience in pyspark, machine learning and deep learning
- Big data experience, e.g. familiarity with Spark, Hadoop, is highly preferred
- Familiarity with SQL or other databases.
About Telstra
Telstra is Australia’s leading telecommunications and technology company, with operations in more than 20 countries, including In India where we’re building a new Innovation and Capability Centre (ICC) in Bangalore.
We’re growing, fast, and for you that means many exciting opportunities to develop your career at Telstra. Join us on this exciting journey, and together, we’ll reimagine the future.
Why Telstra?
- We're an iconic Australian company with a rich heritage that's been built over 100 years. Telstra is Australia's leading Telecommunications and Technology Company. We've been operating internationally for more than 70 years.
- International presence spanning over 20 countries.
- We are one of the 20 largest telecommunications providers globally
- At Telstra, the work is complex and stimulating, but with that comes a great sense of achievement. We are shaping the tomorrow's modes of communication with our innovation driven teams.
Telstra offers an opportunity to make a difference to lives of millions of people by providing the choice of flexibility in work and a rewarding career that you will be proud of!
About the team
Being part of Networks & IT means you'll be part of a team that focuses on extending our network superiority to enable the continued execution of our digital strategy.
With us, you'll be working with world-leading technology and change the way we do IT to ensure business needs drive priorities, accelerating our digitisation programme.
Focus of the role
Any new engineer who comes into data chapter would be mostly into developing reusable data processing and storage frameworks that can be used across data platform.
About you
To be successful in the role, you'll bring skills and experience in:-
Essential
- Hands-on experience in Spark Core, Spark SQL, SQL/Hive/Impala, Git/SVN/Any other VCS and Data warehousing
- Skilled in the Hadoop Ecosystem(HDP/Cloudera/MapR/EMR etc)
- Azure data factory/Airflow/control-M/Luigi
- PL/SQL
- Exposure to NOSQL(Hbase/Cassandra/GraphDB(Neo4J)/MongoDB)
- File formats (Parquet/ORC/AVRO/Delta/Hudi etc.)
- Kafka/Kinesis/Eventhub
Highly Desirable
Experience and knowledgeable on the following:
- Spark Streaming
- Cloud exposure (Azure/AWS/GCP)
- Azure data offerings - ADF, ADLS2, Azure Databricks, Azure Synapse, Eventhubs, CosmosDB etc.
- Presto/Athena
- Azure DevOps
- Jenkins/ Bamboo/Any similar build tools
- Power BI
- Prior experience in building or working in team building reusable frameworks,
- Data modelling.
- Data Architecture and design principles. (Delta/Kappa/Lambda architecture)
- Exposure to CI/CD
- Code Quality - Static and Dynamic code scans
- Agile SDLC
If you've got a passion to innovate, succeed as part of a great team, and looking for the next step in your career, we'd welcome you to apply!
___________________________
We’re committed to building a diverse and inclusive workforce in all its forms. We encourage applicants from diverse gender, cultural and linguistic backgrounds and applicants who may be living with a disability. We also offer flexibility in all our roles, to ensure everyone can participate.
To learn more about how we support our people, including accessibility adjustments we can provide you through the recruitment process, visit tel.st/thrive.
- 4+ years of experience Solid understanding of Python, Java and general software development skills (source code management, debugging, testing, deployment etc.).
- Experience in working with Solr and ElasticSearch Experience with NLP technologies & the handling of unstructured text Detailed understanding of text pre-processing and normalisation techniques such as tokenisation, lemmatisation, stemming, POS tagging etc.
- Prior experience in implementation of traditional ML solutions - classification, regression or clustering problem Expertise in text-analytics - Sentiment Analysis, Entity Extraction, Language modelling - and associated sequence learning models ( RNN, LSTM, GRU).
- Comfortable working with deep-learning libraries (eg. PyTorch)
- Candidate can even be a fresher with 1 or 2 years of experience IIIT, IIIT, Bits Pilani, top 5 local colleges are preferred colleges and universities.
- A Masters candidate in machine learning.
- Can source candidates from Mu Sigma and Manthan.
Work days- Sun-Thu
Day shift