- Experience with cloud-native data tools/services such as AWS Athena, AWS Glue, Redshift Spectrum, AWS EMR, AWS Aurora, BigQuery, Bigtable, S3, etc.
- Strong programming skills in at least one of the following languages: Java, Scala, C++.
- Familiarity with a scripting language like Python as well as Unix/Linux shells.
- Comfortable with multiple AWS components including RDS, AWS Lambda, AWS Glue, AWS Athena, EMR. Equivalent tools in the GCP stack will also suffice.
- Strong analytical skills and advanced SQL knowledge, including indexing and query optimization techniques.
- Experience implementing software around data processing, metadata management, and ETL pipeline tools like Airflow.
Experience with the following software/tools is highly desired:
- Apache Spark, Kafka, Hive, etc.
- SQL and NoSQL databases like MySQL, Postgres, DynamoDB.
- Workflow management tools like Airflow.
- AWS cloud services: RDS, AWS Lambda, AWS Glue, AWS Athena, EMR.
- Familiarity with Spark programming paradigms (batch and stream-processing).
- RESTful API services.
About cloud Transformation products, frameworks and services
Publicis Sapient Overview:
As Senior Associate L1 in Data Engineering, you will translate client requirements into technical design and implement components for data engineering solutions. You will use a deep understanding of data integration and big data design principles to create custom solutions or implement package solutions. You will independently drive design discussions to ensure the necessary health of the overall solution.
Job Summary:
As Senior Associate L1 in Data Engineering, you will do technical design and implement components for data engineering solutions. You will use a deep understanding of data integration and big data design principles to create custom solutions or implement package solutions. You will independently drive design discussions to ensure the necessary health of the overall solution.
The role requires a hands-on technologist with a strong programming background in Java / Scala / Python, experience in data ingestion, integration and wrangling, computation and analytics pipelines, and exposure to Hadoop ecosystem components. Hands-on knowledge of at least one of the AWS, GCP or Azure cloud platforms is preferable.
Role & Responsibilities:
Job Title: Senior Associate L1 – Data Engineering
Your role is focused on Design, Development and delivery of solutions involving:
• Data Ingestion, Integration and Transformation
• Data Storage and Computation Frameworks, Performance Optimizations
• Analytics & Visualizations
• Infrastructure & Cloud Computing
• Data Management Platforms
• Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time
• Build functionality for data analytics, search and aggregation
Experience Guidelines:
Mandatory Experience and Competencies:
# Competency
1. Overall 3.5+ years of IT experience with 1.5+ years in data-related technologies
2. Minimum 1.5 years of experience in Big Data technologies
3. Hands-on experience with the Hadoop stack – HDFS, Sqoop, Kafka, Pulsar, NiFi, Spark, Spark Streaming, Flink, Storm, Hive, Oozie, Airflow and other components required in building end-to-end data pipelines. Working knowledge of real-time data pipelines is an added advantage.
4. Strong experience in at least one of the programming languages Java, Scala, Python. Java preferable
5. Hands-on working knowledge of NoSQL and MPP data platforms like HBase, MongoDB, Cassandra, AWS Redshift, Azure SQLDW, GCP BigQuery etc.
Preferred Experience and Knowledge (Good to Have):
# Competency
1. Good knowledge of traditional ETL tools (Informatica, Talend, etc.) and database technologies (Oracle, MySQL, SQL Server, Postgres) with hands-on experience
2. Knowledge of data governance processes (security, lineage, catalog) and tools like Collibra, Alation etc.
3. Knowledge of distributed messaging frameworks like ActiveMQ / RabbitMQ / Solace, search & indexing, and microservices architectures
4. Performance tuning and optimization of data pipelines
5. CI/CD – Infra provisioning on cloud, auto build & deployment pipelines, code quality
6. Working knowledge of data platform related services on at least one cloud platform, IAM and data security
7. Cloud data specialty and other related Big Data technology certifications
Personal Attributes:
• Strong written and verbal communication skills
• Articulation skills
• Good team player
• Self-starter who requires minimal oversight
• Ability to prioritize and manage multiple tasks
• Process orientation and the ability to define and set up processes
Data Analyst
Job Brief:
The successful candidate will turn data into information, information into insight and insight into business decisions.
Data Analyst Job Duties
Data analyst responsibilities include conducting full lifecycle analysis to include requirements, activities and design. Data analysts will develop analysis and reporting capabilities. They will also monitor performance and quality control plans to identify improvements.
About Us
We began in 2015 with an entrepreneurial vision to bring a digital change in the manufacturing landscape of India. With a team of 300+ we are working towards the digital transformation of business in the manufacturing industry across domains like Footwear, Apparel, Textile, Accessories etc. We are backed by investors such as Info Edge (Naukri.com), Matrix Partners, Sequoia, Water Bridge Ventures and select Industry leaders.
Today, we have enabled 2000+ Manufacturers to digitize their distribution channel.
Responsibilities
● Interpret data, analyze results using statistical techniques and provide ongoing reports.
● Develop and implement databases, data collection systems, data analytics and other strategies that optimize statistical efficiency and quality.
● Acquire data from primary or secondary data sources and maintain databases/data systems.
● Identify, analyze, and interpret trends or patterns in complex data sets.
● Filter and “clean” data by reviewing computer reports, printouts, and performance indicators to locate and correct code problems.
● Work with management to prioritize business and information needs.
● Locate and define new process improvement opportunities.
Requirements
● Proven working experience as a Data Analyst or Business Data Analyst.
● Technical expertise regarding data models, database design development, data mining and segmentation techniques.
● Strong knowledge of and experience with reporting packages (Business Objects etc.), databases (SQL etc.), programming (XML, JavaScript, or ETL frameworks).
● Knowledge of statistics and experience using statistical packages for analyzing datasets (Excel, SPSS, SAS etc.).
● Strong analytical skills with the ability to collect, organize, analyze, and disseminate significant amounts of information with attention to detail and accuracy.
● Adept at queries, report writing and presenting findings.
Job Location
South Delhi, New Delhi
Lead Data Engineer
Data Engineers develop modern data architecture approaches to meet key business objectives and provide end-to-end data solutions. You might spend a few weeks with a new client on a deep technical review or a complete organizational review, helping them to understand the potential that data brings to solve their most pressing problems. On other projects, you might be acting as the architect, leading the design of technical solutions, or perhaps overseeing a program inception to build a new product. It could also be a software delivery project where you're equally happy coding and tech-leading the team to implement the solution.
Job responsibilities
· You might spend a few weeks with a new client on a deep technical review or a complete organizational review, helping them to understand the potential that data brings to solve their most pressing problems
· You will partner with teammates to create complex data processing pipelines in order to solve our clients' most ambitious challenges
· You will collaborate with Data Scientists in order to design scalable implementations of their models
· You will pair to write clean and iterative code based on TDD
· Leverage various continuous delivery practices to deploy, support and operate data pipelines
· Advise and educate clients on how to use different distributed storage and computing technologies from the plethora of options available
· Develop and operate modern data architecture approaches to meet key business objectives and provide end-to-end data solutions
· Create data models and speak to the tradeoffs of different modeling approaches
· On other projects, you might be acting as the architect, leading the design of technical solutions, or perhaps overseeing a program inception to build a new product
· Seamlessly incorporate data quality into your day-to-day work as well as into the delivery process
· Assure effective collaboration between Thoughtworks' and the client's teams, encouraging open communication and advocating for shared outcomes
Job qualifications
Technical skills
· You are equally happy coding and leading a team to implement a solution
· You have a track record of innovation and expertise in Data Engineering
· You're passionate about craftsmanship and have applied your expertise across a range of industries and organizations
· You have a deep understanding of data modelling and experience with data engineering tools and platforms such as Kafka, Spark, and Hadoop
· You have built large-scale data pipelines and data-centric applications using any of the distributed storage platforms such as HDFS, S3, NoSQL databases (HBase, Cassandra, etc.) and any of the distributed processing platforms like Hadoop, Spark, Hive, Oozie, and Airflow in a production setting
· Hands-on experience in MapR, Cloudera, Hortonworks and/or cloud (AWS EMR, Azure HDInsight, Qubole etc.) based Hadoop distributions
· You are comfortable taking data-driven approaches and applying data security strategy to solve business problems
· You're genuinely excited about data infrastructure and operations with a familiarity working in cloud environments
· Working with data excites you: you have created Big data architecture, you can build and operate data pipelines, and maintain data storage, all within distributed systems
Professional skills
· Advocate your data engineering expertise to the broader tech community outside of Thoughtworks, speaking at conferences and acting as a mentor for more junior-level data engineers
· You're resilient and flexible in ambiguous situations and enjoy solving problems from technical and business perspectives
· An interest in coaching others, sharing your experience and knowledge with teammates
· You enjoy influencing others and always advocate for technical excellence while being open to change when needed
Concepts of RDBMS, Normalization techniques
Entity Relationship diagram/ ER-Model
Transaction, commit, rollback, ACID properties
Transaction log
Difference in behavior of the column if it is nullable
SQL Statements
Join Operations
DDL, DML, Data Modelling
Optimal query writing - with aggregate functions, GROUP BY, HAVING clause, ORDER BY etc. Should be hands-on with scenario-based query writing
Query optimizing technique, Indexing in depth
Understanding query plan
Batching
Locking schemes
Isolation levels
Concept of stored procedure, Cursor, trigger, View
Beginner level - PL/SQL - Procedure Function writing skill.
Spring JPA and Spring Data basics
Hibernate mappings
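The aggregate-query topics listed above (aggregate functions, GROUP BY, HAVING, ORDER BY) can be illustrated with a small scenario. This is only a sketch: the `orders` table, its columns, the sample rows and the `top_customers` helper are hypothetical, and Python's built-in `sqlite3` module stands in for whichever RDBMS is actually used.

```python
import sqlite3


def top_customers(rows, min_total):
    """Return (customer, total) pairs whose summed amount is at least min_total, largest first."""
    # In-memory database; the table name and columns are illustrative assumptions.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (customer TEXT NOT NULL, amount REAL NOT NULL)")
        conn.executemany("INSERT INTO orders (customer, amount) VALUES (?, ?)", rows)
        cur = conn.execute(
            """
            SELECT customer, SUM(amount) AS total
            FROM orders
            GROUP BY customer          -- one output row per customer
            HAVING SUM(amount) >= ?    -- filter applied after aggregation
            ORDER BY total DESC        -- largest totals first
            """,
            (min_total,),
        )
        return cur.fetchall()
    finally:
        conn.close()


sample = [("asha", 120.0), ("ravi", 40.0), ("asha", 80.0), ("meera", 150.0), ("ravi", 30.0)]
result = top_customers(sample, 100.0)
print(result)  # [('asha', 200.0), ('meera', 150.0)]
```

Note that HAVING filters groups after aggregation, whereas a WHERE clause would filter individual rows before they are grouped.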
UNIX
Basic Concepts on Unix
Commonly used Unix Commands with their options
Combining Unix commands using Pipe Filter etc.
Vi Editor & its different modes
Basic level Scripting and basic knowledge on how to execute jar files from host
Files and directory permissions
Application based scenarios.
- Hands-on experience in any Cloud Platform
- Microsoft Azure Experience
Role Description:
- You will be part of the data delivery team and will have the opportunity to develop a deep understanding of the domain/function.
- You will design and drive the work plan for the optimization/automation and standardization of the processes incorporating best practices to achieve efficiency gains.
- You will run data engineering pipelines, link raw client data with data model, conduct data assessment, perform data quality checks, and transform data using ETL tools.
- You will perform data transformations, modeling, and validation activities, as well as configure applications to the client context. You will also develop scripts to validate, transform, and load raw data using programming languages such as Python and / or PySpark.
- In this role, you will determine database structural requirements by analyzing client operations, applications, and programming.
- You will develop cross-site relationships to enhance idea generation, and manage stakeholders.
- Lastly, you will collaborate with the team to support ongoing business processes by delivering high-quality end products on time and performing quality checks wherever required.
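As a rough illustration of the "validate, transform, and load" scripting mentioned in the role description, the sketch below uses plain Python. The record fields (`id`, `name`), the sample records and both helper functions are hypothetical; a real pipeline for this role might use PySpark or an ETL tool instead.

```python
def validate_rows(rows, required):
    """Split rows into (valid, rejected); a row is rejected if a required field is missing or empty."""
    valid, rejected = [], []
    for row in rows:
        missing = [field for field in required if row.get(field) in (None, "")]
        if missing:
            # Keep the offending row together with the names of the missing fields.
            rejected.append((row, missing))
        else:
            valid.append(row)
    return valid, rejected


def transform(row):
    """Normalize one validated row: cast the id to int and tidy up the name."""
    return {"id": int(row["id"]), "name": row["name"].strip().title()}


# Hypothetical raw records, as they might arrive from a client extract.
raw = [
    {"id": "1", "name": "  asha rao "},
    {"id": "", "name": "ravi"},
    {"id": "3", "name": None},
]

valid, rejected = validate_rows(raw, ["id", "name"])
loaded = [transform(row) for row in valid]
print(loaded)         # [{'id': 1, 'name': 'Asha Rao'}]
print(len(rejected))  # 2
```

Keeping rejected rows alongside the reason for rejection makes the quality check auditable rather than silently dropping data.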
Job Requirement:
- Bachelor’s degree in Engineering or Computer Science; Master’s degree is a plus
- 3+ years of professional work experience with a reputed analytics firm
- Expertise in handling large amounts of data through Python or PySpark
- Conduct data assessment, perform data quality checks and transform data using SQL and ETL tools
- Experience of deploying ETL / data pipelines and workflows in cloud technologies and architecture such as Azure and Amazon Web Services will be valued
- Comfort with data modelling principles (e.g. database structure, entity relationships, UID etc.) and software development principles (e.g. modularization, testing, refactoring, etc.)
- A thoughtful and comfortable communicator (verbal and written) with the ability to facilitate discussions and conduct training
- Strong problem-solving, requirement gathering, and leadership skills
- Track record of completing projects successfully on time, within budget and as per scope
- Key responsibility is to design and develop a data pipeline, including the architecture, prototyping, and development of data extraction, transformation/processing, cleansing/standardizing, and loading into the Data Warehouse at real-time or near-real-time frequency. Source data can be in structured, semi-structured, and/or unstructured format.
- Provide technical expertise to design efficient data ingestion solutions to consolidate data from RDBMS, APIs, messaging queues, weblogs, images, audio, documents, etc. of enterprise applications, SaaS applications, external third-party sites or APIs, etc. through ETL/ELT, API integrations, Change Data Capture, Robotic Process Automation, custom Python/Java coding, etc.
- Development of complex data transformation using Talend (BigData edition), Python/Java transformation in Talend, SQL/Python/Java UDXs, AWS S3, etc to load in OLAP Data Warehouse in Structured/Semi-structured form
- Development of data model and creating transformation logic to populate models for faster data consumption with simple SQL.
- Implementing automated Audit & Quality assurance checks in Data Pipeline
- Document & maintain data lineage to enable data governance
- Coordination with BIU, IT, and other stakeholders to provide best-in-class data pipeline solutions, exposing data via APIs, loading into downstream systems, NoSQL databases, etc.
Requirements
- Programming experience using Python / Java, to create functions / UDX
- Extensive technical experience with SQL on RDBMS (Oracle/MySQL/PostgreSQL etc.) including code optimization techniques
- Strong ETL/ELT skillset using Talend BigData Edition. Experience in Talend CDC & MDM functionality will be an advantage.
- Experience & expertise in implementing complex data pipelines, including semi-structured & unstructured data processing
- Expertise in designing efficient data ingestion solutions to consolidate data from RDBMS, APIs, messaging queues, weblogs, images, audio, documents, etc. of enterprise applications, SaaS applications, external third-party sites or APIs, etc. through ETL/ELT, API integrations, Change Data Capture, Robotic Process Automation, custom Python/Java coding, etc.
- Good understanding & working experience in OLAP Data Warehousing solutions (Redshift, Synapse, Snowflake, Teradata, Vertica, etc) and cloud-native Data Lake (S3, ADLS, BigQuery, etc) solutions
- Familiarity with AWS tool stack for Storage & Processing. Able to recommend the right tools/solutions available to address a technical problem
- Good knowledge of database performance tuning, troubleshooting, and query optimization
- Good analytical skills with the ability to synthesize data to design and deliver meaningful information
- Good knowledge of Design, Development & Performance tuning of 3NF/Flat/Hybrid Data Model
- Know-how on any No-SQL DB (DynamoDB, MongoDB, CosmosDB, etc) will be an advantage.
- Ability to understand business functionality, processes, and flows
- Good combination of technical and interpersonal skills with strong written and verbal communication; detail-oriented with the ability to work independently
Functional knowledge
- Data Governance & Quality Assurance
- Distributed computing
- Linux
- Data structures and algorithms
- Unstructured Data Processing
- Must have 4 to 7 years of experience in ETL Design and Development using Informatica Components.
- Should have extensive knowledge in Unix shell scripting.
- Understanding of DW principles (Fact, Dimension tables, Dimensional Modelling and Data warehousing concepts).
- Research, develop, document and modify ETL processes as per data architecture and modeling requirements.
- Ensure appropriate documentation for all new development and modifications of the ETL processes and jobs.
- Should be good at writing complex SQL queries.
- Selected candidates will be provided training opportunities on one or more of the following: Google Cloud, AWS, DevOps tools, and Big Data technologies like Hadoop, Pig, Hive, Spark, Sqoop, Flume and Kafka
- Would get the chance to be part of the enterprise-grade implementation of Cloud and Big Data systems
- Will play an active role in setting up the Modern data platform based on Cloud and Big Data
- Would be part of teams with rich experience in various aspects of distributed systems and computing.
- Total Experience of 7-10 years and should be interested in teaching and research
- 3+ years’ experience in data engineering which includes data ingestion, preparation, provisioning, automated testing, and quality checks.
- 3+ years of hands-on experience in Big Data cloud platforms like AWS and GCP, Data Lakes and Data Warehouses
- 3+ years of experience with Big Data and analytics technologies, including SQL and writing code for the Spark engine using Python, Scala or Java
- Experience in designing, building, and maintaining ETL systems
- Experience in data pipeline and workflow management tools like Airflow
- Application Development background along with knowledge of Analytics libraries, opensource Natural Language Processing, statistical and big data computing libraries
- Familiarity with Visualization and Reporting Tools like Tableau, Kibana.
- Should be good at storytelling in Technology
Qualification: B.Tech / BE / M.Sc / MBA / B.Sc. Certifications in Big Data technologies and cloud platforms like AWS, Azure and GCP will be preferred
Primary Skills: Big Data + Python + Spark + Hive + Cloud Computing
Secondary Skills: NoSQL + SQL + ETL + Scala + Tableau
Selection Process: 1 Hackathon, 1 Technical round and 1 HR round
Benefit: Free training on Data Science from top-notch professors