Role Proficiency:
This role requires proficiency in developing data pipelines, including coding and testing for ingesting, wrangling, transforming, and joining data from various sources. The ideal candidate should be adept with ETL tools such as Informatica, Glue, Databricks, and DataProc, and have strong coding skills in Python, PySpark, and SQL. This position demands the ability to work independently and proficiency across a variety of data domains. Expertise in data warehousing solutions such as Snowflake, BigQuery, Lakehouse, and Delta Lake is essential, including the ability to calculate processing costs and address performance issues. A solid understanding of DevOps and infrastructure needs is also required.
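For illustration only, a minimal PySpark sketch of the kind of ingest/wrangle/join work described above; the S3 paths, dataset, and column names are hypothetical placeholders rather than project specifics.

    # Minimal PySpark sketch: ingest two hypothetical CSV extracts, clean them, and join them.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("ingest_wrangle_join_demo").getOrCreate()

    # Ingest: read raw extracts (paths are placeholders)
    orders = spark.read.option("header", True).csv("s3://example-bucket/raw/orders/")
    customers = spark.read.option("header", True).csv("s3://example-bucket/raw/customers/")

    # Wrangle: cast types, drop rows without a key, normalise a text column
    orders_clean = (
        orders
        .withColumn("order_amount", F.col("order_amount").cast("double"))
        .filter(F.col("order_id").isNotNull())
    )
    customers_clean = customers.withColumn("email", F.lower(F.trim(F.col("email"))))

    # Join and aggregate into a small curated table
    order_totals = (
        orders_clean.join(customers_clean, on="customer_id", how="inner")
        .groupBy("customer_id", "email")
        .agg(F.sum("order_amount").alias("total_spend"))
    )

    order_totals.write.mode("overwrite").parquet("s3://example-bucket/curated/order_totals/")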
Skill Examples:
- Proficiency in SQL, Python, or other programming languages used for data manipulation.
- Experience with ETL tools such as Apache Airflow, Talend, Informatica, AWS Glue, Dataproc, and Azure ADF.
- Hands-on experience with cloud platforms like AWS, Azure, or Google Cloud, particularly their data-related services (e.g. AWS Glue, BigQuery).
- Conduct tests on data pipelines and evaluate results against data quality and performance specifications.
- Experience in performance tuning.
- Experience in data warehouse design and cost improvements.
- Apply and optimize data models for efficient storage retrieval and processing of large datasets.
- Communicate and explain design/development aspects to customers.
- Estimate time and resource requirements for developing/debugging features/components.
- Participate in RFP responses and solutioning.
- Mentor team members and guide them in relevant upskilling and certification.
Knowledge Examples:
- Knowledge of the various ETL services offered by cloud providers, including Apache Spark (PySpark), AWS Glue, GCP DataProc/Dataflow, Azure ADF, and ADLF.
- Proficient in SQL for analytics, including window functions (see the SQL sketch after this list).
- Understanding of data schemas and models.
- Familiarity with domain-related data.
- Knowledge of data warehouse optimization techniques.
- Understanding of data security concepts.
- Awareness of patterns, frameworks, and automation practices.
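As an illustration of the SQL analytics and window-function knowledge listed above, a short Spark SQL sketch; the orders table and its columns are hypothetical and would need to exist (e.g. as a registered Delta table) for the query to run.

    # Spark SQL sketch: rank each customer's orders by recency and keep the latest three.
    # The "orders" table and its columns are hypothetical placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("window_function_demo").getOrCreate()

    latest_orders = spark.sql("""
        WITH ranked AS (
            SELECT
                customer_id,
                order_id,
                order_date,
                order_amount,
                ROW_NUMBER() OVER (
                    PARTITION BY customer_id
                    ORDER BY order_date DESC
                ) AS recency_rank
            FROM orders
        )
        SELECT customer_id, order_id, order_date, order_amount
        FROM ranked
        WHERE recency_rank <= 3
    """)
    latest_orders.show()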
Additional Comments:
# of Resources: 22
Role(s): Technical Role
Location(s): India
Planned Start Date: 1/1/2026
Planned End Date: 6/30/2026
Project Overview:
Role Scope / Deliverables: We are seeking a highly skilled Data Engineer with strong experience in Databricks, PySpark, Python, SQL, and AWS to join our data engineering team on or before the first week of December 2025.
The candidate will be responsible for designing, developing, and optimizing large-scale data pipelines and analytics solutions that drive business insights and operational efficiency.
Design, build, and maintain scalable data pipelines using Databricks and PySpark (a minimal sketch follows this list).
Develop and optimize complex SQL queries for data extraction, transformation, and analysis.
Implement data integration solutions across multiple AWS services (S3, Glue, Lambda, Redshift, EMR, etc.).
Collaborate with analytics, data science, and business teams to deliver clean, reliable, and timely datasets.
Ensure data quality, performance, and reliability across data workflows.
Participate in code reviews, data architecture discussions, and performance optimization initiatives.
Support migration and modernization efforts for legacy data systems to modern cloud-based solutions.
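A minimal sketch, under assumed names, of one such Databricks/PySpark pipeline step: ingest raw events from S3, apply a simple data-quality gate, and write a partitioned Delta table. The bucket, table name, and 5% threshold are illustrative assumptions, not project specifics.

    # Sketch of one pipeline step: ingest raw events, gate on data quality, write Delta.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("events_to_delta_demo").getOrCreate()

    raw = spark.read.json("s3://example-bucket/raw/events/")  # hypothetical path

    # Quality gate: fail fast if too many rows are missing the event timestamp
    total = raw.count()
    missing_ts = raw.filter(F.col("event_ts").isNull()).count()
    if total == 0 or missing_ts / total > 0.05:  # 5% threshold is an assumption
        raise ValueError(f"Data quality check failed: {missing_ts}/{total} rows missing event_ts")

    cleaned = (
        raw.filter(F.col("event_ts").isNotNull())
           .withColumn("event_date", F.to_date("event_ts"))
    )

    # Overwrite a partitioned Delta table (Delta is the default table format on Databricks)
    (cleaned.write
            .format("delta")
            .mode("overwrite")
            .partitionBy("event_date")
            .saveAsTable("analytics.events_clean"))  # hypothetical schema/table name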
Key Skills:
Hands-on experience with Databricks, PySpark & Python for building ETL/ELT pipelines.
Proficiency in SQL (performance tuning, complex joins, CTEs, window functions).
Strong understanding of AWS services (S3, Glue, Lambda, Redshift, CloudWatch, etc.).
Experience with data modeling, schema design, and performance optimization.
Familiarity with CI/CD pipelines, version control (Git), and workflow orchestration (Airflow preferred); a DAG sketch follows this list.
Excellent problem-solving, communication, and collaboration skills.
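To illustrate the workflow-orchestration point above, a small Airflow DAG sketch chaining an extract step and a transform step; the DAG id, schedule, and task bodies are placeholders rather than the project's actual orchestration.

    # Airflow sketch: a daily DAG that runs an extract task followed by a transform task.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        print("extract raw data to the landing zone")  # placeholder for real extract logic

    def transform():
        print("run the PySpark/Databricks transform job")  # placeholder for real transform logic

    with DAG(
        dag_id="example_daily_pipeline",
        start_date=datetime(2026, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)

        extract_task >> transform_task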
Skills: Databricks, PySpark & Python, SQL, AWS Services
Must-Haves:
Python/PySpark (5+ years), SQL (5+ years), Databricks (3+ years), AWS Services (3+ years), ETL tools (Informatica, Glue, DataProc) (3+ years)
Hands-on experience with Databricks, PySpark & Python for ETL/ELT pipelines.
Proficiency in SQL (performance tuning, complex joins, CTEs, window functions).
Strong understanding of AWS services (S3, Glue, Lambda, Redshift, CloudWatch, etc.).
Experience with data modeling, schema design, and performance optimization.
Familiarity with CI/CD pipelines, Git, and workflow orchestration (Airflow preferred).
******
Notice period: Immediate to 15 days
Location: Bangalore