
50+ PySpark Jobs in India

Apply to 50+ PySpark Jobs on CutShort.io. Find your next job, effortlessly. Browse PySpark Jobs and apply today!

Data Axle

at Data Axle

2 candid answers
Eman Khan
Posted by Eman Khan
Pune
6 - 9 yrs
Best in industry
Machine Learning (ML)
Python
SQL
PySpark
XGBoost

About Data Axle:

Data Axle Inc. has been an industry leader in data, marketing solutions, sales and research for over 50 years in the USA. Data Axle now has an established strategic global centre of excellence in Pune. This centre delivers mission-critical data services to its global customers, powered by its proprietary cloud-based technology platform and by leveraging proprietary business & consumer databases.


Data Axle Pune is pleased to have achieved certification as a Great Place to Work!


Roles & Responsibilities:

We are looking for a Senior Data Scientist to join the Data Science Client Services team to continue our success of identifying high quality target audiences that generate profitable marketing return for our clients. We are looking for experienced data science, machine learning and MLOps practitioners to design, build and deploy impactful predictive marketing solutions that serve a wide range of verticals and clients. The right candidate will enjoy contributing to and learning from a highly talented team and working on a variety of projects.
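For illustration only, here is a minimal sketch of the batch-scoring step such a role typically owns: applying an already-trained propensity model to a Spark DataFrame with a pandas UDF. The model artifact, feature names, and S3 paths are hypothetical placeholders, not details from this posting.

```python
# Minimal sketch: score an audience table with a pre-trained model (hypothetical artifact/paths).
import joblib
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf

spark = SparkSession.builder.appName("batch-scoring").getOrCreate()

FEATURES = ["recency", "frequency", "monetary"]          # hypothetical feature columns
model = joblib.load("model.joblib")                      # hypothetical sklearn-API model (e.g. XGBoost)
bc_model = spark.sparkContext.broadcast(model)

@pandas_udf("double")
def score(recency: pd.Series, frequency: pd.Series, monetary: pd.Series) -> pd.Series:
    # Rebuild the feature frame on each executor and return P(positive class).
    X = pd.concat([recency, frequency, monetary], axis=1)
    X.columns = FEATURES
    return pd.Series(bc_model.value.predict_proba(X)[:, 1])

audience = spark.read.parquet("s3://example-bucket/audience/")            # placeholder input
scored = audience.withColumn("propensity", score(*FEATURES))
scored.write.mode("overwrite").parquet("s3://example-bucket/scored/")     # placeholder output
```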


We are looking for a Senior Data Scientist who will be responsible for:

  1. Ownership of design, implementation, and deployment of machine learning algorithms in a modern Python-based cloud architecture
  2. Design or enhance ML workflows for data ingestion, model design, model inference and scoring
  3. Oversight on team project execution and delivery
  4. Establish peer review guidelines for high quality coding to help develop junior team members’ skill set growth, cross-training, and team efficiencies
  5. Visualize and publish model performance results and insights to internal and external audiences


Qualifications:

  1. Masters in a relevant quantitative, applied field (Statistics, Econometrics, Computer Science, Mathematics, Engineering)
  2. Minimum of 5 years of work experience in the end-to-end lifecycle of ML model development and deployment into production within a cloud infrastructure (Databricks is highly preferred)
  3. Proven ability to manage the output of a small team in a fast-paced environment and to lead by example in the fulfilment of client requests
  4. Exhibit deep knowledge of core mathematical principles relating to data science and machine learning (ML Theory + Best Practices, Feature Engineering and Selection, Supervised and Unsupervised ML, A/B Testing, etc.)
  5. Proficiency in Python and SQL required; PySpark/Spark experience a plus
  6. Ability to conduct productive peer reviews and maintain proper code structure in GitHub
  7. Proven experience developing, testing, and deploying various ML algorithms (neural networks, XGBoost, Bayes, and the like)
  8. Working knowledge of modern CI/CD methods


This position description is intended to describe the duties most frequently performed by an individual in this position.


It is not intended to be a complete list of assigned duties but to describe a position level.

Read more
Deqode

at Deqode

1 recruiter
Roshni Maji
Posted by Roshni Maji
Bengaluru (Bangalore), Pune, Mumbai, Chennai, Gurugram
5 - 7 yrs
₹5L - ₹19L / yr
Python
PySpark
Amazon Web Services (AWS)
aws
Amazon Redshift
+1 more

Position: AWS Data Engineer

Experience: 5 to 7 Years

Location: Bengaluru, Pune, Chennai, Mumbai, Gurugram

Work Mode: Hybrid (3 days work from office per week)

Employment Type: Full-time

About the Role:

We are seeking a highly skilled and motivated AWS Data Engineer with 5–7 years of experience in building and optimizing data pipelines, architectures, and data sets. The ideal candidate will have strong experience with AWS services including Glue, Athena, Redshift, Lambda, DMS, RDS, and CloudFormation. You will be responsible for managing the full data lifecycle from ingestion to transformation and storage, ensuring efficiency and performance.
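As a hedged illustration of the Glue-based ETL work described here, the sketch below shows the shape of a simple AWS Glue PySpark job: read a catalog table, transform it, and write partitioned Parquet to S3. It only runs inside a Glue job environment, and the database, table, and bucket names are placeholders.

```python
# Minimal AWS Glue (PySpark) job sketch: Data Catalog table -> transform -> partitioned Parquet on S3.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a table that is assumed to already be crawled into the Glue Data Catalog (placeholder names).
orders = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="raw_orders"
).toDF()

# Example transformation: keep completed orders and compute daily revenue.
daily = (
    orders.filter(orders.status == "COMPLETED")
          .groupBy("order_date")
          .sum("amount")
          .withColumnRenamed("sum(amount)", "daily_revenue")
)

# Land the result as partitioned Parquet in the data lake (placeholder bucket).
daily.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-data-lake/curated/daily_revenue/"
)

job.commit()
```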

Key Responsibilities:

  • Design, develop, and optimize scalable ETL pipelines using AWS Glue, Python/PySpark, and SQL.
  • Work extensively with AWS services such as Glue, Athena, Lambda, DMS, RDS, Redshift, CloudFormation, and other serverless technologies.
  • Implement and manage data lake and warehouse solutions using AWS Redshift and S3.
  • Optimize data models and storage for cost-efficiency and performance.
  • Write advanced SQL queries to support complex data analysis and reporting requirements.
  • Collaborate with stakeholders to understand data requirements and translate them into scalable solutions.
  • Ensure high data quality and integrity across platforms and processes.
  • Implement CI/CD pipelines and best practices for infrastructure as code using CloudFormation or similar tools.

Required Skills & Experience:

  • Strong hands-on experience with Python or PySpark for data processing.
  • Deep knowledge of AWS Glue, Athena, Lambda, Redshift, RDS, DMS, and CloudFormation.
  • Proficiency in writing complex SQL queries and optimizing them for performance.
  • Familiarity with serverless architectures and AWS best practices.
  • Experience in designing and maintaining robust data architectures and data lakes.
  • Ability to troubleshoot and resolve data pipeline issues efficiently.
  • Strong communication and stakeholder management skills.


Read more
Deqode

at Deqode

1 recruiter
Mokshada Solanki
Posted by Mokshada Solanki
Bengaluru (Bangalore), Mumbai, Pune, Gurugram
4 - 5 yrs
₹4L - ₹20L / yr
SQL
Amazon Web Services (AWS)
Migration
PySpark
ETL

Job Summary:

Seeking a seasoned SQL + ETL Developer with 4+ years of experience in managing large-scale datasets and cloud-based data pipelines. The ideal candidate is hands-on with MySQL, PySpark, AWS Glue, and ETL workflows, with proven expertise in AWS migration and performance optimization.
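To make the scale concrete, here is a small sketch (with placeholder connection details) of how a 100-million-row MySQL table might be pulled in parallel over JDBC with PySpark and landed as Parquet; it assumes the MySQL JDBC driver is available on the Spark classpath.

```python
# Sketch: parallel JDBC extract from MySQL -> partitioned Parquet (all names/bounds are placeholders).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mysql-to-s3").getOrCreate()

orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://db-host:3306/sales")
    .option("dbtable", "orders")
    .option("user", "etl_user")
    .option("password", "********")
    # Split the extract into 32 parallel reads on a numeric key so a 100M+ row
    # table is not pulled through a single connection.
    .option("partitionColumn", "order_id")
    .option("lowerBound", "1")
    .option("upperBound", "150000000")
    .option("numPartitions", "32")
    .load()
)

(
    orders.filter("order_date >= '2024-01-01'")
          .write.mode("overwrite")
          .partitionBy("order_date")
          .parquet("s3://example-etl-bucket/staging/orders/")
)
```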


Key Responsibilities:

  • Develop and optimize complex SQL queries and stored procedures to handle large datasets (100+ million records).
  • Build and maintain scalable ETL pipelines using AWS Glue and PySpark.
  • Work on data migration tasks in AWS environments.
  • Monitor and improve database performance; automate key performance indicators and reports.
  • Collaborate with cross-functional teams to support data integration and delivery requirements.
  • Write shell scripts for automation and manage ETL jobs efficiently.


Required Skills:

  • Strong experience with MySQL, complex SQL queries, and stored procedures.
  • Hands-on experience with AWS Glue, PySpark, and ETL processes.
  • Good understanding of AWS ecosystem and migration strategies.
  • Proficiency in shell scripting.
  • Strong communication and collaboration skills.


Nice to Have:

  • Working knowledge of Python.
  • Experience with AWS RDS.



Read more
Deqode

at Deqode

1 recruiter
Shraddha Katare
Posted by Shraddha Katare
Bengaluru (Bangalore), Pune, Chennai, Mumbai, Gurugram
5 - 7 yrs
₹5L - ₹19L / yr
Amazon Web Services (AWS)
Python
PySpark
SQL
redshift

Profile: AWS Data Engineer

Mode- Hybrid

Experience - 5 to 7 years

Locations - Bengaluru, Pune, Chennai, Mumbai, Gurugram


Roles and Responsibilities

  • Design and maintain ETL pipelines using AWS Glue and Python/PySpark
  • Optimize SQL queries for Redshift and Athena
  • Develop Lambda functions for serverless data processing
  • Configure AWS DMS for database migration and replication
  • Implement infrastructure as code with CloudFormation
  • Build optimized data models for performance
  • Manage RDS databases and AWS service integrations
  • Troubleshoot and improve data processing efficiency
  • Gather requirements from business stakeholders
  • Implement data quality checks and validation
  • Document data pipelines and architecture
  • Monitor workflows and implement alerting
  • Keep current with AWS services and best practices


Required Technical Expertise:

  • Python/PySpark for data processing
  • AWS Glue for ETL operations
  • Redshift and Athena for data querying
  • AWS Lambda and serverless architecture
  • AWS DMS and RDS management
  • CloudFormation for infrastructure
  • SQL optimization and performance tuning
Read more
Gruve
Reshika Mendiratta
Posted by Reshika Mendiratta
Bengaluru (Bangalore), Pune
5yrs+
Upto ₹50L / yr (varies)
Python
SQL
Data engineering
Apache Spark
PySpark
+6 more

About the Company:

Gruve is an innovative Software Services startup dedicated to empowering Enterprise Customers in managing their Data Life Cycle. We specialize in Cyber Security, Customer Experience, Infrastructure, and advanced technologies such as Machine Learning and Artificial Intelligence. Our mission is to assist our customers in their business strategies utilizing their data to make more intelligent decisions. As a well-funded early-stage startup, Gruve offers a dynamic environment with strong customer and partner networks.

 

Why Gruve:

At Gruve, we foster a culture of innovation, collaboration, and continuous learning. We are committed to building a diverse and inclusive workplace where everyone can thrive and contribute their best work. If you’re passionate about technology and eager to make an impact, we’d love to hear from you.

Gruve is an equal opportunity employer. We welcome applicants from all backgrounds and thank all who apply; however, only those selected for an interview will be contacted.

 

Position summary:

We are seeking a Senior Software Development Engineer – Data Engineering with 5-8 years of experience to design, develop, and optimize data pipelines and analytics workflows using Snowflake, Databricks, and Apache Spark. The ideal candidate will have a strong background in big data processing, cloud data platforms, and performance optimization to enable scalable data-driven solutions. 
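As a rough illustration of the Lakehouse work described above, the sketch below performs an incremental upsert into a Delta table (for example, a Silver layer) with the Delta Lake MERGE API. It assumes a Databricks runtime or a Spark session with the delta-spark package installed; all paths and column names are placeholders.

```python
# Sketch: incremental upsert of a raw batch into a Silver Delta table (placeholder paths/keys).
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("silver-upsert").getOrCreate()

updates = spark.read.parquet("/mnt/raw/customers/2024-06-01/")   # latest raw batch (placeholder)
silver_path = "/mnt/silver/customers"

if DeltaTable.isDeltaTable(spark, silver_path):
    silver = DeltaTable.forPath(spark, silver_path)
    (
        silver.alias("t")
        .merge(updates.alias("s"), "t.customer_id = s.customer_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )
else:
    # First run: create the Delta table from the initial batch.
    updates.write.format("delta").save(silver_path)
```

On Databricks, query tuning such as OPTIMIZE/ZORDER and table properties would typically follow this step.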

Key Roles & Responsibilities:

  • Design, develop, and optimize ETL/ELT pipelines using Apache Spark, PySpark, Databricks, and Snowflake.
  • Implement real-time and batch data processing workflows in cloud environments (AWS, Azure, GCP).
  • Develop high-performance, scalable data pipelines for structured, semi-structured, and unstructured data.
  • Work with Delta Lake and Lakehouse architectures to improve data reliability and efficiency.
  • Optimize Snowflake and Databricks performance, including query tuning, caching, partitioning, and cost optimization.
  • Implement data governance, security, and compliance best practices.
  • Build and maintain data models, transformations, and data marts for analytics and reporting.
  • Collaborate with data scientists, analysts, and business teams to define data engineering requirements.
  • Automate infrastructure and deployments using Terraform, Airflow, or dbt.
  • Monitor and troubleshoot data pipeline failures, performance issues, and bottlenecks.
  • Develop and enforce data quality and observability frameworks using Great Expectations, Monte Carlo, or similar tools.


Basic Qualifications:

  • Bachelor’s or Master’s Degree in Computer Science or Data Science.
  • 5–8 years of experience in data engineering, big data processing, and cloud-based data platforms.
  • Hands-on expertise in Apache Spark, PySpark, and distributed computing frameworks.
  • Strong experience with Snowflake (Warehouses, Streams, Tasks, Snowpipe, Query Optimization).
  • Experience in Databricks (Delta Lake, MLflow, SQL Analytics, Photon Engine).
  • Proficiency in SQL, Python, or Scala for data transformation and analytics.
  • Experience working with data lake architectures and storage formats (Parquet, Avro, ORC, Iceberg).
  • Hands-on experience with cloud data services (AWS Redshift, Azure Synapse, Google BigQuery).
  • Experience in workflow orchestration tools like Apache Airflow, Prefect, or Dagster.
  • Strong understanding of data governance, access control, and encryption strategies.
  • Experience with CI/CD for data pipelines using GitOps, Terraform, dbt, or similar technologies.


Preferred Qualifications:

  • Knowledge of streaming data processing (Apache Kafka, Flink, Kinesis, Pub/Sub).
  • Experience in BI and analytics tools (Tableau, Power BI, Looker).
  • Familiarity with data observability tools (Monte Carlo, Great Expectations).
  • Experience with machine learning feature engineering pipelines in Databricks.
  • Contributions to open-source data engineering projects.
Read more
Deqode

at Deqode

1 recruiter
Alisha Das
Posted by Alisha Das
Pune, Mumbai, Bengaluru (Bangalore), Chennai
4 - 7 yrs
₹5L - ₹15L / yr
Amazon Web Services (AWS)
Python
PySpark
Glue semantics
Amazon Redshift
+1 more

Job Overview:

We are seeking an experienced AWS Data Engineer to join our growing data team. The ideal candidate will have hands-on experience with AWS Glue, Redshift, PySpark, and other AWS services to build robust, scalable data pipelines. This role is perfect for someone passionate about data engineering, automation, and cloud-native development.
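Since the responsibilities below include ensuring data quality and consistency before loads, here is one hedged way such a validation gate might look in PySpark; the table name and rules are purely illustrative.

```python
# Sketch: lightweight data-quality gate before a batch is promoted to the warehouse (illustrative rules).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-checks").getOrCreate()
batch = spark.read.parquet("s3://example-lake/staging/transactions/")   # placeholder path

total = batch.count()
checks = {
    "null transaction_id": batch.filter(F.col("transaction_id").isNull()).count(),
    "duplicate transaction_id": total - batch.dropDuplicates(["transaction_id"]).count(),
    "negative amount": batch.filter(F.col("amount") < 0).count(),
}
failed = {rule: n for rule, n in checks.items() if n > 0}

if failed:
    # Fail fast so a bad batch never reaches Redshift/S3 consumers; details go to pipeline logs.
    raise ValueError(f"Data quality checks failed on {total} rows: {failed}")

batch.write.mode("append").parquet("s3://example-lake/curated/transactions/")
```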

Key Responsibilities:

  • Design, build, and maintain scalable and efficient ETL pipelines using AWS Glue, PySpark, and related tools.
  • Integrate data from diverse sources and ensure its quality, consistency, and reliability.
  • Work with large datasets in structured and semi-structured formats across cloud-based data lakes and warehouses.
  • Optimize and maintain data infrastructure, including Amazon Redshift, for high performance.
  • Collaborate with data analysts, data scientists, and product teams to understand data requirements and deliver solutions.
  • Automate data validation, transformation, and loading processes to support real-time and batch data processing.
  • Monitor and troubleshoot data pipeline issues and ensure smooth operations in production environments.

Required Skills:

  • 5 to 7 years of hands-on experience in data engineering roles.
  • Strong proficiency in Python and PySpark for data transformation and scripting.
  • Deep understanding and practical experience with AWS Glue, AWS Redshift, S3, and other AWS data services.
  • Solid understanding of SQL and database optimization techniques.
  • Experience working with large-scale data pipelines and high-volume data environments.
  • Good knowledge of data modeling, warehousing, and performance tuning.

Preferred/Good to Have:

  • Experience with workflow orchestration tools like Airflow or Step Functions.
  • Familiarity with CI/CD for data pipelines.
  • Knowledge of data governance and security best practices on AWS.
Read more
Deqode

at Deqode

1 recruiter
Shraddha Katare
Posted by Shraddha Katare
Pune, Mumbai, Bengaluru (Bangalore), Gurugram
4 - 6 yrs
₹5L - ₹10L / yr
ETL
SQL
Amazon Web Services (AWS)
PySpark
KPI

Role - ETL Developer

Work Mode - Hybrid

Experience- 4+ years

Location - Pune, Gurgaon, Bengaluru, Mumbai

Required Skills - AWS, AWS Glue, PySpark, ETL, SQL

Required Skills:

  • 4+ years of hands-on experience in MySQL, including SQL queries and procedure development
  • Experience in PySpark, AWS, and AWS Glue
  • Experience in AWS migration
  • Experience with automated scripting and tracking KPIs/metrics for database performance
  • Proficiency in shell scripting and ETL
  • Strong communication skills and a collaborative team player
  • Knowledge of Python and AWS RDS is a plus


Read more
Wissen Technology

at Wissen Technology

4 recruiters
Hanisha Pralayakaveri
Posted by Hanisha Pralayakaveri
Bengaluru (Bangalore), Mumbai
5 - 9 yrs
Best in industry
Python
Amazon Web Services (AWS)
PySpark
Data engineering

Job Description: Data Engineer 

Position Overview:


We are seeking a skilled Python Data Engineer with expertise in designing and implementing data solutions using the AWS cloud platform. The ideal candidate will be responsible for building and maintaining scalable, efficient, and secure data pipelines while leveraging Python and AWS services to enable robust data analytics and decision-making processes.
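As a small, hedged example of the serverless side of this stack, the handler below reacts to new objects landing in S3 and publishes a row-count metric to CloudWatch with boto3. The bucket trigger, metric namespace, and CSV assumption are illustrative, not taken from the posting.

```python
# Sketch: Lambda handler for S3 "object created" events (placeholder namespace; assumes CSV inputs).
import csv
import io

import boto3

s3 = boto3.client("s3")
cloudwatch = boto3.client("cloudwatch")


def handler(event, context):
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        rows = max(sum(1 for _ in csv.reader(io.StringIO(body))) - 1, 0)  # minus header row

        cloudwatch.put_metric_data(
            Namespace="DataPipelines/Ingestion",
            MetricData=[{"MetricName": "RowsIngested", "Value": rows, "Unit": "Count"}],
        )
    return {"status": "ok"}
```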

 

Key Responsibilities

· Design, develop, and optimize data pipelines using Python and AWS services such as Glue, Lambda, S3, EMR, Redshift, Athena, and Kinesis.

· Implement ETL/ELT processes to extract, transform, and load data from various sources into centralized repositories (e.g., data lakes or data warehouses).

· Collaborate with cross-functional teams to understand business requirements and translate them into scalable data solutions.

· Monitor, troubleshoot, and enhance data workflows for performance and cost optimization.

· Ensure data quality and consistency by implementing validation and governance practices.

· Work on data security best practices in compliance with organizational policies and regulations.

· Automate repetitive data engineering tasks using Python scripts and frameworks.

· Leverage CI/CD pipelines for deployment of data workflows on AWS.

Read more
Deqode

at Deqode

1 recruiter
Roshni Maji
Posted by Roshni Maji
Remote only
5 - 7 yrs
₹12L - ₹16L / yr
Python
Google Cloud Platform (GCP)
SQL
PySpark
Data Transformation Tool (DBT)
+2 more

Role: GCP Data Engineer

Notice Period: Immediate Joiners

Experience: 5+ years

Location: Remote

Company: Deqode


About Deqode

At Deqode, we work with next-gen technologies to help businesses solve complex data challenges. Our collaborative teams build reliable, scalable systems that power smarter decisions and real-time analytics.


Key Responsibilities

  • Build and maintain scalable, automated data pipelines using Python, PySpark, and SQL (a minimal sketch follows this list).
  • Work on cloud-native data infrastructure using Google Cloud Platform (BigQuery, Cloud Storage, Dataflow).
  • Implement clean, reusable transformations using DBT and Databricks.
  • Design and schedule workflows using Apache Airflow.
  • Collaborate with data scientists and analysts to ensure downstream data usability.
  • Optimize pipelines and systems for performance and cost-efficiency.
  • Follow best software engineering practices: version control, unit testing, code reviews, CI/CD.
  • Manage and troubleshoot data workflows in Linux environments.
  • Apply data governance and access control via Unity Catalog or similar tools.
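A minimal sketch of the pipeline pattern referenced in the first responsibility above, assuming the spark-bigquery connector is available (for example on Dataproc or Databricks); project, dataset, table, and bucket names are placeholders.

```python
# Sketch: read from BigQuery, aggregate with PySpark, write back (placeholder names throughout).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bq-daily-agg").getOrCreate()

events = (
    spark.read.format("bigquery")
    .option("table", "example-project.analytics.raw_events")
    .load()
)

daily = (
    events.where(F.col("event_date") >= "2024-01-01")
          .groupBy("event_date", "event_type")
          .agg(F.count("*").alias("event_count"))
)

(
    daily.write.format("bigquery")
    .option("table", "example-project.analytics.daily_event_counts")
    .option("temporaryGcsBucket", "example-temp-bucket")   # connector stages indirect writes via GCS
    .mode("overwrite")
    .save()
)
```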


Required Skills & Experience

  • Strong hands-on experience with PySpark, Spark SQL, and Databricks.
  • Solid understanding of GCP services (BigQuery, Cloud Functions, Dataflow, Cloud Storage).
  • Proficiency in Python for scripting and automation.
  • Expertise in SQL and data modeling.
  • Experience with DBT for data transformations.
  • Working knowledge of Airflow for workflow orchestration.
  • Comfortable with Linux-based systems for deployment and troubleshooting.
  • Familiar with Git for version control and collaborative development.
  • Understanding of data pipeline optimization, monitoring, and debugging.
Read more
ZeMoSo Technologies

at ZeMoSo Technologies

11 recruiters
Agency job
via TIGI HR Solution Pvt. Ltd. by Vaidehi Sarkar
Mumbai, Bengaluru (Bangalore), Hyderabad, Chennai, Pune
4 - 8 yrs
₹10L - ₹15L / yr
Data engineering
Python
SQL
Data Warehouse (DWH)
Amazon Web Services (AWS)
+3 more

Work Mode: Hybrid


Need B.Tech, BE, M.Tech, ME candidates - Mandatory



Must-Have Skills:

● Educational Qualification :- B.Tech, BE, M.Tech, ME in any field.

● Minimum of 3 years of proven experience as a Data Engineer.

● Strong proficiency in Python programming language and SQL.

● Experience in DataBricks and setting up and managing data pipelines, data warehouses/lakes.

● Good comprehension and critical thinking skills.


● Kindly note: the salary bracket will vary according to the candidate's experience - 

- Experience from 4 yrs to 6 yrs - Salary upto 22 LPA

- Experience from 5 yrs to 8 yrs - Salary upto 30 LPA

- Experience more than 8 yrs - Salary upto 40 LPA

Read more
Moative

at Moative

3 candid answers
Eman Khan
Posted by Eman Khan
Chennai
4 - 7 yrs
₹12L - ₹30L / yr
Python
PySpark
Scala
MLOps
MLFlow
+7 more

About Moative

Moative, an Applied AI Services company, designs AI roadmaps, builds co-pilots and predictive AI solutions for companies in energy, utilities, packaging, commerce, and other primary industries. Through Moative Labs, we aspire to build micro-products and launch AI startups in vertical markets. 


Work you’ll do

As an ML/AI Engineer, you will be responsible for designing and developing intelligent software to solve business problems. You will collaborate with data scientists and domain experts to incorporate ML and AI technologies into existing or new workflows. You’ll analyze new opportunities and ideas. You’ll train and evaluate ML models, conduct experiments, and develop PoCs and prototypes.
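For illustration, a minimal version of that train-and-evaluate loop with XGBoost tracked in MLflow is sketched below; the synthetic dataset, parameters, and run name are examples only, not project specifics.

```python
# Sketch: train an XGBoost classifier and log params/metrics/model to MLflow (example data only).
import mlflow
import mlflow.xgboost
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

params = {"n_estimators": 200, "max_depth": 4, "learning_rate": 0.1}

with mlflow.start_run(run_name="xgb-baseline"):
    model = xgb.XGBClassifier(**params)
    model.fit(X_train, y_train)

    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    mlflow.log_params(params)
    mlflow.log_metric("test_auc", auc)
    mlflow.xgboost.log_model(model, artifact_path="model")
```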


Responsibilities

  • Designing, training, improving & launching machine learning models using tools such as XGBoost, Tensorflow, PyTorch.
  • Own the end-to-end ML lifecycle and MLOps, including model deployment, performance tuning, on-going evaluation and maintenance.
  • Improve the way we evaluate and monitor model and system performances.
  • Proposing and implementing ideas that directly impact our operational and strategic metrics.
  • Create tools and frameworks that accelerate the delivery of ML/ AI products.


Who you are

You are an engineer who is passionate about using AI/ML to improve processes, products and delight customers. You have experience working with less-than-clean data, developing ML models, and orchestrating their deployment to production. You thrive on taking initiative, are very comfortable with ambiguity and can passionately defend your decisions.


Requirements and skills

  • 4+ years of experience in programming languages such as Python, PySpark, or Scala.
  • Proficient Knowledge of cloud platforms (e.g., AWS, Azure, GCP) and containerization, DevOps (Docker, Kubernetes), and MLOps practices and platforms like MLflow.
  • Strong understanding of ML algorithms and frameworks (e.g., TensorFlow, PyTorch).
  • Experience with AI foundational models and associated architectural and solution development frameworks
  • Broad understanding of data structures, data engineering, statistical methodologies and machine learning models.
  • Strong communication skills and teamwork.


Working at Moative

Moative is a young company, but we believe strongly in thinking long-term, while acting with urgency. Our ethos is rooted in innovation, efficiency and high-quality outcomes. We believe the future of work is AI-augmented and boundaryless.


Here are some of our guiding principles:

  • Think in decades. Act in hours. As an independent company, our moat is time. While our decisions are for the long-term horizon, our execution will be fast – measured in hours and days, not weeks and months.
  • Own the canvas. Throw yourself in to build, fix or improve – anything that isn’t done right, irrespective of who did it. Be selfish about improving across the organization – because once the rot sets in, we waste years in surgery and recovery.
  • Use data or don’t use data. Use data where you ought to but not as a ‘cover-my-back’ political tool. Be capable of making decisions with partial or limited data. Get better at intuition and pattern-matching. Whichever way you go, be mostly right about it.
  • Avoid work about work. Process creeps on purpose, unless we constantly question it. We are deliberate about committing to rituals that take time away from the actual work. We truly believe that a meeting that could be an email, should be an email and you don’t need a person with the highest title to say that out loud.
  • High revenue per person. We work backwards from this metric. Our default is to automate instead of hiring. We multi-skill our people to own more outcomes than hiring someone who has less to do. We don’t like squatting and hoarding that comes in the form of hiring for growth. High revenue per person comes from high quality work from everyone. We demand it.


If this role and our work is of interest to you, please apply. We encourage you to apply even if you believe you do not meet all the requirements listed above.  


That said, you should demonstrate that you are in the 90th percentile or above. This may mean that you have studied in top-notch institutions, won competitions that are intellectually demanding, built something of your own, or rated as an outstanding performer by your current or previous employers. 


The position is based out of Chennai. Our work currently involves significant in-person collaboration and we expect you to work out of our offices in Chennai.

Read more
Data Axle

at Data Axle

2 candid answers
Eman Khan
Posted by Eman Khan
Pune
7 - 10 yrs
Best in industry
Google Cloud Platform (GCP)
ETL
Python
Java
Scala
+4 more

About Data Axle:

Data Axle Inc. has been an industry leader in data, marketing solutions, sales and research for over 45 years in the USA. Data Axle has set up a strategic global center of excellence in Pune. This center delivers mission critical data services to its global customers powered by its proprietary cloud-based technology platform and by leveraging proprietary business & consumer databases. Data Axle is headquartered in Dallas, TX, USA.


Roles and Responsibilities:

  • Design, implement, and manage scalable analytical data infrastructure, enabling efficient access to large datasets and high-performance computing on Google Cloud Platform (GCP).
  • Develop and optimize data pipelines using GCP-native services like BigQuery, Dataflow, Dataproc, Pub/Sub, Cloud Data Fusion, and Cloud Storage.
  • Work with diverse data sources to extract, transform, and load data into enterprise-grade data lakes and warehouses, ensuring high availability and reliability.
  • Implement and maintain real-time data streaming solutions using Pub/Sub, Dataflow, and Kafka.
  • Research and integrate the latest big data and visualization technologies to enhance analytics capabilities and improve efficiency.
  • Collaborate with cross-functional teams to implement machine learning models and AI-driven analytics solutions using Vertex AI and BigQuery ML.
  • Continuously improve existing data architectures to support scalability, performance optimization, and cost efficiency.
  • Enhance data security and governance by implementing industry best practices for access control, encryption, and compliance.
  • Automate and optimize data workflows to simplify reporting, dashboarding, and self-service analytics using Looker and Data Studio.


Basic Qualifications

  • 7+ years of experience in data engineering, software development, business intelligence, or data science, with expertise in large-scale data processing and analytics.
  • Strong proficiency in SQL and experience with BigQuery for data warehousing.
  • Hands-on experience in designing and developing ETL/ELT pipelines using GCP services (Cloud Composer, Dataflow, Dataproc, Data Fusion, or Apache Airflow).
  • Expertise in distributed computing and big data processing frameworks, such as Apache Spark, Hadoop, or Flink, particularly within Dataproc and Dataflow environments.
  • Experience with business intelligence and data visualization tools, such as Looker, Tableau, or Power BI.
  • Knowledge of data governance, security best practices, and compliance requirements in cloud environments.


Preferred Qualifications:

  • Degree/Diploma in Computer Science, Engineering, Mathematics, or a related technical field.
  • Experience working with GCP big data technologies, including BigQuery, Dataflow, Dataproc, Pub/Sub, and Cloud SQL.
  • Hands-on experience with real-time data processing frameworks, including Kafka and Apache Beam.
  • Proficiency in Python, Java, or Scala for data engineering and pipeline development.
  • Familiarity with DevOps best practices, CI/CD pipelines, Terraform, and infrastructure-as-code for managing GCP resources.
  • Experience integrating AI/ML models into data workflows, leveraging BigQuery ML, Vertex AI, or TensorFlow.
  • Understanding of Agile methodologies, software development life cycle (SDLC), and cloud cost optimization strategies.
Read more
Data Axle

at Data Axle

2 candid answers
Eman Khan
Posted by Eman Khan
Pune
9 - 12 yrs
Best in industry
Python
PySpark
Machine Learning (ML)
SQL
Data Science
+1 more

Roles & Responsibilities:  

We are looking for a Data Scientist to join the Data Science Client Services team to continue our success of identifying high quality target audiences that generate profitable marketing return for our clients. We are looking for experienced data science, machine learning and MLOps practitioners to design, build and deploy impactful predictive marketing solutions that serve a wide range of verticals and clients. The right candidate will enjoy contributing to and learning from a highly talented team and working on a variety of projects.  


We are looking for a Lead Data Scientist who will be responsible for:

  • Ownership of design, implementation, and deployment of machine learning algorithms in a modern Python-based cloud architecture
  • Design or enhance ML workflows for data ingestion, model design, model inference and scoring
  • Oversight on team project execution and delivery
  • Establish peer review guidelines for high quality coding to help develop junior team members’ skill set growth, cross-training, and team efficiencies
  • Visualize and publish model performance results and insights to internal and external audiences


Qualifications:  

  • Masters in a relevant quantitative, applied field (Statistics, Econometrics, Computer Science, Mathematics, Engineering)  
  • Minimum of 9 years of work experience in the end-to-end lifecycle of ML model development and deployment into production within a cloud infrastructure (Databricks is highly preferred)
  • Exhibit deep knowledge of core mathematical principles relating to data science and machine learning (ML Theory + Best Practices, Feature Engineering and Selection, Supervised and Unsupervised ML, A/B Testing, etc.)  
  • Proficiency in Python and SQL required; PySpark/Spark experience a plus  
  • Ability to conduct productive peer reviews and maintain proper code structure in GitHub
  • Proven experience developing, testing, and deploying various ML algorithms (neural networks, XGBoost, Bayes, and the like)  
  • Working knowledge of modern CI/CD methods  


This position description is intended to describe the duties most frequently performed by an individual in this position. It is not intended to be a complete list of assigned duties but to describe a position level. 

Read more
Deqode

at Deqode

1 recruiter
Alisha Das
Posted by Alisha Das
Bengaluru (Bangalore), Delhi, Gurugram, Noida, Ghaziabad, Faridabad, Mumbai, Pune, Hyderabad, Indore, Jaipur, Kolkata
4 - 5 yrs
₹2L - ₹18L / yr
Python
PySpark

We are looking for a skilled and passionate Data Engineer with a strong foundation in Python programming and hands-on experience working with APIs, AWS cloud, and modern development practices. The ideal candidate will have a keen interest in building scalable backend systems and working with big data tools like PySpark.
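To make the API-plus-PySpark combination concrete, here is a hedged sketch of pulling JSON from a REST endpoint with requests and landing it as Parquet via Spark; the endpoint URL, fields, and output path are hypothetical.

```python
# Sketch: REST API ingestion into a Spark DataFrame (hypothetical endpoint and schema).
import requests
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.appName("api-ingest").getOrCreate()

resp = requests.get(
    "https://api.example.com/v1/orders", params={"since": "2024-01-01"}, timeout=30
)
resp.raise_for_status()
records = resp.json()                      # assumed: a list of flat JSON objects

df = spark.createDataFrame([Row(**r) for r in records])

(
    df.dropDuplicates(["order_id"])
      .write.mode("append")
      .parquet("s3://example-bucket/landing/orders/")   # placeholder output location
)
```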

Key Responsibilities:

  • Write clean, scalable, and efficient Python code.
  • Work with Python frameworks such as PySpark for data processing.
  • Design, develop, update, and maintain APIs (RESTful).
  • Deploy and manage code using GitHub CI/CD pipelines.
  • Collaborate with cross-functional teams to define, design, and ship new features.
  • Work on AWS cloud services for application deployment and infrastructure.
  • Basic database design and interaction with MySQL or DynamoDB.
  • Debugging and troubleshooting application issues and performance bottlenecks.

Required Skills & Qualifications:

  • 4+ years of hands-on experience with Python development.
  • Proficient in Python basics with a strong problem-solving approach.
  • Experience with AWS Cloud services (EC2, Lambda, S3, etc.).
  • Good understanding of API development and integration.
  • Knowledge of GitHub and CI/CD workflows.
  • Experience in working with PySpark or similar big data frameworks.
  • Basic knowledge of MySQL or DynamoDB.
  • Excellent communication skills and a team-oriented mindset.

Nice to Have:

  • Experience in containerization (Docker/Kubernetes).
  • Familiarity with Agile/Scrum methodologies.


Read more
NA

NA

Agency job
via Method Hub by Sampreetha Pai
anywhere in India
4 - 5 yrs
₹18L - ₹22L / yr
SQL Azure
Apache Spark
DevOps
PySpark
Python
+1 more

Azure DE

Primary Responsibilities -

  • Create and maintain data storage solutions including Azure SQL Database, Azure Data Lake, and Azure Blob Storage.
  • Design, implement, and maintain data pipelines for data ingestion, processing, and transformation in Azure; create data models for analytics purposes.
  • Utilizing Azure Data Factory or comparable technologies, create and maintain ETL (Extract, Transform, Load) operations.
  • Use Azure Data Factory and Databricks to assemble large, complex data sets.
  • Implement data validation and cleansing procedures to ensure the quality, integrity, and dependability of the data.
  • Ensure data security and compliance.
  • Collaborate with data engineers and other stakeholders to understand requirements and translate them into scalable and reliable data platform architectures.

Required skills:

  • Blend of technical expertise, analytical problem-solving, and collaboration with cross-functional teams
  • Azure DevOps
  • Apache Spark, Python
  • SQL proficiency
  • Azure Databricks knowledge
  • Big data technologies


The Data Engineers should be well versed in coding, Spark Core, and data ingestion using Azure, and they need solid communication skills. They should also have core Azure data engineering and coding skills (PySpark, Python, and SQL).

Out of the 7 open demands, 5 of the Azure Data Engineers should have a minimum of 5 years of relevant data engineering experience.


Read more
NonStop io Technologies Pvt Ltd
Kalyani Wadnere
Posted by Kalyani Wadnere
Pune
2 - 4 yrs
Best in industry
AWS Lambda
databricks
Database migration
Apache Kafka
Apache Spark
+3 more

About NonStop io Technologies:

NonStop io Technologies is a value-driven company with a strong focus on process-oriented software engineering. We specialize in Product Development and have a decade's worth of experience in building web and mobile applications across various domains. NonStop io Technologies follows core principles that guide its operations and believes in staying invested in a product's vision for the long term. We are a small but proud group of individuals who believe in the 'givers gain' philosophy and strive to provide value in order to seek value. We are committed to and specialize in building cutting-edge technology products and serving as trusted technology partners for startups and enterprises. We pride ourselves on fostering innovation, learning, and community engagement. Join us to work on impactful projects in a collaborative and vibrant environment.

Brief Description:

We are looking for a talented Data Engineer to join our team. In this role, you will design, implement, and manage data pipelines, ensuring the accessibility and reliability of data for critical business processes. This is an exciting opportunity to work on scalable solutions that power data-driven decisions

Skillset:

Here is a list of some of the technologies you will work with (the list below is not set in stone)

Data Pipeline Orchestration and Execution:

● AWS Glue

● AWS Step Functions

● Databricks

Change Data Capture:

● Amazon Database Migration Service

● Amazon Managed Streaming for Apache Kafka with Debezium Plugin

Batch:

● AWS step functions (and Glue Jobs)

● Asynchronous queueing of batch job commands with RabbitMQ to various “ETL Jobs”

● Cron and supervisord processing on dedicated job server(s): Python & PHP

Streaming:

● Real-time processing via AWS MSK (Kafka), Apache Hudi, & Apache Flink

● Near real-time processing via worker (listeners) spread over AWS Lambda, custom server (daemons) written in Python and PHP Symfony

● Languages: Python & PySpark, Unix Shell, PHP Symfony (with Doctrine ORM)

● Monitoring & Reliability: Datadog & Cloudwatch

Things you will do:

● Build dashboards using Datadog and Cloudwatch to ensure system health and user support

● Build schema registries that enable data governance

● Partner with end-users to resolve service disruptions and evangelize our data product offerings

● Vigilantly oversee data quality and alert upstream data producers of issues

● Support and contribute to the data platform architecture strategy, roadmap, and implementation plans to support the company’s data-driven initiatives and business objectives

● Work with Business Intelligence (BI) consumers to deliver enterprise-wide fact and dimension data product tables to enable data-driven decision-making across the organization.

● Other duties as assigned

Read more
Xebia IT Architects

at Xebia IT Architects

2 recruiters
Vijay S
Posted by Vijay S
Bengaluru (Bangalore), Gurugram, Pune, Hyderabad, Chennai, Bhopal, Jaipur
10 - 15 yrs
₹30L - ₹40L / yr
Spark
Google Cloud Platform (GCP)
Python
Apache Airflow
PySpark
+1 more

We are looking for a Senior Data Engineer with strong expertise in GCP, Databricks, and Airflow to design and implement a GCP Cloud Native Data Processing Framework. The ideal candidate will work on building scalable data pipelines and help migrate existing workloads to a modern framework.
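As a hedged sketch of the orchestration layer mentioned here, the Airflow DAG below chains two spark-submit steps for the Raw → Silver → Gold flow; the operator choice, script paths, and schedule are placeholders (in practice this might use Databricks or Dataproc operators instead).

```python
# Sketch: daily Airflow DAG chaining two PySpark jobs (placeholder paths and schedule).
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="raw_to_gold_daily",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    raw_to_silver = BashOperator(
        task_id="raw_to_silver",
        bash_command="spark-submit /opt/jobs/raw_to_silver.py --ds {{ ds }}",
    )

    silver_to_gold = BashOperator(
        task_id="silver_to_gold",
        bash_command="spark-submit /opt/jobs/silver_to_gold.py --ds {{ ds }}",
    )

    raw_to_silver >> silver_to_gold
```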


  • Shift: 2 PM to 11 PM
  • Work Mode: Hybrid (3 days a week) across Xebia locations
  • Notice Period: Immediate joiners or those with a notice period of up to 30 days


Key Responsibilities:

  • Design and implement a GCP Native Data Processing Framework leveraging Spark and GCP Cloud Services.
  • Develop and maintain data pipelines using Databricks and Airflow for transforming Raw → Silver → Gold data layers.
  • Ensure data integrity, consistency, and availability across all systems.
  • Collaborate with data engineers, analysts, and stakeholders to optimize performance.
  • Document standards and best practices for data engineering workflows.

Required Experience:


  • 7-8 years of experience in data engineering, architecture, and pipeline development.
  • Strong knowledge of GCP, Databricks, PySpark, and BigQuery.
  • Experience with Orchestration tools like Airflow, Dagster, or GCP equivalents.
  • Understanding of Data Lake table formats (Delta, Iceberg, etc.).
  • Proficiency in Python for scripting and automation.
  • Strong problem-solving skills and collaborative mindset.


⚠️ Please apply only if you have not applied recently or are not currently in the interview process for any open roles at Xebia.


Looking forward to your response!


Best regards,

Vijay S

Assistant Manager - TAG

https://www.linkedin.com/in/vijay-selvarajan/

Read more
Data Havn

Data Havn

Agency job
via Infinium Associate by Toshi Srivastava
Noida
5 - 9 yrs
₹40L - ₹60L / yr
Python
SQL
Data engineering
Snowflake
ETL
+5 more

About the Role:

We are seeking a talented Lead Data Engineer to join our team and play a pivotal role in transforming raw data into valuable insights. As a Data Engineer, you will design, develop, and maintain robust data pipelines and infrastructure to support our organization's analytics and decision-making processes.

Responsibilities:

  • Data Pipeline Development: Build and maintain scalable data pipelines to extract, transform, and load (ETL) data from various sources (e.g., databases, APIs, files) into data warehouses or data lakes.
  • Data Infrastructure: Design, implement, and manage data infrastructure components, including data warehouses, data lakes, and data marts.
  • Data Quality: Ensure data quality by implementing data validation, cleansing, and standardization processes.
  • Team Management: Able to handle team.
  • Performance Optimization: Optimize data pipelines and infrastructure for performance and efficiency.
  • Collaboration: Collaborate with data analysts, scientists, and business stakeholders to understand their data needs and translate them into technical requirements.
  • Tool and Technology Selection: Evaluate and select appropriate data engineering tools and technologies (e.g., SQL, Python, Spark, Hadoop, cloud platforms).
  • Documentation: Create and maintain clear and comprehensive documentation for data pipelines, infrastructure, and processes.

 

 

 

 

Skills:

  • Strong proficiency in SQL and at least one programming language (e.g., Python, Java).
  • Experience with data warehousing and data lake technologies (e.g., Snowflake, AWS Redshift, Databricks).
  • Knowledge of cloud platforms (e.g., AWS, GCP, Azure) and cloud-based data services.
  • Understanding of data modeling and data architecture concepts.
  • Experience with ETL/ELT tools and frameworks.
  • Excellent problem-solving and analytical skills.
  • Ability to work independently and as part of a team.

Preferred Qualifications:

  • Experience with real-time data processing and streaming technologies (e.g., Kafka, Flink).
  • Knowledge of machine learning and artificial intelligence concepts.
  • Experience with data visualization tools (e.g., Tableau, Power BI).
  • Certification in cloud platforms or data engineering.


Read more
Deqode

at Deqode

1 recruiter
Shraddha Katare
Posted by Shraddha Katare
Pune
2 - 5 yrs
₹3L - ₹10L / yr
PySpark
Amazon Web Services (AWS)
AWS Lambda
SQL
Data engineering
+2 more


Here is the Job Description - 


Location -- Viman Nagar, Pune

Mode - 5 Days Working


Required Tech Skills:


 ● Strong at PySpark, Python

 ● Good understanding of Data Structure 

 ● Good at SQL query/optimization 

 ● Strong fundamentals of OOPs programming 

 ● Good understanding of AWS Cloud, Big Data. 

 ● Data Lake, AWS Glue, Athena, S3, Kinesis, SQL/NoSQL DB  


Read more
Nirmitee.io

at Nirmitee.io

4 recruiters
Gitashri K
Posted by Gitashri K
Pune
5 - 10 yrs
₹8L - ₹15L / yr
Python
PySpark
Amazon Web Services (AWS)
CI/CD
GitHub

About the Role:

We are seeking a skilled Python Backend Developer to join our dynamic team. This role focuses on designing, building, and maintaining efficient, reusable, and reliable code that supports both monolithic and microservices architectures. The ideal candidate will have a strong understanding of backend frameworks and architectures, proficiency in asynchronous programming, and familiarity with deployment processes. Experience with AI model deployment is a plus.
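For context on the async API style this role calls for, here is a minimal FastAPI sketch with async endpoints; the resource model and in-memory store are illustrative stand-ins for a real service and database.

```python
# Sketch: FastAPI service with async endpoints (in-memory store is a placeholder for a real DB).
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI(title="orders-service")

ORDERS: dict[int, dict] = {1: {"id": 1, "status": "CREATED"}}   # hypothetical data


class OrderUpdate(BaseModel):
    status: str


@app.get("/orders/{order_id}")
async def get_order(order_id: int) -> dict:
    order = ORDERS.get(order_id)
    if order is None:
        raise HTTPException(status_code=404, detail="order not found")
    return order


@app.put("/orders/{order_id}")
async def update_order(order_id: int, update: OrderUpdate) -> dict:
    if order_id not in ORDERS:
        raise HTTPException(status_code=404, detail="order not found")
    ORDERS[order_id]["status"] = update.status
    return ORDERS[order_id]
```

Run locally with `uvicorn app:app --reload` (assuming the file is named app.py).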

Overall 5+ years of IT experience, with a minimum of 5 years of experience in Python and an open-source web framework (Django), along with AWS experience.


Key Responsibilities:

- Develop, optimize, and maintain backend systems using Python, Pyspark, and FastAPI.

- Design and implement scalable architectures, including both monolithic and microservices.

- 3+ years of working experience in AWS (Lambda, Serverless, Step Functions and EC2)

- Deep knowledge of the Python Flask/Django frameworks

- Good understanding of REST APIs

- Sound knowledge of databases

- Excellent problem-solving and analytical skills

- Leadership skills, good communication skills, and an interest in learning modern technologies

- Apply design patterns (MVC, Singleton, Observer, Factory) to solve complex problems effectively.

- Work with web servers (Nginx, Apache) and deploy web applications and services.

- Create and manage RESTful APIs; familiarity with GraphQL is a plus.

- Use asynchronous programming techniques (ASGI, WSGI, async/await) to enhance performance.

- Integrate background job processing with Celery and RabbitMQ, and manage caching mechanisms using Redis and Memcached.

- (Optional) Develop containerized applications using Docker and orchestrate deployments with Kubernetes.


Required Skills:

- Languages & Frameworks: Python, Django, AWS

- Backend Architecture & Design: Strong knowledge of monolithic and microservices architectures, design patterns, and asynchronous programming.

- Web Servers & Deployment: Proficient in Nginx and Apache, and experience in RESTful API design and development. GraphQL experience is a plus.

- Background Jobs & Task Queues: Proficiency in Celery and RabbitMQ, with experience in caching (Redis, Memcached).

- Additional Qualifications: Knowledge of Docker and Kubernetes (optional), with any exposure to AI model deployment considered a bonus.


Qualifications:

- Bachelor’s degree in Computer Science, Engineering, or a related field.

- 5+ years of experience in backend development using Python and Django and AWS.

- Demonstrated ability to design and implement scalable and robust architectures.

- Strong problem-solving skills, attention to detail, and a collaborative mindset.


Preferred:

- Experience with Docker/Kubernetes for containerization and orchestration.

- Exposure to AI model deployment processes.

Read more
Koantek
Bhoomika Varshney
Posted by Bhoomika Varshney
Remote only
4 - 8 yrs
₹10L - ₹30L / yr
Python
databricks
SQL
Spark
PySpark
+3 more

The Sr AWS/Azure/GCP Databricks Data Engineer at Koantek will use comprehensive modern data engineering techniques and methods with Advanced Analytics to support business decisions for our clients. Your goal is to support the use of data-driven insights to help our clients achieve business outcomes and objectives. You can collect, aggregate, and analyze structured/unstructured data from multiple internal and external sources and communicate patterns, insights, and trends to decision-makers. You will help design and build data pipelines, data streams, reporting tools, information dashboards, data service APIs, data generators, and other end-user information portals and insight tools. You will be a critical part of the data supply chain, ensuring that stakeholders can access and manipulate data for routine and ad hoc analysis to drive business outcomes using Advanced Analytics. You are expected to function as a productive member of a team, working and communicating proactively with engineering peers, technical leads, project managers, product owners, and resource managers.

Requirements:

  • Strong experience as an AWS/Azure/GCP Data Engineer; must have AWS/Azure/GCP Databricks experience.
  • Expert proficiency in Spark Scala, Python, and Spark.
  • Must have data migration experience from on-prem to cloud.
  • Hands-on experience in Kinesis to process & analyze stream data, Event/IoT Hubs, and Cosmos.
  • In-depth understanding of Azure/AWS/GCP cloud and data lake and analytics solutions on Azure.
  • Expert-level hands-on development; design and develop applications on Databricks.
  • Extensive hands-on experience implementing data migration and data processing using AWS/Azure/GCP services.
  • In-depth understanding of Spark architecture including Spark Streaming, Spark Core, Spark SQL, DataFrames, RDD caching, and Spark MLlib (a minimal streaming sketch follows this list).
  • Hands-on experience with the technology stack available in the industry for data management, data ingestion, capture, processing, and curation: Kafka, StreamSets, Attunity, GoldenGate, MapReduce, Hadoop, Hive, HBase, Cassandra, Spark, Flume, Impala, etc.
  • Hands-on knowledge of data frameworks, data lakes and open-source projects such as Apache Spark, MLflow, and Delta Lake.
  • Good working knowledge of code versioning tools such as Git, Bitbucket or SVN.
  • Hands-on experience in using Spark SQL with various data sources like JSON, Parquet and key-value pairs.
  • Experience preparing data for Data Science and Machine Learning with exposure to model selection, model lifecycle, hyperparameter tuning, model serving, deep learning, etc.
  • Demonstrated experience preparing data, automating and building data pipelines for AI use cases (text, voice, image, IoT data, etc.).
  • Good to have programming language experience with .NET or Spark/Scala.
  • Experience in creating tables, partitioning, bucketing, loading and aggregating data using Spark Scala, Spark SQL/PySpark.
  • Knowledge of AWS/Azure/GCP DevOps processes like CI/CD as well as Agile tools and processes including Git, Jenkins, Jira, and Confluence.
  • Working experience with Visual Studio, PowerShell scripting, and ARM templates.
  • Able to build ingestion to ADLS and enable a BI layer for analytics.
  • Strong understanding of data modeling and defining conceptual, logical and physical data models.
  • Big Data/analytics/information analysis/database management in the cloud.
  • IoT/event-driven/microservices in the cloud; experience with private and public cloud architectures, pros/cons, and migration considerations.
  • Ability to remain up to date with industry standards and technological advancements that will enhance data quality and reliability to advance strategic initiatives.
  • Working knowledge of RESTful APIs, the OAuth2 authorization framework and security best practices for API gateways.
  • Guide customers in transforming big data projects, including development and deployment of big data and AI applications.
  • Guide customers on data engineering best practices, provide proofs of concept, architect solutions and collaborate when needed.
  • 2+ years of hands-on experience designing and implementing multi-tenant solutions using AWS/Azure/GCP Databricks for data governance, data pipelines for near real-time data warehouses, and machine learning solutions.
  • Overall 5+ years' experience in a software development, data engineering, or data analytics field using Python, PySpark, Scala, Spark, Java, or equivalent technologies.
  • Hands-on expertise in Apache Spark (Scala or Python).
  • 3+ years of experience working in query tuning, performance tuning, troubleshooting, and debugging Spark and other big data solutions.
  • Bachelor's or Master's degree in Big Data, Computer Science, Engineering, Mathematics, or a similar area of study, or equivalent work experience.
  • Ability to manage competing priorities in a fast-paced environment.
  • Ability to resolve issues.
  • Basic experience with or knowledge of agile methodologies.
  • AWS Certified: Solutions Architect Professional
  • Databricks Certified Associate Developer for Apache Spark
  • Microsoft Certified: Azure Data Engineer Associate
  • GCP Certified: Professional Google Cloud Certified
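The streaming sketch referenced in the list above: a minimal Structured Streaming job that consumes a Kafka topic and appends to a Delta table. Broker, topic, and paths are placeholders; on AWS the source might instead be MSK or Kinesis via the corresponding connector.

```python
# Sketch: Kafka -> Delta with Spark Structured Streaming (placeholder broker/topic/paths).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("events-stream").getOrCreate()

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .option("startingOffsets", "latest")
    .load()
)

events = raw.select(
    F.col("key").cast("string").alias("event_key"),
    F.col("value").cast("string").alias("payload"),
    F.col("timestamp").alias("event_time"),
)

query = (
    events.writeStream.format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/events")
    .outputMode("append")
    .start("/mnt/bronze/events")
)

query.awaitTermination()
```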

Read more
IT Service company

IT Service company

Agency job
via Vinprotoday by Vikas Gaur
Mumbai
4 - 10 yrs
₹8L - ₹30L / yr
Google Cloud Platform (GCP)
Workflow
TensorFlow
Deployment management
PySpark
+1 more

Key Responsibilities:

Design, develop, and optimize scalable data pipelines and ETL processes.

Work with large datasets using GCP services like BigQuery, Dataflow, and Cloud Storage.

Implement real-time data streaming and processing solutions using Pub/Sub and Dataproc.

Collaborate with cross-functional teams to ensure data quality and governance.


Technical Requirements:

4+ years of experience in Data Engineering.

Strong expertise in GCP services like Workflow, TensorFlow, Dataproc, and Cloud Storage.

Proficiency in SQL and programming languages such as Python or Java.

Experience in designing and implementing data pipelines and working with real-time data processing.

Familiarity with CI/CD pipelines and cloud security best practices.

Read more
NeoGenCode Technologies Pvt Ltd
Akshay Patil
Posted by Akshay Patil
Pune
4 - 8 yrs
₹1L - ₹12L / yr
PySpark
Data engineering
Big Data
Hadoop
Spark
+4 more

Job Description :

Job Title : Data Engineer

Location : Pune (Hybrid Work Model)

Experience Required : 4 to 8 Years


Role Overview :

We are seeking talented and driven Data Engineers to join our team in Pune. The ideal candidate will have a strong background in data engineering with expertise in Python, PySpark, and SQL. You will be responsible for designing, building, and maintaining scalable data pipelines and systems that empower our business intelligence and analytics initiatives.


Key Responsibilities:

  • Develop, optimize, and maintain ETL pipelines and data workflows.
  • Design and implement scalable data solutions using Python, PySpark, and SQL.
  • Collaborate with cross-functional teams to gather and analyze data requirements.
  • Ensure data quality, integrity, and security throughout the data lifecycle.
  • Monitor and troubleshoot data pipelines to ensure reliability and performance.
  • Work on hybrid data environments involving on-premise and cloud-based systems.
  • Assist in the deployment and maintenance of big data solutions.

Required Skills and Qualifications:

  • Bachelor’s degree in Computer Science, Information Technology, or related field.
  • 4 to 8 Years of experience in Data Engineering or related roles.
  • Proficiency in Python and PySpark for data processing and analysis.
  • Strong SQL skills with experience in writing complex queries and optimizing performance.
  • Familiarity with data pipeline tools and frameworks.
  • Knowledge of cloud platforms such as AWS, Azure, or GCP is a plus.
  • Excellent problem-solving and analytical skills.
  • Strong communication and teamwork abilities.

Preferred Qualifications:

  • Experience with big data technologies like Hadoop, Hive, or Spark.
  • Familiarity with data visualization tools and techniques.
  • Knowledge of CI/CD pipelines and DevOps practices in a data engineering context.

Work Model:

  • This position follows a hybrid work model, with candidates expected to work from the Pune office as per business needs.

Why Join Us?

  • Opportunity to work with cutting-edge technologies.
  • Collaborative and innovative work environment.
  • Competitive compensation and benefits.
  • Clear career progression and growth opportunities.


Read more
Indigrators solutions
Hyderabad
5 - 8 yrs
₹18L - ₹24L / yr
Python
PySpark
Palantir Foundry
Palantir
Foundry

Job Description


Job Title: Data Engineer

Location: Hyderabad, India

Job Type: Full Time

Experience: 5 – 8 Years

Working Model: On-Site (No remote or work-from-home options available)

Work Schedule: Mountain Time Zone (3:00 PM to 11:00 PM IST)

Role Overview

The Data Engineer will be responsible for designing and implementing scalable backend systems, leveraging Python and PySpark to build high-performance solutions. The role requires a proactive and detail-orientated individual who can solve complex data engineering challenges while collaborating with cross-functional teams to deliver quality results.

Key Responsibilities

  • Develop and maintain backend systems using Python and PySpark.
  • Optimise and enhance system performance for large-scale data processing.
  • Collaborate with cross-functional teams to define requirements and deliver solutions.
  • Debug, troubleshoot, and resolve system issues and bottlenecks.
  • Follow coding best practices to ensure code quality and maintainability.
  • Utilise tools like Palantir Foundry for data management workflows (good to have).

Qualifications

  • Strong proficiency in Python backend development.
  • Hands-on experience with PySpark for data engineering.
  • Excellent problem-solving skills and attention to detail.
  • Good communication skills for effective team collaboration.
  • Experience with Palantir Foundry or similar platforms is a plus.

Preferred Skills

  • Experience with large-scale data processing and pipeline development.
  • Familiarity with agile methodologies and development tools.
  • Ability to optimise and streamline backend processes effectively.


Read more
Experiencecom
Remote only
7 - 12 yrs
₹20L - ₹35L / yr
Google Cloud Platform (GCP)
Big Data
Python
SQL
pandas
+3 more

Description


Come Join Us


Experience.com - We make every experience matter more

Position: Senior GCP Data Engineer

Job Location: Chennai (Base Location) / Remote

Employment Type: Full Time


Summary of Position

A Senior Data Engineer is a professional who specializes in preparing big data infrastructure for analytical or operational uses. He/She develops and maintains scalable data pipelines and builds out new API integrations to support continuing increases in data volume and complexity. They collaborate with data scientists and business teams to improve data models that feed business intelligence tools, increasing data accessibility and fostering data-driven decision making across the organisation.
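As a small, hedged illustration of the BigQuery side of such pipelines, the snippet below runs an aggregation with the google-cloud-bigquery client and pulls the result into pandas; the project, dataset, and table are placeholders, and credentials are assumed to come from the environment.

```python
# Sketch: query BigQuery and load the result into pandas (placeholder project/dataset/table).
from google.cloud import bigquery

client = bigquery.Client(project="example-analytics-project")

sql = """
    SELECT event_date, COUNT(*) AS events
    FROM `example-analytics-project.product.raw_events`
    WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
    GROUP BY event_date
    ORDER BY event_date
"""

df = client.query(sql).to_dataframe()   # requires the pandas/db-dtypes extras to be installed
print(df.head())
```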


Responsibilities:

  • Collaborate with cross-functional teams to define, prioritize, and execute data engineering initiatives aligned with business objectives.
  • Design and implement scalable, reliable, and secure data solutions by industry best practices and compliance requirements.
  • Drive the adoption of cloud-native technologies and architectural patterns to optimize the performance, cost, and reliability of data pipelines and analytics solutions.
  • Mentor and lead a team of Data Engineers.
  • Demonstrate a drive to learn and master new technologies and techniques.
  • Apply strong problem-solving skills with an emphasis on building data-driven or AI-enhanced products.
  • Coordinate with ML/AI and engineering teams to understand data requirements.


Experience & Skills:

  • 8+ years of strong experience in ETL and ELT of data from various sources into data warehouses
  • 8+ years of experience in Python, Pandas, NumPy, and SciPy.
  • 5+ years of experience in GCP
  • 5+ years of experience in BigQuery, PySpark, and Pub/Sub
  • 5+ years of experience working with and creating data architectures.
  • Certified as a Google Cloud Professional Data Engineer.
  • Advanced proficiency in Google Cloud services such as Dataflow, Dataproc, Dataprep, Data Studio, and Cloud Composer.
  • Proficient in writing complex Spark (PySpark) User Defined Functions (UDFs), Spark SQL, and HiveQL.
  • Good understanding of Elasticsearch.
  • Experience in assessing and ensuring data quality, data testing, and addressing data quality issues.
  • Excellent understanding of Spark architecture and underlying frameworks including storage management.
  • Solid background in database design and development, database administration, and software engineering across full life cycles.
  • Experience with NoSQL data stores like MongoDB, DocumentDB, and DynamoDB.
  • Knowledge of data governance principles and practices, including data lineage, metadata management, and access control mechanisms.
  • Experience in implementing and optimizing data security controls, encryption, and compliance measures in GCP environments.
  • Ability to troubleshoot complex issues, perform root cause analysis, and implement effective solutions in a timely manner.
  • Proficiency in data visualization tools such as Tableau, Looker, or Data Studio to create insightful dashboards and reports for business users.
  • Strong communication and interpersonal skills to effectively collaborate with technical and non-technical stakeholders, articulate complex concepts, and drive consensus.
  • Experience with agile methodologies and project management tools like Jira or Asana for sprint planning, backlog grooming, and task tracking.


ProtoGene Consulting Private Limited
Mumbai
3 - 8 yrs
₹7L - ₹18L / yr
PySpark
Data engineering
Big Data
Hadoop
Spark
+4 more

Data Engineer + Integration Engineer + Support Specialist | Experience: 5-8 years

Necessary Skills:

  • SQL & Python / PySpark
  • AWS Services: Glue, AppFlow, Redshift
  • Data warehousing
  • Data modelling

Job Description:

  • Experience of implementing and delivering data solutions and pipelines on the AWS Cloud Platform; design, implement, and maintain the data architecture for all AWS data services (a brief illustrative sketch follows this section)
  • A strong understanding of data modelling, data structures, databases (Redshift), and ETL processes
  • Work with stakeholders to identify business needs and requirements for data-related projects
  • Strong SQL and/or Python or PySpark knowledge
  • Creating data models that can be used to extract information from various sources & store it in a usable format
  • Optimize data models for performance and efficiency
  • Write SQL queries to support data analysis and reporting
  • Monitor and troubleshoot data pipelines
  • Collaborate with software engineers to design and implement data-driven features
  • Perform root cause analysis on data issues
  • Maintain documentation of the data architecture and ETL processes
  • Identifying opportunities to improve performance by improving database structure or indexing methods
  • Maintaining existing applications by updating existing code or adding new features to meet new requirements
  • Designing and implementing security measures to protect data from unauthorized access or misuse
  • Recommending infrastructure changes to improve capacity or performance
  • Experience in the process industry
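
The illustrative sketch referenced above: a skeletal AWS Glue job that reads a catalog table and loads it into Redshift. It only runs inside a Glue job environment, and the database, table, connection, and bucket names are placeholders, not details from this listing.

```python
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Placeholder catalog database/table names for illustration.
source = glue_context.create_dynamic_frame.from_catalog(
    database="sales_raw", table_name="orders"
)

# Write to Redshift through a pre-configured Glue connection (name assumed).
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=source,
    catalog_connection="redshift-connection",
    connection_options={"dbtable": "analytics.orders", "database": "dev"},
    redshift_tmp_dir="s3://example-bucket/glue-temp/",
)
job.commit()
```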

Data Engineer + Integration Engineer + Support Specialist | Experience: 3-5 years

Necessary Skills:

  • SQL & Python / PySpark
  • AWS Services: Glue, AppFlow, Redshift
  • Data warehousing basics
  • Data modelling basics

Job Description:

  • Experience of implementing and delivering data solutions and pipelines on the AWS Cloud Platform
  • A strong understanding of data modelling, data structures, and databases (Redshift)
  • Strong SQL and/or Python or PySpark knowledge
  • Design and implement ETL processes to load data into the data warehouse
  • Creating data models that can be used to extract information from various sources & store it in a usable format
  • Optimize data models for performance and efficiency
  • Write SQL queries to support data analysis and reporting
  • Collaborate with the team to design and implement data-driven features
  • Monitor and troubleshoot data pipelines
  • Perform root cause analysis on data issues
  • Maintain documentation of the data architecture and ETL processes
  • Maintaining existing applications by updating existing code or adding new features to meet new requirements
  • Designing and implementing security measures to protect data from unauthorized access or misuse
  • Identifying opportunities to improve performance by improving database structure or indexing methods
  • Recommending infrastructure changes to improve capacity or performance


ProtoGene Consulting Private Limited
Vadodara
3 - 7 yrs
₹8L - ₹15L / yr
PySpark
Data engineering
Big Data
Hadoop
Spark
+4 more

Skills / Tools

  • SQL & Python / PySpark
  • AWS Services: Glue, Appflow, Redshift - Mandatory
  • Data warehousing basics
  • Data modelling basics

Job Description

  • Experience of implementing and delivering data solutions and pipelines on AWS Cloud Platform. Design/ implement, and maintain the data architecture for all AWS data services
  • A strong understanding of data modelling, data structures, databases (Redshift), and ETL processes
  • Work with stakeholders to identify business needs and requirements for data-related projects
  • Strong SQL and/or Python or PySpark knowledge
  • Creating data models that can be used to extract information from various sources & store it in a usable format
  • Optimize data models for performance and efficiency
  • Write SQL queries to support data analysis and reporting
  • Monitor and troubleshoot data pipelines
  • Collaborate with software engineers to design and implement data-driven features 
  • Perform root cause analysis on data issues 
  • Maintain documentation of the data architecture and ETL processes 
  • Identifying opportunities to improve performance by improving database structure or indexing methods 
  • Maintaining existing applications by updating existing code or adding new features to meet new requirements 
  • Designing and implementing security measures to protect data from unauthorized access or misuse 
  • Recommending infrastructure changes to improve capacity or performance 
  • Experience in Process industry


Programmingcom
Abhishek Arora
Posted by Abhishek Arora
Gurugram
5 - 7 yrs
₹10L - ₹20L / yr
databricks
Windows Azure
PySpark

Job Description:

We are seeking an experienced Azure Data Engineer with expertise in Azure Data Factory, Azure Databricks, and Azure Data Fabric to lead the migration of our existing data pipeline and processing infrastructure. The ideal candidate will have a strong background in Azure cloud data services, big data analytics, and data engineering, with specific experience in Azure Data Fabric. We are looking for someone who has at least 6 months of hands-on experience with Azure Data Fabric or has successfully completed at least one migration to Azure Data Fabric.


Key Responsibilities:

  • Assess the current data architecture using Azure Data Factory and Databricks and develop a detailed migration plan to Azure Data Fabric.
  • Design and implement end-to-end data pipelines within Azure Data Fabric, including data ingestion, transformation, storage, and analytics.
  • Optimize data workflows to leverage Azure Data Fabric's unified platform for data integration, big data processing, and real-time analytics.
  • Ensure seamless integration of data from SharePoint and other sources into Azure Data Fabric, maintaining data quality and integrity.
  • Collaborate with business analysts and business stakeholders to align data strategies and optimize the data environment for machine learning and AI workloads.
  • Implement security best practices, including data governance, access control, and monitoring within Azure Data Fabric.
  • Conduct performance tuning and optimization for data storage and processing within Azure Data Fabric to ensure high availability and cost efficiency.

Key Requirements:

  • Proven experience (5+ years) in Azure data engineering with a strong focus on Azure Data Factory and Azure Databricks.
  • At least 6 months of hands-on experience with Azure Data Fabric or completion of one migration to Azure Data Fabric.
  • Hands-on experience in designing, building, and managing data pipelines, data lakes, and data warehouses on Azure.
  • Expertise in Spark, SQL, and data transformation techniques within Azure environments.
  • Strong understanding of data governance, security, and compliance in cloud environments.
  • Experience with migrating data architectures and optimizing workflows on cloud platforms.
  • Ability to work collaboratively with cross-functional teams and communicate technical concepts effectively to non-technical stakeholders.
  • Azure certifications (e.g., Azure Data Engineer Associate, Azure Solutions Architect Expert) are a plus.

Key requirements:

  • The candidate must have at least 6 months of hands-on work experience with Data Fabric (this is a hard requirement).
  • Solid technical skills: Databricks, Data Fabric, and Data Factory
  • Polished communication and interpersonal skills
  • The candidate should have at least 6 years of experience with Databricks and Data Factory.


TVARIT GmbH

at TVARIT GmbH

2 candid answers
Shivani Kawade
Posted by Shivani Kawade
Remote, Pune
2 - 4 yrs
₹8L - ₹20L / yr
Python
PySpark
ETL
databricks
Azure
+6 more

TVARIT GmbH develops and delivers artificial intelligence (AI) solutions for the manufacturing, automotive, and process industries. With its software products, TVARIT enables its customers to make intelligent, well-founded decisions, e.g., in predictive maintenance, OEE improvement, and predictive quality. Renowned reference customers, proven technology, a strong research team drawn from renowned universities, and a prestigious AI award (e.g., under EU Horizon 2020) make TVARIT one of the most innovative AI companies in Germany and Europe.

 

 

We are looking for a self-motivated person with a positive "can-do" attitude and excellent oral and written communication skills in English. 

 

 

We are seeking a skilled and motivated Data Engineer from the manufacturing Industry with over two years of experience to join our team. As a data engineer, you will be responsible for designing, building, and maintaining the infrastructure required for the collection, storage, processing, and analysis of large and complex data sets. The ideal candidate will have a strong foundation in ETL pipelines and Python, with additional experience in Azure and Terraform being a plus. This role requires a proactive individual who can contribute to our data infrastructure and support our analytics and data science initiatives. 
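
As a flavour of the ETL and data-cleaning work described above, here is a minimal, hypothetical Python/pandas cleaning step. The file name, columns, and resampling interval are invented for illustration and are not from this listing.

```python
import pandas as pd

# Hypothetical sensor export; file name and columns are invented for illustration.
raw = pd.read_csv("furnace_sensor_export.csv", parse_dates=["timestamp"])

cleaned = (
    raw
    .dropna(subset=["temperature_c"])                       # drop incomplete readings
    .assign(temperature_c=lambda df: df["temperature_c"].clip(lower=0))
    .drop_duplicates(subset=["timestamp", "sensor_id"])
)

# Resample to 1-minute means per sensor before loading downstream.
per_minute = (
    cleaned.set_index("timestamp")
    .groupby("sensor_id")["temperature_c"]
    .resample("1min")
    .mean()
    .reset_index()
)

per_minute.to_parquet("furnace_sensor_1min.parquet", index=False)
```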

 

 

Skills Required 

  • Experience in the manufacturing industry (metal industry is a plus)  
  • 2+ years of experience as a Data Engineer 
  • Experience in data cleaning & structuring and data manipulation 
  • ETL Pipelines: Proven experience in designing, building, and maintaining ETL pipelines. 
  • Python: Strong proficiency in Python programming for data manipulation, transformation, and automation. 
  • Experience in SQL and data structures  
  • Knowledge in big data technologies such as Spark, Flink, Hadoop, Apache and NoSQL databases. 
  • Knowledge of cloud technologies (at least one) such as AWS, Azure, and Google Cloud Platform. 
  • Proficient in data management and data governance  
  • Strong analytical and problem-solving skills. 
  • Excellent communication and teamwork abilities. 

 


Nice To Have 

  • Azure: Experience with Azure data services (e.g., Azure Data Factory, Azure Databricks, Azure SQL Database). 
  • Terraform: Knowledge of Terraform for infrastructure as code (IaC) to manage cloud. 


Wissen Technology

at Wissen Technology

4 recruiters
Sukanya Mohan
Posted by Sukanya Mohan
Pune, Bengaluru (Bangalore)
5 - 10 yrs
Best in industry
Amazon Web Services (AWS)
EMR
Python
GLUE
SQL
+1 more

Greetings! Wissen Technology is hiring for the position of Data Engineer.

Please find the Job Description for your Reference:


JD

  • Design, develop, and maintain data pipelines on AWS EMR (Elastic MapReduce) to support data processing and analytics.
  • Implement data ingestion processes from various sources including APIs, databases, and flat files.
  • Optimize and tune big data workflows for performance and scalability.
  • Collaborate with data scientists, analysts, and other stakeholders to understand data requirements and deliver solutions.
  • Manage and monitor EMR clusters, ensuring high availability and reliability.
  • Develop ETL (Extract, Transform, Load) processes to cleanse, transform, and store data in data lakes and data warehouses.
  • Implement data security best practices to ensure data is protected and compliant with relevant regulations.
  • Create and maintain technical documentation related to data pipelines, workflows, and infrastructure.
  • Troubleshoot and resolve issues related to data processing and EMR cluster performance.

 

 

Qualifications:

 

  • Bachelor’s degree in Computer Science, Information Technology, or a related field.
  • 5+ years of experience in data engineering, with a focus on big data technologies.
  • Strong experience with AWS services, particularly EMR, S3, Redshift, Lambda, and Glue.
  • Proficiency in programming languages such as Python, Java, or Scala.
  • Experience with big data frameworks and tools such as Hadoop, Spark, Hive, and Pig.
  • Solid understanding of data modeling, ETL processes, and data warehousing concepts.
  • Experience with SQL and NoSQL databases.
  • Familiarity with CI/CD pipelines and version control systems (e.g., Git).
  • Strong problem-solving skills and the ability to work independently and collaboratively in a team environment
codersbrain

at codersbrain

1 recruiter
Tanuj Uppal
Posted by Tanuj Uppal
Bengaluru (Bangalore)
10 - 15 yrs
₹10L - ₹15L / yr
Microsoft Windows Azure
Snowflake
Delivery Management
ETL
PySpark
+2 more
  • Sr. Solution Architect 
  • Job Location – Bangalore
  • Need candidates who can join in 15 days or less.
  • Overall, 12-15 years of experience.

 

Looking for this tech stack in a Sr. Solution Architect (who also has a Delivery Manager background). Someone who has heavy business and IT stakeholder collaboration and negotiation skills, someone who can provide thought leadership, collaborate in the development of Product roadmaps, influence decisions, negotiate effectively with business and IT stakeholders, etc.

 

  • Building data pipelines using Azure data tools and services (Azure Data Factory, Azure Databricks, Azure Function, Spark, Azure Blob/ADLS, Azure SQL, Snowflake..)
  • Administration of cloud infrastructure in public clouds such as Azure
  • Monitoring cloud infrastructure, applications, big data pipelines and ETL workflows
  • Managing outages, customer escalations, crisis management, and other similar circumstances.
  • Understanding of DevOps tools and environments like Azure DevOps, Jenkins, Git, Ansible, Terraform.
  • SQL, Spark SQL, Python, PySpark
  • Familiarity with agile software delivery methodologies
  • Proven experience collaborating with global Product Team members, including Business Stakeholders located in NA


Wissen Technology

at Wissen Technology

4 recruiters
Sukanya Mohan
Posted by Sukanya Mohan
Bengaluru (Bangalore)
8 - 15 yrs
Best in industry
Snowflake schema
Python
PySpark
databricks

Responsibilities:

  • Lead the design, development, and implementation of scalable data architectures leveraging Snowflake, Python, PySpark, and Databricks.
  • Collaborate with business stakeholders to understand requirements and translate them into technical specifications and data models.
  • Architect and optimize data pipelines for performance, reliability, and efficiency.
  • Ensure data quality, integrity, and security across all data processes and systems.
  • Provide technical leadership and mentorship to junior team members.
  • Stay abreast of industry trends and best practices in data architecture and analytics.
  • Drive innovation and continuous improvement in data management practices.

Requirements:

  • Bachelor's degree in Computer Science, Information Systems, or a related field. Master's degree preferred.
  • 5+ years of experience in data architecture, data engineering, or a related field.
  • Strong proficiency in Snowflake, including data modeling, performance tuning, and administration.
  • Expertise in Python and PySpark for data processing, manipulation, and analysis.
  • Hands-on experience with Databricks for building and managing data pipelines.
  • Proven leadership experience, with the ability to lead cross-functional teams and drive projects to successful completion.
  • Experience in the banking or insurance domain is highly desirable.
  • Excellent communication skills, with the ability to effectively collaborate with stakeholders at all levels of the organization.
  • Strong problem-solving and analytical skills, with a keen attention to detail.

Benefits:

  • Competitive salary and performance-based incentives.
  • Comprehensive benefits package, including health insurance, retirement plans, and wellness programs.
  • Flexible work arrangements, including remote options.
  • Opportunities for professional development and career advancement.
  • Dynamic and collaborative work environment with a focus on innovation and continuous learning.


IntraEdge

at IntraEdge

1 recruiter
Karishma Shingote
Posted by Karishma Shingote
Pune
5 - 11 yrs
₹5L - ₹15L / yr
SQL
snowflake
Enterprise Data Warehouse (EDW)
Python
PySpark

Sr. Data Engineer (Data Warehouse-Snowflake)

Experience: 5+yrs

Location: Pune (Hybrid)


As a Senior Data Engineer with Snowflake expertise, you are a curious, innovative subject matter expert who mentors young professionals. You are a key person to convert the vision and data strategy into data solutions and deliver them. With your knowledge you will help create data-driven thinking within the organization, not just within data teams, but also in the wider stakeholder community.
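
For context, a minimal sketch of the kind of Snowflake loading work this role references (external stages, semi-structured data, Snowpipe-style ingestion), using the Snowflake Python connector. The account, credentials, stage, and table names are placeholders, not details from this listing.

```python
import snowflake.connector

# Placeholder credentials and object names, for illustration only.
conn = snowflake.connector.connect(
    account="xy12345.eu-central-1",
    user="ETL_USER",
    password="***",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="STAGING",
)

try:
    cur = conn.cursor()
    # Bulk-load files that have landed in an external stage (e.g. S3-backed).
    cur.execute("""
        COPY INTO STAGING.ORDERS
        FROM @ORDERS_STAGE
        FILE_FORMAT = (TYPE = PARQUET)
        MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
    """)
    print(cur.fetchall())   # per-file load results
finally:
    conn.close()
```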


Skills Preferred

  • Advanced written, verbal, and analytic skills, and demonstrated ability to influence and facilitate sustained change. Ability to convey information clearly and concisely to all levels of staff and management about programs, services, best practices, strategies, and organizational mission and values.
  • Proven ability to focus on priorities, strategies, and vision.
  • Very Good understanding in Data Foundation initiatives, like Data Modelling, Data Quality Management, Data Governance, Data Maturity Assessments and Data Strategy in support of the key business stakeholders.
  • Actively deliver the roll-out and embedding of Data Foundation initiatives in support of the key business programs advising on the technology and using leading market standard tools.
  • Coordinate the change management process, incident management and problem management process.
  • Ensure traceability of requirements from Data through testing and scope changes, to training and transition.
  • Drive implementation efficiency and effectiveness across the pilots and future projects to minimize cost, increase speed of implementation and maximize value delivery


Knowledge Preferred

  • Extensive knowledge and hands on experience with Snowflake and its different components like User/Group, Data Store/ Warehouse management, External Stage/table, working with semi structured data, Snowpipe etc.
  • Implement and manage CI/CD for migrating and deploying codes to higher environments with Snowflake codes.
  • Proven experience with Snowflake Access control and authentication, data security, data sharing, working with VS Code extension for snowflake, replication, and failover, optimizing SQL, analytical ability to troubleshoot and debug on development and production issues quickly is key for success in this role.
  • Proven technology champion in working with relational and data warehouse databases and query authoring (SQL), as well as working familiarity with a variety of databases.
  • Highly Experienced in building and optimizing complex queries. Good with manipulating, processing, and extracting value from large, disconnected datasets.
  • Your experience in handling big data sets and big data technologies will be an asset.
  • Proven champion with in-depth knowledge of any one of the scripting languages: Python, SQL, Pyspark.


Primary responsibilities

  • You will be an asset in our team bringing deep technical skills and capabilities to become a key part of projects defining the data journey in our company, keen to engage, network and innovate in collaboration with company wide teams.
  • Collaborate with the data and analytics team to develop and maintain a data model and data governance infrastructure using a range of different storage technologies that enables optimal data storage and sharing using advanced methods.
  • Support the development of processes and standards for data mining, data modeling and data protection.
  • Design and implement continuous process improvements for automating manual processes and optimizing data delivery.
  • Assess and report on the unique data needs of key stakeholders and troubleshoot any data-related technical issues through to resolution.
  • Work to improve data models that support business intelligence tools, improve data accessibility and foster data-driven decision making.
  • Ensure traceability of requirements from Data through testing and scope changes, to training and transition.
  • Manage and lead technical design and development activities for implementation of large-scale data solutions in Snowflake to support multiple use cases (transformation, reporting and analytics, data monetization, etc.).
  • Translate advanced business data, integration and analytics problems into technical approaches that yield actionable recommendations, across multiple, diverse domains; communicate results and educate others through design and build of insightful presentations.
  • Exhibit strong knowledge of the Snowflake ecosystem and can clearly articulate the value proposition of cloud modernization/transformation to a wide range of stakeholders.


Relevant work experience

Bachelor's degree in a Science, Technology, Engineering, Mathematics, or Computer Science discipline (or equivalent), with 7+ years of experience in enterprise-wide data warehousing, governance, policies, procedures, and implementation.

Aptitude for working with data, interpreting results, business intelligence and analytic best practices.


Business understanding

Good knowledge and understanding of Consumer and industrial products sector and IoT. 

Good functional understanding of solutions supporting business processes.


Skill Must have

  • Snowflake 5+ years
  • Overall different Data warehousing techs 5+ years
  • SQL 5+ years
  • Data warehouse designing experience 3+ years
  • Experience with cloud and on-prem hybrid models in data architecture
  • Knowledge of Data Governance and strong understanding of data lineage and data quality
  • Programming & Scripting: Python, Pyspark
  • Database technologies such as Traditional RDBMS (MS SQL Server, Oracle, MySQL, PostgreSQL)


Nice to have

  • Demonstrated experience in modern enterprise data integration platforms such as Informatica
  • AWS cloud services: S3, Lambda, Glue, Kinesis, API Gateway, EC2, EMR, RDS, and Redshift
  • Good understanding of Data Architecture approaches
  • Experience in designing and building streaming data ingestion, analysis and processing pipelines using Kafka, Kafka Streams, Spark Streaming, Stream sets and similar cloud native technologies.
  • Experience with implementation of operations concerns for a data platform such as monitoring, security, and scalability
  • Experience working in DevOps, Agile, Scrum, Continuous Delivery and/or Rapid Application Development environments
  • Building mock and proof-of-concepts across different capabilities/tool sets exposure
  • Experience working with structured, semi-structured, and unstructured data, extracting information, and identifying linkages across disparate data sets


Frisco Analytics Pvt Ltd
Cedrick Mariadas
Posted by Cedrick Mariadas
Bengaluru (Bangalore), Hyderabad
5 - 8 yrs
₹15L - ₹20L / yr
databricks
Apache Spark
Python
SQL
MySQL
+3 more

We are actively seeking a self-motivated Data Engineer with expertise in Azure cloud and Databricks and a thorough understanding of Delta Lake and Lakehouse architecture. The ideal candidate should excel in developing scalable data solutions, crafting platform tools, and integrating systems, while demonstrating proficiency in cloud-native database solutions and distributed data processing.
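
To illustrate the Delta Lake side of this role, here is a minimal upsert (MERGE) sketch, assuming a Databricks/Delta environment with the delta-spark package available; the table paths and join key are invented for illustration.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # on Databricks, a session already exists

# Invented paths and key for an illustrative Delta Lake upsert.
updates = spark.read.parquet("/mnt/raw/customers_daily/")
target = DeltaTable.forPath(spark, "/mnt/curated/customers")

(
    target.alias("t")
    .merge(updates.alias("u"), "t.customer_id = u.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```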


Key Responsibilities:

  • Contribute to the development and upkeep of a scalable data platform, incorporating tools and frameworks that leverage Azure and Databricks capabilities.
  • Exhibit proficiency in various RDBMS databases such as MySQL and SQL-Server, emphasizing their integration in applications and pipeline development.
  • Design and maintain high-caliber code, including data pipelines and applications, utilizing Python, Scala, and PHP.
  • Implement effective data processing solutions via Apache Spark, optimizing Spark applications for large-scale data handling.
  • Optimize data storage using formats like Parquet and Delta Lake to ensure efficient data accessibility and reliable performance.
  • Demonstrate understanding of Hive Metastore, Unity Catalog Metastore, and the operational dynamics of external tables.
  • Collaborate with diverse teams to convert business requirements into precise technical specifications.

Requirements:

  • Bachelor’s degree in Computer Science, Engineering, or a related discipline.
  • Demonstrated hands-on experience with Azure cloud services and Databricks.
  • Proficient programming skills in Python, Scala, and PHP.
  • In-depth knowledge of SQL, NoSQL databases, and data warehousing principles.
  • Familiarity with distributed data processing and external table management.
  • Insight into enterprise data solutions for PIM, CDP, MDM, and ERP applications.
  • Exceptional problem-solving acumen and meticulous attention to detail.

Additional Qualifications :

  • Acquaintance with data security and privacy standards.
  • Experience in CI/CD pipelines and version control systems, notably Git.
  • Familiarity with Agile methodologies and DevOps practices.
  • Competence in technical writing for comprehensive documentation.


Bengaluru (Bangalore), Mumbai, Delhi, Gurugram, Pune, Hyderabad, Ahmedabad, Chennai
3 - 7 yrs
₹8L - ₹15L / yr
AWS Lambda
Amazon S3
Amazon VPC
Amazon EC2
Amazon Redshift
+3 more

Technical Skills:


  • Ability to understand and translate business requirements into design.
  • Proficient in AWS infrastructure components such as S3, IAM, VPC, EC2, and Redshift.
  • Experience in creating ETL jobs using Python/PySpark.
  • Proficiency in creating AWS Lambda functions for event-based jobs.
  • Knowledge of automating ETL processes using AWS Step Functions.
  • Competence in building data warehouses and loading data into them.


Responsibilities:


  • Understand business requirements and translate them into design.
  • Assess AWS infrastructure needs for development work.
  • Develop ETL jobs using Python/PySpark to meet requirements.
  • Implement AWS Lambda for event-based tasks (see the sketch after this list).
  • Automate ETL processes using AWS Step Functions.
  • Build data warehouses and manage data loading.
  • Engage with customers and stakeholders to articulate the benefits of proposed solutions and frameworks.
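
As referenced above, here is a hypothetical S3-triggered Lambda handler that starts a Step Functions state machine for an ETL run. The state machine ARN variable, bucket layout, and payload shape are assumptions for illustration only.

```python
import json
import os
import boto3

sfn = boto3.client("stepfunctions")

def handler(event, context):
    """Triggered by an S3 object-created event; starts the ETL state machine."""
    record = event["Records"][0]
    payload = {
        "bucket": record["s3"]["bucket"]["name"],
        "key": record["s3"]["object"]["key"],
    }
    # State machine ARN supplied via environment variable (placeholder name).
    response = sfn.start_execution(
        stateMachineArn=os.environ["ETL_STATE_MACHINE_ARN"],
        input=json.dumps(payload),
    )
    return {"executionArn": response["executionArn"]}
```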
Publicis Sapient

at Publicis Sapient

10 recruiters
Mohit Singh
Posted by Mohit Singh
Bengaluru (Bangalore), Pune, Hyderabad, Gurugram, Noida
5 - 11 yrs
₹20L - ₹36L / yr
PySpark
Data engineering
Big Data
Hadoop
Spark
+7 more

Publicis Sapient Overview:

As a Senior Associate in Data Engineering at Publicis Sapient, you will translate client requirements into technical design and implement components for data engineering solutions. You will utilize a deep understanding of data integration and big data design principles to create custom solutions or implement package solutions, and independently drive design discussions to ensure the necessary health of the overall solution.

Job Summary:

As Senior Associate L2 in Data Engineering, you will translate client requirements into technical design and implement components for the data engineering solution. You will utilize a deep understanding of data integration and big data design principles in creating custom solutions or implementing package solutions, and independently drive design discussions to ensure the necessary health of the overall solution.

The role requires a hands-on technologist with a strong programming background in Java, Scala, or Python, experience in data ingestion, integration, wrangling, computation, and analytics pipelines, and exposure to Hadoop ecosystem components. Hands-on knowledge of at least one of the AWS, GCP, or Azure cloud platforms is also required.
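
As one illustrative slice of that stack, here is a minimal Spark Structured Streaming read from Kafka landing into a bronze layer. It assumes the spark-sql-kafka connector is on the classpath; the broker, topic, and sink paths are placeholders, not details from this listing.

```python
from pyspark.sql import SparkSession

# Assumes the spark-sql-kafka connector is available; names are placeholders.
spark = SparkSession.builder.appName("clickstream-ingest").getOrCreate()

clicks = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9092")
    .option("subscribe", "clickstream")
    .load()
    .selectExpr("CAST(value AS STRING) AS json", "timestamp")
)

query = (
    clicks.writeStream.format("parquet")
    .option("path", "s3a://example-bucket/bronze/clickstream/")
    .option("checkpointLocation", "s3a://example-bucket/checkpoints/clickstream/")
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()
```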


Role & Responsibilities:

Your role is focused on Design, Development and delivery of solutions involving:

• Data Integration, Processing & Governance

• Data Storage and Computation Frameworks, Performance Optimizations

• Analytics & Visualizations

• Infrastructure & Cloud Computing

• Data Management Platforms

• Implement scalable architectural models for data processing and storage

• Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time mode

• Build functionality for data analytics, search and aggregation

Experience Guidelines:

Mandatory Experience and Competencies:

# Competency

1. Overall 5+ years of IT experience, with 3+ years in data-related technologies

2. Minimum 2.5 years of experience in Big Data technologies and working exposure to at least one cloud platform's related data services (AWS / Azure / GCP)

3. Hands-on experience with the Hadoop stack – HDFS, Sqoop, Kafka, Pulsar, NiFi, Spark, Spark Streaming, Flink, Storm, Hive, Oozie, Airflow and other components required in building end-to-end data pipelines

4. Strong experience in at least one of the programming languages Java, Scala, or Python; Java preferable

5. Hands-on working knowledge of NoSQL and MPP data platforms like HBase, MongoDB, Cassandra, AWS Redshift, Azure SQL DW, GCP BigQuery, etc.

6. Well-versed, working knowledge of data platform related services on at least one cloud platform, IAM, and data security


Preferred Experience and Knowledge (Good to Have):

# Competency

1. Good knowledge of traditional ETL tools (Informatica, Talend, etc.) and database technologies (Oracle, MySQL, SQL Server, Postgres), with hands-on experience

2. Knowledge of data governance processes (security, lineage, catalog) and tools like Collibra, Alation, etc.

3. Knowledge of distributed messaging frameworks like ActiveMQ / RabbitMQ / Solace, search & indexing, and microservices architectures

4. Performance tuning and optimization of data pipelines

5. CI/CD – infra provisioning on cloud, automated build & deployment pipelines, code quality

6. Cloud data specialty and other related Big Data technology certifications


Personal Attributes:

• Strong written and verbal communication skills

• Articulation skills

• Good team player

• Self-starter who requires minimal oversight

• Ability to prioritize and manage multiple tasks

• Process orientation and the ability to define and set up processes


Kanerika Software

at Kanerika Software

1 recruiter
Meenakshi Ramagiri
Posted by Meenakshi Ramagiri
RIYADH (Saudi Arabia), Hyderabad
6 - 12 yrs
₹10L - ₹15L / yr
Data Science
Machine Learning (ML)
Natural Language Processing (NLP)
Computer Vision
recommendation algorithm
+2 more

Job Description


Responsibilities:

- Collaborate with stakeholders to understand business objectives and requirements for AI/ML projects.

- Conduct research and stay up-to-date with the latest AI/ML algorithms, techniques, and frameworks.

- Design and develop machine learning models, algorithms, and data pipelines.

- Collect, preprocess, and clean large datasets to ensure data quality and reliability.

- Train, evaluate, and optimize machine learning models using appropriate evaluation metrics.

- Implement and deploy AI/ML models into production environments.

- Monitor model performance and propose enhancements or updates as needed.

- Collaborate with software engineers to integrate AI/ML capabilities into existing software systems.

- Perform data analysis and visualization to derive actionable insights.

- Stay informed about emerging trends and advancements in the field of AI/ML and apply them to improve existing solutions.

Strong experience with Apache Spark (PySpark) is a must.

 

Requirements:

- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.

- Proven experience of 3-5 years as an AI/ML Engineer or a similar role.

- Strong knowledge of machine learning algorithms, deep learning frameworks, and data science concepts.

- Proficiency in programming languages such as Python, Java, or C++.

- Experience with popular AI/ML libraries and frameworks, such as TensorFlow, Keras, PyTorch, or scikit-learn.

- Familiarity with cloud platforms, such as AWS, Azure, or GCP, and their AI/ML services.

- Solid understanding of data preprocessing, feature engineering, and model evaluation techniques.

- Experience in deploying and scaling machine learning models in production environments.

- Strong problem-solving skills and ability to work on multiple projects simultaneously.

- Excellent communication and teamwork skills.

 

Preferred Skills:

- Experience with natural language processing (NLP) techniques and tools.

- Familiarity with big data technologies, such as Hadoop, Spark, or Hive.

- Knowledge of containerization technologies like Docker and orchestration tools like Kubernetes.

- Understanding of DevOps practices for AI/ML model deployment

- Apache Spark / PySpark



one-to-one, one-to-many, and many-to-many


Agency job
via The Hub by Sridevi Viswanathan
Chennai
5 - 10 yrs
₹1L - ₹15L / yr
AWS CloudFormation
Python
PySpark
AWS Lambda

5-7 years of experience in Data Engineering, with solid experience in the design, development, and implementation of end-to-end data ingestion and data processing systems on the AWS platform.

2-3 years of experience in AWS Glue, Lambda, Appflow, EventBridge, Python, PySpark, Lake House, S3, Redshift, Postgres, API Gateway, CloudFormation, Kinesis, Athena, KMS, IAM.

Experience in modern data architecture, Lake House, Enterprise Data Lake, Data Warehouse, API interfaces, solution patterns, standards and optimizing data ingestion.

Experience in build of data pipelines from source systems like SAP Concur, Veeva Vault, Azure Cost, various social media platforms or similar source systems.

Expertise in analyzing source data and designing a robust and scalable data ingestion framework and pipelines adhering to client Enterprise Data Architecture guidelines.

Proficient in design and development of solutions for real-time (or near real time) stream data processing as well as batch processing on the AWS platform.
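
To illustrate the near-real-time side of this requirement, here is a toy Kinesis producer in Python (boto3). The region, stream name, and event fields are placeholders, not details from this listing.

```python
import json
import time
import boto3

kinesis = boto3.client("kinesis", region_name="ap-south-1")

def publish_event(event: dict, stream_name: str = "ingest-events") -> None:
    """Send a single JSON event to a Kinesis data stream (names are placeholders)."""
    kinesis.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=str(event.get("source_id", "default")),
    )

if __name__ == "__main__":
    publish_event({"source_id": "sap-concur", "amount": 42.5, "ts": time.time()})
```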

Work closely with business analysts, data architects, data engineers, and data analysts to ensure that the data ingestion solutions meet the needs of the business.

Troubleshoot and provide support for issues related to data quality and data ingestion solutions. This may involve debugging data pipeline processes, optimizing queries, or troubleshooting application performance issues.

Experience in working in Agile/Scrum methodologies, CI/CD tools and practices, coding standards, code reviews, source management (GITHUB), JIRA, JIRA Xray and Confluence.

Experience or exposure to design and development using Full Stack tools.

Strong analytical and problem-solving skills, excellent communication (written and oral), and interpersonal skills.

Bachelor's or master's degree in computer science or related field.

 

 

Publicis Sapient

at Publicis Sapient

10 recruiters
Mohit Singh
Posted by Mohit Singh
Bengaluru (Bangalore), Gurugram, Pune, Hyderabad, Noida
4 - 10 yrs
Best in industry
PySpark
Data engineering
Big Data
Hadoop
Spark
+6 more

Publicis Sapient Overview:

As a Senior Associate in Data Engineering at Publicis Sapient, you will translate client requirements into technical design and implement components for data engineering solutions. You will utilize a deep understanding of data integration and big data design principles to create custom solutions or implement package solutions, and independently drive design discussions to ensure the necessary health of the overall solution.

Job Summary:

As Senior Associate L1 in Data Engineering, you will do technical design and implement components for the data engineering solution. You will utilize a deep understanding of data integration and big data design principles in creating custom solutions or implementing package solutions, and independently drive design discussions to ensure the necessary health of the overall solution.

The role requires a hands-on technologist with a strong programming background in Java, Scala, or Python, experience in data ingestion, integration, wrangling, computation, and analytics pipelines, and exposure to Hadoop ecosystem components. Hands-on knowledge of at least one of the AWS, GCP, or Azure cloud platforms is preferable.


Role & Responsibilities:

Job Title: Senior Associate L1 – Data Engineering

Your role is focused on Design, Development and delivery of solutions involving:

• Data Ingestion, Integration and Transformation

• Data Storage and Computation Frameworks, Performance Optimizations

• Analytics & Visualizations

• Infrastructure & Cloud Computing

• Data Management Platforms

• Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time

• Build functionality for data analytics, search and aggregation


Experience Guidelines:

Mandatory Experience and Competencies:

# Competency

1. Overall 3.5+ years of IT experience, with 1.5+ years in data-related technologies

2. Minimum 1.5 years of experience in Big Data technologies

3. Hands-on experience with the Hadoop stack – HDFS, Sqoop, Kafka, Pulsar, NiFi, Spark, Spark Streaming, Flink, Storm, Hive, Oozie, Airflow and other components required in building end-to-end data pipelines. Working knowledge of real-time data pipelines is an added advantage.

4. Strong experience in at least one of the programming languages Java, Scala, or Python; Java preferable

5. Hands-on working knowledge of NoSQL and MPP data platforms like HBase, MongoDB, Cassandra, AWS Redshift, Azure SQL DW, GCP BigQuery, etc.


Preferred Experience and Knowledge (Good to Have):

# Competency

1. Good knowledge of traditional ETL tools (Informatica, Talend, etc.) and database technologies (Oracle, MySQL, SQL Server, Postgres), with hands-on experience

2. Knowledge of data governance processes (security, lineage, catalog) and tools like Collibra, Alation, etc.

3. Knowledge of distributed messaging frameworks like ActiveMQ / RabbitMQ / Solace, search & indexing, and microservices architectures

4. Performance tuning and optimization of data pipelines

5. CI/CD – infra provisioning on cloud, automated build & deployment pipelines, code quality

6. Working knowledge of data platform related services on at least one cloud platform, IAM, and data security

7. Cloud data specialty and other related Big Data technology certifications


Job Title: Senior Associate L1 – Data Engineering

Personal Attributes:

• Strong written and verbal communication skills

• Articulation skills

• Good team player

• Self-starter who requires minimal oversight

• Ability to prioritize and manage multiple tasks

• Process orientation and the ability to define and set up processes

Arting Digital
Pragati Bhardwaj
Posted by Pragati Bhardwaj
Bengaluru (Bangalore)
10 - 16 yrs
₹10L - ₹15L / yr
databricks
Data modeling
SQL
Python
AWS Lambda
+2 more

Title:- Lead Data Engineer 


Experience: 10+ years

Budget: 32-36 LPA

Location: Bangalore

Mode of Work: Work from office

Primary Skills: Databricks, Spark, PySpark, SQL, Python, AWS

Qualification: Any Engineering degree


Roles and Responsibilities:


• 8-10+ years' experience in developing scalable Big Data applications or solutions on distributed platforms.

• Able to partner with others in solving complex problems by taking a broad perspective to identify innovative solutions.

• Strong skills in building positive relationships across Product and Engineering.

• Able to influence and communicate effectively, both verbally and in writing, with team members and business stakeholders.

• Able to quickly pick up new programming languages, technologies, and frameworks.

• Experience working in Agile and Scrum development processes.

• Experience working in a fast-paced, results-oriented environment.

• Experience in Amazon Web Services (AWS), mainly S3, Managed Airflow, EMR/EC2, IAM, etc.

• Experience working with data warehousing tools, including SQL databases, Presto, and Snowflake.

• Experience architecting data products in streaming, serverless, and microservices architectures and platforms.

• Experience working with data platforms, including EMR, Airflow, and Databricks (Data Engineering & Delta Lake components, and Lakehouse Medallion architecture), etc.

• Experience with creating/configuring Jenkins pipelines for a smooth CI/CD process for managed Spark jobs, building Docker images, etc.

• Experience working with distributed technology tools, including Spark, Python, and Scala.

• Working knowledge of data warehousing, data modelling, governance, and data architecture.

• Working knowledge of reporting & analytical tools such as Tableau, QuickSight, etc.

• Demonstrated experience in learning new technologies and skills.

• Bachelor's degree in Computer Science, Information Systems, Business, or another relevant subject area.

A LEADING US BASED MNC


Agency job
via Zeal Consultants by Zeal Consultants
Bengaluru (Bangalore), Hyderabad, Delhi, Gurugram
5 - 10 yrs
₹14L - ₹15L / yr
Google Cloud Platform (GCP)
Spark
PySpark
Apache Spark
"DATA STREAMING"

Data Engineering : Senior Engineer / Manager


As a Senior Engineer/Manager in Data Engineering, you will translate client requirements into technical design and implement components for data engineering solutions. You will utilize a deep understanding of data integration and big data design principles to create custom solutions or implement package solutions, and independently drive design discussions to ensure the necessary health of the overall solution.


Must Have skills :


1. GCP


2. Spark streaming : Live data streaming experience is desired.


3. Any one coding language: Java / Python / Scala



Skills & Experience :


- Overall experience of MINIMUM 5+ years with Minimum 4 years of relevant experience in Big Data technologies


- Hands-on experience with the Hadoop stack - HDFS, sqoop, kafka, Pulsar, NiFi, Spark, Spark Streaming, Flink, Storm, hive, oozie, airflow and other components required in building end to end data pipeline. Working knowledge on real-time data pipelines is added advantage.


- Strong experience in at least one of the programming languages Java, Scala, or Python; Java preferable


- Hands-on working knowledge of NoSQL and MPP data platforms like Hbase, MongoDb, Cassandra, AWS Redshift, Azure SQLDW, GCP BigQuery etc.


- Well-versed and working knowledge with data platform related services on GCP


- Bachelor's degree and 6 to 12 years of work experience, or any combination of education, training and/or experience that demonstrates the ability to perform the duties of the position


Your Impact :


- Data Ingestion, Integration and Transformation


- Data Storage and Computation Frameworks, Performance Optimizations


- Analytics & Visualizations


- Infrastructure & Cloud Computing


- Data Management Platforms


- Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time


- Build functionality for data analytics, search and aggregation

Career Forge

at Career Forge

2 candid answers
Mohammad Faiz
Posted by Mohammad Faiz
Delhi, Gurugram, Noida, Ghaziabad, Faridabad
5 - 7 yrs
₹12L - ₹15L / yr
Python
Apache Spark
PySpark
Data engineering
ETL
+10 more

🚀 Exciting Opportunity: Data Engineer Position in Gurugram 🌐


Hello 


We are actively seeking a talented and experienced Data Engineer to join our dynamic team at Reality Motivational Venture in Gurugram (Gurgaon). If you're passionate about data, thrive in a collaborative environment, and possess the skills we're looking for, we want to hear from you!


Position: Data Engineer  

Location: Gurugram (Gurgaon)  

Experience: 5+ years 


Key Skills:

- Python

- Spark, Pyspark

- Data Governance

- Cloud (AWS/Azure/GCP)


Main Responsibilities:

- Define and set up analytics environments for "Big Data" applications in collaboration with domain experts.

- Implement ETL processes for telemetry-based and stationary test data.

- Support in defining data governance, including data lifecycle management.

- Develop large-scale data processing engines and real-time search and analytics based on time series data.

- Ensure technical, methodological, and quality aspects.

- Support CI/CD processes.

- Foster know-how development and transfer, continuous improvement of leading technologies within Data Engineering.

- Collaborate with solution architects on the development of complex on-premise, hybrid, and cloud solution architectures.


Qualification Requirements:

- BSc, MSc, MEng, or PhD in Computer Science, Informatics/Telematics, Mathematics/Statistics, or a comparable engineering degree.

- Proficiency in Python and the PyData stack (Pandas/Numpy).

- Experience in high-level programming languages (C#/C++/Java).

- Familiarity with scalable processing environments like Dask (or Spark).

- Proficient in Linux and scripting languages (Bash Scripts).

- Experience in containerization and orchestration of containerized services (Kubernetes).

- Education in database technologies (SQL/OLAP and Non-SQL).

- Interest in Big Data storage technologies (Elastic, ClickHouse).

- Familiarity with Cloud technologies (Azure, AWS, GCP).

- Fluent English communication skills (speaking and writing).

- Ability to work constructively with a global team.

- Willingness to travel for business trips during development projects.


Preferable:

- Working knowledge of vehicle architectures, communication, and components.

- Experience in additional programming languages (C#/C++/Java, R, Scala, MATLAB).

- Experience in time-series processing.


How to Apply:

Interested candidates, please share your updated CV/resume with me.


Thank you for considering this exciting opportunity.

A fast growing Big Data company


Agency job
via Careerconnects by Kumar Narayanan
Noida, Bengaluru (Bangalore), Chennai, Hyderabad
6 - 8 yrs
₹10L - ₹15L / yr
AWS Glue
SQL
Python
PySpark
Data engineering
+6 more

AWS Glue Developer 

Work Experience: 6 to 8 Years

Work Location:  Noida, Bangalore, Chennai & Hyderabad

Must Have Skills: AWS Glue, DMS, SQL, Python, PySpark, Data integrations and Data Ops, 

Job Reference ID:BT/F21/IND


Job Description:

Design, build and configure applications to meet business process and application requirements.


Responsibilities:

7 years of work experience with ETL, data modelling, and data architecture. Proficient in ETL optimization, designing, coding, and tuning big data processes using PySpark. Extensive experience building data platforms on AWS using core AWS services such as Step Functions, EMR, Lambda, Glue, Athena, Redshift, Postgres, RDS, etc., and designing/developing data engineering solutions. Orchestration using Airflow.
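
For context, here is a minimal Airflow DAG that triggers a Glue job, assuming a recent Airflow with the apache-airflow-providers-amazon package installed. The DAG id, job name, role, and schedule are placeholders, not details from this listing.

```python
from datetime import datetime
from airflow import DAG
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

# Placeholder names throughout; assumes apache-airflow-providers-amazon is installed.
with DAG(
    dag_id="nightly_orders_etl",
    start_date=datetime(2024, 1, 1),
    schedule="0 2 * * *",
    catchup=False,
) as dag:
    run_glue_job = GlueJobOperator(
        task_id="run_orders_glue_job",
        job_name="orders-etl",            # existing Glue job (assumed)
        region_name="ap-south-1",
        iam_role_name="glue-etl-role",
        script_args={"--ENV": "prod"},
    )
```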


Technical Experience:

Hands-on experience on developing Data platform and its components Data Lake, cloud Datawarehouse, APIs, Batch and streaming data pipeline Experience with building data pipelines and applications to stream and process large datasets at low latencies.


➢ Enhancements, new development, defect resolution and production support of Big data ETL development using AWS native services.

➢ Create data pipeline architecture by designing and implementing data ingestion solutions.

➢ Integrate data sets using AWS services such as Glue, Lambda functions/ Airflow.

➢ Design and optimize data models on AWS Cloud using AWS data stores such as Redshift, RDS, S3, Athena.

➢ Author ETL processes using Python, Pyspark.

➢ Build Redshift Spectrum direct transformations and data modelling using data in S3.

➢ ETL process monitoring using CloudWatch events.

➢ You will be working in collaboration with other teams. Good communication is a must.

➢ Must have experience in using AWS services API, AWS CLI and SDK


Professional Attributes:

➢ Experience operating very large data warehouses or data lakes. Expert-level skills in writing and optimizing SQL. Extensive, real-world experience designing technology components for enterprise solutions and defining solution architectures and reference architectures with a focus on cloud technology.

➢ Must have 6+ years of big data ETL experience using Python, S3, Lambda, Dynamo DB, Athena, Glue in AWS environment.

➢ Expertise in S3, RDS, Redshift, Kinesis, EC2 clusters highly desired.


Qualification:

➢ Degree in Computer Science, Computer Engineering or equivalent.


Salary: Commensurate with experience and demonstrated competence

dataeaze systems

at dataeaze systems

1 recruiter
Ankita Kale
Posted by Ankita Kale
Remote only
5 - 8 yrs
₹12L - ₹22L / yr
Amazon Web Services (AWS)
Python
PySpark
ETL

POST - SENIOR DATA ENGINEER WITH AWS


Experience : 5 years


Must-have:

• Highly skilled in Python and PySpark

• Have expertise in writing AWS Glue ETL job scripts

• Experience in working with Kafka

• Extensive SQL DB experience – Postgres

Good-to-have:

• Experience in working with data analytics and modelling

• Hands on Experience of PowerBI visualization tool

• Knowledge of and hands-on experience with a version control system - Git

Common:

• Excellent communication and presentation skills (written and verbal) at all levels of an organization

• Should be results-oriented, with the ability to prioritize and drive multiple initiatives to completion on time

• Proven ability to influence a diverse, geographically dispersed group of individuals to facilitate, moderate, and influence productive design and implementation discussions driving towards results


Shifts - Flexible ( might have to work as per US Shift timings for meetings ).

Employment Type - Any

hopscotch
Bengaluru (Bangalore)
5 - 8 yrs
₹6L - ₹15L / yr
Python
Amazon Redshift
Amazon Web Services (AWS)
PySpark
Data engineering
+3 more

About the role:

 Hopscotch is looking for a passionate Data Engineer to join our team. You will work closely with other teams like data analytics, marketing, data science and individual product teams to specify, validate, prototype, scale, and deploy data pipelines features and data architecture.
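
One flavor of the Redshift-centric ETL this role describes is a bulk COPY from S3 issued over a standard Postgres connection. The sketch below is illustrative only; the cluster endpoint, credentials, IAM role, table, and S3 path are placeholders.

```python
import psycopg2

# Placeholder connection details and object names, for illustration only.
conn = psycopg2.connect(
    host="example-cluster.abc123.ap-south-1.redshift.amazonaws.com",
    port=5439,
    dbname="analytics",
    user="etl_user",
    password="***",
)

copy_sql = """
    COPY staging.orders
    FROM 's3://example-bucket/exports/orders/2024-06-01/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
    FORMAT AS PARQUET;
"""

with conn, conn.cursor() as cur:
    cur.execute(copy_sql)   # bulk-load the day's partition into staging
conn.close()
```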


Here’s what will be expected out of you:

➢ Ability to work in a fast-paced startup mindset. Should be able to manage all aspects of data extraction transfer and load activities.

➢ Develop data pipelines that make data available across platforms.

➢ Should be comfortable in executing ETL (Extract, Transform and Load) processes which include data ingestion, data cleaning and curation into a data warehouse, database, or data platform.

➢ Work on various aspects of the AI/ML ecosystem – data modeling, data and ML pipelines.

➢ Work closely with Devops and senior Architect to come up with scalable system and model architectures for enabling real-time and batch services.


What we want:

➢ 5+ years of experience as a data engineer or data scientist with a focus on data engineering and ETL jobs.

➢ Well versed with the concept of Data warehousing, Data Modelling and/or Data Analysis.

➢ Experience using & building pipelines and performing ETL with industry-standard best practices on Redshift (more than 2+ years).

➢ Ability to troubleshoot and solve performance issues with data ingestion, data processing & query execution on Redshift.

➢ Good understanding of orchestration tools like Airflow.

 ➢ Strong Python and SQL coding skills.

➢ Strong Experience in distributed systems like spark.

➢ Experience with AWS Data and ML technologies (AWS Glue, MWAA, Data Pipeline, EMR, Athena, Redshift, Lambda, etc.).

➢ Solid hands on with various data extraction techniques like CDC or Time/batch based and the related tools (Debezium, AWS DMS, Kafka Connect, etc) for near real time and batch data extraction.


Note :

Experience at product-based companies or e-commerce companies is an added advantage.

Staffbee Solutions INC
Remote only
6 - 10 yrs
₹1L - ₹1.5L / yr
Spotfire
Qlikview
Tableau
PowerBI
Data Visualization
+11 more

Looking for freelance?

We are seeking a freelance Data Engineer with 7+ years of experience

 

Skills Required: Deep knowledge of any cloud (AWS, Azure, Google Cloud), Databricks, data lakes, data warehousing, Python/Scala, SQL, BI, and other analytics systems

 

What we are looking for

We are seeking an experienced Senior Data Engineer with experience in architecture, design, and development of highly scalable data integration and data engineering processes

 

  • The Senior Consultant must have a strong understanding and experience with data & analytics solution architecture, including data warehousing, data lakes, ETL/ELT workload patterns, and related BI & analytics systems
  • Strong in scripting languages like Python, Scala
  • 5+ years of hands-on experience with one or more of these data integration/ETL tools.
  • Experience building on-prem data warehousing solutions.
  • Experience with designing and developing ETLs, Data Marts, Star Schema
  • Designing a data warehouse solution using Synapse or Azure SQL DB
  • Experience building pipelines using Synapse or Azure Data Factory to ingest data from various sources
  • Understanding of integration run times available in Azure.
  • Advanced working SQL knowledge and experience working with relational databases and query authoring (SQL), as well as working familiarity with a variety of databases


Mitibase
Vaidehi Ghangurde
Posted by Vaidehi Ghangurde
Pune
2 - 4 yrs
₹6L - ₹8L / yr
Vue.js
AngularJS (1.x)
React.js
Angular (2+)
Javascript
+6 more

·      The Objective:

You will play a crucial role in designing, implementing, and maintaining our data infrastructure, running tests, and updating the systems.


·      Job function and requirements

 

o  Expert in Python, Pandas, and NumPy, with knowledge of Python web frameworks such as Django and Flask.

o  Able to integrate multiple data sources and databases into one system.

o  Basic understanding of frontend technologies like HTML, CSS, JavaScript.

o  Able to build data pipelines.

o  Strong unit test and debugging skills.

o  Understanding of fundamental design principles behind a scalable application

o  Good understanding of RDBMS databases among Mysql or Postgresql.

o  Able to analyze and transform raw data (a brief pipeline sketch follows this list).
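To make the pipeline expectation concrete, here is a small sketch (table, column, and connection details are placeholders) of the pandas-based extract-transform-load work described above.

```python
# Hypothetical pandas ETL: read raw events from PostgreSQL, clean them, write a snapshot back.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@localhost:5432/appdb")  # placeholder DSN

# Extract: raw contact-change events from the application database.
raw = pd.read_sql("SELECT contact_id, company, changed_at FROM contact_events", engine)

# Transform: keep the latest change per contact and normalise company names.
latest = (
    raw.sort_values("changed_at")
       .drop_duplicates("contact_id", keep="last")
       .assign(company=lambda df: df["company"].str.strip().str.title())
)

# Load: publish the cleaned snapshot for downstream use.
latest.to_sql("contact_latest", engine, if_exists="replace", index=False)
```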

 

·      About us

Mitibase helps companies find the most relevant warm prospects every month, and then helps their team act on them with automation. We do this by automatically tracking key accounts and contacts for job changes and relationship triggers, and surfacing them as warm leads in your sales pipeline.

Read more
A Product Based Client, Chennai

Agency job
via SangatHR by Anna Poorni
Chennai
4 - 8 yrs
₹10L - ₹15L / yr
Data Warehouse (DWH)
Informatica
ETL
Spark
PySpark
+2 more

Analytics Job Description

We are hiring an Analytics Engineer to help drive our Business Intelligence efforts. You will partner closely with leaders across the organization, working together to understand the how and why of people, team, and company challenges, workflows, and culture. The team is responsible for delivering data and insights that drive decision-making, execution, and investments for our product initiatives.

You will work cross-functionally with product, marketing, sales, engineering, finance, and our customer-facing teams, enabling them with data and narratives about the customer journey. You'll also work closely with other data teams, such as data engineering and product analytics, to ensure we are creating a strong data culture at Blend that enables our cross-functional partners to be more data-informed.


Role: Data Engineer

Please find the JD for the Data Engineer role below.

Location: Guindy, Chennai

How you’ll contribute:

• Develop objectives and metrics, ensure priorities are data-driven, and balance short-term and long-term goals
• Develop deep analytical insights to inform and influence product roadmaps and business decisions, and help improve the consumer experience
• Work closely with GTM and supporting operations teams to author and develop core data sets that empower analyses
• Deeply understand the business and proactively spot risks and opportunities
• Develop dashboards and define metrics that drive key business decisions
• Build and maintain scalable ETL pipelines via solutions such as Fivetran, Hightouch, and Workato
• Design our Analytics and Business Intelligence architecture, assessing and implementing new technologies where they fit
• Work with our engineering teams to continually make our data pipelines and tooling more resilient


Who you are:

• Bachelor's degree or equivalent from an accredited institution with a quantitative focus such as Economics, Operations Research, Statistics, or Computer Science, OR 1-3 years of experience as a Data Analyst, Data Engineer, or Data Scientist
• Strong SQL and data modeling skills, with experience applying them to thoughtfully create data models in a warehouse environment
• A proven track record of using analysis to drive key decisions and influence change
• A strong storyteller with the ability to communicate effectively with managers and executives
• Demonstrated ability to define metrics for product areas, understand the right questions to ask, push back on stakeholders in the face of ambiguous, complex problems, and work with diverse teams with different goals
• A passion for documentation
• A solution-oriented growth mindset; you'll need to be a self-starter and thrive in a dynamic environment
• A bias towards communication and collaboration with business and technical stakeholders
• Quantitative rigor and systems thinking
• Prior startup experience is preferred, but not required
• Interest or experience in machine learning techniques (such as clustering, decision trees, and segmentation)
• Familiarity with a scientific computing language, such as Python, for data wrangling and statistical analysis
• Experience with a SQL-focused data transformation framework such as dbt
• Experience with a Business Intelligence tool such as Mode or Tableau


Mandatory Skillset:


- Very strong SQL
- Spark, PySpark, or Python (an indicative SQL-first PySpark sketch follows this list)
- Shell scripting
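As an indication only (table names and paths are hypothetical), a SQL-first PySpark job of the kind this skillset implies might look like this.

```python
# Hypothetical SQL-first PySpark job: aggregate opportunities into a monthly pipeline dataset.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("gtm_core_datasets").getOrCreate()

spark.read.parquet("/lake/raw/opportunities").createOrReplaceTempView("opportunities")

monthly_pipeline = spark.sql("""
    SELECT date_trunc('month', created_at) AS month,
           segment,
           COUNT(*)    AS opportunities,
           SUM(amount) AS pipeline_amount
    FROM opportunities
    GROUP BY 1, 2
""")

monthly_pipeline.write.mode("overwrite").parquet("/lake/analytics/monthly_pipeline")
```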


Read more
Epik Solutions
Sakshi Sarraf
Posted by Sakshi Sarraf
Bengaluru (Bangalore), Noida
5 - 10 yrs
₹7L - ₹28L / yr
skill iconPython
SQL
databricks
skill iconScala
Spark
+2 more

Job Description:


As an Azure Data Engineer, your role will involve designing, developing, and maintaining data solutions on the Azure platform. You will be responsible for building and optimizing data pipelines, ensuring data quality and reliability, and implementing data processing and transformation logic. Your expertise in Azure Databricks, Python, SQL, Azure Data Factory (ADF), PySpark, and Scala will be essential for performing the following key responsibilities:


Designing and developing data pipelines: You will design and implement scalable and efficient data pipelines using Azure Databricks, PySpark, and Scala. This includes data ingestion, data transformation, and data loading processes.
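A hedged sketch of that ingest-transform-load pattern is shown below; it assumes a Databricks workspace (where the `spark` session is provided) with Delta Lake, and the storage path and table names are illustrative only.

```python
# Hypothetical Databricks pipeline step: land JSON events, clean them, append to a Delta table.
from pyspark.sql import functions as F

raw_events = (
    spark.read.format("json")                                               # `spark` is the Databricks-provided session
    .load("abfss://landing@examplestorage.dfs.core.windows.net/events/")    # assumed ADLS landing path
)

clean_events = (
    raw_events
    .dropDuplicates(["event_id"])
    .withColumn("event_date", F.to_date("event_timestamp"))
    .filter(F.col("event_type").isNotNull())
)

(clean_events.write
    .format("delta")
    .mode("append")
    .partitionBy("event_date")
    .saveAsTable("curated.events"))   # hypothetical target table
```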


Data modeling and database design: You will design and implement data models to support efficient data storage, retrieval, and analysis. This may involve working with relational databases, data lakes, or other storage solutions on the Azure platform.


Data integration and orchestration: You will leverage Azure Data Factory (ADF) to orchestrate data integration workflows and manage data movement across various data sources and targets. This includes scheduling and monitoring data pipelines.


Data quality and governance: You will implement data quality checks, validation rules, and data governance processes to ensure data accuracy, consistency, and compliance with relevant regulations and standards.
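For example, a simple validation gate of the kind described here could be expressed in PySpark as below; the thresholds and table names are assumptions, not a prescribed framework.

```python
# Hypothetical data-quality gate: fail the run if key-level validation rules are violated.
from pyspark.sql import functions as F

events = spark.table("curated.events")   # assumed curated table; `spark` is the Databricks session

null_keys = events.filter(F.col("event_id").isNull()).count()
duplicate_keys = events.count() - events.dropDuplicates(["event_id"]).count()

if null_keys > 0 or duplicate_keys > 0:
    raise ValueError(
        f"Data quality check failed: {null_keys} null keys, {duplicate_keys} duplicate keys"
    )
```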


Performance optimization: You will optimize data pipelines and queries to improve overall system performance and reduce processing time. This may involve tuning SQL queries, optimizing data transformation logic, and leveraging caching techniques.


Monitoring and troubleshooting: You will monitor data pipelines, identify performance bottlenecks, and troubleshoot issues related to data ingestion, processing, and transformation. You will work closely with cross-functional teams to resolve data-related problems.


Documentation and collaboration: You will document data pipelines, data flows, and data transformation processes. You will collaborate with data scientists, analysts, and other stakeholders to understand their data requirements and provide data engineering support.


Skills and Qualifications:


Strong experience with Azure Databricks, Python, SQL, ADF, PySpark, and Scala.

Proficiency in designing and developing data pipelines and ETL processes.

Solid understanding of data modeling concepts and database design principles.

Familiarity with data integration and orchestration using Azure Data Factory.

Knowledge of data quality management and data governance practices.

Experience with performance tuning and optimization of data pipelines.

Strong problem-solving and troubleshooting skills related to data engineering.

Excellent collaboration and communication skills to work effectively in cross-functional teams.

Understanding of cloud computing principles and experience with Azure services.

Read more
RandomTrees

at RandomTrees

1 recruiter
Amareswarreddt yaddula
Posted by Amareswarreddt yaddula
Hyderabad
5 - 16 yrs
₹1L - ₹30L / yr
ETL
Informatica
Data Warehouse (DWH)
skill iconAmazon Web Services (AWS)
SQL
+3 more

We are #hiring an AWS Data Engineer expert to join our team


Job Title: AWS Data Engineer

Experience: 5 to 10 years

Location: Remote

Notice: Immediate or Max 20 Days

Role: Permanent Role


Skillset: AWS, ETL, SQL, Python, PySpark, Postgres DB, Dremio.


Job Description:

 Able to develop ETL jobs.

Able to help with data curation/cleanup, data transformation, and building ETL pipelines.

Strong Postgres DB experience; knowledge of Dremio as a semantic layer between the database and the application is a plus.

SQL, Python, and PySpark are a must (see the ETL sketch after this list).

Good communication skills are required.
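For orientation only, the sketch below shows the kind of Postgres-backed ETL job this role implies, using Spark's JDBC reader; the connection details and target path are placeholders, and the PostgreSQL JDBC driver is assumed to be on the Spark classpath.

```python
# Hypothetical PySpark ETL: read orders from PostgreSQL over JDBC, curate, and write Parquet.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("postgres_etl").getOrCreate()

orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/appdb")   # placeholder connection
    .option("dbtable", "public.orders")
    .option("user", "etl_user")
    .option("password", "***")
    .load()
)

curated = (
    orders
    .filter(F.col("status") != "cancelled")
    .withColumn("order_month", F.date_trunc("month", F.col("ordered_at")))
)

curated.write.mode("overwrite").parquet("/curated/orders/")   # assumed target location
```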





Read more