50+ PySpark Jobs in India


One of the reputed clients in India
Our client is looking to hire a Databricks Admin immediately.
This is PAN-INDIA Bulk hiring
Minimum of 6-8+ years with Databricks, Pyspark/Python and AWS.
Must have AWS
A notice period of 15-30 days is preferred.
Share profiles at hr at etpspl dot com
Please refer/share our email with friends/colleagues who are looking for a job.
About Moative
Moative, an Applied AI company, designs and builds transformation AI solutions for traditional industries in energy, utilities, healthcare & lifesciences, and more. Through Moative Labs, we build AI micro-products and launch AI startups with partners in vertical markets that align with our theses.
Our Past: We have built and sold two companies, one of which was an AI company. Our founders and leaders are Math PhDs, Ivy League University Alumni, Ex-Googlers, and successful entrepreneurs.
Our Team: Our team of 20+ employees consists of data scientists, AI/ML engineers, mathematicians, and Ph.D.s from top engineering and research institutes such as IITs, CERN, IISc, and UZH. Our team includes academicians, IBM Research Fellows, and former founders.
Work you’ll do
As a Data Engineer, you will work on data architecture, large-scale processing systems, and data flow management. You will build and maintain optimal data architecture and data pipelines, assemble large, complex data sets, and ensure that data is readily available to data scientists, analysts, and other users. In close collaboration with ML engineers, data scientists, and domain experts, you’ll deliver robust, production-grade solutions that directly impact business outcomes. Ultimately, you will be responsible for developing and implementing systems that optimize the organization’s data use and data quality.
Responsibilities
- Create and maintain optimal data architecture and data pipelines on cloud infrastructure (such as AWS/ Azure/ GCP)
- Assemble large, complex data sets that meet functional / non-functional business requirements
- Identify, design, and implement internal process improvements
- Build the pipeline infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources
- Support development of analytics that utilize the data pipeline to provide actionable insights into key business metrics
- Work with stakeholders to assist with data-related technical issues and support their data infrastructure needs
Who you are
You are a passionate and results-oriented engineer who understands the importance of data architecture and data quality to impact solution development, enhance products, and ultimately improve business applications. You thrive in dynamic environments and are comfortable navigating ambiguity. You possess a strong sense of ownership and are eager to take initiative, advocating for your technical decisions while remaining open to feedback and collaboration.
You have experience in developing and deploying data pipelines to support real-world applications. You have a good understanding of data structures and are excellent at writing clean, efficient code to extract, create and manage large data sets for analytical uses. You have the ability to conduct regular testing and debugging to ensure optimal data pipeline performance. You are excited at the possibility of contributing to intelligent applications that can directly impact business services and make a positive difference to users.
Skills & Requirements
- 3+ years of hands-on experience as a data engineer, data architect or similar role, with a good understanding of data structures and data engineering.
- Solid knowledge of cloud infra and data-related services on AWS (EC2, EMR, RDS, Redshift) and/ or Azure.
- Advanced knowledge of SQL, including writing complex queries, stored procedures, views, etc.
- Strong experience with data pipeline and workflow management tools (such as Luigi, Airflow).
- Experience with common relational SQL, NoSQL and Graph databases.
- Strong experience with scripting languages: Python, PySpark, Scala, etc.
- Practical experience with basic DevOps concepts: CI/CD, containerization (Docker, Kubernetes), etc
- Experience with big data tools (Spark, Kafka, etc) and stream processing.
- Excellent communication skills to collaborate with colleagues from both technical and business backgrounds, discuss and convey ideas and findings effectively.
- Ability to analyze complex problems, think critically for troubleshooting and develop robust data solutions.
- Ability to identify and tackle issues efficiently and proactively, conduct thorough research and collaborate to find long-term, scalable solutions.
Working at Moative
Moative is a young company, but we believe strongly in thinking long-term, while acting with urgency. Our ethos is rooted in innovation, efficiency and high-quality outcomes. We believe the future of work is AI-augmented and boundaryless. Here are some of our guiding principles:
- Think in decades. Act in hours. As an independent company, our moat is time. While our decisions are for the long-term horizon, our execution will be fast – measured in hours and days, not weeks and months.
- Own the canvas. Throw yourself in to build, fix or improve – anything that isn’t done right, irrespective of who did it. Be selfish about improving across the organization – because once the rot sets in, we waste years in surgery and recovery.
- Use data or don’t use data. Use data where you ought to but not as a ‘cover-my-back’ political tool. Be capable of making decisions with partial or limited data. Get better at intuition and pattern-matching. Whichever way you go, be mostly right about it.
- Avoid work about work. Process creeps on purpose, unless we constantly question it. We are deliberate about committing to rituals that take time away from the actual work. We truly believe that a meeting that could be an email, should be an email and you don’t need a person with the highest title to say that out loud.
- High revenue per person. We work backwards from this metric. Our default is to automate instead of hiring. We multi-skill our people to own more outcomes than hiring someone who has less to do. We don’t like squatting and hoarding that comes in the form of hiring for growth. High revenue per person comes from high quality work from everyone. We demand it.
If this role and our work are of interest to you, please apply. We encourage you to apply even if you believe you do not meet all the requirements listed above.
That said, you should demonstrate that you are in the 90th percentile or above. This may mean that you have studied in top-notch institutions, won competitions that are intellectually demanding, built something of your own, or been rated as an outstanding performer by your current or previous employers.
The position is based out of Chennai. Our work currently involves significant in-person collaboration and we expect you to work out of our offices in Chennai.
We are looking for a highly skilled Sr. Big Data Engineer with 3-5 years of experience in building large-scale data pipelines, real-time streaming solutions, and batch/stream processing systems. The ideal candidate should be proficient in Spark, Kafka, Python, and AWS Big Data services, with hands-on experience in implementing CDC (Change Data Capture) pipelines and integrating multiple data sources and sinks.
Responsibilities
- Design, develop, and optimize batch and streaming data pipelines using Apache Spark and Python.
- Build and maintain real-time data ingestion pipelines leveraging Kafka and AWS Kinesis.
- Implement CDC (Change Data Capture) pipelines using Kafka Connect, Debezium or similar frameworks.
- Integrate data from multiple sources and sinks (databases, APIs, message queues, file systems, cloud storage).
- Work with AWS Big Data ecosystem: Glue, EMR, Kinesis, Athena, S3, Lambda, Step Functions.
- Ensure pipeline scalability, reliability, and performance tuning of Spark jobs and EMR clusters.
- Develop data transformation and ETL workflows in AWS Glue and manage schema evolution.
- Collaborate with data scientists, analysts, and product teams to deliver reliable and high-quality data solutions.
- Implement monitoring, logging, and alerting for critical data pipelines.
- Follow best practices for data security, compliance, and cost optimization in cloud environments.
Required Skills & Experience
- Programming: Strong proficiency in Python (PySpark, data frameworks, automation).
- Big Data Processing: Hands-on experience with Apache Spark (batch & streaming).
- Messaging & Streaming: Proficient in Kafka (brokers, topics, partitions, consumer groups) and AWS Kinesis.
- CDC Pipelines: Experience with Debezium / Kafka Connect / custom CDC frameworks.
- AWS Services: AWS Glue, EMR, S3, Athena, Lambda, IAM, CloudWatch.
- ETL/ELT Workflows: Strong knowledge of data ingestion, transformation, partitioning, schema management.
- Databases: Experience with relational databases (MySQL, Postgres, Oracle) and NoSQL (MongoDB, DynamoDB, Cassandra).
- Data Formats: JSON, Parquet, Avro, ORC, Delta/Iceberg/Hudi.
- Version Control & CI/CD: Git, GitHub/GitLab, Jenkins, or CodePipeline.
- Monitoring/Logging: CloudWatch, Prometheus, ELK/Opensearch.
- Containers & Orchestration (nice-to-have): Docker, Kubernetes, Airflow/Step Functions for workflow orchestration.
Preferred Qualifications
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, or related field.
- Experience in large-scale data lake / lake house architectures.
- Knowledge of data warehousing concepts and query optimisation.
- Familiarity with data governance, lineage, and cataloging tools (Glue Data Catalog, Apache Atlas).
- Exposure to ML/AI data pipelines is a plus.
Tools & Technologies (must-have exposure)
- Big Data & Processing: Apache Spark, PySpark, AWS EMR, AWS Glue
- Streaming & Messaging: Apache Kafka, Kafka Connect, Debezium, AWS Kinesis
- Cloud & Storage: AWS (S3, Athena, Lambda, IAM, CloudWatch)
- Programming & Scripting: Python, SQL, Bash
- Orchestration: Airflow / Step Functions
- Version Control & CI/CD: Git, Jenkins/CodePipeline
- Data Formats: Parquet, Avro, ORC, JSON, Delta, Iceberg, Hudi
Supercharge Your Career as a Technical Lead - Python at Technoidentity!
Are you ready to solve people challenges that fuel business growth? At Technoidentity, we’re a Data+AI product engineering company that has been building cutting-edge solutions in the FinTech domain for over 13 years, and we’re expanding globally. It’s the perfect time to join our team of tech innovators and leave your mark!
At Technoidentity, we’re a Data + AI product engineering company trusted to deliver scalable and modern enterprise solutions. Join us as a Senior Python Developer and Technical Lead, where you'll guide high-performing engineering teams, design complex systems, and deliver clean, scalable backend solutions using Python and modern data technologies. Your leadership will directly shape the architecture and execution of enterprise projects, with added strength in understanding database logic including PL/SQL and PostgreSQL/AlloyDB.
What’s in it for You?
• Modern Python Stack – Python 3.x, FastAPI, Pandas, NumPy, SQLAlchemy, PostgreSQL/AlloyDB, PL/pgSQL.
• Tech Leadership – Drive technical decision-making, mentor developers, and ensure code quality and scalability.
• Scalable Projects – Architect and optimize data-intensive backend services for high-throughput and distributed systems.
• Engineering Best Practices – Enforce clean architecture, code reviews, testing strategies, and SDLC alignment.
• Cross-Functional Collaboration – Lead conversations across engineering, QA, product, and DevOps to ensure delivery excellence.
What Will You Be Doing?
Technical Leadership
• Lead a team of developers through design, code reviews, and technical mentorship.
• Set architectural direction and ensure scalability, modularity, and code quality.
• Work with stakeholders to translate business goals into robust technical solutions.
Backend Development & Data Engineering
• Design and build clean, high-performance backend services using FastAPI and Python best practices.
• Handle row- and column-level data transformation using Pandas and NumPy.
• Apply data wrangling, cleansing, and preprocessing techniques across microservices and pipelines.
Database & Performance Optimization
• Write performant queries, procedures, and triggers using PostgreSQL and PL/pgSQL.
• Understand legacy logic in PL/SQL and participate in rewriting or modernizing it for PostgreSQL-based systems.
• Tune both backend and database performance, including memory, indexing, and query optimization.
Parallelism & Communication
• Implement multithreading, multiprocessing, and parallel data flows in Python.
• Integrate Kafka, RabbitMQ, or Pub/Sub systems for real-time and async message processing.
Engineering Excellence
• Drive adherence to Agile, Git-based workflows, CI/CD, and DevOps pipelines.
• Promote testing (unit/integration), monitoring, and observability for all backend systems.
• Stay current with Python ecosystem evolution and introduce tools that improve productivity and performance.
What Makes You the Perfect Fit?
• 6–10 years of proven experience in Python development, with strong expertise in designing and delivering scalable backend solutions
Key Responsibilities
- Develop and maintain Python-based applications.
- Design and optimize SQL queries and databases.
- Collaborate with cross-functional teams to define, design, and ship new features.
- Write clean, maintainable, and efficient code.
- Troubleshoot and debug applications.
- Participate in code reviews and contribute to team knowledge sharing.
Qualifications and Required Skills
- Strong proficiency in Python programming.
- Experience with SQL and database management.
- Experience with web frameworks such as Django or Flask.
- Knowledge of front-end technologies like HTML, CSS, and JavaScript.
- Familiarity with version control systems like Git.
- Strong problem-solving skills and attention to detail.
- Excellent communication and teamwork skills.
Good to Have Skills
- Experience with cloud platforms like AWS or Azure.
- Knowledge of containerization technologies like Docker.
- Familiarity with continuous integration and continuous deployment (CI/CD) pipelines
Wissen Technology is hiring for Data Engineer
About Wissen Technology: At Wissen Technology, we deliver niche, custom-built products that solve complex business challenges across industries worldwide. Founded in 2015, our core philosophy is built around a strong product engineering mindset—ensuring every solution is architected and delivered right the first time. Today, Wissen Technology has a global footprint with 2000+ employees across offices in the US, UK, UAE, India, and Australia. Our commitment to excellence translates into delivering 2X impact compared to traditional service providers. How do we achieve this? Through a combination of deep domain knowledge, cutting-edge technology expertise, and a relentless focus on quality. We don’t just meet expectations—we exceed them by ensuring faster time-to-market, reduced rework, and greater alignment with client objectives. We have a proven track record of building mission-critical systems across industries, including financial services, healthcare, retail, manufacturing, and more. Wissen stands apart through its unique delivery models. Our outcome-based projects ensure predictable costs and timelines, while our agile pods provide clients the flexibility to adapt to their evolving business needs. Wissen leverages its thought leadership and technology prowess to drive superior business outcomes. Our success is powered by top-tier talent. Our mission is clear: to be the partner of choice for building world-class custom products that deliver exceptional impact—the first time, every time.
Job Summary: Wissen Technology is hiring a Data Engineer with expertise in Python, Pandas, Airflow, and Azure Cloud Services. The ideal candidate will have strong communication skills and experience with Kubernetes.
Experience: 4-7 years
Notice Period: Immediate- 15 days
Location: Pune, Mumbai, Bangalore
Mode of Work: Hybrid
Key Responsibilities:
- Develop and maintain data pipelines using Python and Pandas.
- Implement and manage workflows using Airflow.
- Utilize Azure Cloud Services for data storage and processing.
- Collaborate with cross-functional teams to understand data requirements and deliver solutions.
- Ensure data quality and integrity throughout the data lifecycle.
- Optimize and scale data infrastructure to meet business needs.
Qualifications and Required Skills:
- Proficiency in Python (Must Have).
- Strong experience with Pandas (Must Have).
- Expertise in Airflow (Must Have).
- Experience with Azure Cloud Services.
- Good communication skills.
Good to Have Skills:
- Experience with Pyspark.
- Knowledge of Kubernetes.
Wissen Sites:
- Website: http://www.wissen.com
- LinkedIn: https://www.linkedin.com/company/wissen-technology
- Wissen Leadership: https://www.wissen.com/company/leadership-team/
- Wissen Live: https://www.linkedin.com/company/wissen-technology/posts/feedView=All
- Wissen Thought Leadership: https://www.wissen.com/articles/
Experience: 3–7 Years
Locations: Pune / Bangalore / Mumbai
Notice Period: Immediate joiners only
Employment Type: Full-time
🛠️ Key Skills (Mandatory):
- Python: Strong coding skills for data manipulation and automation.
- PySpark: Experience with distributed data processing using Spark.
- SQL: Proficient in writing complex queries for data extraction and transformation.
- Azure Databricks: Hands-on experience with notebooks, Delta Lake, and MLflow
Interested candidates please share resume with details below.
Total Experience -
Relevant Experience in Python, PySpark, SQL, Azure Databricks -
Current CTC -
Expected CTC -
Notice period -
Current Location -
Desired Location -
Wissen Technology is hiring for Data Engineer
Job Summary: Wissen Technology is hiring a Data Engineer with a strong background in Python, data engineering, and workflow optimization. The ideal candidate will have experience with Delta Tables and Parquet, and be proficient in Pandas and PySpark.
Experience: 7+ years
Location: Pune, Mumbai, Bangalore
Mode of Work: Hybrid
Key Responsibilities:
- Develop and maintain data pipelines using Python (Pandas, PySpark).
- Optimize data workflows and ensure efficient data processing.
- Work with Delta Tables and Parquet for data storage and management.
- Collaborate with cross-functional teams to understand data requirements and deliver solutions.
- Ensure data quality and integrity throughout the data lifecycle.
- Implement best practices for data engineering and workflow optimization.
Qualifications and Required Skills:
- Proficiency in Python, specifically with Pandas and PySpark.
- Strong experience in data engineering and workflow optimization.
- Knowledge of Delta Tables and Parquet.
- Excellent problem-solving skills and attention to detail.
- Ability to work collaboratively in a team environment.
- Strong communication skills.
Good to Have Skills:
- Experience with Databricks.
- Knowledge of Apache Spark, DBT, and Airflow.
- Advanced Pandas optimizations.
- Familiarity with PyTest/DBT testing frameworks.
Wissen | Driving Digital Transformation
A technology consultancy that drives digital innovation by connecting strategy and execution, helping global clients to strengthen their core technology.
Job Title: PySpark/Scala Developer
Functional Skills: Experience in Credit Risk/Regulatory risk domain
Technical Skills: Spark ,PySpark, Python, Hive, Scala, MapReduce, Unix shell scripting
Good to Have Skills: Exposure to Machine Learning Techniques
Job Description:
5+ years of experience developing, fine-tuning, and implementing programs/applications using Python/PySpark/Scala on a Big Data/Hadoop platform.
Roles and Responsibilities:
a) Work with a leading bank’s Risk Management team on specific projects/requirements pertaining to risk models in consumer and wholesale banking
b) Enhance machine learning models using PySpark or Scala
c) Work with data scientists to build ML models based on business requirements and follow the ML cycle to deploy them all the way to the production environment
d) Participate in feature engineering, model training, scoring, and retraining
e) Architect data pipelines and automate data ingestion and model jobs
Skills and competencies:
Required:
· Strong analytical skills in conducting sophisticated statistical analysis using bureau/vendor data, customer performance data, and macro-economic data to solve business problems
· Working experience in PySpark and Scala to develop code to validate and implement models in Credit Risk/Banking
· Experience with distributed systems such as Hadoop/MapReduce, Spark, streaming data processing, and cloud architecture
· Familiarity with machine learning frameworks and libraries (such as scikit-learn, SparkML, TensorFlow, PyTorch, etc.)
· Experience in systems integration, web services, and batch processing
· Experience in migrating code to PySpark/Scala is a big plus
· The ability to act as a liaison, conveying the information needs of the business to IT and data constraints to the business, with equal attention to business strategy and IT strategy, business processes, and workflow
· Flexibility in approach and thought process
· Willingness to learn and keep up with periodic changes in regulatory requirements as per the FED
Profile: AWS Data Engineer
Mandatory skills: AWS + Databricks + PySpark + SQL
Location: Bangalore/Pune/Hyderabad/Chennai/Gurgaon
Notice Period: Immediate
Key Requirements :
- Design, build, and maintain scalable data pipelines to collect, process, and store from multiple datasets.
- Optimize data storage solutions for better performance, scalability, and cost-efficiency.
- Develop and manage ETL/ELT processes to transform data as per schema definitions, apply slicing and dicing, and make it available for downstream jobs and other teams.
- Collaborate closely with cross-functional teams to understand system and product functionalities, speed up feature development, and capture evolving data requirements.
- Engage with stakeholders to gather requirements and create curated datasets for downstream consumption and end-user reporting.
- Automate deployment and CI/CD processes using GitHub workflows, identifying areas to reduce manual, repetitive work.
- Ensure compliance with data governance policies, privacy regulations, and security protocols.
- Utilize cloud platforms like AWS and work on Databricks for data processing with S3 Storage.
- Work with distributed systems and big data technologies such as Spark, SQL, and Delta Lake.
- Integrate with SFTP to push data securely from Databricks to remote locations.
- Analyze and interpret spark query execution plans to fine-tune queries for faster and more efficient processing.
- Strong problem-solving and troubleshooting skills in large-scale distributed systems.
• Data Pipeline Development: Design and implement scalable data pipelines using PySpark and Databricks on AWS cloud infrastructure
• ETL/ELT Operations: Extract, transform, and load data from various sources using Python, SQL, and PySpark for batch and streaming data processing
• Databricks Platform Management: Develop and maintain data workflows, notebooks, and clusters in Databricks environment for efficient data processing
• AWS Cloud Services: Utilize AWS services including S3, Glue, EMR, Redshift, Kinesis, and Lambda for comprehensive data solutions
• Data Transformation: Write efficient PySpark scripts and SQL queries to process large-scale datasets and implement complex business logic
• Data Quality & Monitoring: Implement data validation, quality checks, and monitoring solutions to ensure data integrity across pipelines
• Collaboration: Work closely with data scientists, analysts, and other engineering teams to support analytics and machine learning initiatives
• Performance Optimization: Monitor and optimize data pipeline performance, query efficiency, and resource utilization in Databricks and AWS environments
Required Qualifications:
• Experience: 3+ years of hands-on experience in data engineering, ETL development, or related field
• PySpark Expertise: Strong proficiency in PySpark for large-scale data processing and transformations
• Python Programming: Solid Python programming skills with experience in data manipulation libraries (pandas etc)
• SQL Proficiency: Advanced SQL skills including complex queries, window functions, and performance optimization
• Databricks Experience: Hands-on experience with Databricks platform, including notebook development, cluster management, and job scheduling
• AWS Cloud Services: Working knowledge of core AWS services (S3, Glue, EMR, Redshift, IAM, Lambda)
• Data Modeling: Understanding of dimensional modeling, data warehousing concepts, and ETL best practices
• Version Control: Experience with Git and collaborative development workflows
Preferred Qualifications:
• Education: Bachelor's degree in Computer Science, Engineering, Mathematics, or related technical field
• Advanced AWS: Experience with additional AWS services like Athena, QuickSight, Step Functions, and CloudWatch
• Data Formats: Experience working with various data formats (JSON, Parquet, Avro, Delta Lake)
• Containerization: Basic knowledge of Docker and container orchestration
• Agile Methodology: Experience working in Agile/Scrum development environments
• Business Intelligence Tools: Exposure to BI tools like Tableau, Power BI, or Databricks SQL Analytics
Technical Skills Summary:
Core Technologies:
- PySpark & Spark SQL
- Python (pandas, boto3)
- SQL (PostgreSQL, MySQL, Redshift)
- Databricks (notebooks, clusters, jobs, Delta Lake)
AWS Services:
- S3, Glue, EMR, Redshift
- Lambda, Athena
- IAM, CloudWatch
Development Tools:
- Git/GitHub
- CI/CD pipelines, Docker
- Linux/Unix command line
About Data Axle:
Data Axle Inc. has been an industry leader in data, marketing solutions, sales and research for over 50 years in the USA. Data Axle has now established a strategic global centre of excellence in Pune. This centre delivers mission-critical data services to its global customers, powered by its proprietary cloud-based technology platform and by leveraging proprietary business & consumer databases.
Data Axle Pune is pleased to have achieved certification as a Great Place to Work!
Roles & Responsibilities:
We are looking for a Data Scientist to join the Data Science Client Services team to continue our success of identifying high quality target audiences that generate profitable marketing return for our clients. We are looking for experienced data science, machine learning and MLOps practitioners to design, build and deploy impactful predictive marketing solutions that serve a wide range of verticals and clients. The right candidate will enjoy contributing to and learning from a highly talented team and working on a variety of projects.
We are looking for a Senior Data Scientist who will be responsible for:
- Ownership of design, implementation, and deployment of machine learning algorithms in a modern Python-based cloud architecture
- Design or enhance ML workflows for data ingestion, model design, model inference and scoring
- Oversight on team project execution and delivery
- If senior, establish peer review guidelines for high quality coding to help develop junior team members’ skill set growth, cross-training, and team efficiencies
- Visualize and publish model performance results and insights to internal and external audiences
Qualifications:
- Masters in a relevant quantitative, applied field (Statistics, Econometrics, Computer Science, Mathematics, Engineering)
- Minimum of 3.5 years of work experience in the end-to-end lifecycle of ML model development and deployment into production within a cloud infrastructure (Databricks is highly preferred)
- Proven ability to manage the output of a small team in a fast-paced environment and to lead by example in the fulfilment of client requests
- Exhibit deep knowledge of core mathematical principles relating to data science and machine learning (ML Theory + Best Practices, Feature Engineering and Selection, Supervised and Unsupervised ML, A/B Testing, etc.)
- Proficiency in Python and SQL required; PySpark/Spark experience a plus
- Ability to conduct a productive peer review and proper code structure in Github
- Proven experience developing, testing, and deploying various ML algorithms (neural networks, XGBoost, Bayes, and the like)
- Working knowledge of modern CI/CD methods
This position description is intended to describe the duties most frequently performed by an individual in this position. It is not intended to be a complete list of assigned duties but to describe a position level.
Technical Architect (Databricks)
- 10+ Years Data Engineering Experience with expertise in Databricks
- 3+ years of consulting experience
- Completed Data Engineering Professional certification & required classes
- Minimum 2-3 projects delivered with hands-on experience in Databricks
- Completed Apache Spark Programming with Databricks, Data Engineering with Databricks, Optimizing Apache Spark™ on Databricks
- Experience in Spark and/or Hadoop, Flink, Presto, other popular big data engines
- Familiarity with Databricks multi-hop pipeline architecture
Sr. Data Engineer (Databricks)
- 5+ Years Data Engineering Experience with expertise in Databricks
- Completed Data Engineering Associate certification & required classes
- Minimum 1 project delivered with hands-on experience in development on Databricks
- Completed Apache Spark Programming with Databricks, Data Engineering with Databricks, Optimizing Apache Spark™ on Databricks
- SQL delivery experience, and familiarity with BigQuery, Synapse, or Redshift
- Proficient in Python, with knowledge of additional Databricks programming languages (Scala)
We are seeking a highly skilled Fabric Data Engineer with strong expertise in the Azure ecosystem to design, build, and maintain scalable data solutions. The ideal candidate will have hands-on experience with Microsoft Fabric, Databricks, Azure Data Factory, PySpark, SQL, and other Azure services to support advanced analytics and data-driven decision-making.
Key Responsibilities
- Design, develop, and maintain scalable data pipelines using Microsoft Fabric and Azure data services.
- Implement data integration, transformation, and orchestration workflows with Azure Data Factory, Databricks, and PySpark.
- Work with stakeholders to understand business requirements and translate them into robust data solutions.
- Optimize performance and ensure data quality, reliability, and security across all layers.
- Develop and maintain data models, metadata, and documentation to support analytics and reporting.
- Collaborate with data scientists, analysts, and business teams to deliver insights-driven solutions.
- Stay updated with emerging Azure and Fabric technologies to recommend best practices and innovations.
Required Skills & Experience
- Proven experience as a Data Engineer with strong expertise in the Azure cloud ecosystem.
Hands-on experience with:
- Microsoft Fabric
- Azure Databricks
- Azure Data Factory (ADF)
- PySpark & Python
- SQL (T-SQL/PL-SQL)
- Solid understanding of data warehousing, ETL/ELT processes, and big data architectures.
- Knowledge of data governance, security, and compliance within Azure.
- Strong problem-solving, debugging, and performance tuning skills.
- Excellent communication and collaboration abilities.
Preferred Qualifications
- Microsoft Certified: Fabric Analytics Engineer Associate / Azure Data Engineer Associate.
- Experience with Power BI, Delta Lake, and Lakehouse architecture.
- Exposure to DevOps, CI/CD pipelines, and Git-based version control.
We are hiring freelancers to work on advanced Data & AI projects using Databricks. If you are passionate about cloud platforms, machine learning, data engineering, or architecture, and want to work with cutting-edge tools on real-world challenges, this is the opportunity for you!
✅ Key Details
- Work Type: Freelance / Contract
- Location: Remote
- Time Zones: IST / EST only
- Domain: Data & AI, Cloud, Big Data, Machine Learning
- Collaboration: Work with industry leaders on innovative projects
🔹 Open Roles
1. Databricks – Senior Consultant
- Skills: Data Warehousing, Python, Java, Scala, ETL, SQL, AWS, GCP, Azure
- Experience: 6+ years
2. Databricks – ML Engineer
- Skills: CI/CD, MLOps, Machine Learning, Spark, Hadoop
- Experience: 4+ years
3. Databricks – Solution Architect
- Skills: Azure, GCP, AWS, CI/CD, MLOps
- Experience: 7+ years
4. Databricks – Solution Consultant
- Skills: SQL, Spark, BigQuery, Python, Scala
- Experience: 2+ years
✅ What We Offer
- Opportunity to work with top-tier professionals and clients
- Exposure to cutting-edge technologies and real-world data challenges
- Flexible remote work environment aligned with IST / EST time zones
- Competitive compensation and growth opportunities
📌 Skills We Value
Cloud Computing | Data Warehousing | Python | Java | Scala | ETL | SQL | AWS | GCP | Azure | CI/CD | MLOps | Machine Learning | Spark
Job Summary:
We are looking for a highly skilled and experienced Data Engineer with deep expertise in Airflow, dbt, Python, and Snowflake. The ideal candidate will be responsible for designing, building, and managing scalable data pipelines and transformation frameworks to enable robust data workflows across the organization.
Key Responsibilities:
- Design and implement scalable ETL/ELT pipelines using Apache Airflow for orchestration.
- Develop modular and maintainable data transformation models using dbt.
- Write high-performance data processing scripts and automation using Python.
- Build and maintain data models and pipelines on Snowflake.
- Collaborate with data analysts, data scientists, and business teams to deliver clean, reliable, and timely data.
- Monitor and optimize pipeline performance and troubleshoot issues proactively.
- Follow best practices in version control, testing, and CI/CD for data projects.
Must-Have Skills:
- Strong hands-on experience with Apache Airflow for scheduling and orchestrating data workflows.
- Proficiency in dbt (data build tool) for building scalable and testable data models.
- Expert-level skills in Python for data processing and automation.
- Solid experience with Snowflake, including SQL performance tuning, data modeling, and warehouse management.
- Strong understanding of data engineering best practices including modularity, testing, and deployment.
Good to Have:
- Experience working with cloud platforms (AWS/GCP/Azure).
- Familiarity with CI/CD pipelines for data (e.g., GitHub Actions, GitLab CI).
- Exposure to modern data stack tools (e.g., Fivetran, Stitch, Looker).
- Knowledge of data security and governance best practices.
Note: One face-to-face (F2F) round is mandatory, so you will need to visit the office for it.
Job Description:
- 4+ years of experience in a Data Engineer role
- Experience with object-oriented/object function scripting languages: Python, Scala, Golang, Java, etc.
- Experience with big data tools such as Spark, Hadoop, Kafka, Airflow, and Hive
- Experience with Streaming data: Spark/Kinesis/Kafka/Pubsub/Event Hub
- Experience with GCP/Azure data factory/AWS
- Strong in SQL Scripting
- Experience with ETL tools
- Knowledge of Snowflake Data Warehouse
- Knowledge of orchestration frameworks: Airflow/Luigi
- Good to have knowledge of Data Quality Management frameworks
- Good to have knowledge of Master Data Management
- Self-learning abilities are a must
- Familiarity with upcoming new technologies is a strong plus.
- Should have a bachelor's degree in big data analytics, computer engineering, or a related field
Personal Competency:
- Strong communication skills are a MUST
- Self-motivated, detail-oriented
- Strong organizational skills
- Ability to prioritize workloads and meet deadlines
Job Description
Overview:
We are seeking an experienced Azure Data Engineer to join our team in a hybrid Developer/Support capacity. This role focuses on enhancing and supporting existing Data & Analytics solutions by leveraging Azure Data Engineering technologies. The engineer will work on developing, maintaining, and deploying IT products and solutions that serve various business users, with a strong emphasis on performance, scalability, and reliability.
Must-Have Skills:
Azure Databricks
PySpark
Azure Synapse Analytics
Key Responsibilities:
- Incident classification and prioritization
- Log analysis and trend identification
- Coordination with Subject Matter Experts (SMEs)
- Escalation of unresolved or complex issues
- Root cause analysis and permanent resolution implementation
- Stakeholder communication and status updates
- Resolution of complex and major incidents
- Code reviews (two per week per individual) to ensure adherence to standards and optimize performance
- Bug fixing of recurring or critical issues identified during operations
- Gold layer tasks, including enhancements and performance tuning.
- Design, develop, and support data pipelines and solutions using Azure data engineering services.
- Implement data flow and ETL techniques leveraging Azure Data Factory, Databricks, and Synapse.
- Cleanse, transform, and enrich datasets using Databricks notebooks and PySpark.
- Orchestrate and automate workflows across services and systems.
- Collaborate with business and technical teams to deliver robust and scalable data solutions.
- Work in a support role to resolve incidents, handle change/service requests, and monitor performance.
- Contribute to CI/CD pipeline implementation using Azure DevOps.
Technical Requirements:
- 4 to 6 years of experience in IT and Azure data engineering technologies.
- Strong experience in Azure Databricks, Azure Synapse, and ADLS Gen2.
- Proficient in Python, PySpark, and SQL.
- Experience with file formats such as JSON and Parquet.
- Working knowledge of database systems, with a preference for Teradata and Snowflake.
- Hands-on experience with Azure DevOps and CI/CD pipeline deployments.
- Understanding of Data Warehousing concepts and data modeling best practices.
- Familiarity with SNOW (ServiceNow) for incident and change management.
Non-Technical Requirements:
- Ability to work independently and collaboratively in virtual teams across geographies.
- Strong analytical and problem-solving skills.
- Experience in Agile development practices, including estimation, testing, and deployment.
- Effective task and time management with the ability to prioritize under pressure.
- Clear communication and documentation skills for project updates and technical processes.
Technologies:
- Azure Data Factory
- Azure Databricks
- Azure Synapse Analytics
- PySpark / SQL
- Azure Data Lake Storage (ADLS), Blob Storage
- Azure DevOps (CI/CD pipelines)
Nice-to-Have:
- Experience with Business Intelligence tools, preferably Power BI
- DP-203 certification (Azure Data Engineer Associate)
NOTE:
Weekly rotational shifts:
- 11 am to 8 pm
- 2 pm to 11 pm
- 5 pm to 2 am
P.S.: On-call support is required once a month. Candidates should be available on call for one weekend and need to work only if an issue arises.
Key Responsibilities
- Design and implement ETL/ELT pipelines using Databricks, PySpark, and AWS Glue
- Develop and maintain scalable data architectures on AWS (S3, EMR, Lambda, Redshift, RDS)
- Perform data wrangling, cleansing, and transformation using Python and SQL
- Collaborate with data scientists to integrate Generative AI models into analytics workflows
- Build dashboards and reports to visualize insights using tools like Power BI or Tableau
- Ensure data quality, governance, and security across all data assets
- Optimize performance of data pipelines and troubleshoot bottlenecks
- Work closely with stakeholders to understand data requirements and deliver actionable insights
🧪 Required Skills
Skill Area | Tools & Technologies
Cloud Platforms | AWS (S3, Lambda, Glue, EMR, Redshift)
Big Data | Databricks, Apache Spark, PySpark
Programming | Python, SQL
Data Engineering | ETL/ELT, Data Lakes, Data Warehousing
Analytics | Data Modeling, Visualization, BI Reporting
Gen AI Integration | OpenAI, Hugging Face, LangChain (preferred)
DevOps (Bonus) | Git, Jenkins, Terraform, Docker
📚 Qualifications
- Bachelor's or Master’s degree in Computer Science, Data Science, or related field
- 3+ years of experience in data engineering or data analytics
- Hands-on experience with Databricks, PySpark, and AWS
- Familiarity with Generative AI tools and frameworks is a strong plus
- Strong problem-solving and communication skills
🌟 Preferred Traits
- Analytical mindset with attention to detail
- Passion for data and emerging technologies
- Ability to work independently and in cross-functional teams
- Eagerness to learn and adapt in a fast-paced environment
Job Title: Python Developer
Location: Bangalore
Experience: 5–7 Years
Employment Type: Full-Time
Job Description:
We are seeking an experienced Python Developer with strong proficiency in data analysis tools and PySpark, along with a solid understanding of SQL syntax. The ideal candidate will work on large-scale data processing and analysis tasks within a fast-paced environment.
Key Requirements:
Python: Hands-on experience with Python, specifically in data analysis using libraries such as pandas, numpy, etc.
PySpark: Proficiency in writing efficient PySpark code for distributed data processing.
SQL: Strong knowledge of SQL syntax and experience in writing optimized queries.
Ability to work independently and collaborate effectively with cross-functional teams.
🔍 Job Description:
We are looking for an experienced and highly skilled Technical Lead to guide the development and enhancement of a large-scale Data Observability solution built on AWS. This platform is pivotal in delivering monitoring, reporting, and actionable insights across the client's data landscape.
The Technical Lead will drive end-to-end feature delivery, mentor junior engineers, and uphold engineering best practices. The position reports to the Programme Technical Lead / Architect and involves close collaboration to align on platform vision, technical priorities, and success KPIs.
🎯 Key Responsibilities:
- Lead the design, development, and delivery of features for the data observability solution.
- Mentor and guide junior engineers, promoting technical growth and engineering excellence.
- Collaborate with the architect to align on platform roadmap, vision, and success metrics.
- Ensure high quality, scalability, and performance in data engineering solutions.
- Contribute to code reviews, architecture discussions, and operational readiness.
🔧 Primary Must-Have Skills (Non-Negotiable):
- 5+ years in Data Engineering or Software Engineering roles.
- 3+ years in a technical team or squad leadership capacity.
- Deep expertise in AWS Data Services: Glue, EMR, Kinesis, Lambda, Athena, S3.
- Advanced programming experience with PySpark, Python, and SQL.
- Proven experience in building scalable, production-grade data pipelines on cloud platforms.
Job Title : Data Engineer – GCP + Spark + DBT
Location : Bengaluru (On-site at Client Location | 3 Days WFO)
Experience : 8 to 12 Years
Level : Associate Architect
Type : Full-time
Job Overview :
We are looking for a seasoned Data Engineer to join the Data Platform Engineering team supporting a Unified Data Platform (UDP). This role requires hands-on expertise in DBT, GCP, BigQuery, and PySpark, with a solid foundation in CI/CD, data pipeline optimization, and agile delivery.
Mandatory Skills : GCP, DBT, Google Dataform, BigQuery, PySpark/Spark SQL, Advanced SQL, CI/CD, Git, Agile Methodologies.
Key Responsibilities :
- Design, build, and optimize scalable data pipelines using BigQuery, DBT, and PySpark.
- Leverage GCP-native services like Cloud Storage, Pub/Sub, Dataproc, Cloud Functions, and Composer for ETL/ELT workflows.
- Implement and maintain CI/CD for data engineering projects with Git-based version control.
- Collaborate with cross-functional teams including Infra, Security, and DataOps for reliable, secure, and high-quality data delivery.
- Lead code reviews, mentor junior engineers, and enforce best practices in data engineering.
- Participate in Agile sprints, backlog grooming, and Jira-based project tracking.
Must-Have Skills :
- Strong experience with DBT, Google Dataform, and BigQuery
- Hands-on expertise with PySpark/Spark SQL
- Proficient in GCP for data engineering workflows
- Solid knowledge of SQL optimization, Git, and CI/CD pipelines
- Agile team experience and strong problem-solving abilities
Nice-to-Have Skills :
- Familiarity with Databricks, Delta Lake, or Kafka
- Exposure to data observability and quality frameworks (e.g., Great Expectations, Soda)
- Knowledge of MDM patterns, Terraform, or IaC is a plus
1. Solid Databricks and PySpark experience
2. Must have worked on projects dealing with data at terabyte scale
3. Must have knowledge of Spark optimization techniques
4. Must have experience setting up job pipelines in Databricks
5. Basic knowledge of GCP and BigQuery is required
6. Understanding of LLMs and vector databases
Job title - Python developer
Experience – 4 to 6 years
Location – Pune/Mumbai/Bangalore
Job description:
Requirements:
- Proven experience as a Python Developer
- Strong knowledge of core Python and PySpark concepts
- Experience with web frameworks such as Django or Flask
- Good exposure to any cloud platform (GCP Preferred)
- CI/CD exposure required
- Solid understanding of RESTful APIs and how to build them
- Experience working with databases like Oracle DB and MySQL
- Ability to write efficient SQL queries and optimize database performance
- Strong problem-solving skills and attention to detail
- Strong SQL programming (stored procedures, functions)
- Excellent communication and interpersonal skills
Roles and Responsibilities
- Design, develop, and maintain data pipelines and ETL processes using PySpark
- Work closely with data scientists and analysts to provide them with clean, structured data.
- Optimize data storage and retrieval for performance and scalability.
- Collaborate with cross-functional teams to gather data requirements.
- Ensure data quality and integrity through data validation and cleansing processes.
- Monitor and troubleshoot data-related issues to ensure data pipeline reliability.
- Stay up to date with industry best practices and emerging technologies in data engineering.
🚀 We Are Hiring: Data Engineer | 4+ Years Experience 🚀
Job description
🔍 Job Title: Data Engineer
📍 Location: Ahmedabad
🚀 Work Mode: On-Site Opportunity
📅 Experience: 4+ Years
🕒 Employment Type: Full-Time
⏱️ Availability : Immediate Joiner Preferred
Join Our Team as a Data Engineer
We are seeking a passionate and experienced Data Engineer to be a part of our dynamic and forward-thinking team in Ahmedabad. This is an exciting opportunity for someone who thrives on transforming raw data into powerful insights and building scalable, high-performance data infrastructure.
As a Data Engineer, you will work closely with data scientists, analysts, and cross-functional teams to design robust data pipelines, optimize data systems, and enable data-driven decision-making across the organization.
Your Key Responsibilities
Architect, build, and maintain scalable and reliable data pipelines from diverse data sources.
Design effective data storage, retrieval mechanisms, and data models to support analytics and business needs.
Implement data validation, transformation, and quality monitoring processes.
Collaborate with cross-functional teams to deliver impactful, data-driven solutions.
Proactively identify bottlenecks and optimize existing workflows and processes.
Provide guidance and mentorship to junior engineers in the team.
Skills & Expertise We’re Looking For
3+ years of hands-on experience in Data Engineering or related roles.
Strong expertise in Python and data pipeline design.
Experience working with Big Data tools like Hadoop, Spark, Hive.
Proficiency with SQL, NoSQL databases, and data warehousing solutions.
Solid experience with cloud platforms (Azure).
Familiar with distributed computing, data modeling, and performance tuning.
Understanding of DevOps, Power Automate, and Microsoft Fabric is a plus.
Strong analytical thinking, collaboration skills, excellent communication skills, and the ability to work independently or as part of a team.
Qualifications
Bachelor’s degree in Computer Science, Data Science, or a related field.
Company Name – Wissen Technology
Group of companies in India – Wissen Technology & Wissen Infotech
Role - Senior Backend Developer – Java (with Python Exposure)
Work Location - Mumbai
Experience - 4 to 10 years
Kindly reply by email if you are interested.
Java Developer – Job Description
We are seeking a Senior Backend Developer with strong expertise in Java (Spring Boot) and working knowledge of Python. In this role, Java will be your primary development language, with Python used for scripting, automation, or selected service modules. You’ll be part of a collaborative backend team building scalable and high-performance systems.
Key Responsibilities
- Design and develop robust backend services and APIs primarily using Java (Spring Boot)
- Contribute to Python-based components where needed for automation, scripting, or lightweight services
- Build, integrate, and optimize RESTful APIs and microservices
- Work with relational and NoSQL databases
- Write unit and integration tests (JUnit, PyTest)
- Collaborate closely with DevOps, QA, and product teams
- Participate in architecture reviews and design discussions
- Help maintain code quality, organization, and automation
Required Skills & Qualifications
- 4 to 10 years of hands-on Java development experience
- Strong experience with Spring Boot, JPA/Hibernate, and REST APIs
- At least 1–2 years of hands-on experience with Python (e.g., for scripting, automation, or small services)
- Familiarity with Python frameworks like Flask or FastAPI is a plus
- Experience with SQL/NoSQL databases (e.g., PostgreSQL, MongoDB)
- Good understanding of OOP, design patterns, and software engineering best practices
- Familiarity with Docker, Git, and CI/CD pipelines
Job Summary:
As an AWS Data Engineer, you will be responsible for designing, developing, and maintaining scalable, high-performance data pipelines using AWS services. With 6+ years of experience, you’ll collaborate closely with data architects, analysts, and business stakeholders to build reliable, secure, and cost-efficient data infrastructure across the organization.
Key Responsibilities:
- Design, develop, and manage scalable data pipelines using AWS Glue, Lambda, and other serverless technologies
- Implement ETL workflows and transformation logic using PySpark and Python on AWS Glue
- Leverage AWS Redshift for warehousing, performance tuning, and large-scale data queries
- Work with AWS DMS and RDS for database integration and migration
- Optimize data flows and system performance for speed and cost-effectiveness
- Deploy and manage infrastructure using AWS CloudFormation templates
- Collaborate with cross-functional teams to gather requirements and build robust data solutions
- Ensure data integrity, quality, and security across all systems and processes
Required Skills & Experience:
- 6+ years of experience in Data Engineering with strong AWS expertise
- Proficient in Python and PySpark for data processing and ETL development
- Hands-on experience with AWS Glue, Lambda, DMS, RDS, and Redshift
- Strong SQL skills for building complex queries and performing data analysis
- Familiarity with AWS CloudFormation and infrastructure as code principles
- Good understanding of serverless architecture and cost-optimized design
- Ability to write clean, modular, and maintainable code
- Strong analytical thinking and problem-solving skills
🚀 Hiring: Data Engineer | GCP + Spark + Python + .NET | 6–10 Yrs | Gurugram (Hybrid)
We’re looking for a skilled Data Engineer with strong hands-on experience in GCP, Spark-Scala, Python, and .NET.
📍 Location: Suncity, Sector 54, Gurugram (Hybrid – 3 days onsite)
💼 Experience: 6–10 Years
⏱️ Notice Period: Immediate Joiner
Required Skills:
- 5+ years of experience in distributed computing (Spark) and software development.
- 3+ years of experience in Spark-Scala
- 5+ years of experience in Data Engineering.
- 5+ years of experience in Python.
- Fluency in working with databases (preferably Postgres).
- Have a sound understanding of object-oriented programming and development principles.
- Experience working in an Agile Scrum or Kanban development environment.
- Experience working with version control software (preferably Git).
- Experience with CI/CD pipelines.
- Experience with automated testing, including integration/delta, load, and performance testing.
About the Role:
We are seeking a talented Lead Data Engineer to join our team and play a pivotal role in transforming raw data into valuable insights. As a Data Engineer, you will design, develop, and maintain robust data pipelines and infrastructure to support our organization's analytics and decision-making processes.
Responsibilities:
- Data Pipeline Development: Build and maintain scalable data pipelines to extract, transform, and load (ETL) data from various sources (e.g., databases, APIs, files) into data warehouses or data lakes.
- Data Infrastructure: Design, implement, and manage data infrastructure components, including data warehouses, data lakes, and data marts.
- Data Quality: Ensure data quality by implementing data validation, cleansing, and standardization processes.
- Team Management: Able to lead and manage a team.
- Performance Optimization: Optimize data pipelines and infrastructure for performance and efficiency.
- Collaboration: Collaborate with data analysts, scientists, and business stakeholders to understand their data needs and translate them into technical requirements.
- Tool and Technology Selection: Evaluate and select appropriate data engineering tools and technologies (e.g., SQL, Python, Spark, Hadoop, cloud platforms).
- Documentation: Create and maintain clear and comprehensive documentation for data pipelines, infrastructure, and processes.
Skills:
- Strong proficiency in SQL and at least one programming language (e.g., Python, Java).
- Experience with data warehousing and data lake technologies (e.g., Snowflake, AWS Redshift, Databricks).
- Knowledge of cloud platforms (e.g., AWS, GCP, Azure) and cloud-based data services.
- Understanding of data modeling and data architecture concepts.
- Experience with ETL/ELT tools and frameworks.
- Excellent problem-solving and analytical skills.
- Ability to work independently and as part of a team.
Preferred Qualifications:
- Experience with real-time data processing and streaming technologies (e.g., Kafka, Flink).
- Knowledge of machine learning and artificial intelligence concepts.
- Experience with data visualization tools (e.g., Tableau, Power BI).
- Certification in cloud platforms or data engineering.
Skill Name: ETL Automation Testing
Location: Bangalore, Chennai and Pune
Experience: 5+ Years
Required:
Experience in ETL Automation Testing
Strong experience in PySpark.
Required Skills:
- Hands-on experience with Databricks, PySpark
- Proficiency in SQL, Python, and Spark.
- Understanding of data warehousing concepts and data modeling.
- Experience with CI/CD pipelines and version control (e.g., Git).
- Fundamental knowledge of any cloud services, preferably Azure or GCP.
Good to Have:
- BigQuery
- Experience with performance tuning and data governance.
Position: AWS Data Engineer
Experience: 5 to 7 Years
Location: Bengaluru, Pune, Chennai, Mumbai, Gurugram
Work Mode: Hybrid (3 days work from office per week)
Employment Type: Full-time
About the Role:
We are seeking a highly skilled and motivated AWS Data Engineer with 5–7 years of experience in building and optimizing data pipelines, architectures, and data sets. The ideal candidate will have strong experience with AWS services including Glue, Athena, Redshift, Lambda, DMS, RDS, and CloudFormation. You will be responsible for managing the full data lifecycle from ingestion to transformation and storage, ensuring efficiency and performance.
Key Responsibilities:
- Design, develop, and optimize scalable ETL pipelines using AWS Glue, Python/PySpark, and SQL.
- Work extensively with AWS services such as Glue, Athena, Lambda, DMS, RDS, Redshift, CloudFormation, and other serverless technologies.
- Implement and manage data lake and warehouse solutions using AWS Redshift and S3.
- Optimize data models and storage for cost-efficiency and performance.
- Write advanced SQL queries to support complex data analysis and reporting requirements.
- Collaborate with stakeholders to understand data requirements and translate them into scalable solutions.
- Ensure high data quality and integrity across platforms and processes.
- Implement CI/CD pipelines and best practices for infrastructure as code using CloudFormation or similar tools.
Required Skills & Experience:
- Strong hands-on experience with Python or PySpark for data processing.
- Deep knowledge of AWS Glue, Athena, Lambda, Redshift, RDS, DMS, and CloudFormation.
- Proficiency in writing complex SQL queries and optimizing them for performance.
- Familiarity with serverless architectures and AWS best practices.
- Experience in designing and maintaining robust data architectures and data lakes.
- Ability to troubleshoot and resolve data pipeline issues efficiently.
- Strong communication and stakeholder management skills.
Job Summary:
Seeking a seasoned SQL + ETL Developer with 4+ years of experience in managing large-scale datasets and cloud-based data pipelines. The ideal candidate is hands-on with MySQL, PySpark, AWS Glue, and ETL workflows, with proven expertise in AWS migration and performance optimization.
Key Responsibilities:
- Develop and optimize complex SQL queries and stored procedures to handle large datasets (100+ million records).
- Build and maintain scalable ETL pipelines using AWS Glue and PySpark.
- Work on data migration tasks in AWS environments.
- Monitor and improve database performance; automate key performance indicators and reports.
- Collaborate with cross-functional teams to support data integration and delivery requirements.
- Write shell scripts for automation and manage ETL jobs efficiently.
Required Skills:
- Strong experience with MySQL, complex SQL queries, and stored procedures.
- Hands-on experience with AWS Glue, PySpark, and ETL processes.
- Good understanding of AWS ecosystem and migration strategies.
- Proficiency in shell scripting.
- Strong communication and collaboration skills.
Nice to Have:
- Working knowledge of Python.
- Experience with AWS RDS.
Profile: AWS Data Engineer
Mode- Hybrid
Experience- 5 to 7 years
Locations - Bengaluru, Pune, Chennai, Mumbai, Gurugram
Roles and Responsibilities
- Design and maintain ETL pipelines using AWS Glue and Python/PySpark
- Optimize SQL queries for Redshift and Athena
- Develop Lambda functions for serverless data processing
- Configure AWS DMS for database migration and replication
- Implement infrastructure as code with CloudFormation
- Build optimized data models for performance
- Manage RDS databases and AWS service integrations
- Troubleshoot and improve data processing efficiency
- Gather requirements from business stakeholders
- Implement data quality checks and validation
- Document data pipelines and architecture
- Monitor workflows and implement alerting
- Keep current with AWS services and best practices
Required Technical Expertise:
- Python/PySpark for data processing
- AWS Glue for ETL operations
- Redshift and Athena for data querying
- AWS Lambda and serverless architecture
- AWS DMS and RDS management
- CloudFormation for infrastructure
- SQL optimization and performance tuning
Job Overview:
We are seeking an experienced AWS Data Engineer to join our growing data team. The ideal candidate will have hands-on experience with AWS Glue, Redshift, PySpark, and other AWS services to build robust, scalable data pipelines. This role is perfect for someone passionate about data engineering, automation, and cloud-native development.
Key Responsibilities:
- Design, build, and maintain scalable and efficient ETL pipelines using AWS Glue, PySpark, and related tools.
- Integrate data from diverse sources and ensure its quality, consistency, and reliability.
- Work with large datasets in structured and semi-structured formats across cloud-based data lakes and warehouses.
- Optimize and maintain data infrastructure, including Amazon Redshift, for high performance.
- Collaborate with data analysts, data scientists, and product teams to understand data requirements and deliver solutions.
- Automate data validation, transformation, and loading processes to support real-time and batch data processing.
- Monitor and troubleshoot data pipeline issues and ensure smooth operations in production environments.
Required Skills:
- 5 to 7 years of hands-on experience in data engineering roles.
- Strong proficiency in Python and PySpark for data transformation and scripting.
- Deep understanding and practical experience with AWS Glue, AWS Redshift, S3, and other AWS data services.
- Solid understanding of SQL and database optimization techniques.
- Experience working with large-scale data pipelines and high-volume data environments.
- Good knowledge of data modeling, warehousing, and performance tuning.
Preferred/Good to Have:
- Experience with workflow orchestration tools like Airflow or Step Functions.
- Familiarity with CI/CD for data pipelines.
- Knowledge of data governance and security best practices on AWS.
Role - ETL Developer
Work Mode - Hybrid
Experience- 4+ years
Location - Pune, Gurgaon, Bengaluru, Mumbai
Required Skills - AWS, AWS Glue, PySpark, ETL, SQL
Required Skills:
- 4+ years of hands-on experience in MySQL, including SQL queries and procedure development
- Experience in PySpark, AWS, AWS Glue
- Experience in AWS migration
- Experience with automated scripting and tracking KPIs/metrics for database performance
- Proficiency in shell scripting and ETL.
- Strong communication skills and a collaborative team player
- Knowledge of Python and AWS RDS is a plus
Job Description: Data Engineer
Position Overview:
Role Overview
We are seeking a skilled Python Data Engineer with expertise in designing and implementing data solutions using the AWS cloud platform. The ideal candidate will be responsible for building and maintaining scalable, efficient, and secure data pipelines while leveraging Python and AWS services to enable robust data analytics and decision-making processes.
Key Responsibilities
· Design, develop, and optimize data pipelines using Python and AWS services such as Glue, Lambda, S3, EMR, Redshift, Athena, and Kinesis.
· Implement ETL/ELT processes to extract, transform, and load data from various sources into centralized repositories (e.g., data lakes or data warehouses).
· Collaborate with cross-functional teams to understand business requirements and translate them into scalable data solutions.
· Monitor, troubleshoot, and enhance data workflows for performance and cost optimization.
· Ensure data quality and consistency by implementing validation and governance practices.
· Work on data security best practices in compliance with organizational policies and regulations.
· Automate repetitive data engineering tasks using Python scripts and frameworks.
· Leverage CI/CD pipelines for deployment of data workflows on AWS.
Role: GCP Data Engineer
Notice Period: Immediate Joiners
Experience: 5+ years
Location: Remote
Company: Deqode
About Deqode
At Deqode, we work with next-gen technologies to help businesses solve complex data challenges. Our collaborative teams build reliable, scalable systems that power smarter decisions and real-time analytics.
Key Responsibilities
- Build and maintain scalable, automated data pipelines using Python, PySpark, and SQL.
- Work on cloud-native data infrastructure using Google Cloud Platform (BigQuery, Cloud Storage, Dataflow).
- Implement clean, reusable transformations using DBT and Databricks.
- Design and schedule workflows using Apache Airflow.
- Collaborate with data scientists and analysts to ensure downstream data usability.
- Optimize pipelines and systems for performance and cost-efficiency.
- Follow best software engineering practices: version control, unit testing, code reviews, CI/CD.
- Manage and troubleshoot data workflows in Linux environments.
- Apply data governance and access control via Unity Catalog or similar tools.
Required Skills & Experience
- Strong hands-on experience with PySpark, Spark SQL, and Databricks.
- Solid understanding of GCP services (BigQuery, Cloud Functions, Dataflow, Cloud Storage).
- Proficiency in Python for scripting and automation.
- Expertise in SQL and data modeling.
- Experience with DBT for data transformations.
- Working knowledge of Airflow for workflow orchestration.
- Comfortable with Linux-based systems for deployment and troubleshooting.
- Familiar with Git for version control and collaborative development.
- Understanding of data pipeline optimization, monitoring, and debugging.
Work Mode: Hybrid
Need B.Tech, BE, M.Tech, ME candidates - Mandatory
Must-Have Skills:
● Educational Qualification :- B.Tech, BE, M.Tech, ME in any field.
● Minimum of 3 years of proven experience as a Data Engineer.
● Strong proficiency in Python programming language and SQL.
● Experience in Databricks and in setting up and managing data pipelines and data warehouses/lakes.
● Good comprehension and critical thinking skills.
● Kindly note that the salary bracket will vary according to the candidate's experience:
- Experience from 4 yrs to 6 yrs - Salary up to 22 LPA
- Experience from 5 yrs to 8 yrs - Salary up to 30 LPA
- Experience more than 8 yrs - Salary up to 40 LPA
We are looking for skilled and passionate Data Engineers with a strong foundation in Python programming and hands-on experience working with APIs, AWS cloud, and modern development practices. The ideal candidate will have a keen interest in building scalable backend systems and working with big data tools like PySpark.
Key Responsibilities:
- Write clean, scalable, and efficient Python code.
- Work with Python frameworks such as PySpark for data processing.
- Design, develop, update, and maintain APIs (RESTful).
- Deploy and manage code using GitHub CI/CD pipelines.
- Collaborate with cross-functional teams to define, design, and ship new features.
- Work on AWS cloud services for application deployment and infrastructure.
- Basic database design and interaction with MySQL or DynamoDB.
- Debugging and troubleshooting application issues and performance bottlenecks.
Required Skills & Qualifications:
- 4+ years of hands-on experience with Python development.
- Proficient in Python basics with a strong problem-solving approach.
- Experience with AWS Cloud services (EC2, Lambda, S3, etc.).
- Good understanding of API development and integration.
- Knowledge of GitHub and CI/CD workflows.
- Experience in working with PySpark or similar big data frameworks.
- Basic knowledge of MySQL or DynamoDB.
- Excellent communication skills and a team-oriented mindset.
Nice to Have:
- Experience in containerization (Docker/Kubernetes).
- Familiarity with Agile/Scrum methodologies.
Azure DE
Primary Responsibilities -
- Create and maintain data storage solutions including Azure SQL Database, Azure Data Lake, and Azure Blob Storage.
- Design, implement, and maintain data pipelines for data ingestion, processing, and transformation in Azure
- Create data models for analytics purposes
- Utilizing Azure Data Factory or comparable technologies, create and maintain ETL (Extract, Transform, Load) operations
- Use Azure Data Factory and Databricks to assemble large, complex data sets
- Implement data validation and cleansing procedures to ensure the quality, integrity, and dependability of the data
- Ensure data security and compliance
- Collaborate with data engineers and other stakeholders to understand requirements and translate them into scalable and reliable data platform architectures
Required skills:
- Blend of technical expertise, analytical problem-solving, and collaboration with cross-functional teams
- Azure DevOps
- Apache Spark, Python
- SQL proficiency
- Azure Databricks knowledge
- Big data technologies
The DEs should be well versed in coding, Spark Core, and data ingestion using Azure. They also need decent communication skills, along with core Azure DE skills and coding skills (PySpark, Python, and SQL).
Of the 7 open positions, 5 of the Azure Data Engineers should have a minimum of 5 years of relevant Data Engineering experience.
We are looking for a Senior Data Engineer with strong expertise in GCP, Databricks, and Airflow to design and implement a GCP Cloud Native Data Processing Framework. The ideal candidate will work on building scalable data pipelines and help migrate existing workloads to a modern framework.
- Shift: 2 PM to 11 PM
- Work Mode: Hybrid (3 days a week) across Xebia locations
- Notice Period: Immediate joiners or those with a notice period of up to 30 days
Key Responsibilities:
- Design and implement a GCP Native Data Processing Framework leveraging Spark and GCP Cloud Services.
- Develop and maintain data pipelines using Databricks and Airflow for transforming Raw → Silver → Gold data layers.
- Ensure data integrity, consistency, and availability across all systems.
- Collaborate with data engineers, analysts, and stakeholders to optimize performance.
- Document standards and best practices for data engineering workflows.
Required Experience:
- 7-8 years of experience in data engineering, architecture, and pipeline development.
- Strong knowledge of GCP, Databricks, PySpark, and BigQuery.
- Experience with Orchestration tools like Airflow, Dagster, or GCP equivalents.
- Understanding of Data Lake table formats (Delta, Iceberg, etc.).
- Proficiency in Python for scripting and automation.
- Strong problem-solving skills and collaborative mindset.
⚠️ Please apply only if you have not applied recently or are not currently in the interview process for any open roles at Xebia.
Looking forward to your response!
Best regards,
Vijay S
Assistant Manager - TAG
Here is the Job Description -
Location -- Viman Nagar, Pune
Mode - 5 Days Working
Required Tech Skills:
● Strong at PySpark, Python
● Good understanding of Data Structure
● Good at SQL query/optimization
● Strong fundamentals of OOPs programming
● Good understanding of AWS Cloud, Big Data.
● Data Lake, AWS Glue, Athena, S3, Kinesis, SQL/NoSQL DB
The Sr AWS/Azure/GCP Databricks Data Engineer at Koantek will use comprehensive modern data engineering techniques and methods with Advanced Analytics to support business decisions for our clients. Your goal is to support the use of data-driven insights to help our clients achieve business outcomes and objectives. You can collect, aggregate, and analyze structured/unstructured data from multiple internal and external sources and present patterns, insights, and trends to decision-makers. You will help design and build data pipelines, data streams, reporting tools, information dashboards, data service APIs, data generators, and other end-user information portals and insight tools. You will be a critical part of the data supply chain, ensuring that stakeholders can access and manipulate data for routine and ad hoc analysis to drive business outcomes using Advanced Analytics. You are expected to function as a productive member of a team, working and communicating proactively with engineering peers, technical lead, project managers, product owners, and resource managers.
Requirements:
- Strong experience as an AWS/Azure/GCP Data Engineer; must have AWS/Azure/GCP Databricks experience
- Expert proficiency in Spark Scala, Python, and Spark
- Must have data migration experience from on-prem to cloud
- Hands-on experience in Kinesis to process & analyze Stream Data, Event/IoT Hubs, and Cosmos
- In-depth understanding of Azure/AWS/GCP cloud and Data Lake and Analytics solutions on Azure
- Expert-level hands-on experience designing and developing applications on Databricks
- Extensive hands-on experience implementing data migration and data processing using AWS/Azure/GCP services
- In-depth understanding of Spark Architecture including Spark Streaming, Spark Core, Spark SQL, Data Frames, RDD caching, Spark MLlib
- Hands-on experience with the technology stack available in the industry for data management, data ingestion, capture, processing, and curation: Kafka, StreamSets, Attunity, GoldenGate, MapReduce, Hadoop, Hive, HBase, Cassandra, Spark, Flume, Impala, etc.
- Hands-on knowledge of data frameworks, data lakes and open-source projects such as Apache Spark, MLflow, and Delta Lake
- Good working knowledge of code versioning tools (such as Git, Bitbucket or SVN)
- Hands-on experience in using Spark SQL with various data sources like JSON, Parquet and Key Value Pair
- Experience preparing data for Data Science and Machine Learning with exposure to model selection, model lifecycle, hyperparameter tuning, model serving, deep learning, etc.
- Demonstrated experience preparing data, automating and building data pipelines for AI Use Cases (text, voice, image, IoT data, etc.)
- Good to have programming language experience with .NET or Spark/Scala
- Experience in creating tables, partitioning, bucketing, loading and aggregating data using Spark Scala, Spark SQL/PySpark
- Knowledge of AWS/Azure/GCP DevOps processes like CI/CD as well as Agile tools and processes including Git, Jenkins, Jira, and Confluence
- Working experience with Visual Studio, PowerShell Scripting, and ARM templates
- Able to build ingestion to ADLS and enable BI layer for Analytics
- Strong understanding of Data Modeling and defining conceptual, logical and physical data models
- Big Data/analytics/information analysis/database management in the cloud
- IoT/event-driven/microservices in the cloud
- Experience with private and public cloud architectures, pros/cons, and migration considerations
- Ability to remain up to date with industry standards and technological advancements that will enhance data quality and reliability to advance strategic initiatives
- Working knowledge of RESTful APIs, OAuth2 authorization framework and security best practices for API Gateways
- Guide customers in transforming big data projects, including development and deployment of big data and AI applications
- Guide customers on Data engineering best practices, provide proof of concept, architect solutions and collaborate when needed
- 2+ years of hands-on experience designing and implementing multi-tenant solutions using AWS/Azure/GCP Databricks for data governance, data pipelines for near real-time data warehouse, and machine learning solutions
- Overall 5+ years' experience in a software development, data engineering, or data analytics field using Python, PySpark, Scala, Spark, Java, or equivalent technologies
- Hands-on expertise in Apache Spark (Scala or Python)
- 3+ years of experience working in query tuning, performance tuning, troubleshooting, and debugging Spark and other big data solutions
- Bachelor's or Master's degree in Big Data, Computer Science, Engineering, Mathematics, or similar area of study or equivalent work experience
- Ability to manage competing priorities in a fast-paced environment
- Ability to resolve issues
- Basic experience with or knowledge of agile methodologies
- AWS Certified: Solutions Architect Professional
- Databricks Certified Associate Developer for Apache Spark
- Microsoft Certified: Azure Data Engineer Associate
- GCP Certified: Professional Google Cloud Certified
Key Responsibilities:
Design, develop, and optimize scalable data pipelines and ETL processes.
Work with large datasets using GCP services like BigQuery, Dataflow, and Cloud Storage.
Implement real-time data streaming and processing solutions using Pub/Sub and Dataproc.
Collaborate with cross-functional teams to ensure data quality and governance.
Technical Requirements:
4+ years of experience in Data Engineering.
Strong expertise in GCP services like Workflow, TensorFlow, Dataproc, and Cloud Storage.
Proficiency in SQL and programming languages such as Python or Java.
Experience in designing and implementing data pipelines and working with real-time data processing.
Familiarity with CI/CD pipelines and cloud security best practices.
Job Description :
Job Title : Data Engineer
Location : Pune (Hybrid Work Model)
Experience Required : 4 to 8 Years
Role Overview :
We are seeking talented and driven Data Engineers to join our team in Pune. The ideal candidate will have a strong background in data engineering with expertise in Python, PySpark, and SQL. You will be responsible for designing, building, and maintaining scalable data pipelines and systems that empower our business intelligence and analytics initiatives.
Key Responsibilities:
- Develop, optimize, and maintain ETL pipelines and data workflows.
- Design and implement scalable data solutions using Python, PySpark, and SQL.
- Collaborate with cross-functional teams to gather and analyze data requirements.
- Ensure data quality, integrity, and security throughout the data lifecycle.
- Monitor and troubleshoot data pipelines to ensure reliability and performance.
- Work on hybrid data environments involving on-premise and cloud-based systems.
- Assist in the deployment and maintenance of big data solutions.
Required Skills and Qualifications:
- Bachelor’s degree in Computer Science, Information Technology, or related field.
- 4 to 8 Years of experience in Data Engineering or related roles.
- Proficiency in Python and PySpark for data processing and analysis.
- Strong SQL skills with experience in writing complex queries and optimizing performance.
- Familiarity with data pipeline tools and frameworks.
- Knowledge of cloud platforms such as AWS, Azure, or GCP is a plus.
- Excellent problem-solving and analytical skills.
- Strong communication and teamwork abilities.
Preferred Qualifications:
- Experience with big data technologies like Hadoop, Hive, or Spark.
- Familiarity with data visualization tools and techniques.
- Knowledge of CI/CD pipelines and DevOps practices in a data engineering context.
Work Model:
- This position follows a hybrid work model, with candidates expected to work from the Pune office as per business needs.
Why Join Us?
- Opportunity to work with cutting-edge technologies.
- Collaborative and innovative work environment.
- Competitive compensation and benefits.
- Clear career progression and growth opportunities.
Job Description
Job Title: Data Engineer
Location: Hyderabad, India
Job Type: Full Time
Experience: 5 – 8 Years
Working Model: On-Site (No remote or work-from-home options available)
Work Schedule: Mountain Time Zone (3:00 PM to 11:00 PM IST)
Role Overview
The Data Engineer will be responsible for designing and implementing scalable backend systems, leveraging Python and PySpark to build high-performance solutions. The role requires a proactive and detail-orientated individual who can solve complex data engineering challenges while collaborating with cross-functional teams to deliver quality results.
Key Responsibilities
- Develop and maintain backend systems using Python and PySpark.
- Optimise and enhance system performance for large-scale data processing.
- Collaborate with cross-functional teams to define requirements and deliver solutions.
- Debug, troubleshoot, and resolve system issues and bottlenecks.
- Follow coding best practices to ensure code quality and maintainability.
- Utilise tools like Palantir Foundry for data management workflows (good to have).
Qualifications
- Strong proficiency in Python backend development.
- Hands-on experience with PySpark for data engineering.
- Excellent problem-solving skills and attention to detail.
- Good communication skills for effective team collaboration.
- Experience with Palantir Foundry or similar platforms is a plus.
Preferred Skills
- Experience with large-scale data processing and pipeline development.
- Familiarity with agile methodologies and development tools.
- Ability to optimise and streamline backend processes effectively.