50+ Spark Jobs in India
Apply to 50+ Spark Jobs on CutShort.io. Find your next job, effortlessly. Browse Spark Jobs and apply today!


We are hiring freelancers to work on advanced Data & AI projects using Databricks. If you are passionate about cloud platforms, machine learning, data engineering, or architecture, and want to work with cutting-edge tools on real-world challenges, this is the opportunity for you!
✅ Key Details
- Work Type: Freelance / Contract
- Location: Remote
- Time Zones: IST / EST only
- Domain: Data & AI, Cloud, Big Data, Machine Learning
- Collaboration: Work with industry leaders on innovative projects
🔹 Open Roles
1. Databricks – Senior Consultant
- Skills: Data Warehousing, Python, Java, Scala, ETL, SQL, AWS, GCP, Azure
- Experience: 6+ years
2. Databricks – ML Engineer
- Skills: CI/CD, MLOps, Machine Learning, Spark, Hadoop
- Experience: 4+ years
3. Databricks – Solution Architect
- Skills: Azure, GCP, AWS, CI/CD, MLOps
- Experience: 7+ years
4. Databricks – Solution Consultant
- Skills: SQL, Spark, BigQuery, Python, Scala
- Experience: 2+ years
✅ What We Offer
- Opportunity to work with top-tier professionals and clients
- Exposure to cutting-edge technologies and real-world data challenges
- Flexible remote work environment aligned with IST / EST time zones
- Competitive compensation and growth opportunities
📌 Skills We Value
Cloud Computing | Data Warehousing | Python | Java | Scala | ETL | SQL | AWS | GCP | Azure | CI/CD | MLOps | Machine Learning | Spark |

Role Overview
We're looking for experienced Data Engineers who can independently design, build, and manage scalable data platforms. You'll work directly with clients and internal teams to develop robust data pipelines that support analytics, AI/ML, and operational systems.
You’ll also play a mentorship role and help establish strong engineering practices across our data projects.
Key Responsibilities
- Design and develop large-scale, distributed data pipelines (batch and streaming)
- Implement scalable data models, warehouses/lakehouses, and data lakes
- Translate business requirements into technical data solutions
- Optimize data pipelines for performance and reliability
- Ensure code is clean, modular, tested, and documented
- Contribute to architecture, tooling decisions, and platform setup
- Review code/design and mentor junior engineers
Must-Have Skills
- Strong programming skills in Python and advanced SQL
- Solid grasp of ETL/ELT, data modeling (OLTP & OLAP), and stream processing
- Hands-on experience with frameworks like Apache Spark, Flink, etc.
- Experience with orchestration tools like Airflow
- Familiarity with CI/CD pipelines and Git
- Ability to debug and scale data pipelines in production
Preferred Skills
- Experience with cloud platforms (AWS preferred, GCP or Azure also fine)
- Exposure to Databricks, dbt, or similar tools
- Understanding of data governance, quality frameworks, and observability
- Certifications (e.g., AWS Data Analytics, Solutions Architect, Databricks) are a bonus
What We’re Looking For
- Problem-solver with strong analytical skills and attention to detail
- Fast learner who can adapt across tools, tech stacks, and domains
- Comfortable working in fast-paced, client-facing environments
- Willingness to travel within India when required


About Moative
Moative, an Applied AI company, designs and builds transformation AI solutions for traditional industries in energy, utilities, healthcare & lifesciences, and more. Through Moative Labs, we build AI micro-products and launch AI startups with partners in vertical markets that align with our theses.
Our Past: We have built and sold two companies, one of which was an AI company. Our founders and leaders are Math PhDs, Ivy League University Alumni, Ex-Googlers, and successful entrepreneurs.
Our Team: Our team of 20+ employees consist of data scientists, AI/ML Engineers, and mathematicians from top engineering and research institutes such as IITs, CERN, IISc, UZH, Ph.Ds. Our team includes academicians, IBM Research Fellows, and former founders.
Work you’ll do
As a Data Engineer, you will work on data architecture, large-scale processing systems, and data flow management. You will build and maintain optimal data architecture and data pipelines, assemble large, complex data sets, and ensure that data is readily available to data scientists, analysts, and other users. In close collaboration with ML engineers, data scientists, and domain experts, you’ll deliver robust, production-grade solutions that directly impact business outcomes. Ultimately, you will be responsible for developing and implementing systems that optimize the organization’s data use and data quality.
Responsibilities
- Create and maintain optimal data architecture and data pipelines on cloud infrastructure (such as AWS/ Azure/ GCP)
- Assemble large, complex data sets that meet functional / non-functional business requirements
- Identify, design, and implement internal process improvements
- Build the pipeline infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources
- Support development of analytics that utilize the data pipeline to provide actionable insights into key business metrics
- Work with stakeholders to assist with data-related technical issues and support their data infrastructure needs
Who you are
You are a passionate and results-oriented engineer who understands the importance of data architecture and data quality to impact solution development, enhance products, and ultimately improve business applications. You thrive in dynamic environments and are comfortable navigating ambiguity. You possess a strong sense of ownership and are eager to take initiative, advocating for your technical decisions while remaining open to feedback and collaboration.
You have experience in developing and deploying data pipelines to support real-world applications. You have a good understanding of data structures and are excellent at writing clean, efficient code to extract, create and manage large data sets for analytical uses. You have the ability to conduct regular testing and debugging to ensure optimal data pipeline performance. You are excited at the possibility of contributing to intelligent applications that can directly impact business services and make a positive difference to users.
Skills & Requirements
- 3+ years of hands-on experience as a data engineer, data architect or similar role, with a good understanding of data structures and data engineering.
- Solid knowledge of cloud infra and data-related services on AWS (EC2, EMR, RDS, Redshift) and/ or Azure.
- Advanced knowledge of SQL, including writing complex queries, stored procedures, views, etc.
- Strong experience with data pipeline and workflow management tools (such as Luigi, Airflow).
- Experience with common relational SQL, NoSQL and Graph databases.
- Strong experience with scripting languages: Python, PySpark, Scala, etc.
- Practical experience with basic DevOps concepts: CI/CD, containerization (Docker, Kubernetes), etc
- Experience with big data tools (Spark, Kafka, etc) and stream processing.
- Excellent communication skills to collaborate with colleagues from both technical and business backgrounds, discuss and convey ideas and findings effectively.
- Ability to analyze complex problems, think critically for troubleshooting and develop robust data solutions.
- Ability to identify and tackle issues efficiently and proactively, conduct thorough research and collaborate to find long-term, scalable solutions.
Working at Moative
Moative is a young company, but we believe strongly in thinking long-term, while acting with urgency. Our ethos is rooted in innovation, efficiency and high-quality outcomes. We believe the future of work is AI-augmented and boundary less. Here are some of our guiding principles:
- Think in decades. Act in hours. As an independent company, our moat is time. While our decisions are for the long-term horizon, our execution will be fast – measured in hours and days, not weeks and months.
- Own the canvas. Throw yourself in to build, fix or improve – anything that isn’t done right, irrespective of who did it. Be selfish about improving across the organization – because once the rot sets in, we waste years in surgery and recovery.
- Use data or don’t use data. Use data where you ought to but not as a ‘cover-my-back’ political tool. Be capable of making decisions with partial or limited data. Get better at intuition and pattern-matching. Whichever way you go, be mostly right about it.
- Avoid work about work. Process creeps on purpose, unless we constantly question it. We are deliberate about committing to rituals that take time away from the actual work. We truly believe that a meeting that could be an email, should be an email and you don’t need a person with the highest title to say that out loud.
- High revenue per person. We work backwards from this metric. Our default is to automate instead of hiring. We multi-skill our people to own more outcomes than hiring someone who has less to do. We don’t like squatting and hoarding that comes in the form of hiring for growth. High revenue per person comes from high quality work from everyone. We demand it.
If this role and our work is of interest to you, please apply. We encourage you to apply even if you believe you do not meet all the requirements listed above.
That said, you should demonstrate that you are in the 90th percentile or above. This may mean that you have studied in top-notch institutions, won competitions that are intellectually demanding, built something of your own, or rated as an outstanding performer by your current or previous employers.
The position is based out of Chennai. Our work currently involves significant in-person collaboration and we expect you to work out of our offices in Chennai.
Data Architecture and Engineering Lead
Responsibilities:
- Lead Data Architecture: Own the design, evolution, and delivery of enterprise data architecture across cloud and hybrid environments. Develop relational and analytical data models (conceptual, logical, and physical) to support business needs and ensure data integrity.
- Consolidate Core Systems: Unify data sources across airport systems into a single analytical platform optimised for business value.
- Build Scalable Infrastructure: Architect cloud-native solutions that support both batch and streaming data workflows using tools like Databricks, Kafka, etc.
- Implement Governance Frameworks: Define and enforce enterprise-wide data standards for access control, privacy, quality, security, and lineage.
- Enable Metadata & Cataloguing: Deploy metadata management and cataloguing tools to enhance data discoverability and self-service analytics.
- Operationalise AI/ML Pipelines: Lead data architecture that supports AI/ML initiatives, including forecasting, pricing models, and personalisation.
- Partner Across Functions: Translate business needs into data architecture solutions by collaborating with leaders in Operations, Finance, HR, Legal, Technology.
- Optimize Cloud Cost & Performance: Roll out compute and storage systems that balance cost efficiency, performance, and observability across platforms.
Qualifications:
- 12+ years of experience in data architecture, with 3+ years in a senior or leadership role across cloud or hybrid environments
- Proven ability to design and scale large data platforms supporting analytics, real-time reporting, and AI/ML use cases
- Hands-on expertise with ingestion, transformation, and orchestration pipelines
- Extensive experience with Microsoft Azure data services, including Azure Data Lake Storage, Azure Databricks, Azure Data Factory and related technologies.
- Strong knowledge of ERP data models, especially SAP and MS Dynamics
- Experience with data governance, compliance (GDPR/CCPA), metadata cataloguing, and security practices
- Familiarity with distributed systems and streaming frameworks like Spark or Flink
- Strong stakeholder management and communication skills, with the ability to influence both technical and business teams
Tools & Technologies
- Warehousing: Azure Databricks Delta, BigQuery
- Big Data: Apache Spark
- Cloud Platforms: Azure (ADLS, AKS, EventHub, ServiceBus)
- Streaming: Kafka, Pub/Sub
- RDBMS: PostgreSQL, MS SQL
- NoSQL: Redis
- Monitoring: Azure Monitoring, App Insight, Prometheus, Grafana

Role overview:
- Must have About 5 - 11 years and at least 3 years relevant experience with Bigdata.
- Must have Experience in building highly scalable business applications, which involve implementing large complex business flows and dealing with huge amounts of data.
- Must have experience in Hadoop, Hive, Spark with Scala with good experience in performance tuning and debugging issues.
- Good to have any stream processing Spark/Java Kafka.
- Must have experience in design and development of Big data projects.
- Good knowledge in Functional programming and OOP concepts, SOLID principles, design patterns for developing scalable applications.
- Familiarity with build tools like Maven.
- Must have experience with any RDBMS and at least one SQL database preferably PostgresSQL
- Must have experience writing unit and integration tests using scaliest
- Must have experience using any versioning control system - Git
- Must have experience with CI / CD pipeline – Jenkins is a plus
- Basic hands-on experience in one of the cloud provider (AWS/Azure) is a plus
- Databricks Spark certification is a plus.
What would you do here:
As a Software Development Engineer 2 you will be responsible for expanding and optimising our data and data pipeline architecture as well as optimising data flow and collection for cross-functional teams. The ideal candidate is an experienced data pipeline design and data wrangler who enjoys optimising data systems and building them from the ground up. The Data Engineer will lead our software developers on data initiatives and will ensure optimal data delivery architecture is consistent throughout ongoing projects. They must be self-directed and comfortable supporting the data needs of multiple teams, systems and products. The right candidate will be excited by the prospect of optimising or even re-designing our company’s data architecture to support our next generation of products and data initiatives.
Responsibilities:
•Create and maintain optimal data pipeline architecture
•Assemble large complex data sets that meet functional / non-functional business requirements.
•Identify design and implement internal process improvements: automating manual processes optimising data delivery, coordinating to re-design infrastructure for greater scalability etc.
•Work with stakeholders including the Executive Product Data and Design teams to assist with data-related technical issues and support their data infrastructure needs.
•Keep our data separated and secure
•Work with data and analytics experts to strive for greater functionality in our data systems.
- Support PROD systems



We’re seeking a highly skilled, execution-focused Senior Data Scientist with a minimum of 5 years of experience. This role demands hands-on expertise in building, deploying, and optimizing machine learning models at scale, while working with big data technologies and modern cloud platforms. You will be responsible for driving data-driven solutions from experimentation to production, leveraging advanced tools and frameworks across Python, SQL, Spark, and AWS. The role requires strong technical depth, problem-solving ability, and ownership in delivering business impact through data science.
Responsibilities
- Design, build, and deploy scalable machine learning models into production systems.
- Develop advanced analytics and predictive models using Python, SQL, and popular ML/DL frameworks (Pandas, Scikit-learn, TensorFlow, PyTorch).
- Leverage Databricks, Apache Spark, and Hadoop for large-scale data processing and model training.
- Implement workflows and pipelines using Airflow and AWS EMR for automation and orchestration.
- Collaborate with engineering teams to integrate models into cloud-based applications on AWS.
- Optimize query performance, storage usage, and data pipelines for efficiency.
- Conduct end-to-end experiments, including data preprocessing, feature engineering, model training, validation, and deployment.
- Drive initiatives independently with high ownership and accountability.
- Stay up to date with industry best practices in machine learning, big data, and cloud-native deployments.
Requirements:
- Minimum 5 years of experience in Data Science or Applied Machine Learning.
- Strong proficiency in Python, SQL, and ML libraries (Pandas, Scikit-learn, TensorFlow, PyTorch).
- Proven expertise in deploying ML models into production systems.
- Experience with big data platforms (Hadoop, Spark) and distributed data processing.
- Hands-on experience with Databricks, Airflow, and AWS EMR.
- Strong knowledge of AWS cloud services (S3, Lambda, SageMaker, EC2, etc.).
- Solid understanding of query optimization, storage systems, and data pipelines.
- Excellent problem-solving skills, with the ability to design scalable solutions.
- Strong communication and collaboration skills to work in cross-functional teams.
Benefits:
- Best in class salary: We hire only the best, and we pay accordingly.
- Proximity Talks: Meet other designers, engineers, and product geeks — and learn from experts in the field.
- Keep on learning with a world-class team: Work with the best in the field, challenge yourself constantly, and learn something new every day.
About Us:
Proximity is the trusted technology, design, and consulting partner for some of the biggest Sports, Media, and Entertainment companies in the world! We’re headquartered in San Francisco and have offices in Palo Alto, Dubai, Mumbai, and Bangalore. Since 2019, Proximity has created and grown high-impact, scalable products used by 370 million daily users, with a total net worth of $45.7 billion among our client companies.
Today, we are a global team of coders, designers, product managers, geeks, and experts. We solve complex problems and build cutting-edge tech, at scale. Our team of Proxonauts is growing quickly, which means your impact on the company’s success will be huge. You’ll have the chance to work with experienced leaders who have built and led multiple tech, product, and design teams.
To be successful in this role, you should possess
• Collaborate closely with Product Management and Engineering leadership to devise and build the
right solution.
• Participate in Design discussions and brainstorming sessions to select, integrate, and maintain Big
Data tools and frameworks required to solve Big Data problems at scale.
• Design and implement systems to cleanse, process, and analyze large data sets using distributed
processing tools like Akka and Spark.
• Understanding and critically reviewing existing data pipelines, and coming up with ideas in
collaboration with Technical Leaders and Architects to improve upon current bottlenecks
• Take initiatives, and show the drive to pick up new stuff proactively, and work as a Senior
Individual contributor on the multiple products and features we have.
• 3+ years of experience in developing highly scalable Big Data pipelines.
• In-depth understanding of the Big Data ecosystem including processing frameworks like Spark,
Akka, Storm, and Hadoop, and the file types they deal with.
• Experience with ETL and Data pipeline tools like Apache NiFi, Airflow etc.
• Excellent coding skills in Java or Scala, including the understanding to apply appropriate Design
Patterns when required.
• Experience with Git and build tools like Gradle/Maven/SBT.
• Strong understanding of object-oriented design, data structures, algorithms, profiling, and
optimization.
• Have elegant, readable, maintainable and extensible code style.
You are someone who would easily be able to
• Work closely with the US and India engineering teams to help build the Java/Scala based data
pipelines
• Lead the India engineering team in technical excellence and ownership of critical modules; own
the development of new modules and features
• Troubleshoot live production server issues.
• Handle client coordination and be able to work as a part of a team, be able to contribute
independently and drive the team to exceptional contributions with minimal team supervision
• Follow Agile methodology, JIRA for work planning, issue management/tracking
Additional Project/Soft Skills:
• Should be able to work independently with India & US based team members.
• Strong verbal and written communication with ability to articulate problems and solutions over phone and emails.
• Strong sense of urgency, with a passion for accuracy and timeliness.
• Ability to work calmly in high pressure situations and manage multiple projects/tasks.
• Ability to work independently and possess superior skills in issue resolution.
• Should have the passion to learn and implement, analyze and troubleshoot issues

Role Overview:
We are seeking a talented and experienced Data Architect with strong data visualization capabilities to join our dynamic team in Mumbai. As a Data Architect, you will be responsible for designing, building, and managing our data infrastructure, ensuring its reliability, scalability, and performance. You will also play a crucial role in transforming complex data into insightful visualizations that drive business decisions. This role requires a deep understanding of data modeling, database technologies (particularly Oracle Cloud), data warehousing principles, and proficiency in data manipulation and visualization tools, including Python and SQL.
Responsibilities:
- Design and implement robust and scalable data architectures, including data warehouses, data lakes, and operational data stores, primarily leveraging Oracle Cloud services.
- Develop and maintain data models (conceptual, logical, and physical) that align with business requirements and ensure data integrity and consistency.
- Define data governance policies and procedures to ensure data quality, security, and compliance.
- Collaborate with data engineers to build and optimize ETL/ELT pipelines for efficient data ingestion, transformation, and loading.
- Develop and execute data migration strategies to Oracle Cloud.
- Utilize strong SQL skills to query, manipulate, and analyze large datasets from various sources.
- Leverage Python and relevant libraries (e.g., Pandas, NumPy) for data cleaning, transformation, and analysis.
- Design and develop interactive and insightful data visualizations using tools like [Specify Visualization Tools - e.g., Tableau, Power BI, Matplotlib, Seaborn, Plotly] to communicate data-driven insights to both technical and non-technical stakeholders.
- Work closely with business analysts and stakeholders to understand their data needs and translate them into effective data models and visualizations.
- Ensure the performance and reliability of data visualization dashboards and reports.
- Stay up-to-date with the latest trends and technologies in data architecture, cloud computing (especially Oracle Cloud), and data visualization.
- Troubleshoot data-related issues and provide timely resolutions.
- Document data architectures, data flows, and data visualization solutions.
- Participate in the evaluation and selection of new data technologies and tools.
Qualifications:
- Bachelor's or Master's degree in Computer Science, Data Science, Information Systems, or a related field.
- Proven experience (typically 5+ years) as a Data Architect, Data Modeler, or similar role.
- Deep understanding of data warehousing concepts, dimensional modeling (e.g., star schema, snowflake schema), and ETL/ELT processes.
- Extensive experience working with relational databases, particularly Oracle, and proficiency in SQL.
- Hands-on experience with Oracle Cloud data services (e.g., Autonomous Data Warehouse, Object Storage, Data Integration).
- Strong programming skills in Python and experience with data manipulation and analysis libraries (e.g., Pandas, NumPy).
- Demonstrated ability to create compelling and effective data visualizations using industry-standard tools (e.g., Tableau, Power BI, Matplotlib, Seaborn, Plotly).
- Excellent analytical and problem-solving skills with the ability to interpret complex data and translate it into actionable insights.
- Strong communication and presentation skills, with the ability to effectively communicate technical concepts to non-technical audiences.
- Experience with data governance and data quality principles.
- Familiarity with agile development methodologies.
- Ability to work independently and collaboratively within a team environment.
Application Link- https://forms.gle/km7n2WipJhC2Lj2r5

The Opportunity
We’re looking for a Senior Data Engineer to join our growing Data Platform team. This role is a hybrid of data engineering and business intelligence, ideal for someone who enjoys solving complex data challenges while also building intuitive and actionable reporting solutions.
You’ll play a key role in designing and scaling the infrastructure and pipelines that power analytics, dashboards, machine learning, and decision-making across Sonatype. You’ll also be responsible for delivering clear, compelling, and insightful business intelligence through tools like Looker Studio and advanced SQL queries.
What You’ll Do
- Design, build, and maintain scalable data pipelines and ETL/ELT processes.
- Architect and optimize data models and storage solutions for analytics and operational use.
- Create and manage business intelligence reports and dashboards using tools like Looker Studio, Power BI, or similar.
- Collaborate with data scientists, analysts, and stakeholders to ensure datasets are reliable, meaningful, and actionable.
- Own and evolve parts of our data platform (e.g., Airflow, dbt, Spark, Redshift, or Snowflake).
- Write complex, high-performance SQL queries to support reporting and analytics needs.
- Implement observability, alerting, and data quality monitoring for critical pipelines.
- Drive best practices in data engineering and business intelligence, including documentation, testing, and CI/CD.
- Contribute to the evolution of our next-generation data lakehouse and BI architecture.
What We’re Looking For
Minimum Qualifications
- 5+ years of experience as a Data Engineer or in a hybrid data/reporting role.
- Strong programming skills in Python, Java, or Scala.
- Proficiency with data tools such as Databricks, data modeling techniques (e.g., star schema, dimensional modeling), and data warehousing solutions like Snowflake or Redshift.
- Hands-on experience with modern data platforms and orchestration tools (e.g., Spark, Kafka, Airflow).
- Proficient in SQL with experience in writing and optimizing complex queries for BI and analytics.
- Experience with BI tools such as Looker Studio, Power BI, or Tableau.
- Experience in building and maintaining robust ETL/ELT pipelines in production.
- Understanding of data quality, observability, and governance best practices.
Bonus Points
- Experience with dbt, Terraform, or Kubernetes.
- Familiarity with real-time data processing or streaming architectures.
- Understanding of data privacy, compliance, and security best practices in analytics and reporting.
Why You’ll Love Working Here
- Data with purpose: Work on problems that directly impact how the world builds secure software.
- Full-spectrum impact: Use both engineering and analytical skills to shape product, strategy, and operations.
- Modern tooling: Leverage the best of open-source and cloud-native technologies.
- Collaborative culture: Join a passionate team that values learning, autonomy, and real-world impact.

About the Role
We’re hiring a Data Engineer to join our Data Platform team. You’ll help build and scale the systems that power analytics, reporting, and data-driven features across the company. This role works with engineers, analysts, and product teams to make sure our data is accurate, available, and usable.
What You’ll Do
- Build and maintain reliable data pipelines and ETL/ELT workflows.
- Develop and optimize data models for analytics and internal tools.
- Work with team members to deliver clean, trusted datasets.
- Support core data platform tools like Airflow, dbt, Spark, Redshift, or Snowflake.
- Monitor data pipelines for quality, performance, and reliability.
- Write clear documentation and contribute to test coverage and CI/CD processes.
- Help shape our data lakehouse architecture and platform roadmap.
What You Need
- 2–4 years of experience in data engineering or a backend data-related role.
- Strong skills in Python or another backend programming language.
- Experience working with SQL and distributed data systems (e.g., Spark, Kafka).
- Familiarity with NoSQL stores like HBase or similar.
- Comfortable writing efficient queries and building data workflows.
- Understanding of data modeling for analytics and reporting.
- Exposure to tools like Airflow or other workflow schedulers.
Bonus Points
- Experience with DBT, Databricks, or real-time data pipelines.
- Familiarity with cloud infrastructure tools like Terraform or Kubernetes.
- Interest in data governance, ML pipelines, or compliance standards.
Why Join Us?
- Work on data that supports meaningful software security outcomes.
- Use modern tools in a cloud-first, open-source-friendly environment.
- Join a team that values clarity, learning, and autonomy.
If you're excited about building impactful software and helping others do the same, this is an opportunity to grow as a technical leader and make a meaningful impact.
Job Title : Data Engineer – GCP + Spark + DBT
Location : Bengaluru (On-site at Client Location | 3 Days WFO)
Experience : 8 to 12 Years
Level : Associate Architect
Type : Full-time
Job Overview :
We are looking for a seasoned Data Engineer to join the Data Platform Engineering team supporting a Unified Data Platform (UDP). This role requires hands-on expertise in DBT, GCP, BigQuery, and PySpark, with a solid foundation in CI/CD, data pipeline optimization, and agile delivery.
Mandatory Skills : GCP, DBT, Google Dataform, BigQuery, PySpark/Spark SQL, Advanced SQL, CI/CD, Git, Agile Methodologies.
Key Responsibilities :
- Design, build, and optimize scalable data pipelines using BigQuery, DBT, and PySpark.
- Leverage GCP-native services like Cloud Storage, Pub/Sub, Dataproc, Cloud Functions, and Composer for ETL/ELT workflows.
- Implement and maintain CI/CD for data engineering projects with Git-based version control.
- Collaborate with cross-functional teams including Infra, Security, and DataOps for reliable, secure, and high-quality data delivery.
- Lead code reviews, mentor junior engineers, and enforce best practices in data engineering.
- Participate in Agile sprints, backlog grooming, and Jira-based project tracking.
Must-Have Skills :
- Strong experience with DBT, Google Dataform, and BigQuery
- Hands-on expertise with PySpark/Spark SQL
- Proficient in GCP for data engineering workflows
- Solid knowledge of SQL optimization, Git, and CI/CD pipelines
- Agile team experience and strong problem-solving abilities
Nice-to-Have Skills :
- Familiarity with Databricks, Delta Lake, or Kafka
- Exposure to data observability and quality frameworks (e.g., Great Expectations, Soda)
- Knowledge of MDM patterns, Terraform, or IaC is a plus

🛠️ Key Responsibilities
- Design, build, and maintain scalable data pipelines using Python and Apache Spark (PySpark or Scala APIs)
- Develop and optimize ETL processes for batch and real-time data ingestion
- Collaborate with data scientists, analysts, and DevOps teams to support data-driven solutions
- Ensure data quality, integrity, and governance across all stages of the data lifecycle
- Implement data validation, monitoring, and alerting mechanisms for production pipelines
- Work with cloud platforms (AWS, GCP, or Azure) and tools like Airflow, Kafka, and Delta Lake
- Participate in code reviews, performance tuning, and documentation
🎓 Qualifications
- Bachelor’s or Master’s degree in Computer Science, Engineering, or related field
- 3–6 years of experience in data engineering with a focus on Python and Spark
- Experience with distributed computing and handling large-scale datasets (10TB+)
- Familiarity with data security, PII handling, and compliance standards is a plus

Senior Data Engineer Job Description
Overview
The Senior Data Engineer will design, develop, and maintain scalable data pipelines and
infrastructure to support data-driven decision-making and advanced analytics. This role requires deep
expertise in data engineering, strong problem-solving skills, and the ability to collaborate with
cross-functional teams to deliver robust data solutions.
Key Responsibilities
Data Pipeline Development: Design, build, and optimize scalable, secure, and reliable data
pipelines to ingest, process, and transform large volumes of structured and unstructured data.
Data Architecture: Architect and maintain data storage solutions, including data lakes, data
warehouses, and databases, ensuring performance, scalability, and cost-efficiency.
Data Integration: Integrate data from diverse sources, including APIs, third-party systems,
and streaming platforms, ensuring data quality and consistency.
Performance Optimization: Monitor and optimize data systems for performance, scalability,
and cost, implementing best practices for partitioning, indexing, and caching.
Collaboration: Work closely with data scientists, analysts, and software engineers to
understand data needs and deliver solutions that enable advanced analytics, machine
learning, and reporting.
Data Governance: Implement data governance policies, ensuring compliance with data
security, privacy regulations (e.g., GDPR, CCPA), and internal standards.
Automation: Develop automated processes for data ingestion, transformation, and validation
to improve efficiency and reduce manual intervention.
Mentorship: Guide and mentor junior data engineers, fostering a culture of technical
excellence and continuous learning.
Troubleshooting: Diagnose and resolve complex data-related issues, ensuring high
availability and reliability of data systems.
Required Qualifications
Education: Bachelor’s or Master’s degree in Computer Science, Engineering, Data Science,
or a related field.
Experience: 5+ years of experience in data engineering or a related role, with a proven track
record of building scalable data pipelines and infrastructure.
Technical Skills:
Proficiency in programming languages such as Python, Java, or Scala.
Expertise in SQL and experience with NoSQL databases (e.g., MongoDB, Cassandra).
Strong experience with cloud platforms (e.g., AWS, Azure, GCP) and their data services
(e.g., Redshift, BigQuery, Snowflake).
Hands-on experience with ETL/ELT tools (e.g., Apache Airflow, Talend, Informatica) and
data integration frameworks.
Familiarity with big data technologies (e.g., Hadoop, Spark, Kafka) and distributed
systems.
Knowledge of containerization and orchestration tools (e.g., Docker, Kubernetes) is a
plus.
Soft Skills:
Excellent problem-solving and analytical skills.
Strong communication and collaboration abilities.
Ability to work in a fast-paced, dynamic environment and manage multiple priorities.
Certifications (optional but preferred): Cloud certifications (e.g., AWS Certified Data Analytics,
Google Professional Data Engineer) or relevant data engineering certifications.
Preferred Qualifica
Experience with real-time data processing and streaming architectures.
Familiarity with machine learning pipelines and MLOps practices.
Knowledge of data visualization tools (e.g., Tableau, Power BI) and their integration with data
pipelines.
Experience in industries with high data complexity, such as finance, healthcare, or
e-commerce.
Work Environment
Location: Hybrid/Remote/On-site (depending on company policy).
Team: Collaborative, cross-functional team environment with data scientists, analysts, and
business stakeholders.
Hours: Full-time, with occasional on-call responsibilities for critical data systems.

Location: Mumbai
Job Type: Full-Time (Hybrid – 3 days in office, 2 days WFH)
Job Overview:
We are looking for a skilled Azure Data Engineer with strong experience in data modeling, pipeline development, and SQL/Spark expertise. The ideal candidate will work closely with the Data Analytics & BI teams to implement robust data solutions on Azure Synapse and ensure seamless data integration with third-party applications.
Key Responsibilities:
- Design, develop, and maintain Azure data pipelines using Azure Synapse (SQL dedicated pools or Apache Spark pools).
- Implement data models in collaboration with the Data Analytics and BI teams.
- Optimize and manage large-scale SQL and Spark-based data processing solutions.
- Ensure data availability and reliability for third-party application consumption.
- Collaborate with cross-functional teams to translate business requirements into scalable data solutions.
Required Skills & Experience:
3–5 years of hands-on experience in:
- Azure data services
- Data Modeling
- SQL development and tuning
- Apache Spark
- Strong knowledge of Azure Synapse Analytics.
- Experience in designing data pipelines and ETL/ELT processes.
- Ability to troubleshoot and optimize complex data workflows.
Preferred Qualifications:
- Experience with data governance, security, and data quality practices.
- Familiarity with DevOps practices in a data engineering context.
- Effective communication skills and the ability to work in a collaborative team environment.
Company name: PulseData labs Pvt Ltd (captive Unit for URUS, USA)
About URUS
We are the URUS family (US), a global leader in products and services for Agritech.
SENIOR DATA ENGINEER
This role is responsible for the design, development, and maintenance of data integration and reporting solutions. The ideal candidate will possess expertise in Databricks and strong skills in SQL Server, SSIS and SSRS, and experience with other modern data engineering tools such as Azure Data Factory. This position requires a proactive and results-oriented individual with a passion for data and a strong understanding of data warehousing principles.
Responsibilities
Data Integration
- Design, develop, and maintain robust and efficient ETL pipelines and processes on Databricks.
- Troubleshoot and resolve Databricks pipeline errors and performance issues.
- Maintain legacy SSIS packages for ETL processes.
- Troubleshoot and resolve SSIS package errors and performance issues.
- Optimize data flow performance and minimize data latency.
- Implement data quality checks and validations within ETL processes.
Databricks Development
- Develop and maintain Databricks pipelines and datasets using Python, Spark and SQL.
- Migrate legacy SSIS packages to Databricks pipelines.
- Optimize Databricks jobs for performance and cost-effectiveness.
- Integrate Databricks with other data sources and systems.
- Participate in the design and implementation of data lake architectures.
Data Warehousing
- Participate in the design and implementation of data warehousing solutions.
- Support data quality initiatives and implement data cleansing procedures.
Reporting and Analytics
- Collaborate with business users to understand data requirements for department driven reporting needs.
- Maintain existing library of complex SSRS reports, dashboards, and visualizations.
- Troubleshoot and resolve SSRS report issues, including performance bottlenecks and data inconsistencies.
Collaboration and Communication
- Comfortable in entrepreneurial, self-starting, and fast-paced environment, working both independently and with our highly skilled teams.
- Collaborate effectively with business users, data analysts, and other IT teams.
- Communicate technical information clearly and concisely, both verbally and in writing.
- Document all development work and procedures thoroughly.
Continuous Growth
- Keep abreast of the latest advancements in data integration, reporting, and data engineering technologies.
- Continuously improve skills and knowledge through training and self-learning.
This job description reflects managements assignment of essential functions; it does not prescribe or restrict the tasks that may be assigned.
Requirements
- Bachelor's degree in computer science, Information Systems, or a related field.
- 7+ years of experience in data integration and reporting.
- Extensive experience with Databricks, including Python, Spark, and Delta Lake.
- Strong proficiency in SQL Server, including T-SQL, stored procedures, and functions.
- Experience with SSIS (SQL Server Integration Services) development and maintenance.
- Experience with SSRS (SQL Server Reporting Services) report design and development.
- Experience with data warehousing concepts and best practices.
- Experience with Microsoft Azure cloud platform and Microsoft Fabric desirable.
- Strong analytical and problem-solving skills.
- Excellent communication and interpersonal skills.
- Ability to work independently and as part of a team.
- Experience with Agile methodologies.
What you’ll do
- Tame data → pull, clean, and shape structured & unstructured data.
- Orchestrate pipelines → Airflow / Step Functions / ADF… your call.
- Ship models → build, tune, and push to prod on SageMaker, Azure ML, or Vertex AI.
- Scale → Spark / Databricks for the heavy lifting.
- Automate everything → Docker, Kubernetes, CI/CD, MLFlow, Seldon, Kubeflow.
- Pair up → work with engineers, architects, and business folks to solve real problems, fast.
What you bring
- 3+ yrs hands-on MLOps (4-5 yrs total software experience).
- Proven chops on one hyperscaler (AWS, Azure, or GCP).
- Confidence with Databricks / Spark, Python, SQL, TensorFlow / PyTorch / Scikit-learn.
- You debug Kubernetes in your sleep and treat Dockerfiles like breathing.
- You prototype with open-source first, choose the right tool, then make it scale.
- Sharp mind, low ego, bias for action.
Nice-to-haves
- Sagemaker, Azure ML, or Vertex AI in production.
- Love for clean code, clear docs, and crisp PRs.

A leader in telecom, fintech, AI-led marketing automation.

We are looking for a talented MERN Developer with expertise in MongoDB/MySQL, Kubernetes, Python, ETL, Hadoop, and Spark. The ideal candidate will design, develop, and optimize scalable applications while ensuring efficient source code management and implementing Non-Functional Requirements (NFRs).
Key Responsibilities:
- Develop and maintain robust applications using MERN Stack (MongoDB, Express.js, React.js, Node.js).
- Design efficient database architectures (MongoDB/MySQL) for scalable data handling.
- Implement and manage Kubernetes-based deployment strategies for containerized applications.
- Ensure compliance with Non-Functional Requirements (NFRs), including source code management, development tools, and security best practices.
- Develop and integrate Python-based functionalities for data processing and automation.
- Work with ETL pipelines for smooth data transformations.
- Leverage Hadoop and Spark for processing and optimizing large-scale data operations.
- Collaborate with solution architects, DevOps teams, and data engineers to enhance system performance.
- Conduct code reviews, troubleshooting, and performance optimization to ensure seamless application functionality.
Required Skills & Qualifications:
- Proficiency in MERN Stack (MongoDB, Express.js, React.js, Node.js).
- Strong understanding of database technologies (MongoDB/MySQL).
- Experience working with Kubernetes for container orchestration.
- Hands-on knowledge of Non-Functional Requirements (NFRs) in application development.
- Expertise in Python, ETL pipelines, and big data technologies (Hadoop, Spark).
- Strong problem-solving and debugging skills.
- Knowledge of microservices architecture and cloud computing frameworks.
Preferred Qualifications:
- Certifications in cloud computing, Kubernetes, or database management.
- Experience in DevOps, CI/CD automation, and infrastructure management.
- Understanding of security best practices in application development.


What You’ll Be Doing:
● Design and build parts of our data pipeline architecture for extraction, transformation, and loading of data from a wide variety of data sources using the latest Big Data technologies.
● Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
● Work with stakeholders including the Executive, Product, Data and Design teams to assist with data-related technical issues and support their data infrastructure needs.
● Work with machine learning, data, and analytics experts to drive innovation, accuracy and greater functionality in our data system. Qualifications:
● Bachelor's degree in Engineering, Computer Science, or relevant field.
● 10+ years of relevant and recent experience in a Data Engineer role. ● 5+ years recent experience with Apache Spark and solid understanding of the fundamentals.
● Deep understanding of Big Data concepts and distributed systems.
● Strong coding skills with Scala, Python, Java and/or other languages and the ability to quickly switch between them with ease.
● Advanced working SQL knowledge and experience working with a variety of relational databases such as Postgres and/or MySQL.
● Cloud Experience with DataBricks
● Experience working with data stored in many formats including Delta Tables, Parquet, CSV and JSON.
● Comfortable working in a linux shell environment and writing scripts as needed.
● Comfortable working in an Agile environment
● Machine Learning knowledge is a plus.
● Must be capable of working independently and delivering stable, efficient and reliable software.
● Excellent written and verbal communication skills in English.
● Experience supporting and working with cross-functional teams in a dynamic environment
EMPLOYMENT TYPE: Full-Time, Permanent
LOCATION: Remote (Pan India)
SHIFT TIMINGS: 2.00 pm-11:00pm IST

About the Role:
We are seeking a talented Lead Data Engineer to join our team and play a pivotal role in transforming raw data into valuable insights. As a Data Engineer, you will design, develop, and maintain robust data pipelines and infrastructure to support our organization's analytics and decision-making processes.
Responsibilities:
- Data Pipeline Development: Build and maintain scalable data pipelines to extract, transform, and load (ETL) data from various sources (e.g., databases, APIs, files) into data warehouses or data lakes.
- Data Infrastructure: Design, implement, and manage data infrastructure components, including data warehouses, data lakes, and data marts.
- Data Quality: Ensure data quality by implementing data validation, cleansing, and standardization processes.
- Team Management: Able to handle team.
- Performance Optimization: Optimize data pipelines and infrastructure for performance and efficiency.
- Collaboration: Collaborate with data analysts, scientists, and business stakeholders to understand their data needs and translate them into technical requirements.
- Tool and Technology Selection: Evaluate and select appropriate data engineering tools and technologies (e.g., SQL, Python, Spark, Hadoop, cloud platforms).
- Documentation: Create and maintain clear and comprehensive documentation for data pipelines, infrastructure, and processes.
Skills:
- Strong proficiency in SQL and at least one programming language (e.g., Python, Java).
- Experience with data warehousing and data lake technologies (e.g., Snowflake, AWS Redshift, Databricks).
- Knowledge of cloud platforms (e.g., AWS, GCP, Azure) and cloud-based data services.
- Understanding of data modeling and data architecture concepts.
- Experience with ETL/ELT tools and frameworks.
- Excellent problem-solving and analytical skills.
- Ability to work independently and as part of a team.
Preferred Qualifications:
- Experience with real-time data processing and streaming technologies (e.g., Kafka, Flink).
- Knowledge of machine learning and artificial intelligence concepts.
- Experience with data visualization tools (e.g., Tableau, Power BI).
- Certification in cloud platforms or data engineering.

What You’ll Do:
As a Data Scientist, you will work closely across DeepIntent Analytics teams located in New York City, India, and Bosnia. The role will support internal and external business partners in defining patient and provider audiences, and generating analyses and insights related to measurement of campaign outcomes, Rx, patient journey, and supporting evolution of DeepIntent product suite. Activities in this position include creating and scoring audiences, reading campaign results, analyzing medical claims, clinical, demographic and clickstream data, performing analysis and creating actionable insights, summarizing, and presenting results and recommended actions to internal stakeholders and external clients, as needed.
- Explore ways to to create better audiences
- Analyze medical claims, clinical, demographic and clickstream data to produce and present actionable insights
- Explore ways of using inference, statistical, machine learning techniques to improve the performance of existing algorithms and decision heuristics
- Design and deploy new iterations of production-level code
- Contribute posts to our upcoming technical blog
Who You Are:
- Bachelor’s degree in a STEM field, such as Statistics, Mathematics, Engineering, Biostatistics, Econometrics, Economics, Finance, OR, or Data Science. Graduate degree is strongly preferred
- 3+ years of working experience as Data Analyst, Data Engineer, Data Scientist in digital marketing, consumer advertisement, telecom, or other areas requiring customer level predictive analytics
- Background in either data engineering or analytics
- Hands on technical experience is required, proficiency in performing statistical analysis in Python, including relevant libraries, required
- You have an advanced understanding of the ad-tech ecosystem, digital marketing and advertising data and campaigns or familiarity with the US healthcare patient and provider systems (e.g. medical claims, medications)
- Experience in programmatic, DSP related, marketing predictive analytics, audience segmentation or audience behaviour analysis or medical / healthcare experience
- You have varied and hands-on predictive machine learning experience (deep learning, boosting algorithms, inference)
- Familiarity with data science tools such as, Xgboost, pytorch, Jupyter and strong LLM user experience (developer/API experience is a plus)
- You are interested in translating complex quantitative results into meaningful findings and interpretable deliverables, and communicating with less technical audiences orally and in writing


Job Description:
Interviews will be scheduled in two days.
We are seeking a highly skilled Scala Developer to join our team on an immediate basis. The ideal candidate will work remotely and collaborate with a US-based client, so excellent communication skills are essential.
Key Responsibilities:
Develop scalable and high-performance applications using Scala.
Collaborate with cross-functional teams to understand requirements and deliver quality solutions.
Write clean, maintainable, and testable code.
Optimize application performance and troubleshoot issues.
Participate in code reviews and ensure adherence to best practices.
Required Skills:
Strong experience in Scala development.
Solid understanding of functional programming principles.
Experience with frameworks like Akka, Play, or Spark is a plus.
Good knowledge of REST APIs, microservices architecture, and concurrency.
Familiarity with CI/CD, Git, and Agile methodologies.
Roles & Responsibilities
- Develop and maintain scalable backend services using Scala.
- Design and integrate RESTful APIs and microservices.
- Collaborate with cross-functional teams to deliver technical solutions.
- Write clean, efficient, and testable code.
- Participate in code reviews and ensure code quality.
- Troubleshoot issues and optimize performance.
- Stay updated on Scala and backend development best practices.
Immediate joiner prefer.

We are looking for a Senior Data Engineer with strong expertise in GCP, Databricks, and Airflow to design and implement a GCP Cloud Native Data Processing Framework. The ideal candidate will work on building scalable data pipelines and help migrate existing workloads to a modern framework.
- Shift: 2 PM 11 PM
- Work Mode: Hybrid (3 days a week) across Xebia locations
- Notice Period: Immediate joiners or those with a notice period of up to 30 days
Key Responsibilities:
- Design and implement a GCP Native Data Processing Framework leveraging Spark and GCP Cloud Services.
- Develop and maintain data pipelines using Databricks and Airflow for transforming Raw → Silver → Gold data layers.
- Ensure data integrity, consistency, and availability across all systems.
- Collaborate with data engineers, analysts, and stakeholders to optimize performance.
- Document standards and best practices for data engineering workflows.
Required Experience:
- 7-8 years of experience in data engineering, architecture, and pipeline development.
- Strong knowledge of GCP, Databricks, PySpark, and BigQuery.
- Experience with Orchestration tools like Airflow, Dagster, or GCP equivalents.
- Understanding of Data Lake table formats (Delta, Iceberg, etc.).
- Proficiency in Python for scripting and automation.
- Strong problem-solving skills and collaborative mindset.
⚠️ Please apply only if you have not applied recently or are not currently in the interview process for any open roles at Xebia.
Looking forward to your response!
Best regards,
Vijay S
Assistant Manager - TAG


Level of skills and experience:
5 years of hands-on experience in using Python, Spark,Sql.
Experienced in AWS Cloud usage and management.
Experience with Databricks (Lakehouse, ML, Unity Catalog, MLflow).
Experience using various ML models and frameworks such as XGBoost, Lightgbm, Torch.
Experience with orchestrators such as Airflow and Kubeflow.
Familiarity with containerization and orchestration technologies (e.g., Docker, Kubernetes).
Fundamental understanding of Parquet, Delta Lake and other data file formats.
Proficiency on an IaC tool such as Terraform, CDK or CloudFormation.
Strong written and verbal English communication skill and proficient in communication with non-technical stakeholderst
Job Title: Big Data Engineer (Java Spark Developer – JAVA SPARK EXP IS MUST)
Location: Chennai, Hyderabad, Pune, Bangalore (Bengaluru) / NCR Delhi
Client: Premium Tier 1 Company
Payroll: Direct Client
Employment Type: Full time / Perm
Experience: 7+ years
Job Description:
We are looking for a skilled Big Data Engineers using Java Spark with 7+ years of experience in Big Data / legacy platforms, who can join immediately. Desired candidate should have design, development and optimization of real-time & batch data pipelines experience in Big Data environment at an enterprise scale applications. You will work on building scalable and high-performance data processing solutions, integrating real-time data streams, and building a reliable Data platforms. Strong troubleshooting, performance tuning, and collaboration skills are key for this role.
Key Responsibilities:
· Develop data pipelines using Java Spark and Kafka.
· Optimize and maintain real-time data pipelines and messaging systems.
· Collaborate with cross-functional teams to deliver scalable data solutions.
· Troubleshoot and resolve issues in Java Spark and Kafka applications.
Qualifications:
· Experience in Java Spark is must
· Knowledge and hands-on experience using distributed computing, real-time data streaming, and big data technologies
· Strong problem-solving and performance optimization skills
· Looking for immediate joiners
If interested, please share your resume along with the following details
1) Notice Period
2) Current CTC
3) Expected CTC
4) Have Experience in Java Spark - Y / N (this is must)
5) Any offers in hand
Thanks & Regards,
LION & ELEPHANTS CONSULTANCY PVT LTD TEAM
SINGAPORE | INDIA

The Sr AWS/Azure/GCP Databricks Data Engineer at Koantek will use comprehensive
modern data engineering techniques and methods with Advanced Analytics to support
business decisions for our clients. Your goal is to support the use of data-driven insights
to help our clients achieve business outcomes and objectives. You can collect, aggregate, and analyze structured/unstructured data from multiple internal and external sources and
patterns, insights, and trends to decision-makers. You will help design and build data
pipelines, data streams, reporting tools, information dashboards, data service APIs, data
generators, and other end-user information portals and insight tools. You will be a critical
part of the data supply chain, ensuring that stakeholders can access and manipulate data
for routine and ad hoc analysis to drive business outcomes using Advanced Analytics. You are expected to function as a productive member of a team, working and
communicating proactively with engineering peers, technical lead, project managers, product owners, and resource managers. Requirements:
Strong experience as an AWS/Azure/GCP Data Engineer and must have
AWS/Azure/GCP Databricks experience. Expert proficiency in Spark Scala, Python, and spark
Must have data migration experience from on-prem to cloud
Hands-on experience in Kinesis to process & analyze Stream Data, Event/IoT Hubs, and Cosmos
In depth understanding of Azure/AWS/GCP cloud and Data lake and Analytics
solutions on Azure. Expert level hands-on development Design and Develop applications on Databricks. Extensive hands-on experience implementing data migration and data processing
using AWS/Azure/GCP services
In depth understanding of Spark Architecture including Spark Streaming, Spark Core, Spark SQL, Data Frames, RDD caching, Spark MLib
Hands-on experience with the Technology stack available in the industry for data
management, data ingestion, capture, processing, and curation: Kafka, StreamSets, Attunity, GoldenGate, Map Reduce, Hadoop, Hive, Hbase, Cassandra, Spark, Flume, Hive, Impala, etc
Hands-on knowledge of data frameworks, data lakes and open-source projects such
asApache Spark, MLflow, and Delta Lake
Good working knowledge of code versioning tools [such as Git, Bitbucket or SVN]
Hands-on experience in using Spark SQL with various data sources like JSON, Parquet and Key Value Pair
Experience preparing data for Data Science and Machine Learning with exposure to- model selection, model lifecycle, hyperparameter tuning, model serving, deep
learning, etc
Demonstrated experience preparing data, automating and building data pipelines for
AI Use Cases (text, voice, image, IoT data etc. ). Good to have programming language experience with. NET or Spark/Scala
Experience in creating tables, partitioning, bucketing, loading and aggregating data
using Spark Scala, Spark SQL/PySpark
Knowledge of AWS/Azure/GCP DevOps processes like CI/CD as well as Agile tools
and processes including Git, Jenkins, Jira, and Confluence
Working experience with Visual Studio, PowerShell Scripting, and ARM templates. Able to build ingestion to ADLS and enable BI layer for Analytics
Strong understanding of Data Modeling and defining conceptual logical and physical
data models. Big Data/analytics/information analysis/database management in the cloud
IoT/event-driven/microservices in the cloud- Experience with private and public cloud
architectures, pros/cons, and migration considerations. Ability to remain up to date with industry standards and technological advancements
that will enhance data quality and reliability to advance strategic initiatives
Working knowledge of RESTful APIs, OAuth2 authorization framework and security
best practices for API Gateways
Guide customers in transforming big data projects, including development and
deployment of big data and AI applications
Guide customers on Data engineering best practices, provide proof of concept, architect solutions and collaborate when needed
2+ years of hands-on experience designing and implementing multi-tenant solutions
using AWS/Azure/GCP Databricks for data governance, data pipelines for near real-
time data warehouse, and machine learning solutions. Over all 5+ years' experience in a software development, data engineering, or data
analytics field using Python, PySpark, Scala, Spark, Java, or equivalent technologies. hands-on expertise in Apache SparkTM (Scala or Python)
3+ years of experience working in query tuning, performance tuning, troubleshooting, and debugging Spark and other big data solutions. Bachelor's or Master's degree in Big Data, Computer Science, Engineering, Mathematics, or similar area of study or equivalent work experience
Ability to manage competing priorities in a fast-paced environment
Ability to resolve issues
Basic experience with or knowledge of agile methodologies
AWS Certified: Solutions Architect Professional
Databricks Certified Associate Developer for Apache Spark
Microsoft Certified: Azure Data Engineer Associate
GCP Certified: Professional Google Cloud Certified

We are seeking a highly skilled and experienced Offshore Data Engineer . The role involves designing, implementing, and testing data pipelines and products.
Qualifications & Experience:
bachelor's or master's degree in computer science, Information Systems, or a related field.
5+ years of experience in data engineering, with expertise in data architecture and pipeline development.
☁️ Proven experience with GCP, Big Query, Databricks, Airflow, Spark, DBT, and GCP Services.
️ Hands-on experience with ETL processes, SQL, PostgreSQL, MySQL, MongoDB, Cassandra.
Strong proficiency in Python and data modelling.
Experience in testing and validation of data pipelines.
Preferred: Experience with eCommerce systems, data visualization tools (Tableau, Looker), and cloud certifications.
If you meet the above criteria and are interested, please share your updated CV along with the following details:
Total Experience:
Current CTC:
Expected CTC:
Current Location:
Preferred Location:
Notice Period / Last Working Day (if serving notice):
⚠️ Kindly share your details only if you have not applied recently or are not currently in the interview process for any open roles at Xebia.
Looking forward to your response!
Job Title : Senior AWS Data Engineer
Experience : 5+ Years
Location : Gurugram
Employment Type : Full-Time
Job Summary :
Seeking a Senior AWS Data Engineer with expertise in AWS to design, build, and optimize scalable data pipelines and data architectures. The ideal candidate will have experience in ETL/ELT, data warehousing, and big data technologies.
Key Responsibilities :
- Build and optimize data pipelines using AWS (Glue, EMR, Redshift, S3, etc.).
- Maintain data lakes & warehouses for analytics.
- Ensure data integrity through quality checks.
- Collaborate with data scientists & engineers to deliver solutions.
Qualifications :
- 7+ Years in Data Engineering.
- Expertise in AWS services, SQL, Python, Spark, Kafka.
- Experience with CI/CD, DevOps practices.
- Strong problem-solving skills.
Preferred Skills :
- Experience with Snowflake, Databricks.
- Knowledge of BI tools (Tableau, Power BI).
- Healthcare/Insurance domain experience is a plus.
Position : Software Engineer (Java Backend Engineer)
Experience : 4+ Years
📍 Location : Bangalore, India (Hybrid)
Mandatory Skills : Java 8+ (Advanced Features), Spring Boot, Apache Spark (Spark Streaming), SQL & Cosmos DB, Git, Maven, CI/CD (Jenkins, GitHub), Azure Cloud, Agile Scrum.
About the Role :
We are seeking a highly skilled Backend Engineer with expertise in Java, Spark, and microservices architecture to join our dynamic team. The ideal candidate will have a strong background in object-oriented programming, experience with Spark Streaming, and a deep understanding of distributed systems and cloud technologies.
Key Responsibilities :
- Design, develop, and maintain highly scalable microservices and optimized RESTful APIs using Spring Boot and Java 8+.
- Implement and optimize Spark Streaming applications for real-time data processing.
- Utilize advanced Java 8 features, including:
- Functional interfaces & Lambda expressions
- Streams and Parallel Streams
- Completable Futures & Concurrency API improvements
- Enhanced Collections APIs
- Work with relational (SQL) and NoSQL (Cosmos DB) databases, ensuring efficient data modeling and retrieval.
- Develop and manage CI/CD pipelines using Jenkins, GitHub, and related automation tools.
- Collaborate with cross-functional teams, including Product, Business, and Automation, to deliver end-to-end product features.
- Ensure adherence to Agile Scrum practices and participate in code reviews to maintain high-quality standards.
- Deploy and manage applications in Azure Cloud environments.
Minimum Qualifications:
- BS/MS in Computer Science or a related field.
- 4+ Years of experience developing backend applications with Spring Boot and Java 8+.
- 3+ Years of hands-on experience with Git for version control.
- Strong understanding of software design patterns and distributed computing principles.
- Experience with Maven for building and deploying artifacts.
- Proven ability to work in Agile Scrum environments with a collaborative team mindset.
- Prior experience with Azure Cloud Technologies.
Job Title : Tech Lead - Data Engineering (AWS, 7+ Years)
Location : Gurugram
Employment Type : Full-Time
Job Summary :
Seeking a Tech Lead - Data Engineering with expertise in AWS to design, build, and optimize scalable data pipelines and data architectures. The ideal candidate will have experience in ETL/ELT, data warehousing, and big data technologies.
Key Responsibilities :
- Build and optimize data pipelines using AWS (Glue, EMR, Redshift, S3, etc.).
- Maintain data lakes & warehouses for analytics.
- Ensure data integrity through quality checks.
- Collaborate with data scientists & engineers to deliver solutions.
Qualifications :
- 7+ Years in Data Engineering.
- Expertise in AWS services, SQL, Python, Spark, Kafka.
- Experience with CI/CD, DevOps practices.
- Strong problem-solving skills.
Preferred Skills :
- Experience with Snowflake, Databricks.
- Knowledge of BI tools (Tableau, Power BI).
- Healthcare/Insurance domain experience is a plus.
Dear Candidate,
We are Urgently hiring QA Automation Engineers and Test leads At Hyderabad and Bangalore
Exp: 6-10 yrs
Locations: Hyderabad ,Bangalore
JD:
we are Hiring Automation Testers with 6-10 years of Automation testing experience using QA automation tools like Java, UFT, Selenium, API Testing, ETL & others
Must Haves:
· Experience in Financial Domain is a must
· Extensive Hands-on experience in Design, implement and maintain automation framework using Java, UFT, ETL, Selenium tools and automation concepts.
· Experience with AWS concept and framework design/ testing.
· Experience in Data Analysis, Data Validation, Data Cleansing, Data Verification and identifying data mismatch.
· Experience with Databricks, Python, Spark, Hive, Airflow, etc.
· Experience in validating and analyzing kubernetics log files.
· API testing experience
· Backend testing skills with ability to write SQL queries in Databricks and in Oracle databases
· Experience in working with globally distributed Agile project teams
· Ability to work in a fast-paced, globally structured and team-based environment, as well as independently
· Experience in test management tools like Jira
· Good written and verbal communication skills
Good To have:
- Business and finance knowledge desirable
Best Regards,
Minakshi Soni
Executive - Talent Acquisition (L2)
Worldwide Locations: USA | HK | IN
Responsibilities:
· Analyze complex data sets to answer specific questions using MMIT’s market access data (MMIT) and Norstella claims data, third-party claims data (IQVIA LAAD, Symphony SHA). Applicant must have experience working with the aforementioned data sets exclusively.
· Deliver consultative services to clients related to MMIT RWD sets
· Produce complex analytical reports using data visualization tools such as Power BI or Tableau
· Define customized technical specifications to surface MMIT RWD in MMIT tools.
· Execute work in a timely fashion with high accuracy, while managing various competing priorities; Perform thorough troubleshooting and execute QA; Communicate with internal teams to obtain required data
· Ensure adherence to documentation requirements, process workflows, timelines, and escalation protocols
· And other duties as assigned.
Requirements:
· Bachelor’s Degree or relevant experience required
· 2-5 yrs. of professional experience in RWD analytics using SQL
· Fundamental understanding of Pharma and Market access space
· Strong analysis skills and proficiency with tools such as Tableau or PowerBI
· Excellent written and verbal communication skills.
· Analytical, critical thinking and creative problem-solving skills.
· Relationship building skills.
· Solid organizational skills including attention to detail and multitasking skills.
· Excellent time management and prioritization skills.

Role Objective:
Big Data Engineer will be responsible for expanding and optimizing our data and database architecture, as well as optimizing data flow and collection for cross functional teams. The ideal candidate is an experienced data pipeline builder and data wrangler who enjoys optimizing data systems and building. The Data Engineer will support our software developers, database architects, data analysts and data scientists on data initiatives and will ensure optimal data delivery architecture is consistent throughout ongoing projects. They must be self-directed and comfortable supporting the data needs of multiple teams, systems, and products
Roles & Responsibilities:
- Sound knowledge in Spark architecture and distributed computing and Spark streaming.
- Proficient in Spark – including RDD and Data frames core functions, troubleshooting and performance tuning.
- SFDC(Data modelling experience) would be given preference
- Good understanding in object-oriented concepts and hands on experience on Scala with excellent programming logic and technique.
- Good in functional programming and OOPS concept on Scala
- Good experience in SQL – should be able to write complex queries.
- Managing the team of Associates and Senior Associates and ensuring the utilization is maintained across the project.
- Able to mentor new members for onboarding to the project.
- Understand the client requirement and able to design, develop from scratch and deliver.
- AWS cloud experience would be preferable.
- Design, build and operationalize large scale enterprise data solutions and applications using one or more of AWS data and analytics services - DynamoDB, RedShift, Kinesis, Lambda, S3, etc. (preferred)
- Hands on experience utilizing AWS Management Tools (CloudWatch, CloudTrail) to proactively monitor large and complex deployments (preferred)
- Experience in analyzing, re-architecting, and re-platforming on-premises data warehouses to data platforms on AWS (preferred)
- Leading the client calls to flag off any delays, blockers, escalations and collate all the requirements.
- Managing project timing, client expectations and meeting deadlines.
- Should have played project and team management roles.
- Facilitate meetings within the team on regular basis.
- Understand business requirement and analyze different approaches and plan deliverables and milestones for the project.
- Optimization, maintenance, and support of pipelines.
- Strong analytical and logical skills.
- Ability to comfortably tackling new challenges and learn

Role Objective:
Big Data Engineer will be responsible for expanding and optimizing our data and database architecture, as well as optimizing data flow and collection for cross functional teams. The ideal candidate is an experienced data pipeline builder and data wrangler who enjoys optimizing data systems and building. The Data Engineer will support our software developers, database architects, data analysts and data scientists on data initiatives and will ensure optimal data delivery architecture is consistent throughout ongoing projects. They must be self-directed and comfortable supporting the data needs of multiple teams, systems, and products
Roles & Responsibilities:
- Sound knowledge in Spark architecture and distributed computing and Spark streaming.
- Proficient in Spark – including RDD and Data frames core functions, troubleshooting and performance tuning.
- Good understanding in object-oriented concepts and hands on experience on Scala with excellent programming logic and technique.
- Good in functional programming and OOPS concept on Scala
- Good experience in SQL – should be able to write complex queries.
- Managing the team of Associates and Senior Associates and ensuring the utilization is maintained across the project.
- Able to mentor new members for onboarding to the project.
- Understand the client requirement and able to design, develop from scratch and deliver.
- AWS cloud experience would be preferable.
- Design, build and operationalize large scale enterprise data solutions and applications using one or more of AWS data and analytics services - DynamoDB, RedShift, Kinesis, Lambda, S3, etc. (preferred)
- Hands on experience utilizing AWS Management Tools (CloudWatch, CloudTrail) to proactively monitor large and complex deployments (preferred)
- Experience in analyzing, re-architecting, and re-platforming on-premises data warehouses to data platforms on AWS (preferred)
- Leading the client calls to flag off any delays, blockers, escalations and collate all the requirements.
- Managing project timing, client expectations and meeting deadlines.
- Should have played project and team management roles.
- Facilitate meetings within the team on regular basis.
- Understand business requirement and analyze different approaches and plan deliverables and milestones for the project.
- Optimization, maintenance, and support of pipelines.
- Strong analytical and logical skills.
- Ability to comfortably tackling new challenges and learn
External Skills And Expertise
Must have Skills:
- Scala
- Spark
- SQL (Intermediate to advanced level)
- Spark Streaming
- AWS preferable/Any cloud
- Kafka /Kinesis/Any streaming services
- Object-Oriented Programming
- Hive, ETL/ELT design experience
- CICD experience (ETL pipeline deployment)
Good to Have Skills:
- AWS Certification
- Git/similar version control tool
- Knowledge in CI/CD, Microservices
Secondary Skills: Streaming, Archiving , AWS / AZURE / CLOUD
Role:
· Should have strong programming and support experience in Java, J2EE technologies
· Should have good experience in Core Java, JSP, Sevlets, JDBC
· Good exposure in Hadoop development ( HDFS, Map Reduce, Hive, HBase, Spark)
· Should have 2+ years of Java experience and 1+ years of experience in Hadoop
· Should possess good communication skills
software development and automated testing Proficient in Big Data technologies Designs, codes, tests, corrects and documents large and/or complex programs and program modifications from supplied specifications using agreed standards and tools, to achieve a well engineered result Proficient and Hands-on Data Warehousing,
Experience with Agile development, Continuous Integration, and Continuous Delivery Ability to effectively interpret technical and business objectives and provide solutions Strong communication skills, with ability to articulate technical solutions effectively across diverse group of stakeholders
Need to be a fast learner willing to adapt to evolving needs of the developer community.
Thanks & Regards
snehalata verma
IT Recruiter --HrBizHub

- Responsible for designing, storing, processing, and maintaining of large-scale data and related infrastructure.
- Can drive multiple projects both from operational and technical standpoint.
- Ideate and build PoV or PoC for new product that can help drive more business.
- Responsible for defining, designing, and implementing data engineering best practices, strategies, and solutions.
- Is an Architect who can guide the customers, team, and overall organization on tools, technologies, and best practices around data engineering.
- Lead architecture discussions, align with business needs, security, and best practices.
- Has strong conceptual understanding of Data Warehousing and ETL, Data Governance and Security, Cloud Computing, and Batch & Real Time data processing
- Has strong execution knowledge of Data Modeling, Databases in general (SQL and NoSQL), software development lifecycle and practices, unit testing, functional programming, etc.
- Understanding of Medallion architecture pattern
- Has worked on at least one cloud platform.
- Has worked as data architect and executed multiple end-end data engineering project.
- Has extensive knowledge of different data architecture designs and data modelling concepts.
- Manages conversation with the client stakeholders to understand the requirement and translate it into technical outcomes.
Required Tech Stack
- Strong proficiency in SQL
- Experience working on any of the three major cloud platforms i.e., AWS/Azure/GCP
- Working knowledge of an ETL and/or orchestration tools like IICS, Talend, Matillion, Airflow, Azure Data Factory, AWS Glue, GCP Composer, etc.
- Working knowledge of one or more OLTP databases (Postgres, MySQL, SQL Server, etc.)
- Working knowledge of one or more Data Warehouse like Snowflake, Redshift, Azure Synapse, Hive, Big Query, etc.
- Proficient in at least one programming language used in data engineering, such as Python (or Scala/Rust/Java)
- Has strong execution knowledge of Data Modeling (star schema, snowflake schema, fact vs dimension tables)
- Proficient in Spark and related applications like Databricks, GCP DataProc, AWS Glue, EMR, etc.
- Has worked on Kafka and real-time streaming.
- Has strong execution knowledge of data architecture design patterns (lambda vs kappa architecture, data harmonization, customer data platforms, etc.)
- Has worked on code and SQL query optimization.
- Strong knowledge of version control systems like Git to manage source code repositories and designing CI/CD pipelines for continuous delivery.
- Has worked on data and networking security (RBAC, secret management, key vaults, vnets, subnets, certificates)
The Sr. Analytics Engineer would provide technical expertise in needs identification, data modeling, data movement, and transformation mapping (source to target), automation and testing strategies, translating business needs into technical solutions with adherence to established data guidelines and approaches from a business unit or project perspective.
Understands and leverages best-fit technologies (e.g., traditional star schema structures, cloud, Hadoop, NoSQL, etc.) and approaches to address business and environmental challenges.
Provides data understanding and coordinates data-related activities with other data management groups such as master data management, data governance, and metadata management.
Actively participates with other consultants in problem-solving and approach development.
Responsibilities :
Provide a consultative approach with business users, asking questions to understand the business need and deriving the data flow, conceptual, logical, and physical data models based on those needs.
Perform data analysis to validate data models and to confirm the ability to meet business needs.
Assist with and support setting the data architecture direction, ensuring data architecture deliverables are developed, ensuring compliance to standards and guidelines, implementing the data architecture, and supporting technical developers at a project or business unit level.
Coordinate and consult with the Data Architect, project manager, client business staff, client technical staff and project developers in data architecture best practices and anything else that is data related at the project or business unit levels.
Work closely with Business Analysts and Solution Architects to design the data model satisfying the business needs and adhering to Enterprise Architecture.
Coordinate with Data Architects, Program Managers and participate in recurring meetings.
Help and mentor team members to understand the data model and subject areas.
Ensure that the team adheres to best practices and guidelines.
Requirements :
- Strong working knowledge of at least 3 years of Spark, Java/Scala/Pyspark, Kafka, Git, Unix / Linux, and ETL pipeline designing.
- Experience with Spark optimization/tuning/resource allocations
- Excellent understanding of IN memory distributed computing frameworks like Spark and its parameter tuning, writing optimized workflow sequences.
- Experience of relational databases (e.g., PostgreSQL, MySQL) and NoSQL databases (e.g., Redshift, Bigquery, Cassandra, etc).
- Familiarity with Docker, Kubernetes, Azure Data Lake/Blob storage, AWS S3, Google Cloud storage, etc.
- Have a deep understanding of the various stacks and components of the Big Data ecosystem.
- Hands-on experience with Python is a huge plus

TVARIT GmbH develops and delivers solutions in the field of artificial intelligence (AI) for the Manufacturing, automotive, and process industries. With its software products, TVARIT makes it possible for its customers to make intelligent and well-founded decisions, e.g., in forward-looking Maintenance, increasing the OEE and predictive quality. We have renowned reference customers, competent technology, a good research team from renowned Universities, and the award of a renowned AI prize (e.g., EU Horizon 2020) which makes TVARIT one of the most innovative AI companies in Germany and Europe.
We are looking for a self-motivated person with a positive "can-do" attitude and excellent oral and written communication skills in English.
We are seeking a skilled and motivated senior Data Engineer from the manufacturing Industry with over four years of experience to join our team. The Senior Data Engineer will oversee the department’s data infrastructure, including developing a data model, integrating large amounts of data from different systems, building & enhancing a data lake-house & subsequent analytics environment, and writing scripts to facilitate data analysis. The ideal candidate will have a strong foundation in ETL pipelines and Python, with additional experience in Azure and Terraform being a plus. This role requires a proactive individual who can contribute to our data infrastructure and support our analytics and data science initiatives.
Skills Required:
- Experience in the manufacturing industry (metal industry is a plus)
- 4+ years of experience as a Data Engineer
- Experience in data cleaning & structuring and data manipulation
- Architect and optimize complex data pipelines, leading the design and implementation of scalable data infrastructure, and ensuring data quality and reliability at scale
- ETL Pipelines: Proven experience in designing, building, and maintaining ETL pipelines.
- Python: Strong proficiency in Python programming for data manipulation, transformation, and automation.
- Experience in SQL and data structures
- Knowledge in big data technologies such as Spark, Flink, Hadoop, Apache, and NoSQL databases.
- Knowledge of cloud technologies (at least one) such as AWS, Azure, and Google Cloud Platform.
- Proficient in data management and data governance
- Strong analytical experience & skills that can extract actionable insights from raw data to help improve the business.
- Strong analytical and problem-solving skills.
- Excellent communication and teamwork abilities.
Nice To Have:
- Azure: Experience with Azure data services (e.g., Azure Data Factory, Azure Databricks, Azure SQL Database).
- Terraform: Knowledge of Terraform for infrastructure as code (IaC) to manage cloud.
- Bachelor’s degree in computer science, Information Technology, Engineering, or a related field from top-tier Indian Institutes of Information Technology (IIITs).
- Benefits And Perks
- A culture that fosters innovation, creativity, continuous learning, and resilience
- Progressive leave policy promoting work-life balance
- Mentorship opportunities with highly qualified internal resources and industry-driven programs
- Multicultural peer groups and supportive workplace policies
- Annual workcation program allowing you to work from various scenic locations
- Experience the unique environment of a dynamic start-up
Why should you join TVARIT ?
Working at TVARIT, a deep-tech German IT startup, offers a unique blend of innovation, collaboration, and growth opportunities. We seek individuals eager to adapt and thrive in a rapidly evolving environment.
If this opportunity excites you and aligns with your career aspirations, we encourage you to apply today!
Nielsen, a global company specialising in audience measurement and analytics, is currently seeking a proficient leader in data engineering to join their team in Bangalore, Gurgaon, or Mumbai.
This is a manager of managers role that involves managing multiple scrum teams and overseeing an advanced data platform that analyses audience consumption patterns across various channels like OTT, TV, Radio, and Social Media worldwide. You will be responsible for building and supervising a top-performing data engineering team that delivers data for targeted campaigns. Moreover, you will work with AWS services (S3, Lambda, Kinesis) and other data engineering technologies such as Spark, Scala/Python, Kafka, etc. There may also be opportunities to establish deep integrations with OTT platforms like Netflix, Prime Video, and other.

Primary Skills
DynamoDB, Java, Kafka, Spark, Amazon Redshift, AWS Lake Formation, AWS Glue, Python
Skills:
Good work experience showing growth as a Data Engineer.
Hands On programming experience
Implementation Experience on Kafka, Kinesis, Spark, AWS Glue, AWS Lake Formation.
Excellent knowledge in: Python, Scala/Java, Spark, AWS (Lambda, Step Functions, Dynamodb, EMR), Terraform, UI (Angular), Git, Mavena
Experience of performance optimization in Batch and Real time processing applications
Expertise in Data Governance and Data Security Implementation
Good hands-on design and programming skills building reusable tools and products Experience developing in AWS or similar cloud platforms. Preferred:, ECS, EKS, S3, EMR, DynamoDB, Aurora, Redshift, Quick Sight or similar.
Familiarity with systems with very high volume of transactions, micro service design, or data processing pipelines (Spark).
Knowledge and hands-on experience with server less technologies such as Lambda, MSK, MWAA, Kinesis Analytics a plus.
Expertise in practices like Agile, Peer reviews, Continuous Integration
Roles and responsibilities:
Determining project requirements and developing work schedules for the team.
Delegating tasks and achieving daily, weekly, and monthly goals.
Responsible for designing, building, testing, and deploying the software releases.
Salary: 25LPA-40LPA
Must have skills
3 to 6 years
Data Science
SQL, Excel, Big Query - mandate 3+ years
Python/ML, Hadoop, Spark - 2+ years
Requirements
• 3+ years prior experience as a data analyst
• Detail oriented, structural thinking and analytical mindset.
• Proven analytic skills, including data analysis and data validation.
• Technical writing experience in relevant areas, including queries, reports, and presentations.
• Strong SQL and Excel skills with the ability to learn other analytic tools
• Good communication skills (being precise and clear)
• Good to have prior knowledge of python and ML algorithms
Location: Pune
Required Skills : Scala, Python, Data Engineering, AWS, Cassandra/AstraDB, Athena, EMR, Spark/Snowflake
Job Description:
We are seeking a talented Machine Learning Engineer with expertise in software engineering to join our team. As a Machine Learning Engineer, your primary responsibility will be to develop machine learning (ML) solutions that focus on technology process improvements. Specifically, you will be working on projects involving ML & Generative AI solutions for Technology & Data Management Efficiencies such as optimal cloud computing, knowledge bots, Software Code Assistants, Automatic Data Management etc
Responsibilities:
- Collaborate with cross-functional teams to identify opportunities for technology process improvements that can be solved using machine learning and generative AI.
- Define and build innovate ML and Generative AI systems such as AI Assistants for varied SDLC tasks, and improve Data & Infrastructure management etc.
- Design and develop ML Engineering Solutions, generative AI Applications & Fine-Tuning Large Language Models (LLMs) for above ensuring scalability, efficiency, and maintainability of such solutions.
- Implement prompt engineering techniques to fine-tune and enhance LLMs for better performance and application-specific needs.
- Stay abreast of the latest advancements in the field of Generative AI and actively contribute to the research and development of new ML & Generative AI Solutions.
Requirements:
- A Master's or Ph.D. degree in Computer Science, Statistics, Data Science, or a related field.
- Proven experience working as a Software Engineer, with a focus on ML Engineering and exposure to Generative AI Applications such as chatGPT.
- Strong proficiency in programming languages such as Java, Scala, Python, Google Cloud, Biq Query, Hadoop & Spark etc
- Solid knowledge of software engineering best practices, including version control systems (e.g., Git), code reviews, and testing methodologies.
- Familiarity with large language models (LLMs), prompt engineering techniques, vector DB's, embedding & various fine-tuning techniques.
- Strong communication skills to effectively collaborate and present findings to both technical and non-technical stakeholders.
- Proven ability to adapt and learn new technologies and frameworks quickly.
- A proactive mindset with a passion for continuous learning and research in the field of Generative AI.
If you are a skilled and innovative Data Scientist with a passion for Generative AI, and have a desire to contribute to technology process improvements, we would love to hear from you. Join our team and help shape the future of our AI Driven Technology Solutions.
Publicis Sapient Overview:
The Senior Associate People Senior Associate L1 in Data Engineering, you will translate client requirements into technical design, and implement components for data engineering solution. Utilize deep understanding of data integration and big data design principles in creating custom solutions or implementing package solutions. You will independently drive design discussions to insure the necessary health of the overall solution
.
Job Summary:
As Senior Associate L2 in Data Engineering, you will translate client requirements into technical design, and implement components for data engineering solution. Utilize deep understanding of data integration and big data design principles in creating custom solutions or implementing package solutions. You will independently drive design discussions to insure the necessary health of the overall solution
The role requires a hands-on technologist who has strong programming background like Java / Scala / Python, should have experience in Data Ingestion, Integration and data Wrangling, Computation, Analytics pipelines and exposure to Hadoop ecosystem components. You are also required to have hands-on knowledge on at least one of AWS, GCP, Azure cloud platforms.
Role & Responsibilities:
Your role is focused on Design, Development and delivery of solutions involving:
• Data Integration, Processing & Governance
• Data Storage and Computation Frameworks, Performance Optimizations
• Analytics & Visualizations
• Infrastructure & Cloud Computing
• Data Management Platforms
• Implement scalable architectural models for data processing and storage
• Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time mode
• Build functionality for data analytics, search and aggregation
Experience Guidelines:
Mandatory Experience and Competencies:
# Competency
1.Overall 5+ years of IT experience with 3+ years in Data related technologies
2.Minimum 2.5 years of experience in Big Data technologies and working exposure in at least one cloud platform on related data services (AWS / Azure / GCP)
3.Hands-on experience with the Hadoop stack – HDFS, sqoop, kafka, Pulsar, NiFi, Spark, Spark Streaming, Flink, Storm, hive, oozie, airflow and other components required in building end to end data pipeline.
4.Strong experience in at least of the programming language Java, Scala, Python. Java preferable
5.Hands-on working knowledge of NoSQL and MPP data platforms like Hbase, MongoDb, Cassandra, AWS Redshift, Azure SQLDW, GCP BigQuery etc
6.Well-versed and working knowledge with data platform related services on at least 1 cloud platform, IAM and data security
Preferred Experience and Knowledge (Good to Have):
# Competency
1.Good knowledge of traditional ETL tools (Informatica, Talend, etc) and database technologies (Oracle, MySQL, SQL Server, Postgres) with hands on experience
2.Knowledge on data governance processes (security, lineage, catalog) and tools like Collibra, Alation etc
3.Knowledge on distributed messaging frameworks like ActiveMQ / RabbiMQ / Solace, search & indexing and Micro services architectures
4.Performance tuning and optimization of data pipelines
5.CI/CD – Infra provisioning on cloud, auto build & deployment pipelines, code quality
6.Cloud data specialty and other related Big data technology certifications
Personal Attributes:
• Strong written and verbal communication skills
• Articulation skills
• Good team player
• Self-starter who requires minimal oversight
• Ability to prioritize and manage multiple tasks
• Process orientation and the ability to define and set up processes


Position Overview: We are seeking a talented Data Engineer with expertise in Power BI to join our team. The ideal candidate will be responsible for designing and implementing data pipelines, as well as developing insightful visualizations and reports using Power BI. Additionally, the candidate should have strong skills in Python, data analytics, PySpark, and Databricks. This role requires a blend of technical expertise, analytical thinking, and effective communication skills.
Key Responsibilities:
- Design, develop, and maintain data pipelines and architectures using PySpark and Databricks.
- Implement ETL processes to extract, transform, and load data from various sources into data warehouses or data lakes.
- Collaborate with data analysts and business stakeholders to understand data requirements and translate them into actionable insights.
- Develop interactive dashboards, reports, and visualizations using Power BI to communicate key metrics and trends.
- Optimize and tune data pipelines for performance, scalability, and reliability.
- Monitor and troubleshoot data infrastructure to ensure data quality, integrity, and availability.
- Implement security measures and best practices to protect sensitive data.
- Stay updated with emerging technologies and best practices in data engineering and data visualization.
- Document processes, workflows, and configurations to maintain a comprehensive knowledge base.
Requirements:
- Bachelor’s degree in Computer Science, Engineering, or related field. (Master’s degree preferred)
- Proven experience as a Data Engineer with expertise in Power BI, Python, PySpark, and Databricks.
- Strong proficiency in Power BI, including data modeling, DAX calculations, and creating interactive reports and dashboards.
- Solid understanding of data analytics concepts and techniques.
- Experience working with Big Data technologies such as Hadoop, Spark, or Kafka.
- Proficiency in programming languages such as Python and SQL.
- Hands-on experience with cloud platforms like AWS, Azure, or Google Cloud.
- Excellent analytical and problem-solving skills with attention to detail.
- Strong communication and collaboration skills to work effectively with cross-functional teams.
- Ability to work independently and manage multiple tasks simultaneously in a fast-paced environment.
Preferred Qualifications:
- Advanced degree in Computer Science, Engineering, or related field.
- Certifications in Power BI or related technologies.
- Experience with data visualization tools other than Power BI (e.g., Tableau, QlikView).
- Knowledge of machine learning concepts and frameworks.
Publicis Sapient Overview:
The Senior Associate People Senior Associate L1 in Data Engineering, you will translate client requirements into technical design, and implement components for data engineering solution. Utilize deep understanding of data integration and big data design principles in creating custom solutions or implementing package solutions. You will independently drive design discussions to insure the necessary health of the overall solution
.
Job Summary:
As Senior Associate L1 in Data Engineering, you will do technical design, and implement components for data engineering solution. Utilize deep understanding of data integration and big data design principles in creating custom solutions or implementing package solutions. You will independently drive design discussions to insure the necessary health of the overall solution
The role requires a hands-on technologist who has strong programming background like Java / Scala / Python, should have experience in Data Ingestion, Integration and data Wrangling, Computation, Analytics pipelines and exposure to Hadoop ecosystem components. Having hands-on knowledge on at least one of AWS, GCP, Azure cloud platforms will be preferable.
Role & Responsibilities:
Job Title: Senior Associate L1 – Data Engineering
Your role is focused on Design, Development and delivery of solutions involving:
• Data Ingestion, Integration and Transformation
• Data Storage and Computation Frameworks, Performance Optimizations
• Analytics & Visualizations
• Infrastructure & Cloud Computing
• Data Management Platforms
• Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time
• Build functionality for data analytics, search and aggregation
Experience Guidelines:
Mandatory Experience and Competencies:
# Competency
1.Overall 3.5+ years of IT experience with 1.5+ years in Data related technologies
2.Minimum 1.5 years of experience in Big Data technologies
3.Hands-on experience with the Hadoop stack – HDFS, sqoop, kafka, Pulsar, NiFi, Spark, Spark Streaming, Flink, Storm, hive, oozie, airflow and other components required in building end to end data pipeline. Working knowledge on real-time data pipelines is added advantage.
4.Strong experience in at least of the programming language Java, Scala, Python. Java preferable
5.Hands-on working knowledge of NoSQL and MPP data platforms like Hbase, MongoDb, Cassandra, AWS Redshift, Azure SQLDW, GCP BigQuery etc
Preferred Experience and Knowledge (Good to Have):
# Competency
1.Good knowledge of traditional ETL tools (Informatica, Talend, etc) and database technologies (Oracle, MySQL, SQL Server, Postgres) with hands on experience
2.Knowledge on data governance processes (security, lineage, catalog) and tools like Collibra, Alation etc
3.Knowledge on distributed messaging frameworks like ActiveMQ / RabbiMQ / Solace, search & indexing and Micro services architectures
4.Performance tuning and optimization of data pipelines
5.CI/CD – Infra provisioning on cloud, auto build & deployment pipelines, code quality
6.Working knowledge with data platform related services on at least 1 cloud platform, IAM and data security
7.Cloud data specialty and other related Big data technology certifications
Job Title: Senior Associate L1 – Data Engineering
Personal Attributes:
• Strong written and verbal communication skills
• Articulation skills
• Good team player
• Self-starter who requires minimal oversight
• Ability to prioritize and manage multiple tasks
• Process orientation and the ability to define and set up processes
Job Title Big Data Developer
Job Description
Bachelor's degree in Engineering or Computer Science or equivalent OR Master's in Computer Applications or equivalent.
Solid Experience of software development experience and leading teams of engineers and scrum teams.
4+ years of hands-on experience of working with Map-Reduce, Hive, Spark (core, SQL and PySpark).
Solid Datawarehousing concepts.
Knowledge of Financial reporting ecosystem will be a plus.
4+ years of experience within Data Engineering/ Data Warehousing using Big Data technologies will be an addon.
Expert on Distributed ecosystem.
Hands-on experience with programming using Core Java or Python/Scala
Expert on Hadoop and Spark Architecture and its working principle
Hands-on experience on writing and understanding complex SQL(Hive/PySpark-dataframes), optimizing joins while processing huge amount of data.
Experience in UNIX shell scripting.
Roles & Responsibilities
Ability to design and develop optimized Data pipelines for batch and real time data processing
Should have experience in analysis, design, development, testing, and implementation of system applications
Demonstrated ability to develop and document technical and functional specifications and analyze software and system processing flows.
Excellent technical and analytical aptitude
Good communication skills.
Excellent Project management skills.
Results driven Approach.
Mandatory SkillsBig Data, PySpark, Hive
Data Engineering : Senior Engineer / Manager
As Senior Engineer/ Manager in Data Engineering, you will translate client requirements into technical design, and implement components for a data engineering solutions. Utilize a deep understanding of data integration and big data design principles in creating custom solutions or implementing package solutions. You will independently drive design discussions to insure the necessary health of the overall solution.
Must Have skills :
1. GCP
2. Spark streaming : Live data streaming experience is desired.
3. Any 1 coding language: Java/Pyhton /Scala
Skills & Experience :
- Overall experience of MINIMUM 5+ years with Minimum 4 years of relevant experience in Big Data technologies
- Hands-on experience with the Hadoop stack - HDFS, sqoop, kafka, Pulsar, NiFi, Spark, Spark Streaming, Flink, Storm, hive, oozie, airflow and other components required in building end to end data pipeline. Working knowledge on real-time data pipelines is added advantage.
- Strong experience in at least of the programming language Java, Scala, Python. Java preferable
- Hands-on working knowledge of NoSQL and MPP data platforms like Hbase, MongoDb, Cassandra, AWS Redshift, Azure SQLDW, GCP BigQuery etc.
- Well-versed and working knowledge with data platform related services on GCP
- Bachelor's degree and year of work experience of 6 to 12 years or any combination of education, training and/or experience that demonstrates the ability to perform the duties of the position
Your Impact :
- Data Ingestion, Integration and Transformation
- Data Storage and Computation Frameworks, Performance Optimizations
- Analytics & Visualizations
- Infrastructure & Cloud Computing
- Data Management Platforms
- Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time
- Build functionality for data analytics, search and aggregation

5+ years of experience designing, developing, validating, and automating ETL processes 3+ years of experience traditional ETL tools such as Visual Studio, SQL Server Management Studio, SSIS, SSAS and SSRS 2+ years of experience with cloud technologies and platforms, such as: Kubernetes, Spark, Kafka, Azure Data Factory, Snowflake, ML Flow, Databricks, Airflow or similar Must have experience with designing and implementing data access layers Must be an expert with SQL/T-SQL and Python Must have experience in Kafka Define and implement data models with various database technologies like MongoDB, CosmosDB, Neo4j, MariaDB and SQL Serve Ingest and publish data from sources and to destinations via an API Exposure to ETL/ELT with using Kafka or Azure Event Hubs with Spark or Databricks is a plus Exposure to healthcare technologies and integrations for FHIR API, HL7 or other HIE protocols is a plus
Skills Required :
Designing, Developing, ETL, Visual Studio, Python, Spark, Kubernetes, Kafka, Azure Data Factory, SQL Server, Airflow, Databricks, T-SQL, MongoDB, CosmosDB, Snowflake, SSIS, SSAS, SSRS, FHIR API, HL7, HIE Protocols

DATA ENGINEERING CONSULTANT
About NutaNXT: NutaNXT is a next-gen Software Product Engineering services provider building ground-breaking products using AI/ML, Data Analytics, IOT, Cloud & new emerging technologies disrupting the global markets. Our mission is to help clients leverage our specialized Digital Product Engineering capabilities on Data Engineering, AI Automations, Software Full stack solutions and services to build best-in-class products and stay ahead of the curve. You will get a chance to work on multiple projects critical to NutaNXT needs with opportunities to learn, develop new skills,switch teams and projects as you and our fast-paced business grow and evolve. Location: Pune Experience: 6 to 8 years
Job Description: NutaNXT is looking for supporting the planning and implementation of data design services, providing sizing and configuration assistance and performing needs assessments. Delivery of architectures for transformations and modernizations of enterprise data solutions using Azure cloud data technologies. As a Data Engineering Consultant, you will collect, aggregate, store, and reconcile data in support of Customer's business decisions. You will design and build data pipelines, data streams, data service APIs, data generators and other end-user information portals and insight tools.
Mandatory Skills: -
- Demonstrable experience in enterprise level data platforms involving implementation of end-to-end data pipelines with Python or Scala - Hands-on experience with at least one of the leading public cloud data platforms (Ideally Azure)
- - Experience with different Databases (like column-oriented database, NoSQL database, RDBMS)
- - Experience in architecting data pipelines and solutions for both streaming and batch integrations using tools/frameworks like Azure Databricks, Azure Data Factory, Spark, Spark Streaming, etc
- . - Understanding of data modeling, warehouse design and fact/dimension concepts - Good Communication
Good To Have:
Certifications for any of the cloud services (Ideally Azure)
• Experience working with code repositories and continuous integration • Understanding of development and project methodologies
Why Join Us?
We offer Innovative work in AI & Data Engineering Space, with a unique, diverse workplace environment having a Continuous learning and development opportunities. These are just some of the reasons we're consistently being recognized as one of the best companies to work for, and why our people choose to grow careers at NutaNXT. We also offer a highly flexible, self-driven, remote work culture which fosters the best of innovation, creativity and work-life balance, market industry-leading compensation which we believe help us consistently deliver to our clients and grow in the highly competitive, fast evolving Digital Engineering space with a strong focus on building advanced software products for clients in the US, Europe and APAC regions.