Apache Spark Jobs in Delhi, NCR and Gurgaon


Apply to 8+ Apache Spark Jobs in Delhi, NCR and Gurgaon on CutShort.io. Explore the latest Apache Spark Job opportunities across top companies like Google, Amazon & Adobe.

Publicis Sapient

at Publicis Sapient

10 recruiters
Mohit Singh
Posted by Mohit Singh
Bengaluru (Bangalore), Pune, Hyderabad, Gurugram, Noida
5 - 11 yrs
₹20L - ₹36L / yr
PySpark
Data engineering
Big Data
Hadoop
Spark
+7 more

Publicis Sapient Overview:

As Senior Associate L1 in Data Engineering, you will translate client requirements into technical designs and implement components for data engineering solutions. You will apply a deep understanding of data integration and big data design principles to create custom solutions or implement packaged solutions, and independently drive design discussions to ensure the overall health of the solution.

Job Summary:

As Senior Associate L2 in Data Engineering, you will translate client requirements into technical designs and implement components for data engineering solutions. You will apply a deep understanding of data integration and big data design principles to create custom solutions or implement packaged solutions, and independently drive design discussions to ensure the overall health of the solution.

The role requires a hands-on technologist with a strong programming background in Java, Scala, or Python; experience in data ingestion, integration, and wrangling, computation, and analytics pipelines; and exposure to Hadoop ecosystem components. Hands-on knowledge of at least one of the AWS, GCP, or Azure cloud platforms is also required.


Role & Responsibilities:

Your role is focused on the design, development, and delivery of solutions involving:

• Data Integration, Processing & Governance

• Data Storage and Computation Frameworks, Performance Optimizations

• Analytics & Visualizations

• Infrastructure & Cloud Computing

• Data Management Platforms

• Implement scalable architectural models for data processing and storage

• Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time mode

• Build functionality for data analytics, search and aggregation
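As a rough illustration of the ingestion and aggregation responsibilities above, the sketch below reads records from multiple heterogeneous sources and aggregates them in batch mode. Plain Python stands in for what would be a Spark job in practice, and the source data and field names are illustrative assumptions, not part of the role description:

```python
import csv
import io
from collections import defaultdict

def ingest_and_aggregate(sources):
    """Ingest CSV event data from multiple sources and aggregate revenue per region."""
    totals = defaultdict(float)
    for source in sources:
        # Each source could be a file, an API response, or a message-queue dump.
        reader = csv.DictReader(io.StringIO(source))
        for row in reader:
            totals[row["region"]] += float(row["revenue"])
    return dict(totals)

# Two heterogeneous "sources" (illustrative; real pipelines would pull from HDFS, Kafka, etc.)
source_a = "region,revenue\nnorth,100.0\nsouth,50.0\n"
source_b = "region,revenue\nnorth,25.0\n"

print(ingest_and_aggregate([source_a, source_b]))
# {'north': 125.0, 'south': 50.0}
```

In a Spark job the same shape appears as a union of DataFrames followed by a groupBy/sum; the point of the sketch is the pattern, not the engine.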

Experience Guidelines:

Mandatory Experience and Competencies:

1. Overall 5+ years of IT experience, with 3+ years in data-related technologies

2. Minimum 2.5 years of experience in Big Data technologies, with working exposure to related data services on at least one cloud platform (AWS / Azure / GCP)

3. Hands-on experience with the Hadoop stack – HDFS, Sqoop, Kafka, Pulsar, NiFi, Spark, Spark Streaming, Flink, Storm, Hive, Oozie, Airflow, and other components required to build end-to-end data pipelines

4. Strong experience in at least one of the programming languages Java, Scala, or Python; Java preferred

5. Hands-on working knowledge of NoSQL and MPP data platforms like HBase, MongoDB, Cassandra, AWS Redshift, Azure SQL DW, GCP BigQuery, etc.

6. Well-versed, working knowledge of data-platform-related services on at least one cloud platform, IAM, and data security


Preferred Experience and Knowledge (Good to Have):

1. Good knowledge of traditional ETL tools (Informatica, Talend, etc.) and database technologies (Oracle, MySQL, SQL Server, Postgres), with hands-on experience

2. Knowledge of data governance processes (security, lineage, catalog) and tools like Collibra, Alation, etc.

3. Knowledge of distributed messaging frameworks like ActiveMQ / RabbitMQ / Solace, search & indexing, and microservices architectures

4. Performance tuning and optimization of data pipelines

5. CI/CD – infrastructure provisioning on cloud, automated build & deployment pipelines, code quality

6. Cloud data specialty and other related Big Data technology certifications


Personal Attributes:

• Strong written and verbal communication skills

• Articulation skills

• Good team player

• Self-starter who requires minimal oversight

• Ability to prioritize and manage multiple tasks

• Process orientation and the ability to define and set up processes


Bengaluru (Bangalore), Hyderabad, Delhi, Gurugram
5 - 10 yrs
₹14L - ₹15L / yr
Google Cloud Platform (GCP)
Spark
PySpark
Apache Spark
"DATA STREAMING"

Data Engineering : Senior Engineer / Manager


As Senior Engineer / Manager in Data Engineering, you will translate client requirements into technical designs and implement components for data engineering solutions. You will apply a deep understanding of data integration and big data design principles to create custom solutions or implement packaged solutions, and independently drive design discussions to ensure the overall health of the solution.


Must Have skills :


1. GCP


2. Spark Streaming: live data streaming experience is desired.


3. Any one coding language: Java / Python / Scala
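For context on the live-streaming skill above: the core pattern behind Spark Structured Streaming's windowed aggregation (window() followed by groupBy().count()) is a tumbling-window count, which can be sketched in plain Python as follows. The event data and window size here are illustrative assumptions:

```python
from collections import Counter

def windowed_counts(events, window_size):
    """Group (timestamp, key) events into fixed tumbling windows and count keys,
    mimicking a windowed groupBy().count() in Spark Structured Streaming."""
    counts = Counter()
    for ts, key in events:
        # Map each event to the start of the window it falls in.
        window_start = (ts // window_size) * window_size
        counts[(window_start, key)] += 1
    return dict(counts)

# Illustrative event stream: (event-time, event-type)
events = [(0, "click"), (3, "view"), (7, "click"), (12, "click")]
print(windowed_counts(events, window_size=10))
# {(0, 'click'): 2, (0, 'view'): 1, (10, 'click'): 1}
```

A real streaming job would additionally handle late data with watermarks and update results incrementally per micro-batch; the sketch shows only the windowing logic.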



Skills & Experience :


- Overall experience of minimum 5 years, with at least 4 years of relevant experience in Big Data technologies


- Hands-on experience with the Hadoop stack – HDFS, Sqoop, Kafka, Pulsar, NiFi, Spark, Spark Streaming, Flink, Storm, Hive, Oozie, Airflow, and other components required to build end-to-end data pipelines. Working knowledge of real-time data pipelines is an added advantage.


- Strong experience in at least one of the programming languages Java, Scala, or Python; Java preferred


- Hands-on working knowledge of NoSQL and MPP data platforms like HBase, MongoDB, Cassandra, AWS Redshift, Azure SQL DW, GCP BigQuery, etc.


- Well-versed, working knowledge of data-platform-related services on GCP


- Bachelor's degree and 6 to 12 years of work experience, or any combination of education, training, and/or experience that demonstrates the ability to perform the duties of the position


Your Impact :


- Data Ingestion, Integration and Transformation


- Data Storage and Computation Frameworks, Performance Optimizations


- Analytics & Visualizations


- Infrastructure & Cloud Computing


- Data Management Platforms


- Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time


- Build functionality for data analytics, search and aggregation

Career Forge

at Career Forge

2 candid answers
Mohammad Faiz
Posted by Mohammad Faiz
Delhi, Gurugram, Noida, Ghaziabad, Faridabad
5 - 7 yrs
₹12L - ₹15L / yr
Python
Apache Spark
PySpark
Data engineering
ETL
+10 more

🚀 Exciting Opportunity: Data Engineer Position in Gurugram 🌐


Hello 


We are actively seeking a talented and experienced Data Engineer to join our dynamic team at Reality Motivational Venture in Gurugram (Gurgaon). If you're passionate about data, thrive in a collaborative environment, and possess the skills we're looking for, we want to hear from you!


Position: Data Engineer  

Location: Gurugram (Gurgaon)  

Experience: 5+ years 


Key Skills:

- Python

- Spark, PySpark

- Data Governance

- Cloud (AWS/Azure/GCP)


Main Responsibilities:

- Define and set up analytics environments for "Big Data" applications in collaboration with domain experts.

- Implement ETL processes for telemetry-based and stationary test data.

- Support in defining data governance, including data lifecycle management.

- Develop large-scale data processing engines and real-time search and analytics based on time series data.

- Ensure technical, methodological, and quality aspects.

- Support CI/CD processes.

- Foster know-how development and transfer, continuous improvement of leading technologies within Data Engineering.

- Collaborate with solution architects on the development of complex on-premise, hybrid, and cloud solution architectures.
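As a rough sketch of the ETL responsibility above (here applied to telemetry-style readings), the pipeline below separates the extract, transform, and load stages. The record format, validity threshold, and in-memory "store" are illustrative assumptions standing in for real sources and targets:

```python
def extract(raw_lines):
    """Extract: parse raw telemetry lines of the form 'sensor_id,value'."""
    return [line.split(",") for line in raw_lines if line.strip()]

def transform(records):
    """Transform: cast values to float and drop readings outside a valid range."""
    cleaned = []
    for sensor_id, value in records:
        v = float(value)
        if 0.0 <= v <= 200.0:  # plausible-range filter (assumed threshold)
            cleaned.append({"sensor": sensor_id, "value": v})
    return cleaned

def load(records, store):
    """Load: append cleaned records to the target store (a list standing in
    for a database or data lake)."""
    store.extend(records)
    return store

store = []
raw = ["s1,42.5", "s2,999.0", "s1,17.0", ""]
load(transform(extract(raw)), store)
print(store)
# [{'sensor': 's1', 'value': 42.5}, {'sensor': 's1', 'value': 17.0}]
```

The same extract/transform/load separation carries over directly to Spark or cloud-native ETL tooling; only the execution engine changes.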


Qualification Requirements:

- BSc, MSc, MEng, or PhD in Computer Science, Informatics/Telematics, Mathematics/Statistics, or a comparable engineering degree.

- Proficiency in Python and the PyData stack (Pandas/Numpy).

- Experience in high-level programming languages (C#/C++/Java).

- Familiarity with scalable processing environments like Dask (or Spark).

- Proficient in Linux and scripting languages (Bash Scripts).

- Experience in containerization and orchestration of containerized services (Kubernetes).

- Education in database technologies (SQL/OLAP and NoSQL).

- Interest in Big Data storage technologies (Elastic, ClickHouse).

- Familiarity with Cloud technologies (Azure, AWS, GCP).

- Fluent English communication skills (speaking and writing).

- Ability to work constructively with a global team.

- Willingness to travel for business trips during development projects.


Preferable:

- Working knowledge of vehicle architectures, communication, and components.

- Experience in additional programming languages (C#/C++/Java, R, Scala, MATLAB).

- Experience in time-series processing.


How to Apply:

Interested candidates, please share your updated CV/resume with me.


Thank you for considering this exciting opportunity.

Accolite Digital
Nitesh Parab
Posted by Nitesh Parab
Bengaluru (Bangalore), Hyderabad, Gurugram, Delhi, Noida, Ghaziabad, Faridabad
4 - 8 yrs
₹5L - ₹15L / yr
ETL
Informatica
Data Warehouse (DWH)
SSIS
SQL Server Integration Services (SSIS)
+10 more

Job Title: Data Engineer

Job Summary: As a Data Engineer, you will be responsible for designing, building, and maintaining the infrastructure and tools necessary for data collection, storage, processing, and analysis. You will work closely with data scientists and analysts to ensure that data is available, accessible, and in a format that can be easily consumed for business insights.

Responsibilities:

  • Design, build, and maintain data pipelines to collect, store, and process data from various sources.
  • Create and manage data warehousing and data lake solutions.
  • Develop and maintain data processing and data integration tools.
  • Collaborate with data scientists and analysts to design and implement data models and algorithms for data analysis.
  • Optimize and scale existing data infrastructure to ensure it meets the needs of the business.
  • Ensure data quality and integrity across all data sources.
  • Develop and implement best practices for data governance, security, and privacy.
  • Monitor data pipeline performance and errors, and troubleshoot issues as needed.
  • Stay up-to-date with emerging data technologies and best practices.

Requirements:

Bachelor's degree in Computer Science, Information Systems, or a related field.

Experience with ETL tools like Matillion, SSIS, or Informatica

Experience with SQL and relational databases such as SQL Server, MySQL, PostgreSQL, or Oracle.

Experience in writing complex SQL queries

Strong programming skills in languages such as Python, Java, or Scala.

Experience with data modeling, data warehousing, and data integration.

Strong problem-solving skills and ability to work independently.

Excellent communication and collaboration skills.

Familiarity with big data technologies such as Hadoop, Spark, or Kafka.

Familiarity with data warehouse/Data lake technologies like Snowflake or Databricks

Familiarity with cloud computing platforms such as AWS, Azure, or GCP.

Familiarity with Reporting tools
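Among the requirements above, "complex SQL queries" typically means combining joins, aggregation, and filtering on aggregates. A minimal, self-contained illustration using Python's built-in sqlite3 module (the table and column names are illustrative assumptions):

```python
import sqlite3

# In-memory database with illustrative tables (names are assumptions)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 75.0);
    CREATE TABLE customers (id INTEGER, name TEXT);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi');
""")

# Join + aggregation + HAVING filter: customers with total spend above 100
rows = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    GROUP BY c.name
    HAVING total > 100
    ORDER BY total DESC
""").fetchall()
print(rows)
# [('Asha', 150.0)]
```

The same query shape (join, group, aggregate, filter on the aggregate) is what interviewers usually probe for, regardless of the underlying engine.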

Teamwork / growth contribution

  • Helping the team conduct interviews and identify the right candidates
  • Adhering to timelines
  • On-time status communication and upfront communication of any risks
  • Teaching, training, and sharing knowledge with peers
  • Good communication skills
  • Proven ability to take initiative and be innovative
  • Analytical mind with a problem-solving aptitude

Good to have :

Master's degree in Computer Science, Information Systems, or a related field.

Experience with NoSQL databases such as MongoDB or Cassandra.

Familiarity with data visualization and business intelligence tools such as Tableau or Power BI.

Knowledge of machine learning and statistical modeling techniques.

If you are passionate about data and want to work with a dynamic team of data scientists and analysts, we encourage you to apply for this position.

Celebal Technologies

at Celebal Technologies

2 recruiters
Payal Hasnani
Posted by Payal Hasnani
Jaipur, Noida, Gurugram, Delhi, Ghaziabad, Faridabad, Pune, Mumbai
5 - 15 yrs
₹7L - ₹25L / yr
PySpark
Data engineering
Big Data
Hadoop
Spark
+4 more
Job Responsibilities:

• Project Planning and Management
o Take end-to-end ownership of multiple projects / project tracks
o Create and maintain project plans and other related documentation for project
objectives, scope, schedule and delivery milestones
o Lead and participate across all the phases of software engineering, right from
requirements gathering to GO LIVE
o Lead internal team meetings on solution architecture, effort estimation, manpower
planning and resource (software/hardware/licensing) planning
o Manage RIDA (Risks, Impediments, Dependencies, Assumptions) for projects by
developing effective mitigation plans
• Team Management
o Act as the Scrum Master
o Conduct SCRUM ceremonies like Sprint Planning, Daily Standup, Sprint Retrospective
o Set clear objectives for the project and roles/responsibilities for each team member
o Train and mentor the team on their job responsibilities and SCRUM principles
o Make the team accountable for their tasks and help the team in achieving them
o Identify the requirements and come up with a plan for Skill Development for all team
members
• Communication
o Be the Single Point of Contact for the client in terms of day-to-day communication
o Periodically communicate project status to all the stakeholders (internal/external)
• Process Management and Improvement
o Create and document processes across all disciplines of software engineering
o Identify gaps and continuously improve processes within the team
o Encourage team members to contribute towards process improvement
o Develop a culture of quality and efficiency within the team

Must have:
• Minimum 8 years of experience (hands-on as well as leadership) in software / data engineering
across multiple job functions like Business Analysis, Development, Solutioning, QA, DevOps and
Project Management
• Hands-on as well as leadership experience in Big Data Engineering projects
• Experience developing or managing cloud solutions using Azure or other cloud provider
• Demonstrable knowledge on Hadoop, Hive, Spark, NoSQL DBs, SQL, Data Warehousing, ETL/ELT,
DevOps tools
• Strong project management and communication skills
• Strong analytical and problem-solving skills
• Strong systems level critical thinking skills
• Strong collaboration and influencing skills

Good to have:
• Knowledge on PySpark, Azure Data Factory, Azure Data Lake Storage, Synapse Dedicated SQL
Pool, Databricks, PowerBI, Machine Learning, Cloud Infrastructure
• Background in BFSI with focus on core banking
• Willingness to travel

Work Environment
• Customer Office (Mumbai) / Remote Work

Education
• UG: B. Tech - Computers / B. E. – Computers / BCA / B.Sc. Computer Science
RenovITe Technologies

at RenovITe Technologies

1 recruiter
Pranjali Sinha
Posted by Pranjali Sinha
Noida
6 - 9 yrs
₹8L - ₹16L / yr
Apache Spark
Apache Kafka
Gradle
Maven
• Java 1.8+ or OpenJDK 11+
• Core Spring Framework
• Spring Boot
• Spring REST Services
• Any RDBMS
• Gradle or Maven
• JPA / Hibernate
• Git / Bitbucket
• Apache Spark
Top startup of India - News App
Noida
2 - 5 yrs
₹20L - ₹35L / yr
Linux/Unix
Python
Hadoop
Apache Spark
MongoDB
+4 more
Responsibilities
● Create and maintain optimal data pipeline architecture.
● Assemble large, complex data sets that meet functional / non-functional
business requirements.
● Building and optimizing ‘big data’ data pipelines, architectures and data sets.
● Maintain, organize & automate data processes for various use cases.
● Identifying trends, doing follow-up analysis, preparing visualizations.
● Creating daily, weekly and monthly reports of product KPIs.
● Create informative, actionable and repeatable reporting that highlights
relevant business trends and opportunities for improvement.

Required Skills And Experience:
● 2-5 years of work experience in data analytics- including analyzing large data sets.
● BTech in Mathematics/Computer Science
● Strong analytical, quantitative and data interpretation skills.
● Hands-on experience with Python, Apache Spark, Hadoop, NoSQL
databases(MongoDB preferred), Linux is a must.
● Experience building and optimizing ‘big data’ data pipelines, architectures and data sets.
● Experience with Google Cloud Data Analytics Products such as BigQuery, Dataflow, Dataproc etc. (or similar cloud-based platforms).
● Experience working within a Linux computing environment, and use of
command-line tools including knowledge of shell/Python scripting for
automating common tasks.
● Previous experience working at startups and/or in fast-paced environments.
● Previous experience as a data engineer or in a similar role.
SmartJoules

at SmartJoules

1 video
9 recruiters
Saksham Dutta
Posted by Saksham Dutta
Remote, NCR (Delhi | Gurgaon | Noida)
3 - 5 yrs
₹8L - ₹12L / yr
Machine Learning (ML)
Python
Big Data
Apache Spark
Deep Learning

Responsibilities:

  • Exploring and visualizing data to gain an understanding of it, then identifying differences in data distribution that could affect performance when deploying the model in the real world.
  • Verifying data quality, and/or ensuring it via data cleaning.
  • Adapt and work quickly to produce outputs that improve stakeholders' decision-making using ML.
  • To design and develop Machine Learning systems and schemes. 
  • To perform statistical analysis and fine-tune models using test results.
  • To train and retrain ML systems and models as and when necessary. 
  • To deploy ML models in production and maintain the cost of cloud infrastructure.
  • To develop Machine Learning apps according to client and data scientist requirements.
  • To analyze the problem-solving capabilities and use-cases of ML algorithms and rank them by how successful they are in meeting the objective.


Technical Knowledge:


  • Has worked on real-world problems, solved them using ML and deep learning models deployed in real time, and has strong projects to showcase.
  • Proficiency in Python and experience working with Jupyter, Google Colab, and cloud-hosted notebooks such as AWS SageMaker, Databricks, etc.
  • Proficiency in working with scikit-learn, TensorFlow, OpenCV, PySpark, pandas, NumPy, and related libraries.
  • Expert in visualising and manipulating complex datasets.
  • Proficiency in working with visualisation libraries such as seaborn, Plotly, matplotlib, etc.
  • Proficiency in the linear algebra, statistics, and probability required for Machine Learning.
  • Proficiency in ML algorithms, for example gradient boosting, stacked machine learning, classification algorithms, and deep learning algorithms. Experience in hyperparameter tuning of various models and comparing the results of algorithm performance is required.
  • Big Data technologies such as the Hadoop stack and Spark.
  • Basic use of cloud services (VMs, e.g. EC2).
  • Brownie points for Kubernetes and task queues.
  • Strong written and verbal communications.
  • Experience working in an Agile environment.
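The hyperparameter-tuning experience mentioned above boils down to evaluating a model over a grid of parameter combinations and keeping the best score. A minimal, framework-free sketch of that loop (the toy scoring function and grid values are illustrative assumptions, standing in for a real train/evaluate cycle):

```python
from itertools import product

def grid_search(train_fn, grid):
    """Exhaustively evaluate every hyperparameter combination and keep the best score."""
    best_params, best_score = None, float("-inf")
    for combo in product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        score = train_fn(**params)  # in practice: train a model, score on validation data
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy "model": score peaks at lr=0.1, depth=3 (a stand-in for real training)
def toy_train(lr, depth):
    return 1.0 - abs(lr - 0.1) - abs(depth - 3)

grid = {"lr": [0.01, 0.1, 1.0], "depth": [2, 3, 5]}
print(grid_search(toy_train, grid))
# ({'lr': 0.1, 'depth': 3}, 1.0)
```

Libraries such as scikit-learn wrap this same idea (with cross-validation added) in utilities like GridSearchCV; the sketch shows only the underlying search.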