PySpark Jobs

Explore top PySpark Job opportunities from Top Companies & Startups. All jobs are added by verified employees who can be contacted directly below.

Posted by Vandana Saxena
Location: Mumbai, Navi Mumbai
Experience: 2 - 8 yrs
Salary: ₹5L - ₹12L / yr
Skills: Microsoft Windows Azure, ADF, NumPy, PySpark, Databricks, +1 more

Experience and expertise in using Azure cloud services. Azure certification will be a plus.

- Experience and expertise in Python development and its different libraries such as PySpark, pandas and NumPy

- Expertise in ADF, Databricks.

- Creating and maintaining data interfaces across a number of different protocols (file, API, etc.).

- Creating and maintaining internal business process solutions to keep our corporate system data in sync and reduce manual processes where appropriate.

- Creating and maintaining monitoring and alerting workflows to improve system transparency.

- Facilitate the development of our Azure cloud infrastructure relative to Data and Application systems.

- Design and lead development of our data infrastructure including data warehouses, data marts, and operational data stores.

- Experience in using Azure services such as ADLS Gen 2, Azure Functions, Azure messaging services, Azure SQL Server, Azure Key Vault, Azure Cognitive Services, etc.
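
The stack above pairs PySpark with Azure storage and Databricks. A minimal sketch of the kind of pipeline step this implies - reading raw files from an ADLS Gen 2 container, aggregating with the DataFrame API, and writing curated Parquet back. The storage account, container and column names are illustrative assumptions only:

    from pyspark.sql import SparkSession, functions as F

    # On Azure Databricks a SparkSession already exists as `spark`; getOrCreate() is harmless there.
    spark = SparkSession.builder.appName("adls-batch-demo").getOrCreate()

    # Hypothetical ADLS Gen 2 paths - replace account, containers and columns with real values.
    raw_path = "abfss://raw@examplestorage.dfs.core.windows.net/sales/"
    curated_path = "abfss://curated@examplestorage.dfs.core.windows.net/sales_daily/"

    df = spark.read.option("header", True).csv(raw_path)

    daily = (df.withColumn("amount", F.col("amount").cast("double"))
               .groupBy("order_date")
               .agg(F.sum("amount").alias("total_amount"), F.count("*").alias("orders")))

    daily.write.mode("overwrite").parquet(curated_path)
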

For an MNC currently providing remote working

Agency job
Location: Bengaluru (Bangalore), Chennai, Hyderabad, Pune, Jaipur, Chandigarh
Experience: 7 - 10 yrs
Salary: ₹1L - ₹24L / yr
Skills: Spark, Hadoop, Big Data, Data engineering, PySpark, +1 more

Role: Data Engineer

Experience: 7 to 10 Years

Skills Required:

Languages: Python

- AWS: Glue, Lambda, Athena, Lake Formation, ECS, IAM, SQS, SNS, KMS

- Spark/PySpark (experience with AWS PaaS, not scoped only to EMR or on-prem distributions such as Cloudera); a short Glue-based sketch follows the secondary skills below

 

Secondary skills

------------------------

- Airflow (Good to have understanding)

- PyTest or UnitTest (Any testing Framework)

- CI/CD : Drone or CircleCI or TravisCI or any other tool (Understanding of how it works)

- Understanding of configuration formats (YAML, JSON, Starlark, etc.)

- Terraform (IaC)

- Kubernetes (Understanding of containers and their deployment)
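
As mentioned above, the primary skills centre on AWS Glue with PySpark. A hedged sketch of a Glue job skeleton, assuming the Glue runtime libraries and a hypothetical catalog database, table and bucket:

    import sys
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    spark = glue_context.spark_session
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Hypothetical Glue Data Catalog database/table, e.g. registered by a crawler.
    dyf = glue_context.create_dynamic_frame.from_catalog(database="sales_db", table_name="raw_orders")

    # Transform with plain PySpark, then write curated Parquet back to S3.
    df = dyf.toDF().filter("order_status = 'COMPLETE'")
    df.write.mode("overwrite").parquet("s3://example-bucket/curated/orders/")

    job.commit()
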

 

Location: Pune, Chennai
Experience: 5 - 9 yrs
Salary: ₹15L - ₹20L / yr
Skills: Scala, PySpark, Spark, SQL Azure, Hadoop, +4 more

  • 5+ years of experience in a Data Engineering role in a cloud environment
  • Must have good experience in Scala/PySpark (preferably in a Databricks environment)
  • Extensive experience with Transact-SQL.
  • Experience in Databricks/Spark.
  • Strong experience in data warehouse projects
  • Expertise in database development projects with ETL processes.
  • Manage and maintain data engineering pipelines
  • Develop batch processing, streaming and integration solutions
  • Experienced in building and operationalizing large-scale enterprise data solutions and applications
  • Using one or more Azure data and analytics services in combination with custom solutions
  • Azure Data Lake, Azure SQL DW (Synapse), and SQL Database products or equivalent products from other cloud service providers
  • In-depth understanding of data management (e.g. permissions, security, and monitoring).
  • Cloud repositories (e.g. GitHub, Git)
  • Experience in an agile environment (Azure DevOps preferred).

Good to have

  • Manage source data access security
  • Automate Azure Data Factory pipelines
  • Continuous Integration/Continuous deployment (CICD) pipelines, Source Repositories
  • Experience in implementing and maintaining CICD pipelines
  • Power BI understanding, Delta Lakehouse architecture
  • Knowledge of software development best practices.
  • Excellent analytical and organization skills.
  • Effective working in a team as well as working independently.
  • Strong written and verbal communication skills.
  • Expertise in database development projects and ETL processes.
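
The role above asks for both batch and streaming solutions. A small Structured Streaming sketch using Spark's built-in rate source as a stand-in for a real feed; the windowed count written to the console is purely illustrative:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("streaming-demo").getOrCreate()

    # The built-in "rate" source emits timestamped rows; it stands in for an Event Hub/Kafka feed.
    events = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

    counts = (events.withWatermark("timestamp", "1 minute")
                    .groupBy(F.window("timestamp", "30 seconds"))
                    .count())

    query = (counts.writeStream.outputMode("update")
                   .format("console")
                   .option("truncate", False)
                   .start())
    query.awaitTermination(60)  # run for about a minute in this demo
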

Location: Bengaluru (Bangalore)
Experience: 3 - 7.5 yrs
Salary: ₹10L - ₹25L / yr
Skills: Machine Learning (ML), Data Science, Natural Language Processing (NLP), Spark, Software deployment, +1 more

Job ID: ZS0701

Hi,

We are hiring for Data Scientist for Bangalore.

Req Skills:

  • NLP 
  • ML programming
  • Spark
  • Model Deployment
  • Experience processing unstructured data and building NLP models
  • Experience with big data tools such as PySpark
  • Pipeline orchestration using Airflow and model deployment experience is preferred
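
For the Spark-plus-NLP combination requested above, a minimal sketch of a Spark ML text-classification pipeline; the toy data, column names and parameters are illustrative only:

    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import Tokenizer, HashingTF, IDF
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("nlp-pipeline-demo").getOrCreate()

    train = spark.createDataFrame(
        [("great product, works well", 1.0), ("terrible support, broke fast", 0.0),
         ("love it", 1.0), ("waste of money", 0.0)],
        ["text", "label"],
    )

    # Tokenize, hash term frequencies, weight with IDF, then fit a simple classifier.
    pipeline = Pipeline(stages=[
        Tokenizer(inputCol="text", outputCol="words"),
        HashingTF(inputCol="words", outputCol="tf", numFeatures=1 << 10),
        IDF(inputCol="tf", outputCol="features"),
        LogisticRegression(maxIter=20),
    ])

    model = pipeline.fit(train)
    model.transform(spark.createDataFrame([("broke after a week",)], ["text"])) \
         .select("text", "prediction").show()
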

at bipp

Posted by Vish Josh
Location: Remote only
Experience: 2 - 8 yrs
Salary: ₹5L - ₹14L / yr
Skills: SQL, Apache Spark, Python, PySpark, Hadoop, +2 more

Do NOT apply if you:
  • Are not serious about joining us - the interview process shouldn't be a waste of time.
  • Want to be a Power BI, Qlik, or Tableau-only developer.
  • Are a machine learning aspirant.
  • Are a data scientist.
  • Want to write only Python scripts.
  • Want to do AI.
  • Want to do 'BIG' data.
  • Want to do HADOOP.
  • Are a fresh graduate.
Apply if you :
  • Write SQL and Python for complicated analytical queries.
  • Understand existing business problems of the client and map their needs to the schema that they have.
  • Can neatly disassemble the problem into components and solve the needs by using SQL.
  • Have worked on existing BI products.
Mention 'awesome sql' when applying or in the screening question so that we know you have read the job post. 

Your role:
  • Develop solutions with our exciting new BI product for our clients.
  • You should be very experienced and comfortable with writing SQL against very complicated schema to help answer business questions.
  • Have an analytical thought process.

Posted by Akanksha kondagurla
Location: Remote only
Experience: 5 - 10 yrs
Salary: ₹15L - ₹18L / yr
Skills: PySpark, Spark, Hadoop, Big Data, Data engineering, +3 more

  • Key Responsibilities:
    A. Should be conversant with Apache Spark architecture, RDDs, the various transformations and actions, and Spark configuration and tuning techniques (a short sketch follows this section)
    B. Knowledge of Hadoop architecture, execution engines, frameworks, applications and tools
    C. PySpark using the Spark MLlib library
    D. Exposure to data warehousing concepts and methods

  • Technical Experience:
    A. Should have excellent development experience with Python, producing data applications
    B. Should have 8 years of experience using PySpark with Spark RDDs, Spark SQL and DataFrames
    C. Should have experience in AWS SageMaker and AWS Glue
    D. Should have experience in data wrangling and data analysis with pandas and NumPy

  • Professional Attributes:
    A. Should have good communication and analytical skills
    B. Team player

  • Educational Qualification:
    Graduate
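
As referenced above, the responsibilities stress Spark RDD transformations and actions alongside Spark SQL/DataFrames. A toy sketch contrasting the two APIs on the same computation:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("rdd-vs-df").getOrCreate()
    sc = spark.sparkContext

    # RDD API: transformations (filter, map) are lazy; the action (reduce) triggers execution.
    nums = sc.parallelize(range(1, 101))
    sum_even_squares = nums.filter(lambda x: x % 2 == 0).map(lambda x: x * x).reduce(lambda a, b: a + b)

    # Equivalent with the DataFrame API, which the Catalyst optimizer can plan.
    df = spark.range(1, 101)
    result = df.filter(F.col("id") % 2 == 0).select(F.sum(F.col("id") * F.col("id")).alias("total"))
    result.show()
    print(sum_even_squares)
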

Posted by Komal Samudrala
Location: Bengaluru (Bangalore)
Experience: 3 - 6 yrs
Salary: ₹7L - ₹30L / yr
Skills: Machine Learning (ML), Natural Language Processing (NLP), Python, Data Science, PySpark

Job ID: ZS0705

Data Scientist
Job description:
Exp: 3-6 Yrs
Location: Bangalore
Notice: Immediate to 15 days

Responsibilities:

  • Develop advanced algorithms that solve problems of large dimensionality in a computationally efficient and statistically effective manner;
  • Execute statistical and data mining techniques (e.g. hypothesis testing, machine learning and retrieval processes) on large data sets to identify trends, figures and other relevant information;
  • Evaluate emerging datasets and technologies that may contribute to our analytical platform;
  • Participate in development of select assets/accelerators that create scale;
  • Contribute to thought leadership through research and publication support;
  • Guide and mentor Associates on teams.

Qualifications:

 
  • 3-6 years of relevant post-collegiate work experience;
  • Knowledge of big data/advanced analytics concepts and algorithms (e.g. text mining, social listening, recommender systems, predictive modeling, etc.);
  • Should have experience in NLP and PySpark;
  • Exposure to tools/platforms (e.g. the Hadoop ecosystem and database systems);
  • Agile project planning and project management skills;
  • Relevant domain knowledge preferred; (healthcare/transportation/hi-tech/insurance);
  • Excellent oral and written communication skills;
  • Strong attention to detail, with a research-focused mindset;
  • Excellent critical thinking and problem solving skills;
  • High motivation, good work ethic and maturity.

Posted by phani kalyan
Location: Pune
Experience: 9 - 14 yrs
Salary: ₹20L - ₹40L / yr
Skills: Spark, Hadoop, Big Data, Data engineering, PySpark, +3 more

Job Id: SG0601

Hi,

Enterprise Minds is looking for a Data Architect for the Pune location.

Req Skills:
Python, PySpark, Hadoop, Java, Scala

Location: Bengaluru (Bangalore)
Experience: 3 - 6 yrs
Salary: Best in industry
Skills: Python, PySpark, Data Science

Job ID: ZS070

Hi,

Enterprise Minds is looking for a Data Scientist.

Strong in Python and PySpark.

Immediate joiners preferred.

Location: Pune
Experience: 6 - 8 yrs
Salary: ₹15L - ₹22L / yr
Skills: Spark, Hadoop, Big Data, Data engineering, PySpark, +2 more

Greetings!!

The Energy Exemplar (EE) data team is looking for an experienced Python Developer (Data Engineer) to join our Pune office. As a dedicated Data Engineer on our Research team, you will apply data engineering expertise, work very closely with the core data team to identify different data sources for specific energy markets and create an automated data pipeline. The pipeline will then incrementally pull the data from its sources and maintain a dataset, which in turn provides tremendous value to hundreds of EE customers.

 

At EE, you’ll have access to vast amounts of energy-related data from our sources. Our data pipelines are curated and supported by engineering teams. We also offer many company-sponsored classes and conferences that focus on data engineering and data platforms. There’s a great growth opportunity for data engineering at EE.

Responsibilities

  •  Develop, test and maintain architectures, such as databases and large-scale processing systems using high-performance data pipelines.
  •  Recommend and implement ways to improve data reliability, efficiency, and quality.
  •  Identify performant features and make them universally accessible to our teams across EE.
  •  Work together with data analysts and data scientists to wrangle the data and provide quality datasets and insights to business-critical decisions
  • Take end-to-end responsibility for the development, quality, testing, and production readiness of the services you build.
  • Define and evangelize Data Engineering best standards and practices to ensure engineering excellence at every stage of a development cycle.
  • Act as a resident expert for data engineering, feature engineering, exploratory data analysis.
  • Experience with Agile methodologies; acting as Scrum Master would be an added plus.

Qualifications

  • 6+ years of professional experience in developing data pipelines for large-scale, complex datasets from varieties of data sources.
  • Data Engineering expertise with strong experience working with Python, Beautiful Soup, Selenium, Regular Expression, Web Scraping.
  • Best practices with Python Development, Doc String, Type Hints, Unit Testing, etc.
  • Experience working with cloud-based data technologies such as Azure Data Lake, Azure Data Factory and Azure Databricks is desirable but optional.
  • Moderate coding skills. SQL or similar required. C# or other languages strongly preferred.
  • Outstanding communication and collaboration skills. You can learn from and teach others.
  • Strong drive for results. You have a proven record of shepherding experiments to create successful shipping products/services.
  • A Bachelor's or Master's degree in Computer Science or Engineering with coursework in Python, Big Data or Data Engineering is highly desirable.
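
Since the qualifications above combine web scraping (Beautiful Soup, regular expressions) with PySpark pipelines, here is a hedged sketch of scraping a hypothetical public price table and landing it as Parquet; the URL, table layout and paths are assumptions for illustration:

    import re
    import requests
    from bs4 import BeautifulSoup
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("scrape-demo").getOrCreate()

    # Hypothetical public page listing prices; replace with a real market data source.
    html = requests.get("https://example.com/market/prices", timeout=30).text
    soup = BeautifulSoup(html, "html.parser")

    rows = []
    for tr in soup.select("table tr")[1:]:            # skip the header row
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) >= 2 and re.fullmatch(r"[\d.]+", cells[1]):
            rows.append((cells[0], float(cells[1])))  # (timestamp string, price)

    df = spark.createDataFrame(rows, schema="ts string, price double")
    df.write.mode("append").parquet("/data/curated/prices/")   # incremental load target
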

Posted by Hrushikesh Mande
Location: Remote only
Experience: 1 - 4 yrs
Salary: ₹8L - ₹15L / yr
Skills: Data Warehouse (DWH), Informatica, ETL, Big Data, PySpark, +2 more

About Climate Connect Digital


Our team is inspired to change the world by making energy greener and more affordable. Established in 2011 in London, UK, and now headquartered in Gurgaon, India, we have grown from unassuming beginnings into a leading energy-AI software player at the vanguard of accelerating the global energy transition.


Today we are a remote-first organization, building digital tools for modern enterprises to reduce their carbon footprint and help the industry get to carbon zero.



About the Role - Data Engineer


As we start into our first strong growth phase, we are looking for a Data Engineer to build the data infrastructure to support business and product growth.

You are someone who can see projects through from beginning to end, coach others, and self-manage. We’re looking for an eager individual who can guide our data stack using AWS services with technical knowledge, communication skills, and real-world experience.


The data flowing through our platform directly contributes to decision-making by algorithms & all levels of leadership alike. If you’re passionate about building tools that enhance productivity, improve green energy, reduce waste, and improve work-life harmony for a large and rapidly growing finance user base, come join us!


Job Responsibilities

  • Iterate, build, and implement our data model, data warehousing, and data integration architecture using AWS & GCP services
  • Build solutions that ingest data from source and partner systems into our data infrastructure, where the data is transformed, intelligently curated and made available for consumption by downstream operational and analytical processes
  • Integrate data from source systems using common ETL tools or programming languages (e.g. Ruby, Python, Scala, AWS Data Pipeline, etc.)
  • Develop tailor-made strategies, concepts and solutions for the efficient handling of our growing amounts of data
  • Work iteratively with our data scientist to build up fact tables (e.g. container ship movements), dimension tables (e.g. weather data), ETL processes, and build the data catalog
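
As an illustration of the fact/dimension work described in the last responsibility, a minimal PySpark sketch that builds a weather dimension and joins it onto a ship-movement fact table; all paths and column names are hypothetical:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("star-schema-demo").getOrCreate()

    # Hypothetical staged inputs; paths and columns are illustrative only.
    movements = spark.read.parquet("s3://example-lake/staging/ship_movements/")
    weather   = spark.read.parquet("s3://example-lake/staging/weather/")

    # Dimension: one row per (port, date) with a surrogate key.
    dim_weather = (weather.select("port_id", "obs_date", "avg_wind_speed", "avg_wave_height")
                          .dropDuplicates(["port_id", "obs_date"])
                          .withColumn("weather_key", F.monotonically_increasing_id()))

    # Fact: ship movements enriched with the weather dimension key.
    fact_movements = (movements.join(
                          dim_weather,
                          (movements.port_id == dim_weather.port_id)
                          & (movements.arrival_date == dim_weather.obs_date),
                          "left")
                      .select(movements["*"], dim_weather["weather_key"]))

    dim_weather.write.mode("overwrite").parquet("s3://example-lake/warehouse/dim_weather/")
    fact_movements.write.mode("append").parquet("s3://example-lake/warehouse/fact_ship_movements/")
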

Job Requirements


  • Experience designing, building and maintaining data architecture and warehousing using AWS services
  • Authoritative in ETL optimization, designing, coding, and tuning big data processes using Apache Spark, R, Python, C# and/or similar technologies
  • Experience managing AWS resources using Terraform
  • Experience in Data engineering and infrastructure work for analytical and machine learning processes
  • Experience with ETL tooling, migrating ETL code from one technology to another will be a benefit
  • Experience with data visualisation / dashboarding tools as part of QA/QC of data processes
  • Independent self-starter who thrives in a fast-paced environment

What’s in it for you


We offer competitive salaries based on prevailing market rates. In addition to your introductory package, you can expect to receive the following benefits:


  • Flexible working hours and leave policy
  • Learning and development opportunities
  • Medical insurance/Term insurance, Gratuity benefits over and above the salaries
  • Access to industry and domain thought leaders.

At Climate Connect, you get a rare opportunity to join an established company at the early stages of a significant and well-backed global growth push.


We are building a remote-first organisation ingrained in the team ethos. We understand its importance for the success of any next-generation technology company. The team includes passionate and self-driven people with unconventional backgrounds, and we’re seeking a similar spirit with the right potential.

 

What it’s ​like to work with us

 

When you join us, you become part of a strong network and an accomplished legacy from leading technology and business schools worldwide, such as the Indian Institute of Technology, Oxford University, University of Cambridge, University College London, and many more.

 

We don’t believe in constrained traditional hierarchies and instead work in flexible teams with the freedom to achieve successful business outcomes. We want more people who can thrive in a fast-paced, collaborative environment. Our comprehensive support system comprises a global network of advisors and experts, providing unparalleled opportunities for learning and growth.

Posted by Raunak Swarnkar
Location: Bengaluru (Bangalore)
Experience: 0 - 2 yrs
Salary: ₹10L - ₹15L / yr
Skills: Python, PySpark, SQL, pandas, Cloud Computing, +2 more

BRIEF DESCRIPTION:

At least 1 year of Python, Spark, SQL and data engineering experience

Primary Skillset: PySpark, Scala/Python/Spark, Azure Synapse, S3, Redshift/Snowflake

Relevant Experience: Legacy ETL job Migration to AWS Glue / Python & Spark combination

 

ROLE SCOPE:

Reverse engineer the existing/legacy ETL jobs

Create the workflow diagrams and review the logic diagrams with Tech Leads

Write equivalent logic in Python & Spark

Unit test the Glue jobs and certify the data loads before passing to system testing

Follow the best practices, enable appropriate audit & control mechanism

Analytically skilled: identify root causes quickly and debug issues efficiently

Take ownership of the deliverables and support the deployments

 

REQUIREMENTS:

Create data pipelines for data integration into cloud stacks, e.g. Azure Synapse

Code data processing jobs in Azure Synapse Analytics, Python, and Spark

Experience in dealing with structured, semi-structured, and unstructured data in batch and real-time environments.

Should be able to process .json, .parquet and .avro files
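
A short sketch of handling the three file formats mentioned above in one job; note that the Avro reader needs the external spark-avro package (the version should match your Spark build), and all paths are placeholders:

    from pyspark.sql import SparkSession

    # spark-avro is an external package; the coordinates below are an assumed example version.
    spark = (SparkSession.builder.appName("multi-format-demo")
             .config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.3.0")
             .getOrCreate())

    # Hypothetical landing-zone paths; JSON/Parquet readers are built in.
    orders_json    = spark.read.json("/landing/orders/*.json")
    orders_parquet = spark.read.parquet("/landing/orders_history/")
    orders_avro    = spark.read.format("avro").load("/landing/orders_stream/")

    # Align schemas by column name before combining the three sources.
    combined = orders_json.unionByName(orders_parquet, allowMissingColumns=True) \
                          .unionByName(orders_avro, allowMissingColumns=True)
    combined.printSchema()
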

 

PREFERRED BACKGROUND:

Tier1/2 candidates from IIT/NIT/IIITs

However, relevant experience and a learning attitude take precedence

Top Management Consulting Company

Agency job
Location: Gurugram, Bengaluru (Bangalore)
Experience: 2 - 9 yrs
Salary: ₹10L - ₹38L / yr
Skills: Data Science, Machine Learning (ML), Natural Language Processing (NLP), Computer Vision, PySpark, +1 more

Greetings!!

We are looking for a Machine Learning engineer for one of our premium clients.
Experience: 2-9 years
Location: Gurgaon/Bangalore
Tech Stack:

Python, PySpark, the Python Scientific Stack; MLFlow, Grafana, Prometheus for machine learning pipeline management and monitoring; SQL, Airflow, Databricks, our own open-source data pipelining framework called Kedro, Dask/RAPIDS; Django, GraphQL and ReactJS for horizontal product development; container technologies such as Docker and Kubernetes, CircleCI/Jenkins for CI/CD, cloud solutions such as AWS, GCP, and Azure as well as Terraform and Cloudformation for deployment
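
Given the MLflow-plus-PySpark combination in the stack above, a hedged sketch of logging a toy Spark ML run to MLflow; the dataset, parameters and metric choice are illustrative only:

    import mlflow
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import LinearRegression

    spark = SparkSession.builder.appName("mlflow-demo").getOrCreate()

    # Tiny illustrative dataset: predict y from x.
    df = spark.createDataFrame([(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1)], ["x", "y"])
    train = VectorAssembler(inputCols=["x"], outputCol="features").transform(df)

    with mlflow.start_run(run_name="toy-linear-regression"):
        lr = LinearRegression(featuresCol="features", labelCol="y", regParam=0.0)
        model = lr.fit(train)
        mlflow.log_param("regParam", 0.0)
        mlflow.log_metric("rmse", model.summary.rootMeanSquaredError)
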

Posted by Puja Kumari
Location: Remote only
Experience: 2 - 6 yrs
Salary: ₹6L - ₹20L / yr
Skills: Apache Hive, Spark, Scala, PySpark, Data engineering, +4 more

We are looking for big data engineers to join our transformational consulting team serving one of our top US clients in the financial sector. You'd get an opportunity to develop big data pipelines and convert business requirements into production-grade services and products. With less emphasis on enforcing how a particular task should be done, we believe in giving people the opportunity to think outside the box and come up with their own innovative solutions to problems.
You will primarily be developing, managing and executing multiple prospect campaigns as part of the Prospect Marketing Journey to ensure the best conversion and retention rates. Below are the roles, responsibilities and skillsets we are looking for; if these resonate with you, please get in touch with us by applying to this role.
Roles and Responsibilities:
• You'd be responsible for development and maintenance of applications with technologies involving Enterprise Java and Distributed technologies.
• You'd collaborate with developers, product manager, business analysts and business users in conceptualizing, estimating and developing new software applications and enhancements.
• You'd Assist in the definition, development, and documentation of software’s objectives, business requirements, deliverables, and specifications in collaboration with multiple cross-functional teams.
• Assist in the design and implementation process for new products, research and create POC for possible solutions.
Skillset:
• Bachelor's or Master's degree in a technology-related field preferred.
• Overall experience of 2-3 years with Big Data technologies.
• Hands-on experience with Spark (Java/Scala)
• Hands-on experience with Hive, Shell Scripting
• Knowledge of HBase, Elasticsearch
• Development experience In Java/ Python is preferred
• Familiar with profiling, code coverage, logging, common IDEs and other development tools.
• Demonstrated verbal and written communication skills, and ability to interface with Business, Analytics and IT organizations.
• Ability to work effectively in a short-cycle, team-oriented environment, managing multiple priorities and tasks.
• Ability to identify non-obvious solutions to complex problems

Posted by Shiva V
Location: Remote, Hyderabad
Experience: 4 - 6 yrs
Salary: ₹15L - ₹20L / yr
Skills: Python, PySpark, Spark, Scala, Microsoft Azure Data Factory

Should have good experience with Python or Scala, and with PySpark/Spark
• Experience with advanced SQL
• Experience with Azure Data Factory, Databricks
• Experience with Azure IoT, Cosmos DB, Blob Storage
• API management, FHIR API development
• Proficient with Git and CI/CD best practices
• Experience working with Snowflake is a plus

Top 3 Fintech Startup

Agency job via Jobdost by Sathish Kumar
Location: Bengaluru (Bangalore)
Experience: 6 - 9 yrs
Salary: ₹16L - ₹24L / yr
Skills: SQL, Amazon Web Services (AWS), Spark, PySpark, Apache Hive

We are looking for an exceptionally talented Lead Data Engineer who has exposure to implementing AWS services to build data pipelines, API integration and data warehouse design. A candidate with both hands-on and leadership capabilities will be ideal for this position.

 

Qualification: At least a bachelor's degree in Science, Engineering or Applied Mathematics; a master's degree is preferred.

 

Job Responsibilities:

• Total 6+ years of experience as a Data Engineer and 2+ years of experience in managing a team

• Have minimum 3 years of AWS Cloud experience.

• Well versed in languages such as Python, PySpark, SQL, NodeJS etc

• Has extensive experience in the Spark ecosystem and has worked on both real-time and batch processing

• Have experience in AWS Glue, EMR, DMS, Lambda, S3, DynamoDB, Step functions, Airflow, RDS, Aurora etc.

• Experience with modern Database systems such as Redshift, Presto, Hive etc.

• Worked on building data lakes in the past on S3 or Apache Hudi

• Solid understanding of Data Warehousing Concepts

• Good to have experience on tools such as Kafka or Kinesis

• Good to have AWS Developer Associate or Solutions Architect Associate Certification

• Have experience in managing a team

Location: Remote only
Experience: 9 - 20 yrs
Salary: Best in industry
Skills: OLTP, data ops, cloud data, Amazon Web Services (AWS), Google Cloud Platform (GCP), +6 more

THE ROLE: Sr. Cloud Data Infrastructure Engineer

As a Sr. Cloud Data Infrastructure Engineer with Intuitive, you will be responsible for building or converting legacy data pipelines from legacy environments to modern cloud environments to help the analytics and data science initiatives across our enterprise customers. You will work closely with SMEs in Data Engineering and Cloud Engineering to create solutions and extend Intuitive's DataOps Engineering projects and initiatives. The Sr. Cloud Data Infrastructure Engineer will play a central, critical role in establishing DataOps/DataX data logistics and management, building data pipelines, enforcing best practices, and owning the build-out of complex and performant data lake environments, working closely with Cloud Infrastructure Architects and DevSecOps automation teams. The Sr. Cloud Data Infrastructure Engineer is the main point of contact for all things related to data lake formation and data at scale. In this role, we expect our DataOps leaders to be obsessed with data and with providing insights that help our end customers.

ROLES & RESPONSIBILITIES:

  • Design, develop, implement, and tune large-scale distributed systems and pipelines that process large volume of data; focusing on scalability, low-latency, and fault-tolerance in every system built
  • Developing scalable and re-usable frameworks for ingesting large data from multiple sources.
  • Modern Data Orchestration engineering - query tuning, performance tuning, troubleshooting, and debugging big data solutions.
  • Provides technical leadership, fosters a team environment, and provides mentorship and feedback to technical resources.
  • Deep understanding of ETL/ELT design methodologies, patterns, personas, strategy, and tactics for complex data transformations.
  • Data processing/transformation using various technologies such as spark and cloud Services.
  • Understand current data engineering pipelines using legacy SAS tools and convert to modern pipelines.

 

Data Infrastructure Engineer Strategy Objectives: End to End Strategy

Define how data is acquired, stored, processed, distributed, and consumed.
Collaboration and Shared responsibility across disciplines as partners in delivery for progressing our maturity model in the End-to-End Data practice.

  • Understanding and experience with modern cloud data orchestration and engineering for one or more of the following cloud providers - AWS, Azure, GCP.
  • Leading multiple engagements to design and develop data logistic patterns to support data solutions using data modeling techniques (such as file based, normalized or denormalized, star schemas, schema on read, Vault data model, graphs) for mixed workloads, such as OLTP, OLAP, streaming using any formats (structured, semi-structured, unstructured).
  • Applying leadership and proven experience with architecting and designing data implementation patterns and engineered solutions using native cloud capabilities that span data ingestion & integration (ingress and egress), data storage (raw & cleansed), data prep & processing, master & reference data management, data virtualization & semantic layer, data consumption & visualization.
  • Implementing cloud data solutions in the context of business applications, cost optimization, client's strategic needs and future growth goals as it relates to becoming a 'data driven' organization.
  • Applying and creating leading practices that support high availability, scalable, process and storage intensive solutions architectures to data integration/migration, analytics and insights, AI, and ML requirements.
  • Applying leadership and review to create high quality detailed documentation related to cloud data Engineering.
  • Understanding of one or more of the following is a big plus - CI/CD, cloud DevOps, containers (Kubernetes/Docker, etc.), Python/PySpark/JavaScript.
  • Implementing cloud data orchestration and data integration patterns (AWS Glue, Azure Data Factory, Event Hub, Databricks, etc.), storage and processing (Redshift, Azure Synapse, BigQuery, Snowflake)
  • Possessing a certification(s) in one of the following is a big plus - AWS/Azure/GCP data engineering, and Migration.

 

 

KEY REQUIREMENTS:

  • 10+ years' experience as a data engineer.
  • Must have 5+ years implementing data engineering solutions with multiple cloud providers and toolsets.
  • This is a hands-on role building data pipelines using cloud-native and partner solutions, with hands-on technical experience working with data at scale.
  • Must have deep expertise in one of the programming languages for data processes (Python, Scala). Experience with Python, PySpark, Hadoop, Hive and/or Spark to write data pipelines and data processing layers.
  • Must have worked with multiple database technologies and patterns. Good SQL experience for writing complex SQL transformation.
  • Performance tuning of Spark SQL running on S3/data lake/Delta Lake storage, and strong knowledge of Databricks and cluster configurations.
  • Nice to have Databricks administration including security and infrastructure features of Databricks.
  • Experience with Development Tools for CI/CD, Unit and Integration testing, Automation and Orchestration

Location: Remote only
Experience: 4 - 7 yrs
Salary: ₹10L - ₹30L / yr
Skills: ETL, Informatica, Data Warehouse (DWH), Big Data, Scala, +4 more

Job Description:

We are looking for a Big Data Engineer who has worked across the entire ETL stack - someone who has ingested data in both batch and live-stream formats, transformed large volumes of data daily, built data warehouses to store the transformed data, and integrated different visualization dashboards and applications with the data stores. The primary focus will be on choosing optimal solutions for these purposes, then maintaining, implementing, and monitoring them.

Responsibilities:

  • Develop, test, and implement data solutions based on functional / non-functional business requirements.
  • You would be required to code in Scala and PySpark daily on Cloud as well as on-prem infrastructure
  • Build data models to store the data in the most optimized manner
  • Identify, design, and implement process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
  • Implementing the ETL process and optimal data pipeline architecture
  • Monitoring performance and advising any necessary infrastructure changes.
  • Create data tools for analytics and data scientist team members that assist them in building and optimizing our product into an innovative industry leader.
  • Work with data and analytics experts to strive for greater functionality in our data systems.
  • Proactively identify potential production issues and recommend and implement solutions
  • Must be able to write quality code and build secure, highly available systems.
  • Create design documents that describe the functionality, capacity, architecture, and process.
  • Review peer-codes and pipelines before deploying to Production for optimization issues and code standards

Skill Sets:

  • Good understanding of optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and ‘big data’ technologies.
  • Proficient understanding of distributed computing principles
  • Experience in working with batch processing/ real-time systems using various open-source technologies like NoSQL, Spark, Pig, Hive, Apache Airflow.
  • Implemented complex projects dealing with the considerable data size (PB).
  • Optimization techniques (performance, scalability, monitoring, etc.)
  • Experience with integration of data from multiple data sources
  • Experience with NoSQL databases, such as HBase, Cassandra, MongoDB, etc.,
  • Knowledge of various ETL techniques and frameworks, such as Flume
  • Experience with various messaging systems, such as Kafka or RabbitMQ
  • Creation of DAGs for data engineering
  • Expert at Python/Scala programming, especially for data engineering/ETL purposes (see the orchestration sketch below)
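
For the Airflow/DAG items in the lists above, a minimal sketch of a daily ETL DAG using Airflow 2.x operator imports; the job script path and task commands are placeholders:

    from datetime import datetime, timedelta
    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator

    def extract_orders(**context):
        # Placeholder extract step: a real pipeline would pull from an API or database here.
        print("extracting orders for", context["ds"])

    with DAG(
        dag_id="daily_orders_etl",
        start_date=datetime(2022, 1, 1),
        schedule_interval="@daily",
        catchup=False,
        default_args={"retries": 1, "retry_delay": timedelta(minutes=5)},
    ) as dag:
        extract = PythonOperator(task_id="extract_orders", python_callable=extract_orders)
        transform = BashOperator(
            task_id="transform_orders",
            bash_command="spark-submit /opt/jobs/transform_orders.py --date {{ ds }}",
        )
        load = BashOperator(task_id="load_warehouse", bash_command="echo load step placeholder")

        extract >> transform >> load
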

 

 

 

Location: Bengaluru (Bangalore)
Experience: 2 - 10 yrs
Salary: ₹5L - ₹15L / yr
Skills: PySpark, Data engineering, Big Data, Hadoop, Spark, +1 more

Job Description:

Must Have Skills:
• Good experience in PySpark - including DataFrame core functions and Spark SQL (a short example follows this list)
• Good experience in SQL DBs - able to write queries of fair complexity.
• Should have excellent experience in Big Data programming for data transformation and aggregations
• Good at ELT architecture; business rules processing and data extraction from a Data Lake into data streams for business consumption.
• Good customer communication.
• Good analytical skills
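
A small example of the "DataFrame core functions and Spark SQL" combination listed above - registering a DataFrame as a temp view and running a windowed query of fair complexity on toy data:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("spark-sql-demo").getOrCreate()

    orders = spark.createDataFrame(
        [("c1", "2023-01-01", 120.0), ("c1", "2023-01-05", 80.0), ("c2", "2023-01-02", 200.0)],
        ["customer_id", "order_date", "amount"],
    )
    orders.createOrReplaceTempView("orders")

    # Per-customer totals plus a running total via a window function.
    result = spark.sql("""
        SELECT customer_id,
               order_date,
               amount,
               SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS running_total
        FROM orders
        ORDER BY customer_id, order_date
    """)
    result.show()
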

MNC

Agency job via Eurka IT SOL by Srikanth a
Location: Chennai
Experience: 5 - 11 yrs
Salary: ₹10L - ₹15L / yr
Skills: PySpark, SQL, Test Automation (QA), Big Data, Data Science

Lead QA: more than 5 years' experience; has led a team of more than 5 people on a big data platform; should have experience with a test automation framework and with test process documentation.

Posted by Karunya P
Location: Bengaluru (Bangalore), Hyderabad
Experience: 1 - 9 yrs
Salary: ₹1L - ₹15L / yr
Skills: SQL, Python, Hadoop, HiveQL, Spark, +1 more

Responsibilities:

 

* 3+ years of Data Engineering Experience - Design, develop, deliver and maintain data infrastructures.

SQL specialist - strong knowledge of and seasoned experience with SQL queries

Languages: Python

* Good communicator, shows initiative, works well with stakeholders.

* Experience working closely with data analysts, providing the data they need and guiding them on issues.

* Solid ETL experience with Hadoop/Hive/PySpark/Presto/Spark SQL

* Solid communication and articulation skills

* Able to handle stakeholders independently with minimal intervention from the reporting manager.

* Develop strategies to solve problems in logical yet creative ways.

* Create custom reports and presentations accompanied by strong data visualization and storytelling

 

We would be excited if you have:

 

* Excellent communication and interpersonal skills

* Ability to meet deadlines and manage project delivery

* Excellent report-writing and presentation skills

* Critical thinking and problem-solving capabilities

Top 3 Fintech Startup

Agency job via Jobdost by Sathish Kumar
Location: Bengaluru (Bangalore)
Experience: 6 - 9 yrs
Salary: ₹20L - ₹30L / yr
Skills: Amazon Web Services (AWS), PySpark, SQL, Apache Spark, Python

We are looking for an exceptionally talented Lead Data Engineer who has exposure to implementing AWS services to build data pipelines, API integration and data warehouse design. A candidate with both hands-on and leadership capabilities will be ideal for this position.

 

Qualification: At least a bachelor's degree in Science, Engineering or Applied Mathematics; a master's degree is preferred.

 

Job Responsibilities:

• Total 6+ years of experience as a Data Engineer and 2+ years of experience in managing a team

• Have minimum 3 years of AWS Cloud experience.

• Well versed in languages such as Python, PySpark, SQL, NodeJS etc

• Has extensive experience in the Spark ecosystem and has worked on both real-time and batch processing

• Have experience in AWS Glue, EMR, DMS, Lambda, S3, DynamoDB, Step functions, Airflow, RDS, Aurora etc.

• Experience with modern Database systems such as Redshift, Presto, Hive etc.

• Worked on building data lakes in the past on S3 or Apache Hudi

• Solid understanding of Data Warehousing Concepts

• Good to have experience on tools such as Kafka or Kinesis

• Good to have AWS Developer Associate or Solutions Architect Associate Certification

• Have experience in managing a team

Consulting and Services company

Agency job via Jobdost by Sathish Kumar
Location: Hyderabad, Ahmedabad
Experience: 5 - 10 yrs
Salary: ₹5L - ₹30L / yr
Skills: Amazon Web Services (AWS), Apache, Python, PySpark

Data Engineer 

  

Mandatory Requirements  

  • Experience in AWS Glue 
  • Experience in Apache Parquet  
  • Proficient in AWS S3 and data lake  
  • Knowledge of Snowflake 
  • Understanding of file-based ingestion best practices. 
  • Scripting language - Python & PySpark

 

CORE RESPONSIBILITIES 

  • Create and manage cloud resources in AWS  
  • Data ingestion from different data sources which exposes data using different technologies, such as: RDBMS, REST HTTP API, flat files, Streams, and Time series data based on various proprietary systems. Implement data ingestion and processing with the help of Big Data technologies  
  • Data processing/transformation using various technologies such as Spark and Cloud Services. You will need to understand your part of business logic and implement it using the language supported by the base data platform  
  • Develop automated data quality check to make sure right data enters the platform and verifying the results of the calculations  
  • Develop an infrastructure to collect, transform, combine and publish/distribute customer data. 
  • Define process improvement opportunities to optimize data collection, insights and displays. 
  • Ensure data and results are accessible, scalable, efficient, accurate, complete and flexible  
  • Identify and interpret trends and patterns from complex data sets  
  • Construct a framework utilizing data visualization tools and techniques to present consolidated analytical and actionable results to relevant stakeholders.  
  • Key participant in regular Scrum ceremonies with the agile teams   
  • Proficient at developing queries, writing reports and presenting findings  
  • Mentor junior members and bring best industry practices  
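
A hedged sketch of the ingestion-plus-quality-check responsibilities above: pulling a table over JDBC, applying a basic automated check, and landing partitioned Parquet in an S3 data lake. Connection details, table and bucket names are placeholders:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("rdbms-to-lake").getOrCreate()

    # Hypothetical JDBC source; host, credentials and table are placeholders.
    loans = (spark.read.format("jdbc")
             .option("url", "jdbc:postgresql://db.example.com:5432/finance")
             .option("dbtable", "public.loans")
             .option("user", "etl_user")
             .option("password", "********")
             .load())

    # Basic automated quality check before the data enters the platform.
    null_ids = loans.filter(F.col("loan_id").isNull()).count()
    if null_ids > 0:
        raise ValueError(f"{null_ids} rows with NULL loan_id - aborting load")

    # Land as partitioned Parquet in the S3 data lake.
    (loans.withColumn("ingest_date", F.current_date())
          .write.mode("append")
          .partitionBy("ingest_date")
          .parquet("s3://example-datalake/raw/loans/"))
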

 

QUALIFICATIONS 

  • 5-7+ years’ experience as data engineer in consumer finance or equivalent industry (consumer loans, collections, servicing, optional product, and insurance sales)  
  • Strong background in math, statistics, computer science, data science or related discipline 
  • Advanced knowledge of at least one language: Java, Scala, Python, C#
  • Production experience with: HDFS, YARN, Hive, Spark, Kafka, Oozie / Airflow, Amazon Web Services (AWS), Docker / Kubernetes, Snowflake
  • Proficient with:
    • Data mining/programming tools (e.g. SAS, SQL, R, Python)
    • Database technologies (e.g. PostgreSQL, Redshift, Snowflake, and Greenplum)
    • Data visualization (e.g. Tableau, Looker, MicroStrategy)
  • Comfortable learning about and deploying new technologies and tools.  
  • Organizational skills and the ability to handle multiple projects and priorities simultaneously and meet established deadlines.  
  • Good written and oral communication skills and ability to present results to non-technical audiences  
  • Knowledge of business intelligence and analytical tools, technologies and techniques. 

 

Familiarity and experience in the following is a plus:  

  • AWS certification 
  • Spark Streaming  
  • Kafka Streaming / Kafka Connect  
  • ELK Stack  
  • Cassandra / MongoDB  
  • CI/CD: Jenkins, GitLab, Jira, Confluence other related tools 

Persistent System Ltd

Agency job via Milestone Hr Consultancy by Haina khan
Location: Pune, Bengaluru (Bangalore), Hyderabad
Experience: 4 - 9 yrs
Salary: ₹8L - ₹27L / yr
Skills: Python, PySpark, Amazon Web Services (AWS), Spark, Scala

Greetings..

We have an urgent requirement for a Data Engineer/Sr. Data Engineer at a reputed MNC.

Exp: 4-9yrs

Location: Pune/Bangalore/Hyderabad

Skills: We need candidates with either Python + AWS, PySpark + AWS, or Spark + Scala.

Top startup of India - News App

Agency job via Jobdost by Sathish Kumar
Location: Noida
Experience: 6 - 10 yrs
Salary: ₹35L - ₹65L / yr
Skills: Data Science, Machine Learning (ML), Natural Language Processing (NLP), Computer Vision, TensorFlow, +6 more

This will be an individual contributor role; only candidates from Tier 1/2 institutes and product-based companies can apply.

Requirements-

● B.Tech/Masters in Mathematics, Statistics, Computer Science or another quantitative field
● 2-3+ years of work experience in the ML domain (2-5 years experience)
● Hands-on coding experience in Python
● Experience in machine learning techniques such as Regression, Classification,Predictive modeling, Clustering, Deep Learning stack, NLP.
● Working knowledge of Tensorflow/PyTorch
Optional Add-ons-
● Experience with distributed computing frameworks: Map/Reduce, Hadoop, Spark etc.
● Experience with databases: MongoDB

Location: Pune, Bengaluru (Bangalore), Hyderabad, Nagpur
Experience: 4 - 9 yrs
Salary: ₹4L - ₹15L / yr
Skills: Spark, Hadoop, Big Data, Data engineering, PySpark, +3 more

Greetings..

We have urgent requirements for Big Data Developer profiles at a reputed MNC.

Location: Pune/Bangalore/Hyderabad/Nagpur
Experience: 4-9yrs

Skills: PySpark + AWS, or Spark + Scala + AWS, or Python + AWS

Picture the future

Agency job via Jobdost by Sathish Kumar
Location: Hyderabad
Experience: 4 - 7 yrs
Salary: ₹5L - ₹15L / yr
Skills: PySpark, Data engineering, Big Data, Hadoop, Spark, +7 more

CORE RESPONSIBILITIES

  • Create and manage cloud resources in AWS 
  • Data ingestion from different data sources which exposes data using different technologies, such as: RDBMS, REST HTTP API, flat files, Streams, and Time series data based on various proprietary systems. Implement data ingestion and processing with the help of Big Data technologies 
  • Data processing/transformation using various technologies such as Spark and Cloud Services. You will need to understand your part of business logic and implement it using the language supported by the base data platform 
  • Develop automated data quality check to make sure right data enters the platform and verifying the results of the calculations 
  • Develop an infrastructure to collect, transform, combine and publish/distribute customer data.
  • Define process improvement opportunities to optimize data collection, insights and displays.
  • Ensure data and results are accessible, scalable, efficient, accurate, complete and flexible 
  • Identify and interpret trends and patterns from complex data sets 
  • Construct a framework utilizing data visualization tools and techniques to present consolidated analytical and actionable results to relevant stakeholders. 
  • Key participant in regular Scrum ceremonies with the agile teams  
  • Proficient at developing queries, writing reports and presenting findings 
  • Mentor junior members and bring best industry practices 

 

QUALIFICATIONS

  • 5-7+ years’ experience as data engineer in consumer finance or equivalent industry (consumer loans, collections, servicing, optional product, and insurance sales) 
  • Strong background in math, statistics, computer science, data science or related discipline
  • Advanced knowledge of at least one language: Java, Scala, Python, C#
  • Production experience with: HDFS, YARN, Hive, Spark, Kafka, Oozie / Airflow, Amazon Web Services (AWS), Docker / Kubernetes, Snowflake
  • Proficient with:
    • Data mining/programming tools (e.g. SAS, SQL, R, Python)
    • Database technologies (e.g. PostgreSQL, Redshift, Snowflake, and Greenplum)
    • Data visualization (e.g. Tableau, Looker, MicroStrategy)
  • Comfortable learning about and deploying new technologies and tools. 
  • Organizational skills and the ability to handle multiple projects and priorities simultaneously and meet established deadlines. 
  • Good written and oral communication skills and ability to present results to non-technical audiences 
  • Knowledge of business intelligence and analytical tools, technologies and techniques.


Mandatory Requirements 

  • Experience in AWS Glue
  • Experience in Apache Parquet 
  • Proficient in AWS S3 and data lake 
  • Knowledge of Snowflake
  • Understanding of file-based ingestion best practices.
  • Scripting language - Python & PySpark

 

Location: Remote only
Experience: 7 - 13 yrs
Salary: ₹15L - ₹35L / yr
Skills: PySpark, Data engineering, Big Data, Hadoop, Spark, +4 more

Experience Range: 2 Years - 10 Years
Function: Information Technology
Desired Skills
Must Have Skills:
• Good experience in PySpark - including DataFrame core functions and Spark SQL
• Good experience in SQL DBs - able to write queries of fair complexity.
• Should have excellent experience in Big Data programming for data transformation and aggregations
• Good at ELT architecture; business rules processing and data extraction from a Data Lake into data streams for business consumption.
• Good customer communication.
• Good analytical skills
Education
Education Type: Engineering
Degree / Diploma: Bachelor of Engineering, Bachelor of Computer Applications, Any Engineering
Specialization / Subject: Any Specialisation
Job Type: Full Time
Job ID: 000018
Department: Software Development

Location: Bengaluru (Bangalore)
Experience: 7 - 14 yrs
Salary: Best in industry
Skills: Python, PySpark, Spark, Microsoft Windows Azure, Data engineering, +6 more

Whom we want:
 
SkyPoint is looking for ambitious, independent engineers who want to have a big impact at a fast-growing company. You will work on our core data pipeline and the integrations that bring in data from the many sources we support. We are looking for people who can understand the key values that make our product great and implement those values in the many small decisions you make every day as a developer.
 
What you do:
 
As a Principal Data Engineer at SkyPoint:
 
You will be working with Python, PySpark, Spark (Azure Databricks), VS Code, REST APIs, Azure Durable Functions, Cosmos DB, Serverless, and Kubernetes container-based microservices and interacting with various Delta Lakehouse and NoSQL databases.
You will process the data into clean, unified, incremental, automated updates via Azure Durable Functions, Azure Data Factory, Delta Lake, and Spark.
Having Managerial experience in taking ownership of the product and leading the team.
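
One way the "clean, unified, incremental, automated updates" via Delta Lake and Spark mentioned above are commonly implemented is a Delta MERGE (upsert). A minimal sketch assuming a Delta-enabled session (as on Azure Databricks) and hypothetical paths and key column:

    from pyspark.sql import SparkSession
    from delta.tables import DeltaTable

    # On Azure Databricks the session is already Delta-enabled and `spark` exists as a global.
    spark = SparkSession.builder.appName("delta-merge-demo").getOrCreate()

    # Hypothetical paths: a curated Delta table and a batch of incremental customer updates.
    target_path = "abfss://curated@examplelake.dfs.core.windows.net/customers"
    updates = spark.read.json("abfss://raw@examplelake.dfs.core.windows.net/customer_updates/")

    target = DeltaTable.forPath(spark, target_path)

    # Upsert: update matching customer_ids, insert new ones - an incremental, automated refresh.
    (target.alias("t")
           .merge(updates.alias("s"), "t.customer_id = s.customer_id")
           .whenMatchedUpdateAll()
           .whenNotMatchedInsertAll()
           .execute())
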
 
Primary Duties & Responsibilities:
 
·       Making high-level estimation with a hypothesis and critical points.
·       Delivering components aligned with scope, budget, and planning committed with the business owner.
·       Design roadmap and follow progress with identification of critical points and risks.
·       Experience working with languages like Python and Go, and with technologies such as serverless and containers.
·       Strong technical and problem-solving skills; recent hands-on experience in Azure Machine Learning is good to have.
·       Experience in reliable distributed systems, with an emphasis on high-volume data management within the enterprise and/or web-scale products and platforms that operate under strict SLAs.
·       Broad technical knowledge which encompasses Software development and automation.
·       Experience with the use of a wide array of algorithms and data structures.
·       Expertise in working with Azure Functions, Azure Data Lake, Azure Data Factory, Azure Databricks/Spark, Azure DevOps, PySpark, Scikit-learn, TensorFlow, Keras, PyTorch
·       Best practices in design and programming.
·       Entrepreneurial mindset, excellent communication, and technical leadership skills.
·       Create and contribute to an environment that is geared to innovation, high productivity, high quality, and customer service.
 
 
Skills & Experience Require:
 
·       The ideal candidate will be an enthusiastic leader with building experience in a professional environment and overall 6+ years of hands-on technical experience, including at least 2+ years in a leadership role.
·       Bachelor’s/Master’s degree, preferably in Software Engineering or Computer Science and from a reputed institution.
·       Planning and executing strategies for completing projects on time.
·       Researching and developing designs and products.
·       Ensuring products have the support of upper management.
·       Providing clear and concise instructions to the team.
·       Implementing and providing tools for non-regression tests automation.
·       Generating and reviewing documentation for all database changes or refinements.
·       Managing the entire module and having team-leading experience.
·       Making recommendations for software, hardware, and data storage upgrades.
·       Communicating with your team members and staff to effectively understand and interpret data changes or requirements.
·       Most recent work experience MUST include working on Python, Spark(Azure Databricks), Azure Durable Functions, Cosmos DB, Azure Data Factory, Delta Lakehouse, PySpark, NoSQL DB, Serverless, and Kubernetes container-based microservices
·       Excellent verbal and written communication skills.

Top 3 Fintech Startup

Agency job via Jobdost by Sathish Kumar
Location: Bengaluru (Bangalore)
Experience: 4 - 7 yrs
Salary: ₹11L - ₹17L / yr
Skills: Machine Learning (ML), Data Science, Natural Language Processing (NLP), Computer Vision, Python, +6 more

Responsible for leading a team of analysts to build and deploy predictive models that infuse core business functions with deep analytical insights. The Senior Data Scientist will also work closely with the Kinara management team to investigate strategically important business questions.

Lead a team through the entire analytical and machine learning model life cycle:

  • Define the problem statement
  • Build and clean datasets
  • Exploratory data analysis
  • Feature engineering
  • Apply ML algorithms and assess the performance
  • Code for deployment
  • Code testing and troubleshooting
  • Communicate analysis to stakeholders
  • Manage Data Analysts and Data Scientists

Hiring for one of the MNCs for an India location

Agency job via Natalie Consultants by Rahul Kumar
Location: Gurugram, Pune, Bengaluru (Bangalore), Delhi, Noida, Ghaziabad, Faridabad
Experience: 2 - 9 yrs
Salary: ₹8L - ₹20L / yr
Skills: Python, Hadoop, Big Data, Spark, Data engineering, +3 more

Key Responsibilities (Data Developer - Python, Spark):

Exp: 2 to 9 Yrs

Development of data platforms, integration frameworks, processes, and code.

Develop and deliver APIs in Python or Scala for Business Intelligence applications built using a range of web languages

Develop comprehensive automated tests for features via end-to-end integration tests, performance tests, acceptance tests and unit tests.

Elaborate stories in a collaborative agile environment (SCRUM or Kanban)

Familiarity with cloud platforms like GCP, AWS or Azure.

Experience with large data volumes.

Familiarity with writing rest-based services.

Experience with distributed processing and systems

Experience with Hadoop / Spark toolsets

Experience with relational database management systems (RDBMS)

Experience with Data Flow development

Knowledge of Agile and associated development techniques including:

Location: Remote only
Experience: 3 - 6 yrs
Salary: ₹5L - ₹15L / yr
Skills: Python, PySpark, PyTorch, Natural Language Processing (NLP), API, +3 more

JOB SKILLS & QUALIFICATIONS

WHAT YOU'LL DO

  • Design model serving solutions and develop machine learning-based applications, services, and APIs to productionise machine learning models.
  • Set and maintain engineering standards that allow the team to grow and go far.
  • Partner with the Data Scientists (those who actually build, train and evaluate ML models) to provide an end-to-end solution for machine learning-based projects.
  • Foster the technological evolution of services and improve their end-to-end quality attributes.
  • Be committed to Continuous Integration and Continuous Deployment.

 Preferred Skills


  • Familiarity with the engineering aspects of some of the popular machine learning practices, libraries, and platforms (e.g. MLflow, Kubeflow, MLeap, Michelangelo, Feast, Hopsworks, Metaflow, Zipline, Databricks, Spark, MLlib, PyTorch, TensorFlow, and scikit-learn, among others).
  • Comfortable dealing with trade-offs between project delivery and quality, especially those involving latency, throughput, and transactions.
  • Experience with Continuous Integration & Continuous Deployment processes and platforms, software design patterns and APIs.
  • A person who enjoys staying on top of all the best practices and tools of modern software engineering, while being an advocate of code quality and continuous improvement.
  • Someone interested in large-scale systems and passionate about solving complex problems while being open and comfortable with changes in the tech stack the teams use.
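
As a sketch of the model-serving theme above - loading a previously trained Spark ML pipeline and scoring small request batches that an API layer (Flask/FastAPI, a serverless function, etc.) might pass in. The model path and feature columns are assumptions:

    from pyspark.sql import SparkSession
    from pyspark.ml import PipelineModel

    spark = SparkSession.builder.appName("serving-demo").getOrCreate()

    # Hypothetical path to a model trained and saved elsewhere with pipeline.fit(df).save(...).
    model = PipelineModel.load("/models/churn_pipeline/v3")

    def score(records):
        """Score a small batch of dicts and return probabilities and predictions."""
        batch = spark.createDataFrame(records)
        scored = model.transform(batch)
        return [row.asDict() for row in scored.select("probability", "prediction").collect()]

    # Example request payload an API layer might hand to this function.
    print(score([{"tenure_months": 12, "monthly_spend": 49.0}]))
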

A leading global information technology and business process company

Agency job via Jobdost by Mamatha A
Location: Chennai
Experience: 5 - 14 yrs
Salary: ₹13L - ₹21L / yr
Skills: Python, Java, PySpark, Javascript, Hadoop

Python + Data Scientist:
• Hands-on and sound knowledge of Python, PySpark, JavaScript

• Build data-driven models to understand the characteristics of engineering systems

• Train, tune, validate, and monitor predictive models

• Sound knowledge on Statistics

• Experience in developing data processing tasks using PySpark, such as reading, merging, enrichment, and loading of data from external systems to target data destinations (a minimal sketch follows this list)

• Working knowledge of Big Data and/or Hadoop environments

• Experience creating CI/CD Pipelines using Jenkins or like tools

• Practiced in eXtreme Programming (XP) disciplines 
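
As referenced above, a compact sketch of a read/merge/enrich/load task; the source paths, reference table and join key are illustrative assumptions:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("read-merge-enrich-load").getOrCreate()

    # Hypothetical external sources: two daily extracts plus a small reference table.
    ext_a = spark.read.parquet("/external/system_a/transactions/")
    ext_b = spark.read.parquet("/external/system_b/transactions/")
    ref   = spark.read.csv("/reference/merchant_categories.csv", header=True)

    # Merge the two feeds (aligning columns by name), then enrich with the broadcast lookup.
    merged = ext_a.unionByName(ext_b, allowMissingColumns=True)
    enriched = (merged.join(broadcast(ref), on="merchant_id", how="left")
                      .withColumn("load_date", F.current_date()))

    # Load to the target destination (here a partitioned Parquet area).
    enriched.write.mode("append").partitionBy("load_date").parquet("/target/transactions_enriched/")
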

Location: Navi Mumbai
Experience: 3 - 5 yrs
Salary: ₹7L - ₹18L / yr
Skills: PySpark, Data engineering, Big Data, Hadoop, Spark, +6 more

  • Proficiency in shell scripting
  • Proficiency in automation of tasks
  • Proficiency in Pyspark/Python
  • Proficiency in writing and understanding Sqoop
  • Understanding of Cloudera Manager
  • Good understanding of RDBMS
  • Good understanding of Excel

 

A global provider of Business Process Management services

Agency job via Jobdost by Saida Jabbar
Location: Bengaluru (Bangalore), UK
Experience: 5 - 10 yrs
Salary: ₹15L - ₹25L / yr
Skills: Data Visualization, PowerBI, ADF, Business Intelligence (BI), PySpark, +11 more

Power BI Developer

Senior visualization engineer with 5 years' experience in Power BI to develop and deliver solutions that enable delivery of information to audiences in support of key business processes. In addition, hands-on experience with Azure data services like ADF and Databricks is a must.

Ensure code and design quality through execution of test plans and assist in development of standards & guidelines working closely with internal and external design, business, and technical counterparts.

Candidates should have worked in agile development environments.

Desired Competencies:

  • Should have minimum of 3 years project experience using Power BI on Azure stack.
  • Should have good understanding and working knowledge of Data Warehouse and Data Modelling.
  • Good hands-on experience of Power BI
  • Hands-on experience with T-SQL / DAX / MDX / SSIS
  • Data warehousing on SQL Server (preferably 2016)
  • Experience in Azure Data Services - ADF, Databricks & PySpark
  • Manage own workload with minimum supervision.
  • Take responsibility of projects or issues assigned to them
  • Be personable, flexible and a team player
  • Good written and verbal communications
  • Have a strong personality who will be able to operate directly with users

Posted by Rajesh C
Location: Chennai
Experience: 5 - 7 yrs
Salary: ₹20L - ₹30L / yr
Skills: Technical support, Tech Support, SQL, Informatica, PySpark, +1 more

Job Title: Support Engineer L3
Job Location: Chennai
We at Condé Nast are looking for a Support Engineer (Level 3) who would be responsible for monitoring and maintaining the production systems to ensure business continuity is maintained. Your responsibilities would also include prompt communication to business and internal teams about process delays, stability, issues, and resolutions.
Primary Responsibilities
● 5+ years experience in Production support
● The Support Data Engineer is responsible for monitoring of the data pipelines
that are in production.
● Level 3 support activities - analysing issues, debugging programs & jobs, bug fixes
● The position will contribute to the monitoring, rerun or reschedule, code fix
of pipelines for a variety of projects on a daily basis.
● Escalate failures to the Data Team/DevOps in case of infrastructure failures or when unable to revive the data pipelines.
● Ensure accurate alerts are raised in case of pipeline failures and the corresponding stakeholders (Business/Data Teams) are notified about the same within the agreed-upon SLAs.
● Prepare and present success/failure metrics by accurately logging the
monitoring stats.
● Able to work in shifts to provide overlap with US Business teams
● Other duties as requested or assigned.
Desired Skills & Qualification
● Have strong working knowledge of PySpark, Informatica, SQL (Presto), batch handling through schedulers (Databricks, Astronomer will be an advantage), AWS S3, SQL, Airflow and Hive/Presto
● Have basic knowledge on Shell scripts and/or Bash commands.
● Able to execute queries in Databases and produce outputs.
● Able to understand and execute the steps provided by Data-Team to
revive data-pipelines.
● Strong verbal, written communication skills and strong interpersonal
skills.
● Graduate/Diploma in computer science or information technology.
About Condé Nast
CONDÉ NAST GLOBAL
Condé Nast is a global media house with over a century of distinguished publishing
history. With a portfolio of iconic brands like Vogue, GQ, Vanity Fair, The New Yorker and
Bon Appétit, we at Condé Nast aim to tell powerful, compelling stories of communities,
culture and the contemporary world. Our operations are headquartered in New York and
London, with colleagues and collaborators in 32 markets across the world, including
France, Germany, India, China, Japan, Spain, Italy, Russia, Mexico, and Latin America.
Condé Nast has been raising the industry standards and setting records for excellence in
the publishing space. Today, our brands reach over 1 billion people in print, online, video,
and social media.
CONDÉ NAST INDIA (DATA)
Over the years, Condé Nast successfully expanded and diversified into digital, TV, and
social platforms - in other words, a staggering amount of user data. Condé Nast made the
right move to invest heavily in understanding this data and formed a whole new Data
team entirely dedicated to data processing, engineering, analytics, and visualization. This
team helps drive engagement, fuel process innovation, further content enrichment, and
increase market revenue. The Data team aimed to create a company culture where data
was the common language and facilitate an environment where insights shared in
real-time could improve performance. The Global Data team operates out of Los Angeles,
New York, Chennai, and London. The team at Condé Nast Chennai works extensively with
data to amplify its brands' digital capabilities and boost online revenue. We are broadly
divided into four groups, Data Intelligence, Data Engineering, Data Science, and
Operations (including Product and Marketing Ops, Client Services) along with Data
Strategy and monetization. The teams built capabilities and products to create
data-driven solutions for better audience engagement.
What we look forward to:
We want to welcome bright, new minds into our midst and work together to create
diverse forms of self-expression. At Condé Nast, we encourage the imaginative and
celebrate the extraordinary. We are a media company for the future, with a remarkable
past. We are Condé Nast, and It Starts Here.
Read more

UAE Client

Agency job
via Fragma Data Systems by Evelyn Charles
icon
Remote only
icon
4.5 - 12 yrs
icon
₹20L - ₹30L / yr
PySpark
SQL
Data engineering
Big Data
Hadoop
+1 more

Must Have Skills:

 

  • Good experience in PySpark, including DataFrame core functions and Spark SQL (a short illustrative sketch follows this list)
  • Good experience in SQL databases - able to write queries of fair complexity
  • Excellent experience in Big Data programming for data transformations and aggregations
  • Good understanding of ELT architecture: processing business rules and extracting data from the Data Lake into data streams for business consumption
  • Good customer communication
  • Good analytical skills
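To make the DataFrame and Spark SQL expectation concrete, here is a minimal, self-contained sketch; the input path and column names are hypothetical.

```python
# Minimal PySpark sketch: the same aggregation with the DataFrame API and with Spark SQL.
# The input path and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sales-aggregation").getOrCreate()

orders = spark.read.parquet("s3://example-bucket/orders/")  # placeholder source

# DataFrame API: filter, group, aggregate
daily_revenue = (
    orders
    .filter(F.col("status") == "COMPLETED")
    .groupBy("order_date", "country")
    .agg(
        F.sum("amount").alias("revenue"),
        F.countDistinct("customer_id").alias("customers"),
    )
)

# Equivalent logic expressed in Spark SQL
orders.createOrReplaceTempView("orders")
daily_revenue_sql = spark.sql("""
    SELECT order_date,
           country,
           SUM(amount)                 AS revenue,
           COUNT(DISTINCT customer_id) AS customers
    FROM orders
    WHERE status = 'COMPLETED'
    GROUP BY order_date, country
""")

daily_revenue.show(5)
```

The aggregation is written twice deliberately, once with the DataFrame API and once in Spark SQL, to cover both of the skills named above.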

 

Technology Skills (Good to Have):

 

  • Building and operationalizing large-scale enterprise data solutions and applications using one or more Azure data and analytics services in combination with custom solutions - Azure Synapse/Azure SQL DWH, Azure Data Lake, Azure Blob Storage, Spark, HDInsight, Databricks, Cosmos DB, Event Hub/IoT Hub.
  • Experience in migrating on-premise data warehouses to data platforms on the Azure cloud.
  • Designing and implementing data engineering, ingestion, and transformation functions
  • Azure Synapse or Azure SQL Data Warehouse
  • Spark on Azure (available in HDInsight and Databricks)

 

Read more
icon
Bengaluru (Bangalore)
icon
6 - 8 yrs
icon
₹8L - ₹15L / yr
PySpark
Data engineering
Big Data
Hadoop
Spark
+5 more
6-8 years of experience as a data engineer
Spark
Hadoop
Big Data
Data engineering
PySpark
Python
AWS Lambda
SQL
Kafka
Read more
DP
Posted by PriyaSaini
icon
Remote only
icon
3 - 8 yrs
icon
₹5L - ₹12L / yr
Data Analytics
Data modeling
Python
PySpark
ETL
+3 more

Role Description:

  • You will be part of the data delivery team and will have the opportunity to develop a deep understanding of the domain/function.
  • You will design and drive the work plan for the optimization/automation and standardization of the processes incorporating best practices to achieve efficiency gains.
  • You will run data engineering pipelines, link raw client data with the data model, conduct data assessments, perform data quality checks, and transform data using ETL tools.
  • You will perform data transformations, modeling, and validation activities, as well as configure applications to the client context. You will also develop scripts to validate, transform, and load raw data using programming languages such as Python and / or PySpark.
  • In this role, you will determine database structural requirements by analyzing client operations, applications, and programming.
  • You will develop cross-site relationships to enhance idea generation, and manage stakeholders.
  • Lastly, you will collaborate with the team to support ongoing business processes by delivering high-quality end products on time and performing quality checks wherever required.

Job Requirement:

  • Bachelor’s degree in Engineering or Computer Science; a Master’s degree is a plus
  • 3+ years of professional work experience with a reputed analytics firm
  • Expertise in handling large amounts of data through Python or PySpark
  • Ability to conduct data assessments, perform data quality checks, and transform data using SQL and ETL tools (a small illustrative sketch follows this list)
  • Experience deploying ETL / data pipelines and workflows in cloud technologies and architectures such as Azure and Amazon Web Services will be valued
  • Comfort with data modelling principles (e.g. database structure, entity relationships, UIDs) and software development principles (e.g. modularization, testing, refactoring)
  • A thoughtful and comfortable communicator (verbal and written) with the ability to facilitate discussions and conduct training
  • Strong problem-solving, requirement-gathering, and leadership skills
  • Track record of completing projects successfully on time, within budget and as per scope
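As an illustration of the data-quality-check requirement above, a minimal sketch in PySpark; the path, column names, and thresholds are invented for the example and would be project-specific in practice.

```python
# Minimal sketch of row-count and null-rate checks before loading transformed data.
# Path, column names, and threshold are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq-checks").getOrCreate()
df = spark.read.parquet("s3://example-bucket/raw/customers/")  # placeholder source

REQUIRED_COLUMNS = ["customer_id", "email", "created_at"]


def run_quality_checks(df, required_columns, max_null_rate=0.01):
    """Return a dict of check results; raise if a critical check fails."""
    total = df.count()
    if total == 0:
        raise ValueError("Dataset is empty")

    results = {"row_count": total}
    for col in required_columns:
        null_rate = df.filter(F.col(col).isNull()).count() / total
        results[f"null_rate_{col}"] = null_rate
        if null_rate > max_null_rate:
            raise ValueError(f"Column {col} exceeds null threshold: {null_rate:.2%}")
    return results


print(run_quality_checks(df, REQUIRED_COLUMNS))
```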

Read more
icon
Remote only
icon
1 - 5 yrs
icon
₹10L - ₹15L / yr
SQL
PySpark
• Responsible for developing and maintaining applications with PySpark
• Contribute to the overall design and architecture of the application developed and deployed.
• Performance tuning with respect to executor sizing and other environment parameters, code optimization, partition tuning, etc. (an illustrative sketch follows this list).
• Interact with business users to understand requirements and troubleshoot issues.
• Implement projects based on functional specifications.
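For context on the tuning responsibility above, a hedged sketch of the typical knobs involved; the numbers are illustrative placeholders, not recommendations, and in many deployments executor settings are supplied via spark-submit or the cluster configuration rather than in code.

```python
# Illustrative only: typical executor-sizing and partition settings touched while tuning.
# The numbers are placeholders; in many clusters executor settings are passed to
# spark-submit (or the Databricks cluster config) rather than set in code.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("tuned-batch-job")
    .config("spark.executor.instances", "8")        # number of executors
    .config("spark.executor.cores", "4")            # cores per executor
    .config("spark.executor.memory", "8g")          # heap per executor
    .config("spark.sql.shuffle.partitions", "256")  # shuffle parallelism
    .getOrCreate()
)

df = spark.read.parquet("s3://example-bucket/events/")  # placeholder source

# Repartition before a wide transformation to spread work evenly across executors,
# then coalesce before writing to avoid producing thousands of tiny output files.
result = (
    df.repartition(256, "event_date")
      .groupBy("event_date")
      .count()
      .coalesce(32)
)
result.write.mode("overwrite").parquet("s3://example-bucket/event_counts/")  # placeholder sink
```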

Must Have Skills:
• Good experience in PySpark, including DataFrame core functions and Spark SQL
• Good experience in SQL databases - able to write queries of fair complexity
• Excellent experience in Big Data programming for data transformations and aggregations
• Good understanding of ELT architecture: processing business rules and extracting data from the Data Lake into data streams for business consumption
• Good customer communication
• Good analytical skills
Read more
DP
Posted by Harpreet kour
icon
Bengaluru (Bangalore)
icon
1 - 6 yrs
icon
₹10L - ₹15L / yr
Data engineering
Big Data
PySpark
SQL
Python
Good experience in PySpark, including DataFrame core functions and Spark SQL
Good experience in SQL databases - able to write queries of fair complexity
Excellent experience in Big Data programming for data transformations and aggregations
Good understanding of ELT architecture: processing business rules and extracting data from the Data Lake into data streams for business consumption
Good customer communication
Good analytical skills
Read more
icon
Remote only
icon
1.5 - 5 yrs
icon
₹8L - ₹15L / yr
PySpark
SQL
• Responsible for developing and maintaining applications with PySpark
• Contribute to the overall design and architecture of the application developed and deployed.
• Performance tuning with respect to executor sizing and other environment parameters, code optimization, partition tuning, etc.
• Interact with business users to understand requirements and troubleshoot issues.
• Implement projects based on functional specifications.

Must Have Skills:
• Good experience in PySpark, including DataFrame core functions and Spark SQL
• Good experience in SQL databases - able to write queries of fair complexity
• Excellent experience in Big Data programming for data transformations and aggregations
• Good understanding of ELT architecture: processing business rules and extracting data from the Data Lake into data streams for business consumption
• Good customer communication
• Good analytical skills
Read more

Product Development

Agency job
via Purple Hirez by Aditya K
icon
Hyderabad
icon
12 - 20 yrs
icon
₹15L - ₹50L / yr
Analytics
Data Analytics
Kubernetes
PySpark
Python
+1 more

Job Description

We are looking for an experienced engineer with superb technical skills. You will primarily be responsible for architecting and building large-scale data pipelines that deliver AI and analytical solutions to our customers. The right candidate will enthusiastically take ownership of developing and managing continuously improving, robust, scalable software solutions.

Although your primary responsibilities will be around back-end work, we prize individuals who are willing to step in and contribute to other areas including automation, tooling, and management applications. Experience with, or a desire to learn, Machine Learning is a plus.

 

Skills

  • Bachelor's/Master's/PhD in CS or equivalent industry experience
  • Demonstrated expertise in building and shipping cloud-native applications
  • 5+ years of industry experience administering (including setting up, managing, and monitoring) data processing pipelines (both streaming and batch) using frameworks such as Kafka Streams and PySpark, and streaming databases such as Druid or equivalents such as Hive
  • Strong industry expertise with containerization technologies including Kubernetes (EKS/AKS) and Kubeflow
  • Experience with cloud platform services such as AWS, Azure or GCP, especially with EKS and Managed Kafka
  • 5+ years of industry experience in Python
  • Experience with popular modern web frameworks such as Spring Boot, Play Framework, or Django
  • Experience with scripting languages; Python experience highly desirable. Experience in API development using Swagger
  • Implementing automated testing platforms and unit tests
  • Proficient understanding of code versioning tools, such as Git
  • Familiarity with continuous integration (e.g. Jenkins)

Responsibilities

  • Architect, design, and implement large-scale data processing pipelines using Kafka Streams, PySpark, Fluentd and Druid (an illustrative streaming sketch follows this list)
  • Create custom Operators for Kubernetes and Kubeflow
  • Develop data ingestion processes and ETLs
  • Assist in DevOps operations
  • Design and implement APIs
  • Identify performance bottlenecks and bugs, and devise solutions to these problems
  • Help maintain code quality, organization, and documentation
  • Communicate with stakeholders regarding various aspects of the solution
  • Mentor team members on best practices
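A hedged sketch of the kind of streaming pipeline described in the first responsibility, using PySpark Structured Streaming to consume from Kafka; the broker address, topic name, and event schema are hypothetical, and the console sink stands in for whatever real sink (for example Druid) the pipeline would use.

```python
# Minimal Structured Streaming sketch: consume JSON events from Kafka and aggregate per minute.
# Broker address, topic name, and schema are hypothetical; the spark-sql-kafka package is assumed.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

event_schema = StructType([
    StructField("user_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9092")  # placeholder brokers
    .option("subscribe", "events")                        # placeholder topic
    .load()
)

# Kafka delivers the payload as bytes; parse it into typed columns.
events = raw.select(
    F.from_json(F.col("value").cast("string"), event_schema).alias("e")
).select("e.*")

# Windowed aggregation with a watermark to bound late data.
per_minute = (
    events
    .withWatermark("event_time", "5 minutes")
    .groupBy(F.window("event_time", "1 minute"))
    .agg(F.sum("amount").alias("total_amount"))
)

query = (
    per_minute.writeStream
    .outputMode("update")
    .format("console")  # in production this would be a real sink (e.g. a lake table or Druid ingestion)
    .start()
)
query.awaitTermination()
```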
Read more
Agency job
via Response Informatics by Swagatika Sahoo
icon
Chennai, Bengaluru (Bangalore), Pune, Mumbai, Hyderabad
icon
3 - 10 yrs
icon
₹10L - ₹24L / yr
PySpark
Python
Amazon Web Services (AWS)
Apache Spark
Glue semantics
+3 more
  • Minimum 1 year of relevant experience in PySpark (mandatory)
  • Hands-on experience in developing, testing, deploying, maintaining, and improving data integration pipelines in an AWS cloud environment is an added plus (an illustrative Glue job sketch follows this list)
  • Ability to play a lead role and independently manage a 3-5 member PySpark development team
  • EMR, Python, and PySpark are mandatory.
  • Knowledge of and experience working with AWS Cloud technologies such as Apache Spark, Glue, Kafka, Kinesis, and Lambda, along with S3, Redshift, and RDS
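One common shape for such an AWS Glue PySpark job, shown as a sketch; the catalog database, table, and output path are hypothetical placeholders rather than any specific project's names.

```python
# Sketch of an AWS Glue PySpark job: read from the Glue Data Catalog, transform, write to S3.
# Database, table, and output path are hypothetical placeholders.
import sys

from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a table registered in the Glue Data Catalog
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db",      # placeholder catalog database
    table_name="orders",      # placeholder table
)

# Transform with the DataFrame API, then write back to S3 as Parquet
df = dyf.toDF().filter(F.col("status") == "COMPLETED")
df.write.mode("overwrite").parquet("s3://example-bucket/curated/orders/")  # placeholder sink

job.commit()
```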
Read more
Agency job
via Response Informatics by Anupama Lavanya Uppala
icon
Chennai, Bengaluru (Bangalore), Mumbai, Hyderabad, Pune
icon
3 - 10 yrs
icon
₹10L - ₹25L / yr
PySpark
Python
  • Minimum 1 year of relevant experience in PySpark (mandatory)
  • Hands-on experience in developing, testing, deploying, maintaining, and improving data integration pipelines in an AWS cloud environment is an added plus
  • Ability to play a lead role and independently manage a 3-5 member PySpark development team
  • EMR, Python, and PySpark are mandatory.
  • Knowledge of and experience working with AWS Cloud technologies such as Apache Spark, Glue, Kafka, Kinesis, and Lambda, along with S3, Redshift, and RDS
Read more
DP
Posted by Alex P
icon
Remote, Gurugram, Delhi, Noida, Ghaziabad, Faridabad
icon
2 - 7 yrs
icon
₹4L - ₹30L / yr
Amazon Web Services (AWS)
Python
Spark
PySpark
Apache

Striving for excellence is in our DNA.

We are more than just specialists; we are experts in agile software development with a keen focus on Cloud Native D3 (Digital, Data, DevSecOps). We help leading global businesses to imagine, design, engineer, and deliver software and digital experiences that change the world.

 

Description

Headquartered in Princeton, NJ (United States), we are a multinational company that is growing fast. This role is based out of our India setup.

We believe that we are only as good as the quality of our people. Our offices are digital pods. Our clients are fortune brands. We’re always looking for the most talented and skilled teammates. Do you have it in you?

 

 

Read more

Cloud infrastructure solutions and support company. (SE1)

Agency job
via Multi Recruit by Ranjini A R
icon
Pune
icon
2 - 6 yrs
icon
₹12L - ₹16L / yr
SQL
ETL
Data engineering
Big Data
Java
+2 more
  • Design, create, test, and maintain data pipeline architecture in collaboration with the Data Architect.
  • Build the infrastructure required for extraction, transformation, and loading of data from a wide variety of data sources using Java, SQL, and Big Data technologies.
  • Support the translation of data needs into technical system requirements. Support in building complex queries required by the product teams.
  • Build data pipelines that clean, transform, and aggregate data from disparate sources
  • Develop, maintain and optimize ETLs to increase data accuracy, data stability, data availability, and pipeline performance.
  • Engage with Product Management and Business to deploy and monitor products/services on cloud platforms.
  • Stay up-to-date with advances in data persistence and big data technologies and run pilots to design the data architecture to scale with the increased data sets of consumer experience.
  • Handle data integration, consolidation, and reconciliation activities for digital consumer / medical products.

Job Qualifications:

  • Bachelor’s or Master’s degree in Computer Science, Information Management, Statistics or a related field
  • 5+ years of experience in the Consumer or Healthcare industry in an analytical role with a focus on building data pipelines, querying data, analyzing, and clearly presenting analyses to members of the data science team.
  • Technical expertise with data models and data mining.
  • Hands-on knowledge of programming languages such as Java, Python, R, and Scala.
  • Strong knowledge of big data tools such as Snowflake, AWS Redshift, Hadoop, MapReduce, etc.
  • Knowledge of tools such as AWS Glue, S3, AWS EMR, streaming data pipelines, and Kafka/Kinesis is desirable.
  • Hands-on knowledge of SQL and NoSQL database design.
  • Knowledge of CI/CD for building and hosting solutions.
  • AWS certification is an added advantage.
  • Strong knowledge of visualization tools such as Tableau and QlikView is an added advantage
  • A team player capable of working and integrating across cross-functional teams for implementing project requirements. Experience in technical requirements gathering and documentation.
  • Ability to work effectively and independently in a fast-paced agile environment with tight deadlines
  • A flexible, pragmatic, and collaborative team player with the innate ability to engage with data architects, analysts, and scientists
Read more
icon
Abu Dhabi, Dubai
icon
6 - 12 yrs
icon
₹18L - ₹25L / yr
PySpark
Big Data
Spark
Data Warehouse (DWH)
SQL
+2 more
Must-Have Skills:
• Good experience in PySpark, including DataFrame core functions and Spark SQL
• Good experience in SQL databases - able to write queries of fair complexity
• Excellent experience in Big Data programming for data transformations and aggregations
• Good understanding of ELT architecture: processing business rules and extracting data from the Data Lake into data streams for business consumption
• Good customer communication
• Good analytical skills
 
 
Technology Skills (Good to Have):
  • Building and operationalizing large-scale enterprise data solutions and applications using one or more Azure data and analytics services in combination with custom solutions - Azure Synapse/Azure SQL DWH, Azure Data Lake, Azure Blob Storage, Spark, HDInsight, Databricks, Cosmos DB, Event Hub/IoT Hub (an illustrative ADLS read sketch follows this list).
  • Experience in migrating on-premise data warehouses to data platforms on the Azure cloud.
  • Designing and implementing data engineering, ingestion, and transformation functions
  • Azure Synapse or Azure SQL Data Warehouse
  • Spark on Azure (available in HDInsight and Databricks)
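A minimal sketch of what reading from Azure Data Lake Storage Gen2 into Spark might look like on Databricks, assuming the cluster already has access to the storage account (for example via a service principal or managed identity); the storage account, container, paths, and table names are hypothetical.

```python
# Sketch: read Parquet from ADLS Gen2 and write a curated Delta table on Azure Databricks.
# Storage account, container, and table names are hypothetical; access to the storage
# account (e.g. via a service principal or managed identity) is assumed to be configured.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("adls-ingest").getOrCreate()

source_path = "abfss://raw@examplestorage.dfs.core.windows.net/sales/2024/"  # placeholder path

sales = spark.read.parquet(source_path)

curated = (
    sales
    .filter(F.col("quantity") > 0)
    .withColumn("load_date", F.current_date())
)

# Delta Lake write (available on Databricks, or with the delta-spark package elsewhere);
# the target schema "curated" is assumed to exist.
curated.write.format("delta").mode("overwrite").saveAsTable("curated.sales")
```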
Read more
icon
Pune
icon
4 - 12 yrs
icon
₹6L - ₹15L / yr
Data engineering
Data Engineer
ETL
Spark
Apache Kafka
+5 more
We are looking for a smart candidate with:
  • Strong Python coding skills and OOP skills
  • Should have worked on Big Data product architecture
  • Should have worked with at least one SQL-based database (e.g. MySQL, PostgreSQL) and at least one NoSQL-based database (e.g. Cassandra, Elasticsearch)
  • Hands-on experience with Spark abstractions such as RDD, DataFrame, and Dataset
  • Experience in developing ETL for data products
  • Working knowledge of performance optimization, optimal resource utilization, parallelism, and tuning of Spark jobs
  • Working knowledge of file formats: CSV, JSON, XML, Parquet, ORC, Avro (a short conversion sketch follows this list)
  • Good to have: working knowledge of at least one analytical database such as Druid, MongoDB, or Apache Hive
  • Experience handling real-time data feeds (working knowledge of Apache Kafka or a similar tool is good to have)
Key Skills:
  • Python and Scala (optional), Spark / PySpark, parallel programming
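To illustrate the file-format item above, a small hedged sketch converting CSV to columnar formats with PySpark; the paths are placeholders, and writing Avro assumes the external spark-avro package is on the classpath.

```python
# Minimal sketch: read CSV, normalise a column, write columnar Parquet and Avro.
# Paths are hypothetical; writing Avro requires the external spark-avro package.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("format-conversion").getOrCreate()

df = (
    spark.read
    .option("header", True)
    .option("inferSchema", True)
    .csv("/data/raw/transactions.csv")  # placeholder input
)

cleaned = df.withColumn("amount", F.col("amount").cast("double"))

# Columnar formats compress well and support predicate pushdown on read
cleaned.write.mode("overwrite").parquet("/data/curated/transactions_parquet/")
cleaned.write.mode("overwrite").format("avro").save("/data/curated/transactions_avro/")
```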
Read more

UAE Client

Agency job
via Fragma Data Systems by Harpreet kour
icon
Dubai, Bengaluru (Bangalore)
icon
4 - 8 yrs
icon
₹6L - ₹16L / yr
Data engineering
Data Engineer
Big Data
Big Data Engineer
Apache Spark
+3 more
• Responsible for developing and maintaining applications with PySpark
• Contribute to the overall design and architecture of the application developed and deployed.
• Performance tuning with respect to executor sizing and other environment parameters, code optimization, partition tuning, etc.
• Interact with business users to understand requirements and troubleshoot issues.
• Implement projects based on functional specifications.

Must Have Skills:
• Good experience in PySpark, including DataFrame core functions and Spark SQL
• Good experience in SQL databases - able to write queries of fair complexity
• Excellent experience in Big Data programming for data transformations and aggregations
• Good understanding of ELT architecture: processing business rules and extracting data from the Data Lake into data streams for business consumption
• Good customer communication
• Good analytical skills
Read more