15+ PySpark Jobs in Delhi, NCR and Gurgaon | PySpark Job openings in Delhi, NCR and Gurgaon
Apply to 15+ PySpark Jobs in Delhi, NCR and Gurgaon on CutShort.io. Explore the latest PySpark Job opportunities across top companies like Google, Amazon & Adobe.
Building the machine learning production (or MLOps) is the biggest challenge most large companies currently have in making the transition to becoming an AI-driven organization. This position is an opportunity for an experienced, server-side developer to build expertise in this exciting new frontier. You will be part of a team deploying state-of-the-art AI solutions for Fractal clients.
Responsibilities
As MLOps Engineer, you will work collaboratively with Data Scientists and Data engineers to deploy and operate advanced analytics machine learning models. You’ll help automate and streamline Model development and Model operations. You’ll build and maintain tools for deployment, monitoring, and operations. You’ll also troubleshoot and resolve issues in development, testing, and production environments.
- Enable Model tracking, model experimentation, Model automation
- Develop ML pipelines to support
- Develop MLOps components in Machine learning development life cycle using Model Repository (either of): MLFlow, Kubeflow Model Registry
- Develop MLOps components in Machine learning development life cycle using Machine Learning Services (either of): Kubeflow, DataRobot, HopsWorks, Dataiku or any relevant ML E2E PaaS/SaaS
- Work across all phases of Model development life cycle to build MLOPS components
- Build the knowledge base required to deliver increasingly complex MLOPS projects on Azure
- Be an integral part of client business development and delivery engagements across multiple domains
Required Qualifications
- 3-5 years experience building production-quality software.
- B.E/B.Tech/M.Tech in Computer Science or related technical degree OR Equivalent
- Strong experience in System Integration, Application Development or Data Warehouse projects across technologies used in the enterprise space
- Knowledge of MLOps, machine learning and docker
- Object-oriented languages (e.g. Python, PySpark, Java, C#, C++)
- CI/CD experience( i.e. Jenkins, Git hub action,
- Database programming using any flavors of SQL
- Knowledge of Git for Source code management
- Ability to collaborate effectively with highly technical resources in a fast-paced environment
- Ability to solve complex challenges/problems and rapidly deliver innovative solutions
- Foundational Knowledge of Cloud Computing on Azure
- Hunger and passion for learning new skills
Building the machine learning production System(or MLOps) is the biggest challenge most large companies currently have in making the transition to becoming an AI-driven organization. This position is an opportunity for an experienced, server-side developer to build expertise in this exciting new frontier. You will be part of a team deploying state-ofthe-art AI solutions for Fractal clients.
Responsibilities
As MLOps Engineer, you will work collaboratively with Data Scientists and Data engineers to deploy and operate advanced analytics machine learning models. You’ll help automate and streamline Model development and Model operations. You’ll build and maintain tools for deployment, monitoring, and operations. You’ll also troubleshoot and resolve issues in development, testing, and production environments.
- Enable Model tracking, model experimentation, Model automation
- Develop scalable ML pipelines
- Develop MLOps components in Machine learning development life cycle using Model Repository (either of): MLFlow, Kubeflow Model Registry
- Machine Learning Services (either of): Kubeflow, DataRobot, HopsWorks, Dataiku or any relevant ML E2E PaaS/SaaS
- Work across all phases of Model development life cycle to build MLOPS components
- Build the knowledge base required to deliver increasingly complex MLOPS projects on Azure
- Be an integral part of client business development and delivery engagements across multiple domains
Required Qualifications
- 5.5-9 years experience building production-quality software
- B.E/B.Tech/M.Tech in Computer Science or related technical degree OR equivalent
- Strong experience in System Integration, Application Development or Datawarehouse projects across technologies used in the enterprise space
- Expertise in MLOps, machine learning and docker
- Object-oriented languages (e.g. Python, PySpark, Java, C#, C++)
- Experience developing CI/CD components for production ready ML pipeline.
- Database programming using any flavors of SQL
- Knowledge of Git for Source code management
- Ability to collaborate effectively with highly technical resources in a fast-paced environment
- Ability to solve complex challenges/problems and rapidly deliver innovative solutions
- Team handling, problem solving, project management and communication skills & creative thinking
- Foundational Knowledge of Cloud Computing on Azure
- Hunger and passion for learning new skills
Technical Skills:
- Ability to understand and translate business requirements into design.
- Proficient in AWS infrastructure components such as S3, IAM, VPC, EC2, and Redshift.
- Experience in creating ETL jobs using Python/PySpark.
- Proficiency in creating AWS Lambda functions for event-based jobs.
- Knowledge of automating ETL processes using AWS Step Functions.
- Competence in building data warehouses and loading data into them.
Responsibilities:
- Understand business requirements and translate them into design.
- Assess AWS infrastructure needs for development work.
- Develop ETL jobs using Python/PySpark to meet requirements.
- Implement AWS Lambda for event-based tasks.
- Automate ETL processes using AWS Step Functions.
- Build data warehouses and manage data loading.
- Engage with customers and stakeholders to articulate the benefits of proposed solutions and frameworks.
Publicis Sapient Overview:
The Senior Associate People Senior Associate L1 in Data Engineering, you will translate client requirements into technical design, and implement components for data engineering solution. Utilize deep understanding of data integration and big data design principles in creating custom solutions or implementing package solutions. You will independently drive design discussions to insure the necessary health of the overall solution
.
Job Summary:
As Senior Associate L2 in Data Engineering, you will translate client requirements into technical design, and implement components for data engineering solution. Utilize deep understanding of data integration and big data design principles in creating custom solutions or implementing package solutions. You will independently drive design discussions to insure the necessary health of the overall solution
The role requires a hands-on technologist who has strong programming background like Java / Scala / Python, should have experience in Data Ingestion, Integration and data Wrangling, Computation, Analytics pipelines and exposure to Hadoop ecosystem components. You are also required to have hands-on knowledge on at least one of AWS, GCP, Azure cloud platforms.
Role & Responsibilities:
Your role is focused on Design, Development and delivery of solutions involving:
• Data Integration, Processing & Governance
• Data Storage and Computation Frameworks, Performance Optimizations
• Analytics & Visualizations
• Infrastructure & Cloud Computing
• Data Management Platforms
• Implement scalable architectural models for data processing and storage
• Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time mode
• Build functionality for data analytics, search and aggregation
Experience Guidelines:
Mandatory Experience and Competencies:
# Competency
1.Overall 5+ years of IT experience with 3+ years in Data related technologies
2.Minimum 2.5 years of experience in Big Data technologies and working exposure in at least one cloud platform on related data services (AWS / Azure / GCP)
3.Hands-on experience with the Hadoop stack – HDFS, sqoop, kafka, Pulsar, NiFi, Spark, Spark Streaming, Flink, Storm, hive, oozie, airflow and other components required in building end to end data pipeline.
4.Strong experience in at least of the programming language Java, Scala, Python. Java preferable
5.Hands-on working knowledge of NoSQL and MPP data platforms like Hbase, MongoDb, Cassandra, AWS Redshift, Azure SQLDW, GCP BigQuery etc
6.Well-versed and working knowledge with data platform related services on at least 1 cloud platform, IAM and data security
Preferred Experience and Knowledge (Good to Have):
# Competency
1.Good knowledge of traditional ETL tools (Informatica, Talend, etc) and database technologies (Oracle, MySQL, SQL Server, Postgres) with hands on experience
2.Knowledge on data governance processes (security, lineage, catalog) and tools like Collibra, Alation etc
3.Knowledge on distributed messaging frameworks like ActiveMQ / RabbiMQ / Solace, search & indexing and Micro services architectures
4.Performance tuning and optimization of data pipelines
5.CI/CD – Infra provisioning on cloud, auto build & deployment pipelines, code quality
6.Cloud data specialty and other related Big data technology certifications
Personal Attributes:
• Strong written and verbal communication skills
• Articulation skills
• Good team player
• Self-starter who requires minimal oversight
• Ability to prioritize and manage multiple tasks
• Process orientation and the ability to define and set up processes
Publicis Sapient Overview:
The Senior Associate People Senior Associate L1 in Data Engineering, you will translate client requirements into technical design, and implement components for data engineering solution. Utilize deep understanding of data integration and big data design principles in creating custom solutions or implementing package solutions. You will independently drive design discussions to insure the necessary health of the overall solution
.
Job Summary:
As Senior Associate L1 in Data Engineering, you will do technical design, and implement components for data engineering solution. Utilize deep understanding of data integration and big data design principles in creating custom solutions or implementing package solutions. You will independently drive design discussions to insure the necessary health of the overall solution
The role requires a hands-on technologist who has strong programming background like Java / Scala / Python, should have experience in Data Ingestion, Integration and data Wrangling, Computation, Analytics pipelines and exposure to Hadoop ecosystem components. Having hands-on knowledge on at least one of AWS, GCP, Azure cloud platforms will be preferable.
Role & Responsibilities:
Job Title: Senior Associate L1 – Data Engineering
Your role is focused on Design, Development and delivery of solutions involving:
• Data Ingestion, Integration and Transformation
• Data Storage and Computation Frameworks, Performance Optimizations
• Analytics & Visualizations
• Infrastructure & Cloud Computing
• Data Management Platforms
• Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time
• Build functionality for data analytics, search and aggregation
Experience Guidelines:
Mandatory Experience and Competencies:
# Competency
1.Overall 3.5+ years of IT experience with 1.5+ years in Data related technologies
2.Minimum 1.5 years of experience in Big Data technologies
3.Hands-on experience with the Hadoop stack – HDFS, sqoop, kafka, Pulsar, NiFi, Spark, Spark Streaming, Flink, Storm, hive, oozie, airflow and other components required in building end to end data pipeline. Working knowledge on real-time data pipelines is added advantage.
4.Strong experience in at least of the programming language Java, Scala, Python. Java preferable
5.Hands-on working knowledge of NoSQL and MPP data platforms like Hbase, MongoDb, Cassandra, AWS Redshift, Azure SQLDW, GCP BigQuery etc
Preferred Experience and Knowledge (Good to Have):
# Competency
1.Good knowledge of traditional ETL tools (Informatica, Talend, etc) and database technologies (Oracle, MySQL, SQL Server, Postgres) with hands on experience
2.Knowledge on data governance processes (security, lineage, catalog) and tools like Collibra, Alation etc
3.Knowledge on distributed messaging frameworks like ActiveMQ / RabbiMQ / Solace, search & indexing and Micro services architectures
4.Performance tuning and optimization of data pipelines
5.CI/CD – Infra provisioning on cloud, auto build & deployment pipelines, code quality
6.Working knowledge with data platform related services on at least 1 cloud platform, IAM and data security
7.Cloud data specialty and other related Big data technology certifications
Job Title: Senior Associate L1 – Data Engineering
Personal Attributes:
• Strong written and verbal communication skills
• Articulation skills
• Good team player
• Self-starter who requires minimal oversight
• Ability to prioritize and manage multiple tasks
• Process orientation and the ability to define and set up processes
A LEADING US BASED MNC
Data Engineering : Senior Engineer / Manager
As Senior Engineer/ Manager in Data Engineering, you will translate client requirements into technical design, and implement components for a data engineering solutions. Utilize a deep understanding of data integration and big data design principles in creating custom solutions or implementing package solutions. You will independently drive design discussions to insure the necessary health of the overall solution.
Must Have skills :
1. GCP
2. Spark streaming : Live data streaming experience is desired.
3. Any 1 coding language: Java/Pyhton /Scala
Skills & Experience :
- Overall experience of MINIMUM 5+ years with Minimum 4 years of relevant experience in Big Data technologies
- Hands-on experience with the Hadoop stack - HDFS, sqoop, kafka, Pulsar, NiFi, Spark, Spark Streaming, Flink, Storm, hive, oozie, airflow and other components required in building end to end data pipeline. Working knowledge on real-time data pipelines is added advantage.
- Strong experience in at least of the programming language Java, Scala, Python. Java preferable
- Hands-on working knowledge of NoSQL and MPP data platforms like Hbase, MongoDb, Cassandra, AWS Redshift, Azure SQLDW, GCP BigQuery etc.
- Well-versed and working knowledge with data platform related services on GCP
- Bachelor's degree and year of work experience of 6 to 12 years or any combination of education, training and/or experience that demonstrates the ability to perform the duties of the position
Your Impact :
- Data Ingestion, Integration and Transformation
- Data Storage and Computation Frameworks, Performance Optimizations
- Analytics & Visualizations
- Infrastructure & Cloud Computing
- Data Management Platforms
- Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time
- Build functionality for data analytics, search and aggregation
🚀 Exciting Opportunity: Data Engineer Position in Gurugram 🌐
Hello
We are actively seeking a talented and experienced Data Engineer to join our dynamic team at Reality Motivational Venture in Gurugram (Gurgaon). If you're passionate about data, thrive in a collaborative environment, and possess the skills we're looking for, we want to hear from you!
Position: Data Engineer
Location: Gurugram (Gurgaon)
Experience: 5+ years
Key Skills:
- Python
- Spark, Pyspark
- Data Governance
- Cloud (AWS/Azure/GCP)
Main Responsibilities:
- Define and set up analytics environments for "Big Data" applications in collaboration with domain experts.
- Implement ETL processes for telemetry-based and stationary test data.
- Support in defining data governance, including data lifecycle management.
- Develop large-scale data processing engines and real-time search and analytics based on time series data.
- Ensure technical, methodological, and quality aspects.
- Support CI/CD processes.
- Foster know-how development and transfer, continuous improvement of leading technologies within Data Engineering.
- Collaborate with solution architects on the development of complex on-premise, hybrid, and cloud solution architectures.
Qualification Requirements:
- BSc, MSc, MEng, or PhD in Computer Science, Informatics/Telematics, Mathematics/Statistics, or a comparable engineering degree.
- Proficiency in Python and the PyData stack (Pandas/Numpy).
- Experience in high-level programming languages (C#/C++/Java).
- Familiarity with scalable processing environments like Dask (or Spark).
- Proficient in Linux and scripting languages (Bash Scripts).
- Experience in containerization and orchestration of containerized services (Kubernetes).
- Education in database technologies (SQL/OLAP and Non-SQL).
- Interest in Big Data storage technologies (Elastic, ClickHouse).
- Familiarity with Cloud technologies (Azure, AWS, GCP).
- Fluent English communication skills (speaking and writing).
- Ability to work constructively with a global team.
- Willingness to travel for business trips during development projects.
Preferable:
- Working knowledge of vehicle architectures, communication, and components.
- Experience in additional programming languages (C#/C++/Java, R, Scala, MATLAB).
- Experience in time-series processing.
How to Apply:
Interested candidates, please share your updated CV/resume with me.
Thank you for considering this exciting opportunity.
AWS Glue Developer
Work Experience: 6 to 8 Years
Work Location: Noida, Bangalore, Chennai & Hyderabad
Must Have Skills: AWS Glue, DMS, SQL, Python, PySpark, Data integrations and Data Ops,
Job Reference ID:BT/F21/IND
Job Description:
Design, build and configure applications to meet business process and application requirements.
Responsibilities:
7 years of work experience with ETL, Data Modelling, and Data Architecture Proficient in ETL optimization, designing, coding, and tuning big data processes using Pyspark Extensive experience to build data platforms on AWS using core AWS services Step function, EMR, Lambda, Glue and Athena, Redshift, Postgres, RDS etc and design/develop data engineering solutions. Orchestrate using Airflow.
Technical Experience:
Hands-on experience on developing Data platform and its components Data Lake, cloud Datawarehouse, APIs, Batch and streaming data pipeline Experience with building data pipelines and applications to stream and process large datasets at low latencies.
➢ Enhancements, new development, defect resolution and production support of Big data ETL development using AWS native services.
➢ Create data pipeline architecture by designing and implementing data ingestion solutions.
➢ Integrate data sets using AWS services such as Glue, Lambda functions/ Airflow.
➢ Design and optimize data models on AWS Cloud using AWS data stores such as Redshift, RDS, S3, Athena.
➢ Author ETL processes using Python, Pyspark.
➢ Build Redshift Spectrum direct transformations and data modelling using data in S3.
➢ ETL process monitoring using CloudWatch events.
➢ You will be working in collaboration with other teams. Good communication must.
➢ Must have experience in using AWS services API, AWS CLI and SDK
Professional Attributes:
➢ Experience operating very large data warehouses or data lakes Expert-level skills in writing and optimizing SQL Extensive, real-world experience designing technology components for enterprise solutions and defining solution architectures and reference architectures with a focus on cloud technology.
➢ Must have 6+ years of big data ETL experience using Python, S3, Lambda, Dynamo DB, Athena, Glue in AWS environment.
➢ Expertise in S3, RDS, Redshift, Kinesis, EC2 clusters highly desired.
Qualification:
➢ Degree in Computer Science, Computer Engineering or equivalent.
Salary: Commensurate with experience and demonstrated competence
Job Description:
As an Azure Data Engineer, your role will involve designing, developing, and maintaining data solutions on the Azure platform. You will be responsible for building and optimizing data pipelines, ensuring data quality and reliability, and implementing data processing and transformation logic. Your expertise in Azure Databricks, Python, SQL, Azure Data Factory (ADF), PySpark, and Scala will be essential for performing the following key responsibilities:
Designing and developing data pipelines: You will design and implement scalable and efficient data pipelines using Azure Databricks, PySpark, and Scala. This includes data ingestion, data transformation, and data loading processes.
Data modeling and database design: You will design and implement data models to support efficient data storage, retrieval, and analysis. This may involve working with relational databases, data lakes, or other storage solutions on the Azure platform.
Data integration and orchestration: You will leverage Azure Data Factory (ADF) to orchestrate data integration workflows and manage data movement across various data sources and targets. This includes scheduling and monitoring data pipelines.
Data quality and governance: You will implement data quality checks, validation rules, and data governance processes to ensure data accuracy, consistency, and compliance with relevant regulations and standards.
Performance optimization: You will optimize data pipelines and queries to improve overall system performance and reduce processing time. This may involve tuning SQL queries, optimizing data transformation logic, and leveraging caching techniques.
Monitoring and troubleshooting: You will monitor data pipelines, identify performance bottlenecks, and troubleshoot issues related to data ingestion, processing, and transformation. You will work closely with cross-functional teams to resolve data-related problems.
Documentation and collaboration: You will document data pipelines, data flows, and data transformation processes. You will collaborate with data scientists, analysts, and other stakeholders to understand their data requirements and provide data engineering support.
Skills and Qualifications:
Strong experience with Azure Databricks, Python, SQL, ADF, PySpark, and Scala.
Proficiency in designing and developing data pipelines and ETL processes.
Solid understanding of data modeling concepts and database design principles.
Familiarity with data integration and orchestration using Azure Data Factory.
Knowledge of data quality management and data governance practices.
Experience with performance tuning and optimization of data pipelines.
Strong problem-solving and troubleshooting skills related to data engineering.
Excellent collaboration and communication skills to work effectively in cross-functional teams.
Understanding of cloud computing principles and experience with Azure services.
- Mandatory - Hands on experience in Python and PySpark.
- Build pySpark applications using Spark Dataframes in Python using Jupyter notebook and PyCharm(IDE).
- Worked on optimizing spark jobs that processes huge volumes of data.
- Hands on experience in version control tools like Git.
- Worked on Amazon’s Analytics services like Amazon EMR, Lambda function etc
- Worked on Amazon’s Compute services like Amazon Lambda, Amazon EC2 and Amazon’s Storage service like S3 and few other services like SNS.
- Experience/knowledge of bash/shell scripting will be a plus.
- Experience in working with fixed width, delimited , multi record file formats etc.
- Hands on experience in tools like Jenkins to build, test and deploy the applications
- Awareness of Devops concepts and be able to work in an automated release pipeline environment.
- Excellent debugging skills.
Skills and requirements
- Experience analyzing complex and varied data in a commercial or academic setting.
- Desire to solve new and complex problems every day.
- Excellent ability to communicate scientific results to both technical and non-technical team members.
Desirable
- A degree in a numerically focused discipline such as, Maths, Physics, Chemistry, Engineering or Biological Sciences..
- Hands on experience on Python, Pyspark, SQL
- Hands on experience on building End to End Data Pipelines.
- Hands on Experience on Azure Data Factory, Azure Data Bricks, Data Lake - added advantage
- Hands on Experience in building data pipelines.
- Experience with Bigdata Tools, Hadoop, Hive, Sqoop, Spark, SparkSQL
- Experience with SQL or NoSQL databases for the purposes of data retrieval and management.
- Experience in data warehousing and business intelligence tools, techniques and technology, as well as experience in diving deep on data analysis or technical issues to come up with effective solutions.
- BS degree in math, statistics, computer science or equivalent technical field.
- Experience in data mining structured and unstructured data (SQL, ETL, data warehouse, Machine Learning etc.) in a business environment with large-scale, complex data sets.
- Proven ability to look at solutions in unconventional ways. Sees opportunities to innovate and can lead the way.
- Willing to learn and work on Data Science, ML, AI.
consulting & implementation services in the area of Oil & Gas, Mining and Manufacturing Industry
- Data Engineer
Required skill set: AWS GLUE, AWS LAMBDA, AWS SNS/SQS, AWS ATHENA, SPARK, SNOWFLAKE, PYTHON
Mandatory Requirements
- Experience in AWS Glue
- Experience in Apache Parquet
- Proficient in AWS S3 and data lake
- Knowledge of Snowflake
- Understanding of file-based ingestion best practices.
- Scripting language - Python & pyspark
CORE RESPONSIBILITIES
- Create and manage cloud resources in AWS
- Data ingestion from different data sources which exposes data using different technologies, such as: RDBMS, REST HTTP API, flat files, Streams, and Time series data based on various proprietary systems. Implement data ingestion and processing with the help of Big Data technologies
- Data processing/transformation using various technologies such as Spark and Cloud Services. You will need to understand your part of business logic and implement it using the language supported by the base data platform
- Develop automated data quality check to make sure right data enters the platform and verifying the results of the calculations
- Develop an infrastructure to collect, transform, combine and publish/distribute customer data.
- Define process improvement opportunities to optimize data collection, insights and displays.
- Ensure data and results are accessible, scalable, efficient, accurate, complete and flexible
- Identify and interpret trends and patterns from complex data sets
- Construct a framework utilizing data visualization tools and techniques to present consolidated analytical and actionable results to relevant stakeholders.
- Key participant in regular Scrum ceremonies with the agile teams
- Proficient at developing queries, writing reports and presenting findings
- Mentor junior members and bring best industry practices
QUALIFICATIONS
- 5-7+ years’ experience as data engineer in consumer finance or equivalent industry (consumer loans, collections, servicing, optional product, and insurance sales)
- Strong background in math, statistics, computer science, data science or related discipline
- Advanced knowledge one of language: Java, Scala, Python, C#
- Production experience with: HDFS, YARN, Hive, Spark, Kafka, Oozie / Airflow, Amazon Web Services (AWS), Docker / Kubernetes, Snowflake
- Proficient with
- Data mining/programming tools (e.g. SAS, SQL, R, Python)
- Database technologies (e.g. PostgreSQL, Redshift, Snowflake. and Greenplum)
- Data visualization (e.g. Tableau, Looker, MicroStrategy)
- Comfortable learning about and deploying new technologies and tools.
- Organizational skills and the ability to handle multiple projects and priorities simultaneously and meet established deadlines.
- Good written and oral communication skills and ability to present results to non-technical audiences
- Knowledge of business intelligence and analytical tools, technologies and techniques.
Familiarity and experience in the following is a plus:
- AWS certification
- Spark Streaming
- Kafka Streaming / Kafka Connect
- ELK Stack
- Cassandra / MongoDB
- CI/CD: Jenkins, GitLab, Jira, Confluence other related tools
Requirements-
● B.Tech/Masters in Mathematics, Statistics, Computer Science or another quantitative field
● 2-3+ years of work experience in ML domain ( 2-5 years experience )
● Hands-on coding experience in Python
● Experience in machine learning techniques such as Regression, Classification,Predictive modeling, Clustering, Deep Learning stack, NLP.
● Working knowledge of Tensorflow/PyTorch
Optional Add-ons-
● Experience with distributed computing frameworks: Map/Reduce, Hadoop, Spark etc.
● Experience with databases: MongoDB
Hiring for one of the MNC for India location
Key Responsibilities : ( Data Developer Python, Spark)
Exp : 2 to 9 Yrs
Development of data platforms, integration frameworks, processes, and code.
Develop and deliver APIs in Python or Scala for Business Intelligence applications build using a range of web languages
Develop comprehensive automated tests for features via end-to-end integration tests, performance tests, acceptance tests and unit tests.
Elaborate stories in a collaborative agile environment (SCRUM or Kanban)
Familiarity with cloud platforms like GCP, AWS or Azure.
Experience with large data volumes.
Familiarity with writing rest-based services.
Experience with distributed processing and systems
Experience with Hadoop / Spark toolsets
Experience with relational database management systems (RDBMS)
Experience with Data Flow development
Knowledge of Agile and associated development techniques including:
n
- Sr. Data Engineer:
Core Skills – Data Engineering, Big Data, Pyspark, Spark SQL and Python
Candidate with prior Palantir Cloud Foundry OR Clinical Trial Data Model background is preferred
Major accountabilities:
- Responsible for Data Engineering, Foundry Data Pipeline Creation, Foundry Analysis & Reporting, Slate Application development, re-usable code development & management and Integrating Internal or External System with Foundry for data ingestion with high quality.
- Have good understanding on Foundry Platform landscape and it’s capabilities
- Performs data analysis required to troubleshoot data related issues and assist in the resolution of data issues.
- Defines company data assets (data models), Pyspark, spark SQL, jobs to populate data models.
- Designs data integrations and data quality framework.
- Design & Implement integration with Internal, External Systems, F1 AWS platform using Foundry Data Connector or Magritte Agent
- Collaboration with data scientists, data analyst and technology teams to document and leverage their understanding of the Foundry integration with different data sources - Actively participate in agile work practices
- Coordinating with Quality Engineer to ensure the all quality controls, naming convention & best practices have been followed
Desired Candidate Profile :
- Strong data engineering background
- Experience with Clinical Data Model is preferred
- Experience in
- SQL Server ,Postgres, Cassandra, Hadoop, and Spark for distributed data storage and parallel computing
- Java and Groovy for our back-end applications and data integration tools
- Python for data processing and analysis
- Cloud infrastructure based on AWS EC2 and S3
- 7+ years IT experience, 2+ years’ experience in Palantir Foundry Platform, 4+ years’ experience in Big Data platform
- 5+ years of Python and Pyspark development experience
- Strong troubleshooting and problem solving skills
- BTech or master's degree in computer science or a related technical field
- Experience designing, building, and maintaining big data pipelines systems
- Hands-on experience on Palantir Foundry Platform and Foundry custom Apps development
- Able to design and implement data integration between Palantir Foundry and external Apps based on Foundry data connector framework
- Hands-on in programming languages primarily Python, R, Java, Unix shell scripts
- Hand-on experience in AWS / Azure cloud platform and stack
- Strong in API based architecture and concept, able to do quick PoC using API integration and development
- Knowledge of machine learning and AI
- Skill and comfort working in a rapidly changing environment with dynamic objectives and iteration with users.
Demonstrated ability to continuously learn, work independently, and make decisions with minimal supervision