50+ PySpark Jobs in Bangalore (Bengaluru)
Apply to 50+ PySpark jobs in Bangalore (Bengaluru) on CutShort.io. Explore the latest PySpark job opportunities across top companies like Google, Amazon & Adobe.
Building the machine learning production system (or MLOps) is the biggest challenge most large companies currently face in the transition to becoming an AI-driven organization. This position is an opportunity for an experienced server-side developer to build expertise in this exciting new frontier. You will be part of a team deploying state-of-the-art AI solutions for Fractal clients.
Responsibilities
As MLOps Engineer, you will work collaboratively with Data Scientists and Data engineers to deploy and operate advanced analytics machine learning models. You’ll help automate and streamline Model development and Model operations. You’ll build and maintain tools for deployment, monitoring, and operations. You’ll also troubleshoot and resolve issues in development, testing, and production environments.
- Enable model tracking, model experimentation, and model automation
- Develop scalable ML pipelines
- Develop MLOps components in the machine learning development life cycle using a Model Repository (either of): MLflow, Kubeflow Model Registry (a minimal tracking sketch follows this list)
- Develop MLOps components in the machine learning development life cycle using Machine Learning Services (either of): Kubeflow, DataRobot, HopsWorks, Dataiku, or any relevant ML E2E PaaS/SaaS
- Work across all phases of the model development life cycle to build MLOps components
- Build the knowledge base required to deliver increasingly complex MLOps projects on Azure
- Be an integral part of client business development and delivery engagements across multiple domains
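For illustration only, here is a minimal sketch of what model tracking and registration with MLflow could look like; the experiment name, model choice, and registered model name are hypothetical and not part of this role's stated stack.

```python
# Minimal MLflow tracking sketch (illustrative; experiment/model names are hypothetical,
# and registering a model assumes a tracking server with a Model Registry).
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1_000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

mlflow.set_experiment("demo-classifier")  # hypothetical experiment name

with mlflow.start_run():
    params = {"n_estimators": 100, "max_depth": 5}
    model = RandomForestClassifier(**params).fit(X_train, y_train)

    mlflow.log_params(params)  # track hyperparameters for the experiment
    mlflow.log_metric("accuracy", accuracy_score(y_test, model.predict(X_test)))
    mlflow.sklearn.log_model(
        model, "model",
        registered_model_name="demo-classifier",  # push the run's model to the registry
    )
```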
Required Qualifications
- 3-5 years' experience building production-quality software
- B.E/B.Tech/M.Tech in Computer Science or a related technical degree, or equivalent
- Strong experience in System Integration, Application Development or Data Warehouse projects across technologies used in the enterprise space
- Knowledge of MLOps, machine learning, and Docker
- Object-oriented languages (e.g. Python, PySpark, Java, C#, C++)
- CI/CD experience (e.g. Jenkins, GitHub Actions)
- Database programming using any flavors of SQL
- Knowledge of Git for Source code management
- Ability to collaborate effectively with highly technical resources in a fast-paced environment
- Ability to solve complex challenges/problems and rapidly deliver innovative solutions
- Foundational Knowledge of Cloud Computing on Azure
- Hunger and passion for learning new skills
Building the machine learning production system (or MLOps) is the biggest challenge most large companies currently face in the transition to becoming an AI-driven organization. This position is an opportunity for an experienced server-side developer to build expertise in this exciting new frontier. You will be part of a team deploying state-of-the-art AI solutions for Fractal clients.
Responsibilities
As MLOps Engineer, you will work collaboratively with Data Scientists and Data engineers to deploy and operate advanced analytics machine learning models. You’ll help automate and streamline Model development and Model operations. You’ll build and maintain tools for deployment, monitoring, and operations. You’ll also troubleshoot and resolve issues in development, testing, and production environments.
- Enable model tracking, model experimentation, and model automation
- Develop scalable ML pipelines
- Develop MLOps components in the machine learning development life cycle using a Model Repository (either of): MLflow, Kubeflow Model Registry
- Develop MLOps components in the machine learning development life cycle using Machine Learning Services (either of): Kubeflow, DataRobot, HopsWorks, Dataiku, or any relevant ML E2E PaaS/SaaS
- Work across all phases of the model development life cycle to build MLOps components
- Build the knowledge base required to deliver increasingly complex MLOps projects on Azure
- Be an integral part of client business development and delivery engagements across multiple domains
Required Qualifications
- 5.5-9 years' experience building production-quality software
- B.E/B.Tech/M.Tech in Computer Science or a related technical degree, or equivalent
- Strong experience in System Integration, Application Development or Data Warehouse projects across technologies used in the enterprise space
- Expertise in MLOps, machine learning, and Docker
- Object-oriented languages (e.g. Python, PySpark, Java, C#, C++)
- Experience developing CI/CD components for production ready ML pipeline.
- Database programming using any flavors of SQL
- Knowledge of Git for Source code management
- Ability to collaborate effectively with highly technical resources in a fast-paced environment
- Ability to solve complex challenges/problems and rapidly deliver innovative solutions
- Team handling, problem-solving, project management, communication skills, and creative thinking
- Foundational Knowledge of Cloud Computing on Azure
- Hunger and passion for learning new skills
A leading data & analytics intelligence technology solutions provider to companies that value insights from information as a competitive advantage.
What are we looking for?
- Bachelor’s degree in analytics related area (Data Science, Computer Engineering, Computer Science, Information Systems, Engineering, or a related discipline)
- 7+ years of work experience in data science, analytics, engineering, or product management for a diverse range of projects
- Hands on experience with python, deploying ML models
- Hands on experience with time-wise tracking of model performance, and diagnosis of data/model drift
- Familiarity with Dataiku or other data-science-enabling tools (SageMaker, etc.)
- Demonstrated familiarity with distributed computing frameworks (Snowpark, PySpark)
- Experience working with various types of data (structured / unstructured)
- Deep understanding of all data science phases (e.g. data engineering, EDA, machine learning, MLOps, serving)
- Highly self-motivated to deliver both independently and with strong team collaboration
- Ability to creatively take on new challenges and work outside comfort zone
- Strong English communication skills (written & verbal)
Roles & Responsibilities:
- Clearly articulates expectations, capabilities, and action plans; actively listens with others’ frame of reference in mind; appropriately shares information with team; favorably influences people without direct authority
- Clearly articulates scope and deliverables of projects; breaks complex initiatives into detailed component parts and sequences actions appropriately; develops action plans and monitors progress independently; designs success criteria and uses them to track outcomes; drives implementation of recommendations when appropriate, engages with stakeholders throughout to ensure buy-in
- Manages projects with and through others; shares responsibility and credit; develops self and others through teamwork; comfortable providing guidance and sharing expertise with others to help them develop their skills and perform at their best; helps others take appropriate risks; communicates frequently with team members earning respect and trust of the team
- Experience in translating business priorities and vision into product/platform thinking, set clear directives to a group of team members with diverse skillsets, while providing functional & technical guidance and SME support
- Demonstrated experience interfacing with internal and external teams to develop innovative data science solutions
- Strong business analysis, product design, and product management skills
- Ability to work in a collaborative environment - reviewing peers' code, contributing to problem-solving sessions, and communicating technical knowledge to a variety of audiences (such as management, brand teams, data engineering teams, etc.)
- Ability to articulate model performance to a non-technical crowd, and ability to select appropriate evaluation criteria to evaluate hidden confounders and biases within a model
- MLOps frameworks and their use in model tracking and deployment, and automating the model serving pipeline
- Work with ML models of all sizes, from linear/logistic regression and other sklearn-style models to deep learning
- Formulate training schemes for unbiased model training (e.g. k-fold cross-validation, leave-one-out cross-validation) for parameter search and model tuning (see the sketch after this list)
- Ability to work on machine learning problems such as recommender systems and the end-to-end ML lifecycle
- Ability to manage ML on largely imbalanced training sets (<5% positive rate)
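As an illustration of the cross-validation and class-imbalance points above, here is a minimal sketch of a stratified k-fold parameter search on an imbalanced dataset; the estimator, parameter grid, and scoring metric are hypothetical choices, not requirements of the role.

```python
# Minimal k-fold cross-validation sketch for parameter search on imbalanced data (illustrative).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic dataset with roughly 5% positives, mimicking a largely imbalanced training set
X, y = make_classification(n_samples=2_000, weights=[0.95, 0.05], random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # preserves class ratio per fold
search = GridSearchCV(
    LogisticRegression(max_iter=1_000, class_weight="balanced"),
    param_grid={"C": [0.01, 0.1, 1, 10]},
    scoring="roc_auc",  # plain accuracy is misleading when positives are rare
    cv=cv,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```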
Greetings! Wissen Technology is hiring for the position of Data Engineer.
Please find the job description below for your reference:
JD:
- Design, develop, and maintain data pipelines on AWS EMR (Elastic MapReduce) to support data processing and analytics.
- Implement data ingestion processes from various sources including APIs, databases, and flat files.
- Optimize and tune big data workflows for performance and scalability.
- Collaborate with data scientists, analysts, and other stakeholders to understand data requirements and deliver solutions.
- Manage and monitor EMR clusters, ensuring high availability and reliability.
- Develop ETL (Extract, Transform, Load) processes to cleanse, transform, and store data in data lakes and data warehouses (a minimal PySpark sketch follows this list).
- Implement data security best practices to ensure data is protected and compliant with relevant regulations.
- Create and maintain technical documentation related to data pipelines, workflows, and infrastructure.
- Troubleshoot and resolve issues related to data processing and EMR cluster performance.
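For illustration only, here is a minimal PySpark ETL sketch of the kind of job such a pipeline might submit as an EMR step; the S3 bucket, paths, and column names are hypothetical.

```python
# Minimal PySpark ETL sketch for an EMR step (illustrative; bucket/paths/columns are hypothetical).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Ingest raw data from S3 (hypothetical location)
raw = spark.read.json("s3://example-bucket/raw/orders/")

# Cleanse and transform
clean = (
    raw.dropDuplicates(["order_id"])
       .filter(F.col("amount") > 0)
       .withColumn("order_date", F.to_date("order_ts"))
)

# Load into a partitioned Parquet layer of the data lake
(clean.write.mode("overwrite")
      .partitionBy("order_date")
      .parquet("s3://example-bucket/curated/orders/"))

spark.stop()
```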
Qualifications:
- Bachelor’s degree in Computer Science, Information Technology, or a related field.
- 5+ years of experience in data engineering, with a focus on big data technologies.
- Strong experience with AWS services, particularly EMR, S3, Redshift, Lambda, and Glue.
- Proficiency in programming languages such as Python, Java, or Scala.
- Experience with big data frameworks and tools such as Hadoop, Spark, Hive, and Pig.
- Solid understanding of data modeling, ETL processes, and data warehousing concepts.
- Experience with SQL and NoSQL databases.
- Familiarity with CI/CD pipelines and version control systems (e.g., Git).
- Strong problem-solving skills and the ability to work independently and collaboratively in a team environment
- Sr. Solution Architect
- Job Location – Bangalore
- Need candidates who can join in 15 days or less.
- Overall, 12-15 years of experience.
Looking for this tech stack in a Sr. Solution Architect (who also has a Delivery Manager background). Someone who has heavy business and IT stakeholder collaboration and negotiation skills, someone who can provide thought leadership, collaborate in the development of Product roadmaps, influence decisions, negotiate effectively with business and IT stakeholders, etc.
- Building data pipelines using Azure data tools and services (Azure Data Factory, Azure Databricks, Azure Functions, Spark, Azure Blob/ADLS, Azure SQL, Snowflake, etc.)
- Administration of cloud infrastructure in public clouds such as Azure
- Monitoring cloud infrastructure, applications, big data pipelines and ETL workflows
- Managing outages, customer escalations, crisis management, and other similar circumstances.
- Understanding of DevOps tools and environments like Azure DevOps, Jenkins, Git, Ansible, Terraform.
- SQL, Spark SQL, Python, PySpark
- Familiarity with agile software delivery methodologies
- Proven experience collaborating with global Product Team members, including Business Stakeholders located in NA
Responsibilities:
- Lead the design, development, and implementation of scalable data architectures leveraging Snowflake, Python, PySpark, and Databricks.
- Collaborate with business stakeholders to understand requirements and translate them into technical specifications and data models.
- Architect and optimize data pipelines for performance, reliability, and efficiency.
- Ensure data quality, integrity, and security across all data processes and systems.
- Provide technical leadership and mentorship to junior team members.
- Stay abreast of industry trends and best practices in data architecture and analytics.
- Drive innovation and continuous improvement in data management practices.
Requirements:
- Bachelor's degree in Computer Science, Information Systems, or a related field. Master's degree preferred.
- 5+ years of experience in data architecture, data engineering, or a related field.
- Strong proficiency in Snowflake, including data modeling, performance tuning, and administration.
- Expertise in Python and PySpark for data processing, manipulation, and analysis.
- Hands-on experience with Databricks for building and managing data pipelines.
- Proven leadership experience, with the ability to lead cross-functional teams and drive projects to successful completion.
- Experience in the banking or insurance domain is highly desirable.
- Excellent communication skills, with the ability to effectively collaborate with stakeholders at all levels of the organization.
- Strong problem-solving and analytical skills, with a keen attention to detail.
Benefits:
- Competitive salary and performance-based incentives.
- Comprehensive benefits package, including health insurance, retirement plans, and wellness programs.
- Flexible work arrangements, including remote options.
- Opportunities for professional development and career advancement.
- Dynamic and collaborative work environment with a focus on innovation and continuous learning.
We are actively seeking a self-motivated Data Engineer with expertise in Azure cloud and Databricks, with a thorough understanding of Delta Lake and Lake-house Architecture. The ideal candidate should excel in developing scalable data solutions, crafting platform tools, and integrating systems, while demonstrating proficiency in cloud-native database solutions and distributed data processing.
Key Responsibilities:
- Contribute to the development and upkeep of a scalable data platform, incorporating tools and frameworks that leverage Azure and Databricks capabilities.
- Exhibit proficiency in various RDBMS databases such as MySQL and SQL-Server, emphasizing their integration in applications and pipeline development.
- Design and maintain high-caliber code, including data pipelines and applications, utilizing Python, Scala, and PHP.
- Implement effective data processing solutions via Apache Spark, optimizing Spark applications for large-scale data handling.
- Optimize data storage using formats like Parquet and Delta Lake to ensure efficient data accessibility and reliable performance.
- Demonstrate understanding of Hive Metastore, Unity Catalog Metastore, and the operational dynamics of external tables.
- Collaborate with diverse teams to convert business requirements into precise technical specifications.
Requirements:
- Bachelor’s degree in Computer Science, Engineering, or a related discipline.
- Demonstrated hands-on experience with Azure cloud services and Databricks.
- Proficient programming skills in Python, Scala, and PHP.
- In-depth knowledge of SQL, NoSQL databases, and data warehousing principles.
- Familiarity with distributed data processing and external table management.
- Insight into enterprise data solutions for PIM, CDP, MDM, and ERP applications.
- Exceptional problem-solving acumen and meticulous attention to detail.
Additional Qualifications :
- Acquaintance with data security and privacy standards.
- Experience in CI/CD pipelines and version control systems, notably Git.
- Familiarity with Agile methodologies and DevOps practices.
- Competence in technical writing for comprehensive documentation.
Technical Skills:
- Ability to understand and translate business requirements into design.
- Proficient in AWS infrastructure components such as S3, IAM, VPC, EC2, and Redshift.
- Experience in creating ETL jobs using Python/PySpark.
- Proficiency in creating AWS Lambda functions for event-based jobs.
- Knowledge of automating ETL processes using AWS Step Functions.
- Competence in building data warehouses and loading data into them.
Responsibilities:
- Understand business requirements and translate them into design.
- Assess AWS infrastructure needs for development work.
- Develop ETL jobs using Python/PySpark to meet requirements.
- Implement AWS Lambda for event-based tasks (a hypothetical handler sketch follows this list).
- Automate ETL processes using AWS Step Functions.
- Build data warehouses and manage data loading.
- Engage with customers and stakeholders to articulate the benefits of proposed solutions and frameworks.
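As a sketch of the event-based pattern described above, here is a hypothetical Lambda handler that reacts to an S3 object-created event and starts a Step Functions ETL workflow; the state machine ARN and payload shape are illustrative assumptions, not a prescribed design.

```python
# Illustrative Lambda handler: on an S3 object-created event, kick off a Step Functions
# ETL workflow. The state machine ARN and the bucket layout are hypothetical.
import json
import boto3

sfn = boto3.client("stepfunctions")
STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:etl-pipeline"  # hypothetical

def lambda_handler(event, context):
    # S3 put events carry the bucket and object key of the newly arrived file
    record = event["Records"][0]["s3"]
    payload = {
        "bucket": record["bucket"]["name"],
        "key": record["object"]["key"],
    }
    # Start the ETL state machine (e.g. Glue jobs orchestrated by Step Functions)
    response = sfn.start_execution(
        stateMachineArn=STATE_MACHINE_ARN,
        input=json.dumps(payload),
    )
    return {"executionArn": response["executionArn"]}
```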
Publicis Sapient Overview:
As a Senior Associate in Data Engineering, you will translate client requirements into technical design and implement components for data engineering solutions. Utilize a deep understanding of data integration and big data design principles in creating custom solutions or implementing package solutions. You will independently drive design discussions to ensure the necessary health of the overall solution.
Job Summary:
As Senior Associate L2 in Data Engineering, you will translate client requirements into technical design and implement components for data engineering solutions. Utilize a deep understanding of data integration and big data design principles in creating custom solutions or implementing package solutions. You will independently drive design discussions to ensure the necessary health of the overall solution.
The role requires a hands-on technologist with a strong programming background in Java, Scala, or Python, experience in data ingestion, integration, data wrangling, computation, and analytics pipelines, and exposure to Hadoop ecosystem components. You are also required to have hands-on knowledge of at least one of the AWS, GCP, or Azure cloud platforms.
Role & Responsibilities:
Your role is focused on Design, Development and delivery of solutions involving:
• Data Integration, Processing & Governance
• Data Storage and Computation Frameworks, Performance Optimizations
• Analytics & Visualizations
• Infrastructure & Cloud Computing
• Data Management Platforms
• Implement scalable architectural models for data processing and storage
• Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time mode
• Build functionality for data analytics, search and aggregation
Experience Guidelines:
Mandatory Experience and Competencies:
1. Overall 5+ years of IT experience with 3+ years in data-related technologies
2. Minimum 2.5 years of experience in Big Data technologies and working exposure to at least one cloud platform and its related data services (AWS / Azure / GCP)
3. Hands-on experience with the Hadoop stack – HDFS, Sqoop, Kafka, Pulsar, NiFi, Spark, Spark Streaming, Flink, Storm, Hive, Oozie, Airflow and other components required in building end-to-end data pipelines
4. Strong experience in at least one of the programming languages Java, Scala, or Python (Java preferable)
5. Hands-on working knowledge of NoSQL and MPP data platforms like HBase, MongoDB, Cassandra, AWS Redshift, Azure SQL DW, GCP BigQuery, etc.
6. Well-versed, with working knowledge of data platform related services on at least one cloud platform, IAM and data security
Preferred Experience and Knowledge (Good to Have):
1. Good knowledge of traditional ETL tools (Informatica, Talend, etc.) and database technologies (Oracle, MySQL, SQL Server, Postgres) with hands-on experience
2. Knowledge of data governance processes (security, lineage, catalog) and tools like Collibra, Alation, etc.
3. Knowledge of distributed messaging frameworks like ActiveMQ / RabbitMQ / Solace, search & indexing, and microservices architectures
4. Performance tuning and optimization of data pipelines
5. CI/CD – infra provisioning on cloud, auto build & deployment pipelines, code quality
6. Cloud data specialty and other related Big Data technology certifications
Personal Attributes:
• Strong written and verbal communication skills
• Articulation skills
• Good team player
• Self-starter who requires minimal oversight
• Ability to prioritize and manage multiple tasks
• Process orientation and the ability to define and set up processes
Publicis Sapient Overview:
As a Senior Associate L1 in Data Engineering, you will translate client requirements into technical design and implement components for data engineering solutions. Utilize a deep understanding of data integration and big data design principles in creating custom solutions or implementing package solutions. You will independently drive design discussions to ensure the necessary health of the overall solution.
Job Summary:
As Senior Associate L1 in Data Engineering, you will do technical design and implement components for data engineering solutions. Utilize a deep understanding of data integration and big data design principles in creating custom solutions or implementing package solutions. You will independently drive design discussions to ensure the necessary health of the overall solution.
The role requires a hands-on technologist with a strong programming background in Java, Scala, or Python, experience in data ingestion, integration, data wrangling, computation, and analytics pipelines, and exposure to Hadoop ecosystem components. Hands-on knowledge of at least one of the AWS, GCP, or Azure cloud platforms is preferred.
Role & Responsibilities:
Job Title: Senior Associate L1 – Data Engineering
Your role is focused on Design, Development and delivery of solutions involving:
• Data Ingestion, Integration and Transformation
• Data Storage and Computation Frameworks, Performance Optimizations
• Analytics & Visualizations
• Infrastructure & Cloud Computing
• Data Management Platforms
• Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time
• Build functionality for data analytics, search and aggregation
Experience Guidelines:
Mandatory Experience and Competencies:
1. Overall 3.5+ years of IT experience with 1.5+ years in data-related technologies
2. Minimum 1.5 years of experience in Big Data technologies
3. Hands-on experience with the Hadoop stack – HDFS, Sqoop, Kafka, Pulsar, NiFi, Spark, Spark Streaming, Flink, Storm, Hive, Oozie, Airflow and other components required in building end-to-end data pipelines. Working knowledge of real-time data pipelines is an added advantage.
4. Strong experience in at least one of the programming languages Java, Scala, or Python (Java preferable)
5. Hands-on working knowledge of NoSQL and MPP data platforms like HBase, MongoDB, Cassandra, AWS Redshift, Azure SQL DW, GCP BigQuery, etc.
Preferred Experience and Knowledge (Good to Have):
1. Good knowledge of traditional ETL tools (Informatica, Talend, etc.) and database technologies (Oracle, MySQL, SQL Server, Postgres) with hands-on experience
2. Knowledge of data governance processes (security, lineage, catalog) and tools like Collibra, Alation, etc.
3. Knowledge of distributed messaging frameworks like ActiveMQ / RabbitMQ / Solace, search & indexing, and microservices architectures
4. Performance tuning and optimization of data pipelines
5. CI/CD – infra provisioning on cloud, auto build & deployment pipelines, code quality
6. Working knowledge of data platform related services on at least one cloud platform, IAM and data security
7. Cloud data specialty and other related Big Data technology certifications
Personal Attributes:
• Strong written and verbal communication skills
• Articulation skills
• Good team player
• Self-starter who requires minimal oversight
• Ability to prioritize and manage multiple tasks
• Process orientation and the ability to define and set up processes
Title: Lead Data Engineer
Experience: 10+ years
Budget: 32-36 LPA
Location: Bangalore
Work Mode: Work from office
Primary Skills: Databricks, Spark, PySpark, SQL, Python, AWS
Qualification: Any Engineering degree
Roles and Responsibilities:
• 8-10+ years' experience in developing scalable Big Data applications or solutions on distributed platforms.
• Able to partner with others in solving complex problems by taking a broad perspective to identify innovative solutions.
• Strong skills in building positive relationships across Product and Engineering.
• Able to influence and communicate effectively, both verbally and in writing, with team members and business stakeholders.
• Able to quickly pick up new programming languages, technologies, and frameworks.
• Experience working in Agile and Scrum development processes.
• Experience working in a fast-paced, results-oriented environment.
• Experience in Amazon Web Services (AWS), mainly S3, Managed Airflow, EMR/EC2, IAM, etc.
• Experience working with Data Warehousing tools, including SQL databases, Presto, and Snowflake.
• Experience architecting data products in Streaming, Serverless and Microservices architectures and platforms.
• Experience working with data platforms, including EMR, Airflow, Databricks (Data Engineering & Delta Lake components, and Lakehouse Medallion architecture), etc.
• Experience creating/configuring Jenkins pipelines for a smooth CI/CD process for managed Spark jobs, building Docker images, etc.
• Experience working with distributed technology tools, including Spark, Python, Scala.
• Working knowledge of Data Warehousing, Data Modelling, Governance and Data Architecture.
• Working knowledge of Reporting & Analytical tools such as Tableau, QuickSight, etc.
• Demonstrated experience in learning new technologies and skills.
• Bachelor's degree in Computer Science, Information Systems, Business, or another relevant subject area.
A LEADING US BASED MNC
Data Engineering : Senior Engineer / Manager
As Senior Engineer/Manager in Data Engineering, you will translate client requirements into technical design and implement components for data engineering solutions. Utilize a deep understanding of data integration and big data design principles in creating custom solutions or implementing package solutions. You will independently drive design discussions to ensure the necessary health of the overall solution.
Must Have skills :
1. GCP
2. Spark Streaming: live data streaming experience is desired (a minimal streaming sketch follows this list).
3. Any one coding language: Java/Python/Scala
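For illustration only, here is a minimal Spark Structured Streaming sketch that consumes a Kafka topic and lands it as Parquet; the broker, topic, and storage paths are hypothetical, and the Kafka source assumes the spark-sql-kafka connector is available on the cluster.

```python
# Illustrative Structured Streaming job: Kafka -> Parquet (broker/topic/paths are hypothetical).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("events-stream").getOrCreate()

events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker1:9092")  # hypothetical broker
         .option("subscribe", "events")                      # hypothetical topic
         .load()
         .select(F.col("value").cast("string").alias("payload"),
                 F.col("timestamp"))
)

query = (
    events.writeStream
          .format("parquet")
          .option("path", "gs://example-bucket/landing/events/")            # e.g. GCS on GCP
          .option("checkpointLocation", "gs://example-bucket/checkpoints/events/")
          .trigger(processingTime="1 minute")   # micro-batch every minute
          .start()
)
query.awaitTermination()
```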
Skills & Experience :
- Overall experience of minimum 5+ years, with minimum 4 years of relevant experience in Big Data technologies
- Hands-on experience with the Hadoop stack - HDFS, Sqoop, Kafka, Pulsar, NiFi, Spark, Spark Streaming, Flink, Storm, Hive, Oozie, Airflow and other components required in building end-to-end data pipelines. Working knowledge of real-time data pipelines is an added advantage.
- Strong experience in at least one of the programming languages Java, Scala, or Python (Java preferable)
- Hands-on working knowledge of NoSQL and MPP data platforms like HBase, MongoDB, Cassandra, AWS Redshift, Azure SQL DW, GCP BigQuery, etc.
- Well-versed and working knowledge with data platform related services on GCP
- Bachelor's degree and 6 to 12 years of work experience, or any combination of education, training and/or experience that demonstrates the ability to perform the duties of the position
Your Impact :
- Data Ingestion, Integration and Transformation
- Data Storage and Computation Frameworks, Performance Optimizations
- Analytics & Visualizations
- Infrastructure & Cloud Computing
- Data Management Platforms
- Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time
- Build functionality for data analytics, search and aggregation
AWS Glue Developer
Work Experience: 6 to 8 Years
Work Location: Noida, Bangalore, Chennai & Hyderabad
Must Have Skills: AWS Glue, DMS, SQL, Python, PySpark, data integrations and DataOps
Job Reference ID: BT/F21/IND
Job Description:
Design, build and configure applications to meet business process and application requirements.
Responsibilities:
- 7+ years of work experience with ETL, Data Modelling, and Data Architecture.
- Proficient in ETL optimization, designing, coding, and tuning big data processes using PySpark.
- Extensive experience building data platforms on AWS using core AWS services (Step Functions, EMR, Lambda, Glue, Athena, Redshift, Postgres, RDS, etc.) and designing/developing data engineering solutions.
- Orchestration using Airflow.
Technical Experience:
- Hands-on experience developing a data platform and its components: data lake, cloud data warehouse, APIs, and batch and streaming data pipelines.
- Experience building data pipelines and applications to stream and process large datasets at low latencies.
➢ Enhancements, new development, defect resolution and production support of Big data ETL development using AWS native services.
➢ Create data pipeline architecture by designing and implementing data ingestion solutions.
➢ Integrate data sets using AWS services such as Glue, Lambda functions/ Airflow.
➢ Design and optimize data models on AWS Cloud using AWS data stores such as Redshift, RDS, S3, Athena.
➢ Author ETL processes using Python and PySpark (a minimal Glue job sketch follows this list).
➢ Build Redshift Spectrum direct transformations and data modelling using data in S3.
➢ ETL process monitoring using CloudWatch events.
➢ You will be working in collaboration with other teams. Good communication is a must.
➢ Must have experience in using AWS service APIs, the AWS CLI and SDKs.
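For illustration only, here is a minimal AWS Glue (PySpark) job sketch covering a Data Catalog read, a transformation, and a Parquet write to S3 suitable for Athena or Redshift Spectrum; the database, table, and bucket names are hypothetical.

```python
# Minimal AWS Glue (PySpark) job sketch (illustrative; catalog/table/paths are hypothetical).
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a table registered in the Glue Data Catalog (hypothetical database/table)
dyf = glue_context.create_dynamic_frame.from_catalog(database="sales_db", table_name="raw_orders")

# Transform with Spark DataFrame APIs, then write Parquet to S3 for Athena / Redshift Spectrum
df = dyf.toDF().filter(F.col("amount") > 0).withColumn("order_date", F.to_date("order_ts"))
df.write.mode("append").partitionBy("order_date").parquet("s3://example-bucket/curated/orders/")

job.commit()
```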
Professional Attributes:
➢ Experience operating very large data warehouses or data lakes.
➢ Expert-level skills in writing and optimizing SQL.
➢ Extensive, real-world experience designing technology components for enterprise solutions and defining solution architectures and reference architectures with a focus on cloud technology.
➢ Must have 6+ years of big data ETL experience using Python, S3, Lambda, DynamoDB, Athena, and Glue in an AWS environment.
➢ Expertise in S3, RDS, Redshift, Kinesis, EC2 clusters highly desired.
Qualification:
➢ Degree in Computer Science, Computer Engineering or equivalent.
Salary: Commensurate with experience and demonstrated competence
at Hopscotch
About the role:
Hopscotch is looking for a passionate Data Engineer to join our team. You will work closely with other teams like data analytics, marketing, data science and individual product teams to specify, validate, prototype, scale, and deploy data pipelines features and data architecture.
Here’s what will be expected out of you:
➢ Ability to work with a fast-paced startup mindset. Should be able to manage all aspects of data extraction, transfer, and load activities.
➢ Develop data pipelines that make data available across platforms.
➢ Should be comfortable in executing ETL (Extract, Transform and Load) processes which include data ingestion, data cleaning and curation into a data warehouse, database, or data platform.
➢ Work on various aspects of the AI/ML ecosystem – data modeling, data and ML pipelines.
➢ Work closely with Devops and senior Architect to come up with scalable system and model architectures for enabling real-time and batch services.
What we want:
➢ 5+ years of experience as a data engineer or data scientist with a focus on data engineering and ETL jobs.
➢ Well versed with the concept of Data warehousing, Data Modelling and/or Data Analysis.
➢ Experience using and building pipelines and performing ETL with industry-standard best practices on Redshift (more than 2 years).
➢ Ability to troubleshoot and solve performance issues with data ingestion, data processing & query execution on Redshift.
➢ Good understanding of orchestration tools like Airflow (a skeleton DAG sketch follows this list).
➢ Strong Python and SQL coding skills.
➢ Strong experience in distributed systems like Spark.
➢ Experience with AWS data and ML technologies (AWS Glue, MWAA, Data Pipeline, EMR, Athena, Redshift, Lambda, etc.).
➢ Solid hands-on experience with various data extraction techniques like CDC or time/batch based, and the related tools (Debezium, AWS DMS, Kafka Connect, etc.) for near real-time and batch data extraction.
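As a sketch of the Airflow orchestration mentioned above, here is a skeleton DAG for a daily extract-and-load into Redshift; the task logic is stubbed out, and the DAG id, schedule, and task names are hypothetical.

```python
# Illustrative Airflow DAG skeleton for a daily ETL into Redshift (names/schedule are hypothetical).
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_to_s3():
    # e.g. pull incremental changes (CDC) from the source and stage them in S3
    ...

def copy_into_redshift():
    # e.g. issue a COPY command to load the staged files into Redshift
    ...

with DAG(
    dag_id="orders_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_to_s3", python_callable=extract_to_s3)
    load = PythonOperator(task_id="copy_into_redshift", python_callable=copy_into_redshift)
    extract >> load  # run the load only after the extract succeeds
```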
Note: Experience with product-based companies or e-commerce companies is an added advantage.
Job Description:
As an Azure Data Engineer, your role will involve designing, developing, and maintaining data solutions on the Azure platform. You will be responsible for building and optimizing data pipelines, ensuring data quality and reliability, and implementing data processing and transformation logic. Your expertise in Azure Databricks, Python, SQL, Azure Data Factory (ADF), PySpark, and Scala will be essential for performing the following key responsibilities:
Designing and developing data pipelines: You will design and implement scalable and efficient data pipelines using Azure Databricks, PySpark, and Scala. This includes data ingestion, data transformation, and data loading processes.
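For illustration only, here is a minimal sketch of such an ingestion, transformation, and loading step as a Databricks (PySpark) notebook cell, where `spark` is the session provided by the runtime; the ADLS paths, table name, and columns are hypothetical.

```python
# Hypothetical Databricks (PySpark) notebook cell: ingest -> transform -> load into Delta.
# Storage paths, table and column names are illustrative only; `spark` is provided by Databricks.
from pyspark.sql import functions as F

# Ingestion: read raw CSV landed in ADLS Gen2 (hypothetical container/path)
raw = (spark.read.option("header", "true")
            .csv("abfss://raw@examplelake.dfs.core.windows.net/sales/"))

# Transformation: typing, de-duplication, derived columns
sales = (raw.withColumn("amount", F.col("amount").cast("double"))
            .dropDuplicates(["sale_id"])
            .withColumn("sale_date", F.to_date("sale_ts")))

# Loading: append into a partitioned Delta table for downstream analytics
(sales.write.format("delta")
      .mode("append")
      .partitionBy("sale_date")
      .saveAsTable("analytics.sales"))  # hypothetical schema.table
```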
Data modeling and database design: You will design and implement data models to support efficient data storage, retrieval, and analysis. This may involve working with relational databases, data lakes, or other storage solutions on the Azure platform.
Data integration and orchestration: You will leverage Azure Data Factory (ADF) to orchestrate data integration workflows and manage data movement across various data sources and targets. This includes scheduling and monitoring data pipelines.
Data quality and governance: You will implement data quality checks, validation rules, and data governance processes to ensure data accuracy, consistency, and compliance with relevant regulations and standards.
Performance optimization: You will optimize data pipelines and queries to improve overall system performance and reduce processing time. This may involve tuning SQL queries, optimizing data transformation logic, and leveraging caching techniques.
Monitoring and troubleshooting: You will monitor data pipelines, identify performance bottlenecks, and troubleshoot issues related to data ingestion, processing, and transformation. You will work closely with cross-functional teams to resolve data-related problems.
Documentation and collaboration: You will document data pipelines, data flows, and data transformation processes. You will collaborate with data scientists, analysts, and other stakeholders to understand their data requirements and provide data engineering support.
Skills and Qualifications:
Strong experience with Azure Databricks, Python, SQL, ADF, PySpark, and Scala.
Proficiency in designing and developing data pipelines and ETL processes.
Solid understanding of data modeling concepts and database design principles.
Familiarity with data integration and orchestration using Azure Data Factory.
Knowledge of data quality management and data governance practices.
Experience with performance tuning and optimization of data pipelines.
Strong problem-solving and troubleshooting skills related to data engineering.
Excellent collaboration and communication skills to work effectively in cross-functional teams.
Understanding of cloud computing principles and experience with Azure services.
About Kloud9:
Kloud9 exists with the sole purpose of providing cloud expertise to the retail industry. Our team of cloud architects, engineers and developers help retailers launch a successful cloud initiative so you can quickly realise the benefits of cloud technology. Our standardised, proven cloud adoption methodologies reduce the cloud adoption time and effort so you can directly benefit from lower migration costs.
Kloud9 was founded with the vision of bridging the gap between E-commerce and cloud. The E-commerce of any industry is limiting and poses a huge challenge in terms of the finances spent on physical data structures.
At Kloud9, we know migrating to the cloud is the single most significant technology shift your company faces today. We are your trusted advisors in transformation and are determined to build a deep partnership along the way. Our cloud and retail experts will ease your transition to the cloud.
Our sole focus is to provide cloud expertise to retail industry giving our clients the empowerment that will take their business to the next level. Our team of proficient architects, engineers and developers have been designing, building and implementing solutions for retailers for an average of more than 20 years.
We are a cloud vendor that is both platform and technology independent. Our vendor independence not just provides us with a unique perspective into the cloud market but also ensures that we deliver the cloud solutions available that best meet our clients' requirements.
What we are looking for:
● 3+ years’ experience developing Data & Analytic solutions
● Experience building data lake solutions leveraging one or more of the following: AWS EMR, S3, Hive & Spark
● Experience with relational SQL
● Experience with scripting languages such as Shell, Python
● Experience with source control tools such as GitHub and related dev process
● Experience with workflow scheduling tools such as Airflow
● In-depth knowledge of scalable cloud
● Has a passion for data solutions
● Strong understanding of data structures and algorithms
● Strong understanding of solution and technical design
● Has a strong problem-solving and analytical mindset
● Experience working with Agile Teams.
● Able to influence and communicate effectively, both verbally and written, with team members and business stakeholders
● Able to quickly pick up new programming languages, technologies, and frameworks
● Bachelor’s Degree in computer science
Why Explore a Career at Kloud9:
With job opportunities in prime locations of US, London, Poland and Bengaluru, we help build your career paths in cutting edge technologies of AI, Machine Learning and Data Science. Be part of an inclusive and diverse workforce that's changing the face of retail technology with their creativity and innovative solutions. Our vested interest in our employees translates to deliver the best products and solutions to our customers.
About Kloud9:
Kloud9 exists with the sole purpose of providing cloud expertise to the retail industry. Our team of cloud architects, engineers and developers help retailers launch a successful cloud initiative so you can quickly realise the benefits of cloud technology. Our standardised, proven cloud adoption methodologies reduce the cloud adoption time and effort so you can directly benefit from lower migration costs.
Kloud9 was founded with the vision of bridging the gap between E-commerce and cloud. The E-commerce of any industry is limiting and poses a huge challenge in terms of the finances spent on physical data structures.
At Kloud9, we know migrating to the cloud is the single most significant technology shift your company faces today. We are your trusted advisors in transformation and are determined to build a deep partnership along the way. Our cloud and retail experts will ease your transition to the cloud.
Our sole focus is to provide cloud expertise to retail industry giving our clients the empowerment that will take their business to the next level. Our team of proficient architects, engineers and developers have been designing, building and implementing solutions for retailers for an average of more than 20 years.
We are a cloud vendor that is both platform and technology independent. Our vendor independence not just provides us with a unique perspective into the cloud market but also ensures that we deliver the cloud solutions available that best meet our clients' requirements.
● Overall 8+ Years of Experience in Web Application development.
● 5+ years of development experience with Java 8, Spring Boot, microservices and middleware
● 3+ years of designing middleware using the Node.js platform.
● Good to have: 2+ years of experience using Node.js along with the AWS Serverless platform.
● Good experience with JavaScript/TypeScript, event loops, Express.js, GraphQL, SQL DBs (MySQL), NoSQL DBs (MongoDB) and YAML templates.
● Good experience with TDD (test-driven development) and automated unit testing.
● Good experience with exposing and consuming REST APIs in Java 8, the Spring Boot platform and Swagger API contracts.
● Good experience in building Node.js middleware performing transformations, routing, aggregation, orchestration and authentication (JWT/OAuth).
● Experience supporting and working with cross-functional teams in a dynamic environment.
● Experience working in Agile Scrum Methodology.
● Very good Problem-Solving Skills.
● Very good learner and passion for technology.
● Excellent verbal and written communication skills in English
● Ability to communicate effectively with team members and business stakeholders
Secondary Skill Requirements:
● Experience working with any of Loopback, NestJS, Hapi.JS, Sails.JS, Passport.JS
Why Explore a Career at Kloud9:
With job opportunities in prime locations of US, London, Poland and Bengaluru, we help build your career paths in cutting edge technologies of AI, Machine Learning and Data Science. Be part of an inclusive and diverse workforce that's changing the face of retail technology with their creativity and innovative solutions. Our vested interest in our employees translates to deliver the best products and solutions to our customers.
Data Engineer
- Highly skilled and proficient in Azure Data Engineering tech stacks (ADF, Databricks).
- Should be well experienced in the design and development of Big Data integration platforms (Kafka, Hadoop).
- Highly skilled and experienced in building medium to complex data integration pipelines for data at rest and streaming data using Spark.
- Strong knowledge of R/Python.
- Advanced proficiency in solution design and implementation through Azure Data Lake, SQL and NoSQL databases.
- Strong in Data Warehousing concepts.
- Expertise in SQL, SQL tuning, data management (data security), schema design, Python and ETL processes.
- Highly motivated, a self-starter and a quick learner.
- Must have good knowledge of data modelling and an understanding of data analytics.
- Exposure to statistical procedures, experiments and Machine Learning techniques is an added advantage.
- Experience in leading a small team of 6-7 Data Engineers.
- Excellent written and verbal communication skills.
Top Management Consulting Company
We are looking for a Machine Learning Engineer for one of our premium clients.
Experience: 2-9 years
Location: Gurgaon/Bangalore
Tech Stack:
Python, PySpark, the Python Scientific Stack; MLFlow, Grafana, Prometheus for machine learning pipeline management and monitoring; SQL, Airflow, Databricks, our own open-source data pipelining framework called Kedro, Dask/RAPIDS; Django, GraphQL and ReactJS for horizontal product development; container technologies such as Docker and Kubernetes, CircleCI/Jenkins for CI/CD, cloud solutions such as AWS, GCP, and Azure as well as Terraform and Cloudformation for deployment
Skills and requirements
- Experience analyzing complex and varied data in a commercial or academic setting.
- Desire to solve new and complex problems every day.
- Excellent ability to communicate scientific results to both technical and non-technical team members.
Desirable
- A degree in a numerically focused discipline such as Maths, Physics, Chemistry, Engineering or Biological Sciences.
- Hands-on experience with Python, PySpark, SQL
- Hands-on experience building end-to-end data pipelines.
- Hands-on experience with Azure Data Factory, Azure Databricks, Data Lake - added advantage
- Hands-on experience in building data pipelines.
- Experience with Big Data tools: Hadoop, Hive, Sqoop, Spark, Spark SQL
- Experience with SQL or NoSQL databases for the purposes of data retrieval and management.
- Experience in data warehousing and business intelligence tools, techniques and technology, as well as experience in diving deep on data analysis or technical issues to come up with effective solutions.
- BS degree in math, statistics, computer science or equivalent technical field.
- Experience in data mining structured and unstructured data (SQL, ETL, data warehouse, Machine Learning etc.) in a business environment with large-scale, complex data sets.
- Proven ability to look at solutions in unconventional ways. Sees opportunities to innovate and can lead the way.
- Willing to learn and work on Data Science, ML, AI.
Hi,
We are hiring for Data Scientist for Bangalore.
Req Skills:
- NLP
- ML programming
- Spark
- Model Deployment
- Experience processing unstructured data and building NLP models
- Experience with big data tools such as PySpark
- Pipeline orchestration using Airflow and model deployment experience is preferred
Hi,
Enterprise Minds is looking for a Data Scientist.
Strong in Python and PySpark.
Prefer immediate joiners.
BRIEF DESCRIPTION:
At least 1 year of Python, Spark, SQL, and data engineering experience
Primary Skillset: PySpark, Scala/Python/Spark, Azure Synapse, S3, Redshift/Snowflake
Relevant Experience: Legacy ETL job Migration to AWS Glue / Python & Spark combination
ROLE SCOPE:
Reverse engineer the existing/legacy ETL jobs
Create the workflow diagrams and review the logic diagrams with Tech Leads
Write equivalent logic in Python & Spark
Unit test the Glue jobs and certify the data loads before passing to system testing
Follow the best practices, enable appropriate audit & control mechanism
Analytically skillful, identify the root causes quickly and efficiently debug issues
Take ownership of the deliverables and support the deployments
REQUIREMENTS:
Create data pipelines for data integration into Cloud stacks eg. Azure Synapse
Code data processing jobs in Azure Synapse Analytics, Python, and Spark
Experience in dealing with structured, semi-structured, and unstructured data in batch and real-time environments.
Should be able to process .json, .parquet and .avro files
PREFERRED BACKGROUND:
Tier1/2 candidates from IIT/NIT/IIITs
However, relevant experience, learning attitude takes precedence
We are looking for an exceptionally talented Lead Data Engineer who has exposure to implementing AWS services to build data pipelines, API integration and designing data warehouses. A candidate with both hands-on and leadership capabilities will be ideal for this position.
Qualification: At least a bachelor’s degree in Science, Engineering, Applied Mathematics. Preferred Masters degree
Job Responsibilities:
• Total 6+ years of experience as a Data Engineer and 2+ years of experience in managing a team
• Have minimum 3 years of AWS Cloud experience.
• Well versed in languages such as Python, PySpark, SQL, Node.js, etc.
• Has extensive experience in the Spark ecosystem and has worked on both real-time and batch processing
• Have experience in AWS Glue, EMR, DMS, Lambda, S3, DynamoDB, Step functions, Airflow, RDS, Aurora etc.
• Experience with modern Database systems such as Redshift, Presto, Hive etc.
• Worked on building data lakes in the past on S3 or Apache Hudi
• Solid understanding of Data Warehousing Concepts
• Good to have experience on tools such as Kafka or Kinesis
• Good to have AWS Developer Associate or Solutions Architect Associate Certification
• Have experience in managing a team
Must Have Skills:
• Good experience in PySpark, including DataFrame core functions and Spark SQL (a minimal sketch follows this list)
• Good experience in SQL DBs - Be able to write queries including fair complexity.
• Should have excellent experience in Big Data programming for data transformation and aggregations
• Good at ELT architecture. Business rules processing and data extraction from Data Lake into data streams for business consumption.
• Good customer communication.
• Good Analytical skills
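For illustration only, here is a minimal sketch showing the same business rule expressed with DataFrame core functions and with Spark SQL; the data-lake path, column names, and threshold are hypothetical.

```python
# Minimal sketch of DataFrame core functions alongside Spark SQL (illustrative paths/columns).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("elt-demo").getOrCreate()

orders = spark.read.parquet("s3://example-lake/curated/orders/")  # hypothetical data-lake path

# DataFrame API: apply a business rule and aggregate per customer
high_value = (orders.filter(F.col("amount") >= 1000)
                    .groupBy("customer_id")
                    .agg(F.sum("amount").alias("total_amount"),
                         F.count("*").alias("order_count")))

# Equivalent logic expressed in Spark SQL
orders.createOrReplaceTempView("orders")
high_value_sql = spark.sql("""
    SELECT customer_id, SUM(amount) AS total_amount, COUNT(*) AS order_count
    FROM orders
    WHERE amount >= 1000
    GROUP BY customer_id
""")

high_value.show()
```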
Responsibilities:
* 3+ years of Data Engineering Experience - Design, develop, deliver and maintain data infrastructures.
* SQL Specialist – Strong knowledge and Seasoned experience with SQL Queries
* Languages: Python
* Good communicator, shows initiative, works well with stakeholders.
* Experience working closely with Data Analysts and provide the data they need and guide them on the issues.
* Solid ETL experience and Hadoop/Hive/PySpark/Presto/Spark SQL
* Solid communication and articulation skills
* Able to handle stakeholders independently with less interventions of reporting manager.
* Develop strategies to solve problems in logical yet creative ways.
* Create custom reports and presentations accompanied by strong data visualization and storytelling
We would be excited if you have:
* Excellent communication and interpersonal skills
* Ability to meet deadlines and manage project delivery
* Excellent report-writing and presentation skills
* Critical thinking and problem-solving capabilities
We have an urgent requirement for a Data Engineer/Sr. Data Engineer for a reputed MNC.
Exp: 4-9 yrs
Location: Pune/Bangalore/Hyderabad
Skills: We need candidates with either Python + AWS, PySpark + AWS, or Spark + Scala
at Persistent Systems
We have an urgent requirement for Big Data Developer profiles at our reputed MNC.
Location: Pune/Bangalore/Hyderabad/Nagpur
Experience: 4-9 yrs
Skills: PySpark + AWS, or Spark + Scala + AWS, or Python + AWS
Work closely with the Kinara management team to investigate strategically important business questions.
Lead a team through the entire analytical and machine learning model life cycle:
- Define the problem statement
- Build and clean datasets
- Exploratory data analysis
- Feature engineering
- Apply ML algorithms and assess the performance
- Code for deployment
- Code testing and troubleshooting
- Communicate analysis to stakeholders
- Manage Data Analysts and Data Scientists
Hiring for one of the MNC for India location
Key Responsibilities (Data Developer – Python, Spark):
Exp: 2 to 9 yrs
Development of data platforms, integration frameworks, processes, and code.
Develop and deliver APIs in Python or Scala for Business Intelligence applications built using a range of web languages
Develop comprehensive automated tests for features via end-to-end integration tests, performance tests, acceptance tests and unit tests.
Elaborate stories in a collaborative agile environment (SCRUM or Kanban)
Familiarity with cloud platforms like GCP, AWS or Azure.
Experience with large data volumes.
Familiarity with writing rest-based services.
Experience with distributed processing and systems
Experience with Hadoop / Spark toolsets
Experience with relational database management systems (RDBMS)
Experience with Data Flow development
Knowledge of Agile and associated development techniques.
A global Business Process Management company
Power BI Developer
A senior visualization engineer with 5 years' experience in Power BI, to develop and deliver solutions that enable delivery of information to audiences in support of key business processes. In addition, hands-on experience with Azure data services like ADF and Databricks is a must.
Ensure code and design quality through execution of test plans and assist in development of standards & guidelines working closely with internal and external design, business, and technical counterparts.
Candidates should have worked in agile development environments.
Desired Competencies:
- Should have minimum of 3 years project experience using Power BI on Azure stack.
- Should have good understanding and working knowledge of Data Warehouse and Data Modelling.
- Good hands-on experience of Power BI
- Hands-on experience T-SQL/ DAX/ MDX/ SSIS
- Data Warehousing on SQL Server (preferably 2016)
- Experience in Azure Data Services – ADF, Databricks & PySpark
- Manage own workload with minimum supervision.
- Take responsibility for projects or issues assigned to them
- Be personable, flexible and a team player
- Good written and verbal communications
- Have a strong personality who will be able to operate directly with users
at Virtusa
- Minimum 1 year of relevant experience in PySpark (mandatory)
- Hands-on experience in developing, testing, deploying, maintaining and improving data integration pipelines in an AWS cloud environment is an added plus
- Ability to play a lead role and independently manage a 3-5 member PySpark development team
- EMR, Python and PySpark are mandatory.
- Knowledge and awareness of working with AWS Cloud technologies like Apache Spark, Glue, Kafka, Kinesis, and Lambda, along with S3, Redshift and RDS
Must Have Skills:
- Performs analytics to extract insights from the organization's raw historical data.
- Generates usable training datasets for any/all MV projects with the help of annotators, if needed.
- Analyses user trends and identifies their biggest bottlenecks in the Hammoq workflow.
- Tests the short/long-term impact of productized MV models on those trends.
- Skills: NumPy, Pandas, Apache Spark, PySpark, ETL (mandatory).
- Sr. Data Engineer:
Core Skills – Data Engineering, Big Data, PySpark, Spark SQL and Python
Candidate with prior Palantir Cloud Foundry OR Clinical Trial Data Model background is preferred
Major accountabilities:
- Responsible for Data Engineering, Foundry Data Pipeline Creation, Foundry Analysis & Reporting, Slate Application development, re-usable code development & management and Integrating Internal or External System with Foundry for data ingestion with high quality.
- Have a good understanding of the Foundry platform landscape and its capabilities
- Performs data analysis required to troubleshoot data related issues and assist in the resolution of data issues.
- Defines company data assets (data models) and PySpark/Spark SQL jobs to populate data models.
- Designs data integrations and data quality framework.
- Design & Implement integration with Internal, External Systems, F1 AWS platform using Foundry Data Connector or Magritte Agent
- Collaboration with data scientists, data analysts and technology teams to document and leverage their understanding of the Foundry integration with different data sources
- Actively participate in agile work practices
- Coordinating with Quality Engineers to ensure that all quality controls, naming conventions & best practices have been followed
Desired Candidate Profile :
- Strong data engineering background
- Experience with Clinical Data Model is preferred
- Experience in
- SQL Server ,Postgres, Cassandra, Hadoop, and Spark for distributed data storage and parallel computing
- Java and Groovy for our back-end applications and data integration tools
- Python for data processing and analysis
- Cloud infrastructure based on AWS EC2 and S3
- 7+ years of IT experience, 2+ years' experience with the Palantir Foundry Platform, 4+ years' experience with Big Data platforms
- 5+ years of Python and PySpark development experience
- Strong troubleshooting and problem solving skills
- BTech or master's degree in computer science or a related technical field
- Experience designing, building, and maintaining big data pipelines systems
- Hands-on experience on Palantir Foundry Platform and Foundry custom Apps development
- Able to design and implement data integration between Palantir Foundry and external Apps based on Foundry data connector framework
- Hands-on in programming languages, primarily Python, R, Java, and Unix shell scripting
- Hands-on experience with the AWS / Azure cloud platform and stack
- Strong in API based architecture and concept, able to do quick PoC using API integration and development
- Knowledge of machine learning and AI
- Skill and comfort working in a rapidly changing environment with dynamic objectives and iteration with users.
- Demonstrated ability to continuously learn, work independently, and make decisions with minimal supervision
- Building and operationalizing large-scale enterprise data solutions and applications using one or more Azure data and analytics services in combination with custom solutions - Azure Synapse/Azure SQL DWH, Azure Data Lake, Azure Blob Storage, Spark, HDInsight, Databricks, CosmosDB, Event Hub/IoT Hub.
- Experience in migrating on-premise data warehouses to data platforms on the Azure cloud.
- Designing and implementing data engineering, ingestion, and transformation functions
-
Azure Synapse or Azure SQL data warehouse
-
Spark on Azure is available in HD insights and data bricks
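For context, a minimal PySpark sketch of the ingestion-and-transformation work described above, as it might look in an Azure Databricks notebook; the storage account, container, and table names are hypothetical, and Delta is assumed to be available (it is by default on Databricks).

```python
# Hypothetical Databricks-style job: ingest raw files from ADLS Gen2,
# apply a simple transformation, and persist a curated Delta table.
from pyspark.sql import SparkSession, functions as F

# In Databricks notebooks `spark` is predefined; getOrCreate() keeps this self-contained.
spark = SparkSession.builder.appName("adls-ingest-sketch").getOrCreate()

raw_path = "abfss://raw@examplestorageacct.dfs.core.windows.net/sales/"

raw = (spark.read
       .option("header", True)
       .csv(raw_path))

curated = (raw
           .withColumn("sale_date", F.to_date("sale_ts"))
           .withColumn("amount", F.col("amount").cast("double"))
           .filter(F.col("amount") > 0))

# Persist as a Delta table for downstream consumption (e.g. Synapse, Power BI).
(curated.write
        .format("delta")
        .mode("overwrite")
        .saveAsTable("curated.sales"))
```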
- Experience with Azure Analysis Services
- Experience in Power BI
- Experience with third-party solutions like Attunity / StreamSets and Informatica
- Experience with PreSales activities (Responding to RFPs, Executing Quick POCs)
- Capacity Planning and Performance Tuning on Azure Stack and Spark.
- Setting KPIs, monitoring key trends, and helping stakeholders by generating insights from the data delivered.
- Understanding user behaviour and performing root-cause analysis of changes in data trends across different verticals.
- Get answers to business questions, identify areas of improvement, and identify opportunities for growth.
- Work on ad-hoc requests for data and analysis.
- Work with cross-functional teams as and when required to automate reports and create informative dashboards based on problem statements.
WHO COULD BE A GREAT FIT:
Functional Experience
- 1-2 years of experience working in Analytics as a Business or Data Analyst.
- Analytical mind with a problem-solving aptitude.
- Familiarity with Microsoft Azure & AWS, PySpark, Python, Databricks and Metabase; understanding of APIs, data warehouses, ETL, etc.
- Proficient in writing complex SQL queries.
- Experience performing hands-on analysis on data across multiple datasets and databases, primarily using Excel, Google Sheets and R.
- Ability to work across cross-functional teams proactively.
- Bring in industry best practices around creating and maintaining robust data pipelines for complex data projects with/without AI component
- programmatically ingesting data from several static and real-time sources (incl. web scraping)
- rendering results through dynamic interfaces incl. web / mobile / dashboards, with the ability to log usage and granular user feedback
- performance tuning and optimal implementation of complex Python scripts (using Spark), SQL (using stored procedures, Hive), and NoSQL queries in a production environment (see the tuning sketch after this list)
- Industrialize ML / DL solutions and deploy and manage production services; proactively handle data issues arising on live apps
- Perform ETL on large and complex datasets for AI applications - work closely with data scientists on performance optimization of large-scale ML/DL model training
- Build data tools to facilitate fast data cleaning and statistical analysis
- Ensure data architecture is secure and compliant
- Resolve issues escalated from Business and Functional areas on data quality, accuracy, and availability
- Work closely with APAC CDO and coordinate with a fully decentralized team across different locations in APAC and global HQ (Paris).
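As a small illustration of the performance-tuning work mentioned above, a hedged sketch of two routine Spark optimisations (broadcasting a small lookup table and controlling partitioning before a wide write); the dataset names and paths are hypothetical.

```python
# Hypothetical sketch of two common Spark optimisations.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("tuning-sketch").getOrCreate()

events = spark.read.parquet("/data/events")        # large fact-like dataset
countries = spark.read.parquet("/data/countries")  # small lookup table

# Broadcasting the small side avoids a full shuffle join.
joined = events.join(F.broadcast(countries), on="country_code", how="left")

# Repartition by the write key so output files are evenly sized
# and downstream reads can prune by partition.
(joined.repartition("event_date")
       .write
       .partitionBy("event_date")
       .mode("overwrite")
       .parquet("/data/curated/events_by_date"))
```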
You should be
- Expert in structured and unstructured data in traditional and Big Data environments – Oracle / SQL Server, MongoDB, Hive / Pig, BigQuery, and Spark
- Have excellent knowledge of Python programming both in traditional and distributed models (PySpark)
- Expert in shell scripting and writing schedulers
- Hands-on experience with Cloud - deploying complex data solutions in hybrid cloud / on-premise environment both for data extraction/storage and computation
- Hands-on experience in deploying production apps using large volumes of data with state-of-the-art technologies like Docker, Kubernetes, and Kafka
- Strong knowledge of data security best practices
- 5+ years experience in a data engineering role
- Science / Engineering graduate from a Tier-1 university in the country
- And most importantly, you must be a passionate coder who really cares about building apps that can help people do things better, smarter, and faster even when they sleep
IT solutions specialized in Apps Lifecycle management. (MG1)
- Automate and maintain ML and Data pipelines at scale
- Collaborate with Data Scientists and Data Engineers on feature development teams to containerize and build out deployment pipelines for new modules
- Maintain and expand our on-prem deployments with spark clusters
- Design, build and optimize applications containerization and orchestration with Docker and Kubernetes and AWS or Azure
- 5 years of IT experience in data-driven or AI technology products
- Understanding of ML Model Deployment and Lifecycle
- Extensive experience with Apache Airflow for MLOps workflow automation (a minimal DAG sketch follows after this list)
- Experience in building and automating data pipelines
- Experience in working on Spark Cluster architecture
- Extensive experience with Unix/Linux environments
- Experience with standard concepts and technologies used in CI/CD build, deployment pipelines using Jenkins
- Strong experience in Python and PySpark and building required automation (using standard technologies such as Docker, Jenkins, and Ansible).
- Experience with Kubernetes or Docker Swarm
- Working technical knowledge of current systems software, protocols, and standards, including firewalls, Active Directory, etc.
- Basic knowledge of Multi-tier architectures: load balancers, caching, web servers, application servers, and databases.
- Experience with various virtualization technologies and multi-tenant, private and hybrid cloud environments.
- Hands-on software and hardware troubleshooting experience.
- Experience documenting and maintaining configuration and process information.
- Basic knowledge of machine learning frameworks: TensorFlow, Caffe/Caffe2, PyTorch
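For reference, a minimal Airflow DAG sketch of the kind of MLOps workflow automation mentioned above; the DAG id, task names, and the placeholder callables are all hypothetical.

```python
# Hypothetical Airflow DAG for a simple MLOps workflow: ingest -> train -> deploy.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest():
    print("pull and validate training data")   # placeholder step

def train():
    print("train and register model")          # placeholder step

def deploy():
    print("promote model to serving")          # placeholder step

with DAG(
    dag_id="ml_pipeline_example",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_ingest = PythonOperator(task_id="ingest", python_callable=ingest)
    t_train = PythonOperator(task_id="train", python_callable=train)
    t_deploy = PythonOperator(task_id="deploy", python_callable=deploy)

    t_ingest >> t_train >> t_deploy
```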
Azure – Data Engineer
- At least 2 years hands on experience working with an Agile data engineering team working on big data pipelines using Azure in a commercial environment.
- Dealing with senior stakeholders/leadership
- Understanding of Azure data security and encryption best practices. [ADFS/ACLs]
Databricks – experience writing in and using Databricks, using Python to transform and manipulate data.
Data Factory – experience using Data Factory in an enterprise solution to build data pipelines; experience calling REST APIs.
Synapse / data warehouse – experience using Synapse / a data warehouse to present data securely and to build & manage data models.
Microsoft SQL Server – we'd expect the candidate to have come from a SQL/data background and progressed into Azure.
Power BI – experience with this is preferred.
Additionally
- Experience using GIT as a source control system
- Understanding of DevOps concepts and application
- Understanding of Azure Cloud costs/management and running platforms efficiently
- Creating, designing and developing data models
- Prepare plans for all ETL (Extract/Transformation/Load) procedures and architectures
- Validating results and creating business reports
- Monitoring and tuning data loads and queries
- Develop and prepare a schedule for a new data warehouse
- Analyze large databases and recommend appropriate optimization for the same
- Administer all requirements and design various functional specifications for data
- Provide support to the Software Development Life cycle
- Prepare various code designs and ensure efficient implementation of the same
- Evaluate all codes and ensure the quality of all project deliverables
- Monitor data warehouse work and provide subject matter expertise
- Hands-on BI practices, data structures, data modeling, SQL skills
- Minimum 1 year of experience in PySpark
JD for IOT DE:
The role requires experience in Azure core technologies – IoT Hub/ Event Hub, Stream Analytics, IoT Central, Azure Data Lake Storage, Azure Cosmos, Azure Data Factory, Azure SQL Database, Azure HDInsight / Databricks, SQL data warehouse.
You Have:
- Minimum 2 years of software development experience
- Minimum 2 years of experience in IoT/streaming data pipelines solution development
- Bachelor's and/or Master’s degree in computer science
- Strong Consulting skills in data management including data governance, data quality, security, data integration, processing, and provisioning
- Delivered data management projects with real-time/near real-time data insights delivery on Azure Cloud
- Translated complex analytical requirements into the technical design including data models, ETLs, and Dashboards / Reports
- Experience deploying dashboards and self-service analytics solutions on both relational and non-relational databases
- Experience with different computing paradigms in databases such as In-Memory, Distributed, Massively Parallel Processing
- Successfully delivered large scale IOT data management initiatives covering Plan, Design, Build and Deploy phases leveraging different delivery methodologies including Agile
- Experience in handling telemetry data with Spark Streaming, Kafka, Flink, Scala, PySpark and Spark SQL (a streaming sketch follows after this list).
- Hands-on experience with containers and Docker
- Exposure to streaming protocols like MQTT and AMQP
- Knowledge of OT network protocols like OPC UA, CAN Bus, and similar protocols
- Strong knowledge of continuous integration, static code analysis, and test-driven development
- Experience in delivering projects in a highly collaborative delivery model with teams at onsite and offshore
- Must have excellent analytical and problem-solving skills
- Delivered change management initiatives focused on driving data platforms adoption across the enterprise
- Strong verbal and written communications skills are a must, as well as the ability to work effectively across internal and external organizations
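As an illustration of the telemetry-handling experience mentioned above, a minimal Spark Structured Streaming sketch that reads from Kafka and lands data as Parquet; the broker address, topic, schema, and paths are placeholders.

```python
# Hypothetical Structured Streaming job: device telemetry from Kafka to Parquet.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("iot-telemetry-stream").getOrCreate()

telemetry_schema = StructType([
    StructField("device_id", StringType()),
    StructField("temperature", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw_stream = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "device-telemetry")
              .load())

# Kafka values arrive as bytes; cast to string and parse the JSON payload.
parsed = (raw_stream
          .select(F.from_json(F.col("value").cast("string"), telemetry_schema).alias("t"))
          .select("t.*"))

query = (parsed.writeStream
         .format("parquet")
         .option("checkpointLocation", "/checkpoints/device-telemetry")
         .outputMode("append")
         .start("/curated/device_telemetry"))
```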
Roles & Responsibilities
You Will:
- Translate functional requirements into technical design
- Interact with clients and internal stakeholders to understand the data and platform requirements in detail and determine core Azure services needed to fulfill the technical design
- Design, Develop and Deliver data integration interfaces in ADF and Azure Databricks
- Design, Develop and Deliver data provisioning interfaces to fulfill consumption needs
- Deliver data models on the Azure platform, whether on Azure Cosmos DB, SQL DW / Synapse, or SQL
- Advise clients on ML Engineering and deploying ML Ops at Scale on AKS
- Automate core activities to minimize the delivery lead times and improve the overall quality
- Optimize platform cost by selecting the right platform services and architecting the solution in a cost-effective manner
- Deploy Azure DevOps and CI/CD processes
- Deploy logging and monitoring across the different integration points for critical alerts
Must Have Skills:
- Solid Knowledge on DWH, ETL and Big Data Concepts
- Excellent SQL skills (with knowledge of SQL analytics functions; see the example after this list)
- Working experience with any ETL tool, e.g. SSIS / Informatica
- Working Experience on any Azure or AWS Big Data Tools.
- Experience implementing data jobs (batch / real-time streaming)
- Excellent written and verbal communication skills in English; self-motivated, with a strong sense of ownership and readiness to learn new tools and technologies
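As an example of the SQL analytics (window) functions referred to above, a short query run through Spark SQL; the sales table and its columns are hypothetical.

```python
# Hypothetical window-function example executed via Spark SQL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-analytics-example").getOrCreate()

spark.sql("""
    SELECT
        customer_id,
        order_date,
        amount,
        SUM(amount)  OVER (PARTITION BY customer_id ORDER BY order_date) AS running_total,
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS recency_rank
    FROM sales.orders
""").show()
```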
Preferred Skills:
- Experience with PySpark / Spark SQL
- AWS Data Tools (AWS Glue, AWS Athena)
- Azure Data Tools (Azure Databricks, Azure Data Factory)
Other Skills:
- Knowledge about Azure Blob, Azure File Storage, AWS S3, Elastic Search / Redis Search
- Knowledge of the domain/function (across pricing, promotions and assortment).
- Implementation experience with a schema and data validator framework (Python / Java / SQL); a minimal sketch follows after this list.
- Knowledge of DQS and MDM.
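For illustration, a minimal, hedged sketch of a schema-and-data validator of the kind referenced above; the expected columns, types, and rules are illustrative only.

```python
# Hypothetical schema-and-data validator for an incoming dataset.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("schema-validator").getOrCreate()

EXPECTED = {"order_id": "string", "amount": "double", "order_date": "date"}

df = spark.read.parquet("/landing/orders")  # placeholder landing path

# 1. Schema check: every expected column must be present with the expected type.
actual = dict(df.dtypes)
schema_errors = [f"{col}: expected {dtype}, got {actual.get(col)}"
                 for col, dtype in EXPECTED.items()
                 if actual.get(col) != dtype]

# 2. Data checks: nulls in the key column, non-positive amounts.
null_keys = df.filter(F.col("order_id").isNull()).count()
bad_amounts = df.filter(F.col("amount") <= 0).count()

if schema_errors or null_keys or bad_amounts:
    raise ValueError(f"Validation failed: {schema_errors}, "
                     f"null keys={null_keys}, bad amounts={bad_amounts}")
```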
Key Responsibilities:
- Independently work on ETL / DWH / Big data Projects
- Gather and process raw data at scale.
- Design and develop data applications using selected tools and frameworks as required and requested.
- Read, extract, transform, stage and load data to selected tools and frameworks as required and requested.
- Perform tasks such as writing scripts, web scraping, calling APIs, writing SQL queries, etc.
- Work closely with the engineering team to integrate your work into our production systems.
- Process unstructured data into a form suitable for analysis.
- Analyse processed data.
- Support business decisions with ad hoc analysis as needed.
- Monitor data performance and modify infrastructure as needed.
Responsibility: a smart resource with excellent communication skills
Basic Qualifications
- Need to have a working knowledge of AWS Redshift.
- Minimum 1 year of designing and implementing a fully operational production-grade large-scale data solution on Snowflake Data Warehouse.
- 3 years of hands-on experience with building productized data ingestion and processing pipelines using Spark, Scala, Python
- 2 years of hands-on experience designing and implementing production-grade data warehousing solutions
- Expertise in and excellent understanding of Snowflake internals and of integrating Snowflake with other data processing and reporting technologies (a short connector sketch follows after this list)
- Excellent presentation and communication skills, both written and verbal
- Ability to problem-solve and architect in an environment with unclear requirements
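As a hedged sketch of Snowflake integration with Spark-based processing, the snippet below assumes the Snowflake Spark connector is on the classpath; the account URL, credentials, database objects, and column names are all placeholders.

```python
# Hypothetical read of a Snowflake table into Spark via the Snowflake Spark connector.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("snowflake-read-sketch").getOrCreate()

sf_options = {
    "sfUrl": "example_account.snowflakecomputing.com",
    "sfUser": "example_user",
    "sfPassword": "example_password",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "COMPUTE_WH",
}

orders = (spark.read
          .format("net.snowflake.spark.snowflake")
          .options(**sf_options)
          .option("dbtable", "ORDERS")
          .load())

orders.groupBy("ORDER_STATUS").count().show()
```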
- Building and operationalizing large-scale enterprise data solutions and applications using one or more Azure data and analytics services in combination with custom solutions – Azure Synapse / Azure SQL DWH, Azure Data Lake, Azure Blob Storage, Spark, HDInsight, Databricks, Cosmos DB, Event Hub / IoT Hub.
- Experience in migrating on-premises data warehouses to data platforms on the Azure cloud.
- Designing and implementing data engineering, ingestion, and transformation functions
- Azure Synapse or Azure SQL Data Warehouse
- Spark on Azure, available in HDInsight and Databricks