33+ PySpark Jobs in Hyderabad | PySpark Job openings in Hyderabad
Apply to 33+ PySpark Jobs in Hyderabad on CutShort.io. Explore the latest PySpark Job opportunities across top companies like Google, Amazon & Adobe.
Building the machine learning production (or MLOps) is the biggest challenge most large companies currently have in making the transition to becoming an AI-driven organization. This position is an opportunity for an experienced, server-side developer to build expertise in this exciting new frontier. You will be part of a team deploying state-of-the-art AI solutions for Fractal clients.
Responsibilities
As MLOps Engineer, you will work collaboratively with Data Scientists and Data engineers to deploy and operate advanced analytics machine learning models. You’ll help automate and streamline Model development and Model operations. You’ll build and maintain tools for deployment, monitoring, and operations. You’ll also troubleshoot and resolve issues in development, testing, and production environments.
- Enable Model tracking, model experimentation, Model automation
- Develop ML pipelines to support
- Develop MLOps components in Machine learning development life cycle using Model Repository (either of): MLFlow, Kubeflow Model Registry
- Develop MLOps components in Machine learning development life cycle using Machine Learning Services (either of): Kubeflow, DataRobot, HopsWorks, Dataiku or any relevant ML E2E PaaS/SaaS
- Work across all phases of Model development life cycle to build MLOPS components
- Build the knowledge base required to deliver increasingly complex MLOPS projects on Azure
- Be an integral part of client business development and delivery engagements across multiple domains
Required Qualifications
- 3-5 years experience building production-quality software.
- B.E/B.Tech/M.Tech in Computer Science or related technical degree OR Equivalent
- Strong experience in System Integration, Application Development or Data Warehouse projects across technologies used in the enterprise space
- Knowledge of MLOps, machine learning and docker
- Object-oriented languages (e.g. Python, PySpark, Java, C#, C++)
- CI/CD experience( i.e. Jenkins, Git hub action,
- Database programming using any flavors of SQL
- Knowledge of Git for Source code management
- Ability to collaborate effectively with highly technical resources in a fast-paced environment
- Ability to solve complex challenges/problems and rapidly deliver innovative solutions
- Foundational Knowledge of Cloud Computing on Azure
- Hunger and passion for learning new skills
Building the machine learning production System(or MLOps) is the biggest challenge most large companies currently have in making the transition to becoming an AI-driven organization. This position is an opportunity for an experienced, server-side developer to build expertise in this exciting new frontier. You will be part of a team deploying state-ofthe-art AI solutions for Fractal clients.
Responsibilities
As MLOps Engineer, you will work collaboratively with Data Scientists and Data engineers to deploy and operate advanced analytics machine learning models. You’ll help automate and streamline Model development and Model operations. You’ll build and maintain tools for deployment, monitoring, and operations. You’ll also troubleshoot and resolve issues in development, testing, and production environments.
- Enable Model tracking, model experimentation, Model automation
- Develop scalable ML pipelines
- Develop MLOps components in Machine learning development life cycle using Model Repository (either of): MLFlow, Kubeflow Model Registry
- Machine Learning Services (either of): Kubeflow, DataRobot, HopsWorks, Dataiku or any relevant ML E2E PaaS/SaaS
- Work across all phases of Model development life cycle to build MLOPS components
- Build the knowledge base required to deliver increasingly complex MLOPS projects on Azure
- Be an integral part of client business development and delivery engagements across multiple domains
Required Qualifications
- 5.5-9 years experience building production-quality software
- B.E/B.Tech/M.Tech in Computer Science or related technical degree OR equivalent
- Strong experience in System Integration, Application Development or Datawarehouse projects across technologies used in the enterprise space
- Expertise in MLOps, machine learning and docker
- Object-oriented languages (e.g. Python, PySpark, Java, C#, C++)
- Experience developing CI/CD components for production ready ML pipeline.
- Database programming using any flavors of SQL
- Knowledge of Git for Source code management
- Ability to collaborate effectively with highly technical resources in a fast-paced environment
- Ability to solve complex challenges/problems and rapidly deliver innovative solutions
- Team handling, problem solving, project management and communication skills & creative thinking
- Foundational Knowledge of Cloud Computing on Azure
- Hunger and passion for learning new skills
We are actively seeking a self-motivated Data Engineer with expertise in Azure cloud and Databricks, with a thorough understanding of Delta Lake and Lake-house Architecture. The ideal candidate should excel in developing scalable data solutions, crafting platform tools, and integrating systems, while demonstrating proficiency in cloud-native database solutions and distributed data processing.
Key Responsibilities:
- Contribute to the development and upkeep of a scalable data platform, incorporating tools and frameworks that leverage Azure and Databricks capabilities.
- Exhibit proficiency in various RDBMS databases such as MySQL and SQL-Server, emphasizing their integration in applications and pipeline development.
- Design and maintain high-caliber code, including data pipelines and applications, utilizing Python, Scala, and PHP.
- Implement effective data processing solutions via Apache Spark, optimizing Spark applications for large-scale data handling.
- Optimize data storage using formats like Parquet and Delta Lake to ensure efficient data accessibility and reliable performance.
- Demonstrate understanding of Hive Metastore, Unity Catalog Metastore, and the operational dynamics of external tables.
- Collaborate with diverse teams to convert business requirements into precise technical specifications.
Requirements:
- Bachelor’s degree in Computer Science, Engineering, or a related discipline.
- Demonstrated hands-on experience with Azure cloud services and Databricks.
- Proficient programming skills in Python, Scala, and PHP.
- In-depth knowledge of SQL, NoSQL databases, and data warehousing principles.
- Familiarity with distributed data processing and external table management.
- Insight into enterprise data solutions for PIM, CDP, MDM, and ERP applications.
- Exceptional problem-solving acumen and meticulous attention to detail.
Additional Qualifications :
- Acquaintance with data security and privacy standards.
- Experience in CI/CD pipelines and version control systems, notably Git.
- Familiarity with Agile methodologies and DevOps practices.
- Competence in technical writing for comprehensive documentation.
Technical Skills:
- Ability to understand and translate business requirements into design.
- Proficient in AWS infrastructure components such as S3, IAM, VPC, EC2, and Redshift.
- Experience in creating ETL jobs using Python/PySpark.
- Proficiency in creating AWS Lambda functions for event-based jobs.
- Knowledge of automating ETL processes using AWS Step Functions.
- Competence in building data warehouses and loading data into them.
Responsibilities:
- Understand business requirements and translate them into design.
- Assess AWS infrastructure needs for development work.
- Develop ETL jobs using Python/PySpark to meet requirements.
- Implement AWS Lambda for event-based tasks.
- Automate ETL processes using AWS Step Functions.
- Build data warehouses and manage data loading.
- Engage with customers and stakeholders to articulate the benefits of proposed solutions and frameworks.
Publicis Sapient Overview:
The Senior Associate People Senior Associate L1 in Data Engineering, you will translate client requirements into technical design, and implement components for data engineering solution. Utilize deep understanding of data integration and big data design principles in creating custom solutions or implementing package solutions. You will independently drive design discussions to insure the necessary health of the overall solution
.
Job Summary:
As Senior Associate L2 in Data Engineering, you will translate client requirements into technical design, and implement components for data engineering solution. Utilize deep understanding of data integration and big data design principles in creating custom solutions or implementing package solutions. You will independently drive design discussions to insure the necessary health of the overall solution
The role requires a hands-on technologist who has strong programming background like Java / Scala / Python, should have experience in Data Ingestion, Integration and data Wrangling, Computation, Analytics pipelines and exposure to Hadoop ecosystem components. You are also required to have hands-on knowledge on at least one of AWS, GCP, Azure cloud platforms.
Role & Responsibilities:
Your role is focused on Design, Development and delivery of solutions involving:
• Data Integration, Processing & Governance
• Data Storage and Computation Frameworks, Performance Optimizations
• Analytics & Visualizations
• Infrastructure & Cloud Computing
• Data Management Platforms
• Implement scalable architectural models for data processing and storage
• Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time mode
• Build functionality for data analytics, search and aggregation
Experience Guidelines:
Mandatory Experience and Competencies:
# Competency
1.Overall 5+ years of IT experience with 3+ years in Data related technologies
2.Minimum 2.5 years of experience in Big Data technologies and working exposure in at least one cloud platform on related data services (AWS / Azure / GCP)
3.Hands-on experience with the Hadoop stack – HDFS, sqoop, kafka, Pulsar, NiFi, Spark, Spark Streaming, Flink, Storm, hive, oozie, airflow and other components required in building end to end data pipeline.
4.Strong experience in at least of the programming language Java, Scala, Python. Java preferable
5.Hands-on working knowledge of NoSQL and MPP data platforms like Hbase, MongoDb, Cassandra, AWS Redshift, Azure SQLDW, GCP BigQuery etc
6.Well-versed and working knowledge with data platform related services on at least 1 cloud platform, IAM and data security
Preferred Experience and Knowledge (Good to Have):
# Competency
1.Good knowledge of traditional ETL tools (Informatica, Talend, etc) and database technologies (Oracle, MySQL, SQL Server, Postgres) with hands on experience
2.Knowledge on data governance processes (security, lineage, catalog) and tools like Collibra, Alation etc
3.Knowledge on distributed messaging frameworks like ActiveMQ / RabbiMQ / Solace, search & indexing and Micro services architectures
4.Performance tuning and optimization of data pipelines
5.CI/CD – Infra provisioning on cloud, auto build & deployment pipelines, code quality
6.Cloud data specialty and other related Big data technology certifications
Personal Attributes:
• Strong written and verbal communication skills
• Articulation skills
• Good team player
• Self-starter who requires minimal oversight
• Ability to prioritize and manage multiple tasks
• Process orientation and the ability to define and set up processes
Job Description
Responsibilities:
- Collaborate with stakeholders to understand business objectives and requirements for AI/ML projects.
- Conduct research and stay up-to-date with the latest AI/ML algorithms, techniques, and frameworks.
- Design and develop machine learning models, algorithms, and data pipelines.
- Collect, preprocess, and clean large datasets to ensure data quality and reliability.
- Train, evaluate, and optimize machine learning models using appropriate evaluation metrics.
- Implement and deploy AI/ML models into production environments.
- Monitor model performance and propose enhancements or updates as needed.
- Collaborate with software engineers to integrate AI/ML capabilities into existing software systems.
- Perform data analysis and visualization to derive actionable insights.
- Stay informed about emerging trends and advancements in the field of AI/ML and apply them to improve existing solutions.
Strong experience in Apache pyspark is must
Requirements:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- Proven experience of 3-5 years as an AI/ML Engineer or a similar role.
- Strong knowledge of machine learning algorithms, deep learning frameworks, and data science concepts.
- Proficiency in programming languages such as Python, Java, or C++.
- Experience with popular AI/ML libraries and frameworks, such as TensorFlow, Keras, PyTorch, or scikit-learn.
- Familiarity with cloud platforms, such as AWS, Azure, or GCP, and their AI/ML services.
- Solid understanding of data preprocessing, feature engineering, and model evaluation techniques.
- Experience in deploying and scaling machine learning models in production environments.
- Strong problem-solving skills and ability to work on multiple projects simultaneously.
- Excellent communication and teamwork skills.
Preferred Skills:
- Experience with natural language processing (NLP) techniques and tools.
- Familiarity with big data technologies, such as Hadoop, Spark, or Hive.
- Knowledge of containerization technologies like Docker and orchestration tools like Kubernetes.
- Understanding of DevOps practices for AI/ML model deployment
-Apache ,Pyspark
Publicis Sapient Overview:
The Senior Associate People Senior Associate L1 in Data Engineering, you will translate client requirements into technical design, and implement components for data engineering solution. Utilize deep understanding of data integration and big data design principles in creating custom solutions or implementing package solutions. You will independently drive design discussions to insure the necessary health of the overall solution
.
Job Summary:
As Senior Associate L1 in Data Engineering, you will do technical design, and implement components for data engineering solution. Utilize deep understanding of data integration and big data design principles in creating custom solutions or implementing package solutions. You will independently drive design discussions to insure the necessary health of the overall solution
The role requires a hands-on technologist who has strong programming background like Java / Scala / Python, should have experience in Data Ingestion, Integration and data Wrangling, Computation, Analytics pipelines and exposure to Hadoop ecosystem components. Having hands-on knowledge on at least one of AWS, GCP, Azure cloud platforms will be preferable.
Role & Responsibilities:
Job Title: Senior Associate L1 – Data Engineering
Your role is focused on Design, Development and delivery of solutions involving:
• Data Ingestion, Integration and Transformation
• Data Storage and Computation Frameworks, Performance Optimizations
• Analytics & Visualizations
• Infrastructure & Cloud Computing
• Data Management Platforms
• Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time
• Build functionality for data analytics, search and aggregation
Experience Guidelines:
Mandatory Experience and Competencies:
# Competency
1.Overall 3.5+ years of IT experience with 1.5+ years in Data related technologies
2.Minimum 1.5 years of experience in Big Data technologies
3.Hands-on experience with the Hadoop stack – HDFS, sqoop, kafka, Pulsar, NiFi, Spark, Spark Streaming, Flink, Storm, hive, oozie, airflow and other components required in building end to end data pipeline. Working knowledge on real-time data pipelines is added advantage.
4.Strong experience in at least of the programming language Java, Scala, Python. Java preferable
5.Hands-on working knowledge of NoSQL and MPP data platforms like Hbase, MongoDb, Cassandra, AWS Redshift, Azure SQLDW, GCP BigQuery etc
Preferred Experience and Knowledge (Good to Have):
# Competency
1.Good knowledge of traditional ETL tools (Informatica, Talend, etc) and database technologies (Oracle, MySQL, SQL Server, Postgres) with hands on experience
2.Knowledge on data governance processes (security, lineage, catalog) and tools like Collibra, Alation etc
3.Knowledge on distributed messaging frameworks like ActiveMQ / RabbiMQ / Solace, search & indexing and Micro services architectures
4.Performance tuning and optimization of data pipelines
5.CI/CD – Infra provisioning on cloud, auto build & deployment pipelines, code quality
6.Working knowledge with data platform related services on at least 1 cloud platform, IAM and data security
7.Cloud data specialty and other related Big data technology certifications
Job Title: Senior Associate L1 – Data Engineering
Personal Attributes:
• Strong written and verbal communication skills
• Articulation skills
• Good team player
• Self-starter who requires minimal oversight
• Ability to prioritize and manage multiple tasks
• Process orientation and the ability to define and set up processes
A LEADING US BASED MNC
Data Engineering : Senior Engineer / Manager
As Senior Engineer/ Manager in Data Engineering, you will translate client requirements into technical design, and implement components for a data engineering solutions. Utilize a deep understanding of data integration and big data design principles in creating custom solutions or implementing package solutions. You will independently drive design discussions to insure the necessary health of the overall solution.
Must Have skills :
1. GCP
2. Spark streaming : Live data streaming experience is desired.
3. Any 1 coding language: Java/Pyhton /Scala
Skills & Experience :
- Overall experience of MINIMUM 5+ years with Minimum 4 years of relevant experience in Big Data technologies
- Hands-on experience with the Hadoop stack - HDFS, sqoop, kafka, Pulsar, NiFi, Spark, Spark Streaming, Flink, Storm, hive, oozie, airflow and other components required in building end to end data pipeline. Working knowledge on real-time data pipelines is added advantage.
- Strong experience in at least of the programming language Java, Scala, Python. Java preferable
- Hands-on working knowledge of NoSQL and MPP data platforms like Hbase, MongoDb, Cassandra, AWS Redshift, Azure SQLDW, GCP BigQuery etc.
- Well-versed and working knowledge with data platform related services on GCP
- Bachelor's degree and year of work experience of 6 to 12 years or any combination of education, training and/or experience that demonstrates the ability to perform the duties of the position
Your Impact :
- Data Ingestion, Integration and Transformation
- Data Storage and Computation Frameworks, Performance Optimizations
- Analytics & Visualizations
- Infrastructure & Cloud Computing
- Data Management Platforms
- Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time
- Build functionality for data analytics, search and aggregation
AWS Glue Developer
Work Experience: 6 to 8 Years
Work Location: Noida, Bangalore, Chennai & Hyderabad
Must Have Skills: AWS Glue, DMS, SQL, Python, PySpark, Data integrations and Data Ops,
Job Reference ID:BT/F21/IND
Job Description:
Design, build and configure applications to meet business process and application requirements.
Responsibilities:
7 years of work experience with ETL, Data Modelling, and Data Architecture Proficient in ETL optimization, designing, coding, and tuning big data processes using Pyspark Extensive experience to build data platforms on AWS using core AWS services Step function, EMR, Lambda, Glue and Athena, Redshift, Postgres, RDS etc and design/develop data engineering solutions. Orchestrate using Airflow.
Technical Experience:
Hands-on experience on developing Data platform and its components Data Lake, cloud Datawarehouse, APIs, Batch and streaming data pipeline Experience with building data pipelines and applications to stream and process large datasets at low latencies.
➢ Enhancements, new development, defect resolution and production support of Big data ETL development using AWS native services.
➢ Create data pipeline architecture by designing and implementing data ingestion solutions.
➢ Integrate data sets using AWS services such as Glue, Lambda functions/ Airflow.
➢ Design and optimize data models on AWS Cloud using AWS data stores such as Redshift, RDS, S3, Athena.
➢ Author ETL processes using Python, Pyspark.
➢ Build Redshift Spectrum direct transformations and data modelling using data in S3.
➢ ETL process monitoring using CloudWatch events.
➢ You will be working in collaboration with other teams. Good communication must.
➢ Must have experience in using AWS services API, AWS CLI and SDK
Professional Attributes:
➢ Experience operating very large data warehouses or data lakes Expert-level skills in writing and optimizing SQL Extensive, real-world experience designing technology components for enterprise solutions and defining solution architectures and reference architectures with a focus on cloud technology.
➢ Must have 6+ years of big data ETL experience using Python, S3, Lambda, Dynamo DB, Athena, Glue in AWS environment.
➢ Expertise in S3, RDS, Redshift, Kinesis, EC2 clusters highly desired.
Qualification:
➢ Degree in Computer Science, Computer Engineering or equivalent.
Salary: Commensurate with experience and demonstrated competence
We are #hiring for AWS Data Engineer expert to join our team
Job Title: AWS Data Engineer
Experience: 5 Yrs to 10Yrs
Location: Remote
Notice: Immediate or Max 20 Days
Role: Permanent Role
Skillset: AWS, ETL, SQL, Python, Pyspark, Postgres DB, Dremio.
Job Description:
Able to develop ETL jobs.
Able to help with data curation/cleanup, data transformation, and building ETL pipelines.
Strong Postgres DB exp and knowledge of Dremio data visualization/semantic layer between DB and the application is a plus.
Sql, Python, and Pyspark is a must.
Communication should be good
at Altimetrik
Bigdata with cloud:
Experience : 5-10 years
Location : Hyderabad/Chennai
Notice period : 15-20 days Max
1. Expertise in building AWS Data Engineering pipelines with AWS Glue -> Athena -> Quick sight
2. Experience in developing lambda functions with AWS Lambda
3. Expertise with Spark/PySpark – Candidate should be hands on with PySpark code and should be able to do transformations with Spark
4. Should be able to code in Python and Scala.
5. Snowflake experience will be a plus
at Altimetrik
-Expertise in building AWS Data Engineering pipelines with AWS Glue -> Athena -> Quick sight.
-Experience in developing lambda functions with AWS Lambda.
-
Expertise with Spark/PySpark
– Candidate should be hands on with PySpark code and should be able to do transformations with Spark
-Should be able to code in Python and Scala.
-
Snowflake experience will be a plus
at Altimetrik
Experience in developing lambda functions with AWS Lambda
Expertise with Spark/PySpark – Candidate should be hands on with PySpark code and should be able to do transformations with Spark
Should be able to code in Python and Scala.
Snowflake experience will be a plus
at Altimetrik
- Expertise in building AWS Data Engineering pipelines with AWS Glue -> Athena -> Quick sight
- Experience in developing lambda functions with AWS Lambda
- Expertise with Spark/PySpark – Candidate should be hands on with PySpark code and should be able to do transformations with Spark
- Should be able to code in Python and Scala.
- Snowflake experience will be a plus
Altimetrik
Big data Developer
Exp: 3yrs to 7 yrs.
Job Location: Hyderabad
Notice: Immediate / within 30 days
1. Expertise in building AWS Data Engineering pipelines with AWS Glue -> Athena -> Quick sight
2. Experience in developing lambda functions with AWS Lambda
3. Expertise with Spark/PySpark Candidate should be hands on with PySpark code and should be able to do transformations with Spark
4. Should be able to code in Python and Scala.
5. Snowflake experience will be a plus
We can start keeping Hadoop and Hive requirements as good to have or understanding of is enough rather than keeping it as a desirable requirement.
Skills and requirements
- Experience analyzing complex and varied data in a commercial or academic setting.
- Desire to solve new and complex problems every day.
- Excellent ability to communicate scientific results to both technical and non-technical team members.
Desirable
- A degree in a numerically focused discipline such as, Maths, Physics, Chemistry, Engineering or Biological Sciences..
- Hands on experience on Python, Pyspark, SQL
- Hands on experience on building End to End Data Pipelines.
- Hands on Experience on Azure Data Factory, Azure Data Bricks, Data Lake - added advantage
- Hands on Experience in building data pipelines.
- Experience with Bigdata Tools, Hadoop, Hive, Sqoop, Spark, SparkSQL
- Experience with SQL or NoSQL databases for the purposes of data retrieval and management.
- Experience in data warehousing and business intelligence tools, techniques and technology, as well as experience in diving deep on data analysis or technical issues to come up with effective solutions.
- BS degree in math, statistics, computer science or equivalent technical field.
- Experience in data mining structured and unstructured data (SQL, ETL, data warehouse, Machine Learning etc.) in a business environment with large-scale, complex data sets.
- Proven ability to look at solutions in unconventional ways. Sees opportunities to innovate and can lead the way.
- Willing to learn and work on Data Science, ML, AI.
consulting & implementation services in the area of Oil & Gas, Mining and Manufacturing Industry
- Data Engineer
Required skill set: AWS GLUE, AWS LAMBDA, AWS SNS/SQS, AWS ATHENA, SPARK, SNOWFLAKE, PYTHON
Mandatory Requirements
- Experience in AWS Glue
- Experience in Apache Parquet
- Proficient in AWS S3 and data lake
- Knowledge of Snowflake
- Understanding of file-based ingestion best practices.
- Scripting language - Python & pyspark
CORE RESPONSIBILITIES
- Create and manage cloud resources in AWS
- Data ingestion from different data sources which exposes data using different technologies, such as: RDBMS, REST HTTP API, flat files, Streams, and Time series data based on various proprietary systems. Implement data ingestion and processing with the help of Big Data technologies
- Data processing/transformation using various technologies such as Spark and Cloud Services. You will need to understand your part of business logic and implement it using the language supported by the base data platform
- Develop automated data quality check to make sure right data enters the platform and verifying the results of the calculations
- Develop an infrastructure to collect, transform, combine and publish/distribute customer data.
- Define process improvement opportunities to optimize data collection, insights and displays.
- Ensure data and results are accessible, scalable, efficient, accurate, complete and flexible
- Identify and interpret trends and patterns from complex data sets
- Construct a framework utilizing data visualization tools and techniques to present consolidated analytical and actionable results to relevant stakeholders.
- Key participant in regular Scrum ceremonies with the agile teams
- Proficient at developing queries, writing reports and presenting findings
- Mentor junior members and bring best industry practices
QUALIFICATIONS
- 5-7+ years’ experience as data engineer in consumer finance or equivalent industry (consumer loans, collections, servicing, optional product, and insurance sales)
- Strong background in math, statistics, computer science, data science or related discipline
- Advanced knowledge one of language: Java, Scala, Python, C#
- Production experience with: HDFS, YARN, Hive, Spark, Kafka, Oozie / Airflow, Amazon Web Services (AWS), Docker / Kubernetes, Snowflake
- Proficient with
- Data mining/programming tools (e.g. SAS, SQL, R, Python)
- Database technologies (e.g. PostgreSQL, Redshift, Snowflake. and Greenplum)
- Data visualization (e.g. Tableau, Looker, MicroStrategy)
- Comfortable learning about and deploying new technologies and tools.
- Organizational skills and the ability to handle multiple projects and priorities simultaneously and meet established deadlines.
- Good written and oral communication skills and ability to present results to non-technical audiences
- Knowledge of business intelligence and analytical tools, technologies and techniques.
Familiarity and experience in the following is a plus:
- AWS certification
- Spark Streaming
- Kafka Streaming / Kafka Connect
- ELK Stack
- Cassandra / MongoDB
- CI/CD: Jenkins, GitLab, Jira, Confluence other related tools
• Experience with Advanced SQL
• Experience with Azure data factory, data bricks,
• Experience with Azure IOT, Cosmos DB, BLOB Storage
• API management, FHIR API development,
• Proficient with Git and CI/CD best practices
• Experience working with Snowflake is a plus
Responsibilities:
* 3+ years of Data Engineering Experience - Design, develop, deliver and maintain data infrastructures.
* SQL Specialist – Strong knowledge and Seasoned experience with SQL Queries
* Languages: Python
* Good communicator, shows initiative, works well with stakeholders.
* Experience working closely with Data Analysts and provide the data they need and guide them on the issues.
* Solid ETL experience and Hadoop/Hive/Pyspark/Presto/ SparkSQL
* Solid communication and articulation skills
* Able to handle stakeholders independently with less interventions of reporting manager.
* Develop strategies to solve problems in logical yet creative ways.
* Create custom reports and presentations accompanied by strong data visualization and storytelling
We would be excited if you have:
* Excellent communication and interpersonal skills
* Ability to meet deadlines and manage project delivery
* Excellent report-writing and presentation skills
* Critical thinking and problem-solving capabilities
Data Engineer
Mandatory Requirements
- Experience in AWS Glue
- Experience in Apache Parquet
- Proficient in AWS S3 and data lake
- Knowledge of Snowflake
- Understanding of file-based ingestion best practices.
- Scripting language - Python & pyspark
CORE RESPONSIBILITIES
- Create and manage cloud resources in AWS
- Data ingestion from different data sources which exposes data using different technologies, such as: RDBMS, REST HTTP API, flat files, Streams, and Time series data based on various proprietary systems. Implement data ingestion and processing with the help of Big Data technologies
- Data processing/transformation using various technologies such as Spark and Cloud Services. You will need to understand your part of business logic and implement it using the language supported by the base data platform
- Develop automated data quality check to make sure right data enters the platform and verifying the results of the calculations
- Develop an infrastructure to collect, transform, combine and publish/distribute customer data.
- Define process improvement opportunities to optimize data collection, insights and displays.
- Ensure data and results are accessible, scalable, efficient, accurate, complete and flexible
- Identify and interpret trends and patterns from complex data sets
- Construct a framework utilizing data visualization tools and techniques to present consolidated analytical and actionable results to relevant stakeholders.
- Key participant in regular Scrum ceremonies with the agile teams
- Proficient at developing queries, writing reports and presenting findings
- Mentor junior members and bring best industry practices
QUALIFICATIONS
- 5-7+ years’ experience as data engineer in consumer finance or equivalent industry (consumer loans, collections, servicing, optional product, and insurance sales)
- Strong background in math, statistics, computer science, data science or related discipline
- Advanced knowledge one of language: Java, Scala, Python, C#
- Production experience with: HDFS, YARN, Hive, Spark, Kafka, Oozie / Airflow, Amazon Web Services (AWS), Docker / Kubernetes, Snowflake
- Proficient with
- Data mining/programming tools (e.g. SAS, SQL, R, Python)
- Database technologies (e.g. PostgreSQL, Redshift, Snowflake. and Greenplum)
- Data visualization (e.g. Tableau, Looker, MicroStrategy)
- Comfortable learning about and deploying new technologies and tools.
- Organizational skills and the ability to handle multiple projects and priorities simultaneously and meet established deadlines.
- Good written and oral communication skills and ability to present results to non-technical audiences
- Knowledge of business intelligence and analytical tools, technologies and techniques.
Familiarity and experience in the following is a plus:
- AWS certification
- Spark Streaming
- Kafka Streaming / Kafka Connect
- ELK Stack
- Cassandra / MongoDB
- CI/CD: Jenkins, GitLab, Jira, Confluence other related tools
We have urgent requirement of Data Engineer/Sr Data Engineer for reputed MNC company.
Exp: 4-9yrs
Location: Pune/Bangalore/Hyderabad
Skills: We need candidate either Python AWS or Pyspark AWS or Spark Scala
at Persistent Systems
We have an urgent requirements of Big Data Developer profiles in our reputed MNC company.
Location: Pune/Bangalore/Hyderabad/Nagpur
Experience: 4-9yrs
Skills: Pyspark,AWS
or Spark,Scala,AWS
or Python Aws
CORE RESPONSIBILITIES
- Create and manage cloud resources in AWS
- Data ingestion from different data sources which exposes data using different technologies, such as: RDBMS, REST HTTP API, flat files, Streams, and Time series data based on various proprietary systems. Implement data ingestion and processing with the help of Big Data technologies
- Data processing/transformation using various technologies such as Spark and Cloud Services. You will need to understand your part of business logic and implement it using the language supported by the base data platform
- Develop automated data quality check to make sure right data enters the platform and verifying the results of the calculations
- Develop an infrastructure to collect, transform, combine and publish/distribute customer data.
- Define process improvement opportunities to optimize data collection, insights and displays.
- Ensure data and results are accessible, scalable, efficient, accurate, complete and flexible
- Identify and interpret trends and patterns from complex data sets
- Construct a framework utilizing data visualization tools and techniques to present consolidated analytical and actionable results to relevant stakeholders.
- Key participant in regular Scrum ceremonies with the agile teams
- Proficient at developing queries, writing reports and presenting findings
- Mentor junior members and bring best industry practices
QUALIFICATIONS
- 5-7+ years’ experience as data engineer in consumer finance or equivalent industry (consumer loans, collections, servicing, optional product, and insurance sales)
- Strong background in math, statistics, computer science, data science or related discipline
- Advanced knowledge one of language: Java, Scala, Python, C#
- Production experience with: HDFS, YARN, Hive, Spark, Kafka, Oozie / Airflow, Amazon Web Services (AWS), Docker / Kubernetes, Snowflake
- Proficient with
- Data mining/programming tools (e.g. SAS, SQL, R, Python)
- Database technologies (e.g. PostgreSQL, Redshift, Snowflake. and Greenplum)
- Data visualization (e.g. Tableau, Looker, MicroStrategy)
- Comfortable learning about and deploying new technologies and tools.
- Organizational skills and the ability to handle multiple projects and priorities simultaneously and meet established deadlines.
- Good written and oral communication skills and ability to present results to non-technical audiences
- Knowledge of business intelligence and analytical tools, technologies and techniques.
Mandatory Requirements
- Experience in AWS Glue
- Experience in Apache Parquet
- Proficient in AWS S3 and data lake
- Knowledge of Snowflake
- Understanding of file-based ingestion best practices.
- Scripting language - Python & pyspark
Job Description
We are looking for an experienced engineer with superb technical skills. Primarily be responsible for architecting and building large scale data pipelines that delivers AI and Analytical solutions to our customers. The right candidate will enthusiastically take ownership in developing and managing a continuously improving, robust, scalable software solutions.
Although your primary responsibilities will be around back-end work, we prize individuals who are willing to step in and contribute to other areas including automation, tooling, and management applications. Experience with or desire to learn Machine Learning a plus.
Skills
- Bachelors/Masters/Phd in CS or equivalent industry experience
- Demonstrated expertise of building and shipping cloud native applications
- 5+ years of industry experience in administering (including setting up, managing, monitoring) data processing pipelines (both streaming and batch) using frameworks such as Kafka Streams, Py Spark, and streaming databases like druid or equivalent like Hive
- Strong industry expertise with containerization technologies including kubernetes (EKS/AKS), Kubeflow
- Experience with cloud platform services such as AWS, Azure or GCP especially with EKS, Managed Kafka
- 5+ Industry experience in python
- Experience with popular modern web frameworks such as Spring boot, Play framework, or Django
- Experience with scripting languages. Python experience highly desirable. Experience in API development using Swagger
- Implementing automated testing platforms and unit tests
- Proficient understanding of code versioning tools, such as Git
- Familiarity with continuous integration, Jenkins
Responsibilities
- Architect, Design and Implement Large scale data processing pipelines using Kafka Streams, PySpark, Fluentd and Druid
- Create custom Operators for Kubernetes, Kubeflow
- Develop data ingestion processes and ETLs
- Assist in dev ops operations
- Design and Implement APIs
- Identify performance bottlenecks and bugs, and devise solutions to these problems
- Help maintain code quality, organization, and documentation
- Communicate with stakeholders regarding various aspects of solution.
- Mentor team members on best practices
at Virtusa
- Minimum 1 years of relevant experience, in PySpark (mandatory)
- Hands on experience in development, test, deploy, maintain and improving data integration pipeline in AWS cloud environment is added plus
- Ability to play lead role and independently manage 3-5 member of Pyspark development team
- EMR ,Python and PYspark mandate.
- Knowledge and awareness working with AWS Cloud technologies like Apache Spark, , Glue, Kafka, Kinesis, and Lambda in S3, Redshift, RDS
- Does analytics to extract insights from raw historical data of the organization.
- Generates usable training dataset for any/all MV projects with the help of Annotators, if needed.
- Analyses user trends, and identifies their biggest bottlenecks in Hammoq Workflow.
- Tests the short/long term impact of productized MV models on those trends.
- Skills - Numpy, Pandas, SPARK, APACHE SPARK, PYSPARK, ETL mandatory.
Job description
Role : Lead Architecture (Spark, Scala, Big Data/Hadoop, Java)
Primary Location : India-Pune, Hyderabad
Experience : 7 - 12 Years
Management Level: 7
Joining Time: Immediate Joiners are preferred
- Attend requirements gathering workshops, estimation discussions, design meetings and status review meetings
- Experience of Solution Design and Solution Architecture for the data engineer model to build and implement Big Data Projects on-premises and on cloud.
- Align architecture with business requirements and stabilizing the developed solution
- Ability to build prototypes to demonstrate the technical feasibility of your vision
- Professional experience facilitating and leading solution design, architecture and delivery planning activities for data intensive and high throughput platforms and applications
- To be able to benchmark systems, analyses system bottlenecks and propose solutions to eliminate them
- Able to help programmers and project managers in the design, planning and governance of implementing projects of any kind.
- Develop, construct, test and maintain architectures and run Sprints for development and rollout of functionalities
- Data Analysis, Code development experience, ideally in Big Data Spark, Hive, Hadoop, Java, Python, PySpark,
- Execute projects of various types i.e. Design, development, Implementation and migration of functional analytics Models/Business logic across architecture approaches
- Work closely with Business Analysts to understand the core business problems and deliver efficient IT solutions of the product
- Deployment sophisticated analytics program of code using any of cloud application.
Perks and Benefits we Provide!
- Working with Highly Technical and Passionate, mission-driven people
- Subsidized Meals & Snacks
- Flexible Schedule
- Approachable leadership
- Access to various learning tools and programs
- Pet Friendly
- Certification Reimbursement Policy
- Check out more about us on our website below!
www.datametica.com
- Hands-on experience in Development
- 4-6 years of Hands on experience with Python scripts
- 2-3 years of Hands on experience in PySpark coding. Worked in spark cluster computing technology.
- 3-4 years of Hands on end to end data pipeline experience working on AWS environments
- 3-4 years of Hands on experience working on AWS services – Glue, Lambda, Step Functions, EC2, RDS, SES, SNS, DMS, CloudWatch etc.
- 2-3 years of Hands on experience working on AWS redshift
- 6+ years of Hands on experience with writing Unix Shell scripts
- Good communication skills
Advanced technology to Solve Business Problems.( A1)
- Desire to explore new technology and break new ground.
- Are passionate about Open Source technology, continuous learning, and innovation.
- Have the problem-solving skills, grit, and commitment to complete challenging work assignments and meet deadlines.
Qualifications
- Engineer enterprise-class, large-scale deployments, and deliver Cloud-based Serverless solutions to our customers.
- You will work in a fast-paced environment with leading microservice and cloud technologies, and continue to develop your all-around technical skills.
- Participate in code reviews and provide meaningful feedback to other team members.
- Create technical documentation.
- Develop thorough Unit Tests to ensure code quality.
Skills and Experience
- Advanced skills in troubleshooting and tuning AWS Lambda functions developed with Java and/or Python.
- Experience with event-driven architecture design patterns and practices
- Experience in database design and architecture principles and strong SQL abilities
- Message brokers like Kafka and Kinesis
- Experience with Hadoop, Hive, and Spark (either PySpark or Scala)
- Demonstrated experience owning enterprise-class applications and delivering highly available distributed, fault-tolerant, globally accessible services at scale.
- Good understanding of distributed systems.
- Candidates will be self-motivated and display initiative, ownership, and flexibility.
Preferred Qualifications
- AWS Lambda function development experience with Java and/or Python.
- Lambda triggers such as SNS, SES, or cron.
- Databricks
- Cloud development experience with AWS services, including:
- IAM
- S3
- EC2
- AWS CLI
- API Gateway
- ECR
- CloudWatch
- Glue
- Kinesis
- DynamoDB
- Java 8 or higher
- ETL data pipeline building
- Data Lake Experience
- Python
- Docker
- MongoDB or similar NoSQL DB.
- Relational Databases (e.g., MySQL, PostgreSQL, Oracle, etc.).
- Gradle and/or Maven.
- JUnit
- Git
- Scrum
- Experience with Unix and/or macOS.
- Immediate Joiners
Nice to have:
- AWS / GCP / Azure Certification.
- Cloud development experience with Google Cloud or Azure
- Building and operationalizing large scale enterprise data solutions and applications using one or more of AZURE data and analytics services in combination with custom solutions - Azure Synapse/Azure SQL DWH, Azure Data Lake, Azure Blob Storage, Spark, HDInsights, Databricks, CosmosDB, EventHub/IOTHub.
- Experience in migrating on-premise data warehouses to data platforms on AZURE cloud.
- Designing and implementing data engineering, ingestion, and transformation functions
-
Azure Synapse or Azure SQL data warehouse
-
Spark on Azure is available in HD insights and data bricks
- Experience with Azure Analysis Services
- Experience in Power BI
- Experience with third-party solutions like Attunity/Stream sets, Informatica
- Experience with PreSales activities (Responding to RFPs, Executing Quick POCs)
- Capacity Planning and Performance Tuning on Azure Stack and Spark.
- Building and operationalizing large scale enterprise data solutions and applications using one or more of AZURE data and analytics services in combination with custom solutions - Azure Synapse/Azure SQL DWH, Azure Data Lake, Azure Blob Storage, Spark, HDInsights, Databricks, CosmosDB, EventHub/IOTHub.
- Experience in migrating on-premise data warehouses to data platforms on AZURE cloud.
- Designing and implementing data engineering, ingestion, and transformation functions
- Experience with Azure Analysis Services
- Experience in Power BI
- Experience with third-party solutions like Attunity/Stream sets, Informatica
- Experience with PreSales activities (Responding to RFPs, Executing Quick POCs)
- Capacity Planning and Performance Tuning on Azure Stack and Spark.
Indium Software is a niche technology solutions company with deep expertise in Digital , QA and Gaming. Indium helps customers in their Digital Transformation journey through a gamut of solutions that enhance business value.
With over 1000+ associates globally, Indium operates through offices in the US, UK and India
Visit http://www.indiumsoftware.com">www.indiumsoftware.com to know more.
Job Title: Analytics Data Engineer
What will you do:
The Data Engineer must be an expert in SQL development further providing support to the Data and Analytics in database design, data flow and analysis activities. The position of the Data Engineer also plays a key role in the development and deployment of innovative big data platforms for advanced analytics and data processing. The Data Engineer defines and builds the data pipelines that will enable faster, better, data-informed decision-making within the business.
We ask:
Extensive Experience with SQL and strong ability to process and analyse complex data
The candidate should also have an ability to design, build, and maintain the business’s ETL pipeline and data warehouse The candidate will also demonstrate expertise in data modelling and query performance tuning on SQL Server
Proficiency with analytics experience, especially funnel analysis, and have worked on analytical tools like Mixpanel, Amplitude, Thoughtspot, Google Analytics, and similar tools.
Should work on tools and frameworks required for building efficient and scalable data pipelines
Excellent at communicating and articulating ideas and an ability to influence others as well as drive towards a better solution continuously.
Experience working in python, Hive queries, spark, pysaprk, sparkSQL, presto
- Relate Metrics to product
- Programmatic Thinking
- Edge cases
- Good Communication
- Product functionality understanding
Perks & Benefits:
A dynamic, creative & intelligent team they will make you love being at work.
Autonomous and hands-on role to make an impact you will be joining at an exciting time of growth!
Flexible work hours and Attractive pay package and perks
An inclusive work environment that lets you work in the way that works best for you!