11+ EMC GreenPlum Jobs in Delhi, NCR and Gurgaon
Apply to 11+ EMC GreenPlum Jobs in Delhi, NCR and Gurgaon on CutShort.io. Explore the latest EMC GreenPlum Job opportunities across top companies like Google, Amazon & Adobe.
Consulting & implementation services for the Oil & Gas, Mining and Manufacturing industries
- Data Engineer
Required skill set: AWS Glue, AWS Lambda, AWS SNS/SQS, AWS Athena, Spark, Snowflake, Python
Mandatory Requirements
- Experience in AWS Glue
- Experience in Apache Parquet
- Proficient in AWS S3 and data lake
- Knowledge of Snowflake
- Understanding of file-based ingestion best practices.
- Scripting languages: Python & PySpark
CORE RESPONSIBILITIES
- Create and manage cloud resources in AWS
- Ingest data from sources that expose it through different technologies, such as RDBMS, REST HTTP APIs, flat files, streams, and time-series data from various proprietary systems; implement data ingestion and processing with the help of Big Data technologies
- Process and transform data using technologies such as Spark and cloud services; you will need to understand your part of the business logic and implement it in the language supported by the base data platform
- Develop automated data quality checks to make sure the right data enters the platform, and verify the results of the calculations (see the sketch after this list)
- Develop an infrastructure to collect, transform, combine and publish/distribute customer data.
- Define process improvement opportunities to optimize data collection, insights and displays.
- Ensure data and results are accessible, scalable, efficient, accurate, complete and flexible
- Identify and interpret trends and patterns from complex data sets
- Construct a framework utilizing data visualization tools and techniques to present consolidated analytical and actionable results to relevant stakeholders.
- Key participant in regular Scrum ceremonies with the agile teams
- Proficient at developing queries, writing reports and presenting findings
- Mentor junior members and bring in industry best practices
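To make the data-quality responsibility above concrete, here is a minimal sketch of an automated check in PySpark, in line with the stack this posting names; the S3 path, column names and rules are hypothetical placeholders, not part of the role description.

```python
# Minimal sketch of an automated data-quality gate in PySpark.
# The S3 path, column names and rules are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-checks").getOrCreate()
df = spark.read.parquet("s3://example-bucket/ingested/loans/")  # hypothetical path

# Rule 1: required columns must not contain nulls.
null_counts = df.select(
    [F.count(F.when(F.col(c).isNull(), c)).alias(c) for c in ("loan_id", "amount")]
).first().asDict()

# Rule 2: amounts must be positive.
bad_amounts = df.filter(F.col("amount") <= 0).count()

failures = {col: n for col, n in null_counts.items() if n > 0}
if failures or bad_amounts:
    raise ValueError(f"DQ check failed: nulls={failures}, non_positive_amounts={bad_amounts}")
```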
QUALIFICATIONS
- 5-7+ years' experience as a data engineer in consumer finance or an equivalent industry (consumer loans, collections, servicing, optional products, and insurance sales)
- Strong background in math, statistics, computer science, data science or related discipline
- Advanced knowledge of one of the following languages: Java, Scala, Python, C#
- Production experience with: HDFS, YARN, Hive, Spark, Kafka, Oozie / Airflow, Amazon Web Services (AWS), Docker / Kubernetes, Snowflake
- Proficient with:
- Data mining/programming tools (e.g. SAS, SQL, R, Python)
- Database technologies (e.g. PostgreSQL, Redshift, Snowflake, and Greenplum)
- Data visualization tools (e.g. Tableau, Looker, MicroStrategy)
- Comfortable learning about and deploying new technologies and tools.
- Organizational skills and the ability to handle multiple projects and priorities simultaneously and meet established deadlines.
- Good written and oral communication skills and ability to present results to non-technical audiences
- Knowledge of business intelligence and analytical tools, technologies and techniques.
Familiarity and experience in the following is a plus:
- AWS certification
- Spark Streaming
- Kafka Streaming / Kafka Connect
- ELK Stack
- Cassandra / MongoDB
- CI/CD: Jenkins, GitLab, Jira, Confluence, and other related tools
Publicis Sapient Overview:
As a Senior Associate L1 in Data Engineering, you will translate client requirements into technical designs and implement components for data engineering solutions. You will utilize a deep understanding of data integration and big data design principles to create custom solutions or implement packaged solutions, and independently drive design discussions to ensure the necessary health of the overall solution.
Job Summary:
As a Senior Associate L2 in Data Engineering, you will translate client requirements into technical designs and implement components for data engineering solutions. You will utilize a deep understanding of data integration and big data design principles to create custom solutions or implement packaged solutions, and independently drive design discussions to ensure the necessary health of the overall solution.
The role requires a hands-on technologist with a strong programming background in Java / Scala / Python, experience in data ingestion, integration and wrangling, computation and analytics pipelines, and exposure to Hadoop ecosystem components. Hands-on knowledge of at least one of the AWS, GCP or Azure cloud platforms is also required.
Role & Responsibilities:
Your role focuses on the design, development and delivery of solutions involving:
• Data Integration, Processing & Governance
• Data Storage and Computation Frameworks, Performance Optimizations
• Analytics & Visualizations
• Infrastructure & Cloud Computing
• Data Management Platforms
• Implement scalable architectural models for data processing and storage
• Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time mode (see the sketch after this list)
• Build functionality for data analytics, search and aggregation
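As referenced in the ingestion bullet above, here is a minimal sketch of what batch and real-time ingestion could look like in PySpark; the JDBC source, Kafka broker and S3 paths are hypothetical placeholders.

```python
# Minimal sketch of batch + real-time ingestion with Spark.
# The Postgres source, Kafka topic and S3 paths are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ingestion").getOrCreate()

# Batch mode: pull a snapshot from an RDBMS over JDBC
# (assumes the Postgres JDBC driver is on the classpath).
batch_df = (spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/sales")
    .option("dbtable", "public.orders")
    .option("user", "etl").option("password", "***")
    .load())
batch_df.write.mode("append").parquet("s3://lake/raw/orders/")

# Real-time mode: consume the same entity from a Kafka topic.
stream_df = (spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "orders")
    .load())
(stream_df.selectExpr("CAST(value AS STRING) AS payload")
    .writeStream.format("parquet")
    .option("path", "s3://lake/raw/orders_stream/")
    .option("checkpointLocation", "s3://lake/_chk/orders/")
    .start())
```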
Experience Guidelines:
Mandatory Experience and Competencies:
1. Overall 5+ years of IT experience with 3+ years in data-related technologies
2. Minimum 2.5 years of experience in Big Data technologies and working exposure to related data services on at least one cloud platform (AWS / Azure / GCP)
3. Hands-on experience with the Hadoop stack – HDFS, Sqoop, Kafka, Pulsar, NiFi, Spark, Spark Streaming, Flink, Storm, Hive, Oozie, Airflow and the other components required to build end-to-end data pipelines
4. Strong experience in at least one of the programming languages Java, Scala or Python; Java preferred
5. Hands-on working knowledge of NoSQL and MPP data platforms like HBase, MongoDB, Cassandra, AWS Redshift, Azure SQL DW, GCP BigQuery, etc.
6. Well-versed, working knowledge of data platform related services on at least one cloud platform, IAM and data security
Preferred Experience and Knowledge (Good to Have):
1. Good knowledge of traditional ETL tools (Informatica, Talend, etc.) and database technologies (Oracle, MySQL, SQL Server, Postgres) with hands-on experience
2. Knowledge of data governance processes (security, lineage, catalog) and tools like Collibra, Alation, etc.
3. Knowledge of distributed messaging frameworks like ActiveMQ / RabbitMQ / Solace, search & indexing, and microservices architectures
4. Performance tuning and optimization of data pipelines
5. CI/CD – infra provisioning on cloud, automated build & deployment pipelines, code quality
6. Cloud data specialty and other related Big Data technology certifications
Personal Attributes:
• Strong written and verbal communication skills
• Articulation skills
• Good team player
• Self-starter who requires minimal oversight
• Ability to prioritize and manage multiple tasks
• Process orientation and the ability to define and set up processes
Job Responsibilities
- Design, build & test ETL processes using Python & SQL for the corporate data warehouse (see the sketch after this list)
- Inform, influence, support, and execute our product decisions
- Maintain advertising data integrity by working closely with R&D to organize and store data in a format that keeps it accurate and allows the business to quickly identify issues.
- Evaluate and prototype new technologies in the area of data processing
- Think quickly, communicate clearly and work collaboratively with product, data, engineering, QA and operations teams
- High energy level, strong team player and good work ethic
- Data analysis, understanding of business requirements and translation into logical pipelines & processes
- Identification, analysis & resolution of production & development bugs
- Support the release process including completing & reviewing documentation
- Configure data mappings & transformations to orchestrate data integration & validation
- Provide subject matter expertise
- Document solutions, tools & processes
- Create & support test plans with hands-on testing
- Peer reviews of work developed by other data engineers within the team
- Establish good working relationships & communication channels with relevant departments
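As referenced in the first responsibility, below is a minimal, self-contained sketch of building and testing an ETL step with Python & SQL; sqlite3 stands in for the corporate warehouse, and the table and column names are hypothetical.

```python
# Minimal sketch of a tested ETL step in Python & SQL.
# sqlite3 stands in for the warehouse; table/column names are hypothetical.
import sqlite3

def load_daily_revenue(conn: sqlite3.Connection) -> None:
    """Transform raw orders into a daily revenue summary table."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS daily_revenue (day TEXT PRIMARY KEY, revenue REAL);
        INSERT OR REPLACE INTO daily_revenue
        SELECT order_date AS day, SUM(amount) AS revenue
        FROM raw_orders GROUP BY order_date;
    """)

def test_load_daily_revenue():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_orders (order_date TEXT, amount REAL)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?)",
                     [("2024-01-01", 10.0), ("2024-01-01", 5.0), ("2024-01-02", 7.5)])
    load_daily_revenue(conn)
    rows = dict(conn.execute("SELECT day, revenue FROM daily_revenue"))
    assert rows == {"2024-01-01": 15.0, "2024-01-02": 7.5}

test_load_daily_revenue()
```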
Skills and Qualifications we look for
- University degree 2.1 or higher (or equivalent) in a relevant subject. Master’s degree in any data subject will be a strong advantage.
- 4-6 years' experience in data engineering.
- Strong coding ability and software development experience in Python.
- Strong hands-on experience with SQL and Data Processing.
- Google Cloud Platform (Cloud Composer, Dataflow, Cloud Functions, BigQuery, Cloud Storage, Dataproc)
- Good working experience with at least one ETL tool (Airflow preferred; see the DAG sketch after this list).
- Strong analytical and problem-solving skills.
- Good-to-have skills: Apache PySpark, CircleCI, Terraform
- Motivated, self-directed, able to work with ambiguity and interested in emerging technologies, agile and collaborative processes.
- Understanding & experience of agile / scrum delivery methodology
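For the Airflow preference noted above, a minimal sketch of a daily ETL DAG follows (Airflow 2.x style); the DAG id, schedule and task bodies are hypothetical placeholders.

```python
# Minimal Airflow 2.x DAG sketch; DAG id, schedule and task bodies
# are hypothetical placeholders, not part of this job description.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("extracting from source...")   # placeholder task body

def load():
    print("loading into warehouse...")   # placeholder task body

with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_load = PythonOperator(task_id="load", python_callable=load)
    t_extract >> t_load   # run extract before load
```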
AWS Glue Developer
Work Experience: 6 to 8 Years
Work Location: Noida, Bangalore, Chennai & Hyderabad
Must Have Skills: AWS Glue, DMS, SQL, Python, PySpark, data integration and DataOps
Job Reference ID: BT/F21/IND
Job Description:
Design, build and configure applications to meet business process and application requirements.
Responsibilities:
7 years of work experience with ETL, data modelling and data architecture. Proficient in ETL optimization, designing, coding and tuning big data processes using PySpark. Extensive experience building data platforms on AWS using core AWS services (Step Functions, EMR, Lambda, Glue, Athena, Redshift, Postgres, RDS, etc.) and designing/developing data engineering solutions, with orchestration using Airflow.
Technical Experience:
Hands-on experience developing a data platform and its components: data lake, cloud data warehouse, APIs, and batch and streaming data pipelines. Experience building data pipelines and applications to stream and process large datasets at low latency.
➢ Enhancements, new development, defect resolution and production support of Big data ETL development using AWS native services.
➢ Create data pipeline architecture by designing and implementing data ingestion solutions.
➢ Integrate data sets using AWS services such as Glue, Lambda functions / Airflow.
➢ Design and optimize data models on AWS Cloud using AWS data stores such as Redshift, RDS, S3, Athena.
➢ Author ETL processes using Python and PySpark (see the sketch after this list).
➢ Build Redshift Spectrum direct transformations and data modelling using data in S3.
➢ ETL process monitoring using CloudWatch events.
➢ You will be working in collaboration with other teams; good communication is a must.
➢ Must have experience using AWS service APIs, the AWS CLI and SDKs
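As referenced in the ETL-authoring item above, a Glue job written in Python/PySpark might look like the minimal sketch below; the bucket names, columns and transformations are hypothetical placeholders.

```python
# Minimal sketch of an AWS Glue PySpark job (Glue 3/4 style).
# Bucket names, columns and transformations are hypothetical.
import sys

from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session

# Read raw CSV from S3, de-duplicate, and write partitioned Parquet.
df = spark.read.option("header", "true").csv("s3://example-raw/orders/")
out = df.dropDuplicates(["order_id"]).withColumnRenamed("amt", "amount")
out.write.mode("overwrite").partitionBy("order_date").parquet("s3://example-curated/orders/")
```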
Professional Attributes:
➢ Experience operating very large data warehouses or data lakes. Expert-level skills in writing and optimizing SQL. Extensive real-world experience designing technology components for enterprise solutions and defining solution architectures and reference architectures with a focus on cloud technology.
➢ Must have 6+ years of big data ETL experience using Python, S3, Lambda, DynamoDB, Athena and Glue in an AWS environment.
➢ Expertise in S3, RDS, Redshift, Kinesis and EC2 clusters is highly desired.
Qualification:
➢ Degree in Computer Science, Computer Engineering or equivalent.
Salary: Commensurate with experience and demonstrated competence
Job Description:
As an Azure Data Engineer, your role will involve designing, developing, and maintaining data solutions on the Azure platform. You will be responsible for building and optimizing data pipelines, ensuring data quality and reliability, and implementing data processing and transformation logic. Your expertise in Azure Databricks, Python, SQL, Azure Data Factory (ADF), PySpark, and Scala will be essential for performing the following key responsibilities:
Designing and developing data pipelines: You will design and implement scalable and efficient data pipelines using Azure Databricks, PySpark, and Scala. This includes data ingestion, data transformation, and data loading processes.
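As a rough illustration of that ingest-transform-load flow, here is a minimal PySpark sketch as it might run on Azure Databricks (where the Delta format is available); the ADLS paths and column names are hypothetical placeholders.

```python
# Minimal sketch of an ingest-transform-load pipeline in PySpark,
# as it might run on Azure Databricks. Paths/columns are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Ingest: read raw JSON landed in ADLS.
raw = spark.read.json("abfss://raw@examplelake.dfs.core.windows.net/events/")

# Transform: cleanse and derive a date partition column.
clean = (raw.dropna(subset=["event_id"])
            .withColumn("event_date", F.to_date("event_ts")))

# Load: write curated Delta data, partitioned for downstream queries.
(clean.write.format("delta").mode("append").partitionBy("event_date")
      .save("abfss://curated@examplelake.dfs.core.windows.net/events/"))
```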
Data modeling and database design: You will design and implement data models to support efficient data storage, retrieval, and analysis. This may involve working with relational databases, data lakes, or other storage solutions on the Azure platform.
Data integration and orchestration: You will leverage Azure Data Factory (ADF) to orchestrate data integration workflows and manage data movement across various data sources and targets. This includes scheduling and monitoring data pipelines.
Data quality and governance: You will implement data quality checks, validation rules, and data governance processes to ensure data accuracy, consistency, and compliance with relevant regulations and standards.
Performance optimization: You will optimize data pipelines and queries to improve overall system performance and reduce processing time. This may involve tuning SQL queries, optimizing data transformation logic, and leveraging caching techniques.
Monitoring and troubleshooting: You will monitor data pipelines, identify performance bottlenecks, and troubleshoot issues related to data ingestion, processing, and transformation. You will work closely with cross-functional teams to resolve data-related problems.
Documentation and collaboration: You will document data pipelines, data flows, and data transformation processes. You will collaborate with data scientists, analysts, and other stakeholders to understand their data requirements and provide data engineering support.
Skills and Qualifications:
Strong experience with Azure Databricks, Python, SQL, ADF, PySpark, and Scala.
Proficiency in designing and developing data pipelines and ETL processes.
Solid understanding of data modeling concepts and database design principles.
Familiarity with data integration and orchestration using Azure Data Factory.
Knowledge of data quality management and data governance practices.
Experience with performance tuning and optimization of data pipelines.
Strong problem-solving and troubleshooting skills related to data engineering.
Excellent collaboration and communication skills to work effectively in cross-functional teams.
Understanding of cloud computing principles and experience with Azure services.
● Create and maintain optimal data pipeline architecture.
● Assemble large, complex data sets that meet functional / non-functional business requirements.
● Building and optimizing ‘big data’ data pipelines, architectures and data sets.
● Maintain, organize & automate data processes for various use cases.
● Identifying trends, doing follow-up analysis, preparing visualizations.
● Creating daily, weekly and monthly reports of product KPIs (see the sketch after this list).
● Create informative, actionable and repeatable reporting that highlights relevant business trends and opportunities for improvement.
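As referenced in the KPI bullet above, a minimal pandas sketch of those daily/weekly/monthly roll-ups follows; the input file and column names (ts, user_id, order_id, amount) are hypothetical.

```python
# Minimal pandas sketch of product-KPI roll-ups.
# Input file and column names are hypothetical placeholders.
import pandas as pd

events = pd.read_parquet("product_events.parquet")
events["ts"] = pd.to_datetime(events["ts"])

# Daily roll-up; swap freq="W" or freq="MS" for weekly / monthly reports.
kpis = events.groupby(pd.Grouper(key="ts", freq="D")).agg(
    active_users=("user_id", "nunique"),
    orders=("order_id", "count"),
    revenue=("amount", "sum"),
)
kpis.to_csv("daily_kpis.csv")
```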
Required Skills And Experience:
● 2-5 years of work experience in data analytics, including analyzing large data sets.
● BTech in Mathematics/Computer Science
● Strong analytical, quantitative and data interpretation skills.
● Hands-on experience with Python, Apache Spark, Hadoop, NoSQL databases (MongoDB preferred) and Linux is a must.
● Experience building and optimizing ‘big data’ data pipelines, architectures and data sets.
● Experience with Google Cloud data analytics products such as BigQuery, Dataflow, Dataproc, etc. (or similar cloud-based platforms); see the query sketch after this list.
● Experience working within a Linux computing environment, and use of command-line tools, including knowledge of shell/Python scripting for automating common tasks.
● Previous experience working at startups and/or in fast-paced environments.
● Previous experience as a data engineer or in a similar role.
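As referenced in the BigQuery bullet above, a minimal sketch of querying BigQuery from Python follows; it assumes the google-cloud-bigquery package and GCP credentials, and the project, dataset and table names are hypothetical.

```python
# Minimal sketch of a BigQuery query from Python.
# Project/dataset/table are hypothetical; requires GCP auth.
from google.cloud import bigquery

client = bigquery.Client(project="example-project")

sql = """
    SELECT DATE(created_at) AS day, COUNT(*) AS signups
    FROM `example-project.analytics.users`
    GROUP BY day
    ORDER BY day DESC
    LIMIT 7
"""
for row in client.query(sql).result():
    print(row.day, row.signups)
```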
at Meslova Systems Pvt Ltd
Artificial Intelligence (AI) Researchers and Developers
The successful candidate will be part of highly productive teams working on implementing core AI algorithms, cryptography libraries, AI-enabled products and intelligent 3D interfaces. Candidates will work on cutting-edge products and technologies in highly challenging domains and will need the highest level of commitment and interest to learn new technologies and domain-specific subject matter very quickly. Successful completion of projects will require travel and working in remote locations with customers for extended periods.
Education Qualification: Bachelor's, Master's or PhD degree in Computer Science, Mathematics, Electronics or Information Systems from a reputed university, and/or equivalent knowledge and skills.
Location : Hyderabad, Bengaluru, Delhi, Client Location (as needed)
Skillset and Expertise
• Strong software development experience using Python
• Strong background in mathematical, numerical and scientific computing using Python.
• Knowledge in Artificial Intelligence/Machine learning
• Experience working with SCRUM software development methodology
• Strong experience with implementing Web services, Web clients and JSON protocol is required
• Experience with Python metaprogramming
• Strong analytical and problem-solving skills
• Design, develop and debug enterprise grade software products and systems
• Software systems testing methodology, including writing and execution of test plans, debugging, and testing scripts and tools
• Excellent written and verbal communication skills; proficiency in English; verbal communication in Hindi and other local Indian languages
• Ability to effectively communicate product design, functionality and status to management, customers and other stakeholders
• Highest level of integrity and work ethic
Frameworks
1. Scikit-learn
2. Tensorflow
3. Keras
4. OpenCV
5. Django
6. CUDA
7. Apache Kafka
Mathematics
1. Advanced Calculus
2. Numerical Analysis
3. Complex Function Theory
4. Probability
Concepts (One or more of the below)
1. OpenGL based 3D programming
2. Cryptography
3. Artificial Intelligence (AI) algorithms: a) statistical modelling, b) DNN, c) RNN, d) LSTM, e) GAN, f) CNN
Responsibilities:
- Exploring and visualizing data to gain an understanding of it, then identifying differences in data distribution that could affect performance when deploying the model in the real world.
- Verifying data quality, and/or ensuring it via data cleaning.
- Able to adapt and work fast to produce output that improves stakeholders' decision-making using ML.
- To design and develop Machine Learning systems and schemes.
- To perform statistical analysis and fine-tune models using test results.
- To train and retrain ML systems and models as and when necessary.
- To deploy ML models in production and maintain the cost of cloud infrastructure.
- To develop Machine Learning apps according to client and data scientist requirements.
- To analyze the problem-solving capabilities and use-cases of ML algorithms and rank them by how successful they are in meeting the objective.
Technical Knowledge:
- Experience with real-time problems solved using ML and deep learning models deployed in production, with strong projects to showcase.
- Proficiency in Python and experience working with Jupyter, Google Colab and cloud-hosted notebooks such as AWS SageMaker, Databricks, etc.
- Proficiency in working with scikit-learn, TensorFlow, OpenCV, PySpark, pandas, NumPy and related libraries.
- Expert in visualising and manipulating complex datasets.
- Proficiency in working with visualisation libraries such as seaborn, plotly, matplotlib etc.
- Proficiency in Linear Algebra, statistics and probability required for Machine Learning.
- Proficiency in ML algorithms, for example gradient boosting, stacked machine learning, classification algorithms and deep learning algorithms. Experience hyper-tuning various models and comparing their performance is needed (see the tuning sketch after this list).
- Big data Technologies such as Hadoop stack and Spark.
- Basic use of cloud VMs (e.g. EC2).
- Brownie points for Kubernetes and Task Queues.
- Strong written and verbal communications.
- Experience working in an Agile environment.
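As referenced in the algorithms item above, here is a minimal sketch of hyper-tuning a gradient-boosting classifier and comparing results with scikit-learn's GridSearchCV; the toy dataset and parameter grid are illustrative only.

```python
# Minimal sketch of hyper-tuning a gradient-boosting classifier
# with GridSearchCV on a synthetic toy dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100],
                "learning_rate": [0.05, 0.1],
                "max_depth": [2, 3]},
    cv=5, scoring="roc_auc",
)
grid.fit(X_train, y_train)
print("best params:", grid.best_params_)
print("test ROC AUC:", grid.score(X_test, y_test))
```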
Job Description
We are looking for a data scientist who will help us discover the information hidden in vast amounts of data and help us make smarter decisions to deliver even better products. Your primary focus will be applying data mining techniques, performing statistical analysis, and building high-quality prediction systems integrated with our products.
Responsibilities
- Selecting features, building and optimizing classifiers using machine learning techniques (see the pipeline sketch after this list)
- Data mining using state-of-the-art methods
- Extending company’s data with third party sources of information when needed
- Enhancing data collection procedures to include information that is relevant for building analytic systems
- Processing, cleansing, and verifying the integrity of data used for analysis
- Doing ad-hoc analysis and presenting results in a clear manner
- Creating automated anomaly detection systems and constantly tracking their performance
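As referenced in the first responsibility, a minimal scikit-learn sketch of feature selection plus classifier building follows; the toy dataset and the choice of k features are illustrative only.

```python
# Minimal sketch of feature selection + classifier building
# as a scikit-learn pipeline on a synthetic toy dataset.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=500, n_features=30, n_informative=5, random_state=0)

clf = Pipeline([
    ("select", SelectKBest(f_classif, k=10)),   # keep the 10 most informative features
    ("model", LogisticRegression(max_iter=1000)),
])
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean CV accuracy: {scores.mean():.3f}")
```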
Skills and Qualifications
- Excellent understanding of machine learning techniques and algorithms, such as linear regression, SVMs, decision forests, LSTMs, CNNs, etc.
- Experience with Deep Learning preferred.
- Experience with common data science toolkits, such as R, NumPy, MATLAB, etc. Excellence in at least one of these is highly desirable
- Great communication skills
- Proficiency in using query languages such as SQL, Hive, Pig
- Good applied statistics skills, such as statistical testing, regression, etc.
- Good scripting and programming skills
- Data-oriented personality