Sr Hadoop Operations Engineer
Skills
About Multinational Company providing energy & Automation digital
Similar jobs
DATA ENGINEER
Overview
They started with a singular belief - what is beautiful cannot and should not be defined in marketing meetings. It's defined by the regular people like us, our sisters, our next-door neighbours, and the friends we make on the playground and in lecture halls. That's why we stand for people-proving everything we do. From the inception of a product idea to testing the final formulations before launch, our consumers are a part of each and every process. They guide and inspire us by sharing their stories with us. They tell us not only about the product they need and the skincare issues they face but also the tales of their struggles, dreams and triumphs. Skincare goes deeper than skin. It's a form of self-care for many. Wherever someone is on this journey, we want to cheer them on through the products we make, the content we create and the conversations we have. What we wish to build is more than a brand. We want to build a community that grows and glows together - cheering each other on, sharing knowledge, and ensuring people always have access to skincare that really works.
Job Description:
We are seeking a skilled and motivated Data Engineer to join our team. As a Data Engineer, you will be responsible for designing, developing, and maintaining the data infrastructure and systems that enable efficient data collection, storage, processing, and analysis. You will collaborate with cross-functional teams, including data scientists, analysts, and software engineers, to implement data pipelines and ensure the availability, reliability, and scalability of our data platform.
Responsibilities:
Design and implement scalable and robust data pipelines to collect, process, and store data from various sources.
Develop and maintain data warehouse and ETL (Extract, Transform, Load) processes for data integration and transformation.
Optimize and tune the performance of data systems to ensure efficient data processing and analysis.
Collaborate with data scientists and analysts to understand data requirements and implement solutions for data modeling and analysis.
Identify and resolve data quality issues, ensuring data accuracy, consistency, and completeness.
Implement and maintain data governance and security measures to protect sensitive data.
Monitor and troubleshoot data infrastructure, perform root cause analysis, and implement necessary fixes.
Stay up-to-date with emerging technologies and industry trends in data engineering and recommend their adoption when appropriate.
Qualifications:
Bachelor’s or higher degree in Computer Science, Information Systems, or a related field.
Proven experience as a Data Engineer or similar role, working with large-scale data processing and storage systems.
Strong programming skills in languages such as Python, Java, or Scala.
Experience with big data technologies and frameworks like Hadoop, Spark, or Kafka.
Proficiency in SQL and database management systems (e.g., MySQL, PostgreSQL, or Oracle).
Familiarity with cloud platforms like AWS, Azure, or GCP, and their data services (e.g., S3, Redshift, BigQuery).
Solid understanding of data modeling, data warehousing, and ETL principles.
Knowledge of data integration techniques and tools (e.g., Apache Nifi, Talend, or Informatica).
Strong problem-solving and analytical skills, with the ability to handle complex data challenges.
Excellent communication and collaboration skills to work effectively in a team environment.
Preferred Qualifications:
Advanced knowledge of distributed computing and parallel processing.
Experience with real-time data processing and streaming technologies (e.g., Apache Kafka, Apache Flink).
Familiarity with machine learning concepts and frameworks (e.g., TensorFlow, PyTorch).
Knowledge of containerization and orchestration technologies (e.g., Docker, Kubernetes).
Experience with data visualization and reporting tools (e.g., Tableau, Power BI).
Certification in relevant technologies or data engineering disciplines.
We are seeking an experienced Senior Data Platform Engineer to join our team. The ideal candidate should have extensive experience with Pyspark, Airflow, Presto, Hive, Kafka and Debezium, and should be passionate about developing scalable and reliable data platforms.
Responsibilities:
- Design, develop, and maintain our data platform architecture using Pyspark, Airflow, Presto, Hive, Kafka, and Debezium.
- Develop and maintain ETL processes to ingest, transform, and load data from various sources into our data platform.
- Work closely with data analysts, data scientists, and other stakeholders to understand their requirements and design solutions that meet their needs.
- Implement and maintain data governance policies and procedures to ensure data quality, privacy, and security.
- Continuously monitor and optimize the performance of our data platform to ensure scalability, reliability, and cost-effectiveness.
- Keep up-to-date with the latest trends and technologies in the field of data engineering and share knowledge and best practices with the team.
Requirements:
- Bachelor's degree in Computer Science, Information Technology, or related field.
- 5+ years of experience in data engineering or related fields.
- Strong proficiency in Pyspark, Airflow, Presto, Hive, Datalake, and Debezium.
- Experience with data warehousing, data modeling, and data governance.
- Experience working with large-scale distributed systems and cloud platforms (e.g., AWS, GCP, Azure).
- Strong problem-solving skills and ability to work independently and collaboratively.
- Excellent communication and interpersonal skills.
If you are a self-motivated and driven individual with a passion for data engineering and a strong background in Pyspark, Airflow, Presto, Hive, Datalake, and Debezium, we encourage you to apply for this exciting opportunity. We offer competitive compensation, comprehensive benefits, and a collaborative work environment that fosters innovation and growth.
Hi,
Enterprise Minds is looking for Data Architect for Pune Location.
Req Skills:
Python,Pyspark,Hadoop,Java,Scala
- Collaborate with the business teams to understand the data environment in the organization; develop and lead the Data Scientists team to test and scale new algorithms through pilots and subsequent scaling up of the solutions
- Influence, build and maintain the large-scale data infrastructure required for the AI projects, and integrate with external IT infrastructure/service
- Act as the single point source for all data related queries; strong understanding of internal and external data sources; provide inputs in deciding data-schemas
- Design, develop and maintain the framework for the analytics solutions pipeline
- Provide inputs to the organization’s initiatives on data quality and help implement frameworks and tools for the various related initiatives
- Work in cross-functional teams of software/machine learning engineers, data scientists, product managers, and others to build the AI ecosystem
- Collaborate with the external organizations including vendors, where required, in respect of all data-related queries as well as implementation initiatives
Technical/Core skills
- Minimum 3 yrs of exp in Informatica Big data Developer(BDM) in Hadoop environment.
- Have knowledge of informatica Power exchange (PWX).
- Minimum 3 yrs of exp in big data querying tool like Hive and Impala.
- Ability to designing/development of complex mappings using informatica Big data Developer.
- Create and manage Informatica power exchange and CDC real time implementation
- Strong Unix knowledge skills for writing shell scripts and troubleshoot of existing scripts.
- Good knowledge of big data platforms and its framework.
- Good to have an experience in cloudera data platform (CDP)
- Experience with building stream processing systems using Kafka and spark
- Excellent SQL knowledge
Soft skills :
- Ability to work independently
- Strong analytical and problem solving skills
- Attitude of learning new technology
- Regular interaction with vendors, partners and stakeholders
Job Description:
- Working knowledge and hands-on experience of Big Data / Hadoop tools and technologies.
- Experience of working in Pig, Hive, Flume, Sqoop, Kafka etc.
- Database development experience with a solid understanding of core database concepts, relational database design, ODS & DWH.
- Expert level knowledge of SQL and scripting preferably UNIX shell scripting, Perl scripting.
- Working knowledge of Data integration solution and well-versed with any ETL tool (Informatica / Datastage / Abinitio/Pentaho etc).
- Strong problem solving and logical reasoning ability.
- Excellent understanding of all aspects of the Software Development Lifecycle.
- Excellent written and verbal communication skills.
- Experience in Java will be an added advantage
- Knowledge of object oriented programming concepts
- Exposure to ISMS policies and procedures.
- We are looking for a Data Engineer with 3-5 years experience in Python, SQL, AWS (EC2, S3, Elastic Beanstalk, API Gateway), and Java.
- The applicant must be able to perform Data Mapping (data type conversion, schema harmonization) using Python, SQL, and Java.
- The applicant must be familiar with and have programmed ETL interfaces (OAUTH, REST API, ODBC) using the same languages.
- The company is looking for someone who shows an eagerness to learn and who asks concise questions when communicating with teammates.
- We are looking for an experienced data engineer to join our team.
- The preprocessing involves ETL tasks, using pyspark, AWS Glue, staging data in parquet formats on S3, and Athena
To succeed in this data engineering position, you should care about well-documented, testable code and data integrity. We have devops who can help with AWS permissions.
We would like to build up a consistent data lake with staged, ready-to-use data, and to build up various scripts that will serve as blueprints for various additional data ingestion and transforms.
If you enjoy setting up something which many others will rely on, and have the relevant ETL expertise, we’d like to work with you.
Responsibilities
- Analyze and organize raw data
- Build data pipelines
- Prepare data for predictive modeling
- Explore ways to enhance data quality and reliability
- Potentially, collaborate with data scientists to support various experiments
Requirements
- Previous experience as a data engineer with the above technologies
Roles & Responsibilities
- Proven experience with deploying and tuning Open Source components into enterprise ready production tooling Experience with datacentre (Metal as a Service – MAAS) and cloud deployment technologies (AWS or GCP Architect certificates required)
- Deep understanding of Linux from kernel mechanisms through user space management
- Experience on CI/CD (Continuous Integrations and Deployment) system solutions (Jenkins).
- Using Monitoring tools (local and on public cloud platforms) Nagios, Prometheus, Sensu, ELK, Cloud Watch, Splunk, New Relic etc. to trigger instant alerts, reports and dashboards. Work closely with the development and infrastructure teams to analyze and design solutions with four nines (99.99%) up-time, globally distributed, clustered, production and non-production virtualized infrastructure.
- Wide understanding of IP networking as well as data centre infrastructure
Skills
- Expert with software development tools and sourcecode management, understanding, managing issues, code changes and grouping them into deployment releases in a stable and measurable way to maximize production Must be expert at developing and using ansible roles and configuring deployment templates with jinja2.
- Solid understanding of data collection tools like Flume, Filebeat, Metricbeat, JMX Exporter agents.
- Extensive experience operating and tuning the kafka streaming data platform, specifically as a message queue for big data processing
- Strong understanding and must have experience:
- Apache spark framework, specifically spark core and spark streaming,
- Orchestration platforms, mesos and kubernetes,
- Data storage platforms, elasticstack, carbon, clickhouse, cassandra, ceph, hdfs
- Core presentation technologies kibana, and grafana.
- Excellent scripting and programming skills (bash, python, java, go, rust). Must have previous experience with “rust” in order to support, improve in house developed products
Certification
Red Hat Certified Architect certificate or equivalent required CCNA certificate required 3-5 years of experience running open source big data platforms
This will include:
Scorecards
Strategies
MIS
The verticals included are:
Risk
Marketing
Product