We are seeking an experienced Senior Data Platform Engineer to join our team. The ideal candidate should have extensive experience with Pyspark, Airflow, Presto, Hive, Kafka and Debezium, and should be passionate about developing scalable and reliable data platforms.
- Design, develop, and maintain our data platform architecture using Pyspark, Airflow, Presto, Hive, Kafka, and Debezium.
- Develop and maintain ETL processes to ingest, transform, and load data from various sources into our data platform.
- Work closely with data analysts, data scientists, and other stakeholders to understand their requirements and design solutions that meet their needs.
- Implement and maintain data governance policies and procedures to ensure data quality, privacy, and security.
- Continuously monitor and optimize the performance of our data platform to ensure scalability, reliability, and cost-effectiveness.
- Keep up-to-date with the latest trends and technologies in the field of data engineering and share knowledge and best practices with the team.
- Bachelor's degree in Computer Science, Information Technology, or related field.
- 5+ years of experience in data engineering or related fields.
- Strong proficiency in Pyspark, Airflow, Presto, Hive, Datalake, and Debezium.
- Experience with data warehousing, data modeling, and data governance.
- Experience working with large-scale distributed systems and cloud platforms (e.g., AWS, GCP, Azure).
- Strong problem-solving skills and ability to work independently and collaboratively.
- Excellent communication and interpersonal skills.
If you are a self-motivated and driven individual with a passion for data engineering and a strong background in Pyspark, Airflow, Presto, Hive, Datalake, and Debezium, we encourage you to apply for this exciting opportunity. We offer competitive compensation, comprehensive benefits, and a collaborative work environment that fosters innovation and growth.
Role: Data Engineer
Total Experience: 5 to 8 Years
Job Location: Gurgaon
Budget -26 28 LPA
Must have - Technical & Soft Skills:
- Python: Data Structures, List, Libraries, Data engineering basics
- SQL: Joins, Groups, Aggregations, Windowing functions, analytic functions etc.
- Worked in AWS services S3, EC2, Glue, Data Pipeline, Athena and Redshift
- Solid hands-on working experience in Big-Data Technologies
- Strong hands-on experience of programming languages like Python, Scala with Spark.
- Good command and working experience on Hadoop/Map Reduce, HDFS, Hive, HBase, and No-SQL Databases
- Hands on working experience on any of the data engineering/analytics platform AWS preferred
- Hands-on experience on Data Ingestion Apache Nifi, Apache Airflow, Sqoop, and Ozzie
- Hands on working experience of data processing at scale with event driven systems, message queues (Kafka/ Flink/Spark Streaming)
- Hands on working Experience with AWS Services like EMR, Kinesis, S3, CloudFormation, Glue, API
- Gateway, Lake Foundation
- Operationalization of ML models on AWS (e.g. deployment, scheduling, model monitoring etc.)
- Feature Engineering/Data Processing to be used for Model development
- Experience gathering and processing raw data at scale (including writing scripts, web scraping, calling APIs, write SQL queries, etc.)
- Hands-on working experience in analyzing source system data and data flows, working with structured and unstructured data
• Designing Hive/HCatalog data model includes creating table definitions, file formats, compression techniques for Structured & Semi-structured data processing
• Implementing Spark processing based ETL frameworks
• Implementing Big data pipeline for Data Ingestion, Storage, Processing & Consumption
• Modifying the Informatica-Teradata & Unix based data pipeline
• Enhancing the Talend-Hive/Spark & Unix based data pipelines
• Develop and Deploy Scala/Python based Spark Jobs for ETL processing
• Strong SQL & DWH concepts.
• Function as integrator between business needs and technology solutions, helping to create technology solutions to meet clients’ business needs
• Lead project efforts in defining scope, planning, executing, and reporting to stakeholders on strategic initiatives
• Understanding of EDW system of business and creating High level design document and low level implementation document
• Understanding of Big Data Lake system of business and creating High level design document and low level implementation document
• Designing Big data pipeline for Data Ingestion, Storage, Processing & Consumption
Designation: Principal Data Engineer
Position Type: Full Time Position
Office Timings: 9AM to 6PM
Compensation: As Per Industry standards
At Monarch, we’re leading the digital transformation of farming. Monarch Tractor augments both muscle and mind with fully loaded hardware, software, and service machinery that will spur future generations of farming technologies. With our farmer-first mentality, we are building a smart tractor that will enhance (not replace) the existing farm ecosystem, alleviate labor availability, and cost issues, and provide an avenue for competitive organic and beyond farming by providing mechanical solutions to replace harmful chemical solutions. Despite all the cutting-edge technology we will incorporate, our tractor will still plow, till, and haul better than any other tractor in its class. We have all the necessary ingredients to develop, build and scale the Monarch Tractor and digitally transform farming around the world.
Monarch Tractor likes to invite an experience Python data engineer to lead our internal data engineering team in India. This is a unique opportunity to work on computer vision AI data pipelines for electric tractors. You will be dealing with data from a farm environment like videos, images, tractor logs, GPS coordinates and map polygons. You will be responsible for collecting data for research and development. For example, this includes setting up ETL data pipelines to extract data from tractors, loading these data into the cloud and recording AI training results.
This role includes, but not limited to, the following tasks:
● Lead data engineering team
● Own and contribute to more than 50% of the data engineering code base
● Scope out new project requirements
● Costing data pipeline solutions
● Create data engineering tooling
● Design custom data structures for efficient processing of data
Data engineering skills we are looking for:
● Able to work with large amounts of text log data, image data, and video data
● Fluently use AWS cloud solutions like S3, Lambda, and EC2
● Able to work with data from Robot Operating System
● 3 to 5 years of experience using Python
● 3 to 5 years of experience using PostgreSQL
● 3 to 5 years of experience using AWS EC2, S3, Lambda
● 3 to 5 years of experience using Ubuntu OS or WSL
Good to have experience:
● Robot Operating System
What you will get:
At Monarch Tractor, you’ll play a key role on a capable, dedicated, high-performing team of rock stars. Our compensation package includes a competitive salary, excellent health, dental and vision benefits, and company equity commensurate with the role you’ll play in our success.
Integration Engineer- SnapLogic
Experience: 5+ years
- Experience with cloud middleware integration platforms like SnapLogic
- Experience in design, develop, validate and deploy the ETL pipelines using the SnapLogic integration tool
- Experience in Integration with MS Dynamics Sales Module
- Should be able to work with DataBases, SaaS products (Salesforce , Dynamics / netsuite ) , REST Web Services, XSLT, XML, JSON
- Should be able to design performance-friendly pipelines
- Proficiency in building complex mappings using Java scripting
- Experience in building common services framework for logging, error handling, configuration, authentication, and authorization
A proficient, independent contributor that assists in technical design, development, implementation, and support of data pipelines; beginning to invest in less-experienced engineers.
- Design, Create and maintain on premise and cloud based data integration pipelines.
- Assemble large, complex data sets that meet functional/non functional business requirements.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources.
- Build analytics tools that utilize the data pipeline to provide actionable insights into key business performance metrics.
- Work with stakeholders including the Executive, Product, Data and Design teams to assist with data-related technical issues and support their data infrastructure needs.
- Create data pipelines to enable BI, Analytics and Data Science teams that assist them in building and optimizing their systems
- Assists in the onboarding, training and development of team members.
- Reviews code changes and pull requests for standardization and best practices
- Evolve existing development to be automated, scalable, resilient, self-serve platforms
- Assist the team in the design and requirements gathering for technical and non technical work to drive the direction of projects
Technical & Business Expertise:
-Hands on integration experience in SSIS/Mulesoft
- Hands on experience Azure Synapse
- Proven advanced level of writing database experience in SQL Server
- Proven advanced level of understanding about Data Lake
- Proven intermediate level of writing Python or similar programming language
- Intermediate understanding of Cloud Platforms (GCP)
- Intermediate understanding of Data Warehousing
- Advanced Understanding of Source Control (Github)
Work Timing: 5 Days A Week
• Ensure right stakeholders gets right information at right time
• Requirement gathering with stakeholders to understand their data requirement
• Creating and deploying reports
• Participate actively in datamarts design discussions
• Work on both RDBMS as well as Big Data for designing BI Solutions
• Write code (queries/procedures) in SQL / Hive / Drill that is both functional and elegant,
following appropriate design patterns
• Design and plan BI solutions to automate regular reporting
• Debugging, monitoring and troubleshooting BI solutions
• Creating and deploying datamarts
• Writing relational and multidimensional database queries
• Integrate heterogeneous data sources into BI solutions
• Ensure Data Integrity of data flowing from heterogeneous data sources into BI solutions.
Minimum Job Qualifications:
• BE/B.Tech in Computer Science/IT from Top Colleges
• 1-5 years of experience in Datawarehousing and SQL
• Excellent Analytical Knowledge
• Excellent technical as well as communication skills
• Attention to even the smallest detail is mandatory
• Knowledge of SQL query writing and performance tuning
• Knowledge of Big Data technologies like Apache Hadoop, Apache Hive, Apache Drill
• Knowledge of fundamentals of Business Intelligence
• In-depth knowledge of RDBMS systems, Datawarehousing and Datamarts
• Smart, motivated and team oriented
• Sound knowledge of software development in Programming (preferably Java )
• Knowledge of the software development lifecycle (SDLC) and models
About the Role:
Freight Tiger is growing exponentially, and technology is at the centre of it. Our Engineers love solving complex industry problems by building modular and scalable solutions using cutting-edge technology. Your peers will be an exceptional group of Software Engineers, Quality Assurance Engineers, DevOps Engineers, and Infrastructure and Solution Architects.
This role is responsible for developing data pipelines and data engineering components to support strategic initiatives and ongoing business processes. This role works with leads, analysts, and data scientists to understand requirements, develop technical solutions, and ensure the reliability and performance of the data engineering solutions.
This role provides an opportunity to directly impact business outcomes for sales, underwriting, claims and operations functions across multiple use cases by providing them data for their analytical modelling needs.
- Create and maintain a data pipeline.
- Build and deploy ETL infrastructure for optimal data delivery.
- Work with various product, design and executive teams to troubleshoot data-related issues.
- Create tools for data analysts and scientists to help them build and optimise the product.
- Implement systems and processes for data access controls and guarantees.
- Distil the knowledge from experts in the field outside the org and optimise internal data systems.
- Should have 5+ years of relevant experience.
- Strong analytical skills.
- Degree in Computer Science, Statistics, Informatics, Information Systems.
- Strong project management and organisational skills.
- Experience supporting and working with cross-functional teams in a dynamic environment.
- SQL guru with hands-on experience on various databases.
- NoSQL databases like Cassandra, and MongoDB.
- Experience with Snowflake, Redshift.
- Experience with tools like Airflow, and Hevo.
- Experience with Hadoop, Spark, Kafka, and Flink.
- Programming experience in Python, Java, and Scala.
We are actively seeking a Senior Data Engineer experienced in building data pipelines and integrations from 3rd party data sources by writing custom automated ETL jobs using Python. The role will work in partnership with other members of the Business Analytics team to support the development and implementation of new and existing data warehouse solutions for our clients. This includes designing database import/export processes used to generate client data warehouse deliverables.
- 2+ Years experience as an ETL developer with strong data architecture knowledge around data warehousing concepts, SQL development and optimization, and operational support models.
- Experience using Python to automate ETL/Data Processes jobs.
- Design and develop ETL and data processing solutions using data integration tools, python scripts, and AWS / Azure / On-Premise Environment.
- Experience / Willingness to learn AWS Glue / AWS Data Pipeline / Azure Data Factory for Data Integration.
- Develop and create transformation queries, views, and stored procedures for ETL processes, and process automation.
- Document data mappings, data dictionaries, processes, programs, and solutions as per established standards for data governance.
- Work with the data analytics team to assess and troubleshoot potential data quality issues at key intake points such as validating control totals at intake and then upon transformation, and transparently build lessons learned into future data quality assessments
- Solid experience with data modeling, business logic, and RESTful APIs.
- Solid experience in the Linux environment.
- Experience with NoSQL / PostgreSQL preferred
- Experience working with databases such as MySQL, NoSQL, and Postgres, and enterprise-level connectivity experience (such as connecting over TLS and through proxies).
- Experience with NGINX and SSL.
- Performance tune data processes and SQL queries, and recommend and implement data process optimization and query tuning techniques.
1. Must have a very good hands-on technical experience of 3+ years with JAVA or Python
2. Working experience and good understanding of AWS Cloud; Advanced experience with IAM policy and role management
3. Infrastructure Operations: 5+ years supporting systems infrastructure operations, upgrades, deployments using Terraform, and monitoring
4. Hadoop: Experience with Hadoop (Hive, Spark, Sqoop) and / or AWS EMR
5. Knowledge on PostgreSQL/MySQL/Dynamo DB backend operations
6. DevOps: Experience with DevOps automation - Orchestration/Configuration Management and CI/CD tools (Jenkins)
7. Version Control: Working experience with one or more version control platforms like GitHub or GitLab
8. Knowledge on AWS Quick sight reporting
9. Monitoring: Hands on experience with monitoring tools such as AWS CloudWatch, AWS CloudTrail, Datadog and Elastic Search
10. Networking: Working knowledge of TCP/IP networking, SMTP, HTTP, load-balancers (ELB) and high availability architecture
11. Security: Experience implementing role-based security, including AD integration, security policies, and auditing in a Linux/Hadoop/AWS environment. Familiar with penetration testing and scan tools for remediation of security vulnerabilities.
12. Demonstrated successful experience learning new technologies quickly
WHAT WILL BE THE ROLES AND RESPONSIBILITIES?
1. Create procedures/run books for operational and security aspects of AWS platform
2. Improve AWS infrastructure by developing and enhancing automation methods
3. Provide advanced business and engineering support services to end users
4. Lead other admins and platform engineers through design and implementation decisions to achieve balance between strategic design and tactical needs
5. Research and deploy new tools and frameworks to build a sustainable big data platform
6. Assist with creating programs for training and onboarding for new end users
7. Lead Agile/Kanban workflows and team process work
8. Troubleshoot issues to resolve problems
9. Provide status updates to Operations product owner and stakeholders
10. Track all details in the issue tracking system (JIRA)
11. Provide issue review and triage problems for new service/support requests
12. Use DevOps automation tools, including Jenkins build jobs
13. Fulfil any ad-hoc data or report request queries from different functional groups