Primary Duties and Responsibilities
- Experience with Informatica Multidomain MDM 10.4 tool suite preferred
- Partnering with data architects and engineers to ensure an optimal data model design and implementation for each MDM domain in accordance with industry and MDM best practices
- Works with data governance and business steward(s) to design, develop, and configure business rules for data validation, standardization, matching, and merging
- Implements Data Quality policies, procedures, and standards with the Data Governance Team to maintain customer, location, product, and other data domains; experience with the Informatica IDQ tool suite preferred.
- Performs data analysis and source-to-target mapping for ingest and egress of data.
- Maintain compliance with change control, SDLC, and development standards.
- Champion the creation and contribution to technical documentation and diagrams.
- Establishes a technical vision and strategy with the team and works with the team to turn it into reality.
- Emphasis on coaching and training to cultivate skill development of team members within the department.
- Responsible for keeping up with industry best practices and trends.
- Monitor, troubleshoot, maintain, and continuously improve the MDM ecosystem.
Secondary Duties and Responsibilities
- May participate in off-hours on-call rotation.
- Attends and is prepared to participate in team, department and company meetings.
- Performs other job related duties and special projects as assigned.
Supervisory Responsibilities
This is a non-management role
Education and Experience
- Bachelor's degree in MIS, Computer Sciences, Business Administration, or related field; or High School Degree/General Education Diploma and 4 years of relevant experience in lieu of Bachelor's degree.
- 5+ years of experience in implementing MDM solutions using Informatica MDM.
- 2+ years of experience in data stewardship, data governance, and data management concepts.
- Professional working knowledge of Customer 360 solution
- Professional working knowledge in multi domain MDM data modeling.
- Strong understanding of company master data sets and their application in complex business processes; able to support data profiling, extraction, and cleansing activities using Informatica Data Quality (IDQ).
- Strong knowledge in the installation and configuration of the Informatica MDM Hub.
- Familiarity with real-time, near real-time and batch data integration.
- Strong experience and understanding of Informatica toolsets including Informatica MDM Hub, Informatica Data Quality (IDQ), Informatica Customer 360, Informatica EDC, Hierarchy Manager (HM), Business Entity Service Model, Address Doctor, Customizations & Composite Services
- Experience with event-driven architectures (e.g., Kafka, Google Pub/Sub, Azure Event Hub); a brief sketch follows this list.
- Professional working knowledge of CI/CD technologies such as Concourse, TeamCity, Octopus, Jenkins, and CircleCI.
- Team player who exhibits high energy, strategic thinking, collaboration, direct communication, and results orientation.
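For illustration only: the event-driven integration mentioned above could, for example, consume master-data change events from a Kafka topic. Below is a minimal sketch using the kafka-python client; the topic, broker, and field names are hypothetical.

```python
# Minimal sketch (hypothetical names): consume master-data change events from
# a Kafka topic so downstream systems can stay in sync with the MDM hub.
import json
from kafka import KafkaConsumer  # kafka-python client

consumer = KafkaConsumer(
    "mdm.customer.changes",                      # hypothetical topic name
    bootstrap_servers=["broker1:9092"],          # hypothetical broker address
    group_id="mdm-sync",                         # consumer group for this integration
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

for message in consumer:
    event = message.value
    # e.g. route the changed record identifier to a downstream subscriber
    print(event.get("record_id"), event.get("change_type"))
```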
Physical Requirements
- Visual requirements include: ability to see detail at near range with or without correction. Must be physically able to perform sedentary work: occasionally lifting or carrying objects of no more than 10 pounds, and occasionally standing or walking, reaching, handling, grasping, feeling, talking, hearing and repetitive motions.
Working Conditions
- The duties of this position are performed through a combination of an open office setting and remote work options. Full remote work options available for employees that reside outside of the Des Moines Metro Area. There is frequent pressure to meet deadlines and handle multiple projects in a day.
Equipment Used to Perform Job
- Windows or Mac computer and various software solutions.
Financial Responsibility
- Responsible for company assets including maintenance of software solutions.
Contacts
- Has frequent contact with office personnel in other departments related to the position, as well as occasional contact with users and customers. Engages stakeholders from other areas of the business.
Confidentiality
- Has access to confidential information including trade secrets, intellectual property, various financials, and customer data.
About DataGrokr
DataGrokr (https://www.datagrokr.com) is a cloud native technology consulting organization providing the next generation of big data analytics, cloud and enterprise solutions. We solve complex technology problems for our global clients who rely on us for our deep technical knowledge and delivery excellence.
If you are unafraid of technology, believe in your learning ability and are looking to work amongst smart, driven colleagues whom you can look up to and learn from, you might want to check us out.
About the role
We are looking for a Senior Data Engineer to join our growing engineering team. As a member of the team,
• You will work on enterprise data platforms, architecting and implementing data lakes both on-prem and in the cloud.
• You will be responsible for evolving technical architecture, design and implementation of data solutions using a variety of big data technologies. You will work extensively on all major public cloud platforms - AWS, Azure and GCP.
• You will work with senior technical architects on our client side to evolve an effective technology architecture and development strategy to implement the solution.
• You will work with extremely talented peers and follow modern engineering practices using agile methodologies.
• You will coach, mentor and lead other engineers and provide guidance to ensure the quality and consistency of the solution.
Must-have skills and attitudes:
• Passion for data engineering, in-depth knowledge of some of the following technologies – SQL (expert level), Python (expert level), Spark (intermediate level), Big data stack of one of AWS/GCP.
• Hands-on experience in data wrangling, data munging and ETL. Should be able to source data from anywhere and transform data to any shape using SQL, Python or Spark (a brief sketch follows this list).
• Hands-on experience working with varied data structures such as XML, JSON, Avro, etc.
• Ability to create data models and architect data warehouse components
• Experience with version control (Git, Bitbucket, etc.)
• Strong understanding of Agile, experience with CI/CD pipelines and processes
• Ability to communicate with technical as well as non-technical audience
• Collaborating with various stakeholders
• Experience leading scrum teams, participating in sprint grooming and planning sessions, and in work/effort sizing and estimation
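As a rough illustration of the data wrangling and ETL expectations above, the following PySpark sketch reads semi-structured JSON, reshapes it with SQL, and writes Parquet. The paths and column names are hypothetical.

```python
# Minimal PySpark wrangling sketch (hypothetical paths and columns):
# read semi-structured JSON, reshape it with SQL, and write Parquet.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("wrangling-sketch").getOrCreate()

orders = spark.read.json("s3://example-bucket/raw/orders/")    # hypothetical source

cleaned = (
    orders
    .withColumn("order_date", F.to_date("order_ts"))           # derive a date column
    .filter(F.col("status").isNotNull())                       # drop incomplete rows
)

cleaned.createOrReplaceTempView("orders")
daily = spark.sql("""
    SELECT order_date, customer_id, SUM(amount) AS total_amount
    FROM orders
    GROUP BY order_date, customer_id
""")

daily.write.mode("overwrite").parquet("s3://example-bucket/curated/daily_orders/")
```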
Desired Skills & Experience:
• At least 5 years of industry experience
• Working knowledge of any of the following - AWS Big Data Stack (S3, Redshift, Athena, Glue, etc.), GCP Big Data Stack (Cloud Storage, Workflow, Dataflow, Cloud Functions, BigQuery, Pub/Sub, etc.).
• Working knowledge of traditional enterprise data warehouse architectures and migrating them to the Cloud.
• Experience with data visualization tools (Tableau, Power BI, etc.)
• Experience with Jira, Azure DevOps, etc.
How will DataGrokr support you in your growth:
• You will be groomed and mentored by senior leaders to take on leadership positions in the company
• You will be actively encouraged to attain certifications, lead technical workshops and conduct meetups to grow your own technology acumen and personal brand
• You will work in an open culture that promotes commitment over compliance, individual responsibility over rules and bringing out the best in everyone.
- Big data developer with 8+ years of professional IT experience and expertise in Hadoop ecosystem components for ingestion, data modeling, querying, processing, storage, analysis, and data integration, and in implementing enterprise-level big data systems.
- A skilled developer with strong problem solving, debugging and analytical capabilities, who actively engages in understanding customer requirements.
- Expertise in Apache Hadoop ecosystem components like Spark, Hadoop Distributed File System (HDFS), MapReduce, Hive, Sqoop, HBase, ZooKeeper, YARN, Flume, Pig, NiFi, Scala and Oozie.
- Hands-on experience in creating real-time data streaming solutions using Apache Spark Core, Spark SQL and DataFrames, Kafka, Spark Streaming and Apache Storm (a streaming sketch follows this list).
- Excellent knowledge of Hadoop architecture and the daemons of Hadoop clusters, which include the NameNode, DataNode, ResourceManager, NodeManager and JobHistory Server.
- Worked on both Cloudera and Hortonworks Hadoop distributions. Experience in managing Hadoop clusters using the Cloudera Manager tool.
- Well versed in the installation, configuration and management of big data and the underlying infrastructure of a Hadoop cluster.
- Hands-on experience in coding MapReduce/YARN programs using Java, Scala and Python for analyzing big data.
- Exposure to Cloudera development environment and management using Cloudera Manager.
- Extensively worked on Spark with Scala on clusters for analytics; installed Spark on top of Hadoop and built advanced analytical applications using Spark with Hive and SQL/Oracle.
- Implemented Spark with Python, using DataFrames and the Spark SQL API for faster data processing; imported data from different sources into HDFS using Sqoop and performed transformations using Hive and MapReduce before loading the data into HDFS.
- Used Spark Data Frames API over Cloudera platform to perform analytics on Hive data.
- Hands-on experience with Spark MLlib, used for predictive intelligence and customer segmentation, including within Spark Streaming applications.
- Experience in using Flume to load log files into HDFS and Oozie for workflow design and scheduling.
- Experience in optimizing MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
- Worked on creating data pipelines for ingestion and aggregation events, loading consumer response data into Hive external tables in HDFS to serve as a feed for Tableau dashboards.
- Hands on experience in using Sqoop to import data into HDFS from RDBMS and vice-versa.
- In-depth Understanding of Oozie to schedule all Hive/Sqoop/HBase jobs.
- Hands on expertise in real time analytics with Apache Spark.
- Experience in converting Hive/SQL queries into RDD transformations using Apache Spark, Scala and Python.
- Extensive experience in working with different ETL tool environments like SSIS, Informatica and reporting tool environments like SQL Server Reporting Services (SSRS).
- Experience with the Microsoft cloud and with setting up clusters on Amazon EC2 and S3, including automating the setup and extension of clusters in the AWS cloud.
- Extensively worked on Spark with Python on clusters for analytics; installed Spark on top of Hadoop and built advanced analytical applications using Spark with Hive and SQL.
- Strong experience and knowledge of real time data analytics using Spark Streaming, Kafka and Flume.
- Knowledge in installation, configuration, supporting and managing Hadoop Clusters using Apache, Cloudera (CDH3, CDH4) distributions and on Amazon web services (AWS).
- Experienced in writing ad hoc queries using Cloudera Impala, including Impala analytical functions.
- Experience in creating Data frames using PySpark and performing operation on the Data frames using Python.
- In depth understanding/knowledge of Hadoop Architecture and various components such as HDFS and MapReduce Programming Paradigm, High Availability and YARN architecture.
- Established multiple connections to different Redshift clusters (Bank Prod, Card Prod, SBBDA Cluster) and provided access for pulling the information needed for analysis.
- Generated various kinds of knowledge reports using Power BI based on Business specification.
- Developed interactive Tableau dashboards to provide a clear understanding of industry specific KPIs using quick filters and parameters to handle them more efficiently.
- Experienced in projects using Jira, testing, and the Maven and Jenkins build tools.
- Experienced in designing, building, deploying, and utilizing almost all of the AWS stack (including EC2 and S3), focusing on high availability, fault tolerance, and auto-scaling.
- Good experience with use-case development, with Software methodologies like Agile and Waterfall.
- Working knowledge of Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
- Good working experience in importing data using Sqoop and SFTP from various sources like RDBMS, Teradata, mainframes, Oracle and Netezza to HDFS, and performing transformations on it using Hive, Pig and Spark.
- Extensive experience in Text Analytics, developing different Statistical Machine Learning solutions to various business problems and generating data visualizations using Python and R.
- Proficient in NoSQL databases including HBase, Cassandra, MongoDB and its integration with Hadoop cluster.
- Hands-on experience in Hadoop big data technology, working with MapReduce, Pig and Hive as analysis tools, and Sqoop and Flume as data import/export tools.
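To illustrate the kind of real-time streaming work described above, here is a minimal Spark Structured Streaming sketch that reads JSON events from Kafka and lands them in HDFS. The broker, topic, schema and paths are hypothetical, and the job assumes the spark-sql-kafka package is available on the cluster.

```python
# Streaming sketch (hypothetical topic, broker, schema and paths): read JSON
# events from Kafka with Structured Streaming and write them to HDFS as Parquet.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")   # hypothetical broker
    .option("subscribe", "consumer-responses")            # hypothetical topic
    .load()
)

# Kafka delivers bytes; parse the value column as JSON using the schema above.
events = (
    raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

query = (
    events.writeStream.format("parquet")
    .option("path", "hdfs:///data/consumer_responses/")            # hypothetical path
    .option("checkpointLocation", "hdfs:///checkpoints/consumer_responses/")
    .start()
)
query.awaitTermination()
```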
Responsibilities:
- Be the analytical expert in Kaleidofin, managing ambiguous problems by using data to execute sophisticated quantitative modeling and deliver actionable insights.
- Develop comprehensive skills including project management, business judgment, analytical problem solving and technical depth.
- Become an expert on data and trends, both internal and external to Kaleidofin.
- Communicate key state of the business metrics and develop dashboards to enable teams to understand business metrics independently.
- Collaborate with stakeholders across teams to drive data analysis for key business questions, communicate insights and drive the planning process with company executives.
- Automate scheduling and distribution of reports and support auditing and value realization.
- Partner with enterprise architects to define and ensure proposed Business Intelligence solutions adhere to an enterprise reference architecture.
- Design robust data-centric solutions and architecture that incorporates technology and strong BI solutions to scale up and eliminate repetitive tasks.
- Experience leading development efforts through all phases of SDLC.
- 2+ years "hands-on" experience designing Analytics and Business Intelligence solutions.
- Experience with Quicksight, PowerBI, Tableau and Qlik is a plus.
- Hands on experience in SQL, data management, and scripting (preferably Python).
- Strong data visualisation design skills, data modeling and inference skills.
- Hands-on and experience in managing small teams.
- Financial services experience preferred, but not mandatory.
- Strong knowledge of architectural principles, tools, frameworks, and best practices.
- Excellent communication and presentation skills to communicate and collaborate with all levels of the organisation.
- Preferred candidates with less than 30 days notice period.
XpressBees – a logistics company started in 2015 – is amongst the fastest growing companies of its sector. While we started off rather humbly in the space of ecommerce B2C logistics, the last 5 years have seen us steadily progress towards expanding our presence. Our vision to evolve into a strong full-service logistics organization reflects itself in our new lines of business like 3PL, B2B Xpress and cross border operations. Our strong domain expertise and constant focus on meaningful innovation have helped us rapidly evolve as the most trusted logistics partner of India. We have progressively carved our way towards best-in-class technology platforms, an extensive network reach, and a seamless last mile management system. While on this aggressive growth path, we seek to become the one-stop-shop for end-to-end logistics solutions. Our big focus areas for the very near future include strengthening our presence as service providers of choice and leveraging the power of technology to improve efficiencies for our clients.
Job Profile
As a Lead Data Engineer in the Data Platform Team at XpressBees, you will build the data platform and infrastructure to support high quality and agile decision-making in our supply chain and logistics workflows.
You will define the way we collect and operationalize data (structured / unstructured), and build production pipelines for our machine learning models and (RT, NRT, Batch) reporting & dashboarding requirements. As a Senior Data Engineer in the XB Data Platform Team, you will use your experience with modern cloud and data frameworks to build products (with storage and serving systems) that drive optimisation and resilience in the supply chain via data visibility, intelligent decision making, insights, anomaly detection and prediction.
What You Will Do
• Design and develop the data platform and data pipelines for reporting, dashboarding and machine learning models. These pipelines would productionize machine learning models and integrate with agent review tools.
• Meet data completeness, correctness and freshness requirements.
• Evaluate and identify the data store and data streaming technology choices.
• Lead the design of the logical model and implement the physical model to support business needs. Come up with logical and physical database designs across platforms (MPP, MR, Hive/Pig) that are optimal for different use cases (structured/semi-structured). Envision and implement the optimal data modelling, physical design and performance optimization approach required for the problem (a brief physical-design sketch follows this list).
• Support your colleagues by reviewing code and designs.
• Diagnose and solve issues in our existing data pipelines, and envision and build their successors.
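A minimal physical-design sketch for the Hive use case referenced above: a partitioned, Parquet-backed external table created through Spark SQL. The table name, columns and location are hypothetical.

```python
# Physical-design sketch (hypothetical table, columns and location): a
# partitioned, Parquet-backed Hive external table for shipment events.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("ddl-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS shipment_events (
        shipment_id   STRING,
        status        STRING,
        event_ts      TIMESTAMP
    )
    PARTITIONED BY (event_date DATE)      -- partition pruning for date-bounded queries
    STORED AS PARQUET
    LOCATION 'hdfs:///warehouse/shipment_events/'
""")
```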
Qualifications & Experience relevant for the role
• A bachelor's degree in Computer Science or a related field with 6 to 9 years of technology experience.
• Knowledge of relational and NoSQL data stores, stream processing and micro-batching to make technology and design choices.
• Strong experience in system integration, application development, ETL and data-platform projects. Talented across technologies used in the enterprise space.
• Software development experience using:
• Expertise in relational and dimensional modelling
• Exposure across all SDLC processes
• Experience in cloud architecture (AWS)
• Proven track record in keeping existing technical skills current and developing new ones, so that you can make strong contributions to deep architecture discussions around systems and applications in the cloud (AWS).
• Characteristics of a forward thinker and self-starter who flourishes with new challenges and adapts quickly to new knowledge
• Ability to work with cross-functional teams of consulting professionals across multiple projects.
• Knack for helping an organization understand application architectures and integration approaches, architect advanced cloud-based solutions, and help launch the build-out of those systems
• Passion for educating, training, designing, and building end-to-end systems.
- Understand the business drivers and analytical use-cases.
- Translate use cases into data models and into descriptive, analytical, predictive, and engineering outcomes.
- Explore new technologies and learn new techniques to solve business problems creatively
- Think big and drive the strategy for better data quality for customers.
- Become the voice of the business within engineering, and of engineering within the business and with customers.
- Collaborate with many teams, both engineering and business, to build better data products and services.
- Deliver projects collaboratively with the team and provide timely updates to customers.
What we're looking for :
- Hands-on experience in data modeling, data visualization, and pipeline design and development
- Hands-on exposure to machine learning concepts like supervised learning, unsupervised learning, RNNs and DNNs (a brief supervised-learning sketch follows this list).
- Prior experience working with business stakeholders, in an enterprise space is a plus
- Great communication skills. You should be able to directly communicate with senior business leaders, embed yourself with business teams, and present solutions to business stakeholders
- Experience in working independently and driving projects end to end, strong analytical skills.
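For context, here is a minimal supervised-learning sketch using scikit-learn on synthetic data; it is illustrative only and not tied to any particular business problem.

```python
# Minimal supervised-learning sketch: train and evaluate a classifier on
# synthetic data (not a production model).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Generate a toy binary-classification dataset.
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```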
Job Description
The applicant must have a minimum of 5 years of hands-on IT experience, working on a full software lifecycle in Agile mode.
Good to have experience in data modeling and/or systems architecture.
Responsibilities will include technical analysis, design, development, and performing enhancements.
You will participate in all/most of the following activities:
- Working with business analysts and other project leads to understand requirements.
- Modeling and implementing database schemas in DB2 UDB or other relational databases.
- Designing, developing, and maintaining data processing using Python, DB2, Greenplum, Autosys and other technologies
Skills /Expertise Required :
Work experience in developing large-volume databases (DB2/Greenplum/Oracle/Sybase).
Good experience in writing stored procedures, integration of database processing, tuning and optimizing database queries.
Strong knowledge of table partitions, high-performance loading, and data processing (a brief partitioning sketch follows this list).
Good to have hands-on experience working with Perl or Python.
Hands on development using Spark / KDB / Greenplum platform will be a strong plus.
Designing, developing, maintaining and supporting Data Extract, Transform and Load (ETL) software using Informatica, Shell Scripts, DB2 UDB and Autosys.
Coming up with system architecture/re-design proposals for greater efficiency and ease of maintenance and developing software to turn proposals into implementations.
Need to work with business analysts and other project leads to understand requirements.
Strong collaboration and communication skills
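As a hedged sketch of the table partitioning and high-volume loading mentioned above, the snippet below creates a range-partitioned Greenplum table through psycopg2. The connection details, table and columns are hypothetical.

```python
# Partitioning sketch (hypothetical connection, table and columns): create a
# range-partitioned Greenplum table suited to high-volume, date-bounded loads.
import psycopg2

conn = psycopg2.connect(
    host="gp-master.example.com",   # hypothetical Greenplum master host
    dbname="analytics",
    user="etl_user",
    password="***",
)

with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS trade_events (
            trade_id    BIGINT,
            trade_date  DATE,
            amount      NUMERIC
        )
        DISTRIBUTED BY (trade_id)               -- spread rows across segments
        PARTITION BY RANGE (trade_date)
        (
            START (DATE '2024-01-01') INCLUSIVE
            END   (DATE '2025-01-01') EXCLUSIVE
            EVERY (INTERVAL '1 month')          -- one partition per month
        )
    """)
```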
o Strong Python development skills, with 7+ years of experience with SQL.
o A bachelor's or master's degree in Computer Science or related areas
o 5+ years of experience in data integration and pipeline development
o Experience in implementing Databricks Delta Lake and data lakes
o Expertise designing and implementing data pipelines using modern data engineering approaches and tools: SQL, Python, Delta Lake, Databricks, Snowflake, Spark (a brief pipeline sketch follows this list)
o Experience in working with multiple file formats (Parquet, Avro, Delta Lake) and APIs
o Experience with AWS cloud data integration with S3.
o Hands-on development experience with Python and/or Scala.
o Experience with SQL and NoSQL databases.
o Experience in using data modeling techniques and tools (focused on Dimensional design)
o Experience with micro-service architecture using Docker and Kubernetes
o Have experience working with one or more of the public cloud providers i.e. AWS, Azure or GCP
o Experience in effectively presenting and summarizing complex data to diverse audiences through visualizations and other means
o Excellent verbal and written communications skills and strong leadership capabilities
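As a rough sketch of the Delta Lake pipeline work listed above, the following PySpark snippet lands Parquet data from S3, adds an ingest date, and appends it to a partitioned Delta table. The S3 paths and columns are hypothetical, and the snippet assumes a Spark environment with Delta Lake configured (e.g. Databricks).

```python
# Delta pipeline sketch (hypothetical S3 paths and columns): read Parquet from
# a landing zone, stamp an ingest date, and append to a partitioned Delta table.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("delta-sketch").getOrCreate()

raw = spark.read.parquet("s3://example-bucket/landing/payments/")   # hypothetical source

curated = raw.withColumn("ingest_date", F.current_date())

(
    curated.write.format("delta")
    .mode("append")
    .partitionBy("ingest_date")                                      # partition for pruning
    .save("s3://example-bucket/delta/payments/")                     # hypothetical Delta path
)
```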
Skills:
ML
Modelling
Python
SQL
Azure Data Lake, Data Factory, Databricks, Delta Lake
- Participate in planning, implementation of solutions, and transformation programs from legacy system to a cloud-based system
- Work with the team on Analysis, High level and low-level design for solutions using ETL or ELT based solutions and DB services in RDS
- Work closely with the architect and engineers to design systems that effectively reflect business needs, security requirements, and service level requirements
- Own deliverables related to design and implementation
- Own Sprint tasks and drive the team towards the goal while understanding the change and release process defined by the organization.
- Excellent communication skills, particularly in presenting complex findings in a way that appeals to audiences at various levels of the organization
- Ability to integrate research and best practices into problem avoidance and continuous improvement
- Must be able to perform as an effective member in a team-oriented environment, maintain a positive attitude, and achieve desired results while working with minimal supervision
Basic Qualifications:
- 5+ years of technical work experience in the implementation of complex, large-scale, enterprise-wide projects including analysis, design, core development, and delivery
- 3+ years of experience with expertise in Informatica ETL, Informatica PowerCenter, and Informatica Data Quality
- Experience with the Informatica MDM tool is good to have
- Should be able to understand the scope of the work and ask for clarifications
- Should have advanced SQL skills, including complex PL/SQL coding skills (a brief SQL sketch follows this list)
- Knowledge of Agile is a plus
- Well versed with SOAP web services and REST APIs.
- Hands-on development using Java would be a plus.
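A small illustration of the advanced SQL expectation above: ranking rows per customer with a window function. It runs against an in-memory SQLite database purely to stay self-contained (SQLite 3.25+); a production target such as Oracle with PL/SQL would of course differ.

```python
# Analytic SQL sketch on toy data: rank each customer's orders by amount
# using a window function (in-memory SQLite, for self-containment only).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INTEGER, order_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 101, 50.0), (1, 102, 75.0), (2, 201, 20.0);
""")

rows = conn.execute("""
    SELECT customer_id,
           order_id,
           amount,
           RANK() OVER (PARTITION BY customer_id ORDER BY amount DESC) AS amount_rank
    FROM orders
""").fetchall()

for row in rows:
    print(row)
```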