THE ROLE: Sr. Cloud Data Infrastructure Engineer
As a Sr. Cloud Data Infrastructure Engineer with Intuitive, you will be responsible for building new data pipelines and converting legacy pipelines to modern cloud environments in support of the analytics and data science initiatives across our enterprise customers. You will work closely with SMEs in Data Engineering and Cloud Engineering to create solutions and extend Intuitive's DataOps engineering projects and initiatives. The Sr. Cloud Data Infrastructure Engineer plays a central, critical role in establishing DataOps/DataX data logistics and management: building data pipelines, enforcing best practices, owning the construction of complex and performant Data Lake environments, and working closely with Cloud Infrastructure Architects and DevSecOps automation teams. The Sr. Cloud Data Infrastructure Engineer is the main point of contact for all things related to Data Lake formation and data at scale. In this role, we expect our DataOps leaders to be obsessed with data and with providing insights that help our end customers.
ROLES & RESPONSIBILITIES:
- Design, develop, implement, and tune large-scale distributed systems and pipelines that process large volumes of data, focusing on scalability, low latency, and fault tolerance in every system built.
- Develop scalable and reusable frameworks for ingesting large volumes of data from multiple sources.
- Modern Data Orchestration engineering - query tuning, performance tuning, troubleshooting, and debugging big data solutions.
- Provides technical leadership, fosters a team environment, and provides mentorship and feedback to technical resources.
- Deep understanding of ETL/ELT design methodologies, patterns, personas, strategy, and tactics for complex data transformations.
- Data processing/transformation using various technologies such as Spark and cloud services.
- Understand current data engineering pipelines built with legacy SAS tools and convert them to modern pipelines.
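The reusable ingestion frameworks mentioned above can take many shapes; a minimal Python sketch of one common pattern is below. All names (`READERS`, `register_reader`, `csv_orders`) are illustrative, not part of any actual Intuitive codebase: each source registers a reader, and the core pipeline applies shared cleansing without knowing about individual sources.

```python
from typing import Callable, Iterable

# Hypothetical registry mapping a source name to a reader that yields records.
READERS: dict[str, Callable[[], Iterable[dict]]] = {}

def register_reader(name: str):
    """Decorator that registers a reader so new sources plug in
    without touching the core pipeline."""
    def wrap(fn):
        READERS[name] = fn
        return fn
    return wrap

@register_reader("csv_orders")
def read_csv_orders():
    # Stand-in for reading from a file, queue, or API.
    yield {"order_id": 1, "amount": "19.99"}
    yield {"order_id": 2, "amount": "5.00"}

def ingest(source: str) -> list[dict]:
    """Shared ingestion path: read, apply common cleansing, return rows."""
    rows = []
    for rec in READERS[source]():
        rec["amount"] = float(rec["amount"])  # common type-coercion step
        rows.append(rec)
    return rows
```

Adding a new source is then one decorated function, which is what makes the framework reusable rather than a per-source script.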
Data Infrastructure Engineer Strategy Objectives: End-to-End Strategy
Define how data is acquired, stored, processed, distributed, and consumed.
Collaborate and share responsibility across disciplines as partners in delivery, progressing our maturity model in the end-to-end data practice.
- Understanding and experience with modern cloud data orchestration and engineering for one or more of the following cloud providers - AWS, Azure, GCP.
- Leading multiple engagements to design and develop data logistics patterns that support data solutions, using data modeling techniques (such as file-based, normalized or denormalized, star schemas, schema-on-read, Data Vault, graphs) for mixed workloads such as OLTP, OLAP, and streaming, in any format (structured, semi-structured, unstructured).
- Applying leadership and proven experience with architecting and designing data implementation patterns and engineered solutions using native cloud capabilities that span data ingestion & integration (ingress and egress), data storage (raw & cleansed), data prep & processing, master & reference data management, data virtualization & semantic layer, data consumption & visualization.
- Implementing cloud data solutions in the context of business applications, cost optimization, the client's strategic needs, and future growth goals as they relate to becoming a 'data-driven' organization.
- Applying and creating leading practices that support high availability, scalable, process and storage intensive solutions architectures to data integration/migration, analytics and insights, AI, and ML requirements.
- Applying leadership and review to create high quality detailed documentation related to cloud data Engineering.
- Implementing cloud data orchestration and data integration patterns (AWS Glue, Azure Data Factory, Event Hub, Databricks, etc.), storage and processing (Redshift, Azure Synapse, BigQuery, Snowflake)
- A certification in AWS, Azure, or GCP data engineering or migration is a big plus.
- 10+ years' experience as a data engineer.
- Must have 5+ years implementing data engineering solutions with multiple cloud providers and toolsets.
- This is a hands-on role building data pipelines using cloud-native and partner solutions; hands-on technical experience with data at scale is required.
- Must have deep expertise in one of the programming languages for data processes (Python, Scala). Experience with Python, PySpark, Hadoop, Hive and/or Spark to write data pipelines and data processing layers.
- Must have worked with multiple database technologies and patterns. Good SQL experience for writing complex SQL transformations.
- Performance tuning of Spark SQL running on S3/Data Lake/Delta Lake storage, and strong knowledge of Databricks and cluster configurations.
- Nice to have: Databricks administration experience, including the security and infrastructure features of Databricks.
- Experience with Development Tools for CI/CD, Unit and Integration testing, Automation and Orchestration
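The unit-testing requirement above is easiest to meet when pipeline transformations are plain functions. The sketch below is a hypothetical example, not code from this role: `cleanse` and its test are illustrative names showing how a transformation step can be asserted against in CI before it runs on real data.

```python
def cleanse(rows):
    """Drop rows missing a primary key and normalize country codes."""
    out = []
    for r in rows:
        if r.get("id") is None:
            continue  # reject records without a key
        # Normalize: trim, uppercase, and default empty values.
        r = dict(r, country=r.get("country", "").strip().upper() or "UNKNOWN")
        out.append(r)
    return out

def test_cleanse_drops_null_keys_and_normalizes():
    raw = [{"id": 1, "country": " us "}, {"id": None, "country": "de"}]
    got = cleanse(raw)
    assert [r["id"] for r in got] == [1]
    assert got[0]["country"] == "US"

test_cleanse_drops_null_keys_and_normalizes()
```

The same function can later be wrapped in a Spark UDF or mapped over a DataFrame, so the tested logic and the deployed logic stay identical.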
Power BI Developer (Azure Developer)
Senior visualization engineer with an understanding of Azure Data Factory & Databricks, to develop and deliver solutions that enable delivery of information to audiences in support of key business processes.
Ensure code and design quality through execution of test plans and assist in development of standards & guidelines working closely with internal and external design, business and technical counterparts.
- Strong grasp of data visualization design centered on the business user, and a knack for communicating insights visually.
- Ability to produce any of the available charting methods with drill-down options and action-based reporting, including use of the right graphs for the underlying data with company themes and objects.
- Publishing reports & dashboards on reporting server and providing role-based access to users.
- Ability to create wireframes on any tool for communicating the reporting design.
- Creation of ad-hoc reports & dashboards to visually communicate data hub metrics (metadata information) for top management understanding.
- Should be able to handle huge volumes of data from databases such as SQL Server, Synapse, or Delta Lake, or from flat files, and create high-performance dashboards.
- Should be proficient in Power BI development.
- Expertise in 2 or more BI (Visualization) tools in building reports and dashboards.
- Understanding of Azure components such as Azure Data Factory, Data Lake Store, SQL Database, and Azure Databricks
- Strong knowledge of SQL
- Must have worked in full life-cycle development from functional design to deployment
- Intermediate ability to format, process, and transform data
- Should have working knowledge of Git and SVN
- Good experience establishing connections with heterogeneous sources such as Hadoop, Hive, Amazon, Azure, Salesforce, SAP, HANA, APIs, and various databases
- Basic understanding of data modelling and ability to combine data from multiple sources to create integrated reports
- Bachelor's degree in Computer Science or Technology
- Proven success in contributing to a team-oriented environment
Experience in AWS Glue
Experience in Apache Parquet
Proficient in AWS S3 and data lake
Knowledge of Snowflake
Understanding of file-based ingestion best practices.
Scripting languages - Python & PySpark
Create and manage cloud resources in AWS
Data ingestion from different data sources that expose data using different technologies, such as RDBMS, flat files, streams, and time-series data from various proprietary systems. Implement data ingestion and processing with the help of Big Data technologies.
Data processing/transformation using various technologies such as Spark and cloud services. You will need to understand your part of the business logic and implement it using the language supported by the base data platform.
Develop automated data quality checks to make sure the right data enters the platform, and verify the results of calculations.
Develop an infrastructure to collect, transform, combine and publish/distribute customer data.
Define process improvement opportunities to optimize data collection, insights and displays.
Ensure data and results are accessible, scalable, efficient, accurate, complete and flexible
Identify and interpret trends and patterns from complex data sets
Construct a framework utilizing data visualization tools and techniques to present consolidated analytical and actionable results to relevant stakeholders.
Key participant in regular Scrum ceremonies with the agile teams
Proficient at developing queries, writing reports and presenting findings
Mentor junior members and bring best industry practices.
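The automated data quality checks called for above usually boil down to validating a batch before it is loaded. Here is a minimal Python sketch under assumed requirements (field presence and a minimum row count); `quality_check` and its fields are hypothetical names, not a specific library's API.

```python
def quality_check(rows, required, row_count_min=1):
    """Validate a batch before it enters the platform: required fields
    must be present and non-null, and the batch must meet a minimum
    row count. Returns a list of human-readable error strings."""
    errors = []
    if len(rows) < row_count_min:
        errors.append(f"expected >= {row_count_min} rows, got {len(rows)}")
    for i, row in enumerate(rows):
        for field in required:
            if row.get(field) is None:
                errors.append(f"row {i}: missing '{field}'")
    return errors

batch = [{"id": 1, "ts": "2024-01-01"}, {"id": 2, "ts": None}]
issues = quality_check(batch, required=("id", "ts"))
# issues -> ["row 1: missing 'ts'"]
```

In a real pipeline the returned errors would feed an alerting channel and gate the load, rather than just being returned.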
5-7+ years' experience as a data engineer in consumer finance or an equivalent industry (consumer loans, collections, servicing, optional products, and insurance sales)
Strong background in math, statistics, computer science, data science or related discipline
Advanced knowledge of at least one language: Java, Scala, Python, C#
Production experience with: HDFS, YARN, Hive, Spark, Kafka, Oozie / Airflow, Amazon Web Services (AWS), Docker / Kubernetes, Snowflake
Data mining/programming tools (e.g. SAS, SQL, R, Python)
Database technologies (e.g. PostgreSQL, Redshift, Snowflake, and Greenplum)
Data visualization (e.g. Tableau, Looker, MicroStrategy)
Comfortable learning about and deploying new technologies and tools.
Organizational skills and the ability to handle multiple projects and priorities simultaneously and meet established deadlines.
Good written and oral communication skills and ability to present results to non-technical audiences
Knowledge of business intelligence and analytical tools, technologies and techniques.
Familiarity and experience in the following is a plus:
Kafka Streaming / Kafka Connect
Cassandra / MongoDB
CI/CD: Jenkins, GitLab, Jira, Confluence, and other related tools
Who we are:
SkyPoint’s mission is to bring people and data together.
We are the industry's first Modern Data Stack Platform with built-in data lakehouse, account 360, customer 360, entity resolution, data privacy vault, ELT / Reverse ETL, data integration, privacy compliance automation, data governance, analytics, and managed services for organizations in several industries including healthcare, life sciences, senior living, retail, hospitality, business services, and financial services.
We follow a flexible culture founded on awareness, trust, collaboration, ethics, strong commitment, and customer fascination, which are the building pillars of SkyPoint Cloud.
We believe in practicing the ideal behaviors at SkyPoint Cloud: treating people fairly, a fun work environment, the 4 E's (Embrace, Engage, Encourage & Empower), open communication, curiosity, and passion.
Who we want:
SkyPoint Cloud is looking for ambitious, independent engineers who want to have a significant impact at a fast-growing company. You will work on our core data pipeline and the integrations that bring in data from many sources we support. We are looking for people who can understand the key values that make our product great and implement those values in the many small decisions you make every day as a developer.
As a Data Engineer at SkyPoint:
- You will work with Python, PySpark, Azure Databricks, VS Code, REST APIs, Azure Durable Functions, Cosmos DB, Serverless and Kubernetes container-based microservices and interact with various Delta Lakehouse and NoSQL databases.
- You will process the data into clean, unified, incremental, automated updates via Azure Durable Functions, Azure Data Factory, Delta Lake, and Spark.
- You will analyze customer data from various connectors, generalize the PII attributes involved in our product’s state-of-the-art Stitch process, and create unified customer profiles.
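To make the "stitch into unified customer profiles" idea concrete, here is a deliberately tiny Python sketch. It is not SkyPoint's actual Stitch process: real entity resolution matches on many PII attributes with fuzzy logic, while this toy merges records that share a normalized email, and the merge rule (last non-null value wins) is an assumption.

```python
from collections import defaultdict

def unify_profiles(records):
    """Toy 'stitch': merge customer records that share a normalized
    email address into one unified profile per customer."""
    by_key = defaultdict(dict)
    for rec in records:
        key = rec["email"].strip().lower()   # normalize the match key
        merged = by_key[key]
        for k, v in rec.items():
            if v is not None:
                merged[k] = v                # last non-null write wins
        merged["email"] = key
    return list(by_key.values())

crm = {"email": "Ana@Example.com", "name": "Ana", "phone": None}
web = {"email": "ana@example.com ", "name": None, "phone": "555-0100"}
profiles = unify_profiles([crm, web])
# -> one profile carrying the name from CRM and the phone from web
```

The point of the example is the output shape: two partial source records collapse into one profile that is more complete than either input.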
Primary Duties & Responsibilities:
- Bachelor’s degree, preferably in Data Engineering or Computer Science with 6+ years of experience working on SaaS products.
- Experience working with languages such as Python and Java, and technologies such as serverless and containers.
- Strong technical and problem-solving skills, with recent hands-on experience in Databricks.
- Experience in reliable distributed systems, with an emphasis on high-volume data management within the enterprise and/or web-scale products and platforms that operate under strict SLAs.
- Broad technical knowledge which encompasses Software development and automation. Experience with the use of a wide array of algorithms and data structures.
- Knowledge of working with Azure Functions, Azure Data Lake, Azure Data Factory, Azure Databricks, Spark, Azure DevOps, and Delta Lake.
- Outstanding track record of liaising directly with global clients, and willingness to work flexible hours to connect with the virtual team, including daily stand-up calls and cross-regional collaboration.
Required Qualifications and Skills:
- BS / BE / MS in Computer Science & Engineering and professional work experience.
- Most recent work experience MUST include work on Python (Programming Language) and Databricks.
- Good to have: knowledge of Azure Durable Functions, Azure SQL, Cosmos DB, Azure Data Factory, Delta Lakehouse, PySpark, NoSQL databases, serverless, and Kubernetes container-based microservices.
- Excellent verbal and written communication skills.
- Exceptional track record of exposure to global clients.
Understand long-term and short-term business requirements and match them precisely with the capabilities of the many distributed storage and computing technologies available in the ecosystem.
Create complex data processing pipelines
Design scalable implementations of the models developed by our Data Scientist.
Deploy data pipelines in production systems based on CI/CD practices
Create and maintain clear documentation on data models and schemas
Troubleshoot and remediate data quality issues raised by pipeline alerts or downstream consumers
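One lightweight way to keep the data-model documentation mentioned above from drifting out of date is to generate it from a machine-readable schema. The sketch below is an assumed approach with illustrative table and column names, not a prescribed tool.

```python
# Hypothetical schema kept as data; the human-readable doc is derived
# from it, so code and documentation cannot disagree.
SCHEMA = {
    "orders": {
        "order_id": "BIGINT, primary key",
        "customer_id": "BIGINT, FK -> customers.customer_id",
        "amount": "DECIMAL(10,2)",
    },
}

def schema_to_markdown(schema):
    """Render the schema dict as a simple Markdown reference page."""
    lines = []
    for table, cols in schema.items():
        lines.append(f"## {table}")
        for name, desc in cols.items():
            lines.append(f"- `{name}`: {desc}")
    return "\n".join(lines)

doc = schema_to_markdown(SCHEMA)
```

Regenerating `doc` in CI whenever `SCHEMA` changes keeps the published reference current without a separate editing step.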
- Project Role : AWS Glue Application Developer
- Project Role Description : Design, build, and configure applications to meet business process and application requirements.
- Work Experience :4-6 years
- Work location : Off Shore/On-Site
- Must Have Skills : AWS, Glue, DMS, Data integrations and Data Ops
Job Requirements :
- Key Responsibilities : 5 years of work experience with ETL, data modelling, and data architecture. Proficient in ETL optimization, designing, coding, and tuning big data processes using PySpark. Extensive experience building data platforms on AWS using core AWS services (Step Functions, EMR, Lambda, Glue, Athena, Redshift, Postgres, RDS, etc.) and designing/developing data engineering solutions. Orchestrate using Airflow.
- Technical Experience : Hands-on experience developing a data platform and its components: data lake, cloud data warehouse, APIs, batch and streaming data pipelines. Experience building data pipelines and applications to stream and process large datasets at low latency.
- Enhancements, new development, defect resolution and production support of Big data ETL development using AWS native services.
- Create data pipeline architecture by designing and implementing data ingestion solutions.
- Integrate data sets using AWS services such as Glue, Lambda functions, and Airflow
- Design and optimize data models on AWS Cloud using AWS data stores such as Redshift, RDS, S3, Athena
- Author ETL processes using Python, Pyspark
- Build Redshift Spectrum direct transformations and data modelling using data in S3
- ETL process monitoring using CloudWatch events
- You will be working in collaboration with other teams; good communication is a must.
- Must have experience using AWS service APIs, the AWS CLI, and SDKs
- Professional Attributes : Experience operating very large data warehouses or data lakes. Expert-level skills in writing and optimizing SQL. Extensive, real-world experience designing technology components for enterprise solutions and defining solution architectures and reference architectures with a focus on cloud technology.
- Must have 4+ years of big data ETL experience using Python, S3, Lambda, DynamoDB, Athena, and Glue in an AWS environment
- Expertise in S3, RDS, Redshift, Kinesis, EC2 clusters highly desired
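The Glue and PySpark specifics above only run inside AWS, so a common practice is to keep the record-level transform as a plain, easily tested function and wire it into the Glue job afterwards. The sketch below illustrates that separation; the nested-order shape and all field names are assumptions, not a real schema.

```python
def flatten_order(raw: dict) -> dict:
    """Flatten a nested source record into the columnar shape a
    Redshift/Athena table would expect. Pure function: no Spark or
    AWS dependency, so it can be unit-tested in any CI job."""
    return {
        "order_id": raw["id"],
        "customer_id": raw["customer"]["id"],
        "city": raw["customer"]["address"]["city"],
        # store money as integer cents to avoid float drift downstream
        "amount_cents": int(round(raw["amount"] * 100)),
    }

raw = {"id": 7,
       "customer": {"id": 3, "address": {"city": "Austin"}},
       "amount": 12.5}
row = flatten_order(raw)
```

Inside a Glue PySpark script, the same function would typically be applied via a map over the job's DynamicFrame or DataFrame rows.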
LodgIQ is led by a team of experienced hospitality technology experts, data scientists and product domain experts. Seed funded by Highgate Ventures, a venture capital platform focused on early stage technology investments in the hospitality industry and Trilantic Capital Partners, a global private equity firm, LodgIQ has made a significant investment in advanced machine learning platforms and data science.
Title : Data Scientist
- Apply Data Science and Machine Learning to a REAL-LIFE problem - “Predict Guest Arrivals and Determine Best Prices for Hotels”
- Apply advanced analytics in a BIG Data Environment – AWS, MongoDB, SKLearn
- Help scale up the product in a global offering across 100+ global markets
- Minimum 3 years of experience with advanced data analytic techniques, including data mining, machine learning, statistical analysis, and optimization. Student projects are acceptable.
- At least 1 year of experience with Python / NumPy / Pandas / SciPy / Matplotlib / Scikit-Learn
- Experience in working with massive data sets, including structured and unstructured with at least 1 prior engagement involving data gathering, data cleaning, data mining, and data visualization
- Solid grasp over optimization techniques
- Master's or PhD degree in Business Analytics, Data Science, Statistics, or Mathematics
- Ability to show a track record of solving large, complex problems
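As a toy illustration of the "predict guest arrivals" task described above: the sketch below fits a least-squares line to made-up daily arrival counts and projects the next day. A real model would use scikit-learn with many features (seasonality, events, price); this only shows the basic regression arithmetic.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope,
    intercept) minimizing squared error."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

days = [1, 2, 3, 4, 5]
arrivals = [100, 104, 108, 112, 116]   # perfectly linear toy data
slope, intercept = fit_line(days, arrivals)
forecast_day6 = slope * 6 + intercept  # -> 120.0
```

With perfectly linear input the fit is exact, which makes the example easy to verify by hand.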
- Performs analytics to extract insights from the organization's raw historical data.
- Generates usable training datasets for any/all MV projects with the help of annotators, if needed.
- Analyzes user trends and identifies the biggest bottlenecks in the Hammoq workflow.
- Tests the short- and long-term impact of productized MV models on those trends.
- Skills (mandatory): NumPy, Pandas, Apache Spark, PySpark, ETL.
We are looking for an ETL Developer for a reputed client in Coimbatore (permanent role).
Work Location : Coimbatore
Experience : 4+ Years
- Talend preferred, or strong experience in any of the ETL tools such as Informatica, DataStage, or Talend
- DB preference: Teradata, Oracle, or SQL Server
- Supporting tools: JIRA, SVN
- Insurance P&C and Specialty domain experience is a plus
- Experience in a cloud-based architecture preferred, such as Databricks, Azure Data Lake, Azure Data Factory, etc.
- Strong understanding of ETL fundamentals and solutions. Should be proficient in writing advanced / complex SQL, expertise in performance tuning and optimization of SQL queries required.
- Strong experience in Python/PySpark and Spark SQL
- Experience in troubleshooting data issues, analyzing end to end data pipelines, and working with various teams in resolving issues and solving complex problems.
- Strong experience developing Spark applications using PySpark and SQL for data extraction, transformation, and aggregation from multiple formats for analyzing & transforming the data to uncover insights and actionable intelligence for internal and external use
Bachelor’s degree or equivalent experience
● Knowledge of database fundamentals and fluency in advanced SQL, including concepts such as windowing functions
● Knowledge of popular scripting languages for data processing such as Python, as well as familiarity with common frameworks such as Pandas
● Experience building streaming ETL pipelines with tools such as Apache Flink, Apache Beam, Google Cloud Dataflow, DBT and equivalents
● Experience building batch ETL pipelines with tools such as Apache Airflow, Spark, DBT, or equivalents
● Experience working with messaging systems such as Apache Kafka (and hosted equivalents such as Amazon MSK) or Apache Pulsar
● Familiarity with BI applications such as Tableau, Looker, or Superset
● Hands on coding experience in Java or Scala
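The windowing functions called out in the qualifications above can be demonstrated without any infrastructure using Python's built-in sqlite3 (SQLite 3.25 or newer is assumed, since older versions lack window functions). The table and values are made up for illustration: ROW_NUMBER ranks each customer's orders and SUM ... OVER computes a per-customer running total.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (customer TEXT, amount REAL);
    INSERT INTO orders VALUES ('a', 10), ('a', 20), ('b', 5);
""")
rows = con.execute("""
    SELECT customer,
           amount,
           ROW_NUMBER() OVER (PARTITION BY customer ORDER BY amount) AS rn,
           SUM(amount)  OVER (PARTITION BY customer ORDER BY amount) AS running
    FROM orders
    ORDER BY customer, amount
""").fetchall()
# rows -> [('a', 10.0, 1, 10.0), ('a', 20.0, 2, 30.0), ('b', 5.0, 1, 5.0)]
```

The same `PARTITION BY ... ORDER BY` syntax carries over to warehouse SQL dialects such as Redshift, BigQuery, and Snowflake.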