Big Data Engineer
at Altimetrik
- Expertise in building AWS Data Engineering pipelines with AWS Glue -> Athena -> QuickSight
- Experience in developing lambda functions with AWS Lambda
- Expertise with Spark/PySpark: candidate should be hands-on with PySpark code and able to perform transformations with Spark
- Should be able to code in Python and Scala.
- Snowflake experience will be a plus
Similar jobs
Position Overview: We are seeking a talented Data Engineer with expertise in Power BI to join our team. The ideal candidate will be responsible for designing and implementing data pipelines, as well as developing insightful visualizations and reports using Power BI. Additionally, the candidate should have strong skills in Python, data analytics, PySpark, and Databricks. This role requires a blend of technical expertise, analytical thinking, and effective communication skills.
Key Responsibilities:
- Design, develop, and maintain data pipelines and architectures using PySpark and Databricks.
- Implement ETL processes to extract, transform, and load data from various sources into data warehouses or data lakes.
- Collaborate with data analysts and business stakeholders to understand data requirements and translate them into actionable insights.
- Develop interactive dashboards, reports, and visualizations using Power BI to communicate key metrics and trends.
- Optimize and tune data pipelines for performance, scalability, and reliability.
- Monitor and troubleshoot data infrastructure to ensure data quality, integrity, and availability.
- Implement security measures and best practices to protect sensitive data.
- Stay updated with emerging technologies and best practices in data engineering and data visualization.
- Document processes, workflows, and configurations to maintain a comprehensive knowledge base.
Requirements:
- Bachelor’s degree in Computer Science, Engineering, or related field. (Master’s degree preferred)
- Proven experience as a Data Engineer with expertise in Power BI, Python, PySpark, and Databricks.
- Strong proficiency in Power BI, including data modeling, DAX calculations, and creating interactive reports and dashboards.
- Solid understanding of data analytics concepts and techniques.
- Experience working with Big Data technologies such as Hadoop, Spark, or Kafka.
- Proficiency in programming languages such as Python and SQL.
- Hands-on experience with cloud platforms like AWS, Azure, or Google Cloud.
- Excellent analytical and problem-solving skills with attention to detail.
- Strong communication and collaboration skills to work effectively with cross-functional teams.
- Ability to work independently and manage multiple tasks simultaneously in a fast-paced environment.
Preferred Qualifications:
- Advanced degree in Computer Science, Engineering, or related field.
- Certifications in Power BI or related technologies.
- Experience with data visualization tools other than Power BI (e.g., Tableau, QlikView).
- Knowledge of machine learning concepts and frameworks.
Job Responsibilities
- Design machine learning systems
- Research and implement appropriate ML algorithms and tools
- Develop machine learning applications according to requirements
- Select appropriate datasets and data representation methods
- Run machine learning tests and experiments
- Perform statistical analysis and fine-tuning using test results
- Train and retrain systems when necessary
Requirements for the Job
- Bachelor’s/Master’s/PhD in Computer Science, Mathematics, Statistics or equivalent field, and must have a minimum of 2 years of overall experience in tier one colleges
- Minimum 1 year of experience working as a Data Scientist in deploying ML at scale in production
- Experience in machine learning techniques (e.g. NLP, Computer Vision, BERT, LSTM, etc.) and frameworks (e.g. TensorFlow, PyTorch, Scikit-learn, etc.)
- Working knowledge in deployment of Python systems (using Flask, TensorFlow Serving)
- Previous experience in the following areas will be preferred: Natural Language Processing (NLP) - Using LSTM and BERT; chatbots or dialogue systems, machine translation, comprehension of text, text summarization.
- Computer Vision - Deep Neural Networks/CNNs for object detection and image classification, transfer learning pipeline and object detection/instance segmentation (Mask R-CNN, YOLO, SSD).
- 5+ years of experience in a Data Engineering role on cloud environment
- Must have good experience in Scala/PySpark (preferably in a Databricks environment)
- Extensive experience with Transact-SQL.
- Experience in Databricks/Spark.
- Strong experience in data warehouse projects
- Expertise in database development projects with ETL processes.
- Manage and maintain data engineering pipelines
- Develop batch processing, streaming and integration solutions
- Experienced in building and operationalizing large-scale enterprise data solutions and applications
- Using one or more of Azure data and analytics services in combination with custom solutions
- Azure Data Lake, Azure SQL DW (Synapse), and SQL Database products or equivalent products from other cloud services providers
- In-depth understanding of data management (e.g., permissions, security, and monitoring).
- Cloud repositories, e.g. Azure, GitHub, Git
- Experience in an agile environment (Azure DevOps preferred).
Good to have
- Manage source data access security
- Automate Azure Data Factory pipelines
- Continuous Integration/Continuous Deployment (CI/CD) pipelines, source repositories
- Experience in implementing and maintaining CI/CD pipelines
- Power BI understanding, Delta Lakehouse architecture
- Knowledge of software development best practices.
- Excellent analytical and organization skills.
- Effective working in a team as well as working independently.
- Strong written and verbal communication skills.
- Expertise in database development projects and ETL processes.
Responsibilities
- Understanding the business requirements so as to formulate the problems to solve and restrict the slice of data to be explored.
- Collecting data from various sources.
- Performing cleansing, processing, and validation on the data to be analyzed, in order to ensure its quality.
- Exploring and visualizing data.
- Performing statistical analysis and experiments to derive business insights.
- Clearly communicating the findings from the analysis to turn information into something actionable through reports, dashboards, and/or presentations.
Skills
- Experience solving problems in the project’s business domain.
- Experience with data integration from multiple sources
- Proficiency in at least one query language, especially SQL.
- Working experience with NoSQL databases, such as MongoDB and Elasticsearch.
- Working experience with popular statistical and machine learning techniques, such as clustering, linear regression, KNN, decision trees, etc.
- Good scripting skills using Python, R or any other relevant language
- Proficiency in at least one data visualization tool, such as Matplotlib, Plotly, D3.js, ggplot, etc.
- Great communication skills.
JD for IoT DE:
The role requires experience in Azure core technologies – IoT Hub/ Event Hub, Stream Analytics, IoT Central, Azure Data Lake Storage, Azure Cosmos, Azure Data Factory, Azure SQL Database, Azure HDInsight / Databricks, SQL data warehouse.
You Have:
- Minimum 2 years of software development experience
- Minimum 2 years of experience in IoT/streaming data pipelines solution development
- Bachelor's and/or Master’s degree in computer science
- Strong Consulting skills in data management including data governance, data quality, security, data integration, processing, and provisioning
- Delivered data management projects with real-time/near real-time data insights delivery on Azure Cloud
- Translated complex analytical requirements into the technical design including data models, ETLs, and Dashboards / Reports
- Experience deploying dashboards and self-service analytics solutions on both relational and non-relational databases
- Experience with different computing paradigms in databases such as In-Memory, Distributed, Massively Parallel Processing
- Successfully delivered large-scale IoT data management initiatives covering Plan, Design, Build and Deploy phases leveraging different delivery methodologies including Agile
- Experience in handling telemetry data with Spark Streaming, Kafka, Flink, Scala, PySpark, Spark SQL.
- Hands-on experience with containers and Docker
- Exposure to streaming protocols like MQTT and AMQP
- Knowledge of OT network protocols like OPC UA, CAN Bus, and similar protocols
- Strong knowledge of continuous integration, static code analysis, and test-driven development
- Experience in delivering projects in a highly collaborative delivery model with teams at onsite and offshore
- Must have excellent analytical and problem-solving skills
- Delivered change management initiatives focused on driving data platforms adoption across the enterprise
- Strong verbal and written communications skills are a must, as well as the ability to work effectively across internal and external organizations
Roles & Responsibilities
You Will:
- Translate functional requirements into technical design
- Interact with clients and internal stakeholders to understand the data and platform requirements in detail and determine core Azure services needed to fulfill the technical design
- Design, Develop and Deliver data integration interfaces in ADF and Azure Databricks
- Design, Develop and Deliver data provisioning interfaces to fulfill consumption needs
- Deliver data models on the Azure platform, whether on Azure Cosmos, SQL DW / Synapse, or SQL
- Advise clients on ML Engineering and deploying MLOps at scale on AKS
- Automate core activities to minimize the delivery lead times and improve the overall quality
- Optimize platform cost by selecting the right platform services and architecting the solution in a cost-effective manner
- Deploy Azure DevOps and CI/CD processes
- Deploy logging and monitoring across the different integration points for critical alerts
- Creating, designing and developing data models
- Prepare plans for all ETL (Extract/Transform/Load) procedures and architectures
- Validating results and creating business reports
- Monitoring and tuning data loads and queries
- Develop and prepare a schedule for a new data warehouse
- Analyze large databases and recommend appropriate optimization for the same
- Administer all requirements and design various functional specifications for data
- Provide support to the Software Development Life cycle
- Prepare various code designs and ensure efficient implementation of the same
- Evaluate all code and ensure the quality of all project deliverables
- Monitor data warehouse work and provide subject matter expertise
- Hands-on BI practices, data structures, data modeling, SQL skills
- Minimum 1 year of experience in PySpark
- Building and operationalizing large-scale enterprise data solutions and applications using one or more Azure data and analytics services in combination with custom solutions - Azure Synapse/Azure SQL DWH, Azure Data Lake, Azure Blob Storage, Spark, HDInsight, Databricks, Cosmos DB, Event Hub/IoT Hub.
- Experience in migrating on-premises data warehouses to data platforms on Azure cloud.
- Designing and implementing data engineering, ingestion, and transformation functions
- Experience with Azure Analysis Services
- Experience in Power BI
- Experience with third-party solutions like Attunity/StreamSets, Informatica
- Experience with PreSales activities (Responding to RFPs, Executing Quick POCs)
- Capacity Planning and Performance Tuning on Azure Stack and Spark.
Role Summary/Purpose:
We are looking for Developers/Senior Developers to help build an advanced analytical platform that leverages Big Data technologies and transforms legacy systems. The role offers an exciting, fast-paced, constantly changing, and challenging work environment, and will play an important part in resolving and influencing high-level decisions.
Requirements:
- The candidate must be a self-starter who can work under general guidelines in a fast-paced environment.
- Overall minimum of 4 to 8 years of software development experience and 2 years of Data Warehousing domain knowledge
- Must have 3 years of hands-on working knowledge of Big Data technologies such as Hadoop, Hive, HBase, Spark, Kafka, Spark Streaming, Scala, etc.
- Excellent knowledge in SQL & Linux Shell scripting
- Bachelor’s/Master’s/Engineering Degree from a well-reputed university.
- Strong communication, interpersonal, learning, and organizing skills, matched with the ability to manage stress, time, and people effectively
- Proven experience in co-ordination of many dependencies and multiple demanding stakeholders in a complex, large-scale deployment environment
- Ability to manage a diverse and challenging stakeholder community
- Diverse knowledge and experience of working on Agile Deliveries and Scrum teams.
Responsibilities
- Should work as a senior developer/individual contributor, depending on the situation
- Should take part in Scrum discussions and gather requirements
- Adhere to SCRUM timeline and deliver accordingly
- Participate in a team environment for the design, development and implementation
- Should take on L3 activities on an as-needed basis
- Prepare Unit/SIT/UAT test cases and log the results
- Co-ordinate SIT and UAT testing. Take feedback and provide necessary remediation/recommendations in time.
- Quality delivery and automation should be a top priority
- Co-ordinate change and deployment in time
- Should create healthy harmony within the team
- Own interaction points with members of the core team (e.g., BA team, testing and business teams) and any other relevant stakeholders
About Us
Punchh is the leader in customer loyalty, offer management, and AI solutions for offline and omni-channel merchants including restaurants, convenience stores, and retailers. Punchh brings the power of online to physical brands by delivering omni-channel experiences and personalization across the entire customer journey--from acquisition through loyalty and growth--to drive same store sales and customer lifetime value. Punchh uses best-in-class integrations to POS and other in-store systems such as WiFi, to deliver real-time SKU-level transaction visibility and offer provisioning for physical stores.
Punchh is growing exponentially and serves 200+ brands that encompass 91K+ stores globally. Punchh’s customers include the top convenience stores such as Casey’s General Stores, 25+ of the top 100 restaurant brands such as Papa John’s, Little Caesars, Denny’s, Focus Brands (5 of 7 brands), and Yum! Brands (KFC, Pizza Hut, and Taco Bell), and retailers. For a multi-billion-dollar brand with 6K+ stores, Punchh drove a 3% lift in same-store sales within the first year. Punchh is powering loyalty programs for 135+ million consumers.
Punchh has raised $70 million from premier Silicon Valley investors including Sapphire Ventures and Adams Street Partners, and has a seasoned leadership team with extensive experience in digital, marketing, CRM, and AI technologies as well as deep restaurant and retail industry expertise.
About the Role:
Punchh Tech India Pvt. is looking for a Senior Data Analyst – Business Insights to join our team. If you're excited to be part of a winning team, Punchh is a great place to grow your career.
This position is responsible for discovering the important trends in the complex data generated on the Punchh platform that have high business impact (influencing product features and roadmap). The role involves creating hypotheses around these trends, validating them with statistical significance, and making recommendations.
Reporting to: Director, Analytics
Job Location: Jaipur
Experience Required: 4-6 years
What You’ll Do
- Take ownership of custom data analysis projects/requests and work closely with end users (both internal and external clients) to deliver the results
- Identify successful implementation/utilization of product features and contribute to the best-practices playbook for client facing teams (Customer Success)
- Strive towards building mini business intelligence products that add value to the client base
- Represent the company’s expertise in advanced analytics in a variety of media outlets such as client interactions, conferences, blogs, and interviews.
What You’ll Need
- Master’s in business/behavioral economics/statistics with a strong interest in marketing technology
- Proven track record of at least 5 years uncovering business insights, especially related to Behavioral Economics and adding value to businesses
- Proficient in using the proper statistical and econometric approaches to establish the presence and strength of trends in data. Strong statistical knowledge is mandatory.
- Extensive prior exposure in causal inference studies, based on both longitudinal and latitudinal data.
- Excellent experience using Python (or R) to analyze data from extremely large or complex data sets
- Exceptional data querying skills (Snowflake/Redshift, Spark, Presto/Athena, to name a few)
- Ability to effectively articulate complex ideas in simple and effective presentations to diverse groups of stakeholders.
- Experience working with a visualization tool (preferably, but not restricted to, Tableau)
- Domain expertise: extensive exposure to retail business, restaurant business or worked on loyalty programs and promotion/campaign effectiveness
- Should be self-organized and be able to proactively identify problems and propose solutions
- Gels well within and across teams; works with stakeholders from various functions such as Product, Customer Success, and Implementations, among others
- As the business-side stakeholders are based in the US, should be flexible in scheduling meetings at times convenient for the US West Coast
- Effective in working autonomously to get things done and taking the initiative to anticipate the needs of executive leadership
- Able and willing to relocate to Jaipur post pandemic.
Benefits:
- Medical Coverage, to keep you and your family healthy.
- Compensation that stacks up with other tech companies in your area.
- Paid vacation days and holidays to rest and relax.
- Healthy lunch provided daily to fuel you through your work.
- Opportunities for career growth and training support, including fun team building events.
- Flexibility and a comfortable work environment for you to feel your best.
Your mission is to help lead the team towards creating solutions that improve the way our business is run. Your knowledge of design, development, coding, testing and application programming will help your team raise their game, meeting your standards, as well as satisfying both business and functional requirements. Your expertise in various technology domains will be counted on to set strategic direction and solve complex and mission-critical problems, internally and externally. Your quest to embrace leading-edge technologies and methodologies inspires your team to follow suit.
Responsibilities and Duties :
- As a Data Engineer, you will be responsible for developing data pipelines for numerous applications handling all kinds of data: structured, semi-structured, and unstructured. Big data knowledge, especially in Spark and Hive, is highly preferred.
- Work in a team and provide proactive technical oversight; advise development teams, fostering reuse, design for scale, stability, and operational efficiency of data/analytical solutions
Education level :
- Bachelor's degree in Computer Science or equivalent
Experience :
- Minimum 5 years of relevant experience on production-grade projects, with hands-on, end-to-end software development experience
- Expertise in application, data and infrastructure architecture disciplines
- Expert in designing data integrations using ETL and other data integration patterns
- Advanced knowledge of architecture, design and business processes
Proficiency in :
- Modern programming languages like Java, Python, Scala
- Big Data technologies such as Hadoop, Spark, Hive, Kafka
- Writing decently optimized SQL queries
- Orchestration and deployment tools like Airflow & Jenkins for CI/CD (Optional)
- Responsible for design and development of integration solutions with Hadoop/HDFS, Real-Time Systems, Data Warehouses, and Analytics solutions
- Knowledge of system development lifecycle methodologies, such as waterfall and Agile.
- An understanding of data architecture and modeling practices and concepts, including entity-relationship diagrams, normalization, abstraction, denormalization, dimensional modeling, and metadata modeling practices.
- Experience generating physical data models and the associated DDL from logical data models.
- Experience developing data models for operational, transactional, and operational reporting, including the development of or interfacing with data analysis, data mapping, and data rationalization artifacts.
- Experience enforcing data modeling standards and procedures.
- Knowledge of web technologies, application programming languages, OLTP/OLAP technologies, data strategy disciplines, relational databases, data warehouse development and Big Data solutions.
- Ability to work collaboratively in teams and develop meaningful relationships to achieve common goals
Skills :
Must Know :
- Core big-data concepts
- Spark - PySpark/Scala
- Data integration tool like Pentaho, NiFi, SSIS, etc. (at least 1)
- Handling of various file formats
- Cloud platform - AWS/Azure/GCP
- Orchestration tool - Airflow