Key Responsibilities
Architect and implement enterprise-grade Lakehouse solutions using Databricks
Design and deliver scalable batch and real-time data pipelines using Apache Spark (PySpark/SQL)
Build ETL/ELT pipelines, incremental data loads, and metadata-driven ingestion frameworks
Implement and optimize Databricks components: Delta Lake, Delta Live Tables, Auto Loader, Structured Streaming, and Workflows
Design large-scale data warehousing solutions with 3NF and dimensional modeling
Establish data governance, security, and data quality frameworks, including Unity Catalog
Lead ML lifecycle management using MLflow and drive AI use cases (RAG, AI/BI)
Manage cloud-native deployments on Microsoft Azure and integrate with enterprise systems (e.g., ServiceNow)
Drive CI/CD, DevOps practices, and performance optimization of Spark workloads
Provide technical leadership, mentor teams, and ensure successful delivery
Collaborate with stakeholders to translate business requirements into scalable solutions
Required Skills & Experience
10+ years of experience in Data Engineering / Analytics / AI with strong delivery ownership
Deep expertise in the Databricks ecosystem (Notebooks, Delta Lake, Workflows, AI/BI, Apps, Genie)
Strong hands-on experience with:
a. Apache Spark (performance tuning & scalability)
b. Python and SQL
Proven experience in:
a. Solution architecture and large-scale data platforms
b. Data warehousing and advanced data modeling
c. Batch and real-time processing systems
Experience with:
a. Azure Databricks and Azure data services
b. MLflow and MLOps practices
c. ServiceNow or enterprise integrations
Exposure to AI technologies (RAG, LLM-based solutions)
Strong stakeholder management and leadership skills
Certifications (Preferred)
Databricks certifications aligned to data engineering and AI tracks, such as:
a. Databricks Certified Data Engineer Associate (validates foundational ETL, Spark, and Lakehouse capabilities)
b. Databricks Certified Data Engineer Professional (advanced expertise in pipeline design, optimization, and governance)
Certifications in Databricks Machine Learning or Generative AI tracks (e.g., ML Associate / Professional) for AI-driven use cases
Relevant cloud certifications in Microsoft Azure or Amazon Web Services for platform deployment and architecture