50+ Apache Spark Jobs in India
ROLE & RESPONSIBILITIES:
We are hiring a Senior DevSecOps / Security Engineer with 8+ years of experience securing AWS cloud, on-prem infrastructure, DevOps platforms, MLOps environments, CI/CD pipelines, container orchestration, and data/ML platforms. This role is responsible for creating and maintaining a unified security posture across all systems used by DevOps and MLOps teams — including AWS, Kubernetes, EMR, MWAA, Spark, Docker, GitOps, observability tools, and network infrastructure.
KEY RESPONSIBILITIES:
1. Cloud Security (AWS)-
- Secure all AWS resources consumed by DevOps/MLOps/Data Science: EC2, EKS, ECS, EMR, MWAA, S3, RDS, Redshift, Lambda, CloudFront, Glue, Athena, Kinesis, Transit Gateway, VPC Peering.
- Implement IAM least privilege, SCPs, KMS, Secrets Manager, SSO & identity governance.
- Configure AWS-native security: WAF, Shield, GuardDuty, Inspector, Macie, CloudTrail, Config, Security Hub.
- Harden VPC architecture, subnets, routing, SG/NACLs, multi-account environments.
- Ensure encryption of data at rest/in transit across all cloud services (a minimal automation sketch follows this list).
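To make the encryption bullet above concrete, here is a minimal, hedged sketch (not this team's actual tooling) of enforcing SSE-KMS default encryption on an S3 bucket with boto3; the bucket name and KMS key ARN are placeholders:

```python
"""
Minimal sketch, assuming boto3 credentials are already configured:
apply SSE-KMS default encryption to a bucket if it is not already the default.
"""
import boto3

s3 = boto3.client("s3")

def ensure_kms_default_encryption(bucket: str, kms_key_arn: str) -> None:
    """Set SSE-KMS as the bucket default unless it is already configured."""
    cfg = s3.get_bucket_encryption(Bucket=bucket)
    rules = cfg["ServerSideEncryptionConfiguration"]["Rules"]
    uses_kms = any(
        r["ApplyServerSideEncryptionByDefault"].get("SSEAlgorithm") == "aws:kms"
        for r in rules
    )
    if not uses_kms:
        s3.put_bucket_encryption(
            Bucket=bucket,
            ServerSideEncryptionConfiguration={
                "Rules": [{
                    "ApplyServerSideEncryptionByDefault": {
                        "SSEAlgorithm": "aws:kms",
                        "KMSMasterKeyID": kms_key_arn,
                    },
                    "BucketKeyEnabled": True,  # reduce KMS request volume/cost
                }]
            },
        )

if __name__ == "__main__":
    ensure_kms_default_encryption(
        "example-ml-artifacts",                                     # placeholder bucket
        "arn:aws:kms:ap-south-1:111122223333:key/placeholder-key",  # placeholder key ARN
    )
```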
2. DevOps Security (IaC, CI/CD, Kubernetes, Linux)-
Infrastructure as Code & Automation Security:
- Secure Terraform, CloudFormation, Ansible with policy-as-code (OPA, Checkov, tfsec).
- Enforce misconfiguration scanning and automated remediation.
CI/CD Security:
- Secure Jenkins, GitHub, GitLab pipelines with SAST, DAST, SCA, secrets scanning, image scanning.
- Implement secure build, artifact signing, and deployment workflows.
Containers & Kubernetes:
- Harden Docker images, private registries, runtime policies.
- Enforce EKS security: RBAC, IRSA, PSP/PSS, network policies, runtime monitoring.
- Apply CIS Benchmarks for Kubernetes and Linux.
Monitoring & Reliability:
- Secure observability stack: Grafana, CloudWatch, logging, alerting, anomaly detection.
- Ensure audit logging across cloud/platform layers.
3. MLOps Security (Airflow, EMR, Spark, Data Platforms, ML Pipelines)-
Pipeline & Workflow Security:
- Secure Airflow/MWAA connections, secrets, DAGs, execution environments.
- Harden EMR, Spark jobs, Glue jobs, IAM roles, S3 buckets, encryption, and access policies.
ML Platform Security:
- Secure Jupyter/JupyterHub environments, containerized ML workspaces, and experiment tracking systems.
- Control model access, artifact protection, model registry security, and ML metadata integrity.
Data Security:
- Secure ETL/ML data flows across S3, Redshift, RDS, Glue, Kinesis.
- Enforce data versioning security, lineage tracking, PII protection, and access governance.
ML Observability:
- Implement drift detection (data drift/model drift), feature monitoring, and audit logging (a minimal drift-check sketch follows this section).
- Integrate ML monitoring with Grafana/Prometheus/CloudWatch.
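As one illustration of the drift-detection responsibility above, the following hedged sketch computes a Population Stability Index (PSI) between a training baseline and recent inference data and publishes it as a custom CloudWatch metric; the `MLObservability` namespace, metric name, and the ~0.2 alert threshold are assumptions, not a prescribed setup:

```python
"""Sketch only: PSI-based feature drift check pushed to CloudWatch."""
import numpy as np
import boto3

def psi(baseline: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between two 1-D samples."""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    base_pct = np.histogram(baseline, bins=edges)[0] / len(baseline)
    curr_pct = np.histogram(current, bins=edges)[0] / len(current)
    # Clip to avoid log(0) / division by zero on empty buckets.
    base_pct = np.clip(base_pct, 1e-6, None)
    curr_pct = np.clip(curr_pct, 1e-6, None)
    return float(np.sum((curr_pct - base_pct) * np.log(curr_pct / base_pct)))

def publish_drift_metric(feature: str, value: float) -> None:
    """Publish PSI so CloudWatch alarms / Grafana panels can alert on it."""
    boto3.client("cloudwatch").put_metric_data(
        Namespace="MLObservability",  # assumed namespace
        MetricData=[{
            "MetricName": "FeaturePSI",
            "Dimensions": [{"Name": "Feature", "Value": feature}],
            "Value": value,
        }],
    )

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    baseline = rng.normal(0, 1, 10_000)    # stand-in for training data
    current = rng.normal(0.3, 1.1, 2_000)  # stand-in for recent traffic
    score = psi(baseline, current)
    publish_drift_metric("feature_x", score)  # alert if score > ~0.2 (common rule of thumb)
```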
4. Network & Endpoint Security-
- Manage firewall policies, VPN, IDS/IPS, endpoint protection, secure LAN/WAN, Zero Trust principles.
- Conduct vulnerability assessments, penetration test coordination, and network segmentation.
- Secure remote workforce connectivity and internal office networks.
5. Threat Detection, Incident Response & Compliance-
- Centralize log management (CloudWatch, OpenSearch/ELK, SIEM).
- Build security alerts, automated threat detection, and incident workflows.
- Lead incident containment, forensics, RCA, and remediation.
- Ensure compliance with ISO 27001, SOC 2, GDPR, HIPAA (as applicable).
- Maintain security policies, procedures, runbooks, and audits.
IDEAL CANDIDATE:
- 8+ years in DevSecOps, Cloud Security, Platform Security, or equivalent.
- Proven ability securing AWS cloud ecosystems (IAM, EKS, EMR, MWAA, VPC, WAF, GuardDuty, KMS, Inspector, Macie).
- Strong hands-on experience with Docker, Kubernetes (EKS), CI/CD tools, and Infrastructure-as-Code.
- Experience securing ML platforms, data pipelines, and MLOps systems (Airflow/MWAA, Spark/EMR).
- Strong Linux security (CIS hardening, auditing, intrusion detection).
- Proficiency in Python, Bash, and automation/scripting.
- Excellent knowledge of SIEM, observability, threat detection, monitoring systems.
- Understanding of microservices, API security, serverless security.
- Strong understanding of vulnerability management, penetration testing practices, and remediation plans.
EDUCATION:
- Master’s degree in Cybersecurity, Computer Science, Information Technology, or related field.
- Relevant certifications (AWS Security Specialty, CISSP, CEH, CKA/CKS) are a plus.
PERKS, BENEFITS AND WORK CULTURE:
- Competitive Salary Package
- Generous Leave Policy
- Flexible Working Hours
- Performance-Based Bonuses
- Health Care Benefits
Review Criteria
- Strong MLOps profile
- 8+ years of DevOps experience and 4+ years in MLOps / ML pipeline automation and production deployments
- 4+ years hands-on experience in Apache Airflow / MWAA managing workflow orchestration in production
- 4+ years hands-on experience in Apache Spark (EMR / Glue / managed or self-hosted) for distributed computation
- Must have strong hands-on experience across key AWS services including EKS/ECS/Fargate, Lambda, Kinesis, Athena/Redshift, S3, and CloudWatch
- Must have hands-on Python for pipeline & automation development
- 4+ years of experience in AWS cloud, including at recent companies
- (Company) - Product companies preferred; Exception for service company candidates with strong MLOps + AWS depth
Preferred
- Hands-on in Docker deployments for ML workflows on EKS / ECS
- Experience with ML observability (data drift / model drift / performance monitoring / alerting) using CloudWatch / Grafana / Prometheus / OpenSearch.
- Experience with CI / CD / CT using GitHub Actions / Jenkins.
- Experience with JupyterHub/Notebooks, Linux, scripting, and metadata tracking for ML lifecycle.
- Understanding of ML frameworks (TensorFlow / PyTorch) for deployment scenarios.
Job Specific Criteria
- CV Attachment is mandatory
- Please provide the CTC breakup (Fixed + Variable).
- Are you open to a face-to-face (F2F) round?
- Has the candidate filled out the Google form?
Role & Responsibilities
We are looking for a Senior MLOps Engineer with 8+ years of experience building and managing production-grade ML platforms and pipelines. The ideal candidate will have strong expertise across AWS, Airflow/MWAA, Apache Spark, Kubernetes (EKS), and automation of ML lifecycle workflows. You will work closely with data science, data engineering, and platform teams to operationalize and scale ML models in production.
Key Responsibilities:
- Design and manage cloud-native ML platforms supporting training, inference, and model lifecycle automation.
- Build ML/ETL pipelines using Apache Airflow / AWS MWAA and distributed data workflows using Apache Spark (EMR/Glue); a minimal DAG sketch follows this list.
- Containerize and deploy ML workloads using Docker, EKS, ECS/Fargate, and Lambda.
- Develop CI/CT/CD pipelines integrating model validation, automated training, testing, and deployment.
- Implement ML observability: model drift, data drift, performance monitoring, and alerting using CloudWatch, Grafana, Prometheus.
- Ensure data governance, versioning, metadata tracking, reproducibility, and secure data pipelines.
- Collaborate with data scientists to productionize notebooks, experiments, and model deployments.
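For the Airflow/MWAA bullet above, a minimal DAG sketch is shown below. The DAG id, schedule, and task bodies are illustrative assumptions rather than a prescribed pipeline, and real secrets would live in the MWAA connections/secrets backend, not in code:

```python
"""Sketch of a daily validate-then-train ML pipeline on Airflow 2.x / MWAA."""
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def validate_features(**context):
    # Placeholder: pull the day's feature snapshot from S3 and run quality checks.
    print("validating features for", context["ds"])

def trigger_training(**context):
    # Placeholder: submit an EMR/Spark step or a training job here.
    print("triggering training run for", context["ds"])

default_args = {"retries": 2, "retry_delay": timedelta(minutes=10)}

with DAG(
    dag_id="ml_training_pipeline",      # assumed name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    validate = PythonOperator(task_id="validate_features", python_callable=validate_features)
    train = PythonOperator(task_id="trigger_training", python_callable=trigger_training)
    validate >> train
```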
Ideal Candidate
- 8+ years in MLOps/DevOps with strong ML pipeline experience.
- Strong hands-on experience with AWS:
- Compute/Orchestration: EKS, ECS, EC2, Lambda
- Data: EMR, Glue, S3, Redshift, RDS, Athena, Kinesis
- Workflow: MWAA/Airflow, Step Functions
- Monitoring: CloudWatch, OpenSearch, Grafana
- Strong Python skills and familiarity with ML frameworks (TensorFlow/PyTorch/Scikit-learn).
- Expertise with Docker, Kubernetes, Git, CI/CD tools (GitHub Actions/Jenkins).
- Strong Linux, scripting, and troubleshooting skills.
- Experience enabling reproducible ML environments using Jupyter Hub and containerized development workflows.
Education:
- Master’s degree in Computer Science, Machine Learning, Data Engineering, or related field.
Position Overview: The Lead Software Architect - Python & Data Engineering is a senior technical leadership role responsible for designing and owning end-to-end architecture for data-intensive, AI/ML, and analytics platforms, while mentoring developers and ensuring technical excellence across the organization.
Key Responsibilities:
- Design end-to-end software architecture for data-intensive applications, AI/ML pipelines, and analytics platforms
- Evaluate trade-offs between competing technical approaches
- Define data models, API approach, and integration patterns across systems
- Create technical specifications and architecture documentation
- Lead by example through production-grade Python code and mentor developers on engineering fundamentals
- Conduct design and code reviews focused on architectural soundness
- Establish engineering standards, coding practices, and design patterns for the team
- Translate business requirements into technical architecture
- Collaborate with data scientists, analysts, and other teams to design integrated solutions
- Whiteboard and defend system design and architectural choices
- Take responsibility for system performance, reliability, and maintainability
- Identify and resolve architectural bottlenecks proactively
Required Skills:
- 8+ years of experience in software architecture and development
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field
- Strong foundations in data structures, algorithms, and computational complexity
- Experience in system design for scale, including caching strategies, load balancing, and asynchronous processing
- 6+ years of Python development experience
- Deep knowledge of Django, Flask, or FastAPI
- Expert understanding of Python internals including GIL and memory management
- Experience with RESTful API design and event-driven architectures (Kafka, RabbitMQ)
- Proficiency in data processing frameworks such as Pandas, Apache Spark, and Airflow
- Strong SQL optimization and database design experience (PostgreSQL, MySQL, MongoDB)
- Experience with AWS, GCP, or Azure cloud platforms
- Knowledge of containerization (Docker) and orchestration (Kubernetes)
- Hands-on experience designing CI/CD pipelines
Preferred (Bonus) Skills:
- Experience deploying ML models to production (MLOps, model serving, monitoring)
- Understanding of ML system design including feature stores and model versioning
- Familiarity with ML frameworks such as scikit-learn, TensorFlow, and PyTorch
- Open-source contributions or technical blogging demonstrating architectural depth
- Experience with modern front-end frameworks for full-stack perspective
Who We Are
At Sonatype, we help organizations build better, more secure software by enabling them to understand and control their software supply chains. Our products are trusted by thousands of engineering teams globally, providing critical insights into dependency health, license risk, and software security. We’re passionate about empowering developers—and we back it with data.
The Opportunity
We’re looking for a Data Engineer with full stack expertise to join our growing Data Platform team. This role blends data engineering, microservices, and full-stack development to deliver end-to-end services that power analytics, machine learning, and advanced search across Sonatype.
You will design and build data-driven microservices and workflows using Java, Python, and Spring Batch, implement frontends for data workflows, and deploy everything through CI/CD pipelines into AWS ECS/Fargate. You’ll also ensure services are monitorable, debuggable, and reliable at scale, while clearly documenting designs with Mermaid-based sequence and dataflow diagrams.
This is a hands-on engineering role for someone who thrives at the intersection of data systems, fullstack development, ML, and cloud-native platforms.
What You’ll Do
- Design, build, and maintain data pipelines, ETL/ELT workflows, and scalable microservices.
- Develop complex web scraping (Playwright) and real-time pipelines (Kafka/queues/Flink).
- Develop end-to-end microservices with backend (Java 5+ yrs, Python 5+ yrs, Spring Batch 2+ yrs) and frontend (React or any framework).
- Deploy, publish, and operate services in AWS ECS/Fargate using CI/CD pipelines (Jenkins, GitOps).
- Architect and optimize data storage models in SQL (MySQL, PostgreSQL) and NoSQL stores.
- Implement web scraping and external data ingestion pipelines.
- Enable Databricks and PySpark-based workflows for large-scale analytics.
- Build advanced data search capabilities (fuzzy matching, vector similarity search, semantic retrieval); a small matching sketch follows this list.
- Apply ML techniques (scikit-learn, classification algorithms, predictive modeling) to data-driven solutions.
- Implement observability, debugging, monitoring, and alerting for deployed services.
- Create Mermaid sequence diagrams, flowcharts, and dataflow diagrams to document system architecture and workflows.
- Drive best practices in fullstack data service development, including architecture, testing, and documentation.
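As a hedged illustration of the fuzzy-matching / vector-similarity bullet above (not Sonatype's actual implementation), character n-gram TF-IDF vectors with cosine similarity can match noisy component names against a catalog; the catalog entries here are illustrative:

```python
"""Sketch: fuzzy component-name lookup with scikit-learn."""
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

catalog = [
    "org.apache.spark:spark-core",
    "org.apache.kafka:kafka-clients",
    "com.fasterxml.jackson.core:jackson-databind",
]

# Character n-grams tolerate typos and reordered tokens better than word tokens.
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
catalog_vecs = vectorizer.fit_transform(catalog)

def fuzzy_lookup(query: str, top_k: int = 1):
    """Return the catalog entries most similar to a possibly misspelled query."""
    scores = cosine_similarity(vectorizer.transform([query]), catalog_vecs)[0]
    ranked = scores.argsort()[::-1][:top_k]
    return [(catalog[i], float(scores[i])) for i in ranked]

print(fuzzy_lookup("apache spark core"))  # should surface spark-core first
```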
What We’re Looking For
Minimum Qualifications
- 2+ years of experience as a Data Engineer or in a backend software engineering role
- Strong programming skills in Python, Scala, or Java
- Hands-on experience with HBase or similar wide-column NoSQL stores
- Hands-on experience with distributed data systems like Spark, Kafka, or Flink
- Proficient in writing complex SQL and optimizing queries for performance
- Experience building and maintaining robust ETL/ELT pipelines in production
- Familiarity with workflow orchestration tools (Airflow, Dagster, or similar)
- Understanding of data modeling techniques (star schema, dimensional modeling, etc.)
- Familiarity with CI/CD pipelines (Jenkins or similar)
- Ability to visualize and communicate architectures using Mermaid diagrams
Bonus Points
- Experience working with Databricks, dbt, Terraform, or Kubernetes
- Familiarity with streaming data pipelines or real-time processing
- Exposure to data governance frameworks and tools
- Experience supporting data products or ML pipelines in production
- Strong understanding of data privacy, security, and compliance best practices
Why You’ll Love Working Here
- Data with purpose: Work on problems that directly impact how the world builds secure software
- Modern tooling: Leverage the best of open-source and cloud-native technologies
- Collaborative culture: Join a passionate team that values learning, autonomy, and impact
About Corridor Platforms
Corridor Platforms is a leader in next-generation risk decisioning and responsible AI governance, empowering banks and lenders to build transparent, compliant, and data-driven solutions. Our platforms combine advanced analytics, real-time data integration, and GenAI to support complex financial decision workflows for regulated industries.
Role Overview
As a Backend Engineer at Corridor Platforms, you will:
- Architect, develop, and maintain backend components for our Risk Decisioning Platform.
- Build and orchestrate scalable backend services that automate, optimize, and monitor high-value credit and risk decisions in real time.
- Integrate with ORM layers such as SQLAlchemy and multiple RDBMS back ends (Postgres, MySQL, Oracle, MSSQL, etc.) to ensure data integrity, scalability, and compliance (a minimal ORM sketch follows this list).
- Collaborate closely with the Product team, Data Scientists, and QA teams to create extensible APIs, workflow automation, and AI governance features.
- Architect workflows for privacy, auditability, versioned traceability, and role-based access control, ensuring adherence to regulatory frameworks.
- Take ownership from requirements to deployment, seeing your code deliver real impact in the lives of customers and end users.
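A minimal sketch of the SQLAlchemy-backed, multi-RDBMS pattern mentioned above (the table, columns, and connection URL are illustrative assumptions, not Corridor's schema); swapping the URL (postgresql+psycopg2://, mysql+pymysql://, oracle+oracledb://, mssql+pyodbc://) is what keeps the service database-agnostic:

```python
"""Sketch: SQLAlchemy 2.x declarative model used across interchangeable RDBMS back ends."""
from sqlalchemy import String, create_engine, select
from sqlalchemy.orm import DeclarativeBase, Mapped, Session, mapped_column

class Base(DeclarativeBase):
    pass

class Decision(Base):
    __tablename__ = "risk_decisions"  # assumed table name
    id: Mapped[int] = mapped_column(primary_key=True)
    applicant_id: Mapped[str] = mapped_column(String(64), index=True)
    score: Mapped[float]
    outcome: Mapped[str] = mapped_column(String(16))

# SQLite in memory keeps the sketch self-contained; production would point at a real RDBMS URL.
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Decision(applicant_id="A-123", score=0.742, outcome="approve"))
    session.commit()
    approved = session.scalars(select(Decision).where(Decision.outcome == "approve")).all()
    print([d.applicant_id for d in approved])
```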
Technical Skills
- Languages: Python 3.9+, SQL, JavaScript/TypeScript, Angular
- Frameworks: Flask, SQLAlchemy, Celery, Marshmallow, Apache Spark
- Databases: PostgreSQL, Oracle, SQL Server, Redis
- Tools: pytest, Docker, Git, Nx
- Cloud: Experience with AWS, Azure, or GCP preferred
- Monitoring: Familiarity with OpenTelemetry and logging frameworks
Why Join Us?
- Cutting-Edge Tech: Work hands-on with the latest AI, cloud-native workflows, and big data tools—all within a single compliant platform.
- End-to-End Impact: Contribute to mission-critical backend systems, from core data models to live production decision services.
- Innovation at Scale: Engineer solutions that process vast data volumes, helping financial institutions innovate safely and effectively.
- Mission-Driven: Join a passionate team advancing fair, transparent, and compliant risk decisioning at the forefront of fintech and AI governance.
What We’re Looking For
- Proficiency in Python, SQLAlchemy (or similar ORM), and SQL databases.
- Experience developing and maintaining scalable backend services, including API, data orchestration, ML workflows, and workflow automation.
- Solid understanding of data modeling, distributed systems, and backend architecture for regulated environments.
- Curiosity and drive to work at the intersection of AI/ML, fintech, and regulatory technology.
- Experience mentoring and guiding junior developers.
Ready to build backends that shape the future of decision intelligence and responsible AI?
Apply now and become part of the innovation at Corridor Platforms!
What You’ll Be Doing:
● Own the architecture and roadmap for scalable, secure, and high-quality data pipelines and platforms.
● Lead and mentor a team of data engineers while establishing engineering best practices, coding standards, and governance models.
● Design and implement high-performance ETL/ELT pipelines using modern Big Data technologies for diverse internal and external data sources.
● Drive modernization initiatives including re-architecting legacy systems to support next-generation data products, ML workloads, and analytics use cases.
● Partner with Product, Engineering, and Business teams to translate requirements into robust technical solutions that align with organizational priorities.
● Champion data quality, monitoring, metadata management, and observability across the ecosystem.
● Lead initiatives to improve cost efficiency, data delivery SLAs, automation, and infrastructure scalability.
● Provide technical leadership on data modeling, orchestration, CI/CD for data workflows, and cloud-based architecture improvements.
Qualifications:
● Bachelor's degree in Engineering, Computer Science, or relevant field.
● 8+ years of relevant and recent experience in a Data Engineer role.
● 5+ years recent experience with Apache Spark and solid understanding of the fundamentals.
● Deep understanding of Big Data concepts and distributed systems.
● Demonstrated ability to design, review, and optimize scalable data architectures across ingestion.
● Strong coding skills with Scala, Python and the ability to quickly switch between them with ease.
● Advanced working SQL knowledge and experience working with a variety of relational databases such as Postgres and/or MySQL.
● Cloud experience with Databricks.
● Strong understanding of Delta Lake architecture and working with Parquet, JSON, CSV, and similar formats.
● Experience establishing and enforcing data engineering best practices, including CI/CD for data, orchestration and automation, and metadata management.
● Comfortable working in an Agile environment
● Machine Learning knowledge is a plus.
● Demonstrated ability to operate independently, take ownership of deliverables, and lead technical decisions.
● Excellent written and verbal communication skills in English.
● Experience supporting and working with cross-functional teams in a dynamic environment.
REPORTING: This position will report to Sr. Technical Manager or Director of Engineering as assigned by Management.
EMPLOYMENT TYPE: Full-Time, Permanent
SHIFT TIMINGS: 10:00 AM - 07:00 PM IST
We are looking for a highly skilled Sr. Big Data Engineer with 3-5 years of experience in building large-scale data pipelines, real-time streaming solutions, and batch/stream processing systems. The ideal candidate should be proficient in Spark, Kafka, Python, and AWS Big Data services, with hands-on experience in implementing CDC (Change Data Capture) pipelines and integrating multiple data sources and sinks.
Responsibilities
- Design, develop, and optimize batch and streaming data pipelines using Apache Spark and Python.
- Build and maintain real-time data ingestion pipelines leveraging Kafka and AWS Kinesis.
- Implement CDC (Change Data Capture) pipelines using Kafka Connect, Debezium, or similar frameworks (a minimal streaming-consumer sketch follows this list).
- Integrate data from multiple sources and sinks (databases, APIs, message queues, file systems, cloud storage).
- Work with AWS Big Data ecosystem: Glue, EMR, Kinesis, Athena, S3, Lambda, Step Functions.
- Ensure pipeline scalability, reliability, and performance tuning of Spark jobs and EMR clusters.
- Develop data transformation and ETL workflows in AWS Glue and manage schema evolution.
- Collaborate with data scientists, analysts, and product teams to deliver reliable and high-quality data solutions.
- Implement monitoring, logging, and alerting for critical data pipelines.
- Follow best practices for data security, compliance, and cost optimization in cloud environments.
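As a hedged sketch of the streaming/CDC responsibilities above, a Spark Structured Streaming job can consume a Debezium-style topic and land the "after" images on S3; the broker address, topic, record schema, and S3 paths are assumptions, and the spark-sql-kafka connector package must be on the classpath:

```python
"""Sketch: consume a Debezium-style CDC topic with Spark Structured Streaming."""
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.appName("cdc-stream-sketch").getOrCreate()

# Debezium wraps each change in an envelope: {"before": ..., "after": ..., "op": "c|u|d", ...}
row_schema = StructType([
    StructField("id", LongType()),
    StructField("status", StringType()),
])
envelope = StructType([
    StructField("before", row_schema),
    StructField("after", row_schema),
    StructField("op", StringType()),
])

changes = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # assumed broker
    .option("subscribe", "dbserver1.public.orders")      # assumed CDC topic
    .option("startingOffsets", "latest")
    .load()
    .select(F.from_json(F.col("value").cast("string"), envelope).alias("e"))
    .select("e.op", "e.after.*")                          # keep op code + new row image
)

query = (
    changes.writeStream.format("parquet")
    .option("path", "s3://example-bucket/cdc/orders/")          # assumed sink
    .option("checkpointLocation", "s3://example-bucket/chk/orders/")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```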
Required Skills & Experience
- Programming: Strong proficiency in Python (PySpark, data frameworks, automation).
- Big Data Processing: Hands-on experience with Apache Spark (batch & streaming).
- Messaging & Streaming: Proficient in Kafka (brokers, topics, partitions, consumer groups) and AWS Kinesis.
- CDC Pipelines: Experience with Debezium / Kafka Connect / custom CDC frameworks.
- AWS Services: AWS Glue, EMR, S3, Athena, Lambda, IAM, CloudWatch.
- ETL/ELT Workflows: Strong knowledge of data ingestion, transformation, partitioning, schema management.
- Databases: Experience with relational databases (MySQL, Postgres, Oracle) and NoSQL (MongoDB, DynamoDB, Cassandra).
- Data Formats: JSON, Parquet, Avro, ORC, Delta/Iceberg/Hudi.
- Version Control & CI/CD: Git, GitHub/GitLab, Jenkins, or CodePipeline.
- Monitoring/Logging: CloudWatch, Prometheus, ELK/Opensearch.
- Containers & Orchestration (nice-to-have): Docker, Kubernetes, Airflow/Step Functions for workflow orchestration.
Preferred Qualifications
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, or related field.
- Experience in large-scale data lake / lake house architectures.
- Knowledge of data warehousing concepts and query optimisation.
- Familiarity with data governance, lineage, and cataloging tools (Glue Data Catalog, Apache Atlas).
- Exposure to ML/AI data pipelines is a plus.
Tools & Technologies (must-have exposure)
- Big Data & Processing: Apache Spark, PySpark, AWS EMR, AWS Glue
- Streaming & Messaging: Apache Kafka, Kafka Connect, Debezium, AWS Kinesis
- Cloud & Storage: AWS (S3, Athena, Lambda, IAM, CloudWatch)
- Programming & Scripting: Python, SQL, Bash
- Orchestration: Airflow / Step Functions
- Version Control & CI/CD: Git, Jenkins/CodePipeline
- Data Formats: Parquet, Avro, ORC, JSON, Delta, Iceberg, Hudi
MANDATORY:
- Strong Data Architect / Data Engineering Manager / Director profile
- Must have 12+ YOE in Data Engineering roles, with at least 2+ years in a Leadership role
- Must have 7+ YOE in hands-on Tech development with Java (Highly preferred) or Python, Node.JS, GoLang
- Must have strong experience with large-scale data technologies and tools such as HDFS, YARN, MapReduce, Hive, Kafka, Spark, Airflow, Presto, etc.
- Strong expertise in HLD and LLD, to design scalable, maintainable data architectures.
- Must have managed a team of at least 5+ Data Engineers (the leadership role should be evident in the CV)
- Product companies (high-scale, data-heavy companies preferred)
PREFERRED:
- Tier-1 college background preferred, ideally IIT
- Candidates should have spent a minimum of 3 years at each company
- Recent 4+ years of experience with high-growth product startups, having implemented data engineering systems from an early stage in the company
ROLES & RESPONSIBILITIES:
- Lead and mentor a team of data engineers, ensuring high performance and career growth.
- Architect and optimize scalable data infrastructure, ensuring high availability and reliability.
- Drive the development and implementation of data governance frameworks and best practices.
- Work closely with cross-functional teams to define and execute a data roadmap.
- Optimize data processing workflows for performance and cost efficiency.
- Ensure data security, compliance, and quality across all data platforms.
- Foster a culture of innovation and technical excellence within the data team.
IDEAL CANDIDATE:
- 10+ years of experience in software/data engineering, with at least 3+ years in a leadership role.
- Expertise in backend development with programming languages such as Java, PHP, Python, Node.JS, GoLang, JavaScript, HTML, and CSS.
- Proficiency in SQL, Python, and Scala for data processing and analytics.
- Strong understanding of cloud platforms (AWS, GCP, or Azure) and their data services.
- Strong foundation and expertise in HLD and LLD, as well as design patterns, preferably using Spring Boot or Google Guice
- Experience in big data technologies such as Spark, Hadoop, Kafka, and distributed computing frameworks.
- Hands-on experience with data warehousing solutions such as Snowflake, Redshift, or BigQuery
- Deep knowledge of data governance, security, and compliance (GDPR, SOC2, etc.).
- Experience in NoSQL databases like Redis, Cassandra, MongoDB, and TiDB.
- Familiarity with automation and DevOps tools like Jenkins, Ansible, Docker, Kubernetes, Chef, Grafana, and ELK.
- Proven ability to drive technical strategy and align it with business objectives.
- Strong leadership, communication, and stakeholder management skills.
PREFERRED QUALIFICATIONS:
- Experience in machine learning infrastructure or MLOps is a plus.
- Exposure to real-time data processing and analytics.
- Interest in data structures, algorithm analysis and design, multicore programming, and scalable architecture.
- Prior experience in a SaaS or high-growth tech company.
Profile: Big Data Engineer (System Design)
Experience: 5+ years
Location: Bangalore
Work Mode: Hybrid
About the Role
We're looking for an experienced Big Data Engineer with system design expertise to architect and build scalable data pipelines and optimize big data solutions.
Key Responsibilities
- Design, develop, and maintain data pipelines and ETL processes using Python, Hive, and Spark
- Architect scalable big data solutions with strong system design principles
- Build and optimize workflows using Apache Airflow
- Implement data modeling, integration, and warehousing solutions
- Collaborate with cross-functional teams to deliver data solutions
Must-Have Skills
- 5+ years as a Data Engineer with Python, Hive, and Spark
- Strong hands-on experience with Java
- Advanced SQL and Hadoop experience
- Expertise in Apache Airflow
- Strong understanding of data modeling, integration, and warehousing
- Experience with relational databases (PostgreSQL, MySQL)
- System design knowledge
- Excellent problem-solving and communication skills
Good to Have
- Docker and containerization experience
- Knowledge of Apache Beam, Apache Flink, or similar frameworks
- Cloud platform experience.
📢 DATA SOURCING & ANALYSIS EXPERT (L3 Support) – Mumbai 📢
Are you ready to supercharge your Data Engineering career in the financial domain?
We’re seeking a seasoned professional (5–7 years experience) to join our Mumbai team and lead in data sourcing, modelling, and analysis. If you’re passionate about solving complex challenges in Relational & Big Data ecosystems, this role is for you.
What You’ll Be Doing
- Translate business needs into robust data models, program specs, and solutions
- Perform advanced SQL optimization, query tuning, and L3-level issue resolution
- Work across the entire data stack: ETL, Python / Spark, Autosys, and related systems
- Debug, monitor, and improve data pipelines in production
- Collaborate with business, analytics, and engineering teams to deliver dependable data services
What You Should Bring
- 5+ years in financial / fintech / capital markets environment
- Proven expertise in relational databases and big data technologies
- Strong command over SQL tuning, query optimization, indexing, partitioning
- Hands-on experience with ETL pipelines, Spark / PySpark, Python scripting, job scheduling (e.g. Autosys)
- Ability to troubleshoot issues at the L3 level, root cause analysis, performance tuning
- Good communication skills — you’ll coordinate with business users, analytics, and tech teams
What You’ll Be Doing:
● Design and build parts of our data pipeline architecture for extraction, transformation, and loading of data from a wide variety of data sources using the latest Big Data technologies.
● Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
● Work with stakeholders including the Executive, Product, Data and Design teams to assist with data-related technical issues and support their data infrastructure needs.
● Work with machine learning, data, and analytics experts to drive innovation, accuracy and greater functionality in our data system.
Qualifications:
● Bachelor's degree in Engineering, Computer Science, or relevant field.
● 10+ years of relevant and recent experience in a Data Engineer role.
● 5+ years recent experience with Apache Spark and solid understanding of the fundamentals.
● Deep understanding of Big Data concepts and distributed systems.
● Strong coding skills with Scala, Python, Java and/or other languages and the ability to quickly switch between them with ease.
● Advanced working SQL knowledge and experience working with a variety of relational databases such as Postgres and/or MySQL.
● Cloud experience with Databricks
● Experience working with data stored in many formats including Delta Tables, Parquet, CSV and JSON.
● Comfortable working in a linux shell environment and writing scripts as needed.
● Comfortable working in an Agile environment
● Machine Learning knowledge is a plus.
● Must be capable of working independently and delivering stable, efficient and reliable software.
● Excellent written and verbal communication skills in English.
● Experience supporting and working with cross-functional teams in a dynamic environment.
REPORTING: This position will report to our CEO or any other Lead as assigned by Management.
EMPLOYMENT TYPE: Full-Time, Permanent
LOCATION: Remote
SHIFT TIMINGS: 2:00 PM - 11:00 PM IST
Technical Architect (Databricks)
- 10+ Years Data Engineering Experience with expertise in Databricks
- 3+ years of consulting experience
- Completed Data Engineering Professional certification & required classes
- Minimum 2-3 projects delivered with hands-on experience in Databricks
- Completed Apache Spark Programming with Databricks, Data Engineering with Databricks, Optimizing Apache Spark™ on Databricks
- Experience in Spark and/or Hadoop, Flink, Presto, other popular big data engines
- Familiarity with Databricks multi-hop pipeline architecture
Sr. Data Engineer (Databricks)
- 5+ Years Data Engineering Experience with expertise in Databricks
- Completed Data Engineering Associate certification & required classes
- Minimum 1 project delivered with hands-on experience in development on Databricks
- Completed Apache Spark Programming with Databricks, Data Engineering with Databricks, Optimizing Apache Spark™ on Databricks
- SQL delivery experience, and familiarity with BigQuery, Synapse, or Redshift
- Proficient in Python, with knowledge of additional Databricks programming languages (Scala)
We’re Hiring: Senior Data Engineer | Remote (Pan India)
Are you passionate about building scalable data pipelines and optimizing data architecture? We’re looking for an experienced Senior Data Engineer (10+ years) to join our team and play a key role in shaping next-gen data systems.
What you’ll do:
✅ Design & develop robust data pipelines (ETL) using the latest Big Data tech
✅ Optimize infrastructure & automate processes for scalability
✅ Collaborate with cross-functional teams (Product, Data, Design, ML)
✅ Work with modern tools: Apache Spark, Databricks, SQL, Python/Scala/Java
Scala is mandatory
What we’re looking for:
🔹 Strong expertise in Big Data, Spark & distributed systems
🔹 Hands-on with SQL, relational DBs (Postgres/MySQL), Linux scripting
🔹 Experience with Delta Tables, Parquet, CSV, JSON
🔹 Cloud & Databricks exposure
🔹 Bonus: Machine Learning knowledge
📍 Location: Remote (Pan India)
⏰ Shift: 2:00 pm – 11:00 pm IST
💼 Type: Full-time, Permanent
We are hiring freelancers to work on advanced Data & AI projects using Databricks. If you are passionate about cloud platforms, machine learning, data engineering, or architecture, and want to work with cutting-edge tools on real-world challenges, this is the opportunity for you!
✅ Key Details
- Work Type: Freelance / Contract
- Location: Remote
- Time Zones: IST / EST only
- Domain: Data & AI, Cloud, Big Data, Machine Learning
- Collaboration: Work with industry leaders on innovative projects
🔹 Open Roles
1. Databricks – Senior Consultant
- Skills: Data Warehousing, Python, Java, Scala, ETL, SQL, AWS, GCP, Azure
- Experience: 6+ years
2. Databricks – ML Engineer
- Skills: CI/CD, MLOps, Machine Learning, Spark, Hadoop
- Experience: 4+ years
3. Databricks – Solution Architect
- Skills: Azure, GCP, AWS, CI/CD, MLOps
- Experience: 7+ years
4. Databricks – Solution Consultant
- Skills: SQL, Spark, BigQuery, Python, Scala
- Experience: 2+ years
✅ What We Offer
- Opportunity to work with top-tier professionals and clients
- Exposure to cutting-edge technologies and real-world data challenges
- Flexible remote work environment aligned with IST / EST time zones
- Competitive compensation and growth opportunities
📌 Skills We Value
Cloud Computing | Data Warehousing | Python | Java | Scala | ETL | SQL | AWS | GCP | Azure | CI/CD | MLOps | Machine Learning | Spark
What You’ll Be Doing:
● Design and build parts of our data pipeline architecture for extraction, transformation, and loading of data from a wide variety of data sources using the latest Big Data technologies.
● Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
● Work with stakeholders including the Executive, Product, Data and Design teams to assist with data-related technical issues and support their data infrastructure needs.
● Work with machine learning, data, and analytics experts to drive innovation, accuracy and greater functionality in our data system.
Qualifications:
● Bachelor's degree in Engineering, Computer Science, or relevant field.
● 10+ years of relevant and recent experience in a Data Engineer role.
● 5+ years recent experience with Apache Spark and solid understanding of the fundamentals.
● Deep understanding of Big Data concepts and distributed systems.
● Strong coding skills with Scala, Python, Java and/or other languages and the ability to quickly switch between them with ease.
● Advanced working SQL knowledge and experience working with a variety of relational databases such as Postgres and/or MySQL.
● Cloud experience with Databricks
● Experience working with data stored in many formats including Delta Tables, Parquet, CSV and JSON.
● Comfortable working in a linux shell environment and writing scripts as needed.
● Comfortable working in an Agile environment
● Machine Learning knowledge is a plus.
● Must be capable of working independently and delivering stable, efficient and reliable software.
● Excellent written and verbal communication skills in English.
● Experience supporting and working with cross-functional teams in a dynamic environment
EMPLOYMENT TYPE: Full-Time, Permanent
LOCATION: Remote (Pan India)
SHIFT TIMINGS: 2:00 PM - 11:00 PM IST
POSITION:
Senior Data Engineer
The Senior Data Engineer will be responsible for building and extending our data pipeline architecture, as well as optimizing data flow and collection for cross functional teams. The ideal candidate is an experienced data pipeline builder and data wrangler who enjoys working with big data and building systems from the ground up.
You will collaborate with our software engineers, database architects, data analysts and data scientists to ensure our data delivery architecture is consistent throughout the platform. You must be self-directed and comfortable supporting the data needs of multiple teams, systems and products. The right candidate will be excited by the prospect of optimizing or even re-designing our company’s data architecture to support our next generation of products and data initiatives.
What You’ll Be Doing:
● Design and build parts of our data pipeline architecture for extraction, transformation, and loading of data from a wide variety of data sources using the latest Big Data technologies.
● Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
● Work with stakeholders including the Executive, Product, Data and Design teams to assist with data-related technical issues and support their data infrastructure needs.
● Work with machine learning, data, and analytics experts to drive innovation, accuracy and greater functionality in our data system.
Qualifications:
● Bachelor's degree in Engineering, Computer Science, or relevant field.
● 10+ years of relevant and recent experience in a Data Engineer role.
● 5+ years recent experience with Apache Spark and solid understanding of the fundamentals.
● Deep understanding of Big Data concepts and distributed systems.
● Strong coding skills with Scala, Python, Java and/or other languages and the ability to quickly switch between them with ease.
● Advanced working SQL knowledge and experience working with a variety of relational databases such as Postgres and/or MySQL.
● Cloud experience with Databricks
● Experience working with data stored in many formats including Delta Tables, Parquet, CSV and JSON.
● Comfortable working in a linux shell environment and writing scripts as needed.
● Comfortable working in an Agile environment
● Machine Learning knowledge is a plus.
● Must be capable of working independently and delivering stable, efficient and reliable software.
● Excellent written and verbal communication skills in English.
● Experience supporting and working with cross-functional teams in a dynamic environment.
REPORTING: This position will report to our CEO or any other Lead as assigned by Management.
EMPLOYMENT TYPE: Full-Time, Permanent
LOCATION: Remote (Pan India)
SHIFT TIMINGS: 2:00 PM - 11:00 PM IST
WHO WE ARE:
SalesIntel is the top revenue intelligence platform on the market. Our combination of automation and researchers allows us to reach 95% data accuracy for all our published contact data, while continuing to scale up our number of contacts. We currently have more than 5 million human-verified contacts, another 70 million plus machine-processed contacts, and the highest number of direct dial contacts in the industry. We guarantee our accuracy with our well-trained research team that re-verifies every direct dial number, email, and contact every 90 days. With the most comprehensive contact and company data and our excellent customer service, SalesIntel has the best B2B data available. For more information, please visit www.salesintel.io
WHAT WE OFFER: SalesIntel’s workplace is all about diversity. Different countries and cultures are represented in our workforce. We are growing at a fast pace and our work environment is constantly evolving with changing times. We motivate our team to better themselves by offering all the good stuff you’d expect like Holidays, Paid Leaves, Bonuses, Incentives, Medical Policy and company paid Training Programs.
SalesIntel is an Equal Opportunity Employer. We prohibit discrimination and harassment of any type and offer equal employment opportunities to employees and applicants without regard to race, color, religion, sex, sexual orientation, gender identity or expression, pregnancy, age, national origin, disability status, genetic information, protected veteran status, or any other characteristic protected by law.
Role & Responsibilities
Responsible for ensuring that the architecture and design of the platform remains top-notch with respect to scalability, availability, reliability and maintainability
Act as a key technical contributor as well as a hands-on contributing member of the team.
Own end-to-end availability and performance of features, driving rapid product innovation while ensuring a reliable service.
Working closely with various stakeholders like Program Managers, Product Managers, the Reliability and Continuity Engineering (RCE) team, and the QE team to estimate and execute features/tasks independently.
Maintain and drive tech backlog execution for non-functional requirements of the platform required to keep the platform resilient
Assist in release planning and prioritization based on technical feasibility and engineering constraints
A zeal to continually find new ways to improve architecture, design and ensure timely delivery and high quality.
Azure DE
Primary Responsibilities -
- Create and maintain data storage solutions including Azure SQL Database, Azure Data Lake, and Azure Blob Storage.
- Design, implement, and maintain data pipelines for data ingestion, processing, and transformation in Azure
- Create data models for analytics purposes
- Utilizing Azure Data Factory or comparable technologies, create and maintain ETL (Extract, Transform, Load) operations
- Use Azure Data Factory and Databricks to assemble large, complex data sets
- Implement data validation and cleansing procedures to ensure the quality, integrity, and dependability of the data.
- Ensure data security and compliance
- Collaborate with data engineers, and other stakeholders to understand requirements and translate them into scalable and reliable data platform architectures
Required skills:
- Blend of technical expertise, analytical problem-solving, and collaboration with cross-functional teams
- Azure DevOps
- Apache Spark, Python
- SQL proficiency
- Azure Databricks knowledge
- Big data technologies
The data engineers should be well versed in coding, Spark core, and data ingestion using Azure, and should have decent communication skills. They should also have core Azure DE skills and coding skills (PySpark, Python, and SQL).
Of the 7 open positions, 5 of the Azure Data Engineers should have a minimum of 5 years of relevant data engineering experience.
Role & Responsibilities
About the Role:
We are seeking a highly skilled Senior Data Engineer with 5-7 years of experience to join our dynamic team. The ideal candidate will have a strong background in data engineering, with expertise in data warehouse architecture, data modeling, ETL processes, and building both batch and streaming pipelines. The candidate should also possess advanced proficiency in Spark, Databricks, Kafka, Python, SQL, and Change Data Capture (CDC) methodologies.
Key responsibilities:
Design, develop, and maintain robust data warehouse solutions to support the organization's analytical and reporting needs.
Implement efficient data modeling techniques to optimize performance and scalability of data systems.
Build and manage data lakehouse infrastructure, ensuring reliability, availability, and security of data assets.
Develop and maintain ETL pipelines to ingest, transform, and load data from various sources into the data warehouse and data lakehouse.
Utilize Spark and Databricks to process large-scale datasets efficiently and in real-time.
Implement Kafka for building real-time streaming pipelines and ensure data consistency and reliability.
Design and develop batch pipelines for scheduled data processing tasks.
Collaborate with cross-functional teams to gather requirements, understand data needs, and deliver effective data solutions.
Perform data analysis and troubleshooting to identify and resolve data quality issues and performance bottlenecks.
Stay updated with the latest technologies and industry trends in data engineering and contribute to continuous improvement initiatives.
Role & Responsibilities
Lead and mentor a team of data engineers, ensuring high performance and career growth.
Architect and optimize scalable data infrastructure, ensuring high availability and reliability.
Drive the development and implementation of data governance frameworks and best practices.
Work closely with cross-functional teams to define and execute a data roadmap.
Optimize data processing workflows for performance and cost efficiency.
Ensure data security, compliance, and quality across all data platforms.
Foster a culture of innovation and technical excellence within the data team.
Job Summary:
Seeking an experienced Senior Data Engineer to lead data ingestion, transformation, and optimization initiatives using the modern Apache and Azure data stack. The role involves working on scalable pipelines, large-scale distributed systems, and data lake management.
Core Responsibilities:
· Build and manage high-volume data pipelines using Spark/Databricks.
· Implement ELT frameworks using Azure Data Factory/Synapse Pipelines.
· Optimize large-scale datasets in Delta/Iceberg formats.
· Implement robust data quality, monitoring, and governance layers.
· Collaborate with Data Scientists, Analysts, and Business stakeholders.
Technical Stack:
· Big Data: Apache Spark, Kafka, Hive, Airflow, Hudi/Iceberg
· Cloud: Azure (Synapse, ADF, ADLS Gen2), Databricks, AWS (Glue/S3)
· Languages: Python, Scala, SQL
· Storage Formats: Delta Lake, Iceberg, Parquet, ORC
· CI/CD: Azure DevOps, Terraform (infra as code), Git
Senior Data Engineer (Apache Stack + Databricks/Synapse)
Share cv to
Thirega@ vysystems dot com - WhatsApp - 91Five0033Five2Three
Dear Candidate,
We are urgently looking for a Release / Big Data Engineer for our Pune location.
Experience : 5-8 yrs
Location : Pune
Skills: Big Data Engineer, Release Engineer, DevOps, AWS/Azure/GCP cloud experience
JD:
- Oversee the end-to-end release lifecycle, from planning to post-production monitoring. Coordinate with cross-functional teams (DBA, BizOps, DevOps, DNS).
- Partner with development teams to resolve technical challenges in deployment and automation test runs
- Work with shared services DBA teams for schema-based multi-tenancy designs and smooth migrations.
- Drive automation for batch deployments and DR exercises; YAML-based microservice deployment using shell/Python/Go (a small Python sketch follows this list).
- Provide oversight for Big Data toolsets for deployment (e.g., Spark, Hive, HBase) in private cloud and public cloud CDP environments
- Ensure high-quality releases with a focus on stability and long-term performance.
- Run the automation batch scripts, debug deployment and functional issues, and work with dev leads to resolve release-cycle issues.
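As a hedged illustration of the "YAML-based microservice deployment using Python" point above (not this team's actual release tooling), a small script can patch the image tag in a Deployment manifest and hand it to kubectl; the file path, container layout, and image name are assumptions:

```python
"""Sketch: patch the image tag in a Kubernetes Deployment YAML and apply it."""
import subprocess
import sys

import yaml  # PyYAML

def set_image(manifest_path: str, new_image: str) -> str:
    """Return the manifest YAML with every container image replaced."""
    with open(manifest_path) as fh:
        doc = yaml.safe_load(fh)
    for container in doc["spec"]["template"]["spec"]["containers"]:
        container["image"] = new_image  # e.g. registry.example.com/app:1.4.2
    return yaml.safe_dump(doc)

def apply(patched_yaml: str) -> None:
    # Equivalent to: kubectl apply -f -   (manifest piped on stdin)
    subprocess.run(["kubectl", "apply", "-f", "-"], input=patched_yaml.encode(), check=True)

if __name__ == "__main__":
    manifest, image = sys.argv[1], sys.argv[2]  # e.g. deploy/app.yaml registry.example.com/app:1.4.2
    apply(set_image(manifest, image))
```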
Regards,
Minakshi Soni
Executive- Talent Acquisition
Rigel Networks
Position : Software Engineer (Java Backend Engineer)
Experience : 4+ Years
📍 Location : Bangalore, India (Hybrid)
Mandatory Skills : Java 8+ (Advanced Features), Spring Boot, Apache Spark (Spark Streaming), SQL & Cosmos DB, Git, Maven, CI/CD (Jenkins, GitHub), Azure Cloud, Agile Scrum.
About the Role :
We are seeking a highly skilled Backend Engineer with expertise in Java, Spark, and microservices architecture to join our dynamic team. The ideal candidate will have a strong background in object-oriented programming, experience with Spark Streaming, and a deep understanding of distributed systems and cloud technologies.
Key Responsibilities :
- Design, develop, and maintain highly scalable microservices and optimized RESTful APIs using Spring Boot and Java 8+.
- Implement and optimize Spark Streaming applications for real-time data processing.
- Utilize advanced Java 8 features, including:
- Functional interfaces & Lambda expressions
- Streams and Parallel Streams
- Completable Futures & Concurrency API improvements
- Enhanced Collections APIs
- Work with relational (SQL) and NoSQL (Cosmos DB) databases, ensuring efficient data modeling and retrieval.
- Develop and manage CI/CD pipelines using Jenkins, GitHub, and related automation tools.
- Collaborate with cross-functional teams, including Product, Business, and Automation, to deliver end-to-end product features.
- Ensure adherence to Agile Scrum practices and participate in code reviews to maintain high-quality standards.
- Deploy and manage applications in Azure Cloud environments.
Minimum Qualifications:
- BS/MS in Computer Science or a related field.
- 4+ Years of experience developing backend applications with Spring Boot and Java 8+.
- 3+ Years of hands-on experience with Git for version control.
- Strong understanding of software design patterns and distributed computing principles.
- Experience with Maven for building and deploying artifacts.
- Proven ability to work in Agile Scrum environments with a collaborative team mindset.
- Prior experience with Azure Cloud Technologies.
Responsibilities:
• Build customer-facing solutions for the Data Observability product to monitor data pipelines
• Work on POCs to build new data pipeline monitoring capabilities.
• Building next-generation scalable, reliable, flexible, high-performance data pipeline capabilities for ingestion of data from multiple sources containing complex dataset.
• Continuously improve services you own, making them more performant, and utilising resources in the most optimised way.
• Collaborate closely with engineering, data science team and product team to propose an optimal solution for a given problem statement
• Working closely with DevOps team on performance monitoring and MLOps
Required Skills:
• 3+ Years of Data related technology experience.
• Good understanding of distributed computing principles
• Experience in Apache Spark
• Hands on programming with Python
• Knowledge of Hadoop v2, MapReduce, HDFS
• Experience with building stream-processing systems, using technologies such as Apache Storm, Spark-Streaming or Flink
• Experience with messaging systems, such as Kafka or RabbitMQ
• Good understanding of Big Data querying tools, such as Hive
• Experience with integration of data from multiple data sources
• Good understanding of SQL queries, joins, stored procedures, relational schemas
• Experience with NoSQL databases, such as HBase, Cassandra/Scylla, MongoDB
• Knowledge of ETL techniques and frameworks
• Performance tuning of Spark jobs (a short tuning sketch follows this list)
• General understanding of Data Quality is a plus point
• Experience with Databricks, Snowflake, and BigQuery or similar lakehouses would be a big plus
• Nice to have some knowledge in DevOps
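As a hedged illustration of the Spark performance-tuning skill above (the values are assumptions, not recommendations for any specific workload), a PySpark session can expose the knobs most often adjusted while profiling slow jobs:

```python
"""Sketch: a PySpark session with common tuning settings, then a plan inspection."""
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("tuning-sketch")
    .config("spark.sql.adaptive.enabled", "true")               # let AQE coalesce shuffle partitions
    .config("spark.sql.shuffle.partitions", "400")              # sized to data volume, not the default 200
    .config("spark.sql.autoBroadcastJoinThreshold", str(64 * 1024 * 1024))  # broadcast small dims (64 MB)
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)

# Inspect the physical plan before/after changes to confirm the effect of each setting.
df = spark.range(0, 1_000_000).withColumnRenamed("id", "key")
df.groupBy((df.key % 100).alias("bucket")).count().explain(mode="formatted")
```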
We are actively seeking a self-motivated Data Engineer with expertise in Azure cloud and Databricks, with a thorough understanding of Delta Lake and Lake-house Architecture. The ideal candidate should excel in developing scalable data solutions, crafting platform tools, and integrating systems, while demonstrating proficiency in cloud-native database solutions and distributed data processing.
Key Responsibilities:
- Contribute to the development and upkeep of a scalable data platform, incorporating tools and frameworks that leverage Azure and Databricks capabilities.
- Exhibit proficiency in various RDBMS databases such as MySQL and SQL-Server, emphasizing their integration in applications and pipeline development.
- Design and maintain high-caliber code, including data pipelines and applications, utilizing Python, Scala, and PHP.
- Implement effective data processing solutions via Apache Spark, optimizing Spark applications for large-scale data handling.
- Optimize data storage using formats like Parquet and Delta Lake to ensure efficient data accessibility and reliable performance (a short sketch follows this list).
- Demonstrate understanding of Hive Metastore, Unity Catalog Metastore, and the operational dynamics of external tables.
- Collaborate with diverse teams to convert business requirements into precise technical specifications.
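A minimal sketch of the Parquet/Delta optimization bullet above, assuming a Databricks (or delta-enabled) Spark session; the paths and columns are placeholders, and the OPTIMIZE/ZORDER step uses Databricks SQL syntax:

```python
"""Sketch: land raw JSON as a date-partitioned Delta table, then compact it."""
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("delta-sketch").getOrCreate()

raw = (
    spark.read.format("json")
    .load("/mnt/raw/orders/")                      # placeholder landing path
    .withColumn("order_date", F.to_date("order_ts"))  # assumes an order_ts column exists
)

# Partition by date and write as Delta so readers get file skipping plus ACID history.
(
    raw.write.format("delta")
    .mode("overwrite")
    .partitionBy("order_date")
    .save("/mnt/curated/orders_delta")             # placeholder curated path
)

# Compact the small files produced by incremental loads (Databricks OPTIMIZE/ZORDER).
spark.sql("OPTIMIZE delta.`/mnt/curated/orders_delta` ZORDER BY (customer_id)")
```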
Requirements:
- Bachelor’s degree in Computer Science, Engineering, or a related discipline.
- Demonstrated hands-on experience with Azure cloud services and Databricks.
- Proficient programming skills in Python, Scala, and PHP.
- In-depth knowledge of SQL, NoSQL databases, and data warehousing principles.
- Familiarity with distributed data processing and external table management.
- Insight into enterprise data solutions for PIM, CDP, MDM, and ERP applications.
- Exceptional problem-solving acumen and meticulous attention to detail.
Additional Qualifications :
- Acquaintance with data security and privacy standards.
- Experience in CI/CD pipelines and version control systems, notably Git.
- Familiarity with Agile methodologies and DevOps practices.
- Competence in technical writing for comprehensive documentation.
Publicis Sapient Overview:
As Senior Associate L1 in Data Engineering, you will translate client requirements into technical design and implement components for data engineering solutions. Utilize a deep understanding of data integration and big data design principles in creating custom solutions or implementing package solutions. You will independently drive design discussions to ensure the necessary health of the overall solution.
Job Summary:
As Senior Associate L2 in Data Engineering, you will translate client requirements into technical design and implement components for data engineering solutions. Utilize a deep understanding of data integration and big data design principles in creating custom solutions or implementing package solutions. You will independently drive design discussions to ensure the necessary health of the overall solution.
The role requires a hands-on technologist with a strong programming background in Java / Scala / Python, experience in data ingestion, integration, and wrangling, computation and analytics pipelines, and exposure to Hadoop ecosystem components. Hands-on knowledge of at least one of the AWS, GCP, or Azure cloud platforms is also required.
Role & Responsibilities:
Your role is focused on Design, Development and delivery of solutions involving:
• Data Integration, Processing & Governance
• Data Storage and Computation Frameworks, Performance Optimizations
• Analytics & Visualizations
• Infrastructure & Cloud Computing
• Data Management Platforms
• Implement scalable architectural models for data processing and storage
• Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time mode (see the streaming sketch after this list)
• Build functionality for data analytics, search and aggregation
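As an illustration of the batch and real-time ingestion described above, the following is a minimal sketch using Spark Structured Streaming. It assumes the spark-sql-kafka connector is available on the classpath; the broker, topic, bucket and checkpoint paths are placeholders.

```python
# Minimal sketch of batch + real-time ingestion with Spark.
# Assumes spark-sql-kafka connector; all locations are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("ingestion-sketch").getOrCreate()

# Batch path: one-off load from a heterogeneous source (CSV here).
batch_df = spark.read.option("header", True).csv("s3a://raw-zone/customers/")
batch_df.write.mode("overwrite").parquet("s3a://lake/customers/")

# Real-time path: continuous ingestion from Kafka into the same lake.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "clickstream")
          .load()
          .select(F.col("value").cast("string").alias("payload"),
                  F.col("timestamp")))

query = (events.writeStream
         .format("parquet")
         .option("path", "s3a://lake/clickstream/")
         .option("checkpointLocation", "s3a://lake/_checkpoints/clickstream/")
         .trigger(processingTime="1 minute")
         .start())
```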
Experience Guidelines:
Mandatory Experience and Competencies:
# Competency
1. Overall 5+ years of IT experience, with 3+ years in data-related technologies
2. Minimum 2.5 years of experience in Big Data technologies and working exposure to at least one cloud platform and its related data services (AWS / Azure / GCP)
3. Hands-on experience with the Hadoop stack – HDFS, Sqoop, Kafka, Pulsar, NiFi, Spark, Spark Streaming, Flink, Storm, Hive, Oozie, Airflow and other components required to build end-to-end data pipelines
4. Strong experience in at least one of the programming languages Java, Scala, or Python; Java preferable
5. Hands-on working knowledge of NoSQL and MPP data platforms like HBase, MongoDB, Cassandra, AWS Redshift, Azure SQL DW, GCP BigQuery, etc.
6. Well-versed, working knowledge of data platform related services on at least one cloud platform, IAM, and data security
Preferred Experience and Knowledge (Good to Have):
# Competency
1. Good knowledge of traditional ETL tools (Informatica, Talend, etc.) and database technologies (Oracle, MySQL, SQL Server, Postgres) with hands-on experience
2. Knowledge of data governance processes (security, lineage, catalog) and tools like Collibra, Alation, etc.
3. Knowledge of distributed messaging frameworks like ActiveMQ / RabbitMQ / Solace, search & indexing, and microservices architectures
4. Performance tuning and optimization of data pipelines
5. CI/CD – infra provisioning on cloud, automated build & deployment pipelines, code quality
6. Cloud data specialty and other related Big Data technology certifications
Personal Attributes:
• Strong written and verbal communication skills
• Articulation skills
• Good team player
• Self-starter who requires minimal oversight
• Ability to prioritize and manage multiple tasks
• Process orientation and the ability to define and set up processes
Data Engineering : Senior Engineer / Manager
As a Senior Engineer / Manager in Data Engineering, you will translate client requirements into technical designs and implement components for data engineering solutions. Utilize a deep understanding of data integration and big data design principles in creating custom solutions or implementing packaged solutions. You will independently drive design discussions to ensure the necessary health of the overall solution.
Must Have skills :
1. GCP
2. Spark Streaming: live data streaming experience is desired.
3. Any one coding language: Java / Python / Scala
Skills & Experience :
- Overall experience of minimum 5+ years, with a minimum of 4 years of relevant experience in Big Data technologies
- Hands-on experience with the Hadoop stack – HDFS, Sqoop, Kafka, Pulsar, NiFi, Spark, Spark Streaming, Flink, Storm, Hive, Oozie, Airflow and other components required to build end-to-end data pipelines. Working knowledge of real-time data pipelines is an added advantage.
- Strong experience in at least one of the programming languages Java, Scala, or Python; Java preferable
- Hands-on working knowledge of NoSQL and MPP data platforms like HBase, MongoDB, Cassandra, AWS Redshift, Azure SQL DW, GCP BigQuery, etc.
- Well-versed, working knowledge of data platform related services on GCP (see the sketch after this list)
- Bachelor's degree and 6 to 12 years of work experience, or any combination of education, training and/or experience that demonstrates the ability to perform the duties of the position
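To make the GCP data-platform requirement concrete, here is a hedged sketch of a Spark batch job (for example on Dataproc) reading from and writing back to BigQuery. It assumes the spark-bigquery connector is installed on the cluster; the project, dataset and staging bucket names are invented for illustration.

```python
# Minimal sketch: Spark on GCP reading/writing BigQuery via the spark-bigquery connector.
# Project, dataset and bucket names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("gcp-bq-sketch").getOrCreate()

trips = (spark.read.format("bigquery")
         .option("table", "my-project.analytics.trips")
         .load())

daily = (trips.groupBy(F.to_date("pickup_ts").alias("day"))
              .agg(F.count("*").alias("trip_count")))

(daily.write.format("bigquery")
      .option("table", "my-project.analytics.daily_trip_counts")
      .option("temporaryGcsBucket", "my-staging-bucket")   # staging bucket required for writes
      .mode("overwrite")
      .save())
```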
Your Impact :
- Data Ingestion, Integration and Transformation
- Data Storage and Computation Frameworks, Performance Optimizations
- Analytics & Visualizations
- Infrastructure & Cloud Computing
- Data Management Platforms
- Build functionality for data ingestion from multiple heterogeneous sources in batch & real-time
- Build functionality for data analytics, search and aggregation
🚀 Exciting Opportunity: Data Engineer Position in Gurugram 🌐
Hello
We are actively seeking a talented and experienced Data Engineer to join our dynamic team at Reality Motivational Venture in Gurugram (Gurgaon). If you're passionate about data, thrive in a collaborative environment, and possess the skills we're looking for, we want to hear from you!
Position: Data Engineer
Location: Gurugram (Gurgaon)
Experience: 5+ years
Key Skills:
- Python
- Spark, Pyspark
- Data Governance
- Cloud (AWS/Azure/GCP)
Main Responsibilities:
- Define and set up analytics environments for "Big Data" applications in collaboration with domain experts.
- Implement ETL processes for telemetry-based and stationary test data (a brief sketch follows this list).
- Support in defining data governance, including data lifecycle management.
- Develop large-scale data processing engines and real-time search and analytics based on time series data.
- Ensure technical, methodological, and quality aspects.
- Support CI/CD processes.
- Foster know-how development and transfer, continuous improvement of leading technologies within Data Engineering.
- Collaborate with solution architects on the development of complex on-premise, hybrid, and cloud solution architectures.
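As a rough illustration of the telemetry ETL mentioned above, the sketch below parses raw events and aggregates sensor readings into fixed time windows with Spark. The paths, column names and 5-minute window size are assumptions, not project specifics.

```python
# Minimal sketch: windowed aggregation of telemetry time-series data.
# Paths and columns are illustrative placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("telemetry-etl-sketch").getOrCreate()

telemetry = (spark.read.json("s3a://raw/telemetry/")            # or an ADLS/GCS path
             .withColumn("event_time", F.to_timestamp("event_time")))

windowed = (telemetry
            .groupBy("vehicle_id", F.window("event_time", "5 minutes"))
            .agg(F.avg("speed_kmh").alias("avg_speed"),
                 F.max("engine_temp_c").alias("max_engine_temp")))

windowed.write.mode("append").parquet("s3a://curated/telemetry_5min/")
```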
Qualification Requirements:
- BSc, MSc, MEng, or PhD in Computer Science, Informatics/Telematics, Mathematics/Statistics, or a comparable engineering degree.
- Proficiency in Python and the PyData stack (Pandas/Numpy).
- Experience in high-level programming languages (C#/C++/Java).
- Familiarity with scalable processing environments like Dask (or Spark).
- Proficient in Linux and scripting languages (Bash Scripts).
- Experience in containerization and orchestration of containerized services (Kubernetes).
- Education in database technologies (SQL/OLAP and NoSQL).
- Interest in Big Data storage technologies (Elastic, ClickHouse).
- Familiarity with Cloud technologies (Azure, AWS, GCP).
- Fluent English communication skills (speaking and writing).
- Ability to work constructively with a global team.
- Willingness to travel for business trips during development projects.
Preferable:
- Working knowledge of vehicle architectures, communication, and components.
- Experience in additional programming languages (C#/C++/Java, R, Scala, MATLAB).
- Experience in time-series processing.
How to Apply:
Interested candidates, please share your updated CV/resume with me.
Thank you for considering this exciting opportunity.
About Us:
6sense is a Predictive Intelligence Engine that is reimagining how B2B companies do
sales and marketing. It works with big data at scale, advanced machine learning and
predictive modelling to find buyers and predict what they will purchase, when and
how much.
6sense helps B2B marketing and sales organizations fully understand the complex ABM
buyer journey. By combining intent signals from every channel with the industry’s most
advanced AI predictive capabilities, it is finally possible to predict account demand and
optimize demand generation in an ABM world. Equipped with the power of AI and the
6sense Demand PlatformTM, marketing and sales professionals can uncover, prioritize,
and engage buyers to drive more revenue.
6sense is seeking a Staff Software Engineer (Data) to become part of a team designing, developing, and deploying its customer-centric applications.
We’ve more than doubled our revenue in the past five years and completed our Series
E funding of $200M last year, giving us a stable foundation for growth.
Responsibilities:
1. Own critical datasets and data pipelines for product & business, and work
towards direct business goals of increased data coverage, data match rates, data
quality, data freshness
2. Create more value from various datasets with creative solutions, unlock more value from existing data, and help build a data moat for the company
3. Design, develop, test, deploy and maintain optimal data pipelines, and assemble large, complex data sets that meet functional and non-functional business requirements
4. Improve our current data pipelines, i.e. improve their performance and SLAs, remove redundancies, and establish a way to test before vs. after rollout
5. Identify, design, and implement process improvements in data flow across multiple stages, in collaboration with multiple cross-functional teams, e.g. automating manual processes, optimising data delivery, hand-off processes, etc.
6. Work with cross-functional stakeholders including the Product, Data Analytics, and Customer Support teams to enable data access and related goals
7. Build for security, privacy, scalability, reliability and compliance
8. Mentor and coach other team members on scalable and extensible solutions
design, and best coding standards
9. Help build a team and cultivate innovation by driving cross-collaboration and
execution of projects across multiple teams
Requirements:
8-10+ years of overall work experience as a Data Engineer
Excellent analytical and problem-solving skills
Strong experience with Big Data technologies like Apache Spark. Experience with Hadoop, Hive, and Presto would be a plus
Strong experience in writing complex, optimized SQL queries across large data
sets. Experience with optimizing queries and underlying storage
Experience with Python/ Scala
Experience with Apache Airflow or other orchestration tools (a minimal DAG sketch follows this list)
Experience with writing Hive / Presto UDFs in Java
Experience working on AWS cloud platform and services.
Experience with Key Value stores or NoSQL databases would be a plus.
Comfortable with Unix / Linux command line
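For the orchestration requirement above, here is a minimal, hedged Airflow DAG sketch that submits a nightly Spark job followed by a quality check. The DAG id, commands, paths and schedule are placeholders, and the operator choice (BashOperator vs. a Spark provider operator) would depend on the actual deployment.

```python
# Minimal sketch of an Airflow 2.x DAG orchestrating a nightly Spark job.
# Commands, paths and schedule are placeholders.
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="nightly_events_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    transform = BashOperator(
        task_id="spark_transform",
        bash_command=(
            "spark-submit --master yarn "
            "s3://jobs/transform_events.py --date {{ ds }}"   # {{ ds }} = logical run date
        ),
    )

    quality_check = BashOperator(
        task_id="row_count_check",
        bash_command="python /opt/checks/row_count.py --date {{ ds }}",
    )

    transform >> quality_check   # run the check only after the transform succeeds
```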
Interpersonal Skills:
You can work independently as well as part of a team.
You take ownership of projects and drive them to conclusion.
You’re a good communicator and are capable of not just doing the work, but also
teaching others and explaining the “why” behind complicated technical
decisions.
You aren’t afraid to roll up your sleeves: This role will evolve over time, and we’ll
want you to evolve with it
Lead Data Engineer
Data Engineers develop modern data architecture approaches to meet key business objectives and provide end-to-end data solutions. You might spend a few weeks with a new client on a deep technical review or a complete organizational review, helping them to understand the potential that data brings to solve their most pressing problems. On other projects, you might be acting as the architect, leading the design of technical solutions, or perhaps overseeing a program inception to build a new product. It could also be a software delivery project where you're equally happy coding and tech-leading the team to implement the solution.
Job responsibilities
· You might spend a few weeks with a new client on a deep technical review or a complete organizational review, helping them to understand the potential that data brings to solve their most pressing problems
· You will partner with teammates to create complex data processing pipelines in order to solve our clients' most ambitious challenges
· You will collaborate with Data Scientists in order to design scalable implementations of their models
· You will pair to write clean and iterative code based on TDD
· Leverage various continuous delivery practices to deploy, support and operate data pipelines
· Advise and educate clients on how to use different distributed storage and computing technologies from the plethora of options available
· Develop and operate modern data architecture approaches to meet key business objectives and provide end-to-end data solutions
· Create data models and speak to the tradeoffs of different modeling approaches
· On other projects, you might be acting as the architect, leading the design of technical solutions, or perhaps overseeing a program inception to build a new product
· Seamlessly incorporate data quality into your day-to-day work as well as into the delivery process
· Assure effective collaboration between Thoughtworks' and the client's teams, encouraging open communication and advocating for shared outcomes
Job qualifications
Technical skills
· You are equally happy coding and leading a team to implement a solution
· You have a track record of innovation and expertise in Data Engineering
· You're passionate about craftsmanship and have applied your expertise across a range of industries and organizations
· You have a deep understanding of data modelling and experience with data engineering tools and platforms such as Kafka, Spark, and Hadoop
· You have built large-scale data pipelines and data-centric applications using any of the distributed storage platforms such as HDFS, S3, NoSQL databases (Hbase, Cassandra, etc.) and any of the distributed processing platforms like Hadoop, Spark, Hive, Oozie, and Airflow in a production setting
· Hands-on experience with MapR, Cloudera, Hortonworks and/or cloud-based Hadoop distributions (AWS EMR, Azure HDInsight, Qubole, etc.)
· You are comfortable taking data-driven approaches and applying data security strategy to solve business problems
· You're genuinely excited about data infrastructure and operations with a familiarity working in cloud environments
· Working with data excites you: you have created Big data architecture, you can build and operate data pipelines, and maintain data storage, all within distributed systems
Professional skills
· Advocate your data engineering expertise to the broader tech community outside of Thoughtworks, speaking at conferences and acting as a mentor for more junior-level data engineers
· You're resilient and flexible in ambiguous situations and enjoy solving problems from technical and business perspectives
· An interest in coaching others, sharing your experience and knowledge with teammates
· You enjoy influencing others and always advocate for technical excellence while being open to change when needed
TOP 3 SKILLS
Python (Language)
Spark Framework
Spark Streaming
Docker / Jenkins / Spinnaker
AWS
Hive Queries
The candidate should be a good coder.
Preferred: Airflow
Must-have experience:
Python
Spark framework and streaming
Exposure to the machine learning lifecycle is mandatory.
Project:
This is a search-domain project. For any search activity happening on the website, this team creates the model – a sorting/scoring model for each search – which is built by the data scientists. The team works mostly on the streaming side of data, so the candidate would work extensively on Spark Streaming, and there will be a lot of work in Machine Learning.
INTERVIEW INFORMATION
3-4 rounds.
1st round: data engineering batch-processing experience.
2nd round: data engineering streaming experience.
3rd round: ML lifecycle (this can be a techno-functional round based on earlier feedback); otherwise a 4th, functional round will be held if required.
Roles and Responsibilities:
- Design, develop, and maintain the end-to-end MLOps infrastructure from the ground up, leveraging open-source systems across the entire MLOps landscape.
- Create pipelines for data ingestion and data transformation; build, test, and deploy machine learning models; and monitor and maintain the performance of these models in production (a brief tracking sketch follows this list).
- Managing the MLOps stack, including version control systems, continuous integration and deployment tools, containerization, orchestration, and monitoring systems.
- Ensure that the MLOps stack is scalable, reliable, and secure.
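As a small, hedged example of the model lifecycle work described above, the sketch below logs parameters, metrics and a model artifact with MLflow (one of the E2E MLOps tools listed under Primary Skills). The experiment name, model and data are toy placeholders; ClearML or Kubeflow would fill the same role with different APIs.

```python
# Minimal sketch of experiment tracking with MLflow on a toy regression model.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=500, n_features=10, noise=0.2, random_state=42)

mlflow.set_experiment("demand-forecast-sketch")   # hypothetical experiment name
with mlflow.start_run():
    model = Ridge(alpha=0.5).fit(X, y)
    mse = mean_squared_error(y, model.predict(X))
    mlflow.log_param("alpha", 0.5)
    mlflow.log_metric("train_mse", mse)
    mlflow.sklearn.log_model(model, "model")      # model artifact stored with the run
```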
Skills Required:
- 3-6 years of MLOps experience
- Preferably worked in the startup ecosystem
Primary Skills:
- Experience with E2E MLOps systems like ClearML, Kubeflow, MLFlow etc.
- Technical expertise in MLOps: Should have a deep understanding of the MLOps landscape and be able to leverage open-source systems to build scalable, reliable, and secure MLOps infrastructure.
- Programming skills: Proficient in at least one programming language, such as Python, and have experience with data science libraries, such as TensorFlow, PyTorch, or Scikit-learn.
- DevOps experience: Should have experience with DevOps tools and practices, such as Git, Docker, Kubernetes, and Jenkins.
Secondary Skills:
- Version Control Systems (VCS) tools like Git and Subversion
- Containerization technologies like Docker and Kubernetes
- Cloud Platforms like AWS, Azure, and Google Cloud Platform
- Data Preparation and Management tools like Apache Spark, Apache Hadoop, and SQL databases like PostgreSQL and MySQL
- Machine Learning Frameworks like TensorFlow, PyTorch, and Scikit-learn
- Monitoring and Logging tools like Prometheus, Grafana, and Elasticsearch
- Continuous Integration and Continuous Deployment (CI/CD) tools like Jenkins, GitLab CI, and CircleCI
- Explainability and interpretability tools like LIME and SHAP
- Creating and managing ETL/ELT pipelines based on requirements
- Build Power BI dashboards and manage the datasets needed.
- Work with stakeholders to identify data structures needed for the future and perform any transformations, including aggregations.
- Build data cubes for real-time visualisation needs and CXO dashboards (a brief sketch follows the skills list below).
Required Tech Skills
- Microsoft PowerBI & DAX
- Python, Pandas, PyArrow, Jupyter Notebooks, Apache Spark
- Azure Synapse, Azure Databricks, Azure HDInsight, Azure Data Factory
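For illustration only, here is a hedged PySpark sketch of pre-aggregating a "data cube" that a Power BI or CXO dashboard could read from a lakehouse table. The storage path, dimensions, measures and target table name are assumptions.

```python
# Minimal sketch: build a pre-aggregated cube table for dashboard consumption.
# Path, columns and target table are placeholders; assumes the "gold" database exists.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("cube-sketch").getOrCreate()

sales = spark.read.parquet("abfss://lake@account.dfs.core.windows.net/sales/")

cube = (sales.cube("region", "product_category", "channel")   # all grouping-set combinations
             .agg(F.sum("revenue").alias("total_revenue"),
                  F.countDistinct("order_id").alias("orders")))

cube.write.mode("overwrite").saveAsTable("gold.sales_cube")
```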
Have you streamed a program on Disney+, watched your favorite binge-worthy series on Peacock or cheered your favorite team on during the World Cup from one of the 20 top streaming platforms around the globe? If the answer is yes, you’ve already benefitted from Conviva technology, helping the world’s leading streaming publishers deliver exceptional streaming experiences and grow their businesses.
Conviva is the only global streaming analytics platform for big data that collects, standardizes, and puts trillions of cross-screen, streaming data points in context, in real time. The Conviva platform provides comprehensive, continuous, census-level measurement through real-time, server side sessionization at unprecedented scale. If this sounds important, it is! We measure a global footprint of more than 500 million unique viewers in 180 countries watching 220 billion streams per year across 3 billion applications streaming on devices. With Conviva, customers get a unique level of actionability and scale from continuous streaming measurement insights and benchmarking across every stream, every screen, every second.
What you get to do in this role:
Work on extremely high-scale Rust web services or backend systems.
Design and develop solutions for highly scalable web and backend systems.
Proactively identify and solve performance issues.
Maintain a high bar on code quality and unit testing.
What you bring to the role:
5+ years of hands-on software development experience.
At least 2 years of Rust development experience.
Knowledge of Cargo crates for Kafka, Redis, etc.
Strong CS fundamentals, including system design, data structures and algorithms.
Expertise in backend and web services development.
Good analytical and troubleshooting skills.
What will help you stand out:
Experience working with large scale web services and applications.
Exposure to Golang, Scala or Java
Exposure to Big data systems like Kafka, Spark, Hadoop etc.
Underpinning the Conviva platform is a rich history of innovation. More than 60 patents represent award-winning technologies and standards, including first-of-its kind-innovations like time-state analytics and AI-automated data modeling, that surfaces actionable insights. By understanding real-world human experiences and having the ability to act within seconds of observation, our customers can solve business-critical issues and focus on growing their business ahead of the competition. Examples of the brands Conviva has helped fuel streaming growth for include: DAZN, Disney+, HBO, Hulu, NBCUniversal, Paramount+, Peacock, Sky, Sling TV, Univision and Warner Bros Discovery.
Privately held, Conviva is headquartered in Silicon Valley, California with offices and people around the globe. For more information, visit us at www.conviva.com. Join us to help extend our leadership position in big data streaming analytics to new audiences and markets!
Have you streamed a program on Disney+, watched your favorite binge-worthy series on Peacock or cheered your favorite team on during the World Cup from one of the 20 top streaming platforms around the globe? If the answer is yes, you’ve already benefitted from Conviva technology, helping the world’s leading streaming publishers deliver exceptional streaming experiences and grow their businesses.
Conviva is the only global streaming analytics platform for big data that collects, standardizes, and puts trillions of cross-screen, streaming data points in context, in real time. The Conviva platform provides comprehensive, continuous, census-level measurement through real-time, server side sessionization at unprecedented scale. If this sounds important, it is! We measure a global footprint of more than 500 million unique viewers in 180 countries watching 220 billion streams per year across 3 billion applications streaming on devices. With Conviva, customers get a unique level of actionability and scale from continuous streaming measurement insights and benchmarking across every stream, every screen, every second.
As Conviva is expanding, we are building products providing deep insights into end user experience for our customers.
Platform and TLB Team
The vision for the TLB team is to build data processing software that works on terabytes of streaming data in real time. Engineer the next-gen Spark-like system for in-memory computation of large time-series datasets – both the Spark-like backend infrastructure and a library-based programming model. Build horizontally and vertically scalable systems that analyse trillions of events per day within sub-second latencies. Utilize the latest and greatest big data technologies to build solutions for use cases across multiple verticals. Lead technology innovation and advancement that will have big business impact for years to come. Be part of a worldwide team building software using the latest technologies and the best software development tools and processes.
What You’ll Do
This is an individual contributor position. Expectations will be on the below lines:
- Design, build and maintain the stream processing, and time-series analysis system which is at the heart of Conviva's products
- Responsible for the architecture of the Conviva platform
- Build features, enhancements, new services, and bug fixing in Scala and Java on a Jenkins-based pipeline to be deployed as Docker containers on Kubernetes
- Own the entire lifecycle of your microservice including early specs, design, technology choice, development, unit-testing, integration-testing, documentation, deployment, troubleshooting, enhancements etc.
- Lead a team to develop a feature or parts of the product
- Adhere to the Agile model of software development to plan, estimate, and ship per business priority
What you need to succeed
- 9+ years of work experience in software development of data processing products.
- Engineering degree in software or equivalent from a premier institute.
- Excellent knowledge of fundamentals of Computer Science like algorithms and data structures. Hands-on with functional programming and know-how of its concepts
- Excellent programming and debugging skills on the JVM. Proficient in writing code in Scala/Java/Rust/Haskell/Erlang that is reliable, maintainable, secure, and performant
- Experience with big data technologies like Spark, Flink, Kafka, Druid, HDFS, etc.
- Deep understanding of distributed systems concepts and scalability challenges including multi-threading, concurrency, sharding, partitioning, etc.
- Experience/knowledge of Akka/Lagom framework and/or stream processing technologies like RxJava or Project Reactor will be a big plus. Knowledge of design patterns like event-streaming, CQRS and DDD to build large microservice architectures will be a big plus
- Excellent communication skills. Willingness to work under pressure. Hunger to learn and succeed. Comfortable with ambiguity. Comfortable with complexity
Underpinning the Conviva platform is a rich history of innovation. More than 60 patents represent award-winning technologies and standards, including first-of-its kind-innovations like time-state analytics and AI-automated data modeling, that surfaces actionable insights. By understanding real-world human experiences and having the ability to act within seconds of observation, our customers can solve business-critical issues and focus on growing their businesses ahead of the competition. Examples of the brands Conviva has helped fuel streaming growth for include DAZN, Disney+, HBO, Hulu, NBCUniversal, Paramount+, Peacock, Sky, Sling TV, Univision, and Warner Bros Discovery.
Privately held, Conviva is headquartered in Silicon Valley, California with offices and people around the globe. For more information, visit us at www.conviva.com. Join us to help extend our leadership position in big data streaming analytics to new audiences and markets!
Job Title: Data Engineer
Job Summary: As a Data Engineer, you will be responsible for designing, building, and maintaining the infrastructure and tools necessary for data collection, storage, processing, and analysis. You will work closely with data scientists and analysts to ensure that data is available, accessible, and in a format that can be easily consumed for business insights.
Responsibilities:
- Design, build, and maintain data pipelines to collect, store, and process data from various sources.
- Create and manage data warehousing and data lake solutions.
- Develop and maintain data processing and data integration tools.
- Collaborate with data scientists and analysts to design and implement data models and algorithms for data analysis.
- Optimize and scale existing data infrastructure to ensure it meets the needs of the business.
- Ensure data quality and integrity across all data sources (a brief validation sketch follows this list).
- Develop and implement best practices for data governance, security, and privacy.
- Monitor data pipeline performance and errors, and troubleshoot issues as needed.
- Stay up-to-date with emerging data technologies and best practices.
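As a small, hedged example of the data-quality and monitoring responsibilities above, the sketch below gates a load on null-key and row-count checks in PySpark. The paths, column names and thresholds are invented for illustration; a real pipeline would read the previous count from a metadata store.

```python
# Minimal sketch: a data-quality gate before promoting data to the warehouse zone.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-sketch").getOrCreate()

incoming = spark.read.parquet("s3a://staging/customers/")
previous_count = 1_000_000                      # placeholder; normally from a metadata store

null_keys = incoming.filter(F.col("customer_id").isNull()).count()
row_count = incoming.count()

if null_keys > 0:
    raise ValueError(f"{null_keys} rows have a NULL customer_id")
if row_count < 0.9 * previous_count:
    raise ValueError(f"Row count {row_count} dropped more than 10% vs the previous load")

incoming.write.mode("overwrite").parquet("s3a://warehouse/customers/")
```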
Requirements:
Bachelor's degree in Computer Science, Information Systems, or a related field.
Experience with ETL tools like Matillion, SSIS, Informatica
Experience with SQL and relational databases such as SQL Server, MySQL, PostgreSQL, or Oracle.
Experience in writing complex SQL queries
Strong programming skills in languages such as Python, Java, or Scala.
Experience with data modeling, data warehousing, and data integration.
Strong problem-solving skills and ability to work independently.
Excellent communication and collaboration skills.
Familiarity with big data technologies such as Hadoop, Spark, or Kafka.
Familiarity with data warehouse/Data lake technologies like Snowflake or Databricks
Familiarity with cloud computing platforms such as AWS, Azure, or GCP.
Familiarity with Reporting tools
Teamwork/ growth contribution
- Helping the team in taking the Interviews and identifying right candidates
- Adhering to timelines
- On-time status communication and upfront communication of any risks
- Teach, train, and share knowledge with peers.
- Good Communication skills
- Proven abilities to take initiative and be innovative
- Analytical mind with a problem-solving aptitude
Good to have :
Master's degree in Computer Science, Information Systems, or a related field.
Experience with NoSQL databases such as MongoDB or Cassandra.
Familiarity with data visualization and business intelligence tools such as Tableau or Power BI.
Knowledge of machine learning and statistical modeling techniques.
If you are passionate about data and want to work with a dynamic team of data scientists and analysts, we encourage you to apply for this position.
About Kloud9:
Kloud9 exists with the sole purpose of providing cloud expertise to the retail industry. Our team of cloud architects, engineers and developers help retailers launch a successful cloud initiative so you can quickly realise the benefits of cloud technology. Our standardised, proven cloud adoption methodologies reduce the cloud adoption time and effort so you can directly benefit from lower migration costs.
Kloud9 was founded with the vision of bridging the gap between E-commerce and cloud. E-commerce in any industry is constrained by the significant finances spent on physical data infrastructure.
At Kloud9, we know migrating to the cloud is the single most significant technology shift your company faces today. We are your trusted advisors in transformation and are determined to build a deep partnership along the way. Our cloud and retail experts will ease your transition to the cloud.
Our sole focus is to provide cloud expertise to retail industry giving our clients the empowerment that will take their business to the next level. Our team of proficient architects, engineers and developers have been designing, building and implementing solutions for retailers for an average of more than 20 years.
We are a cloud vendor that is both platform and technology independent. Our vendor independence not only provides us with a unique perspective into the cloud market but also ensures that we deliver the cloud solutions that best meet our clients' requirements.
What we are looking for:
● 3+ years’ experience developing Data & Analytic solutions
● Experience building data lake solutions leveraging one or more of the following: AWS EMR, S3, Hive & Spark
● Experience with relational SQL
● Experience with scripting languages such as Shell, Python
● Experience with source control tools such as GitHub and related dev process
● Experience with workflow scheduling tools such as Airflow
● In-depth knowledge of scalable cloud
● Has a passion for data solutions
● Strong understanding of data structures and algorithms
● Strong understanding of solution and technical design
● Has a strong problem-solving and analytical mindset
● Experience working with Agile Teams.
● Able to influence and communicate effectively, both verbally and written, with team members and business stakeholders
● Able to quickly pick up new programming languages, technologies, and frameworks
● Bachelor’s Degree in computer science
Why Explore a Career at Kloud9:
With job opportunities in prime locations of the US, London, Poland and Bengaluru, we help build your career path in cutting-edge technologies of AI, Machine Learning and Data Science. Be part of an inclusive and diverse workforce that's changing the face of retail technology with their creativity and innovative solutions. Our vested interest in our employees translates into delivering the best products and solutions to our customers.
Data Engineer- Senior
Cubera is a data company revolutionizing big data analytics and Adtech through data share value principles wherein the users entrust their data to us. We refine the art of understanding, processing, extracting, and evaluating the data that is entrusted to us. We are a gateway for brands to increase their lead efficiency as the world moves towards web3.
What are you going to do?
Design & Develop high performance and scalable solutions that meet the needs of our customers.
Closely work with the Product Management, Architects and cross functional teams.
Build and deploy large-scale systems in Java/Python.
Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
Create data tools for analytics and data scientist team members that assist them in building and optimizing their algorithms.
Follow best practices that can be adopted in the Big Data stack.
Use your engineering experience and technical skills to drive the features and mentor the engineers.
What are we looking for ( Competencies) :
Bachelor’s degree in computer science, computer engineering, or related technical discipline.
Overall 5 to 8 years of programming experience in Java or Python, including object-oriented design.
Data handling frameworks: Should have a working knowledge of one or more data handling frameworks like Hive, Spark, Storm, Flink, Beam, Airflow, NiFi, etc.
Data Infrastructure: Should have experience in building, deploying and maintaining applications on popular cloud infrastructure like AWS, GCP, etc.
Data Store: Must have expertise in one of the general-purpose NoSQL data stores like Elasticsearch, MongoDB, Redis, Redshift, etc.
Strong sense of ownership, focus on quality, responsiveness, efficiency, and innovation.
Ability to work with distributed teams in a collaborative and productive manner.
Benefits:
Competitive Salary Packages and benefits.
Collaborative, lively and an upbeat work environment with young professionals.
Job Category: Development
Job Type: Full Time
Job Location: Bangalore
The thrill of working at a start-up that is starting to scale massively is something else. Simpl (FinTech startup of the year - 2020) was formed in 2015 by Nitya Sharma, an investment banker from Wall Street and Chaitra Chidanand, a tech executive from the Valley, when they teamed up with a very clear mission - to make money simple so that people can live well and do amazing things. Simpl is the payment platform for the mobile-first world, and we’re backed by some of the best names in fintech globally (folks who have invested in Visa, Square and Transferwise), and
have Joe Saunders, ex-Chairman and CEO of Visa, as a board member.
Everyone at Simpl is an internal entrepreneur who is given a lot of bandwidth and resources to create the next breakthrough towards the long term vision of “making money Simpl”. Our first product is a payment platform that lets people buy instantly, anywhere online, and pay later. In
the background, Simpl uses big data for credit underwriting, risk and fraud modelling, all without any paperwork, and enables Banks and Non-Bank Financial Companies to access a whole new consumer market.
In place of traditional forms of identification and authentication, Simpl integrates deeply into merchant apps via SDKs and APIs. This allows for more sophisticated forms of authentication that take full advantage of smartphone data and processing power
Skillset:
Workflow manager/scheduler like Airflow, Luigi, Oozie
Good handle on Python
ETL Experience
Batch processing frameworks like Spark, MR/PIG
File formats: Parquet, JSON, XML, Thrift, Avro, Protobuf (see the sketch after this list)
Rule engine (drools - business rule management system)
Distributed file systems / object stores like HDFS, NFS, AWS S3 and equivalent
Built/configured dashboards
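To illustrate the file-format handling listed above, here is a minimal sketch that lands raw JSON and persists it as Parquet and (optionally) Avro with Spark. The paths are placeholders, and the Avro write assumes the spark-avro package is on the classpath.

```python
# Minimal sketch: convert raw JSON into columnar/binary formats with Spark.
# Paths are placeholders; Avro needs org.apache.spark:spark-avro on the classpath.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("format-sketch").getOrCreate()

raw = spark.read.json("s3a://landing/payments/2024-06-01/")

raw.write.mode("overwrite").parquet("s3a://lake/payments/parquet/")

# Optional Avro output, only if the spark-avro package is available.
raw.write.mode("overwrite").format("avro").save("s3a://lake/payments/avro/")
```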
Nice to have:
Data platform experience, e.g. building data lakes, working with near-real-time applications/frameworks like Storm, Flink, Spark.
AWS
File encoding types: Thrift, Avro, Protobuf, Parquet, JSON, XML
Hive, HBase
XpressBees – a logistics company started in 2015 – is amongst the fastest growing
companies of its sector. While we started off rather humbly in the space of
ecommerce B2C logistics, the last 5 years have seen us steadily progress towards
expanding our presence. Our vision to evolve into a strong full-service logistics
organization reflects itself in our new lines of business like 3PL, B2B Xpress and cross
border operations. Our strong domain expertise and constant focus on meaningful
innovation have helped us rapidly evolve as the most trusted logistics partner of
India. We have progressively carved our way towards best-in-class technology
platforms, an extensive network reach, and a seamless last mile management
system. While on this aggressive growth path, we seek to become the one-stop-shop
for end-to-end logistics solutions. Our big focus areas for the very near future
include strengthening our presence as service providers of choice and leveraging the
power of technology to improve efficiencies for our clients.
Job Profile
As a Lead Data Engineer in the Data Platform Team at XpressBees, you will build the data platform
and infrastructure to support high quality and agile decision-making in our supply chain and logistics
workflows.
You will define the way we collect and operationalize data (structured / unstructured), and build production pipelines for our machine learning models and for our (RT, NRT, batch) reporting & dashboarding requirements. As a Senior Data Engineer in the XB Data Platform Team, you will use
your experience with modern cloud and data frameworks to build products (with storage and serving
systems)
that drive optimisation and resilience in the supply chain via data visibility, intelligent decision making,
insights, anomaly detection and prediction.
What You Will Do
• Design and develop data platform and data pipelines for reporting, dashboarding and
machine learning models. These pipelines would productionize machine learning models
and integrate with agent review tools.
• Meet the data completeness, correction and freshness requirements.
• Evaluate and identify the data store and data streaming technology choices.
• Lead the design of the logical model and implement the physical model to support
business needs. Come up with logical and physical database design across platforms (MPP,
MR, Hive/PIG) which are optimal physical designs for different use cases (structured/semi
structured). Envision & implement the optimal data modelling, physical design,
performance optimization technique/approach required for the problem.
• Support your colleagues by reviewing code and designs.
• Diagnose and solve issues in our existing data pipelines and envision and build their
successors.
Qualifications & Experience relevant for the role
• A bachelor's degree in Computer Science or related field with 6 to 9 years of technology
experience.
• Knowledge of Relational and NoSQL data stores, stream processing and micro-batching to
make technology & design choices.
• Strong experience in System Integration, Application Development, ETL, Data-Platform
projects. Talented across technologies used in the enterprise space.
• Software development experience using:
• Expertise in relational and dimensional modelling
• Exposure across all the SDLC process
• Experience in cloud architecture (AWS)
• Proven track record in keeping existing technical skills and developing new ones, so that
you can make strong contributions to deep architecture discussions around systems and
applications in the cloud ( AWS).
• Characteristics of a forward thinker and self-starter that flourishes with new challenges
and adapts quickly to learning new knowledge
• Ability to work with cross-functional teams of consulting professionals across multiple projects.
• Knack for helping an organization to understand application architectures and integration
approaches, to architect advanced cloud-based solutions, and to help launch the build-out
of those systems
• Passion for educating, training, designing, and building end-to-end systems.
• Project Planning and Management
o Take end-to-end ownership of multiple projects / project tracks
o Create and maintain project plans and other related documentation for project
objectives, scope, schedule and delivery milestones
o Lead and participate across all the phases of software engineering, right from
requirements gathering to GO LIVE
o Lead internal team meetings on solution architecture, effort estimation, manpower
planning and resource (software/hardware/licensing) planning
o Manage RIDA (Risks, Impediments, Dependencies, Assumptions) for projects by
developing effective mitigation plans
• Team Management
o Act as the Scrum Master
o Conduct SCRUM ceremonies like Sprint Planning, Daily Standup, Sprint Retrospective
o Set clear objectives for the project and roles/responsibilities for each team member
o Train and mentor the team on their job responsibilities and SCRUM principles
o Make the team accountable for their tasks and help the team in achieving them
o Identify the requirements and come up with a plan for Skill Development for all team
members
• Communication
o Be the Single Point of Contact for the client in terms of day-to-day communication
o Periodically communicate project status to all the stakeholders (internal/external)
• Process Management and Improvement
o Create and document processes across all disciplines of software engineering
o Identify gaps and continuously improve processes within the team
o Encourage team members to contribute towards process improvement
o Develop a culture of quality and efficiency within the team
Must have:
• Minimum 08 years of experience (hands-on as well as leadership) in software / data engineering
across multiple job functions like Business Analysis, Development, Solutioning, QA, DevOps and
Project Management
• Hands-on as well as leadership experience in Big Data Engineering projects
• Experience developing or managing cloud solutions using Azure or other cloud provider
• Demonstrable knowledge on Hadoop, Hive, Spark, NoSQL DBs, SQL, Data Warehousing, ETL/ELT,
DevOps tools
• Strong project management and communication skills
• Strong analytical and problem-solving skills
• Strong systems level critical thinking skills
• Strong collaboration and influencing skills
Good to have:
• Knowledge on PySpark, Azure Data Factory, Azure Data Lake Storage, Synapse Dedicated SQL
Pool, Databricks, PowerBI, Machine Learning, Cloud Infrastructure
• Background in BFSI with focus on core banking
• Willingness to travel
Work Environment
• Customer Office (Mumbai) / Remote Work
Education
• UG: B. Tech - Computers / B. E. – Computers / BCA / B.Sc. Computer Science
- 5+ years of experience in software development.
- At least 2 years of relevant work experience on large scale Data applications
- Good attitude, strong problem-solving abilities, analytical skills, ability to take ownership as appropriate
- Should be able to do coding, debugging, performance tuning, and deploying the apps to Prod.
- Should have good working experience with the Hadoop ecosystem (HDFS, Hive, Yarn, file formats like Avro/Parquet)
- Kafka
- J2EE Frameworks (Spring/Hibernate/REST)
- Spark Streaming or any other streaming technology.
- Java programming language is mandatory.
- Good to have experience with Java
- Ability to work on the sprint stories to completion along with Unit test case coverage.
- Experience working in Agile Methodology
- Excellent communication and coordination skills
- Knowledge of (and preferably hands-on experience with) UNIX environments and different continuous integration tools.
- Must be able to integrate quickly into the team and work independently towards team goals
- Take the complete responsibility of the sprint stories’ execution
- Be accountable for the delivery of the tasks in the defined timelines with good quality
- Follow the processes for project execution and delivery.
- Follow agile methodology
- Work with the team lead closely and contribute to the smooth delivery of the project.
- Understand/define the architecture and discuss the pros-cons of the same with the team
- Involve in the brainstorming sessions and suggest improvements in the architecture/design.
- Work with other team leads to get the architecture/design reviewed.
- Work with the clients and counterparts (in US) of the project.
- Keep all the stakeholders updated about the project/task status/risks/issues if there are any.
About Slintel (a 6sense company) :
Slintel, a 6sense company, the leader in capturing technographics-powered buying intent, helps companies uncover the 3% of active buyers in their target market. Slintel evaluates over 100 billion data points and analyzes factors such as buyer journeys, technology adoption patterns, and other digital footprints to deliver market & sales intelligence.
Slintel's customers have access to the buying patterns and contact information of more than 17 million companies and 250 million decision makers across the world.
Slintel is a fast growing B2B SaaS company in the sales and marketing tech space. We are funded by top tier VCs, and going after a billion dollar opportunity. At Slintel, we are building a sales development automation platform that can significantly improve outcomes for sales teams, while reducing the number of hours spent on research and outreach.
We are a big data company and perform deep analysis on technology buying patterns and buyer pain points to understand where buyers are in their journey. Over 100 billion data points are analyzed every week to derive recommendations on where companies should focus their marketing and sales efforts. Third-party intent signals are then clubbed with first-party data from CRMs to derive meaningful recommendations on whom to target on any given day.
6sense is headquartered in San Francisco, CA and has 8 office locations across 4 countries.
6sense, an account engagement platform, secured $200 million in a Series E funding round, bringing its total valuation to $5.2 billion 10 months after its $125 million Series D round. The investment was co-led by Blue Owl and MSD Partners, among other new and existing investors.
LinkedIn (Slintel): https://www.linkedin.com/company/slintel/
Industry : Software Development
Company size : 51-200 employees (189 on LinkedIn)
Headquarters : Mountain View, California
Founded : 2016
Specialties : Technographics, lead intelligence, Sales Intelligence, Company Data, and Lead Data.
Website (Slintel): https://www.slintel.com/slintel
LinkedIn (6sense): https://www.linkedin.com/company/6sense/
Industry : Software Development
Company size : 501-1,000 employees (937 on LinkedIn)
Headquarters : San Francisco, California
Founded : 2013
Specialties : Predictive intelligence, Predictive marketing, B2B marketing, and Predictive sales
Website (6sense): https://6sense.com/
Acquisition News :
https://inc42.com/buzz/us-based-based-6sense-acquires-b2b-buyer-intelligence-startup-slintel/
Funding Details & News :
Slintel funding: https://www.crunchbase.com/organization/slintel
6sense funding: https://www.crunchbase.com/organization/6sense
https://www.nasdaq.com/articles/ai-software-firm-6sense-valued-at-%245.2-bln-after-softbank-joins-funding-round
https://www.bloomberg.com/news/articles/2022-01-20/6sense-reaches-5-2-billion-value-with-softbank-joining-round
https://xipometer.com/en/company/6sense
Slintel & 6sense Customers :
https://www.featuredcustomers.com/vendor/slintel/customers
https://www.featuredcustomers.com/vendor/6sense/customers
About the job
Responsibilities
- Work in collaboration with the application team and integration team to design, create, and maintain optimal data pipeline architecture and data structures for Data Lake/Data Warehouse
- Work with stakeholders including the Sales, Product, and Customer Support teams to assist with data-related technical issues and support their data analytics needs
- Assemble large, complex data sets from third-party vendors to meet business requirements.
- Identify, design, and implement internal process improvements: automating manual processes, optimising data delivery, re-designing infrastructure for greater scalability, etc.
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL, Elasticsearch, MongoDB, and AWS technology
- Streamline existing and introduce enhanced reporting and analysis solutions that leverage complex data sources derived from multiple internal systems
Requirements
- 3+ years of experience in a Data Engineer role
- Proficiency in Linux
- Must have SQL knowledge and experience working with relational databases and query authoring (SQL), as well as familiarity with databases including MySQL, MongoDB, Cassandra, and Athena
- Must have experience with Python/ Scala
- Must have experience with Big Data technologies like Apache Spark
- Must have experience with Apache Airflow
- Experience with data pipeline and ETL tools like AWS Glue
- Experience working with AWS cloud services (EC2, S3, RDS, Redshift) and other data solutions, e.g. Databricks, Snowflake
Desired Skills and Experience
Python, SQL, Scala, Spark, ETL
• Java 1.8+ or OpenJDK 11+
• Core Spring Framework
• Spring Boot
• Spring REST services
• Any RDBMS
• Gradle or Maven
• JPA / Hibernate
• Git / Bitbucket
• Apache Spark
What you'll do:
Design and development of scalable applications.
Collaborate with tech leads to get maximum understanding of underlying infrastructure.
Contribute to continual improvement by suggesting improvements to the software system.
Ensure high scalability and performance
You will advocate for good, clean, well documented and performing code; follow standards and best practices.
We'd love for you to have:
Education: Bachelor/Master Degree in Computer Science
Experience: 1-3 years of relevant experience in BI/Big-Data with hands-on coding experience
Mandatory Skills
Strong in problem-solving
Good exposure to Big Data technologies, Hive, Hadoop, Impala, Hbase, Kafka, Spark
Strong experience of Data Engineering
Able to comprehend challenges related to Database and Data Warehousing technologies and ability to understand complex design, system architecture
Experience with the software development lifecycle, design, develop, review, debug, document, and deliver (especially in a multi-location organization)
Working knowledge of Java, python
Desired Skills
Experience with reporting tools like Tableau, QlikView
Awareness of CI-CD pipeline
Inclination to work on cloud platform ex:- AWS
Crisp communication skills with team members, Business owners.
Be able to work in a challenging, dynamic environment and meet tight deadlines
Job Description: Data Scientist
At Propellor.ai, we derive insights that allow our clients to make scientific decisions. We believe in demanding more from the fields of Mathematics, Computer Science, and Business Logic. Combine these and we show our clients a 360-degree view of their business. In this role, the Data Scientist will be expected to work on Procurement problems along with a team-based across the globe.
We are a Remote-First Company.
Read more about us here: https://www.propellor.ai/consulting
What will help you be successful in this role
- Articulate
- High Energy
- Passion to learn
- High sense of ownership
- Ability to work in a fast-paced and deadline-driven environment
- Loves technology
- Highly skilled at Data Interpretation
- Problem solver
- Ability to narrate the story to the business stakeholders
- Generate insights and the ability to turn them into actions and decisions
Skills to work in a challenging, complex project environment
- Need you to be naturally curious and have a passion for understanding consumer behavior
- A high level of motivation, passion, and high sense of ownership
- Excellent communication skills needed to manage an incredibly diverse slate of work, clients, and team personalities
- Flexibility to work on multiple projects and deadline-driven fast-paced environment
- Ability to work in ambiguity and manage the chaos
Key Responsibilities
- Analyze data to unlock insights: Ability to identify relevant insights and actions from data. Use regression, cluster analysis, time series, etc. to explore relationships and trends in response to stakeholder questions and business challenges (a brief sketch follows this list).
- Bring in experience for AI and ML: Bring in Industry experience and apply the same to build efficient and optimal Machine Learning solutions.
- Exploratory Data Analysis (EDA) and Generate Insights: Analyse internal and external datasets using analytical techniques, tools, and visualization methods. Ensure pre-processing/cleansing of data and evaluate data points across the enterprise landscape and/or external data points that can be leveraged in machine learning models to generate insights.
- DS and ML Model Identification and Training: Identify, test, and train machine learning models that need to be leveraged for business use cases. Evaluate models based on interpretability, performance, and accuracy as required. Experiment and identify features from datasets that will help influence model outputs. Determine which models will need to be deployed and which data points need to be fed into them, and aid in the deployment and maintenance of models.
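As a brief, hedged illustration of the regression and cluster analysis mentioned in the first responsibility, the sketch below fits a linear model and a k-means segmentation on toy data. The dataset, features and hyper-parameters are placeholders rather than anything client-specific.

```python
# Minimal sketch: quick regression fit and k-means segmentation on synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
spend = rng.uniform(10, 100, size=(200, 1))           # e.g. marketing spend (placeholder)
revenue = 3.2 * spend[:, 0] + rng.normal(0, 5, 200)   # noisy linear response

reg = LinearRegression().fit(spend, revenue)
print("estimated uplift per unit spend:", reg.coef_[0])

# Simple segmentation on two behavioural features (also synthetic).
features = rng.normal(size=(500, 2))
segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)
print("segment sizes:", np.bincount(segments))
```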
Technical Skills
An enthusiastic individual with the following skills. Please do not hesitate to apply if you do not match all of them. We are open to promising candidates who are passionate about their work, fast learners and are team players.
- Strong experience with machine learning and AI including regression, forecasting, time series, cluster analysis, classification, Image recognition, NLP, Text Analytics and Computer Vision.
- Strong experience with advanced analytics tools for Object-oriented/object function scripting using languages such as Python, or similar.
- Strong experience with popular database programming languages including SQL.
- Strong experience in Spark/Pyspark
- Experience in working in Databricks
What company benefits do you get when you join us?
- Permanent Work from Home Opportunity
- Opportunity to work with Business Decision Makers and an internationally based team
- The work environment that offers limitless learning
- A culture void of any bureaucracy, hierarchy
- A culture of being open, direct, and with mutual respect
- A fun, high-caliber team that trusts you and provides the support and mentorship to help you grow
- The opportunity to work on high-impact business problems that are already defining the future of Marketing and improving real lives
To know more about how we work: https://bit.ly/3Oy6WlE
Whom will you work with?
You will closely work with other Senior Data Scientists and Data Engineers.
Immediate to 15-day Joiners will be preferred.
We are looking for an exceptionally talented Lead Data Engineer who has exposure to implementing AWS services to build data pipelines, API integrations and data warehouse design. A candidate with both hands-on and leadership capabilities will be ideal for this position.
Qualification: At least a bachelor's degree in Science, Engineering, or Applied Mathematics; a master's degree is preferred.
Job Responsibilities:
• Total 6+ years of experience as a Data Engineer and 2+ years of experience in managing a team
• Have minimum 3 years of AWS Cloud experience.
• Well versed in languages such as Python, PySpark, SQL, Node.js, etc.
• Has extensive experience in the Spark ecosystem and has worked on both real-time and batch processing
• Have experience in AWS Glue, EMR, DMS, Lambda, S3, DynamoDB, Step Functions, Airflow, RDS, Aurora, etc. (see the sketch after this list)
• Experience with modern Database systems such as Redshift, Presto, Hive etc.
• Worked on building data lakes in the past on S3 or Apache Hudi
• Solid understanding of Data Warehousing Concepts
• Good to have experience on tools such as Kafka or Kinesis
• Good to have AWS Developer Associate or Solutions Architect Associate Certification
• Have experience in managing a team
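To make the AWS Glue and orchestration experience above concrete, here is a hedged boto3 sketch that starts a Glue job run and polls its state. The job name, arguments and region are placeholders, and in practice this step would usually sit behind Airflow or Step Functions rather than a polling loop.

```python
# Minimal sketch: trigger an AWS Glue job and wait for it to finish using boto3.
# Job name, arguments and region are placeholders.
import time
import boto3

glue = boto3.client("glue", region_name="ap-south-1")

run = glue.start_job_run(
    JobName="curate-orders",                     # hypothetical Glue job
    Arguments={"--run_date": "2024-06-01"},
)

while True:
    status = glue.get_job_run(JobName="curate-orders", RunId=run["JobRunId"])
    state = status["JobRun"]["JobRunState"]
    if state in ("SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"):
        print("Glue job finished with state:", state)
        break
    time.sleep(30)                               # poll every 30 seconds
```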