
About the Job
This is a full-time role for a Lead DevOps Engineer at Spark Eighteen. We are seeking an experienced DevOps professional to lead our infrastructure strategy, design resilient systems, and drive continuous improvement in our deployment processes. In this role, you will architect scalable solutions, mentor junior engineers, and ensure the highest standards of reliability and security across our cloud infrastructure. The location is flexible, with a preference for the Delhi NCR region.
Responsibilities
- Lead and mentor the DevOps/SRE team
- Define and drive DevOps strategy and roadmaps
- Oversee infrastructure automation and CI/CD at scale
- Collaborate with architects, developers, and QA teams to integrate DevOps practices
- Ensure security, compliance, and high availability of platforms
- Own incident response, postmortems, and root cause analysis
- Manage budgeting, team hiring, and performance evaluation
Requirements
Technical Skills
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field
- 7+ years of professional DevOps experience with demonstrated progression
- Strong architecture and leadership background
- Deep hands-on knowledge of infrastructure as code, CI/CD, and cloud
- Proven experience with monitoring, security, and governance
- Effective stakeholder and project management
- Experience with tools like Jenkins, ArgoCD, Terraform, Vault, ELK, etc.
- Strong understanding of business continuity and disaster recovery
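The business-continuity and disaster-recovery bullet above usually comes down to concrete policies such as backup retention. As a toy illustration only (the policy numbers and rule are invented here, not a recommendation), a grandfather-father-son style retention rule can be sketched in a few lines:

```python
from datetime import date, timedelta

def retained(backup_dates, today, daily=7, weekly=4):
    """Toy GFS-style retention: keep every backup from the last `daily`
    days, plus one backup per ISO week going back `weekly` weeks.
    Policy numbers are illustrative, not a recommendation."""
    keep = set()
    daily_cutoff = today - timedelta(days=daily)
    weekly_cutoff = today - timedelta(weeks=weekly)
    by_week = {}
    for d in sorted(backup_dates):
        if d >= daily_cutoff:
            keep.add(d)
        if d >= weekly_cutoff:
            week = d.isocalendar()[:2]     # (year, ISO week number)
            by_week.setdefault(week, d)    # earliest backup in each week
    keep.update(by_week.values())
    return sorted(keep)
```

For daily backups over a month, this keeps the most recent days plus one backup per recent week; everything older ages out.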
Soft Skills
- Cross-functional communication excellence with ability to lead technical discussions.
- Strong mentorship capabilities for junior and mid-level team members.
- Advanced strategic thinking and ability to propose innovative solutions.
- Excellent knowledge transfer skills through documentation and training.
- Ability to understand and align technical solutions with broader business strategy.
- Proactive problem-solving approach with focus on continuous improvement.
- Strong leadership skills in guiding team performance and technical direction.
- Effective collaboration across development, QA, and business teams.
- Ability to make complex technical decisions with minimal supervision.
- Strategic approach to risk management and mitigation.
What We Offer
- Professional Growth: Continuous learning opportunities through diverse projects and mentorship from experienced leaders
- Global Exposure: Work with clients from 20+ countries, gaining insights into different markets and business cultures
- Impactful Work: Contribute to projects that make a real difference, with solutions generating over $1B in revenue
- Work-Life Balance: Flexible arrangements that respect personal wellbeing while fostering productivity
- Career Advancement: Clear progression pathways as you develop skills within our growing organization
- Competitive Compensation: Attractive salary packages that recognize your contributions and expertise
Our Culture
At Spark Eighteen, our culture centers on innovation, excellence, and growth. We believe in:
- Quality-First: Delivering excellence rather than just quick solutions
- True Partnership: Building relationships based on trust and mutual respect
- Communication: Prioritizing clear, effective communication across teams
- Innovation: Encouraging curiosity and creative approaches to problem-solving
- Continuous Learning: Supporting professional development at all levels
- Collaboration: Combining diverse perspectives to achieve shared goals
- Impact: Measuring success by the value we create for clients and users
Apply Here - https://tinyurl.com/t6x23p9b

Similar jobs
Job Title: Senior DevOps Engineer (Full-time)
Location: Mumbai, Onsite
Experience Required: 5+ Years
Job Description
We are seeking an experienced DevOps Engineer to build and manage infrastructure for a FinTech product company operating with stateful microservices. The deployment environments include hybrid cloud and on-premise setups. The ideal candidate must have strong production experience with Kubernetes, cloud platforms, and infrastructure automation.
Key Responsibilities
- Design, build, and manage infrastructure for stateful microservices (databases, queues, caching layers).
- Work on Kubernetes environments, both managed (EKS/AKS/GKE) and self-managed clusters.
- Build, enhance, and maintain custom Helm Charts for complex deployments.
- Set up and manage CI/CD pipelines using ArgoCD, FluxCD, or similar GitOps tools.
- Architect and optimize multi-tenant deployment models.
- Implement and manage high availability, load balancing, certificate management (SSL/TLS).
- Design deployment architectures based on business requirements.
- Manage cloud infrastructure on AWS/Azure including VPC, IAM, cloud networking, and security.
- Work with Infrastructure-as-Code (IaC) tools (Terraform/CloudFormation/Pulumi), including writing reusable modules.
- Monitor, troubleshoot, and optimize performance across production environments.
- Ensure security best practices in networking, access control, and secrets management.
Mandatory Skills
- 5+ years of DevOps experience in product-based companies (not services/consulting).
- Strong hands-on experience with stateful microservices in production.
- Deep expertise in Kubernetes (managed + self-managed).
- Strong ability to write custom Helm Charts.
- Experience with multi-tenant production environments.
- Expertise in AWS or Azure (cloud networking, IAM, VPC, security groups, etc.).
- Experience setting up GitOps-based CI/CD (ArgoCD/FluxCD).
- Strong understanding of HA, load balancing, DNS, SSL/TLS certificates.
- Ability to justify architectural decisions and propose deployment designs.
- Hands-on experience with IaC tools and writing custom Terraform/Pulumi modules.
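To make the multi-tenancy and Helm requirements above concrete: one common pattern is generating a per-tenant values override and feeding it to `helm upgrade -f`. The sketch below is a minimal illustration, with all field, tenant, and domain names hypothetical; since JSON is valid YAML, Helm accepts the output as a values file:

```python
import json

# Hypothetical defaults shared by every tenant.
BASE_VALUES = {
    "replicaCount": 2,
    "ingress": {"enabled": True},
}

def tenant_values(tenant, domain, replicas=None):
    """Build a per-tenant Helm values override without mutating the base."""
    values = json.loads(json.dumps(BASE_VALUES))  # cheap deep copy
    values["tenantId"] = tenant
    values["ingress"]["host"] = f"{tenant}.{domain}"  # e.g. acme.example.com
    if replicas is not None:
        values["replicaCount"] = replicas
    return values

if __name__ == "__main__":
    # One override file per tenant, e.g. values-acme.json for `helm -f`.
    for tenant in ("acme", "globex"):
        print(json.dumps(tenant_values(tenant, "example.com"), indent=2))
```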
Nice to Have
- Exposure to hybrid cloud deployments
- Knowledge of on-premise orchestration & networking
- Experience with service mesh (e.g., Istio, Linkerd)
- Experience with monitoring/logging tools (Prometheus, Grafana, Loki, ELK)
General Description:
- Owns all technical aspects of software development for assigned applications
- Participates in the design and development of systems and application programs
- Functions as a senior member of an agile team and helps drive consistent development practices: tools, common components, and documentation
Required Skills:
- In-depth experience configuring and administering EKS clusters in AWS.
- In-depth experience configuring **Splunk SaaS** in AWS environments, especially in **EKS**.
- In-depth understanding of OpenTelemetry and configuration of **OpenTelemetry Collectors**.
- In-depth knowledge of observability concepts and strong troubleshooting experience.
- Experience implementing comprehensive monitoring and logging solutions in AWS using **CloudWatch**.
- Experience with **Terraform** and Infrastructure as Code.
- Experience with **Helm**.
- Strong scripting skills in Shell and/or Python.
- Experience with large-scale distributed systems and architecture knowledge (Linux/UNIX and Windows operating systems, networking, storage) in a cloud computing or traditional IT infrastructure environment.
- A good understanding of cloud concepts (storage/compute/network).
- Experience collaborating with cross-functional teams to architect observability pipelines for GCP services such as GKE, Cloud Run, and BigQuery.
- Experience with Git and GitHub.
- Experience with code build and deployment using GitHub Actions and Artifact Registry.
- Proficient in developing and maintaining technical documentation, ADRs, and runbooks.
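As a small, stdlib-only illustration of the observability theme above: collectors such as the OpenTelemetry Collector or a Splunk forwarder are far easier to configure when services emit one JSON object per log line rather than free-form text. The field and service names below are illustrative:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line ("ndjson"),
    which log collectors can parse without custom regexes."""
    def format(self, record):
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("checkout-service")  # illustrative service name
log.addHandler(handler)
log.setLevel(logging.INFO)
log.info("order placed")
```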
- Design cloud infrastructure that is secure, scalable, and highly available on AWS, Azure and GCP
- Work collaboratively with software engineering to define infrastructure and deployment requirements
- Provision, configure and maintain AWS, Azure, GCP cloud infrastructure defined as code
- Ensure configuration and compliance with configuration management tools
- Administer and troubleshoot Linux based systems
- Troubleshoot problems across a wide array of services and functional areas
- Build and maintain operational tools for deployment, monitoring, and analysis of AWS and Azure infrastructure and systems
- Perform infrastructure cost analysis and optimization
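The cost-analysis bullet above is, at its simplest, arithmetic over a fleet inventory and a rate card. A toy sketch follows; the instance types and hourly rates are made up, not real pricing:

```python
# Toy monthly cost roll-up; fleet and hourly rates are illustrative only.
HOURLY_RATES = {"m5.large": 0.096, "c5.xlarge": 0.17}
FLEET = [("m5.large", 4), ("c5.xlarge", 2)]  # (instance type, count)

HOURS_PER_MONTH = 730  # common billing approximation

def monthly_cost(fleet, rates):
    return sum(count * rates[itype] * HOURS_PER_MONTH for itype, count in fleet)

print(f"estimated monthly spend: ${monthly_cost(FLEET, HOURLY_RATES):,.2f}")
```

A real analysis would pull rates from billing exports or a pricing API rather than hard-coding them.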
About Hive
Hive is the leading provider of cloud-based AI solutions for content understanding, trusted by the world's largest, fastest growing, and most innovative organizations. The company empowers developers with a portfolio of best-in-class, pre-trained AI models, serving billions of customer API requests every month. Hive also offers turnkey software applications powered by proprietary AI models and datasets, enabling breakthrough use cases across industries. Together, Hive's solutions are transforming content moderation, brand protection, sponsorship measurement, context-based ad targeting, and more.
Hive has raised over $120M in capital from leading investors, including General Catalyst, 8VC, Glynn Capital, Bain & Company, Visa Ventures, and others. We have over 250 employees globally in our San Francisco, Seattle, and Delhi offices. Please reach out if you are interested in joining the future of AI!
About Role
Our unique machine learning needs led us to open our own data centers, with an emphasis on distributed high-performance computing integrating GPUs. Even with these data centers, we maintain a hybrid infrastructure that uses public clouds when they are the right fit. As we continue to commercialize our machine learning models, we also need to grow our DevOps and Site Reliability team to maintain the reliability of our enterprise SaaS offering for our customers. Our ideal candidate is someone who is able to thrive in an unstructured environment and takes automation seriously. You believe there is no task that can't be automated and no server scale too large. You take pride in optimizing performance at scale in every part of the stack, and you never manually perform the same task twice.
Responsibilities
● Create tools and processes for deploying and managing hardware for Private Cloud Infrastructure.
● Improve workflows of developer, data, and machine learning teams
● Manage integration and deployment tooling
● Create and maintain monitoring and alerting tools and dashboards for various services, and audit infrastructure
● Manage a diverse array of technology platforms, following best practices and procedures
● Participate in on-call rotation and root cause analysis
Requirements
● Minimum 5-10 years of previous experience working directly with Software Engineering teams as a developer, DevOps Engineer, or Site Reliability Engineer.
● Experience with infrastructure as a service, distributed systems, and software design at a high level.
● Comfortable working on Linux infrastructure (Debian) via the CLI
● Able to learn quickly in a fast-paced environment
● Able to debug, optimize, and automate routine tasks
● Able to multitask, prioritize, and manage time efficiently independently
● Can communicate effectively across teams and management levels
● Degree in computer science, or similar, is an added plus!
Technology Stack
● Operating Systems - Linux/Debian Family/Ubuntu
● Configuration Management - Chef
● Containerization - Docker
● Container Orchestrators - Mesosphere/Kubernetes
● Scripting Languages - Python/Ruby/Node/Bash
● CI/CD Tools - Jenkins
● Network hardware - Arista/Cisco/Fortinet
● Hardware - HP/SuperMicro
● Storage - Ceph, S3
● Database - Scylla, Postgres, Pivotal GreenPlum
● Message Brokers: RabbitMQ
● Logging/Search - ELK Stack
● AWS: VPC/EC2/IAM/S3
● Networking: TCP/IP, ICMP, SSH, DNS, HTTP, SSL/TLS, storage systems, RAID, distributed file systems, NFS/iSCSI/CIFS
Who we are
We are a group of ambitious individuals who are passionate about creating a revolutionary AI company. At Hive, you will have a steep learning curve and an opportunity to contribute to one of the fastest growing AI start-ups in San Francisco. The work you do here will have a noticeable and direct impact on the development of the company.
Thank you for your interest in Hive, and we hope to meet you soon.
Objectives:
- Building and setting up new development tools and infrastructure
- Working on ways to automate and improve development and release processes
- Testing code written by others and analyzing results
- Ensuring that systems are safe and secure against cybersecurity threats
- Identifying technical problems and developing software updates and ‘fixes’
- Working with software developers and software engineers to ensure that development follows established processes and works as intended
- Planning out projects and being involved in project management decisions
Daily and Monthly Responsibilities:
- Deploy updates and fixes
- Build tools to reduce occurrences of errors and improve customer experience
- Develop software to integrate with internal back-end systems
- Perform root cause analysis for production errors
- Investigate and resolve technical issues
- Develop scripts to automate visualization
- Design procedures for system troubleshooting and maintenance
Skills and Qualifications:
- Degree in Computer Science, Software Engineering, or a relevant field
- 3+ years of experience as a DevOps Engineer or similar software engineering role
- Proficient with git and git workflows
- Good logical skills and knowledge of programming concepts (OOP, data structures)
- Working knowledge of databases and SQL
- Problem-solving attitude
- Collaborative team spirit
- Experience with Linux infrastructure, open-source databases (MySQL, PostgreSQL, etc.), CI/CD tools (Jenkins, GitLab, Nexus Repository), Jira, Agile workflows, and Kanban
- Solid understanding of IP network and TCP/IP
- 7+ years' work experience in IT
- 4+ years' experience as a DevOps Engineer
- Good interpersonal skills and strong written and oral communication
- Experience using AWS (that’s just common sense)
- Experience designing and building web environments on AWS, which includes working with services like EC2, ELB, RDS, and S3
- Experience building and maintaining cloud-native applications
- A solid background in Linux/Unix and Windows server system administration
- Experience using DevOps tools in a cloud environment, such as Ansible, Artifactory, Docker, GitHub, Jenkins, Kubernetes, Maven, and SonarQube
- Experience installing and configuring different application servers such as JBoss, Tomcat, and WebLogic
- Experience using monitoring solutions like CloudWatch, ELK Stack, and Prometheus
- An understanding of writing Infrastructure-as-Code (IaC), using tools like CloudFormation or Terraform
- Knowledge of one or more of the programming languages most used in cloud computing today (e.g., data languages like SQL and XML, math-oriented languages like R and Clojure, functional languages like Haskell and Erlang, and procedural languages like Python and Go)
- Experience in troubleshooting distributed systems
- Proficiency in script development and scripting languages
- The ability to be a team player
- The ability and skill to train other people in procedural and technical topics
- Strong communication and collaboration skills
As a special aside, an AWS engineer who works in DevOps should also have experience with:
- The theory, concepts, and real-world application of Continuous Delivery (CD), which requires familiarity with tools like AWS CodeBuild, AWS CodeDeploy, and AWS CodePipeline
- An understanding of automation
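The automation point above often reduces to one reusable primitive: retry with exponential backoff around a health check. A generic sketch follows; the gate you would actually poll in a pipeline (say, a CodeDeploy deployment's status) is represented here by any zero-argument callable:

```python
import time

def wait_until_healthy(check, attempts=5, base_delay=1.0):
    """Retry `check` with exponential backoff: 1s, 2s, 4s, ... by default.
    Returns True as soon as `check()` succeeds, False if all attempts fail."""
    for attempt in range(attempts):
        if check():
            return True
        time.sleep(base_delay * 2 ** attempt)
    return False
```

In a delivery pipeline you would pass a closure that queries the deployment's status endpoint, and treat a False return as the trigger for a rollback step.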
About the company:
Tathastu, the next-generation innovation lab, is Future Group's initiative to provide a new-age retail experience: combining the physical with the digital and enhancing it with data. We are creating next-generation consumer interactions by combining AI/ML, data science, and emerging technologies with consumer platforms.
The E-Commerce vertical under Tathastu has developed online consumer platforms for Future Group's portfolio of retail brands: Easyday, Big Bazaar, Central, Brand Factory, aLL, Clarks, and Coverstory. Backed by our network of offline stores, we have built a new retail platform that merges our online and offline retail streams. We use data to power all our decisions across our products and build internal tools to help us scale our impact with a small, closely-knit team.
Our widespread store network, robust logistics, and technology capabilities have made it possible to launch a ‘2-Hour Delivery Promise’ on every product across fashion, food, FMCG, and home products for orders placed online through the Big Bazaar mobile app and portal. This makes Big Bazaar the first retailer in the country to offer instant home delivery on almost every consumer product ordered online.
Job Responsibilities:
- You’ll streamline and automate the software development and infrastructure management processes and play a crucial role in executing high-impact initiatives and continuously improving processes to increase the effectiveness of our platforms.
- You’ll translate complex use cases into discrete technical solutions in platform architecture, design and coding, functionality, usability, and optimization.
- You will drive automation in repetitive tasks, configuration management, and deliver comprehensive automated tests to debug/troubleshoot Cloud AWS-based systems and BigData applications.
- You’ll continuously discover, evaluate, and implement new technologies to maximize the development and operational efficiency of the platforms.
- You’ll determine the metrics that will define technical and operational success and constantly track such metrics to fine-tune the technology stack of the organization.
Experience: 4 to 8 Yrs
Qualification: B.Tech / MCA
Required Skills:
- Experience with Linux/UNIX systems administration and Amazon Web Services (AWS).
- Infrastructure as Code (Terraform), Kubernetes and container orchestration, web servers (Nginx, Apache), application servers (Tomcat, Node.js, etc.), document stores, and relational databases (AWS RDS MySQL).
- Site Reliability Engineering patterns and visibility/performance/availability monitoring (CloudWatch, Prometheus)
- A background in technical troubleshooting and performance tuning, with a willingness to stay hands-on.
- Supportive and collaborative personality - ability to influence and drive progress with your peers
Our Technology Stack:
- Docker/Kubernetes
- Cloud (AWS)
- Python/GoLang Programming
- Microservices
- Automation Tools
