
- Hands on experience in following is a must: Unix, Python and Shell Scripting.
- Hands on experience in creating infrastructure on cloud platform AWS is a must.
- Must have experience in industry standard CI/CD tools like Git/BitBucket, Jenkins, Maven, Artifactory and Chef.
- Must be good at these DevOps tools:
Version Control Tools: Git, CVS
Build Tools: Maven and Gradle
CI Tools: Jenkins
- Hands-on experience with Analytics tools, ELK stack.
- Knowledge of Java will be an advantage.
- Experience designing and implementing an effective and efficient CI/CD flow that gets code from dev to prod with high quality and minimal manual effort.
- Ability to help debug and optimise code and automate routine tasks.
- Should be extremely good in communication
- Experience in dealing with difficult situations and making decisions with a sense of urgency.
- Experience in Agile and Jira will be an add on

About Zimetrics Technologies
About
Connect with the team
Similar jobs
About the Job
This is a full-time role for a Lead DevOps Engineer at Spark Eighteen. We are seeking an experienced DevOps professional to lead our infrastructure strategy, design resilient systems, and drive continuous improvement in our deployment processes. In this role, you will architect scalable solutions, mentor junior engineers, and ensure the highest standards of reliability and security across our cloud infrastructure. The job location is flexible with preference for the Delhi NCR region.
Responsibilities
- Lead and mentor the DevOps/SRE team
- Define and drive DevOps strategy and roadmaps
- Oversee infrastructure automation and CI/CD at scale
- Collaborate with architects, developers, and QA teams to integrate DevOps practices
- Ensure security, compliance, and high availability of platforms
- Own incident response, postmortems, and root cause analysis
- Budgeting, team hiring, and performance evaluation
Requirements
Technical Skills
- Bachelor's or Master's degree in Computer Science, Engineering, or related field.
- 7+ years of professional DevOps experience with demonstrated progression.
- Strong architecture and leadership background
- Deep hands-on knowledge of infrastructure as code, CI/CD, and cloud
- Proven experience with monitoring, security, and governance
- Effective stakeholder and project management
- Experience with tools like Jenkins, ArgoCD, Terraform, Vault, ELK, etc.
- Strong understanding of business continuity and disaster recovery
Soft Skills
- Cross-functional communication excellence with ability to lead technical discussions.
- Strong mentorship capabilities for junior and mid-level team members.
- Advanced strategic thinking and ability to propose innovative solutions.
- Excellent knowledge transfer skills through documentation and training.
- Ability to understand and align technical solutions with broader business strategy.
- Proactive problem-solving approach with focus on continuous improvement.
- Strong leadership skills in guiding team performance and technical direction.
- Effective collaboration across development, QA, and business teams.
- Ability to make complex technical decisions with minimal supervision.
- Strategic approach to risk management and mitigation.
What We Offer
- Professional Growth: Continuous learning opportunities through diverse projects and mentorship from experienced leaders
- Global Exposure: Work with clients from 20+ countries, gaining insights into different markets and business cultures
- Impactful Work: Contribute to projects that make a real difference, with solutions generating over $1B in revenue
- Work-Life Balance: Flexible arrangements that respect personal wellbeing while fostering productivity
- Career Advancement: Clear progression pathways as you develop skills within our growing organization
- Competitive Compensation: Attractive salary packages that recognize your contributions and expertise
Our Culture
At Spark Eighteen, our culture centers on innovation, excellence, and growth. We believe in:
- Quality-First: Delivering excellence rather than just quick solutions
- True Partnership: Building relationships based on trust and mutual respect
- Communication: Prioritizing clear, effective communication across teams
- Innovation: Encouraging curiosity and creative approaches to problem-solving
- Continuous Learning: Supporting professional development at all levels
- Collaboration: Combining diverse perspectives to achieve shared goals
- Impact: Measuring success by the value we create for clients and users
Apply Here - https://tinyurl.com/t6x23p9b
Role: Full-Time, Long-Term Required: Docker, GCP, CI/CD Preferred: Experience with ML pipelines
OVERVIEW
We are seeking a DevOps engineer to join as a core member of our technical team. This is a long-term position for someone who wants to own infrastructure and deployment for a production machine learning system. You will ensure our prediction pipeline runs reliably, deploys smoothly, and scales as needed.
The ideal candidate thinks about failure modes obsessively, automates everything possible, and builds systems that run without constant attention.
CORE TECHNICAL REQUIREMENTS
Docker (Required): Deep experience with containerization. Efficient Dockerfiles, layer caching, multi-stage builds, debugging container issues. Experience with Docker Compose for local development.
Google Cloud Platform (Required): Strong GCP experience: Cloud Run for serverless containers, Compute Engine for VMs, Artifact Registry for images, Cloud Storage, IAM. You can navigate the console but prefer scripting everything.
CI/CD (Required): Build and maintain deployment pipelines. GitHub Actions required. You automate testing, building, pushing, and deploying. You understand the difference between continuous integration and continuous deployment.
Linux Administration (Required): Comfortable on the command line. SSH, diagnose problems, manage services, read logs, fix things. Bash scripting is second nature.
PostgreSQL (Required): Database administration basics—backups, monitoring, connection management, basic performance tuning. Not a DBA, but comfortable keeping a production database healthy.
Infrastructure as Code (Preferred): Terraform, Pulumi, or similar. Infrastructure should be versioned, reviewed, and reproducible—not clicked together in a console.
WHAT YOU WILL OWN
Deployment Pipeline: Maintaining and improving deployment scripts and CI/CD workflows. Code moves from commit to production reliably with appropriate testing gates.
Cloud Run Services: Managing deployments for model fitting, data cleansing, and signal discovery services. Monitor health, optimize cold starts, handle scaling.
VM Infrastructure: PostgreSQL and Streamlit on GCP VMs. Instance management, updates, backups, security.
Container Registry: Managing images in GitHub Container Registry and Google Artifact Registry. Cleanup policies, versioning, access control.
Monitoring and Alerting: Building observability. Logging, metrics, health checks, alerting. Know when things break before users tell us.
Environment Management: Configuration across local and production. Secrets management. Environment parity where it matters.
WHAT SUCCESS LOOKS LIKE
Deployments are boring—no drama, no surprises. Systems recover automatically from transient failures. Engineers deploy with confidence. Infrastructure changes are versioned and reproducible. Costs are reasonable and resources scale appropriately.
ENGINEERING STANDARDS
Automation First: If you do something twice, automate it. Manual processes are bugs waiting to happen.
Documentation: Runbooks, architecture diagrams, deployment guides. The next person can understand and operate the system.
Security Mindset: Secrets never in code. Least-privilege access. You think about attack surfaces.
Reliability Focus: Design for failure. Backups are tested. Recovery procedures exist and work.
CURRENT ENVIRONMENT
GCP (Cloud Run, Compute Engine, Artifact Registry, Cloud Storage), Docker, Docker Compose, GitHub Actions, PostgreSQL 16, Bash deployment scripts with Python wrapper.
WHAT WE ARE LOOKING FOR
Ownership Mentality: You see a problem, you fix it. You do not wait for assignment.
Calm Under Pressure: When production breaks, you diagnose methodically.
Communication: You explain infrastructure decisions to non-infrastructure people. You document what you build.
Long-Term Thinking: You build systems maintained for years, not quick fixes creating tech debt.
EDUCATION
University degree in Computer Science, Engineering, or related field preferred. Equivalent demonstrated expertise also considered.
TO APPLY
Include: (1) CV/resume, (2) Brief description of infrastructure you built or maintained, (3) Links to relevant work if available, (4) Availability and timezone.
Job Responsibilities:
- Managing and maintaining the efficient functioning of containerized applications and systems within an organization
- Design, implement, and manage scalable Kubernetes clusters in cloud or on-premise environments
- Develop and maintain CI/CD pipelines to automate infrastructure and application deployments, and track all automation processes
- Implement workload automation using configuration management tools, as well as infrastructure as code (IaC) approaches for resource provisioning
- Monitor, troubleshoot, and optimize the performance of Kubernetes clusters and underlying cloud infrastructure
- Ensure high availability, security, and scalability of infrastructure through automation and best practices
- Establish and enforce cloud security standards, policies, and procedures Work agile technologies
Primary Requirements:
- Kubernetes: Proven experience in managing Kubernetes clusters (min. 2-3 years)
- Linux/Unix: Proficiency in administering complex Linux infrastructures and services
- Infrastructure as Code: Hands-on experience with CM tools like Ansible, as well as the
- knowledge of resource provisioning with Terraform or other Cloud-based utilities
- CI/CD Pipelines: Expertise in building and monitoring complex CI/CD pipelines to
- manage the build, test, packaging, containerization and release processes of software
- Scripting & Automation: Strong scripting and process automation skills in Bash, Python
- Monitoring Tools: Experience with monitoring and logging tools (Prometheus, Grafana)
- Version Control: Proficient with Git and familiar with GitOps workflows.
- Security: Strong understanding of security best practices in cloud and containerized
- environments.
Skills/Traits that would be an advantage:
- Kubernetes administration experience, including installation, configuration, and troubleshooting
- Kubernetes development experience
- Strong analytical and problem-solving skills
- Excellent communication and interpersonal skills
- Ability to work independently and as part of a team
(Candidates from Service based Companies apply-Looking for automation(shell or python scripting))
SHIFT- Shift time either US East coast or west coast (2:30 PM to 10:30 PM India time or 5 to 2 am india time)
Exp- 5 to 8 years
Salary- Upto 25 LPA
Hyderabad based candidates preferred!
Immediate joiners would be preferred!!
Role Objective:
- Ability to identify processes where efficiency could be improved via automation
- Ability to research, prototype, iterate and test automation solutions
- Good Technical understanding of Cloud service offering, with a sound appreciation of the associated business processes.
- Ability to build & maintain a strong working relationship with other Technical teams using the agile methodology (internal and external), Infrastructure Partners and Service Engagement Managers.
- Ability to shape and co-ordinate delivery of key initiatives to deliver improvements in stability
- Good understanding of the cost of the e2e service provision, and delivery of associated savings.
- Knowledge of web security principals
- Strong Linux experience – comfortable working from command line
- Some networking knowledge (routing, DNS)
- Knowledge of HA and DR concepts and experience implementing them
- Working with team to analyse and design infrastructure with 99.99% up-time.
Qualifications:
- Infrastructure automation through DevOps scripting (Eg Python, Ruby, PowerShell, Java, shell) or previous software development experience
- Experience in building and managing production cloud environments from the ground up.
- Hands-on, working experience with primary AWS services (EC2, VPC, RDS, Route53, S3)
- Knowledge on repository management (GitHub, SVN)
- Solid understanding of web application architecture and RDBMS (SQL Server preferred).
- Experience with IT compliance and risk management requirements is a bonus. (Eg Security, Privacy, HIPAA, SOX, etc)
- Strong logical, analytical and problem-solving skills with excellent communication skills.
- Should have degree in computer science, MIS, engineering or equivalent with 5+ years of experience.
- Should be willing to work in rotational shifts (including the nights)
Perks and benefits:
- Health & Wellness
- Paid time off
- Learning at work
- Fun at work
- Night shift allowance
- Comp off
- Pick and drop facility available to certain distance
Responsibilities
- Building and maintenance of resilient and scalable production infrastructure
- Improvement of monitoring systems
- Creation and support of development automation processes (CI / CD)
- Participation in infrastructure development
- Detection of problems in architecture and proposing of solutions for solving them
- Creation of tasks for system improvements for system scalability, performance and monitoring
- Analysis of product requirements in the aspect of devops
- Managing a team of DevOps, control of task deliveries
- Incident analysis and fixing
Technology stack
Linux, Bash, Salt/Ansible, LXC, libvirt, IPsec, VXLAN, Open vSwitch, OpenVPN, OSPF, BIRD, Cisco NX-OS, Multicast, PIM, LVM, software RAID, LUKS, PostgreSQL, nginx, haproxy, Prometheus, Grafana, Zabbix, GitLab, Capistrano
Skills and Experience
- Understanding of the distributed systems principles
- Understanding of principles for building a resistant network infrastructure
- Experience of Ubuntu Linux administration (Debian-like will be a plus)
- Strong knowledge of Bash
- Experience of working with LXC-containers
- Understanding and experience with infrastructure as a code approach
- Experience of development idempotent Ansible roles
- Experience with relational databases (PostgeSQL), ability to create simple SQL queries
- Experience with git
- Experience with monitoring and metric collect systems (Prometheus, Grafana, Zabbix)
- Understanding of dynamic routing (OSPF)
Preferred experience
- Experience of working with highload zero-downtown environments
- Experience of coding on Python
- Experience of working with IPsec, VXLAN, Open vSwitch
- Knowledge and experience of working with network equipment Cisco
- Experience of working with Cisco NX-OS
- Knowledge of principles of multicast protocols IGMP, PIM
- Experience of setting multicast on Cisco equipment
- Experience of working with Solarflare Onload
- Experience administering Atlassian products
Job Dsecription: (8-12 years)
○ Develop best practices for team and also responsible for the architecture
○ solutions and documentation operations in order to meet the engineering departments quality and standards
○ Participate in production outage and handle complex issues and works towards Resolution
○ Develop custom tools and integration with existing tools to increase engineering Productivity
Required Experience and Expertise
○ Deep understanding of Kernel, Networking and OS fundamentals
○ Strong experience in writing helm charts.
○ Deep understanding of K8s.
○ Good knowledge in service mesh.
○ Good Database understanding
Notice Period: 30 day max
Job Location: Jaipur
Experience Required: Minimum 3 years
About the role:
As a DevOps Engineer for Punchh, you will be working with our developers, SRE, and DevOps teams implementing our next generation infrastructure. We are looking for a self-motivated, responsible, team player who love designing systems that scale. Punchh provides a rich engineering environment where you can be creative, learn new technologies, solve engineering problems, all while delivering business objectives. The DevOps culture here is one with immense trust and responsibility. You will be given the opportunity to make an impact as there are no silos here.
Responsibilities:
- Deliver SLA and business objectives through whole lifecycle design of services through inception to implementation.
- Ensuring availability, performance, security, and scalability of AWS production systems
- Scale our systems and services through continuous integration, infrastructure as code, and gradual refactoring in an agile environment.
- Maintain services once a project is live by monitoring and measuring availability, latency, and overall system and application health.
- Write and maintain software that runs the infrastructure that powers the Loyalty and Data platform for some of the world’s largest brands.
- 24x7 in shifts on call for Level 2 and higher escalations
- Respond to incidents and write blameless RCA’s/postmortems
- Implement and practice proper security controls and processes
- Providing recommendations for architecture and process improvements.
- Definition and deployment of systems for metrics, logging, and monitoring on platform.
Must have:
- Minimum 3 Years of Experience in DevOps.
- BS degree in Computer Science, Mathematics, Engineering, or equivalent practical experience.
- Strong inter-personal skills.
- Must have experience in CI/CD tooling such as Jenkins, CircleCI, TravisCI
- Must have experience in Docker, Kubernetes, Amazon ECS or Mesos
- Experience in code development in at least one high-level programming language fromthis list: python, ruby, golang, groovy
- Proficient in shell scripting, and most importantly, know when to stop scripting and start developing.
- Experience in creation of highly automated infrastructures with any Configuration Management tools like: Terraform, Cloudformation or Ansible.
- In-depth knowledge of the Linux operating system and administration.
- Production experience with a major cloud provider such Amazon AWS.
- Knowledge of web server technologies such as Nginx or Apache.
- Knowledge of Redis, Memcache, or one of the many in-memory data stores.
- Experience with various load balancing technologies such as Amazon ALB/ELB, HA Proxy, F5.
- Comfortable with large-scale, highly-available distributed systems.
Good to have:
- Understanding of Web Standards (REST, SOAP APIs, OWASP, HTTP, TLS)
- Production experience with Hashicorp products such as Vault or Consul
- Expertise in designing, analyzing troubleshooting large-scale distributed systems.
- Experience in an PCI environment
- Experience with Big Data distributions from Cloudera, MapR, or Hortonworks
- Experience maintaining and scaling database applications
- Knowledge of fundamental systems engineering principles such as CAP Theorem, Concurrency Control, etc.
- Understanding of the network fundamentals: OSI, TCI/IP, topologies, etc.
- Understanding of Auditing of Infrastructure and help org. to control Infrastructure costs.
- Experience in Kafka, RabbitMQ or any messaging bus.

2. Has done Infrastructure coding using Cloudformation/Terraform and Configuration also understands it very clearly
3. Deep understanding of the microservice design and aware of centralized Caching(Redis),centralized configuration(Consul/Zookeeper)
4. Hands-on experience of working on containers and its orchestration using Kubernetes
5. Hands-on experience of Linux and Windows Operating System
6. Worked on NoSQL Databases like Cassandra, Aerospike, Mongo or
Couchbase, Central Logging, monitoring and Caching using stacks like ELK(Elastic) on the cloud, Prometheus, etc.
7. Has good knowledge of Network Security, Security Architecture and Secured SDLC practices







