
Key Responsibilities:
- Lead the architecture, design, and implementation of scalable, secure, and highly available AWS infrastructure leveraging services such as VPC, EC2, IAM, S3, SNS/SQS, EKS, KMS, and Secrets Manager.
- Develop and maintain reusable, modular IaC frameworks using Terraform and Terragrunt, and mentor team members on IaC best practices.
- Drive automation of infrastructure provisioning, deployment workflows, and routine operations through advanced Python scripting.
- Take ownership of cost optimization strategy by analyzing usage patterns, identifying savings opportunities, and implementing guardrails across multiple AWS environments.
- Define and enforce infrastructure governance, including secure access controls, encryption policies, and secret management mechanisms.
- Collaborate cross-functionally with development, QA, and operations teams to streamline and scale CI/CD pipelines for containerized microservices on Kubernetes (EKS).
- Establish monitoring, alerting, and observability practices to ensure platform health, resilience, and performance.
- Serve as a technical mentor and thought leader, guiding junior engineers and shaping cloud adoption and DevOps culture across the organization.
- Evaluate emerging technologies and tools, recommending improvements to enhance system performance, reliability, and developer productivity.
- Ensure infrastructure complies with security, regulatory, and operational standards, and drive initiatives around audit readiness and compliance.
Mandatory Skills & Experience:
- AWS (Advanced Expertise): VPC, EC2, IAM, S3, SNS/SQS, EKS, KMS, Secrets Management
- Infrastructure as Code: Extensive experience with Terraform and Terragrunt, including module design and IaC strategy
- Strong command of Kubernetes
- Scripting & Automation: Proficient in Python, with a strong track record of building tools, automating workflows, and integrating cloud services
- Cloud Cost Optimization: Proven ability to analyze cloud spend and implement sustainable cost control strategies
- Leadership: Experience in leading DevOps/infrastructure teams or initiatives, mentoring engineers, and making architecture-level decisions
Nice to Have:
- Experience designing or managing CI/CD pipelines for Kubernetes-based environments
- Backend development background in Python (e.g., FastAPI, Flask)
- Familiarity with monitoring/observability tools such as Prometheus, Grafana, CloudWatch
- Understanding of system performance tuning, capacity planning, and scalability best practices
- Exposure to compliance standards such as SOC 2, HIPAA, or ISO 27001

About OpsTree Solutions
OpsTree Global is a digital transformation and platform engineering partner that helps organizations build scalable, secure, and high-impact technology foundations. With expertise across cloud modernization, Data & AI, Observability & SRE, DevSecOps, security, quality engineering, and end-to-end software delivery, OpsTree enables faster, outcome-driven digital transformation.
As an AWS Advanced Tier Services Partner and App Modernization specialist, OpsTree blends cloud-native practices with AI-driven innovation to deliver resilient, high-performing platforms. Its in-house DevSecOps platform, BuildPiper, helps enterprises standardize and accelerate software delivery at scale.
Trusted by 250+ organizations—from startups to Fortune 100 enterprises—OpsTree is known for making software delivery lean, nimble, and highly productive. Driven by a culture of continuous learning, strong ethics, and thought leadership, OpsTree fosters a transparent and growth-oriented environment that empowers teams to build the next generation of cloud-native solutions.
Candid answers by the company
OpsTree Global helps organizations accelerate digital transformation by building scalable, secure, and cloud-native platforms through platform engineering, DevSecOps, and modernization.
Similar jobs
Senior DevOps Engineer (8–10 years)
Location: Mumbai
Role Summary
As a Senior DevOps Engineer, you will own end-to-end platform reliability and delivery automation for mission-critical lending systems. You’ll architect cloud infrastructure, standardize CI/CD, enforce DevSecOps controls, and drive observability at scale—ensuring high availability, performance, and compliance consistent with BFSI standards.
Key Responsibilities
Platform & Cloud Infrastructure
- Design, implement, and scale multi-account, multi-VPC cloud architectures on AWS and/or Azure (compute, networking, storage, IAM, RDS, EKS/AKS, Load Balancers, CDN).
- Champion Infrastructure as Code (IaC) using Terraform (and optionally Pulumi/Crossplane) with GitOps workflows for repeatable, auditable deployments.
- Lead capacity planning, cost optimization, and performance tuning across environments.
CI/CD & Release Engineering
- Build and standardize CI/CD pipelines (Jenkins, GitHub Actions, Azure DevOps, ArgoCD) for microservices, data services, and frontends; enable blue‑green/canary releases and feature flags.
- Drive artifact management, environment promotion, and release governance with compliance-friendly controls.
Containers, Kubernetes & Runtime
- Operate production-grade Kubernetes (EKS/AKS), including cluster lifecycle, autoscaling, ingress, service mesh, and workload security; manage Docker/containerd images and registries.
Reliability, Observability & Incident Management
- Implement end-to-end monitoring, logging, and tracing (Prometheus, Grafana, ELK/EFK, CloudWatch/Log Analytics, Datadog/New Relic) with SLO/SLI error budgets.
- Establish on-call rotations, run postmortems, and continuously improve MTTR and change failure rate.
Security & Compliance (DevSecOps)
- Enforce cloud and container hardening, secrets management (AWS Secrets Manager / HashiCorp Vault), vulnerability scanning (Snyk/SonarQube), and policy-as-code (OPA/Conftest).
- Partner with infosec/risk to meet BFSI regulatory expectations for DR/BCP, audits, and data protection.
Data, Networking & Edge
- Optimize networking (DNS, TCP/IP, routing, OSI layers) and edge delivery (CloudFront/Fastly), including WAF rules and caching strategies.
- Support persistence layers (MySQL, Elasticsearch, DynamoDB) for performance and reliability.
Ways of Working & Leadership
- Lead cross-functional squads (Product, Engineering, Data, Risk) and mentor junior DevOps/SREs.
- Document runbooks, architecture diagrams, and operating procedures; drive automation-first culture.
Must‑Have Qualifications
- 8–10 years of total experience with 5+ years hands-on in DevOps/SRE roles.
- Strong expertise in AWS and/or Azure, Linux administration, Kubernetes, Docker, and Terraform.
- Proven track record building CI/CD with Jenkins/GitHub Actions/Azure DevOps/ArgoCD.
- Solid grasp of networking fundamentals (DNS, TLS, TCP/IP, routing, load balancing).
- Experience implementing observability stacks and responding to production incidents.
- Scripting in Bash/Python; ability to automate ops workflows and platform tasks.
Good‑to‑Have / Preferred
- Exposure to BFSI/fintech systems and compliance standards; DR/BCP planning.
- Secrets management (Vault), policy-as-code (OPA), and security scanning (Snyk/SonarQube).
- Experience with GitOps patterns, service tiering, and SLO/SLI design.
- Knowledge of CDNs (CloudFront/Fastly) and edge caching/WAF rule authoring.
Education
- Bachelor’s/Master’s in Computer Science, Information Technology, or related field (or equivalent experience).
Roles & Responsibilities:
- Bachelor’s degree in Computer Science, Information Technology or a related field
- Experience in designing and maintaining high volume and scalable micro-services architecture on cloud infrastructure
- Knowledge in Linux/Unix Administration and Python/Shell Scripting
- Experience working with cloud platforms like AWS (EC2, ELB, S3, Auto-scaling, VPC, Lambda), GCP, Azure
- Knowledge of deployment automation, Continuous Integration and Continuous Deployment (Jenkins, Maven, Puppet, Chef, GitLab) and monitoring tools like Zabbix, CloudWatch, Nagios
- Knowledge of Java Virtual Machines, Apache Tomcat, Nginx, Apache Kafka, microservices architecture, and caching mechanisms
- Experience in enterprise application development, maintenance and operations
- Knowledge of best practices and IT operations in an always-up, always-available service
- Excellent written and oral communication skills, judgment and decision-making skills
Bachelor's degree in Computer Science or a related field, or equivalent work experience
Strong understanding of cloud infrastructure and services, such as AWS, Azure, or Google Cloud Platform
Experience with infrastructure as code tools such as Terraform or CloudFormation
Proficiency in scripting languages such as Python, Bash, or PowerShell
Familiarity with DevOps methodologies and tools such as Git, Jenkins, or Ansible
Strong problem-solving and analytical skills
Excellent communication and collaboration skills
Ability to work independently and as part of a team
Willingness to learn new technologies and tools as required
Electrum is looking for an experienced and proficient DevOps Engineer. This role will provide you with an opportunity to explore what’s possible in a collaborative and innovative work environment. If your goal is to work with a team of talented professionals that is keenly focused on solving complex business problems and supporting product innovation with technology, you might be our new DevOps Engineer. With this position, you will be involved in building out systems for our rapidly expanding team, enabling the whole engineering group to operate more effectively and iterate at top speed in an open, collaborative environment. The ideal candidate will have a solid background in software engineering and extensive experience in deploying product updates, identifying production issues, and implementing integrations. The ideal candidate has proven capabilities and experience in risk-taking, is willing to take up challenges, and is a strong believer in efficiency and innovation with exceptional communication and documentation skills.
YOU WILL:
- Plan for future infrastructure as well as maintain & optimize the existing infrastructure.
- Conceptualize, architect, and build:
  1. Automated deployment pipelines in a CI/CD environment like Jenkins;
  2. Infrastructure using Docker, Kubernetes, and serverless platforms;
  3. Secured network utilizing VPCs with inputs from the security team.
- Work with developers & QA team to institute a policy of Continuous Integration with automated testing.
- Architect, build, and manage dashboards to provide visibility into delivery, production application functional status, and performance status.
- Work with developers to institute systems, policies, and workflows which allow for a rollback of deployments.
- Triage releases of applications and hotfixes to the production environment on a daily basis.
- Interface with developers and triage SQL queries that need to be executed in production environments.
- Maintain 24/7 on-call rotation to respond and support troubleshooting of issues in production.
- Assist the developers and on-call engineers from other teams with postmortems, follow-up, and review of issues affecting production availability.
- Scale the Electrum platform to handle millions of requests concurrently.
- Reduce Mean Time To Recovery (MTTR), and enable High Availability and Disaster Recovery.
PREREQUISITES:
- Bachelor’s degree in engineering, computer science, or related field, or equivalent work experience.
- Minimum of six years of hands-on experience in software development and DevOps, specifically managing AWS infrastructure such as EC2, RDS, ElastiCache, S3, IAM, CloudTrail, and other services provided by AWS.
- At least 2 years of experience in building and owning serverless infrastructure.
- At least 2 years of scripting experience in Python (preferred) and Shell, and experience with web application deployment systems and continuous integration tools (Ansible).
- Experience building a multi-region highly available auto-scaling infrastructure that optimizes performance and cost.
- Experience in automating the provisioning of AWS infrastructure as well as automation of routine maintenance tasks.
- Must have prior experience automating deployments to production and lower environments.
- Worked on providing solutions for major automation with scripts or infrastructure.
- Experience with APM tools such as DataDog and log management tools.
- Experience in designing and implementing system architecture processes; establishing and enforcing network security policy (AWS VPC, Security Groups) and ACLs.
- Experience establishing and enforcing:
  1. System monitoring tools and standards
  2. Risk Assessment policies and standards
  3. Escalation policies and standards
- Excellent DevOps engineering, team management, and collaboration skills.
- Advanced knowledge of programming languages such as Python, and of writing code and scripts.
- Experience with or knowledge of Application Performance Monitoring (APM); prior experience as an open-source contributor is preferred.
Role Introduction
• This role involves guiding the DevOps team towards successful delivery of governance and toolchain initiatives by removing manual tasks.
• Operate toolchain applications to empower engineering teams by providing reliable, governed self-service tools and supporting their adoption.
• Drive good practice for consumption and utilisation of the engineering toolchain, with a focus on DevOps practices.
• Drive good governance for cloud service consumption.
• Work in a collaborative environment, with a focus on leading the team and providing technical leadership to team members.
• Set up processes and improvements for teams supporting and governing the various DevOps tools.
• Coordinate with multiple teams within the organization.
• Lead on handovers from architecture teams to support major project rollouts which require the Toolchain governance DevOps team to operationally support tooling.
What you will do
• Identify and implement best practices, process improvements, and automation initiatives that speed up delivery by removing manual tasks.
• Ensure best practices and processes are documented for reusability, and keep up to date on good practices and standards.
• Re-usable automation and compliance services, tools, and processes.
• Support and management of the toolchain, toolchain changes, and tool selection.
• Identify and implement risk mitigation plans, avoid escalations, and resolve blockers for teams. Toolchain governance will involve operating and responding to alerts, enforcing good tooling governance by driving automation, remediating technical debt, and ensuring the latest tools are utilised and on the latest versions.
• Triage product pipelines, performance issues, SLA/SLO breaches, and service unavailability, along with ancillary actions such as providing access to logs, tools, and environments.
• Take part in initial and detailed estimates during roadmap planning or feature estimation/planning of any automation identified for a given toolset.
• Develop, refine, and tune integrations between various tools.
• Discuss implementation and deployment challenges with the Product Owner/team, assist in arriving at a probable solution, and escalate any risks to get them resolved with respect to the DevOps toolchain.
• In consultation with the Head of DevOps and other stakeholders, prioritise items and break them down into tasks; accountable for squad deliverables for the sprint.
• Review current components, plan for upgrades, and ensure the plan is communicated to a wider audience within the organization.
• Review access and roles, and enhance and automate provisioning.
• Identify and encourage areas for growth and improvement within the team, e.g. conduct regular 1-2-1s with squad members to provide support, mentoring, and goal setting.
• Take part in performance management, rewards and recognition of team members, and the hiring process.
• Plan for upskilling the team on tools and tasks. Ensure quicker onboarding of new joiners/freshers so that they become productive.
• Review ticket metrics to measure the health of the project, including SLAs, and plan for improvement.
• On-call duty for critical incidents that happen out of hours, based on tooling SLA. This may include planning the standby schedule for the squad, carrying out a retrospective for every callout, and reviewing SLIs/SLOs.
• Own the tech/repair debt, risk, and compliance for the tooling with respect to infrastructure, pipelines, access, etc.
• Track optimum utilization of resources and monitor/track the delivery schedule.
• Review solution designs with the Architects / Principal DevOps Engineers as required.
• Provide monthly reporting aligned to DevOps Tooling KPIs.
What you will have
• 8+ years of experience, including hands-on DevOps experience and experience in team management.
• Strong communication and interpersonal skills; team player.
• Good working experience of CI/CD tools like Jenkins, SonarQube, FOSSA, Harness, Jira, JSM, ServiceNow, etc.
• Good hands-on knowledge of AWS services like EC2, ECS, S3, IAM, SNS, SQS, VPC, Lambda, API Gateway, CloudWatch, CloudFormation, etc.
• Experience in operating and governing a DevOps toolchain.
• Experience in operational monitoring, alerting, and identifying and delivering on both repair and technical debt.
• Experience and background in ITIL/ITSM processes. The candidate will ensure development of the appropriate ITSM model and processes, based on the ITIL Service Management framework. This includes the strategy, design, transition, and operation of services, and continuous service improvement.
• Experience providing ITSM leadership and coaching on processes.
• Experience with various tools like Jenkins, Harness, and FOSSA.
• Experience of hosting and managing applications on AWS/Azure.
• Experience in CI/CD pipelines (Jenkins build pipelines).
• Experience in containerization (Docker/Kubernetes).
• Experience in any programming language (Node.js or Python is preferred).
• Experience in architecting and supporting cloud-based products will be a plus.
• Experience in PowerShell & Bash will be a plus.
• Able to self-manage multiple concurrent small projects, including managing priorities between projects.
• Able to quickly learn new tools.
• Should be able to mentor/drive junior team members to achieve the desired outcome of the roadmap.
• Ability to analyse information to identify problems and issues, and make effective decisions within a short span of time.
• Excellent problem solving and critical thinking.
• Experience in integrating various components, including unit testing and CI/CD configuration.
• Experience reviewing the current toolset and planning for upgrades.
• Experience with Agile framework/Jira/JSM tool.
• Good communication skills and ability to communicate/work independently with external teams.
• Highly motivated, able to work proficiently both independently and in a team environment.
Good knowledge and experience with security constructs –
About Us - Celebal Technologies is a premier software services company in the field of Data Science, Big Data and Enterprise Cloud. Celebal Technologies helps you discover competitive advantage by employing intelligent data solutions, built on cutting-edge technology, that can bring massive value to your organization. The core offerings are around "Data to Intelligence", wherein we leverage data to extract intelligence and patterns, thereby facilitating smarter and quicker decision making for clients. Celebal Technologies understands the core value of modern analytics for the enterprise, and we help businesses improve their business intelligence and become more data-driven in architecting solutions.
Key Responsibilities
• As a part of the DevOps team, you will be responsible for configuration, optimization, documentation, and support of the CI/CD components.
• Creating and managing build and release pipelines with Azure DevOps and Jenkins.
• Assist in planning and reviewing application architecture and design to promote an efficient deployment process.
• Troubleshoot server performance issues & handle the continuous integration system.
• Automate infrastructure provisioning using ARM Templates and Terraform.
• Monitor and Support deployment, Cloud-based and On-premises Infrastructure.
• Diagnose and develop root cause solutions for failures and performance issues in the production environment.
• Deploy and manage Infrastructure for production applications
• Configure security best practices for application and infrastructure
Essential Requirements
• Good hands-on experience with cloud platforms like Azure, AWS & GCP. (Preferably Azure)
• Strong knowledge of CI/CD principles.
• Strong work experience with CI/CD implementation tools like Azure DevOps, TeamCity, Octopus Deploy, AWS CodeDeploy, and Jenkins.
• Experience writing automation scripts with PowerShell, Bash, Python, etc.
• Experience with GitHub, Jira, Confluence, and Continuous Integration (CI) systems.
• Understanding of secure DevOps practices
Good to Have -
• Knowledge of scripting languages such as PowerShell, Bash
• Experience with project management and workflow tools such as Agile, Jira, Scrum/Kanban, etc.
• Experience with build technologies and cloud services (Jenkins, TeamCity, Azure DevOps, Bamboo, AWS CodeDeploy).
• Strong communication skills and ability to explain protocols and processes to the team and management.
• Must be able to handle multiple tasks and adapt to a constantly changing environment.
• Must have a good understanding of SDLC.
• Knowledge of Linux, Windows server, Monitoring tools, and Shell scripting.
• Self-motivated; able to deliver across technologies with minimal supervision.
• Organized, flexible, and analytical ability to solve problems creatively.
Job Description:
Responsibilities
· Take E2E responsibility for the Azure landscape of our customers
· Manage code releases and operational tasks within a global team, with a focus on automation, maintainability, security and customer satisfaction
· Use the CI/CD framework to rapidly support lifecycle management of the platform
· Act as L2-L3 support for incidents, problems and service requests
· Work with various Atos and 3rd party teams to resolve incidents and implement changes
· Implement and drive automation and self-healing solutions to reduce toil
· Improve error budgets through hands-on design and development of solutions that address reliability issues and/or risks
· Support ITSM processes and collaborate with service management representatives
Job Requirements
· Azure Associate certification or equivalent knowledge level
· 5+ years of professional experience
· Experience with Terraform and/or native Azure automation
· Knowledge of CI/CD concepts and toolset (e.g. Jenkins, Azure DevOps, Git)
· Must be adaptable to work in a varied, fast-paced, exciting, ever-changing environment
· Good analytical and problem-solving skills to resolve technical issues
· Understanding of Agile development and SCRUM concepts a plus
· Experience with Kubernetes architecture and tools a plus
Projects you'll be working on:
- We're focused on enhancing our product for our clients and their users, as well as streamlining operations and improving our technical foundation.
- Writing scripts for procurement, configuration and deployment of instances (infrastructure automation) on GCP
- Managing Kubernetes cluster
- Manage product and services like VPC, Elasticsearch, cloud functions, rabbitMQ, redis servers, postgres infrastructure, app engine, etc.
- Supporting developers in setting up infrastructure for services
- Manage and improve microservices infrastructure
- Managing high availability, low latency applications
- Focus on security best practices and assist in security and compliance activities
Requirements
- Minimum 3 years of experience as a DevOps engineer
- Minimum 1 year of experience with Kubernetes clusters (infrastructure as code, maintenance, and scalability).
- Bash expertise; professional programming experience in Node.js or Python
- Experience with setting up, configuring and using Jenkins or any CI tools, building CI/CD pipeline
- Experience setting up microservices architecture
- Experience with package management and deployments
- Thorough understanding of networking.
- Understanding of all common services and protocols
- Experience in web server configuration, monitoring, network design and high availability
- Thorough understanding of DNS, VPN, SSL
Technologies you'll work with:
- GKE, Prometheus, Grafana, Stackdriver
- ArgoCD and GitHub Actions
- NodeJS Backend
- Postgres, ElasticSearch, Redis, RabbitMQ
- Whatever else you decide - we're constantly re-evaluating our stack and tools
- Having prior experience with the technologies is a plus, but not mandatory for skilled candidates.
Benefits
- Remote Option - You can work from a location of your choice :)
- Reimbursement of Home Office Setup
- Competitive Salary
- Friendly atmosphere
- Flexible paid vacation policy
We are front runners of the technological revolution with an inexhaustible passion for technology! DevOn is the technical organization that originated from Prowareness. We are the company at the forefront of leading DevOps transformations and setting up High Performance Distributed DevOps teams with leading companies worldwide. DevOn helps market leaders to take the next step in software delivery. We consist of a dynamic team, in which personal growth is central!
About You
You have 6+ years of experience in AWS infrastructure automation. This is a fantastic opportunity to work in a fast-paced operations environment and to develop your career in Cloud technologies, particularly Amazon Web Services.
You build and monitor CI/CD pipelines in the AWS cloud. The product is a highly scalable backend application built on the Java platform. We need someone who can troubleshoot, diagnose and rectify system service issues.
You’re cloud native, with Terraform as your orchestration tool. You would use Terraform as the key orchestration tool for Infrastructure as Code.
You're comfortable driving. You prefer to own your work streams and enjoy working in autonomy to progress towards your goals.
You provide incredible support to the team. You sweat the small stuff but keep the big picture in mind. You know that pair programming can give better results.
An ideal candidate:
This is a key role within our DevOps team and will involve working as part of a collaborative agile team in a shared services DevOps organization to support and deliver innovative technology solutions that directly align with the delivery of business value and enhanced customer experience. The primary objective is to provide support to Amazon Web Services hosted environment, ensure continuous availability, working closely with development teams to ensure best value for money, and effective estate management.
- Setup CI/CD Pipeline from scratch along with integration of appropriate quality gates.
- Expert-level knowledge of AWS cloud. Provision and configure infrastructure as code using Terraform.
- Build and configure Kubernetes-based infrastructure, networking policies, LBs, and cluster security. Define autoscaling and cost strategies.
- Automate the build of containerized systems with CI/CD tooling, Helm charts, and more
- Manage deployments and rollbacks of applications
- Implement monitoring and metrics with CloudWatch and New Relic
- Troubleshoot and optimize containerized workload deployments for clients
- Automate operational tasks, and assist in the transition to service ownership models.
