
Senior DevOps Engineer (8–10 years)
Location: Mumbai
Role Summary
As a Senior DevOps Engineer, you will own end-to-end platform reliability and delivery automation for mission-critical lending systems. You’ll architect cloud infrastructure, standardize CI/CD, enforce DevSecOps controls, and drive observability at scale—ensuring high availability, performance, and compliance consistent with BFSI standards.
Key Responsibilities
Platform & Cloud Infrastructure
- Design, implement, and scale multi-account, multi-VPC cloud architectures on AWS and/or Azure (compute, networking, storage, IAM, RDS, EKS/AKS, Load Balancers, CDN).
- Champion Infrastructure as Code (IaC) using Terraform (and optionally Pulumi/Crossplane) with GitOps workflows for repeatable, auditable deployments.
- Lead capacity planning, cost optimization, and performance tuning across environments.
CI/CD & Release Engineering
- Build and standardize CI/CD pipelines (Jenkins, GitHub Actions, Azure DevOps, ArgoCD) for microservices, data services, and frontends; enable blue‑green/canary releases and feature flags.
- Drive artifact management, environment promotion, and release governance with compliance-friendly controls.
Containers, Kubernetes & Runtime
- Operate production-grade Kubernetes (EKS/AKS), including cluster lifecycle, autoscaling, ingress, service mesh, and workload security; manage Docker/containerd images and registries.
Reliability, Observability & Incident Management
- Implement end-to-end monitoring, logging, and tracing (Prometheus, Grafana, ELK/EFK, CloudWatch/Log Analytics, Datadog/New Relic) with SLO/SLI error budgets.
- Establish on-call rotations, run postmortems, and continuously improve MTTR and change failure rate.
Security & Compliance (DevSecOps)
- Enforce cloud and container hardening, secrets management (AWS Secrets Manager / HashiCorp Vault), vulnerability scanning (Snyk/SonarQube), and policy-as-code (OPA/Conftest).
- Partner with infosec/risk to meet BFSI regulatory expectations for DR/BCP, audits, and data protection.
Data, Networking & Edge
- Optimize networking (DNS, TCP/IP, routing, OSI layers) and edge delivery (CloudFront/Fastly), including WAF rules and caching strategies.
- Support persistence layers (MySQL, Elasticsearch, DynamoDB) for performance and reliability.
Ways of Working & Leadership
- Lead cross-functional squads (Product, Engineering, Data, Risk) and mentor junior DevOps/SREs.
- Document runbooks, architecture diagrams, and operating procedures; drive automation-first culture.
Must‑Have Qualifications
- 8–10 years of total experience with 5+ years hands-on in DevOps/SRE roles.
- Strong expertise in AWS and/or Azure, Linux administration, Kubernetes, Docker, and Terraform.
- Proven track record building CI/CD with Jenkins/GitHub Actions/Azure DevOps/ArgoCD.
- Solid grasp of networking fundamentals (DNS, TLS, TCP/IP, routing, load balancing).
- Experience implementing observability stacks and responding to production incidents.
- Scripting in Bash/Python; ability to automate ops workflows and platform tasks.
- Good‑to‑Have / Preferred
- Exposure to BFSI/fintech systems and compliance standards; DR/BCP planning.
- Secrets management (Vault), policy-as-code (OPA), and security scanning (Snyk/SonarQube).
- Experience with GitOps patterns, service tiering, and SLO/SLI design. [illbeback.ai]
- Knowledge of CDNs (CloudFront/Fastly) and edge caching/WAF rule authoring.
- Education
- Bachelor’s/Master’s in Computer Science, Information Technology, or related field (or equivalent experience).

About Tradelab Technologies
About
TradeLab is a leading fintech technology provider, delivering cutting-edge solutions to brokers, banks, and fintech platforms. Our portfolio includes high-performance Order & Risk Management Systems (ORMS), seamless MetaTrader integrations, AI-driven customer engagement platforms such as PULSE LLaVA, and compliance-grade risk management solutions. With a proven track record of successful deployments at top-tier brokerages and financial institutions, TradeLab combines scalability, regulatory alignment, and innovation to redefine digital broking and empower clients in the capital markets ecosystem.
Candid answers by the company
Bangalore/Mumbai
Similar jobs
ROLES AND RESPONSIBILITIES:
- Plan, schedule, and manage all releases across product and customer projects.
- Define and maintain the release calendar, identifying dependencies and managing risks proactively.
- Partner with engineering, QA, DevOps, and product management to ensure release readiness.
- Create release documentation (notes, guides, videos) for both internal stakeholders and customers.
- Run a release review process with product leads before publishing.
- Publish releases and updates to the company website release section.
- Drive communication of release details to internal teams and customers in a clear, concise way.
- Manage post-release validation and rollback procedures when required.
- Continuously improve release management through automation, tooling, and process refinement.
IDEAL CANDIDATE:
- 3+ years of experience in Release Management, DevOps, or related roles.
- Strong knowledge of CI/CD pipelines, source control (Git), and build/deployment practices.
- Experience creating release documentation and customer-facing content (videos, notes, FAQs).
- Excellent communication and stakeholder management skills; able to translate technical changes into business impact.
- Familiarity with SaaS, iPaaS, or enterprise software environments is a strong plus.
PERKS, BENEFITS AND WORK CULTURE:
- Competitive salary package.
- Opportunity to learn from and work with senior leadership & founders.
- Build solutions for large enterprises that move from concept to real-world impact.
- Exceptional career growth pathways in a highly innovative and rapidly scaling environment.
Required Skills: Advanced AWS Infrastructure Expertise, CI/CD Pipeline Automation, Monitoring, Observability & Incident Management, Security, Networking & Risk Management, Infrastructure as Code & Scripting
Criteria:
- 5+ years of DevOps/SRE experience in cloud-native, product-based companies (B2C scale preferred)
- Strong hands-on AWS expertise across core and advanced services (EC2, ECS/EKS, Lambda, S3, CloudFront, RDS, VPC, IAM, ELB/ALB, Route53)
- Proven experience designing high-availability, fault-tolerant cloud architectures for large-scale traffic
- Strong experience building & maintaining CI/CD pipelines (Jenkins mandatory; GitHub Actions/GitLab CI a plus)
- Prior experience running production-grade microservices deployments and automated rollout strategies (Blue/Green, Canary)
- Hands-on experience with monitoring & observability tools (Grafana, Prometheus, ELK, CloudWatch, New Relic, etc.)
- Solid hands-on experience with MongoDB in production, including performance tuning, indexing & replication
- Strong scripting skills (Bash, Shell, Python) for automation
- Hands-on experience with IaC (Terraform, CloudFormation, or Ansible)
- Deep understanding of networking fundamentals (VPC, subnets, routing, NAT, security groups)
- Strong experience in incident management, root cause analysis & production firefighting
Description
Role Overview
Company is seeking an experienced Senior DevOps Engineer to design, build, and optimize cloud infrastructure on AWS, automate CI/CD pipelines, implement monitoring and security frameworks, and proactively identify scalability challenges. This role requires someone who has hands-on experience running infrastructure at B2C product scale, ideally in media/OTT or high-traffic applications.
Key Responsibilities
1. Cloud Infrastructure — AWS (Primary Focus)
- Architect, deploy, and manage scalable infrastructure using AWS services such as EC2, ECS/EKS, Lambda, S3, CloudFront, RDS, ELB/ALB, VPC, IAM, Route53, etc.
- Optimize cloud cost, resource utilization, and performance across environments.
- Design high-availability, fault-tolerant systems for streaming workloads.
2. CI/CD Automation
- Build and maintain CI/CD pipelines using Jenkins, GitHub Actions, or GitLab CI.
- Automate deployments for microservices, mobile apps, and backend APIs.
- Implement blue/green and canary deployments for seamless production rollouts.
3. Observability & Monitoring
- Implement logging, metrics, and alerting using tools like Grafana, Prometheus, ELK, CloudWatch, New Relic, etc.
- Perform proactive performance analysis to minimize downtime and bottlenecks.
- Set up dashboards for real-time visibility into system health and user traffic spikes.
4. Security, Compliance & Risk Highlighting
• Conduct frequent risk assessments and identify vulnerabilities in:
o Cloud architecture
o Access policies (IAM)
o Secrets & key management
o Data flows & network exposure
• Implement security best practices including VPC isolation, WAF rules, firewall policies, and SSL/TLS management.
5. Scalability & Reliability Engineering
- Analyze traffic patterns for OTT-specific load variations (weekends, new releases, peak hours).
- Identify scalability gaps and propose solutions across:
- o Microservices
- o Caching layers
- o CDN distribution (CloudFront)
- o Database workloads
- Perform capacity planning and load testing to ensure readiness for 10x traffic growth.
6. Database & Storage Support
- Administer and optimize MongoDB for high-read/low-latency use cases.
- Design backup, recovery, and data replication strategies.
- Work closely with backend teams to tune query performance and indexing.
7. Automation & Infrastructure as Code
- Implement IaC using Terraform, CloudFormation, or Ansible.
- Automate repetitive infrastructure tasks to ensure consistency across environments.
Required Skills & Experience
Technical Must-Haves
- 5+ years of DevOps/SRE experience in cloud-native, product-based companies.
- Strong hands-on experience with AWS (core and advanced services).
- Expertise in Jenkins CI/CD pipelines.
- Solid background working with MongoDB in production environments.
- Good understanding of networking: VPCs, subnets, security groups, NAT, routing.
- Strong scripting experience (Bash, Python, Shell).
- Experience handling risk identification, root cause analysis, and incident management.
Nice to Have
- Experience with OTT, video streaming, media, or any content-heavy product environments.
- Familiarity with containers (Docker), orchestration (Kubernetes/EKS), and service mesh.
- Understanding of CDN, caching, and streaming pipelines.
Personality & Mindset
- Strong sense of ownership and urgency—DevOps is mission critical at OTT scale.
- Proactive problem solver with ability to think about long-term scalability.
- Comfortable working with cross-functional engineering teams.
Why Join company?
• Build and operate infrastructure powering millions of monthly users.
• Opportunity to shape DevOps culture and cloud architecture from the ground up.
• High-impact role in a fast-scaling Indian OTT product.
● Good understanding of how the web works
● Experience with at least one language like Java, Python etc
● Good with Shell scripting
● Experience with *Nix based operating systems
● Experience with k8s, containers
● Fairly good understanding of AWS/GCP/Azure
● Troubleshoot and fix outages and performance issues in infrastructure stack
● Identify gap and design automation tools for all feasible functions in infrastructure
● Good verbal and written communication skills
● Drive SLA/SLO of team
Benefits
This is an opportunity to work on a fairly complex set of systems and improve
them. You will get a chance to learn things like “how to think about code
simplicity”, “how to write for maintainability” and several other things.
● Comprehensive health insurance policy.
● Flexible working hours and a very friendly work environment.
● Flexibility to work either in the office (post Covid) or remotely.
Job Description:
• Drive end-to-end automation from GitHub/GitLab/BitBucket to Deployment,
Observability and Enabling the SRE activities
• Guide operations support (setup, configuration, management, troubleshooting) of
digital platforms and applications
• Solid understanding of DevSecOps Workflows that support CI, CS, CD, CM, CT.
• Deploy, configure, and manage SaaS and PaaS cloud platform and applications
• Provide Level 1 (OS, patching) and Level 2 (app server instance troubleshooting)
• DevOps programming: writing scripts, building operations/server instance/app/DB
monitoring tools Set up / manage continuous build and dev project management
environment: JenkinX/GitHub Actions/Tekton, Git, Jira Designing secure networks,
systems, and application architectures
• Collaborating with cross-functional teams to ensure secure product development
• Disaster recovery, network forensics analysis, and pen-testing solutions
• Planning, researching, and developing security policies, standards, and procedures
• Awareness training of the workforce on information security standards, policies, and
best practices
• Installation and use of firewalls, data encryption and other security products and
procedures
• Maturity in understanding compliance, policy and cloud governance and ability to
identify and execute automation.
• At Wesco, we discuss more about solutions than problems. We celebrate innovation
and creativity.
About us:
HappyFox is a software-as-a-service (SaaS) support platform. We offer an enterprise-grade help desk ticketing system and intuitively designed live chat software.
We serve over 12,000 companies in 70+ countries. HappyFox is used by companies that span across education, media, e-commerce, retail, information technology, manufacturing, non-profit, government and many other verticals that have an internal or external support function.
To know more, Visit! - https://www.happyfox.com/
Responsibilities
- Build and scale production infrastructure in AWS for the HappyFox platform and its products.
- Research, Build/Implement systems, services and tooling to improve uptime, reliability and maintainability of our backend infrastructure. And to meet our internal SLOs and customer-facing SLAs.
- Implement consistent observability, deployment and IaC setups
- Lead incident management and actively respond to escalations/incidents in the production environment from customers and the support team.
- Hire/Mentor other Infrastructure engineers and review their work to continuously ship improvements to production infrastructure and its tooling.
- Build and manage development infrastructure, and CI/CD pipelines for our teams to ship & test code faster.
- Lead infrastructure security audits
Requirements
- At least 7 years of experience in handling/building Production environments in AWS.
- At least 3 years of programming experience in building API/backend services for customer-facing applications in production.
- Proficient in managing/patching servers with Unix-based operating systems like Ubuntu Linux.
- Proficient in writing automation scripts or building infrastructure tools using Python/Ruby/Bash/Golang
- Experience in deploying and managing production Python/NodeJS/Golang applications to AWS EC2, ECS or EKS.
- Experience in security hardening of infrastructure, systems and services.
- Proficient in containerised environments such as Docker, Docker Compose, Kubernetes
- Experience in setting up and managing test/staging environments, and CI/CD pipelines.
- Experience in IaC tools such as Terraform or AWS CDK
- Exposure/Experience in setting up or managing Cloudflare, Qualys and other related tools
- Passion for making systems reliable, maintainable, scalable and secure.
- Excellent verbal and written communication skills to address, escalate and express technical ideas clearly
- Bonus points – Hands-on experience with Nginx, Postgres, Postfix, Redis or Mongo systems.
Kutumb is the first and largest communities platform for Bharat. We are growing at an exponential trajectory. More than 1 Crore users use Kutumb to connect with their community. We are backed by world-class VCs and angel investors. We are growing and looking for exceptional Infrastructure Engineers to join our Engineering team.
More on this here - https://kutumbapp.com/why-join-us.html">https://kutumbapp.com/why-join-us.html
We’re excited if you have:
- Recent experience designing and building unified observability platforms that enable companies to use the sometimes-overwhelming amount of available data (metrics, logs, and traces) to determine quickly if their application or service is operating as desired
- Expertise in deploying and using open-source observability tools in large-scale environments, including Prometheus, Grafana, ELK (ElasticSearch + Logstash + Kibana), Jaeger, Kiali, and/or Loki
- Familiarity with open standards like OpenTelemetry, OpenTracing, and OpenMetrics
- Familiarity with Kubernetes and Istio as the architecture on which the observability platform runs, and how they integrate and scale. Additionally, the ability to contribute improvements back to the joint platform for the benefit of all teams
- Demonstrated customer engagement and collaboration skills to curate custom dashboards and views, and identify and deploy new tools, to meet their requirements
- The drive and self-motivation to understand the intricate details of a complex infrastructure environment
- Using CICD tools to automatically perform canary analysis and roll out changes after passing automated gates (think Argo & keptn)
- Hands-on experience working with AWS
- Bonus points for knowledge of ETL pipelines and Big data architecture
- Great problem-solving skills & takes pride in your work
- Enjoys building scalable and resilient systems, with a focus on systems that are robust by design and suitably monitored
- Abstracting all of the above into as simple of an interface as possible (like Knative) so developers don't need to know about it unless they choose to open the escape hatch
What you’ll be doing:
- Design and build automation around the chosen tools to make onboarding new services easy for developers (dashboards, alerts, traces, etc)
- Demonstrate great communication skills in working with technical and non-technical audiences
- Contribute new open-source tools and/or improvements to existing open-source tools back to the CNCF ecosystem
Tools we use:
Kops, Argo, Prometheus/ Loki/ Grafana, Kubernetes, AWS, MySQL/ PostgreSQL, Apache Druid, Cassandra, Fluentd, Redis, OpenVPN, MongoDB, ELK
What we offer:
- High pace of learning
- Opportunity to build the product from scratch
- High autonomy and ownership
- A great and ambitious team to work with
- Opportunity to work on something that really matters
- Top of the class market salary and meaningful ESOP ownership
Job description
The role requires you to design development pipelines from the ground up, Creation of Docker Files, design and operate highly available systems in AWS Cloud environments. Also involves Configuration Management, Web Services Architectures, DevOps Implementation, Database management, Backups, and Monitoring.
Key responsibility area
- Ensure reliable operation of CI/CD pipelines
- Orchestrate the provisioning, load balancing, configuration, monitoring and billing of resources in the cloud environment in a highly automated manner
- Logging, metrics and alerting management.
- Creation of Bash/Python scripts for automation
- Performing root cause analysis for production errors.
Requirement
- 2 years experience as Team Lead.
- Good Command on kubernetes.
- Proficient in Linux Commands line and troubleshooting.
- Proficient in AWS Services. Deployment, Monitoring and troubleshooting applications in AWS.
- Hands-on experience with CI tooling preferably with Jenkins.
- Proficient in deployment using Ansible.
- Knowledge of infrastructure management tools (Infrastructure as cloud) such as terraform, AWS cloudformation etc.
- Proficient in deployment of applications behind load balancers and proxy servers such as nginx, apache.
- Scripting languages: Bash, Python, Groovy.
- Experience with Logging, Monitoring, and Alerting tools like ELK(Elastic-search, Logstash, Kibana), Nagios. Graylog, splunk Prometheus, Grafana is a plus.
Must Have:
Linux, CI/CD(Jenkin), AWS, Scripting(Bash,shell Python, Go), Ngnix, Docker.
Good to have
Configuration Management(Ansible or similar tool), Logging tool( ELK or similar), Monitoring tool(Ngios or similar), IaC(Terraform, cloudformation).About the client :
Asia’s largest global sports media property in history with a global broadcast to 150+ countries. As the world’s largest martial arts organization, they are a celebration of Asia’s greatest cultural treasure, and its deep-rooted Asian values of integrity, humility, honor, respect, courage, discipline, and compassion. Has achieved some of the highest TV ratings and social media engagement metrics across Asia with its unique brand of Asian values, world-class athletes, and world-class production. Broadcast partners include Turner Sports, Star India, TV Tokyo, Fox Sports, ABS-CBN, Astro, ClaroSports, Bandsports, Startimes, Premier Sports, Thairath TV, Skynet, Mediacorp, OSN, and more. Institutional investors include Sequoia Capital, Temasek Holdings, GIC, Iconiq Capital, Greenoaks Capital, and Mission Holdings. Currently has offices in Singapore, Tokyo, Los Angeles, Shanghai, Milan, Beijing, Bangkok, Manila, Jakarta, and Bangalore.
Position : Devops Engineer – SDE3
As part of the engineering team, you would be expected to have deep technology expertise with a passion for building highly scalable products. This is a unique opportunity where you can impact the lives of people across 150+ countries!
Responsibilities
• Develop Collaborate in large-scale systems design discussions.
• Deploying and maintaining in-house/customer systems ensuring high availability, performance and optimal cost.
• Automate build pipelines. Ensuring right architecture for CI/CD
• Work with engineering leaders to ensure cloud security
• Develop standard operating procedures for various facets of Infrastructure services (CI/CD, Git Branching, SAST, Quality gates, Auto Scaling)
• Perform & automate regular backups of servers & databases. Ensure rollback and restore capabilities are Realtime and with zero-downtime.
• Lead the entire DevOps charter for ONE Championship. Mentor other DevOps engineers. Ensure industry standards are followed.
Requirements
• Overall 5+ years of experience in as DevOps Engineer/Site Reliability Engineer
• B.E/B.Tech in CS or equivalent streams from institute of repute
• Experience in Azure is a must. AWS experience is a plus
• Experience in Kubernetes, Docker, and containers
• Proficiency in developing and deploying fully automated environments using Puppet/Ansible and Terraform
• Experience with monitoring tools like Nagios/Icinga, Prometheus, AlertManager, Newrelic
• Good knowledge of source code control (git)
• Expertise in Continuous Integration and Continuous Deployment setup using Azure Pipeline or Jenkins
• Strong experience in programming languages. Python is preferred
• Experience in scripting and unit testing
• Basic knowledge of SQL & NoSQL databases
• Strong Linux fundamentals
• Experience in SonarQube, Locust & Browserstack is a plus
- 5+ years hands-on experience with designing, deploying and managing core AWS services and infrastructure
- Proficiency in scripting using Bash, Python, Ruby, Groovy, or similar languages
- Experience in source control management, specifically with Git
- Hands-on experience in Unix/Linux and bash scripting
- Experience building, managing Helm-based build and release CI-CD pipelines for Kubernetes platforms (EKS, Openshift, GKE)
- Strong experience with orchestration and config management tools such as Terraform, Ansible or Cloudformation
- Ability to debug, analyze issues leveraging tools like App Dynamics, New Relic and Sumologic
- Knowledge of Agile Methodologies and principles
- Good writing and documentation skills
- Strong collaborator with the ability to work well with core teammates and our colleagues across STS












