
Site Reliability Engineer (Platform Reliability & Uptime)
Location: Bangalore
Experience: 2–5 years
Type: Full-time | On-site
Start: Immediate
Why this role exists
Most systems don’t fail because of one big outage.
They fail because reliability is treated as an afterthought.
Right now, uptime depends too much on individual heroics.
That doesn’t scale.
This role exists to build a reliability system where:
- Uptime is predictable
- Failures are contained
- Escalations don’t depend on leadership
What you’ll do
You will not just monitor systems.
You will own reliability as a product.
1. Drive uptime to production-grade reliability
- Improve system uptime to 99.9% customer-facing SLA within 4 months
- Define and track:
- SLAs / SLOs / error budgets
- Ensure reliability is measured from the customer’s perspective, not internal metrics
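To make the 99.9% target concrete, the sketch below converts an availability SLO into an error budget (allowed downtime) for a measurement window. The function names `error_budget` and `budget_remaining` and the 30-day window are illustrative assumptions, not part of the role definition.

```python
from datetime import timedelta


def error_budget(slo: float, window: timedelta) -> timedelta:
    """Return the downtime allowed in `window` for an availability target `slo`.

    `slo` is a fraction, e.g. 0.999 for 99.9%. (Hypothetical helper for illustration.)
    """
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be in the interval (0, 1]")
    return window * (1.0 - slo)


def budget_remaining(slo: float, window: timedelta, downtime: timedelta) -> timedelta:
    """Return the unspent error budget; a negative value means the SLO was missed."""
    return error_budget(slo, window) - downtime


if __name__ == "__main__":
    window = timedelta(days=30)  # assumed measurement window
    budget = error_budget(0.999, window)
    print(f"Allowed downtime: {budget.total_seconds() / 60:.1f} minutes")
    left = budget_remaining(0.999, window, timedelta(minutes=20))
    print(f"Remaining after 20 minutes of downtime: {left.total_seconds() / 60:.1f} minutes")
```

Under these assumptions, a 99.9% target over 30 days leaves roughly 43.2 minutes of customer-visible downtime, which is the figure incidents are measured against.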
2. Build incident response as a system
- Set up a 24/7 incident response rotation across 3 engineers
- Eliminate dependency on leadership (no single escalation point)
- Define:
- Incident severity levels
- Response playbooks
- Escalation protocols
- Ensure fast detection → containment → resolution
3. Contain and fix erratic system behavior
- Identify and resolve:
- Latency spikes
- Downtime incidents
- Integration failures
- Build guardrails to prevent recurrence
- Focus on root cause elimination, not temporary fixes
4. Create continuous reliability feedback loops
- Work closely with engineering teams to:
- Surface recurring failure patterns
- Improve build quality
- Reduce production bugs
- Ensure learnings from incidents directly improve future releases
5. Improve observability and monitoring
- Build dashboards and alerts for:
- System health
- Performance metrics
- Failure signals
- Ensure issues are detected before customers report them
6. Reduce operational fragility
- Remove single points of failure (people, systems, workflows)
- Improve system resilience across:
- Deployments
- Integrations
- Runtime environments
What success looks like
- Uptime reaches 99.9%+ reliably
- Incidents are:
- Detected early
- Contained quickly
- Resolved permanently
- No dependency on a single individual for escalation
- System behavior becomes predictable and stable
- Engineering teams ship with higher reliability confidence
Who you are
- You have 2-5 years of experience in SRE / DevOps / backend systems
- You have worked on production systems with real uptime expectations
- You think in:
- Systems
- Failure modes
- Trade-offs
- You are comfortable debugging live, high-pressure environments
What will make you stand out
- Experience with:
- Distributed systems
- Cloud infrastructure (AWS / Azure / GCP)
- Monitoring & alerting tools
- Have built or improved:
- Incident response systems
- Reliability frameworks
- Strong debugging skills across:
- Infra
- Application
- Integrations
Compensation
₹60,000/month (fixed)
(Aligned with role scope and impact expectations)
Why join
- You will define reliability standards for a production AI platform
- Your work directly impacts:
- Customer trust
- Product performance
- Enterprise readiness
- You will move the system from reactive → predictable
What this role is not
- Not just monitoring dashboards
- Not limited to handling tickets
- Not dependent on escalation to leadership
What this role is
- A builder of reliability systems
- A guardian of uptime and performance
- A multiplier of engineering quality
One question to self-evaluate
Can you build a system where downtime is rare, predictable, and never dependent on a single person?

About Agentic Universe
About Simbian
Simbian is at the forefront of cybersecurity innovation, leveraging purpose-built AI Agents to deliver 10x security outcomes for global enterprises and MSSPs. Our platform autonomously investigates and responds to alerts, freeing security teams from repetitive tasks. Simbian combines privacy-first technology, proven integration with 70+ enterprise tools, and rapid deployment for measurable value.
Role Overview
We are seeking a collaborative, innovative DevOps Engineer passionate about enabling secure, scalable operations for cutting-edge cybersecurity products. Join our team during a period of high growth and help architect the future of agentic AI security platforms.
Key Responsibilities
• Kubernetes Management:
o Manage and maintain production-grade Kubernetes clusters across multiple cloud providers (AWS is essential, Azure is valuable, GCP is a plus).
o Deploy, upgrade, troubleshoot, and scale stateful and stateless workloads (NGINX, Postgres, MongoDB, OpenCTI, OpenSearch, Kafka, Hadoop, Fluentd) in Kubernetes.
• Cloud Operations:
o Operate and optimize cloud environments, with strong expertise in AWS (AWS Certified Solutions Architect Professional or equivalent Azure cert preferred).
o Design, deploy, and manage infrastructure on AWS and Azure (GCP optional).
• SQL Database Management:
o Administer SQL databases, ideally Postgres, on Kubernetes clusters or cloud VMs.
o Perform routine maintenance, backups, upgrades, monitoring, and optimization.
• Infrastructure as Code:
o Build, install, upgrade, and maintain Helm charts with expertise.
o Use and understand Ansible for cloud automation (AWS/Azure), and Terraform for infrastructure provisioning.
• Monitoring, Logging, Observability:
o Implement and manage logging and metrics stacks using OpenSearch/Elasticsearch, Prometheus, Grafana, Thanos or similar open source tools.
• Programming & Scripting:
o Develop automation scripts in Bash (proficient with control structures).
o Produce scripts or microservices in Node.js (preferred) or Python/Django (bonus).
• CI/CD:
o Build and maintain CI/CD pipelines preferably using GitHub Actions (Jenkins or equivalent is acceptable).
• Containerization:
o Create, manage, and troubleshoot Docker/Podman containers, images, volumes, and use Docker Compose for local development.
• Customer-Facing On-Prem Deployments (Bonus):
o Install, configure, and support Kubernetes on customer premises.
o Demonstrate ownership, initiative, and strong customer communication skills.
o Solid knowledge of Linux administration, networking, and cloud environments.
What You’ll Bring:
• 4+ years’ experience in DevOps, SRE, or Production Engineering.
• Mastery of Kubernetes, AWS, infrastructure automation, and database management.
• Strong collaborative, curious, and growth-driven mindset.
• Ability to challenge ideas, drive innovation, and embrace rapid change.
• Excellent communication for technical customer interactions.
Why Join Simbian?
• Work with pioneering agentic AI security—impact global security teams.
• Shape infrastructure for privacy-first technology in a high-growth startup.
• Enjoy a dynamic remote-first work culture with opportunities for ownership and advancement.
Job Description:
• Drive end-to-end automation from GitHub/GitLab/Bitbucket through deployment, observability, and enabling SRE activities
• Guide operations support (setup, configuration, management, troubleshooting) of digital platforms and applications
• Solid understanding of DevSecOps workflows that support CI, CS, CD, CM, CT
• Deploy, configure, and manage SaaS and PaaS cloud platforms and applications
• Provide Level 1 (OS, patching) and Level 2 (app server instance troubleshooting) support
• DevOps programming: writing scripts and building operations/server instance/app/DB monitoring tools
• Set up and manage continuous build and dev project management environments: Jenkins X/GitHub Actions/Tekton, Git, Jira
• Designing secure networks, systems, and application architectures
• Collaborating with cross-functional teams to ensure secure product development
• Disaster recovery, network forensics analysis, and pen-testing solutions
• Planning, researching, and developing security policies, standards, and procedures
• Awareness training of the workforce on information security standards, policies, and best practices
• Installation and use of firewalls, data encryption, and other security products and procedures
• Maturity in understanding compliance, policy, and cloud governance, and the ability to identify and execute automation
• At Wesco, we discuss solutions more than problems. We celebrate innovation and creativity.
We are a boutique IT services & solutions firm headquartered in the Bay Area with offices in India. Our offering includes custom-configured hybrid cloud solutions backed by our managed services. We combine best-in-class DevOps and IT infrastructure management practices to manage our clients' hybrid cloud environments.
In addition, we build and deploy our private cloud solutions using OpenStack to provide our clients with a secure, cost-effective, and scalable hybrid cloud solution. We work with start-ups as well as enterprise clients.
This is an exciting opportunity for an experienced Cloud Engineer to work on exciting projects and have an opportunity to expand their knowledge working on adjacent technologies as well.
Must have skills
• Provisioning skills on IaaS cloud computing for platforms such as AWS, Azure, GCP.
• Strong working experience with AWS services and implementations (e.g. VPCs, SES, EC2, S3, Route 53, CloudFront)
• Ability to design solutions based on client requirements.
• Some experience with network LAN/WAN appliances (Cisco routers and ASA systems, Barracuda, Meraki, SilverPeak, Palo Alto, Fortinet, etc.)
• Understanding of networked storage (NFS / SMB / iSCSI / Storage GW / Windows Offline)
• Linux / Windows server installation, maintenance, monitoring, data backup and recovery, security, and administration.
• Good knowledge of TCP/IP protocol & internet technologies.
• Passion for innovation and problem solving, in a start-up environment.
• Good communication skills.
Good to have
• Remote Monitoring & Management.
• Familiarity with Kubernetes and Containers.
• Exposure to DevOps automation scripts and experience with tools like Git, Bash scripting, PowerShell, AWS CloudFormation, Ansible, Chef, or Puppet is a plus.
• Architect / Practitioner certification from OEM with hands-on capabilities.
What you will be working on
• Troubleshoot and handle L2/L3 tickets.
• Design and architect Enterprise Cloud systems and services.
• Design, Build and Maintain environments primarily in AWS using EC2, S3/Storage, CloudFront, VPC, ELB, Auto Scaling, Direct Connect, Route53, Firewall, etc.
• Build and deploy in GCP/ Azure as needed.
• Architect cloud solutions keeping performance, cost and BCP considerations in mind.
• Plan cloud migration projects as needed.
• Collaborate & work as part of a cohesive team.
• Help build our private cloud offering on OpenStack.
FINTECH CANDIDATES ONLY
About the job:
Emint is a fintech startup with the mission to ‘Make the best investing product that Indian consumers love to use, with simplicity & intelligence at the core’. We are creating a platform that gives a holistic view of market dynamics, which helps our users make smart & disciplined investment decisions. Emint is founded by a stellar team of individuals who come with decades of experience investing in Indian & global markets. We are building a team of highly skilled & disciplined professionals and are looking for equally motivated individuals to be part of Emint. We are currently looking to hire a DevOps engineer to join our team in Bangalore.
Job Description :
Must Have:
• Hands-on experience with AWS DevOps
• Experience in Unix with Bash scripting is a must
• Experience working with Kubernetes and Docker
• Experience with GitLab, GitHub, or Bitbucket, and Artifactory
• Packaging, deployment
• CI/CD pipeline experience (Jenkins is preferable)
• CI/CD best practices
Good to Have:
• Startup Experience
• Knowledge of source code management guidelines
• Experience with deployment tools like Ansible/Puppet/Chef is preferable
• IAM knowledge
• Coding knowledge of Python adds value
• Test automation setup experience
Qualifications:
• Bachelor's degree or equivalent experience in Computer Science or related field
• Graduates from IIT / NIT/ BITS / IIIT preferred
• Professionals with fintech (stock broking / banking) experience preferred
• Experience in building & scaling B2C apps preferred
- Seeking an individual with around 5+ years of experience.
- Must-have skills: Jenkins, Groovy, Ansible, shell scripting, Python, Linux administration
- Deep knowledge of Terraform and AWS to automate and provision EC2, EBS, and SQL Server; cost optimization; CI/CD pipelines using Jenkins. Serverless automation is a plus.
- Excellent writing and communication skills in English. Enjoys writing crisp and understandable documentation.
- Comfortable programming in one or more scripting languages
- Enjoys tinkering with tooling. Finds easier ways to handle systems through research. Strong awareness of build vs. buy.
knowledge of EC2, RDS and S3.
● Good command of Linux environment
● Experience with tools such as Docker, Kubernetes, Redis, Node.js and Nginx server configuration and deployment, Kafka, Elasticsearch, Ansible, Terraform, etc.
● Bonus: AWS certification
● Bonus: Basic understanding of database queries for relational databases such as MySQL
● Bonus: Experience with CI servers such as Jenkins, Travis, or similar
● Bonus: Demonstrated programming capability in a high-level programming language such as Python, Go, or similar
● Develop, maintain, and administer tools that automate operational activities and improve engineering productivity
● Automate continuous delivery and on-demand capacity management solutions
● Develop configuration and infrastructure solutions for internal deployments
● Troubleshoot, diagnose, and fix software issues
● Update, track, and resolve technical issues
● Suggest architecture improvements and recommend process improvements
● Evaluate new technology options and vendor products
● Ensure critical system security through the use of best-in-class security solutions
● Technical experience in this or a similar role supporting large-scale production distributed systems
● Must understand the overall system architecture, improve design, and implement new processes
Numerator is looking for an experienced, talented and quick-thinking DevOps Manager to join our team and work with the Global DevOps groups to keep infrastructure up to date and continuously advancing. This is a unique opportunity where you will get the chance to work on the infrastructure of both established and greenfield products. Our technology harnesses consumer-related data in many ways, including gamified mobile apps, sophisticated web crawling and enhanced Deep Learning algorithms, to deliver an unmatched view of the consumer shopping experience. As a member of the Numerator DevOps Engineering team, you will make an immediate impact as you help build out and expand our technology platforms from on-premise to the cloud across a wide range of software ecosystems. Many of your daily tasks and engagements with application teams will help shape how new projects are delivered at scale to meet our clients' demands. This role requires a balance between hands-on infrastructure-as-code deployments with application teams and working with the Global DevOps Team to roll out new initiatives.
What you will get to do
Requirements
Nice to have
- Have 3+ years of experience in Python development
- Be familiar with common database access patterns
- Have experience with designing systems and monitoring metrics, looking at graphs.
- Have knowledge of AWS, Kubernetes and Docker.
- Be able to work well in a remote development environment.
- Be able to communicate in English at a native speaking and writing level.
- Be responsible to your fellow remote team members.
- Be highly communicative and go out of your way to contribute to the team and help others
Objectives of this Role
- Improve reliability, quality, and time-to-market of our suite of software solutions
- Run the production environment by monitoring availability and taking a holistic view of system health
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
- Provide primary operational support and engineering for multiple large distributed software applications
- Participate in system design consulting, platform management, and capacity planning
- Languages: Python, Java, Ruby DSL, Bash
- Databases: MySQL, Cassandra, Elasticsearch
- Deployment: AWS CloudFormation
Essential Criteria:
- 8 or more years administering production Linux systems in a 24x7 environment
- 3 or more years’ experience in a DevOps/ SRE role as an engineer or technical lead
- At least 1 year of team leadership experience
- Significant knowledge of Amazon Web Services (CLI/APIs, EC2, EBS, S3, VPCs, IAM, AWS Lambda)
- Experience deploying services into containerized orchestration environments such as Kubernetes
- Experience with infrastructure automation tools like CloudFormation, Terraform, etc.
- Experience with at least one of Python, Bash, Ruby, or equivalent
- Experience creating and managing CI/CD pipelines with tools like Jenkins or Spinnaker
- Familiar with version control using Git
- Solid understanding of common security principles
Nice to Have:
- Hands-on experience with serverless architecture, Kubernetes, and Docker preferred
- Strong experience with open-source configuration management tools
- Managing distributed systems spanning multiple AWS regions / data-centers
- Experience with bootstrapping solutions
- Open source contributor
- We’re committed to client success: There are over 6,200 brand and retail websites in the Bazaarvoice network. Our clients represent some of the world’s leading companies across a wide range of industries including retail, apparel, automotive, consumer electronics and travel.
- We’re leaders in consumer-generated content: Each month, more than one billion consumers view and share authentic consumer-generated content, such as ratings and reviews, curated photos, social posts and videos, about products in our network. Thousands upon thousands of reviews are added to the Bazaarvoice network every day.
- Our network delivers: Network analytics provide insights that help marketers and advertisers provide more engaging experiences that drive brand awareness, consideration, sales, and loyalty.
- We’re a great place to work: We pride ourselves on our unique culture. Join a company that values passion, innovation, authenticity, generosity, respect, teamwork, and performance.
DevOps Consultant!! MERN Stack Project Manager – Systems (Enterprise or Solutions) Architect needed!
Hello superstar,
I appreciate you taking time to read this. I have posted a job for developers to work on a start-up, the link is ......
I would need someone with DevOps experience, to ensure that the project is undertaken with the highest standards possible. I have had many experiences where ‘completed’ software after years of development was filled with bugs and it would be more cost-effective to start from scratch than to attempt to find and correct all the bugs.
I have attempted to learn as much as possible, but I now have an opportunity, and it would better serve the venture to have someone handle the management of the project to ensure that:
- We choose the most appropriate technology
- We choose competent developers in those technologies
- The architecture and data modeling are clearly defined in a ‘blueprint’ plan
- A DevOps environment and processes are set up and the developers understand what is required
- Proper tests are carried out to ensure everything works as intended
- There are processes for testers to follow and competent testers are selected to follow them
- Accessibility, localization, and internationalization are planned ahead of time
- Security, scalability, and other future probabilities that I may not even be aware of are considered and planned ahead of time
- Documentation and code reviews, refactoring and other quality assurance processes are undertaken
- Working software is produced, along with systems that enable new developers or teams to easily take over and/or contribute new modules or updates in a controlled and organized fashion
- Cost estimates, budgets, or projections are prepared for the use of SaaS, hosting, and other third-party services and applications
I am more concerned with a professional and world-class organizational system than with any particular type of software being produced, as a strong foundation will enable anything to be created with efficacy and precision.
Again, thank you for reading this, please reply with the word “superstar” anywhere in the second line of your response. I look forward to hearing from you.
Warm wishes DevOps Evangelist,







