
Site Reliability Engineer (Platform Reliability & Uptime)
Location: Bangalore
Experience: 2–5 years
Type: Full-time | On-site
Start: Immediate
Why this role exists
Most systems don’t fail because of one big outage.
They fail because reliability is treated as an afterthought.
Right now, uptime depends too much on individual heroics.
That doesn’t scale.
This role exists to build a reliability system where:
- Uptime is predictable
- Failures are contained
- Escalations don’t depend on leadership
What you’ll do
You will not just monitor systems.
You will own reliability as a product.
1. Drive uptime to production-grade reliability
- Improve system uptime to 99.9% customer-facing SLA within 4 months
- Define and track:
- SLAs / SLOs / error budgets
- Ensure reliability is measured from the customer’s perspective, not internal metrics
2. Build incident response as a system
- Set up a 24/7 incident response rotation across 3 engineers
- Eliminate dependency on leadership (no single escalation point)
- Define:
- Incident severity levels
- Response playbooks
- Escalation protocols
- Ensure fast detection → containment → resolution
3. Contain and fix erratic system behavior
- Identify and resolve:
- Latency spikes
- Downtime incidents
- Integration failures
- Build guardrails to prevent recurrence
- Focus on root cause elimination, not temporary fixes
4. Create continuous reliability feedback loops
- Work closely with engineering teams to:
- Surface recurring failure patterns
- Improve build quality
- Reduce production bugs
- Ensure learnings from incidents directly improve future releases
5. Improve observability and monitoring
- Build dashboards and alerts for:
- System health
- Performance metrics
- Failure signals
- Ensure issues are detected before customers report them
6. Reduce operational fragility
- Remove single points of failure (people, systems, workflows)
- Improve system resilience across:
- Deployments
- Integrations
- Runtime environments
What success looks like
- Uptime reaches 99.9%+ reliably
- Incidents are:
- Detected early
- Contained quickly
- Resolved permanently
- No dependency on a single individual for escalation
- System behavior becomes predictable and stable
- Engineering teams ship with higher reliability confidence
Who you are
- You have 2-5 years of experience in SRE / DevOps / backend systems
- You have worked on production systems with real uptime expectations
- You think in:
- Systems
- Failure modes
- Trade-offs
- You are comfortable debugging live, high-pressure environments
What will make you stand out
- Experience with:
- Distributed systems
- Cloud infrastructure (AWS / Azure / GCP)
- Monitoring & alerting tools
- Have built or improved:
- Incident response systems
- Reliability frameworks
- Strong debugging skills across:
- Infra
- Application
- Integrations
Compensation
₹60,000/month (fixed)
(Aligned with role scope and impact expectations)
Why join
- You will define reliability standards for a production AI platform
- Your work directly impacts:
- Customer trust
- Product performance
- Enterprise readiness
- You will move the system from reactive → predictable
What this role is not
- Not just monitoring dashboards
- Not limited to handling tickets
- Not dependent on escalation to leadership
What this role is
- A builder of reliability systems
- A guardian of uptime and performance
- A multiplier of engineering quality
One question to self-evaluate
Can you build a system where downtime is rare, predictable, and never dependent on a single person?

About Agentic Universe
About
Company social profiles
Similar jobs
Wissen Technology is hiring for Devops engineer
Required:
-4 to 10 years of relevant experience in Devops
-Must have hands on experience on AWS, Kubernetes, CI/CD pipeline
-Good to have exposure on Github or Gitlab
-Open to work from hashtag Chennai
-Work mode will be Hybrid
Company profile:
Company Name : Wissen Technology
Group of companies in India : Wissen Technology & Wissen Infotech
Work Location - Chennai
Website : www.wissen.com
Wissen Thought leadership : https://lnkd.in/gvH6VBaU
LinkedIn: https://lnkd.in/gnK-vXjF
Sr.DevOps Engineer (5 to 8 yrs. Exp.)
Location: Ahmedabad
- Strong Experience in Infrastructure provisioning in cloud using Terraform & AWS CloudFormation Templates.
- Strong Experience in Serverless Containerization technologies such as Kubernetes, Docker etc.
- Strong Experience in Jenkins & AWS Native CI/CD implementation using code
- Strong Experience in Cloud operational automation using Python, Shell script, AWS CLI, AWS Systems Manager, AWS Lamnda, etc.
- Day to Day AWS Cloud administration tasks
- Strong Experience in Configuration management using Ansible and PowerShell.
- Strong Experience in Linux and any scripting language must required.
- Knowledge of Monitoring tool will be added advantage.
- Understanding of DevOps practices which involves Continuous Integration, Delivery and Deployment.
- Hands on with application deployment process
Key Skills: AWS, terraform, Serverless, Jenkins,Devops,CI/CD,Python,CLI,Linux,Git,Kubernetes
Role: Software Developer
Industry Type: IT-Software, Software Services
FunctionalArea:ITSoftware- Application Programming, Maintenance
Employment Type: Full Time, Permanent
Education: Any computer graduate.
Salary: Best in Industry.- Recommend a migration and consolidation strategy for DevOps tools
- Design and implement an Agile work management approach
- Make a quality strategy
- Design a secure development process
- Create a tool integration strategy
We are a self organized engineering team with a passion for programming and solving business problems for our customers. We are looking to expand our team capabilities on the DevOps front and are on a lookout for 4 DevOps professionals having relevant hands on technical experience of 4-8 years.
We encourage our team to continuously learn new technologies and apply the learnings in the day to day work even if the new technologies are not adopted. We strive to continuously improve our DevOps practices and expertise to form a solid backbone for the product, customer relationships and sales teams which enables them to add new customers every week to our financing network.
As a DevOps Engineer, you :
- Will work collaboratively with the engineering and customer support teams to deploy and operate our systems.
- Build and maintain tools for deployment, monitoring and operations.
- Help automate and streamline our operations and processes.
- Troubleshoot and resolve issues in our test and production environments.
- Take control of various mandates and change management processes to ensure compliance for various certifications (PCI and ISO 27001 in particular)
- Monitor and optimize the usage of various cloud services.
- Setup and enforce CI/CD processes and practices
Skills required :
- Strong experience with AWS services (EC2, ECS, ELB, S3, SES, to name a few)
- Strong background in Linux/Unix administration and hardening
- Experience with automation using Ansible, Terraform or equivalent
- Experience with continuous integration and continuous deployment tools (Jenkins)
- Experience with container related technologies (docker, lxc, rkt, docker swarm, kubernetes)
- Working understanding of code and script (Python, Perl, Ruby, Java)
- Working understanding of SQL and databases
- Working understanding of version control system (GIT is preferred)
- Managing IT operations, setting up best practices and tuning them from time-totime.
- Ensuring that process overheads do not reduce the productivity and effectiveness of small team. - Willingness to explore and learn new technologies and continuously refactor thetools and processes.
We are looking for a DevOps Engineer for managing the interchange of data between the server and the users. Your primary responsibility will be the development of all server-side logic, definition, and maintenance of the central database, and ensuring high performance and responsiveness to request from the frontend. You will also be responsible for integrating the front-end elements built by your co-workers into the application. Therefore, a basic understanding of frontend technologies is necessary as well.
What we are looking for
- Must have strong knowledge of Kubernetes and Helm3
- Should have previous experience in Dockerizing the applications.
- Should be able to automate manual tasks using Shell or Python
- Should have good working knowledge on AWS and GCP clouds
- Should have previous experience working on Bitbucket, Github, or any other VCS.
- Must be able to write Jenkins Pipelines and have working knowledge on GitOps and ArgoCD.
- Have hands-on experience in Proactive monitoring using tools like NewRelic, Prometheus, Grafana, Fluentbit, etc.
- Should have a good understanding of ELK Stack.
- Exposure on Jira, confluence, and Sprints.
What you will do:
- Mentor junior Devops engineers and improve the team’s bar
- Primary owner of tech best practices, tech processes, DevOps initiatives, and timelines
- Oversight of all server environments, from Dev through Production.
- Responsible for the automation and configuration management
- Provides stable environments for quality delivery
- Assist with day-to-day issue management.
- Take lead in containerising microservices
- Develop deployment strategies that allow DevOps engineers to successfully deploy code in any environment.
- Enables the automation of CI/CD
- Implement dashboard to monitors various
- 1-3 years of experience in DevOps
- Experience in setting up front end best practices
- Working in high growth startups
- Ownership and Be Proactive.
- Mentorship & upskilling mindset.
- systems and applications
what you’ll get- Health Benefits
- Innovation-driven culture
- Smart and fun team to work with
- Friends for life
Hands on Experience with Linux administration
Experience using Python or Shell scripting (for Automation)
Hands-on experience with Implementation of CI/CD Processes
Experience working with one cloud platforms (AWS or Azure or Google)
Experience working with configuration management tools such as Ansible & Chef
Experience working with Containerization tool Docker.
Experience working with Container Orchestration tool Kubernetes.
Experience in source Control Management including SVN and/or Bitbucket
& GitHub
Experience with setup & management of monitoring tools like Nagios, Sensu & Prometheus or any other popular tools
Hands-on experience in Linux, Scripting Language & AWS is mandatory
Troubleshoot and Triage development, Production issues
About the Company
- 💰 Early-stage, ed-tech, funded, growing, growing fast
- 🎯 Mission Driven: Make Indonesia competitive on a global scale
- 🥅 Build the best educational content and technology to advance STEM education
- 🥇 Students-First approach
- 🇮🇩 🇮🇳 Teams in India and Indonesia
Skillset 🧗🏼♀️
- You primarily identify as a DevOps/Infrastructure engineer and are comfortable working with systems and cloud-native services on AWS
- You can design, implement, and maintain secure and scalable infrastructure delivering cloud-based services
- You have experience operating and maintaining production systems in a Linux based public cloud environment
- You are familiar with cloud-native concepts - Containers, Lambdas, Orchestration (ECS, Kubernetes)
- You’re in love with system metrics and strive to help deliver improvements to systems all the time
- You can think in terms of Infrastructure as Code to build tools for automating deployment, monitoring, and operations of the platform
- You can be on-call once every few weeks to provide application support, incident management, and troubleshooting
- You’re fairly comfortable with GIT, AWS CLI, python, docker CLI, in general, all things CLI. Oh! Bash scripting too!
- You have high integrity, and you are reliable
What you can expect from us 👌🏼
☮️ Mentorship, growth, great work culture
- Mentorship and continuous improvement are a part of the team’s DNA. We have a battle-tested robust growth framework. You will have people to look up to and people looking up to you
- We are a people-first, high-trust, high-autonomy team
- We live in the TDD, Pair Programming, First Principles world
🌏 Remote done right
- Distributed does not mean working in isolation, feeling alone, being buried in Zoom calls
- Our leadership team has been WFH for 10+ years now and we know how remote teams work. This will be a place to belong
- A good balance between deep focussed work and collaborative work ⚖️
🖥️ Friendly, humane interview process
- 30-minute alignment check and screening call
- A short take-home coding assignment, no more than 2-3 hours. Time is precious
- Pair programming interview. Collaborate, work together. No sitting behind a desk and judging
- In-depth engineering discussion around your skills and career so far
- System design and architecture interview for seniors
What we ask from you👇🏼
- Bring your software engineering — both individual brilliance and collaborative skills
- Bring your good nature — we're building a team that supports each other
- Be vested or interested in the company vision
Must-Have’s:
- Hands-on DevOps (Git, Ansible, Terraform, Jenkins, Python/Ruby)
Job Description:
- Knowledge on what is a DevOps CI/CD Pipeline
- Understanding of version control systems like Git, including branching and merging strategies
- Knowledge of what is continuous delivery and integration tools like Jenkins, Github
- Knowledge developing code using Ruby or Python and Java or PHP
- Knowledge writing Unix Shell (bash, ksh) scripts
- Knowledge of what is automation/configuration management using Ansible, Terraform, Chef or Puppet
- Experience and willingness to keep learning in a Linux environment
- Ability to provide after-hours support as needed for emergency or urgent situations
Nice to have’s:
- Proficient with container based products like docker and Kubernetes
- Excellent communication skills (verbal and written)
- Able to work in a team and be a team player
- Knowledge of PHP, MySQL, Apache and other open source software
- BA/BS in computer science or similar
2. Extensive expertise in the below in AWS Development.
3. Amazon Dynamo Db, Amazon RDS , Amazon APIs. AWS Elastic Beanstalk, and AWS Cloud Formation.
4. Lambda, Kinesis. CodeCommit ,CodePipeline.
5. Leveraging AWS SDKs to interact with AWS services from the application.
6. Writing code that optimizes performance of AWS services used by the application.
7. Developing with Restful API interfaces.
8. Code-level application security (IAM roles, credentials, encryption, etc.).
9. Programming Language Python or .NET. Programming with AWS APIs.
10. General troubleshooting and debugging.








