
Site Reliability Engineer
at Our client company is into Computer software. (YB1)
- Design, develop, test, debug and maintain components of the cloud infrastructure
- Manage operational priorities of the DBaaS
- Establish a process for handling and leading response to security new vulnerabilities
- Lead certification efforts from the security perspective
- Participate in penetration testing efforts
- Design and build DBaaS processes for key management, rotation storage, encryption, and password management
Requirements:
- Strong software design and implementation skills in building infrastructure frameworks
- Experience building and operating extensible, scalable resilient data systems
- Working knowledge of Java and Python Experience using public cloud infrastructure (AWS, GCP, or Azure)
- Containerization tooling (Docker, EKS, Kubernetes)
- Infrastructure as Code Tooling (Example: Terraform, Cloudformation, Etc.)
- Configuration Management Tooling (Ansible, Chef, etc.)
- Automation Scripting (Python preferred)
- Solid understanding of basic systems operations (disk, network, etc)
- Willingness and ability to learn new languages and concepts
- 5+ years of relevant experience

Similar jobs

Job Title: Sr Dev Ops Engineer
Location: Bengaluru- India (Hybrid work type)
Reports to: Sr Engineer manager
About Our Client :
We are a solution-based, fast-paced tech company with a team that thrives on collaboration and innovative thinking. Our Client's IoT solutions provide real-time visibility and actionable insights for logistics and supply chain management. Cloud-based, AI-enhanced metrics coupled with patented hardware optimize processes, inform strategic decision making and enable intelligent supply chains without the costly infrastructure
About the role : We're looking for a passionate DevOps Engineer to optimize our software delivery and infrastructure. You'll build and maintain CI/CD pipelines for our microservices, automate infrastructure, and ensure our systems are reliable, scalable, and secure. If you thrive on enhancing performance and fostering operational excellence, this role is for you.
What You'll Do 🛠️
- Cloud Platform Management: Administer and optimize AWS resources, ensuring efficient billing and cost management.
- Billing & Cost Optimization: Monitor and optimize cloud spending.
- Containerization & Orchestration: Deploy and manage applications and orchestrate them.
- Database Management: Deploy, manage, and optimize database instances and their lifecycles.
- Authentication Solutions: Implement and manage authentication systems.
- Backup & Recovery: Implement robust backup and disaster recovery strategies, for Kubernetes cluster and database backups.
- Monitoring & Alerting: Set up and maintain robust systems using tools for application and infrastructure health and integrate with billing dashboards.
- Automation & Scripting: Automate repetitive tasks and infrastructure provisioning.
- Security & Reliability: Implement best practices and ensure system performance and security across all deployments.
- Collaboration & Support: Work closely with development teams, providing DevOps expertise and support for their various application stacks.
What You'll Bring 💼
- Minimum of 4 years of experience in a DevOps or SRE role.
- Strong proficiency in AWS Cloud, including services like Lambda, IoT Core, ElastiCache, CloudFront, and S3.
- Solid understanding of Linux fundamentals and command-line tools.
- Extensive experience with CI/CD tools, GitLab CI.
- Hands-on experience with Docker and Kubernetes, specifically AWS EKS.
- Proven experience deploying and managing microservices.
- Expertise in database deployment, optimization, and lifecycle management (MongoDB, PostgreSQL, and Redis).
- Experience with Identity and Access management solutions like Keycloak.
- Experience implementing backup and recovery solutions.
- Familiarity with optimizing scaling, ideally with Karpenter.
- Proficiency in scripting (Python, Bash).
- Experience with monitoring tools such as Prometheus, Grafana, AWS CloudWatch, Elastic Stack.
- Excellent problem-solving and communication skills.
Bonus Points ➕
- Basic understanding of MQTT or general IoT concepts and protocols.
- Direct experience optimizing React.js (Next.js), Node.js (Express.js, Nest.js) or Python (Flask) deployments in a containerized environment.
- Knowledge of specific AWS services relevant to application stacks.
- Contributions to open-source projects related to Kubernetes, MongoDB, or any of the mentioned frameworks.
- AWS Certifications (AWS Certified DevOps Engineer, AWS Certified Solutions Architect, AWS Certified SysOps Administrator, AWS Certified Advanced Networking).
Why this role:
•You will help build the company from the ground up—shaping our culture and having an impact from Day 1 as part of the foundational team.


We are seeking a skilled and detail-oriented SRE Release Engineer to lead and streamline the CI/CD pipeline for our C and Python codebase. You will be responsible for coordinating, automating, and validating biweekly production releases, ensuring operational stability, high deployment velocity, and system reliability.
Requirements
● Bachelor’s degree in Computer Science, Engineering, or related field.
● 3+ years in SRE, DevOps, or release engineering roles.
● Proficiency in CI/CD tooling (e.g., GitHub Actions, Jenkins, GitLab).
● Experience automating deployments for C and Python applications.
● Strong understanding of Git version control, merge/rebase strategies, tagging, and submodules (if used).
● Familiarity with containerization (Docker) and deployment orchestration (e.g.,
Kubernetes, Ansible, or Terraform).
● Solid scripting experience (Python, Bash, or similar).
● Understanding of observability, monitoring, and incident response tooling (e.g.,Prometheus, Grafana, ELK, Sentry).
Preferred Skills
● Experience with release coordination in data networking environments
● Familiarity with build tools like Make, CMake, or Bazel.
● Exposure to artifact management systems (e.g., Artifactory, Nexus).
● Experience deploying to Linux production systems with service uptime guarantees.
Responsibilities
● Own the release process: Plan, coordinate, and execute biweekly software releases across multiple services.
● Automate release pipelines: Build and maintain CI/CD workflows using tools such as GitHub Actions, Jenkins, or GitLab CI.
● Version control: Manage and enforce Git best practices, branching strategies (e.g., Git Flow), tagging, and release versioning.
● Integrate testing frameworks: Ensure automated test coverage (unit, integration,regression) is enforced pre-release.
● Release validation: Develop pre-release verification tools/scripts to validate build integrity and backward compatibility.
● Deployment strategy: Implement and refine blue/green, rolling, or canary deployments in staging and production environments.
● Incident readiness: Partner with SREs to ensure rollback strategies, monitoring, and alerting are release-aware.
● Collaboration: Work closely with developers, QA, and product teams to align on release timelines and feature readiness.
Success Metrics
● Achieve >95% release success rate with minimal hotfix rollbacks.
● Reduce mean release deployment time by 30% within 6 months.
● Maintain a weekly release readiness report with zero critical blockers.
● Enable full traceability of builds from commit to deployment.
Benefits
Enjoy a great environment, great people, and a great package
- Stock Appreciation Rights - Generous pre series-B stock options
- Generous Gratuity Plan - Long service compensation far exceeding Indian statutory requirements
- Health Insurance - Premium health insurance for employee, spouse and children
- Working Hours - Flexible working hours with sole focus on enabling a great work environment
- Work Environment - Work with top industry experts in an environment that fosters co-operation, learning and developing skills
- Make a Difference - We're here because we want to make an impact on the world - we hope you do too!
Why Join RtBrick
Enjoy the excitement of a start-up without the risk!
We're revolutionizing the Internet's backbone by using cutting-edge software development techniques. The internet and, more specifically, broadband networks are among the most world's most critical technologies, that billions of people rely on every day. Rtbrick is revolutionizing the way these networks are constructed, moving away from traditional monolithic routing systems to a more agile, disaggregated infrastructure and distributed edge network functions. This shift mirrors transformations seen in computing and cloud technologies, marking the most profound change in networking since the inception of IP technology.
We're pioneering a cloud-native approach, harnessing the power of container-based software, microservices, a devops philosophy, and warehouse scale tools to drive innovation.
And although RtBrick is a young innovative company, RtBrick stands on solid financial ground: we are already cash-flow positive, backed by major telco investors like Swisscom Ventures and T-Capital, and our solutions are actively deployed by Tier-1 telcos including Deutsche Telekom (Europe's largest carrier), Regional ISPs and City ISPs—with expanding operations across Europe, North America and Asia.
Joining RtBrick offers you the unique thrill of a startup environment, coupled with the security that comes from working in a business with substantial market presence and significant revenue streams.
We'd love you to come and join us so why don't you embrace the opportunity to be part of a team that's not just participating in the market but actively shaping the future of telecommunications worldwide.

Job Title: DevOps - 3
Roles and Responsibilities:
- Develop deep understanding of the end-to-end configurations, dependencies, customer requirements, and overall characteristics of the production services as the accountable owner for overall service operations
- Implementing best practices, challenging the status quo, and tab on industry and technical trends, changes, and developments to ensure the team is always striving for best-in-class work
- Lead incident response efforts, working closely with cross-functional teams to resolve issues quickly and minimize downtime. Implement effective incident management processes and post-incident reviews
- Participate in on-call rotation responsibilities, ensuring timely identification and resolution of infrastructure issues
- Possess expertise in designing and implementing capacity plans, accurately estimating costs and efforts for infrastructure needs.
- Systems and Infrastructure maintenance and ownership for production environments, with a continued focus on improving efficiencies, availability, and supportability through automation and well defined runbooks
- Provide mentorship and guidance to a team of DevOps engineers, fostering a collaborative and high-performing work environment. Mentor team members in best practices, technologies, and methodologies.
- Design for Reliability - Architect & implement solutions that keeps Infrastructure running with Always On availability and ensures high uptime SLA for the Infrastructure
- Manage individual project priorities, deadlines, and deliverables related to your technical expertise and assigned domains
- Collaborate with Product & Information Security teams to ensure the integrity and security of Infrastructure and applications. Implement security best practices and compliance standards.
Must Haves
- 5-8 years of experience as Devops / SRE / Platform Engineer.
- Strong expertise in automating Infrastructure provisioning and configuration using tools like Ansible, Packer, Terraform, Docker, Helm Charts etc.
- Strong skills in network services such as DNS, TLS/SSL, HTTP, etc
- Expertise in managing large-scale cloud infrastructure (preferably AWS and Oracle)
- Expertise in managing production grade Kubernetes clusters
- Experience in scripting using programming languages like Bash, Python, etc.
- Expertise in skill sets for centralized logging systems, metrics, and tooling frameworks such as ELK, Prometheus/VictoriaMetrics, and Grafana etc.
- Experience in Managing and building High scale API Gateway, Service Mesh, etc
- Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive
- Have a working knowledge of a backend programming language
- Deep knowledge & experience with Unix / Linux operating systems internals (Eg. filesystems, user management, etc)
- A working knowledge and deep understanding of cloud security concepts
- Proven track record of driving results and delivering high-quality solutions in a fast-paced environment
- Demonstrated ability to communicate clearly with both technical and non-technical project stakeholders, with the ability to work effectively in a cross-functional team environment.
-
Working with Ruby, Python, Perl, and Java
-
Troubleshooting and having working knowledge of various tools, open-source technologies, and cloud services.
-
Configuring and managing databases and cache layers such as MySQL, Mongo, Elasticsearch, Redis
-
Setting up all databases and for optimisations (sharding, replication, shell scripting etc)
-
Creating user, Domain handling, Service handling, Backup management, Port management, SSL services
-
Planning, testing & development of IT Infrastructure ( Server configuration and Database) and handling the technical issue related to server Docker and VM optimization
-
Demonstrate awareness of DB management, server related work, Elasticsearch.
-
Selecting and deploying appropriate CI/CD tools
-
Striving for continuous improvement and build continuous integration, continuous development, and constant deployment pipeline (CI/CD Pipeline)
-
Experience working on Linux based infrastructure
-
Awareness of critical concepts in DevOps and Agile principles
-
6-8 years of experience
This company is a network of the world's best developers - full-time, long-term remote software jobs with better compensation and career growth. We enable our clients to accelerate their Cloud Offering, and Capitalize on Cloud. We have our own IOT/AI platform and we provide professional services on that platform to build custom clouds for their IOT devices. We also build mobile apps, run 24x7 devops/site reliability engineering for our clients.
We are looking for very hands-on SRE (Site Reliability Engineering) engineers with 3 to 6 years of experience. The person will be part of team that is responsible for designing & implementing automation from scratch for medium to large scale cloud infrastructure and providing 24x7 services to our North American / European customers. This also includes ensuring ~100% uptime for almost 50+ internal sites. The person is expected to deliver with both high speed and high quality as well as work for 40 Hours per week (~6.5 hours per day, 6 days per week) in shifts which will rotate every month.
This person MUST have:
- B.E Computer Science or equivalent
- 2+ Years of hands-on experience troubleshooting/setting up of the Linux environment, who can write shell scripts for any given requirement.
- 1+ Years of hands-on experience setting up/configuring AWS or GCP services from SCRATCH and maintaining them.
- 1+ Years of hands-on experience setting up/configuring Kubernetes & EKS and ensuring high availability of container orchestration.
- 1+ Years of hands-on experience setting up CICD from SCRATCH in Jenkins & Gitlab.
- Experience configuring/maintaining one monitoring tool.
- Excellent verbal & written communication skills.
- Candidates with certifications - AWS, GCP, CKA, etc will be preferred
- Hands-on experience with databases (Cassandra, MongoDB, MySQL, RDS).
Experience:
- Min 3 years of experience as SRE automation engineer building, running, and maintaining production sites. Not looking for candidates who have experience only as L1/L2 or Build & Deploy..
Location:
- Remotely, anywhere in India
Timings:
- The person is expected to deliver with both high speed and high quality as well as work for 40 Hours per week (~6.5 hours per day, 6 days per week) in shifts which will rotate every month.
Position:
- Full time/Direct
- We have great benefits such as PF, medical insurance, 12 annual company holidays, 12 PTO leaves per year, annual increments, Diwali bonus, spot bonuses and other incentives etc.
- We dont believe in locking in people with large notice periods. You will stay here because you love the company. We have only a 15 days notice period.
- 2+ years of demonstrable experience leading site reliability and performance in large-scale, high-traffic environments
- 2+ years of hands-on experience as a DevOps engineer
- Strong leadership, communication and interpersonal skills geared to getting things done
- Developing themselves and the talent within their charge – fostering and creating opportunity for the team
- Strong understanding of SRE concepts and the DevOps culture. Set the direction and strategy for your team, and help shape the overall SRE program for the company
- Be able to lead complicated technical issues and communicating status updates/RCA with management and customers.
- Own site stability, performance, capacity planning, DevOps recruitment.


