
Site Reliability Engineer
at Our client company is into Computer software. (YB1)
- Design, develop, test, debug and maintain components of the cloud infrastructure
- Manage operational priorities of the DBaaS
- Establish a process for handling and leading response to security new vulnerabilities
- Lead certification efforts from the security perspective
- Participate in penetration testing efforts
- Design and build DBaaS processes for key management, rotation storage, encryption, and password management
Requirements:
- Strong software design and implementation skills in building infrastructure frameworks
- Experience building and operating extensible, scalable resilient data systems
- Working knowledge of Java and Python Experience using public cloud infrastructure (AWS, GCP, or Azure)
- Containerization tooling (Docker, EKS, Kubernetes)
- Infrastructure as Code Tooling (Example: Terraform, Cloudformation, Etc.)
- Configuration Management Tooling (Ansible, Chef, etc.)
- Automation Scripting (Python preferred)
- Solid understanding of basic systems operations (disk, network, etc)
- Willingness and ability to learn new languages and concepts
- 5+ years of relevant experience

Similar jobs

Job Title: Sr Dev Ops Engineer
Location: Bengaluru- India (Hybrid work type)
Reports to: Sr Engineer manager
About Our Client :
We are a solution-based, fast-paced tech company with a team that thrives on collaboration and innovative thinking. Our Client's IoT solutions provide real-time visibility and actionable insights for logistics and supply chain management. Cloud-based, AI-enhanced metrics coupled with patented hardware optimize processes, inform strategic decision making and enable intelligent supply chains without the costly infrastructure
About the role : We're looking for a passionate DevOps Engineer to optimize our software delivery and infrastructure. You'll build and maintain CI/CD pipelines for our microservices, automate infrastructure, and ensure our systems are reliable, scalable, and secure. If you thrive on enhancing performance and fostering operational excellence, this role is for you.
What You'll Do 🛠️
- Cloud Platform Management: Administer and optimize AWS resources, ensuring efficient billing and cost management.
- Billing & Cost Optimization: Monitor and optimize cloud spending.
- Containerization & Orchestration: Deploy and manage applications and orchestrate them.
- Database Management: Deploy, manage, and optimize database instances and their lifecycles.
- Authentication Solutions: Implement and manage authentication systems.
- Backup & Recovery: Implement robust backup and disaster recovery strategies, for Kubernetes cluster and database backups.
- Monitoring & Alerting: Set up and maintain robust systems using tools for application and infrastructure health and integrate with billing dashboards.
- Automation & Scripting: Automate repetitive tasks and infrastructure provisioning.
- Security & Reliability: Implement best practices and ensure system performance and security across all deployments.
- Collaboration & Support: Work closely with development teams, providing DevOps expertise and support for their various application stacks.
What You'll Bring 💼
- Minimum of 4 years of experience in a DevOps or SRE role.
- Strong proficiency in AWS Cloud, including services like Lambda, IoT Core, ElastiCache, CloudFront, and S3.
- Solid understanding of Linux fundamentals and command-line tools.
- Extensive experience with CI/CD tools, GitLab CI.
- Hands-on experience with Docker and Kubernetes, specifically AWS EKS.
- Proven experience deploying and managing microservices.
- Expertise in database deployment, optimization, and lifecycle management (MongoDB, PostgreSQL, and Redis).
- Experience with Identity and Access management solutions like Keycloak.
- Experience implementing backup and recovery solutions.
- Familiarity with optimizing scaling, ideally with Karpenter.
- Proficiency in scripting (Python, Bash).
- Experience with monitoring tools such as Prometheus, Grafana, AWS CloudWatch, Elastic Stack.
- Excellent problem-solving and communication skills.
Bonus Points ➕
- Basic understanding of MQTT or general IoT concepts and protocols.
- Direct experience optimizing React.js (Next.js), Node.js (Express.js, Nest.js) or Python (Flask) deployments in a containerized environment.
- Knowledge of specific AWS services relevant to application stacks.
- Contributions to open-source projects related to Kubernetes, MongoDB, or any of the mentioned frameworks.
- AWS Certifications (AWS Certified DevOps Engineer, AWS Certified Solutions Architect, AWS Certified SysOps Administrator, AWS Certified Advanced Networking).
Why this role:
•You will help build the company from the ground up—shaping our culture and having an impact from Day 1 as part of the foundational team.
-
Working with Ruby, Python, Perl, and Java
-
Troubleshooting and having working knowledge of various tools, open-source technologies, and cloud services.
-
Configuring and managing databases and cache layers such as MySQL, Mongo, Elasticsearch, Redis
-
Setting up all databases and for optimisations (sharding, replication, shell scripting etc)
-
Creating user, Domain handling, Service handling, Backup management, Port management, SSL services
-
Planning, testing & development of IT Infrastructure ( Server configuration and Database) and handling the technical issue related to server Docker and VM optimization
-
Demonstrate awareness of DB management, server related work, Elasticsearch.
-
Selecting and deploying appropriate CI/CD tools
-
Striving for continuous improvement and build continuous integration, continuous development, and constant deployment pipeline (CI/CD Pipeline)
-
Experience working on Linux based infrastructure
-
Awareness of critical concepts in DevOps and Agile principles
-
6-8 years of experience
This company is a network of the world's best developers - full-time, long-term remote software jobs with better compensation and career growth. We enable our clients to accelerate their Cloud Offering, and Capitalize on Cloud. We have our own IOT/AI platform and we provide professional services on that platform to build custom clouds for their IOT devices. We also build mobile apps, run 24x7 devops/site reliability engineering for our clients.
We are looking for very hands-on SRE (Site Reliability Engineering) engineers with 3 to 6 years of experience. The person will be part of team that is responsible for designing & implementing automation from scratch for medium to large scale cloud infrastructure and providing 24x7 services to our North American / European customers. This also includes ensuring ~100% uptime for almost 50+ internal sites. The person is expected to deliver with both high speed and high quality as well as work for 40 Hours per week (~6.5 hours per day, 6 days per week) in shifts which will rotate every month.
This person MUST have:
- B.E Computer Science or equivalent
- 2+ Years of hands-on experience troubleshooting/setting up of the Linux environment, who can write shell scripts for any given requirement.
- 1+ Years of hands-on experience setting up/configuring AWS or GCP services from SCRATCH and maintaining them.
- 1+ Years of hands-on experience setting up/configuring Kubernetes & EKS and ensuring high availability of container orchestration.
- 1+ Years of hands-on experience setting up CICD from SCRATCH in Jenkins & Gitlab.
- Experience configuring/maintaining one monitoring tool.
- Excellent verbal & written communication skills.
- Candidates with certifications - AWS, GCP, CKA, etc will be preferred
- Hands-on experience with databases (Cassandra, MongoDB, MySQL, RDS).
Experience:
- Min 3 years of experience as SRE automation engineer building, running, and maintaining production sites. Not looking for candidates who have experience only as L1/L2 or Build & Deploy..
Location:
- Remotely, anywhere in India
Timings:
- The person is expected to deliver with both high speed and high quality as well as work for 40 Hours per week (~6.5 hours per day, 6 days per week) in shifts which will rotate every month.
Position:
- Full time/Direct
- We have great benefits such as PF, medical insurance, 12 annual company holidays, 12 PTO leaves per year, annual increments, Diwali bonus, spot bonuses and other incentives etc.
- We dont believe in locking in people with large notice periods. You will stay here because you love the company. We have only a 15 days notice period.
- 2+ years of demonstrable experience leading site reliability and performance in large-scale, high-traffic environments
- 2+ years of hands-on experience as a DevOps engineer
- Strong leadership, communication and interpersonal skills geared to getting things done
- Developing themselves and the talent within their charge – fostering and creating opportunity for the team
- Strong understanding of SRE concepts and the DevOps culture. Set the direction and strategy for your team, and help shape the overall SRE program for the company
- Be able to lead complicated technical issues and communicating status updates/RCA with management and customers.
- Own site stability, performance, capacity planning, DevOps recruitment.


