
Site Reliability Engineer
at Our client company is into Computer software. (YB1)
- Design, develop, test, debug and maintain components of the cloud infrastructure
- Manage operational priorities of the DBaaS
- Establish a process for handling and leading response to security new vulnerabilities
- Lead certification efforts from the security perspective
- Participate in penetration testing efforts
- Design and build DBaaS processes for key management, rotation storage, encryption, and password management
Requirements:
- Strong software design and implementation skills in building infrastructure frameworks
- Experience building and operating extensible, scalable resilient data systems
- Working knowledge of Java and Python Experience using public cloud infrastructure (AWS, GCP, or Azure)
- Containerization tooling (Docker, EKS, Kubernetes)
- Infrastructure as Code Tooling (Example: Terraform, Cloudformation, Etc.)
- Configuration Management Tooling (Ansible, Chef, etc.)
- Automation Scripting (Python preferred)
- Solid understanding of basic systems operations (disk, network, etc)
- Willingness and ability to learn new languages and concepts
- 5+ years of relevant experience

Similar jobs
Role: DevOps Engineer
Experience: 7+ Years
Location: Pune / Trivandrum
Work Mode: Hybrid
𝐊𝐞𝐲 𝐑𝐞𝐬𝐩𝐨𝐧𝐬𝐢𝐛𝐢𝐥𝐢𝐭𝐢𝐞𝐬:
- Drive CI/CD pipelines for microservices and cloud architectures
- Design and operate cloud-native platforms (AWS/Azure)
- Manage Kubernetes/OpenShift clusters and containerized applications
- Develop automated pipelines and infrastructure scripts
- Collaborate with cross-functional teams on DevOps best practices
- Mentor development teams on continuous delivery and reliability
- Handle incident management, troubleshooting, and root cause analysis
𝐌𝐚𝐧𝐝𝐚𝐭𝐨𝐫𝐲 𝐒𝐤𝐢𝐥𝐥𝐬:
- 7+ years in DevOps/SRE roles
- Strong experience with AWS or Azure
- Hands-on with Docker, Kubernetes, and/or OpenShift
- Proficiency in Jenkins, Git, Maven, JIRA
- Strong scripting skills (Shell, Python, Perl, Ruby, JavaScript)
- Solid networking knowledge and troubleshooting skills
- Excellent communication and collaboration abilities
𝐏𝐫𝐞𝐟𝐞𝐫𝐫𝐞𝐝 𝐒𝐤𝐢𝐥𝐥𝐬:
- Experience with Helm, monitoring tools (Splunk, Grafana, New Relic, Datadog)
- Knowledge of Microservices and SOA architectures
- Familiarity with database technologies

Job Title: DevOps - 3
Roles and Responsibilities:
- Develop deep understanding of the end-to-end configurations, dependencies, customer requirements, and overall characteristics of the production services as the accountable owner for overall service operations
- Implementing best practices, challenging the status quo, and tab on industry and technical trends, changes, and developments to ensure the team is always striving for best-in-class work
- Lead incident response efforts, working closely with cross-functional teams to resolve issues quickly and minimize downtime. Implement effective incident management processes and post-incident reviews
- Participate in on-call rotation responsibilities, ensuring timely identification and resolution of infrastructure issues
- Possess expertise in designing and implementing capacity plans, accurately estimating costs and efforts for infrastructure needs.
- Systems and Infrastructure maintenance and ownership for production environments, with a continued focus on improving efficiencies, availability, and supportability through automation and well defined runbooks
- Provide mentorship and guidance to a team of DevOps engineers, fostering a collaborative and high-performing work environment. Mentor team members in best practices, technologies, and methodologies.
- Design for Reliability - Architect & implement solutions that keeps Infrastructure running with Always On availability and ensures high uptime SLA for the Infrastructure
- Manage individual project priorities, deadlines, and deliverables related to your technical expertise and assigned domains
- Collaborate with Product & Information Security teams to ensure the integrity and security of Infrastructure and applications. Implement security best practices and compliance standards.
Must Haves
- 5-8 years of experience as Devops / SRE / Platform Engineer.
- Strong expertise in automating Infrastructure provisioning and configuration using tools like Ansible, Packer, Terraform, Docker, Helm Charts etc.
- Strong skills in network services such as DNS, TLS/SSL, HTTP, etc
- Expertise in managing large-scale cloud infrastructure (preferably AWS and Oracle)
- Expertise in managing production grade Kubernetes clusters
- Experience in scripting using programming languages like Bash, Python, etc.
- Expertise in skill sets for centralized logging systems, metrics, and tooling frameworks such as ELK, Prometheus/VictoriaMetrics, and Grafana etc.
- Experience in Managing and building High scale API Gateway, Service Mesh, etc
- Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive
- Have a working knowledge of a backend programming language
- Deep knowledge & experience with Unix / Linux operating systems internals (Eg. filesystems, user management, etc)
- A working knowledge and deep understanding of cloud security concepts
- Proven track record of driving results and delivering high-quality solutions in a fast-paced environment
- Demonstrated ability to communicate clearly with both technical and non-technical project stakeholders, with the ability to work effectively in a cross-functional team environment.
-
Working with Ruby, Python, Perl, and Java
-
Troubleshooting and having working knowledge of various tools, open-source technologies, and cloud services.
-
Configuring and managing databases and cache layers such as MySQL, Mongo, Elasticsearch, Redis
-
Setting up all databases and for optimisations (sharding, replication, shell scripting etc)
-
Creating user, Domain handling, Service handling, Backup management, Port management, SSL services
-
Planning, testing & development of IT Infrastructure ( Server configuration and Database) and handling the technical issue related to server Docker and VM optimization
-
Demonstrate awareness of DB management, server related work, Elasticsearch.
-
Selecting and deploying appropriate CI/CD tools
-
Striving for continuous improvement and build continuous integration, continuous development, and constant deployment pipeline (CI/CD Pipeline)
-
Experience working on Linux based infrastructure
-
Awareness of critical concepts in DevOps and Agile principles
-
6-8 years of experience
This company is a network of the world's best developers - full-time, long-term remote software jobs with better compensation and career growth. We enable our clients to accelerate their Cloud Offering, and Capitalize on Cloud. We have our own IOT/AI platform and we provide professional services on that platform to build custom clouds for their IOT devices. We also build mobile apps, run 24x7 devops/site reliability engineering for our clients.
We are looking for very hands-on SRE (Site Reliability Engineering) engineers with 3 to 6 years of experience. The person will be part of team that is responsible for designing & implementing automation from scratch for medium to large scale cloud infrastructure and providing 24x7 services to our North American / European customers. This also includes ensuring ~100% uptime for almost 50+ internal sites. The person is expected to deliver with both high speed and high quality as well as work for 40 Hours per week (~6.5 hours per day, 6 days per week) in shifts which will rotate every month.
This person MUST have:
- B.E Computer Science or equivalent
- 2+ Years of hands-on experience troubleshooting/setting up of the Linux environment, who can write shell scripts for any given requirement.
- 1+ Years of hands-on experience setting up/configuring AWS or GCP services from SCRATCH and maintaining them.
- 1+ Years of hands-on experience setting up/configuring Kubernetes & EKS and ensuring high availability of container orchestration.
- 1+ Years of hands-on experience setting up CICD from SCRATCH in Jenkins & Gitlab.
- Experience configuring/maintaining one monitoring tool.
- Excellent verbal & written communication skills.
- Candidates with certifications - AWS, GCP, CKA, etc will be preferred
- Hands-on experience with databases (Cassandra, MongoDB, MySQL, RDS).
Experience:
- Min 3 years of experience as SRE automation engineer building, running, and maintaining production sites. Not looking for candidates who have experience only as L1/L2 or Build & Deploy..
Location:
- Remotely, anywhere in India
Timings:
- The person is expected to deliver with both high speed and high quality as well as work for 40 Hours per week (~6.5 hours per day, 6 days per week) in shifts which will rotate every month.
Position:
- Full time/Direct
- We have great benefits such as PF, medical insurance, 12 annual company holidays, 12 PTO leaves per year, annual increments, Diwali bonus, spot bonuses and other incentives etc.
- We dont believe in locking in people with large notice periods. You will stay here because you love the company. We have only a 15 days notice period.
- 2+ years of demonstrable experience leading site reliability and performance in large-scale, high-traffic environments
- 2+ years of hands-on experience as a DevOps engineer
- Strong leadership, communication and interpersonal skills geared to getting things done
- Developing themselves and the talent within their charge – fostering and creating opportunity for the team
- Strong understanding of SRE concepts and the DevOps culture. Set the direction and strategy for your team, and help shape the overall SRE program for the company
- Be able to lead complicated technical issues and communicating status updates/RCA with management and customers.
- Own site stability, performance, capacity planning, DevOps recruitment.






