Job Title: Cloud Operations Engineer (SRE)
Location: Bangalore, India
Experience: 5 – 12 years
Notice Period: Candidates with a short notice period preferred
About Us:
At OUR CUSTOMER, we are on a mission to empower broadband operators to deliver an exceptional connected home experience for their subscribers. As the most widely deployed provider of Smart Wi-Fi solutions, we enhance user experiences in over 35 million homes worldwide.
Our portfolio includes Smart Wi-Fi software, a cloud-based experience management platform, and advanced data-driven solutions. We provide customized engineering and testing services to help broadband operators deliver high-quality, seamless connectivity.
Join us in building resilient, scalable, and secure cloud solutions that drive the future of broadband!
Job Overview:
We are seeking a highly skilled Cloud Operations Engineer (SRE) with expertise in AWS cloud infrastructure, automation, monitoring, and performance optimization. You will be responsible for ensuring the reliability, scalability, security, and efficiency of our cloud-based applications while driving automation and operational excellence.
This role requires deep knowledge of AWS services, Infrastructure as Code (IaC), monitoring, troubleshooting, and DevOps best practices.
Key Responsibilities:
1. AWS Infrastructure Management:
- Design, implement, and manage scalable, secure, and high-performance AWS cloud environments.
- Optimize cost, performance, and reliability across cloud services.
2. Site Reliability Engineering (SRE):
- Ensure high availability and performance of cloud environments through monitoring, automation, and incident response.
- Implement self-healing and fault-tolerant architectures.
3. Monitoring & Operations:
- Deploy and manage monitoring and logging tools (AWS CloudWatch, Datadog, ELK, Prometheus, Grafana).
- Define SLIs, SLOs, and SLAs for cloud services.
- Proactively analyze performance trends and prevent outages.
4. Automation & Infrastructure as Code (IaC):
- Automate cloud provisioning and management using Terraform, AWS CloudFormation, or Ansible.
- Build and maintain CI/CD pipelines for seamless deployments.
- Use Python, Bash, or other scripting languages to automate operational tasks.
5. Incident Management & Troubleshooting:
- Respond to incidents, outages, and performance issues in a timely manner.
- Conduct root cause analysis (RCA) and implement preventive measures.
- Document incidents and create post-mortem reports.
6. Security & Compliance:
- Implement AWS security best practices, including IAM policies, network security, and encryption.
- Ensure compliance with industry regulations (GDPR, HIPAA, etc.).
- Regularly audit and enhance cloud security posture.
7. Backup, Disaster Recovery & Capacity Planning:
- Develop and manage backup strategies, disaster recovery plans, and high-availability architectures.
- Plan for future capacity needs, scaling resources based on demand.
8. Collaboration & Documentation:
- Work closely with Development, QA, Security, and Product teams to streamline operations.
- Create and maintain detailed technical documentation, architecture diagrams, and operational runbooks.
Qualifications & Skills:
Mandatory:
✅ Education: Bachelor’s degree in Computer Science, Information Technology, or a related field.
✅ Experience: 5+ years as a Cloud SRE, Operations Engineer, or DevOps Engineer with a focus on AWS services.
✅ Cloud Expertise: Strong hands-on experience with AWS services (EC2, S3, RDS, Lambda, ECS, EKS, VPC, Route 53, IAM, etc.).
✅ Infrastructure as Code (IaC): Proficiency with Terraform, AWS CloudFormation, or Ansible.
✅ Linux & Automation: Strong knowledge of Linux/Unix system administration, shell scripting, and automation using Python or Bash.
✅ Monitoring & Logging: Experience with CloudWatch, Datadog, Prometheus, the ELK stack, and Grafana.
✅ CI/CD & DevOps: Hands-on experience with Jenkins, GitLab CI/CD, CircleCI, or equivalent tools.
✅ Networking: Solid understanding of DNS, Load Balancing, VPNs, Firewalls, and Network Security.
✅ Problem-Solving: Strong analytical skills to troubleshoot and resolve cloud infrastructure issues.
✅ Communication: Excellent verbal and written communication skills to collaborate across teams.
Preferred:
⭐ AWS Certification (AWS Certified Solutions Architect, AWS Certified DevOps Engineer).
⭐ Experience with containerization (Docker, Kubernetes, EKS, Fargate).
⭐ Knowledge of security best practices and compliance requirements (GDPR, HIPAA, etc.).
⭐ Familiarity with GitOps, service mesh technologies, and serverless architectures.
⭐ Experience working in Agile & DevOps environments.