Lead Cloud Reliability Engineer
Job Responsibilities
• Lead and manage the Cloud Reliability teams to provide strong Managed Services support to end customers.
• Isolate, troubleshoot, and resolve issues reported by CMS clients in their cloud environments.
• Drive communication with the customer, providing details about the issue, the current steps, the next plan of action, and the ETA.
• Gather clients' requirements related to the use of specific cloud services and assist in setting them up and resolving issues.
• Create SOPs and knowledge articles for the L1 teams to use in resolving common issues.
• Identify recurring issues, perform root cause analysis, and propose and implement preventive actions.
• Follow the change management procedure to identify, record, and implement changes.
• Plan and deploy OS and security patches in Windows/Linux environments and upgrade k8s clusters.
• Identify recurring manual activities and contribute to their automation.
• Provide technical guidance and educate team members on development and operations. Monitor metrics and develop ways to improve them.
• Troubleshoot systems and solve problems across platform and application domains, using a wide variety of open-source technologies and cloud services.
• Build, maintain, and monitor configuration standards.
• Ensure critical system security by using best-in-class cloud security solutions.
Qualifications
• 4-7 years of experience in the Cloud Infrastructure and Operations domains, plus IT operational experience, preferably in a global enterprise environment.
• Specialization in one or two cloud deployment platforms: AWS, GCP.
• Hands-on experience with AWS/GCP services (EKS, ECS, EC2, VPC, RDS, Lambda, GKE, Compute Engine).
• Understanding of one or more programming languages (Python, JavaScript, Ruby, Java, .NET).
• Knowledge of logging and monitoring tools (ELK, Stackdriver, CloudWatch).
• Knowledge of configuration management tools such as Ansible, Terraform, Puppet, Chef.
• Experience working with deployment and orchestration technologies (such as Docker, Kubernetes, Mesos).
• Good analytical, communication, problem-solving, and learning skills.
• Knowledge of programming against cloud platforms such as Google Cloud Platform, and of lean development methodologies.
• Strong service attitude and a commitment to quality.
• Willingness to work in shifts.