Site Reliability Engineer
at Our client company is into Computer software. (YB1)
- Design, develop, test, debug and maintain components of the cloud infrastructure
- Manage operational priorities of the DBaaS
- Establish a process for handling and leading response to security new vulnerabilities
- Lead certification efforts from the security perspective
- Participate in penetration testing efforts
- Design and build DBaaS processes for key management, rotation storage, encryption, and password management
Requirements:
- Strong software design and implementation skills in building infrastructure frameworks
- Experience building and operating extensible, scalable resilient data systems
- Working knowledge of Java and Python Experience using public cloud infrastructure (AWS, GCP, or Azure)
- Containerization tooling (Docker, EKS, Kubernetes)
- Infrastructure as Code Tooling (Example: Terraform, Cloudformation, Etc.)
- Configuration Management Tooling (Ansible, Chef, etc.)
- Automation Scripting (Python preferred)
- Solid understanding of basic systems operations (disk, network, etc)
- Willingness and ability to learn new languages and concepts
- 5+ years of relevant experience
Similar jobs
-
Working with Ruby, Python, Perl, and Java
-
Troubleshooting and having working knowledge of various tools, open-source technologies, and cloud services.
-
Configuring and managing databases and cache layers such as MySQL, Mongo, Elasticsearch, Redis
-
Setting up all databases and for optimisations (sharding, replication, shell scripting etc)
-
Creating user, Domain handling, Service handling, Backup management, Port management, SSL services
-
Planning, testing & development of IT Infrastructure ( Server configuration and Database) and handling the technical issue related to server Docker and VM optimization
-
Demonstrate awareness of DB management, server related work, Elasticsearch.
-
Selecting and deploying appropriate CI/CD tools
-
Striving for continuous improvement and build continuous integration, continuous development, and constant deployment pipeline (CI/CD Pipeline)
-
Experience working on Linux based infrastructure
-
Awareness of critical concepts in DevOps and Agile principles
-
6-8 years of experience
This company is a network of the world's best developers - full-time, long-term remote software jobs with better compensation and career growth. We enable our clients to accelerate their Cloud Offering, and Capitalize on Cloud. We have our own IOT/AI platform and we provide professional services on that platform to build custom clouds for their IOT devices. We also build mobile apps, run 24x7 devops/site reliability engineering for our clients.
We are looking for very hands-on SRE (Site Reliability Engineering) engineers with 3 to 6 years of experience. The person will be part of team that is responsible for designing & implementing automation from scratch for medium to large scale cloud infrastructure and providing 24x7 services to our North American / European customers. This also includes ensuring ~100% uptime for almost 50+ internal sites. The person is expected to deliver with both high speed and high quality as well as work for 40 Hours per week (~6.5 hours per day, 6 days per week) in shifts which will rotate every month.
This person MUST have:
- B.E Computer Science or equivalent
- 2+ Years of hands-on experience troubleshooting/setting up of the Linux environment, who can write shell scripts for any given requirement.
- 1+ Years of hands-on experience setting up/configuring AWS or GCP services from SCRATCH and maintaining them.
- 1+ Years of hands-on experience setting up/configuring Kubernetes & EKS and ensuring high availability of container orchestration.
- 1+ Years of hands-on experience setting up CICD from SCRATCH in Jenkins & Gitlab.
- Experience configuring/maintaining one monitoring tool.
- Excellent verbal & written communication skills.
- Candidates with certifications - AWS, GCP, CKA, etc will be preferred
- Hands-on experience with databases (Cassandra, MongoDB, MySQL, RDS).
Experience:
- Min 3 years of experience as SRE automation engineer building, running, and maintaining production sites. Not looking for candidates who have experience only as L1/L2 or Build & Deploy..
Location:
- Remotely, anywhere in India
Timings:
- The person is expected to deliver with both high speed and high quality as well as work for 40 Hours per week (~6.5 hours per day, 6 days per week) in shifts which will rotate every month.
Position:
- Full time/Direct
- We have great benefits such as PF, medical insurance, 12 annual company holidays, 12 PTO leaves per year, annual increments, Diwali bonus, spot bonuses and other incentives etc.
- We dont believe in locking in people with large notice periods. You will stay here because you love the company. We have only a 15 days notice period.
- 2+ years of demonstrable experience leading site reliability and performance in large-scale, high-traffic environments
- 2+ years of hands-on experience as a DevOps engineer
- Strong leadership, communication and interpersonal skills geared to getting things done
- Developing themselves and the talent within their charge – fostering and creating opportunity for the team
- Strong understanding of SRE concepts and the DevOps culture. Set the direction and strategy for your team, and help shape the overall SRE program for the company
- Be able to lead complicated technical issues and communicating status updates/RCA with management and customers.
- Own site stability, performance, capacity planning, DevOps recruitment.