Our Engineering teams deliver the VIMANA IIoT platform, which processes and analyzes billions of streaming events in near real time, 24/7, from manufacturing plants all over the world. Our system is deployed both on the manufacturing plant floor and in the cloud (AWS, GCP, Azure). It runs on modern distributed clustering technology using a microservices architecture. Our teams operate in a dynamic, collaborative, agile DevOps culture.
"Site Reliability Engineering (SRE) is what you get when you treat operations as if it's a software problem." [https://sre.google/]
As an SRE, you will build, evolve, test, and operate the infrastructure automation platform that powers our on-prem and cloud services. You will ensure that our staging and production environments operate and perform optimally, and that software is released and deployed efficiently from development to staging to production. This is a hands-on DevOps role with a balance of tool and infrastructure development, including advanced scripting and automation. You will support our internal infrastructure and also provide managed-services support, contribute to product development, and support the entire stack of our systems.
These are the kinds of technologies you will be using. Candidates with experience in them will be preferred (a partial list, in no specific order):
Roles and Responsibilities
and its bottlenecks.
owners to fix it.
avoid that situation.
Experience & Skills
Linux kernel subsystems (memory, storage, network, etc.).
solutions like Microsoft Azure or Google Cloud.
Mesos/Kubernetes is a plus.
- At least 3 years of relevant experience managing development operations
- Hands-on experience with AWS
- Thorough knowledge of setting up release pipelines and managing multiple environments such as Beta, Staging, UAT, and Production
- Thorough knowledge of cloud best practices and architecture
- Hands-on experience with benchmarking and performance monitoring
- Identifying various bottlenecks and taking pre-emptive measures to avoid downtime
- Hands-on knowledge of at least one configuration-management toolset: Chef, Puppet, or Ansible
- Hands-on experience with CloudFormation, Terraform, or other Infrastructure as Code tools is a plus
- Thorough experience with shell scripting; should not shy away from learning new technologies or programming languages
- Experience with other cloud providers like Azure and GCP is a plus
- Should be open to R&D on creative ways to improve performance while keeping costs low
What we want the person to do
- Manage, monitor, and provision infrastructure, mainly on AWS
- Will be responsible for maintaining 100% uptime on production servers (Site Reliability)
- Set up a release pipeline for current releases and automate releases for Beta, Staging, and Production
- Maintaining near-production replica environments on Beta and Staging
- Automating Releases and Versioning of Static Assets (Experience with Chef/Puppet/Ansible)
- Should have hands-on experience with build tools such as Jenkins, GitHub Actions, and AWS CodeBuild
- Identify performance gaps and ways to fix them
- Hold weekly meetings with the Engineering team to discuss changes and upgrades, which may relate to code issues or architecture bottlenecks
- Find creative ways to reduce cloud computing costs
- Convert infrastructure deployment and provisioning to Infrastructure as Code for reusability and scaling
About the Role
Dremio’s SREs ensure that our internal and externally visible services have the reliability and uptime appropriate to users' needs, and that they improve quickly. You will be joining a newly formed team that will spearhead our efforts to launch a cloud service. This is an opportunity to join a very fast-growing startup and help build a cloud service from the ground up.
Responsibilities and Ownership
WHY ZYCUS?
Zycus, a global leader in procurement: https://www.zycus.com/newsroom/press-releases.html