Site Reliability Engineering

Posted by Abhimanyu Bhatter
Remote, Noida, Bengaluru (Bangalore), NCR (Delhi | Gurgaon | Noida)
6 - 11 yrs
₹16L - ₹25L / yr (ESOP available)
Skills
Reliability engineering
Cloud Computing
Amazon Web Services (AWS)
Docker
VMware vSphere
OpenStack
OpenShift
Kubernetes
DevOps
Google Cloud Platform (GCP)
Job description
What are we looking for:
● Research, propose and evaluate with a 5-year vision, the architecture, design, technologies,
processes and profiles related to Telco Cloud.
● Participate in the creation of a realistic technical-strategic roadmap of the network to transform
it to Telco Cloud and be prepared for 5G.
● Using your deep technical expertise, you will provide detailed feedback to Product Management
and Engineering, and contribute directly to the platform code base to enhance both the
customer experience of the service and the SRE quality of life.
● The individual must be aware of trends in network infrastructure as well as within the network
engineering and OSS community. What technologies are being developed or launched?
● The individual should stay current with infrastructure trends in the telco network cloud domain.
● Be responsible for the Engineering of Lab and Production Telco Cloud environments, including
patches, upgrades, and reliability and performance improvements.
Required Minimum Qualifications: (Education and Technical Skills/Knowledge)
● Software Engineering degree, MS in Computer Science or equivalent experience
● Years of experience in an SRE, DevOps, development, and/or support-related role
● 0-5 years of professional experience for a junior position
● At least 8 years of professional experience for a senior position
● Unix server administration and tuning: Linux / Red Hat / CentOS / Ubuntu
● Deep knowledge of networking layers 1-4
● Cloud / Virtualization (at least two): Helm, Docker, Kubernetes, AWS, Azure, Google Cloud,
OpenStack, OpenShift, VMware vSphere / Tanzu
● You have in-depth knowledge of cloud storage solutions on top of AWS, GCP, Azure and/or
on-prem private cloud, such as Ceph, CephFS, GlusterFS
● DevOps: Jenkins, Git, Azure DevOps, Ansible, Terraform
● Backend knowledge: Bash, Python, Go (knowledge of other scripting languages is a plus).
● PaaS Level solutions such as Keycloak for IAM, Prometheus, Grafana, ELK, DBaaS (such as MySQL,
Cassandra)
About the Organisation:
The team at Coredge.io is a combination of experienced and young professionals alike having
many years of experience in working with Edge computing, Telecom application development
and Kubernetes. The company has continuously collaborated with the open source community,
universities and major industry players in furthering its goal of providing the industry with an
indispensable tool to offer improved services to its customers. Coredge.io has a global market
presence, with offices in the US and New Delhi, India.
Founded 2020  •  Product  •  20-100 employees  •  Raised funding
Similar jobs
Founded 2009  •  Product  •  20-100 employees  •  Profitable
DevOps
CI/CD
Cloud Computing
Software deployment
Linux/Unix
Monitoring
Virtualization
Remote, Chennai
2 - 4 yrs
Best in industry

Team Description

Our Engineering teams deliver the VIMANA IIoT platform that processes and analyzes billions of streaming events, in near real-time, 24/7 from manufacturing plants all over the world. Our system is deployed on the manufacturing plant floor and the cloud (AWS, GCP, Azure). It runs on the most modern distributed clustering technology utilizing a microservices architecture. Our teams operate in a dynamic, collaborative, agile devops culture.

 

Role Description

"Site Reliability Engineering (SRE) is what you get when you treat operations as if it's a software problem." [https://sre.google/]

 

As SRE, you will be building, evolving, testing, and operating the infrastructure automation platform used to power our on-prem and cloud services. You will ensure that our staging and production environments are operating and performing optimally and that software is released and deployed in an efficient and streamlined manner, from development to staging to production. This is a hands-on devops role with a balanced amount of tool and infrastructure development, including advanced scripting and automation. You will be supporting our internal infrastructure, as well as providing managed services support, product development, and support for the entire stack for our systems.

 

Responsibilities

  • Help to architect availability, latency, scalability, efficiency, and security of VIMANA services.
  • Automate and monitor infrastructure security, including network and instance hardening, network and instance intrusion detection and prevention, and web application firewalls.
  • Provision and maintain the health of the various application and data clusters, supporting instances, and technologies.
  • Ensure high-availability through disaster recovery tests where we use chaos engineering to purposefully fail aspects of the infrastructure to test and improve resiliency to failures.
  • Conduct root cause analysis of production issues including troubleshooting and debugging through complex data-stream pipelines.
  • Drive and contribute to a culture of intolerance to manual activity, which results in an automation environment delivering repeatable and scalable response to system issues.
  • Support the client services team in resolving customer issues related to availability.
  • Own the continuous build and continuous delivery infrastructure.

 

Requirements

  • BS degree in Computer Science or related technical field, or equivalent practical experience.
  • Strong coding ability in at least one of: Java, JavaScript/NodeJS, Python, Go, or Shell scripting.
  • Experience in a highly-complex technical operations environment.
  • Experience with Linux systems administration.
  • Experience in managing Cloud Infrastructure on AWS, GCP, or Azure.
  • Experience in maintaining container orchestration systems such as Kubernetes, Nomad, etc.
  • Sharp and tenacious troubleshooting skills: you can fix anything.
  • Ability to pick up new software, frameworks, and APIs quickly.
  • Solid written and verbal English communications skills.
  • Detail oriented, careful, and skilled at prioritising.
  • Ability to handle periodic on-call duty as well as spider-sense awareness of services' health.

 

Technology

These are the kinds of technologies you will be using. Candidates with experience in these will be preferred (a partial list, in no specific order):

  • Ansible, Terraform.
  • Flux, Rundeck.
  • Kubernetes, Nomad, Docker.
  • AWS, Azure, GCP.
  • Consul, Istio, ZooKeeper.
  • Kafka, Kinesis, ActiveMQ, MQTT, Redis.
  • Elasticsearch, InfluxDB.
  • MongoDB, Cassandra.
  • AWS Athena.
  • Traefik, HAProxy, NGINX.
  • SimianArmy, kube-monkey.
  • Cloud Foundry.
  • ELK stack, Grafana.
  • Elastic Beats, Prometheus.
  • OSSEC, WAZUH.
  • Security Monkey.
  • ModSecurity.
  • TeamCity, Jenkins.
  • GitOps.

About VIMANA

We build products and platforms for the Industrial Internet of Things. Our technology is being used around the world in mission-critical applications - from improving the performance of manufacturing plants, to making electric vehicles safer and more efficient, to making industrial equipment smarter.

Please visit https://govimana.com/ to learn more about what we do.

Why Explore a Career at VIMANA
  • We recognize that our dedicated team members make us successful and we offer competitive salaries.
  • We are a workplace that values work-life balance, provides flexible working hours, and full time remote work options.
  • You will be part of a team that is highly motivated to learn and work on cutting edge technologies, tools, and development practices.
  • Bon Appetit! Enjoy catered breakfasts, lunches and free snacks!

VIMANA Interview Process
We usually aim to complete all the interviews within a week and provide prompt feedback to the candidate. As of now, all interviews are conducted online due to the COVID situation.

1. Telephonic screening (30 min)

A 30-minute telephonic interview to understand and evaluate the candidate's fit with the job role and the company.
Clarify any queries regarding the job/company.
Give an overview of the further interview rounds.

2. Technical Rounds

This is a deep technical round to evaluate the candidate's technical capability pertaining to the job role.

3. HR Round

The candidate's team and cultural fit will be evaluated during this round.

We would proceed with releasing the offer if the candidate clears all the above rounds.

Note: In certain cases, we might schedule additional rounds if needed before releasing the offer.
Job posted by Loshy Chandran
DevOps
CI/CD
Linux administration
Kubernetes
Amazon Web Services (AWS)
Puppet
Chef
Python
Java
Go Programming (Golang)
Bengaluru (Bangalore)
6 - 11 yrs
₹20L - ₹38L / yr

 

Roles and Responsibilities

  • Managing Availability, Performance, Capacity of infrastructure and applications.
  • Building and implementing observability for applications health/performance/capacity.
  • Optimizing On-call rotations and processes.
  • Documenting “tribal” knowledge.
  • Managing infra-platforms like Mesos/Kubernetes, CI/CD, Observability (Prometheus/New Relic/ELK), Cloud Platforms (AWS/Azure), Databases, Data Platforms Infrastructure.
  • Providing help in onboarding new services with production readiness review process.
  • Providing reports on services SLO/Error Budgets/Alerts and Operational Overhead.
  • Working with Dev and Product teams to define SLO/Error Budgets/Alerts.
  • Working with the Dev team to gain an in-depth understanding of the application architecture and its bottlenecks.
  • Identifying observability gaps in product services and infrastructure, and working with stakeholders to fix them.
  • Managing outages, doing detailed RCA with developers, and identifying ways to avoid a recurrence.
  • Managing/Automating upgrades of the infrastructure services.
  • Automating toil work.

Experience & Skills

  • 6+ years of total experience
  • Experience as an SRE/DevOps/Infrastructure Engineer on large scale microservices and infrastructure.
  • A collaborative spirit with the ability to work across disciplines to influence, learn, and deliver.
  • A deep understanding of computer science, software development, and networking principles.
  • Demonstrated experience with languages such as Python, Java, Golang, etc.
  • Extensive experience with Linux administration and a good understanding of the various Linux kernel subsystems (memory, storage, network, etc.).
  • Extensive experience in DNS, TCP/IP, UDP, gRPC, routing and load balancing.
  • Expertise in GitOps, Infrastructure as Code tools such as Terraform, and configuration management tools such as Chef, Puppet, SaltStack, Ansible.
  • Expertise in Amazon Web Services (AWS) and/or other relevant cloud infrastructure solutions like Microsoft Azure or Google Cloud.
  • Experience in building CI/CD solutions with tools such as Jenkins, GitLab, Spinnaker, Argo, etc.
  • Experience in managing and deploying containerized environments using Docker, Mesos/Kubernetes is a plus.

Job posted by RAKESH RANJAN
at A startup company providing AI based software platforms
Agency job
via zyoin
Python
DevOps
Docker
Amazon Web Services (AWS)
Kubernetes
Reliability engineering
Remote, Bengaluru (Bangalore)
3 - 7 yrs
₹10L - ₹30L / yr

Who You Are

  • Creative thinker and strong problem solver with meticulous attention to detail
  • Highly organized, creative, motivated, and passionate about achieving results
  • Able to balance multiple tasks and projects effectively and quickly adapt to new situations and technologies
  • Able to work both independently and as part of a team
  • Systematic problem-solver, coupled with a strong sense of ownership and drive

 

What you need

  • 3-7 years of experience as a Site Reliability Engineer or a mix of a software engineer and DevOps.
  • Strong hands-on knowledge of Linux fundamentals, System administration scripting, performance tuning/scalability, troubleshooting.
  • Write great quality code using SOLID principles including unit and integration tests.
  • Hands-on development experience in an object-oriented programming language like Python.
  • Hands-on experience developing task automations
  • Experience using tools to create and manage CI (continuous integration) and CD (continuous delivery) pipelines.
  • Familiarity with software development tools: source code management (SCM systems), code review systems, issue tracking tools, build tools, test frameworks, code quality tools.
  • Experience implementing open-source observability and alerting tools, like Prometheus, Grafana, Cortex, Thanos, Alertmanager, etc.
  • Decent knowledge of networking (VPC, VNet, DNS, etc.) and of the TCP/IP stack, internet routing and load balancing.
  • Experience working with log and configuration management tools
  • Prior experience of working with AWS, Azure, GCP is a plus
  • Prior experience of working with Kubernetes, Docker and containers is a plus
  • Strong interpersonal communication skills (including listening, speaking, and writing) and ability to work well in a diverse, team-focused environment with other SREs, Engineers, Product Managers, etc.
  • Documenting your work should be in your DNA

 

What you get

  • A chance to develop and build something (probably from scratch) which you can be proud of
  • Build and Implement modern systems observability solutions including monitoring, alerting, metrics, logging, and APM & distributed tracing.
  • Scale systems sustainably through automation and evolve systems by pushing for changes that improve reliability and velocity.
  • Maintain business continuity by identifying and driving opportunities to make systems highly resilient and human-free.
  • Closely work with the software engineering team to ensure accurate monitoring and metrics are being built into applications before going to production.
  • Develop and maintain software modules for use and re-use in cloud and on-premise systems automation.
  • Identify process gaps and implement process improvements to increase operational reliability
  • Drive standardization efforts across the services, infrastructure, systems, and practices
  • Develop systems and tools to help the development team uphold reliability principles
Job posted by RAKESH RANJAN
Founded 2016  •  Products & Services  •  20-100 employees  •  Bootstrapped
DevOps
Amazon Web Services (AWS)
Python
Go Programming (Golang)
Kubernetes
Shell Scripting
Terraform
Javascript
Docker
Ansible
System Administration
Elastic Search
Monitoring
Amazon RDS
MySQL
SQL
Prometheus
ELK
Grafana
Remote only
4 - 10 yrs
₹12L - ₹30L / yr

Hey there!

 

Biostrap is based in Los Angeles, California, with our team working remotely in several countries around the globe. This is a remote position; you’ll need a computer and a high-speed internet connection.

 

We are looking for the tough kind, the warriors, the always-learning Sr. DevOps Engineers to take care of our infrastructure and site reliability at Biostrap. As an engineer at Biostrap, you will be part of a lean but extremely passionate team of engineers and work towards making and keeping Biostrap the go-to health platform.

 

Responsibilities: What would the job be like?

  • Work closely with the engineering team to deploy and maintain the infrastructure.
  • Add automation at every part of the development and deployment lifecycle.
  • Analyze and help in Infrastructure cost optimizations.
  • Build and work with CI + CD workflows.
  • Build robust observability system for system monitoring and tracing.
  • Architect scalable logging servers.
  • Add extensive alerting systems for various important issues, events using monitoring and logging services.
  • Work with other engineers in developing architecture that is scalable and resilient to changes in product requirements and usage in an agile environment.
  • Security Hardening of cloud infrastructure against known/unknown vulnerabilities
  • Write Infrastructure as Code for most of the cloud.
  • Suggest and implement pragmatic changes to infrastructure to increase performance, resilience and availability, and to fool-proof the infrastructure for the future.
  • Build auditing systems for various resource accesses and have a breach detection notification system.
  • Do periodic security reviews and implement improvements.
  • Be in charge of and manage deployments of various services.
  • Work with AWS resources, containers and systems like Ansible/EKS/Kubernetes.

 

Qualifications: Who should apply for this role?

  • You have 3+ years of working in small to medium size teams building and shipping products.
  • Strong grasp of at least one scripting or systems language such as Python, JavaScript, Golang, etc.
  • Good experience managing various AWS resources.
  • Well equipped with Linux and Bash/Shell scripting.
  • Working knowledge of Docker or container management.
  • Have some development experience with Kubernetes.
  • You spin out containers as if it's your fantasy war ground. 
  • Understand deployment tools like Ansible or similar.
  • Built and worked with CI+CD systems like GitLab CI, Jenkins, CircleCI, Travis, etc.
  • Working knowledge of Git for version control.
  • Experience with database management and security.
  • Experience with Terraform for Infrastructure as Code.
  • Knowledge of configuration management and secrets/keys management services like AWS KMS, Vault etc.
  • Required to be proficient in English (both speaking and writing).

 

 

Brownie Points for (:D):

  • You already use Biostrap and have plenty of feedback to provide.
  • You can lecture developers on scalable infrastructures.
  • You have built or worked with Prometheus, Grafana, ELK systems.
  • You have a story to tell about how you managed a failure or were part of a disaster recovery.
  • You contribute to Open Source projects or have a good GitHub/GitLab presence to showcase your past projects.
  • You have sent your code to Space and it runs “a” Rover on Mars. :P
Job posted by Anirban Das
Founded 2008  •  Product  •  500-1000 employees  •  Raised funding
DevOps
Kubernetes
Amazon Web Services (AWS)
Java
Python
Continuous Integration
Docker
Terraform
Bengaluru (Bangalore)
5 - 9 yrs
₹25L - ₹37L / yr
Your Responsibilities
We are looking for a Senior SRE with a proven track record of success leading complex cloud-hybrid environments. You will have:
  • Strong sense of Being an Owner, Wearing the Customer Shoes, with the ability to Empower Others demonstrated through clear communication and collaboration.
  • Skills to work independently with multiple global teams, developing, configuring, deploying, and operating our global infrastructure on AWS and on-prem.
  • Operational experience in complex distributed and real-time systems, including experience with SLO/SLAs towards high availability, reliability and DR goals.
  • DevOps experience in building tools and frameworks, with an understanding of continuous deployment processes.
  • Ability to think at scale, bringing a focus on continuous delivery methodologies from design through deployment and operations.
  • Experience building and managing systems with tools including Kubernetes, Chef/Ansible/Puppet, Kafka, Docker, and Terraform.
Required Skill
  • 5+ years experience in a Software and/or Site Reliability Engineering role
  • Experience writing automation code in GoLang, Python or Java
  • Experience developing and operating large scale distributed systems with Kubernetes and Docker
  • Experience in running real-time, low-latency, highly available applications (Kafka, gRPC, RTP)
  • Experience running public cloud environments on AWS
  • Experience running hybrid clouds and on-prem infrastructures on Red Hat Enterprise Linux / CentOS
  • Bachelor's degree in Engineering, Computer Science or equivalent experience
  • The ability to lead, partner, and collaborate cross functionally across an engineering organization
Job posted by Sandesh HS
Founded 2015  •  Product  •  100-500 employees  •  Raised funding
Python
CI/CD
Amazon Web Services (AWS)
Ansible
Kubernetes
Google Cloud Platform (GCP)
Windows Azure
Hyderabad
6 - 12 yrs
₹20L - ₹40L / yr

About the Role

Dremio’s SREs ensure that our internal and externally visible services have reliability and uptime appropriate to users' needs and a fast rate of improvement. You will be joining a newly formed team that will spearhead our efforts to launch a cloud service. This is an opportunity to join a very fast growth startup and help build a cloud service from the ground up.

Responsibilities and Ownership

  • Ability to debug and optimize code and automate routine tasks.
  • Evangelize and advocate for reliability practices across our organization.
  • Collaborate with other Engineering teams to support services before they go live through activities such as system design consulting, developing software platforms and frameworks, monitoring/alerting, capacity planning and launch reviews.
  • Analyze and optimize our core product by developing and implementing reliability and performance practices.
  • Scale systems sustainably through automation and evolve systems by pushing for changes that improve reliability and velocity.
  • Be on-call for services that the SRE team owns.
  • Practice sustainable incident response and blameless postmortems.

Qualifications

  • 6+ years of relevant experience in the following areas: SRE, DevOps, Cloud Operations, Systems Engineering, or Software Engineering.
  • Excellent command of cloud services on AWS/GCP/Azure, Kubernetes and CI/CD pipelines.
  • You have moderate to advanced experience in Java, C, C++, Python, Go or other object-oriented programming languages.
  • You are interested in designing, analyzing and troubleshooting large-scale distributed systems.
  • You have a systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
  • You have a great ability to debug and optimize code and automate routine tasks.
  • You have a solid background in software development and architecting resilient and reliable applications.
Job posted by Kiran B
at SteelEye, a fast-growing FinTech company based in London
Agency job
via Beiing
Amazon Web Services (AWS)
Ansible
Terraform
Python
Docker
Remote, Bengaluru (Bangalore)
₹15L - ₹30L / yr
What you’ll do

• Develop and maintain IaC using Terraform and Ansible.
• Draft design documents that translate requirements into code.
• Deal with challenges associated with scale.
• Assume responsibilities from technical design through technical client support.
• Manage expectations with internal stakeholders and context-switch in a fast paced environment.
• Thrive in an environment that uses Elasticsearch extensively.
• Keep abreast of technology and contribute to the engineering strategy.
• Champion best development practices and provide mentorship.

What we’re looking for

• An AWS Certified Engineer with strong skills in
o Terraform
o Ansible
o *nix and shell scripting
• Preferably with experience in:
o Elasticsearch
o Circle CI
o CloudFormation
o Python
o Packer
o Docker
o Prometheus and Grafana
o Challenges of scale
o Production support
• Sharp analytical and problem-solving skills.
• Strong sense of ownership.
• Demonstrable desire to learn and grow.
• Excellent written and oral communication skills.
• Mature collaboration and mentoring abilities.
Job posted by Divya R
Cloudformation
Ansible
Amazon Web Services (AWS)
Python
JIRA
Perl
Powershell
Bash
Terraform
Groovy
Remote only
5 - 11 yrs
₹10L - ₹17L / yr
  • 5+ years of software development or site reliability engineering or equivalent experience
  • Skilled at problem solving, algorithms, and data structures
  • Building tools and scripting frameworks from scratch
  • Working with Cloud Automation tools like CloudFormation, Terraform, CDK, aws-cli
  • Scripting languages like Python, Groovy, PowerShell, Bash, Perl etc.
  • Configuration automation using Ansible or equivalent tools
  • Exposure to Windows, Linux administration skills
  • Project management tools like Jira, Trello
  • Prior experience in dealing with Datastore technologies like Postgres, MySQL, SQL, DynamoDB is desirable
  • Familiarity with basic networking, security and cloud engineering concepts
  • Team player who is eager to help others to succeed through mentoring and leading by example
  • Highly collaborative with effective written and verbal communication skills
Job posted by Mohammad Farooq Shaik
Founded 1998  •  Product  •  500-1000 employees  •  Profitable
Monitoring
Reliability engineering
AppDynamics
Dynatrace
HTTP
DNS administration
Cisco Certified Network Associate (CCNA)
Mumbai
₹15L - ₹25L / yr

Requirements

Technical Skills

  • Ability to design and deliver all Operations/SRE services and processes, including managing L2 environment support
  • 5-12 years of overall environment support experience, with 5+ years of experience as a support engineer or SRE
  • Experience in implementing monitoring solutions using APM tools (for example, AppDynamics, Graylog, Dynatrace, Datadog), including setting up and testing proactive monitoring alerts
  • Have a broad knowledge profile and really excel in some areas, such as HTTP/TLS, DNS, networking or containerization
  • Comfortable with large scale production systems and technologies, for example load balancing, monitoring, distributed systems, microservices, and configuration management.

Process Skills

  • Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive
  • Interest in designing, analyzing and troubleshooting large-scale distributed systems.

Behavioral Skills

  • Practice sustainable incident response and blameless postmortems.
  • Proven ability in developing relationships with stakeholders, communicating project/program status, and understanding detailed business requirements across multiple project initiatives
  • This role requires candidates to work in rotational shifts to provide 24x7 support.

Benefits

LOCATION: Mumbai

COMPENSATION: Competitive

WHY ZYCUS? :

  • Be a part of one of the fastest-growing product companies in India
  • Come join a young, dynamic & enterprising team
  • Work on the latest technologies
  • Flexible working hours (As per business requirement).

Zycus Global Leader Procurement: https://www.zycus.com/newsroom/press-releases.html

Job posted by Varsha Gupta
Founded 2015  •  Product  •  100-500 employees  •  Raised funding
Terraform
Kubernetes
Ansible
NCR (Delhi | Gurgaon | Noida)
₹10L - ₹21L / yr
WHAT WILL I DO?

You will work as a Site Reliability Engineer responsible for the availability, performance, monitoring, and incident response, among other things, of the platforms and services used and owned by Shuttl. The SRE Team works alongside the Engineering team and owns every aspect of service availability as well as disaster recovery and business continuity plans. You will work with other Site Reliability Engineers and report to the Lead of Site Reliability Engineering Team.

HOW DO WE WORK?

Our engineering process is a five step process which consists of phases for planning, developing, testing & profiling, releasing and monitoring. The planning phase consists of documenting of the feature/task to be done followed by various discussions. These discussions cover product, delivery estimates, release plan, monitoring plan, test plans, architecture, code design, technology choices and best practice adoption. The development and testing phase coexist and involve writing code, unit tests, performance tests, profiling, stress testing, code reviews and QA testing. This phase is punctuated with daily scrums and standups. The release phase is largely about managing and communicating the release to customers and internal stakeholders and activating features. The last phase is the monitoring phase where relevant metrics and exceptions are tracked and any critical refinement for the delivered feature is undertaken. This phase culminates with a retrospective.

SREs get involved in this process as early as possible to provide general guidance, recommendations and help with designing the application to be in compliance with community standards such as CNCF and 12 Factor. SRE involvement and influence tends to increase during mid to final stages of development where the application is primed for beta evaluation and all the tooling and instrumentation is finalized.

WHAT SKILLS SHOULD I HAVE?

For this role we expect you to have 3+ years of experience working as a DevOps Engineer or SRE. You should have a good grasp of Unix like systems, access control, networking nuances, process isolation by the means of kernel provided features, distributed applications and algorithms, job schedulers and secret management among other things. At Shuttl we are a big proponent of Immutable infrastructure. All our infrastructure is hosted with Amazon Web Services and we use Hashicorp's Terraform to manage the infrastructure as code. A good handle on AWS and Terraform is therefore a definitive plus. Since SREs are expected to write a lot of code, you are also expected to be skillful in a programming language, preferably Python or Go.
Job posted by Tanika Monga