As a Site Reliability Engineer, you will take on exciting technical challenges by analysing, troubleshooting, and designing vital services, platforms, and infrastructure, always with reliability, scalability, resilience, security, and performance in mind.

Requirements

3+ years of strong experience deploying AWS cloud infrastructure.
3+ years of working experience in infrastructure support and CI/CD platforms, leveraging DevOps, SRE, and Agile methodologies.
Hands-on experience provisioning Infrastructure as Code (IaC) using Terraform Enterprise or the community edition.
Experience with cloud environments (AWS/GCP/Azure), container technologies (Docker, Kubernetes, Docker Swarm, Helm), and DevOps tooling (Git, CI/CD pipelines, Jenkins).
Valid AWS Certified Solutions Architect - Professional certification.
At least 5 years of experience programming/scripting in Python, Ruby, or Bash.
Experience monitoring and analyzing infrastructure performance using standard performance monitoring tools such as Grafana/Prometheus, Datadog, Nagios, and New Relic.

Responsibilities

Own the infrastructure architecture and non-functional requirements, ensuring they fit into a cohesive vision aligned with the rest of the platform's technology roadmap for the launch.
Propagate DevOps culture across the organization by sharing industry best practices, standards, approaches, documentation, and code with other engineering teams.
Design, test, and troubleshoot the CI/CD pipeline for containerized applications, from build through deployment.
Apply automation and software to manual, mechanical tasks and to any other part of the system that would benefit from it.
Troubleshoot complicated, cross-platform issues spanning OS, networking, and databases in a cloud-based SaaS environment; handle live production incidents; debug application and infrastructure issues; and follow and implement SRE best practices.

Benefits

Work-Life Balance
Learning & Development
Sabbatical Leave
Parental Leave
Profit-Sharing
Office Perks (Free Meal, Snacks)

About TechVerito Software Solutions LLP

Founded: 2016
Size: 20-100
Stage: Profitable

About

TechVerito is a leading technology consulting and software development company that specializes in the delivery of crafted, high-quality software solutions. With a dedicated team of experienced professionals, we are passionate about leveraging technology to help businesses achieve their goals and drive digital transformation.

TechVerito has worked with clients in Gaming, Education, Retail, Fintech, Banking, Healthcare, and the non-profit sector.

We believe that every line of code matters, and we strive for excellence in every aspect of our work. From ideation to deployment, we follow a rigorous development process that ensures the highest standards of quality at every step.

We employ industry best practices, agile methodologies, and the latest technologies to create software solutions that meet our clients’ specific requirements. We specialize in developing custom software applications, web and mobile applications, enterprise solutions, and software integrations. Quality is at the core of everything we do. We believe that high-quality software is the result of a meticulous development process combined with rigorous testing and continuous improvement.

At TechVerito, we value long-term partnerships with our clients. We believe in open and transparent communication, and we work closely with our clients to understand their unique challenges and deliver tailored solutions. We pride ourselves on our ability to adapt to changing requirements, providing flexibility and agility throughout the development process.

Learn more at www.techverito.com

Tech Stack

Java

Kotlin

Ruby

Go Programming (Golang)

TypeScript

React.js

Angular (2+)

CI-CD

Microservices

Similar jobs

Site Reliability Engineer/DevOps

at Digital B2B Platform

Agency job

via Jobdost by Sathish Kumar

Bengaluru (Bangalore)

3 - 4 yrs

₹15L - ₹30L / yr

DevOps

Python

CI/CD

Linux/Unix

Git

We are a digital B2B platform that offers loans, working capital, and payment services to small businesses.

Candidate MUST HAVE product-based company experience and a minimum of 3 years of experience in DevOps.

What you will do (or learn):
1. Build our application stack on AWS. Infrastructure as code (read Terraform)
2. Build state-of-the-art CI/CD pipelines.
3. Manage data warehouses and data pipelines.
4. Work on infrastructure and data security.
5. Build a state-of-the-art log management system and the tooling around it.
6. Build a monitoring and alerting system.

What do we expect from you?
1. 3 to 10 years of experience with DevOps or SRE principles.
2. Good fundamentals of database management and other distributed systems management.
3. Experience in infrastructure as code or other configuration management systems.
4. Experience with scripting languages (such as Bash, Python, or Go).
5. Good understanding of Linux systems
6. Strong debugging and troubleshooting skills
7. Experience in tooling around monitoring, CI/CD, log management systems.

Senior Site Reliability Engineer/ DevOps Engineer

at Digital B2B Platform

Agency job

via Jobdost by Sathish Kumar

Bengaluru (Bangalore)

4 - 8 yrs

₹25L - ₹60L / yr

Python

DevOps

Amazon Web Services (AWS)

Ansible

Terraform

SRE - DevOps Technical Lead

at Srijan Technologies

6 recruiters

Posted by Adyasha Satpathy

Remote only

5 - 12 yrs

₹20L - ₹32L / yr

Kubernetes

Docker

Ansible

Terraform

Amazon Web Services (AWS)

SRE - Tech Lead (DevOps):

Location: Permanent Work From Home Option
Notice: Candidates with a notice period of 30 days or less are preferred

SRE-DevOps- Tech Lead - JD:

Srijan is hiring for Site Reliability Engineering (SRE). We are looking for an SRE/DevOps Tech Lead or Sr. Tech Lead with strong automation skills and a good understanding of how to build and run secure, reliable platforms for cloud-native applications. The detailed job description follows.

Minimum Experience: 6+ years in DevOps/SRE

Permanent WFH option

Job Description:-

The focus of this role is to build scalable, resilient, secure infrastructure for cloud-native applications while automating every mundane task you can think of, and to build observability dashboards, set up alerts, and so on to give relevant stakeholders visibility. In a nutshell: “You are the keepers of production environments”. You must be a problem solver who can multitask and who brings strong collaboration and communication skills.

Key Responsibilities:-

Proactively monitor and review application performance
Handle on-call and emergency support
Ensure software has good logging and diagnostics
Create and maintain operational runbooks
Contribute to solution design and the evaluation of technical debt
Set the right practices for a well-defined architecture and to minimize toil.
Own SLI and SLO configuration in line with the error budget
Maintain production services through measuring and monitoring availability, latency, and overall system health.
Practice sustainable incident response and blameless postmortems.
Not be afraid to contribute changes back to the Software engineering team to improve the systems.
Managing the delivery pipeline into production.
Able to mentor junior members on a regular basis
Troubleshooting issues with web applications
Understanding of security principles and best practices
Ensuring that critical data is backed up
Configuration of monitoring systems including infrastructure monitoring and Application Performance Monitoring systems such as New Relic.
Ensuring that web application infrastructure is built
Ability to act as Customer Technical Advocate and negotiate well with peers on technical fronts.
Flexible enough to work in different shifts to meet business requirements
Ability to handle multiple global clients on the tech front and generate the reports needed to represent the health of SRE delivery.

Skills/Experience:-

A key skill of an SRE Tech Lead is deep knowledge of the application, the code, and how it runs, is configured, and scales. That knowledge is also what makes them so valuable at monitoring and supporting it as site reliability engineers.
System administration, security, and networking
The SRE Tech Lead is expected to have a good understanding of system administration (Linux or Windows) and networking.
Essential commands
User and Group Management
Knowledge of networking concepts (DNS, TCP/IP, and Firewalls)
Service Configuration
Storage Management
Good grasp of fundamental security concepts
Good understanding of infrastructure as code principles.
Knowledge of a scripting language such as Bash
Ability to configure infrastructure using a Configuration Management technology such as Puppet, Chef, or Ansible.
Familiarity with Jenkins or any other CI/CD tool
Proficiency in a high-level programming language such as Python or Go.
Understanding of container technologies such as Docker, Kubernetes
2+ years of hands-on experience with container orchestration technologies such as ECS, EKS, AKS, or Kubernetes would be beneficial.
Use Terraform and other IaC to deploy cloud infrastructure.

Cloud technologies:-

Experience designing available, cost-efficient, fault-tolerant, and scalable distributed systems on AWS/Azure
Hands-on experience using compute, networking, storage, and database AWS/Azure services
4+ years of hands-on experience with AWS/Azure deployment and management services
Ability to identify and define technical requirements for an AWS/Azure-based application
Ability to identify which AWS/Azure services meet a given technical requirement
Knowledge of recommended best practices for building secure and reliable applications on the AWS/Azure platform
An understanding of the AWS/Azure global infrastructure
An understanding of network technologies as they relate to AWS/Azure
An understanding of the security features and tools that AWS/Azure provides and how they relate to traditional services

Site Reliability Engineer

at a US-based firm offering permanent WFH

Agency job

via Jobdost by Mamatha A

Remote only

3 - 10 yrs

₹5L - ₹15L / yr

Python

Amazon Web Services (AWS)

MongoDB

MySQL

Django

A network of the world's best developers - full-time, long-term remote software jobs with better compensation and career growth. We enable our clients to accelerate their Cloud Offering and Capitalize on Cloud. We have our own IoT/AI platform and we provide professional services on that platform to build custom clouds for their IoT devices. We also build mobile apps, run 24x7 DevOps/site reliability engineering for our clients.

We are looking for a friendly, dependable, and very hands-on technical professional with plenty of experience as a backend and cloud engineer to provide site reliability services to our internal teams and end customers. We expect you to deliver top quality at high speed. You must have experience developing and designing amazing UI screens.

This person MUST have:

BE Computer Science or equivalent
Cloud app development experience.
Strong Troubleshooting and debugging skills
A strong passion for writing simple, clean, and efficient code.

3 years of experience with the Django framework and other backend technologies.
Knowledge of NodeJS
Experience with building, modifying, and extending API endpoints (REST or GraphQL) for data retrieval and persistence.
Understanding of how to use a database such as Postgres (preferred choice), SQLite, MongoDB, or MySQL.
Experience creating high-performance applications.
Experience with messaging and broker tools such as RabbitMQ and MQTT
Experience with SQL and NoSQL databases
Experience with the full software development life cycle, including requirements collection, design, implementation, testing, and operational support.
Knowledge of web services
Proficient understanding of code versioning tools such as Git.
Hands-on experience deploying and managing infrastructure with CloudFormation/Terraform
Experience managing AWS infrastructure.
Hands-on experience in Linux environment.
Basic understanding of Kubernetes/Docker orchestration.
Manage existing infrastructure, pipelines, and engineering tools (on-prem or AWS) for the engineering team (build servers, Jenkins nodes, etc.)
Experience with scrum or other agile software development methodology.
Excellent verbal and written communication, teamwork, decision making and influencing skills.
Handle customer calls/emails regarding technical issues for end-users.
Strong communication skills
Attention to detail.

Experience:

Minimum 3 years of experience

Location:

Ahmedabad office, or
work from home

Timings:

40 hours a week with a rotational shift every month.

Position:

Full time/Direct
We have great benefits such as PF, medical insurance, 12 annual company holidays, 12 PTO leaves per year, annual increments, a Diwali bonus, spot bonuses, and other incentives.
We don't believe in locking people in with long notice periods. You will stay here because you love the company. Our notice period is only 30 days.

Site Reliability Engineer

at Uniphore Software Systems

2 recruiters

Posted by Sandesh HS

Bengaluru (Bangalore)

5 - 10 yrs

₹25L - ₹40L / yr

SRE

Site Reliability Engineer

Reliability engineering

DevOps

Kubernetes

Your Responsibilities

We are looking for a Senior SRE with a proven track record of success leading complex cloud-hybrid environments. You will have:
Strong sense of Being an Owner, Wearing the Customer Shoes, with the ability to Empower Others demonstrated through clear communication and collaboration.
Skills to work independently with multiple global teams, developing, configuring, deploying, and operating our global infrastructure on AWS and on-prem.
Operational experience in complex distributed and real-time systems, including experience with SLOs/SLAs for high availability, reliability, and DR goals.
DevOps experience in building tools and frameworks, with an understanding of continuous deployment processes.
Ability to think at scale, bringing a focus on continuous delivery methodologies from design through deployment and operations.
Experience building and managing systems with tools including Kubernetes, Chef/Ansible/Puppet, Kafka, Docker, and Terraform.

Required Skill

5+ years of experience in a Software and/or Site Reliability Engineering role
Experience writing automation code in GoLang, Python, or Java
Experience developing and operating large-scale distributed systems with Kubernetes and Docker
Experience running real-time, low-latency, highly available applications (Kafka, gRPC, RTP)
Experience running public cloud environments on AWS
Experience running hybrid clouds and on-prem infrastructures on Red Hat Enterprise Linux / CentOS
Bachelor's degree in Engineering, Computer Science, or equivalent experience
The ability to lead, partner, and collaborate cross-functionally across an engineering organization

Site Reliability Engineer

at SteelEye, a fast-growing FinTech company based in London

Agency job

via Beiing by Divya R

Remote, Bengaluru (Bangalore)

3 - 8 yrs

₹15L - ₹30L / yr

Python

Amazon Web Services (AWS)

Ansible

Terraform

Docker

What you’ll do

• Develop and maintain IaC using Terraform and Ansible.
• Draft design documents that translate requirements into code.
• Deal with challenges associated with scale.
• Assume responsibilities from technical design through technical client support.
• Manage expectations with internal stakeholders and context-switch in a fast-paced environment.
• Thrive in an environment that uses Elasticsearch extensively.
• Keep abreast of technology and contribute to the engineering strategy.
• Champion best development practices and provide mentorship.

What we’re looking for

• An AWS Certified Engineer with strong skills in:
  o Terraform
  o Ansible
  o *nix and shell scripting
• Preferably with experience in:
  o Elasticsearch
  o Circle CI
  o CloudFormation
  o Python
  o Packer
  o Docker
  o Prometheus and Grafana
  o Challenges of scale
  o Production support
• Sharp analytical and problem-solving skills.
• Strong sense of ownership.
• Demonstrable desire to learn and grow.
• Excellent written and oral communication skills.
• Mature collaboration and mentoring abilities.

Senior/ Lead Site Reliability Engineer

at OJAS

Agency job

via Ojas Innovative Technologies by Pradeep Kumar Burra

Hyderabad

5 - 11 yrs

₹10L - ₹20L / yr

site reliability

cloudformation

Terraform

Ansible

Cloud Automation

5+ years of software development or site reliability engineering or equivalent experience
Skilled at problem solving, algorithms, and data structures
Building tools and scripting frameworks from scratch
Working with Cloud Automation tools like CloudFormation, Terraform, CDK, aws-cli
Scripting languages like Python, Groovy, PowerShell, Bash, Perl etc.
Configuration automation using Ansible or equivalent tools
Exposure to Windows, Linux administration skills
Project management tools like Jira, Trello
Prior experience in dealing with Datastore technologies like Postgres, MySQL, SQL, DynamoDB is desirable
Familiarity with basic networking, security and cloud engineering concepts
Team player who is eager to help others to succeed through mentoring and leading by example
Highly collaborative with effective written and verbal communication skills

Site Reliability Engineer

at ScienceLogic

Agency job

via Ojas Innovative Technologies by Mohammad Farooq Shaik

Remote only

5 - 11 yrs

₹10L - ₹17L / yr

AWS CloudFormation

cloud automation

site reliability

cloudformation

Ansible

5+ years of software development or site reliability engineering or equivalent experience
Skilled at problem solving, algorithms, and data structures
Building tools and scripting frameworks from scratch
Working with Cloud Automation tools like CloudFormation, Terraform, CDK, aws-cli
Scripting languages like Python, Groovy, PowerShell, Bash, Perl etc.
Configuration automation using Ansible or equivalent tools
Exposure to Windows, Linux administration skills
Project management tools like Jira, Trello
Prior experience in dealing with Datastore technologies like Postgres, MySQL, SQL, DynamoDB is desirable
Familiarity with basic networking, security and cloud engineering concepts
Team player who is eager to help others to succeed through mentoring and leading by example
Highly collaborative with effective written and verbal communication skills

Site Reliability Engineer

at Shuttl

8 recruiters

Posted by Tanika Monga

NCR (Delhi | Gurgaon | Noida)

3 - 6 yrs

₹10L - ₹21L / yr

Terraform

Kubernetes

Ansible

WHAT WILL I DO?

You will work as a Site Reliability Engineer responsible for the availability, performance, monitoring, and incident response, among other things, of the platforms and services used and owned by Shuttl. The SRE Team works alongside the Engineering team and owns every aspect of service availability as well as disaster recovery and business continuity plans. You will work with other Site Reliability Engineers and report to the Lead of Site Reliability Engineering Team.

HOW DO WE WORK?

Our engineering process is a five step process which consists of phases for planning, developing, testing & profiling, releasing and monitoring. The planning phase consists of documenting of the feature/task to be done followed by various discussions. These discussions cover product, delivery estimates, release plan, monitoring plan, test plans, architecture, code design, technology choices and best practice adoption. The development and testing phase coexist and involve writing code, unit tests, performance tests, profiling, stress testing, code reviews and QA testing. This phase is punctuated with daily scrums and standups. The release phase is largely about managing and communicating the release to customers and internal stakeholders and activating features. The last phase is the monitoring phase where relevant metrics and exceptions are tracked and any critical refinement for the delivered feature is undertaken. This phase culminates with a retrospective.

SREs get involved in this process as early as possible to provide general guidance, recommendations and help with designing the application to be in compliance with community standards such as CNCF and 12 Factor. SRE involvement and influence tends to increase during mid to final stages of development where the application is primed for beta evaluation and all the tooling and instrumentation is finalized.

WHAT SKILLS SHOULD I HAVE?

For this role we expect you to have 3+ years of experience working as a DevOps Engineer or SRE. You should have a good grasp of Unix like systems, access control, networking nuances, process isolation by the means of kernel provided features, distributed applications and algorithms, job schedulers and secret management among other things. At Shuttl we are a big proponent of Immutable infrastructure. All our infrastructure is hosted with Amazon Web Services and we use Hashicorp's Terraform to manage the infrastructure as code. A good handle on AWS and Terraform is therefore a definitive plus. Since SREs are expected to write a lot of code, you are also expected to be skillful in a programming language, preferably Python or Go.
