Site Reliability Engineer

at Shuttl

Posted by Tanika Monga
NCR (Delhi | Gurgaon | Noida)
3 - 6 yrs
₹10L - ₹21L / yr
Full time
Skills
Terraform
Kubernetes
Ansible
WHAT WILL I DO?

You will work as a Site Reliability Engineer responsible for the availability, performance, monitoring, and incident response, among other things, of the platforms and services used and owned by Shuttl. The SRE team works alongside the Engineering team and owns every aspect of service availability as well as disaster recovery and business continuity plans. You will work with other Site Reliability Engineers and report to the Lead of the Site Reliability Engineering team.

HOW DO WE WORK?

Our engineering process is a five-step process with phases for planning, developing, testing & profiling, releasing and monitoring. The planning phase consists of documenting the feature or task to be done, followed by various discussions. These discussions cover product, delivery estimates, release plan, monitoring plan, test plans, architecture, code design, technology choices and best practice adoption. The development and testing phases coexist and involve writing code, unit tests, performance tests, profiling, stress testing, code reviews and QA testing. These phases are punctuated with daily scrums and standups. The release phase is largely about managing and communicating the release to customers and internal stakeholders and activating features. The last phase is the monitoring phase, where relevant metrics and exceptions are tracked and any critical refinement for the delivered feature is undertaken. This phase culminates in a retrospective.

SREs get involved in this process as early as possible to provide general guidance and recommendations and to help design the application to comply with community standards such as CNCF and 12 Factor. SRE involvement and influence tend to increase during the middle to final stages of development, where the application is primed for beta evaluation and all the tooling and instrumentation is finalized.

WHAT SKILLS SHOULD I HAVE?

For this role we expect you to have 3+ years of experience working as a DevOps Engineer or SRE. You should have a good grasp of Unix-like systems, access control, networking nuances, process isolation by means of kernel-provided features, distributed applications and algorithms, job schedulers and secret management, among other things. At Shuttl we are big proponents of immutable infrastructure. All our infrastructure is hosted with Amazon Web Services and we use HashiCorp's Terraform to manage the infrastructure as code. A good handle on AWS and Terraform is therefore a definite plus. Since SREs are expected to write a lot of code, you are also expected to be skillful in a programming language, preferably Python or Go.

About Shuttl

Shuttl is a bus aggregation platform offering shuttle bus service to commuters in cities like Noida and Gurgaon. The effort is to address the daily commute problem faced by office goers. It's a mobile-based minibus service aimed at making your daily commute more convenient. The vehicles are air-conditioned and operate with high frequency on fixed routes, freeing you from the hassles of existing public transport options at a very economical price point.

 

Founded 2015  •  Product  •  100-500 employees  •  Raised funding

Similar jobs

Python
DevOps
Amazon Web Services (AWS)
Ansible
Terraform
Kubernetes
CI/CD
Git
Linux/Unix
Bengaluru (Bangalore)
4 - 8 yrs
₹25L - ₹60L / yr
We are a digital B2B platform that offers loans, working capital, and payment services to small businesses.

Candidate MUST HAVE product-based company experience and a minimum of 3 years of experience in DevOps.

What you will do (or learn) : 

1. Build our application stack on AWS. Infrastructure as code (read Terraform)
2. Build state-of-the-art CI/CD pipelines.
3. Manage data warehouses and data pipelines.
4. Work on infrastructure and data security.
5. Build a state-of-the-art log management system and the tooling around it.
6. Build monitoring and alerting systems.

What do we expect from you?
1. 3 to 10 years of experience with DevOps or SRE principles.
2. Good fundamentals of database management and other distributed systems management.
3. Experience in infrastructure as code or other configuration management systems.
4. Experience in scripting languages (such as Bash, Python, Go, etc.)
5. Good understanding of Linux systems
6. Strong debugging and troubleshooting skills
7. Experience in tooling around monitoring, CI/CD, log management systems. 
Job posted by
Sathish Kumar

Senior Site Reliability Engineer

at One of the largest Equity broking House in India

Agency job
via HyrHub
Reliability engineering
SRE
DevOps
Amazon Web Services (AWS)
Ansible
Terraform
Kubernetes
Git
helm
Mumbai, Bengaluru (Bangalore)
4 - 8 yrs
₹15L - ₹20L / yr
Common roles and responsibilities:
● Be on a PagerDuty rotation to respond to availability incidents and provide support for service engineers.
● Run the production environment by monitoring availability and taking a holistic view of system health.
● Build and implement services to make IT and support better at their jobs.
● Improve reliability, quality, and time-to-market of our suite of software solutions.
● Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve.
● Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding.
● Bring experience of working in an agile development environment.
● Participate in system design consulting, platform management, and capacity planning.
● Balance feature development speed and reliability with well-defined service level objectives.
Required Skills and Qualifications:
● 3+ years of experience working within DevOps or SRE teams.
● 3+ years of experience with AWS Cloud.
● Ability to program (structured and OO) with one or more high-level languages, such as Python, Go, Java, and JavaScript.
● Must have experience with Ansible, Helm, Terraform and Kubernetes.
● Document every action so your findings turn into repeatable actions, and then into automation.
● Hands-on experience with a distributed version control system such as Git, AWS CodeCommit or equivalent.
● Know your way around Linux and the Unix shell.
● Experience or familiarity with the ELK stack.
● Ability to use Azure DevOps.
● Experience with distributed storage technologies like NFS, Ceph, S3 as well as dynamic resource management frameworks (Mesos, Kubernetes).
● A proactive approach to spotting problems, areas for improvement, and performance bottlenecks.
Job posted by
Shwetha Naik

DevOps Engineer

at www.sourcewiz.co

Founded 2020  •  Product  •  0-20 employees  •  Raised funding
Docker
Terraform
Amazon Web Services (AWS)
DevOps
Bengaluru (Bangalore)
1 - 5 yrs
₹5L - ₹20L / yr
At Sourcewiz, we are building tools to help exporters grow their businesses. Our first product is a vertical sales software built for exporters, which allows them to market their unique creations to more buyers, generate more inquiries and increase their sales conversion.

Sourcewiz was founded by a passionate team of serial entrepreneurs and alumni of IIT Delhi, UC Berkeley, and well-known tech companies such as Uber and Zomato.

Sourcewiz is on a mission to increase India’s export GDP. This is a unique opportunity to join a funded early-stage startup and have a massive impact on our product, culture, and direction. It's a lot of work and a roller coaster ride. But if you are up for it, you can join us in replacing the tiresome and slow sales process for importers and exporters and have a significant impact on our customers. We are not a company that believes engineers should be hidden away from decisions, churning out code for features decided from on high. Instead, our engineers form strong bonds with cross-functional peers in Product Management, Product Design and other teams to become experts in their product domain.

We’re looking for people who have a strong interest in building successful products or systems, are comfortable dealing with lots of moving pieces, have exquisite attention to detail, and are comfortable learning new technologies and systems.

As a Site Reliability Engineer at Sourcewiz, you will...
• Own and improve the scalability and reliability of our products
• Work directly with the product engineering team
• Work with RDBMS, search, caching and queuing
• Contribute expertise towards architectural planning and ensure the company builds sustainable services that meet our customer expectations while leveraging appropriate tools and frameworks
• Participate in ongoing review and testing
Job posted by
Saakshi Bhartiya

Observability Systems Engineer

at Top Global Hedge Fund

Agency job
via Bullhorn Consultants
Kubernetes
Apache Kafka
prometheus
ELK
ELK Stack
Amazon Web Services (AWS)
Linux/Unix
Ansible
Systems analysis and design
Gurugram, Delhi, Noida, Ghaziabad, Faridabad
3 - 8 yrs
₹4L - ₹15L / yr
Experience in Kubernetes as a systems engineer (deployment, troubleshooting, maintenance, Helm charts), and in the deployment and administration of one or more of: ELK stack, Kafka, Prometheus or Grafana. Working knowledge of at least one cloud platform (GCP, AWS or Azure) and a configuration management system (such as Salt or Ansible). A good understanding of networking concepts (architecture, components, protocols) and a solid understanding of OS concepts and Linux internals are a must.
Job posted by
Hemant Singh

SRE - DevOps Technical Lead

at Srijan Technologies

Founded 2002  •  Products & Services  •  100-1000 employees  •  Profitable
Kubernetes
Docker
Ansible
Terraform
Amazon Web Services (AWS)
Jenkins
CI/CD
Monitoring
Linux/Unix
DevOps
Azure
Remote only
5 - 12 yrs
₹20L - ₹32L / yr

SRE - Tech Lead (DevOps):

Location: Permanent Work From Home Option
Notice: Candidates with a notice period of 30 days or less are preferred

Srijan is hiring for Site Reliability Engineering (SRE). We are looking for an SRE/DevOps Tech Lead or Sr. Tech Lead with strong automation skills and a good understanding of how to build and run secure, reliable platforms for cloud-native applications. The detailed job description follows.



Minimum Experience: 6+ years in DevOps/SRE


Job Description:-

The focus of this role is to build scalable, resilient, secure infrastructure for cloud-native applications while automating every mundane task you can think of, building observability dashboards, setting up alerts, etc. to provide visibility to relevant stakeholders. In a nutshell: “You are keepers of Production environments”. You must be a problem solver with the ability to multitask and strong collaboration and communication skills.



Key Responsibilities:-

  • Proactively monitor and review application performance

  • Handle on-call and emergency support

  • Ensure software has good logging and diagnostics

  • Create and maintain operational runbooks

  • Contribute to solution design and the evaluation of technical debt

  • Set the right practices for a well-defined architecture and to minimize toil

  • Own SLI and SLO configuration in line with the error budget

  • Maintain production services through measuring and monitoring availability, latency, and overall system health.

  • Practice sustainable incident response and blameless postmortems.

  • Not be afraid to contribute changes back to the Software engineering team to improve the systems.

  • Managing the delivery pipeline into production.

  • Mentor junior team members on a regular basis

  • Troubleshooting issues with web applications

  • Understanding of security principles and best practices

  • Ensuring that critical data is backed up

  • Configuration of monitoring systems including infrastructure monitoring and Application Performance Monitoring systems such as New Relic.

  • Ensuring that web application infrastructure is built

  • Ability to act as Customer Technical Advocate and negotiate well with peers on technical fronts.

  • Flexible enough to work in different shifts as business requirements demand

  • Ability to handle multiple global clients on the technical front and generate the reports needed to represent the health of SRE delivery.



Skills/Experience:-

  • A key skill of an SRE Tech Lead is deep knowledge of the application, the code, and how it runs, is configured, and scales. That knowledge is what makes them so valuable in monitoring and supporting it as site reliability engineers.

  • System administration, security, and networking

  • The SRE Tech Lead is expected to have a good understanding of system administration (Linux or Windows) and networking.

  • Essential commands

  • User and Group Management

  • Knowledge of networking concepts (DNS, TCP/IP, and Firewalls)

  • Service Configuration

  • Storage Management

  • Good grasp of fundamental security concepts

  • Good understanding of infrastructure as code principles.

  • Knowledge of a scripting language such as Bash

  • Ability to configure infrastructure using a Configuration Management technology such as Puppet, Chef, or Ansible.

  • Familiarity with Jenkins or any other CI/CD tool

  • Proficiency in a high-level programming language such as Python or Go.

  • Understanding of container technologies such as Docker, Kubernetes

  • 2+ years of hands-on experience with container orchestration technologies such as ECS, EKS, AKS or Kubernetes would be beneficial.

  • Use Terraform and other IaC to deploy cloud infrastructure.







Cloud technologies:-

  • Experience designing available, cost-efficient, fault-tolerant, and scalable distributed systems on AWS/Azure

  • Hands-on experience using compute, networking, storage, and database AWS/Azure services

  • 4+ years of hands-on experience with AWS/Azure deployment and management services

  • Ability to identify and define technical requirements for an AWS/Azure-based application

  • Ability to identify which AWS/Azure services meet a given technical requirement

  • Knowledge of recommended best practices for building secure and reliable applications on the AWS/Azure platform

  • An understanding of the AWS/Azure global infrastructure

  • An understanding of network technologies as they relate to AWS/Azure

  • An understanding of security features and tools that AWS/Azure provides and how they relate to traditional services







 

Job posted by
Adyasha Satpathy

Staff Engineer - SRE

at Cloud & Security Firm

Agency job
via HyringNinja
Kubernetes
Ansible
site reliability engineer
SRE
DevOps
Linux/Unix
Python
Go Programming (Golang)
Bengaluru (Bangalore)
5 - 9 yrs
₹20L - ₹60L / yr

Preferred Technical Skills:

  • 7+ years experience with troubleshooting Unix/Linux
  • Understanding of Networking concepts - TCP/IP, SSL/TLS, IPSec, GRE, VPN
  • Experience with algorithms, data structures, complexity analysis, and software design
  • Experience in one or more of the following: C, C++, Python, Go
  • Experience in managing a large-scale web operations role
  • Bonus points for experience with Ansible, Kubernetes, SQL and NoSQL datastores, CI/CD
  • Hands-on experience working with private or public cloud services in a highly available and scalable production environment.

Desired Technical Skills:

  • Knowledge of distributed systems is a big plus.

 Additional Skills

  • Great written and verbal communication
  • Ability to work in a geo-distributed, cross-functional group
  • Demonstrated ability to own and deliver projects independently
  • Demonstrated ability to mentor and coach others technically
  • Strong interpersonal communication skills (including listening, speaking, and writing) and the ability to work well in a diverse, team-focused environment with other SREs, developers, Product Managers, etc.
Job posted by
Thomas G

Site Reliability Engineer - Product

at A listed product development organization

Agency job
via RS Consultants
Amazon Web Services (AWS)
Kubernetes
Ansible
Prometheus
Grafana
Pagerduty
EKS
Pune
4 - 8 yrs
₹15L - ₹15L / yr

Position: Site Reliability Engineer

Location: Pune (currently WFH; you will need to relocate after the pandemic)

 

About the Organization:

A funded product development company, headquartered in Singapore, with offices in Australia, the United States, Germany, the United Kingdom, and India. You will gain work experience in a global environment.

 

Job Description:

We are looking for an experienced DevOps / Site Reliability engineer to join our team and be instrumental in taking our products to the next level.

 

In this role, you will be working on bleeding-edge hybrid cloud / on-premise infrastructure handling billions of events and terabytes of data a day.

 

You will be responsible for working closely with various engineering teams to design, build and maintain a globally distributed infrastructure footprint.

As part of the role, you will be responsible for researching new technologies, managing a large fleet of active services and their underlying servers, automating the deployment, monitoring and scaling of components, and optimizing the infrastructure for cost and performance.

 

Day-to-day responsibilities

 

  • Ensure the operational integrity of the global infrastructure
  • Design repeatable continuous integration and delivery systems
  • Test and measure new methods, applications and frameworks
  • Analyze and leverage various AWS-native functionality
  • Support and build out an on-premise data center footprint
  • Provide support to other teams and diagnose issues related to our infrastructure
  • Participate in 24/7 on-call rotation (If Required)

 

Candidate's Profile:

 

 

  • Expert-level administrator of Linux-based systems
  • Experience managing distributed data platforms (Kafka, Spark, Cassandra, etc.); Aerospike experience is a plus.
  • Experience with production deployments of Kubernetes Cluster
  • Experience in automating provisioning and managing Hybrid-Cloud infrastructure (AWS, GCP and On-Prem) at scale.
  • Knowledge of monitoring platforms (Prometheus, Grafana, Graphite).
  • Experience in distributed storage systems such as Ceph or GlusterFS.
  • Experience in virtualisation with KVM, oVirt and OpenStack.
  • Hands-on experience with configuration management systems such as Terraform and Ansible
  • Bash and Python scripting expertise
  • Network troubleshooting experience (TCP, DNS, IPv6 and tcpdump)
  • Experience with continuous delivery systems (Jenkins, GitLab, Bitbucket, Docker)
  • Experience managing hundreds to thousands of servers globally
  • Enjoy automating tasks, rather than repeating them
  • Capable of estimating costs of various approaches, and finding simple and inexpensive solutions to complex problems
  • Strong verbal and written communication skills
  • Ability to adapt to a rapidly changing environment
  • Comfortable collaborating and supporting a diverse team of engineers
  • Ability to troubleshoot problems in complex systems
  • Flexible working hours and ability to participate in 24/7 on call support with other team members whenever required.
Note: Looking for people from product organizations who can join at the earliest.
Job posted by
Biswadeep RS

Site Reliability Engineering

at Coredge.io

Founded 2020  •  Product  •  20-100 employees  •  Raised funding
Reliability engineering
Docker
Kubernetes
DevOps
Site reliability
Cloud Computing
Amazon Web Services (AWS)
VMware vSphere
OpenStack
openshift
Google Cloud Platform (GCP)
Remote, Noida, Bengaluru (Bangalore), NCR (Delhi | Gurgaon | Noida)
6 - 11 yrs
₹16L - ₹25L / yr
What are we looking for:
● Research, propose and evaluate, with a 5-year vision, the architecture, design, technologies, processes and profiles related to Telco Cloud.
● Participate in the creation of a realistic technical-strategic roadmap to transform the network to Telco Cloud and prepare it for 5G.
● Using your deep technical expertise, provide detailed feedback to Product Management and Engineering, and contribute directly to the platform code base to enhance both the customer experience of the service and the SRE quality of life.
● Be aware of trends in network infrastructure as well as within the network engineering and OSS community. What technologies are being developed or launched?
● Stay current with infrastructure trends in the telco network cloud domain.
● Be responsible for the engineering of Lab and Production Telco Cloud environments, including patches, upgrades, and reliability and performance improvements.
Required Minimum Qualifications: (Education and Technical Skills/Knowledge)
● Software Engineering degree, MS in Computer Science or equivalent experience
● Years of experience in an SRE, DevOps, development and/or support-related role
● 0-5 years of professional experience for a junior position
● At least 8 years of professional experience for a senior position
● Unix server administration and tuning: Linux / RedHat / CentOS / Ubuntu
● Deep knowledge of networking layers 1-4
● Cloud / Virtualization (at least two): Helm, Docker, Kubernetes, AWS, Azure, Google Cloud, OpenStack, OpenShift, VMware vSphere / Tanzu
● In-depth knowledge of cloud storage solutions on top of AWS, GCP, Azure and/or on-prem private cloud, such as Ceph, CephFS, GlusterFS
● DevOps: Jenkins, Git, Azure DevOps, Ansible, Terraform
● Backend knowledge: Bash, Python, Go (knowledge of other scripting languages is a plus)
● PaaS-level solutions such as Keycloak for IAM, Prometheus, Grafana, ELK, DBaaS (such as MySQL, Cassandra)
About the Organisation:
The team at Coredge.io is a combination of experienced and young professionals with many years of experience in Edge computing, Telecom application development and Kubernetes. The company has continuously collaborated with the open source community, universities and major industry players in furthering its goal of providing the industry with an indispensable tool to offer improved services to its customers. Coredge.io has a global market presence, with offices in the US and New Delhi, India.
Job posted by
Abhimanyu Bhatter

Site Reliability Engineer

at Dremio

Founded 2015  •  Product  •  100-500 employees  •  Raised funding
Reliability engineering
Site reliability
DevOps
Python
CI/CD
Amazon Web Services (AWS)
Ansible
Kubernetes
Google Cloud Platform (GCP)
Windows Azure
Hyderabad
6 - 12 yrs
₹20L - ₹40L / yr

About the Role

Dremio’s SREs ensure that our internal and externally visible services have reliability and uptime appropriate to users' needs and a fast rate of improvement. You will be joining a newly formed team that will spearhead our efforts to launch a cloud service. This is an opportunity to join a very fast-growing startup and help build a cloud service from the ground up.

Responsibilities and Ownership

  • Debug and optimize code and automate routine tasks.
  • Evangelize and advocate for reliability practices across our organization.
  • Collaborate with other Engineering teams to support services before they go live through activities such as system design consulting, developing software platforms and frameworks, monitoring/alerting, capacity planning and launch reviews.
  • Analyze and optimize our core product by developing and implementing reliability and performance practices.
  • Scale systems sustainably through automation and evolve systems by pushing for changes that improve reliability and velocity.
  • Be on-call for services that the SRE team owns.
  • Practice sustainable incident response and blameless postmortems.

Qualifications

  • 6+ years of relevant experience in the following areas: SRE, DevOps, Cloud Operations, Systems Engineering, or Software Engineering.
  • Excellent command of cloud services on AWS/GCP/Azure, Kubernetes and CI/CD pipelines.
  • You have moderate to advanced experience in Java, C, C++, Python, Go or other object-oriented programming languages.
  • You are interested in designing, analyzing and troubleshooting large-scale distributed systems.
  • You have a systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
  • You have a great ability to debug and optimize code and automate routine tasks.
  • You have a solid background in software development and architecting resilient and reliable applications.
Job posted by
Kiran B
AWS CloudFormation
cloud automation
site reliability
cloudformation
Ansible
Terraform
Cloudformation
Amazon Web Services (AWS)
Python
JIRA
Perl
Powershell
Bash
Groovy
Remote only
5 - 11 yrs
₹10L - ₹17L / yr
  • 5+ years of software development or site reliability engineering or equivalent experience
  • Skilled at problem solving, algorithms, and data structures
  • Building tools and scripting frameworks from scratch
  • Working with Cloud Automation tools like CloudFormation, Terraform, CDK, aws-cli
  • Scripting languages like Python, Groovy, PowerShell, Bash, Perl etc.
  • Configuration automation using Ansible or equivalent tools
  • Exposure to Windows and Linux administration
  • Project management tools like Jira, Trello
  • Prior experience in dealing with Datastore technologies like Postgres, MySQL, SQL, DynamoDB is desirable
  • Familiarity with basic networking, security and cloud engineering concepts
  • Team player who is eager to help others to succeed through mentoring and leading by example
  • Highly collaborative with effective written and verbal communication skills
Job posted by
Mohammad Farooq Shaik