Site Reliability Engineer

Posted by Tanika Monga
3 - 6 yrs
₹10L - ₹21L / yr
Delhi, Gurugram, Noida
Skills
Terraform
Kubernetes
Ansible
WHAT WILL I DO?

You will work as a Site Reliability Engineer responsible for the availability, performance, monitoring, and incident response, among other things, of the platforms and services used and owned by Shuttl. The SRE team works alongside the Engineering team and owns every aspect of service availability as well as disaster recovery and business continuity plans. You will work with other Site Reliability Engineers and report to the Lead of the Site Reliability Engineering team.

HOW DO WE WORK?

Our engineering process has five phases: planning, developing, testing and profiling, releasing, and monitoring. The planning phase consists of documenting the feature or task to be done, followed by various discussions. These discussions cover product, delivery estimates, release plan, monitoring plan, test plans, architecture, code design, technology choices, and best practice adoption. The development and testing phases coexist and involve writing code, unit tests, performance tests, profiling, stress testing, code reviews, and QA testing. They are punctuated with daily scrums and standups. The release phase is largely about managing and communicating the release to customers and internal stakeholders and activating features. The last phase is the monitoring phase, where relevant metrics and exceptions are tracked and any critical refinement for the delivered feature is undertaken. This phase culminates in a retrospective.

SREs get involved in this process as early as possible to provide general guidance and recommendations and to help design the application to comply with community standards such as CNCF and 12 Factor. SRE involvement and influence tend to increase during the mid to final stages of development, when the application is primed for beta evaluation and all the tooling and instrumentation are finalized.

WHAT SKILLS SHOULD I HAVE?

For this role we expect you to have 3+ years of experience working as a DevOps Engineer or SRE. You should have a good grasp of Unix-like systems, access control, networking nuances, process isolation by means of kernel-provided features, distributed applications and algorithms, job schedulers, and secret management, among other things. At Shuttl we are big proponents of immutable infrastructure. All our infrastructure is hosted with Amazon Web Services, and we use HashiCorp's Terraform to manage the infrastructure as code. A good handle on AWS and Terraform is therefore a definite plus. Since SREs are expected to write a lot of code, you are also expected to be skilled in a programming language, preferably Python or Go.
Users love Cutshort
Read about what our users have to say about finding their next opportunity on Cutshort.
Subodh Popalwar

Software Engineer, Memorres
For 2 years, I had trouble finding a company with good work culture and a role that will help me grow in my career. Soon after I started using Cutshort, I had access to information about the work culture, compensation and what each company was clearly offering.

About Shuttl

Founded: 2015
Stage: Raised funding
Connect with the team
Suvidha Chib
Ravi Shah
Yamini Galhotra
Zinal Patel
Anuj Kanojia
Shamsul Arfeen
Tanika Monga
Divya Rao

Similar jobs

Nvizion Solutions
1 recruiter
Posted by Anshita Abhilasha
Remote only
3 - 6 yrs
₹6L - ₹15L / yr
DevOps
Google Cloud Platform (GCP)
Amazon Web Services (AWS)
Linux/Unix
JIRA
+3 more

Nvizion Solutions is hiring for the position of Site Reliability Engineer.

If interested, kindly share your resume along with contact details.

Title: Site Reliability Engineer

No. of job openings: 2

Location: Gurgaon / Hyderabad / Bengaluru / Mumbai / Chennai (remote location)

Remuneration: Best in the industry

  • Experience required: 2 to 4 years in the industry
  • Ensuring overall system reliability
  • Adding automation and alerting to the system
  • Providing troubleshooting support
  • Cross-team communication, working closely with the Product team and the Customer Success team
  • Proactive support to ensure the system returns to a healthy state
  • R&D on new tools and technologies to support the product and support teams
  • Good verbal and written communication to connect with the client
  • Good team player with a zeal to learn new technologies
  • The candidate will be part of the team responsible for 24x7 monitoring of a distributed global platform

  • Linux scripting
  • CI/CD knowledge (Jenkins / Bitbucket Pipelines / GitOps)
  • Version control
  • Cloud platform knowledge (GCP/AWS/Azure/DigitalOcean)
  • Docker, Kubernetes

Smarsh
1 recruiter
Posted by Nichell Dsouza
Bengaluru (Bangalore)
9 - 15 yrs
₹40L - ₹50L / yr
Reliability engineering
Kubernetes
IT infrastructure

Company Description

Smarsh is the leader in communications compliance, archiving, and analytics. We provide compliance across the broadest set of communications channels with insights on what’s being captured. Smarsh customers manage over 500 million daily conversations across 80 channels and growing. Customers include the top 10 U.S., top 8 European, top 5 Canadian, and top 3 Asian banks. The Smarsh advantage is customers stay ahead of compliance and uncover patterns and relationships hidden within their data.

At Smarsh, we’ve been helping our customers manage new forms of communication since 1998. We work closely with regulators including the SEC, FINRA, IIROC, and the PRA and FCA, and with our customers, to ensure that they understand the capabilities of today’s technology and that our platform meets their most stringent requirements. Our products include Connected Capture, Connected Archive, Web Archive & Business Solutions.

 

About the team

Are you an SRE with excellent Observability, Containerization and Orchestration skills? As a Site Reliability Engineer (SRE) in the Smarsh SaaS Operations team, you'll be part of a team who measures and improves production performance reliability through sustainable engineering practices for our suite of applications. Toil will be your number one enemy, observability your closest friend and your mission will be to drive operational burden as close to zero as you can.

Responsibilities

  • Responsible for technical direction at the platform solutions level. Is able to weigh the pros and cons of various solutions and credibly argue for the best path
  • Work closely with Product Management and the rest of the engineering team to define features and their implementations with careful attention to quality, scalability, and maintainability
  • Can break down complex technical solutions into abstractions that the rest of the team can understand
  • Can investigate and solve complex bugs, performance, and scalability issues
  • Collaborates with multiple agile teams to ensure their solutions integrate effectively
  • Track work in ticketing system (JIRA)
  • Participate in Pull Request reviews. Provide and receive feedback to continuously improve.
  • Other duties as assigned.

Desired skills & experience

  • A minimum of 10 years of industry experience
  • Masters in CS or equivalent
  • Must have experience in Azure or AWS, either running some large-scale app there or migrating to Azure/AWS. 
  • Experience operating Cloud Foundry in production environments 
  • Experience managing CI/CD systems (Concourse, Jenkins, TravisCI etc.) 
  • Experience deploying and/or operating ELK stack 
  • Experience with container technologies and orchestration platforms (Docker, Kubernetes, Cloud Foundry) 
  • Experience working with monitoring and observability tools (We use Datadog and New Relic) 
  • Familiarity with working with PostgreSQL and MongoDB 
  • Background working in a multi-platform environment (Linux, Windows) 
  • Experience with running on a cloud platform, AWS preferred (S3, RDS, SQS) 
  • Familiarity with Agile/Scrum/Kanban methodologies 
  • Familiarity with programming/scripting languages (e.g., Python, Bash, PowerShell, Go)

Additional Skills

  • Expert programming skills in relevant languages
  • Exceptional analytical and problem-solving skills
  • Strong communication and collaboration skills
  • Deep understanding of modern software architecture
  • Deep domain knowledge of the industry, platform, and existing processes
  • Fault-tolerant design & maintenance
  • Knowledge and understanding of modern software programming/engineering.
  • Product delivery lifecycle - requirement refinement through ops

 

Why Smarsh?

Ready to join a thriving tech company that’s redefining digital archiving and business intelligence?

Smarsh is the leading comprehensive archiving platform. Recognized as one of today’s fastest growing companies in the U.S., Smarsh delivers innovative cloud-based solutions that help organizations manage and enforce flexible and secure records retention and compliance strategies for electronic communications, including social media and enterprise social networks (Yammer, Chatter, Facebook, LinkedIn and more).

Our motto is ‘People First. Inspire Confidence. Embrace the Impossible.’ We hire lifelong learners who have a passion for their discipline and a track record of excellence. To learn more about us, visit www.smarsh.com/careers

 


Bengaluru (Bangalore)
5 - 8 yrs
₹5L - ₹20L / yr
Windows Azure
Microsoft Windows Azure
DevOps
Terraform
Solution architecture
+5 more

Senior Cloud Engineer / Jr. Cloud Solutions Architect

 

Roles and Responsibilities

  • Define, implement, deploy and maintain development, QA & production environments for cloud-based Azure architecture.

  • Create a strategy for establishing a secure and well-managed enterprise environment in Azure

  • Define and implement security architecture for production, ensure data security at all levels.

  • Provision infrastructure as code using Azure CLI, PowerShell, ARM templates, and/or Terraform, with Ansible or other tools.

  • Develop scripts to automate the deployment of resource stacks and associated configurations

  • Extend MLP standard systems management processes into the cloud including change, incident, and problem management

  • Establish and implement monitoring and management infrastructure for both availability and performance management

  • Implement observability patterns using Azure Monitor, Azure Application Insights, and Log Analytics Workspace.

  • Provide internal training to the team.

 

Primary Skills/Requirements

  • 5+ years of experience in IT and infrastructure

  • 3+ years of experience in Azure design, support and management for a large-scale organization

  • Experience in design and implementation of high availability architecture.

  • Strong experience in Azure CLI, PowerShell, ARM templates, and Terraform.

  • Strong understanding of IT Security and related audits

  • Experience with deploying applications on Linux - Ubuntu

  • Should know Azure offerings (Storage, OS instances, Availability Zones, DR, load balancers, VPN tunnel, Application Gateway, etc.)

  • Cloud monitoring experience with Azure Log Analytics and Azure Monitor.

  • Experience with log collection tools and analysis, as well as infrastructure performance monitoring tools and optimization practices

  • Microsoft Azure Certification MCSE: Cloud Platform and Infrastructure or equivalent certification would be an added advantage

  • Experience with PostgreSQL databases

Behavioural

  • Positive work ethic

  • Ability to adapt to a dynamic environment

  • Time Management

  • Team Player

  • Communication skills

  • Ability to work independently

Remote only
3 - 10 yrs
₹5L - ₹15L / yr
Python
Amazon Web Services (AWS)
MongoDB
MySQL
Django
+9 more

A network of the world's best developers - full-time, long-term remote software jobs with better compensation and career growth.  We enable our clients to accelerate their Cloud Offering and Capitalize on Cloud.  We have our own IoT/AI platform and we provide professional services on that platform to build custom clouds for their IoT devices.  We also build mobile apps, run 24x7 DevOps/site reliability engineering for our clients.

We are looking for a friendly, highly hands-on, and dependable professional with plenty of experience as a backend and cloud engineer to provide site reliability services to our internal teams and end customers. We expect you to deliver top quality at high speed. You must have experience developing and designing amazing UI screens.

 

This person MUST have:

  • BE Computer Science or equivalent
  • Cloud app development experience.
  • Strong Troubleshooting and debugging skills
  • A strong passion for writing simple, clean, and efficient code.
  • 3 years of experience with the Django framework and other backend technologies.
  • Knowledge of NodeJS
  • Experience with building, modifying, and extending API endpoints (REST or GraphQL) for data retrieval and persistence.
  • Understanding of how to use databases such as Postgres (preferred choice), SQLite, MongoDB, and MySQL.
  • Experience creating high-performance applications.
  • Experience with messaging and broker tools - Rabbitmq, MQTT
  • Experience with SQL and NoSQL databases
  • Experience with the full software development life cycle, including requirements collection, design, implementation, testing, and operational support.
  • Knowledge of web services
  • Proficient understanding of code versioning tools such as Git.
  • Hands-on experience deploying and managing infrastructure with CloudFormation/Terraform
  • Experience managing AWS infrastructure.
  • Hands-on experience in Linux environment.
  • Basic understanding of Kubernetes/Docker orchestration.
  • Manage existing infrastructure, pipelines, and engineering tools (on-prem or AWS) for the engineering team (build servers, Jenkins nodes, etc.)
  • Experience with scrum or other agile software development methodology.
  • Excellent verbal and written communication, teamwork, decision making and influencing skills.
  • Handle customer calls/emails regarding technical issues for end-users.
  • Strong communication skills
  • Attention to detail.

 

 

Experience:

  • Minimum 3 years of experience

 

Location:

  • Ahmedabad office, or
  • Work from home



Timings:

  • 40 hours a week with a rotational shift every month.

Position:

  • Full time/Direct
  • We have great benefits such as PF, medical insurance, 12 annual company holidays, 12 PTO leaves per year, annual increments, Diwali bonus, spot bonuses and other incentives, etc.
  • We don't believe in locking people in with long notice periods. You will stay here because you love the company. We have only a 30-day notice period.
Remote, Bengaluru (Bangalore)
3 - 7 yrs
₹10L - ₹30L / yr
Site Reliability
DevOps
Docker
Kubernetes
Python
+2 more

Who You Are

  • Creative thinker and strong problem solver with meticulous attention to detail
  • Highly organized, creative, motivated, and passionate about achieving results
  • Able to balance multiple tasks and projects effectively and quickly adapt to new situations and technologies
  • Able to work both independently and as part of a team
  • Systematic problem-solver, coupled with a strong sense of ownership and drive

 

What you need

  • 3-7 years of experience as a Site Reliability Engineer or a mix of a software engineer and DevOps.
  • Strong hands-on knowledge of Linux fundamentals, System administration scripting, performance tuning/scalability, troubleshooting.
  • Write great quality code using SOLID principles including unit and integration tests.
  • Hands-on development experience in an object-oriented programming language like Python.
  • Hands-on experience developing task automations
  • Experience using tools to create and manage CI (continuous integration) and CD (continuous delivery) pipelines.
  • Familiarity with software development tools: source code management (SCM systems), code review systems, issue tracking tools, build tools, test frameworks, code quality tools.
  • Experience implementing open-source observability and alerting tools, like Prometheus, Grafana, Cortex, Thanos, Alertmanager etc
  • Decent knowledge of networking (VPC, VNet, DNS, etc.) and of the TCP/IP stack, internet routing, and load balancing.
  • Experience working with log and configuration management tools
  • Prior experience of working with AWS, Azure, GCP is a plus
  • Prior experience of working with Kubernetes, Docker, and containers is a plus
  • Strong interpersonal communication skills (including listening, speaking, and writing) and ability to work well in a diverse, team-focused environment with other SREs, Engineers, Product Managers, etc.
  • Documenting your work should be in your DNA

 

What you get

  • A chance to develop and build something (probably from scratch) which you can be proud of
  • Build and Implement modern systems observability solutions including monitoring, alerting, metrics, logging, and APM & distributed tracing.
  • Scale systems sustainably through automation and evolve systems by pushing for changes that improve reliability and velocity.
  • Maintain business continuity by identifying and driving opportunities to make systems highly resilient and human-free.
  • Closely work with the software engineering team to ensure accurate monitoring and metrics are being built into applications before going to production.
  • Develop and maintain software modules for use and re-use in cloud and on-premise systems automation.
  • Identify process gaps and implement process improvements to increase operational reliability
  • Drive standardization efforts across the services, infrastructure, systems, and practices
  • Develop systems and tools to help the development team uphold reliability principles
Olacabs.com
6 recruiters
Agency job
via zyoin by RAKESH RANJAN
Bengaluru (Bangalore)
6 - 11 yrs
₹20L - ₹38L / yr
DevOps
Terraform
Ansible
CI/CD
Linux administration
+7 more

 

Roles and Responsibilities

  • Managing Availability, Performance, Capacity of infrastructure and applications.
  • Building and implementing observability for applications health/performance/capacity.
  • Optimizing On-call rotations and processes.
  • Documenting “tribal” knowledge.
  • Managing Infra-platforms like Mesos/Kubernetes, CI/CD, Observability (Prometheus/New Relic/ELK), Cloud Platforms (AWS/Azure), Databases, Data Platforms Infrastructure
  • Providing help in onboarding new services with production readiness review process.
  • Providing reports on services SLO/Error Budgets/Alerts and Operational Overhead.
  • Working with Dev and Product teams to define SLO/Error Budgets/Alerts.
  • Working with the Dev team to gain an in-depth understanding of the application architecture and its bottlenecks.
  • Identifying observability gaps in product services and infrastructure, and working with stakeholders to fix them.
  • Managing outages, doing detailed RCA with developers, and identifying ways to avoid recurrence.
  • Managing/automating upgrades of the infrastructure services.
  • Automating toil work.

Experience & Skills

  • 6+ years of total experience
  • Experience as an SRE/DevOps/Infrastructure Engineer on large scale microservices and infrastructure.
  • A collaborative spirit with the ability to work across disciplines to influence, learn, and deliver.
  • A deep understanding of computer science, software development, and networking principles.
  • Demonstrated experience with languages such as Python, Java, Golang, etc.
  • Extensive experience with Linux administration and a good understanding of the various Linux kernel subsystems (memory, storage, network, etc.).
  • Extensive experience in DNS, TCP/IP, UDP, gRPC, routing, and load balancing.
  • Expertise in GitOps, Infrastructure as Code tools such as Terraform, and configuration management tools such as Chef, Puppet, SaltStack, and Ansible.
  • Expertise in Amazon Web Services (AWS) and/or other relevant cloud infrastructure solutions like Microsoft Azure or Google Cloud.
  • Experience in building CI/CD solutions with tools such as Jenkins, GitLab, Spinnaker, Argo, etc.
  • Experience in managing and deploying containerized environments using Docker, Mesos/Kubernetes is a plus.

Dremio
4 recruiters
Posted by Kiran B
Hyderabad
6 - 12 yrs
₹20L - ₹40L / yr
Reliability engineering
Site reliability
DevOps
Python
CI/CD
+5 more

About the Role

Dremio’s SREs ensure that our internal and externally visible services have reliability and uptime appropriate to users' needs and a fast rate of improvement. You will be joining a newly formed team that will spearhead our efforts to launch a cloud service. This is an opportunity to join a very fast-growing startup and help build a cloud service from the ground up.

Responsibilities and Ownership

  • Ability to debug and optimize code and automate routine tasks.
  • Evangelize and advocate for reliability practices across our organization.
  • Collaborate with other Engineering teams to support services before they go live through activities such as system design consulting, developing software platforms and frameworks, monitoring/alerting, capacity planning and launch reviews.
  • Analyze and optimize our core product by developing and implementing reliability and performance practices.
  • Scale systems sustainably through automation and evolve systems by pushing for changes that improve reliability and velocity.
  • Be on-call for services that the SRE team owns.
  • Practice sustainable incident response and blameless postmortems.

Qualifications

  • 6+ years of relevant experience in the following areas: SRE, DevOps, Cloud Operations, Systems Engineering, or Software Engineering.
  • Excellent command of cloud services on AWS/GCP/Azure, Kubernetes and CI/CD pipelines.
  • You have moderate to advanced experience in Java, C, C++, Python, Go or other object-oriented programming languages.
  • You are interested in designing, analyzing and troubleshooting large-scale distributed systems.
  • You have a systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
  • You have a great ability to debug and optimize code and automate routine tasks.
  • You have a solid background in software development and architecting resilient and reliable applications.
Remote, Bengaluru (Bangalore)
3 - 8 yrs
₹15L - ₹30L / yr
Python
Amazon Web Services (AWS)
Ansible
Terraform
Docker
What you’ll do

• Develop and maintain IaC (infrastructure as code) using Terraform and Ansible
• Draft design documents that translate requirements into code.
• Deal with challenges associated with scale.
• Assume responsibilities from technical design through technical client support.
• Manage expectations with internal stakeholders and context-switch in a fast paced environment.
• Thrive in an environment that uses Elasticsearch extensively.
• Keep abreast of technology and contribute to the engineering strategy.
• Champion best development practices and provide mentorship.

What we’re looking for

• An AWS Certified Engineer with strong skills in:
  o Terraform
  o Ansible
  o *nix and shell scripting
• Preferably with experience in:
  o Elasticsearch
  o CircleCI
  o CloudFormation
  o Python
  o Packer
  o Docker
  o Prometheus and Grafana
  o Challenges of scale
  o Production support
• Sharp analytical and problem-solving skills.
• Strong sense of ownership.
• Demonstrable desire to learn and grow.
• Excellent written and oral communication skills.
• Mature collaboration and mentoring abilities.
OJAS
Hyderabad
5 - 11 yrs
₹10L - ₹20L / yr
site reliability
cloudformation
Terraform
Ansible
Cloud Automation
+8 more
  • 5+ years of software development or site reliability engineering or equivalent experience
  • Skilled at problem solving, algorithms, and data structures
  • Building tools and scripting frameworks from scratch
  • Working with Cloud Automation tools like CloudFormation, Terraform, CDK, aws-cli
  • Scripting languages like Python, Groovy, PowerShell, Bash, Perl etc.
  • Configuration automation using Ansible or equivalent tools
  • Exposure to Windows and Linux administration
  • Project management tools like Jira, Trello
  • Prior experience in dealing with Datastore technologies like Postgres, MySQL, SQL, DynamoDB is desirable
  • Familiarity with basic networking, security and cloud engineering concepts
  • Team player who is eager to help others to succeed through mentoring and leading by example
  • Highly collaborative with effective written and verbal communication skills
ScienceLogic
Remote only
5 - 11 yrs
₹10L - ₹17L / yr
AWS CloudFormation
cloud automation
site reliability
cloudformation
Ansible
+9 more
  • 5+ years of software development or site reliability engineering or equivalent experience
  • Skilled at problem solving, algorithms, and data structures
  • Building tools and scripting frameworks from scratch
  • Working with Cloud Automation tools like CloudFormation, Terraform, CDK, aws-cli
  • Scripting languages like Python, Groovy, PowerShell, Bash, Perl etc.
  • Configuration automation using Ansible or equivalent tools
  • Exposure to Windows and Linux administration
  • Project management tools like Jira, Trello
  • Prior experience in dealing with Datastore technologies like Postgres, MySQL, SQL, DynamoDB is desirable
  • Familiarity with basic networking, security and cloud engineering concepts
  • Team player who is eager to help others to succeed through mentoring and leading by example
  • Highly collaborative with effective written and verbal communication skills
Why apply to jobs via Cutshort
Personalized job matches
Stop wasting time. Get matched with jobs that meet your skills, aspirations and preferences.
Verified hiring teams
See actual hiring teams, find common social connections or connect with them directly. No 3rd party agencies here.
Move faster with AI
We use AI to get you faster responses, recommendations and unmatched user experience.
Matches delivered: 21,01,133
Network size: 37,12,187
Companies hiring: 15,000