
Principal Site Reliability Engineer

Posted by Nichell Dsouza
9 - 15 yrs
₹40L - ₹50L / yr
Bengaluru (Bangalore)
Skills
Reliability engineering
Kubernetes
IT infrastructure

Company Description

Smarsh is the leader in communications compliance, archiving, and analytics. We provide compliance across the broadest set of communications channels with insights on what’s being captured. Smarsh customers manage over 500 million daily conversations across 80 channels and growing. Customers include the top 10 U.S., top 8 European, top 5 Canadian, and top 3 Asian banks. The Smarsh advantage is customers stay ahead of compliance and uncover patterns and relationships hidden within their data.

At Smarsh, we’ve been helping our customers manage new forms of communication since 1998. We work closely with regulators including the SEC, FINRA, IIROC, and the PRA and FCA, and with our customers, to ensure that they understand the capabilities of today’s technology and that our platform meets their most stringent requirements. Our products include Connected Capture, Connected Archive, Web Archive & Business Solutions.

 

About the team

Are you an SRE with excellent observability, containerization, and orchestration skills? As a Site Reliability Engineer (SRE) in the Smarsh SaaS Operations team, you'll be part of a team that measures and improves production performance and reliability through sustainable engineering practices for our suite of applications. Toil will be your number one enemy, observability your closest friend, and your mission will be to drive operational burden as close to zero as you can.

Responsibilities

  • Responsible for technical direction at the platform solutions level. Is able to weigh the pros and cons of various solutions and credibly argue for the best path
  • Work closely with Product Management and the rest of the engineering team to define features and their implementations with careful attention to quality, scalability, and maintainability
  • Can break down complex technical solutions into abstractions that the rest of the team can understand
  • Can investigate and solve complex bugs, performance, and scalability issues
  • Collaborates with multiple agile teams to ensure their solutions integrate effectively
  • Track work in ticketing system (JIRA)
  • Participate in Pull Request reviews. Provide and receive feedback to continuously improve.
  • Other duties as assigned.

Desired skills & experience

  • A minimum of 10 years of industry experience
  • Master's in CS or equivalent
  • Must have experience in Azure or AWS, either running some large-scale app there or migrating to Azure/AWS. 
  • Experience operating Cloud Foundry in production environments 
  • Experience managing CI/CD systems (Concourse, Jenkins, TravisCI etc.) 
  • Experience deploying and/or operating ELK stack 
  • Experience with container technologies and orchestration platforms (Docker, Kubernetes, Cloud Foundry) 
  • Experience working with monitoring and observability tools (We use Datadog and New Relic) 
  • Familiarity with working with PostgreSQL and MongoDB 
  • Background working in a multi-platform environment (Linux, Windows) 
  • Experience with running on a cloud platform, AWS preferred (S3, RDS, SQS) 
  • Familiarity with Agile/Scrum/Kanban methodologies 
  • Familiarity with programming/scripting languages (e.g., Python, Bash, PowerShell, Go)

Additional Skills

  • Expert programming skills in relevant languages
  • Exceptional analytical and problem-solving skills
  • Strong communication and collaboration skills
  • Deep understanding of modern software architecture
  • Deep domain knowledge of the industry, platform, and existing processes
  • Fault-tolerant design & maintenance
  • Knowledge and understanding of modern software programming/engineering.
  • Product delivery lifecycle - requirement refinement through ops

 

Why Smarsh?

Ready to join a thriving tech company that’s redefining digital archiving and business intelligence?

Smarsh is the leading comprehensive archiving platform. Recognized as one of today’s fastest growing companies in the U.S., Smarsh delivers innovative cloud-based solutions that help organizations manage and enforce flexible and secure records retention and compliance strategies for electronic communications, including social media and enterprise social networks (Yammer, Chatter, Facebook, LinkedIn and more).

Our motto is ‘People First. Inspire Confidence. Embrace the Impossible.’ We hire lifelong learners who have a passion for their discipline and a track record of excellence. To learn more about us, visit www.smarsh.com/careers

 


Users love Cutshort
Read about what our users have to say about finding their next opportunity on Cutshort.

Subodh Popalwar

Software Engineer, Memorres
For 2 years, I had trouble finding a company with good work culture and a role that will help me grow in my career. Soon after I started using Cutshort, I had access to information about the work culture, compensation and what each company was clearly offering.

About Smarsh

Founded: 2001
Stage: Profitable
Connect with the team: Nichell Dsouza

Similar jobs

Renowned NGO
Agency job
via Merito by Jinita Sumaria
Pune
6 - 12 yrs
₹10L - ₹12L / yr
IT service management
IT operations
IT Strategy
IT project management
Operating systems
+4 more
We are looking for a Lead - Technology & Data Cell for one of the renowned NGOs in Pune.

Role - Lead (Technology & Data Cell)
Experience - 6+ years
Job Location - Aundh, Pune, Maharashtra

About our Client :-

Our client is a Communities Foundation that works in the area of skilling and livelihoods for underserved youths. This is a pioneering program with a strong PPP model, an agency-led approach to livelihoods and a vision of socio-economic transformation.
 
About the Role -
The Lead for the Technology and Data Cell has the opportunity to create and implement the vision for enabling the organization to serve 1 million youth by 2030 by using cutting-edge technology and data systems.
They will tech-enable organizational systems for effective operations and devise data solutions for effective decision making and strategic direction. They will work closely with the program teams to fully understand the program landscape and implement technology solutions accordingly. Implementation would include being the single point of contact for the software service provider, end-to-end back-end support, and training of the users.
 
Roles and responsibilities:

- Design and Implementation/upgradation of a Tech platform for the Livelihood program:
In collaboration with the Software service provider, an ERP system is being developed and is close to going-live. The responsibilities would include:
i) Understanding the business requirements w.r.t the platform
ii) Data migration: Migrating the legacy data on the platform in the required format whilst ensuring accuracy of the data
iii) End-user training across centers and central team: hand-holding the team along with the service provider during go-live and implementation
iv) Troubleshooting wherever required through constant updates and follow-up on system glitches, and ensuring resolution with the support of software service providers.
v) Monitoring of the system application across centers. Identifying required improvements and suggesting the same.
vi) Coordinating with software service provider for changes and support required for smooth running of the application
vii) Managing and maintaining SMS/Email gateways, domain, servers etc.
viii) Meaningful data extraction and reporting.
ix) Establish Data systems: Establish protocol for data storage and data sharing.
 
Technology requirements for the organization:
i) Identify technology requirements for Donor management, HR management and all other areas as required.
ii) Manage complete hardware requirements across locations including but not limited to server space, computers, internet solutions and data security.

- Data Analytics: 
Facilitate culture of data-driven decision making within the organization, including but not limited to, provision of relevant data analytics to the program team.

- Knowledge Management: Lead the overall knowledge management system for the organization and enable data to be available on cloud with a clear protocol for sharing and storage.
 
What are we looking for:

- Education: BE Computers
- Experience: Project management experience of 5+ years
- Data management skills: proven understanding of the principles of data management and administration.
- IT and database skills: familiarity with modern databases and IT systems. Candidates with a fair understanding of PHP and SQL databases would be preferred.
- Analytical skills
- Problem-solving skills
- Partnership management
- Excellent verbal and written communication skills.
DeepIntent
Posted by Indrajeet Deshmukh
Pune
3 - 6 yrs
Best in industry
Kubernetes
Git
MySQL
Amazon Web Services (AWS)
CI/CD
+3 more

With a core belief that advertising technology can measurably improve the lives of patients, DeepIntent is leading the healthcare advertising industry into the future. Built purposefully for the healthcare industry, the DeepIntent Healthcare Advertising Platform is proven to drive higher audience quality and script performance with patented technology and the industry’s most comprehensive health data. DeepIntent is trusted by 600+ pharmaceutical brands and all the leading healthcare agencies to reach the most relevant healthcare provider and patient audiences across all channels and devices. For more information, visit DeepIntent.com or find us on LinkedIn.


We are seeking a skilled and experienced Site Reliability Engineer (SRE) to join our dynamic team. The ideal candidate will have a minimum of 3 years of hands-on experience in managing and maintaining production systems, with a focus on reliability, scalability, and performance. As an SRE at DeepIntent, you will play a crucial role in ensuring the stability and efficiency of our infrastructure, as well as contributing to the development of automation and monitoring tools.


Responsibilities:

  • Deploy, configure, and maintain Kubernetes clusters for our microservices architecture.
  • Utilize Git and Helm for version control and deployment management.
  • Implement and manage monitoring solutions using Prometheus and Grafana.
  • Work on continuous integration and continuous deployment (CI/CD) pipelines.
  • Containerize applications using Docker and manage orchestration.
  • Manage and optimize AWS services, including but not limited to EC2, S3, RDS, and AWS CDN.
  • Maintain and optimize MySQL databases, Airflow, and Redis instances.
  • Write automation scripts in Bash or Python for system administration tasks.
  • Perform Linux administration tasks and troubleshoot system issues.
  • Utilize Ansible and Terraform for configuration management and infrastructure as code.
  • Demonstrate knowledge of networking and load-balancing principles.
  • Collaborate with development teams to ensure applications meet reliability and performance standards.


Additional Skills (Good to Know):

  • Familiarity with ClickHouse and Druid for data storage and analytics.
  • Experience with Jenkins for continuous integration.
  • Basic understanding of Google Cloud Platform (GCP) and data center operations.


Qualifications:

  • Minimum 3 years of experience in a Site Reliability Engineer role or similar.
  • Proven experience with Kubernetes, Git, Helm, Prometheus, Grafana, CI/CD, Docker, and microservices architecture.
  • Strong knowledge of AWS services, MySQL, Airflow, Redis, AWS CDN.
  • Proficient in scripting languages such as Bash or Python.
  • Hands-on experience with Linux administration.
  • Familiarity with Ansible and Terraform for infrastructure management.
  • Understanding of networking principles and load balancing.


Education:

Bachelor's degree in Computer Science, Information Technology, or a related field.


DeepIntent is committed to bringing together individuals from different backgrounds and perspectives. We strive to create an inclusive environment where everyone can thrive, feel a sense of belonging, and do great work together.

DeepIntent is an Equal Opportunity Employer, providing equal employment and advancement opportunities to all individuals. We recruit, hire and promote into all job levels the most qualified applicants without regard to race, color, creed, national origin, religion, sex (including pregnancy, childbirth and related medical conditions), parental status, age, disability, genetic information, citizenship status, veteran status, gender identity or expression, transgender status, sexual orientation, marital, family or partnership status, political affiliation or activities, military service, immigration status, or any other status protected under applicable federal, state and local laws. If you have a disability or special need that requires accommodation, please let us know in advance.

DeepIntent’s commitment to providing equal employment opportunities extends to all aspects of employment, including job assignment, compensation, discipline and access to benefits and training.

Crelio Health
Posted by Shreya Kabra
Pune
5 - 15 yrs
Best in industry
SRE
Reliability engineering
Site reliability
Site reliability engineer

Job Summary:

We are seeking a Senior DevOps & SRE Engineer to join our team and help us build, deploy, and maintain our infrastructure and applications. The ideal candidate will have experience working in a fast-paced environment and a strong background in DevOps and Site Reliability Engineering (SRE). You will be responsible for ensuring the reliability, scalability, and security of our applications and infrastructure.

 

Responsibilities:

  • Build and maintain our CI/CD pipeline and deployment automation tools
  • Design and implement monitoring and alerting systems to ensure the health of our applications and infrastructure
  • Work closely with development teams to ensure that code is deployed in a reliable and scalable manner
  • Participate in on-call rotations to provide 24/7 support for our production systems
  • Develop and maintain disaster recovery plans and processes
  • Continuously improve our infrastructure and processes to ensure scalability, reliability, and security
  • Mentor and provide technical leadership to junior team members
  • Keep up-to-date with industry best practices and emerging technologies in DevOps and SRE

Requirements:

  • Bachelor’s degree in Computer Science, Engineering, or a related field
  • 5+ years of experience in DevOps or SRE
  • Strong programming skills in at least one of the following languages: Python, Go, Ruby, or Java
  • Experience with infrastructure as code tools such as Terraform or CloudFormation
  • Experience with containerization technologies such as Docker and Kubernetes
  • Strong understanding of networking concepts such as TCP/IP, DNS, and load balancing
  • Experience with monitoring and logging tools such as Prometheus, Grafana, and ELK stack
  • Excellent problem-solving skills and the ability to troubleshoot complex issues in a fast-paced environment
  • Strong communication and collaboration skills with both technical and non-technical stakeholders

Preferred Qualifications:

  • Experience with cloud providers such as AWS or Azure
  • Experience with building and maintaining large-scale distributed systems
  • Experience with database technologies such as MySQL, PostgreSQL, or MongoDB
  • Experience with automation tools such as Ansible or Chef
  • Experience with Agile development methodologies such as Scrum or Kanban

If you are passionate about DevOps and SRE and have the skills and experience we are looking for, we encourage you to apply for this exciting opportunity.

Digital B2B Platform
Bengaluru (Bangalore)
3 - 4 yrs
₹15L - ₹30L / yr
DevOps
Python
CI/CD
Linux/Unix
Git
+6 more
We are a digital B2B platform that offers loans, working capital, and payment services to small businesses.

Candidate MUST HAVE product-based company experience and a minimum of 3 years of experience in DevOps.

What you will do (or learn) : 

1. Build our application stack on AWS. Infrastructure as code (read Terraform)
2. Build state-of-the-art CI/CD pipelines.
3. Manage data warehouses and data pipelines.
4. Work on infrastructure and data security.
5. Build a state-of-the-art log management system and the tooling around it.
6. Build a monitoring and alerting system.

What do we expect from you?
1. 3 to 10 years of experience with DevOps or SRE principles.
2. Good fundamentals of database management and other distributed systems management.
3. Experience in infrastructure as code or other configuration management systems.
4. Experience in scripting languages (such as Bash, Python, Go)
5. Good understanding of Linux systems
6. Strong debugging and troubleshooting skills
7. Experience in tooling around monitoring, CI/CD, log management systems. 
Kolkata
8 - 10 yrs
₹8L - ₹10L / yr
VMware Site Recovery Manager
Microsoft Windows
Microsoft Servers administration
Hyper V
WAN
+12 more

Position: Windows SRE

 

Responsibilities:

 

  • Work as a Windows Site Reliability Engineer managing large websites that serve millions of customers
  • Manage and monitor all installed systems and infrastructure
  • Install, configure, test and maintain operating systems, application software and system management tools
  • Among your responsibilities will be the installation and configuration of storage, servers, Microsoft servers (Cluster Services, File Services, Active Directory Services, Certificate Authority services), Virtual Infrastructure (Hyper-V), IIS, MS SQL and backup system
  • Proactively ensure the highest levels of systems and infrastructure availability
  • Monitor and test application performance for potential bottlenecks, identify possible solutions, and work with developers to implement those fixes
  • Maintain security, backup, and redundancy strategies
  • Write and maintain custom scripts to increase system efficiency and lower the human intervention time on any tasks
  • Participate in the design of information and operational support systems
  • Provide 2nd and 3rd level support
  • Liaise and collaborate with vendors and Zacks personnel for problem resolution, decision making, knowledge sharing

 

Requirements:

 

  • Minimum 5 years of Windows support experience (Windows 7, 8, 10, and all Microsoft Server versions)
  • Windows server expertise
  • Familiar with WAN/LAN technologies
  • Understanding of the OSI model
  • Virtualization - MS Hyper-V, VMware, vSAN
  • Strong understanding of Internet protocols including HTTP(S), SSL, TCP, IP
  • MS IIS administration and configuration
  • MS Active Directory
  • MS Storage Space
  • DNS and DHCP
  • SSL certificates and PKI
  • Familiar with the ITIL framework
  • Strong PowerShell experience
  • Information Security experience a plus

Other Qualifications

  • Excellent attention to detail

 

Experience: 8-10 years

Remote, Bengaluru (Bangalore)
3 - 7 yrs
₹10L - ₹30L / yr
Site Reliability
DevOps
Docker
Kubernetes
Python
+2 more

Who You Are

  • Creative thinker and strong problem solver with meticulous attention to detail
  • Highly organized, creative, motivated, and passionate about achieving results
  • Able to balance multiple tasks and projects effectively and quickly adapt to new situations and technologies
  • Able to work both independently and as part of a team
  • Systematic problem-solver, coupled with a strong sense of ownership and drive

 

What you need

  • 3-7 years of experience as a Site Reliability Engineer or a mix of a software engineer and DevOps.
  • Strong hands-on knowledge of Linux fundamentals, System administration scripting, performance tuning/scalability, troubleshooting.
  • Ability to write high-quality code following SOLID principles, including unit and integration tests.
  • Hands-on development experience in an object-oriented programming language like Python.
  • Hands-on experience developing task automations
  • Experience using tools to create and manage CI (continuous integration) and CD (continuous delivery) pipelines.
  • Familiarity with software development tools: source code management (SCM systems), code review systems, issue tracking tools, build tools, test frameworks, code quality tools.
  • Experience implementing open-source observability and alerting tools, like Prometheus, Grafana, Cortex, Thanos, Alertmanager, etc.
  • Solid knowledge of networking (VPC, VNet, DNS, etc.) and of the TCP/IP stack, internet routing, and load balancing.
  • Experience working with log and configuration management tools
  • Prior experience of working with AWS, Azure, GCP is a plus
  • Prior experience of working with Kubernetes, Docker and containers is a plus
  • Strong interpersonal communication skills (including listening, speaking, and writing) and ability to work well in a diverse, team-focused environment with other SREs, Engineers, Product Managers, etc.
  • Documenting your work should be in your DNA

 

What you get

  • A chance to develop and build something (probably from scratch) which you can be proud of
  • Build and Implement modern systems observability solutions including monitoring, alerting, metrics, logging, and APM & distributed tracing.
  • Scale systems sustainably through automation and evolve systems by pushing for changes that improve reliability and velocity.
  • Maintain business continuity by identifying and driving opportunities to make systems highly resilient and human-free.
  • Closely work with the software engineering team to ensure accurate monitoring and metrics are being built into applications before going to production.
  • Develop and maintain software modules for use and re-use in cloud and on-premise systems automation.
  • Identify process gaps and implement process improvements to increase operational reliability
  • Drive standardization efforts across the services, infrastructure, systems, and practices
  • Develop systems and tools that help the development team uphold reliability principles
Uniphore Software Systems
Posted by Sandesh HS
Bengaluru (Bangalore)
5 - 10 yrs
₹25L - ₹40L / yr
SRE
Site Reliability Engineer
Reliability engineering
DevOps
Kubernetes
+5 more
Your Responsibilities
We are looking for a Senior SRE with a proven track record of success leading complex cloud-hybrid environments. You will have:
  • Strong sense of Being an Owner, Wearing the Customer Shoes, with the ability to Empower Others demonstrated through clear communication and collaboration.
  • Skills to work independently with multiple global teams, developing, configuring, deploying, and operating our global infrastructure on AWS and on-prem.
  • Operational experience in complex distributed and real-time systems, including experience with SLOs/SLAs towards high availability, reliability, and DR goals.
  • DevOps experience in building tools and frameworks, with an understanding of continuous deployment processes.
  • Ability to think at scale, bringing a focus on continuous delivery methodologies from design through deployment and operations.
  • Experience building and managing systems with tools including Kubernetes, Chef/Ansible/Puppet, Kafka, Docker, and Terraform.
Required Skills
  • 5+ years experience in a Software and/or Site Reliability Engineering role
  • Experience writing automation code in GoLang, Python or Java
  • Experience developing and operating large scale distributed systems with Kubernetes and Docker
  • Experience in running real-time, low-latency, highly available applications (Kafka, gRPC, RTP)
  • Experience running public cloud environments on AWS
  • Experience running hybrid clouds and on-prem infrastructures on Red Hat Enterprise Linux / CentOS
  • Bachelor's degree in Engineering, Computer Science or equivalent experience
  • The ability to lead, partner, and collaborate cross functionally across an engineering organization
Coredgeio
Posted by Abhimanyu Bhatter
Remote, Noida, Bengaluru (Bangalore), NCR (Delhi | Gurgaon | Noida)
6 - 11 yrs
₹16L - ₹25L / yr
Reliability engineering
Docker
Kubernetes
DevOps
Site reliability
+6 more
What are we looking for:
● Research, propose and evaluate with a 5-year vision, the architecture, design, technologies, processes and profiles related to Telco Cloud.
● Participate in the creation of a realistic technical-strategic roadmap of the network to transform it to Telco Cloud and be prepared for 5G.
● Using your deep technical expertise, you will provide detailed feedback to Product Management and Engineering, as well as contribute directly to the platform code base to enhance both the Customer experience of the service, as well as the SRE quality of life.
● The individual must be aware of trends in network infrastructure as well as within the network engineering and OSS community. What technologies are being developed or launched?
● The individual should stay current with infrastructure trends in the telco network cloud domain.
● Be responsible for the Engineering of Lab and Production Telco Cloud environments, including patches, upgrades, and reliability and performance improvements.
Required Minimum Qualifications: (Education and Technical Skills/Knowledge)
● Software Engineering degree, MS in Computer Science or equivalent experience
● Years of experience in an SRE, DevOps, development, and/or support-related role
● 0-5 years of professional experience for a junior position
● At least 8 years of professional experience for a senior position
● Unix server administration and tuning: Linux / RedHat / CentOS / Ubuntu
● Deep knowledge of networking layers 1-4
● Cloud / Virtualization (at least two): Helm, Docker, Kubernetes, AWS, Azure, Google Cloud, OpenStack, OpenShift, VMware vSphere / Tanzu
● In-depth knowledge of cloud storage solutions on top of AWS, GCP, Azure and/or on-prem private cloud, such as Ceph, CephFS, GlusterFS
● DevOps: Jenkins, Git, Azure DevOps, Ansible, Terraform
● Backend knowledge: Bash, Python, Go (knowledge of other scripting languages is a plus)
● PaaS-level solutions such as Keycloak for IAM, Prometheus, Grafana, ELK, DBaaS (such as MySQL, Cassandra)
About the Organisation:
The team at Coredge.io is a combination of experienced and young professionals alike, with many years of experience in Edge computing, Telecom application development and Kubernetes. The company has continuously collaborated with the open source community, universities and major industry players in furthering its goal of providing the industry with an indispensable tool to offer improved services to its customers. Coredge.io has a global market presence with offices in the US and New Delhi, India.
Dremio
Posted by Kiran B
Hyderabad
6 - 12 yrs
₹20L - ₹40L / yr
Reliability engineering
Site reliability
DevOps
Python
CI/CD
+5 more

About the Role

Dremio’s SREs ensure that our internal and externally visible services have reliability and uptime appropriate to users' needs and a fast rate of improvement. You will be joining a newly formed team that will spearhead our efforts to launch a cloud service. This is an opportunity to join a very fast growth startup and help build a cloud service from the ground up.

Responsibilities and Ownership

  • Debug and optimize code and automate routine tasks.
  • Evangelize and advocate for reliability practices across our organization.
  • Collaborate with other Engineering teams to support services before they go live through activities such as system design consulting, developing software platforms and frameworks, monitoring/alerting, capacity planning and launch reviews.
  • Analyze and optimize our core product by developing and implementing reliability and performance practices.
  • Scale systems sustainably through automation and evolve systems by pushing for changes that improve reliability and velocity.
  • Be on-call for services that the SRE team owns.
  • Practice sustainable incident response and blameless postmortems.

Qualifications

  • 6+ years of relevant experience in the following areas: SRE, DevOps, Cloud Operations, Systems Engineering, or Software Engineering.
  • Excellent command of cloud services on AWS/GCP/Azure, Kubernetes and CI/CD pipelines.
  • You have moderate to advanced experience in Java, C, C++, Python, Go or other object-oriented programming languages.
  • You are interested in designing, analyzing and troubleshooting large-scale distributed systems.
  • You have a systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
  • You have a great ability to debug and optimize code and automate routine tasks.
  • You have a solid background in software development and architecting resilient and reliable applications.
Shuttl
Posted by Tanika Monga
NCR (Delhi | Gurgaon | Noida)
3 - 6 yrs
₹10L - ₹21L / yr
Terraform
Kubernetes
Ansible
WHAT WILL I DO?

You will work as a Site Reliability Engineer responsible for the availability, performance, monitoring, and incident response, among other things, of the platforms and services used and owned by Shuttl. The SRE Team works alongside the Engineering team and owns every aspect of service availability as well as disaster recovery and business continuity plans. You will work with other Site Reliability Engineers and report to the Lead of Site Reliability Engineering Team.

HOW DO WE WORK?

Our engineering process is a five step process which consists of phases for planning, developing, testing & profiling, releasing and monitoring. The planning phase consists of documenting of the feature/task to be done followed by various discussions. These discussions cover product, delivery estimates, release plan, monitoring plan, test plans, architecture, code design, technology choices and best practice adoption. The development and testing phase coexist and involve writing code, unit tests, performance tests, profiling, stress testing, code reviews and QA testing. This phase is punctuated with daily scrums and standups. The release phase is largely about managing and communicating the release to customers and internal stakeholders and activating features. The last phase is the monitoring phase where relevant metrics and exceptions are tracked and any critical refinement for the delivered feature is undertaken. This phase culminates with a retrospective.

SREs get involved in this process as early as possible to provide general guidance, recommendations and help with designing the application to be in compliance with community standards such as CNCF and 12 Factor. SRE involvement and influence tends to increase during mid to final stages of development where the application is primed for beta evaluation and all the tooling and instrumentation is finalized.

WHAT SKILLS SHOULD I HAVE?

For this role we expect you to have 3+ years of experience working as a DevOps Engineer or SRE. You should have a good grasp of Unix like systems, access control, networking nuances, process isolation by the means of kernel provided features, distributed applications and algorithms, job schedulers and secret management among other things. At Shuttl we are a big proponent of Immutable infrastructure. All our infrastructure is hosted with Amazon Web Services and we use Hashicorp's Terraform to manage the infrastructure as code. A good handle on AWS and Terraform is therefore a definitive plus. Since SREs are expected to write a lot of code, you are also expected to be skillful in a programming language, preferably Python or Go.