Senior/Lead Site Reliability Engineer

at OJAS

Hyderabad
5 - 11 yrs
₹10L - ₹20L / yr (ESOP available)
Full time
Skills
site reliability
cloudformation
Terraform
Ansible
Cloud Automation
Software Development
AWS CloudFormation
Algorithms
Data Structures
Python
Powershell
DynamoDB
MySQL
  • 5+ years of software development or site reliability engineering or equivalent experience
  • Skilled at problem solving, algorithms, and data structures
  • Building tools and scripting frameworks from scratch
  • Working with Cloud Automation tools like CloudFormation, Terraform, CDK, aws-cli
  • Scripting languages like Python, Groovy, PowerShell, Bash, Perl etc.
  • Configuration automation using Ansible or equivalent tools
  • Exposure to Windows and Linux administration
  • Project management tools like Jira, Trello
  • Prior experience in dealing with Datastore technologies like Postgres, MySQL, SQL, DynamoDB is desirable
  • Familiarity with basic networking, security and cloud engineering concepts
  • Team player who is eager to help others to succeed through mentoring and leading by example
  • Highly collaborative with effective written and verbal communication skills

Similar jobs

Site Reliability Engineer

at Vonage (A Ericsson Company)

Agency job
via AVI Consulting LLP
Terraform
Chef
Ansible
Docker
Kubernetes
CI/CD
KMS
HashiCorp Vault
Grafana
ELK
Datadog
Remote only
4 - 12 yrs
₹15L - ₹25L / yr
www.vonage.com

Site Reliability Engineer (SRE)
Vonage Engineering Mission: Vonage is the emerging leader in the $100B+ cloud communications platform (CPaaS) market.

Customers like Airbnb, Viber, Whatsapp, Snapchat, and many others depend on our APIs and SDKs to connect with their customers all over the world. As businesses continue to shift to a real-time, customer-centric communications model, we are experiencing a time of impressive growth.

Why this role matters:
Vonage, a leader in cloud communications, is looking to build a new SRE team in Bangalore.

We believe that there shouldn’t be walls between operations and development and we have embraced the DevOps movement.

As a Site Reliability Engineer, you will work as part of the development team to build automation and tools to deploy, monitor, and maintain the platform's health against its targeted SLOs and SLAs.

What you'll do
● Lead the effort in ensuring reliability of the platform.
● Create software and tooling that improves the performance, stability, and reliability of the platform.
● Work as part of a development team.
● Monitor Application Metrics to help with improving software performance.
● Build solutions that are highly resilient, scalable, and secure.
● Have a wide breadth of knowledge across software, infrastructure, and security.
● Adopt best practices and champion an engineering culture emphasizing Agile.
What's required for application
● Proven experience building, supporting, and architecting high-availability cloud infrastructure.
● Experience working on monitoring, logging, and alerting solutions and the tools they use.
● Experience with tooling such as Terraform, Ansible, Docker, Kubernetes, and Chef.
● Fluent and comfortable working with Cloud Infrastructure.
● Ability to read, write, and troubleshoot software code.
● Good understanding of CI/CD tools.
● Champion of DevSecOps using tools such as HashiCorp Vault, KMS, Secrets Manager.
● Experience with software development, algorithms, data structures, and systems design.
● Understand monitoring tools such as DataDog, ELK, and Grafana.
● Bachelor's degree (or higher) in Computer Science and/or related work experience.


Nice to have, but not required
● Working knowledge of other AWS services like Glacier, Elastic Container Service (ECS), Elastic MapReduce (EMR), DynamoDB, etc.
● Automation and Orchestration tools such as Jenkins
● Ruby or Java development skills
● Data Pipeline knowledge, especially with tools like MapReduce, Kafka and ELK stack
Job posted by
Ashesh Shah

DevOps Engineer

at www.sourcewiz.co

Founded 2020  •  Product  •  0-20 employees  •  Raised funding
Docker
Terraform
Amazon Web Services (AWS)
DevOps
Bengaluru (Bangalore)
1 - 5 yrs
₹5L - ₹20L / yr
At Sourcewiz, we are building tools to help exporters grow their businesses. Our first product is a vertical sales software built for exporters, which allows them to market their unique creations to more buyers, generate more inquiries and increase their sales conversion.

Sourcewiz was founded by a passionate team of serial entrepreneurs and alumni of IIT Delhi, U.C. Berkeley, and well-known tech companies such as Uber and Zomato.

Sourcewiz is on a mission to increase India’s export GDP. This is a unique opportunity to join a funded early-stage startup and have a massive impact on our product, culture, and direction. It's a lot of work and a roller coaster ride. But if you are up for it, you can join us in replacing the tiresome and slow sales process for importers and exporters and have a significant impact on our customers. We are not a company that believes engineers should be hidden away from decisions, churning out code for features decided from on high. Instead, our engineers form strong bonds with cross-functional peers in Product Management, Product Design, and other teams to become experts in their product domain.

We’re looking for people who have a strong interest in building successful products or systems, are comfortable dealing with lots of moving pieces, have exquisite attention to detail, and are comfortable learning new technologies and systems.

As a Site Reliability Engineer at Sourcewiz, you will...
• Own and improve the scalability and reliability of our products
• Work directly with the product engineering team
• Work with RDBMS, search, caching, and queuing
• Contribute expertise towards architectural planning and ensure the company builds sustainable services that meet our customers' expectations while leveraging appropriate tools and frameworks.
• Ongoing participation in the review and testing
Job posted by
Saakshi Bhartiya

Senior Engineer - Cloud Reliability

at Searce Inc

Founded 2004  •  Products & Services  •  100-1000 employees  •  Profitable
DevOps
Terraform
Ansible
Puppet
Reliability engineering
Docker
Software deployment
Application server
IT infrastructure
Technical support
Amazon Web Services (AWS)
Google Cloud Platform (GCP)
Pune
5 - 8 yrs
₹10L - ₹17L / yr
Experience:
● 4-8 years of experience in Cloud Infrastructure and Operations domains
● Experience with Linux systems and/or Windows servers
● Specialize in one or two cloud deployment platforms: AWS, GCP, Azure
● Hands-on experience with AWS and GCP services (EKS, ECS, EC2, VPC, RDS, Lambda, GKE, Compute Engine)
● Experience with one or more programming languages (Python, JavaScript, Ruby, Java, .NET)
● Good understanding of Apache Web Server, Nginx, MySQL, MongoDB, Nagios
● Logging and Monitoring tools (ELK, Stackdriver, CloudWatch)
● DevOps Technologies
● Knowledge of Configuration Management tools such as Ansible, Terraform, Puppet, Chef
● Experience working with deployment and orchestration technologies (such as Docker, Kubernetes, Mesos)
Job posted by
Reena Bandekar

Site Reliability Engineers

at Sarvaha Systems Private Limited

Founded 2011  •  Products & Services  •  20-100 employees  •  Profitable
Google Cloud Platform (GCP)
Amazon Web Services (AWS)
Microsoft Windows Azure
DevOps
Python
Kubernetes
Jenkins
Cassandra
Terraform
Windows Azure
Java
ELK
SRE
Grafana
Remote only
3 - 5 yrs
₹12L - ₹20L / yr

JD: Site Reliability Engineers

Location: Pune, Remote

Sarvaha would like to welcome experienced SRE specialists with a minimum of 5 years of professional experience in Google Cloud Platform or AWS based deployments and automation. Sarvaha is a niche software development company that works with some of the best-funded startups and established companies across the globe. You will be expected to work with a globally distributed team and contribute independently as well as lead a team of engineers. This is a hands-on position that requires you to be responsible for production software deployments across global availability zones.

 

Key Responsibilities

 

  • Design, write and run services that provide visibility into a leading IoT platform & underlying services
  • Automate deployments, diagnostic and debugging tools
  • Participate in on-call rotations
  • Adhere to industry-standard security best practices  
  • Work with other teams in troubleshooting and keeping the systems up and running

 

Skills Required

 

  • Minimum Bachelor’s Degree in Computer Science or related degree
  • Minimum 5+ years of total experience with at least 4 years of experience in an SRE, DevOps or similar role. More experience is highly desired
  • 4+ years of hands-on experience with one of AWS/Azure/GCP is a must-have for this position
  • 1+ years of experience debugging code written in Python, Java or any strongly typed language
  • 3+ years of experience with Kubernetes, Prometheus, ELK, Grafana, Nagios
  • 2+ years of experience with Jenkins or similar build and deploy orchestration tool
  • 2+ years of experience with RDBMS and NoSQL databases (MySQL, Oracle, Cassandra, CDH)
  • 1+ years of experience writing infrastructure as code using Terraform
  • Excellent verbal and written communication and strong interpersonal skills are requisite for success in this position
  • Strong listening skills and attention to detail are highly desired

 

Position Benefits

 

  • Top-notch remuneration with non-linear growth
  • Work with industry-best cloud architects, DevOps teams and developers
  • Excellent, no-nonsense work environment with the very best people to work with
  • Cutting edge work with Fortune 500 businesses and learn from high-visibility systems that drive public facing, high-traffic systems
Job posted by
Santosh Maskar
DevOps
Terraform
Ansible
CI/CD
Linux administration
Kubernetes
Amazon Web Services (AWS)
Puppet
Chef
Python
Java
Go Programming (Golang)
Bengaluru (Bangalore)
6 - 11 yrs
₹20L - ₹38L / yr

 

Roles and Responsibilities

  • Managing Availability, Performance, Capacity of infrastructure and applications.
  • Building and implementing observability for applications health/performance/capacity.
  • Optimizing On-call rotations and processes.
  • Documenting “tribal” knowledge.
  • Managing infrastructure platforms such as Mesos/Kubernetes, CI/CD, observability (Prometheus/New Relic/ELK), cloud platforms (AWS/Azure), databases, and data platform infrastructure.
  • Providing help in onboarding new services with production readiness review process.
  • Providing reports on services SLO/Error Budgets/Alerts and Operational Overhead.
  • Working with Dev and Product teams to define SLO/Error Budgets/Alerts.
  • Working with the Dev team to gain an in-depth understanding of the application architecture and its bottlenecks.
  • Identifying observability gaps in product services and infrastructure, and working with stakeholders to fix them.
  • Managing outages, doing detailed RCA with developers, and identifying ways to avoid a recurrence.
  • Managing/automating upgrades of the infrastructure services.
  • Automating toil work.

Experience & Skills

  • 6+ years of total experience
  • Experience as an SRE/DevOps/Infrastructure Engineer on large scale microservices and infrastructure.
  • A collaborative spirit with the ability to work across disciplines to influence, learn, and deliver.

  • A deep understanding of computer science, software development, and networking principles.
  • Demonstrated experience with languages, such as Python, Java, Golang etc.
  • Extensive experience with Linux administration and a good understanding of the various Linux kernel subsystems (memory, storage, network, etc.).
  • Extensive experience in DNS, TCP/IP, UDP, gRPC, routing and load balancing.
  • Expertise in GitOps, Infrastructure as Code tools such as Terraform, and configuration management tools such as Chef, Puppet, SaltStack, Ansible.
  • Expertise in Amazon Web Services (AWS) and/or other relevant cloud infrastructure solutions like Microsoft Azure or Google Cloud.
  • Experience in building CI/CD solutions with tools such as Jenkins, GitLab, Spinnaker, Argo, etc.
  • Experience in managing and deploying containerized environments using Docker, Mesos/Kubernetes is a plus.

Job posted by
RAKESH RANJAN

Site Reliability Engineer

at A startup company providing AI based software platforms

Agency job
via zyoin
Site Reliability
DevOps
Docker
Kubernetes
Python
Amazon Web Services (AWS)
Reliability engineering
Remote, Bengaluru (Bangalore)
3 - 7 yrs
₹10L - ₹30L / yr

Who You Are

  • Creative thinker and strong problem solver with meticulous attention to detail
  • Highly organized, creative, motivated, and passionate about achieving results
  • Able to balance multiple tasks and projects effectively and quickly adapt to new situations and technologies
  • Able to work both independently and as part of a team
  • Systematic problem-solver, coupled with a strong sense of ownership and drive

 

What you need

  • 3-7 years of experience as a Site Reliability Engineer or a mix of a software engineer and DevOps.
  • Strong hands-on knowledge of Linux fundamentals, System administration scripting, performance tuning/scalability, troubleshooting.
  • Write great quality code using SOLID principles including unit and integration tests.
  • Hands-on development experience in an object-oriented programming language like Python.
  • Hands-on experience developing task automations
  • Experience using tools to create and manage CI (continuous integration) and CD (continuous delivery) pipelines.
  • Familiarity with software development tools: source code management (SCM systems), code review systems, issue tracking tools, build tools, test frameworks, code quality tools.
  • Experience implementing open-source observability and alerting tools, like Prometheus, Grafana, Cortex, Thanos, Alertmanager, etc.
  • Decent knowledge of networking (VPC, VNet, DNS, etc.) and of the TCP/IP stack, internet routing and load balancing.
  • Experience working with log and configuration management tools
  • Prior experience of working with AWS, Azure, GCP is a plus
  • Prior experience of working with Kubernetes, Docker and containers is a plus
  • Strong interpersonal communication skills (including listening, speaking, and writing) and ability to work well in a diverse, team-focused environment with other SREs, Engineers, Product Managers, etc.
  • Documenting your work should be in your DNA

 

What you get

  • A chance to develop and build something (probably from scratch) which you can be proud of
  • Build and Implement modern systems observability solutions including monitoring, alerting, metrics, logging, and APM & distributed tracing.
  • Scale systems sustainably through automation and evolve systems by pushing for changes that improve reliability and velocity.
  • Maintain business continuity by identifying and driving opportunities to make systems highly resilient and human-free.
  • Closely work with the software engineering team to ensure accurate monitoring and metrics are being built into applications before going to production.
  • Develop and maintain software modules for use and re-use in cloud and on-premise systems automation.
  • Identify process gaps and implement process improvements to increase operational reliability
  • Drive standardization efforts across the services, infrastructure, systems, and practices
  • Develop systems and tools to help the development team uphold reliability principles
Job posted by
RAKESH RANJAN

Staff Engineer - SRE

at Cloud & Security Firm

Agency job
via HyringNinja
Kubernetes
Ansible
site reliability engineer
SRE
DevOps
Linux/Unix
Python
Go Programming (Golang)
Bengaluru (Bangalore)
5 - 9 yrs
₹20L - ₹60L / yr

Preferred Technical Skills:

  • 7+ years experience with troubleshooting Unix/Linux
  • Understanding of Networking concepts - TCP/IP, SSL/TLS, IPSec, GRE, VPN
  • Experience with algorithms, data structures, complexity analysis, and software design
  • Experience in one or more of the following: C, C++, Python, Go
  • Experience in managing a large-scale web operations role
  • Bonus points for experience with Ansible, Kubernetes, SQL and NoSQL datastores, CI/CD
  • Hands-on working with private or public cloud services in a highly available and scalable production environment. 

Desired Technical Skills:

  • Knowledge of distributed systems is a big plus.

 Additional Skills

  • Great written and verbal communication
  • Ability to work for a geo-distributed cross-functional group
  • Demonstrated ability to own and deliver projects independently
  • Demonstrated ability of technical mentoring and coaching 
  • Strong interpersonal communication skills (including listening, speaking, and writing) and the ability to work well in a diverse, team-focused environment with other SREs, developers, Product Managers, etc
Job posted by
Thomas G

Site Reliability Engineer

at Dremio

Founded 2015  •  Product  •  100-500 employees  •  Raised funding
Reliability engineering
Site reliability
DevOps
Python
CI/CD
Amazon Web Services (AWS)
Ansible
Kubernetes
Google Cloud Platform (GCP)
Windows Azure
Hyderabad
6 - 12 yrs
₹20L - ₹40L / yr

About the Role

Dremio’s SREs ensure that our internal and externally visible services have reliability and uptime appropriate to users' needs and a fast rate of improvement. You will be joining a newly formed team that will spearhead our efforts to launch a cloud service. This is an opportunity to join a very fast growth startup and help build a cloud service from the ground up.

Responsibilities and Ownership

  • Ability to debug and optimize code and automate routine tasks.
  • Evangelize and advocate for reliability practices across our organization.
  • Collaborate with other Engineering teams to support services before they go live through activities such as system design consulting, developing software platforms and frameworks, monitoring/alerting, capacity planning and launch reviews.
  • Analyze and optimize our core product by developing and implementing reliability and performance practices.
  • Scale systems sustainably through automation and evolve systems by pushing for changes that improve reliability and velocity.
  • Be on-call for services that the SRE team owns.
  • Practice sustainable incident response and blameless postmortems.

Qualifications

  • 6+ years of relevant experience in the following areas: SRE, DevOps, Cloud Operations, Systems Engineering, or Software Engineering.
  • Excellent command of cloud services on AWS/GCP/Azure, Kubernetes and CI/CD pipelines.
  • You have moderate-to-advanced experience in Java, C, C++, Python, Go or other programming languages.
  • You are interested in designing, analyzing and troubleshooting large-scale distributed systems.
  • You have a systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
  • You have a great ability to debug and optimize code and automate routine tasks.
  • You have a solid background in software development and architecting resilient and reliable applications.
Job posted by
Kiran B
citrix
TCP/IP
Communication Skills
Powershell
OSI model
Citrix
MicrosoftSCM
Remote, Hyderabad
5 - 15 yrs
₹5L - ₹15L / yr
  • Ability to clearly articulate and demonstrate the value proposition. 
  • Provides technical support, configuration, and administration of Citrix environment 
  • Develops processes, procedures, and technical documentation for management of Citrix environment 
  • System and application health monitoring and reporting 
  • Consulting with application owners to install and tune their applications 
  • Provides technical support, configuration, and administration of Windows Server as required 
  • Assists with architectural design to improve reliability, performance, and efficiencies 
  • Responds to situations where standard procedures have failed to isolate or fix the problem.
  • Provide support for a 24 x 7 operation, when required 
  • Establishes strong working relationships with key staff members in departments. 
  • Minimum 5+ years’ experience as a Wintel Engineer/Architect with an outstanding track record of progressively increasing responsibilities 
  • Strong ability to communicate effectively and multi-task with minimal supervision 
  • Current certifications with Cisco, Citrix & Microsoft required 
  • Strong understanding of TCP/IP and OSI model 
  • Experience in Citrix XenApp, XenDesktop and XenServer technologies. 
  • 5+ years of Network, Storage and Infrastructure (Data Center, etc.) experience. 
  • A strong understanding of leading manufacturers' routing and switching architecture, and experience with storage and virtualization products from leading manufacturers.
Job posted by
Abhijit Choudhary

Site Reliability Engineer

at Shuttl

Founded 2015  •  Product  •  100-500 employees  •  Raised funding
Terraform
Kubernetes
Ansible
NCR (Delhi | Gurgaon | Noida)
3 - 6 yrs
₹10L - ₹21L / yr
WHAT WILL I DO?

You will work as a Site Reliability Engineer responsible for the availability, performance, monitoring, and incident response, among other things, of the platforms and services used and owned by Shuttl. The SRE Team works alongside the Engineering team and owns every aspect of service availability as well as disaster recovery and business continuity plans. You will work with other Site Reliability Engineers and report to the Lead of Site Reliability Engineering Team.

HOW DO WE WORK?

Our engineering process is a five step process which consists of phases for planning, developing, testing & profiling, releasing and monitoring. The planning phase consists of documenting of the feature/task to be done followed by various discussions. These discussions cover product, delivery estimates, release plan, monitoring plan, test plans, architecture, code design, technology choices and best practice adoption. The development and testing phase coexist and involve writing code, unit tests, performance tests, profiling, stress testing, code reviews and QA testing. This phase is punctuated with daily scrums and standups. The release phase is largely about managing and communicating the release to customers and internal stakeholders and activating features. The last phase is the monitoring phase where relevant metrics and exceptions are tracked and any critical refinement for the delivered feature is undertaken. This phase culminates with a retrospective.

SREs get involved in this process as early as possible to provide general guidance, recommendations and help with designing the application to be in compliance with community standards such as CNCF and 12 Factor. SRE involvement and influence tends to increase during mid to final stages of development where the application is primed for beta evaluation and all the tooling and instrumentation is finalized.

WHAT SKILLS SHOULD I HAVE?

For this role we expect you to have 3+ years of experience working as a DevOps Engineer or SRE. You should have a good grasp of Unix like systems, access control, networking nuances, process isolation by the means of kernel provided features, distributed applications and algorithms, job schedulers and secret management among other things. At Shuttl we are a big proponent of Immutable infrastructure. All our infrastructure is hosted with Amazon Web Services and we use Hashicorp's Terraform to manage the infrastructure as code. A good handle on AWS and Terraform is therefore a definitive plus. Since SREs are expected to write a lot of code, you are also expected to be skillful in a programming language, preferably Python or Go.
Job posted by
Tanika Monga