
DevOps Engineer

Agency job
via zyoin
6 - 11 yrs
₹20L - ₹38L / yr
Bengaluru (Bangalore)
Skills
DevOps
Terraform
Ansible
CI/CD
Linux administration
Kubernetes
Amazon Web Services (AWS)
Puppet
Chef
Python
Java
Go Programming (Golang)

 

Roles and Responsibilities

  • Managing Availability, Performance, Capacity of infrastructure and applications.
  • Building and implementing observability for applications health/performance/capacity.
  • Optimizing On-call rotations and processes.
  • Documenting “tribal” knowledge.
  • Managing infra platforms such as Mesos/Kubernetes, CI/CD, observability (Prometheus/New Relic/ELK), cloud platforms (AWS/Azure), databases, and data platform infrastructure.
  • Providing help in onboarding new services with production readiness review process.
  • Providing reports on services SLO/Error Budgets/Alerts and Operational Overhead.
  • Working with Dev and Product teams to define SLO/Error Budgets/Alerts.
  • Working with the Dev team to gain an in-depth understanding of the application architecture and its bottlenecks.
  • Identifying observability gaps in product services and infrastructure, and working with stakeholders to fix them.
  • Managing outages, doing detailed RCA with developers, and identifying ways to avoid a recurrence.

  • Managing/Automating upgrades of the infrastructure services.
  • Automating toil work.

Experience & Skills

  • 6+ years of total experience
  • Experience as an SRE/DevOps/Infrastructure Engineer on large scale microservices and infrastructure.
  • A collaborative spirit with the ability to work across disciplines to influence, learn, and deliver.

  • A deep understanding of computer science, software development, and networking principles.
  • Demonstrated experience with languages, such as Python, Java, Golang etc.
  • Extensive experience with Linux administration and a good understanding of the various Linux kernel subsystems (memory, storage, network, etc.).

  • Extensive experience in DNS, TCP/IP, UDP, gRPC, routing, and load balancing.
  • Expertise in GitOps and Infrastructure as Code tools such as Terraform, and in configuration management tools such as Chef, Puppet, SaltStack, and Ansible.
  • Expertise in Amazon Web Services (AWS) and/or other relevant cloud infrastructure solutions such as Microsoft Azure or Google Cloud.

  • Experience in building CI/CD solutions with tools such as Jenkins, GitLab, Spinnaker, Argo, etc.
  • Experience in managing and deploying containerized environments using Docker, Mesos/Kubernetes is a plus.

Users love Cutshort
Read about what our users have to say about finding their next opportunity on Cutshort.

Subodh Popalwar

Software Engineer, Memorres
For 2 years, I had trouble finding a company with good work culture and a role that will help me grow in my career. Soon after I started using Cutshort, I had access to information about the work culture, compensation and what each company was clearly offering.

About Olacabs.com

Founded: 2010
About
Ola is India’s largest mobility platform and one of the world’s largest ride-hailing companies, serving 250+ cities across India, Australia, New Zealand, and the UK. The Ola app offers mobility solutions by connecting customers to drivers and a wide range of vehicles across bikes, auto-rickshaws, metered taxis, and cabs, enabling convenience and transparency for hundreds of millions of consumers and over 1.5 million driver-partners. Ola’s core mobility offering in India is supplemented by its electric-vehicle arm, Ola Electric; India’s largest fleet management business, Ola Fleet Technologies; and Ola Skilling, which aims to enable millions of livelihood opportunities for India's youth. With its acquisition of Ridlr, India’s leading public transportation app, and its investment in Vogo, a dockless scooter-sharing solution, Ola is looking to build mobility for the next billion Indians. Ola also extends its consumer offerings, such as micro-insurance and credit-led payments, through Ola Financial Services, and offers a range of owned food brands through India’s largest network of kitchens under its Food business.
Connect with the team
Athul ps
Shuhaib I
Roshni Pillai
Supriya Singh
Shivani Kukreja
Pradeep Kumaar

Similar jobs

Mumbai
0 - 4 yrs
₹1L - ₹13L / yr
DevOps
SRE
Reliability engineering
Site Reliability Engineer
Python
Job Responsibilities:
• Run the production environment by monitoring availability and taking a holistic view of system health
• Build software and systems to manage platform infrastructure and applications
• Improve reliability, quality, and time-to-market of our suite of software solutions
• Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
• Provide primary operational support and engineering for multiple large distributed software applications
• Drive cross-team alignment across development teams around reliability initiatives

The ideal candidate must have:
• A bachelor’s degree in computer science or another highly technical, scientific discipline
• Ability to program (structured and OO) in one or more high-level languages, such as Python, Java, C/C++, Ruby, and JavaScript
• Good experience with microservices architecture and serverless technologies
• Exposure to event-driven architecture and state machines
• A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
Nvizion Solutions
Posted by Anshita Abhilasha
Remote only
3 - 6 yrs
₹6L - ₹15L / yr
DevOps
Google Cloud Platform (GCP)
Amazon Web Services (AWS)
Linux/Unix
JIRA

Nvizion Solutions is hiring for the position of Site Reliability Engineer.

 

If interested, kindly share your resume along with contact details.

 

 

Title: Site Reliability Engineer

No. of job openings: 2

Location: Gurgaon/Hyderabad/Bengaluru/Mumbai/Chennai (remote)

Remuneration: Best in the industry

 

 

  • Experience required: 2 to 4 years in the industry
  • Ensuring overall system reliability
  • Adding automation and alerting to the system
  • Providing troubleshooting support
  • Cross-team communication: working closely with the Product and Customer Success teams
  • Proactive support to ensure the system returns to a healthy state
  • R&D on new tools/technologies to support the product and support teams
  • Good verbal/written communication to connect with the client
  • Good team player with a zeal to learn new technologies
  • The candidate will be part of the team responsible for 24x7 monitoring of a distributed global platform

  • Linux Scripting
  • CI/CD knowledge (Jenkins/Bitbucket Pipelines/GitOps)
  • Version Control
  • Cloud platform knowledge (GCP/AWS/Azure/Digital Ocean)
  • Docker, Kubernetes

 

Pune
4 - 8 yrs
₹15L - ₹15L / yr
Amazon Web Services (AWS)
Kubernetes
Ansible
Prometheus
Grafana

Position: Site Reliability Engineer

Location: Pune (Currently WFH, post pandemic you need to relocate)

 

About the Organization:

A funded product development company, headquartered in Singapore with offices in Australia, the United States, Germany, the United Kingdom, and India. You will gain work experience in a global environment.

 

Job Description:

We are looking for an experienced DevOps / Site Reliability engineer to join our team and be instrumental in taking our products to the next level.

 

In this role, you will be working on bleeding-edge hybrid cloud / on-premise infrastructure handling billions of events and terabytes of data a day.

 

You will be responsible for working closely with various engineering teams to design, build and maintain a globally distributed infrastructure footprint.

As part of the role, you will be responsible for researching new technologies, managing a large fleet of active services and their underlying servers, automating the deployment, monitoring and scaling of components, and optimizing the infrastructure for cost and performance.

 

Day-to-day responsibilities

 

  • Ensure the operational integrity of the global infrastructure
  • Design repeatable continuous integration and delivery systems
  • Test and measure new methods, applications and frameworks
  • Analyze and leverage various AWS-native functionality
  • Support and build out an on-premise data center footprint
  • Provide support to other teams and diagnose issues related to our infrastructure
  • Participate in 24/7 on-call rotation (If Required)

 

Candidate's Profile:

 

 

  • Expert-level administrator of Linux-based systems
  • Experience managing distributed data platforms (Kafka, Spark, Cassandra, etc.); Aerospike experience is a plus.
  • Experience with production deployments of Kubernetes Cluster
  • Experience in automating provisioning and managing Hybrid-Cloud infrastructure (AWS, GCP and On-Prem) at scale.
  • Knowledge of monitoring platform (Prometheus, Grafana, Graphite).
  • Experience in Distributed storage systems such as Ceph or GlusterFS.
  • Experience in virtualisation with KVM, oVirt and OpenStack.
  • Hands-on experience with infrastructure-as-code and configuration management tools such as Terraform and Ansible
  • Bash and Python Scripting Expertise
  • Network troubleshooting experience (TCP, DNS, IPv6 and tcpdump)
  • Experience with continuous delivery systems (Jenkins, GitLab, Bitbucket, Docker)
  • Experience managing hundreds to thousands of servers globally
  • Enjoy automating tasks, rather than repeating them
  • Capable of estimating costs of various approaches, and finding simple and inexpensive solutions to complex problems
  • Strong verbal and written communication skills
  • Ability to adapt to a rapidly changing environment
  • Comfortable collaborating and supporting a diverse team of engineers
  • Ability to troubleshoot problems in complex systems
  • Flexible working hours and ability to participate in 24/7 on call support with other team members whenever required.
Looking for people from product organizations who can join at the earliest.
Remote only
3 - 10 yrs
₹5L - ₹15L / yr
Python
Amazon Web Services (AWS)
MongoDB
MySQL
Django

A network of the world's best developers - full-time, long-term remote software jobs with better compensation and career growth.  We enable our clients to accelerate their Cloud Offering and Capitalize on Cloud.  We have our own IoT/AI platform and we provide professional services on that platform to build custom clouds for their IoT devices.  We also build mobile apps, run 24x7 DevOps/site reliability engineering for our clients.

We are looking for a friendly, very hands-on technical, and dependable professional with plenty of experience as a backend & cloud engineer to provide site reliability services to our internal teams and end customers. We expect you to deliver with TOP quality & high speed. You must have experience developing and designing amazing UI screens.

 

This person MUST have:

  • BE Computer Science or equivalent
  • Cloud app development experience.
  • Strong Troubleshooting and debugging skills
  • A strong passion for writing simple, clean, and efficient code.
  • 3 years of experience with the Django framework and other backend technologies.
  • Knowledge of NodeJS
  • Experience with building, modifying, and extending API endpoints (REST or GraphQL) for data retrieval and persistence.
  • Understand how to use a database like Postgres (preferred choice), SQLite, MongoDB, MySQL.
  • Experience creating high-performance applications.
  • Experience with messaging and broker tools such as RabbitMQ and MQTT
  • Experience with SQL and NoSQL databases
  • Experience with the full software development life cycle, including requirements collection, design, implementation, testing, and operational support.
  • Knowledge of web services
  • Proficient understanding of code versioning tools such as Git.
  • Hands-on experience deploying and managing infrastructure with CloudFormation/Terraform
  • Experience managing AWS infrastructure.
  • Hands-on experience in Linux environment.
  • Basic understanding of Kubernetes/Docker orchestration.
  • Manage existing infrastructure/pipelines/engineering tools (on-prem or AWS) for the engineering team (build servers, Jenkins nodes, etc.)
  • Experience with scrum or other agile software development methodology.
  • Excellent verbal and written communication, teamwork, decision making and influencing skills.
  • Handle customer calls/emails regarding technical issues for end-users.
  • Strong communication skills
  • Attention to detail.

 

 

Experience:

  • Minimum 3 years of experience

 

Location:

  • Ahmedabad office, or
  • Work from home



Timings:

  • 40 hours a week with a rotational shift every month.

Position:

  • Full time/Direct
  • We have great benefits such as PF, medical insurance, 12 annual company holidays, 12 PTO leaves per year, annual increments, Diwali bonus, spot bonuses and other incentives, etc.
  • We don't believe in locking people in with long notice periods. You will stay here because you love the company. We have only a 30-day notice period.
Remote, Bengaluru (Bangalore)
3 - 7 yrs
₹10L - ₹30L / yr
Site Reliability
DevOps
Docker
Kubernetes
Python

Who You Are

  • Creative thinker and strong problem solver with meticulous attention to detail
  • Highly organized, creative, motivated, and passionate about achieving results
  • Able to balance multiple tasks and projects effectively and quickly adapt to new situations and technologies
  • Able to work both independently and as part of a team
  • Systematic problem-solver, coupled with a strong sense of ownership and drive

 

What you need

  • 3-7 years of experience as a Site Reliability Engineer or a mix of a software engineer and DevOps.
  • Strong hands-on knowledge of Linux fundamentals, System administration scripting, performance tuning/scalability, troubleshooting.
  • Write great quality code using SOLID principles including unit and integration tests.
  • Hands-on development experience in an object-oriented programming language like Python.
  • Hands-on experience developing task automations
  • Experience using tools to create and manage CI (continuous integration) and CD (continuous delivery) pipelines.
  • Familiarity with software development tools: source code management (SCM systems), code review systems, issue tracking tools, build tools, test frameworks, code quality tools.
  • Experience implementing open-source observability and alerting tools, like Prometheus, Grafana, Cortex, Thanos, Alertmanager etc
  • Decent knowledge of networking (VPC, VNet, DNS, etc.) and of the TCP/IP stack, internet routing, and load balancing.
  • Experience with log and configuration management tools
  • Prior experience of working with AWS, Azure, GCP is a plus
  • Prior experience of working with Kubernetes, Docker and containers is a plus
  • Strong interpersonal communication skills (including listening, speaking, and writing) and ability to work well in a diverse, team-focused environment with other SREs, Engineers, Product Managers, etc.
  • Documenting your work should be in your DNA

 

What you get

  • A chance to develop and build something (probably from scratch) which you can be proud of
  • Build and Implement modern systems observability solutions including monitoring, alerting, metrics, logging, and APM & distributed tracing.
  • Scale systems sustainably through automation and evolve systems by pushing for changes that improve reliability and velocity.
  • Maintain business continuity by identifying and driving opportunities to make systems highly resilient and human-free.
  • Closely work with the software engineering team to ensure accurate monitoring and metrics are being built into applications before going to production.
  • Develop and maintain software modules for use and re-use in cloud and on-premise systems automation.
  • Identify process gaps and implement process improvements to increase operational reliability
  • Drive standardization efforts across the services, infrastructure, systems, and practices
  • Develop systems and tools to help the development team uphold reliability principles
Uniphore Software Systems
Posted by Sandesh HS
Bengaluru (Bangalore)
5 - 10 yrs
₹25L - ₹40L / yr
SRE
Site Reliability Engineer
Reliability engineering
DevOps
Kubernetes
Your Responsibilities
We are looking for a Senior SRE with a proven track record of success leading complex cloud-hybrid environments. You will have:
  • Strong sense of Being an Owner, Wearing the Customer Shoes, with the ability to Empower Others, demonstrated through clear communication and collaboration.
  • Skills to work independently with multiple global teams, developing, configuring, deploying, and operating our global infrastructure on AWS and on-prem.
  • Operational experience in complex distributed and real-time systems, including experience with SLOs/SLAs towards high availability, reliability and DR goals.
  • DevOps experience in building tools and frameworks, with an understanding of continuous deployment processes.
  • Ability to think at scale, bringing a focus on continuous delivery methodologies from design through deployment and operations.
  • Experience building and managing systems with tools including Kubernetes, Chef/Ansible/Puppet, Kafka, Docker, and Terraform.
Required Skill
  • 5+ years experience in a Software and/or Site Reliability Engineering role
  • Experience writing automation code in GoLang, Python or Java
  • Experience developing and operating large scale distributed systems with Kubernetes and Docker
  • Experience in running real-time, low-latency, highly available applications (Kafka, gRPC, RTP)
  • Experience running public cloud environments on AWS
  • Experience running hybrid clouds and on-prem infrastructures on Red Hat Enterprise Linux / CentOS
  • Bachelor's degree in Engineering, Computer Science or equivalent experience
  • The ability to lead, partner, and collaborate cross functionally across an engineering organization
Coredgeio
Posted by Abhimanyu Bhatter
Remote, Noida, Bengaluru (Bangalore), NCR (Delhi | Gurgaon | Noida)
6 - 11 yrs
₹16L - ₹25L / yr
Reliability engineering
Docker
Kubernetes
DevOps
Site reliability
What are we looking for:
● Research, propose and evaluate, with a 5-year vision, the architecture, design, technologies, processes and profiles related to Telco Cloud.
● Participate in the creation of a realistic technical-strategic roadmap of the network to transform it to Telco Cloud and be prepared for 5G.
● Using your deep technical expertise, you will provide detailed feedback to Product Management and Engineering, as well as contribute directly to the platform code base to enhance both the customer experience of the service and the SRE quality of life.
● The individual must be aware of trends in network infrastructure as well as within the network engineering and OSS community. What technologies are being developed or launched?
● The individual should stay current with infrastructure trends in the telco network cloud domain.
● Be responsible for the engineering of lab and production Telco Cloud environments, including patches, upgrades, and reliability and performance improvements.
Required Minimum Qualifications: (Education and Technical Skills/Knowledge)
● Software Engineering degree, MS in Computer Science or equivalent experience
● Years of experience in an SRE, DevOps, development and/or support-related role
● 0-5 years of professional experience for a junior position
● At least 8 years of professional experience for a senior position
● Unix server administration and tuning: Linux / RedHat / CentOS / Ubuntu
● Deep knowledge of networking layers 1-4
● Cloud / Virtualization (at least two): Helm, Docker, Kubernetes, AWS, Azure, Google Cloud, OpenStack, OpenShift, VMware vSphere / Tanzu
● In-depth knowledge of cloud storage solutions on top of AWS, GCP, Azure and/or on-prem private cloud, such as Ceph, CephFS, GlusterFS
● DevOps: Jenkins, Git, Azure DevOps, Ansible, Terraform
● Backend knowledge: Bash, Python, Go (knowledge of other scripting languages is a plus).
● PaaS-level solutions such as Keycloak for IAM, Prometheus, Grafana, ELK, DBaaS (such as MySQL, Cassandra)
About the Organisation:
The team at Coredge.io is a combination of experienced and young professionals alike, with many years of experience in edge computing, telecom application development and Kubernetes. The company has continuously collaborated with the open source community, universities and major industry players in furthering its goal of providing the industry with an indispensable tool to offer improved services to its customers. Coredge.io has a global market presence with offices in the US and New Delhi, India.
Dremio
Posted by Kiran B
Hyderabad
6 - 12 yrs
₹20L - ₹40L / yr
Reliability engineering
Site reliability
DevOps
Python
CI/CD

About the Role

Dremio’s SREs ensure that our internal and externally visible services have reliability and uptime appropriate to users' needs and a fast rate of improvement. You will be joining a newly formed team that will spearhead our efforts to launch a cloud service. This is an opportunity to join a very fast growth startup and help build a cloud service from the ground up.

Responsibilities and Ownership

  • Ability to debug and optimize code and automate routine tasks.
  • Evangelize and advocate for reliability practices across our organization.
  • Collaborate with other Engineering teams to support services before they go live through activities such as system design consulting, developing software platforms and frameworks, monitoring/alerting, capacity planning and launch reviews.
  • Analyze and optimize our core product by developing and implementing reliability and performance practices.
  • Scale systems sustainably through automation and evolve systems by pushing for changes that improve reliability and velocity.
  • Be on-call for services that the SRE team owns.
  • Practice sustainable incident response and blameless postmortems.

Qualifications

  • 6+ years of relevant experience in the following areas: SRE, DevOps, Cloud Operations, Systems Engineering, or Software Engineering.
  • Excellent command of cloud services on AWS/GCP/Azure, Kubernetes and CI/CD pipelines.
  • Have moderate-advanced experience in Java, C, C++, Python, Go or other object-oriented programming languages.
  • You are interested in designing, analyzing and troubleshooting large-scale distributed systems.
  • You have a systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
  • You have a great ability to debug and optimize code and automate routine tasks.
  • You have a solid background in software development and architecting resilient and reliable applications.
OJAS
Hyderabad
5 - 11 yrs
₹10L - ₹20L / yr
site reliability
cloudformation
Terraform
Ansible
Cloud Automation
  • 5+ years of software development or site reliability engineering or equivalent experience
  • Skilled at problem solving, algorithms, and data structures
  • Building tools and scripting frameworks from scratch
  • Working with Cloud Automation tools like CloudFormation, Terraform, CDK, aws-cli
  • Scripting languages like Python, Groovy, PowerShell, Bash, Perl etc.
  • Configuration automation using Ansible or equivalent tools
  • Exposure to Windows and Linux administration
  • Project management tools like Jira, Trello
  • Prior experience in dealing with Datastore technologies like Postgres, MySQL, SQL, DynamoDB is desirable
  • Familiarity with basic networking, security and cloud engineering concepts
  • Team player who is eager to help others to succeed through mentoring and leading by example
  • Highly collaborative with effective written and verbal communication skills
ScienceLogic
Remote only
5 - 11 yrs
₹10L - ₹17L / yr
AWS CloudFormation
cloud automation
site reliability
cloudformation
Ansible
  • 5+ years of software development or site reliability engineering or equivalent experience
  • Skilled at problem solving, algorithms, and data structures
  • Building tools and scripting frameworks from scratch
  • Working with Cloud Automation tools like CloudFormation, Terraform, CDK, aws-cli
  • Scripting languages like Python, Groovy, PowerShell, Bash, Perl etc.
  • Configuration automation using Ansible or equivalent tools
  • Exposure to Windows and Linux administration
  • Project management tools like Jira, Trello
  • Prior experience in dealing with Datastore technologies like Postgres, MySQL, SQL, DynamoDB is desirable
  • Familiarity with basic networking, security and cloud engineering concepts
  • Team player who is eager to help others to succeed through mentoring and leading by example
  • Highly collaborative with effective written and verbal communication skills
Why apply to jobs via Cutshort
Personalized job matches
Stop wasting time. Get matched with jobs that meet your skills, aspirations and preferences.
Verified hiring teams
See actual hiring teams, find common social connections or connect with them directly. No 3rd party agencies here.
Move faster with AI
We use AI to get you faster responses, recommendations and unmatched user experience.
21,01,133
Matches delivered
37,12,187
Network size
15,000
Companies hiring