Site Reliability Engineer

at Dremio

Posted by Kiran B
Hyderabad
6 - 12 yrs
₹20L - ₹40L / yr
Full time
Skills
Reliability engineering
Site reliability
DevOps
Python
CI/CD
Amazon Web Services (AWS)
Ansible
Kubernetes
Google Cloud Platform (GCP)
Windows Azure

About the Role

Dremio’s SREs ensure that our internal and externally visible services have the reliability and uptime that users need, along with a fast rate of improvement. You will join a newly formed team that will spearhead our efforts to launch a cloud service. This is an opportunity to join a fast-growing startup and help build a cloud service from the ground up.

Responsibilities and Ownership

  • Debug and optimize code and automate routine tasks.
  • Evangelize and advocate for reliability practices across our organization.
  • Collaborate with other Engineering teams to support services before they go live through activities such as system design consulting, developing software platforms and frameworks, monitoring/alerting, capacity planning and launch reviews.
  • Analyze and optimize our core product by developing and implementing reliability and performance practices.
  • Scale systems sustainably through automation and evolve systems by pushing for changes that improve reliability and velocity.
  • Be on-call for services that the SRE team owns.
  • Practice sustainable incident response and blameless postmortems.

Qualifications

  • 6+ years of relevant experience in the following areas: SRE, DevOps, Cloud Operations, Systems Engineering, or Software Engineering.
  • Excellent command of cloud services on AWS/GCP/Azure, Kubernetes and CI/CD pipelines.
  • You have moderate to advanced experience in Java, C, C++, Python, Go or other object-oriented programming languages.
  • You are interested in designing, analyzing and troubleshooting large-scale distributed systems.
  • You have a systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
  • You have a great ability to debug and optimize code and automate routine tasks.
  • You have a solid background in software development and architecting resilient and reliable applications.

About Dremio

Founded
2015
Type
Product
Size
100-500
Stage
Raised funding
About
Drive business outcomes with the power of your data. Dremio’s data lake engine empowers data analysts and dramatically improves efficiency and control for data engineers, while lowering cloud costs.
Connect with the team
Kiran B
Pranavsinh Gohil (CW)
Maharaja Subramanian (CW)
Sumit Singh

Similar jobs

Remote only
5 - 15 yrs
₹30L - ₹40L / yr
Microsoft Windows Azure
Python
Firewall administration
Ruby
Bash
+3 more

Primary Job Responsibilities

In this role you will be responsible for the overall management and implementation of our clients' trading products, research infrastructure and workstation support.


In this role your responsibilities will be:

- Overall design and implementation of the infrastructure, security and network (Cloud, Equinix and On Premises) either directly or with the help of vendors

- Manage onboarding process of new products, new clients, and new employees

- Manage the Cyber Security and other infrastructure policies

- Build simulation and research environment for researchers (on cloud or on premises)

- Managing back up and recovery of production data

- Managing internal development software and code repositories

- Managing inbound connectivity from clients to us and outbound connectivity from our trading systems to exchanges, brokers and network providers

- Reviewing all the invoices related to infrastructure and optimizing the expenses

- Working closely with client IT teams and vendors

 

Requirements

  • Ability to work with little or no supervision while operating within the team
  • Experience building scalable infrastructure on the cloud and On Premises
  • Experience with Firewalls and Networking
  • Experience in deployment of Web Servers
  • Experience working with financial data centers is a big plus
  • Experience with Cloud Platforms such as Amazon Web Services (AWS), Microsoft Azure or Google Cloud Platform
  • 5+ years of relevant experience
  • B.S. (or higher) in Computer Science or equivalent relevant experience
  • Experience administering Windows and Linux server operating systems and related utilities and hardware
  • Experience automating system tasks and monitoring infrastructure deployment using a scripting language (Python, Ruby, Bash)
  • Experience with Cisco UCS platforms, Storage platforms & Cisco Networking products
  • High level understanding of infrastructure, networking and security principles
Digital B2B Platform
Agency job
via Jobdost by Sathish Kumar
Bengaluru (Bangalore)
3 - 4 yrs
₹15L - ₹30L / yr
DevOps
Python
CI/CD
Linux/Unix
Git
+6 more
We are a digital B2B platform that offers loans, working capital, and payment services to small businesses.

Candidate MUST HAVE product-based company experience and a minimum of 3 years of experience in DevOps.

What you will do (or learn) : 

1. Build our application stack on AWS. Infrastructure as code (read Terraform)
2. Build state-of-the-art CI/CD pipelines.
3. Manage data warehouses and data pipelines.
4. Work on infrastructure and data security.
5. State-of-the-art log management system and tooling around them.
6. Monitoring and alerting system.

What do we expect from you?
1. 3 to 10 years of experience with DevOps or SRE principles.
2. Good fundamentals of database management and other distributed systems management.
3. Experience in infrastructure as code or other configuration management systems.
4. Experience in scripting languages (like Bash, Python, Go, etc.)
5. Good understanding of Linux systems
6. Strong debugging and troubleshooting skills
7. Experience in tooling around monitoring, CI/CD, log management systems. 
One of the largest Equity broking House in India
Agency job
via HyrHub by Shwetha Naik
Mumbai, Bengaluru (Bangalore)
4 - 8 yrs
₹15L - ₹20L / yr
Reliability engineering
SRE
DevOps
Amazon Web Services (AWS)
Ansible
+4 more
Common roles and responsibilities:
● Be on a PagerDuty rotation to respond to availability incidents and provide support for service engineers.
● Run the production environment by monitoring availability and taking a holistic view of system health.
● Build and implement services to make IT and support better at their jobs.
● Improve reliability, quality, and time-to-market of our suite of software solutions.
● Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve.
● Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding.
● Experience in an agile development environment.
● Participate in system design consulting, platform management, and capacity planning.
● Balance feature development speed and reliability with well-defined service level objectives.
Required Skills and Qualifications:
● 3+ years of experience working within DevOps or SRE teams.
● 3+ years of experience with AWS Cloud.
● Ability to program (structured and OO) with one or more high-level languages, such as Python, Go, Java, and JavaScript.
● Must have experience with Ansible, Helm, Terraform and Kubernetes.
● Document every action so your findings turn into repeatable actions, and then into automation.
● Hands-on experience with a distributed version control system such as Git, AWS CodeCommit or equivalent.
● Know your way around Linux and the Unix shell.
● Experience or familiarity with the ELK stack.
● Ability to use Azure DevOps.
● Experience with distributed storage technologies like NFS, Ceph, S3 as well as dynamic resource management frameworks (Mesos, Kubernetes).
● A proactive approach to spotting problems, areas for improvement, and performance bottlenecks.
Vonage (A Ericsson Company)
Agency job
via AVI Consulting LLP by Ashesh Shah
Remote only
4 - 12 yrs
₹15L - ₹25L / yr
Terraform
Chef
Ansible
Docker
Kubernetes
+6 more
www.vonage.com

Site Reliability Engineer (SRE)
Vonage Engineering Mission: Vonage is the emerging leader in the $100B+ cloud communications platform (CPaaS) market.

Customers like Airbnb, Viber, WhatsApp, Snapchat, and many others depend on our APIs and SDKs to connect with their customers all over the world. As businesses continue to shift to a real-time, customer-centric communications model, we are experiencing a time of impressive growth.

Why this role matters:
Vonage, a leader in cloud communications, is looking to build a new SRE team in Bangalore.

We believe that there shouldn’t be walls between operations and development and we have embraced the DevOps movement.

As a Site Reliability Engineer, you will work as part of the development team to build automation and tools to deploy and monitor the platform and to maintain its health and its targeted SLOs and SLAs.

What you'll do
● Lead the effort in ensuring reliability of the platform.
● Create software and tooling that improves performance, stability, and reliability of the platform.
● Work as part of a development team.
● Monitor application metrics to help improve software performance.
● Build solutions that are highly resilient, scalable, and secure.
● Have a wide breadth of knowledge across software, infrastructure, and security.
● Adopt best practices and champion an engineering culture emphasizing Agile.
What's required for application
● Proven experience building, supporting, and architecting high-availability cloud infrastructure.
● Experience working on monitoring, logging, and alerting solutions and related tools.
● Experience with tooling such as Terraform, Ansible, Docker, Kubernetes, and Chef.
● Fluent and comfortable working with cloud infrastructure.
● Ability to read, write, and troubleshoot software code.
● Good understanding of CI/CD tools.
● Champion of DevSecOps using tools such as HashiCorp Vault, KMS, and Secrets Manager.
● Experience with software development, algorithms, data structures, and systems design.
● Understand monitoring tools such as DataDog, ELK, and Grafana.
● Bachelor's degree (or higher) in Computer Science and/or related work experience.

www.vonage.com

Nice to have, but not required
● Working knowledge of other AWS services like Glacier, Elastic Container Service (ECS), Elastic MapReduce (EMR), DynamoDB etc.
● Automation and Orchestration tools such as Jenkins
● Ruby or Java development skills
● Data Pipeline knowledge, especially with tools like MapReduce, Kafka and ELK stack
Top Global Hedge Fund
Agency job
via Bullhorn Consultants by Hemant Singh
Gurugram, Delhi, Noida, Ghaziabad, Faridabad
3 - 8 yrs
₹4L - ₹15L / yr
Kubernetes
Apache Kafka
prometheus
ELK
ELK Stack
+4 more
Experience in Kubernetes as a systems engineer (deployment, troubleshooting, maintenance, Helm charts), and in deployment and administration of one or more of: ELK stack, Kafka, Prometheus or Grafana, with working knowledge of at least one cloud platform (GCP, AWS or Azure) and some configuration management system (such as Salt or Ansible). A good understanding of networking concepts (architecture, components, protocols) and a solid understanding of OS concepts and Linux internals is a must.
Pune
5 - 8 yrs
₹10L - ₹17L / yr
DevOps
Terraform
Ansible
Puppet
Reliability engineering
+7 more
Experience :
● 4-8 years of experience in Cloud Infrastructure and Operations domains
● Experience with Linux systems and/or Windows servers
● Specialize in one or two cloud deployment platforms: AWS, GCP, Azure
● Hands-on experience with AWS and GCP services (EKS, ECS, EC2, VPC, RDS, Lambda, GKE, Compute Engine)
● Experience with one or more programming languages (Python, JavaScript, Ruby, Java, .NET)
● Good understanding of Apache Web Server, Nginx, MySQL, MongoDB, Nagios
● Logging and monitoring tools (ELK, Stackdriver, CloudWatch)
● DevOps technologies
● Knowledge of configuration management tools such as Ansible, Terraform, Puppet, Chef
● Experience working with deployment and orchestration technologies (such as Docker, Kubernetes, Mesos)
Remote only
3 - 5 yrs
₹12L - ₹20L / yr
Google Cloud Platform (GCP)
Amazon Web Services (AWS)
Microsoft Windows Azure
DevOps
Python
+9 more

JD: Site Reliability Engineers

Location: Pune, Remote

     

Sarvaha would like to welcome experienced SRE specialists with a minimum of 5 years of professional experience in Google Cloud Platform or AWS based deployments and automation. Sarvaha is a niche software development company that works with some of the best-funded startups and established companies across the globe. You will be expected to work with a globally distributed team, contribute independently, and lead a team of engineers. This is a hands-on position in which you will be responsible for production software deployments across global availability zones.

 

Key Responsibilities

 

  • Design, write and run services that provide visibility into a leading IoT platform & underlying services
  • Automate deployments, diagnostic and debugging tools
  • Participate in on-call rotations
  • Adhere to industry-standard security best practices  
  • Work with other teams in troubleshooting and keeping the systems up and running

 

Skills Required

 

  • Minimum Bachelor’s Degree in Computer Science or related degree
  • 5+ years of total experience, with at least 4 years in an SRE, DevOps or similar role. More experience is highly desired
  • 4+ years of hands-on experience with one of AWS/Azure/GCP is a must-have for this position
  • 1+ years of experience debugging code written in Python, Java or any strongly typed language
  • 3+ years of experience with Kubernetes, Prometheus, ELK, Grafana, Nagios
  • 2+ years of experience with Jenkins or similar build and deploy orchestration tool
  • 2+ years of experience with RDBMS and NoSQL databases (MySQL, Oracle, Cassandra, CDH)
  • 1+ years of experience writing infrastructure as code using Terraform
  • Excellent verbal and written communication and strong interpersonal skills are required for success in this position
  • Strong listening skills and attention to detail are highly desired

 

Position Benefits

 

  • Top-notch remuneration with non-linear growth
  • Work with industry-best cloud architects, DevOps teams and developers
  • Excellent, no-nonsense work environment with the very best people to work with
  • Cutting-edge work with Fortune 500 businesses, and the chance to learn from high-visibility, public-facing, high-traffic systems
"A Product Startup"
Agency job
Bengaluru (Bangalore)
5 - 8 yrs
₹5L - ₹20L / yr
Windows Azure
Microsoft Windows Azure
DevOps
Terraform
Solution architecture
+5 more

Senior Cloud Engineer / Jr. Cloud Solutions Architect

 

Roles and Responsibilities

  • Define, implement, deploy and maintain development, QA & production environments for cloud-based Azure architecture.

  • Create a strategy for establishing a secure and well-managed enterprise environment in Azure

  • Define and implement security architecture for production, ensure data security at all levels.

  • Provision infrastructure as code using Azure CLI, PowerShell, ARM templates, and/or Terraform with Ansible or other tools.

  • Develop scripts to automate the deployment of resource stacks and associated configurations

  • Extend MLP standard systems management processes into the cloud including change, incident, and problem management

  • Establish and implement monitoring and management infrastructure for both availability and performance management

  • Implement observability patterns using Azure Monitor, Azure Application Insights, and Log Analytics Workspace.

  • Provide internal training to the team.

 

Primary Skills/Requirements

  • 5+ years of experience in IT and infrastructure

  • 3+ years of experience in Azure design, support and management for a large-scale organization

  • Experience in design and implementation of high availability architecture.

  • Strong experience in Azure CLI, PowerShell, ARM templates, and Terraform.

  • Strong understanding of IT Security and related audits

  • Experience with deploying applications on Linux - Ubuntu

  • Should know Azure offerings (Storage, OS instances, Availability zones, DR, Load balancers, VPN tunnel, Application Gateway, etc.)

  • Cloud monitoring experience with Azure Log Analytics and Azure Monitor.

  • Experience with log collection tools and analysis, as well as infrastructure performance monitoring tools and optimization practices

  • Microsoft Azure Certification MCSE: Cloud Platform and Infrastructure or equivalent certification would be an added advantage

  • Experience with PostgreSQL database

Behavioural

  • Positive work ethic

  • Ability to adapt to dynamic environment

  • Time Management

  • Team Player

  • Communication skills

  • Ability to work independently

A listed product development organization
Agency job
via RS Consultants by Biswadeep RS
Pune
4 - 8 yrs
₹15L - ₹15L / yr
Amazon Web Services (AWS)
Kubernetes
Ansible
Prometheus
Grafana
+2 more

Position: Site Reliability Engineer

Location: Pune (Currently WFH, post pandemic you need to relocate)

 

About the Organization:

A funded product development company, headquartered in Singapore, with offices in Australia, United States, Germany, United Kingdom, and India. You will gain work experience in a global environment.

 

Job Description:

We are looking for an experienced DevOps / Site Reliability engineer to join our team and be instrumental in taking our products to the next level.

 

In this role, you will be working on bleeding-edge hybrid cloud / on-premise infrastructure handling billions of events and terabytes of data a day.

 

You will be responsible for working closely with various engineering teams to design, build and maintain a globally distributed infrastructure footprint.

As part of the role, you will be responsible for researching new technologies, managing a large fleet of active services and their underlying servers, automating the deployment, monitoring and scaling of components, and optimizing the infrastructure for cost and performance.

 

Day-to-day responsibilities

 

  • Ensure the operational integrity of the global infrastructure
  • Design repeatable continuous integration and delivery systems
  • Test and measure new methods, applications and frameworks
  • Analyze and leverage various AWS-native functionality
  • Support and build out an on-premise data center footprint
  • Support other teams and diagnose issues related to our infrastructure
  • Participate in 24/7 on-call rotation (If Required)

 

Candidate's Profile:

 

 

  • Expert-level administrator of Linux-based systems
  • Experience managing distributed data platforms (Kafka, Spark, Cassandra, etc.). Aerospike experience is a plus.
  • Experience with production deployments of Kubernetes Cluster
  • Experience in automating provisioning and managing Hybrid-Cloud infrastructure (AWS, GCP and On-Prem) at scale.
  • Knowledge of monitoring platforms (Prometheus, Grafana, Graphite).
  • Experience in distributed storage systems such as Ceph or GlusterFS.
  • Experience in virtualisation with KVM, oVirt and OpenStack.
  • Hands-on experience with configuration management systems such as Terraform and Ansible
  • Bash and Python Scripting Expertise
  • Network troubleshooting experience (TCP, DNS, IPv6 and tcpdump)
  • Experience with continuous delivery systems (Jenkins, GitLab, Bitbucket, Docker)
  • Experience managing hundreds to thousands of servers globally
  • Enjoy automating tasks, rather than repeating them
  • Capable of estimating costs of various approaches, and finding simple and inexpensive solutions to complex problems
  • Strong verbal and written communication skills
  • Ability to adapt to a rapidly changing environment
  • Comfortable collaborating and supporting a diverse team of engineers
  • Ability to troubleshoot problems in complex systems
  • Flexible working hours and ability to participate in 24/7 on call support with other team members whenever required.
***** Looking for people from product organizations, who can join at the earliest.
ScienceLogic
Agency job
via Ojas Innovative Technologies by Mohammad Farooq Shaik
Remote only
5 - 11 yrs
₹10L - ₹17L / yr
AWS CloudFormation
cloud automation
site reliability
cloudformation
Ansible
+9 more
  • 5+ years of software development or site reliability engineering or equivalent experience
  • Skilled at problem solving, algorithms, and data structures
  • Building tools and scripting frameworks from scratch
  • Working with Cloud Automation tools like CloudFormation, Terraform, CDK, aws-cli
  • Scripting languages like Python, Groovy, PowerShell, Bash, Perl etc.
  • Configuration automation using Ansible or equivalent tools
  • Exposure to Windows and Linux administration
  • Project management tools like Jira, Trello
  • Prior experience in dealing with Datastore technologies like Postgres, MySQL, SQL, DynamoDB is desirable
  • Familiarity with basic networking, security and cloud engineering concepts
  • Team player who is eager to help others to succeed through mentoring and leading by example
  • Highly collaborative with effective written and verbal communication skills