Site reliability Engineer

at Acceldata

Posted by Richa Kukar
Bengaluru (Bangalore)
3 - 8 yrs
₹20L - ₹40L / yr (ESOP available)
Full time
Reliability engineering
Load balancing
Big Data
Site reliability
Acceldata is creating the Data observability space. We make it possible for data-driven enterprises to effectively monitor, discover, and validate Data pipelines at Petabyte scale. Our customers include a Fortune 500 company, one of Asia's largest telecom companies, and a unicorn fintech startup. We are lean, hungry, customer-obsessed, and growing fast. Our Solutions team values productivity, integrity, and pragmatism. We provide a flexible, remote-friendly work environment.
We are building software that can provide insights into companies' data operations and allows them to focus on delivering data reliably with speed and effectiveness. Join us in building an industry-leading data operations platform that focuses on optimizing modern data lakes for both on-premise and cloud environments.



  • Our Site reliability engineers work on improving the availability, scalability, performance, and reliability of enterprise production services for our products as well as our customer’s data lake environments.
  • You will use your expertise to improve the reliability and performance of Hadoop Data lake clusters and data management services. Just as our products, our SRE are expected to be platform and vendor-agnostic when it comes to implementing, stabilizing, and tuning Hadoop ecosystems.
  • You’d be required to provide implementation guidance, best practices framework, and technical thought leadership to our customers for their Hadoop Data lake implementation and migration initiatives.
  • You need to be 100% hand-on and as a required test, monitor, administer, and operate multiple Data lake clusters across data centers.
  • Troubleshoot issues across the entire stack - hardware, software, application, and network.
  • Dive into problems with an eye to both immediate remediations as well as the follow-through changes and automation that will prevent future occurrences.
  • Must demonstrate exceptional troubleshooting and strong architectural skills and clearly and effectively describe this in both a verbal and written format.


  • Customer-focused, Self-driven, and Motivated with a strong work ethic and a passion for problem-solving.
  • 4+ years of designing, implementing, tuning, and managing services in a distributed, enterprise-scale on-premise and public/private cloud environment.
  • Familiarity with infrastructure management and operations lifecycle concepts and ecosystem.
  • Hadoop cluster design, Implementation, management and performance tuning experience with HDFS, YARN,
  • HIVE/IMPALA, SPARK, Kerberos and related Hadoop technologies are a must.
  • Must have strong SQL/HQL query troubleshooting and tuning skills on Hive/HBase.
  • Must have a strong capacity planning experience for Hadoop ecosystems/data lakes.
  • Good to have hands-on experience with – KAFKA, RANGER/SENTRY, NiFi, Ambari, Cloudera Manager, and HBASE.
  • Good to have data modeling, data engineering, and data security experience within the Hadoop ecosystem.Good to have deep JVM/Java debugging and tuning skills.
Read more

About Acceldata

Home - acceldata
Read more
100-500 employees
Raised funding
View full company details
Why apply to jobs via Cutshort
Personalized job matches
Stop wasting time. Get matched with jobs that meet your skills, aspirations and preferences.
Verified hiring teams
See actual hiring teams, find common social connections or connect with them directly. No 3rd party agencies here.
Move faster with AI
We use AI to get you faster responses, recommendations and unmatched user experience.
Matches delivered
Network size
Companies hiring

Similar jobs

Site Reliability Engineer Level 3

at Acceldata

Founded 2018  •  Product  •  100-500 employees  •  Raised funding
Reliability engineering
Site reliability
Apache HBase
Bengaluru (Bangalore)
6 - 10 yrs
Best in industry

Senior SRE - Acceldata (IC3 Level)

About the Job

You will join a team of highly skilled engineers who are responsible for delivering Acceldata’s support services. Our Site Reliability Engineers are trained to be active listeners and demonstrate empathy when customers encounter product issues. In our fun and collaborative environment  Site Reliability Engineers develop strong business, interpersonal and technical skills to deliver high-quality service to our valued customers.

When you arrive for your first day, we’ll want you to have:

  • Solid skills in troubleshooting to repair failed products or processes on a machine or a system using a logical, systematic search for the source of a problem in order to solve it, and make the product or process operational again
  • A strong ability to understand the feelings of our customers as we empathize with them on the issue at hand
  • A strong desire to increase your product and technology skillset; increase- your confidence supporting our products so you can help our customers succeed

In this position you will…

  • Provide Support Services to our Gold & Enterprise customers using our flagship Acceldata Pulse,Flow & Torch Product suits. This may include assistance provided during the engineering and operations of distributed systems as well as responses for mission-critical systems and production customers.
  • Demonstrate the ability to actively listen to customers and show empathy to the customer’s business impact when they experience issues with our products
  • Participate in the queue management and coordination process by owning customer escalations, managing the unassigned queue.
  • Be involved with and work on other support related activities - Performing POC & assisting Onboarding deployments of Acceldata & Hadoop distribution products.
  • Triage, diagnose and escalate customer inquiries when applicable during their engineering and operations efforts.
  • Collaborate and share solutions with both customers and the Internal team.
  • Investigate product related issues both for particular customers and for common trends that may arise
  • Study and understand critical system components and large cluster operations
  • Differentiate between issues that arise in operations, user code, or product
  • Coordinate enhancement and feature requests with product management and Acceldata engineering team.
  • Flexible in working in Shifts.
  • Participate in a Rotational weekend on-call roster for critical support needs.
  • Participate as a designated or dedicated engineer for specific customers. Aspects of this engagement translates to building long term successful relationships with customers, leading weekly status calls, and occasional visits to customer sites

In this position, you should have…

  • A strong desire and aptitude to become a well-rounded support professional. Acceldata Support considers the service we deliver as our core product.
  • A positive attitude towards feedback and continual improvement
  • A willingness to give direct feedback to and partner with management to improve team operations
  • A tenacity to bring calm and order to the often stressful situations of customer cases
  • A mental capability to multi-task across many customer situations simultaneously
  • Bachelor degree in Computer Science or Engineering or equivalent experience. Master’s degree is a plus
  • At least 2+ years of experience with at least one of the following cloud platforms: Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform (GCP), experience with managing and supporting a cloud infrastructure on any of the 3 platforms. Also knowledge on Kubernetes, Docker is a must.
  • Strong troubleshooting skills (in example, TCP/IP, DNS, File system, Load balancing, database, Java)
  • Excellent communication skills in English (written and verbal)
  • Prior enterprise support experience in a technical environment strongly preferred

Strong Hands-on Experience Working With Or Supporting The Following

  • 8-12 years of Experience with a highly-scalable, distributed, multi-node environment (50+ nodes)
  • Hadoop operation including Zookeeper, HDFS, YARN, Hive, and related components like the Hive metastore, Cloudera Manager/Ambari, etc
  • Authentication and security configuration and tuning (KNOX, LDAP, Kerberos, SSL/TLS, second priority: SSO/OAuth/OIDC, Ranger/Sentry)
  • Java troubleshooting, e.g., collection and evaluation of jstacks, heap dumps

You might also have…

  • Linux, NFS, Windows, including application installation, scripting, basic command line
  • Docker and Kubernetes configuration and troubleshooting, including Helm charts, storage options, logging, and basic kubectl CLI
  • Experience working with scripting languages (Bash, PowerShell, Python)
  • Working knowledge of application, server, and network security management concepts
  • Familiarity with virtual machine technologies
  • Knowledge of databases like MySQL and PostgreSQL,
  • Certification on any of the leading Cloud providers (AWS, Azure, GCP ) and/or Kubernetes is a big plus

The right person in this role has an opportunity to make a huge impact at Acceldata and add value to our future decisions. If this position has piqued your interest and you have what we described - we invite you to apply! An adventure in data awaits.

Learn more at

Read more
Job posted by
Richa Kukar
Did not find a job you were looking for?
Search for relevant jobs from 10000+ companies such as Google, Amazon & Uber actively hiring on Cutshort.
Get to hear about interesting companies hiring right now
iconFollow Cutshort
Want to apply to this role at Acceldata?
Why apply via Cutshort?
Connect with actual hiring teams and get their fast response. No spam.
Learn more
Get to hear about interesting companies hiring right now
iconFollow Cutshort