
Site reliability Engineer

Posted by Richa Kukar
3 - 8 yrs
₹20L - ₹40L / yr (ESOP available)
Bengaluru (Bangalore)
Skills
Hadoop
SRE
DevOps
Reliability engineering
Load balancing
Big Data
Site reliability
Acceldata is creating the Data observability space. We make it possible for data-driven enterprises to effectively monitor, discover, and validate Data pipelines at Petabyte scale. Our customers include a Fortune 500 company, one of Asia's largest telecom companies, and a unicorn fintech startup. We are lean, hungry, customer-obsessed, and growing fast. Our Solutions team values productivity, integrity, and pragmatism. We provide a flexible, remote-friendly work environment.
 
We are building software that provides insights into companies' data operations and allows them to focus on delivering data reliably, with speed and effectiveness. Join us in building an industry-leading data operations platform that focuses on optimizing modern data lakes for both on-premise and cloud environments.

 

Responsibilities

  • Our Site Reliability Engineers work on improving the availability, scalability, performance, and reliability of enterprise production services for our products as well as our customers' data lake environments.
  • You will use your expertise to improve the reliability and performance of Hadoop data lake clusters and data management services. Like our products, our SREs are expected to be platform- and vendor-agnostic when it comes to implementing, stabilizing, and tuning Hadoop ecosystems.
  • You will be required to provide implementation guidance, best-practice frameworks, and technical thought leadership to our customers for their Hadoop data lake implementation and migration initiatives.
  • You need to be 100% hands-on and, as required, test, monitor, administer, and operate multiple data lake clusters across data centers.
  • Troubleshoot issues across the entire stack: hardware, software, application, and network.
  • Dive into problems with an eye to both immediate remediation and the follow-through changes and automation that will prevent future occurrences.
  • Must demonstrate exceptional troubleshooting and strong architectural skills, and be able to describe them clearly and effectively both verbally and in writing.

Requirements

  • Customer-focused, self-driven, and motivated, with a strong work ethic and a passion for problem-solving.
  • 4+ years of designing, implementing, tuning, and managing services in a distributed, enterprise-scale on-premise and public/private cloud environment.
  • Familiarity with infrastructure management and operations lifecycle concepts and ecosystem.
  • Hadoop cluster design, implementation, management, and performance tuning experience with HDFS, YARN, Hive/Impala, Spark, Kerberos, and related Hadoop technologies is a must.
  • Must have strong SQL/HQL query troubleshooting and tuning skills on Hive/HBase.
  • Must have strong capacity planning experience for Hadoop ecosystems/data lakes.
  • Good to have hands-on experience with Kafka, Ranger/Sentry, NiFi, Ambari, Cloudera Manager, and HBase.
  • Good to have data modeling, data engineering, and data security experience within the Hadoop ecosystem.
  • Good to have deep JVM/Java debugging and tuning skills.

About Acceldata

Founded: 2018
Stage: Raised funding
About

Acceldata is the company that built the leading Multidimensional Data Observability Cloud. This cloud was designed to help data-driven organizations achieve agility in innovation, operational excellence, and enhanced returns on data investment. Contemporary organizations are becoming more reliant on embedded analytics and artificial intelligence technologies to fuel their business operations and decisions.


The data observability technologies offered by Acceldata improve the performance of embedded artificial intelligence and analytics workloads by providing purpose-built monitoring and analytics. The first Data Observability Cloud is presently being developed by Acceldata for cloud data warehouses and hybrid data lakes. Acceldata makes it easy for businesses to expand their pipelines to meet the requirements of modern business, regardless of whether they are operating in a platform or cloud environment. Data Observability Cloud by Acceldata provides on-demand operational information to support analytics data workloads and embedded artificial intelligence.

Connect with the team
Richa Kukar
Abhishek Bharadawaj
Abhishek N
Swapna Chanamala
Akash Sampat

Similar jobs

Crelio Health
Posted by Shreya Kabra
Pune
5 - 15 yrs
Best in industry
SRE
Reliability engineering
Site reliability
Site reliability engineer

Job Summary:

We are seeking a Senior DevOps & SRE Engineer to join our team and help us build, deploy, and maintain our infrastructure and applications. The ideal candidate will have experience working in a fast-paced environment and a strong background in DevOps and Site Reliability Engineering (SRE). You will be responsible for ensuring the reliability, scalability, and security of our applications and infrastructure.

 

Responsibilities:

  • Build and maintain our CI/CD pipeline and deployment automation tools
  • Design and implement monitoring and alerting systems to ensure the health of our applications and infrastructure
  • Work closely with development teams to ensure that code is deployed in a reliable and scalable manner
  • Participate in on-call rotations to provide 24/7 support for our production systems
  • Develop and maintain disaster recovery plans and processes
  • Continuously improve our infrastructure and processes to ensure scalability, reliability, and security
  • Mentor and provide technical leadership to junior team members
  • Keep up-to-date with industry best practices and emerging technologies in DevOps and SRE

Requirements:

  • Bachelor’s degree in Computer Science, Engineering, or a related field
  • 5+ years of experience in DevOps or SRE
  • Strong programming skills in at least one of the following languages: Python, Go, Ruby, or Java
  • Experience with infrastructure as code tools such as Terraform or CloudFormation
  • Experience with containerization technologies such as Docker and Kubernetes
  • Strong understanding of networking concepts such as TCP/IP, DNS, and load balancing
  • Experience with monitoring and logging tools such as Prometheus, Grafana, and ELK stack
  • Excellent problem-solving skills and the ability to troubleshoot complex issues in a fast-paced environment
  • Strong communication and collaboration skills with both technical and non-technical stakeholders

Preferred Qualifications:

  • Experience with cloud providers such as AWS or Azure
  • Experience with building and maintaining large-scale distributed systems
  • Experience with database technologies such as MySQL, PostgreSQL, or MongoDB
  • Experience with automation tools such as Ansible or Chef
  • Experience with Agile development methodologies such as Scrum or Kanban

If you are passionate about DevOps and SRE and have the skills and experience we are looking for, we encourage you to apply for this exciting opportunity.

Nagarro Software
Posted by Nitika Kalra
Hyderabad
7.5 - 10 yrs
Best in industry
Site Reliability Engineer

👋🏼We're Nagarro.


We are a Digital Product Engineering company that is scaling in a big way! We build products, services, and experiences that inspire, excite, and delight. We work at scale across all devices and digital mediums, and our people exist everywhere in the world (19000+ experts across 33 countries, to be exact). Our work culture is dynamic and non-hierarchical. We're looking for great new colleagues. That's where you come in.


REQUIREMENTS:

  • Must-have skills: Cloud development (Capable), Microservices architecture (MSA) (Strong), Site Reliability Engineering
  • Qualifications: Bachelor's degree in computer science or another highly technical, scientific discipline
  • 10+ years of experience and a strong background in areas like cloud operations and site reliability engineering
  • Hands-on knowledge and experience with any of the major public cloud providers (preferably AWS)
  • Good understanding of micro-service architectures and development frameworks; knowledge across tiers in a multi-tier cloud environment including multi-region, multi-zone configurations, load balancers, web servers, application containers, data stores, distributed cache, and content delivery networks
  • Hands-on knowledge and experience with observability and monitoring tools like New Relic, Splunk, Prometheus and Grafana
  • Ability to program (structured and OO) in one or more high-level languages, such as Python, Go, shell scripting, C/C++, or Java
  • Ability to work with query languages to analyze monitoring data and other app-specific transactions
  • A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
  • An Agile mindset and strong communication skills to collaborate with multiple stakeholders across the organization

RESPONSIBILITIES:

  • Participate in functional discussions, system design consulting, platform management, and capacity planning to develop an overall understanding of the product and the team's current priorities
  • Partner with development teams to improve services through rigorous testing, release procedures, automation of runbooks, and DR planning
  • Build automation scripts for auto-recovery behavior, determine benchmarks for critical services, and prepare custom dashboards to report performance in production
  • Understand product functionality to design and build thoughtful experiments to simulate chaos and proactively find faults in the systems
  • Work with architects to define service level indicators and create a service catalog
  • Gather and analyze SLIs and SLOs from applications, services, and the OS to assist in performance tuning and in improving availability and reliability
  • Improve monitoring and observability to increase visibility into key metrics like MTTI, MTTR, and MTTD
  • Work with the team to understand the root cause of production incidents
  • Train and groom engineers to internalize SRE best practices


Acceldata
Posted by Richa Kukar
Bengaluru (Bangalore)
6 - 10 yrs
Best in industry
SRE
Reliability engineering
Site reliability
Hadoop
HDFS

Senior SRE - Acceldata (IC3 Level)


About the Job


You will join a team of highly skilled engineers who are responsible for delivering Acceldata's support services. Our Site Reliability Engineers are trained to be active listeners and demonstrate empathy when customers encounter product issues. In our fun and collaborative environment, Site Reliability Engineers develop strong business, interpersonal, and technical skills to deliver high-quality service to our valued customers.


When you arrive for your first day, we’ll want you to have:

  • Solid skills in troubleshooting to repair failed products or processes on a machine or a system using a logical, systematic search for the source of a problem in order to solve it, and make the product or process operational again
  • A strong ability to understand the feelings of our customers as we empathize with them on the issue at hand
  • A strong desire to increase your product and technology skillset and your confidence in supporting our products, so you can help our customers succeed

In this position you will…

  • Provide support services to our Gold and Enterprise customers using our flagship Acceldata Pulse, Flow, and Torch product suites. This may include assistance provided during the engineering and operations of distributed systems as well as responses for mission-critical systems and production customers.
  • Demonstrate the ability to actively listen to customers and show empathy for the business impact they face when they experience issues with our products
  • Participate in the queue management and coordination process by owning customer escalations and managing the unassigned queue.
  • Work on other support-related activities, such as performing POCs and assisting with onboarding deployments of Acceldata and Hadoop distribution products.
  • Triage, diagnose and escalate customer inquiries when applicable during their engineering and operations efforts.
  • Collaborate and share solutions with both customers and the Internal team.
  • Investigate product related issues both for particular customers and for common trends that may arise
  • Study and understand critical system components and large cluster operations
  • Differentiate between issues that arise in operations, user code, or product
  • Coordinate enhancement and feature requests with product management and Acceldata engineering team.
  • Be flexible about working in shifts.
  • Participate in a rotational weekend on-call roster for critical support needs.
  • Participate as a designated or dedicated engineer for specific customers. This engagement involves building long-term, successful relationships with customers, leading weekly status calls, and making occasional visits to customer sites

In this position, you should have…

  • A strong desire and aptitude to become a well-rounded support professional. Acceldata Support considers the service we deliver as our core product.
  • A positive attitude towards feedback and continual improvement
  • A willingness to give direct feedback to and partner with management to improve team operations
  • A tenacity to bring calm and order to the often stressful situations of customer cases
  • A mental capability to multi-task across many customer situations simultaneously
  • Bachelor's degree in Computer Science or Engineering, or equivalent experience. A Master's degree is a plus
  • At least 2 years of experience managing and supporting cloud infrastructure on at least one of the following platforms: Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform (GCP). Knowledge of Kubernetes and Docker is a must.
  • Strong troubleshooting skills (for example, TCP/IP, DNS, file systems, load balancing, databases, Java)
  • Excellent communication skills in English (written and verbal)
  • Prior enterprise support experience in a technical environment strongly preferred

Strong Hands-on Experience Working With Or Supporting The Following

  • 8-12 years of experience with highly scalable, distributed, multi-node environments (50+ nodes)
  • Hadoop operation including Zookeeper, HDFS, YARN, Hive, and related components like the Hive metastore, Cloudera Manager/Ambari, etc
  • Authentication and security configuration and tuning (KNOX, LDAP, Kerberos, SSL/TLS, second priority: SSO/OAuth/OIDC, Ranger/Sentry)
  • Java troubleshooting, e.g., collection and evaluation of jstacks, heap dumps

You might also have…

  • Linux, NFS, Windows, including application installation, scripting, basic command line
  • Docker and Kubernetes configuration and troubleshooting, including Helm charts, storage options, logging, and basic kubectl CLI
  • Experience working with scripting languages (Bash, PowerShell, Python)
  • Working knowledge of application, server, and network security management concepts
  • Familiarity with virtual machine technologies
  • Knowledge of databases like MySQL and PostgreSQL
  • Certification from any of the leading cloud providers (AWS, Azure, GCP) and/or in Kubernetes is a big plus

The right person in this role has an opportunity to make a huge impact at Acceldata and add value to our future decisions. If this position has piqued your interest and you have what we described - we invite you to apply! An adventure in data awaits.

Learn more at https://www.acceldata.io/about-us


