Locations
Bengaluru (Bangalore)
Salary
18 - 22 lpa
Job description
As one of the fastest-growing e-commerce companies in Asia, RedMart offers an unparalleled startup experience. Our culture: entrepreneurial, fiercely intelligent, team-oriented, deeply creative and whatever you add to it! We’re fanatical about improving our customer experience and providing “wow” customer service.
We're looking for talented, creative and passionate people to join our All-Star team, people who believe in our mission: to save our customers time and money for the important things in life!
Some things to know before you apply:
We have big plans to disrupt the traditional grocery retail market
Everything we do is focused on empowering our customers
We work really hard
We have a lot of fun!
Job Purpose:
Be a key member of a dynamic DevOps team that ensures infrastructure failures have zero impact on the customer shopping experience, and that treats component failure not with fear but as something that is always accounted for.
Roles & Responsibilities
As a member of the SRE team, you will be dedicated to improving the uptime and availability of RedMart’s end-to-end infrastructure.
You'll be instrumental in running an infrastructure that is:
Receiving many millions of page views per month
Born in the cloud and completely free of legacy lock-in
Built from the ground up: we build and operate the entire cloud infrastructure stack and maintain total control of it
Supporting a high-velocity software development pipeline
You will dive deep into the challenging operational issues that are inherent to modern cloud infrastructure. Rather than relying on traditional operations methods, you will take a holistic, interdisciplinary approach that blends software, systems, automation, and process perspectives.
You will work with battle-tested as well as emerging open-source tools.
You will improve our logging, monitoring and alerting stack built on Cabot, Consul, ELK, and Graphite.
You will work closely with different functional teams to define SLAs based on business criticality. You will define how failures are measured, enforce SLAs, and advocate SRE best practices in every team.
You will provide critical on-site 24x7 operations support in case of any infrastructure failure.
Skills & Experience Required
Bachelor's degree in Engineering / Computer Science with 5 years of software / systems engineering experience, preferably including technical on-call duty: responding to customer-impacting events, mitigation, and root cause analysis
Strong written and verbal communication skills
Must be comfortable working in a Linux/Unix environment
Excellent technical problem-solving and troubleshooting skills
Obsessed with tracking everything that moves, and a firm believer that hope is not a strategy
Obsessed with automation and love working with open-source technologies (Mongo, Elasticsearch, StatsD, Graphite, Consul, AMQP, etc.)
Excellent scripting skills (Bash, Python, Ruby, etc.)
Ability to work and thrive in a fast-moving, multicultural start-up environment
Familiarity with chaos engineering principles and practices