
Now, more than ever, the Toast team is committed to our customers. We’re taking steps to help restaurants navigate these unprecedented times with technology, resources, and community. Our focus is on building a restaurant platform that helps restaurants adapt, take control, and get back to what they do best: building the businesses they love. And because our technology is purpose-built for restaurants by restaurant people, restaurants can trust that we’ll deliver on their needs for today while investing in experiences that will power their restaurant of the future.
At Toast, our Site Reliability Engineers (SREs) are responsible for keeping all customer-facing services and other Toast production systems running smoothly. SREs are a blend of pragmatic operators and software craftspeople who apply sound software engineering principles, operational discipline, and mature automation to our environments and our codebase. Our decisions are based on instrumentation and continuous observability, as well as predictions and capacity planning.
About this roll* (Responsibilities)
- Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
- Partner with development teams to improve services through rigorous testing and release procedures
- Participate in system design consulting, platform management, and capacity planning
- Create sustainable systems and services through automation and uplift
- Balance feature development speed and reliability with well-defined service level objectives
Troubleshooting and Supporting Escalations:
- Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
- Diagnose performance bottlenecks and implement optimizations across infrastructure, databases, web, and mobile applications
- Implement strategies to increase system reliability and performance through on-call rotation and process optimization
- Perform and run blameless RCAs on incidents and outages aggressively, looking for answers that will prevent the incident from ever happening again
Do you have the right ingredients? (Requirements)
- Extensive industry experience, with at least 7 years in SRE and/or DevOps roles
- Polyglot technologist/generalist with a thirst for learning
- Deep understanding of cloud and microservice architecture and the JVM
- Experience with tools such as APM, Terraform, Ansible, GitHub, Jenkins, and Docker
- Experience developing software or software projects in at least four languages, ideally including two of Go, Python, and Java
- Experience with cloud computing technologies (AWS preferred)
*Bread puns are encouraged but not required

About Toast
Toast empowers restaurants of all sizes to build great teams, increase revenue, improve operations, and delight guests.
We are an NYSE-listed, Boston-based public company. We raised USD 400M in our Series F round in 2020.
We pair our deep understanding of the restaurant industry with powerful cloud-based software and restaurant-grade hardware to deliver an intuitive, all-in-one platform across point of sale, guest marketing, digital ordering & delivery, and payroll & HR.


Candid answers by the company
Toast helps restaurants of all sizes streamline operations, boost revenue, enhance team management, and deliver exceptional guest experiences.
Similar jobs
Note: This is a contract role for a period of 3-6 months.
Responsibilities:
● Set up and maintain CI/CD pipelines across services and environments
● Monitor system health and set up alerts/logs for performance & errors
● Work closely with backend/frontend teams to improve deployment velocity
● Manage cloud environments (staging, production) with cost and reliability in mind
● Ensure secure access, role policies, and audit logging
● Contribute to internal tooling, CLI automation, and dev workflow improvements
Must-Haves:
● 2–3 years of hands-on experience in DevOps, SRE, or Platform Engineering
● Experience with Docker, CI/CD (especially GitHub Actions), and cloud providers (AWS/GCP)
● Proficiency in writing scripts (Bash, Python) for automation
● Good understanding of system monitoring, logs, and alerting
● Strong debugging skills, ownership mindset, and clear documentation habits
● Experience with infrastructure monitoring tools such as Grafana dashboards
Key Skills Required:
· You will be part of the DevOps engineering team, configuring project environments and troubleshooting integration issues across different systems. You will also be involved in building new features for the next generation of cloud recovery services and managed services.
· You will directly guide the technical strategy for our clients and build out a new DevOps capability within the company to improve our business relevance for customers.
· You will coordinate with the Cloud and Data teams on their requirements, verify the configurations required for each production server, and come up with scalable solutions.
· You will be responsible for reviewing the infrastructure and configuration of microservices, as well as the packaging and deployment of applications.
To be the right fit, you'll need:
· Expertise in cloud services such as AWS.
· Experience with Terraform scripting.
· Experience with container technology such as Docker and orchestration such as Kubernetes.
· Good knowledge of CI/CD pipeline tools such as Jenkins, Bamboo, etc.
· Experience with version control systems such as Git, build tools (Maven, Ant, Gradle), and cloud automation tools (Chef, Puppet, Ansible).
What the role needs
● Review the current DevOps infrastructure and redefine the code-merging strategy in line with product rollout objectives
● Define a deployment frequency strategy based on the product roadmap document and ongoing product-market-fit related tweaks and changes
● Architect benchmark Docker configurations based on the planned stack
● Establish environment uniformity from developer machines through to multiple production environments
● Plan and execute test automation infrastructure
● Set up an automated stress testing environment
● Plan and roll out logging and stack-trace tools
● Review DevOps orchestration tools and choices
● Coordinate with external data centers and AWS during provisioning, outages, or maintenance
Requirements
● Extensive experience with AWS cloud infrastructure deployment and monitoring
● Advanced knowledge of programming languages such as Python and Go, including writing code and scripts
● Experience with infrastructure-as-code and DevOps management tools (Terraform, Packer) for asset management, monitoring, infrastructure cost estimation, and infrastructure version management
● Experience configuring and managing data sources such as MySQL, MongoDB, Elasticsearch, Redis, Cassandra, Hadoop, etc.
● Experience with network, infrastructure and OWASP security standards
● Experience with web server configurations (Nginx, HAProxy, SSL configurations with AWS), including understanding and management of sub-domain based product rollout for clients
● Experience with deployment and monitoring of event streaming & distributing technologies and tools - Kafka, RabbitMQ, NATS.io, socket.io
● Understanding & experience of Disaster Recovery Plan execution
● Working with other senior team members to devise and execute strategies for data backup and storage
● Be aware of current CVEs, potential attack vectors, and vulnerabilities, and apply patches as soon as possible
● Handle incident responses, troubleshooting and fixes for various services
Position: Cloud and Infrastructure Automation Consultant
Location: India (Pan India), Work from Home
The position:
This exciting role in Ashnik’s consulting team offers a great opportunity to design and deploy automation solutions for Ashnik’s enterprise customers spread across SEA and India. The role takes the lead in consulting with customers on the automation of cloud and datacentre-based resources. You will work hands-on with your team, focusing on infrastructure solutions and automating infrastructure deployments that are secure and compliant. You will provide implementation oversight of solutions to overcome technology and business challenges.
Responsibilities:
· Lead consultative discussions to identify customers' challenges and suggest right-fit open-source tools
· Independently determine the needs of the customer and create solution frameworks
· Design and develop moderately complex software solutions to meet needs
· Use a process-driven approach in designing and developing solutions.
· Create consulting work packages and detailed SOWs, and assist the sales team in positioning them to enterprise customers
· Be responsible for implementing automation recipes (Ansible/Chef) and scripts (Ruby, PowerShell, Python) as part of an automated installation/deployment process
Experience and skills required:
· 8 to 10 years of experience in IT infrastructure
· Proven technical skill in designing and delivering enterprise-level solutions involving the integration of complex technologies
· 6+ years of experience with RHEL/Windows system automation
· 4+ years of experience using Python and/or Bash scripting to solve and automate common system tasks
· Strong understanding and knowledge of networking architecture
· Experience with Sentinel Policy as Code
· Strong understanding of AWS and Azure infrastructure
· Experience deploying and utilizing automation tools such as Terraform, CloudFormation, CI/CD pipelines, Jenkins, GitHub Actions
· Experience with HashiCorp Configuration Language (HCL) for module & policy development
· Knowledge of cloud tools including CloudFormation, CloudWatch, Control Tower, CloudTrail and IAM is desirable
This role requires a high degree of self-initiative, working with diverse teams and with customers spread across the Southeast Asia and India region. It also requires you to be proactive in communicating with customers and internal teams about industry trends and technology developments, and in creating thought leadership.
About Us
Ashnik is a leading enterprise open-source solutions company in Southeast Asia and India, enabling organizations to adopt open source for their digital transformation goals. Founded in 2009, it offers a full-fledged Open-Source Marketplace, Solutions, and Services – Consulting, Managed, Technical, Training. Over 200 leading enterprises so far have leveraged Ashnik’s offerings in the space of Database platforms, DevOps & Microservices, Kubernetes, Cloud, and Analytics.
As a team culture, Ashnik is a family for its team members. Each member brings in a different perspective, new ideas and diverse background. Yet we all together strive for one goal – to deliver the best solutions to our customers using open-source software. We passionately believe in the power of collaboration. Through an open platform of idea exchange, we create a vibrant environment for growth and excellence.
Package: up to 20L
Experience: 8 yrs
Key Skills Required for Lead DevOps Engineer
Containerization Technologies: Docker, Kubernetes, OpenShift
Cloud Technologies: AWS/Azure, GCP
CI/CD Pipeline Tools: Jenkins, Azure DevOps
Configuration Management Tools: Ansible, Chef
SCM Tools: Git, GitHub, Bitbucket
Monitoring Tools: New Relic, Nagios, Prometheus
Cloud Infra Automation: Terraform
Scripting Languages: Python, Shell, Groovy
· Ability to decide the architecture and tools for the project based on availability
· Sound knowledge of deployment strategies and the ability to define timelines
· Team handling skills are a must
· Debugging skills are an advantage
· Good to have knowledge of databases like MySQL, PostgreSQL
· Familiarity with Kafka and RabbitMQ is an advantage
· Good to have knowledge of web servers to deploy web applications
· Good to have knowledge of code quality checking tools like SonarQube and vulnerability scanning
· Experience in DevSecOps is an advantage
Note: Tools mentioned in bold are a must and others are added advantage
We (the Software Engineer team) are looking for a motivated, experienced person with a data-driven approach to join our Distribution Team in Bangalore to help design, execute and improve our test sets and infrastructure for producing high-quality Hadoop software.
A Day in the life
You will be part of a team that makes sure our releases are predictable and deliver high value to the customer. This team is responsible for automating and maintaining our test harness, and making test results reliable and repeatable.
You will:
- work on making our distributed software stack more resilient to high-scale endurance runs and customer simulations
- provide valuable fixes to our product development teams to the issues you’ve found during exhaustive test runs
- work with product and field teams to make sure our customer simulations match the expectations and can provide valuable feedback to our customers
- work with amazing people - We are a fun & smart team, including many of the top luminaries in Hadoop and related open source communities. We frequently interact with the research community, collaborate with engineers at other top companies & host cutting edge researchers for tech talks.
- do innovative work - Cloudera pushes the frontier of big data & distributed computing, as our track record shows. We work on high-profile open source projects, interacting daily with engineers at other exciting companies, speaking at meet-ups, etc.
- be a part of a great culture - Transparent and open meritocracy. Everybody is always thinking of better ways to do things, and coming up with ideas that make a difference. We build our culture to be the best workplace in our careers.
You have:
- strong knowledge in at least 1 of the following languages: Java / Python / Scala / C++ / C#
- hands-on experience with at least 1 of the following configuration management tools: Ansible, Chef, Puppet, Salt
- confidence with Linux environments
- ability to identify critical weak spots in distributed software systems
- experience in developing automated test cases and test plans
- ability to deal with distributed systems
- solid interpersonal skills conducive to a distributed environment
- ability to work independently on multiple tasks
- self-driven & motivated, with a strong work ethic and a passion for problem solving
- innovate and automate and break the code
The right person in this role has an opportunity to make a huge impact at Cloudera and add value to our future decisions. If this position has piqued your interest and you have what we described - we invite you to apply! An adventure in data awaits.
The brand is associated with some of the major icons across categories and tie-ups with industries covering fashion, sports, and music, of course. The founders are Marketing grads, with vast experience in the consumer lifestyle products and other major brands. With their vigorous efforts toward quality and marketing, they have been able to strike a chord with major E-commerce brands and even consumers.
What you will do:
- Defining and documenting best practices and strategies regarding application deployment and infrastructure maintenance
- Providing guidance, thought leadership and mentorship to development teams to build cloud competencies
- Ensuring application performance, uptime, and scale, maintaining high standards of code quality and thoughtful design
- Managing cloud environments in accordance with company security guidelines
- Developing and implementing technical efforts to design, build and deploy AWS applications at the direction of lead architects, including large-scale data processing, computationally intensive statistical modeling and advanced analytics
- Participating in all aspects of the software development life cycle for AWS solutions, including planning, requirements, development, testing, and quality assurance
- Troubleshooting incidents, identifying root cause, fixing and documenting problems and implementing preventive measures
- Educating teams on the implementation of new cloud-based initiatives, providing associated training as required
Desired Candidate Profile
What you need to have:
- Bachelor’s degree in computer science or information technology
- 2+ years of experience as an architect, designing, developing, and implementing cloud solutions on AWS platforms
- Experience in several of the following areas: database architecture, ETL, business intelligence, big data, machine learning, advanced analytics
- Proven ability to collaborate with multi-disciplinary teams of business analysts, developers, data scientists and subject matter experts
- Self-motivation with the ability to drive features to delivery
- Strong analytical and problem solving skills
- Excellent oral and written communication skills
- Good logical sense, strong technical skills and the ability to learn new technologies quickly
- AWS certifications are a plus
- Knowledge of web services, API, REST, and RPC
Minimum 4 years of experience
Skillsets:
- Build automation/CI: Jenkins
- Secure repositories: Artifactory, Nexus
- Build technologies: Maven, Gradle
- Development Languages: Python, Java, C#, Node, Angular, React/Redux
- SCM systems: Git, GitHub, Bitbucket
- Code Quality: Fisheye, Crucible, SonarQube
- Configuration Management: Packer, Ansible, Puppet, Chef
- Deployment: uDeploy, XLDeploy
- Containerization: Kubernetes, Docker, PCF, OpenShift
- Automation frameworks: Selenium, TestNG, Robot
- Work Management: JAMA, Jira
- Strong problem-solving skills and good verbal and written communication skills
- Good knowledge of Linux environments (Red Hat, etc.)
- Good at shell scripting
- Good to have: cloud technologies (AWS, GCP, and Azure)








