Now, more than ever, the Toast team is committed to our customers. We’re taking steps to help restaurants navigate these unprecedented times with technology, resources, and community. Our focus is on building a restaurant platform that helps restaurants adapt, take control, and get back to what they do best: building the businesses they love. And because our technology is purpose-built for restaurants by restaurant people, restaurants can trust that we’ll deliver on their needs for today while investing in experiences that will power their restaurant of the future.
At Toast, our Site Reliability Engineers (SREs) are responsible for keeping all customer-facing services and other Toast production systems running smoothly. SREs are a blend of pragmatic operators and software craftspeople who apply sound software engineering principles, operational discipline, and mature automation to our environments and our codebase. Our decisions are based on instrumentation and continuous observability, as well as predictions and capacity planning.
About this roll* (Responsibilities)
- Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
- Partner with development teams to improve services through rigorous testing and release procedures
- Participate in system design consulting, platform management, and capacity planning
- Create sustainable systems and services through automation and uplift
- Balance feature development speed and reliability with well-defined service level objectives
Troubleshooting and Supporting Escalations:
- Diagnose performance bottlenecks and implement optimizations across infrastructure, databases, web, and mobile applications
- Implement strategies to increase system reliability and performance through on-call rotation and process optimization
- Run blameless RCAs on incidents and outages, aggressively looking for answers that will prevent each incident from happening again
Do you have the right ingredients? (Requirements)
- Extensive industry experience, with at least 7 years in SRE and/or DevOps roles
- Polyglot technologist/generalist with a thirst for learning
- Deep understanding of cloud and microservice architecture and the JVM
- Experience with tools such as APM, Terraform, Ansible, GitHub, Jenkins, and Docker
- Experience developing software or software projects in at least four languages, ideally including two of Go, Python, and Java
- Experience with cloud computing technologies (AWS preferred)
*Bread puns are encouraged but not required
Similar jobs
Only apply on this link - https://loginext.hire.trakstar.com/jobs/fk025uh?source=
LogiNext is looking for a technically savvy and passionate Associate Vice President - Product Engineering - DevOps or Senior Database Administrator to cater to the development and operations efforts in product. You will choose and deploy tools and technologies to build and support a robust infrastructure.
You have hands-on experience in building secure, high-performing and scalable infrastructure. You have experience automating and streamlining development operations and processes. You are a master at troubleshooting and resolving issues in dev, staging and production environments.
Responsibilities:
- Design and implement scalable infrastructure for delivering and running web, mobile and big data applications on cloud
- Scale and optimise a variety of SQL and NoSQL databases (especially MongoDB), web servers, application frameworks, caches, and distributed messaging systems
- Automate the deployment and configuration of the virtualized infrastructure and the entire software stack
- Plan, implement and maintain robust backup and restoration policies ensuring low RTO and RPO
- Support several Linux servers running our SaaS platform stack on AWS, Azure, IBM Cloud, Ali Cloud
- Define and build processes to identify performance bottlenecks and scaling pitfalls
- Manage robust monitoring and alerting infrastructure
- Explore new tools to improve development operations and automate daily tasks
- Ensure High Availability and Auto-failover with minimum or no manual interventions
Requirements:
- Bachelor’s degree in Computer Science, Information Technology or a related field
- 11 to 14 years of experience in designing and maintaining high volume and scalable micro-services architecture on cloud infrastructure
- Strong background in Linux/Unix Administration and Python/Shell Scripting
- Extensive experience working with cloud platforms like AWS (EC2, ELB, S3, Auto-scaling, VPC, Lambda), GCP, Azure
- Experience in deployment automation, Continuous Integration and Continuous Deployment (Jenkins, Maven, Puppet, Chef, GitLab) and monitoring tools like Zabbix, Cloud Watch Monitoring, Nagios
- Knowledge of Java Virtual Machines, Apache Tomcat, Nginx, Apache Kafka, Microservices architecture, Caching mechanisms
- Experience in query analysis, performance tuning, and database redesign
- Experience in enterprise application development, maintenance and operations
- Knowledge of best practices and IT operations in an always-up, always-available service
- Excellent written and oral communication skills, judgment and decision-making skills
- Excellent leadership skills
staging, QA, and development of cloud infrastructures running in 24×7 environments.
● Most of our deployments are in K8s. You will work with the team to run and manage multiple K8s
environments 24/7
● Implement and oversee all aspects of the cloud environment including provisioning, scale,
monitoring, and security.
● Nurture cloud computing expertise internally and externally to drive cloud adoption.
● Implement systems, solutions, and processes needed to manage cloud cost, monitoring, scalability,
and redundancy.
● Ensure all cloud solutions adhere to security and compliance best practices.
● Collaborate with Enterprise Architecture, Data Platform, DevOps, and Integration Teams to ensure
cloud adoption follows standard best practices.
Requirements:
● Bachelor’s degree in Computer Science, Computer Engineering or Information Technology or
equivalent experience.
● Experience with Kubernetes on cloud and deployment technologies such as Helm is a major plus
● Expert-level hands-on experience with AWS (Azure and GCP experience is a big plus)
● 10 or more years of experience.
● Minimum of 5 years’ experience building and supporting cloud solutions
Location: Remote
Job Description :
- Strong hands-on knowledge of Azure DevOps.
- Mandatory skills: Azure DevOps, Docker, Kubernetes
- Skills required: Terraform, Git, Jenkins, CI/CD, Pipelines, YAML, Scripting, Shell Scripting, Python, Gradle, Maven
- Only profiles with developer experience are required; admin-role profiles are not
Role Introduction
• This role involves guiding the DevOps team towards successful delivery of Governance and
toolchain initiatives by removing manual tasks.
• Operate toolchain applications to empower engineering teams by providing reliable, governed
self-service tools and supporting their adoption
• Drive good practice for consumption and utilisation of the engineering toolchain, with a focus
on DevOps practices
• Drive good governance for cloud service consumption
• Work in a collaborative environment, with a focus on leading the team and providing
technical leadership to team members.
• Set up processes and improvements for teams supporting and governing the various DevOps
tooling.
• Co-ordinate with multiple teams within the organization
• Lead on handovers from architecture teams to support major project rollouts which require the
Toolchain governance DevOps team to operationally support tooling
What you will do
• Identify and implement best practices, process improvements and automation initiatives
for quicker delivery by removing manual tasks
• Ensure best practices and processes are documented for reusability, and keep up to date on
good practices and standards.
• Re-usable automation and compliance service, tools and processes
• Support and management of toolchain, toolchain changes and selection
• Identify and implement risk mitigation plans, avoid escalations, resolve blockers for teams.
Toolchain governance will involve operating and responding to alerts, enforcing good tooling
governance by driving automation, remediating technical debt and ensuring the latest tools
are utilised and on the latest versions
• Triage product pipelines, performance issues, SLA/SLO breaches and service unavailability, along
with ancillary actions such as providing access to logs, tools and environments.
• Take part in initial and detailed estimates during roadmap planning, or in feature
estimation/planning of any automation identified for a given toolset.
• Develop, refine, and tune integrations between various tools
• Discuss any implementation or deployment challenges with the Product Owner/team,
assist in arriving at a probable solution, and escalate any risks to get them
resolved w.r.t. the DevOps toolchain.
• In consultation with the Head of DevOps and other stakeholders, prioritize items and break
items down into tasks; accountable for squad deliverables for the sprint
• Review current components, plan for upgrades, and ensure this is communicated
to the wider audience within the organization
• Review access and roles, and enhance and automate provisioning.
• Identify and encourage areas for growth and improvement within the team, e.g. conduct
regular 1-2-1s with squad members to provide support, mentoring and goal setting
• Take part in performance management, rewards and recognition of team members, and in the
hiring process.
• Plan for upskilling the team on tools and tasks. Ensure quicker onboarding
of new joiners/freshers so they become productive.
• Review ticket metrics to measure the health of the project including SLAs and plan for
improvement.
• Requirement for on call for critical incidents that happen Out of Hours, based on tooling SLA.
This may include planning standby schedule for squad, carrying out retrospective for every
callout and reviewing SLIs/SLOs.
• Owns the tech/repair debt, risk and compliance for the tooling with respect to
infrastructure, pipelines, access etc
• Track optimum utilization of resources and monitor/track the delivery schedule
• Review solutions designs with the Architects / Principal DevOps Engineers as required
• Provide monthly reporting which aligns with DevOps Tooling KPIs
What you will have
• Candidate should have 8+ years of experience, with hands-on DevOps experience and
experience in team management.
• Strong communication and interpersonal skills, Team player
• Good working experience of CI/CD tools like Jenkins, SonarQube, FOSSA, Harness, Jira, JSM,
ServiceNow etc.
• Good hands-on knowledge of AWS services like EC2, ECS, S3, IAM, SNS, SQS, VPC, Lambda,
API Gateway, CloudWatch, CloudFormation etc.
• Experience in operating and governing DevOps Toolchain
• Experience in operational monitoring, alerting and identifying and delivering on both repair
and technical debt
• Experience and background in ITIL/ITSM processes. The candidate will ensure development
of the appropriate (ITSM) model and processes, based on the ITIL Service Management
framework. This includes the strategic, design, transition, and operation services and
continuous service improvement
• Provide ITSM leadership experience and coaching processes
• Experience with various tools like Jenkins, Harness, FOSSA
• Experience of hosting and managing applications on AWS/Azure
• Experience in CI/CD pipeline (Jenkins build pipelines)
• Experience in containerization (Docker/Kubernetes)
• Experience in any programming language (Node.js or Python is preferred)
• Experience in Architecting and supporting cloud based products will be a plus
• Experience in PowerShell & Bash will be a plus
• Able to self manage multiple concurrent small projects, including managing priorities
between projects
• Able to quickly learn new tools
• Should be able to mentor/drive junior team members to achieve the desired outcome of the
roadmap
• Ability to analyse information to identify problems and issues, and make effective decisions
within a short time span
• Excellent problem solving and critical thinking
• Experience in integrating various components including unit testing / CI/CD configuration.
• Experience reviewing the current toolset and planning upgrades.
• Experience with Agile framework/Jira/JSM tool.
• Good communication skills and ability to communicate/work independently with external
teams.
• Highly motivated, able to work proficiently both independently and in a team environment
Good knowledge and experience with security constructs –
Bachelor's degree in information security, computer science, or related.
Strong DevOps experience of at least 4 years
Strong Experience in Unix/Linux/Python scripting
Strong networking knowledge; vSphere networking stack knowledge desired.
Experience on Docker and Kubernetes
Experience with cloud technologies (AWS/Azure)
Exposure to Continuous Development Tools such as Jenkins or Spinnaker
Exposure to configuration management systems such as Ansible
Knowledge of resource monitoring systems
Ability to scope and estimate
Strong verbal and communication skills
Advanced knowledge of Docker and Kubernetes.
Exposure to Blockchain as a Service (BaaS) platforms such as Chainstack, IBM Blockchain Platform, Oracle Blockchain Cloud, Rubix, VMware etc.
Capable of provisioning and maintaining local enterprise blockchain platforms for Development and QA (Hyperledger Fabric/BaaS/Corda/ETH).
Intuitive is the fastest growing top-tier Cloud Solutions and Services company supporting global enterprise customers across the Americas, Europe and the Middle East.
Intuitive is looking for highly talented hands-on Cloud Infrastructure Architects to help accelerate our growing Professional Services consulting Cloud & DevOps practice. This is an excellent opportunity to join Intuitive’s global world class technology teams, working with some of the best and brightest engineers while also developing your skills and furthering your career working with some of the largest customers.
Job Description :
- Extensive experience with K8s (EKS/GKE) and K8s ecosystem tooling, e.g., Prometheus, ArgoCD, Grafana, Istio etc.
- Extensive AWS/GCP Core Infrastructure skills
- Infrastructure/IaC automation and integration with Terraform
- Kubernetes resources engineering and management
- Experience with DevOps tools, CI/CD pipelines and release management
- Good at creating documentation (runbooks, design documents, implementation plans)
Linux Experience:
- Namespaces
- Virtualization
- Containers
Networking Experience:
- Virtual networking
- Overlay networks
- VXLAN, GRE
Kubernetes Experience:
Should have experience bringing up a Kubernetes cluster manually, without using the kubeadm tool.
Observability:
Experience in observability is a plus
Cloud automation:
Familiarity with cloud platforms (exclusively AWS) and DevOps tools like Jenkins, Terraform etc.
The AWS Cloud/DevOps Engineer will be working with the engineering team and focusing on AWS infrastructure and automation. A key part of the role is championing and leading infrastructure as code. The Engineer will work closely with the Manager of Operations and DevOps to build, manage and automate our AWS infrastructure.
Duties & Responsibilities:
- Design cloud infrastructure that is secure, scalable, and highly available on AWS
- Work collaboratively with software engineering to define infrastructure and deployment requirements
- Provision, configure and maintain AWS cloud infrastructure defined as code
- Ensure configuration and compliance with configuration management tools
- Administer and troubleshoot Linux based systems
- Troubleshoot problems across a wide array of services and functional areas
- Build and maintain operational tools for deployment, monitoring, and analysis of AWS infrastructure and systems
- Perform infrastructure cost analysis and optimization
Qualifications:
- 1-5 years of experience building and maintaining AWS infrastructure (VPC, EC2, Security Groups, IAM, ECS, CodeDeploy, CloudFront, S3)
- Strong understanding of how to secure AWS environments and meet compliance requirements
- Expertise using Chef for configuration management
- Hands-on experience deploying and managing infrastructure with Terraform
- Solid foundation of networking and Linux administration
- Experience with CI/CD, Docker, GitLab, Jenkins, ELK and deploying applications on AWS
- Ability to learn/use a wide variety of open source technologies and tools
- Strong bias for action and ownership
DevOps Engineer
Job Description:
The position requires a broad set of technical and interpersonal skills, including deployment technologies, monitoring and scripting, from networking to infrastructure. The candidate should be well versed in troubleshooting production issues and able to drive them through to RCA.
Skills:
- Manage VMs across multiple datacenters and AWS to support dev/test and production workloads.
- Strong hands-on experience with Ansible is preferred
- Strong knowledge and hands-on experience in Kubernetes Architecture and administration.
- Should have core knowledge in Linux and System operations.
- Proactively and reactively resolve incidents as escalated from monitoring solutions and end users.
- Conduct and automate audits for network and systems infrastructure.
- Do software deployments, per documented processes, with no impact to customers.
- Follow existing devops processes while having flexibility to create and tweak processes to gain efficiency.
- Troubleshoot connectivity problems across network, systems or applications.
- Follow security guidelines, both policy and technical to protect our customers.
- Ability to automate recurring tasks to increase velocity and quality.
- Should have worked with at least one of these databases (Postgres/Mongo/Cockroach/Cassandra)
- Should have knowledge and hands-on experience in managing ELK clusters.
- Scripting knowledge in Shell/Python is an added advantage.
- Hands-on experience with K8s-based microservice architecture is an added advantage.
Implementing various development, testing, automation tools, and IT infrastructure
Planning the team structure, activities, and involvement in project management activities.
Managing stakeholders and external interfaces
Setting up tools and required infrastructure
Defining and setting development, test, release, update, and support processes for DevOps operation
Have the technical skill to review, verify, and validate the software code developed in the project.
Troubleshooting and fixing code bugs
Monitoring the processes during the entire lifecycle for adherence, and updating or creating new processes to improve them and minimize waste
Encouraging and building automated processes wherever possible
Identifying and deploying cybersecurity measures by continuously performing vulnerability assessment and risk management
Incident management and root cause analysis
Coordination and communication within the team and with customers
Selecting and deploying appropriate CI/CD tools
Strive for continuous improvement and build continuous integration, continuous delivery, and continuous deployment pipelines (CI/CD pipelines)
Mentoring and guiding the team members
Monitoring and measuring customer experience and KPIs
Managing periodic reporting on the progress to the management and the customer
- Perform architectural analysis and know how to design enterprise-level systems.
- Know how to design and simulate tools for the perfect delivery of systems.
- Know how to design, develop, and maintain systems, processes, and procedures to deliver a high-quality service design.
- Work with other team members and other departments to establish healthy communication and information flow.
- Know how to deliver a high-performing solution architecture that can support the development efforts of a business.
- Plan, design, and configure the most typical business solutions as needed.
- Prepare technical documents and other presentations for multiple solution areas.
- Make sure that the best practices for configuration management are carried out as needed.
- Work on customer specifications, analyze them, and make the best product recommendations associated with the platform
Requirements
- AWS Solutions Architect with 9-10 years of experience
- Responsible for managing applications on public cloud (AWS) infrastructure.
- Responsible for larger migrations of applications from VM to cloud/cloud-native.
- Responsible for setting up monitoring for cloud/cloud-native-based infrastructure and applications.
- MUST: AWS Certified Solutions Architect - Professional certification.