👋🏼 We're Nagarro.
We are a Digital Product Engineering company that is scaling in a big way! We build products, services, and experiences that inspire, excite, and delight. We work at scale across all devices and digital mediums, and our people exist everywhere in the world (19000+ experts across 33 countries, to be exact). Our work culture is dynamic and non-hierarchical. We're looking for great new colleagues. That's where you come in.
REQUIREMENTS:
- Must-have skills: Cloud development (Capable), Microservices architecture (MSA) (Strong), Site Reliability Engineering
- Qualifications:
- Bachelor's degree in computer science or another highly technical, scientific discipline
- 10+ years of experience and a strong background in areas like cloud operations and site reliability engineering
- Hands-on knowledge and experience with any of the major public cloud providers (preferably AWS)
- Good understanding of micro-service architectures and development frameworks; knowledge across tiers in a multi-tier cloud environment including multi-region, multi-zone configurations, load balancers, web servers, application containers, data stores, distributed cache, and content delivery networks
- Hands-on knowledge and experience with observability and monitoring tools like New Relic, Splunk, Prometheus and Grafana
- Ability to program (structured and OO) in one or more high-level languages, such as Python, Go, shell scripting, C/C++, or Java
- Ability to work with query languages to analyze monitoring data and other app-specific transactions
- A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
- An Agile mindset and strong communication skills to collaborate with multiple stakeholders across the organization
RESPONSIBILITIES:
- Participate in functional discussions, system design consulting, platform management, and capacity planning to develop an overall understanding of the product and the team's current priorities
- Partner with development teams to improve services through rigorous testing, release procedures, automation of runbooks, and DR planning
- Build automation scripts for auto-recovery behavior, determine benchmarks for critical services, and prepare custom dashboards to report performance in production
- Understand product functionality to design and build thoughtful experiments to simulate chaos and proactively find faults in the systems
- Work with architects to define service level indicators and create a service catalog
- Gather and analyze SLIs and SLOs from applications, services, and the OS to assist in performance tuning and improve availability and reliability
- Improve monitoring and observability to increase visibility into key metrics like MTTI, MTTR, and MTTD
- Work with the team to understand the root cause of production incidents
- Train and groom engineers to internalize SRE best practices
Similar jobs
Job Summary:
We are seeking a Senior DevOps & SRE Engineer to join our team and help us build, deploy, and maintain our infrastructure and applications. The ideal candidate will have experience working in a fast-paced environment and a strong background in DevOps and Site Reliability Engineering (SRE). You will be responsible for ensuring the reliability, scalability, and security of our applications and infrastructure.
Responsibilities:
- Build and maintain our CI/CD pipeline and deployment automation tools
- Design and implement monitoring and alerting systems to ensure the health of our applications and infrastructure
- Work closely with development teams to ensure that code is deployed in a reliable and scalable manner
- Participate in on-call rotations to provide 24/7 support for our production systems
- Develop and maintain disaster recovery plans and processes
- Continuously improve our infrastructure and processes to ensure scalability, reliability, and security
- Mentor and provide technical leadership to junior team members
- Keep up to date with industry best practices and emerging technologies in DevOps and SRE
Requirements:
- Bachelor’s degree in Computer Science, Engineering, or a related field
- 5+ years of experience in DevOps or SRE
- Strong programming skills in at least one of the following languages: Python, Go, Ruby, or Java
- Experience with infrastructure as code tools such as Terraform or CloudFormation
- Experience with containerization technologies such as Docker and Kubernetes
- Strong understanding of networking concepts such as TCP/IP, DNS, and load balancing
- Experience with monitoring and logging tools such as Prometheus, Grafana, and the ELK stack
- Excellent problem-solving skills and the ability to troubleshoot complex issues in a fast-paced environment
- Strong communication and collaboration skills with both technical and non-technical stakeholders
Preferred Qualifications:
- Experience with cloud providers such as AWS or Azure
- Experience with building and maintaining large-scale distributed systems
- Experience with database technologies such as MySQL, PostgreSQL, or MongoDB
- Experience with automation tools such as Ansible or Chef
- Experience with Agile development methodologies such as Scrum or Kanban
If you are passionate about DevOps and SRE and have the skills and experience we are looking for, we encourage you to apply for this exciting opportunity.