
- Dynatrace Expertise: Lead the implementation, configuration, and optimization of Dynatrace monitoring solutions across diverse environments, ensuring maximum efficiency and effectiveness.
- Cloud Integration: Utilize expertise in AWS and Azure to seamlessly integrate Dynatrace monitoring into cloud-based architectures, leveraging PaaS services and IAM roles for efficient monitoring and management.
- Application and Infrastructure Architecture: Design and architect both application and infrastructure landscapes, considering factors like Oracle, SQL Server, Shareplex, Commvault, Windows, Linux, Solaris, SNMP polling, and SNMP traps.
- Cross-Platform Integration: Integrate Dynatrace with various products such as Splunk, APIM, and VMware to provide comprehensive monitoring and analysis capabilities.
- Inter-Account Integration: Develop and implement integration strategies for seamless communication and monitoring across multiple AWS accounts, leveraging Terraform and IAM roles.
- Experience working with on-premises applications and infrastructure
- Experience with AWS and Azure, including cloud certification
- Dynatrace Experience & Certification

About DevOpspatial Pvt Ltd
Job Responsibilities:
- Manage and maintain the efficient functioning of containerized applications and systems within the organization
- Design, implement, and manage scalable Kubernetes clusters in cloud or on-premise environments
- Develop and maintain CI/CD pipelines to automate infrastructure and application deployments, and track all automation processes
- Implement workload automation using configuration management tools, as well as infrastructure as code (IaC) approaches for resource provisioning
- Monitor, troubleshoot, and optimize the performance of Kubernetes clusters and underlying cloud infrastructure
- Ensure high availability, security, and scalability of infrastructure through automation and best practices
- Establish and enforce cloud security standards, policies, and procedures
- Work with agile technologies
Primary Requirements:
- Kubernetes: Proven experience in managing Kubernetes clusters (min. 2-3 years)
- Linux/Unix: Proficiency in administering complex Linux infrastructures and services
- Infrastructure as Code: Hands-on experience with CM tools like Ansible, as well as knowledge of resource provisioning with Terraform or other cloud-based utilities
- CI/CD Pipelines: Expertise in building and monitoring complex CI/CD pipelines to manage the build, test, packaging, containerization, and release processes of software
- Scripting & Automation: Strong scripting and process automation skills in Bash and Python
- Monitoring Tools: Experience with monitoring and logging tools (Prometheus, Grafana)
- Version Control: Proficient with Git and familiar with GitOps workflows.
- Security: Strong understanding of security best practices in cloud and containerized environments.
Skills/Traits that would be an advantage:
- Kubernetes administration experience, including installation, configuration, and troubleshooting
- Kubernetes development experience
- Strong analytical and problem-solving skills
- Excellent communication and interpersonal skills
- Ability to work independently and as part of a team
About the Company:
Gruve is an innovative Software Services startup dedicated to empowering Enterprise Customers in managing their Data Life Cycle. We specialize in Cyber Security, Customer Experience, Infrastructure, and advanced technologies such as Machine Learning and Artificial Intelligence. Our mission is to assist our customers in their business strategies by using their data to make more intelligent decisions. As a well-funded early-stage startup, Gruve offers a dynamic environment with strong customer and partner networks.
Why Gruve:
At Gruve, we foster a culture of innovation, collaboration, and continuous learning. We are committed to building a diverse and inclusive workplace where everyone can thrive and contribute their best work. If you’re passionate about technology and eager to make an impact, we’d love to hear from you.
Gruve is an equal opportunity employer. We welcome applicants from all backgrounds and thank all who apply; however, only those selected for an interview will be contacted.
Position summary:
We are seeking a Staff Engineer – DevOps with 8-12 years of experience in designing, implementing, and optimizing CI/CD pipelines, cloud infrastructure, and automation frameworks. The ideal candidate will have expertise in Kubernetes, Terraform, CI/CD, Security, Observability, and Cloud Platforms (AWS, Azure, GCP). You will play a key role in scaling and securing our infrastructure, improving developer productivity, and ensuring high availability and performance.
Key Roles & Responsibilities:
- Design, implement, and maintain CI/CD pipelines using tools like Jenkins, GitLab CI/CD, ArgoCD, and Tekton.
- Deploy and manage Kubernetes clusters (EKS, AKS, GKE) and containerized workloads.
- Automate infrastructure provisioning using Terraform, Ansible, Pulumi, or CloudFormation.
- Implement observability and monitoring solutions using Prometheus, Grafana, ELK, OpenTelemetry, or Datadog.
- Ensure security best practices in DevOps, including IAM, secrets management, container security, and vulnerability scanning.
- Optimize cloud infrastructure (AWS, Azure, GCP) for performance, cost efficiency, and scalability.
- Develop and manage GitOps workflows and infrastructure-as-code (IaC) automation.
- Implement zero-downtime deployment strategies, including blue-green deployments, canary releases, and feature flags.
- Work closely with development teams to optimize build pipelines, reduce deployment time, and improve system reliability.
Basic Qualifications:
- A bachelor’s or master’s degree in computer science, electronics engineering or a related field
- 8-12 years of experience in DevOps, Site Reliability Engineering (SRE), or Infrastructure Automation.
- Strong expertise in CI/CD pipelines, version control (Git), and release automation.
- Hands-on experience with Kubernetes (EKS, AKS, GKE) and container orchestration.
- Proficiency in Terraform and Ansible for infrastructure automation.
- Experience with AWS, Azure, or GCP services (EC2, S3, IAM, VPC, Lambda, API Gateway, etc.).
- Expertise in monitoring/logging tools such as Prometheus, Grafana, ELK, OpenTelemetry, or Datadog.
- Strong scripting and automation skills in Python, Bash, or Go.
Preferred Qualifications
- Experience in FinOps (Cloud Cost Optimization) and Kubernetes cluster scaling.
- Exposure to serverless architectures and event-driven workflows.
- Contributions to open-source DevOps projects.
Infra360 Solutions is a services company specializing in Cloud, DevSecOps, Security, and Observability solutions. We help technology companies adopt a DevOps culture by focusing on a long-term DevOps roadmap. We identify the technical and cultural issues that stand in the way of implementing DevOps practices successfully, and we work with the respective teams to fix them and increase overall productivity. We also run training sessions that show developers why DevOps matters. Our services include DevOps, DevSecOps, FinOps, Cost Optimization, CI/CD, Observability, Cloud Security, Containerization, Cloud Migration, Site Reliability, Performance Optimization, SIEM and SecOps, Serverless Automation, Well-Architected Review, MLOps, and Governance, Risk & Compliance. We assess the technology architecture, security, governance, compliance, and DevOps maturity of technology companies, and help them optimize cloud costs, streamline their architecture, and set up processes that improve the availability and reliability of their websites and applications. We also set up tools for monitoring, logging, and observability. Throughout, we focus on bringing a DevOps culture to the organization to improve its efficiency and delivery.
Job Description
Our Mission
Our mission is to help customers achieve their business objectives by providing innovative, best-in-class consulting, IT solutions and services and to make it a joy for all stakeholders to work with us. We function as a full stakeholder in business, offering a consulting-led approach with an integrated portfolio of technology-led solutions that encompass the entire Enterprise value chain.
Our Customer-centric Engagement Model defines how we engage with you, offering specialized services and solutions that meet the distinct needs of your business.
Our Culture
Culture forms the core of our foundation, and our effort toward creating an engaging workplace has resulted in Infra360 Solution Pvt Ltd.
Our Tech-Stack:
- Azure DevOps, Azure Kubernetes Service, Docker, Active Directory (Microsoft Entra)
- Azure IAM and managed identity, Virtual Network, VM Scale Set, App Service, Cosmos DB
- Azure MySQL, Scripting (PowerShell, Python, Bash)
- Azure Security, Security Documentation, Security Compliance
- AKS, Blob Storage, Azure Functions, Virtual Machines, Azure SQL
- AWS - IAM, EC2, EKS, Lambda, ECS, Route 53, CloudFormation, CloudFront, S3
- GCP - GKE, Compute Engine, App Engine, SCC
- Kubernetes, Linux, Docker & Microservices Architecture
- Terraform & Terragrunt
- Jenkins & Argo CD
- Ansible, Vault, Vagrant, SaltStack
- CloudFront, Apache, Nginx, Varnish, Akamai
- MySQL, Aurora, Postgres, Amazon Redshift, MongoDB
- Elasticsearch, Redis, Aerospike, Memcached, Solr
- ELK, Fluentd, Elastic APM & Prometheus Grafana Stack
- Java (Spring/Hibernate/JPA/REST), Node.js, Ruby, Rails, Erlang, Python
What does this role hold for you?
- Infrastructure as code (IaC)
- CI/CD and configuration management
- Managing Azure Active Directory (Entra)
- Keeping the cost of the infrastructure to the minimum
- Doing RCA of production issues and providing resolution
- Setting up failover, DR, backups, logging, monitoring, and alerting
- Containerizing different applications on the Kubernetes platform
- Capacity planning for the infrastructure of different environments
- Ensuring zero outages of critical services
- Database administration of SQL and NoSQL databases
- Setting up the right set of security measures
Requirements
Apply if you have…
- A graduate or postgraduate degree in Computer Science or a related field
- 2-4 years of strong DevOps experience on Azure in a Linux environment
- Strong interest in working in our tech stack
- Excellent communication skills
- Ability to work with minimal supervision as a self-starter
- Hands-on experience with at least one scripting language (Bash, Python, Go, etc.)
- Experience with version control systems like Git
- Understanding of Azure cloud computing services and cloud computing delivery models (IaaS, PaaS, and SaaS)
- Strong scripting or programming skills for automating tasks (PowerShell/Bash)
- Knowledge of and experience with CI/CD tools: Azure DevOps, Jenkins, GitLab, etc.
- Knowledge of and experience with at least one IaC tool (ARM templates or Terraform)
- Strong experience managing production systems day to day
- Experience in finding issues in different layers of architecture in a production environment and fixing them
- Experience in automation tools like Ansible/SaltStack and Jenkins
- Experience in Docker/Kubernetes platform and managing OpenStack (desirable)
- Experience with HashiCorp tools such as Vault, Vagrant, Terraform, and Consul, as well as VirtualBox (desirable)
- Experience with monitoring tools like Prometheus/Grafana/Elastic APM
- Experience with logging tools like ELK/Loki
- Experience in using Microsoft Azure Cloud services
If you are passionate about infrastructure and cloud technologies and want to contribute to innovative projects, we encourage you to apply. Infra360 offers a dynamic work environment and opportunities for professional growth.
Interview Process
Application Screening -> Test/Assessment -> 2 Rounds of Tech Interview -> CEO Round -> Final Discussion
Now, more than ever, the Toast team is committed to our customers. We’re taking steps to help restaurants navigate these unprecedented times with technology, resources, and community. Our focus is on building a restaurant platform that helps restaurants adapt, take control, and get back to what they do best: building the businesses they love. And because our technology is purpose-built for restaurants by restaurant people, restaurants can trust that we’ll deliver on their needs for today while investing in experiences that will power their restaurant of the future.
At Toast, our Site Reliability Engineers (SREs) are responsible for keeping all customer-facing services and other Toast production systems running smoothly. SREs are a blend of pragmatic operators and software craftspeople who apply sound software engineering principles, operational discipline, and mature automation to our environments and our codebase. Our decisions are based on instrumentation and continuous observability, as well as predictions and capacity planning.
About this roll* (Responsibilities)
- Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
- Partner with development teams to improve services through rigorous testing and release procedures
- Participate in system design consulting, platform management, and capacity planning
- Create sustainable systems and services through automation and uplift
- Balance feature development speed and reliability with well-defined service level objectives
Troubleshooting and Supporting Escalations:
- Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
- Diagnose performance bottlenecks and implement optimizations across infrastructure, databases, web, and mobile applications
- Implement strategies to increase system reliability and performance through on-call rotation and process optimization
- Perform and run blameless RCAs on incidents and outages aggressively, looking for answers that will prevent the incident from ever happening again
Do you have the right ingredients? (Requirements)
- Extensive industry experience, with at least 7 years in SRE and/or DevOps roles
- Polyglot technologist/generalist with a thirst for learning
- Deep understanding of cloud and microservice architecture and the JVM
- Experience with tools such as APM, Terraform, Ansible, GitHub, Jenkins, and Docker
- Experience developing software or software projects in at least four languages, ideally including two of Go, Python, and Java
- Experience with cloud computing technologies (AWS preferred)
*Bread puns are encouraged but not required
Job Description
• Minimum 3 years of experience in DevOps on the AWS platform
• Strong AWS knowledge and experience
• Experience in using CI/CD automation tools (Git, Jenkins) and configuration deployment tools (Puppet/Chef/Ansible)
• Experience with IaC tools such as Terraform
• Excellent experience in operating a container orchestration cluster (Kubernetes, Docker)
• Significant experience with Linux operating system environments
• Experience with infrastructure scripting solutions such as Python/Shell scripting
• Must have experience in designing an infrastructure automation framework
• Good experience in setting up monitoring tools and dashboards (Grafana/Kafka)
• Excellent problem-solving, log analysis, and troubleshooting skills
• Experience in setting up centralized logging for systems (EKS, EC2) and applications
• Process-oriented with great documentation skills
• Ability to work effectively within a team and with minimal supervision
The DevOps Engineer's core responsibilities include automated configuration and management of infrastructure, and continuous integration and delivery of distributed systems at scale in a hybrid environment.
Must-Have:
● You have 4-10 years of experience in DevOps
● You have experience in managing IT infrastructure at scale
● You have experience in automating the deployment of distributed systems and in infrastructure provisioning at scale
● You have in-depth hands-on experience with Linux and Linux-based systems, and with Linux scripting
● You have experience with server hardware, networking, and firewalls
● You have experience in source code management, configuration management, continuous integration, continuous testing, and continuous monitoring
● You have experience with CI/CD and related tools
● You have experience with monitoring tools like ELK, Grafana, and Prometheus
● You have experience with containerization, container orchestration, and container management
● You have a penchant for solving complex and interesting problems
● You have worked in startup-like environments with high levels of ownership and commitment
● You hold a BTech, MTech, or Ph.D. in Computer Science or a related technical discipline
Specific responsibilities are commensurate with experience and include:
- Ability to react quickly and effectively to identify and resolve issues that heavily impact the CI/CD system (immediate mitigation of impact, and long-term resolution including strategies for risk mitigation, monitoring, and alerting to proactively resolve potential future occurrences)
- Design, develop, unit test, and implement build automation scripts including environment configuration validation processes
- Automate and improve the development process by evaluating and introducing new tools and scripts, and manage their life cycle and validation
- Determine branching strategy and maintain branches for various components, products, and product lines
- Come up with solutions to open-ended problems that focus on workflow improvements for the Software department
- Address issues with well-defined requirements efficiently; come up with short-term and long-term solutions and staged deployment strategies
- Self-driven: takes action to move tickets from start to completion with minimal oversight
- Ability to communicate with and consider perspectives of stakeholders including but not limited to: IT, software development, verification
- Ability to break down a problem into smaller components and solve them in a logical, controlled, clearly explainable approach
- Lead the creation and maintenance of a pre-production environment as a testbed for build process improvements and changes before deployment to the production environment
- Gather metrics via direct input and via analysis of developer working habits and pain points, to assess the current state and areas requiring further improvement
- Define chain of communication and immediate paths of action in the case of a build fault state
- Ability to work within constraints of the internal network without access to commercial cloud solutions
- Create metrics that define ‘efficiency’ and ‘reliability’ in measurable terms, and track them
- Perform static code and security analysis
- Design and execute unit tests and perform code coverage analysis
- Ability to work in an Agile development team environment
Key Requirement & Qualifications:
- Bachelor’s degree (or higher) in Electrical Engineering, Computer Engineering, Computer Science or equivalent
- 6+ years (minimum) experience handling Build, Release, and Deployment of software on Windows and/or Linux environments (on-premise)
- Experience with the development and deployment of CM processes and tools
- Build automation for .NET using TeamCity (Jenkins is an asset)
- Scripting languages: Windows batch scripting, PowerShell, Ant/NAnt
- Source control systems usage, branching strategies, and workflow (Git preferred, Subversion)
- 6+ years of hands-on programming experience with C# and .NET (both Framework and Core)
- Troubleshooting and debugging: knowing what information to gather when there are issues with the CI/CD system, and how to gather it (e.g., analyzing network communication, Windows crash dumps, Java logs)
- 6+ years (minimum) of web/desktop application software development experience
- Excellent problem solving, critical and analytical thinking
- Strong team player who understands SDLC and QA methodologies
- A professional, results-oriented individual with a high degree of self-motivation
- Excellent written and verbal communication skills and the ability to coordinate work/activities with multiple software/IT teams
- Working with virtual machines and build management on virtual machines (VMware preferred).
- Managing configurations for multiple build environments
- OS administration and scripting experience (Windows is a must, Linux desired)
- Experience with test automation tools (NUnit, custom in-house frameworks) and strategies is an asset
- Creation and maintenance of monitoring and alert systems (Zabbix)
- Familiarity with databases (SQL-based) - create, modify, optimize (via script)
- Data and metrics gathering, aggregation, and reporting
- Experience with work management and documentation tools: JIRA and Confluence
Job Location: Jaipur
Experience Required: Minimum 3 years
About the role:
As a DevOps Engineer for Punchh, you will be working with our developers, SRE, and DevOps teams implementing our next-generation infrastructure. We are looking for a self-motivated, responsible team player who loves designing systems that scale. Punchh provides a rich engineering environment where you can be creative, learn new technologies, and solve engineering problems, all while delivering business objectives. The DevOps culture here is one with immense trust and responsibility. You will be given the opportunity to make an impact as there are no silos here.
Responsibilities:
- Deliver SLA and business objectives through whole-lifecycle design of services, from inception to implementation.
- Ensure availability, performance, security, and scalability of AWS production systems.
- Scale our systems and services through continuous integration, infrastructure as code, and gradual refactoring in an agile environment.
- Maintain services once a project is live by monitoring and measuring availability, latency, and overall system and application health.
- Write and maintain software that runs the infrastructure that powers the Loyalty and Data platform for some of the world’s largest brands.
- Participate in a 24x7, shift-based on-call rotation for Level 2 and higher escalations.
- Respond to incidents and write blameless RCAs/postmortems.
- Implement and practice proper security controls and processes.
- Provide recommendations for architecture and process improvements.
- Define and deploy systems for metrics, logging, and monitoring on the platform.
Must have:
- Minimum 3 Years of Experience in DevOps.
- BS degree in Computer Science, Mathematics, Engineering, or equivalent practical experience.
- Strong interpersonal skills.
- Must have experience in CI/CD tooling such as Jenkins, CircleCI, TravisCI
- Must have experience in Docker, Kubernetes, Amazon ECS or Mesos
- Experience in code development in at least one high-level programming language from this list: Python, Ruby, Go, Groovy
- Proficient in shell scripting, and most importantly, know when to stop scripting and start developing.
- Experience in creating highly automated infrastructures with configuration management or provisioning tools like Terraform, CloudFormation, or Ansible.
- In-depth knowledge of the Linux operating system and administration.
- Production experience with a major cloud provider such as AWS.
- Knowledge of web server technologies such as Nginx or Apache.
- Knowledge of Redis, Memcached, or one of the many in-memory data stores.
- Experience with various load balancing technologies such as Amazon ALB/ELB, HAProxy, F5.
- Comfortable with large-scale, highly-available distributed systems.
Good to have:
- Understanding of Web Standards (REST, SOAP APIs, OWASP, HTTP, TLS)
- Production experience with Hashicorp products such as Vault or Consul
- Expertise in designing, analyzing, and troubleshooting large-scale distributed systems.
- Experience in a PCI environment
- Experience with Big Data distributions from Cloudera, MapR, or Hortonworks
- Experience maintaining and scaling database applications
- Knowledge of fundamental systems engineering principles such as CAP Theorem, Concurrency Control, etc.
- Understanding of network fundamentals: OSI, TCP/IP, topologies, etc.
- Understanding of infrastructure auditing, and ability to help the organization control infrastructure costs.
- Experience with Kafka, RabbitMQ, or any other message bus.



