
Please Apply - https://zrec.in/7EYKe?source=CareerSite
About Us
Infra360 Solutions is a services company specializing in Cloud, DevSecOps, Security, and Observability solutions. We help technology companies adopt a DevOps culture by focusing on a long-term DevOps roadmap. We identify the technical and cultural issues that stand in the way of implementing DevOps practices, and we work with the respective teams to fix them and increase overall productivity. We also run training sessions that show developers why DevOps matters. Our services include DevOps, DevSecOps, FinOps, Cost Optimization, CI/CD, Observability, Cloud Security, Containerization, Cloud Migration, Site Reliability, Performance Optimization, SIEM and SecOps, Serverless automation, Well-Architected Review, MLOps, and Governance, Risk & Compliance. We assess technology architecture, security, governance, compliance, and DevOps maturity for technology companies and help them optimize cloud cost, streamline their architecture, and set up processes that improve the availability and reliability of their websites and applications. We set up tools for monitoring, logging, and observability. We focus on bringing the DevOps culture to the organization to improve its efficiency and delivery.
Job Description
Job Title: Senior DevOps Engineer / SRE
Department: Technology
Location: Gurgaon
Work Mode: On-site
Working Hours: 10 AM - 7 PM
Terms: Permanent
Experience: 4-6 years
Education: B.Tech/MCA
Notice Period: Immediate
About Us
At Infra360.io, we are a next-generation cloud consulting and services company committed to delivering comprehensive, 360-degree solutions for cloud, infrastructure, DevOps, and security. We partner with clients to transform and optimize their technology landscape, ensuring resilience, scalability, cost efficiency and innovation.
Our core services include Cloud Strategy, Site Reliability Engineering (SRE), DevOps, Cloud Security Posture Management (CSPM), and related Managed Services. We specialize in driving operational excellence across multi-cloud environments, helping businesses achieve their goals with agility and reliability.
We thrive on ownership, collaboration, problem-solving, and excellence, fostering an environment where innovation and continuous learning are at the forefront. Join us as we expand and redefine what’s possible in cloud technology and infrastructure.
Role Summary
We are seeking a Senior DevOps Engineer (SRE) to manage and optimize large-scale, mission-critical production systems. The ideal candidate will have a strong problem-solving mindset, extensive experience in troubleshooting, and expertise in scaling, automating, and enhancing system reliability. This role requires hands-on proficiency in tools like Kubernetes, Terraform, CI/CD, and cloud platforms (AWS, GCP, Azure), along with scripting skills in Python or Go. The candidate will drive observability and monitoring initiatives using tools like Prometheus, Grafana, and APM solutions (Datadog, New Relic, OpenTelemetry).
Strong communication, incident management skills, and a collaborative approach are essential. Experience in team leadership and multi-client engagement is a plus.
Ideal Candidate Profile
- 4-6 years of solid experience in SRE and DevOps roles, with a proven track record of handling large-scale production environments
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field
- Strong hands-on experience managing large-scale production systems
- Strong production troubleshooting skills and the ability to handle high-pressure situations
- Strong experience with databases and data systems (PostgreSQL, MongoDB, Elasticsearch, Kafka)
- Experience making production systems more scalable, highly available, and fault-tolerant
- Hands-on experience with ELK or other logging and observability tools
- Hands-on experience with Prometheus, Grafana, and Alertmanager, and with on-call tools like PagerDuty
- Problem-Solving Mindset
- Strong skills in Kubernetes (K8s), Terraform, Helm, ArgoCD, and AWS/GCP/Azure
- Good at scripting and automation with Python or Go
- Strong fundamentals in DNS, networking, and Linux
- Experience with APM tools such as New Relic, Datadog, and OpenTelemetry
- Good experience with incident response, incident management, and writing detailed RCAs
- Experience with application best practices that make apps more reliable and fault-tolerant
- Strong leadership skills and the ability to mentor team members and provide guidance on best practices.
- Able to manage multiple clients and take ownership of client issues.
- Experience with Git and coding best practices
Good to have
- Team-leading Experience
- Multiple Client Handling
- Requirements gathering from clients
- Good Communication
Key Responsibilities
- Design and Development:
  - Architect, design, and develop high-quality, scalable, and secure cloud-based software solutions.
  - Collaborate with product and engineering teams to translate business requirements into technical specifications.
  - Write clean, maintainable, and efficient code, following best practices and coding standards.
- Cloud Infrastructure:
  - Develop and optimise cloud-native applications, leveraging cloud services like AWS, Azure, or Google Cloud Platform (GCP).
  - Implement and manage CI/CD pipelines for automated deployment and testing.
  - Ensure the security, reliability, and performance of cloud infrastructure.
- Technical Leadership:
  - Mentor and guide junior engineers, providing technical leadership and fostering a collaborative team environment.
  - Participate in code reviews, ensuring adherence to best practices and high-quality code delivery.
  - Lead technical discussions and contribute to architectural decisions.
- Problem Solving and Troubleshooting:
  - Identify, diagnose, and resolve complex software and infrastructure issues.
  - Perform root cause analysis for production incidents and implement preventative measures.
- Continuous Improvement:
  - Stay up-to-date with the latest industry trends, tools, and technologies in cloud computing and software engineering.
  - Contribute to the continuous improvement of development processes, tools, and methodologies.
  - Drive innovation by experimenting with new technologies and solutions to enhance the platform.
- Collaboration:
  - Work closely with DevOps, QA, and other teams to ensure smooth integration and delivery of software releases.
  - Communicate effectively with stakeholders, including technical and non-technical team members.
- Client Interaction & Management:
  - Serve as a direct point of contact for multiple clients.
  - Handle the unique technical needs and challenges of two or more clients concurrently.
  - Combine direct client interaction with internal team coordination.
- Production Systems Management:
  - Bring extensive experience in managing, monitoring, and debugging production environments.
  - Troubleshoot complex issues and keep production systems running smoothly with minimal downtime.

About Infra360 Solutions Pvt Ltd
About
At Infra360.io, we are a next-generation cloud consulting and services company committed to delivering comprehensive, 360-degree solutions for Cloud, Infrastructure, DevOps, MLOps and Security. We partner with clients to modernize and optimize their cloud, ensuring resilience, scalability, cost efficiency and innovation.
Similar jobs
Key Responsibilities
DevOps Strategy & Leadership
- Define and execute the end-to-end DevOps strategy for high-frequency trading and fintech platforms.
- Lead, mentor, and scale a high-performing DevOps team focused on automation, reliability, and performance.
- Partner closely with engineering and product leaders to ensure infrastructure strategy supports business and technical goals.
CI/CD & Infrastructure Automation
- Architect, implement, and optimize enterprise-grade CI/CD pipelines for ultra-low-latency trading systems.
- Drive Infrastructure as Code (IaC) adoption using Terraform, Helm, Kubernetes, and advanced automation toolsets.
- Establish robust release management, deployment workflows, and versioning best practices for mission‑critical environments.
Cloud & On‑Prem Infrastructure Management
- Design and manage hybrid infrastructures across AWS, GCP, and on-premise data centers ensuring high availability and fault tolerance.
- Implement sophisticated networking strategies for low-latency workloads including routing optimization and performance tuning.
- Lead multi‑cloud scalability, cost optimization, and environment standardization initiatives.
Performance Monitoring & Optimization
- Oversee large-scale monitoring systems using Prometheus, Grafana, ELK, and related observability tools.
- Implement predictive alerting, automated remediation, and system‑wide health checks for zero‑downtime operations.
- Conduct root-cause analyses and performance tuning for systems processing millions of transactions per second.
Security & Compliance
- Champion DevSecOps practices and embed security across the entire development and deployment lifecycle.
- Ensure adherence to financial regulatory standards (SEBI and global frameworks) with strong audit and compliance mechanisms.
- Lead security automation efforts, vulnerability management, and advanced IAM policy implementation.
Required Skills & Qualifications
- 10+ years of DevOps experience, with 5+ years in a leadership capacity.
- Deep hands-on expertise in CI/CD tools such as Jenkins, GitLab CI/CD, and ArgoCD.
- Strong command of AWS, GCP, and hybrid cloud infrastructures.
- Expert-level knowledge of Kubernetes, Docker, and large-scale container orchestration.
- Advanced proficiency in Terraform, Helm, and overall IaC workflows.
- Strong Linux administration, networking fundamentals (TCP/IP, DNS, Firewalls), and system internals.
- Experience with monitoring and observability platforms (Prometheus, Grafana, ELK).
- Excellent scripting skills in Python, Bash, or Go for automation and tooling.
- Deep understanding of security principles, encryption, IAM, and compliance frameworks.
Good to Have
- Experience with ultra-low-latency or high-frequency trading systems.
- Knowledge of FIX protocol, FPGA acceleration, or network‑level optimizations.
- Familiarity with Redis, Nginx, or other high‑throughput systems.
- Exposure to micro‑second‑level performance tuning or network acceleration technologies.
Why Join Us?
- Be part of a team that consistently raises the bar and delivers exceptional engineering outcomes.
- A culture where innovation, ownership, and bold thinking are valued.
- Exceptional growth opportunities—ideal for someone who thrives in fast-paced, high-impact environments.
- Build systems that influence markets and redefine the fintech landscape.
This isn’t just a role—it’s a challenge, a platform, and a proving ground.
Ready to step up? Apply now.
Role: Senior Platform Engineer (GCP Cloud)
Experience Level: 3 to 6 Years
Work location: Mumbai
Mode: Hybrid
Role & Responsibilities:
- Build automation software for cloud platforms and applications
- Drive Infrastructure as Code (IaC) adoption
- Design self-service, self-healing monitoring and alerting tools
- Automate CI/CD pipelines (Git, Jenkins, SonarQube, Docker)
- Build Kubernetes container platforms
- Introduce new cloud technologies for business innovation
Requirements:
- Hands-on experience with GCP Cloud
- Knowledge of cloud services (compute, storage, network, messaging)
- IaC tools experience (Terraform/CloudFormation)
- SQL & NoSQL databases (Postgres, Cassandra)
- Automation tools (Puppet/Chef/Ansible)
- Strong Linux administration skills
- Programming: Bash/Python/Java/Scala
- CI/CD pipeline expertise (Jenkins, Git, Maven)
- Multi-region deployment experience
- Agile/Scrum/DevOps methodology
Now, more than ever, the Toast team is committed to our customers. We’re taking steps to help restaurants navigate these unprecedented times with technology, resources, and community. Our focus is on building a restaurant platform that helps restaurants adapt, take control, and get back to what they do best: building the businesses they love. And because our technology is purpose-built for restaurants by restaurant people, restaurants can trust that we’ll deliver on their needs for today while investing in experiences that will power their restaurant of the future.
At Toast, our Site Reliability Engineers (SREs) are responsible for keeping all customer-facing services and other Toast production systems running smoothly. SREs are a blend of pragmatic operators and software craftspeople who apply sound software engineering principles, operational discipline, and mature automation to our environments and our codebase. Our decisions are based on instrumentation and continuous observability, as well as predictions and capacity planning.
About this roll* (Responsibilities)
- Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
- Partner with development teams to improve services through rigorous testing and release procedures
- Participate in system design consulting, platform management, and capacity planning
- Create sustainable systems and services through automation and uplift
- Balance feature development speed and reliability with well-defined service level objectives
Troubleshooting and Supporting Escalations:
- Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
- Diagnose performance bottlenecks and implement optimizations across infrastructure, databases, web, and mobile applications
- Implement strategies to increase system reliability and performance through on-call rotation and process optimization
- Perform and run blameless RCAs on incidents and outages aggressively, looking for answers that will prevent the incident from ever happening again
Do you have the right ingredients? (Requirements)
- Extensive industry experience, with 7+ years in SRE and/or DevOps roles
- Polyglot technologist/generalist with a thirst for learning
- Deep understanding of cloud and microservice architecture and the JVM
- Experience with tools such as APM, Terraform, Ansible, GitHub, Jenkins, and Docker
- Experience developing software or software projects in at least four languages, ideally including two of Go, Python, and Java
- Experience with cloud computing technologies (AWS preferred)
*Bread puns are encouraged but not required
About us:
HappyFox is a software-as-a-service (SaaS) support platform. We offer an enterprise-grade help desk ticketing system and intuitively designed live chat software.
We serve over 12,000 companies in 70+ countries. HappyFox is used by companies that span across education, media, e-commerce, retail, information technology, manufacturing, non-profit, government and many other verticals that have an internal or external support function.
To know more, visit https://www.happyfox.com/
Responsibilities:
- Build and scale production infrastructure in AWS for the HappyFox platform and its products.
- Research, build, and implement systems, services, and tooling to improve the uptime, reliability, and maintainability of our backend infrastructure, and to meet our internal SLOs and customer-facing SLAs.
- Manage and patch servers running Unix-based operating systems like Ubuntu Linux.
- Write automation scripts and build infrastructure tools using Python/Ruby/Bash/Golang.
- Implement consistent observability, deployment and IaC setups
- Patch production systems to fix security/performance issues
- Actively respond to escalations/incidents in the production environment from customers or the support team
- Mentor other Infrastructure engineers, review their work and continuously ship improvements to production infrastructure.
- Build and manage development infrastructure, and CI/CD pipelines for our teams to ship & test code faster.
- Participate in infrastructure security audits
Requirements:
- At least 5 years of experience in handling/building Production environments in AWS.
- At least 2 years of programming experience in building API/backend services for customer-facing applications in production.
- Demonstrable knowledge of TCP/IP, HTTP and DNS fundamentals.
- Experience in deploying and managing production Python/NodeJS/Golang applications to AWS EC2, ECS or EKS.
- Proficient in containerised environments such as Docker, Docker Compose, Kubernetes
- Proficient in managing/patching servers with Unix-based operating systems like Ubuntu Linux.
- Proficient in writing automation scripts using any scripting language such as Python, Ruby, or Bash.
- Experience in setting up and managing test/staging environments, and CI/CD pipelines.
- Experience in IaC tools such as Terraform or AWS CDK
- Passion for making systems reliable, maintainable, scalable and secure.
- Excellent verbal and written communication skills to address, escalate and express technical ideas clearly
- Bonus points – if you have experience with Nginx, Postgres, Redis, and Mongo systems in production.
Come Dive In
The DevOps Engineer will implement the tools and processes that enable DevOps.
Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement, to efficiently deliver high-quality solutions. The candidate should bridge the gap between Development and Operations teams, working with the development teams to meet acceptance criteria and to gather and document requirements. Candidates should be able to work in fast-paced, multi-disciplinary environments.
As a DevOps Engineer, You Will
● Work in a dynamic, agile team environment developing excellent new applications.
● Participate in design decisions, including new technology research and prototyping
● Collaborate closely with other AWS engineers and architects, cloud engineers, support teams, and other stakeholders
● Promote great Kubernetes and AWS platform design and quality
● Innovate new ideas to evolve our applications and processes
● Continuously analyze and evaluate our systems, products, and methods for potential improvements.
Mandatory Skills:
● Experience on Linux based infrastructure
● Experience with Amazon ECS
● Hands-on experience with containerized services
● Must know AWS CI/CD pipelines
● Must know DevOps concepts and Agile principles
● Knowledge of Git, Docker, and Jenkins
● Knowledge of Infrastructure as Code.
● Experience in using Automation Tools
● Must have experience setting up a test-driven development (TDD) environment
● Working knowledge of Docker and Kubernetes
We recognize that asking you to give 100% of yourself daily requires us to show you the love.
PERKS: what can we offer you?
● Bi-Yearly performance audits and appraisals
● The flexibility of working days/hours
● 5 working days/week (Mon to Fri), with additional pay for working Saturdays
● Recognition and Appreciation
● A plethora of industry exposure and self-growth opportunities
Visit our site: www.cedcoss.com
Our Client is an IT infrastructure services company, focused and specialized in delivering solutions and services on Microsoft products and technologies. They are a Microsoft partner and cloud solution provider. Our Client's objective is to help small, mid-sized as well as global enterprises to transform their business by using innovation in IT, adapting to the latest technologies and using IT as an enabler for business to meet business goals and continuous growth.
With focused and experienced management and a strong team of IT Infrastructure professionals, they are adding value by making IT Infrastructure a robust, agile, secure and cost-effective service to the business. As an independent IT Infrastructure company, they provide their clients with unbiased advice on how to successfully implement and manage technology to complement their business requirements.
- Providing on-call support within a high availability production environment
- Logging issues
- Providing complex problem analysis and resolution for technical and application issues
- Supporting and collaborating with team members
- Running system updates
- Monitoring and responding to system alerts
- Developing and running system health checks
- Applying industry standard practices across the technology estate
- Performing system reviews
- Reviewing and maintaining infrastructure configuration
- Diagnosing performance issues and network bottlenecks
- Collaborating within geographically distributed teams
- Supporting software development infrastructure by continuous integration and delivery standards
- Working closely with developers and QA teams as part of a customer support centre
- Projecting delivery work, either individually or in conjunction with other teams, external suppliers or contractors
- Ensuring maintenance of the technical environments to meet current standards
- Ensuring compliance with appropriate industry and security regulations
- Providing support to Development and Customer Support teams
- Managing the hosted infrastructure through vendor engagement
- Managing 3rd party software licensing ensuring compliance
- Delivering new technologies as agreed by the business
What you need to have:
- Experience working in a technical operations environment relevant to the skills listed below.
- Be proficient in:
  - Linux, zsh/bash/similar
  - ssh, tmux/screen/similar
  - vim/emacs/similar
  - Computer networking
- Have a reasonable working knowledge of:
  - Cloud infrastructure, preferably GCP
  - One or more programming/scripting languages
  - Git
  - Docker
  - Web services and web servers
  - Databases, relational and NoSQL
- Some familiarity with:
  - Puppet, Ansible
  - Terraform
  - GitHub, CircleCI, Kubernetes
  - Shell scripting
  - Databases: Cassandra, Postgres, MySQL or CloudSQL
  - Agile working practices including Scrum and Kanban
  - Private & public cloud hosting environments
- Strong technology interests with a positive ‘can do’ attitude
- Be flexible and adaptable to changing priorities
- Be good at planning and organising their own time and able to meet targets and deadlines without supervision
- Excellent written and verbal communication skills.
- Approachable with both colleagues and team members
- Be resourceful and practical with an ability to respond positively and quickly to technical and business challenges
- Be persuasive, articulate and influential, but down to earth and friendly with own team and colleagues
- Have an ability to establish relationships quickly and to work effectively either as part of a team or singularly
- Be customer focused with both internal and external customers
- Be capable of remaining calm under pressure
- Technically minded, with good problem resolution skills and a systematic manner
- Excellent documentation skills
- Prepared to participate in an out-of-hours support rota
One of our US-based clients is looking for a DevOps professional who can handle both technical work and training for them in the US.
If hired, you will be sent to the US and work from there. The split between training and technical work will be 70% and 30% respectively.
The company will sponsor the US visa.
If you are an experienced DevOps professional who has also delivered professional training, feel free to connect with us to learn more.
Implement integrations requested by customers
Deploy updates and fixes
Provide Level 2 technical support
Build tools to reduce occurrences of errors and improve customer experience
Develop software to integrate with internal back-end systems
Perform root cause analysis for production errors
Investigate and resolve technical issues
Develop scripts to automate visualization
Design procedures for system troubleshooting and maintenance
Hands-on experience with multiple clouds (AWS/Azure/GCP)
Good experience with Docker implementation at scale
Kubernetes implementation and orchestration
About the company:
Tathastu, the next-generation innovation labs is Future Group’s initiative to provide a new-age retail experience - combining the physical with digital and enhancing it with data. We are creating next-generation consumer interactions by combining AI/ML, Data Science, and emerging technologies with consumer platforms.
The E-Commerce vertical under Tathastu has developed online consumer platforms for Future Group’s portfolio of retail brands: Easy day, Big Bazaar, Central, Brand factory, aLL, Clarks, and Coverstory. Backed by our network of offline stores, we have built a new retail platform that merges our online and offline retail streams. We use data to power all our decisions across our products and build internal tools to help us scale our impact with a small, close-knit team.
Our widespread store network, robust logistics, and technology capabilities have made it possible to launch a ‘2-Hour Delivery Promise’ on every product across fashion, food, FMCG, and home products for orders placed online through the Big Bazaar mobile app and portal. This makes Big Bazaar the first retailer in the country to offer instant home delivery on almost every consumer product ordered online.
Job Responsibilities:
- You’ll streamline and automate the software development and infrastructure management processes and play a crucial role in executing high-impact initiatives and continuously improving processes to increase the effectiveness of our platforms.
- You’ll translate complex use cases into discrete technical solutions in platform architecture, design and coding, functionality, usability, and optimization.
- You will drive automation in repetitive tasks, configuration management, and deliver comprehensive automated tests to debug/troubleshoot Cloud AWS-based systems and BigData applications.
- You’ll continuously discover, evaluate, and implement new technologies to maximize the development and operational efficiency of the platforms.
- You’ll determine the metrics that will define technical and operational success and constantly track such metrics to fine-tune the technology stack of the organization.
Experience: 4 to 8 Yrs
Qualification: B.Tech / MCA
Required Skills:
- Experience with Linux/UNIX systems administration and Amazon Web Services (AWS).
- Infrastructure as Code (Terraform), Kubernetes and container orchestration, web servers (Nginx, Apache), application servers (Tomcat, Node.js, ...), document stores and relational databases (AWS RDS-MySQL).
- Site Reliability Engineering patterns and visibility/performance/availability monitoring (CloudWatch, Prometheus)
- Background in and happy to work hands-on with technical troubleshooting and performance tuning.
- Supportive and collaborative personality - ability to influence and drive progress with your peers
Our Technology Stack:
- Docker/Kubernetes
- Cloud (AWS)
- Python/GoLang Programming
- Microservices
- Automation Tools