
Hiring for the below position with one of our premium client
Role: Senior DevOps Engineer
Exp:7+ years
Location: Chennai
Key skills: DevOps, Cloud, Python scripting
Description:
Strong analytical and problem-solving skills
Ability to work independently, learn quickly and be proactive
7-9 years overall and at least 3-4 years of hands-on experience in designing and managing DevOps Cloud infrastructure
Experience must include a combination of:
o Experience working with configuration management tools – Ansible, Chef, Puppet, SaltStack (expertise in at least one tool is a must)
o Ability to write and maintain code in at least one scripting language (Python preferred)
o Practical knowledge of shell scripting
o Cloud knowledge – AWS, VMware vSphere
o Good understanding and familiarity with Linux o Networking knowledge – Firewalls, VPNs, Load Balancers o Web/Application servers, Nginx, JVM environments
o Virtualization and containers - Xen, KVM, Qemu, Docker, Kubernetes, etc.
o Familiarity with logging systems - Logstash, Elasticsearch, Kibana o Git, Jenkins, Jira
If interested kindly apply!

Similar jobs

Senior Cloud Engineer Job Description
Position Title: Senior Cloud Engineer -- AWS [LONG TERM-CONTRACT POSITION]
Location: Remote [REQUIRES WORKING IN CST TIME ZONE]
Position Overview
The Senior Cloud Engineer will play a critical role in designing, deploying, and managing scalable, secure, and highly available cloud infrastructure across multiple platforms (AWS, Azure, Google Cloud). This role requires deep technical expertise, leadership in cloud
strategy, and hands-on experience with automation, DevOps practices, and cloud-native technologies. The ideal candidate will work collaboratively with cross-functional teams to deliver robust cloud solutions, drive best practices, and support business objectives
through innovative cloud engineering.
Key Responsibilities
Design, implement, and maintain cloud infrastructure and services, ensuring high availability, performance, and security across multi-cloud environments (AWS, Azure, GCP)
Develop and manage Infrastructure as Code (IaC) using tools such as Terraform, CloudFormation, and Ansible for automated provisioning and configuration
Lead the adoption and optimization of DevOps methodologies, including CI/CD pipelines, automated testing, and deployment processes
Collaborate with software engineers, architects, and stakeholders to architect cloud-native solutions that meet business and technical requirements
Monitor, troubleshoot, and optimize cloud systems for cost, performance, and reliability, using cloud monitoring and logging tools
Ensure cloud environments adhere to security best practices, compliance standards, and governance policies, including identity and access management, encryption, and vulnerability management
Mentor and guide junior engineers, sharing knowledge and fostering a culture of continuous improvement and innovation
Participate in on-call rotation and provide escalation support for critical cloud infrastructure issues
Document cloud architectures, processes, and procedures to ensure knowledge transfer and operational excellence
Stay current with emerging cloud technologies, trends, and best practices,
Required Qualifications
- Bachelors or Masters degree in Computer Science, Engineering, Information Systems, or a related field, or equivalent work experience
- 6–10 years of experience in cloud engineering or related roles, with a proven track record in large-scale cloud environments
- Deep expertise in at least one major cloud platform (AWS, Azure, Google Cloud) and experience in multi-cloud environments
- Strong programming and scripting skills (Python, Bash, PowerShell, etc.) for automation and cloud service integration
- Proficiency with DevOps tools and practices, including CI/CD (Jenkins, GitLab CI), containerization (Docker, Kubernetes), and configuration management (Ansible, Chef)
- Solid understanding of networking concepts (VPC, VPN, DNS, firewalls, load balancers), system administration (Linux/Windows), and cloud storage solutions
- Experience with cloud security, governance, and compliance frameworks
- Excellent analytical, troubleshooting, and root cause analysis skills
- Strong communication and collaboration abilities, with experience working in agile, interdisciplinary teams
- Ability to work independently, manage multiple priorities, and lead complex projects to completion
Preferred Qualifications
- Relevant cloud certifications (e.g., AWS Certified Solutions Architect, AWS DevOps Engineer, Microsoft AZ-300/400/500, Google Professional Cloud Architect)
- Experience with cloud cost optimization and FinOps practices
- Familiarity with monitoring/logging tools (CloudWatch, Kibana, Logstash, Datadog, etc.)
- Exposure to cloud database technologies (SQL, NoSQL, managed database services)
- Knowledge of cloud migration strategies and hybrid cloud architectures
Please Apply - https://zrec.in/7EYKe?source=CareerSite
About Us
Infra360 Solutions is a services company specializing in Cloud, DevSecOps, Security, and Observability solutions. We help technology companies adapt DevOps culture in their organization by focusing on long-term DevOps roadmap. We focus on identifying technical and cultural issues in the journey of successfully implementing the DevOps practices in the organization and work with respective teams to fix issues to increase overall productivity. We also do training sessions for the developers and make them realize the importance of DevOps. We provide these services - DevOps, DevSecOps, FinOps, Cost Optimizations, CI/CD, Observability, Cloud Security, Containerization, Cloud Migration, Site Reliability, Performance Optimizations, SIEM and SecOps, Serverless automation, Well-Architected Review, MLOps, Governance, Risk & Compliance. We do assessments of technology architecture, security, governance, compliance, and DevOps maturity model for any technology company and help them optimize their cloud cost, streamline their technology architecture, and set up processes to improve the availability and reliability of their website and applications. We set up tools for monitoring, logging, and observability. We focus on bringing the DevOps culture to the organization to improve its efficiency and delivery.
Job Description
Job Title: Senior DevOps Engineer / SRE
Department: Technology
Location: Gurgaon
Work Mode: On-site
Working Hours: 10 AM - 7 PM
Terms: Permanent
Experience: 4-6 years
Education: B.Tech/MCA
Notice Period: Immediately
About Us
At Infra360.io, we are a next-generation cloud consulting and services company committed to delivering comprehensive, 360-degree solutions for cloud, infrastructure, DevOps, and security. We partner with clients to transform and optimize their technology landscape, ensuring resilience, scalability, cost efficiency and innovation.
Our core services include Cloud Strategy, Site Reliability Engineering (SRE), DevOps, Cloud Security Posture Management (CSPM), and related Managed Services. We specialize in driving operational excellence across multi-cloud environments, helping businesses achieve their goals with agility and reliability.
We thrive on ownership, collaboration, problem-solving, and excellence, fostering an environment where innovation and continuous learning are at the forefront. Join us as we expand and redefine what’s possible in cloud technology and infrastructure.
Role Summary
We are seeking a Senior DevOps Engineer (SRE) to manage and optimize large-scale, mission-critical production systems. The ideal candidate will have a strong problem-solving mindset, extensive experience in troubleshooting, and expertise in scaling, automating, and enhancing system reliability. This role requires hands-on proficiency in tools like Kubernetes, Terraform, CI/CD, and cloud platforms (AWS, GCP, Azure), along with scripting skills in Python or Go. The candidate will drive observability and monitoring initiatives using tools like Prometheus, Grafana, and APM solutions (Datadog, New Relic, OpenTelemetry).
Strong communication, incident management skills, and a collaborative approach are essential. Experience in team leadership and multi-client engagement is a plus.
Ideal Candidate Profile
- Solid 4-6 years of experience as an SRE and DevOps with a proven track record of handling large-scale production environments
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field
- Strong Hands-on experience with managing Large Scale Production Systems
- Strong Production Troubleshooting Skills and handling high-pressure situations.
- Strong Experience with Databases (PostgreSQL, MongoDB, ElasticSearch, Kafka)
- Worked on making production systems more Scalable, Highly Available and Fault-tolerant
- Hands-on experience with ELK or other logging and observability tools
- Hands-on experience with Prometheus, Grafana & Alertmanager and on-call processes like Pagerduty
- Problem-Solving Mindset
- Strong with skills - K8s, Terraform, Helm, ArgoCD, AWS/GCP/Azure etc
- Good with Python/Go Scripting Automation
- Strong with fundamentals like DNS, Networking, Linux
- Experience with APM tools like - Newrelic, Datadog, OpenTelemetry
- Good experience with Incident Response, Incident Management, Writing detailed RCAs
- Experience with Applications best practices in making apps more reliable and fault-tolerant
- Strong leadership skills and the ability to mentor team members and provide guidance on best practices.
- Able to manage multiple clients and take ownership of client issues.
- Experience with Git and coding best practices
Good to have
- Team-leading Experience
- Multiple Client Handling
- Requirements gathering from clients
- Good Communication
Key Responsibilities
- Design and Development:
- Architect, design, and develop high-quality, scalable, and secure cloud-based software solutions.
- Collaborate with product and engineering teams to translate business requirements into technical specifications.
- Write clean, maintainable, and efficient code, following best practices and coding standards.
- Cloud Infrastructure:
- Develop and optimise cloud-native applications, leveraging cloud services like AWS, Azure, or Google Cloud Platform (GCP).
- Implement and manage CI/CD pipelines for automated deployment and testing.
- Ensure the security, reliability, and performance of cloud infrastructure.
- Technical Leadership:
- Mentor and guide junior engineers, providing technical leadership and fostering a collaborative team environment.
- Participate in code reviews, ensuring adherence to best practices and high-quality code delivery.
- Lead technical discussions and contribute to architectural decisions.
- Problem Solving and Troubleshooting:
- Identify, diagnose, and resolve complex software and infrastructure issues.
- Perform root cause analysis for production incidents and implement preventative measures.
- Continuous Improvement:
- Stay up-to-date with the latest industry trends, tools, and technologies in cloud computing and software engineering.
- Contribute to the continuous improvement of development processes, tools, and methodologies.
- Drive innovation by experimenting with new technologies and solutions to enhance the platform.
- Collaboration:
- Work closely with DevOps, QA, and other teams to ensure smooth integration and delivery of software releases.
- Communicate effectively with stakeholders, including technical and non-technical team members.
- Client Interaction & Management:
- Will serve as a direct point of contact for multiple clients.
- Able to handle the unique technical needs and challenges of two or more clients concurrently.
- Involve both direct interaction with clients and internal team coordination.
- Production Systems Management:
- Must have extensive experience in managing, monitoring, and debugging production environments.
- Will work on troubleshooting complex issues and ensure that production systems are running smoothly with minimal downtime.
Responsibility :
- Install, configure, and maintain Kubernetes clusters.
- Develop Kubernetes-based solutions.
- Improve Kubernetes infrastructure.
- Work with other engineers to troubleshoot Kubernetes issues.
Kubernetes Engineer Requirements & Skills
- Kubernetes administration experience, including installation, configuration, and troubleshooting
- Kubernetes development experience
- Linux/Unix experience
- Strong analytical and problem-solving skills
- Excellent communication and interpersonal skills
- Ability to work independently and as part of a team
The candidate must have 2-3 years of experience in the domain. The responsibilities include:
● Deploying system on Linux-based environment using Docker
● Manage & maintain the production environment
● Deploy updates and fixes
● Provide Level 1 technical support
● Build tools to reduce occurrences of errors and improve customer experience
● Develop software to integrate with internal back-end systems
● Perform root cause analysis for production errors
● Investigate and resolve technical issues
● Develop scripts to automate visualization
● Design procedures for system troubleshooting and maintenance
● Experience working on Linux-based infrastructure
● Excellent understanding of MERN Stack, Docker & Nginx (Good to have Node Js)
● Configuration and managing databases such as Mongo
● Excellent troubleshooting
● Experience of working with AWS/Azure/GCP
● Working knowledge of various tools, open-source technologies, and cloud services
● Awareness of critical concepts in DevOps and Agile principles
● Experience of CI/CD Pipeline
- Essentail Skills:
- Docker
- Jenkins
- Python dependency management using conda and pip
- Base Linux System Commands, Scripting
- Docker Container Build & Testing
- Common knowledge of minimizing container size and layers
- Inspecting containers for un-used / underutilized systems
- Multiple Linux OS support for virtual system
- Has experience as a user of jupyter / jupyter lab to test and fix usability issues in workbenches
- Templating out various configurations for different use cases (we use Python Jinja2 but are open to other languages / libraries)
- Jenkins PIpeline
- Github API Understanding to trigger builds, tags, releases
- Artifactory Experience
- Nice to have: Kubernetes, ArgoCD, other deployment automation tool sets (DevOps)
- Hands-on experience building database-backed web applications using Python based frameworks
- Excellent knowledge of Linux and experience developing Python applications that are deployed in Linux environments
- Experience building client-side and server-side API-level integrations in Python
- Experience in containerization and container orchestration systems like Docker, Kubernetes, etc.
- Experience with NoSQL document stores like the Elastic Stack (Elasticsearch, Logstash, Kibana)
- Experience in using and managing Git based version control systems - Azure DevOps, GitHub, Bitbucket etc.
- Experience in using project management tools like Jira, Azure DevOps etc.
- Expertise in Cloud based development and deployment using cloud providers like AWS or Azure
This person MUST have:
- B.E Computer Science or equivalent
- 2+ Years of hands-on experience troubleshooting/setting up of the Linux environment, who can write shell scripts for any given requirement.
- 1+ Years of hands-on experience setting up/configuring AWS or GCP services from SCRATCH and maintaining them.
- 1+ Years of hands-on experience setting up/configuring Kubernetes & EKS and ensuring high availability of container orchestration.
- 1+ Years of hands-on experience setting up CICD from SCRATCH in Jenkins & Gitlab.
- Experience configuring/maintaining one monitoring tool.
- Excellent verbal & written communication skills.
- Candidates with certifications - AWS, GCP, CKA, etc will be preferred
- Hands-on experience with databases (Cassandra, MongoDB, MySQL, RDS).
Experience:
- Min 3 years of experience as SRE automation engineer building, running, and maintaining production sites. Not looking for candidates who have experience only as L1/L2.
Location:
- Remotely, anywhere in India
Timings:
- The person is expected to deliver with both high speed and high quality as well as work for 40 Hours per week (~6.5 hours per day, 6 days per week) in shifts which will rotate every month.
Position:
- Full time/Direct
- We have great benefits such as PF, medical insurance, 12 annual company holidays, 12 PTO leaves per year, annual increments, Diwali bonus, spot bonuses and other incentives etc.
- We dont believe in locking in people with large notice periods. You will stay here because you love the company. We have only a 15 days notice period.
- Automate deployments of infrastructure components and repetitive tasks.
- Drive changes strictly via the infrastructure-as-code methodology.
- Promote the use of source control for all changes including application and system-level changes.
- Design & Implement self-recovering systems after failure events.
- Participate in system sizing and capacity planning of various components.
- Create and maintain technical documents such as installation/upgrade MOPs.
- Coordinate & collaborate with internal teams to facilitate installation & upgrades of systems.
- Support 24x7 availability for corporate sites & tools.
- Participate in rotating on-call schedules.
- Actively involved in researching, evaluating & selecting new tools & technologies.
- Cloud computing – AWS, OCI, OpenStack
- Automation/Configuration management tools such as Terraform & Chef
- Atlassian tools administration (JIRA, Confluence, Bamboo, Bitbucket)
- Scripting languages - Ruby, Python, Bash
- Systems administration experience – Linux (Redhat), Mac, Windows
- SCM systems - Git
- Build tools - Maven, Gradle, Ant, Make
- Networking concepts - TCP/IP, Load balancing, Firewall
- High-Availability, Redundancy & Failover concepts
- SQL scripting & queries - DML, DDL, stored procedures
- Decisive and ability to work under pressure
- Prioritizing workload and multi-tasking ability
- Excellent written and verbal communication skills
- Database systems – Postgres, Oracle, or other RDBMS
- Mac automation tools - JAMF or other
- Atlassian Datacenter products
- Project management skills
Qualifications
- 3+ years of hands-on experience in the field or related area
- Requires MS or BS in Computer Science or equivalent field
We are looking for an experienced DevOps engineer that will help our team establish DevOps practice. You will work closely with the technical lead to identify and establish DevOps practices in the company. You will also help us build scalable, efficient cloud infrastructure. You’ll implement monitoring for automated system health checks. Lastly, you’ll build our CI pipeline, and train and guide the team in DevOps practices.
Responsibilities
- Deployment, automation, management, and maintenance of production systems.
- Ensuring availability, performance, security, and scalability of production systems.
- Evaluation of new technology alternatives and vendor products.
- System troubleshooting and problem resolution across various application domains and platforms.
- Providing recommendations for architecture and process improvements.
- Definition and deployment of systems for metrics, logging, and monitoring on the AWS
platform.
- Manage the establishment and configuration of SaaS infrastructure in an agile way
by storing infrastructure as code and employing automated configuration management tools with a goal to be able to re-provision environments at any point in time.
- Be accountable for proper backup and disaster recovery procedures.
- Drive operational cost reductions through service optimizations and demand-based
auto-scaling.
- Have on-call responsibilities.
- Perform root cause analysis for production errors
- Uses open source technologies and tools to accomplish specific use cases encountered
within the project.
- Uses coding languages or scripting methodologies to solve a problem with a custom workflow.
Requirements
- Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
- Prior experience as a software developer in a couple of high-level programming
languages.
- Extensive experience in any Javascript-based framework since we will be deploying services to NodeJS on AWS Lambda (Serverless)
- Strong Linux system administration background.
- Ability to present and communicate the architecture in a visual form.
- Strong knowledge of AWS (e.g. IAM, EC2, VPC, ELB, ALB, Autoscaling, Lambda, NAT
gateway, DynamoDB)
- Experience maintaining and deploying highly-available, fault-tolerant systems at scale (~
1 Lakh users a day)
- A drive towards automating repetitive tasks (e.g. scripting via Bash, Python, Ruby, etc)
- Expertise with Git
- Experience implementing CI/CD (e.g. Jenkins, TravisCI)
- Strong experience with databases such as MySQL, NoSQL, Elasticsearch, Redis and/or
Mongo.
- Stellar troubleshooting skills with the ability to spot issues before they become problems.
- Current with industry trends, IT ops and industry best practices, and able to identify the
ones we should implement.
- Time and project management skills, with the capability to prioritize and multitask as
needed.








