Site Reliability Engineer

Our client is a computer software company. (YB1)

Agency job

via Multi Recruit

5 - 8 yrs

₹30L - ₹35L / yr

Pune, Bengaluru (Bangalore)

Skills

SRE

DevOps

Site Reliability Engineer

Design, develop, test, debug and maintain components of the cloud infrastructure
Manage operational priorities of the DBaaS
Establish a process for handling new security vulnerabilities and leading the response to them
Lead certification efforts from the security perspective
Participate in penetration testing efforts
Design and build DBaaS processes for key management, rotation, storage, encryption, and password management

Requirements:

Strong software design and implementation skills in building infrastructure frameworks
Experience building and operating extensible, scalable, resilient data systems
Working knowledge of Java and Python
Experience using public cloud infrastructure (AWS, GCP, or Azure)
Containerization tooling (Docker, EKS, Kubernetes)
Infrastructure as Code tooling (e.g., Terraform, CloudFormation)
Configuration Management Tooling (Ansible, Chef, etc.)
Automation Scripting (Python preferred)
Solid understanding of basic systems operations (disk, network, etc.)
Willingness and ability to learn new languages and concepts
5+ years of relevant experience

Similar jobs

SRE

at VY SYSTEMS PRIVATE LIMITED

Posted by Banu S

Bengaluru (Bangalore)

5 - 9 yrs

₹4L - ₹16L / yr

SRE

Production support

Linux/Unix

OpenShift

Shell Scripting

We are seeking an experienced Senior Systems Operations Engineer with 6+ years of experience in Application Production Support, Systems Operations, and Incident Management. The ideal candidate will be responsible for maintaining the stability, availability, and performance of production applications and infrastructure while ensuring adherence to ITIL processes and operational excellence.

The candidate should possess strong expertise in Linux/Unix administration, SQL/Oracle Database support, Production Issue Analysis, Incident & Change Management, and monitoring tools such as Splunk, Grafana, and AppDynamics. The role requires excellent troubleshooting skills, stakeholder communication, and the ability to lead critical incident resolution activities in a 16x7 production environment.

Key Responsibilities:

• Provide advanced production support for critical business applications, ensuring high availability and performance.

• Lead incident and change management processes, including root cause analysis and resolution of production issues.

• Monitor application health using tools such as Splunk, Grafana, and AppDynamics.

• Collaborate with development, infrastructure, and business teams to drive continuous improvement in system operations.

• Maintain and optimize Linux/Unix environments and SQL/Oracle databases.

• Document operational procedures, troubleshooting steps, and best practices.

Required Skills:

• Strong experience with Linux/Unix system administration.

• Advanced proficiency in SQL and Oracle database management.

• Expertise in incident and change management within enterprise environments.

• Proven ability to analyse and resolve production issues efficiently.

• Hands-on experience with monitoring and alerting tools: Splunk, Grafana, AppDynamics.

• Excellent communication and collaboration skills.

• Experience with cloud platforms (AWS, Azure, GCP).

Desired Candidate Profile

• 6+ years of experience in Application Production Support and Systems Operations.

• Proven experience managing mission-critical production environments.

• Strong expertise in Linux/Unix, SQL, Oracle Database, and monitoring tools.

• Demonstrated success in incident resolution, RCA preparation, and service improvement initiatives.

• Ability to work effectively in a fast-paced 16x7 production support environment.

• Exposure to AI tools and their implementation.

SRE / Site Reliability Engineer

at Wissen Technology

4 recruiters

Posted by Shakthi M

Bengaluru (Bangalore)

3 - 9 yrs

Best in industry

Azure

Python

Terraform

SRE

Windows Azure

Strong hands-on experience in Microsoft Azure Cloud.
Good understanding of Azure services such as Compute, Storage, Event Hub, Event Subscription, Storage Queue, and PaaS services.
Basic understanding of Azure AI Foundry and AI-related Azure service setup.
Good Azure networking basics: VNet, subnet, routing, and basic troubleshooting.
Strong knowledge of Terraform, especially Terraform state, plan / apply, troubleshooting failures, migration risks, and Terraform Enterprise concepts.
Strong Python coding capability, not just basic scripting.
Experience using Python for API integration, automation, JSON/YAML handling, and internal tooling.
Good understanding of CI/CD pipelines.
Ability to troubleshoot pipeline failures.
Comfortable with YAML and JSON.
Ability to troubleshoot Azure infrastructure/platform issues.
Ability to collect logs/evidence and coordinate with network/app/Microsoft support teams.
Basic awareness of agentic AI / LLM concepts.
Awareness of security and cost best practices.

Good to Have Skills

Hands-on experience with Harness.
Hands-on experience with Terraform Enterprise.
Exposure to LangGraph / LangChain.
Exposure to agentic AI workflows or skill creation.
Exposure to Claude or enterprise LLM integrations.
Knowledge of Azure ML Workspace, model registry, and managed endpoints.
MLOps / LLMOps knowledge.
FinOps / Azure cost optimization experience.
Azure certifications: AZ-104, AZ-305, AZ-400, AZ-500.

Principal DevOps Engineer

at Securin Labs

Posted by Anitha Chalil

Chennai, Tamilnadu

10 - 18 yrs

₹30L - ₹45L / yr

SRE

DevOps

Platform engineering

AWS CloudFormation

Kubernetes

Principal DevOps Engineer

Note - Screening Requirement: This position requires at least 8 years of hands-on experience architecting DevOps/SRE/Platform Engineering environments from scratch, specializing in AWS, EKS/Kubernetes, Terraform, Python, Jenkins, and AI workflows (all mandatory skills).

We are looking for an absolute builder who has a proven track record of personally architecting, designing and setting up production-grade AWS EKS clusters entirely from scratch. If your experience is primarily limited to managing, maintaining, or optimizing pre-existing environments that were already stood up by another team, this is not the right opportunity for you.

Who are we?

Securin is an AI-driven cybersecurity company focused on proactive, adversarial exposure and vulnerability management. Our mission is to help organizations reduce cyber risk by identifying, prioritising, and remediating the issues that matter most. Powered by a seasoned team of threat researchers and its status as a CVE Numbering Authority (CNA), Securin combines artificial intelligence / machine learning, threat intelligence, and deep vulnerability research (including the Dark Web) to deliver an adversarial approach to cyber defense. We help enterprises shift from reactive patching to strategic, risk-based exposure and vulnerability management – driving smarter security decisions and faster remediation.

What do we promise?

We are a highly effective tech-enabled cybersecurity solutions provider and promise continual security posture improvement, enhanced attack surface visibility, and proactive prioritized remediation for every one of our client businesses.

What do we provide?

● A chance to be on the leading edge of cybersecurity and AI

● Ability to have direct impact on company growth and revenue strategy

● An opportunity to mentor and be mentored by experts in multiple disciplines

What do we deliver?

Securin helps organizations to identify and remediate the most dangerous exposures, vulnerabilities, and risks in their environment. We deliver predictive and definitive intelligence and facilitate proactive remediation to help organizations stay a step ahead of attackers. By utilising our cybersecurity solutions, our clients can have a proactive and holistic view of their security posture and protect their assets from even the most advanced and dynamic attacks.

Securin has been recognized by national and international organizations for its role in accelerating innovation in offensive and proactive security. Our combination of domain expertise, cutting-edge technology, and advanced tech-enabled cybersecurity solutions has made Securin a leader in the industry.

Core Technology Stack

AWS, EKS / Kubernetes, Jenkins, Python, Terraform, CI/CD, AI

Key Responsibilities:

● Architect and manage the end-to-end SaaS platform infrastructure on AWS, including EKS cluster design, VPC networking, IAM, and multi-region availability.

● Build, maintain, and optimize Jenkins-based CI/CD pipelines and develop Python automation scripts for provisioning, deployments, and runbook automation.

● Define and enforce platform SLOs/SLAs; own the observability strategy across logging, metrics, and tracing.

● Manage and participate in the on-call rotation; act as escalation point for P1/P2 incidents and drive post-incident reviews.

● Drive Infrastructure-as-Code (IaC) practices with Terraform/CloudFormation and champion a culture of automation and operational excellence.

● Collaborate cross-functionally with product, security, and engineering teams to align infrastructure roadmap with business goals.

● Identify opportunities to leverage AI to automate operational and DevOps workflows.

● Design and implement AI-assisted solutions for incident triaging, root cause analysis, log analysis, and performance optimization.

● Drive the adoption of AI-powered tools for infrastructure management, deployment automation, monitoring, and troubleshooting.

● Build intelligent workflows that reduce manual effort in release management, capacity planning, and operational support.

● Integrate AI capabilities into CI/CD pipelines to improve code quality, deployment reliability, and operational efficiency.

● Collaborate with engineering teams to automate repetitive tasks and improve developer productivity.

● Define best practices and governance for the safe and effective use of AI across DevOps processes.

● Measure and report on productivity gains, operational improvements, and cost savings achieved through AI adoption.

Requirements:

● 8+ years of experience in DevOps, SRE, or cloud infrastructure engineering roles.

● Deep hands-on expertise with AWS services (EC2, EKS, RDS, S3, IAM, VPC, CloudFront, Route53, Lambda, etc.).

● Strong Kubernetes experience: cluster management, Helm, autoscaling (HPA/KEDA).

● Proficiency with Jenkins for complex CI/CD pipeline design and maintenance.

● Solid Python scripting skills for automation, tooling, and infrastructure management tasks.

● Experience with Infrastructure-as-Code using Terraform and/or AWS CloudFormation.

● Proven track record of architecting and managing end-to-end SaaS products in a cloud-native environment.

● Strong understanding of networking fundamentals, security best practices, and compliance frameworks (SOC 2, ISO 27001 a plus).

● Hands-on experience with on-call processes and incident management frameworks.

Preferred Qualifications:

● AWS certifications: Solutions Architect Professional, DevOps Engineer Professional, or equivalent.

● Familiarity with service mesh, secrets management (Vault, AWS Secrets Manager), and zero-trust security models.

● Experience with multi-tenant SaaS architectures and tenant isolation strategies.

● Knowledge of FinOps principles and AWS cost management tooling.

● Experience with database DevOps: RDS, Aurora schema migrations, and backup strategies.

Core Competencies:

● Strategic Thinking – ability to translate business goals into scalable technical architecture.

● Operational Excellence – strong bias for reliability, automation, and continuous improvement.

● Communication – ability to clearly articulate complex technical topics to non-technical stakeholders.

● Ownership Mindset – proactively identifies and resolves risks without waiting to be asked.

● Resilience Under Pressure – calm and decisive during incidents; leads by example in high-stress situations.

Why should we connect?

We are a bunch of passionate cybersecurity professionals who are building a culture of security. Today, cybersecurity is no longer a luxury but a necessity, with a global market value of $150 billion.

At Securin, we live by a people-first approach. We firmly believe that our employees should enjoy what they do. For our employees, we provide a hybrid work environment with competitive best-in-industry pay, while providing them with an environment to learn, thrive, and grow. Our hybrid working environment allows employees to work from the comfort of their homes or the office if they choose to. For the right candidate, this will feel like your second home.

If you are as passionate about cybersecurity as we are, we would love to connect and share ideas.