Site Reliability Engineer (Platform Reliability & Uptime)

at Agentic Universe

Posted by Anubhav Kumar Rai

2 - 5 yrs

₹5.4L - ₹7.2L / yr

Bengaluru (Bangalore)

Skills

DevOps

Amazon Web Services (AWS)

Google Cloud Platform (GCP)

Windows Azure

Location: Bangalore

Experience: 2–5 years

Type: Full-time | On-site

Start: Immediate

Why this role exists

Most systems don’t fail because of one big outage.

They fail because reliability is treated as an afterthought.

Right now, uptime depends too much on individual heroics.

That doesn’t scale.

This role exists to build a reliability system where:

Uptime is predictable
Failures are contained
Escalations don’t depend on leadership

What you’ll do

You will not just monitor systems.

You will own reliability as a product.

1. Drive uptime to production-grade reliability

Raise customer-facing uptime to meet a 99.9% SLA within 4 months
Define and track:
SLAs / SLOs / error budgets
Ensure reliability is measured from the customer’s perspective, not internal metrics
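To make the 99.9% target concrete, the arithmetic behind an error budget can be sketched as below. This is only an illustration: the 30-day window and the function names are assumptions, not something the posting specifies. Under those assumptions, 99.9% leaves roughly 43 minutes of downtime per 30 days.

```python
# Illustrative sketch only: the window length and helper names are assumptions.

def allowed_downtime_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Return the downtime budget, in minutes, for an availability target."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - slo_percent / 100)


def budget_remaining(slo_percent: float, downtime_so_far_min: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = allowed_downtime_minutes(slo_percent, window_days)
    return 1 - downtime_so_far_min / budget


if __name__ == "__main__":
    # Budget for the 99.9% target over an assumed 30-day window.
    print(f"{allowed_downtime_minutes(99.9):.1f} minutes per 30 days")
    # Share of the budget left after a hypothetical 10 minutes of downtime.
    print(f"{budget_remaining(99.9, 10.0):.0%} of budget left")
```

Tracking the remaining fraction rather than raw uptime is one way to turn the target into a release decision: when the budget is nearly spent, reliability work takes priority over new changes.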

2. Build incident response as a system

Set up a 24/7 incident response rotation across 3 engineers
Eliminate dependency on leadership (no single escalation point)
Define:
Incident severity levels
Response playbooks
Escalation protocols
Ensure fast detection → containment → resolution

3. Contain and fix erratic system behavior

Identify and resolve:
Latency spikes
Downtime incidents
Integration failures
Build guardrails to prevent recurrence
Focus on root cause elimination, not temporary fixes

4. Create continuous reliability feedback loops

Work closely with engineering teams to:
Surface recurring failure patterns
Improve build quality
Reduce production bugs
Ensure learnings from incidents directly improve future releases

5. Improve observability and monitoring

Build dashboards and alerts for:
System health
Performance metrics
Failure signals
Ensure issues are detected before customers report them

6. Reduce operational fragility

Remove single points of failure (people, systems, workflows)
Improve system resilience across:
Deployments
Integrations
Runtime environments

What success looks like

Uptime reaches 99.9%+ reliably
Incidents are:
Detected early
Contained quickly
Resolved permanently
No dependency on a single individual for escalation
System behavior becomes predictable and stable
Engineering teams ship with higher reliability confidence

Who you are

You have 2-5 years of experience in SRE / DevOps / backend systems
You have worked on production systems with real uptime expectations
You think in:
Systems
Failure modes
Trade-offs
You are comfortable debugging live, high-pressure environments

What will make you stand out

Experience with:
Distributed systems
Cloud infrastructure (AWS / Azure / GCP)
Monitoring & alerting tools
Have built or improved:
Incident response systems
Reliability frameworks
Strong debugging skills across:
Infra
Application
Integrations

Compensation

₹60,000/month (fixed)

(Aligned with role scope and impact expectations)

Why join

You will define reliability standards for a production AI platform
Your work directly impacts:
Customer trust
Product performance
Enterprise readiness
You will move the system from reactive → predictable

What this role is not

Not just monitoring dashboards
Not limited to handling tickets
Not dependent on escalation to leadership

What this role is

A builder of reliability systems
A guardian of uptime and performance
A multiplier of engineering quality

One question to self-evaluate

Can you build a system where downtime is rare, predictable, and never dependent on a single person?

About Agentic Universe

Founded :

2022

Type :

Product

Size :

20-100

Stage :

Raised funding

About

Agentic Universe - AI Agents that run outcomes for your teams

Similar jobs

Senior Devops Engineer

at FrontM Limited

Posted by Pradeep Chandkiran

Bengaluru (Bangalore)

3 - 5 yrs

₹8L - ₹14L / yr

Kubernetes

Terraform

Amazon Web Services (AWS)

Location: Bangalore preferred / Hybrid as applicable

Experience: 3+ years

Education: B.E/B.Tech in Computer Science, Engineering or a related technical discipline

Salary: Above market standards, flexible for the right candidate

Career growth: Long-term opportunity with potential to lead DevOps architecture and cloud platform operations

About FrontM

FrontM builds software platforms for frontline workforces operating in remote and low-connectivity environments, with a strong focus on the maritime industry. The platform supports communication, collaboration, healthcare, learning, welfare and operational workflows across mobile, web, kiosk and connected device environments.

The platform runs across cloud infrastructure, constrained networks and specialised customer environments, requiring reliable DevOps practices, strong observability, secure architecture and careful operational discipline.

Role Summary

As a Senior DevOps Engineer, you will take ownership of FrontM’s AWS cloud infrastructure, CI/CD pipelines, platform reliability and technical operations. You will work closely with the VP of Delivery, CTO and CEO to maintain secure, scalable and high-availability infrastructure for FrontM’s production systems.

This role requires strong hands-on DevOps experience, broad AWS knowledge, Kubernetes experience and the ability to troubleshoot complex networking and production issues across multi-domain SaaS environments.

Key Responsibilities

Cloud Infrastructure & DevOps Architecture (≈45%)

· Own, maintain and improve AWS cloud infrastructure for FrontM platforms

· Create and maintain Terraform scripts for infrastructure deployment and management

· Manage Kubernetes workloads deployed within AWS EKS

· Support multi-zone AWS infrastructure design for availability, resilience and scale

· Maintain AWS services including Route 53, EC2, API Gateway, VPC, VPN, AWS Cognito, ElastiCache, DynamoDB and Lambda

· Contribute to DevOps architecture planning in line with FrontM’s platform roadmap

CI/CD, Operations & Platform Reliability (≈35%)

· Build, maintain and improve CI/CD pipelines for backend and platform services

· Oversee technical operations with hands-on administration, monitoring and release support

· Ensure continuous server uptime, stability, performance and maintainability

· Debug, respond to and restore system outages in production and staging environments

· Improve observability across infrastructure and applications, including migration from Elastic stack to logz.io

· Support backend stability, scale and performance across Node.js, Java and related services

Security, Networking & Production Support (≈20%)

· Maintain AWS security configurations, access controls and monitoring practices

· Support complex networking requirements across multi-domain SaaS implementations

· Troubleshoot network, infrastructure and access issues with internal teams and customer-side users

· Work with backend teams to support API integrations and infrastructure abstractions for complex requirements

· Document operational procedures, incident findings and technical support steps clearly

Required Technical Skills

Cloud Infrastructure & AWS

· Strong hands-on experience with AWS infrastructure and cloud operations

· Experience with Route 53, EC2, API Gateway, VPC, VPN, AWS Cognito, ElastiCache, DynamoDB and Lambda

· Experience with AWS security setup, monitoring and multi-zone infrastructure

· Ability to manage infrastructure using Terraform

Kubernetes, CI/CD & Observability

· Strong experience with Kubernetes, preferably AWS EKS

· Extensive CI/CD and DevOps experience

· Experience with infrastructure observability and application monitoring tools

· Ability to diagnose production bottlenecks, server failures and performance issues

Backend, Networking & SaaS Operations

· Experience supporting Node.js, Java and backend system procedures for stability and scale

· Good understanding of APIs, integrations and backend service dependencies

· Experience with complex networking and multi-domain SaaS implementations

· Ability to troubleshoot technical issues with non-technical end users

Nice to Have

· Experience with MongoDB clusters in MongoDB Atlas

Personal Attributes

· Strong ownership mindset for uptime, reliability and production stability

· Practical problem-solving approach with the ability to act quickly during incidents

· Clear written and spoken communication in English

· Ability to work independently and coordinate with senior management when required

· Comfortable working in fast-moving engineering teams

· Attention to detail in security, monitoring, documentation and operational processes

Why join FrontM?

Long-Term Career Growth

Opportunity to work on cloud infrastructure used by global maritime and remote workforce customers, with scope to grow into DevOps architecture and platform leadership roles.

Engineering Challenges That Matter

Work on infrastructure that supports applications used in remote, low-bandwidth and operationally demanding environments.

Broad Technical Ownership

Take responsibility across cloud infrastructure, Kubernetes, CI/CD, observability, networking, security and production reliability.

Apply now

Join a team focused on building reliable software infrastructure for real-world use cases and contribute to systems used across the global maritime workforce.

DevOps Engineer

at Peliqan

3 recruiters

Posted by Bharath Kumar

Bengaluru (Bangalore)

3 - 5 yrs

₹10L - ₹20L / yr

Python

Kubernetes

helm

Docker

Amazon Web Services (AWS)

+3 more

DevOps Engineer

Location: Bangalore office

About Peliqan

Peliqan is an all-in-one data platform combining ELT/ETL pipelines, a built-in data warehouse, SQL and low-code Python transformations, reverse ETL, and AI-powered data activation. We connect 250+ data sources and serve enterprise teams, consultants, and SaaS companies. SOC 2 Type II certified and GDPR compliant.

The Role

Own and evolve the infrastructure powering Peliqan's multi-tenant data platform. You'll manage Kubernetes clusters, cloud resources, CI/CD pipelines, and monitoring — keeping everything reliable, secure, and scalable. You'll be the go-to person for infrastructure support across the engineering team.

Responsibilities

Manage and optimise Kubernetes clusters running production workloads — data pipelines, APIs, and customer-facing services.

Maintain Docker-based local development environments for the engineering team.

Administer cloud infrastructure on AWS and Google Cloud (compute, storage, networking, managed databases).

Build and maintain CI/CD pipelines for automated testing, building, and deploying across staging and production.

Set up and manage monitoring, alerting, and logging for platform health and incident response.

Manage release processes — deployments, rollbacks, and release strategies.

Maintain infrastructure-as-code using Helm charts.
Support security hardening and compliance efforts (SOC 2, GDPR).

Requirements

3+ years in a DevOps, SRE, or Infrastructure Engineering role.

Strong hands-on experience with Kubernetes and Helm charts.

Deep familiarity with Docker for containerisation and local dev workflows.

Production experience with AWS and/or Google Cloud.

Proficiency in Python and Bash scripting for automation and tooling.
Solid grasp of DevOps principles: infrastructure-as-code, GitOps, observability, continuous delivery.
Experience with CI/CD platforms (GitHub Actions, GitLab CI, or similar).

Nice to Have

Experience supporting multi-tenant SaaS platforms or data infrastructure at scale.
Knowledge of PostgreSQL, MySQL, or cloud-managed database administration.
Exposure to security compliance frameworks (SOC 2, ISO 27001, GDPR).

Cloud Architect

Leading Payment Solution Company

Agency job

via People First Consultants by Jayaraj E

Remote, Bengaluru (Bangalore), Chennai, Pune, Hyderabad, Mumbai

3 - 10 yrs

₹8L - ₹28L / yr

Docker

Kubernetes

DevOps

Amazon Web Services (AWS)

Windows Azure

+3 more

Experience: 3+ years of experience in Cloud Architecture

About Company:

The company is a global leader in secure payments and trusted transactions. They are at the forefront of the digital revolution that is shaping new ways of paying, living, doing business and building relationships that pass on trust along the entire payments value chain, enabling sustainable economic growth. Their innovative solutions, rooted in a rock-solid technological base, are environmentally friendly, widely accessible and support social transformation.

Cloud Architect / Lead

Role Overview

Senior engineer with a strong background in cloud technologies and architectures. Designs target cloud architectures to transform existing architectures together with the in-house team, and can configure and build cloud architectures hands-on while guiding others.

Key Knowledge

3–5+ years of experience with AWS, GCP or Azure technologies
Is likely certified on one or more of the major cloud platforms
Strong hands-on experience with technologies such as Terraform, Kubernetes (K8s), Docker and container orchestration
Ability to guide and lead internal agile teams on cloud technology
Background in the financial services industry or similar critical operational experience

Cloud Infrastructure Engineer

at F5 Networks

Posted by Gopi Daggumilli

Hyderabad

5 - 10 yrs

Best in industry

Docker

Kubernetes

DevOps

OpenStack

openshift

+16 more

POSITION SUMMARY:

We are looking for a passionate, high-energy individual to help build and manage the infrastructure network that powers the Product Development Labs for F5 Inc. The F5 Infra Engineer plays a critical role in our Product Development team by providing valuable services and tools for the F5 Hyderabad Product Development Lab. The Infra team supports both production systems and customized, flexible testing environments used by Test and Product Development teams. As an Infra Engineer, you'll have the opportunity to work with cutting-edge technology and talented individuals. The ideal candidate will have experience in private and public cloud (AWS, Azure, GCP), OpenStack, storage, backup, hypervisor server administration (VMware, KVM, Xen, Hyper-V), networking and automation in a data center operations environment at global enterprise scale, with Kubernetes and OpenShift container platforms.

EXPERIENCE

7- 9+ Years – Software Engineer III

PRIMARY RESPONSIBILITIES:

Drive the design, project build, infrastructure setup, monitoring, measurement and improvement of the quality of services provided, including network and virtual instance services from OpenStack, VMware VIO, public and private cloud, and DevOps environments.
Work closely with customers to understand their requirements and deliver on agreed timelines.
Work closely with F5 architects and vendors to understand emerging technologies and the F5 product roadmap, and how they would benefit the Infra team and its users.
Work closely with the team and complete deliverables on time.
Consult with testers, application, and service owners to design scalable, supportable network infrastructure to meet usage requirements.
Assume ownership for large/complex systems projects; mentor Lab Network Engineers in the best practices for ongoing maintenance and scaling of large/complex systems.
Drive automation efforts for the configuration and maintainability of the public/private Cloud.
Lead product selection for replacement or new technologies
Address user tickets in a timely manner for the covered services
Responsible for deploying, managing, and supporting production and pre-production environments for our core systems and services.
Migration and consolidations of infrastructure
Design and implement major service and infrastructure components.
Research, investigate and define new areas of technology to enhance existing service or new service directions.
Evaluate performance of services and infrastructure; tune, re-evaluate the design and implementation of current source code and system configuration.
Create and maintain scripts and tools to automate the configuration, usability and troubleshooting of the supported applications and services.
Take ownership of activities and new initiatives.
Provide global infra support from India to Product Development teams.
Provide on-call support on a rotational basis across global time zones.
Manage vendors for evaluations of the latest hardware and software, and keep systems up to date.

KNOWLEDGE, SKILLS AND ABILITIES:

In-depth, multi-disciplinary knowledge of storage, compute, network and DevOps technologies, including the latest cutting-edge technologies.
Multi-cloud – AWS, Azure, GCP, OpenStack, DevOps operations
IaaS – Infrastructure as a Service, Metal as a Service, platform services
Storage – Dell EMC, NetApp, Hitachi, Qumulo and other storage technologies
Hypervisors – VMware, Hyper-V, KVM, Xen and AHV
DevOps – Kubernetes, OpenShift, Docker, and other container and orchestration platforms
Automation – scripting experience (Python/shell/Golang), full-stack development and application deployment
Tools – Jenkins, Splunk, Kibana, Terraform, Bitbucket, Git, CI/CD configuration
Datacenter Operations – Racking, stacking, cable matrix, Solution Design and Solutions Architect
Networking Skills – Cisco/Arista Switches, Routers, Experience on Cable matrix design and pathing (Fiber/copper)
Experience in SAN/NAS storage – (EMC/Qumulo/NetApp & others)
Experience with Red Hat Ceph storage.
A working knowledge of Linux, Windows, and Hypervisor Operating Systems and virtual machine technologies
SME - subject matter expert for all cutting-edge technologies
Data center architect professional and storage expert-level certified professional experience.
A solid understanding of high-availability systems, redundant networking and multipathing solutions.
Proven problem resolution related to network infrastructure, along with judgment, negotiating and decision-making skills and excellent written and oral communication skills.
Working experience with object, block and file storage technologies.
Experience in backup technologies and backup administration.
Dell/HP/Cisco UCS server administration is an additional advantage.
Ability to quickly learn and adopt new technologies.
Very strong experience with and exposure to open-source platforms.
Working experience with monitoring tools such as Zabbix, Nagios and Datadog.
Working experience with bare-metal services and OS administration.
Working experience with cloud connectivity such as AWS IPsec, Azure ExpressRoute and GCP VPN tunnels.
Working experience with software-defined networking (VMware NSX, SDN, Open vSwitch, etc.).
Working experience with systems engineering and Linux/Unix administration.
Working experience with database administration (PostgreSQL, MySQL, NoSQL).
Working experience with automation and configuration management using Puppet, Chef or an equivalent.
Working experience with DevOps operations: Kubernetes, containers, Docker and Git repositories.
Experience with build system processes, code inspection and delivery methodologies.
Knowledge of creating operational dashboards and execution lane.
Experience with and knowledge of DNS, DHCP, LDAP, AD, domain controller services and PXE services.
SRE experience, with responsibility for availability, latency, performance, efficiency, change management, monitoring, emergency response and capacity planning.
Vendor support – OEM upgrades, coordinating technical support and troubleshooting experience.
Experience in handling on-call support and hierarchy processes.
Knowledge of scale-out and scale-in architecture.
Working experience with ITSM / process management tools such as ServiceNow, Jira and Jira Align.
Knowledge of Agile and Scrum principles.
Working experience with ServiceNow.
Knowledge sharing, transition experience and self-learning behavior.

DevOps

at CodeCraft Technologies Private Limited

1 video

3 recruiters

Agency job

via Bullhorn Consultants by Sai Kiran R

Bengaluru (Bangalore)

7 - 12 yrs

₹1L - ₹15L / yr

Shell Scripting

Python

Ansible

Terraform

DevOps

Roles and Responsibilities:

• Gather and analyse cloud infrastructure requirements

• Automate system tasks and infrastructure using a scripting language (Shell/Python/Ruby preferred), with configuration management tools (Ansible/Puppet/Chef), service registry and discovery tools (Consul, Vault, etc.), infrastructure orchestration tools (Terraform, CloudFormation), and automated imaging tools (Packer)

• Support existing infrastructure, analyse problem areas and come up with solutions

• An eye for monitoring – the candidate should be able to look at complex infrastructure and be able to figure out what to monitor and how.

• Work with the Engineering team on infrastructure and network automation needs.

• Deploy infrastructure as code and automate as much as possible

• Manage a team of DevOps engineers

Desired Profile:

• Understanding of provisioning of Bare Metal and Virtual Machines

• Working knowledge of configuration management tools like Ansible/Chef/Puppet, Redfish.

• Experience in scripting languages like Ruby/Python/Shell

• Working knowledge of IP networking, VPNs, DNS, load balancing, firewalling & IPS concepts

• Strong Linux/Unix administration skills.

• Self-starter who can implement with minimal guidance

• Hands-on experience setting up CI/CD from scratch in Jenkins

• Experience managing K8s infrastructure

221515 Senior Software Engineer (Open)

at Cloudera

2 recruiters

Posted by Rahamath Mallick

Remote only

6 - 9 yrs

₹6L - ₹12L / yr

Docker

Kubernetes

DevOps

Amazon Web Services (AWS)

Windows Azure

Job Description

We (the Software Engineer team) are looking for a motivated, experienced person with a data-driven approach to join our Distribution Team in Bangalore to help design, execute and improve our test sets and infrastructure for producing high-quality Hadoop software.

A Day in the life

You will be part of a team that makes sure our releases are predictable and deliver high value to the customer. This team is responsible for automating and maintaining our test harness, and making test results reliable and repeatable.

You will:

work on making our distributed software stack more resilient to high-scale endurance runs and customer simulations
provide valuable fixes to our product development teams for the issues you find during exhaustive test runs
work with product and field teams to make sure our customer simulations match the expectations and can provide valuable feedback to our customers
work with amazing people - We are a fun & smart team, including many of the top luminaries in Hadoop and related open source communities. We frequently interact with the research community, collaborate with engineers at other top companies & host cutting edge researchers for tech talks.
do innovative work - Cloudera pushes the frontier of big data & distributed computing, as our track record shows. We work on high-profile open source projects, interacting daily with engineers at other exciting companies, speaking at meet-ups, etc.
be a part of a great culture - Transparent and open meritocracy. Everybody is always thinking of better ways to do things, and coming up with ideas that make a difference. We build our culture to be the best workplace in our careers.

You have:

strong knowledge in at least 1 of the following languages: Java / Python / Scala / C++ / C#
hands-on experience with at least 1 of the following configuration management tools: Ansible, Chef, Puppet, Salt
confidence with Linux environments
ability to identify critical weak spots in distributed software systems
experience in developing automated test cases and test plans
ability to deal with distributed systems
solid interpersonal skills conducive to a distributed environment
ability to work independently on multiple tasks
a self-driven, motivated attitude, with a strong work ethic and a passion for problem solving
a drive to innovate, automate, and break the code

The right person in this role has an opportunity to make a huge impact at Cloudera and add value to our future decisions. If this position has piqued your interest and you have what we described - we invite you to apply! An adventure in data awaits.

DevOps Engineer

at samco securities limited

9 recruiters

Posted by Careers Samco

Mumbai

1 - 6 yrs

₹2L - ₹4L / yr

Docker

Kubernetes

DevOps

Amazon Web Services (AWS)

Windows Azure

Installation, configuration management, performance tuning, and monitoring of web, app, and database servers.
Installation, setup, and management of Java, PHP, and NodeJS stacks with software load balancers.
Installation, setup, and administration of MySQL, Mongo, Elasticsearch, and PostgreSQL databases.
Installation, setup, and maintenance of monitoring solutions such as Nagios and Zabbix.
Design and implement DevOps processes for new projects following the department's objectives of automation.
Collaborate on projects with development teams to provide recommendations, support, and guidance.
Work towards full automation, monitoring, virtualization, and containerization.
Create and maintain tools for deployment, monitoring, and operations.
Automate processes in a scalable, easy-to-understand way that is detailed in documentation.
Develop and deploy software that will help drive improvements towards the availability, performance, efficiency, and security of services.
Maintain 24/7 availability for the systems you are responsible for and be open to on-call rotation.

Devops Engineer

at Knowlarity Communication India Pvt Ltd

5 recruiters

Posted by Indresh Vikram Singh

Gurugram

2 - 7 yrs

₹10L - ₹19L / yr

Docker

Kubernetes

DevOps

Amazon Web Services (AWS)

Linux/Unix

About Us: http://www.knowlarity.com

About Job Role :

Experience working on Linux based infrastructure
Understanding of at least one scripting language.
Configuring and managing databases such as MySQL
Working knowledge of various tools, open-source technologies, and cloud services (AWS)
Implementing automation tools (Ansible, Jenkins) for deployment and provisioning of IT infrastructure
Excellent troubleshooting of cloud systems.
Awareness of critical concepts in DevOps principles.

Devops Engineer

Global SaaS product built to help revenue teams. (TP1)

Agency job

via Multi Recruit by Kavitha S

Bengaluru (Bangalore)

3 - 6 yrs

₹30L - ₹40L / yr

DevOps

Amazon Web Services (AWS)

Cloud Platform

3-6 years of relevant work experience in a DevOps role.
Deep understanding of Amazon Web Services or equivalent cloud platforms.
Proven record of infra automation and programming skills in any of these languages: Python, Ruby, Perl, JavaScript.
Implement DevOps industry best practices and procedures to achieve a continuously deployable system
Continuously improve and increase the capabilities of the CI/CD pipeline
Support engineering teams in the implementation of life-cycle infrastructure solutions and documentation operations in order to meet the engineering department's quality standards
Participate in production outage response, handle complex issues, and work towards resolution

Lead Devops Engineer

Cloud infrastructure solutions and support company. (SE1)

Agency job

via Multi Recruit by Ranjini A R

Bengaluru (Bangalore)

6 - 8 yrs

₹12L - ₹15L / yr

DevOps

Jenkins

Docker

Kubernetes

CI/CD

Specific responsibilities are commensurate with experience and include:

Ability to react quickly and effectively to identify and resolve issues that heavily impact CI/CD system (immediate mitigation of impact, long-term resolution including strategies for risk mitigation/monitoring/alert for proactive resolution of potential future occurrences)
Design, develop, unit test, and implement build automation scripts including environment configuration validation processes
Automate and improve development process by evaluation and introduction of new tools and scripts, and manage their life cycle and validation
Determine branching strategy and maintain branches for various components, products, and product lines
Come up with solutions to open-ended problems that focus on workflow improvements for the Software department
Address issues with well-defined requirements efficiently; come up with short-term and long-term solutions and staged deployment strategies
Self-driven: takes action to move tickets from start to completion with minimal oversight
Ability to communicate with and consider perspectives of stakeholders including but not limited to: IT, software development, verification
Ability to break down a problem into smaller components and solve them in a logical, controlled, clearly explainable approach
Lead the creation and maintenance of a pre-production environment as a testbed for build process improvements and changes before deployment to the production environment
Gather metrics via direct input and analysis of developer working habits and pain points to assess the current state and areas requiring further improvement
Define chain of communication and immediate paths of action in the case of a build fault state
Ability to work within constraints of the internal network without access to commercial cloud solutions
Create metrics that define ‘efficiency’ and ‘reliability’ in measurable terms, and track them
Perform static code and security analysis
Design and execute unit tests and perform code coverage analysis
Able to work in an Agile development team environment

Key Requirement & Qualifications:

Bachelor’s degree (or higher) in Electrical Engineering, Computer Engineering, Computer Science or equivalent
6+ years (minimum) experience handling Build, Release, and Deployment of software on Windows and/or Linux environments (on-premise)
Experience with the development and deployment of CM processes and tools
Build automation for .NET using TeamCity (Jenkins is an asset)
Scripting languages: Windows batch scripting, PowerShell, Ant/NAnt
Source control systems usage, branching strategies, and workflow (Git preferred, Subversion)
6+ years of hands-on programming experience with C# and .NET (both Framework and Core)
Troubleshooting and debugging: knowing what information to gather when there are issues with the CI/CD system, and how to gather it (e.g., analyzing network communication, Windows crash dumps, Java logs, etc.)
6+ years (minimum) in web/desktop application software development experience
Excellent problem solving, critical and analytical thinking
Strong team player who understands SDLC and QA methodologies
A professional, results-oriented individual with a high degree of self-motivation
Excellent written and verbal communication skills and the ability to coordinate work/activities with multiple software/IT teams
Working with virtual machines and build management on virtual machines (VMware preferred).
Managing configurations for multiple build environments
OS administration and scripting experience (Windows is a must, Linux desired)
Experience with test automation tools (NUnit, custom in-house frameworks) and strategies is an asset
Creation and maintenance of monitoring and alert systems (Zabbix)
Familiarity with databases (SQL-based) - create, modify, optimize (via script)
Data and metrics gathering, aggregation, and reporting
Experience with work management and documentation tools: JIRA and Confluence
