Skills: Python, Docker or Ansible, AWS
➢ Experience building a multi-region, highly available, auto-scaling infrastructure that optimizes
performance and cost. Plan for future infrastructure as well as maintain and optimize existing
infrastructure.
➢ Conceptualize, architect and build automated deployment pipelines in a CI/CD environment like
Jenkins.
➢ Conceptualize, architect and build a containerized infrastructure using Docker, Mesosphere or
similar SaaS platforms.
➢ Work with developers to institute systems, policies and workflows which allow for rollback of
deployments.
➢ Triage the release of applications to the production environment on a daily basis.
➢ Interface with developers and triage SQL queries that need to be executed in production
environments.
➢ Maintain 24/7 on-call rotation to respond and support troubleshooting of issues in production.
➢ Assist developers and on-call engineers from other teams with postmortems, follow-ups and reviews of
issues affecting production availability.
➢ Establishing and enforcing systems monitoring tools and standards
➢ Establishing and enforcing Risk Assessment policies and standards
➢ Establishing and enforcing Escalation policies and standards

GCP Cloud Engineer:
- Proficiency in infrastructure as code (Terraform).
- Scripting and automation skills (e.g., Python, Shell). Knowledge of Python is a must.
- Collaborate with teams across the company (i.e., network, security, operations) to build complete cloud offerings.
- Design Disaster Recovery and backup strategies to meet application objectives.
- Working knowledge of Google Cloud.
- Working knowledge of various tools, open-source technologies, and cloud services.
- Experience working on Linux-based infrastructure.
- Excellent problem-solving and troubleshooting skills.
🚀 RECRUITING BOND HIRING
Role: CLOUD OPERATIONS & MONITORING ENGINEER - (THE GUARDIAN OF UPTIME)
⚡ THIS IS NOT A MONITORING ROLE
THIS IS A COMMAND ROLE
You don’t watch dashboards.
You control outcomes.
You don’t react to incidents.
You eliminate them before they escalate.
This role powers an AI-driven SaaS + IoT platform where:
---> Uptime is non-negotiable
---> Latency is hunted
---> Failures are never allowed to repeat
Incidents don’t grow.
Problems don’t hide.
Uptime is enforced.
🧠 WHAT YOU’LL OWN
(Real Work. Real Impact.)
🔍 Total Observability
---> Real-time visibility across cloud, application, database & infrastructure
---> High-signal dashboards (Grafana + cloud-native tools)
---> Performance trends tracked before growth breaks systems
🚨 Smart Alerting (No Noise)
---> Alerts that fire only when action is required
---> Zero false positives. Zero alert fatigue
Right signal → right person → right time
⚙ Automation as a Weapon
---> End-to-end automation of operational tasks
---> Standardized logging, metrics & alerting
---> Systems that scale without human friction
🧯 Incident Command & Reliability
---> First responder for critical incidents (on-call rotation)
---> Root cause analysis across network, app, DB & storage
Fix fast — then harden so it never breaks the same way again
📘 Operational Excellence
---> Battle-tested runbooks
---> Documentation that actually works under pressure
Every incident → a stronger platform
🛠️ TECHNOLOGIES YOU’LL MASTER
☁ Cloud: AWS | Azure | Google Cloud
📊 Monitoring: Grafana | Metrics | Traces | Logs
📡 Alerting: Production-grade alerting systems
🌐 Networking: DNS | Routing | Load Balancers | Security
🗄 Databases: Production systems under real pressure
⚙ DevOps: Automation | Reliability Engineering
🎯 WHO WE’RE LOOKING FOR
Engineers who take uptime personally.
You bring:
---> 3+ years in Cloud Ops / DevOps / SRE
---> Live production SaaS experience
---> Deep AWS / Azure / GCP expertise
---> Strong monitoring & alerting experience
---> Solid networking fundamentals
---> Calm, methodical incident response
Bonus (Highly Preferred):
---> B2B SaaS + IoT / hybrid platforms
---> Strong automation mindset
---> Engineers who think in systems, not tickets
💼 JOB DETAILS
📍 Bengaluru
🏢 Hybrid (WFH)
💰 (Final CTC depends on experience & interviews)
🌟 WHY THIS ROLE?
Most cloud teams manage uptime. We weaponize it.
Your work won’t just keep systems running — it will keep customers confident, operations flawless, and competitors wondering how it all works so smoothly.
📩 APPLY / REFER : 🔗 Know someone who lives for reliability, observability & cloud excellence?
About GradRight
Our vision is to be the world’s leading Ed-Fin Tech company dedicated to making higher education accessible and affordable to all. Our mission is to drive transparency and accountability in the global higher education sector and create significant impact using the power of technology, data science and collaboration.
GradRight is the world’s first SaaS ecosystem that brings together students, universities and financial institutions in an integrated manner. It enables students to find and fund high return college education, universities to engage and select the best-fit students and banks to lend in an effective and efficient manner.
In the last three years, we have enabled students to get the best deals on more than $2.8 billion of loan requests and facilitated disbursements of more than $350 million in loans. GradRight won the HSBC Fintech Innovation Challenge supported by the Ministry of Electronics & IT, Government of India, and was among the top 7 global finalists in The PIEoneer Awards, UK.
GradRight’s team possesses extensive domestic and international experience in the launch and scale-up of premier higher education institutions. It is led by alumni of IIT Delhi, BITS Pilani, IIT Roorkee, ISB Hyderabad and University of Pennsylvania. GradRight is a Delaware, USA registered company with a wholly owned subsidiary in India.
About the Role
We are looking for a passionate DevOps Engineer with hands-on experience in AWS cloud infrastructure, containerization, and orchestration. The ideal candidate will be responsible for building, automating, and maintaining scalable cloud solutions, ensuring smooth CI/CD pipelines, and supporting development and operations teams.
Core Responsibilities
Design, implement, and manage scalable, secure, and highly available infrastructure on AWS.
Build and maintain CI/CD pipelines using tools like Jenkins, GitLab CI/CD, or GitHub Actions.
Containerize applications using Docker and manage deployments with Kubernetes (EKS, self-managed, or other distributions).
Monitor system performance, availability, and security using tools like CloudWatch, Prometheus, Grafana, ELK/EFK stack.
Collaborate with development teams to optimize application performance and deployment processes.
Required Skills & Experience
3–4 years of professional experience as a DevOps Engineer or in a similar role.
Strong expertise in AWS services (EC2, S3, RDS, Lambda, VPC, IAM, CloudWatch, EKS, etc.).
Hands-on experience with Docker and Kubernetes (EKS or self-hosted clusters).
Proficiency in CI/CD pipeline design and automation.
Experience with Infrastructure as Code (Terraform / AWS CloudFormation).
Solid understanding of Linux/Unix systems and shell scripting.
Knowledge of monitoring, logging, and alerting tools.
Familiarity with networking concepts (DNS, Load Balancing, Security Groups, Firewalls).
Basic programming/scripting experience in Python, Bash, or Go.
Nice to Have
Exposure to microservices architecture and service mesh (Istio/Linkerd).
Knowledge of serverless (AWS Lambda, API Gateway).
Must Have -
a. Background working with Startups
b. Good knowledge of Kubernetes & Docker
c. Background working in Azure
What you’ll be doing
- Ensure that our applications and environments are stable, scalable, secure and performing as expected.
- Proactively engage and work in alignment with cross-functional colleagues to understand their requirements, contributing to and providing suitable supporting solutions.
- Develop and introduce systems to aid and facilitate rapid growth, including implementing deployment policies, designing and implementing new procedures, configuration management, and planning of patches and capacity upgrades.
- Observability: ensure suitable levels of monitoring and alerting are in place to keep engineers aware of issues.
- Establish runbooks and procedures to keep outages to a minimum. Jump in before users notice that things are off track, then automate it for the future.
- Automate everything so that nothing is ever done manually in production.
- Identify and mitigate reliability and security risks. Make sure we are prepared for peak times, DDoS attacks and fat fingers.
- Troubleshoot issues across the whole stack - software, applications and network.
- Manage individual project priorities, deadlines, and deliverables as part of a self-organizing team.
- Learn and unlearn every day by exchanging knowledge and new insights, conducting constructive code reviews, and participating in retrospectives.
Requirements
- 2+ years of extensive experience in Linux server administration, including patching, packaging (rpm), performance tuning, networking, user management, and security.
- 2+ years of implementing systems that are highly available, secure, scalable, and self-healing on the Azure cloud platform.
- Strong understanding of networking, especially in cloud environments, along with a good understanding of CI/CD.
- Prior experience implementing industry-standard security best practices, including those recommended by Azure.
- Proficiency with Bash and any high-level scripting language.
- Basic working knowledge of observability stacks like ELK, Prometheus, Grafana, SigNoz, etc.
- Proficiency with Infrastructure as Code and Infrastructure Testing, preferably using Pulumi/Terraform.
- Hands-on experience in building and administering VMs and Containers using tools such as Docker/Kubernetes.
- Excellent communication skills, spoken as well as written, with a demonstrated ability to articulate technical problems and projects to all stakeholders.
- Candidates should be able to write sample programs using tools such as Bash, PowerShell, Python or shell scripting.
- Analytical/logical reasoning
- GitHub Actions: good working experience, including repository/workflow dispatch, writing reusable workflows, etc.
- AZ CLI: hands-on experience with AZ CLI commands.
We are looking for a DevOps Engineer (individual contributor) to maintain and build upon our next-generation infrastructure. We aim to ensure that our systems are secure, reliable and high-performing by constantly striving to achieve best-in-class infrastructure and security by:
- Leveraging a variety of tools to ensure all configuration is codified (using tools like Terraform and Flux) and applied in a secure, repeatable way (via CI)
- Routinely identifying new technologies and processes that enable us to streamline our operations and improve overall security
- Holistically monitoring our overall DevOps setup and health to ensure our roadmap constantly delivers high-impact improvements
- Eliminating toil by automating as many operational aspects of our day-to-day work as possible using internally created, third party and/or open-source tools
- Maintaining a culture of empowerment and self-service by minimizing friction for developers to understand and use our infrastructure through a combination of innovative tools, excellent documentation and teamwork
Tech stack: Microservices primarily written in JavaScript, Kotlin, Scala, and Python. The majority of our infrastructure sits within EKS on AWS, using Istio. We use Terraform and Helm/Flux when working with AWS and EKS (k8s). Deployments are managed with a combination of Jenkins and Flux. We rely heavily on Kafka, Cassandra, Mongo and Postgres and are increasingly leveraging AWS-managed services (e.g. RDS, Lambda).
**THIS IS A 100% WORK FROM OFFICE ROLE**
We are looking for an experienced DevOps engineer who will help our team establish DevOps practices. You will work closely with the technical lead to identify and establish these practices in the company.
You will help us build scalable, efficient cloud infrastructure. You’ll implement monitoring for automated system health checks. Lastly, you’ll build our CI pipeline, and train and guide the team in DevOps practices.
ROLE and RESPONSIBILITIES:
• Understanding customer requirements and project KPIs
• Implementing various development, testing, automation tools, and IT infrastructure
• Planning the team structure, activities, and involvement in project management
activities.
• Managing stakeholders and external interfaces
• Setting up tools and required infrastructure
• Defining and setting development, test, release, update, and support processes for
DevOps operation
• Have the technical skill to review, verify, and validate the software code developed in
the project.
• Troubleshooting and fixing code bugs
• Monitoring processes throughout the entire lifecycle for adherence, and updating or
creating processes to drive improvement and minimize waste
• Encouraging and building automated processes wherever possible
• Identifying and deploying cybersecurity measures by continuously performing
vulnerability assessment and risk management
• Incident management and root cause analysis
• Coordination and communication within the team and with customers
• Selecting and deploying appropriate CI/CD tools
• Strive for continuous improvement and build a continuous integration, continuous
delivery, and continuous deployment pipeline (CI/CD pipeline)
• Mentoring and guiding the team members
• Monitoring and measuring customer experience and KPIs
• Managing periodic reporting on the progress to the management and the customer
Essential Skills and Experience
Technical Skills
• 3+ years of proven experience as a DevOps engineer
• A bachelor’s degree or higher qualification in computer science
• The ability to code and script in multiple languages and automation frameworks
like Python, C#, Java, Perl, and Ruby, and to work with databases such as SQL Server, MySQL, and NoSQL stores
• An understanding of the best security practices and automating security testing and
updating in the CI/CD (continuous integration, continuous deployment) pipelines
• An ability to deploy monitoring and logging infrastructure using standard tooling
• Proficiency in container frameworks
• Mastery in the use of infrastructure automation toolsets like Terraform, Ansible, and command line interfaces for Microsoft Azure, Amazon AWS, and other cloud platforms
• Certification in Cloud Security
• An understanding of various operating systems
• A strong focus on automation and agile development
• Excellent communication and interpersonal skills
• An ability to work in a fast-paced environment and handle multiple projects
simultaneously
OTHER INFORMATION
The DevOps Engineer will also be expected to demonstrate their commitment:
• to gedu values and regulations, including equal opportunities policy.
• to gedu’s Social, Economic and Environmental responsibilities, minimising environmental impact in the performance of the role and actively contributing to the delivery of gedu’s Environmental Policy.
• to their Health and Safety responsibilities to ensure their contribution to a safe and secure working environment for staff, students, and other visitors to the campus.
Role – DevOps
Experience – 3 to 6 Years
Roles & Responsibilities –
- 3-6 years of experience in deploying and managing highly scalable, fault-resilient systems
- Strong experience in container orchestration and server automation tools such as Kubernetes, Google Container Engine, Docker Swarm, Ansible, Terraform
- Strong experience with Linux-based infrastructures, Linux/Unix administration, AWS, Google Cloud, Azure
- Strong experience with databases and data platforms such as MySQL, Hadoop, Elasticsearch, Redis, Cassandra, and MongoDB.
- Knowledge of programming and scripting languages such as Java, JavaScript, Python, PHP, Groovy, Bash.
- Experience in configuring CI/CD pipelines using Jenkins, GitLab CI, Travis.
- Proficient in technologies such as Docker, Kafka, Raft and Vagrant
- Experience in implementing queueing services such as RabbitMQ, Beanstalkd, Amazon SQS; knowledge of the Elastic Stack is a plus.










