
About BootLabs:
BootLabs is a boutique tech consulting and digital engineering company that partners with leading enterprises to build and support scalable, cloud-native, and data-driven platforms. We specialise in delivering high-impact solutions across cloud, AI/GenAI, platform engineering, and enterprise application support. Our teams work closely with global clients across highly regulated industries, including BFSI, Healthcare, E-commerce, and other domains, ensuring reliability, security, and operational excellence for mission-critical systems.
JD:
Extensive experience in deploying, and supporting enterprise workloads across Azure IaaS, PaaS, and SaaS environments.
Experience managing enterprise Landing Zones & Azure Policies.
Exposure to Azure Data Lake and Azure Databricks (basic to intermediate).
Knowledge of Terraform and any DevOps CI/CD pipelines.
Hybrid connectivity experience (VPN Gateway / ExpressRoute).
Strong networking fundamentals (NSG, UDR, Peering).
OS-level troubleshooting (Windows & Linux).
Experience with enterprise security tools such as Prisma / Cortex or similar vulnerability & endpoint security tools
Experience in production support, governance, and security guardrails.
Experience handling production incidents, RCA preparation, and performance tuning. Change management & CAB coordination.
Hands-on experience with native Azure security services like Azure Firewall, Azure Application Gateway with WAF etc
Cost optimization ,Monitoring & alerting implementation

About Bootlabs Technologies Private Limited
About
BootLabs is a boutique consulting partner focused on helping enterprises build, scale, and modernize their DevSecOps and cloud platforms. We’re driven by a clear goal—creating future-ready, intelligent systems through strong cloud automation while keeping total cost of ownership in check.
We support organizations in accelerating development and enabling consistent, reliable releases through DevOps automation, including CI/CD and infrastructure as code. Our managed services ensure your cloud platforms stay secure, updated, and high-performing, so your teams can stay focused on core business priorities.
With expertise in site reliability engineering, we help identify system bottlenecks, improve observability, and deliver meaningful insights for faster resolution. We also work closely on cost optimization, ensuring efficient cloud usage with the right controls and scaling mechanisms—so performance improves without unnecessary spend.
Candid answers by the company
Mumbai, Bangalore
Photos
Connect with the team
Similar jobs
REVIEW CRITERIA:
MANDATORY:
- Strong Hands-On AWS Cloud Engineering / DevOps Profile
- Mandatory (Experience 1): Must have 12+ years of experience in AWS Cloud Engineering / Cloud Operations / Application Support
- Mandatory (Experience 2): Must have strong hands-on experience supporting AWS production environments (EC2, VPC, IAM, S3, ALB, CloudWatch)
- Mandatory (Infrastructure as a code): Must have hands-on Infrastructure as Code experience using Terraform in production environments
- Mandatory (AWS Networking): Strong understanding of AWS networking and connectivity (VPC design, routing, NAT, load balancers, hybrid connectivity basics)
- Mandatory (Cost Optimization): Exposure to cost optimization and usage tracking in AWS environments
- Mandatory (Core Skills): Experience handling monitoring, alerts, incident management, and root cause analysis
- Mandatory (Soft Skills): Strong communication skills and stakeholder coordination skills
ROLE & RESPONSIBILITIES:
We are looking for a hands-on AWS Cloud Engineer to support day-to-day cloud operations, automation, and reliability of AWS environments. This role works closely with the Cloud Operations Lead, DevOps, Security, and Application teams to ensure stable, secure, and cost-effective cloud platforms.
KEY RESPONSIBILITIES:
- Operate and support AWS production environments across multiple accounts
- Manage infrastructure using Terraform and support CI/CD pipelines
- Support Amazon EKS clusters, upgrades, scaling, and troubleshooting
- Build and manage Docker images and push to Amazon ECR
- Monitor systems using CloudWatch and third-party tools; respond to incidents
- Support AWS networking (VPCs, NAT, Transit Gateway, VPN/DX)
- Assist with cost optimization, tagging, and governance standards
- Automate operational tasks using Python, Lambda, and Systems Manager
IDEAL CANDIDATE:
- Strong hands-on AWS experience (EC2, VPC, IAM, S3, ALB, CloudWatch)
- Experience with Terraform and Git-based workflows
- Hands-on experience with Kubernetes / EKS
- Experience with CI/CD tools (GitHub Actions, Jenkins, etc.)
- Scripting experience in Python or Bash
- Understanding of monitoring, incident management, and cloud security basics
NICE TO HAVE:
- AWS Associate-level certifications
- Experience with Karpenter, Prometheus, New Relic
- Exposure to FinOps and cost optimization practices
The DevOps Engineer will play a critical role in operationalizing artificial intelligence across Bell Techlogix client environments. This role focuses on building and supporting cloud infrastructure, CI/CD pipelines, and automation frameworks that power AI and machine learning workloads. The ideal candidate has experience supporting AI platforms such as Azure AI, Azure Machine Learning, Azure OpenAI, and ServiceNow or conversational AI platforms, and understands the operational requirements of production AI systems, including reliability, scalability, and security.
Key Responsibilities
•Design, build, and operate cloud infrastructure and platform services that support AI and machine learning workloads in production, SLA-driven managed services environments
•Implement CI/CD and MLOps pipelines to enable automated training, testing, deployment, and rollback of AI and ML models
•Develop and maintain Infrastructure as Code to provision AI-ready environments consistently across dev/test/prod
•Support AI platform operations including monitoring model health, pipeline execution, compute utilization, and data dependencies
•Partner with Machine Learning Engineers and Data Engineers to standardize deployment patterns for AI services and LLM-based solutions
•Enable secure and scalable AI integrations using APIs, messaging, and event-driven architectures
•Implement observability solutions for AI platforms, including logging, metrics, alerting, and drift detection integrations
•Troubleshoot AI platform incidents, perform root cause analysis, and implement remediation to improve reliability and automation coverage
•Apply security best practices for AI environments including secrets management, identity and access controls, network isolation, and policy enforcement
•Support AI-driven automation use cases across platforms such as Microsoft Copilot, ServiceNow, and conversational AI tools
•Collaborate with service desk, security, and architecture teams to continuously improve AI service delivery and operational maturity
Required Qualifications
•Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience
•5+ years of experience in DevOps, cloud engineering, or platform operations, with exposure to AI or data workloads
•Hands-on experience with Microsoft Azure, including compute, networking, storage, and monitoring services
•Experience building CI/CD pipelines using Azure DevOps, GitHub Actions, or similar tools
•Working knowledge of Infrastructure as Code (Terraform and/or Bicep/ARM)
•Scripting experience using PowerShell and/or Python
•Experience supporting production platforms with incident management, change control, and root cause analysis
•Understanding of cloud security fundamentals and enterprise governance requirements
Preferred Qualifications
•Experience with Azure Machine Learning, Azure AI Services, Azure OpenAI, or MLOps frameworks
•Exposure to containerization and orchestration technologies (Docker, Kubernetes, AKS)
•Experience supporting data pipelines or feature stores used by machine learning systems
•Familiarity with ServiceNow, AI-driven ITSM workflows, or automation platforms
•Experience with observability tools
•Knowledge of Responsible AI, data governance, and compliance considerations for AI systems
•Relevant certifications (Microsoft Azure Administrator, Azure DevOps Engineer, Azure AI Engineer)
- 3+ years hands-on Azure cloud & automation experience.
- Experience managing high-availability enterprise systems.
- Microsoft Azure (AKS, VNets, App Gateway, Load Balancers).
- Kubernetes (AKS) & Docker.
- Networking (VPN, DNS, routing, firewalls, NSGs).
- Infra-as-Code (Terraform / Bicep optional).
- Monitoring tools: Azure Monitor, Grafana, Prometheus.
- CI/CD: Azure DevOps, GitLab/Jenkins (added advantage).
- Security: Key Vault, certificates, encryption, RBAC.
- Understanding of PostgreSQL/PostGIS networking.
- Design and manage Azure infrastructure (VMs, VNets, NSGs, Load Balancers, AKS, Storage).
- Deploy and maintain AKS workloads for NiFi, PostGIS, and microservices.
- Architect secure network topology including VNet peering, VPNs, Private Endpoints, DNS & Zero Trust policies.
- Implement monitoring and alerting using Azure Monitor, Log Analytics, Grafana & Prometheus.
- Ensure high uptime, DR planning, backup and failover strategies.
- Automate deployments with Azure DevOps, Helm, ArgoCD & GitOps principles.
- Enforce security, RBAC, compliance, and audit standards across environments.
- Good to have knowledge/experince in Linux administration (Ubuntu/Debian).
Candidate must have a minimum of 8+ years of IT experience
IST time zone.
candidates should have hands-on experience in
DevOps (GitLab, Artifactory, SonarQube, AquaSec, Terraform & Docker / K8")
Thanks & Regards,
Anitha. K
TAG Specialist
DevOps Engineer
KNOLSKAPE is looking for a DevOps Engineer to help us build Educational platforms and products that make learning experiential for leaders of the world.
DevOps Engineer responsibilities include deploying product updates, identifying production issues, and implementing integrations that meet customer needs. If you have a solid background in working with cloud technologies, setting up efficient deployment processes, and are motivated to work with diverse and talented teams, we’d like to meet you.
Ultimately, you will execute and automate operational processes fast, accurately, and securely.
Skills and Experience
- 2+ years of experience in building infrastructure experience with Cloud Providers ( AWS / Azure / GCP)
- Build and Deployment Management (Gitlab).
- Experience in writing automation scripts using Shell, Python, and Terraform based.
- Good experience in building pipelines with YAML-based knowledge of the GitLab environment.
- System Administration skill set.
- Docker/Kubernetes container infrastructure and orchestration
- Deploying/operating NodeJs/PHP/LAMP framework-based clusters with infrastructure.
- Monitoring, metrics collection, and distributed tracing
- Infrastructure as code” – Experience with Terraform preferred.
- Strong AWS Deployment Experience
- Provide system-level technical support
- Desire to learn new technologies while supporting existing
Roles and Responsibilities
- End to End-building CI/CD pipelines using tools like Jenkins and Jenkins Pipelines etc.
- Build CI/CD pipelines to orchestrate provisioning and deployment of both large scale systems
- Develop and implement instrumentation for monitoring the health and availability of services including fault detection, alerting, triage, and recovery (automated and manual)
- Develop, improve, and thoroughly document operational practices and procedures.
- Perform tasks related to securing and keeping the products, tools, and processes you are responsible for securing our infrastructure.
- Agile software development practices
- Understand IT processes, including architecture, design, implementation, and operations
- Open Source development experience
- Self-motivated, able and willing to help where help is needed
- Able to build relationships, be culturally sensitive, have goal alignment, have learning agility
Location: Bangalore
About KNOLSKAPE
KNOLSKAPE is an end-to-end learning and assessment platform for accelerated employee development. Our core belief is that desired business outcomes are achieved best when learning needs are aligned with business requirements, but traditional methodologies for capability development require a new, more updated approach. Keeping with this philosophy, we offer engaging, immersive, and experiential learning and assessment solutions - strategy cascading, business acumen, change management, leadership pipeline, digital capabilities, and talent assessments. Leveraging a blended omnichannel delivery model, KNOLSKAPE offers instructor-led classroom sessions, live virtual sessions, and self-paced courses to suit every learning need.
More than 300 clients in 25 countries have benefited from KNOLSKAPE's award-winning experiential solutions. A 120+ strong team based out of offices in Singapore, India, Malaysia, and the USA serves a rapidly growing global client base across industries such as banking and finance, consulting, IT, FMCG, retail, manufacturing, infrastructure, pharmaceuticals, engineering, auto, government and academia.
KNOLSKAPE is a global Top 20 gamification company, recipient of numerous Brandon Hall awards, and has been recognized as a company to watch for in the Talent Management Space, by Frost & Sullivan, and as a disruptor in the learning space, by Bersin by Deloitte.
4 – 6 years of application development with design, development, implementation, and
support experience, including the following:
o C#
o JavaScript
o HTML
o SQL
o Messaging/RabbitMQ
o Asynchronous communication patterns
Experience with Visual Studio and Git
A working understanding of build and release automation, preferably with Azure DevOps
Excellent understanding of object-oriented concepts and .Net framework
Experience in creating reusable libraries in C#
Ability to troubleshoot and isolate/solve complex bugs, connectivity issues, or OS related
issues
Ability to write complex SQL queries and stored procedures in Oracle and/or MS SQL
Proven ability to use design patterns to accomplish scalable architecture
Understanding of event-driven architecture
Experience with message brokers such as RabbitMQ
Experience in the development of REST APIs
Understanding of basic steps of an Agile SDLC
Excellent communication (both written and verbal) and interpersonal skills
Demonstrated accountability and ownership of assigned tasks
Demonstrated leadership and ability to work as a leader on large and complex tasks
Hands on Experience with Linux administration
Experience using Python or Shell scripting (for Automation)
Hands-on experience with Implementation of CI/CD Processes
Experience working with one cloud platforms (AWS or Azure or Google)
Experience working with configuration management tools such as Ansible & Chef
Experience working with Containerization tool Docker.
Experience working with Container Orchestration tool Kubernetes.
Experience in source Control Management including SVN and/or Bitbucket
& GitHub
Experience with setup & management of monitoring tools like Nagios, Sensu & Prometheus or any other popular tools
Hands-on experience in Linux, Scripting Language & AWS is mandatory
Troubleshoot and Triage development, Production issues
Responsibilities
- Building and maintenance of resilient and scalable production infrastructure
- Improvement of monitoring systems
- Creation and support of development automation processes (CI / CD)
- Participation in infrastructure development
- Detection of problems in architecture and proposing of solutions for solving them
- Creation of tasks for system improvements for system scalability, performance and monitoring
- Analysis of product requirements in the aspect of devops
- Managing a team of DevOps, control of task deliveries
- Incident analysis and fixing
Technology stack
Linux, Bash, Salt/Ansible, LXC, libvirt, IPsec, VXLAN, Open vSwitch, OpenVPN, OSPF, BIRD, Cisco NX-OS, Multicast, PIM, LVM, software RAID, LUKS, PostgreSQL, nginx, haproxy, Prometheus, Grafana, Zabbix, GitLab, Capistrano
Skills and Experience
- Understanding of the distributed systems principles
- Understanding of principles for building a resistant network infrastructure
- Experience of Ubuntu Linux administration (Debian-like will be a plus)
- Strong knowledge of Bash
- Experience of working with LXC-containers
- Understanding and experience with infrastructure as a code approach
- Experience of development idempotent Ansible roles
- Experience with relational databases (PostgeSQL), ability to create simple SQL queries
- Experience with git
- Experience with monitoring and metric collect systems (Prometheus, Grafana, Zabbix)
- Understanding of dynamic routing (OSPF)
Preferred experience
- Experience of working with highload zero-downtown environments
- Experience of coding on Python
- Experience of working with IPsec, VXLAN, Open vSwitch
- Knowledge and experience of working with network equipment Cisco
- Experience of working with Cisco NX-OS
- Knowledge of principles of multicast protocols IGMP, PIM
- Experience of setting multicast on Cisco equipment
- Experience of working with Solarflare Onload
- Experience administering Atlassian products
What will you do?
- Setup, manage Applications with automation, DevOps, and CI/CD tools.
- Deploy, Maintain and Monitor Infrastructure and Services.
- Automate code and Infra Deployments.
- Tune, optimize and keep systems up to date.
- Design and implement deployment strategies.
- Setup infrastructure in cloud platforms like AWS, Azure, Google Cloud, IBM cloud, Digital Ocean etc as per requirement.
DevOps Engineer Skills Building a scalable and highly available infrastructure for data science Knows data science project workflows Hands-on with deployment patterns for online/offline predictions (server/serverless)
Experience with either terraform or Kubernetes
Experience of ML deployment frameworks like Kubeflow, MLflow, SageMaker Working knowledge of Jenkins or similar tool Responsibilities Owns all the ML cloud infrastructure (AWS) Help builds out an entirely CI/CD ecosystem with auto-scaling Work with a testing engineer to design testing methodologies for ML APIs Ability to research & implement new technologies Help with cost optimizations of infrastructure.
Knowledge sharing Nice to Have Develop APIs for machine learning Can write Python servers for ML systems with API frameworks Understanding of task queue frameworks like Celery












