
What you will do:
- Handling Configuration Management, Web Services Architectures, DevOps Implementation, Build & Release Management, Database management, Backups and monitoring
- Logging, metrics and alerting management
- Creating Dockerfiles
- Performing root cause analysis for production errors
What you need to have:
- 12+ years of experience in Software Development/ QA/ Software Deployment with 5+ years of experience in managing high performing teams
- Proficiency in VMware, AWS, and cloud application development and deployment
- Good knowledge in Java, Node.js
- Experience working with RESTful APIs, JSON, etc.
- Experience with Unit/ Functional automation is a plus
- Experience with MySQL, MongoDB, Redis, RabbitMQ
- Proficiency in Jenkins, Ansible, Terraform/Chef/Ant
- Proficiency in Linux-based operating systems
- Proficiency in cloud infrastructure tools such as Docker and Kubernetes
- Strong problem solving and analytical skills
- Good written and oral communication skills
- Sound understanding of areas of Computer Science such as algorithms, data structures, object-oriented design, databases
- Proficiency in monitoring and observability

Key responsibilities
• Design, build, and maintain robust CI/CD pipelines using Azure DevOps Services (Azure Pipelines) and Git-based workflows.
• Implement and manage infrastructure as code (IaC) using ARM templates, Bicep, and/or Terraform for repeatable environment provisioning.
• Containerize applications (Docker) and manage container orchestration platforms such as AKS (Azure Kubernetes Service).
• Automate build, test, release, and rollback processes; integrate automated testing and quality gates into pipelines.
• Monitor and improve platform reliability and observability using logging and monitoring tools (e.g., Azure Monitor, Application Insights, Prometheus, Grafana).
• Drive platform security and compliance through pipeline controls, secrets management (Key Vault / Vault), and secure configuration practices.
• Implement cost-optimization and governance for Azure resources (tags, policies, budgets).
• Troubleshoot build/release failures, production incidents, and performance bottlenecks; perform root-cause analysis and implement permanent fixes.
• Mentor developers in Git workflows, pipeline authoring, best practices for IaC, and cloud-native design.
• Maintain clear documentation: runbooks, deployment playbooks, architecture diagrams, and pipeline templates.
Required skills & experience
• 4+ years hands-on experience working with Azure and cloud-native application delivery.
• Deep experience with Azure DevOps (Repos, Pipelines, Artifacts, Boards).
• Strong IaC skills with Terraform, ARM templates, or Bicep.
• Solid experience with CI/CD design and YAML pipeline authoring.
• Practical knowledge of containerization (Docker) and Kubernetes — preferably AKS.
• Scripting skills: PowerShell, Bash, and/or Python for automation.
• Experience with Git workflows (branching strategies, PRs, code reviews).
• Familiarity with configuration management and secrets management (Azure Key Vault, HashiCorp Vault).
• Understanding of networking, identity (Azure AD), and security fundamentals in Azure.
• Strong troubleshooting, debugging, and incident response skills.
• Good collaboration and communication skills; ability to work across teams.
Certification
AZ-400 (Microsoft Certified: DevOps Engineer Expert), AZ-104, AZ-305, or Terraform Associate.
The DevOps Engineer will play a critical role in operationalizing artificial intelligence across Bell Techlogix client environments. This role focuses on building and supporting cloud infrastructure, CI/CD pipelines, and automation frameworks that power AI and machine learning workloads. The ideal candidate has experience supporting AI platforms such as Azure AI, Azure Machine Learning, Azure OpenAI, and ServiceNow or conversational AI platforms, and understands the operational requirements of production AI systems, including reliability, scalability, and security.
Key Responsibilities
•Design, build, and operate cloud infrastructure and platform services that support AI and machine learning workloads in production, SLA-driven managed services environments
•Implement CI/CD and MLOps pipelines to enable automated training, testing, deployment, and rollback of AI and ML models
•Develop and maintain Infrastructure as Code to provision AI-ready environments consistently across dev/test/prod
•Support AI platform operations including monitoring model health, pipeline execution, compute utilization, and data dependencies
•Partner with Machine Learning Engineers and Data Engineers to standardize deployment patterns for AI services and LLM-based solutions
•Enable secure and scalable AI integrations using APIs, messaging, and event-driven architectures
•Implement observability solutions for AI platforms, including logging, metrics, alerting, and drift detection integrations
•Troubleshoot AI platform incidents, perform root cause analysis, and implement remediation to improve reliability and automation coverage
•Apply security best practices for AI environments including secrets management, identity and access controls, network isolation, and policy enforcement
•Support AI-driven automation use cases across platforms such as Microsoft Copilot, ServiceNow, and conversational AI tools
•Collaborate with service desk, security, and architecture teams to continuously improve AI service delivery and operational maturity
Required Qualifications
•Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience
•5+ years of experience in DevOps, cloud engineering, or platform operations, with exposure to AI or data workloads
•Hands-on experience with Microsoft Azure, including compute, networking, storage, and monitoring services
•Experience building CI/CD pipelines using Azure DevOps, GitHub Actions, or similar tools
•Working knowledge of Infrastructure as Code (Terraform and/or Bicep/ARM)
•Scripting experience using PowerShell and/or Python
•Experience supporting production platforms with incident management, change control, and root cause analysis
•Understanding of cloud security fundamentals and enterprise governance requirements
Preferred Qualifications
•Experience with Azure Machine Learning, Azure AI Services, Azure OpenAI, or MLOps frameworks
•Exposure to containerization and orchestration technologies (Docker, Kubernetes, AKS)
•Experience supporting data pipelines or feature stores used by machine learning systems
•Familiarity with ServiceNow, AI-driven ITSM workflows, or automation platforms
•Experience with observability tools
•Knowledge of Responsible AI, data governance, and compliance considerations for AI systems
•Relevant certifications (Microsoft Azure Administrator, Azure DevOps Engineer, Azure AI Engineer)
- Public clouds, such as AWS, Azure, or Google Cloud Platform
- Automation technologies, such as Kubernetes or Jenkins
- Configuration management tools, such as Puppet or Chef
- Scripting languages, such as Python or Ruby
- Preferred: development experience with Kafka or big data technologies, including an understanding of essential Kafka components such as ZooKeeper and brokers, and optimization of Kafka client applications (producers and consumers).
- Experience with automation of infrastructure, testing, database deployment, and logging/monitoring/alerting.
- AWS services experience with CloudFormation, ECS, Elastic Container Registry, Pipelines, CloudWatch, Glue, and other related services.
- AWS Elastic Kubernetes Service (EKS): managing and auto-scaling Kubernetes and containers.
- Good knowledge of and hands-on experience with AWS services such as EC2, RDS, EKS, S3, Lambda, API, CloudWatch, etc.
- Quick, accurate log analysis to perform root cause analysis (RCA) on production deployments and container errors in CloudWatch.
- Working on ways to automate and improve deployment and release processes.
- Strong understanding of serverless architecture concepts.
- Good with deployment automation tools and with investigating and resolving technical issues.
- Sound knowledge of APIs, databases, and container-based ETL jobs.
- Planning projects and taking part in project management decisions.
Soft Skills
- Adaptability
- Collaboration with different teams
- Good communication skills
- Team player attitude
- Work towards improving the following 4 verticals for the company's workflows and products: scalability, availability, security, and cost.
- Help provision, manage, and optimize cloud infrastructure in AWS (IAM, EC2, RDS, CloudFront, S3, ECS, Lambda, ELK, etc.)
- Work with the development teams to design scalable, robust systems using cloud architecture for both 0-to-1 and 1-to-100 products.
- Drive technical initiatives and architectural service improvements.
- Be able to predict problems and implement solutions that detect and prevent outages.
- Mentor/manage a team of engineers.
- Design solutions with failure scenarios in mind to ensure reliability.
- Document rigorously to keep track of all changes/upgrades to the infrastructure and to share knowledge with the rest of the team
- Identify vulnerabilities during development with actionable information to empower developers to remediate vulnerabilities
- Automate the build and testing processes to consistently integrate code
- Manage changes to documents, software, images, large web sites, and other collections of code, configuration, and metadata among disparate teams
- Azure DevOps: working experience with Azure YAML pipelines. (Note: some candidates say they have worked with YAML, but for Jenkins rather than Azure DevOps.)
- Azure: infrastructure automation using Terraform/ARM templates. (Note: some candidates say they have worked with Terraform, but for AWS rather than Azure. Please confirm Terraform experience for Azure infrastructure automation.)
- PowerShell scripting to automate and deploy .NET applications.
As DevOps Engineer, you'll be part of the team building the stage for our Software Engineers to work on, helping to enhance our product performance and reliability.
Responsibilities:
- Build & operate infrastructure to support the website, backend cluster, and ML projects in the organization.
- Helping teams become more autonomous and allowing the Operations team to focus on improving the infrastructure and optimizing processes.
- Delivering system management tooling to the engineering teams.
- Working on your own applications which will be used internally.
- Contributing to open source projects that we are using (or that we may start).
- Be an advocate for engineering best practices in and out of the company.
- Organizing tech talks, participating in meetups, and representing Box8 at industry events.
- Sharing pager duty for the rare instances of something serious happening.
- Collaborate with other developers to understand & set up tooling needed for Continuous Integration/Delivery/Deployment (CI/CD) practices.
Requirements:
- 1+ years of industry experience.
- Scale existing back-end systems to handle ever-increasing amounts of traffic and new product requirements.
- Ruby on Rails or Python and Bash/Shell skills.
- Experience managing complex systems at scale.
- Experience with Docker, rkt or similar container engine.
- Experience with Kubernetes or similar clustering solutions.
- Experience with tools such as Ansible or Chef.
- Understanding of the importance of smart metrics and alerting.
- Hands-on experience with cloud infrastructure provisioning, deployment, monitoring (we are on AWS and use ECS, ELB, EC2, ElastiCache, Elasticsearch, S3, CloudWatch).
- Experience with relational SQL and NoSQL databases, including Postgres and Cassandra.
- Knowledge of data pipeline and workflow management tools: Azkaban, Luigi, Airflow, etc.
- Experience working on Linux-based servers.
- Managing large-scale, production-grade infrastructure on AWS Cloud.
- Good knowledge of scripting languages such as Ruby, Python, or Bash.
- Experience creating a deployment pipeline from scratch.
- Expertise in any of the CI tools, preferably Jenkins.
- Good knowledge of Docker containers and their usage.
- Experience using infrastructure/application monitoring tools such as CloudWatch, New Relic, or Sensu.
Good to have:
- Knowledge of Ruby on Rails based applications and their deployment methodologies.
- Experience working with container orchestration tools such as Kubernetes/ECS/Mesos.
- Extra points for experience with front-end development, New Relic, GCP, Kafka, and Elasticsearch.
● Develop and deliver automation software required for building & improving the functionality, reliability, availability, and manageability of applications and cloud platforms
● Champion and drive the adoption of Infrastructure as Code (IaC) practices and mindset
● Design, architect, and build self-service, self-healing, synthetic monitoring and alerting platforms and tools
● Automate the development and test processes through a CI/CD pipeline (Git, Jenkins, SonarQube, Artifactory, Docker containers)
● Build container hosting-platform using Kubernetes
● Introduce new cloud technologies, tools & processes to keep innovating in the commerce area and drive greater business value.
Skills Required:
● Excellent written and verbal communication skills; a good listener.
● Proficiency in deploying and maintaining cloud-based infrastructure services (AWS, GCP, Azure; good hands-on experience in at least one of them)
● Well versed with service-oriented architecture, cloud-based web services architecture, design patterns and frameworks.
● Good knowledge of cloud-related services like compute, storage, network, messaging (e.g., SNS, SQS) and automation (e.g., CFT/Terraform).
● Experience with relational SQL and NoSQL databases, including Postgres and Cassandra.
● Experience with systems management/automation tools (Puppet/Chef/Ansible, Terraform)
● Strong Linux System Admin Experience with excellent troubleshooting and problem solving skills
● Hands-on experience with languages (Bash/Python/Core Java/Scala)
● Experience with CI/CD pipelines (Jenkins, Git, Maven, etc.)
● Experience integrating solutions in a multi-region environment
● Self-motivated, able to learn quickly and deliver results with minimal supervision
● Experience with Agile/Scrum/DevOps software development methodologies.
Nice to Have:
● Experience in setting up the Elasticsearch, Logstash, Kibana (ELK) stack.
● Experience working with large-scale data.
● Experience with monitoring tools such as Splunk, Nagios, Grafana, DataDog, etc.
● Previous experience working with distributed architectures such as Hadoop, MapReduce, etc.









