Responsibilities
- Building and maintenance of resilient and scalable production infrastructure
Improvement of monitoring systems
- Improvement of monitoring systems
- Creation and support of development automation processes (CI / CD)
- Participation in infrastructure development
- Detection of problems in architecture and proposing of solutions for solving them
- Creation of tasks for system improvements for system scalability, performance and monitoring
- Analysis of product requirements in the aspect of devops
- Incident analysis and fixing
Skills and Experience
- Understanding of the distributed systems principles
- Understanding of principles for building a resistant network infrastructure
- Experience of Ubuntu Linux administration (Debian-like will be a plus)
- Strong knowledge of Bash
- Experience of working with LXC-containers
- Understanding and experience with infrastructure as a code approach
- Experience of development idempotent Ansible roles
- Experience of working with git
Preferred experience
Experience with relational databases (PostgeSQL), ability to create simple SQL queries
- Experience with monitoring and metric collect systems (Prometheus, Grafana, Zabbix)
- Understanding of dynamic routing (OSPF)
- Knowledge and experience of working with network equipment Cisco
- Experience of working with Cisco NX-OS
- Experience of working with IPsec, VXLAN, Open vSwitch
- Knowledge of principles of multicast protocols IGMP, PIM
- Experience of setting multicast on Cisco equipment
- Experience administering Atlassian products

Similar jobs
Job Title : Senior DevOps Engineer
Experience : 5+ Years
Location : Gurgaon, Sector 39
About the Role :
We are seeking an experienced Senior DevOps Engineer to lead our DevOps practices, manage a small team, and build functional, scalable systems that enhance customer experience. You will be responsible for deployments, automation, troubleshooting, integrations, monitoring, and team mentoring while ensuring secure and efficient operations.
Mandatory Skills :
Linux Administration, Shell Scripting, CI/CD (Jenkins), Git/GitHub, Docker, Kubernetes, AWS, Ansible, Database Administration (MariaDB/MySQL/MongoDB), Apache httpd/Tomcat, HAProxy, Nagios, Keepalived, Monitoring/Logging/Alerting, and On-premise Server Management.
Key Responsibilities :
- Implement and manage integrations as per business and customer needs.
- Deploy product updates, fixes, and enhancements.
- Provide Level 2 technical support and resolve production issues.
- Build tools to reduce errors and improve system performance.
- Develop scripts and automation for CI/CD, monitoring, and visualization.
- Perform root cause analysis of incidents and implement long-term fixes.
- Ensure robust monitoring, logging, and alerting systems are in place.
- Manage on-premise servers and ensure smooth deployments.
- Collaborate with development teams for system integration.
- Mentor and guide a team of 3 to 4 engineers.
Required Qualifications & Experience :
- Bachelor’s degree in Computer Science, Software Engineering, IT, or related field (Master’s preferred).
- 5+ years of experience in DevOps engineering with team management exposure.
- Strong expertise in:
- Linux Administration & Shell Scripting
- CI/CD pipelines (Jenkins or similar)
- Git/GitHub, branching, and code repository standards
- Docker, Kubernetes, AWS, Ansible
- Database administration (MariaDB, MySQL, MongoDB)
- Web servers (Apache httpd, Apache Tomcat)
- Networking & Load Balancing tools (HAProxy, Keepalived)
- Monitoring & alerting tools (Nagios, logging frameworks)
- On-premise server management
- Strong debugging, automation, and system troubleshooting skills.
- Knowledge of security best practices including data encryption.
Personal Attributes :
- Excellent problem-solving and analytical skills.
- Strong communication and leadership abilities.
- Detail-oriented with a focus on reliability and performance.
- Ability to mentor juniors and collaborate with cross-functional teams.
- Keen interest in emerging DevOps and cloud trends.

Location: Bangalore
Experience: 2–5 years
Type: Full-time | On-site
Start: Immediate
Why this role exists
Most systems don’t fail because of one big outage.
They fail because reliability is treated as an afterthought.
Right now, uptime depends too much on individual heroics.
That doesn’t scale.
This role exists to build a reliability system where:
- Uptime is predictable
- Failures are contained
- Escalations don’t depend on leadership
What you’ll do
You will not just monitor systems.
You will own reliability as a product.
1. Drive uptime to production-grade reliability
- Improve system uptime to 99.9% customer-facing SLA within 4 months
- Define and track:
- SLAs / SLOs / error budgets
- Ensure reliability is measured from the customer’s perspective, not internal metrics
2. Build incident response as a system
- Set up a 24/7 incident response rotation across 3 engineers
- Eliminate dependency on leadership (no single escalation point)
- Define:
- Incident severity levels
- Response playbooks
- Escalation protocols
- Ensure fast detection → containment → resolution
3. Contain and fix erratic system behavior
- Identify and resolve:
- Latency spikes
- Downtime incidents
- Integration failures
- Build guardrails to prevent recurrence
- Focus on root cause elimination, not temporary fixes
4. Create continuous reliability feedback loops
- Work closely with engineering teams to:
- Surface recurring failure patterns
- Improve build quality
- Reduce production bugs
- Ensure learnings from incidents directly improve future releases
5. Improve observability and monitoring
- Build dashboards and alerts for:
- System health
- Performance metrics
- Failure signals
- Ensure issues are detected before customers report them
6. Reduce operational fragility
- Remove single points of failure (people, systems, workflows)
- Improve system resilience across:
- Deployments
- Integrations
- Runtime environments
What success looks like
- Uptime reaches 99.9%+ reliably
- Incidents are:
- Detected early
- Contained quickly
- Resolved permanently
- No dependency on a single individual for escalation
- System behavior becomes predictable and stable
- Engineering teams ship with higher reliability confidence
Who you are
- You have 2-5 years of experience in SRE / DevOps / backend systems
- You have worked on production systems with real uptime expectations
- You think in:
- Systems
- Failure modes
- Trade-offs
- You are comfortable debugging live, high-pressure environments
What will make you stand out
- Experience with:
- Distributed systems
- Cloud infrastructure (AWS / Azure / GCP)
- Monitoring & alerting tools
- Have built or improved:
- Incident response systems
- Reliability frameworks
- Strong debugging skills across:
- Infra
- Application
- Integrations
Compensation
₹60,000/month (fixed)
(Aligned with role scope and impact expectations)
Why join
- You will define reliability standards for a production AI platform
- Your work directly impacts:
- Customer trust
- Product performance
- Enterprise readiness
- You will move the system from reactive → predictable
What this role is not
- Not just monitoring dashboards
- Not limited to handling tickets
- Not dependent on escalation to leadership
What this role is
- A builder of reliability systems
- A guardian of uptime and performance
- A multiplier of engineering quality
One question to self-evaluate
Can you build a system where downtime is rare, predictable, and never dependent on a single person?
Objectives of this role
•Building and implementing new development tools and infrastructure
•Understanding the needs of stakeholders and conveying them to developers
•Working on ways to automate and improve development and release processes
•Testing and examining code written by others and analysing results
•Ensuring that systems are safe and secure against cybersecurity threats
•Identifying technical problems and developing software updates and fixes
•Working with software developers and software engineers to ensure that development follows established processes and works as intended
•Planning projects and being involved in project management decisions
Responsibilities:
• Set up CI/CD pipelines for automated deployment and delivery
•Setup and management of new and Existing cloud-based Kubernetes cluster services
•Write Ad/Hoc Bash/Python scripts to automate certain operational tasks.
•Designing, maintenance and management of tools for automation of different operational processes.
•Provision of critical system security by leveraging best practices and prolific cloud security solutions.
•System troubleshooting and problem resolution across various application domains and platforms
•Support/maintain development, UAT and production infrastructure.
•Providing recommendations for architecture and process improvements.
•Respond to L2 calls and emails.
•Help administer monitoring systems, alerting, log management, and other IT infrastructure systems.
•Perform root cause analysis of production errors and resolve technical issues
•Design procedures for system troubleshooting and maintenance
Technical Skill Requirements:
•Experience in a DevOps role in AWS/OCI cloud environment.
•Must have experience with CI/CD Pipelines and hands-on experience with DevOps tools such as, Jenkins, Git, Docker, Kubernetes, Ansible, etc.
•Strong knowledge in Terraform for multi-stack cloud infrastructure provisioning.
•Strong knowledge in OCI/AWS-based Kubernetes service management.
•Must have experience with Python/Bash as a scripting language.
•Good knowledge in software debugging, web applications and services (Apache, Nginx, HAProxy)
•Must have knowledge in monitoring setup with Prometheus, Alertmanager, Grafana, Thanos, Loki, Fluentbit, etc.
Good To Have Skills
•PostgreSQL, MySQL, MongoDB, Redis, Keycloak.
•Migrating application from one cloud to another; OCI certifications
•Test Driven Development
Soft Skill Requirements:
•Able to learn new skills and technology quickly.
•Energetic with amazing customer service skills and a team-oriented approach.
•Strong verbal and written communication skills
Experience of Linux
Experience using Python or Shell scripting (for Automation)
Hands-on experience with Implementation of CI/CD Processes
Experience working with one cloud platforms (AWS or Azure or Google)
Experience working with configuration management tools such as Ansible & Chef
Experience working with Containerization tool Docker.
Experience working with Container Orchestration tool Kubernetes.
Experience in source Control Management including SVN and/or Bitbucket
& GitHub
Experience with setup & management of monitoring tools like Nagios, Sensu & Prometheus or any other popular tools
Hands-on experience in Linux, Scripting Language & AWS is mandatory
Troubleshoot and Triage development, Production issues
About the Company
- 💰 Early-stage, ed-tech, funded, growing, growing fast
- 🎯 Mission Driven: Make Indonesia competitive on a global scale
- 🥅 Build the best educational content and technology to advance STEM education
- 🥇 Students-First approach
- 🇮🇩 🇮🇳 Teams in India and Indonesia
Skillset 🧗🏼♀️
- You primarily identify as a DevOps/Infrastructure engineer and are comfortable working with systems and cloud-native services on AWS
- You can design, implement, and maintain secure and scalable infrastructure delivering cloud-based services
- You have experience operating and maintaining production systems in a Linux based public cloud environment
- You are familiar with cloud-native concepts - Containers, Lambdas, Orchestration (ECS, Kubernetes)
- You’re in love with system metrics and strive to help deliver improvements to systems all the time
- You can think in terms of Infrastructure as Code to build tools for automating deployment, monitoring, and operations of the platform
- You can be on-call once every few weeks to provide application support, incident management, and troubleshooting
- You’re fairly comfortable with GIT, AWS CLI, python, docker CLI, in general, all things CLI. Oh! Bash scripting too!
- You have high integrity, and you are reliable
What you can expect from us 👌🏼
☮️ Mentorship, growth, great work culture
- Mentorship and continuous improvement are a part of the team’s DNA. We have a battle-tested robust growth framework. You will have people to look up to and people looking up to you
- We are a people-first, high-trust, high-autonomy team
- We live in the TDD, Pair Programming, First Principles world
🌏 Remote done right
- Distributed does not mean working in isolation, feeling alone, being buried in Zoom calls
- Our leadership team has been WFH for 10+ years now and we know how remote teams work. This will be a place to belong
- A good balance between deep focussed work and collaborative work ⚖️
🖥️ Friendly, humane interview process
- 30-minute alignment check and screening call
- A short take-home coding assignment, no more than 2-3 hours. Time is precious
- Pair programming interview. Collaborate, work together. No sitting behind a desk and judging
- In-depth engineering discussion around your skills and career so far
- System design and architecture interview for seniors
What we ask from you👇🏼
- Bring your software engineering — both individual brilliance and collaborative skills
- Bring your good nature — we're building a team that supports each other
- Be vested or interested in the company vision
MTX Group Inc. is seeking a motivated Lead DevOps Engineer to join our team. MTX Group Inc. is a global implementation partner enabling organizations to become fit enterprises. MTX provides expertise across various platforms and technologies, including Google Cloud, Salesforce, artificial intelligence/machine learning, data integration, data governance, data quality, analytics, visualization and mobile technology. MTX’s very own Artificial Intelligence platform Maverick, enables clients to accelerate processes and critical decisions by leveraging a Cognitive Decision Engine, a collection of purpose-built Artificial Neural Networks designed to leverage the power of Machine Learning. The Maverick Platform includes Smart Asset Detection and Monitoring, Chatbot Services, Document Verification, to name a few.
Responsibilities:
- Be responsible for software releases, configuration, monitoring and support of production system components and infrastructure.
- Troubleshoot technical or functional issues in a complex environment to provide timely resolution, with various applications and platforms that are global.
- Bring experience on Google Cloud Platform.
- Write scripts and automation tools in languages such as Bash/Python/Ruby/Golang.
- Configure and manage data sources like PostgreSQL, MySQL, Mongo, Elasticsearch, Redis, Cassandra, Hadoop, etc
- Build automation and tooling around Google Cloud Platform using technologies such as Anthos, Kubernetes, Terraform, Google Deployment Manager, Helm, Cloud Build etc.
- Bring a passion to stay on top of DevOps trends, experiment with and learn new CI/CD technologies.
- Work with users to understand and gather their needs in our catalogue. Then participate in the required developments
- Manage several streams of work concurrently
- Understand how various systems work
- Understand how IT operations are managed
What you will bring:
- 5 years of work experience as a DevOps Engineer.
- Must possess ample knowledge and experience in system automation, deployment, and implementation.
- Must possess experience in using Linux, Jenkins, and ample experience in configuring and automating the monitoring tools.
- Experience in the software development process and tools and languages like SaaS, Python, Java, MongoDB, Shell scripting, Python, MySQL, and Git.
- Knowledge in handling distributed data systems. Examples: Elasticsearch, Cassandra, Hadoop, and others.
What we offer:
- Group Medical Insurance (Family Floater Plan - Self + Spouse + 2 Dependent Children)
- Sum Insured: INR 5,00,000/-
- Maternity cover upto two children
- Inclusive of COVID-19 Coverage
- Cashless & Reimbursement facility
- Access to free online doctor consultation
- Personal Accident Policy (Disability Insurance) -
- Sum Insured: INR. 25,00,000/- Per Employee
- Accidental Death and Permanent Total Disability is covered up to 100% of Sum Insured
- Permanent Partial Disability is covered as per the scale of benefits decided by the Insurer
- Temporary Total Disability is covered
- An option of Paytm Food Wallet (up to Rs. 2500) as a tax saver benefit
- Monthly Internet Reimbursement of upto Rs. 1,000
- Opportunity to pursue Executive Programs/ courses at top universities globally
- Professional Development opportunities through various MTX sponsored certifications on multiple technology stacks including Salesforce, Google Cloud, Amazon & others
*******************
What is the role?
As DevOps Engineer, you are responsible to setup and maintain GIT repository, DevOps tools like Jenkins, UCD, Docker, Kubernetes, Jfrog Artifactory, Cloud monitoring tools, Cloud security.
Key Responsibilities
- Setup, configure, and maintain GIT repos, Jenkins, UCD, etc. for multi hosting cloud environments.
- Architect and maintain the server infrastructure in AWS. Build highly resilient infrastructure following industry best practices.
- Working on Docker images and maintaining Kubernetes clusters.
- Develop and maintain the automation scripts using Ansible or other available tools.
- Maintain and monitor cloud Kubernetes Clusters and patching when necessary.
- Working on Cloud security tools to keep applications secured.
- Participate in software development lifecycle, specifically infra design, execution, and debugging required to achieve successful implementation of integrated solutions within the portfolio.
- Required Technical and Professional Expertise.
What are we looking for?
- Minimum 4-6 years of experience in IT industry.
- Expertise in implementing and managing Devops CI/CD pipeline.
- Experience in DevOps automation tools. And Very well versed with DevOps Frameworks, Agile.
- Working knowledge of scripting using shell, Python, Terraform, Ansible or puppet or chef.
- Experience and good understanding in any of Cloud like AWS, Azure, Google cloud.
- Knowledge of Docker and Kubernetes is required.
- Proficient in troubleshooting skills with proven abilities in resolving complex technical issues.
- Experience with working with ticketing tools.
- Middleware technologies knowledge or database knowledge is desirable.
- Experience and well versed with Jira tool is a plus.
What can you look for?
A wholesome opportunity in a fast-paced environment will enable you to juggle between concepts yet maintain the quality of content, interact, share your ideas, and have loads of learning while at work. Work with a team of highly talented young professionals and enjoy the benefits of being at Xoxoday.
We are
A fast-growing SaaS commerce company based in Bangalore with offices in Delhi, Mumbai, SF, Dubai, Singapore, and Dublin. We have three products in our portfolio: Plum, Empuls, and Compass. Xoxoday works with over 1000 global clients. We help our clients engage and motivate their employees, sales teams, channel partners, or consumers for better business results.
Way forward
We look forward to connecting with you. As you may take time to review this opportunity, we will wait for a reasonable time of around 3-5 days before we screen the collected applications and start lining up job discussions with the hiring manager. However, we assure you that we will attempt to maintain a reasonable time window for successfully closing this requirement. The candidates will be kept informed and updated on the feedback and application status.
- As a DevOps engineer, you will be responsible for
- Automated provisioning of infrastructure in AWS/Azure/OpenStack environments.
- Creation of CI/CD pipelines to ensure smooth delivery of projects.
- Proactive Monitoring of overall infrastructure (Logs/Resources etc)
- Deployment of application to various cloud environments.
- Should be able to lead/guide a team towards achieving goals and meeting the milestones defined.
- Practice and implement best practices in every aspect of project deliverable.
- Keeping yourself up to date with new frameworks and tools and enabling the team to use them.
Skills Required
- Experience in Automation of CI/CD processes using tools such as GIT, Gerrit, Jenkins, CircleCI, Azure Pipeline, Gitlab
- Experience in working with AWS and Azure platforms and Cloud-Native automation tools such as AWS cloud formation and Azure Resource Manager.
- Experience in monitoring solutions such as ELK Stack, Splunk, Nagios, Zabbix, Prometheus
- Web Server/Application Server deployments and administration.
- Good Communication, Team Handling, Problem-solving, Work Ethic, and Creativity.
- Work experience of at least 1 year in the following are mandatory.
If you do not have the relevant experience, please do not apply.
- Any cloud provider (AWS, GCP, Azure, OpenStack)
- Any of the configuration management tools (Ansible, Chef, Puppet, Terraform, Powershell DSC)
- Scripting languages (PHP, Python, Shell, Bash, etc.?
- Docker or Kubernetes
- Troubleshoot and debug infrastructure Network and operating system issues.
KaiOS is a mobile operating system for smart feature phones that stormed the scene to become the 3rd largest mobile OS globally (2nd in India ahead of iOS). We are on 100M+ devices in 100+ countries. We recently closed a Series B round with Cathay Innovation, Google and TCL.
What we are looking for:
- BE/B-Tech in Computer Science or related discipline
- 3+ years of overall commercial DevOps / infrastructure experience, cross-functional delivery-focused agile environment; previous experience as a software engineer and ability to see into and reason about the internals of an application a major plus
- Extensive experience designing, delivering and operating highly scalable resilient mission-critical backend services serving millions of users; experience operating and evolving multi-regional services a major plus
- Strong engineering skills; proficiency in programming languages
- Experience in Infrastructure-as-code tools, for example Terraform, CloudFormation
- Proven ability to execute on a technical and product roadmap
- Extensive experience with iterative development and delivery
- Extensive experience with observability for operating robust distributed systems – instrumentation, log shipping, alerting, etc.
- Extensive experience with test and deployment automation, CI / CD
- Extensive experience with infrastructure-as-code and tooling, such as Terraform
- Extensive experience with AWS, including services such as ECS, Lambda, DynamoDB, Kinesis, EMR, Redshift, Elasticsearch, RDS, CloudWatch, CloudFront, IAM, etc.
- Strong knowledge of backend and web technology; infrastructure experience with at-scale modern data technology a major plus
- Deep expertise in server-side security concerns and best practices
- Fluency with at least one programming language. We mainly use Golang, Python and JavaScript so far
- Outstanding analytical thinking, problem solving skills and attention to detail
- Excellent verbal and written communication skills
Requirements
Designation: DevOps Engineer
Location: Bangalore OR Hongkong
Experience: 4 to 7 Years
Notice period: 30 days or Less








