
Senior Platform Engineer
at Full-stack fleet software company (Established startup)
● Good understanding of how the web works
● Experience with at least one programming language, such as Java or Python
● Good shell scripting skills
● Experience with *Nix-based operating systems
● Experience with Kubernetes (k8s) and containers
● Fairly good understanding of AWS, GCP, or Azure
● Troubleshoot and fix outages and performance issues in the infrastructure stack
● Identify gaps and design automation tools for all feasible infrastructure functions
● Good verbal and written communication skills
● Drive the team's SLAs/SLOs
Benefits
This is an opportunity to work on a fairly complex set of systems and improve
them. You will have the chance to learn things like “how to think about code
simplicity”, “how to write for maintainability”, and much more.
● Comprehensive health insurance policy.
● Flexible working hours and a very friendly work environment.
● Flexibility to work either in the office (post-COVID) or remotely.

Similar jobs
Job Title: Senior DevOps Engineer
Location: Sector 39, Gurgaon (Onsite)
Employment Type: Full-Time
Working Days: 6 Days (Alternate Saturdays Working)
Experience Required: 5+ Years
Team Role: Lead & Mentor a team of 3–4 engineers
About the Role
We are seeking a highly skilled Senior DevOps Engineer to lead our infrastructure and automation initiatives while mentoring a small team. This role involves setting up and managing physical and cloud-based servers, configuring storage systems, and implementing automation to ensure high system availability and reliability. The ideal candidate will have strong Linux administration skills, hands-on experience with DevOps tools, and the leadership capabilities to guide and grow the team.
Key Responsibilities
Infrastructure & Server Management (60%)
- Set up, configure, and manage bare-metal (physical) servers as well as cloud-based environments.
- Configure network bonding, firewalls, and system security for optimal performance and reliability.
- Implement and maintain high-availability solutions for mission-critical systems.
Queue Systems (Kafka / RabbitMQ) (15%)
- Deploy and manage message queue systems to support high-throughput, real-time data exchange.
- Ensure reliable event-driven communication between distributed services.
Storage Systems (SAN/NAS) (15%)
- Configure and manage Storage Area Networks (SAN) and Network Attached Storage (NAS).
- Optimize storage performance, redundancy, and availability.
Database Administration (5%)
- Administer and optimize MariaDB, MySQL, MongoDB, Redis, and Elasticsearch.
- Handle backup, recovery, replication, and performance tuning.
General DevOps & Automation
- Deploy product updates, patches, and fixes while ensuring minimal downtime.
- Design and manage CI/CD pipelines using Jenkins or similar tools.
- Administer and automate workflows with Docker, Kubernetes, Ansible, AWS, and Git.
- Manage web and application servers (Apache httpd, Tomcat).
- Implement monitoring, logging, and alerting systems (e.g., Nagios) and high-availability tooling (HAProxy, Keepalived).
- Conduct root cause analysis and implement automation to reduce manual interventions.
- Mentor a team of 3–4 engineers, fostering best practices and continuous improvement.
Required Skills & Qualifications
✅ 5+ years of proven DevOps engineering experience
✅ Strong expertise in Linux administration & shell scripting
✅ Hands-on experience with bare-metal server management & storage systems
✅ Proficiency in Docker, Kubernetes, AWS, Jenkins, Git, and Ansible
✅ Experience with Kafka or RabbitMQ in production environments
✅ Knowledge of CI/CD, automation, monitoring, and high-availability tools (Nagios, HAProxy, Keepalived)
✅ Excellent problem-solving, troubleshooting, and leadership abilities
✅ Strong communication skills with the ability to mentor and lead teams
Good to Have
- Experience in Telecom projects involving SMS, voice, or real-time data handling.

Job Title: DevOps - 3
Roles and Responsibilities:
- Develop a deep understanding of the end-to-end configurations, dependencies, customer requirements, and overall characteristics of the production services as the accountable owner for overall service operations
- Implement best practices, challenge the status quo, and keep tabs on industry and technical trends, changes, and developments to ensure the team always strives for best-in-class work
- Lead incident response efforts, working closely with cross-functional teams to resolve issues quickly and minimize downtime. Implement effective incident management processes and post-incident reviews
- Participate in on-call rotation responsibilities, ensuring timely identification and resolution of infrastructure issues
- Possess expertise in designing and implementing capacity plans, accurately estimating costs and efforts for infrastructure needs.
- Own systems and infrastructure maintenance for production environments, with a continued focus on improving efficiency, availability, and supportability through automation and well-defined runbooks
- Provide mentorship and guidance to a team of DevOps engineers, fostering a collaborative and high-performing work environment. Mentor team members in best practices, technologies, and methodologies.
- Design for Reliability: architect and implement solutions that keep infrastructure running with always-on availability and ensure a high uptime SLA for the infrastructure
- Manage individual project priorities, deadlines, and deliverables related to your technical expertise and assigned domains
- Collaborate with Product & Information Security teams to ensure the integrity and security of Infrastructure and applications. Implement security best practices and compliance standards.
Must Haves
- 5–8 years of experience as a DevOps engineer, SRE, or platform engineer.
- Strong expertise in automating Infrastructure provisioning and configuration using tools like Ansible, Packer, Terraform, Docker, Helm Charts etc.
- Strong skills in network services such as DNS, TLS/SSL, HTTP, etc.
- Expertise in managing large-scale cloud infrastructure (preferably AWS and Oracle)
- Expertise in managing production grade Kubernetes clusters
- Experience in scripting using programming languages like Bash, Python, etc.
- Expertise in centralized logging, metrics, and observability tooling such as ELK, Prometheus/VictoriaMetrics, and Grafana.
- Experience building and managing high-scale API gateways, service meshes, etc.
- Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive
- Have a working knowledge of a backend programming language
- Deep knowledge of and experience with Unix/Linux operating system internals (e.g., filesystems, user management)
- A working knowledge and deep understanding of cloud security concepts
- Proven track record of driving results and delivering high-quality solutions in a fast-paced environment
- Demonstrated ability to communicate clearly with both technical and non-technical project stakeholders, with the ability to work effectively in a cross-functional team environment.
About us:
HappyFox is a software-as-a-service (SaaS) support platform. We offer an enterprise-grade help desk ticketing system and intuitively designed live chat software.
We serve over 12,000 companies in 70+ countries. HappyFox is used by companies spanning education, media, e-commerce, retail, information technology, manufacturing, non-profit, government, and many other verticals that have an internal or external support function.
To learn more, visit https://www.happyfox.com/
Responsibilities
- Build and scale production infrastructure in AWS for the HappyFox platform and its products.
- Research, build, and implement systems, services, and tooling to improve the uptime, reliability, and maintainability of our backend infrastructure, and to meet our internal SLOs and customer-facing SLAs.
- Implement consistent observability, deployment and IaC setups
- Lead incident management and actively respond to escalations/incidents in the production environment from customers and the support team.
- Hire/Mentor other Infrastructure engineers and review their work to continuously ship improvements to production infrastructure and its tooling.
- Build and manage development infrastructure, and CI/CD pipelines for our teams to ship & test code faster.
- Lead infrastructure security audits
Requirements
- At least 7 years of experience in handling/building Production environments in AWS.
- At least 3 years of programming experience in building API/backend services for customer-facing applications in production.
- Proficient in managing/patching servers with Unix-based operating systems like Ubuntu Linux.
- Proficient in writing automation scripts or building infrastructure tools using Python/Ruby/Bash/Golang
- Experience in deploying and managing production Python/NodeJS/Golang applications to AWS EC2, ECS or EKS.
- Experience in security hardening of infrastructure, systems and services.
- Proficient in containerised environments such as Docker, Docker Compose, Kubernetes
- Experience in setting up and managing test/staging environments, and CI/CD pipelines.
- Experience in IaC tools such as Terraform or AWS CDK
- Exposure/Experience in setting up or managing Cloudflare, Qualys and other related tools
- Passion for making systems reliable, maintainable, scalable and secure.
- Excellent verbal and written communication skills to address, escalate and express technical ideas clearly
- Bonus points – Hands-on experience with Nginx, Postgres, Postfix, Redis or Mongo systems.
- Public clouds, such as AWS, Azure, or Google Cloud Platform
- Automation technologies, such as Kubernetes or Jenkins
- Configuration management tools, such as Puppet or Chef
- Scripting languages, such as Python or Ruby
Job Description:
- Hands on experience with Ansible & Terraform.
- Proficiency in a scripting language such as Python, Bash, or PowerShell is required, along with a willingness to learn and master others.
- Troubleshooting and resolving automation, build, and CI/CD related issues (in cloud environment like AWS or Azure).
- Experience with Kubernetes is mandatory.
- Develop and maintain tooling for test and production environments.
- Assist team members in the development and maintenance of tooling for integration testing, performance testing, security testing, and source control systems (including work in CI systems like Azure DevOps and TeamCity, and orchestration tools like Octopus).
- Good with Linux environment.
Mandatory:
● A minimum of 1 year of development, system design, or engineering experience
● Excellent social, communication, and technical skills
● In-depth knowledge of Linux systems
● Development experience in at least two of the following languages: PHP, Go, Python, JavaScript, C/C++, Bash
● In-depth knowledge of web servers (Apache, Nginx preferred)
● Strong in using DevOps tools: Ansible, Jenkins, Docker, ELK
● Knowledge of APM tools (New Relic preferred)
● Ability to learn quickly, master our existing systems, and identify areas of improvement
● Self-starter who enjoys and takes pride in the engineering work of their team
● Tried-and-tested real-world cloud computing experience (AWS, GCP, or Azure)
● Strong understanding of resilient systems design
● Experience in network design and management
Job Summary
You will meticulously analyze project requirements and drive the development of highly robust, scalable, and easily maintainable backend applications. You will work independently, with the support and opportunity to thrive in a fast-paced environment.
Responsibilities and Duties:
- building and setting up new development tools and infrastructure
- understanding the needs of stakeholders and conveying this to developers
- working on ways to automate and improve development and release processes
- testing and examining code written by others and analysing results
- ensuring that systems are safe and secure against cybersecurity threats
- identifying technical problems and developing software updates and ‘fixes’
- working with software developers and software engineers to ensure that development follows established processes and works as intended
- planning out projects and being involved in project management decisions
Skill Requirements:
- Managing GitHub (e.g., creating branches for test, QA, development, and production; creating release tags; resolving merge conflicts)
- Setting up servers for each project in either AWS or Azure (test, development, QA, staging, and production)
- Configuring AWS S3 and S3 static website hosting; archiving data from S3 to S3 Glacier
- Deploying the build(application) to the servers using AWS CI/CD and Jenkins (Automated and manual)
- AWS Networking and Content delivery (VPC, Route 53 and CloudFront)
- Managing databases like RDS, Snowflake, Athena, Redis and Elasticsearch
- Managing IAM roles and policies for services like Lambda, SNS, Amazon Cognito, Secrets Manager, Certificate Manager, GuardDuty, Inspector, EC2, and S3.
- AWS Analytics (Elasticsearch, Athena, Glue, and Kinesis).
- AWS containers (Elastic Container Registry, Elastic Container Service, Elastic Kubernetes Service), Docker Hub, and Docker Compose
- AWS Auto Scaling groups (launch configurations, launch templates) and load balancers
- EBS (snapshots, volumes, and AMIs)
- AWS CI/CD buildspec scripting, Jenkins Groovy scripting, shell scripting, and Python scripting
- SageMaker, Textract, Forecast, Lightsail
- Android and iOS build automation
- Monitoring tools like CloudWatch (log groups, alarms, metric dashboards), SNS (Simple Notification Service), and SES (Simple Email Service)
- Amazon MQ
- Operating systems: Linux and Windows
- X-Ray, Cloud9, CodeStar
- Fluent Shell Scripting
- Soft Skills
- Scripting skills; good-to-have knowledge of Python, JavaScript, Java, and Node.js
- Knowledge of various DevOps tools and technologies
Qualifications and Skills
Job Type: Full-time
Experience: 4 - 7 yrs
Qualification: BE/BTech/MCA
Location: Bengaluru, Karnataka
Mandatory Skills Sets
- Excellent problem-solving skills in technical challenges
- Deep knowledge of at least one cloud platform (AWS Preferred)
- Understanding of the latest cloud computing technologies
- Experience in architecting solutions based on knowledge of infrastructure and application architectures, including integration approaches
- Fully hands-on, with the ability to quickly grasp evolving technologies and programming languages
- Excellent communication skills, as the role is customer-facing
- Design thinking
- Customer-facing skills and strong technical capabilities to review the team's work and guide the team
- Experience building or contributing to architecture proposals and estimations
Preferred Skills Sets
- Experience architecting infrastructure solutions using both Linux/Unix and Windows with specific recommendations on server, load balancing, HA/DR, & storage architectures.
- Experience architecting or deploying Cloud/Virtualization solutions in enterprise customers.
- Must have served in an application architect role for 3+ years
- AWS platform specific experience a bonus.
- Enterprise application and database architecture a bonus.
● Develop and deliver automation software required for building & improving the functionality, reliability, availability, and manageability of applications and cloud platforms
● Champion and drive the adoption of Infrastructure as Code (IaC) practices and mindset
● Design, architect, and build self-service, self-healing, synthetic monitoring and alerting platform and tools
● Automate the development and test automation processes through CI/CD pipeline (Git, Jenkins, SonarQube, Artifactory, Docker containers)
● Build container hosting-platform using Kubernetes
● Introduce new cloud technologies, tools, and processes to keep innovating in the commerce area and drive greater business value.
Skills Required:
● Excellent written and verbal communication skills and a good listener.
● Proficiency in deploying and maintaining Cloud based infrastructure services (AWS, GCP, Azure – good hands-on experience in at least one of them)
● Well versed with service-oriented architecture, cloud-based web services architecture, design patterns and frameworks.
● Good knowledge of cloud-related services like compute, storage, network, messaging (e.g., SNS, SQS), and automation (e.g., CFT/Terraform).
● Experience with relational SQL and NoSQL databases, including Postgres and Cassandra.
● Experience in systems management/automation tools (Puppet/Chef/Ansible, Terraform)
● Strong Linux System Admin Experience with excellent troubleshooting and problem solving skills
● Hands-on experience with languages (Bash/Python/Core Java/Scala)
● Experience with CI/CD pipeline (Jenkins, Git, Maven etc)
● Experience integrating solutions in a multi-region environment
● Self-motivated, able to learn quickly and deliver results with minimal supervision
● Experience with Agile/Scrum/DevOps software development methodologies.
Nice to Have:
● Experience setting up an Elasticsearch, Logstash, Kibana (ELK) stack.
● Having worked with large scale data.
● Experience with Monitoring tools such as Splunk, Nagios, Grafana, DataDog etc.
● Previous experience working with distributed architectures like Hadoop, MapReduce, etc.
Skills: Python, Docker or Ansible, AWS
➢ Experience building a multi-region, highly available, auto-scaling infrastructure that optimizes performance and cost. Plan for future infrastructure as well as maintain and optimize existing infrastructure.
➢ Conceptualize, architect, and build automated deployment pipelines in a CI/CD environment like Jenkins.
➢ Conceptualize, architect, and build a containerized infrastructure using Docker, Mesosphere, or similar SaaS platforms.
➢ Work with developers to institute systems, policies, and workflows that allow for rollback of deployments.
➢ Triage the release of applications to the production environment on a daily basis.
➢ Interface with developers and triage SQL queries that need to be executed in production environments.
➢ Maintain 24/7 on-call rotation to respond and support troubleshooting of issues in production.
➢ Assist developers and on-call engineers from other teams with post-mortems, follow-up, and review of issues affecting production availability.
➢ Establishing and enforcing systems monitoring tools and standards
➢ Establishing and enforcing Risk Assessment policies and standards
➢ Establishing and enforcing Escalation policies and standards
