
We (the Software Engineer team) are looking for a motivated, experienced person with a data-driven approach to join our Distribution Team in Bangalore to help design, execute and improve our test sets and infrastructure for producing high-quality Hadoop software.
A Day in the life
You will be part of a team that makes sure our releases are predictable and deliver high value to the customer. This team is responsible for automating and maintaining our test harness, and making test results reliable and repeatable.
You will:
- work on making our distributed software stack more resilient to high-scale endurance runs and customer simulations
- provide valuable fixes to our product development teams for the issues you’ve found during exhaustive test runs
- work with product and field teams to make sure our customer simulations match expectations and can provide valuable feedback to our customers
- work with amazing people - We are a fun & smart team, including many of the top luminaries in Hadoop and related open source communities. We frequently interact with the research community, collaborate with engineers at other top companies & host cutting edge researchers for tech talks.
- do innovative work - Cloudera pushes the frontier of big data & distributed computing, as our track record shows. We work on high-profile open source projects, interacting daily with engineers at other exciting companies, speaking at meet-ups, etc.
- be a part of a great culture - Transparent and open meritocracy. Everybody is always thinking of better ways to do things, and coming up with ideas that make a difference. We build our culture to be the best workplace in our careers.
You have:
- strong knowledge in at least 1 of the following languages: Java / Python / Scala / C++ / C#
- hands-on experience with at least 1 of the following configuration management tools: Ansible, Chef, Puppet, Salt
- confidence with Linux environments
- ability to identify critical weak spots in distributed software systems
- experience in developing automated test cases and test plans
- ability to deal with distributed systems
- solid interpersonal skills conducive to a distributed environment
- ability to work independently on multiple tasks
- self-driven & motivated, with a strong work ethic and a passion for problem solving
- innovate and automate and break the code
The right person in this role has an opportunity to make a huge impact at Cloudera and add value to our future decisions. If this position has piqued your interest and you have what we described - we invite you to apply! An adventure in data awaits.

Location: Bangalore preferred / Hybrid as applicable
Experience: 3+ years
Education: B.E/B.Tech in Computer Science, Engineering or a related technical discipline
Salary: Above market standards, flexible for the right candidate
Career growth: Long-term opportunity with potential to lead DevOps architecture and cloud platform operations
About FrontM
FrontM builds software platforms for frontline workforces operating in remote and low-connectivity environments, with a strong focus on the maritime industry. The platform supports communication, collaboration, healthcare, learning, welfare and operational workflows across mobile, web, kiosk and connected device environments.
The platform runs across cloud infrastructure, constrained networks and specialised customer environments, requiring reliable DevOps practices, strong observability, secure architecture and careful operational discipline.
Role Summary
As a Senior DevOps Engineer, you will take ownership of FrontM’s AWS cloud infrastructure, CI/CD pipelines, platform reliability and technical operations. You will work closely with the VP of Delivery, CTO and CEO to maintain secure, scalable and high-availability infrastructure for FrontM’s production systems.
This role requires strong hands-on DevOps experience, broad AWS knowledge, Kubernetes experience and the ability to troubleshoot complex networking and production issues across multi-domain SaaS environments.
Key Responsibilities
Cloud Infrastructure & DevOps Architecture (≈45%)
· Own, maintain and improve AWS cloud infrastructure for FrontM platforms
· Create and maintain Terraform scripts for infrastructure deployment and management
· Manage Kubernetes workloads deployed within AWS EKS
· Support multi-zone AWS infrastructure design for availability, resilience and scale
· Maintain AWS services including Route 53, EC2, API Gateway, VPC, VPN, AWS Cognito, ElastiCache, DynamoDB and Lambda
· Contribute to DevOps architecture planning in line with FrontM’s platform roadmap
CI/CD, Operations & Platform Reliability (≈35%)
· Build, maintain and improve CI/CD pipelines for backend and platform services
· Oversee technical operations with hands-on administration, monitoring and release support
· Ensure continuous server uptime, stability, performance and maintainability
· Debug, respond to and restore system outages in production and staging environments
· Improve observability across infrastructure and applications, including migration from Elastic stack to logz.io
· Support backend stability, scale and performance across Node.js, Java and related services
Security, Networking & Production Support (≈20%)
· Maintain AWS security configurations, access controls and monitoring practices
· Support complex networking requirements across multi-domain SaaS implementations
· Troubleshoot network, infrastructure and access issues with internal teams and customer-side users
· Work with backend teams to support API integrations and infrastructure abstractions for complex requirements
· Document operational procedures, incident findings and technical support steps clearly
Required Technical Skills
Cloud Infrastructure & AWS
· Strong hands-on experience with AWS infrastructure and cloud operations
· Experience with Route 53, EC2, API Gateway, VPC, VPN, AWS Cognito, ElastiCache, DynamoDB and Lambda
· Experience with AWS security setup, monitoring and multi-zone infrastructure
· Ability to manage infrastructure using Terraform
Kubernetes, CI/CD & Observability
· Strong experience with Kubernetes, preferably AWS EKS
· Extensive CI/CD and DevOps experience
· Experience with infrastructure observability and application monitoring tools
· Ability to diagnose production bottlenecks, server failures and performance issues
Backend, Networking & SaaS Operations
· Experience supporting Node.js, Java and backend system procedures for stability and scale
· Good understanding of APIs, integrations and backend service dependencies
· Experience with complex networking and multi-domain SaaS implementations
· Ability to troubleshoot technical issues with non-technical end users
Nice to Have
· Experience with MongoDB clusters in MongoDB Atlas
Personal Attributes
· Strong ownership mindset for uptime, reliability and production stability
· Practical problem-solving approach with the ability to act quickly during incidents
· Clear written and spoken communication in English
· Ability to work independently and coordinate with senior management when required
· Comfortable working in fast-moving engineering teams
· Attention to detail in security, monitoring, documentation and operational processes
Why join FrontM?
Long-Term Career Growth
Opportunity to work on cloud infrastructure used by global maritime and remote workforce customers, with scope to grow into DevOps architecture and platform leadership roles.
Engineering Challenges That Matter
Work on infrastructure that supports applications used in remote, low-bandwidth and operationally demanding environments.
Broad Technical Ownership
Take responsibility across cloud infrastructure, Kubernetes, CI/CD, observability, networking, security and production reliability.
Apply now
Join a team focused on building reliable software infrastructure for real-world use cases and contribute to systems used across the global maritime workforce.
Key Responsibilities
- Design, implement, and maintain CI/CD pipelines for backend, frontend, and mobile applications.
- Manage cloud infrastructure using AWS (EC2, Lambda, S3, VPC, RDS, CloudWatch, ECS/EKS).
- Configure and maintain Docker containers and/or Kubernetes clusters.
- Implement and maintain Infrastructure as Code (IaC) using Terraform / CloudFormation.
- Automate build, deployment, and monitoring processes.
- Manage code repositories using Git/GitHub/GitLab and enforce branching strategies.
- Implement monitoring and alerting using tools like Prometheus, Grafana, CloudWatch, ELK, Splunk.
- Ensure system scalability, reliability, and security.
- Troubleshoot production issues and perform root-cause analysis.
- Collaborate with engineering teams to improve deployment and development workflows.
- Optimize infrastructure costs and improve performance.
Required Skills & Qualifications
- 3+ years of experience in DevOps, SRE, or Cloud Engineering.
- Strong hands-on knowledge of AWS cloud services.
- Experience with Docker, containers, and orchestrators (ECS, EKS, Kubernetes).
- Strong understanding of CI/CD tools: GitHub Actions, Jenkins, GitLab CI, or AWS CodePipeline.
- Experience with Linux administration and shell scripting.
- Strong understanding of Networking, VPC, DNS, Load Balancers, Security Groups.
- Experience with monitoring/logging tools: CloudWatch, ELK, Prometheus, Grafana.
- Experience with Terraform or CloudFormation (IaC).
- Good understanding of Node.js or similar application deployments.
- Knowledge of NGINX/Apache and load balancing concepts.
- Strong problem-solving and communication skills.
Preferred/Good to Have
- Experience with Kubernetes (EKS).
- Experience with Serverless architectures (Lambda).
- Experience with Redis, MongoDB, RDS.
- Certification in AWS Solutions Architect / DevOps Engineer.
- Experience with security best practices, IAM policies, and DevSecOps.
- Understanding of cost optimization and cloud cost management.
- Experience with Infrastructure-as-Code (IaC) tools like Terraform and CloudFormation.
- Proficiency in cloud-native technologies and architectures (Docker/Kubernetes) and CI/CD pipelines.
- Good experience in JavaScript.
- Expertise in Linux / Windows environments.
- Good experience in scripting languages like PowerShell / Bash / Python.
- Proficiency in revision control (Git) and DevOps best practices.
Now, more than ever, the Toast team is committed to our customers. We’re taking steps to help restaurants navigate these unprecedented times with technology, resources, and community. Our focus is on building a restaurant platform that helps restaurants adapt, take control, and get back to what they do best: building the businesses they love. And because our technology is purpose-built for restaurants by restaurant people, restaurants can trust that we’ll deliver on their needs for today while investing in experiences that will power their restaurant of the future.
At Toast, our Site Reliability Engineers (SREs) are responsible for keeping all customer-facing services and other Toast production systems running smoothly. SREs are a blend of pragmatic operators and software craftspeople who apply sound software engineering principles, operational discipline, and mature automation to our environments and our codebase. Our decisions are based on instrumentation and continuous observability, as well as predictions and capacity planning.
About this roll* (Responsibilities)
- Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
- Partner with development teams to improve services through rigorous testing and release procedures
- Participate in system design consulting, platform management, and capacity planning
- Create sustainable systems and services through automation and uplift
- Balance feature development speed and reliability with well-defined service level objectives
Troubleshooting and Supporting Escalations:
- Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
- Diagnose performance bottlenecks and implement optimizations across infrastructure, databases, web, and mobile applications
- Implement strategies to increase system reliability and performance through on-call rotation and process optimization
- Perform and run blameless RCAs on incidents and outages aggressively, looking for answers that will prevent the incident from ever happening again
Do you have the right ingredients? (Requirements)
- Extensive industry experience, with at least 7 years in SRE and/or DevOps roles
- Polyglot technologist/generalist with a thirst for learning
- Deep understanding of cloud and microservice architecture and the JVM
- Experience with tools such as APM, Terraform, Ansible, GitHub, Jenkins, and Docker
- Experience developing software or software projects in at least four languages, ideally including two of Go, Python, and Java
- Experience with cloud computing technologies (AWS preferred)
*Bread puns are encouraged but not required
Internshala is a dot com business with the heart of dot org.
We are a technology company on a mission to equip students with relevant skills & practical exposure through internships, fresher jobs, and online trainings. Imagine a world full of freedom and possibilities. A world where you can discover your passion and turn it into your career. A world where your practical skills matter more than your university degree. A world where you do not have to wait till 21 to taste your first work experience (and get a rude shock that it is nothing like you had imagined it to be). A world where you graduate fully assured, fully confident, and fully prepared to stake a claim on your place in the world.
At Internshala, we are making this dream a reality!
👩🏻‍💻 Your responsibilities would include-
- Building and maintaining operational tools for monitoring and analysis of AWS infrastructure and systems
- Actively monitoring the health and performance of all systems and performing benchmarking and tuning of system applications and operating systems
- Setting up container orchestration using Kubernetes or other orchestration system for a monolithic application
- Continually working with development engineers to design the best system architectures and solutions
- Troubleshooting and resolving issues in our development, test, and production environments
- Maintaining reliability of the system and being on-call for mission-critical systems
- Performing infrastructure cost analysis and optimization
- Ensuring systems’ compliance with operational risk standards (e.g. network, firewall, OS, logging, monitoring, availability, resiliency)
- Building, mentoring and leading a team of young professionals, if the need arises
🍒 You will get-
- A chance to build and lead an awesome team working on one of the best recruitment and online training products in the world, impacting millions of lives for the better
- Awesome colleagues & a great work environment
- Loads of autonomy and freedom in your work
💯 You fit the bill if-
- You are proficient with bash, git and git workflows
- You have 3-5 years of experience as a DevOps Engineer or similar software engineering role
- You have excellent attention to detail
- AWS certification preferred but not mandatory
Engineering Leader, Cloud Infrastructure.
Bengaluru, Karnataka, India
Do you thrive on solving complex technical problems? Do you want to be at the cutting edge of technology? If so, we’re interested in speaking with you!
Your Impact:
We’re looking for a seasoned engineering leader in the Cloud team that is responsible for building, operating, and maintaining a customer-facing DBaaS service in multiple public clouds (AWS, GCP, and Azure). The service supports unified multiverse management of YugabyteDB, including fault-domain aware provisioning, rolling upgrades, security, networking, monitoring, and day-2 operations (backups, scaling, billing, etc.). If you’re a strong leader who exemplifies collaboration, who is driven and thrives in a fast-paced startup environment, and who has a strong desire to build an internet-scale, extensible cloud-based service with a strong emphasis on simplicity and user experience, this job is for you.
You Will:
Lead, inspire, and influence to make sure your team is successful
Partner with the recruiting team to attract and retain high-quality and diverse talent
Establish great rapport with other development teams, Product Managers, Sales and Customer Success to maintain high levels of visibility, efficiency, and collaboration
Ensure teams have appropriate technical direction, leadership and balance between short-term impact and long-term architectural vision
Occasionally contribute to development tasks such as coding and feature verification to assist teams with release commitments, to gain an understanding of the deeply technical product, and to keep your technical acumen sharp
You'll need:
BS/MS degree in CS or a related field with 5+ years of engineering management experience leading productive, high-functioning teams
Strong fundamentals in distributed systems design and development
Ability to hire while ensuring a high hiring bar, keep engineers motivated, coach/mentor, and handle performance management
Experience running production services in Public Clouds such as AWS, GCP, and Azure
Experience with running large stateful data systems in the Cloud
Prior knowledge of Cloud architecture and implementation features (multi-tenancy, containerization, orchestration, elastic scalability)
A great track record of shipping features and hitting deadlines consistently; should be able to move fast, build in increments and iterate; have a sense of urgency, an aggressive mindset towards achieving results and excellent prioritization skills; able to anticipate future technical needs for the product and craft plans to realize them
Ability to influence the team, peers, and upper management using effective communication and collaborative techniques; focused on building and maintaining a culture of collaboration within the team.
Role: SRE
Experience: 4 - 8 Years
- Experience in building, deploying and operating cloud solutions on Kubernetes
- Strong expertise in administering and scaling Kubernetes on bare metal; CKA preferred
- Expertise in K8s interfaces (CNI, CSI, CRI) and service meshes
- Hands-on experience in DevOps or automation development
- Demonstrable knowledge of TCP/IP, Linux operating system internals, filesystems, disk/storage technologies and storage protocols.
- Experience working with Helm Charts and building out Infrastructure As Code (IaC)
- Experience in writing software to automate orchestration tasks at scale; we commonly use Python, Go, and Shell scripting
- Knowledge of systems (Linux, GNU tooling), networking (OSI model, DNS, routing) and virtualization vs containerization
- Expertise in CI/CD tooling for cloud-based applications, specifically Terraform / CloudFormation, Jenkins and Git
- Architected CNF Orchestration with Kubernetes
- Strong understanding of the principles of 12-factor apps and modern containerized microservices
- Plan for reliability by designing systems to work across our multi-region and multi-cloud environments
- Experience developing and using Application & Integration stacks/tools such as Kafka, Spring Cloud, Apache Camel, Kubernetes, Docker, Redis, Knative, and NoSQL
Job Summary
You will meticulously analyze project requirements and carry forward the development of highly robust, scalable and easily maintainable backend applications. You will work independently, with the support and opportunity to thrive in a fast-paced environment.
Responsibilities and Duties:
- building and setting up new development tools and infrastructure
- understanding the needs of stakeholders and conveying this to developers
- working on ways to automate and improve development and release processes
- testing and examining code written by others and analysing results
- ensuring that systems are safe and secure against cybersecurity threats
- identifying technical problems and developing software updates and ‘fixes’
- working with software developers and software engineers to ensure that development follows established processes and works as intended
- planning out projects and being involved in project management decisions
Skill Requirements:
- Managing GitHub (for example: creating branches for test, QA, development and production, creating release tags, resolving merge conflicts)
- Setting up servers for each project in either AWS or Azure (test, development, QA, staging and production)
- AWS S3 configuration and S3 web hosting; archiving data from S3 to S3 Glacier
- Deploying the build (application) to the servers using AWS CI/CD and Jenkins (automated and manual)
- AWS Networking and Content delivery (VPC, Route 53 and CloudFront)
- Managing databases like RDS, Snowflake, Athena, Redis and Elasticsearch
- Managing IAM roles and policies for services like Lambda, SNS, AWS Cognito, Secrets Manager, Certificate Manager, GuardDuty, Inspector, EC2 and S3
- AWS Analytics (Elasticsearch, Athena, Glue and Kinesis)
- AWS containers (Elastic Container Registry, Elastic Container Service, Elastic Kubernetes Service, Docker Hub and Docker Compose)
- AWS Auto Scaling groups (launch configurations, launch templates) and load balancers
- EBS (snapshots, volumes and AMIs)
- AWS CI/CD buildspec scripting, Jenkins Groovy scripting, shell scripting and Python scripting
- SageMaker, Textract, Forecast, Lightsail
- Android and iOS build automation
- Monitoring tools like CloudWatch, CloudWatch log groups, alarms, metric dashboards, SNS (Simple Notification Service), SES (Simple Email Service)
- Amazon MQ
- Operating systems: Linux and Windows
- X-Ray, Cloud9, CodeStar
- Fluent Shell Scripting
- Soft Skills
- Scripting skills; good to have knowledge of Python, JavaScript, Java, Node.js
- Knowledge On Various DevOps Tools And Technologies
Qualifications and Skills
Job Type: Full-time
Experience: 4 - 7 yrs
Qualification: BE/ BTech/MCA.
Location: Bengaluru, Karnataka









