
Sr. DevOps Engineer (5 to 8 yrs. Exp.)
Location: Ahmedabad
- Strong Experience in Infrastructure provisioning in cloud using Terraform & AWS CloudFormation Templates.
- Strong Experience in Serverless and Containerization technologies such as Kubernetes, Docker, etc.
- Strong Experience in Jenkins & AWS Native CI/CD implementation using code
- Strong Experience in Cloud operational automation using Python, Shell script, AWS CLI, AWS Systems Manager, AWS Lambda, etc.
- Day to Day AWS Cloud administration tasks
- Strong Experience in Configuration management using Ansible and PowerShell.
- Strong Experience in Linux and at least one scripting language is required.
- Knowledge of monitoring tools will be an added advantage.
- Understanding of DevOps practices, including Continuous Integration, Delivery, and Deployment.
- Hands-on with the application deployment process
Key Skills: AWS, Terraform, Serverless, Jenkins, DevOps, CI/CD, Python, CLI, Linux, Git, Kubernetes
Role: Software Developer
Industry Type: IT-Software, Software Services
Functional Area: IT Software - Application Programming, Maintenance
Employment Type: Full Time, Permanent
Education: Any computer graduate.
Salary: Best in Industry.
About Compufy Technolab

Location: Bangalore
Experience: 2–5 years
Type: Full-time | On-site
Start: Immediate
Why this role exists
Most systems don’t fail because of one big outage.
They fail because reliability is treated as an afterthought.
Right now, uptime depends too much on individual heroics.
That doesn’t scale.
This role exists to build a reliability system where:
- Uptime is predictable
- Failures are contained
- Escalations don’t depend on leadership
What you’ll do
You will not just monitor systems.
You will own reliability as a product.
1. Drive uptime to production-grade reliability
- Improve system uptime to 99.9% customer-facing SLA within 4 months
- Define and track:
  - SLAs / SLOs / error budgets
- Ensure reliability is measured from the customer’s perspective, not internal metrics
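For a sense of scale, the 99.9% target above can be turned into an error budget with simple arithmetic. The sketch below is illustrative only: the 30-day window and the function names are assumptions, not part of the role description.

```python
# Illustrative sketch: the 30-day window and all names here are assumptions.
MINUTES_PER_DAY = 24 * 60


def error_budget_minutes(slo: float, days: int = 30) -> float:
    """Downtime (in minutes) allowed over `days` days for an availability target `slo`."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    return (1.0 - slo) * days * MINUTES_PER_DAY


def budget_remaining(slo: float, downtime_minutes: float, days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative once it is overspent)."""
    budget = error_budget_minutes(slo, days)
    return (budget - downtime_minutes) / budget


if __name__ == "__main__":
    # 99.9% over 30 days leaves roughly 43 minutes of allowed downtime.
    print(f"{error_budget_minutes(0.999):.1f} minutes")
```

Under these assumptions, a single 45-minute outage in a month would already overspend the budget, which is why detection and containment speed matter as much as prevention.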
2. Build incident response as a system
- Set up a 24/7 incident response rotation across 3 engineers
- Eliminate dependency on leadership (no single escalation point)
- Define:
  - Incident severity levels
  - Response playbooks
  - Escalation protocols
- Ensure fast detection → containment → resolution
3. Contain and fix erratic system behavior
- Identify and resolve:
  - Latency spikes
  - Downtime incidents
  - Integration failures
- Build guardrails to prevent recurrence
- Focus on root cause elimination, not temporary fixes
4. Create continuous reliability feedback loops
- Work closely with engineering teams to:
  - Surface recurring failure patterns
  - Improve build quality
  - Reduce production bugs
- Ensure learnings from incidents directly improve future releases
5. Improve observability and monitoring
- Build dashboards and alerts for:
  - System health
  - Performance metrics
  - Failure signals
- Ensure issues are detected before customers report them
6. Reduce operational fragility
- Remove single points of failure (people, systems, workflows)
- Improve system resilience across:
  - Deployments
  - Integrations
  - Runtime environments
What success looks like
- Uptime reaches 99.9%+ reliably
- Incidents are:
  - Detected early
  - Contained quickly
  - Resolved permanently
- No dependency on a single individual for escalation
- System behavior becomes predictable and stable
- Engineering teams ship with higher reliability confidence
Who you are
- You have 2-5 years of experience in SRE / DevOps / backend systems
- You have worked on production systems with real uptime expectations
- You think in:
  - Systems
  - Failure modes
  - Trade-offs
- You are comfortable debugging live, high-pressure environments
What will make you stand out
- Experience with:
  - Distributed systems
  - Cloud infrastructure (AWS / Azure / GCP)
  - Monitoring & alerting tools
- Have built or improved:
  - Incident response systems
  - Reliability frameworks
- Strong debugging skills across:
  - Infra
  - Application
  - Integrations
Compensation
₹60,000/month (fixed)
(Aligned with role scope and impact expectations)
Why join
- You will define reliability standards for a production AI platform
- Your work directly impacts:
  - Customer trust
  - Product performance
  - Enterprise readiness
- You will move the system from reactive → predictable
What this role is not
- Not just monitoring dashboards
- Not limited to handling tickets
- Not dependent on escalation to leadership
What this role is
- A builder of reliability systems
- A guardian of uptime and performance
- A multiplier of engineering quality
One question to self-evaluate
Can you build a system where downtime is rare, predictable, and never dependent on a single person?
- Working with Ruby, Python, Perl, and Java
- Troubleshooting and having working knowledge of various tools, open-source technologies, and cloud services
- Configuring and managing databases and cache layers such as MySQL, Mongo, Elasticsearch, Redis
- Setting up all databases and optimising them (sharding, replication, shell scripting, etc.)
- Creating users, domain handling, service handling, backup management, port management, SSL services
- Planning, testing, and developing IT infrastructure (server configuration and database) and handling technical issues related to servers, Docker, and VM optimization
- Demonstrating awareness of DB management, server-related work, and Elasticsearch
- Selecting and deploying appropriate CI/CD tools
- Striving for continuous improvement and building continuous integration, continuous delivery, and continuous deployment pipelines (CI/CD pipelines)
- Experience working on Linux-based infrastructure
- Awareness of critical concepts in DevOps and Agile principles
- 6-8 years of experience
Kutumb is the first and largest communities platform for Bharat. We are growing at an exponential trajectory. More than 1 Crore users use Kutumb to connect with their community. We are backed by world-class VCs and angel investors. We are growing and looking for exceptional Infrastructure Engineers to join our Engineering team.
More on this here - https://kutumbapp.com/why-join-us.html
We’re excited if you have:
- Recent experience designing and building unified observability platforms that enable companies to use the sometimes-overwhelming amount of available data (metrics, logs, and traces) to determine quickly if their application or service is operating as desired
- Expertise in deploying and using open-source observability tools in large-scale environments, including Prometheus, Grafana, ELK (ElasticSearch + Logstash + Kibana), Jaeger, Kiali, and/or Loki
- Familiarity with open standards like OpenTelemetry, OpenTracing, and OpenMetrics
- Familiarity with Kubernetes and Istio as the architecture on which the observability platform runs, and how they integrate and scale. Additionally, the ability to contribute improvements back to the joint platform for the benefit of all teams
- Demonstrated customer engagement and collaboration skills to curate custom dashboards and views, and identify and deploy new tools, to meet their requirements
- The drive and self-motivation to understand the intricate details of a complex infrastructure environment
- Using CI/CD tools to automatically perform canary analysis and roll out changes after passing automated gates (think Argo & Keptn)
- Hands-on experience working with AWS
- Bonus points for knowledge of ETL pipelines and Big data architecture
- Great problem-solving skills and pride in your work
- Enjoyment in building scalable and resilient systems, with a focus on systems that are robust by design and suitably monitored
- Abstracting all of the above into as simple an interface as possible (like Knative) so developers don't need to know about it unless they choose to open the escape hatch
What you’ll be doing:
- Design and build automation around the chosen tools to make onboarding new services easy for developers (dashboards, alerts, traces, etc)
- Demonstrate great communication skills in working with technical and non-technical audiences
- Contribute new open-source tools and/or improvements to existing open-source tools back to the CNCF ecosystem
Tools we use:
Kops, Argo, Prometheus/Loki/Grafana, Kubernetes, AWS, MySQL/PostgreSQL, Apache Druid, Cassandra, Fluentd, Redis, OpenVPN, MongoDB, ELK
What we offer:
- High pace of learning
- Opportunity to build the product from scratch
- High autonomy and ownership
- A great and ambitious team to work with
- Opportunity to work on something that really matters
- Top of the class market salary and meaningful ESOP ownership
- Provision Dev, Test, and Prod infrastructure using IaC (Infrastructure as Code)
- Good knowledge of Terraform
- In-depth knowledge of security and IAM / Role Based Access Controls in Azure, management of Azure Application/Network Security Groups, Azure Policy, and Azure Management Groups and Subscriptions.
- Experience with Azure and GCP compute, storage, and networking (candidates with GCP experience can also be considered)
- Experience in working with ADLS Gen2, Databricks and Synapse Workspace
- Experience supporting cloud development pipelines using Git, CI/CD tooling, Terraform and other Infrastructure as Code tooling as appropriate
- Configuration Management (e.g. Jenkins, Ansible, Git, etc.)
- General automation, including Azure CLI, Python, PowerShell, and Bash scripting
- Experience with Continuous Integration/Continuous Delivery models
- Knowledge of and experience in resolving configuration issues
- Understanding of software and infrastructure architecture
- Experience in PaaS, Terraform and AKS
- Monitoring, alerting and logging tools, and build/release processes
- Understanding of computing technologies across Windows and Linux
Essential Skills:
- Docker
- Jenkins
- Python dependency management using conda and pip
- Base Linux System Commands, Scripting
- Docker Container Build & Testing
- Common knowledge of minimizing container size and layers
- Inspecting containers for unused / underutilized systems
- Multiple Linux OS support for virtual system
- Has experience as a user of Jupyter / JupyterLab to test and fix usability issues in workbenches
- Templating out various configurations for different use cases (we use Python Jinja2 but are open to other languages / libraries)
- Jenkins Pipeline
- Understanding of the GitHub API to trigger builds, tags, and releases
- Artifactory Experience
- Nice to have: Kubernetes, ArgoCD, other deployment automation tool sets (DevOps)
About the Company
Blue Sky Analytics is a Climate Tech startup that combines the power of AI & Satellite data to aid in the creation of a global environmental data stack. Our funders include Beenext and Rainmatter. Over the next 12 months, we aim to expand to 10 environmental data-sets spanning water, land, heat, and more!
We are looking for a DevOps Engineer who can help us build the infrastructure required to handle huge datasets at scale. Primarily, you will work with AWS services like EC2, Lambda, ECS, Containers, etc. As part of our core development crew, you’ll be figuring out how to deploy applications ensuring high availability and fault tolerance along with a monitoring solution that has alerts for multiple microservices and pipelines. Come save the planet with us!
Your Role
- Build applications at scale that can go up and down on command.
- Manage a cluster of microservices talking to each other.
- Build pipelines for huge data ingestion, processing, and dissemination.
- Optimize services for low cost and high efficiency.
- Maintain a highly available and scalable PostgreSQL database cluster.
- Maintain alert and monitoring system using Prometheus, Grafana, and Elasticsearch.
Requirements
- 1-4 years of work experience.
- Strong emphasis on Infrastructure as Code - CloudFormation, Terraform, Ansible.
- CI/CD concepts and implementation using CodePipeline, GitHub Actions.
- Advanced command of AWS services like IAM, EC2, ECS, Lambda, S3, etc.
- Advanced Containerization - Docker, Kubernetes, ECS.
- Experience with managed services like database cluster, distributed services on EC2.
- Self-starters and curious folks who don't need to be micromanaged.
- Passionate about Blue Sky Climate Action and working with data at scale.
Benefits
- Work from anywhere: Work by the beach or from the mountains.
- Open source at heart: We are building a community that you can use, contribute to, and collaborate with.
- Own a slice of the pie: Possibility of becoming an owner by investing in ESOPs.
- Flexible timings: Fit your work around your lifestyle.
- Comprehensive health cover: Health cover for you and your dependents to keep you tension free.
- Work Machine of choice: Buy a device and own it after completing a year at BSA.
- Quarterly Retreats: Yes, there's work, but then there's all the non-work fun, aka the retreat!
- Yearly vacations: Take time off to rest and get ready for the next big assignment by availing the paid leaves.
Cloud native technologies - Kubernetes (EKS, GKE, AKS), AWS ECS, Helm, CircleCI, Harness, Serverless platforms (AWS Fargate etc.)
Infrastructure as Code tools - Terraform, CloudFormation, Ansible
Scripting - Python, Bash
Desired Skills & Experience:
Projects/Internships with coding experience in any of JavaScript, Python, Golang, Java, etc.
Hands-on scripting and software development fluency in any programming language (Python, Go, Node, Ruby).
Basic understanding of Computer Science fundamentals - Networking, Web Architecture etc.
Infrastructure automation experience with knowledge of at least a few of these tools: Chef, Puppet, Ansible, CloudFormation, Terraform, Packer, Jenkins etc.
Bonus points if you have contributed to open source projects or participated in competitive coding platforms like HackerEarth, Codeforces, SPOJ, etc.
You’re willing to learn various new technologies and concepts. The “cloud-native” field of software is evolving fast and you’ll need to quickly learn new technologies as required.
Communication: You like discussing a plan upfront, welcome collaboration, and are an excellent verbal and written communicator.
B.E/B.Tech/M.Tech or equivalent experience.
What will you do?
- Set up and manage applications with automation, DevOps, and CI/CD tools.
- Deploy, Maintain and Monitor Infrastructure and Services.
- Automate code and Infra Deployments.
- Tune, optimize and keep systems up to date.
- Design and implement deployment strategies.
- Set up infrastructure on cloud platforms like AWS, Azure, Google Cloud, IBM Cloud, DigitalOcean, etc. as required.
Responsibilities
- Designing and building infrastructure to support AWS, Azure, and GCP-based Cloud services and infrastructure.
- Creating and utilizing tools to monitor our applications and services in the cloud including system health indicators, trend identification, and anomaly detection.
- Working with development teams to help engineer scalable, reliable, and resilient software running in the cloud.
- Participating in on-call escalation to troubleshoot customer-facing issues
- Analyzing and monitoring performance bottlenecks and key metrics to optimize software and system performance.
- Providing analytics and forecasts for cloud capacity, troubleshooting analysis, and uptime.
Skills
- Should have a couple of years of strong experience leading a DevOps team: planning and defining the DevOps roadmap and executing it with the team
- Familiarity with AWS cloud and JSON templates, Python, AWS CloudFormation templates
- Designing solutions using one or more AWS features, tools, and technologies such as EC2, EBS, Glacier, S3, ELB, CloudFormation, Lambda, CloudWatch, VPC, RDS, Direct Connect, AWS CLI, REST API
- Design and implement system architecture with AWS cloud
- Develop automation scripts, ARM templates, Ansible, Chef, Python, PowerShell
- Knowledge of AWS services and cloud design patterns
- Knowledge of cloud fundamentals like autoscaling, serverless
- Have experience with DevOps and Infrastructure as Code: AWS environment and application automation utilizing CloudFormation and third-party tools. CI/CD pipeline setup utilizing
- CI experience with the following is a must: Jenkins, Bitbucket/Git, Nexus or Artifactory, SonarQube, WireMock or other mocking solution
- Expert knowledge on Windows/Linux OS/Mac with at least 5-6 years of system administration experience
- Should have strong skills in using JIRA build tool
- Should have knowledge in managing the CI/CD pipeline on public cloud deployments using AWS
- Should have strong skills in using tools like Jenkins, Docker, Kubernetes (AWS EKS, Azure AKS), and CloudFormation.
- Experience in monitoring tools like Pingdom, Nagios, etc.
- Experience in reverse proxy services like Nginx and Apache
- Experience in Bitbucket with version control tools like Git/SVN is desirable
- Experience with manual/automated testing of application deployments is desired
- Experience in database technologies such as PostgreSQL, MySQL
- Knowledge of Helm and Terraform








