
Site Reliability-Devops Engineer
at Altimetrik

Platform Services Engineer
DevSecOps Engineer
- Strong Systems Experience- Linux, networking, cloud, APIs
- Scripting language Programming - Shell, Python
- Strong Debugging Capability
- AWS Platform -IAM, Network,EC2, Lambda, S3, CloudWatch
- Knowledge on Terraform, Packer, Ansible, Jenkins
- Observability - Prometheus, InfluxDB, Dynatrace,
- Grafana, Splunk • DevSecOps-CI/CD - Jenkins
- Microservices
- Security & Access Management
- Container Orchestration a plus - Kubernetes, Docker etc.
- Big Data Platforms knowledge EMR, Databricks. Cloudera a plus

Similar jobs

Location: Remote (India only)
About Certa At Certa, we’re revolutionizing process automation for top-tier companies, including Fortune 500 and Fortune 1000 leaders, from the heart of Silicon Valley. Our mission? Simplifying complexity through cutting-edge SaaS solutions. Join our thriving, global team and become a key player in a startup environment that champions innovation, continuous learning, and unlimited growth. We offer a fully remote, flexible workspace that empowers you to excel.
Role Overview
Ready to elevate your DevOps career by shaping the backbone of a fast-growing SaaS platform? As a Senior DevOps Engineer at Certa, you’ll lead the charge in building, automating, and optimizing our cloud infrastructure. Beyond infrastructure management, you’ll actively contribute with a product-focused mindset, understanding customer requirements, collaborating closely with product and engineering teams, and ensuring our AWS-based platform consistently meets user needs and business goals.
What You’ll Do
- Own SaaS Infrastructure: Design, architect, and maintain robust, scalable AWS infrastructure, enhancing platform stability, security, and performance.
- Orchestrate with Kubernetes: Utilize your advanced Kubernetes expertise to manage and scale containerized deployments efficiently and reliably.
- Collaborate on Enterprise Architecture: Align infrastructure strategies with enterprise architectural standards, partnering closely with architects to build integrated solutions.
- Drive Observability: Implement and evolve sophisticated monitoring and observability solutions (DataDog, ELK Stack, AWS CloudWatch) to proactively detect, troubleshoot, and resolve system anomalies.
- Lead Automation Initiatives: Champion an automation-first mindset across the organization, streamlining development, deployment, and operational workflows.
- Implement Infrastructure as Code (IaC): Master Terraform to build repeatable, maintainable cloud infrastructure automation.
- Optimize CI/CD Pipelines: Refine and manage continuous integration and deployment processes (currently GitHub Actions, transitioning to CircleCI), enhancing efficiency and reliability.
- Enable GitOps with ArgoCD: Deliver seamless GitOps-driven application deployments, ensuring accuracy and consistency in Kubernetes environments.
- Advocate for Best Practices: Continuously promote and enforce industry-standard DevOps practices, ensuring consistent, secure, and efficient operational outcomes.
- Innovate and Improve: Constantly evaluate and enhance current DevOps processes, tooling, and methodologies to maintain cutting-edge efficiency.
- Product Mindset: Actively engage with product and engineering teams, bringing infrastructure expertise to product discussions, understanding customer needs, and helping prioritize infrastructure improvements that directly benefit users and business objectives.
What You Bring
- Hands-On Experience: 3-5 years in DevOps roles, ideally within fast-paced SaaS environments.
- Kubernetes Mastery: Advanced knowledge and practical experience managing Kubernetes clusters and container orchestration.
- AWS Excellence: Comprehensive expertise across AWS services, infrastructure management, and security.
- IaC Competence: Demonstrated skill in Terraform for infrastructure automation and management.
- CI/CD Acumen: Proven proficiency managing pipelines with GitHub Actions; familiarity with CircleCI highly advantageous.
- GitOps Knowledge: Experience with ArgoCD for effective continuous deployment and operations.
- Observability Skills: Strong capabilities deploying and managing monitoring solutions such as DataDog, ELK, and AWS CloudWatch.
- Python Automation: Solid scripting and automation skills using Python.
- Architectural Awareness: Understanding of enterprise architecture frameworks and alignment practices.
- Proactive Problem-Solving: Exceptional analytical and troubleshooting skills, adept at swiftly addressing complex technical challenges.
- Effective Communication: Strong interpersonal and collaborative skills, essential for remote, distributed teamwork.
- Product Focus: Ability and willingness to understand customer requirements, prioritize tasks that enhance product value, and proactively suggest infrastructure improvements driven by user needs.
- Startup Mindset (Bonus): Prior experience or enthusiasm for dynamic startup cultures is a distinct advantage.
Why Join Us
- Compensation: Top-tier salary and exceptional benefits.
- Work-Life Flexibility: Fully remote, flexible scheduling.
- Growth Opportunities: Accelerate your career in a company poised for significant growth.
- Innovative Culture: Engineering-centric, innovation-driven work environment.
- Team Events: Annual offsites and quarterly Hackerhouse.
- Wellness & Family: Comprehensive healthcare and parental leave.
- Workspace: Premium workstation setup allowance, providing the tech you need to succeed.
The Key Responsibilities Include But Not Limited to:
Help identify and drive Speed, Performance, Scalability, and Reliability related optimization based on experience and learnings from the production incidents.
Work in an agile DevSecOps environment in creating, maintaining, monitoring, and automation of the overall solution-deployment.
Understand and explain the effect of product architecture decisions on systems.
Identify issues and/or opportunities for improvements that are common across multiple services/teams.
This role will require weekend deployments
Skills and Qualifications:
1. 3+ years of experience in a DevOps end-to-end development process with heavy focus on service monitoring and site reliability engineering work.
2. Advanced knowledge of programming/scripting languages (Bash, PERL, Python, Node.js).
3. Experience in Agile/SCRUM enterprise-scale software development including working with GiT, JIRA, Confluence, etc.
4. Advance experience with core microservice technology (RESTFul development).
5. Working knowledge of using Advance AI/ML tools are pluses.
6. Working knowledge in the one or more of the Cloud Services: Amazon AWS, Microsoft Azure
7. Bachelors or Master’s degree in Computer Science or equivalent related field experience
Key Behaviours / Attitudes:
Professional curiosity and a desire to a develop deep understanding of services and technologies.
Experience building & running systems to drive high availability, performance and operational improvements
Excellent written & oral communication skills; to ask pertinent questions, and to assess/aggregate/report the responses.
Ability to quickly grasp and analyze complex and rapidly changing systemsSoft skills
1. Self-motivated and self-managing.
2. Excellent communication / follow-up / time management skills.
3. Ability to fulfill role/duties independently within defined policies and procedures.
4. Ability to balance multi-task and multiple priorities while maintaining a high level of customer satisfaction is key.
5. Be able to work in an interrupt-driven environment.Work with Dori Ai world class technology to develop, implement, and support Dori's global infrastructure.
As a member of the IT organization, assist with the analyze of existing complex programs and formulate logic for new complex internal systems. Prepare flowcharting, perform coding, and test/debug programs. Develop conversion and system implementation plans. Recommend changes to development, maintenance, and system standards.
Leading contributor individually and as a team member, providing direction and mentoring to others. Work is non-routine and very complex, involving the application of advanced technical/business skills in a specialized area. BS or equivalent experience in programming on enterprise or department servers or systems.
- Candidate should be able to write the sample programs using the Tools (Bash, PowerShell, Python or Shell scripting)
- Analytical/logical reasoning
- GitHub Actions
- Should have good working experience with GitHub Actions
- Repository/Workflow Dispatch, writing reusable workflows, etc
- AZ CLI commands
- Hands-on experience with AZ CLI commands

" Skills : Strong experience in Ansible, Cloud, Linux, Python or Shell or Bash scripting
" Experience : 3 - 6 Years
" Location : Bangalore
Good to have cloud skills - Docker / Kubernetes
Scripting skills - Any of Shell / Perl/ bash/Python
Good to have Terraform
As a MLOps Engineer in QuantumBlack you will:
Develop and deploy technology that enables data scientists and data engineers to build, productionize and deploy machine learning models following best practices. Work to set the standards for SWE and
DevOps practices within multi-disciplinary delivery teams
Choose and use the right cloud services, DevOps tooling and ML tooling for the team to be able to produce high-quality code that allows your team to release to production.
Build modern, scalable, and secure CI/CD pipelines to automate development and deployment
workflows used by data scientists (ML pipelines) and data engineers (Data pipelines)
Shape and support next generation technology that enables scaling ML products and platforms. Bring
expertise in cloud to enable ML use case development, including MLOps
Our Tech Stack-
We leverage AWS, Google Cloud, Azure, Databricks, Docker, Kubernetes, Argo, Airflow, Kedro, Python,
Terraform, GitHub actions, MLFlow, Node.JS, React, Typescript amongst others in our projects
Key Skills:
• Excellent hands-on expert knowledge of cloud platform infrastructure and administration
(Azure/AWS/GCP) with strong knowledge of cloud services integration, and cloud security
• Expertise setting up CI/CD processes, building and maintaining secure DevOps pipelines with at
least 2 major DevOps stacks (e.g., Azure DevOps, Gitlab, Argo)
• Experience with modern development methods and tooling: Containers (e.g., docker) and
container orchestration (K8s), CI/CD tools (e.g., Circle CI, Jenkins, GitHub actions, Azure
DevOps), version control (Git, GitHub, GitLab), orchestration/DAGs tools (e.g., Argo, Airflow,
Kubeflow)
• Hands-on coding skills Python 3 (e.g., API including automated testing frameworks and libraries
(e.g., pytest) and Infrastructure as Code (e.g., Terraform) and Kubernetes artifacts (e.g.,
deployments, operators, helm charts)
• Experience setting up at least one contemporary MLOps tooling (e.g., experiment tracking,
model governance, packaging, deployment, feature store)
• Practical knowledge delivering and maintaining production software such as APIs and cloud
infrastructure
• Knowledge of SQL (intermediate level or more preferred) and familiarity working with at least
one common RDBMS (MySQL, Postgres, SQL Server, Oracle)
This company is a network of the world's best developers - full-time, long-term remote software jobs with better compensation and career growth. We enable our clients to accelerate their Cloud Offering, and Capitalize on Cloud. We have our own IOT/AI platform and we provide professional services on that platform to build custom clouds for their IOT devices. We also build mobile apps, run 24x7 devops/site reliability engineering for our clients.
We are looking for very hands-on SRE (Site Reliability Engineering) engineers with 3 to 6 years of experience. The person will be part of team that is responsible for designing & implementing automation from scratch for medium to large scale cloud infrastructure and providing 24x7 services to our North American / European customers. This also includes ensuring ~100% uptime for almost 50+ internal sites. The person is expected to deliver with both high speed and high quality as well as work for 40 Hours per week (~6.5 hours per day, 6 days per week) in shifts which will rotate every month.
This person MUST have:
- B.E Computer Science or equivalent
- 2+ Years of hands-on experience troubleshooting/setting up of the Linux environment, who can write shell scripts for any given requirement.
- 1+ Years of hands-on experience setting up/configuring AWS or GCP services from SCRATCH and maintaining them.
- 1+ Years of hands-on experience setting up/configuring Kubernetes & EKS and ensuring high availability of container orchestration.
- 1+ Years of hands-on experience setting up CICD from SCRATCH in Jenkins & Gitlab.
- Experience configuring/maintaining one monitoring tool.
- Excellent verbal & written communication skills.
- Candidates with certifications - AWS, GCP, CKA, etc will be preferred
- Hands-on experience with databases (Cassandra, MongoDB, MySQL, RDS).
Experience:
- Min 3 years of experience as SRE automation engineer building, running, and maintaining production sites. Not looking for candidates who have experience only as L1/L2 or Build & Deploy..
Location:
- Remotely, anywhere in India
Timings:
- The person is expected to deliver with both high speed and high quality as well as work for 40 Hours per week (~6.5 hours per day, 6 days per week) in shifts which will rotate every month.
Position:
- Full time/Direct
- We have great benefits such as PF, medical insurance, 12 annual company holidays, 12 PTO leaves per year, annual increments, Diwali bonus, spot bonuses and other incentives etc.
- We dont believe in locking in people with large notice periods. You will stay here because you love the company. We have only a 15 days notice period.
This person MUST have:
- Min of 3-5 prior experience as a DevOps Engineer.
- Expertise in CI/CD pipeline maintenance and enhancement specifically Jenkins based pipelines.
- Working experience with engineering tools like git, git work flow, bitbucket, JIRA etc
- Hands-on experience deploying and managing infrastructure with CloudFormation/Terraform
- Experience managing AWS infrastructure
- Hands on experience of Linux administration.
- Basic understanding of Kubernetes/Docker orchestration
- Works closely with engineering team for day to day activities
- Manges existing infrastructure/Pipelines/Engineering tools (On Prem or AWS) for engineering team (Build servers/Jenkin nodes etc.)
- Works with engineering team for new config required for infra like replicating the setups, adding new resources etc.
- Works closely with engineering team for improving existing pipelines for build .
- Troubleshoots problems across infrastructure/services
Experience:
- Min 5-7 year experience
Location
- Remotely, anywhere in India
Timings:
- 40 hours a week (11 AM to 7 PM).
Position:
- Full time/Direct
- We have great benefits such as PF, medical insurance, 12 annual company holidays, 12 PTO leaves per year, annual increments, Diwali bonus, spot bonuses and other incentives etc.
- We dont believe in locking in people with large notice periods. You will stay here because you love the company. We have only a 15 days notice period.

- Work with developers to build out CI/CD pipelines, enable self-service build tools and reusable deployment jobs. Find, explore, and advocate for new technologies for enterprise use.
- Automate the provisioning of environments
- Promote new DevOps tools to simplify the build process and entire Continuous Delivery.
- Manage a Continuous Integration and Deployment environment.
- Coordinate and scale the evolving build and cloud deployment systems across all product development teams.
- Work independently, with, and across teams. Establishing smooth running. environments are paramount to your success, and happiness
- Encourage innovation, implementation of cutting-edge technologies, inclusion, outside-of-the[1]box thinking, teamwork, self-organization, and diversity.
Technical Skills
- Experience with AWS multi-region/multi-AZ deployed systems, auto-scaling of EC2 instances, CloudFormation, ELBs, VPCs, CloudWatch, SNS, SQS, S3, Route53, RDS, IAM roles, security groups, cloud watch
- Experience in Data Visualization and Monitoring tools such as Grafana and Kibana
- Experienced in Build and CI/CD/CT technologies like GitHub, Chef, Artifactory, Hudson/Jenkins
- Experience with log collection, filter creation, and analysis, builds, and performance monitoring/tuning of infrastructure.
- Automate the provisioning of environments pulling strings with Puppet, cooking up some recipes with Chef, or through Ansible, and the deployment of those environments using containers, like Docker or Rocket: (have at least some configuration management tool through some version control).
Qualifications:
- B.E/ B.Tech/ M.C.A in Computer Science, Electronics and Communication Engineering, Electronics and Electrical Engineering.
- Minimum 60% in Graduation and Post-Graduation.
- Good verbal and written communication skills

