

Job Title: Senior DevOps Engineer
Location: Remote
Experience Level: 5+ Years
Role Overview:
We are a funded AI startup seeking a Senior DevOps Engineer to design, implement, and maintain a secure, scalable, and efficient infrastructure. In this role, you will focus on automating operations, optimizing deployment processes, and enabling engineering teams to deliver high-quality products seamlessly.
Key Responsibilities:
Infrastructure Scalability & Reliability:
- Architect and manage cloud infrastructure on AWS, GCP, or Azure for high availability, reliability, and cost-efficiency.
- Implement container orchestration using Kubernetes or Docker Compose.
- Use Infrastructure as Code (IaC) tools like Pulumi or Terraform to manage and configure infrastructure (see the sketch below).
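As a taste of what this looks like in practice, here is a minimal, hypothetical Pulumi program in Python; the resource names and tags are illustrative, not a prescribed setup:

```python
"""Minimal Pulumi sketch: a private, versioned S3 bucket (names are illustrative)."""
import pulumi
import pulumi_aws as aws

bucket = aws.s3.Bucket(
    "artifact-store",  # logical name within the Pulumi stack
    acl="private",
    versioning=aws.s3.BucketVersioningArgs(enabled=True),
    tags={"team": "platform", "managed-by": "pulumi"},
)

# Expose the physical bucket name as a stack output for other tools to consume.
pulumi.export("bucket_name", bucket.id)
```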
Deployment Automation:
- Design and maintain CI/CD pipelines using GitHub Actions, Jenkins, or similar tools.
- Implement deployment strategies such as canary or blue-green deployments, and create rollback mechanisms to ensure seamless updates (a minimal traffic-switch sketch follows).
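A blue-green cutover can be as simple as repointing a load balancer listener at the new target group. The sketch below uses boto3 against an AWS Application Load Balancer; the ARNs are placeholders, and rollback is the same call with the "blue" target group's ARN:

```python
"""Hedged sketch of a blue-green cutover: repoint an ALB listener at the green target group."""
import boto3

# Placeholder ARNs; in a real pipeline these would come from IaC outputs.
LISTENER_ARN = "arn:aws:elasticloadbalancing:us-east-1:123456789012:listener/app/web/abc/def"
GREEN_TG_ARN = "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/web-green/123"

def cut_over_to_green() -> None:
    elbv2 = boto3.client("elbv2")
    # Forward all traffic to the green target group.
    elbv2.modify_listener(
        ListenerArn=LISTENER_ARN,
        DefaultActions=[{"Type": "forward", "TargetGroupArn": GREEN_TG_ARN}],
    )

if __name__ == "__main__":
    cut_over_to_green()
```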
Monitoring & Observability:
- Leverage tools like OpenTelemetry, Grafana, and Datadog to monitor system health and performance.
- Establish centralized logging and build real-time dashboards for actionable insights (see the tracing sketch below).
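For illustration, a minimal OpenTelemetry tracing setup in Python looks roughly like this; the service and span names are made up, and a real deployment would export to a collector rather than the console:

```python
"""Minimal OpenTelemetry tracing sketch; exports spans to the console for demonstration."""
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Wire up a tracer provider that prints finished spans to stdout.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("deploy-service")  # instrumentation name (illustrative)

with tracer.start_as_current_span("rollout-health-check"):
    pass  # the work being traced would go here
```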
Security & Compliance:
- Securely manage secrets using tools like HashiCorp Vault or Doppler (see the sketch below).
- Conduct static code analysis and vulnerability scanning with tools such as SonarQube or Snyk to ensure compliance with security standards.
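As a rough illustration, reading a secret from HashiCorp Vault with the hvac client might look like the following; the address, path, and key names are placeholders:

```python
"""Sketch: fetch a secret from Vault's KV v2 engine using hvac (all values are placeholders)."""
import os
import hvac

client = hvac.Client(
    url=os.environ.get("VAULT_ADDR", "https://vault.example.internal:8200"),
    token=os.environ["VAULT_TOKEN"],  # injected by the CI runner, never hard-coded
)

# Read the KV v2 data stored at secret/data/app/config.
secret = client.secrets.kv.v2.read_secret_version(path="app/config")
db_password = secret["data"]["data"]["db_password"]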
Collaboration & Team Enablement:
- Mentor and guide team members on DevOps best practices and workflows.
- Document infrastructure setups, incident runbooks, and troubleshooting workflows to enhance team efficiency.
Required Skills:
- Expertise in managing cloud platforms like AWS, GCP, or Azure.
- In-depth knowledge of Kubernetes, Docker, and IaC tools like Terraform or Pulumi.
- Advanced scripting capabilities in Python or Bash.
- Proficiency in CI/CD tools such as GitHub Actions, Jenkins, or similar.
- Experience with observability tools like Grafana, OpenTelemetry, and Datadog.
- Strong troubleshooting skills for debugging production systems and optimizing performance.
Preferred Qualifications:
- Experience in scaling AI or ML-based applications.
- Familiarity with distributed systems and microservices architecture.
- Understanding of agile methodologies and DevSecOps practices.
- Certifications in AWS, Azure, or Kubernetes.
What We Offer:
- Opportunity to work in a fast-paced AI startup environment.
- Flexible remote work culture.
- Competitive salary and equity options.
- Professional growth through challenging projects and learning opportunities.

Location: Remote (India only)
About Certa
At Certa, we’re revolutionizing process automation for top-tier companies, including Fortune 500 and Fortune 1000 leaders, from the heart of Silicon Valley. Our mission? Simplifying complexity through cutting-edge SaaS solutions. Join our thriving, global team and become a key player in a startup environment that champions innovation, continuous learning, and unlimited growth. We offer a fully remote, flexible workspace that empowers you to excel.
Role Overview
Ready to elevate your DevOps career by shaping the backbone of a fast-growing SaaS platform? As a Senior DevOps Engineer at Certa, you’ll lead the charge in building, automating, and optimizing our cloud infrastructure. Beyond infrastructure management, you’ll actively contribute with a product-focused mindset, understanding customer requirements, collaborating closely with product and engineering teams, and ensuring our AWS-based platform consistently meets user needs and business goals.
What You’ll Do
- Own SaaS Infrastructure: Design, architect, and maintain robust, scalable AWS infrastructure, enhancing platform stability, security, and performance.
- Orchestrate with Kubernetes: Utilize your advanced Kubernetes expertise to manage and scale containerized deployments efficiently and reliably.
- Collaborate on Enterprise Architecture: Align infrastructure strategies with enterprise architectural standards, partnering closely with architects to build integrated solutions.
- Drive Observability: Implement and evolve sophisticated monitoring and observability solutions (DataDog, ELK Stack, AWS CloudWatch) to proactively detect, troubleshoot, and resolve system anomalies (see the alarm sketch after this list).
- Lead Automation Initiatives: Champion an automation-first mindset across the organization, streamlining development, deployment, and operational workflows.
- Implement Infrastructure as Code (IaC): Master Terraform to build repeatable, maintainable cloud infrastructure automation.
- Optimize CI/CD Pipelines: Refine and manage continuous integration and deployment processes (currently GitHub Actions, transitioning to CircleCI), enhancing efficiency and reliability.
- Enable GitOps with ArgoCD: Deliver seamless GitOps-driven application deployments, ensuring accuracy and consistency in Kubernetes environments.
- Advocate for Best Practices: Continuously promote and enforce industry-standard DevOps practices, ensuring consistent, secure, and efficient operational outcomes.
- Innovate and Improve: Constantly evaluate and enhance current DevOps processes, tooling, and methodologies to maintain cutting-edge efficiency.
- Product Mindset: Actively engage with product and engineering teams, bringing infrastructure expertise to product discussions, understanding customer needs, and helping prioritize infrastructure improvements that directly benefit users and business objectives.
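By way of example, wiring a CloudWatch alarm to an on-call SNS topic with boto3 might look like this sketch; the metric, thresholds, and topic ARN are assumptions for illustration, not Certa's actual configuration:

```python
"""Illustrative CloudWatch alarm: page on a sustained spike in ALB 5xx responses."""
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="api-5xx-spike",
    Namespace="AWS/ApplicationELB",
    MetricName="HTTPCode_Target_5XX_Count",
    Statistic="Sum",
    Period=60,                       # one-minute buckets
    EvaluationPeriods=3,             # three consecutive breaching minutes
    Threshold=50,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:oncall"],  # placeholder topic ARN
)
```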
What You Bring
- Hands-On Experience: 3-5 years in DevOps roles, ideally within fast-paced SaaS environments.
- Kubernetes Mastery: Advanced knowledge and practical experience managing Kubernetes clusters and container orchestration.
- AWS Excellence: Comprehensive expertise across AWS services, infrastructure management, and security.
- IaC Competence: Demonstrated skill in Terraform for infrastructure automation and management.
- CI/CD Acumen: Proven proficiency managing pipelines with GitHub Actions; familiarity with CircleCI highly advantageous.
- GitOps Knowledge: Experience with ArgoCD for effective continuous deployment and operations.
- Observability Skills: Strong capabilities deploying and managing monitoring solutions such as DataDog, ELK, and AWS CloudWatch.
- Python Automation: Solid scripting and automation skills using Python.
- Architectural Awareness: Understanding of enterprise architecture frameworks and alignment practices.
- Proactive Problem-Solving: Exceptional analytical and troubleshooting skills, adept at swiftly addressing complex technical challenges.
- Effective Communication: Strong interpersonal and collaborative skills, essential for remote, distributed teamwork.
- Product Focus: Ability and willingness to understand customer requirements, prioritize tasks that enhance product value, and proactively suggest infrastructure improvements driven by user needs.
- Startup Mindset (Bonus): Prior experience or enthusiasm for dynamic startup cultures is a distinct advantage.
Why Join Us
- Compensation: Top-tier salary and exceptional benefits.
- Work-Life Flexibility: Fully remote, flexible scheduling.
- Growth Opportunities: Accelerate your career in a company poised for significant growth.
- Innovative Culture: Engineering-centric, innovation-driven work environment.
- Team Events: Annual offsites and quarterly Hackerhouse.
- Wellness & Family: Comprehensive healthcare and parental leave.
- Workspace: Premium workstation setup allowance, providing the tech you need to succeed.
Key Responsibilities:
• Collaborate with Data Scientists to test and scale new algorithms through pilots, and later industrialize the solutions at scale across the Group's comprehensive fashion network
• Influence, build, and maintain the large-scale data infrastructure required for AI projects, and integrate with external IT infrastructure/services to provide an end-to-end solution
• Leverage an understanding of software architecture and software design patterns to write scalable, maintainable, well-designed, and future-proof code
• Design, develop, and maintain the framework for the analytical pipeline
• Develop common components to address pain points in machine learning projects, such as model lifecycle management, feature stores, and data quality evaluation
• Provide input and help implement frameworks and tools to improve data quality
• Work in cross-functional agile teams of highly skilled software/machine learning engineers, data scientists, designers, product managers, and others to build the AI ecosystem within the Group
• Deliver on time, demonstrating a strong commitment to the team mission and the agreed backlog
• Act as a DevOps/Build and Release Engineer with the maturity to help define and automate processes
• Configure, install, and manage source control tools such as AWS CodeCommit, GitHub, or Bitbucket
• Automate the implementation/deployment of code in cloud-based infrastructure (AWS preferred)
• Set up monitoring of infrastructure and applications with alerting frameworks
Requirements:
• Able to code in Python.
• Extensive experience with building and supporting Docker and Kubernetes in production.
• Understand AWS (Amazon Web Services) and be able to jump right into our environment.
• Security clearance will be required.
• Lambda used in conjunction with S3, CloudTrail, and EC2 (a minimal handler sketch follows this list).
• CloudFormation (Infrastructure as Code)
• CloudWatch and CloudTrail
• Version control (SVN, Git, Artifactory, Bitbucket)
• CI/CD (Jenkins or similar)
• Docker Compose or other orchestration tools
• REST APIs
• Databases (Postgres, Oracle, SQL Server, NoSQL, or graph databases)
• Bachelor’s degree in Computer Science, Computer Engineering, or a closely related field.
• Server orchestration using tools like Puppet, Chef, Ansible, etc.
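For reference, a minimal Python Lambda handler triggered by S3 object-created events could look like the sketch below; it simply logs each upload, which CloudWatch captures (and CloudTrail records at the API level):

```python
"""Minimal S3-triggered Lambda handler sketch: logs each uploaded object."""
import json

def handler(event, context):
    # S3 event notifications deliver one or more records per invocation.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        print(json.dumps({"event": "object_created", "bucket": bucket, "key": key}))
    return {"statusCode": 200}
```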
Please send your CV to priyanka.sharma@neotas.com
Neotas.com
● Bachelor’s degree or 5+ years of professional experience.
● 2+ years of hands-on experience programming in languages such as Python, Ruby, Go, Swift, Java, .NET, C++, or a similar object-oriented language.
● Experience with automating cloud native technologies, deploying applications, and provisioning infrastructure.
● Hands-on experience with Infrastructure as Code, using CloudFormation, Terraform, or other tools.
● Experience developing cloud native CI/CD workflows and tools, such as Jenkins, Bamboo, TeamCity, CodeDeploy (AWS), and/or GitLab.
● Hands-on experience with microservices and distributed application architecture, such as containers, Kubernetes, and/or serverless technology.
● Hands-on experience building and managing data pipelines, reporting, and analytics.
● Experience with the full software development lifecycle and delivery using Agile practices.
● Preferable (bonus points if you know these; a Kafka producer sketch follows this list):
○ AWS cloud management
○ Kafka
○ Databricks
○ GitLab CI/CD hooks
○ Python notebooks
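For a flavour of the Kafka work, a minimal producer using the kafka-python library might look like this sketch; the broker address, topic, and payload are placeholders:

```python
"""Sketch of a JSON-emitting Kafka producer with kafka-python (broker/topic are placeholders)."""
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="broker.example.internal:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

producer.send("pipeline-events", {"stage": "ingest", "status": "ok"})
producer.flush()  # block until the message is actually delivered
```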
Our Client is an IT infrastructure services company, focused on and specialized in delivering solutions and services built on Microsoft products and technologies. They are a Microsoft partner and cloud solution provider. Our Client's objective is to help small, mid-sized, and global enterprises transform their business by using innovation in IT, adapting to the latest technologies, and using IT as an enabler to meet business goals and sustain continuous growth.
With focused and experienced management and a strong team of IT infrastructure professionals, they add value by making IT infrastructure a robust, agile, secure, and cost-effective service to the business. As an independent IT infrastructure company, they provide their clients with unbiased advice on how to successfully implement and manage technology to complement their business requirements.
- Providing on-call support within a high-availability production environment
- Logging issues
- Providing complex problem analysis and resolution for technical and application issues
- Supporting and collaborating with team members
- Running system updates
- Monitoring and responding to system alerts
- Developing and running system health checks (see the sketch after this list)
- Applying industry-standard practices across the technology estate
- Performing system reviews
- Reviewing and maintaining infrastructure configuration
- Diagnosing performance issues and network bottlenecks
- Collaborating within geographically distributed teams
- Supporting software development infrastructure in line with continuous integration and delivery standards
- Working closely with developers and QA teams as part of a customer support centre
- Delivering project work, either individually or in conjunction with other teams, external suppliers, or contractors
- Ensuring the technical environments are maintained to meet current standards
- Ensuring compliance with appropriate industry and security regulations
- Providing support to Development and Customer Support teams
- Managing the hosted infrastructure through vendor engagement
- Managing third-party software licensing and ensuring compliance
- Delivering new technologies as agreed by the business
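As an example of the health-check work mentioned above, a small Python probe might look like this; the endpoint names and URLs are placeholders:

```python
"""Illustrative health-check script (endpoints are placeholders)."""
import sys
import requests

CHECKS = {
    "api": "https://example.internal/healthz",
    "web": "https://example.internal/",
}

def main() -> int:
    failures = 0
    for name, url in CHECKS.items():
        try:
            resp = requests.get(url, timeout=5)
            ok = resp.status_code == 200
        except requests.RequestException:
            ok = False
        print(f"{name}: {'OK' if ok else 'FAIL'}")
        failures += 0 if ok else 1
    return 1 if failures else 0  # non-zero exit lets a scheduler flag the failure

if __name__ == "__main__":
    sys.exit(main())
```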
What you need to have:
- Experience working within a technical operations environment relevant to the skills listed below.
- Be proficient in:
  - Linux; zsh/bash or similar
  - ssh; tmux/screen or similar
  - vim/emacs or similar
  - Computer networking
- Have a reasonable working knowledge of:
  - Cloud infrastructure, preferably GCP
  - One or more programming/scripting languages
  - Git
  - Docker
  - Web services and web servers
  - Databases, relational and NoSQL
- Some familiarity with:
  - Puppet, Ansible
  - Terraform
  - GitHub, CircleCI, Kubernetes
  - Scripting languages: Shell
  - Databases: Cassandra, Postgres, MySQL, or Cloud SQL
  - Agile working practices, including Scrum and Kanban
  - Private & public cloud hosting environments
- Strong technology interests with a positive ‘can do’ attitude
- Be flexible and adaptable to changing priorities
- Be good at planning and organising your own time, and able to meet targets and deadlines without supervision
- Excellent written and verbal communication skills
- Approachable with both colleagues and team members
- Be resourceful and practical, with an ability to respond positively and quickly to technical and business challenges
- Be persuasive, articulate, and influential, but down to earth and friendly with your own team and colleagues
- Have an ability to establish relationships quickly and to work effectively either as part of a team or individually
- Be customer-focused with both internal and external customers
- Be capable of remaining calm under pressure
- Technically minded, with good problem-resolution skills and a systematic manner
- Excellent documentation skills
- Prepared to participate in an out-of-hours support rota

Requirements and Qualifications
- Bachelor’s degree in Computer Science, Computer Engineering, or a related field
- 4+ years of experience
- Excellent analytical and problem-solving skills
- Strong knowledge of Linux systems and internals
- Programming experience in Python/shell scripting
- Strong AWS skills, with knowledge of EC2, VPC, S3, RDS, CloudFront, Route 53, etc.
- Experience in containerization (Docker) and container orchestration (Kubernetes)
- Experience in DevOps & CI/CD tools such as Git, Jenkins, Terraform, Helm
- Experience with SQL & NoSQL databases such as MySQL, MongoDB, and Elasticsearch
- Debugging and troubleshooting skills using tools such as strace, tcpdump, etc.
- Good understanding of networking protocols and security concerns (VPN, VPC, IG, NAT, AZ, subnets)
- Experience with monitoring and data-analysis tools such as Prometheus, EFK, etc. (see the metrics sketch after this list)
- Good communication & collaboration skills and attention to detail
- Participation in rotating on-call duties
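To illustrate the Prometheus side, the official Python client can expose application metrics in a few lines; the port and metric names below are illustrative:

```python
"""Sketch: expose a request counter for Prometheus to scrape (port/metric names illustrative)."""
import time
from prometheus_client import Counter, start_http_server

REQUESTS = Counter("app_requests_total", "Total requests handled", ["status"])

if __name__ == "__main__":
    start_http_server(9100)  # metrics now served at http://localhost:9100/metrics
    while True:
        REQUESTS.labels(status="200").inc()  # stand-in for real request handling
        time.sleep(1)
```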
MTX Group Inc. is seeking a motivated Lead DevOps Engineer to join our team. MTX Group Inc. is a global implementation partner enabling organizations to become fit enterprises. MTX provides expertise across various platforms and technologies, including Google Cloud, Salesforce, artificial intelligence/machine learning, data integration, data governance, data quality, analytics, visualization, and mobile technology. MTX’s very own artificial intelligence platform, Maverick, enables clients to accelerate processes and critical decisions by leveraging a Cognitive Decision Engine, a collection of purpose-built artificial neural networks designed to harness the power of machine learning. The Maverick platform includes Smart Asset Detection and Monitoring, Chatbot Services, and Document Verification, to name a few.
Responsibilities:
- Be responsible for software releases, configuration, monitoring, and support of production system components and infrastructure.
- Troubleshoot technical or functional issues in a complex environment to provide timely resolution across various global applications and platforms.
- Bring experience with Google Cloud Platform (a small client sketch follows this list).
- Write scripts and automation tools in languages such as Bash/Python/Ruby/Golang.
- Configure and manage data sources like PostgreSQL, MySQL, MongoDB, Elasticsearch, Redis, Cassandra, Hadoop, etc.
- Build automation and tooling around Google Cloud Platform using technologies such as Anthos, Kubernetes, Terraform, Google Deployment Manager, Helm, Cloud Build, etc.
- Bring a passion for staying on top of DevOps trends, and experiment with and learn new CI/CD technologies.
- Work with users to understand and gather their needs for our catalogue, then participate in the required development.
- Manage several streams of work concurrently.
- Understand how various systems work.
- Understand how IT operations are managed.
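As a small taste of the GCP tooling work, listing release artifacts with the google-cloud-storage client might look like this sketch; the bucket name and prefix are placeholders:

```python
"""Sketch: list build artifacts in a GCS bucket (bucket name and prefix are placeholders)."""
from google.cloud import storage

client = storage.Client()  # authenticates via Application Default Credentials
for blob in client.list_blobs("example-release-artifacts", prefix="builds/"):
    print(blob.name, blob.size)
```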
What you will bring:
- 5 years of work experience as a DevOps Engineer.
- Ample knowledge of and experience in system automation, deployment, and implementation.
- Experience using Linux and Jenkins, and ample experience configuring and automating monitoring tools.
- Experience with the software development process and with tools and technologies such as SaaS platforms, Python, Java, MongoDB, shell scripting, MySQL, and Git.
- Knowledge of handling distributed data systems, for example Elasticsearch, Cassandra, Hadoop, and others.
What we offer:
- Group Medical Insurance (Family Floater Plan: Self + Spouse + 2 Dependent Children)
  - Sum insured: INR 5,00,000/-
  - Maternity cover for up to two children
  - Inclusive of COVID-19 coverage
  - Cashless & reimbursement facility
  - Access to free online doctor consultations
- Personal Accident Policy (Disability Insurance)
  - Sum insured: INR 25,00,000/- per employee
  - Accidental death and permanent total disability covered up to 100% of the sum insured
  - Permanent partial disability covered as per the scale of benefits decided by the insurer
  - Temporary total disability covered
- An option of a Paytm food wallet (up to Rs. 2,500) as a tax-saver benefit
- Monthly internet reimbursement of up to Rs. 1,000
- Opportunity to pursue executive programs/courses at top universities globally
- Professional development opportunities through various MTX-sponsored certifications on multiple technology stacks, including Salesforce, Google Cloud, Amazon, and others

Skills: Python, Docker or Ansible, AWS
➢ Experience building a multi-region, highly available, auto-scaling infrastructure that optimizes performance and cost; plan for future infrastructure as well as maintain and optimize existing infrastructure (see the sketch after this list).
➢ Conceptualize, architect, and build automated deployment pipelines in a CI/CD environment such as Jenkins.
➢ Conceptualize, architect, and build a containerized infrastructure using Docker, Mesosphere, or similar platforms.
➢ Work with developers to institute systems, policies, and workflows that allow for rollback of deployments.
➢ Triage releases of applications to the production environment on a daily basis.
➢ Interface with developers and triage SQL queries that need to be executed in production environments.
➢ Maintain a 24/7 on-call rotation to respond to and support troubleshooting of issues in production.
➢ Assist developers and on-calls for other teams with post-mortems, follow-up, and review of issues affecting production availability.
➢ Establish and enforce systems-monitoring tools and standards
➢ Establish and enforce risk-assessment policies and standards
➢ Establish and enforce escalation policies and standards
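To make the auto-scaling item concrete, the sketch below bumps an EC2 Auto Scaling group's desired capacity with boto3; the group name and capacity are placeholders, and real scaling would normally be policy-driven rather than manual:

```python
"""Hedged sketch: manually adjust an EC2 Auto Scaling group (values are placeholders)."""
import boto3

def scale_to(desired: int, group: str = "web-asg") -> None:
    autoscaling = boto3.client("autoscaling")
    # Respect the cooldown so we don't fight an in-flight scaling activity.
    autoscaling.set_desired_capacity(
        AutoScalingGroupName=group,
        DesiredCapacity=desired,
        HonorCooldown=True,
    )

if __name__ == "__main__":
    scale_to(6)  # e.g. pre-scale ahead of an anticipated traffic spike
```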

