About the job
Our goal
We are reinventing the future of MLOps. The Censius Observability platform gives businesses greater visibility into how their AI makes decisions so they can understand it better. We enable explanations of predictions, continuous monitoring of drift, and fairness assessment in the real world. (TL;DR: build the best ML monitoring tool.)
The culture
We believe in constantly iterating on and improving our team culture, just like our product. We have found a good balance between async and sync work: the default is still Notion docs over meetings, but we recognize that, as an early-stage startup, brainstorming together over calls leads to results faster. If you enjoy taking ownership, moving quickly, and writing docs, you will fit right in.
The role:
Our engineering team is growing, and we are looking to bring on board a senior software engineer who can help us transition to the next phase of the company. As we roll out our platform to customers, you will be pivotal in refining our system architecture, ensuring the various tech stacks play well with each other, and smoothing the DevOps process.
On the platform, we use Python (ML-related jobs), Golang (core infrastructure), and NodeJS (user-facing). The platform is 100% cloud-native and we use Envoy as a proxy (eventually will lead to service-mesh architecture).
By joining our team, you will get exposure to a wide swath of modern technologies while building an enterprise-grade ML platform in a highly promising area.
Responsibilities
- Be the bridge between engineering and product teams. Understand long-term product roadmap and architect a system design that will scale with our plans.
- Take ownership of converting product insights into detailed engineering requirements. Break these down into smaller tasks and work with the team to plan and execute sprints.
- Author high-quality, high-performance, unit-tested code that runs in a distributed environment using containers.
- Continually evaluate and improve DevOps processes for a cloud-native codebase.
- Review PRs, mentor others and proactively take initiatives to improve our team's shipping velocity.
- Leverage your industry experience to champion engineering best practices within the organization.
Qualifications
Work Experience
- 3+ years of industry experience (2+ years in a senior engineering role), preferably with some past exposure to leading remote development teams.
- Proven track record of building large-scale, high-throughput, low-latency production systems, with at least 3 years working with customers, architecting solutions, and delivering end-to-end products.
- 3+ years of fluency in writing production-grade Go or Python in a microservice architecture with containers/VMs.
- 3+ years of DevOps experience (Kubernetes, Docker, Helm and public cloud APIs)
- Worked with relational (SQL) as well as non-relational databases (Mongo or Couch) in a production environment.
- (Bonus: worked with big data in data lakes/warehouses).
- (Bonus: built an end-to-end ML pipeline)
Skills
- Strong documentation skills. As a remote team, we heavily rely on elaborate documentation for everything we are working on.
- Ability to motivate, mentor, and lead others (we have a flat team structure, but the team would rely upon you to make important decisions)
- Strong independent contributor as well as a team player.
- Working knowledge of ML and familiarity with concepts of MLOps
Benefits
- Competitive Salary
- Work Remotely
- Health insurance
- Unlimited Time Off
- Support for continual learning (free books and online courses)
- Reimbursement for streaming services (think Netflix)
- Reimbursement for gym or physical activity of your choice
- Flex hours
- Leveling Up Opportunities
You will excel in this role if
- You have a product mindset. You understand, care about, and can relate to our customers.
- You take ownership, collaborate, and follow through to the very end.
- You love solving difficult problems, stand your ground, and get what you want from engineers.
- You resonate with our core values of innovation, curiosity, accountability, trust, fun, and social good.

Job Summary:
We are looking for a proactive and skilled Senior DevOps Engineer to join our team and play a key role in building, managing, and scaling infrastructure for high-performance systems. The ideal candidate will have hands-on experience with Kubernetes, Docker, Python scripting, cloud platforms, and DevOps practices around CI/CD, monitoring, and incident response.
Key Responsibilities:
- Design, build, and maintain scalable, reliable, and secure infrastructure on cloud platforms (AWS, GCP, or Azure).
- Implement Infrastructure as Code (IaC) using tools like Terraform, CloudFormation, or similar.
- Manage Kubernetes clusters, including configuring namespaces, services, deployments, and autoscaling.
CI/CD & Release Management:
- Build and optimize CI/CD pipelines for automated testing, building, and deployment of services.
- Collaborate with developers to ensure smooth and frequent deployments to production.
- Manage versioning and rollback strategies for critical deployments.
Containerization & Orchestration using Kubernetes:
- Containerize applications using Docker, and manage them using Kubernetes.
- Write automation scripts using Python or Shell for infrastructure tasks, monitoring, and deployment flows.
- Develop utilities and tools to enhance operational efficiency and reliability.
Monitoring & Incident Management:
- Analyze system performance and implement infrastructure scaling strategies based on load and usage trends.
- Optimize application and system performance through proactive monitoring and configuration tuning.
Desired Skills and Experience:
- Experience required: 8+ years.
- Hands-on experience with cloud services such as AWS and EKS.
- Ability to design a good cloud solution.
- Strong Linux troubleshooting, shell scripting, Kubernetes, Docker, Ansible, and Jenkins skills.
- Design and implement the CI/CD pipeline following the best industry practices using open-source tools.
- Use knowledge and research to constantly modernize our applications and infrastructure stacks.
- Be a team player and strong problem-solver to work with a diverse team.
- Good communication skills.
You will be responsible for:
- Managing all DevOps and infrastructure for Sizzle (we have both cloud and on-premise servers)
- Working closely with all AI and backend engineers on processing requirements, and managing both development and production requirements
- Optimizing the pipeline to ensure ultra-fast processing
- Working closely with the management team on infrastructure upgrades
You should have the following qualities:
- 3+ years of experience in DevOps and CI/CD
- Deep experience in: GitLab, GitOps, Ansible, Docker, Grafana, Prometheus
- Strong background in Linux system administration
- Deep expertise with AI/ML pipeline processing, especially with GPU processing. This doesn’t need to include model training, data gathering, etc. We’re looking more for experience with model deployment and inference tasks at scale
- Deep expertise in Python including multiprocessing / multithreaded applications
- Performance profiling including memory, CPU, GPU profiling
- Error handling and building robust scripts that will be expected to run for weeks to months at a time
- Deploying to production servers and monitoring and maintaining the scripts
- DB integration including pymongo and sqlalchemy (we have MongoDB and PostgreSQL databases on our backend)
- Expertise in Docker-based virtualization, including creating and maintaining custom Docker images, deploying Docker images on cloud and on-premise services, and monitoring production Docker images with robust error handling
- Expertise in AWS infrastructure, networking, availability
Optional but beneficial to have:
- Experience with running Nvidia GPU / CUDA-based tasks
- Experience with image processing in Python (e.g., OpenCV, Pillow)
- Experience with PostgreSQL and MongoDB (or SQL familiarity)
- Excited about working in a fast-changing startup environment
- Willingness to learn rapidly on the job, try different things, and deliver results
- Bachelor's or Master's degree in computer science or a related field
- Ideally a gamer or someone interested in watching gaming content online
Skills:
DevOps, Ansible, CI/CD, GitLab, GitOps, Docker, Python, AWS, GCP, Grafana, Prometheus, SQLAlchemy, Linux / Ubuntu system administration
Seniority: We are looking for a mid- to senior-level engineer
Salary: Will be commensurate with experience.
Who Should Apply:
If you have the right experience, regardless of your seniority, please apply.
Work Experience: 3 years to 6 years
- 5+ years of experience in DevOps including automated system configuration, application deployment, and infrastructure-as-code.
- Advanced Linux system administration abilities.
- Real-world experience managing large-scale AWS or GCP environments. Multi-account management a plus.
- Experience with managing production environments on AWS or GCP.
- Solid understanding of CI/CD pipelines using GitHub, CircleCI/Jenkins, JFrog Artifactory/Nexus.
- Experience with a configuration management tool such as Ansible, Puppet, or Chef is a must.
- Experience in any one of the scripting languages: Shell, Python, etc.
- Experience in containerization using Docker and orchestration using Kubernetes/EKS/GKE is a must.
- Solid understanding of SSL and DNS.
- Experience deploying and running an open-source monitoring/graphing solution such as Prometheus or Grafana.
- Basic understanding of networking concepts.
- Always adhere to security best practices.
- Knowledge of big data (Hadoop/Druid) systems administration is a plus.
- Knowledge of managing and running databases (MySQL/MariaDB/Postgres) is an added advantage.
What you get to do
- Work with development teams to build and maintain cloud environments to specifications developed closely with multiple teams. Support and automate the deployment of applications into those environments
- Diagnose and resolve active, latent, and systemic reliability issues across the entire stack: hardware, software, application, and network. Work closely with development teams to troubleshoot and resolve application and service issues
- Continuously improve Conviva SaaS services and infrastructure for availability, performance and security
- Implement security best practices – primarily patching of operating systems and applications
- Automate everything. Build proactive monitoring and alerting tools. Provide standards, documentation, and coaching to developers.
- Participate in 12x7 on-call rotations
- Work with third party service/support providers for installations, support related calls, problem resolutions etc.
Experience: 8-10 years
Notice Period: 15 days maximum
Must-haves
1. Knowledge about Database/NoSQL DB hosting fundamentals (RDS multi-AZ, DynamoDB, MongoDB, and such)
2. Knowledge of different storage platforms on AWS (EBS, EFS, FSx) - mounting persistent volumes with Docker Containers
3. In-depth knowledge of security principles on AWS (WAF, DDoS protection, Security Groups, NACLs, IAM groups, and SSO)
4. Knowledge of CI/CD platforms is required (Jenkins, GitHub Actions, etc.), including migration of AWS CodePipeline pipelines to GitHub Actions
5. Knowledge of a wide variety of AWS services (SNS, SES, SQS, Athena, Kinesis, S3, ECS, EKS, etc.) is required
6. Knowledge of an Infrastructure as Code tool is required. We use CloudFormation (Terraform is a plus); ideally, we would like to migrate from CloudFormation to Terraform
7. Setting up CloudWatch alarms and SMS/email/Slack alerts.
8. Some knowledge of configuring a monitoring tool such as Prometheus or Dynatrace (we currently use Datadog and CloudWatch)
9. Experience with any CDN provider configurations (Cloudflare, Fastly, or CloudFront)
10. Experience with either Python or Go scripting language.
11. Experience with Git branching strategy
12. Container hosting knowledge on both Windows and Linux
Nice-to-haves
1. Integration experience with Code Quality tools (SonarQube, NetSparker, etc) with CI/CD
2. Kubernetes
3. CDNs other than CloudFront (Cloudflare, Fastly, etc.)
4. Collaboration with multiple teams
5. GitOps

Job Description:
• Contribute to customer discussions to collect requirements.
• Engage in internal and customer POCs to realize the potential solutions envisaged for the customers.
• Design, develop, and migrate vRA blueprints and vRO workflows; strong hands-on knowledge of vROps and its integrations with applications and VMware solutions.
• Develop automation scripts to support the design and implementation of VMware projects.
Qualification:
• Maintain current high-level and in-depth technical knowledge of the entire VMware product portfolio and future product direction.
• Maintain deep technical and business knowledge of cloud computing and networking applications, industry directions, and trends.
• Experience with REST API and/or Python programming. TypeScript/NodeJS backend experience
• Experience with Kubernetes
• Familiarity with DevOps tools like Ansible, Puppet, Terraform
• End to end experience in Architecture, Design and Development of VMware Cloud Automation suite with good exposure to VMware products and/or Solutions.
• Hands-on experience in automation, coding, debugging and release.
• Sound process knowledge from requirement gathering, implementation, deployment and Support.
• Experience in working with global teams, customers and partners with solid communication skills.
• VMware CMA certification would be a plus
• Academic background of MS/BE/B.Tech in IT/CS/ECE/EE would be preferred.
- Experience using AWS (that’s just common sense)
- Experience designing and building web environments on AWS, which includes working with services like EC2, ELB, RDS, and S3
- Experience building and maintaining cloud-native applications
- A solid background in Linux/Unix and Windows server system administration
- Experience using DevOps tools in a cloud environment, such as Ansible, Artifactory, Docker, GitHub, Jenkins, Kubernetes, Maven, and SonarQube
- Experience installing and configuring different application servers such as JBoss, Tomcat, and WebLogic
- Experience using monitoring solutions like CloudWatch, ELK Stack, and Prometheus
- An understanding of writing Infrastructure-as-Code (IaC), using tools like CloudFormation or Terraform
- Knowledge of one or more of the most-used programming languages for today’s cloud computing (e.g., SQL and XML for data, R and Clojure for math, Haskell and Erlang for functional programming, and Python and Go for procedural programming)
- Experience in troubleshooting distributed systems
- Proficiency in script development and scripting languages
- The ability to be a team player
- The ability and skill to train other people in procedural and technical topics
- Strong communication and collaboration skills
As a special aside, an AWS engineer who works in DevOps should also have experience with:
- The theory, concepts, and real-world application of Continuous Delivery (CD), which requires familiarity with tools like AWS CodeBuild, AWS CodeDeploy, and AWS CodePipeline
- An understanding of automation

DevOps Architect
Experience: 10-12+ years of relevant DevOps experience
Locations: Bangalore, Chennai, Pune, Hyderabad, Jaipur.
Qualification:
• Bachelor's or advanced degree in computer science, software engineering, or equivalent is required.
• Certifications in specific areas are desired
Technical Skillset (skill - proficiency level):
- Build tools (Ant or Maven) - Expert
- CI/CD tool (Jenkins or GitHub CI/CD) - Expert
- Cloud DevOps (AWS CodeBuild, CodeDeploy, CodePipeline, etc.) or Azure DevOps - Expert
- Infrastructure as Code (Terraform, Helm charts, etc.) - Expert
- Containerization (Docker, Docker Registry) - Expert
- Scripting (Linux) - Expert
- Cluster deployment (Kubernetes) & maintenance - Expert
- Programming (Java) - Intermediate
- Application Types for DevOps (Streaming like Spark, Kafka, Big data like Hadoop etc) - Expert
- Artifactory (JFrog) - Expert
- Monitoring & Reporting (Prometheus, Grafana, PagerDuty etc.) - Expert
- Ansible, MySQL, PostgreSQL - Intermediate
• Source Control (like Git, Bitbucket, SVN, VSTS, etc.)
• Continuous Integration (like Jenkins, Bamboo, VSTS)
• Infrastructure Automation (like Puppet, Chef, Ansible)
• Deployment Automation & Orchestration (like Jenkins, VSTS, Octopus Deploy)
• Container Concepts (Docker)
• Orchestration (Kubernetes, Mesos, Swarm)
• Cloud (like AWS, Azure, Google Cloud, OpenStack)
Roles and Responsibilities
• DevOps architect should automate the process with proper tools.
• Developing appropriate DevOps channels throughout the organization.
• Evaluating, implementing and streamlining DevOps practices.
• Establishing a continuous build environment to accelerate software deployment and development processes.
• Engineering general and effective processes.
• Helping operations and development teams solve their problems.
• Supervising, examining, and handling technical operations.
• Providing DevOps processes and operations.
• Capacity to manage teams with a leadership mindset.
• Must possess excellent automation skills and the ability to drive initiatives to automate processes.
• Building strong cross-functional leadership skills and working together with the operations and engineering teams to make sure that systems are scalable and secure.
• Excellent knowledge of software development and software testing methodologies along with configuration management practices in Unix and Linux-based environment.
• Possess sound knowledge of cloud-based environments.
• Experience in handling automated deployment CI/CD tools.
• Must possess excellent knowledge of infrastructure automation tools (Ansible, Chef, and Puppet).
• Hands-on experience working with Amazon Web Services (AWS).
• Must have strong expertise in operating Linux/Unix environments and scripting languages like Python, Perl, and Shell.
• Ability to review deployment and delivery pipelines i.e., implement initiatives to minimize chances of failure, identify bottlenecks and troubleshoot issues.
• Previous experience in implementing continuous delivery and DevOps solutions.
• Experience in designing and building solutions to move data and process it.
• Must possess expertise in any of the coding languages depending on the nature of the job.
• Experience with containers and container orchestration tools (AKS, EKS, OpenShift, Kubernetes, etc)
• Experience with version control systems is a must (Git is an advantage)
• Belief in "Infrastructure as Code" (IaC), including experience with open-source tools such as Terraform
• Treats best practices for security as a requirement, not an afterthought
• Extensive experience with version control systems like GitLab and their use in release management, branching, merging, and integration strategies
• Experience working with Agile software development methodologies
• Proven ability to work on cross-functional Agile teams
• Mentor other engineers in best practices to improve their skills
• Creating suitable DevOps channels across the organization.
• Designing efficient practices.
• Delivering comprehensive best practices.
• Managing and reviewing technical operations.
• Ability to work independently and as part of a team.
• Exceptional communication skills, knowledge of the latest industry trends, and a highly innovative mindset
Why you should join us
- You will join the mission to create a positive impact on millions of people's lives
- You get to work on the latest technologies in a culture which encourages experimentation
- You get to work with super humans (Psst: Look up these super human1, super human2, super human3, super human4)
- You get to work in an accelerated learning environment
What you will do
- You will provide deep technical expertise to your team in building future ready systems.
- You will help develop a robust roadmap for ensuring operational excellence
- You will set up infrastructure on AWS that will be represented as code
- You will work on several automation projects that provide great developer experience
- You will set up secure, fault-tolerant, reliable, and performant systems
- You will establish clean and optimised coding standards for your team that are well documented
- You will set up systems so that they are easy to maintain and provide a great developer experience
- You will actively mentor and participate in knowledge sharing forums
- You will work in an exciting startup environment where you can be ambitious and try new things :)
You should apply if
- You have a strong foundation in Computer Science concepts and programming fundamentals
- You have been working on cloud infrastructure setup, especially on AWS, for 8+ years
- You have set up and maintained reliable systems that operate at high scale
- You have experience in hardening and securing cloud infrastructures
- You have a solid understanding of computer networking, network security and CDNs
- You have extensive experience in AWS, Kubernetes, and optionally Terraform
- You have experience in building automation tools for code build and deployment (preferably in JS)
- You understand the hustle of a startup and are good with handling ambiguity
- You are curious, a quick learner and someone who loves to experiment
- You insist on highest standards of quality, maintainability and performance
- You work well in a team to enhance your impact

Requirements and Qualifications
- Bachelor’s degree in Computer Science Engineering or in a related field
- 4+ years of experience
- Excellent analytical and problem-solving skills
- Strong knowledge of Linux systems and internals
- Programming experience in Python/Shell scripting
- Strong AWS skills with knowledge of EC2, VPC, S3, RDS, CloudFront, Route 53, etc.
- Experience in containerization (Docker) and container orchestration (Kubernetes)
- Experience in DevOps & CI/CD tools such as Git, Jenkins, Terraform, Helm
- Experience with SQL & NoSQL databases such as MySQL, MongoDB, and Elasticsearch
- Debugging and troubleshooting skills using tools such as strace, tcpdump, etc
- Good understanding of networking protocols and security concerns (VPN, VPC, IG, NAT, AZ, Subnet)
- Experience with monitoring and data analysis tools such as Prometheus, EFK, etc
- Good communication & collaboration skills and attention to detail
- Participation in rotating on-call duties
- Expertise in Infrastructure & Application design & architecture
- Expertise in AWS, OS & networking
- Good exposure to infrastructure & application security
- Expertise in Python, Shell scripting
- Proficient with DevOps tools: Terraform, Jenkins, Ansible, Docker, Git
- Solid background in systems engineering and operations
- Strong in DevOps methodologies and processes
- Strong in CI/CD pipeline & SDLC.







