About the job
Our goal
We are reinventing the future of MLOps. The Censius Observability platform gives businesses greater visibility into how their AI makes decisions so they can understand it better. We enable explanations of predictions, continuous monitoring of drift, and assessment of fairness in the real world. (TL;DR: build the best ML monitoring tool.)
The culture
We believe in constantly iterating and improving our team culture, just like our product. We have found a good balance between async and sync work: the default is still Notion docs over meetings, but at the same time we recognize that, as an early-stage startup, brainstorming together over calls gets us to results faster. If you enjoy taking ownership, moving quickly, and writing docs, you will fit right in.
The role:
Our engineering team is growing, and we are looking to bring on board a senior software engineer who can help us transition to the next phase of the company. As we roll out our platform to customers, you will be pivotal in refining our system architecture, ensuring the various tech stacks play well with each other, and smoothing out the DevOps process.
On the platform, we use Python (ML-related jobs), Golang (core infrastructure), and NodeJS (user-facing). The platform is 100% cloud-native, and we use Envoy as a proxy (which will eventually lead to a service-mesh architecture).
By joining our team, you will get exposure to a swath of modern technologies while building an enterprise-grade ML platform in one of the most promising areas.
Responsibilities
- Be the bridge between engineering and product teams. Understand long-term product roadmap and architect a system design that will scale with our plans.
- Take ownership of converting product insights into detailed engineering requirements. Break these down into smaller tasks and work with the team to plan and execute sprints.
- Author high-quality, high-performance, unit-tested code that runs in a distributed, containerized environment.
- Continually evaluate and improve DevOps processes for a cloud-native codebase.
- Review PRs, mentor others, and proactively take the initiative to improve our team's shipping velocity.
- Leverage your industry experience to champion engineering best practices within the organization.
Qualifications
Work Experience
- 3+ years of industry experience (2+ years in a senior engineering role), preferably with some past exposure to leading remote development teams.
- Proven track record of building large-scale, high-throughput, low-latency production systems, with at least 3 years working with customers, architecting solutions, and delivering end-to-end products.
- Fluency in writing production-grade Go or Python in a microservice architecture with containers/VMs for 3+ years.
- 3+ years of DevOps experience (Kubernetes, Docker, Helm, and public cloud APIs).
- Worked with relational (SQL) as well as non-relational databases (MongoDB or CouchDB) in a production environment.
- (Bonus: worked with big data in data lakes/warehouses).
- (Bonus: built an end-to-end ML pipeline)
Skills
- Strong documentation skills. As a remote team, we heavily rely on elaborate documentation for everything we are working on.
- Ability to motivate, mentor, and lead others (we have a flat team structure, but the team would rely upon you to make important decisions)
- Strong independent contributor as well as a team player.
- Working knowledge of ML and familiarity with concepts of MLOps
Benefits
- Competitive Salary
- Work Remotely
- Health insurance
- Unlimited Time Off
- Support for continual learning (free books and online courses)
- Reimbursement for streaming services (think Netflix)
- Reimbursement for gym or physical activity of your choice
- Flex hours
- Leveling Up Opportunities
You will excel in this role if
- You have a product mindset. You understand, care about, and can relate to our customers.
- You take ownership, collaborate, and follow through to the very end.
- You love solving difficult problems, stand your ground, and get what you want from engineers.
- You resonate with our core values of innovation, curiosity, accountability, trust, fun, and social good.
Similar jobs
● Explore
○ As a DevOps engineer, you will have multiple ways, tools, and technologies to solve a particular problem. We want you to take things into your own hands and figure out the best way to solve it.
● PDCT
○ Plan, design, code & write test cases for problems you are solving
● Tuning
○ Help to tune performance and ensure high availability of infrastructure, including reviewing system and application logs
● Security
○ Work on code-level application security
● Deploy
○ Deploy, manage and operate scalable, highly available, and fault-tolerant systems in client environments.
Technologies (4 out of 5 are required; a small automation sketch follows this list):
● Terraform*
● Docker*
● Kubernetes*
● Bash Scripting
● SQL
(* marked are a must)
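Since Kubernetes, Docker, and Terraform are the starred must-haves, here is a small illustrative sketch (not part of the role description) of the kind of glue automation the role implies: a Python script that shells out to kubectl and reports whether every Deployment in a namespace is fully rolled out. The namespace and function name are assumptions for illustration only.

    # Illustrative sketch only: verify Deployment rollout status via the kubectl CLI.
    # Assumes kubectl is installed and pointed at a valid cluster.
    import json
    import subprocess
    import sys

    def deployments_ready(namespace="default"):
        """Return True if every Deployment in the namespace is fully rolled out."""
        result = subprocess.run(
            ["kubectl", "get", "deployments", "-n", namespace, "-o", "json"],
            capture_output=True, text=True, check=True,
        )
        ok = True
        for item in json.loads(result.stdout).get("items", []):
            name = item["metadata"]["name"]
            desired = item.get("spec", {}).get("replicas") or 0
            ready = item.get("status", {}).get("readyReplicas") or 0
            if ready < desired:
                print(f"{name}: {ready}/{desired} replicas ready")
                ok = False
        return ok

    if __name__ == "__main__":
        sys.exit(0 if deployments_ready() else 1)

The same check could equally be expressed in Bash with jq or against the Kubernetes API directly; the point is comfort wiring these tools together.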
The challenges are great (as are the rewards). If you are looking to take these DevOps challenges head-on, wish to learn a great deal from them, and want to contribute to the company along the way, this is the role for you.
Ready?
If developing an impactful product for an early-stage startup sounds appealing to you, let's have a conversation. (Confidential, of course.)
- Seeking an individual with 5+ years of experience.
- Must-have skills: Jenkins, Groovy, Ansible, shell scripting, Python, Linux administration
- Deep Terraform and AWS knowledge to automate and provision EC2, EBS, and SQL Server, along with cost optimization and CI/CD pipelines using Jenkins; serverless automation is a plus (a minimal provisioning sketch follows this list)
- Excellent writing and communication skills in English. Enjoy writing crisp and understandable documentation
- Comfortable programming in one or more scripting languages
- Enjoys tinkering with tooling and researching easier ways to handle systems. Strong awareness of build vs. buy.
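As a hedged illustration of the Terraform/AWS provisioning bullet above (not a prescribed approach), the sketch below uses boto3 to launch a single tagged EC2 instance; in practice the same resource would normally be declared in Terraform. The AMI ID, region, instance type, and tags are placeholders.

    # Illustrative sketch only: launch one tagged EC2 instance with boto3.
    # Assumes AWS credentials are configured; all identifiers below are placeholders.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # placeholder AMI
        InstanceType="t3.micro",
        MinCount=1,
        MaxCount=1,
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "Environment", "Value": "staging"}],
        }],
    )
    print(response["Instances"][0]["InstanceId"])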
Required qualifications and must-have skills
- 5+ years of experience managing a team of 5+ infrastructure software engineers
- 5+ years of experience in building and scaling technical infrastructure
- 5+ years of experience in delivering software
- Experience leading by influence in multi-team, cross-functional projects
- Demonstrated experience recruiting and managing technical teams, including performance management
- Experience with cloud service providers such as AWS, GCP, or Azure
- Experience with containerization technologies such as Kubernetes and Docker
Nice-to-have skills
- Experience with Hadoop, Hive, and Presto
- Application/infrastructure benchmarking and optimization
- Familiarity with modern CI/CD practices
- Familiarity with reliability best practices
- Understanding customer requirements and project KPIs
- Implementing various development, testing, automation tools, and IT infrastructure
- Planning the team structure, activities, and involvement in project management activities.
- Managing stakeholders and external interfaces
- Setting up tools and required infrastructure
- Defining and setting development, test, release, update, and support processes for DevOps operation
- Have the technical skill to review, verify, and validate the software code developed in the project.
- Troubleshooting techniques and fixing the code bugs
- Monitoring processes during the entire lifecycle for adherence, and updating or creating new processes for improvement and to minimize waste
- Encouraging and building automated processes wherever possible
- Identifying and deploying cybersecurity measures by continuously performing vulnerability assessment and risk management
- Incident management and root cause analysis
- Coordination and communication within the team and with customers
- Selecting and deploying appropriate CI/CD tools
- Striving for continuous improvement and building a continuous integration, continuous delivery, and continuous deployment (CI/CD) pipeline
- Mentoring and guiding the team members
- Monitoring and measuring customer experience and KPIs
- Managing periodic reporting on the progress to the management and the customer
We are hiring candidates who are looking to work in a cloud environment and are ready to learn and adapt to evolving technologies.
Linux Administrator Roles & Responsibilities:
- 5 or more years of professional experience, with strong working expertise in Agile environments
- Deep knowledge in managing Linux servers.
- Managing Windows servers (not mandatory).
- Manage web servers (Apache, Nginx).
- Manage Application servers.
- Strong background & experience in any one scripting language (Bash, Python)
- Manage firewall rules.
- Perform root cause analysis for production errors.
- Basic administration of MySQL, MSSQL.
- Ready to learn and adapt to business requirements.
- Manage information security controls with best practices and processes.
- Support business requirements beyond working hours.
- Ensuring the highest uptime of services.
- Monitoring resource usage (a minimal monitoring sketch follows this list).
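As a small, hypothetical example of the resource-monitoring responsibility above (the thresholds and function name are illustrative, not part of the job description):

    # Illustrative sketch only: flag high CPU, memory, and disk usage on a Linux host.
    # Assumes the third-party psutil package is installed.
    import psutil

    THRESHOLDS = {"cpu": 85.0, "memory": 90.0, "disk": 90.0}  # illustrative limits

    def check_usage():
        alerts = []
        usage = {
            "cpu": psutil.cpu_percent(interval=1),
            "memory": psutil.virtual_memory().percent,
            "disk": psutil.disk_usage("/").percent,
        }
        for name, value in usage.items():
            if value >= THRESHOLDS[name]:
                alerts.append(f"{name} usage at {value:.1f}%")
        return alerts

    if __name__ == "__main__":
        for alert in check_usage():
            print(alert)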
Skills/Requirements
- Bachelor’s Degree or Diploma in Computer Science, Engineering, Software Engineering or a relevant field.
- Experience with Linux-based infrastructures, Linux/Unix administration.
- Knowledge of managing databases such as MySQL and MS SQL.
- Knowledge of scripting languages such as Python, Bash.
- Knowledge of open-source technologies and cloud services like AWS and Azure is a plus. Candidates willing to learn will be preferred.
- Experience in managing web applications.
- Problem-solving attitude.
- 5+ years experience in the IT industry.
As part of the engineering team, you would be expected to have deep technology expertise with a passion for building highly scalable products. This is a unique opportunity where you can impact the lives of people across 150+ countries!
Responsibilities
• Collaborate in large-scale systems design discussions.
• Deploy and maintain in-house/customer systems, ensuring high availability, performance, and optimal cost.
• Automate build pipelines and ensure the right architecture for CI/CD
• Work with engineering leaders to ensure cloud security
• Develop standard operating procedures for various facets of infrastructure services (CI/CD, Git branching, SAST, quality gates, auto scaling)
• Perform and automate regular backups of servers and databases. Ensure rollback and restore capabilities are real-time and zero-downtime (a minimal backup sketch follows this list).
• Lead the entire DevOps charter for ONE Championship. Mentor other DevOps engineers. Ensure industry standards are followed.
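To make the backup-automation bullet concrete, here is a minimal, hypothetical sketch of a timestamped PostgreSQL dump; the database name and path are assumptions, and a real setup would add retention, encryption, and restore testing.

    # Illustrative sketch only: timestamped PostgreSQL backup via pg_dump.
    # Assumes pg_dump is on PATH and connection settings come from the environment.
    import datetime
    import subprocess
    from pathlib import Path

    def backup_database(db_name, backup_dir="/var/backups/postgres"):
        Path(backup_dir).mkdir(parents=True, exist_ok=True)
        stamp = datetime.datetime.utcnow().strftime("%Y%m%dT%H%M%SZ")
        target = Path(backup_dir) / f"{db_name}-{stamp}.dump"
        subprocess.run(
            ["pg_dump", "--format=custom", f"--file={target}", db_name],
            check=True,
        )
        return target

    if __name__ == "__main__":
        print(backup_database("app_db"))  # "app_db" is a placeholder database name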
Requirements
• Overall 5+ years of experience as a DevOps Engineer/Site Reliability Engineer
• B.E/B.Tech in CS or an equivalent stream from an institute of repute
• Experience in Azure is a must. AWS experience is a plus
• Experience in Kubernetes, Docker, and containers
• Proficiency in developing and deploying fully automated environments using Puppet/Ansible and Terraform
• Experience with monitoring tools like Nagios/Icinga, Prometheus, AlertManager, and New Relic
• Good knowledge of source code control (git)
• Expertise in Continuous Integration and Continuous Deployment setup using Azure Pipelines or Jenkins
• Strong experience in programming languages. Python is preferred
• Experience in scripting and unit testing
• Basic knowledge of SQL & NoSQL databases
• Strong Linux fundamentals
• Experience in SonarQube, Locust & Browserstack is a plus
Responsibilities
- Building and maintenance of resilient and scalable production infrastructure
- Improvement of monitoring systems
- Creation and support of development automation processes (CI / CD)
- Participation in infrastructure development
- Detection of problems in the architecture and proposal of solutions to them
- Creation of tasks for system improvements in scalability, performance, and monitoring
- Analysis of product requirements from a DevOps perspective
- Managing a team of DevOps engineers and controlling task delivery
- Incident analysis and fixing
Technology stack
Linux, Bash, Salt/Ansible, LXC, libvirt, IPsec, VXLAN, Open vSwitch, OpenVPN, OSPF, BIRD, Cisco NX-OS, Multicast, PIM, LVM, software RAID, LUKS, PostgreSQL, nginx, haproxy, Prometheus, Grafana, Zabbix, GitLab, Capistrano
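Prometheus and Grafana appear in the stack above; as a minimal, hypothetical illustration of the monitoring-improvement work, the sketch below exposes one custom metric for Prometheus to scrape (the metric name, port, and value source are placeholders).

    # Illustrative sketch only: expose a custom gauge for Prometheus to scrape.
    # Assumes the third-party prometheus_client package is installed.
    import random
    import time

    from prometheus_client import Gauge, start_http_server

    queue_depth = Gauge("app_queue_depth", "Number of jobs waiting in the queue")

    if __name__ == "__main__":
        start_http_server(8000)  # Prometheus scrapes http://host:8000/metrics
        while True:
            queue_depth.set(random.randint(0, 50))  # stand-in for a real measurement
            time.sleep(15)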
Skills and Experience
- Understanding of distributed systems principles
- Understanding of principles for building resilient network infrastructure
- Experience with Ubuntu Linux administration (other Debian-like distributions are a plus)
- Strong knowledge of Bash
- Experience working with LXC containers
- Understanding of and experience with the infrastructure-as-code approach
- Experience developing idempotent Ansible roles
- Experience with relational databases (PostgreSQL), ability to create simple SQL queries
- Experience with git
- Experience with monitoring and metrics collection systems (Prometheus, Grafana, Zabbix)
- Understanding of dynamic routing (OSPF)
Preferred experience
- Experience working with high-load, zero-downtime environments
- Experience coding in Python
- Experience working with IPsec, VXLAN, and Open vSwitch
- Knowledge of and experience with Cisco network equipment
- Experience working with Cisco NX-OS
- Knowledge of the principles of multicast protocols (IGMP, PIM)
- Experience setting up multicast on Cisco equipment
- Experience working with Solarflare Onload
- Experience administering Atlassian products
Must Have Skills
- AWS Solutions Architect and/or DevOps certification, Professional preferred
- BS level technical degree or equivalent experience; Computer Science or Engineering background preferred
- Hands-on technical expertise with Amazon Web Services (AWS), including but not limited to EC2, VPC, IAM, security groups, ELB/NLB/ALB, Internet Gateway, S3, EBS, and EFS (a small snapshot-automation sketch follows this list)
- Experience in migrating and deploying applications to the cloud, re-engineering applications for the cloud, and setting up OS and application environments in a virtualized cloud
- DevOps automation, CI/CD, infrastructure/services provisioning, application deployment and configuration
- DevOps toolsets to include Ansible, Jenkins and XLdeploy, XLrelease
- Deployment and configuration of Java/WildFly, Spring Boot, JavaScript/Node.js, and Ruby applications and middleware
- Shell (Bash) scripting and extensive Python scripting
- Agile software development
- Excellent written and verbal communication, presentation, and collaboration skills
- Team leadership skills
- Linux (RHEL) administration/engineering
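As a hedged illustration of the EC2/EBS automation expertise listed above (not a prescribed procedure), the sketch below snapshots every EBS volume carrying a particular tag; the tag key and value are assumptions.

    # Illustrative sketch only: snapshot all EBS volumes tagged Backup=daily.
    # Assumes AWS credentials are configured; the tag key/value are placeholders.
    import boto3

    ec2 = boto3.client("ec2")

    volumes = ec2.describe_volumes(
        Filters=[{"Name": "tag:Backup", "Values": ["daily"]}]
    )["Volumes"]

    for volume in volumes:
        snapshot = ec2.create_snapshot(
            VolumeId=volume["VolumeId"],
            Description="Automated daily snapshot",
        )
        print(volume["VolumeId"], "->", snapshot["SnapshotId"])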
Experience:
- Deploying, configuring, and supporting large-scale monolithic and microservices-based SaaS applications
- Working as both an infrastructure and application migration specialist
- Identifying and documenting application requirements for network, F5, IAM, and security groups
- Implementing DevOps practices such as infrastructure as code, continuous integration, and automated deployment
- Working with technology leadership to understand business goals and requirements
- Experience with continuous integration tools
- Experience with configuration management platforms
- Writing and diagnosing issues with complex shell scripts
- Strong practical application development experience on Linux and Windows-based systems
Job Location: Jaipur
Experience Required: Minimum 3 years
About the role:
As a DevOps Engineer for Punchh, you will be working with our developers, SRE, and DevOps teams implementing our next-generation infrastructure. We are looking for a self-motivated, responsible team player who loves designing systems that scale. Punchh provides a rich engineering environment where you can be creative, learn new technologies, and solve engineering problems, all while delivering business objectives. The DevOps culture here is one of immense trust and responsibility. You will be given the opportunity to make an impact, as there are no silos here.
Responsibilities:
- Deliver SLA and business objectives through whole-lifecycle design of services, from inception to implementation.
- Ensuring availability, performance, security, and scalability of AWS production systems
- Scale our systems and services through continuous integration, infrastructure as code, and gradual refactoring in an agile environment.
- Maintain services once a project is live by monitoring and measuring availability, latency, and overall system and application health (a minimal health-check sketch follows this list).
- Write and maintain software that runs the infrastructure that powers the Loyalty and Data platform for some of the world’s largest brands.
- Participate in 24x7 on-call shifts for Level 2 and higher escalations
- Respond to incidents and write blameless RCAs/postmortems
- Implement and practice proper security controls and processes
- Providing recommendations for architecture and process improvements.
- Definition and deployment of systems for metrics, logging, and monitoring on the platform.
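As a minimal, hypothetical sketch of the availability/latency monitoring mentioned above (the URL and timeout are placeholders, and a production setup would feed these numbers into a metrics system rather than print them):

    # Illustrative sketch only: measure availability and latency of one HTTP endpoint.
    # Assumes the third-party requests package is installed.
    import requests

    def check_endpoint(url, timeout=5.0):
        try:
            response = requests.get(url, timeout=timeout)
            return {
                "up": response.status_code < 500,
                "status": response.status_code,
                "latency_ms": response.elapsed.total_seconds() * 1000,
            }
        except requests.RequestException as exc:
            return {"up": False, "error": str(exc)}

    if __name__ == "__main__":
        print(check_endpoint("https://example.com/health"))  # placeholder URL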
Must have:
- Minimum 3 Years of Experience in DevOps.
- BS degree in Computer Science, Mathematics, Engineering, or equivalent practical experience.
- Strong interpersonal skills.
- Must have experience in CI/CD tooling such as Jenkins, CircleCI, TravisCI
- Must have experience in Docker, Kubernetes, Amazon ECS or Mesos
- Experience in code development in at least one high-level programming language from this list: Python, Ruby, Golang, Groovy
- Proficient in shell scripting, and most importantly, know when to stop scripting and start developing.
- Experience in the creation of highly automated infrastructure with configuration management tools like Terraform, CloudFormation, or Ansible.
- In-depth knowledge of the Linux operating system and administration.
- Production experience with a major cloud provider such as Amazon AWS.
- Knowledge of web server technologies such as Nginx or Apache.
- Knowledge of Redis, Memcache, or one of the many in-memory data stores.
- Experience with various load balancing technologies such as Amazon ALB/ELB, HA Proxy, F5.
- Comfortable with large-scale, highly-available distributed systems.
Good to have:
- Understanding of Web Standards (REST, SOAP APIs, OWASP, HTTP, TLS)
- Production experience with Hashicorp products such as Vault or Consul
- Expertise in designing, analyzing, and troubleshooting large-scale distributed systems.
- Experience in a PCI environment
- Experience with Big Data distributions from Cloudera, MapR, or Hortonworks
- Experience maintaining and scaling database applications
- Knowledge of fundamental systems engineering principles such as CAP Theorem, Concurrency Control, etc.
- Understanding of network fundamentals: OSI, TCP/IP, topologies, etc.
- Understanding of infrastructure auditing and helping the organization control infrastructure costs.
- Experience in Kafka, RabbitMQ or any messaging bus.
● Develop and deliver automation software required for building & improving the functionality, reliability, availability, and manageability of applications and cloud platforms
● Champion and drive the adoption of Infrastructure as Code (IaC) practices and mindset
● Design, architect, and build self-service, self-healing, synthetic monitoring and alerting platform and tools
● Automate the development and test automation processes through CI/CD pipeline (Git, Jenkins, SonarQube, Artifactory, Docker containers)
● Build a container hosting platform using Kubernetes (a minimal API sketch follows this list)
● Introduce new cloud technologies, tools, and processes to keep innovating in the commerce area and drive greater business value.
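As a minimal, hypothetical building block for the self-healing/monitoring and Kubernetes items above (the phases checked and function name are assumptions):

    # Illustrative sketch only: list pods that are not Running or Succeeded,
    # as an input to a self-healing or alerting loop.
    # Assumes the third-party kubernetes package and a valid kubeconfig.
    from kubernetes import client, config

    def unhealthy_pods():
        config.load_kube_config()
        v1 = client.CoreV1Api()
        bad = []
        for pod in v1.list_pod_for_all_namespaces().items:
            if pod.status.phase not in ("Running", "Succeeded"):
                bad.append(f"{pod.metadata.namespace}/{pod.metadata.name}: {pod.status.phase}")
        return bad

    if __name__ == "__main__":
        for line in unhealthy_pods():
            print(line)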
Skills Required:
● Excellent written and verbal communication skills and a good listener.
● Proficiency in deploying and maintaining Cloud based infrastructure services (AWS, GCP, Azure – good hands-on experience in at least one of them)
● Well versed with service-oriented architecture, cloud-based web services architecture, design patterns and frameworks.
● Good knowledge of cloud-related services like compute, storage, network, messaging (e.g., SNS, SQS), and automation (e.g., CFT/Terraform).
● Experience with relational SQL and NoSQL databases, including Postgres and Cassandra.
● Experience in systems management/automation tools (Puppet/Chef/Ansible, Terraform)
● Strong Linux System Admin Experience with excellent troubleshooting and problem solving skills
● Hands-on experience with languages (Bash/Python/Core Java/Scala)
● Experience with CI/CD pipeline (Jenkins, Git, Maven etc)
● Experience integrating solutions in a multi-region environment
● Self-motivated, learns quickly, and delivers results with minimal supervision
● Experience with Agile/Scrum/DevOps software development methodologies.
Nice to Have:
● Experience in setting up the Elasticsearch, Logstash, Kibana (ELK) stack.
● Experience working with large-scale data.
● Experience with monitoring tools such as Splunk, Nagios, Grafana, DataDog, etc.
● Previous experience working with distributed architectures like Hadoop, MapReduce, etc.