Responsibilities
- Building and maintenance of resilient and scalable production infrastructure
Improvement of monitoring systems
- Improvement of monitoring systems
- Creation and support of development automation processes (CI / CD)
- Participation in infrastructure development
- Detection of problems in architecture and proposing of solutions for solving them
- Creation of tasks for system improvements for system scalability, performance and monitoring
- Analysis of product requirements in the aspect of devops
- Incident analysis and fixing
Skills and Experience
- Understanding of the distributed systems principles
- Understanding of principles for building a resistant network infrastructure
- Experience of Ubuntu Linux administration (Debian-like will be a plus)
- Strong knowledge of Bash
- Experience of working with LXC-containers
- Understanding and experience with infrastructure as a code approach
- Experience of development idempotent Ansible roles
- Experience of working with git
Preferred experience
Experience with relational databases (PostgeSQL), ability to create simple SQL queries
- Experience with monitoring and metric collect systems (Prometheus, Grafana, Zabbix)
- Understanding of dynamic routing (OSPF)
- Knowledge and experience of working with network equipment Cisco
- Experience of working with Cisco NX-OS
- Experience of working with IPsec, VXLAN, Open vSwitch
- Knowledge of principles of multicast protocols IGMP, PIM
- Experience of setting multicast on Cisco equipment
- Experience administering Atlassian products
Similar jobs
You will be responsible for:
- Managing all DevOps and infrastructure for Sizzle
- We have both cloud and on-premise servers
- Work closely with all AI and backend engineers on processing requirements and managing both development and production requirements
- Optimize the pipeline to ensure ultra fast processing
- Work closely with management team on infrastructure upgrades
You should have the following qualities:
- 3+ years of experience in DevOps, and CI/CD
- Deep experience in: Gitlab, Gitops, Ansible, Docker, Grafana, Prometheus
- Strong background in Linux system administration
- Deep expertise with AI/ML pipeline processing, especially with GPU processing. This doesn’t need to include model training, data gathering, etc. We’re looking more for experience on model deployment, and inferencing tasks at scale
- Deep expertise in Python including multiprocessing / multithreaded applications
- Performance profiling including memory, CPU, GPU profiling
- Error handling and building robust scripts that will be expected to run for weeks to months at a time
- Deploying to production servers and monitoring and maintaining the scripts
- DB integration including pymongo and sqlalchemy (we have MongoDB and PostgreSQL databases on our backend)
- Expertise in Docker-based virtualization including - creating & maintaining custom Docker images, deployment of Docker images on cloud and on-premise services, monitoring of production Docker images with robust error handling
- Expertise in AWS infrastructure, networking, availability
Optional but beneficial to have:
- Experience with running Nvidia GPU / CUDA-based tasks
- Experience with image processing in python (e.g. openCV, Pillow, etc)
- Experience with PostgreSQL and MongoDB (Or SQL familiarity)
- Excited about working in a fast-changing startup environment
- Willingness to learn rapidly on the job, try different things, and deliver results
- Bachelors or Masters degree in computer science or related field
- Ideally a gamer or someone interested in watching gaming content online
Skills:
DevOps, Ansible, CI/CD, GitLab, GitOps, Docker, Python, AWS, GCP, Grafana, Prometheus, python, sqlalchemy, Linux / Ubuntu system administration
Seniority: We are looking for a mid to senior level engineer
Salary: Will be commensurate with experience.
Who Should Apply:
If you have the right experience, regardless of your seniority, please apply.
Work Experience: 3 years to 6 years
About Hive
Hive is the leading provider of cloud-based AI solutions for content understanding,
trusted by the world’s largest, fastest growing, and most innovative organizations. The
company empowers developers with a portfolio of best-in-class, pre-trained AI models, serving billions of customer API requests every month. Hive also offers turnkey software applications powered by proprietary AI models and datasets, enabling breakthrough use cases across industries. Together, Hive’s solutions are transforming content moderation, brand protection, sponsorship measurement, context-based ad targeting, and more.
Hive has raised over $120M in capital from leading investors, including General Catalyst, 8VC, Glynn Capital, Bain & Company, Visa Ventures, and others. We have over 250 employees globally in our San Francisco, Seattle, and Delhi offices. Please reach out if you are interested in joining the future of AI!
About Role
Our unique machine learning needs led us to open our own data centers, with an
emphasis on distributed high performance computing integrating GPUs. Even with these data centers, we maintain a hybrid infrastructure with public clouds when the right fit. As we continue to commercialize our machine learning models, we also need to grow our DevOps and Site Reliability team to maintain the reliability of our enterprise SaaS offering for our customers. Our ideal candidate is someone who is
able to thrive in an unstructured environment and takes automation seriously. You believe there is no task that can’t be automated and no server scale too large. You take pride in optimizing performance at scale in every part of the stack and never manually performing the same task twice.
Responsibilities
● Create tools and processes for deploying and managing hardware for Private Cloud Infrastructure.
● Improve workflows of developer, data, and machine learning teams
● Manage integration and deployment tooling
● Create and maintain monitoring and alerting tools and dashboards for various services, and audit infrastructure
● Manage a diverse array of technology platforms, following best practices and
procedures
● Participate in on-call rotation and root cause analysis
Requirements
● Minimum 5 - 10 years of previous experience working directly with Software
Engineering teams as a developer, DevOps Engineer, or Site Reliability
Engineer.
● Experience with infrastructure as a service, distributed systems, and software design at a high-level.
● Comfortable working on Linux infrastructures (Debian) via the CLIAble to learn quickly in a fast-paced environment.
● Able to debug, optimize, and automate routine tasks
● Able to multitask, prioritize, and manage time efficiently independently
● Can communicate effectively across teams and management levels
● Degree in computer science, or similar, is an added plus!
Technology Stack
● Operating Systems - Linux/Debian Family/Ubuntu
● Configuration Management - Chef
● Containerization - Docker
● Container Orchestrators - Mesosphere/Kubernetes
● Scripting Languages - Python/Ruby/Node/Bash
● CI/CD Tools - Jenkins
● Network hardware - Arista/Cisco/Fortinet
● Hardware - HP/SuperMicro
● Storage - Ceph, S3
● Database - Scylla, Postgres, Pivotal GreenPlum
● Message Brokers: RabbitMQ
● Logging/Search - ELK Stack
● AWS: VPC/EC2/IAM/S3
● Networking: TCP / IP, ICMP, SSH, DNS, HTTP, SSL / TLS, Storage systems,
RAID, distributed file systems, NFS / iSCSI / CIFS
Who we are
We are a group of ambitious individuals who are passionate about creating a revolutionary AI company. At Hive, you will have a steep learning curve and an opportunity to contribute to one of the fastest growing AI start-ups in San Francisco. The work you do here will have a noticeable and direct impact on the
development of the company.
Thank you for your interest in Hive and we hope to meet you soon
Job Description
• Minimum 3+ yrs of Experience in DevOps with AWS Platform
• Strong AWS knowledge and experience
• Experience in using CI/CD automation tools (Git, Jenkins, Configuration deployment tools ( Puppet/Chef/Ansible)
• Experience with IAC tools Terraform
• Excellent experience in operating a container orchestration cluster (Kubernetes, Docker)
• Significant experience with Linux operating system environments
• Experience with infrastructure scripting solutions such as Python/Shell scripting
• Must have experience in designing Infrastructure automation framework.
• Good experience in any of the Setting up Monitoring tools and Dashboards ( Grafana/kafka)
• Excellent problem-solving, Log Analysis and troubleshooting skills
• Experience in setting up centralized logging for system (EKS, EC2) and application
• Process-oriented with great documentation skills
• Ability to work effectively within a team and with minimal supervision
Platform Services Engineer
DevSecOps Engineer
- Strong Systems Experience- Linux, networking, cloud, APIs
- Scripting language Programming - Shell, Python
- Strong Debugging Capability
- AWS Platform -IAM, Network,EC2, Lambda, S3, CloudWatch
- Knowledge on Terraform, Packer, Ansible, Jenkins
- Observability - Prometheus, InfluxDB, Dynatrace,
- Grafana, Splunk • DevSecOps-CI/CD - Jenkins
- Microservices
- Security & Access Management
- Container Orchestration a plus - Kubernetes, Docker etc.
- Big Data Platforms knowledge EMR, Databricks. Cloudera a plus
Role Purpose:
As a DevOps , You should be strong in both the Dev and Ops part of DevOps. We are looking for someone who has a deep understanding of systems architecture, understands core CS concepts well, and is able to reason about system behaviour rather than merely working with the toolset of the day. We believe that only such a person will be able to set a compelling direction for the team and excite those around them.
If you are someone who fits the description above, you will find that the rewards are well worth the high bar. Being one of the early hires of the Bangalore office, you will have a significant impact on the culture and the team; you will work with a set of energetic and hungry peers who will challenge you, and you will have considerable international exposure and opportunity for impact across departments.
Responsibilities
- Deployment, management, and administration of web services in a public cloud environment
- Design and develop solutions for deploying highly secure, highly available, performant and scalable services in elastically provisioned environments
- Design and develop continuous integration and continuous deployment solutions from development through production
- Own all operational aspects of running web services including automation, monitoring and alerting, reliability and performance
- Have direct impact on running a business by thinking about innovative solutions to operational problems
- Drive solutions and communication for production impacting incidents
- Running technical projects and being responsible for project-level deliveries
- Partner well with engineering and business teams across continents
Required Qualifications
- Bachelor’s or advanced degree in Computer Science or closely related field
- 4 - 6 years professional experience in DevOps, with at least 1/2 years in Linux / Unix
- Very strong in core CS concepts around operating systems, networks, and systems architecture including web services
- Strong scripting experience in Python and Bash
- Deep experience administering, running and deploying AWS based services
- Solid experience with Terraform, Packer and Docker or their equivalents
- Knowledge of security protocols and certificate infrastructure.
- Strong debugging, troubleshooting, and problem solving skills
- Broad experience with cloud hosted applications including virtualization platforms, relational and non relational data stores, reverse proxies, and orchestration platforms
- Curiosity, continuous learning and drive to continually raise the bar
- Strong partnering and communication skills
Preferred Qualifications
- Past experience as a senior developer or application architect strongly preferred.
- Experience building continuous integration and continuous deployment pipelines
- Experience with Zookeeper, Consul, HAProxy, ELK-Stack, Kafka, PostgreSQL.
- Experience working with, and preferably designing, a system compliant to any security framework (PCI DSS, ISO 27000, HIPPA, SOC 2, ...)
- Experience with AWS orchestration services such as ECS and EKS.
- Experience working with AWS ML pipeline services like AWS Sagemak
Experience of Linux
Experience using Python or Shell scripting (for Automation)
Hands-on experience with Implementation of CI/CD Processes
Experience working with one cloud platforms (AWS or Azure or Google)
Experience working with configuration management tools such as Ansible & Chef
Experience working with Containerization tool Docker.
Experience working with Container Orchestration tool Kubernetes.
Experience in source Control Management including SVN and/or Bitbucket
& GitHub
Experience with setup & management of monitoring tools like Nagios, Sensu & Prometheus or any other popular tools
Hands-on experience in Linux, Scripting Language & AWS is mandatory
Troubleshoot and Triage development, Production issues
Job Description:
- Hands on experience with Ansible & Terraform.
- Scripting language, such as Python or Bash or PowerShell is required and willingness to learn and master others.
- Troubleshooting and resolving automation, build, and CI/CD related issues (in cloud environment like AWS or Azure).
- Experience with Kubernetes is mandate.
- To develop and maintain tooling and environments for test and production environments.
- Assist team members in the development and maintenance of tooling for integration testing, performance testing, security testing, as well as source control systems (that includes working in CI systems like Azure DevOps, Team City, and orchestration tools like Octopus).
- Good with Linux environment.
One of our US based client is looking for a Devops professional who can handle Technical as well as Trainings for them in US.
If you are hired, you will be sent to US for the working from there. Training & Technical work ratio will be 70% & 30% respectively.
Company Will sponsor for US Visa.
If you are an Experienced Devops professional and also given professional trainings then feel free to connect with us for more.
Implement integrations requested by customers
Deploy updates and fixes
Provide Level 2 technical support
Build tools to reduce occurrences of errors and improve customer experience
Develop software to integrate with internal back-end systems
Perform root cause analysis for production errors
Investigate and resolve technical issues
Develop scripts to automate visualization
Design procedures for system troubleshooting and maintenance
Multiple Clouds [AWS/Azure/GCP] hands on experience
Good Experience on Docker implementation at scale.
Kubernets implementation and orchestration.
- Azure Devops - Working experience in Azure yaml pipelines. (Note – Some say they worked in yaml but it’s for Jenkins and not Azure devops)
- Azure – Infrastructure automation using Terraform/ARM templates. (Note – Some say that they worked in terraform but it’s for AWS and not Azure). Please confirm Terraform for Azure infrastructure automation.
- Powershell scripting to automate and deploy .Net applications.
1. Should have worked with AWS, Dockers and Kubernetes.
2. Should have worked with a scripting language.
3. Should know how to monitor system performance, CPU, Memory.
4. Should be able to do troubleshooting.
5. Should have knowledge of automated deployment
6. Proficient in one programming knowledge - python preferred.