We are looking for a DevOps Engineer with 4+ years of experience. He/she should be a self-motivated go-getter and an out-of-the-box thinker, ready to work in a high-energy environment. He/she must demonstrate a high level of ownership, integrity, and leadership skills, and be flexible and adaptive with a strong desire to learn and excel.
Required Skills:
Strong experience working with tools and platforms such as Helm charts, CircleCI, Jenkins, and/or Codefresh
Excellent knowledge of AWS cloud and DevOps offerings
Strong expertise in containerization platforms like Docker and container orchestration platforms like Kubernetes & Rancher
Should be familiar with leading Infrastructure as Code tools such as Terraform, CloudFormation, etc.
Strong experience in Python, Shell Scripting, Ansible, and Terraform
Good command of monitoring tools such as Datadog, Zabbix, ELK, Grafana, CloudWatch, Stackdriver, Prometheus, JFrog, Nagios, etc.
Experience with Linux/Unix systems administration.
As a DevOps Developer, you will contribute to a thriving and growing AI Governance Engineering team. You will work in a Kubernetes-based microservices environment to support our bleeding-edge cloud services. This will include custom solutions as well as open-source DevOps tools (build and deploy automation, monitoring, and data gathering for our software delivery pipeline). You will also contribute to our continuous improvement and continuous delivery efforts while increasing the maturity of our DevOps and agile practices.
Responsibilities:
Ability to deploy software using orchestrators, scripts, and automation on hybrid and public clouds such as AWS
Ability to write shell, Python, or other Unix scripts
Working knowledge of Docker and Kubernetes
Ability to create pipelines using Jenkins or another CI/CD tool, and GitOps tools such as ArgoCD
Working knowledge of Git for source control and of a defect tracking system
Ability to debug and troubleshoot deployment issues
Ability to use tools for faster resolution of issues
Excellent communication and soft skills
Passion for, and the ability to, work and deliver in a multi-team environment
Experience with monitoring tools like Nagios, Prometheus and visualisation tools such as Grafana.
Ability to write Ansible and Terraform scripts
Linux systems experience and administration
Effective cross-functional leadership skills: working with engineering and operational teams to ensure systems are secure, scalable, and reliable.
Ability to review deployment and operational environments and to execute initiatives that reduce failure, troubleshoot issues across the entire infrastructure stack, expand monitoring capabilities, and manage technical operations.
About Hive
Hive is the leading provider of cloud-based AI solutions for content understanding, trusted by the world’s largest, fastest growing, and most innovative organizations. The company empowers developers with a portfolio of best-in-class, pre-trained AI models, serving billions of customer API requests every month. Hive also offers turnkey software applications powered by proprietary AI models and datasets, enabling breakthrough use cases across industries. Together, Hive’s solutions are transforming content moderation, brand protection, sponsorship measurement, context-based ad targeting, and more. Hive has raised over $120M in capital from leading investors, including General Catalyst, 8VC, Glynn Capital, Bain & Company, Visa Ventures, and others. We have over 250 employees globally in our San Francisco, Seattle, and Delhi offices. Please reach out if you are interested in joining the future of AI!
About the Role
Our unique machine learning needs led us to open our own data centers, with an emphasis on distributed high-performance computing integrating GPUs. Even with these data centers, we maintain a hybrid infrastructure, using public clouds where they are the right fit. As we continue to commercialize our machine learning models, we also need to grow our DevOps and Site Reliability team to maintain the reliability of our enterprise SaaS offering for our customers. Our ideal candidate is someone who is able to thrive in an unstructured environment and takes automation seriously. You believe there is no task that can’t be automated and no server scale too large. You take pride in optimizing performance at scale in every part of the stack and never manually performing the same task twice.
Responsibilities
● Create tools and processes for deploying and managing hardware for private cloud infrastructure.
● Improve the workflows of developer, data, and machine learning teams.
● Manage integration and deployment tooling.
● Create and maintain monitoring and alerting tools and dashboards for various services, and audit infrastructure.
● Manage a diverse array of technology platforms, following best practices and procedures.
● Participate in the on-call rotation and root cause analysis.
Requirements
● 5-10 years of experience working directly with software engineering teams as a developer, DevOps Engineer, or Site Reliability Engineer.
● Experience with infrastructure as a service, distributed systems, and high-level software design.
● Comfortable working on Linux infrastructure (Debian) via the CLI.
● Able to learn quickly in a fast-paced environment.
● Able to debug, optimize, and automate routine tasks.
● Able to multitask, prioritize, and manage time efficiently and independently.
● Can communicate effectively across teams and management levels.
● A degree in computer science or a similar field is a plus!
Technology Stack
● Operating Systems - Linux/Debian Family/Ubuntu
● Configuration Management - Chef
● Containerization - Docker
● Container Orchestrators - Mesosphere/Kubernetes
● Scripting Languages - Python/Ruby/Node/Bash
● CI/CD Tools - Jenkins
● Network Hardware - Arista/Cisco/Fortinet
● Hardware - HP/SuperMicro
● Storage - Ceph, S3
● Database - Scylla, Postgres, Pivotal Greenplum
● Message Brokers - RabbitMQ
● Logging/Search - ELK Stack
● AWS - VPC/EC2/IAM/S3
● Networking - TCP/IP, ICMP, SSH, DNS, HTTP, SSL/TLS, storage systems, RAID, distributed file systems, NFS/iSCSI/CIFS
Who we are
We are a group of ambitious individuals who are passionate about creating a revolutionary AI company. At Hive, you will have a steep learning curve and an opportunity to contribute to one of the fastest-growing AI start-ups in San Francisco. The work you do here will have a noticeable and direct impact on the development of the company. Thank you for your interest in Hive, and we hope to meet you soon.
Proficiency in a scripting language such as Python, Bash, or PowerShell is required, along with a willingness to learn and master others.
Troubleshooting and resolving automation, build, and CI/CD issues (in a cloud environment such as AWS or Azure).
Experience with Kubernetes is mandatory.
Develop and maintain tooling for test and production environments.
Assist team members in developing and maintaining tooling for integration testing, performance testing, and security testing, as well as source control systems (including work in CI systems such as Azure DevOps and TeamCity, and orchestration tools such as Octopus).
○ Develop best practices for the team and take responsibility for architecture, solutions, and documentation operations in order to meet the engineering department's quality standards
○ Participate in production outage response, handle complex issues, and work toward resolution
○ Develop custom tools and integrations with existing tools to increase engineering productivity
Required Experience and Expertise
○ Good knowledge of Terraform and experience working on large Terraform codebases.
○ Deep understanding of Terraform best practices and of writing Terraform modules.
○ Hands-on experience with GCP and AWS, and knowledge of AWS services such as VPC and VPC-related services (route tables, VPC endpoints, PrivateLink), EKS, S3, and IAM. A cost-aware mindset toward cloud services.
○ Deep understanding of kernel, networking, and OS fundamentals
Define and document best practices and strategies regarding application deployment and infrastructure maintenance.
Minimize system failures and increase the uptime and availability of the company's various apps.
Understand the current application infrastructure and strive to improve it.
Automate infrastructure and develop tools and processes to improve the customer experience and reduce support time.
Work closely with a team of developers and solution strategists to develop and deploy applications and to troubleshoot deployment and infrastructure issues.
Manage full application stacks from the OS through custom applications using Amazon cloud-based computing environments.
Set up a monitoring stack.
Implement the application’s CI/CD pipeline using the AWS stack. Increasingly automate and improve the testing plans and development workflows and tools.
Work closely with the engineers to design networks, systems, and storage environments that effectively reflect business needs, security requirements, and service level requirements.
Manage a continuous integration/continuous deployment methodology for the server-based technologies.
Proficient in leveraging CI and CD tools to automate testing and deployment. Experience working in an Agile, fast-paced, DevOps environment.
Support internal and external customers on multiple platforms.
First point of contact for handling customer issues, providing guidance and recommendations to increase efficiency and reduce customer incidents.
Learn on the job and explore new technologies with little supervision.
In addition to providing customer support, you will be responsible for helping build the tools and processes necessary for excellent customer outcomes.
Skills:
Experience with the core AWS services, plus the specifics mentioned in this job description.
Experience working with at least one of the following languages: Node.js, Python, PHP, Ruby, Kotlin or Java.
Proficient with Git and Git workflows and hosted enterprise Git solutions like GitHub.
Experience creating CloudFormation templates to create Auto Scaling groups, Route 53 DNS records, back-end databases, Elastic Load Balancers, VPCs, subnets, security groups, CloudWatch, S3, IAM roles, and RDS DB instances, and to provision those instances and configure those resources to work together, reducing manual effort.
Experience in deploying and monitoring microservices on Kubernetes, AWS ECS, and AWS EKS
Security-aware; ensures that all systems comply with security standards.
Good background in Linux/Unix administration.
Experience with building or maintaining cloud-native applications.
Minimum 3-5 years of cloud development experience, preferably AWS
Experience with CI/CD tools like Jenkins preferred.
Good analytical and communication skills
Bachelor’s Degree in Computer Science, Engineering or a related technical discipline
DevOps Engineer responsibilities include deploying product updates, identifying production issues, and implementing integrations that meet customer needs. If you have a solid background working with cloud technologies and setting up efficient deployment processes, and are motivated to work with diverse and talented teams, we’d like to meet you.
Ultimately, you will execute and automate operational processes fast, accurately, and securely.
Skills and Experience
4+ years of experience building infrastructure with cloud providers (AWS, Azure, GCP)
Experience deploying containerized applications built on Node.js/PHP/Python to Kubernetes clusters.
Experience in monitoring production workload with relevant metrics and dashboards.
Experience in writing automation scripts using Shell, Python, Terraform, etc.
Experience in following security practices while setting up the infrastructure.
Self-motivated, able, and willing to help where help is needed
Able to build relationships, be culturally sensitive, align with team goals, and show learning agility
Roles and Responsibilities
Manage various resources across different cloud providers (Azure, AWS, and GCP).
Monitor and optimize infrastructure cost.
Manage various Kubernetes clusters with appropriate monitoring and alerting setup.
Build CI/CD pipelines to orchestrate provisioning and deployment of various services into Kubernetes infrastructure.
Work closely with the development team on upcoming features to determine the correct infrastructure and related tools.
Assist the support team with escalated customer issues.
Develop, improve, and thoroughly document operational practices and procedures.
Responsible for setting up good security practices across various clouds.
YOE: 1-3 years
Skills: Python, Docker or Ansible, AWS
➢ Experience building a multi-region, highly available, auto-scaling infrastructure that optimizes performance and cost. Plan for future infrastructure as well as maintain and optimize existing infrastructure.
➢ Conceptualize, architect, and build automated deployment pipelines in a CI/CD environment such as Jenkins.
➢ Conceptualize, architect, and build a containerized infrastructure using Docker, Mesosphere, or similar SaaS platforms.
➢ Work with developers to institute systems, policies, and workflows that allow for rollback of deployments.
➢ Triage the release of applications to the production environment on a daily basis.
➢ Interface with developers and triage SQL queries that need to be executed in production environments.
➢ Maintain a 24/7 on-call rotation to respond to and support troubleshooting of issues in production.
➢ Assist the developers and on-call engineers of other teams with post-mortems, follow-up, and review of issues affecting production availability.
➢ Establish and enforce systems monitoring tools and standards.
➢ Establish and enforce risk assessment policies and standards.
➢ Establish and enforce escalation policies and standards.