DevOps Engineer

Skills
- Building scalable, highly available infrastructure for data science
- Knowledge of data science project workflows
- Hands-on experience with deployment patterns for online/offline predictions (server/serverless)
- Experience with either Terraform or Kubernetes
- Experience with ML deployment frameworks such as Kubeflow, MLflow, or SageMaker
- Working knowledge of Jenkins or a similar tool

Responsibilities
- Own all of the ML cloud infrastructure (AWS)
- Help build out an entire CI/CD ecosystem with auto-scaling
- Work with a testing engineer to design testing methodologies for ML APIs
- Research and implement new technologies
- Help optimize infrastructure costs
- Share knowledge

Nice to Have
- Developing APIs for machine learning
- Writing Python servers for ML systems using API frameworks
- Understanding of task queue frameworks such as Celery
About MoreYeahs
Company - Apptware Solutions
Location - Baner, Pune
Team Size - 130+
Job Description -
Cloud Engineer with 8+ years of experience
Roles and Responsibilities
● 8+ years of strong experience deploying, managing, and maintaining large systems on-premises or in the cloud
● Experience maintaining and deploying highly-available, fault-tolerant systems at scale
● A drive to automate repetitive tasks (e.g. scripting in Bash, Python, or Ruby)
● Practical experience with Docker containerization and clustering (Kubernetes/ECS)
● Expertise with AWS (e.g. IAM, EC2, VPC, ELB, ALB, Autoscaling, Lambda, VPN)
● Version control system experience (e.g. Git)
● Experience implementing CI/CD (e.g. Jenkins, TravisCI, CodePipeline)
● Operational NoSQL experience (e.g. HA/backups for MongoDB, Redis)
● SQL experience (e.g. MySQL)
● Experience with configuration management tools (e.g. Ansible, Chef)
● Experience with infrastructure-as-code (e.g. Terraform, CloudFormation)
● Bachelor's or master's degree in CS, or equivalent practical experience
● Effective communication skills
● Hands-on experience with other cloud providers such as Microsoft Azure and Google Cloud
● A sense of ownership and ability to operate independently
● Experience with Jira and one or more Agile SDLC methodologies
● Nice to Have:
○ Sensu and Graphite
○ Ruby or Java
○ Python or Groovy
○ Java Performance Analysis
Role: Cloud Engineer
Industry Type: IT-Software, Software Services
Functional Area: IT Software - Application Programming, Maintenance
Employment Type: Full Time, Permanent
Role Category: Programming & Design
Key Responsibilities:
• Collaborate with Data Scientists to test and scale new algorithms through pilots, and later industrialize the solutions at scale across the Group's comprehensive fashion network
• Influence, build, and maintain the large-scale data infrastructure required for the AI projects, and integrate it with external IT infrastructure and services to provide an end-to-end solution
• Leverage an understanding of software architecture and software design patterns to write scalable, maintainable, well-designed and future-proof code
• Design, develop and maintain the framework for the analytical pipeline
• Develop common components to address pain points in machine learning projects, like model lifecycle management, feature store and data quality evaluation
• Provide input and help implement framework and tools to improve data quality
• Work in cross-functional agile teams of highly skilled software/machine learning engineers, data scientists, designers, product managers and others to build the AI ecosystem within the Group
• Deliver on time, demonstrating a strong commitment to the team mission and agreed backlog
Description
DevOps Engineer / SRE
- Understanding of how to maintain existing systems (virtual machines) and the Linux stack
- Experience running, operating, and maintaining Kubernetes pods
- Strong Scripting skills
- Experience in AWS
- Knowledge of configuring and optimizing open-source tools such as Kafka
- Strong automation skills: the ability to identify opportunities to speed up the build and deploy process with strong validation and automation
- Optimizing and standardizing monitoring and alerting
- Experience in Google cloud platform
- Experience/ Knowledge in Python will be an added advantage
- Experience with tools such as Jenkins, Kubernetes, Nagios, and Terraform
Please find the JD below:
- Candidate should have good Platform experience on Azure with Terraform.
- The DevOps engineer needs to help developers create pipelines and K8s deployment manifests.
- Experience migrating data from AWS to Azure is good to have.
- Manage and automate infrastructure using Terraform. Jenkins is our key CI/CD tool and will be used to run the Terraform configurations.
- Provision and manage VMs on Azure.
- Good hands-on experience with cloud networking is required.
- Ability to set up databases on VMs as well as managed databases, and to properly set up cloud-hosted microservices so they can communicate with the database services.
- Kubernetes, storage, Key Vault, networking (load balancing and routing), and VMs are the essential areas of infrastructure expertise.
- The role requires administering Kubernetes clusters end to end (application deployment, managing namespaces, load balancing, policy setup, using blue-green/canary deployment models, etc.).
- Experience with AWS is desirable.
- Python experience is optional; however, PowerShell is mandatory.
- Know-how in using GitHub
- Administration of Azure Kubernetes services
Acceldata is creating the data observability space. We make it possible for data-driven enterprises to effectively monitor, discover, and validate data platforms at petabyte scale. Our customers are Fortune 500 companies, including Asia's largest telecom company, a unicorn fintech startup in India, and many more. We are lean, hungry, customer-obsessed, and growing fast. Our Solutions team values productivity, integrity, and pragmatism. We provide a flexible, remote-friendly work environment.
We are building software that provides insights into companies' data operations and allows them to focus on delivering data reliably, with speed and effectiveness. Join us in building an industry-leading data observability platform that focuses on ensuring data reliability across the full spectrum (compute, data, and pipeline) of a cloud or on-premises data platform.
Position Summary
This role will support customer implementations of a data quality and reliability product. The candidate is expected to install the product in the client environment, manage proofs of concept with prospects, become a product expert, and troubleshoot post-installation and production issues. The role involves significant interaction with the client's data engineering team, so the candidate is expected to have good communication skills.
Required experience
- 6-7 years of experience providing engineering support to data domains, data pipelines, and data engineers
- Experience troubleshooting data issues, analyzing end-to-end data pipelines, and working with users to resolve issues
- Experience setting up enterprise security solutions, including Active Directory, firewalls, SSL certificates, Kerberos KDC servers, etc.
- Basic understanding of SQL
- Experience working with technologies such as S3; Kubernetes experience preferred
- Databricks/Hadoop/Kafka experience preferred but not required
Job Description (8-12 years):
○ Develop best practices for the team and take responsibility for the architecture, solutions, and documentation operations needed to meet the engineering department's quality standards
○ Participate in production outages, handle complex issues, and work towards resolution
○ Develop custom tools and integrations with existing tools to increase engineering productivity
Required Experience and Expertise
○ Deep understanding of Kernel, Networking and OS fundamentals
○ Strong experience in writing helm charts.
○ Deep understanding of K8s.
○ Good knowledge of service meshes.
○ Good understanding of databases.
Notice Period: 30 days max
Requirements
- Design, write and build tools to improve the reliability, latency, availability and scalability of HealthifyMe application.
- Communicate, collaborate and work effectively across distributed teams in a global environment
- Optimize performance and solve issues across the entire stack: hardware, software, application, and network.
- Experience building infrastructure with Terraform/CloudFormation or equivalent.
- Experience with Ansible or equivalent is beneficial.
- Ability to use a wide variety of open-source tools.
- Experience with AWS is a must.
- Minimum 5 years of running services in a large scale environment.
- Expert level understanding of Linux servers, specifically RHEL/CentOS.
- Practical, proven knowledge of shell scripting and at least one higher-level language (e.g. Python, Ruby, Go).
- Experience with source code and binary repositories, build tools, and CI/CD (Git, Artifactory, Jenkins, etc)
- Demonstrable knowledge of TCP/IP, HTTP, web application security, and experience supporting multi-tier web application architectures.
Look forward to
- Working with a world-class team.
- Fun and work in the same place, with an amazing work culture and flexible timings.
- Get ready to transform yourself into a health junkie.
Join HealthifyMe and make history!