MLOps Lead Engineer
at IT solutions specialized in Apps Lifecycle management. (MG1)
- Automate and maintain ML and Data pipelines at scale
- Collaborate with Data Scientists and Data Engineers on feature development teams to containerize and build out deployment pipelines for new modules
- Maintain and expand our on-prem deployments with Spark clusters
- Design, build and optimize application containerization and orchestration with Docker and Kubernetes on AWS or Azure
- 5 years of IT experience in data-driven or AI technology products
- Understanding of ML Model Deployment and Lifecycle
- Extensive experience in Apache Airflow for MLOps workflow automation
- Experience in building and automating data pipelines
- Experience in working on Spark Cluster architecture
- Extensive experience with Unix/Linux environments
- Experience with standard concepts and technologies used in CI/CD build and deployment pipelines with Jenkins
- Strong experience in Python and PySpark and building required automation (using standard technologies such as Docker, Jenkins, and Ansible).
- Experience with Kubernetes or Docker Swarm
- Working technical knowledge of current systems software, protocols, and standards, including firewalls, Active Directory, etc.
- Basic knowledge of Multi-tier architectures: load balancers, caching, web servers, application servers, and databases.
- Experience with various virtualization technologies and multi-tenant, private and hybrid cloud environments.
- Hands-on software and hardware troubleshooting experience.
- Experience documenting and maintaining configuration and process information.
- Basic knowledge of machine learning frameworks: TensorFlow, Caffe/Caffe2, PyTorch
● Explore
○ As a DevOps engineer, you will have multiple ways, tools & technologies to solve
a particular problem. We want you to take things into your own hands and figure
out the best way to solve it.
● PDCT
○ Plan, design, code & write test cases for problems you are solving
● Tuning
○ Help to tune performance and ensure high availability of infrastructure, including
reviewing system and application logs
● Security
○ Work on code-level application security
● Deploy
○ Deploy, manage and operate scalable, highly available, and fault-tolerant
systems in client environments.
Technologies (4 out of 5 are required) :
● Terraform*
● Docker*
● Kubernetes*
● Bash Scripting
● SQL
(* marked are a must)
The challenges are great (as are the rewards). If you are looking to take these DevOps
challenges head on & wish to learn a great deal out of it and contribute to the company along
the way, this is the role for you.
Ready?
If developing an impactful product for an early-stage startup sounds appealing to you, let’s
have a conversation. (Confidential, of course)
What we look for:
As a DevOps Developer, you will contribute to a thriving and growing AIGovernance Engineering team. You will work in a Kubernetes-based microservices environment to support our bleeding-edge cloud services. This will include custom solutions, as well as open source DevOps tools (build and deploy automation, monitoring and data gathering for our software delivery pipeline). You will also be contributing to our continuous improvement and continuous delivery while increasing maturity of DevOps and agile adoption practices.
Responsibilities:
- Ability to deploy software using orchestrators, scripts, and automation on hybrid and public clouds such as AWS
- Ability to write shell, Python, or other Unix scripts
- Working knowledge of Docker & Kubernetes
- Ability to create pipelines using Jenkins or any other CI/CD tool, and a GitOps tool such as ArgoCD
- Working knowledge of Git as a source control system, and of a defect tracking system
- Ability to debug and troubleshoot deployment issues
- Ability to use tools for faster resolution of issues
- Excellent communication and soft skills
- Passionate, with the ability to work and deliver in a multi-team environment
- Good team player
- Flexible and quick learner
- Ability to write Dockerfiles, Kubernetes YAML files, and Helm charts
- Experience with monitoring tools like Nagios, Prometheus and visualisation tools such as Grafana.
- Ability to write Ansible and Terraform scripts
- Linux System experience and Administration
- Effective cross-functional leadership skills: working with engineering and operational teams to ensure systems are secure, scalable, and reliable.
- Ability to review deployment and operational environments, i.e., execute initiatives to reduce failure, troubleshoot issues across the entire infrastructure stack, expand monitoring capabilities, and manage technical operations.
Job Description
BUDGET: 20 LPA (MAX)
What you will do - Key Responsibilities
- The DevOps architect will be responsible for testing, QC, and debugging support for the server-side and Java software and servers behind the products developed or procured by the company. The architect will also debug software integration problems and on-field deployment issues, and suggest improvements, workarounds ("hacks"), and structured solutions.
- Responsible for Scaling the architecture towards 10M+ users.
- Will work closely with other team members including other Web Developers, Software Developers, Application Engineers, product managers to test and deploy existing products for various specialists and personnel using the software.
- Will act in capacity of Team Lead as necessary to coordinate and organize individual effort towards a successful completion / demo of an application.
- Will be solely responsible for the application approval before demo to clients, sponsors and investors.
Essential Requirements
- Should understand the ins and outs of Docker and Kubernetes
- Can architect complex cloud-based solutions using multiple products on either AWS or GCP
- Should have a solid understanding of cryptography and secure communication
- Know your way around Unix systems and can write complex shell scripts comfortably
- Should have a solid understanding of Processes and Thread Scheduling at the OS level
- Skilled with Ruby, Python or similar scripting languages
- Experienced with installing and managing multiple GPUs spread across multiple machines
- Should have at least 5 years managing large server deployments
Category
DevOps Engineer (IT & Networking)
Expertise
- DevOps - 3 Years - Intermediate
- Python - 2 Years
- AWS - 3 Years - Intermediate
- Docker - 3 Years - Intermediate
- Kubernetes - 3 Years - Intermediate
Experience: 8 to 10 years
Notice period: 0 to 20 days
Job Description :
- Provision GCP resources based on the architecture design and features aligned with business objectives
- Monitor resource availability and usage metrics, and provide guidelines for cost and performance optimization
- Assist IT/business users in resolving GCP service-related issues
- Provide guidelines for cluster automation and migration approaches and techniques, including ingesting, storing, processing, analysing, and exploring/visualising data
- Provision GCP resources for data engineering and data science projects
- Assist with automated data ingestion, data migration, and transformation (good to have)
- Assist with deploying and troubleshooting applications in Kubernetes
- Establish connections and credibility in addressing business needs by designing and operating cloud-based data solutions
Key Responsibilities / Tasks :
- Building complex CI/CD pipelines for cloud native PaaS services such as Databases, Messaging, Storage, Compute in Google Cloud Platform
- Building deployment pipelines with GitHub CI (Actions)
- Writing Terraform code to deploy infrastructure as code
- Working with deployment and troubleshooting of Docker, GKE, OpenShift, and Cloud Run
- Working with Cloud Build, Cloud Composer, and Dataflow
- Configuring software to be monitored by AppDynamics
- Configuring Stackdriver logging and monitoring in GCP
- Working with Splunk, Kibana, Prometheus, and Grafana to set up dashboards
Your skills, experience, and qualification :
- Total experience of 5+ years in DevOps. Should have at least 4 years of experience in Google Cloud and GitHub CI.
- Should have strong experience in Microservices/API.
- Should have strong experience in DevOps tools like GitHub CI, TeamCity, Jenkins, and Helm.
- Should know Application deployment and testing strategies in Google cloud platform.
- Defining and setting development, test, release, update, and support processes for DevOps operation
- Strive for continuous improvement and build continuous integration, continuous development, and constant deployment pipeline (CI/CD Pipeline)
- Excellent understanding of Java
- Knowledge of Kafka, ZooKeeper, Hazelcast, and Pub/Sub is nice to have.
- Understanding of cloud networking and security, such as software-defined networking/firewalls, virtual networks, and load balancers.
- Understanding of cloud identity and access
- Understanding of the compute runtime and the differences between native compute, virtual and containers
- Configuring and managing databases such as Oracle, Cloud SQL, and Cloud Spanner.
- Excellent troubleshooting skills
- Working knowledge of various tools, open-source technologies
- Awareness of critical concepts of Agile principles
- Google Professional Cloud DevOps Engineer certification is desirable.
- Experience with Agile/SCRUM environment.
- Familiar with Agile Team management tools (JIRA, Confluence)
- Understand and promote Agile values: FROCC (Focus, Respect, Openness, Commitment, Courage)
- Good communication skills
- Pro-active team player
- Comfortable working in multi-disciplinary, self-organized teams
- Professional knowledge of English
- Differentiators : knowledge/experience about
Requirements
You will make an ideal candidate if you have:
- Experience of building a range of services in a cloud service provider
- Expert understanding of DevOps principles and Infrastructure as Code concepts and techniques
- Strong understanding of CI/CD tools (Jenkins, Ansible, GitHub)
- Managed an infrastructure that involved 50+ hosts/network
- 3+ years of Kubernetes experience & 5+ years of experience in native services such as Compute (virtual machines), Containers (AKS), Databases, DevOps, Identity, Storage & Security
- Experience in engineering solutions on a cloud foundation platform using Infrastructure as Code methods (e.g. Terraform)
- Security and compliance, e.g. IAM and cloud compliance/auditing/monitoring tools
- Customer/stakeholder focus. Ability to build strong relationships with application teams, cross-functional IT, and global/local IT teams
- Good leadership and teamwork skills - works collaboratively in an agile environment
- Operational effectiveness - delivers solutions that align to approved design patterns and security standards
- Excellent skills in at least one of the following: Python, Ruby, Java, JavaScript, Go, Node.JS
- Experienced in full automation and configuration management
- A track record of constantly looking for ways to do things better and an excellent understanding of the mechanisms necessary to successfully implement change
- Set and achieved challenging short, medium and long term goals which exceeded the standards in their field
- Excellent written and spoken communication skills; an ability to communicate with impact, ensuring complex information is articulated in a meaningful way to wide and varied audiences
- Built effective networks across business areas, developing relationships based on mutual trust and encouraging others to do the same
- A successful track record of delivering complex projects and/or programmes, utilizing appropriate techniques and tools to ensure and measure success
- A comprehensive understanding of risk management and proven experience of ensuring own/others' compliance with relevant regulatory processes
Essential Skills :
- Demonstrable cloud service provider experience - infrastructure build and configuration of a variety of services including compute, DevOps, databases, storage & security
- Demonstrable experience of Linux administration and scripting, preferably Red Hat
- Experience of working with Continuous Integration (CI), Continuous Delivery (CD) and continuous testing tools
- Experience working within an Agile environment
- Programming experience in one or more of the following languages: Python, Ruby, Java, JavaScript, Go, Node.JS
- Server administration (either Linux or Windows)
- Automation scripting (using tools such as Terraform, Ansible, etc.)
- Ability to quickly acquire new skills and tools
Required Skills :
- Linux & Windows Server Certification
Experience and Education
• Bachelor’s degree in engineering or equivalent.
Work experience
• 4+ years of infrastructure and operations management experience at a global scale.
• 4+ years of experience in operations management, including monitoring, configuration management, automation, backup, and recovery.
• Broad experience in the data center, networking, storage, server, Linux, and cloud technologies.
• Broad knowledge of release engineering: build, integration, deployment, and provisioning, including familiarity with different upgrade models.
• Demonstrable experience executing, or being involved in, a complete end-to-end project lifecycle.
Skills
• Excellent communication and teamwork skills – both oral and written.
• Skilled at collaborating effectively with both Operations and Engineering teams.
• Process and documentation oriented.
• Attention to detail. Excellent problem-solving skills.
• Ability to simplify complex situations and lead calmly through periods of crisis.
• Experience implementing and optimizing operational processes.
• Ability to lead small teams: provide technical direction, prioritize tasks to achieve goals, identify dependencies, report on progress.
Technical Skills
• Strong fluency in Linux environments is a must.
• Good SQL skills.
• Demonstrable scripting/programming skills (Bash, Python, Ruby, or Go) and the ability to develop custom tool integrations between multiple systems using their published APIs/CLIs.
• L3, load balancer, routing, and VPN configuration.
• Kubernetes configuration and management.
• Expertise using version control systems such as Git.
• Configuration and maintenance of database technologies such as Cassandra, MariaDB, Elastic.
• Design and configuration of open-source monitoring systems such as Nagios, Grafana, or Prometheus.
• Design and configuration of log pipeline technologies such as ELK (Elasticsearch, Logstash, Kibana), Fluentd, Grok, rsyslog, Google Stackdriver.
• Using and writing modules for Infrastructure as Code tools such as Ansible, Terraform, Helm, Kustomize.
• Strong understanding of virtualization and containerization technologies such as VMware, Docker, and Kubernetes.
• Specific experience with Google Cloud Platform or Amazon EC2 deployments and virtual machines.
- Understanding customer requirements and project KPIs
- Implementing various development, testing, automation tools, and IT infrastructure
- Planning the team structure, activities, and involvement in project management activities.
- Managing stakeholders and external interfaces
- Setting up tools and required infrastructure
- Defining and setting development, test, release, update, and support processes for DevOps operation
- Have the technical skill to review, verify, and validate the software code developed in the project.
- Troubleshooting and fixing code bugs
- Monitoring the processes during the entire lifecycle for adherence, and updating or creating new processes to improve them and minimize waste
- Encouraging and building automated processes wherever possible
- Identifying and deploying cybersecurity measures by continuously performing vulnerability assessment and risk management
- Incident management and root cause analysis
- Coordination and communication within the team and with customers
- Selecting and deploying appropriate CI/CD tools
- Strive for continuous improvement and build continuous integration, continuous development, and constant deployment pipeline (CI/CD Pipeline)
- Mentoring and guiding the team members
- Monitoring and measuring customer experience and KPIs
- Managing periodic reporting on the progress to the management and the customer
• At least 4 years of hands-on experience with cloud infrastructure on GCP
• Hands-on experience with Kubernetes is mandatory
• Exposure to configuration management and orchestration tools at scale (e.g. Terraform, Ansible, Packer)
• Knowledge of and hands-on experience with DevOps tools (e.g. Jenkins, Groovy, and Gradle)
• Knowledge of and hands-on experience with various platforms (e.g. GitLab, CircleCI, and Spinnaker)
• Familiarity with monitoring and alerting tools (e.g. CloudWatch, ELK stack, Prometheus)
• Proven ability to work independently or as an integral member of a team
Preferable Skills:
• Familiarity with standard IT security practices such as encryption, credentials and key management.
• Proven experience with various coding languages (Java, Python) to support DevOps operation and cloud transformation
• Familiarity and knowledge of the web standards (e.g. REST APIs, web security mechanisms)
• Hands on experience with GCP
• Experience in performance tuning, services outage management and troubleshooting.
Attributes:
• Good verbal and written communication skills
• Exceptional leadership, time management, and organizational skills
• Ability to operate independently and make decisions with little direct supervision
Searce is a niche Cloud Consulting business with futuristic tech DNA. We do new-age tech
to realise the “Next” in the “Now” for our Clients. We specialise in Cloud Data Engineering,
AI/Machine Learning and Advanced Cloud infra tech such as Anthos and Kubernetes. We
are one of the top & the fastest growing partners for Google Cloud and AWS globally with
over 2,500 clients successfully moved to cloud.
What we believe?
1. Best practices are overrated
○ Implementing best practices can only make one ‘average’.
2. Honesty and Transparency
○ We believe in naked truth. We do what we tell and tell what we do.
3. Client Partnership
○ Client - Vendor relationship: No. We partner with clients instead.
○ And our sales team comprises 100% of our clients.
How we work?
It’s all about being Happier first. And rest follows. Searce work culture is defined by
HAPPIER.
1. Humble: Happy people don’t carry ego around. We listen to understand; not to
respond.
2. Adaptable: We are comfortable with uncertainty. And we accept changes well. As
that’s what life's about.
3. Positive: We are super positive about work & life in general. We love to forget and
forgive. We don’t hold grudges. We don’t have time or adequate space for it.
4. Passionate: We are as passionate about the great street-food vendor across the
street as about Tesla’s new model and so on. Passion is what drives us to work and
makes us deliver the quality we deliver.
5. Innovative: Innovate or Die. We love to challenge the status quo.
6. Experimental: We encourage curiosity & making mistakes.
7. Responsible: Driven. Self motivated. Self governing teams. We own it.
Are you the one? Quick self-discovery test:
1. Love for cloud: When was the last time your dinner entailed an act on “How would
‘Jerry Seinfeld’ pitch Cloud platform & products to this prospect” and your friend did
the ‘Sheldon’ version of the same thing.
2. Passion for sales: When was the last time you stopped at a remote gas station while on
vacation, and ended up helping the gas station owner saasify his 7 gas stations
across other geographies.
3. Compassion for customers: You listen more than you speak. When you do speak,
people feel the need to listen.
4. Humor for life: When was the last time you told a concerned CEO, ‘If Elon Musk can
attempt to take humanity to Mars, why can’t we take your business to run on cloud?’
Your bucket of undertakings:
This position will be responsible to consult with clients and propose architectural solutions
to help move & improve infra from on-premise to cloud or help optimize cloud spend from
one public cloud to the other.
1. Be the first one to experiment on new age cloud offerings, help define the best
practice as a thought leader for cloud, automation & DevOps, be a solution
visionary and technology expert across multiple channels.
2. Continually augment skills and learn new tech as the technology and client needs
evolve
3. Use your experience in Google Cloud Platform, AWS or Microsoft Azure to build
hybrid-cloud solutions for customers.
4. Provide leadership to project teams, and facilitate the definition of project
deliverables around core Cloud based technology and methods.
5. Define tracking mechanisms and ensure IT standards and methodology are met;
deliver quality results.
6. Participate in technical reviews of requirements, designs, code and other artifacts
7. Identify and keep abreast of new technical concepts in Google Cloud Platform
8. Security, Risk and Compliance - Advise customers on best practices around access
management, network setup, regulatory compliance and related areas.
Accomplishment Set
● Passionate, persuasive, articulate Cloud professional capable of quickly establishing
interest and credibility
● Good business judgment, a comfortable, open communication style, and a
willingness and ability to work with customers and teams.
● Strong service attitude and a commitment to quality.
● Highly organised and efficient.
● Confident working with others to inspire a high-quality standard.
Education, Experience, etc.
1. Is Education overrated? Yes. We believe so. However there is no way to locate you
otherwise. So unfortunately we might have to look for a Bachelor's or Master's
degree in engineering from a reputed institute or you should have been programming since
age 12. And the latter is better. We will find you faster if you specify the latter in some
manner. Not just degrees; we are not too thrilled by tech certifications either ... :)
2. To reiterate: Passion to tech-awesome, insatiable desire to learn the latest of the
new-age cloud tech, highly analytical aptitude and a strong ‘desire to deliver’ outlives
those fancy degrees!
3. 1 - 5 years of experience with at least 2 - 3 years of hands-on experience in Cloud
Computing (AWS/GCP/Azure) and IT operational experience in a global enterprise
environment.
4. Good analytical, communication, problem solving, and learning skills.
5. Knowledge on programming against cloud platforms such as Google Cloud Platf
Requirements
- Design, write and build tools to improve the reliability, latency, availability and scalability of the HealthifyMe application.
- Communicate, collaborate and work effectively across distributed teams in a global environment
- Optimize performance and solve issues across the entire stack: hardware, software, application, and network.
- Experienced in building infrastructure with Terraform, CloudFormation, or equivalent.
- Experience with Ansible or equivalent is beneficial
- Ability to use a wide variety of Open Source Tools
- Experience with AWS is a must.
- Minimum 5 years of running services in a large scale environment.
- Expert level understanding of Linux servers, specifically RHEL/CentOS.
- Practical, proven knowledge of shell scripting and at least one higher-level language (e.g. Python, Ruby, Go).
- Experience with source code and binary repositories, build tools, and CI/CD (Git, Artifactory, Jenkins, etc)
- Demonstrable knowledge of TCP/IP, HTTP, web application security, and experience supporting multi-tier web application architectures.
Look forward to
- Working with a world-class team.
- Fun & work at the same place with an amazing work culture and flexible timings.
- Get ready to transform yourself into a health junkie
Join HealthifyMe and make history!