
Job Title:
AI Native Operations Expert – Director / AVP / VP
Company: EOSGlobe
CTC: ₹24 – ₹36 LPA
Open Positions: 3
Experience: 12 – 18 Years
Joining: Immediate Joiners Preferred
Role Overview
EOSGlobe is transforming into an AI-First organization and is looking for an AI Native Operations Expert to lead this transformation. The role focuses on driving automation, process re-engineering, and AI adoption across BPM operations to improve efficiency, scalability, and business impact.
Key Responsibilities
Lead AI-driven transformation initiatives across BPM operations.
Re-engineer processes using Artificial Intelligence, Machine Learning, and automation tools.
Collaborate with leadership and strategy teams to implement AI-first operational models.
Define and track KPIs, productivity metrics, and financial impact of transformation initiatives.
Partner with internal teams and clients to demonstrate AI-driven efficiency and revenue growth.
Identify opportunities for process automation and digital adoption across operations.
Required Skills
Strong expertise in Artificial Intelligence (AI), Machine Learning (ML), and RPA.
Experience in process transformation and digital automation initiatives.
Deep understanding of BPM operations and service delivery models.
Strong leadership and stakeholder management skills.
Analytical mindset with ability to measure financial impact and operational KPIs.
Preferred Qualifications
Experience leading large-scale automation or AI transformation projects.
Exposure to BPM, consulting, or operations leadership roles.
Excellent communication and strategic thinking skills.

About Redfoxa Careerlink Pvt Ltd
About
Company social profiles
Similar jobs
Amura’s Vision
We believe that the most under-appreciated route to releasing untapped human potential is to build a healthier body, and through which a better brain. This allows us to do more of everything that is important to each one of us.
Billions of healthier brains, sitting in healthier bodies, can take up more complex problems that defy solutions today, including many existential threats, and solve them in just a few decades.
Billions of healthier brains will make the world richer beyond what we can imagine today. The surplus wealth, combined with better human capabilities, will lead us to a new renaissance, giving us a richer and more beautiful culture.
These healthier brains will be equipped with deeper intellect, be less acrimonious, more magnanimous, and have a kinder outlook on the world, resulting in a world that is better than any previous time.
We find this vision of the future exhilarating. Our hopes and dreams are to create this future as quickly as possible and ensure that it is widely distributed and optimized to maximize all forms of human excellence.
Role Overview
We are looking for a highly skilled Senior DevOps Engineer (AI-Native Infrastructure & Platform Engineering) with deep expertise in AWS cloud infrastructure, automation, AI infrastructure operations, and modern DevOps/SRE practices.
This role goes beyond traditional DevOps and requires a seasoned specialist capable of building and operating AI-ready infrastructure platforms that support high-throughput APIs, LLM/AI workloads, GPU-based compute, data-intensive systems, real-time inference pipelines, and scalable ML platforms.
You will be responsible for architecting, automating, securing, and optimizing highly scalable and cost-efficient cloud environments that enable high-velocity engineering and AI teams. This is an ideal position for someone who combines technical ownership, an automation-first mindset, and a passion for developer productivity and platform reliability.
Key Responsibilities
Cloud Infrastructure & Platform Engineering (AWS)
- Architect, deploy, and manage highly scalable and secure infrastructure on AWS. Design cloud platforms supporting AI/ML workloads, data pipelines, real-time APIs, and high-concurrency backend systems.
- Hands-on expertise with key AWS services including EC2, ECS/EKS, Lambda, RDS, DynamoDB, S3, VPC, CloudFront, IAM, CloudWatch, and GPU-enabled instances.
- Build and maintain Infrastructure-as-Code (IaC) using Terraform, CloudFormation, or AWS CDK.
- Design multi-AZ and multi-region architectures for high availability and disaster recovery (HA/DR).
- Build reusable platform templates and shared infrastructure modules.
AI/ML Infrastructure & MLOps
- Build and maintain infrastructure for LLM applications, AI inference workloads, model serving platforms, vector databases, and feature stores.
- Support GPU-based workloads and optimize compute/storage usage.
- Enable scalable deployment patterns for AI applications using Kubernetes/EKS. Collaborate with Data Science and ML Engineering teams on model deployment, training/tuning of models, CI/CD for ML systems, experiment environments, and reproducibility.
- Support orchestration and deployment of AI workflows and inference services while implementing observability and reliability for AI pipelines.
CI/CD, Automation & Developer Productivity
- Build and maintain CI/CD pipelines using GitHub Actions, GitLab CI, Jenkins, or AWS CodePipeline.
- Automate deployments, environment provisioning, and release workflows.
- Build self-service developer platforms, preview environments, and reusable deployment workflows to improve developer productivity.
- Implement automated patching, scaling, backups, cleanup workflows, and drift detection.
Containers, Kubernetes & Platform Reliability
- Manage Docker-based environments, containerized applications, and optimize workloads using Kubernetes (EKS) or ECS/Fargate.
- Manage autoscaling, cluster health, node pools, ingress, service mesh, and workload isolation.
- Optimize infrastructure for performance, resilience, and cost-efficiency.
- Implement progressive deployment strategies including blue/green, canary, and rolling deployments.
Observability, Incident Response & SRE Practices
- Implement observability stacks using CloudWatch, Prometheus, Grafana, ELK, Datadog, OpenTelemetry, or New Relic.
- Build actionable dashboards and intelligent alerting systems while defining and tracking SLIs, SLOs, and SLAs.
- Lead incident response, root cause analysis, and blameless postmortems to reduce operational toil and improve MTTR.
FinOps, Cost Governance & Security
- Continuously monitor and optimize cloud costs (compute utilization, storage lifecycle, GPU usage, and data transfer) using AWS Cost Explorer, Budgets, Trusted Advisor, CloudHealth, or Kubecost.
- Implement AWS security best practices for IAM, VPCs, security groups, NACLs, encryption, and manage secrets using KMS, SSM Parameter Store, or Vault.
- Build secure CI/CD pipelines with automated security checks, least-privilege access, audit logging, and ensure compliance readiness for ISO 27001, SOC2, and GDPR.
Collaboration, Leadership & Platform Culture
- Work closely with engineering, AI/ML, QA, product, and operations teams to drive a DevOps, SRE, GitOps, and automation-first culture.
- Mentor junior DevOps and Platform Engineers while creating and maintaining detailed runbooks, architecture diagrams, and platform documentation.
Skills & Qualifications
Must-Have:
- 7+ years of experience in DevOps, SRE, Platform Engineering, or Cloud Infrastructure Engineering.
- Strong expertise in AWS cloud architecture, services, and deep understanding of Kubernetes (EKS), containers, and cloud-native systems.
- Strong Infrastructure-as-Code expertise using Terraform, CloudFormation, or CDK. Strong Linux administration, networking, DNS, routing, and load balancing knowledge. Strong scripting/programming experience in Python, Bash, or Go (preferred). Experience with CI/CD automation, GitOps workflows, and observability platforms supporting scalable production systems.
Preferred / Nice-to-Have:
- Experience with AI/ML infrastructure, MLOps, model serving, vector databases, GPU orchestration, and inference optimization.
- Familiarity with Kafka, Redis, SQS, and event-driven systems.
- Exposure to platform engineering, internal developer platforms, and tools like ArgoCD, Flux, Helm, and OpenTelemetry.
- AWS Certifications: Solutions Architect, DevOps Engineer, or SysOps Administrator. Knowledge of distributed systems and large-scale platform operations.
Preferred / Nice-to-Have:
- Experience with AI/ML infrastructure, MLOps, model serving, vector databases, GPU orchestration, and inference optimization.
- Familiarity with Kafka, Redis, SQS, and event-driven systems.
- Exposure to platform engineering, internal developer platforms, and tools like ArgoCD, Flux, Helm, and OpenTelemetry.
- AWS Certifications: Solutions Architect, DevOps Engineer, or SysOps Administrator. Knowledge of distributed systems and large-scale platform operations.
Here are answers to some questions you may have
Where is your office?
Chennai (Velachery)
Work Model
Work from Office – because great stories are built in person!
Do you have an online presence?
https://amura.ai (we are @AmuraHealth on all social media)
We are looking for a highly skilled and experienced Senior AIOps / MLOps Engineer with strong expertise in Azure Cloud, automation, platform engineering, CI/CD, observability, and enterprise-scale cloud operations.
The ideal candidate should have hands-on experience in designing, implementing, and managing modern cloud-native platforms with focus on AI/ML operationalization, DevOps automation, monitoring, reliability, and infrastructure modernization.
Required Experience
- 6 – 10 Years of overall IT experience
- Strong experience in AIOps / MLOps / DevOps engineering
- Hands-on enterprise experience in Azure Cloud platform engineering
Key Responsibilities
AIOps / MLOps
- Design and implement scalable enterprise-grade AIOps and MLOps platforms across cloud environments.
- Ensure AI platform reliability, governance, security, and model performance optimization.
- Implement LLM/AI model versioning, experiment tracking, drift detection, observability, and operational health monitoring frameworks.
- Collaborate with Data Science, DevOps, Cloud, and Application teams to accelerate AI/ML adoption and platform modernization.
- Develop automation frameworks for AI/ML pipelines, infrastructure provisioning, and operational workflows.
- Lead continuous improvement, automation, and standardization efforts across AI/ML operational ecosystems
- Mentor engineering teams and promote AIOps/MLOps best practices, innovation, and engineering excellence
- Strong Knowledge on embeddings, tokenization, vector databases, and AI/ML model training concepts
Preferred Skills
- Python, MLflow, Model Registry, Experiment Tracking
- Azure DevOps & Azure Cloud
- Azure Machine Learning
- LLMOps / Generative AI operationalization
- AI model deployment and lifecycle management
- AI Gateway and Model Serving architectures
- Azure OpenAI & Azure AI Foundry
- MCP Server implementation and configuration
- CI/CD Automation & AKS
Soft Skills
- Strong communication and stakeholder management
- Good troubleshooting and problem-solving skills
- Ability to work independently and drive ownership
- Strong collaboration and documentation skills
The DevOps Engineer will play a critical role in operationalizing artificial intelligence across Bell Techlogix client environments. This role focuses on building and supporting cloud infrastructure, CI/CD pipelines, and automation frameworks that power AI and machine learning workloads. The ideal candidate has experience supporting AI platforms such as Azure AI, Azure Machine Learning, Azure OpenAI, and ServiceNow or conversational AI platforms, and understands the operational requirements of production AI systems, including reliability, scalability, and security.
Key Responsibilities
•Design, build, and operate cloud infrastructure and platform services that support AI and machine learning workloads in production, SLA-driven managed services environments
•Implement CI/CD and MLOps pipelines to enable automated training, testing, deployment, and rollback of AI and ML models
•Develop and maintain Infrastructure as Code to provision AI-ready environments consistently across dev/test/prod
•Support AI platform operations including monitoring model health, pipeline execution, compute utilization, and data dependencies
•Partner with Machine Learning Engineers and Data Engineers to standardize deployment patterns for AI services and LLM-based solutions
•Enable secure and scalable AI integrations using APIs, messaging, and event-driven architectures
•Implement observability solutions for AI platforms, including logging, metrics, alerting, and drift detection integrations
•Troubleshoot AI platform incidents, perform root cause analysis, and implement remediation to improve reliability and automation coverage
•Apply security best practices for AI environments including secrets management, identity and access controls, network isolation, and policy enforcement
•Support AI-driven automation use cases across platforms such as Microsoft Copilot, ServiceNow, and conversational AI tools
•Collaborate with service desk, security, and architecture teams to continuously improve AI service delivery and operational maturity
Required Qualifications
•Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience
•5+ years of experience in DevOps, cloud engineering, or platform operations, with exposure to AI or data workloads
•Hands-on experience with Microsoft Azure, including compute, networking, storage, and monitoring services
•Experience building CI/CD pipelines using Azure DevOps, GitHub Actions, or similar tools
•Working knowledge of Infrastructure as Code (Terraform and/or Bicep/ARM)
•Scripting experience using PowerShell and/or Python
•Experience supporting production platforms with incident management, change control, and root cause analysis
•Understanding of cloud security fundamentals and enterprise governance requirements
Preferred Qualifications
•Experience with Azure Machine Learning, Azure AI Services, Azure OpenAI, or MLOps frameworks
•Exposure to containerization and orchestration technologies (Docker, Kubernetes, AKS)
•Experience supporting data pipelines or feature stores used by machine learning systems
•Familiarity with ServiceNow, AI-driven ITSM workflows, or automation platforms
•Experience with observability tools
•Knowledge of Responsible AI, data governance, and compliance considerations for AI systems
•Relevant certifications (Microsoft Azure Administrator, Azure DevOps Engineer, Azure AI Engineer)
Why this role exists
Our infrastructure footprint is growing faster than our headcount, and we believe most of that
gap should be closed by automation and AI agents — not by hiring more humans to do toil. We
need someone early in their career who treats manual work as a bug, ships scripts and agents
instead of tickets, and wants to grow into deeper ownership over the next two years.
You will not be the most senior person on the team. You will be the one who multiplies the team.
What you'll own
In your first 1 months
• Take ownership of one slice of our CI/CD pipeline and make it measurably
faster, more reliable, or cheaper. We expect a number on a dashboard to move.
• Build at least three internal automations that replace manual ops toil —
using AI agents (Claude Code, agentic CLIs, scripted LLM workflows) as your force
multiplier.
• Be the first responder for a defined set of alerts. Write the runbooks. Drive
the alert volume down.
• Support senior engineers on AI/ML infrastructure (GPU nodes, inference
services, model deployment) — observe, document, and gradually take on contained
changes under review.
By 3 months you should be
• The go-to person for at least two production systems.
• Shipping routine infrastructure changes without needing senior review.
• Treating "manual" as a code smell.
Required (we will reject without these)
• 0–3 years hands-on experience with one major cloud (AWS, GCP, or
Azure — one is fine, depth beats breadth).
• Fluent in Linux command line, bash, and at least one scripting language
(Python or Go preferred).
• Have shipped something to production that real users hit. A side project
counts; a graded coursework lab does not.
• Comfortable with Docker — you can explain what an image vs. a
container is and why it matters.
• Working knowledge of networking fundamentals: DNS, HTTP/HTTPS,
TLS, ports, basic subnets — enough to debug "it works on my machine."
• Git fluency: branches, merges, rebases, conflict resolution.
• CI/CD pipelines — you have authored or substantially modified pipelines
in GitHub Actions, GitLab CI, ArgoCD, Jenkins, or similar. Not just "I clicked Re-run."
• Kubernetes basics — kubectl for real work, can read pod logs,
understand deployments and services, can debug a CrashLoopBackOff without
panicking. You do not need to have run a cluster; you do need to have lived inside one.
• Active user of AI coding agents (Claude Code, Cursor, Copilot, agentic
CLIs, etc.). You should be able to walk us through specific tasks where they made you
faster, and specific tasks where they failed you and how you noticed. "I have tried it" is
not enough.
Bonus (real plus, not required)
• Infrastructure as Code: Terraform, Pulumi, or Ansible.
• Observability: Prometheus/Grafana, Datadog, OpenTelemetry, any APM.
• Have built or extended an LLM-based agent — a custom MCP server, a
scripted multi-step workflow, an internal tool that calls models in a loop. Anything beyond
chat-with-Claude.
• Exposure to GPU workloads, model serving (vLLM, Triton, TGI, etc.), or
ML pipelines.
What we don't care about
• Whether your degree is in CS — or whether you have a degree at all.
• Brand-name companies on your resume.
• Certifications. They are fine. They do not substitute for having shipped.
How we work
• We default to automation. If you do something manually twice, the third
time you script it or hand it to an agent.
• AI agents are part of the workflow, not a novelty. Expect interview
questions about exactly how you use them — and where you have caught them being
wrong.
• Small, reversible changes beat big-bang rollouts.
• Postmortems are blameless and written down.
• We push back on each other. If you only execute, you will be unhappy
here.
How to apply
Send:
• Your resume.
• A short note (≤200 words) describing one infra or automation problem you
solved, and how AI agents factored in — or did not, and why. We read these. Generic
notes get rejected.
Internal note — delete before posting externally
• Comp band, location policy, team name, and reporting line marked
[CONFIRM] need to be filled in before this goes external.
• The Required list is intentionally tight: CI/CD and Kubernetes basics
promoted from bonus. Expect this to filter ~80% of typical junior DevOps applicants. The
remaining pool will skew toward people who have actually shipped infra at a startup, not
bootcamp grads or pure cloud-cert holders.
• IaC, observability, agent-building, and GPU/ML serving stay as bonus.
Promoting any of these to required at 0–3 yrs collapses the pool to near-zero or forces
hiring senior people at junior comp. If you want IaC required, re-level this to mid (3–5
yrs) and raise the band.
• Screening implication: the resume screen should explicitly check for
CI/CD pipeline authorship and any K8s-touching production work. If neither is on the
resume, reject at screen. Do not waste interview slots.
• Pipeline watch: if fewer than ~15 qualified resumes after 2 weeks of
active sourcing, the first thing to relax is the AI-agent-fluency bar (move to bonus and
screen for it in interview instead). Do not relax the "shipped to production" requirement
— that is the load-bearing filter.
We are looking for a passionate DevOps Engineer who can support deployment and monitor our Production, QE, and Staging environments performance. Applicants should have a strong understanding of UNIX internals and should be able to clearly articulate how it works. Knowledge of shell scripting & security aspects is a must. Any experience with infrastructure as code is a big plus. The key responsibility of the role is to manage deployments, security, and support of business solutions. Having experience in database applications like Postgres, ELK, NodeJS, NextJS & Ruby on Rails is a huge plus. At VakilSearch. Experience doesn't matter, passion to produce change matters
Responsibilities and Accountabilities:
- As part of the DevOps team, you will be responsible for configuration, optimization, documentation, and support of the infra components of VakilSearch’s product which are hosted in cloud services & on-prem facility
- Design, build tools and framework that support deploying and managing our platform & Exploring new tools, technologies, and processes to improve speed, efficiency, and scalability
- Support and troubleshoot scalability, high availability, performance, monitoring, backup, and restore of different Env
- Manage resources in a cost-effective, innovative manner including assisting subordinates ineffective use of resources and tools
- Resolve incidents as escalated from Monitoring tools and Business Development Team
- Implement and follow security guidelines, both policy and technology to protect our data
- Identify root cause for issues and develop long-term solutions to fix recurring issues and Document it
- Strong in performing production operation activities even at night times if required
- Ability to automate [Scripts] recurring tasks to increase velocity and quality
- Ability to manage and deliver multiple project phases at the same time
I Qualification(s):
- Experience in working with Linux Server, DevOps tools, and Orchestration tools
- Linux, AWS, GCP, Azure, CompTIA+, and any other certification are a value-add
II Experience Required in DevOps Aspects:
- Length of Experience: Minimum 1-4 years of experience
- Nature of Experience:
- Experience in Cloud deployments, Linux administration[ Kernel Tuning is a value add ], Linux clustering, AWS, virtualization, and networking concepts [ Azure, GCP value add ]
- Experience in deployment solutions CI/CD like Jenkins, GitHub Actions [ Release Management is a value add ]
- Hands-on experience in any of the configuration management IaC tools like Chef, Terraform, and CloudFormation [ Ansible & Puppet is a value add ]
- Administration, Configuring and utilizing Monitoring and Alerting tools like Prometheus, Grafana, Loki, ELK, Zabbix, Datadog, etc
- Experience with Containerization and orchestration tools like Docker, and Kubernetes [ Docker swarm is a value add ]Good Scripting skills in at least one interpreted language - Shell/bash scripting or Ruby/Python/Perl
- Experience in Database applications like PostgreSQL, MongoDB & MySQL [DataOps]
- Good at Version Control & source code management systems like GitHub, GIT
- Experience in Serverless [ Lambda/GCP cloud function/Azure function ]
- Experience in Web Server Nginx, and Apache
- Knowledge in Redis, RabbitMQ, ELK, REST API [ MLOps Tools is a value add ]
- Knowledge in Puma, Unicorn, Gunicorn & Yarn
- Hands-on VMWare ESXi/Xencenter deployments is a value add
- Experience in Implementing and troubleshooting TCP/IP networks, VPN, Load Balancing & Web application firewalls
- Deploying, Configuring, and Maintaining Linux server systems ON premises and off-premises
- Code Quality like SonarQube is a value-add
- Test Automation like Selenium, JMeter, and JUnit is a value-add
- Experience in Heroku and OpenStack is a value-add
- Experience in Identifying Inbound and Outbound Threats and resolving it
- Knowledge of CVE & applying the patches for OS, Ruby gems, Node, and Python packages
- Documenting the Security fix for future use
- Establish cross-team collaboration with security built into the software development lifecycle
- Forensics and Root Cause Analysis skills are mandatory
- Weekly Sanity Checks of the on-prem and off-prem environment
III Skill Set & Personality Traits required:
- An understanding of programming languages such as Ruby, NodeJS, ReactJS, Perl, Java, Python, and PHP
- Good written and verbal communication skills to facilitate efficient and effective interaction with peers, partners, vendors, and customers
IV Age Group: 21 – 36 Years
V Cost to the Company: As per industry standards
Job Title: Senior AIML Engineer – Immediate Joiner (AdTech)
Location: Pune – Onsite
About Us:
We are a cutting-edge technology company at the forefront of digital transformation, building innovative AI and machine learning solutions for the digital advertising industry. Join us in shaping the future of AdTech!
Role Overview:
We are looking for a highly skilled Senior AIML Engineer with AdTech experience to develop intelligent algorithms and predictive models that optimize digital advertising performance. Immediate joiners preferred.
Key Responsibilities:
- Design and implement AIML models for real-time ad optimization, audience targeting, and campaign performance analysis.
- Collaborate with data scientists and engineers to build scalable AI-driven solutions.
- Analyze large volumes of data to extract meaningful insights and improve ad performance.
- Develop and deploy machine learning pipelines for automated decision-making.
- Stay updated on the latest AI/ML trends and technologies to drive continuous innovation.
- Optimize existing models for speed, scalability, and accuracy.
- Work closely with product managers to align AI solutions with business goals.
Requirements:
- Minimum 4-6 years of experience in AIML, with a focus on AdTech (Mandatory).
- Strong programming skills in Python, R, or similar languages.
- Hands-on experience with machine learning frameworks like TensorFlow, PyTorch, or Scikit-learn.
- Expertise in data processing and real-time analytics.
- Strong understanding of digital advertising, programmatic platforms, and ad server technology.
- Excellent problem-solving and analytical skills.
- Immediate joiners preferred.
Preferred Skills:
- Knowledge of big data technologies like Spark, Hadoop, or Kafka.
- Experience with cloud platforms like AWS, GCP, or Azure.
- Familiarity with MLOps practices and tools.
How to Apply:
If you are a passionate AIML engineer with AdTech experience and can join immediately, we want to hear from you. Share your resume and a brief note on your relevant experience.
Join us in building the future of AI-driven digital advertising!
Key Responsibilities:-
• Collaborate with Data Scientists to test and scale new algorithms through pilots and later industrialize the solutions at scale to the comprehensive fashion network of the Group
• Influence, build and maintain the large-scale data infrastructure required for the AI projects, and integrate with external IT infrastructure/service to provide an e2e solution
• Leverage an understanding of software architecture and software design patterns to write scalable, maintainable, well-designed and future-proof code
• Design, develop and maintain the framework for the analytical pipeline
• Develop common components to address pain points in machine learning projects, like model lifecycle management, feature store and data quality evaluation
• Provide input and help implement framework and tools to improve data quality
• Work in cross-functional agile teams of highly skilled software/machine learning engineers, data scientists, designers, product managers and others to build the AI ecosystem within the Group
• Deliver on time, demonstrating a strong commitment to deliver on the team mission and agreed backlog
About the job
Our goal
We are reinventing the future of MLOps. Censius Observability platform enables businesses to gain greater visibility into how their AI makes decisions to understand it better. We enable explanations of predictions, continuous monitoring of drifts, and assessing fairness in the real world. (TLDR build the best ML monitoring tool)
The culture
We believe in constantly iterating and improving our team culture, just like our product. We have found a good balance between async and sync work default is still Notion docs over meetings, but at the same time, we recognize that as an early-stage startup brainstorming together over calls leads to results faster. If you enjoy taking ownership, moving quickly, and writing docs, you will fit right in.
The role:
Our engineering team is growing and we are looking to bring on board a senior software engineer who can help us transition to the next phase of the company. As we roll out our platform to customers, you will be pivotal in refining our system architecture, ensuring the various tech stacks play well with each other, and smoothening the DevOps process.
On the platform, we use Python (ML-related jobs), Golang (core infrastructure), and NodeJS (user-facing). The platform is 100% cloud-native and we use Envoy as a proxy (eventually will lead to service-mesh architecture).
By joining our team, you will get the exposure to working across a swath of modern technologies while building an enterprise-grade ML platform in the most promising area.
Responsibilities
- Be the bridge between engineering and product teams. Understand long-term product roadmap and architect a system design that will scale with our plans.
- Take ownership of converting product insights into detailed engineering requirements. Break these down into smaller tasks and work with the team to plan and execute sprints.
- Author high-quality, highly-performance, and unit-tested code running on a distributed environment using containers.
- Continually evaluate and improve DevOps processes for a cloud-native codebase.
- Review PRs, mentor others and proactively take initiatives to improve our team's shipping velocity.
- Leverage your industry experience to champion engineering best practices within the organization.
Qualifications
Work Experience
- 3+ years of industry experience (2+ years in a senior engineering role) preferably with some exposure in leading remote development teams in the past.
- Proven track record building large-scale, high-throughput, low-latency production systems with at least 3+ years working with customers, architecting solutions, and delivering end-to-end products.
- Fluency in writing production-grade Go or Python in a microservice architecture with containers/VMs for over 3+ years.
- 3+ years of DevOps experience (Kubernetes, Docker, Helm and public cloud APIs)
- Worked with relational (SQL) as well as non-relational databases (Mongo or Couch) in a production environment.
- (Bonus: worked with big data in data lakes/warehouses).
- (Bonus: built an end-to-end ML pipeline)
Skills
- Strong documentation skills. As a remote team, we heavily rely on elaborate documentation for everything we are working on.
- Ability to motivate, mentor, and lead others (we have a flat team structure, but the team would rely upon you to make important decisions)
- Strong independent contributor as well as a team player.
- Working knowledge of ML and familiarity with concepts of MLOps
Benefits
- Competitive Salary
- Work Remotely
- Health insurance
- Unlimited Time Off
- Support for continual learning (free books and online courses)
- Reimbursement for streaming services (think Netflix)
- Reimbursement for gym or physical activity of your choice
- Flex hours
- Leveling Up Opportunities
You will excel in this role if
- You have a product mindset. You understand, care about, and can relate to our customers.
- You take ownership, collaborate, and follow through to the very end.
- You love solving difficult problems, stand your ground, and get what you want from engineers.
- Resonate with our core values of innovation, curiosity, accountability, trust, fun, and social good.
We are looking for a full-time remote DevOps Engineer who has worked with CI/CD automation, big data pipelines and Cloud Infrastructure, to solve complex technical challenges at scale that will reshape the healthcare industry for generations. You will get the opportunity to be involved in the latest tech in big data engineering, novel machine learning pipelines and highly scalable backend development. The successful candidates will be working in a team of highly skilled and experienced developers, data scientists and CTO.
Job Requirements
- Experience deploying, automating, maintaining, and improving complex services and pipelines • Strong understanding of DevOps tools/process/methodologies
- Experience with AWS Cloud Formation and AWS CLI is essential
- The ability to work to project deadlines efficiently and with minimum guidance
- A positive attitude and enjoys working within a global distributed team
Skills
- Highly proficient working with CI/CD and automating infrastructure provisioning
- Deep understanding of AWS Cloud platform and hands on experience setting up and maintaining with large scale implementations
- Experience with JavaScript/TypeScript, Node, Python and Bash/Shell Scripting
- Hands on experience with Docker and container orchestration
- Experience setting up and maintaining big data pipelines, Serverless stacks and containers infrastructure
- An interest in healthcare and medical sectors
- Technical degree with 4 plus years’ infrastructure and automation experience
DevOps Engineer Skills Building a scalable and highly available infrastructure for data science Knows data science project workflows Hands-on with deployment patterns for online/offline predictions (server/serverless)
Experience with either terraform or Kubernetes
Experience of ML deployment frameworks like Kubeflow, MLflow, SageMaker Working knowledge of Jenkins or similar tool Responsibilities Owns all the ML cloud infrastructure (AWS) Help builds out an entirely CI/CD ecosystem with auto-scaling Work with a testing engineer to design testing methodologies for ML APIs Ability to research & implement new technologies Help with cost optimizations of infrastructure.
Knowledge sharing Nice to Have Develop APIs for machine learning Can write Python servers for ML systems with API frameworks Understanding of task queue frameworks like Celery










