
MLOps Lead Engineer
at IT solutions specialized in Apps Lifecycle management. (MG1)
- Automate and maintain ML and Data pipelines at scale
- Collaborate with Data Scientists and Data Engineers on feature development teams to containerize and build out deployment pipelines for new modules
- Maintain and expand our on-prem deployments with spark clusters
- Design, build and optimize applications containerization and orchestration with Docker and Kubernetes and AWS or Azure
- 5 years of IT experience in data-driven or AI technology products
- Understanding of ML Model Deployment and Lifecycle
- Extensive experience in Apache airflow for MLOps workflow automation
- Experience is building and automating data pipelines
- Experience in working on Spark Cluster architecture
- Extensive experience with Unix/Linux environments
- Experience with standard concepts and technologies used in CI/CD build, deployment pipelines using Jenkins
- Strong experience in Python and PySpark and building required automation (using standard technologies such as Docker, Jenkins, and Ansible).
- Experience with Kubernetes or Docker Swarm
- Working technical knowledge of current systems software, protocols, and standards, including firewalls, Active Directory, etc.
- Basic knowledge of Multi-tier architectures: load balancers, caching, web servers, application servers, and databases.
- Experience with various virtualization technologies and multi-tenant, private and hybrid cloud environments.
- Hands-on software and hardware troubleshooting experience.
- Experience documenting and maintaining configuration and process information.
- Basic Knowledge of machine learning frameworks: Tensorflow, Caffe/Caffe2, Pytorch

Similar jobs
Role: Senior Platform Engineer (GCP Cloud)
Experience Level: 3 to 6 Years
Work location: Mumbai
Mode : Hybrid
Role & Responsibilities:
- Build automation software for cloud platforms and applications
- Drive Infrastructure as Code (IaC) adoption
- Design self-service, self-healing monitoring and alerting tools
- Automate CI/CD pipelines (Git, Jenkins, SonarQube, Docker)
- Build Kubernetes container platforms
- Introduce new cloud technologies for business innovation
Requirements:
- Hands-on experience with GCP Cloud
- Knowledge of cloud services (compute, storage, network, messaging)
- IaC tools experience (Terraform/CloudFormation)
- SQL & NoSQL databases (Postgres, Cassandra)
- Automation tools (Puppet/Chef/Ansible)
- Strong Linux administration skills
- Programming: Bash/Python/Java/Scala
- CI/CD pipeline expertise (Jenkins, Git, Maven)
- Multi-region deployment experience
- Agile/Scrum/DevOps methodology
Must Have -
a. Background working with Startups
b. Good knowledge of Kubernetes & Docker
c. Background working in Azure
What you’ll be doing
- Ensure that our applications and environments are stable, scalable, secure and performing as expected.
- Proactively engage and work in alignment with cross-functional colleagues to understand their requirements, contributing to and providing suitable supporting solutions.
- Develop and introduce systems to aid and facilitate rapid growth including implementation of deployment policies, designing and implementing new procedures, configuration management and planning of patches and for capacity upgrades
- Observability: ensure suitable levels of monitoring and alerting are in place to keep engineers aware of issues.
- Establish runbooks and procedures to keep outages to a minimum. Jump in before users notice that things are off track, then automate it for the future.
- Automate everything so that nothing is ever done manually in production.
- Identify and mitigate reliability and security risks. Make sure we are prepared for peak times,
- DDoS attacks and fat fingers.
- Troubleshoot issues across the whole stack - software, applications and network.
- Manage individual project priorities, deadlines, and deliverables as part of a self-organizing team.
- Learn and unlearn every day by exchanging knowledge and new insights, conducting constructive code reviews, and participating in retrospectives.
Requirements
- 2+ years extensive experience of Linux server administration include patching, packaging (rpm), performance tuning, networking, user management, and security.
- 2+ years of implementing systems that are highly available, secure, scalable, and self-healingon Azure cloud platform
- Strong understanding of networking, especially in cloud environments along with a good understanding of CICD.
- Prior experience implementing industry standard security best practices, including those recommended by Azure
- Proficiency with Bash, and any high-level scripting language.
- Basic working knowledge of observability stacks like ELK, prometheus, grafana, Signoz etc
- Proficiency with Infrastructure as Code and Infrastructure Testing, preferably using Pulumi/Terraform.
- Hands-on experience in building and administering VMs and Containers using tools such as Docker/Kubernetes.
- Excellent communication skills, spoken as well as written, with a demonstrated ability to articulate technical problems and projects to all stakeholders.
About us:
HappyFox is a software-as-a-service (SaaS) support platform. We offer an enterprise-grade help desk ticketing system and intuitively designed live chat software.
We serve over 12,000 companies in 70+ countries. HappyFox is used by companies that span across education, media, e-commerce, retail, information technology, manufacturing, non-profit, government and many other verticals that have an internal or external support function.
To know more, Visit! - https://www.happyfox.com/
Responsibilities:
- Build and scale production infrastructure in AWS for the HappyFox platform and its products.
- Research, Build/Implement systems, services and tooling to improve uptime, reliability and maintainability of our backend infrastructure. And to meet our internal SLOs and customer-facing SLAs.
- Proficient in managing/patching servers with Unix-based operating systems like Ubuntu Linux.
- Proficient in writing automation scripts or building infrastructure tools using Python/Ruby/Bash/Golang
- Implement consistent observability, deployment and IaC setups
- Patch production systems to fix security/performance issues
- Actively respond to escalations/incidents in the production environment from customers or the support team
- Mentor other Infrastructure engineers, review their work and continuously ship improvements to production infrastructure.
- Build and manage development infrastructure, and CI/CD pipelines for our teams to ship & test code faster.
- Participate in infrastructure security audits
Requirements:
- At least 5 years of experience in handling/building Production environments in AWS.
- At least 2 years of programming experience in building API/backend services for customer-facing applications in production.
- Demonstrable knowledge of TCP/IP, HTTP and DNS fundamentals.
- Experience in deploying and managing production Python/NodeJS/Golang applications to AWS EC2, ECS or EKS.
- Proficient in containerised environments such as Docker, Docker Compose, Kubernetes
- Proficient in managing/patching servers with Unix-based operating systems like Ubuntu Linux.
- Proficient in writing automation scripts using any scripting language such as Python, Ruby, Bash etc.,
- Experience in setting up and managing test/staging environments, and CI/CD pipelines.
- Experience in IaC tools such as Terraform or AWS CDK
- Passion for making systems reliable, maintainable, scalable and secure.
- Excellent verbal and written communication skills to address, escalate and express technical ideas clearly
- Bonus points – if you have experience with Nginx, Postgres, Redis, and Mongo systems in production.
-
Pixuate is a deep-tech AI start-up enabling businesses make smarter decisions with our edge-based video analytics platform and offer innovative solutions across traffic management, industrial digital transformation, and smart surveillance. We aim to serve enterprises globally as a preferred partner for digitization of visual information.
Job Description
We at Pixuate are looking for highly motivated and talented Senior DevOps Engineers to support building the next generation, innovative, deep-tech AI based products. If you are someone who has a passion for building a great software, has analytical mindset and enjoys solving complex problems, thrives in a challenging environment, self-driven, constantly exploring & learning new technologies, have ability to succeed on one’s own merits and fast-track your career growth we would love to talk!
What do we expect from this role?
- This role’s key area of focus is to co-ordinate and manage the product from development through deployment, working with rest of the engineering team to ensure smooth functioning.
- Work closely with the Head of Engineering in building out the infrastructure required to deploy, monitor and scale the services and systems.
- Act as the technical expert, innovator, and strategic thought leader within the Cloud Native Development, DevOps and CI/CD pipeline technology engineering discipline.
- Should be able to understand how technology works and how various structures fall in place, with a high-level understanding of working with various operating systems and their implications.
- Troubleshoots basic software or DevOps stack issues
You would be great at this job, if you have below mentioned competencies
- Tech /M.Tech/MCA/ BSc / MSc/ BCA preferably in Computer Science
- 5+ years of relevant work experience
- https://www.edureka.co/blog/devops-skills#knowledge">Knowledge on Various DevOps Tools and Technologies
- Should have worked on tools like Docker, Kubernetes, Ansible in a production environment for data intensive systems.
- Experience in developing https://www.edureka.co/blog/continuous-delivery/">Continuous Integration/ Continuous Delivery pipelines (CI/ CD) preferably using Jenkins, scripting (Shell / Python) and https://www.edureka.co/blog/what-is-git/">Git and Git workflows
- Experience implementing role based security, including AD integration, security policies, and auditing in a Linux/Hadoop/AWS environment.
- Experience with the design and implementation of big data backup/recovery solutions.
- Strong Linux fundamentals and scripting; experience as Linux Admin is good to have.
- Working knowledge in Python is a plus
- Working knowledge of TCP/IP networking, SMTP, HTTP, load-balancers (ELB, HAProxy) and high availability architecture is a plus
- Strong interpersonal and communication skills
- Proven ability to complete projects according to outlined scope and timeline
- Willingness to travel within India and internationally whenever required
- Demonstrated leadership qualities in past roles
More about Pixuate:
Pixuate, owned by Cocoslabs Innovative Solutions Pvt. Ltd., is a leading AI startup building the most advanced Edge-based video analytics products. We are recognized for our cutting-edge R&D in deep learning, computer vision and AI and we are solving some of the most challenging problems faced by enterprises. Pixuate’s plug-and-play platform revolutionizes monitoring, compliance to safety, and efficiency improvement for Industries, Banks & Enterprises by providing actionable real-time insights leveraging CCTV cameras.
We have enabled our customers such as Hindustan Unilever, Godrej, Secuira, L&T, Bigbasket, Microlabs, Karnataka Bank etc and rapidly expanding our business to cater to the needs of Manufacturing & Logistics, Oil and Gas sector.
Rewards & Recognitions:
- Winner of Elevate by Startup Karnataka (https://pixuate.ai/thermal-analytics/">https://pixuate.ai/thermal-analytics/).
- Winner of Manufacturing Innovation Challenge in the 2nd edition of Fusion 4.0’s MIC2020 organized by the NASSCOM Centre of Excellence in IoT & AI in 2021
- Winner of SASACT program organized by MEITY in 2021
Why join us?
You will get an opportunity to work with the founders and be part of 0 to 1 journey& get coached and guided. You will also get an opportunity to excel your skills by being innovative and contributing to the area of your personal interest. Our culture encourages innovation, freedom and rewards high performers with faster growth and recognition.
Where to find us?
Website: http://pixuate.com/">http://pixuate.com/
Linked in: https://www.linkedin.com/company/pixuate-ai
Work from Office – BengaluruPlace of Work:
Hands on experience in:
- Deploying, managing, securing and patching enterprise applications on large scale in Cloud preferably AWS.
- Experience leading End-to-end DevOps projects with modern tools encompassing both Applications and Infrastructure
- AWS Code deploy, Code build, Jenkins, Sonarqube.
- Incident management and root cause analysis.
- Strong understanding of immutable infrastructure and infrastructure as code concepts. Participate in capacity planning and provisioning of new resources. Importing already deployed infra into IaaC.
- Utilizing AWS cloud services such as EC2, S3, IAM, Route53, RDS, VPC, NAT/IG Gateway, LAMBDA, Load Balancers, CloudWatch, API Gateway are some of them.
- AWS ECS managing multi cluster container environments (ECS with EC2 and Fargate with service discovery using Route53)
- Monitoring/analytics tools like Nagios/DataDog and logging tools like LogStash/SumoLogic
- Simple Notification Service (SNS)
- Version Control System: Git, Gitlab, Bitbucket
- Participate in Security Audit of Cloud Infrastructure.
- Exceptional documentation and communication skills.
- Ready to work in Shift
- Knowledge of Akamai is Plus.
- Microsoft Azure is Plus
- Adobe AEM is plus.
- AWS Certified DevOps Professional is plus
Mandatory:
● A minimum of 1 year of development, system design or engineering experience ●
Excellent social, communication, and technical skills
● In-depth knowledge of Linux systems
● Development experience in at least two of the following languages: Php, Go, Python,
JavaScript, C/C++, Bash
● In depth knowledge of web servers (Apache, NgNix preferred)
● Strong in using DevOps tools - Ansible, Jenkins, Docker, ELK
● Knowledge to use APM tools, NewRelic is preferred
● Ability to learn quickly, master our existing systems and identify areas of improvement
● Self-starter that enjoys and takes pride in the engineering work of their team ● Tried
and Tested Real-world Cloud Computing experience - AWS/ GCP/ Azure ● Strong
Understanding of Resilient Systems design
● Experience in Network Design and Management
- Works independently without any supervision
- Work on continuous improvement of the products through innovation and learning. Someone with a knack for benchmarking and optimization
- Experience in deploying highly complex, distributed transaction processing systems.
- Stay abreast with new innovations and the latest technology trends and explore ways of leveraging these for improving the product in alignment with the business.
- As a component owner, where the component impacts across multiple platforms (5-10-member team), work with customers to obtain their requirements and deliver the end-to-end project.
Required Experience, Skills, and Qualifications
- 5+ years of experience as a DevOps Engineer. Experience with the Golang cycle is a plus
- At least one End to End CI/CD Implementation experience
- Excellent Problem Solving and Debugging skills in DevOps area· Good understanding of Containerization (Docker/Kubernetes)
- Hands-on Build/Package tool experience· Experience with AWS services Glue, Athena, Lambda, EC2, RDS, EKS/ECS, ALB, VPC, SSM, Route 53
- Experience with setting up CI/CD pipeline for Glue jobs, Athena, Lambda functions
- Experience architecting interaction with services and application deployments on AWS
- Experience with Groovy and writing Jenkinsfile
- Experience with repository management, code scanning/linting, secure scanning tools
- Experience with deployments and application configuration on Kubernetes
- Experience with microservice orchestration tools (e.g. Kubernetes, Openshift, HashiCorp Nomad)
- Experience with time-series and document databases (e.g. Elasticsearch, InfluxDB, Prometheus)
- Experience with message buses (e.g. Apache Kafka, NATS)
- Experience with key-value stores and service discovery mechanisms (e.g. Redis, HashiCorp Consul, etc)
Specific responsibilities commensurate with experience and include:
- Ability to react quickly and effectively to identify and resolve issues that heavily impact CI/CD system (immediate mitigation of impact, long-term resolution including strategies for risk mitigation/monitoring/alert for proactive resolution of potential future occurrences)
- Design, develop, unit test, and implement build automation scripts including environment configuration validation processes
- Automate and improve development process by evaluation and introduction of new tools and scripts, and manage their life cycle and validation
- Determine branching strategy and maintain branches for various components, products, and product lines
- Come up with solutions to open-ended problems that focus on workflow improvements for the Software department
- Address issues with well-defined requirements efficiently; come up with short-term and long-term solutions and staged deployment strategies
- Self-driven-- takes action to move tickets from start to completion with minimal oversight
- Ability to communicate with and consider perspectives of stakeholders including but not limited to: IT, software development, verification
- Ability to break down a problem into smaller components and solve them in a logical, controlled, clearly explainable approach
- Lead the creation and maintenance of a pre-production environment as a testbed for build process improvements and changes before deployment to the production environment
- Gather metrics via direct input, data based on analysis of developer working habits analysis and pain points to assess current state and areas requiring further improvement
- Define chain of communication and immediate paths of action in the case of a build fault state
- Ability to work within constraints of the internal network without access to commercial cloud solutions
- Create metrics that define ‘efficiency’ and ‘reliability’ in measurable terms, and track them
- Perform static code and security analysis
- Design and execute unit tests and perform code coverage analysis
- Able to work in Agile development team environment
Key Requirement & Qualifications:
- Bachelor’s degree (or higher) in Electrical Engineering, Computer Engineering, Computer Science or equivalent
- 6+ years (minimum) experience handling Build, Release, and Deployment of software on Windows and/or Linux environments (on-premise)
- Experience with the development and deployment of CM processes and tools
- Build automation for .NET using TeamCity (Jenkins is an asset)
- Scripting languages: Windows batch scripting, Powershell, Ant/NAnt
- Source control systems usage, branching strategies, and workflow (Git preferred, Subversion)
- 6+ years of hands-on programming experience with C# and .NET (both Framework and Core)
- Troubleshooting and debugging-- what information to gather when there are issues with CI/CD system, and how to gather it (i.e., analyzing network communication? Windows crash dumps, java logs, etc.)
- 6+ years (minimum) in web/desktop application software development experience
- Excellent problem solving, critical and analytical thinking
- Strong team player who understands SDLC and QA methodologies
- A professional, results-oriented individual with a high degree of self-motivation
- Excellent written and verbal communication skills and the ability to coordinate work/activities with multiple software/IT teams
- Working with virtual machines and build management on virtual machines (VMware preferred).
- Managing configurations for multiple build environments
- OS administration and scripting experience (Windows is a must, Linux desired)
- Experience with test automation tools (NUnit, customer inhouse frameworks) and strategies is an asset
- Creation and maintenance of monitoring and alert systems (Zabbix)
- Familiarity with databases (SQL-based) - create, modify, optimize (via script)
- Data and metrics gathering, aggregation, and reporting
- Experience with work management and documentation tools: JIRA and Confluence
Experience: 5+yrs
Skills Required: -
Experience in Azure Administration, Configuration and Deployment of WindowsLinux VMContainer
based infrastructure Scripting Programming in Python, JavaScriptTypeScript, C Scripting PowerShell ,
Azure CLI and shell Scripts Identity, Access Management and RBAC model Virtual Networking, storage,
and Compute Resources
Azure Database Technologies. Monitoring and Analytics Tools in Azure
Azure DevOps based CICD Build pipeline integrated with GitHub – Java and Node.js
Test Automation and other CICD Tools
Azure Infrastructure using ARM template Terrafor








