MLOps Lead Engineer
at an IT solutions company specializing in application lifecycle management. (MG1)
- Automate and maintain ML and Data pipelines at scale
- Collaborate with Data Scientists and Data Engineers on feature development teams to containerize and build out deployment pipelines for new modules
- Maintain and expand our on-prem deployments with Spark clusters
- Design, build and optimize application containerization and orchestration with Docker and Kubernetes on AWS or Azure
- 5 years of IT experience in data-driven or AI technology products
- Understanding of ML Model Deployment and Lifecycle
- Extensive experience in Apache Airflow for MLOps workflow automation
- Experience in building and automating data pipelines
- Experience in working on Spark Cluster architecture
- Extensive experience with Unix/Linux environments
- Experience with standard concepts and technologies used in CI/CD build, deployment pipelines using Jenkins
- Strong experience in Python and PySpark and building required automation (using standard technologies such as Docker, Jenkins, and Ansible).
- Experience with Kubernetes or Docker Swarm
- Working technical knowledge of current systems software, protocols, and standards, including firewalls, Active Directory, etc.
- Basic knowledge of Multi-tier architectures: load balancers, caching, web servers, application servers, and databases.
- Experience with various virtualization technologies and multi-tenant, private and hybrid cloud environments.
- Hands-on software and hardware troubleshooting experience.
- Experience documenting and maintaining configuration and process information.
- Basic knowledge of machine learning frameworks: TensorFlow, Caffe/Caffe2, PyTorch
DESIRED SKILLS AND EXPERIENCE
Strong analytical and problem-solving skills
Ability to work independently, learn quickly and be proactive
3-5 years overall and at least 1-2 years of hands-on experience in designing and managing DevOps Cloud infrastructure
Experience must include a combination of:
o Experience working with configuration management tools – Ansible, Chef, Puppet, SaltStack (expertise in at least one tool is a must)
o Ability to write and maintain code in at least one scripting language (Python preferred)
o Practical knowledge of shell scripting
o Cloud knowledge – AWS, VMware vSphere
o Good understanding and familiarity with Linux
o Networking knowledge – Firewalls, VPNs, Load Balancers
o Web/Application servers, Nginx, JVM environments
o Virtualization and containers - Xen, KVM, Qemu, Docker, Kubernetes, etc.
o Familiarity with logging systems - Logstash, Elasticsearch, Kibana
o Git, Jenkins, Jira
Key responsibilities include, but are not limited to:
Help identify and drive speed, performance, scalability, and reliability optimizations based on experience and learnings from production incidents.
Work in an agile DevSecOps environment, creating, maintaining, monitoring, and automating the overall solution deployment.
Understand and explain the effect of product architecture decisions on systems.
Identify issues and/or opportunities for improvements that are common across multiple services/teams.
This role will require weekend deployments
Skills and Qualifications:
1. 3+ years of experience in a DevOps end-to-end development process with heavy focus on service monitoring and site reliability engineering work.
2. Advanced knowledge of programming/scripting languages (Bash, Perl, Python, Node.js).
3. Experience in Agile/Scrum enterprise-scale software development, including working with Git, Jira, Confluence, etc.
4. Advanced experience with core microservice technology (RESTful development).
5. Working knowledge of advanced AI/ML tools is a plus.
6. Working knowledge of one or more cloud services: Amazon AWS, Microsoft Azure
7. Bachelor’s or Master’s degree in Computer Science or equivalent related field experience
Key Behaviours / Attitudes:
Professional curiosity and a desire to develop a deep understanding of services and technologies.
Experience building & running systems to drive high availability, performance and operational improvements
Excellent written & oral communication skills; to ask pertinent questions, and to assess/aggregate/report the responses.
Ability to quickly grasp and analyze complex and rapidly changing systems
Soft skills
1. Self-motivated and self-managing.
2. Excellent communication / follow-up / time management skills.
3. Ability to fulfill role/duties independently within defined policies and procedures.
4. Ability to multitask and balance multiple priorities while maintaining a high level of customer satisfaction.
5. Be able to work in an interrupt-driven environment.
Work with Dori Ai world-class technology to develop, implement, and support Dori's global infrastructure.
As a member of the IT organization, assist with the analysis of existing complex programs and formulate logic for new complex internal systems. Prepare flowcharts, perform coding, and test/debug programs. Develop conversion and system implementation plans. Recommend changes to development, maintenance, and system standards.
Leading contributor individually and as a team member, providing direction and mentoring to others. Work is non-routine and very complex, involving the application of advanced technical/business skills in a specialized area. BS or equivalent experience in programming on enterprise or department servers or systems.
- Candidate should have good Platform experience on Azure with Terraform.
- The DevOps engineer needs to help developers create the pipelines and K8s deployment manifests.
- Experience migrating data from AWS to Azure is good to have.
- Manage and automate infrastructure using Terraform. Jenkins is the key CI/CD tool we use, and it will be used to run the Terraform configurations.
- VMs to be provisioned on Azure Cloud and managed.
- Good hands-on experience with cloud networking is required.
- Ability to set up databases on VMs as well as managed databases, and to properly set up cloud-hosted microservices so they can communicate with the database services.
- Kubernetes, Storage, Key Vault, Networking (load balancing and routing) and VMs are the key areas of infrastructure expertise that are essential.
- The requirement is to administer Kubernetes clusters end to end (application deployment, managing namespaces, load balancing, policy setup, using blue-green/canary deployment models, etc.).
- The experience in AWS is desirable
- Python experience is optional; however, PowerShell is mandatory
- Know-how on the use of GitHub
Our Client is an IT infrastructure services company, focused and specialized in delivering solutions and services on Microsoft products and technologies. They are a Microsoft partner and cloud solution provider. Our Client's objective is to help small, mid-sized as well as global enterprises to transform their business by using innovation in IT, adapting to the latest technologies and using IT as an enabler for business to meet business goals and continuous growth.
With focused and experienced management and a strong team of IT Infrastructure professionals, they are adding value by making IT Infrastructure a robust, agile, secure and cost-effective service to the business. As an independent IT Infrastructure company, they provide their clients with unbiased advice on how to successfully implement and manage technology to complement their business requirements.
- Providing on-call support within a high availability production environment
- Logging issues
- Providing Complex problem analysis and resolution for technical and application issues
- Supporting and collaborating with team members
- Running system updates
- Monitoring and responding to system alerts
- Developing and running system health checks
- Applying industry standard practices across the technology estate
- Performing system reviews
- Reviewing and maintaining infrastructure configuration
- Diagnosing performance issues and network bottlenecks
- Collaborating within geographically distributed teams
- Supporting software development infrastructure by continuous integration and delivery standards
- Working closely with developers and QA teams as part of a customer support centre
- Projecting delivery work, either individually or in conjunction with other teams, external suppliers or contractors
- Ensuring maintenance of the technical environments to meet current standards
- Ensuring compliance with appropriate industry and security regulations
- Providing support to Development and Customer Support teams
- Managing the hosted infrastructure through vendor engagement
- Managing 3rd party software licensing ensuring compliance
- Delivering new technologies as agreed by the business
What you need to have:
- Experience working within a technical operations environment relevant to associated skills stated.
- Be proficient in:
  - Linux, zsh/bash/similar
  - ssh, tmux/screen/similar
  - vim/emacs/similar
  - Computer networking
- Have a reasonable working knowledge of:
  - Cloud infrastructure, preferably GCP
  - One or more programming/scripting languages
  - Git
  - Docker
  - Web services and web servers
  - Databases, relational and NoSQL
- Some familiarity with:
  - Puppet, Ansible
  - Terraform
  - GitHub, CircleCI, Kubernetes
  - Scripting language: Shell
  - Databases: Cassandra, Postgres, MySQL or CloudSQL
  - Agile working practices including Scrum and Kanban
  - Private & public cloud hosting environments
- Strong technology interests with a positive ‘can do’ attitude
- Be flexible and adaptable to changing priorities
- Be good at planning and organising their own time and able to meet targets and deadlines without supervision
- Excellent written and verbal communication skills.
- Approachable with both colleagues and team members
- Be resourceful and practical with an ability to respond positively and quickly to technical and business challenges
- Be persuasive, articulate and influential, but down to earth and friendly with own team and colleagues
- Have an ability to establish relationships quickly and to work effectively either as part of a team or singularly
- Be customer focused with both internal and external customers
- Be capable of remaining calm under pressure
- Technically minded with good problem resolution skills and systematic manner
- Excellent documentation skills
- Prepared to participate in out of hours support rota
A.P.T Portfolio is a high-frequency trading firm that specialises in Quantitative Trading & Investment Strategies. Founded in November 2009, it has been a major liquidity provider in global stock markets.
As a manager, you would be in charge of managing the DevOps team, and your remit shall include the following:
- Private Cloud - Design & maintain a high performance and reliable network architecture to support HPC applications
- Scheduling Tool - Implement and maintain an HPC scheduling technology like Kubernetes, Hadoop YARN, Mesos, HTCondor or Nomad for processing & scheduling analytical jobs. Implement controls which allow analytical jobs to seamlessly utilize ideal capacity on the private cloud.
- Security - Implementing best security practices and implementing data isolation policy between different divisions internally.
- Capacity Sizing - Monitor private cloud usage and share details with different teams. Plan capacity enhancements on a quarterly basis.
- Storage solution - Optimize storage solutions like NetApp, EMC, Quobyte for analytical jobs. Monitor their performance on a daily basis to identify issues early.
- NFS - Implement and optimize latest version of NFS for our use case.
- Public Cloud - Drive AWS/Google-Cloud utilization in the firm for increasing efficiency, improving collaboration and for reducing cost. Maintain the environment for our existing use cases. Further explore potential areas of using public cloud within the firm.
- Backups - Identify and automate backup of all crucial data/binary/code etc. in a secured manner at such intervals as warranted by the use case. Ensure that recovery from backup is tested and seamless.
- Access Control - Maintain passwordless access control and improve security over time. Minimize failures of automated jobs due to unsuccessful logins.
- Operating System - Plan, test and roll out new operating systems for all production, simulation and desktop environments. Work closely with developers to highlight the performance enhancement capabilities of new versions.
- Configuration management - Work closely with the DevOps/development team to freeze configurations/playbooks for various teams & internal applications. Deploy and maintain standard tools such as Ansible, Puppet, Chef etc. for the same.
- Data Storage & Security Planning - Maintain tight control of root access on various devices. Ensure root access is rolled back as soon as the desired objective is achieved.
- Audit access logs on devices. Use third party tools to put in a monitoring mechanism for early detection of any suspicious activity.
- Maintaining all third party tools used for development and collaboration - This shall include maintaining a fault tolerant environment for Git/Perforce, productivity tools such as Slack/Microsoft Teams, build tools like Jenkins/Bamboo etc.
Qualifications
- Bachelor's or Master's level degree, preferably in CSE/IT
- 10+ years of relevant experience in sys-admin function
- Must have strong knowledge of IT Infrastructure, Linux, Networking and grid.
- Must have strong grasp of automation & Data management tools.
- Proficient in scripting languages and Python
Desirables
- Professional attitude, co-operative and mature approach to work, must be focused, structured and well considered, troubleshooting skills.
- Exhibit a high level of individual initiative and ownership, effectively collaborate with other team members.
APT Portfolio is an equal opportunity employer
Exposure to development and implementation practices in a modern systems environment together with exposure to working in a project team particularly with reference to industry methodologies, e.g. Agile, continuous delivery, etc
- At least 3-5 years of experience building and maintaining AWS infrastructure (VPC, EC2, Security Groups, IAM, ECS, CodeDeploy, CloudFront, S3)
- Strong understanding of how to secure AWS environments and meet compliance requirements
- Experience using DevOps methodology and Infrastructure as Code
- Automation / CI/CD tools – Bitbucket Pipelines, Jenkins
- Infrastructure as code – Terraform, CloudFormation, etc.
- Strong experience deploying and managing infrastructure with Terraform
- Automated provisioning and configuration management – Ansible, Chef, Puppet
- Experience with Docker, GitHub, Jenkins, ELK and deploying applications on AWS
- Improve CI/CD processes, support software builds and CI/CD of the development departments
- Develop, maintain, and optimize automated deployment code for development, test, staging and production environments
- Proven experience in handling large infrastructure and distributed systems like Kafka, YARN, Elasticsearch, etc.
- Familiarity with Python-related technologies and frameworks like Django or Pyramid.
- Experience with Unix/Linux operating systems internals and administration (e.g. filesystems, inodes, system calls, etc) or networking (e.g. TCP/IP, routing, network topologies, and hardware, SDN, etc)
- Familiarity with at least one of the cloud computing infrastructures - GCP / Azure / AWS
- Familiarity with task queue frameworks like Celery or Pika is a plus.
- Source code management and Implementation of security best practices.
- Experienced in building monitoring/metrics & alerting tool (APM tool), a custom dashboard for each Application stack against the supported environment
- Good understanding & implementation experience using 12-factor App principles
- Awareness of Cloud Security concepts
- Awareness of Information Security concepts and Best Practices
Requirements:-
- Bachelor’s or Master’s degree in Computer Science, Engineering, Software Engineering or a relevant field.
- Strong experience with Windows/Linux-based infrastructures, Linux/Unix administration.
- Knowledge of Jira, Bitbucket, Jenkins, Xray, Ansible, Windows and .NET as core skills.
- Strong experience with databases such as SQL, MS SQL, MySQL, NoSQL.
- Knowledge of scripting languages such as Shell Scripting /Python/ PHP/Groovy, Bash.
- Experience with project management and workflow tools such as Agile, Jira / WorkFront etc.
- Experience with open-source technologies and cloud services.
- Experience in working with Puppet or Chef for automation and configuration.
- Strong communication skills and ability to explain protocol and processes with team and management.
- Experience in a DevOps Engineer role (or similar role)
- Experience in software development and infrastructure development is a plus
Job Specifications:-
- Building and maintaining tools, solutions and micro services associated with deployment and our operations platform, ensuring that all meet our customer service standards and reduce errors.
- Actively troubleshoot any issues that arise during testing and production, catching and solving issues before launch.
- Test our system integrity, implemented designs, application developments and other processes related to infrastructure, making improvements as needed
- Deploy product updates as required while implementing integrations when they arise.
- Automate our operational processes as needed, with accuracy and in compliance with our security requirements.
- Specifying, documenting and developing new product features, and writing automating scripts. Manage code deployments, fixes, updates and related processes.
- Work with open-source technologies as needed.
- Work with CI and CD tools, and source control such as Git and SVN.
- Lead the team through development and operations.
- Offer technical support where needed, developing software for our back-end systems.
Senior DevOps Engineer
Who are we?
Searce is a niche Cloud Consulting business with futuristic tech DNA. We do new-age tech to realise the “Next” in the “Now” for our Clients. We specialise in Cloud Data Engineering, AI/Machine Learning and Advanced Cloud infra tech such as Anthos and Kubernetes. We are one of the top & the fastest growing partners for Google Cloud and AWS globally with over 2,500 clients successfully moved to cloud.
What do we believe?
- Best practices are overrated
- Implementing best practices can only make one ‘average’.
- Honesty and Transparency
- We believe in naked truth. We do what we tell and tell what we do.
- Client Partnership
- Client - Vendor relationship: No. We partner with clients instead.
- And our sales team comprises 100% of our clients.
How do we work?
It’s all about being Happier first. And rest follows. Searce work culture is defined by HAPPIER.
- Humble: Happy people don’t carry ego around. We listen to understand; not to respond.
- Adaptable: We are comfortable with uncertainty. And we accept changes well. As that’s what life's about.
- Positive: We are super positive about work & life in general. We love to forget and forgive. We don’t hold grudges. We don’t have time or adequate space for it.
- Passionate: We are as passionate about the great street-food vendor across the street as about Tesla’s new model and so on. Passion is what drives us to work and makes us deliver the quality we deliver.
- Innovative: Innovate or Die. We love to challenge the status quo.
- Experimental: We encourage curiosity & making mistakes.
- Responsible: Driven. Self motivated. Self governing teams. We own it.
Are you the one? Quick self-discovery test:
- Love for cloud: When was the last time your dinner entailed an act on “How would ‘Jerry Seinfeld’ pitch Cloud platform & products to this prospect” and your friend did the ‘Sheldon’ version of the same thing.
- Passion for sales: When was the last time you went at a remote gas station while on vacation, and ended up helping the gas station owner saasify his 7 gas stations across other geographies.
- Compassion for customers: You listen more than you speak. When you do speak, people feel the need to listen.
- Humor for life: When was the last time you told a concerned CEO, ‘If Elon Musk can attempt to take humanity to Mars, why can’t we take your business to run on cloud?’
Introduction
When was the last time you thought about rebuilding your smart phone charger using solar panels on your backpack OR changed the sequencing of switches in your bedroom (on your own, of course) to make it more meaningful OR pointed out an engineering flaw in the sequencing of traffic signal lights to a fellow passenger, while he gave you a blank look? If the last time this happened was more than 6 months ago, you are a dinosaur for our needs. If it was less than 6 months ago, did you act on it? If yes, then let’s talk.
We are quite keen to meet you if:
- You eat, dream, sleep and play with Cloud Data Store & engineering your processes on cloud architecture
- You have an insatiable thirst for exploring improvements, optimizing processes, and motivating people.
- You like experimenting, taking risks and thinking big.
3 things this position is NOT about:
- This is NOT just a job; this is a passionate hobby for the right kind.
- This is NOT a boxed position. You will code, clean, test, build and recruit & energize.
- This is NOT a position for someone who likes to be told what needs to be done.
3 things this position IS about:
- Attention to detail matters.
- Roles, titles, ego do not matter; getting things done matters; getting things done quicker & better matters the most.
- Are you passionate about learning new domains & architecting solutions that could save a company millions of dollars?
Roles and Responsibilities
This is an entrepreneurial Cloud/DevOps Lead position that evolves to Director - Cloud Engineering. This position requires fanatic iterative improvement ability - architect a solution, code, research, understand customer needs, research more, rebuild and re-architect, you get the drift. We are seeking hard-core-geeks-turned-successful-techies who are interested in seeing their work used by millions of users the world over.
Responsibilities:
- Consistently strive to acquire new skills on Cloud, DevOps, Big Data, AI and ML technologies
- Design, deploy and maintain Cloud infrastructure for Clients – Domestic & International
- Develop tools and automation to make platform operations more efficient, reliable and reproducible
- Create Container Orchestration (Kubernetes, Docker), strive for full automated solutions, ensure the up-time and security of all cloud platform systems and infrastructure
- Stay up to date on relevant technologies, plug into user groups, and ensure our clients are using the best techniques and tools
- Providing business, application, and technology consulting in feasibility discussions with technology team members, customers and business partners
- Take initiatives to lead, drive and solve during challenging scenarios
Requirements:
- 3+ years of experience in Cloud Infrastructure and Operations domains
- Experience with Linux systems, RHEL/CentOS preferred
- Specialize in one or two cloud deployment platforms: AWS, GCP, Azure
- Hands on experience with AWS services (EC2, VPC, RDS, DynamoDB, Lambda)
- Experience with one or more programming languages (Python, JavaScript, Ruby, Java, .NET)
- Good understanding of Apache Web Server, Nginx, MySQL, MongoDB, Nagios
- Knowledge on Configuration Management tools such as Ansible, Terraform, Puppet, Chef
- Experience working with deployment and orchestration technologies (such as Docker, Kubernetes, Mesos)
- Deep experience in customer facing roles with a proven track record of effective verbal and written communications
- Dependable and good team player
- Desire to learn and work with new technologies
Key Success Factors
- Are you
- Likely to forget to eat, drink or pee when you are coding?
- Willing to learn, re-learn, research, break, fix, build, re-build and deliver awesome code to solve real business/consumer needs?
- An open source enthusiast?
- Absolutely technology agnostic and believe that business processes define and dictate which technology to use?
- Ability to think on your feet, and follow-up with multiple stakeholders to get things done
- Excellent interpersonal communication skills
- Superior project management and organizational skills
- Logical thought process; ability to grasp customer requirements rapidly and translate the same into technical as well as layperson terms
- Ability to anticipate potential problems, determine and implement solutions
- Energetic, disciplined, with a results-oriented approach
- Strong ethics and transparency in dealings with clients, vendors, colleagues and partners
- Attitude of ‘give me 5 sharp freshers and 6 months and I will rebuild the way people communicate over the internet.’
- You are customer-centric, and feel strongly about building scalable, secure, quality software. You thrive and succeed in delivering high quality technology products in a growth environment where priorities shift fast.
You will be responsible for
1. Setting up and maintaining cloud (AWS/GCP/Azure) and Kubernetes clusters and automating their operation
2. All operational aspects of the Devtron platform, including maintenance, upgrades, and automation.
3. Providing Kubernetes expertise to facilitate smooth and fast customer onboarding on the Devtron platform
Responsibilities:
1. Manage the Devtron platform on multiple Kubernetes clusters
2. Designing and embedding industry best practices for online services including disaster recovery, business continuity, monitoring/alerting, and service health measurement
3. Providing operational support for day to day activities involving the deployment of services
4. Identify opportunities for improving the security, reliability, and scalability of the platform
5. Facilitate smooth and fast customer onboarding on the Devtron platform
6. Drive customer engagement
Requirements:
● Bachelor's Degree in Computer Science or a related field.
● 2+ years working as a DevOps engineer
● Proficient in 1 or more programming languages (e.g. Python, Go, Ruby).
● Familiar with shell scripts, Linux commands, network fundamentals
● Understanding of large scale distributed systems
● Basic understanding of cloud computing (AWS/GCP/Azure)
Preferred Qualifications:
● Great analytical and interpersonal skills
● Passion for creating efficient, reliable, reusable programs/scripts.
● Excited about technology, have a strong interest in learning about and playing with the latest technologies and doing POC.
● Strong customer focus, ownership, urgency and drive.
● Knowledge and experience with cloud native tools like Prometheus, Kubernetes, Docker, Grafana.