
Senior MLOps Engineer
LLM Operations, Observability & Eval Infrastructure
Mumbai (On-site) | Full-time | 5-7 years
About the Role:
Unico Connect is an AI-first technology partner that builds custom mobile, web, and AI products for clients across multiple geographies.
We are hiring a Senior MLOps Engineer for a dedicated client engagement focused on building an AI-powered application builder platform. The platform consumes LLMs at scale through provider APIs.
This role owns the operational discipline around production LLM consumption - increasingly called LLMOps - covering observability, evaluation infrastructure, model lifecycle, cost operations, prompt deployment, and agent run reliability.
The mandatory requirement is hands-on production experience operating LLM-backed systems, with a strong DevOps or SRE foundation. This is not a model training or ML science role.
The work is making the system around the AI engineer's designs observable, controlled, reliable, and economically accountable. You will pair daily with the Senior AI Engineer, who designs prompts, evals, and agent behaviour - you operationalise those systems for production.
A typical week includes a tracing audit on a degraded agent run, an eval pipeline build for a new model release, a cost attribution review, and a staged prompt rollout.
Responsibilities:
Observability and Tracing
Build and own end-to-end tracing for agent runs: every prompt, response, tool call, token count, latency, and cost, linked to user session and project.
Stand up and operate LLM observability tooling (Langfuse, LangSmith, Braintrust, or Arize Phoenix).
Make debugging a single bad agent run among thousands a routine workflow through searchable traces, failure taxonomies, and dashboards segmented by task type.
Evaluation Infrastructure as a Production System
Operationalise the eval suite designed by the Senior AI Engineer: automated execution in CI on every prompt or model change, with results stored and trended over time.
Implement regression gates that block quality-degrading changes from shipping.
Build production sampling to continuously score a sample of real agent runs and catch quality drift that offline evals miss.
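As an illustration of the regression gate described above, here is a minimal Python sketch. The suite names, the score scale (0 to 1), and the `tolerance` default are assumptions made for the example, not details of the platform.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    passed: bool
    regressions: dict[str, float]  # suite name -> score drop vs. baseline


def regression_gate(baseline: dict[str, float],
                    candidate: dict[str, float],
                    tolerance: float = 0.02) -> GateResult:
    """Fail if any baseline suite drops by more than `tolerance`.

    A suite missing from the candidate run is scored as 0.0, so a
    silently skipped eval cannot pass the gate.
    """
    regressions: dict[str, float] = {}
    for suite, base_score in baseline.items():
        drop = base_score - candidate.get(suite, 0.0)
        if drop > tolerance:
            regressions[suite] = round(drop, 4)
    return GateResult(passed=not regressions, regressions=regressions)
```

In CI, a job could call `regression_gate` with the stored baseline scores and the scores from the current run, then exit non-zero when `passed` is false so the change cannot merge.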
Model Lifecycle Management
Pin model versions, never "latest".
Own the upgrade process: run the eval suite against new model releases and manage eval-gated migrations.
Maintain fallback chains across providers for graceful degradation or queueing during outages.
Track provider deprecation schedules and plan migrations ahead of forced cutoffs.
Cost Operations
Implement per-user and per-task cost attribution - token spend is the platform's largest variable cost and requires the same rigour as cloud cost management.
Set up budget alerts and anomaly detection so a single user or bug cannot burn significant spend overnight.
Monitor prompt cache hit rates and quantify savings.
Manage capacity planning around provider rate limits, including quota negotiation and throughput tiering.
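Per-user and per-task cost attribution with a budget check can be sketched roughly as follows. The `Usage` record, the prices in `PRICE_PER_MTOK`, and the budget figure are all hypothetical; real prices depend on the provider and model.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    user_id: str
    task_type: str
    input_tokens: int
    output_tokens: int


# Hypothetical prices in USD per million tokens (not real provider pricing).
PRICE_PER_MTOK = {"input": 3.0, "output": 15.0}


def cost_usd(u: Usage) -> float:
    """Cost of one call under the hypothetical price table."""
    return (u.input_tokens * PRICE_PER_MTOK["input"]
            + u.output_tokens * PRICE_PER_MTOK["output"]) / 1_000_000


def attribute(usages: list[Usage]) -> dict[tuple[str, str], float]:
    """Sum spend per (user_id, task_type) pair."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for u in usages:
        totals[(u.user_id, u.task_type)] += cost_usd(u)
    return dict(totals)


def users_over_budget(usages: list[Usage], budget_usd: float) -> list[str]:
    """Return user ids whose total spend exceeds the budget, sorted."""
    per_user: dict[str, float] = defaultdict(float)
    for u in usages:
        per_user[u.user_id] += cost_usd(u)
    return sorted(uid for uid, spend in per_user.items() if spend > budget_usd)
```

A fixed threshold like this is the simplest form of alerting; anomaly detection would compare each user's spend against their own recent history instead.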
Prompt and Configuration Deployment
Treat prompts as production artifacts: version control for prompts and agent configurations, staged rollout infrastructure (deploy a prompt change to a percentage of traffic before full rollout), A/B testing infrastructure, instant rollback, and audit history covering which prompt version served which user and when.
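One common way to deploy a prompt change to a percentage of traffic is deterministic hash bucketing, sketched below. The function names and the `experiment` key are illustrative assumptions; the posting does not specify how the platform routes traffic.

```python
import hashlib


def rollout_bucket(user_id: str, experiment: str) -> int:
    """Map a user to a stable bucket in [0, 100) for a given rollout."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def select_prompt_version(user_id: str, experiment: str,
                          stable: str, candidate: str, percent: int) -> str:
    """Serve `candidate` to roughly `percent`% of users, `stable` to the rest."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if rollout_bucket(user_id, experiment) < percent:
        return candidate
    return stable
```

Because the bucket depends only on the user and the experiment name, a user keeps the same version across requests, raising `percent` only ever adds users to the candidate group, and setting `percent` to 0 acts as an instant rollback. Logging the returned version with each request gives the audit history of which prompt served which user.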
Reliability Engineering for Agent Runs
Agent runs are long, stateful, and failure-prone.
Own retry and resume semantics so a run that fails mid-way does not restart from scratch.
Implement timeouts and circuit breakers on provider calls, dead-letter handling for failed runs, and queue and concurrency management for agent workloads.
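A circuit breaker around provider calls might look like the following minimal sketch. The class name, thresholds, and injectable `clock` are assumptions for illustration; it is single-threaded and omits the per-call timeout, which would normally be set on the HTTP client.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_s:
                raise CircuitOpenError("provider circuit is open")
            # Cool-down elapsed: let one trial call through (half-open).
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the breaker and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

While the breaker is open, the caller can route to a fallback provider or enqueue the run instead of waiting on a provider that is already failing.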
SLO Ownership and Incident Response
Define and track SLOs for agent run latency and completion rates.
Lead incident response when SLOs are breached.
Write postmortems.
Surface reliability risks proactively before they reach users.
Safety and Compliance Operations
Run the moderation pipeline (prompt and output classification) in production.
Monitor for abuse patterns and own incident response when the agent misbehaves at scale.
Maintain audit logs and implement data retention and residency policies for prompts and generated code as enterprise requirements emerge.
AI-Assisted Engineering Discipline
Use Claude, Cursor, and similar tools day to day for infrastructure code, scripts, and pipelines.
Set the team standard for safe use, review, and validation of AI-generated infrastructure before it ships.
Requirements:
Hands-on production ownership of LLM-backed systems in operation (mandatory).
Must have personally shipped and operated at least one LLM-powered system in production, with operational responsibility including oncall, incident response, and reliability ownership.
Alternatively: strong DevOps or SRE background with demonstrated hands-on familiarity with LLMOps tooling (Langfuse, LangSmith, Braintrust, Arize, or equivalent).
POCs and lab work do not qualify.
5+ years of overall engineering experience
With at least 2 years in DevOps, SRE, platform engineering, or LLM operations roles.
This is not an ML science role.
A DevOps or SRE background with a substantive pivot into LLMOps is a strong qualification.
Observability and Tracing Depth
Production experience with LLM observability tooling - Langfuse, LangSmith, Braintrust, or Arize Phoenix.
Comfortable instrumenting with OpenTelemetry, Prometheus, and Grafana.
Able to build and search trace pipelines, define failure taxonomies, and surface quality signals from production traffic.
CI/CD and Quality Gate Experience
Strong with GitHub Actions or GitLab CI.
Experience building automated quality gates: eval-gated pipelines, regression enforcement, or coverage gates that block degrading changes from shipping.
Cost Management and Attribution for Usage-Based Services
Experience owning cost attribution for cloud API spend or equivalent.
Comfortable with budget alerts, anomaly detection, and per-user or per-task cost breakdowns.
Reliability Engineering for Long-Running, Stateful Workloads
Experience with queues, retry patterns, idempotency, and failure recovery on asynchronous or multi-step workloads.
Comfortable defining SLOs and being accountable for them on production systems.
Multi-Provider API Management
Familiarity with LLM provider rate limits, version pinning, fallback chains, and quota management across OpenAI, Anthropic, Google, or equivalent.
Infrastructure as Code and Deployment Automation
Hands-on with Terraform or Pulumi and Docker.
AWS working knowledge (EC2, S3, IAM, EKS or ECS).
Strong with CI/CD for deploying services and configuration changes safely.
Nice to Have
- Experience with prompt A/B testing or staged rollout infrastructure
- Workflow orchestration (BullMQ, Temporal, Celery)
- Content moderation pipeline experience
- Data residency and compliance requirements for AI systems
- Kubernetes (EKS) in production
- AWS certifications

About Unico Connect Private Limited
Building quality products is a challenge!
Taking up challenges is our way of upscaling our performance.
Unico Connect is a digital product development company based in Mumbai, India, made up of a team of young, enthusiastic nerds who thrive on great ideas and exciting projects that aim to bring innovative change to the world. We ideate, create, and execute exceptional digital products that revolutionize the face of modern business.
Similar jobs

Job Description: Azure Integration Specialist (BizTalk to Azure Integration Services)
Experience: 4+ Years
Location: Hybrid / Remote
Employment Type: Full-Time
Role Overview
We are seeking an experienced Azure Integration Specialist with 4+ years of experience in enterprise application integration, including hands-on expertise in Microsoft BizTalk Server and Azure Integration Services (AIS). The ideal candidate will be responsible for designing, developing, supporting, and migrating integration solutions from BizTalk Server to Azure-based integration platforms.
Key Responsibilities
* Design, develop, and maintain enterprise integration solutions using Azure Integration Services.
* Participate in migration projects from Microsoft BizTalk Server to Azure Integration Services.
* Develop and support Azure Logic Apps, Service Bus, API Management, Azure Functions, and Event Grid solutions.
* Analyze existing BizTalk applications, orchestrations, maps, pipelines, and adapters for migration and modernization.
* Build and consume REST and SOAP APIs and integrate with third-party applications.
* Collaborate with business stakeholders, architects, and development teams to gather integration requirements.
* Troubleshoot and resolve integration issues in production and non-production environments.
* Implement monitoring, logging, and alerting for integration solutions.
* Ensure security, scalability, performance, and compliance requirements are met.
* Prepare technical documentation, design specifications, and deployment procedures.
Required Skills
* 4+ years of experience in enterprise application integration.
* Strong experience with Microsoft BizTalk Server administration and development.
* Hands-on experience with Azure Integration Services:
  * Azure Logic Apps
  * Azure Service Bus
  * Azure API Management
  * Azure Functions
  * Event Grid
* Experience with XML, XSLT, JSON, and data transformation.
* Strong understanding of REST, SOAP, Web Services, and API integrations.
* Experience with Azure DevOps, CI/CD pipelines, and source control tools.
* Knowledge of integration patterns and messaging concepts.
* Good troubleshooting and analytical skills.
Preferred Skills
* Experience with BizTalk to Azure migration projects.
* Knowledge of Azure Monitoring, Application Insights, and Log Analytics.
* Experience with hybrid integration scenarios.
* Azure certifications are preferred.
Qualifications
* Bachelor's degree in Computer Science, Information Technology, or a related field.
* Strong communication and stakeholder management skills.
* Ability to work independently and in a collaborative team environment.
Job Description
Position Title: Senior System Engineer
Position Type: Full Time
Department: RSG
Reports to: First Level Manager, Indian Development Centre
Company Background:
Cglia is a software development company building highly available, highly secure, cloud-based enterprise software products that help speed the research process, resulting in new drugs, new devices, and new treatments to improve the health and wellbeing of the world's population.
At Cglia, our work shows our dedication and passion for innovative, quality software products that are intuitive, easy to use, and exceed every aspect of customer expectations.
Cglia is the place that develops world-class professionals who want to be innovative and creative, learn continuously, and build a solid foundation for creating products that are special and delight the customer.
Job Description:
The Senior System Engineer will have expertise in managing both Linux and Windows environments, along with hands-on experience in containerization technologies such as Kubernetes and Docker. Proficiency in Ansible for automation and configuration management is essential. This role is critical in ensuring the seamless operation, deployment, and maintenance of our IT infrastructure.
The ideal candidate will oversee and participate in the installation, monitoring, maintenance, support, optimization, and documentation of all network hardware and software. This includes managing multiple projects, planning network technology roadmaps, and configuring and optimizing network services, both internal and those integrated with Internet-based services.
Job Responsibilities:
· Manage, maintain, and monitor Linux and Windows servers to ensure high availability and performance.
· Perform system upgrades, patches, and performance tuning for both operating systems and DBA servers.
· Deploy, manage, and troubleshoot containerized applications using Kubernetes and Docker.
· Design and implement Kubernetes clusters to ensure scalability, security, and reliability.
· Develop and maintain Ansible playbooks for automation of repetitive tasks, configuration management, and system provisioning.
· Implement security best practices for both Linux and Windows environments.
· Set up and manage backup and disaster recovery solutions for critical systems and data.
· Work closely with development teams to support CI/CD pipelines and troubleshoot application issues.
· Manage VMware in a high-availability environment with disaster recovery.
· Good experience in RAID and firewalls.
· Maintain and manage SQL database server support.
· Experience with scripting languages: Unix shell, Bash, or PowerShell.
· Assist Quality Assurance with testing program changes, new releases, or user documentation, and support new product release activities that include testing customer flows.
· Must be able to work a flexible schedule and participate in an on-call rotation that includes different shift timings, weekends, and holidays.
· Work across multiple time zones with remote team members.
· Perform other duties as deemed necessary to provide quality service to the clients.
Experience and Skills Required:
· 4+ years of experience in Linux and Windows administration
· 3 years of experience with VMware in a high-availability environment with disaster recovery
· Good experience in RAID and firewalls
· 2+ years of experience in SQL database server support
· Ability to quickly acquire in-depth knowledge of multiple custom applications
· Experience in setting up IT policies based on best practices and monitoring them
· Experience in shell scripting and automating tasks
· Experience with hardware and software monitoring tools
· Experience in administration and best practices for Apache and Tomcat
· Experience in handling Cisco router and firewall configuration and management
· Working knowledge of SQL Server, Oracle, and other RDBMS databases
· Must be proactive and possess strong interpersonal, communication, and organization skills
· Must possess excellent written and verbal presentation skills
· Must be self-motivated
· Certification in Linux/Windows administration is preferable.
Academics:
· Bachelor's / Master's degree (or equivalent) in computer science or related field or equivalent experience.
Location: Bangalore, India
Experience: 3 Years
Company: Tradelab Technologies
About Tradelab Technologies:
Tradelab Technologies is a leading fintech solutions provider building high-performance trading platforms, brokerage infrastructure, and financial technology products. Our systems handle real-time market data, order management, and analytics for clients across the trading ecosystem.
Role Overview:
We are looking for a skilled DevOps Engineer to manage, optimize, and scale our trading infrastructure. The ideal candidate should have strong experience with CI/CD pipelines, cloud infrastructure, containerization, and system automation, with an emphasis on reliability and performance in production environments.
Key Responsibilities:
- Design, implement, and maintain CI/CD pipelines for automated deployment and monitoring.
- Manage and scale cloud infrastructure (AWS, GCP, or Azure) for high-availability trading systems.
- Work closely with development and QA teams to ensure smooth integration and release processes.
- Automate provisioning, configuration, and monitoring using tools like Ansible, Terraform, or similar.
- Implement logging, alerting, and monitoring systems for proactive issue detection.
- Ensure system reliability, security, and performance in production environments.
- Manage version control and containerized environments (Git, Docker, Kubernetes).
- Troubleshoot infrastructure issues and optimize deployment performance.
Required Skills & Qualifications:
- Bachelor's degree in Computer Science, Engineering, or equivalent.
- Minimum 3 years of experience in DevOps, SRE, or Infrastructure Engineering roles.
- Strong hands-on experience with AWS / GCP / Azure.
- Proficiency in CI/CD tools like Jenkins, GitLab CI, or GitHub Actions.
- Expertise in Docker, Kubernetes, and container orchestration.
- Experience with infrastructure-as-code tools like Terraform, Ansible, or CloudFormation.
- Proficient with Linux administration, shell scripting, and Python or Go for automation.
- Knowledge of monitoring tools like Prometheus, Grafana, ELK Stack, or Datadog.
- Familiarity with networking, security, and load balancing concepts.
Nice-to-Have Skills:
- Experience working with trading or low-latency systems.
- Knowledge of message queues (Kafka, RabbitMQ).
- Exposure to microservices architecture and API management.
- Experience with incident management and disaster recovery planning.
Why Join Tradelab Technologies:
- Be part of a fast-paced fintech environment working on scalable trading infrastructure.
- Collaborate with talented teams solving real-world financial technology challenges.
- Competitive pay, flexible work culture, and opportunities for growth.
Please Apply - https://zrec.in/RZ7zE?source=CareerSite
About Us
Infra360 Solutions is a services company specializing in Cloud, DevSecOps, Security, and Observability solutions. We help technology companies adopt a DevOps culture by focusing on a long-term DevOps roadmap. We identify the technical and cultural issues that stand in the way of successfully implementing DevOps practices and work with the respective teams to fix them and increase overall productivity. We also run training sessions for developers on the importance of DevOps.
We provide these services: DevOps, DevSecOps, FinOps, Cost Optimizations, CI/CD, Observability, Cloud Security, Containerization, Cloud Migration, Site Reliability, Performance Optimizations, SIEM and SecOps, Serverless automation, Well-Architected Review, MLOps, Governance, Risk & Compliance.
We assess the technology architecture, security, governance, compliance, and DevOps maturity of technology companies and help them optimize their cloud cost, streamline their technology architecture, and set up processes to improve the availability and reliability of their websites and applications. We set up tools for monitoring, logging, and observability. We focus on bringing the DevOps culture to the organization to improve its efficiency and delivery.
Job Description
Job Title: DevOps Engineer GCP
Department: Technology
Location: Gurgaon
Work Mode: On-site
Working Hours: 10 AM - 7 PM
Terms: Permanent
Experience: 2-4 years
Education: B.Tech/MCA/BCA
Notice Period: Immediately
Infra360.io is searching for a DevOps Engineer to lead our group of IT specialists in maintaining and improving our software infrastructure. You'll collaborate with software engineers, QA engineers, and other IT pros in deploying, automating, and managing the software infrastructure. As a DevOps engineer you will also be responsible for setting up CI/CD pipelines, monitoring programs, and cloud infrastructure.
Below is a detailed description of the responsibilities and expectations for the role.
Tech Stack :
- Kubernetes: Deep understanding of Kubernetes clusters, container orchestration, and its architecture.
- Terraform: Extensive hands-on experience with Infrastructure as Code (IaC) using Terraform for managing cloud resources.
- ArgoCD: Experience in continuous deployment and using ArgoCD to maintain GitOps workflows.
- Helm: Expertise in Helm for managing Kubernetes applications.
- Cloud Platforms: Expertise in GCP, AWS or Azure will be an added advantage.
- Debugging and Troubleshooting: The DevOps Engineer must be proficient in identifying and resolving complex issues in a distributed environment, ranging from networking issues to misconfigurations in infrastructure or application components.
Key Responsibilities:
- CI/CD and configuration management
- Doing RCA of production issues and providing resolution
- Setting up failover, DR, backups, logging, monitoring, and alerting
- Containerizing different applications on the Kubernetes platform
- Capacity planning for the infrastructure of different environments
- Ensuring zero outages of critical services
- Database administration of SQL and NoSQL databases
- Infrastructure as Code (IaC)
- Keeping the cost of the infrastructure to the minimum
- Setting up the right set of security measures
Ideal Candidate Profile:
- A graduation/post-graduation degree in Computer Science and related fields
- 2-4 years of strong DevOps experience with the Linux environment.
- Strong interest in working in our tech stack
- Excellent communication skills
- Able to work with minimal supervision as a self-starter
- Hands-on experience with at least one of the scripting languages - Bash, Python, Go etc
- Experience with version control systems like Git
- Strong experience of GCP.
- Strong experience with managing the Production Systems day in and day out
- Experience in finding issues in different layers of architecture in production environment and fixing them
- Knowledge of SQL and NoSQL databases, ElasticSearch, Solr etc.
- Knowledge of Networking, Firewalls, load balancers, Nginx, Apache etc.
- Experience in automation tools like Ansible/SaltStack and Jenkins
- Experience in Docker/Kubernetes platform and managing OpenStack (desirable)
- Experience with HashiCorp tools such as Vault, Vagrant, Terraform, and Consul, and with VirtualBox (desirable)
- Experience with managing/mentoring small team of 2-3 people (desirable)
- Experience in Monitoring tools like Prometheus/Grafana/Elastic APM.
- Experience in logging tools like ELK/Loki.
Senior DevOps Engineer
Experience: Minimum 5 years of relevant experience
Key Responsibilities:
 ⢠Hands-on experience with AWS tools and CI/CD pipelines, Redhat Linux
 ⢠Strong expertise in DevOps practices and principles
 ⢠Experience with infrastructure automation and configuration management
 ⢠Excellent problem-solving skills and attention to detail
Nice to Have:
 ⢠Redhat certification
Role Purpose:
As a DevOps engineer, you should be strong in both the Dev and Ops parts of DevOps. We are looking for someone who has a deep understanding of systems architecture, understands core CS concepts well, and is able to reason about system behaviour rather than merely working with the toolset of the day. We believe that only such a person will be able to set a compelling direction for the team and excite those around them.
If you are someone who fits the description above, you will find that the rewards are well worth the high bar. Being one of the early hires of the Bangalore office, you will have a significant impact on the culture and the team; you will work with a set of energetic and hungry peers who will challenge you, and you will have considerable international exposure and opportunity for impact across departments.
Responsibilities
- Deployment, management, and administration of web services in a public cloud environment
- Design and develop solutions for deploying highly secure, highly available, performant and scalable services in elastically provisioned environments
- Design and develop continuous integration and continuous deployment solutions from development through production
- Own all operational aspects of running web services including automation, monitoring and alerting, reliability and performance
- Have direct impact on running a business by thinking about innovative solutions to operational problems
- Drive solutions and communication for production impacting incidents
- Running technical projects and being responsible for project-level deliveries
- Partner well with engineering and business teams across continents
Required Qualifications
- Bachelor's or advanced degree in Computer Science or a closely related field
- 4-6 years of professional experience in DevOps, with at least 1-2 years in Linux / Unix
- Very strong in core CS concepts around operating systems, networks, and systems architecture including web services
- Strong scripting experience in Python and Bash
- Deep experience administering, running and deploying AWS based services
- Solid experience with Terraform, Packer and Docker or their equivalents
- Knowledge of security protocols and certificate infrastructure.
- Strong debugging, troubleshooting, and problem-solving skills
- Broad experience with cloud hosted applications including virtualization platforms, relational and non relational data stores, reverse proxies, and orchestration platforms
- Curiosity, continuous learning and drive to continually raise the bar
- Strong partnering and communication skills
Preferred Qualifications
- Past experience as a senior developer or application architect strongly preferred.
- Experience building continuous integration and continuous deployment pipelines
- Experience with Zookeeper, Consul, HAProxy, ELK-Stack, Kafka, PostgreSQL.
- Experience working with, and preferably designing, a system compliant with any security framework (PCI DSS, ISO 27000, HIPAA, SOC 2, ...)
- Experience with AWS orchestration services such as ECS and EKS.
- Experience working with AWS ML pipeline services like AWS Sagemak
Cloud Software Engineer
Notice Period: 45 days / Immediate Joining
Banyan Data Services (BDS) is a US-based infrastructure services company, headquartered in San Jose, California, USA. It provides full-stack managed services to support business applications and data infrastructure. We provide data solutions and services on bare metal, on-prem, and all cloud platforms. Our engagement service is built on standard DevOps practice and the SRE model.
We offer you an opportunity to join our rocket ship startup, run by a world-class executive team. We are looking for candidates who aspire to be part of the cutting-edge solutions and services we offer, which address next-gen data evolution challenges, and who are willing to use their experience in areas directly related to Infrastructure Services, Software as a Service, and Cloud Services to create a niche in the market.
Roles and Responsibilities
· A wide variety of engineering projects including data visualization, web services, data engineering, web portals, SDKs, and integrations in numerous languages, frameworks, and cloud platforms
· Apply continuous delivery practices to deliver high-quality software and value as early as possible.
· Work in collaborative teams to build new experiences
· Participate in the entire cycle of software consulting and delivery from ideation to deployment
· Integrating multiple software products across cloud and hybrid environments
· Developing processes and procedures for software applications migration to the cloud, as well as managed services in the cloud
· Migrating existing on-premises software applications to cloud leveraging a structured method and best practices
Desired Candidate Profile: *** freshers can also apply ***
· 2+ years of experience with one or more development languages such as Java, Python, or Spark.
· 1+ year of experience with private/public/hybrid cloud model design, implementation, orchestration, and support.
· Certification or completed training in any one of the cloud environments such as AWS, GCP, Azure, Oracle Cloud, or DigitalOcean.
· Strong problem-solvers who are comfortable in unfamiliar situations and can view challenges from multiple perspectives
· Driven to develop technical skills for oneself and teammates
· Hands-on experience with cloud computing and/or traditional enterprise datacentre technologies, i.e., network, compute, storage, and virtualization.
· Possess at least one cloud-related certification from AWS, Azure, or equivalent
· Ability to write high-quality, well-tested code and comfort with object-oriented or functional programming patterns
· Past experience quickly learning new languages and frameworks
· Ability to work with a high degree of autonomy and self-direction
http://www.banyandata.com
- Cloud and virtualization-based technologies (Amazon Web Services (AWS), VMWare).
- Java Application Server Administration (WebLogic, WildFly, JBoss, Tomcat).
- Docker and Kubernetes (EKS)
- Linux/UNIX Administration (Amazon Linux and Red Hat).
- Developing and supporting cloud infrastructure designs and implementations and guiding application development teams.
- Configuration management tools (Chef, Puppet, or Ansible).
- Log aggregation tools such as Elastic and/or Splunk.
- Automate infrastructure and application deployment-related tasks using Terraform.
- Automate repetitive tasks required to maintain a secure and up-to-date operational environment.
Responsibilities
- Build and support always-available private/public cloud-based software-as-a-service (SaaS) applications.
- Build AWS or other public cloud infrastructure using Terraform.
- Deploy and manage Kubernetes (EKS) based Docker applications in AWS.
- Create custom OS images using Packer.
- Create and revise infrastructure and architectural designs and implementation plans and guide the implementation with operations.
- Liaise between application development, infrastructure support, and tools (IT Services) teams.
- Develop and document Chef recipes and/or Ansible scripts. Provide support throughout the entire deployment lifecycle (development, quality assurance, and production).
- Help developers leverage infrastructure, application, and cloud platform features and functionality, participate in code and design reviews, and support developers by building CI/CD pipelines using Bamboo, Jenkins, or Spinnaker.
- Create knowledge-sharing presentations and documentation to help developers and operations teams understand and leverage the system's capabilities.
- Learn on the job and explore new technologies with little supervision.
- Leverage scripting (BASH, Perl, Ruby, Python) to build required automation and tools on an ad-hoc basis.
Who we have in mind:
- Solid experience in building a solution on AWS or other public cloud services using Terraform.
- Excellent problem-solving skills with a desire to take on responsibility.
- Extensive knowledge in containerized application and deployment in Kubernetes
- Extensive knowledge of the Linux operating system, RHEL preferred.
- Proficiency with shell scripting.
- Experience with Java application servers.
- Experience with Git and Subversion.
- Excellent written and verbal communication skills with the ability to communicate technical issues to non-technical and technical audiences.
- Experience working in a large-scale operational environment.
- Internet and operating system security fundamentals.
- Extensive knowledge of massively scalable systems. Linux operating system/application development desirable.
- Programming in scripting languages such as Python. Other object-oriented languages (C++, Java) are a plus.
- Experience with configuration management automation tools (Chef or Puppet).
- Experience with virtualization, preferably on multiple hypervisors.
- BS/MS in Computer Science or equivalent experience.
- Excellent written and verbal skills.
Education or Equivalent Experience:
- Bachelor's degree or equivalent education in related fields
- Certificates of training in associated fields/equipment

Requirements and Qualifications
- Bachelor's degree in Computer Science Engineering or in a related field
- 4+ years of experience
- Excellent analytical and problem-solving skills
- Strong knowledge of Linux systems and internals
- Programming experience in Python/Shell scripting
- Strong AWS skills with knowledge of EC2, VPC, S3, RDS, Cloudfront, Route53, etc
- Experience in containerization (Docker) and container orchestration (Kubernetes)
- Experience in DevOps & CI/CD tools such as Git, Jenkins, Terraform, Helm
- Experience with SQL & NoSQL databases such as MySQL, MongoDB, and Elasticsearch
- Debugging and troubleshooting skills using tools such as strace, tcpdump, etc
- Good understanding of networking protocols and security concerns (VPN, VPC, IG, NAT, AZ, Subnet)
- Experience with monitoring and data analysis tools such as Prometheus, EFK, etc
- Good communication & collaboration skills and attention to detail
- Participation in rotating on-call duties
