About the Role
We are seeking an experienced Oracle Cloud Infrastructure (OCI) L3 / Technical Lead to own the reliability, performance, security, and cost efficiency of our OCI workloads. You will serve as the highest technical escalation point for OCI operations, lead architecture and automation initiatives, mentor engineers, and collaborate cross-functionally to ensure resilient, compliant, and scalable solutions. This role combines hands-on engineering with leadership responsibilities across incident management, platform engineering, and cloud governance.
________________________________________
Key Responsibilities
1) L3 Operations & Escalation Management
• Act as the final technical escalation point for critical incidents, complex problems, and performance issues across OCI services (Compute, Networking, Storage, IAM, Load Balancer, WAF, OKE/Kubernetes, DBaaS/Autonomous DB, Exadata Cloud Service).
• Lead root cause analysis (RCA), produce corrective/preventive action plans, and drive problem management per ITIL.
• Own on-call rotations for priority incidents; coordinate across L2/L1 teams and vendors for swift resolution.
2) Architecture, Design & Governance
• Design and review high-availability and disaster recovery (HA/DR) architectures leveraging OCI regions, Availability Domains (ADs), Fault Domains, Backup/Archive Storage, Data Guard (for Oracle DB), and multi-cloud patterns as needed.
• Define landing zone architectures, tenancy/subscription structure, compartment strategy, IAM policies, tagging, and cost governance.
• Establish standards for network segmentation (VCNs, subnets), routing, VPN/FastConnect, NSGs/Security Lists, and WAF.
3) Observability, Performance & Reliability
• Implement and optimize Monitoring, Logging, Alarms, APM, Tracing, and Log Analytics in OCI.
• Develop capacity plans and performance baselines; drive performance tuning of compute, networking, databases, and storage.
4) Security, Compliance & Risk
• Enforce OCI security best practices: IAM least privilege, vaults/keys, secrets management, Cloud Guard, vulnerability scanning, CIS benchmarks, and Security Zones.
• Partner with GRC teams on audit readiness, regulatory compliance (e.g., ISO 27001, SOC 2, PCI DSS), data residency, and incident response tabletop exercises.
• Drive patching baselines, image hardening, and secure configuration drift detection.
5) Cost Management & FinOps
• Implement tagging, budgets, usage reports, and cost policies; recommend rightsizing, storage tiers, autoscaling, and reservations/committed use discounts.
• Run monthly cost reviews and produce optimization recommendations; integrate with FinOps dashboards/tools.
6) Migration & Modernization
• Lead migrations into OCI (re-host, re-platform, re-architect) for workloads including Oracle Databases, app servers, microservices, and data pipelines.
• Guide adoption of managed services (Autonomous DB, OKE, Streaming, Functions, API Gateway, Data Integration) and container strategies.
7) Stakeholder Leadership & Mentoring
• Serve as technical lead for cross-functional projects; translate business needs into robust cloud designs.
• Mentor L1/L2 engineers; deliver runbooks, playbooks, and capability uplift training.
• Collaborate with DBAs, App Owners, Security, Network, DevOps, and Product teams for end-to-end outcomes.
________________________________________
Required Qualifications
• 8–12+ years in cloud/infra engineering; 4+ years hands-on with OCI at scale.
• Deep expertise across OCI core services: Compute, VCN/Networking, Block/Object/Archive Storage, Load Balancer, WAF, IAM, Cloud Guard, Logging/Monitoring, OKE/Kubernetes, Autonomous DB/Exadata Cloud.
• Strong scripting skills (Python/Bash/PowerShell).
• Solid understanding of ITIL and incident/problem/change processes.
• Proven experience with HA/DR architectures, performance tuning, and cost optimization.
• Hands-on with security hardening, compliance frameworks, and audit support.
________________________________________
Preferred Certifications (Nice to Have)
• OCI Architect Professional
• OCI Cloud Operations Associate / OCI Security Professional
• Oracle Autonomous Database / Exadata Cloud certifications
• CKA/CKAD (Kubernetes), Terraform Associate
• ITIL v4 Foundation/Managing Professional
________________________________________
Technical Stack (Representative)
• Cloud: OCI (Tenancy, Compartments, IAM, Policies, Tags, Budgets)
• Compute/Containers: Compute instances, OKE, OCI Registry, Functions
• Networking: VCN, Subnets, DRG, NAT, Service Gateway, VPN, FastConnect, NSG, WAF, Load Balancer
• Storage/DB: Block/Object/Archive, File Storage, Autonomous DB, Exadata Cloud Service, Data Guard
• Observability: OCI Monitoring, Alarms, Logging, Log Analytics, APM
• Security: Cloud Guard, Security Zones, Vault, KMS, IAM, Policies
• Automation: Terraform, Ansible, Python/Bash/PowerShell, OCI CLI/SDK
• ITSM: Remedy/ServiceNow/Jira (incidents, changes, CMDB), Confluence/Wiki
________________________________________
Soft Skills & Attributes
• Systems thinking; strong analytical and troubleshooting skills.
• Clear communication (can articulate trade-offs and risk).
• Ownership mindset; calm under pressure during Major Incidents.
• Collaborative leadership and mentorship; ability to influence without authority.
________________________________________
Typical Day / Week
• Morning: Review alarms, dashboards, capacity & cost trends; act on exceptions.
• Daytime: Lead solution designs, review action plans, mentor engineers, handle L3 escalations.
• Weekly: Architecture council, change advisory board (CAB), cost/security review, RCA review.
• Monthly/Quarterly: DR drills, failure mode analysis, audit support, roadmap updates.
________________________________________
Education
• Bachelor’s/Master’s degree in Computer Science or Information Technology, or equivalent experience.