5+ Incident management Jobs in Pune | Incident management Job openings in Pune
Apply to 5+ Incident management Jobs in Pune on CutShort.io. Explore the latest Incident management Job opportunities across top companies like Google, Amazon & Adobe.
Lead Cloud Reliability Engineer
Job Responsibilities
● Lead and manage the Cloud Reliability teams to provide strong Managed Services support to end-customers.
● Isolate, troubleshoot and resolve issues reported by CMS clients in their cloud environment
● Drive communication with the customer, providing details about the issue, current steps, next plan of action, and ETA
● Gather clients' requirements related to the use of specific cloud services and provide assistance in setting them up and resolving issues
● Create SOPs and knowledge articles for use by the L1 teams to resolve common issues
● Identify recurring issues, perform root cause analysis and propose/implement preventive actions
● Follow change management procedure to identify, record and implement changes
● Plan and deploy OS and security patches in Windows/Linux environments and upgrade Kubernetes (k8s) clusters
● Identify recurring manual activities and contribute to automating them
● Provide technical guidance and educate team members on development and operations. Monitor metrics and identify ways to improve them.
● Troubleshoot systems and solve problems across platform and application domains, using a wide variety of open-source technologies and cloud services.
● Build, maintain, and monitor configuration standards.
● Ensure critical system security by using best-in-class cloud security solutions.
Qualifications
● 4–7 years of experience in the Cloud Infrastructure and Operations domains, plus IT operations experience, preferably in a global enterprise environment.
● Specialization in one or both of the following cloud platforms: AWS, GCP
● Hands-on experience with AWS/GCP services (EKS, ECS, EC2, VPC, RDS, Lambda, GKE, Compute Engine)
● Understanding of one or more programming languages (Python, JavaScript, Ruby, Java, .Net)
● Experience with logging and monitoring tools (ELK, Stackdriver, CloudWatch)
● Knowledge of configuration management tools such as Ansible, Terraform, Puppet, and Chef
● Experience working with deployment and orchestration technologies (such as Docker, Kubernetes, Mesos)
● Good analytical, communication, problem solving, and learning skills.
● Knowledge of programming against cloud platforms such as Google Cloud Platform, and of lean development methodologies.
● Strong service attitude and a commitment to quality.
● Willingness to work in shifts.
Job Summary: We are seeking a proactive, technically skilled Information Security (SOC) Engineer/Analyst to monitor, detect, and respond to cybersecurity threats in real time. The ideal candidate is analytical and detail-oriented, with a sound understanding of the threat landscape, SIEM tools, and incident response. They will also bring a strong foundational understanding of cybersecurity governance, robust technical skills in security operations, and a commitment to keeping up with the evolving threat landscape and internal security requirements.
Key Responsibilities
- Monitor security events and alerts from SIEM and other security tools.
- Perform initial triage and investigation of potential threats or anomalous behavior.
- Escalate incidents according to severity and defined procedures.
- Document incidents, provide root cause analysis, and maintain detailed logs.
- Analyze threat intelligence feeds and correlate with internal data.
- Assist in threat hunting and vulnerability management activities.
- Support continuous improvement of SOC processes and playbooks.
- Collaborate with other IT and Security teams for incident resolution.
- Assist in developing and tuning SIEM rules, queries, and dashboards for threat detection.
- Contribute to vulnerability management and secure configuration of internal systems and cloud environments.
- Support the testing and execution of recovery plans for security systems and data.
- Document incident findings and remediation steps, and contribute to post-incident reviews.
Required Skills & Qualifications:
- Bachelor’s degree in Computer Science, Cybersecurity, or related field.
- 2–5 years of experience in a SOC environment or similar security operations role.
- Familiarity with SIEM tools (e.g., Splunk, QRadar, Sentinel).
- Understanding of TCP/IP, firewalls, IDS/IPS, and common attack vectors.
- Knowledge of malware, phishing, ransomware, and social engineering tactics.
- Hands-on experience with endpoint protection, network monitoring, and forensic tools.
- Excellent communication and documentation skills.
- Preferred Certifications:
  - CompTIA Security+ or CySA+
  - Vendor-specific SIEM certifications.
Job Title: Incident Manager – Fault Management
Function: Incident, Problem & Change Management
Department: NOC Operations / Command Centre / Service Operations
Experience: 6–10 Years
Employment Type: Full-Time
Shift: 24x7 Rotational (as per business requirement)
Location: Gurgaon / Mumbai / Pune (or as per project needs)
Job Summary
We are looking for an experienced Incident Manager to lead fault management operations, ensuring rapid restoration of services and minimal business impact. The role focuses on Incident, Problem, and Change Management, acting as a central point of coordination during major incidents and ensuring compliance with ITIL processes, SLAs, and governance standards.
The ideal candidate will have strong operational leadership, stakeholder communication, and escalation management skills within complex IT / Telecom environments.
Key Responsibilities
Incident Management
- Own and manage P1/P2/P3 incidents end-to-end in line with ITIL standards.
- Act as Incident Commander during major incidents, leading bridge calls and coordinating technical teams.
- Ensure timely incident detection, logging, categorization, prioritization, and resolution.
- Drive restoration efforts and ensure adherence to SLAs, OLAs, and KPIs.
- Provide regular incident status updates to customers, management, and stakeholders.
- Ensure proper incident documentation, closure notes, and audit readiness.
Fault Management & Monitoring
- Oversee proactive fault detection through NOC monitoring tools.
- Ensure alarms and alerts are correlated, triaged, and assigned appropriately.
- Coordinate with L2/L3 engineering teams for fault isolation and resolution.
- Identify recurring faults and initiate preventive actions.
Problem Management
- Lead Root Cause Analysis (RCA) for recurring and major incidents.
- Facilitate Post-Incident Reviews (PIRs) and track corrective and preventive actions (CAPA).
- Maintain problem records and trend analysis to reduce repeat incidents.
- Work closely with engineering and vendors to drive permanent fixes.
Change Management
- Govern changes to production environments to minimize risk.
- Review and validate Change Requests (CRs), MOPs, rollback plans, and impact assessments.
- Participate in CAB (Change Advisory Board) meetings.
- Ensure changes are executed as per approved windows with pre/post validation.
- Track change-related incidents and drive improvement actions.
Stakeholder & Vendor Coordination
- Act as a single point of contact during service-impacting events.
- Coordinate with internal teams, service providers, OEMs, and vendors.
- Manage customer communication during outages and critical events.
- Escalate issues appropriately to senior management when required.
Governance, Reporting & Continuous Improvement
- Prepare and publish incident, problem, and change management reports (daily/weekly/monthly).
- Monitor and improve operational KPIs and SLA performance.
- Drive process improvements aligned with ITIL best practices.
- Maintain SOPs, runbooks, escalation matrices, and communication templates.
- Support audits, compliance reviews, and regulatory requirements (if applicable).
Required Skills & Competencies
Technical & Process Skills
- Strong expertise in ITIL Incident, Problem, and Change Management.
- Experience working in NOC / Command Centre / Telecom / Enterprise IT Operations.
- Good understanding of infrastructure domains:
  - Network (LAN/WAN/SD-WAN)
  - Security (Firewalls, SOC coordination)
  - Data Center / Cloud (basic understanding)
- Familiarity with monitoring tools (SolarWinds, Netcool, Splunk, PRTG, etc.).
- Hands-on experience with ITSM tools such as ServiceNow, Remedy, Helix, Jira.
Soft Skills
- Strong leadership and decision-making abilities during high-pressure situations.
- Excellent verbal and written communication skills.
- Strong stakeholder and customer management capability.
- Analytical mindset with attention to detail.
- Ability to work independently and in cross-functional teams.
Education & Certifications
- Bachelor’s degree in Engineering, IT, Computer Science, or related field.
- ITIL Foundation (mandatory); ITIL Intermediate/Expert is a plus.
- PMP / PRINCE2 / Agile certifications are advantageous.
Experience
- 6–10 years of experience in Incident / Problem / Change Management roles.
- Prior experience handling Major Incidents in 24x7 operations environments.
- Experience in Telecom, BFSI, Managed Services, or Large Enterprise IT preferred.
Key Performance Indicators (KPIs)
- Incident response and resolution times.
- SLA and availability compliance.
- Reduction in repeat incidents.
- Quality and timeliness of RCA reports.
- Change success rate and reduction in change-related incidents.
L2 Support
Location: Mumbai, Pune, Bangalore
Requirement Details (Mandatory Skills):
- Excellent communication skills
- Production Support, Incident Management
- SQL (must have experience writing complex queries)
- Unix (must have working experience with the Linux operating system)
- Perl/Shell scripting
- Candidates working in the Investment Banking domain will be preferred
• Maintain, update, and enhance the ITSM environment, including Incident, Request, Problem, Change, and Knowledge Management, Service Catalog, Service Portals, SLAs, Discovery, and Integrations
• Participate in the implementation and configuration of other ServiceNow products outside of ITSM to increase adoption of the ServiceNow platform.
• Perform daily administration, issue resolution, and troubleshooting of the ServiceNow platform.
• Monitor the health, usage, and overall compliance of ServiceNow and its applications.
To be the right fit, you will need:
• Minimum of 2 years of demonstrated experience in ServiceNow development and configuration.
• Proficiency in developing, integrating, and maintaining applications using ServiceNow platform technology and tools.
• Understanding of, and experience working with, IT Service Management processes (Incident, Problem, Change, Release, SLM, Service Catalog) and PPM/ITBM
• Understanding of IT and business requirements, with the ability to develop, test, and deploy improvements and updates
• Ability to manage and administer lists, filters, forms, platform interfaces (to other data sources), and the data within tables, import sets, and the CMDB.
• Preferably certified in ServiceNow