Monitoring Specialist
- Installation, configuration, and tuning of the following AppDynamics servers: Controller, Events Service cluster, End User Monitoring, ADA, ADRUM
- Reviews system design and works to continuously improve stability and efficiencies
- Provides system backup recovery methodology and makes recommendations regarding enhancements and/or improvements
- Formulates policies, procedures, and standards relating to system management, and monitors system resource utilization
- Responsible for reducing operational downtime for critical, scheduled, and unscheduled maintenance by accelerating the deployment of approved changes, fixes, updates, and solutions, and by automating manual maintenance, deployment, diagnostic health checks, validation, and reporting
- Responsible for creating proactive and reactive monitoring methods, generating customer alerts within the Enterprise Event Management and Monitoring capability
- Skilled at gathering user requirements and can work independently to craft efficient monitoring and alarming solutions and dashboards
- Understands the Agile process
- Build dashboards in Grafana.
- Independently set up Prometheus, Node Exporter, etc.
- Comfortable with PromQL.
- Ability to operationally support the underlying database as necessary
- Hands-on Java and/or .NET development
- IT Operations and Application Support
- Application and systems performance management, measurement, and analysis.
- Deployment and configuration of complex enterprise software
- Solid understanding of Operating Systems (Linux/Windows)
- Strong understanding of built-in O/S monitoring and performance tools.
- Working with a wide variety of platforms and application stacks.
- Ability to understand new application frameworks in customer environments quickly
- Works with minimal direction as a seasoned resource
- Support customer initiatives in their transition towards modernization
- Tracks own work and backlog, familiar with Agile methodology
- Prioritize own work in accordance with user priorities and stakeholder expectations
- Communicates efficiently and effectively, both in writing and verbally
- AppDynamics
- Java
- Linux/Windows
- Fault and Performance Monitoring Tools Administration
- Machine/.NET/Java agents
- Grafana, ELK
- Prometheus.
About IT solutions specialized in Apps Lifecycle management. (MG1)
Job Description.
DB developer
This position is for an Oracle Golden Gate developer and DBA. The candidate will implement and support Oracle Golden Gate replication components across many databases, working closely with application teams to implement near-real-time replication with Golden Gate. The candidate should also provide operational support.
Roles and responsibilities:
Configure and build Golden Gate Extract/Replicat processes for multiple databases.
Configure error handling for restart, logging, production support and status reporting in Golden Gate.
Identify and produce documentation of best practices.
Troubleshoot and performance-tune Golden Gate replication.
Implement the necessary monitoring scripts for Golden Gate using UNIX shell/Perl scripting.
Provide Golden Gate solutions:
- Working with Oracle on SRs for critical issues.
- Strong oral and written communication skills along with problem solving skills.
Individual with a minimum of 5 years of experience with SQL Server
- Good experience in SQL Server installation and configuration
- Backup and Recovery
- Security management
- Troubleshooting & Monitoring
- SSIS/SSRS/SSAS
Profile: Senior System Analyst
Experience: 6+ years
Job Location: Noida, Sec-62 (work from office only)
Shift Time: Rotational shift
Working Days: 6 days
JD Details:
Must-Haves (Technical)
● Experienced in
○ Network monitoring (Must)
○ DB and Cache monitoring
○ Kubernetes monitoring
○ Security events monitoring
○ Web server monitoring
○ Critical server-level monitoring (golden metrics) for distributed systems.
● Hands-on experience with the following monitoring tools:
○ ELK
○ Grafana (Must)
○ Nagios (Must)
○ Cacti
○ RKE/Rancher
○ Splunk
○ CloudTrail
● Hands-on experience with the following APM tools:
○ New Relic
○ AppDynamics (Must)
○ Datadog (Must)
● Experienced in the concept of Continuous Monitoring (CM).
● SME in combining multiple data sources to get a clear picture of production systems
● SME for creating alerts related to platform security in above tools.
● Strong knowledge of Linux and Windows environments.
● Strong knowledge of cloud environments.
Good To Have (Technical)
● Scripting for automation
○ Python
○ Bash
● Containerization
○ Docker (Basic knowledge)
● Container Orchestration
○ Kubernetes (Basic knowledge)
● Infrastructure as Code
○ Terraform (Basic knowledge)
7+ years of experience in Workday HCM configuration across 4+ Workday modules and 5+ deployments
Key responsibilities
- Lead design/requirements workshops
- Highly confident in configuring Workday, with the ability to utilise Workday Community where necessary
- Highly confident in building accurate estimates for Workday configuration work
- Approve effort estimates of more junior consultants
- High level of understanding of how businesses utilise HR technology to their advantage
- Directly manage a team of consultants based in India
- Manage customer escalations around direct reports' work
- Support building the CloudRock team based in India by being an advocate for CloudRock and a leadership figure
- Support and mentor junior resources based in India/Portugal/UK
- Full understanding of the Workday deployment methodology
- Achieve a high level of utilisation
- Be consultative in their customer approach
Job Summary
The Cloud Production Support Engineer (PSE) is responsible for fulfilling the day-to-day infrastructure and service requests from the application teams across AWS, CI/CD solutions, and observability tools. You will be expected to handle production issues in collaboration with the cloud infrastructure and application teams.
Responsibilities and Duties
- Troubleshoot production issues: When technical issues with the cloud infrastructure components arise, the PSE must act quickly to analyse the available data and find the root cause of the problem. They may then develop a solution or escalate the problem to other engineering team members while providing stakeholders with progress updates.
- Infrastructure provisioning and modification: Application teams may use the ticketing tool to request new infrastructure in AWS or changes to existing infrastructure, based on their requirements. The PSE should ensure that the required data/info is available on the ticket and provide a resolution within the given SLA.
- Alert Management: Alerts from the observability tools will be received on multiple channels according to the notification settings. PSEs are expected to acknowledge the alerts, troubleshoot the issue, close the alert based on the given SLA, or escalate to the cloud infra/DevOps team for further diagnosis.
- Onboarding, offboarding, and access management: Whenever an employee joins or leaves the organization, you will receive an onboarding or offboarding request.
- Prepare Technical Documentation: PSEs must prepare documentation when logging product issues, as they must note all details, including their observations, diagnoses, and action steps. Other everyday tasks include weekly reports summarising production performance, upgrade release notes, and troubleshooting guides.
- Product Improvements: Since PSEs have good exposure to product issues, they should work closely with the PMs and EMs, pass on feedback about the product, and get the improvements/fixes included in the product roadmap.
- Adherence to SLA and timelines: PSEs should always adhere to the timelines shared with other teams for closure of fixes and deliver outcomes as per the SLA guidance agreed with business teams.
- Reporting: Track and report weekly on SLA metrics, tickets worked and closed by PSEs, and transferred tickets. Identify and devise how productivity can be captured at the individual level, and report on it monthly.
Qualifications and Skills
- Degree in Computer Science/Information Technology.
- Two years or more experience in Cloud and system administration.
- Experience troubleshooting in complex environments using monitoring tools.
- Demonstrated experience with containerisation technologies (Docker, Kubernetes, etc.)
- Hands-on experience with the most common AWS services.
- Provide hands-on technical support and post-mortem root cause analysis using ITIL standards of Incident Management, Service Request fulfillment, Change Management, Knowledge Management, and Problem Management.
- Actively address and work on user and system tickets in the ServiceNow ticketing application. Create and implement change tickets for enhancements, new monitoring, and assisting development groups.
- Create, test, and implement Non-Functional Requirements (NFR) for current and new applications.
- Build up technical subject matter expertise on the applications being supported including business flows, application architecture, and hardware configuration. Maintain documentation, knowledge articles, and runbooks.
- Conduct real-time monitoring with an array of monitoring tools to ensure application OLAs/SLAs are achieved and maximum application availability (uptime) is maintained.
- Assist in the process of approving application code release change tickets, as well as the tasks assigned to the support team to perform and validate the associated implementation plan.
- Approach support with a proactive attitude, desire to seek root cause, in-depth analysis and triage, and strive to reduce inefficiencies and manual efforts.
SAP APO Consultant
Minimum 6 years of experience, with 5+ years of relevant experience as an SAP APO Consultant.
Experience in the DP and SNP modules
A minimum of 2 end-to-end implementations is mandatory
• This is for a support project; we are looking for candidates with a minimum of 1 implementation and experience in rollout, support, upgrades, enhancements, and various other implementation areas
• Support users globally to ensure sound project delivery, service delivery, incident management, problem management, and change management.
• Analyse the systems and work directly with users to define system requirements, design and propose solutions, configure the software and train employees.
• Performing day-to-day maintenance on the SAP system, installing new upgrades, and testing for bugs, in addition to system configuration and data migration.
• Optimizing the system for ease of use, training employees in its functions, supporting all newly required business improvements and changes, and proposing the integrated best-fit design for these new or changed business processes without impacting other regions.
• Meeting directly with users to find out their SAP-related needs and incorporate these needs into a cohesive plan
Responsible for cloning databases from production to test/development environments and maintaining user data integrity in those environments
Strong knowledge of upgrading databases from Oracle 9.x to Oracle 10g, 11g, 12c, and 18c
Strong knowledge of upgrading the application from 12.1.x to 12.2.x
Patch analysis and patching of both the Apps and DB tiers.
AD utilities (adop, adpatch, adclone, adadmin, adctrl, etc.)
Troubleshooting of Application.
Creating custom responsibilities, request groups, and menus.
Creating and managing concurrent managers and concurrent programs.
Cloning of the Applications using Rapid Clone and adclone.
Secondary
Experience in installing and configuring Oracle database servers on multiple operating environments such as Red Hat Linux, OEL, IBM AIX, and HP-UX
Installation, configuration, and administration of Oracle Applications 11i, R12, and 12.2 in a multi-tier architecture.
Shell scripting.
WebLogic administration and troubleshooting.
Knowledge of Tomcat administration.
Knowledge of MySQL administration.
Knowledge of cloud platforms such as AWS, Oracle Public Cloud, and Azure.
- Ownership of MBSE support Frontline Desk to address the Service Operations requests
- 16x5 monitoring of MBSE Infrastructure environment
- Create CISM tickets based on the predefined standard cases
- Documentation of Service cases and standard operation procedures
- Contribution to the predefined knowledge database on SharePoint
- Communication & coordination with different teams in India & Germany
- Resolve user-reported issues as per the agreed SLA
- Assignment and coordination of CISM tickets to Level 2 & Level 3 support group
- Knowledge of Linux server deployments
- Knowledge of IBM WAS, IBM HTTP server, proxy configurations, F5 servers, client and server certificates.
- Knowledge of writing Linux scripts (Shell/Perl)
- Good knowledge of IBM DB2
- Good knowledge of LDAP protocols, authentication/authorization in IBM Jazz CLM