Would Alerting be a Problem Mgmt ticket or a Request?

Posted: Sun Aug 19, 2012 2:34 am
by prothos231
The Major Incident team takes a high-impact call. After they restore service, they open a Problem ticket for alerting to prevent the issue from developing into a major incident. The Problem Mgmt team rejects the ticket because they say it is a Request and not a task to diagnose root cause.

Question: Once it was discovered that alerting would have reduced the downtime, should the Incident team be making alerting requests through Problem Management, or would it be a Request?


Posted: Mon Aug 20, 2012 6:14 am
by Diarmid
What is "alerting"?

What is a "problem"?

What is "problem management"?

What is an "alerting request"?

The quotation marks are to indicate that I am asking about your organization's definitions, not some book answer.

If I can understand what you mean I can apply logic to your question.

Is your definition of a "major incident" confined to level of impact? - If so, you may have other issues to resolve.

Posted: Thu Aug 23, 2012 8:50 am
by Boydness
Diarmid wrote:
The quotation marks are to indicate that I am asking about your organization's definitions, not some book answer.

If I can understand what you mean I can apply logic to your question.
I agree with Diarmid: without a clear understanding of how your company uses these terms, it is difficult to respond.

I will hazard a guess that the situation is:
An incident occurs with high user impact and is assigned to a dedicated team that responds to restore service. The Major Incident team notifies Problem Management of the Incident for possible problem management review. Problem Management rejects the exchange for some specified reason.

So, the exchange is the issue. The criteria should be specified for the exchange and the manner in which it is submitted should also be defined.
Problem Management should be responding to Problems that involve reducing the overall restoration time.

The criteria you may consider adding:
No KEDB entry with restoration action/workaround exists for Incident
Mean time to detection for Incident exceeds X amount of time
And specify the manner in which those are submitted to Problem management for review.

The "no KEDB entry" criterion may be the one that applies in this situation. It is Problem Management's responsibility to ensure that KEDB entries exist for Incident Management to use in restoring the service quickly. Your Major Incident team may have determined the workaround or even started to isolate a root cause during their restoration efforts. If so, a problem request is submitted, or a Problem record is opened and assigned to Major Incident, to document what the Major Incident team found about the Incident.

You could also train and designate the supervisor of the Incident Management team to document (the first several steps of the process) and make the 'Valid Problem Decision', thus allowing Major Incident to directly open the Problem Record and document their findings (maybe even the preliminary analysis findings, if the restoration actions went that far). Then it is Problem Management's responsibility to pick up the ball at assigning problem analysts for the Preliminary Analysis and/or handling the 'Viable Business Reason to Continue' Decision.


Posted: Tue Jan 14, 2014 5:14 pm
by Sammy024
Yes what is ALERTING here..... But the time where Major Incident can become a problem is where:
After reading the PIR you see that the root cause was not found and a workaround was used to restore the service. In this case it can be logged as a problem record to find the permanent fix.

Posted: Tue Jan 14, 2014 5:31 pm
by Boydness
Sammy024 wrote:Yes what is ALERTING here..... But the time where Major Incident can become a problem is where...
Sammy, please be mindful of your phrasing: an Incident does not become a Problem. An Incident is only ever an Incident. Yes, an Incident can be an input/trigger into the Problem process, but a Problem record does not require an Incident record; it could be triggered by a Change, etc. Separate records/tickets are required for each process, so an Incident does not become anything other than an Incident.

Problem alerts

Posted: Wed Jan 29, 2014 11:37 pm
by MadhavaVermaDantuluri
We should send the notification alerts and convince the problem team to provide a sign-off authority upon review of the first alert.

Posted: Thu Jan 30, 2014 9:17 am
by Boydness
The Problem Management process should have criteria to consider if the Problem Request (that was triggered by the Incident) is related to an already established Known Error or if there are merits to further investigation (which would "validate" the Problem Request). So, depending on the resources and other factors, an example of a valid Problem Request (triggered/involving a single Incident) might meet the criteria of being a Priority 1 Incident that had no known Restoration Action (no KE entry), had no Return to Service after 4 hours (or some other duration), requested by senior leadership (or specific roles), and/or other relevant criteria.
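As a rough illustration only, the validation criteria above could be written down as a simple check. This is a minimal sketch: the `Incident` fields, the `validation_reasons` function, and the choice to treat the criteria as "any one is enough" are all assumptions made for the example, and the 4-hour threshold is just the example duration mentioned above. Your own process would define the real criteria.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Incident:
    # Hypothetical fields; map these to whatever your ticketing tool records.
    priority: int                    # 1 = highest priority
    has_known_error_entry: bool      # a KE entry with a restoration action exists
    time_to_restore: timedelta       # time until Return to Service
    requested_by_leadership: bool = False


def validation_reasons(incident, max_restore=timedelta(hours=4)):
    """Return the criteria this incident meets; an empty list means none matched."""
    reasons = []
    if incident.priority == 1 and not incident.has_known_error_entry:
        reasons.append("P1 with no Known Error entry")
    if incident.time_to_restore > max_restore:
        reasons.append("restoration exceeded threshold")
    if incident.requested_by_leadership:
        reasons.append("requested by leadership")
    return reasons


# Example: a P1 with no KE entry that took five hours to restore.
example = Incident(priority=1, has_known_error_entry=False,
                   time_to_restore=timedelta(hours=5))
print(validation_reasons(example))
```

Returning the list of matched criteria, instead of a bare yes/no, gives both teams a record of why the Problem Request was accepted or rejected.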

Now to be clear, a validated Problem Request simply means that the organization is going to invest some resources into performing a Preliminary Analysis. The PA will be a quick analysis to determine the information to use in the Proceed Further decision (Viable Business Reason to Continue). That information is provided to those that will be making the decision to actually begin the Root Cause Analysis (RCA).

The Problem Request may be valid, and the PA may show that there is a potential unknown root cause that could lead to future incidents, but there is no Viable Business Reason to Continue at this time. It is essentially a decision about investing hundreds of man-hours into investigating something that management is willing to live with in the near term and whose financial impact is relatively low. The decision may be not to continue unless another incident is potentially attributed to the Problem record within the next X number of days (because the PA did not find any older, potentially related Incidents). Leadership decides to treat it as a one-off until more is known. Hopefully, the next time it happens Incident Management is positioned to capture more data/logs/information, so that the Problem Management process has more to work with when the Problem record is reviewed again later.
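The "revisit if it happens again within X days" rule can be sketched the same way. Again, this is only an assumed shape for the check: `should_revisit`, its parameters, and the 30-day window in the example are hypothetical, and the source does not say what X should be.

```python
from datetime import date, timedelta


def should_revisit(decision_date, related_incident_dates, window_days):
    """True if any potentially related incident occurred within the window after the decision."""
    deadline = decision_date + timedelta(days=window_days)
    return any(decision_date < d <= deadline for d in related_incident_dates)


# Example: the decision not to continue was made on 30 Jan 2014 with a 30-day window.
print(should_revisit(date(2014, 1, 30), [date(2014, 2, 10)], window_days=30))
```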

Incident and Problem Management typically have a difficult relationship unless this part of the process is properly structured and well understood among the different levels and tiers. The process is not initially asking for the problem to be "fixed". It is only concerned with validating whether there is a NEW (valid) problem, what is readily known about this potential issue (PA), and whether there is value in devoting resources to investigating the potential issue (VBR to Continue).
Never set the process up so that it appears to be a hand-off with the VBR decided by the receiving tier. The recording/documentation through VBR needs to be well defined and controlled externally to those performing the tasks. Also, once that portion of the process is better understood by the personnel in the different levels and tiers, there will be less resistance to "a new problem" potentially involving their group, especially because that new problem may only be a single symptom of something much larger. An incident involving a Real Time Service (RTS) may not be an issue with the service itself but might be a symptom of a local network issue somewhere.