Major Incidents - How do you manage related incidents?

An open discussion on issues related directly or primarily to the service or help desk.
vz-r_Dave
Itiler
Posts: 15
Joined: Tue Oct 27, 2009 8:00 pm

Wed Jan 13, 2010 7:29 am

Hi guys

I have been discussing the above with colleagues, and we are trying to work out the quickest way to get related incidents logged. Currently we raise the MI and then log an incident for each contact. These incidents are related to the MI and are solved once the MI is solved. Obviously this assists Problem Management once the MI has been solved and allows us to understand the full impact of the outage.

One of the guys believes that, to save time, we should inform the users of the MI and, rather than logging a ticket for each individual contact, list the users in the MI ticket itself. Much as I understand the high workload and the lack of service while an MI is in progress, wouldn't this method stop us from understanding the full impact of the MI and hurt Problem Management's ability to raise Problem tickets? There is also the integrity of the list of users to consider: if the SD guys were to write them down in a document or on paper, users could easily be forgotten.

I would like to know what methods you use for MIs. Unfortunately we do not really have a say in our call logging software; there is more emphasis on the back end than on the requirements of the SD itself.

In an ideal world our call logging software would have a button that auto-populates a new incident from the MI and relates it to the MI. CTI (computer telephony integration) would also be handy here. I believe calls could then be cut to a minute or so.
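A minimal sketch of the button described above, assuming a hypothetical ticket model: `Ticket`, `log_related_incident` and the "waiting on MI" status are invented for illustration and are not any real call-logging product's API. Everything except the caller is copied from the MI, which is where the time saving would come from; with CTI the caller could presumably be filled in from caller ID as well.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Ticket:
    """Hypothetical ticket record; field names are illustrative only."""
    ticket_id: str
    summary: str
    service: str
    caller: Optional[str] = None
    parent_id: Optional[str] = None  # set when the ticket is related to an MI
    status: str = "open"
    children: List["Ticket"] = field(default_factory=list)


def log_related_incident(mi: Ticket, caller: str, ticket_id: str) -> Ticket:
    """The 'one button': pre-fill a new incident from the MI and relate it."""
    child = Ticket(
        ticket_id=ticket_id,
        summary=f"Caller affected by {mi.ticket_id}: {mi.summary}",
        service=mi.service,        # copied from the MI, so the analyst types nothing
        caller=caller,             # the only per-call input (could come from CTI)
        parent_id=mi.ticket_id,
        status="waiting on MI",    # hypothetical status name
    )
    mi.children.append(child)      # keeps the full impact visible on the MI
    return child


mi = Ticket("MI-1001", "Email service unavailable", "Email")
inc = log_related_incident(mi, caller="j.smith", ticket_id="INC-2001")
print(inc.summary)       # Caller affected by MI-1001: Email service unavailable
print(len(mi.children))  # 1
```

Because each call still produces its own linked incident, Problem Management keeps the per-user impact data that a plain list in the MI ticket would lose.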

Thanks


UKVIKING
ITIL Expert
Posts: 3639
Joined: Fri Sep 15, 2006 8:00 pm
Location: London, UK

Wed Jan 13, 2010 7:43 am

Dave

We did it this way in the ticket system:

When an MI happened, whether because we noticed something ourselves (monitoring) or because there were 10 tickets from individuals:

1 - An MI ticket is raised and assigned to the team lead or SD manager to manage.

2 - If people call in or register new incidents, those incidents are created with a reference to the MI ticket and put in a particular state, which records a comment from the SD team member stating that the MI ticket is the primary.

3 - All work done by teams, internal or external, is done via the MI ticket and tracked by it.

4 - Once the MI is over, the MI ticket is put in a pending closure state while the individual tickets are gathered and collated. All of these are closed with a brief explanation in the results pointing to the parent MI ticket, and the system should automatically inform the specific users.

5 - The SD manager writes a high-level report explaining the outage and the timelines. This is merely a notification report, not an RCA (root cause analysis) report.

6 - This report is sent to all users (BCC) and to the specific users from the individual tickets (BCC), as well as to required types.
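A rough sketch of steps 4 and 6 above, assuming a hypothetical ticket model (`Ticket`, the `results` field, the status strings and the example addresses are invented for illustration). It leaves the MI itself in pending closure, on the assumption that the SD manager closes it after the step 5 report; the post does not say exactly when that happens.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Ticket:
    """Hypothetical ticket record; field names are illustrative only."""
    ticket_id: str
    caller_email: Optional[str] = None
    parent_id: Optional[str] = None
    status: str = "open"
    results: str = ""


def wind_down_major_incident(mi: Ticket, all_tickets: List[Ticket],
                             distribution_list: Iterable[str]) -> List[str]:
    """Steps 4 and 6: close the child tickets and collate the BCC list."""
    mi.status = "pending closure"  # step 4: MI held while children are gathered
    children = [t for t in all_tickets if t.parent_id == mi.ticket_id]
    for child in children:
        child.status = "closed"
        child.results = f"Resolved under major incident {mi.ticket_id}; see the parent ticket."
    # Step 6: all users plus the callers from the individual tickets, each address once
    bcc = set(distribution_list)
    bcc.update(c.caller_email for c in children if c.caller_email)
    return sorted(bcc)


tickets = [
    Ticket("MI-1"),
    Ticket("INC-1", "a@example.com", "MI-1"),
    Ticket("INC-2", "b@example.com", "MI-1"),
    Ticket("INC-3", "a@example.com", "MI-1"),  # same caller rang twice
    Ticket("INC-4", "c@example.com"),          # unrelated incident, left alone
]
bcc = wind_down_major_incident(tickets[0], tickets, ["all-staff@example.com"])
print(bcc)  # ['a@example.com', 'all-staff@example.com', 'b@example.com']
```

Using a set for the BCC list means a user who rang twice, or who is also on the general distribution list, gets the notification only once.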
John Hardesty
ITSM Manager's Certificate (Red Badge)

Change Management is POWER & CONTROL. /....evil laughter
vz-r_Dave
Itiler
Posts: 15
Joined: Tue Oct 27, 2009 8:00 pm

Wed Jan 13, 2010 8:51 am

Hi John

We have a specific queue for MIs, which the designated MI manager monitors until the MI is solved.

The MI and all the individual incidents are related and placed in this queue.

Once the MI is solved, we can solve all incidents related to it with one click, so we don't have to solve each ticket individually. The users are informed via an intranet announcement and the solved notice.

This part of the process works well; the problem is finding the quickest way to get incidents logged with minimal input from the SD guys.

Cheers
UKVIKING
ITIL Expert
Posts: 3639
Joined: Fri Sep 15, 2006 8:00 pm
Location: London, UK

Wed Jan 13, 2010 10:47 am

Dave

The only shortcut I can suggest is a canned phone message announcing:

"If your issue is about (MI Description), there is already a ticket (MINumber) and there is no need to continue the call."
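If the phone system can play a message built from text, the greeting could be generated from whichever MIs are currently open. This is a hypothetical sketch: `canned_message` and the MI-number-to-description mapping are assumptions, and feeding the text into an actual telephony or IVR system is not shown.

```python
from typing import Dict


def canned_message(active_mis: Dict[str, str]) -> str:
    """Build the recorded greeting from the open MIs (MI number -> description).

    Returns an empty string when no MI is open, so callers go straight through.
    """
    parts = [
        f"If your issue is about {description}, there is already a ticket, "
        f"{mi_number}, and there is no need to continue the call."
        for mi_number, description in active_mis.items()
    ]
    return " ".join(parts)


print(canned_message({"MI-1001": "email being unavailable"}))
```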

Also, what if the issue is with email or the web? How are the users supposed to be told?

We did ours this way because, as a hosting service, we had internal customers (departments) and external customers (businesses).

While we did send notices to the customers (internal and external), the customer contacts who received them might not be the customer rep who was calling.

In addition, if you have only one MI ticket and no other tickets, how do you know it is a major incident if no users complain?
vman
Itiler
Posts: 11
Joined: Sun Jul 05, 2009 8:00 pm

Tue May 18, 2010 6:04 am

Hi all,
I am debating the very same issue at my organization.
My concern is this: when a major incident is known to be the cause of multiple minor incidents, how do you log and track the lifecycle of each call?
It is best practice to treat each incident as an incident, for which you initiate the incident process. If there is any divergence from this, there is a risk of missing a step in the process, and that may contribute to customer dissatisfaction.

My thinking is:

The incident is logged and evaluated as a major incident, which initiates the major incident process.
Minor incidents are logged and evaluated, notes are made, and the major incident is referenced as a possible cause.
Note that each incident must be evaluated and treated as an incident in its own right.
When the major incident is resolved, each minor ticket must be evaluated to confirm that it has been resolved.

My point is that we may assume a minor incident was caused by the major incident, but do we know for certain? We can reference minor incidents to the major incident, but the incident lifecycle must still be adhered to.
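The approach above differs from a one-click bulk solve: each linked incident is checked when the MI is resolved. A minimal sketch, assuming a hypothetical `Incident` record and a `confirmed` mapping that stands in for however the service desk actually captures user confirmation (both invented for illustration):

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Incident:
    """Hypothetical incident record; names are illustrative only."""
    ticket_id: str
    suspected_cause: Optional[str] = None  # MI referenced as a *possible* cause
    status: str = "open"


def on_major_incident_resolved(mi_id: str, incidents: List[Incident],
                               confirmed: Dict[str, bool]) -> List[Incident]:
    """Evaluate each linked incident instead of bulk-closing it.

    confirmed maps ticket_id -> True if the user confirmed the service is restored,
    False if the problem persists. Tickets with no answer yet stay pending.
    Returns the tickets that still need work.
    """
    outstanding = []
    for inc in incidents:
        if inc.suspected_cause != mi_id:
            continue                      # not linked to this MI
        answer = confirmed.get(inc.ticket_id)
        if answer is True:
            inc.status = "resolved"
        elif answer is False:
            inc.suspected_cause = None    # the MI was not the cause after all
            inc.status = "open"           # back into the normal incident lifecycle
            outstanding.append(inc)
        else:
            inc.status = "pending confirmation"
            outstanding.append(inc)
    return outstanding


incs = [Incident("INC-1", "MI-1"), Incident("INC-2", "MI-1"),
        Incident("INC-3", "MI-1"), Incident("INC-4")]
left = on_major_incident_resolved("MI-1", incs, {"INC-1": True, "INC-2": False})
print([i.ticket_id for i in left])  # ['INC-2', 'INC-3']
```

Recording the MI only as a suspected cause keeps the link reversible, so an incident that turns out to be unrelated drops back into its own lifecycle rather than being closed with the MI.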
Diarmid
ITIL Expert
Posts: 1894
Joined: Mon Mar 03, 2008 7:00 pm
Location: Helensburgh

Tue May 18, 2010 6:39 am

There is a thread from last October on this subject: "With mass outage - when to stop logging individual incidents".

And there is a thread at itsmfi-forum from the same time: "Incident definition"

The first important thing to do is to make sure you clearly understand the difference between an incident and a call about an incident.

The final most important thing to do is to make sure that all the calls that seemed to be related were, in fact, related, and that restoration is, in fact, universal, and that, therefore, service restoration has, in fact, satisfied every user's need. [a short version of this sentence is available without any occurrence of the phrase "in fact" :twisted: ]
"Method goes far to prevent trouble in business: for it makes the task easy, hinders confusion, saves abundance of time, and instructs those that have business depending, both what to do and what to hope."
William Penn 1644-1718
vman
Itiler
Posts: 11
Joined: Sun Jul 05, 2009 8:00 pm

Tue May 18, 2010 3:18 pm

You are right!
gnome
Itiler
Posts: 9
Joined: Mon Mar 22, 2010 8:00 pm

Tue Jun 15, 2010 11:13 pm

EDITED
Last edited by gnome on Tue Jul 27, 2010 6:54 am, edited 1 time in total.
Diarmid
ITIL Expert
Posts: 1894
Joined: Mon Mar 03, 2008 7:00 pm
Location: Helensburgh

Wed Jun 16, 2010 1:09 pm

Are your Ops team pulling your leg?

If so many are occurring, perhaps there is something wrong with your definition of a major incident? The alternative explanation is not very comforting for you!

It is always a bad sign when people are fretting over what is logged rather than what is wrong with the service.
"Should we just log a normal incident which is preventing, say, 50 staff from using email or the internet, assuming the issue gets fixed in less than 15 minutes and only about 5-10 people ring within the 15-minute outage window?"

This question is utterly meaningless with respect to what constitutes a major incident. The numbers that matter are costs and risks, both immediate and ongoing. Without that context it doesn't mean much whether one person is affected or a hundred.

Yes, the more I write, the more I come to the conclusion that your organization does not have a good handle on incident management, and in that case, in my experience, Ops have every right to worry about everything falling on their heads and being blown out of proportion.

Fix your incident management system and the rest will take care of itself.
xabit
Newbie
Posts: 1
Joined: Mon Jun 28, 2010 8:00 pm

Tue Jun 29, 2010 9:32 am

Fortunately our processes and call logging tool allow us to manage this effectively.

Once we have raised an MI, our call logging system allows us to log multiple interactions against it. In the MI we can then view a list of all calls received in relation to the issue.

Each interaction has a "notify by" field whereby the user can be advised via email or telephone when the MI is fixed. When the MI is closed, if an interaction is set to notify by email, the system emails the user and closes the interaction. If it is set to telephone, the interaction remains open with a status of pending callback.
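The notify-by rule above can be sketched as follows, assuming a hypothetical `Interaction` record (names invented for illustration, not xabit's actual tool). Actually sending the emails is left out; the function just returns who to email and who to call back.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Interaction:
    """Hypothetical interaction logged against an MI; names are illustrative."""
    interaction_id: str
    contact: str            # email address or phone number
    notify_by: str          # "email" or "telephone"
    status: str = "open"


def close_mi_interactions(interactions: List[Interaction]) -> Tuple[List[str], List[str]]:
    """Apply the notify-by rule when the MI closes.

    Returns (addresses to email, numbers to call back).
    """
    emails: List[str] = []
    callbacks: List[str] = []
    for it in interactions:
        if it.notify_by == "email":
            emails.append(it.contact)
            it.status = "closed"             # email goes out, interaction closes
        elif it.notify_by == "telephone":
            callbacks.append(it.contact)
            it.status = "pending callback"   # stays open until someone rings back
        else:
            raise ValueError(f"unknown notify_by value: {it.notify_by!r}")
    return emails, callbacks


logged = [
    Interaction("1", "a@example.com", "email"),
    Interaction("2", "555-0100", "telephone"),
]
to_email, to_call = close_mi_interactions(logged)
print(to_email, to_call)  # ['a@example.com'] ['555-0100']
```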

With our previous call logging tool, we were able to link incidents to each other. We would identify a master MI ticket and link child tickets to it, and each child ticket would go into a waiting status pending the resolution of the master MI.
gnome
Itiler
Posts: 9
Joined: Mon Mar 22, 2010 8:00 pm

Tue Jul 06, 2010 7:21 am

Diarmid wrote: Are your Ops team pulling your leg?

This question is utterly meaningless in respect to what constitutes a major incident. The numbers that matter are costs and risks both immediate and ongoing. Without that context it doesn't mean much whether one person is affected or a hundred.

Fix your incident management system and the rest will take care of itself.

Apologies for such a delayed reply, and thanks for your input. When you are supporting 12,000 clients from various business units, it is not an easy task to prioritize incidents based on costs and risks. Yes, I know that can never be an excuse... As you say, hopefully once the IM process is finalised this will all be sorted.

TIA
Last edited by gnome on Tue Jul 27, 2010 6:55 am, edited 1 time in total.
gnome
Itiler
Posts: 9
Joined: Mon Mar 22, 2010 8:00 pm

Tue Jul 06, 2010 7:24 am

xabit wrote: Fortunately our processes and call logging tool allow us to manage this effectively. [...]

Hi xabit, thanks for the reply. My query was more about whether it is necessary to classify a major incident as a Major Outage. The issue we have is that, most of the time when such MIs occur, they just get logged as normal incidents and are quickly resolved. As a result, we fail to capture and classify MIs as they occur, and the monthly Ops performance reporting looks rosy.
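One way to catch MIs that slip through as normal incidents is a periodic scan of the incident log for bursts of tickets against the same service. This is a hypothetical sketch, not something any poster describes: `find_candidate_mis` is invented, and the default of 5 tickets in 15 minutes only loosely mirrors the "5-10 people ring within the 15-minute outage window" example earlier in the thread. As Diarmid points out, call counts alone do not make something a major incident, so the output is a list for review, not an automatic reclassification.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def find_candidate_mis(
    incidents: List[Tuple[datetime, str]],
    threshold: int = 5,                        # hypothetical: tune to your own definition
    window: timedelta = timedelta(minutes=15),
) -> List[Tuple[str, datetime, int]]:
    """Flag bursts of incidents on one service as possible unrecorded MIs.

    incidents: (opened_at, service) pairs from the incident log.
    Returns (service, first ticket time, ticket count) for each burst found.
    """
    by_service: Dict[str, List[datetime]] = {}
    for opened_at, service in incidents:
        by_service.setdefault(service, []).append(opened_at)

    flagged = []
    for service, times in by_service.items():
        times.sort()
        i = 0
        while i < len(times):
            # Count tickets opened within `window` of the i-th ticket
            j = i
            while j < len(times) and times[j] - times[i] <= window:
                j += 1
            if j - i >= threshold:
                flagged.append((service, times[i], j - i))
                i = j  # skip past this burst so it is reported once
            else:
                i += 1
    return flagged


base = datetime(2010, 7, 6, 9, 0)
log = [(base + timedelta(minutes=m), "Email") for m in (0, 2, 4, 6, 9, 12)]
log += [(base + timedelta(minutes=m), "Web") for m in (0, 30, 60)]
print(find_candidate_mis(log))
```

Feeding the flagged bursts into the monthly report, alongside the MIs that were formally raised, would make the gap between the two visible.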