How to measure IT service availability

An open discussion on issues related directly or primarily to the service or help desk.
pawp
Newbie
Posts: 3
Joined: Sat Sep 11, 2010 8:00 pm

Sun Sep 12, 2010 6:24 am

In my SLAs for particular IT services I am going to propose metrics such as: Availability (as a percentage), Mean Time Between Failures, and Mean Time to Restore Service.
But now I am wondering how to identify that a service is unavailable.
I was thinking that a raised incident should mean that the service is unavailable or is offering decreased performance. But if only one user raises an incident and the service is working fine for all other users, does it really mean that the service is unavailable? Maybe the creation of a Major Incident (more users impacted) means that the service is unavailable?
Or maybe service unavailability should come from the Event Management process and therefore be independent of the incidents raised. What do you think?
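
For reference, here is a minimal sketch of how the three proposed metrics are commonly computed once the outage windows are known. The `Outage` record, the field names and the sample figures are assumptions for illustration only; deciding what counts as an outage is the open question above.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    # Hours since the start of the reporting period (hypothetical record layout).
    start_hour: float
    end_hour: float


def availability_metrics(agreed_service_hours, outages):
    """Return availability %, MTBF and MTRS for one reporting period."""
    downtime = sum(o.end_hour - o.start_hour for o in outages)
    failures = len(outages)
    uptime = agreed_service_hours - downtime
    return {
        "availability_pct": 100.0 * uptime / agreed_service_hours,
        # Mean uptime between failures; undefined when nothing failed.
        "mtbf_hours": uptime / failures if failures else None,
        # Mean time to restore service; undefined when nothing failed.
        "mtrs_hours": downtime / failures if failures else None,
    }


# Example: a 720-hour month with a 2-hour and a 4-hour outage.
metrics = availability_metrics(720.0, [Outage(100.0, 102.0), Outage(400.0, 404.0)])
print(metrics)
```

The function takes the agreed service time as an input rather than assuming 24x7, because the SLA decides which hours count.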


Diarmid
ITIL Expert
Posts: 1894
Joined: Mon Mar 03, 2008 7:00 pm
Location: Helensburgh

Mon Sep 13, 2010 3:52 am

pawp,

altogether an interesting question. Here is my tuppence worth.

Service availability is the concern of Availability Management, not Incident or Event Management. The processes for handling events, incidents and operational actions all need to inform Availability Management when they detect or cause service unavailability.

Service availability is as defined in your SLA. In other words, if your customer is concerned with partial or limited availability as well as complete unavailability (and what customer would not be?), then you have to define how to quantify these in the SLA.

However, you do have to be careful. The number of people affected is often a poor indicator of impact, because some people's roles are, at least some of the time, far more important to the business than others. Good customer service always means addressing customer business needs. You have to be wary of setting up measurements that will drive you to restore relatively less important service to many before restoring vital service to one or two individuals.

By the way, for me, Major Incident is not "more users impacted" but "more business impacted", in accordance with the comments above.
"Method goes far to prevent trouble in business: for it makes the task easy, hinders confusion, saves abundance of time, and instructs those that have business depending, both what to do and what to hope."
William Penn 1644-1718
tolman101
Senior Itiler
Posts: 44
Joined: Sun Sep 25, 2005 8:00 pm
Location: Sweden

Thu Sep 16, 2010 9:33 am

If your services have service levels defined and you are using an incident management (IM) tool that can assign service levels to a service, you may be able to get this information by generating reports from your IM tool.

You could perhaps decide that an incident with priority 1 or 2 means your service is either stopped or disrupted. For these types of incidents, the IM tool itself could inform you when the service agreement is breached, based on the time to resolve the incident and the levels stipulated in the SLA.
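
As a rough sketch of that idea, assuming your IM tool can export incidents with a priority and opened/resolved timestamps (the dictionary keys below are hypothetical, not any particular tool's export format), downtime could be derived like this. Overlapping priority 1 and 2 incidents are merged so the same outage is not counted twice.

```python
from datetime import datetime, timedelta


def downtime_from_incidents(incidents, outage_priorities=(1, 2)):
    """Sum the time covered by incidents whose priority marks an outage."""
    intervals = sorted(
        (i["opened"], i["resolved"])
        for i in incidents
        if i["priority"] in outage_priorities
    )
    total = timedelta()
    current_start = current_end = None
    for start, end in intervals:
        if current_end is None or start > current_end:
            # Gap before this incident: close the previous outage window.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlaps the current window: extend it instead of double counting.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


day = datetime(2010, 9, 16)
incidents = [
    {"priority": 1, "opened": day.replace(hour=9), "resolved": day.replace(hour=10)},
    {"priority": 2, "opened": day.replace(hour=9, minute=30), "resolved": day.replace(hour=11)},
    {"priority": 3, "opened": day.replace(hour=12), "resolved": day.replace(hour=13)},
    {"priority": 1, "opened": day.replace(hour=15), "resolved": day.replace(hour=15, minute=30)},
]
print(downtime_from_incidents(incidents))  # 2:30:00
```

Whether priority 1 and 2 are the right cut-off depends on how priorities are assigned in your organisation, so treat the default as a placeholder.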
Timo
ITIL Expert
Posts: 295
Joined: Thu Oct 25, 2007 8:00 pm
Location: Calgary, Canada

Fri Sep 17, 2010 11:30 am

Diarmid,

I would like to weigh in on service availability being a concern of Availability Management (AVM). First, I am not disputing the correctness of this statement; however, many organizations do not have a dedicated AVM process, whatever the reasons are.

So I don't find it totally inconceivable that AVM will be performed to a certain degree by folks who are concerned with managing incidents. While the more proactive elements of AVM will probably get overlooked, on the reactive side (identifying and reporting AVM issues) Incident Management can handle the task.

All I am saying is that one doesn't need to start a separate process in order to gain some of that process's benefits within the existing setup.


Diarmid
ITIL Expert
Posts: 1894
Joined: Mon Mar 03, 2008 7:00 pm
Location: Helensburgh

Sat Sep 18, 2010 5:21 am

Timo,

we need to keep clear the distinction between process and function. If a member of the incident team looks at availability (as an assigned role), then that person is contributing to availability management at that time and not to incident management. You still have a separate process for availability management even if you do not have a dedicated function. Otherwise you do not know what you are doing in that area.
Timo
ITIL Expert
Posts: 295
Joined: Thu Oct 25, 2007 8:00 pm
Location: Calgary, Canada

Mon Sep 20, 2010 11:30 am

Fair enough.
thechosenone69
ITIL Expert
Posts: 268
Joined: Tue Jun 05, 2007 8:00 pm

Sun Oct 03, 2010 5:57 pm

Interesting topic. In addition to what the others said:

Availability Management is all about Availability. If it helps, think about
Unavailability being concerned with significant business impact – is the service meeting the needs of the business as defined in the SLA? If not, then you have Unavailability.

This may seem obvious, and in some cases it is: the IT organisation
provides a service, this service has users, and when users can’t access
the service then it is Unavailable.

There are other factors to consider, particularly connected with Quality
of Service.

Suppose that you have a Sales Processing system that normally
takes 1-2 seconds to save a new order – and this is perfectly
acceptable. However, consider that the time to save a new order
shoots up to 60 seconds one day – does this constitute Unavailability?
If so, what if the transaction time is 10 seconds rather than 60 – is this
also Unavailability?

My advice is this: if the performance or quality of an IT service is degraded enough to cause significant business impact, you should treat the event as Unavailability (while providing evidence of the cost of Unavailability; I give the formula at the bottom). If it is not, then do not count it. In my opinion, Problem Management should work closely with Availability Management to identify the cost of Unavailability.
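
To make the Sales Processing example concrete, here is a small sketch that counts a minute as unavailable when the measured order-save time exceeds a threshold or the service gives no response. The per-minute sampling and the threshold values are assumptions; the real threshold is whatever the business and the SLA agree causes significant impact.

```python
def unavailable_minutes(samples, threshold_seconds):
    """Count per-minute samples that breach the agreed response time.

    `samples` holds one measured save time in seconds per minute;
    None stands for a minute with no response at all.
    """
    return sum(1 for s in samples if s is None or s > threshold_seconds)


# Normal saves take 1-2 seconds; one minute at 60 s, one at 10 s, one with no response.
samples = [1.5, 2.0, 60.0, 10.0, None, 1.8]

print(unavailable_minutes(samples, threshold_seconds=5.0))   # 3
print(unavailable_minutes(samples, threshold_seconds=10.0))  # 2
```

The two calls show why the threshold matters: a 10-second save counts as Unavailability under a 5-second limit but not under a 10-second one.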

Cost of Unavailability = (Downtime * Users affected * Average cost per user) + (Downtime * Lost business revenue per hour) + Overtime working costs + Sundry costs
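
The formula above can be sketched as a small function. The parameter names and sample figures are made up for illustration, and "average cost per user" is assumed here to be an hourly rate so that the units line up with downtime in hours.

```python
def cost_of_unavailability(
    downtime_hours,
    users_affected,
    avg_cost_per_user_hour,   # assumption: cost per user is expressed per hour
    lost_revenue_per_hour,
    overtime_costs=0.0,
    sundry_costs=0.0,
):
    """Apply the cost formula term by term."""
    lost_productivity = downtime_hours * users_affected * avg_cost_per_user_hour
    lost_revenue = downtime_hours * lost_revenue_per_hour
    return lost_productivity + lost_revenue + overtime_costs + sundry_costs


# Example: a 2-hour outage hitting 50 users at 30 per user-hour,
# 1000 per hour of lost revenue, 400 overtime and 100 sundry costs.
print(cost_of_unavailability(2, 50, 30, 1000, overtime_costs=400, sundry_costs=100))  # 5500
```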
Ali Makahleh
Configuration Management(Blue Badge),
ITILV2 Service Manager(Red Badge),
ITILV3 Expert(Lilac Badge) Certified.

"If you can't describe what you are doing as a process, you don't know what you're doing." W. Edwards Deming.