How to measure MTBF or MTBSI

DanA
Itiler
Posts: 27
Joined: Sun Feb 11, 2007 7:00 pm
Location: Minneapolis, MN, USA

Fri Jan 28, 2011 4:51 pm

How do you measure Mean Time Between Failures (MTBF) or Mean Time Between System Interruptions (MTBSI)?

If you have failures on Jan 1, Feb 1, and March 1, you can say you have a 1-month MTBF. But what happens if it is now August and you haven't had a failure since? Do you still report a 1-month mean, even though the last failure was 5 months ago? Also, how far back do you go for the mean calculation? 12 months? The beginning of the calendar year?


thechosenone69
ITIL Expert
Posts: 268
Joined: Tue Jun 05, 2007 8:00 pm

Fri Jan 28, 2011 6:59 pm

DanA,

MTBF is the mean elapsed time from the time an IT service or component is fully restored until the next occurrence of a failure in the same service or component.

As for MTBSI (Mean Time Between System Incidents), it is a metric used for measuring and reporting reliability. MTBSI is the mean time from when a system or IT service fails until it next fails. MTBSI is equal to MTBF + MTRS, where MTRS is the Mean Time to Restore Service.
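
To make the relationship between the three metrics concrete, here is a minimal Python sketch. The function name `reliability_metrics` and the outage timestamps are invented for illustration. It assumes each outage is logged with a failure time and a restoration time, and it averages only over complete failure-to-failure cycles, so the downtime of the most recent outage is left out of MTRS. That is one possible convention, not the only one.

```python
from datetime import datetime, timedelta

def reliability_metrics(outages):
    """Return (MTBF, MTRS, MTBSI) as timedeltas.

    outages: list of (failed_at, restored_at) datetime pairs, sorted by time.
    At least two outages are needed, because MTBF and MTBSI are measured
    between one failure and the next.
    """
    if len(outages) < 2:
        raise ValueError("need at least two outages")
    n = len(outages) - 1  # number of complete failure-to-failure cycles
    # Uptime: from restoration of one outage to the start of the next.
    uptimes = [outages[i + 1][0] - outages[i][1] for i in range(n)]
    # Downtime: only outages that begin a complete cycle (assumed convention).
    downtimes = [restored - failed for failed, restored in outages[:-1]]
    # Full cycle: from one failure to the next failure.
    cycles = [outages[i + 1][0] - outages[i][0] for i in range(n)]
    mtbf = sum(uptimes, timedelta()) / n
    mtrs = sum(downtimes, timedelta()) / n
    mtbsi = sum(cycles, timedelta()) / n
    return mtbf, mtrs, mtbsi

# Made-up example: three outages, each starting at midnight.
outages = [
    (datetime(2011, 1, 1, 0), datetime(2011, 1, 1, 4)),
    (datetime(2011, 2, 1, 0), datetime(2011, 2, 1, 2)),
    (datetime(2011, 3, 1, 0), datetime(2011, 3, 1, 1)),
]
mtbf, mtrs, mtbsi = reliability_metrics(outages)
print("MTBF:", mtbf, "MTRS:", mtrs, "MTBSI:", mtbsi)
```

With this convention, MTBF + MTRS adds up exactly to MTBSI for the same set of cycles.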

Regards,
Ali Makahleh
Configuration Management(Blue Badge),
ITILV2 Service Manager(Red Badge),
ITILV3 Expert(Lilac Badge) Certified.

“If you can't describe what you are doing as a process, you don't know what you're doing.” W. Edwards Deming.
DanA
Itiler
Posts: 27
Joined: Sun Feb 11, 2007 7:00 pm
Location: Minneapolis, MN, USA

Mon Jan 31, 2011 11:12 am

Sure, I understand the definitions. I hope you can see where I have a problem. If there isn't a "next occurrence", how do you calculate a mean? Or if your previous mean was 1 month, but we have yet to have the next occurrence several months later, how does that play in the calculation? Please see my example.
UKVIKING
ITIL Expert
Posts: 3639
Joined: Fri Sep 15, 2006 8:00 pm
Location: London, UK

Mon Jan 31, 2011 11:48 am

DanA

When there is only one incident, I would do the following.

As I use Excel:

The date of the first (only) incident goes in B2.
In C2, I would use the function NOW() to get the current date/time and add 0.25 (1/4 of a day). I would then have two distinct columns:

1 - Number of days since last incident (Failure)
2 - MTBF

Since the two columns would show the same number of days, this visually indicates that there has not been another incident.
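
For anyone not working in a spreadsheet, the same two-column idea can be sketched in Python. This is only an illustration: the function name `single_incident_columns` and the dates are made up, and the quarter-day offset is left out because its purpose isn't stated above.

```python
from datetime import datetime

def single_incident_columns(incident, now):
    """Mimic the two spreadsheet columns when only one incident exists."""
    days_since = (now - incident).total_seconds() / 86400  # seconds per day
    return {
        "days_since_last_incident": days_since,
        # With a single incident there is no closed interval yet, so the
        # MTBF column simply mirrors the elapsed time.
        "mtbf_days": days_since,
    }

# Made-up dates: one incident on Jan 1, report run on Jan 29.
row = single_incident_columns(datetime(2011, 1, 1), datetime(2011, 1, 29))
print(row)
```

Equal values in both columns are the visual cue that no second incident has occurred.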
John Hardesty
ITSM Manager's Certificate (Red Badge)

Change Management is POWER & CONTROL. /....evil laughter
Diarmid
ITIL Expert
Posts: 1894
Joined: Mon Mar 03, 2008 7:00 pm
Location: Helensburgh

Mon Jan 31, 2011 2:38 pm

DanA wrote: If there isn't a "next occurrence", how do you calculate a mean?
It is always the case that the next occurrence has not happened. It has nothing to do with how long it has been since the last one. Any calculation using the time since the last incident is really saying: 'if we had an incident today, the mean time would be x'.

Since the incident has not occurred, this is pretty meaningless until x is as great as the mean time calculated up to the last incident. From then on, x is relevant, as it indicates some notion of improvement; whether fortuitous or otherwise is another matter.
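
Applied to DanA's example, the reasoning above might look like the following Python sketch. The function name `mtbf_report` is hypothetical, and the sketch treats failures as instantaneous (restoration time is ignored), which is a simplifying assumption.

```python
from datetime import datetime, timedelta

def mtbf_report(failures, now):
    """failures: sorted failure datetimes (at least two); now: reporting time.

    Returns (closed_mean, hypothetical_mean, is_relevant).
    """
    gaps = [b - a for a, b in zip(failures, failures[1:])]
    total = sum(gaps, timedelta())
    # Mean over completed failure-to-failure intervals only.
    closed_mean = total / len(gaps)
    # x: the mean we would get if a failure happened right now.
    since_last = now - failures[-1]
    hypothetical_mean = (total + since_last) / (len(gaps) + 1)
    # x only says something once it has caught up with the closed mean.
    return closed_mean, hypothetical_mean, hypothetical_mean >= closed_mean

failures = [datetime(2011, 1, 1), datetime(2011, 2, 1), datetime(2011, 3, 1)]
closed, x, relevant = mtbf_report(failures, datetime(2011, 8, 1))
print(closed, x, relevant)
```

The hypothetical mean catches up with the closed mean exactly when the time since the last failure reaches the closed mean, so ten days after the March failure x would still be below it and not worth reporting.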

As to how far back you go: well, you go back to the beginning of time. At least, you go back far enough to satisfy your customer, and far enough to give you valid information with which to analyse your service record and design improvement goals. Whether that is a few months or a few years probably has more to do with how much your services and service objectives change over time than with some artificial concept like a year.
"Method goes far to prevent trouble in business: for it makes the task easy, hinders confusion, saves abundance of time, and instructs those that have business depending, both what to do and what to hope."
William Penn 1644-1718
DanA
Itiler
Posts: 27
Joined: Sun Feb 11, 2007 7:00 pm
Location: Minneapolis, MN, USA

Wed Feb 02, 2011 10:51 am

I like this answer, and I'll follow up with another question: is there a better measurement? It seems that Availability combined with MTBSI gives a decent view of your overall % uptime and also adds a measure of stability. Is anyone doing anything different?