Setting Problem Management Goals

Discussion on issues related directly or largely to ITIL problem management.
DustinW07
Itiler
Posts: 12
Joined: Wed Aug 01, 2012 8:00 pm

Mon Feb 04, 2013 4:15 pm

My organization is revamping a number of its ITIL services (Incident, Problem, Capacity, and many more). With that said, how do you guys measure how effective your Problem Management processes are, and how do you set your Problem Management goals/expectations? Here are the key measurements we're focusing on and comparing to best practices to gauge the process's effectiveness:
-Average time to resolve problem (based on priority)
-Average time to identify root cause
-% of Problems meeting root cause target
-Average time to close record after resolution has been identified
-% of problems not solved
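
For anyone wanting to prototype these five measurements, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Problem` record and its field names, the per-priority root cause targets in `ROOT_CAUSE_TARGET_DAYS`, and the choice to use all logged problems as the denominator for both percentages (so a problem with no root cause yet counts as missing its target). A real tool export will look different.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Problem:
    """Hypothetical problem record; field names are illustrative only."""
    priority: int
    opened: datetime
    root_cause_at: Optional[datetime] = None
    resolved_at: Optional[datetime] = None
    closed_at: Optional[datetime] = None


# Assumed root cause targets per priority, in days (not a best-practice figure).
ROOT_CAUSE_TARGET_DAYS = {1: 5, 2: 10, 3: 20}


def mean_days(deltas):
    """Average a collection of timedeltas, in days; None if there are none."""
    deltas = list(deltas)
    if not deltas:
        return None
    return sum(d.total_seconds() for d in deltas) / len(deltas) / 86400


def problem_kpis(problems):
    """Compute the five measurements listed above for a list of Problem records."""
    total = len(problems)
    resolved = [p for p in problems if p.resolved_at]
    with_rc = [p for p in problems if p.root_cause_at]
    closed = [p for p in resolved if p.closed_at]
    met_target = [
        p for p in with_rc
        if p.root_cause_at - p.opened
        <= timedelta(days=ROOT_CAUSE_TARGET_DAYS[p.priority])
    ]
    return {
        "avg_days_to_resolve_by_priority": {
            pr: mean_days(p.resolved_at - p.opened
                          for p in resolved if p.priority == pr)
            for pr in sorted({p.priority for p in resolved})
        },
        "avg_days_to_root_cause": mean_days(
            p.root_cause_at - p.opened for p in with_rc),
        "pct_meeting_root_cause_target":
            100 * len(met_target) / total if total else None,
        "avg_days_resolution_to_close": mean_days(
            p.closed_at - p.resolved_at for p in closed),
        "pct_not_solved":
            100 * (total - len(resolved)) / total if total else None,
    }
```

Whether open problems belong in the denominators is a policy choice; changing it will move the percentages noticeably when the number of records is small.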

My dilemma, though, is going 1 step further. How do you set PM goals (Whether annually, 5-yr, etc.) to show how effective the Problem Management service is? I get that PM can lead to lower incident volume, decreased MTTR for incidents with an effective KEDB, increased system availability, etc....but how have you identified "HOW MUCH" improvement you'd expect to see in a given year? Is there a best-practice expectation?....2%, 5%, 10%, etc.

1) Do you have a goal for the Incident volume reduction that should occur with effective Problem Management (i.e. "With effective PM, we expect to see an X% reduction in incidents this year")? If so, what percentage does your company use for X?

2) Do you have a goal for the Tier-1 Incident MTTR reduction based on effective PM? (i.e. Tier-1 incident MTTR will be reduced by X% based on effective PM processes).

Etc.
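
To be clear about what "X%" means in both questions, this is the arithmetic, sketched in Python. The function name `pct_reduction` and the numbers in the usage line are made up for illustration; they are not targets anyone has recommended.

```python
def pct_reduction(baseline, current):
    """Percentage reduction from a baseline value (incident count, MTTR, ...).

    A negative result means the value went up rather than down.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100 * (baseline - current) / baseline


# Illustrative only: 200,000 incidents last year, 190,000 this year.
print(pct_reduction(200_000, 190_000))
```

The same function works for MTTR as long as both values use the same unit.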

Just trying to understand what types of specific goals & percentage improvements you guys anticipate as a result of effective PM processes. Any help would be greatly appreciated.

Thanks.


Diarmid
ITIL Expert
Posts: 1894
Joined: Mon Mar 03, 2008 7:00 pm
Location: Helensburgh

Tue Feb 05, 2013 8:45 am

Not an easy area. Problem analysis can be difficult or easy, the solutions can be protracted or simple to implement and these do not correlate well with the priority. Then the world your incidents live in keeps changing, leading to new problems.

Another aspect to look at is how pro-active your problem management is. So something like number (or percentage) of problems identified/analysed/resolved before they cause incidents might be useful.

Improvement figures cannot easily be taken from other environments. They depend too much on the level of maturity you start from, the volatility of the environment, how leading-edge the technologies in use are, and how much your organization is willing to invest in problem management.

In the end the performance of problem management has to be largely subjective based on experience, unless you are dealing with very large numbers.
"Method goes far to prevent trouble in business: for it makes the task easy, hinders confusion, saves abundance of time, and instructs those that have business depending, both what to do and what to hope."
William Penn 1644-1718

DustinW07
Itiler
Posts: 12
Joined: Wed Aug 01, 2012 8:00 pm

Wed Feb 06, 2013 11:00 am

Makes complete sense....Our Problem Management processes, and Incident Management processes for that matter, are not mature at this point. That's part of the reason why we're revamping our RUN services across our IT organization. I agree that a lot of Problem Management is subjective, so I'm trying to find some good measures to make it more objective in order to show its effectiveness. I'm hesitant to set up a "Goal" measuring the reduction in Incident volume because I think other variables play a part in increasing/decreasing incident volume, so the volume of incidents may not be a good indication of how effective the Problem Management processes are. Here's where I do think I could accurately measure the effectiveness:

1) Number of Resolved Problem Records year-over-year
2) Number of incidents permanently prevented as a result of resolved Problems - I would track this by accounting for the reported Incident volume that was initially related to the Problem record, and not try to predict what "could have been" had we left it unresolved, since there's really no way to predict this.
3) Percent of Problems logged as a result of proactive Problem management.
4) Decrease in the number of recurring incidents
5) Percent of Problems that have workarounds available.

What are your thoughts on those?
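
As a rough illustration of measures 1, 2, 3 and 5, here is a Python sketch. The dictionary keys (`resolved`, `proactive`, `workaround_documented`, `linked_incidents`) are hypothetical stand-ins for whatever the tool actually exports. Measure 2 is counted the way described above: only the incidents already reported against a resolved Problem record, with no projection of what "could have been". Measure 4 is left out because it needs incident-level data across years.

```python
def effectiveness_summary(problems):
    """Summarise a list of problem records given as dicts.

    Assumed keys: 'resolved', 'proactive', 'workaround_documented' (bools)
    and 'linked_incidents' (int, incidents reported against the record).
    """
    total = len(problems)
    resolved = [p for p in problems if p["resolved"]]

    def pct(count):
        # Percentage of all logged problems, rounded to one decimal place.
        return round(100 * count / total, 1) if total else None

    return {
        # Measure 1: resolved problem records (compare year over year).
        "resolved_problems": len(resolved),
        # Measure 2: incident volume already linked to resolved problems.
        "incidents_linked_to_resolved": sum(
            p["linked_incidents"] for p in resolved),
        # Measure 3: share of problems logged proactively.
        "pct_proactive": pct(sum(p["proactive"] for p in problems)),
        # Measure 5: share of problems with a documented workaround.
        "pct_with_workaround": pct(
            sum(p["workaround_documented"] for p in problems)),
    }
```

Running it once per year on that year's records gives the year-over-year comparison for measure 1.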

Diarmid
ITIL Expert
Posts: 1894
Joined: Mon Mar 03, 2008 7:00 pm
Location: Helensburgh

Thu Feb 07, 2013 5:13 pm

These stats only work if you have a large infrastructure with many applications and hence many problems. Even then they have to be treated with caution:

1) Has to be linked with unresolved problems and/or with problems open for longer than anticipated. But you also have to distinguish problems where the analysis and proposal is complete but the solution is out of your hands.

2) This should be incident groups and has to take into account the severity and likelihood of the incidents.

3) The numbers have to be large to use % and again what about severity? It's more important to prevent three disasters than fifty inconveniences. Also might look at ones that could have been spotted pro-actively (to guide improvement actions).

4) In a big complex organization this may not happen much. Although it can if you have low maturity at the start.

5) Almost all problems have a work-around of some sort (even if it is just to restart the system or perform some task by other means). Unless you have a way to measure the quality of the work-around you are not measuring anything useful.
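
The severity point in 2) and 3) can be illustrated with a weighted count instead of a raw count. This is only a sketch: the severity names and the weights in `SEVERITY_WEIGHT` are invented for the example, and any real weighting would have to be agreed with the business.

```python
# Hypothetical weights: how much one prevented incident of each severity counts.
SEVERITY_WEIGHT = {"disaster": 100, "major": 10, "inconvenience": 1}


def weighted_prevention_score(prevented):
    """Sum prevented incident counts, weighted by severity.

    prevented: dict mapping a severity name to the number of incidents
    (or incident groups) prevented at that severity.
    """
    return sum(SEVERITY_WEIGHT[sev] * count for sev, count in prevented.items())


three_disasters = weighted_prevention_score({"disaster": 3})
fifty_inconveniences = weighted_prevention_score({"inconvenience": 50})
print(three_disasters, fifty_inconveniences)
```

With these assumed weights the three disasters score higher than the fifty inconveniences, which a plain count would rank the other way round.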

Also you sh/could measure cost against benefit.

In a volatile environment, some years you will do well to just keep pace unless you have an infinite budget.
"Method goes far to prevent trouble in business: for it makes the task easy, hinders confusion, saves abundance of time, and instructs those that have business depending, both what to do and what to hope."
William Penn 1644-1718

DustinW07
Itiler
Posts: 12
Joined: Wed Aug 01, 2012 8:00 pm

Fri Feb 08, 2013 10:32 am

Thanks Diarmid. Our IT organization handles anywhere from 190,000-210,000 incidents a year, so I think the volume will give us plenty of opportunities for Problem Management improvements. :D

I don't disagree with your points, but perhaps I can add additional clarity:

1) Within our tool, we have the ability to identify the number of problems that have been resolved, as well as the number of problems that have gone through an analysis but may not get "resolved" based on things like: cost of the solution, solution requirements, likelihood/impact of the incident, etc., so we will be able to determine the number of records that have been fully resolved, and the number that have gone through the analysis but may not be "Resolved" based on other circumstances.

2 & 3) Within each of those metrics, we would break the numbers down by incident severity, so this would be covered. I don't disagree that a disaster outweighs inconveniences, but over time, the volume of inconveniences adds up & makes our shop inefficient, because our time is spent firefighting repeat inconvenience incidents. Our company, today, spends way too much time firefighting repeat incidents & we don't look for fire prevention opportunities.

5) Almost all problems have a work-around of some sort, but our company is awful at documenting this knowledge, which means even "Known" workarounds aren't available to a vast majority of our support teams because it's in somebody's head. The implementation of our KEDB should help with this of course, and although I wouldn't expect the % of problems w/ known workarounds to vary significantly from year-to-year, I think it has value in showing where we're inefficient in our communication to the incident groups. I'd consider this metric more of a benchmark.

Good points, though. Thanks for your feedback.

KenLuo
Senior Itiler
Posts: 55
Joined: Fri Nov 02, 2012 8:00 pm
Location: Singapore

Sat Feb 09, 2013 1:25 pm

The goals I set for my problem team:

1) Reduced number of incidents. With the problem team in place, I expect to see fewer incidents, or at least a downward trend; otherwise, why would I need a problem team?

2) Reduced number of downtime or performance degradation cases.

3) Improved service restoration time.

4) Reduced recurring issues.

I think problems are not easy to address, and sometimes you need external vendor support, so it does not make sense to measure fix time for the problem team the way we do for the incident team.
Luo, Tian-Hong (Ken)
Regional Operation Lead

ITIL Expert Certified

KenLuo
Senior Itiler
Posts: 55
Joined: Fri Nov 02, 2012 8:00 pm
Location: Singapore

Sat Feb 09, 2013 1:28 pm

DustinW07 wrote: Makes complete sense....Our Problem Management processes, and Incident Management processes for that matter, are not mature at this point. That's part of the reason why we're revamping our RUN services across our IT organization. I agree that a lot of Problem Management is subjective, so I'm trying to find some good measures to make it more objective in order to show its effectiveness. I'm hesitant to set up a "Goal" measuring the reduction in Incident volume because I think other variables play a part in increasing/decreasing incident volume, so the volume of incidents may not be a good indication of how effective the Problem Management processes are. Here's where I do think I could accurately measure the effectiveness:

1) Number of Resolved Problem Records year-over-year
2) Number of incidents permanently prevented as a result of resolved Problems - I would track this by accounting for the reported Incident volume that was initially related to the Problem record, and not try to predict what "could have been" had we left it unresolved, since there's really no way to predict this.
3) Percent of Problems logged as a result of proactive Problem management.
4) Decrease in the number of recurring incidents
5) Percent of Problems that have workarounds available.

What are your thoughts on those?
My thought:
#1, #3, #5 not meaningful.
#2 not easy to measure or capture.
Luo, Tian-Hong (Ken)
Regional Operation Lead

ITIL Expert Certified