Pro-active problem management techniques

Discussion on issues related directly or largely to ITIL problem management.
abu1
Senior Itiler
Posts: 30
Joined: Tue Oct 25, 2011 8:00 pm

Wed Apr 25, 2012 11:40 am

Apart from looking at trends and reports, what are other good ways to proactively look for problems?


UKVIKING
ITIL Expert
Posts: 3639
Joined: Fri Sep 15, 2006 8:00 pm
Location: London, UK

Wed Apr 25, 2012 12:51 pm

Hmmm

If you have Microsoft as an operating system,
you will have problems.

Make sure your OS and IOS patch management process is keeping the systems up to date
Find single points of failure
Make sure the DR plan is fit for purpose
Review the incident tickets for patterns of outages
John Hardesty
ITSM Manager's Certificate (Red Badge)

Change Management is POWER & CONTROL. /....evil laughter
elbow
Senior Itiler
Posts: 36
Joined: Sun Feb 13, 2011 7:00 pm

Thu Apr 26, 2012 2:48 am

Set up a weekly meeting with the engineers, the people at the sharp end. They can give a lot more insight than reams of incidents. Also, if the service desk is not proficient, the categorisation of incidents may be incorrect and even the closure details wrong, which throws your trending into confusion.
Diarmid
ITIL Expert
Posts: 1894
Joined: Mon Mar 03, 2008 7:00 pm
Location: Helensburgh

Thu Apr 26, 2012 4:54 am

If you find that the categorization of incidents is not correct or reliable, then you should immediately open a problem to investigate and resolve that issue and this should have a high priority.

Looking for patterns amongst incidents is more often going to identify problems than merely looking at trends, and while talking to staff groups working at the sharp end can be rewarding, it will be much more so if you are armed with patterns and trends you have discerned for yourself.
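As an illustration of looking for patterns rather than trends, here is a minimal Python sketch that flags the same symptom recurring on the same configuration item within a rolling window, something a simple incident-volume trend would not reveal. The record fields (`ci`, `symptom`, `opened`), the function name `find_recurring_patterns`, the 30-day window and the threshold of three are all hypothetical assumptions, not taken from any particular ITSM tool; real records would come from an export of your incident tickets.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical incident export: field names and values are illustrative assumptions.
incidents = [
    {"id": "INC001", "ci": "srv-mail-01", "symptom": "service unavailable", "opened": datetime(2012, 4, 2, 9, 15)},
    {"id": "INC002", "ci": "srv-web-02", "symptom": "slow response", "opened": datetime(2012, 4, 3, 14, 0)},
    {"id": "INC003", "ci": "srv-mail-01", "symptom": "service unavailable", "opened": datetime(2012, 4, 10, 8, 40)},
    {"id": "INC004", "ci": "srv-db-01", "symptom": "disk full", "opened": datetime(2012, 1, 5, 10, 0)},
    {"id": "INC005", "ci": "srv-mail-01", "symptom": "service unavailable", "opened": datetime(2012, 4, 18, 11, 5)},
    {"id": "INC006", "ci": "srv-db-01", "symptom": "disk full", "opened": datetime(2012, 3, 20, 10, 0)},
    {"id": "INC007", "ci": "srv-db-01", "symptom": "disk full", "opened": datetime(2012, 4, 24, 10, 0)},
]


def find_recurring_patterns(incidents, min_count=3, window=timedelta(days=30)):
    """Return {(ci, symptom): peak count} for pairs that recur at least
    min_count times within any rolling window of the given length."""
    by_pair = defaultdict(list)
    for inc in incidents:
        by_pair[(inc["ci"], inc["symptom"])].append(inc["opened"])

    patterns = {}
    for pair, times in by_pair.items():
        times.sort()
        start = peak = 0
        # Slide a window over the sorted timestamps, keeping the largest cluster.
        for end, t in enumerate(times):
            while t - times[start] > window:
                start += 1
            peak = max(peak, end - start + 1)
        if peak >= min_count:
            patterns[pair] = peak
    return patterns


for (ci, symptom), count in find_recurring_patterns(incidents).items():
    print(f"{ci}: '{symptom}' logged {count} times within 30 days")
# prints: srv-mail-01: 'service unavailable' logged 3 times within 30 days
```

Note that srv-db-01 shows three "disk full" incidents in total but is not flagged with these settings, because no two of them fall within 30 days of each other; widening the window or lowering the threshold would bring it in, which is exactly the kind of judgement the weekly chat with engineers can inform.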
"Method goes far to prevent trouble in business: for it makes the task easy, hinders confusion, saves abundance of time, and instructs those that have business depending, both what to do and what to hope."
William Penn 1644-1718
abu1
Senior Itiler
Posts: 30
Joined: Tue Oct 25, 2011 8:00 pm

Thu Apr 26, 2012 5:27 am

UKVIKING wrote:Hmmm

if you have microsoft as an operating system
you will have problems

Our company has recently completed a project to deploy SCOM (System Center Operations Manager), a Microsoft tool that provides real-time alerts on the condition of servers and hardware such as HDDs and CPUs, as well as on software issues such as conflicts.

I agree: if incidents are not categorized properly, trend analysis based on incident reports becomes hard.
elbow
Senior Itiler
Posts: 36
Joined: Sun Feb 13, 2011 7:00 pm

Thu Apr 26, 2012 5:59 am

Well, I am at present looking at a cross-section of incident details over time to build a case for the mis-categorisation. In fact, because of the service desk's immaturity, it has sometimes been more effective in my case to talk to other functions.
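A minimal sketch of building that kind of case, assuming the ITSM tool records both the category chosen when an incident was logged and the category recorded at closure. The field names `logged_category` and `closure_category`, the helper functions and the sample data are hypothetical. A high or rising rate of re-categorisation suggests, but does not prove, that categorisation at the service desk is unreliable.

```python
from collections import Counter
from datetime import date

# Hypothetical sample: the category chosen when the incident was logged
# and the category recorded at closure. Field names are assumptions.
incidents = [
    {"id": "INC101", "opened": date(2012, 2, 6), "logged_category": "Network", "closure_category": "Network"},
    {"id": "INC102", "opened": date(2012, 2, 13), "logged_category": "Hardware", "closure_category": "Software"},
    {"id": "INC103", "opened": date(2012, 2, 20), "logged_category": "Software", "closure_category": "Software"},
    {"id": "INC104", "opened": date(2012, 2, 27), "logged_category": "Network", "closure_category": "Network"},
    {"id": "INC105", "opened": date(2012, 3, 5), "logged_category": "Hardware", "closure_category": "Software"},
    {"id": "INC106", "opened": date(2012, 3, 12), "logged_category": "Email", "closure_category": "Email"},
]


def recategorisation_rate_by_month(incidents):
    """Fraction of incidents per month whose closure category differs
    from the category assigned when they were logged."""
    totals, changed = Counter(), Counter()
    for inc in incidents:
        month = inc["opened"].strftime("%Y-%m")
        totals[month] += 1
        if inc["logged_category"] != inc["closure_category"]:
            changed[month] += 1
    return {month: changed[month] / totals[month] for month in sorted(totals)}


def common_recategorisations(incidents, top=5):
    """Most frequent (logged, closure) category pairs that differ."""
    pairs = Counter(
        (inc["logged_category"], inc["closure_category"])
        for inc in incidents
        if inc["logged_category"] != inc["closure_category"]
    )
    return pairs.most_common(top)


print(recategorisation_rate_by_month(incidents))  # {'2012-02': 0.25, '2012-03': 0.5}
print(common_recategorisations(incidents))        # [(('Hardware', 'Software'), 2)]
```

The second function is the more persuasive evidence: a recurring pair such as Hardware logged but Software closed points at a specific gap in service desk guidance rather than general sloppiness.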
UKIT
Senior Itiler
Posts: 50
Joined: Tue Sep 25, 2007 8:00 pm
Location: England

Fri Apr 27, 2012 9:31 am

abu1 wrote:Apart from looking at trends and reports, what are other good ways to proactively look for problems?
I was involved with an incident resulting in the loss of IT services due to a simple server configuration oversight.
A server designed to accommodate dual hot-pluggable redundant power supplies had only been fitted with one.
The power supply failed, resulting in the loss of all IT services being hosted from this server.
A new power supply had to be sourced as a matter of urgency in order to restore full IT services.
When I was assigned to establish the root cause, it didn’t take long to see how easily this simple “oversight” could have been avoided.
With a dual power supply configuration, a single power supply can fail without disrupting the live service.
A project was instigated to survey the complete datacentre and identify single points of failure (SPOF).
The project also looked at servers configured with a single network card where multi-homed, fault-tolerant network card configurations could be implemented.
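A survey like this can be partly automated if the asset inventory records each server's hardware configuration. The sketch below is a hypothetical illustration: the `Server` fields (`psu_slots`, `psus_installed`, `nics_installed`, `services`) and the `find_spofs` helper are assumptions about what such an inventory might hold, not the schema of any real CMDB. It flags servers running on one power supply or one network card and lists first those hosting the most services, since a failure there hurts most.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    # Hypothetical inventory fields; a real CMDB or asset register will differ.
    name: str
    psu_slots: int
    psus_installed: int
    nics_installed: int
    services: list[str] = field(default_factory=list)


def find_spofs(servers):
    """Return (name, issues) for servers with a single power supply or
    network card, most heavily loaded servers first."""
    findings = []
    for s in servers:
        issues = []
        if s.psus_installed < 2:
            if s.psu_slots >= 2:
                issues.append("redundant PSU slot empty")
            else:
                issues.append("single PSU, no redundant slot")
        if s.nics_installed < 2:
            issues.append("single network card")
        if issues:
            findings.append((s, issues))
    # Stable sort: servers hosting more services come first, ties keep input order.
    findings.sort(key=lambda item: len(item[0].services), reverse=True)
    return [(s.name, issues) for s, issues in findings]


inventory = [
    Server("app-01", psu_slots=2, psus_installed=1, nics_installed=2, services=["HR", "Payroll", "Intranet"]),
    Server("db-01", psu_slots=2, psus_installed=2, nics_installed=1, services=["Finance DB"]),
    Server("web-01", psu_slots=2, psus_installed=2, nics_installed=2, services=["Website"]),
    Server("legacy-01", psu_slots=1, psus_installed=1, nics_installed=1, services=["Print"]),
]

for name, issues in find_spofs(inventory):
    print(name, "->", "; ".join(issues))
# app-01 -> redundant PSU slot empty
# db-01 -> single network card
# legacy-01 -> single PSU, no redundant slot; single network card
```

The "redundant PSU slot empty" case is the one described above: the hardware already supports redundancy, so fitting the second unit is usually the cheapest fix on the list.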
Service Transition & IT Project Management