Problem management is a core component of the Information Technology Infrastructure Library (ITIL). ITIL defines a problem as the underlying cause of one or more incidents. When incidents occur, users report them so your team can restore service. You’ll then need to identify and eliminate the underlying problem, which is easier said than done.
Problem management can be time-consuming and can easily take over your team’s workload. And if you don’t have the right metrics, it can be difficult for you to make improvements.
The solution is to gather and understand key ITIL problem management metrics. Here’s a quick guide to the most important IT problem management metrics and key performance indicators (KPIs) along with some best practices for creating an efficient system for identifying problems and quickly addressing incidents.
KPIs are only one element of an efficient IT support and management system. Keeping IT support services at a consistently high level also requires software adapted to the needs of the organization.
If you’re looking for a top tool to help you with IT problem management, consider SolarWinds® Service Desk, which is designed to help streamline your process.
What’s the Difference Between an ITSM Problem and an Incident?
On the surface, “problem” and “incident” might appear to mean the same thing. But in ITIL, the terms describe two different parts of the problem management process. Understanding important IT problem management metrics requires addressing both the problem and the related incidents.
The term “incident” refers to an unplanned disruption to, or reduction in the quality of, an organization’s IT service. The “problem” is the cause of that disruption, so there’s a cause-and-effect relationship between the problem and the incident. For example, if an employee can’t log in to an application and reports the issue to IT, they’ve reported an incident. After IT gets the ticket, the team investigates the incident to find its root cause. That root cause is the problem.
Investigating, addressing, and documenting the problem behind an incident falls under the purview of problem management. Essentially, problem management is the way IT administrators address the causes of incidents instead of simply treating the symptoms (the incidents themselves). Ensuring a streamlined process for treating these causes first requires a clear set of ITIL problem management metrics: quantifiable measures for tracking and monitoring the processes used to investigate problems.
What Are Some of the Most Common IT Problem Management Metrics and KPIs?
The following ITIL problem management metrics are some of the most common ones you’ll want to consider as part of your best practices.
Time to Resolution
This metric refers to the time between the report of an incident and its resolution. Short resolution times, combined with low reopen rates, are the KPIs to aim for with this metric.
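As a minimal sketch of how resolution time might be averaged, the snippet below assumes each ticket is stored as a pair of timestamps (reported, resolved). The sample data and the `mean_time_to_resolution` helper are hypothetical and not tied to any particular service desk tool.

```python
from datetime import datetime, timedelta

# Hypothetical ticket records: (reported_at, resolved_at).
tickets = [
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 11, 0)),   # 2 hours
    (datetime(2024, 3, 1, 10, 0), datetime(2024, 3, 1, 16, 0)),  # 6 hours
    (datetime(2024, 3, 2, 8, 0), datetime(2024, 3, 2, 9, 0)),    # 1 hour
]

def mean_time_to_resolution(records):
    """Average time between report and resolution across resolved tickets."""
    durations = [resolved - reported for reported, resolved in records]
    # sum() needs a timedelta start value to add timedeltas together.
    return sum(durations, timedelta()) / len(durations)

mttr = mean_time_to_resolution(tickets)
print(mttr)  # 3:00:00
```

A plain mean is sensitive to a few very long-running tickets, so a median or percentile may be worth tracking alongside it.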
First-Contact Resolution Rate
This refers to the percentage of incidents resolved on the first contact and never reopened. It’s an important problem management metric and KPI, since a consistently high percentage indicates a well-functioning service desk.
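One way to compute this percentage is sketched below. The `Ticket` fields (`contacts`, `reopened`) are assumptions about what a ticketing export might contain, not a real schema.

```python
from dataclasses import dataclass

@dataclass
class Ticket:
    ticket_id: str
    contacts: int    # interactions needed before the ticket was resolved
    reopened: bool   # True if the ticket was reopened after being closed

def first_contact_resolution_rate(tickets):
    """Percentage of tickets resolved on the first contact and never reopened."""
    if not tickets:
        return 0.0
    first_contact = sum(1 for t in tickets if t.contacts == 1 and not t.reopened)
    return 100.0 * first_contact / len(tickets)

sample = [
    Ticket("T-1", contacts=1, reopened=False),
    Ticket("T-2", contacts=3, reopened=False),
    Ticket("T-3", contacts=1, reopened=True),
    Ticket("T-4", contacts=1, reopened=False),
]
fcr = first_contact_resolution_rate(sample)
print(f"{fcr:.1f}%")  # 50.0%
```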
Service-Level Agreement Compliance Ratio
The service-level agreement (SLA) compliance ratio is the share of incidents resolved within the targets the SLA sets for response time, workflow prioritization, and cost, among other elements. A high compliance ratio is key when it comes to problem management SLA metrics because it’s a sign the team is meeting its commitments and productivity is high.
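The sketch below illustrates the ratio for resolution time only. The per-priority targets in `SLA_TARGETS` are made-up values for illustration; real targets come from your own agreements.

```python
from datetime import timedelta

# Assumed resolution targets per priority (illustrative values only).
SLA_TARGETS = {
    "high": timedelta(hours=4),
    "medium": timedelta(hours=24),
    "low": timedelta(hours=72),
}

def sla_compliance_ratio(resolved):
    """Fraction of resolved incidents whose resolution time met its target."""
    if not resolved:
        return 0.0
    met = sum(1 for priority, took in resolved if took <= SLA_TARGETS[priority])
    return met / len(resolved)

resolved_incidents = [
    ("high", timedelta(hours=3)),     # within the 4-hour target
    ("high", timedelta(hours=6)),     # breach
    ("medium", timedelta(hours=20)),  # within the 24-hour target
    ("low", timedelta(hours=80)),     # breach
]
ratio = sla_compliance_ratio(resolved_incidents)
print(f"{ratio:.0%}")  # 50%
```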
Active Tickets
Tickets are reported incidents waiting to be resolved. It’s important to understand not only how many tickets are open at a given time but also how many go unresolved or get reopened. Managing and documenting active tickets helps reduce the amount of time spent resolving each incident.
Cost Per Ticket
This metric measures how much money is spent resolving incident reports, or tickets. By better understanding how much each incident costs, an organization can better set budgets for hiring service desk team members and identify cost-efficient problem management solutions.
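A simple way to estimate this figure is to divide total service desk spending for a period by the number of tickets resolved in that period. The sketch below assumes that definition; the figures are invented for illustration.

```python
def cost_per_ticket(total_support_cost, tickets_resolved):
    """Average spend per resolved ticket over a reporting period."""
    if tickets_resolved <= 0:
        raise ValueError("tickets_resolved must be positive")
    return total_support_cost / tickets_resolved

# Hypothetical monthly figures.
monthly_cost = 42_000.00   # salaries, licenses, and overhead for the period
monthly_tickets = 1_200

cpt = cost_per_ticket(monthly_cost, monthly_tickets)
print(f"${cpt:.2f} per ticket")  # $35.00 per ticket
```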
Reopen Rate
Tickets initially marked as resolved but revisited after resolution contribute to the reopen rate. A high reopen rate could indicate additional training is needed or there are problems with hardware, software, or applications.
Incidents by Department
It’s important to document which departments tickets are coming from. Tracking which departments are experiencing the highest number of incidents helps IT teams uncover gaps in service and ensure departments are getting adequate coverage.
Type of Incident
Categorize incidents by the devices or applications affected. Paying careful attention to which devices and applications are most prone to problems can provide guidance as to which skills and training programs teams need.
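Both breakdowns, by department and by incident type, come down to counting tickets per label. Here is a minimal sketch using Python’s `collections.Counter`; the `department` and `category` field names are assumptions about the ticket data.

```python
from collections import Counter

# Hypothetical incident records exported from a ticketing system.
incidents = [
    {"department": "Finance", "category": "VPN"},
    {"department": "Finance", "category": "Email"},
    {"department": "Sales", "category": "VPN"},
    {"department": "Finance", "category": "VPN"},
    {"department": "HR", "category": "Laptop"},
]

by_department = Counter(i["department"] for i in incidents)
by_category = Counter(i["category"] for i in incidents)

# most_common(1) returns the single most frequent label with its count.
print(by_department.most_common(1))  # [('Finance', 3)]
print(by_category.most_common(1))    # [('VPN', 3)]
```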
Incidents Associated With Known Problems
Tracking tickets based on the problems they’re associated with is important for understanding the overall health of systems and can help organizations understand when major repairs or updates are warranted.
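As a rough illustration, incidents can be grouped by the problem record they are linked to, and problems that keep generating incidents can be flagged for review. The ID formats and the threshold of three incidents are arbitrary assumptions.

```python
from collections import defaultdict

# Hypothetical (incident_id, problem_id) links; None means no known problem.
links = [
    ("INC-1", "PRB-7"),
    ("INC-2", "PRB-7"),
    ("INC-3", None),
    ("INC-4", "PRB-9"),
    ("INC-5", "PRB-7"),
]

def incidents_per_problem(pairs):
    """Map each known problem to the incidents linked to it."""
    grouped = defaultdict(list)
    for incident_id, problem_id in pairs:
        if problem_id is not None:
            grouped[problem_id].append(incident_id)
    return dict(grouped)

def recurring_problems(grouped, threshold=3):
    """Problems with at least `threshold` linked incidents."""
    return sorted(p for p, incs in grouped.items() if len(incs) >= threshold)

grouped = incidents_per_problem(links)
print(recurring_problems(grouped))  # ['PRB-7']
```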
Incidents With No Known Resolution
Sometimes, teams are unable to resolve incidents. Tracking these types of incidents and documenting investigations and attempted solutions within the ITIL framework saves time and effort, as it prevents teams from attempting ineffective solutions. It can also reveal knowledge gaps within a service team, ultimately helping teams better understand when further training is needed.
What Are Best Practices for Addressing IT Problem Management Metrics?
One of the best ways to improve your IT problem management metrics is to create a standard set of procedures for the quick resolution of tickets. This is best done within the context of software designed to help you organize and automate the process.
The first step to keeping resolution rates high is to compile a list of the organization’s key ITIL problem management metrics with clearly established KPIs. This requires a detailed ITIL framework providing not only information about tickets, resolutions, and recurring issues but also strategies for addressing problems and complete documentation of the ways past problems have been resolved. In practice, organizations need tools capable of helping employees submit service requests, sort those requests by incident type, and archive the investigation and resolution of those tickets.
Once an organization has collected all its data in one place, there are four main techniques for tackling problem management effectively:
Brainstorming
This refers to a discussion-based approach where each stakeholder participates in the discussion, shares their data, and contributes to problem analysis.
Ishikawa Diagrams/Cause-and-Effect Analysis
This method uses diagrams to analyze the primary and secondary causes of a problem, including different people, processes, products, and partners.
Kepner-Tregoe Problem Analysis
This method defines a problem and works on establishing possible causes and testing them until the right cause is established. It asks questions concerning why something occurred and focuses on establishing the root of the problem.
The 5 Whys
This technique aims to arrive at the root of the problem by repeatedly asking the question “Why?” Gather your team and work through a series of why questions, with the number of rounds depending on the complexity of the problem. Use the answers to define actions that resolve the issue and could prevent it from recurring.
How Can IT Management Tools Be Used for IT Problem Management?
Adopting an incident management system is an important way organizations can store all this information in one location, making it possible to look for patterns and gaps in knowledge and service to make training decisions and plan for future growth. A good incident management system has a few key features. First, the system should record and archive incidents while simultaneously classifying them based on their urgency and the impact on the system. An incident management system should also help assign tickets to team members with the appropriate skill sets. Finally, the system should offer tools for reporting the resolution of an incident and helping to diagnose and document the problem.
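Classifying incidents by urgency and impact is often done with a priority matrix. The sketch below assumes one common convention, where priority is derived from the sum of the two ratings. The three-level scale and the resulting 1-to-5 priority range are illustrative choices, not a rule prescribed by any specific tool.

```python
from enum import IntEnum

class Level(IntEnum):
    HIGH = 1
    MEDIUM = 2
    LOW = 3

def priority(impact, urgency):
    """Priority from 1 (handle first) to 5 (handle last)."""
    return int(impact) + int(urgency) - 1

# Hypothetical queue entries: (incident_id, impact, urgency).
queue = [
    ("INC-10", Level.LOW, Level.MEDIUM),   # priority 4
    ("INC-11", Level.HIGH, Level.HIGH),    # priority 1
    ("INC-12", Level.MEDIUM, Level.HIGH),  # priority 2
]

# Work the queue in priority order.
ordered = sorted(queue, key=lambda t: priority(t[1], t[2]))
print([incident_id for incident_id, _, _ in ordered])
# ['INC-11', 'INC-12', 'INC-10']
```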
Communication is also key when recurring incidents are impacting employees across an organization. Letting employees know team members are addressing incidents and providing temporary workarounds can help bolster productivity even when systems aren’t functioning optimally. SolarWinds Service Desk can facilitate better communication and improve transparency. Managers and leaders can keep other team members updated by sending mass status updates and alerts, which means everyone can be kept informed of progress or roadblocks.
Start with a free 30-day trial of SolarWinds Service Desk to see how you can help your organization better identify and address IT problem management metrics and KPIs.