Chapter One: Definition of Root Cause Analysis
The ASQ defines a root cause as “a factor that caused a nonconformance and should be permanently eliminated through process improvement.” However, a root cause isn’t just any contributing factor. The root cause needs to be an underlying cause so that the person identifying the source of the problem can prevent it from occurring. Furthermore, the root cause needs to be identifiable.
If a root cause analysis isn’t turning up a true root cause, the analyst should consider using another analytical tool to solve the problem. The root cause also needs to be within management’s control to fix. For example, if a change to trade policy cuts off the only source of a specific material, that is out of management’s control. They don’t set government policy! Finally, in order to be a root cause, the issue needs to have a solution that will prevent it from recurring. If the solution won’t prevent recurrence, you have likely found a causal factor rather than a root cause.
Root cause analysis (RCA) is a tool that process engineers in manufacturing, and people across other industries, can use to identify what happened, how it happened, and why the precipitating event occurred.
This approach to problem-solving is typically used when the consequence involves a “safety, health, environmental, quality, reliability, or production impact.”
Identifying the underlying cause of the problem, or root cause, empowers the analyst to identify a solution to the problem. These solutions vary and can include process change or other remedies. Unlike leading indicator-based analysis, root cause analysis is a reaction to an existing or historical problem. The goal is to prevent it from happening again in the future.
Chapter Two: How to Complete a Root Cause Analysis
Before you can complete a root cause analysis, you must collect as much data as possible about the events and people involved in the lead-up. If you’re getting testimony to recreate events, it’s important that you get this information as soon as possible. People can be unreliable and their memories are vulnerable to suggestions, especially after time has passed.
The next steps you take will depend on the method you choose. There are five popular approaches to root cause analysis:
- Events and causal factor analysis
- Change analysis
- Barrier analysis
- Risk tree analysis
- Kepner-Tregoe problem solving
Causal factor analysis requires identifying all of the contributing events that led to the problem. Avoid the pitfall of focusing on the most obvious, final contributors. One of the ways to delve into all of the causes is by leveraging the “5 Whys”. The 5 Whys is “an iterative interrogative technique used to explore the cause-and-effect relationships underlying a particular problem.”
For example, imagine your cost of scrap increased over the last quarter. If you were creating a causal factor analysis for the increase, you might focus on the most obvious cause. What changed during that time period? The answer might be that a specific line is producing more scrap. If you ask again why that line is generating more scrap, you might uncover that there has been significant operator turnover over the last period. Ask why again and you could learn that a few of your experienced operators retired.
Continue this process long enough and you will uncover contributing factors that provide far more detail than “there was an increase in scrap”. This line of questioning is called the “5 Whys” because five is the benchmark for how many times to ask. Using the 5 Whys can help you identify the causal factors that contributed to the problem you would like to prevent in the future.
For basic challenges, the 5 Whys themselves can be enough to get to the root cause of the problem. In a simple, non-technical case, such as a gate that only stays closed when a brick holds it shut, a clear solution would be to replace the latch so the gate closes without the brick.
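To make the questioning concrete, here is a minimal Python sketch of a 5 Whys chain built from the scrap example. The `Why` class, the `five_whys` function, and the wording of the questions are assumptions made for illustration; they are not part of any standard RCA tool. The last answer in the chain is treated only as a candidate root cause that still needs to be tested.

```python
from dataclasses import dataclass


@dataclass
class Why:
    question: str
    answer: str


def five_whys(chain: list[Why], benchmark: int = 5) -> dict:
    """Summarize a chain of why-questions.

    Every answer is kept as a causal factor. The last answer is only a
    candidate root cause until it has been verified.
    """
    if not chain:
        raise ValueError("ask at least one 'why'")
    return {
        "causal_factors": [step.answer for step in chain],
        "candidate_root_cause": chain[-1].answer,
        "reached_benchmark": len(chain) >= benchmark,
    }


# Answers taken from the scrap example in the text.
scrap_chain = [
    Why("Why did the cost of scrap increase?", "A specific line is producing more scrap"),
    Why("Why is that line generating more scrap?", "Significant operator turnover"),
    Why("Why was there operator turnover?", "Experienced operators retired"),
]

summary = five_whys(scrap_chain)
print(summary["candidate_root_cause"])
print(summary["reached_benchmark"])
```

Because the chain stops at three questions, `reached_benchmark` is false, which is a hint to keep asking before settling on a cause.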
With more sophisticated problems, the 5 Whys might not be enough to solve the root of the issue. Let’s add some color to the example above. What happens if you factor in that the plant waited to hire new team members until a week before the experienced operators retired? What if new operators usually had a couple of months of training before having that same level of responsibility?
It’s not just the operators that caused the problem. There was a series of causal factors that influenced the poor performance.
Chapter Three: Types of Root Cause Analysis
What is Change Analysis?
Change analysis is a root cause analysis technique that focuses on a specific problem or problematic event. This type of analysis seeks to expose which deviation from regular procedure, or change, drove the unfavorable event. This is the type of analysis manufacturing folks typically think of when discussing root cause analysis.
Change analysis is easy to learn and apply. Looking for a deviation from a norm also results in a clear corrective action. This provides concrete next steps for anyone conducting the analysis. Furthermore, it makes it easier to detect unusual root causes.
Consider applying a different type of root cause analysis if your standard process isn’t well-defined enough to provide a good basis for comparison. Also, depending on how variable your processes are, the number of moving parts might significantly increase the scope of this type of analysis.
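The comparison at the heart of change analysis can be sketched in a few lines of Python. This is a rough illustration only: the `find_deviations` function and the condition names and values below are hypothetical, and a real analysis would compare against your documented standard process.

```python
def find_deviations(baseline: dict, observed: dict) -> dict:
    """Return every condition whose observed value differs from the baseline."""
    deviations = {}
    for condition in sorted(baseline.keys() | observed.keys()):
        expected = baseline.get(condition)
        actual = observed.get(condition)
        if expected != actual:
            deviations[condition] = {"expected": expected, "actual": actual}
    return deviations


# Hypothetical conditions: a normal run versus the run with the problem.
baseline = {"operator_training_weeks": 8, "shift": "day", "material_lot": "A"}
observed = {"operator_training_weeks": 1, "shift": "day", "material_lot": "A"}

deviations = find_deviations(baseline, observed)
for condition, values in deviations.items():
    print(condition, values)
```

Each deviation is a lead to investigate, not a conclusion. If the baseline is poorly defined, the comparison has little to work with, which is the limitation described above.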
Whether you decide to apply this type of analysis or another form of root cause analysis, make sure to test your assumptions. In the worst-case scenario, you’ll determine that your hypothesis is inconclusive or fail to find an actual root cause. This result, while unpleasant, is better than drawing an incorrect conclusion that causes additional issues in the future.
What is Barrier Analysis?
Barrier analysis is a systematic process used to identify failures of physical, administrative, and procedural barriers that should have prevented the adverse event. This analysis identifies why the barriers failed and determines which types of corrective action are needed to prevent them from failing again in the future.
Start your barrier analysis by identifying all of the barriers that were in place before the adverse event occurred. Review each barrier to determine if it was functioning under normal operating conditions. If there was a deviation in operating conditions, was it performing its intended function under these conditions? Did the barrier help decrease the total cost of the adverse event? Was the barrier’s design strong enough to fulfill its intended purpose? Finally, review whether it was built, maintained, and inspected appropriately leading up to the event.
Use these questions with each barrier to identify how the barriers failed to prevent the event. Note that this may not be the best type of root cause analysis depending on what you are investigating and the state of the existing process or set up leading up to your event.
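One way to keep the review consistent is to record the answers to these questions for each barrier. The Python sketch below is an assumption about how that record might look; the `Barrier` class, its field names, and the two example barriers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Barrier:
    name: str
    kind: str  # "physical", "administrative", or "procedural"
    performed_function: bool
    reduced_event_cost: bool
    design_adequate: bool
    built_maintained_inspected: bool

    def failures(self) -> list[str]:
        """List the review questions this barrier did not pass."""
        checks = {
            "did not perform its intended function": self.performed_function,
            "did not reduce the cost of the event": self.reduced_event_cost,
            "design was not strong enough": self.design_adequate,
            "not built, maintained, or inspected appropriately": self.built_maintained_inspected,
        }
        return [finding for finding, passed in checks.items() if not passed]


def failed_barriers(barriers: list[Barrier]) -> dict[str, list[str]]:
    """Map each barrier with at least one failed check to its findings."""
    report = {}
    for barrier in barriers:
        findings = barrier.failures()
        if findings:
            report[barrier.name] = findings
    return report


# Hypothetical barriers, for illustration only.
barriers = [
    Barrier("Machine guard", "physical", True, True, True, True),
    Barrier("Operator training sign-off", "administrative", False, False, True, False),
]

report = failed_barriers(barriers)
for name, findings in report.items():
    print(name, findings)
```

The findings for each failed barrier point to the kind of corrective action to consider, for example a redesign versus a maintenance change.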
What is Risk Tree Analysis?
Risk tree analysis, like the previous two analyses we’ve reviewed, is used to analyze the effects of a failed system after an adverse event has occurred. Event trees were developed during the WASH-1400 nuclear power plant safety study in 1974. Fault tree analysis under certain circumstances becomes large and unruly. The event tree was developed to help identify which pathway creates the most significant risk for a failure in a system without requiring each path to be mapped out in the tree.
Risk tree analysis has benefits and drawbacks. It helps you identify multiple coexisting contributors to failure, which provides multiple layers of detail. On the flip side, the amount of detail available in this analysis can make it easy to overlook subtle differences between branches. This is also a more complex form of root cause analysis. The person conducting the analysis needs training and some experience to ensure success.
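For a very small tree, ranking pathways by risk can be sketched in Python. This is a simplified illustration, not a full event tree tool: it assumes each barrier fails independently, it brute-forces every path (which only works for small trees), and the frequency and failure probabilities are made-up numbers. The `enumerate_paths` function is hypothetical.

```python
from itertools import product


def enumerate_paths(initiating_frequency: float, barriers: list[tuple[str, float]]):
    """Return (failed_barrier_names, frequency) for every path, highest first.

    barriers is an ordered list of (name, failure_probability) pairs.
    Assumes barrier failures are independent of each other.
    """
    paths = []
    for outcome in product((False, True), repeat=len(barriers)):
        frequency = initiating_frequency
        failed_names = []
        for (name, p_fail), failed in zip(barriers, outcome):
            frequency *= p_fail if failed else 1.0 - p_fail
            if failed:
                failed_names.append(name)
        paths.append((tuple(failed_names), frequency))
    return sorted(paths, key=lambda path: path[1], reverse=True)


# Made-up numbers: 10 initiating events per year and two barriers.
paths = enumerate_paths(10.0, [("alarm", 0.1), ("operator response", 0.2)])
for failed_names, frequency in paths:
    print(failed_names or "no failures", round(frequency, 3))
```

Multiplying each path's frequency by the cost of its outcome would give a rough risk ranking, which is where subtle differences between branches start to matter.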
What is the Kepner-Tregoe method?
The Kepner-Tregoe method of root cause analysis became famous when NASA used it to bring the Apollo 13 team home. It’s a structured methodology for gathering, prioritizing, and evaluating information. Like other forms of root cause analysis, the Kepner-Tregoe method is a systematic approach to solving a problem and analyzing risk.
The first step in this methodology is to identify problems and classify them by level of concern. Then, set the priority level by potential impact, urgency, and growth. Next, decide what action to take or which step to take next. Finally, make a plan for who will be involved, what they will do, where they’re involved, and when they take part. Be sure to scope the extent of each person’s involvement.
The next step to applying this analysis is to determine which objectives must be accomplished, as well as which ones you want to accomplish that aren’t absolutely necessary. This will help you evaluate your options against your objectives so you can determine the best possible choice of action.
The key benefit of the Kepner-Tregoe analysis is the ability to prioritize and focus the analysis. By weighing and setting objectives, this type of analysis provides a more direct review of an issue.
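The step of weighing must-have objectives against nice-to-have ones can be sketched in Python. The sketch below is one possible reading of that step, not the official Kepner-Tregoe worksheet: the `choose_option` function, the options, the weights, and the scores are all hypothetical.

```python
def choose_option(options: dict, musts: set, wants: dict) -> list:
    """Rank the options that meet every must by their weighted want score.

    options: {name: {"meets": set of musts satisfied, "scores": {want: 0-10}}}
    musts:   objectives every acceptable option has to satisfy
    wants:   {objective: weight} for objectives that are not strictly required
    """
    ranked = []
    for name, option in options.items():
        if not musts <= option["meets"]:
            continue  # an option that misses any must is dropped
        total = sum(
            weight * option["scores"].get(want, 0)
            for want, weight in wants.items()
        )
        ranked.append((name, total))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical objectives and options for the operator turnover example.
musts = {"line stays staffed"}
wants = {"low cost": 3, "fast to implement": 2}
options = {
    "Hire replacements earlier": {
        "meets": {"line stays staffed"},
        "scores": {"low cost": 6, "fast to implement": 4},
    },
    "Extend operator training": {
        "meets": {"line stays staffed"},
        "scores": {"low cost": 4, "fast to implement": 8},
    },
    "Change nothing": {
        "meets": set(),
        "scores": {"low cost": 10, "fast to implement": 10},
    },
}

ranked = choose_option(options, musts, wants)
for name, score in ranked:
    print(name, score)
```

Treating musts as a pass/fail filter and wants as weighted scores keeps an attractive but unacceptable option, such as changing nothing, from winning on cost alone.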
Chapter Four: Challenges with Root Cause Analysis
What is a Causal Factor vs a Root Cause?
One challenge when conducting a root cause analysis is ensuring you are identifying root causes rather than causal factors. A causal factor is any behavior, omission, or deficiency that, if corrected, eliminated, or avoided, probably would have prevented the event. A root cause is a factor that if eliminated would definitely prevent recurrence.
Root cause analysis purists focus on identifying a root cause over a causal factor. However, many of the processes where root cause analysis is applied generate adverse events because of human error. Removing a specific manifestation of the error doesn’t necessarily highlight how the type of mistake can be repeated. Ultimately this specific focus can ignore a systematic error.
The most basic requirement for root cause analysis is data. Collecting as much data as possible throughout the process you are examining will improve the quality and efficiency of the root cause analysis. However, this is another focus for root cause analysis critics. Oftentimes data collection about the precipitating event begins after the event. It also requires multiple testimonies and interviews. This qualitative data collection can be unreliable, especially if these interviews need to occur days, weeks, or even months after the event.
Easy vs Lasting Solutions
Unfortunately, solutions to these events can be complex. Instead of removing a piece from an existing process, the best solution may be a complete redesign, new technology implementation, or other large scale adjustments. Even when necessary, administrators, managers, and leaders tend to look for quick and easy solutions.
Prioritizing a solution that doesn’t solve the problem in the long term, while easier, can lead to event recurrence.
If the root cause analysis is seen as a quest to assign blame, you might be in trouble. The data collection process could be compromised if the analysis looks like a way of finding someone to blame for the event. Balance the question of who was at fault against the question of which system produced the unintended result. It’s unlikely that a single person created the issue out of malice. However, if this were the case, accountability for the individual and organization would be necessary.
Proponents of root cause analysis often encourage groups to collaborate and brainstorm during the process. However, some critics argue that this promotes groupthink and stifles creative approaches and analyses. If you’re building an RCA team, include team members from different groups and functional areas to promote fruitful collaboration.
Tools to Overcome Challenges & Apply Root Cause Analysis
Companies use tools like Tulip to collect data from their people, processes, and machines in real time. This empowers them to conduct root cause analysis after smaller events and enables faster, more efficient improvement to these processes. Furthermore, precise data on operators, machines, and changes to procedures makes it easier to avoid the challenges to root cause analysis highlighted above.