The Great Root Cause Problem Solving Debate
Root Cause Problem Solving
If you’ve spent much time in the safety sphere, you’ve learned all about the importance of identifying “root causes” when problems occur. Root cause analysis is best described as the simple process of continuing to dig deeper and deeper into why an accident or problem occurred. Eventually, as the theory goes, you’ll end up at your primary, or root, cause.
The method makes intuitive sense for a number of reasons, not the least of which is the fact that the most obvious and immediate cause for a problem is not always the most important one; usually a number of preceding conditions had to be met (or not met, as the case may be) for the final ‘domino’ to fall and cause the end result.
Some users on LinkedIn, however, most notably one Alan Quilley, contend that the root cause model for problem solving has some inherent flaws, and hope that this literal ‘line’ of thinking is dying out. Quilley attached to his post an article on the handling of a recent railway disaster in Canada, including a report diagram that arranged the many factors identified by the Transportation Safety Board (TSB) in a circular pattern, all shown as contributing to the final event. It is worth noting that such a model gives no indication of linear progression, which is one of the distinguishing features of root cause analysis.
What followed is one of the most hotly debated LinkedIn discussions I’ve seen. Over 30 comments have been made so far, and at points tempers flared over whether Quilley had a point or was being overly simplistic in his argument, along with the expected rebuttals. It was all a bit dramatic. That said, there was some interesting debate and wisdom to be drawn from the ashes, so here it is.
One of the biggest qualms that Quilley seemed to have with the root cause model was that certain factors were given more weight, or “special treatment,” when compared to others. In a root cause analysis, for example, you might say that a worker was tired and became inattentive before an accident, but that the more important detail was that he had been scheduled for three extra-long shifts in the week leading up to it.
Quilley argues that all factors should be given equal attention and addressed. That makes sense up to a point: one should give attention to each item contributing to an outcome. But to investigate under the premise that no factor should be labeled as especially important or given more weight seems a bit silly.
User Amjad Alata illustrates this point perfectly with the example of a leak caused by a rusty pipe, where poor maintenance is the reason the pipe rusted in the first place. As he puts it, “these two causes need to be addressed, but they cannot be considered of the same weight (or class), because they are not.”
Alata, and a couple of others, made the argument that declining to label factors differently doesn’t magically make them equal. In his rust example, it would be important to immediately repair the rust damage (addressing one factor), but more important in the long run to fix the underlying causes (the other).
Root Cause Problem Solving and Endless Digging
One legitimate problem with the root cause model I’m willing to concede is that knowing when to stop asking “why?” is difficult. For a while, asking why each factor occurred and then working backward further and further is productive, but eventually you’ll arrive at “because the world exists” or some other broad silliness outside your control.
But that’s just it: a good rule of thumb for root cause backtracking is to simply work backward until you arrive at factors you can no longer control. You can fix a rusty pipe, you can reprimand a maintenance worker, you can change maintenance training and policy, and so on. Arrive at a factor predetermined by a government safety agency or by the nature of the industry you’re in, though, and you’ll be wasting time. This ‘time wasting’ is a big part of what Quilley seems fed up with.
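The stopping rule above can be sketched in a few lines of Python. This is only an illustration of the idea, not an established tool: the `Cause` class, the `controllable` flag, and the `actionable_causes` function are hypothetical names made up here, and the chain is loosely modeled on the rusty pipe example, with the regulator-set requirement added as an assumed endpoint.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cause:
    description: str
    controllable: bool                    # can your organization act on this factor?
    caused_by: Optional["Cause"] = None   # the answer to the next "why?"


def actionable_causes(immediate: Cause) -> List[str]:
    """Walk backward from the immediate cause and stop at the first
    factor that is outside your control (or when the chain runs out)."""
    found = []
    current = immediate
    while current is not None and current.controllable:
        found.append(current.description)
        current = current.caused_by
    return found


# Hypothetical chain, loosely based on the rusty pipe example.
regulation = Cause("requirement set by a government safety agency", controllable=False)
policy = Cause("maintenance training and policy", True, regulation)
maintenance = Cause("poor maintenance", True, policy)
rust = Cause("rusty pipe", True, maintenance)

chain = actionable_causes(rust)
print(chain)
# ['rusty pipe', 'poor maintenance', 'maintenance training and policy']
```

The hard part in practice is the judgment call hidden in the `controllable` flag, which is exactly where the “when do I stop asking why?” question lives.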
Interestingly enough, one of the most misplaced parts of this discussion (I felt) was the argument over what kind of diagram should be used to represent causation. Quilley, for one, decries tree and linear diagrams that show one path to a problem.
Instead, the circular diagram from the TSB report is touted as a great new system… except it’s not. See, the problem is that, aside from very broad and general descriptions like “Insufficient Handbrakes,” you aren’t getting a whole lot of useful information.
The TSB knows this as well, and the real discussion of causation and the digging deeper into each one of those factors occurs inside their full written report. The diagram is only meant to show a sweeping overview of all the factors involved, not to be the actual problem-solving tool.
In Quilley’s defense, he likely knows this, but maybe feels the setup lends itself to a clearer understanding of the problem at a glance. To that end, he’s probably right.
In this arena, it ultimately comes down to what you’re trying to do with your boxes, pictures, bubbles, tree roots, etc. Is your diagram meant to be simple/at a glance, or is it meant to help facilitate a deeper understanding of the factors at play and be an actual problem-solving implement? This decision will dictate your choice.
One of the biggest problems with this discussion was a common occurrence in just about every single debate that’s happened in the history of the world ever: little room for compromise. The truth is, a hybrid of the two models seems, to me, the best way to fully represent a problem and its causes.
The reason people like flow or root cause diagrams is that you can clearly see how one or more factors lead to another. The biggest complaint with them seems to be that they show factors at different levels, and that this has more to do with how the chart is set up than with how the factors actually occurred or contributed.
One direction worth exploring would be to combine this flow-chart type of progression with the TSB’s chart: from each circular ‘petal,’ the (linear) factors contributing to that particular bubble would branch outward.
The result, in theory, would be a large inner circle with ALL of the most immediate causes taken into account, and then subsequent sticks/rays of factors going out from each of those, which work backward in time/progression in a similar way to traditional root cause exploration. In this way, you would be giving roughly equal weight to the types of factors that lead to an event, while still acknowledging through exploration of their root causes that different ones require more or more immediate attention.
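To make the hybrid idea a little more concrete, here is a minimal sketch of it as a data structure. Everything in it is an assumption for illustration: the `HybridDiagram` type and both helper functions are hypothetical, and the petals borrow the two unrelated examples from earlier in this post (the rusty pipe and the tired worker) rather than anything from the TSB report.

```python
from typing import Dict, List

# Each key is an immediate cause (a 'petal' of the inner circle). Its list is
# the ray branching outward: contributing factors ordered from most recent
# to deepest. The entries are illustrative only.
HybridDiagram = Dict[str, List[str]]

diagram: HybridDiagram = {
    "rusty pipe": ["poor maintenance"],
    "inattentive worker": ["worker was tired", "three extra-long shifts that week"],
    "petal with no ray explored yet": [],
}


def inner_circle(d: HybridDiagram) -> List[str]:
    """The at-a-glance view: every immediate cause, none ranked above another."""
    return list(d)


def deepest_factors(d: HybridDiagram) -> Dict[str, str]:
    """The deepest factor reached on each ray (the petal itself if the ray is empty)."""
    return {petal: (ray[-1] if ray else petal) for petal, ray in d.items()}


print(inner_circle(diagram))
print(deepest_factors(diagram))
```

The inner circle keeps the equal footing that the circular diagram offers, while each ray preserves the backward, step-by-step digging of a traditional root cause chain.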
If this discussion showed us nothing else, however, it’s that “in theory” means just that: everyone can have an opinion on the right way to do something without ever having to prove it or test it in practice. So I challenge you to take some of these ideas and see what they look like in practice for you. What works? What leaves something to be desired? Does that last hybrid model have a useful application? Only time, and more importantly practice and hard data, will tell.