[LDSL#4] Root cause analysis versus effect size estimation
Followup to: Information-orientation is in tension with magnitude-orientation. This post is also available on LessWrong.
In the conventional theory of causal inference, such as Rubin’s potential outcomes model or Pearl’s DAG approach, causality is modelled as a relationship of functional determination, X := f(Y). The question of interest is then to study the properties of f, especially how f differs across different values of Y. I would call this “effect size estimation”, because the goal is to quantify the magnitude of the effect of one variable on another.
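As a toy illustration (mine, not anything from the causal inference literature), here is what effect size estimation looks like on simulated data with a single binary cause:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy system: X := f(Y) + noise, with a binary cause Y and true effect 2.0.
n = 10_000
y = rng.integers(0, 2, size=n)
x = 2.0 * y + rng.normal(0.0, 1.0, size=n)

# Effect size estimation: how does X differ across the two values of Y?
effect = x[y == 1].mean() - x[y == 0].mean()
print(f"estimated effect of Y on X: {effect:.2f}")  # close to 2.0
```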
But as I mentioned in my post on conundrums, people seem to have some intuitions about causality that don’t fit well into effect size estimation, most notably in wanting “the” cause of some outcome even when the situation is usually thought to involve complex polycausality.
Linear diffusion of sparse lognormals provides an answer: an outcome is typically a mixture of many different variables, X := Σᵢ Yᵢ, and one may want an account of how the outcome breaks down into these variables in order to better understand what is going on. This is “root cause analysis”, and it yields one or a small number of factors because most of the variables tend to be negligible in magnitude. (If the root cause analysis yields a large number of factors, that is evidence that the RCA was framed poorly.)
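A quick simulation (my own, with an arbitrary choice of σ) shows why a sum of lognormal contributions tends to have one or a few dominant terms:

```python
import numpy as np

rng = np.random.default_rng(1)

# Outcome as a sum of many lognormal contributions: X := sum_i Y_i.
contributions = rng.lognormal(mean=0.0, sigma=3.0, size=50)

shares = np.sort(contributions)[::-1] / contributions.sum()
print(f"share of X from the largest factor: {shares[0]:.0%}")
print(f"share of X from the top 3 factors:  {shares[:3].sum():.0%}")
# With a heavy-tailed (high-sigma) distribution, one or a few terms
# typically account for most of the sum, so "the" cause is well-defined.
```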
Is root cause analysis a special case of effect size estimation?
If you know X, the Yᵢ’s, and f, then it seems you can do root cause analysis automatically by setting each of the Yᵢ’s to zero in turn, seeing how that changes X, and then reporting the Yᵢ’s in descending order of influence. Thus, root cause analysis ought to be a special case of effect size estimation, right?
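Concretely, that naive procedure might look like the following sketch, where both f and the Yᵢ’s are assumed to be fully known (the weights are made up):

```python
# Naive root cause analysis by ablation, assuming f and all the Y_i are known.
def rank_causes_by_ablation(f, ys):
    """Zero out each input in turn and rank inputs by how much X changes."""
    baseline = f(ys)
    influences = []
    for i in range(len(ys)):
        ablated = list(ys)
        ablated[i] = 0.0
        influences.append((abs(baseline - f(ablated)), i))
    return sorted(influences, reverse=True)  # biggest influence first

# Made-up example: X is a weighted sum of three variables.
f = lambda ys: 5.0 * ys[0] + 0.1 * ys[1] + 1.0 * ys[2]
print(rank_causes_by_ablation(f, [1.0, 1.0, 1.0]))
# [(5.0, 0), (1.0, 2), (0.1, 1)] -- variable 0 gets reported as "the" cause
```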
There are two big flaws with this view:
As you estimate f, you use a particular ontology for Y, for instance the states of the system you are studying. But the root cause may be something that fits poorly into that ontology, for instance a multiplicative interaction between multiple variables in the system, or an exogenous variable from outside the system that causes correlated changes inside it.
Identifying the root cause by estimating f requires an extremely detailed model of the system, one that takes pretty much everything into account, and building such a model may be prohibitively difficult.
You can try to use statistical effect size estimation for root cause analysis. However, doing so creates an exponentially strong bias in favor of common things over important things, so it’s unlikely to work unless you can somehow absorb all the information in the system.
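To illustrate just one facet of this with made-up numbers: in a random sample, a common cause with a modest effect is easy to estimate, while a rare cause with a much larger effect often does not show up at all.

```python
import numpy as np

rng = np.random.default_rng(2)

# A common cause with a modest effect vs. a rare cause with a huge effect.
n = 500                          # a modest random sample of the system
common = rng.random(n) < 0.5     # occurs about half the time
rare = rng.random(n) < 0.002     # occurs about once per 500 observations
x = 1.0 * common + 1000.0 * rare + rng.normal(0.0, 1.0, size=n)

print("occurrences of the rare cause in this sample:", int(rare.sum()))
print("chance a sample this size misses it entirely:",
      round((1 - 0.002) ** n, 2))   # about 0.37
# The common cause's modest effect is well estimated from any such sample,
# but the far more important rare cause frequently isn't observed at all.
```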
Heuristics for direct root cause analysis
I don’t think I have a complete theory of root cause analysis yet, but I know of some general heuristics for root cause analysis which don’t require comprehensive effect size estimation.
You can look at the most extreme things that were happening in the system, aggregating them across different angles until you find the root cause you are looking for.
You can trace the causality backwards from X, looking only at the parts of f that are immediately relevant to X, so you never need to estimate f in its entirety (sketched below).
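Here is a minimal sketch of the second heuristic, on a hypothetical incident with invented contribution numbers:

```python
# Hypothetical data: each node's value breaks down into contributions from its
# immediate inputs; we repeatedly follow the dominant contributor backwards,
# never touching the parts of f that aren't on this path.
def trace_root_cause(contributions, node):
    path = [node]
    while node in contributions:
        parents = contributions[node]
        node = max(parents, key=parents.get)   # follow the biggest contributor
        path.append(node)
    return path

contributions = {
    "outage":         {"db_latency": 9.0, "cache_misses": 0.4},
    "db_latency":     {"lock_contention": 0.2, "bad_query_plan": 8.5},
    "bad_query_plan": {"stats_job_skipped": 8.0},
}
print(trace_root_cause(contributions, "outage"))
# ['outage', 'db_latency', 'bad_query_plan', 'stats_job_skipped']
```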
Both of these heuristics require a special sort of data, which I like to think of as “accounting data”. It differs from statistical data in that it needs to be especially comprehensive and quantitative. It would often be hard to perform this type of inference from a small random sample of the system, at least unless the root cause affects the system extraordinarily broadly.