Actually, causal inference is also really hard to benchmark. A colleague of mine started an effort to make results reproducible and comparable. The algorithms often don't scale well either.
Every time we wanted to use this on real data it was just a little too much effort, and the results were inconclusive because huge graphs are hard to verify. My colleague, for example, wanted to apply it to explain risk confounders in investment funds.
I personally also don't like the definition of causality these methods are based on.
One way to test this is through a placebo test, where you shift the treatment, such as moving it to an earlier date, which I have seen used successfully in practice. Another approach is to test the sensitivity of each feature, which is often considered more of an art than a science. In practice, I haven't observed much success with this method.
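Rough sketch of what I mean by shifting the treatment (purely synthetic data; the column names, dates, and simple OLS setup are just my illustrative assumptions): if the design is sound, the estimated "effect" at the placebo date should be near zero, while the effect at the real date should survive.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=200, freq="D")
treatment_date = pd.Timestamp("2020-05-01")            # real intervention
y = rng.normal(100, 5, len(dates)) + 3 * (dates >= treatment_date)
df = pd.DataFrame({"date": dates, "y": y})

def effect_at(cutoff, data):
    # simple before/after mean-shift estimate via OLS on a post indicator
    d = data.assign(post=(data["date"] >= cutoff).astype(int))
    fit = smf.ols("y ~ post", data=d).fit()
    return fit.params["post"], fit.pvalues["post"]

# effect at the real treatment date: should be ~3 and significant
print("real   :", effect_at(treatment_date, df))

# placebo: pretend treatment happened earlier, using only pre-treatment data,
# so the real effect cannot leak into the estimate; should be ~0
pre = df[df["date"] < treatment_date]
print("placebo:", effect_at(pd.Timestamp("2020-03-01"), pre))
```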
You don’t need to look at a graph at all though, right? There are plenty of tests that can help you identify factors that could be significantly affecting your distribution.
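For example (just one test of many, on made-up data), a two-sample Kolmogorov-Smirnov test can flag a factor that shifts the outcome distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# outcome split by whether a candidate factor is present (synthetic data)
without_factor = rng.normal(0.0, 1.0, 500)
with_factor = rng.normal(0.3, 1.0, 500)

stat, p = stats.ks_2samp(without_factor, with_factor)
print(f"KS statistic={stat:.3f}, p-value={p:.4f}")  # small p => distributions differ
```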
If you want to make causal inferences you really do have to look at a graph that includes both observed and probable unobserved causes to get any real sense of what’s going on. Automated methods absent real thinking about the data generating process are junk.
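To make that concrete, here is a minimal sketch (node names invented for illustration) of writing down the observed causes alongside a suspected unobserved confounder before estimating anything. Automated structure discovery on the observed columns alone would never surface the unobserved node.

```python
import networkx as nx

g = nx.DiGraph()
g.add_edges_from([
    ("fund_strategy", "returns"),          # observed cause
    ("market_regime", "fund_strategy"),    # observed confounder
    ("market_regime", "returns"),
    ("U_manager_skill", "fund_strategy"),  # suspected unobserved confounder
    ("U_manager_skill", "returns"),
])

# everything that can influence returns, including the unobserved node
print(nx.ancestors(g, "returns"))
```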