
The main reason for not using causal inference is not that data scientists don't know about the different approaches or can't imagine something equivalent (there is a lot of reinvention); forecasting is one of the most common tasks, after all.

The main reason is that they generally work for software companies where it's easier, and less susceptible to analyst influence, to implement the suggested change and test it with a Randomized Controlled Trial. I remember running an analysis that found that gender was a significant explanatory factor for behavior on our site; my boss asked (dismissively): What can we do with that information? If there is an assumption about how things work that doesn't translate to a product change, that insight isn't useful; if there is a product intuition, testing the product change itself is key, and there's no reason to delay that.

There are cases where RCTs are hard to organize (for example, multi-sided platform businesses) or where changes can't be tested in isolation (major brand changes). Those tend to benefit from the techniques described there, and they tend to have dedicated teams. But this is a classic case of a complicated tool that doesn't fit most use cases.



Causal inference is actually also really hard to benchmark. A colleague of mine started an effort to be able to properly reproduce and compare results. The algorithms also often do not scale well.

Every time we wanted to use this on real data, it was just a little bit too much effort, and the results were not conclusive because it is hard to verify huge graphs. My colleague, for example, wanted to apply it to explain risk confounders in investment funds.

I personally also do not like the definition of causality they base it on.


One way to test this is through a placebo test, where you shift the treatment, such as moving it to an earlier date, which I have seen used successfully in practice. Another approach is to test the sensitivity of each feature, which is often considered more of an art than a science. In practice, I haven't observed much success with this method.
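
For illustration, here is a minimal sketch of that date-shifting placebo check, assuming a daily metric in a pandas DataFrame and a deliberately naive pre/post difference-in-means estimator; all names and numbers are made up:

    import numpy as np
    import pandas as pd

    def naive_effect(df, treatment_date):
        # Difference in mean metric after vs. before the (real or placebo) date.
        post = df.loc[df["date"] >= treatment_date, "metric"].mean()
        pre = df.loc[df["date"] < treatment_date, "metric"].mean()
        return post - pre

    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "date": pd.date_range("2024-01-01", periods=120, freq="D"),
        "metric": rng.normal(100, 5, 120),
    })

    real_date = pd.Timestamp("2024-03-01")
    print("effect at the real date:", naive_effect(df, real_date))

    # Shift the "treatment" to earlier dates where nothing happened; if these
    # placebo effects rival the real one, the estimate is picking up trend or
    # noise rather than the change itself.
    for placebo_date in pd.date_range("2024-01-15", "2024-02-15", freq="7D"):
        print(f"placebo effect at {placebo_date.date()}:",
              naive_effect(df, placebo_date))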


You don't need to look at a graph at all though, right? There are plenty of tests that can help you identify factors that could be significantly affecting your distribution.


If you want to make causal inferences you really do have to look at a graph that includes both observed and probable unobserved causes to get any real sense of what’s going on. Automated methods absent real thinking about the data generating process are junk.


“Graph” here means the directed acyclic graph encoding the causal relationships, not a chart of a distribution.
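
For instance, a toy version of such a graph, sketched with networkx and entirely made-up variable names, might look like this:

    import networkx as nx

    # Hypothetical example: income confounds both ad exposure and purchases.
    g = nx.DiGraph()
    g.add_edges_from([
        ("income", "ad_exposure"),    # confounder -> treatment
        ("income", "purchase"),       # confounder -> outcome
        ("ad_exposure", "purchase"),  # treatment  -> outcome (effect of interest)
    ])
    assert nx.is_directed_acyclic_graph(g)

    # The graph, not any chart, is what tells you what to adjust for: here you
    # would condition on "income" to block the back-door path
    # ad_exposure <- income -> purchase.
    print(sorted(g.predecessors("ad_exposure")))  # ['income']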


You can only select among features that you have measured.


Go on, please. What definition, and algorithms with scaling problems?


A/B experiments are definitely a gold standard, as they provide true causality measurement (if implemented correctly). However, they are often expensive to run: you need to implement the feature in question (which has less than a 50% chance of working out) and then collect data for 1-4 weeks before being able to make the decision. As a result, only a small number of business decisions today rely on A/B tests. Observational causal inference can help bring causality into many of the remaining decisions, which need to be made quicker or cheaper.
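
As a rough illustration of why the data collection alone takes weeks, here is a back-of-the-envelope sample-size calculation using the standard two-proportion approximation; the baseline rate, detectable lift, and traffic numbers are purely hypothetical:

    from scipy import stats

    baseline = 0.05        # hypothetical current conversion rate
    mde = 0.005            # minimum detectable absolute lift (0.5 pp)
    alpha, power = 0.05, 0.80

    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_beta = stats.norm.ppf(power)
    p_bar = baseline + mde / 2

    # Standard two-proportion sample-size approximation, per arm.
    n_per_arm = 2 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / mde ** 2

    daily_users_per_arm = 5_000  # hypothetical traffic split
    print(f"~{n_per_arm:,.0f} users per arm, "
          f"i.e. ~{n_per_arm / daily_users_per_arm:.0f} days of data")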


The “gold standard” has failure modes that seem to be ignored.

E.g.: making UI elements jump around unpredictably after a page load may increase the number of ad clicks simply because users can’t reliably click on what they actually wanted.

I see A/B testing turning into a religion where it can’t be argued with. “The number went up! It must be good!”


That's generally because the metrics you are looking at do not represent what users care about. That's a different problem from the testing methodology, and it's often overlooked and a lot more important.

I’ve argued that A/B testing training should focus on that skill a lot more than Welch’s theory, but I had to record my own classes for that to happen.


But those metrics are hard to move, so you target secondary metrics.

The problem with that strategy becomes obvious when you spell out the consequences: measurably improving the product is hard, so you measure something else and hope you get product improvements.


There can be a real ethical dilemma when applying A/B testing in a medical setting. Placing someone with an incurable disease in the control group is condemning them to death, while in the treatment group they might have a chance. On the other hand, without a proper A/B testing methodology the drug's efficacy cannot be established. So far no perfect solution to the dilemma has been found.


> in a control group

The control group gets the current standard treatment, not nothing (in case that was a source of confusion). Plus they typically don't have to pay for it which is a benefit for them.

Large trials today will typically conduct interim analyses and will have pre-defined guidelines for when to stop the trial because the new treatment is either clearly providing a benefit or is clearly futile.

Here is an example of such a study: https://www.ahajournals.org/doi/10.1161/CIRCHEARTFAILURE.111...
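
To see why those stopping guidelines have to be pre-specified rather than ad hoc, here is a rough simulation sketch (illustrative numbers only) of how naively "peeking" at a fixed 5% threshold after every interim batch inflates the false-positive rate:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    n_sims, n_looks, n_per_look = 2_000, 5, 200

    false_positives = 0
    for _ in range(n_sims):
        # No true effect in either arm.
        control = rng.normal(0.0, 1.0, n_looks * n_per_look)
        treated = rng.normal(0.0, 1.0, n_looks * n_per_look)
        for look in range(1, n_looks + 1):
            k = look * n_per_look
            _, p = stats.ttest_ind(treated[:k], control[:k], equal_var=False)
            if p < 0.05:  # naive rule: stop and declare an effect at any look
                false_positives += 1
                break

    # Comes out well above the nominal 5%; pre-defined group-sequential
    # boundaries (e.g. O'Brien-Fleming) keep the overall error rate controlled.
    print(f"false-positive rate with naive peeking: {false_positives / n_sims:.2f}")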


Most therapeutic trials nowadays are "intent to treat", so subjects receive either the standardized tx or the experimental tx in the randomization. Many of them also have crossovers, such that when a measurable benefit (as defined by the protocol) is seen, subjects on the standard tx can be moved over to the experimental arm.


It's not really an ethical dilemma until you know it works, and then usually if the evidence is strong enough they'll cut the trial early.


All the alternative methods require the same sacrifice. More importantly, most suggested treatments fail to cure deadly conditions or have major side effects or risks that are just as unethical to thrust upon people untested.

If you look at it properly, i.e. evaluate what your actions should be before the test (do nothing, impose an untested treatment, or test with a proper control to learn what to do for the majority of the population), the answer is rarely ambiguous.

There is a debate to be had on how much pre-clinical work to be done before clinical testing, but those are increasingly automated, cheap, and fast, so we often reach the point where a double-blind test is the next logical step.

The argument you present is based on either an unwarranted confidence in treatments, or information that wasn’t available when the decision had to be made.


You can end the trial early when it's clear the treatment is working. This just happened last week with Ozempic for kidney disease caused by diabetes. https://www.wxyz.com/news/health/ask-dr-nandi/novo-nordisk-e...


Causal inference is useful, but it's neither quicker nor cheaper.


Agree that it is hard today. A person you might know is trying to prove that it doesn't have to be: https://www.motifanalytics.com/blog/bringing-more-causality-...

We’d love to chat more with you on the topic - feel free to hit Sean or me on LinkedIn.


I am a big fan of what Sean and you are trying to do; I wrote up a chapter about it this weekend, actually. But I'm worried that you both have worked for companies where a lot of work had already been done to identify relevant dimensions (metrics and categories), and where automating causality (or rather, estimating factors on a pre-existing causal graph, because that's the sleight of hand the word "causality" does) only made sense once that level of maturity had been reached.

But to reach that point, before having relevant dimensions, there has to be a lot of work, generally motivated by disappointing experiments. “Why didn’t that work?” is often answered by “Because our goal is too remote from our actions—here’s a better proxy” or “Because this change only makes sense to 8% of our users, here’s how we can split them.”

I'm worried that too many people will think the tool itself is enough, rather than a complement to that maturity in understanding a company's users. This 'solutionism' is widespread among data tools: https://www.linkedin.com/posts/bertilhatt_the-potential-gap-...


Thank you for clarifying.

Reading some of your posts I think we agree more than disagree. A big difference from most new analytics tools you see today is that we don't want to provide a magic "solution" (which is bound to over-promise and under-deliver) but rather a generic tool to quickly define and try out different business categories on the data.

Followed you on LinkedIn for more in-depth takes.


It is likely to be cheaper and quicker to run a counterfactual test in the computer than in real life.

The question is how reliable it is.


> As a result only a small number of business decisions today rely on a/b tests.

The default for all code changes at Netflix is they’re A/B tested.


An expensive test is better than an expensive mistake :) At the scale of hundreds of decisions made with the inherent biases of the product/biz/ops teams, that misalignment in direction can be catastrophic.


You can apply it to estimate the impact of any business decision if you have data, so it's not only IT companies that can benefit from it. However, a problem arises when the results don't align with the business's expectations. I have firsthand experience with projects being abandoned simply because the results didn't meet expectations.



