Distinguishing association from causation

October 29, 2007

I was pointed to Distinguishing Association from Causation: A Background for Journalists (there is also a PDF version). Here is my summary of their executive summary:

Scientific studies that show an association between a factor and a health effect do not necessarily imply that the factor causes the health effect.

Randomized trials are studies in which human volunteers are randomly assigned to receive either the agent being studied or an inactive placebo, usually under double-blind conditions.

The findings of animal experiments may not be directly applicable to the human situation because of genetic, anatomic, and physiologic differences between species and/or because of the use of unrealistically high doses.

In vitro experiments are useful for defining and isolating biologic mechanisms but are not directly applicable to humans.

The findings from observational epidemiologic studies are directly applicable to humans, but the associations detected in such studies are not necessarily causal.

Useful, time-tested criteria for determining whether an association is causal include:
- Temporality. For an association to be causal, the cause must precede the effect.
- Strength. Scientists can be more confident in the causality of strong associations than weak ones.
- Dose-response. Responses that increase in frequency as exposure increases are more convincingly supportive of causality than those that do not show this pattern.
- Consistency. Relationships that are repeatedly observed by different investigators, in different places, circumstances, and times, are more likely to be causal.
- Biological plausibility. Associations that are consistent with the scientific understanding of the biology of the disease or health effect under investigation are more likely to be causal.

Studies that include appropriate statistical analysis and that have been published in peer-reviewed journals carry greater weight than those that lack statistical analysis and/or have been announced in other ways.

Claims of causation should never be made lightly.

But all this isn't about causation versus association; it's about better studies versus worse studies. Association and causation are not binary categories. Instead, there is a continuum from simple models on observational data (correlation between two variables), through more sophisticated models on observational data that include covariates (regression, structural equation models), through yet more sophisticated models on observational data that take sample selection bias into consideration (Rubin's propensity score approach), to often simple models on controlled data (randomized experiments). But the mysterious causal "truth" is still out there. If one talks to philosophers these days, they're not even convinced that the notion of causality is powerful enough as a model of reality.
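To make the continuum concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration, not something taken from the report: a single binary confounder z, a true treatment effect of 1.0, and the function names themselves. It compares a raw treated-versus-control difference, a covariate-adjusted (stratified) difference, an inverse-propensity-weighted difference, and the raw difference under random assignment.

```python
import random


def simulate(n=20000, true_effect=1.0, randomized=False, seed=0):
    """Return (z, t, y) rows: z is a binary confounder, t the treatment, y the outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5
        if randomized:
            p_treat = 0.5                # coin flip, independent of z
        else:
            p_treat = 0.8 if z else 0.2  # z drives who gets treated
        t = rng.random() < p_treat
        # Outcome depends on the treatment AND on the confounder.
        y = true_effect * t + 3.0 * z + rng.gauss(0.0, 1.0)
        rows.append((z, t, y))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def naive_difference(rows):
    """Treated mean minus control mean, ignoring the confounder."""
    return mean(y for _, t, y in rows if t) - mean(y for _, t, y in rows if not t)


def stratified_difference(rows):
    """Difference within each level of z, averaged by stratum size."""
    total = 0.0
    for level in (False, True):
        stratum = [r for r in rows if r[0] == level]
        total += naive_difference(stratum) * len(stratum) / len(rows)
    return total


def ipw_difference(rows):
    """Inverse-propensity-weighted difference, propensity estimated per level of z."""
    propensity = {
        level: mean(1.0 if t else 0.0 for z, t, _ in rows if z == level)
        for level in (False, True)
    }
    treated = sum(y / propensity[z] for z, t, y in rows if t)
    control = sum(y / (1.0 - propensity[z]) for z, t, y in rows if not t)
    return (treated - control) / len(rows)


if __name__ == "__main__":
    observational = simulate()
    print("naive:     ", naive_difference(observational))
    print("stratified:", stratified_difference(observational))
    print("ipw:       ", ipw_difference(observational))
    print("randomized:", naive_difference(simulate(randomized=True)))
```

Under these assumptions the naive observational difference should land near 2.8 rather than 1.0, because treated units are far more likely to have z = 1, while the adjusted and randomized estimates should land near 1.0. With a single binary covariate the stratified and weighted estimates coincide; they only come apart when the propensity model has to smooth over many covariates. And of course the adjustment only works here because the confounder is observed.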

In the past, I've often unfairly complained about studies after having read misleading journalistic reports, so this report is a timely one. But since the report was paid for by large pharma corporations, people may wonder whether it carries bias or some sort of agenda.

My quick impression is that they're promoting the best practices in statistical methodology, that all these companies are subscribing to. But there could be greater use of cheaper observational studies with better modeling (such as employing the propensity score approach, or even just better regression modeling) compared to expensive randomized experiments, and society might be better off as a result. Moreover, there is the issue of statistical versus practical significance. What do you think?
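On statistical versus practical significance, a small worked sketch may help. It assumes a two-sample z-test with a known, common standard deviation, and the function name and the numbers are made up for illustration.

```python
import math


def two_sample_z_pvalue(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means, assuming a known common sd."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    # For a standard normal, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Tiny effect (0.02 sd) in a huge study vs. large effect (0.5 sd) in a small one.
    print("tiny effect, n = 1,000,000 per group:", two_sample_z_pvalue(0.02, 1.0, 1_000_000))
    print("large effect, n = 10 per group:      ", two_sample_z_pvalue(0.5, 1.0, 10))
```

The first difference is far too small to matter for most practical purposes yet is overwhelmingly "significant"; the second could matter a great deal yet does not reach the conventional 0.05 threshold. The p-value mixes effect size with sample size, so it cannot by itself tell a reader whether a finding is important.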

Posted by Aleks at October 29, 2007 3:56 PM

Comments

One thing I'm not sure about is whether this group is describing "practices in statistical methodology, that all these companies are subscribing to," because most of the funders are in manufacturing (either of foods, materials, or chemicals -- not so much pharma, though there are a few in there) and produce products that various studies have shown can be bad for human health and/or the environment. As far as I can tell, this group defends these companies against claims that their products are harmful, and this paper serves that end by creating requirements for causation that are so stringent that no product could really be claimed to "cause" any harmful effects to anyone.

I hear you on the journalism issues, but it's tougher to fix than you might think. The reality is that if a journalist writes with all the nuance and caution and explicit statistical uncertainties of a scientist, his news article quickly becomes a journal article, which most of his audience a) can't understand or b) doesn't want to read, or both. And of course, that's assuming that you can even get that kind of stuff past an editor, which is incredibly unlikely. Reporters often have to fight just to include "hedging" language (may/might/could/seems to/appears to/etc) surrounding outright speculation based on findings that are very preliminary.

There are faults on both sides: media want to simplify and sell stories, and audiences refuse to wade through actual data or details. But my hope is always that smart, careful readers who are really interested in something will take the time to look at the study themselves if they want to know what really happened, rather than relying on a 100-word infotainment summary. You might be interested in some posts/comments here (from a few months ago) and here (from a few weeks ago).

Posted by: Anonymous at October 29, 2007 5:55 PM.

Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.
H.G.Wells

Posted by: KMC at October 29, 2007 6:33 PM.

Topic tackled by Nancy Cartwright in a recent book (Hunting Causes and Using Them):
http://www.cambridge.org/us/catalogue/catalogue.asp?isbn=9780521677981

Posted by: stet at October 31, 2007 7:44 AM.

Since when does lightening (sic) not cause thunder? (see your link on Causation in the section on Probabilistic Causality).

I'm a bit disturbed to see this statement as a supposed counterexample in a causality explanation...

Posted by: Daniel Lakeland at October 31, 2007 12:25 PM.

More fodder for this line of thought. The arguments made lean toward causality, even though they've only made an association (thus far).
http://abcnews.go.com/GMA/OnCall/story?id=3799715

Never mind the absolutist nature of the guidance...

Posted by: Ben at October 31, 2007 12:51 PM.

I'm with Daniel. Lightning does cause thunder.

Posted by: Andrew at October 31, 2007 4:02 PM.

On the lightning and thunder connection... What is the role that causal theoretical models should play in areas like medicine where the strength of the causal relationship is in question?

What I mean here is that unlike in some simple engineering or physics models (e.g., gravity causes balls to fall at a constant acceleration near the surface of the earth), we often have only partial models of causality, such as that insulin is involved in fat storage, and if people become obese the difficulty of storing additional fat may cause insulin to become less effective, leading to increased risk of diabetes.

But this is only one aspect of causality in diabetes. Sometimes doctors latch on to a causal theory which turns out to be not so much wrong as irrelevant (in that some other factor is a much more important one). Another example is something like calcium in the diet vs osteoporosis. I think recent research and drug development have made it clear that calcium in the diet is not the primary issue in osteoporosis.

Often I hear "correlation does not imply causation" from those who dislike observational studies, but in many cases "causation does not imply causation" in the sense I mention above.

Posted by: Daniel Lakeland at October 31, 2007 4:17 PM.

Daniel, when you only have two variables, I agree with Andrew that lightning causes thunder. But an even better theory will have more variables and there both lightning and thunder will be consequences of the same cause: the electric discharge. Unless one feels that discharge is synonymous with lightning.

Causal theories seem to be assigned a higher prior probability by default. :) All important statistical problems can be reduced to bad priors.

Posted by: Aleks at October 31, 2007 7:35 PM.

Aleks:

I think most people when reading at a general level consider lightning to be the electrical discharge itself.

We can get into complicated causal chains in which we say that it's a butterfly in the Amazon that "causes" the disturbance that causes the electrical imbalance that causes the lightning and the thunder, and therefore thunder is caused by insects... but I don't think that's where you want to go, is it?

"Causal theories seem to be assigned a higher prior probability by default. :) All important statistical problems can be reduced to bad priors."

I think by this you mean that doctors put too much (prior) emphasis on calcium or obesity or whatever single cause they've identified, and that this prior skews their judgement.

That is fine as a description of the problem. The question is, how should statistically minded people modify their model building behavior so as to help avoid this over-confidence in causal models? In other words, just as observational studies have endogeneity and confirmation bias issues, so do causal models where causes are multiple and complicated.

Posted by: Daniel Lakeland at October 31, 2007 8:01 PM.

Daniel:

I prefer feedback-loop, multi-cause systems thinking and complexity theory to the-one-correct-cause mechanical causal thinking. Perhaps someday more people will use it. But for the time being, the mathematics and statistics associated with systems thinking are even more immature than the mathematics of causality. And education and the maturity of a theory are what increase people's priors.

But in particular, I've been developing an information-theoretic approach to interpreting the importance of causes, which explicitly shows the ambiguity in assigning importance to individual causes. Some preliminary work is at http://www.stat.columbia.edu/~jakulin/Int/index.html - feel free to drop me a note about what you think of it.

Posted by: Aleks at November 1, 2007 11:05 AM.

These days I’m also concerned about problems of practical significance - in the implications of inference based on models for decision-making. My concern derives mainly from teaching statistical decision theory to current and future public policymakers.

One consequence is that I’m surprised how little work has been done in the social sciences - especially in political science - on the role of loss functions in statistics. Why don’t we have a polisci/policy version of DeGroot’s Optimal Statistical Decisions - especially one we could use for training people in decision-making?

Posted by: Andy at November 2, 2007 5:22 PM.

Statistical Modeling, Causal Inference, and Social Science
