A second point in the recent Nature Human Behaviour article (nature.com/articles/s41562-023) is to adhere to threshold interpretations. Do not use ‘marginally significant’. A finding is significant, or it is not. But why would scientists make dichotomous claims?

As we write in our recent paper: “if scientists were asked to defend why they make dichotomous claims […] we expect many of them might not find it easy to provide strong arguments in defense of their own actions.” doi.org/10.1177/09593543231160

And yet, such a justification exists. It follows logically and coherently from a methodological falsificationist philosophy of science, of which the Neyman-Pearson approach to hypothesis testing is the most widely used example.

You have heard that some scientists care about testing theories. They do so by trying to falsify predictions derived from these theories. Theories are about phenomena, but our observations are about data. One cannot directly falsify the former just using the latter.

How do scientists go from observed data to claims about phenomena? This is where methodological decision procedures (such as “is p below or above a certain value?”) come in. We need to turn observed data into Popperian ‘basic statements’.

Because data have random error, we need to create a decision procedure to turn probabilistic data into statements that can falsify a theory that talks about phenomena. This decision procedure should be transparent, and specified in advance.

By making dichotomous claims based on a prespecified statistical rejection rule, we can make quasi-basic statements that express (with a prespecified maximal error probability) statistical inferences that can corroborate or falsify theoretical claims.
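To make this concrete, here is a minimal sketch of such a prespecified rejection rule in Python. The test (an independent-samples t-test), the alpha level of 0.05, and the simulated data are all my own illustrative assumptions, not something from the article:

```python
# Minimal sketch of a prespecified rejection rule: the test and the alpha
# level are fixed before seeing the data; the output is a dichotomous claim.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05                      # prespecified maximal Type I error rate

# Hypothetical data standing in for two experimental conditions.
control   = rng.normal(loc=0.0, scale=1.0, size=50)
treatment = rng.normal(loc=0.5, scale=1.0, size=50)

t, p = stats.ttest_ind(treatment, control)

# The dichotomous claim: either the prediction is corroborated (H0 rejected)
# or it is not -- there is no 'marginally significant'.
if p < alpha:
    print(f"p = {p:.3f} < {alpha}: reject H0; claim the predicted effect.")
else:
    print(f"p = {p:.3f} >= {alpha}: do not reject H0; make no claim of an effect.")
```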

Statistics is only one step in the decision procedure that occurs when we test theories! We also need to make decisions about auxiliary hypotheses (such as measurement) and theoretical assumptions (such as boundary conditions).

The Neyman-Pearson approach to statistical inference is the most widely used statistical decision procedure for making dichotomous decisions in science. It allows us to make decisions with a maximum error rate (under assumptions), which is often set at 5%.
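A sketch of what that design step can look like in practice: both the Type I and Type II error rates are fixed in advance, and the sample size follows from them. The effect size, desired power, and use of statsmodels' power module are my own illustrative assumptions:

```python
# Sketch of the Neyman-Pearson design step: fix alpha and beta in advance,
# then determine the sample size needed to detect the smallest effect size
# of interest. All numbers are illustrative.
from statsmodels.stats.power import TTestIndPower

alpha = 0.05   # maximal Type I error rate (the conventional default)
power = 0.90   # 1 - beta, i.e. a maximal Type II error rate of 10%
d     = 0.40   # smallest effect size of interest (Cohen's d), assumed here

n_per_group = TTestIndPower().solve_power(effect_size=d, alpha=alpha,
                                          power=power, alternative='two-sided')
print(f"Required sample size per group: {n_per_group:.0f}")
```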

This 5% threshold is arbitrary, but it needs to be set somewhere. Although there are ways to justify an alpha level that differs from 5% (see our paper on this journals.sagepub.com/doi/abs/1), it is not easy, and it is not applicable to all studies. So the default is there if you can’t think of any other justification.

(It is a bit nerdy, but fun, to think about why it ended up at 5% - in doi.org/10.1177/09593543231160 we speculate on possible causes. It is a rare point of agreement among scientists, that’s for sure!)

So, the recommendation of Nature Human Behaviour (nature.com/articles/s41562-023) to interpret p-values in a dichotomous manner is not mindless statistics – it is based on a methodological falsificationist philosophy of science, and a coherent approach to statistical inferences when testing theories.

@lakens Interesting thread...! About the statistical hypothesis rejection / Popperian falsificationism analogy, I remember reading a nice counterargument in @rlmcelreath 's Statistical Rethinking.

@leovarnet @rlmcelreath This might be a criticism of how some people use NHST, but it is not relevant when NHST is done well. Best practice is not in line with what is suggested here (for example, equivalence tests that reject H1 are recommended, also in the Nature Human Behaviour article). So, NHST done well is falsificationist, and more exactly, as I wrote, Neyman-Pearson testing is probably *the* example of methodological falsificationism in science.
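For readers unfamiliar with equivalence testing, here is a rough sketch of a TOST procedure using statsmodels. The equivalence bounds and the simulated data are purely illustrative assumptions, not a recipe from the article:

```python
# Sketch of an equivalence test (TOST): the falsificationist move in the other
# direction, rejecting the hypothesis that an effect at least as large as the
# smallest effect size of interest is present. Bounds and data are illustrative.
import numpy as np
from statsmodels.stats.weightstats import ttost_ind

rng = np.random.default_rng(7)
a = rng.normal(loc=0.0, scale=1.0, size=100)
b = rng.normal(loc=0.05, scale=1.0, size=100)

low, upp = -0.3, 0.3          # prespecified equivalence bounds (raw scale)
p, lower, upper = ttost_ind(a, b, low, upp)

if p < 0.05:
    print(f"TOST p = {p:.3f}: reject effects outside [{low}, {upp}]; claim equivalence.")
else:
    print(f"TOST p = {p:.3f}: cannot reject the presence of a meaningful effect.")
```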
