A second point in the recent Nature Human Behaviour article https://www.nature.com/articles/s41562-023-01586-w is to adhere to threshold interpretations: do not use ‘marginally significant’. A finding is significant, or it is not. But why would scientists make dichotomous claims in the first place?
As we write in our recent paper: “if scientists were asked to defend why they make dichotomous claims […] we expect many of them might not find it easy to provide strong arguments in defense of their own actions.” https://doi.org/10.1177/09593543231160112
And yet, such a justification exists. It follows logically and coherently from a methodological falsificationist philosophy of science, of which the Neyman-Pearson approach to hypothesis testing is the most widely used example.
You may have heard that some scientists care about testing theories. They do so by trying to falsify predictions derived from those theories. Theories are about phenomena, but our observations are about data. One cannot directly falsify the former using only the latter.
How do scientists go from observed data to claims about phenomena? This is where methodological decision procedures (such as “is p below or above a certain value?”) come in. We need to turn observed data into Popperian ‘basic statements’.
Because data have random error, we need to create a decision procedure to turn probabilistic data into statements that can falsify a theory that talks about phenomena. This decision procedure should be transparent, and specified in advance.
By making dichotomous claims based on a prespecified statistical rejection rule, we can make quasi-basic statements that express (with a prespecified maximal error probability) statistical inferences that can corroborate or falsify theoretical claims.
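As a minimal illustration (my sketch, not from the thread), such a prespecified rejection rule can be written in a few lines. Here I assume a two-sided z-test and the conventional alpha of .05, fixed before the data are seen:

```python
import math

ALPHA = 0.05  # prespecified before seeing the data

def two_sided_p(z):
    """Two-sided p-value for an observed z statistic under the null hypothesis."""
    return math.erfc(abs(z) / math.sqrt(2))

def dichotomous_claim(z, alpha=ALPHA):
    """Apply the prespecified rejection rule: a transparent, binary decision."""
    return "reject H0" if two_sided_p(z) < alpha else "fail to reject H0"

print(dichotomous_claim(2.5))  # z = 2.5, p ≈ .012 -> reject H0
print(dichotomous_claim(1.5))  # z = 1.5, p ≈ .134 -> fail to reject H0
```

The point of the sketch is that the rule is transparent and specified in advance: given the same data and the same alpha, anyone reaches the same dichotomous claim, with a known maximal long-run error rate.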
Statistics is only one step in the decision procedure that occurs when we test theories! We also need to make decisions about auxiliary hypotheses, such as measurement, and theoretical assumptions (such as boundary conditions).
The conventional 5% threshold is arbitrary, but the threshold needs to be set somewhere. Although there are ways to justify an alpha level that differs from 5% (see our paper on this: https://journals.sagepub.com/doi/abs/10.1177/25152459221080396), it is not easy, and not applicable to all studies. So the default is there if you can’t think of any other justification.
(It is a bit nerdy, but fun, to think about why it ended up at 5% - in https://doi.org/10.1177/09593543231160112 we speculate on possible causes. It is a rare point of agreement among scientists, that’s for sure!)
So, the recommendation of Nature Human Behaviour https://www.nature.com/articles/s41562-023-01586-w to interpret p-values in a dichotomous manner is not mindless statistics – it is based on a methodological falsificationist philosophy of science, and a coherent approach to statistical inference when testing theories.
@lakens Interesting thread...! About the statistical hypothesis rejection / Popperian falsificationism analogy, I remember reading a nice counterargument in @rlmcelreath 's Statistical Rethinking:
@lakens @rlmcelreath I see... thanks for the clarification!