**A. Raoul Van Oosten** @raoulvanoosten@ecoevo.social · Feb 15, 2023, 09:49

**A. Raoul Van Oosten** @raoulvanoosten@ecoevo.social · Feb 15, 2023, 09:49

A. Raoul Van Oosten @raoulvanoosten@ecoevo.social

Feb 15, 2023, 09:49

A. Raoul Van Oosten @raoulvanoosten@ecoevo.social

Greenland et al. (2016, https://doi.org/10.1007/s10654-016-0149-3 ) suggested a more refined goal of statistics than testing study hypotheses is the evaluation of the (un)certainty of effect sizes.

I agree. The majority of studies are exploratory, hence testing hypotheses does not make sense.

#science #statistics #papers

**Daniel Lakens** @lakens@mastodon.social · Feb 15, 2023, 10:29

**Daniel Lakens** @lakens@mastodon.social · Feb 15, 2023, 10:29

Feb 15, 2023, 10:29

Daniel Lakens @lakens@mastodon.social

@raoulvanoosten Don't you think this conflates testing (distinguishing signal from noise) and testing hypotheses (confirming or falsifying theoretical predictions)? Even if you do not want to do the second, you often want to do the first, no?

**A. Raoul Van Oosten** @raoulvanoosten@ecoevo.social · Feb 15, 2023, 11:45

**A. Raoul Van Oosten** @raoulvanoosten@ecoevo.social · Feb 15, 2023, 11:45

Feb 15, 2023, 11:45

A. Raoul Van Oosten @raoulvanoosten@ecoevo.social

@lakens do you mean estimation of ES (un)certainty is enough both to distinguish signal from noise and to test hypotheses?

**Daniel Lakens** @lakens@mastodon.social · Feb 15, 2023, 11:48

**Daniel Lakens** @lakens@mastodon.social · Feb 15, 2023, 11:48

Feb 15, 2023, 11:48

Daniel Lakens @lakens@mastodon.social

@raoulvanoosten That ius exactly the discussion - are you happy just estimating an effect size, and discussing it, even if that effect size can be noise? If so, then you do not need to test. But in practice, this is not what I see in papers - even in estimation papers. People often declare something 'an effect' -not just 'an estimate'

**A. Raoul Van Oosten** @raoulvanoosten@ecoevo.social · Feb 15, 2023, 12:48

**A. Raoul Van Oosten** @raoulvanoosten@ecoevo.social · Feb 15, 2023, 12:48

Feb 15, 2023, 12:48

A. Raoul Van Oosten @raoulvanoosten@ecoevo.social

@lakens I'm not sure yet. I think significance testing (with p < .05) is a bad idea, while a priori determination of when claims will be accepted or refuted is essential. I am undecided whether ES and CIs are enough (I have not come across other frequentist methods yet).

**Daniel Lakens** @lakens@mastodon.social · Feb 15, 2023, 12:51

**Daniel Lakens** @lakens@mastodon.social · Feb 15, 2023, 12:51

Feb 15, 2023, 12:51

Daniel Lakens @lakens@mastodon.social

@raoulvanoosten I think it is currently fashionable to think p<.05 is bad, while almost everyone thinks it is a useful tool that, as all statistics, is often used without sufficient training. The difference is important. I have never seen anyone provide a coherent view on making claims without error control

**A. Raoul Van Oosten** @raoulvanoosten@ecoevo.social · Feb 16, 2023, 14:27

**A. Raoul Van Oosten** @raoulvanoosten@ecoevo.social · Feb 16, 2023, 14:27

Feb 16, 2023, 14:27

A. Raoul Van Oosten @raoulvanoosten@ecoevo.social

@lakens it is definitely fashionable to criticize NHST and the default p < .05. I think the good thing about that is that researchers shouldn't use these defaults without thinking about them. Which is mostly about education, like you say.

**Daniel Lakens** @lakens@mastodon.social · Feb 16, 2023, 16:21

**Daniel Lakens** @lakens@mastodon.social · Feb 16, 2023, 16:21

Feb 16, 2023, 16:21

Daniel Lakens @lakens@mastodon.social

@raoulvanoosten I agree. And I think it is good to show how to do it better in practice. This step does not happen a lot - that was the whole point of the 'moving beyond p < 0.05' special issue - and even in that special issue, most solutions STILL recommended p-values, but proposed to add some additional statistic.

**A. Raoul Van Oosten** @raoulvanoosten@ecoevo.social · Feb 16, 2023, 16:48

**A. Raoul Van Oosten** @raoulvanoosten@ecoevo.social · Feb 16, 2023, 16:48

Feb 16, 2023, 16:48

A. Raoul Van Oosten @raoulvanoosten@ecoevo.social

@lakens but the editors of that special issue suggest "don't use statistical significance". That's also my current standpoint: p-values are fine as a continuous metric, and more education is needed so people use them correctly.

Hypothesis testing needs more, and I'm unsure yet what.

**Daniel Lakens** @lakens@mastodon.social · Feb 16, 2023, 17:01

**Daniel Lakens** @lakens@mastodon.social · Feb 16, 2023, 17:01

Feb 16, 2023, 17:01

Daniel Lakens @lakens@mastodon.social

@raoulvanoosten The editors of that special issue are, regrettably, incompetent biased unscientific individuals who, with their very bad editorial, led to a taskforce that had to correct their mistakes https://doi.org/10.1080/09332480.2021.2003631 so that people like you would not be misguided by them. So, I am very sorry if I just ignore that editorial altogether, and listen to more competent people.

**A. Raoul Van Oosten** @raoulvanoosten@ecoevo.social · Feb 16, 2023, 17:43

**A. Raoul Van Oosten** @raoulvanoosten@ecoevo.social · Feb 16, 2023, 17:43

Feb 16, 2023, 17:43

A. Raoul Van Oosten @raoulvanoosten@ecoevo.social

@lakens subtle :P. I did not know this. I'll check out that paper. Thanks.

**Daniel Lakens** @lakens@mastodon.social · Feb 16, 2023, 17:45

**Daniel Lakens** @lakens@mastodon.social · Feb 16, 2023, 17:45

Feb 16, 2023, 17:45

Daniel Lakens @lakens@mastodon.social

@raoulvanoosten The emotion is strong. I am writing a blog about this (which I rarely do anymore). They also lie that most of the articles in the special issue agree with their view. Not at all true. So unscientific. So misleading.

**A. Raoul Van Oosten** @raoulvanoosten@ecoevo.social · Feb 17, 2023, 08:14

**A. Raoul Van Oosten** @raoulvanoosten@ecoevo.social · Feb 17, 2023, 08:14

Feb 17, 2023, 08:14

A. Raoul Van Oosten @raoulvanoosten@ecoevo.social

@lakens aw man, that sucks. Thanks for the Task Force paper. It seems I should check out all papers in the special issue myself, too.

**Daniel Lakens** @lakens@mastodon.social · Feb 17, 2023, 08:40

**Daniel Lakens** @lakens@mastodon.social · Feb 17, 2023, 08:40

Feb 17, 2023, 08:40

Daniel Lakens @lakens@mastodon.social

@raoulvanoosten I added a short review in our revised version of https://psyarxiv.com/af9by/. The screenshots are the relevant paragraphs. I wanted to write a blog about it to discuss this in more detail (we are very brief in the paper).

**A. Raoul Van Oosten** @raoulvanoosten@ecoevo.social · Feb 17, 2023, 09:41

**A. Raoul Van Oosten** @raoulvanoosten@ecoevo.social · Feb 17, 2023, 09:41

Feb 17, 2023, 09:41

A. Raoul Van Oosten @raoulvanoosten@ecoevo.social

@lakens so using interval hypothesis tests (and meaningful effect sizes)? When I started reading about the NHST-issue about a year ago (with papers like Meehl, 1967), that's what I thought but I couldn't find concrete examples and ended up with the "abandon significance" idea. Good to hear there is a body of work that supports the use of interval hypotheses.

**David Colquhoun** @david_colquhoun@mstdn.social · Feb 17, 2023, 10:47

**David Colquhoun** @david_colquhoun@mstdn.social · Feb 17, 2023, 10:47

Feb 17, 2023, 10:47

David Colquhoun @david_colquhoun@mstdn.social

@raoulvanoosten @lakens
The problem with confidence intervals is that they are random variables, as are p values. That means that when you calculate CI for your experiment, it is not right to say there's a 95% chance that the true value is included in the CI. A repeat of the experiment will give different CI.

**A. Raoul Van Oosten** @raoulvanoosten@ecoevo.social · Feb 17, 2023, 11:48

**A. Raoul Van Oosten** @raoulvanoosten@ecoevo.social · Feb 17, 2023, 11:48

Feb 17, 2023, 11:48

A. Raoul Van Oosten @raoulvanoosten@ecoevo.social

@david_colquhoun so that's what Greenland et al (2016, # 202301051039 greenland2016) meant. So are they still useful?

**David Colquhoun** @david_colquhoun@mstdn.social · Feb 17, 2023, 11:58

**David Colquhoun** @david_colquhoun@mstdn.social · Feb 17, 2023, 11:58

Feb 17, 2023, 11:58

David Colquhoun @david_colquhoun@mstdn.social

@raoulvanoosten
Are CI still useful? The problem is that people tie themselves in knots when trying to define how they should be interpreted, just as they do with p values. That's why I love the Wagenmakers quotation. I've advocated supplementing the p value and CI (rather than abandoning them) with a likelihood ratio or some measure of false positive risk

2df91984e1f63d7b.png

**A. Raoul Van Oosten** @raoulvanoosten@ecoevo.social · Feb 19, 2023, 07:32

**A. Raoul Van Oosten** @raoulvanoosten@ecoevo.social · Feb 19, 2023, 07:32

Feb 19, 2023, 07:32

A. Raoul Van Oosten @raoulvanoosten@ecoevo.social

@david_colquhoun @lakens I think indeed frequentist interference tries to answer the wrong question ("what is the probability of finding these data under the tested hypothesis?" rather than "what is the probability my tested hypothesis is true?"). Your false positive risk ( https://doi.org/10.1098/rsos.171085 ) is much closer to that question.

**Daniel Lakens** @lakens@mastodon.social · Feb 19, 2023, 07:41

**Daniel Lakens** @lakens@mastodon.social · Feb 19, 2023, 07:41

Feb 19, 2023, 07:41

Daniel Lakens @lakens@mastodon.social

@raoulvanoosten @david_colquhoun just a reminder that we can never answer the question 'what is the probability my hypothesis is true' in science. Frequenties is not answering the wrong question. It is answering the right, and only, question.

**David Colquhoun** @david_colquhoun@mstdn.social · Feb 20, 2023, 10:31

**David Colquhoun** @david_colquhoun@mstdn.social · Feb 20, 2023, 10:31

Feb 20, 2023, 10:31

David Colquhoun @david_colquhoun@mstdn.social

@lakens @raoulvanoosten
The problem with p values is that they confuse two quite different quantities.
The probability that you have 4 legs given that you are a cow is high.
The probability that you are a cow, given that you have 4 legs is low.

**Daniel Lakens** @lakens@mastodon.social · Feb 20, 2023, 10:38

**Daniel Lakens** @lakens@mastodon.social · Feb 20, 2023, 10:38

Feb 20, 2023, 10:38

Daniel Lakens @lakens@mastodon.social

@david_colquhoun @raoulvanoosten Except, this is not how p-values are used in practice. Their use is:

If this is a cow, it should have 4 legs. I perform a study. The data I have allow me to reject (with a small error rate) that the animal has 3 or less, or 5 or more, legs. Hence, I will act as if the animal has 4 legs.

The probability that it is a cow is not something any single study can quantify. Also also find attempts to quantify that probability rather uninteresting.

**David Colquhoun** @david_colquhoun@mstdn.social · Feb 20, 2023, 13:37

**David Colquhoun** @david_colquhoun@mstdn.social · Feb 20, 2023, 13:37

Feb 20, 2023, 13:37

David Colquhoun @david_colquhoun@mstdn.social

@lakens @raoulvanoosten
Even without Bayes, surely it would be better to use a likelihood ratio be cause that's the way to quantitate the evidence from your experiment. eg
P( obs ! cow)/P(obs | not cow)

**Daniel Lakens** @lakens@mastodon.social · Feb 20, 2023, 13:39

**Daniel Lakens** @lakens@mastodon.social · Feb 20, 2023, 13:39

Feb 20, 2023, 13:39

Daniel Lakens @lakens@mastodon.social

@david_colquhoun @raoulvanoosten I don't think it is better - it gives additional information. Just like also computing the effect size gives additional information. I am not against it - but it is like asking me if I should by bread, or cheese, for lunch. Probably prefer both :)

**David Colquhoun** @david_colquhoun@mstdn.social · Feb 20, 2023, 13:43

**David Colquhoun** @david_colquhoun@mstdn.social · Feb 20, 2023, 13:43

Feb 20, 2023, 13:43

David Colquhoun @david_colquhoun@mstdn.social

@lakens @raoulvanoosten
Yes, but the information given by the LR contradicts that from the p value (at least in cases where it's sensible to test a point null). That surely means that you have to decide which (LR or p) is more sensible?

**Paweł Lenartowicz** @plenartowicz@qoto.org · Feb 20, 2023, 14:51

**Paweł Lenartowicz** @plenartowicz@qoto.org · Feb 20, 2023, 14:51

Feb 20, 2023, 14:51

Paweł Lenartowicz @plenartowicz@qoto.org

@david_colquhoun @lakens

Fisherian aproach was intoduced in 1925, while Jerzy Neyman proposed his solution in 1926.
It's amazing that after almost 100 years, there are lively discussions about them!

In my opinion, these are two different, not always competitive tools, like screws and nails.
If we have specified hypothesis to test, it'll be better to use Neyman approach and calculate LR or other suitable test, with satisfying α and β.

But, when we want to just test null hypothesis for error detection, or basic exploratory analysis, p-value seems to be good approach.

**David Colquhoun** @david_colquhoun@mstdn.social · Feb 20, 2023, 15:02

**David Colquhoun** @david_colquhoun@mstdn.social · Feb 20, 2023, 15:02

Feb 20, 2023, 15:02

David Colquhoun @david_colquhoun@mstdn.social

@plenartowicz @lakens
I can't agree with your last sentence. p-values are over-optimistic (at least in cases where it's sensible to test a point null) eg -that's been well understood since Sellke & Berger 1987 -see eg https://royalsocietypublishing.org/doi/10.1098/rsos.171085
and
https://www.tandfonline.com/doi/full/10.1080/00031305.2018.1529622
and Benjamin & Berger 2019
https://www.tandfonline.com/doi/full/10.1080/00031305.2018.1543135

That's why Robert Matthews said:

5ad2cfa39922b644.jpg

**Paweł Lenartowicz** @plenartowicz@qoto.org · 2023-02-20T16:05:22Z

Paweł Lenartowicz @plenartowicz@qoto.org

@david_colquhoun @lakens

I'm quite sure that I agree with you. But, in my opinion, it's problem with lack of rigour and putting too much confidence in "statistical ritual" than p-value itself. Let me provide some example:

1) We are using p-value to detect candidate genes in some traits, like depression, anxiety, intelligence...

2) We can conduct some meta-analysis of such analysis, to make sure that candidate gene has effect, and obtain some "satisfying low p-value", like p = .00002 (σ > 4) and based on these results to conclude that we have strong evidence for effect.

Point "2)" is wrong, and there is a little evidence in it, but I think, that using p-value to detect "candidates" is defendable. Or, at least, I'm not aware of any better method.

Feb 20, 2023, 16:05 · · · ·

**David Colquhoun** @david_colquhoun@mstdn.social · Feb 20, 2023, 16:10

**David Colquhoun** @david_colquhoun@mstdn.social · Feb 20, 2023, 16:10

Feb 20, 2023, 16:10

David Colquhoun @david_colquhoun@mstdn.social

@plenartowicz @lakens
A lot of work has gone into genome analyses -they are a good example of the failure of p values. But I'm not at all expert in that area so I'm sorry, but I can't point you to better methods

Trending now

Resources

Developers

What is Mastodon?

qoto.org

More…