Greenland et al. (2016, doi.org/10.1007/s10654-016-014 ) suggested that a more refined goal for statistics than testing study hypotheses is evaluating the (un)certainty of effect sizes.

I agree. The majority of studies are exploratory, hence testing hypotheses does not make sense.

#science #statistics #papers

@raoulvanoosten Don't you think this conflates testing (distinguishing signal from noise) and testing hypotheses (confirming or falsifying theoretical predictions)? Even if you do not want to do the second, you often want to do the first, no?

@lakens do you mean estimation of ES (un)certainty is enough both to distinguish signal from noise and to test hypotheses?

@raoulvanoosten That is exactly the discussion: are you happy just estimating an effect size and discussing it, even if that effect size can be noise? If so, then you do not need to test. But in practice, this is not what I see in papers, even in estimation papers. People often declare something 'an effect', not just 'an estimate'.

@lakens I'm not sure yet. I think significance testing (with p < .05) is a bad idea, while a priori determination of when claims will be accepted or refuted is essential. I am undecided whether ES and CIs are enough (I have not come across other frequentist methods yet).

@raoulvanoosten I think it is currently fashionable to think p < .05 is bad, while almost everyone thinks it is a useful tool that, like all statistics, is often used without sufficient training. The difference is important. I have never seen anyone provide a coherent view on making claims without error control.

@lakens it is definitely fashionable to criticize NHST and the default p < .05. I think the good thing about that is that researchers shouldn't use these defaults without thinking about them. Which is mostly about education, like you say.

@raoulvanoosten I agree. And I think it is good to show how to do it better in practice. This step does not happen a lot - that was the whole point of the 'moving beyond p < 0.05' special issue - and even in that special issue, most solutions STILL recommended p-values, but proposed to add some additional statistic.

@lakens but the editors of that special issue suggest "don't use statistical significance". That's also my current standpoint: p-values are fine as a continuous metric, and more education is needed so people use them correctly.

Hypothesis testing needs more, and I'm not yet sure what.

@raoulvanoosten The editors of that special issue are, regrettably, incompetent, biased, unscientific individuals who, with their very bad editorial, led to a task force that had to correct their mistakes (doi.org/10.1080/09332480.2021. ) so that people like you would not be misguided by them. So, I am very sorry, but I will just ignore that editorial altogether and listen to more competent people.

@lakens subtle :P. I did not know this. I'll check out that paper. Thanks.

@raoulvanoosten The emotion is strong. I am writing a blog about this (which I rarely do anymore). They also lie that most of the articles in the special issue agree with their view. Not at all true. So unscientific. So misleading.

@lakens aw man, that sucks. Thanks for the Task Force paper. It seems I should check out all papers in the special issue myself, too.

@raoulvanoosten I added a short review in our revised version of psyarxiv.com/af9by/. The screenshots are the relevant paragraphs. I wanted to write a blog about it to discuss this in more detail (we are very brief in the paper).

@lakens so using interval hypothesis tests (and meaningful effect sizes)? When I started reading about the NHST issue about a year ago (with papers like Meehl, 1967), that's what I thought, but I couldn't find concrete examples and ended up with the "abandon significance" idea. Good to hear there is a body of work that supports the use of interval hypotheses.
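
As a concrete illustration of what an interval hypothesis test can look like in practice, below is a minimal sketch of the two one-sided tests (TOST) procedure for equivalence against a range of effect sizes deemed too small to matter. The simulated data, equivalence bounds, and alpha level are made-up assumptions for illustration, not a prescription.

```python
# Sketch of an interval hypothesis test via TOST (two one-sided tests).
# The simulated data, equivalence bounds, and alpha are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(loc=0.1, scale=1.0, size=50)  # simulated sample A
group_b = rng.normal(loc=0.0, scale=1.0, size=50)  # simulated sample B

low, high = -0.5, 0.5  # smallest effect sizes of interest (raw units)
alpha = 0.05

diff = group_a.mean() - group_b.mean()
se = np.sqrt(group_a.var(ddof=1) / len(group_a) + group_b.var(ddof=1) / len(group_b))
df = len(group_a) + len(group_b) - 2  # simple df approximation

# Two one-sided tests: is the difference reliably above the lower bound
# AND reliably below the upper bound?
p_lower = stats.t.sf((diff - low) / se, df)    # H0: diff <= low
p_upper = stats.t.cdf((diff - high) / se, df)  # H0: diff >= high
p_tost = max(p_lower, p_upper)

print(f"difference = {diff:.3f}, TOST p = {p_tost:.3f}")
print("within the equivalence bounds" if p_tost < alpha else "equivalence not shown")
```

Rejecting both one-sided tests lets you claim the effect falls inside the bounds, with the same kind of error control as an ordinary significance test.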

@raoulvanoosten @lakens
The problem with confidence intervals is that they are random variables, as are p values. That means that when you calculate a CI for your experiment, it is not right to say there's a 95% chance that the true value is included in that CI. A repeat of the experiment will give a different CI.

@david_colquhoun so that's what Greenland et al. (2016) meant. So are CIs still useful?

@raoulvanoosten
Are CIs still useful? The problem is that people tie themselves in knots when trying to define how they should be interpreted, just as they do with p values. That's why I love the Wagenmakers quotation. I've advocated supplementing the p value and CI (rather than abandoning them) with a likelihood ratio or some measure of false positive risk.
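
For concreteness, here is one simplified way to report a likelihood ratio and a false positive risk alongside a p value. It is a rough sketch in the spirit of that proposal, not an exact reproduction of any published calculation; the observed z value and the prior probability of a real effect are assumptions.

```python
# Rough sketch: supplement a p value with a likelihood ratio and a false
# positive risk. The observed z and the prior probability of a real effect
# are assumptions, and the alternative is placed at the observed effect.
from scipy import stats

z_observed = 1.96  # z corresponding to a two-sided p of about 0.05
p_two_sided = 2 * stats.norm.sf(abs(z_observed))

lik_h0 = stats.norm.pdf(z_observed, loc=0)           # likelihood under H0
lik_h1 = stats.norm.pdf(z_observed, loc=z_observed)  # best-supported alternative
likelihood_ratio = lik_h1 / lik_h0                   # evidence for H1 over H0

prior_h1 = 0.5  # assumed prior probability that there is a real effect
prior_h0 = 1 - prior_h1

# "False positive risk" here: posterior probability that H0 is true.
fpr = (prior_h0 * lik_h0) / (prior_h0 * lik_h0 + prior_h1 * lik_h1)

print(f"p = {p_two_sided:.3f}, LR(H1:H0) = {likelihood_ratio:.1f}, FPR ~ {fpr:.2f}")
```

With these assumptions, a just-significant p of about 0.05 corresponds to a likelihood ratio of roughly 7 and a false positive risk of roughly 13%.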

@david_colquhoun @lakens a 99% CI is broader than a 95% CI, so the probability of it containing the true value is higher. If the probability is not 99%, is there any way of knowing what it is? Or is that what you call struggling?

@raoulvanoosten @lakens
It's true that CIs and p values are (crude) measures of the plausibility of the null, but under many circumstances they're misleadingly optimistic. As Greenland points out, the probability that your CI contains the true value is either 0 or 1.
Inductive inference is hard! 🙂
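
The coverage point can be made concrete with a small simulation: across many repeats of the same experiment, about 95% of the computed intervals contain the true value, but any single realized interval either contains it or it does not. The true mean, sigma, sample size, and number of repeats below are arbitrary made-up settings.

```python
# Small simulation of the coverage interpretation of confidence intervals.
# The true mean, sigma, n, and number of repeats are arbitrary choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_mean, sigma, n, repeats = 0.3, 1.0, 30, 10_000
conf = 0.95
t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)

hits = 0
for _ in range(repeats):
    sample = rng.normal(true_mean, sigma, size=n)
    m, se = sample.mean(), sample.std(ddof=1) / np.sqrt(n)
    lower, upper = m - t_crit * se, m + t_crit * se  # a different CI every repeat
    hits += (lower <= true_mean <= upper)            # this particular CI: 0 or 1

print(f"long-run coverage: {hits / repeats:.3f}")    # close to the nominal 0.95
```

The 95% is a property of the procedure over repeated sampling, not of the one interval you happened to compute.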
