Why is it difficult to interpret null results in underpowered studies? Consider a study with 50% power for an effect of d = 0.5. Let’s say the observed effect is d = 0.3, so p > 0.05. What do we do?

It could be that the null is true; then we would observe a non-significant result 95% of the time. Or there could be a real effect, and this is a Type 2 error, which at 50% power happens 50% of the time. How can we distinguish the two?
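
To make this concrete, here is a minimal simulation sketch in Python; the group size of n = 32 per group (roughly 50% power for d = 0.5 at alpha = 0.05) and the normal data with SD = 1 are illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, alpha, sims = 32, 0.05, 10_000  # n = 32 per group gives ~50% power for d = 0.5

for label, true_d in [("null (d = 0.0)", 0.0), ("effect (d = 0.5)", 0.5)]:
    nonsig = 0
    for _ in range(sims):
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(true_d, 1.0, n)
        nonsig += stats.ttest_ind(a, b).pvalue > alpha
    print(f"{label}: p > 0.05 in {nonsig / sims:.0%} of studies")
# ~95% of results are non-significant under the null, ~50% under d = 0.5:
# a single p > 0.05 cannot tell you which world you are in.
```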

The answer is, we can’t. But what we *can* do is test whether the effect, if any, is statistically smaller than anything we would care about. This is done in equivalence testing, or inferiority testing: is the effect within some range (or below some upper bound) we think is too small to matter?
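
Here is a minimal sketch of such an equivalence test (TOST, two one-sided tests), hand-rolled in Python. The tost_ind helper and the equivalence bounds of ±0.3 are my own illustrative choices, not a prescribed implementation:

```python
import numpy as np
from scipy import stats

def tost_ind(a, b, low, upp):
    """TOST for two independent groups: is the mean difference inside (low, upp)?"""
    na, nb = len(a), len(b)
    diff = np.mean(a) - np.mean(b)
    # pooled standard error (Student's t assumptions: normality, equal variances)
    sp2 = ((na - 1) * np.var(a, ddof=1) + (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2)
    se = np.sqrt(sp2 * (1 / na + 1 / nb))
    df = na + nb - 2
    p_lower = stats.t.sf((diff - low) / se, df)   # H0: diff <= low
    p_upper = stats.t.cdf((diff - upp) / se, df)  # H0: diff >= upp
    return max(p_lower, p_upper)  # reject both one-sided tests to claim equivalence

rng = np.random.default_rng(1)
a, b = rng.normal(0.0, 1.0, 200), rng.normal(0.05, 1.0, 200)
# if p < 0.05, declare the effect statistically smaller than anything we care about
print(tost_ind(a, b, low=-0.3, upp=0.3))
```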

If you design a study, you need to make sure you can corroborate or reject the presence of a predicted effect. By combining NHST and equivalence testing, you can end up with a *conclusive null result*: the effect, if any, is smaller than what you care about.
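
As a sketch of how the two tests combine into one of four verdicts, reusing the hypothetical tost_ind helper from the sketch above (the alpha level and bounds remain illustrative assumptions):

```python
from scipy import stats

def interpret(a, b, low, upp, alpha=0.05):
    """Combine NHST with an equivalence test into one of four verdicts."""
    p_nhst = stats.ttest_ind(a, b).pvalue
    p_tost = tost_ind(a, b, low, upp)  # helper defined in the earlier sketch
    if p_nhst < alpha and p_tost >= alpha:
        return "effect detected, not shown to be negligible"
    if p_nhst >= alpha and p_tost < alpha:
        return "conclusive null: any effect is smaller than we care about"
    if p_nhst < alpha and p_tost < alpha:
        return "statistically significant, but too small to matter"
    return "inconclusive: the data cannot distinguish the hypotheses"
```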

@lakens
Oh, my god, NO!
If we make strong assumptions, such as normality (Welch’s t-test) or homogeneity of variance (Student’s t-test), as we do in equivalence testing, we can only conclude that that specific model is unlikely.

In other words, if the real effect is moderated or mediated, this procedure fails. Frequentist statistics are very sensitive to model misspecification. That is a problem, not an advantage. We can’t conclude H0 just because the data are unlikely under one specific H1.

@lakens

The solution to this problem has been known for 90 years:
1) specify your model
2) test your model against probable alternatives (see the sketch below)
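
A minimal sketch of that idea, with a simulated moderated effect where the treatment works only in one subgroup; the variable names, simulated data, and the choice of AIC as the comparison criterion are my own illustrative assumptions:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 400
group = rng.integers(0, 2, n)                    # treatment indicator
subgrp = rng.integers(0, 2, n)                   # moderator
y = 0.6 * group * subgrp + rng.normal(0, 1, n)   # effect exists only when subgrp == 1
df = pd.DataFrame({"y": y, "group": group, "subgrp": subgrp})

# specify competing models and compare them, rather than testing one in isolation
m_null = smf.ols("y ~ 1", df).fit()
m_main = smf.ols("y ~ group", df).fit()
m_mod  = smf.ols("y ~ group * subgrp", df).fit()
for name, m in [("null", m_null), ("main effect", m_main), ("moderated", m_mod)]:
    print(f"{name}: AIC = {m.aic:.1f}")
# the moderated model should come out best, which a lone t-test on group would miss
```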
