Experimental results discovered in opt-in panels like mTurk or Prolific are surprisingly replicable in nationally representative samples (70%).
The reason is that there is little treatment heterogeneity in the first place.
I’ve always held that, by default, our treatment effects are heterogeneous but our tools for discovering that are garbage (e.g., interactions).
This paper still uses interactions, but the evidence for little treatment heterogeneity is important.
@Protzko well, treatment heterogeneity is scale-dependent, so I think it's a little more complicated. If a covariate has (1) some effect on the outcome & (2) homogeneity on one scale, then it must be heterogeneous on the other scale (additive vs. multiplicative).
L'Abbé plots are a nice visualization of why this must be the case.
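A small numeric sketch of the point above. The stratum names and risks are made up for illustration, not taken from the paper: if the covariate shifts the baseline risk and the effect is constant on the additive scale, the risk ratio has to differ across strata.

```python
# Hypothetical control-arm risks in two strata of a covariate.
# The covariate affects the outcome, so the baselines differ.
baseline = {"low": 0.2, "high": 0.4}

# Assumption: a homogeneous effect on the additive scale (+0.1 everywhere).
risk_difference = 0.1


def effects(p0, rd):
    """Return (risk difference, risk ratio) for control risk p0 and additive effect rd."""
    p1 = p0 + rd  # treated-arm risk implied by the additive effect
    return p1 - p0, p1 / p0


results = {name: effects(p0, risk_difference) for name, p0 in baseline.items()}

for name, (rd, rr) in results.items():
    print(f"{name}: RD = {rd:.2f}, RR = {rr:.2f}")
```

With these numbers the risk difference is 0.10 in both strata, while the risk ratio is 1.50 in the low-baseline stratum and 1.25 in the high-baseline one.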
@PausalZ scale dependency seems an odd term. Can you explain?
I'm thinking of the difference between a trait and the scale that measures it.
Scale-dependent sounds more like an artifact, but this may be different jargon.
@Protzko the plot shows the relationship between measures. The solid gray lines indicate homogeneity (there are infinitely many such lines; just a few are shown).
For there to be no heterogeneity (in effect measures conditional on a trait, like the CATE), the points have to lie on a gray line. However, that cannot hold on both scales, as shown by the red dots.
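The same geometry can be checked without drawing anything. This is a rough sketch, assuming each stratum is a (control risk, treated risk) point as on a L'Abbé plot; the function name and the example points are hypothetical. Points sharing one risk difference sit on a line y = x + d, and points sharing one risk ratio sit on a line y = r * x.

```python
import math


def homogeneous_scales(points, tol=1e-9):
    """Check homogeneity of stratum-level effects on two scales.

    points: (control risk, treated risk) pairs, one per stratum.
    Returns (additive, multiplicative): whether all points share one
    risk difference (line y = x + d) or one risk ratio (line y = r * x).
    """
    diffs = [p1 - p0 for p0, p1 in points]
    ratios = [p1 / p0 for p0, p1 in points]
    additive = all(math.isclose(d, diffs[0], abs_tol=tol) for d in diffs)
    multiplicative = all(math.isclose(r, ratios[0], abs_tol=tol) for r in ratios)
    return additive, multiplicative


on_rd_line = [(0.2, 0.3), (0.4, 0.5)]  # both on y = x + 0.1
on_rr_line = [(0.2, 0.3), (0.4, 0.6)]  # both on y = 1.5 * x

print(homogeneous_scales(on_rd_line))
print(homogeneous_scales(on_rr_line))
```

Each example set is homogeneous on exactly one scale. Two points with different baseline risks can satisfy both checks only on the diagonal y = x, where the treatment has no effect at all.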
@Protzko to summarize, the dependence is an artifact, but it applies to any effect measure we select. When we talk about heterogeneity, we need to be careful, as it is always in reference to the scale of the effect measure.