@tslumley recently posted a reminder that in weighted sampling without replacement, the probability of inclusion is not usually proportional to the weight. @peter_ellis and @rstub posted they were surprised and made their own nice blog posts on the topic.

In importance resampling, this property of weighted sampling without replacement has been considered beneficial. Right now I don't have time to write a longer blog post with code examples, so here is just a short thread 🧵

1/n

@tslumley @peter_ellis @rstub In importance resampling with replacement, if one of the weights dominates, the number of unique draws can be very small, or even one. High variability of the weights leads to high (possibly infinite) variance of the importance resampling estimate. 2/n
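
A minimal sketch of this effect, assuming NumPy and a made-up weight vector in which one weight is a million times larger than the rest. The helper name `unique_draws_with_replacement` and the numbers are illustrative only and do not come from any of the cited papers.

```python
import numpy as np

def unique_draws_with_replacement(weights, k, rng):
    """Resample k indices with replacement and count the distinct ones."""
    weights = np.asarray(weights, dtype=float)
    p = weights / weights.sum()  # normalized importance weights
    idx = rng.choice(len(p), size=k, replace=True, p=p)
    return np.unique(idx).size

rng = np.random.default_rng(1)
S = 1000

# Hypothetical weights: one draw dominates all the others.
dominated = np.ones(S)
dominated[0] = 1e6

# Reference case: all weights equal.
uniform = np.ones(S)

n_dominated = unique_draws_with_replacement(dominated, S, rng)
n_uniform = unique_draws_with_replacement(uniform, S, rng)
print("unique draws, dominated weights:", n_dominated)
print("unique draws, equal weights:", n_uniform)
```

With the dominating weight, almost every one of the S resampled indices is the same draw, so the resample carries very little information.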

@tslumley @peter_ellis @rstub Assuming we resample k times from a sample of size S (k < S), sampling without replacement constrains the probability of inclusion to be less than or equal to 1/k. This introduces bias, but Skare, Bolviken, and Holden (2003) showed that it reduces the variance so much that the mean square error is smaller than with replacement! The downside of using sampling without replacement to reduce variance is that we need k<S. 3/n
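
A small sketch of the constraint, assuming NumPy's `Generator.choice` with `replace=False` as the weighted sampler; whether it matches the exact scheme analysed by Skare et al. is not checked here. The weights are again made up. Without replacement each draw can appear at most once, so its share of the k resampled draws is at most 1/k.

```python
import numpy as np

rng = np.random.default_rng(2)
S, k = 1000, 100

# Hypothetical weights with one dominating draw.
weights = np.ones(S)
weights[0] = 1e6
p = weights / weights.sum()

idx_wr = rng.choice(S, size=k, replace=True, p=p)    # with replacement
idx_wor = rng.choice(S, size=k, replace=False, p=p)  # without replacement

# Share of the resample taken up by the dominating draw (index 0).
share_wr = np.mean(idx_wr == 0)
share_wor = np.mean(idx_wor == 0)
print("share with replacement:", share_wr)
print("share without replacement:", share_wor, "(bounded by 1/k =", 1 / k, ")")
```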

@tslumley @peter_ellis @rstub Instead of resampling k<S without replacement to constrain the inclusion probability to 1/k, Ionides (2008) proposed truncating the highest importance weights to 1/sqrt(S). Then we can resample k=S with replacement and get a similar reduction in variance and mean square error. 4/n
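
A rough sketch of the truncation step, assuming the cap is applied to the raw weights as sqrt(S) times their mean, which corresponds to 1/sqrt(S) on the scale of weights normalized to sum to one. The function name `truncate_weights` and the log-normal example weights are assumptions for illustration. After re-normalizing, the largest weight can end up slightly above 1/sqrt(S).

```python
import numpy as np

def truncate_weights(raw_weights):
    """Cap raw importance weights at sqrt(S) * mean, then normalize to sum to 1."""
    w = np.asarray(raw_weights, dtype=float)
    S = w.size
    cap = np.sqrt(S) * w.mean()
    w_trunc = np.minimum(w, cap)
    return w_trunc / w_trunc.sum()

rng = np.random.default_rng(3)
S = 1000

# Hypothetical heavy-tailed raw weights.
raw = rng.lognormal(mean=0.0, sigma=3.0, size=S)
p_raw = raw / raw.sum()
p_trunc = truncate_weights(raw)

print("largest normalized weight, raw:", p_raw.max())
print("largest normalized weight, truncated:", p_trunc.max())

# With the truncated weights we can resample k = S draws with replacement.
idx = rng.choice(S, size=S, replace=True, p=p_trunc)
```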

@tslumley @peter_ellis @rstub We (Vehtari et al., 2024, jmlr.org/papers/v25/19-556.htm) proposed Pareto smoothing to stabilize importance weights, which improves over simple truncation. Modifying the weights adds bias, but in many cases that bias is negligible, and the approach includes a self-diagnostic that warns when the bias is non-negligible. 5/n

@tslumley @peter_ellis @rstub Finally, when k=S, Kitagawa (1996) presented stratified and deterministic resampling, and Liu (2001) presented residual resampling, which all have smaller variance than simple random resampling (with replacement). 6/6
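
A sketch of two of these schemes for k=S, written from common textbook descriptions rather than checked against the cited papers. The function names, the log-normal example weights, and the assumption that `p` is already normalized to sum to one are all illustrative.

```python
import numpy as np

def stratified_resample(p, rng):
    """One uniform draw in each of S equal-width strata of [0, 1)."""
    p = np.asarray(p, dtype=float)
    S = p.size
    u = (np.arange(S) + rng.uniform(size=S)) / S
    cdf = np.cumsum(p)
    cdf[-1] = 1.0  # guard against round-off in the cumulative sum
    return np.searchsorted(cdf, u, side="right")

def residual_resample(p, rng):
    """Keep floor(S * p_i) copies of each draw, then sample the remainder."""
    p = np.asarray(p, dtype=float)
    S = p.size
    counts = np.floor(S * p).astype(int)
    n_rest = S - counts.sum()
    if n_rest > 0:
        residual = S * p - counts
        residual = residual / residual.sum()
        extra = rng.choice(S, size=n_rest, replace=True, p=residual)
        counts = counts + np.bincount(extra, minlength=S)
    return np.repeat(np.arange(S), counts)

rng = np.random.default_rng(4)
S = 1000

# Hypothetical heavy-tailed weights, normalized to sum to one.
w = rng.lognormal(mean=0.0, sigma=2.0, size=S)
p = w / w.sum()

idx_strat = stratified_resample(p, rng)
idx_resid = residual_resample(p, rng)
print("unique draws, stratified:", np.unique(idx_strat).size)
print("unique draws, residual:", np.unique(idx_resid).size)
```

Both schemes pin down most of the copy counts deterministically and leave only a small part to chance, which is where the variance reduction relative to simple random resampling comes from.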

@tobychev @tslumley @peter_ellis @rstub No, I meant to write k=S, because that's what I do. It works with k>=S, too, but I'm not aware of a case where that would be useful.
