math nerds, what is k(t) defined by the maximum number k of unique items such that when choosing 9 of them with replacement t times, on average each choice will include no more than 3 items that have been chosen before? #math #probability

(boosts welcome, and no this isn't homework)

Follow

@dragfyre do you mean the minimum k? As you increase the choices available, the average number of collisions will decrease. So once you find a k with sufficiently low collisions that no more than three repeats occur on average, any number k' > k will also satisfy the same criterion.

@khird yes, exactly, thanks. I was thinking it but forgot to specify minimum k.

@dragfyre Just to make sure I'm thinking about the problem correctly, a couple more questions:
1. You have k items in a bin
2. Draw exactly nine of these items and record them
3. Count how many of these items are being recorded for the second time or later
4. Replace all nine in the bin
5. Repeat steps 2, 3, and 4 until a total of t drawings have been completed & counted

When you say the items are chosen "with replacement", do you mean internally in step 2 (a single choice can then contain duplicate items), or just between drawings in step 4 (items can only be duplicated across choices, not within them)?

When you say "on average", do you mean that the average *choice* in a run contains no more than three previously-seen elements, or that in an average *run* no choice contains more than three previously-seen elements? A concrete example: if t=3, say the expected counts are 0, 2, and 4 duplicates in each successive round - the average choice contains two dupes, but the average run contains a choice with four dupes.

@khird 1-5 are correct!

"with replacement" meaning you put the items back in the box after each drawing of 9. that is, you can have the exact same item from one set of nine to the next, but not, say, twice in the same drawing of nine.

and finally, it's the average *run*.

@dragfyre Okay so here are my notes so far:

prob that 1 specific item has been chosen in a single round
9/k

prob that 3 specific items have all been chosen in a single round
9/k * 8/(k-1) * 7/(k-2) = 504/(k³-3k²+2k)

9 choose 3
84

prob that an event, with prob P of happening in any given way, happened in any of x possible ways
1 - (1-P)^x

prob that any 3 of 9 items have all been chosen in a single round
1 - [1 - 504/(k³-3k²+2k)]^84

This lets us compute k(2) = 41. I was surprised at how large k had to be for such a small t, but I couldn't find any errors in my arithmetic or reasoning.

I'm stuck on extending this to multiple previous rounds though. The straightforward approach would be to do 1 - (1-P)^(t²-t), substituting the above expression for P, but I think the probabilities aren't independent - if you draw three times, you have to do three comparisons: 1v2, 1v3, 2v3 - and once you know that 1v2 and 1v3 passed, the probability of 2v3 passing is no longer its a priori value.

@khird yeah I figured it would be a headache and a half to figure out 😅 but it's good to get a sense of how to start it. and it's not too surprising for me that you'd need a large k to be honest, just because nine picks each time is a lot and takes quite a few items out of consideration.

Sign in to participate in the conversation
Qoto Mastodon

QOTO: Question Others to Teach Ourselves
An inclusive, Academic Freedom, instance
All cultures welcome.
Hate speech and harassment strictly forbidden.