I know that because of the birthday problem, even after 365 friends, you’re going to have a lot of doubles and that there’s also an infinitesimal chance that even with infinite friends that there’s one day left out. But I was curious how many friends you’d need on average to have every day represented (this is ignoring leap day and assuming birthdays are equally distributed). Or to generalize it further, given n unique boxes and you’re placing balls in them with an equal 1/n chance for the ball to go into any box, how many balls would you have to place on average before every box had at least one ball?
This is known as the coupon collector’s problem, with the coupons in this case being birthdays.
If you keep adding friends one by one until you have one with each birthday, it will take 1 friend until you have the friend with one birthday. After the first friend, each new friend has a probability of 364365 to have a birthday you haven’t seen yet, and the expected number of friends to go through until you see a new birthday is the reciprocal of that probability, that is 365364. After this, the expected number of friends until you see the third unique birthday is 365363, and so forth.
The key insight here is that once you have seen, say, 42 different birthdays, the expected number of more friends you have to inspect until you see a new birthday does not depend on how many friends it took you to reach 42, so we can calculate the expectation for each next-unseen-birthday wait separately and just sum them.
The total expected number of friends until you have seen all birthdays is
where H365 is the 365th harmonic number, which is approximately ln(365)+0.577 (the added term is the Euler-Mascheroni constant).
So you need about 365⋅(ln(365)+0.577)=2364 friends.
If we take leap days into account (but still assume all birthdays except February 29 are uniformly distributed and the probability of being born on Feb 29 is 0.25365.25), things begin to get more complex.
To compute the expected time until all other days than February 29 have been seen, the numerators in the first sum above all become 365.25 instead of 365, so this expectation is 365.253652364≈2366. That’s not much of a difference, so the main correction comes from the risk that we won’t have seen any February 29 by the time all of the other days have been hit.
We can estimate the risk of this happening crudely as (365365.25)2364≈0.1982 — but this is not at all exact because (among other reasons) the averaging that led to 2364 does not commute with the non-linear x↦(365/365.25)x mapping.
The exact risk is
namely, each time we wait for a new birthday the risk that the next new birthday will be non-leap.
(To evaluate the gammas we can either cheat and use Wolfram Alpha, or use the approximation Γ(a+ε)/Γ(a)≈(a+ε2−12)ε which gives plenty of precision here and allows us to compute the risk as Γ(14)44√36558 if only we can find Γ(14) in a reference work).
So the expected number of friends in the leap-day scenario should be