# How likely is it not to be anyone’s best friend?

A teenage acquaintance of mine lamented:

Every one of my friends is better friends with somebody else.

Thanks to my knowledge of mathematics I could inform her that she’s not alone and $e^{-1}\approx 37\%$ of all people could be expected to be in the same situation, which I’m sure cheered her up immensely.

This number assumes that friendships are distributed randomly, such that each person in a population of $n$ chooses a best friend at random. Then the probability that any given person is not anyone’s best friend is $(1-\frac{1}{n-1})^{n-1}$, which tends to $e^{-1}$ for large $n$.

Afterwards I’m not sure this is actually the best way to analyze the claim. Perhaps instead we should imagine assigning a random “friendship strength” to each edge in the complete graph on $n$ vertices, in which case my friend’s lament would be “every vertex I’m connected to has an edge with higher weight than my edge to it”. This is not the same as “everyone choses a best friend at random”, because it guarantees that there’s at least one pair of people who’re mutually best friends, namely the two ends of the edge with the highest weight.

(Of course, some people are not friends at all; we can handle that by assigning low weights to their mutual edges. As long as everyone has at least one actual friend, this won’t change who are whose best friends).

(It doesn’t matter which distribution the friendship weights are chosen by, as long as it’s continuous — because all that matters is the relative order between the weights. Equivalently, one may simply choose a random total order on the $n(n-1)/2$ edges in the complete graph).

In this model, what is the probability that a given person is not anyone’s best friend?

By linearity of expectations, the probability of being mutually best friends with anyone is $\frac{n-1}{2n-3}\approx\frac 12$ (much better than in the earlier model), but that doesn’t take into account the possibility that some poor soul has me as their best friend whereas I myself has other better friends. Linearity of expectation doesn’t seem to help here — it tells me that the expected number of people whose best friend I am is $1$, but not the probability of this number being $0$.

(Edit: Several paragraphs of numerical results now moved to a significantly expanded answer)

The probability for large $n$ is $e^{-1}$ in the friendship-strength model too. I can’t even begin to explain why this is, but I have strong numerical evidence for it. More precisely, if $p(n)$ is the probability that someone in a population of $n$ isn’t anyone’s best friend, then it looks strongly like

as $n$ tends to infinity.

The factor of $2$ may hint at some asymptotic connection between the two models, but it has to be subtle, because the best-friend relation certainly doesn’t look the same in the two models — as noted in the question, in the friendship-strength model we expect half of all people to be mutually best friends with someone, whereas in the model where everyone chooses a best friend independently, the total expected number of mutual friendships is only $\frac12$.

The offset $7/3$ was found experimentally, but there’s good evidence that it is exact. If it means anything, it’s a mystery to me what.

How to compute the probability. Consider the complete graph on $n$ vertices, and assign random friendship weights to each edge. Imagine processing the edges in order from the strongest friendship towards the weakest. For each vertex/person, the first time we see an edge ending there will tell that person who their best friend is.

The graphs we build can become very complex, but for the purposes of counting we only need to distinguish three kinds of nodes:

• W – Waiting people who don’t yet know any of their friends. (That is, vertices that are not an endpoint of any edge processed yet).
• F – Friends, people who are friends with someone, but are not anyone’s best friend yet. Perhaps one of the Waiting people will turn out to have them as their best friend.
• B – Best friends, who know they are someone’s best friend.

At each step in the processing of the graph, it can be described as a triple $(w,f,b)$ stating the number of each kind of node. We have $w+f+b=n$, and the starting state is $(n,0,0)$ with everyone still waiting.

• If we see a WW edge, two waiting people become mutually best friends, and we move to state $(w-2,f,b+2)$. There are $w(w-1)/2$ such edges.
• If we see a WF edge, the F node is now someone’s best friend and becomes a B, and the W node becomes F. The net effect is to move us to state $(w-1,f,b+1)$. There are $wf$ such edges.
• If we see a WB edge, tne W node becomes F, but the B stays a B — we don’t care how many people’s best friends one is, as long there is someone. We move to $(w-1,f+1,b)$, and there are $wb$ edges of this kind.
• If we see a FF or FB or BB edge, it represents a friendship where both people already have better friends, so the state doesn’t change.

Thus, for each state, the next WW or WF or WB edge we see determine which state we move to, and since all edges are equally likely, the probabilities to move to the different successor states are $\frac{w-1}{2n-w-1}$ and$\frac{2f}{2n-w-1}$ and $\frac{2b}{2n-w-1}$, respectively.

Since $w$ decreases at every move between states, we can fill out a table of the probabilities that each state is ever visited simply by considering all possible states in order of decreasing $w$. When all edges have been seen we’re in some state $(0,f,n-f)$, and summing over all these we can find the expected $f$ for a random weight assignment.

By linearity of expectation, the probability that any given node is F at the end must then be $\langle f\rangle/n$.

Since there are $O(n^2)$ states with $w+f+b=n$ and a constant amount of work for each state, this algorithm runs in time $O(n^2)$.

Numerical results. Here are exact results for $n$ up to 18:

 n approx                 exact
--------------------------------------------------
1  100%                   1/1
2  0.00%                  0/1
3 33.33%                  1/3
4 33.33%                  1/3
5 34.29%                 12/35
6 34.81%                 47/135
7 35.16%                731/2079
8 35.40%               1772/5005
9 35.58%              20609/57915
10 35.72%            1119109/3132675
11 35.83%             511144/1426425
12 35.92%           75988111/211527855
13 36.00%         1478400533/4106936925
14 36.06%        63352450072/175685635125
15 36.11%      5929774129117/16419849744375
16 36.16%     18809879890171/52019187845625
17 36.20%    514568399840884/1421472473796375
18 36.24% 120770557736740451/333297887934886875


After this point, exact rational arithmetic with 64-bit denominators start overflowing. It does look like $p(n)$ tends towards $e^{-1}$. (As an aside, the sequence of numerators and denominators are both unknown to OEIS).

To get further, I switched to native machine floating point (Intel 80-bit) and got the $p(n)$ column in this table:

   n     p(n)    A      B      C      D      E      F      G      H
---------------------------------------------------------------------
10 .3572375 1.97+  4.65-  4.65-  4.65-  2.84+  3.74+  3.64+  4.82-
20 .3629434 2.31+  5.68-  5.68-  5.68-  3.47+  4.67+  4.28+  5.87-
40 .3654985 2.62+  6.65-  6.64-  6.64-  4.09+  5.59+  4.90+  6.84-
80 .3667097 2.93+  7.59-  7.57-  7.57-  4.70+  6.49+  5.51+  7.77-
100 .3669469 3.03+  7.89-  7.87-  7.86-  4.89+  6.79+  5.71+  8.07-
200 .3674164 3.33+  8.83-  8.79-  8.77-  5.50+  7.69+  6.32+  8.99-
400 .3676487 3.64+  9.79-  9.69-  9.65-  6.10+  8.60+  6.92+  9.90-
800 .3677642 3.94+ 10.81- 10.60- 10.52-  6.70+  9.50+  7.52+ 10.80-
1000 .3677873 4.04+ 11.17- 10.89- 10.80-  6.90+  9.79+  7.72+ 11.10-
2000 .3678334 4.34+ 13.18- 11.80- 11.63-  7.50+ 10.69+  8.32+ 12.00-
4000 .3678564 4.64+ 12.74+ 12.70- 12.41-  8.10+ 11.60+  8.92+ 12.90-
8000 .3678679 4.94+ 13.15+ 13.60- 13.14-  8.70+ 12.50+  9.52+ 13.81-
10000 .3678702 5.04+ 13.31+ 13.89- 13.36-  8.90+ 12.79+  9.72+ 14.10-
20000 .3678748 5.34+ 13.86+ 14.80- 14.03-  9.50+ 13.69+ 10.32+ 15.00-
40000 .3678771 5.64+ 14.44+ 15.70- 14.67- 10.10+ 14.60+ 10.92+ 15.91-


The 8 other columns show how well $p(n)$ matches various attempts to model it. In each column I show $-\log_{10}|p(n)-f_i(n)|$ for some test function $f_i$ (that is, how many digits of agreement there are between $p(n)$ and $f_i(n)$), and the sign of the difference between $p$ and $f_i$.

• $f_{\tt A}(n)=e^{-1}$

In the first column we compare to the constant $e^{-1}$. It is mainly there as evidence that $p(n)\to e^{-1}$. More precisely it looks like $p(n) = e^{-1} + O(n^{-1})$ — whenever $n$ gets 10 times larger, another digit of $e^{-1}$ is produced.

• $f_{\tt C}(n)=\Bigl(1-\frac{1}{2n-7/3}\Bigr)^{2n-7.3}$

I came across this function by comparing $p(n)$ to $(1-\frac{1}{n-1})^{n-1}$ (the probability in the chose-best-friends-independently model) and noticing that they almost matched beteen $n$ and $2n$. The offset $7/3$ was found by trial and error. With this value it looks like $f_{\tt C}(n)$ approximates $p(n)$ to about $O(n^{-3})$, since making $n$ 10 times larger gives three additional digits of agreement.

• $f_{\tt B}=\Bigl(1-\frac{1}{2n-2.332}\Bigr)^{2n-2.332}$ and $f_{\tt D}=\Bigl(1-\frac{1}{2n-2.334}\Bigr)^{2n-2.334}$

These columns provide evidence that $7/3$ in $f_{\tt C}$ is likely to be exact, since varying it just slightly in each direction gives clearly worse approximation to $p(n)$. These columns don’t quite achieve three more digits of precision for each decade of $n$.

• $f_{\tt E}(n)=e^{-1}\bigl(1-\frac14 n^{-1}\bigr)$ and $f_{\tt F}(n)=e^{-1}\bigl(1-\frac14 n^{-1} - \frac{11}{32} n^{-2}\bigr)$

Two and three terms of the asymptotic expansion of $f_{\tt C}$ in $n$. $f_{\tt F}$ also improves cubically, but with a much larger error than $f_{\tt C}$. This seems to indicate that the specific structure of $f_{\tt C}$ is important for the fit, rather than just the first terms in its expansion.

• $f_{\tt G}(n)=e^{-1}\bigl(1-\frac12 (2n-7/3)^{-1}\bigr)$ and $f_{\tt H}(n)=e^{-1}\bigl(1-\frac12 (2n-7/3)^{-1}-\frac{5}{24} (2n-7/3)^{-2}\bigr)$

Here’s a surprise! Expanding $f_{\tt C}$ in powers of $2n-7/3$ instead of powers of $n$ not only gives better approximations than $f_{\tt E}$ and $f_{\tt F}$, but also approximates $p(n)$ better than $f_{\tt C}$ itself does, by a factor of about $10^{0.2}\approx 1.6$. This seems to be even more mysterious than the fact that $f_{\tt C}$ matches.

At $n=40000$ the computation of $p(n)$ takes about a minute, and the comparisons begin to push the limit of computer floating point. The 15-16 digits of precision in some of the columns are barely even representable in double precision. Funnily enough, the calculation of $p(n)$ itself seems to be fairly robust compared to the approximations.