Probability of 2/4 vs 3/6

Recently I was asked the following in an interview:

If you are a pretty good basketball player, and were betting on whether you could make 2 out of 4 or 3 out of 6 baskets, which would you take?

I said either one, since the ratio is the same. Any insights?

Answer

Depends on how good you are

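[Figure: probability of winning each bet (2 out of 4, 3 out of 6) as a function of p, the probability that you make a single shot]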

The explanation is intuitive:

  • If you are not very good (the probability p that you make a single shot is below 0.6), your overall chance of winning is not very high, but it is better to bet on 2 out of 4: you may succeed just by chance, and your clumsiness has less opportunity to show in 4 attempts than in 6.

  • If you are really good (p > 0.6), it is better to bet on 3 out of 6: if you miss just by chance, you have more opportunity to make up for it in 6 attempts.

The curves meet exactly at p = 0.6.
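The crossover point can be checked numerically. This is a minimal sketch, assuming a helper `p_win(n, p)` (a name introduced here for illustration, not part of the original code) that returns the probability of making at least n/2 of n shots. Base R's `pbinom` and `uniroot` do the work:

```r
# Hypothetical helper: P(at least n/2 successes in n shots), n even.
# pbinom(..., lower.tail = FALSE) gives P(X > q), so q = n/2 - 1.
p_win <- function(n, p) pbinom(n / 2 - 1, n, p, lower.tail = FALSE)

# Difference between the two bets; positive when 2 of 4 is the better bet.
bet_diff <- function(p) p_win(4, p) - p_win(6, p)

# Search for the sign change away from the trivial roots at p = 0 and p = 1.
crossover <- uniroot(bet_diff, interval = c(0.1, 0.9), tol = 1e-10)$root
crossover
```

Algebraically, setting the two win probabilities equal and simplifying leaves 6p^2 = 10p^3, so the crossover is exactly p = 3/5 rather than merely close to it.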

In general, the more attempts, the more your real skill shows

This is best illustrated on the extreme case:

[Figure: the same curves with the extreme case, 500 out of 1000, added]

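With more attempts, it becomes an almost binary case: you either succeed or not, depending on your skill. With high N, the fraction of shots you make will be close to your single-shot probability p.

Note that as N grows, the distribution of the fraction of shots made gets narrower around p, and the binomial distribution is increasingly well approximated by a normal distribution (for p = 0.5 as for any fixed p strictly between 0 and 1).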
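To see the curve turn into a step as N grows, one can tabulate the win probability for a few skill levels around 0.5. This is only a sketch; `p_win`, `skill`, `attempts` and `step_table` are names chosen here for illustration and do not appear in the original code:

```r
# Hypothetical helper: P(at least n/2 successes in n shots), n even.
p_win <- function(n, p) pbinom(n / 2 - 1, n, p, lower.tail = FALSE)

skill    <- c(0.45, 0.50, 0.55)   # single-shot probabilities to compare
attempts <- c(4, 6, 100, 1000)    # bet sizes (all even)

# Rows: number of attempts; columns: skill level.
step_table <- sapply(skill, function(p) p_win(attempts, p))
dimnames(step_table) <- list(paste0("n=", attempts), paste0("p=", skill))
round(step_table, 4)
```

For small n the three columns stay fairly close together. For n = 1000 the p = 0.45 column is close to 0 and the p = 0.55 column is close to 1, which is the "almost binary" behaviour described above.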

Everything here revolves around the binomial distribution,

which tells you that the probability that you will score exactly k shots out of n is

P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}

The probability that you will score at least k = n/2 shots (and win the bet) is then

P(X \ge k) = \sum^{n}_{i=k} \binom{n}{i} p^i (1-p)^{n-i}
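The sum can be typed in almost literally. The sketch below uses a hypothetical function name, `at_least`, and compares it with the upper tail that base R's `pbinom` already provides:

```r
# P(X >= k) written exactly as the sum in the formula above.
# at_least() is a hypothetical name used only for this sketch.
at_least <- function(k, n, p) {
    i <- k:n
    sum(choose(n, i) * p^i * (1 - p)^(n - i))
}

# Same quantity from the built-in upper tail: P(X > k - 1) = P(X >= k).
at_least_builtin <- function(k, n, p) pbinom(k - 1, n, p, lower.tail = FALSE)

at_least(2, 4, 0.5)          # bet "2 out of 4" for a 50% shooter
at_least_builtin(2, 4, 0.5)  # should agree with the line above
```

Note that `at_least` expects a single value of p, because `sum` collapses everything into one number. The `pbinom` version is vectorized over p, which matters when the function is passed to `curve` or `plot`.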

Why don’t the curves meet at p = 0.5?

Look at the following plots:

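[Figure: binomial distributions for 2/4 (left) and 3/6 (right) at p = 0.5, with the outcomes that win the bet shown as gold bars]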

These plots are for p = 0.5, where the binomial distribution is symmetric. Intuitively, you expect 2 of 4 or 3 of 6 to cover half of the distribution. But in the left plot especially, it is clear that the middle column (2 successful shots) extends well beyond the midpoint of the distribution (dashed line); the red arrow marks the excess. In the right plot (3/6), this excess is much smaller.

If you sum the gold bars, you will get:

P(make at least   2 out of    4) = 0.6875
P(make at least   3 out of    6) = 0.65625
P(make at least 500 out of 1000) = 0.5126125

From these figures, as well as from the plots, it is apparent that with high N the proportion of the distribution “beyond the half” converges to zero, and the total probability converges to 0.5.
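The part “beyond the half” can be written down explicitly for p = 0.5 and even n. The short derivation below is an added sketch; the large-n approximation in the last line is the standard Stirling-type estimate of the central binomial term, not something read off the plots:

```latex
% Symmetry at p = 1/2: P(X > n/2) = P(X < n/2), and
% P(X < n/2) + P(X = n/2) + P(X > n/2) = 1, hence
P(X \ge n/2) = \frac{1}{2} + \frac{1}{2} P(X = n/2)
             = \frac{1}{2} + \frac{1}{2} \binom{n}{n/2} 2^{-n}

% Central binomial term for large n:
\binom{n}{n/2} 2^{-n} \approx \sqrt{\frac{2}{\pi n}} \to 0 \quad (n \to \infty)
```

For n = 4 this gives 1/2 + 3/16 = 0.6875 and for n = 6 it gives 1/2 + 5/32 = 0.65625, matching the figures above. For n = 1000 the extra term is about 0.0126.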

So, for the curves to meet for low Ns, p must be higher to compensate for this:

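[Figure: binomial distributions for 2/4 (left) and 3/6 (right) at p = 0.6, with the outcomes that win the bet shown as gold bars]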

P(make at least   2 out of    4) = 0.8208
P(make at least   3 out of    6) = 0.8208

Full code in R:

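# P(at least 3 successes out of 6), written out term by term
f6 <- function(p) {
    dbinom(3, 6, p) +
    dbinom(4, 6, p) +
    dbinom(5, 6, p) +
    dbinom(6, 6, p)
}

# P(at least 2 successes out of 4)
f4 <- function(p) {
    dbinom(2, 4, p) +
    dbinom(3, 4, p) +
    dbinom(4, 4, p)
}

# General case: P(at least `from` successes out of `max`).
# pbinom with lower.tail = FALSE returns P(X > q), so q = from - 1;
# it is vectorized over p, which curve() and plot() rely on.
fN <- function(p, from, max) {
    pbinom(from - 1, max, p, lower.tail = FALSE)
}
f1000 <- function (p) fN(p, 500, 1000)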


plot(f6, xlim = c(0,1), col = "red", lwd = 2, ylab = "", main = "Probability that you will make ...", xlab = "p (probability you make a single shot)")
curve(f4, col = "green", add = TRUE, lwd = 2)
curve(f1000, add = TRUE, lwd = 2, col = "blue")
legend("topleft", c("2 out of 4", "3 out of 6", "500 out of 1000"), lwd = 2, col = c("green", "red", "blue"), bty = "n")

# Bar chart of the binomial distribution for n shots with single-shot
# probability p; outcomes that win the bet (at least n/2 makes) are gold.
plotHist <- function (n, p) {
    plot(x=c(-0.5,n+0.5),y=c(0,0.41),type="n", xaxt="n", xlab = "successful shots", ylab = "probability",
        main = paste0(n/2, "/", n, ", p = ", p))
    axis(1, at=0:n, labels=0:n)
    x <- 0:n
    y <- dbinom(0:n, n, p)
    w <- 0.9  # bar width
    rect(x-0.5*w, 0, x+0.5*w, y, col = "lightgrey")
    uind <- (n/2+1):(n+1)  # indices of outcomes n/2, ..., n (x starts at 0)
    rect(x[uind]-0.5*w, 0, x[uind]+0.5*w, y[uind], col = "gold")
}

par(mfrow = c(1, 2))
plotHist(4, 0.5)
abline(v = 2, lty = 2)
arrows(2-0.5*0.9, 0.17, 2, 0.17, col = "red", code = 3, length = 0.1, lwd = 2)
plotHist(6, 0.5)

f4(0.5)
f6(0.5)
f1000(0.5)

par(mfrow = c(1, 2))
plotHist(4, 0.6)
plotHist(6, 0.6)

f4(0.6)
f6(0.6)

Attribution
Source : Link , Question Author : zephyr , Answer Author : Tomas
