# Choosing office hours to maximise number of students who can attend at least one time slot

I have the following (real world!) problem which is most easily described using an example.

I ask my six students when the best time to hold office hours would be. I give them four options, and say that I’ll hold two hours in total. The poll’s results are as follows (1 means yes, 0 means no):

$$\begin{array}{l|c|c|c|c} \text{Name} & 9 \text{ am} & 10 \text{ am} & 11 \text{ am} & 12 \text{ pm} \\ \hline \text{Alice} & 1 & 1 & 0 & 0 \\ \text{Bob} & 1 & 1 & 0 & 1 \\ \text{Charlotte} & 1 & 1 & 0 & 1 \\ \text{Daniel} & 0 & 1 & 1 & 1 \\ \text{Eve} & 0 & 0 & 1 & 1 \\ \text{Frank} & 0 & 0 & 1 & 0 \\ \hline \text{Total} & 3 & 4 & 3 & 4 \end{array}$$

Naively, one might pick columns 2 and 4, which have the greatest totals. However, the solution which allows the most distinct people to attend is to pick columns 2 and 3.

In practice, however, I have ~30 possible timeslots and over 100 students, and want to pick, say, five different timeslots for office hours. How do I pick the timeslots which maximise the number of distinct students who can attend?

In general, this problem is the maximum coverage problem, which is NP-hard, so you’re not going to be able to find the optimal solution by any method substantially faster than brute force. (As I mentioned, for a problem of your size, brute force is still feasible.)

However, a modification of your strategy (to choose the timeslots with the highest totals) performs reasonably well.

It doesn’t actually make sense to choose all the timeslots with the highest totals immediately: if they all have the same students in them, then this isn’t any better than just choosing one of the timeslots. Instead, we can:

1. Choose one timeslot with the highest total number of students.
2. Remove all students that can make it to that timeslot from consideration. Recalculate the totals for the remaining students.
3. Repeat steps 1-2 until you have chosen $k=5$ timeslots.

According to the wikipedia article, this algorithm achieves an approximation ratio of $1 – \frac1e$; that is, if you can reach $N$ students with the optimal choice of timeslots, this algorithm will reach at least $(1 – \frac1e)N$ students.

There are other possible approximate solutions out there; for example, there is the big step heuristic, which considers all choices of $p$ timeslots in each step, as opposed to all individual timeslots. (When $p=1$, it is the algorithm above; when $p=k$, it is just brute force.) No fast algorithm has guaranteed performance better than $(1 – \frac1e)N$, but they may perform better in some cases.