You have a six-sided die. You keep a cumulative total of your dice rolls. (E.g. if you roll a 3, then a 5, then a 2, your cumulative total is 10.) If your cumulative total is ever equal to a perfect square, then you lose, and you go home with nothing. Otherwise, you can choose to go home with a payout of your cumulative total, or to roll the die again.
My question is about the optimal strategy for this game. In particular, I am looking for an answer to this question: if my cumulative total is $n$, do I choose to roll or not to roll in order to maximize my expected payout? Is there some integer $N$ after which the answer to this question is always to roll?
I think that there is such an integer, and I conjecture that this integer is 4. My reasoning is that the square numbers become sufficiently sparse for the expected value to always be increased by rolling the die again.
As an example, suppose your cumulative total is 35. Rolling a 1 and hitting 36 means we go home with nothing, so the expected value of rolling once is: $$\frac{0 + 37 + 38 + 39 + 40 + 41}{6} = 32.5.$$
But the next square after 36 is 49. So in the event that we don’t roll a 36, we get to keep rolling the die at no risk as long as the cumulative total is at most 42. For the sake of simplification, let’s say that if we roll and don’t hit 36, then we will roll exactly once more. That second die-roll has an expected value of 3.5. This means the expected value of rolling on 35 is: $$\frac{5}{6}\left(39 + 3.5\right) \approx 35.42.$$
And since $35.42 > 35$, the profit-maximizing choice is to roll again. And this strategy can be applied for every total. I don’t see when this would cease to be the reasonable move, though I haven’t attempted to verify it computationally. I intuitively think about this in terms of diverging sequences.
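To make the arithmetic in the example concrete, here is a quick sanity check of those two expected values (a sketch; the two-roll simplification is the one described above):

```python
# Sanity check of the 35 example: one roll, then (if we survive) one more.
from fractions import Fraction

current = 35
outcomes = [current + d for d in range(1, 7)]   # 36..41; 36 busts
survive = [t for t in outcomes if t != 36]      # 37..41

# Expected value of a single roll from 35 (busting pays 0):
ev_one_roll = Fraction(sum(survive), 6)

# Two-roll simplification: survive with probability 5/6, then take one
# more roll worth +3.5 on average.
ev_two_rolls = Fraction(5, 6) * (Fraction(sum(survive), 5) + Fraction(7, 2))

print(float(ev_one_roll))    # 32.5
print(float(ev_two_rolls))   # ~35.4167, which exceeds 35
```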
I recently had this question in a job interview, and thought it was quite interesting. (And counter-intuitive, since this profit-maximizing strategy invariably results in going home with nothing.)
How to use your observation in general:
Just checking things for 35 isn’t indicative of the general case; for large values the situation is different. Take your argument and run it with a general square $n^2$.
Suppose you’re at $n^2-1$. You can leave with $n^2-1$, or you can roll. On a roll, you lose everything with probability $\frac16$. With probability $\frac56$ you’ll get at least $(n+1)^2-6 = n^2+2n-5$ by rolling until it isn’t safe to any more. So, for a simple lower bound, we want to know whether $\frac56(n^2+2n-5)$ is greater than $n^2-1$. For large $n$, this is not the case, and we can actually get an upper bound by similar reasoning:
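A quick numeric check of this comparison (a sketch): the inequality $\frac56(n^2+2n-5) > n^2-1$ holds only for a few small $n$.

```python
# For which n does the crude lower bound 5/6*(n^2+2n-5) beat leaving
# with n^2-1? Exact arithmetic avoids any floating-point edge cases.
from fractions import Fraction

better = [n for n in range(2, 50)
          if Fraction(5, 6) * (n * n + 2 * n - 5) > n * n - 1]
print(better)  # only small n qualify; for large n the comparison favors staying
```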
An upper bound:
[A tidied up version of the original follows. Keeping the original upper bound would have required a few extra lines of logic, so I’ve just upped it slightly to keep it brief.]
Suppose we’re between $n^2-5$ and $n^2-1$. If we roll, the best things could go for us would be to lose only $\frac16$ of the time and, when we don’t lose, to get into the range $(n+1)^2-6$ to $(n+1)^2-1$ (the highest we could get without taking another risk). Just comparing current valuations, you’re trading at least $n^2-5$ for at most $\frac56\left((n+1)^2-1\right) = \frac56(n^2+2n)$ by the time you get to the next decision. The difference is $-\frac16 n^2+\frac53 n+5$. For $n\ge 13$ this is negative. So if you’re at $13^2-k = 169-k$ for $1\le k\le 6$, don’t roll. (Higher $n$ making this difference even more negative gives this conclusion. See next paragraph for details.)
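The sign claim is easy to verify numerically (a sketch):

```python
# Trade-off from the upper-bound argument: 5/6*(n^2+2n) - (n^2-5),
# i.e. -(1/6)n^2 + (5/3)n + 5. It should first turn negative at n = 13.
from fractions import Fraction

def tradeoff(n):
    return Fraction(5, 6) * (n * n + 2 * n) - (n * n - 5)

first_negative = next(n for n in range(3, 100) if tradeoff(n) < 0)
print(first_negative)  # 13
```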
Extra details for the logic giving this upper bound: Let $W_n$ be the current expected winnings of your strategy the first time you are within $6$ of $n^2$, also including times you busted or stopped on your own at a lower value. The above shows that, if your strategy ever includes rolling when within $6$ of $n^2$ for $n\ge 13$, then $W_{n+1}$ is less than $W_n$. Therefore there’s no worry about anything increasing without bound, and you should indeed never go above $168$.
An easy lower bound (small numbers):
Low values: For $n=3$ we have $\frac{5+6+7+8}{6}>3$ (both $4$ and $9$ are reachable squares here), so roll at $n=3$. Except for the case $n=3$, at most one square is within reach, so if we roll at $n$ we’ll certainly get at least $\frac56(n+3)$ (the worst case being a square at $n+6$). So if $\frac56(n+3)-n>0$, roll again; this holds for $n<15$. Checking $n=15$ directly gives $\frac{17+18+19+20+21}{6}\approx 15.8 > 15$, and since we shouldn’t stop when we can’t bust ($n=17, 18$), this tells us to roll again for all $n\le 18$.
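A quick check of exactly where the crude bound says to roll (a sketch):

```python
# The bound 5/6*(n+3) > n says "roll"; find exactly where it holds.
# Algebraically, 5(n+3) > 6n simplifies to n < 15.
from fractions import Fraction

holds = [n for n in range(1, 30) if Fraction(5, 6) * (n + 3) > n]
print(holds)
```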
An algorithm and reported results for the general case:
Obtaining a general solution: Start with expected values of $n$ and a decision of "stay" assigned to each $n$ with $163\le n\le 168$. Then work backwards to obtain the expected value and decision at each smaller $n$. From this, you'll see where to hit/stay and what the set of values reachable under the optimal strategy is.
A quick script I wrote outputs the following: Stay at 30, 31, 43, 44, 45, 58, 59, 60, 61, 62, and 75+. You'll never exceed 80. The overall EV of the game is roughly 7.2. (Standard disclaimer that you should program it yourself and check.)
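In the spirit of that disclaimer, here is a sketch of such a backward-induction script (it assumes, per the upper bound above, that the expected value is simply $n$ for $163\le n\le 168$):

```python
# Backward induction for the dice game: at each total n, compare the payout
# from staying (n) with the expected value of rolling (a square pays 0).
import math

LIMIT = 168  # the upper-bound argument shows optimal play never passes 168

def is_square(n: int) -> bool:
    r = math.isqrt(n)
    return r * r == n

ev = {}       # ev[n] = expected payout from total n under optimal play
stay = set()  # totals at which staying beats rolling

# Boundary condition from the upper bound: stay on 163..168.
for n in range(163, LIMIT + 1):
    ev[n] = float(n)
    stay.add(n)

for n in range(162, -1, -1):
    if n > 0 and is_square(n):
        ev[n] = 0.0   # landing exactly on a square loses everything
        continue
    roll = sum(ev[n + d] for d in range(1, 7)) / 6
    if n >= roll:
        ev[n] = float(n)
        stay.add(n)
    else:
        ev[n] = roll

print("EV of the game:", round(ev[0], 2))
print("stay at:", sorted(s for s in stay if s <= 80))
```

Starting the loop at the boundary and walking down means every `ev[n + d]` is already known when it is needed; the reported stay points and overall EV can then be read off directly.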