# Cereal Boxes Redux

In my last post, my students were wrestling with a question about cereal prizes.  Namely, if there is one of three (uniformly distributed) prizes in every box, what’s the probability that buying three boxes will result in my ending up with all three different prizes?  Not so great, turns out.  It’s only 2/9.  Of course this raises another natural question: How many stupid freaking boxes do I have to buy in order to get all three prizes?

There’s no answer, really.  No number of boxes will mathematically guarantee my success.  Just as I can theoretically flip a coin for as long as I’d like without ever getting tails, it’s within the realm of possibility that no number of purchases will garner me all three prizes.  But, just like the coin, students get the sense that it’s extremely unlikely that you’d buy lots and lots of boxes without getting at least one of each prize.  And they’re right.  So let’s tweak the question a little: How many boxes do I have to buy on average in order to get all three prizes?  That’s more doable, at least experimentally.

I have three sections of Advanced Algebra with 25 – 30 students apiece.  I gave them all dice to simulate purchases and turned my classroom—for about ten minutes at least—into a mathematical sweatshop churning out Monte Carlo shopping sprees.  The average numbers of purchases needed to acquire all prizes were 5.12, 5.00, and 5.42.  How good are those estimates?

Simulating cereal purchases with dice

Here’s my own simulation of 15,000 trials, generated in Python and plotted in R:

I ended up with a mean of 5.498 purchases, which is impressively close to the theoretical expected value of 5.5.  So our little experiment wasn’t too bad, especially since I’m positive there was a fair amount of miscounting, and precisely one die that’s still MIA from excessively enthusiastic randomization.

And now here’s where I’m stuck.  I can show my kids the simulation results.  They have faith—even though we haven’t formally talked about it yet—in the Law of Large Numbers, and this will thoroughly convince them the answer is about 5.5.  I can even tell them that the theoretical expected value is exactly 5.5.  I can even have them articulate that it will take them precisely one box to get the first new toy, and three boxes, on average, to get the last new toy (since the probability of getting it is 1/3, they feel in their bones that they should have to buy an average of 3 boxes to get it).  But I feel like we’re still nowhere near justifying that the expected number of boxes for the second toy is 3/2.

For starters, a fair number of kids are still struggling with the idea that the expected value of a random variable doesn’t have to be a value that the variable can actually attain.  I’m also not sure how to get at this next bit.  The absolute certainty of getting a new prize in the first box is self-evident.  The idea that, with a probability of success of 1/3, it ought “normally” to take 3 tries to succeed is intuitive.  But those just aren’t enough data points to lead to the general conjecture (and truth) that, if the probability of success for a Bernoulli trial is p, then the expected number of trials to succeed is 1/p.  And that’s exactly the fact we need to prove the theoretical solution.  Really, that’s what we need basically to solve the problem completely for any number of prizes.  After that, it’s straightforward:

The probability of getting the first new prize is n/n.  The probability of getting the second new prize is (n-1)/n … all the way down until we get the last new prize with probability 1/n.  The expected numbers of boxes we need to get all those prizes are just the reciprocals of the probabilities, so we can add them all together…

If X is the number of boxes needed to get all n prizes, then

$E(X) = \frac{n}{n} + \frac{n}{n-1} + \cdots + \frac{n}{1} = n(\frac{1}{n} + \frac{1}{n-1} + \cdots + \frac{1}{1}) = n \cdot H_n$

where Hn is the nth harmonic number.  Boom.

Oh, but yeah, I’m stuck.

## 8 thoughts on “Cereal Boxes Redux”

1. Isn’t it $\frac{2}{3}N+1$, where N is the total number of boxes produced?

• Er…for the “how many boxes do I have to buy to guarantee getting all prizes?” question.

• Yes, you’re right. The coin analogy fails with the cereal scenario because there really is a way to guarantee success here: you just buy one more than 2/3 of the total boxes of cereal. As far as the cereal model goes, however, the analogy works. In order to make the problem practically doable in the face of so many unknowns (total N, number of boxes purchased by others, which prizes they contained, etc.), we have to assume that the prizes are i.i.d, otherwise we’re no longer in a Bernoulli setting and things get prohibitively sneaky for a H.S. problem (or, I’ll wager, any kind of problem).

You are correct, though, I should have been more careful with my language. In this case, though, as in the coin case, the expected value of the geometric random variable converges so quickly that we needn’t spend too much time worrying about it. I did several Python simulations, millions of trials in all, and I think my max number of purchases was somewhere around 30 boxes. If this were a real cereal promotion, that would be such a drop in the bucket w/r/t N that the actual value of N is irrelevant. The probability of getting anywhere near 2/3 of that value without success would be orders of magnitude lower than reasonable.

2. More substantively, I think there is a very real and very unsolved problem at play here. Verifying a prediction in the realm of probability is very, very different from verifying it in normal computational mathematics.

The difference between “about 5.5” and “exactly 5.5” matters to theoretical statistics. Does it matter to understanding probability? I guess the thing I feel in my bones is that there is a tremendous amount of value in being able to design an accurate simulation (either with low-tech dice or high-tech software) to approximate probabilities.

Indeed, this is really the only way to check our theoretical probability computations. You think the probability of getting all three coins matching is 1/2 because it’s either match or no-match? Let’s flip a bunch of coins and see how that works out for you. You think there’s only one way to get all three prizes because order doesn’t matter? Let’s roll some dice and test that model out.

But you’re also interested in building the theoretical model in some specific ways. Good. But you’re gonna need to keep checking back in with the real world. So how can your students begin to feel in their gut that it’s 3/2 boxes until they get the second prize? By going back to the data. There are probably a bunch of “1 box” solutions, a bunch of “2 box” solutions and a smaller number of “3 box”, “4 box”, etc. solutions. I’m guessing they average out to about 3/2.

• I think you’ve articulated one of my key difficulties in teaching probability, even though it’s one of my favorite topics as a student: there are no good reveals. There are no ta-da moments where we can verify our calculations via direct measurement. I can’t pull a Dan Meyer and let the fan coast to a stop. The best I can hope for is a good simulation that gives us reasonable agreement with theory.

Because, in the end, the theory has to make valid predictions about real life. The first few people who had a real run at probability theory couldn’t agree on the odds for simple dice games, for God’s sake. Even Galileo wasn’t sure why predictions and observations wouldn’t quite line up.

It’s just tough for me to keep “checking back,” since it’s time consuming and, ultimately, a little unsatisfying. We end up with results that suggest a pattern consistent with our solution, but that’s a very different kind of verification from hitting the stopwatch and saying, “Yes! I knew it!”

• Indeed. This is an argument in favor of doing LOTS of simulations in the study of probability, right?

We can debate the power of that ceiling fan example, but point made well enough.

• Indeed. This is an argument in favor of doing lots (maybe not capitalized) of simulations in the study of probability. If you hope to convey the importance of modeling (which I do), then you have no choice but to explicitly demonstrate the value of theoretical models in predicting verifiable/falsifiable results. On the flip side of the coin (ha!), you have to demonstrate how conjectures based on observed patterns might be generalized into models in the first place so that we can go about the business of making predictions about future results.

Normally (i.e., in a deterministic setting), we can accomplish both of those goals via direct measurement. “Let’s see if we were right.” Or, “Let’s see if we can expand upon what we’ve noticed.” In probability, I don’t see any way around simulations. Otherwise there’s no way to verify anything, and there’s no data to conjecture about. Every textbook makes a big deal about theoretical vs. experimental probability, but then each is subsequently treated as if its value is simply as a foil for the other. I think simulations allow us to see them rather as interacting with, and informing the other in an intimate way. Which, I submit, is much closer to the truth.