# Consider the Strawberry

In general, I hate doing this — because it feels like a self-promotional trick — but in order for this post to make any kind of sense, you have to go back and read the last one.  In particular, you have to read Max’s comment.  I will put on my teacher face and wait for a few minutes.

***

For two reasons, I’m going to unpack the strawberry analogy a bit more: (1) I am in love with it, and (2) it highlights an important pedagogical point about the relationship between squares and rectangles.  For serious.

As Max pointed out, even very small children have no problem recognizing the rather trivial fact that all strawberries are fruits even though not all fruits are strawberries.  On the flip side, anyone who has ever taught geometry knows, with something like absolute certainty, that much older and more mathematically savvy students have great difficulty recognizing that all squares are rectangles even though not all rectangles are squares.  The situations are structurally identical (in each case we have some set, X, which is a proper subset of another set, Y), but the second one is much more problematic.  Why might that be?

The seemingly obvious answer is that recognizing a strawberry is nearly automatic, and probably evolutionarily encoded, while recognizing a square requires abstract reasoning about the congruence of mathematical objects called “line segments.”  But I’m not at all convinced that’s the problem.  They are both ultimately pattern-recognition tasks.  Without language getting in the way, you (and small children) can probably recognize strawberries and squares with comparable facility.

Which brings us to the language.  Even though the strawberry:fruit::square:rectangle situations are structurally identical, there is an important (and subtle) linguistic distinction in the latter case.  Consider the following story.

You find your favorite small child/guinea pig and present a challenge.  In your left hand you hold a strawberry, and in your right an apple.  You say to this child, “Which hand has the fruit in it?”  The child blinks at you for several moments, trying to study your face for clues about the answer to what has just got to be a trick question, before finally, tentatively, reaching out to point at one of your hands, more or less at random.  You reward the child with a piece of delicious fruit.

Consider the same story, except now you hold in your left hand a picture of a square, and in your right a picture of a generic rectangle.  You say to this child, “Which hand has the rectangle in it?”  The child immediately points to your right hand.  You reward the child with, I guess, a delicious piece of rectangle.

Why are these stories so different?  I submit that it’s not a mathematical issue.  The real problem stems from the fact that, linguistically, there is no unprivileged fruit: every class of fruit gets its own name.  But “square” is privileged relative to “rectangle.”  When presented with a generic rectangle, we have no word for saying that it is “a rectangle that is not a square.”  In fact, I made up the phrase “generic rectangle” precisely to try and convey that information.

So it turns out I lied a little bit before (how fitting) when I said the fruit/rectangle situations were structurally identical.  It’s true that in each case we have a set (square, strawberry) that is a subset of a larger set (rectangle, fruit), but it turns out the larger sets have different linguistic partitions.

$Rectangles = (Rectangles \cap Squares) \cup (Rectangles \cap Squares^C)$

$Fruits = Apples \cup Apricots \cup \cdots \cup Strawberries \cup \cdots \cup Watermelons$

So when you ask the child which hand contains the rectangle, she chooses the generic rectangle immediately.  Why?  Because, had you meant the square, then you damn sure would’ve just said “square” in the first place, even though both hands hold perfectly correct answers to your challenge.  If our language were set up such that strawberries were the only specially named fruits (which seems like something Max would wholeheartedly support), the child in the first story would likewise choose your non-strawberry hand every time, without hesitation.

So what can we do with this?  It seems that strawberries have something to teach us about squares.  Actually, it seems that all the other fruits have something to teach us about rectangles.  It’s taken the entire history of humanity to organize fruits into useful equivalence classes, but luckily we find ourselves in a much, much simpler situation with rectangles; after all, there are only two classes we care about!  We already have a name for squares, so let’s call non-square rectangles “nares.”  Now our partition looks like this:

$Rectangles = Squares \cup Nares$

Which hand has the nare in it?  Easy.  Better yet, unambiguous.  Now, I’m not seriously lobbying for the introduction of nares into the mathematical lexicon (for one thing, nare is already a word for a weird thing), but it might be a fun way to introduce young children to the concept of a non-square rectangle.  After removing the greatest impediment to understanding the square/rectangle relationship (that “square” is the lone special case of this broader class of “rectangles,” which word is generally reserved for “rectangles-but-not-squares,” since, if someone means “square,” we already have a freaking word for it), that scaffolding can eventually be disassembled.

But the cognitive edifice the scaffolding initially supported will have cured a little by then.  In other words, why not make the distinction we actually care about explicit from the beginning, rather than end up in linguistic contortions to get around the fact that the distinction is solely implicit in standard usage?  Make up your own word, I don’t care.  Don’t want to be cute about it?  Fine.  Just abbreviate non-square rectangles as NSRs or something.  But make them easy to talk about — as easy as it is to talk about a tangerine or cumquat rather than a “fruit that might be a strawberry, but very often is not.”  Because, seriously, if that’s the way our fruit classification worked, there would be an awful lot of kids running around with the reasonable and tightly-held belief that strawberries are not fruit.

And that would be a shame.

# Inconvenient Truths

As happens with amazing frequency, Christopher Danielson said something interesting today on Twitter.

And, as also happens with impressive regularity, Max Ray chimed in with something that led to an interesting conversation — which, in the end, culminated in my assertion that not everything that is mathematically true is pedagogically useful.  I would go further and say that a truth’s usefulness is a function of the cognitive level at which it becomes both comprehensible and important — but not before.

By way of an example, Cal Armstrong took a shot at me (cf. the Storify link above) for my #TMC13 assertion that it is completely defensible to say that a triangle (plus its interior) is a cone.  Because he is Canadian, I think he will find the following sentiment particularly agreeable: we’re both right.  A triangle both is, and is not, a cone, depending on the context.  It might be helpful to think of it as Schrödinger’s Coneangle: an object that exists as the superposition of two states (cone and triangle), collapsing into a particular state only when we make a measurement.  In this case, the “measurement” is actually made by our audience.

When I am speaking to an audience of relative mathematical maturity, I can (ahem…correctly) say that cone-ness is a very broadly defined property: given any topological space, X, we can build a cone over X by forming the quotient space

$CX := (X \times [0, 1]) / \sim$

with the equivalence relation ~ defined as follows:

$(x, 1) \sim (y, 1) \quad \text{for all } x, y \in X$

If we take X to be the unit interval with the standard topology, we get a perfectly respectable Euclidean triangle (and its interior).  Intuitively, you can think of taking the Cartesian product of the interval with itself, which gives you a filled-in unit square, and then identifying one of the edges with a single point.  Boom, coneangle.  Which, like Sharknado, poses no logical problems.
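If the quotient still feels slippery, a few lines of code can make it concrete.  This is just my own toy sketch (the function names are invented), taking X to be the unit interval: every point of the top edge X × {1} lands in a single equivalence class, and one geometric realization squashes the unit square into exactly the triangle described above.

```python
# Toy illustration of the cone construction CX = (X x [0,1]) / ~,
# where the whole top slice X x {1} is identified to a single point.

def collapse(x, t):
    """Canonical representative of the equivalence class of (x, t).

    Points with t < 1 are alone in their class; the entire top edge
    shares one class, represented here by (0.0, 1.0)."""
    return (0.0, 1.0) if t == 1 else (x, t)

def realize(x, t):
    """One geometric realization: squash the unit square to a triangle.

    At t = 0 we keep the full base [0, 1]; as t rises the slice shrinks,
    and at t = 1 everything lands on the apex."""
    return (x * (1 - t), t)

# Every point on the top edge belongs to the same class...
top_edge = {collapse(x / 10, 1) for x in range(11)}
# ...and the realization sends the whole top edge to a single apex.
apex = {realize(x / 10, 1) for x in range(11)}
```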

Of course, it is a problem when you’re talking to a middle school geometry student.  In that situation, saying that a triangle is a cone is both supremely unhelpful and ultimately dishonest.  What we really mean is that, in the particular domain of 3-dimensional Euclidean geometry, when we have a circle (disk) in a plane and one non-coplanar point, we can make this thing called a cone by taking all the line segments between the point and the base.  But to that student, in that phase of mathematical life, the particular domain is the only domain, and so we rightly omit the details.  In an eighth-grade geometry class, there is absolutely no good reason to introduce anything else.

*Constructing a topological cone over the unit interval*

We do this all the time as math teachers.  “Here, kid, is something that you can wrap your head around.  It will serve you quite well for a while.  Eventually we’re going to admit that we weren’t telling you the whole story — maybe we were even lying a little bit — but we’ll refine the picture when you’re ready.  Promise.”

Which brings me back to Danielson’s tweet.  From a mathematical point of view, there are all kinds of problems with saying that a rectangle has “two long sides and two short sides” (so many that I won’t even attempt to name them).  But how bad is this lie?  Better yet, how bad is the spirit of this lie?  I think it depends on the audience.  I’m not sure it’s so very wrong to draw a sharp (albeit technically imaginary) distinction for young children between squares and rectangles that are not squares.  It doesn’t seem all that different to me, on a fundamental level, from saying that cones are 3-dimensional solids.  Or that you can’t take the square root of a negative number.  Or that the sum of the interior angles of a quadrilateral is 360 degrees.  None of those statements is strictly true, but the truths are so very inconvenient for learners grappling with the concepts that we actually care about at the time.  It’s not currently important that they grasp the complete picture.  And it’s probably not feasible for them to do so, anyway.

Teaching mathematics is an iterative process, a feedback loop.  New information is encountered, reconciled with existing knowledge, and ultimately assimilated into a more complete understanding.  Today you will “know” that squares and rectangles are different.  Later, when you’re ready to think about angle measure and congruence, you will learn that they are sometimes the same.  Today you will “know” that *a* times *b* can only be 0 if either *a* or *b* is zero.  And tomorrow you will learn about the ring of integers modulo 6.

I will tell you the truth, children.  But maybe not today.

# Luck(?) of the Draw

What is luck?  Is luck?  And, if you vote yea, is a belief in luck an obstacle to understanding probability?

This question came up on Twitter a couple of nights ago when Christopher Danielson and Michael Pershan were discussing Daniel Kahneman’s recent book, Thinking, Fast and Slow.  Specifically, they were talking about the fact that Kahneman doesn’t shy away from using the word luck when discussing probabilistic events.  This, of course, is the kind of thing that makes mathematically fastidious people cringe.  And Danielson and Pershan are nothing if not mathematically fastidious.  Spend like five minutes with their blogs.  So Danielson twittered this string of twitterings:

According to Danielson, luck is a “perceived bias in a random event.”  And, according to his interpretation of Kahneman, luck is composed of “happy outcomes that can be explained by probability.”  Let me see if I can define luck for myself, and then examine its consequences.

# What is luck?

I think, at its heart, luck is about whether we perceive the universe to be treating us fairly.  When someone is kind to us, we feel happy, but we can attribute our happiness to another’s kindness.  When someone is mean, we feel sad, but we can attribute our sadness to another’s meanness.  When we are made to feel either happy or sad by random events, however, there is no tangible other for us to thank or blame, and so we’ve developed this idea of being either lucky or unlucky as a substitute emotion.

But happy/sad and lucky/unlucky are relative feelings, and so there must be some sort of zero mark where we just feel…nothing.  Neutral.  With people, this might be tricky.  Certainly it’s subjective.  Really, my zero mark with people is based on what I expect of them.  If a stranger walks through a door in front of me without looking back, that’s roughly what I expect.  And, when that happens, I do almost no emoting whatsoever.  If, however, he holds the door for me, this stranger has exceeded my expectations, which makes me feel happy at this minor redemptive act.  If he sees me walking in behind him and slams the door in my face, he has fallen short of my expectations, which makes me sad and angry about him being an asshole.

And, in this regard, I think that feeling lucky is actually a much more rational response than being happy/sad at people, because with random events at least I can concretely define my expectation.  I have mathematical tools to tell me, with comforting accuracy, whether I should be disappointed with my lot in life; there is no need to rely on messy inductive inferences about human behavior.  So I feel lucky when I am exceeding mathematical expectations, unlucky when I’m falling short, and neutral when my experience roughly coincides with the expected value.  Furthermore, the degree of luck I feel is a function of how far I am above or below my expectation.  The more anomalous my current situation, the luckier/unluckier I perceive myself to be.

Let’s look at a couple examples of my own personal luck.

1. I have been struck by lightning zero times.  Since my expected number of lightning strikes is slightly more than zero, I’m doing better than I ought to be, on average.  I am lucky.  Then again, my expected number of strikes is very, very slightly more than zero, so I’m not doing better by a whole lot.  So yeah, I’m lucky in the lightning department, but I don’t get particularly excited about it because my experience and expectation are very closely aligned.
2. I have both my legs.  Since the expected number of legs in America is slightly less than two, I’m crushing it, appendage-wise.  Again, though, I’m extremely close to the expected value, so my luck is modest.  But, I am also a former Marine who spent seven months in Iraq during a period when Iraq was the explosion capital of the world.  My expected number of legs, conditioned on being a war veteran, is farther from two than that of the average U.S. citizen, so I am mathematically justified in feeling luckier at leg-having than most leg-having people in this country.

# Conclusion

Of course for some people the lottery is terrible.  People have gambling problems.  People spend way too much money on all kinds of things they probably shouldn’t.  But that doesn’t mean that everyone—or even most people—who play are suckers.  Eating the occasional King Size Snickers probably won’t get your foot chopped off; smoking the occasional cigarette probably won’t kill you (sorry, kids); and buying the occasional lottery ticket will likely have about zero net impact on your finances.  Besides, isn’t it worth it to dream, for even a day, of having indoor hot tubs?  They’re so bubbly.

# Building a Probability Cannon

For just a moment, let’s consider a staple of the second year algebra curriculum: the one-dimensional projectile motion problem.  (I used to do an awful lot of this sort of thing.)  It’s not a fantastic problem—it’s overdone, and often under-well—but it’s representative of many of our standard modeling problems in some important ways:

1. Every one of my students has participated in the activity we’re modeling.  They’ve thrown, dropped, and shot things.  They’ve jumped and fallen and dove from various heights.  In other words, they have a passing acquaintance with gravity.
2. Data points are relatively easy to come by.  All we need is a stopwatch and a projectile-worthy object.  If that’s impractical, then there are also some great and simple—and free—simulations out there (PhET, Angry Birds), and some great and simple—and free—data collection software as well (Tracker).
3. We only need a few data points to fix the parameters.  For a general quadratic model, we only need three data points to determine the particular solution.  Really we only need two, if we assume constant acceleration.
4. Experiments are easy to repeat.  Drop/throw/shoot the ball again.  Run the applet again.
5. The model conforms to a fairly nice and well-behaved family of functions.  Quadratics are continuous and differentiable and smooth, and they’re generally willing to submit to whatever mathematical poking we’re wont to visit upon them without getting gnarly.
6. Theoretical predictions are readily checked.  Want to know, for instance, when our projectile will hit the ground?  Find the sensible zero of the function (it’s pretty easy to sanity check its reasonableness—see #1 above).  Look at a table of values and step through the motion second-by-second (use a smaller delta t for an even better sense of what’s going on).  Click RUN on your simulation, and wait until it stops (self-explanatory).  And, if you’re completely dedicated, build yourself a cannon and put your money where your mouth is.

Of course I’ve chosen to introduce this discussion with the example of projectile motion, but there are plenty of other candidates: length/area/volume, exponential growth and decay, linear speed and distance.  Almost without exception (in the algebra classroom), we model phenomena that satisfy the six conditions listed above.

Almost.  Because then we run into probability, and probability isn’t so tame.  I’ll grant that #1 still holds (though I’m not entirely convinced it holds in the same sense), but the other five conditions go out the window.

# Data points are NOT easy to come by.

I can already hear you protesting.  “Flip a coin…that’s a data point!”  Well, yes.  Sort of.  But in the realm of probability, individual data points are ambiguous.  The ordered pair (3rd flip, heads) is very different from (3 seconds, 12 meters).  They’re both measurements, but the first one has much, much higher entropy.  Interpretation becomes problematic.  Here’s another example: My meteorologist’s incredibly sophisticated model (dart board?) made the following prediction yesterday: P(rain) = 0.6.  In other words, the event “rain” was more likely than the event “not rain.”  It did not rain yesterday.  How am I to understand this un-rain?  Was the model right?  If so, then I’m not terribly surprised it didn’t rain.  Was the model wrong?  If so, then I’m not terribly surprised it didn’t rain.  In what sense have I collected “data?”

And what if I’m interested in a compound event?  What if I want to know not just the result of a lone flip, but P(exactly 352 heads in 1000 flips)?  Now a single data point suddenly consists of 1000 trials.  So it turns out data points have the potential to be rather difficult to come by, which brings us to…

# We need an awful lot of data points.

I’m not talking about our 1000-flip trials here, which was just a result of my arbitrary choice of one particular problem.  I mean that, no matter what our trials consist of, we need to do a whole bunch of them in order to build a reliable model.  Two measurements in my projectile problem determine a unique curve and, in effect, answer any question I might want to ask.  Two measurements in a probabilistic setting tell me just about nothing.

Consider this historical problem born, like many probability problems, from gambling.  On each turn, a player rolls three dice and wins or loses money based on the sum (fill in your own details if you want; they’re not so important for our purposes here).  As savvy and degenerate gamblers, we’d like to know which sums are more or less likely.  We have some nascent theoretical ideas, but we’d like to test one in particular.  Is the probability of rolling a sum of 9 equal to the probability of rolling a sum of 10?  It seems it should be: after all, there are six ways to roll a 9 ({6,2,1},{5,3,1},{5,2,2},{4,4,1},{4,3,2},{3,3,3}), and six ways to roll a 10 ({6,3,1},{6,2,2},{5,4,1},{5,3,2},{4,4,2},{4,3,3})*.  Done, right?

It turns out this isn’t quite accurate.  For instance, the combination {6,2,1} treats all of the 3! = 6 permutations of those numbers as one event, which is bad mojo.  If you go through all 216 possibilities, you’ll find that there are actually 27 ways to roll a 10, and only 25 ways to roll a 9, so the probabilities are in fact unequal.  Okay, no biggie, our experiment will certainly show this bias, right?  Well, it will, but if we want to be 95% experimentally certain that 10 is more likely, then we’ll have to run through about 7,600 trials!  (For a derivation of this number—and a generally more expansive account—see Michael Lugo’s blog post.)  In other words, the Law of Large Numbers is certainly our friend in determining probabilities experimentally, but it requires, you know, large numbers.

*If you’ve ever taught probability, you know that this type of dice-sense is rampant.  Students consistently collapse distinct events based on superficial equivalence rather than true frequency.  Ask a room of high school students this question: “You flip a coin twice.  What’s the probability of getting exactly one head?”  A significant number will say 1/3.  After all, there are three possibilities: no heads, one head, two heads.  Relatively few will immediately notice, without guidance, that “one head” is twice as likely as the other two outcomes.
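The footnote’s coin question yields to the same brute force—list the ordered outcomes and count:

```python
from itertools import product

# All four ordered outcomes of two flips: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))
one_head = [o for o in outcomes if o.count("H") == 1]
# P(exactly one head) = 2/4 = 1/2, not 1/3: HT and TH are distinct events.
```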

# Experiments are NOT easy to repeat.

I’ve already covered some of the practical issues here in terms of needing a lot of data points.  But beyond all that, there are also philosophical difficulties.  Normally, in science, when we talk about repeating experiments, we tend to use the word “reproduce.”  Because that’s exactly what we expect/are hoping for, right?  I conduct an experiment.  I get a result.  I (or someone else) conduct the experiment again.  I (they) get roughly the same result.  Depending on how we define our probability experiment, that might not be the case.  I flip a coin 10 times and count 3 heads.  You flip a coin 10 times and count 6 heads.  Experimental results that differ by 100% are not generally awesome in science.  In probability, they are the norm.

As an interesting, though somewhat tangential observation, note that there is another strange philosophical issue at play here.  Not only can events be difficult to repeat, but sometimes they are fundamentally unrepeatable.  Go back to my meteorologist’s prediction for a moment.  How do I repeat the experiment of “live through yesterday and see whether it rains?”  And what does a 60% chance of rain even mean?  To a high school student (teacher) who deals almost exclusively in frequentist interpretations of probability, it means something like, “If we could experience yesterday one million times, about 600,000 of those experiences would include rain.”  Which sounds borderline crazy.  And the Bayesian degree-of-belief interpretation isn’t much more comforting: “I believe, with 60% intensity, that it will rain today.”  How can we justify that level of belief without being able to test its reliability by being repeatedly correct?  Discuss.

# Probability distributions can be unwieldy.

Discrete distributions are conceptually easy, but cumbersome.  Continuous distributions are beautiful for modeling, but practically impossible for prior-to-calculus students (not just pre-calculus ones).  Even with the ubiquitous normal distribution, there is an awful lot of hand-waving going on in my classroom.  Distributions can make polynomials look like first-grade stuff.

# Theoretical predictions aren’t so easily checked.

My theoretical calculations for the cereal box problem tell me that, on average, I expect to buy between 5 and 6 boxes to collect all the prizes.  But sometimes when I actually run through the experiment, it takes me northward of 20 boxes!  This is a teacher’s nightmare.  We’ve done everything right, and then suddenly our results are off by a factor of 4.  Have we confirmed our theory?  Have we busted it?  Neither?  Blurg.  So what are we to do?
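An expected value between 5 and 6 boxes is consistent with three equally likely prizes, since 3 · (1 + 1/2 + 1/3) = 5.5.  Here is a simulation sketch under that assumption (the setup details are mine, not necessarily the original problem’s):

```python
import random

def boxes_needed(n_prizes=3, rng=random):
    """Buy boxes until every prize has appeared at least once."""
    seen, count = set(), 0
    while len(seen) < n_prizes:
        seen.add(rng.randrange(n_prizes))  # each box hides one uniform prize
        count += 1
    return count

rng = random.Random(0)
runs = [boxes_needed(rng=rng) for _ in range(100_000)]

mean = sum(runs) / len(runs)   # hovers near the theoretical 5.5
worst = max(runs)              # individual runs occasionally blow past 20
```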

# We are to build a probability cannon!

With projectile motion problems, building a cannon is nice.  It’s cool.  We get to launch things, which is awesome.  With probability, I submit that it’s a necessity.  We need to generate data: it’s the raw material from which conjecture is built, and the touchstone by which theory is tested.  We need to (metaphorically) shoot some stuff and see where it lands.  We need…simulations!

If your model converges quickly, then hand out some dice/coins/spinners.  If it doesn’t, teach your students how to use their calculators for something besides screwing up order of operations.  Better yet, teach them how to tell a computer to do something instead of just watching/listening to it.  (Python is free.  If you own a Mac, you already have it.)  Impress them with your wizardry by programming, right in front of their eyes, and with only a few lines of code, dice/coins/spinners that can be rolled/flipped/spun millions of times with the push of a button.  Create your own freaking distributions with lovely, computer-generated histograms from your millions of trials.  Make theories.  Test theories.  Experience anomalous results.  See that they are anomalous.  Bend the LLN to your will.
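For the record, the wizardry really is only a few lines.  One possible incantation (a sketch, not a prescription):

```python
import random
from collections import Counter

# Roll a (simulated) die a million times and tally the results.
rng = random.Random(42)
rolls = Counter(rng.randint(1, 6) for _ in range(1_000_000))

# A quick-and-dirty text histogram: one '#' per 5,000 rolls.
for face in range(1, 7):
    print(f"{face}: {'#' * (rolls[face] // 5000)}  {rolls[face]}")
```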

# Exempli Gratia

NCTM was kind enough to tweet the following problem today, as I was in the middle of writing this post: Kim and Kyle each roll a standard die, and we want the probability that Kim’s number is greater than Kyle’s.

Okay, maybe the probability is just 1/2.  I mean, any argument I make for Kim must be symmetrically true for Kyle, right?  But wait, it says “greater than” and not “greater than or equal to,” so maybe that changes things.  Kim’s number will be different from Kyle’s most of the time, and it will be greater half of the times it’s different, so…slightly less than 1/2?  Or maybe I should break it down into mutually exclusive cases of {Kim rolls 1, Kim rolls 2, … , Kim rolls 6}.  You know what, let’s build a cannon.  Here it is, in Mathematica:
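An equivalent cannon, sketched in Python rather than Mathematica (my translation, not the original code), is only a couple of lines:

```python
import random

# A million trials of "Kim rolls, Kyle rolls, did Kim roll strictly higher?"
rng = random.Random(1)
trials = 1_000_000
wins = sum(rng.randint(1, 6) > rng.randint(1, 6) for _ in range(trials))
p_hat = wins / trials  # lands near 5/12, a little less than 1/2
```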

Okay, so it looks like my second conjecture is right; the probability is a little less than 1/2.  Blammo!  And it only took (after a few seconds of typing the code) 1.87 seconds to do a million trials.  Double blammo!  But how much less than 1/2?  Emboldened by my cannon results, I can turn back to the theory.  Now, if Kyle rolls a one, Kim will roll a not-one with probability 5/6.  Ditto two, three, four, five, and six.  So Kim’s number is different from Kyle’s 5/6 of the time.  And—back to my symmetry argument—there should be no reason for us to believe one or the other person will roll a bigger number, so Kim’s number is larger 1/2 of 5/6 of the time, which is 5/12 of the time.  Does that work?  Well, since 5/12 ≈ 0.4167, which is convincingly close to 0.416159, I should say that it does.  Triple blammo and checkmate!

But we don’t have to stop there.  What if I remove the condition that Kim’s number is strictly greater?  What’s the probability her number is greater than or equal to Kyle’s?  Now my original appeal to symmetry doesn’t require any qualification.  The probability ought simply be 1/2.  So…
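Re-aiming the cannon is a one-character edit (again a Python sketch of the same experiment):

```python
import random

# Same setup, but now Kim only needs to match or beat Kyle.
rng = random.Random(2)
trials = 1_000_000
wins = sum(rng.randint(1, 6) >= rng.randint(1, 6) for _ in range(trials))
p_hat = wins / trials  # comes out noticeably above 1/2
```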

What what?  Why is the probability greater than 1/2 now?  Oh, right.  Kim’s roll will be equal to Kyle’s 1/6 of the time, and we already know it’s strictly greater than Kyle’s 5/12 of the time.  Since those two outcomes are mutually exclusive, we can just add the probabilities, and 1/6 + 5/12 = 7/12, which is about (yup yup) 0.583.  Not too shabby.

What if we add another person into the mix?  We’ll let Kevin join in the fun, too.  What’s the probability that Kim’s number will be greater than both Kyle’s and Kevin’s?
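One more shot from the cannon (Python sketch, same assumptions as before):

```python
import random

rng = random.Random(3)
trials = 1_000_000

def kim_beats_both(rng):
    """One trial: does Kim roll strictly higher than both Kyle and Kevin?"""
    kim, kyle, kevin = (rng.randint(1, 6) for _ in range(3))
    return kim > kyle and kim > kevin

wins = sum(kim_beats_both(rng) for _ in range(trials))
p_hat = wins / trials  # hovers just above 1/4
```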

It looks like the probability of Kim’s number being greater than both of her friends’ might just be about 1/4.  Why?  I leave it as an exercise to the reader.

That tweet-sized problem easily becomes an entire lesson with the help of a relatively simple probability cannon.  If that’s not an argument for introducing them into your classroom, I don’t know what is.