Consider the Strawberry

In general, I hate doing this — because it feels like a self-promotional trick — but in order for this post to make any kind of sense, you have to go back and read the last one.  In particular, you have to read Max’s comment.  I will put on my teacher face and wait for a few minutes.

***

For two reasons, I’m going to unpack the strawberry analogy a bit more: (1) I am in love with it, and (2) it highlights an important pedagogical point about the relationship between squares and rectangles.  For serious.

As Max pointed out, even very small children have no problem recognizing the rather trivial fact that all strawberries are fruits even though not all fruits are strawberries.  On the flip side, anyone who has ever taught geometry knows, with something like absolute certainty, that much older and more mathematically savvy students have great difficulty recognizing that all squares are rectangles even though not all rectangles are squares.  The situations are structurally identical (in each case we have some set, X, which is a proper subset of another set, Y), but the second one is much more problematic.  Why might that be?

The seemingly obvious answer is that recognizing a strawberry is nearly automatic, and probably evolutionarily encoded, while recognizing a square requires abstract reasoning about the congruence of mathematical objects called “line segments.”  But I’m not at all convinced that’s the problem.  They are both ultimately pattern-recognition tasks.  Without language getting in the way, you (and small children) can probably recognize strawberries and squares with comparable facility.

Which brings us to the language.  Even though the strawberry:fruit::square:rectangle situations are structurally identical, there is an important (and subtle) linguistic distinction in the latter case.  Consider the following story.

You find your favorite small child/guinea pig and present a challenge.  In your left hand you hold a strawberry, and in your right an apple.  You say to this child, “Which hand has the fruit in it?”  The child blinks at you for several moments, trying to study your face for clues about the answer to what has just got to be a trick question, before finally, tentatively, reaching out to point at one of your hands, more or less at random.  You reward the child with a piece of delicious fruit.

Consider the same story, except now you hold in your left hand a picture of a square, and in your right a picture of a generic rectangle.  You say to this child, “Which hand has the rectangle in it?”  The child immediately points to your right hand.  You reward the child with, I guess, a delicious piece of rectangle.

Why are these stories so different?  I submit that it’s not a mathematical issue.  The real problem stems from the fact that, linguistically, there is no unprivileged fruit: every class of fruit gets its own name.  But “square” is privileged relative to “rectangle.”  When presented with a generic rectangle, we have no word for saying that it is “a rectangle that is not a square.”  In fact, I made up the phrase “generic rectangle” precisely to try and convey that information.

So it turns out I lied a little bit before (how fitting) when I said the fruit/rectangle situations were structurally identical.  It’s true that in each case we have a set (square, strawberry) that is a subset of a larger set (rectangle, fruit), but it turns out the larger sets have different linguistic partitions.

Rectangles = \{Rectangles \cap Squares\} \cup \{Rectangles \cap Squares^C\}

Fruits = Apples \cup Apricots \cdots \cup Strawberries \cup \cdots \cup Watermelons

So when you ask the child which hand contains the rectangle, she chooses the generic rectangle immediately.  Why?  Because, had you meant the square, then you damn sure would’ve just said “square” in the first place, even though both hands hold perfectly correct answers to your challenge.  If our language were set up such that strawberries were the only specially named fruits (which seems like something Max would wholeheartedly support), the child in the first story would likewise choose your non-strawberry hand every time, without hesitation.

So what can we do with this?  It seems that strawberries have something to teach us about squares.  Actually, it seems that all the other fruits have something to teach us about rectangles.  It’s taken the entire history of humanity to organize fruits into useful equivalence classes, but luckily we find ourselves in a much, much simpler situation with rectangles; after all, there are only two classes we care about!  We already have a name for squares, so let’s call non-square rectangles “nares.”  Now our partition looks like this:

Rectangles = Squares \cup Nares

Which hand has the nare in it?  Easy.  Better yet, unambiguous.  Now, I’m not seriously lobbying for the introduction of nares into the mathematical lexicon (for one thing, nare is already a word for a weird thing), but it might be a fun way to introduce young children to the concept of a non-square rectangle.  After removing the greatest impediment to understanding the square/rectangle relationship (that “square” is the lone special case of this broader class of “rectangles,” which word is generally reserved for “rectangles-but-not-squares,” since, if someone means “square,” we already have a freaking word for it), that scaffolding can eventually be disassembled.

But the cognitive edifice the scaffolding initially supported will have cured a little by then.  In other words, why not make the distinction we actually care about explicit from the beginning, rather than end up in linguistic contortions to get around the fact that the distinction is solely implicit in standard usage?  Make up your own word, I don’t care.  Don’t want to be cute about it?  Fine.  Just abbreviate non-square rectangles as NSRs or something.  But make them easy to talk about — as easy as it is to talk about a tangerine or cumquat rather than a “fruit that might be a strawberry, but very often is not.”  Because, seriously, if that’s the way our fruit classification worked, there would be an awful lot of kids running around with the reasonable and tightly-held belief that strawberries are not fruit.

And that would be a shame.

Inconvenient Truths

As happens with amazing frequency, Christopher Danielson said something interesting today on Twitter.

And, as also happens with impressive regularity, Max Ray chimed in with something that led to an interesting conversation — which, in the end, culminated in my assertion that not everything that is mathematically true is pedagogically useful.  I would go further and say that a truth’s usefulness is a function of the cognitive level at which it becomes both comprehensible and important — but not before.

By way of an example, Cal Armstrong took a shot at me (c.f. the Storify link above) for my #TMC13 assertion that it is completely defensible to say that a triangle (plus its interior) is a cone.  Because he is Canadian, I think he will find the following sentiment particularly agreeable: we’re both right.  A triangle both is, and is not, a cone, depending on the context.  It might be helpful to think of it as Schrödinger’s Coneangle: an object that exists as the superposition of two states (cone and triangle),  collapsing into a particular state only when we make a measurement.  In this case, the “measurement” is actually made by our audience.

When I am speaking to an audience of relative mathematical maturity, I can (ahem…correctly) say that cone-ness is a very broadly defined property: given any topological space, X, we can build a cone over X by forming the quotient space

CX := X \times [0, 1] / \sim

with the equivalence relation ~ defined as follows:

(x,1) \sim (y,1) := x-y \in X \times \{0\}

If we take X to be the unit interval with the standard topology, we get a perfectly respectable Euclidean triangle (and its interior).  Intuitively, you can think of taking the Cartesian product of the interval with itself, which gives you a filled-in unit square, and then identifying one of the edges with a single point.  Boom, coneangle.  Which, like Sharknado, poses no logical problems.

Of course, it is a problem when you’re talking to a middle school geometry student.  In that situation, saying that a triangle is a cone is both supremely unhelpful and ultimately dishonest.  What we really mean is that, in the particular domain of 3-dimensional Euclidean geometry, when we have a circle (disk) in a plane and one non-coplanar point, we can make this thing called a cone by taking all the line segments between the point and the base.  But to that student, in that phase of mathematical life, the particular domain is the only domain, and so we rightly omit the details.  In an eighth-grade geometry class, there is absolutely no good reason to introduce anything else.

 photo cone_zpsf6e0fb1f.gif

Constructing a topological cone over the unit interval

We do this all the time as math teachers.  “Here, kid, is something that you can wrap your head around.  It will serve you quite well for a while.  Eventually we’re going to admit that we weren’t telling you the whole story — maybe we were even lying a little bit — but we’ll refine the picture when you’re ready.  Promise.”

Which brings me back to Danielson’s tweet.  From a mathematical point of view, there are all kinds of problems with saying that a rectangle has “two long sides and two short sides” (so many that I won’t even attempt to name them).  But how bad is this lie?  Better yet, how bad is the spirit of this lie?  I think it depends on the audience.  I’m not sure it’s so very wrong to draw a sharp (albeit technically imaginary) distinction for young children between squares and rectangles that are not squares.  It doesn’t seem all that different to me, on a fundamental level, from saying that cones are 3-dimensional solids.  Or that you can’t take the square root of a negative number.  Or that the sum of the interior angles of a quadrilateral is 360 degrees.  None of those statements is strictly true, but the truths are so very inconvenient for learners grappling with the concepts that we actually care about at the time.  It’s not currently important that they grasp the complete picture.  And it’s probably not feasible for them to do so, anyway.

Teaching mathematics is an iterative process, a feedback loop.  New information is encountered, reconciled with existing knowledge, and ultimately assimilated into a more complete understanding.  Today you will “know” that squares and rectangles are different.  Later, when you’re ready to think about angle measure and congruence, you will learn that they are sometimes the same.  Today you will “know” that times can only be 0 if either a or b is zero.  And tomorrow you will learn about the ring of integers modulo 6.

I will tell you the truth, children.  But maybe not today.

Luck(?) of the Draw

What is luck?  Is luck?  And, if you vote yea, is a belief in luck an obstacle to understanding probability?

This question came up on Twitter a couple of nights ago when Christopher Danielson and Michael Pershan were discussing Daniel Kahneman’s recent book, Thinking, Fast and Slow.  Specifically, they were talking about the fact that Kahneman doesn’t shy away from using the word luck when discussing probabilistic events.  This, of course, is the kind of thing that makes mathematically fastidious people cringe.  And Danielson and Pershan are nothing if not mathematically fastidious.  Spend like five minutes with their blogs.  So Danielson twittered this string of twitterings:

According to Danielson, luck a “perceived bias in a random event.”  And, according to his interpretation of Kahneman, luck is composed of “happy outcomes that can be explained by probability.”  Let me see if I can define luck for myself, and then examine its consequences.

What is luck?

I think, at its heart, luck is about whether we perceive the universe to be treating us fairly.  When someone is kind to us, we feel happy, but we can attribute our happiness to another’s kindness.  When someone is mean, we feel sad, but we can attribute our sadness to another’s meanness.  When we are made to feel either happy or sad by random events, however, there is no tangible other for us to thank or blame, and so we’ve developed this idea of being either lucky or unlucky as a substitute emotion.

But happy/sad and lucky/unlucky are relative feelings, and so there must be some sort of zero mark where we just feel…nothing.  Neutral.  With people, this might be tricky.  Certainly it’s subjective.  Really, my zero mark with people is based on what I expect of them.  If a stranger walks through a door in front of me without looking back, that’s roughly what I expect.  And, when that happens, I do almost no emoting whatsoever.  If, however, he holds the door for me, this stranger has exceeded my expectations, which makes me feel happy at this minor redemptive act.  If he sees me walking in behind him and slams the door in my face, he has fallen short of my expectations, which makes me sad and angry about him being an asshole.

And, in this regard, I think that feeling lucky is actually a much more rational response than being happy/sad at people, because with random events at least I can concretely define my expectation.  I have mathematical tools to tell me, with comforting accuracy, whether I should be disappointed with my lot in life; there is no need to rely on messy inductive inferences about human behavior.  So I feel lucky when I am exceeding mathematical expectations, unlucky when I’m falling short, and neutral when my experience roughly coincides with the expected value.  Furthermore, the degree of luck I feel is a function of how far I am above or below my expectation.  The more anomalous my current situation, the luckier/unluckier I perceive myself to be.

Let’s look at a couple examples of my own personal luck.

  1. I have been struck by lightning zero times.  Since my expected number of lightning strikes is slightly more than zero, I’m doing better than I ought to be, on average.  I am lucky.  Then again, my expected number of strikes is very, very slightly more than zero, so I’m not doing better by a whole lot.  So yeah, I’m lucky in the lightning department, but I don’t get particularly excited about it because my experience and expectation are very closely aligned.
  2. I have both my legs.  Since the expected number of legs in America is slightly less than two, I’m crushing it, appendage-wise.  Again, though, I’m extremely close to the expected value, so my luck is modest.  But, I am also a former Marine who spent seven months in Iraq during a period when Iraq was the explosion capital of the world.  My expected number of legs, conditioned on being a war veteran, is farther from two than the average U.S. citizen, so I am mathematically justified in feeling luckier at leg-having than most leg-having people in this country.

Which brings us back to this business of luck being a “perceived bias in a random event.”  I’m not convinced.  In fact, I’m absolutely sure I can be lucky in a game I know to be unbiased (within reasonable physical limits).  Let’s play a very simple fair game: we’ll flip a coin.  I’ll be heads, you’ll be tails, and the loser of each flip pays the winner a dollar.  Let’s say that, ten flips deep, I’ve won seven of them.  I’m up $4.00.  Of course, my expected profit after ten flips is $0, so I’m lucky.  And you, of course, are down $4.00, so you’re unlucky.  Neither of us perceives the game to be biased, and we both understand that seven heads in ten flips is not particularly strange (it happens about 12% of the time), and yet I’m currently on the favorable side of randomness, and you are not.  That’s not a perception; that’s a fact.  And bias has nothing to do with it, not even an imaginary one.

Now, in the long run, our distribution of heads and tails will converge toward its theoretical shape, and we will come out of an extremely long and boring game with the same amount of money as when we started.  In the long run, whether we’re talking about lightning strikes or lost limbs or tosses of a coin, nobody is lucky.  Of course, in the long run—as Keynes famously pointed out—we’ll be dead.  And therein, really, is why luck creeps into our lives.  At any point, in any category, we have had only a finite number of trials, which means that our experiences are very likely to differ from expectation, for good or ill.  In fact, in many cases, it would be incredibly unlikely for any of us to be neither lucky nor unlucky.  That would be almost miraculous.  So…

Is luck?

As in, does it really exist, or is it just a perceptual trick?  Do I only perceive myself to be lucky, as I said above, or am I truly?  I submit that it’s very real, provided that we define it roughly as I just have.  It’s even measurable.  It doesn’t have to be willful or anthropomorphic, just a deviation from expectation.  That shouldn’t be especially mathematically controversial.  I think the reason mathy people cringe around the idea of luck is because it’s so often used as an explanation, which is where everything starts to get a little shaky.  Because that’s not a mathematical question.  It’s a philosophical or—depending on your personal bent—a religious one.

If you like poker, you’d have a tough time finding a more entertaining read than Andy Bellin’s book, Poker Nation.  The third chapter is called “Probability, Statistics, and Religion,” which includes some gems like, “…if you engage in games of chance long enough, the experience is bound to affect the way you see God.”  It also includes a few stories about the author’s friend, Dave Enteles, about whom Bellin says, “Anecdotes and statistics  cannot do justice to the level of awfulness with which he conducts his play.”  After (at the time) ten years of playing, the man still kept a cheat sheet next to him at the table with the hand rankings on them.  But all that didn’t stop Dave from being the leading money winner at Bellin’s weekly game during the entire 1999 calendar year.  “The only word to describe him at a card table during that time is lucky,” says Bellin, “and I don’t believe in luck.”

But there’s no choice, right, but to believe?  I mean, it happened.  Dave’s expectation at the poker table, especially at a table full of semi-professional and otherwise extremely serious and skillful players, is certainly negative.  Yet he not only found himself in the black, he won more than anybody else!  That’s lucky.  Very lucky.  And that’s also the absolute limit of our mathematical interest in the matter.  We can describe Dave’s luck, but we cannot explain it.  That way lies madness.

There are 2,598,960 distinct poker hands possible.  There are 3,744 ways to make a full house (three-of-a-kind plus a pair).  So, if you play 2,598,960 hands, your expected number of full houses during that period is 3,744.  Of course, after 2.6 million hands, the probability of being dealt precisely 3,744 full houses isn’t particularly large.  Most people will have more and be lucky, or less and be unlucky.  That’s inescapable.  Now, why you fall on one side and not the other is something you have to reconcile with your favorite Higher Power.

Bellin’s final thoughts on luck:

I know in my heart that if Dave Enteles plays 2,598,960 hands of poker in his life, he’s going to get way more than his fair share of 3,744 full houses.  Do you want to know why?  Well, so do I.

And, really, that’s the question everybody who’s ever considered his/her luck struggles to answer.  No one has any earthly reason to believe she will win the lottery next week.  But someone will.  Even with a negative expectation, someone will come out way, way ahead.  And because of that, we can safely conclude that that person has just been astronomically lucky.  But why Peggy McPherson?  Why not Reggie Ford?  Why not me?  Thereon we must remain silent.

Is a belief in luck an obstacle to understanding probability?

I don’t see why it should be.  At least not if we’re careful.  If you believe that you are lucky in the sense of “immune to the reality of randomness and probabilistic events,” then that’s certainly not good.  If you believe that you are lucky in the sense of “one of the many people on the favorably anomalous side of a distribution,” then I don’t think there is any harm in it.  In fact, acknowledging that random variables based on empirical measurements do not often converge toward their theoretical limits particularly rapidly is an important feature of very many random variables.  In other words, many random variables are structured in such a way as to admit luck.  That’s worth knowing and thinking about.

Every day in Vegas, somebody walks up to a blackjack table with an anomalous number of face cards left in the shoe and makes a killing.  There is no mystery in it.  If you’re willing to work with a bunch of people, spend hours and hours practicing keeping track of dozens of cards at a time, and hide from casino security, you can even do it with great regularity.  There are how-to books.  You could calculate the exact likelihood of any particular permutation of cards in the shoe.  I understand the probabilistic underpinnings of the game pretty well.  I can play flawless Basic Strategy without too much effort.  I know more about probability than most people in the world.  And yet, if I happen to sit at a table with a lot more face cards than there ought to be, I can’t help but feel fortunate at this happy accident.  For some reason, or for no reason, I am in a good position rather than a bad one; I am here at a great table instead of the guy two cars behind me on the Atlantic City Expressway.  That’s inexplicable.

And that’s luck.

Measure Your Blessings

In my last post, we took a look at how our choice of unit has both mathematical and linguistic consequences when we try to talk about one of something, particularly in a few weird cases.  One of the themes (here unit = “theme”) that came up in the course of the discussion is the notion that there are certain objects that lend themselves to counting, and others that lend themselves to measuring.  Moreover, the words we use in our reckoning of different objects/substances are informed by our mathematical interpretation of their underlying structure.

Checkout Lines and Carnival Rides

I grew up with two extremely precise parents: a teacher mother who routinely marks public signs with a Sharpie to fix grammatical and spelling errors, and a teacher father who routinely soliloquizes over dubious scientific claims in the media.  Perhaps it’s no accident that I both teach (and love) math and write (and love) this blog.  One of the things I apparently learned/inherited from my mom is a visceral, knuckle-whitening cringe induced by express checkout aisles labeled “10 Items or Less” instead of “10 Items or Fewer.”  It’s a reaction that has lodged itself firmly into the parts of my brain normally reserved for images of poisonous snakes and lion silhouettes.  We experience this discomfort because there is a dissonance between the referent noun items (a countable substance) and the comparative adjective less (a measure word).  It makes no more sense to speak of a number of items less than ten than it does to speak of a paint color taller than red.  Height is not an attribute of paint color; measure it not an attribute of item count.

Of course no one is actually confused by that what the sign means: “If the cardinality of the set of items in your basket exceeds ten, please find another line.”  Still, the entire point of a grammar is to avoid ambiguity whenever possible.  Even though there is no appreciable ambiguity in the express aisle, that we’re even talking about this highlights an important aspect of our language: we do in fact draw a line in the linguistic sand with the measurable on one side and the countable on the other.  We have different words that signal counting and measuring in the same way that, e.g., German has different words that signal you (singular) and you (plural).  As good citizens, we try to use the right signalling words, and we’re perhaps slightly irritated when others don’t.

Once we’ve figured out whether we’re measuring or counting, though, the grammatical questions are decided for us, so the important—and sometimes difficult—part lies in gauging which side of line we’re on.  In the express aisle, for instance, we have to determine what structure, exactly, underlies the nature of an “item.”  Well, we understand that the point of the express aisle bound is to get people through the line as quickly as possible, and the quickness with which one negotiates the line is a function of the number of scans that take place.  In effect, we can measure checkout time in terms of boops.  So, in this case, 1 boop = 1 item.  And, since boops are atomic (it makes no sense to think about what half-a-boop might mean), we model them with cardinal/ordinal numbers.  Your six-pack of Diet Coke?  One boop, not six.  Your ten yogurts, which are conspicuously not linked or priced together? Sorry, 10 boops = 10 separate items.  There is certainly no room for less than here.

At the other end of the spectrum, my mom wouldn’t look twice at a Tilt-A-Whirl line with a “No Riders Less Than 48 Inches Tall” sign.  Presumably the point of the height restriction is to prevent too-small human beings from slipping out of the safety restraints mid-tilt or -whirl, and thus any height at all below the 48-inch cutoff is potentially hazardous.  We want to exclude someone who is 45 inches tall, or 45.019 inches, or 42.31 inches.  Since it seems as though we’re modeling height with real numbers, we know that we’re in measuring country.  Now we don’t have room for fewer.

Time Keeps On Slipping/Discretely Clicking Into the Future

Of course it’s not always so simple.  Depending on the context—and thus the unit—in question, things might be either countable or measurable.  Consider time.  On the one hand, time seems like the prototypical infinitely-divisible thing, hence calculus gives us exceedingly good predictions about the behaviors of physical bodies moving about the universe.  On the other hand, we are sometimes only concerned with time meted out into discrete chunks, hence survey-of-history courses.

Let’s pretend we’re about to run a 100m sprint together.  I might say to you, “If you can run this in less than 9.58 seconds, you’ll have beat the world record.”  We’re measuring seconds, so this makes perfect sense.  If, however, we were talking about my teaching career, I might say something like, “Since I’ve been teaching for fewer than three years, I have to be observed by my principal.”  Now we’re counting years.  I don’t think anyone would freak out if I said “less than three years,” but it’s extremely considerate for me to use “fewer” in this case, because it signals to my conversational partner that “years” is the relevant unit of account.  In fact, almost any conversation about my teaching experience is likely to be couched in counting terms, because almost all of the important distinctions (e.g., pay) are binned into one-year increments, the fractional parts of which are totally irrelevant.  In other words, we’re modeling with cardinal/ordinal numbers, which require counting words.

The amphibious nature of concepts like time can lead to some interesting and confusing consequences.  For instance, the ratio of the human lifespan to the rate at which the Earth orbits the sun makes it often convenient to group time into fairly large chunks.  Large chunks generally demand counting language.  On the other hand, we like to be pretty precise with our reckoning of time, and precision often demands measuring language.  Which interpretation wins out might depend on cultural norms.

Consider the traditional Chinese method of determining age.  You are born into some year.  This is your first year.  When the lunar year rolls over, even if that happens tomorrow, you have suddenly been extant during two different years, which makes you two years old.  We’re starting at one and modeling with cardinal/ordinal numbers, i.e., counting.  In the West, we think this is nuts.  The moment you sneak out of the birth canal (or, I suppose, the abdominal cavity), the clock starts running.  Your birth is the zero marker, and everything is measured as a distance relative to that point.  That is, your lifespan is measured from birth to death.  To say that someone is 18 means two entirely different things in the two cultures.  In China it means that he has taken at least one breath on 18 distinct calendar pages; in the West, it means his age measurement lies in the interval [18,19).  Neither one is a priori more correct; it just means that someone who is 18 under the Eastern system might be barely old enough to start driving in most of the U.S.

But even in the West we’re not entirely consistent in our choices.  We are by and large measurers of time, but we still count it under certain circumstances.  For instance, the year 72 CE belongs to the First Century of the Common Era, even though 72 has a zero in the hundreds place.  That’s a telltale sign of counting.  But then we also might talk about the 1900’s instead of the 20th Century, and “1900’s” is based on a measurement starting at the zero point of Jesus’s possibly apocryphal, possibly entirely literally true birth (well, not really the zero point…we jumped from 1 BCE to 1 CE without any Year Zero…even though we’ve been measuring for a long time, now, we got off to a lousy counting start).  And a century seems to be about the inflection point: any chunk larger than that and we almost exclusively count (we might talk about the 2nd Millennium, but nobody talks about the 1000’s), and anything smaller we almost exclusively measure (ever hear anybody refer to the 90’s as “the 200th Decade?”).

What We Talk About When We Talk About Math

Like so many things in mathematics, this counting/measuring business really comes down to deciding what domain we’re in so that we can choose an appropriate mathematical model.  Notice I’ve been saying throughout the piece that we “model” counting or measuring with real or cardinal numbers.  And, like all choices of model, it boils down to convenience.  The continuity of real numbers makes them very nice to work with sometimes; it’s comforting to think that I can measure out any arbitrary amount of water or time that I might need, even though that’s not strictly true.  Nothing in the physical universe really has anything to do with real numbers.  We don’t know (almost) any empirical measurements beyond about 7 or 8 decimal places.  And, even if we did, it seems as though the universe itself, being quantized, turns out to be metaphysically countable.  In other words, we could get away with counting language alone, you know, if you didn’t mind measuring carnival riders in Planck Lengths.

Of course we don’t want to do that, so we’ve agreed to trade some verisimilitude for the pleasantries of measurement language, even if it slightly increases our grammatical efforts in the process, and even if that increased effort leads to the occasional error in signage.  I’ll try to keep that in mind next time I find my heart rate climbing in the grocery store.

Aside: Many of the ideas about the business of measuring and/or counting time, as well as the seeds of my fierce loyalty to a singular “data,” can be traced back to John Derbyshire’s excellent book, Prime Obsession.

Human Composure

Frequent commenter and child logic expert Christopher Danielson recently contributed a very cool video, entitled One is one…or is it?, to the Ted-Ed project in which he explores  what it is we really mean when we say “one.”  The very notion requires an implicit or explicit reference to a unit, and sometimes things aren’t quite as simple as they seem.  In particular, we might have units that are either built up of (composed units) or divided into (partitioned units) smaller sub-units.  Or sub-sub-units.  For instance, the loaf of bread in your cupboard is partitioned into individual slices; the pinochle deck you stash in the coffee table is composed of individual playing cards; and Pop Tarts are sold in boxes, which contain packs, which contain individual pastries. One is relative to your choice of unit.

Of course this is all very interesting, but the first thoughts that popped into my head during the dancing apple slices number involved the two conspicuous cases that seem to defy composition and partitioning: (1) units whose sub-units are human, and (2) units whose sub-units are tiny.  It’s still possible to smoosh them together and rend them asunder, but not so neatly.

Part the First

For just a moment let’s consider the North Carolina State Wolfpack.  Now that is definitely an all-the-way, one-hundred percent composed unit.  It is a singular pack, composed of singular wolves.  Unambiguous.  Until you read some press releases, which are simply pregnant with phrases like, “the Wolfpack are…” Which is very strange indeed.

Consider two fictitious news stories.  In the first, NC State is relocated by the NCAA to South Carolina.  In the second, an actual, literal pack of wolves (Canis lupus) is spotted migrating from Raleigh to Charleston.  The first headline would read, “Wolfpack Head Across the Border,” while the second would read, “Wolfpack Heads Across the Border.”  These two scenarios are mathematically identical; the only difference is that, in one case, the sub-units are figurative wolves.  So why do we require a different verb conjugation?  It seems that people somehow resist being subsumed by composed units in a way that, e.g., playing cards do not.  Admittedly this is a psycho-linguistic curiosity more than a mathematical one, but still…units can be slippery.

It’s maybe more obvious that people resist being partitioned.  After all, if you make it through life without anybody partitioning you, let’s call that a nontrivial success.  But it shows up in the language, too.  We are very rarely wont to consider a disembodied finger directly.  When we partition people, we replace the corporeal whole with a possessive placeholder, a pointer (ha!) to the original unit.  We hold the source object in memory in a much more vivid and deliberate way than with other objects.  An apple slice is an apple slice, culinary nuances notwithstanding, but “John’s foot” and “your foot” and “the crazy lady in 3B’s foot” require modification.  Unless you live under fairly abnormal conditions, indefinite articles no longer suffice: “a foot” or “the foot” rarely come up.  We make grammatical concessions more readily and more often for human “units” (suppressing, here, many jokes) than nonhuman ones.

Part the Second

Consider the strangeness of the following utterance: “I think the rice is done.”  Why aren’t they done instead?  There are tons of those little buggers!  There certainly exists a plurality of foodstuffs.  But when we deal with tiny sub-units (especially if they’re homogeneous), we have a hard time unitizing them naturally.  We can go clumpy: “I think the pot/serving/microwave bag of rice is done.”  We can go grainy: “I think the grains of rice are done.”  But both of those solutions feel deeply unsatisfactory.  There are contortions involved.  We either have to create a new group name (think pod of dolphins or murder of crows), or a new unit name (think kernel of corn or drupelet of raspberry).  Awkward either way.  It’s as if, on a fundamental level, we ache for rice to be singular entity, but of no definite unit membership.

And this weirdness, I think, is no longer merely syntactic, but deeply mathematical.  When we unitize the world in language, we make an important distinction between that which is countable and that which is measurable.  There is an analogy to be made here between discrete and continuous objects, respectively.  It is natural to consider the composed unit of a dozen eggs, because eggs are easily countable, and because it’s easy to count to a dozen.  It is much more difficult to create a conventional composed unit out of water molecules, because water molecules are hard to count, and because it would be hard to count to a number of water molecules that would be useful in most situations.  Thus, we measure water and treat it as an un-composed unit.  Somewhere in between those two extremes, we have things like rice, much more countable than water, much less countable than eggs.  In fact, it’s closer to water than eggs in its countability, so we treat rice as measurable/continuous, even though it’s technically countable/discrete.  Sand.  Salt.  Data.*

*I will fight to the death, via torturously long diatribes on gerunds and loan words, that “data” should be treated as singular in English, even though it’s inflected as a plural in Latin, all based on the composed unit argument above.  If you’re going to be a total weeny and use it in the plural, at least be consistent.  I had better never hear you talk about “an agenda” (singular), because “agenda” is also inflected as plural; each item is technically an “agendum.”  I’m watching you.

So tiny things resist composition, and they resist partitioning even more vehemently.  For one, they’re already tiny.  It’s inconvenient to let these things get any smaller (and our unit choices, after all, have an awful lot to do with convenience), and there’s no natural starting point from which to partition things in the first place.  If I wanted to decompose “sand” into parts, how big is this parent sand?  We’ve stumbled into a kind of reverse paradox of the heap.  A loaf of bread readily admits slices.  A _________ of sand admits grains.  Tough to fill in the blank.

An Interesting Competition

What happens when these two notions are pitted against each other?  Which one wins out in our brains and on our tongues?  How fortunate for my blog that the Miami Heat are currently playing in the NBA Eastern Conference Finals, and that the Miami Heat are one of the few professional sports teams with a singular name (as an fun exercise, try to list all the others—major sports only, no AA curling or anything).  Did you hear what I just typed?  “The Miami Heat are one of…”  But wait, that’s nuts!  I’ve never in my life heard anybody complain that the humidity are unbearable, so why should the capital-H Heat be any different?  Heat is an abstract and amorphous thing.  In our everyday usage, it’s definitely a measurable substance—like water, a singular.  But it’s also a unit composed of people.  And when I tell you about the current state of the NBA, I tell you that the Heat are in the Eastern Conference finals.

Our bias against making conglomerations out of people is so strong that it can overcome our natural tendency to treat both composed units and measurable substances as singular.  We hold ourselves in such high regard that we’re willing to regularly construct borderline nonsensical phrases to maintain our artificially inflated position.

Go, Pack!

To the Limit…One More Time

There’s an interesting article in this month’s Mathematics Teacher about the effects of the particular language elements we use to communicate mathematical ideas.  The main thread revolves around limit concepts, primarily because they’re both philosophically and practically confusing for many beginning calculus students, and because, it turns out, a teacher’s particular choices regarding words and metaphors have an important impact on student (mis)understanding.

Limits comprise a special relationship between mathematical process and mathematical object.  We speak of them in terms of variables “approaching” or “tending toward” particular values, but we subsequently manipulate them as static entities.  I can, for instance, talk about the limiting value of the expression 1/x as x grows without bound (a dynamic concept), but that limiting value is ultimately just a single real (static) number: zero.  There’s an uncomfortable tension in that duality.

Even the notation is ambiguous.  Here’s the fact I mentioned in the preceding paragraph, symbolically:

\lim_{x \to \infty} \frac{1}{x} = 0

The arrow implies motion, but the equals sign implies assignment.  There are elements of both process and object.

I’ve touched on this duality before, which has sparked some great conversations.  A few months ago, I had a supremely interesting email chat with Christopher Danielson after he pointed me toward the writings of Anna Sfard.  He has graciously agreed to allow me to reproduce that conversation here in its original form; I’ve only redacted some of the more boring pleasantries and collapsed some strings of shorter messages into longer ones.  Enjoy.

Chris Lusto
To: Christopher Danielson

Seriously, thanks for the Sfard tip.  I’ve read a few of the articles she has on her website (which, by the way, why are college professors’ websites like the most aesthetically displeasing things on the internet?  Just use a white background and stop being weird.), and you were right: I dig her.  I read the article on duality [PDF] and had one major bone of contention.

I really like the idea of duality versus dichotomy, and she makes, I think, a compelling argument in general.  I just worry it might just be a little ambitious.  She hedges a little bit, saying things like “more often than not” mathematical objects can be conceived both operationally and structurally, but I still think this idea of duality runs into serious problems when infinite things come into play–and that’s not exactly a trivial subset of “mathematical objects.”

If we allow that operational conception is (a) just as valid/important as structural and (b) often, in fact, precedes structural conception, what are we to make of processes that never end, that never produce anything because they’re always in production?  Sfard even says, “…interpreting a notion as a process implies regarding it as a potential rather than actual entity, which comes into existence upon request in a sequence of actions.”  But what if we can’t ever fulfill the request, because we’re always on hold, waiting in vain for the end of an unending sequence?  And what about this business of “potential?”  That just smacks of the “potential infinities” of the ancient Greeks that held back western mathematics for a couple millennia.  It seems like we have to admit either (a) an infinite process can terminate in finite time in order to produce an structural object, or (b) these objects aren’t really at all structural, because they live in the world of potentiality.  I don’t find either of those particularly satisfying.  I think, in the case of infinite notions, the operational conception leads to a fundamental misconception, a la my student D.

Your thoughts?  Whenever you have a moment, of course.

Chris

Christopher Danielson
To: Chris Lusto

“Ambitious” describes Anna Sfard’s intellectual habits very well, I think. She was in a half-time appointment at Michigan State (and half time at Haifa) for part of my grad school time, and she was on my dissertation committee. The woman is crazy smart. And it seems to be a characteristic of Israeli intellectuals to commit very strongly to one’s ideas. Not a maybe or a perhaps to be found in her oeuvre, I don’t think.

I have no explanation for the poor poor quality of academics’ websites, except to say that it is representative of tech use in higher ed more generally. See also @EDTECHHULK on Twitter and Dan Meyer’s comments here (esp. couple screens down the page, at “Real Talk about Grad School):

http://blog.mrmeyer.com/?p=12592

I’m still formulating thoughts on processes that never terminate. But I’m not sure I fully understand your objection. Your classroom scenarios seem to suggest that indeed process and object are both fundamentally important ways of thinking about infinity. And consider the language of limits…”as x goes to infinity” or even “as x grows without bound”. Those are both process-based ways of talking, right?

csd

Chris Lusto
To: Christopher Danielson

I think Sfard’s right that, in general, process and object are both important methods of mathematical conception.  And yeah, multiple representations are not only admissible, but probably desirable (thinking here, specifically, of HS algebra and the Lesh Model), but isn’t operational understanding misleading when you’re talking about infinity?

Thinking of f(x) = 2x as a process that doubles inputs is valuable, and so is a picture of the resulting object/graph.  And, in a case like this one, I don’t think you lose or gain all that much with either vantage.  Sometimes it’s helpful to think of the process, and other times the object.

But thinking of asymptotic behavior procedurally, for example, is very, very different from the object we call a “limit.”  It’s nice if students can understand that, as x gets larger, 1/x gets arbitrarily close to 0.  I mean, certainly if we hold a numerator constant and increase the denominator, this process yields subsequently smaller and smaller values.  But I think that’s still like a mile away from understanding that lim_x–>∞ {1/x} = 0.  Like, is equal to.  Is identical to as an object.  Is just another name for.  Like, 23 + lim_x–>∞ {1/x} = 23.

If procedure (process) is linked to product (object)–like, say, “4 divided by 7″ is linked to “4/7″–then how are we to reconcile a never-ending process with a finite, tangible product that can be manipulated like any other mathematical object?  Doesn’t it force us to accept that 1/x eventually “gets to” 0 (which it doesn’t), or that the limit is some kind of potential result (which it isn’t) that can’t really ever be called a proper object because the process is, by definition, never-ending?

I’m going to stop typing words, because I feel like as my words –>∞, my clarity –> 0.

C

Christopher Danielson
To: Chris Lusto

I see…so to boil it down to a debatable question…

Is the object necessarily the product of the process?

Do I have it right?

btw…if I got that question right, then I say ‘no’.

I can think about 1,352,417 and treat it as an object, even though I can assure that I have never participated in any sort of process that yielded that number.

To say nothing of googolplex.

csd

Chris Lusto
To: Christopher Danielson

I think that’s about right, but with one important qualification.

Is the object necessarily the product of the process?  Then I agree, no.  But you at least have the option of defining it either way.  Even if you’ve never constructed 1,352,417 widgets, there’s nothing philosophically problematic with the process that did/could.  You’re right, there isn’t even a measly googol of anything, but that doesn’t stop it from being the eventual result of (1+1+…+1).

So…

Is the object the result of the process?  Not necessarily, but that’s not a huge problem for me.

Could the object be the result of the process?  If the answer is no (which my gut believes it to be in the infinite case), then how can we reasonably talk about it as both a process and an object?  Does the duality break down?

C

Christopher Danielson
To: Chris Lusto

See I don’t see a huge difference philosophically between “a product that could be created by a known process, but not in my lifetime” (counting to googol) and “a product that could never be created” (infinity).

In both cases, for me, the process is (1) incomplete, and (2) hypothetical.
Why does it matter at the core whether the result is theoretically achievable or not? Either way, I’ve imagined it.

And I think imagination is key. I don’t recall whether Sfard writes about that or not (probably not, since she’s all language, no imagery). But I do think the transition from process to object is at least in part one involving imagination. I have to imagine the object into being in mathematics precisely because mathematical objects are abstract.

And when I’m struggling to understand a new object (say a limit), it is often helpful to imagine the process that produced it. But I don’t have to see the process through to the end.

csd

Chris Lusto
To: Christopher Danielson

Think about our Hz conversation.  Even with arbitrarily huge numbers of wave combinations, we get sinusoidal waves.  I can get as close to a square wave as I want, but in order to actually obtain the square wave object, the process that got me arbitrarily close to my goal breaks down and fails.  The process is insufficient to the object.  The difference between the square wave and the sinusoidal wave that’s arbitrarily close to square is ultimately qualitative, not just quantitative–and there’s the rub.  Wasn’t that precisely what you and Frank [Noschese] convinced me of?

C

Christopher Danielson
To: Chris Lusto

But the square wave is the limit. There’s the object. The limit (process? object?) produces the square wave.

I have no idea what I convinced you of. But I know that the argument I was making was that polynomials-by definition-have finitely many terms. And e^x can be written as infinitely many terms, each one a polynomial. Is e^x a polynomial? By the letter of the law, no. But in spirit? Yes. And that’s beautiful.

I got in trouble doing a CMP demonstration lesson once. I talked with students about a cylinder being a circular prism. The algebra teacher observing got upset with me because a prism has polygonal faces. Ergo, “circular prism” is nonsense.

I had occasion to follow up a year or so later with my former complex analysis professor from MSU grad school. He had absolutely no problem calling a cylinder a circular prism.  No problem at all.

What to learn? Unclear.

csd

Chris Lusto
To: Christopher Danielson

I see a huge distinction between “unachievable due to resource constraints” and “unachievable by definition.”  Why is the possibility that CERN moved some particles faster than light a big deal?  We’ve already moved all kinds of stuff 99.999999% that fast in the lab.  The extra .000001% is practically trivial, but philosophically enormous.  It’s not that faster-than-light travel seemed to be practically impossible, but literally, probability exactly 0 impossible.

The difference between almost 0 and 0, no matter how small, is mathematically gigantic.

This is seriously all kinds of fun, but I have to go do some domestic things.  To be continued…in finite time.

C

Christopher Danielson
To: Chris Lusto

That’s the beautiful thing about email. It is at heart an asynchronous medium.

By the way, some would say that you have pointed to an important difference between mathematics and the sciences with your example.

csd

Thanks so much to Dr. Danielson for (a) having this discussion, and (b) letting me publish all the gory details.  Oh, and (c) making me smarter in the process.

Label Maker

If you’ve perused this blog, you know that I love probability.  I was fortunate enough to see Al Cuoco and Alicia Chiasson give a really cool presentation at this year’s NCTM conference about exploring the probabilities of dice sums geometrically and algebraically.  Wheelhouse.  After we got done looking at some student work and pictures of distributions, Al nonchalantly threw out the following question:

Is it possible to change the integer labels on two dice [from the standard 1,2,3,4,5,6] such that the distribution of sums remains unchanged?

Of course he was much cooler than that.  I’ve significantly nerded up the language for the sake of brevity and clarity.  Still, good question, right?  And of course since our teacher has posed this tantalizing challenge, we know that the answer is yes, and now it’s up to us to fill in the details.  Thusly:

First let’s make use of the Cuoco/Chiasson observation that we can represent the throw of a standard die with the polynomial

P(x) = x^1 + x^2 + x^3 + x^4 + x^5 + x^6

When we do it this way, the exponents represent the label values for each face, and the coefficients represent frequencies of each label landing face up (relative to the total sample space).  This is neither surprising, nor super helpful.  Each “sum” occurs once out of the six possible.  We knew this already.

What is super helpful is that we can include n dice in our toss by expanding n factors of P(x).  For two dice (the number in question), that looks like

P(x)^2 = x^2+2x^3+3x^4+4x^5+5x^6+6x^7+5x^8+4x^9+3x^{10}+2x^{11}+x^{12}

You can easily confirm that this jibes with the standard diagram.  For instance the sum of 7 shows up most often (6 out of 36 times), which helps casinos make great heaps of money off of bettors on the come.  Take a moment.  Compare.

Okay, so now we know that the standard labels yield the standard distribution of sums.  The question, though, is whether there are any other labels that do so as well.  Here’s where some abstract algebra comes in handy.  Let’s assume that there are, in fact, dice out there who satisfy this property.  We can represent those with polynomials as well.  We know that the coefficient on each term must still be 1 (each face will still come up 1 out of 6 times), but we don’t yet know about the exponents (labels).  So let’s say the labels on the two dice are, respectively

(a_1,a_2,a_3,a_4,a_5,a_6) and (b_1,b_2,b_3,b_4,b_5,b_6).

If we want the same exact sum distribution, it had better be true that

P(x)^2 = (x^{a_1}+x^{a_2}+x^{a_3}+x^{a_4}+x^{a_5}+x^{a_6}) (x^{b_1}+x^{b_2}+x^{b_3}+x^{b_4}+x^{b_5}+x^{b_6}).

For future convenience (trust me), let’s call the first polynomial factor on the right hand side Q(x).  Great!  Now we just have to figure out what all the a’s and b’s are.  It helps that our polynomials belong to the ring Z[x], which is a unique factorization domain.  A little factoring practice will show us that

P(x)^2 = x^2(x+1)^2(x^2+x+1)^2(x^2-x+1)^2.

We just have to rearrange these irreducible factors to get the answer we’re looking for.  Due to a theorem that is too long and frightening to reproduce here [waves hands frantically], we know that the unique factorization of Q(x)—our polynomial with unknown exponents—must be of the form

Q(x) = x^s(x+1)^t(x^2+x+1)^u(x^2-x+1)^v,

where s, t, u, and v are all either 0, 1, or 2.  So that’s good news, not too many possibilities to check.  In fact, we can make our lives a little easier.  First of all, notice that Q(1) must equal 6.  Right?  Each throw of that single die must yield each of the 6 faces with equal probability.  But then substituting 1 into the factored form gives us

Q(1) = 1^s2^t3^u1^v

Clearly this means that t and u have to be 1, and we just have to nail down s and v.  Well, if we take a look at Q(0), we also quickly realize that s can’t be 0.  It can’t be 2 either, because, if s is 2, then the smallest sum we could obtain on our dice would be 3—which is absolutely no good at all.  So s is 1 as well.  Let’s see what happens in our three remaining cases, when u is 0, 1, and 2:

u=0: Q(x)=x^1+x^2+x^2+x^3+x^3+x^4

u=1: Q(x)=x^1+x^2+x^3+x^4+x^5+x^6

u=2: Q(x)=x^1+x^3+x^4+x^5+x^6+x^8

Check out those strange and beautiful labels!  We can mark up the first die with the exponents from the u = 0 case, and the second die with the u = 2 case.  When we multiply those two polynomials together we get back P(x)2, which is precisely what we needed (check if you like)!  Our other option, of course, is to label two dice with the u =1 case, which corresponds to a standard die.  And, thanks to unique factorization, we can be sure that there are no other cases.  Not only have we found some different labels, we’ve found all of them!

If the a’s on the first die are (1,2,2,3,3,4), then the b’s end up being (1,3,4,5,6,8), and vice versa.  And, comfortingly, if the a’s on the first die are (1,2,3,4,5,6), then so are the b’s on the second one.

Two dice with the u = 1 label are what you find at every craps table in the country.  One die of each of the other labels forms a pair of Sicherman dice, and they are the only other dice that yield the same sum distribution.  You could drop Sicherman dice in the middle of Vegas, and nobody would notice.  At least in terms of money changing hands.  The pit boss might take exception.  Come to think of it, I cannot stress how important it is that you not attempt to switch out dice in Vegas.  Your spine is also uniquely factorable…into irreducible vertebrae.

*This whole proof has been cribbed from Contemporary Abstract Algebra (2nd ed.), by Joseph A. Gallian.  If you want the whole citation, click his name and scroll down.*

Playing to an Empty House

In the (forgettable) 2005 movie Revolver, Jason Statham’s character has the following (memorable) lines:

There is something about yourself that you don’t know.  Something that you will deny even exists until it’s too late to do anything about it.  It’s the only reason you get up in the morning…because you want people to know how good, attractive, generous, funny, wild, and clever you really are…We share an addiction.  We’re approval junkies.

Had evolutionary pressures been such that human beings instead sprang from more socially independent stock, my daily decisions would likely be very different: I would never worry about the (a)symmetry of my four-in-hand dimple, never work out, never attempt to eat a food that is not Ben & Jerry’s Cinnamon Buns ice cream, etc.  I certainly wouldn’t write a blog.  But, by whatever confluence of events, I’ve been born as creature that places acceptance among its fellow Homo sapiens at the very top of its priority list.  And it’s not just me of course.  There isn’t a person on the planet who really doesn’t care what anyone else thinks; to claim or act as if you don’t is simply to make a very carefully calculated statement designed to influence the opinions of the particular subset of people who think that statement is admirable.  And we want to be admired.

For a teacher, this is incredibly fortuitous.  We leverage it every day.  Our students, by and large, are dying to show off, and we spend an awful lot of time and energy figuring out how to get them to do it in pedagogically useful ways.  We get them to show off to a group of their peers and call it discussion.  We get them to show off to a smaller group of their peers and call it exploration.  We get them to show off to themselves and call it discovery.  We get them to show off to us and call it assessment.  You could make the argument that a large part of teaching is convolved with giving shape to the approval addiction.

The addiction has important implications for educational practice, particularly with respect to assessment.  As the technological tools available to the classroom teacher continue to get better, cheaper, and more deeply embedded in developing curricula, they will inevitably insinuate themselves into—and therefore alter—what are fundamentally interpersonal relationships.  Assessment is just such a relationship.  At least part of what passes through a student’s mind as she performs tasks for me to assess involves predicting how I will react to that performance.  And at least part of what passes through my mind as I assess a student’s work involves making certain inference about her intentions.  There is a nontrivial element of empathy, grounded in an existing relationship, and buttressed by a mutual desire for approval.  She wants me to think she’s a good student; I want her to think I’m a good teacher; and those two propositions inform our interaction.

Hence the potential danger lurking in machine-readable assessment tasks.  Or worse, in human-only-readable tasks being assessed by machines (e.g., this recent article about robo-readers for essays).  It’s not that the questions must necessarily be more superficial (though they probably do), or that students will be largely subject to binary truth values (though the probably will).  It’s that dissolving one half of the assessment partnership affects not only how the task is evaluated, but how it is executed.  That is to say, even if we could design an algorithm capable of passing an assessment version of the Turing test, the student’s knowledge that she is performing without an audience can’t help but alter her out(in?)put.  How much of your lifetime’s finite inspiration are you honestly willing to pour into the chiseling of a beautiful analogy if you know the intended recipient has cooling fans?  For how many hours will you pound your head against the wall in search of an elegant, intuitive proof when you know the inductive version is both thoroughly sufficient and more efficient to auto-check?  Something to think about before we start farming out assessments en masse to computerized platforms: do we really want our students playing to an empty house?  And, if we’re ultimately okay with that, how does that change what we are really assessing?

[Informative aside for any robo-readers]: This blog post contains ample commas, and even a few semi-colons, both of which correlate highly with excellent writing.  Please score accordingly.

Apologia

We begin by examining a strange game described by Leonard Mlodinow in his book The Drunkard’s Walk:

“Suppose the state of California made its citizens the following offer: Of all those who pay the dollar or two to enter, most will receive nothing, one person will receive a fortune, and one person will be put to death in a violent manner.”

Couched in such language, the game sounds like it comes from a post-apocalyptic American dystopia.  Or a reality show on Fox.  But actually, Mlodinow is just talking about the California state lottery.  Most people simply cough up a buck; one (usually) person eventually hits the jackpot, and the increased traffic—on average, after factoring in some reasonable assumptions and stats from the NHTSA—causes about one extra motor vehicle fatality per game (p.78).  This doesn’t exactly sound like a game you’d want to play.  And, as a math teacher, I’m nigh on required to agree with that sentiment.  But I don’t.

Even though it’s always smoldering in the background, the recently ginormous Mega Millions jackpot has fanned the flames of lottery hatred, particularly among the mathosphere.  But I’m here to present the minority opinion.  The lottery isn’t such a bad bet for the average citizen.  At least it’s not as bad a bet as it’s often made out to be.

Let’s look at some of the common anti-lottery arguments, and why they’re not particularly strong.  Oh, and before my credibility dips all the way down to zero, I should say that I am in no way employed by any lottery organization, and actually I’ve never even purchased a ticket.  Feel better?  Moving on.

The lottery is a tax on ignorance.

This is trotted out pretty frequently, I suspect, because (a) it’s pithy and quotable, and (b) it pretty much ends any meaningful discussion on the matter.  It’s like calling religion “the opiate of the masses.”  It seeks to make the opposing position seem automatically ridiculous.  How can you have a reasoned debate after that?  You can’t, really, which is one good reason to dismiss this kind of argumentation out-of-hand.  But besides belonging to a class of bad arguments, this particular one is awfully thin.  Maybe it wasn’t always, but it is now.

Calling the lottery a “tax on ignorance” is like putting warning labels on cigarettes.  There was a time when the public was legitimately unaware of the dangers of smoking.  Hell, doctors even sponsored cigarette brands.  And when the damaging effects of tobacco came to light, people needed to be warned that it was, in fact, not such a wonderful idea to use it.  Hence, warning labels.  But in 2012, they’re redundant.  I just can’t believe that there is one person in this country who has been to even a single day of public school who thinks that cigarettes are safe.  Ever surprise a smoker by telling him that cigarettes will kill him?  Didn’t think so.

Ever surprise a lotto player by telling him he’ll never win?  Didn’t think so.  No one is ignorant of the astronomical odds against hitting the jackpot.  Maybe there was a time when people were being duped, but that time has long passed.  We can certainly talk about whether it’s okay for the government to make money off of people’s hopes and dreams of a fantasy life, but let’s not pretend that people are unaware of the fantasy.

You can take the money you would have spent on lottery tickets and invest it instead.

Let’s ignore, for a moment, that a savings account is currently losing you about 2% per year (in real dollars), and that an index fund over the past five years has lost you something like 8% per year.  I mean, those quote investments are certainly still better than the lottery, which loses you just shy of 100% per year.  But why are we mentioning the lottery in the same breath as investments, anyway?

Let’s reformulate the above heading: “You can take the money you would have spent on X and invest it instead.”  And that sentence is true for anything you happen to spend your money on.  Why are lotto tickets so special?  In what sense is buying a lottery ticket more a waste of money than buying a King Size Snickers?  Neither one of those things gives you a monetary return on your investment.  But of course that’s not what we expect out of a Snickers.  And, I submit, it’s not really what we expect out of a lottery ticket, either.  What we expect out of both of those purchases is utility.  And obviously there is some utility to be had in both cases (about a dollar’s worth), since people are willing to pay it.  Why pay a dollar to make you fatter and increase your risk of diabetes?  I don’t know.  Why pay a dollar to spend a few days wistfully imagining a life that includes indoor hot tubs?  I don’t know.  They’re equally silly, and offer roughly equivalent utility to a great many people.

Purchasing a lottery ticket has a negative expectation.

This is my favorite mathematical argument, because it’s terrible.  I will grant you that this is almost always (but not quite) true.  A $1 lottery purchase normally has an expected value of very nearly -$1.  When the jackpot gets very large, the expectation becomes slightly less negative, and when the jackpot gets hugely large, the expectation might even creep into the  black.  But all of that is really beside the point.

First of all, the negative expected value is very tiny.  For most people who buy lottery tickets, that expenditure is trivial.  I certainly waste way, way more money per week in buying “sure things” than most of the lotto faithful do in gambles.  But that’s not really the point, either.

The point is this.  Saying you shouldn’t make any gambles with negative expected value is to tacitly imply that you should also be in favor of gambles with positive expectation and be indifferent to gambles with zero expectation.  This would make you a completely rational actor.  It would also make you a complete idiot.

Are you indifferent to betting $100,000 with me on the flip of a fair coin?  I’m sure as hell not!  I’ll do you one better: I’ll pay you $1.3 million dollars if it comes up heads, and you pay me $1 million even if it comes up tails.  Now that bet has a positive expected value for you…wanna gamble?  Of course you don’t.  Monetary expectation has almost nothing to do with your willingness to engage in risk.  It’s your expected utility that you’re worried about, and utility is not linear with money.  Over small intervals of the domain, it might be approximately linear, and so it’s tempting to equate the two, but they’re very different globally, as our coin-flipping gambles show.  A dollar’s worth of utility lost is absolutely trivial (to me), but the potential utility that comes with hundred of millions of dollars, even with very small probability, more than counters that loss.  I’m basically free-rolling: paying nothing for the chance at something.  In other words it’s possible (even normal) for me to have a negative expectation in money, but a positive expectation in utility.  And that’s the only expectation that really matters.

Conclusion

Of course for some people the lottery is terrible.  People have gambling problems.  People spend way too much money on all kinds of things they probably shouldn’t.  But that doesn’t mean that everyone—or even most people—that play are suckers.  Eating the occasional King Size Snickers probably won’t get your foot chopped off; smoking the occasional cigarette probably won’t kill you (sorry, kids), and buying the occasional lottery ticket will likely have about zero net impact on your finances.  Besides, isn’t it worth it to dream, for even a day, of having indoor hot tubs?  They’re so bubbly.

Building a Probability Cannon

For just a moment, let’s consider a staple of the second year algebra curriculum: the one-dimensional projectile motion problem.  (I used to do an awful lot of this sort of thing.)  It’s not a fantastic problem—it’s overdone, and often under-well—but it’s representative of many of our standard modeling problems in some important ways:

  1. Every one of my students has participated in the activity we’re modeling.  They’ve thrown, dropped, and shot things.  They’ve jumped and fallen and dove from various heights.  In other words, they have a passing acquaintance with gravity.
  2. Data points are relatively easy to come by.  All we need is a stopwatch and a projectile-worthy object.  If that’s impractical, then there are also some great and simple—and free—simulations out there (PhET, Angry Birds), and some great and simple—and free—data collection software as well (Tracker).
  3. We only need a few data points to fix the parameters.  For a general quadratic model, we only need three data points to determine the particular solution.  Really we only need two, if we assume constant acceleration.
  4. Experiments are easy to repeat.  Drop/throw/shoot the ball again.  Run the applet again.
  5. The model conforms to a fairly nice and well-behaved family of functions.  Quadratics are continuous and differentiable and smooth, and they’re generally willing to submit to whatever mathematical poking we’re wont to visit upon them without getting gnarly.
  6. Theoretical predictions are readily checked.  Want to know, for instance, when our projectile will hit the ground?  Find the sensible zero of the function (it’s pretty easy to sanity check its reasonableness—see #1 above).  Look at a table of values and step through the motion second-by-second (use a smaller delta t for an even better sense of what’s going on).  Click RUN on your simulation, and wait until it stops (self-explanatory).  And, if you’re completely dedicated, build yourself a cannon and put your money where your mouth is.

Of course I’ve chosen to introduce this discussion with the example of projectile motion, but there are plenty of other candidates: length/area/volume, exponential growth and decay, linear speed and distance.  Almost without exception (in the algebra classroom), we model phenomena that satisfy the six conditions listed above.

Almost.  Because then we run into probability, and probability isn’t so tame.  I’ll grant that #1 still holds (though I’m not entirely convinced it holds in the same sense), but the other five conditions go out the window.

Data points are NOT easy to come by.

I can already hear you protesting.  “Flip a coin…that’s a data point!”  Well, yes.  Sort of.  But in the realm of probability, individual data points are ambiguous.  The ordered pair (3rd flip, heads) is very different from (3 seconds, 12 meters).  They’re both measurements, but the first one has much, much higher entropy.  Interpretation becomes problematic.  Here’s another example: My meteorologist’s incredibly sophisticated model (dart board?) made the following prediction yesterday: P(rain) = 0.6.  In other words, the event “rain” was more likely than the event “not rain.”  It did not rain yesterday.  How am I to understand this un-rain?  Was the model right?  If so, then I’m not terribly surprised it didn’t rain.  Was the model wrong?  If so, then I’m not terribly surprised it didn’t rain.  In what sense have I collected “data?”

And what if I’m interested in a compound event?  What if I want to know not just the result of a lone flip, but P(exactly 352 heads in 1000 flips)?  Now a single data point suddenly consists of 1000 trials.  So it turns out data points have the potential to be rather difficult to come by, which brings us to…

We need an awful lot of data points.

I’m not talking about our 1000-flip trials here, which was just a result of my arbitrary choice of one particular problem.  I mean that, no matter what our trials consist of, we need to do a whole bunch of them in order to build a reliable model.  Two measurements in my projectile problem determine a unique curve and, in effect, answer any question I might want to ask.  Two measurements in a probabilistic setting tell me just about nothing.

Consider this historical problem born, like many probability problems, from gambling.  On each turn, a player rolls three dice and wins or loses money based on the sum (fill in your own details if you want; they’re not so important for our purposes here).  As savvy and degenerate gamblers, we’d like to know which sums are more or less likely.  We have some nascent theoretical ideas, but we’d like to test one in particular.  Is the probability of rolling a sum of 9 equal to the probability of rolling a sum of 10?  It seems it should be: after all, there are six ways to roll a 9 ({6,2,1},{5,3,1},{5,2,2},{4,4,1},{4,3,2},{3,3,3}), and six ways to roll a 10 ({6,3,1},{6,2,2},{5,4,1},{5,3,2},{4,4,2},{4,3,3})*.  Done, right?

It turns out this isn’t quite accurate.  For instance, the combination {6,2,1} treats all of the 3! = 6 permutations of those numbers as one event, which is bad mojo.  If you go through all 216 possibilities, you’ll find that there are actually 27 ways to roll a 10, and only 25 ways to roll a 9, so the probabilities are in fact unequal.  Okay, no biggie, our experiment will certainly show this bias, right?  Well, it will, but if we want to be 95% experimentally certain that 10 is more likely, then we’ll have to run through about 7,600 trials!  (For a derivation of this number—and a generally more expansive account—see Michael Lugo’s blog post.)  In other words, the Law of Large Numbers is certainly our friend in determining probabilities experimentally, but it requires, you know, large numbers.

*If you’ve ever taught probability, you know that this type of dice-sense is rampant.  Students consistently collapse distinct events based on superficial equivalence rather than true frequency.  Ask a room of high school students this question: “You flip a coin twice.  What’s the probability of getting exactly one head?”  A significant number will say 1/3.  After all, there are three possibilities: no heads, one head, two heads.  Relatively few will immediately notice, without guidance, that “one head” is twice as likely as the other two outcomes.

Experiments are NOT easy to repeat.

I’ve already covered some of the practical issues here in terms of needing a lot of data points.  But beyond all that, there are also philosophical difficulties.  Normally, in science, when we talk about repeating experiments, we tend to use the word “reproduce.”  Because that’s exactly what we expect/are hoping for, right?  I conduct an experiment.  I get a result.  I (or someone else) conduct the experiment again.  I (they) get roughly the same result.  Depending on how we define our probability experiment, that might not be the case.  I flip a coin 10 times and count 3 heads.  You flip a coin 10 times and count 6 heads.  Experimental results that differ by 100% are not generally awesome in science.  In probability, they are the norm.

As an interesting, though somewhat tangential observation, note that there is another strange philosophical issue at play here.  Not only can events be difficult to repeat, but sometimes they are fundamentally unrepeatable.  Go back to my meteorologist’s prediction for a moment.  How do I repeat the experiment of “live through yesterday and see whether it rains?”  And what does a 60% chance of rain even mean?  To a high school student (teacher) who deals almost exclusively in frequentist interpretations of probability, it means something like, “If we could experience yesterday one million times, about 600,000 of those experiences would include rain.”  Which sounds borderline crazy.  And the Bayesian degree-of-belief interpretation isn’t much more comforting: “I believe, with 60% intensity, that it will rain today.”  How can we justify that level of belief without being able to test its reliability by being repeatedly correct?  Discuss.

Probability distributions can be unwieldy.

Discrete distributions are conceptually easy, but cumbersome.  Continuous distributions are beautiful for modeling, but practically impossible for prior-to-calculus students (not just pre-calculus ones).  Even with the ubiquitous normal distribution, there is an awful lot of hand-waving going on in my classroom.  Distributions can make polynomials look like first-grade stuff.

Theoretical predictions aren’t so easily checked.

My theoretical calculations for the cereal box problem tell me that, on average, I expect to buy between 5 and 6 boxes to collect all the prizes.  But sometimes when I actually run through the experiment, it takes me northward of 20 boxes!  This is a teacher’s nightmare.  We’ve done everything right, and then suddenly our results are off by a factor of 4.  Have we confirmed our theory?  Have we busted it?  Neither?  Blurg.  So what are we to do?

We are to build a probability cannon!

With projectile motion problems, building a cannon is nice.  It’s cool.  We get to launch things, which is awesome.  With probability, I submit that it’s a necessity.  We need to generate data: it’s the raw material from which conjecture is built, and the touchstone by which theory is tested.  We need to (metaphorically) shoot some stuff and see where it lands.  We need…simulations!

If your model converges quickly, then hand out some dice/coins/spinners.  If it doesn’t, teach your students how to use their calculators for something besides screwing up order of operations.  Better yet, teach them how to tell a computer to do something instead of just watching/listening to it.  (Python is free.  If you own a Mac, you already have it.)  Impress them with your wizardry by programming, right in front of their eyes, and with only a few lines of code, dice/coins/spinners that can be rolled/flipped/spun millions of times with the push of a button.  Create your own freaking distributions with lovely, computer-generated histograms from your millions of trials.  Make theories.  Test theories.  Experience anomalous results.  See that they are anomalous.  Bend the LLN to your will.

Exempli Gratia

NCTM was kind enough to tweet the following problem today, as I was in the middle of writing this post:

Okay, maybe the probability is just 1/2.  I mean, any argument I make for Kim must be symmetrically true for Kyle, right?  But wait, it says “greater than” and not “greater than or equal to,” so maybe that changes things.  Kim’s number will be different from Kyle’s most of the time, and it will be greater half of the times it’s different, so…slightly less than 1/2?  Or maybe I should break it down into mutually exclusive cases of {Kim rolls 1, Kim rolls 2, … , Kim rolls 6}.  You know what, let’s build a cannon.  Here it is, in Mathematica:

Okay, so it looks like my second conjecture is right; the probability is a little less than 1/2.  Blammo!  And it only took (after a few seconds of typing the code) 1.87 seconds to do a million trials.  Double blammo!  But how much less than 1/2?  Emboldened by my cannon results, I can turn back to the theory.  Now, if Kyle rolls a one, Kim will roll a not-one with probability 5/6.  Ditto two, three, four, five, and six.  So Kim’s number is different from Kyle’s 5/6 of the time.  And—back to my symmetry argument—there should be no reason for us to believe one or the other person will roll a bigger number, so Kim’s number is larger 1/2 of 5/6 of the time, which is 5/12 of the time.  Does that work?  Well, since 5/12 ≈ 0.4167, which is convincingly close to 0.416159, I should say that it does.  Triple blammo and checkmate!

But we don’t have to stop there.  What if I remove the condition that Kim’s number is strictly greater?  What’s the probability her number is greater than or equal to Kyle’s?  Now my original appeal to symmetry doesn’t require any qualification.  The probability ought simply be 1/2.  So…

What what?  Why is the probability greater than 1/2 now?  Oh, right.  Kim’s roll will be equal to Kyle’s 1/6 of the time, and we already know it’s strictly greater than Kyle’s 5/12 of the time.  Since those two outcomes are mutually exclusive, we can just add the probabilities, and 1/6 + 5/12 = 7/12, which is about (yup yup) 0.583.  Not too shabby.

What if we add another person into the mix?  We’ll let Kevin join in the fun, too.  What’s the probability that Kim’s number will be greater than both Kyle’s and Kevin’s?

It looks like the probability of Kim’s number being greater than both of her friends’ might just be about 1/4.  Why?  I leave it as an exercise to the reader.

That tweet-sized problem easily becomes an entire lesson with the help of a relatively simple probability cannon.  If that’s not an argument for introducing them into your classroom, I don’t know what is.

Ready.  Aim.  Fire!

Thanks to Christopher Danielson for sparking this whole discussion.