Monday, May 4, 2009

Oompa Loompa, Doompadee Doo

ESPN’s television broadcast the morning of the final Bulls-Celtics game listed the results of a website poll whereby web readers could pick from among four predictions for the game: Bulls in regulation, Bulls in overtime, Celtics in regulation, Celtics in overtime. I don’t recall the precise numbers, and the poll does not seem to be available any longer online, but something like 30% of respondents predicted the Bulls to win in OT and 21% predicted the Celtics in OT. So more than half of poll-takers thought Game 7 would go to overtime! (In the event, Boston won 109-99, in regulation time.)

Now, the raw proportion of all NBA games, across the land and across the years, that go to OT is probably no more than 5%. To the extent you know that two teams are evenly matched (and Chicago and Boston minus Garnett certainly are), you are on relatively safer ground predicting that the game will go to overtime. But I would think that even for these even-steven Bulls and Celtics, the most prudent answer to the question “Will the next game go to overtime?” is surely “No”. So over half of poll-takers got it wrong, no?


At first glance, this looks to me like an example of the representativeness bias, whereby, e.g., if you ask a poll respondent, “Who would you more likely find in a library: (i) a woman, or (ii) a middle-aged woman environmentalist with glasses?” many respondents will choose the latter because it sounds more quintessentially library-esque. But, of course, back in the real world, maybe 55% of library patrons are women, while about 3% are middle-aged bespectacled tree-huggettes. People don’t handle probabilities well. An overtime game between the Bulls and Celtics feels more emblematic of that series, so people feel, however wrongly, that it will recur.

Perhaps my point is weakened because 4 of the first 6 Bulls-Celtics games did in fact go to OT, so out of that sample, it’s actually safe to predict extra minutes. But sometimes crazy shit goes down, just by fluke. It’s hard to believe that I should update my probabilistic expectation of overtime from 5% to 50%. What do the numbers say? Consider the following Bayesian probability formula:

$$P(\text{OT} \mid \text{Bulls-Celtics}) = \frac{P(\text{Bulls-Celtics} \mid \text{OT}) \, P(\text{OT})}{P(\text{Bulls-Celtics})}$$

Such an update would require that the proportion of Bulls-Celtics games among all the overtime games played in 2008-09 be roughly ten times the proportion of Bulls-Celtics games among all the NBA games played this season. So let’s consider some data. (I obtained all these numbers from the wonderful basketball-reference.com.)

In 2008-09, there were 1230 regular-season games and 45 first-round playoff games, for a total of 1275 games, of which 74 (70 regular-season and 4 playoff), or 5.804%, were overtime games. Out of all those 1275 games, the Bulls and Celtics played 3 regular-season games and 7 playoff games, for a total of 10 games, which was 0.784% of the 1275 games. Out of the 74 overtime games, 4, or 5.405%, were Bulls-Celtics games. So, the answer to my equational riddle up above is 5.405% * 5.804% / 0.784%, which equals 40.0%. Hmmm, maybe I was too quick to criticize the masses. 40% is pretty close to 50%, so predicting an overtime game based on the available data is not such a stretch.

What if we make the analysis more precise by truncating the data set just prior to May 2nd, the date of Bulls-Celts game 7? This would put us in the position of a Bayesian observer filling out that ESPN poll just after game 6. This deletes only two games from our data set: Bulls-Celts game 7 and Heat-Hawks game 7. That leaves 1273 games in total and 9 Bulls-Celtics games, 4 of which went to overtime, so the Bayesian probability of overtime is now 44.4%! Wow! My initial supercilious dismissal of the silly plebes was way off. It seems that ESPN web users internalized the available observations quite well. 44.4% is still not high enough to predict that Game 7 will go to OT, but it's dang close.
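
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the calculation. The function name overtime_posterior and its parameter names are made up for illustration; the counts are the ones quoted in this post, and the truncated case assumes the overtime total stays at 74 because neither deleted game went to OT.

```python
from fractions import Fraction


def overtime_posterior(total_games, overtime_games, matchup_games, matchup_overtime_games):
    """Return P(OT | matchup) via Bayes' rule, computed from raw game counts."""
    p_ot = Fraction(overtime_games, total_games)          # P(OT)
    p_matchup = Fraction(matchup_games, total_games)      # P(matchup)
    # P(matchup | OT): share of all overtime games that involved this matchup
    p_matchup_given_ot = Fraction(matchup_overtime_games, overtime_games)
    return p_matchup_given_ot * p_ot / p_matchup


# Full data set: 1275 games, 74 in OT, 10 Bulls-Celtics games, 4 of them in OT.
full = overtime_posterior(1275, 74, 10, 4)

# Truncated just before May 2nd: drop Bulls-Celtics game 7 and Heat-Hawks game 7.
# Assumption: the OT count stays at 74, since neither dropped game went to OT.
truncated = overtime_posterior(1273, 74, 9, 4)

print(f"{float(full):.1%}")       # 40.0%
print(f"{float(truncated):.1%}")  # 44.4%
```

Notice that the league-wide totals cancel out of the formula: the answer is simply the fraction of Bulls-Celtics games that went to overtime, 4 out of 10 in the full data set and 4 out of 9 in the truncated one.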

Meanwhile, the Atlanta-Miami series also seems to indicate that the teams are evenly matched, but in a wildly different way. If Atlanta had won three games by huge margins and Miami had won three squeakers, then we might say that Atlanta is the better team and Miami has been touched by kind luck. But actually, the first six games were all settled by double-figure margins, and no game saw a lead change after the first quarter. We might say that the two teams are mediocre at best, and equally likely to suffer horrible mental holidays.

-------------------

UPDATE JUNE 17TH: I just became aware of this June 10th blog post from Northwestern University professor of economics Jeff Ely. Check out the two YouTube animated graphs, showing how the distribution of score differentials changes in the last 60 seconds of an NBA game, averaged over 12 years of data. Absolutely amazing stuff. The basic gist is that tie outcomes after 48 minutes are far, far more likely than 1-point victories or 2-point victories. You should also read the comments, which provide some speculative explanations for that data. I don't think this new information contradicts anything in my original post above.
