Balkinization  

Saturday, December 12, 2009

Ian Ayres

Crosspost from Freakonomics:

I eagerly awaited and quickly devoured SuperFreakonomics when it appeared a few weeks ago. And while many reviewers are focusing on the substance of the book, I’m struck by two shifts in the Levitt/Dubner method.

First, SuperFreakonomics is more of an effort at problem solving. The original Freakonomics book showed how creative econometrics applied to historic data could be used to uncover the “hidden” causes of observed behavior. To be sure, SuperFreakonomics retains many examples of the hidden-side-of-everything data mining. But the new book is much more of a solutions book. It uses economic thinking to generate new ideas to solve really big problems. Levitt and Dubner are admirably leveraging the success of the first book to try to make the world a better place. They are on the lookout for concrete suggestions to reduce the lives lost from hurricanes, hospital infections, global warming, automobile accidents and even walking drunk.

In the original book, number crunching itself was the solution. Forensic number crunching could help identify whether Sumo wrestlers had thrown a match or whether Chicago teachers were cheating on test scores. In the new book, number crunching is instead used to verify that a particular solution (such as hand-washing or ocean cooling) is likely to work.

The Randomization Lens

The second methodological shift is subtler. The first book focused on historical data. For example, a core story of the original book looked at data on crime and abortion. In a truly inspired moment, Levitt (with his coauthor John Donohue) was able to show that legalizing abortion reduced the amount of crime — 18 years later. Mining historic data can produce truly startling results.

But a higher proportion of the new book is devoted to studies that use randomized field experiments to find out what causes what. If you want to know whether offering donors a two-for-one matching grant produces more charitable donations than a one-for-one grant, you randomly assign potential donors to receive one of these two solicitations and then look to see whether the two groups give different amounts.
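
That comparison is the whole trick: because the two solicitations were assigned by chance, the groups are alike on average, and any gap in giving can be credited to the match rate itself. A minimal sketch of the analysis, with simulated donors and invented response rates rather than anything from the book, might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated donor pool; every number here is invented for illustration.
n = 10_000

# Randomly assign each potential donor to a 1:1 or 2:1 matching-grant letter.
treatment = rng.integers(0, 2, size=n)   # 0 = one-for-one, 1 = two-for-one

# Hypothetical response: most donors give nothing; the 2:1 group is assumed
# to respond a bit more often and a bit more generously.
gives = rng.random(n) < np.where(treatment == 1, 0.022, 0.020)
gift = gives * rng.gamma(shape=2.0, scale=np.where(treatment == 1, 25.0, 22.0))

# Because assignment was random, a simple difference in group means
# estimates the causal effect of the richer match.
mean_1to1 = gift[treatment == 0].mean()
mean_2to1 = gift[treatment == 1].mean()
diff = mean_2to1 - mean_1to1

# Standard error of the difference in means (unequal variances).
se = np.sqrt(gift[treatment == 0].var(ddof=1) / (treatment == 0).sum()
             + gift[treatment == 1].var(ddof=1) / (treatment == 1).sum())

print(f"mean gift, 1:1 match : {mean_1to1:.2f}")
print(f"mean gift, 2:1 match : {mean_2to1:.2f}")
print(f"difference +/- 2*SE  : {diff:.2f} +/- {2 * se:.2f}")
```

Everything causal here rides on the coin flip that builds `treatment`; without it, the difference in means would be just another correlation.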

One sign of the shift toward randomization is the prominence of John List and his rise to fame in the economics profession. John is one of the great field experimenters in economics today. He’s the kind of guy who goes to baseball card shows and at random treats one set of card dealers differently from another and then sees whether they offer different prices. (You can read an excerpt of the book’s discussion of List here).

SuperFreakonomics not only relates the results of more randomized experiments than Freakonomics did, but also explains how the idea of randomized experiments is leading statisticians to think more clearly about how to use regression analysis to test for causal effects with historic data. There is a new zeitgeist in the way economists think about running regressions. Today, statistical economists explicitly think of their regressions in terms of randomized experiments. They think of the variable of interest as the “treatment” and ask themselves what kind of assumptions they need to make or what kind of statistical procedures they need to run on the historic data to emulate a randomized study. This new way of thinking is very much on display in the truly excellent (but technically demanding) book, Mostly Harmless Econometrics: An Empiricist’s Companion, by Joshua Angrist and Jörn-Steffen Pischke. (I praised the book in a previous post because it “captures the feeling of how to go about trying to attack an empirical question….”). For example, Angrist and Pischke show that the regression-discontinuity design (which I’ll say more about in a later post) provides causal inference from historic correlation because it emulates randomized assignment of a treatment to otherwise similar subjects.
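
To make the regression-discontinuity idea concrete, here is a bare-bones sketch in code. The data, cutoff, window, and effect size are all invented; the point is only that, close to the cutoff, landing on one side or the other is treated as good as random, and the estimated jump in the outcome plays the role of the treatment effect:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Simulated running variable (say, a test score) with a policy cutoff at 0.
# All numbers are invented; this only illustrates the mechanics.
n = 5_000
score = rng.uniform(-50, 50, size=n)       # running variable, centered at the cutoff
treated = (score >= 0).astype(float)       # treatment switches on at the cutoff
outcome = 10 + 0.05 * score + 2.0 * treated + rng.normal(0, 3, size=n)

# Keep only observations in a narrow window around the cutoff, where
# falling just above or just below is plausibly as good as random.
window = np.abs(score) <= 10
X = sm.add_constant(np.column_stack([treated[window], score[window]]))
fit = sm.OLS(outcome[window], X).fit()

# The coefficient on `treated` is the estimated jump at the cutoff --
# the regression-discontinuity estimate of the causal effect.
print(fit.params)   # [intercept, treatment jump, slope in the running variable]
```

A more careful version would let the slope differ on each side of the cutoff and vary the window width, but the randomization logic is the same.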

What Economists Would Really Like To Do

SuperFreakonomics very much reflects this new randomization lens as a way of thinking about data-mining. Without off-putting jargon, Levitt and Dubner explain how regressions can give you quasi-experimental results. Indeed, with help from my Kindle, I found three parallel descriptions that turn on making the randomization analogy. For example, listen to how they describe testing for sex discrimination on the job:

Economists do the best they can by assembling data and using complex statistical techniques to tease out the reasons why women earn less than men. The fundamental difficulty, however, is that men and women differ in so many ways. What an economist would really like to do is perform an experiment, something like this: take a bunch of women and clone male versions of them; do the reverse for a bunch of men; now sit back and watch. By measuring the labor outcomes of each gender group against their clones, you could likely gain some real insights. Or, if cloning weren’t an option, you could take a bunch of women, randomly select half of them, and magically switch their gender to male, leaving everything else about them the same, and do the opposite with a bunch of men. Unfortunately, economists aren’t allowed to conduct such experiments. (Yet.)
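
The “complex statistical techniques” in that passage usually boil down to regressing earnings on a gender indicator plus whatever controls the data happen to contain. A toy example, with simulated workers and invented coefficients, shows both the mechanics and the problem the cloning fantasy is meant to dramatize: anything relevant that the econometrician cannot observe gets absorbed into the gender coefficient.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)

# Simulated workers; every coefficient below is invented for illustration.
n = 20_000
female = rng.integers(0, 2, size=n).astype(float)
experience = rng.normal(10, 4, size=n)

# Something the econometrician never observes, and which differs by gender.
unobserved = rng.normal(0, 1, size=n) - 0.3 * female

log_wage = (2.5 + 0.03 * experience + 0.10 * unobserved
            - 0.05 * female + rng.normal(0, 0.3, size=n))

# The standard approach: regress earnings on a gender indicator plus the
# observable controls, and read the remaining gap off that coefficient.
X = sm.add_constant(np.column_stack([female, experience]))
fit = sm.OLS(log_wage, X).fit()

# Because `unobserved` is left out, its gender difference leaks into the
# female coefficient: the estimate lands near -0.05 + 0.10 * (-0.3) = -0.08,
# not the direct effect of -0.05 that a true experiment would recover.
print(fit.params[1])
```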

They go on to describe how, in the absence of randomized data, some (limited) insight might be gleaned by looking at the historic experience of transgendered people — before and after sex reassignment surgery. They take a similar approach when tackling the question of testing physician quality:

What you’d really like to do is run a randomized, controlled trial so that when patients arrive they are randomly assigned to a doctor, even if that doctor is overwhelmed with other patients or not well equipped to handle a particular ailment. But we are dealing with one set of real, live human beings who are trying to keep another set of real, live human beings from dying, so this kind of experiment isn’t going to happen, and for good reason.

If we can’t do a true randomization, and if simply looking at patient outcomes in the raw data will be misleading, what’s the best way to measure doctor skill? Thanks to the nature of the emergency room, there is another sort of de facto, accidental randomization that can lead us to the truth.

The “next in line” queue at some emergency rooms provides quasi-random assignment and allows researchers to emulate the results of a randomized test.
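
A stylized version of that design, with invented numbers and the queue reduced to a simple rotation, shows why as-good-as-random assignment lets raw differences in survival rates stand in for differences in doctor skill:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical ER with three doctors of differing (unknown) skill.
# All numbers are invented; only the logic of the design matters.
true_skill = {"A": 0.00, "B": -0.02, "C": 0.03}   # effect on survival probability
doctors = list(true_skill)

n = 30_000
severity = rng.random(n)    # how sick each arriving patient is

# "Next in line" assignment: patients are handed to doctors in rotation as
# they arrive, so which doctor you get has nothing to do with how sick you are.
assigned = np.array([doctors[i % len(doctors)] for i in range(n)])

# Survival depends on severity, the doctor's skill, and luck.
p_survive = 0.95 - 0.30 * severity + np.array([true_skill[d] for d in assigned])
survived = rng.random(n) < p_survive

# Because assignment is quasi-random, raw survival rates by doctor estimate
# differences in skill; expect B lowest, C highest.
for d in doctors:
    print(d, round(survived[assigned == d].mean(), 3))
```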

The magic “really like to do” words appear a third time when Levitt and Dubner talk about testing whether more incarceration would really lower the crime rate:

To answer this question with some kind of scientific certainty, what you’d really like to do is conduct an experiment. Pretend you could randomly select a group of states and command each of them to release 10,000 prisoners. At the same time, you could randomly select a different group of states and have them lock up 10,000 people, misdemeanor offenders perhaps, who otherwise wouldn’t have gone to prison. Now sit back, wait a few years, and measure the crime rate in those two sets of states. Voilà! You’ve just run the kind of randomized, controlled experiment that lets you determine the relationship between variables.

Unfortunately, the governors of those random states probably wouldn’t take too kindly to your experiment. Nor would the people you sent to prison in some states or the next-door neighbors of the prisoners you freed in others. So your chances of actually conducting this experiment are zero.

That’s why researchers often rely on what is known as a natural experiment, a set of conditions that mimic the experiment you want to conduct but, for whatever reason, cannot. In this instance, what you want is a radical change in the prison population of various states for reasons that have nothing to do with the amount of crime in those states. Happily, the American Civil Liberties Union was good enough to create just such an experiment.
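
The passage doesn’t spell out the estimation, but the simplest version of the comparison it describes is a difference-in-differences: look at how crime changed in the states whose prison populations were forced down, relative to how it changed everywhere else over the same period. A toy sketch with invented numbers:

```python
import numpy as np

rng = np.random.default_rng(4)

# Invented panel of state crime rates before and after the litigation-driven
# prison releases.  None of these figures come from the book or the research.
n_states = 40
forced_release = np.arange(n_states) < 10     # 10 "litigation" states

crime_before = rng.normal(500, 60, size=n_states)   # crimes per 100,000, say
common_trend = -15.0            # nationwide drift in crime, hits every state
release_effect = 25.0           # hypothesized bump in crime from forced releases
crime_after = (crime_before + common_trend
               + release_effect * forced_release
               + rng.normal(0, 20, size=n_states))

# Difference-in-differences: the change in the litigation states minus the
# change everywhere else.  The common trend cancels; what remains is the
# effect of the forced change in prison population.
change = crime_after - crime_before
did = change[forced_release].mean() - change[~forced_release].mean()
print(round(did, 1))    # lands near the invented effect of 25
```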

The methodological repetition across these examples is one of the book’s strengths. This is really the way that many empirical economists talk to themselves about testing. Regardless of the problem, we often now start with the same basic question.

One of the great early stories from SuperFreakonomics is the finding that “even after factoring in the deaths [of innocent bystanders from drunk driving], walking drunk leads to five times as many deaths per mile as driving drunk.” The substantive fact is not only surprising, but the story also metaphorically foreshadows the book’s new emphasis on experimental approaches. After all, what makes a drunkard’s walk so dangerous is that the drunkard lurches from side to side randomly.

