Balkinization  

Monday, October 13, 2008

Free Super-Crunching Software

Ian Ayres

Crosspost from Freakonomics:

I probably have an unhealthy attraction to the powers of Excel. I taught my daughter how to use it when she was 7. When I teach corporate finance, I try to make sure that my law students come away from the course knowing how to crunch in Excel.


It would be embarrassing to teach students how to use Microsoft Word in a law-school course; but one of the goals of my corporate finance course is to make sure that they can comfortably manipulate its numerical cousin.


A middle-school math teacher recently told me that there are some things you can do on a graphing calculator that you just can’t do in Excel. I’m pretty sure (like 99 percent sure) that this is not true. In fact, Microsoft has expanded the functionality of Excel so that it’s starting to invade the domain of statistical packages.



The just-published (shameless plug) paperback edition of Super Crunchers has a new chapter that describes several different free tools that make it easier and easier to crunch numbers.



1. Microsoft has a new data-mining add-in that lets you run all kinds of cool statistical procedures inside Excel. Taking a page from the Google playbook, Microsoft is just giving this add-in away (but it only works if you’ve purchased the Office 2007 version of Excel).


2. Google (taking a page from its own playbook) is giving away its Website Optimizer, which will let you run randomized experiments on your own web page.


Any webmaster who is not running randomized trials on different page content is making a serious mistake.


Here’s an explanatory video. I’ve used the Website Optimizer myself and it is a joy to use.


3. I’ve created and assembled links to a bunch of cool “prediction tools” that let you plug in a few numbers and predict how long you’ll live, predict your due date (if you’re pregnant), rate the quality of a book title, or even predict political or sporting contests.


One of the cool things about these tools is that they provide feedback on the precision of predictions that is easy to digest. When you see the results of an experiment like this one below, you have a pretty clear idea of not only the winner, but of how confident you should be in the results.


[Image: results of an experiment]


(As with all other statistical tests, you should not just blindly accept the p-values in the printout, but these graphics are still a huge leap forward.)
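For readers curious about what sits behind a "how confident should you be" readout, here is a minimal sketch of one standard approach, a two-proportion z-test on conversion counts from two page versions. It is not a description of how the Website Optimizer computes its results; the function name and the visitor counts are hypothetical, chosen only for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of page versions A and B.

    Returns (rate_a, rate_b, z, two_sided_p_value).
    Assumes the pooled rate is strictly between 0 and 1.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both versions convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


# Hypothetical counts: 1,000 visitors per version, 100 vs. 130 conversions.
rate_a, rate_b, z, p = two_proportion_z_test(100, 1000, 130, 1000)
print(f"A: {rate_a:.1%}  B: {rate_b:.1%}  z = {z:.2f}  p = {p:.3f}")
```

A small p-value says the gap between the two versions would be unusual if they truly converted at the same rate. It does not say the experiment was well designed, which is why the printout should not be accepted blindly.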


A fourth freebie is the open-source statistical package called “R.” While most members of the Freakonomics crowd tend to use Stata as their statistical package of choice (and businesses tend to run SAS or SPSS), R is the Linux of statistical software. It lets you do an awful lot for free.


Of course, having mastered the commands of Stata and SAS, I have poor incentives to learn the commands of a new (GNU) software package. And R is probably not kept up to speed on the cutting-edge empirical methods as quickly as the traditional packages. (I should disclose that SPSS and SAS have paid me handsomely to give Super Crunching talks, so I may not be the most objective observer.)


But then again, R has plenty of power to run the vast majority of statistical techniques. There is still a huge discrepancy between the techniques that are used by academics and those used in business.


In fact, here’s a Super Crunchers bleg: Can anyone identify an instance where a business has run an instrumental-variables regression?


The I.V. approach has been around for decades and is a standard (if misused) technique in hundreds, if not thousands, of academic articles. But provocatively, I’d almost bet that it has never yet been used by a corporation to help make a business decision. We’ll send some Freakonomics schwag to the first person who can prove me wrong.
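For anyone who has not met the technique, here is a minimal sketch of the idea on simulated data. It assumes a single regressor and a single instrument, in which case the instrumental-variables estimate reduces to cov(z, y) / cov(z, x), the same number two-stage least squares would give. The data-generating process, the helper names, and the true effect of 2.0 are all made up for illustration.

```python
import random


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / (n - 1)


def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    return covariance(x, y) / covariance(x, x)


def iv_slope(z, x, y):
    """IV estimate with one instrument z: cov(z, y) / cov(z, x)."""
    return covariance(z, y) / covariance(z, x)


def simulate(n=20000, true_effect=2.0, seed=0):
    """Hypothetical data: an unobserved confounder drives both x and y."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        instrument = rng.gauss(0, 1)   # shifts x, affects y only through x
        confounder = rng.gauss(0, 1)   # unobserved, affects both x and y
        xi = instrument + confounder + rng.gauss(0, 1)
        yi = true_effect * xi + 3.0 * confounder + rng.gauss(0, 1)
        z.append(instrument)
        x.append(xi)
        y.append(yi)
    return z, x, y


z, x, y = simulate()
print(f"OLS slope: {ols_slope(x, y):.2f}")   # biased upward by the confounder
print(f"IV slope:  {iv_slope(z, x, y):.2f}")  # should land near the true effect
```

The naive regression overstates the effect because the confounder moves x and y together. The instrument isolates the part of x that is unrelated to the confounder, which is what lets the IV estimate recover the true effect in this setup.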




Comments:

"And R is probably not kept up to speed on the cutting-edge empirical methods as quickly as the traditional packages."

Eh? I don't know how quickly Stata/SPSS/SAS get updated, but I do know that people write and give away their own packages for R constantly. For any gee whiz thing, there's some cutting-edge statistician who wants to do it in R and is willing to write a package to do so.
 

What Paul said. R's capabilities run circles around Stata's. And, in political science, at least, R is rapidly supplanting Stata (which supplanted SAS/SPSS a decade or more ago) as the one package you *must* know.

As for IVs, I did some consulting for a(n unnamed) big firm a decade or so ago where I used IV regression. Does that land me the swag?
 
