Perils in big data

Worthwhile FT piece by economist Tim Harford about the big data phenomenon.  Make no mistake -- the availability of terabytes of data on all things imaginable will eventually lead to exciting breakthroughs.  Harford's concern goes back to the old question of correlation versus causation.  Just because two variables appear together at a relatively high frequency, it does not mean that one causes the other.  And even if there were causation, it is hard to tell which way causality runs.

Another concern is that a correlation may be spurious because it is based on biased data.  Harford notes this is a particular problem for data sources such as Twitter, whose users are not representative of the overall population.  Money quote:
“Big data” has arrived, but big insights have not. The challenge now is to solve new problems and gain new answers – without making the same old statistical mistakes on a grander scale than ever.
MBA students at NC State are getting training on a wide range of tools for analyzing big data, but they also are getting the theoretical insight they will need to make sense of the results they obtain.  

