Monday, 15 May 2017

Correlate things ... because you can?

It's certainly possible to run -- and find! -- correlations between any number of things, but we all (hopefully) know that correlation does not imply causation, or even that one item has anything to do with the other. Sometimes correlated things co-occur in a meaningful way (they may not have a causal relationship, but rather share a common underlying cause), and other times correlations are just random.
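To see how easily "just random" correlations turn up, here is a minimal Python sketch. It is only an illustration under my own assumptions: the function names, the number of series, their length, and the seed are all made up for this example and have nothing to do with how Tyler Vigen's site works. It generates a pile of unrelated random series and then goes looking for the most strongly correlated pair.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_series=200, n_points=10, seed=0):
    """Search unrelated random series for the most correlated pair.

    All parameters are hypothetical choices for this illustration.
    """
    rng = random.Random(seed)
    # Every series is independent noise, so no pair is truly related.
    series = [
        [rng.gauss(0, 1) for _ in range(n_points)]
        for _ in range(n_series)
    ]
    # Compare every pair and keep the one with the largest |r|.
    best = max(
        combinations(range(n_series), 2),
        key=lambda pair: abs(pearson(series[pair[0]], series[pair[1]])),
    )
    return best, pearson(series[best[0]], series[best[1]])


if __name__ == "__main__":
    pair, r = strongest_chance_correlation()
    print(f"Series {pair[0]} and {pair[1]} correlate with r = {r:.2f}")
```

With 200 short series there are 19,900 pairs to compare, so a very strong-looking correlation between two of them is practically guaranteed even though every series is pure noise. The more pairs you check, the more impressive the best "finding" will look.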

Tyler Vigen has a blog -- and now a book -- that identifies all manner of spurious correlations. Y'know, like the correlation between Japanese passenger cars sold in the US and suicides by crashing motor vehicles. The correlation is real! The numbers don't lie! But we need to be careful about interpreting it.

Check it out! Which is your favorite?

http://www.tylervigen.com/spurious-correlations

You can even discover your own. For example, what correlates with sunlight in FL? Find the answers here: http://tylervigen.com/discover?type_select=sunlight&var_select=Florida&exclude_county=on

2 comments:

  1. This is a great point - just because things correlate does not necessarily mean that they are causal or related. It could be mere coincidence, as I imagine many of the things on Tyler's blog are! Nonetheless, it is fun to look at all of his graphs and try to imagine reasons why one thing might directly cause the other. For example, I laugh to myself as I imagine mothers feeding their children copious amounts of mozzarella cheese in order to increase the likelihood that their sons and daughters will eventually receive doctoral degrees in engineering! This site/book is a good reminder that we must always use caution in our interpretation of things and explore all the avenues that the data is giving us (or perhaps what it may not be giving us - what is left unseen and why).

    1. People will do ALL SORTS of crazy things because they misunderstand data. I remember hearing about two neighboring high schools and how parents attempted to transfer their kids from one to the other because of higher test scores, even though the difference between the performance of students at the two schools was not statistically significant.
