After weathering weeks of attacks from conservative pundits, Nate Silver earned vindication on Election Day. Not only did he correctly peg Obama’s resounding victory – putting his chances of winning at more than 90% when many pundits deemed the race “too close to call” – but he also correctly predicted all the states, after getting only one wrong in 2008.
A prevailing narrative following the election has been the triumph of the quants over the pundits — and data over intuition — with Silver hailed as the face of the former. As we discussed in our election report card, it did not take much to outperform the pundit community, which turned in a woeful performance this election season. This raises the question: might Silver be getting too much credit because he is being compared to the wrong benchmark?
In fact, a better benchmark for prediction models such as Silver’s would be aggregators such as prediction markets (e.g. Betfair, Pinnacle, Intrade) and poll averages (e.g. RealClearPolitics). These sites harness the so-called “wisdom of crowds” and thus theoretically offer the best window into the consensus view (we say theoretically because prediction markets can lack liquidity and polls can lack incentives). At PunditTracker, we believe the consensus view is the most appropriate benchmark for all predictions, so let’s look at how Silver stacked up against one of the prediction markets.
We collected projections from FiveThirtyEight (Silver’s blog) and Intrade on the day prior to the election, for both the Presidential race (50 states and the District of Columbia) and the 33 Senate races.
Both Silver and Intrade had Romney winning Florida (Silver’s projection changed after we collected the data) and incorrectly predicted Republicans would win the Montana and North Dakota Senate seats.
Now, since Silver and Intrade were making probabilistic forecasts, we can award them more or less credit depending on how much conviction they had in their calls. Using this line of thought, Silver should get more credit for assigning a higher probability to Obama winning (86%) than Intrade did (67%). A simple tool for scoring forecasts by conviction is the Brier score: the average squared difference between the forecast probability and the actual outcome, where lower is better. After adjusting for the longshot bias in prediction markets, Silver achieved a better Brier score in the Presidential races (Florida accounted for the bulk of the difference), while Intrade barely edged him out in the Senate races. Overall, Silver turned in the superior performance in 2012 when judged using this methodology.
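To make the scoring concrete, here is a minimal Python sketch of a Brier score calculation. It is an illustration only, not the calculation behind the results above: the function name `brier_score` is our own, no longshot-bias adjustment is applied, and the only inputs are the two headline probabilities quoted above (86% and 67% for an Obama win) rather than the full set of state and Senate forecasts.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes.

    forecasts: probabilities (0.0 to 1.0) assigned to the event happening.
    outcomes:  1 if the event happened, 0 if it did not.
    Lower is better; 0.0 is a perfect score, and always guessing 50% earns 0.25.
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and equal in length")
    squared_errors = [(p - o) ** 2 for p, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Illustration using only the headline probabilities quoted in the text.
# Obama won, so the outcome is 1 for both forecasters.
silver_headline = brier_score([0.86], [1])
intrade_headline = brier_score([0.67], [1])

print(f"Silver:  {silver_headline:.4f}")
print(f"Intrade: {intrade_headline:.4f}")
```

On this single call, the more confident correct forecast earns the lower (better) score. The same function also penalizes confident misses heavily, which is why a wrong call on a state like Florida can account for much of the gap between two forecasters.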
A paper by David Rothschild that compared the projections of Silver and Intrade during the 2008 election arrived at the following conclusion:
<< [In] the 2008 election cycle FiveThirtyEight’s debiased poll-based forecasts were, on average, slightly more accurate than Intrade’s raw prediction market-based prices. But when prediction markets are properly debiased, they are more accurate and contain more information than debiased polls; this advantage is most significant for forecasts made early in the cycle and in not-certain races (i.e., the races typically of most interest). >>
When viewed relative to a more appropriate benchmark, Silver’s projections for the past two elections were very solid, and he deserves praise for this achievement. However, the notion that he was a lone wolf with his high-conviction projections of Obama victories in both 2008 and 2012 is entirely misplaced. This was the consensus view, properly defined. Granted, the headline “Nate Silver got one more state right than the crowd did” is not as sexy as “Nate Silver Nails It: 50 for 50” (Drudge) or dubbing him “America’s Chief Wizard” (Gawker). But it is precisely sensational headlines such as these that create divergences between pundit reputations and pundit track records.
Well-functioning markets are difficult to beat; just ask active managers in the mutual fund industry. And should the political betting markets become more liquid, they will only get better over time. Betfair and Pinnacle, which many believe to have greater liquidity, gave Obama significantly higher odds of winning than Intrade did.
We should add that at least one prominent person agrees with this assessment. That person is Nate Silver. Consider this excerpt from his book, “The Signal and the Noise”:
<< Could FiveThirtyEight and other good political forecasters beat Intrade if it were fully legal in the US and its trading volumes were an order of magnitude or two higher? I’d think it would be difficult. >>
A robust benchmark is necessary to hold pundits accountable. We believe “the crowd” offers the best such benchmark, and this idea is the linchpin of PunditTracker’s scoring system. By having our users vote on the likelihood of each prediction, our goal is to establish a consensus-based benchmark that will only become more efficient over time (for those new to the site, there is an extra incentive to vote). With your help, we can put an end to inflated pundit reputations and identify those who are truly adding value.