Pollsters got it wrong in the 2016 election. Now they want another shot.

There’s a new crowd of would-be oracles, determined not to replicate the mistakes of their predecessors.

by Rob Arthur
February 14, 2020

On the night of November 8, 2016, Charles Franklin, like millions of other Americans, watched the presidential election results roll in with what he described as “a sinking feeling.” But Franklin, a Wisconsin pollster and professor of law and public policy at Marquette University, wasn’t distressed on account of his personal political preferences; he had his reputation at stake. Just a week earlier, his own poll had shown Hillary Clinton up six points in Wisconsin. Instead, here she was, losing by seven-tenths of a point.

Franklin was on duty with ABC’s Decision Desk, one member of an expert behind-the-scenes team responsible for calling states for Clinton or Donald Trump as the tallies came in. As he watched the returns pile up until four in the morning, it became clear that his survey was off.

“Nobody wants to be wrong,” he says, looking back. “So in that sense it was very depressing.”

He wasn’t the only pollster to misread the election. According to RealClearPolitics, every single one of more than 30 polls in Wisconsin in the months leading to the election had Clinton winning the state by margins ranging from 2 to 16 points. And these errors had been amplified further because they were then used as fuel for computer algorithms that predicted an overall Clinton victory.

After Donald Trump had made his victory speech and the dust had settled, everyone started to admit their errors.

“It gutted me to realize I had been wrong,” wrote Natalie Jackson, a data scientist at the Huffington Post, which had given Clinton a 98% chance of winning.

The media, including many outlets whose own forecasts had given Clinton a strong likelihood of victory, started to decry the failure of prediction algorithms. Some critics were more circumspect than others, acknowledging that some forecasters had accurately described a Trump victory as merely improbable. But many cast doubt on the whole idea of predicting elections. Some even used the election as ammunition to attack the entire field of data science.

Yet nearly four years later, and with another contest looming, forecasters are beginning to issue early predictions for 2020. The backlash to 2016 hasn’t dissuaded them—in fact, there’s now a whole new crowd of would-be oracles, determined not to replicate the mistakes of their predecessors.

What went wrong

A cocktail of problems led to the polling misses of 2016. Some surveys failed to contact enough less-educated white voters, while some Trump supporters declined to admit which way they would be voting. Trump’s unconventional strategy also turned out more citizens in heavily Republican rural counties. Pollsters incorrectly assumed that these people would stay away as they had done in previous elections, which made Trump’s base appear smaller than it really was.

But while pollsters received the majority of the blame, perhaps more condemnation ought to have fallen on the forecasters, who turn pollsters’ data into predictions.

“Two major forecasters had Hillary Clinton at 99% to win,” says G. Elliott Morris, a data journalist at the Economist who works on election forecasting. “When she didn’t, a lot of them just blamed pollsters, because it’s easy for them.”

There were at least two major errors committed by some of the data scientists who helped design the prediction algorithms. First, they assumed that if the odds of being off by nearly seven points in Wisconsin were low, the odds of a comparable error in other critical states like Michigan and Pennsylvania were tiny. In fact, polling problems in one state were correlated with mistakes in other, similar states. Assuming that polls were entirely independent of each other—rather than reflecting the same reactions to the same issues—produced overconfidence in Clinton’s lead.
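
To see why the independence assumption matters, here is a minimal simulation sketch. It is not any forecaster's actual model: the three polling leads, the error sizes, and the function name `upset_probability` are all illustrative assumptions. Both scenarios give every state the same total polling error (a standard deviation of 5 points); the only difference is whether part of that error is shared across states.

```python
import random

def upset_probability(leads, shared_sd, state_sd, trials=20000, seed=0):
    """Estimate how often the trailing candidate wins every state in `leads`.

    Each simulated polling error is a shared component (one draw applied to
    all states) plus a state-specific component (one independent draw each).
    """
    rng = random.Random(seed)
    sweeps = 0
    for _ in range(trials):
        shared = rng.gauss(0.0, shared_sd)
        # The trailing candidate takes a state when the error wipes out the lead.
        if all(lead + shared + rng.gauss(0.0, state_sd) < 0 for lead in leads):
            sweeps += 1
    return sweeps / trials

# Hypothetical polling leads, in points, for three similar states.
leads = [6.0, 4.0, 3.0]

# Same total error per state (sd = 5) in both cases: 4**2 + 3**2 == 5**2.
independent = upset_probability(leads, shared_sd=0.0, state_sd=5.0)
correlated = upset_probability(leads, shared_sd=4.0, state_sd=3.0)

print(f"independent errors: {independent:.3f}")
print(f"correlated errors:  {correlated:.3f}")
```

Under these assumed numbers, a sweep of all three states is several times more likely when the errors move together, which is the overconfidence the paragraph above describes.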

Second, prediction algorithms failed to register the record number of undecided voters as a warning sign. Because so many voters were on the fence right up to Election Day—and would end up breaking strongly for Trump—Clinton’s margins were much less safe than they appeared.

“It was staring us right in the face,” says Rachel Bitecofer, a professor of political science at Christopher Newport University. Had there been more polls in the closely contested states just before Election Day, she suggests, analysts might have picked up on the unusually high number of voters who decided to turn out at the last moment.

It wasn’t just the forecasters’ fault, though. Even when their probabilities for each candidate were accurate, the public seemed to have trouble comprehending the meaning of those numbers.

During the closing days of the election campaign, I was working at FiveThirtyEight, one of the most prominent outlets making predictions. My job didn’t involve the presidential race: instead, I was covering baseball’s World Series. When the Chicago Cubs were down three games to one in the seven-game series against the Cleveland Indians, I noted that their odds of winning, at around one in six, were a hair below Trump’s chances of taking the White House. Six teams had done it before in the 113-year history of the World Series, and another seven had pulled it off in other playoff rounds, so it was definitely possible, but it wasn’t typical. Afterwards, when both the Cubs and Trump won against the odds, I received a deluge of hate tweets blaming me for somehow jinxing into existence two very possible turns of fate.

“If you hear there’s going to be a 20% chance of rain, you don’t bring your umbrella. And then it rains and you get all ticked off and it’s probably your fault,” says Steven Shepard, an editor and election forecaster at Politico. “But that 20% occurrence isn’t necessarily that unlikely.”

Many people seemed to look at which candidate was projected to win (usually Clinton) without considering how certain the forecasters were. A 70% chance of a Clinton victory certainly favored the Democrat, but ought to have been viewed very differently from a 99% chance.

Still, some did say 99%, and they were undoubtedly too aggressive. Sam Wang at the Princeton Election Consortium estimated Trump’s chances at less than 1%, and even pledged to eat a bug if Trump earned more than 240 electoral votes.

When the election result came through, Wang stayed true to his word. A week after polling day, he appeared on CNN with a can of “gourmet” crickets (“gourmet from the point of view of a pet,” he clarified) and decried the spectacle his bet had caused. “I’m hoping that we can get back to data, and thinking thoughtfully about policy and issues,” he said before dipping a cricket in honey and, with a pained expression, gulping the insect down.

Triple threat

Not all forecasts were as far off as Wang’s. Some even anticipated a victory for Trump. To understand why they came in so differently, it’s valuable to look at the range of approaches, which fall into three broad classes.

The earliest forecasts in each election cycle come from what are called fundamentals models. These are typically built from presidential approval ratings, economic statistics, and demographic indicators. A strong economy presages victory for the incumbent’s party, as does a high approval rating for the incumbent. The demographic makeup of a state can also be used to predict the outcome—for example, white, non-college-educated voters tended to vote for Trump in 2016, so states with lots of them are more likely to go his way in 2020 as well.

Because these factors are relatively stable, reliable fundamentals predictions can be made much earlier than most other types of forecast. Models like this seem too simple to capture all the quirks and scandals of the modern, two-year campaign. But they performed shockingly well in 2016: six out of 10 predicted the final popular vote to within one percentage point.

The presidency isn’t chosen by straight-up national popular vote, however, and that’s a key limitation of fundamentals approaches: few predict the final results of the Electoral College.

Fundamentals models have another weakness. If late-breaking news arises, such as a scandal at the end of the race or a sudden shift in the economy (the 2008 financial crisis is a good example), then these stable forecasts can suddenly become woefully out of date. To compensate for this, a decade or so ago statisticians started popularizing new kinds of quantitative models that aren’t quite as vulnerable to these October surprises. They process polling data as it comes out and produce a day-by-day estimate of who will win, so they can respond if public opinion shifts.

RealClearPolitics and the New York Times’ Upshot both have well-regarded quantitative models, but no model has more fame—or, arguably, a better track record—than Nate Silver’s FiveThirtyEight forecast, named for the total number of votes in the Electoral College. FiveThirtyEight’s algorithm comes in several variations, but all take care to adjust polls according to how trustworthy the survey organization is and whether its results tend to consistently lean Democratic or Republican. The careful ingestion of polling data, and the attention Silver pays to uncertainty, have traditionally set it apart from other forecasts. “FiveThirtyEight is the gold standard,” Bitecofer told me.
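
As a rough illustration of the two adjustments described above, the sketch below subtracts an assumed partisan lean from each poll and then weights the polls by an assumed reliability score. This is not FiveThirtyEight's algorithm, whose details the article does not give; the `Poll` fields, the pollster names, and every number are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Poll:
    pollster: str
    margin: float        # Democratic lead in points (negative = Republican lead)
    house_effect: float  # assumed habitual lean of this pollster, same sign convention
    weight: float        # assumed reliability weight; higher means more trusted

def adjusted_average(polls):
    """Weighted mean of poll margins after removing each pollster's house effect."""
    total_weight = sum(p.weight for p in polls)
    if total_weight <= 0:
        raise ValueError("need at least one poll with positive weight")
    return sum(p.weight * (p.margin - p.house_effect) for p in polls) / total_weight

# Hypothetical polls of a single state.
polls = [
    Poll("Pollster A", margin=6.0, house_effect=1.5, weight=1.0),
    Poll("Pollster B", margin=2.0, house_effect=-0.5, weight=0.5),
    Poll("Pollster C", margin=4.0, house_effect=0.0, weight=0.8),
]

raw_average = sum(p.margin for p in polls) / len(polls)
print(f"raw average:      {raw_average:.2f}")
print(f"adjusted average: {adjusted_average(polls):.2f}")
```

In this made-up example the most trusted pollster also leans Democratic, so correcting for that lean pulls the adjusted average slightly below the raw one.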

Of the major quantitative election predictions, FiveThirtyEight’s was the most conservative, assigning Clinton a 71.4% chance to win on the eve of the election. “That sounds about right now in retrospect,” says Charles Franklin: Trump’s victory was an unlikely, but not impossible, outcome.

Finally, there are predictors out there who eschew mathematical approaches altogether, relying instead upon a combination of intuition, polling, and the output from all the other kinds of models put together. These qualitative predictions run on one of the most sophisticated and yet error-prone computational engines we know of: the human brain.

Rather than precise numeric estimates, qualitative forecasters typically group races into one of four categories on a scale ranging from safe to toss-up.

“Toss-up” means there is no favorite: “Kind of a coin flip,” says Kyle Kondik, a qualitative forecaster with the University of Virginia’s Crystal Ball political analysis newsletter. “Lean,” he says, is a small edge for one side or the other. “Likely” is a larger edge for one side or the other. And “safe,” he says, means we’d be shocked if that party lost. Some qualitative predictors argue that these verbal groupings help readers understand the relative probabilities better than the more exact numbers offered elsewhere.

While these predictions may seem less scientific than ones based on crunching numbers, some boast an impressive level of accuracy. In the 2018 midterms, according to a third-party assessment of several professional forecasts, it was the aptly named Crystal Ball that did best, not FiveThirtyEight’s statistical algorithm. Performance tends to fluctuate from cycle to cycle, however: the best practice, according to pollsters and academics, is to consume a wide variety of forecasts—qualitative, quantitative, and fundamentals.

What next?

Nearly all the forecasters I spoke to had received vitriolic hate mail after the 2016 results. Yet dozens of new modelers have thrown their hats into the ring for 2020.

They will be rolling out their predictions for the first time this year, and they are intent on avoiding mistakes from past election cycles. Morris, the Economist’s forecaster, is one of those entering the field. He has called previous, error-prone predictions “lying to people” and “editorial malpractice.” “We should learn from that,” he says.

The Economist will be building its algorithm using polls published by outside organizations, but it will also be conducting its own surveys to shore up the results in ambiguous states and races, which Morris hopes can lead to greater accuracy.

The Washington Post, too, is making its first gamble on predictions—but taking a different route. It is staying out of the forecasting game until returns start coming in. Only once the first precincts start to announce vote totals on Election Day will the Post deploy its analytical model to judge the likelihood that specific candidates take the state or district for which they are competing. By waiting until the first ballots are counted, the Post’s data scientists plan to drastically reduce the error in predicting the rest of the votes, albeit at the cost of being unable to release an early projection.

Experienced forecasters and pollsters aren’t sitting on their hands either. Builders of fundamentals models are beginning to take up the challenge of predicting the Electoral College instead of just the popular vote. Bitecofer designed a model based primarily on demographics that is already predicting a narrow electoral-vote victory for the Democratic challenger, whoever that may be.

The designers of those problematic quantitative algorithms appear to have learned their lesson about correlated errors between states. The Huffington Post issued a mea culpa for its 98% prediction of a Clinton victory. Wang, the bug-eating Princeton professor, has pledged to update his algorithm so that it will be much less confident in 2020, admitting on his blog that his earlier model was “a mistake.”

Qualitative forecasters, meanwhile, took a variety of lessons from 2016. “There are a lot of different things that in hindsight I wish that maybe we had focused on a little bit more, but I would say the fundamentals-based models were the best in that election,” says the University of Virginia’s Kondik. “I wish we all paid them greater heed.”

Kondik and others stress the need to be cautious about any prediction given the historic unpopularity of the sitting president, which ought to decrease his chances, and the strong economy, which ought to increase them. Those dueling factors mean the race is uncertain so far from Election Day.

Elsewhere, media organizations have also started providing their estimates in ways that are designed to give the reader a better, more intuitive grasp of what probabilities mean. Rather than writing that Democrats had an 87.9% chance of taking the House during the 2018 midterm elections, for example, FiveThirtyEight emphasized that they could have expected to win seven times out of eight.
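
One simple way to turn a percentage into that kind of "times out of" phrasing is to find the nearest fraction with a small denominator. The sketch below uses Python's standard `fractions` module; the helper name `as_odds` and the cap of 10 on the denominator are assumptions for illustration, and the article does not say how FiveThirtyEight actually chose its wording.

```python
from fractions import Fraction

def as_odds(probability, max_denominator=10):
    """Return (wins, chances) for the simplest fraction close to `probability`."""
    # Going through str() keeps 0.879 as exactly 879/1000 rather than its
    # binary floating-point approximation.
    frac = Fraction(str(probability)).limit_denominator(max_denominator)
    return frac.numerator, frac.denominator

wins, chances = as_odds(0.879)
print(f"about {wins} times out of {chances}")
```

With the 87.9% figure from the paragraph above, this yields 7 out of 8, matching the phrasing quoted there. A larger `max_denominator` gives a closer but less memorable fraction.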

“Psychologists have found that people are better at understanding these types of [numbers],” wrote Yphtach Lelkes, a professor of communications at the University of Pennsylvania.

Finally, pollsters are upping their game as well. The American Association for Public Opinion Research (AAPOR) issued a retrospective of 2016 with lessons for future elections. Tips include using statistical tricks to ensure that population samples are more representative of the state being surveyed and conducting more polls in the final days of the campaign so as to capture the leanings of late-deciding voters, who proved so critical to Trump’s victory.

Franklin, the Wisconsin pollster, was one of the authors of AAPOR’s post-mortem. The systematic failure of dozens of surveys across several states suggests that his poll’s mistake was due to a real shift in the closing days of the race, rather than an earlier, more fundamental error. Still, he wonders what might have been: “What if we had polled through the weekend before the election? Would we have captured the swing toward Trump in those data?”

Quantum polling

But while mistakes from four years ago can be corrected, new difficulties may also crop up for the 2020 cycle. Some may even be driven by forecasting itself. Some experts argue that election predictions may be influencing the very results they are trying to predict.

According to a recent study, an overwhelmingly liberal audience tuned in to those overly confident quantitative forecasts in 2016. Previously published studies suggest that when people believe the outcome of an election is certain, they are less likely to vote, especially if the certainty is stacked in favor of their chosen candidate. So in a twist on what is known as the observer effect—in which the mere act of watching something changes the outcome—feeding a heavily Democratic audience with a steady diet of overconfident polling like Wang’s could have reduced turnout significantly. Given that the race was essentially decided by only 107,000 votes in three states, any reduction could have been important.

“Clinton lost by so few votes that it is certainly possible that probabilistic forecasts caused enough Democrats to stay home that it affected the outcome,” wrote Lelkes. Clinton herself suggested as much. “I don’t know how we’ll ever calculate how many people thought it was in the bag, because the percentages kept being thrown at people—‘Oh, she has an 88 percent chance to win!’” she said in an interview in New York magazine.

Even if election forecasting didn’t change the outcome in 2016, it could have more of an impact on future campaigns.

“Horse race polling is believed to increase political cynicism, affect turnout, increase polarization, and likely supplants information about substantive issues,” wrote Lelkes. “It causes people to view politics as a game, where they go out and root for their team, rather than support candidates based on their political positions.” And if these effects are real, they are likely to get more powerful as more forecasts happen.

Some forecasters, like Silver, have dismissed this concern. They argue that it isn’t their job to tell people whether or not to vote—or to tell the media what to cover. Others, however, are taking the advice of Lelkes and his colleagues more seriously.

“We’re experimenting with ways to convey uncertainty that won’t turn people off [from voting],” says the Economist’s Morris. “But I think that is still a problem that forecasters are going to have … I don’t know how we get around some of the societal implications of our work.”

Rob Arthur is an independent journalist and data science consultant based in Chicago.