Have Pollsters Cleaned Up Their Act in Time for the Midterms?

I regret to inform you that it’s election season again.

With the midterms approaching in November, US politics junkies will soon find themselves sucked back into a familiar pattern: devouring news reports on the latest polls, comparing polling averages, and compulsively refreshing election forecast models until the trend lines burn into their brains.

But can these news obsessives trust the numbers they’re seeing?

There was a stretch when it seemed as though the new science of election forecasting—aggregating all the polls, applying statistical techniques, and adjusting with other types of economic and historical data—could accurately predict what would happen every four years. Nate Silver, founder of FiveThirtyEight and granddaddy of this modern discipline, famously nailed the 2008 and 2012 presidential elections.

Then came 2016, when the polls failed to capture support for Donald Trump among working-class voters. After that debacle, the polling industry vowed not to get fooled again. So what happened in 2020? The polls failed to capture support for Donald Trump among working-class voters. What had been forecast as a comfortable victory for Joe Biden ended up a squeaker in the Electoral College, with Biden prevailing by superthin margins in crucial swing states where polls had exaggerated his advantage.

G. Elliott Morris is a data journalist at The Economist, where he runs the magazine’s election forecasting operation—a skill he honed as an undergraduate. His model gave Biden a 97 percent chance of victory in 2020 and predicted he’d win 356 electoral votes. (He won only 306.) In a new book, Strength in Numbers, Morris acknowledges the failures of polling over the years but argues that polls remain crucial for democracy. This week, he spoke to WIRED about what went wrong in the last election, how hard it is to predict what will go wrong next time, and why the answer to bad polls is for everyone to just trust them a little bit less.

This interview has been condensed and lightly edited.

WIRED: In 2016, famously, the polls were pretty badly off. In 2020, they were pretty badly off again, in almost the exact same way. What’s our best understanding of what went wrong?

G. Elliott Morris: In any year, pollsters suffer from a fundamental problem: The types of people they talk to may not be representative of the entire population. They could talk to too many white voters, too many non-college-educated voters, too many poor or rich voters.

Coming out of 2016, it seemed like the consensus was, “Oops, we didn’t include enough non-college white voters. So if we correct for that, we should be good.” That was the narrative heading into 2020, wasn’t it?

In 2016, there were higher levels of polarization by education than there ever had been before, with bigger gaps between how non-college and college-educated people were voting. Some pollsters knew to take that into account—to make sure that their polls had the right percentage of non-college-educated voters—but not all pollsters did that. And so you ended up with these errors from polls where the sample was too educated. And because education has become correlated with vote choice, they ended up overstating Democratic support in 2016.

Going into 2020, I think a naive reading of the polls would have suggested, “Okay, if you weight by education, then you’ll be fine.” And that's the position some people took. But we had a new type of error in 2020 called non-response error, where Republicans who were particularly likely to support Donald Trump were also the ones who were particularly unlikely to respond to surveys. So even if you had the right composition of demographics, of education, you were still going to undersample those Republicans.

In other words, even if you’re weighting for non-college-educated voters, what you don’t realize is you’re not getting a representative sample of them.

Right. Some pollsters in 2020 were even weighting to make sure their polls had the right sample of 2016 Trump supporters and 2016 Clinton supporters. But they ran into the same problem. It could be that the 2016 Trump supporters who respond are the ones who are less likely to support Trump now, and that makes your poll biased toward the Democrats.
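The adjustment Morris describes (reweighting a sample to match the electorate’s composition) and its limit (it can’t correct for who refuses to answer at all) can be sketched in a few lines. The shares below are invented for illustration, not real polling data:

```python
def poststratified_estimate(pop_share, support_by_group):
    """Reweight each group's observed support to the population's
    true composition (simple post-stratification)."""
    return sum(pop_share[g] * support_by_group[g] for g in pop_share)

# Invented example: the raw sample over-represents college graduates.
sample_share = {"college": 0.60, "non_college": 0.40}   # who answered
pop_share    = {"college": 0.40, "non_college": 0.60}   # true electorate
support_dem  = {"college": 0.58, "non_college": 0.45}   # observed support

raw = sum(sample_share[g] * support_dem[g] for g in sample_share)
weighted = poststratified_estimate(pop_share, support_dem)

print(f"unweighted: {raw:.3f}, weighted: {weighted:.3f}")
# Weighting repairs the *composition* of the sample. But if the
# non-college respondents who did answer lean more Democratic than the
# non-college voters who refused, the 0.45 input is itself biased, and
# no amount of reweighting can fix it. That is non-response error.
```

The same mechanics apply to weighting by past vote, which is why the 2016-Trump-supporter adjustment described above ran into the identical problem.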

Heading into the midterms, how likely are we to get more accurate results than last time?

Because polls in 2016, 2018, and 2020 were all biased toward Democrats, the assumption is that that’s going to happen this year. But there’s no inherent reason that patterns of non-response would stay the same between these two elections. There's a theory that because Donald Trump was telling his supporters that polls were fake news or whatever, then they were less likely to respond to surveys in those years. But there's really no way to know for sure why this pattern happened. Therefore, we don’t know if it’s going to persist into the future.

So I can’t sit here and tell you that polls are going to be biased toward the Republicans this year. The only really good prediction is that we’re still in an environment where non-response by party plays a big role in how accurate a poll or a polling average is going to be. And so we should still be expecting a higher probability of big misses than we would have expected back when polling response rates were higher, and you had to do less work on your sample to make it representative.

So we should just anticipate a higher probability of systemic polling misses, and the public and the media should be setting expectations accordingly.

Yeah. In my book, I argue that the press and the public should have lower expectations for polls than they did in 2016 or 2020. This expectation that polls on average are going to provide an unbiased prediction of an election, or maybe have an error of at most 3 or 4 percent, is wrong. We are so sorted by party, beyond what our demographics suggest about who you’re going to vote for, that weighting by party or doing some other adjustment to make sure you have a politically representative poll is more important than ever—but it’s a very hard task.

You can’t just look at people’s party affiliations, because a lot of people are Republican voters, but they’re registered Independent or they’re not registered with any party, and the same for Democratic voters.

And you don’t even have party registration in every state. Wisconsin, for example, doesn’t have party registration. And even in states that do, the people who register as Republicans aren’t always Republican voters. That might just be what they originally registered as, but they’re now reliable Democrats and never updated their registration.

How do we in the media contribute to misinterpretation of polling data?

Let’s think of a hypothetical. Maybe a polling organization releases two polls: one a month ago showing Democrats ahead by five, and one today showing they’re only up two. Then a political pundit reports, “There's been a three-percentage-point move toward Republicans over the last month.” Nope. Any pollster will tell you, we actually don’t know whether that's real movement because it’s within the margin of error. If you must write this horse-race piece, explain that you don’t really know what's going on.
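The arithmetic behind that caution can be sketched with the standard textbook approximation; this is not specific to any pollster’s method, and the sample sizes (800 respondents per poll) are assumed for illustration:

```python
import math

def moe_lead(n, p=0.5, z=1.96):
    """Approximate 95% margin of error on a two-candidate lead (p_D - p_R).

    For near-complementary vote shares, SE(lead) ~= 2 * sqrt(p*(1-p)/n).
    """
    return z * 2 * math.sqrt(p * (1 - p) / n)

def moe_shift(n1, n2, p=0.5, z=1.96):
    """95% margin of error for the *change* in the lead between two
    independent polls (errors add in quadrature)."""
    se1 = 2 * math.sqrt(p * (1 - p) / n1)
    se2 = 2 * math.sqrt(p * (1 - p) / n2)
    return z * math.sqrt(se1**2 + se2**2)

# Two polls of 800 respondents each, leads of D+5 and D+2:
print(f"MOE on one poll's lead:        +/- {100 * moe_lead(800):.1f} points")
print(f"MOE on the shift between them: +/- {100 * moe_shift(800, 800):.1f} points")
# The apparent 3-point move is well inside the roughly +/- 9.8-point
# margin on the difference, so the "shift" may be pure sampling noise.
```

Note that the margin on a lead is about twice the commonly quoted margin on a single candidate’s share, and comparing two polls widens it further, which is exactly why pundits’ movement stories so often outrun the data.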

Reporting on polls and forecasts can actually affect election outcomes. There’s a strong case that James Comey’s announcement just before the 2016 election, that the FBI was reopening the Hillary Clinton email investigation, was enough to tip the election in the final week. Comey later said that he had been sure Hillary would win, and so he didn’t think his actions would meddle with the results. More generally, maybe if Democratic voters hadn’t been as confident as they were, some people who didn’t bother to vote would have cast a ballot. What responsibility do pollsters and forecasters have, given that in predicting the outcome, you could actually affect the outcome?

There’s been one notable study on the impact of election forecasting probabilities on voter turnout. It was an experiment where people were told the probability of their candidate winning, and then given the chance to donate money to that candidate. The researchers concluded that the higher the win probability, the less motivated people were to spend. The results suggested that if people were reading election forecasts that say 99 percent Hillary Clinton instead of, let’s say, 70 percent Hillary Clinton, then turnout would have decreased by, like, two percentage points. Now, that’s not nothing. That’s a couple million people. That’s certainly enough to change the election outcome. But it’s still only one study. We don’t know whether that holds true for actual voting behavior. And even that estimate comes with a margin of error.

Plus, we know that during campaigns, the most common candidate behavior is to insist you’re winning even if you’re losing. That suggests there’s at least a folk wisdom that people are going to be more likely to vote for you if they believe that you’re the winner, which kind of cuts against what I was just suggesting.

Yeah. I think there’s a tendency among critics of election forecasts to insist that it changes outcomes, that it’s dangerous. I’m not convinced of that. I do think it’s plausible, so I think it deserves more research. And I’ve worked with the people who are doing this research. I don’t want to shut it down. But I also think some of the critics are a bit too convinced of these effects.

The other important thing I’ll say is, if the Comey quote is true, then actually he needed to listen to good election forecasts that showed the number was more like 70 percent. So that becomes an argument for further forecasts.

Well, what is a “good” forecast? If we go back to 2016, as you say, Nate Silver’s forecast gave Trump a 30 percent chance of winning. Other models pegged Trump's chances at more like 1 percent or low single digits. The sense is that, because Trump won, Nate Silver was, therefore, “right.” But of course, we can’t really say that. If you say something has a 1-in-100 chance of happening, and it happens, that could mean you underrated it, or it could just mean the 1-in-100 chance hit.

This is the problem with figuring out whether election forecasting models are tuned correctly to real-world events. Going back to 1940, we have only 20 presidential elections in our sample. So there’s no real statistical justification for a precise probability here. 97 versus 96—it’s insanely hard with such a limited sample to know whether these things are being calibrated correctly to within 1 percent. This entire exercise is much more uncertain than the press, I think, leads the consumers of polls and forecasts to believe.
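A toy calculation (mine, not from the book or the interview) shows why 20 elections can’t separate a well-calibrated forecaster from a lucky one:

```python
def p_all_favorites_win(p, n=20):
    """If every forecast gave the favorite probability p, and those
    forecasts were exactly right, this is the chance the favorite
    wins all n elections in a row."""
    return p ** n

# Very different claimed probabilities produce overlapping track records:
for p in (0.97, 0.96, 0.90):
    print(f"p = {p}: favorite sweeps all 20 with probability "
          f"{p_all_favorites_win(p):.2f}")
# A perfect 20-for-20 record is reasonably likely under all three, so
# the record alone cannot distinguish a 97 percent forecaster from a
# 96 percent one, or even cleanly from a 90 percent one.
```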

In your book, you talk about Franklin Roosevelt’s pollster, who was an early genius of polling—but even his career, eventually, went up in flames later on, right?

This guy, Emil Hurja, was Franklin Roosevelt’s pollster and election forecaster extraordinaire. He devised the first kind of aggregate of polls, the first tracking poll. A really fascinating character in the story of polling. He’s crazy accurate at first. In 1932 he predicts that Franklin Roosevelt is going to win by 7.5 million votes, even though other people are forecasting that Roosevelt’s going to lose. He wins by 7.1 million votes. So Hurja is better calibrated than the other pollsters at the time. But then he flops in 1940, and then later he’s basically as accurate as your average pollster.

In investing, it’s hard to beat the market over a long period of time. Similarly, with polling, you have to rethink your methods and your assumptions constantly. Even though early on Emil Hurja is getting called “the Wizard of Washington” and “the Crystal Gazer of Crystal Falls, Michigan,” his record slips over time. Or maybe he just got lucky early on. It’s hard after the fact to know whether he was really this genius predictor.

I bring this up because—well, I’m not trying to scare you, but it may be that your biggest screw-up is somewhere in the future, yet to come.

That’s sort of the lesson here. What I want people to think about is, just because the polls were biased in one direction for the past couple of elections doesn’t mean they're going to be biased the same way for the same reasons in the next election. The smartest thing we can do is read every single poll with an eye toward how that data was generated. Are these questions worded properly? Is this poll reflective of Americans across their demographic and political trends? Is this outlet a reputable outlet? Is there something going on in the political environment that could be causing Democrats or Republicans to answer the phone or answer online surveys at higher or lower rates than the other party? You have to think through all these possible outcomes before you accept the data. And so that is an argument for treating polls with more uncertainty than the way we’ve treated them in the past. I think that’s a pretty self-evident conclusion from the past couple of elections. But more importantly, it’s truer to how pollsters arrive at their estimates. They are uncertain estimates at the end of the day; they’re not ground truth about public opinion. And that’s how I want people to think about it.
