Data Standards Polling & Politics

Media polling and the way forward

While most of us are hitting refresh on election results and skimming Twitter, a small but heated conversation is underway about how the US media polls could have almost uniformly underestimated the Trump vote, nationally and in state after state.

This is a helpful conversation to have, provided it is constructive and adds more light than heat.

And, it is more important than ever that we have civil, fact-based conversations.

Some are saying that “the pollsters were wrong.” That is not correct. SOME media and academic pollsters were wrong in the 2020 elections. And, those media pollsters were all wrong in the same direction. In fact, their polling undercounted Trump and GOP strength in 2016, 2018 and 2020.

But, it’s important to note that some public pollsters were very, very close, including the Trafalgar Group, Rasmussen, and Selzer & Co for the Des Moines Register. It is also important to note that all of these firms were derided for publicly releasing polling data showing a tight race. They were mocked and eye-rolled because their data did not fit the conventional wisdom in newsrooms. The most extreme example of this was the Twitter reaction to Selzer & Co polling for the Des Moines Register showing Trump +7 in Iowa. Trump won Iowa by 8.2%.

It’s also important to note that the USC Dornsife Center and Trafalgar Group were prescient in their contention that there was a “shy Trump voter”. That segment could be accounted for using “social circle” questions that ask whom a respondent’s social circle is supporting. This approach removes the social desirability effect and gives respondents a safe way to tell the truth. Instead of asking a voter who they are voting for, this line of questioning asks who their friends and family are supporting.

Further, it must be acknowledged that the media and the public only see PUBLIC surveys. They are not privy to either (1) the work of professional campaign pollsters, or (2) the extensive vote modelling and database work that is now commonplace in campaigns. This work is extremely high quality, with much larger sample sizes, nightly tracking, and deeper analysis of the internals. I worked in this world at Public Opinion Strategies and Fabrizio, McLaughlin, before transitioning to corporate research. Modern American campaigns simply have far more resources for high-quality polling than cash-strapped media and academic institutions. But, if you want to know what the polling is telling a Presidential campaign, just look at the candidate’s travel schedule and ad buying. It’s remarkably transparent.

Market researchers understand that a survey is a tool used to understand the reality of perception and to build strategy. When surveys are used for anything else, they become something else.

With all of this as a preface, researchers should make a dispassionate analysis of what happened, so that media polling can make constructive changes in the future.

This analysis can be broken into four parts. I’m not going to spend time on data collection methodology, because this has been well discussed in the industry, and the consensus view of the research community is that mixed mode – a combination of data collection methods – is optimal.

In 2020 media pollsters appeared to make some critical mistakes in four areas: (1) Sampling, (2) Questioning, (3) Contextualizing, and (4) Analyzing.

  • Survey Sampling

Estimating a future electorate based on the shape of past elections is always a challenge.

And, in almost every election there are subtle changes in the composition of the electorate that differ from the previous election cycle. But, in this case, it appears that media pollsters are systematically under-sampling non-college educated whites. This was almost certainly the case in 2016, and it appears to have happened again in 2020. This matters in modern elections, because non-college educated whites are a core component of Trump’s political base and appear to be moving toward the Republican party. At the same time, college educated whites appear to be moving toward the Democratic party. We saw examples of this on election night with small drops in Trump support across American suburbs. Under-sampling non-college educated whites under-samples the Trump vote, and probably explains about half of the gap between polled and actual support levels.

After the 2016 elections, the under-sampling of non-college educated whites was identified, and some media polling groups reported that they would set education sample quotas within ethnicity. But, sifting through the data over the course of the campaign, it is not clear that this happened in 2020.
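
For readers outside the polling world, a quota “within ethnicity” simply means hitting education targets inside each racial group, not just overall. Here is a minimal cell-weighting sketch in Python of what the fix looks like; the target and sample shares are invented for illustration, not real electorate estimates:

```python
# Sketch: cell weighting by education within ethnicity.
# All target and sample shares below are invented for illustration.

targets = {  # share of the electorate each cell should represent
    ("white", "non_college"):    0.33,
    ("white", "college"):        0.29,
    ("nonwhite", "non_college"): 0.22,
    ("nonwhite", "college"):     0.16,
}
sample = {   # share each cell actually holds in the raw sample
    ("white", "non_college"):    0.24,  # under-sampled
    ("white", "college"):        0.38,  # over-sampled
    ("nonwhite", "non_college"): 0.21,
    ("nonwhite", "college"):     0.17,
}

# Each respondent in a cell gets weight = target share / sampled share,
# so under-sampled cells count for more and over-sampled cells for less.
weights = {cell: targets[cell] / sample[cell] for cell in targets}

for cell, w in sorted(weights.items()):
    print(cell, round(w, 2))
# ('nonwhite', 'college') 0.94
# ('nonwhite', 'non_college') 1.05
# ('white', 'college') 0.76       <- college whites weighted down
# ('white', 'non_college') 1.38   <- non-college whites weighted up
```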

Additionally, evangelicals appear to have been under-sampled by media pollsters, despite the fact that they are a predictable 25-26% of the electorate every cycle. Exit polls suggest that Trump received 81% of evangelical support in 2016 and 76% this year. In their election eve survey, one prominent media poll had “evangelical/fundamentalist” at only 17%. A quick calculation shows that accounting for this error alone would shift the Trump margin by roughly +6 points, moving a Biden +10 survey to a Biden +4, and bringing it much closer to the final popular vote result. This was an immediate red flag, and it is why I expected this election to be much closer than the conventional wisdom suggested.
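
To make that arithmetic concrete, here is a minimal re-weighting sketch in Python. The 17%, 26%, and 76% figures come from the discussion above; the Biden 55 / Trump 45 topline, and the non-evangelical split backed out from it, are illustrative assumptions:

```python
# Sketch: re-weighting a hypothetical Biden +10 poll that sampled
# evangelicals at 17% back up to their historical ~26% of the electorate.
# The 17%, 26%, and 76% figures come from the text above; the 45/55
# topline (and the non-evangelical split backed out from it) is assumed.

sampled_share, true_share = 0.17, 0.26  # evangelical share: polled vs. historical
trump_ev = 0.76                         # Trump support among evangelicals (exit polls)
trump_total = 0.45                      # hypothetical Biden 55 / Trump 45 topline

# Back out the Trump support among non-evangelicals implied by the topline.
trump_nonev = (trump_total - sampled_share * trump_ev) / (1 - sampled_share)

# Re-weight to the historical evangelical share.
trump_adj = true_share * trump_ev + (1 - true_share) * trump_nonev
biden_adj = 1 - trump_adj

print(f"Implied non-evangelical Trump support: {trump_nonev:.1%}")  # ~38.7%
print(f"Adjusted margin: Biden +{(biden_adj - trump_adj) * 100:.1f}")
# Prints roughly "Biden +3.3" -- a 6-7 point margin swing from the
# Biden +10 starting point, in the neighborhood of the figure above.
```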

  • Questioning: Social Desirability Bias

Was there a so-called “Shy Trump Voter” segment in the 2020 elections?

Were these voters afraid to tell the truth about their vote intentions for fear of social sanction?

Yes.

In fact, Public Opinion Strategies found in their post-election survey that “nineteen percent (19%) of Trump voters said they kept their support for Trump a secret from most of their friends, compared to just 8% of Biden voters.”

This should not be surprising.

A national survey this summer sponsored by the Cato Institute found that the following percentage of each group agreed with the statement:

“The political climate these days prevents me from saying things I believe because others might find them offensive.”

Strong Liberal: 42%
Liberal: 52%
Moderate: 64%
Conservative: 77%
Strong Conservative: 77%

This data strongly suggests that right of center voters are much more likely to self-censor.

Contrary to the stereotype of the boisterous Trump supporter, this data suggests that conservatives are much less likely to share their opinions.

The question then becomes, how do we better capture the opinions of those less likely to share? And, taking this one step further, how do we help respondents share their opinions when it is socially undesirable for them to do so?   

One potential answer is the “social circle” question structure. The USC Dornsife Center notes that the social circle question accurately predicted the winner in the 2016 U.S. Presidential election, the 2017 French Presidential election, the 2017 Dutch Parliamentary election, the 2018 Swedish Parliamentary election, and the 2018 U.S. elections for the House of Representatives. And, their research suggests that this question structure is more reliable than the “own intention” question structure traditionally used by pollsters. You can read more about this here:

https://dornsife.usc.edu/news/stories/3338/experimental-polling-point-to-trump-victory/
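
For the curious, here is a toy sketch of how the two question structures differ at the analysis stage. The respondent records and field names are invented: with the traditional question you tally each respondent’s own choice, while with the social circle question you average each respondent’s report of their contacts’ support:

```python
# Sketch: "own intention" vs. "social circle" aggregation.
# All respondent records below are invented for illustration.

respondents = [
    # own_vote, plus the respondent's estimate of the share of their
    # social circle supporting each candidate (shares sum to 1.0)
    {"own_vote": "A", "circle": {"A": 0.70, "B": 0.30}},
    {"own_vote": "B", "circle": {"A": 0.40, "B": 0.60}},
    {"own_vote": "B", "circle": {"A": 0.55, "B": 0.45}},  # circle leans A
]

n = len(respondents)

# Traditional estimate: the share of respondents naming each candidate.
own = {"A": 0.0, "B": 0.0}
for r in respondents:
    own[r["own_vote"]] += 1 / n

# Social-circle estimate: the mean of the reported circle shares.
circle = {"A": 0.0, "B": 0.0}
for r in respondents:
    for cand, share in r["circle"].items():
        circle[cand] += share / n

print("Own intention:", {k: round(v, 2) for k, v in own.items()})     # A 0.33, B 0.67
print("Social circle:", {k: round(v, 2) for k, v in circle.items()})  # A 0.55, B 0.45
# The circle estimate can surface support respondents won't claim themselves.
```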

  • Contextualizing

Unfortunately, the media struggles to interpret polling data, and this is because (1) it is not their area of expertise, and (2) they generally lack the wider historical context of how states and counties traditionally perform. Neither of these observations is meant to be harsh or critical.

In fact, the latter problem is now endemic in American media, due to staff cuts at most newspapers. This problem is most acute at news organizations that cover state politics in their capital city. These staffs have been sharply reduced as media economics have been squeezed. The result is that there are very few longtime political journalists working at the state level. And, that means there is now less institutional memory and less ability to contextualize political data.

The result is that the public reads simple horse race coverage, without any wider context. Florida is a prime example. The media reported the horse race. But, the media should have noted that both Rick Scott and Ron DeSantis polled behind their Democratic opponents in 2018 and still won. They should also have noted that Trump was winning a suspiciously high percentage of Latino votes in pre-election surveys. That would have been another tip-off. And, finally, they should have known that the Florida GOP had invested heavily in registering right of center voters. This cut the registration gap between Republicans and Democrats in Florida – giving an advantage to Republicans. None of this was widely reported or explained before election day. And, this resulted in surprise at the outcome.

Context is also important on a wider level. Political veterans know that states and counties perform at a generally predictable level. Politically speaking, states don’t swing wildly between parties. They have a political culture and vote within a general range, evolving over decades. Much of the state level media polling this year was unbelievable on its face, simply based on historical knowledge of the long arc of a state’s voting behavior. 

There are parallels here to commercial market research. Wider knowledge of consumer behavior, of past marketing campaigns, and years of focus groups with consumers provide skilled researchers with the ability to contextualize quantitative information, placing it within a wider body of knowledge. The research industry often calls this process “triangulation” – contextualizing core survey findings by synthesizing them with additional, external data and observations.

  • Analyzing

Unfortunately, most media polls focus on basic horse race coverage – candidate A xx% and candidate B yy%. Most people think this is how a pollster analyzes the data, but that is not the case. Instead, pollsters analyze ballot support (a) by target group and (b) by turnout propensity and past vote behavior. When analyzing by target group, a pollster compares current ballot support levels, and the spread, against target. In practical terms this means that if your candidate historically needs x% of a media market, or y% of union households, or z% of black voters, then you know where you are under- versus over-performing and where you need to focus. Most people intuit this, but if coverage of a media poll doesn’t go into that level of depth or provide the crosstabs or data file, then these critical bits of data cannot be interrogated. The more complex analysis is the latter – estimated turnout. And I suspect this is also where some error crept into the media polls.
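
As a toy illustration of the target-group comparison, the sketch below checks current support against the historical benchmark for each group. All group names, targets, and poll numbers are invented:

```python
# Sketch: comparing current ballot support against historical targets
# by group. Group names, targets, and poll numbers are all invented.

targets = {  # share of each group the candidate historically needs to win
    "Union households":   0.42,
    "Black voters":       0.12,
    "Tampa media market": 0.51,
}
current = {  # share of each group supporting the candidate in this poll
    "Union households":   0.39,
    "Black voters":       0.14,
    "Tampa media market": 0.47,
}

for group, need in targets.items():
    gap = current[group] - need
    status = "over-performing" if gap >= 0 else "UNDER-performing"
    print(f"{group}: {current[group]:.0%} vs. target {need:.0%} "
          f"({gap:+.0%}, {status})")
# Union households: 39% vs. target 42% (-3%, UNDER-performing)  <- focus here
```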

Pollsters measure turnout propensity in terms of (1) self-reported enthusiasm to vote, (2) strength of candidate support (sometimes called “intensity”), and (3) historical behavior from the voter file. Media polls rarely go into this level of analysis. This is unfortunate, because most media polls that I saw had a slight edge to Trump in terms of support intensity or enthusiasm. Small differentials in vote intensity express themselves on election day as incremental 1-2% gains.
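
A simple sketch of how those three inputs might be blended into a single propensity score follows; the 0-1 scales and the blend weights are my own invention, and real campaign models are far more elaborate:

```python
# Sketch: a toy turnout-propensity score built from the three inputs
# above. The 0-1 scales and the blend weights are invented; real
# campaign models are far more elaborate than a weighted average.

def turnout_propensity(enthusiasm: float, intensity: float,
                       past_votes: int, eligible_elections: int) -> float:
    """Blend self-reported enthusiasm, strength of support, and
    voter-file history into a single 0-1 propensity score."""
    history = past_votes / max(eligible_elections, 1)  # e.g. voted 4 of last 4
    return 0.3 * enthusiasm + 0.2 * intensity + 0.5 * history

# A lukewarm respondent who always votes still scores high:
print(round(turnout_propensity(enthusiasm=0.5, intensity=0.4,
                               past_votes=4, eligible_elections=4), 2))  # 0.73
```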

Finally, it’s important to note that campaign pollsters report candidate ballot support levels in a turnout matrix based on estimated vote propensity. Most people think that pollsters report the data as a simple horse race. They do not, and have not for years. But media polls usually provide just the horse race, head-to-head numbers. That’s unfortunate, and it misses the dynamism and complexity of an election.
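
And here is a toy version of the turnout matrix itself: the same poll re-tabulated under different propensity cutoffs, so the ballot is reported per turnout scenario rather than as one head-to-head number. The respondent data and cutoffs are invented:

```python
# Sketch: reporting the ballot as a turnout matrix rather than a single
# topline. Each respondent carries a vote choice and a 0-1 propensity
# score; the scenario cutoffs below are invented for illustration.

respondents = [("A", 0.95), ("B", 0.90), ("A", 0.60), ("B", 0.35), ("A", 0.80)]

scenarios = {
    "All registered (>=0.0)": 0.0,
    "Likely voters (>=0.5)":  0.5,
    "Low turnout (>=0.8)":    0.8,
}

for name, cutoff in scenarios.items():
    voters = [choice for choice, p in respondents if p >= cutoff]
    a_share = voters.count("A") / len(voters)
    print(f"{name}: A {a_share:.0%} / B {1 - a_share:.0%} (n={len(voters)})")
# The margin moves with each turnout assumption -- the dynamism a single
# head-to-head number hides.
```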

The Way Forward:

Some public polling firms were very accurate. Most media polls were not.

We can fix media polling, like we fix anything else. We identify the failure points and eliminate them. Media pollsters need to get much better at modelling the real electorate, not all registered voters, but likely voters. They need to ensure that their sample frames and data reflect the historical composition of the electorate. They need to explore new methods for capturing vote intent by considering a shift from “own intent” questioning to “social circle” questioning. They need to better contextualize the data they are collecting, triangulating this information with wider knowledge of a political geography. And they need to progress beyond reporting just registered voter (RV) and likely voter (LV) horse race data, explaining the impact of intensity of support and vote propensity on the shape of the actual electorate.

All of this can be fixed.
