
ROAD TO 2024 ELECTIONS OP-ED

Opinion polls are like a bowl of soup – you have to stir well to taste the whole


As we approach elections on 29 May, the public is being bombarded with polls, often generating disparate or conflicting results. Pollsters who provide clear explanations of how the poll was conducted tend to be more accurate than those who do not.

Pollsters are fond of the joke that says elections are only held to see which pollster got it right. In truth, by election time, pollsters are forgotten, as are their predictions.

Wolfgang Donsbach, in a global review of polling and its possible impact on election outcomes, concluded that “the effects [of polls] remain first of all minimal and secondly they can be seen as completely harmless.”

He went on to note that polls are commonly drowned out by the electoral predictions of the media and politicians, and that the significance of polls “is overestimated by both politicians and social scientists”.

Perhaps not what pollsters want to hear, but a healthy corrective to the hype that surrounds the release of a new poll, and the bitchiness of post-release commentary.

As we approach national and provincial elections on 29 May, readers are already being bombarded with polls of many shapes and sizes, often generating disparate or conflicting results. This feeds a fairly common view (in Daily Maverick’s comments section, for example), which suggests that if you were to cross a corrupt lawyer with a crooked politician you’d get a pollster, who should be treated accordingly.

This may be compounded by some of the technical jargon that is used in articles and press releases but very rarely explained – weighting, imputing, missing data, margin of error, confidence intervals and the like. This article tries to explain why those terms matter, using data published recently in Daily Maverick.

The deliberate use of technical language for a general readership may be part of the mythology creation that every profession does – baffling with bullshit, in short.

Sociologist Karl Mannheim noted many years ago that these myths may be “intentional or unintentional, conscious, semi-conscious, or unconscious, deluding oneself or of others, taking place on a psychological level and structurally resembling lies”. 

Where pollsters fall on this scale is left to the reader.

An unregulated industry – time to change?

First, a couple of broader points. Polling in South Africa is entirely unregulated.

In the UK, poll quality is overseen by the British Polling Council and in America by the National Council on Public Polls (though the latter is not very active); there is also a global academic/practitioner body, the World Association for Public Opinion Research (Wapor), among many others.

These bodies set standards, can demand transparency from members (including publishing micro-samples of anonymised data for quality checking), and make findings on the work of their members. 

Even if it were only voluntary and self-regulating, creating a South African equivalent may be an important intervention to maintain quality and to help consumers of polls.

Second, we are at least free to poll!

Many countries have banned polling, and many more embargo them. 

Latin America has the largest share of countries with pre-election poll embargoes. Europe is second.

Wapor noted that “in nearly a third of… countries in West Asia and North Africa and sub-Saharan Africa, it is not permitted to publish polls about elections and voting (in about a quarter of countries in those two regions, polls about voting cannot even be conducted).”

In total, governments officially regulate polls in a third of all countries globally, and one in four countries (of the 157 studied) do not permit surveys that deal with religion, crime or voting. Their surveys must be rather dull affairs.

Wobbly polling

In the 2010s, the quality of polls declined (or wobbled rather badly) internationally, and the polling governance entities became more prominent.

The Pew Research Center reported that “93% of national polls overstated the [2020] Democratic candidate’s support among voters, while nearly as many (88%) did so in 2016.”

The problem persisted, as Nate Silver noted: “2020 had the highest average error of… six presidential general election cycles… (albeit only a tenth of a point worse than 2016). And it was tied with 2016 for being the worst cycle for presidential primary polls.”

In the United Kingdom, polling for the 2015 election was described as “appalling” and the British Polling Council convened an official inquiry specifically to understand why this was the case, noting that the polls preceding the 2015 general election “were some of the most inaccurate since election polling first began in the UK in 1945”.

Even Donald Trump is a problem: American pollsters are “holding their breath”, according to The New York Times.

They know the problems: people are not taking pollster calls, turnout is unpredictable, and Trump supporters in particular refuse to take part in polls. 

Some respondents also answer mischievously out of annoyance at being polled – which is how you end up with “4% of Americans reporting they have personally been decapitated”.

Transparency

The main conclusion of the UK inquiry, as of every other entity analysing poll quality, was the same: transparency was paramount. 

Nate Silver noted that “our research finds that pollsters that meet the transparency criterion still are outperforming others”.

Put simply: pollsters who provide clear explanations on how the poll was conducted tend to be more accurate than those who do not.

Surveys – the basis of polling – rest on random samples, in which everyone in the population has an equal chance of being selected. This generates surveys that include all types of people from all types of communities in the country, covering race, religion, sex, politics, income and so on.

A well-drawn sample also allows relatively small sample sizes to reflect the views of the entire country, with measurable accuracy. 

When teaching sampling, the easiest analogy is the bowl of soup. To test if soup is ready, you stir it well, and one spoonful will tell you the answer, rather than a spoonful from left and right, top and bottom, and so on. A sample is similar: if it is well mixed, a small number can (with measurable accuracy) represent the whole.

Bigger samples do have lower error bars than small samples, but they are horribly expensive. The recent Change Starts Now poll, with a 9,000-respondent sample, had an error bar below 1%.

Ipsos released a poll soon afterwards, with a sample of 3,600 and told readers that “all sample surveys are subject to a margin of error, determined by sample size, sampling methodology and response rate. The sample error for this sample at a 95% confidence level is a maximum of plus or minus 1.8%”.

The “sampling strategy” was not explained, but a 1.8% sample error means that if the same procedure is used many times over, 95% of the time the correct population average will be within the sample estimate plus or minus 1.8%. The “plus or minus” matters – a finding for the whole sample may be down by 1.8% or up by 1.8% – that’s the range of error.
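For readers who want to check such numbers, the textbook calculation is straightforward. The sketch below is illustrative only: it uses the standard formula for a simple random sample at the worst-case 50/50 split, whereas real pollsters adjust for sampling design and non-response, which is presumably why Ipsos quotes a slightly larger maximum than the naive formula produces.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of a 95% confidence interval for a proportion,
    assuming a simple random sample of size n (no design effect)."""
    return z * math.sqrt(p * (1 - p) / n)

# Ipsos sample of 3,600: the naive formula gives roughly +/- 1.6%;
# design effects and non-response push the quoted maximum towards 1.8%.
print(f"n=3,600: +/- {margin_of_error(3600):.1%}")

# Change Starts Now sample of 9,000: roughly +/- 1%.
print(f"n=9,000: +/- {margin_of_error(9000):.1%}")
```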

The raw data are then “weighted” to the total population. Weights are adjustment factors. If, for example, your fieldworkers realised a sample with men at 43%, they would be weighted up to the 49% they should be; the too-large female sample would be adjusted in the opposite direction.
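As a minimal, hypothetical illustration of that adjustment: a weight is simply the share a group should have in the population divided by the share it actually has in the realised sample.

```python
# Hypothetical figures from the example above: men are 43% of the realised
# sample but 49% of the population; women are 57% versus 51%.
population_share = {"men": 0.49, "women": 0.51}
sample_share = {"men": 0.43, "women": 0.57}

weights = {g: population_share[g] / sample_share[g] for g in population_share}
print(weights)  # men weighted up (~1.14), women weighted down (~0.89)

# Every answer from a male respondent then counts about 1.14 times,
# and every answer from a female respondent about 0.89 times.
```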

Sampling is clearly a key problem for both phone-based and internet surveys, which are also used in South Africa.

In 2023, South Africa was said to have internet penetration standing at 72% – so are the remaining 28% simply bypassed? What happens if that 28% of people are markedly different from the 72%, either demographically or in the views they hold?

Census 2022 found that 92% of the population had a cellphone. Again, what of the 8% who do not? 

In households where one person owns a cellphone, what about the characteristics of the others? 

In rural India, cellphone penetration stood at 91% at the time of Covid – but this varied widely depending on which voting district was being studied.

Furthermore, the study found, “poverty is just one dimension of exclusion. 

“In the Indian context, mobile phone ownership can vary across gender, age and caste. Previous surveys have found that within households, women and elderly individuals are less likely to have access to mobile phones than younger men.”

The parallels with South Africa are obvious.

To complicate matters further, many people own multiple cellphones, mixing work and personal, and using different providers. Others buy pre-loaded SIM cards and use them until empty, and buy another, so numbers change frequently. 

Many people loathe being cold-called, and refuse to answer, or put the phone down when irritated. In America, where phone surveying was formerly the gold standard, response rates have gone from 36% in 1997 to 6% today.

The role of the reader

Surveys and polls come with some rules that need to be made clear to readers, so the reader has to do some work too – if engagement is not entirely based on a priori disbelief, which seems an odd starting position.

The key question that readers ask when confronted by a poll is – “can we trust the findings?” To be able to answer that question, we now know that some basic knowledge of terminology helps, but the key criterion is transparency on the part of the pollster – about the poll and about what they are doing with the data.

Look at the slice of KwaZulu-Natal data from the Change Starts Now survey. Here you can see all the options available to respondents set out transparently – the only filter is the need for them to be registered voters.

[Graph: KwaZulu-Natal responses from the Change Starts Now survey, showing all answer options put to registered voters, including “won’t vote” and “DK/Refused”]

Clearly, this survey did not claim to predict the election – fieldwork occurred in November and a week of December 2023. No manifesto had been launched. No campaigning had begun. Established parties (especially the ANC) are traditionally polled at their lowest ebb this far out from voting day.

Thus, while some readers insist that the data must be recalculated to show only how the vote would look if it were election day, this is rather silly. If a reader wants that answer, transparency with the data – showing all the options (as our kids learn at school: show all your workings) – means they can calculate it themselves.

Simply subtract the “won’t vote” and “DK/Refused” groups (the larger red circle on the right) so that only declared voters remain in the frame (the red oval), and re-percentage (ie party votes are now calculated out of the 72% of KZN respondents who stated a vote preference): the ANC comes out at 36%, the DA at 21%, the EFF at 17% and “other” at 28% (percentages have been rounded and may not add up to 100%).
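The arithmetic takes only a few lines. The whole-sample shares below are hypothetical back-calculations chosen to roughly reproduce the published figures, not the pollster’s raw numbers; the point is only that “re-percentaging” means dropping the undeclared 28% and renormalising what is left.

```python
# Illustrative whole-sample shares for KZN (hypothetical, back-calculated
# to roughly match the published re-percentaged figures).
whole_sample = {
    "ANC": 0.26, "DA": 0.15, "EFF": 0.12, "Other": 0.19,
    "Won't vote / DK / Refused": 0.28,
}

# Drop the undeclared group and renormalise over declared voters only.
declared = {k: v for k, v in whole_sample.items()
            if k != "Won't vote / DK / Refused"}
total = sum(declared.values())  # 0.72
repercentaged = {k: v / total for k, v in declared.items()}

for party, share in repercentaged.items():
    print(f"{party}: {share:.0%}")
# ANC ~36%, DA ~21%, EFF ~17%; "Other" comes out ~26% here because the
# published, rounded figures sum to slightly more than 100%.
```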

But for smaller, new parties, suddenly it appears there are no available votes for them – everyone is already committed. 

However, you can see (from the original graph) that 28% of the sample has been re-percentaged out of existence. The story this far from the election is the full spread of views, not a desperate rush for a headline. 

When polls are released mere weeks away from 29 May, the graphs should show election predictions: six months out, that would be asinine.

But is that really the key finding here? 

Surely the fact that 28% of KZN respondents refused to give a party choice is profound: it speaks to the fear that remains prevalent in the province, and also the fact that organising for the newly launched uMkhonto Wesizwe (MK) party has been happening for a while – MK was not an option in the survey, so how many were refusing to choose a party because it wasn’t there? Surely it is better to see that large undeclared group than re-percentaging them away like pixie dust?

MK is taking votes from other parties, has a 28% pool of undeclared voters to fish in, and could be a major disrupter – that is the key takeaway from this graphic.

That 28% can also be “imputed”. Simply put, data are used to model the “typical” party voter (generally by demographics, preferably with social attitudes included), and any respondent who refused to choose a party but matches that profile is imputed to the party whose voters they seem to typify.

In this case, one may say that rural, African men with matric education and aged above 29 are typical Inkatha Freedom Party members: thus everyone who fits that profile but did not choose a party is imputed to the IFP tally.
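Mechanically – as a deliberately crude, hypothetical sketch, not anything a real pollster would publish – imputation by profile-matching might look like this: build a “typical voter” profile for each party from the declared respondents, then assign each undeclared respondent to the party whose typical profile they match.

```python
from collections import Counter

# Hypothetical respondents: (area, sex, education, age_band, declared_party)
respondents = [
    ("rural", "male", "matric", "30+", "IFP"),
    ("rural", "male", "matric", "30+", "IFP"),
    ("urban", "female", "tertiary", "18-29", "DA"),
    ("urban", "male", "matric", "18-29", "ANC"),
    ("rural", "male", "matric", "30+", None),        # refused to choose
    ("urban", "female", "tertiary", "18-29", None),  # refused to choose
]

# Build each party's most common ("typical") profile from declared respondents.
profiles = {}
for *profile, party in respondents:
    if party is not None:
        profiles.setdefault(party, Counter())[tuple(profile)] += 1
typical = {party: counts.most_common(1)[0][0] for party, counts in profiles.items()}

# Impute: an undeclared respondent who matches a party's typical profile
# is added to that party's tally.
imputed = []
for *profile, party in respondents:
    if party is None:
        match = next((p for p, prof in typical.items() if prof == tuple(profile)), None)
        imputed.append(match)

print(imputed)  # ['IFP', 'DA'] for this toy data
```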

This may help parties with backroom strategising. But it is entirely wrong to impute votes and not tell the reader exactly how you did so, and show the before and after data. (And most won’t want to show their imputation technology, since it is a strategic tool.)

Again, angry letters demand that recalculation and/or imputation be done – but with no value-add this far out from the election, it sounds rather inane.

So: check your pollster. Are they telling you how they ran the survey, what methodology they used, their error bar and showing you all their data? Or are they just showing you a recalculated (including imputation) final tally of their own, and asking that you “trust us”?

I know which one I prefer. DM

David Everatt is a Professor at the Wits School of Governance.
