We tend to think that vast amounts of data are representative by definition, but that is not necessarily the case. Big data should also be subjected to stress tests. Three experts enlighten us on the Ubers and Airbnbs of online access panels, the rise of the robots and the promise of artificial intelligence (AI) in this article, extracted from the ESOMAR 2019 Global Market Research report and edited for Research World.
With humans leaving an ever-larger digital footprint of what they think, feel, say and do, computationally analysing these growing volumes of data to reveal key patterns, trends and associations has – undoubtedly – been key to delivering actionable insights at scale over the last few years. This observation, made by Caroline Frankum, Global CEO of Kantar’s Profiles Division, comes with an added warning though. “Following the GDPR’s implementation last May, and as data privacy legislation intensifies – e.g. the California Consumer Privacy Act (CCPA) coming into effect in January 2020, and the growing uncertainty around third-party cookies – it is increasingly apparent that it is not the amount of data that’s important, but what organizations can do with data compliantly that matters.”
Even so, it is still often believed that vast amounts of data are representative by definition, notes Andrew Konya, CEO of Remesh, a platform that allows users to get qualitative insights at a quantitative scale to make better decisions. “We do encounter the common misconception that more data – bigger N – means higher confidence in results. However, most researchers seem very aware of how a non-representative sample of participants in quantitative research translates to lower quality results.”
Presidential poll
Pete Doe, Chief Research Officer at clypd, an audience-based sales platform for television advertising, observed earlier that online access panels have enabled cheap survey research to proliferate in the past decade. He now confirms that there is a natural tendency for people to think ‘bigger is better’, even among professionals with some statistical training. “People are taught that margins of error reduce as sample sizes increase, but they aren’t always taught about biases in measurement. So yes, it is a fairly common misconception, but this is not a new problem.” The 1936 Literary Digest presidential poll – which gathered over two million responses yet predicted the wrong winner because its sampling frame was skewed – remains the classic illustration.
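To make the distinction concrete, consider a minimal simulation (illustrative only: the population value and the panel’s over-representation are invented for the example). The margin of error shrinks as the sample grows, but the gap between the estimate and the truth does not.

```python
import math
import random

random.seed(42)

# Hypothetical "true" population rate, and a convenience panel that
# over-represents opinion-holders by ten points (both figures invented).
TRUE_RATE = 0.50
BIASED_RATE = 0.60

for n in (100, 10_000, 1_000_000):
    hits = sum(random.random() < BIASED_RATE for _ in range(n))
    estimate = hits / n
    # The 95% margin of error shrinks as 1/sqrt(n)...
    moe = 1.96 * math.sqrt(estimate * (1 - estimate) / n)
    # ...but the distance to the true value (the bias) does not.
    print(f"n={n:>9,}  estimate={estimate:.3f} ±{moe:.3f}  "
          f"error vs truth={estimate - TRUE_RATE:+.3f}")
```

However large n becomes, the estimate converges on the panel’s biased 60%, not the true 50%: a bigger N buys precision, not accuracy.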
Honesty and transparency
Big data keeps revolutionising the insights industry at breakneck speed, almost as if the many red flags were being deliberately ignored. But the representativity concerns are very real and need to be tackled. For Doe, the main question that any practical research user has to answer is: ‘Will I make a better decision if I use this data source?’ The answer to that question is not always clear, he stresses.
Some of the concerns about representativity are actually concerns about validity. “Inferring people’s behaviour from device activity is not simple, especially if the device is used by different people at different times.”
As for red flags, Doe thinks that users should be wary of companies that are unwilling or unable to explain their data sources. “No-one with a realistic outlook expects perfection in research, but we should expect honesty and transparency.”
Billions at risk
While online access panels have made cheap survey research ubiquitous, the results obtained from these convenience panels can be biased and unrepresentative due to the sampling methods employed. Several checks and stress tests can be applied to diminish these effects.
Doe believes these checks need to reflect the use of the data and the money at risk. For media transaction data (ratings), oversight from Joint Industry Committees (and the Media Rating Council in the USA) is worth the expenditure to assure quality when billions of dollars of ad spend ride on the data, he emphasises.
“And these services don’t use cheap online survey research data for that very reason. For research with a narrower scope and less money at risk, users should at least expect transparency around data sources, the recency of any classification data, details about the survey, response rates, sample sizes, data editing and projection methods.”
Ubers and Airbnbs
In order to illustrate the low-budget online panel trend, Frankum makes a telling comparison: “Tech-enabled entrants offering cheaper online access panels are like the Ubers and Airbnbs of the panel world – they do not own any panellists, but rent opt-in panel assets to leverage their technology.”
“Clients are increasingly looking for richer profiles of consumers. This means compliantly matching behavioural data to proprietary profile attributes to create more addressable audiences, before a single survey question has been asked. This is something the cheaper online access panels are not set up for, or accredited to do. So, their data is more limited in how it can be used.”
Increased integration
The use of big data for gaining insights is still very much in development and will see further changes over the next few years. Doe believes it will continue to grow, and that there will be even cheaper options available. “But these are unlikely to deliver quality findings. There could be attempts to harness AI to synthesise insights, for example by inferring behaviours and attitudes from online data, perhaps harnessing voice-activated device data, within privacy constraints.”
The infrastructure and algorithms people refer to as big data will enable increased integration between passively collected data across multiple channels and first-person research data, predicts Konya. The result is likely to be insights that are increasingly linked to behavioural impact.
Profound shift
Doe stresses that data is not research. “Data is a raw material that needs to be refined before it can be used as part of a research study, and that requires research and statistical expertise.” As having lots of data becomes easier, Konya thinks people will shift their focus from ‘what is the N?’ to ‘how confident are we in this result?’ That shift will likely move the industry away from delivering the minimum cost per participant – or data point – towards delivering the lowest cost per unit of confidence.
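One way to read ‘cost per unit of confidence’ is to compare data sources not on price per respondent but on what it costs to reach a target margin of error, remembering that systematic bias sets an error floor that no extra spend removes. The sketch below uses entirely hypothetical prices and bias figures, and a deliberately simplistic additive error model.

```python
import math

def n_for_moe(target_moe: float, p: float = 0.5, z: float = 1.96) -> int:
    """Respondents needed for a 95% margin of error, worst case p = 0.5."""
    return math.ceil(z ** 2 * p * (1 - p) / target_moe ** 2)

# Hypothetical sources: (cost per respondent in $, systematic bias in points).
sources = {
    "cheap access panel": (0.50, 0.040),
    "profiled panel": (2.00, 0.005),
}

target_moe = 0.03  # we want estimates within roughly ±3 points
for name, (cost, bias) in sources.items():
    n = n_for_moe(target_moe)
    # Sampling error can be bought down with more respondents;
    # systematic bias cannot, so it sets the achievable floor.
    total_error = target_moe + bias
    print(f"{name}: n={n:,}, spend=${n * cost:,.0f}, "
          f"achievable error ≈ ±{total_error:.3f}")
```

On these made-up numbers the cheap panel costs a quarter as much per respondent, yet no amount of additional spend gets it below roughly ±7 points, while the pricier, well-profiled panel actually reaches the ±3-point goal – a cheaper unit of confidence despite the higher unit price.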
At Kantar, Frankum and her team are currently learning a lot from working with AI. “AI is also something to keep a close eye on when it comes to making big data more representative for insights and market research in the future.”
This article is an excerpt of the original “Big data and representativity”, published in the ESOMAR 2019 Global Market Research report. Read the full content by accessing the report.