Colin Strong
It has long been recognised by those working with data that, given a large enough sample size, most variables will show statistically significant correlations with one another, because at some level everything is related to everything else. The psychologist Paul Meehl famously called this the ‘Crud Factor’: it leads us to believe there are real relationships in the data when in fact the linkage is trivial.
Nate Silver made the same point when he warned that the number of ‘meaningful relationships’ is not increasing in step with the meteoric increase in the amount of data available. We simply generate a larger number of false positives, an issue so endemic in data analytics that it led John Ioannidis to suggest that two-thirds of the findings in medical journals were in fact not robust.
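To see how easily such false positives arise, consider a minimal sketch in Python (using NumPy and SciPy, with entirely simulated data rather than anything from the studies above): generate 100 variables that are unrelated by construction and test every pair for a ‘significant’ correlation. At the conventional 5 per cent threshold, a couple of hundred of the roughly 5,000 pairs will look meaningful purely by chance.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# 100 completely independent variables, 1,000 observations each:
# by construction there is no real relationship anywhere in this data set.
n_obs, n_vars = 1000, 100
data = rng.normal(size=(n_obs, n_vars))

# Test every pair of variables for a 'significant' correlation at p < 0.05.
false_positives = 0
n_pairs = 0
for i in range(n_vars):
    for j in range(i + 1, n_vars):
        r, p = stats.pearsonr(data[:, i], data[:, j])
        n_pairs += 1
        if p < 0.05:
            false_positives += 1

# Roughly 5% of the 4,950 pairs (around 250 'findings') will look
# significant purely by chance.
print(f"{false_positives} of {n_pairs} pairs appear 'significant'")
```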
So if we cannot always rely on statistical techniques to cut through swathes of data and find meaningful patterns, where do we turn? Naturally, we look to ourselves. Perhaps this is implicit in the discussion about the qualities of good data scientists: that they should be ‘informed sceptics’ who balance judgement and analysis, or that the key qualities are a sense of wonder, a quantitative knack, persistence and technical skill. However, as soon as we recognise that humans are involved in the analysis of data, we need to start exploring some of the frailties of our judgement, for if there is one thing that behavioural economics has taught us, it is that none of us is immune from misinterpreting data.
One cognitive function that is surely critical for any data scientist is the ability to find order and spot patterns in data. As humans we are excellent at this, and for good evolutionary reasons: our ability to spot patterns is what drives new findings and thus our advancement. But as with all cognitive functions, our strength is also a weakness. Pattern detection is so integral to how we think that it can tip over into detecting patterns where none exist.
As Thomas Gilovich points out in his classic book ‘How We Know What Isn’t So’, one of the problems we encounter when looking at data is that it is very hard for us to tell when it is random and when it genuinely contains patterns. When we look at a truly random sequence we tend to think there are patterns in it because it somehow looks too ordered or ‘lumpy’. For example, when we toss a coin twenty times there is a 50 per cent chance of getting four heads in a row, a 25 per cent chance of five in a row, and a 10 per cent chance of a run of six. Yet if you show such a sequence to most people, they will conclude that it contains patterns and is not random at all. This explains the ‘hot hand’ fallacy, where we think we are on a winning streak, whether in cards, basketball or football. In each of these areas the data is random but happens to include a sequence, and we massively over-interpret the importance of that pattern.
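A quick simulation makes the point. The sketch below is illustrative rather than Gilovich’s own calculation: it estimates how often twenty fair coin tosses contain a run of heads of a given length, and the estimates land close to the figures quoted above.

```python
import random

def longest_run_of_heads(n_flips):
    """Longest consecutive run of heads in a sequence of fair coin flips."""
    longest = current = 0
    for _ in range(n_flips):
        if random.random() < 0.5:          # heads
            current += 1
            longest = max(longest, current)
        else:                              # tails resets the run
            current = 0
    return longest

# Estimate how often 20 fair flips contain a run of at least k heads.
trials = 100_000
runs = [longest_run_of_heads(20) for _ in range(trials)]
for k in (4, 5, 6):
    share = sum(r >= k for r in runs) / trials
    print(f"run of {k}+ heads in 20 flips: {share:.0%}")
```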
This effect does not only apply to numeric data but also to the analysis of visuals, which matters because visualisation is rapidly becoming a key element of Big Data analytics. A good example of the pitfalls comes from the latter part of World War II, when the Germans mounted a particularly intense bombing campaign against London. It was a commonly held view at the time that the bombs were landing in clusters, which made some parts of London more dangerous than others. However, post-war analysis of the data showed that the bombs had in fact landed at random and that no part of London was more dangerous than any other.
It is easy to see why Londoners concluded at the time that there was a pattern in the bombing: eyeballing the data retrospectively makes it all too easy to start seeing clusters. A more rigorous approach, of course, requires us to generate hypotheses which are then tested on other sets of data.
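One way to convince yourself of this is to generate genuinely random ‘bomb’ positions and count the hits falling in each square of a grid. The sketch below uses illustrative numbers rather than the historical data, yet the lumpiness it produces shows how easily pure randomness can look like clustering.

```python
import numpy as np

rng = np.random.default_rng(0)

# Drop 537 'bombs' uniformly at random over a 24 x 24 grid of equal squares
# (illustrative figures, not the wartime data set).
n_bombs, grid = 537, 24
x = rng.integers(0, grid, n_bombs)
y = rng.integers(0, grid, n_bombs)

hits = np.zeros((grid, grid), dtype=int)
np.add.at(hits, (x, y), 1)

# Even though every square is equally 'dangerous', the counts are lumpy:
# many squares receive no hits at all while a few receive three or more.
counts = np.bincount(hits.ravel())
for k, n_squares in enumerate(counts):
    print(f"{n_squares} squares received {k} hits")
```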
And of course once we start seeing patterns we quickly start to generate stories to explain the data. As Duncan Watts might say, everything is obvious once you know the answer. We hate uncertainty and strive to reduce it by quickly adopting explanations. The challenge we then face is that we tend to seek out only information which is consistent with the story we have developed. This ‘confirmation bias’ was first identified in a series of experiments in the 1960s which showed that we seek out data which confirms our theory rather than tests it. And when we get new information we tend to interpret it in a way that is self-serving. So we cement in our misinterpretation of data alarmingly quickly.
Of course, seeing patterns in data is hugely helpful when it allows us to generate hypotheses that become the starting point for proper testing. And this is where Big Data needs to adopt the rigour of the experimental approach, whereby we state our objectives, formulate tangible hypotheses that reflect those objectives, and then design experiments to test them. Market research should be well placed in this respect, given that our understanding of consumers allows us to identify ways to mine data and to build these hypotheses.
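As a sketch of that workflow (the data, variable names and effect size below are entirely invented for illustration), we can generate a hypothesis on an exploratory half of a data set and then test that single, pre-stated hypothesis on the held-out half.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Invented example: did a randomly assigned promotion lift customer spend?
n = 2000
promo = rng.integers(0, 2, size=n)                        # 0 = control, 1 = promotion
spend = rng.gamma(shape=2.0, scale=25.0, size=n) + promo * 3.0   # small real effect

# Step 1: explore the first half of the data to generate a hypothesis...
explore_spend, explore_promo = spend[:1000], promo[:1000]
lift = (explore_spend[explore_promo == 1].mean()
        - explore_spend[explore_promo == 0].mean())
print(f"exploratory lift: {lift:.2f}")

# Step 2: ...then test that single pre-stated hypothesis on the held-out half.
confirm_spend, confirm_promo = spend[1000:], promo[1000:]
t, p = stats.ttest_ind(confirm_spend[confirm_promo == 1],
                       confirm_spend[confirm_promo == 0])
print(f"confirmatory test on held-out data: t = {t:.2f}, p = {p:.4f}")
```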
If we fail to do this and simply ‘let the data speak for itself’, then we run the risk of being misled by an unholy alliance of the Crud Factor of spurious statistical correlations and fallible human judgement. Far from Big Data being the end of science, the scientific method may in fact prove to be the salvation of Big Data.
Colin Strong heads up technology research at GfK in the UK. He is particularly interested in the way our lives are increasingly mediated by data – between consumers, brands and government. Colin is interested in exploring the opportunities this represents for a better understanding of human behaviour but also to examine the implications for brand strategy and social policy.
5 comments
Thanks, very informative with great examples of The Crud Factor in action.
Thanks, Colin.
A very pertinent and timely piece.
As a complementary point to yours, I’d also add that the hypotheses to be tested should ideally come from outside the knowledge base for that data.
So, for example, hypotheses on shopper behaviour should come from theories rooted in psychology or sociology rather than in shopper market research.
Thanks Sandra – I do agree – although let’s not underestimate the value of category understanding and expertise. Integrating these different approaches seems key to me.