Little Data

Jerry W. Thomas

You are not feeling well, so you visit your friendly family doctor. He puts you in a new, electronic scanner and generates 28 trillion measurements of your temperature all over the surface of your body. He then saves all of these big-data measurements and, using advanced statistical algorithms and supercomputers, announces that your temperature is 98.6 degrees Fahrenheit. What a relief! Big data to the rescue.

The Bandwagon
As the “big data” bandwagon picks up momentum, consultants, professors, conference organisers, authors, magazines, blogs, software firms, pundits, crooks, private equity firms, and computer hardware manufacturers clamour to get aboard. Rarely has a bandwagon attracted so much attention or so many passengers. The basic premises of big data appear to be that:

More data is always better than less data.
Volume, variety and velocity of data create new sources of potential knowledge and prescience.
With big data, all questions can be answered; the “why” will finally be revealed to the human race, and the future can be accurately predicted.

Is big data an accurate picture of the future, or is it simply a mirage shimmering in the distant desert heat? Is it the pathway to ultimate truth, or is it only a bandwagon of exaggerated promises and illusory dreams?

The truth is that the solution to marketing and business problems, and the identification of strategic opportunities, often lies in the realm of little data, not big data. You don’t have to boil the ocean to determine its salt content. You don’t have to eat the whole steer to know it’s tough.

The Limits of Data
The preponderance of business data — indeed, all data — in the world is historical data, or “tracking” data, such as financial data, sales data, customer behavioural data, weather data, and inventory data. Virtually all data tends to be backward-looking, analogous to looking in the rearview mirror to steer a car forward.

No matter how current or instantaneous the data are (i.e., the velocity), and no matter how large the volume, the backward-looking bias is an omnipresent limitation. We might see trends in that data that give us an inkling of the near-term future, and we might be able to find out what has driven a firm’s success in the past, but most historical data are of limited value in predicting the future.

Data You Can Trust
Often, without thinking, we tend to see all data as equal, but rarely is this true. The corporate world is awash in data. It streams in from all directions 24 hours a day, and the data deluge continues to worsen.

In fact, the growing flood of data is part of the problem. More data often means more confusion. Which data are correct? What data can be trusted? Here’s a point of view on the trustworthiness of various types of data, ranked from most trustworthy to least:

Experimental Data. Carefully designed and carefully controlled experiments, conducted by objective third parties who are experts in such experiments, yield the most trustworthy data. Before-after and side-by-side controls are employed, along with sophisticated statistical analyses, to separate the noise from the signal.
Survey Research Data. Scientific research studies, conducted by experienced professionals who are objective third parties, yield trustworthy data. Often this data is experimental in nature. Research design, normative data, mathematical modelling, stimulus controls, statistical controls, historical experience, quality-assurance standards, etc., tend to make this data very precise. Noise tends to be minimal.
Marketing-Mix Modelling Data. The creation of an analytical database, the cleansing and normalising of that data, and the use of multivariate statistics and modelling to isolate and neutralise some of the noise tend to make marketing-mix modelling data better than actual sales data. The signal in marketing-mix modelling data is more stable, more reliable and more measurable. This type of data can be valuable in helping companies understand what variables are driving their businesses (is it media advertising, or the number of salesmen, or pricing differentials?), but it generally takes multiple years of data to get maximum value out of marketing-mix modelling.
Media-Mix Modelling Data. This is the same concept as marketing-mix modelling, just applied to a different set of variables. The same general rules apply. An analytic database, data cleansing, modelling and statistics allow the noise in the data to be minimised, so that the effects of various media can be isolated. Again, if combined with controlled experiments, the data and analyses are much more explanatory.
Sales Data. Sales data are pretty good, but not perfect, measures of actual sales. But sales are not reliable and valid measures of advertising effectiveness, optimal media spending, product quality, service productivity, competitive activities, etc. Sales data can only be trusted so far. The economy, competitive activity, the weather, inflation, the vacation cycle, news events, political events, aberrations in inventories and distribution, pricing disturbances, etc., create false echoes and distorted illusions. Sales data are not good measures of cause and effect. Sales are reasonably good measures of what happened, but not why it happened or what forces caused it to happen.
Eye-Tracking Data. With steady improvements in measurement equipment and software, the direction the human eye is pointing can be determined with a high degree of accuracy — less than one degree of error in a controlled environment with high-quality equipment. This can provide useful diagnostic information to help understand why a package, website or advertisement is failing to attract attention or failing to register certain messages or images.
Biometric or Physiological Measurements. Galvanic skin response, eye pupil dilation, heart rate, EEG (brainwave) measurements, facial emotion recognition, etc., are very interesting and exciting, and they may one day open portals into the human soul, but for the present these measures are largely speculative and unproven. Some of these measures are reasonably good at tracking arousal, but there’s no precise way to know if the arousal is positive or negative without bringing in survey or qualitative research.
Communities or Advisory Panel Data. Many large companies have bought into systems that allow them to frequently talk to and survey a small group of target consumers over and over again. Surveys among this group are conducted by various folks in the corporation on a daily or weekly basis. The cost per survey or measurement is relatively low, provided the quality of outcomes is not taken into account. Such communities are not truly representative, not randomly chosen, and seldom validated. Over time, the risks of conditioning and learning undermine the representativeness of the community, assuming it existed at the outset.

Social-Media Data. Social-media data are very popular in corporate America. The data are comparatively inexpensive, often massive, and real-time (day by day, hour by hour). Many new software tools and systems make analyses of the data relatively easy. Social-media data are, perhaps, most valuable as an early-warning system — of something going wrong, of a competitive initiative, or of an unexpected aberration. Social-media data, however, must always be viewed with suspicion and skepticism, for several reasons:

Many product categories and brands are scarcely ever mentioned in social media, making sample sizes too small for data reliability.
Social-media comments are influenced by the news cycle, special events, media advertising, promotions, publicity, movies, competitive activity, and television shows (i.e., there is a lot of noise in the data).
Social-media data are subject to manipulation. You may think you are following an important trend in the data, only to learn later that it was a clever ruse by a competitor to confuse you. Increasingly, corporations and other organisations are striving to create social-media content and manage social-media comments, so the research value of the data is rapidly diminishing.
As social-media comments are identified and collected via web scraping, we almost never know the exact source, the context, the stimulus, or the history that underlie a comment. These unknowns make interpretation risky, indeed. That’s why social-media data must be viewed with trepid spirit and jaundiced eye.

Little Data
Corporate decision makers often would be better served if they relied on tried-and-true tools and systems from the world of little data, rather than illusions from big data. Sampling theory teaches that if the sample is random, one can measure the behaviour or mood of the whole by talking to very few people.

A sample of 1,500 is sufficient to predict who will win a presidential election. A sample of 200 to 300 respondents is generally sufficient to predict how much the whole population will like a new product or service. A sample of 200 users can test a new peanut butter in-home for a week, and from this it can be precisely determined if the product is optimal and what its market share will be once introduced.
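To make the sample sizes above concrete, here is a minimal sketch of the standard margin-of-error calculation for a proportion. It is an illustration, not the author's own method, and it rests on assumptions the article does not spell out: a simple random sample, a population much larger than the sample, the normal approximation, a 95 percent confidence level (z = 1.96), and the worst-case proportion p = 0.5. The function name `margin_of_error` is hypothetical.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate half-width of a confidence interval for a proportion.

    Assumes simple random sampling from a large population and the
    normal approximation; z=1.96 corresponds to roughly 95% confidence,
    and p=0.5 is the worst case (widest interval).
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    return z * math.sqrt(p * (1 - p) / n)


# Sample sizes mentioned in the text
for n in (200, 300, 1500):
    moe_points = margin_of_error(n) * 100
    print(f"n={n}: about +/-{moe_points:.1f} percentage points")
```

Under these assumptions, 1,500 respondents give roughly plus or minus 2.5 percentage points, and 200 to 300 respondents give roughly plus or minus 6 to 7 points. The margin shrinks only with the square root of the sample size, which is why ever-larger samples yield diminishing gains in precision.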

These are examples of little data. Survey research is relatively inexpensive, yet very accurate, because professional researchers know the source, stimulus, context, and history — and have tried-and-true measuring instruments, normative data, quality assurance, and controls.

Marketing research can be designed to be forward-looking and predictive, rather than backward-looking. Experienced researchers can create alternative futures and measure the relative appeal of the differing visions of the future. These researchers can predict the sales volume of new products within narrow tolerances, based on survey research. They can optimise the formulation of a new product via product testing. They can accurately predict the effectiveness of new commercials long before they air. They can measure the size and composition of an industry or category with amazing precision, based solely on scientific sampling and surveys.

All of this research is based on little data. The data are derived from random sampling, carefully controlled experiments, and/or scientific surveys. The sample and sampling error are known; the stimulus is known; the questions are known; the context is understood; and the meaning of the answers is known.

Despite the marketing hoopla and gurus touting big data, little data often provides a more accurate basis for sound corporate decision-making.

Jerry W. Thomas is President and Chief Executive of Decision Analyst Inc.
