Is unstructured data the new lifeblood of market research?

The “briefing questions for unstructured data” were published by ESOMAR last week, and not a moment too soon. It was high time to replace the “24 Questions to help buyers of social media research”, published in 2012, with an updated document that also guides buyers of social intelligence and text analytics solutions for sources other than social media. This comprehensive work includes 26 questions and took a year to complete, from inception to publication.

In 1998, Merrill Lynch cited a rule of thumb that somewhere around 80-90% of all potentially usable business information may originate in unstructured form.^[1] Today we see even stronger statements floating around, such as: over 90% of all the human knowledge accumulated since the beginning of time is unstructured data. This includes text, images, audio and video. We can think of the other 10% as numbers in tables (structured data), which are the primary result of any quantitative market or marketing research.

Reading, listening to, or viewing unstructured data is not the only way to understand their meaning. Especially if we are dealing with big data, there is only one way to discover and understand the information hidden in mega-, giga-, tera-, peta- or n-ta-bytes of data: artificial intelligence. With machine learning – which is the discipline that produces A.I. – we have the ability to create models that can process large files of text or images in seconds, and annotate sentences, paragraphs, sections, objects or even whole documents with topics, sentiment and specific emotions. Sentiment and semantic analysis are the two most popular ways to analyse and understand unstructured data, using either machine learning or a rules-based approach. When the unstructured data to be analysed are in text format, the discipline falls under Computer Science (not linguistics, funnily enough) and is called Natural Language Processing (NLP) or Text Analytics.
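To make the idea of annotation concrete, here is a minimal sketch of the rules-based approach applied to sentence-level sentiment. The word lists, the negation rule and the function names are all assumptions made for illustration; they are not taken from the ESOMAR briefing or from any vendor's product, and a real system would be far more elaborate.

```python
import re

# Hypothetical, deliberately tiny word lists; real lexicons are far larger.
POSITIVE = {"love", "great", "excellent", "good", "happy"}
NEGATIVE = {"hate", "terrible", "poor", "bad", "angry"}
NEGATORS = {"not", "never", "no"}


def annotate_sentiment(sentence: str) -> str:
    """Label one sentence as positive, negative or neutral using word lists."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    score = 0
    for i, token in enumerate(tokens):
        # +1 for a positive word, -1 for a negative word, 0 otherwise.
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        # Flip the polarity when the previous word is a negator ("not good").
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def annotate_document(text: str) -> list[tuple[str, str]]:
    """Split a text into sentences and annotate each one with a sentiment."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [(s, annotate_sentiment(s)) for s in sentences]


if __name__ == "__main__":
    for sentence, label in annotate_document(
        "I love this phone. The battery is not good. It arrived on Tuesday."
    ):
        print(f"{label:>8}: {sentence}")
```

A machine learning model replaces the hand-written word lists and rules with patterns learned from examples that humans have already annotated, which is one reason the two approaches can differ so much in accuracy.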

There seems to be a lot of perceived complexity in using artificial intelligence or other engineered approaches to analyse text and images in an automated way. We need either to simplify radically, or to acquire enough knowledge to understand what may seem complex and difficult at first glance. Whatever the case, this article and certainly the ESOMAR briefing aim to educate and simplify at the same time.

Some machine learning basics

Before we dive into the various use cases of NLP and practical applications of the briefing, let’s set the stage with some basic information on what is possible, and what not so much.

What is possible with machine learning:

over 80% agreement between humans and the sentiment and semantic annotations of a machine learning model
to achieve the above accuracy regardless of the language of the text
to annotate text for multiple emotions that go beyond positive and negative sentiment
to automatically caption millions of images using text that can then be analysed for topics and sentiment
to analyse data from any source – machine learning is not only language but also data source agnostic
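The agreement figure in the first item of this list can be read as the share of items on which a human annotator and the model assign the same label. The sketch below assumes exactly that definition; the function name and the sample labels are hypothetical, and a vendor may measure accuracy differently, which is worth asking about.

```python
def percent_agreement(human_labels, model_labels):
    """Share of items on which the human and the model chose the same label."""
    if len(human_labels) != len(model_labels):
        raise ValueError("Both label lists must cover the same items.")
    if not human_labels:
        raise ValueError("At least one annotated item is required.")
    matches = sum(h == m for h, m in zip(human_labels, model_labels))
    return matches / len(human_labels)


# Hypothetical sample: five posts annotated by one person and by a model.
human = ["positive", "negative", "neutral", "positive", "negative"]
model = ["positive", "negative", "positive", "positive", "negative"]

print(f"Agreement: {percent_agreement(human, model):.0%}")  # prints "Agreement: 80%"
```

In practice the sample would be a random selection of a few hundred posts rather than five, so that the figure is stable enough to compare against a threshold such as 80%.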

What is really difficult to achieve with machine learning:

100% agreement of multiple humans with the annotations of a machine learning model (100% agreement of one human with the machine learning model is achievable)
over 70% accuracy with a subject-generic machine learning model in a given language
over 70% accuracy using a rules-based approach
combined accuracy for brand, sentiment and topics over 70%
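One way to see why the combined figure is so demanding: a post only counts as correct when brand, sentiment and topic are all right at once. If each of three annotations were correct 85% of the time and the errors were independent (an assumption made here purely for illustration), all three would be correct only about 61% of the time, since 0.85 × 0.85 × 0.85 ≈ 0.61. The sketch below computes per-field and combined accuracy on a hypothetical hand-made sample; the field names and records are invented.

```python
FIELDS = ("brand", "sentiment", "topic")  # hypothetical annotation fields


def field_accuracy(predicted, gold, field):
    """Share of posts where one annotation field matches the human label."""
    return sum(p[field] == g[field] for p, g in zip(predicted, gold)) / len(gold)


def combined_accuracy(predicted, gold, fields=FIELDS):
    """Share of posts where every annotation field matches the human label."""
    hits = sum(all(p[f] == g[f] for f in fields) for p, g in zip(predicted, gold))
    return hits / len(gold)


# Hypothetical human ("gold") annotations for four posts.
gold = [
    {"brand": "A", "sentiment": "positive", "topic": "price"},
    {"brand": "A", "sentiment": "negative", "topic": "service"},
    {"brand": "B", "sentiment": "positive", "topic": "design"},
    {"brand": "B", "sentiment": "neutral", "topic": "price"},
]
# Hypothetical model output for the same posts.
predicted = [
    {"brand": "A", "sentiment": "positive", "topic": "price"},    # all correct
    {"brand": "B", "sentiment": "negative", "topic": "service"},  # brand wrong
    {"brand": "B", "sentiment": "negative", "topic": "design"},   # sentiment wrong
    {"brand": "B", "sentiment": "neutral", "topic": "price"},     # all correct
]

for field in FIELDS:
    print(f"{field}: {field_accuracy(predicted, gold, field):.0%}")
print(f"combined: {combined_accuracy(predicted, gold):.0%}")
```

In this sample brand and sentiment are each right on three posts out of four and topic on all four, yet only two posts out of four are right on every field.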

Use cases for unstructured data analytics

There is a multitude of users, data sources and use cases within an organisation, and all of them can benefit from the document ESOMAR has published. Let’s start with relevant data sources:

Social Media
Other public websites
Answers to open ended questions
Transcripts of in-depth interviews and focus group discussions
Call centre conversations with customers
Organic conversations on private online communities

ESOMAR mainly caters to market researchers in organisations globally, but there are many more users of text and image analytics solutions sitting in different departments who can benefit from this briefing. Here is a combined list of users with a use case example for each one; it is not exhaustive by any means:

Market research – for insights from social and other unstructured data sources
Public relations – to manage brand and corporate reputation
Customer service – to respond to questions, complaints and requests
Advertising – to leverage positive testimonials
Marketing – to find and leverage influencers
Product Development – to learn about missing product features or ones that are not appreciated by consumers
Innovation (beyond new product development) – to learn about emerging trends and new product use cases
Competitive Intelligence – to gauge how competitors are doing in an industry or product category
Operations – to learn about issues that need fixing
Finance (together with marketing) – to find out about sentiment towards pricing
Board – to benchmark and track sentiment on governance
Sales – to find sales leads who express purchase intent

Because there are so many use cases, many tools initially started with a single use case in mind – the most popular ones being public relations, reputation management and customer care. As time went by, these tools were looking for growth, so they – almost without exception – decided to dabble in the market research sector. This very fact created an immense problem for the market research industry: it delayed the adoption of social intelligence solutions, i.e. the use of text and image analytics to process and annotate unsolicited opinions on the web for consumer insights purposes.

To offer some more clarity, the delay happened because insights professionals tried out some of the social media monitoring tools that were around in 2010-2012, and found that their accuracy was so low that they could not be used for market research purposes. This is why ESOMAR created the 24 questions to help buyers of social media research back in 2012. By that time the market research world had already written off social media listening – as many called it – as not accurate, not representative, and by extension not only useless but also possibly harmful.

Fast forward to 2019 and, thankfully, perceptions have changed. It was proven to the powers that be that social media data can be cleaned of irrelevant posts, and that text analytics can be accurate enough for market research purposes. This is what makes the ESOMAR guide, comprising 26 questions to ask before you buy your way into an automated text and image analytics capability, so timely and necessary, not just for market researchers but for all prospective buyers out there.

Practical applications of the questions in this briefing

There are five sections in the document that are meant to guide buyers of related tools and services to ask vendors the right questions. The answers to these questions will enable buyers to make an informed purchasing decision. Here are the five sections:

Company Profile and Capabilities

First of all it is important to know who we are dealing with. Is this a pure technology company with a tool or do they have any subject matter expertise? For example, market research and insights expertise would be nice if the buyer is an insights professional.

Data sources and types

Does this solution make use of specific data sources that it provides as part of the service, or is it just an analytics solution, meaning the client should provide the data for the analysis? Even if the company provides data, is the technology source agnostic? In other words, can it process and accurately annotate all six source examples listed above?

Software design and capabilities

This section is one of the two most important ones. It helps the buyer understand how the data processing, annotation and analysis are done, in which languages, and what types of data are analysed.

Data quality and validation

This is the other one of the two most important sections in the briefing. We all know the saying: garbage in – garbage out. This is about cleaning the data before processing and annotating them.
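As a rough illustration of what cleaning can involve, the sketch below removes duplicate posts and posts that are irrelevant to the subject being studied. It assumes a simple keyword approach with hypothetical include and exclude terms; the function names, terms and sample posts are invented, and vendors may well rely on trained relevance models instead.

```python
import re


def normalise(text: str) -> str:
    """Lower-case a post, drop URLs and collapse whitespace."""
    text = re.sub(r"https?://\S+", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def clean_posts(posts, include_terms, exclude_terms):
    """Keep unique posts that mention an include term and no exclude term.

    Matching is plain substring matching, a simplification that can
    over-match (for example "zoo" also matches "zoom").
    """
    seen = set()
    cleaned = []
    for post in posts:
        key = normalise(post)
        if not key or key in seen:
            continue  # empty after normalisation, or a duplicate
        seen.add(key)
        if not any(term in key for term in include_terms):
            continue  # does not mention the subject at all
        if any(term in key for term in exclude_terms):
            continue  # mentions the subject in an irrelevant sense
        cleaned.append(post)
    return cleaned


# Hypothetical example: a study about a car brand whose name is also an animal.
posts = [
    "Test drove the new Jaguar today, loved it",
    "Test drove the new Jaguar today, loved it  http://t.co/abc",
    "Saw a jaguar at the zoo",
    "Great weather this weekend",
]
print(clean_posts(posts, include_terms=["jaguar"], exclude_terms=["zoo", "rainforest"]))
```

Only the first post survives here: the second is a duplicate once the link is stripped, the third is about the animal, and the fourth never mentions the brand.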

Ethical and legal compliance

The ESOMAR code of conduct has always been stricter than the law and this briefing is no different. Not only should the vendor be GDPR compliant, but they should also ensure that no harm is done to subjects in the research no matter how insignificant it may seem.

For some questions there are no right or wrong answers; the vendor just needs to have a plausible answer – if they do not, then that in itself would constitute a red flag. As an example, if on question 18 about the vendor’s minimum accuracy the answer is “What do you mean?” then a good next step for the buyer would be to walk away… and fast!

Tip for the uninitiated

Consumer research has typically been performed by asking questions in surveys or qualitative research. For many insights professionals, social media intelligence, or intelligence extracted from other unstructured data sources, is fairly new. If this guide is your first exposure to Natural Language Processing or image analytics, then it is possible that some of the questions or explanations for context will not be enough to give you a thorough understanding of the issue the guide is trying to address. In such a case, feel free to contact ESOMAR or the project team co-chairs directly with your questions.

If it turns out that we need to create answers to frequently asked questions about the 26 questions and their possible answers, then this may imply that we did not do such a good job of simplifying for our audience. The only consolation is that even if the briefing contains a lot of complexity, it is a step in the right direction. Thank you, ESOMAR, for being open, flexible and very supportive of this initiative.
