For all kinds of analytics, the first thing we should always ask is: how reliable are the data, and how can we benefit from them?
In the field of analytics, our data should have at least the following characteristics:
Characteristics of Data
For any form of data to be remotely useful, we first need to make sure that they are accurate. Flaws in data collection and implementation can taint the data, rendering them useless.
To obtain accurate data, we need to define the purpose of the data and the context in which they will be consumed.
A good example is capturing the level of consumer interest in a particular product.
How do we define “interest level”?
When a user lands on the product page? Or when the user adds the product to the cart?
Only by clearly defining the purpose and context can we ensure the accuracy of the data.
Without complete data, the insights drawn will either be flawed or skewed.
Ever encountered a case where the result/insight posted by a company is highly controversial?
Try googling “Singapore train breakdown”.
If the agenda is to paint the company in a good light, selective and incomplete data are the way to go.
But if we are genuinely interested in knowing how we are faring so that we can make improvements, data completeness is of utmost importance.
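To see how selective data can skew an insight, here is a minimal sketch. The daily delay figures and the 10-minute cut-off are made up purely for illustration and do not come from any real service.

```python
from statistics import mean

# Hypothetical daily delay figures in minutes; not real data.
daily_delay_minutes = [2, 3, 1, 45, 2, 60, 3]

# Complete data: every day is included.
complete_average = mean(daily_delay_minutes)

# Selective data: days with a delay above 10 minutes are quietly dropped.
selected_days = [d for d in daily_delay_minutes if d <= 10]
selective_average = mean(selected_days)

print(f"Average with complete data:  {complete_average:.1f} min")
print(f"Average with selective data: {selective_average:.1f} min")
```

Both averages are computed correctly, yet the selective one looks far better simply because the worst days were left out.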
There are 2 parts to consistent data. One is to make sure that the naming of values is consistent; the other is to ensure that the data collection method is consistent.
Example – Inconsistent naming of values
For an e-Commerce website, a product can appear on many different webpages, such as the home page, product page and campaign page. And on each of these pages, the name of the same product might differ slightly.
Home page – “Fabuex Washing Powder”
Product page – “Fabuex Washing Powder – Soft Edition”
Campaign page – “Fabuex Washing Powder – Your Trusted Brand”
If we simply use these product names for analytics purposes, what we will have are 3 different pieces of data referring to the same product.
Remember, analytics tools have no reliable way to tell whether these values refer to the same product or to different ones. In many analytics tools, we can do post-processing of data, but… why not do it right in the first place?
We have covered this in more detail here.
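As a rough illustration of the naming problem, the sketch below counts the same three page views twice: once by display name and once by a stable identifier. The `product_id` field and the value `SKU-001` are assumptions made for this example; the point is only that a consistent key keeps the data together.

```python
from collections import Counter

# Hypothetical page-view events: one product shown under three display names.
# The "product_id" field is an assumption added for this illustration.
events = [
    {"page": "home", "product_id": "SKU-001",
     "name": "Fabuex Washing Powder"},
    {"page": "product", "product_id": "SKU-001",
     "name": "Fabuex Washing Powder – Soft Edition"},
    {"page": "campaign", "product_id": "SKU-001",
     "name": "Fabuex Washing Powder – Your Trusted Brand"},
]


def count_views(events, key):
    """Count events grouped by the given field."""
    return Counter(event[key] for event in events)


views_by_name = count_views(events, "name")      # three entries, one view each
views_by_id = count_views(events, "product_id")  # one entry with three views

print(len(views_by_name), "distinct values when grouped by name")
print(len(views_by_id), "distinct value when grouped by product_id")
```

Grouped by name, the product is split into 3 unrelated rows; grouped by the consistent identifier, all 3 views land on the same product.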
Example – Inconsistent Data Collection Method
Let us refer back to the earlier example, where we were trying to define “consumer interest level”.
During the analytics implementation, we realized that two developers were collecting the data differently.
Developer A – Capture data on load of product X detail page
Developer B – Capture data on click of “Add To Cart” on product Y detail page
Assuming that both products have the same amount of traffic, which product do you think will have a higher “consumer interest level”?
Answer? Product X. The reason is that landing on a product detail page does not mean that the user will click on the “Add To Cart” button. But if the user clicks on the “Add To Cart” button, surely he or she must have landed on the product detail page in order to do that, right?
A tiger is a cat, but not all cats are tigers.
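The effect of the two collection methods can be sketched with a small simulated event log. The numbers here (100 page loads and 20 “Add To Cart” clicks per product) and the event names are assumptions chosen only to make the comparison concrete.

```python
# Hypothetical event log of (visitor_id, product, event_type) tuples.
# Both products get identical behaviour: 100 page loads, 20 add-to-cart clicks.
events = []
for product in ("X", "Y"):
    for visitor in range(100):
        events.append((visitor, product, "page_load"))
        if visitor < 20:
            events.append((visitor, product, "add_to_cart"))


def count_events(events, product, event_type):
    """Number of events of one type recorded for one product."""
    return sum(1 for _, p, e in events if p == product and e == event_type)


# Developer A measures "interest" in product X on page load.
interest_x = count_events(events, "X", "page_load")
# Developer B measures "interest" in product Y on add-to-cart click.
interest_y = count_events(events, "Y", "add_to_cart")

# Measured with the same method, the two products are identical.
same_method_x = count_events(events, "X", "add_to_cart")

print("Mixed methods:", interest_x, "vs", interest_y)
print("Same method:  ", same_method_x, "vs", interest_y)
```

With mixed methods, product X appears five times as interesting as product Y even though users treated both products exactly the same way.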
With consistent data, we can then draw comparisons and provide relevant insights.
Consumers are fickle-minded and trends are ever-changing. Therefore, businesses should never rely on outdated data and insights to drive decision-making.
But that doesn’t mean old data are useless! Using historical data, we can identify trends to tell us a story.
Are we doing better than before? If not, what are the causes? What can we do in order to maintain or improve our edge?
In short, always ensure insights are up-to-date. Historical data are useful for trends and comparisons, but not as a data source on their own.
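One simple way to use historical data for comparison rather than as the source of truth is to measure the latest figure against it. The sketch below assumes a made-up series of monthly conversion rates; the metric and the values are illustrative only.

```python
# Hypothetical monthly conversion rates, oldest first; not real data.
monthly_conversion_rate = [0.021, 0.023, 0.022, 0.024, 0.019]


def percent_change(previous, current):
    """Relative change from previous to current, in percent."""
    return (current - previous) / previous * 100


latest = monthly_conversion_rate[-1]
history = monthly_conversion_rate[:-1]

# Historical baseline: the average of all earlier months.
baseline = sum(history) / len(history)

change_vs_last_month = percent_change(history[-1], latest)
change_vs_baseline = percent_change(baseline, latest)

print(f"Change vs last month: {change_vs_last_month:+.1f}%")
print(f"Change vs baseline:   {change_vs_baseline:+.1f}%")
```

The latest month drives the insight, while the older months supply the context needed to judge whether we are doing better or worse than before.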
Having a ton of data might not be as useful as we would think. What we need are good data that can allow us to draw useful and actionable insights.
On a side note, if you are into web analytics and have not heard of Data Layer, you should definitely check out this article: