Data gathering is an important process in any business. To be useful to an organisation, the gathered data should be in pristine condition. Making decisions based on dirty data can be detrimental to the company.
Now what is dirty data anyway? Put simply, it is data that is duplicated, inaccurate, incomplete or flawed in any way. It is therefore essential that quality control is applied throughout the data gathering process.
Now how do we do that, exactly? There’s an old saying that goes: “Straight from the horse’s mouth”. A lot of organisations put significant weight on what people say about their products and services. They invest millions in data gathering campaigns using surveys, questionnaires and feedback forms.
Granted, these are globally accepted and practised methods of data gathering. In fact, if you persevere, you can probably get a significant pool of respondents providing a wealth of business data. But if you believe in strength in numbers, you’ll probably need to rethink that approach when it comes to data gathering. Bear in mind that effective data gathering is not built around a single pillar of strength.
Personally, I feel that trustworthy data must be verifiable and quantifiable. Now what does this mean in real-world terms? Verifiable data means that the data can be traced back to its exact source. Quantifiable data means that a value can be put on it. Let’s apply this concept by looking at a few examples:
- A well-dressed female customer bought a few expensive pieces of jewellery
- Mrs Basu bought a few expensive pieces of jewellery
- Mrs Basu bought a diamond ring, a gold bracelet and a pair of pearl earrings
- Mrs Basu, a regular customer who happens to be the General Manager of Accounts at Company X, bought a diamond ring worth RM1200, a gold bracelet worth RM2000 and a pair of pearl earrings at RM800 (after getting a 20% discount off the original sale price) at 3:30pm today
The first statement is neither verifiable nor quantifiable. We don’t know who exactly the well-dressed female customer is, nor do we know what “a few” or “expensive” means. Those two terms are highly subjective.
The second statement is better; at least we can verify who the customer is. Nevertheless, we still don’t know what exactly she bought.
The third statement makes the picture much clearer. We know who bought what. However, it could be better still, as the next statement shows.
The fourth and last statement pretty much hits the nail on the head. We know exactly who bought exactly what, for how much, and at what exact point in time.
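To make the difference between the four statements concrete, here is a minimal sketch of how they might look as records. The class names, field names and the two checks are my own illustrative assumptions, not a standard: I treat a sale as verifiable when the customer is named, and as quantifiable when every item has a recorded price. The date is a placeholder for “today”.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class LineItem:
    description: str
    price_rm: Optional[float] = None  # None means no value was recorded


@dataclass
class SaleRecord:
    customer_name: Optional[str] = None
    items: List[LineItem] = field(default_factory=list)
    sold_at: Optional[datetime] = None

    def is_verifiable(self) -> bool:
        # Assumption: a named customer is enough to trace the sale to its source.
        return self.customer_name is not None

    def is_quantifiable(self) -> bool:
        # At least one item, and every item carries a recorded price.
        return bool(self.items) and all(i.price_rm is not None for i in self.items)

    def total_rm(self) -> float:
        if not self.is_quantifiable():
            raise ValueError("cannot total a sale with unpriced items")
        return sum(i.price_rm for i in self.items)


# The four statements, in order. "A few expensive pieces" cannot be
# recorded as items at all, so the first two sales have an empty list.
statement_1 = SaleRecord()
statement_2 = SaleRecord(customer_name="Mrs Basu")
statement_3 = SaleRecord(
    customer_name="Mrs Basu",
    items=[LineItem("diamond ring"), LineItem("gold bracelet"),
           LineItem("pair of pearl earrings")],
)
statement_4 = SaleRecord(
    customer_name="Mrs Basu",
    items=[LineItem("diamond ring", 1200.0), LineItem("gold bracelet", 2000.0),
           LineItem("pair of pearl earrings", 800.0)],
    sold_at=datetime(2024, 1, 15, 15, 30),  # placeholder date, 3:30pm
)

for number, sale in enumerate([statement_1, statement_2, statement_3, statement_4], 1):
    print(number, sale.is_verifiable(), sale.is_quantifiable())
```

Only the fourth record passes both checks, which is why it is the only one you can safely total or report on.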
Now, let’s say your company has invested in a state-of-the-art IT solution that can handle tons of data and output it in so many graphs that you could get an epileptic seizure just by watching the rendering process. How sure are you that this implementation will contribute significant ROI? Do you even have a complete picture of what the investment was in the first place? Can you even quantify the exact breakeven point of the implementation? Can you verify that?
I’ve been in IT long enough to know that there is no magic bullet when it comes to data analysis. Often enough, improperly sanitised data contributes a lot to the perceived “failure” of an analytical IT implementation. It’s just too easy to blame the system when the fault is much, much more ingrained.
Computerised systems can handle pretty much any kind of duplicate detection or completeness checking. Accuracy, however, is a hit-and-miss affair.
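As a rough illustration of the two checks a system can do on its own, here is a minimal sketch in plain Python. The field names and the list of required fields are hypothetical, and real systems usually match duplicates more loosely than the exact comparison used here.

```python
REQUIRED_FIELDS = ("customer", "item", "price_rm")  # hypothetical schema


def find_duplicates(records):
    """Return every record that exactly repeats an earlier one."""
    seen = set()
    duplicates = []
    for record in records:
        # Sort by field name so the key does not depend on insertion order.
        key = tuple(sorted(record.items()))
        if key in seen:
            duplicates.append(record)
        else:
            seen.add(key)
    return duplicates


def find_incomplete(records):
    """Return every record with a required field missing or left blank."""
    return [
        record for record in records
        if any(record.get(name) in (None, "") for name in REQUIRED_FIELDS)
    ]


records = [
    {"customer": "Mrs Basu", "item": "diamond ring", "price_rm": 1200},
    {"customer": "Mrs Basu", "item": "diamond ring", "price_rm": 1200},
    {"customer": "Mrs Basu", "item": "gold bracelet", "price_rm": None},
]

print(find_duplicates(records))
print(find_incomplete(records))
```

Both checks are mechanical: neither can tell whether RM1200 is actually what the ring sold for.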
I’m not saying that IT solutions can’t add, subtract or perform more complex mathematical processes properly. The fact of the matter is, the data is only as accurate as what the user keys into the system. None of the computer systems I’ve seen make presumptions about their operators. Therefore, if the entered data matches the acceptable pattern, it’s allowed to go through.
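Here is a small sketch of why pattern matching alone cannot guarantee accuracy. The price format (digits, optionally followed by two decimal places) and the function name are assumptions made up for this illustration.

```python
import re

# Hypothetical rule: digits, optionally followed by a dot and two decimals.
PRICE_PATTERN = re.compile(r"\d+(\.\d{2})?")


def is_well_formed_price(text: str) -> bool:
    """Check the shape of the input only, not whether it is true."""
    return PRICE_PATTERN.fullmatch(text) is not None


actual_price = "1200"   # what the ring really sold for
keyed_price = "2100"    # what a hurried operator typed in
mistyped = "12oo"       # letter o instead of zero

print(is_well_formed_price(actual_price))
print(is_well_formed_price(keyed_price))
print(is_well_formed_price(mistyped))
```

The transposed figure is accepted just as readily as the correct one, because it fits the pattern. Only the obviously malformed entry is rejected.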
The human factor is an often overlooked aspect of the data gathering process. Training mostly covers procedure rather than the importance of accurate input. Put simply, systems training focuses on the how instead of the why.
Humans are intelligent creatures. More importantly, they’re also selfish. So if they are made to understand how using an IT implementation properly will affect their income, job security and ultimately their survival in the organisation, you can be pretty darn sure that they’ll strive to do the best job possible.
And that, my friends, is the bottom line.