Understanding Data Quality is critical if you want to use data with confidence, as only good-quality data can power accurate analysis, which in turn supports informed business decisions.

When most people think about Data Quality, they tend to focus on accuracy, but this is only one of a number of quality dimensions that need to be considered.

1) Accuracy

Accuracy is a measure of how well data reflects reality. Accuracy problems often arise at the point of collection, or when data is transcribed manually into spreadsheets or applications.

Common examples include: spelling mistakes in names; incorrect phone numbers or email addresses; wrong stock numbers; or data captured in the wrong unit of measurement (e.g. lbs instead of kg).

For further reading see:

How to Identify and Address Inaccurate Data

2) Completeness

You may have encountered a situation where incomplete data prevents you from taking an action.

Here the problem is not that the data is inaccurate, but that the required attribute is simply missing.

If you have ten thousand customer records, what percentage of them contain a contact phone number? Is it 99% or only 20%?

For data to be considered complete, all the data required for a particular use case must be captured and available for use.

Completeness is less of an issue if the missing data isn’t core to the planned use. For example, phone number completeness is not important if you plan to communicate with customers by email, but it matters if the phone number is a mandatory part of an authentication process.
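As a rough illustration, the percentage question above can be answered with a few lines of Python. This is a minimal sketch, not a definitive method: the function names, the field names and the sample records are all hypothetical, and it assumes that a value counts as missing when it is None or blank.

```python
def is_present(value):
    """Assumption: a value is missing if it is None or only whitespace."""
    return value is not None and str(value).strip() != ""


def completeness(records, field):
    """Return the percentage of records with a usable value for `field`."""
    if not records:
        return 0.0
    filled = sum(1 for record in records if is_present(record.get(field)))
    return 100.0 * filled / len(records)


# Hypothetical sample data, invented for this example.
customers = [
    {"name": "Ann", "email": "ann@example.com", "phone": "555-0100"},
    {"name": "Bob", "email": "bob@example.com", "phone": ""},
    {"name": "Cy", "email": "cy@example.com", "phone": None},
    {"name": "Di", "email": None, "phone": "555-0101"},
]

phone_pct = completeness(customers, "phone")  # 2 of 4 records -> 50.0
email_pct = completeness(customers, "email")  # 3 of 4 records -> 75.0
```

Whether 50% phone completeness is a problem depends on the use case: it would matter for phone-based authentication, but not for an email-only campaign.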

For further reading see:

How to Identify and Manage Data Completeness Issues

3) Uniqueness

Do you have a duplicate problem? Is the same object, person or event represented multiple times in your records?
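One simple way to start answering these questions is to count how often the same key appears. The sketch below is only an assumption about how this might be done: the function names, fields and sample records are hypothetical, and it treats two records as duplicates only when the chosen fields match exactly after basic normalisation (case and spacing). Real duplicates often need fuzzier matching than this.

```python
from collections import Counter


def normalise(value):
    """Lower-case the value and collapse repeated whitespace."""
    return " ".join(str(value or "").lower().split())


def find_duplicates(records, key_fields):
    """Return each normalised key that occurs more than once, with its count."""
    keys = [
        tuple(normalise(record.get(field)) for field in key_fields)
        for record in records
    ]
    counts = Counter(keys)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical sample data, invented for this example.
customers = [
    {"name": "Ann Lee", "email": "ann@example.com"},
    {"name": "ann  lee", "email": "ANN@example.com"},
    {"name": "Bob Ray", "email": "bob@example.com"},
]

duplicates = find_duplicates(customers, ["name", "email"])
# The first two records collapse to the same key and are reported once, with a count of 2.
```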