What is data consistency?
Data consistency can be viewed as the state of a considered dataset to be free of contradictions. In order to keep this desirable state for a prolonged period of time, the set of data needs to be strictly policed.
Why is data consistency so important?
Data consistency alone is not enough to have “good” data – it also has to be correct and up to date. But consistency is at the base of it all. Without it, the effectiveness of any dataset is impaired at the very least and tends to worsen over time. Ultimately it may render the entire dataset unusable, turning all the effort put into its creation and maintenance into a waste of time.
There are abundant reports of failures stemming from a single simple inconsistency: Units being mixed up like kilometres vs. miles, litres vs. gallons, pounds vs. kilograms, hours vs. days… I’m sure you’ve heard of some.
How to enforce consistency?
There are a few inevitable measures to take. Depending on the size and complexity of the dataset, the required efforts can range from negligible to massive. Whatever they may be, they are well worth it!
- Understand the structure of the data. And understand it all!
- If the dataset is too big to deal with in its entirety, slice it into chunks that are small enough to work on with the resources available. However, make sure you start with the most basic entities. If you have a CRM and you want to make your business opportunities consistent, you better have consistent customer and prospect data first.
- Define the rules which ensure your data will stay consistent.
- Build easy to produce reports on data consistency. These will become a measure for the success of your consolidation effort and help keep inconsistent data from popping up later on. Make sure every rule is considered in the reports defined.
Run the reports regularly, so that you do not lose track of the status.
- Stop introduction of inconsistent data. If you do not do this before removing or fixing inconsistencies, you will soon feel like old King Sisyphus! The bottom line is: Do not allow inconsistent data to enter the set, no matter what. Achieving this without interrupting business processes can be quite challenging. Absolutely necessary – but challenging.
If you have to compromise on a specific measure, make sure you keep note of it and address any potential consequences – preferrably in an automated way.
- Clean the data. There may be a way to do this automatically, but typically a portion of the cleansing has to be done manually.
- Maintenance is key – keep watching over the data. Any procedural or software change may result in new ways of introducing inconsistencies.
- Ensure that every change to the dataset’s structure is scrutinised for potential new inconsistencies and extend the rules and reports accordingly.
Is this it?
Again, not at all! If you have perfectly consistent data, it might still be wrong or outdated… But without consistency, you might not even be able to determine that. If all that sounds horrendously complicated, I’m here and ready to assist.
See below for my contact information