What is data consistency?
Data consistency can be viewed as the state in which a given data set is free of contradictions. To preserve this desirable state over time, the data set has to be kept free of contradictions at all times.
Why is data consistency so important?
Data consistency alone is not enough to have “good” data, as useful data also has to be correct and up to date. But consistency is at the base of it all. Without it, the usefulness of any dataset is impaired at the very least and tends to worsen over time, which may render all of it unusable and turn all the effort put into its creation and maintenance into waste.
There are abundant reports of failures stemming from a single simple inconsistency: units being mixed up, like kilometers vs. miles, litres vs. gallons, pounds vs. kilograms, hours vs. days… I’m sure you know some.
How to enforce consistency?
There are a few inevitable measures to take. Depending on the size and complexity of the dataset, the required effort can range from negligible to massive. Whatever it is, it is worth it!
- Understand the structure of the data. And understand it all!
- If the dataset is too big to deal with at once, slice it into chunks that are small enough to work on with the resources available. However, make sure you start with the most basic entities. If you have a CRM and you want to make your business opportunities consistent, you had better have consistent customer and prospect data first.
- Define the rules which ensure your data will stay consistent.
- Build easy-to-repeat reports on data consistency. These will become a measure for the success of your consolidation effort and help keep inconsistent data from popping up later on. Make sure every rule is covered by the reports.
Run the reports regularly, so that you do not lose track of the status.
- Stop the introduction of inconsistent data. If you do not do this before removing or fixing inconsistencies, you will soon find out that you have become Sisyphus. The bottom line is: Do not allow inconsistent data to enter the set, no matter what. Achieving this without interrupting business processes can be quite challenging.
If you have to compromise on a specific measure, make sure you keep a record of it and address any potential consequences, preferably in an automated way.
- Clean the data. There may be a way to do this automatically, but usually a portion of the cleansing has to be done manually.
- Keep watching over the data, as any procedural or software change may result in new ways of introducing inconsistencies.
- Ensure that every change to the dataset’s structure is scrutinised for potential new inconsistencies and extend the rules and reports accordingly.
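To make the steps about rules, reports and stopping new inconsistencies a bit more tangible, here is a minimal Python sketch built on the CRM example. Everything in it is an assumption made for illustration: the record fields (id, customer_id, opened, closed, currency), the three rules and the helper names build_rules, violations, consistency_report and add_record are hypothetical and not taken from any real system. Dates are assumed to be ISO strings, so that plain string comparison orders them correctly.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from typing import Callable

Record = dict  # hypothetical record shape, e.g. {"id": 1, "customer_id": "C1", ...}


@dataclass(frozen=True)
class Rule:
    name: str
    is_consistent: Callable[[Record], bool]  # True means "no contradiction found"


def closed_not_before_opened(record: Record) -> bool:
    opened, closed = record.get("opened"), record.get("closed")
    # An opportunity that is still open has no closing date; that is no contradiction.
    if closed is None:
        return True
    # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
    return opened is not None and opened <= closed


def build_rules(customer_ids: set[str]) -> list[Rule]:
    """Illustrative rules for opportunity records (assumed, not exhaustive)."""
    return [
        Rule("references_known_customer", lambda r: r.get("customer_id") in customer_ids),
        Rule("closed_not_before_opened", closed_not_before_opened),
        Rule("known_currency", lambda r: r.get("currency") in {"EUR", "USD"}),
    ]


def violations(record: Record, rules: list[Rule]) -> list[str]:
    """Names of all rules the record breaks, in rule order."""
    return [rule.name for rule in rules if not rule.is_consistent(record)]


def consistency_report(records: list[Record], rules: list[Rule]) -> dict[str, int]:
    """Repeatable report: number of violating records per rule, zeros included."""
    counts = Counter({rule.name: 0 for rule in rules})
    for record in records:
        counts.update(violations(record, rules))
    return dict(counts)


class InconsistentRecord(ValueError):
    """Raised when a record would introduce an inconsistency."""


def add_record(dataset: list[Record], record: Record, rules: list[Rule]) -> None:
    """Gatekeeper: only consistent records may enter the dataset."""
    broken = violations(record, rules)
    if broken:
        raise InconsistentRecord(f"record {record.get('id')!r} violates: {', '.join(broken)}")
    dataset.append(record)
```

The point of the design is that the report and the gatekeeper share the same list of rules, so a rule added later is automatically both measured and enforced. In a real system the gatekeeper would more likely live in database constraints or in the application's input validation; the sketch only shows the principle.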
Is this it?
Again, not at all! Even perfectly consistent data might just be wrong or outdated… But without consistency, you might not even be able to determine that.