During a workshop, IBM executive Rajeev Priyardashi posed the question: why should I take care of my health? And he answered it himself: because I plan to run and play with my grandkids when I’m 70.
Then he drew a comparison: a company’s data is like our health. We need to take care of it so we can make better business decisions, gain insights that reveal opportunities, and reduce the risks related to security and data breaches.
The analogy took off, and the idea of healthy data caught on in corporations.
What to Consider
To have healthy data, you need to:
- Agree on a common definition of what each piece of data means; in other words, create a data dictionary.
- Establish data rules and policies that define who is responsible for each piece of data.
- Monitor data quality daily.
- Automatically email the right person when an anomaly is detected, so that corrective measures can be taken.
- Trace the data’s history, its lineage. Knowing the source is important for reliable data.
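As a rough illustration of the monitoring and alerting items above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `QualityRule` class, the `find_anomalies` function, the column names, and the `example.com` addresses are hypothetical, and the code only builds the alert messages instead of sending real email.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class QualityRule:
    column: str                      # column the rule applies to
    description: str                 # human-readable statement of the rule
    owner_email: str                 # hypothetical address of the responsible person
    is_valid: Callable[[Any], bool]  # returns True when the value passes


def find_anomalies(rows, rules):
    """Return one (owner_email, message) pair per rule violation."""
    alerts = []
    for index, row in enumerate(rows):
        for rule in rules:
            value = row.get(rule.column)
            if not rule.is_valid(value):
                message = f"Row {index}: {rule.column}={value!r} violates '{rule.description}'"
                alerts.append((rule.owner_email, message))
    return alerts


# Hypothetical rules and sample data for a daily check.
rules = [
    QualityRule("customer_name", "name must not be empty",
                "crm-owner@example.com", lambda v: bool(v and v.strip())),
    QualityRule("balance", "balance must be a number",
                "finance-owner@example.com", lambda v: isinstance(v, (int, float))),
]
rows = [
    {"customer_name": "Acme Ltd", "balance": -15000},
    {"customer_name": "", "balance": 320.5},
    {"customer_name": "Globex", "balance": "n/a"},
]

alerts = find_anomalies(rows, rules)
for recipient, message in alerts:
    print(f"To {recipient}: {message}")
```

In a real setup, a scheduler would run the check daily and the `print` call would be replaced by whatever notification channel the company already uses.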
Data Catalog
To manage data, and this is important, we need it mapped and centralized in one location (we covered this topic in this post).
Data management requires a methodology, and the Data Management Body of Knowledge (DAMA-DMBOK) provides guidance for the entire data lifecycle:
Data – Plan – Specify – Deploy – Create & Acquire – Maintain & Use – Archive & Retrieve – Delete
To take advantage of the vast amount of data that companies hold today, it’s essential that all users know the data and understand what it represents. Let’s say the data is “Customer Name.” Is it a personal or a corporate customer? If it’s a corporation, does the field hold the company’s legal name or its trade name? A glossary of terms helps standardize these definitions and gives everyone the same understanding. It seems trivial, but standardization prevents data from being treated as invalid or inconsistent when someone needs it.
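To make the “Customer Name” example concrete, here is a small sketch of a glossary in Python. The `GlossaryTerm` class, the `define` function, and the three terms are hypothetical and chosen only for illustration; a real data catalog tool would store far more detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryTerm:
    term: str
    definition: str
    applies_to: str  # "personal" or "corporate" (assumed classification)


# Hypothetical entries that split the ambiguous "Customer Name" into precise terms.
_TERMS = [
    GlossaryTerm("Personal Customer Name",
                 "Full name of an individual customer.", "personal"),
    GlossaryTerm("Customer Legal Name",
                 "Registered legal name of a corporate customer.", "corporate"),
    GlossaryTerm("Customer Trade Name",
                 "Name under which a corporate customer does business.", "corporate"),
]
GLOSSARY = {entry.term.lower(): entry for entry in _TERMS}


def define(term):
    """Look up a term, ignoring case and surrounding whitespace."""
    entry = GLOSSARY.get(term.strip().lower())
    if entry is None:
        raise KeyError(f"'{term}' is not in the glossary; it needs a definition first")
    return entry


print(define("customer trade name").definition)
```

Note that the lookup deliberately fails for the bare term “Customer Name”: the ambiguous label has no entry, so users are pushed toward one of the precise terms.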
Metadata
Data needs technical classification parameters. The negative number -15,000 is just a spreadsheet entry. But, if it’s the balance of a delinquent customer, it takes on a different meaning. This leads to another point: data needs metadata – the complementary information that will answer “corporate customer, trade name” or “negative balance, delinquent customer.”
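The -15,000 example can be sketched in a few lines of Python. The `DataPoint` class, the `describe` function, and the metadata keys are assumptions made for this illustration, not part of any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class DataPoint:
    value: float
    metadata: dict = field(default_factory=dict)  # complementary information about the value


def describe(point):
    """Render a value together with whatever metadata explains it."""
    if not point.metadata:
        return f"{point.value:,.0f} (no metadata: meaning unknown)"
    details = ", ".join(f"{key}={val}" for key, val in point.metadata.items())
    return f"{point.value:,.0f} ({details})"


raw = DataPoint(-15000)
enriched = DataPoint(-15000, {"measure": "account balance",
                              "customer_status": "delinquent"})

print(describe(raw))
print(describe(enriched))
```

The number is identical in both cases; only the second one tells the reader what it means.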
Data also needs an “owner,” the person who is responsible for it and who can modify it. The data administrator should have a complete view of how that data is used and by whom. This way, if data is changed or deleted, it’s possible to predict which areas and systems will be impacted. For any anomaly, an email should be automatically sent to the right person, and corrective measures should be taken.
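A minimal sketch of that kind of impact check, in Python, might look like the following. The `DataAsset` class, the `CATALOG` contents, the system names, and the `example.com` addresses are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    name: str
    owner_email: str                             # hypothetical owner address
    used_by: list = field(default_factory=list)  # systems or areas that consume the data


# Hypothetical catalog entries.
CATALOG = {
    "customer_name": DataAsset("customer_name", "crm-owner@example.com",
                               ["billing", "crm", "marketing"]),
    "balance": DataAsset("balance", "finance-owner@example.com",
                         ["collections", "billing"]),
}


def impact_of_change(asset_name):
    """Report who to notify and which systems are affected if the asset changes."""
    asset = CATALOG[asset_name]
    return {"notify": asset.owner_email, "impacted_systems": sorted(asset.used_by)}


print(impact_of_change("balance"))
```

Keeping the owner and the consumers in the same record is a design choice for the example: one lookup answers both “who is responsible?” and “what breaks if this changes?”.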
Lineage
Changes to data bring one more thing to mind: data needs a history, also called lineage, which lets you understand how the data was built, where it came from, whether it was updated, who changed it, and when. Lineage also matters when the data has to be verified for legal purposes.
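One simple way to picture lineage is as an append-only log of events. The Python sketch below assumes hypothetical names throughout (`LineageEvent`, `LineageLog`, the dataset, the sources, and the users); dedicated lineage tools capture this automatically and in more detail.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LineageEvent:
    dataset: str
    action: str           # e.g. "created" or "updated"
    source: str           # where the data came from
    changed_by: str       # who made the change
    changed_at: datetime  # when the change happened


class LineageLog:
    """Append-only record of what happened to each dataset."""

    def __init__(self):
        self._events = []

    def record(self, event):
        self._events.append(event)

    def history(self, dataset):
        """All events for a dataset, oldest first."""
        matching = [e for e in self._events if e.dataset == dataset]
        return sorted(matching, key=lambda e: e.changed_at)

    def origin(self, dataset):
        """Source of the earliest event, or None if the dataset is unknown."""
        events = self.history(dataset)
        return events[0].source if events else None


log = LineageLog()
log.record(LineageEvent("customer_balance", "updated", "billing batch job", "j.silva",
                        datetime(2024, 3, 5, 9, 30, tzinfo=timezone.utc)))
log.record(LineageEvent("customer_balance", "created", "ERP export", "etl-service",
                        datetime(2024, 1, 10, 2, 0, tzinfo=timezone.utc)))

for event in log.history("customer_balance"):
    print(event.changed_at.date(), event.action, "by", event.changed_by, "from", event.source)
```

Because events are never edited or removed, the log can answer the questions above (where from, changed by whom, and when) even long after the change was made.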
Fortunately, there’s software to help put all of this into practice; there’s no need to reinvent the wheel. At Scala, specialized teams can help your company at all stages of data governance and in the subsequent stages of preparing databases for the use of Machine Learning and Artificial Intelligence. Want to talk to an expert? Just ask!