How to Have (and Maintain) a Healthy Database

What to Consider

To have healthy data, you need to:

Agree on a common definition, shared by everyone, of what each piece of data is – in other words, create a data dictionary.

Have data rules and policies, defining who is responsible for that data.

Monitor data quality daily.

Automatically email the right person about any anomaly so that corrective measures can be taken.

Recover the data’s history, its lineage. Knowing the source is important for reliable data.
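The monitoring and alerting items above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `RULES` and `OWNERS` mappings, the field names, and the email addresses are all hypothetical, and the alert is built as a message object without actually being sent.

```python
from email.message import EmailMessage

# Hypothetical mapping of each field to the person responsible for it.
OWNERS = {
    "customer_name": "sales-data@example.com",
    "balance": "finance-data@example.com",
}

# Hypothetical quality rules: field -> predicate returning True for a valid value.
RULES = {
    "customer_name": lambda v: isinstance(v, str) and v.strip() != "",
    "balance": lambda v: isinstance(v, (int, float)),
}


def find_anomalies(rows):
    """Return (row_index, field) pairs for every value that breaks its rule."""
    anomalies = []
    for i, row in enumerate(rows):
        for field, is_valid in RULES.items():
            if not is_valid(row.get(field)):
                anomalies.append((i, field))
    return anomalies


def build_alerts(anomalies):
    """Build one email per owner, listing the anomalies found in their fields."""
    by_owner = {}
    for i, field in anomalies:
        by_owner.setdefault(OWNERS[field], []).append(f"row {i}: invalid {field}")
    alerts = []
    for owner, problems in by_owner.items():
        msg = EmailMessage()
        msg["To"] = owner
        msg["Subject"] = f"Data quality anomalies: {len(problems)}"
        msg.set_content("\n".join(problems))
        alerts.append(msg)
    return alerts


rows = [
    {"customer_name": "Acme Ltd", "balance": -15000},
    {"customer_name": "", "balance": "n/a"},
]
alerts = build_alerts(find_anomalies(rows))
for alert in alerts:
    print(alert["To"], "|", alert["Subject"])
```

In a real setup, a scheduler would run the check daily and a mail server or ticketing tool would deliver the alerts; both are left out here on purpose.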

Data Catalog

To treat data – and this is important – we need it mapped and centralized in one location (we covered this topic in this post).

Data treatment requires methodology, and the Data Management Body of Knowledge (DAMA DMBOK) provides guidance for the entire data lifecycle:

Data lifecycle: Plan – Specify – Deploy – Create & Acquire – Maintain & Use – Archive & Retrieve – Delete

To take advantage of the vast amount of data existing in companies today, it’s essential that all users know the data and understand what it represents. Let’s say the data is “Customer Name.” But is it a personal or corporate customer? If it’s a corporation, does it refer to the legal name or the trade name of the company? A glossary of terms will help standardize and facilitate general understanding. It seems trivial, but standardization will prevent data from being considered invalid or inconsistent when someone needs it.
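As a rough sketch of what such a glossary could look like in code, the example below keeps one entry per unambiguous term and flags any column name that has no agreed definition. The `GlossaryEntry` class, the term names, and the `steward` field are assumptions made for illustration, not part of any particular catalog tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryEntry:
    term: str
    definition: str
    steward: str  # hypothetical: the team that maintains this definition


# Hypothetical entries that split the ambiguous "Customer Name" into precise terms.
GLOSSARY = {
    entry.term: entry
    for entry in [
        GlossaryEntry("Customer Personal Name", "Full name of an individual customer.", "sales"),
        GlossaryEntry("Customer Legal Name", "Registered legal name of a corporate customer.", "legal"),
        GlossaryEntry("Customer Trade Name", "Name under which a corporate customer does business.", "sales"),
    ]
}


def undefined_columns(columns):
    """Return the column names that have no glossary entry."""
    return [c for c in columns if c not in GLOSSARY]


print(undefined_columns(["Customer Legal Name", "Customer Name"]))
# prints ['Customer Name']
```

The ambiguous "Customer Name" column is the one reported, which is the cue to rename it or add a definition before anyone relies on it.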

Metadata

Data needs technical classification parameters. The negative number -15,000 is just a spreadsheet entry. But, if it’s the balance of a delinquent customer, it takes on a different meaning. This leads to another point: data needs metadata – the complementary information that will answer “corporate customer, trade name” or “negative balance, delinquent customer.”
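To make the point concrete, here is a minimal sketch in which the same number is stored once without metadata and once with it. The `DataPoint` class and the metadata keys are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class DataPoint:
    value: float
    # Complementary information that gives the raw value its meaning.
    metadata: dict = field(default_factory=dict)


def describe(point):
    """Render a value together with whatever metadata is attached to it."""
    if not point.metadata:
        return f"{point.value} (no metadata: meaning unknown)"
    details = ", ".join(f"{key}={val}" for key, val in point.metadata.items())
    return f"{point.value} ({details})"


raw = DataPoint(-15000)
annotated = DataPoint(-15000, {"measure": "balance", "customer_status": "delinquent"})

print(describe(raw))        # -15000 (no metadata: meaning unknown)
print(describe(annotated))  # -15000 (measure=balance, customer_status=delinquent)
```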

Data also needs an “owner,” the person who is responsible for it and who can modify it. The data administrator should have a complete view of how that data is used and by whom. This way, if data is changed or deleted, it’s possible to predict which areas and systems will be impacted. For any anomaly, an email should be automatically sent to the right person, and corrective measures should be taken.
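One way to picture the administrator's "complete view" is a registry that records, for each data element, its owner and the systems that consume it. The sketch below assumes such a registry exists; the `USAGE` structure, system names, and addresses are invented for illustration.

```python
# Hypothetical usage registry: data element -> owner and consuming systems.
USAGE = {
    "customer_name": {"owner": "sales-data@example.com", "used_by": ["CRM", "Billing"]},
    "balance": {"owner": "finance-data@example.com", "used_by": ["Billing", "Collections"]},
}


def can_modify(user, element):
    """Only the registered owner of an element may modify it."""
    return USAGE[element]["owner"] == user


def impact_of_change(elements):
    """Return the sorted list of systems affected if the given elements change."""
    affected = set()
    for name in elements:
        affected.update(USAGE[name]["used_by"])
    return sorted(affected)


print(can_modify("finance-data@example.com", "balance"))  # True
print(impact_of_change(["balance"]))                      # ['Billing', 'Collections']
```

Keeping ownership and usage in the same place means a proposed change can be checked for both permission and downstream impact before it happens.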

Lineage

Data alteration brings to mind one more thing: data needs history, also called lineage, which allows you to understand how that data was constructed, where it came from, whether it was updated, who changed it, and when. Lineage is also important when legal verification is needed.
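A lineage record can be as simple as an append-only log that answers those questions: where the value came from, who changed it, and when. The following is a sketch under that assumption; `LineageEvent`, `LineageLog`, and the sample sources and users are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LineageEvent:
    element: str        # which data element changed
    source: str         # where the new value came from
    changed_by: str     # who made the change
    changed_at: datetime
    old_value: object
    new_value: object


class LineageLog:
    """Append-only history of changes; events are never edited or removed."""

    def __init__(self):
        self._events = []

    def record(self, element, source, changed_by, old_value, new_value, changed_at=None):
        event = LineageEvent(
            element, source, changed_by,
            changed_at or datetime.now(timezone.utc),
            old_value, new_value,
        )
        self._events.append(event)
        return event

    def history(self, element):
        """Return all recorded events for one element, oldest first."""
        return [e for e in self._events if e.element == element]


log = LineageLog()
log.record("balance", "billing export", "alice", None, -12000)
log.record("balance", "manual correction", "bob", -12000, -15000)

for event in log.history("balance"):
    print(event.changed_by, event.source, event.old_value, "->", event.new_value)
```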

Fortunately, there’s software to help put all of this into practice; there’s no need to reinvent the wheel. At Scala, specialized teams can help your company at all stages of data governance and in the subsequent stages of preparing databases for the use of Machine Learning and Artificial Intelligence. Want to talk to an expert? Just ask!
