Throughout my long career of building and implementing data quality processes, I’ve consistently been told that data quality could not be implemented within data sources, because doing so would disrupt production systems. Therefore, source data was often copied to a central location – a staging area – where it was cleansed, transformed, deduplicated, restructured and loaded into new applications, such as an enterprise data warehouse or master data management hub.
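The staging-area pattern described above can be sketched in a few lines. This is a minimal illustration, not any particular product's pipeline; the record fields, function names, and cleansing rules are all assumptions chosen for the example:

```python
# Sketch of the staging-area pattern: copy records out of the source,
# cleanse and deduplicate them in staging, then load the results into a
# separate downstream store. All names and rules here are illustrative.

def extract(source_rows):
    """Copy raw rows out of the source system, leaving the source untouched."""
    return [dict(row) for row in source_rows]

def cleanse(row):
    """Standardize fields in staging (the fixes deemed too risky to run in the source)."""
    return {
        "customer_id": row["customer_id"].strip().upper(),
        "email": row["email"].strip().lower(),
    }

def deduplicate(rows):
    """Keep the first occurrence of each customer_id."""
    seen, unique = set(), []
    for row in rows:
        if row["customer_id"] not in seen:
            seen.add(row["customer_id"])
            unique.append(row)
    return unique

def load(rows):
    """'Load' into a warehouse-style store keyed by customer_id."""
    return {row["customer_id"]: row for row in rows}

source = [
    {"customer_id": " c001 ", "email": "Ada@Example.COM"},
    {"customer_id": "C001", "email": "ada@example.com"},  # duplicate of the row above
    {"customer_id": "c002", "email": "Bob@Example.com "},
]

warehouse = load(deduplicate([cleanse(r) for r in extract(source)]))
print(sorted(warehouse))  # → ['C001', 'C002']
```

Note that after this runs, the cleansed records live only in `warehouse` – the original `source` rows are unchanged – which is exactly the boundary problem discussed next.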
This paradigm of dragging data from where it lives through data quality processes that exist elsewhere (and whose results are stored elsewhere) had its advantages. But one of its biggest disadvantages was the boundary it created – original data lived in its source, but quality data lived someplace else.
These boundaries multiply with the number of data sources an enterprise has. That’s why a long-stated best practice has been to implement data quality processes as close to the data source as possible.