Throughout my long career of building and implementing data quality processes, I've consistently been told that data quality could not be implemented within data sources, because doing so would disrupt production systems. Therefore, source data was often copied to a central location – a staging area – where it was cleansed, transformed, deduplicated, restructured and loaded into new applications, such as an enterprise data warehouse or master data management hub.
This paradigm of dragging data from where it lives through data quality processes that exist elsewhere (and whose results are stored elsewhere) had its advantages. But one of its biggest disadvantages was the boundary it created – original data lived in its source, but quality data lived someplace else.
These boundaries multiply with the number of data sources an enterprise has. That's why a long-stated best practice has been to implement data quality processes as close to the data source as possible.
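To make the contrast concrete, here is a minimal, hypothetical sketch of what "quality close to the source" can look like in practice: a simple validation rule enforced at the point of entry rather than in a downstream staging job. The record structure, field names, and rules are invented for illustration and are not drawn from any particular system.

```python
# Hypothetical illustration: a completeness/format rule applied at the
# point of entry (close to the source) instead of in a downstream
# staging area. Names and rules are invented for illustration only.

import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_customer(record: dict) -> list[str]:
    """Return a list of data quality issues for a single customer record."""
    issues = []
    if not record.get("customer_id"):
        issues.append("missing customer_id")
    if not EMAIL_RE.match(record.get("email", "")):
        issues.append("invalid email")
    return issues

def accept_at_source(record: dict) -> bool:
    """Reject or flag bad records before they are written to the source system,
    so every downstream copy inherits the same quality rules."""
    issues = validate_customer(record)
    if issues:
        print(f"rejected {record.get('customer_id', '<unknown>')}: {issues}")
        return False
    return True

# Example: the second record never reaches the source, so no staging-area
# cleanup (and no separate "quality" copy of the data) is needed later.
accept_at_source({"customer_id": "C001", "email": "ann@example.com"})
accept_at_source({"customer_id": "", "email": "not-an-email"})
```

When rules like these live at the source, the original data and the quality data are the same data – the boundary the staging-area paradigm created never appears.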