During the Enterprise Data World conference last week, it was clear that many organizations are wrestling with rapid changes in information management and governance. Many are assessing where they are in this process, and some are even asking “where to start?”
William McKnight of McKnight Associates made an important point in his opening keynote: “don’t talk yourself out of starting.” Stan Christiaens of Collibra added that organizations are finding different places to begin, whether facilitating self-service analytics, enabling data stewards to care for data, working on critical compliance requirements, or freeing data scientists to find relevant data. But, as Mike Nicosia of TIAA commented, while maturity assessments may provide insight, “without context, you cannot make good decisions.”
This is a challenge for data-driven businesses as they endeavor to get actionable insights from critical enterprise data assets, leveraging next-generation Big Data environments.
My opening day was filled with tutorials on Data Modeling wrapped around my own presentation, “Finding Quality in the Data Lake”. In the morning, I heard about advanced but traditional techniques for modeling the enterprise data warehouse. That afternoon, I learned about the challenges of modeling for NoSQL databases.
What struck me in comparing the two was context – that is, the understood context of a given piece of data. In the first, the originating context is stripped away to get to a model of an entity – a computerized representation through data of some real-world object. In the second, the context is maintained through the use of techniques such as document stores or graphed relationships. As the instructor in the latter tutorial noted, “context is critical.”
As I’ve recently reflected on the meaning of data quality in the emerging structure of the Data Lake, the notion of context for Big Data takes on primary importance. Nicosia used the analogy of a cholesterol test. If you’ve had the test and the doctor says you are at 250, what does that mean? Is it good, is it bad?
You need context – context that includes a definition of what the data is, how it’s recorded, whether it has a scale of measurement, and even whether there is a prior value or measurement for comparison.
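One way to picture this is a record that carries its context along with the value. The sketch below is only an illustration of the idea, not anything presented at the conference: the `Measurement` class, its field names, and the sample values (including the unit and the prior reading) are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class Measurement:
    """A value bundled with the context needed to interpret it (hypothetical)."""
    value: float
    definition: Optional[str] = None     # what the data is
    recorded_how: Optional[str] = None   # how it was recorded
    unit: Optional[str] = None           # scale of measurement, if any
    prior_value: Optional[float] = None  # earlier reading for comparison

    def missing_context(self) -> List[str]:
        """Return the names of context fields that have not been filled in."""
        return [f.name for f in fields(self)
                if f.name != "value" and getattr(self, f.name) is None]

    def change_from_prior(self) -> Optional[float]:
        """Difference from the prior reading, or None if there is none."""
        if self.prior_value is None:
            return None
        return self.value - self.prior_value


# A bare number, as in the "you are at 250" example: nothing to interpret it with.
bare = Measurement(250)

# The same number with illustrative (assumed) context attached.
full = Measurement(250, "total cholesterol", "fasting blood draw", "mg/dL", 270)
```

Here `bare.missing_context()` lists every context field, while `full.missing_context()` is empty and `full.change_from_prior()` shows the reading moved relative to the earlier one. Whether that movement is good or bad still depends on the definition and scale, which is the point of the analogy.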
However, Big Data context is not simply a reflection of what data means. As Andrew Patricio, former CDO of the District of Columbia Public Schools, asked, “What problem are you trying to solve?” There needs to be a focus on “relevant data.”
Theresa DelVecchio Dys, Director of Social Policy Research and Analysis at Feeding America, noted that their Data First Initiative started with a problem statement. As she put it, “not all data is good for all things.”