Data modeling has had a tough time since the IT industry pivoted to big data. With big data’s external sourcing and so-called unstructured form, data models seemed less relevant. In the new NoSQL world, data modelers struggled to apply tools and techniques grounded in a relational database mindset.
Schema-on-read is the antithesis of the diligent analysis and structuring that data modeling performs long before the first field is committed to disk. Even in the relational world, speed and agility of delivery have undercut the role of the traditional data modeler.
The Value of Modeling
This hiatus in modeling thought and application has been most unfortunate. What is data modeling other than a search for and exploration of meaning in information? In an environment where data is coming into enterprises from more diverse and ill-defined sources, in ever-greater volumes, and at higher speed, understanding its meaning and deciphering its structure are of the utmost importance.
Data scientists complain they spend 80 percent of their time preparing data for analysis. Their plight has led to the emergence of data wrangling and structure discovery tools. However, despite their value in “munging” data, most lack the theoretical foundation that was incorporated in entity-relationship (ER) data modeling as far back as Peter Chen’s 1976 seminal paper, “The Entity-Relationship Model — Toward a Unified View of Data.”
Data is a representation of the real world, and it is from that real world that an understanding of all data must emerge.
How that can happen is the subject of Thomas Frisendal’s new book Graph Data Modeling for NoSQL and SQL. Starting from the psychological premise that data modeling is the exploration and discovery of meaning and structure and thus requires a largely visual approach, he notes that traditional models look too much like relational tables. They are engineering artifacts closer to physical database design than to the maps of meaning a business person could understand and use creatively.
Frisendal is reiterating an older concept: a multilevel architecture for data modeling. Although I discussed five levels in Data Warehouse: From Architecture to Implementation in 1997, he opts for a simpler three-level approach (conceptual, logical, and physical) and proposes that new representations are needed at the top two levels.