Books about raising children often teach us that children need structure. Familiar routines help children feel safe, and completing those routines builds discipline. The result is happier children and an easier time for everyone who deals with them. Sure, it is OK to draw outside the lines occasionally; it’s a particularly good way to foster creativity and free thought. Still, structure is important to a happy and healthy life.
Similarly, organizations that are serious about raising the maturity of their analytics need to strike the right balance between structured and unstructured data. Their analytical needs are often complex, and it’s very difficult to meet all of them with a single solution. There is a time and a place to store and process big data without imposing structure, and NoSQL databases such as MongoDB and Cassandra are well suited to that job. However, just as with raising children, it is often beneficial to make structure the rule.
Why structure is important to Big Data
Forcing structure on your data can improve analytic performance because the system does less searching to answer a query. A structured database, with its schema, indexes and partitioning, has a much better idea of where each piece of data lives in the sea of data and can access it with precision. Unstructured data, by contrast, may be scattered across nodes and take longer to locate. Sure, you may have to spend some time preprocessing the data up front. However, applying a schema lets you take a junk drawer full of “stuff” and sort it into neat Tupperware containers of similar items.
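To make the junk-drawer idea concrete, here is a minimal Python (3.9+) sketch that assumes hypothetical sensor records arriving as loosely shaped dictionaries. The `SensorReading` schema, its fields, and the `apply_schema`/`organize` helpers are illustrative names, not part of any particular product. Records that fit the schema are coerced into typed rows; records that don’t are set aside for review rather than mixed in with the clean data.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class SensorReading:
    """Hypothetical schema: the 'container' every accepted record must fit."""
    device_id: str
    metric: str
    value: float


def apply_schema(raw: dict[str, Any]) -> Optional[SensorReading]:
    """Coerce one loosely shaped record into a SensorReading, or return None."""
    try:
        return SensorReading(
            device_id=str(raw["device_id"]),
            metric=str(raw["metric"]).strip().lower(),  # standardize labels
            value=float(raw["value"]),                   # enforce a numeric type
        )
    except (KeyError, TypeError, ValueError):
        # A required field is missing or a value cannot be converted.
        return None


def organize(records: list[dict[str, Any]]) -> tuple[list[SensorReading], list[dict[str, Any]]]:
    """Split raw records into schema-conforming rows and rejects for review."""
    accepted: list[SensorReading] = []
    rejected: list[dict[str, Any]] = []
    for raw in records:
        row = apply_schema(raw)
        if row is None:
            rejected.append(raw)
        else:
            accepted.append(row)
    return accepted, rejected


raw_records = [
    {"device_id": "a1", "metric": " Temp ", "value": "21.5"},
    {"device_id": "a2", "metric": "humidity"},               # no value field
    {"device_id": "a3", "metric": "temp", "value": "n/a"},   # not a number
]
accepted, rejected = organize(raw_records)
print(accepted)       # [SensorReading(device_id='a1', metric='temp', value=21.5)]
print(len(rejected))  # 2
```

The preprocessing cost is paid once, at load time; every later query can then rely on each row having the same fields and types.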
Data quality and standardization also matter. Standardization is harder to apply to unstructured data, which is often analyzed in whatever form it was created. For example, when devices send updates to be analyzed by the latest Internet of Things project, some time stamps may include hours, minutes and seconds while others contain just a date. The dates may also be in US (month first) or European (day first) format. By applying structure, you can enforce data quality and report more accurately.
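As a minimal sketch of that kind of enforcement, the Python function below normalizes device time stamps into a single ISO 8601 form. It assumes, hypothetically, that each device’s date order (“US” or “EU”) is known from its metadata, since a string like “03/04/2021” cannot be disambiguated on its own, and that all inputs are already in UTC. The `FORMATS` table and `normalize_timestamp` name are illustrative.

```python
from datetime import datetime, timezone

# Hypothetical convention: each device's date order comes from its metadata,
# because a string like "03/04/2021" is ambiguous on its own.
FORMATS = {
    "US": ["%m/%d/%Y %H:%M:%S", "%m/%d/%Y"],  # month first
    "EU": ["%d/%m/%Y %H:%M:%S", "%d/%m/%Y"],  # day first
}


def normalize_timestamp(text: str, date_order: str) -> str:
    """Convert a device time stamp to one ISO 8601 form.

    Date-only values become midnight, and all inputs are assumed to be UTC
    (a simplifying assumption; real devices may report local time).
    """
    if date_order not in FORMATS:
        raise ValueError(f"unknown date order {date_order!r}")
    for fmt in FORMATS[date_order]:
        try:
            parsed = datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # try the next, less precise format
        return parsed.replace(tzinfo=timezone.utc).isoformat()
    raise ValueError(f"unrecognized time stamp {text!r} for a {date_order} device")


print(normalize_timestamp("03/04/2021 14:05:09", "US"))  # 2021-03-04T14:05:09+00:00
print(normalize_timestamp("03/04/2021", "US"))           # 2021-03-04T00:00:00+00:00
print(normalize_timestamp("03/04/2021", "EU"))           # 2021-04-03T00:00:00+00:00
```

Rejecting unrecognized values with an error, rather than guessing, keeps ambiguous data out of reports until someone decides how it should be interpreted.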
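Structure also improves compression efficiency. When every value in a column has the same type and comes from a limited set of likely values, storage engines can encode it far more compactly than a mixed collection of free-form records.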
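Run-length encoding illustrates one reason why: a typed column of repeated codes collapses into a few (value, count) pairs. Columnar formats such as Apache Parquet use run-length and dictionary encodings for this purpose; the toy Python sketch here is not any engine’s actual implementation, and the `status` column is hypothetical.

```python
from itertools import groupby


def run_length_encode(column: list[str]) -> list[tuple[str, int]]:
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(column)]


def run_length_decode(runs: list[tuple[str, int]]) -> list[str]:
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


# Hypothetical "status" column: every value is one of a few known codes,
# so long runs of identical values are common.
status = ["ok"] * 6 + ["warn"] * 2 + ["ok"] * 4
runs = run_length_encode(status)
print(runs)  # [('ok', 6), ('warn', 2), ('ok', 4)]
assert run_length_decode(runs) == status  # lossless round trip
```

Twelve stored values shrink to three pairs. The same trick works poorly on unstructured records, where neighboring entries rarely share a type or a value.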