Drawing a map of distributed data systems

How we created an illustrated guide to help you find your way through the data landscape.

Designing Data-Intensive Applications, the book I’ve been working on for four years, is finally finished, and should be available in your favorite bookstore in the next week or two. An incomplete beta (Early Release) edition has been available for the last 2 1/2 years as I continued working on the final chapters.

Throughout that process, we have been quietly working on a surprise. Something that has not been part of any of the Early Releases of the book. In fact, something that I have never seen in any tech book. And today we are excited to share it with you.

In Designing Data-Intensive Applications, each of the 12 chapters is accompanied by a map. The map is a kind of graphical table of contents of the chapter, showing how some of the main ideas in the chapter relate to each other.

Here is an example, from Chapter 3 (on storage engines):

Don’t take it too seriously—some of it is a little tongue-in-cheek, we have taken some artistic license, and the things included on the map are not exhaustive.

But it does reflect the structure of the chapter: political or geographic regions represent ways of doing something, and cities represent particular implementations of those approaches. Similar things are more likely to be close together, and roads or rivers represent concepts that connect different implementations or regions.

Most computing books describe one particular piece of software and discuss all the aspects of how it works. This book is structured differently: it starts with the concepts—discussing the high-level approaches of how you might solve some problem, and comparing the pros and cons of each—and then points out which pieces of software use which approach. The maps use the same structure: the region in which a city is located tells you what approach it uses.

For example, in the map above, you can see a high-level subdivision into two countries: transaction processing and analytics. Within transaction processing, there are two regions: log-structured storage and B-trees, which are two ways of implementing OLTP storage engines. Within the B-tree region, you see databases like MySQL and PostgreSQL[1], while within the log-structured region you see databases like Cassandra and HBase. On the analytics side, you can see that the mountain range representing column storage reaches into both the data warehousing and the Hadoop regions, since the approach applies to both.
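As a rough illustration, the hierarchy just described (countries containing regions, regions containing cities) can be written down as a nested structure. This is only a sketch of what the paragraph above says, not how the maps were made: the dictionary layout and the names `CHAPTER_3_MAP`, `SPANNING_FEATURES`, and `locate` are assumptions introduced for this example, and the city lists contain only the databases named in the text.

```python
# Hypothetical encoding of the Chapter 3 map as described in the text.
# Countries map to regions, and regions map to the cities named in the article.
CHAPTER_3_MAP = {
    "transaction processing": {
        "log-structured storage": ["Cassandra", "HBase"],
        "B-trees": ["MySQL", "PostgreSQL"],
    },
    "analytics": {
        # The article names these regions but no cities within them.
        "data warehousing": [],
        "Hadoop": [],
    },
}

# Features such as mountain ranges can reach into more than one region.
SPANNING_FEATURES = {
    "column storage": ["data warehousing", "Hadoop"],
}


def locate(city):
    """Return (country, region) for a city, or None if it is not on this map."""
    for country, regions in CHAPTER_3_MAP.items():
        for region, cities in regions.items():
            if city in cities:
                return country, region
    return None


if __name__ == "__main__":
    # The region a city sits in tells you which approach it uses.
    print(locate("Cassandra"))
    print(locate("PostgreSQL"))
```

Looking up a city returns the approach it belongs to, which mirrors how the maps are meant to be read: find the city, then see which region it lies in.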

The maps are in black and white, both because of the practicalities of printing and because I was looking for a Tolkien-esque style. You are, of course, welcome to color them in yourself. In fact, by coloring them in, you would be following a fine tradition: for over three centuries, maps were printed in black and white from an engraved copper plate, and then colored in by hand.

Each of the chapters has a map like this, focusing on the particular aspects discussed in that chapter. This means that some cities appear on multiple islands: the data landscape is multidimensional, so a city may lie in more than one (conceptual) realm. For example, the map below is for Chapter 5 (on the topic of replication):

Cities representing Cassandra, MongoDB, MySQL, and others appear on this map, on the Chapter 3 map above, and on some other maps, too.

Shipping routes connect some of the ports shown in the maps, in cases where there is a noteworthy link between chapters. Most of the maps are of islands, but there are some exceptions. (I won’t give away too much, but I just want to say...beware of the Kraken.)

I am incredibly delighted that O’Reilly was willing to take on this crazy idea of creating maps.


