The 3 Vs of Big Data revisited: Venn diagrams and visualization

The 3 Vs of Big Data revisited: Venn diagrams and visualization

The 3 Vs of Big Data revisited: Venn diagrams and visualization

This discussion is about visualization. The three Vs of big data (volume, velocity, variety) or the three skills that make a data scientist (hacking, statistics, domain expertise) are typically visualized using a Venn diagram, representing all the potential 8 combinations through set intersections. In the case of big data, I believe (visualization, veracity, value) are more important than (volume, velocity, variety), but that's another issue. Except that one of my Vs is visualization and all these Venn diagrams are visually wrong: the color at the intersection of two sets should be the blending of both colors of the parent sets, for easy interpretation and easy generalization to 4 or more sets. For instance, if we have three sets A, B, C painted respectively in red, green, blue, the intersection of A and B should be yellow, the intersection of the three should be white.

Here, I'll discuss how to create better diagrams, and then focus on how to add extra dimensions to an existing chart - including not just visual elements, but sound.

Read Also:
Big Data: Now an Integral Part of Talent Recruiting

If you want to represent 3 sets, you need to choose 3 base colors for the 3 sets, and then the colors for the intersections will be automatically computed using color addition rule. It makes sense to use red, green, blue as the base colors for two reasons:

Actually, you don't even need to use Venn diagrams when using this color scheme: instead you can use 8 non-overlapping rectangles, with the size of each rectangle representing the number of observations in each set / subset. Note that, to the contrary, choosing red, green and yellow as the three base colors would be very bad because the intersection of red and green is yellow, which is also the color of the third set.

If you have 4 sets, and assuming the intensity for each R/G/B component is a number between 0 and 1 (as in the rgb function in the R language), a good set of base colors satisfying the above first property is: {(0.5,0,0), (0,0.5,0), (0,0,0.5), (0.5,0.5,0.5)} corresponding to dark red, dark green, dark blue, grey.

Read Also:
10 Women Influencers in Big Data Analytics

For 5 sets or more, it is better to use a table rather than a diagram, although you can find interesting but very intricate (difficult to read) Venn diagrams on Google.

If you are not familiar with how colors blend, do this exercise: create a rectangle filled in yellow, in your favorite graphic editor. Next to this rectangle, create another rectangle filled with pixels that alternate between red and green: this latter rectangle will appear yellow to your eyes.

 



Enterprise Data World 2017

2
Apr
2017
Enterprise Data World 2017

$200 off with code 7WDATA

Read Also:
Data Science for IoT vs Classic Data Science: 10 Differences

Data Visualisation Summit San Francisco

19
Apr
2017
Data Visualisation Summit San Francisco

$200 off with code DATA200

Read Also:
The Importance of Location in Real Estate, Weather, and Machine Learning

Chief Analytics Officer Europe

25
Apr
2017
Chief Analytics Officer Europe

15% off with code 7WDCAO17

Read Also:
7 Keys To Building A Successful Big Data Infrastructure
Read Also:
4 Barriers to Leveraging Big Data for Operational Innovation

Chief Analytics Officer Spring 2017

2
May
2017
Chief Analytics Officer Spring 2017

15% off with code MP15

Read Also:
Big Data: Now an Integral Part of Talent Recruiting

Big Data and Analytics for Healthcare Philadelphia

17
May
2017
Big Data and Analytics for Healthcare Philadelphia

$200 off with code DATA200

Read Also:
Big Data: Now an Integral Part of Talent Recruiting

Leave a Reply

Your email address will not be published. Required fields are marked *