The 3 Vs of Big Data revisited: Venn diagrams and visualization


This discussion is about visualization. The three Vs of big data (volume, velocity, variety) and the three skills that make a data scientist (hacking, statistics, domain expertise) are typically visualized with a Venn diagram, which represents all 8 possible combinations through set intersections. In the case of big data, I believe (visualization, veracity, value) are more important than (volume, velocity, variety), but that's another issue. Except that one of my Vs is visualization, and all these Venn diagrams are visually wrong: the color at the intersection of two sets should be the blend of the colors of the two parent sets, for easy interpretation and easy generalization to 4 or more sets. For instance, if three sets A, B, C are painted red, green, and blue respectively, the intersection of A and B should be yellow, and the intersection of all three should be white.

Here, I'll discuss how to create better diagrams, and then focus on how to add extra dimensions to an existing chart - including not just visual elements, but sound.

If you want to represent 3 sets, you need to choose 3 base colors, one per set; the colors for the intersections are then computed automatically using the color addition rule. It makes sense to use red, green, blue as the base colors for two reasons:

Actually, you don't even need Venn diagrams with this color scheme: instead you can use 8 non-overlapping rectangles, with the size of each rectangle representing the number of observations in the corresponding set or subset. By contrast, choosing red, green and yellow as the three base colors would be a very bad choice, because the intersection of red and green is yellow, which is also the color of the third set.
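To make the color addition rule concrete, here is a minimal Python sketch. It assumes that "addition" means summing each R/G/B channel and capping the result at 1; the helper names (blend, region_colors, collisions) and the set labels A, B, C are made up for this illustration. It computes the color of each of the 8 combinations and lists any combinations that end up sharing a color.

```python
from itertools import combinations

def blend(colors):
    """Additive blend (assumed rule): sum each R/G/B channel, capped at 1.0."""
    return tuple(min(1.0, sum((c[i] for c in colors), 0.0)) for i in range(3))

def region_colors(base):
    """Map every combination of set names (including the empty one) to its color."""
    names = sorted(base)
    regions = {}
    for size in range(len(names) + 1):
        for combo in combinations(names, size):
            regions[combo] = blend([base[name] for name in combo])
    return regions

def collisions(regions):
    """Return only the colors that are shared by more than one combination."""
    by_color = {}
    for combo, color in regions.items():
        by_color.setdefault(color, []).append(combo)
    return {color: combos for color, combos in by_color.items() if len(combos) > 1}

# Hypothetical labels A, B, C for the three sets.
rgb = {"A": (1.0, 0.0, 0.0), "B": (0.0, 1.0, 0.0), "C": (0.0, 0.0, 1.0)}
bad = {"A": (1.0, 0.0, 0.0), "B": (0.0, 1.0, 0.0), "C": (1.0, 1.0, 0.0)}

rgb_regions = region_colors(rgb)
print(rgb_regions[("A", "B")])       # (1.0, 1.0, 0.0), yellow
print(rgb_regions[("A", "B", "C")])  # (1.0, 1.0, 1.0), white
print(collisions(rgb_regions))       # {} : all 8 colors are distinct
print(collisions(region_colors(bad)))
```

With red, green, blue, the 8 combinations come out as black (outside all sets), red, green, blue, yellow, magenta, cyan and white. With red, green, yellow, the last line shows that the third set and every intersection collapse onto the same yellow under this rule.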

If you have 4 sets, and assuming the intensity for each R/G/B component is a number between 0 and 1 (as in the rgb function in the R language), a good set of base colors satisfying the above first property is: {(0.5,0,0), (0,0.5,0), (0,0,0.5), (0.5,0.5,0.5)} corresponding to dark red, dark green, dark blue, grey.
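The 16 combinations for these four base colors can be tabulated with a short Python sketch. As before, it assumes component-wise addition capped at 1, and the labels S1 to S4 are hypothetical names for the four sets.

```python
from itertools import combinations

# Base colors quoted in the text: dark red, dark green, dark blue, grey.
# The labels S1..S4 are hypothetical.
base = {
    "S1": (0.5, 0.0, 0.0),
    "S2": (0.0, 0.5, 0.0),
    "S3": (0.0, 0.0, 0.5),
    "S4": (0.5, 0.5, 0.5),
}

def blend(colors):
    """Additive blend (assumed rule): sum each channel, capped at 1.0."""
    return tuple(min(1.0, sum((c[i] for c in colors), 0.0)) for i in range(3))

# One entry per combination of sets, from "none" up to all four.
table = {}
for size in range(len(base) + 1):
    for combo in combinations(sorted(base), size):
        table[combo] = blend([base[name] for name in combo])

for combo, color in table.items():
    label = " & ".join(combo) or "(none)"
    print(f"{label:20s} {color}")

all_four = table[("S1", "S2", "S3", "S4")]
max_channel = max(max(color) for color in table.values())
# Pairs of combinations that receive exactly the same color.
clashes = [(a, b) for a, b in combinations(table, 2) if table[a] == table[b]]
print("all four sets:", all_four)
print("largest channel value:", max_channel)
print("clashes:", clashes)
```

Under this rule no channel ever exceeds 1, so nothing is clipped, and the intersection of all four sets is white. The sketch also reports one coincidence: S1, S2 and S3 together add up to (0.5, 0.5, 0.5), the same grey as S4 alone, so those two regions may need a label or a pattern to be told apart.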

For 5 sets or more, it is better to use a table rather than a diagram, although you can find interesting but very intricate (difficult to read) Venn diagrams on Google.

If you are not familiar with how colors blend, do this exercise: create a rectangle filled in yellow, in your favorite graphic editor. Next to this rectangle, create another rectangle filled with pixels that alternate between red and green: this latter rectangle will appear yellow to your eyes.
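If you would rather generate the two rectangles than draw them, the following Python sketch writes them side by side as a binary PPM image, a simple format that many image viewers can open. The file name, the 64-pixel size and the helper names are arbitrary choices for this illustration, and the file is written to the system temporary directory.

```python
import os
import tempfile

def solid(width, height, color):
    """Rows of pixels all filled with one color."""
    return [[color] * width for _ in range(height)]

def checkerboard(width, height, first, second):
    """Rows of pixels alternating between two colors, in both directions."""
    return [[first if (x + y) % 2 == 0 else second for x in range(width)]
            for y in range(height)]

def to_ppm(rows):
    """Serialize rows of (r, g, b) byte triples as a binary PPM (P6) image."""
    height, width = len(rows), len(rows[0])
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    body = bytes(channel for row in rows for pixel in row for channel in pixel)
    return header + body

RED, GREEN, YELLOW = (255, 0, 0), (0, 255, 0), (255, 255, 0)

left = solid(64, 64, YELLOW)               # plain yellow rectangle
right = checkerboard(64, 64, RED, GREEN)   # alternating red and green pixels
image = [l + r for l, r in zip(left, right)]

# Arbitrary output location chosen for this sketch.
path = os.path.join(tempfile.gettempdir(), "blend_demo.ppm")
with open(path, "wb") as fh:
    fh.write(to_ppm(image))
print("wrote", path)
```

View the file at 100% zoom, without any smoothing or scaling, and step back from the screen. The right half may look somewhat dimmer than the left, since each of its pixels lights only one of the two channels.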

 


