The 3 Vs of Big Data revisited: Venn diagrams and visualization

This discussion is about visualization. The three Vs of big data (volume, velocity, variety) and the three skills that make a data scientist (hacking, statistics, domain expertise) are typically visualized using a Venn diagram, which represents all 8 possible combinations through set intersections. In the case of big data, I believe (visualization, veracity, value) are more important than (volume, velocity, variety), but that's another issue. It matters here only because one of my Vs is visualization, and all these Venn diagrams are visually wrong: the color at the intersection of two sets should be the blend of the colors of the parent sets, for easy interpretation and easy generalization to 4 or more sets. For instance, if we have three sets A, B, C painted respectively in red, green, blue, the intersection of A and B should be yellow, and the intersection of all three should be white.

Here, I'll discuss how to create better diagrams, and then focus on how to add extra dimensions to an existing chart - including not just visual elements, but sound.

If you want to represent 3 sets, you need to choose 3 base colors, one per set; the colors for the intersections are then computed automatically using the additive color rule. It makes sense to use red, green, blue as the base colors for two reasons:

Actually, you don't even need Venn diagrams when using this color scheme: instead you can use 8 non-overlapping rectangles, with the size of each rectangle representing the number of observations in each set or subset. By contrast, choosing red, green and yellow as the three base colors would be a very bad idea, because the intersection of red and green is yellow, which is also the color of the third set.
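To make the color rule concrete, here is a minimal Python sketch. It assumes that "blending" means adding the R/G/B components and clipping each at 1, which matches the red + green = yellow example above. The helper names (`blend`, `region_colors`, `collisions`) are made up for this illustration and do not come from any library.

```python
from itertools import combinations

def blend(colors):
    """Additively blend RGB triples (components in [0, 1]), clipping each at 1."""
    return tuple(min(1.0, float(sum(c[i] for c in colors))) for i in range(3))

def region_colors(base):
    """Map each non-empty combination of set names to its blended color."""
    names = sorted(base)
    regions = {}
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            regions[combo] = blend([base[n] for n in combo])
    return regions

def collisions(base):
    """Return pairs of distinct regions that end up with the same color."""
    regions = region_colors(base)
    return [(a, b) for a, b in combinations(regions, 2)
            if regions[a] == regions[b]]

# Base colors from the text: red, green, blue.
rgb = {"A": (1, 0, 0), "B": (0, 1, 0), "C": (0, 0, 1)}
# The bad choice from the text: red, green, yellow.
bad = {"A": (1, 0, 0), "B": (0, 1, 0), "C": (1, 1, 0)}

for combo, color in region_colors(rgb).items():
    print(" & ".join(combo), "->", color)
print("RGB collisions:", collisions(rgb))
print("Red/green/yellow collisions:", collisions(bad))
```

With red, green, blue, the seven non-empty regions all get different colors (the eighth combination, outside all sets, is the background). With red, green, yellow, the region for A and B gets the same color as set C alone, which is the problem described above.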

If you have 4 sets, and assuming the intensity for each R/G/B component is a number between 0 and 1 (as in the rgb function in the R language), a good set of base colors satisfying the above first property is: {(0.5,0,0), (0,0.5,0), (0,0,0.5), (0.5,0.5,0.5)} corresponding to dark red, dark green, dark blue, grey.

For 5 sets or more, it is better to use a table rather than a diagram, although you can find interesting but very intricate (difficult to read) Venn diagrams on Google.

If you are not familiar with how colors blend, do this exercise: create a rectangle filled in yellow, in your favorite graphic editor. Next to this rectangle, create another rectangle filled with pixels that alternate between red and green: this latter rectangle will appear yellow to your eyes.
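If you would rather generate the test image than draw it by hand, the sketch below builds it with the Python standard library only. The sizes, the checkerboard alternation pattern, the file name and the function names are all assumptions made for this illustration; the image is written in the plain PPM format, which many image viewers can open.

```python
def checker_pair(width=64, height=64):
    """Return rows of RGB pixels: a solid yellow rectangle on the left,
    and a rectangle of alternating red and green pixels on the right."""
    yellow, red, green = (255, 255, 0), (255, 0, 0), (0, 255, 0)
    rows = []
    for y in range(height):
        left = [yellow] * width
        # Checkerboard alternation: neighbors in both directions differ.
        right = [red if (x + y) % 2 == 0 else green for x in range(width)]
        rows.append(left + right)
    return rows

def write_ppm(path, rows):
    """Write the pixel rows as a binary PPM (P6) image."""
    height, width = len(rows), len(rows[0])
    with open(path, "wb") as f:
        f.write(f"P6\n{width} {height}\n255\n".encode("ascii"))
        for row in rows:
            f.write(bytes(channel for pixel in row for channel in pixel))

if __name__ == "__main__":
    write_ppm("blend_exercise.ppm", checker_pair())
```

View the result at its native size, without zooming in. How closely the right-hand rectangle matches the solid yellow one depends on your screen and viewing distance.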

 


