
Size doesn’t matter in Big Data, it’s what you ask of it that counts


Big Data is changing the way we do science today. Traditionally, data were collected manually by scientists making measurements, using microscopes or surveys. These data could be analysed by hand or using simple statistical software on a PC.

Big Data has changed all that. These days, tremendous volumes of information are being generated and collected through new technologies, be they large telescope arrays, DNA sequencers or Facebook.

Not only are the data vast, but the kinds of data and the formats they take are also new. Consider the hourly clicks on Facebook, or the daily searches on Google. As a result, Big Data offers scientists the ability to perform powerful analyses and make new discoveries.

The problem is that Big Data hasn’t yet changed the way many researchers ask scientific questions. In biology in particular, where tools like genome sequencing are generating tremendous amounts of data, biologists might not be asking the right kinds of questions that Big Data can answer.


Asking questions is what scientists do. Biologists ask questions about the living world, such as “how many species are there?” or “what are the evolutionary relationships between rats, bats and primates?”.

The way we ask questions says a lot about the type of information we use. For example, systematists like myself study the diversity of, and relationships between, the many species of creatures throughout evolutionary history.

We have tended to use physical characteristics, like teeth and bones, to classify mammals into taxonomic groups. These shared characteristics allow us to recognise new species and identify existing ones.

Enter Big Data, and cheap DNA sequencing technology. Now systematists have access to new forms of information, such as whole genomes, which have drastically changed the way we do systematics. But it hasn’t changed the way many systematists frame their questions.

Biologists are expecting big things from Big Data, but they are finding out that it initially delivers only so much. Rather than finding out what these limitations are and how they can shape our questions, many biologists have responded by gathering more and more data. Put simply: scientists have been lured by size.


Quantity is often seen as a benchmark of success. The more you have, the better your study will be.

This thinking stems from the idealistic view of complete datasets with unbiased sampling. Statisticians call this “n = all”, which represents a data set that contains all the information.

If all the data were available, then scientists wouldn’t have the problem of missing or corrupted data. A real-world example would be a complete genome sequence.

 


