Size doesn’t matter in Big Data, it’s what you ask of it that counts

Big Data is changing the way we do science today. Traditionally, data were collected manually by scientists making measurements, using microscopes or surveys. These data could be analysed by hand or using simple statistical software on a PC.

Big Data has changed all that. These days, tremendous volumes of information are being generated and collected through new technologies, be they large telescope arrays, DNA sequencers or Facebook.

The volume of data is vast, and the kinds of data and the formats they take are also new: consider the hourly clicks on Facebook, or the daily searches on Google. As a result, Big Data offers scientists the ability to perform powerful analyses and make new discoveries.

The problem is that Big Data hasn’t yet changed the way many researchers ask scientific questions. In biology in particular, where tools like genome sequencing are generating tremendous amounts of data, biologists might not be asking the right kinds of questions that Big Data can answer.

Asking questions is what scientists do. Biologists ask questions about the living world, such as “how many species are there?” or “what are the evolutionary relationships between rats, bats and primates?”.

The way we ask questions says a lot about the type of information we use. For example, systematists like myself study the diversity of, and relationships between, the many species of creatures throughout evolutionary history.

We have tended to use physical characteristics, like teeth and bones, to classify mammals into taxonomic groups. These shared characteristics allow us to recognise new species and identify existing ones.

Enter Big Data, and cheap DNA sequencing technology. Now systematists have access to new forms of information, such as whole genomes, which have drastically changed the way we do systematics. But it hasn’t changed the way many systematists frame their questions.

Biologists are expecting big things from Big Data, but they are finding out that it initially delivers only so much. Rather than find out what these limitations are and how they can shape our questions, many biologists have responded by gathering more and more data. Put simply: scientists have been lured by size.

Quantity is often seen as a benchmark of success. The more you have, the better your study will be.

This thinking stems from the idealistic view of complete datasets with unbiased sampling. Statisticians call this “n = all”, which represents a dataset that contains all the information.

If all the data were available, then scientists wouldn’t have the problem of missing or corrupted data. A real-world example would be a complete genome sequence.

 


