The big data ecosystem for science

Large-scale data management is essential for experimental science and has been for many years. Telescopes, particle accelerators and detectors, and gene sequencers, for example, generate hundreds of petabytes of data that must be processed to extract the secrets and patterns of life and of the universe.

The data technologies used in these science communities often predate those of the rapidly growing industrial big data world and, in many cases, continue to develop independently, occupying a parallel big data ecosystem for science (see Figure 1). This post highlights some of these technologies, focusing on those used by several projects supported by the National Energy Research Scientific Computing Center (NERSC).

This post originally appeared on oreilly.com, organizers of Strata Hadoop World. Republished with permission.

Across these projects we see a common theme: data volumes are growing, and so is the need for tools that can store and process data effectively at that scale. In some cases, the projects could benefit from big data technologies being developed in industry; in others, the research itself will lead to new capabilities.

The Large Hadron Collider (LHC) at the European Organization for Nuclear Research (CERN) in Geneva is the world’s largest scientific instrument, designed to collide protons at the highest energies ever achieved. The resulting spray of particles is observed in detectors the size of buildings, in an attempt to discover one-in-a-billion events that have the potential to uncover new fundamental particles and, ultimately, secrets of the universe. The extreme rate at which data is produced, together with its overall volume and the rarity of interesting events, has made LHC research one of the original examples of big data. LHC experiments require smart data ingestion, efficient data storage formats that allow for fast extraction of relevant data, powerful tools for transfer to collaborators around the world, and sophisticated statistical analysis.
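To make the storage-format point concrete, the sketch below shows the columnar access pattern those formats enable, using the open-source uproot library to read a ROOT file (the de facto standard format at the LHC). The file name, tree name, and branch name here are hypothetical, NanoAOD-style placeholders; treat this as an illustration of the pattern, not any experiment’s actual analysis code.

```python
import uproot  # pip install uproot; widely used in HEP for reading ROOT files

# Open a ROOT file lazily: nothing is read until a branch is requested.
# "events.root", the "Events" tree, and the "MET_pt" branch are hypothetical
# placeholder names used purely for illustration.
with uproot.open("events.root") as f:
    tree = f["Events"]
    # Read only the single column we need; all other branches stay on disk.
    met = tree["MET_pt"].array(library="np")

# Select the rare events of interest without ever loading the full dataset.
selected = met[met > 200.0]
print(f"kept {selected.size} of {met.size} events")
```

The point of the columnar layout is exactly this selectivity: an analysis touching two quantities out of hundreds reads only those two from disk, which is what makes fast extraction from petabyte-scale datasets feasible.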

The LHC collides bunches of protons 40 million times per second inside detectors packed with instruments that take hundreds of millions of measurements during each collision (Figure 2).
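A back-of-the-envelope calculation makes clear why the detectors cannot simply keep everything and must filter events in real time. The event size and kept-event rate below are round, assumed figures for illustration, not numbers quoted in this article.

```python
# Illustrative arithmetic only: event size and trigger output rate are
# assumed round numbers, not measurements from this article.
collisions_per_second = 40e6  # 40 million bunch crossings per second
bytes_per_event = 1e6         # assume ~1 MB per recorded event

raw_bytes_per_second = collisions_per_second * bytes_per_event
print(f"if every event were kept: {raw_bytes_per_second / 1e12:.0f} TB/s")

# A real-time filter ("trigger") keeping ~1,000 events per second (assumed)
# reduces this firehose to something that can actually be written to storage.
kept_bytes_per_second = 1e3 * bytes_per_event
print(f"after triggering:         {kept_bytes_per_second / 1e9:.1f} GB/s")
```

Under these assumptions the raw stream would be about 40 TB/s, while the triggered stream is about 1 GB/s, a reduction of more than four orders of magnitude achieved before any data reaches long-term storage.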
