Potent Trio: Big Data, Hadoop, and Finance Analytics

Big data is a universal phenomenon. Every business sector and aspect of society is being touched by the expanding flood of information from sensors, social networks, and streaming data sources. The financial sector is riding this wave as well. We examine here some of the features and benefits of Hadoop (and its family of tools and services) that enable large-scale data processing in finance (and consequently in nearly every other sector).

Three of the greatest benefits of big data are discovery, improved decision support, and greater return on innovation. In the world of finance, these also represent critical business functions.

When confronted with the inevitable avalanche of financial data from many business and customer channels, the modern data-driven firm can find help in the supporting technologies that make up the Hadoop ecosystem. Hadoop gives the business data analyst much-needed functionality in several areas: big data storage, access, warehousing, query, and processing (mining and analytics).

The Hadoop Distributed File System (HDFS, for storage), HBase (for read/write access and database-like querying), Hive (for data warehouse functionality), and Pig (for processing and workflow management) have been around for a while. In addition to these, there are now some new tools and techniques in the Hadoop toolkit.

One of the most recent additions to the Hadoop family is Spark, a fast, general-purpose engine for large-scale data processing. Spark speeds up processing by enabling parallel, complex, interactive, in-memory calculations on big data. It also provides capabilities for interactive querying, machine learning, graph processing, and stream processing. As financial data streams grow not only in size but also in their real-time response requirements, the opportunities to use Spark will only increase in the months and years ahead.

Another powerful member of the Hadoop stack is Drill. Drill allows financial data analysts to perform what they love the most: interactive self-service ad hoc analyses! These analyses can now be performed on a large scale using Drill, which enables analytics across billions of records. The SQL capabilities of Drill provide a familiarity that we can all appreciate. But it doesn’t stop there. Usually, when we mention “SQL,” we tend to think of relational (schema-based) databases. But Drill can query schema-less datasets as well. This is referred to as NoSQL.

A common misconception is that NoSQL means “No SQL.” That is not accurate. It is actually an abbreviation for “Not Only SQL,” which offers a perfect expression of Drill’s versatility. A common data format that is schema-less is JSON (JavaScript Object Notation), but any other data object that consists of key-value pairs can be processed by Hadoop or queried by Drill. A simple key-value record may have this form: (item_id, item_key, item_value), that is, a key-value pair tagged with the item it describes. Here is an example: (blog001, “author”, “Kirk Borne”), (blog001, “topic”, “big data”), and so on. This format is flexible, extensible, and scalable.
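To make the key-value idea concrete, here is a minimal Python sketch that groups such triples into JSON-style records and runs a simple ad hoc selection over them. It is an illustration only, not Drill or Hadoop code: the `blog002` row and the helper names `group_triples` and `select` are hypothetical, added to show that records need not share the same keys.

```python
import json
from collections import defaultdict

# Triples in the (item_id, item_key, item_value) form described above.
# The blog002 row is a made-up example with no "author" key.
triples = [
    ("blog001", "author", "Kirk Borne"),
    ("blog001", "topic", "big data"),
    ("blog002", "topic", "finance"),
]

def group_triples(rows):
    """Collect triples into one dict per item_id (no fixed schema required)."""
    records = defaultdict(dict)
    for item_id, key, value in rows:
        records[item_id][key] = value
    return dict(records)

def select(records, key, value):
    """Return the ids of records whose `key` equals `value`.

    Records that lack the key simply do not match.
    """
    return sorted(i for i, rec in records.items() if rec.get(key) == value)

records = group_triples(triples)
print(json.dumps(records["blog001"]))        # one record rendered as JSON
print(select(records, "topic", "big data"))  # ids of matching records
```

Because each record is just a dictionary, adding a new attribute to one item does not force any change on the others.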

A flat file containing key-value pairs can be easily constructed, incrementally updated, quickly edited, and readily partitioned to different processing nodes on a Hadoop cluster. All of this can be done without the time sink of rebuilding database indices, or modifying the schema, or re-normalizing the database relations.
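The sketch below illustrates the partitioning and incremental-update points with plain Python. It assumes a tab-separated flat file of `item_id`, key, and value, and it assigns lines to nodes by hashing the `item_id`, which is similar in spirit to hash partitioning by key in MapReduce. It is a simplification and does not reflect how HDFS itself places file blocks. The function names, the three-node cluster, and the `year` row are hypothetical.

```python
import zlib
from collections import defaultdict

def node_for(item_id, num_nodes):
    """Pick a node index for an item_id.

    crc32 is stable across runs, unlike Python's built-in hash() for strings.
    """
    return zlib.crc32(item_id.encode("utf-8")) % num_nodes

def partition(lines, num_nodes):
    """Assign tab-separated 'item_id<TAB>key<TAB>value' lines to nodes by item_id."""
    parts = defaultdict(list)
    for line in lines:
        item_id = line.split("\t", 1)[0]
        parts[node_for(item_id, num_nodes)].append(line)
    return parts

flat_file = [
    "blog001\tauthor\tKirk Borne",
    "blog001\ttopic\tbig data",
    "blog002\ttopic\tfinance",
]

# Incremental update: a new attribute is just one more line.
# No schema change, index rebuild, or re-normalization is involved.
flat_file.append("blog002\tyear\t2017")  # hypothetical row for illustration

parts = partition(flat_file, num_nodes=3)
for node, lines in sorted(parts.items()):
    print(node, lines)
```

All lines for a given `item_id` land on the same node, so each node can assemble complete records for the items it holds without consulting the others.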

 


