I got 99 data stores and integrating them ain't fun

You know the story: a corporation grows bigger and bigger through acquisitions, organograms go off the charts, IT assets skyrocket, everyone holds on to theirs trying to secure their role, complexity multiplies, chaos reigns, and the effort to deliver value via IT becomes painstaking. This may help explain why a lead enterprise architect in one of Europe's biggest financial services organizations is looking for solutions in unusual places.

Let's call our guy Werner and his organization WXYZ. Names have been changed to protect the innocent, but our fireside chat at the Semantics conference last week was real, and indicative of both the pains of data integration and its remedies. Over the years, WXYZ's course has resulted in tens of different data stores that need to be integrated to offer operational and strategic analytical insights. A number of initiatives with a number of consultancies and vendors have failed to deliver, budgets are shrinking and headcount is dwindling.

Granted, a big part of this has nothing to do with technology per se, and more to do with organizational politics and vendor attitude. But when grandiose plans fail, the ensuing stalemate means that a quick win is needed in order to move forward: ideally one that requires as little time and infrastructure as possible to work, can be deployed incrementally, and can eventually scale as required. A combination of well-known concepts and under-the-radar software may offer a solution.

What do you do, then, when you cannot afford to build and populate a data lake, or yet-another-data-warehouse? Federated querying to the rescue. This means that data stay where they are: queries are sent over the network to the different data sources, and the overall answer is compiled by combining their results. The concept has been around for a while and is used by solutions like Oracle Big Data. Its biggest issues revolve around having to develop and/or rely on custom solutions for communication and data modeling, which makes it hard to scale beyond point-to-point integration.
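To make the fan-out-and-combine idea concrete, here is a minimal Python sketch. It is not how Oracle Big Data or any other product works internally; it assumes two hypothetical data stores, simulated as in-memory SQLite databases that happen to share a `customers` table, and the helper names (`make_source`, `query_source`, `federated_query`) are invented for illustration. Each source answers the same sub-query where its data lives, and only the matching rows are combined centrally.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def make_source(rows):
    """Hypothetical data store: an in-memory SQLite DB standing in for a remote system."""
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE customers (id TEXT, name TEXT, balance REAL)")
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    return conn


def query_source(conn, min_balance):
    # The sub-query runs at the source; only matching rows travel back.
    cur = conn.execute(
        "SELECT id, name, balance FROM customers WHERE balance >= ?", (min_balance,)
    )
    return cur.fetchall()


def federated_query(sources, min_balance):
    # Fan the same sub-query out to every source in parallel (one thread per call)...
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda c: query_source(c, min_balance), sources))
    # ...then compile the overall answer by combining the partial results.
    combined = [row for part in partials for row in part]
    return sorted(combined, key=lambda r: r[2], reverse=True)


bank_a = make_source([("a1", "Alice", 1200.0), ("a2", "Bob", 80.0)])
bank_b = make_source([("b1", "Carol", 560.0)])
print(federated_query([bank_a, bank_b], 100.0))
# [('a1', 'Alice', 1200.0), ('b1', 'Carol', 560.0)]
```

The sketch sidesteps the hard part by giving every store the same schema; in practice each store models its data differently, so the same sub-query cannot simply be sent everywhere.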

Could these issues be addressed? Data integration relies on mappings between a mediated schema and the schemata of the original sources, and on transforming queries to match the original sources' schemata. Mediated schemata don't have to be developed from scratch; they can be readily reused from a pool of curated Linked Data vocabularies.
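A minimal sketch of mappings plus query rewriting, under loud assumptions: the mediated schema borrows two property names from the schema.org vocabulary (`schema:name`, `schema:email`), while the source names, tables and columns in `MAPPINGS` and the `rewrite` helper are hypothetical, and only equality filters are handled. A query is written once against the mediated schema and translated into SQL for each source. Real deployments would more likely express such mappings in a standard language like W3C's R2RML than in a Python dictionary.

```python
# Hypothetical per-source mappings from mediated-schema properties
# (names borrowed from schema.org) to each source's own table and columns.
MAPPINGS = {
    "crm_legacy": {
        "table": "CUST_MASTER",
        "columns": {"schema:name": "CM_NAME", "schema:email": "CM_MAIL"},
    },
    "insurance_db": {
        "table": "policy_holders",
        "columns": {"schema:name": "full_name", "schema:email": "contact_email"},
    },
}


def rewrite(select, where, source):
    """Translate a query over the mediated schema into SQL for one source.

    select: list of mediated properties to return
    where:  dict of mediated property -> required value (equality only)
    """
    m = MAPPINGS[source]
    cols = m["columns"]
    missing = [p for p in list(select) + list(where) if p not in cols]
    if missing:
        raise KeyError(f"{source} has no mapping for {missing}")
    # Alias source columns back to mediated names so results line up across sources.
    select_sql = ", ".join(f'{cols[p]} AS "{p}"' for p in select)
    sql = f"SELECT {select_sql} FROM {m['table']}"
    params = []
    if where:
        sql += " WHERE " + " AND ".join(f"{cols[p]} = ?" for p in where)
        params = list(where.values())
    return sql, params


for src in MAPPINGS:
    print(rewrite(["schema:name"], {"schema:email": "ana@example.org"}, src))
# ('SELECT CM_NAME AS "schema:name" FROM CUST_MASTER WHERE CM_MAIL = ?', ['ana@example.org'])
# ('SELECT full_name AS "schema:name" FROM policy_holders WHERE contact_email = ?', ['ana@example.org'])
```

Because results come back under the mediated property names, they can be combined directly, and adding a new store means adding a mapping rather than writing another point-to-point connector.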

 


