Demystifying Data Warehouses, Data Lakes, and Data Marts

As data and analytics become a more integral part of an organization's business processes, the non-DBAs among us might start to feel lost in a sea of technical terms thrown around by technical teams. The disproportionately loud vendor noise in this space generates still more jargon, hype, and confusion (just try to get a straight answer to “what is big data”).

This post is meant to be used by business users as a (very) abridged guide to the various types of repositories your data might reside in: databases, data marts, data warehouses and data lakes, so that you have a basic understanding of each of these concepts and the role they play in what you’re actually after – real, up-to-date insight from your data.

Ready to get educated? Let’s get started.

In one form or another, the database is at the heart of most data storage and management systems. The relational database used with many applications and systems holds data in tables of rows and columns. In a table, a row corresponds to a record with a set sequence of data fields, while a column lists one given data field for all the records. The data is structured in that only the “right” kind of data can be used in a given field: for example, in a customer relational database, a shipping date cannot be used in a field for a delivery address, and so on.
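As a rough illustration of rows, columns, and the "right kind of data" rule, here is a minimal sketch using Python's built-in sqlite3 module. The shipments table, its column names, and the sample values are hypothetical, and the date rule is enforced here with a CHECK constraint, which is only one of several ways a database can restrict what goes into a field.

```python
import sqlite3

# In-memory database; the table and column names are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE shipments (
        shipment_id      INTEGER PRIMARY KEY,
        customer_name    TEXT NOT NULL,
        delivery_address TEXT NOT NULL,
        -- Only values shaped like YYYY-MM-DD are accepted in this column.
        ship_date        TEXT NOT NULL
            CHECK (ship_date GLOB '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]')
    )
""")

INSERT_SQL = (
    "INSERT INTO shipments (customer_name, delivery_address, ship_date) "
    "VALUES (?, ?, ?)"
)

# A well-formed row: each value sits in the column meant for it.
conn.execute(INSERT_SQL, ("Acme Corp", "12 Main St", "2017-05-02"))


def insert_is_rejected(values):
    """Return True if the database refuses the row."""
    try:
        conn.execute(INSERT_SQL, values)
    except sqlite3.IntegrityError:
        return True
    return False


# An address placed in the date column violates the CHECK constraint.
rejected = insert_is_rejected(("Acme Corp", "12 Main St", "12 Main St"))
row_count = conn.execute("SELECT COUNT(*) FROM shipments").fetchone()[0]
```

Each inserted tuple is one row (a record), and each named column holds the same field for every record.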

The structure or “schema” of a relational database is defined before any data is recorded, and it is often left unchanged afterward. However, because the database is organized as separate tables with defined relationships between them, structured data can be accessed or reassembled in many different ways. By comparison, the need to handle unstructured data has led to the creation of other types of databases. To handle free-form text in emails or variable-length video clips efficiently, for example, a non-relational database may have very few fields, or a different number of fields for different records, and may allow fields to be changed “on the fly” after data storage operations begin.
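The contrast can be sketched in a few lines of Python. The first part defines a small relational schema up front and reassembles the same data in two different ways. The second part imitates document-style records with plain dictionaries. All table names, field names, and values are hypothetical, and the dictionaries only stand in for what a real non-relational database would store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema is declared before any data is recorded.
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE shipments (
        shipment_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        product     TEXT NOT NULL,
        ship_date   TEXT NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Acme Corp'), (2, 'Globex');
    INSERT INTO shipments VALUES
        (1, 1, 'widget', '2017-05-02'),
        (2, 1, 'gadget', '2017-05-09'),
        (3, 2, 'widget', '2017-05-10');
""")

# One way to reassemble the data: number of shipments per customer.
per_customer = conn.execute("""
    SELECT c.name, COUNT(*)
    FROM customers AS c
    JOIN shipments AS s ON s.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# Another way, from the same tables: which customers received widgets.
widget_buyers = conn.execute("""
    SELECT c.name
    FROM customers AS c
    JOIN shipments AS s ON s.customer_id = c.customer_id
    WHERE s.product = 'widget'
    ORDER BY c.name
""").fetchall()

# Document-style records: each one carries its own set of fields.
documents = [
    {"type": "email", "subject": "Late delivery", "body": "Where is my order?"},
    {"type": "video", "duration_seconds": 93, "codec": "h264", "tags": ["demo"]},
]
# A field added "on the fly" to one record, with no schema change.
documents[0]["sentiment"] = "negative"
```

The relational half needs its tables declared first but can then answer questions nobody planned for. The document half accepts anything, which shifts the work of interpreting each record to whoever reads it.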

For key business systems like sales, accounting, and production, it is critical that input of transactional data to the database is quick and reliable, without disruption to the flow of business. The database can be optimized for these “write” operations, by minimizing the duplication of data fields (normalizing the data) among the database tables.
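To see why less duplication helps with writes, consider a change of address for one customer. The sketch below compares a flat table, where the address is repeated on every order, with a normalized pair of tables, where it is stored once. The table names and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Flat design: customer details are repeated on every order row.
    CREATE TABLE orders_flat (
        order_id         INTEGER PRIMARY KEY,
        customer_name    TEXT,
        customer_address TEXT,
        amount           REAL
    );
    INSERT INTO orders_flat VALUES
        (1, 'Acme Corp', '12 Main St', 100.0),
        (2, 'Acme Corp', '12 Main St', 250.0),
        (3, 'Acme Corp', '12 Main St', 75.0);

    -- Normalized design: customer details live in one place.
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT,
        address     TEXT
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount      REAL
    );
    INSERT INTO customers VALUES (1, 'Acme Corp', '12 Main St');
    INSERT INTO orders VALUES (1, 1, 100.0), (2, 1, 250.0), (3, 1, 75.0);
""")

# The same business event, a change of address, in both designs.
flat_rows_touched = conn.execute(
    "UPDATE orders_flat SET customer_address = ? WHERE customer_name = ?",
    ("99 Oak Ave", "Acme Corp"),
).rowcount

normalized_rows_touched = conn.execute(
    "UPDATE customers SET address = ? WHERE customer_id = ?",
    ("99 Oak Ave", 1),
).rowcount
```

With three orders the flat design rewrites three rows and the normalized design rewrites one. The gap grows with every additional order, and the normalized design also avoids the risk of the copies drifting out of sync.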

A transactional system can also be efficient at retrieving specific information about an individual transaction, like the date of a shipment to a customer. On the other hand, its “write-oriented” design makes it less well suited to collating data to provide information such as overall shipment figures over the last two years. This kind of “read-oriented” operation may require massive manipulation of data records or recombination of large tables when querying production databases, either of which could have a significant negative impact on the transactional system's performance.

Analytical database systems are optimized for “read” operations and often run separately from transactional (operational) systems. From time to time, they ingest data from the transactional systems and possibly other data sources, but otherwise they perform relatively few “writes”. Analytical systems are used to consolidate data (do roll-ups), slice data (for example, all shipments of a given product over one year), dice data (for example, shipments of a given product to a specific set of customers for a given quarter), and drill down to reveal successive layers of detail from a higher-level statistic.
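The four operations named above can each be expressed as a short query. The sketch below runs them against a tiny, made-up shipments table in SQLite. A real analytical system would use a dedicated engine and far more data, so treat the table, column names, and figures as assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shipments (
        product   TEXT,
        customer  TEXT,
        ship_date TEXT,     -- stored as YYYY-MM-DD
        units     INTEGER
    );
    INSERT INTO shipments VALUES
        ('widget', 'Acme Corp', '2016-02-10', 10),
        ('widget', 'Globex',    '2016-05-03', 5),
        ('gadget', 'Acme Corp', '2016-05-20', 7),
        ('widget', 'Acme Corp', '2016-11-15', 3),
        ('gadget', 'Globex',    '2017-01-08', 4);
""")

# Roll-up: consolidate every shipment into one total per year.
rollup_by_year = conn.execute("""
    SELECT substr(ship_date, 1, 4) AS year, SUM(units)
    FROM shipments
    GROUP BY year
    ORDER BY year
""").fetchall()

# Slice: all shipments of one product over one year.
widget_units_2016 = conn.execute("""
    SELECT SUM(units)
    FROM shipments
    WHERE product = 'widget'
      AND ship_date BETWEEN '2016-01-01' AND '2016-12-31'
""").fetchone()[0]

# Dice: one product, a specific set of customers, one quarter (Q1 2016).
widget_units_acme_q1 = conn.execute("""
    SELECT SUM(units)
    FROM shipments
    WHERE product = 'widget'
      AND customer IN ('Acme Corp')
      AND ship_date BETWEEN '2016-01-01' AND '2016-03-31'
""").fetchone()[0]

# Drill-down: break the 2016 total into the months behind it.
drilldown_2016 = conn.execute("""
    SELECT substr(ship_date, 1, 7) AS month, SUM(units)
    FROM shipments
    WHERE ship_date BETWEEN '2016-01-01' AND '2016-12-31'
    GROUP BY month
    ORDER BY month
""").fetchall()
```

All four queries only read data. On a busy transactional database, queries like these would scan and group large tables while orders are being written, which is the reason analytical workloads are usually run on a separate system.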
