A data warehouse is a database dedicated to data analysis and reporting. It combines data from multiple operational applications and provides a single location for decision-support data. A warehouse should include staging tables: one staging table for each source table or file.
As the diagram shows, day-to-day operational databases and external sources provide the raw data. Change-detection logic monitors these sources and prepares periodic extracts of new and changed data.
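The article does not say how the change-detection logic works. One simple approach is to compare the current snapshot of a source table with the previous one, keyed by primary key. The sketch below assumes that approach. The function name `extract_changes` and the insert/update/delete labels are hypothetical. Production systems often read database change logs, triggers, or last-modified timestamps instead.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]

def extract_changes(previous: Dict[object, Row],
                    current: Dict[object, Row]) -> List[Tuple[str, Row]]:
    """Diff two snapshots keyed by primary key (hypothetical snapshot-compare approach).

    Returns (operation, row) pairs that would make up one periodic extract.
    """
    changes: List[Tuple[str, Row]] = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("insert", row))   # new row since the last snapshot
        elif previous[key] != row:
            changes.append(("update", row))   # row exists but a value changed
    for key, row in previous.items():
        if key not in current:
            changes.append(("delete", row))   # row disappeared from the source
    return changes
```

Snapshot comparison is easy to reason about but reads the whole source table on every run, which is why log-based capture is usually preferred for large tables.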
Staging tables accumulate the history of input data from the periodic extracts. The staging tables do not modify the input data. They merely accumulate it along with some system metadata. There is one staging table for each source table or file.
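As a minimal illustration of an append-only staging table, the sketch below stores every extract row exactly as received and adds a few system metadata columns. The class name and the metadata columns (`_batch_id`, `_loaded_at`, `_source`) are hypothetical. The article does not specify which metadata a real staging table records.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional

Row = Dict[str, object]

@dataclass
class StagingTable:
    """Append-only history of one source table or file (hypothetical sketch)."""
    source_name: str
    rows: List[Row] = field(default_factory=list)
    _next_batch: int = field(default=1, init=False)

    def load_extract(self, extract: List[Row],
                     loaded_at: Optional[datetime] = None) -> int:
        """Append every row of a periodic extract; existing rows are never changed."""
        batch_id = self._next_batch
        self._next_batch += 1
        ts = loaded_at or datetime.now(timezone.utc)
        for row in extract:
            staged = dict(row)              # copy: source values are stored as received
            staged["_batch_id"] = batch_id  # hypothetical system metadata columns
            staged["_loaded_at"] = ts
            staged["_source"] = self.source_name
            self.rows.append(staged)
        return batch_id
```

Because rows are only appended, loading a second extract in which a customer changed an email address leaves both the old and the new value in the table, which is the history the staging layer is meant to keep.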
Data may then flow to an operational data store (ODS), whose purpose is to integrate overlapping source data. The ODS provides a single source of normalized operational data. Thus, if there are two sources of account information, there are two staging tables but only one account table in the ODS.
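The following is a hedged sketch of how two account staging tables might be integrated into one ODS account table. It assumes both sources share a business key called `account_id`, that the latest row per key is the one with the highest `_batch_id`, and that an earlier source in the list wins on conflicting values. These are illustrative choices, not rules from the article. Real integrations usually need key matching and cleansing rules specific to the sources.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]

def latest_per_key(staging_rows: Iterable[Row], key: str) -> Dict[object, Row]:
    """Keep the most recently loaded row for each business key (by _batch_id)."""
    latest: Dict[object, Row] = {}
    for row in staging_rows:
        k = row[key]
        if k not in latest or row["_batch_id"] >= latest[k]["_batch_id"]:
            latest[k] = row
    return latest

def integrate_accounts(sources: List[List[Row]],
                       key: str = "account_id") -> Dict[object, Row]:
    """Merge several staging tables into one ODS table (hypothetical precedence rule).

    Sources earlier in the list take priority when both supply a non-null value.
    """
    ods: Dict[object, Row] = {}
    for staging_rows in reversed(sources):  # apply the lowest-priority source first
        for k, row in latest_per_key(staging_rows, key).items():
            merged = dict(ods.get(k, {}))
            # drop staging metadata (columns starting with "_") and null values
            merged.update({c: v for c, v in row.items()
                           if not c.startswith("_") and v is not None})
            ods[k] = merged
    return ods
```

Two staging tables go in and one account table comes out, mirroring the two-to-one relationship described above.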
Ultimately the data reaches the data warehouse, either directly from staging or via an ODS. The warehouse restructures the data into facts and dimensions, a format suited to data mining and decision-support queries. For example, an account dimension would combine one or more account source tables.
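To make the facts-and-dimensions restructuring concrete, the sketch below assigns each account a stable surrogate key (`account_sk`) and rewrites transaction records as fact rows that reference that key. The surrogate-key technique is a common dimensional-modeling practice. The column names, the `key_map` dictionary, and the transaction fields are assumptions made for illustration, not details from the article.

```python
from typing import Dict, List

Row = Dict[str, object]

def build_account_dimension(ods_accounts: Dict[object, Row],
                            key_map: Dict[object, int]) -> List[Row]:
    """Return account dimension rows keyed by a surrogate key.

    key_map (natural key -> surrogate key) is updated in place so that
    surrogate keys stay stable across successive loads.
    """
    next_key = max(key_map.values(), default=0) + 1
    dimension: List[Row] = []
    for natural_key in sorted(ods_accounts, key=str):
        if natural_key not in key_map:
            key_map[natural_key] = next_key
            next_key += 1
        dimension.append({"account_sk": key_map[natural_key], **ods_accounts[natural_key]})
    return dimension

def to_fact_rows(transactions: List[Row], key_map: Dict[object, int]) -> List[Row]:
    """Replace the natural account key with the surrogate key in each fact row."""
    return [{"account_sk": key_map[t["account_id"]], "date": t["date"], "amount": t["amount"]}
            for t in transactions]
```

Keeping `key_map` outside the function means an account keeps the same surrogate key on every load, so fact rows loaded earlier still join to the right dimension row.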
Consider a customer account application where customers can use a Web interface to view and edit their own data. Here’s a table from the application.
Here’s the corresponding staging table.