Data warehouses offer a window into an organization’s historical performance and ongoing operations, providing data analysts and business users with information on customer behavior, business trends, and quarterly and annual sales. Despite the emergence of Hadoop and other big data technologies, the growing need for companies to capture and analyze data from various sources is keeping the data warehouse as relevant as ever, if not more so. But before investing in a data warehouse platform, the first step is to examine whether your organization really needs one and what business benefits it would deliver.
To accomplish this, you must consider the two data warehouse deployment options — enterprise-wide or departmental. You also need to determine if unstructured big data will be a component of the data warehouse environment and decide whether to integrate traditional data warehousing for online analytical processing (OLAP) uses with data processing and management for big data analytics. Finally, you must be able to match the various use cases for data warehousing to the most appropriate data warehouse platform types.
Why does your organization need a data warehouse?

The general concept of data warehousing is quite simple: Data is regularly extracted from the operational systems that support the business and copied to a specialized system, a data warehouse, for analysis and reporting via dashboards, portals, and business intelligence (BI), reporting and analytics tools.

The following conditions may indicate that your organization could benefit from a data warehouse:

You’re struggling to report effectively on business activities within your company because required data isn’t readily available.
Data is being copied separately by different departments and groups for analysis in spreadsheets that aren’t necessarily consistent with one another.
Uncertainties about the accuracy of data are causing corporate executives and business managers to question the veracity of reports.
BI reporting against production databases is slowing down the nightly or monthly processing of transaction data.

With a properly implemented data warehouse, you can help your organization accurately answer questions about your business, such as what happened and why. Data warehousing improves data availability because it collects data from disparate locations and sources into a central repository. Once analysis runs against the warehouse instead of the production databases, operational workflows become more efficient because analytical activity has moved to a separate system. As the data moves into the data warehouse, it is evaluated, cleansed and transformed, so the quality of the information in reports generated from the warehouse should improve.

This is the second article in a four-part series on data warehouse platforms. The first article laid the groundwork for deploying a data warehouse, while this article examines specific use cases for buying a data warehouse platform. Article No. 3 will help you determine your must-have features. The concluding article will examine specific data warehouse platforms from the leading vendors, comparing and contrasting their features.
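To make the extract, cleanse, transform and load flow described above more concrete, here is a minimal sketch in Python using the standard library's sqlite3 module. It is an illustration under stated assumptions, not a recommended implementation: the table names (orders, fact_orders), the sample rows and the cleansing rules are all hypothetical, and two in-memory SQLite databases stand in for the operational system and the warehouse.

```python
import sqlite3


def extract(source):
    """Read raw order rows from the (hypothetical) operational system."""
    return source.execute(
        "SELECT order_id, customer, amount FROM orders"
    ).fetchall()


def cleanse_and_transform(rows):
    """Reject rows that fail basic checks and normalize the rest."""
    clean = []
    for order_id, customer, amount in rows:
        # Evaluate: skip rows with a missing customer or an unusable amount.
        if customer is None or amount is None or amount < 0:
            continue
        # Transform: trim whitespace and use one spelling per customer name.
        clean.append((order_id, customer.strip().title(), float(amount)))
    return clean


def load(warehouse, rows):
    """Write the cleansed rows into the (hypothetical) warehouse table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    warehouse.executemany(
        "INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?)", rows
    )
    warehouse.commit()


# Stand-in operational system with deliberately inconsistent sample data.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "  acme corp ", 120.0),
        (2, "Globex", None),       # missing amount, rejected
        (3, "ACME CORP", 80.5),
        (4, None, 15.0),           # missing customer, rejected
    ],
)

# Stand-in warehouse: reporting queries run here, not against production.
warehouse = sqlite3.connect(":memory:")
load(warehouse, cleanse_and_transform(extract(source)))

totals = warehouse.execute(
    "SELECT customer, SUM(amount) FROM fact_orders GROUP BY customer"
).fetchall()
print(totals)
```

Because the two differently formatted Acme rows are normalized before loading, the report query groups them under a single customer, which is the kind of consistency the spreadsheet-per-department approach tends to lose.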
The data warehouse environment can differ greatly across organizations, however. Deployments can follow one of two paths, an enterprise data warehouse (EDW) or a data mart, or combine the two. An EDW is architected to contain all of the pertinent data from an enterprise’s operational systems, and perhaps external data sources, and is used across all departments. The data is transformed and aggregated to support queries and BI reporting (see Figure 1). Some organizations have implemented an operational data store (ODS) as an interim step between the operational systems and the data warehouse. Operational data is copied to the ODS and then extracted for use in the data warehouse.
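The ODS step can be pictured as a staging copy that sits between the operational tables and the aggregated warehouse tables. The following Python sketch shows that two-hop flow with sqlite3. Everything in it is assumed for illustration: the table names (op_sales, ods_sales, dw_monthly_sales), the sample figures and the monthly grain are hypothetical, and a single in-memory database stands in for what would normally be three separate systems.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE op_sales  (sale_date TEXT, department TEXT, amount REAL);
    CREATE TABLE ods_sales (sale_date TEXT, department TEXT, amount REAL);
    CREATE TABLE dw_monthly_sales (
        month TEXT, department TEXT, total REAL,
        PRIMARY KEY (month, department)
    );
    """
)

# Hypothetical transactions recorded by the operational system.
db.executemany(
    "INSERT INTO op_sales VALUES (?, ?, ?)",
    [
        ("2024-01-05", "retail", 100.0),
        ("2024-01-20", "retail", 50.0),
        ("2024-01-11", "online", 75.0),
        ("2024-02-02", "retail", 25.0),
    ],
)


def copy_to_ods(conn):
    """Hop 1: copy operational rows into the ODS unchanged."""
    conn.execute("DELETE FROM ods_sales")
    conn.execute(
        "INSERT INTO ods_sales "
        "SELECT sale_date, department, amount FROM op_sales"
    )


def load_warehouse(conn):
    """Hop 2: aggregate ODS rows by month and department for reporting."""
    conn.execute("DELETE FROM dw_monthly_sales")
    conn.execute(
        """
        INSERT INTO dw_monthly_sales
        SELECT substr(sale_date, 1, 7), department, SUM(amount)
        FROM ods_sales
        GROUP BY substr(sale_date, 1, 7), department
        """
    )


copy_to_ods(db)
load_warehouse(db)

monthly = db.execute(
    "SELECT month, department, total FROM dw_monthly_sales "
    "ORDER BY month, department"
).fetchall()
print(monthly)
```

The warehouse load reads only from the ODS, so the aggregation work never touches the operational table directly; that separation is the point of the interim step.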