As data and analytics become a more integral part of an organization’s business processes, the non-DBAs among us might start to feel lost in a sea of technical terms frequently thrown around by technical teams. The disproportionately loud vendor noise in this space generates further jargon, hype, and confusion (just try to get a straight answer to “what is big data”).
This post is meant to be used by business users as a (very) abridged guide to the various types of repositories your data might reside in: databases, data marts, data warehouses and data lakes, so that you have a basic understanding of each of these concepts and the role they play in what you’re actually after – real, up-to-date insight from your data.
Ready to get educated? Let’s get started.
In one form or another, the database is at the heart of most data storage and management systems. The relational database used with many applications and systems holds data in tables of rows and columns. In a table, a row corresponds to a record with a set sequence of data fields, while a column lists one given data field for all the records. The data is structured in that only the “right” kind of data can be used in a given field: for example, in a customer relational database, a shipping date cannot be used in a field for a delivery address, and so on.
The structure or “schema” of a relational database is defined before any data is recorded, and it is often left unchanged afterward. However, because the database is organized as separate tables with defined relationships between them, structured data can be accessed or reassembled in many different ways. By comparison, the need to handle unstructured data has led to the creation of other types of databases. To efficiently handle free-form text in emails or variable-length video clips, for example, a non-relational database may have very few fields or different numbers of fields for different records, and may allow fields to be changed “on the fly” after data storage operations begin.
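To make the idea of a schema a little more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample values are all made up for illustration; a real customer database would look different. It shows a schema declared up front, two tables reassembled with a join, and the database rejecting a record that breaks the defined relationship. The last few lines show, by contrast, what schema-free records might look like.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign-key checks off unless asked to enforce them.
conn.execute("PRAGMA foreign_keys = ON")

# The schema is declared before any data is recorded (hypothetical tables).
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    address     TEXT NOT NULL
);
CREATE TABLE shipments (
    shipment_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    product     TEXT NOT NULL,
    ship_date   TEXT NOT NULL
);
""")

conn.execute("INSERT INTO customers VALUES (1, 'Acme Ltd', '1 Main St')")
conn.execute("INSERT INTO shipments VALUES (10, 1, 'widget', '2023-03-01')")

# Separate tables can be reassembled through their defined relationship.
rows = conn.execute("""
    SELECT c.name, s.product, s.ship_date
    FROM shipments AS s
    JOIN customers AS c ON c.customer_id = s.customer_id
""").fetchall()

# A shipment that points at a customer who does not exist is refused.
try:
    conn.execute("INSERT INTO shipments VALUES (11, 99, 'gadget', '2023-03-02')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# By contrast, schema-free records can each carry different fields.
documents = [
    {"type": "email", "body": "Where is my order?"},
    {"type": "video", "duration_s": 42, "codec": "h264", "tags": ["demo"]},
]
```

The point is not the specific SQL but the trade-off: the relational tables refuse data that does not fit, while the document-style records accept whatever shape each item happens to have.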
For key business systems like sales, accounting, and production, it is critical that input of transactional data to the database is quick and reliable, without disruption to the flow of business. The database can be optimized for these “write” operations, by minimizing the duplication of data fields (normalizing the data) among the database tables.
A transactional system can also be efficient in retrieving specific information about an individual transaction, like the date of a shipment to a customer. On the other hand, the “write-oriented” design makes it less well-suited for collating data to provide information such as overall shipment figures over the last two years. This kind of “read-oriented” operation may require massive manipulation of data records or recombination of large tables when querying production databases, either of which could have a significant negative impact on a transactional system’s performance.
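The difference between the two kinds of read can be sketched with a toy example, again using Python's sqlite3 module and an invented shipments table. With four rows nothing here is slow, of course; the sketch only shows the shape of each query. The first fetches one record by its key, while the second has to touch every row to produce yearly totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE shipments (
        shipment_id INTEGER PRIMARY KEY,
        customer    TEXT,
        product     TEXT,
        ship_date   TEXT,
        quantity    INTEGER
    )
""")

# Hypothetical sample data, for illustration only.
sample = [
    (1, "Acme", "widget", "2022-02-10", 5),
    (2, "Acme", "gadget", "2022-11-03", 2),
    (3, "Bolt", "widget", "2023-01-15", 7),
    (4, "Bolt", "widget", "2023-06-30", 1),
]
conn.executemany("INSERT INTO shipments VALUES (?, ?, ?, ?, ?)", sample)

# Transactional-style read: one specific record, found by its key.
ship_date = conn.execute(
    "SELECT ship_date FROM shipments WHERE shipment_id = ?", (3,)
).fetchone()[0]

# Analytical-style read: every row is scanned and collated by year.
yearly_totals = conn.execute("""
    SELECT substr(ship_date, 1, 4) AS year, SUM(quantity)
    FROM shipments
    GROUP BY year
    ORDER BY year
""").fetchall()
```

On a production table with millions of rows, the second query is the kind that competes with incoming transactions for resources.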
Analytical database systems are optimized for “read” operations and often run separately from transactional or operational systems. From time to time, they ingest data from the transactional systems and possibly other data sources, but otherwise they perform relatively few “writes”. Analytical systems are used to consolidate data (do roll-ups), slice data (for example, all shipments over one year of a given product), dice data (for example, shipments of a given product to a specific set of customers for a given quarter), and drill down to reveal successive layers of detail from a higher-level statistic.
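The four operations named above can be illustrated in a few lines of plain Python. This is a simplified sketch over a handful of invented shipment records, with a hypothetical helper called roll_up; real analytical systems do the same thing over far larger volumes with specialized engines.

```python
from collections import defaultdict

# Hypothetical shipment records, for illustration only.
shipments = [
    {"product": "widget", "customer": "Acme", "quarter": "2023-Q1", "units": 5},
    {"product": "widget", "customer": "Bolt", "quarter": "2023-Q1", "units": 7},
    {"product": "widget", "customer": "Acme", "quarter": "2023-Q2", "units": 3},
    {"product": "gadget", "customer": "Acme", "quarter": "2023-Q1", "units": 2},
    {"product": "gadget", "customer": "Cobb", "quarter": "2023-Q2", "units": 4},
]


def roll_up(rows, *keys):
    """Total the units for each distinct combination of the given fields."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row["units"]
    return dict(totals)


# Roll-up: consolidate everything to one figure per product.
by_product = roll_up(shipments, "product")

# Slice: fix a single dimension (here, one product).
widget_rows = [r for r in shipments if r["product"] == "widget"]

# Dice: restrict several dimensions at once.
diced = [
    r for r in shipments
    if r["product"] == "widget"
    and r["customer"] in {"Acme", "Bolt"}
    and r["quarter"] == "2023-Q1"
]

# Drill down: break the widget total into the next layer of detail.
widget_by_quarter = roll_up(widget_rows, "quarter")
```

Here the roll-up gives 15 widgets and 6 gadgets, and drilling into widgets splits that 15 into 12 for the first quarter and 3 for the second.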