This vendor-written tech primer has been edited by Network World to eliminate product promotion, but readers should note it will likely favor the submitter’s approach.
As organizations work to make big data broadly available in the form of easily consumable analytics, they should consider outsourcing functions to the cloud. By opting for a Big Data as a Service solution that handles the resource-intensive and time-intensive operational aspects of big data technologies such as Hadoop, Spark, Hive and more, enterprises can focus more on the benefits of big data and less on the grunt work.
The advent of big data raises fundamental questions about how organizations can embrace its potential, bring its value to greater parts of the organization and incorporate that data with pre-existing enterprise data stores, such as enterprise data warehouses (EDWs) and data marts.
The dominant big data technology in commercial use today is Apache Hadoop. It’s used alongside other technologies that are part of the greater Hadoop ecosystem, such as the Apache Spark in-memory processing engine, the Apache Hive data warehouse infrastructure, and the Apache HBase NoSQL storage system.
To include big data in their core data architecture, enterprises need to adopt and invest in Big Data as a Service technologies. A modern data architecture suited to today’s demands should comprise the following components:
* High-performance, analytic-ready data store on Hadoop. How can big data be speedy and analysis-ready? A best practice for building an analysis-friendly big data environment is to create an analytic data store that loads the most commonly used datasets from the Hadoop data lake and structures them into dimensional models. These models are easy for business users to understand, and they facilitate the exploration of how business contexts change over time. With an analytic-ready data store on top of Hadoop, organizations can get the fastest response to queries.
This analytic data store must support not only reporting for known use cases but also exploratory analysis for unplanned scenarios. The process should be seamless to the user, eliminating the need to know whether to query the analytic data store or Hadoop directly.
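One way to picture this seamless routing is a thin layer that decides, per query, which backend can answer it. The following is a minimal sketch only: the `QueryRouter` class, the backend labels and the dataset names are hypothetical and not taken from any particular product. It assumes a query can be served by the analytic data store only when every dataset it touches has already been loaded there.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class QueryRouter:
    """Chooses a backend so that users never have to name one themselves."""

    # Datasets already loaded into the analytic data store (hypothetical names).
    analytic_datasets: set[str] = field(default_factory=set)

    def route(self, datasets: set[str]) -> str:
        """Return the label of the backend that should answer the query."""
        # The analytic store can answer only if it holds every dataset involved.
        if datasets and datasets <= self.analytic_datasets:
            return "analytic_store"
        # Exploratory or unplanned queries fall through to Hadoop.
        return "hadoop"


router = QueryRouter(analytic_datasets={"sales", "customers"})
print(router.route({"sales", "customers"}))    # analytic_store
print(router.route({"sales", "clickstream"}))  # hadoop
```

In this sketch the user submits the same request either way; only the router knows which system ends up running it.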
* Semantic layer that facilitates “business language” data analysis. How can big data be accessible to more business users? To hide the complexities in raw data and to expose data to business users in easily understood business terms, a semantic overlay is required. This semantic layer is a logical representation of data, where business rules can be applied.
For example, a semantic layer can define “high-value customers” as “those who have been customers for more than three years and are making new or renewal purchases on a regular basis.”
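The “high-value customers” definition can be sketched as a named business rule that sits between the user and the raw records. This is an illustrative sketch, not a real semantic-layer API: the `Customer` record, the `SEMANTIC_LAYER` mapping and the `select` helper are hypothetical. Because “on a regular basis” is not quantified above, the sketch assumes it means at least two purchases in the past 365 days; a real deployment would set its own threshold.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass
class Customer:
    name: str
    customer_since: date
    purchase_dates: list[date]  # new or renewal purchases


def is_high_value(customer: Customer, today: date,
                  min_years: float = 3,
                  window_days: int = 365,
                  min_purchases: int = 2) -> bool:
    """Business rule: tenure over three years plus regular purchases.

    The window_days and min_purchases defaults are assumptions made for
    this sketch; the article does not define "regular".
    """
    tenure_years = (today - customer.customer_since).days / 365.25
    recent = [d for d in customer.purchase_dates
              if 0 <= (today - d).days <= window_days]
    return tenure_years > min_years and len(recent) >= min_purchases


# The semantic layer maps business terms to rules over the raw data.
SEMANTIC_LAYER = {"high-value customers": is_high_value}


def select(term: str, customers: list[Customer], today: date) -> list[str]:
    """Resolve a business term to the names of the customers it matches."""
    rule = SEMANTIC_LAYER[term]
    return [c.name for c in customers if rule(c, today)]


customers = [
    Customer("Acme", date(2011, 6, 1), [date(2015, 3, 1), date(2015, 11, 15)]),
    Customer("Birch", date(2014, 1, 1), [date(2015, 6, 1), date(2015, 12, 1)]),
    Customer("Cedar", date(2010, 1, 1), [date(2012, 5, 1)]),
]
print(select("high-value customers", customers, date(2016, 1, 1)))  # ['Acme']
```

A business user asks only for “high-value customers”; the tenure and purchase logic stays inside the rule, where it can be changed in one place.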