Alan Clark of OpenStack describes how OpenStack, the open source IaaS platform, supports big data in the cloud and why it can be an attractive Hadoop deployment strategy.
We’ve all seen statistics on the volume of data generated and gathered daily, commonly measured in petabytes and exabytes. Processing data at that scale is exactly what Big Data technologies are built to do, so it is no surprise that the industry has grown so large, given both the volume of data and the business potential around it.
While multiple solution sets target big data analytics, the traditional focus has been on dedicated hardware and maximum processing power. Recently, however, Big Data and cloud have been converging rapidly, particularly where data sets are unstructured with simple data models, an area that Apache Hadoop specifically targets.
Convergence of Big Data and Cloud with OpenStack
The mission of OpenStack is to produce a ubiquitous open source cloud platform that meets the needs of public and private cloud providers. OpenStack is an Infrastructure-as-a-Service (IaaS) open source project, community and ecosystem that has grown dramatically over the past five years. Today the project has over 27,000 individual members, 2,000 contributors and 500 supporting companies, and the number of components has grown from the original two to over 25.
These statistics convey the significance of open source and how the cloud has transformed over time. Participants in the project can see the power of open source and its ability to embrace new ideas and market needs, including the convergence of Big Data and cloud.
While this transformation is taking place, it is important to note that OpenStack is not recreating Hadoop or any other Big Data technology. Instead, the OpenStack effort was created to simplify the deployment and management of Big Data tools within the IaaS infrastructure, an approach sometimes called Analytics-as-a-Service. This effort is code-named Sahara; it provisions, launches and manages Hadoop clusters on top of OpenStack, making Big Data infrastructure and tools simple to deploy and operate.
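To make "provisioning a Hadoop cluster" more concrete, the sketch below describes a cluster as groups of identically configured VMs, loosely following the node group and cluster template idea that Sahara uses. It is a hypothetical data model, not the Sahara API: the class names, fields, flavor names and the `validate` rules are illustrative assumptions for a basic, non-HA HDFS and YARN layout.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NodeGroup:
    """A group of identically configured VMs (hypothetical model, not Sahara's API)."""
    name: str
    flavor: str           # illustrative OpenStack flavor name, e.g. "m1.large"
    processes: list[str]  # Hadoop daemons to run on every node in the group
    count: int            # number of VMs in the group


@dataclass
class ClusterSpec:
    """A whole cluster: a plugin name plus its node groups (hypothetical)."""
    name: str
    plugin: str  # illustrative plugin name, e.g. "vanilla"
    node_groups: list[NodeGroup] = field(default_factory=list)

    def total_nodes(self) -> int:
        # Total number of VMs that would be provisioned.
        return sum(g.count for g in self.node_groups)

    def nodes_running(self, process: str) -> int:
        # How many VMs across all groups run the given daemon.
        return sum(g.count for g in self.node_groups if process in g.processes)

    def validate(self) -> None:
        # Assumed rule for a basic, non-HA layout: exactly one NameNode and
        # one ResourceManager, plus at least one DataNode for storage.
        for master in ("namenode", "resourcemanager"):
            if self.nodes_running(master) != 1:
                raise ValueError(f"expected exactly one {master}")
        if self.nodes_running("datanode") < 1:
            raise ValueError("expected at least one datanode")


cluster = ClusterSpec(
    name="analytics-demo",
    plugin="vanilla",
    node_groups=[
        NodeGroup("master", "m1.large", ["namenode", "resourcemanager"], 1),
        NodeGroup("worker", "m1.medium", ["datanode", "nodemanager"], 4),
    ],
)
cluster.validate()
print(cluster.total_nodes())  # 5
```

Separating master and worker groups is what makes the cluster easy to manage later: growing or shrinking capacity means changing one worker group's `count` rather than reconfiguring the whole cluster.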
If Big Data has traditionally relied on dedicated hardware, what is driving it to the cloud? Cloud favors dynamic workloads, multi-tenancy and shared resources. The answer lies in the growth of analytics ideas and data-based solutions, such as the expansion into real-time, on-demand analysis and response. The consumer shopping experience is a good example of this need.
Retail establishments have traditionally used Big Data to predict retail trends by combining customer histories with current web browsing patterns and social media responses, then using this data to target customer segments and prepare for customer demand. Such analysis is ongoing and predictable in scale, size and duration, and workloads like these have traditionally been considered to gain little from deployment in a cloud.
However, the customer shopping experience is clearly changing. Retailers want to interact actively with customers using enhanced intelligence, and customers expect immediate, personalized privileges and services. These relationships go beyond a simple purchase to active engagement throughout the life of the product. For example, a consumer no longer just buys a regular watch; the watch now tells them when and where to eat based on their location. A myriad of devices, even shoes, can count the steps a wearer takes, and heart rates can be recorded and analyzed, giving consumers real-time health analysis.
Real-time services, including the real-time analytics behind them, are a natural fit for cloud because they need quick elasticity, rapid service deployment and agility. As businesses discover the potential of an enhanced customer relationship, new ideas and innovative methods for building that relationship drive business growth. Yet the services behind this growth profile differ from traditional analytics: real-time analysis and storage demands vary over time and location. In other words, these services must be elastic, responding rapidly to workload demands that change over time.
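A simple way to picture "elastic" is a rule that sizes the worker pool to the current backlog, within fixed bounds. The sketch below is a hypothetical threshold-based scaling rule, the kind of decision an operator or orchestration tool might make before resizing a cluster's worker group; it is not an OpenStack or Sahara API, and the function name, the `tasks_per_worker` capacity and the min/max bounds are illustrative assumptions.

```python
import math


def desired_workers(pending_tasks: int, tasks_per_worker: int,
                    min_workers: int, max_workers: int) -> int:
    """Return how many workers to run for the current backlog (hypothetical rule)."""
    if tasks_per_worker <= 0:
        raise ValueError("tasks_per_worker must be positive")
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    # Enough workers to cover the backlog, rounded up...
    needed = math.ceil(pending_tasks / tasks_per_worker)
    # ...but never below the floor (baseline service) or above the ceiling (cost cap).
    return max(min_workers, min(max_workers, needed))


# Illustrative backlog samples over a day: quiet, busy, a spike, then quiet again.
backlog = [40, 120, 900, 300, 20]
plan = [desired_workers(b, tasks_per_worker=50, min_workers=2, max_workers=16)
        for b in backlog]
print(plan)  # [2, 3, 16, 6, 2]
```

The floor keeps a baseline service running when demand is low, while the ceiling reflects the cost and control concerns businesses still have; a production policy would typically also add a cooldown so the cluster does not grow and shrink on every small fluctuation.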
At the same time, businesses still need technical efficiency to control cost, risk and security. Cloud has proven to be the most viable answer, giving businesses the agility and elasticity they are looking for while also providing centralized control and stability.