SAS Event Stream Processing (ESP) cannot only process structured streaming events (a collection of fields) in real time, but has also very advanced features regarding the collection and the analysis of unstructured events. Twitter is one of the most well-known social network application and probably the first that comes to mind when thinking about streaming data source. On the other hand, SAS has powerful solutions to analyze unstructured data with SAS Text Analytics. This post is about merging 2 needs: collecting unstructured data coming from Twitter and doing some text analytics processing on tweets (contextual extraction, content categorization and sentiment analysis).
Before moving forward, SAS ESP is based on a publish and subscribe model. Events are injected into an ESP model using an “adapter” or a “connector.” or using Python and the publisher API Target applications consume enriched events output by ESP using the same technology, “adapters” and “connectors.” SAS ESP provides lots of them, in order to integrate with static and dynamic applications.
Then, an ESP model flow is composed of “windows” which are basically the type of transformation we want to perform on streaming events. It can be basic data management (join, compute, filter, aggregate, etc.) as well as advanced processing (data quality, pattern detection, streaming analytics, etc.).
SAS ESP 4.2 provides two adapters to connect to Twitter as a data source and to publish events from Twitter (one event per tweet) to a running ESP model. There are no equivalent connectors for Twitter.
Both two adapters are publisher only and include:
The second one is more advanced, using a different API (GNIP, bought by Twitter) and providing additional capabilities (access to history of tweets) and performance. The adapter builds event blocks from a Twitter Gnip firehose stream and publishes them to a source window. Access to this Twitter stream is restricted to Twitter-approved parties. Access requires a signed agreement.
In this article, we will focus on the first adapter. It consumes Twitter streams and injects event blocks into source windows of an ESP engine. This adapter has free capabilities.