Natural relationships between data contain a gold mine of insights for business users. Unfortunately, traditional databases have long stored data in ways that break these relationships, hiding what could be valuable insight. Although databases that focus on the relational aspect of data analytics abound, few are as effective at revealing these hidden insights as a graph database.
A graph database is designed from the ground up to help the user understand and extrapolate nuanced insight from large, complex networks of interrelated data. Highly visual by nature, graph databases represent discrete data points as “vertices” or “nodes.” The relationships between these vertices are depicted as connections called “edges.” Metadata, or “properties” of vertices and edges, are also stored within the graph database to provide more in-depth knowledge of each object. Traversal lets users move from one data point to another along these edges and find the specific insights they seek.
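To make the vocabulary concrete, here is a minimal sketch in Python of a property graph and a two-hop traversal. It is an illustration only and does not show how any particular graph database stores data. The class names `Vertex` and `Edge`, the `neighbors` helper, and the sample people are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """A discrete data point with a label and free-form properties."""
    id: int
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, labeled relationship between two vertices."""
    label: str
    out_v: Vertex  # vertex the edge starts from
    in_v: Vertex   # vertex the edge points to
    properties: dict = field(default_factory=dict)


def neighbors(edges, start, edge_label):
    """Traverse one hop: follow edges with the given label out of `start`."""
    return [e.in_v for e in edges if e.out_v is start and e.label == edge_label]


# Hypothetical sample data
alice = Vertex(1, "person", {"name": "Alice"})
bob = Vertex(2, "person", {"name": "Bob"})
carol = Vertex(3, "person", {"name": "Carol"})
edges = [
    Edge("knows", alice, bob, {"since": 2015}),
    Edge("knows", bob, carol, {"since": 2016}),
]

# Two-hop traversal: whom do the people Alice knows know?
friends_of_friends = [
    fof.properties["name"]
    for friend in neighbors(edges, alice, "knows")
    for fof in neighbors(edges, friend, "knows")
]
print(friends_of_friends)
```

Note that the edges carry properties of their own (here, `since`), which is what distinguishes a property graph from a plain node-and-link diagram.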
To better explain how graph databases work, I will use IBM Graph, a technology that I helped to build and am excited to teach new users about. Let’s dive in.
Based on the Apache TinkerPop framework for building high-performance graph applications, IBM Graph is built to enable and work with powerful applications through a fully managed graph database service. In turn, the service provides users with simplified HTTP APIs, an Apache TinkerPop v3 compatible API, and the full Apache TinkerPop v3 query language. The goal of this type of database is to make it easier to discover and explore the relationships in a property graph with index-free adjacency using nodes, edges, and properties. In other words, every element in the graph is directly connected to adjoining elements, eliminating the need for index lookups to traverse a graph.
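The idea of index-free adjacency can be sketched in a few lines of Python. This is a conceptual illustration, not a description of IBM Graph's storage engine, and the `Node` class, the `walk` function, and the sample names are assumptions. Each node holds direct references to its neighbors, so following a relationship means reading a list stored on the node itself rather than consulting a global index.

```python
class Node:
    """A vertex that keeps direct references to its adjacent vertices."""

    def __init__(self, name):
        self.name = name
        self.out = {}  # edge label -> list of adjacent Node objects

    def connect(self, label, other):
        """Add a directed edge with the given label from this node to `other`."""
        self.out.setdefault(label, []).append(other)


def walk(start, labels):
    """Follow a sequence of edge labels from `start`, hop by hop.

    Each hop reads the neighbor lists stored on the current nodes;
    no global index or edge table is consulted.
    """
    frontier = [start]
    for label in labels:
        frontier = [nxt for node in frontier for nxt in node.out.get(label, [])]
    return frontier


# Hypothetical sample data
store = Node("store")
warehouse = Node("warehouse")
supplier = Node("supplier")
store.connect("supplied_by", warehouse)
warehouse.connect("supplied_by", supplier)

print([n.name for n in walk(store, ["supplied_by", "supplied_by"])])
```

In this sketch the cost of a hop depends on how many neighbors the current nodes have, not on the total size of the graph.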
Through the graph-based NoSQL store it provides, IBM Graph creates rich representations of data in an easily digestible manner. If you can whiteboard it, you can graph it. All team members, from the developer to the business analyst, can contribute to the process.
The flexibility and ease of use offered by a graph database such as IBM Graph mean that analyzing complex relationships is no longer a daunting task. A graph database is the right tool for a time when data is generated at exponentially growing rates amid new applications and services. A graph database can be leveraged to produce results for recommendations, social networks, efficient routes between locations or items, fraud detection, and more. It efficiently allows users to do the following:
Schema with indexes. Graph databases can either leverage a schema or not. IBM Graph works with a schema to create indexes that are used for querying data. The schema defines the data types for the properties that will be employed and allows for the creation of indexes for the properties. In IBM Graph, indexes are required for the first properties accessed in the query. The schema is best defined beforehand (although it can be extended later) in order to ensure that the vertices and edges introduced along the way can work as intended.
A schema should define properties, labels, and indexes for a graph. For instance, if analyzing Twitter data, the data would be outlined as three kinds of vertices, with four kinds of connections between them defined as edges. Indexes are also created so that the graph can be queried.
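As a rough sketch of what declaring a schema up front buys you, the Python below models a schema as a plain dictionary and checks data and queries against it. The layout of the dictionary, the helper names, and the labels (`user`, `tweet`, `posted`) are hypothetical placeholders chosen for illustration; they are not IBM Graph's actual schema format and are not taken from the Twitter example above.

```python
# Hypothetical schema: property types, labels, and indexes declared up front.
schema = {
    "properties": {"handle": str, "text": str, "created_at": int},
    "vertex_labels": ["user", "tweet"],
    "edge_labels": ["posted"],
    "indexes": {"by_handle": ["handle"]},
}


def indexed_properties(schema):
    """Return the set of property names covered by at least one index."""
    return {name for keys in schema["indexes"].values() for name in keys}


def validate_vertex(schema, label, props):
    """Check a vertex against the schema before it is loaded."""
    if label not in schema["vertex_labels"]:
        raise ValueError(f"unknown vertex label: {label}")
    for name, value in props.items():
        expected = schema["properties"].get(name)
        if expected is None:
            raise ValueError(f"undeclared property: {name}")
        if not isinstance(value, expected):
            raise TypeError(f"{name} should be {expected.__name__}")


def can_start_query_on(schema, prop):
    """The first property a query filters on needs an index."""
    return prop in indexed_properties(schema)


validate_vertex(schema, "user", {"handle": "alice"})  # passes silently
print(can_start_query_on(schema, "handle"))
print(can_start_query_on(schema, "text"))
```

Here a query that starts by filtering on `handle` is acceptable because an index covers it, while one that starts on `text` is not until an index for that property is added to the schema.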
Loading data. Although a bulk upload endpoint is available, the Gremlin endpoint is the recommended method for uploading data to the service. This is because you can upload as much data as you want via the Gremlin endpoint.
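To give a feel for loading data through a Gremlin endpoint, the sketch below builds a request body that creates two vertices and one edge in a single script. It makes several assumptions: that the endpoint accepts a JSON body with a `gremlin` script and optional `bindings`, as Apache TinkerPop's Gremlin Server does over HTTP; that the script can use `graph.addVertex` and `addEdge` from the TinkerPop 3 structure API; and that the labels and property names are hypothetical. The endpoint URL and authentication are deliberately left out, and nothing is sent over the network.

```python
import json


def build_gremlin_request(vertices, edges):
    """Build one request body that adds vertices and edges in a single script.

    vertices: list of (variable_name, label, {property: value})
    edges:    list of (out_variable, edge_label, in_variable)
    Property values are passed as bindings rather than spliced into the script.
    """
    lines, bindings = [], {}
    for i, (key, label, props) in enumerate(vertices):
        args = [f"T.label, '{label}'"]
        for j, (name, value) in enumerate(props.items()):
            param = f"p{i}_{j}"  # binding name for this property value
            bindings[param] = value
            args.append(f"'{name}', {param}")
        lines.append(f"def {key} = graph.addVertex({', '.join(args)});")
    for out_key, label, in_key in edges:
        lines.append(f"{out_key}.addEdge('{label}', {in_key});")
    return {"gremlin": "\n".join(lines), "bindings": bindings}


# Hypothetical sample data
body = build_gremlin_request(
    vertices=[
        ("a", "user", {"handle": "alice"}),
        ("t", "tweet", {"text": "hello"}),
    ],
    edges=[("a", "posted", "t")],
)
print(json.dumps(body, indent=2))
```

Batching related vertices and edges into one script keeps the variables (`a`, `t`) in scope, so an edge can refer to vertices created a line earlier without a separate lookup.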