Today we’re going to talk about data science and graph recommendations.
I’ve been with Neo4j for two years now, but have been working with Neo4j and Cypher for three. I discovered this particular graph database when I was a grad student at the University of Texas at Austin, studying for a master’s in statistics with a focus on social networks.
Real-time recommendation engines are one of the most common use cases for Neo4j, and one of the things that makes it so powerful and easy to use. To explore this, I’ll explain how to incorporate statistical methods into these recommendations by using example datasets.
The first will be simple: entirely in Cypher, with a focus on social recommendations. Next we’ll look at a similarity recommendation, which relies on calculated similarity metrics, and finally a clustering recommendation.
The first dataset covers food and drink places in Dallas/Fort Worth International Airport, one of the major airport hubs in the United States:
We have place nodes in yellow and are modeling their location in terms of gate and terminal. We are also categorizing each place by major food and drink categories, such as Mexican food, sandwiches, bars and barbecue.
Let’s do a simple recommendation. We want to find a specific type of food in a certain location in the airport. The curly brackets represent user inputs that are entered into our hypothetical app:
This English sentence maps really well to a Cypher query:
This is going to pull all the places in the category, terminal, and gate the user has requested. Then we compute the absolute distance between each place’s gate and the gate where the user is, and return the results in ascending order of that distance. Again, this is a very simple Cypher recommendation based only on the user’s location in the airport.
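The recommendation just described could be sketched roughly as follows. This is an illustrative sketch, not the query from the talk: the labels (Place, Category, Gate, Terminal), the relationship types (IN_CATEGORY, AT_GATE, IN_TERMINAL), and the properties (name, number) are all assumed names, and the gate is assumed to be stored as an integer. The user inputs are written as $categories, $terminal and $gate; older Neo4j versions wrote parameters in curly brackets instead.

```cypher
// Assumed schema (hypothetical names):
//   (:Place)-[:IN_CATEGORY]->(:Category)
//   (:Place)-[:AT_GATE]->(:Gate)-[:IN_TERMINAL]->(:Terminal)
// Parameters: $categories (list of strings), $terminal (string), $gate (integer)
MATCH (p:Place)-[:IN_CATEGORY]->(c:Category),
      (p)-[:AT_GATE]->(g:Gate)-[:IN_TERMINAL]->(t:Terminal)
WHERE c.name IN $categories
  AND t.name = $terminal
RETURN p.name AS place,
       c.name AS category,
       g.number AS gate,
       abs(g.number - $gate) AS distance  // how far the place is from the user's gate
ORDER BY distance ASC;
```

Treating the gate number as a plain integer keeps the distance calculation to a single abs() call, which is only a reasonable proxy if gates are numbered consecutively along the terminal.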
Let’s look at a social recommendation. In our hypothetical app, we have users who can log in and “like” places in a way similar to Facebook, and can also check into places:
In the above app, we also have users who have “likes” relationships to place nodes and who are friends with other users. Consider this data model layered on top of the first model that we explored. Now let’s find food and drink places in the requested categories, closest to the user’s gate in the user’s terminal, that the user’s friends like:
The MATCH clause is very similar to the MATCH clause of our first Cypher query, except now we are matching on places:
The first three lines are the same, but for the user in question (the user that’s “logged in”) we want to find their friends through the friendship relationship, along with the places those friends liked. With just a few added lines of Cypher, we are now taking a social aspect into account for our recommendation engine.
Again, we’re only showing categories that the user explicitly asked for, in the same terminal the user is in. And, of course, we filter by the user who is logged in and making this request. The query returns the name of the place along with its location and category. We also account for how many friends have liked that place and the absolute value of the distance of the place from the gate, all returned in the RETURN clause.
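A social version of the query might look something like the sketch below. As before, every label, relationship type and property name is an assumption made for illustration (including FRIENDS_WITH and LIKES), and the sort order is a guess, since the description above only says that both the friend count and the distance are returned.

```cypher
// Assumed schema (hypothetical names), extending the location model with
//   (:User)-[:FRIENDS_WITH]-(:User) and (:User)-[:LIKES]->(:Place)
// Parameters: $user, $categories, $terminal, $gate
MATCH (p:Place)-[:IN_CATEGORY]->(c:Category),
      (p)-[:AT_GATE]->(g:Gate)-[:IN_TERMINAL]->(t:Terminal),
      (u:User {name: $user})-[:FRIENDS_WITH]-(f:User)-[:LIKES]->(p)
WHERE c.name IN $categories
  AND t.name = $terminal
RETURN p.name AS place,
       c.name AS category,
       t.name AS terminal,
       g.number AS gate,
       count(DISTINCT f) AS friendLikes,    // how many friends like this place
       abs(g.number - $gate) AS distance    // how far it is from the user's gate
ORDER BY friendLikes DESC, distance ASC;
```

Because count() is an aggregate, the other returned columns act as grouping keys, so each place appears once per matching category with its own friend count.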
Now let’s take a look at a similarity recommendation engine:
As in our earlier data model, we have users who can like places, but this time they can also rate places with an integer between 1 and 10. This is easily modeled in Neo4j by adding a property to either the node or the relationship.
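As a small sketch of the relationship-property option, a rating could be stored like this. The LIKES type, the weight property, and the place name are hypothetical placeholders, not data from the talk.

```cypher
// Hypothetical names: User/Place labels, LIKES type, weight property, 'Example Cafe'
MERGE (u:User {name: 'Greta'})
MERGE (p:Place {name: 'Example Cafe'})
MERGE (u)-[r:LIKES]->(p)
SET r.weight = 8;  // the rating, an integer between 1 and 10
```

Putting the rating on the relationship rather than on a node keeps it specific to one user and one place, which is what a per-user rating needs.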
This allows us to find other similar users, like in the example of Greta and Alice. We’ve queried the places they’ve mutually liked, and for each of those places, we can see the weights they have assigned.
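The lookup of mutually liked places could be sketched as below, assuming the rating lives in a weight property on a LIKES relationship. Both names are assumptions; the talk only says that a property is added to the node or the relationship.

```cypher
// Hypothetical names: User/Place labels, LIKES type, weight property
MATCH (a:User {name: 'Greta'})-[ra:LIKES]->(p:Place)<-[rb:LIKES]-(b:User {name: 'Alice'})
RETURN p.name AS place,
       ra.weight AS gretaWeight,  // Greta's rating of the place
       rb.weight AS aliceWeight   // Alice's rating of the same place
ORDER BY place;
```

Each returned row pairs the two users’ weights for one shared place, which is the raw input a similarity metric would then be computed from.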