Visualizing Graphs in Apache Zeppelin using Neo4j
- by 7wData
Apache Zeppelin is a useful tool for fast, interactive data analysis. Powered by native Apache Spark and with support for Spark SQL, it brings some features we love about Jupyter notebooks together with some very cool built-in visualisations. For example, you can query a Spark data frame using the Spark SQL interpreter, visualize the results as pie charts or line graphs, and group the results dynamically. Such abilities make Zeppelin a handy tool for quick demonstrations and storyboarding. We at The Data Team frequently use Apache Zeppelin for data analysis and even for use-case presentations to our clients.
Most data science use cases involve mapping the business or domain context onto the data that has been collected about the domain. This business context can be described as a collection of entities in the domain and the relationships between them, which makes a graph a natural way to represent it. Such graphs are essentially collections of objects and the relationships that exist between them. In this context, a graph database comes in handy for storing these entities and their relationships as nodes and edges, and even for rendering the graph in visual form.
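To make the idea of "objects and the relationships between them" concrete, here is a minimal sketch of a domain graph held in plain Python. The entity names (Customer, Order, Product, Account) and the class and method names are hypothetical, chosen only for illustration; they are not taken from the client work described in this post.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relationship:
    """A directed, typed edge between two named entities."""
    source: str
    kind: str
    target: str


@dataclass
class DomainGraph:
    """A tiny in-memory graph: a set of entities plus typed relationships."""
    entities: set = field(default_factory=set)
    relationships: list = field(default_factory=list)

    def relate(self, source, kind, target):
        # Adding a relationship implicitly registers both entities.
        self.entities.update((source, target))
        self.relationships.append(Relationship(source, kind, target))

    def related_to(self, entity):
        # Outgoing relationships of an entity, in insertion order.
        return [(r.kind, r.target) for r in self.relationships if r.source == entity]


# Hypothetical business context for illustration only.
graph = DomainGraph()
graph.relate("Customer", "PLACES", "Order")
graph.relate("Order", "CONTAINS", "Product")
graph.relate("Customer", "HAS", "Account")
```

Calling `graph.related_to("Customer")` lists what a customer is directly connected to. A graph database plays the same role, with persistence, a query language and visualisation on top.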
Recently, we had an opportunity to present a few use cases to one of our clients, in which we used such graph representations of knowledge about a domain. For our implementation, we stored the domain context in the form of a graph, connecting entities through their relationships. This graph database stores only the domain model, not massive amounts of data from social networks or other such systems, so the data store did not need to be a scalable graph database. This is very different from deriving relationships from large-scale data or doing graph analytics, for which frameworks like GraphX, Apache Giraph or Titan are a better fit. Since we didn't need the scale of such systems, we used Neo4j to store our graph of objects and relationships. Neo4j provides a suite of REST APIs through which you can query and store information as a graph.
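As a rough illustration of what a call to Neo4j's HTTP interface can look like, the sketch below builds (but does not send) a request for the transactional Cypher endpoint using only the Python standard library. Several things here are assumptions rather than details from our implementation: the host and port, the `/db/data/transaction/commit` path (used by Neo4j 2.x and 3.x; newer releases expose `/db/<database>/tx/commit` instead), the `$name` parameter syntax (older releases use `{name}`), and the `Customer`/`Order` labels. Authentication headers are omitted.

```python
import json
import urllib.request

# Assumed endpoint for a local Neo4j 2.x/3.x server; adjust for your version.
NEO4J_URL = "http://localhost:7474/db/data/transaction/commit"


def build_cypher_request(statement, parameters=None, url=NEO4J_URL):
    """Wrap one Cypher statement in the JSON body the endpoint expects."""
    payload = {
        "statements": [
            {"statement": statement, "parameters": parameters or {}}
        ]
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json; charset=UTF-8",
        },
        method="POST",
    )


# Hypothetical query: the labels and properties are illustrative only.
request = build_cypher_request(
    "MATCH (c:Customer)-[:PLACED]->(o:Order) WHERE c.name = $name RETURN o.id",
    {"name": "Acme"},
)

# Sending is left out so the sketch runs without a server:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Keeping the request construction separate from the network call makes the payload easy to inspect before pointing it at a real server.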
Neo4j's query language is the Cypher Query Language (CQL). Through this implementation, we provided our client with a single interface to view their business context, and then subsequently use the same information to query data sets using tools such as Spark and Hive.
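One hypothetical way the graph could drive queries against Spark or Hive is to treat each relationship as a description of how two tables join. The sketch below assumes a Cypher result row that carries two table names and a join key; that row shape, the field names and the `join_query` helper are all invented for illustration and do not describe how our implementation actually did it.

```python
def join_query(left_table, right_table, key):
    """Build a SQL join from one relationship between two tables."""
    return (
        f"SELECT * FROM {left_table} l "
        f"JOIN {right_table} r ON l.{key} = r.{key}"
    )


# Hypothetical row, as a Cypher query over the domain graph might return it.
row = {"source": "customers", "target": "orders", "join_key": "customer_id"}

# The resulting string could then be run through a Spark SQL or Hive interpreter.
sql = join_query(row["source"], row["target"], row["join_key"])
```

In practice the table and column names should be validated against a known list before being interpolated into SQL.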