The Graph Database and the RDF Database

In a twist that has “inevitable” written all over it, the database industry has at last begun to take heed of the power of consumerization. The once mighty RDBMS is now obliged to make room for an emerging and increasingly important partner in the data center: the graph database. Twitter’s doing it, Facebook’s doing it, even online dating sites are doing it; what they are doing is tracing relationship graphs. After all, social is social, and ultimately it’s all about relationships.

There are all kinds of graph databases and projects on the market right now, and most of them are purpose-built for a particular workload or platform. Some technologies are not even databases at all, but rather graph analytics engines that pull data from any convenient source, even Hadoop, and analyze it in the graph engine.

Twitter developed its graph database in house and released it to the open source community as FlockDB, designed specifically to store relationships and activity between users. Facebook rolled out Graph Search to allow its users to query its Social Graph to discover connections that extend beyond “so-and-so is a friend of so-and-so.” As for online dating, it’s primarily a matter of who dated whom and what worked out well, but it’s possible to create complex graphs of who liked or disliked which movies, bands, foods, sports teams or whatever, to recommend potential matches on that basis, and to analyze what worked and what didn’t.

For a refresher, in case you’ve forgotten, graph databases are based on the mathematics of graph theory. They store data in terms of nodes, edges and properties. Nodes are entities, which can have attributes just as entities do in an RDBMS. Edges are the relationships (i.e., the connections between nodes), and properties are the attributes of a relationship. A simple example: John likes to listen to The Beatles, and John watches videos of The Beatles. In both statements John is a person entity, The Beatles is a band entity, and the phrase in between is a property of the relationship. So in this simple example the nodes are the person and the band, the edge is the connection between them, and the properties of the edge linking John to The Beatles are “likes to listen to” and “watches videos of.”
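To make the vocabulary concrete, here is a minimal sketch of the John and The Beatles example in Python. The `Node` and `Edge` classes and the `describe` helper are names made up for illustration; they are not the API of any particular graph database.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """An entity in the graph (hypothetical class, for illustration only)."""
    kind: str  # e.g. "person" or "band"
    name: str


@dataclass
class Edge:
    """A connection between two nodes, carrying properties of the relationship."""
    source: Node
    target: Node
    properties: set = field(default_factory=set)


john = Node("person", "John")
beatles = Node("band", "The Beatles")

# One edge links John to The Beatles; its properties describe the relationship.
edge = Edge(john, beatles)
edge.properties.add("likes to listen to")
edge.properties.add("watches videos of")


def describe(e):
    """Render each property of an edge as a readable statement."""
    return sorted(f"{e.source.name} {p} {e.target.name}" for p in e.properties)


for statement in describe(edge):
    print(statement)
```

Real graph databases differ in how they attach labels and properties to edges, so treat this only as a picture of the nodes, edges and properties described above.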

The point is that graph databases gather information about relationships between entities, and standard relational databases are not built to do that. You can model your way around this problem by (in our example) defining a person-band table (i.e., an entity) and recording the properties in that table along with the key values of the person and band they relate to. Unfortunately, when you store data like that and ask a simple question such as “which bands do John and Eric like but Rebecca and Iris do not?” the RDBMS takes a long time to get you an answer. Graph databases and technologies are built to answer such queries and serve up the answers quickly.
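The shape of that question is easier to see in code. The sketch below assumes each person’s “likes” edges are held as a set of band names and treats “does not like” as the absence of such an edge. The data is invented for illustration (apart from John and The Beatles), and `liked_by_all_but_none` is a hypothetical helper, not a feature of any product.

```python
# Toy "likes" edges, keyed by person. All of this data is made up.
likes = {
    "John": {"The Beatles", "The Kinks", "The Who"},
    "Eric": {"The Beatles", "The Who", "Cream"},
    "Rebecca": {"The Who", "Blondie"},
    "Iris": {"Cream"},
}


def liked_by_all_but_none(likes, fans, non_fans):
    """Bands liked by every person in `fans` and by nobody in `non_fans`."""
    liked = set.intersection(*(likes[p] for p in fans))
    excluded = set().union(*(likes[p] for p in non_fans))
    return liked - excluded


print(liked_by_all_but_none(likes, ["John", "Eric"], ["Rebecca", "Iris"]))
```

With the edges already grouped by person, the answer falls out of a few set operations. The relational version has to assemble the same sets through joins on the person-band table first.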

The Graph Database and the RDF Database  

If, when you read the words “John likes to listen to The Beatles,” your mind said: “hey, that’s a subject-predicate-object data triple,” you’re right, and you’ve been spending far too much time in the land of RDF databases. The point is that RDF databases logically store data as triples, and hence, like graph databases, they are also very good at answering queries that require the database to navigate its way around a graph.
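As a rough picture of what “storing data as triples” means, the sketch below holds a few subject-predicate-object tuples in a Python set and matches them against a pattern with wildcards. The `match` function and the triple about Eric are assumptions made for illustration; a real RDF store identifies things with IRIs and is queried with SPARQL.

```python
# Each fact is a (subject, predicate, object) triple.
# The triple about Eric is invented for this example.
triples = {
    ("John", "likes to listen to", "The Beatles"),
    ("John", "watches videos of", "The Beatles"),
    ("Eric", "likes to listen to", "The Beatles"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Who likes to listen to The Beatles?
for subject, _, _ in match(triples, p="likes to listen to", o="The Beatles"):
    print(subject)
```

Navigating a graph then amounts to chaining such pattern matches, feeding the objects found in one step in as the subjects of the next.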

Both RDF databases and graph databases do such things exceptionally well. However, they are not exactly the same kind of engine, at either the logical level or the physical level. By definition, RDF databases standardize on the SPARQL query language (SPARQL is a recursive acronym for SPARQL Protocol and RDF Query Language). These triple store databases needed a query language that went much further than SQL so that the semantic querying of data would be possible, which in turn would bring the world closer to the much-heralded-but-yet-to-actually-arrive semantic web.

SPARQL is, of course, only just emerging from its infancy, but it is already powerful. It is not just capable of semantic queries; it is capable of inferencing (or reasoning) with the data, which is a first for a query language. In time this may be a killer capability. You won’t just retrieve information; you will also be able to use the database to deduce new information by examining facts (assertions) in the data.

To lean on a famous example, an RDF database might contain the datum “All men are mortal,” and also the datum “Socrates is a man.” By applying an inference rule to this data it would be able to derive a new datum: “Socrates is mortal.” Moving to another famous example: “Epimenides says that all Cretans are liars” and “Epimenides is a Cretan.” The database might (if programmed to avoid getting into endless loops) point out that there is a contradiction in the data. Graph databases cannot do such things. But right now that probably doesn’t matter much, because it is still an exploratory area of software.
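The Socrates deduction can be sketched as a tiny forward-chaining loop. This is an illustration under stated assumptions, not how any particular RDF engine works: “All men are mortal” is encoded as a `subclass_of` triple, “Socrates is a man” as an `is_a` triple, and a single hand-written rule is applied until nothing new appears. The predicate names and the `infer` function are made up for the example; the rule is similar in spirit to subclass entailment in RDFS.

```python
facts = {
    ("man", "subclass_of", "mortal"),  # "All men are mortal"
    ("Socrates", "is_a", "man"),       # "Socrates is a man"
}


def infer(facts):
    """Apply one rule until no new triples appear (forward chaining).

    Rule: if (x is_a C) and (C subclass_of D), then (x is_a D).
    """
    facts = set(facts)
    while True:
        new = {
            (x, "is_a", d)
            for (x, p1, c) in facts if p1 == "is_a"
            for (c2, p2, d) in facts if p2 == "subclass_of" and c2 == c
        } - facts
        if not new:
            return facts
        facts |= new


derived = infer(facts) - facts
print(derived)
```

The loop stops once a pass adds no new triples, which is the simplest way to avoid the endless loops mentioned above. Detecting a contradiction such as the Epimenides paradox would need richer rules than this sketch has.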

Where the RDF databases really score is when you want to do set processing (a la SQL) at the same time that you want to do graph processing. Consider a query such as “Who are the biggest influencers on Twitter over the past six months?”

Both RDF and graph databases would handle such a query and return the same results quickly. But if you ask the very different question, “Which influencers have had the same pattern of influence on Twitter over the last six months?” you are asking for both graph processing and set processing at the same time to get to the answer, and the RDF databases do both well. Not only that, but this is an area of analytics that was virtually untapped until recently, because there was no software that could easily do it.
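One way to picture the two kinds of processing in that second question is the sketch below. It assumes “influence” means retweets received per month and “same pattern” means identical monthly counts; both are simplifications chosen for illustration. The edge list is invented toy data covering two months rather than six, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Toy retweet edges: (retweeter, author, month). All of this data is made up.
retweets = [
    ("u1", "alice", 1), ("u2", "alice", 1), ("u3", "alice", 2),
    ("u1", "bob", 1), ("u4", "bob", 1), ("u2", "bob", 2),
    ("u1", "carol", 2),
]
MONTHS = (1, 2)


def influence_patterns(edges, months):
    """Graph step: count incoming retweet edges per author per month."""
    counts = Counter((author, month) for _, author, month in edges)
    authors = sorted({author for _, author, _ in edges})
    return {a: tuple(counts[(a, m)] for m in months) for a in authors}


def same_pattern_groups(patterns):
    """Set step: group authors whose monthly patterns are identical."""
    groups = defaultdict(list)
    for author, pattern in patterns.items():
        groups[pattern].append(author)
    return [g for g in groups.values() if len(g) > 1]


patterns = influence_patterns(retweets, MONTHS)
print(same_pattern_groups(patterns))
```

The first function walks the edges of the graph; the second is a group-by of the kind SQL users know well. The question needs both in one query.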

My personal belief is that this is what will separate the RDF databases from the graph databases in the end. Analytics. Graph analytics.

Robin Bloor

About Robin Bloor

Robin is co-founder and Chief Analyst of The Bloor Group. He has more than 30 years of experience in the world of data and information management. He is the creator of the Information-Oriented Architecture, which is to data what the SOA is to services. He is the author of several books, including The Electronic B@zaar: From the Silk Road to the eRoad, a book on e-commerce, and three IT books in the Dummies series on SOA, Service Management and The Cloud. He is an international speaker on information management topics. As an analyst for Bloor Research and The Bloor Group, Robin has written scores of white papers, research reports and columns on a wide range of topics from database evaluation to networking options and comparisons to the enterprise in transition.

4 Responses to "The Graph Database and the RDF Database"

  • Ian Mercer
    January 25, 2015 - 6:44 pm

    It might also be worth calling out another key difference between graph databases and triple stores: namely that the edge in a graph database is typically not itself a node and therefore cannot be used in a subject or object position in other statements. So in RDF it is possible to create metadata about data and metadata about the metadata and so on up, but to do the same in a graph database requires the introduction of additional nodes to represent what were edges and that quickly becomes vastly more complex.

    For example, a label ‘father’ on the edge of a graph isn’t related to the label ‘son’; they are just labels. In RDF the two are related, and that allows reasoning. This is what you discussed in terms of reasoning capabilities, but the underlying storage differences are what prevent graph databases from being able to add such capabilities.

    Maybe also worth noting that all RDF databases have well-defined, universal interchange formats but that graph databases don’t.

    • Robin Bloor
      January 26, 2015 - 7:56 am

      Thanks for this contribution. It is much appreciated.

  • J.Barrasa
    March 4, 2015 - 8:35 pm

    I think what Ian is trying to explain is that RDF can be used to describe both data and metadata. This means that an ontology describing a conceptual model (metadata) can also be expressed in RDF and live in the same triple store as the individuals (data) described in terms of this ontology.
    This enables pretty powerful querying involving both data and metadata.

    Inference (reasoning) in RDF stores is carried out by rule engines that can derive new triples by applying forward, backward or hybrid chaining based on the well-defined semantics of the ontologies/metadata.

  • Bryan Thompson
    April 7, 2015 - 8:26 am

    I’d like to recommend two papers that deal with RDF reification. The first introduces RDF* and SPARQL* as syntactic extensions of RDF and SPARQL that make it easier to interchange and query such data, along with the transform rules required to turn them into RDF and SPARQL. This paper also shows that these capabilities are within the existing semantics of RDF and SPARQL. For me, this should be a wake-up call for vendors to deliver high performance for link attributes and (more generally) statements about statements in RDF database platforms.

    The second paper outlines a reconciliation of RDF* (and hence RDF) with property graphs.

    1. RDF* (Foundations of an Alternative Approach to Reification in RDF) by Olaf Hartig and Bryan Thompson – http://arxiv.org/pdf/1406.3399.pdf

    2. Reconciliation of RDF* and Property Graphs by Olaf Hartig – http://arxiv.org/abs/1409.3288
