Inside Analysis

The Sparkle of SPARQL

In our blog post entitled Is Another Database Race About to Start? we presented an overview of Resource Description Framework (RDF) technology and why it is significant. So now let’s delve deeper into the magic behind using a semantic database: SPARQL.

We can start by bowing down to SQL as the default query language for analytics. Despite its limitations and shortcomings, it will not be dethroned in the foreseeable future, if ever. The relational databases that occupy the bulk of the world’s data centers are built for SQL queries, and nowadays even many NoSQL databases offer SQL-like query layers.

SQL is great for retrieving data that is stored in regular sets (i.e., relational tables), but it has no ability to derive meaning from data, and it is weak, often hopeless, at navigating networks of data via the relationships between entities (such as who knows whom who knows whom). Multi-hop questions of that kind force SQL into chains of self-joins that quickly become unwieldy.

SPARQL (from SPARQL Protocol and RDF Query Language) is the query language for data stored in an RDF format, which is a collection of subject-predicate-object triples. We will address this in depth in another article, but at a high level it is enough to note that when data is stored in an RDF format, all the relationships inherent in the data are preserved, including its meaning. In general, SPARQL is to the semantic web and graph analytics (more on this later, too) what SQL is to the relational database.
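To make the triple model concrete, here is a minimal sketch. The :Alice, :knows and :worksFor names are hypothetical illustrations, not terms from any real vocabulary:

```
# Data as subject-predicate-object triples (Turtle syntax):
#   :Alice :knows    :Bob .
#   :Bob   :worksFor :Acme .

# Query: which companies do Alice's acquaintances work for?
PREFIX : <http://example.org/>
SELECT ?company
WHERE {
  :Alice  :knows    ?person .
  ?person :worksFor ?company .
}
```

Notice that the query simply mirrors the shape of the data: patterns with variables in place of the values to be found.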

Anyone who knows SQL can easily learn SPARQL, as the two languages are similar. Both use keywords such as SELECT, WHERE and GROUP BY, and both offer similarly named aggregate and string functions. But the similarities end when it comes to how the languages work.
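For instance, a familiar SQL aggregation has a near-direct SPARQL counterpart. The table, column and property names below are hypothetical:

```
# SQL:  SELECT dept, COUNT(*) AS headcount
#       FROM employees GROUP BY dept;

# SPARQL equivalent over hypothetical :worksIn triples:
PREFIX : <http://example.org/>
SELECT ?dept (COUNT(?person) AS ?headcount)
WHERE { ?person :worksIn ?dept . }
GROUP BY ?dept
```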

Even with a complex data warehouse in place, performing a SQL query against multiple data sources is difficult. Data sources must be mapped, and schemas must be written. Metadata needs to be managed, and when the data itself changes or when new data sources are added, the whole model needs to be updated.

SPARQL, on the other hand, is inherently schema-free and allows for federation. Users can immediately start working with new data sources without having to go through IT, and with a single query, request data from multiple locations at once. Such locations can include Web data, internal enterprise databases or external sources such as partner data, public data sets or social media streams. This federated approach allows users to query any collection of databases as if it were one, and because it requires no fixed schema, when data or data sources change, little or nothing needs to be redone.

This is achieved by the use of SPARQL endpoints, which are basically URLs that accept a SPARQL query and return the results. Most large public data sets provide APIs to the endpoints, and there are several publicly available resources for finding them. Internal data can also be converted to RDF format with relative ease, making enterprise assets SPARQL-ready. This makes sharing data sets extremely flexible and accessible. By allowing access to new data sources on the fly and enabling queries to federate across multiple data sources, SPARQL achieves an enormous advantage over SQL, both in respect to time to insight and ease of use.
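As a sketch of what querying an endpoint looks like, the following query could be submitted to DBpedia’s public endpoint (https://dbpedia.org/sparql); dbo:City and dbo:populationTotal are real DBpedia ontology terms:

```
# List a few cities and their populations from DBpedia.
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?city ?population
WHERE {
  ?city a dbo:City ;
        dbo:populationTotal ?population .
}
LIMIT 10
```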

For example, imagine that you want to find the population, area and median income of all U.S. cities to determine if there is a relationship between population density and income. Using SQL, you would have to query each city database separately, ensure that you have the right data set, then perform complex and time-consuming joins on the data. With SPARQL, one query can target all the data sources and pull the results into a single set. No joins, no coding.
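A hedged sketch of such a query, using the SPARQL 1.1 SERVICE keyword for federation. The DBpedia endpoint and its terms are real; the census endpoint and its :medianIncome property are hypothetical stand-ins for a second data source:

```
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX : <http://example.org/census#>
SELECT ?city ?population ?area ?income
WHERE {
  # Population and area from DBpedia's public endpoint:
  SERVICE <https://dbpedia.org/sparql> {
    ?city a dbo:City ;
          dbo:country <http://dbpedia.org/resource/United_States> ;
          dbo:populationTotal ?population ;
          dbo:areaTotal ?area .
  }
  # Median income from a second (hypothetical) endpoint:
  SERVICE <http://example.org/sparql> {
    ?city :medianIncome ?income .
  }
}
```

The query engine handles the distribution and merging of results; to the user it reads as one query over one logical data set.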

Let’s consider a more business-critical example. In the financial services sector, regulations regarding risk and compliance present an enormous challenge to data management teams. Fraud detection and fraud prevention, in particular, are rendered nearly impossible when data is dispersed across departments and external sources. This is where SPARQL excels. By providing the ability to combine trading data with other data, such as social network feeds and geospatial data, the SPARQL user can quickly find oft-hidden patterns or connections between people and/or entities, thus exposing threats like fraud much sooner.
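This kind of connection-finding is a natural fit for SPARQL 1.1 property paths, which can follow a relationship across any number of hops. The schema names here are hypothetical:

```
# Is TraderA connected, directly or through intermediaries,
# to a flagged entity?
PREFIX : <http://example.org/risk#>
ASK {
  :TraderA (:knows|:transactedWith)+ :FlaggedEntity .
}
```

The equivalent in SQL would require recursive joins of unknown depth; here the `+` operator expresses "one or more hops" directly.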

RDF and SPARQL work hand in glove to deliver a versatile analytics environment. There is no need to consider schema, since RDF data is built on data structures that are flexible by design. SPARQL can work alongside existing database investments, and it provides far more intelligence over the data it can query. This type of technology helps organizations turn big data challenges into big data opportunities.

Robin Bloor

About Robin Bloor

Robin is co-founder and Chief Analyst of The Bloor Group. He has more than 30 years of experience in the world of data and information management. He is the creator of the Information-Oriented Architecture, which is to data what SOA is to services. He is the author of several books, including The Electronic B@zaar: From the Silk Road to the eRoad, a book on e-commerce, and three IT books in the Dummies series on SOA, Service Management and The Cloud. He is an international speaker on information management topics. As an analyst for Bloor Research and The Bloor Group, Robin has written scores of white papers, research reports and columns on a wide range of topics, from database evaluation to networking options and comparisons to the enterprise in transition.


3 Responses to "The Sparkle of SPARQL"

  • Richard
    November 3, 2014 - 1:02 pm

    Robin- You have a good sense of a critical technology that must mature. The more I study enterprise analytics at scale, the more I appreciate the need for deep semantics of that ecosystem. SQL may be the king-of-the-mountain for data warehousing, but it is a blind lion for data analytics – lots of roar but clueless. We need to both model and manage complex workflows where data and process elements are interwoven. Also… wonder what is the link to IBM Watson for semantic analytics? Lots to ponder in this blog! -Richard

  • Irene
    November 16, 2014 - 5:46 pm

    SPARQL is a very powerful tool, but it would be a stretch to say that it can integrate diverse data sources and accommodate changes to the data sources without using any mappings or requiring query changes. Creating mappings, however, is much simpler than with other technologies.

    For more detailed information with examples on how this works in SPARQL, see http://www.topquadrant.com/2014/05/05/comparing-sparql-with-sql/.
