May 8, 2024

Introduction to SPARQL

SPARQL (SPARQL Protocol and RDF Query Language) is a W3C standard for querying RDF data (knowledge graphs). SPARQL is to RDF what SQL is to relational databases. In its simplest form, SPARQL can be viewed as a pattern-matching query language. RDF data is made up of triples, each consisting of a subject, a predicate and an object. The main body of a SPARQL query consists of a set of triple patterns, where each part of a triple can be either a value or a variable. The idea is to match these patterns against the triples in the data and find solutions for the variables. The anatomy of a SPARQL query is as follows:

PREFIX foo: <>
PREFIX bar: <>

SELECT ?v1 ?v2 WHERE {
    ?v1 foo:property ?x .
    ?x bar:name ?v2
}
ORDER BY ASC(?v1)
LIMIT 10

The first two lines declare the prefixes used in the WHERE clause. The WHERE clause holds the triple patterns of the query. The SELECT clause is the query result clause; in this case the result set will contain results for the variables ?v1 and ?v2. Each variable starts with a question mark (?), although a dollar sign ($) is sometimes used instead. Query modifiers come after the WHERE clause, for example to limit the results to the first 10 or to sort them in ascending order on ?v1.
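As a rough illustration of what a prefix declaration does, the sketch below expands prefixed names into full IRIs in plain Python. The two namespace IRIs are placeholders assumed for this example (the real ones are not given above), and expand is a hypothetical helper, not part of any SPARQL library.

```python
# Hypothetical namespace IRIs, assumed only for illustration.
PREFIXES = {
    "foo": "http://example.org/foo#",
    "bar": "http://example.org/bar#",
}


def expand(term: str) -> str:
    """Expand a prefixed name such as 'foo:property' into a full IRI."""
    if term.startswith(("?", "$")):
        return term  # variables are left untouched
    prefix, sep, local = term.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return term  # unknown prefix or plain value: return unchanged


print(expand("foo:property"))
print(expand("?v1"))
```

A real engine applies the same kind of substitution to every prefixed name in the query before matching.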

Consider the following data:

ex:person1 foaf:name "Alice" .
ex:person1 foaf:knows ex:person2 .
ex:person1 foaf:knows ex:person3 .

ex:person2 foaf:name "Bob" .

ex:person3 foaf:name "Mary" .


Now we need to find the names of all people:

PREFIX foaf: <>
SELECT ?name WHERE {
	?x foaf:name ?name .
}

The query starts with the prefix declaration, which instructs the SPARQL engine to expand any name starting with foaf: using the namespace IRI declared for that prefix; foaf:name is therefore expanded to the full IRI of the FOAF name property. FOAF is a schema that describes persons, their activities and their relations to other people and objects. The query then instructs the engine to return only results for the variable ?name. The WHERE clause holds the instructions that tell the query engine which results to return. In this case, the query engine finds all triples that have the predicate foaf:name and, since there are no result modifiers, returns all of them. The solutions to this query are (in no particular order):

  • ?x = ex:person1, ?name = Alice
  • ?x = ex:person2, ?name = Bob
  • ?x = ex:person3, ?name = Mary

However, the query engine will only return the values bound to the variable ?name to the user. The more variables there are in a query, the larger the solution search space can be. Furthermore, with more triple patterns, the solutions of the individual patterns are joined on their shared variables; patterns that share no variables combine as a Cartesian product.
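The matching of a single triple pattern can be sketched in plain Python. This is a toy model under simplifying assumptions, not how a real SPARQL engine is implemented: triples are tuples of strings, prefixed names are compared as-is, and match is a hypothetical helper written for this example.

```python
# The example data as (subject, predicate, object) tuples.
TRIPLES = [
    ("ex:person1", "foaf:name", "Alice"),
    ("ex:person1", "foaf:knows", "ex:person2"),
    ("ex:person1", "foaf:knows", "ex:person3"),
    ("ex:person2", "foaf:name", "Bob"),
    ("ex:person3", "foaf:name", "Mary"),
]


def match(pattern, triple, binding=None):
    """Return the extended variable bindings if the triple fits the pattern, else None."""
    result = dict(binding or {})
    for part, value in zip(pattern, triple):
        if part.startswith("?"):
            # Bind the variable, or check it against an earlier binding.
            if result.setdefault(part, value) != value:
                return None
        elif part != value:
            return None  # a fixed value in the pattern does not match
    return result


pattern = ("?x", "foaf:name", "?name")
solutions = [b for t in TRIPLES if (b := match(pattern, t)) is not None]

# Projection: only ?name is handed back to the user.
names = [s["?name"] for s in solutions]
print(names)
```

The solutions list holds bindings for both ?x and ?name, while names keeps only the projected variable, mirroring the difference between the solutions found and the results returned.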

Consider a query where we need to find the names of Alice’s friends:

PREFIX foaf: <>
PREFIX ex: <> 

SELECT ?name WHERE {
	ex:person1 foaf:knows ?friend .
	?friend foaf:name ?name .
}

In this case we are instructing the query engine to get all friends for ex:person1 and then get the names for these friends. The solutions are:

  • ?friend = ex:person2, ?name = Bob
  • ?friend = ex:person3, ?name = Mary
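How two triple patterns combine through a shared variable can be sketched in plain Python as well. Again this is a toy model for illustration only: match and solve are hypothetical helpers, and a real engine would plan and optimise the join rather than loop over every triple.

```python
# The example data as (subject, predicate, object) tuples.
TRIPLES = [
    ("ex:person1", "foaf:name", "Alice"),
    ("ex:person1", "foaf:knows", "ex:person2"),
    ("ex:person1", "foaf:knows", "ex:person3"),
    ("ex:person2", "foaf:name", "Bob"),
    ("ex:person3", "foaf:name", "Mary"),
]


def match(pattern, triple, binding):
    """Extend the binding if the triple fits the pattern, else return None."""
    result = dict(binding)
    for part, value in zip(pattern, triple):
        if part.startswith("?"):
            if result.setdefault(part, value) != value:
                return None  # conflicts with a value bound by an earlier pattern
        elif part != value:
            return None
    return result


def solve(patterns, triples):
    """Evaluate the patterns one by one, carrying bindings forward (a nested-loop join)."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in triples
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return solutions


query = [
    ("ex:person1", "foaf:knows", "?friend"),
    ("?friend", "foaf:name", "?name"),
]
results = solve(query, TRIPLES)
for row in results:
    print(row)
```

Because ?friend appears in both patterns, only names belonging to a friend of ex:person1 survive the second step. If the two patterns shared no variable, every solution of the first would pair with every solution of the second.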

Uses of SPARQL

SPARQL can be used for:

  • Exploring data by querying unknown relationships.
  • Performing complex joins.
  • Transforming data from one vocabulary to another.
  • Federated querying: querying different KGs on different endpoints to get more results.


SQL is suited to querying relational databases, whilst SPARQL is used to query RDF-based data. SQL is constrained to work within one database, and different databases might have different SQL flavours. SPARQL, on the other hand, can be used to query multiple data stores (RDF graphs) through federation. Furthermore, no explicit JOINs are needed in SPARQL, because triple patterns are joined implicitly through their shared variables, which makes it scale better than SQL to more complex models. Some constructs are the same in both languages, such as SELECT, WHERE, FILTER and UNION. SQL is based on a tabular format, so its operations mostly involve filtering, aggregating and joining. SPARQL's operations are more graph-based, such as pattern matching and property path expressions, which allow for navigating through complex relationships.
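To give a feel for what a property path does, the sketch below follows foaf:knows links over one or more hops, roughly what the path expression foaf:knows+ asks for. It is a plain-Python approximation, not an engine implementation; reachable is a hypothetical helper, and the ex:person2 to ex:person4 link is an extra edge assumed for this example so that there is a second hop to follow.

```python
from collections import deque

# foaf:knows edges as an adjacency map.
KNOWS = {
    "ex:person1": ["ex:person2", "ex:person3"],
    "ex:person2": ["ex:person4"],  # assumed extra edge, not in the example data
}


def reachable(start, edges):
    """Return the nodes reachable from start in one or more hops, in breadth-first order."""
    seen = []
    queue = deque(edges.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited; also guards against cycles
        seen.append(node)
        queue.extend(edges.get(node, []))
    return seen


print(reachable("ex:person1", KNOWS))
```

Expressing the same traversal in SQL typically needs a recursive query, whereas in SPARQL it is a single path expression inside a triple pattern.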
