Ordered data in RDF: About Arrays, Lists, Collections, Sequences and Pagination

2/7/2020 - Joep Meindertma

Sooner or later when working with RDF, you'll need to work with ordered data / n-ary relations. The subject predicate object model does not support Arrays, and no - you can't use the order in which triples appear.

However, RDF does support Collections (rdf:List) and Containers (rdf:Bag, rdf:Seq, rdf:Alt). And the open nature of RDF allows for even more alternatives, such as the Ordered List Ontology. All these concepts are (subtly) different, and can be quite confusing. In this article, I'll explain the models behind these concepts, show how they are serialized, and give some insights into when you should use which.

Skip to the bottom for a TL;DR!

RDF Collections

Maybe you've seen something like this in JSON-LD:

{
  "@list": [ "Arnold", "Bob", "Catherine" ]
}

Or something like this in a Turtle document:

someList :b ( "Arnold" "Bob" "Catherine")

These are rdf:Collections. These serialization formats (turtle and JSON-LD) have syntactic sugar for Collections, so they appear as regular arrays. Under the hood, however, they have a very different data model. Collections are linked lists and its chains consist of rdf:List nodes, connected by rdf:rest relations:

RDF:List

Every rdf:List has a rdf:first object, which refers to the actual content of the list item. Intuitively, it might be a bit weird to refer to Bob as a first in a List. However, since Lists are recursive and often contain other Lists, it actually makes sense to refer to the content as first. Collections always end with an rdf:nil namednode, which means they have a formally known ending. If we'd express the same information in N-Triples, we'd get something like this:

_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> "Arnold" .
_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> _:b1 .
_:b1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> "Bob" .
_:b1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> _:b2 .
_:b2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> "Catherine" .
_:b2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .

This explicitly linked nature means that it's possible to start a list in location A, and have it continue in a completely different domain:

RDF:List

It also means that inserting items is quite easy, as only the node before it has to be adjusted:

RDF:List

However... This structure can be kind of hard to deal with. Parsing is non-trivial, and might require a lot of lookups, which is very costly. Also, appending a single node to a List is hard, because it requires that the link of the item before it is adjusted.

RDF Containers

There are three types of RDF Containers rdf:Seq, rdf:Bag, rdf:Alt

  • An rdf:Seq is an ordered container
  • An rdf:Bag is an unordered container
  • An rdf:Alt is an unordered set of alternatives, in which the first one is the default option

RDF Containers use a numbered predicate (e.g. rdf:_1) to indicate that something is a child / member of the Container:

RDF:Seq

This model is often easier to parse and serialize than an rdf:List.

_:someSeq <http://www.w3.org/1999/02/22-rdf-syntax-ns#_1> "Arnold" .
_:someSeq <http://www.w3.org/1999/02/22-rdf-syntax-ns#_2> "Bob" .
_:someSeq <http://www.w3.org/1999/02/22-rdf-syntax-ns#_3> "Catherine" .

Appending items is easy as well, simply add one new statement and increment the predicate. However, inserting items requires you to rewrite many triples. If you use a reasoner that has implemented the basic RDFS spec, you can use rdfs:member (the superclass of every rdf:_n property) to get all members of an RDF Container.

Pagination with ActivityStreams and Hydra Collections

When your arrays will be either too long to serialize, or too computationally heavy to generate at run-time, you'll need some form of pagination. It's recommended to use an existing ontology for this. The W3C Activity Streams 2.0 ontology defines the as:Collection. It provides keys for things like next, totalItems and orderedItems. This spec seems to be designed with JSON-LD serialization in mind, so it relies on JSON arrays, which represent rdf:List Collections.

Similar to this is the hydra:Collection. The Hyrda spec still is in draft status, but it has been updated in october last year.

Converting to Arrays

If you're building an RDF application, and you want to access your RDF Lists / Collections / Sequences as an array, make sure to find an RDF library that has support for ordered data. We've open-sourced a JS library that might be useful for this: @rdfdev/collections. Get in touch if we can help!

TL;DR

There are no arrays in RDF, and don't use the order in which serialized triples appear.

RDF Containers:

  • Come in three forms: rdf:Seq (ordered), rdf:Bag (unordered), rdf:Alt (alternatives with default)
  • You can add new items by simply adding RDF triples
  • Inserting items is hard: requires rewriting many statements
  • Must be stored in a single graph / machine / server (centralized)
  • Have a formally unknown ending (open world assumption)

RDF Collections:

  • An ordered chain of rdf:List resources
  • You have to edit / remove statements before you can add new items
  • Inserting items is easy: requires changing just a few statements
  • Can span many graphs / machines / servers (decentralized)
  • Have a known ending (the rdf:nil)

Pagination:

Converting to arrays: