Full-stack linked data: lessons from building an RDF web app

Joep Meindertsma, 11 Sep 2020

How might a web application work if it exclusively uses Linked Data (RDF) to communicate between server and client? In this article, I’ll tell you about our API journey, why we chose linked data (RDF), the challenges that we faced, and some of the solutions that we came up with.

About the project

In 2016, we started Argu.co: an e-democracy platform. We wanted more people to become politically engaged, and make sure that their opinions and ideas were heard by decision makers. We needed a web app where people would discuss issues, vote, share ideas and make decisions together. We started with Ruby on Rails, and built a traditional monolithic app that rendered HTML on the server. This proved to be a quick and effective way to start, but as time passed, we noticed more and more limitations of this approach.

It was clear that we needed a dedicated client-side single page application and an API.

Designing a client-server contract (API)

Re-inventing the wheel seems like a bad idea, so we searched for best practices and existing standards in API design. We initially went with JSON-API (not just ‘a’ JSON API, but a formal specification that describes things like pagination). Since we were already using React, and saw its popularity climbing, we went with that for the front-end. Next, we had to decide what the internal state of the client should look like. We picked Redux, which uses a single (immutable) JS object that contains all application state (including data).
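
For illustration, here's a minimal sketch of that shape - the resource types and action names are invented, not our actual store:

```typescript
// Illustrative Redux store: one immutable state object holding all
// client-side data. The Motion resource below is a made-up example.
import { createStore } from "redux";

interface Motion {
  id: string;
  title: string;
  voteCount: number;
}

interface AppState {
  motions: Record<string, Motion>;
}

const initialState: AppState = { motions: {} };

type Action = { type: "ADD_MOTION"; payload: Motion };

function reducer(state: AppState = initialState, action: Action): AppState {
  switch (action.type) {
    case "ADD_MOTION":
      // Return a new object instead of mutating: Redux state is immutable.
      return {
        ...state,
        motions: { ...state.motions, [action.payload.id]: action.payload },
      };
    default:
      return state;
  }
}

const store = createStore(reducer);
store.dispatch({
  type: "ADD_MOTION",
  payload: { id: "1", title: "Lower the voting age", voteCount: 42 },
});
```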

As we worked on this, our front-end started looking more and more like our back-end. It now had explicit knowledge of:

This seemed wrong. We were taught to write DRY code (don't repeat yourself), yet we were repeating ourselves all the time: whenever we wanted to change a feature, we had to adjust both the front-end and the back-end. Something was off.

Why we opted for Linked Data

Our CTO Thom van Kalkeren thoroughly studied the principles behind REST, HATEOAS and Hypermedia, and concluded that we needed to give everything a URL. Not just the pages that we present to our users, but every single thing that a client can use - including menu items, buttons, actions and form fields. Luckily for us, the good folks at W3C had written quite a few specs about how data should look when you use URLs: the RDF specification (and friends).
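
To make that concrete, here's a hypothetical sketch of what it means for a menu item to be a first-class resource with its own URL, described by a handful of triples (the URLs and vocabulary are invented for illustration):

```typescript
// Hypothetical triples describing a menu item as a resource in its own
// right. Every URL and property here is illustrative, not Argu's vocabulary.
type Triple = { subject: string; predicate: string; object: string };

const menuItem: Triple[] = [
  {
    subject: "https://example.com/menus/main/items/1",
    predicate: "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    object: "https://example.com/ns#MenuItem",
  },
  {
    subject: "https://example.com/menus/main/items/1",
    predicate: "https://example.com/ns#label",
    object: "Ideas",
  },
  {
    subject: "https://example.com/menus/main/items/1",
    predicate: "https://example.com/ns#linksTo",
    object: "https://example.com/ideas",
  },
];
```

Because the menu item is addressable, a client can fetch it, cache it, and render it with a generic view - no hard-coded menu logic required.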

Linked Data (or RDF) has some unique qualities:

So, we went with Linked Data.

The cost of being an early adopter

Although the first RDF spec was almost two decades old, there were not many existing libraries (let alone tutorials) to get us started. Practically all Linked Data projects functioned as viewers that directly show individual RDF statements in a table. We wanted something else: a user-friendly web app that resembled social media platforms. We needed interactivity, notifications, forms… And doing things like that with Linked Data turned out to be pretty hard. Luckily for you, we’ve shared quite a few tools, libraries and ideas that could make your life a bit easier if you choose to go the Linked Data route.

Getting used to RDF

RDF is a weird data model. It was primarily designed as a language for the semantic web, which aims to (formally, logically) describe our world using semantic triples. The design decisions made to achieve that, however, make things harder for you as a developer:

Making Ruby on Rails work with RDF

Ruby on Rails is an opinionated MVC framework. By default, it works great for generating HTML pages (or JSON objects, using Rails-API), but RDF is a whole different beast. We created our own serializers (rdf-serializers), but more importantly, Thom and Arthur Dingemans created linked-rails. This gem provides some abstractions for working with RDF, including:

Check out the Wiki if you want to learn more!

Forms and form validation

Argu has always been about gathering opinions from people, so it needs a lot of forms. And these can differ for many reasons:

So, how should the server communicate these forms to the client?

We stumbled upon SHACL, a schema language for RDF (written in RDF) that provides a way to validate RDF graphs. This seemed like the way to go! However, it became obvious that forms cannot entirely depend on shapes. Should you render a radio select, or a dropdown? How do you describe a multi-page form? How do you describe interrelated form fields (if x is a, show form option y)? And to make things even more challenging: how do you make these forms cacheable, given that some fields might render differently for different users?

Again, we had to create our own form abstraction. We used the SHACL spec where possible, but had to create some new Classes and Properties to solve the challenges that we faced. The Forms DSL in linked-rails provides a simple interface to create these forms.
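
The DSL itself is Ruby and lives in linked-rails; to convey the kind of information such a form description has to carry, here's a hypothetical sketch in TypeScript (all field names, properties and widgets are invented):

```typescript
// Invented types sketching what a server-described form could carry:
// a SHACL-style constraint plus the UI hints the spec itself doesn't cover.
interface FieldShape {
  path: string;            // the RDF predicate this field writes to
  datatype: string;        // e.g. xsd:string
  minCount?: number;       // SHACL-style cardinality constraints
  maxCount?: number;
  // Beyond SHACL: how to render, and when to show the field at all.
  widget: "text" | "radio" | "dropdown" | "checkbox";
  visibleIf?: { path: string; equals: string };
  page?: number;           // multi-page forms
}

const voteForm: FieldShape[] = [
  {
    path: "https://example.com/ns#side",
    datatype: "http://www.w3.org/2001/XMLSchema#string",
    minCount: 1,
    maxCount: 1,
    widget: "radio",
    page: 1,
  },
  {
    path: "https://example.com/ns#explanation",
    datatype: "http://www.w3.org/2001/XMLSchema#string",
    widget: "text",
    // Interrelated fields: only ask for an explanation when voting 'no'.
    visibleIf: { path: "https://example.com/ns#side", equals: "no" },
    page: 2,
  },
];
```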

Turning RDF into HTML

Ultimately, the user needs to see a pretty web app, which means the RDF from the server has to be transformed into HTML. Step one is fetching the data from the server and storing it in the client.

We started using RDFlib.js, written partially by the inventor of Linked Data (and HTTP, HTML, the WWW…), Tim Berners-Lee himself. It provided a lot of useful features for dealing with RDF: parse, fetch, serialize, mutate, search… However, our app had a somewhat unique set of requirements (mostly related to performance when dealing with many dynamic triples), so we started working on a different RDF store.
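
For reference, loading a resource into an RDFlib.js store and reading a triple from it looks roughly like this (the resource URL and predicate are placeholders):

```typescript
// Rough sketch: fetch a resource into an rdflib.js graph and read one value.
import * as $rdf from "rdflib";

const store = $rdf.graph();
const fetcher = new $rdf.Fetcher(store);

async function showTitle(resourceUrl: string): Promise<void> {
  // Dereferences the URL, parses the response, and adds it to the store.
  await fetcher.load(resourceUrl);
  const subject = $rdf.sym(resourceUrl);
  const title = store.any(
    subject,
    $rdf.sym("http://purl.org/dc/terms/title"),
    null,
  );
  console.log(title ? title.value : "no title found");
}

showTitle("https://example.com/motions/1");
```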

My colleague Thom wrote two libraries: link-lib and link-redux. Link-lib does most of the heavy lifting: it provides a store, and deals with parsing, serializing, fetching and sending RDF data, handling actions, and registering views (more on that in a minute). Link-redux (it should be called link-react, but that name was taken) adds some useful functions for using it in React.

In practice, it works like this:
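
The gist of the pattern is that views are registered per RDF class, and the library resolves which view to render for whatever resource shows up. A stripped-down, hypothetical sketch of that idea (not link-redux's actual API):

```typescript
// Hypothetical class-based view registration: components are looked up by
// the rdf:type of a resource, rather than hard-coded per route.
import React from "react";

type View = (props: { subject: string }) => React.ReactElement;

const views = new Map<string, View>();

function register(rdfClass: string, view: View): void {
  views.set(rdfClass, view);
}

// Renders any resource by dispatching on its type.
function renderResource(subject: string, rdfClass: string): React.ReactElement {
  const View = views.get(rdfClass);
  if (!View) return <p>No view registered for {rdfClass}</p>;
  return <View subject={subject} />;
}

register("https://example.com/ns#Motion", ({ subject }) => (
  <article>Motion at {subject}</article>
));
```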

Serialization & parsing performance

RDF can be serialized in many ways (at least 10). Some formats look like the ones you know (JSON-LD, RDF/XML), some reflect the internal model really well (N-Triples), and some have interesting and complex semantic features (Notation3). We didn’t really care about any of that - we just wanted to make things fast. Parsing JSON-LD was extremely hard and slow; parsing Turtle was better (but still slow), so for a while we opted for N-Triples. But ultimately, Thom came up with a new format: HexTuples-NDJSON, which was very simple to implement in both the Ruby back-end and the JS front-end. It made our entire server app about 2x faster (it was spending a lot of its time on serialization), and in the front-end we noticed a similar speed increase.
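
In HexTuples, every statement is one line of NDJSON holding a JSON array of six strings: subject, predicate, value, datatype, language and graph - so both sides can lean on their platform's native JSON parser. A simplified sketch of what serializing to it might look like (blank nodes and named graphs are glossed over, and the helper is ours, not from a library):

```typescript
// Simplified HexTuples-NDJSON serializer: one six-string JSON array per
// statement, one statement per line.
interface Statement {
  subject: string;
  predicate: string;
  object: string;
  datatype?: string;  // set for typed literals, e.g. xsd:string
  language?: string;  // set for language-tagged literals
}

function toHexTuple(st: Statement): string {
  const isLiteral = st.datatype !== undefined || st.language !== undefined;
  return JSON.stringify([
    st.subject,
    st.predicate,
    st.object,
    // Per the HexTuples spec, IRI objects carry the "globalId" marker
    // in the datatype slot.
    isLiteral ? st.datatype ?? "" : "globalId",
    st.language ?? "",
    "", // default graph
  ]);
}

const ndjson = [
  {
    subject: "https://example.com/motions/1",
    predicate: "http://purl.org/dc/terms/title",
    object: "Lower the voting age",
    datatype: "http://www.w3.org/2001/XMLSchema#string",
  },
]
  .map(toHexTuple)
  .join("\n");

console.log(ndjson);
```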

Bulk-API, caching and more performance improvements

Still, our app wasn’t quick enough. The core problem: every single resource on screen has its own URL, and needs to be fetched independently. In other words, opening a single page could require over a hundred round-trips. Thom came up with a solution: the bulk API, which accepts a body containing all the URLs that need to be fetched. This improved many things, but our back-end was still running each resource request internally with all the existing overhead (checking the user, session, etc.). Even with that fixed, we’d still be doing far too much work for every request - we needed some form of caching.
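
From the client's perspective, a bulk endpoint can be as simple as one POST carrying every URL the page needs. A sketch of that idea - the /bulk path, parameter name and content types here are illustrative, not our exact API:

```typescript
// Illustrative bulk-fetch client: one round-trip for N resources instead
// of N separate GETs. Endpoint and response format are hypothetical.
async function fetchBulk(resources: string[]): Promise<string> {
  const response = await fetch("https://example.com/bulk", {
    method: "POST",
    headers: {
      "Content-Type": "application/x-www-form-urlencoded",
      Accept: "application/hex+x-ndjson",
    },
    // Every URL the page needs goes into the request body.
    body: new URLSearchParams(
      resources.map((r) => ["resource[]", r]),
    ).toString(),
  });
  return response.text();
}

fetchBulk([
  "https://example.com/motions/1",
  "https://example.com/motions/1/comments",
]).then(console.log);
```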

That need for caching is why we’re working on a new RDF triple store, written in Rust, and it is fast. It offers three ways of querying:

It’s not rolled out yet in production, but we’ll open source it soon enough.

FAQ

Which triple store do you use?

Although our back-end serializes linked data, we don’t use a triple store. The Rails back-end has a mostly strict schema in SQL tables, but we do have a table in our Postgres schema that stores single triples. Keep in mind that the RDF representation of data can be created during serialization - so your app can create linked data without using a triple store! This is often a preferable approach in apps that have business logic or any type of constraints. But as I’ve mentioned earlier, we’re working on a new triple store (written in Rust) that we’re planning on open sourcing soon, which we use for caching to keep performance optimal.
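
To illustrate deriving triples at serialization time, here's a hypothetical example of mapping an ordinary database row to RDF statements (the model and vocabulary are invented; our actual Rails implementation lives in rdf-serializers):

```typescript
// Hypothetical: the canonical data lives in an SQL row; the RDF
// representation is derived on the fly when a response is serialized.
interface MotionRow {
  id: number;
  title: string;
  createdAt: string;
}

type Statement = [subject: string, predicate: string, object: string];

function serializeMotion(row: MotionRow): Statement[] {
  const subject = `https://example.com/motions/${row.id}`;
  return [
    [
      subject,
      "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
      "https://example.com/ns#Motion",
    ],
    [subject, "http://purl.org/dc/terms/title", row.title],
    [subject, "http://purl.org/dc/terms/created", row.createdAt],
  ];
}
```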

Why not use SPARQL?

SPARQL is the de facto query language for RDF data, so it seems logical to use it somewhere in our stack. However, we don’t. Getting SPARQL performant is actually pretty difficult, and we don’t need the powerful query options that it provides. Most of the requests from the front-end just ask for all triples about one or more subjects, and these kinds of queries don’t require SPARQL. SPARQL is useful for more complex graph-traversal queries, but it is not necessarily the best approach for simpler ones.
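
For example, "give me all triples about this subject" is just a dereference of the subject's URL with an RDF Accept header - no query language involved. A minimal sketch:

```typescript
// Fetching every statement about one subject without SPARQL:
// dereference the subject's URL and ask for an RDF serialization.
async function fetchSubject(subjectUrl: string): Promise<string> {
  const response = await fetch(subjectUrl, {
    headers: { Accept: "application/n-triples" },
  });
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  return response.text();
}

fetchSubject("https://example.com/motions/1").then(console.log);
```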

RDF tools that we’ve built

If you need any help, get in touch!
