VoID

From semanticweb.org
Revision as of 12:45, 26 February 2012 by MichaelHausenblas (Talk | contribs)

Vocabulary of Interlinked Datasets
Homepage: rdfs.org/ns/void/
Language: RDF Schema
Last release: March 6 2011
Last revision: March 6 2011
Namespace: http://rdfs.org/ns/void#

VoID (from "Vocabulary of Interlinked Datasets") is an RDF-based schema for describing linked datasets. VoID descriptions help consumers discover linked datasets and use them effectively and efficiently. A dataset is a collection of data, published and maintained by a single provider, available as RDF, and accessible, for example, through dereferenceable HTTP URIs or a SPARQL endpoint.

Overview

Two classes are at the heart of VoID:

  • A dataset (void:Dataset) is a collection of data, which is:
    • published and maintained by a single provider, and
    • available as RDF, and
    • accessible, for example, through dereferenceable HTTP URIs or a SPARQL endpoint.
  • The interlinking is modelled by a linkset (void:Linkset). A linkset in VoID is a subclass of a dataset, used for storing triples that express the interlinking relationship between datasets. In each interlinking triple, the subject is a resource hosted in one dataset and the object is a resource hosted in another dataset. This modelling provides a flexible and powerful way to describe the interlinking between two datasets in detail, such as how many links exist, which kinds of links (e.g. owl:sameAs or foaf:knows) are present, or who asserts these statements.
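
As a minimal sketch of what such a detailed linkset description could look like, the Turtle below states the link direction, the kind of links, their number, and who asserts them. All resource names (:Source2Target, :SourceDataset, :TargetDataset, :SomeProvider) and the triple count are hypothetical placeholders, and dcterms:creator is only one possible way to record who makes the statements.

```turtle
@prefix void:    <http://rdfs.org/ns/void#> .
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix :        <#> .

# Hypothetical linkset; names and numbers are placeholders, not real data.
:Source2Target a void:Linkset ;
    void:subjectsTarget :SourceDataset ;  # dataset hosting the link subjects
    void:objectsTarget  :TargetDataset ;  # dataset hosting the link objects
    void:linkPredicate  owl:sameAs ;      # which kind of links are present
    void:triples        1000 ;            # how many links exist (made-up number)
    dcterms:creator     :SomeProvider .   # who asserts the links (assumed modelling)
```

When the direction of the links does not matter, void:target can be used for both datasets instead of void:subjectsTarget and void:objectsTarget.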

The following figure depicts how interlinking is modelled in VoID:

[Figure: VoID interlinking concept]

The core resources of the VoID spec are as follows:

  1. VoID vocabulary (normative), which defines the classes and properties (available in HTML and RDF)
  2. Describing Linked Datasets with the VoID Vocabulary, which explains the usage of VoID for both data publishers and consumers (along with other vocabularies such as Dublin Core, FOAF, etc.)
  3. VoID code repository, which hosts example implementations (issues regarding the vocabulary are also tracked there)

Using VoID

A simple VoID example that describes two well-known LOD datasets and their interlinking is shown below.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix : <#> .

:DBpedia rdf:type void:Dataset ;
         foaf:homepage <http://dbpedia.org/> .

:DBLP rdf:type void:Dataset ;
      foaf:homepage <http://www4.wiwiss.fu-berlin.de/dblp/all> ;
      dcterms:subject <http://dbpedia.org/resource/Computer_science> ;
      dcterms:subject <http://dbpedia.org/resource/Journal> ;
      dcterms:subject <http://dbpedia.org/resource/Proceedings> .

:DBpedia void:subset :DBpedia2DBLP .

:DBpedia2DBLP rdf:type void:Linkset ;
              void:target :DBpedia ;
              void:target :DBLP .

Let us assume that the above VoID description has been gathered, for example, by a semantic indexer from the VoID documents that data publishers provide along with their datasets. It is then possible to execute the following example query:

PREFIX void: <http://rdfs.org/ns/void#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT DISTINCT ?dataset
WHERE {
  ?dataset a void:Dataset .
  ?dataset dcterms:subject <http://dbpedia.org/resource/Journal> .
}

This query asks for all datasets that have been categorised as containing data about journals and returns :DBLP. A slightly modified version of the query, applied to the RKB explorer, is shown below:

PREFIX void: <http://rdfs.org/ns/void#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT DISTINCT ?dataset
WHERE {
  ?dataset a void:Dataset .
  ?dataset dcterms:subject <http://dbpedia.org/resource/Category:Computer_scientists> .
}

The above SPARQL query will list all datasets about computer scientists.

Much more is possible with VoID, though. You can describe basic dataset metadata (such as publisher and license), the vocabularies used, example resources, SPARQL endpoint availability, and of course the fine-grained interlinking between datasets. The following example partly describes the interlinking between DBpedia and Geonames:

@prefix owl: <http://www.w3.org/2002/07/owl#> .

:DBpedia void:subset :DBpedia2Geonames .

:DBpedia2Geonames a void:Linkset ;
                  void:linkPredicate owl:sameAs ;
                  void:target :DBpedia ;
                  void:target :Geonames .
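
The basic dataset metadata mentioned above (publisher, license, vocabularies used, example resources, SPARQL endpoint) could be expressed along the following lines. This is only a sketch: the publisher resource :DBpediaTeam, the license URI, the listed vocabulary, and the example resource are illustrative assumptions and are not taken from an actual DBpedia VoID file.

```turtle
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix void:    <http://rdfs.org/ns/void#> .
@prefix :        <#> .

# Illustrative values only; see the assumptions stated in the text above.
:DBpedia a void:Dataset ;
    dcterms:title        "DBpedia" ;
    dcterms:publisher    :DBpediaTeam ;                                       # hypothetical publisher resource
    dcterms:license      <http://creativecommons.org/licenses/by-sa/3.0/> ;   # assumed license URI
    void:vocabulary      <http://xmlns.com/foaf/0.1/> ;                       # a vocabulary used in the dataset
    void:exampleResource <http://dbpedia.org/resource/Berlin> ;               # a representative entry point
    void:sparqlEndpoint  <http://dbpedia.org/sparql> .                        # where the dataset can be queried
```

A consumer that has indexed such a description can, for instance, look up void:sparqlEndpoint to decide where to send queries for a given dataset.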

Generating and Consuming

There are already tools and applications available that consume or produce VoID.

Generators & Editors

  • voidGen (Christoph Böhm, HPI, Universität Potsdam)
  • VoID tools is a collection of Jena-based tools to support the generation of VoID descriptions including RDFStats statistics.
  • OpenLink's Virtuoso produces VoID, see DB.DBA.RDF_VOID_STORE
  • liftSSM is an XSLT that takes a semantic sitemap in XML and creates a stub VoID description in RDF/XML.
  • The current version v1.1 of the NxParser (Parser for NTriples, NQuads, and more) supports exporting VoID descriptions of statistics.
  • ve2, the VoID editor, lets you manually create a VoID file in RDF Turtle format, including the definition of the characteristics of your linked dataset, such as categories, interlinking, technical features, licensing, etc.

Exploration, Browser, Stores

Examples in the Wild

The examples below use VoID for different purposes and use cases. The time it took people to implement it, however, may indicate how easy it is to generate and consume VoID.


Eurostat's Metadata Server RAMON

The Eurostat Metadata Server RAMON uses VoID to describe its data, including countries and NUTS codes.

Open Data Communities

Open Data Communities uses VoID, see for example Index of Multiple Deprivation Ranking, 2010.

RPI's Linked Data from data.gov

http://data-gov.tw.rpi.edu uses VoID to describe its datasets; see, for example, a URI design note.

SPARQL Endpoints Status

Mondeca provides a list of the availability of public SPARQL endpoints fetched dynamically using CKAN, using VoID along with an extension to describe the status of an endpoint.

data.CNR.it

The Italian National Research Council (CNR) publishes organizational data at http://data.cnr.it/ and also provides a VoID description.

The Stationery Office (TSO)

The UK-based 'The Stationery Office' (TSO) provides information management & publishing solutions to the public and private sectors and uses VoID to describe different UK datasets, see for example http://gov.tso.co.uk/gazettes/void

PSI Catalogues Aggregator

The PSI Catalogues Aggregator offers VoID descriptions for Public Sector Information (PSI) catalogues. For example, see their data.gov.uk VoID description.

Ordnance Survey

Ordnance Survey, Great Britain's national mapping agency, uses VoID to describe its data.

OECD Glossary of Statistics

oecd.dataincubator.org is a dataset offering VoID about data extracted from the OECD Glossary of Statistics.

Lingvoj

As reported by Bernard Vatant, lingvoj has a VoID description as well; lingvoj is a linked dataset dedicated to the publication and use of multilingual RDF descriptions of human languages.

OpenLink Software

From the Virtuoso Open-Source Edition 5.0.10 release (2009-02-13) we gather that OpenLink has added support for VoID. Further, Kingsley Idehen (CEO and founder of OpenLink) has announced a demo from their Virtuoso platform: the so-called uriburner service turns structured HTML into RDF and uses VoID to represent the data description generated on the fly.

As announced on 5 March 2009, OpenLink has generated a VoID graph for DBpedia; use <http://dbpedia.org/void/> in the default graph field at the http://dbpedia.org/sparql endpoint.

Virtuoso (both Open- and Closed-Source variants) also now includes scripts which use built-in functions for VoID generation and storage, e.g., DB.DBA.RDF_VOID_STORE

RDFohloh

Sergio announced that RDFohloh has VoID descriptions.

RKB explorer

As reported by Hugh Glaser, the RKB explorer activity has a VoID site that enables querying and browsing of CRS datasets. For example:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX scovo: <http://purl.org/NET/scovo#>

SELECT ?subjects ?objects ?stats
WHERE {
  ?crs void:subjectsTarget ?subjects .
  ?crs void:objectsTarget ?objects .
  ?crs void:statItem ?item .
  ?item rdf:value ?stats .
}

The interlinkage of the RKB sites can be visualised from the VoID data.

The VoID data for a particular RKB site can be accessed as Linked Data at that site. For example, the dblp.rkbexplorer.com site has a void.ttl file, a VoID dataset URI and information about its CRS.

SchemaCache

SchemaCache publishes a VoID description.

Decimalised Database of Concepts

In the decimalised database of concepts (DDC) dataset, VoID is used extensively. DDC is a collection of topics suitable for use in linked data. It is inspired by the Dewey Decimal Classification, but no guarantees are made about the closeness of its resemblance as a whole. SKOS mapping links are provided from this database to the Dewey system, to Library of Congress Classification codes and to DBpedia resources where possible.

Telegraphis

Linked Data about currencies is available on telegraphis.net; see http://telegraphis.net/data/void

EPrints

The EPrints repository software publishes RDF as of v3.2.1 and automatically describes the dataset using VoID. Suggestions for improvement can be sent to cjg@ecs.soton.ac.uk (it is worth getting right, as many repositories will end up with this code).

Feedback and Discussions

We have a VoID discussion group (void-discussion@googlegroups.com) if you would like to share your experience or have a question. If you have a feature request or want to file a bug report, please use the VoID Issue Tracker. Some of us hang out on the #swig IRC channel on Freenode.


Related Specifications

There are some specifications that use or extend VoID or are related to it:


See also

The VoID vocabulary is maintained by the voiD team:

by the VoID team

what others say about VoID

Aggregated references:


Listings:
