VoID
| Vocabulary of Interlinked Datasets | |
|---|---|
| Homepage: | rdfs.org/ns/void/ |
| Language: | RDF Schema |
| Last release: | March 6 2011 |
| Last revision: | March 6 2011 |
| Namespace: | http://rdfs.org/ns/void# |
VoID (from "Vocabulary of Interlinked Datasets") is an RDF-based schema for describing linked datasets. It supports effective and efficient discovery and use of linked datasets. A dataset is a collection of data, published and maintained by a single provider, available as RDF, and accessible, for example, through dereferenceable HTTP URIs or a SPARQL endpoint.
Overview
Two classes are at the heart of VoID:
- A dataset (void:Dataset) is a collection of data, which is:
  - published and maintained by a single provider, and
  - available as RDF, and
  - accessible, for example, through dereferenceable HTTP URIs or a SPARQL endpoint.
- The interlinking is modelled by a linkset (void:Linkset). A linkset in voiD is a subclass of a dataset, used for storing triples that express the interlinking relationship between datasets. In each interlinking triple, the subject is a resource hosted in one dataset and the object is a resource hosted in another dataset. This modelling makes it possible to describe the interlinking between two datasets in detail, such as how many links exist, which kinds of links (e.g. owl:sameAs or foaf:knows) are present, or who claims these statements.
[Figure: the modelling of the interlinking in voiD]
The core resources of the VoID spec are as follows:
- VoID vocabulary (normative), defines the classes and properties (available in HTML and RDF)
- Describing Linked Datasets with the VoID Vocabulary, explains the usage of VoID for both data publisher and consumer (along with other vocabularies such as Dublin Core, FOAF, etc.)
- VoID code repository, hosting exemplary implementations (issues concerning the vocabulary are also accessible there)
Using VoID
A simple VoID example that describes two well-known LOD datasets and their interlinking is shown in the following.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix void: <http://rdfs.org/ns/void#> .
# Assumption: the default prefix is not declared in the original example;
# it is bound to the current document here so that :DBpedia etc. resolve.
@prefix : <#> .

:DBpedia rdf:type void:Dataset ;
    foaf:homepage <http://dbpedia.org/> .

:DBLP rdf:type void:Dataset ;
    foaf:homepage <http://www4.wiwiss.fu-berlin.de/dblp/all> ;
    dcterms:subject <http://dbpedia.org/resource/Computer_science> ;
    dcterms:subject <http://dbpedia.org/resource/Journal> ;
    dcterms:subject <http://dbpedia.org/resource/Proceedings> .

:DBpedia void:subset :DBpedia2DBLP .

:DBpedia2DBLP rdf:type void:Linkset ;
    void:target :DBpedia ;
    void:target :DBLP .
Let us assume that the above VoID description has, for example, been gathered by a semantic indexer from the VoID documents data publishers provide along with their dataset. It is then possible to execute the following example query:
SELECT DISTINCT ?dataset
WHERE {
    ?dataset a void:Dataset .
    ?dataset dcterms:subject <http://dbpedia.org/resource/Journal> .
}
This query asks for all datasets that have been categorised as containing data about journals and will eventually return :DBLP. A slightly modified version of the query above applied to the RKB explorer is shown below:
SELECT DISTINCT ?dataset
WHERE {
    ?dataset a void:Dataset .
    ?dataset dcterms:subject <http://dbpedia.org/resource/Category:Computer_scientists> .
}
The above SPARQL query will list all datasets about computer scientists.
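To make the matching behind these queries concrete, here is a minimal sketch in plain Python that evaluates the first query (datasets about journals) over the triples of the example description. It is an illustration only: the function name `datasets_about` and the tuple representation are assumptions made for this sketch, no RDF library or SPARQL engine is involved, and no inference is applied.

```python
# Minimal sketch: evaluate the two-pattern query over an in-memory set of
# triples. Plain string tuples stand in for RDF terms (an assumption of this
# sketch); a real consumer would use an RDF library and a SPARQL engine.
RDF_TYPE = "rdf:type"

# Triples copied from the example VoID description, kept as prefixed names.
TRIPLES = {
    (":DBpedia", RDF_TYPE, "void:Dataset"),
    (":DBLP", RDF_TYPE, "void:Dataset"),
    (":DBLP", "dcterms:subject", "<http://dbpedia.org/resource/Computer_science>"),
    (":DBLP", "dcterms:subject", "<http://dbpedia.org/resource/Journal>"),
    (":DBLP", "dcterms:subject", "<http://dbpedia.org/resource/Proceedings>"),
    (":DBpedia2DBLP", RDF_TYPE, "void:Linkset"),
}


def datasets_about(triples, subject_uri):
    """Return the sorted, de-duplicated datasets categorised under subject_uri."""
    # Pattern 1: ?dataset a void:Dataset .
    datasets = {s for (s, p, o) in triples if p == RDF_TYPE and o == "void:Dataset"}
    # Pattern 2: ?dataset dcterms:subject <subject_uri> .
    tagged = {s for (s, p, o) in triples if p == "dcterms:subject" and o == subject_uri}
    # Both patterns share ?dataset, so the answer is the intersection.
    return sorted(datasets & tagged)


print(datasets_about(TRIPLES, "<http://dbpedia.org/resource/Journal>"))
```

Using sets gives the effect of SELECT DISTINCT, and the intersection corresponds to joining the two triple patterns on the shared ?dataset variable.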
Much more is possible with VoID, though. You can describe basic dataset metadata (such as publisher, license, etc.), vocabularies used, example resources, SPARQL endpoint availability, and of course the fine-grained interlinking between the datasets. One such example, which partly describes the interlinking between DBpedia and Geonames, is:
:DBpedia void:subset :DBpedia2Geonames .
:DBpedia2Geonames a void:Linkset ;
    void:linkPredicate owl:sameAs ;
    void:target :DBpedia ;
    void:target :Geonames .
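Linkset descriptions of this shape are regular enough to be generated. The following is a minimal sketch, not taken from any of the VoID tools, that renders the same Turtle from a few prefixed names. The function name `linkset_turtle` is hypothetical, the sketch assumes exactly two target datasets, and it assumes the matching @prefix declarations are written elsewhere.

```python
def linkset_turtle(parent, linkset, link_predicate, targets):
    """Render a VoID linkset description as Turtle text.

    All arguments are prefixed names such as ":DBpedia" or "owl:sameAs".
    The caller is assumed to emit the corresponding @prefix lines.
    """
    # Unpacking raises ValueError unless exactly two targets are given.
    first, second = targets
    return "\n".join([
        f"{parent} void:subset {linkset} .",
        f"{linkset} a void:Linkset ;",
        f"    void:linkPredicate {link_predicate} ;",
        f"    void:target {first} ;",
        f"    void:target {second} .",
    ])


print(linkset_turtle(":DBpedia", ":DBpedia2Geonames", "owl:sameAs",
                     [":DBpedia", ":Geonames"]))
```

Plain string formatting keeps the sketch dependency-free; it does no escaping or validation, so it is only suitable for trusted, well-formed prefixed names.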
Generating and Consuming
There are already tools and applications available that consume or produce VoID.
Generators & Editors
- voidGen (Christoph Böhm, HPI, Universität Potsdam)
  - Hadoop-based, scalable VoID generation
  - This was used for Creating voiD Descriptions for Web-scale Data (winner of the ISWC2010 Billion Triples Challenge Track)
- VoID tools is a collection of Jena-based tools to support the generation of VoID descriptions including RDFStats statistics.
- OpenLink's Virtuoso produces VoID, see DB.DBA.RDF_VOID_STORE
- liftSSM is an XSLT that takes a semantic sitemap in XML and creates a stub voiD description in RDF/XML.
- The current version v1.1 of the NxParser (Parser for NTriples, NQuads, and more) supports exporting VoID descriptions of statistics.
- ve2, the VoID editor, lets you manually create a VoID file in RDF Turtle format, including the definition of the characteristics of your linked dataset, such as categories, interlinking, technical features, and licensing.
Exploration, Browser, Stores
- At the VoID store one can find, browse and query various VoID descriptions.
- The VoID Browser lets you view and query VoID descriptions.
- voX, the Dataset Explorer, lets you explore datasets.
Examples in the Wild
The examples below use VoID for different purposes and use cases. The time it took people to implement it may be an indicator of how easy it is to generate and consume VoID.
Decimalised Database of Concepts
In the decimalised database of concepts (DDC) dataset, VoID is used extensively. DDC is a collection of topics suitable for use in linked data. It is inspired by the Dewey Decimal Classification, but no guarantees are made about the closeness of its resemblance as a whole. SKOS mapping links are provided from this database to the Dewey system, to Library of Congress Classification codes and to DBpedia resources where possible.
EPrints
EPrints Repository software publishes RDF as of v3.2.1 and automatically describes the dataset using VoID. Send suggestions for improvement to cjg@ecs.soton.ac.uk (it is worth getting right, as many repositories will end up with this code).
Italian National Research Council (CNR)
The Italian National Research Council (CNR) publishes organizational data at http://data.cnr.it/ and also provides a VoID description.
Lingvoj
As reported by Bernard Vatant, lingvoj has a VoID description as well; lingvoj is a linked dataset dedicated to the publication and use of multilingual RDF descriptions of human languages.
LODStats
LODStats attempts to compute comprehensive summaries and statistics for all LOD datasets on the Web. It uses VoID and the Data Cube Vocabulary to represent the statistics. See, for example, Europeana Linked Data.
Open Data Communities
Open Data Communities uses VoID, see for example Index of Multiple Deprivation Ranking, 2010.
OECD Glossary of Statistics
oecd.dataincubator.org is a dataset offering VoID about data extracted from the OECD Glossary of Statistics.
OpenLink Virtuoso
Since Virtuoso 5.0.10 (2009-02-13), OpenLink has included support for VoID. Further, Kingsley Idehen (CEO and founder of OpenLink) has announced a demo from their Virtuoso platform: the URIBurner service turns structured HTML into RDF, and uses VoID to represent the (on the fly) generated data description.
As announced on 2009-03-05, OpenLink has generated a VoID graph for DBpedia; use <http://dbpedia.org/void/> for the default graph field at http://dbpedia.org/sparql.
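As a hedged illustration of what filling in the default graph field amounts to, the sketch below builds a SPARQL Protocol GET URL for that endpoint. The `query` and `default-graph-uri` parameter names come from the SPARQL Protocol; the helper name `sparql_get_url` and the sample query are assumptions of this sketch. No request is sent, and whether the endpoint still serves this graph has not been verified.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def sparql_get_url(endpoint, query, default_graph=None):
    """Build a SPARQL Protocol GET URL; the query is percent-encoded."""
    params = [("query", query)]
    if default_graph:
        # Restricts the query to the named default graph.
        params.append(("default-graph-uri", default_graph))
    return endpoint + "?" + urlencode(params)


# Sample query (an assumption for illustration): list some described datasets.
QUERY = (
    "SELECT DISTINCT ?dataset "
    "WHERE { ?dataset a <http://rdfs.org/ns/void#Dataset> } LIMIT 10"
)

url = sparql_get_url("http://dbpedia.org/sparql", QUERY, "http://dbpedia.org/void/")
print(url)

# Decoding the URL again shows that both parameters survive the round trip.
decoded = parse_qs(urlsplit(url).query)
print(decoded["default-graph-uri"])
```

The resulting URL could then be fetched with any HTTP client, typically with an Accept header asking for a SPARQL results format.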
Virtuoso (both Open- and Closed-Source variants) now also includes scripts that use built-in functions for VoID generation and storage, e.g., DB.DBA.RDF_VOID_STORE.
Ordnance Survey
Ordnance Survey, Great Britain's national mapping agency, uses VoID to describe its data.
PSI Catalogues Aggregator
The PSI Catalogues Aggregator offers voiD descriptions for Public Sector Information (PSI) catalogues. For example, see their data.gov.uk VoID description.
RAMON, Eurostat's Metadata Server
The Eurostat Metadata Server RAMON uses VoID to describe their data, including countries and NUTS codes.
RDFohloh
Sergio announced that RDFohloh has VoID descriptions.
RKB explorer
As reported by Hugh Glaser, the RKB explorer activity has a VoID site that supports querying and browsing CRS datasets. For example:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX scovo: <http://purl.org/NET/scovo#>
SELECT ?subjects ?objects ?stats
WHERE {
    ?crs void:subjectsTarget ?subjects .
    ?crs void:objectsTarget ?objects .
    ?crs void:statItem ?item .
    ?item rdf:value ?stats
}
The interlinkage of the RKB sites can be visualised from the VoID data.
The VoID data for a particular RKB site can be accessed as Linked Data at that site, for example the dblp.rkbexplorer.com site has a void.ttl file, voiD dataset URI and information about its CRS.
RPI's Linked Data from data.gov
http://data-gov.tw.rpi.edu uses VoID to describe its datasets; see, for example, a URI design note.
SchemaCache
Talis' SchemaCache publishes a VoID description.
SPARQL Endpoints Status
Mondeca provides a list of the availability of public SPARQL endpoints fetched dynamically using CKAN, using VoID along with an extension to describe the status of an endpoint.
The Stationery Office (TSO)
The UK-based 'The Stationery Office' (TSO) provides information management & publishing solutions to the public and private sectors and uses VoID to describe different UK datasets, see for example http://gov.tso.co.uk/gazettes/void
Telegraphis
telegraphis.net offers Linked Data about currencies; see http://telegraphis.net/data/void
World Bank
World Bank Linked Data has a VoID + SPARQL Service Description.
Feedback and Discussions
We have a VoID discussion group (void-discussion@googlegroups.com) if you are interested in sharing your experience or have a question. If you have a feature request or want to file a bug report, please use the VoID Issue Tracker. Some of us hang out on the #swig IRC channel at Freenode.
Related Specifications
There are some specifications that use or extend VoID or are related to it:
- DCAT (Data Catalog vocabulary) - see http://vocab.deri.ie/dcat-overview
- RDF HDT (Header-Dictionary-Triples) - see http://www.w3.org/Submission/HDT-Related/
- The Provenance Vocabulary - see http://purl.org/net/provenance/
- voidp: A Vocabulary for Data and Dataset Provenance - see http://www.enakting.org/provenance/voidp/
See also
The VoID vocabulary is maintained by the voiD team:
by the VoID team
- voiD stores and Interesting Queries, Keith Alexander, blog post
- voiD, datasets, graphs, documents, and dcterms:isPartOf backlinks, Keith Alexander, blog post
- voiD talk, Michael Hausenblas, LDOW09
- On the Effectiveness and Efficiency of Discovery, Michael Hausenblas, blog post
- voiD: a Vocabulary of Interlinked Datasets, Keith Alexander, blog post
- Discovery and Usage of Linked Datasets on the Web of Data, Michael Hausenblas, NodMag #4
- European Semantic Web Conference 2008 at Tenerife, Keith Alexander, blog post
- Back from ESWC 2008, Richard Cyganiak, blog post
- voiD - full ESWC08 lightning talk, Michael Hausenblas, slides
what others say about VoID
Aggregated references:
- 50+ citations via Google Scholar (in early 2012)
Listings:
- Making things findable Peter Mika, WIMS2011 key note
- Practical semantic web - creating a catalog of Linked data Anand Mallaya, blog post
- Creating a Linked Data Architecture for Neuroscience Tim Schultz, Johnson & Johnson
- Designing URI Sets for the UK Public Sector UK Chief Technology Officer Council
- Uncertainty reasoning for linked data (position paper at URSW09 workshop) Dave Reynolds, HP labs
- A journey to Semantic Web query federation in the life sciences Kei-Hoi Cheung et al
- Linked Data and the Public Domain Ian Davis, Talis, Nodalities blog
- ArticleReview: Geographical Linked Data: The Administrative Geography of Great Britain on the Semantic Web, Andrea Huang, blog post
- Discovering SPARQL, Alex Tucker, Talis blog post
- service descriptions: comparison of VoiD, DARQ, and SADDLE, Gregory Williams, post to public-rdf-dawg@w3.org
- VoiD graph for DBpedia, Kingsley Idehen, post to public-lod@w3.org
- On the relation of voiD and Silk, Chris Bizer, post to public-lod@w3.org
- The Linked Data Value Spiral, Benjamin Nowack, blog post
- Faceted Views over Large-Scale Linked Data, Orri Erling, short paper submission to LDOW09, a WWW09 workshop
- Distributed Query: Pellet into the voiD, Kendall Clark, blog post
- voiD: Linking Linked Data, Zach Beauvais, Talis blog post
- Surveying with voiD, PeteJ, eFoundations, blog post
- Linked Data & The Year 2009 (updated), Orri Erling, blog post
- Optimizing SPARQL Queries over Disparate RDF Data Sources through Distributed Semi-Joins, Jan Zemanek, Simon Schenk, ISWC2008 Posters and Demonstrations
- VOID, Or Will the LOD Cloud bring Rain?, Orri Erling, blog post
- Linked Data Metadata. Introspecting the Semantic Web, Aldo Bucchi, blog post
- Describing the content of RDF datasets, Yves Raimond, blog post
- Uniform Access to Links and Properties via W3C ESW Wiki
