KWTR: Ontology repositories
Raul Palma, Universidad Politecnica de Madrid
 From Data Repositories to Ontology Repositories
The literature offers many different definitions of what a data repository is, and of what a repository is in general. Rather than adding another definition, we first clarify what we understand by the term. We consider a data repository to be any collection of digital data that is available to one or more entities (e.g. users, systems) for a variety of purposes (e.g. learning, administrative processes, research) and that has the characteristics proposed by Heery and Anderson [Heery05]:
- content is deposited in a repository, whether by the content creator, owner or third party
- the repository architecture manages content as well as metadata
- the repository offers a minimum set of basic services e.g. put, get, search, access control
- the repository must be sustainable and trusted, well-supported and well managed
Furthermore, the architecture of the data repository (e.g. centralized, distributed) should be transparent to the entities accessing its contents.
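The characteristics listed above can be illustrated with a minimal sketch. It assumes a single in-memory store; the class name `InMemoryRepository`, the `readers` access list and all method signatures are hypothetical and are not taken from [Heery05]. It only shows how content and metadata can be managed together behind the basic services (put, get, search, access control).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Item:
    """A deposited item: the content plus its descriptive metadata."""
    content: bytes
    metadata: dict[str, str]
    readers: set[str] = field(default_factory=set)  # empty set means public


class InMemoryRepository:
    """Toy repository offering put, get, search and a simple access check."""

    def __init__(self) -> None:
        self._items: dict[str, Item] = {}

    def put(self, item_id: str, content: bytes, metadata: dict[str, str],
            readers: set[str] | None = None) -> None:
        # Content may be deposited by its creator, its owner or a third party.
        self._items[item_id] = Item(content, dict(metadata), set(readers or ()))

    def _can_read(self, item: Item, user: str) -> bool:
        # Hypothetical policy: public items, or users on the access list.
        return not item.readers or user in item.readers

    def get(self, item_id: str, user: str) -> bytes:
        item = self._items[item_id]  # raises KeyError for unknown ids
        if not self._can_read(item, user):
            raise PermissionError(f"{user} may not read {item_id}")
        return item.content

    def search(self, user: str, **criteria: str) -> list[str]:
        # Ids of readable items whose metadata matches every criterion.
        return sorted(
            item_id for item_id, item in self._items.items()
            if self._can_read(item, user)
            and all(item.metadata.get(k) == v for k, v in criteria.items())
        )
```

Because callers only see `put`, `get` and `search`, the same interface could in principle be backed by a centralized or a distributed store, which is the transparency requirement stated above.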
It is also important to note that, over time, data repositories have been referred to by other names (e.g. knowledge base, data library, digital library, data warehouse) depending on their contents, purpose or capabilities.
The term data library is usually used in the literature to refer to subject-specific datasets (e.g. climate data library, time series data library, geospatial data library). Moreover, a data library tends to house local data collections and provides access to them through various means (CD-/DVD-ROMs or a central server for download). Thus, a data library usually provides access to the complete dataset rather than the basic services (e.g. search, put, get) that a data repository offers.
Around the mid-1990s the term digital library (previously also known as electronic library or virtual library) was first made popular by the NSF/DARPA/NASA Digital Libraries Initiative. According to [arms01], a digital library is a managed collection of information, with associated services, where the information is stored in digital formats and accessible over a network. The stored information can be very diverse and is used by many different users. In general, a digital library is considered similar to a traditional library (i.e. it is used to find information that others have created and to use it for study, reference, or entertainment), but it takes advantage of new technologies to deliver the information to users.
A data warehouse is the main repository of an organization's historical data; the concept became popular during the late 1980s and early 1990s. The purpose of a data warehouse is to support analysis of the stored data for management decision making. Data is entered into this repository periodically, usually in an append-only manner. A data repository, however, does not necessarily have the analysis functionality provided by a data warehouse.
As with data repositories, the literature offers many different definitions of what a knowledge base is. In general, however, a knowledge base is a central repository of information: a special kind of database about a particular subject, used for knowledge management. A knowledge base may use an ontology to specify its structure (entity types and relationships) and its classification scheme, but it may also include unstructured or unformalized information expressed in natural language or procedural code. Also, in contrast to a data repository, the purpose of a knowledge base is usually to allow automated deductive reasoning over the stored knowledge (i.e. to decide how to act by running formal reasoning procedures over the base of knowledge).
The widespread use of repositories in academia and industry is due to the variety of digital resources they can hold and to the benefits they provide (e.g. enhanced access to resources, data sharing and reuse, preservation of digital resources). Furthermore, many areas use domain-specific repositories (e.g. educational repositories, music repositories) that provide all the benefits of a data repository in addition to specialized services and characteristics.
It is not surprising that, some years ago, the ontology and semantic web community became interested in using repositories to hold semantic content (e.g. ontologies). In recent years, ontologies have seen enormous development and application in many domains, especially in the context of the semantic web. Academia and industry are developing and using ontologies to provide new technologies and support daily operations. As a result, there is now a large number of ontologies developed by many different parties, which makes means to share and reuse them necessary.
Initial efforts to collect existing ontologies proposed the creation of library systems (known as ontology library systems) that offered various functions for managing, adapting and standardizing groups of ontologies [ding01o]. These systems provided an important environment for grouping and reorganizing ontologies for further reuse, integration, maintenance, mapping and versioning. Ding and Fensel also defined an evaluation model based on the functionality a library system provides. Examples of library systems are WebOnto, Ontolingua, the DAML Ontology Library System and SchemaWeb.
Current efforts focus on the creation of ontology repositories. An ontology repository is similar to what Ding and Fensel defined as an ontology library system (i.e. it defines a set of services for managing, adapting and sharing ontologies), but there are also some differences. In the rest of this chapter we will clarify formally what an ontology repository is.
Y. Ding and D. Fensel. Ontology library systems: The key to successful ontology reuse, 2001. In this paper, the authors identify the main criteria (management, adaptation, and standardization) for evaluating the functionality of library systems. Based on these criteria, they survey most of the ontology library systems that existed at that time (2001). Finally, they summarize the comparison and propose various important requirements for structuring ontology library systems.
J. Hartmann, R. Palma, A. Gomez-Perez. Ontology Repositories. In this paper the authors discuss essential aspects of ontology repositories. First, the authors provide the history of ontology repositories. Then, the authors describe the generic architecture of an ontology repository and corresponding management systems. The core elements and services of an Ontology Repository are discussed in detail and the authors illustrate management systems for ontology repositories.
Marian Babik, Ladislav Hluchy. Towards a scalable grid ontology repository. Cracow '04 Grid Workshop. In this paper the authors describe a grid ontology repository system (OntStore) that is capable of storing and querying ontologies and resource descriptions in a distributed manner. It is based on a distributed hash table (DHT) and the well-known W3C-standardized languages OWL/RDF/RDFS. The system decomposes the ontologies into their corresponding triples (subject, predicate, object) and uses the DHT system called Pastry to hash and store the elements. The authors present results from experiments describing semantic data distribution and query routing performance.
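The idea of decomposing ontologies into triples and placing them in a DHT can be sketched as follows. This is a simplified illustration under stated assumptions, not OntStore's implementation and not Pastry's routing protocol: the class `ToyTripleDHT`, the node names, the 128-bit identifier width and the use of SHA-1 are all hypothetical choices, and the responsible node is chosen with global knowledge instead of multi-hop routing. Each triple is indexed under the hash of its subject, predicate and object, so a lookup for any one term reaches the triples that mention it.

```python
import hashlib
from collections import defaultdict

ID_BITS = 128  # assumed identifier width for this sketch
ID_SPACE = 1 << ID_BITS

Triple = tuple[str, str, str]


def key_for(term: str) -> int:
    """Hash a term (subject, predicate or object) into the identifier space."""
    digest = hashlib.sha1(term.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % ID_SPACE


def ring_distance(a: int, b: int) -> int:
    """Shortest distance between two identifiers on the circular id space."""
    d = abs(a - b)
    return min(d, ID_SPACE - d)


class ToyTripleDHT:
    """Stores each triple at the nodes responsible for its three terms."""

    def __init__(self, node_names: list[str]) -> None:
        # Node ids are derived by hashing the (hypothetical) node names.
        self.node_ids = {key_for(name): name for name in node_names}
        self.storage: dict[str, dict[int, set[Triple]]] = {
            name: defaultdict(set) for name in node_names
        }

    def responsible_node(self, key: int) -> str:
        # With global knowledge we simply pick the numerically closest node id;
        # a real DHT would reach this node through multi-hop routing.
        closest = min(self.node_ids,
                      key=lambda nid: (ring_distance(nid, key), nid))
        return self.node_ids[closest]

    def store(self, triple: Triple) -> None:
        # Index the triple under each of its three terms.
        for term in triple:
            key = key_for(term)
            self.storage[self.responsible_node(key)][key].add(triple)

    def lookup(self, term: str) -> set[Triple]:
        """Return all stored triples that mention the given term."""
        key = key_for(term)
        return set(self.storage[self.responsible_node(key)].get(key, set()))


dht = ToyTripleDHT(["node-a", "node-b", "node-c"])
dht.store(("ex:Cat", "rdfs:subClassOf", "ex:Animal"))
dht.store(("ex:Dog", "rdfs:subClassOf", "ex:Animal"))
print(sorted(dht.lookup("ex:Animal")))
```

Indexing every triple three times trades storage for lookup flexibility: a query that fixes any single term needs to contact only one node.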
- Centralized ontology portal Onthology: http://www.onthology.org/
- Centralized ontology portal Knowledge Zone: http://smi-protege.stanford.edu:8080/KnowledgeZone/
 Key challenges
Ontology repositories will be a crucial cornerstone in facilitating efficient knowledge access and reuse, especially in the context of the Semantic Web. We expect a shift in ontology engineering towards developing ontologies in a modular way. We are optimistic that a critical mass of ontology modules in repositories can then be reached, and that ontology engineers will start reusing them and providing new ones. Hence the key challenges for the success of ontology repositories are:
- a standard vocabulary for describing and annotating ontologies (ontology metadata), required to improve and ensure the interoperability, exchange, access and usability of ontologies.
- industrial applications which provide easy access to existing ontologies.
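To make the first challenge concrete, the following sketch shows how a shared metadata record lets two tools exchange ontology descriptions. The field names (`uri`, `name`, `language`, `domain`, `creators`) and the JSON serialization are hypothetical choices for illustration; they are not the properties or the format of any actual metadata standard.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class OntologyMetadata:
    """Illustrative metadata record; all field names are hypothetical."""
    uri: str
    name: str
    language: str  # e.g. "OWL" or "RDFS"
    domain: str
    creators: tuple[str, ...] = ()


def export_record(record: OntologyMetadata) -> str:
    """Serialize a record so that another tool can read it."""
    return json.dumps(asdict(record), sort_keys=True)


def import_record(text: str) -> OntologyMetadata:
    """Parse a record; unknown or missing fields raise TypeError."""
    data = json.loads(text)
    data["creators"] = tuple(data.get("creators", ()))
    return OntologyMetadata(**data)
```

A record written with the agreed field names survives the round trip unchanged, whereas a record that uses a private vocabulary (say, `title` instead of `name`) is rejected on import. This is the interoperability problem a standard vocabulary is meant to remove.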
 Related projects
- EU Network of Excellence KnowledgeWeb. http://knowledgeweb.semanticweb.org/
- EU project Networked Ontologies - NeOn. http://www.neon-project.org/
The KnowledgeWeb project started the initial efforts towards the definition of an ontology metadata vocabulary known as OMV (http://omv.ontoware.org). Two related applications were also developed in this project: the centralized ontology repository Onthology (http://www.onthology.org/) and the ontology registry Oyster (http://oyster.ontoware.org).
The NeOn project continues the efforts towards the standardization of OMV and is also developing new related applications, such as the new version of the ontology registry Oyster (Oyster2) (http://ontoware.org/projects/oyster2), the ontology registry specialization of CentraSite (http://www.infoq.com/zones/centrasite/) and the ontology gateway Watson (http://watson.kmi.open.ac.uk/).
The successful use of ontology repositories will alleviate many problems, e.g.:
- the problem of having many isolated ontologies created by many different parties.
- the cost of duplicated effort.
- the possible interoperability problems between ontology-based applications.
Hence, only with an appropriate investment of money and time will the benefits of ontology repositories become a reality in the near future.
- [Heery05] Rachel Heery and Sheila Anderson. Digital repositories review, February 2005.
- [arms01] William Y. Arms. Digital Libraries. The MIT Press, 2001.
- [ding01o] Y. Ding and D. Fensel. Ontology library systems: The key to successful ontology reuse, 2001.