KWTR: Information retrieval
Contributors:
Paul-Alexandru Chirita, L3S Research Center
Please add your CV to the list of contributors
- What is the state of the art of Semantic Web in your research field?
There is a visible interest in semantics and many researchers are interested in extracting and exploiting semantics in their algorithms (for example, to annotate a Web page with keywords). However, this has little to do with the Semantic Web research area per se. Nobody uses RDF, SPARQL, ontologies, and the like.
- Provide references and short abstracts of three papers you consider as significant in your research field.
1. The PageRank Citation Ranking: Bringing Order to the Web, By L. Page, S. Brin, R. Motwani, T. Winograd.
Abstract: The importance of a Web page is an inherently subjective matter, which depends on the reader's interests, knowledge and attitudes. But there is still much that can be said objectively about the relative importance of Web pages. This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them.
2. Term-weighting approaches in automatic text retrieval, By G. Salton, C. Buckley.
Abstract: The experimental evidence accumulated over the past 20 years indicates that text indexing systems based on the assignment of appropriately weighted single terms produce retrieval results that are superior to those obtainable with other more elaborate text representations. These results depend crucially on the choice of effective term-weighting systems. This article summarizes the insights gained in automatic term weighting, and provides baseline single-term-indexing models with which other more elaborate content analysis procedures can be compared.
3. Indexing by latent semantic analysis, By S. Deerwester, S. Dumais, G. Furnas, T. Landauer, R. Harshman.
Abstract: A new method for automatic indexing and retrieval is described. The approach is to take advantage of implicit higher-order structure in the association of terms with documents (semantic structure) in order to improve the detection of relevant documents on the basis of terms found in queries. The particular technique used is singular-value decomposition, in which a large term by document matrix is decomposed into a set of ca. 100 orthogonal factors from which the original matrix can be approximated by linear combination. Documents are represented by ca. 100 item vectors of factor weights. Queries are represented as pseudo-document vectors formed from weighted combinations of terms, and documents with supra-threshold cosine values are returned. Initial tests find this completely automatic method for retrieval to be promising.
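To make the first reference in the list above more concrete, the idea behind PageRank can be sketched as a power iteration over a link graph. This is only an illustrative sketch, not the implementation from the paper: the function name `pagerank`, the toy graph `toy_web`, the fixed iteration count, and the damping value of 0.85 are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores by power iteration.

    `links` maps each page to the list of pages it links to
    (a hypothetical toy representation of the Web graph).
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes its rank on, split equally among its out-links.
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


# Toy graph (assumed for illustration): "c" is linked to by every other page.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_web)
for page, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(page, round(score, 3))
```

In this toy graph the page with the most incoming links ends up with the highest score, and the page nobody links to ends up with the lowest, which is the "objective and mechanical" rating the abstract refers to.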
- Please provide one or more examples (either business, or research, or both) in which semantic web has been used (if you can, add some references).
I am trying to bridge the gap between Information Retrieval and Semantic Web, but there is little work in this direction. Below is an example of such bridging work:
- P-TAG: Large Scale Automatic Generation of Personalized Annotation TAGs for the Web, By P.-A. Chirita, S. Costache, S. Handschuh, W. Nejdl.
- Are there existing tools or demos? Please indicate some of them.
There are many industrial applications (basically all search engines).
From academia, here are some examples of tools:
- The SMART Retrieval System (older, but famous)
- The TERRIER Platform
- What are the open problems in your Semantic Web research field? Why?
I work in Information Retrieval, not in Semantic Web.
- Provide references and links of the most relevant Semantic Web research projects in your field.
Since we are among the very few working in both Semantic Web and Information Retrieval, I would recommend the Beagle++ Semantic Desktop Search Engine.
- What challenges do these projects try to overcome?
We are trying to add semantics (in the Semantic Web kind of way) into Desktop Search applications. We have two major research directions: (1) Add relevant metadata to Desktop resources, and (2) Provide ranking algorithms which order these resources according to various importance metrics.
- What are their foreseen benefits (both in market and scientific community)?
The Personal Information Management / Desktop Search area clearly lacks working solutions, so we think our algorithms will have a reasonably high impact once finished. Also, we expect at least our ranking approach (which is more IR oriented) to be adopted by the industrial players as well.
- When, in your opinion, will projects’ results be ready for industry?
We plan to have reasonably complete results by the end of 2008. However, our ideas should be completely developed by the end of 2007.
- Do you think that it is important to invest (money and time) in these topics? Why?
The major motivation we found was that current Desktop Search tools either do not work at all (these are usually the prototypes) or build upon simple Term Frequency × Inverse Document Frequency (TF-IDF) ranking, which is rather naive given the current amounts of existing data.
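For reference, the kind of simple term-weighting ranking mentioned in this answer can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code of any particular Desktop Search tool: the function name `tfidf_rank`, the whitespace tokenizer, the `log(N / df)` form of inverse document frequency, and the sample file names and contents are all hypothetical.

```python
import math
from collections import Counter


def tfidf_rank(documents, query):
    """Rank documents by the summed TF x IDF weight of the query terms.

    `documents` maps a (hypothetical) file name to its text.
    """
    tokenized = {name: text.lower().split() for name, text in documents.items()}
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    scores = {}
    for name, tokens in tokenized.items():
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term in term_freq:
                idf = math.log(n_docs / doc_freq[term])
                score += term_freq[term] * idf
        scores[name] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Sample desktop files (assumed for illustration only).
desktop_files = {
    "notes.txt": "meeting notes about the retrieval project",
    "paper.tex": "retrieval retrieval ranking for desktop search",
    "todo.txt": "buy milk and call the bank",
}
ranking = tfidf_rank(desktop_files, "retrieval ranking")
for name, score in ranking:
    print(name, round(score, 3))
```

The score depends only on how often the query terms occur in each file and in the collection; it uses no metadata and no notion of how important a resource is to its owner.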
- What are, in your opinion, the most relevant Semantic Web challenges that will be solved in the long term (10 years)? Why?
I personally find the Semantic Web too complex (and sometimes not really practical) to succeed. If this is true, I think that in the coming years Semantic Web algorithms will focus more and more on real-life practical applications, and at the same time try to be as simple as possible.