Web Content Mining with Human Language Technologies workshop 2006
Subevent of ISWC2006
Start November 6 2006 13:00
End November 6 2006 17:00
City: Athens, Georgia
Country: USA
Important dates
Papers due: August 1 2006
Notification: September 5 2006
Camera ready due: September 18 2006

With the rapid growth of the information stored on the World Wide Web, tools for automatic or semi-automatic analysis of web data have become necessary. Hence, considerable effort has been invested in recent years in developing techniques for extracting patterns and implicit information from the web, a task usually known as Web Mining. Web Mining itself can be divided into three subtasks according to the kind of data that is collected: web structure, web usage and web content.

Web content mining consists of automatically mining data from textual web documents so that it can be represented with machine-readable semantic formalisms. Initially, most web content mining systems used wrappers to map documents to other data structures, but this approach depends heavily on the layout and formatting instructions inside web pages. Therefore, alternative approaches that make use of Natural Language Processing-based techniques are increasingly used.

While more traditional approaches to Information Extraction from text, such as those applied in the Message Understanding Conferences during the nineties, relied on small collections of documents with many semantic annotations, the characteristics of the web (its size, its redundancy and the lack of semantic annotations in most texts) favor efficient algorithms able to learn from unannotated data. Furthermore, new types of web content such as web forums, blogs and wikis, some of them included in the so-called Web 2.0, are also a source of textual information that contains an underlying structure from which specialist systems can benefit. The workshop will give special emphasis to how existing techniques can benefit from these kinds of content.

This workshop aims to bring together researchers from the Semantic Web, Natural Language Processing and Text Mining communities. The web constitutes a unique source of information to train and exploit systems for tasks such as Named Entity Identification and Classification, Term Identification, Relationship Extraction, Ontology Learning and Population from text, and Text Mining. The Semantic Web community can contribute by providing semantic formalisms and tools for knowledge representation and reasoning that exploit the extracted metadata. The goal of the workshop is to establish communication among all these communities.