KWTR: Machine learning for NLP

From semanticweb.org

Contributors:

Main Contributors:

Diana Maynard - University of Sheffield (USFD) (d.maynard@dcs.shef.ac.uk)

Yaoyong Li - University of Sheffield (USFD) (y.li@dcs.shef.ac.uk)

See the list of contributors


  • What is the state of the art of ML for IE?

ML has been widely used for IE. It has been applied successfully to named entity recognition and to its extension, information entity recognition (that is, recognising entities of pre-defined types). For example, given a reasonable number of instances of a particular entity type in the training data, learning algorithms such as SVM or Maximum Entropy can recognise 80-90% of the instances of that type in test documents, with a similar level of accuracy. However, ML is less successful on more complicated IE tasks such as relation extraction and event extraction: it achieves only moderate performance (50-60%) for relation extraction between named entities, and lower still for event extraction. ML has also been used for ontology-based IE (OBIE). Some learning algorithms have been proposed for OBIE that take into account the relations between concepts in the ontology, and they obtain better results than those that ignore these relations.
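Figures such as those above are usually reported as entity-level precision, recall and F1. The following is a minimal sketch of that computation, assuming that "recognise 80-90% of the instances" refers to recall and "accuracy" to precision, and assuming exact match on span and type. The function name `prf` and the toy annotations are hypothetical and serve only as illustration.

```python
def prf(gold, predicted):
    """Entity-level precision, recall and F1 using exact span-and-type match."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy data (illustrative only): (start offset, end offset, entity type).
gold = [(0, 5, "Person"), (10, 19, "Organization"), (25, 31, "Location"),
        (40, 46, "Person"), (50, 58, "Date")]
predicted = [(0, 5, "Person"), (10, 19, "Organization"), (25, 31, "Location"),
             (40, 46, "Person"), (60, 66, "Date")]

precision, recall, f1 = prf(gold, predicted)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Here four of the five predicted entities match the gold standard, and four of the five gold entities are found, so precision and recall are both 0.8.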


  • Provide references and short abstracts of three papers you consider significant in your research field.

1. ML for information entity recognition: Y. Li, K. Bontcheva, and H. Cunningham. SVM Based Learning System For Information Extraction. In J. Winkler, M. Niranjan and N. Lawrence (Eds.): Deterministic and Statistical Methods in Machine Learning, LNAI 3635, Springer Verlag, pp. 319-339. 2005

This paper presents an SVM-based learning system for information extraction (IE). One distinctive feature of our system is the use of a variant of the SVM, the SVM with uneven margins, which is particularly helpful for small training datasets. In addition, the approach needs fewer SVM classifiers to be trained than other recent SVM-based systems.
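The abstract does not spell out the uneven-margins formulation. As a minimal sketch, assuming the commonly described variant in which positive examples keep a margin of 1 while negative examples only need a margin of tau (with tau = 1 giving the standard SVM), the per-example hinge loss could look as follows. The function name and the default value of `tau` are hypothetical.

```python
def uneven_margin_hinge(score, label, tau=0.4):
    """Hinge loss with margin 1 for positives and margin tau for negatives.

    score: classifier output w.x + b; label: +1 or -1.
    tau is the assumed ratio of the negative margin to the positive margin;
    tau = 1 recovers the standard SVM hinge loss.
    """
    if label == 1:
        return max(0.0, 1.0 - score)
    return max(0.0, tau + score)


# A negative example scored at -0.5 is penalised by the standard loss
# (it lies inside the margin of 1) but not by the uneven-margin loss.
standard = uneven_margin_hinge(-0.5, -1, tau=1.0)
uneven = uneven_margin_hinge(-0.5, -1, tau=0.4)
print(standard, uneven)
```

With tau below 1 the classifier demands less separation from the negative examples than from the positive ones, which is one way to read the claim that the variant helps when training data are scarce.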

2. ML for relation extraction

Zhou, G., Su, J., Zhang, J., Zhang, M. Combining Various Knowledge in Relation Extraction. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (2005)

This paper investigates the incorporation of diverse lexical, syntactic and semantic knowledge in feature-based relation extraction using SVM. Our study illustrates that base phrase chunking information is very effective for relation extraction and contributes most of the performance improvement from the syntactic aspect, while additional information from full parsing gives limited further enhancement. This suggests that most of the useful information in full parse trees for relation extraction is shallow and can be captured by chunking.
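To make the idea of feature-based relation extraction with chunking information concrete, here is a minimal sketch that builds lexical, entity-type and base-phrase-chunk features for a pair of entity mentions. It is not the feature set of the paper: the feature names, the data layout and the choice of the last token as the mention head are all assumptions made for illustration.

```python
def relation_features(tokens, chunks, m1, m2):
    """Build a flat binary feature dict for an ordered pair of mentions.

    tokens: list of words.
    chunks: list of (start, end, label, head_index), end exclusive.
    m1, m2: (start, end, entity_type) token spans, with m1 before m2.
    """
    feats = {}
    # Lexical and entity-type features (head = last token, an assumption).
    feats["m1_head=" + tokens[m1[1] - 1].lower()] = 1
    feats["m2_head=" + tokens[m2[1] - 1].lower()] = 1
    feats["types=" + m1[2] + "-" + m2[2]] = 1
    for word in tokens[m1[1]:m2[0]]:
        feats["between_word=" + word.lower()] = 1
    # Shallow syntactic features: chunks lying entirely between the mentions.
    between = [c for c in chunks if c[0] >= m1[1] and c[1] <= m2[0]]
    feats["chunk_path=" + "-".join(c[2] for c in between)] = 1
    for c in between:
        feats["between_chunk_head=" + tokens[c[3]].lower()] = 1
    return feats


tokens = ["Alice", "Smith", "works", "at", "Acme", "Corp"]
chunks = [(0, 2, "NP", 1), (2, 3, "VP", 2), (3, 4, "PP", 3), (4, 6, "NP", 5)]
feats = relation_features(tokens, chunks,
                          (0, 2, "PERSON"), (4, 6, "ORGANIZATION"))
for name in sorted(feats):
    print(name)
```

The resulting dictionary can be turned into a sparse vector and passed to any linear classifier such as an SVM.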


3. Multi-class learning algorithm. I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support Vector Machine Learning for Interdependent and Structured Output Spaces. In Proceedings of the 21st International Conference on Machine Learning, Banff, Canada, 2004.

Learning general functional dependencies is one of the main goals in machine learning. This paper addresses the issue of problems involving complex outputs such as multiple dependent output variables and structured output spaces. The aim is to generalize multiclass Support Vector Machine learning in a formulation that involves features extracted jointly from inputs and outputs. The resulting optimization problem is solved efficiently by a cutting plane algorithm that exploits the sparseness and structural decomposition of the problem.
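The phrase "features extracted jointly from inputs and outputs" can be illustrated with the simplest case, plain multiclass classification. The sketch below assumes a joint feature map that copies the input vector into the block belonging to the candidate class, predicts by taking the highest-scoring class, and computes the slack of the margin constraints under a 0/1 loss. It does not implement the cutting plane training procedure, and all function names and the weight values are hypothetical.

```python
def joint_features(x, y, num_classes):
    """Psi(x, y): copy x into the block for class y, zeros elsewhere."""
    psi = [0.0] * (len(x) * num_classes)
    start = y * len(x)
    psi[start:start + len(x)] = x
    return psi


def score(w, x, y, num_classes):
    """Inner product <w, Psi(x, y)>."""
    return sum(wi * pi for wi, pi in zip(w, joint_features(x, y, num_classes)))


def predict(w, x, num_classes):
    """Return the class y that maximises <w, Psi(x, y)>."""
    return max(range(num_classes), key=lambda y: score(w, x, y, num_classes))


def slack(w, x, y_true, num_classes):
    """Largest violation of score(y_true) - score(y) >= 1 over y != y_true."""
    s_true = score(w, x, y_true, num_classes)
    worst = max(1.0 - (s_true - score(w, x, y, num_classes))
                for y in range(num_classes) if y != y_true)
    return max(0.0, worst)


x = [1.0, 2.0]
w = [0.5, 0.0, 0.0, 1.0, -1.0, 0.0]  # illustrative weights, three classes
print(predict(w, x, 3), slack(w, x, 1, 3), slack(w, x, 0, 3))
```

For structured outputs such as label sequences, the same pattern applies with a richer joint feature map and a search procedure in place of the loop over classes.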


  • Please provide one or more examples (either business, or research, or both) in which ML for IE has been used (if you can, add some references).

....

  • Are there existing tools or demos? Please indicate some of them.

The machine learning plug-in of GATE provides facilities for applying ML to IE. It converts the linguistic features produced by other GATE components and/or plug-ins into a form suitable for the learning algorithms. It implements state-of-the-art learning algorithms such as SVM for IE tasks, including information entity recognition and relation extraction.
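The conversion step described above can be pictured as mapping each token's linguistic features to a sparse numeric vector. The following minimal sketch is not GATE's API or its file format; the feature names (`string`, `pos`, `orth`), the function names and the "label id:value" output layout (in the style of the sparse text format used by tools such as SVMlight) are assumptions chosen for illustration.

```python
def build_index(instances):
    """Assign a 1-based integer id to every distinct 'name=value' feature."""
    index = {}
    for feats in instances:
        for name, value in sorted(feats.items()):
            key = f"{name}={value}"
            if key not in index:
                index[key] = len(index) + 1
    return index


def to_sparse_line(label, feats, index):
    """Format one instance as 'label id:1 id:1 ...' with ids ascending."""
    keys = (f"{name}={value}" for name, value in feats.items())
    ids = sorted(index[k] for k in keys if k in index)
    return " ".join([str(label)] + [f"{i}:1" for i in ids])


# Hypothetical token features, as a tokeniser and POS tagger might produce.
tokens = [
    {"string": "Sheffield", "pos": "NNP", "orth": "upperInitial"},
    {"string": "is", "pos": "VBZ", "orth": "lowercase"},
]
labels = ["+1", "-1"]  # +1: token is part of a Location entity

index = build_index(tokens)
lines = [to_sparse_line(lab, tok, index) for lab, tok in zip(labels, tokens)]
print("\n".join(lines))
```

Features unseen when the index was built are silently dropped at conversion time, which is the usual behaviour when applying a trained model to new documents.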


  • What are the open problems in your research field? Why?

One important problem in applying ML to IE is to improve performance on complicated IE tasks such as relation extraction and event extraction, because current performance is not sufficient for practical applications. One possible solution is to discover suitable linguistic features for those tasks. Morphological and grammatical features are sufficient for information entity recognition but do not seem to be the most useful features for relation extraction. Another possible solution is to carry out a general semantic analysis first (e.g. by using ML) and then extract the relations and events from the results of that analysis, because relation and event extraction depends on the semantic structure of the text rather than on the morphological or grammatical features of words.

  • Provide references and links of the most relevant Semantic Web research projects in your field.

....

  • What challenges do these projects try to overcome?

....

  • What are their foreseen benefits (both in market and scientific community)?

....

  • When, in your opinion, will projects’ results be ready for industry?

....

  • Do you think that it is important to invest (money and time) in these topics? Why?

Yes, because high-performance automatic information extraction techniques would greatly improve information retrieval and knowledge management, which are so important in the present digital society.



  • What are, in your opinion, the most relevant Semantic Web challenges that will be solved in the long term (10 years)? Why?

....
