Hybrid models for semantic classification of Chinese unknown words

Research output: Contribution to conferencePaper

8 Scopus citations

Abstract

This paper addresses the problem of classifying Chinese unknown words into fine-grained semantic categories defined in a Chinese thesaurus. We describe three novel knowledge-based models that capture the relationship between the semantic categories of an unknown word and those of its component characters in three different ways. We then combine two of the knowledge-based models with a corpus-based model which classifies unknown words using contextual information. Experiments show that the knowledge-based models outperform previous methods on the same task, but the use of contextual information does not further improve performance.

Original languageEnglish (US)
Pages188-195
Number of pages8
StatePublished - Dec 1 2007
EventHuman Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, NAACL HLT 2007 - Rochester, NY, United States
Duration: Apr 22 2007Apr 27 2007

Other

OtherHuman Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, NAACL HLT 2007
CountryUnited States
CityRochester, NY
Period4/22/074/27/07

    Fingerprint

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Linguistics and Language

Cite this

Lu, X. (2007). Hybrid models for semantic classification of Chinese unknown words. 188-195. Paper presented at Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, NAACL HLT 2007, Rochester, NY, United States.