Hybrid models for sense guessing of Chinese unknown words

Research output: Contribution to journal › Article

2 Scopus citations

Abstract

This paper addresses the problem of classifying Chinese unknown words into fine-grained semantic categories defined in a Chinese thesaurus, Cilin (Mei et al. 1984). We present three novel knowledge-based models that capture the relationship between the semantic categories of an unknown word and those of its component characters in three different ways, and combine two of them with a corpus-based model that uses contextual information to classify unknown words. Experiments show that the combined knowledge-based model outperforms previous methods on the same task, but the use of contextual information does not further improve performance.

Original language: English (US)
Pages (from-to): 99-128
Number of pages: 30
Journal: International Journal of Corpus Linguistics
Volume: 13
Issue number: 1
State: Published - Dec 1 2008

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Linguistics and Language
