Rule-based Word Clustering for Text Classification

Hui Han, Eren Manavoglu, C. Lee Giles, Hongyuan Zha

Research output: Contribution to journal › Conference article

6 Citations (Scopus)

Abstract

This paper introduces a rule-based, context-dependent word clustering method, with the rules derived from various domain databases and the word text orthographic properties. Besides significant dimensionality reduction, our experiments show that such rule-based word clustering improves by 8% the overall accuracy of extracting bibliographic fields from references, and by 18.32% on average the class-specific performance on the line classification of document headers.
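As a rough illustration of the idea in the abstract, the sketch below maps each word to a cluster label using two kinds of rules: lookups in small word lists (standing in for the domain databases) and orthographic tests such as capitalization or digit patterns. The word lists, the label names (`:month:`, `:affi:`, `:year:`, `:digit:`, `:initial:`, `:cap:`), and the functions `cluster_word` and `cluster_line` are all hypothetical and are not taken from the paper. The sketch also ignores context, whereas the paper's method is context-dependent.

```python
import re

# Hypothetical stand-ins for the "domain databases" mentioned in the abstract.
MONTHS = {
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
}
AFFILIATION_WORDS = {"university", "institute", "college", "laboratory"}


def cluster_word(word: str) -> str:
    """Map a word to a cluster label, or to its lowercased form if no rule fires."""
    token = word.strip(".,;:()")
    lower = token.lower()

    # Database-style rules: membership in a known word list.
    # Note: "may" is ambiguous (month or verb); resolving it would need context.
    if lower in MONTHS:
        return ":month:"
    if lower in AFFILIATION_WORDS:
        return ":affi:"

    # Orthographic rules: decided from the shape of the token alone.
    if re.fullmatch(r"(19|20)\d{2}", token):
        return ":year:"
    if re.fullmatch(r"\d+", token):
        return ":digit:"
    if re.fullmatch(r"[A-Z]", token):
        return ":initial:"
    if token[:1].isupper():
        return ":cap:"

    return lower


def cluster_line(line: str) -> list[str]:
    """Replace every whitespace-separated word in a line with its cluster label."""
    return [cluster_word(w) for w in line.split()]


if __name__ == "__main__":
    print(cluster_line("C. Lee Giles, December 2003"))
```

Collapsing many distinct surface words into a handful of labels is what shrinks the feature space; a downstream classifier would then see the labels rather than the raw words.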

Original language: English (US)
Pages (from-to): 445-446
Number of pages: 2
Journal: SIGIR Forum (ACM Special Interest Group on Information Retrieval)
Issue number: SPEC. ISS.
State: Published - Dec 1 2003
Event: Proceedings of the Twenty-Sixth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2003 - Toronto, Ont., Canada
Duration: Jul 28 2003 - Aug 1 2003

Fingerprint

Experiments
Text classification
Clustering
Rule-based
Dimensionality reduction
Data base
Experiment

All Science Journal Classification (ASJC) codes

  • Management Information Systems
  • Hardware and Architecture

Cite this

Han, Hui ; Manavoglu, Eren ; Giles, C. Lee ; Zha, Hongyuan. / Rule-based Word Clustering for Text Classification. In: SIGIR Forum (ACM Special Interest Group on Information Retrieval). 2003 ; No. SPEC. ISS. pp. 445-446.
@article{9809f2ab738f49a0bbc24f824dbc2468,
title = "Rule-based Word Clustering for Text Classification",
abstract = "This paper introduces a rule-based, context-dependent word clustering method, with the rules derived from various domain databases and the word text orthographic properties. Besides significant dimensionality reduction, our experiments show that such rule-based word clustering improves by 8{\%} the overall accuracy of extracting bibliographic fields from references, and by 18.32{\%} on average the class-specific performance on the line classification of document headers.",
author = "Hui Han and Eren Manavoglu and Giles, {C. Lee} and Hongyuan Zha",
year = "2003",
month = "12",
day = "1",
language = "English (US)",
pages = "445--446",
journal = "SIGIR Forum (ACM Special Interest Group on Information Retrieval)",
issn = "0163-5840",
publisher = "Association for Computing Machinery (ACM)",
number = "SPEC. ISS.",
}

TY - JOUR
T1 - Rule-based Word Clustering for Text Classification
AU - Han, Hui
AU - Manavoglu, Eren
AU - Giles, C. Lee
AU - Zha, Hongyuan
PY - 2003/12/1
Y1 - 2003/12/1
AB - This paper introduces a rule-based, context-dependent word clustering method, with the rules derived from various domain databases and the word text orthographic properties. Besides significant dimensionality reduction, our experiments show that such rule-based word clustering improves by 8% the overall accuracy of extracting bibliographic fields from references, and by 18.32% on average the class-specific performance on the line classification of document headers.
UR - http://www.scopus.com/inward/record.url?scp=1542317626&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=1542317626&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:1542317626
SP - 445
EP - 446
JO - SIGIR Forum (ACM Special Interest Group on Information Retrieval)
JF - SIGIR Forum (ACM Special Interest Group on Information Retrieval)
SN - 0163-5840
IS - SPEC. ISS.
ER -