Clustering scientific literature using sparse citation graph analysis

Levent Bolelli, Seyda Ertekin, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

19 Scopus citations

Abstract

It is well known that connectivity analysis of linked documents provides significant information about the structure of the document space for unsupervised learning tasks. However, the ability to identify distinct clusters of documents based on link graph analysis is proportional to the density of the graph and depends on the availability of the linking and/or linked documents in the collection. In this paper, we present an information theoretic approach towards measuring the significance of individual words based on the underlying link structure of the document collection. This enables us to generate a non-uniform weight distribution of the feature space which is used to augment the original corpus-based document similarities. The experimental results on the collection of scientific literature show that our method achieves better separation of distinct groups of documents, yielding improved clustering solutions.
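The abstract does not state the weighting formula, so the following Python sketch is only an illustration of the general idea and is not the authors' method. It assumes a simple log-ratio score: a word gains weight when citation-linked document pairs share it more often than randomly chosen pairs would. The function names (`link_word_weights`, `weighted_cosine`), the `smoothing` parameter, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def link_word_weights(docs, links, smoothing=1e-3):
    """Assumed scoring rule (not taken from the paper).

    docs:  list of token sets, one per document
    links: list of (i, j) index pairs, one per known citation link
    Returns a dict mapping each word to a weight >= 1.0.
    """
    n = len(docs)
    # Document frequency of each word.
    df = Counter(w for d in docs for w in d)
    # Number of linked pairs in which both documents contain the word.
    shared = Counter()
    for i, j in links:
        shared.update(docs[i] & docs[j])

    weights = {}
    for w, count in df.items():
        # Chance that two independently drawn documents both contain w.
        p_rand = (count / n) ** 2
        # Observed fraction of linked pairs that share w.
        p_link = shared[w] / len(links) if links else 0.0
        ratio = (p_link + smoothing) / (p_rand + smoothing)
        # Words no more frequent across links than chance keep weight 1.0.
        weights[w] = 1.0 + max(0.0, math.log(ratio))
    return weights


def weighted_cosine(a, b, weights):
    """Cosine similarity of two token sets under per-word weights."""
    dot = sum(weights[w] ** 2 for w in a & b)
    norm_a = math.sqrt(sum(weights[w] ** 2 for w in a))
    norm_b = math.sqrt(sum(weights[w] ** 2 for w in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy corpus: two topics, with one known citation link inside each topic.
docs = [
    {"svm", "kernel", "paper"},
    {"svm", "margin", "paper"},
    {"graph", "citation", "paper"},
    {"graph", "link", "paper"},
]
links = [(0, 1), (2, 3)]

weights = link_word_weights(docs, links)
uniform = {w: 1.0 for w in weights}

same_topic = weighted_cosine(docs[0], docs[1], weights)
cross_topic = weighted_cosine(docs[0], docs[2], weights)
```

In this toy corpus the word "paper" occurs in every document, so it is shared across links no more often than chance and keeps the baseline weight, while "svm" and "graph" are up-weighted. Comparing `same_topic` and `cross_topic` against the same calls made with `uniform` shows the intended effect: similarity rises within a topic and falls across topics.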

Original language: English (US)
Title of host publication: Knowledge Discovery in Databases
Subtitle of host publication: PKDD 2006 - 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, Proceedings
Publisher: Springer Verlag
Pages: 30-41
Number of pages: 12
ISBN (Print): 3540453741, 9783540453741
State: Published - Jan 1 2006
Event: 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, PKDD 2006 - Berlin, Germany
Duration: Sep 18 2006 to Sep 22 2006

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4213 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, PKDD 2006
Country: Germany
City: Berlin
Period: 9/18/06 to 9/22/06

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Bolelli, L., Ertekin, S., & Giles, C. L. (2006). Clustering scientific literature using sparse citation graph analysis. In Knowledge Discovery in Databases: PKDD 2006 - 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, Proceedings (pp. 30-41). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4213 LNAI). Springer Verlag.