Harmony K-means algorithm for document clustering

Mehrdad Mahdavi, Hassan Abolhassani

Research output: Contribution to journal (Article)

90 Scopus citations

Abstract

Fast, high-quality document clustering is crucial for organizing information and search engine results, enhancing web crawling, and information retrieval or filtering. Recent studies have shown that the most commonly used partition-based clustering algorithm, the K-means algorithm, is more suitable for large datasets. However, the K-means algorithm can converge to a locally optimal solution. In this paper we propose a novel Harmony K-means Algorithm (HKA) for document clustering based on the Harmony Search (HS) optimization method. It is proved by means of finite Markov chain theory that the HKA converges to the global optimum. To demonstrate the effectiveness and speed of HKA, we apply it to several standard datasets. We also compare the HKA with other meta-heuristic and model-based document clustering approaches. Experimental results reveal that the HKA converges to the best known optimum faster than the other methods, with comparable cluster quality.
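The abstract does not specify how HKA encodes solutions, which fitness function it uses, or its parameter settings, so the following is only a minimal sketch of how a generic Harmony Search loop might be combined with a K-means refinement step. It is not the authors' algorithm. The label-vector encoding, the squared-Euclidean cost, the discrete pitch adjustment, and all names and default values (`harmony_kmeans`, `hms`, `hmcr`, `par`, `iters`) are assumptions made for illustration; `hms`, `hmcr`, and `par` stand for the usual Harmony Search parameters (harmony memory size, harmony memory considering rate, pitch adjusting rate).

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroids_of(points, labels, k, rng):
    """Mean vector of each cluster; an empty cluster falls back to a random point."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, labels):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += p[d]
    return [
        [s / counts[c] for s in sums[c]] if counts[c] else list(rng.choice(points))
        for c in range(k)
    ]

def cost(points, labels, k, rng):
    """Within-cluster sum of squared distances (assumed fitness; lower is better)."""
    cents = centroids_of(points, labels, k, rng)
    return sum(sq_dist(p, cents[c]) for p, c in zip(points, labels))

def kmeans_step(points, labels, k, rng):
    """One K-means pass: recompute centroids, then reassign each point to the nearest."""
    cents = centroids_of(points, labels, k, rng)
    return [min(range(k), key=lambda c: sq_dist(p, cents[c])) for p in points]

def harmony_kmeans(points, k, hms=8, hmcr=0.9, par=0.3, iters=200, seed=0):
    """Hypothetical hybrid: Harmony Search over label vectors plus a K-means step."""
    rng = random.Random(seed)
    n = len(points)
    # Harmony memory: hms candidate solutions, each a cluster label per point.
    memory = [[rng.randrange(k) for _ in range(n)] for _ in range(hms)]
    costs = [cost(points, h, k, rng) for h in memory]
    for _ in range(iters):
        new = []
        for i in range(n):
            if rng.random() < hmcr:
                # Memory consideration: reuse the label stored by a random harmony.
                label = memory[rng.randrange(hms)][i]
                if rng.random() < par:
                    # Assumed discrete stand-in for pitch adjustment: re-draw the label.
                    label = rng.randrange(k)
            else:
                # Random selection from the full label range.
                label = rng.randrange(k)
            new.append(label)
        new = kmeans_step(points, new, k, rng)  # local refinement of the new harmony
        new_cost = cost(points, new, k, rng)
        worst = max(range(hms), key=costs.__getitem__)
        if new_cost < costs[worst]:
            # Replace the worst harmony when the improvised one is better.
            memory[worst], costs[worst] = new, new_cost
    best = min(range(hms), key=costs.__getitem__)
    return memory[best], costs[best]
```

Calling `harmony_kmeans(points, k)` on a list of equal-length numeric vectors (for documents, these could be term-weight vectors) returns the best label vector found and its cost. The sketch makes no claim about convergence; the global-convergence result stated in the abstract applies to the authors' HKA, not to this illustration.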

Original language: English (US)
Pages (from-to): 370-391
Number of pages: 22
Journal: Data Mining and Knowledge Discovery
Volume: 18
Issue number: 3
State: Published - Jun 1 2009

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Computer Science Applications
  • Computer Networks and Communications
