Hierarchical co-clustering based on entropy splitting

Wei Cheng, Xiang Zhang, Feng Pan, Wei Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Two-dimensional contingency tables, or co-occurrence matrices, arise frequently in important applications such as text analysis and web-log mining. As a fundamental research topic, co-clustering aims to generate a meaningful partition of the contingency table that reveals hidden relationships between rows and columns. Traditional co-clustering algorithms usually produce flat partitions of both rows and columns into a predefined number of clusters, and such partitions do not reveal relationships among the clusters. To address this limitation, hierarchical co-clustering algorithms have recently attracted considerable research interest. Although successful in various applications, existing hierarchical co-clustering algorithms are usually based on heuristics and lack a solid theoretical foundation. In this paper, we present a new co-clustering algorithm with a solid information-theoretic foundation. It simultaneously constructs a hierarchy of both row and column clusters that retains sufficient mutual information between the rows and columns of the contingency table. We develop an efficient and effective greedy algorithm that grows the co-cluster hierarchy by successively performing the row-wise or column-wise split that yields the maximal mutual information gain. Extensive experiments on real datasets demonstrate that our algorithm can reveal essential relationships among row (and column) clusters and achieves higher clustering precision than existing algorithms.
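The abstract states the splitting rule only at a high level. As a rough illustration, the following Python/NumPy sketch grows row and column hierarchies greedily: at each step it applies whichever row-wise or column-wise bisection yields the largest mutual-information gain, and it stops when no split adds information. It is not the authors' implementation, and several choices are assumptions made for brevity: the function names `cluster_mi`, `bisect` and `grow_hierarchy` are hypothetical, each bisection is found by a simple local search over single-member moves, and each split is scored against the unclustered opposite axis (I(R̂; Y) for row splits, I(X; Ĉ) for column splits) rather than against the current clusters of the other axis. That last simplification sidesteps the fact that, at the root, I(R̂; Ĉ) stays at zero for any one-sided split while the other axis is still a single cluster.

```python
import numpy as np


def cluster_mi(table, labels, axis):
    """Mutual information (nats) between clusters on `axis` and the raw other axis.

    axis=0 gives I(R_hat; Y) for row clusters; axis=1 gives I(X; C_hat).
    """
    p = table / table.sum()                      # joint distribution p(x, y)
    if axis == 1:
        p = p.T                                  # put the clustered axis first
    onehot = np.eye(labels.max() + 1)[labels]    # member-by-cluster indicator matrix
    q = onehot.T @ p                             # q[k, j]: mass of cluster k with item j
    pk = q.sum(axis=1, keepdims=True)            # cluster marginals
    pj = q.sum(axis=0, keepdims=True)            # marginals of the unclustered axis
    nz = q > 0                                   # 0 * log 0 is taken as 0
    return float(np.sum(q[nz] * np.log(q[nz] / (pk * pj)[nz])))


def bisect(table, labels, target, axis, tol=1e-12):
    """Split cluster `target` in two with single-member moves that raise MI."""
    members = np.flatnonzero(labels == target)
    if len(members) < 2:
        return None, -np.inf                     # singletons cannot be split
    new = labels.max() + 1                       # id for the new child cluster
    trial = labels.copy()
    trial[members[len(members) // 2:]] = new     # arbitrary starting halves
    best = cluster_mi(table, trial, axis)
    improved = True
    while improved:                              # repeat until no move helps
        improved = False
        for i in members:
            moved = trial.copy()
            moved[i] = target if trial[i] == new else new
            side = moved[members]
            if (side == target).all() or (side == new).all():
                continue                         # keep both halves non-empty
            score = cluster_mi(table, moved, axis)
            if score > best + tol:
                trial, best, improved = moved, score, True
    return trial, best


def grow_hierarchy(table, max_splits, tol=1e-12):
    """Greedily apply the row or column bisection with the largest MI gain."""
    labels = [np.zeros(table.shape[0], dtype=int),
              np.zeros(table.shape[1], dtype=int)]
    splits = []                                  # (axis name, parent id, gain)
    for _ in range(max_splits):
        best = None
        for axis in (0, 1):
            base = cluster_mi(table, labels[axis], axis)
            for c in range(labels[axis].max() + 1):
                cand, score = bisect(table, labels[axis], c, axis)
                if cand is not None and (best is None or score - base > best[0]):
                    best = (score - base, axis, c, cand)
        if best is None or best[0] <= tol:
            break                                # no split adds information
        gain, axis, c, cand = best
        labels[axis] = cand                      # parent c keeps its id; child gets max + 1
        splits.append(("row" if axis == 0 else "col", c, gain))
    return labels[0], labels[1], splits
```

On the block-diagonal table `np.array([[4, 4, 0, 0], [4, 4, 0, 0], [0, 0, 4, 4], [0, 0, 4, 4]], dtype=float)`, this sketch performs one row split and one column split, each gaining ln 2 nats and producing labels `[0, 0, 1, 1]` on both axes, and then stops because splitting identical rows or columns adds no information. Because each recorded split names its parent cluster, the `splits` list doubles as the edge list of the row and column hierarchies.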

Original language: English (US)
Title of host publication: CIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management
Pages: 1472-1476
Number of pages: 5
DOI: https://doi.org/10.1145/2396761.2398455
State: Published - Dec 19 2012
Event: 21st ACM International Conference on Information and Knowledge Management, CIKM 2012 - Maui, HI, United States
Duration: Oct 29 2012 to Nov 2 2012

Publication series

Name: ACM International Conference Proceeding Series

Other

Other: 21st ACM International Conference on Information and Knowledge Management, CIKM 2012
Country: United States
City: Maui, HI
Period: 10/29/12 to 11/2/12

Fingerprint

Clustering algorithms
Entropy
Experiments

All Science Journal Classification (ASJC) codes

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

Cite this

Cheng, W., Zhang, X., Pan, F., & Wang, W. (2012). Hierarchical co-clustering based on entropy splitting. In CIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management (pp. 1472-1476). (ACM International Conference Proceeding Series). https://doi.org/10.1145/2396761.2398455
@inproceedings{ab1346f901fb46069af2ffaf6ee50359,
title = "Hierarchical co-clustering based on entropy splitting",
author = "Wei Cheng and Xiang Zhang and Feng Pan and Wei Wang",
year = "2012",
month = "12",
day = "19",
doi = "10.1145/2396761.2398455",
language = "English (US)",
isbn = "9781450311564",
series = "ACM International Conference Proceeding Series",
pages = "1472--1476",
booktitle = "CIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management",

}
