On constructing an optimal consensus clustering from multiple clusterings

Piotr Berman, Bhaskar DasGupta, Ming Yang Kao, Jie Wang

    Research output: Contribution to journalArticle

    9 Citations (Scopus)

    Abstract

    Computing a suitable measure of consensus among several clusterings on the same data is an important problem that arises in several areas such as computational biology and data mining. In this paper, we formalize a set-theoretic model for computing such a similarity measure. Roughly speaking, in this model we have k > 1 partitions (clusters) of the same data set each containing the same number of sets and the goal is to align the sets in each partition to minimize a similarity measure. For k = 2, a polynomial-time solution was proposed by Gusfield (Information Processing Letters 82 (2002) 159-164). In this paper, we show that the problem is MAX-SNP-hard for k = 3 even if each partition in each cluster contains no more than 2 elements and provide a 2 - frac(2, k)-approximation algorithm for the problem for any k.

    Original languageEnglish (US)
    Pages (from-to)137-145
    Number of pages9
    JournalInformation Processing Letters
    Volume104
    Issue number4
    DOIs
    StatePublished - Nov 15 2007

    Fingerprint

    Clustering
    Partition
    Approximation algorithms
    Similarity Measure
    Data mining
    Polynomials
    Computing
    Computational Biology
    Information Processing
    Approximation Algorithms
    Data Mining
    Polynomial time
    Minimise
    Model

    All Science Journal Classification (ASJC) codes

    • Theoretical Computer Science
    • Signal Processing
    • Information Systems
    • Computer Science Applications

    Cite this

    Berman, Piotr ; DasGupta, Bhaskar ; Kao, Ming Yang ; Wang, Jie. / On constructing an optimal consensus clustering from multiple clusterings. In: Information Processing Letters. 2007 ; Vol. 104, No. 4. pp. 137-145.
    @article{7c1a17ff823b4640bd34580c00b742c2,
    title = "On constructing an optimal consensus clustering from multiple clusterings",
    abstract = "Computing a suitable measure of consensus among several clusterings on the same data is an important problem that arises in several areas such as computational biology and data mining. In this paper, we formalize a set-theoretic model for computing such a similarity measure. Roughly speaking, in this model we have k > 1 partitions (clusters) of the same data set each containing the same number of sets and the goal is to align the sets in each partition to minimize a similarity measure. For k = 2, a polynomial-time solution was proposed by Gusfield (Information Processing Letters 82 (2002) 159-164). In this paper, we show that the problem is MAX-SNP-hard for k = 3 even if each partition in each cluster contains no more than 2 elements and provide a 2 - frac(2, k)-approximation algorithm for the problem for any k.",
    author = "Piotr Berman and Bhaskar DasGupta and Kao, {Ming Yang} and Jie Wang",
    year = "2007",
    month = "11",
    day = "15",
    doi = "10.1016/j.ipl.2007.06.008",
    language = "English (US)",
    volume = "104",
    pages = "137--145",
    journal = "Information Processing Letters",
    issn = "0020-0190",
    publisher = "Elsevier",
    number = "4",

    }

    On constructing an optimal consensus clustering from multiple clusterings. / Berman, Piotr; DasGupta, Bhaskar; Kao, Ming Yang; Wang, Jie.

    In: Information Processing Letters, Vol. 104, No. 4, 15.11.2007, p. 137-145.

    Research output: Contribution to journalArticle

    TY - JOUR

    T1 - On constructing an optimal consensus clustering from multiple clusterings

    AU - Berman, Piotr

    AU - DasGupta, Bhaskar

    AU - Kao, Ming Yang

    AU - Wang, Jie

    PY - 2007/11/15

    Y1 - 2007/11/15

    N2 - Computing a suitable measure of consensus among several clusterings on the same data is an important problem that arises in several areas such as computational biology and data mining. In this paper, we formalize a set-theoretic model for computing such a similarity measure. Roughly speaking, in this model we have k > 1 partitions (clusters) of the same data set each containing the same number of sets and the goal is to align the sets in each partition to minimize a similarity measure. For k = 2, a polynomial-time solution was proposed by Gusfield (Information Processing Letters 82 (2002) 159-164). In this paper, we show that the problem is MAX-SNP-hard for k = 3 even if each partition in each cluster contains no more than 2 elements and provide a 2 - frac(2, k)-approximation algorithm for the problem for any k.

    AB - Computing a suitable measure of consensus among several clusterings on the same data is an important problem that arises in several areas such as computational biology and data mining. In this paper, we formalize a set-theoretic model for computing such a similarity measure. Roughly speaking, in this model we have k > 1 partitions (clusters) of the same data set each containing the same number of sets and the goal is to align the sets in each partition to minimize a similarity measure. For k = 2, a polynomial-time solution was proposed by Gusfield (Information Processing Letters 82 (2002) 159-164). In this paper, we show that the problem is MAX-SNP-hard for k = 3 even if each partition in each cluster contains no more than 2 elements and provide a 2 - frac(2, k)-approximation algorithm for the problem for any k.

    UR - http://www.scopus.com/inward/record.url?scp=34548032601&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=34548032601&partnerID=8YFLogxK

    U2 - 10.1016/j.ipl.2007.06.008

    DO - 10.1016/j.ipl.2007.06.008

    M3 - Article

    VL - 104

    SP - 137

    EP - 145

    JO - Information Processing Letters

    JF - Information Processing Letters

    SN - 0020-0190

    IS - 4

    ER -