Controlling size when aligning multiple genomic sequences with duplications

Minmei Hou, Piotr Berman, Louxin Zhang, Webb Miller

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    7 Citations (Scopus)

    Abstract

    For a genomic region containing a tandem gene cluster, a proper set of alignments needs to align only orthologous segments, i.e., those separated by a speciation event. Otherwise, methods for finding regions under evolutionary selection will not perform properly. Conversely, the alignments should indicate every orthologous pair of genes or genomic segments. Attaining this goal in practice requires a technique for avoiding a combinatorial explosion in the number of local alignments. To better understand this process, we model it as a graph problem of finding a minimum cardinality set of cliques that contain all edges. We provide an upper bound for an important class of graphs (the problem is NP-hard and very difficult to approximate in the general case), and use the bound and computer simulations to evaluate two heuristic solutions. An implementation of one of them is evaluated on mammalian sequences from the α-globin gene cluster.
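    The graph problem described above, covering every edge of a graph with as few cliques as possible, is known as edge clique cover. The following is a minimal Python sketch of a generic greedy heuristic for it. It is not either of the two heuristics evaluated in the paper, which the abstract does not describe. It assumes, for illustration only, that vertices stand for genomic segments and edges for orthologous pairs; the function names and segment labels are hypothetical.

```python
from itertools import combinations


def greedy_edge_clique_cover(vertices, edges):
    """Cover every edge with cliques using a simple greedy rule.

    Hypothetical, generic heuristic for illustration only; it is NOT the
    method from the paper and carries no approximation guarantee.
    Assumes a simple graph (no self-loops) whose vertex labels are
    mutually comparable (e.g. all str or all int).
    """
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    uncovered = {frozenset(e) for e in edges}
    cover = []
    while uncovered:
        # Seed the next clique with the smallest uncovered edge (deterministic).
        u, v = min(sorted(e) for e in uncovered)
        clique = {u, v}
        # Vertices adjacent to every current clique member.
        candidates = adj[u] & adj[v]
        while candidates:
            # Prefer the candidate that covers the most still-uncovered edges.
            best = max(
                sorted(candidates),
                key=lambda w: sum(frozenset((w, x)) in uncovered for x in clique),
            )
            clique.add(best)
            candidates &= adj[best]  # keep only vertices also adjacent to best
        # Mark every edge inside the new clique as covered.
        for a, b in combinations(clique, 2):
            uncovered.discard(frozenset((a, b)))
        cover.append(sorted(clique))
    return cover


def is_edge_clique_cover(cover, edges):
    """Check that each set in cover is a clique and every edge is covered."""
    edge_set = {frozenset(e) for e in edges}
    covered = set()
    for clique in cover:
        for a, b in combinations(clique, 2):
            pair = frozenset((a, b))
            if pair not in edge_set:
                return False  # some pair inside this set is not an edge
            covered.add(pair)
    return covered == edge_set
```

    For example, greedy_edge_clique_cover(["H1", "M1", "R1"], [("H1", "M1"), ("H1", "R1"), ("M1", "R1")]) returns [["H1", "M1", "R1"]]: a single clique, which under the assumed model corresponds to one three-way alignment rather than three pairwise ones. Seeding from the smallest uncovered edge only makes the output reproducible; it does not improve the quality of the cover.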

    Original language: English (US)
    Title of host publication: Algorithms in Bioinformatics - 6th International Workshop, WABI 2006, Proceedings
    Publisher: Springer Verlag
    Pages: 138-149
    Number of pages: 12
    ISBN (Print): 3540395830, 9783540395836
    State: Published - Jan 1 2006
    Event: 6th International Workshop on Algorithms in Bioinformatics, WABI 2006 - Zurich, Switzerland
    Duration: Sep 11 2006 → Sep 13 2006

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 4175 LNBI
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Other

    Other: 6th International Workshop on Algorithms in Bioinformatics, WABI 2006
    Country: Switzerland
    City: Zurich
    Period: 9/11/06 → 9/13/06

    Fingerprint

    Duplication
    Genomics
    Alignment
    Genes
    Gene
    Speciation
    Graph in graph theory
    Clique
    Explosion
    Process Model
    Cardinality
    Computer Simulation
    NP-complete problem
    Heuristics
    Upper bound
    Evaluate

    All Science Journal Classification (ASJC) codes

    • Theoretical Computer Science
    • Computer Science(all)

    Cite this

    Hou, M., Berman, P., Zhang, L., & Miller, W. (2006). Controlling size when aligning multiple genomic sequences with duplications. In Algorithms in Bioinformatics - 6th International Workshop, WABI 2006, Proceedings (pp. 138-149). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4175 LNBI). Springer Verlag.
    Hou, Minmei ; Berman, Piotr ; Zhang, Louxin ; Miller, Webb. / Controlling size when aligning multiple genomic sequences with duplications. Algorithms in Bioinformatics - 6th International Workshop, WABI 2006, Proceedings. Springer Verlag, 2006. pp. 138-149 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
    @inproceedings{155d847231054364b2fed6f6b4e07138,
    title = "Controlling size when aligning multiple genomic sequences with duplications",
    abstract = "For a genomic region containing a tandem gene cluster, a proper set of alignments needs to align only orthologous segments, i.e., those separated by a speciation event. Otherwise, methods for finding regions under evolutionary selection will not perform properly. Conversely, the alignments should indicate every orthologous pair of genes or genomic segments. Attaining this goal in practice requires a technique for avoiding a combinatorial explosion in the number of local alignments. To better understand this process, we model it as a graph problem of finding a minimum cardinality set of cliques that contain all edges. We provide an upper bound for an important class of graphs (the problem is NP-hard and very difficult to approximate in the general case), and use the bound and computer simulations to evaluate two heuristic solutions. An implementation of one of them is evaluated on mammalian sequences from the α-globin gene cluster.",
    author = "Minmei Hou and Piotr Berman and Louxin Zhang and Webb Miller",
    year = "2006",
    month = "1",
    day = "1",
    language = "English (US)",
    isbn = "3540395830",
    series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
    publisher = "Springer Verlag",
    pages = "138--149",
    booktitle = "Algorithms in Bioinformatics - 6th International Workshop, WABI 2006, Proceedings",
    address = "Germany",

    }

    Hou, M, Berman, P, Zhang, L & Miller, W 2006, Controlling size when aligning multiple genomic sequences with duplications. in Algorithms in Bioinformatics - 6th International Workshop, WABI 2006, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 4175 LNBI, Springer Verlag, pp. 138-149, 6th International Workshop on Algorithms in Bioinformatics, WABI 2006, Zurich, Switzerland, 9/11/06.

    Controlling size when aligning multiple genomic sequences with duplications. / Hou, Minmei; Berman, Piotr; Zhang, Louxin; Miller, Webb.

    Algorithms in Bioinformatics - 6th International Workshop, WABI 2006, Proceedings. Springer Verlag, 2006. p. 138-149 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4175 LNBI).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    TY - GEN

    T1 - Controlling size when aligning multiple genomic sequences with duplications

    AU - Hou, Minmei

    AU - Berman, Piotr

    AU - Zhang, Louxin

    AU - Miller, Webb

    PY - 2006/1/1

    Y1 - 2006/1/1

    N2 - For a genomic region containing a tandem gene cluster, a proper set of alignments needs to align only orthologous segments, i.e., those separated by a speciation event. Otherwise, methods for finding regions under evolutionary selection will not perform properly. Conversely, the alignments should indicate every orthologous pair of genes or genomic segments. Attaining this goal in practice requires a technique for avoiding a combinatorial explosion in the number of local alignments. To better understand this process, we model it as a graph problem of finding a minimum cardinality set of cliques that contain all edges. We provide an upper bound for an important class of graphs (the problem is NP-hard and very difficult to approximate in the general case), and use the bound and computer simulations to evaluate two heuristic solutions. An implementation of one of them is evaluated on mammalian sequences from the α-globin gene cluster.

    AB - For a genomic region containing a tandem gene cluster, a proper set of alignments needs to align only orthologous segments, i.e., those separated by a speciation event. Otherwise, methods for finding regions under evolutionary selection will not perform properly. Conversely, the alignments should indicate every orthologous pair of genes or genomic segments. Attaining this goal in practice requires a technique for avoiding a combinatorial explosion in the number of local alignments. To better understand this process, we model it as a graph problem of finding a minimum cardinality set of cliques that contain all edges. We provide an upper bound for an important class of graphs (the problem is NP-hard and very difficult to approximate in the general case), and use the bound and computer simulations to evaluate two heuristic solutions. An implementation of one of them is evaluated on mammalian sequences from the α-globin gene cluster.

    UR - http://www.scopus.com/inward/record.url?scp=33750247337&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=33750247337&partnerID=8YFLogxK

    M3 - Conference contribution

    AN - SCOPUS:33750247337

    SN - 3540395830

    SN - 9783540395836

    T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

    SP - 138

    EP - 149

    BT - Algorithms in Bioinformatics - 6th International Workshop, WABI 2006, Proceedings

    PB - Springer Verlag

    ER -

    Hou M, Berman P, Zhang L, Miller W. Controlling size when aligning multiple genomic sequences with duplications. In Algorithms in Bioinformatics - 6th International Workshop, WABI 2006, Proceedings. Springer Verlag. 2006. p. 138-149. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).