The protein sequence design problem in canonical model on 2D and 3D lattices

Piotr Berman, Bhaskar DasGupta, Dhruv Mubayi, Robert Sloan, György Turán, Yi Zhang

    Research output: Chapter in Book/Report/Conference proceeding (Chapter)

    7 Scopus citations

    Abstract

    In this paper we investigate the protein sequence design (PSD) problem (also known as the inverse protein folding problem) under the Canonical model on 2D and 3D lattices [12, 25]. The Canonical model is specified by (i) a geometric representation of a target protein structure with amino acid residues via its contact graph, (ii) a binary folding code in which the amino acids are classified as hydrophobic (H) or polar (P), (iii) an energy function ψ, defined in terms of the target structure, that should favor sequences with a dense hydrophobic core and penalize those with many solvent-exposed hydrophobic residues (in the Canonical model, ψ gives an H-H residue contact in the contact graph a value of -1 and all other contacts a value of 0), and (iv) to prevent the solution from being a biologically meaningless all-H sequence, a limit on the number of H residues in the sequence S, imposed by fixing an upper bound A on the ratio between H and P amino acids. The sequence S is designed by specifying which residues are H and which are P in a way that realizes the global minimum of the energy function ψ. In this paper, we prove the following results: (1) An earlier proof of NP-completeness of finding the global energy minimum for the PSD problem on 3D lattices in [12] was based on the NP-completeness of the same problem on 2D lattices. However, that reduction was not correct, and we show that the global energy minimum for the PSD problem on 2D lattices can be found in polynomial time. We also show that finding the global energy minimum for the PSD problem on 3D lattices is indeed NP-complete, by providing a different reduction from the problem of finding the largest clique in a graph.
    (2) Even though the problem of finding the global energy minimum on 3D lattices is NP-complete, we show that an arbitrarily close approximation to the global energy minimum can be found efficiently, by taking appropriate combinations of optimal global energy minima of substrings of the sequence S; that is, we provide a polynomial-time approximation scheme (PTAS). Our algorithmic technique for designing this PTAS uses the shifted slice-and-dice approach in [6, 17, 18]. This result improves on the previous best polynomial-time approximation algorithm for finding the global energy minimum, given in [12], which has a performance ratio of 1/2.
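
    The energy function and the design objective described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it is an exponential-time exhaustive search, shown only to make the objective concrete. The function names, the edge-list encoding of the contact graph, and the reading of the ratio constraint as (#H) <= A * (#P) are all assumptions made for illustration.

```python
from itertools import combinations

def energy(contacts, h_set):
    """Canonical-model energy: -1 for each contact whose endpoints are both H."""
    return -sum(1 for u, v in contacts if u in h_set and v in h_set)

def ratio_ok(num_h, n, bound):
    # Assumed reading of the constraint: (#H) <= bound * (#P).
    return num_h <= bound * (n - num_h)

def brute_force_design(n, contacts, bound):
    """Try every H/P labeling of residues 0..n-1 (exponential time, toy sizes only)."""
    best_energy, best_h = 0, frozenset()  # the all-P sequence has energy 0
    for k in range(n + 1):
        if not ratio_ok(k, n, bound):
            continue
        for h in combinations(range(n), k):
            e = energy(contacts, set(h))
            if e < best_energy:
                best_energy, best_h = e, frozenset(h)
    sequence = "".join("H" if i in best_h else "P" for i in range(n))
    return best_energy, sequence

# Hypothetical toy contact graph: a 2 x 3 grid of residues
#   0 - 1 - 2
#   |   |   |
#   3 - 4 - 5
grid_contacts = [(0, 1), (1, 2), (3, 4), (4, 5), (0, 3), (1, 4), (2, 5)]
best_energy, sequence = brute_force_design(6, grid_contacts, bound=1)
print(best_energy, sequence)
```

    With bound 1 at most three of the six residues may be H, and since the grid has no triangles, three H residues can share at most two contacts, so the best achievable energy in this toy instance is -2.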

    Original language: English (US)
    Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Editors: Suleyman Cenk Sahinalp, S. Muthukrishnan, Ugur Dogrusoz
    Publisher: Springer Verlag
    Pages: 244-253
    Number of pages: 10
    ISBN (Print): 354022341X, 9783540223412
    DOI: https://doi.org/10.1007/978-3-540-27801-6_18
    State: Published - 2004

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 3109
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    All Science Journal Classification (ASJC) codes

    • Theoretical Computer Science
    • Computer Science (all)


  • Cite this

    Berman, P., DasGupta, B., Mubayi, D., Sloan, R., Turán, G., & Zhang, Y. (2004). The protein sequence design problem in canonical model on 2D and 3D lattices. In S. C. Sahinalp, S. Muthukrishnan, & U. Dogrusoz (Eds.), Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (pp. 244-253). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3109). Springer Verlag. https://doi.org/10.1007/978-3-540-27801-6_18