### Abstract

In this paper we investigate the protein sequence design (PSD) problem (also known as the inverse protein folding problem) under the Canonical model on 2D and 3D lattices [12, 25]. The Canonical model is specified by (i) a geometric representation of a target protein structure with amino acid residues via its contact graph; (ii) a binary folding code in which each amino acid is classified as hydrophobic (H) or polar (P); (iii) an energy function ψ, defined in terms of the target structure, that should favor sequences with a dense hydrophobic core and penalize those with many solvent-exposed hydrophobic residues (in the Canonical model, ψ assigns a value of -1 to each H-H residue contact in the contact graph and 0 to all other contacts); and (iv) an upper bound A on the ratio between H and P amino acids, which limits the number of H residues in the sequence S and so prevents the solution from being a biologically meaningless all-H sequence. The sequence S is designed by specifying which residues are H and which are P so as to attain the global minimum of the energy function ψ. In this paper, we prove the following results.

(1) An earlier proof in [12] that finding the global energy minima for the PSD problem on 3D lattices is NP-complete was based on the NP-completeness of the same problem on 2D lattices. However, that reduction was not correct: we show that finding the global energy minima for the PSD problem on 2D lattices can be done in polynomial time. Nevertheless, we show that the problem on 3D lattices is indeed NP-complete, via a different reduction from the problem of finding the largest clique in a graph.
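To make the energy function ψ and its link to cliques concrete, here is a minimal Python sketch. It assumes the contact graph is given as a plain list of residue pairs (a simple graph, no duplicate contacts) and models the bound on H residues as a hypothetical absolute cap `max_h` instead of the ratio A; the function names `energy`, `optimal_design` and `has_k_clique` are illustrative and do not come from the paper. The search is exhaustive, so it takes exponential time, and it ignores the lattice embedding entirely, so it is not the paper's 3D reduction. It only shows why clique is a natural source problem: with at most k H residues on a nonempty contact graph, the minimum energy equals -k(k-1)/2 exactly when the graph contains a k-clique.

```python
from itertools import combinations


def energy(h_set, contacts):
    """Canonical-model energy psi: -1 for every H-H contact, 0 for every other contact."""
    return -sum(1 for u, v in contacts if u in h_set and v in h_set)


def optimal_design(residues, contacts, max_h):
    """Exhaustively try every H/P assignment with at most max_h H residues.

    max_h is a hypothetical absolute cap standing in for the ratio bound A.
    Returns (H set, minimum energy). Exponential time: illustration only.
    """
    best_set, best_energy = frozenset(), 0  # the all-P sequence has energy 0
    for h in range(min(max_h, len(residues)) + 1):
        for chosen in combinations(residues, h):
            e = energy(set(chosen), contacts)
            if e < best_energy:  # keep the first assignment reaching a new minimum
                best_set, best_energy = frozenset(chosen), e
    return best_set, best_energy


def has_k_clique(vertices, edges, k):
    """A k-clique exists iff some set of at most k H residues reaches energy -k(k-1)/2.

    Fewer than k H residues can touch at most (k-1)(k-2)/2 contacts, so the
    bound is met only by k mutually adjacent residues.
    """
    _, e = optimal_design(vertices, edges, k)
    return e == -k * (k - 1) // 2
```

For instance, `has_k_clique([1, 2, 3, 4], [(1, 2), (2, 3), (1, 3), (3, 4)], 3)` returns `True`, because designing residues 1, 2 and 3 as H gives energy -3, while the same call with k = 4 returns `False`.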
(2) Even though finding the global energy minima on 3D lattices is NP-complete, we show that an arbitrarily close approximation to the global energy minima can be found efficiently: we give a polynomial-time approximation scheme (PTAS) that takes appropriate combinations of optimal global energy minima of substrings of the sequence S. Our technique for designing this PTAS uses the shifted slice-and-dice approach of [6, 17, 18]. This result improves on the previous best polynomial-time approximation algorithm for finding the global energy minima, given in [12], which has a performance ratio of 1/2.
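The shifted slice-and-dice idea can be illustrated with the Python sketch below. It is our own simplified stand-in, not the authors' algorithm, and every name in it is hypothetical. It assumes that residues sit at distinct integer lattice points (in any dimension d), that contacts join lattice neighbours at unit distance, and that the H budget is an absolute cap `max_h`. For each of the m^d shifts, the lattice is diced into cubes of side m, contacts crossing a cube boundary are dropped, each cube is solved exactly by brute force for every budget, and a knapsack step splits the global budget among the cubes. A lattice contact crosses a cube boundary in exactly m^(d-1) of the m^d shifts, so on average a shift loses only a 1/m fraction of the optimal H-H contacts. The best shift therefore keeps at least (1 - 1/m) of the optimal |ψ|. For fixed m each cube holds at most m^d residues, so the running time is polynomial.

```python
from collections import defaultdict
from itertools import combinations, product


def hh_contacts(h_set, edges):
    """Number of H-H contacts, i.e. -psi in the Canonical model."""
    return sum(1 for u, v in edges if u in h_set and v in h_set)


def best_per_budget(cell, edges, max_h):
    """best[h] = (contacts, H set) using exactly h H residues in one cube (brute force)."""
    best = []
    for h in range(min(max_h, len(cell)) + 1):
        pick = max(combinations(cell, h), key=lambda c: hh_contacts(set(c), edges))
        best.append((hh_contacts(set(pick), edges), set(pick)))
    return best


def combine(cell_tables, max_h):
    """Knapsack over cubes: split the global H budget to maximise kept contacts."""
    dp = {0: (0, set())}  # H residues used so far -> (contacts, H set)
    for table in cell_tables:
        nxt = {}
        for used, (val, hs) in dp.items():
            for h, (v, s) in enumerate(table):
                if used + h > max_h:
                    break  # budgets in a table only grow from here
                if used + h not in nxt or val + v > nxt[used + h][0]:
                    nxt[used + h] = (val + v, hs | s)
        dp = nxt
    return max(dp.values(), key=lambda t: t[0])


def shifted_design(points, edges, max_h, m):
    """Try every shift of the side-m dicing; return (H-H contacts, H set) of the best."""
    dim = len(points[0])
    best = (-1, set())
    for shift in product(range(m), repeat=dim):
        key = {p: tuple((c + s) // m for c, s in zip(p, shift)) for p in points}
        cells, inner = defaultdict(list), defaultdict(list)
        for p in points:
            cells[key[p]].append(p)
        for u, v in edges:
            if key[u] == key[v]:  # contacts crossing a cube boundary are dropped
                inner[key[u]].append((u, v))
        tables = [best_per_budget(ps, inner[k], max_h) for k, ps in cells.items()]
        _, h_set = combine(tables, max_h)
        score = hh_contacts(h_set, edges)  # re-count on the full contact graph
        if score > best[0]:
            best = (score, h_set)
    return best
```

On a 3×3 square grid with all unit-distance contacts, `max_h=4` and `m=2`, the sketch selects a 2×2 block of H residues, giving 4 H-H contacts (ψ = -4), which is optimal for four H residues on that grid. Raising `m` tightens the guarantee at the cost of exponentially larger cubes.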

Original language | English (US) |
---|---|
Title of host publication | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
Editors | Suleyman Cenk Sahinalp, S. Muthukrishnan, Ugur Dogrusoz |
Publisher | Springer Verlag |
Pages | 244-253 |
Number of pages | 10 |
ISBN (Print) | 354022341X, 9783540223412 |
DOIs | 10.1007/978-3-540-27801-6_18 |
State | Published - 2004 |

### Publication series

Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 3109 |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |

### All Science Journal Classification (ASJC) codes

- Theoretical Computer Science
- Computer Science(all)

## Fingerprint

Research topics of 'The protein sequence design problem in canonical model on 2D and 3D lattices'.

## Cite this

The protein sequence design problem in canonical model on 2D and 3D lattices. (2004). In S. C. Sahinalp, S. Muthukrishnan, & U. Dogrusoz (Eds.), *Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)* (pp. 244-253). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3109). Springer Verlag. https://doi.org/10.1007/978-3-540-27801-6_18