9 Citations (Scopus)

Abstract

Selecting a protein sequence that corresponds to a specific three-dimensional protein structure is known as the protein design problem. One principal bottleneck in solving this problem is our lack of knowledge of precise atomic interactions. Using a simple model of amino acid interactions, we determine three crucial factors that are important for solving the protein design problem. Among these factors is the protein alphabet, a set of sequence elements that encodes protein structure. Our model predicts that alphabet size is independent of protein length, suggesting the possibility of designing a protein of arbitrary length with the natural protein alphabet. We also find that protein alphabet size is governed by protein structural properties and the energetic properties of the protein alphabet units. We discover that the usage of average types of amino acid in proteins is less than expected if amino acids were chosen randomly with naturally occurring frequencies. We propose three possible scenarios that account for amino acid underusage in proteins. These scenarios suggest the possibility that amino acids themselves might not constitute the alphabet of natural proteins.

Original language: English (US)
Pages (from-to): 622-628
Number of pages: 7
Journal: Proteins: Structure, Function and Genetics
Volume: 54
Issue number: 4
DOI: 10.1002/prot.10633
State: Published - Mar 1 2004

Fingerprint

Proteins
Amino Acids
Structural properties

All Science Journal Classification (ASJC) codes

  • Structural Biology
  • Biochemistry
  • Molecular Biology

Cite this

@article{b63702fbc81b4404a48b0e1c33736910,
title = "What Is the Protein Design Alphabet?",
abstract = "Selecting a protein sequence that corresponds to a specific three-dimensional protein structure is known as the protein design problem. One principal bottleneck in solving this problem is our lack of knowledge of precise atomic interactions. Using a simple model of amino acid interactions, we determine three crucial factors that are important for solving the protein design problem. Among these factors is the protein alphabet, a set of sequence elements that encodes protein structure. Our model predicts that alphabet size is independent of protein length, suggesting the possibility of designing a protein of arbitrary length with the natural protein alphabet. We also find that protein alphabet size is governed by protein structural properties and the energetic properties of the protein alphabet units. We discover that the usage of average types of amino acid in proteins is less than expected if amino acids were chosen randomly with naturally occurring frequencies. We propose three possible scenarios that account for amino acid underusage in proteins. These scenarios suggest the possibility that amino acids themselves might not constitute the alphabet of natural proteins.",
author = "Nikolay Dokholyan",
year = "2004",
month = "3",
day = "1",
doi = "10.1002/prot.10633",
language = "English (US)",
volume = "54",
pages = "622--628",
journal = "Proteins: Structure, Function and Genetics",
issn = "0887-3585",
publisher = "Wiley-Liss Inc.",
number = "4",
}

What Is the Protein Design Alphabet? / Dokholyan, Nikolay.

In: Proteins: Structure, Function and Genetics, Vol. 54, No. 4, 01.03.2004, p. 622-628.

Research output: Contribution to journal › Article

TY - JOUR

T1 - What Is the Protein Design Alphabet?

AU - Dokholyan, Nikolay

PY - 2004/3/1

Y1 - 2004/3/1

AB - Selecting a protein sequence that corresponds to a specific three-dimensional protein structure is known as the protein design problem. One principal bottleneck in solving this problem is our lack of knowledge of precise atomic interactions. Using a simple model of amino acid interactions, we determine three crucial factors that are important for solving the protein design problem. Among these factors is the protein alphabet, a set of sequence elements that encodes protein structure. Our model predicts that alphabet size is independent of protein length, suggesting the possibility of designing a protein of arbitrary length with the natural protein alphabet. We also find that protein alphabet size is governed by protein structural properties and the energetic properties of the protein alphabet units. We discover that the usage of average types of amino acid in proteins is less than expected if amino acids were chosen randomly with naturally occurring frequencies. We propose three possible scenarios that account for amino acid underusage in proteins. These scenarios suggest the possibility that amino acids themselves might not constitute the alphabet of natural proteins.

UR - http://www.scopus.com/inward/record.url?scp=1242312358&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=1242312358&partnerID=8YFLogxK

U2 - 10.1002/prot.10633

DO - 10.1002/prot.10633

M3 - Article

VL - 54

SP - 622

EP - 628

JO - Proteins: Structure, Function and Genetics

JF - Proteins: Structure, Function and Genetics

SN - 0887-3585

IS - 4

ER -