Selecting a protein sequence that corresponds to a specific three-dimensional protein structure is known as the protein design problem. A principal bottleneck in solving this problem is our lack of precise knowledge of atomic interactions. Using a simple model of amino acid interactions, we identify three factors that are crucial for solving the protein design problem. Among these factors is the protein alphabet, a set of sequence elements that encodes protein structure. Our model predicts that alphabet size is independent of protein length, suggesting that a protein of arbitrary length could be designed with the natural protein alphabet. We also find that protein alphabet size is governed by protein structural properties and by the energetic properties of the protein alphabet units. We discover that the average number of amino acid types used in proteins is lower than would be expected if amino acids were chosen randomly with their naturally occurring frequencies. We propose three possible scenarios that account for this underusage of amino acids in proteins. These scenarios suggest that amino acids themselves might not constitute the alphabet of natural proteins.
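The random baseline mentioned in the abstract can be illustrated with a short calculation. The sketch below is an assumption about how such a baseline could be computed, not the authors' stated method: if each position in a sequence of a given length is drawn independently with type probabilities p_i, the expected number of distinct types present is the sum over i of 1 - (1 - p_i)^length. The function name `expected_distinct_types` and the uniform frequencies are hypothetical and purely illustrative; they are not the naturally occurring frequencies used in the paper.

```python
def expected_distinct_types(frequencies, length):
    """Expected number of distinct residue types in a random sequence.

    Assumes each of the `length` positions is drawn independently
    with probability proportional to `frequencies` (an illustrative
    baseline, not necessarily the one used in the paper).
    """
    total = sum(frequencies)
    probs = [f / total for f in frequencies]
    # A type is absent with probability (1 - p)^length, so it is
    # present with probability 1 - (1 - p)^length.
    return sum(1.0 - (1.0 - p) ** length for p in probs)


# Placeholder input: 20 equally likely types (NOT natural frequencies).
uniform = [1.0] * 20

for n in (10, 50, 200):
    print(n, round(expected_distinct_types(uniform, n), 2))
```

Replacing `uniform` with measured amino acid frequencies would give the kind of expectation against which observed usage could be compared.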

Original language: English (US)
Pages (from-to): 622-628
Number of pages: 7
Journal: Proteins: Structure, Function and Genetics
Issue number: 4
State: Published - Mar 1 2004

All Science Journal Classification (ASJC) codes

  • Structural Biology
  • Biochemistry
  • Molecular Biology


Title: What Is the Protein Design Alphabet?