Using multiple sequence correlation analysis to characterize functionally important protein regions

Manish C. Saraf, Gregory L. Moore, Costas D. Maranas

Research output: Contribution to journal › Article

30 Citations (Scopus)

Abstract

Protein co-evolution under structural and functional constraints necessitates the preservation of important interactions. Identifying functionally important regions poses many obstacles in protein engineering efforts. In this paper, we present a bioinformatics-inspired approach (residue correlation analysis, RCA) for predicting functionally important domains from protein family sequence data. RCA comprises two major steps: (i) identifying pairs of residue positions that mutate in a coordinated manner, and (ii) using these results to identify protein regions that interact with an uncommonly high number of other residues. We hypothesize that strongly correlated pairs result not only from contacting pairs, but also from residues that participate in conformational changes during catalysis or in important interactions necessary for retaining functionality. The results show that highly mobile loops that assist in ligand association/dissociation tend to exhibit high correlation. RCA results exhibit good agreement with the findings of experimental and molecular dynamics studies for the three protein families that are analyzed: (i) DHFR (dihydrofolate reductase), (ii) cyclophilin, and (iii) formyl-transferase. Specifically, the specificity (percentage of correct predictions) in all three cases is substantially higher than that obtained by entropic measures or contacting residue pairs. In addition, we use our approach in a predictive fashion to identify important regions of a transmembrane amino acid transporter protein for which limited structural and functional information is available.
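
The two steps named in the abstract can be illustrated with a minimal sketch. The abstract does not state which correlation measure RCA uses, so the mutual information between alignment columns below is an assumption chosen only for illustration, as are the function names, the threshold, and the toy alignment. Specificity follows the abstract's definition (percentage of correct predictions).

```python
from collections import Counter
from itertools import combinations
from math import log2


def column(alignment, i):
    """Return the residues found at position i across all aligned sequences."""
    return [seq[i] for seq in alignment]


def entropy(symbols):
    """Shannon entropy (bits) of a list of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def mutual_information(col_a, col_b):
    """Assumed correlation measure: MI = H(a) + H(b) - H(a, b)."""
    return entropy(col_a) + entropy(col_b) - entropy(list(zip(col_a, col_b)))


def correlated_pairs(alignment, threshold):
    """Step (i): position pairs whose columns vary in a coordinated manner."""
    length = len(alignment[0])
    return [
        (i, j)
        for i, j in combinations(range(length), 2)
        if mutual_information(column(alignment, i), column(alignment, j)) >= threshold
    ]


def partner_counts(pairs, length):
    """Step (ii): number of correlated partners for each position."""
    counts = [0] * length
    for i, j in pairs:
        counts[i] += 1
        counts[j] += 1
    return counts


def specificity(predicted, known):
    """Percentage of predicted positions that are known to be important."""
    predicted = set(predicted)
    if not predicted:
        return 0.0
    return 100.0 * len(predicted & set(known)) / len(predicted)


# Hypothetical four-sequence alignment: positions 0 and 1 co-vary,
# position 2 is conserved, position 3 varies independently.
toy_alignment = ["AKLD", "AKLE", "GRLD", "GRLE"]
pairs = correlated_pairs(toy_alignment, threshold=0.5)
print(pairs)                                         # [(0, 1)]
print(partner_counts(pairs, len(toy_alignment[0])))  # [1, 1, 0, 0]
print(specificity([0, 1], [0]))                      # 50.0
```

In this toy case a per-column entropy score would rank the independently varying position 3 as highly as positions 0 and 1, while the pairwise measure singles out only the co-varying pair. This mirrors the kind of comparison with entropic measures that the abstract mentions, though it is not the authors' benchmark.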

Original language: English (US)
Pages (from-to): 397-406
Number of pages: 10
Journal: Protein Engineering
Volume: 16
Issue number: 6
State: Published - Jun 1 2003

Fingerprint

Sequence Analysis
Proteins
Hydroxymethyl and Formyl Transferases
Cyclophilins
Amino Acid Transport Systems
Protein Engineering
Tetrahydrofolate Dehydrogenase
Molecular Dynamics Simulation
Computational Biology
Catalysis
Bioinformatics
Ligands
Molecular dynamics
Amino acids
Association reactions

All Science Journal Classification (ASJC) codes

  • Biochemistry
  • Molecular Biology

Cite this

@article{5e0158a1669840cb83ede14e78b89c69,
title = "Using multiple sequence correlation analysis to characterize functionally important protein regions",
author = "Saraf, {Manish C.} and Moore, {Gregory L.} and Maranas, {Costas D.}",
year = "2003",
month = "6",
day = "1",
language = "English (US)",
volume = "16",
pages = "397--406",
journal = "Protein Engineering, Design and Selection",
issn = "1741-0126",
publisher = "Oxford University Press",
number = "6",

}

Using multiple sequence correlation analysis to characterize functionally important protein regions. / Saraf, Manish C.; Moore, Gregory L.; Maranas, Costas D.

In: Protein Engineering, Vol. 16, No. 6, 01.06.2003, p. 397-406.

TY - JOUR

T1 - Using multiple sequence correlation analysis to characterize functionally important protein regions

AU - Saraf, Manish C.

AU - Moore, Gregory L.

AU - Maranas, Costas D.

PY - 2003/6/1

Y1 - 2003/6/1

UR - http://www.scopus.com/inward/record.url?scp=0042767572&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0042767572&partnerID=8YFLogxK

M3 - Article

VL - 16

SP - 397

EP - 406

JO - Protein Engineering, Design and Selection

JF - Protein Engineering, Design and Selection

SN - 1741-0126

IS - 6

ER -