Utilizing protein structure to identify non-random somatic mutations

Gregory A. Ryslik, Yuwei Cheng, Kei Hoi Cheung, Yorgo Modis, Hongyu Zhao

Research output: Contribution to journal › Article

25 Citations (Scopus)

Abstract

Background: Human cancer is caused by the accumulation of somatic mutations in tumor suppressors and oncogenes within the genome. In the case of oncogenes, recent theory suggests that there are only a few key "driver" mutations responsible for tumorigenesis. As there have been significant pharmacological successes in developing drugs that treat cancers that carry these driver mutations, several methods that rely on mutational clustering have been developed to identify them. However, these methods consider proteins as a single strand without taking their spatial structures into account. We propose an extension to current methodology that incorporates protein tertiary structure in order to increase our power when identifying mutation clustering.

Results: We have developed iPAC (identification of Protein Amino acid Clustering), an algorithm that identifies non-random somatic mutations in proteins while taking into account the three-dimensional protein structure. By using the tertiary information, we are able both to detect novel clusters in proteins that are known to exhibit mutation clustering and to identify clusters in proteins without evidence of clustering based on existing methods. For example, by combining the data in the Protein Data Bank (PDB) and the Catalogue of Somatic Mutations in Cancer, our algorithm identifies new mutational clusters in well-known cancer proteins such as KRAS and PI3KC α. Further, by utilizing the tertiary structure, our algorithm also identifies clusters in EGFR, EIF2AK2, and other proteins that are not identified by current methodology. The R package is available at: http://www.bioconductor.org/packages/2.12/bioc/html/iPAC.html.

Conclusion: Our algorithm extends the current methodology to identify oncogenic activating driver mutations by utilizing tertiary protein structure when identifying nonrandom somatic residue mutation clusters.
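The contrast drawn in the abstract, between treating a protein as a single strand and accounting for its folded structure, can be illustrated with a minimal Python sketch. This is not the iPAC algorithm and does not use the iPAC R package: the coordinates, the mutated residue numbers, the cutoffs, and the helper names (`sequence_distance`, `spatial_distance`, `close_pairs`) are all hypothetical and chosen only for illustration. It shows how residues that are far apart along the chain can still be near each other in space, which is the kind of proximity a purely linear method would miss.

```python
import math

# Hypothetical toy C-alpha coordinates (in angstroms), keyed by residue number.
# These values are invented for illustration and do not come from any PDB entry.
coords = {
    10: (0.0, 0.0, 0.0),
    11: (3.8, 0.0, 0.0),
    12: (7.6, 0.0, 0.0),
    60: (1.0, 4.0, 0.0),  # far along the chain, but folded back near residue 10
    61: (4.8, 4.0, 0.0),
}

# Hypothetical residues carrying somatic mutations.
mutated = [10, 12, 60]


def sequence_distance(i, j):
    """Separation along the chain, counted in residues."""
    return abs(i - j)


def spatial_distance(i, j):
    """Euclidean distance between the coordinates of two residues."""
    return math.dist(coords[i], coords[j])


def close_pairs(residues, distance, cutoff):
    """Return the residue pairs whose distance is at most `cutoff`."""
    pairs = []
    for idx, a in enumerate(residues):
        for b in residues[idx + 1:]:
            if distance(a, b) <= cutoff:
                pairs.append((a, b))
    return pairs


# Assumed cutoffs: 5 residues along the chain, 8 angstroms in space.
linear_pairs = close_pairs(mutated, sequence_distance, cutoff=5)
spatial_pairs = close_pairs(mutated, spatial_distance, cutoff=8.0)

print("close in sequence:", linear_pairs)
print("close in space:   ", spatial_pairs)
```

With these toy values, only residues 10 and 12 are neighbours along the chain, whereas all three mutated residues lie within the spatial cutoff of one another. A real analysis would take residue coordinates from a PDB structure and mutation positions from a resource such as the Catalogue of Somatic Mutations in Cancer, and would need a statistical test for non-randomness, which this sketch omits.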

Original language: English (US)
Article number: 190
Journal: BMC Bioinformatics
Volume: 14
Issue number: 1
DOI: https://doi.org/10.1186/1471-2105-14-190
State: Published - Jun 13 2013

Fingerprint

Protein Structure
Mutation
Proteins
Protein
Cluster Analysis
Clustering
Cancer
Driver
Neoplasms
Tertiary Protein Structure
Oncogenes
Amino Acids
Methodology
Amino acids
Spatial Structure
Tumor
Drugs
Genome
Carcinogenesis
Clustering algorithms

All Science Journal Classification (ASJC) codes

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

Ryslik, G. A., Cheng, Y., Cheung, K. H., Modis, Y., & Zhao, H. (2013). Utilizing protein structure to identify non-random somatic mutations. BMC bioinformatics, 14(1), [190]. https://doi.org/10.1186/1471-2105-14-190
@article{54df6e2eca5044c1a1b1517ff5fba4d6,
title = "Utilizing protein structure to identify non-random somatic mutations",
author = "Ryslik, {Gregory A.} and Yuwei Cheng and Cheung, {Kei Hoi} and Yorgo Modis and Hongyu Zhao",
year = "2013",
month = "6",
day = "13",
doi = "10.1186/1471-2105-14-190",
language = "English (US)",
volume = "14",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - Utilizing protein structure to identify non-random somatic mutations

AU - Ryslik, Gregory A.

AU - Cheng, Yuwei

AU - Cheung, Kei Hoi

AU - Modis, Yorgo

AU - Zhao, Hongyu

PY - 2013/6/13

Y1 - 2013/6/13

UR - http://www.scopus.com/inward/record.url?scp=84878818533&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84878818533&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-14-190

DO - 10.1186/1471-2105-14-190

M3 - Article

C2 - 23758891

AN - SCOPUS:84878818533

VL - 14

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - 1

M1 - 190

ER -