Information-Theoretic Inference of an Optimal Dictionary of Protein Supersecondary Structures

Arun S. Konagurthu, Ramanan Subramanian, Lloyd Allison, David Abramson, Maria Garcia de la Banda, Peter J. Stuckey, Arthur Lesk

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

We recently developed an unsupervised Bayesian inference methodology to automatically infer a dictionary of protein supersecondary structures (Subramanian et al., IEEE data compression conference proceedings (DCC), 340–349, 2017). Specifically, this methodology uses the information-theoretic framework of the minimum message length (MML) criterion for hypothesis selection (Wallace, Statistical and inductive inference by minimum message length, Springer Science & Business Media, New York, 2005). The best dictionary of supersecondary structures is the one that yields the most (lossless) compression on the source collection of folding patterns represented as tableaux (matrix representations that capture the essence of protein folding patterns; Lesk, J Mol Graph. 13:159–164, 1995). This book chapter outlines our MML methodology for inferring the supersecondary structure dictionary. The inferred dictionary is available at http://lcb.infotech.monash.edu.au/proteinConcepts/scop100/dictionary.html.
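
The selection principle in the abstract (prefer the dictionary that compresses the collection best) can be illustrated with a toy two-part message length calculation. The Python sketch below is not the chapter's method: it assumes plain strings in place of tableaux, a greedy longest-match parse, and a crude uniform code. The names `parse` and `message_length`, the alphabet, the sequences, and the candidate dictionaries are all hypothetical and chosen only for illustration.

```python
import math

def parse(seq, dictionary):
    """Greedily split seq into dictionary entries, falling back to single symbols.
    Assumes every dictionary entry is a non-empty string."""
    entries = sorted(dictionary, key=len, reverse=True)  # try longest entries first
    tokens = []
    i = 0
    while i < len(seq):
        for entry in entries:
            if seq.startswith(entry, i):
                tokens.append(entry)
                i += len(entry)
                break
        else:
            tokens.append(seq[i])  # no entry matches here: emit a raw symbol
            i += 1
    return tokens

def message_length(dictionary, sequences, alphabet):
    """Two-part message length in bits under a simplified uniform code (an assumption).
    Part 1 states the dictionary; part 2 states the data given the dictionary."""
    k = len(alphabet)
    # Part 1: each entry is spelled out symbol by symbol, plus an end marker.
    first_part = sum((len(entry) + 1) * math.log2(k + 1) for entry in dictionary)
    # Part 2: every token (entry, raw symbol, or end-of-sequence) costs the same.
    bits_per_token = math.log2(len(dictionary) + k + 1)
    second_part = sum((len(parse(s, dictionary)) + 1) * bits_per_token
                      for s in sequences)
    return first_part + second_part

# Hypothetical toy data: three sequences built mostly from one repeated motif.
alphabet = "abc"
sequences = ["abcabcabcabc", "abcabcab", "cabcabcabc"]

# Hypothetical candidate dictionaries, from empty to one that memorises the data.
candidates = {
    "empty": (),
    "motif": ("abc",),
    "memorise": ("abc", "abcabc", "abcabcabcabc", "abcabcab", "cabcabcabc"),
}

# The preferred hypothesis is the one with the shortest total message.
best = min(candidates, key=lambda name: message_length(candidates[name], sequences, alphabet))
for name, entries in candidates.items():
    print(name, round(message_length(entries, sequences, alphabet), 2))
print("selected:", best)
```

In this toy setting the dictionary that memorises whole sequences shortens the second part of the message but pays heavily in the first part, which is the trade-off that a message length criterion is meant to balance.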

Original language: English (US)
Title of host publication: Methods in Molecular Biology
Publisher: Humana Press Inc.
Pages: 123-131
Number of pages: 9
DOIs: https://doi.org/10.1007/978-1-4939-9161-7_6
State: Published - Jan 1 2019

Publication series

Name: Methods in Molecular Biology
Volume: 1958
ISSN (Print): 1064-3745
ISSN (Electronic): 1940-6029

Fingerprint

Amino Acid Motifs
Data Compression
Protein Folding
Patient Selection

All Science Journal Classification (ASJC) codes

  • Molecular Biology
  • Genetics

Cite this

Konagurthu, A. S., Subramanian, R., Allison, L., Abramson, D., de la Banda, M. G., Stuckey, P. J., & Lesk, A. (2019). Information-Theoretic Inference of an Optimal Dictionary of Protein Supersecondary Structures. In Methods in Molecular Biology (pp. 123-131). (Methods in Molecular Biology; Vol. 1958). Humana Press Inc. https://doi.org/10.1007/978-1-4939-9161-7_6
Konagurthu, Arun S. ; Subramanian, Ramanan ; Allison, Lloyd ; Abramson, David ; de la Banda, Maria Garcia ; Stuckey, Peter J. ; Lesk, Arthur. / Information-Theoretic Inference of an Optimal Dictionary of Protein Supersecondary Structures. Methods in Molecular Biology. Humana Press Inc., 2019. pp. 123-131 (Methods in Molecular Biology).
@inbook{10cafe7af8f3469692a720c4fafd3ceb,
title = "Information-Theoretic Inference of an Optimal Dictionary of Protein Supersecondary Structures",
author = "Konagurthu, {Arun S.} and Ramanan Subramanian and Lloyd Allison and David Abramson and {de la Banda}, {Maria Garcia} and Stuckey, {Peter J.} and Arthur Lesk",
year = "2019",
month = "1",
day = "1",
doi = "10.1007/978-1-4939-9161-7_6",
language = "English (US)",
series = "Methods in Molecular Biology",
publisher = "Humana Press Inc.",
pages = "123--131",
booktitle = "Methods in Molecular Biology",

}

Konagurthu, AS, Subramanian, R, Allison, L, Abramson, D, de la Banda, MG, Stuckey, PJ & Lesk, A 2019, Information-Theoretic Inference of an Optimal Dictionary of Protein Supersecondary Structures. in Methods in Molecular Biology. Methods in Molecular Biology, vol. 1958, Humana Press Inc., pp. 123-131. https://doi.org/10.1007/978-1-4939-9161-7_6

Information-Theoretic Inference of an Optimal Dictionary of Protein Supersecondary Structures. / Konagurthu, Arun S.; Subramanian, Ramanan; Allison, Lloyd; Abramson, David; de la Banda, Maria Garcia; Stuckey, Peter J.; Lesk, Arthur.

Methods in Molecular Biology. Humana Press Inc., 2019. p. 123-131 (Methods in Molecular Biology; Vol. 1958).

TY - CHAP

T1 - Information-Theoretic Inference of an Optimal Dictionary of Protein Supersecondary Structures

AU - Konagurthu, Arun S.

AU - Subramanian, Ramanan

AU - Allison, Lloyd

AU - Abramson, David

AU - de la Banda, Maria Garcia

AU - Stuckey, Peter J.

AU - Lesk, Arthur

PY - 2019/1/1

Y1 - 2019/1/1

UR - http://www.scopus.com/inward/record.url?scp=85064240074&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85064240074&partnerID=8YFLogxK

U2 - 10.1007/978-1-4939-9161-7_6

DO - 10.1007/978-1-4939-9161-7_6

M3 - Chapter

C2 - 30945216

AN - SCOPUS:85064240074

T3 - Methods in Molecular Biology

SP - 123

EP - 131

BT - Methods in Molecular Biology

PB - Humana Press Inc.

ER -

Konagurthu AS, Subramanian R, Allison L, Abramson D, de la Banda MG, Stuckey PJ et al. Information-Theoretic Inference of an Optimal Dictionary of Protein Supersecondary Structures. In Methods in Molecular Biology. Humana Press Inc. 2019. p. 123-131. (Methods in Molecular Biology). https://doi.org/10.1007/978-1-4939-9161-7_6