Occupancy classification of position weight matrix-inferred Transcription Factor Binding Sites

Hollis Wright, Aaron Cohen, Kemal Sönmez, Gregory Yochum, Shannon McWeeney

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Background: Computational prediction of Transcription Factor Binding Sites (TFBS) from sequence data alone is difficult and error-prone. Machine learning techniques that use additional environmental information about a predicted binding site (such as distances from the site to particular chromatin features) to determine its occupancy/functionality class show promise as methods for more accurate in silico prediction of true TFBS. We evaluate the Bayesian Network (BN) and Support Vector Machine (SVM) machine learning techniques on four distinct TFBS data sets and analyze their performance. We describe the features that are most useful for classification and compare these feature sets between the factors.

Results: Our results demonstrate good performance of classifiers both on TFBS for the transcription factors used for initial training and on TFBS for other factors in cross-classification experiments. We find that distances to chromatin modifications (specifically, histone modification islands), as well as distances between such modifications, are effective predictors of TFBS occupancy, though the impact of individual predictors is largely TF-specific. In our experiments, Bayesian network classifiers outperform SVM classifiers.

Conclusions: Our results demonstrate good performance of machine learning techniques on the problem of occupancy classification and show that effective classification can be achieved using distances to chromatin features. We additionally demonstrate that cross-classification of TFBS is possible, which suggests that a generalizable occupancy classifier capable of handling TFBS for many different transcription factors could be constructed.
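The cross-classification idea in the abstract (train an occupancy classifier on sites for one transcription factor, then apply it to sites for another) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the authors' method or data: the distances are synthetic, the names `make_sites`, `fit_gaussian_nb`, `factor_a`, and `factor_b` are hypothetical, and a Gaussian naive Bayes model stands in for a Bayesian network because it is the simplest Bayesian network structure (each feature depends only on the class).

```python
import math
import random

def make_sites(rng, n, occupied_mean, unoccupied_mean):
    """Build synthetic (features, label) pairs. Label 1 = occupied, 0 = unoccupied.
    Features are log10 distances in bp: site-to-island and island-to-island.
    The distributions are invented for illustration only."""
    sites = []
    for label, mean in ((1, occupied_mean), (0, unoccupied_mean)):
        for _ in range(n):
            d_island = abs(rng.gauss(mean, 0.3 * mean))
            d_between = abs(rng.gauss(2 * mean, 0.5 * mean))
            features = [math.log10(d_island + 1), math.log10(d_between + 1)]
            sites.append((features, label))
    return sites

def fit_gaussian_nb(sites):
    """Estimate a class prior and per-feature mean/variance for each class."""
    model = {}
    for label in (0, 1):
        rows = [x for x, y in sites if y == label]
        prior = len(rows) / len(sites)
        stats = []
        for j in range(len(rows[0])):
            column = [row[j] for row in rows]
            mu = sum(column) / len(column)
            var = sum((v - mu) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mu, var))
        model[label] = (prior, stats)
    return model

def predict(model, x):
    """Return the class with the highest log posterior under the model."""
    def log_posterior(label):
        prior, stats = model[label]
        total = math.log(prior)
        for v, (mu, var) in zip(x, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)

def accuracy(model, sites):
    """Fraction of sites whose predicted class matches the label."""
    return sum(predict(model, x) == y for x, y in sites) / len(sites)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
factor_a = make_sites(rng, 200, occupied_mean=500, unoccupied_mean=5000)
factor_b = make_sites(rng, 200, occupied_mean=700, unoccupied_mean=4000)

model = fit_gaussian_nb(factor_a)      # train on sites for one factor
within = accuracy(model, factor_a)     # accuracy on the training factor
cross = accuracy(model, factor_b)      # cross-classification on another factor
print(f"within-factor accuracy: {within:.3f}")
print(f"cross-factor accuracy:  {cross:.3f}")
```

Because the synthetic classes are deliberately well separated, both accuracies come out high here; the numbers say nothing about real TFBS data. A real evaluation would score held-out sites rather than the training set and would need measured distances to histone modification islands for each predicted site.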

Original language: English (US)
Article number: e26160
Journal: PLoS One
Volume: 6
Issue number: 11
DOI: 10.1371/journal.pone.0026160
State: Published - Nov 4 2011

Fingerprint

Position-Specific Scoring Matrices
binding sites
Transcription Factors
transcription factors
Binding Sites
artificial intelligence
Classifiers
Chromatin
chromatin
Learning systems
Bayesian networks
Support vector machines
Histone Code
prediction
methodology
Islands
histones
Computer Simulation
Histones

All Science Journal Classification (ASJC) codes

  • Biochemistry, Genetics and Molecular Biology(all)
  • Agricultural and Biological Sciences(all)

Cite this

Wright, Hollis ; Cohen, Aaron ; Sönmez, Kemal ; Yochum, Gregory ; McWeeney, Shannon. / Occupancy classification of position weight matrix-inferred Transcription Factor Binding Sites. In: PloS one. 2011 ; Vol. 6, No. 11.
@article{35ca704fcefc46e4b40440318417859a,
title = "Occupancy classification of position weight matrix-inferred Transcription Factor Binding Sites",
author = "Hollis Wright and Aaron Cohen and Kemal S{\"o}nmez and Gregory Yochum and Shannon McWeeney",
year = "2011",
month = "11",
day = "4",
doi = "10.1371/journal.pone.0026160",
language = "English (US)",
volume = "6",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "11",

}

TY - JOUR

T1 - Occupancy classification of position weight matrix-inferred Transcription Factor Binding Sites

AU - Wright, Hollis

AU - Cohen, Aaron

AU - Sönmez, Kemal

AU - Yochum, Gregory

AU - McWeeney, Shannon

PY - 2011/11/4

Y1 - 2011/11/4

UR - http://www.scopus.com/inward/record.url?scp=80455156114&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80455156114&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0026160

DO - 10.1371/journal.pone.0026160

M3 - Article

VL - 6

JO - PLoS One

JF - PLoS One

SN - 1932-6203

IS - 11

M1 - e26160

ER -