Exploring inconsistencies in genome-wide protein function annotations: A machine learning approach

Carson Andorf, Drena Dobbs, Vasant Honavar

Research output: Contribution to journal (Article)

20 Citations (Scopus)

Abstract

Background: Incorrectly annotated sequence data are becoming more commonplace as databases increasingly rely on automated techniques for annotation. Hence, there is an urgent need for computational methods for checking consistency of such annotations against independent sources of evidence and detecting potential annotation errors. We show how a machine learning approach designed to automatically predict a protein's Gene Ontology (GO) functional class can be employed to identify potential gene annotation errors.

Results: In a set of 211 previously annotated mouse protein kinases, we found that 201 of the GO annotations returned by AmiGO appear to be inconsistent with the UniProt functions assigned to their human counterparts. In contrast, 97% of the predicted annotations generated using a machine learning approach were consistent with the UniProt annotations of the human counterparts, as well as with available annotations for these mouse protein kinases in the Mouse Kinome database.

Conclusion: We conjecture that most of our predicted annotations are, therefore, correct and suggest that the machine learning approach developed here could be routinely used to detect potential errors in GO annotations generated by high-throughput gene annotation projects.
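The consistency check described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `consistency_report`, the protein IDs, and the labels are hypothetical, and labels are compared by exact string match, whereas a real comparison of GO terms would likely need to account for the ontology hierarchy. The last line only reproduces the arithmetic behind the reported counts (201 of 211 annotations flagged as inconsistent).

```python
from typing import Dict, List, Tuple


def consistency_report(
    candidate: Dict[str, str],
    reference: Dict[str, str],
) -> Tuple[float, List[str]]:
    """Compare candidate labels with reference labels on shared protein IDs.

    Returns (fraction of shared proteins whose labels agree,
             sorted IDs of proteins whose labels disagree).
    Assumption: one label per protein, compared by exact match.
    """
    shared = sorted(set(candidate) & set(reference))
    if not shared:
        raise ValueError("no proteins in common between the two sources")
    flagged = [pid for pid in shared if candidate[pid] != reference[pid]]
    return 1 - len(flagged) / len(shared), flagged


# Toy, made-up data (hypothetical IDs and labels, not taken from the paper).
database_labels = {
    "P1": "protein kinase activity",
    "P2": "transferase activity",
    "P3": "protein kinase activity",
}
reference_labels = {
    "P1": "protein kinase activity",
    "P2": "protein kinase activity",
    "P3": "protein kinase activity",
    "P4": "protein kinase activity",  # absent from the database, so ignored
}

fraction, flagged = consistency_report(database_labels, reference_labels)
print(f"consistent: {fraction:.1%}, flagged for review: {flagged}")

# Share of the 211 mouse kinase annotations reported as inconsistent.
print(f"{201 / 211:.1%}")  # 95.3%
```

Flagged proteins are candidates for manual review rather than confirmed errors, since a disagreement may also reflect an error in the reference source.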

Original language: English (US)
Article number: 284
Journal: BMC Bioinformatics
Volume: 8
DOI: 10.1186/1471-2105-8-284
State: Published - Aug 3 2007

Fingerprint

Molecular Sequence Annotation
Inconsistency
Gene Ontology
Annotation
Learning systems
Machine Learning
Genome
Genes
Proteins
Protein
Ontology
Protein Kinases
Databases
Computational methods
Mouse
Protein Kinase
Throughput
Gene
Inconsistent

All Science Journal Classification (ASJC) codes

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

@article{a4691c26dd30473ab9d15cede7f03567,
title = "Exploring inconsistencies in genome-wide protein function annotations: A machine learning approach",
author = "Carson Andorf and Drena Dobbs and Vasant Honavar",
year = "2007",
month = "8",
day = "3",
doi = "10.1186/1471-2105-8-284",
language = "English (US)",
volume = "8",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central"
}

Exploring inconsistencies in genome-wide protein function annotations: A machine learning approach. / Andorf, Carson; Dobbs, Drena; Honavar, Vasant.

In: BMC Bioinformatics, Vol. 8, 284, 03.08.2007.

TY - JOUR

T1 - Exploring inconsistencies in genome-wide protein function annotations

T2 - A machine learning approach

AU - Andorf, Carson

AU - Dobbs, Drena

AU - Honavar, Vasant

PY - 2007/8/3

Y1 - 2007/8/3

UR - http://www.scopus.com/inward/record.url?scp=34748833491&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34748833491&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-8-284

DO - 10.1186/1471-2105-8-284

M3 - Article

C2 - 17683567

AN - SCOPUS:34748833491

VL - 8

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

M1 - 284

ER -