The KA/KS ratio test for assessing the protein-coding potential of genomic regions: An empirical and simulation study

Research output: Contribution to journal › Article

147 Citations (Scopus)

Abstract

Comparative genomics is a simple, powerful way to increase the accuracy of gene prediction. In this study, we show the utility of a simple test for the identification of protein-coding exons using human/mouse sequence comparisons. The test takes advantage of the fact that in the vast majority of coding regions, synonymous substitutions (KS) occur much more frequently than nonsynonymous ones (KA) and uses the KA/KS ratio as the criterion. We show the following: (1) most of the human and mouse exons are sufficiently long and have a suitable degree of sequence divergence for the test to perform reliably; (2) the test is suited for the identification of long exons and single-exon genes, which are difficult to predict by current methods; (3) the test has a false-negative rate lower than that of most current gene-prediction methods and a false-positive rate lower than that of all current methods; (4) the test has been automated and can be used in combination with other existing gene-prediction methods.
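To make the criterion concrete, here is a minimal Python sketch of how a KA/KS ratio could be computed for a pair of aligned, in-frame sequences. It is an illustration under stated assumptions and not the implementation described in the paper: it uses a simplified Nei-Gojobori-style count with a Jukes-Cantor correction, which may differ from the estimator the authors used, and all function names (`synonymous_sites`, `count_differences`, `jukes_cantor`, `ka_ks`) and the toy sequences are hypothetical.

```python
import math
from itertools import permutations, product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_sites(codon):
    """Expected number of synonymous sites (0 to 3) in one sense codon.

    Simplification: a change to a stop codon is counted as nonsynonymous.
    """
    total = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                total += 1 / 3
    return total


def count_differences(c1, c2):
    """Average (synonymous, nonsynonymous) changes over all mutation orders."""
    diff_positions = [i for i in range(3) if c1[i] != c2[i]]
    syn = nonsyn = 0.0
    n_paths = 0
    for order in permutations(diff_positions):
        current, path_syn, path_non, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODON_TABLE[nxt] == "*":
                valid = False  # ignore pathways that pass through a stop codon
                break
            if CODON_TABLE[nxt] == CODON_TABLE[current]:
                path_syn += 1
            else:
                path_non += 1
            current = nxt
        if valid:
            syn += path_syn
            nonsyn += path_non
            n_paths += 1
    if n_paths == 0:
        return 0.0, 0.0  # assumption: codon pairs with no valid pathway are not counted
    return syn / n_paths, nonsyn / n_paths


def jukes_cantor(p):
    """Correct a proportion of differences for multiple hits; None if saturated."""
    if p == 0:
        return 0.0
    if p >= 0.75:
        return None
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(seq1, seq2):
    """Return (KA, KS, KA/KS) for two aligned, in-frame sequences of equal length."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    s_sites = n_sites = s_diff = n_diff = 0.0
    for i in range(0, len(seq1) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if CODON_TABLE.get(c1, "*") == "*" or CODON_TABLE.get(c2, "*") == "*":
            continue  # skip gaps, ambiguous bases, and stop codons
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        s_sites += s
        n_sites += 3 - s
        sd, nd = count_differences(c1, c2)
        s_diff += sd
        n_diff += nd
    if s_sites == 0 or n_sites == 0:
        return None, None, None
    ka = jukes_cantor(n_diff / n_sites)
    ks = jukes_cantor(s_diff / s_sites)
    ratio = ka / ks if ka is not None and ks else None  # undefined when KS is 0 or saturated
    return ka, ks, ratio


# Toy aligned pair: one synonymous (CTG -> CTA) and one nonsynonymous (AAA -> GAA) change.
ka, ks, ratio = ka_ks("ATGCTGAAAGGCCCC", "ATGCTAGAAGGCCCC")
print(ka, ks, ratio)  # a ratio below 1 is the pattern expected of coding sequence
```

A region under purifying selection at the protein level should give a ratio well below 1, whereas noncoding sequence is expected to give a ratio near 1. A real test on short exons would also need a significance assessment, which this sketch leaves out.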

Original language: English (US)
Pages (from-to): 198-202
Number of pages: 5
Journal: Genome Research
Volume: 12
Issue number: 1
DOI: 10.1101/gr.200901
State: Published - Jan 26 2002

Fingerprint

Exons
Genes
Proteins
Genomics

All Science Journal Classification (ASJC) codes

  • Genetics
  • Genetics (clinical)

Cite this

@article{3ba1dafa64514057b3f77e581c7286be,
title = "The KA/KS ratio test for assessing the protein-coding potential of genomic regions: An empirical and simulation study",
abstract = "Comparative genomics is a simple, powerful way to increase the accuracy of gene prediction. In this study, we show the utility of a simple test for the identification of protein-coding exons using human/mouse sequence comparisons. The test takes advantage of the fact that in the vast majority of coding regions, synonymous substitutions (KS) occur much more frequently than nonsynonymous ones (KA) and uses the KA/KS ratio as the criterion. We show the following: (1) most of the human and mouse exons are sufficiently long and have a suitable degree of sequence divergence for the test to perform reliably; (2) the test is suited for the identification of long exons and single exon genes, which are difficult to predict by current methods; (3) the test has a false-negative rate, lower than most of current gene prediction methods and a false-positive rate lower than all current methods; (4) the test has been automated and can be used in combination with other existing gene-prediction methods.",
author = "Nekrutenko, Anton and Makova, {Kateryna D.} and Li, {Wen Hsiung}",
year = "2002",
month = "1",
day = "26",
doi = "10.1101/gr.200901",
language = "English (US)",
volume = "12",
pages = "198--202",
journal = "Genome Research",
issn = "1088-9051",
publisher = "Cold Spring Harbor Laboratory Press",
number = "1",

}

The KA/KS ratio test for assessing the protein-coding potential of genomic regions: An empirical and simulation study. / Nekrutenko, Anton; Makova, Kateryna D.; Li, Wen Hsiung.

In: Genome research, Vol. 12, No. 1, 26.01.2002, p. 198-202.

TY - JOUR

T1 - The KA/KS ratio test for assessing the protein-coding potential of genomic regions

T2 - An empirical and simulation study

AU - Nekrutenko, Anton

AU - Makova, Kateryna D.

AU - Li, Wen Hsiung

PY - 2002/1/26

Y1 - 2002/1/26

AB - Comparative genomics is a simple, powerful way to increase the accuracy of gene prediction. In this study, we show the utility of a simple test for the identification of protein-coding exons using human/mouse sequence comparisons. The test takes advantage of the fact that in the vast majority of coding regions, synonymous substitutions (KS) occur much more frequently than nonsynonymous ones (KA) and uses the KA/KS ratio as the criterion. We show the following: (1) most of the human and mouse exons are sufficiently long and have a suitable degree of sequence divergence for the test to perform reliably; (2) the test is suited for the identification of long exons and single exon genes, which are difficult to predict by current methods; (3) the test has a false-negative rate, lower than most of current gene prediction methods and a false-positive rate lower than all current methods; (4) the test has been automated and can be used in combination with other existing gene-prediction methods.

UR - http://www.scopus.com/inward/record.url?scp=0036144777&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0036144777&partnerID=8YFLogxK

U2 - 10.1101/gr.200901

DO - 10.1101/gr.200901

M3 - Article

C2 - 11779845

AN - SCOPUS:0036144777

VL - 12

SP - 198

EP - 202

JO - Genome Research

JF - Genome Research

SN - 1088-9051

IS - 1

ER -