Distribution of base pair repeats in coding and noncoding DNA sequences

Nikolay V. Dokholyan, Sergey V. Buldyrev, Shlomo Havlin, H. Eugene Stanley

Research output: Contribution to journal › Article

33 Citations (Scopus)

Abstract

We analyze the histograms for the lengths of the 16 possible distinct repeats of identical dimers, known as dimeric tandem repeats, in DNA sequences. For coding regions, the probability of finding a repetitive sequence of l copies of a particular dimer decreases exponentially as l increases. For the noncoding regions, the distribution functions for most of the 16 dimers have long tails and can be approximated by power-law functions, while for coding DNA, they can be well fit by a first-order Markov process. We propose a model, based on known biophysical processes, which leads to the observed probability distribution functions for noncoding DNA. We argue that this difference in the shape of the distribution functions between coding and noncoding DNA arises from the fact that noncoding DNA is more tolerant to evolutionary mutational alterations than coding DNA.
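The histogram described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `dimer_repeat_histogram`, the left-to-right greedy scan, and the treatment of a run of identical bases as floor(n/2) copies of a homodimer such as AA are all assumptions made for illustration. The function counts, for one chosen dimer, how many maximal tandem runs contain l copies.

```python
from collections import Counter


def dimer_repeat_histogram(sequence: str, dimer: str) -> Counter:
    """Histogram of tandem-repeat lengths for one dimer.

    Keys are the number of consecutive copies l, values are how many
    maximal runs of that length occur. Hypothetical helper, assuming a
    greedy left-to-right scan; the paper may define runs differently.
    """
    if len(dimer) != 2:
        raise ValueError("dimer must have exactly two bases")
    sequence = sequence.upper()
    dimer = dimer.upper()
    histogram = Counter()
    i = 0
    while i + 2 <= len(sequence):
        if sequence[i:i + 2] == dimer:
            # Extend the run two bases at a time while the dimer repeats.
            copies = 0
            while sequence[i:i + 2] == dimer:
                copies += 1
                i += 2
            histogram[copies] += 1
        else:
            i += 1
    return histogram


# Toy sequence (not real data): runs of AC with 3, 1 and 2 copies.
hist = dimer_repeat_histogram("TACACACGGACTTACAC", "AC")
print(sorted(hist.items()))  # [(1, 1), (2, 1), (3, 1)]
```

Normalizing such a histogram over many sequences and plotting it on semi-log and log-log axes is one way to compare an exponential decay (a straight line on semi-log axes) with a power-law tail (a straight line on log-log axes).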

Original language: English (US)
Pages (from-to): 5182-5185
Number of pages: 4
Journal: Physical Review Letters
Volume: 79
Issue number: 25
DOIs: 10.1103/PhysRevLett.79.5182
State: Published - Jan 1 1997

Fingerprint

coding
deoxyribonucleic acid
dimers
distribution functions
Markov processes
probability distribution functions
histograms

All Science Journal Classification (ASJC) codes

  • Physics and Astronomy(all)

Cite this

Dokholyan, Nikolay V. ; Buldyrev, Sergey V. ; Havlin, Shlomo ; Stanley, H. Eugene / Distribution of base pair repeats in coding and noncoding DNA sequences. In: Physical Review Letters. 1997 ; Vol. 79, No. 25. pp. 5182-5185.
@article{edeeec0197aa460682e08aaeb87bec31,
title = "Distribution of base pair repeats in coding and noncoding DNA sequences",
abstract = "We analyze the histograms for the lengths of the 16 possible distinct repeats of identical dimers, known as dimeric tandem repeats, in DNA sequences. For coding regions, the probability of finding a repetitive sequence of l copies of a particular dimer decreases exponentially as l increases. For the noncoding regions, the distribution functions for most of the 16 dimers have long tails and can be approximated by power-law functions, while for coding DNA, they can be well fit by a first-order Markov process. We propose a model, based on known biophysical processes, which leads to the observed probability distribution functions for noncoding DNA. We argue that this difference in the shape of the distribution functions between coding and noncoding DNA arises from the fact that noncoding DNA is more tolerant to evolutionary mutational alterations than coding DNA.",
author = "Dokholyan, {Nikolay V.} and Buldyrev, {Sergey V.} and Shlomo Havlin and Stanley, {H. Eugene}",
year = "1997",
month = "1",
day = "1",
doi = "10.1103/PhysRevLett.79.5182",
language = "English (US)",
volume = "79",
pages = "5182--5185",
journal = "Physical Review Letters",
issn = "0031-9007",
publisher = "American Physical Society",
number = "25",

}

TY - JOUR

T1 - Distribution of base pair repeats in coding and noncoding DNA sequences

AU - Dokholyan, Nikolay V.

AU - Buldyrev, Sergey V.

AU - Havlin, Shlomo

AU - Stanley, H. Eugene

PY - 1997/1/1

Y1 - 1997/1/1

N2 - We analyze the histograms for the lengths of the 16 possible distinct repeats of identical dimers, known as dimeric tandem repeats, in DNA sequences. For coding regions, the probability of finding a repetitive sequence of l copies of a particular dimer decreases exponentially as l increases. For the noncoding regions, the distribution functions for most of the 16 dimers have long tails and can be approximated by power-law functions, while for coding DNA, they can be well fit by a first-order Markov process. We propose a model, based on known biophysical processes, which leads to the observed probability distribution functions for noncoding DNA. We argue that this difference in the shape of the distribution functions between coding and noncoding DNA arises from the fact that noncoding DNA is more tolerant to evolutionary mutational alterations than coding DNA.

UR - http://www.scopus.com/inward/record.url?scp=0000952690&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0000952690&partnerID=8YFLogxK

U2 - 10.1103/PhysRevLett.79.5182

DO - 10.1103/PhysRevLett.79.5182

M3 - Article

AN - SCOPUS:0000952690

VL - 79

SP - 5182

EP - 5185

JO - Physical Review Letters

JF - Physical Review Letters

SN - 0031-9007

IS - 25

ER -