Tandem stretches of guanines can associate in hydrogen-bonded arrays to form G-quadruplexes, which are stabilized by K+ ions. Using computational methods, we searched for G-Quadruplex Sequence (GQS) patterns in the model plant species Arabidopsis thaliana. We found ∼1200 GQS with a G3 repeat sequence motif, most of which are located in intergenic regions. Using a Markov modeled genome, we determined that GQS are significantly underrepresented in the genome. Additionally, we found ∼43000 GQS with a G2 repeat sequence motif; notably, 80% of these were located in genic regions, suggesting that these sequences may fold at the RNA level. Gene Ontology functional analysis revealed that GQS are overrepresented in genes encoding proteins of certain functional categories, including enzyme activity. Conversely, GQS are underrepresented in other categories of genes, notably those for non-coding RNAs such as tRNAs and rRNAs. We also found that genes that are differentially regulated by drought are significantly more likely to contain a GQS. CD-detected K+ titrations performed on representative RNAs verified formation of quadruplexes at physiological K+ concentrations. Overall, this study indicates that GQS are present at unique locations in Arabidopsis and that folding of RNA GQS may play important roles in regulating gene expression.