Plant endogenous small RNAs (sRNAs) are important regulators of gene expression. There are two broad categories of plant sRNAs: microRNAs (miRNAs) and endogenous short interfering RNAs (siRNAs). MicroRNA loci are relatively well-annotated but compose only a small minority of the total sRNA pool; siRNA locus annotations have lagged far behind. Here, we used a large data set of published and newly generated sRNA sequencing data (1333 sRNA-seq libraries containing more than 20 billion reads) and a uniform bioinformatic pipeline to produce comprehensive sRNA locus annotations of 47 diverse plants, yielding more than 2.7 million sRNA loci. The two most numerous classes of siRNA loci produced mainly 24- and 21-nucleotide (nt) siRNAs, respectively. Most often, 24-nt-dominated siRNA loci occurred in intergenic regions, especially at the 5′-flanking regions of protein-coding genes. In contrast, 21-nt-dominated siRNA loci were most often derived from double-stranded RNA precursors copied from spliced mRNAs. Genic 21-nt-dominated loci were especially common from disease resistance genes, including from a large number of monocots. Individual siRNA sequences of all types showed very little conservation across species, whereas mature miRNAs were more likely to be conserved. We developed a web server where our data and several search and analysis tools are freely accessible.