Identification of novel maize miRNAs by measuring the precision of precursor processing

Background miRNAs are known to play important regulatory roles throughout plant development. Until recently, nearly all the miRNAs in maize were identified by comparative analysis to miRNAs sequences of other plant species, such as rice and Arabidopsis. Results To find new miRNA in this important crop, small RNAs from mixed tissues were sequenced, resulting in over 15 million unique sequences. Our sequencing effort validated 23 of the 28 known maize miRNA families, including 49 unique miRNAs. Using a newly established criterion, based on the precision of miRNA processing from precursors, we identified 66 novel miRNAs in maize. These miRNAs can be grouped into 58 families, 54 of which have not been identified in any other species. Five new miRNAs were validated by northern blot. Moreover, we found targets for 23 of the 66 new miRNAs. The targets of two of these newly identified miRNAs were confirmed by 5'RACE. Conclusion We have implemented a novel method of identifying miRNA by measuring the precision of miRNA processing from precursors. Using this method, 66 novel miRNAs and 50 potential miRNAs have been identified in maize.


Background
MiRNAs are known to play crucial roles in the regulation of gene expression in plants [1], including functions such as, leaf polarity, auxin response, floral identity, flowering time, and stress response [2][3][4][5][6][7]. MiRNAs are typically~21 nucleotides in length. In plants, miRNA genes are transcribed by RNA polymerasell into primary miRNA transcripts (pri-miRNA) which can form imperfect stem-loop secondary structure [8,9]. Then the pri-miRNAs are trimmed and spliced into miRNA/miRNA* duplex by Dicer-like1 (DCL1) with the help of dsRNA binding protein HYL1 and dsRNA methylase HEN1 [1,[10][11][12]. The length of the pre-miRNAs in plants ranges from about 80-nt to 300-nt, and is more variable than in animals. After being transported to the cytoplasm, the mature miRNAs can match to the corresponding target mRNAs through RNA-induced silencing complex (RISC) and the miRNA* are thought to be degraded [1,13]. MiRNAs regulate their target mRNA either by cleaving in the middle of their binding sites or by translational repression [14,15]. The plant miRNAs are highly complementary to their targets with about 0~4 nucleotides mismatches [1].
The majority of miRNAs were originally discovered through traditional Sanger sequencing of small RNA pools [16][17][18]. With the advent of second (next) generation sequencing technology, the rate of miRNA discovery increased dramatically [19][20][21]. However, due to the complexity of small RNA population, identification of miRNAs from the small RNA pools of sequencing product was not trivial. Typically, genomic sequences matched to all the small RNA with a length of 19~22-nt were extended upstream and downstream to get a collection of candidate precursors. Their secondary structures were then checked using a number of criteria with Minimum Free Energy (MFE) as the most important one [17,[19][20][21]. The presence of miRNA* has been regarded as a golden standard to reliably annotate a novel miRNA. Nevertheless, miRNA* have only been reported to be showed up with mature miRNA around 10% of the time [22]. As miRNAs can be enriched in certain genomic regions, a clustering algorithm was sometimes used for miRNA identification from large scale small RNA sequencing data. In these studies [23][24][25], hotspots of small RNA generation were identified if they match with multiple known miRNAs; individual hairpin sequences within these hotspots were subsequently checked to see whether some of them could be qualified as miRNAs.
As many miRNAs are conserved among different organisms, sequences of miRNAs found in one species can be used to identify corresponding miRNAs in other species through comparative analysis [6,26]. However, not all the miRNAs are conserved across different organisms. Direct prediction of potential miRNAs, based on the characteristics of miRNA precursors, has been shown to be a useful approach to identify miRNAs for any organisms, provided that there are a large amount of genomic sequences available [27]. However, as millions, even billions of inverted repeat sequences exist in complex genomes, candidate miRNAs identified just based on computational prediction often show a high rate of false positive.
Maize is an important crop as well as a model of plant genetics. A number of miRNAs with specific function have been reported in maize. The miR172 was reported to target APETALA2 floral homeotic transcription factor that is required for spikelet meristem determination [28]. Also, miR172 functions in promoting vegetative phase transition by regulating the APETALA2-like gene glossy15 [29]. The expression of teosinte glume architec-ture1 (tga1), which plays an important role in maize domestication, is regulated by miR156 [28]. The miR166 has been found to target a class III homeodomain leucine zipper (HD-ZIPIII) protein that acts on the asymmetry development of leaves in maize [30].
There are a total of 84 unique maize mature miRNAs belonging to 28 miRNA families in the current version of miRBase (release 17) [31]. These 84 miRNAs are the products of 167 precursors. All of these miRNAs were originally identified by searching with known miRNA from other plant species, such as Arabidopsis and rice [31][32][33][34][35]. Recently, 150 mature miRNAs from 26 families were validated by Illumina sequencing [34]. To do de novo identification of new miRNAs in maize, we have sequenced small RNAs from mixed tissues, tissues of endosperm and embryo using a next generation sequencing system. Moreover, a new method of identifying novel miRNAs, by measuring the precision of miRNA processing from their precursors, was employed. This method, conceptually proposed by Meyer et al., holds that the precise processing from precursor is both necessary and sufficient criterion for miRNA annotation [36]. We report here the establishment of such a method of identifying miRNAs by measuring the precision of miRNA processing from precursors. This method has resulted in 66 newly identified miRNAs and 50 potential miRNAs in maize. Of the 66 newly identified miRNAs, 62 belong to 54 families that have not been identified before in any other organisms.

Sequencing of maize small RNAs
In order to identify novel miRNAs from maize, four different small RNA samples (two from mixed tissues, one from embryo and another from endosperm) of B73 inbred line were sequenced. The sequencing effort resulted in over 43 million signatures with a length of 18~30nt, representing over 15 million unique sequences ( Table 1). The overall size distribution of the sequenced reads from all four sequencing effort were very similar, with the 24-nt class being the most abundant, followed by 22-nt and 21-nt classes (Figure 1). Such a size distribution is consistent with recent report that 22-nt siR-NAs were specifically enriched in maize compared with other plants [37,38]. Although over 43 million sequences were generated, a large number of signatures were only sequenced once, suggesting that maize has a very complex small RNA composition. The percentages of small RNAs sequenced once in four samples were 81.8% (2, 997, 412) and 77.9% (3,227,436) in two mixed tissues, 77.5% (5,339,164) in endosperm and 78.6% (3, 003, 817) in embryo, respectively. As in other small RNA sequencing efforts, there was a small portion of distinct signatures that matched to mitochondria or chloroplast genomes. In the four independently sequenced samples, there were 4.7%, 5.9%, 7.2% and 19% total signatures that respectively represent 0.26%, 0.50%, 0.49% and 1.2% unique reads matched to non-coding RNAs including tRNA, rRNA, snRNA, snoRNA ( Table 2).

Validation of known maize miRNAs in miRBase
There are a total of 84 unique mature miRNA sequences belonging to 28 miRNA family in the current miRBase for maize. All these miRNAs were identified by computational method based on sequence conservation using sequences of known miRNAs of other species [31][32][33][34]. Out of the 84 unique miRNA sequences, 49 can be confirmed by our sequencing Except for zma-miR393, zma-miR1432, zma-miR408, zma-miR482 and zma-miR395, 23 of 28 known maize miRNA families had members detected in at least one of the four sequenced libraries. Some of the conserved miRNAs showed very high abundances in our sequenced libraries, for example, zma-miR156a, b, c, d, e, f, g, h and i had more than 20, 000 reads in our four samples ( Table 3). Sequencing of the four libraries showed that some miRNAs from the current miRNA database may have been mis-annotated. For example, there are two variants for miR166 in the current miRBase. First, zma-miR166b, c, d, e, h, and i are annotated as 22-nt (UCGGACCAGGCUUCAUUCCCC), while zma-miR166a is annotated as 21-nt (UCGGACCAGGCUU-CAUUCCC). The 21-nt form has been sequenced 15432, 10857, 19833 and 37037 times respectively in four databases, while the 22-nt form was only sequenced 240, 260, 711 and 476 times. The 21-nt form is nearly one hundred times more abundant than that of 22-nt, therefore we concluded that zma-miR166b, c, d, e, h and i should have the same mature miRNA of 21-nt as zma-miR166a.
Consistent with the general opinion that the miRNA* degrades soon after the biogenesis of mature miRNA, the miRNA* had much less abundance than its corresponding miRNA in the sequencing dataset. Out of 167 miRNA precursors of maize in the current miR-Base, 143 had miRNA* annotated. Among the annotated miRNA*, 62 of them could be found in our small RNA sequencing libraries. We also found 10 miRNA* among the remaining 25 precursors that have not been annotated before. The total sequencing abundance of miRNA* in our four libraries was about 0.7% of that of mature miRNAs. However, there were two exceptions where miRNA* had more reads than its corresponding miRNA as reported before [20]. The abundance of the originally annotated miRNA* of zma-miR396a and zma-miR396b was much higher (31,120,199, 59 times in four sequenced libraries) than its annotated miRNA (only 16,9,38,20 in the same sequenced libraries). The same thing happened to zma-miR408, whose miRNA was sequenced less than its miRNA*. Both      miRNAs had strong conservation among plant species and their target genes validated [39]. This may suggest that a small fraction of miRNA* do not degrade as fast as others.

Novel miRNA identification and target prediction
During the miRNA biogenesis process, the pri-miRNA transcribed by RNA polymerase II is trimmed and spliced into miRNA/miRNA* duplex by Dicer-like1 (DCL1) [1]. The precise enzymatic cleavage of miRNA/ miRNA* from the precursor is a key criterion that distinguishes miRNAs from diverse siRNA [36]. We observed that, for most miRNA precursors, there were few small RNA reads other than miRNA and miRNA* that mapped to the precursors. To gain an overall pattern of small RNA distribution along the miRNA precursors, we tested the percentage of small RNA reads mapped to position of mature miRNAs vs. reads mapped to other regions of the same miRNA precursors for all known maize miRNAs. The result showed that out of the 120 known miRNA precursors which had mature miRNA expressed in our four small RNA libraries, 104 (86.7%) had over 75% of the small RNA reads mapped to the exact mature miRNA/miRNA* sites or 4-nt around. Having 75% of reads mapped to the miRNA/miRNA* and its close vicinity had recently been proposed as a primary criterion for valid miRNA annotation. Our result further demonstrated that such a precise processing criterion [36] could be used as a straightforward and reliable method to identify the miRNA from the diverse small RNA data.
To identify novel miRNAs using the method described above, maize genome sequences (downloaded from http://www.maizesequence.org) with known transposons masked were used to generate inverted repeat sequences. A total of 330, 048 inverted repeat sequences  with a copy number of no more than 10 in the maize genome were obtained. These inverted repeat sequences were then folded by RNAfold, in both sense and antisense directions, which effectively narrowed down the candidate precursors. Candidate single loop precursors with an overall length of 80-300bp were kept in this study. We then attempted to identify novel miRNAs from our four sequenced RNA samples separately using the precise processing criterion as described in methods ( Figure 2). There were 314 sense and 313 antisense RNAs that qualified as miRNA precursor candidates based on the primary criterion. Finally, the secondary structures of these candidates were carefully checked for their validity as miRNA precursors, along with their corresponding mature miRNAs (Figure 3). There were 13 new miRNAs identified from mixed tissues I, 22 from mixed tissues II, 30 from embryo, 38 from endosperm (Table 4). All together we obtained a total of 66 unique new miRNAs. These new miRNAs could be grouped into 58 families ( Table 4), given that two miRNAs with less than 4 nucleotides mismatches were grouped into one family. Sixty-two of the 66 newly identified miRNAs belonging to 54 families have not been identified before in any other organisms. Since some of the miRNAs are derived from multiple precursors, the 66 newly identified miRNAs correspond to 70 miRNA precursors. The full information and secondary structure were shown in Additional file 1 and Additional file 2.
From the 66 new miRNAs, 16 were sequenced in all four libraries, 17 in three, 15 in two and 18 in one library. The expressions of the 5 newly identified miR-NAs were validated by Northern blot using RNAs from kernel of mixed stages ( Figure 4). As additional evidence to support the annotation of some of these miRNAs, 22 of the 70 new miRNA precursors were found to have miRNA* in our sequencing data (Additional file 1).
The 54 miRNA families that were identified for the first time in maize from our sequencing effort provided an opportunity to identify conserved miRNAs that have not yet been discovered in other plant species. After searching the genomes of sorghum, rice and Arabidopsis, we found 17 conserved in sorghum, 14 in rice and 2 in Arabidopsis (Table 5).
As most miRNAs are near perfect complementary to their corresponding targeted mRNAs, we performed the target prediction by allowing no more than 3 mismatches between miRNA and its corresponding mRNA sequences [40]. After searching in the annotated maize filtered genes set, we found 41 targeted genes for 23 new miRNAs, 2 of which were validated by 5'RACE. GRMZM2G416426 and GRMZM2G037792 were targeted by miRNA3 and miRNA65, respectively ( Figure  5). GRMZM2G416426 was predicted to be an alcohol dehydrogenase 1 (adh1) and GRMZM2G037792 was a GRAS transcription factor. MiRNA65 was identical to miR171a, b, c in Arabidopsis, which is reported to target GRAS transcriptional factor in Arabidopsis [41,42], suggesting that this miRNA and target pairs were conserved among dicot and monocot plants. A complete list of our predicted miRNAs and their predicted targets are shown in Additional file 3. The target gene GRMZM2G401869 of new miRNA4, was annotated to be a ribosomal protein, reported to be regulated by miR-10a in mouse [43]. MiRNA38 was predicted to target a plant specific abscisic acid (ABA) stress-induced protein (GRMZM 2G027241) [44].

Identification of new miRNAs according to the precision of excision from the stem-loop precursor
MiRNAs have been known to play very important post-transcriptional regulation roles throughout plant development. Identifying new miRNA is therefore a critical step towards the understanding of biological regulation. However, small RNA populations in all organisms are extremely complex; while accurate miR-NAs identification is not straightforward. Thus far, the majority of reported miRNAs have been identified by "extending method" [17,[19][20][21][22]. The short reads that resulted from sequencing were mapped to the known reference genome and then candidate precursors were taken by extending upstream and downstream of small map sites. The secondary structures of these extended sequences were then carefully checked for consideration as miRNA precursors. This method typically cost significant computation time, as millions or billions of small RNA sequence generated from sequencing need to be mapped to and extended in the genome individually. For any miRNA precursors, there are other small RNA sequences mapped to 4-nt around the mature miRNA, which often confuse the miRNA annotation. Lacking other supportive information, the appearance of miRNA* is regarded as an essential condition for valid miRNA annotation. However, being degraded after miRNA release, miRNA* has a much lower probability of being sequenced than that of mature miRNA. The annotation of miRNAs based on the appearance of miRNA* would often miss many true miRNAs. As the sequencing becomes relatively easily available with the development of new sequencing technology [45,46], a robust miRNAs identification system has become increasingly important. In this study, we adopted the primary criterion suggested recently by a large group of scientists in the field of plant miRNA [36]. Our method is based on an assumption that: if any sequences with stem-loop secondary structure have 75% of all small RNAs mapped onto this stemloop fall in one distinct position (where the miRNA/ miRNA* locate), then this hairpin sequences should be annotated as a miRNA precursor [36]. The advantages of our new method are apparent; it saves significant  computation time, and the exact sequences of mature miRNAs for all the precursors are easy to determine. However, finding new miRNAs using this method is highly depended on the depth of small RNA sequencing, which is practical only using a next generation sequencing platform. Additionally, our method starting with the prediction of potential miRNA precursors using a very relaxed criterion, it is still possible that some precursors may have been missed, particularly for those of the multi-loop secondary structure. Although our method relied on the precision of excision from the stem-loop precursors, as demonstrated by the small RNA sequencing data, other cleavage patterns of miRNA precursors, such as the extensive degradome sequencing in rice [47], can also be used to verify miRNA prediction. The elegant degradome sequencing results showed that most conserved miRNA precursors were cleaved precisely at the beginning or end of miRNA/miRNA* duplex.

Additional miRNA candidates
Using this new method, we have identified 66 new miR-NAs, 62 of which have not been identified before in any other organism. The discovery of these miRNAs and their targeted genes was a critical step in understanding the complex miRNA regulation network of this important crop.
According to our method, a relative high sequencing depth is required for new miRNAs identification. In our four libraries, unique small RNAs were sequenced an average of 2.6 times. Thus, we have taken 5 as the minimal abundance in the new miRNA prediction. However, some real miRNAs were not sequenced in high enough coverage and were missed. There were 50 small RNAs with a sequencing coverage lower than 5 but higher than 2. At the same time, the corresponding genomic regions of these 50 small RNA fulfill all the criteria for typical miRNA precursors; therefore, these 50 small RNAs are potential miRNA candidates (Additional file 4).

Some miRNA precursors overlap with the protein-coding genes
Based on the maize genome annotation release-5b downloaded from http://www.maizesequence.org/, the genome locations of the 167 known and 70 new miRNA precursors were determined. About 18% of the precursors were located within annotated protein coding genes ( Figure 6). For those miRNAs that fell on genes, 10% overlapped with exons (sense and anti-sense), and 7% were located in intron regions. This result was consistent with the result reported in P. patens [48], where more than half of the miRNA precursors overlapped with protein coding regions.
The small RNA population in maize is highly complicated To identify novel maize miRNA, we conducted four next generation sequencing runs for small RNAs: two mixed tissues, embryo and endosperm. Although we generated over 40 million signatures, sequences from the four databases have a limited overlap, with only 233, 132 unique sequences appeared in all four libraries and a small fraction overlapped between two libraries ( Figure  7). This limited overlap indicates a very large number of small RNAs exist in maize. We noticed that some known miRNAs had very different abundance in the four databases especially between embryo and endosperm: 30 new miRNAs were sequenced either in embryo or endosperm. For example, zma-miR168a, b and zma-miR166a had a very high abundance in the two mixed tissues and the endosperm while they could not be detected in the embryo library, which indicates that they may be endosperm specific. Although their true tissue specificity needs to be further validated through experiments, their relatively high level of expression in embryo or endosperm suggested that they could have important regulatory roles throughout embryo/endosperm development.

Conclusion
We have implemented a novel process of identifying miRNA from small RNA sequencing data by measuring the precision of miRNA processing from precursors. Using this method, 66 novel miRNAs belonging to 54 families have been identified in maize. These newly identified miRNAs can be grouped into 58 families, of which 54 have not been identified in any other species.

Plant Materials and sequencing
B73 inbred was used in our study. Four separated RNA samples were sequenced. Two samples were the mixed tissues of root, stem, leaf, tassel, ear, shoot, pollen and silk. Another two samples were the tissues of endosperm and embryo. The embryo and endosperm were collected 12, 16, 20 and 24 days after pollination. For samples of mixed tissues, RNAs were extracted from 8 tissues separately by using TRIzol reagent (Invitrogen) and then mixed in equally amount for sequencing. The small RNAs of 18-28-nt in length were purified by polyacrylamide gel electrophoresis (PAGE). 3' and 5' adaptors were added for RT-PCR amplification and PCR products were subjected to sequencing. Low quality reads and the adaptor sequences were removed before further analysis.

Novel miRNAs identification and target prediction
The maize genome sequences masked with the MIPs repeat were downloaded from B73 genome project [49] (release-3b.50, http://www.maizesequence.org/. Inverted repeat sequences were extracted by EMBOSS-einverted [50] from the masked genome sequences using parameters "-gap = 20, -threshold = 60, -match = 5, -mismatch = -4, -maxrepeat = 400". As we noticed that a sequence can have different secondary structures in the sense and antisense direction when calculated by RNAfold, the inverted repeat sequences were folded in both directions to retrieve stem-loop sequences with single loops as the candidate miRNA precursors. We then attempted to identify novel miRNAs from four sequenced RNA samples separately. For each sequenced RNA samples, short reads were first mapped to all the candidate precursors. The precursors that fulfilled the following conditions were selected for further analysis: 1) precursor had at least one unique small RNA of 20~22-nt mapped on it, 2) the unique small RNA had at least five identical reads in a sequencing library, 3) the genomic sequences of the unique small RNA are less than 20 copies in B73 genome. For each selected precursor, a particular position that had the highest number of identical 20-22-nt small RNAs mapped to it was regarded as the potential mature miRNA.
Finally, distributions of reads for all the mapped small RNAs for the selected precursors were checked. If the number of small RNAs mapped around the potential mature miRNA (including 4-nt upstream and downstream) account for 75% of all the reads mapped to the precursors, then the candidate precursor was regarded as a true miRNA precursor, while the most abundant small RNA mapped on it was regarded as the mature miRNA.
After screening by the primary criteria, the secondary structures of the precursors were predicted again by RNAfold [51] using additional parameters. The secondary structures of the inverted repeat should satisfy the following: the MFEI [32](minimum free energy calculated by RNAfold divided by the sequence length) should ≤-0.15; the miRNA candidates should be on the stem of the stem-loop sequences; the candidate miRNA and miRNA* should have no more than 5 mismatches. Inverted repeat sequences that passed all the filters were regarded as our new miRNA precursors, and their corresponding mature miRNAs were the small RNA with the largest abundance among the ones mapped to them.
To find additional evidences for the newly identified miRNAs, the expression of the precursors were tested by BLAST analysis with existing expressed sequence tag (EST) database. We also tried to find the miRNA* in our four small RNA libraries as additional evidence for the annotation of new miRNAs.
To find conserved miRNA in Arabidopsis, rice and sorghum for newly identified miRNAs in maize, genome sequences of Arabidopsis, rice and sorghum http://www. arabidopsis.org/, http://www.tigr.org, http://www.phytozome.net/ were downloaded. If the new miRNAs have conserved sequences of no more than 4 mismatches in the genome, we extend the corresponding sequences for further analysis. Two extensions were made: one upstream 30-nt and downstream 300nt, the other upstream 300nt and downstream 30-nt as putative precursors, for the reason that mature miRNA are on the 3' or 5' stem of its precursor. Then the putative miRNA precursors' secondary structures were predicted by RNAfold. If the secondary structure fulfilled the criterion for miRNA precursors, we considered the miRNA conserved in the genome.
The maize annotated coding sequences and Go annotation were downloaded from B73 genome project [49] http://www.maizesequence.org/. release-5b). Because most miRNAs were near perfect matches to their corresponding target mRNA, we identify the miRNA target using BLAST with no more than 3 mismatches between miRNA and target sequences.

New miRNA validation by northern blot
The RNA gel blot hybridization was performed as described previously [52]. Total RNA was extracted using RNA pure plant kit (TIANGEN of Beijing, DP437). The low-molecular-weight (LMW) RNA was detected by 15% polyacrylamide gels, blotted onto Hybond N+ membrane (Amersham BioSciences) using the transferring machine (Bio-Rad Laboratories, Mini Trans-Blot) for 1.5 hours at 200 mA, and UV crosslinked. Blot probes for specific small RNAs were labeled using γ-32 P-dATP by T4 polynucleotide kinase (NEB, M0201). Blots were prehybridized and hybridized at 37°C for 7 and 24 hours respectively using perfectHyb plus hybridization buffer (Sigma, H7033). Blots were washed at 50°C first with 2 × SSC/0.2% SDS for 15 minutes one time, the second wash was carried out using 1 × SSC/ 0.1% SDS for 10 minutes, then repeated.

miRNA target validation by 5' RACE
Total RNA was extracted from maize (B73 inbreed) 14 days seedlings, then treated with RQ1 DNase I