SiRNA transcriptome sequencing
We sequenced 12 siRNA samples including the kernels (0, 3 and 5 DAP) and endosperms (7, 10 and 15 DAP) harvested from the maize B73 and Mo17 reciprocal crosses. On average, 13.6 million adaptor-trimmed siRNA reads were produced for each sample, ranging from 18 to 27 nucleotides (nt) in length. The three most abundant classes were 24-, 22- and 21-nt siRNAs, representing approximately 55.3%, 13.4% and 12.6% of the total siRNAs, respectively (Additional file 1: Figure S1). The kernel samples exhibited higher proportions of 24-nt siRNAs (70.7% at 0 DAP, 68.1% at 3 DAP and 64.4% at 5 DAP) compared with the endosperm samples (36.6% at 7 DAP, 50.1% at 10 DAP and 41.8% at 15 DAP), whereas the proportions of 21- and 22-nt siRNAs in the endosperm samples were higher (37.4% at 7 DAP, 26.7% at 10 DAP and 34.9% at 15 DAP) compared with those of the kernel samples (17.4% at 0 DAP, 18.0% at 3 DAP and 21.2% at 5 DAP) (Additional file 1: Figure S1). The siRNA reads of 12 samples were mapped to the B73 reference genome (release 5.6b, http://MaizeSequence.org), and only the perfectly matched reads were retained for further analysis. Approximately 37.6% of the reads were uniquely mapped on the B73 genome, whereas 58.8%, 68.2% and 74.0% of the reads were mapped on the B73 genome, respectively, when maximally 10, 100 and 1,000 multiple matched locations for a read (multi-reads) were allowed (Additional file 2: Table S1).
Genomic and genic distributions of 24-nt siRNAs
We identified and classified genomic loci generating siRNAs based on previous methods [24],[25] (Methods), and these siRNA loci were further defined in terms of their dominant siRNA population representing more than 50% of the total siRNAs at a locus. When only using uniquely mapped reads, 416,100 siRNA loci were identified across all ten chromosomes; when multi-reads were included, 644,496, 836,618 and 931,924 siRNA loci were identified with the maximum mapping locations of 10, 100 and 1,000, respectively. The chromosomal distribution demonstrated that siRNA loci were preferentially enriched in high-density gene regions and chromosomal end regions but devoid in repeat-rich pericentromeric and centromeric regions (Figure 1), which is consistent with the observation in maize shoots that only a small proportion of siRNAs are generated from the pericentromeric and centromeric regions [26]. To include more information and better reflect the original siRNA situation, 931,924 siRNA loci containing multi-reads (≤1000 locations) were used for further analysis, with more than 90% of the siRNA loci predominantly producing 24-nt siRNAs and 3% and 1% of the loci being enriched with 21- and 22-nt siRNAs, respectively (Additional file 3: Table S2).
To statistically evaluate the association of the 21-, 22- and 24-nt siRNA loci with genic, intergenic and repetitive regions, we compared the observed occurrence with the simulated occurrence of 21-, 22- and 24-nt siRNA loci in association with the three genomic region types (Mehtods). The 21-nt siRNA loci showed a significantly high degree of association with genic regions (64.6% observed vs. 10.7% simulated) and a low with intergenic regions (17.6% observed vs. 75.4% simulated); the 22-nt siRNA loci had a relatively high degree of association with repetitive regions (92% observed vs. 72% simulated) but low with intergenic regions (3.8% observed vs. 17.6% simulated). In contrast, the 24-nt siRNA loci showed no obvious enriched tendency within any genomic region (Additional file 4: Table S3).
Furthermore, we examined the detailed distributions of 24-nt siRNAs by plotting their relative densities along protein-coding genes, transposable elements (TEs) and pseudogenes plus the 1-kb upstream and 1-kb downstream regions. During the density calculation, the contribution of a multi-read to a locus was divided by its mappable location number to minimize the potential bias. Interestingly, different patterns of 24-nt siRNA enrichment were observed in different gene classes, genic territories, developmental stages and strands (Figure 2). First, the siRNA density was significantly higher in protein coding genes than in pseudogenes and TEs in terms of the 1-kb flanking regions, and the protein-coding genes themselves exhibited a much greater siRNA density in the 1-kb flanking regions compared with the gene body regions. Second, 24-nt siRNAs were more enriched on the antisense strands than on the sense strands of protein-coding genes. Third, the overall density of 24-nt siRNAs in the kernels was much greater than in the endosperm, and the 24-nt siRNAs in the 10-DAP endosperm were more dense than those in the 7- and 15-DAP endosperms (Figure 2).
Activation of paternal siRNAs in maize kernels and endosperms
The paternal genome inherited from the sperm is initiated in a coordinated manner upon double fertilization. Supported by the results from the high throughput sequencing and comprehensive SNP information from the comparison of the fully assembled Mo17 and reference B73 genomes [22], we successfully differentiated the parental origin of the siRNA reads for more than 140,000 siRNA loci, enabling us to examine the process of paternal siRNA activation during the early stages of maize kernel development. Because long gaps caused by insufficient reads in the assembled Mo17 genome could lead to inevitable bias, siRNA loci with gaps of longer than 5% of their own length were not considered (74,521 siRNA loci remaining). To minimize the influence of ambiguous mapping, siRNA loci containing paternal reads in un-pollinated 0-DAP kernels or with low expression level (Materials and Methods) were removed, and 39,126 siRNA loci were retained for further analysis.
By plotting the paternal versus maternal expression of the remaining siRNA loci, we observed a gradual activation pattern of paternal siRNAs after fertilization in 3- and 5-DAP kernels in terms of the numbers of expressed paternal siRNA loci and their increased expression levels (Figure 3A). From 0 to 3 DAP and from 3 to 5 DAP, 1,165 loci with 3,112 paternal siRNAs and 1,372 loci with 3,956 paternal siRNAs, respectively, were activated. The relatively small proportion of activated siRNA loci and low expression levels were most likely caused by a large proportion of maternal tissue. Even so, we can still conclude that the paternal siRNAs are initiated earlier in maize kernels than previously reported in Arabidopsis seeds, in which, the paternal siRNAs remained silent until 5 DAP [27]. However, this predominantly maternally biased expression pattern of siRNAs in the kernels may not necessarily indicate a delayed activation of siRNA paternal alleles compared with the maternal alleles because more than 99% of the loci that were identified in the endosperm samples were also detected in the kernel samples.
In the 7-DAP endosperm, the parental siRNAs reached a normal dosage ratio of 2 m:1p, implying an early full paternal siRNA loci activation and a balanced siRNA biogenesis between the two parental genomes (Figure 3a and b). Interestingly, the proportion of parental siRNAs was the same as that of the protein-coding genes in the 7-DAP endosperm, consistent with the 2 m:1p parental genome composition [22]. Compared with the 1,372 paternally activated loci with 3,956 paternal reads in the 5-DAP kernels, 15,172 loci with 171,235 paternal reads, 13,381 loci with 111,678 paternal reads and 14,321 loci with 159,329 paternal reads were found in the 7-, 10- and 15-DAP endosperms, respectively (Figure 3b). The large difference in paternal siRNA expression between the 5-DAP kernels and the 7-DAP endosperm also supports our speculation that the paternal alleles are likely initiated earlier in the endosperm; rather than an abrupt siRNA burst from the paternal genome within only 2 days, because these alleles were hardly sampled in the kernels due to the high proportions of maternal tissues around the endosperm and embryo. In the 10-DAP endosperm, we observed a global maternal bias of siRNA expression deviating from 2 m:1p, consistent with the report in Arabidopsis endosperm in which an abrupt increase of maternal Pol IV-dependent siRNAs occurred from 4 to 6 DAP, followed by reduced or even undetectable maternal p4-siRNA expression levels during the later developmental stages [17].
Dynamic expression of imprinted siRNA loci
Based on the parental expression plotting of the siRNA loci, we observed a subset of siRNAs that were expressed maternal-only or paternal-only in a parent-of-origin pattern (Figure 3A), and this type of biased expression of siRNAs has been previously reported in rice and Arabidopsis[17],[18]. To avoid the influence of maternal tissue in the maize kernels, we decided to identify imprinted siRNA loci only in 7-, 10- and 15-DAP pure endosperms using the following procedures. First, a set of approximately 140,000 identified SNP-containing siRNA loci were prescreened using two criteria: the locus on the Mo17 genome must not contain a gap ≥ 5% of its length, and there must be no paternally derived reads mapped in the 0-DAP kernel. Then, the χ2 test was performed to examine whether the observed ratios of the maternally versus paternally derived reads significantly deviated from the normal ratio of 2 m:1p in each cross. Furthermore, a more stringent cutoff was employed to precisely define the MESL and paternally expressed siRNA loci (PESL), respectively, namely at least 90% of the maternal reads or 70% of the paternal reads from the total SNP-associated reads (with a minimum of 20 reads) mapped to a locus in both reciprocal crosses. At the significance level of p = 0.01, the parental expression ratio of 2,220, 2,451 and 2,136 siRNA loci were detected to have a significant deviation from 2 m:1p in both of the reciprocal crosses, and among which, 59, 1,188 and 110 loci exhibited a parent-of-origin pattern at 7, 10 and 15 DAP, respectively. Finally, a set of 460 candidate imprinted siRNA loci were retained in the three endosperm samples based on the reads count cutoff, including 456 MESL and only 4 PESL. Except for three and thirteen loci that were simultaneously imprinted at three and two stages, respectively, the majority of the MESL (444 of the 456) were stage-specific, with 394 (88.7%) occurring only at 10 DAP (Figure 4a and Additional file 5: Table S5). By plotting the maternal versus the paternal expression of all the imprinted siRNA loci using SNP-containing reads in the 7-, 10- and 15-DAP endosperms, we found that the expression levels of the 10-DAP-specific MESL peaked at 10 DAP and decreased at 7 and 15 DAP, and their parental expression ratio also dynamically changed: at 7-DAP, these MESL already exhibited a maternally biased expression pattern, and their parental expression ratios were significantly deviated from the expected 2 m:1p ratio (χ2 test, p ≤ 0.01) but failed to pass the ratio cutoff (90%). Later, at 10 DAP, their maternal bias was more deviated due to the elevated siRNA expression of the maternal alleles and satisfied our imprinting screen criteria; However, these MESL changed to be biallelically expressed at 15 DAP (Figure 4B and C). Conclusively, the transient imprinting status of the 10-DAP-specific MESL was likely attributed to the dynamically changed expression level and parental dosage ratio at 10 DAP compared with 7 and 15 DAP.
Imprinted genes might be influenced by the adjacent imprinted siRNA loci
The stage-dependent imprinting pattern of the 10-DAP-specific MESL is reminiscent of the enrichment of 10-DAP-specific MEGs reported by Xin et al. [22] using the same set of maize endosperm smaples. To examine whether the expression of imprinted genes were potentially influenced by their adjacent imprinted siRNA loci, we specifically examined imprinted siRNAs within the 2-kb flanking regions of the 371 imprinted genes combined from the previous three studies [22],[28],[29]. Of the 371 imprinted genes, 357 were associated with a siRNA locus, including 298 that harbored SNP-containing siRNA loci. A total of 166 imprinted genes were further excluded because their harbored siRNA loci contained a gap longer than 5% of its length on the Mo17 genome or contained paternal reads at 0 DAP. Of the remaining 132 siRNA loci, only 31 siRNA loci adjacent to 29 imprinted genes had a minimum of 20 siRNA reads and were included in the final list for the determination of their imprinting status. Finally, 13 of the 29 imprinted genes (13 paternally expressed genes (PEGs) and 16 MEGs) contained strictly defined imprinted siRNA loci located within their 2-kb flanking regions. This fraction (13/29) was considered a significant co-occurrence of imprinted genes and imprinted siRNA loci based on the simulation analysis (Figure 5a). Specifically, we randomly placed the same amount of imprinted genes onto the genome and counted how many imprinted siRNA loci were located within their 0.1, 0.5-, 1-, 2-, 5- and 10-kb flanking regions. The simulation was performed 1,000 times to generate a series of empirical distributions of the numbers of paired imprinted genes and imprinted siRNA loci. The distributions indicated that the chance of 13 co-occurrence of imprinted genes and imprinted siRNA loci was significantly higher than the expected numbers generated by the simulations, examined by the hypergeometric distribution test (p = 2.122 × 10−6) (Figure 5a).
Among the thirteen pairs of associated imprinted protein-coding genes and imprinted siRNA loci, five pairs were MEGs and MESL, one pair was a PEG and PESL, and the remaining seven pairs were PEGs and MESL (Figure 5b). The last seven pairs were potentially interesting due to their opposite parental expression patterns because maternally derived siRNAs might be involved in the epigenetic silencing of the maternal alleles in cis, causing the paternal allele-specific expression of their adjacent imprinted genes. For example, the PEG GRMZM2G139406, encoding a C2H2-type zinc finger protein was highly expressed in the 7-DAP endosperm, but the expression level decreased at 10 DAP and undetectable at 15 DAP, harboring a MESL exhibiting an opposite imprinting status in the three stages (Additional file 6: Figure S2).
SiRNAs are likely to participate in nutrient uptake and allocation in the developing maize endosperm
Genomic distribution of siRNA loci was preferentially enriched in high-density gene and chromosome ends regions and there was a higher density of siRNAs flanking protein-coding genes compared with TEs and pseudogenes (Figures 1 and 2). To better understand the potential functions of siRNAs in maize kernels and endosperms, we classified these siRNA loci as kernel-specific and endosperm-specific based on a cutoff of a five-fold difference in average siRNA abundance among the three kernel stages (0, 3, 5 DAP) and three endosperm stages (7, 10, 15 DAP). We identified 453 kernel-specific and 729 endosperm-specific loci from 51,071 siRNA loci containing at least 1,000 reads combined from the 12 samples. A hierarchical clustering of these siRNA loci showed a higher degree of expression changes in the endosperm compared with that in the kernels which may indicates a more dynamic activity of siRNA loci in the developing endosperm (Figure 6a). A Blast2GO analysis showed that 316 genes harboring endosperm-specific siRNA loci within their 2-kb flanking regions were predominantly enriched in the “RNA splicing via transesterification reactions”, “DNA recombination” and “protein-DNA complex assembly” terms (Additional file 7: Table S4). Interestingly, these genes were also enriched in the “regulation of telomere maintenance” category, consistent with the dense distribution of siRNAs in the maize chromosome ends (Figure 1). In contrast, the 257 genes harboring kernel-specific siRNA loci were significantly enriched in the “response to cytokinin stimulus” and “response to organic nitrogen” GO functional categories (Additional file 7: Table S4).
Furthermore, we identified 27, 776 and 270 stage-enriched siRNA loci in the 7-, 10- and 15-DAP endosperms, respectively, using the same criteria as above (Figure 6b). A total of 12, 551 and 103 protein-coding genes, respectively, contained these corresponding stage-specific siRNA loci in their 2-kb flanking regions. Genes harboring 10-DAP-specific siRNA loci showed a significant enrichment in the “response to auxin stimulus”, “response to brassinosteroid stimulus” and “regulation of gene expression” groups (Figure 6C and Additional file 7: Table S4). This result is consistent with our previous observation in the RNA-Seq data in which 20 genes encoding auxin responsive factor (ARF) showed elevated expression at 10 DAP, followed by decreased expression at 15 DAP [22]. This result is also coincident with the report that the IAA concentration abruptly increased in 10-DAP endosperm [30] and then decreased at approximately 15 DAP [31],[32].
In contrast, genes harboring 15-DAP-specific siRNA loci were enriched in the “nutrient reservoir activity”, “protein localization to vacuole” and “secondary metabolite biosynthetic process” categories, including four members of the zein storage protein family (Additional file 7: Table S4). Interestingly, this result is consistent with the observation that 15 DAP is the stage of rapid fresh weight increase and the biosynthesis of zein proteins and starches [23]. Although the exact mechanism of how these stage-specific siRNA loci are involved in the hormone response and nutrient accumulation remains unclear, our data indicates that siRNA expression may correspond to specific cellular or biochemical processes that are essential for endosperm or kernel development.