Compared with current knowledge of genomic imprinting in mammalian genomes (i.e. the number of imprinted genes and their regulatory mechanisms), the study of genomic imprinting in plants has been hindered by the low number of imprinted genes reported and studied to date. In this study, we sought to address this by identifying novel imprinted genes in the model plant Arabidopsis thaliana and by considering our results in the light of screens performed by others and of current theories concerning the regulation of imprinting in plants.
We conducted a genome-wide allele-specific expression screen using cDNA-AFLP and identified 93 maternally expressed TDFs from a total of 4500 polymorphic allele-specific TDFs. Some of these may represent candidate maternally expressed genes regulated by imprinting. Although cDNA-AFLP is an early-generation transcriptomics platform, it has distinct advantages over probe hybridisation-based approaches such as microarrays: (a) applicability to any species (including species with no genomic information), (b) low cost and reproducibility, (c) small amounts of RNA template needed, (d) detection of lowly expressed genes and (e) high specificity to distinguish closely related genes [47–50]. However, one of the most time-consuming steps in the cDNA-AFLP technique is the excision of TDFs from gels for sequencing (typically following amplification and/or subcloning into a plasmid). To increase the throughput of gene identification in cDNA-AFLP experiments involving species with sequenced and well-annotated genomes (such as Arabidopsis thaliana, Col-0 accession), we developed a novel bioinformatics program called GenFrag, which identifies the gene represented by each TDF based only on the size of the TDF and the selective nucleotides of the primers used to generate it.
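The idea of identifying a gene from fragment size and selective nucleotides alone can be illustrated with a minimal Python sketch. This is not the GenFrag implementation: the function names, the two recognition sequences (GATC and TTAA), the adapter offset, the size tolerance and the toy transcripts are all assumptions chosen for illustration. Fragment size is approximated as the span covering both recognition sites plus an optional adapter length, which ignores the exact cut positions within each site.

```python
from typing import Dict, List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def predict_fragments(seq: str, site_a: str, site_b: str,
                      adapter_total: int = 0) -> List[Tuple[int, str]]:
    """Predict site_a ... site_b fragments that contain no internal site.

    Returns (approximate size, inner sequence) pairs. The size is a
    simplification: both recognition sites plus the inner sequence plus a
    hypothetical adapter length.
    """
    fragments = []
    start = seq.find(site_a)
    while start != -1:
        inner_start = start + len(site_a)
        end = seq.find(site_b, inner_start)   # first site_b downstream
        next_a = seq.find(site_a, start + 1)  # next site_a downstream
        if end != -1 and (next_a == -1 or next_a > end):
            inner = seq[inner_start:end]
            size = len(site_a) + len(inner) + len(site_b) + adapter_total
            fragments.append((size, inner))
        start = next_a
    return fragments


def match_tdf(transcripts: Dict[str, str], tdf_size: int,
              sel_a: str, sel_b: str,
              site_a: str = "GATC", site_b: str = "TTAA",
              adapter_total: int = 0, tolerance: int = 1) -> List[str]:
    """List transcripts predicted to yield a fragment matching the TDF.

    sel_a and sel_b are the selective nucleotides of the two primers, read
    5'->3' into the fragment from the site_a end and the site_b end.
    """
    hits = []
    for gene_id, raw in transcripts.items():
        seq = raw.upper()
        # Scan both strands so that either orientation of the fragment is found.
        for strand in (seq, reverse_complement(seq)):
            for size, inner in predict_fragments(strand, site_a, site_b,
                                                 adapter_total):
                if (abs(size - tdf_size) <= tolerance
                        and inner.startswith(sel_a)
                        and reverse_complement(inner).startswith(sel_b)):
                    hits.append(gene_id)
                    break
            else:
                continue
            break
    return hits


# Toy transcripts (hypothetical sequences, not real Arabidopsis genes).
transcripts = {
    "toy_gene_1": "AAACGATCACGGGTTTCCATTAAGG",  # right size, right selective bases
    "toy_gene_2": "CCGATCTTGGGCCCAAGTTAAA",     # right size, wrong selective bases
    "toy_gene_3": "GATCACGCATTAA",              # right selective bases, wrong size
}
hits = match_tdf(transcripts, tdf_size=19, sel_a="AC", sel_b="TG")
print(hits)
```

In this toy example only one transcript satisfies both the size and the selective-nucleotide constraints, which is the property that makes a direct lookup possible in a well-annotated genome; when several transcripts match, the TDF would remain ambiguous and would still require sequencing.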
There have been previous efforts to develop bioinformatic approaches to improve the efficiency of (cDNA-)AFLP techniques. The large amount of DNA sequence data available for several species has been used for in silico predictions of virtual transcript profiles. Tailor-made software, such as AFLPinSilico and GenEST [52, 53], allows high-throughput identification of AFLP and cDNA-AFLP TDFs for Arabidopsis thaliana and Globodera rostochiensis, respectively. These in silico approaches were also developed to enable experiment simulations, decreasing both the time needed for AFLP optimisation and the number of samples that need to be processed [51–53]. The GenFrag program developed in this study is designed to facilitate high-throughput direct identification of genes from cDNA-AFLP experiments with fully sequenced, well-annotated genomes such as that of Arabidopsis thaliana. We have made the GenFrag program freely available to the research community at: http://www.nem.wur.nl/UK/Research/bio/.
In our study to identify novel imprinted genes in Arabidopsis thaliana, we applied the GenFrag program to the 93 TDFs displaying a maternal-specific expression pattern, and thereby identified 52 maternally expressed genes (MEGs) (Table 1). By filtering for expression within seeds and enrichment within endosperm tissues, we ranked 18 MEGs on the basis of the absolute difference in their expression levels between the seed coat and the endosperm (Table 2). The identification of MS5-like and PDE120 was also supported by alternative approaches, namely comparison with the dataset of Day et al. (Table 1) and ranking by the ratio of endosperm to seed coat expression (Additional file 6 Table S5). For any given gene expressed in the developing seed, it is difficult to separate the absolute and relative contributions of the different seed tissues, especially given their differing ploidies (triploid in the endosperm, diploid maternal in the seed coat, diploid hybrid in the embryo) and the differences in cellular/nuclear abundance between the tissues. As the contributions to total transcription are normalised against units of RNA, no direct determination of the absolute contribution of each seed tissue is possible. However, by using a biallelically expressed endosperm gene (PHE2) as a positive control, we can demonstrate that biallelic expression in the seed is detectable at the developmental stage sampled (Table 3). Our approach also has the advantage of allowing a focus on highly expressed genes, whose transcripts in seeds at 4 DAP are least likely to have been maternally deposited in the central cell prior to fertilisation. The endosperm is transcriptionally active immediately following fertilisation, such that maternally deposited, long-lived RNAs are unlikely to play an important role or be found at high levels in endosperm tissues at 4 DAP.
This contrasts with the early development of the embryo, where expression is maternally biased (88% of transcripts at the 2–4 cell stage, for example), with paternal alleles subsequently becoming reactivated at the later globular stages of embryo development. Hence, the top-ranked endosperm-enriched genes identified in our study can be considered to be the most likely imprinted genes (Table 2).
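The two ranking criteria mentioned above (absolute difference and ratio of endosperm to seed coat expression) can give different orderings, which is why agreement between them is informative. The following Python sketch shows the principle under stated assumptions: the gene names and expression values are invented, the function name is hypothetical, and the pseudocount added before taking the ratio is an illustrative choice rather than a description of the analysis actually performed.

```python
from typing import Dict, List, Tuple


def rank_endosperm_enriched(
    expression: Dict[str, Tuple[float, float]],
    pseudocount: float = 1.0,
) -> Tuple[List[str], List[str]]:
    """Rank endosperm-enriched genes by two criteria.

    expression maps gene -> (endosperm level, seed coat level).
    Genes that are not higher in endosperm than in seed coat are dropped.
    Returns (ranking by absolute difference, ranking by ratio).
    """
    rows = []
    for gene, (endosperm, seed_coat) in expression.items():
        if endosperm <= seed_coat:
            continue  # not endosperm-enriched
        difference = endosperm - seed_coat
        # The pseudocount avoids division by zero for undetected transcripts.
        ratio = (endosperm + pseudocount) / (seed_coat + pseudocount)
        rows.append((gene, difference, ratio))

    by_difference = [g for g, _, _ in sorted(rows, key=lambda r: r[1], reverse=True)]
    by_ratio = [g for g, _, _ in sorted(rows, key=lambda r: r[2], reverse=True)]
    return by_difference, by_ratio


# Invented expression values (arbitrary units) for four hypothetical genes.
toy_expression = {
    "gene_a": (900.0, 300.0),  # large difference, modest ratio
    "gene_b": (120.0, 10.0),   # small difference, large ratio
    "gene_c": (50.0, 200.0),   # seed-coat enriched, filtered out
    "gene_d": (400.0, 100.0),
}
by_difference, by_ratio = rank_endosperm_enriched(toy_expression)
print(by_difference)
print(by_ratio)
```

Ranking by absolute difference favours highly expressed genes, in line with the focus on highly expressed transcripts described above, whereas ranking by ratio favours genes with low seed coat expression regardless of overall abundance.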
A striking finding in our study is that there is little overlap in the genes detected by the different screens for imprinted genes in Arabidopsis thaliana conducted to date, including our own (Additional file 10 Figure S5). Possible explanations for this lack of overlap include (a) use of different accessions (genetic backgrounds); (b) use of samples from different developmental stages (where the relative abundance and contribution of embryo versus endosperm tissues will differ); (c) use of different filtering criteria; (d) use of different experimental approaches for isolation of seed, embryo and endosperm tissues and of RNA from each tissue; and (e) use of different transcriptome profiling platforms and bioinformatic pipelines. In this study we demonstrate that the imprinted genes we have identified are unlikely to be detected at the later developmental stage used by Hsieh et al., whilst the lack of overlap between the next-generation sequencing approaches of Hsieh et al. (2011) and Wolff et al. is likely attributable to the analysis of different time points (7-8 DAP versus 4 DAP) and different accessions (Col-0 × Ler-0 versus Col-0 × Bur-0). There is some overlap (7 genes) between the RNA sequencing approach applied to Col-0 × Bur-0 crosses and a screen for genes regulated by DMRs in Col-gl × Ler-0 crosses, suggesting that DMRs may control gene-specific imprinting for a limited number of loci, and/or that their ability to do so may vary between genetic backgrounds. Although it seems likely that all these approaches have identified imprinted genes, detection of imprinted loci (gene-specific or allele-specific) appears to depend upon the accessions (genetic backgrounds) used, the developmental stages sampled and the experimental methodology. These factors may introduce significant variation between the results of different studies.
Given the increasing numbers of allele-specific expression effects being detected in plants, it may be opportune for the imprinting research community to develop some common standards for the definition and validation of imprinted genes in flowering plants.
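The comparison of gene lists between screens discussed above amounts to computing set intersections. A minimal Python sketch follows; the screen labels and gene identifiers are placeholders rather than the published gene lists, and the function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def pairwise_overlap(screens: Dict[str, Set[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Return the genes shared by each pair of screens."""
    overlaps = {}
    for name_a, name_b in combinations(sorted(screens), 2):
        overlaps[(name_a, name_b)] = screens[name_a] & screens[name_b]
    return overlaps


# Placeholder gene lists for three hypothetical screens.
toy_screens = {
    "screen_1": {"g1", "g2", "g3"},
    "screen_2": {"g3", "g4"},
    "screen_3": {"g5"},
}
overlaps = pairwise_overlap(toy_screens)
shared_by_all = set.intersection(*toy_screens.values())

for pair, genes in overlaps.items():
    print(pair, sorted(genes))
print("shared by all screens:", sorted(shared_by_all))
```

Reporting pairwise overlaps alongside the accession, developmental stage and platform used in each screen would make it easier to see which of the factors listed above coincides with a given lack of overlap.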
Using LCM, we independently detected expression of the three top-ranked genes (ATCDC48, PDE120 and MS5-like) in 4 DAP seed tissues (seed coat, endosperm and embryo) (Additional file 7 Figure S2). For ATCDC48 and PDE120 we also confirmed that expression was low in pre-fertilisation ovules but increased during the course of seed development (Figure 1A, B), which is consistent with these genes being expressed post-fertilisation in the developing seed (i.e. their transcripts are not maternally deposited). We confirmed that all three of these endosperm-expressed genes are maternally expressed in 4 DAP reciprocal F1 hybrid seeds from different accessions and hence represent novel cases of gene-specific imprinting in Arabidopsis thaliana (Figures 2 and 3). While ATCDC48 and PDE120 are subject to binary imprinted expression, MS5-like shows a preferential maternal expression pattern of imprinting [9, 21], as some paternal expression is also detected (Figure 2). Although the expression levels of MS5-like were similar in Col-0 and Ler-0 (Figure 1), and in the pattern determined for Ws-0 (Seed Genes Network), the extent of imprinting did vary, with the C24 and Bur-0 alleles displaying a greater extent of imprinting when paternally inherited.
ICRs of imprinted genes often overlap with DMRs. Hence, we considered that our top-ranked imprinted genes ATCDC48, PDE120 and MS5-like might contain candidate DMRs in their genomic vicinity and that, if so, these could be candidate ICRs. We identified DMRs upstream of PDE120 and one DMR downstream of ATCDC48 that could potentially act as ICRs (Figures 4A and 4B). However, the difference in methylation between wild-type and dme endosperm did not reveal any DMR for MS5-like (Figure 4C). Expression of DME in the central cell leads to hypomethylation of the maternal genome. However, the methylation data used represent the global methylation status of both the maternal and paternal genomes of the endosperm. This could explain why no DMR could be identified for MS5-like. Alternatively, control of imprinting at the MS5-like locus may be independent of DNA methylation, or be regulated by a DMR far distal to the gene. Methylation-independent imprinting has been observed for some imprinted loci in mammals, and histone methylation by Polycomb group proteins has been shown to regulate several imprinted genes in plants [37, 44, 55]. Our results indicate that lack of MET1 in the male gamete has no effect on imprinting of ATCDC48, PDE120 and MS5-like in the developing seed. In contrast, we find that lack of MET1 leads to overexpression of ATCDC48 and PDE120 in vegetative leaf tissues. No effects of lack of MET1 in vegetative tissues were observed for MS5-like. Taking into consideration recent findings and previous reports showing that PcG complexes regulate imprinting [37, 44–46], we also tested for possible effects of the maternal FIS-complex on regulation of the three maternally expressed imprinted genes and found that fertilising fis2 plants with wild-type pollen did not lead to any loss of imprinting. Hence, alternative epigenetic pathways are likely to regulate imprinting of MS5-like. Such regulation cannot be ruled out for ATCDC48 and PDE120 either.
Further characterization of the imprinted ATCDC48, PDE120 and MS5-like loci will provide opportunities for increasing our understanding of the epigenetic mechanisms involved in the regulation of genomic imprinting in angiosperms.
The maternally expressed imprinted gene ATCDC48A encodes a homohexameric AAA(+) ATPase chaperone implicated in cell cycle control and cell proliferation. CDC48/p97 is a highly conserved protein which plays a role as an initiation factor for DNA replication in many species and has been shown to be essential in a wide range of multicellular and unicellular organisms. In plants, the CDC48A protein has been shown to physically interact with the SOMATIC EMBRYOGENESIS RECEPTOR LIKE KINASE 1 (SERK1) protein [58, 59]. The Arabidopsis thaliana genome contains three CDC48 loci, ATCDC48A (At3g09840), ATCDC48B (At3g53230) and ATCDC48C (At5g03340). ATCDC48A can functionally complement CDC48 mutants of Saccharomyces cerevisiae, and loss of the PUX1 negative regulator of ATCDC48 leads to accelerated plant growth due to increased cell division and expansion. Additional studies in Arabidopsis thaliana conducted with T-DNA knockout lines of ATCDC48A have demonstrated that homozygous null seedlings are viable until 5 days old but die shortly thereafter. It was also demonstrated that null Atcdc48a alleles have a drastically reduced transmission efficiency through the male gametophyte (i.e. ATCDC48A is essential for normal pollen germination and tube elongation).
Our results indicate that ATCDC48A is maternally expressed and subject to genomic imprinting in the developing seed (endosperm) (Figures 1, 2 and 3). Although the imprinting status of the maize homolog of ATCDC48A has not yet been determined, it is possible that imprinting of this homolog (or of other cell-cycle genes) could be responsible for the dosage effects on cell-cycle progression observed in endosperm from interploidy crosses of maize. While a clear role for ATCDC48 in the control of DNA replication in plant cells has not yet been established, our finding that ATCDC48 is a maternally expressed imprinted gene in developing endosperm resonates with a role in controlling proliferation, as suggested for imprinted genes by the parental conflict theory.
Less is known from a functional perspective regarding the other two imprinted genes identified in this study. The MS5-like maternally expressed imprinted gene has sequence similarity to Male Sterile 5 (MS5), a gene that has been shown to be essential for male meiosis in Arabidopsis thaliana. MS5-like also displays sequence similarity with the sulphur deficiency-induced gene AtSDI1.
The maternally expressed imprinted gene PDE120 is annotated as a pigment defective embryo (pde) mutant in the SeedGenes database [64, 65]. The nuclear-encoded PDE120 locus encodes the TIC40 protein, which is a component of the protein import apparatus of the inner envelope of the chloroplast. The identification of a maternally expressed imprinted nuclear gene which encodes a protein product targeted to the maternally inherited chloroplasts could be suggestive of selection for imprinting at nuclear loci where strong control of chloroplast function by maternally inherited alleles is essential.