MITE insertion-dependent expression of CitRKD1 with a RWP-RK domain regulates somatic embryogenesis in citrus nucellar tissues

Background Somatic embryogenesis in nucellar tissues is widely recognized to induce polyembryony in major citrus varieties such as sweet oranges, satsuma mandarins and lemons. This capability for apomixis is attractive in agricultural production systems using hybrid seeds, and many studies have been performed to elucidate the molecular mechanisms of various types of apomixis. To identify the gene responsible for somatic embryogenesis in citrus, a custom oligo-DNA microarray including predicted genes in the citrus polyembryonic locus was used to compare the expression profiles in reproductive tissues between monoembryonic and polyembryonic varieties. The full length of CitRKD1, which was identified as a candidate gene responsible for citrus somatic embryogenesis, was isolated from satsuma mandarin and its molecular function was investigated using transgenic ‘Hamlin’ sweet orange by antisense-overexpression. Results The candidate gene CitRKD1, predominantly transcribed in reproductive tissues of polyembryonic varieties, is a member of the plant RWP-RK domain-containing protein family. CitRKD1 of satsuma mandarin comprised two alleles (CitRKD1-mg1 and CitRKD1-mg2) at the polyembryonic locus controlling embryonic type (mono/polyembryony) that were structurally divided into two types with or without a miniature inverted-repeat transposable element (MITE)-like insertion in the upstream region. CitRKD1-mg2 with the MITE insertion was the predominant transcript in flowers and young fruits where somatic embryogenesis of nucellar cells occurred. Loss of CitRKD1 function by antisense-overexpression abolished somatic embryogenesis in transgenic sweet orange, and DNA diagnosis confirmed that the transgenic T1 plants derived from zygotic embryos produced by self-pollination.
Genotyping PCR analysis of 95 citrus traditional and breeding varieties revealed that the CitRKD1 allele with the MITE insertion (polyembryonic allele) was dominant and major citrus varieties with the polyembryonic allele produced polyembryonic seeds. Conclusion CitRKD1 at the polyembryonic locus plays a principal role in regulating citrus somatic embryogenesis. CitRKD1 comprised multiple alleles that were divided into two types, polyembryonic alleles with a MITE insertion in the upstream region and monoembryonic alleles without it. CitRKD1 was transcribed in reproductive tissues of polyembryonic varieties with the polyembryonic allele. The MITE insertion in the upstream region of CitRKD1 might be involved in regulating the transcription of CitRKD1. Electronic supplementary material The online version of this article (10.1186/s12870-018-1369-3) contains supplementary material, which is available to authorized users.


Background
Somatic embryogenesis in nucellar tissues of citrus species is an apomictic system, and genetically uniform clones with the same genotype as the maternal plant can feasibly be produced by sowing the seeds despite their highly heterozygous genomes [16]. This capability for apomixis is attractive in agricultural production systems using hybrid seeds, and many researchers have investigated the molecular mechanisms of various types of apomixis [20]. Among the various types of apomixis, citrus apomixis, in which a somatic embryo develops in nucellar tissues, is classified as sporophytic apomixis [37]. Major citrus varieties, such as satsuma mandarin (Citrus unshiu Marc.), sweet orange (C. sinensis (L.) Osbeck), grapefruit (C. paradisi Macfad.), and ponkan mandarin (C. reticulata Blanco), generally develop one or more somatic embryos that are genetically identical to the mother tree in addition to or instead of a zygotic embryo in the seed. This ability to generate multiple somatic embryos and a zygotic embryo in the same ovary tissue is called polyembryony in citrus. In citrus breeding, polyembryony frequently hampers efforts to obtain zygotic embryos from sexual crosses because somatic embryos grow preferentially over zygotic embryos. Therefore, monoembryonic varieties are generally selected as the seed parent in cross breeding, which limits breeding because it reduces the available mating combinations. In contrast, polyembryony is useful in rootstock propagation because genetically uniform rootstocks can be prepared simply by sowing seeds, despite the highly heterozygous genomes of citrus species.
To date, various studies have been conducted to investigate the molecular mechanism of citrus adventive embryogenesis, as well as those of other types of apomixis [20]. Polyembryony is dominantly inherited by offspring according to observations of segregation in various cross populations [15]. It is conceivable that a single or a few genes are involved in somatic embryogenesis, and several molecular markers linked to a polyembryonic locus controlling embryonic type (mono/polyembryony) have been developed [11,17,29]. In our previous studies [29,30], a major polyembryonic locus was located on linkage group 1 of the mandarin standard genetic map (AGI map) [36] and scaffold 1 of the clementine mandarin (C. clementina hort. ex Tanaka) genome sequence [42]. Subsequently, molecular tagging of the polyembryonic locus and construction of haplotype-specific bacterial artificial chromosome (BAC) contigs for the polyembryonic locus were carried out. Thereafter, the genomic region of the polyembryonic locus spanning approximately 380 kbp was sequenced and 70 open reading frames (ORFs) were predicted from genomic sequences [28]. Transcription-based approaches have been used to explore the genes associated with citrus somatic embryogenesis. Various genes with specific transcription profiles in either monoembryonic or polyembryonic varieties have been identified by suppression subtractive hybridization (SSH) and microarray analyses [10,22,27]. In these studies, heat shock-related proteins (HSPs) were predominantly expressed among polyembryonic variety genes, as well as WRKY, WD40, and serine carboxypeptidase (SCP) genes.
Recently, next-generation sequencing (NGS) technology has allowed rapid and comprehensive sequencing analyses for whole genomes and transcripts of target tissues and organisms. Using NGS technology, the regulatory genes involved in somatic embryogenesis were explored and it was proposed that miRN23-5p-Cs9g06920, a micro-RNA (miRNA, a type of non-coding RNA with a typical length of 20-24 nucleotides), likely has a major role in regulating somatic embryogenesis [23]. CitRWP, which encodes a RWP-RK domain-containing protein [35], was also proposed to regulate somatic embryogenesis, because an insertion of a miniature inverted-repeat transposable element (MITE) was found only in the CitRWP genes of polyembryonic citrus varieties in an NGS-based comparative genomic sequence analysis, and the transcription level of CitRWP in their ovules was higher than in monoembryonic varieties [41]. The MITE co-segregated with poly/monoembryonic type in a segregating population of 217 seedlings. In Arabidopsis, RKD genes containing the RWP-RK domain have been characterized as important regulators of an egg cell-related gene expression program [35]. However, functional validation of these candidate genes identified through transcriptome and NGS analyses remains to be done.
Here, to identify the candidate RKD gene (CitRKD1) responsible for somatic embryogenesis in citrus, a custom oligo-DNA microarray was designed using ORFs newly predicted from a 380 kbp draft sequence of the polyembryonic locus [28], and was used to compare the transcription profiles in reproductive tissues between monoembryonic and polyembryonic varieties. The full length of CitRKD1 was isolated from satsuma mandarin and its molecular function was investigated using transgenic 'Hamlin' sweet orange by antisense-overexpression. The examined citrus varieties contained a pair of CitRKD1 alleles at the polyembryonic locus. The allele with a MITE insertion in the upstream region was highly expressed in reproductive tissues and is likely involved in somatic embryogenesis. We also developed a poly/monoembryony discriminating DNA marker using the conserved sequences of CitRKD1 alleles that could help increase genetic diversity in citrus breeding.

Microarray analysis to identify the candidate gene regulating citrus somatic embryogenesis
To identify the principal gene regulating citrus somatic embryogenesis in the polyembryonic locus, a custom oligo-DNA microarray was designed using the draft sequence. In total, 391 probes including multiple probes for each newly predicted ORF (named NP-ORF to discriminate them from previously reported ORFs for the polyembryonic locus [28]) and 29,148 probes for mRNA loci of the clementine mandarin genome were mounted on the custom oligo-DNA microarray, in which 50 genes overlapped between the polyembryonic locus and mRNA loci of the clementine genome sequence. This custom oligo-DNA microarray was used to compare gene expression patterns among young whole fruits at 15, 30, 45 and 60 days after flowering (DAF) of 'Kiyomi' (C. unshiu Marc. × C. sinensis L. Osbeck) as a monoembryonic variety and 'Harumi' ('Kiyomi' × C. reticulata Blanco) as a polyembryonic variety. Under strict filter conditions, 12 of the 391 probes in the polyembryonic locus consistently showed significant expression changes greater than 2-fold between 'Kiyomi' and 'Harumi' throughout the experimental period. These probes were derived from NP-ORF24 (3 probes), NP-ORF44 (6 probes) and NP-ORF60 (3 probes). The expression patterns of these candidate genes are shown in Fig. 1, using the average expression values of their corresponding probes. NP-ORF24 primarily showed homology to ciclev10003992m in scaffold 5 of the clementine mandarin genome sequence, which was a different region from the polyembryonic locus. It was annotated as phosphoribosylaminoimidazole carboxylase but had partial homology to other loci in the clementine mandarin genome, implying that it may be a member of a multigene family. The expression values of NP-ORF24 in 'Kiyomi' tended to be higher than those in 'Harumi', but the standard deviation was very large across the whole experiment. NP-ORF44 showed high homology to Ciclev10010497m, which was annotated as a RWP-RK domain-containing protein.
The expression values of NP-ORF44 in 'Harumi' were significantly higher than those in 'Kiyomi'. NP-ORF60 showed high homology to ciclev10009286m, which was annotated as a MYB-like DNA-binding protein.
The expression value of NP-ORF60 remained high until 30 DAF and then decreased to its lowest level at 45 DAF, with parallel expression in the two varieties. Reverse transcription-polymerase chain reaction (RT-PCR) was carried out to confirm these expression patterns using cDNA derived from young whole fruits at 15 and 30 DAF of three monoembryonic varieties ('Kiyomi', clementine mandarin, Mato buntan pummelo (Citrus grandis L. Osbeck)) and three polyembryonic varieties ('Harumi', satsuma mandarin, ponkan mandarin) (Fig. 2). Only NP-ORF44 showed a clear difference among them. PCR fragments of various sizes ranging from 400 bp to 800 bp were observed for NP-ORF24, although a single fragment of 400 bp was expected. This pattern was in agreement with a BLAST search result suggesting that NP-ORF24 might have multiple paralogs at various loci in citrus. According to these results, NP-ORF44 was selected out of 79 NP-ORFs in the polyembryonic locus region as a candidate for the gene that regulates somatic embryogenesis.
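The probe selection described above (a change greater than 2-fold between the two varieties at every sampled time point) can be illustrated with a minimal sketch. The signal values, the probe names and the function `consistent_fold_change` are hypothetical, and the additional "strict filter conditions" and significance testing used in the study are not specified in the text and are not reproduced here.

```python
import math

# Hypothetical normalized signals at 15, 30, 45 and 60 DAF (not data from the study).
signals = {
    "probe_A": {"Kiyomi": [50.0, 55.0, 48.0, 52.0], "Harumi": [400.0, 380.0, 210.0, 120.0]},
    "probe_B": {"Kiyomi": [300.0, 310.0, 90.0, 100.0], "Harumi": [290.0, 305.0, 95.0, 105.0]},
    "probe_C": {"Kiyomi": [80.0, 20.0, 80.0, 20.0], "Harumi": [20.0, 80.0, 20.0, 80.0]},
}


def consistent_fold_change(mono, poly, threshold=2.0):
    """Return 'up' or 'down' if poly/mono exceeds the fold threshold in the
    same direction at every time point; otherwise return None."""
    log_ratios = [math.log2(p / m) for m, p in zip(mono, poly)]
    cutoff = math.log2(threshold)
    if all(r > cutoff for r in log_ratios):
        return "up"      # higher in the polyembryonic variety throughout
    if all(r < -cutoff for r in log_ratios):
        return "down"    # lower in the polyembryonic variety throughout
    return None          # inconsistent direction or change too small


# Keep only probes that pass the filter at all four time points.
selected = {}
for name, values in signals.items():
    direction = consistent_fold_change(values["Kiyomi"], values["Harumi"])
    if direction is not None:
        selected[name] = direction

print(selected)
```

Requiring the same direction at every time point excludes probes such as the hypothetical `probe_C`, which exceeds 2-fold at each time point but alternates between varieties.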
Among the 29,148 probes generated from mRNA sequences of the clementine mandarin genome, 356 genes including NP-ORF44 (Ciclev10010497m) showed more than 2-fold expression changes between 'Kiyomi' and 'Harumi' throughout the experimental period (Additional file 1: Table S1 and Additional file 2: Table S2). Eighty-five genes were highly expressed, while 270 genes were weakly expressed, in reproductive tissues of 'Harumi' compared with those of 'Kiyomi'. Among them were several genes commonly identified in past reports that are likely associated with somatic embryogenesis, such as UDP-glycosyltransferase superfamily proteins, ankyrin repeat family proteins, heat shock proteins, and protein kinases [22,23,27]. These genes were reported to be involved in oxidative stress responses and callose deposition in the cell. GO term enrichment analysis was carried out to interpret their biological functions (Fig. 3). Nucleosome assembly was the most enriched term in polyembryonic reproductive tissues, followed by lipid metabolic process, oxidation-reduction process, and hydrolase activity. Conversely, transmembrane transport, oxidoreductase activity, proteolysis, and RNA binding processes were less active. Although some of these terms might reflect differences in genome composition between 'Harumi' and 'Kiyomi', this result indicated that polyembryonic reproductive tissues undergo more active cell proliferation than monoembryonic reproductive tissues. In fact, the growth of somatic embryos is generally precocious and more vigorous than that of zygotic embryos. Oxidation reduction-related GO terms were found in both lists but the frequency was comparatively higher among the weakly expressed genes. In cotton (Gossypium hirsutum L.), inducing oxidative stress promoted somatic embryogenesis [9]. Oxidation reduction-related GO terms were extremely enriched during the early bud stage before flowering [22,23].
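GO term enrichment of this kind is commonly scored with a one-sided Fisher's exact test, which for a single term reduces to the upper tail of a hypergeometric distribution. The sketch below shows that calculation only; the function name `enrichment_p_value` and all counts in the usage example are hypothetical, and treating the 29,148 probes as the background set is an assumption rather than a detail given in the text.

```python
from math import comb, log10


def enrichment_p_value(study_hits, study_size, population_hits, population_size):
    """One-sided Fisher's exact test for over-representation of one GO term.

    Probability of drawing at least `study_hits` annotated genes when
    `study_size` genes are sampled without replacement from a population
    of `population_size` genes, `population_hits` of which carry the term.
    """
    max_hits = min(study_size, population_hits)
    total = comb(population_size, study_size)
    tail = sum(
        comb(population_hits, k)
        * comb(population_size - population_hits, study_size - k)
        for k in range(study_hits, max_hits + 1)
    )
    return tail / total


# Hypothetical counts for one GO term (not taken from the study's data).
p = enrichment_p_value(study_hits=6, study_size=85,
                       population_hits=120, population_size=29148)
print(f"P = {p:.3g}, -log10(P) = {-log10(p):.2f}")
```

Plotting −log10(P) per term, as on the Y-axis described for Fig. 3, makes smaller P-values appear as taller bars. In practice a multiple-testing correction across GO terms would normally be applied as well.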

Isolation and sequence analysis of the cDNA and genomic clone of CitRKD1
Full-length cDNA and genomic clones corresponding to NP-ORF44 were isolated from satsuma mandarin and their genomic structures were characterized (Fig. 4a). Two independent RKD homologs, named CitRKD1-mg1 and CitRKD1-mg2, each comprised a 1065 bp ORF encoding 354 deduced amino acids with a molecular mass of 86.8 kDa and a pI of 4.2. The ORFs had six nucleotide differences between them but most were synonymous mutations, with only one non-synonymous mutation.
Their genomic clones had six exons and five introns, and a 219 bp sequence including a 185 bp putative MITE sequence was inserted in the upstream region of CitRKD1-mg2. The MITE had a typical structure with a target site duplication (TSD) and terminal inverted repeat (TIR). The insertion was flanked by a 3 bp TSD (TAA) and the ends of the element had a 19 bp TIR (ACACATTCCAAATTTTTTA). BLAST analysis revealed that these genes were highly homologous to RKD genes of sweet orange (orange1.1g041600m), clementine mandarin (Ciclev10010497m) and trifoliate orange (Poncirus trifoliata Raf.) (ANH22496) with more than 97.5% identity at the amino acid sequence level. These citrus RKD genes were named CitRKD1 and the RKD genes of sweet orange (CitRKD1-org), clementine mandarin (CitRKD1-clm) and trifoliate orange (CitRKD1-tfo) were considered as alleles of CitRKD1. The amino acid sequences of CitRKD1-mg1 and CitRKD1-mg2 were aligned with other citrus CitRKD1 alleles (Fig. 4b).
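The structural signature of a MITE described above (a short direct repeat of the target site on both sides, with the element itself ending in sequences that are reverse complements of each other) can be checked with a few lines of code. The sketch below is illustrative only: it uses the 3 bp flanking repeat (TAA) and the 19 bp terminal sequence quoted in the text, assumes a perfect inverted repeat, and fills the interior with a hypothetical placeholder because the actual internal sequence is not given here. The function names are likewise hypothetical.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def has_mite_signature(insert_with_flanks, tir, tsd):
    """Check for the layout TSD + TIR + ... + reverse-complement(TIR) + TSD.

    Assumes perfect repeats; real elements often carry mismatches, which
    this simple string comparison would reject.
    """
    left = tsd + tir
    right = reverse_complement(tir) + tsd
    return insert_with_flanks.startswith(left) and insert_with_flanks.endswith(right)


TIR = "ACACATTCCAAATTTTTTA"   # 19 bp terminal sequence quoted in the text
TSD = "TAA"                   # 3 bp flanking repeat quoted in the text
internal = "GC" * 10          # hypothetical filler, not the real internal sequence

toy_insert = TSD + TIR + internal + reverse_complement(TIR) + TSD
print(reverse_complement(TIR))
print(has_mite_signature(toy_insert, TIR, TSD))
```

A tolerant version would score the two termini by alignment rather than exact matching, since terminal repeats in real elements are frequently imperfect.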
Their deduced amino acid sequences contained typical RWP-RK domains in the carboxy terminal region. RKD family members containing the RWP-RK domain have recently been characterized as important regulators of an egg cell-related gene expression program [35]. The RWP-RK domain consists of a basic region, helix region and loop, and the amino acid sequences around these regions were well conserved among the citrus species. Because the amino acid sequences of the citrus CitRKD1 alleles were highly conserved, their protein functions are expected to be almost identical. Neighbour-joining phylogenetic tree analysis was carried out using the amino acid sequences of the CitRKD1 alleles, and NLP and RKD genes of clementine mandarin and Arabidopsis (Fig. 5). There are seven RWP-RK domain-containing proteins in the clementine mandarin genome. Ciclev10010497m in scaffold 1, Ciclev10032332m in scaffold 4, and Ciclev10027531m in scaffold 7 are structurally classified into the RKD gene family. The citrus CitRKD1 alleles were clustered with Ciclev10010497m in scaffold 1, which corresponded to the polyembryonic locus controlling embryonic type, implying that CitRKD1 comprised multiple alleles at this locus. Because the CitRKD1 alleles clustered more closely with Arabidopsis RKD genes than Arabidopsis NLP genes, CitRKD1 was considered a member of the RKD family and should play a principal role in regulating somatic embryogenesis.

Expression analysis of CitRKD1 in various tissues of Satsuma mandarin by RT-PCR
RT-PCR was carried out to investigate the expression pattern of CitRKD1 in various tissues of satsuma mandarin. CitRKD1 transcripts were detected in flowers at anthesis and young whole fruit at 30 DAF; thereafter, the transcript level decreased towards 60 DAF (Fig. 6a). In floral organs at anthesis, CitRKD1 transcripts were detected in all examined tissues including the pistil, ovary, anther, petal and sepal (Fig. 6b). In contrast, no transcripts were detected in the vegetative tissues of leaves and stems, or in mature fruit. This expression pattern is likely associated with somatic embryo development in the seed, in which the formation of a primordium cell and initial spherical embryo was observed in the flowering bud stage and around 60 DAF, respectively [18,19].
Satsuma mandarin had two alleles (CitRKD1-mg1 and CitRKD1-mg2) for CitRKD1, but conventional RT-PCR could not clarify whether the obtained PCR fragments came from either or both. Direct sequencing analysis of the PCR fragments indicated that transcripts were amplified from CitRKD1-mg2 based on a SNP (A or G) at the 240th nucleotide from the initiation position of the coding region (Fig. 6c). The transcription of CitRKD1-mg2 with a MITE insertion in the upstream region would be responsible for somatic embryogenesis.

Production of transgenic sweet orange transformed with CitRKD1-mg2 in the antisense direction
To identify the function of CitRKD1, the coding region of CitRKD1-mg2 was inserted in the antisense direction into the CiFT co-expression vector (Fig. 7a) [7]. 'Hamlin' sweet orange naturally produces polyembryonic seeds, and it was expected that transgenic 'Hamlin' sweet orange would fail to produce them. A total of 1274 epicotyl segments from 'Hamlin' sweet orange seedlings were transformed with Agrobacterium carrying the CiFT co-expression vector construct, a number that is usually sufficient to obtain multiple independent transgenic lines. Ultimately, only one independent transgenic line was recovered, probably owing to poor regeneration and growth, and was subsequently grown in the greenhouse. Integration of the transgene into 'Hamlin' sweet orange was investigated through PCR analysis and the obtained transgenic line was confirmed to carry the vector construct. Southern blot analysis was carried out to determine the copy number of the transgene in the transgenic line, and revealed that it had a single copy (data not shown). RT-PCR analysis revealed that transcripts of the transgene were highly accumulated in leaves and flowers in the transgenic line (Fig. 7b). In control 'Hamlin' sweet orange, transcripts of the endogenous sweet orange CitRKD1 allele (CitRKD1-org) were not detected in leaves but accumulated in flowers. RT-PCR using CitRKD1-org-specific primers revealed that the transcription level of endogenous CitRKD1-org was reduced below the detection level in the transgenic sweet orange by the effect of the transgene. The seed coats were removed from the seeds of harvested transgenic fruits, and transgenic and control 'Hamlin' sweet orange seeds were subsequently photographed (Fig. 7c). In transgenic sweet orange, most seeds exhibited a morphologically smooth surface and a few seeds had a slightly rough surface. In contrast, all seeds of the control 'Hamlin' sweet orange were recognizable as polyembryonic seeds at a glance.
In the control sweet orange, 2-4 independent T1 plants germinated from each seed and a total of 27 independent T1 plants were grown from 10 seeds in pots. Conversely, 10 independent T1 plants were grown from 10 seeds of transgenic sweet orange. Cleaved amplified polymorphic sequence (CAPS) marker analysis was carried out to investigate whether the T1 plants germinated from self-pollinated zygotic embryos or somatic embryos. Five CAPS markers located on five different linkage groups on the AGI map [36], which exhibited heterozygous genotypes in sweet orange, were used for DNA diagnosis of T1 plants. All T1 plants germinated from the control sweet orange had the same heterozygous genotype (a/b) for the five CAPS markers examined as the sweet orange and transgenic mother trees. In contrast, all T1 plants germinated from the transgenic sweet orange seeds had a genotype different from that of the mother tree (a/a or b/b) for at least one of the five CAPS markers, as expected from self-pollination. A minimum set of three CAPS markers (Tf0001/MspI, Mf0097/DraI, Tf0013/RsaI) could genetically discriminate all transgenic T1 plants from the mother tree (Fig. 8). This confirmed that the loss of CitRKD1 function from antisense-overexpression resulted in failure to generate polyembryonic seeds in transgenic sweet orange. This result provided strong genetic evidence that CitRKD1 plays an important role in regulating citrus somatic embryogenesis.
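The logic of this DNA diagnosis can be made explicit. A nucellar (somatic) seedling is a clone of the mother and stays a/b at every marker, whereas selfing an a/b locus yields a/a, a/b and b/b in a 1:2:1 ratio, so a zygotic seedling remains a/b at any one marker with probability 1/2. For markers on different linkage groups, which segregate independently, the chance that a selfed seedling matches the mother at all n markers is (1/2)^n, or 1/32 for five markers. The sketch below encodes that reasoning; the function names and the example genotypes are hypothetical and are not the study's scoring data.

```python
def classify_seedling(marker_genotypes, mother_genotype="a/b"):
    """Classify a T1 seedling from its CAPS genotypes.

    Any marker that differs from the heterozygous mother implies meiotic
    segregation, hence a zygotic origin. A seedling matching at every
    marker is consistent with a nucellar origin (not proof of it).
    """
    if all(g == mother_genotype for g in marker_genotypes):
        return "nucellar"
    return "zygotic"


def prob_selfed_matches_mother(n_markers):
    """Chance that a selfed zygotic seedling is a/b at all n unlinked markers."""
    return 0.5 ** n_markers


# Hypothetical genotype calls for two seedlings scored with five markers.
control_like = ["a/b", "a/b", "a/b", "a/b", "a/b"]
transgenic_like = ["a/a", "a/b", "b/b", "a/b", "a/b"]

print(classify_seedling(control_like))       # consistent with a somatic embryo
print(classify_seedling(transgenic_like))    # segregation at two markers
print(prob_selfed_matches_mother(5))         # residual chance of a false 'nucellar' call
```

The residual probability shrinks by half with each added unlinked marker, which is why a small marker set is already informative when many seedlings are scored.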

Association analysis between the MITE insertion in the upstream region and transcription of the CitRKD1 alleles
Of the two CitRKD1 alleles in satsuma mandarin, transcription was only observed for the allele with the MITE insertion in the upstream region. To understand the association between the MITE insertion in the upstream region and the transcription of CitRKD1 alleles, a tentative genotyping PCR assay was carried out to amplify the genomic region around the MITE insertion site (Fig. 9a). The five monoembryonic varieties commonly had an approximately 1.0 kbp single genomic PCR fragment, while the five polyembryonic varieties commonly had a 1.3 kbp genomic PCR fragment with or without a 1.0 kbp fragment (Fig. 9b). The 1.0 kbp genomic PCR fragment was derived from CitRKD1 without the MITE insertion and the 1.3 kbp genomic PCR fragment came from CitRKD1 with the MITE insertion. The monoembryonic varieties had homozygous genotypes for the CitRKD1 allele without the MITE insertion, while the polyembryonic varieties had heterozygous genotypes with and without the MITE insertion, or homozygous genotypes with the MITE insertion. RT-PCR analysis revealed that transcripts were only present in the reproductive tissues of the polyembryonic varieties (Fig. 9c). Sequence analysis of the PCR fragments amplified from genomic DNA and cDNA indicated that the transcripts were generated from the alleles with the MITE insertion (data not shown). Therefore, CitRKD1 alleles without the MITE insertion were designated monoembryonic alleles and CitRKD1 alleles with the MITE insertion were designated polyembryonic alleles.
Fig. 3 GO term enrichment analysis of highly expressed genes (a) and weakly expressed genes (b) using 356 genes with more than 2-fold differences in expression ratios between 'Kiyomi' (monoembryonic variety) and 'Harumi' (polyembryonic variety) during the whole experimental period. The Y-axis indicates the Fisher's test P-value (−log10)
Considering that CitRKD1 transcripts were observed only in polyembryonic varieties with polyembryonic alleles, it is conceivable that the MITE insertion in the upstream region might affect the transcription level of CitRKD1.
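The genotype call implied by this assay is a simple lookup from fragment pattern to allele combination, sketched below with M for the allele without the MITE insertion and P for the allele with it. The default sizes are the approximate 1.0 and 1.3 kbp fragments described above; the size tolerance and the function names are assumptions for illustration, and a different primer set would need its own fragment sizes passed as parameters.

```python
def call_genotype(fragments_kbp, mono_size=1.0, poly_size=1.3, tolerance=0.1):
    """Call the CitRKD1 genotype from observed PCR fragment sizes (kbp).

    Returns 'M/M', 'M/P' or 'P/P', or None when neither expected fragment
    is present (for example, a failed amplification).
    """
    has_m = any(abs(f - mono_size) <= tolerance for f in fragments_kbp)
    has_p = any(abs(f - poly_size) <= tolerance for f in fragments_kbp)
    if has_m and has_p:
        return "M/P"
    if has_p:
        return "P/P"
    if has_m:
        return "M/M"
    return None


def predict_embryony(genotype):
    """Predict seed type, treating the polyembryonic allele as dominant."""
    return "polyembryonic" if "P" in genotype else "monoembryonic"


# Hypothetical gel readings for three samples.
for bands in ([1.0], [1.0, 1.3], [1.3]):
    genotype = call_genotype(bands)
    print(bands, genotype, predict_embryony(genotype))
```

Because the polyembryonic allele is dominant, M/P and P/P samples give the same predicted phenotype, and only the band pattern distinguishes them.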

Consistency between allelic genotypes of CitRKD1 and embryonic types in 95 traditional and breeding varieties
To confirm the consistency between CitRKD1 allelic genotypes and embryonic phenotypes, the genotypes of 95 traditional and breeding varieties from Japanese breeding programs were investigated for MITE insertion allelism. A pair of primers was newly designed using the conserved sequences among CitRKD1 alleles of 14 ancestral varieties from Japanese breeding programs (Table 1) [14]. In allelic genotyping PCR using the new primer set, approximately 0.7 and 1.0 kbp fragments corresponded to monoembryonic alleles (M) and polyembryonic alleles (P), respectively (Fig. 10). Among the 14 ancestral varieties, Duncan grapefruit, Dancy tangerine (C. tangerina hort. ex Tanaka), sweet orange, satsuma mandarin, ponkan mandarin, Mediterranean mandarin (C. deliciosa [...] genotyping PCR, allelic genotyping of CitRKD1 was performed for the remaining 81 varieties. The genotypes of 52 varieties were M/M and those of 27 pedigrees were M/P. Only 'Seminole' and 'Saga' mandarin had the P/P genotype, with single fragments of approximately 1.0 kbp in allelic genotyping PCR. The embryonic types of the 95 varieties were perfectly consistent with their allelic genotypes, with M/M varieties having monoembryonic seeds and M/P or P/P varieties having polyembryonic seeds. This allelic genotype and embryonic type information is summarized in Table 1.
Fig. 4 Genomic structure of CitRKD1 alleles (CitRKD1-mg1 and CitRKD1-mg2) isolated from satsuma mandarin (a) and amino acid sequence alignment (b) of CitRKD1 alleles of orange (CitRKD1-org, orange1.1g041600m), clementine mandarin (CitRKD1-clm, Ciclev10010497m) and trifoliate orange (CitRKD1-tfo, ANH22496) with those of satsuma mandarin. A miniature inverted-repeat transposable element (MITE) comprising the typical target site duplication (TSD) and terminal inverted repeat (TIR) structure was found in the upstream region of CitRKD1-mg2. The asterisks indicate the conserved RWP-RK domain in plant RKD genes
Parent-offspring trio analysis was carried out to validate the allele combinations inherited from parent varieties. The allele genotypes of most parent varieties were M/M or M/P, so their offspring genotypes should be either M/M or M/P but not P/P. Among the 78 parent-offspring combinations in which parentage was confirmed by DNA diagnosis [31], no discrepancies were observed in the parent-offspring relationships. The polyembryonic alleles of 'Seminole' and 'Saga' were inherited from polyembryonic alleles in both parents with the M/P genotype. These results confirmed that MITE insertion was correlated with

Discussion
In the present study, we report the isolation and functional characterization of CitRKD1, which has a RWP-RK domain and is likely involved in somatic embryogenesis in citrus nucellar tissues. CitRKD1, which corresponds to NP-ORF44 at the polyembryonic locus controlling embryonic type, was identified to regulate somatic embryogenesis in nucellar tissues through various analyses of gene expression, genomic structure and loss of function using transgenic sweet orange. Plant-specific RWP-RK family transcription factors are phylogenetically classified into two major subfamilies: NIN-like proteins (NLPs), which are key regulators of N signalling and are involved in nodule organogenesis, and RKD proteins, which are involved in egg cell specification and differentiation [5]. Most RKD genes are expressed predominantly in plant reproductive tissues and are considered to regulate egg cell proliferation by reprograming nucellar cells. AtRKD5 has a broad expression profile throughout all Arabidopsis tissues including vegetative organs and might have a fundamental function during the plant life cycle rather than in egg cell formation [21,34]. Phylogenetic analysis (Fig. 5) suggested the citrus RKD genes responsible for somatic embryogenesis might have functionally diverged from Arabidopsis RKD genes from a putative common ancestor gene during the evolutionary process in dicotyledonous plants. CitRKD1 lacked motif 12 of the RKD(A) subfamily structure but possessed motifs 1 and 2 around the RWP-RK domain [5]. Therefore, CitRKD1 would be expected to have a similar protein function in regulating egg cell proliferation by reprograming nucellar cells.
The effect of loss of CitRKD1 function on somatic embryogenesis was investigated using transgenic sweet orange by antisense-overexpression and resulted in the failure of somatic embryogenesis. Although only one independent line of transgenic sweet orange was generated in this study, this result provides important genetic evidence that CitRKD1 at the polyembryonic locus is involved in somatic embryogenesis and that the transcription of CitRKD1 is required to generate polyembryos in nucellar tissues. At present, it is still hard to evaluate reproductive organ traits in transgenic perennial fruit trees because of their long juvenile stage. In addition, a possible reason for the low success rate of transgenic lines might be deleterious effects of the transgene. In Arabidopsis, mutants of the AtRKD1 and AtRKD2 genes responsible for egg cell proliferation did not display obvious defects in either sporophytic or gametophytic tissues due to functional redundancy within the AtRKD gene family, but ectopic expression of AtRKD1 and AtRKD2 caused severe distortion of plant growth including aberrant tissue proliferation [21]. In addition, Atrkd4 loss of function mutants showed impaired zygotic cell elongation and subsequent cell division patterning [40]. In Marchantia polymorpha, MpRKD, which is orthologous to AtRKD genes, is required to keep the egg cell quiescent in the absence of fertilization by preventing parthenogenesis, but ectopic expression of MpRKD in Arabidopsis did not produce any obvious phenotypes [34].
Fig. 6 Gene expression patterns of CitRKD1 in the flower, leaf, stem, young whole fruit at 30 days after flowering (DAF), and fruit tissues (juice sac and peel) from 60 to 180 DAF (a), and floral organs at anthesis (b) of satsuma mandarin by RT-PCR. EF1-α was used as an endogenous control gene. The PCR fragments were sequenced (c) and the CitRKD1 transcripts were confirmed to derive from CitRKD1-mg2 based on a SNP
Considering that the RKD family genes are intimately intertwined with other RKD family genes or with downstream associated genes in the embryogenic process, it is possible that the loss of CitRKD1 function might have deleterious effects on the regeneration of sweet orange. In fact, the feasibility of obtaining regenerated plants in citrus depends on the maternal genotype, and monoembryonic varieties tend to have lower embryogenic capacity both in vivo and in vitro than polyembryonic varieties [4].
In citrus, multiple alleles of CitRKD1 with high sequence homologies are located at the polyembryonic locus controlling embryonic type (poly/monoembryony). These alleles were classified structurally into polyembryonic alleles with the MITE insertion and monoembryonic alleles without it. CitRKD1 transcripts were only detected in the reproductive tissues of polyembryonic varieties with polyembryonic alleles. Therefore, the MITE insertion in the upstream region likely acts as a regulator by affecting transcription. It was reported that the MITE insertion in the upstream region co-segregated with poly/monoembryonic phenotypes in a segregating population and representative varieties [41]. MITE is a class of DNA transposon and comprises TIRs flanked by small direct repeats [8]. MITEs are frequently inserted into promoters, untranslated regions, introns or coding regions of plant genes and play an important role in regulating gene expression and in genome evolution. The effects of MITEs inserted close to genes on transcription levels vary, e.g. some MITE insertions increase expression, some decrease expression, and some do not affect expression at all in rice (Oryza sativa) [24].
Fig. 7 Structure of the CiFT co-expression vector with antisense CitRKD1-mg2 (a) and gene expression of CitRKD1 in leaves and flowers of control 'Hamlin' sweet orange (CNT) and transgenic sweet orange by RT-PCR (b). The transcription level of endogenous CitRKD1-org was investigated using CitRKD1-org specific primers. Photographs of transgenic sweet orange bearing fruit and of seeds of control and transgenic 'Hamlin' sweet orange after removal of their seed coats (c). Only one independent line of regenerated transgenic sweet orange was obtained out of 1274 Agrobacterium-infected epicotyl segments, a number that is generally sufficient to obtain multiple independent transgenic lines, probably owing to deleterious effects of the transgene
It has been reported that a MITE insertion in the upstream region of a gene can enhance transcription and result in a new phenotype, for example in Hordeum vulgare [38] and Sorghum bicolor [26]. The cis-regulatory elements in the upstream regions of CitRKD1-mg1 and CitRKD1-mg2 were compared using the PLACE database (https://sogo.dna.affrc.go.jp/cgi-bin/sogo.cgi?lang=en&pj=640&action=page&page=newplace) (data not shown). Various cis-regulatory elements such as a MYB recognition site found in Arabidopsis rd22 (MYB2CONSENSUSAT) [1], a binding site for BELL homeodomain transcription factors (BIHD1OS) [25], a copper-response element (CURECORECR) [33], and a light-induced motif (SORLIP1) [12] were found in both upstream regions. In contrast, several cis-regulatory elements specific to CitRKD1-mg2 were found, such as a cis-regulatory element for L1 layer-specific gene expression (SORLIP1AT9) [2] and a sugar response element (SRE) [39]. These cis-regulatory elements are not sufficient to explain the broad expression profile of CitRKD1 in reproductive tissues of polyembryonic varieties.
Fig. 8 CAPS marker analysis of T1 plants germinated from seeds of control sweet orange (C1-1 to C10-3) and transgenic sweet orange (T1 to T10). DNA isolated from wild type sweet orange leaves (C-L) and transgenic sweet orange leaves (T-L) was used as a reference template. Underlined T1 plants had genotypes different from those of the control sweet orange and the mother tree. A genotype identical to that of sweet orange for all three CAPS markers means that the T1 plant germinated from a polyembryo, whereas a different genotype means that the T1 plant germinated from a zygotic embryo produced by self-pollination
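The interpretation rule stated in the Fig. 8 caption can be sketched in Python as follows. This is only an illustrative sketch: the marker names (`caps_1` to `caps_3`), the genotype strings and the example seedling calls are hypothetical placeholders, not the markers or banding patterns scored in the study.

```python
# Hypothetical genotype calls of the mother tree for three CAPS markers.
# Marker names and allele codes are placeholders for illustration only.
MOTHER = {"caps_1": "ab", "caps_2": "ab", "caps_3": "ab"}


def classify_seedling(seedling, mother=MOTHER):
    """Classify a T1 plant from its CAPS genotypes.

    Identical to the mother tree at every marker -> 'nucellar'
    (germinated from a somatic embryo); any difference -> 'zygotic'.
    """
    if set(seedling) != set(mother):
        raise ValueError("seedling and mother must be scored for the same markers")
    identical = all(seedling[marker] == mother[marker] for marker in mother)
    return "nucellar" if identical else "zygotic"


# Made-up example calls; the labels only mimic the naming used in Fig. 8.
seedlings = {
    "C1-1": {"caps_1": "ab", "caps_2": "ab", "caps_3": "ab"},
    "T1": {"caps_1": "aa", "caps_2": "ab", "caps_3": "bb"},
}
calls = {name: classify_seedling(genotype) for name, genotype in seedlings.items()}
```

In this sketch a single mismatching marker is enough to call a seedling zygotic, which mirrors the "different genotype" wording of the caption.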
Considering that AtRKD1 and AtRKD2 are temporally and preferentially expressed in egg cells in Arabidopsis, it is conceivable that the MITE insertion might alter the innate time- and tissue-specific expression profile in embryogenesis, resulting in broad expression in various reproductive tissues and prolonged gene expression from flowering to the early stage of fruit development.
A DNA marker to detect the MITE insertion in the upstream region of CitRKD1 was newly developed using the sequence conserved among 14 ancestral varieties from Japanese citrus breeding programs. This DNA marker, named the P/M marker, could be applied to various citrus species including mandarin, sweet orange, pummelo, and their hybrids. The allelic genotypes of 95 varieties and breeding pedigrees evaluated using the P/M marker perfectly matched their embryonic phenotypes and the allele combinations in parent-offspring relationships. These results agree with previous reports that polyembryony in citrus is regulated by a single dominant gene [15,32]. In citrus, polyembryony makes it difficult to obtain zygotic embryos and is one of the limiting factors in expanding genetic diversity. The newly developed P/M marker can be applied to a wide range of citrus varieties and could help to increase the number of parental combinations in citrus cross breeding.
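The single-dominant-gene model that the P/M marker results support can be written out as a short Python sketch. The two-letter genotype encoding ('P' for the allele with the MITE insertion, 'M' for the allele without it) and the function names are assumptions introduced here for illustration; they are not part of the published marker protocol.

```python
from itertools import product


def phenotype(genotype):
    """Predict embryonic type from a two-allele genotype such as 'PM'.

    Under the dominant model, one polyembryonic allele ('P') is enough
    for a polyembryonic phenotype.
    """
    if len(genotype) != 2 or set(genotype) - {"P", "M"}:
        raise ValueError("genotype must be two letters drawn from 'P' and 'M'")
    return "polyembryonic" if "P" in genotype else "monoembryonic"


def offspring_genotypes(parent_a, parent_b):
    """Return the genotypes possible when each parent passes on one allele."""
    return {
        "".join(sorted(a + b, reverse=True))  # write 'P' first, e.g. 'PM'
        for a, b in product(parent_a, parent_b)
    }


def consistent(parent_a, parent_b, child):
    """Check that a child's genotype is possible for the stated parents."""
    return "".join(sorted(child, reverse=True)) in offspring_genotypes(parent_a, parent_b)
```

For example, `offspring_genotypes("PM", "MM")` yields `{"PM", "MM"}`, so a heterozygous polyembryonic parent crossed with a monoembryonic parent can produce both embryonic types, whereas `consistent("MM", "MM", "PM")` is false because two monoembryonic parents cannot pass on a polyembryonic allele.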

Conclusion
The present study provides genetic evidence that CitRKD1 with a RWP-RK domain at the polyembryonic locus plays a principal role in regulating somatic embryogenesis in citrus nucellar tissues. CitRKD1 alleles were divided into two types, polyembryonic alleles with a MITE insertion and monoembryonic alleles without it. The protein functions of these alleles might be almost [...]
[...] were collected for microarray analysis, immediately frozen in liquid nitrogen, and stored at −80 °C for RNA isolation. Total RNA was extracted as previously described [13]. A custom oligo-DNA microarray was designed using the eArray system (Agilent Technologies, Santa Clara, CA, USA; https://earray.chem.agilent.com/earray/) according to the standard system protocols. Probes were constructed using 79 NP-ORFs predicted in the 380 kbp draft sequence of the polyembryonic locus (accession number: AB573149) [28] by the Rice Genome Automated Annotation System (RiceGAAS) (http://ricegaas.dna.affrc.go.jp/) program, and 29,148 assembled mRNAs in the clementine mandarin genome sequence ver. 1.0 (accession number: AMZM001000000) [42]. For the 79 NP-ORFs, probes were designed against each exon as far as possible. In total, 29,539 independent probes (391 probes from the polyembryonic locus and 29,148 probes from mRNAs of the clementine mandarin genome sequence) were generated, and duplicates were mounted on the custom oligo-DNA microarray (Agilent Design #059979) under the 4 × 44 K format of the Agilent system.
RNA samples were labelled with cyanine-3-labelled cytidine triphosphate using a Low Input Quick-Amp Labelling Kit (Agilent Technologies) according to the manufacturer's directions. The labelled cRNAs were subsequently hybridized on the custom oligo-DNA microarray using a Gene Expression Hybridization kit (Agilent Technologies) according to the manufacturer's directions. After washing in GE washing buffer (Agilent Technologies), the glass slides were scanned with an Agilent Microarray Scanner (G2505C, Agilent Technologies). The Feature Extraction software (version 10.5.1.1), with default settings for all parameters, was used to convert the images into gene expression data. Data analysis was carried out using the Subio platform version 1.12 (Subio Inc., Aichi, Japan). The raw data were normalized to the 75th percentile intensity of probes above the background level (gIsWellAbove = 1). The normalized values were filtered based on expression changes under the following conditions: lower signal intensity cut-off < 100, cut-off probes flagged with IsPosAndSignif, and expression changes greater than 2-fold in the expression ratio between 'Kiyomi' and 'Harumi' throughout all experiments. GO term enrichment analysis was carried out using the Fisher's exact test function of the software. The complete microarray data have been deposited in the NCBI Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE115082) under series entry GSE115082.
(Table footnote) Phenotype was not confirmed visually because the variety is seedless; it was determined from the phenotypes of pedigrees in the cross hybrid tests.
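The normalization and fold-change filtering described above can be illustrated with a minimal Python sketch. It is not the Subio platform's implementation: the function names, the toy intensities, the choice of the inclusive percentile method, and the rule that a probe is dropped only when its raw signal is below 100 in both samples are all assumptions made for this example, and the IsPosAndSignif flag filter is omitted.

```python
from statistics import quantiles


def normalize_p75(raw, well_above):
    """Divide each signal by the 75th percentile of probes flagged above background."""
    reference = [signal for signal, flag in zip(raw, well_above) if flag == 1]
    # Third quartile cut point; the 'inclusive' method is an assumption here.
    p75 = quantiles(reference, n=4, method="inclusive")[2]
    return [signal / p75 for signal in raw]


def fold_change_filter(raw_a, raw_b, norm_a, norm_b, min_raw=100.0, min_fold=2.0):
    """Return indices of probes passing the intensity and fold-change cut-offs."""
    kept = []
    for i, (ra, rb, na, nb) in enumerate(zip(raw_a, raw_b, norm_a, norm_b)):
        if max(ra, rb) < min_raw:  # assumption: weak in both samples -> dropped
            continue
        if na <= 0 or nb <= 0:  # a ratio is undefined for non-positive signals
            continue
        ratio = na / nb
        if ratio > min_fold or ratio < 1 / min_fold:  # change in either direction
            kept.append(i)
    return kept


# Toy intensities for five probes in two samples (made-up numbers).
raw_kiyomi = [50.0, 200.0, 400.0, 800.0, 1600.0]
raw_harumi = [40.0, 210.0, 1600.0, 780.0, 300.0]
flags = [0, 1, 1, 1, 1]  # 1 = well above background

norm_kiyomi = normalize_p75(raw_kiyomi, flags)
norm_harumi = normalize_p75(raw_harumi, flags)
kept = fold_change_filter(raw_kiyomi, raw_harumi, norm_kiyomi, norm_harumi)
```

With these toy values, probes 2 and 4 pass the filter: probe 0 is too weak in both samples, and probes 1 and 3 change by less than 2-fold after normalization.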
Gene expression analysis of candidate genes in whole young fruits at 15 and 30 DAF of 'Kiyomi' and 'Harumi' by RT-PCR
cDNA was prepared with 1 μg of purified total RNA using a QuantiTect® Reverse Transcription Kit (Qiagen, Hilden, Germany). The PCR reaction was performed in a ProFlex PCR system (Applied Biosystems, Foster City, CA, USA) thermal cycler using Ex Taq® DNA polymerase (Takara, Tokyo, Japan) under the following conditions: 30 cycles of 10 s at 94 °C, 15 s at 56 °C, and 60 s at 72 °C. Primer sequences for four candidate genes were designed using the conserved sequences in the corresponding genes of clementine mandarin and sweet orange in the Phytozome database (https://phytozome.jgi.doe.gov/pz/portal.html).
The upper fragment is derived from the polyembryonic allele and the lower fragment is derived from the monoembryonic allele. 'Seminole' (15) and 'Saga' mandarin (63) possess homozygous polyembryonic alleles in their genomes