Genome-wide identification and transcriptional analysis of folate metabolism-related genes in maize kernels

Background Maize is a major staple food crop globally and contains various concentrations of vitamins. Folates are essential water-soluble B-vitamins that play an important role as one-carbon (C1) donors and acceptors in organisms. To gain an understanding of folate metabolism in maize, we performed an intensive in silico analysis to screen for genes involved in folate metabolism using publicly available databases, followed by examination of the transcript expression patterns and profiling of the folate derivatives in the kernels of two maize inbred lines.
Results A total of 36 candidate genes corresponding to 16 folate metabolism-related enzymes were identified. The maize genome contains all the enzymes required for folate and C1 metabolism, characterized by functional domains that are highly conserved across all the other species investigated. Phylogenetic analysis revealed that these enzymes in maize are conserved throughout evolution and have a high level of similarity with those in sorghum and millet. The LC-MS analyses of two maize inbred lines demonstrated that 5-methyltetrahydrofolate was the major folate derivative in young seeds, whereas 5-formyltetrahydrofolate was the major form in mature seeds. Most of the genes involved in folate and C1 metabolism exhibited similar transcriptional expression patterns in the two maize lines, with the highest transcript abundance detected at 6 days after pollination (DAP 6) and lower transcript abundance on DAP 12 and 18. Compared with seeds on DAP 30, mature dry seeds contained less 5-methyltetrahydrofolate and sharply more 5-formyltetrahydrofolate.
Conclusions The enzymes involved in folate and C1 metabolism are conserved between maize and other plant species. Folate and C1 metabolism is active in young developing maize seeds at the transcriptional level.

Because humans lack functional DHNA, HPPK/DHPS, ADCS, ADCL, and DHFS, they cannot synthesize folate de novo, and thus folate fortification of foods such as wheat flour is required [2]. In addition, overexpressing folate biosynthetic and metabolic enzymes originating from plant or non-plant organisms is known to be an effective alternative for enhancing folate contents in food crops including tomato, rice, and maize [7][8][9][10]. Maize is a major staple food crop globally. To date, few studies on folate metabolism genes in maize are available [11,12]. For example, the first DHFR-TS gene from maize was cloned, and ZmDHFR-TS transcripts were shown to accumulate to high levels in developing maize kernels and meristematic tissues [11]. Another gene involved in folate metabolism was characterised in the brown midrib 2 (bm2) mutant, in which a functional MTHFR gene showed reduced transcript levels. As a result, the mutant showed a reddish-brown colour associated with reductions in lignin concentration and alterations in lignin composition [12]. However, no systematic characterisation of folate metabolism genes in maize has been reported, and how folates flow during maize kernel formation remains unknown. Therefore, identification of folate-related genes at the whole-genome level and characterisation of folate metabolism during maize kernel formation could provide a foundation for understanding folate metabolism in maize and for molecular breeding of folate-fortified maize varieties.
In this study, an intensive in silico analysis was performed to screen for genes involved in folate metabolism using all publicly available databases. We found that the maize genome contains all enzymes required for folate and C1 metabolism, which are characterised by highly conserved domains, similar to other species. To further advance our understanding of the folate metabolism in maize, two representative maize inbred lines with significant differences in total folates in mature seeds were chosen to investigate the expression of folate-related genes and the profiling of folate derivatives during kernel formation.

Results and discussion
Identification and phylogenetic analysis of putative folate metabolic genes in maize
To understand folate metabolism in maize, we first investigated the conservation of all folate-related genes between Arabidopsis and maize on a whole-genome scale, because the folate metabolism pathway is better characterised in Arabidopsis than in other plant species. Folate metabolism involves folate synthesis and the C1 cycle. Enzymes involved in folate synthesis in maize were identified via BLAST using homologs from Arabidopsis. Eight enzymes were identified in this way (Table 1). One ortholog each was identified for HPPK/DHPS and ADCS, two each for GTPCHI, DHNA, DHFS, and FPGS, three for ADCL, and four for DHFR. Within groups of maize orthologs such as GTPCHI, DHNA, DHFS, and DHFR, the protein similarities were all higher than 90 %. The protein similarity between the two FPGS orthologs was 77.8 %. A rather low protein similarity was observed among the ADCL orthologs (45.3 % between ADCL1 and ADCL2). These results indicated that the majority of orthologs involved in folate synthesis were conserved in maize.
Eight enzymes involved in C1 metabolism in maize were also identified, which were annotated as SHMT, GDC complex (GDCH, GDCP, and GDCT), DHC, MTHFR, MS, 10-FDF, FTHS, and 5-FCL. Because SHMT1 is the major functional SHMT enzyme in Arabidopsis [13,14], maize SHMT1, the closest counterpart of Arabidopsis SHMT1, was used in this study. We found that the maize GDC protein complex consisted of one GDCP, one GDCT, and four GDCHs, and the lowest pairwise sequence similarity among the GDCH orthologs was 71.2 %. 10-FDF and FTHS each had one ortholog; MTHFR and 5-FCL each had two orthologs, and the sequence similarity between each pair of orthologs was 94.5 % and 51.2 %, respectively. DHC and MS each had three orthologs, and the lowest sequence similarities among orthologs were 61.0 % (between FOLD2 and FOLD3) and 96.3 % (between MS1 and MS2), respectively (Table 2). These results indicated that the majority of orthologs involved in C1 metabolism were highly conserved at the protein level in maize.
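The pairwise similarities reported above can be illustrated with a small calculation. The following is a minimal sketch in Python, assuming that similarity is measured as percent identity over the ungapped columns of a pairwise alignment. The source does not state which similarity measure or tool produced the reported values, so this choice is an assumption, and the function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumption: both inputs come from the same pairwise alignment, use '-' for
    gaps, and therefore have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == "-" or res_b == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy, made-up aligned fragments (not real maize sequences).
toy_a = "MKT-LLVAG"
toy_b = "MKTALIVAG"
print(f"{percent_identity(toy_a, toy_b):.1f} %")  # 7 of 8 ungapped columns match: 87.5 %
```

Other definitions (for example, similarity scored with a substitution matrix, or identity divided by the full alignment length) would give different numbers, so the measure should be stated alongside any reported value.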
To investigate whether the folate metabolism-related proteins identified in maize contain conserved domains for their enzymatic activities, all homologs from plants (e.g. sorghum, rice, millet, and Arabidopsis), mammals (e.g. human, rat and mouse), and microorganisms (e.g. yeast and E. coli) were analyzed using the Simple Modular Architecture Research Tool (SMART) [15]. As expected, the enzymes participating in folate metabolism and the C1 cycle were largely conserved between maize and other species. The representative proteins from maize, Arabidopsis, and E. coli are shown in Tables 3 and 4. A detailed comparison of the enzymes involved in folate synthesis between the three species led to the following findings. First, the same PFAM domains were present with different lengths. For example, both FPGS and DHFS contained the Mur_ligase_M domain that is responsible for attaching glutamates to folylpolyglutamates or monoglutamates, respectively. However, the Mur_ligase_M domain in FPGS was 36 amino acids shorter than that in DHFS in both maize and Arabidopsis (Table 3). Second, GTPCHI contained two repeats of the GTP_cyclohydroI domain in the plants but only one in E. coli (Table 3). Third, three enzymes, ADCS, HPPK/DHPS, and DHFR/TS, have evolved to be bifunctional enzymes in the plants. For example, both maize and Arabidopsis ADCS contained two GATases, one Anth_synt_I_N, and one chorismate_binding domain, functionally corresponding to the Anth_synt_I_N- and chorismate_binding-containing PABA and the GATase-containing PABB that together produce ADC in E. coli. Similar phenomena were observed in HPPK/DHPS and DHFR/TS (Table 3). Two enzymes involved in C1 reactions contained different numbers of PFAM domains in different species. For example, three GCV_T domains were present in the maize GCST, whereas only two were present in Arabidopsis and E. coli. The five domains in E. coli MS, i.e. S-methyl_trans, Pterin_bind, B12-binding, B12-binding_2, and Met_synt_B12, were found to be merged into two domains, Meth_synt_1 and Meth_synt_2, in Arabidopsis and maize (Table 4).
Phylogenetic trees of folate-related proteins from sorghum, rice, millet, Arabidopsis, human, rat, mouse, yeast and E. coli were constructed using the neighbour-joining method. The majority of clade credibility values between maize and sorghum or millet were higher than 70 %, suggesting a close relationship between the enzymes in maize and those in sorghum and millet. These observations are consistent with the fact that maize, sorghum, and millet share a common C4 origin [16,17] (Figs. 2, 3, 4). Some homologs, including ADCS, ADCL, DHNA, HPPK/DHPS, and DHFS, were not present in animals (Fig. 2), and the remaining homologs from plants and animals were divided into two sibling groups (Figs. 3 and 4). In one type of tree, the plant branches were divided into multiple classes, each of which contained most of the plant species; this was the case for DHC, ADCL, 5-FCL, and GDCH (Table 1 and Table 2). In the remaining trees, all the plant homologs were grouped into a single clade, in which the maize orthologs were present either as a single gene, such as ADCS, HPPK/DHPS, GDCT, GDCP, SHMT1, 10-FDF, and FTHS, or as multiple genes, such as DHNA, DHFS, GTPCHI, DHFR, MS, FPGS, and MTHFR (Figs. 2, 3, 4; Table 1 and Table 2). These results indicate that the folate metabolism-related proteins are conserved in maize, and that the functional differentiation of these proteins during evolution is complicated.
Maize differed from Arabidopsis in the number of genes participating in folate and C1 metabolism. For example, more orthologs of DHFR, GTPCHI, DHFS, and GDCH and fewer orthologs of DHNA, 10-FDF, FPGS, DHC, HPPK/DHPS, and GDCP were identified in maize than in Arabidopsis.
Of these enzymes, four, namely AtDHFS, AtFPGS1, AtFPGS2, and AtFPGS3, functioned as ligases in Arabidopsis [18] (Table 2). A mutation in AtDHFS caused embryo lethality [19], and the dysfunction of FPGS1 or FPGS2 resulted in abnormal responses to low nitrogen in the dark or light [20,21]. These reports suggest that DHFS and FPGS have distinct functions in Arabidopsis, although they contain the same domain. In maize, the Mur_ligase_M domain was also found to be present in the corresponding orthologs, including two DHFSs and two FPGSs, and further biochemical and genetic studies on these orthologs will elucidate their biological functions. DHNAs were reported to have distinct expression patterns in Arabidopsis and maize [22,23]. In Arabidopsis, three DHNA orthologs were identified, among which AtFolB2 was highly expressed in roots, stems, siliques, young leaves, and mature leaves, whereas AtFolB3 was undetectable [22]. In maize, however, only two DHNA orthologs were identified (Fig. 2). The transcripts of FOLB1_MAIZE and FOLB2_MAIZE were abundant in roots, shoots, developing leaves and tassels, and seeds [23]. These observations imply that the maize orthologs may play different roles from the Arabidopsis ones.

Folate profiling in maize kernels
Maize kernels are the primary source of folates for humans [24]. Investigation of folate biosynthesis during kernel formation and in mature seeds is important for understanding folate metabolic flux in maize. To this end, two representative maize inbred lines with a significant difference in total folates in dry seeds were chosen. Ji63 originated from China and belongs to the NSS subpopulation, with the pedigree (127-32 × Tie84) × (Wei24 × Wei20); GEMS31 is from the United States and belongs to the TST subpopulation, with the pedigree 2282-01_XL380_S11_F2S4_9226-Blk26/00 [25]. 5-F-THF and 5-M-THF in the dry seeds from these two inbred lines grown in different locations were measured using liquid chromatography-tandem mass spectrometry (LC-MS). Despite significant variation across the years, GEMS31 contained far more total folates than Ji63; the smallest difference, 12.9-fold, was observed in 2010 (Table 5). Moreover, 5-F-THF accounted for over 70.3 % of total folates in Ji63 and over 94.4 % in GEMS31 across the four consecutive years. These results indicated that 5-F-THF was the major storage form of folate in both GEMS31 and Ji63 regardless of the total folate levels in dry seeds.
To investigate how folate derivatives accumulate during kernel formation, kernels at R1 (silking stage) on DAP 6, R2 (blistering stage) on DAP 12, R3 (milking stage) on DAP 18, R4 (late milk-dough stage) on DAP 24, and R5 (early dent stage) on DAP 30 were collected for LC-MS analysis in 2013. In contrast to dry seeds, young seeds of both lines accumulated more 5-M-THF than 5-F-THF from DAP 6 to DAP 18. GEMS31 and Ji63 contained similar levels of total folates in the seeds at the early developmental stages, as indicated by a ratio of folates in GEMS31 to folates in Ji63 of around 1 (0.91 on DAP 6 and 1.07 on DAP 12). At the late developmental stages, i.e. DAP 18 and DAP 30, the total folates in GEMS31 were significantly higher than those in Ji63 (Fig. 5). These results were quite different from those observed in dry seeds, suggesting ongoing active folate metabolism during seed maturation.
5-M-THF accounted for over 60 % of the total folates in GEMS31 (61.1 % for DAP 6, 67.2 % for DAP 12, and 69.9 % for DAP 18) and over 90 % in Ji63 (90.2 % for DAP 6, 98.3 % for DAP 12, and 97.1 % for DAP 18) during the early stages of kernel formation (Table 6). However, no significant change in 5-F-THF was observed before DAP 18 in either of the inbred lines: 5-F-THF remained at ~0.80 nmol/g FW in GEMS31 and at ~0.10 nmol/g FW in Ji63 before DAP 18. After DAP 18, 5-M-THF decreased to a similar level in both lines, and the proportion of 5-M-THF was also reduced because of the increase in 5-F-THF (Fig. 5; Table 6). Notably, from DAP 30 on, a much sharper increase in 5-F-THF was observed in GEMS31 than in Ji63 (Fig. 5). The profiling of these two inbred lines demonstrated that 5-M-THF was the dominant folate derivative at least before DAP 18, implying more active C1 reactions at the early stages of seed development than at the late stages, given that 5-M-THF is the donor for the C1 cycle.
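The proportions and line-to-line ratios quoted above are simple arithmetic on the measured derivative levels. Below is a minimal sketch in Python, assuming that total folates are taken as the sum of 5-M-THF and 5-F-THF (the two derivatives profiled here). The function names `folate_proportions` and `line_ratio` and all input values are hypothetical and are not measurements from this study.

```python
def folate_proportions(m_thf: float, f_thf: float) -> dict:
    """Return total folates and the percentage contributed by each derivative.

    Assumption: total folates = 5-M-THF + 5-F-THF, both in the same units
    (e.g. nmol/g FW).
    """
    if m_thf < 0 or f_thf < 0:
        raise ValueError("folate levels must be non-negative")
    total = m_thf + f_thf
    if total == 0:
        raise ValueError("total folates must be greater than zero")
    return {
        "total": total,
        "5-M-THF_pct": 100.0 * m_thf / total,
        "5-F-THF_pct": 100.0 * f_thf / total,
    }


def line_ratio(total_line_a: float, total_line_b: float) -> float:
    """Ratio of total folates between two lines at the same time point."""
    if total_line_b <= 0:
        raise ValueError("denominator must be greater than zero")
    return total_line_a / total_line_b


# Hypothetical values for illustration only (nmol/g FW).
line_a = folate_proportions(m_thf=1.5, f_thf=0.5)
line_b = folate_proportions(m_thf=1.8, f_thf=0.2)
print(f"line A: {line_a['5-M-THF_pct']:.1f} % 5-M-THF")
print(f"A/B ratio of total folates: {line_ratio(line_a['total'], line_b['total']):.2f}")
```

A ratio close to 1 corresponds to the situation described for the early time points, where the two lines have similar total folates even though the share of each derivative differs.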
Different metabolites show different accumulation patterns during seed development, and the storage metabolites normally start to accumulate from the early developmental stages [26,27]. In maize, over 80 % of total starch is stored in the endosperm, 80 % of total oil in the embryo, and proteins are found in both the embryo and endosperm [28]. The rate of oil synthesis typically peaks between DAP 15 and DAP 25, and the accumulation peaks on DAP 30; carotenoids behave in a similar manner [29]. Starch accumulation occurs from DAP 10, peaks on DAP 15, and remains steady thereafter [27]. Likewise, amino acids accumulate during the early stage, and steady-state transcripts of the genes involved in amino acid biosynthesis peak in kernels on DAP 10 and in embryos on DAP 15 [26]. It has also been reported that some metabolites decrease during kernel formation. For example, flavone decreases from DAP 14 to DAP 40 in maize [30]. Unlike the metabolites mentioned above, folate derivatives showed different accumulation patterns in maize kernels. 5-M-THF peaked on DAP 12 and consistently decreased thereafter, whereas 5-F-THF remained unchanged at low levels during the early stages but gradually increased to high levels in dry seeds (Fig. 5). These results indicate that the various folate derivatives may differ from one another in function during seed development in maize.
Table 5 The contents of total folate and the proportion of 5-F-THF in mature dry seeds. Columns: total folates (nmol/g DW); proportion of 5-F-THF (%).
Fig. 6 qRT-PCR of folate-synthesis related genes during kernel formation of Ji63 and GEMS31. Three biological samples were used for analysis and all reactions were performed in quadruplicate. Data are means ± SD (n = 4). Names of the proteins are listed in Table 1. The same samples were used as those used for folate profiling. Because expression of ADCL3 was not detected, it is not shown.

Transcript expression of folate-related genes in maize kernel
To understand the transcriptional expression of the genes involved in folate and C1 metabolism, the orthologous genes identified above were investigated in the developing seeds of Ji63 and GEMS31 using qRT-PCR (Figs. 6 and 7). The same samples were used as those used for folate profiling. Transcripts of the genes involved in folate biosynthesis were most abundant on DAP 6 in the two lines (Fig. 6), and a similar pattern was observed for C1 metabolism-related genes (Fig. 7), although ADCL2 in Ji63 was an exception (Fig. 6). The most active DNA synthesis takes place at the early stage of seed development (DAP 1 to DAP 6), for which folate-dependent purine and pyrimidine synthesis is required [31,32]. Thus, the observation that the highest transcript levels of folate-related genes were detected on DAP 6 supports the previous reports and indicates that folate and C1 metabolism is active in young seeds. However, caution is needed when correlating gene transcript levels with folate levels. First, the folate profiling revealed a peak of 5-M-THF on DAP 12, but transcripts of the genes encoding MS, which consumes 5-M-THF to synthesize methionine, and MTHFR, which catalyzes the formation of 5-M-THF, peaked on DAP 6 and decreased sharply on DAP 12 and DAP 18 (Figs. 5 and 7). Second, there was no significant difference in transcript abundance of the folate-related genes between GEMS31 and Ji63, although the total folates in the dry seeds were markedly different. These observations suggest that a complex mechanism regulates folate metabolism in maize seeds. Investigation of the enzymatic activities of folate-related enzymes in combination with a genome-wide association study would allow us to elucidate the roles of the folate metabolism-related proteins in folate derivative accumulation in maize kernels.

Conclusions
Taken together, these findings suggest that folate and C1 metabolism is conserved between maize and other species, especially sorghum and millet. Metabolite profiling demonstrates that 5-M-THF is the dominant folate derivative in early developing seeds, and 5-F-THF is the major storage form in mature seeds. These two folate derivatives play different roles during kernel development. Genes involved in folate and C1 metabolism are actively expressed at the early stages of kernel development. This study provides a foundation for a future in-depth investigation of folate metabolism in maize.
Fig. 7 qRT-PCR of C1 metabolism related genes during kernel formation of Ji63 and GEMS31. Three biological samples were used for analysis and all reactions were performed in quadruplicate. Data are means ± SD (n = 4). Names of the proteins are listed in Table 2. The same samples were used as those used for folate profiling.

Plant materials and folate measurement
Ji63 and GEMS31 inbred plants were grown at Shunyi, Beijing, China in the summer of 2013. The experimental field was loamy soil with pH 6.8, organic matter 0.7 %, phosphorus 13.8 mg/L, and potassium 48 mg/kg. During field preparation, 440 kg/acre of urea (46-0-0) was applied. Herbicides were applied 5 d after planting. Plants were hand planted in 5-m-long rows with a row spacing and plant spacing of 25 cm. Kernel samples were harvested at 6, 12, 18, 24, and 30 days after pollination (DAP) and removed from the ear axes of three ears. Three biological replicates, each consisting of the pooled kernels from three ears, were harvested and immediately frozen in liquid nitrogen. Folate extraction and measurement were repeated four times for each replicate. Similar results were obtained in these replicates, and the results of one replicate are described and discussed in this report. In addition, these two inbred lines were grown in 2009 in Hainan, in 2010 in Yunnan, and in 2012 in Hainan, China. Standards of 5-M-THF and 5-F-THF were purchased from Schircks Laboratories. The samples collected from the field were used for identification of folate profiles. The methods for sample preparation and metabolite measurement were described previously [20]. The contents of folate in dry seeds of each inbred line were measured once across the four consecutive years. Folates in seeds on DAP 6, 12, 18, 24, and 30 were measured in four biological replicates, and each sample consisted of 50 mg of plant material.

Identification of folate metabolic genes in maize and other species
Using the reported folate metabolic enzymes in plants as queries [3], BLAST was used to search genome and protein databases, including the Maize Genetics and Genomics Database [33], the Arabidopsis Information Resource [34], the National Center for Biotechnology Information (NCBI) [35], Phytozome [36], and the Swiss-Prot Protein Database (Swiss-Prot) [37]. The proteins and their accession numbers used for alignment and phylogenetic tree construction are listed in Table 3.

Alignment, phylogenetic analysis and domain detection
A total of 238 amino acid sequences of folate metabolic enzymes in maize and other species were aligned using the ClustalW tool [38]. An unrooted distance tree was constructed from the multiple alignment using the neighbour-joining algorithm in MEGA version 5. The reliability of the tree was examined using bootstrap analysis (1000 replicates). The conserved motifs were identified using the Simple Modular Architecture Research Tool [15].
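The bootstrap analysis mentioned above rests on resampling alignment columns with replacement and rebuilding the tree for each pseudo-replicate. The following is a minimal sketch of the resampling step only, in Python; it is not MEGA's implementation, and the function name `bootstrap_alignment`, the toy alignment, and the fixed random seed are assumptions made for illustration. Tree construction for each replicate is not shown.

```python
import random


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Build one bootstrap pseudo-replicate of a multiple sequence alignment.

    Columns are drawn with replacement, and the same column indices are applied
    to every sequence so that each resampled column stays intact.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # Draw as many column indices as the alignment has columns.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


# Toy, made-up alignment (not real sequences); seed fixed for reproducibility.
toy_alignment = {"seqA": "MKTLLV", "seqB": "MKSLIV", "seqC": "MRTLLA"}
rng = random.Random(42)
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(1000)]
print(len(replicates), "pseudo-replicates generated")
```

In a full analysis, a tree would be inferred from each pseudo-replicate, and the support for a clade would be the percentage of replicate trees that contain it, which is how a threshold such as 70 % is typically interpreted.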

Quantitative real-time PCR (qRT-PCR)
Total RNA from maize kernels on DAP 6, DAP 12, and DAP 18 was extracted using a standard TRIzol RNA isolation protocol (Invitrogen) [39]. To eliminate any residual genomic DNA, total RNA was treated with RNase-free DNase I (New England Biolabs) [40] and then used to synthesise first-strand complementary DNA (cDNA) using the RevertAid First Strand cDNA Synthesis kit (Fermentas) [41]. Primers used in this paper are listed in Table 7. Primer Premier 5.0 [42] was used to design the primers according to the CDS sequences of the related genes.
qRT-PCR was performed in a 7500 real-time PCR system using the SYBR premix Ex Taq (TaKaRa) [43]. The cDNAs were made from three samples and all reactions were performed in quadruplicate. PCR conditions were as follows: 95 °C for 30 s, followed by 40 cycles of 95 °C for 5 s and 60 °C for 34 s. ACTIN (GRMZM2G126010) was used as the reference gene to normalize the target gene expression, which was calculated using the relative quantification method ($2^{-\Delta\Delta C_T}$).
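The relative quantification step can be written out explicitly. With $\Delta C_T = C_T(\text{target}) - C_T(\text{reference})$ and $\Delta\Delta C_T = \Delta C_T(\text{sample}) - \Delta C_T(\text{calibrator})$, the relative expression is $2^{-\Delta\Delta C_T}$. The following is a minimal sketch in Python, assuming amplification efficiencies close to 100 % for both the target and the reference gene. The choice of calibrator sample is not stated in the source, and the function name `relative_expression` and the Ct values are hypothetical.

```python
def relative_expression(
    ct_target_sample: float,
    ct_ref_sample: float,
    ct_target_calibrator: float,
    ct_ref_calibrator: float,
) -> float:
    """Relative expression by the 2^(-ddCt) method.

    Assumptions: target and reference genes amplify with ~100 % efficiency,
    and each Ct passed in is already the mean of the technical replicates.
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values for illustration only.
fold = relative_expression(
    ct_target_sample=24.0,      # target gene, sample of interest
    ct_ref_sample=18.0,         # reference gene (e.g. ACTIN), same sample
    ct_target_calibrator=22.0,  # target gene, calibrator sample
    ct_ref_calibrator=18.0,     # reference gene, calibrator sample
)
print(fold)  # ddCt = (24 - 18) - (22 - 18) = 2, so 2^-2 = 0.25
```

A value below 1 means the target transcript is less abundant in the sample than in the calibrator after normalization to the reference gene; a value of 1 means no change.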