Transcriptomic and metabolomic analysis provides insights into anthocyanin and pyocyanin accumulation in pear

Background Pear is one of the most important fruit crops worldwide. Anthocyanins and procyanidins (PAs) are important secondary metabolites that affect the appearance and nutritive quality of pear. However, few studies have focused on the molecular mechanism underlying anthocyanin and PA accumulation in pear . Results We conducted metabolome and transcriptome analyses to identify candidate genes involved in anthocyanin and PA accumulation in young fruits of the pear cultivar ‘Clapp Favorite’ (CF) and its red mutation cultivar ‘Red Clapp Favorite’ ( RCF ). Gene– metabolite correlation analyses revealed a ‘core set’ of 20 genes that were strongly correlated with 10 anthocyanin and seven PA metabolites. Of these, PcGSTF12 was confirmed to be involved in anthocyanin and PA accumulation by complementation of the tt19-7 Arabidopsis mutant. Interestingly, PcGSTF12 was found to be responsible for the accumulation of procyanidin A3, but not petunidin 3, 5-diglucoside, opposite to the function of AtGSTs in Arabidopsis . Transformation with PcGSTF12 greatly promoted or repressed genes involved in anthocyanin and PA biosynthesis, regulation, and transport. Electrophoretic mobility shift and luciferase reporter assays confirmed positive regulation of PcGSTF12 by PcMYB114 . Conclusion These findings identify a core set of genes for anthocyanin and PA accumulation in pear. Of these, PcGSTF12 , was confirmed to be involved in anthocyanin and PA accumulation. Our results also identified an important anthocyanin and PA regulation node comprising two core genes, PcGSTF12 and PcMYB114 . These results provide novel insights into anthocyanin and PA accumulation in pear and represent a valuable data set to guide future functional studies and pear breeding. regulation core and PcMYB114

The DEGs between group 2-1 and group RCF-CF were subjected to GO (Supplementary Table S4) and KEGG functional pathway analyses (Supplementary Table S5
Interestingly, kuromanin chloride and pelargonin shared the largest number of common transcripts (36 transcripts). This suggested that kuromanin chloride and pelargonin might have evolved similar accumulation mechanisms.
The 203 transcripts were annotated with descriptions from the SwissProt and NR databases. Six transcripts have been functionally characterized to play roles in anthocyanin accumulation in pear previously: PcMYB10, PcMYB114, PcCHS, PcCHI, PcF3H, and PcANS (Supplementary Table S7). The rest were newly identified as candidate genes involved in anthocyanin and PA accumulation in pear. The 203 transcripts were grouped into two clusters (I-II) (Supplementary Table S7). Genes in cluster I were strongly correlated with anthocyanins. Cluster I comprised 183 genes (90.1%). Of these, 147 genes were correlated with a single anthocyanin: 142 genes were correlated with kuromanin chloride, three genes were correlated with cyanidin, one gene was correlated with malvidin-3-galactoside chloride, and one gene was correlated with pelargonin. The remaining genes in cluster I were closely correlated with two or more anthocyanins: 31 genes were commonly correlated with kuromanin chloride and pelargonin, two genes were correlated with pelargonin and cyanidin, two genes were correlated with malvidin-3galactoside chloride and oenin, and one gene was correlated with cyanidin 3-rutinoside, oenin, and cyanidin. Cluster II contained 20 genes (9.9%) that were strongly correlated with both anthocyanins and PAs. Of these genes, two phenylpropanoid structural genes (encoding 4CL1 and 4CL2), six flavonoid structural genes (encoding CHS, 3 CHIs, F3H, and ANS), six TF genes (encoding bZIP1, MYB3, MYB86, MYB111, MYB114, and KNAT1), two phytohormone signal transduction genes (encoding IAA13 and ERF003), two DNA-directed RNA polymerase genes (encoding rpoB and Rpb1) and one GST transporter gene (encoding GSTF12) were positively correlated with anthocyanins and PAs. One WRKY TF gene, WRKY28, was negatively correlated with anthocyanins and PAs (Supplementary Table S7, S8). Each gene in cluster II was strongly correlated with many metabolites. We found that these 20 genes were strongly correlated with all 17 anthocyanin and PA metabolites (Fig.   3A). Thus, they were considered to represent the core genes for anthocyanin and PA accumulation in pear. Of these, PcRPB1 (PCP004386) was correlated with the fewest metabolites: one anthocyanin and three PAs; and PcGSTF12 (PCP025171) was correlated with the most metabolites: seven anthocyanins and seven PAs (Fig. 3B). qPCR analysis of DEGs related to anthocyanin and PA accumulation To validate the RNA-seq data, we conducted qPCR analyses of 10 of the anthocyanin and/or PA candidate genes: PcCHI, PcC1, PcMYB114, PcHB7, PcGAI1, PcCHS, PcGSTF12 , PcANS, PcHB12, and PcMYB10 (for gene IDs and primers, see Supplementary Table S9).
The transcript profiles of all selected genes were very similar to those detected from the RNA-seq data (Fig. 4). The results showed that PcGSTF12 was most up-regulated in comparison groups 'RCF1 vs. CF1' and 'RCF2 vs. CF2'. This result was highly consistent with the results of RNA-seq, and provided further evidence for the crucial role of PcGSTF12 in anthocyanin and PA accumulation in pear. Thus, we conducted further analyses to confirm the function of PcGSTF12.

PcGSTF12-mediated anthocyanin and PA accumulation in pear
Our combined metabolite and transcriptomic analyses revealed a core set of genes closely correlated with pear anthocyanins and PAs, which strongly suggested that they play key roles in anthocyanin and PA accumulation in pear. To test this, we focused on the most upregulated gene among the core set of anthocyanin and PA candidate genes, PcGSTF12, for functional analysis.
Interestingly, a large amount of petunidin 3, 5-diglucoside was detected in WT but not in tt19-7-OE. In contrast, procyanidin A3 was detected only in tt19-7-OE. These results confirmed that PcGSTF12 is responsible for anthocyanin and PA accumulation.
Interestingly, its affinity for anthocyanins and PAs differed from that of AtGSTs in Arabidopsis. In particular, our results showed that PcGSTF12 is responsible for the accumulation of procyanidin A3 but not petunidin 3, 5-diglucoside, opposite to the function of AtGSTs in Arabidopsis. PcGSTF12 is a newly identified member of the phi GST family involved in anthocyanin and PA accumulation.
Next, we analyzed RNA-seq data to identify which genes were affected by PcGSTF12 in the seedlings of tt19-7-OE vs. tt19-7. In total, we found 28 strongly affected genes, which encoded proteins involved in anthocyanin and PA biosynthesis, regulation, and transport (Supplementary Table S10). These results showed that PcGSTF12 might not only be an anthocyanin and PA transporter, but may also participate in many other steps of anthocyanin and PA accumulation.
(2) Upstream regulation of PcGST12 Correlation analyses showed that the transcript level of PcGSTF12 was significantly correlated with that of PcMYB114 (Fig. 5D). Further, several MYB-binding sites were found in the PcGSTF12 promoter, indicating that PcGSTF12 might be directly bound by, and regulated by, MYB transcription factors (Supplementary Table S11). Several R2R3-MYB genes are known to bind to MBS sites [27], and we found a MBS site within an 801-bp region upstream of the start codon. Thus, this MBS site was used in an EMSA assay (for primers, see Supplementary Table S9). The biotinylated probe was able to bind PcMYB114 protein, and the addition of a high concentration of cold probe significantly reduced the binding affinity of the biotinylated probe. To test whether PcGSTF12 could be regulated by PcMYB114, we further carried out luciferase reporter assay (for primers, see Supplementary Table S9). The relative LUC activity of the PcGSTF12 promoter was about 6-fold that of the empty vector control. These results showed that PcMYB114 could directly bind to the MBS site in the PcGSTF12 promoter (Fig. 5E) and positively regulate its activity (Fig. 5F).
We also found many cis-acting elements involved in auxin-, ethylene-, and gibberellinsignaling in the PcGSTF12 promoter. This indicated that PcGSTF12 might be the common downstream target of R2R3-MYBs, and auxin, ethylene, and gibberellin signals that regulate the anthocyanin and PA pathways (Supplementary Table S11). Together, these results provide new clues about PcGSTF12-mediated anthocyanin and PA accumulation in pear.

Discussion
We present the first genome-wide examination of anthocyanins, PAs, and the gene expression profiles of pear using young fruits of cv. 'CF' and its red mutant 'RCF'. Through combined transcriptomic and metabolic analyses, we found a core set of 20 candidate genes related to anthocyanin and PA accumulation in pear. These findings increase our understanding of the molecular mechanism of anthocyanin and PA accumulation in pear, especially during the early stage of fruit development.  Table S8). We also detected positive correlations between PcMYB114 and procyanidin A, procyanidin B1, and procyanidin B3 (Supplementary Table S8), indicating that PcMYB114 also functions in PA accumulation in pear. Except for PcMYB114, the other MYB genes PcMYB3, PcMYB86, and PcMYB111 are newly identified as candidates involved in anthocyanin and PA accumulation in pear.
Interestingly, these R2R3-MYBs were correlated with different anthocyanins and PAs, suggesting that they have undergone sub-specialization to play different and specific roles in anthocyanin and PA accumulation in pear.
Other TFs are also involved in anthocyanin and PA accumulation in pear. The HD family of TFs is unique to plants, and its members are proposed to play key roles in developmental processes such as root development, plant cell differentiation, fruit ripening, and leaf and flower senescence [33][34][35]. Members of the HD-Zip I and HD-Zip IV TF subfamilies also play key roles in anthocyanin accumulation. ANTHOCYANINLESS2 (ANL2) was the first HD-Zip IV gene found to be involved in the tissue-specific accumulation of anthocyanins. In In our study, a bZIP TF gene, PcbZIP1, was positively and closely correlated with six anthocyanins and five PAs. The bZIP TFs harbor a highly conserved bZIP domain [40]. They are diverse transcriptional regulators, playing crucial roles in plant development, physiological processes, and biotic/abiotic stress responses [41]. Recently, two bZIP TFs, MdHY5 and MdbZIP44, were shown to promote anthocyanin accumulation in apple [42,43].
Our results provide further evidence that bZIP TFs act as positive regulators in anthocyanin accumulation. Thus, as well as functioning in anthocyanin accumulation, bZIP TFs may also play important roles in PA accumulation.
In this study, a WRKY TF gene, WRKY28, was closely negatively correlated with one anthocyanin and four PAs.  50,51]. They can be divided into eight subgroups, among which the tau and phi classes play key roles in flavonoid transport [52]. Bronze 2 ( bz2) in maize was the first tau class GST reported to be involved in anthocyanin accumulation; bz2 produces yellow skin kernels because of disabled anthocyanin transport into the vacuole [53]. Anthocyanin deposition is also affected by genes in the phi class, such as AtGSTF12 in Arabidopsis [54,55], FvRAP in strawberry [26], CsGSTF1 in tea [52], and VvGST4 in grapevine [56].
Interestingly, these phi-class GSTs show extensive functional diversification. For example, AtGSTF12 plays a key role in both anthocyanin and PA accumulation in Arabidopsis [54], while CsGSTF1 in tea functions only in anthocyanin accumulation [52]. Functional divergence of GSTs has arisen through nonsynonymous mutations, especially at key amino acid sites [51]. For example, a single amino acid mutation (Arg39 to Trp39) was found to be responsible for the high enzymatic activity of Populus euphratica PeGSTU30 [51]. Li et al. [57] showed that one Trp to Leu substitution at amino acid 205 in AtTT19 resulted in an

Conclusions
In this study, we identified 4065 DEGs and 19 differentially expressed metabolites (12 anthocyanins and seven PAs) between young fruits of the green pear 'CF' and its red mutation 'RCF'. Based on correlation analyses between DEGs and anthocyanins/PAs, we found 203 candidate genes for the accumulation of 10 anthocyanins and seven PAs. We further identified a 'core set' of 20 candidate genes for pear anthocyanin and PA accumulation. Of these, PcGSTF12 was functionally characterized as an important anthocyanin and PA carrier in pear. We also identified an important pear anthocyanin and PA regulation node consisting of two core genes, PcGSTF12 and PcMYB114. These results provide novel insights into pear anthocyanin and PA accumulation. The candidate genes for pear anthocyanin and PA accumulation presented here represent a valuable data set to guide future functional studies.

Plant materials
'RCF' is a typical red pear sport of 'CF' that was discovered in the USA. The fruit of 'RCF' is initially green, then changes quickly to red within 1 week after full bloom, and remains red until the fruit ripens. The coloration pattern of 'RCF' differs from that of most pear species, which color at the ripening stage. Thus, 'CF' and 'RCF' are ideal materials to study the molecular mechanism of anthocyanin and PA accumulation in young pear. The Briefly, fruits of 'CF' and 'RCF' with a similar green color were first sampled on 19 April 2017; these samples were collected at 2 DAFB, and were named 'CF1' and 'RCF1', respectively. Fruits with significant differences in color between 'CF' and 'RCF' were sampled on 22 April 2017; these samples collected at 5 DAFB were named 'CF2' and 'RCF2', respectively. In each experiment, skins from 100 fruits per replicate were collected. Three independent biological replicates were collected for analyses. The fruit skin samples were immediately frozen in liquid nitrogen and stored at −80 °C until further metabolite, RNA-sequencing (RNA-Seq), and qPCR analyses.

Metabolite extraction and separation
Metabolite extraction and separation were carried out as described by Wang et al. [58].
Briefly, the freeze-dried fruit skin was crushed into powder and then extracted overnight at 4 °C in 1.0 ml 70% aqueous methanol. Following centrifugation at 10, 000 g for 10 min, the extracts were filtered before HPLC analysis. The analytical conditions were as follows, Total RNA isolation and RNA-Seq analysis Total RNA was isolated using Trizol reagent (Invitrogen, Carlsbad, CA, USA) and its integrity was evaluated using a 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). The mRNA was purified from high-quality total RNAs using oligo (dT) magnetic beads and then broken into short fragments with fragmentation buffer. The cDNA was synthesized using a cDNA Synthesis Kit (TaKaRa, Dalian, China) and linked to sequencing adapters at both ends. The cDNA libraries were sequenced using the Illumina sequencing system (HiSeq™ 2000, Illumina, San Diego, CA, USA). Clean reads were obtained using the NGS QC Toolkit [59]. Differential expression analysis was carried out using the DESeq R package (2012). The threshold for significant differential expression was P < 0.05, and |log 2 fold change| ≥ 1 was used to identify the differentially expressed genes (DEGs) between two different cDNA libraries. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses of DEGs were performed using the R platform as described elsewhere [60].

Integrated metabolome and transcriptome analyses
Pearson's correlation coefficients were calculated between the metabolome and transcriptome data. The coefficients were calculated from log2 (fold change) of each metabolite and log2 (fold change) of each transcript by the EXCEL program. Correlations with a coefficient of R 2 >0.8 were selected. Metabolome and transcriptome relationships were visualized using Cytoscape (version 2.8.2).

qRT-PCR validation
The total RNA samples used for RNA-Seq were also used for cDNA synthesis using the PrimeScript™ RT Reagent Kit (TaKaRa) according to the manufacturer's instructions. The qRT-PCR analyses were conducted as described previously [13] using the primers shown in Supplementary Table 1. PcActin was used as the reference gene. Three biological replicates were analyzed.

PcGSTF12 promoter analysis
The 2000-bp upstream sequence of PcGSTF12 was analyzed using tools at the PlantCARE

Availability of data and materials
All data generated or analyzed during this study are included in this published article and its supplementary information files.

45.
Duan S, Wang J, Gao C, Jin C, Li Table S8. Correlation analysis between 20 core genes and 10 anthocyanins and seven procyanidins. Table S9. List of primers used for qPCR analysis and making constructs. Table S10. RNA-seq analysis of candidate genes for anthocyanin and procyanidin accumulation in tt19-7 and tt19-7-OE seedlings.

Figure 4
Transcript levels of anthocyanin-and procyanidin-related genes. Each experiment had three biological replicates.