Skip to main content

Accumulation mechanism of metabolite markers identified by machine learning between Qingyuan and Xiushui counties in Polygonatum cyrtonema Hua


Polygonatum cyrtonema Hua is a traditional Chinese medicinal plant acclaimed for its therapeutic potential in diabetes and various chronic diseases. Its rhizomes are the main functional parts rich in secondary metabolites, such as flavonoids and saponins. But their quality varies by region, posing challenges for industrial and medicinal application of P. cyrtonema. In this study, 482 metabolites were identified in P. cyrtonema rhizome from Qingyuan and Xiushui counties. Cluster analysis showed that samples between these two regions had distinct secondary metabolite profiles. Machine learning methods, specifically support vector machine-recursive feature elimination and random forest, were utilized to further identify metabolite markers including flavonoids, phenolic acids, and lignans. Comparative transcriptomics and weighted gene co-expression analysis were performed to uncover potential candidate genes including CHI, UGT1, and PcOMT10/11/12/13 associated with these compounds. Functional assays using tobacco transient expression system revealed that PcOMT10/11/12/13 indeed impacted metabolic fluxes of the phenylpropanoid pathway and phenylpropanoid-related metabolites such as chrysoeriol-6,8-di-C-glucoside, syringaresinol-4'-O-glucopyranosid, and 1-O-Sinapoyl-D-glucose. These findings identified metabolite markers between these two regions and provided valuable genetic insights for engineering the biosynthesis of these compounds.

Peer Review reports


Polygonatum cyrtonema Hua is an ancient and traditional herbal plant in China (Fig. 1A). The rhizome of this plant, rich in a diverse array of secondary metabolites including triterpenoid saponins, steroids, and flavonoids, offers a myriad of health benefits [1]. These include the prevention of diabetes in obese individuals, enhancement of insulin secretion, improvement of insulin resistance, antibacterial properties, anti-tumor effects, anti-inflammatory properties and anti-aging benefits [2, 3]. P. cyrtonema primarily grown in forests in the southern region of China [4]. Studies have noted differences in the phytochemical compositions of P. cyrtonema from various regions [5, 6]. However, key metabolite markers of P. cyrtonema between regions is unclear, which will inevitably affect the development of related industrial and medicinal application. Hence, it is significant to detect specific metabolic indicators for differentiating the quality of P. cyrtonema grown in various regions.

Fig. 1
figure 1

Metabolomic analysis of P. cyrtonema from two sites in Qingyuan and Xiushui Counties. A Diagram of P. cyrtonema plant. B PCA analysis of the two sites. C Component analysis of the identified metabolites from P. cyrtonema. D Complexheatmap of all metabolites identified by relative quantification

Recently, machine learning methods have significantly enhanced the detection of metabolite markers in plant and human [7, 8]. The support vector machine-recursive feature elimination (SVM-RFE) is a common multivariate approach in machine learning that is extensively employed for selecting features and classifying high-throughput data [9]. Furthermore, the random forest (RF), known for its superior accuracy and interpretability, has gained prominence in predictive modeling for feature biomarkers [7, 10]. Consequently, machine learning has proven instrumental in efficiently identify metabolite markers from voluminous metabolomic datasets. Further elucidation of the underlying biosynthetic mechanisms of these signature metabolites across different regions is particularly crucial.

Accumulation of plant secondary metabolites are usually originated from the phenylpropanoid pathway, such as coumarins, lignans, flavonoids, phenolic acids, and lignins [11]. The phenylpropanoid pathway typically initiates in plants by synthesizing phenylalanine through the glycolysis and shikimic acid pathway [12,13,14]. Afterwards, phenylalanine ammonia lyase (PAL), cinnamate-4-hydroxylase (C4H), and 4-coumaryl-CoA ligase (4CL) catalyze the reaction in a sequential manner to generate cinnamic acid, hydroxycinnamic acid, and CoA-linked 4-coumarate, respectively [15]. These metabolites are directed into two principal downstream pathways: the biosynthesis of monophenols and flavonoids. Lignin production is primarily regulated by hydroxycinnamoyl transferase (HCT), which plays an essential role in controlling the subsequent monophenol metabolism [16]. This is followed by the catalytic activities of cinnamoyl-CoA reductase (CCR) and cinnamyl-alcohol dehydrogenase (CAD). Downstream flavonoid metabolism is believed to be controlled by the pivotal chalcone synthase (CHS). After CHS synthesizes the chalcone, it undergoes conversion to flavanone through the action of CHI. Subsequently, a sequence of enzymes including flavanone 3-hydroxylase (F3H), flavonol synthase (FLS), and flavonoid synthase (FNS) facilitate the generation of dihydroflavonols, quercetin, and apigenin, correspondingly [17]. Despite the extensive knowledge on the phenylpropanoid pathway genes in A. thaliana, the comprehension of biosynthesis of this way in P. cyrtonema remains largely elusive. Currently, application of weighted gene co-expression network analysis (WGCNA) has gained prominence for elucidating the genetic architecture underlying the biosynthesis of secondary metabolites, which integrates comprehensive transcriptomic and metabolomic data [18]. This method considers the association degree among genes by analyzing the pairwise correlations of gene expression profiles, which offers considerable advantages in the exploration of secondary metabolite biosynthesis [18, 19].

In this study, we used a widely targeted metabolome approach to identify 482 metabolites from 21 different wild genotypes of P. cyrtonema growing in Qingyuan and Xiushui counties. Subsequently, the RF and SVM-RFE machine learning techniques were employed to distinguish characteristic metabolites between Qingyuan and Xiushui counties. By comparative transcriptome and WGCNA analyses, candidate genes of these compounds were mined. The PcOMT10/11/12/13 genes were characterized using tobacco transient transformation system. All these results revealed the biomarkers of metabolites in these two counties and their accumulation mechanism in P. cyrtonema.

Materials and methods

Plant materials and sampling

Twenty one wild genotypes of P. cyrtonema were collected from Qingyuan and Xiushui counties (Table S1) and transplanted to the plantation of Zhejiang A&F University. During the year 2017, the rhizomes of P. cyrtonema were disinfected using a 1% concentration of carbendazim for a duration of 60 min. Subsequently, they were washed with tap water and placed in a pot containing nutrient-rich soil. During the year 2020, three sets of P. cyrtonema rhizome samples with removal of fibrous roots were collected, rapidly frozen in liquid nitrogen and stored at -80 °C [20].

Identification and quantification of metabolites through widely targeted metabolome approach.

Metabolites were extracted, identified, and quantified following the method described by Han et al. (2023). In short, the rhizome samples were freeze-dried and then pulverized using a mixer (Retsch, Haan, Germany). The freeze-dried tissue (0.1 g) was dissolved in extraction buffer, and the mixture was incubated at 4 °C for 12 h. The extraction solution was centrifuged with 14,000 rpm for 8 min. Subsequently, the extracted samples were analyzed using an ultrahigh-performance liquid chromatography-electrospray ionization-tandem mass spectrometry (UPLC-ESI–MS/MS). Following UPLC, the effluent was connected in succession to an ESI-triple quadrupole linear ion trap (Q TRAP)-MS [20].

The triple quadrupole linear ion trap mass spectrometer (QTrap) was operated using Analyst 1.6 software (AB Sciex) to gather scans from the linear ion trap (LIT) and triple quadrupole (QQQ). Polypropylene glycol solutions at concentrations of 10 and 100 mol/L were used for instrument tuning and mass calibration for QQQ and LIT scans. QQQ scans were obtained using the multiple reaction monitoring (MRM) mode, with the collision gas (nitrogen) set at 5 psi. For each MRM transition, the DP and CE were optimized. The eluted metabolites were monitored at each time interval by monitoring a specific MRM transition (Fig. S1; Table S2) [20].

Simca-P software (version 13.0, Umetrics AB, Umea, Sweden) was used to perform orthogonal partial least squares discriminant analysis (OPLS-DA) and unsupervised principal component analysis (PCA) on the processed data. The R software was used to conduct hierarchical clustering analysis of metabolites across samples [21]. To identify differentially accumulated metabolites (DAMs), screening criteria such as log2(Fold Change) ≥ 1, p-value ≤ 0.05, or variable importance in the projection (VIP) ≥ 1 were applied. The metabolic data for these two wild genotypes (T118_21 and T26_14) were obtained from Han et al. (2023).

Identification of signature metabolites accumulated in Qingyuan and Xiushui counties

According to Han et al. (2024), feature metabolites were identified by using the RF function of “caret” package. The selection was based on the significance of these metabolites in classification, considering the variations among the metabolome samples. To prevent sample size bias and overfitting, 100 replicas of the random forest model were trained and tested using a well-balanced ten-fold cross-validation approach. Using the “e1071” library in R, a support vector machine (SVM) model was built, and metabolites with an average rank below 30 were chosen as the featured metabolites. Next, we deduced the metabolites that overlapped between the two classifiable models [21].

RNA sequencing and analysis

The MiniBEST Plant RNA Extraction Kit (TaKaRa) was used to extract total RNA. RNA concentration and quality were evaluated with the Agilent 2100 Bioanalyzer. A total of 45 RNA-seq libraries were created via NEB Next Ultra RNA Library Preparation Kit (NEB, E7530) and the NEB Next Multiplex Oligos (NEB, E7500). Next, paired-end sequencing was carried out using an Illumina Hiseq 2500 platform. The Trinity software was employed to compute the transcript abundance of genes in transcripts per million (TPM). Differentially expressed genes were identified with |log2 (fold change) |≥ 1 and p-value (Padj) < 0.05, as detected by DESeq [20]. The RNA-seq data included in this study were obtained from Han et al. (2023).

Weighted Gene Co-expression Network Analysis (WGCNA)

The WGCNA package (version 1.6.6) in R software (version 3.4.4) was used to construct weighted gene co-expression networks and identify associated modules; r > 0.90 and p < 0.001 were defined as the criteria for identifying significant modules related to flavonoids, whereas r ≥ 0.80 and p ≤ 0.001 were established as the criteria for significant modules related to phenolic acids and lignans. Hub genes related to the compounds were considered to meet two criteria, including a single gene-trait correlation over 0.90, and an edge weight greater than 0.1. Gephi was used to display the network of co-expression genes [21]. The RNA-seq data included in this study were obtained from Han et al. (2023).

Tobacco transient expression assay

PcOMT10, PcOMT11, PcOMT12 and PcOMT13 CDS were cloned into the modified plant expression vector pCAMBIA1380-35S::GFP and subsequently injected into Agrobacterium tumefaciens GV3101 [20]. All constructions were expressed transiently in Nicotiana benthaniana leaves. Three days after injection, positive transgenic leaves were identified using real-time quantitative PCR. Widely targeted metabolomics was used to identify phenylpropanoids in overexpressed transgenic tissues. In these trials, three biological replicates were employed.

Extraction of RNA and real-time quantitative PCR (RT-qPCR)

Total RNA was extracted using the MiniBEST Plant RNA Extraction kit (TaKaRa, Japan). The cDNA was obtained by PrimerScript RT Enzyme Mix I kit (TaKaRa, Japan). RT-qPCR was analyzed by SYBR® Premix Ex Taq II (TaKaRa, Japan) and a CFX96 TouchTM Real-Time PCR System (BIO-RAD, USA). Melt-Curve analysis (60 °C—95 °C, 0.5 °C increment for 5 s per step) was used to test the amplicon specificity after the PCR was run as follows: 30 s at 95 °C for pre-denaturation, 40 cycles of 5 s at 95 °C for denaturation, and 30 s at 60 °C for annealing (Takara, Japan) [20]. The comparative Ct (Ct: cycle threshold) approach was applied for relative quantification. Each analysis required technical replicates and three independent biological replicates, respectively. Primers are listed in Table S3.


Metabolite identification among 21 genotypes

To better understand fluctuating variations in bioactive substances among 21 different genotypes from Qingyuan and Xiushui counties, a total of 482 metabolites in P. cyrtonema rhizomes were successfully identified using UPLC-ESI–MS/MS (Table S4). The PCA analysis revealed a strong sample correlation within region, but a distinct separation between the two counties (Fig. 1B). These metabolites were then categorized into 11 distinct groups, which included lignans and coumarins (8 metabolites), lipids (89), nucleotides and derivatives (17), organic acids (36), steroids (52), phenolic acids (50), alkaloids (51), amino acids and derivatives (50), carbohydrates (23), flavonoids (81), and others (25) (Table S5; Fig. 1C and S2). Of those, phenylpropanoids accounted for 28.83% of the total metabolites. Clustering analysis of these compounds also demonstrated a noticeable differentiation between the two regions (Fig. 1D), suggesting variations in the accumulation of these metabolites under different cultivation conditions.

Machine learning-based selection of the best metabolite markers

Machine learning methods were used to screen metabolite markers based on a total of 482 metabolites identified by widely targeted metabolome analysis. Based on the average rank and cv error, 30 feature metabolites were ultimately determined using SVF-RFE (Table S6; Fig. 2A). Furthermore, the top RF model, which relied on 44 trees, recognized 20 metabolites as feature indicators (Table S7; Fig. 2B and C). After the classification by RF, the metabolites that were considered most significant were quercetin-3-O-(6''-malonyl) galactoside, padelaoside C, 9-Hydroxy-12-oxo-10(E), 15(Z)-octadecadienoic acid, epipinoresinol, and salicin. Three overlapping metabolites identified by SVM-RFE and RF methods, including apigenin-7-O-(2''-glucosyl)arabinoside, L-Azetidine-2-carboxylic acid, seryl threonine were screened using Venn diagrams (Fig. S3). Finally, based on the significance of machine learning classification, three metabolites were selected as metabolite markers including quercetin-3-O-(6''-malonyl) galactoside, salicin and epipinoresinol, belonging to flavonoids, phenolic acids and lignans respectively. Furthermore, clustering analysis also revealed a distinct regional differentiation for the aforementioned three classes of coumpounds from the two regions (Fig. 2D), demonstrating that these compounds can effectively distinguish the quality of P. cyrtonema from the two locations.

Fig. 2
figure 2

Machine learning methods identify characteristic metabolites. A SVM-RFE, a support vector machine—recursive feature elimination approach. B The number of decision trees and the relationship to the error rate. C Top 20 of feature metabolite results in the random forest method. D Complexheatmap of all phenylpropane metabolites in two sites

Mining and characterization of key candidate genes involved in flavonoid biosynthesis

Two genotypes T118_21 (Qingyuan County) and T26_14 (Xiushui County) were selected to conduct comparative analysis of flavonoid compounds, which showed that these metabolites had significantly different contents between the two genotypes (Fig. 3A and B).

Fig. 3
figure 3

Mining of key candidate genes in flavonoid biosynthesis. A Volcano plot of flavonoids metabolites; B Relative content of flavonoids compounds. C Volcano map of differentially expressed genes (p-value < 0.05). D Correlation of flavonoids and modules. E qRT-PCR of PcOMT10/11/12/13 in N. benthaniana leaves. F Changes in chrysoeriol-6,8-di-C-glucoside content after overexpression in N. benthaniana leaves. G Flavonoid biosynthesis pathway and expression of key genes in the P. cyrtonema. Green genes show that compound accumulation is inhibited, whereas red genes promote compound accumulation

Further, a comparative transcriptome analysis between the two genotypes was also conducted on the two genotypes. A total of 625 significantly differential expressed genes (DEGs) were identified (Fig. 3C). Four candidate genes related to flavonoids synthesis were selected fromt those DEGs (Fig. S4), and were named PcOMT10, PcOMT11, PcOMT12 and PcOMT13, respectively, based on the phylogenetic analysis (Fig. S5; Table S8). WGCNA was also applied to identify significantly correlated metabolites-modules based on transcripts and 14 flavonoid metabolites (Fig. 3D). There was a strong correlation between apigenin-6-C-xyloside-8-C-arabinoside and apigenin-6-C-glucoside-8-O-xylcoside with the green module respectively, suggesting the similar accumulation mechanisms of these compounds involved in flavonoid biosynthesis. Further, hub genes in darkturquoise and green modules, including the flavonoid structural genes CHI, UGT1 and UGT2, as well as an ERF gene that positively correlating with the UGT1 gene, were identified.

To investigate their roles in the accumulation of flavonoid compounds, we constructed the corresponding plant expression vectors (Fig. S6) and performed transient transformation experiments in N. benthaniana. The qRT-PCR showed that the expression levels of PcOMT11/12/13 were markedly upregulated compared with the control in N. benthaniana leaves (Fig. 3E). Notably, overexpression of PcOMT12 resulted in a significant increase in the content of chrysoeriol-6,8-di-C-glucoside in N. benthaniana leaves (Fig. 3F). Furthermore, the overexpression of PcOMT11 and PcOMT13 led to a substantial increase and decrease, respectively, in the levels of chrysoeriol-7-O-gentiobioside. This indicates their respective roles in promoting and inhibiting accumulation (Fig. 3G).

Mining and characterization of key candidate genes involved in phenolic acids and lignans biosynthesis

Comparative analysis of content of phenolic acids and lignans two genotypes between T118_21 and T26_14 revealed there are 11 different metabolites, including three upregulated and eight downregulated metabolites (Fig. 4A and B). According to WGCNA analysis, there was a strong positive correlation between p-Coumaroylmalic acid and the blue module; salicin exhibited the highest correlation with the lightgreen module; epipinoresinol exhibited a strong correlation with the lightpink4 module (Fig. S7). Finally, we identified two candidate genes, including UGT3 and NAC in the lightgreen module, and NAC might negatively regulate the UGT3 gene. Based on comparative transcriptome analysis, three DEGs PcOMT10, PcOMT11 and PcOMT12 were identified. When PcOMT10 and PcOMT11 were overexpressed respectively in N. benthaniana leaves, the contents of both syringaresinol-4'-O-glucopyranosid (Fig. 4C) and 1-O-sinapoyl-D-glucose (Fig. 4D) were significantly decreased whereas the content of metabolite marker epipinoresinol increased with the overexpression of PcOMT12 (Fig. 4E).

Fig. 4
figure 4

Mining of key candidate genes in the biosynthesis of phenolic acids and lignans. A Volcano plot of phenolic acids and lignans metabolites. B Heatmap of relative content of phenolic acids and lignans compounds. C Changes in syringaresinol-4'-O-glucopyranosid content after gene overexpression in N. benthaniana leaves. D Changes in 1-O-Sinapoyl-D-glucose content after gene overexpression in N. benthaniana leaves. E Phenolic acids and lignans biosynthesis pathway and expression of key genes in the P. cyrtonema. Green genes show that compound accumulation is inhibited, whereas red genes promote compound accumulation

Biosynthetic model of metabolite markers-related phenylpropanoid compounds in P. cyrtonema

Based on metabolite markers between the two regions and comparative transcriptome analysis of two representative genotypes, we constructed a biosynthetic model of important phenylpropanoids in P. cyrtonema. The manifestation of the four genes PcOMT10/11/12/13 in N. benthaniana, were consistent with the pattern of compound accumulation in P. cyrtonema (Fig. 5). Surprisingly, PcOMT12 has a promotional effect on the synthesis of several oxymethylated phenylpropane compounds, such as epipinoresinol, 4-O-glucosyl-sinapate and chrysoeriol-6,8-di-C-glucoside. Moreover, PcOMT13 only somewhat inhibited the accumulation of chrysoeriol-7-O-gentiobioside content. The PcOMT10/11 showed the same catalytic effect in the synthesis of syringaresinol-4'-O-glucopyranosid and vanillic acid.

Fig. 5
figure 5

Analysis of the biosynthetic pathway of phenylpropane in P. cyrtonema. In the pathway diagram, red genes indicate promotion of compound accumulation, while green genes indicate inhibition of compound accumulation


P. cyrtonema is a Chinese traditional and classic dual-purpose plant for food and medicine with a number of biological properties, such as anti-aging, nourishing Yin, anti-inflammation, and immune regulation [2, 4]. In this study, large differences in rhizome metabolites of P. cyrtonema among different germplasms and regions have been investigated (Fig. S2). Identifying metabolite markers in non-model plants remains challenging. Traditional analytical methods may identify a larger number of metabolites; whereas with the use of machine learning methods it is able to focus on a few compounds with a larger weighting, which greatly narrows the scope [21]. Over the past few years, some researcher have begun using machine learning techniques for fruit flavor identification [7]. By machine learning techniques, ten metabolic biomarkers were identified to distinguish specific Chinese cherry accessions [22]. In this study, we detected significant metabolite markers including quercetin-3-O-(6"-malonyl) galactoside, salicin, and epipinoresinol linked to the distinctions between Qingyuan and Xiushui counties using machine learning methods. Each of these metabolites contributes significantly to the plant's growth and medicinal properties. Quercetin-3-O-(6"-malonyl) galactoside, for instance, is associated with the enzymatic browning of iceberg lettuce [23] and has been observed in higher concentrations in Ribes nigrum L. grown in northern regions compared to southern ones [24]. Salicin, commonly extracted from herbaceous plants, is used to alleviate pain in acute rheumatic conditions [25]. Additionally, forsythoside synthesis involves epipinoresinol, a prevalent compound in Forsythia [26].

Through WGCNA, our study identified four key pathway genes, including CHI and UGT1/2/3, as well as two regulatory genes, ERF and NAC, which specifically modulate UGT1 and UGT3, respectively. CHI, catalyzing naringenin, serves as a pivotal intermediate in the flavonoid biosynthesis pathway [27]. Notably, we observed a positive correlation between CHI and vitexin-2"-O-galactoside in our analysis.

Plant glycosyltransferases, known for their extensive role in secondary metabolism, have been comprehensively characterized in various species, including Arabidopsis [28], cereals [29], and rice [30]. Intriguingly, despite a reduction in UGT2 expression, we detected an increase in its associated metabolites, such as apigenin-6-C-xyloside-8-C-arabinoside and apigenin-6-C-glucoside-8-O-xylcoside. Furthermore, our findings imply that ERF may upregulate UGT1, promoting the accumulation of quercetin-3-O-(6"-malonyl) galactoside. Corroborating our results, Wan et al. (2023) reported that transient overexpression of CsERF003 in citrus fruits markedly increased flavanones, flavonoids, and flavonols, alongside the upregulation of key genes involved in their biosynthesis, hinting at CsERF003's regulatory role in flavonoid biosynthesis via UGT expression modulation [31]. Our results also suggest ERF might enhance the expression of UGT1 to facilitate the buildup of quercetin-3-O-(6"-malonyl) galactoside. Conversely, the NAC exhibited a negative correlation with UGT3, potentially enhancing salicin content. Previous studies have underscored NAC's vital role in plant stress responses, predominantly by modulating flavonoid synthesis [32]. In A. thaliana, the NAC transcription factor ANAC078 regulates flavonoid biosynthesis in response to high light stress [33], while in Norway spruce, overexpression of PaNAC03 led to diminished flavonol biosynthesis and aberrant embryo development [34]. These observations suggest that NAC may facilitate an increase in phenolic acid content while reducing flavonol levels.

The O-methyltransferase has garnered increasing attention due to its unexpectedly broad compatibility and selectivity towards a diverse array of substrates, spanning from the synthesis of simple catechols to intricate phenylpropanoids and isoquinoline alkaloids [35, 36]. Previous studies revealed IiOMT3 methylated the 3'-hydroxy moiety of flavonoids such as eriodictyol and 3'-hydroxydaizein. Additionally, it methylated the 7-OH positions of flavones and facilitated the conversion of caffeic acid into ferulic acid [37]. In this study, substrate binding and specificity in plant O-methyltransferases were verified by the results of widely targeted metabolomics of P. cyrtonema and overexpressed transgenic N. benthaniana. Meanwhile, the four genes (PcOMT10/11/12/13) mined in P. cyrtonema showed similar function to those in N. benthaniana for catalyzing the generation of target products from substrates, demonstrating that these enzymes are relatively functionally conserved for the catalytic mechanism among different species. Among them, the PcOMT12 gene played a positive role in different target compound accumulation in multiple reactions and significantly increased chrysoeriol-6,8-di-C-glucoside content after overexpression in N. benthaniana, which is presumed to have high catalytic activity and widely catalytic substrates. It is expected to be further exploited in the future.

In this study, we used machine learning for the first time to identify three significant metabolite markers between Qingyuan and Xiushui counties in P. cyrtonema. Then multi-omics approach were used to mine and verify candidate genes (PcOMT10/11/12/13) with over-expressed transient expression in N. benthaniana leaves. These results offer novel insights into the molecular basis of phenylpropanoid accumulation in P. cyrtonema.


Polygonatum cyrtonema Hua is an ancient and traditional medicinal plant in China. This study focused on the metabolomics of P. cyrtonema growing in Qingyuan County (Zhejiang Province) and Xiushui County (Jiangxi Province). Based on machine learning, we discovered three metabolite markers quercetin-3-O-(6"-malonyl) galactoside, epipinoresinol, salicin. Additionally, comparative transcriptome analysis mined four O-methyltransferases PcOMT10/11/12/13. Additionally, WGCNA helped to identify potential candidate genes associated with flavonoids such as CHI, UGT1, UGT2, ERF, as well as the phenolic acids-related genes UGT3 and NAC. Furthermore, transient overexpression of four OMTs were conducted in N. benthaniana leaves validating the role in altering metabolic flow for accumulating phenylpropanoids. These results are expected to provide an important basis for metabolites substance and subsequent genetic studies applied for precision breeding of P. cyrtonema facilitating the production of high-value-added medicinal products.

Availability of data and materials

Transcriptome data can be found in NCBI (, and the submission numbers is SUB13866596.


  1. Shi Y, Si D, Chen D, Zhang X, Han Z, Yu Q, et al. Bioactive compounds from Polygonatum genus as anti-diabetic agents with future perspectives. Food Chem. 2023;408:135183.

    Article  CAS  PubMed  Google Scholar 

  2. Si J, Zhu Y. Polygonati rhizoma—a new high-quality crop with great potential and not occupying farmland. Sci Sin-Vitae. 2021;51:1477–84.

    Article  Google Scholar 

  3. Zhao L, Xu C, Zhou W, Li Y, Xie Y, Hu H, et al. Polygonati Rhizoma with the homology of medicine and food: A review of ethnopharmacology, botany, phytochemistry, pharmacology and applications. J Ethnopharmacol. 2023;309:116296.

    Article  CAS  PubMed  Google Scholar 

  4. Chen D, Han Z, Si J. Huangjing (Polygonati rhizoma) is an emerging crop with great potential to fight chronic and hidden hunger. Sci China Life Sci. 2021;64(9):1564–6.

    Article  PubMed  PubMed Central  Google Scholar 

  5. Zhang J, Qiu X, Tan Q, Xiao Q, Mei S. A comparative metabolomics study of flavonoids in radish with different skin and flesh colors ( Raphanus sativus L. ). J Agric Food Chem. 2020;68:14463–70.

    Article  CAS  PubMed  Google Scholar 

  6. Ye Y, Zhang X, Chen X, Xu Y, Liu J, Tan J, et al. The use of widely targeted metabolomics profiling to quantify differences in medicinally important compounds from five Curcuma (Zingiberaceae) species. Ind Crops Prod. 2022;175:114289.

    Article  CAS  Google Scholar 

  7. Colantonio V, Ferrão LFV, Tieman DM, Bliznyuk N, Sims C, Klee HJ, et al. Metabolomic selection for enhanced fruit flavor. Proc Natl Acad Sci USA. 2022;119:e2115865119.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  8. Shen B, Yi X, Sun Y, Bi X, Du J, Zhang C, et al. Proteomic and metabolomic characterization of COVID-19 patient sera. Cell. 2020;182:59-72.e15.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  9. Zhang X, Lu X, Shi Q, Xu X, Leung HE, Harris LN, et al. Recursive SVM feature selection and sample classification for mass-spectrometry and microarray data. BMC Bioinformatics. 2006;7:197.

    Article  PubMed  PubMed Central  Google Scholar 

  10. Wang H, Yang F, Luo Z. An experimental study of the intrinsic stability of random forest variable importance measures. BMC Bioinformatics. 2016;17:60.

    Article  PubMed  PubMed Central  Google Scholar 

  11. Dong N, Lin H. Contribution of phenylpropanoid metabolism to plant development and plant–environment interactions. J Integr Plant Biol. 2021;63:180–209.

    Article  CAS  PubMed  Google Scholar 

  12. Vogt T. Phenylpropanoid Biosynthesis. Mol Plant. 2010;3:2–20.

    Article  CAS  PubMed  Google Scholar 

  13. Cavallini E, Matus JT, Finezzo L, Zenoni S, Loyola R, Guzzo F, et al. The phenylpropanoid pathway is controlled at different branches by a set of R2R3-MYB C2 repressors in grapevine. Plant Physiol. 2015;167:1448–70.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  14. Ali MB, McNear DH. Induced transcriptional profiling of phenylpropanoid pathway genes increased flavonoid and lignin content in Arabidopsisleaves in response to microbial products. BMC Plant Biol. 2014;14: 84.

    Article  PubMed  PubMed Central  Google Scholar 

  15. Muro-Villanueva F, Mao X, Chapple C. Linking phenylpropanoid metabolism, lignin deposition, and plant growth inhibition. Curr Opin Biotechnol. 2019;56:202–8.

    Article  CAS  PubMed  Google Scholar 

  16. D’Auria JC. Acyltransferases in plants: a good time to be BAHD. Curr Opin Plant Biol. 2006;9:331–40.

    Article  PubMed  Google Scholar 

  17. Lv Y-Q, Li D, Wu L-Y, Zhu Y-M, Ye Y, Zheng X-Q, et al. Sugar signal mediates flavonoid biosynthesis in tea leaves. Horticulture Research. 2022;9:uhac049.

    Article  PubMed  PubMed Central  Google Scholar 

  18. Han Z, Ahsan M, Adil MF, Chen X, Nazir MM, Shamsi IH, et al. Identification of the gene network modules highly associated with the synthesis of phenolics compounds in barley by transcriptome and metabolome analysis. Food Chem. 2020;323:126862.

    Article  CAS  PubMed  Google Scholar 

  19. The DREAM Module Identification Challenge Consortium, Choobdar S, Ahsen ME, Crawford J, Tomasoni M, Fang T, et al. Assessment of network module identification across complex diseases. Nat Methods. 2019;16:843–52.

    Article  Google Scholar 

  20. Han Z. Machine learning uncovers accumulation mechanism of flavonoid compounds in Polygonatum cyrtonema Hua. Plant Physiology and Biochemistry. 2023;18:107839.

    Article  Google Scholar 

  21. Han Z, Xu Z, Xu Y, Lin J, Chen X, Wang Y, et al. Phylogenomics reveal DcTPS-mediated terpenoid accumulation and environmental response in Dendrobium catenatum. Ind Crops Prod. 2024;208:117799.

    Article  CAS  Google Scholar 

  22. Liu Z, Wang H, Zhang J, Chen Q, He W, Zhang Y, et al. Comparative metabolomics profiling highlights unique color variation and bitter taste formation of Chinese cherry fruits. Food Chem. 2024;439:138072.

    Article  CAS  PubMed  Google Scholar 

  23. Mai F, Glomb MA. Isolation of phenolic compounds from iceberg lettuce and impact on enzymatic browning. J Agric Food Chem. 2013;61:2868–74.

    Article  CAS  PubMed  Google Scholar 

  24. Yang W, Alanne A-L, Liu P, Kallio H, Yang B. Flavonol glycosides in currant leaves and variation with growth season, growth location, and leaf position. J Agric Food Chem. 2015;63:9269–76.

    Article  CAS  PubMed  Google Scholar 

  25. Ferguson GB. removal of a melanotic sarcoma, originating in the sheath of the sartorius muscle, from a man aged seventy-five. The Lancet. 1876;107:384–5.

    Article  Google Scholar 

  26. Rahman MMA, Dewick PM, Jackson DE, Lucas JA. Biosynthesis of lignans in Forsythia intermedia. Phytochemistry. 1990;29:1841–6.

    Article  CAS  Google Scholar 

  27. Tohge T, de Souza LP, Fernie AR. Current understanding of the pathways of flavonoid biosynthesis in model and crop plants. J Exp Bot. 2017;68:4013–28.

    Article  CAS  PubMed  Google Scholar 

  28. Gandia-Herrero F, Lorenz A, Larson T, Graham IA, Bowles DJ, Rylott EL, et al. Detoxification of the explosive 2,4,6-trinitrotoluene in Arabidopsis: discovery of bifunctional O - and C -glucosyltransferases. Plant J. 2008;56:963–74.

    Article  CAS  PubMed  Google Scholar 

  29. Brazier-Hicks M, Evans KM, Gershater MC, Puschmann H, Steel PG, Edwards R. The C-Glycosylation of Flavonoids in Cereals. J Biol Chem. 2009;284:17926–34.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  30. Ko JH, Kim BG, Hur H-G, Lim Y, Ahn J-H. Molecular cloning, expression and characterization of a glycosyltransferase from rice. Plant Cell Rep. 2006;25:741–6.

    Article  CAS  PubMed  Google Scholar 

  31. Wan H, Liu Y, Wang T, Jiang P, Wen W, Nie J. Combined transcriptomic and metabolomic analyses identifies CsERF003, a citrus ERF transcription factor, as flavonoid activator. Plant Sci. 2023;334:111762.

    Article  CAS  PubMed  Google Scholar 

  32. Zhao X, Wu T, Guo S, Hu J, Zhan Y. Ectopic expression of AeNAC83, a NAC transcription factor from Abelmoschus esculentus, inhibits growth and confers tolerance to salt stress in Arabidopsis. IJMS. 2022;23: 10182.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  33. Morishita T, Kojima Y, Maruta T, Nishizawa-Yokoi A, Yabuta Y, Shigeoka S. Arabidopsis NAC Transcription Factor, ANAC078, regulates flavonoid biosynthesis under high-light. Plant Cell Physiol. 2009;50:2210–22.

    Article  CAS  PubMed  Google Scholar 

  34. Dalman K, Wind JJ, Nemesio-Gorriz M, Hammerbacher A, Lundén K, Ezcurra I, et al. Overexpression of PaNAC03, a stress induced NAC gene family transcription factor in Norway spruce leads to reduced flavonol biosynthesis and aberrant embryo development. BMC Plant Biol. 2017;17:6.

    Article  PubMed  PubMed Central  Google Scholar 

  35. Joshi CP, Chiang VL. Conserved sequence motifs in plant S-adenosyl-L-methionine-dependent methyltransferases. Plant Mol Biol. 1998;37:663–74.

  36. Frick S, Kutchan TM. Molecular cloning and functional expression ofO-methyltransferases common to isoquinoline alkaloid and phenylpropanoid biosynthesis. Plant J. 1999;17:329–39.

    Article  CAS  PubMed  Google Scholar 

  37. Tan Y, Yang J, Jiang Y, Sun S, Wei X, Wang R, et al. Identification and characterization of two Isatis indigotica O-methyltransferases methylating C-glycosylflavonoids. Horticulture Res. 2022;9:uhac140.

    Article  Google Scholar 

Download references


This work was funded by the National Key R&D Program of China (No. 2021YFD1000200), the National Natural Science Foundation of China (32200214), the Natural Science Foundation of Zhejiang Province (LQ22H280003).

Author information

Authors and Affiliations



QG is responsible for conceptualizing, curating data, conducting formal analysis, investigating, visualizing, and writing drafts. JY, KF: Writing draft and Formal analysis. YX: Experiments. ZG, HZ, CL, SC, JS: Investigation and Conceptualization. DC and ZH contributed to the conceptualization, data curation, funding acquisition, methodology, project administration, supervision, and writing - review & editing of the study.

Corresponding authors

Correspondence to Donghong Chen or Zhigang Han.

Ethics declarations

Ethics approval and consent to participate

The collecting of all samples in this study were collected from the plantation of Zhejiang Agriculture and Forestry University. All experimental studies complied with relevant institutional, national and international guidelines and regulations.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Gong, Q., Yu, J., Guo, Z. et al. Accumulation mechanism of metabolite markers identified by machine learning between Qingyuan and Xiushui counties in Polygonatum cyrtonema Hua. BMC Plant Biol 24, 173 (2024).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: