De novo transcriptome assembly reveals putative biosynthetic genes involved in the biosyntheses of isoflavones and miroestrol in Pueraria candollei var. mirifica

Background: Pueraria candollei var. mirifica , a Thai medicinal plant used traditionally as a rejuvenating herb, is known as a rich source of phytoestrogens, including isoflavonoids and the highly estrogenic miroestrol and deoxymiroestrol. Although these active constituents in P. candollei var. mirifica have been known for some time, actual knowledge regarding their biosynthetic genes remains unknown. Results: A de novo transcriptome analysis was conducted using combined P. candollei var. mirifica tissues of young leaves, mature leaves, tuberous cortices, and cortex-excised tubers.A total of 166,923 contigs was assembled for functional annotation using protein databases and as a library for identification of genes that are potentially involved in the biosynthesis of isoflavonoids and miroestrol. Twenty-one differentially expressed genes from four separate libraries were identified as candidatesinvolved in these biosyntheticpathways, and their respective expressions were validated by quantitative real-time reverse transcription polymerase chain reaction. Notably, isoflavonoid profiling generated by LC-MS/MS was positively correlated with expression levels of isoflavonoid biosynthetic genes across the four types of tissues. Moreover, we identified R2R3 MYB transcription factors that may be involved in the regulation of isoflavonoid biosynthesis in P. candollei var. mirifica . To confirm the function of a key-isoflavone biosynthetic gene, P. candollei var. mirifica isoflavone synthase identified in our library was transiently co-expressed with an Arabidopsis MYB12 transcription factor ( At MYB12) in Nicotiana benthamiana leaves. Remarkably, the combined expression of these proteins ledto the production of the isoflavone genistein. Conclusions: Our results provide compelling evidence regarding the integration of transcriptome and metabolome as a powerful tool for unraveling biosynthetic pathways in plants. type using three biological replicates. Based on

are generally similar, starting with phenylalanine ammonia lyase removing amides from the first substrate, phenylalanine, to produce cinnamic acid that is hydroxylated by cinnamate 4-hydroxylase to produce p-coumarate. The enzyme 4-coumarate-CoA ligase then activates p-coumarate by attaching a co-enzyme A (coA), and subsequently, chalcone synthase (CHS) binds p-coumaroyl-CoA to three molecules of malonyl-CoA to form a chalcone skeleton. Chalcone can be converted to flavanone by chalcone isomerase.
Liquiritigenin is a substrate for both the flavonoid and the isoflavonoid pathway.
Isoflavonoids are generally synthesized from common intermediates (either liquiritigenin or naringenin) within the recognized flavonoid biosynthetic pathway by aryl migration, which is catalyzed by isoflavone synthase (IFS). This pathway leads to the formation of an intermediate product, 2-hydroxyisoflavanone, which is then dehydrated to daidzein through catalysis by 2-hydroxyisoflavanone dehydratase ( Fig. 1) [12]. CHS and IFS have been identified in P. mirifica several years ago [13,14], however, their enzymatic functions have not been elucidated. Udomsuk et al. [15] suggested that miroestrol biosynthesis may share a common pathway with isoflavonoid biosynthesis due to their similar backbone structure. So far, no biosynthetic genes or enzymes involved in miroestrol biosynthesis have been reported, thus also the transcription factors responsible for regulating the expression of these biosynthetic genes remain to be identified. V-myb myeloblastosis viral oncogene homolog transcription factors (MYB TFs), which are the largest plant transcription factor family, have been reported to possess key functions in regulating the synthesis of phenylpropanoid-derived compounds in plants [16]. These proteins have attracted substantial interest regarding phytoestrogen biosynthesis in plants.
Transcriptomes produced from high-throughput sequencing of various plants are a potential source for identifying genes involved in the biosynthesis of different secondary metabolites. The transcriptome of Pueraria lobata, a species closely related to P. mirifica, has been published previously [17,18]. Although these two plants produce similar types of isoflavones, miroestrol occurs only in P. mirifica. In the afore-mentioned studies, P. lobata genes that encode core isoflavone biosynthetic enzymes were identified, and their expression levels in various tissues were examined.
In the current study, we high-throughput sequenced P. mirifica to produce transcriptome libraries of young leaves, mature leaves, cortex-excised tubers, and tuberous cortices, and we de novo assembled the transcriptomes to characterize the biosynthetic pathway of miroestrol. Additionally, MYB TFs involved in isoflavonoid biosynthesis were also identified. This integrative approach of using transcriptomics and metabolomics provides new insights for the prediction and identification of putative biosynthetic genes and transcription factors that are potentially involved in P. mirifica isoflavonoid and miroestrol biosynthetic pathways.

De novo transcriptome assembly of P. mirifica
To obtain nucleotide sequences of expressed genes in various tissues of P. mirifica, we constructed a cDNA library from pooled tissues including, young leaves, mature leaves, cortex-excised tubers, and tuberous cortices ( Fig. 2a-2d). The library was processed using an Illumina Hiseq 2000 platform, yielding approximately 8.2 giga base pairs and a total of 7,386,137,640 clean nucleotides (nt). The de novo assembly resulted in 166,923 contigs (62,567,517 nt) and 104,283 unigenes (81,810,584 nt), the mean length/N50 for contigs and unigenes was 375/734 nt and 785/1558 nt, respectively. To assess the completeness of transcriptome data, we performed a BUSCO analysis compared to the 2,121 single-copy orthologs of the eudicot lineage. The de novo transcriptome assembly was complete to 87.4%, 6.9% of contigs were fragmented, and 5.7% of the transcriptome was missing (Supplementary Table S1). In addition, the assembly produced here was compared to that of other leguminous plants (summarized in Table 1).  Figure S3 shows the substantial number of transcripts related to cellular components and metabolic processes. The remaining 67,225 unigenes produced no BLAST hits. These unannotated unigenes may be uncharacterized genes or assembled sequences that were too short to produce hits.
The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a comprehensive database for identifying biological pathways and for functional annotation of gene products. Pathwaybased annotation helps produce an overview of the different metabolic processes that are active within an organism, and it helps improve our understanding of biological functions of unigenes. All unigenes were analyzed using the KEGG pathway with an e-value cutoff of < 10 − 5 . We obtained 33,317 unigenes, 3,997 of which were related to biosynthesis of secondary metabolites and 8,103 to general metabolic pathways.  Figure   S4. These functional annotations were used for identifying genes involved in isoflavonoid biosynthesis in P. mirifica.

Proposed miroestrol biosynthetic pathway, differential accumulation of transcripts associated with isoflavonoids, and miroestrol biosynthesis
Miroestrol biosynthesis potentially shares a pathway with isoflavonoid biosynthesis [15].
We propose that daidzein, a common isoflavone aglycone, is hydroxylated by at least two cytochrome P450 enzymes at the 2′ and 3′ carbon of the B-ring to produce 2′,5′dihydroxydaidzein; these enzyme may be members of the CYP81E subfamily which is known to use isoflavones as substrates [22][23][24][25]. Then, 2′,5′-dihydroxydaidzein would be reduced by isoflavone reductase and subsequently would be prenylated by a prenyltransferase using dimethylallyl diphosphate as a co-substrate to produce deoxymiroestrol and miroestrol ( Fig. 1).
Based on a functional annotation of each unigene found in the P. mirifica de novo transcriptome assembly and phylogenetic analyses, a total of 14 putative genes involved in isoflavone biosynthesis were predicted as follows: two PAL genes (encoding phenylalanine ammonia lyase), one C4H gene (encoding cinnamate 4-hydroxylase), three 4CL genes (encoding 4-coumarate-CoA ligase), two CHS genes (encoding chalcone synthase), one CHR gene (encoding chalcone reductase), two CHI genes (encoding chalcone isomerase), one IFS gene (encoding IFS), and two HID genes (encoding 2-hydroxyisoflavanone dehydratase). In addition, a total of seven putative genes were identified in our proposed miroestrol biosynthetic pathway: three CYP81E genes, two IFR genes (encoding isoflavone reductase), and two PT genes (encoding prenyltransferase). To investigate gene expression levels of these unigenes across four tissues of P. mirifica, reads per kilobase million of transcripts were calculated from the RNA sequencing data generated using a NextSeq500 platform. An analysis of differentially expressed genes (DEG) on these unigenes (probability threshold q = 0.9) was visualized as a heat map based on our proposed miroestrol biosynthetic pathway (Fig. 1). Additionally, a total of 16 putative genes encoding UDP-glycosyltransferase that are potentially involved in this pathway were phylogenetically identified from the transcriptome data. Their DEGs across the four tissue types are shown as a heatmap in Supplementary Fig. S5.
To validate the DEG analysis obtained from RNA sequencing, candidate unigenes were selected and analyzed based on their expression levels in the four tissue types using quantitative real-time reverse transcription polymerase chain reaction (qRT-PCR). As expected, expression levels of the selected unigenes were positively correlated with DEGs identified by RNA sequencing (Fig. 3). Indeed, most of the isoflavone biosynthetic genes were highly expressed in tuberous cortices, compared to young leaves, mature leaves, and cortex-excised tubers; however, PmCYP81Es, PmIFRs, and PmPT higher expressed in mature leaves than in the other tissues.

Phytoestrogens and annotated constituents in P. mirifica
High-performance liquid chromatography (HPLC) coupled to tandem mass spectroscopy  Figure S6), suggesting that PmIFS is involved in isoflavone biosynthesis.

Identification of MYB transcription factors involved in isoflavone biosynthesis
We identified 85 putative genes encoding MYB transcription factors in the P. mirifica transcriptome. All putative P. mirifica MYB transcription factors (PmMYB) were aligned to known MYB transcription factors of other plants such as Arabidopsis thaliana, Glycine max, and other species, and a phylogenetic tree was produced from this alignment (Fig. 4).
Based on these analyses, we found six candidate PmMYBs that are potentially involved in PmMYB75, PmMYB76, and PmMYB77 clustered with GmMYB176, which is also involved in controlling CHS expression and isoflavonoid synthesis in soybean [33,34]. Furthermore, also PmMYB24 and PmMYB77 were highly expressed in mature leaves, whereas PmMYB18, PmMYB23, PmMYB75, and PmMYB76 were highly expressed in tuberous cortices (Fig. 5).

Discussion
De novo transcriptome assembly is a useful method for gathering comprehensive information on genetic resources without the need for whole genome sequencing. In addition, this technique facilitates discovery of novel genes, molecular markers, and tissue-specific expression patterns. In the absence of comprehensive genomic data of P.  Fig.S6). The demonstrated in planta function of PmIFS suggests that our transcriptome libraries were suitable for identifying other genes involved in miroestrol biosynthesis. Although in a previous study, a biosynthetic pathway of miroestrol was tentatively suggested [15], the genes required for miroestrol biosynthesis remained unknown so far. Here, the biosynthetic pathway of miroestrol was reconsidered and the most plausible mechanism was proposed ( Fig. 1). Seven putative genes encoding three biosynthetic enzymes involved in our proposed miroestrol biosynthetic pathway were also identified in our transcriptome data. Accumulation of deoxymiroestrol and miroestrol was reported to be higher in tuberous cortices than in cortex-excised tubers of P. mirifica [10].
Similarly, we detected deoxymiroestrol specifically in the tuberous cortices of P. mirifica; miroestrol, however, was not detected. Miroestrol can be non-enzymatically converted from deoxymiroestrol at high temperatures or in acidic or alkaline solutions during storage and extraction [38,39]. In our experiment, P. mirifica tissues were harvested and rapidly frozen in liquid nitrogen, which would prevent conversion of deoxymiroestrol to miroestrol.
In addition, the proposed intermediates 2′-hydroxydaidzein and 3′-hydroxydaidzein were not detected in any tissue. We hypothesized that an organized multienzyme cluster, known as metabolon, may be involved in miroestrol biosynthesis and be responsible for These lower expression levels of putative miroestrol biosynthetic genes could be partially responsible for the low accumulation levels of deoxymiroestrol and miroestrol. In addition, we found considerably high amounts of puerarin in tuberous cortices. Since puerarin, deoxymiroestrol, and miroestrol share an early biosynthetic pathway (Fig. 1), daidzein may be primarily diverted into the puerarin biosynthetic pathway, producing lower daidzein levels for miroestrol biosynthesis. Moreover, we phylogenetically identified a total of five putative genes annotated as C-glycosyltransferase, which plays a key role in A different strategy to enhance the production of isoflavonoids and their valuable derivatives is to manipulate transcription factor regulation. In this study, we identified MYB TF, which generally acts either as a transcriptional activator or repressor for various groups of plant secondary metabolites, including isoflavonoids. Amino acid alignments and expression patterns are informative for determining the functions of individual MYB TFs.
Six candidate MYBs (indicated in red in the phylogenetic tree in Fig. 4) are proposed due to their phylogenetic clustering with characterized MYB TFs that are involved in the regulation of isoflavonoid biosynthetic genes. Interestingly, four of these transcription factor genes were highly expressed in the tuberous cortex, which showed the highest accumulation of isoflavones (Fig. 5). Moreover, two of these transcription factors, PmMYB18 and PmMYB75, shared one branch with those MYBs reported as activators for transcription of chalcone reductase and IFS and increased isoflavonoid production[30, 34].
These two MYBs are thus important candidates for further functional characterization, and PmMYB TFs may be associated with the regulation of isoflavonoid biosynthesis and miroestrol production.

Conclusion
We identified several candidate genes encoding key enzymes or transcription factors involved in the biosynthesis of isoflavonoid and miroestrol. Integrative analyses of transcriptomics and metabolomics indicated the complexity of gene expression and metabolite profiles across tissues, suggesting that cortical tuber tissue is a major site of isoflavonoid biosynthesis. Further molecular and biochemical studies on candidate genes involved in miroestrol biosynthesis are required to identify the functions of these candidate genes.

Plant material, chemicals, and reagents
Leaves and tubers of an approximately two-year-old P. mirifica cultivar SARDI190 (Fig. 2a

Analysis of gene expression levels across the four tissues of mirifica generated using Illumina NextSeq500
Total RNA was extracted from each tissue type using three biological replicates. Based on spectrophotometry, high-quality RNA aliquots of the replicates were pooled per tissue type at equal quantities. The four RNA pools were then subjected to pair-end sequencing using were deposited in the NCBI sequence read archive (accession numbers SRR10177497, SRR10177499, SRR10177500, and SRR10177498 for sequences produced from young leaves, mature leaves, cortex-excised tubers, and tuberous cortices, respectively.)

Functional annotation of assembled sequences
The assembled sequences were first compared against the NCBI non-redundant protein database (https://www.ncbi.nlm.nih.gov/refseq/), Swiss-Prot

Phylogenetic analyses
All putative genes were translated to amino acid sequences using the Expasy translate tool (http://web.expasy.org/translate/). All full-length sequences of putative genes involved in isoflavonoid and chromene synthesis were aligned using BioEdit software (http://www.mbio.ncsu.edu/BioEdit/bioedit.html). We applied a maximum likelihood method with 100 bootstrap replicates using MEGA7 software [54] to produce phylogenetic trees.

Assessment of candidate gene expression using qRT-PCR
Transcription levels of candidate genes were assessed using qRT-PCR. Total RNA isolated from the four tissue types was used (Fig. 2a-2d)
Additional file 3.docx - Table S3. DNA primer list for qRT-PCR validation. Figure 1 Proposed miroestrol biosynthetic pathway in P. mirifica. The heatmap following each gene name indicates the relative differential expression of genes in young leaves, mature leaves, tubers without cortices, and cortices of P. mirifica, respectively. Enzyme abbreviation: PAL; Phenylalanine ammonia-lyase, C4H;