Comparative transcriptome analysis of roots, stems and leaves of Isodon amethystoides reveals candidate genes involved in Wangzaozins biosynthesis

Background Isodon amethystoides (Ben-th) Cy Wu et Hsuan is an important traditional medicinal plant endowed with pharmacological properties effective in the treatment of various diseases, including pulmonary tuberculosis. The tetracyclic diterpenoids, Wangzaozins (Wangzaozin A, glaucocalyxin A, glaucocalyxin B), are the major bioactive compounds of I. amethystoides. However, the molecular information about the biosynthesis of these compounds still remains unclear. Results An examination of the accumulated levels of Wangzaozins in I. amethystoides revealed considerable variations in the root, stem, and leaf tissues of this plant, indicating possible differences in metabolite biosynthesis and accumulation among various tissues. To better elucidate the tetracyclic diterpenoid biosynthesis pathway, we generated transcriptome sequences from the root, stem, and leaf tissues, and performed de novo sequence assembly, yielding 230,974 transcripts and 114,488 unigenes, with average N50 lengths of 1914 and 1241 bp, respectively. Putative functions could be assigned to 73,693 transcripts (31.9%) based on BLAST searches against annotation databases, including GO, KEGG, Swiss-Prot, NR, and Pfam. Moreover, the candidate genes involving in the diterpenoid biosynthesis, such as CPS, KSL, were also analyzed. The expression profiles of eight transcripts, involving the tetracyclic diterpenoid biosynthesis, were validated in different I. amethystoides tissues by qRT-PCR, unraveling the gene expression profile of the pathway. The differential expressions of ISPD, ISPF and ISPH (MEP pathway), and IaCPS and IaKSL (diterpenoid pathway) candidate genes in leaves and roots, may contribute to the high accumulation of Wangzaozins in I. amethystoides leaves. Conclusion The genomic dataset and analyses reported here lay the foundations for further research on this important medicinal plant. Electronic supplementary material The online version of this article (10.1186/s12870-018-1505-0) contains supplementary material, which is available to authorized users.


Background
The herbaceous genus Isodon (formerly Rabdosia), in the family Lemiaceae, contains a number of traditional medicinal plants that have a long history in traditional popular prescription in China. There are approximately 150 species of Isodon, nearly 70% of which are distributed in tropical and subtropical Asia [1,2]. Many Isodon species have been shown to be effective in treating different illness, including I. japonicus for bleomycin-induced pulmonary fibrosis [3], I. eriocalyx for autoimmune encephalomyelitis amelioration [4], I. phyllostachys for its antiphlogistic and antibiotic actions [5], and I. rubescens for its anticancer, anti-inflammatory, and antibacterial properties [6]. Isodon amethystoides (Ben-th) Cy Wu et Hsuan, referred to as "Wang Zao Zi" and "Xiang Cha Cai" in China, is known for its broad applications as a folk remedy for abscesses, swollen sores, and tumors [7]. Unlike other traditional Chinese medicines needing concerted application, I. amethystoides is commonly used as a single drug for therapeutic purposes. Its ethanol extracts have shown a broad spectrum of antimicrobial efficacy, and in the 1980s were investigated as "Wang Zao Zi Soap" or "Wang Zao Zi Injection" for clinical trials in the Huaibei region of China [8]. The total healing rate reached 94% from 200 clinical cases with pneumonia, lung abscess, and pulmonary tuberculosis, thereby demonstrating its potential pharmacological value when used as a simple preparation.
Wangzaozins, including Wangzaozin A, glaucocalyxin A (Wangzaozin B), glaucocalyxin B (Wangzaozin C), are ent-kaurane-type tetracyclic diterpenoids isolated from Wangzaozi [9,10]. Having the same basic carbon skeleton, the three compounds differ in modified groups. Wangzaozin A has a hydroxyl on C-3 where it is a ketone in glaucocalyxin A, and glaucocalyxin B is the acetylated structure of glaucocalyxin A [10]. Pharmacological studies have identified the bioactivities of Wangzaozins. For example, glaucocalyxin A has been widely studied for certain important biological activities, including antitumor [11], apoptosis induction [12,13], antibacterial, anti-oxidative, anticoagulative, antithrombotic, antifibrotic, immune, and antineuroinflammatory activities [5,14,15]. Wangzaozin A, specifically isolated from Wang Zao Zi and not from other Isodon species, has been proved to be effective in the treatment of pneumonia, lung abscess, and pulmonary tuberculosis when applied independently.
DNA sequence information is a prerequisite for elucidating the molecular mechanism underlying diterpenoid biosynthesis. For non-sequenced medicinal plants, RNA-seq provides an efficient tool for us to generate large amount of omics data. For Isodon species, the transcriptomic data were reported in I. rubescens [6,21], but it has not been recorded in I. amethystoides and other species yet. In the present study, we report transcriptome databases from the root, stem, and leaf tissues of one-year-old I. amethystoides plants. Through comprehensive analysis of the transcriptome data, we identified thousands of genes, including a set of putative genes involved in the diterpenoid biosynthesis pathway and a number of transcription factors (TFs). Moreover, we analyzed the transcripts involving tetracyclic diterpenoid Wangzaozins biosynthesis in different tissues. This work lays the foundations for functionally elucidating the genes involved in Wangzaozins biosynthesis, and molecular regulation of the biosynthesis of bioactive compounds in I. amethystoides.

Tetracyclic diterpenoid accumulation
To examine the distribution pattern of tetracyclic diterpenoids, we determined the contents of three major bioactive tetracyclic diterpenoid compounds, Wangzaozin A, glaucocalyxin A, and glaucocalyxin B, in three tissues (root, leaf, and stem) of I. amethystoides. The three compounds have similar chemical structures, but differ in the structure of modified groups. The highest content of Wangzaozin A was observed in leaves, as high as 0.22% of DWT, followed by 0.031% in stems, which were 40-and 6-fold higher than that in roots, respectively. The highest accumulation of glaucocalyxin A was also found in leaves (0.16% of DWT), followed by stems, which were 48-and 7-fold higher than that in roots, respectively (Fig. 1). The content of glaucocalyxin B in leaves was 6-fold higher than that in roots, whereas no glaucocalyxin B was detected in stems. Due to the distinct distributions of the major tetracyclic diterpenoids, all three tissues (root, stem, and leaf) were selected as experimental materials for studying the tetracyclic diterpenoid biosynthesis pathway via comparative transcriptome analysis.

Illumina sequencing and read assembly
To generate a transcriptome database for I. amethystoides, nine RNAseq libraries were constructed from the root, stem, and leaf tissues with three biological replicates. A total of 65.28 Gb clean data was recovered after trimming to remove adaptors, primer sequences, poly-A tails, and short and low-quality sequences. A range of 71 to 74 million for each tissue was obtained (Additional file 1: Table  S2). Using the high-quality reads, 132,720 transcripts were assembled, with an average length of 1209 bp and N50 length of 1914 bp. Finally, 114,488 unigenes were identified. The total accumulated size of the assembled unigenes was approximately 98 Mb, with an N50 length of 1241 bp and an average size of 858 bp. There were 42,851 (43.419%) unigenes in the size range 300-500 bp, 30,335 (30.737%) at 501-1000 bp, 14,572 (14.765%) at 1000-2000 bp, and 10,933 (11.078%) > 2000 bp (Additional file 2: Figure S1). The correlation indices between repeated samples were ≈0.9 (Additional file 3: Table S3, Additional file 4: Figure S2), suggesting that the RNA-seq results are credible.

Functional annotation
The I. amethystoides transcripts were annotated by BLAST searches against several public databases. A statistical summary of these annotations is listed in Additional file 5: could be assigned at least one putative function from one of these databases. Functional annotations of the assembled transcripts with plant NRDB demonstrated that the majority of these are homologous to those in species including Sesamum indicum, Erythranthe guttata, and Vitis vinifera (Additional file 6: Figure S3). The remaining 35.63% of the transcripts had no significant protein matches in these databases. These non-matched transcripts may be novel or diverse proteins and long non-coding RNAs in I. amethystoides, or could be derived from less conserved 3′ or 5′ untranslated regions of the genes.
The assembled genes were assigned to the biological pathways described in KEGG to better understand the functions in specific metabolic pathways in I. amethystoides (Fig. 2). To systematically analyze inner cell metabolic pathways and complex biological behaviors, the annotated coding region sequences were mapped to the reference canonical pathways in the KEGG pathway database. As a consequence, 17,527 transcripts (15.31%) could be assigned to five main categories: "metabolism," "genetic information processing," "environmental information processing," "cellular process," and "organismal systems," with 50 sub-categories and 124 pathways (Fig. 2, Additional file 11: Table S7). These enzymes have assigned functions in 23 secondary metabolic pathways in KEGG (Table 1). Among these, 230 transcripts encode key enzymes involved in the pathways for tetracyclic diterpenoid biosynthesis, including the synthesis of the tetracyclic diterpenoid backbone (134 transcripts), monoterpenoids (2 transcripts), diterpenoids (29 transcripts), sesquiterpenoid (1 transcript), and other terpenoids (66 transcripts). Nine transcripts with high homology with GA oxidase sequences. Two hundred and eighty-eight transcripts were related to genes involved in the flavonoid biosynthesis pathway, including the phenylpropanoid (205 transcripts), flavonoid (64 transcripts), and flavone and flavonol (19 transcripts) biosynthesis pathways. Ninety transcripts were associated with alkaloid biosynthesis, including indole alkaloid biosynthesis (4 transcripts), isoquinoline alkaloid biosynthesis (38 transcripts), and tropane, piperidine, and pyridine alkaloid biosynthesis (48 transcripts). Identification and future characterization of transcripts involved in these different metabolic pathways will help us to better understand their functions in the biosynthesis of active compounds in Isodon plants.

Differential gene expression analysis
To investigate the differential gene expression among different tissues, the high-quality reads from individual samples were mapped onto the I. amethystoides transcriptome using CLC Genomics Workbench (v8.0.2). DEGs were identified (Additional file 12: Table S8) using the DESeq package with FDR ≤ 0.01 and at least a 2-fold expression change among different tissues [26]. The number of DEGs was highest between the root and leaf, and lowest between the stem and leaf. Furthermore, the DEGs in one tissue with respect to those in the other two tissues were also investigated. Roots contained the largest number of most highly expressed transcripts, with 8906 transcripts being more abundant than in the  (Fig. 3a). In contrast, in stems, 5072 genes were expressed at a higher level than in roots and leaves, while 7792 genes were expressed at a lower level. In addition, a total of 6354 transcripts exhibited tissue-specific expression, with 1127, 760, and 4467 transcripts being specifically expressed in root, stem, and leaf, respectively (Fig. 3b). Among the identified DEGs, 1373 encoded TFs, some of which were specifically expressed in the root, stem, and leaf (Additional file 12: Table S8). Future characterization of specific TFs is required for a better understanding of the gene expression profiles and regulation in the various biochemical pathways in I. amethystoides.
To explore the possible genes regulating the distinct distribution of Wanzaozins, we further performed KEGG enrichment analyses with the DEGs among root, leaf, and stem. These DEGs significantly enriched in a number of pathways. Between root and leaf, photosynthesis, plant hormone signal transduction, zeatin biosynthesis, carotenoid biosynthesis, photosynthesis -antenna proteins, diterpenoid biosynthesis, porphyrin and chlorophyll metabolism, and phenylpropanoid biosynthesis, are the first eight pathways showing the greatest differential expression (Fig. 4a). Between leaf and stem, the eight pathways showing significant enrichment were photosynthesis, plant hormone signal transduction, phenylpropanoid biosynthesis, starch and sucrose metabolism, carotenoid biosynthesis, cyanoamino acid metabolism, ABC transporters, and phenylalanine metabolism (Fig. 4b). Between root and stem, photosynthesis, photosynthesis -antenna proteins, phenylalanine metabolism, phenylpropanoid biosynthesis, steroid biosynthesis, carotenoid biosynthesis, diterpenoid biosynthesis were the eight categories showing significant enrichment (Fig. 4c). The pathways of primary metabolism, including photosynthesis, amino acid metabolism, and starch and sucrose metabolism, which are essential to plant growth, were common in all three tissues. With the exception these common pathways, the enriched secondary metabolic pathways, including diterpenoid and flavonoid biosynthesis, were also detected between different tissues, particularly between roots and above-ground parts (Additional file 13: Table S9), indicating the possible distinct distribution of secondary metabolites among various tissues.

Putative genes involved in the tetracyclic diterpenoid biosynthesis
Totally 29 transcripts classified into 12 enzyme categories were identified by using transcriptome data to examine the KEGG database (Additional file 14: Table S10). NCBI blast showed that these candidate transcripts displayed 67-99% homology with their respective best matches.
The putative pathway of tetracyclic diterpenoids Wangzaozins biosynthesis was proposed (Fig. 5). In the MEP pathway, 14 transcripts were identified, including those of four 1-deoxy-D-xylulose-5-phosphatesynthases (DXS) and two 1-deoxy-D-xylulose-5-phosphate reductoisomerases (DXR). DXS and DXR promote the generation of 2-C-methyl-derythritol 4-phosphate (MEP) from the substrate D-glyceraldehyde 3-phosphate and pyruvate. Of the four DXS genes, IaDXS1 expressed similarly in all three tissues, whereas IaDXS4 showed the highest expression in stem. Both IaDXR1 and IaDXR2 displayed a constitutive expression in three tissues, and IaDXR1 had a remarkably higher expression than IaDXR2. These suggest that IaDXS1, IaDXS4, and IaDXR1 are important for the generation of MEP and deserve further functional characterization. MEP further reacts under the catalysis of following enzymes. In our transcriptomic dataset, only one transcript for 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase (ISPD), 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate reductase (ISPF), and 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate reductase (ISPH) each was identified, and their expressions were 2 to 3 -fold higher in leaf than root, inferring that these three genes may be single copy genes. There are two transcripts for both IaISPE and IaISPG, which displayed constitutive expressions or predominant expression in one of the three tissues. These genes may lead to high efficiency of tetracyclic diterpenoid substrate biosynthesis in leaf of I. amethystoides.
The tetracyclic diterpenoid biosynthesis involves geranyl diphosphate synthase (GPPS), geranylgeranyl diphosphate synthase (GGPPS), ent-copalyl diphosphate synthase (CPS) and ent-kaurene synthase (KSL). In total, 15 transcripts were found to be involved in this pathway. Two GPPS and four GGPPS transcripts were identified in our dataset. IaGPPS1 was expressed predominantly in root and rarely detectable in leaf, while IaGPPS2 showed the highest expression in stem. IaGGPS1 was expressed at a considerably higher level in above ground parts than roots, whereas IaGGPS2 showed a similar expression among three tissues.
With regard to CPS, an important enzyme in the diterpenoid pathway, 5 transcripts were found in I. amethystoides. IaCPS4 showed constitutive expression among three tissues, while IaCPS1 had a higher expression in stem than other tissues. In the dataset we found one CPS transcript, IaCPS3 (DN34471_c0_g1), containing  the complete open reading frame, was expressed predominantly in leaves. Phylogenetic analysis indicated that IaCPS1 fell into a same clade with IeCPS2, and it also has a high similarity with IrCPS4 (Fig. 6). IeCPS2 and IrCPS4 have been determined to be involved in pharmacologically active Isodon diterpenoids biosynthesis [18,19]. Therefore, the CPS candidate genes, especially IaCPS1 (DN64742_c2_g3), may have important roles in Wangzaozins biosynthesis.
For ent-kaurene synthase, we detected 4 possible transcripts. Except for IaKSL2, which showed a similar expression level in all three tissues, the other three were expressed at a considerably higher level in leaves and stems. Particularly, IaKSL4 was highly expressed in leaf and stem. These KLS candidate transcripts shared high homologies with known counterparts in other species (Fig. 6). IaKSL2 and IaKSL4 were closely related to IrKSL5 and IrKSL4, respectively. IrKSL2 and IrKSL5 have been shown to react with ent-CPP [19]. The results indicate that KSL genes (IaKSL2 and IaKSL4) having a high or predominant expression, may be a cause of accumulation of Wangzaozins in I. amethystoides leaves.
The transcriptomic data for different tissues would be helpful for obtaining the CDS of putative genes involved in Wangzaozins biosynthesis pathways and unraveling their family members and functions in I. amethystoides.

Expression patterns of gibberellin (GA) biosynthesis candidate genes among different tissues
Wangzaozins, belonging to ent-kaurane-type tetracyclic diterpenoids [9,10], share the same requisite skeletal structure with gibberellin, as well as oridonin. Oridonin has been reported to have the same precursor of ent-kaurene with gibberellin [19,27]. Therefore, ent-kaurene is thought to be the common substrate for biosynthesis of gibberellin and Wangzaozins. Because plant hormone signal transduction showed significant enrichment between root and leaf, and stem and leaf, we analyzed the expressions of GA biosynthesis genes among three tissues. Nine transcripts that are potentially involved in gibberellin biosynthesis were detected: three encoding gibberellin 20-oxidases (GA20ox), two gibberellin 3-oxidases (GA3ox) two ent-kaurene oxidases (KO) and two ent-kaurenoic acid oxidases (KAO) (Fig. 7a). Except for IaGA3ox1 being expressed consistently, all other genes showed differential expression patterns among three tissues. Phylogenetic analysis showed that these GA biosynthesis related candidate genes had a close genetic relationship with those reported in other Isodon species or dicots (Fig. 7b). The differential expressions of these candidate transcripts may lead to the difference in GA production among different tissues.

qRT-PCR validation of candidate genes involved in Wangzaozins biosynthesis
To validate the transcriptome analysis data and also to evaluate the differential expression profile among different tissues, we selected eight transcripts cognate to Wangzaozins biosynthesis for qRT-PCR analysis. The expressions of these transcripts based on FPKM value are shown in Fig. 8. After validation, it was found that all eight transcripts showed a similar expression pattern based on FPKM value, though there are some differences in expression folds. Both IaDXS1 and IaDXR1 showed a similar expression among three tissues. The single copy transcripts IaISPD, IaISPF and IaISPH had a predominant expression in leaf or stem. Interestingly, IaISPH was not detected in root by qRT-PCR. IaGGPPS2 and IaCPS1 showed an approximately 2-fold higher expression in stems than root. IaKSL4 was expressed predominantly in leaf, with a 159-fold increased expression compared to that in the root respectively. The qRT-PCR results of all eight genes were in a good agreement with the transcriptomic data. The results further confirm the high reliability of transcriptomic data from three replicates, which will be helpful for understanding the biosynthesis pathway of Wangzaozins in I. amethystoides.
On the basis of the large-scale transcriptome data, identification of candidate genes of transcription factors (TFs) or those encoding key enzymes is essential for elucidating the biosynthesis of tetracyclic diterpenoids in I. amethystoides. TFs, which generally occur as gene families, play key roles in regulating the expression of genes by specifically binding to the cis-regulatory elements in the promoter regions of target genes [35]. Approximately 2% (1373) of the assembled transcripts were TFs, which could be classified into 55 TF families, including the C2H2, bZIP, and bHLH families. The roles of TFs in regulating diterpenoids have been widely studied in crops [36,37]. For the Lamiaceae family, bZIP and bHLH genes have been found to be involved in the regulation of tanshinone biosynthesis [38,39]. The evidence suggest that functional characterization of TFs has the potential to increase tetracyclic diterpenoid production. However, there is no reports focusing on the role of TFs in regulating diterpenoid biosynthesis in Isodon species. Thus, the candidate TFs provide a large gene pool to mine important regulatory genes for Wangzaozins production.
For the biosynthesis of Wangzoazins, 29 transcripts, including 14 in the MEP pathway and 15 in the tetracyclic diterpenoid biosynthesis were identified in this study. ISPD, ISPF, and ISPH have been found in our RNA-seq dataset as a single copy, consistent with the conclusion in Stevia rebaudiana [16]. All the three transcripts had a 2 to 3-fold higher expression in leaf than root. Four DXS and two DXR genes had constitutive expressions or similar expression among three tissues, which implies that these genes may play roles both in primary and diterpenoid metabolisms. The GGPPS family has been reported to be encoded by two to 12 members in plants [40]. Four IaGGPPS candidate genes were found in the present study, of which IaGGPPS1 (DN49479_c0_g1) showed higher expressions in above ground parts compared to the root.
Five CPS transcripts were screened out from the RNA-seq dataset, with similar expression among the three examined tissues, or preferential expression in leaf, stem, or root. From these CPS transcripts, we identified one (IaCPS3, DN34471_c0_g1) containing the complete open reading frame. Phylogenetic analysis revealed that it shares a high amino acid identity with I. rubescens IrCPS3. IrCPS3 was expressed five-fold higher in stem than root and leaf [19]. Other CPS transcripts share a high homology with known CPS genes from other plant species. Particularly, IaCPS1 was placed in the same clade with I. eriocalyx IeCPS2. IeCPS2 has been shown to specifically involved in the biosynthesis of pharmacologically active Isodon diterpenoids rather than gibberellin [18], implying a potential role of IaCPS1 in the biosynthesis of Wangzaozins.
The ent-kaurene synthase (EC 4.2.3.19) is an enzyme that catalyzes the formation of ent-kaurene from ent-copalyl diphosphate by releasing diphosphate. From the RNA-seq dataset, we identified four kaurene synthase-like genes in I. amethystoides. IaKSL2 and IaKSL4 are closely related to IrKSL5 and IrKSL4 respectively. In I. rubescens, IrKSL5 and IrKSL4 are involved in the generation of ent-kaurene [19]. It suggests that IaKSL2 and IaKSL4 may be the potential genes to react with ent-CPP in I. amethystoides.
The functional diversification of diterpene syntheses has been discovered in Salvia miltiorrhiza [41] and I. rubescens [19]. Therefore, the candidate genes identified in this study will make an important contribution to molecular studies on the biosynthesis of tetracyclic diterpenoid biosynthesis in I. amethystoides.
Gibberellins are diterpenoids that are derived biosynthetically from ent-kaurene, as some pharmacological diterpenoids [18,19,27]. In this study, the expressions of 9 transcripts associated with gibberellin biosynthesis were analyzed using the transcriptomic data. Except for IaGA3ox1 being expressed similarly among the three tissues, all other genes showed differential expression patterns. This infers the possible difference in GA production among diverse tissues of I. amethystoides.

Conclusion
We generated a comprehensive transcriptome assembly harboring genetic information for mass sequence data of I. amethystoides. These sequence data allowed us to identify and characterize the molecular functions of TFs and candidate transcripts associated with tetracyclic diterpenoid metabolic pathways, which provide insight into the biosynthesis of bioactive compounds in I. amethystoides. Particularly, twenty nine transcripts encoding key enzymes in tetracyclic diterpenoid biosynthesis were identified. The higher expressions of IaISPD, IaISPE and IaISPF (MEP pathway), and IaCPS (such as IaCPS1) and IaKSL (such as IaKSL2 and IaKSL4) candidate genes in leaves, may contribute to the high accumulation of Wangzaozins in I. amethystoides leaves. Intensive biochemical, enzymatic, physiological, and molecular studies on these candidate transcripts will be necessary to better understand the underlying regulatory mechanisms. Moreover, our study records the first genomics resource for I. amethystoides and lays the foundations for further improvements in this medicinal crop through genetic, genomic, and engineering methods.

Extraction and estimation of Wangzaozins
Each tissue was pooled from three plants, air-dried, and ground into a fine powder. One gram of powder was weighed into 20 mL of ethanol for ultrasonication extraction at room temperature for 1 h. The extract was collected by filtering and then dried via rotary evaporation. The yielded pellet was dissolved in methanol to an appropriate concentration.
Standard references of Wangzaozins (Wangzaozin A, glaucocalyxin A, and glaucocalyxin B) were purchased from Chengdu Purechem-standard Co., Ltd. (Chengdu, China). Each standard reference was dissolved in methanol to yield a 0.1 mg·mL − 1 solution.
An Agilent Technologies 1260 Infinity Series HPLC system equipped with a Thermo-C18 column (250 mm × 4.6 mm, 5.0 μm) was used for liquid chromatographic separations. The mobile phase consisted of 0.1% acetic acid in water (solvent A) and methanol (solvent B). The gradient elution was as follows: 0-6 min, linear from 40 to 45% B; 6-35 min, held at 45% B; 35-45 min, linear from 45 to 95% B. The flow rate was 1.0 mL/min. The injection volume was 10 μL. The column temperature was set at 30°C and the sampler was set at 4°C.
The contents of Wangzaozin A, glaucocalyxin A, and glaucocalyxin B in each tissue were quantified by calculating the peak area compared with those of the respective standards by linear regression. The content was presented as an average percentage of dry weight (DWT) from three replicates.

RNA extraction and transcriptome sequencing
All plant samples were separately cut into small pieces, and 2.0 g (fresh weight) of each tissue pooled from three plants was used for RNA isolation. The experiments contained three biological replications, and thus nine plants were used for each tissue. Total RNA extraction and RNA integrity evaluation were performed according to a previously published protocol [42]. RNA sequencing libraries with an average insert length of 250 bp were sent Genepioneer Technologies Corporation (Nanjing, China) for sequencing on Illumina HiSeq4000 platform (Illumina Inc., USA).

Data filtering and de novo assembly
The raw data of nine RNA sequencing reads were trimmed to remove the adaptor sequences, duplicated sequences, ambiguous reads ('N'), and low-quality reads. The yielded clean reads were then used for sequence assembly using the Trinity program with default parameters [43]. Contigs were generated by combining the clean reads with a certain overlap and then clustered using TGICL software [44] to yield unigenes. Non-redundant unigenes were acquired by removing the redundancies (−l 40, −c 10, and -v 20).

Differentially expressed genes (DEGs)
The fragments per kb per million reads (FPKM) method was employed to quantify the transcript abundances [26]. The thresholds for the significance test were set as FDR value ≤0.01 and the absolute value of log2 ratio ≥ 1. Each set of differentially expressed unigenes was analyzed by hypergeometric Fisher exact test (P < 0.01) and Benjamini correction (FDR < 0.05) to construct the KEGG pathways.

Verification of gene expression using qRT-PCR
Eight DEGs related to Wangzaozins biosynthesis were selected for quantitative real-time PCR analysis. The primer sequences used are listed in Additional file 15: Table  S1. qRT-PCR was performed using the ABI7500 fast Real-Time PCR system (Applied Biosystems, USA) with a SYBR® Premix ExTaq™ Kit (Takara, Dalian, China). The Actin gene was used for normalization of expression level. Three biological replicates and three technical repeats were contained for each gene and sample. The 2 −ΔΔCT method was used to calculate the relative gene expression [48].