Transcriptome analysis reveals underlying immune response mechanism of fungal disease in Gastrodia elata Bl. f. glauca S. Chow

Gastrodia elata Bl. f. glauca S. Chow is a rare medicinal plant. G. elata f. glauca is unavoidably infected by pathogens in their growth process, for they are usually grown in a semi-wild state. If it happens, that will seriously affect the yield and quality of their tubers. In previous work, we have successfully isolated and identied the pathogenic fungus from infected tubers in G. elata f. glauca. In this study, healthy and fungal diseased mature tubers of G. elata f. glauca from Changbai Mountain area were used as experimental materials. between Illumina Noveseq Paired-end sequencing data was generated for each sample with 2×150 bps read lengths.

So far, the most studied G. elata Bl. variety is G. elata Bl. f. glauca in Zhaotong City, Yunnan Province. However, the genomics research of G. elata Bl. f. glauca genetics in Changbai Mountain area is almost blank. In fact, G. elata Bl. f. glauca is not only a traditional Chinese medicinal material in Changbai Mountain, but also one of the most vital special economic crops in Jilin Province. Therefore, whether it is thought from the breeding of highquality traditional Chinese medicine germplasm resources, or considered on the improvement of local economic development level, the research on disease response mechanism of G. elata Bl. f. glauca in Changbai Mountain area has important production signi cance and application value. In this study, we made a detailed comparison between healthy and fungal diseased G. elata Bl. f. glauca tubers by means of transcriptome sequencing and metabolic pathway analysis. That would provide a new insight for the breeding of disease resistant varieties of G. elata Bl. f. glauca.

Results
Sequencing overview A total of 45.88 GB clean data were generated by sequencing platform, and clean data number of each sample was more than 6.23 GB. GC content ranged from 47.16% to 49.09%, and Q30 of each sample was above 92.92% (Additional le: Table S1). The sequencing results showed that sequencing fragments had high randomness and reliability (Additional le: Figure S1A). After transcript de novo assembly, 60324 Unigenes in total were obtained, and the N50 was 2409 kb. Furthermore, 19670 (32.61%) of them were over 1 kb in length (Additional le: Figure S1B). All these indicative data display high assembly integrity. Functional annotation and differential expression analysis DEGs annotation and function classi cation A total of 5140 DEGs were annotated to seven databases. nr (RefSeq non-redundant proteins) had the largest number of annotated DEGs (5066), while KEGG had the least (1745) (Fig.1a). The venn diagram of function annotation in different database as Fig.1b. It was learned that those DEGs between healthy and fungal diseased samples chie y classi ed into "signal transduction mechanisms", "carbohydrate transport and metabolism", "defense mechanisms", "energy production and conversion", "general function prediction only", "posttranslation modi cation, protein turnover, chaperones", "translation, ribosomal structure and biogenesis" (Fig.1c, d). We can believe that G. elata Bl. f. glauca possibly respond to fungal disease by enhancing or weakening these physiological or biological activities. GO  Using KEGG database, 122 pathways were enriched and top 50 was showed as Figure 2c. The enrichment degree was based on the P-value and enrichment factor (Fig. 2b). 9 pathways were signi cantly enriched (p<0.05), and they fell into 3 pathway categories: metabolism, environmental information processing, organismal systems. Speci c pathway names displayed in Table 1.

Differential expression analysis
A total of 7540 DEGs were identi ed. 4326 of these DEGs were up-regulated in diseased group, and 3214 were down-regulated (Fig.3a, b). In addition, 40440 Unigenes did not demonstrate signi cantly differential expression. In other words, DEGs between healthy and diseased samples accounted for 15.71% of all Unigenes.

KEGG pathways analysis
So far, it has been proved that plant disease resistance is relative to plant-pathogen interaction, plant hormone signal transduction, and other pathways about certain secondary metabolite biosynthesis or metabolism[56, [61][62][63][64]. Consistently, we got similar results in this study ( Fig.5-7, Table 1).
Furthermore, numerous DEGs regulating secondary metabolites biosynthesis, like ubiquinone, brassinosteroid, diterpenoid, phenylpropanoid, avone and avonol, revealed active state. In addition, genes related metabolic process of secondary metabolites also revealed signi cant differential expression. In starch and sucrose metabolism pathway, DEGs related to fructose and glucose synthesis is up-regulated, while DEGs linked to starch and glycogen production showed down-regulated expression.
In summary, fungal disease response is a complex process involving multiple biological processes. And meanwhile, there is more than one gene participated in this process. Interestingly, one gene was not appeared in single pathway. That is to say, one gene may perform more than one function simultaneously. Anyway, these signi cantly enriched pathways would well reveal the underlying response mechanism of fungal disease in G. elata Bl. f. glauca.
Potential immune response mechanism of fungal disease in G. elata Bl. f. glauca Actually, in this study, many genes related to stress response and disease resistance demonstrated high expression and signi cant difference. Moreover, They were members of certain transcription factor families, like WRKY, GH3, JAZ, CML, ERF, TGA. Furthermore, these genes were closely connected with derivatives of jasmonic acid, salicylic acid, brassinosteroid, ethylene and auxin. By BLAST (blast.ncbi.nlm.nih.gov/Blast.cgi), it is revealed that amino acid sequences of four JAZ genes in G. elata family were highly similar to certain sequences in Dendrobium catenatum, Phalaenopsis equestris, Apostasia shenzhenica ( Figure 8). Noteworthily, they were all belong to TIFY10 family.
Finally, we found 10 candidate genes responding to fungal disease in G. elata Bl. f. glauca ( Fig.8; Table 2). Seven of them participated in plant hormone signal transduction (ko04075) pathway, and three were part of plant-pathogen interaction (ko04626) pathway.

Discussion
According to research in available, plant disease response mechanisms mainly include PAMP-triggered immunity (PTI), effector-triggered immunity (ETI) and systemic acquired resistance (SAR). ETI is usually accompanied by the occurrence of hypersensitivity reaction (HR), giving rise to programmed cell death (PCD).
Moreover, ETI can also induce SAR. As is known to all, PTI and SAR are non-speci c immunity, while ETI is speci c immunity. From what we study, it can be concluded that the disease response mechanism of G. elata Bl. f. glauca involves all above three kinds of mechanisms in the whole process of infection. Resistance genes (R genes) were classi ed into nine types based on intracellular and extracellular pathogen recognition mechanisms [65]. Here, we discovered potential R genes in G. elata Bl. f. glauca were probably the members of transcription factor families like WRKY, GH3, TIFY/JAZ, CML, ERF, TGA. Coincidently, it has been reported that these above transcription factors did be widely involved in various defense responses[56, [66][67][68][69][70][71][72][73][74][75][76][77][78]. Interestingly, GH3 and CML can also regulates fruit development [79,80]. However, it still needs further study on how these genes perform their functions in respond to fungal disease in G. elata Bl. f. glauca.
In fact, plant hormones also have a vital role in the process of plant-pathogen interaction. Simultaneously, ETI itself can also induce and generate certain special plant hormone signal [80]. In this research, we found a large number of DEGs annotated to signal transduction mechanisms by means of functional annotation.
In addition, some DEGs in ubiquinone and other terpenoid-quinone biosynthesis pathway, like 4CL, were also signi cant differential expression between healthy and diseased group. 4CL belongs to the plant phenylpropane derivative, which is related to the synthesis of avonoids and lignin and is a key enzyme in the biosynthetic pathway. A report indicates that Fm4CL-like 1 is involved in secondary cell wall development and lignin synthesis and it play an important role in osmotic stress by affecting cell wall and stomatal development [95]. This may be a part of fungal disease response mechanism in G. elata Bl. f. glauca.

Conclusions
In conclusion, fungal disease response mechanism in G. elata Bl. f. glauca is quite complicated. Firstly, JA/ET signal transduction play a positive role in regulation. The up-regulation expression of JAZ and ERF1 indirectly induces ubiquitin mediated proteolysis. Secondly, SA signal transduction reveals negative regulation. The down-regulation expression of TGA indirectly triggered disease resistance. Thirdly, brassinosteroid biosynthesis also makes contributions to fungal disease response. CYP90A1 and CYP90D2 display down-regulation and upregulation, respectively. Last but not least, auxin signaling pathway involves in fungal disease response actively. However, JA/ET signaling pathway is undoubtedly the most highlighted. As the candidate genes response to fungal disease in G. elata Bl. f. glauca, their speci c functions still need to be further veri ed. Of course, more insight into the molecular mechanisms of fungal disease response also requires to be revealed.

Plant material and growth conditions
Three healthy tubers (HGe) and three fungal diseased tubers (DGe) were used in this experiment. They were identi ed as mature tubers of G. elata Bl. f. glauca S. Chow by Professor Yugang Gao in Jilin Agricultural University.
These samples were collected from a planting base (126º44'20''E, 42º24 '30''N)  steps are required to build the library: puri cation and fragmentation of mRNA, synthesis and puri cation of double-stranded cDNA, the end repair or dA tail addition, junction ligation and USER (uracil-speci c excision reagent) enzyme digestion, ligated products puri cation and fragments size classi cation, library ampli cation, magnetic bead puri cation or sorting of ampli cation products, library quality control. cDNA library was checked for quality and quantity using Agilent Bioanalyzer (Agilent; www.agilent.com). All RNA sequences of 150 bp between 5'-terminal and 3'-terminal was sequenced through Illumina Noveseq high-ux sequencing platform. Paired-end sequencing data was generated for each sample with 2×150 bps read lengths.

Reads mapping and transcript de novo assembly
The resulting reads called raw data are stored in fastq format. The raw data of each sequencing sample includes two fastq les containing reads determined at both ends of all cDNA fragments. The quality of raw reads was assessed using the fastqc program (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/).
Data ltering on raw data to remove low quality reads and reads containing connector or poly-N, we obtained high quality clean data.
Using Trinity software with default parameters, the sequence assembly of clean data is carried out in combined assembly. In this way, the sequencing depth can be increased indirectly, and transcripts with low expression abundance in G. elata Bl. f. glauca RNA samples can be assembled more completely. Clean data of each sample was aligned with assembled transcript or Unigene library to obtain mapped reads that matched transcript or Unigene library.

Gene expression and annotation
A Unigene supported by a minimum of three mapped high-quality reads was considered as expressed. DEGs were evaluated with the DESeq2 package. Benjamini-Hochberg method was used to correct the signi cant Pvalue obtained from the original hypothesis test. Finally, FDR (false discovery rate, corrected P-value) was used as one of the key indexes of DEGs identi cation. This is done for the sake of reducing the false positive caused by independent statistical hypothesis test to a large number of gene expression values. FC (fold change) means the ratio of gene expression levels between healthy tubers and diseased tubers. Negative log2FC values indicate higher expression in healthy samples, while positive log2FC values showed higher expression in disease ones. In addition, the gene expression abundance, described by FPKM (reads per kilobase of exon model per million mapped reads) value, is also a factor to be considered for DEGs identi cation. When gene expression abundance is small, that is to say, be with low signal values, it may not be detected in subsequent validation.
In organisms, different gene products coordinate with each other to perform biological functions, and enrichment analysis helps to further interpret the gene functions. All Unigenes were annotated into databases such as Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), COG (clusters of orthologous groups), KOG (clusters of euKaryotic Orthologous Groups), eggNOG (Evolutionary Genealogy of Genes: Nonsupervised Orthologous Groups). The differential expression and enrichment analysis were conducted using Fisher′s exact test to obtain an adjusted P-value (less than 0.05) with an FDR correction.

Supplementary Information
Additional le 1: Table S1. Sequencing and reads mapping.
Additional le 2:       Transcription factor prediction. The x-axis shows transcription factor family name.   Phylogenetic tree of TIFY10 in Orchidaceae plant.