Selection and validation of reference genes for measuring gene expression in Toona ciliata under different experimental conditions by quantitative real-time PCR analysis

Background Before studying gene expression of different organisms, it is important to determine the best reference gene. At present, the most accurate method of detecting gene expression is quantitative real-time PCR (RT-qPCR). With this method, reference genes that are stable in different biological systems and under different conditions can be obtained. Toona ciliata Roem (T. ciliata). is a valuable and fast-growing timber specie. In this study, 20 reference genes were identified using RT-qPCR, as a primary prerequisite for future gene expression analysis. Four different methods, geNorm, NormFinder, BestKeeper, and RankAggreg were used to evaluate the expression stability of the 20 candidate reference genes in various tissues under different conditions. Results The experimental results showed that TUB-α was the most stably expressed reference gene across all samples and UBC17 was the most stable in leaves and young stems under Hypsipyla robusta (H. robusta) and methyl jasmonate (MeJA) treatments. In addition, PP2C59 and UBC5B were the best-performing genes in leaves under H. robusta treatment, while HIS1 and ACT7 were the best reference genes in young stems. The two best reference genes were 60S-18 and TUB-α after treatment at 4 °C. The expression of HIS6 and MUB1 was the most stable under PEG6000 treatment. The accuracy of the selected reference genes was verified using the transcription factor MYB3 (TcMYB3) gene. Conclusions This is the first report to verify the best reference genes for normalizing gene expression in T. ciliata under different conditions, which will facilitate future elucidation of gene regulations in this species.


Background
Toona ciliata Roem. belongs to the Meliaceae family, which is widely distributed in China, Australia, and India. Because of its straight trunks and russet wood, T. ciliata has the title of "Chinese Mahogany" [1]. But its population has declines sharply in the past century due to environmental degradation and destruction by humans, and it has been listed as one of the "National Class II Key Protected Endangered Plants" in China. T. ciliata has great economic value, for example, its wood is often used to produce high-end furniture, instruments and crafts [2]. More importantly, it is also a medicinal plant as a result of the rich chemical substances in its roots, stems and leaves [3]. Compounds that have been isolated from T. ciliata include ketones, steroids, and coumarins, many of which have antifungal, antiglycation, or anti-tumor activities [4][5][6][7], and its flower extract has a therapeutic effect on gastric ulcers [8]. However, the yield of compounds isolated from T. ciliata is low. In addition, in previous research, it has been found that T. ciliata is very susceptible to the moth pest Hypsipyla robusta Moore [9] that eats mainly the young stems and causes the hollow branches to fail to grow and die in some cases. This pest is not only a regional issue in China, but also a worldwide problem. In some of the main areas where H. robusta is distributed, such as Australia and Brazil, T. ciliata also faces serious damage from H. robusta [10][11][12]. At present, there are no chemical or physical methods to prevent or control H. robusta effectively, and current pest control methods are time-and labor-consuming, thus not applicable to large-scale forest plantations [13]. It may help to pest control by obtaining insect-resistant plants through molecular breeding. In order to synthesize a desired compound related to the resistance mechanism, it is necessary to first explore the pathway and its related regulatory genes [14,15]. Gene expression analysis is one of the most powerful tools to explore biosynthetic and insect-resistance mechanisms in T. ciliata. So far, the knowledge base ICG (http://icg.big.ac.cn) has collected reference genes from more than 120 plant species including Arabidopsis [16], peanut [17], cucumber [18], and soybean [19], except T. ciliata. Nor are there any literature references about the housekeeping genes in T. ciliata, which can be used for the standardization of gene expression.
RT-qPCR has good repeatability, high sensitivity, accurate quantification, and fast reaction, making it a powerful tool to carry out the entire PCR process and the most commonly used method of determining gene expression levels [20]. However, RT-qPCR can be affected by multiple sources of error, such as the amount of starting materials, the integrity of the RNA, and the efficiency of the enzymatic reactions. It is therefore necessary to introduce a stably expressed housekeeping gene as a reference for correction and standardization, so as to control the unnecessary errors generated within and between samples [21].
The commonly used housekeeping genes are those consistently express under all conditions, such as genes encoding actin (ACT), glyceraldehyde-3 phosphate dehydrogenase (GAPDH), and tubulin (TUB) [22]. However, more and more studies are now questioning the existence of genes that are stably expressed across different tissues, different experimental conditions, and different species. In order to ensure the accuracy of an experiment, it is important to select those suitable reference genes for specific experimental conditions [23]. Software packages, including geNorm [24], NormFinder [25], and BestKeeper [26], are widely used to assess the expression stability of candidate reference genes and determine the best choices. Many researchers have used these algorithms to successfully identify reference genes in different species [27,28]. The use of reference genes in expression analysis has greatly facilitated research in plant development and evolutionary mechanisms in species where a reference genome sequence is available [29].

Results
Primer specificity, amplification efficiency, and expression profiles of candidate reference genes Reverse-transcribed cDNA from each sample was used as a template with primers for standard PCR amplification. Electrophoresis verified all PCR products were specific with single bands in the gel (Fig. S1). The melting profiles of all amplified candidate reference genes using RT-qPCR showed single peaks (Fig. S2). A standard curve for each candidate was obtained by serial dilution, and their linear correlation coefficients were all greater than 0.99 (R 2 > 0.99). The amplification efficiency for the 20 candidate reference genes ranged from 90.41% for PPIA95 to 102.44% for PGK. Further details of primers are given in Table 1.
The expression levels of all candidate reference genes were determined by RT-qPCR under all of the following conditions: different tissues, H. robusta treatment, 4°C treatment, MeJA treatment, and PEG6000 treatment. The expression levels of candidate genes were very different across the samples. The maximum cycle threshold (CT) value was 31.66, and the minimum was 13.18 ( Fig. 1). Among them, PPIA26 showed the highest expression abundance, with the maximum, minimum, and median of the CT values being 31.66, 20.07, and 23.36, respectively. EF1 showed the lowest expression abundance, with the maximum, minimum, and median CT values being 22.16, 13.18, and 17.17, respectively. In addition, candidate genes exhibited significant variability in expression. MUB1 and UBC5B had a relatively narrow range of CT values compared with other genes, indicating that they are more stably expressed. Notably, these results show that none of the genes are expressed stably across all conditions, so it is necessary to screen reference genes for T. ciliata under specific conditions.

Stability of expression of candidate reference genes
The software packages geNorm, NormFinder, and Best-Keeper were used to evaluate the expression stability of the 20 candidate reference genes under different experimental conditions. The R software RankAggreg package was used for overall ranking [34].

GeNorm analysis
In geNorm analysis, M value is calculated for each pair of genes. The stability of gene expression is evaluated based on the M value; the genes with threshold M value below 1.5 are considered as stably expressed, and the gene with the lowest M value is regarded as the most stably expressed reference gene. The results of geNorm analysis of 20 candidate reference genes under different conditions are shown in Fig. 2a-h. The M values of all candidate genes from all the tested samples were below 1.5 (Fig. 3). Under H. robusta treatment, UBC17, PP2C59, and UBC5B were most stably expressed in leaves (Fig. 2a), while HIS1, UBC5B, and ACT7 exhibited few expression fluctuations in young stems (Fig. 2b). Data Analyses from the two tissues under H. robusta treatment showed that UBC5B, HIS1, and ACT7 were with the most stable expression as their M values are the lowest (Fig. 2c). The most stably expressed genes across different tissues were 18S and TUB-α, with M values around 0.2 (Fig. 2d). PPIA95 showed good stability under both 4°C and MeJA treatments; 60S-18 and UBC17 were stably expressed only under 4°C and MeJA treatment, respectively ( Fig. 2e and f). The two genes with the lowest M values under drought stress which was simulated by PEG6000 treatment were PP2C57 and EF1 (Fig. 2g). And EF2 and EF1 had the highest stability with M value of 0.49 (Fig. 2h).
In general, it is more reliable to use multiple reference genes are more reliable than a single reference gene for quantitative gene analysis. Given this, geNorm calculates the pairwise variation (V n/n + 1 ) of the normalization factor after the introduction of a new reference gene, and determines the optimal number of reference genes based on this ratio. The default V n/n + 1 value for the software is 0.15. If the ratio is less than 0.15, the number of internal gene combinations that can meet the requirements for relative quantification is n, otherwise another reference gene needs to be introduced. In our study, the values of pairwise variation V 2/3 under conditions with H. robusta treatment, 4°C treatment, MeJA treatment, PEG6000 treatment, and different tissues, were all less than 0.15, indicating that the optimal number of reference gene combinations is two (Fig. 3). Across all samples, pairwise variation (V 2/3 ) was 0.180, V 3/4 was 0.15, and V 4/5 was 0.126, indicating that the addition of the third and fourth reference genes has different impacts on the results. It always better to use fewer reference gene due to the time and cost economy consideration, hence the best reference gene combination was EF2, EF1, ACT7 and UBC5B for all the samples.

NormFinder analysis
In order to further determine the stability of candidate reference genes, NormFinder was used to re-analyze the data. The results are shown in Fig. 4a-h. Under H. robusta treatment, the top three genes with stable expression in leaves were TUB-α (stability value =0.038), HIS1 (0.105), and PP2C59 (0.161) (Fig. 4a), while the most stable genes were ACT7 (0.042), UBC5B (0.042), and TIP41 (0.109) in young stems (Fig. 4b). The top three reference candidates in two tissues (leaves and stems) were TUB-α (0.170), UBC5B (0.206), and PPIA95 (0.250) ( Fig. 4c). The genes 18S (0.088) and TUB-α (0.112) had lower stability values across different tissues since they showed the most stable expression, which is consistent with the results of geNorm analysis (Fig. 4d). However, ACT7 (0.014), TUB-α (0.082), and PGK (0.099) were the most stable candidate genes under 4°C treatment (Fig.  4e), which is inconsistent with the results of geNorm analysis; this may be due to the fact that the two software packages use different algorithms. Under MeJA treatment, the two most stable reference genes were 18S (0.078) and UBC17 (0.100) (Fig. 4f), while under PEG6000 treatment, the two most stable were 18S (0.055) and HIS6 (0.082) (Fig. 4g). TUB-α (0.281), 18S (0.316), and PGK (0.335) were the three genes with the lowest stability values for all samples (Fig. 4h). It is not consistent with geNorm analysis which showed that PPIA26, PP2C59 and HIS6 were the most unstable genes.

BestKeeper analysis
BestKeeper takes as input the CT data for each geneprimer pair combination and calculates the coefficient of variation (CV) and standard deviation (SD), as shown in

RankAggreg analysis
In this study, three algorithms were used to analyze the expression stability of 20 candidate reference genes. The gene ranking tables generated by them are different because of their different algorithms. RankAggreg is an algorithm designed to aggregate large ranking lists. It performs aggregation of ordered lists based on the rankings via the Cross-Entropy Monte Carlo algorithm or a Genetic Algorithm [34]. To provide a consensus ranking, we used RankAggreg to calculate the overall gene ranking for each experimental condition, as shown in Table 3. The consensus for the top two genes in H. robusta treatment on leaves and under MeJA treatment was consistent with the results of geNorm analysis. HIS1 ranked first for young stem tissue under H. robusta treatment.
The first-placed genes were 60S-18 and HIS6 under 4°C treatment and PEG6000 treatment, respectively. TUB-α was the most stable gene in different tissues and all samples. The expression of PPIA26 was the most unstable under all experimental conditions except PEG6000 treatment.

Validation of reference genes
In order to verify the expression stability of the selected reference genes by the software, expression of the TcMYB3 gene was quantified using either the two most stable genes, alone and in combination, or the two most unstable genes in the consensus ranking. Under H. robusta treatment, the relative expression of TcMYB3 in leaves and young stems reached a peak at 12 h when the most stable genes and their combinations were used for standardized. But the relative expression of TcMYB3 was abnormally increased when standardized with the most unstable genes (Fig. 5a,  b). As shown in Fig. 5c, when using the most stable genes as the reference genes, the expression level of TcMYB3 increased 1-1.5 times in young leaves compared with the expression level in mature leaves and the expression of other tissues (shoots, young stems, roots, and flowers) was down-regulated and the depression multiple was basically the same. But the expression level of TcMYB3 in the flower was the highest using the most unstable gene (PPIA26) as the reference gene. In addition, the expression level and trends were very similar when the most stable two reference genes and their combination were used for relative quantification under other stresses, including 4°C treatment (Fig. 5d), MeJA treatment (Fig. 5e), and PEG6000 treatment (Fig. 5f). Whereas neither the expression level nor trend was consistent when the two most unstable internal reference genes were used for relative quantification. It is evident that the use of unstable references for gene expression analysis in T. ciliata can result in biased results.

Discussion
It is ideal that reference genes are stably expressed under all experimental conditions and show stable expression levels across various tissues and growth stages of the organism, but such genes are almost non-existent [35].
More and more studies are showing that the genes that are stably expressed in different species and under different conditions change [36][37][38][39]. Selection of the most suitable reference gene for specific conditions using RT- PCR is therefore very important. This study was dedicated to discovering the best reference genes for gene expression analysis in T. ciliata under different conditions. There were 20 candidate genes from the T. ciliata transcriptome database screened and analyzed by RT-qPCR. It was found that the best reference genes were not consistent across different conditions. For examples, PP2C59 and UBC5B were most suitable for leaves under H. robusta treatment, whereas HIS1 and ACT7 were more optimal for young stems under H. robusta treatment, TUB-α and PPIA95 for comparing different tissues, and 60S-18 and TUB-α for leaves under 4°C treatment.
In this study, we used four methods, geNorm, Norm-Finder, BestKeeper, and RankAggreg, to evaluate the expression stability of 20 candidate genes. The first three algorithms were used to evaluate the expression stability of candidate genes. Our results demonstrated that reference values and calculation methods used by the three algorithms were very different [40]. NormFinder calculates stability values based on intra-and inter-group differences [25], while geNorm compares a reference gene with all genes in a given sample to evaluate the best reference gene [24]. In BestKeeper, CV and SD values determine the ranking of stability of candidate genes [41]. Due to the difference of the algorithms among the three software packages, they generated different rankings for the same set of experimental data, although the results of analysis with geNorm and NormFinder had few variations in this study. For example, 18S was the best reference gene to use across different tissue conditions according to geNorm and NormFinder, whereas the best reference gene was HIS6 identified by BestKeeper analysis. For young stem tissue exposed to H. robusta stress, ACT7 and UBC5B were put forward by geNorm and NormFinder, but ranked low in BestKeeper results. In order to consolidate the results from the three algorithms, RankAggreg was used for overall ranking [34]. Many researchers use ReFinder to calculate the final ranking [42][43][44]. ReFinder assigns an appropriate weight to each gene and calculates the geometric mean of its weights to give the final ranking [45]. RankAggreg uses a cross-entropy Monte Carlo algorithm or genetic algorithm to produce aggregated ordered lists based on rankings [34]. Both tools play very important roles in the consolidation of the screening results for internal reference genes from other softwares.
Other researchers have studied the best reference genes for plants under pest stress, and STP4 was found to be the best reference gene for use in Brassica juncea under biotic stress caused by aphid infestation [46]. ABCT and FBOX were found to be the most stable in soybean under soybean aphid (SBA) stress; TUB4 and TUA4 were stable under two-spotted spider mite (TSSM) stress [47]. Miranda indicated that both GmELF1A and GmTUA5 were stable reference genes for normalization of expression data obtained from soybean roots infected with Meloidogyne incognita, and GmCYP2 and GmELF1A were the best reference genes in soybean leaves infested by Anticarsia gemmatalis [48]. Under H. robusta stress, the reference genes that performed best in leaves and young stem tissues were different in our study. PP2C59 and UBC5B showed high stability of expression in leaves, while only PP2C59 ranked high for young stems. Once again, the appropriate reference genes for different species under different conditions and in different tissues vary. Hence, it is necessary to identify the best reference genes for specific conditions via RT-qPCR. Protein phosphatase can reverse the phosphorylation of protein kinases, thereby dynamically controlling protein phosphorylation and protein phosphatase 2Cs (PP2Cs) is the most abundant type of Fig. 3 Pairwise variations (V) for the 20 candidate reference genes calculated by geNorm to determine the optimal number of reference genes for accurate normalization. The threshold used was 0.15 phosphatase in plants [49]. Although it not often has been used as a candidate internal reference gene, it is stable in our study under pest stress and in different tissues for GA treatment of Santalum album [50]. Therefore, when screening reference genes in other species, PP2Cs can be considered as a candidate reference gene.
Under most experimental conditions in this study (all except for PEG6000 treatment), the reference gene with the worst performance was PPIA26, which was the best reference gene recommended by BestKeeper for use under PEG6000 stress. Another gene in this family, PPIA95, ranked first in geNorm analysis for both 4°C cold stress and MeJA treatment. For leaves and young stems under H. robusta treatment and MeJA stress, PPIA95 ranked third in NormFinder analysis. In Best-Keeper analysis, PPIA95 ranked third under 4°C cold stress and PEG6000-induced drought stress, and among all samples it ranked second. Overall, the PPIA gene family is a promising reference gene set in T. ciliata. The PPIA gene family encodes proteins with functions in immune responses, as well as resistance to cancer, autoimmune diseases, protozoan, and viral infections   [51]. In plants, genes of the PPIA family are rarely used as internal reference genes, but they are abundantly expressed in the T. ciliata transcriptome data and the expression level in each sample is very similar, which is the main reason for choosing them. As reference genes, they are also stably expressed in animals. For example, in different heart and disease conditions, PPIA is recognized by ReFinder as the best reference gene in different skeletal muscles of mice, and it ranked first for human endometrial cancer [52,53]. PPIB is believed as the optimal reference gene in analyzing the blood of Machado-Joseph disease (MJD) patients [54].

Conclusion
This study is the first report about screening and verification of expression stability analysis of a series of reference genes under different conditions in T. ciliata, showing that the optimal reference genes were TUB-α and PGK across all samples; PP2C59 and UBC5B in leaves and HIS1 and ACT7 in young stems under H. robusta treatment; TUB-α and PPIA95 in different tissues; 60S-18 and TUB-α under 4°C treatment; UBC17 and PPIA95 under MeJA treatment; HIS1 and MUB1 under PEG6000 treatment, respectively. We believe this research is important for accurate quantification and expression analysis of genes under different conditions in T. ciliata. It will play a vital role in the molecular breeding work of T. ciliata, such as the research on the H. robusta-resistant and drought-resistant varieties, as well as the research on the metabolic pathways of precious compounds in plants in the future.

Plant materials
Five different experiments were conducted for data collection (Table 4). Experimental samples were all collected from one-year old T. ciliata, grown in a greenhouse in South China Agricultural University (SCAU). For samples from different tissues, mature leaves, young leaves, flowers, shoots, and young stems were collected at 9:00 am on August 25, 2019. Before treatment with the H. robusta, PEG6000, 4°C, and MeJA, all the seedlings were pre-incubated in incubator for 7 days with 16 h of light at 28°C and 8 h of dark at 22°C to mimic the wild environment. For H. robusta treatment, seedlings were exposed to herbivores, and leaves and young stems were harvested after 0, 6, 12, 24, and 36 h. After seedlings were treated with 0, 5, 10, 20, and 30% (w/v) of PEG6000 for 7 days, leaves were collected. For 4°C treatment, seedlings were placed at 4°C, and samples  (Table S1), and then the introns and exons of the candidate reference genes were obtained by sequence alignment (NCBI-Blast), and finally primers for RT-qPCR analysis of each reference gene were designed using the web-based Primer-Blast tool from NCBI (https://www.ncbi.nlm.nih.gov/tools/ primer-blast/). Apart from 60S-18, SAMDC, TUB-β and PPIA26, RT-qPCR primers were designed across introns. Details of these genes and primers are shown in Table 1.

PCR and RT-qPCR analysis
The volume of each PCR amplification reaction mix was 20 μL, containing 10 μL of Phata Max Buffer, 2 μL of five-fold diluted cDNA, 2 μL of each primer (10 μM), 0.5 μL of dNTP, 0. To test the specificity of the RT-qPCR primers, the products of PCR were analyzed by nucleic acid electrophoresis on a 2% (w/v) gel and the melting curve was included after amplification. All samples used for RT-qPCR analysis had three biological replicates, each containing three technical replicates. In order to calculate the gene-specific PCR efficiency (E) and correlation coefficient (R 2 ) of each gene, a standard curve was generated from the mixed complementary DNA (cDNA) using a fivefold dilution series.
Analysis of stability of expression of candidate reference genes CT values were obtained by RT-qPCR, and used to evaluate the expression levels of candidate genes in different experimental conditions and tissues. Three commonly employed algorithms, geNorm [24], NormFinder [25], and BestKeeper [26], were used to evaluate the stability of candidate reference genes in different experiments.
The package geNorm (Version3.5) screens stable reference genes by calculating the M value of the stability of each candidate gene, and the criterion is that the smaller the M value is, the higher the stability of the candidate gene is. It also calculates pairwise variations of the normalized factor after introducing a new internal reference gene, and determines the number of optimal internal reference genes based on the ratio V n / V n + 1 . If the value of V n / V n + 1 is less than 0.15, the number of optimal internal reference genes is n. If the value of V n / V n + 1 is greater than 0.15, the number of optimal reference genes is n + 1. NormFinder selects the most suitable internal reference gene by calculating a stability value for gene expression of the candidates. The lower the stability value is, the more stable the gene is. Using BestKeeper, the SD and CV of expression of each candidate gene can be obtained. The CV ± SD values of different genes were then compared to determine the relative stability of expression of the candidates. Finally, in order to generate an overall ranking of candidate genes from the data generated by geNorm, NormFinder, and Best-Keeper, we used the RankAggreg (version 0.6.5) software package in R as previously described [14,[55][56][57]. Ran-kAggreg is an algorithmic package that can combine different ranking lists. Based on the size of the rankings list, we used the Cross-Entropy Monte Carlo algorithm [34]. The rankings list previously generated by the three packages were used as input with the following parameters: the distance was calculated using Spearman's Footrule function, with rho set at 0.1, the seed at 100, and the "convIn" argument at 50.

Validation of reference genes
In order to verify the accuracy of the rankings and the stability of expression of the selected reference genes, the two most stable reference genes, alone and in combination, and the two most unstable reference genes, as recommended by RankAggreg were used to verify the relative expression of TcMYB3 under 4°C, MeJA, PEG6000, in different tissues, and under H. robusta treatment (leaves and young stems). Finally, we used the 2 -△△CT method to calculate the relative expression levels of the verified genes, where △CT = CT (target gene)-CT (reference gene), △△CT = △CT (treatment)-△ CT (control), 2 -△△CT = relative expression. Three technical replicates were performed for each biological sample [58].
Additional file 1 Figure S1: Amplification products of the twenty candidate reference genes and TcMYB3. Figure S2: Melting curves of candidate reference genes and TcMYB3.
Additional file 2 Table S1. Coding sequences and genomic DNA sequences of candidate reference gene.