Selection of internal control genes for quantitative real-time RT-PCR studies during tomato development process
BMC Plant Biology volume 8, Article number: 131 (2008)
The elucidation of gene expression patterns leads to a better understanding of biological processes. Real-time quantitative RT-PCR has become the standard method for in-depth studies of gene expression. A biologically meaningful reporting of target mRNA quantities requires accurate and reliable normalization in order to identify real gene-specific variation. The purpose of normalization is to control several variables such as different amounts and quality of starting material, variable enzymatic efficiencies of retrotranscription from RNA to cDNA, or differences between tissues or cells in overall transcriptional activity. The validity of a housekeeping gene as endogenous control relies on the stability of its expression level across the sample panel being analysed. In the present report we describe the first systematic evaluation of potential internal controls during tomato development process to identify which are the most reliable for transcript quantification by real-time RT-PCR.
In this study, we assess the expression stability of 7 traditional and 4 novel housekeeping genes in a set of 27 samples representing different tissues and organs of tomato plants at different developmental stages. First, we designed, tested and optimized amplification primers for real-time RT-PCR. Then, expression data from each candidate gene were evaluated with three complementary approaches based on different statistical procedures. Our analysis suggests that SGN-U314153 (CAC), SGN-U321250 (TIP41), SGN-U346908 ("Expressed") and SGN-U316474 (SAND) genes provide superior transcript normalization in tomato development studies. We recommend different combinations of these exceptionally stable housekeeping genes for suited normalization of different developmental series, including the complete tomato development process.
This work constitutes the first effort for the selection of optimal endogenous controls for quantitative real-time RT-PCR studies of gene expression during tomato development process. From our study a tool-kit of control genes emerges that outperform the traditional genes in terms of expression stability.
The study of gene expression patterns is one of the modern molecular biology cornerstones. Gene expression analyses have provided insight into complex biological processes, increasing our understanding of signalling and metabolic pathways that underlie environmental responses and development. Real-time reverse transcription PCR (real-time RT-PCR) is currently the standard method for accurate expression profiling of a moderate number of selected genes, its main advantages being a higher sensitivity and specificity, and a broader quantification range than previous molecular techniques [1–4]. Real-time RT-PCR analysis has become the most common method for verification of microarray expression results [5, 6], reaching a notable level of throughput [7, 8].
Regardless of the experimental technique employed, appropriate normalization is essential for obtaining an accurate and reliable quantification of gene expression levels, especially when measuring small expression differences or when working with tissues of different histological origin . The purpose of normalization is to correct for variability associated with the various steps of the experimental procedure, such as differences in initial sample amount, RNA recovery, RNA integrity, efficiency of cDNA synthesis, and differences in the overall transcriptional activity of the tissues or cells analyzed. Among the numerous normalization approaches that have been proposed [10–15] the use of internal controls or reference genes has become the method of choice [3, 4], because they potentially account for all the above-mentioned sources of variability. Since the internal control and target sequences are naturally present in the biological sample, both will undergo the same type of variation throughout the assay. The success of this normalization strategy is highly dependent on the choice of the appropriate control gene: its expression level should be relatively constant across the tissues or cells tested, and should not be significantly altered by the experimental pressures introduced . If the expression of the reference gene is affected by an excessive variation the detection of small changes becomes unfeasible or, at worst, erroneous expression patterns could be inferred .
There is a general consensus on using housekeeping genes as internal controls in RT-PCR expression analyses. Since housekeeping genes are required for cellular survival, it is assumed that they are stably expressed and are often used without validating their suitability. However, numerous studies reported that the transcript levels of commonly used housekeeping genes can vary considerably under different experimental conditions [10, 11, 17–23]. Moreover, a reference gene with stable expression in one organism may not be suitable for normalization of gene expression in another [24, 25]. In recent years, it has become clear that it is necessary to validate the expression stability of a candidate control gene in each experimental system prior to its use for normalization. In this regard, several free software-based applications such as geNorm , NormFinder  or qBase  permit the statistical identification of the best internal controls from a group of candidate normalization genes in a given set of biological samples. The combination of these statistical tools with microarray and expressed sequence tags (EST) data sets has been shown to be a valuable source of internal control genes for real-time RT-PCR experiments, providing a new generation of reference genes with very stable expression levels that outperform the classical housekeeping genes [23, 27, 29].
Tomato is an important model for genetic and molecular studies, and an international tomato genome project is currently in progress. However, a systematic study validating internal control genes for expression analyses of different developmental stages has not been accomplished in tomato, as has occurred with Arabidopsis, rice and soybean. Searches of the literature reveal a single report in which several classical housekeeping genes are proposed as internal controls based on the relative abundance of tomato ESTs. Nevertheless, no genes were identified that showed stable expression across a wide range of developmental conditions, and no candidate control gene was further evaluated with a more accurate analytical technique. In the present report, we tested the performance of 7 classical and 4 novel housekeeping genes as internal controls for quantitative real-time RT-PCR experiments, in a set of 27 samples representing different tissues and organs of tomato plants at different developmental stages. In addition to 3 reference genes suitable for transcript normalization in the whole developmental series, we recommend other combinations of internal controls that provide a more accurate normalization in studies focused on less heterogeneous sample panels.
A set of 27 tissue samples from Solanum lycopersicon cv. ciliegia plants, comprising all tomato organs at different developmental stages, was processed with a commercial kit. Purified total RNAs had a mean value of 1.98 (SD = 0.09) for 260/280 nm ratios and showed, after denaturing electrophoresis, sharp and intense 18S and 25S ribosomal RNA bands with a practical absence of smears. The level of genomic DNA (gDNA) contamination in each RNA preparation was estimated by real-time PCR through the amplification of an alpha-tubulin gene sequence (tables 1 and 2). Only RNA samples from mature roots and immature green fruit gave a contamination signal, but with threshold cycle (Ct) values higher than 35. The cDNAs obtained from contaminated RNA samples were controlled during the corresponding RT-PCRs by means of reverse transcriptase-minus amplification reactions (RT-minus controls).
Performance of amplification primers
A total of 11 genes were selected as candidates for normalization of gene expression measures during tomato development studies (table 1). These include 7 classical (GAPDH, EFα1, TBP, RPL8, APT, DNAJ and TUA) and 4 novel (TIP41, SAND, CAC and SGN-U346908 – from now on referred to as "Expressed") housekeeping genes. In order to control for gDNA contaminations in the cDNA samples, PCR primers were designed on different exons (table 2) or spanning an exon-exon junction (forward primer for GAPDH and TIP41 genes), mainly guided by information about Arabidopsis genes. The performance of the amplification primers was tested by real-time PCR in two ways. First, aliquots from the 27 cDNA samples were pooled and used as template in amplification reactions with each primer-pair. A single band with the expected size (table 2) was obtained in each case without signs of primer-dimers formation (figure 1, odd lanes), as suggested by the previous melting curve analyses. Second, amplification primers were tested using gDNA as template (figure 1, even lanes). Seven primer-pairs yielded amplicons longer than those obtained with a cDNA template (table 2), whereas primers for GAPDH, TBP, RPL8, and SAND genes were unable to amplify genomic sequences. This result implies that intron position prediction in tomato genes was successful. As summarized in table 2, six amplicons obtained from gDNA have a melting temperature sufficiently different from those of corresponding cDNA amplicons to allow detection of gDNA interferences in a homogeneous assay. In the case of EFα1 primers, real-time RT-PCR should be followed by standard agarose gel electrophoresis. Absolute Tm values in table 2 should be considered with caution because they depend on the ionic strength of the actual reaction mix and the precision/accuracy of the real-time PCR platform.
Finally, in order to optimize PCR conditions, different primer concentrations were tested by real-time RT-PCR with the cDNA pool as template. Table 2 shows primer concentrations that provided the lowest Ct and thus the highest amplification efficiency.
Ct data collection
Real-time RT-PCR was conducted on the 27 cDNA samples with the 11 primer-pairs. RT-minus controls were incorporated for mature roots and immature green fruit samples and only with the seven primer-pairs that yielded an amplification signal using gDNA as template. The 11 candidate control genes displayed a relatively wide range of expression level with mean Ct values between 21.1 (GAPDH) and 30.9 (EFα1). The RT-minus controls for mature roots and immature green fruit reached the fluorescence threshold only with APT, CAC and "Expressed" primers, but an extra treatment with DNase was not required because the Ct values of the mentioned RT-minus reactions were at least 10 cycles higher than those in the corresponding RT-PCRs, exceeding the minimum of 5 cycles recommended by Nolan et al. . Amplification specificity was confirmed by melting analysis or, in the case of EFα1 primers, by agarose gel electrophoresis.
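The acceptance rule applied to the RT-minus controls can be written as a small check. The sketch below is illustrative only: the function names and the Ct values are hypothetical, and the fold-difference estimate assumes perfect doubling per cycle (an efficiency of 2), which real assays only approximate.

```python
def rt_minus_acceptable(ct_rt_plus, ct_rt_minus, min_gap=5.0):
    """Return True when the RT-minus signal is late enough to be tolerated.

    ct_rt_minus is None when the RT-minus reaction never reached the threshold.
    min_gap is the minimum Ct difference (5 cycles in the text above).
    """
    if ct_rt_minus is None:
        return True
    return (ct_rt_minus - ct_rt_plus) >= min_gap


def fold_difference(ct_gap, efficiency=2.0):
    """Approximate template excess implied by a Ct gap (assumes a constant efficiency)."""
    return efficiency ** ct_gap


# Hypothetical Ct values, for illustration only (not data from this study).
print(rt_minus_acceptable(24.0, 36.5))  # gap of 12.5 cycles
print(rt_minus_acceptable(30.0, 33.0))  # gap of 3 cycles
print(fold_difference(10))              # 1024.0
```

Under the doubling assumption, a gap of 10 cycles corresponds to roughly a thousand-fold excess of cDNA-derived template over the contaminating gDNA signal.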
Expression stability of housekeeping genes in the whole developmental series
In order to identify the most stably expressed genes during tomato development, the entire Ct dataset was analyzed using three different statistical approaches that have been incorporated in free specific VBA applets. The "pairwise comparison strategy" , implemented in the geNorm software, evaluates the variation of relative quantity (RQ) ratios for each gene-pair along the sample series. The "model-based approach for estimation of expression variation" , supported by the NormFinder software, estimates intra- and intergroup variation, and thus subdivision of the sample set in at least two coherent groups is required for the correct application of this approach. In this sense, we initially established the following sample-groups: roots (n = 5), leaves (n = 7; including cotyledons), inflorescences (n = 9) and fruits (n = 6). Since a minimum of 8 samples/group is recommended , expression data from different organs were also combined into "vegetative" (roots and leaves; n = 12) and "reproductive" (flowers and fruits; n = 15) sample-groups. The third statistical approach determines the expression stability for each control gene as the coefficient of variation (CV) of the relative expression levels after normalization. This evaluation strategy has been incorporated in the qBase program although limited to the analysis of 5 candidate genes .
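The pairwise comparison strategy can be sketched in a few lines. This is not the geNorm applet itself; it is a minimal reimplementation of the stability measure as it is usually described (the mean, over all other candidates, of the standard deviation of log2-transformed RQ ratios). The function names, the default efficiency of 2, the use of the sample standard deviation and the Ct values are all assumptions made for illustration.

```python
import numpy as np


def relative_quantities(ct, efficiency=2.0):
    """Convert a samples x genes Ct matrix into relative quantities.

    Each gene is scaled to its lowest-Ct sample: RQ = E ** (min Ct - Ct).
    """
    ct = np.asarray(ct, dtype=float)
    return efficiency ** (ct.min(axis=0) - ct)


def genorm_m(rq):
    """Return one stability value per gene (lower means more stable)."""
    log_rq = np.log2(np.asarray(rq, dtype=float))
    n_genes = log_rq.shape[1]
    m = np.empty(n_genes)
    for j in range(n_genes):
        # SD across samples of log2(RQ_j / RQ_k) for every other gene k
        v = [np.std(log_rq[:, j] - log_rq[:, k], ddof=1)
             for k in range(n_genes) if k != j]
        m[j] = np.mean(v)
    return m


# Hypothetical Ct values: 4 samples (rows) x 3 genes (columns).
ct = [[20.0, 22.0, 25.0],
      [21.0, 23.0, 24.0],
      [22.0, 24.0, 27.0],
      [20.5, 22.5, 23.0]]
m = genorm_m(relative_quantities(ct))
print(m)
```

In this toy matrix the first two genes keep a constant Ct difference, so they receive the same M value, while the third gene, whose ratio to the others fluctuates, receives a higher one.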
The results of the three evaluation approaches are shown in table 3. It is noteworthy that definition of sample-groups had a notable effect on NormFinder output. Only two different NormFinder analyses were included in table 3 because we believe that sample grouping should not be arbitrary in an effort to adjust group sizes to increment statistical power, but rather that it reflects comparisons that researchers wish to make. It is remarkable that the NormFinder output with 4 sample-groups and CV ranking differ only in the relative position of the CAC and TIP41 genes. The results of the three statistical analyses exhibit several common features: i) CAC and TIP41 always rank as the most stably expressed housekeeping genes; ii), "Expressed", TBP and SAND also exhibit a remarkable stability of their expression levels and are always included among the 5 best performing reference genes; iii) GAPDH, EFα1 and TUA show unstable expression patterns and are always classified among the least reliable control genes.
Since the different statistical analyses applied to the expression data represent complementary strategies, we decided to combine the results of the three evaluation approaches in a consensus rank, after averaging the two NormFinder outputs. For this purpose, genes were scored from 1 (most stable) to 11 (least stable) based on their relative position in each individual list. When two candidate genes were co-localized in a particular ranking (i.e., CV of the corresponding expression stability values ≤ 15%), both were scored with the average of the two consecutive positions. From the resulting consensus rank (table 3) it can be concluded that the best choices for normalization of expression measures in the entire developmental series are the CAC and TIP41 genes, followed by "Expressed". Analysis of pairwise variation between two sequential normalization factors (NF) revealed that three genes are sufficient to calculate an accurate sample-specific NF as the geometric mean of their RQs. That is, the addition of a fourth control gene to the CAC/TIP41/Expressed combination does not significantly change the NFs: the variation value for the pair NF3/NF4 (V3/4 = 0.118) is lower than the default cut-off value of 0.15. The mean M and CV values for the CAC/TIP41/Expressed genes in the complete developmental series are M = 0.537 and CV = 0.338. These values are inside the ranges M ≤ 1 and CV ≤ 0.5, which have been proposed by Hellemans et al. as acceptable for heterogeneous sample panels, such as the spatio-temporal one surveyed in the present study. Unfortunately, reference values for assessing the relevance of NormFinder scoring have not been specified by the software's authors. In short, the CAC/TIP41/Expressed gene-triplet is recommended for accurate normalization of gene expression measures encompassing the complete development process in tomato.
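The two quantities used in this paragraph, the sample-specific NF and the pairwise variation between sequential NFs, can be illustrated as follows. This is a sketch under stated assumptions: the function names are hypothetical, the pairwise variation is taken as the sample standard deviation of log2(NF_n / NF_n+1), and the RQ matrix is invented for the example rather than taken from table 3.

```python
import numpy as np


def normalization_factor(rq):
    """Geometric mean across genes (columns) for each sample (row)."""
    rq = np.asarray(rq, dtype=float)
    return np.exp(np.log(rq).mean(axis=1))


def pairwise_variation(rq_ranked, n):
    """V(n/n+1): SD of log2(NF_n / NF_n+1) across samples.

    Columns of rq_ranked must be ordered from most to least stable gene.
    """
    rq_ranked = np.asarray(rq_ranked, dtype=float)
    nf_n = normalization_factor(rq_ranked[:, :n])
    nf_next = normalization_factor(rq_ranked[:, :n + 1])
    return float(np.std(np.log2(nf_n / nf_next), ddof=1))


# Hypothetical RQ values: 4 samples x 4 candidate genes, most stable first.
rq = [[1.00, 0.95, 1.10, 0.60],
      [0.80, 0.85, 0.75, 1.00],
      [0.50, 0.55, 0.45, 0.90],
      [0.90, 1.00, 0.95, 0.40]]
v34 = pairwise_variation(rq, 3)
print(v34, "fourth gene not needed" if v34 < 0.15 else "consider a fourth gene")
```

The geometric mean is used rather than the arithmetic mean so that a gene with high absolute abundance does not dominate the factor.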
To assess the validity of the procedure for the selection of control genes detailed above, the relative expression level of the ToFZY gene was estimated in five tomato tissue samples, using the control genes that we recommend for the normalization of gene expression measures in the whole developmental series. For this purpose, we used ToFZY-specific primers described previously and applied an efficiency-correction model for relative quantification. Our results (figure 2) were highly concordant with the transcriptional pattern of the YUC1/YUC4 genes (the Arabidopsis homologs of the tomato ToFZY gene) reported by Cheng et al. based on histochemical analysis of GUS reporter lines and in situ hybridization.
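An efficiency-corrected relative quantification with several reference genes can be sketched as below. The exact model used in the study is given in the cited reference; the form shown here, a target RQ divided by the geometric mean of the reference RQs, each computed as E raised to the Ct difference against a calibrator sample, is a common formulation and is assumed for illustration. All names and numbers are hypothetical.

```python
import math


def relative_expression(target, references):
    """Efficiency-corrected expression of a target gene relative to a calibrator.

    target and each item of references are tuples:
    (amplification efficiency, Ct in calibrator sample, Ct in test sample).
    """
    def rq(efficiency, ct_calibrator, ct_sample):
        return efficiency ** (ct_calibrator - ct_sample)

    target_rq = rq(*target)
    ref_rqs = [rq(*ref) for ref in references]
    # Normalization factor: geometric mean of the reference gene RQs
    nf = math.exp(sum(math.log(x) for x in ref_rqs) / len(ref_rqs))
    return target_rq / nf


# Hypothetical values: target is 2 cycles earlier in the test sample,
# while the three reference genes are unchanged.
ratio = relative_expression(
    target=(2.0, 28.0, 26.0),
    references=[(2.0, 22.0, 22.0), (2.0, 24.0, 24.0), (2.0, 25.0, 25.0)],
)
print(ratio)  # 4.0
```

Using gene-specific efficiencies instead of a fixed value of 2 matters most when Ct differences are large, because the error compounds with every cycle.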
Assessment of normalization in sample subsets
The same evaluation procedure applied to data from the whole developmental series was tested on different sample combinations which, in our understanding, would represent plausible experimental contexts. Cotyledons were always included in the leaf sample group because top-ranked genes were not affected by this combination. The only exception to the analysis routine was the inference of consensus gene-rankings for individual organs (root, leaf, inflorescence and fruit), which were constructed without the NormFinder software because an estimate of the intergroup variation is not possible within a single group. Results shown in table 4 can be used as a guide for selection of suitable control genes that fulfil specific research needs with regard to the particular developmental series analysed. The complete consensus rankings are available as additional file 1.
The combinations of control genes recommended for the different sample subsets (table 4) are basically constructed with those that were ranked among the top five in the analysis of the whole developmental series (table 3), with the notable exception of TBP which is now downgraded in most consensus rankings (see additional file 1). It is clear that normalization of expression measures within organs have different requirements from comparisons between organs. On the one hand, two control genes are sufficient for accurate normalization in individual organs, as indicated by V2/3 values lower than 0.15 (table 4). The recommended gene-pairs have mean stability values that are acceptable (M ≤ 0.5 and CV ≤ 0.25) for relatively homogeneous sample panels . In these cases, a third control gene is included (table 4, in brackets) for those wanting to use a minimum of 3 genes for calculating NFs, as suggested by Vandesompele et al. . In addition, the SAND gene is revealed as appropriate for normalization within organs, but less advisable for between-organs experiments. On the other hand, when the sample subsets were comprised of 2 or 3 different organs the evaluation procedure indicated that 3 control genes are necessary for a reliable normalization (see V3/4 values in table 4). The expression levels of the 2 proposed triplets of control genes undergo oscillations comparable to those observed in the entire sample set, to judge by the values of the corresponding stability parameters (mean M and CV values in table 4). Moreover, control genes recommended for normalization in the complete developmental series (CAC/TIP41/Expressed) are also suitable for 3 different combinations of plant organs. The only exception is the subset integrated by inflorescence and fruit samples which can be suitably normalized with the following gene-triplet: CAC/SAND/RPL8. 
In this case, if RPL8 is substituted by TIP41, the next most stable gene in the inflorescence/fruit consensus ranking (additional file 1), the mean values for stability parameters would remain acceptable (M = 0.433 and CV = 0.341) and, at the same time, would allow the tool-kit of control genes for normalization during tomato development to be reduced to 4 components: CAC, TIP41, Expressed and SAND.
The detection of differentially expressed genes has contributed to understanding how developmental processes are conducted in a biological system such as tomato plant. In the field of gene-expression analysis, real-time reverse transcription PCR (RT-PCR) has become the method of choice for accurate expression profiling of selected genes [1–4]. Correct sample-normalization is an absolute prerequisite for reliable and accurate measurement of gene expression, especially when studying the biological relevance of small differences or when handling samples from different organs or tissues . The actual gold standard for controlling inter-sample variations, both in the amount and quality of cDNA inputs, is the use of suitable genes as endogenous controls . However, since there are no universal control genes, a set of potential references must be previously validated in each particular experimental background. Recently, an exhaustive analysis anchored on microarray data about expression profiles in Arabidopsis  allowed the identification of hundreds of potential reference genes, which show exceptional expression stability throughout development and under a wide range of environmental conditions. Despite its relevance as a model organism, certain biological processes are not tractable in Arabidopsis, such as the ripening of fleshy fruits which has received considerable attention in tomato. In addition, the conclusions derived from studies in Arabidopsis cannot be directly extrapolated to any vascular plant species. For example, UBQ10 gene shows highly stable expression in Arabidopsis , whereas it seems unsuitable for normalization in different tissues at different developmental stages in rice and soybean [24, 25]. This emphasizes the importance of preliminary evaluation studies, aimed to identify the most stable housekeeping genes in different organisms. 
Taking the above-mentioned arguments into account, we accomplished a systematic study of the expression stability of 11 housekeeping genes in Solanum lycopersicon, along a series composed of 27 samples from different tissues/organs at different developmental stages.
In an effort to minimize bias introduced by the validation approach, three different, yet complementary, statistical strategies were used to select the best internal controls for normalization of gene expression studies in tomato. The pairwise comparison strategy, accessible through the geNorm software, is a very popular option for verifying the expression stability of candidate genes. It relies on the principle that variations in the expression ratios of two housekeeping genes reflect the fact that at least one of the two genes is not constantly expressed. Its main advantage is that expression ratios allow a fine control of variations in the amount of cDNA inputs, because these oscillations associated with technical variability affect both paired genes equally. It has been argued that the major weakness of the pairwise comparison approach is its sensitivity to co-regulation, that is, it apparently tends to select those genes with the highest degree of similarity in their expression profile. However, it should be noted that the stability measure provided by geNorm (M) is the mean pairwise variation between a gene and all other tested candidates, and thus a pair of highly co-regulated genes could soon be eliminated during the selection process if they show high inter-sample variability. In addition, the advantage of two co-regulated genes is inversely proportional to the number of candidate genes being validated. An obvious prediction about the behaviour of two co-regulated genes in the pairwise variation approach is that they will be scored with a similar M value. Indeed, there are numerous examples in the literature of genes belonging to the same functional class (typically different subunits of the same multiprotein complex) that are not top-ranked by the geNorm software, but which occupy close positions in the ranking.
In any case, and since it is very difficult to foresee common expression patterns, the threshold cycle data were analyzed with two other statistical strategies that are less sensitive to co-regulation of the candidate genes. On the one hand, the "model-based approach" implemented in the NormFinder software examines variation within and between sample groups that must be defined by the user. On the other hand, overall expression variation of each candidate gene was measured as the coefficient of variation (CV) of the normalized relative quantities (NRQ). The NormFinder approach stands out because it balances two sources of variation, but it does not account for systematic errors during sample preparation. The CV strategy, in contrast, overcomes this drawback through the handling of normalized quantities, and may be a good alternative when the sample set cannot be appropriately subdivided. Although other valid statistical strategies have been successfully applied to control gene selection, the above-mentioned approaches are usually preferred because they are supported by user-friendly software.
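The CV strategy can be illustrated with a short sketch. It assumes, for the purpose of the example only, that each gene's RQ is normalized by the geometric mean of all candidate genes in the same sample and that the sample standard deviation is used; the function name and the data are hypothetical and the qBase implementation may differ in detail.

```python
import numpy as np


def cv_of_nrq(rq):
    """Coefficient of variation of normalized relative quantities, per gene.

    rq is a samples x genes matrix. Each row is divided by its geometric
    mean, which cancels differences in cDNA input between samples.
    """
    rq = np.asarray(rq, dtype=float)
    nf = np.exp(np.log(rq).mean(axis=1, keepdims=True))
    nrq = rq / nf
    return nrq.std(axis=0, ddof=1) / nrq.mean(axis=0)


# Hypothetical data: only the amount of input differs between samples.
input_only = [[1.0, 1.0, 1.0],
              [2.0, 2.0, 2.0],
              [4.0, 4.0, 4.0]]
print(cv_of_nrq(input_only))  # all zeros: input variation is normalized away

# Hypothetical data: the third gene is strongly induced in one sample.
one_unstable = [[1.0, 1.0, 1.0],
                [1.0, 1.0, 8.0],
                [1.0, 1.0, 1.0]]
print(cv_of_nrq(one_unstable))  # third gene gets the highest CV
```

The second example also shows a limitation of the sketch: an unstable gene included in the normalization factor inflates the CV of the stable genes as well, which is one reason to restrict the factor to well-behaved candidates.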
Since the 3 statistical approaches complement one another, their outcomes were equally weighted and combined into a consensus ranking. As the main result of this analysis, based on real-time PCR data, we propose a tool-kit of control genes suitable for normalization of gene expression measures in a wide variety of samples in tomato. This tool-kit is composed of 4 housekeeping genes (CAC, TIP41, Expressed and SAND), which are recommended in different combinations depending on the sample origin (tables 3 and 4). Our analysis suggests that studies involving different tomato organs require at least 3 control genes for reliable and accurate normalization, while two control genes are sufficient for experiments within particular organs. The method of calculating a sample-specific normalization factor as the geometric mean of multiple carefully selected housekeeping genes is currently the gold standard [3, 12]. This approach has been adopted by many researchers and has been empirically and statistically validated [26, 35–37]. Although the minimal use of three control genes has been proposed for the correct normalization of RT-PCR data, the actual optimal number of control genes should arise from a balance between economic considerations and accuracy, keeping in mind that normalization with multiple genes is less error-prone than single gene normalization [26, 35–37].
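The equal-weight consensus ranking can be sketched as follows, using the scoring rule given in the results (position-based scores, with two genes sharing the average of their positions when the CV of their stability values is at most 15%). Several details are assumptions made for this illustration: ties are detected greedily from the top of each list, the sample standard deviation is used for the two-value CV, scores are averaged across approaches, and the gene labels and stability values are invented.

```python
from statistics import mean, stdev


def score_ranking(stability, tie_cv=0.15):
    """Map each gene to a score (1 = most stable) from its stability value.

    stability maps gene -> positive value, lower meaning more stable.
    Two adjacent genes whose values have a CV <= tie_cv share the
    average of their two positions.
    """
    ordered = sorted(stability, key=stability.get)
    scores = {}
    i = 0
    while i < len(ordered):
        if i + 1 < len(ordered):
            pair = [stability[ordered[i]], stability[ordered[i + 1]]]
            if stdev(pair) / mean(pair) <= tie_cv:
                scores[ordered[i]] = scores[ordered[i + 1]] = i + 1.5
                i += 2
                continue
        scores[ordered[i]] = float(i + 1)
        i += 1
    return scores


def consensus(score_tables):
    """Order genes by their mean score across equally weighted approaches."""
    genes = list(score_tables[0])
    totals = {g: mean(table[g] for table in score_tables) for g in genes}
    return sorted(genes, key=totals.get)


# Hypothetical stability values from two approaches for four genes.
approach_a = {"G1": 0.30, "G2": 0.32, "G3": 0.60, "G4": 1.00}
approach_b = {"G1": 0.50, "G2": 0.20, "G3": 0.90, "G4": 1.50}
tables = [score_ranking(approach_a), score_ranking(approach_b)]
print(tables[0])
print(consensus(tables))
```

In the first hypothetical list G1 and G2 are too close to separate and both score 1.5, so the second list decides their order in the consensus.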
Among the housekeeping genes evaluated in the present study, DNAJ, GAPDH and TUA genes have been previously described as "candidate controls" in tomato plants . These genes were selected after the expression analysis of 127 transcripts in 27 expressed sequence tag libraries, but none of them was described as a suitable control gene for all tissues. Our results, based on data obtained with a more accurate and precise technique, lead to the conclusion that DNAJ gene may be useful for normalization in inflorescence samples (table 4) and, to a lesser extent, in leaves, fruits or a leaf/inflorescence developmental series (additional file 1). This is in accord with the results of Cocker and Davis . However, we suggest that GAPDH and, especially, TUA should be avoided as control genes because their expression stability is far from acceptable. For instance, the NRQs of TUA gene showed CVs higher than 180% in leaf and fruit samples. As another contribution of the present report, our results indicate that reliable normalization of the whole tomato developmental series is possible with the CAC, TIP41 and Expressed genes. Finally, the results reported herein are in good agreement with those described in Arabidopsis by Czechowski et al. guided by microarray expression data . In fact, the 4 control genes that we recommended for normalization in tomato are among the 5 top-ranked genes in Arabidopsis, although with a different relative position in the respective rankings. These novel control genes, as in Arabidopsis, are superior to traditional ones in terms of expression stability.
This work constitutes the first in-depth study aimed to validate the optimal control genes for the quantification of transcript levels during tomato development using real-time RT-PCR technology. We have tested the expression stabilities of 11 candidate genes in a set of 27 tissue samples from tomato plants. As a result of this evaluation, we recommend 4 non-classical housekeeping genes as superior references for normalization of gene expression measures in different tomato developmental stages, and provide primer sequences whose performance in real-time PCR experiments is demonstrated. Finally, we have provided useful background information about the procedure of control gene selection in quantitative RT-PCR studies of gene expression.
Growth and Maintenance of Plants
Tomato (Solanum lycopersicon) cv. ciliegia plants were maintained under growth chamber conditions at 25 ± 2°C with standard potting compost in 9 cm diameter pots. The relative humidity was kept around 60% with a 12 h photoperiod (120 μmol PAR m⁻² s⁻¹). Plants were moved to a glasshouse when the 5-leaves stage was reached.
Sampling of the developmental series was prolonged over a 5-month period and comprised a total of 27 samples. The primary root that emerges through the seed coat was harvested at 72, 78 and 96 h following water imbibition. The proximal and distal portions of the mature root were collected at the 7-leaves stage. Cotyledons were excised at 96 h after seed imbibition. Six leaf samples were harvested per individual at the 6-leaves stage and always came from the apical leaflet. A total of 9 inflorescence developmental stages were established on the basis of bud sizes (8 samples; from 1 to 8 mm) and flower opening, as proposed by Brukhin et al. . Seeds and pericarp were gently removed from fruits at 3 different developmental stages: immature green, breaker and red stages. After collection, samples were immediately frozen in liquid N2 and stored at -80°C until RNA extraction.
Total RNA and genomic DNA isolation
Total RNA was purified from tissue samples using the Spectrum™ Plant Total RNA Kit and on-column DNase I digestion, following the manufacturer's recommendations (Sigma-Aldrich). The amount of starting sample was 50 mg and this required the pooling of tissues from 5–30 individuals depending on the size of the material recovered. RNA was quantified using absorbance at 260 nm, whereas its purity was assessed based on absorbance ratios at 260/280 nm. The integrity of purified RNA was confirmed by denaturing agarose gel electrophoresis and ethidium bromide staining. Genomic DNA was isolated from young leaves (100 mg) using the GenElute™ Plant Genomic DNA Miniprep Kit (Sigma-Aldrich) according to the manufacturer's instructions and checked by standard agarose electrophoresis.
Selection of tomato sequences
We selected 11 potential reference genes (table 1) that belong to different functional classes to reduce the chance that the genes might be co-regulated, with the possible exception of RPL8 and EFα1, since both participate in the translation process. This group of genes comprised several classical housekeeping genes which are commonly used as internal controls for expression studies [7, 24, 35, 39, 40], such as GAPDH (glyceraldehyde-3-phosphate dehydrogenase), EFα1 (elongation factor α1), TBP (TATA binding protein), RPL8 (ribosomal protein L8), APT (adenine phosphoribosyl transferase), DNAJ (DnaJ-like protein) and TUA (alpha-tubulin). Based on expressed sequence tag data, the GAPDH, DNAJ and TUA genes have been proposed as internal controls for expression analyses involving different tomato organs.
The set of candidate reference genes also included less conventional housekeeping genes which showed highly stable expression levels in analyses of microarray data-sets from Arabidopsis, such as TIP41 (TIP41-like protein), SAND (SAND family protein), AT5G46630 (clathrin adaptor complexes subunit) or AT4G33380 (expressed sequence). Some of these genes have also been shown to be stably expressed in a variety of grapevine tissues. Potential homologs to the corresponding Arabidopsis housekeeping genes were identified in the tomato unigene collection of the SOL Genomics Network (Cornell University; http://www.sgn.cornell.edu) via amino acid sequence comparisons with the BLASTX application.
PCR primer design
The amplification primers for real-time PCR were designed using PRIMER3 software . In order to control for genomic DNA contamination, amplification primers were targeted to different exons (table 2). Information about exon positions in tomato GAPDH and EFα1 genes was directly available from databases. For the remaining tomato housekeeping genes, exon/intron boundaries were predicted through alignments  involving amino acid or nucleotide sequences from Arabidopsis and tomato, and based on information about exon positions from Arabidopsis Genome Project. The performance of the designed primers (table 2) was tested by real-time PCR using either tomato cDNA or genomic DNA templates.
Real-time amplification reactions were performed using SYBR Green detection chemistry and run in triplicate on 96-well plates with the iCycler iQ thermocycler (Bio-Rad). Reactions were prepared in a total volume of 20 μl containing: 4 μl of template, 2 μl of each amplification primer (optimized concentration in table 2), 10 μl of 2× FastStart SYBR Green Master (Roche Applied Science) and 2 μl of fluorescein as normalization dye. Blank controls were run in triplicate for each master mix. The cycling conditions were set as follows: an initial denaturation step of 95°C for 10 min to activate the FastStart Taq DNA polymerase, followed by 45 cycles of denaturation at 95°C for 15 s, annealing at 60°C for 30 s and extension at 72°C for 30 s. The amplification process was followed by a melting curve analysis, ranging from 60°C to 90°C, with temperature increments of 0.2°C every 10 s. Baseline and threshold cycles (Ct) were automatically determined using the Bio-Rad iQ Software 3.0.
The cDNA samples for real-time RT-PCR experiments were synthesized from 1 μg of total RNA and random nonamer primers, using the First-Strand Synthesis System of Sigma-Aldrich. The cDNAs were diluted to a final volume of 200 μl. A mixture of the 27 diluted cDNA samples was used to select the optimal concentration of each PCR primer pair (table 2) within a 0.2–0.6 μM range, choosing the concentration that gave the lowest Ct values. The PCR efficiency was determined for each primer pair at its optimal concentration (table 2) with the DART-PCR workbook, which uses fluorescence data captured during the exponential phase of each amplification reaction. The amplicons obtained with each primer pair from the cDNA mixture and from a random subset of individual cDNA samples were checked by electrophoresis on 2% agarose gels with ethidium bromide staining.
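The idea of estimating amplification efficiency from exponential-phase fluorescence can be illustrated with a minimal Python sketch. This is not the DART-PCR workbook itself: the function name, the assumption that the exponential-phase window has already been chosen, and the synthetic data are all illustrative. During the exponential phase, fluorescence grows as F = F0 · E^cycle, so the slope of log10(F) against cycle number equals log10(E).

```python
import numpy as np

def efficiency_from_exponential_phase(cycles, fluorescence):
    """Estimate the per-cycle amplification efficiency E (2.0 = perfect doubling)
    from the slope of log10(fluorescence) versus cycle number.
    Assumes the inputs are already restricted to the exponential phase."""
    cycles = np.asarray(cycles, dtype=float)
    log_f = np.log10(np.asarray(fluorescence, dtype=float))
    slope, _intercept = np.polyfit(cycles, log_f, 1)  # least-squares straight line
    return 10.0 ** slope

# Synthetic, noise-free readings generated with E = 1.9 (hypothetical values).
cycles = np.arange(18, 24)
fluorescence = 1e-6 * 1.9 ** cycles
E = efficiency_from_exponential_phase(cycles, fluorescence)
print(round(E, 3))
```

With real data the result depends strongly on which cycles are treated as exponential, so the window selection deserves more care than this sketch gives it.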
The possibility of genomic DNA contamination in the RT-PCR assays was controlled in two ways, both relying on the ability of the amplification primers to generate different amplicons from genomic DNA than from cDNA. First, each primer pair was tested by real-time PCR using tomato genomic DNA as template (1 ng). The melting temperature and the size of the amplicons obtained in these reactions were recorded (table 2) and taken into account in the analyses of the RT-PCR results. Second, for each of the 27 RNA samples, a quantity equivalent to the cDNA used in the amplification reactions (i.e. 20 ng of total RNA) was amplified by real-time PCR using primers targeted to alpha-tubulin sequences (table 2). These primers provided high power for detecting genomic DNA contamination. RNA samples giving alpha-tubulin amplification were further checked by means of RT-minus amplification reactions.
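The figure of 20 ng of total RNA per reaction follows from the quantities given in the protocol: 1 μg of RNA per cDNA synthesis, dilution to 200 μl, and 4 μl of template per reaction. A short Python check of that arithmetic (variable names are illustrative):

```python
# Quantities taken from the protocol described above.
total_rna_ng = 1000.0     # 1 ug of total RNA per cDNA synthesis
final_volume_ul = 200.0   # cDNA diluted to a final volume of 200 ul
template_ul = 4.0         # template volume per 20 ul reaction

rna_equiv_per_ul = total_rna_ng / final_volume_ul        # ng RNA equivalent per ul
rna_equiv_per_reaction = rna_equiv_per_ul * template_ul  # ng RNA equivalent per reaction

print(rna_equiv_per_ul, rna_equiv_per_reaction)  # 5.0 20.0
```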
Statistical analysis of gene expression stability
The suitability of candidate control genes was evaluated by applying three different statistical approaches to expression data (i.e. Ct values). These strategies provide complementary measures of gene expression stability among cDNA samples. In the first approach, Ct values were converted into relative quantities (RQs) using the sample with the lowest Ct as calibrator and taking into account the amplification efficiency calculated for each primer pair (table 2), and then imported into geNorm v.3.5 software (http://medgen.ugent.be/~jvdesomp/genorm/). This program estimates an expression stability value (M) for each gene, defined as the average pairwise variation of a particular gene with all other control genes in a given panel of cDNA samples. Genes with the lowest M values have the most stable expression. geNorm ranks housekeeping genes by eliminating the worst-scoring candidate control gene (that is, the one with the highest M value) and recalculating M values for the remaining genes. When this procedure is completed, two candidate genes always share the top rank, because expression ratios are required for gene-stability measurements. The geNorm program also determines the minimal number of control genes required for calculating an accurate normalization factor (NF), defined as the geometric mean of their RQs, but only for the gene ranking previously defined by the same program. We adapted this statistical procedure to any ordered list of genes in a homemade Excel worksheet. First, an NF is calculated for each sample with the two top-ranked genes. Then, the most stable remaining gene is included stepwise and the NF is recalculated at each step. Finally, the pairwise variation of two sequential NFs (Vn/n+1) is estimated as the standard deviation of the logarithmically transformed NFn/NFn+1 ratios, reflecting the effect of including an additional gene.
A pairwise variation of 0.15 is accepted as the cut-off, below which the inclusion of an additional control gene is not required for reliable normalization.
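The geNorm-style calculations described above (RQ conversion, M values, stepwise elimination and pairwise variation) can be sketched in Python with NumPy. This is an illustrative reimplementation, not the geNorm program or the authors' worksheet. The function names, the use of base-2 logarithms, the sample standard deviation (ddof=1) and the toy Ct matrix are assumptions. Efficiencies are expressed as fold increase per cycle (2.0 means perfect doubling).

```python
import numpy as np

def relative_quantities(ct, efficiency):
    """ct: samples x genes array of Ct values; efficiency: per-gene E.
    For each gene, the sample with the lowest Ct is the calibrator (RQ = 1)."""
    ct = np.asarray(ct, dtype=float)
    eff = np.asarray(efficiency, dtype=float)
    return eff ** (ct.min(axis=0) - ct)

def stability_m(rq):
    """M per gene: mean, over all other genes, of the standard deviation
    across samples of the log2 expression ratio with that gene."""
    log_rq = np.log2(rq)
    n_genes = log_rq.shape[1]
    m = np.empty(n_genes)
    for j in range(n_genes):
        sds = [np.std(log_rq[:, j] - log_rq[:, k], ddof=1)
               for k in range(n_genes) if k != j]
        m[j] = np.mean(sds)
    return m

def rank_by_elimination(rq):
    """Repeatedly drop the gene with the highest M; returns column indices
    ordered from most to least stable (the first two share the top rank)."""
    remaining = list(range(rq.shape[1]))
    eliminated = []
    while len(remaining) > 2:
        m = stability_m(rq[:, remaining])
        worst = remaining[int(np.argmax(m))]
        eliminated.append(worst)
        remaining.remove(worst)
    return remaining + eliminated[::-1]

def pairwise_variation(rq, ranked_genes):
    """V(n/n+1) for n = 2 .. len(ranked_genes) - 1: standard deviation of
    log2(NF_n / NF_n+1), where NF_n is the geometric mean of the top n genes."""
    log_rq = np.log2(rq[:, ranked_genes])
    v = []
    for n in range(2, log_rq.shape[1]):
        log_nf_n = log_rq[:, :n].mean(axis=1)       # log2 of geometric mean of n genes
        log_nf_n1 = log_rq[:, :n + 1].mean(axis=1)  # same with one more gene
        v.append(np.std(log_nf_n - log_nf_n1, ddof=1))
    return np.array(v)

# Toy data: 4 samples x 4 genes (hypothetical Ct values, E = 2 for every gene).
ct = [[20.0, 22.0, 25.0, 18.0],
      [21.0, 23.0, 26.5, 22.0],
      [19.0, 21.0, 24.0, 15.0],
      [20.5, 22.5, 25.0, 21.0]]
rq = relative_quantities(ct, [2.0, 2.0, 2.0, 2.0])
ranking = rank_by_elimination(rq)
v = pairwise_variation(rq, ranking)
print(ranking, v)
```

In this toy panel the first two genes vary in parallel, so they end up top ranked. V2/3 falls below the 0.15 cut-off, suggesting that two genes would suffice here, while V3/4 is large because the fourth gene is unstable.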
In the second evaluation approach, Ct values were log-transformed and imported into the NormFinder software (http://www.mdl.dk/publicationsnormfinder.htm). This strategy is based on a mathematical model of gene expression that enables estimation of the intra- and intergroup variation, which are then combined into a stability value. Candidate control genes with the minimal intra- and intergroup variation will have the lowest stability value and will be top ranked. For adequate application of the NormFinder program, the sample set should be subdivided into at least two coherent groups, each ideally comprising at least 8 samples.
In the third evaluation approach, the coefficient of variation of normalized relative expression levels was calculated for candidate genes throughout each developmental series tested. This statistical approach, proposed by Hellemans et al., has been implemented in the qBase software (http://medgen.ugent.be/qbase/), but only 5 candidate genes can be evaluated simultaneously. To overcome this limitation, we incorporated in an Excel worksheet formulas 11, 13 and 15 described by Hellemans et al. for calculating normalized relative quantities (NRQs). First, mean Ct values were transformed into RQs using the specific amplification efficiency of each primer pair and the sample with the lowest Ct as calibrator (formula 11). Then, a sample-specific NF was calculated as the geometric mean of the RQs estimated for all candidate genes (formula 13). Finally, NRQs were calculated as the ratio of the RQ estimated for a gene/sample pair to the corresponding sample NF (formula 15).
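The three steps above, followed by the coefficient of variation per gene, can be sketched in Python with NumPy. This is an illustration of the calculation as described in the text, not the qBase software or the authors' worksheet. The function names, the use of the sample standard deviation (ddof=1) and the toy Ct matrix are assumptions.

```python
import numpy as np

def normalized_relative_quantities(mean_ct, efficiency):
    """mean_ct: samples x genes array of mean Ct values;
    efficiency: per-gene fold increase per cycle (2.0 = perfect doubling)."""
    ct = np.asarray(mean_ct, dtype=float)
    eff = np.asarray(efficiency, dtype=float)
    rq = eff ** (ct.min(axis=0) - ct)     # RQ with the lowest-Ct sample as calibrator
    nf = np.exp(np.log(rq).mean(axis=1))  # per-sample NF: geometric mean over all genes
    return rq / nf[:, np.newaxis]         # NRQ = RQ / NF of the same sample

def coefficient_of_variation(nrq):
    """CV of the NRQs of each gene across samples (standard deviation / mean)."""
    return nrq.std(axis=0, ddof=1) / nrq.mean(axis=0)

# Toy data: 3 samples x 3 genes (hypothetical Ct values, E = 2 for every gene).
mean_ct = [[20.0, 25.0, 22.0],
           [21.0, 26.0, 25.0],
           [22.0, 27.0, 21.0]]
nrq = normalized_relative_quantities(mean_ct, [2.0, 2.0, 2.0])
cv = coefficient_of_variation(nrq)
print(cv)
```

A lower CV indicates more stable expression. In the toy data the first two genes change in parallel and obtain the same CV, which is lower than that of the third gene.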
Gachon C, Mingam A, Charrier B: Real-time PCR: what relevance to plant studies?. J Exp Bot. 2004, 55: 1445-1454. 10.1093/jxb/erh181.
Wong ML, Medrano JF: Real-time PCR for mRNA quantitation. BioTechniques. 2005, 39: 75-85. 10.2144/05391RV01.
Nolan T, Hands RE, Bustin SA: Quantification of mRNA using real-time RT-PCR. Nat Protocols. 2006, 1: 1559-1582. 10.1038/nprot.2006.236.
VanGuilder HD, Vrana KE, Freeman WM: Twenty-five years of quantitative PCR for gene expression analysis. BioTechniques. 2008, 44: 619-626. 10.2144/000112776.
Chuaqui RF, Bonner RF, Best CJ, Gillespie JW, Flaig MJ, Hewitt SM, Phillips JL, Krizman DB, Tangrea MA, Ahram M, Linehan WM, Knezevic V, Emmert-Buck MR: Post-analysis follow-up and validation of microarray experiments. Nat Genet. 2002, 32: 509-514. 10.1038/ng1034.
Canales RD, Luo Y, Willey JC, Austermiller B, Barbacioru CC, Boysen C, Hunkapiller K, Jensen RV: Evaluation of DNA microarray results with quantitative gene expression platforms. Nat Biotechnol. 2006, 24: 1115-1122. 10.1038/nbt1236.
Czechowski T, Bari RP, Stitt M, Scheible WR, Udvardi MK: Real-time RT-PCR profiling of over 1400 Arabidopsis transcription factors: unprecedented sensitivity reveals novel root- and shoot-specific genes. Plant J. 2004, 38: 366-379. 10.1111/j.1365-313X.2004.02051.x.
Morrison T, Hurley J, Garcia J, Yoder K, Katz A, Roberts D, Cho J, Kanigan T, Ilyin SE, Horowitz D, Dixon JM, Brenan CJ: Nanoliter high throughput quantitative PCR. Nucleic Acids Res. 2006, 34: e123-10.1093/nar/gkl639.
Huggett J, Dheda K, Bustin SA: Normalization. Real-Time PCR. Edited by: Dorak MT. 2006, New York: BIOS Advanced Methods, 83-91.
Thellin O, Zorzi W, Lakaye B, De Borman B, Coumans B, Hennen G, Grisar T, Igout A, Heinen E: Housekeeping genes as internal standards: use and limits. J Biotechnol. 1999, 75: 291-295. 10.1016/S0168-1656(99)00163-7.
Suzuki T, Higgins PJ, Crawford DR: Control selection for RNA quantitation. BioTechniques. 2000, 29: 332-337.
Pfaffl MW: Quantification strategies in real time PCR. A-Z of quantitative PCR. Edited by: Bustin SA. 2004, La Jolla, CA: International University Line, 1-20.
Gilsbach R, Kouta M, Bönisch H, Brüss M: Comparison of in vitro and in vivo reference genes for internal standardization of real-time PCR data. BioTechniques. 2006, 40: 173-177. 10.2144/000112052.
Argyropoulos D, Psallida C, Spyropoulos CG: Generic normalization method for real-time PCR. Application for the analysis of the mannanase gene expressed in germinating tomato seed. FEBS J. 2006, 273: 770-777. 10.1111/j.1742-4658.2006.05109.x.
Libus J, Štorchová H: Quantification of cDNA generated by reverse transcription of total RNA provides a simple alternative tool for quantitative RT-PCR normalization. BioTechniques. 2006, 41: 156-164. 10.2144/000112232.
Dheda K, Huggett JF, Chang JS, Kim LU, Bustin SA, Johnson MA, Rook GA, Zumla A: The implications of using an inappropriate reference gene for real-time reverse transcription PCR data normalization. Anal Biochem. 2005, 344: 141-143. 10.1016/j.ab.2005.05.022.
Foss DL, Baarsch MJ, Murtaugh MP: Regulation of hypoxanthine phosphoribosyltransferase, glyceraldehyde-3-phosphate dehydrogenase and beta-actin mRNA expression in porcine immune cells and tissues. Anim Biotechnol. 1998, 9: 67-78.
Schmittgen TD, Zakrajsek BA: Effect of experimental treatment on housekeeping gene expression: validation by real-time, quantitative RT-PCR. J Biochem Biophys Meth. 2000, 46: 69-81. 10.1016/S0165-022X(00)00129-9.
Warrington JA, Nair A, Mahadevappa M, Tsyganskaya M: Comparison of human adult and fetal expression and identification of 535 housekeeping/maintenance genes. Physiol Genomics. 2000, 2: 143-147.
Selvey S, Thompson EW, Matthaei K, Lea RA, Irving MG, Griffiths LR: Beta-actin – an unsuitable internal control for RT-PCR. Mol Cell Probes. 2001, 15: 307-311. 10.1006/mcpr.2001.0376.
Lee PD, Sladek R, Greenwood CM, Hudson TJ: Control genes and variability: absence of ubiquitous reference transcripts in diverse mammalian expression studies. Genome Res. 2002, 12: 292-297. 10.1101/gr.217802.
Glare EM, Divjak M, Bailey MJ, Walters EH: β-Actin and GAPDH housekeeping gene expression in asthmatic airways is variable and not suitable for normalising mRNA levels. Thorax. 2002, 57: 765-770. 10.1136/thorax.57.9.765.
Czechowski T, Stitt M, Altmann T, Udvardi MK, Scheible W-R: Genome-wide identification and testing of superior reference genes for transcript normalization in Arabidopsis. Plant Physiol. 2005, 139: 5-17. 10.1104/pp.105.063743.
Jain M, Nijhawan A, Tyagi AK, Khurana JP: Validation of housekeeping genes as internal control for studying gene expression in rice by quantitative real-time PCR. Biochem Biophys Res Commun. 2006, 345: 646-651. 10.1016/j.bbrc.2006.04.140.
Jian B, Liu B, Bi Y, Hou W, Wu C, Han T: Validation of internal control for gene expression study in soybean by quantitative real-time PCR. BMC Mol Biol. 2008, 9: 59-10.1186/1471-2199-9-59.
Vandesompele J, De Preter K, Pattyn F, Poppe B, Van Roy N, De Paepe A, Speleman F: Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome Biol. 2002, 3: RESEARCH0034-10.1186/gb-2002-3-7-research0034.
Andersen CL, Jensen JL, Orntoft TF: Normalization of real-time quantitative reverse transcription-PCR data: a model-based variance estimation approach to identify genes suited for normalization, applied to bladder and colon cancer data sets. Cancer Res. 2004, 64: 5245-5250. 10.1158/0008-5472.CAN-04-0496.
Hellemans J, Mortier G, De Paepe A, Speleman F, Vandesompele J: qBase relative quantification framework and software for management and automated analysis of real-time quantitative PCR data. Genome Biol. 2007, 8: R19-10.1186/gb-2007-8-2-r19.
Hoogewijs D, Houthoofd K, Matthijssens F, Vandesompele J, Vanfleteren JR: Selection and validation of a set of reliable reference genes for quantitative sod gene expression analysis in C. elegans. BMC Mol Biol. 2008, 9: 9-16. 10.1186/1471-2199-9-9.
Coker JS, Davies E: Selection of candidate housekeeping controls in tomato plants using EST data. BioTechniques. 2003, 35: 740-748.
Expósito-Rodríguez M, Borges AA, Borges-Pérez A, Hernández M, Pérez JA: Cloning and biochemical characterization of ToFZY, a tomato gene encoding a flavin monooxygenase involved in a tryptophan-dependent auxin biosynthesis pathway. J Plant Growth Regul. 2007, 26: 329-340. 10.1007/s00344-007-9019-2.
Pfaffl MW: A new mathematical model for relative quantification in real-time RT-PCR. Nucleic Acids Res. 2001, 29: e45-10.1093/nar/29.9.e45.
Cheng Y, Dai X, Zhao Y: Auxin biosynthesis by the YUCCA flavin monooxygenases controls the formation of floral organs and vascular tissues in Arabidopsis. Genes Dev. 2006, 20: 1790-1799. 10.1101/gad.1415106.
Brunner AM, Yakovlev IA, Strauss SH: Validating internal controls for quantitative plant gene expression studies. BMC Plant Biol. 2004, 4: 14-10.1186/1471-2229-4-14.
Reid KE, Olsson N, Schlosser J, Peng F, Lund ST: An optimized grapevine RNA isolation procedure and statistical determination of reference genes for real-time RT-PCR during berry development. BMC Plant Biol. 2006, 6: 27-37. 10.1186/1471-2229-6-27.
Szabo A, Perou CM, Karaca M, Perreard L, Quackenbush JF, Bernard PS: Statistical modelling for selecting housekeeper genes. Genome Biol. 2004, 5 (8): RESEARCH0059-10.1186/gb-2004-5-8-r59.
Gabrielsson BG, Olofsson LE, Sjogren A, Jernas M, Elander A, Lonn M, Rudemo M, Carlsson LM: Evaluation of reference genes for studies of gene expression in human adipose tissue. Obesity Res. 2005, 13: 649-652. 10.1038/oby.2005.72.
Brukhin V, Hernould M, Gonzalez N, Chevalier C, Mouras A: Flower development schedule in tomato Lycopersicon esculentum cv. sweet cherry. Sex Plant Reprod. 2003, 15: 311-320.
Dheda K, Huggett JF, Bustin SA, Johnson MA, Rook G, Zumla A: Validation of housekeeping genes for normalizing RNA expression in real-time PCR. BioTechniques. 2004, 37: 112-119.
Radonić A, Thulke S, Mackay IM, Landt O, Siegert W, Nitsche A: Guideline to reference gene selection for quantitative real-time PCR. Biochem Biophys Res Commun. 2004, 313: 856-862. 10.1016/j.bbrc.2003.11.177.
Rozen S, Skaletsky HJ: Primer3 on the WWW for general users and for biologist programmers. Bioinformatics Methods and Protocols. Edited by: Krawetz S, Misener S, Totowa NJ. 2006, Humana Press: Methods in Molecular Biology, 365-386.
Tamura K, Dudley J, Nei M, Kumar S: MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol. 2007, 24: 1596-1599. 10.1093/molbev/msm092.
Peirson SN, Butler JN, Foster RG: Experimental validation of novel and conventional approaches to quantitative real-time PCR data analysis. Nucleic Acids Res. 2003, 31: e73-10.1093/nar/gng073.
This work was partially funded by an INVESCAN, S.L. grant (No. OTT2001438) to the CSIC. The first author was supported by a research contract (ID-TF-06/002) from the Consejería de Industria, Comercio y Nuevas Tecnologías (Gobierno de Canarias). The authors thank CajaCanarias for research support. We also acknowledge Mrs. Pauline Agnew, who endeavoured to edit the English translation of the manuscript.
MER performed the entire experimental procedure; MER and JAP analyzed data and wrote the manuscript; MER, AAB, ABP and JAP conceived, designed and supervised the study. All authors read and critically revised the manuscript.
Expósito-Rodríguez, M., Borges, A.A., Borges-Pérez, A. et al. Selection of internal control genes for quantitative real-time RT-PCR studies during tomato development process. BMC Plant Biol 8, 131 (2008) doi:10.1186/1471-2229-8-131
- Control Gene
- Pairwise Variation
- Accurate Normalization
- Intergroup Variation
- Adenine Phosphoribosyl Transferase