Skip to main content

Identification and evaluation of new reference genes in Gossypium hirsutumfor accurate normalization of real-time quantitative RT-PCR data



Normalizing through reference genes, or housekeeping genes, can make more accurate and reliable results from reverse transcription real-time quantitative polymerase chain reaction (qPCR). Recent studies have shown that no single housekeeping gene is universal for all experiments. Thus, suitable reference genes should be the first step of any qPCR analysis. Only a few studies on the identification of housekeeping gene have been carried on plants. Therefore qPCR studies on important crops such as cotton has been hampered by the lack of suitable reference genes.


By the use of two distinct algorithms, implemented by geNorm and NormFinder, we have assessed the gene expression of nine candidate reference genes in cotton: GhACT4, GhEF1α5, GhFBX6, GhPP2A1, GhMZA, GhPTB, GhGAPC2, GhβTUB3 and GhUBQ14. The candidate reference genes were evaluated in 23 experimental samples consisting of six distinct plant organs, eight stages of flower development, four stages of fruit development and in flower verticils. The expression of GhPP2A1 and GhUBQ14 genes were the most stable across all samples and also when distinct plants organs are examined. GhACT4 and GhUBQ14 present more stable expression during flower development, GhACT4 and GhFBX6 in the floral verticils and GhMZA and GhPTB during fruit development. Our analysis provided the most suitable combination of reference genes for each experimental set tested as internal control for reliable qPCR data normalization. In addition, to illustrate the use of cotton reference genes we checked the expression of two cotton MADS-box genes in distinct plant and floral organs and also during flower development.


We have tested the expression stabilities of nine candidate genes in a set of 23 tissue samples from cotton plants divided into five different experimental sets. As a result of this evaluation, we recommend the use of GhUBQ14 and GhPP2A1 housekeeping genes as superior references for normalization of gene expression measures in different cotton plant organs; GhACT4 and GhUBQ14 for flower development, GhACT4 and GhFBX6 for the floral organs and GhMZA and GhPTB for fruit development. We also provide the primer sequences whose performance in qPCR experiments is demonstrated. These genes will enable more accurate and reliable normalization of qPCR results for gene expression studies in this important crop, the major source of natural fiber and also an important source of edible oil. The use of bona fide reference genes allowed a detailed and accurate characterization of the temporal and spatial expression pattern of two MADS-box genes in cotton.


Gene expression analysis is increasingly important in many fields of biological research. Understanding patterns of expressed genes is crucial to provide insights into complex regulatory networks and will lead to the identification of genes relevant to new biological processes [1].

Reverse transcription real-time quantitative polymerase chain reaction (qPCR) is a robust method to study gene expression changes [2]. The main advantages of qPCR when compared to other experimental techniques used to evaluate gene expression levels, such as Northern blot hybridization and reverse transcription-polymerase chain reaction (RT-PCR), are its higher sensitivity, specificity, and broad quantification range of up to seven orders of magnitude [3]. Therefore, qPCR analysis has become the most common method for validating the whole-genome microarray data or a smaller set of genes and molecular diagnostics [4]. Although being extremely powerful technique, qPCR suffers from certain pitfalls, noteworthy the use of unreliable reference genes for the normalization step [5]. Normalization is necessary for the correction of non-specific variations, such as inaccurate quantification of RNA and problems in the quality of RNA that can trigger variable reverse transcription and PCR reactions. A number of strategies have been proposed to normalize qPCR data but normalization remains one of the most important challenges concerning this technique [5].

The expression of reference genes used for normalization in qPCR analysis should remain constant between the cells of different tissues and under different experimental conditions; otherwise, it can lead to erroneous results. Recent reports have demonstrated that some of the most well-known and frequently used reference genes are inappropriate for normalization in qPCR analysis due to expression variability [68]. The importance of reference genes for plant qPCR analysis has been recently emphasized even though the identification of these genes is quite laborious [9, 10]. Microarray datasets can also be a rich source of information for selecting qPCR reference genes [6], but unfortunately, this tool is still not available for most of plant species, including cotton.

The classical housekeeping genes involved in basic cellular processes such as 18 S rRNA, ubiquitin, actin, β-tubulin, and glyceraldehyde-3-phosphate dehydrogenase have been recurrently used as internal controls for gene expression analysis in plant as they are supposed to have a uniform expression all samples and experimental conditions tested. However, several reports demonstrated that the transcript levels of these genes also vary considerably under different experimental conditions and are consequently unsuitable for gene expression studies [6, 11]. Statistical algorithms such as geNorm [1], NormFinder [12] and BestKeeper [13] have been developed for the evaluation of best suited reference gene(s) for normalization of qPCR data in a given set of biological samples. Recognizing the importance of reference genes in normalization of RT-qPCR data, various housekeeping genes have been evaluated for stable expression under specific conditions in various organisms. Many works have been carried on animal and human health [3, 14] field that describe the identification of multiple reference genes for normalisation of qPCR data, but similar reports are scarce in plant research [4, 15, 16]. Czechowski et al. (2005) employed a new strategy for the identification of reference genes in Arabidopsis thaliana. Based on the microarray data of Affymetrix ATH1, several new reference genes were revealed in Arabidopsis [6]. Some of these genes have no previous information about function in Arabidopsis or any other organism. The list of new Arabidopsis reference genes revealed by Czechowski and collaborators was successfully employed to search reference genes in unrelated species such as Vitis vinifera by sequence homology [9]. Recently, our group was also successful in providing new reference genes for qPCR in Coffea arabica and Brachiaria brizantia using the same strategy employed in V. vinifera [17, 18].

Cotton (Gossypium spp.) is the world's most important source of natural fiber and also an important source of edible oil [19]. Because of its unique reproductive developmental aspects and speciation history, G. hirsutum has attracted considerable scientific interest, not only among plant breeders and agricultural scientists, but also among taxonomists, developmental geneticists, and evolutionary biologists [2024]. In spite of this, qPCR analyses in cotton are still hampered by the use of inappropriate references genes.

In this study, we report the validation of housekeeping genes to identify the most suitable internal control gene(s) for normalization of qPCR data obtained in different plant organs and floral verticils and also during flower and fruit development. In addition, to illustrate the usefulness of the new reference genes, we provided a detailed expression analysis of two MADS-box transcription factors in cotton, putative homologues of Arabidopsis AGAMOUS and SEPALLATA3 genes.


Plant Material

Experiments were performed using three-month old Gossypim hirsutum plants variety "BRS Cedro". Plants were grown under controlled temperature (21 ± 4°C) and natural photoperiod in Embrapa CENARGEM in Brasília (DF, Brazil). The organs used from cotton plants were flower buds, fruits, leaves, stems, branches, roots and floral meristem. We also included seven stages of flower development (flower buds with the following diameter sizes: 2, 4, 6, 7, 8, 10 and 12 mm) and four stages of fruit development (fruits with the following diameter sizes:10 to 15, 16 to 20, 21 to 30 and larger than 30 mm)[25]. The stages of flower and fruit and the respective major events of development are summarized in Additional file 1. In addition, floral organs (sepal, petal, stamen, carpel and pedicel) from 6 mm flower buds were dissected and harvested. The material was harvested from, at least, five different cotton plants to obtain one pool. The procedure was repeated with five distinct plants in order to obtain a second pool, the biological replicate. All samples were immediately frozen in liquid nitrogen and stored at -80°C until needed for RNA extraction.

Total RNA isolation and cDNA synthesis

Frozen samples were ground to a fine powder in liquid nitrogen with a pestle and mortar. The total RNA extractions were performed from 100 mg of each macerate plant tissue in liquid nitrogen, using Invisorb Spin Plant RNA Mini kit (Invitek) according to the protocol of the manufacturer. Two other methods of RNA extraction were evaluated (Qiagen Plant RNA easy kit and Trizol), but the yields and DNA purity in our hands were unsatisfactory (data not shown). RNA concentration and purity were determined using a NanoDropTM Spectrophotometer ND-1000 (Thermo Scientific), and the integrity of RNA was also assessed by 1% agarose gel electrophoresis and ethidium bromide staining. The presence of contaminant DNA in the RNA samples was verified by PCR using primers spanning two exon and gel electrophoresis analysis. No fragments of genomic DNA were identified in all samples tested in this work (data not shown). The presence of spurious product of amplification caused by genomic DNA was also continuously checked by the verification of RT-qPCR dissociation profile. Both tests showed that the Invisorb Spin Plant RNA Mini kit efficiently removed contaminant DNA from the RNA samples. cDNAs were synthesized by adding 50 μM of Oligo(dT24V) primer and 10 mM of each deoxyribonucleoside 5'-triphosphate (dNTPs) to 1 μg of total RNA. This mixture was incubated at 65°C for five minutes, and briefly chilled on ice. First Strand Buffer, 20 mM of dithiothreitol (DTT) and 200 units of Superscript III (Invitrogen) were added to the prior mixture and the total volume (20 μL) was incubated at 50°C for 1 h following manufacturer's instructions. Inactivation of the reverse transcriptase was done by incubating the mixture at 70°C for 15 min and the cDNA solution was stored at -20°C.

Real-time quantitative polymerase chain reaction (qPCR)

Eight of the nine putative cotton reference genes evaluated in this work, GhACT4 (actin gene family), GhEF1α5 (elongation factor 1-alpha), GhFBX6 (F-box family protein), GhPP2A1 (catalytic subunit of protein phosphatase 2A), GhMZA (clathrin adaptor complexes medium subunit family protein), GhPTB (polypyrimidine tract-binding protein homolog), GhGAPC2 (glyceraldehyde-3-phosphate dehydrogenase C-2), GhβTUB3 (β-tubulin), were selected according to their similarity to reference genes identified in Arabidopsis (Table 1) [6]. The sequences of possible G. hirsutum homologues were identified through a BLASTN against the database of the Green plant GB TAIR (The A. thaliana Information Resource, Only sequences that showed similarity higher than 1e-75 (E-value) were considered as putative homologous to the Arabidopsis genes and were selected for primer design. We also selected the gene encoding the poly-ubiquitin, GhUBQ14, commonly used in cotton for experiments of Northern blots and RT-qPCRs [26, 27] (Table 1). Primers were designed with Primer 3 software [28] using as criterion amplified products from 80 to 180 bp with a Tm of 60 ± 1°C (primer sequences are shown in Table 1). Both candidate reference and MADS-box genes were amplified from cDNA. Melting curve and gel electrophoresis analysis of the amplification products confirmed that the primers amplified only a single product with expected size (data not shown). Primer sets efficiencies were estimated for each experimental set by Miner software [29], and the values were used in all subsequent analysis (Table 2 and Additional file 2). Miner software pinpoints the starting and ending points of PCR exponential phase from raw fluorescence data, and estimates primer set amplification efficiencies through a nonlinear regression algorithm without the need of a standard curve.

Table 1 Reference genes and their primer sequences that were selected for evaluation of expression stability during flower development in cotton (Gossypium hirsutum) for qPCR analysis, as the sequence of two genes of interest MADS-box.
Table 2 Values of efficiency ± standard deviation (SD) of the primers of the housekeeping genes and average values of quantification cycle (Cq) ± standard deviation (SD) of biological replicates generated by the Miner to the genes of reference of G. hirsutum.

Polymerase chain reactions were carried out in an optical 96-well plate with a Chromo4 Real time PCR Detector (BioRad) sequence detection system, using SYBR®Green to monitor dsDNA synthesis. Reaction mixtures contained 10 μL of diluted cDNA (1:50), 0.2 μM of each primer, 50 μM of each dNTP, 1× PCR Buffer (Invitrogen), 3 mM MgCl2, 2 μL of SYBR®Green I (Molecular Probes) water diluted (1:10000), and 0.25 units of Platinum Taq DNA polymerase (Invitrogen), in a total volume of 20 μL. Reaction mixtures were incubated for five minutes at 94°C, followed by 40 amplification cycles of 15 s at 94°C, 10 s at 60°C and 15 s at 72°C. PCR efficiencies and optimal quantification cycle threshold (Cq values were estimated using the online Real time PCR Miner tool [29]. For all reference and MADS-box genes studied, two independent biological samples of each experimental condition were evaluated in technical triplicates.

Databases and procedures for searching Cotton MADS-box sequences

The primary data source for this work was clustered gene sequences of the Cotton Genome Database (U.S. Department of Agriculture, Agricultural Research Service CottonDB - In order to search for MADS-box sequences, a MADS-box consensus sequence was used. This consensus was generated by the COBBLER program (COnsensus Biasing By Locally Embedding Residues, from all identified MADS-box amino acid sequences "MGRKKIEIKRIENKTNRQVTFSKRRNGLFKKAHELSVLCDAEV ALIVFSPSGrlyeyannni" [30]. Searches were conducted using the tBLASTN algorithm with the BLOSUM62 scoring matrix [31]. All sequences that exhibit a significant alignment (E-value of ≤ 7 × 10-13) with the consensus were retrieved from Unigene in the Cotton Genome Database All retrieved sequences were then re-inspected for occurrence of MADS conserved motif using the InterProScan and PRODOM programs. Multiple alignments with complete sequences or domains were conducted using the CLUSTALW program using default parameters and then manually revised [32]. Phylogenetic trees were constructed using pairwise distance matrices for neighbor-joining method [33] and p-distance on the Mega 4.1 program [34]. Assessment of node confidence was done by means of 1,000 bootstrap replicates.

Analysis of gene expression stability

Expression levels of the nine housekeeping genes in all the sample pools were determined by the number of cycles (Cq) needed for the amplification related fluorescence to reach a specific threshold level of detection. Cq values were converted in qBase software v1.3.5 [35] into non-normalized relative quantities, corrected by PCR efficiency, using the formula Q = EΔCq where E is the efficiency of the gene amplification and ΔCq is the sample with the lowest expression in the data set minus the Cq value of the sample in question. These quantities were imported into geNorm v3.5 [1] and NormFinder [12] analysis tools, which were used as described in their manuals. Data of biological replicates were analyzed separately in both programs.


In order to compare the expression levels of target genes in different tissues at the same time, it is crucial to normalize all the samples by the same set of reference genes. For the evaluation of potential reference, a well known housekeeping gene, poly-ubiquitin (GhUBQ14), was included in the qPCR experiments [26]. We selected eight new candidates to housekeeping genes (GhACT4, GhEF1α5, GhFBX6, GhPP2A1, GhMZA, GhPTB, GhGAPC2, GhβTUB3) in G. hirsutum. These genes are putative homologues of eight Arabidopsis genes included in the list of 27 best reference genes for qPCR analysis (Table 1) [6]. For the selection of the putative cotton housekeeping genes, we searched in the Cotton DB for homologues to the Arabidopsis referenced genes, only eight candidates that showed very high similarities (E-value > 1e-75) were included in the final list. The eight genes found in the cotton databanks belong to different functional classes based on Arabidopsis sequence information, which reduce the chances of co-regulated expression occurrence among these genes (Table 1). The gene name, accession number, A. thaliana homologue locus, A. thaliana annotation, similarity end identity, gene size, and primer sequence, are provided in Table 1. The nine cotton candidate reference genes were evaluated for gene expression stability by qPCR in a set of 23 cotton samples grouped into five different experimental sets. The first experimental set was composed of plant organs: leaves, stem, branch, root, flower buds (RNA pools of stages 2 to 12 mm) and fruits (RNA pools of stages 10 to 15 to fruits larger than 30 mm). The second set included floral meristem and size selected flower buds, based on their diameter of 2, 4, 6, 7, 8, 10 and 12 mm. The third experimental set was composed of the floral verticils: sepal, petal, stamen, carpel and pedicel. The fourth experimental set consists of four stages of fruit development based on it diameter: 10 to 15 (1), 16 to 20 (2), 21 to 30 (3) and larger than 30 mm (4). Finally, in the fifth set, we included all the tissues samples used in this study (23 distinct biological samples).

Total RNA was isolated from different tissue samples and reverse transcribed. The RNA quality for all samples was checked by gel eletrophoresis analisys and spectrophotometric assays (data not shown). Within a biological replicate, for a tissue sample, the same cDNA pool was used for qPCR analysis of each of the nine genes using gene-specific primers. qPCRs were performed in triplicate for each of the 23 cDNA pools along with a no template control in parallel for each gene. The melting-curve analysis performed by the PCR machine after 40 cycles of amplification and agarose gel electrophoresis showed that all the 9 primer pairs amplified a single PCR product of desired size from various cDNA (results not shown). Primer efficiencies for all primer combinations were higher than 0.90 (90%) in all experimental sets. Although, two primers pairs presented efficiencies below 90% in four samples: GhGAPC2 in flower buds and floral and plant organs and GhPP2A1 in floral organs (Table 2). The mean Cq value (average of 6 values from the two biological replicates) in a tissue sample for each gene is shown in Table 2. Cq values were in the range of 15.30 and 29.17. GhACT4, GhUBQ14 and GhEF1α5 are the top three most expressed genes in all sets followed by GhMZA, GhβTUB3, GhPP2A1 and GhPTB. GhFBX6 and GhGAPC2 genes present the lowest expression levels in all samples.

We used geNorm v3.5 software, to analyze the expression stability of the tested genes in all samples, and ranked them accordingly to gene stability measure (M). The M value is obtained by the use of relative expression values for each cDNA sample as input for the geNorm algorithm based on the geometric averaging of multiple control genes and mean pairwise variation of a gene from all other control genes in a given set of samples. Therefore, genes with the lowest M values have the most stable expression. The results obtained with geNorm algorithm are presented in the Figure 1 and summarized in Table 3. The geNorm algorithm also determines the pairwise variation Vn/n + 1, which measures the effect of adding further reference genes on the normalisation factor (that is calculated as the geometric mean of the expression values of the selected reference genes). It is advisable to add additional reference genes to the normalisation factor until the added gene has no significant effect. Vandesompele et al. (2002) used 0.15 as a cut-off value, below which the inclusion of an additional reference gene is not required. Pairwise variation analysis (Figure 2) showed that the ideal number of reference genes may be different for distinct set of samples. For instance, for the normalization of the floral organ set, only two genes are necessary. On the other hand, five genes are required for the normalization of the plant organ set. When evaluating all the pairwise variation, the least stable housekeeping gene was GhGAPC2 followed by GhβTUB3 since they significantly increased the pairwise variation during the whole assay by increasing the V value as shown in Figure 2. However, Vandesompele and collaborators recommend the use of at least three reference genes whenever this result obtained in our analysis is observed [1].

Figure 1

Expression stability values (M) and ranking of the candidate reference genes as calculated by geNORM in al 23 cDNA samples. Average expression stability values (M) of the reference genes were measured during stepwise exclusion of the least stable reference genes. A lower value of average expression stability, M, indicates more stable expression.

Figure 2

Pairwise variation (V) to determine the optimal number of control genes for an accurate normalization. The pairwise variation (Vn/Vn+1) was analyzed between the normalization factors NFn and NFn+1 by the geNorm software. Asterisk indicates the optimal number of genes for normalization.

Table 3 Candidates genes ranked according to their expression stability estimated using geNorm algorithm after stepwise exclusion of the least stable reference gene

In addition, to the analysis by geNorm we also evaluated the data with NormFinder algorithm (Table 4). Differentially to geNorm, NormFinder takes into account intra- and intergroup variations for normalization factor (NF) calculations. When the outcome of geNorm and NormFinder are compared few, but relevant, differences are observed (Table 5). These discrepancies between the results are expected since the geNorm and NormFinder are based on distinct statistical algorithms.

Table 4 Cotton reference genes for normalization and their expression stability values calculated by the NormFinder software
Table 5 Best combination of reference genes based on geNorm and NormFinder programs

To assess the validity of the procedure for the selection of control genes detailed above, the relative expression level of two cotton genes that belong to MADS-box family were inspected. After the search in Cotton db using the MADS-box consensus sequence, 18 ESTs were found with high similarity to MIKC MADS box family (E-value ≤ 7 × 10-13) (Data not shown). The reduced number of cotton MIKC type genes is expected since the ESTs sequencing efforts in cotton are very limited when compared to other species such as Arabidopsis and rice. In spite of the low number of MADS-box genes, the phylogenetic analysis identified good candidates to homologous genes of Arabidopsis AGAMOUS (AG) and SEPALLATA3 (SEP3) (data not shown). The homologue of AG, was previously characterized by RT-PCR and named GhMADS3 [36]. RT-PCR analysis suggests that GhMADS3 expression is restricted to stamens and carpels. Ectopic expression in Nicotiana tabacum L. indicates that it is the cotton orthologous gene to AG [36]. The Arabidopsis thaliana SEP3 is expressed in the three inner whorls of organs throughout flower development, but there is no information of the putative homologue of cotton (GhSEP-like1), identified by our phylogenetic analysis [37]. The expression of GhMADS3 and GhSEP-like1 was estimated in different plant organs, during flower development and in the floral organs of 6 mm flower buds. The qPCR analysis employed the control genes recommended by NormFinder program for the normalization of gene expression. The analysis revealed that G. hirsutum GhMADS3 and GhSEP-like1 genes very similar expression profiles of AG and SEP3 genes from Arabidopsis (Figure 3). However, we also observed unexpected expression patterns: GhSEP-like1 is expressed in cotton fruits and the GhMADS3 in pedicels of 6 mm flower buds.

Figure 3

Relative mRNA levels of GhMADS3 and GhSEP-like1 mRNA in the different plant organs (a), during the flower development (b) and in the floral organs (c). Cq and amplification efficiency values were processed with the qBase software. Normalization was performed using the best combination of reference genes recommended by NormFinder program to each experimental set. The combination of GhUBQ14 and GhPP2A1 were used as internal control for plant organs (a), GhACT4 and GhUBQ14 for flower buds (b) and GhACT4 and GhFBX6 for floral organs (c).


The qPCR is broadly accepted as the method of choice for accurate and sensitive quantification of gene transcript levels, even for those genes whose transcript levels are low. For valid qPCR analysis, accurate normalization of gene expression against an appropriate internal control is required. The ideal control gene should have similar expression regardless of experimental conditions, including different cell types, developmental stages, and/or sample treatment. However, no one gene has a stable expression under every experimental condition, as numerous studies reported that expression of housekeeping genes can also vary considerably with experimental conditions. Consequently, normalization of gene expression with a single reference gene can trigger erroneous data and, consequently, misinterpretation of experiment results. Therefore, it is necessary to validate the expression stability of a control gene under specific experimental conditions prior to its use in qPCR normalization.

Normalisation with multiple reference genes is becoming the golden standard, but reports that identify such genes in plant research are limited [3, 4, 17, 18, 38, 39], even though algorithms are available to test the expression stability of candidates [1, 12, 13] and a number of candidate reference genes for Arabidopsis have been proposed [6]. To obtain a solid basis for normalization of our gene expression data when studying the flower development in cotton, we evaluated the expression stability of nine candidate reference genes, including one traditional "housekeeping" gene in five different experimental sets. Candidate genes were selected according to the level of DNA sequence similarity to genes previously identified as reference genes in Arabidopsis and cotton. This strategy has been successful in finding good reference genes in other species such as grape [39] and it was already employed by our group in coffee and B. brizantha [17, 18]. Another strategy used to identify bona fide qPCR reference genes is to check housekeeping genes previously used in Northern and RT-PCR studies [40, 41]. However, it has be shown that the expression of traditional reference genes may vary enormously depending on the test condition [6]. In cotton, Tu and collaborators tested six putative constitutive genes (Histone3, UBQ7, Actin, Cyclophilin, Gbpolyubiquitin-1 and Gbpolyubiquitin-2), two of them (Gbpolyubiquitin-1 and Gbpolyubiquitin-2) from previously published data [42]. In contrast to the present work, roots, floral stages and verticils samples were not included in the final set of samples [41]. The reference genes evaluation was performed using exclusively geNorm and the value obtained for the pairwise variation with the best control genes was above the cut-off value of 0.15 suggested by Vandesompele et al. [1]. Moreover, the expression in the fiber developmental series of the all six putative reference genes varied greatly, hampering their use for qPCR [41].

We elected the NormFinder as the preferential method for the selection of the best references genes since it considers intra- and inter-group variations for the normalization factor (NF). However, geNorm was also important to compose the final set of references genes for the experimental conditions tested in this work. Our analysis has shown that each experimental condition tested demands a specific set of reference genes (Table 3 and 4). This result emphasizes the importance of reference genes validation for each experimental condition, especially when samples belong to very different groups, e.g. different organs.

When plant organs and all samples were tested, GhUBQ14 and GhPP2a1 were considered the most appropriate reference genes. GhUBQ14 and GhPP2a1 should avoid error transferences since NormFinder chose them as the best combination of genes. NormFinder chose GhACT4 and GhUBQ14 as the best combination of two genes in flower buds. Both programs ranked GhACT4 as the most stable gene, conferring higher robustness to the NF. Our analyses of different floral organs revealed that GhACT4 and GhFBX6 are the most appropriated genes for qPCR normalization, since they represent the best combination of genes considered by NormFinder to improve NF. GhFBX6 was ranked by both algorithms as the most stable gene in the floral organs set. Finally, fruit development GhMZA was considered as the most stable gene in both the NormFinder and geNorm programs, and NormFinder chose GhMZA and GhPTB as the best combination of genes.

The GhACT4, GhEF1α5, GhFBX6, GhPP2A1, GhMZA, GhPTB, GhGAPC2, GhβTUB genes were identified as novel reference genes in A. thaliana through microarray experiments and were validated by qPCR [7]. Among them, GhGAPC2 gave poor results in our analysis in cotton. GhUBQ14, a traditional reference gene in cotton [26] was well evaluated by NormFinder ranking in the best combination in three of the five experimental sets. Although, evaluations of a traditional reference genes by the same procedures used in this work not always give support to their frequent use. For instance, UBQ10 gene shows highly stable expression in Arabidopsis [6] whereas its putative homologue has been shown unsuitable for normalization of different tissues at different developmental stages in rice and soybean [4, 43].

Other commonly used housekeeping gene, GhβTUB, displayed inappropriate expression variability limiting its use as internal control in cotton. A similar result was also observed for the β-tubulin of B. brizantha when male and female reproductive tissues, spikelets, roots and leaves were evaluated [17]. On the other hand, β-TUB is one of most stably expressed genes in poplar (Populus ssp) tissue samples among the 10 reference genes tested [10]. GAPDH, another traditional reference gene, was considered the most appropriate reference gene when coffee leaves drought-stressed vs. control plants and different coffee cultivar leaves were analyzed [18]. Taken together, these results suggest that the housekeeping genes are regulated differently in different plant species and may exhibit differential expression patterns. This may partly be explained by the fact that housekeeping genes are not only implicated in the basal cell metabolism but also may participate in other cellular functions [11].

The programs employed to evaluate reference genes in our study (geNorm and NormFinder) use the same input data, i.e. non-normalized relative quantities, and Cqs need to be transformed considering primer pair efficiencies. In our experience, it is crucial to evaluate primer pair efficiencies for each sample tested since primer efficiency varies depend on the according to biological sample. The importance of this step can be well illustrated by the primer efficiency variation of GhGACP2 in flower buds compared to fruits (Table 2).

The values of Cq presented here should not be considered alone, but they may help in the selection of best combination of reference genes when there is previous data about target gene expression levels. Similar expression levels of the reference and target genes are considered an important issue regarding qPCR normalization [1]. Indeed, references genes with excessively high/low expression levels compared to target genes can trigger problems for data analysis [44, 45].

As suggested by Remans and collaborators [7], biological replicates were submitted to geNorm and NormFinder as independent samples. This procedure increased the credibility of the most suitable cotton reference genes because it takes into account possible variations in reference gene expression that are not due to different treatments, but intrinsic to the gene itself.

To illustrate the suitability of the reference genes revealed in the present study, two putative cotton homologues to AG and SEP3 (GhMADS3 and GhSEP-like1) had their expression profile evaluated in different plant organs, during flower development and in floral organs at flower buds of 6 mm (Figure 3). As it is observed to AG and SEP3, the GhMADS3 and GhSEP-like1 genes are highly expressed in flower buds, but GhSEP-like1 also shows a high expression in fruits. GhMADS3 also is expressed in higher levels after stage of 2 mm and throughout cotton flower development. The low expression of GhMADS3 in floral meristem is expected as well a high expression level in stamen and carpels of 6 mm flower bud. The AG gene is expressed in few cells during the initial flower development to establish organ identity and is also important at later stages of stamens and carpels development [46, 47]. The GhMADS3 expression observed in pedicels may be the result of contamination of material derived from carpels. These two organs are merged, which hamper a perfect separation during flower bud dissection to collect the samples. Our analysis of GhMADS3 expression by RT-qPCR refined the information of the previous study adding accuracy, spatial and temporal information to the expression during floral development [36]. In addition, it revealed that this GhMADS3 is also expressed in cotton fruits (Figure 3).

The high expression in fruits of GhSEP-like1 contrast to the spatial and developmental expression pattern of SEP3 in Arabidopsis, former AGL9 (Figure 3) [37]. SEP3 is expressed in all Arabidopsis flower verticils throughout development but no transcripts are found in siliques. However, PPERSEP3, a putative Prunus persica homologue of SEP3, is expressed during fruit development [48]. In addition, GhMADS4 and GhMADS7, genes from AGAMOUS subclass in cotton, are also expressed during fiber development [23]. Taking together, these results suggest that besides flower identity SEP3 and AG-like genes in cotton may be involved in an additional developmental process during fruit development.


This work constitutes the first in-depth study to validate the optimal control genes for the quantification of transcript levels in different cotton organs and during flower and fruit development. The use of the new cotton reference genes combined with size collected flower buds and floral organ dissection allowed a precise spatial and temporal characterization of two MADS-box genes in cotton plants. In summary, the new cotton reference genes will enable more accurate and reliable normalization of qPCR results for gene expression studies in this important crop plant.


  1. 1.

    Vandesompele JDPK, Pattyn F, Poppe B, Van Roy N, De Paepe A, Speleman F: Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome Biol. 2002, 3: 7. 10.1186/gb-2002-3-7-research0034.

    Article  Google Scholar 

  2. 2.

    Gachon CMA, Charrier B: Real-time PCR: what relevance to plant studies?. J Exp Bot. 2004, 55: 1445-1454. 10.1093/jxb/erh181.

    PubMed  CAS  Article  Google Scholar 

  3. 3.

    Hong SYSS, Yang MS, Xiang F, Park CM: Exploring valid reference genes for gene expression studies in Brachypodium distachyon by real-time PCR. Bmc Plant Biology. 2008, 8: 112. 10.1186/1471-2229-8-112.

    PubMed  PubMed Central  Article  Google Scholar 

  4. 4.

    Jain MNA, Tyagi AK, Khurana JP: Validation of housekeeping genes as internal control for studying gene expression in rice by quantitative real-time PCR. Biochem Biophys Res Commun. 2006, 345: 646-651. 10.1016/j.bbrc.2006.04.140.

    PubMed  CAS  Article  Google Scholar 

  5. 5.

    Huggett JDK, Bustin S, Zumla A: Real-time RT-PCR normalisation; strategies and considerations. Genes Immun. 2005, 6: 279-284. 10.1038/sj.gene.6364190.

    PubMed  CAS  Article  Google Scholar 

  6. 6.

    Czechowski TSM, Altmann T, Udvardi MK, Scheible WR: Genome-wide identification and testing of superior reference genes for transcript normalization in Arabidopsis. Plant Physiol. 2005, 139: 5-17. 10.1104/pp.105.063743.

    PubMed  CAS  PubMed Central  Article  Google Scholar 

  7. 7.

    Remans TSK, Opdenakker K, Mathijsen D, Vangronsveld J, Cuypers A: Normalisation of realtime RT-PCR gene expression measurements in Arabidopsis thaliana exposed to increased metal concentrations. Planta. 2008, 227: 1343-1349. 10.1007/s00425-008-0706-4.

    PubMed  CAS  Article  Google Scholar 

  8. 8.

    Schmittgen TD, Zakrajsek BA: Effect of experimental treatment on housekeeping gene expression: validation by real-time, quantitative RT-PCR. J Biochem Biophys Methods. 2000, 46 (1-2): 69-81. 10.1016/S0165-022X(00)00129-9.

    PubMed  CAS  Article  Google Scholar 

  9. 9.

    Reid KE, Olsson N, Schlosser J, Peng F, Lund ST: An optimized grapevine RNA isolation procedure and statistical determination of reference genes for real-time RT-PCR during berry development. BMC Plant Biol. 2006, 6: 27-37. 10.1186/1471-2229-6-27.

    PubMed  PubMed Central  Article  Google Scholar 

  10. 10.

    Brunner AYI, Strauss S: Validating internal controls for quantitative plant gene expression studies. Bmc Plant Biol. 2004, 4: 14. 10.1186/1471-2229-4-14.

    PubMed  PubMed Central  Article  Google Scholar 

  11. 11.

    Thellin O, ElMoualij B, Heinen E, Zorzi W: A decade of improvements in quantification of gene expression and internal standard selection. Biotechnology Advances. 2009, 27 (4): 323-333. 10.1016/j.biotechadv.2009.01.010.

    PubMed  CAS  Article  Google Scholar 

  12. 12.

    Andersen CLJJ, Orntoft TF: Normalization of real-time quantitative reverse transcription-PCR data: A model-based variance estimation approach to identify genes suited for normalization, applied to bladder and colon cancer data sets. Cancer Res. 2004, 64: 5245-5250. 10.1158/0008-5472.CAN-04-0496.

    PubMed  CAS  Article  Google Scholar 

  13. 13.

    Pfaffl MWTA, Prgomet C, Neuvians TP: Determination of stable housekeeping genes, differentially regulated target genes and sample integrity: BestKeeper - Excel-based tool using pair-wise correlations. Biotechnol Lett. 2004, 26: 509-515. 10.1023/B:BILE.0000019559.84305.47.

    PubMed  CAS  Article  Google Scholar 

  14. 14.

    De Boever S, Vangestel C, De Backer P, Croubels S, Sys SU: Identification and validation of housekeeping genes as internal control for gene expression in an intravenous LPS inflammation model in chickens. Veterinary Immunology and Immunopathology. 2008, 122 (3-4): 312-317. 10.1016/j.vetimm.2007.12.002.

    PubMed  CAS  Article  Google Scholar 

  15. 15.

    Ransbotyn V, Reusch TBH: Housekeeping gene selection for quantitative real-time PCR assays in the seagrass Zostera marina subjected to heat stress. Limnology and Oceanography-Methods. 2006, 4: 367-373.

    CAS  Article  Google Scholar 

  16. 16.

    Exposito-Rodriguez M, Borges A, Borges-Perez A, Perez J: Selection of internal control genes for quantitative real-time RT-PCR studies during tomato development process. BMC Plant Biology. 2008, 8 (1): 131. 10.1186/1471-2229-8-131.

    PubMed  PubMed Central  Article  Google Scholar 

  17. 17.

    Silveira E, Alves-Ferreira M, Guimaraes L, da Silva F, Carneiro V: Selection of reference genes for quantitative real-time PCR expression studies in the apomictic and sexual grass Brachiaria brizantha. BMC Plant Biology. 2009, 9 (1): 84. 10.1186/1471-2229-9-84.

    PubMed  PubMed Central  Article  Google Scholar 

  18. 18.

    Cruz F, Kalaoun S, Nobile P, Colombo C, Almeida J, Barros LMG, Romano E, Grossi-de-Sa MF, Vaslin M, Alves-Ferreira M: Evaluation of coffee reference genes for relative expression studies by quantitative real-time RT-PCR. Molecular Breeding. 2009, 23 (4): 607-616. 10.1007/s11032-009-9259-x.

    CAS  Article  Google Scholar 

  19. 19.

    Zhang J, Guo W, Zhang T: Molecular linkage map of allotetraploid cotton (Gossypium hirsutum L. × Gossypium barbadense L.) with a haploid population. Theoretical and Applied Genetics. 2002, 105 (8): 1166-1174. 10.1007/s00122-002-1100-4.

    PubMed  CAS  Article  Google Scholar 

  20. 20.

    Wendel JFCR: Polyploidy and the evolutionary history of cotton. Advances in Agronomy. 2003, 78: 139-186. full_text.

    Article  Google Scholar 

  21. 21.

    Adams KL, Cronn R, Percifield R, Wendel JF: Genes duplicated by polyploidy show unequal contributions to the transcriptome and organ-specific reciprocal silencing. PNAS. 2003, 100 (8): 4649-4654. 10.1073/pnas.0630618100.

    PubMed  CAS  PubMed Central  Article  Google Scholar 

  22. 22.

    Wang S, Wang J-W, Yu N, Li C-H, Luo B, Gou J-Y, Wang L-J, Chen X-Y: Control of Plant Trichome Development by a Cotton Fiber MYB Gene. Plant Cell. 2004, 16 (9): 2323-2334. 10.1105/tpc.104.024844.

    PubMed  CAS  PubMed Central  Article  Google Scholar 

  23. 23.

    Lightfoot DJ, Malone KM, Timmis JN, Orford SJ: Evidence for alternative splicing of MADS-box transcripts in developing cotton fibre cells. Molecular Genetics and Genomics. 2008, 279 (1): 75-85. 10.1007/s00438-007-0297-y.

    PubMed  CAS  Article  Google Scholar 

  24. 24.

    Liu ZL, Adams KL: Expression partitioning between genes duplicated by polyploidy under abiotic stress and during organ development. Current Biology. 2007, 17: 1669-1674. 10.1016/j.cub.2007.08.030.

    PubMed  CAS  Article  Google Scholar 

  25. 25.

    Greenberg SMST, Setamou M, Coleman JR: Influence of Different Cotton Fruit Sizes on Boll Weevil (Coleoptera Curculionidae) Oviposition and Survival to Adulthood. Environ Entomol. 2004, 33: 443-449. 10.1603/0046-225X-33.2.443.

    Article  Google Scholar 

  26. 26.

    Li XBFX, Wang XL, Cai L, Yang WC: The cotton ACTIN1 gene is functionally expressed in fibers and participates in fiber elongation. Plant Cell. 2005, 17: 859-875. 10.1105/tpc.104.029629.

    PubMed  CAS  PubMed Central  Article  Google Scholar 

  27. 27.

    Li ZK, Wang XF, Ma J, Zhang GY, Ma ZY: Cloning and characterization of a tau glutathione S-transferase subunit encoding gene in Gossypium hirsutum. Genes Genet Syst. 2008, 83 (3): 219-225. 10.1266/ggs.83.219.

    PubMed  CAS  Article  Google Scholar 

  28. 28.

    Rozen SSH: Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol. 2000, 132: 365-386.

    PubMed  CAS  Google Scholar 

  29. 29.

    Zhao SFR: Comprehensive algorithm for quantitative real-time polymerase chain reaction. J Comput Biol. 2005, 12: 1047-1064. 10.1089/cmb.2005.12.1047.

    PubMed  CAS  PubMed Central  Article  Google Scholar 

  30. 30.

    Dias BFO, Simões-Araújo JL, Russo CAM, Margis R, Alves-Ferreira M: Unravelling MADS-BOX gene family in Eucalyptus ssp: A starting point to understand yheir importance on developmental mechanisms of vegetative organs. Genetics and Molecular Biology. 2005, 38 (3): 501-510.

    Article  Google Scholar 

  31. 31.

    Altschul SF, Madden TL, Schaffer AA, Zhang JH, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research. 1997, 25 (17): 3389-3402. 10.1093/nar/25.17.3389.

    PubMed  CAS  PubMed Central  Article  Google Scholar 

  32. 32.

    Thompson JD, Higgins DG, Gibson TJ: Clustal-W-Improving the Sensitivity of Progressive Multiple Sequence Alignment through Sequence Weighting, Position-Specific Gap Penalties and Weight Matrix Choice. Nucleic Acids Research. 1994, 22 (22): 4673-4680. 10.1093/nar/22.22.4673.

    PubMed  CAS  PubMed Central  Article  Google Scholar 

  33. 33.

    Saitou N, Nei M: The Neighbor-Joining Method - a New Method for Reconstructing Phylogenetic Trees. Molecular Biology and Evolution. 1987, 4 (4): 406-425.

    PubMed  CAS  Google Scholar 

  34. 34.

    Tamura K, Dudley J, Nei M, Kumar S: MEGA4: Molecular evolutionary genetics analysis (MEGA) software version 4.0. Molecular Biology and Evolution. 2007, 24 (8): 1596-1599. 10.1093/molbev/msm092.

    PubMed  CAS  Article  Google Scholar 

  35. 35.

    Hellemans J, Mortier G, De Paepe A, Speleman F, Vandesompele J: qBase relative quantification framework and software for management and automated analysis of real-time quantitative PCR data. Genome Biology. 2007, 8 (2). 10.1186/gb-2007-8-2-r19.

  36. 36.

    Guo Y, Zhu Q, Zheng S, Li M: Cloning of a MADS Box Gene (GhMADS3) from Cotton and Analysis of Its Homeotic Role in Transgenic Tobacco. Journal of Genetics and Genomics. 2007, 34 (6): 527-535. 10.1016/S1673-8527(07)60058-7.

    PubMed  CAS  Article  Google Scholar 

  37. 37.

    Mandel MA, Yanofsky MF: The Arabidopsis AGL9 MADS box gene is expressed in young flower primordia. Sex Plant Reprod. 1998, 11 (1): 22-28. 10.1007/s004970050116.

    CAS  Article  Google Scholar 

  38. 38.

    Martin RCHV, Dombrowski JE: Evaluation of Reference Genes for Quantitative RT-PCR in Lolium perenne. CROP SCIENCE. 2008, 48: 1881-1887. 10.2135/cropsci2007.10.0597.

    CAS  Article  Google Scholar 

  39. 39.

    Reid KEON, Schlosser J, Peng F, Lund ST: An optimized grapevine RNA isolation procedure and statistical determination of reference genes for real-time RT-PCR during berry development. Bmc Plant Biol. 2006, 6: 27. 10.1186/1471-2229-6-27.

    PubMed  PubMed Central  Article  Google Scholar 

  40. 40.

    Cavallari CFBSF, Maluf MP, Maia IG: Identification of suitable control genes for expression studies in Coffea Arabica under different experimental conditions. Bmc Plant Biol. 2009, 10: 1.

    Google Scholar 

  41. 41.

    Tu LL, Zhang XL, Liu DQ, Jin SX, Cao JL, Zhu LF, Deng FL, Tan JF, Zhang CB: Suitable internal control genes for qRT-PCR normalization in cotton fiber development and somatic embryogenesis. Chinese Science Bulletin. 2007, 52: 3110-3117. 10.1007/s11434-007-0461-0.

    CAS  Article  Google Scholar 

  42. 42.

    Shi YH, Zhu SW, Mao XZ, Feng JX, Qin YM, Zhang L, Cheng J, Wei LP, Wang ZY, Zhu YX: Transcriptome profiling, molecular biological, and physiological studies reveal a major role for ethylene in cotton fiber cell elongation. Plant Cell. 2006, 18 (3): 651-664. 10.1105/tpc.105.040303.

    PubMed  CAS  PubMed Central  Article  Google Scholar 

  43. 43.

    Jian BLB, Yurong B, Wesheng H, Cunxiang W, Han T: Validation of internal control for genes expression study in soybean by quantification real-time PCR. Bmc Plant Biol. 2008, 9: 59.

    Google Scholar 

  44. 44.

    Frost PNF: Validation of reference genes for transcription profiling in the salmon louse, Lepeophtheirus salmonis, by quantitative real-time PCR. Vet Parasitol. 2003, 118: 169-174. 10.1016/j.vetpar.2003.09.020.

    PubMed  CAS  Article  Google Scholar 

  45. 45.

    Robinson TLSI, Sutherland J: Validation of candidate bovine reference genes for use with real-time PCR. Vet Immunol and Immunopathol. 2007, 115: 160-165. 10.1016/j.vetimm.2006.09.012.

    CAS  Article  Google Scholar 

  46. 46.

    Yanofsky MFMH, Bowman JL, Drews GN, Feldmann KA, Meyerowitz EM: The protein encoded by the Arabidopsis homeotic gene Agamous resembles transcription factors. Nature. 1990, 346: 35-39. 10.1038/346035a0.

    PubMed  CAS  Article  Google Scholar 

  47. 47.

    Ito TWF, Yu H, Das P, Ito N, Alves-Ferreira M, Riechmann JL, Meyerowitz EM: The homeotic protein AGAMOUS controls microsporogenesis by regulation of SPOROCYTELES. Nature. 2004, 430: 356-360. 10.1038/nature02733.

    PubMed  CAS  Article  Google Scholar 

  48. 48.

    Tani E, Polidoros AN, Flemetakis E, Stedel C, Kalloniati C, Demetriou K, Katinakis P, Tsaftaris AS: Characterization and expression analysis of AGAMOUS-like, SEEDSTICK-like, and SEPALLATA-like MADS-box genes in peach (Prunus persica) fruit. Plant Physiol Biochem. 2009, 47 (8): 690-700. 10.1016/j.plaphy.2009.03.013.

    PubMed  CAS  Article  Google Scholar 

Download references


We thank Bruna P Matta, Camila M. Patreze and Fernanda Cruz for early discussions and comments. This work is part of SA and SMN' MSc theses from Pós-Graduação em Biotecnologia Vegetal and Pós-Graduação em Genética, respectively (Federal University of Rio de Janeiro, Brazil) with fellowship of CNPq. This work was supported by National Council for Science and Technology - CNPq - Brazil (310254/2007-8; M A-F), Associação Mato-grossense de Algodão (IMAmt - FACUAL) and Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro - FAPERJ (E-26/102.861/2008; M.A-F).

Author information



Corresponding author

Correspondence to Marcio Alves-Ferreira.

Additional information

Authors' contributions

SA and SMN were responsible for the experiments, RNA sample preparation, RT-qPCR data analyses and drafting the manuscript. OS and MF G-S contributed with sample preparation and study design. MA-F participated as supervisor in the study design, analyses and writing. All authors read and approved the final manuscript.

Sinara Artico, Sarah M Nardeli contributed equally to this work.

Electronic supplementary material

Authors’ original submitted files for images

Below are the links to the authors’ original submitted files for images.

Authors’ original file for figure 1

Authors’ original file for figure 2

Authors’ original file for figure 3

Rights and permissions

Reprints and Permissions

About this article

Cite this article

Artico, S., Nardeli, S.M., Brilhante, O. et al. Identification and evaluation of new reference genes in Gossypium hirsutumfor accurate normalization of real-time quantitative RT-PCR data. BMC Plant Biol 10, 49 (2010).

Download citation


  • Reference Gene
  • Fruit Development
  • Flower Development
  • Floral Organ
  • Floral Meristem