Polyploidy and the petal transcriptome of Gossypium
© Rambani et al.; licensee BioMed Central Ltd. 2014
Received: 31 July 2013
Accepted: 8 October 2013
Published: 6 January 2014
Skip to main content
© Rambani et al.; licensee BioMed Central Ltd. 2014
Received: 31 July 2013
Accepted: 8 October 2013
Published: 6 January 2014
Genes duplicated by polyploidy (homoeologs) may be differentially expressed in plant tissues. Recent research using DNA microarrays and RNAseq data have described a cacophony of complex expression patterns during development of cotton fibers, petals, and leaves. Because of its highly canalized development, petal tissue has been used as a model tissue for gene expression in cotton. Recent advances in cotton genome annotation and assembly now permit an enhanced analysis of duplicate gene deployment in petals from allopolyploid cotton.
Homoeologous gene expression levels were quantified in diploid and tetraploid flower petals of Gossypium using the Gossypium raimondii genome sequence as a reference. In the polyploid, most homoeologous genes were expressed at equal levels, though a subset had an expression bias of AT and DT copies. The direction of gene expression bias was conserved in natural and recent polyploids of cotton. Conservation of direction of bias and additional comparisons between the diploids and tetraploids suggested different regulation mechanisms of gene expression. We described three phases in the evolution of cotton genomes that contribute to gene expression in the polyploid nucleus.
Compared to previous studies, a surprising level of expression homeostasis was observed in the expression patterns of polyploid genomes. Conserved expression bias in polyploid petals may have resulted from cis-acting modifications that occurred prior to polyploidization. Some duplicated genes were intriguing exceptions to general trends. Mechanisms of gene regulation for these and other genes in the cotton genome warrants further investigation.
The genus Gossypium currently consists of approximately 45 diploid and six polyploid species [1, 2]. The six polyploid species of this genus formed between 1–2 million years ago [3, 4]. While these polyploid species are currently geographically separated, their monophyletic origin makes this genus an ideal system to study the effects of polyploidization on gene expression. Two polyploid species, G. hirsutum and G. barbadense, produce spinnable fiber used by the textile industry and represent the majority of world-wide cotton production. An investigation of the effects of polyploidization on gene expression could further our understanding of the basis of superior cotton fiber qualities and fiber yields in tetraploid cotton cultivars.
Polyploidization causes a simultaneous duplication of all nuclear DNA, and some of the genomic consequences of polyploidization can be dramatic [5–7]. One consequence of polyploidization is unequal expression of homoeologous loci. This phenomenon was first described in cotton, for single duplicate gene pairs using Single-Strand Conformation Polymorphism (SSCP) [1, 2, 8] and genome-wide using custom DNA microarrays [3, 4, 9]. Subsequent investigations found that expression biases between duplicate genes could be due to growth stage [10–13] or stress  and that the inter-genomic biases were reminiscent of monoallelic expression biases (i.e. inter-allelic) in diploid Homo sapiens[15–17]. Regardless of the growth stage, tissue, or stress, the degree of bias between duplicated gene pairs was distributed across a spectrum of different expression ratios including the 50:50 ratio of most homoeologous gene pairs [12, 18, 19]. Of the genes with biased expression in petal tissue, approximately 76% of homoeolog expression biases were immediately apparent after genomic merger, while the remaining 24% of homoeolog expression biases had been molded by evolutionary forces over time . The expression level changes that accompanied polyploidization were considered two distinct phases of duplication gene evolution and have been reported in other natural and synthetic allopolyploid species [6, 21–26].
Another consequence of polyploidization is expression level dominance. Expression level dominance has been characterized by the abundance of transcript rather than the transcript origin . It was defined by comparing expression levels in Gossypium tetraploids to those in related diploids for a given gene. When the tetraploid gene expression level was statistically indistinguishable from one of the diploids, it was assumed that the diploid with the expression level matching that found in the polyploid was dominant. When many genes throughout the genome exhibited expression dominance, the generalized trend was considered expression level dominance of the genome if one of the two genomes was more frequently dominant than the other genome in the tetraploid nucleus. An expression dominance of one of the two genomes was found in leaf [19, 27] and petal [18, 20] tissue of interspecific hybrids and natural Gossypium polyploids. Expression level dominance has also been observed in other polyploid species such as Coffea, Spartina and wheat . Molecular factors contributing to expression level dominance are still unclear, but the cis- and trans- interactions of the regulatory machinery in the two distinct genomes are one explanation . External factors could also play a role since temperature was shown to influence the magnitude and direction of expression level dominance in Coffea species . Differential epigenetic regulation is another possible explanation.
Previously, the transcript contributions of the two co-resident cotton genomes were quantified by custom microarrays [9, 10, 18] or with RNA-seq and EST assemblies . However, a more accurate assessment of transcriptome composition is possible through RNA-seq technology because gene expression measurement by RNA-seq is not influenced by probe specificity, ascertainment bias of a template sequence, and cross-hybridization [30, 31]. Here, we used RNA-seq and the annotated genes of G. raimondii to measure gene expression in several polyploid accessions of cotton within its phylogenetic framework.
List of plant materials used in this study
G. arboreum X G. raimondii
A2 x D5
Petal tissue was collected from plants growing under controlled greenhouse conditions at the Pohl Conservatory, Iowa State University, USA. Tissue was harvested at the time of full petal expansion after dawn but before pollination. Taking one flower from three different plants made three biological replicates for experiments. Harvested tissue was flash frozen in liquid nitrogen and stored at −80°C until RNA and DNA extraction.
RNA samples were extracted from the three replicates using a modified hot borate method . RNA samples were quantified using Ribogreen (Invitrogen Inc., Grand Island, NY) and their quality was evaluated on an Agilent Bioanalyzer (Agilent Technologies, Santa Clara, CA). As described by Illumina, cDNA was sheared by sonication to a 200–400 bp fragment size (Covaris Inc., Woburn, MA). RNA-seq libraries were prepared according to the Illumina TruSeq RNA library prep kit protocol and sequenced on an Illumina HiSeq using v.2 chemistry at the Huntsman Cancer Center, SLC, UT. The sequencing reads are available at the NCBI Sequence Read Archive under Study: SRP028270 and Experiment: SRX328344.
Number of reads (Millions) that were categorized from reference mapping RNA-seq reads
G. hirsutum Maxxa
G. hirsutum Tx2094
Universal Probability of expression Codes (UPC) uses a mixed-model approach to quantify the probability of gene expression in a sample . UPC determined which genes were actively expressed in the petal tissue for each accession. Active genes in all the accessions were called ‘commonly expressed genes’ and they were used to generate GO annotations for the petal transcriptome through Blast2GO . BLASTX was performed on the Fulton Super Computer at BYU. Blast2GO visual tools were employed to build pie charts depicting gene ontology. Utilizing GO annotations and Enzyme Codes (EC) the KEGG IDs were assigned to each gene and the transcript abundance was calculated for KEGG pathways .
EdgeR was used to normalize expression data and perform differential expression analysis . Two factors were used as explanatory variables in model design matrix: ‘accession’ with four levels (diploid F1-hybrid, G. tomentosum, G. hirsutum Tx2094, G. hirsutum Maxxa) and ‘genome’ with two levels (A-genome or D-genome). A single, nested interaction design was used to determine genes significantly differentially expressed between accessions. A separate, simple single factor experiment with 8 levels was used to detect genes differentially expressed between two genomes for each accession. EdgeR performs exact test for the negative binomial distribution coefficients to provide p-values and false discovery rates (q-values) for all the genes. Genes with <0.05 FDR were considered differentially expressed.
A phylogeny based on expression levels of the genes from all the accessions was built using the neighbor-joining algorithm, with sum of squared differences for all genes between accessions within a distance matrix. We built another phylogeny using the neighbor-joining algorithm based on the difference between the two homoeologous gene expression levels.
To analyze expression level dominance, every gene of each polyploid accession was separately analyzed and characterized according to the relationships between the RPKM values of the different genomes. Genes without expression in petals as determined by UPC were excluded from analysis. Each gene was categorized after comparison of A2 and D5 expression to the total expression of the polyploid. A matrix was constructed with the number of genes from the two comparisons and these numbers were used to calculate the total number of genes in each category of expression level dominance.
A total of 50 bp, single-ended RNA-seq reads were generated from three replicates of each accession (Table 1). Maxxa and G. tomentosum had the most RNA-seq reads (> 40 M each) and diploid D5 had the fewest RNA-seq reads (~37 M). Each of these reads was mapped (or aligned) to the 13 large pseudo-molecules of the D-genome reference sequence (v. 2.2.1, ) containing an initial set of gene annotations. Not all the reads mapped to the reference genome sequence. Perhaps this is because either the initial draft of the D-genome reference did not have all of the genes annotated, or because transcripts mapped to genomic regions outside of annotated genes, or because some of the genes were not assembled to the 13 large pseudo-molecules. Of the annotated genes in the reference pseudo-molecules, 80% had at least one mapped read from the petal tissue.
Approximately 25% of the mapped reads were assigned to each genome (AT and DT). If the mapped reads overlapped a homoeo-SNP position (SNPs between the A- and D-genomes), they were categorized as belonging to one of the two genomes or as a chimeric read because it had A- and D-genome nucleotides at different loci in close proximity (A-reads, D-reads, and X-reads, respectively; ). If a read did not overlap a homoeo-SNP position, the read was unable to be categorized as originating from either the AT- or DT-genome (N reads) (Table 2). The number of uncategorized reads was not unexpected given the limited divergence between the AT- and DT-genome in coding sequences .
The number of genes expressed in each Gossypium accession, the total number shared by every accession, and the number of genes found to have unequal transcript contribution of both genomes (A T and D T ) to the transcript pool (genome bias)
G. hirsutum Maxxa
G. hirsutum Tx2094
Using Blast2GO, we assigned GO IDs to the common genes based on their RefSeq Blast hits and categorized them into three separate gene ontologies according to their putative function  (Additional file 1: Figure S1). The cellular component (CC) ontology had the highest number of assigned GOIDs (88%) followed by the biological process (BP) ontology (17%) and molecular process (MP) ontology (9%). The most abundant GO terms of CC were cytoplasm related (cytoplasm (28%) and cytoplasmic part (27%, Additional file 2: Figure S2). Cellular protein metabolic processes (31%) and kinase activity (41%) were the most plentiful GO terms for the BP and MP ontologies, respectively. Similar distributions among categories have also been reported from the petal tissue of other species, like Dianthus and Safflower . Enzyme-coding genes were identified and their role in KEGG enzymatic pathways was determined. A total of 4,565 genes were assigned an enzyme code ID corresponding to 654 different enzymes (i.e. many genes were members of large gene families). These 654 enzymes were found to be part of 93 different enzymatic pathways in petal tissue. The enzymatic pathways can be divided into four general categories: Metabolic pathways, Biosynthetic pathways, Degradation pathways and signaling pathways (Additional file 2: Figure S2). Transcript abundance of genes involved in metabolic pathways like starch and amino sugar metabolism was highest in petal tissue compared to other enzymatic pathways. Amongst biosynthetic pathways, biosynthesis of amino acids and flavonoids were most abundant whereas other processes like wax and pigment synthesis had smaller representation.
The two co-resident genomes of polyploid nuclei did not always contribute equally to the transcript pool. Unequal contribution by the AT- and DT-genome homoeologs to the cotton transcript pool of any single gene (on the D-genome reference sequence) is referred to as ‘genome bias’ [4, 9, 18, 19, 45]. A comparison of the number of biased genes between accessions indicated that the large majority of genes did not have a genome bias (i.e. bias ratio of AT/DT where AT > DT or vice versa) when the genes with 99% UPC expression likelihood were considered. Approximately 20% of genes expressed in petals had a significant bias towards the AT- or DT-genome (Table 3). Though it wasn’t significantly different, G. tomentosum had the highest number of biased genes, followed by Maxxa, then Tx2094, among the natural tetraploids. This ranking was similar to a previous study of petal tissue that also found a higher number of biased genes in G. tomentosum than in G. hirsutum or the diploid F1-hybrid . We included G. tomentosum in this study because it had a greater number of homoeologous genes with large expression biases (i.e. a broader distribution of A/D expression ratios) than other polyploids. However, we found only subtle differences in the distribution ratios of gene expression among any of the accessions using RNA-seq (Additional file 3: Figure S3). The most notable difference of the distributions was that the diploid F1-hybrid had a smaller proportion of expressed genes with a detectable expression bias.
The direction of bias for these 478 gene pairs was conserved for most petal transcriptomes with a few exceptions. Of the 478 gene pairs, 246 were consistently biased in the AT direction (Figure 3B) and 202 were consistently biased in the DT direction. These 448 gene pairs were located on all thirteen chromosomes of the D-genome reference. Since we found a significant correlation (p < 0.005) between the number of total genes/chromosome and number of biased genes/chromosome, the homoeologous gene expression bias was probably not related to their chromosomal location. Interestingly, 19 gene pairs in the diploid F1-hybrid had consistent, yet contrary biases compared to the natural polyploids (e.g. a gene pair within the diploid F1-hybrid had an AT-bias, but a DT-bias was detected in all of the other three natural polyploids). In addition, each of the polyploids had a unique set of a small number of gene pairs with a contrarian directional bias than the remaining accessions (6 genes in G. tomentosum, 2 genes in Maxxa, and 1 gene in Tx2094).
Zooming out to reconsider accession totals of biased genes, 339 genes were biased in the diploid F1-hybrid but not the natural polyploids, while 770 genes were biased in the natural polyploids but not the hybrid (Figure 3A). The expression bias of these 1,009 genes (339 + 770) was found to be different between the diploid F1-hybrid and the natural polyploids. Further examination of the 770 homoeologous gene pairs with conserved expression bias in the natural polyploids found that the direction of expression bias (i.e. AT > DT or DT < AT) was conserved for all but 1–2 pairs of genes (depending on which two natural polyploids were compared). This conserved direction of homoeologous bias suggested a difference in cis-regulation of gene expression between genomes. Inspection of all of the other biases shared between two or more accessions found strong, consistent directions of homoeologous expression bias suggesting that the bias has roots in the independent evolution of the two progenitor species prior to reuniting within the common tetraploid nucleus.
Counts of duplicated genes from this RNA-seq study were placed into the 12 expression levels categories and compared to the previously reported microarray results of petal transcriptomes (Additional file 4: Figure S4, ). Previously, a modest level D-genome ELD was found as a ratio between categories (Categories II and XI/Categories IV and IX). In our RNA-seq results, we did not find a discernable level expression level dominance because the ELD ratios were very close to 1 (0.92, 1.02, and 0.98 in the diploid F1-hybrid, Maxxa, and G. tomentosum, respectively). This disagreement between the microarray and RNA-seq results could have been due to an artifact of microarray probe design where more EST templates were available for the D-genome than the A-genome (i.e. floral ESTs were only obtained from the D-genome ). In addition, our use of SCAN UPC expression likelihoods reduced the amount of transcriptional noise by filtering the low-level expressed genes from the analysis (note the smaller total number of genes in each treatment, Additional file 4: Figure S4).
We found that the polyploid categories II and IV (greater expression of transcripts from both polyploid AT- and DT-genomes compared to diploids) are ~10-fold more abundant than categories IX and XI (less expression of transcripts from both polyploid AT- and DT-genomes compared to diploids; Figure 6). All categories where polyploid gene expression level was expressed as high (or higher) as the diploid with the highest expression level had many genes (generally >1,000; categories II, IV, V, VI, and VIII). All categories where polyploid gene expression level was expressed as low (or lower) as the diploid with the lowest expression level had very few genes (generally < 100; Categories III, VII, IX, X, and XI).
Because we could discern the AT and DT transcripts in our RNA-seq data, we were curious if the general trend of increased expression in the polyploid was due to an increase of transcription levels in a single genome or both genomes (Figure 6). Thus, we considered genes that had a significant expression bias within each of the 12 expression categories. For example, 1,504 genes exhibited expression level dominance of the AT-genome in the diploid F1-hybrid. Of those, 382 AT-DT gene pairs contained homoeo-SNPs and had significant expression bias based on read counts of each homoeolog. All 382 of these gene pairs had a AT-genome bias and 0 genes had a DT-genome bias and while this was the most extreme difference, it represented a general trend within the expression categories. Generally, the direction of expression level dominance in all ‘increased expression’ categories (Categories II, IV, V, and VI) coincided with the RNA-seq read composition of genes with significant homoeolog expression bias. Perhaps, this also indicated cis-acting regulation of genes where a higher expression of one of the two diploid genomes was conserved in the polyploid nucleus and potential overexpression was controlled by and at the same levels as in the diploids.
Interestingly, category VIII is described as exceptional or transgressive expression in the polyploid (A = D < ATDT). It had equivalent contributions from each genome. In contrast to other categories, more genes were placed in this category on a percentage basis by our RNA-seq analysis than by the previous microarray experiment. There were also many fewer of these genes in the diploid F1-hybrid than in the natural polyploids. Perhaps this transgressive expression represent a different type of ‘additive’ gene expression where each diploid contributes an additive amount of transcript (such that two copies in the nucleus results in twice as much gene product), except that these loci have not individually evolved a mechanism for feedback control of expression. Such an interpretation would agree with a form of pseudo-overdominance (i.e. heterosis) between genomes and may have been due to complementary combinations of cis- and trans-acting factors resulting in more expression together than either genome alone.
Number of genes with or without differential expression in the diploids and polyploids (for example, 6,358 genes were equally expressed in the diploids; of those, 6,245 were also found to be equally expressed in the diploid F 1 -hybrid and 113 were found to be differentially expressed in the diploid F 1 -hybrid)
A- > D
D- > A
Diploid F 1 -hybrid
G. hirsutum maxxa
G. hirsutum TX2094
In addition, we compared the average degree of bias detected in the diploid F1-hybrid to the average degree of bias detected in the natural polyploids. When there was a significant DT bias detected in the diploid F1-hybrid, there was also a significant difference in the amount of bias between the diploid F1-hybrid and the other polyploids. When there was a significant AT bias detected in the diploid F1-hybrid, there was not a significant difference in the amount of bias between the F1 and the other polyploids.
Changes in gene expression from diploid to tetraploid genomes
Diploid F 1 -hybrid
G. hirsutum maxxa
G. hirsutum TX2094
Cotton fiber tissue has been the main focus of many transcriptome studies of Gossypium species because of its economic importance [10–12, 48, 49]. In contrast, petal tissue is an excellent ‘model’ tissue for cotton gene expression because of its highly canalized development and limited interaction with the environment [18, 20]. In one of the first applications of the G. raimondii reference genome , the gene expression levels in petals were determined by mapping RNA-seq reads to the annotated genes. Only 36-42% of the Gossypium genome was expressed in the petal tissue of open flowers prior to pollination. This number was noticeably smaller than the number of genes expressed during the development of cotton fiber , but we only sampled one time point of petal development. Between 75% and 83% of the expressed genes were expressed in every accession (Table 3). While these commonly expressed genes had a 99% probability of expression, they were not expressed at the same levels in the different accessions. Using gene expression as a metric , the expected phylogenetic relationships were clearly seen within a simple neighbor-joining tree containing two major clades (A and D, Figure 4).
Distributing transcripts in GO categories developed a molecular snap shot of the petal tissue. The cellular component ontology that includes multi-subunit enzymes and other protein complexes was most abundant GO category (88%). Petal cells undergo rapid elongation to reach full petal expansion stage. Actin cytoskeleton helps with cell elongation by transporting vesicles and organelles to the site of growth from cytoplasm. The cytoplasm (28%) and cytoplasmic parts (27%) were most represented under cellular component GO category. About 17% of transcripts fell under biological processes GO category and under this category cellular protein metabolic processes (31%) were most prominent. Petal tissue is an energy sink tissue for plant reproduction where starch and sucrose are mobilized from photosynthetic organs and broken down to sugars that function as precursors to essential primary and secondary metabolites . This was supported by the transcript abundance of different KEGG pathways. Many enzymes expressed in petal tissue were involved in starch and sucrose metabolism pathways.
In addition to a comprehensive assessment of gene expression in petals, the improvements in our analytical methodology resulted in three refinements of our understanding of the cotton transcriptome. First, both Yoo et al.  and this study found that expression bias is only found in a minority of genes. While previous studies used EST assemblies, we used an assembled genome reference and its corresponding gene annotations to determine gene expression resulting in more accurate numbers of expressed genes in a single tissue and the individual contributions of the distinct polyploid genomes.
Second, UPC was used to assign an expression probability to each gene. Genes with low expression levels have less accuracy and cause inflated comparisons of expression level differences due to overlapping standard deviations of expression (e.g., A is equal to P; D is equal to P, but A is not equal to D). Elimination of low-level expressed genes emphasized two clear trends of the polyploid transcriptome. The first trend was that we found negligible ‘down-regulation’ in the polyploid. When the low-level expressed genes (instead of a fold-change threshold) were eliminated, the frequency of categories of ‘down-regulation’ in the polyploid (III, IX, X, XI) were negligible compared to the frequency of ‘up-regulation’ categories (II, IV, V, VI). The second trend was that we found no ELD in the cotton petal transcriptome. Previously, a small D-genome ELD was found in petals, but it could be potentially explained by the origin of the microarray probes (Additional file 4: Figure S4). A small A-genome ELD in cotton leaves  could partially be explained by the difference of evolutionary distance between AT and A2, and DT and D5. Since AT and A2 are ~2× closer in sequence similarity than DT and D5, the AT-genome reads of the polyploid are more likely to map as equally well (or equally poor) as the A-genome diploid than the D-genome sequences, particularly when an EST assembly of AT and DT is used a reference. This potential inherent bias is fully considered when SNP-tolerant mapping is used since SNP positions are masked during the read alignment resulting in normalized read mapping efficiencies. This study was the first study to use SNP-tolerant mapping in RNA-seq analysis of a polyploid transcriptome.
Third, when the polyploid had equal to the higher diploid parent (i.e. up-regulation categories II, IV, V, VI), it was previously found that the expression level of the polyploid was explained by an increased expression of the ‘recessive’ parent . We found no evidence for a significant contribution from the ‘recessive’ parent. In fact, we observed an exceptional amount of expression stability between the diploid and polyploid petals. This paradoxical finding could be due to the differences between leaf and petal, or it could be due to the analytical refinements mentioned above. Further examination of additional RNA-seq dataset will contribute a greater understanding of ELD.
Our results contribute to a growing understanding of the evolution of gene expression in cotton. Adams et al., first reported expression bias in natural Gossypium polyploids. Since each gene and its corresponding cis-acting regulatory sequence(s) of the diploid A- and D-genomes can be traced to a common ancestry 5–10 mya , the cause of differential expression between homoeologs remained a mystery. Using microarray technology, homoeologous expression biases were observed in petal tissues from five Gossypium polyploids and a diploid F1-hybrid [18, 20]. These previous observations suggested two phases or modes for the evolution of cotton gene expression during polyploidization. Similar to Yoo et al. , our results suggest three phases of evolution that ultimately determine the expression levels of duplicated genes in the cotton genome.
The first phase consists of independent changes in upstream, cis-acting regulatory regions of genes prior to genome hybridization. In cotton, speciation of the two diploid genomes followed distinct evolutionary trajectories as evidenced by their differential accumulation of retrotransposons . During speciation, the cis-acting promoter regions of protein coding genes also likely changed for a modest number of genes. Evidence of changes to cis-acting regulatory regions was observed by the conservation of homoeologous gene expression bias. If the observed expression biases were the result of a stochastic process during polyploid formation, then the direction of homoeologous expression bias would also be stochastic. When we inspected all genes with an expression bias in more than one accession (Figure 3A), we found that direction of bias was nearly always conserved in the four different accessions and in two different hybridization events (e.g. Figure 3B). Thus, the direction of expression bias of these loci may have been due to differential cis-acting regulation efficiency that evolved in the diploids and was subsequently conserved or maintained in the polyploid nucleus. This manner and level of conserved cis-acting regulation has not been previously reported. Previously, a mode of gene regulation (i.e. cis-) could only be ascribed to the small number of genes where the polyploid expression patterns were different that the diploids (Table 5).
Alternatively, or in addition, to cis-acting modification, transposable elements could have region of influence causing a quantitative reduction of transcription factor binding and correspondingly a reduction of transcription initiation . Indeed, TEs have been differentially amplified and inserted in the A- and D-genomes . A correlation of gene distance to the nearest TE in the D-genome reference sequence was only significant in one of the five datasets we tested. Thus, there was not a strong association between TE distance and expression bias, even with a ~2× difference in genome size between the A- and D-genomes largely due to TE amplification and re-insertion. While a significant difference between categories of genes (A-biased or D-biased) was not apparent, only the distances from a single annotated reference genome were used (D-genome). We anticipate that a more meaningful comparison will be possible once the A-genome sequence has been published. A publically accessible A-genome sequence would allow the actual distances between A-genome TE insertions and annotated genes to be used with the current D-genome distances for a more biologically meaningful comparison.
The second phase of duplicate gene evolution is hybridization. When two genomes are combined into a single nucleus, they share gene regulatory factors, which can lead to novel regulatory processes and changes in regulatory networks. These novel interactions have been considered as a genome shock that accompanies polyploidization and may provide a new source of genetic variation , see  for recent review. However, we found that most homoeologous genes were not expressed at statistically different levels in petals from tetraploids. If trans-acting factors were regulating homoeologous gene expression, both homoeologous loci would be expressed at similar or equal levels (e.g. Table 5). Our results suggest that either the expression levels of most genes are controlled to some degree by trans-acting factors or that our experimental design lacked sufficient power to detect a significant difference between homoeologous expression levels. With only three replicates, we had modest empirical estimates of power in our dataset. While some real homoeologous differences remained undetected, the general interpretation of the relative amounts of cis- and trans-acting regulation will be evaluated against future studies. Previous reports of an overall genome bias (DT > AT composed of many genes with small differences) due to either accelerated evolution of one of the two genomes or a mechanism of epigenetic control of expression, did not agree with our current findings [18, 20]. Perhaps, the previous use of DNA microarrays resulted in an ascertainment bias through probe construction.
The second phase of gene evolution can also be interpreted through the lens of expression level dominance [4, 18, 27]. Nucleolar dominance is one type of molecular interaction that occurs when two genomes merge to form a polyploid and it is mediated by the targeting of specific siRNAs [54, 55]. Expression level dominance is a more generalized phenomenon (without a sequence-specific trigger) and it was found in the diploid F1-hybrid and natural Gossypium allopolyploids in previous microarray-based and RNA-seq studies [18, 19, 27]. We did not detect a consistent pattern of expression level dominance in our analysis. Within the comparative categories of gene expression that identify expression level dominance, the polyploid expression levels more commonly matched or exceeded the diploid species with the greater level of expression, independent of whether that diploid had the A- or D-genome. This consistent polyploid ‘high-level expression’ appeared to be a general trend that could also be explained by cis-acting transcriptional regulators. For example, suppose that the A- or D-genome has a cis-acting regulating sequence that is more efficient than the other genome’s homoeologous sequence due to mutation during the first phase of evolution. When combined into a polyploid nucleus, a more efficient cis-acting regulator would continue to induce one of the two homoeologous genes at the requisite level for expression in the polyploid nucleus just as it had in the diploid nucleus (i.e. polyploid expression was the same as the higher expressing diploid). The less common events where the polyploid had lower expression levels could be due to trans-acting transcription repressors where the active repressor of one genome reduces expression levels in both genomes.
In the second evolutionary phase, we can also consider the constitution of the polyploid expression levels in the categories that match or exceed the expression levels of the diploids because we can distinguish between AT and DT transcripts. For example, if the polyploid matched or exceeded the expression level of the A-genome (Category IV and VI), the polyploid transcripts would largely consist of AT transcripts and not DT transcripts. The reverse was true for categories II and V where the polyploid matched the expression level of the D-genome diploid. Furthermore, most (50-70%) of the genes in the ‘expression level dominance’ categories actually showed no change in the AT- relative to the A-genome and in DT- relative to D-genome suggesting similar controls of expression between the diploid and the respective polyploid.
The third phase of gene expression evolution occurs between hybridization and adaptation to the species current biological niche. We found a larger number of biased homoeologous genes in the natural polyploids when compared to the diploid F1-hybrid in agreement with Flagel et al. . Since we had relatively similar amounts of statistical power for each accession, the increased number of genes detected with a homoeologous bias could have evolved since formation of the ancestral polyploid. When a significant bias was detected between the homoeologs, the difference was greater in the DT genomes of the natural polyploids than it was in the genome of the diploid F1-hybrid (but not the AT biased genes). Perhaps, this observation represents a higher rate of mutation accumulation in the AT-genome cis-acting regulatory regions than the homoeologous DT-genome regions. It also agreed with our previous finding of general nucleotide diversity between the two genomes , our general findings of overall expression in Table 4, and previous reports . Although rate heterogeneity seems unlikely, a relative rate test cannot be evaluated without the genome sequence of an appropriate outgroup (e.g. G. kirkii).
Here we used RNA-seq to characterize gene expression in polyploid cotton petals with an unprecedented level of resolution within each genome. Future comparisons of these findings to similar results in other polyploids will perhaps enable an extrapolation of general trends to unifying concepts of polyploid gene expression in plants. Collectively, these results suggest that after polyploidization 1) most homoeologous gene pairs are expressed at approximately equal amounts, 2) ~20% of expressed genes have biased expression, 3) a portion of these genes have expression biases in more than one polyploid and the direction of their bias is largely conserved, 4) for any set of genes with expression biases in more than one accession, only a very a small number of genes have a direction of expression bias that was unique or contrary to the observed direction in other accessions, and 5) that general trends of gene expression can be interpreted as cis- and trans-acting regulation of polyploid gene expression. We recognize that the simple explanations of increased and decreased levels of expression genes represent only a few of many possible combinations of cis- and trans-acting factors on gene expression within discrete and interconnected biochemical pathways. Indeed, the modest number of changes to polyploid gene expression corresponded to the limited number of genomic changes that were previously found to accompany polyploidization in cotton . Perhaps, these exceptional gene expression biases indicated an additional level of gene expression regulation such as DNA methylation. The interesting exceptions to a conserved direction of gene expression bias and their potential effects on the phenotype merit further investigation.
We are grateful to the assistance of Kara Grupp at Iowa State University for her help collecting petal tissue. The DNA sequencing was accomplished with the assistance of Brian Dalley at the Hustmant Cancer Center. The bioinformatic analysis was only possible through the generous support of the Fulton SuperComputer Laboratory at Brigham Young University. We acknowledge the support of the National Science Foundation Plant Genome Research Program and the support of Cotton Inc., Cary, NC.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.