Molecular population genetics and gene expression analysis of duplicated CBF genes of Arabidopsis thaliana
© Lin et al; licensee BioMed Central Ltd. 2008
Received: 02 May 2008
Accepted: 07 November 2008
Published: 07 November 2008
CBF/DREB duplicate genes are widely distributed in higher plants and encode transcriptional factors, or CBFs, which bind a DNA regulatory element and impart responsiveness to low temperatures and dehydration.
We explored patterns of genetic variations of CBF1, -2, and -3 from 34 accessions of Arabidopsis thaliana. Molecular population genetic analyses of these genes indicated that CBF2 has much reduced nucleotide diversity in the transcriptional unit and promoter, suggesting that CBF2 has been subjected to a recent adaptive sweep, which agrees with reports of a regulatory protein of CBF2. Investigating the ratios of Ka/Ks between all paired CBF paralogus genes, high conservation of the AP2 domain was observed, and the major divergence of proteins was the result of relaxation in two regions within the transcriptional activation domain which was under positive selection after CBF duplication. With respect to the level of CBF gene expression, several mutated nucleotides in the promoters of CBF3 and -1 of specific ecotypes might be responsible for its consistently low expression.
We concluded from our data that important evolutionary changes in CBF1, -2, and -3 may have primarily occurred at the level of gene regulation as well as in protein function.
Genome duplication has been proposed as an advantageous path to evolutionary innovation and functional divergence [1, 2]. Ohno's theory hypothesizes that duplicates are mostly silenced by degenerative mutations following duplication due to a redundancy in function. Although probabilities of duplicates acquiring novel and beneficial functions is lower than duplicates being silenced , the classical model of the fates of duplicate genes proposed by Ohno is coming under increasing scrutiny [4–6]. Preservation of duplicate genes is believed to be a common fate, particularly for genes that contain multiple regulatory regions, and thereby multiple copies of genes are maintained in the genome [7–10].
Duplicated proteins evolve for some time under relaxed functional constraints, after which functional divergence occurs when formerly neutral substitutions convey a selective advantage in a novel environment or genetic background . Gene duplication is often followed by an elevated mutation rate [11, 12]. Rapid duplicate gene evolution is caused by positive Darwinian selection or relaxation of the functional constraints in the redundant gene. Continuation of gene duplication leads to the formation of gene groups with high similarities in nucleotide and amino acid sequences. The majority of genes in higher organisms are members of multigene families or subfamilies . Three potential evolutionary fates of duplicated genes have been suggested [1, 14]: (A) Duplicated genes may be inactivated by the degenerative mutations and become nonfunctional (i.e., defunctionalization). (B) A duplicated gene may diverge to acquire a new function (i.e., neofunctionalization). (C) Degenerative mutations occur in each copy after duplication. Both genes may be altered in such a way that the combined activity of the two genes together fulfills the task of the ancestral gene in a complementary fashion (i.e., subfunctionalization) [2, 15].
In this report, we attempted to characterize the molecular population genetics and gene expression of these three CBF genes from 34 ecotypes, and to assess the roles of selective forces that have driven the divergence of these three duplicate genes. Usually molecular evolutionary approaches are used to study selection forces on gene sequences; here we have the opportunity to check whether the results from those methods are supported by many studies of the expression and function of CBF genes. Our results, coupled with those from other published studies, indicate that functional divergence in CBF1, -2, and -3 resulted from promoter polymorphisms as well as coding sequence variations.
Sequence polymorphisms of CBF1, -2, and -3 in Arabidopsis thaliana
Nucleotide diversity of CBF s
For CBF transcriptional units (TUs) the nucleotide diversity (π) of CBF3 was highest (0.00685) but close to that of CBF1 (0.00548), while the diversity of CBF2 was still the lowest (0.00264). However, the nucleotide diversity, π, of CBF2's TU was 2.4-fold higher than that of its promoter (Table 1). In the CBF coding regions, 11, 11, and 16 nonsynonymous substitutions were respectively found in CBF1, -2, and -3. Of the total 38 nonsynonymous substitutions, only three were located in AP2 domains (but outside the proposed alpha-helix motif) which are responsible for DNA binding. Seven nonsynonymous substitutions were in the N-terminal region where the function is unknown. No amino acid replacement was observed in the NLS. Twenty-eight nonsynonymous substitutions existed in transcriptional activation domains which were responsible for activation of CBF regulon genes. Ten of 11 amino acid displacements in CBF2 were singletons except for one in the N-terminal region (Figure 1C).
Ecotypes Hi-0 and Bl-1 had lost ten amino acids in CBF1's transcriptional activation domain, but their open-reading frames were still present. A single adenine insertion was found in the CBF2 coding region of Kas-2. This insertion, located at the beginning of the transcriptional activation domain, led to an early nonsense stop codon in this reading frame, and 92 of ~98 residues of this domain had been lost. This C-terminal domain is necessary for activation of the expressions of downstream genes . We believe that CBF2 in the Kas-2 ecotype has become a null allele. No indels were found in CBF3's coding regions. We sequenced CBF4 coding sequences in 34 ecotypes as well, and no indels were found. Two nonsynonymous substitutions were observed: one was a singleton and the other was informative.
Distinct patterns of sequence polymorphisms were observed in different regions of CBF
Tajima's D test was marginally positive in the promoter of CBF1 (D = 1.96, 0.1 <p < 0.05, Table 1), however, after removing sites before site 195 (see additional file 2A), the D value was significantly positive (D = 2.12, p < 0.05). This suggests a signature of balancing selection dominating the majority of the CBF1 promoter. In the TU of CBF1, the D value was not significant. In contrast to CBF1, both the promoter and TU of CBF2 had an excess of singletons, and Tajima's D tests were significantly negative (-2.32 and -2.23 in the promoter and TU, respectively, p < 0.01). This indicates that the entire region of CBF2 was favored by positive selection, and a beneficial allele has been fixed in the population. However, a significant negative D value can also be the result of deriving low-frequency detrimental alleles or a bottleneck effect. The signature of positive selection of CBF2 was confirmed by other tests and physiological data as described below. Unlike CBF1 and -2, neither the promoter nor TU was significant in CBF3.
Neutrality tests and tests of heterogeneity of polymorphisms to divergence using outgroup sequences
Fay and Wu's H
-8.555 (p = 0.0015*)
-3.686 (p = 0.0681)
Purifying selection acts on CBFs but different levels of divergence were observed among different domains
Evolutionary distances between paralogs and orthologs in non-coding and coding regions
Promoters a (paralogs)
CBF1 vs. CBF2
0.448 ± 0.043
CBF2 vs. CBF3
0.360 ± 0.035
CBF1 vs. CBF3
0.501 ± 0.045
CBF1 vs. CBF2
0.212 ± 0.041
CBF2 vs. CBF3
0.156 ± 0.033
CBF1 vs. CBF3
0.266 ± 0.047
CBF1 vs. CBF2
0.408 ± 0.069
0.062 ± 0.012
CBF2 vs. CBF3
0.462 ± 0.078
0.068 ± 0.014
CBF1 vs. CBF3
0.546 ± 0.093
0.064 ± 0.013
CBF1 vs. CBF4
1.478 ± 0.451
0.216 ± 0.031
CBF2 vs. CBF4
1.728 ± 0.654
0.233 ± 0.032
CBF3 vs. CBF4
1.456 ± 0.431
0.232 ± 0.030
AtCBF2 vs. AlCBF2
0.340 ± 0.059
0.048 ± 0.012
AtCBF3 vs. AlCBF3
0.284 ± 0.056
0.045 ± 0.011
Compared to the divergence among cold-induced CBF1, -2, and -3, the divergent region between dehydration-induced CBF4 and CBF1, -2, and -3 was unique. The AP2 domains were apparently conserved because there were some gaps in the N-terminal side of this domain. There is a region within the transcriptional activation domain which had Ka/Ks ratios which were all > 1 (1.6~3.0), a signature of positive selection in protein evolution, in all three comparisons. The three highest Ka/Ks peaks covered almost the same region (Figure 4B).
Positive selection in the transcriptional activation domain
So far we have found that the most divergent regions are in the transcriptional activation domains of CBFs; however, the sliding window plots detected no region with a ω (Ka/Ks ratio) value of > 1. This is because for sliding window plots, each window (of 50 bp) overlapped with the previous and following windows because the step size (10 bp) was smaller than the window size. These sizes of the window and overlapping windows might not have allowed us to detect smaller regions of positive selection or to have precisely located the divergent region. Herein we used a non-overlapping sliding window method to locate the existence of positive selection for six residues per window in the transcriptional activation domain.
We used codeml in the PAML3.15 package  to estimate ω values among sites with different models. CBF2 sequences could not be computed because too many branches had collapsed after the bootstrap re-sampling. The results of all LRTs for CBF1 and -3 were significant (the highest p value was 0.002 among all LRTs). M2a, M3, and M8 all identified a small proportion of protein (< 1%) with ω values of > 1, while most portions of CBF1 and -3 were under purifying selection. One positively selected site, 178 T (181 T in Figure 5; site703 in additional file 2B) was identified in CBF1, and two sites, 2 S (125 bp of CDS in additional file 4B) and 151E (152E in Figure 5; site572 in additional file 4B), were identified in CBF3. Nine ecotypes carried 178 S instead of 178 T in CBF1, while in CBF2 and -3, all ecotypes carried a serine residue at this site. Seven ecotypes carried 2 S instead of 2 N, and seven ecotypes carried 151A instead of 151E in CBF3.
Gene expression experiments identify several naturally knockout or knockdown of CBFs
In addition to naturally occurring mutants, we found that the expressions of CBF3 in Cvi-0 and Co-1, and CBF1 in Cvi-0 and Can-0 after 4°C treatment for 1.5 h were consistently low in different batches. However, gene expressions in most other ecotypes were not consistently high or low among batches (data not shown).
DNA sequence polymorphism
The overall level of genome-wide polymorphism in Arabidopsis was estimated in the values between 0.06 and 0.07 [51, 52]. The π values of CBF3's promoter, and CBF1 and -3's TUs are close to these two estimates. Altogether, CBF2 was the most highly conserved gene in the promoter region and TU of this small family, and CBF1 carried the highest nucleotide diversity in the promoter, the diversity of TU was compatible with that of the TU of CBF3. The nucleotide diversities of promoters of CBFs were greater than those among TUs. This suggests that the promoters and TUs of CBFs have been subjected to distinct selection mechanisms and/or demographic histories. With respect to protein evolution, AP2 domains were very conservative compared to the transcriptional activation domains. This suggests that among CBFs in Arabidopsis, the DNA targeting abilities have not changed throughout their evolutionary history. The divergence, if any, of CBF biochemical functions should exist in transcriptional activation domains.
Different selection forces working on different CBFgenes
Selection has played a role in the maintenance of conservation in coding sequences among CBF1, -2, and -3 in the form of purifying selection. As to protein evolution, these three duplicate genes are functionally constrained as evidenced by the Ka/Ks ratios being much lower than the neutral expectation of 1 for both paralogs and orthologs (Table 3). The excess of low-frequency polymorphisms for CBF2 indicates that it is strictly constrained. The nucleotide variation in the CBF1 promoter was characterized by dimorphism indicating balancing selection, while the entire region of CBF3 did not deviate from neutral expectations.
However, a strong population structure and genome-wide deviation from the standard neutral model in A. thaliana were reported [51, 52]. Nordberg et al.  suggested that the skewness of Tajima's D of the A. thaliana genome was due to an excess of rare alleles and demographic processes. To reassess the influence of the population structure, we used the Mantel test (zt software ) to evaluate the existence of isolation-by-distance in our dataset. The promoters and transcriptional units of CBF1, -2, and -3 were examined. The results from both regions of CBF2 were not significant (promoter: r = -0.055, p = 0.345; TU: r = -0.031, p = 0.556), but were significant in the promoters of CBF1 and -3 (r = 0.146, p = 0.037; and r = 0.134, p = 0.046). It seems the results from Tajima's D test of CBF2 are independent of the population structure. Nevertheless, the population structure is expected to push Tajima's D toward positive values , and this effect may have partially contributed to the significantly positive D of CBF1's promoter. However, the dimorphism we observed was not related to the geographic distribution of ecotypes. The origins of maintaining such allelic dimorphism are still unclear. This pattern of allelic dimorphism is not unprecedented in Arabidopsis, having been reported for many loci including TFL1 . It may be a result of the demographic history of this species complex .
We do think that CBF2 played a unique role different from those of CBF1 and -3 after gene duplication. It is possible that the regulatory function of CBF2 was favored by positive selection which was detected in our dataset.
Positive selection sites in the transcriptional activation domain
The transcriptional activation domain of CBF1 was studied by Wang et al. , and six hydrophobic cluster motifs were identified by a computational analysis. They used a series of GAL4DBD/CBF1AD (CBF1AD: transcriptional activation domain) fusion constructs containing either nonsense codon introductions or alanine substitutions in CBF1AD to assay their abilities to transactivate the GAL4-responsive lacZ reporter enzyme in yeast. Based on the high sequence identity and the fact that almost the same set of downstream genes can be activated by CBF1, -2, and -3, the mechanism of transcriptional regulation of CBF regulons should be very similar. Thus, the functional dissection of this domain from CBF1 might also be the case in CBF2 and -3.
The non-overlapping sliding window of Ka/Ks suggests that two possible regions were favored by positive selection after gene duplication (Figure 5). These two regions are not located in the hydrophobic clusters 2, 3, and 4 which Wang et al.  considered to have been the most critical motifs in transactivation. In these two regions, four of six residues substituted by alanine led to substantial changes in reporter enzyme activities (139T → A, 162P → A, 165S → A, and 169F → A, Figure 5). It was interesting to note that the activity increase of 162P → A was the highest among substitutions, and 169F → A was also the greatest among those that led to activity decreases.
We want to see whether the positive selection site in the sequence agrees with the two regions mentioned above. With the site models of the codeml program, two positively selected amino acids were identified in the transcriptional activation domain as expected but not locate in the two regions identified by the sliding window of Ka/Ks method.
A decrease of 26% of reporter enzyme activity was observed when alanine was substituted for 181 T of CBF1, and in CBF3, 152E is in hydrophobic cluster 1. When the same site in CBF1 was substituted by alanine (152E → A), a 63% increase in reporter activity was observed (Figure 5).
Hydrophobic residues within the functionally critical HC2, -3, and -4 of CBFs were found to be conserved among different ecotypes. Most positively selected regions of the activation domain identified in this paper are located outside of critical motifs. This difference is because the meaning of Ka/Ks is different from that of hydrophobic cluster, the former emphasizes variation in amino acid replacement and the latter in no or low variation. It implies that the hydrophobic cluster is critical in the interaction with other activator and the two regions found in this study might involve in tuning the transcriptional activation.
Ecotypes from lower latitude may accumulate mutations that reduce CBFs expressions
Some variations we found for specific nucleotides of the promoters of CBF1 and -3 were probably responsible for the low level of gene expression. CBF1 of Cvi-0 has four nucleotides which are unique among the ecotypes (nos. 1, 249, 469, and 541, see additional file 2), while Can-0 has two unique nucleotides (nos. 57 and 469). CBF3 of Cvi-0 of has four nucleotides which are unique among the ecotypes (nos. 338, 1017, 1167, and 1284 in additional file 4). However, none of these nucleotides was located in the known cis-element of CBF1 or -3 when screened by the PLANTCARE website .
It is interesting to note that singletons of the CBF1 promoters of Cvi-0 and Can-0 comprised 35% (6/17) of the 34 accessions, and nucleotide mutations mainly occurred in accessions collected from lower latitudes (see additional file 2). Singletons of the CBF3 promoters of Cvi-0 and Co-1 comprised 16% (5/31) of the 34 accessions, and nucleotide mutations also mainly occurred in lower latitudes (see additional file 4). Thus lower latitudes (probably indicating warmer-temperature environments) might allow accumulation of nucleotide mutations in the promoter regions.
Evidence of functional diversification in CBF1, -2, and -3
In a study to test for benefits and costs (number of fruits) of cold tolerance, CBF2 and CBF3 overexpressers showed costs of cold tolerance and no fitness benefits in control and cold environments. CBF1-overexpressing plants showed no fitness costs of cold tolerance in the control environment and showed a marginal fitness benefit in the cold environment . A CBF2 knockout mutant, cbf2, had a higher capacity to tolerate freezing than wild-type ones which was correlated with stronger and more-sustained expressions of CBF1 and CBF3 in the mutant. They concluded that CBF2 is a negative regulator of CBF1 and CBF3 , and CBF2 but not CBF1 and CBF3 was found as a gene contributed to a major freezing QTL locus . Moreover, in contrast to CBF2, CBF1 and CBF3 positively regulate cold acclimation by activating the same subset of CBF-target genes . These studies indicated functional changes have occurred at the level of CBF proteins.
Ecotype Kas-2 carries a CBF3 knockout allele and a null allele CBF2 gene due to a single insertion in the coding region. At first, we predicted that CBF1, which is the only functional CBF induced by cold, in this ecotype would be highly expressed to compensate for the double-mutant effect. However, this prediction was not correct. This is also evidence that CBF1, -2, and -3 are not strictly functional equivalents.
Evolutionarily important changes have also occurred at the level of gene regulation in CBF1, -2, and -3 in addition to protein function. Chinnusamy et al.  identified a mutant, ICE1 (inducer of CBF expression 1), that results in the CBF3 gene no longer being induced in response to low temperature, but which has little effect on cold induction of CBF1 and CBF2. CBFs are subjected to different temporal regulation during cold acclimation. Recently, Novillo et al.  further supported that CBF1 and CBF3 are regulated in a different way then CBF2. Although subfunctionalization of the coding region has been detected, subfunction in the promoter is also a process leading to separate expression control of CBFs.
In summary according to sequence analyses, we demonstrate that different selection forces have been working on the promoter and coding regions of CBF1, -2, and -3. After CBF duplication, CBF2 experienced selective sweeps in the promoter and coding regions, and this may have resulted in its unique functional divergence which differs from those of CBF1 and -3. The promoter of CBF1 has a signature of balancing selection, while CBF3 did not deviate from neutrality. CBF promoters share less similarity than do the coding regions, indicating they are differentially regulated or have been shaped by different evolutionary forces. There is evidence of positive selection in both the CBF1 and -3' transcriptional activation domains within each domain and between them. CBF1 and -3 were shown to activate same regulons but need to work cooperatively , and they are regulated by different upstream factors . The function of the progenitor CBF gene has not been determined to date. Nevertheless, it seems that the complements of CBF1 and -3 are under subfunctionalization, while CBF2, with a different regulatory role, might be a case of neofunctionalization.
Growth conditions, genomic polymerase chain reaction (PCR), and sequencing
All ecotype resources are listed in the supplementary materials (see additional file 1). The seeds were chilled at 4°C in the dark for 5 days before being grown in soil medium (Bio-mix TTing Substratum, Moerdijk, Nederland). Seeds of A. lyrata ssp.petraea (Plech, Bavaria, Germany) were chilled for more than 2 weeks. The seedlings were grown at 22°C under a 16-h light/8-h dark photoperiod with a light intensity of ~100 μE/m2/s. For the gene expression study, three-week old seedlings were treated at 4°C, for 1.5 h under fluorescent lights at 40 μE/m2s. To avoid the effects of circadian clock behavior of CBFs, we always started the chilling treatment at 13:00 (ZT4, ). After cold treatment, leaves from five individuals were combined as one sample for each ecotype and were immediately frozen in liquid nitrogen. Three independent trials were performed. The interval between trials was 3~4 weeks.
Genomic DNA was extracted following a modified protocol according to Doyle and Doyle  and stored at -20°C. Primers (see additional files 6 and 7 for primer sequences and PCR conditions) were designed according to the ecotype Col genomic sequence in The Arabidopsis Information Resource (TAIR)  using the Primer3 program . We used the PreMix2.0 (TOPBIO, Taipei, Taiwan) PCR reagent mix for all PCR reactions. The PCR products were run on 1% agarose gels, and the desired bands were excised for purification and direct sequencing using a Taq Dye Dideoxy Terminator Cycle Sequencing Kit (Applied Biosystems, Foster city, CA, USA) and an ABI 3730 sequencer (Applied Biosystems). Vector NTI Suite 9 Contig-Express (Invitrogen, Karlsruhe, Germany) was used for sequence assembly and chromatogram inspection.
The CBF3 genes in the various ecotypes were first sequenced. To avoid PCR and sequencing errors, two independent PCR samples for each ecotype were separately sequenced. In addition to the primers used in the PCRs, additional primers were designed and used in sequenced regions with ambiguous signals. No sequence difference was observed among any two independent CBF3 sequence replicates, and therefore we performed one PCR for CBF1 and -2. Primers in the PCR reactions for CBF1 and -2 were used for sequencing, and additional primers were also used to sequence ambiguous regions (the second PCRs were performed for sequencing ambiguous regions). To isolate CBF orthologs from A. lyrata, primers were designed to amplify the coding regions. The genomic-PCR products from A. lyrata were purified by gel-elution. After the PCR fragments were ligated into the vector, pGEM®-T Easy (Promega, Madison, WI, USA), the constructs were transformed to competent DH5-α cells (RBC Bioscience, Taipei, Taiwan). Twenty-six colonies were selected for sequencing. Each copy of the CBF orthologs was determined based on at least three clones. In this research, because the sequences of CBF4 were used as the outgroup for certain sequence and phylogenetic analyses mentioned below, only the coding regions of CBF4 were determined.
Sequence data from this article have been deposited with the GenBank Data Libraries under accession nos: EF522962~EF523192, EF523193~EF523195, and EF523196. Genomic sequences of Col of AT4G25470 (CBF2), AT4G25480 (CBF3), AT4G25490 (CBF1), and AT5G51990 (CBF4) were retrieved from TAIR.
Sequences were aligned using CLUSTAL X . Certain insertions were manually removed after sequence alignments to maximize alignable regions prior to calculating the polymorphism parameters. In the analysis of promoter sequences, CBF1, -2, and -3 respectively of Ita-0, Cvi-0, and Kas-2 were excluded. In the promoter sequences from Ita-0's CBF1 and Kas-2's CBF3, there was a region substituted by a sequence from another part of the genome. Most parts of the CBF2 promoter region had been lost in Cvi-0 in the region we surveyed. The promoter of Co-1 CBF1 was included in our analysis after removing a 365-bp insertion near the site of transcriptional initiation. In the analysis of nucleotide polymorphism of the transcriptional unit, the CBF2 sequences from Ita-0 and Kas-2 were included after removing their insertions in the 3' untranslated region (UTR). In Ita-0, a long insertion (> 1100 bp) and a GGGAAA sequence (5 bp before the insertion) were removed from the 3'UTR, but we did not finish sequencing this insertion. The remaining 3'UTR sequence of Ita-0 after this insertion was treated as missing data. Coding sequences of CBF1 of Hi-0 and Bl-1 and CBF2 of Kas-2 were not analyzed in the Ka/Ks calculation to avoid loss of polymorphism information in these regions due to gaps that led to amino acid deletions (Hi-0 and Bl-1) and frame-shifts (Kas-2).
Sequence divergences between orthologs and paralogs were estimated using MEGA3.1 . The evolutionary distance, K, between CBF promoters (500 bp upstream from the TU) were computed using the Kimura 2-paramater substitution model, and gaps were treated as missing data. Distances between orthologous coding sequences, and between paralogous sequences were computed from synonymous (Ks) and nonsynonymous sites (Ka) by the Nei-Gojobori method . All standard errors of divergence distances were determined using 500 bootstrap replicates.
The nucleotide diversity, Tajima's D test , Fay and Wu's H , and a sliding window analysis were carried out using DNASP4.10 . The window size for the promoter and TU analyses was 100 bp with a step size of 20 bp. The window size for the Ka/Ks sliding window plots was 50 bp with a step size of 10 bp. In order to identify fluctuations in Ka/Ks more precisely in the transcriptional activation domains, we used a sliding window approach in which the window and step sizes were both 18 bp. This allowed non-overlapping window scanning for six residues per window. Fay and Wu's H was computed using A. lyrata CBF orthologs as outgroups. The heterogeneity of polymorphisms to fixed differences was computed using DNA Slider1.11 . Only sequences from silent sites were considered. For Gmean and DKS statistics, R = 2, 4, 8, 16, and 32 were simulated with 1000 replicates. The highest p value of each statistic is reported.
A CBF phylogenetic tree was constructed by the Neighbor-joining method  under the HKY85  model based on the coding sequences using PAUP*4.10 . One thousand bootstrap replicates were performed, and > 50% frequencies are shown. The bootstraps values are given at each node. One sequence (from one ecotype) of each haplotype of coding sequences was used. CBF homologs from Lycopersicon esculentum, Le CBF1, -2, and -3 (GenBank accession nos.: AY034473, AY497899, and AY497899, respectively), were used as outgroups.
In addition to locating the sequence region being under selection by the sliding window approach, we want to further find the sites are positively selected. Because it is impossible that positive selection operates on every lineage (at different times) and every codon has the same strength, Yang developed statistical tests based on the maximum-likelihood method [41–43] to estimate specific lineages or codons which are under positive selection. The value of ω is allowed to vary among sites (codons), and sites under positive selection can be identified by site models. In the one-ratio model (M0), all lineages and sites have the same value of ω. In the site models (M1a, M2a, M3, M7, and M8), classes of ω are allowed according to different assumptions [41, 43]. These models were designed as hierarchal relationships, and log likelihood values obtained by M0 (one-ratio), M1a (nearly neutral), M2a (positive selection), M3 (discrete), M7 (beta), and M8 (beta and selection) were compared using likelihood ratio tests (LRTs). The significances of M0 vs. M3, M1a vs. M2a, and M7 vs. M8 were estimated . Codeml in the PAML3.15 package  was used to detect specific sites (or codons) which were positively selected. Phylogenetic trees of CBF1, -2, and -3 were constructed by maximum-likelihood using the HKY85  model, and > 50% branching topology was retained after 1000 bootstrap replicates. Trees were constructed according to full-length (promoter and TU) sequences using PAUP*4.10 . CBF1, -2, and -3 respectively in Ita-0, Cvi-0, and Kas-2 were abandoned when estimating positively selected codons because of their long sequence indels.
The promoter sequences of CBF genes were analyzed using TCS1.21 software . Alignment gaps were treated as missing data. The promoters of CBF1, -2, and -3 were sorted according to haplotypes. There are 15, 15, and 18 haplotypes of the CBF1, -2, and -3 promoters, respectively. Promoter haplotypes of ecotypes with higher frequencies were selected, and totally 16 ecotypes were chosen for the CBF expression analysis.
RNA extraction and complementary (c)DNA synthesis
RNA was isolated using the REzol™ C & T (Protech, Taipei, Taiwan) reagent following the manufacturer's protocol. RNA was dissolved in DEPC-treated sterile water. After checking the RNA integrity on a 1% agarose gel, 4 μl (2 μg) of RNA was used to synthesize cDNA for each 20-μl reaction volume by Superscript III-RT with oligo-dT12–18 primers according to the manufacturer's protocol (Invitrogen). Each sample was diluted1/20-fold.
Real-time PCR analysis
Different promoter haplotypes were used for real-time PCR analysis. The promoter haplotypes were sorted by TCS , and 16 ecotypes were chosen (covering most of promoter haplotypes) for expression analysis.
Three primer pairs were designed to measure the expressions of CBF1, -2, and -3 (see additional file 8 for primer sequences). The primer pair for CBF1 expression measurements, CBF1LP212/CBF1RP369, was designed according to the sequence of the N-terminal coding region; the length of the CBF1 PCR fragment was 158 bp. For CBF2 measurements, the primer pair, CBF2LP744/CBF2RP913, was designed according to sequences of the C-terminal coding region and 3'UTR; the length of the CBF2 PCR fragment was 170 bp. For CBF3, the primer pair, CBF3LP38/CBF3RP196, was designed from the sequence of the 5'UTR and N-terminal of the coding region; the length of the CBF3 PCR fragment was 159 bp. All primers were designed in regions specific to each CBF but which were conserved among the ecotypes used. Therefore, to consider the specificity and annealing temperatures of the primer pairs, we could not design them all according to the 3'-end of sequences since reverse transcription may produce truncated cDNA. For each cDNA sample, the expressions of CBF1, -2, and -3 were all analyzed. Primers of the housekeeping genes ACTIN2 (AT3G18780) and ACT2LP602/ACT2RP739, were designed to span a 78-bp 2nd intron. This allowed us to detect genomic DNA contamination in our cDNA samples. The presence of two or more PCR products, with different lengths and GC contents, was detected by melting kinetics, which were measured in every plate for every sample. The annealing temperature (Tm) of all primer pairs was close to 60°C. After sequencing the PCR products amplified by different primer pairs, we confirmed that our amplifications were specific to each CBF and ACTIN2, but not other highly similar members.
We used the iCycler iQ real-time detection system (Bio-Rad Laboratories, Hercules, CA, USA) to quantify gene expression. Fifteen-microliter reaction mixtures (per well) contained 7.5 μl of the two-fold enzyme mix (iQ™ SYBR Green Supermix, Bio-Rad), 1.9 μl deionized/sterile water, 0.3 μl of each primer (300 nM final concentration), and 5 μl diluted cDNA (~0.1 μg). We used the same PCR protocol for all CBF1, -2, and -3 and ACTIN2 primer pairs. The PCR protocol was 95°C 3 min for hot-start polymerase (one cycle), 95°C 30 s, 60°C 30 s, and 72°C 15 s for 40 cycles. After each PCR we performed melting kinetics to monitor the specificity of the CBF and ACTIN2 amplifications.
Standard curves of specific CBFs and ACTIN2 were established in every plate to quantify the relative expression levels (the formula of the standard curve quantification method was based on the ABI PRISM 7700 Sequence Detection System user bulletin #2). For each cold-treated cDNA sample, four sample repeats were measured in every plate for both the CBFs and ACTIN2. The data of real-time PCR were analyzed using iCycler iQ optical system software 3.1 (Bio-Rad). Baseline cycles were set to two~ten cycles, and the threshold position of the fluorescence was set to 30. Expressed quantities of each CBF of the ecotypes were normalized to ACTIN2 quantities of the same ecotype. PCR efficiencies of different primer pairs were calculated according to Rasmussen's method . The average PCR efficiency was > 97%.
This study was supported by grants from the National Taiwan University (97R0066-35) and the National Science Council (NSC94-22621-B-034-001 and NSC95-22621-B-034-001) Executive Yuan, Taiwan. The authors are grateful for the donation of seed stocks of A. lyrata ssp. petraea and A. thaliana from Dr. Maria Clauss and Dr. Chiang, Yu-Chung, respectively.
- Ohno S: Evolution by gene duplication. 1970, Springer-Verlag, Heidelberg, GermanyView ArticleGoogle Scholar
- Force A, Lynch M, Pickett FB, Amores A, Yan YL, Postlethwait J: Preservation of duplicate genes by complementary, degenerative mutations. Genetics. 1999, 151: 1531-1545.PubMedPubMed CentralGoogle Scholar
- Lynch M, Force AG: The origin of interspecific genomic incompatibility via gene duplication. Am Nat. 2000, 156: 590-605. 10.1086/316992.View ArticleGoogle Scholar
- Massingham T, Davies IJ, Liò P: Analysing gene function after duplication. Bioessays. 2001, 23 (10): 873-876. 10.1002/bies.1128.PubMedView ArticleGoogle Scholar
- Duarte JM, Cui L, Wall PK, Zhang Q, Zhang X, Leebens-Mack J, Ma H, Altman N, dePamphilis W: Expression pattern shifts following duplication indicative of subfunctionalization and neofunctionalization in regulatory genes of Arabidopsis. Mol Biol Evol. 2006, 23: 469-478. 10.1093/molbev/msj051.PubMedView ArticleGoogle Scholar
- Yang X, Tuskan GA, Cheng ZM: Divergence of the Dof gene families in poplar, Arabidopsis, and rice suggests multiple modes of gene evolution after duplication. Plant Physiol. 2006, 142: 820-830. 10.1104/pp.106.083642.PubMedPubMed CentralView ArticleGoogle Scholar
- Ohta T: Simulating evolution by gene duplication. Genetics. 1987, 115: 207-213.PubMedPubMed CentralGoogle Scholar
- Clark AG: Invasion and maintenance of a gene duplication. Proc Natl Acad Sci USA. 1994, 91: 2950-2954. 10.1073/pnas.91.8.2950.PubMedPubMed CentralView ArticleGoogle Scholar
- Nowak MA, Boerlijst MC, Cooke J, Smith JM: Evolution of genetic redundancy. Nature. 1997, 388: 167-171. 10.1038/40618.PubMedView ArticleGoogle Scholar
- Wagner A: Redundant gene functions and natural selection. J Evol Biol. 1999, 12: 1-16. 10.1046/j.1420-9101.1999.00008.x.View ArticleGoogle Scholar
- Zhang J, Rosenberg HF, Nei M: Positive Darwinian selection after gene duplication in primate ribonuclease genes. Proc Natl Acad Sci USA. 1998, 95: 3708-3713. 10.1073/pnas.95.7.3708.PubMedPubMed CentralView ArticleGoogle Scholar
- Bielawski JP, Yang Z: Positive and negative selection in the DAZ gene family. Mol Biol Evol. 2001, 18: 523-529.PubMedView ArticleGoogle Scholar
- Hughes AL: Adaptive evolution of genes and genomes. 1999, Oxford University Press, Oxford, UKGoogle Scholar
- Lynch M, Conery JS: The evolutionary fate and consequences of duplicate genes. Science. 2000, 290: 1151-1155. 10.1126/science.290.5494.1151.PubMedView ArticleGoogle Scholar
- Hughes AL: The evolution of functionally novel proteins after gene duplication. Proc R Soc Lond Ser B. 1994, 256: 119-124. 10.1098/rspb.1994.0058.View ArticleGoogle Scholar
- Gilmour SJ, Fowler SG, Thomashow MF: Arabidopsis transcriptional activators CBF1, CBF2, and CBF3 have matching functional activities. Plant Mol Biol. 2004, 54: 767-781. 10.1023/B:PLAN.0000040902.06881.d4.PubMedView ArticleGoogle Scholar
- Gilmour SJ, Zarka DG, Stockinger EJ, Salazar MP, Houghton JM, Thomashow MF: Low temperature regulation of the Arabidopsis CBF family of AP2 transcriptional activators as an early step in cold-induced COR gene expression. Plant J. 1998, 16: 433-442. 10.1046/j.1365-313x.1998.00310.x.PubMedView ArticleGoogle Scholar
- Nakano T, Suzuki K, Fujimura T, Shinshi H: Genome-wide analysis of ERF gene family in Arabidopsis and rice. Plant Physiol. 2006, 140: 411-432. 10.1104/pp.105.073783.PubMedPubMed CentralView ArticleGoogle Scholar
- Liu Q, Kasuga M, Sakuma Y, Abe H, Miura S, Yamaguchi-Shinozaki K, Shinozaki K: Two transcription factors, DREB1 and DREB2, with an EREBP/AP2 DNA binding domain separate two cellular signal transduction pathways in drought- and low-temperature-responsive gene expression, respectively, in Arabidopsis. Plant Cell. 1998, 10: 1391-1406. 10.1105/tpc.10.8.1391.PubMedPubMed CentralView ArticleGoogle Scholar
- Yamaguchi-Shinozaki K, Shinizaki K: A novel cis-acting element in an Arabidopsis gene is involved in responsiveness to drought, low-temperature or high-salt stress. Plant Cell. 1994, 6: 251-164. 10.1105/tpc.6.2.251.PubMedPubMed CentralView ArticleGoogle Scholar
- Fowler S, Thomashow MF: Arabidopsis transcriptome profiling indicates that multiple regulatory pathways are activated during cold acclimation in addition to the CBF cold response pathway. Plant Cell. 2002, 14: 1675-1690. 10.1105/tpc.003483.PubMedPubMed CentralView ArticleGoogle Scholar
- Haake V, Cook D, Riechmann JL, Pineda O, Thomashow MF, Zhang JZ: Transcription factor CBF4 is a regulator of drought adaptation in Arabidopsis. Plant Physiol. 2002, 130: 639-648. 10.1104/pp.006478.PubMedPubMed CentralView ArticleGoogle Scholar
- Chinnusamy V, Ohta M, Kanrar S, Lee BH, Hong X, Agarwal M, Zhu JK: ICE1: a regulator of cold-induced transcriptome and freezing tolerance in Arabidopsis. Genes Dev. 2003, 17: 1043-1054. 10.1101/gad.1077503.PubMedPubMed CentralView ArticleGoogle Scholar
- Jackson MW, Stinchcombe JR, Korves TM, Schmitt J: Costs and benefits of cold tolerance in transgenic Arabidopsis thaliana. Mol Ecol. 2004, 13: 3609-3615. 10.1111/j.1365-294X.2004.02343.x.PubMedView ArticleGoogle Scholar
- Novillo F, Alonso JM, Ecker JR, Salinas J: CBF2/DREB1C is a negative regulator of CBF1/DREB1B and CBF3/DREB1A expression and plays a central role in stress tolerance in Arabidopsis. Proc Natl Acad Sci USA. 2004, 101: 3985-3990. 10.1073/pnas.0303029101.PubMedPubMed CentralView ArticleGoogle Scholar
- Alonso-Blanco C, Gomez-Mena C, Llorente F, Koornneef M, Salinas J, Martínez-Zapater JM: Genetic and molecular analyses of natural variation indicate CBF2 as a candidate gene for underlying a freezing tolerance quantitative trait locus in Arabidopsis. Plant Physiol. 2005, 139: 1304-1312. 10.1104/pp.105.068510.PubMedPubMed CentralView ArticleGoogle Scholar
- Fowler SG, Cook D, Thomashow MF: Low temperature induction of Arabidopsis CBF1, 2, and 3 is gated by the circadian clock. Plant Physiol. 2005, 137: 961-968. 10.1104/pp.104.058354.PubMedPubMed CentralView ArticleGoogle Scholar
- Doyle JJ, Doyle JL: Isolation of plant DNA from fresh tissue. Focus. 1990, 12: 13-15.Google Scholar
- TAIR, The Arabidopsis Information Resource. [http://www.arabidopsis.org]
- Rozen S, Skaletsky HJ: Primer3 on the WWW for general users and for biologist programmers, Bioinformatics Methods and Protocols: Methods in Molecular Biology. Edited by: Krawetz S, Misener S. 2000, Humana Press, Totowa, NJ, 365-386.Google Scholar
- Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG: The Clustal_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 1997, 25: 4876-4882. 10.1093/nar/25.24.4876.PubMedPubMed CentralView ArticleGoogle Scholar
- Kumar S, Tamura K, Nei M: MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief Bioinform. 2004, 5: 150-163. 10.1093/bib/5.2.150.PubMedView ArticleGoogle Scholar
- Nei M, Gojobori T: Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol. 1986, 3: 418-426.PubMedGoogle Scholar
- Tajima F: Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989, 123: 585-595.PubMedPubMed CentralGoogle Scholar
- Fay JC, Wu CI: Hitchhiking under positive Darwinian selection. Genetics. 2000, 155: 1405-1413.PubMedPubMed CentralGoogle Scholar
- Rozas J, Sanchez-DelBarrio JC, Messeguer X, Rozas R: DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics. 2003, 19: 2496-2497. 10.1093/bioinformatics/btg359.PubMedView ArticleGoogle Scholar
- McDonald JH: Improved tests for heterogeneity across a region of DNA sequence in the ratio of polymorphism to divergence. Mol Biol Evol. 1998, 15: 377-384.PubMedView ArticleGoogle Scholar
- Saitou N, Nei M: The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987, 4: 406-425.PubMedGoogle Scholar
- Hasegawa M, Kishino H, Yano T: Dating the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol. 1985, 22: 160-174. 10.1007/BF02101694.PubMedView ArticleGoogle Scholar
- Swofford D: PAUP*: phylogenetic analysis using parsimony (*and other methods). Version 4. 2002, Sinauer Associates, Sunderland, MAGoogle Scholar
- Nielsen R, Yang Z: Likelihood models for detecting positively selected amino acids and applications to the HIV-1 envelope gene. Genetics. 1998, 148: 929-936.PubMedPubMed CentralGoogle Scholar
- Yang Z: Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol Biol Evol. 1998, 15: 568-573.PubMedView ArticleGoogle Scholar
- Yang Z, Nielsen R, Goldman N, Pederson AMK: Codon-substitution models for heterogeneous selection pressures among sites classes. Genetics. 2000, 155: 431-449.PubMedPubMed CentralGoogle Scholar
- Bielawski JP, Yang Z: Maximum likelihood methods for detecting adaptive evolution after gene duplication. J Struct Funct Genom. 2003, 3: 201-212. 10.1023/A:1022642807731.View ArticleGoogle Scholar
- Yang Z: PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl BioSci. 1997, 13: 555-556.PubMedGoogle Scholar
- Clement M, Posada D, Crandall K: TCS: a computer program to estimate gene genealogies. Mol Ecol. 2000, 9: 1657-1660. 10.1046/j.1365-294x.2000.01020.x.PubMedView ArticleGoogle Scholar
- Rasmussen R: Quantification on the LightCycler:Rapid Cycle Real-time PCR, Methods and Applications. Edited by: Meuer S, Wittwer C, Nakagawara K. 2001, Springer Press, Heidelberg, 21-34.View ArticleGoogle Scholar
- Nei M: Molecular evolutionary genetics. 1987, Columbia University Press, New YorkGoogle Scholar
- Watterson GA: Number of segregating sites in genetic models without recombination. Theor Populat Biol. 1975, 7: 256-276. 10.1016/0040-5809(75)90020-9.View ArticleGoogle Scholar
- Wang Z, Triezenberg SJ, Thomashow MF, Stockinger EJ: Multiple hydrophobic motifs in Arabidopsis CBF1 COOH-terminus provide functional redundancy in transactivation. Plant Mol Biol. 2005, 58: 543-559. 10.1007/s11103-005-6760-4.PubMedView ArticleGoogle Scholar
- Nordborg M, Hu TT, Ishino Y, Jinal J, Christopher T, Zheng H, Bakker E, Calabrese P, Gladstone J, Goyal R, Jakobsson M, Kim S, Morozov Y, Padhukasahasram B, Plagnol V, Rosenberg NA, Shah C, Wall JD, Wang J, Zhao K, Kalbfleisch T, Schulz V, Kreitman M, Bergelson J: The pattern of polymorphism in Arabidopsis thaliana. Plos Biol. 2005, 3: 1289-1299. 10.1371/journal.pbio.0030196.View ArticleGoogle Scholar
- Schmid KJ, Ramos-Onsins S, Ringys-Becktein H, Weisshaar B, Mitchell-Olds T: A multilocus sequence survey in Arabidopsis thaliana reveals a genome-wide departure from a neutral model of DNA sequence polymorphism. Genetics. 2005, 169: 1601-1615. 10.1534/genetics.104.033795.PubMedPubMed CentralView ArticleGoogle Scholar
- ZT software. [http://www.psb.ugent.be/~erbon/mantel]
- Olsen KM, Womack A, Garrett AR, Suddith JI, Purugganan MD: Contrasting evolutionary forces in the Arabidopsis thaliana floral developmental pathway. Genetics. 2002, 160: 1641-1650.PubMedPubMed CentralGoogle Scholar
- Sharbel T, Haubold B, Mitchell-Olds T: Genetic isolation by distance in Arabidopsis thaliana: biogeography and postglacial colonization of Europe. Mol Ecol. 2000, 9: 2109-2118. 10.1046/j.1365-294X.2000.01122.x.PubMedView ArticleGoogle Scholar
- PLANTCARE. [http://bioinformatics.psb.ugent.be/webtools/plantcare/html]
- Novillo F, Medina J, Salinas J: Arabidopsis CBF1 and CBF3 have a different function than CBF2 in cold acclimation and define different gene classes in the CBF regulon. Proc Natl Acad Sci USA. 2007, 104: 21002-21007. 10.1073/pnas.0705639105.PubMedPubMed CentralView ArticleGoogle Scholar