Comparative genomic analysis of 1047 completely sequenced cDNAs from an Arabidopsis-related model halophyte, Thellungiella halophila

Background Thellungiella halophila (also known as T. salsuginea) is a model halophyte with a small size, short life cycle, and small genome. Thellungiella genes exhibit a high degree of sequence identity with Arabidopsis genes (90% at the cDNA level). We previously generated a full-length enriched cDNA library of T. halophila from various tissues and from whole plants treated with salinity, chilling, freezing stress, or ABA. We determined the DNA sequences of 20 000 cDNAs at both the 5' and 3' ends, and identified 9569 distinct genes. Results Here, we completely sequenced 1047 Thellungiella full-length cDNAs representing abiotic-stress-related genes, transcription factor genes, and protein phosphatase 2C genes. The predicted coding sequences, 5'-UTRs, and 3'-UTRs were compared with those of orthologous genes from Arabidopsis for length, sequence similarity, and structure. The 5'-UTR sequences of Thellungiella and Arabidopsis orthologs shared a significant level of similarity, although the motifs were rearranged. While examining the stress-related Thellungiella coding sequences, we found a short splicing variant of T. halophila salt overly sensitive 1 (ThSOS1), designated ThSOS1S. ThSOS1S contains the transmembrane domain of ThSOS1 but lacks the C-terminal hydrophilic region. The expression level of ThSOS1S under normal growth conditions was higher than that of ThSOS1. We also compared the expression levels of Na+-transport-system genes between Thellungiella and Arabidopsis by using full-length cDNAs from each species as probes. Several genes that play essential roles in Na+ excretion, compartmentation, and diffusion (SOS1, SOS2, NHX1, and HKT1) were expressed at higher levels in Thellungiella than in Arabidopsis. Conclusions The full-length cDNA sequences obtained in this study will be essential for the ongoing annotation of the Thellungiella genome, especially for further improvement of gene prediction. Moreover, they will enable us to find splicing variants such as ThSOS1S (AB562331).


Background
Thellungiella halophila (also known as T. salsuginea) is used as a model system for understanding abiotic stress tolerance. It shows tolerance not only to extreme salinity stress, but also to chilling, freezing, and ozone stresses [1][2][3][4][5][6][7][8][9][10]. Thellungiella is closely related to Arabidopsis, with 90% cDNA sequence identity between the two species, and it can be easily transformed by using the floral dipping method [1,11]. Thellungiella has a number of other features useful for genetic research, such as small size, short life cycle, high seed number, and self-compatibility.
The Arabidopsis genome sequence and other genetic resources, including collections of full-length cDNAs, have provided powerful tools for comparative genomics to understand the biology and evolution of other plants [3,5,12]. In particular, highly accurate full-length cDNA sequences that span the entire protein-coding region of a given gene can advance comparative, functional, and structural genome analyses. The accurate prediction of protein-coding regions in genome sequences is limited by the difficulty of finding islands of coding sequences within an ocean of noncoding DNA, and by the complexity of individual genes that may code for multiple peptides through alternative splicing. The sequence data from full-length cDNAs has contributed to the accuracy of annotation and to improving gene prediction in Arabidopsis [13][14][15]. For these reasons, we have been working to collect similar data for Thellungiella.
We previously reported construction of a full-length cDNA library of Thellungiella derived from various tissues and from whole seedlings subjected to environmental stress treatments, including high salinity, chilling, freezing, and abscisic acid (ABA). We obtained a total of 35 171 sequences from 20 000 clones, and named them RIKEN Thellungiella Full-length (RTFL) cDNA clones. These sequences were assembled by using the CAP3 method and were clustered into 9569 nonredundant cDNA groups [16].
Thellungiella has an effective system for suppressing Na+ influx and for excreting Na+ [3]. It also exhibits high potassium/sodium selectivity, according to electrophysiological analysis of instantaneous current [4]. This implies that Thellungiella has ion channels with specific features that lead to superior sodium/potassium homeostasis. Membrane transporters have been shown to be important components of salt tolerance mechanisms in other species on account of their regulation of ion homeostasis. For example, the SALT OVERLY SENSITIVE (SOS) pathway is a well-defined pathway in Arabidopsis for the regulation of sodium ion homeostasis during plant growth under salinity stress [17,18]. In this pathway, a calcium-binding protein, SOS3, perceives a change in intracellular calcium concentration induced by salt stress and then binds to and activates SOS2, a serine/threonine protein kinase. The SOS3-SOS2 complex increases the expression and activity of SOS1, which encodes a plasma membrane Na+/H+ exchanger (antiporter) [19,20]. Activated SOS1 transports cytosolic sodium out of the cell, reducing the cellular build-up of toxic levels of sodium [17]. The Thellungiella SOS1 gene, ThSOS1, encodes a protein whose amino acid sequence and structure are conserved with those of its orthologs in Arabidopsis and other plants [21]. Transgenic Thellungiella plants in which ThSOS1 transcript levels were reduced by RNA interference (RNAi) showed lower salt tolerance than wild-type plants, suggesting that SOS1 is critical for salt tolerance in halophytic species as well as in glycophytic species such as Arabidopsis [21]. Recently, a 193-kb Thellungiella BAC clone containing the putative SOS1 locus was sequenced, annotated, and compared with the sequence in the orthologous 146-kb region of the Arabidopsis genome on chromosome 2 [22].
Here, we selected 1047 cDNAs for genes related to salt stress, transcription factors, transporters, and protein phosphatase 2Cs from 9569 individual RTFL clones, and determined the complete sequences. We then predicted the coding sequence (CDS), 5'-UTR, and 3'-UTR for each of the cDNAs and compared them with the corresponding regions from the orthologous Arabidopsis genes. We also compared the expression levels of Thellungiella and Arabidopsis Na+-transport-system genes by using full-length cDNAs to probe Northern blots under equal conditions of hybridization and detection.

Comparison of CDS, 5'-UTR, and 3'-UTR sequences between orthologous genes in Thellungiella and Arabidopsis
To assess the quality of the completely sequenced cDNAs, we performed BLAST analysis using CDS sequences against nucleotide or peptide sequences from the TAIR8 dataset (see Additional file 1, Table S1 and Additional file 2, Table S2) and identified Arabidopsis orthologs of the 1047 Thellungiella genes. The average lengths of the Arabidopsis orthologous CDSs, 5'-UTRs, and 3'-UTRs were 1331 ± 698 bp, 160 ± 145 bp, and 241 ± 146 bp, respectively. Figure 2 compares lengths and identities between the CDS, 5'-UTR, and 3'-UTR regions of the 1047 Thellungiella cDNAs and those of the orthologous genes from Arabidopsis. Most CDS pairs showed highly similar lengths, whereas the 5'- and 3'-UTR pairs showed significant variation in length (Figure 2A). The average nucleotide identity within homologous CDS pairs was 87%, whereas the average identity within the 5'- and 3'-UTR pairs was only 57% to 61% (Figure 2B). A previous analysis of the transcriptional differences between Thellungiella and Arabidopsis showed that Arabidopsis has a global defense strategy that requires bulk gene expression, while Thellungiella induces expression of genes functioning in protein folding, posttranslational modification, and protein redistribution [5]. The sequence diversity in the 5'- and 3'-UTR pairs may be involved in the posttranslational regulation of stress tolerance mechanisms in Thellungiella.
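The text does not state how the pairwise identities or the mean ± SD values were calculated. As a minimal Python sketch of one common approach, the code below assumes that each orthologous pair has already been aligned (gaps written as "-"), counts identity over all aligned columns, and uses the sample standard deviation. The function names and the toy sequences are illustrative only and are not taken from the study.

```python
from statistics import mean, stdev


def percent_identity(aln_a, aln_b):
    """Percent identity between two rows of a pairwise alignment.

    Assumption: identity = matching columns / all aligned columns,
    where columns that are gaps in both rows are ignored.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aln_a.upper(), aln_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def mean_and_sd(values):
    """Mean and sample standard deviation (n - 1 denominator)."""
    return mean(values), stdev(values)


# Toy example with made-up data, not values from the study.
identity = percent_identity("ATGC-A", "ATGGTA")
avg, sd = mean_and_sd([100, 200, 300])
print(f"identity = {identity:.1f}%, length = {avg:.0f} ± {sd:.0f} bp")
```

Other identity definitions (for example, excluding gapped columns from the denominator) give different percentages, so the definition should be reported alongside the values.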

Comparison of structure of UTR regions between Thellungiella and Arabidopsis
To compare the overall architecture of the UTRs between Thellungiella and Arabidopsis, we randomly selected 10 orthologous pairs with 5'-UTRs of at least 50 bp in length from each of three groups: PP2Cs, transcription factors, and transporters. We identified motif families shared between the 5'-UTRs of Thellungiella and Arabidopsis orthologs using the Dragon Motif Builder system [26]. Analyzing each of the orthologous pairs individually, we compared the order of the shared motifs between each pair (Additional file 3, Figure S1). Figure 3 shows the arrangement of the motifs in 5'-UTR regions in nine orthologous gene pairs in Thellungiella and Arabidopsis. These motif sequences are shown in Additional file 4, Table S3. The members of each orthologous 5'-UTR pair shared 4 to 19 motifs; however, positional rearrangements were found between the members of each pair. Similar positional rearrangements of 5'-UTR motifs were reported in a comparison of 48 pairs of orthologous sequences between common carp and zebrafish [27]. The presence of such shared motif families suggests the existence of regulatory components common to both species.
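The comparison of motif order within an orthologous pair can be illustrated with a short Python sketch. It does not reproduce the Dragon Motif Builder analysis. It assumes that each 5'-UTR has already been reduced to an ordered list of motif-family labels (the labels M1, M2, ... are hypothetical) and uses the longest common subsequence of the shared motifs as one possible measure of how much of the order is conserved.

```python
def shared_motif_orders(motifs_a, motifs_b):
    """Keep only motif families present in both UTRs, preserving 5'-to-3' order."""
    shared = set(motifs_a) & set(motifs_b)
    order_a = [m for m in motifs_a if m in shared]
    order_b = [m for m in motifs_b if m in shared]
    return order_a, order_b


def lcs_length(x, y):
    """Length of the longest common subsequence of two lists (dynamic programming)."""
    table = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i, xi in enumerate(x, start=1):
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(x)][len(y)]


def is_rearranged(motifs_a, motifs_b):
    """True if the shared motifs do not occur in the same order in both UTRs."""
    order_a, order_b = shared_motif_orders(motifs_a, motifs_b)
    return order_a != order_b


# Hypothetical motif orders for one orthologous pair.
thellungiella = ["M1", "M2", "M3", "M4"]
arabidopsis = ["M3", "M1", "M5", "M2"]
a, b = shared_motif_orders(thellungiella, arabidopsis)
print(len(set(a)), "shared motif families;",
      lcs_length(a, b), "in conserved order;",
      "rearranged" if is_rearranged(thellungiella, arabidopsis) else "collinear")
```

In this toy pair three motif families are shared, but only two of them keep their relative order, which is the kind of positional rearrangement described above.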

Structural comparison of ThSOS1 and splicing variant ThSOS1S
Only one clone was orthologous to Arabidopsis SOS1 among 20 000 sequenced Thellungiella cDNAs [16]. The deduced protein was a splice variant of ThSOS1 (Acc. No. EF207775.1). In the variant, an exon encoding 19 amino acid (aa) residues (60 nucleotides) followed by a stop codon was inserted at the beginning of the 15th exon of ThSOS1 (Figure 4A, B). We named the short variant ThSOS1S, for Thellungiella halophila Salt Overly Sensitive 1 Short form. ThSOS1 comprises an N-terminal, integral membrane domain (responsible for Na+ transport) and a C-terminal hydrophilic region. In contrast, the predicted ThSOS1S protein has the transmembrane domain of ThSOS1 but lacks the C-terminal hydrophilic region, because the stop codon occurs just after the sequence encoding the transmembrane domain (Figure 4C).
The transmembrane portion of ThSOS1/ThSOS1S has sequence similarities with plasma membrane Na+/H+ exchangers of animal, bacterial, and fungal cells [20]. In animal cells, Na+/H+ exchanger 1 (NHE1) functions as a Na+/H+ antiporter to maintain pH homeostasis [28]. NHE1 has a C-terminal tail of ~300 aa, which is important in regulating the Na+/H+ antiporter activity through phosphorylation or binding of regulatory proteins [29]. The Synechocystis Na+/H+ antiporter SynNhaP also has a long hydrophilic C-terminal tail (100 aa). In SynNhaP, the deletion of a 56-aa hydrophilic terminal region partially inhibited the antiporter activity, and replacement of the long C-terminal tail with the orthologous region from the halotolerant cyanobacterium Aphanothece halophytica, ApNhaP, altered its ion specificity [30]. Arabidopsis Na+/H+ antiporter SOS1 has 12 predicted transmembrane domains in the N-terminal region and a long cytoplasmic tail of ~700 aa at the C-terminus [20]. The predicted cytoplasmic tail of SOS1 interacts with radical-induced cell death 1 (RCD1), a regulator of oxidative stress responses under salt or oxidative stress. Like rcd1 mutants, sos1 mutants show an altered sensitivity to oxidative stresses [31]. These results suggest that the long C-terminal tail mediates not only the regulation of transport activity with a variety of intracellular regulatory proteins, but also the ion specificity and the cross-talk with other stress tolerance mechanisms. The N-terminal transmembrane region of SOS1 shows high similarity among various organisms (Figure 4A), whereas there is no significant similarity among the C-terminal regions [30]. The C-terminal sequence variation may result in different functions for this region among different organisms. In particular, NhaP, a Na+/H+ antiporter of Pseudomonas aeruginosa, is highly homologous to SOS1, NHE1, SynNhaP, and ApNha1 (Figure 4A), but it does not have the C-terminal long tail [32]. ThSOS1S is similar to NhaP in that it contains only a Na+/H+-exchanger domain in the transmembrane domain. It is possible that ThSOS1S functions as an Na+/H+ antiporter whereas ThSOS1 functions not only in salt stress response (via the N-terminal Na+/H+ antiporter), but also in response to other abiotic stresses (via the long C-terminal tail).

Expression levels of ThSOS1 and ThSOS1S
We performed qRT-PCR analysis of ThSOS1 and ThSOS1S expression by using primers specific to each of these splice variants (Figure 5A). We detected both transcripts, suggesting that Thellungiella normally produces both forms (Figure 5B). Interestingly, the expression level of ThSOS1S under normal growth conditions was higher than that of ThSOS1. The expression level of SOS1 in Thellungiella is higher than that in Arabidopsis when full-length cDNAs are used as probes, especially under normal growth conditions [3]. These data suggest that the high expression of ThSOS1 detected under normal growth conditions derives from the high expression level of ThSOS1S. To determine whether a similar splice variant exists in Arabidopsis, we performed RT-PCR using primer sets able to detect the splice variants in both Thellungiella and Arabidopsis. The short splice variant corresponding to ThSOS1S was not detected in Arabidopsis, whereas both splice variants were detected in Thellungiella (Figure 5C). This result suggests that the short splice variant of SOS1 is specific to Thellungiella.

Expression profiles of Na+ transport genes of Thellungiella and Arabidopsis
The set of completely sequenced Thellungiella cDNA clones contains several genes that function in the Na+ transport system, including SOS1, NHX1, NHX2, NHX5, and high affinity K+ transporter 1 (HKT1). We performed RNA blot analysis of these genes in both Thellungiella and Arabidopsis using full-length cDNAs as probes. In each case, RNA blots of a given species were hybridized with probes derived from that same species, with conditions of probe radioactivity, hybridization, and exposure period normalized between the two species. The expression levels of SOS1, NHX1, NHX2, and HKT1 in Thellungiella were higher than those in Arabidopsis under both normal and high-salinity conditions (Figure 6). SOS1, NHX1, and HKT1 play essential roles in salt tolerance in Arabidopsis [33][34][35], and transgenic plants overexpressing either SOS1 or NHX1 show higher tolerance to salt stress than do wild-type plants [18,36]. In particular, the expression level of NHX1 was very high in Thellungiella under both normal and high-salinity conditions, suggesting that the constitutively high expression of molecules functioning in Na+ transport may partly account for the high salinity tolerance of Thellungiella.

Conclusions
We sequenced 1047 Thellungiella cDNAs and used this information to compare the responses of Thellungiella and Arabidopsis to high-salinity conditions. The full-length cDNA sequences will contribute to annotation of the Thellungiella genome and will improve gene predictions. Moreover, these fully sequenced cDNAs will facilitate the discovery of splicing variants such as ThSOS1S. RNA blot analysis indicated that the extreme salt tolerance of Thellungiella might be attributable to the constitutively higher expression of genes functioning in the Na+ transport system.

Data access
Sequences from this study have been deposited in NCBI GenBank under accession numbers [GenBank: AK352512] to [GenBank: AK353558]. The RTFL clones are available for distribution from the RIKEN Bioresource Center http://www.brc.riken.go.jp/lab/epd/Eng/.

Determination of CDSs, 5'-UTRs, and 3'-UTRs of full-length cDNAs
The locations of CDSs were determined with the EMBOSS getorf program (ver. 6 [40]), which identifies the longest stretch of uninterrupted sequence between a start codon (ATG) and stop codon (TGA, TAG, TAA) in the 5'-to-3' direction as the predicted CDS. The sequences before and after each predicted CDS were designated as the 5'- and 3'-UTRs, respectively. The 3' poly(A)-tail lengths were not included when determining the UTR lengths.
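The rule described above (longest ATG-to-stop stretch on the forward strand, with the flanking sequences taken as UTRs) can be sketched in a few lines of Python. This is an independent illustration of the rule, not the getorf implementation, and it makes two assumptions that are not stated in the text: the stop codon is counted as part of the CDS, and a trailing run of at least 10 A residues is treated as the poly(A) tail. The function names and the `polya_min` threshold are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest ATG...stop ORF on the forward strand.

    `end` is exclusive and includes the stop codon. Returns None if no
    complete ORF is found in any of the three forward reading frames.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best


def split_cdna(seq, polya_min=10):
    """Split a cDNA into (5'-UTR, CDS, 3'-UTR), dropping a terminal poly(A) tail."""
    seq = seq.upper()
    orf = longest_orf(seq)
    if orf is None:
        raise ValueError("no complete ORF found")
    start, end = orf
    utr3 = seq[end:]
    trimmed = utr3.rstrip("A")
    if len(utr3) - len(trimmed) >= polya_min:  # assumed poly(A) threshold
        utr3 = trimmed
    return seq[:start], seq[start:end], utr3


# Toy cDNA (made up): 4-nt 5'-UTR, 12-nt CDS, 4-nt 3'-UTR, 12-nt poly(A) tail.
utr5, cds, utr3 = split_cdna("GGCC" + "ATGAAACCCTAG" + "TTGC" + "A" * 12)
print(len(utr5), len(cds), len(utr3))
```

Because the longest ORF is not always the biologically correct CDS (for example, when a cDNA is truncated at the 5' end), predictions made this way are usually checked against a known ortholog.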

Identification of orthologous genes in Arabidopsis and Thellungiella
The CDS data set of 1047 Thellungiella cDNAs was compared with the gene sequences in The Arabidopsis Information Resource (TAIR8) by using BLAST searches (ver. 2.2.17 [41]). The top hit in each BLAST search was assumed to be the Arabidopsis ortholog.
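As a sketch of the top-hit assignment, the Python code below picks the best-scoring subject for each query from BLAST tabular output. It assumes the standard 12-column tabular layout (query, subject, % identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score) and ranks hits by bit score. The clone and gene identifiers in the example are placeholders, not data from this study.

```python
def top_hits(tabular_text):
    """Map each query ID to the subject ID of its highest-bit-score hit.

    Expects tab-separated BLAST tabular output with the bit score in
    column 12; blank lines and comment lines starting with '#' are skipped.
    """
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


# Placeholder hits for two hypothetical clones.
example = "\n".join([
    "clone_A\tAT1G00001.1\t88.5\t900\t100\t2\t1\t900\t1\t898\t0.0\t1200",
    "clone_A\tAT1G00002.1\t75.0\t400\t95\t5\t10\t410\t3\t400\t1e-50\t210",
    "clone_B\tAT1G00003.1\t91.2\t600\t50\t1\t1\t600\t1\t600\t0.0\t950",
])
print(top_hits(example))
```

A top-hit rule is simple but cannot distinguish orthologs from close paralogs; reciprocal best hits are a common stricter alternative, although the text above describes only the one-directional search.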

Measurement of plant Na+ content
Two-week-old Arabidopsis and Thellungiella plants grown on 1/2 MS agar plates were transferred to plates containing 1/2 MS agar medium plus 250 mM NaCl. Plants were harvested at 1, 3, 5, 7, 10, 14, 21, and 28 days after transfer. For each sample, five plants were pooled and soaked in 5 mL sterile distilled water. The leaf-water mixture was boiled for 15 min, filtered through a 0.2-μm filter (Toyo Roshi Kaisha, Ltd.), and diluted 20-fold. The solution was analyzed by using a Shim-pack IC-C3/C3 (S) column (Shimadzu, Japan) on a Shimadzu PIA-1000 Personal Ion Analyzer (Shimadzu, Japan).