Identification of tissue-specific, abiotic stress-responsive gene expression patterns in wine grape (Vitis vinifera L.) based on curation and mining of large-scale EST data sets

Background Abiotic stresses, such as water deficit and soil salinity, result in changes in physiology, nutrient use, and vegetative growth in vines, and ultimately, yield and flavor in berries of wine grape, Vitis vinifera L. Large-scale expressed sequence tags (ESTs) were generated, curated, and analyzed to identify major genetic determinants responsible for stress-adaptive responses. Although roots serve as the first site of perception and/or injury for many types of abiotic stress, EST sequencing in root tissues of wine grape exposed to abiotic stresses has been extremely limited to date. To overcome this limitation, large-scale EST sequencing was conducted from root tissues exposed to multiple abiotic stresses. Results A total of 62,236 expressed sequence tags (ESTs) were generated from leaf, berry, and root tissues from vines subjected to abiotic stresses and compared with 32,286 ESTs sequenced from 20 public cDNA libraries. Curation to correct annotation errors, clustering and assembly of the berry and leaf ESTs with currently available V. vinifera full-length transcripts and ESTs yielded a total of 13,278 unique sequences, with 2302 singletons and 10,976 mapped to V. vinifera gene models. Of these, 739 transcripts were found to have significant differential expression in stressed leaves and berries including 250 genes not described previously as being abiotic stress responsive. In a second analysis of 16,452 ESTs from a normalized root cDNA library derived from roots exposed to multiple, short-term, abiotic stresses, 135 genes with root-enriched expression patterns were identified on the basis of their relative EST abundance in roots relative to other tissues. 
Conclusions The large-scale analysis of relative EST frequency counts among a diverse collection of 23 different cDNA libraries from leaf, berry, and root tissues of wine grape exposed to a variety of abiotic stress conditions revealed distinct, tissue-specific expression patterns, previously unrecognized stress-induced genes, and many novel genes with root-enriched mRNA expression for improving our understanding of root biology and manipulation of rootstock traits in wine grape. mRNA abundance estimates based on EST library-enriched expression patterns showed only modest correlations with those obtained by microarray and quantitative, real-time reverse transcription-polymerase chain reaction (qRT-PCR) methods, highlighting the need for deep-sequencing expression profiling methods.

The roots of terrestrial plants are vital organs for the acquisition of water and essential minerals. As such, roots serve as the first site of perception and/or injury for many types of abiotic stress, including water deficiency, salinity, nutrient deficiency, and heavy metals [37][38][39]. Vitis roots also accumulate a number of unique stilbene and oligostilbene defense compounds, chemical species not found in seed or other phytoalexin-rich tissues [40,41]. Despite the importance of roots, the study of V. vinifera root tissues has been rather limited in contrast to the study of berry tissues. In a comparative EST study, Moser and colleagues generated 1555 ESTs from V. vinifera cv. Pinot Noir root tissue and found them enriched for genes with functions in primary metabolism and energy [42]. Using a 12 K CombiMatrix custom array, Mica and colleagues profiled the expression of microRNAs (miRNAs), small (19-24 nt) noncoding RNAs that negatively regulate gene expression post-transcriptionally in multiple organs. This study showed that roots had nine and four miRNAs with either significantly increased or decreased relative abundance, respectively, relative to leaves and early inflorescences [8]. A framework physical or genetic map has also been developed for wine grape, using resistant and susceptible crosses, to locate genetic determinants associated with resistance to the root pathogen phylloxera [43]. EST transcriptional profiling has recently been used to identify genes that might be involved in resistance to Rhizobium vitis in the semi-resistant Vitis hybrid 'Tamnara' [44].
In grapevine, more than 350,000 EST sequences have been generated and analyzed to identify gene expression related to a wide range of processes including berry development in wine grape [30,45] and in table grape [46], tissue-specific gene expression [6,42], the fulfillment of chilling requirements in dormant grape buds [34], and the characterization of resistance to pathogens such as Xylella fastidiosa [47] and Rhizobium vitis [44]. To discern how steady-state transcript accumulation changes in response to multiple environmental stress treatments, we generated a total of 45,784 ESTs from leaf and berry tissues from vines subjected to abiotic stresses (e.g., salinity, cold, heat, water deficit, and anoxia). These were compared with 32,286 ESTs within 20 libraries derived from leaf and berry tissues deposited in the public databases. Clustering and assembly of leaf and berry ESTs with all available V. vinifera full-length transcripts and ESTs returned a total of 13,278 unique sequences, with 2302 singletons and 10,976 clusters mapping to known gene models. Of these 10,976 unique clusters, 739 transcripts were found to have significant differential expression among the libraries examined. Comparison of in silico digital expression analysis with transcript abundance estimates obtained by Affymetrix Vitis GeneChip ® genome microarrays and quantitative real-time reverse transcription-polymerase chain reaction (qRT-PCR) revealed that EST frequency counts were in moderate agreement with microarray or qRT-PCR analysis. Given the relative lack of ESTs available for grape root tissues, 16,452 ESTs were sequenced from roots of young vines (10 cm in length), grown under unstressed conditions as well as under cold, salinity, and water deficit stress. The major categories of genes expressed in root tissues were defined and 135 genes with root-specific or highly enriched root expression patterns were identified.

Results
EST library analysis from abiotically stressed tissues of Vitis vinifera
cDNA libraries derived from abiotically stressed leaves (Library ID 10208) and berries (Library ID 12435) of V. vinifera cv. Chardonnay were sequenced to generate 24,400 and 21,384 ESTs, respectively (Table 1). In addition, a total of 16,452 ESTs were sequenced from a normalized cDNA library synthesized from Magenta box grown root tissues from cv. Cabernet Sauvignon exposed to control, water deficit, cold, and salinity stress conditions (see Methods section for details) (Library ID 22274). In total, 62,236 expressed sequence tags (ESTs) were generated (Table 1). The leaf and berry libraries were described previously in the context of flower and berry development [6]. In addition, five unstressed leaf libraries, representing a total of 8642 ESTs, 13 whole berry with seeds libraries derived from unstressed source tissues at various stages of berry development, representing a total of 31,840 ESTs, and two root libraries, representing a total of 1657 ESTs, present within the UniGene database [48] were compiled (Table 1). These EST collections were used as tools to identify abiotic stress-responsive transcripts in leaves and berries and root-specific or root-enriched transcripts.
To create up-to-date annotations, each EST was matched with the corresponding "tentative consensus" (TC) contig sequence from the Vitis vinifera Gene Index (VvGI, version 6, July 30, 2008, Dana Farber Cancer Institute) [49] and predicted peptide sequences from the Genoscope 8.4X Vitis vinifera cv. Pinot Noir (GSVIV) genome assembly, August 8, 2007 [1]. A newer version of VvGI (7.0, 4/17/2010) has been released since this analysis was undertaken. However, this release is substantially similar to 6.0, containing the same 25,497 gene models derived from the NCBI RefSeq source and only 4851 additional ESTs, and was not expected to substantially alter the findings presented. A newer 12X coverage draft of the Vitis vinifera genome has also become available. However, some gene models annotated in this 12X draft were found to contain greater frequencies of intron-exon splices not supported by EST evidence (data not shown) and, therefore, the 12X draft was not used.
Because the mixed stress normalized root library was generated using a normalization technique that would, in effect, reduce the apparent expression of the most abundant transcripts, and because few other unstressed root ESTs were available for comparison, characterization of the genes in the root EST library was performed in a separate analysis.

Identifying EST redundancy
When gene expression patterns are inferred from EST frequencies, that is, the number of times the transcript of gene $x_i$ is observed in relation to the total number of random observations of all genes ($x_i / \sum x$), any ESTs from a single clone sequenced from both the 5' and 3' directions must be counted exactly once to avoid overestimating the frequency of genes. cDNA library sequencing strategies varied among sources. Unique clones were identified from the ESTs of bidirectionally sequenced libraries as described in the "Methods" and "Results" sections. This method of redundancy elimination was extended next to those bidirectionally sequenced clones from the non-stressed leaf and berry libraries obtained from the UniGene database (Table 1). Many errors were found in the annotated compositions of leaf (Library IDs: 12752, 12753, 12948, and 12949) and berry libraries (Library IDs: 12754, 13015, 13016 and 13017). The errors and the corrections made are explained below as presented in Figure 1 and summarized in Table 2. For the Cabernet Sauvignon leaf library CA48LN (Library ID 12948), we were able to organize 1486 ESTs into 743 paired reads. Within these pairs, > 68% (509) could not be assigned to the same gene. Similarly high rates of disagreement were found within other libraries listed in Table 2. As these rates were higher than those observed in paired reads from abiotically stressed leaf or berry libraries, the cause or causes of these higher error rates were investigated further.
Table 1 Libraries generated or used in the present study. The Stressed Leaf (SL) and Stressed Berry (SB) libraries were generated previously [6], the Stressed Root "VVM" library was generated specifically for this study, and all other libraries were obtained from the dbEST database maintained by the NCBI. Tissues, dbEST library identifier, sequencing direction, and library descriptions are provided.
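The counting rule described above (each clone contributes once, even when it was read from both ends) can be illustrated with a minimal Python sketch. The clone and gene identifiers are hypothetical, and setting aside clones whose two reads disagree is an assumption made only for illustration; it is not a description of how discordant pairs were resolved in this study.

```python
from collections import Counter


def est_frequencies(reads):
    """Compute per-gene EST frequencies, counting each clone once.

    reads: iterable of (clone_id, gene_id) tuples, one tuple per EST read.
    A clone sequenced from both the 5' and 3' ends yields two reads but is
    counted a single time. Clones whose reads map to different genes are
    returned separately instead of being counted (illustrative choice).
    """
    genes_by_clone = {}
    for clone_id, gene_id in reads:
        genes_by_clone.setdefault(clone_id, set()).add(gene_id)

    counts = Counter()
    discordant = []
    for clone_id, genes in genes_by_clone.items():
        if len(genes) == 1:
            counts[next(iter(genes))] += 1
        else:
            discordant.append(clone_id)

    total = sum(counts.values())
    freqs = {gene: n / total for gene, n in counts.items()} if total else {}
    return freqs, discordant


# Hypothetical reads: clone1 was sequenced from both ends, clone4 is discordant.
reads = [
    ("clone1", "geneA"), ("clone1", "geneA"),
    ("clone2", "geneA"),
    ("clone3", "geneB"),
    ("clone4", "geneB"), ("clone4", "geneC"),
]
freqs, discordant = est_frequencies(reads)
```

Here geneA is counted for two clones rather than three reads, which is the overestimation the redundancy step is meant to prevent.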
The cDNA libraries presented in Table 2 were bidirectionally sequenced and had annotation that allowed for the partial reconstruction of the workflow by which they were prepared and sequenced originally [50], with clone names deposited to NCBI such as "CA48LN09IF-A9, 5'end." This annotation identifies the library ("CA48LN"), a batch number ("09"), the plate within that batch (I), the location on a 96-well plate (A9), and the direction (5'). All ESTs in a given library shared the library stem, batches generally contained four plates (I-IV), and 80% of plates were sequenced from both 5' and 3' directions. When the forward and reverse pairs of ESTs in Library ID 12948 were organized by their 96-well plate well order (A1, A2, ..., A12, ..., H1, ..., H12), various patterns of "well slip" were identified, wherein the gene ID for well A9 (5') matched the gene ID of well A10 (3'), A10 (5') matched A11 (3'), and so forth. The distance of these "well slips" was neither uniform nor consistent.
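As a rough illustration of how such clone names could be decomposed, the following Python sketch parses the example name quoted above and converts a well label into its position in A1-H12 order. The regular expression is an assumption built from that single example: the library stem pattern and the handling of the letter that follows the plate numeral (the "F" in "IF", which the annotation description does not explain) are guesses and may not hold for other libraries.

```python
import re

# Hypothetical pattern derived from the single example "CA48LN09IF-A9, 5'end".
CLONE_RE = re.compile(
    r"^(?P<library>[A-Z]{2}\d{2}[A-Z]{2})"   # library stem, e.g. CA48LN (assumed shape)
    r"(?P<batch>\d{2})"                      # batch number, e.g. 09
    r"(?P<plate>IV|I{1,3})"                  # plate within batch, I-IV
    r"(?P<suffix>[A-Z]?)"                    # unexplained trailing letter, kept as-is
    r"-(?P<well>[A-H](?:1[0-2]|[1-9]))"      # 96-well position, e.g. A9
    r",\s*(?P<end>[53])'\s*end\.?$"          # sequencing direction
)


def parse_clone_name(name):
    """Return the annotation fields of a clone name as a dict, or None."""
    match = CLONE_RE.match(name.strip())
    return match.groupdict() if match else None


def well_index(well):
    """Zero-based position of a well in A1, A2, ..., A12, B1, ..., H12 order."""
    row = ord(well[0]) - ord("A")
    col = int(well[1:]) - 1
    return row * 12 + col


parsed = parse_clone_name("CA48LN09IF-A9, 5'end")
```

Ordering reads by `well_index` within each plate gives the sequential lists that the plate comparisons rely on.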
To determine all pairs of ESTs with incorrectly paired wells, a method was devised that would robustly identify "well slips" of non-uniform distances, analogous to the dot-plot method of local nucleotide sequence alignment [51]. In this method, the gene IDs of ESTs were arranged from A1-H12 for each 5' and 3' plate and plotted along two axes, with a dot placed wherever the gene IDs were identical (Figure 1). The dot plot proved effective at identifying forward-reverse pairing in plates with "well slips," such as in Figure 1A, wherein the four forward and reverse plates of "batch 09" in leaf Library ID 12948 were plotted in the order 1f, 1r, 2f, 2r, 3f, 3r, 4f, 4r along both the x and y axes. The main diagonal bisecting the plot, where the ordered list is identical to itself, is flanked by four offset diagonals that illustrate where the forward and reverse plate pairs match (1f≈1r, 2f≈2r, etc.). The matching clearly distinguished pairs of plates through the variable "well slips" in Library ID 12948.
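The dot-plot idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the procedure used for Figure 1: the gene IDs are hypothetical, each plate is reduced to eight wells, and the slip is uniform, whereas the slips observed in Library ID 12948 varied in distance.

```python
from collections import Counter


def dot_plot(ids_x, ids_y):
    """Return the set of (i, j) positions where both gene IDs exist and match."""
    return {
        (i, j)
        for i, a in enumerate(ids_x)
        for j, b in enumerate(ids_y)
        if a is not None and a == b
    }


def diagonal_offsets(ids_fwd, ids_rev):
    """Count matching wells per diagonal offset (reverse index - forward index).

    An offset of 0 is a correctly paired plate; a dominant non-zero offset
    suggests a well slip of that distance.
    """
    counts = Counter()
    for i, j in dot_plot(ids_fwd, ids_rev):
        counts[j - i] += 1
    return counts


# Hypothetical plate: the reverse reads are shifted by one well, so the
# forward read in well i shares its gene ID with the reverse read in well i + 1.
forward = ["g%d" % k for k in range(8)]
reverse = [None] + forward[:-1]
offsets = diagonal_offsets(forward, reverse)
best_offset, n_matches = offsets.most_common(1)[0]
```

For non-uniform slips, the individual `(i, j)` dots returned by `dot_plot` can be inspected directly instead of relying on a single dominant offset.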
This matching process was repeated for all plate batches (generally four forward and four reverse plates per batch) of the libraries listed in Table 2, and other error types besides the "well slips" seen in Library ID 12948 were uncovered. Some plates were duplicated, as seen in Figure 1B, wherein all combinations of the forward and reverse of four individual plates matched in leaf Library ID 12753 (1f≈1r≈2f≈2r). Had these errors not been identified, the ESTs of plates 1 and 2 would both have been added to the frequency totals of the genes therein (i.e., counting twice what should only be counted once), resulting in an overestimation of the frequency of those transcripts in the library. Other pairs of plates showed a less complete duplication pattern, as seen in the incomplete diagonals between plates 1 and 2 (pink) and between 2 and 3 (purple) in Figure 1E, and among all four plates (purple) in Figure 1F. In other cases, a plate did not match the annotated reverse, but a different plate instead, such as the pair-swapping of Library ID 12948 (3f≈4r and 4f≈3r) in Figure 1C and triplication (2f≈2r≈3r) / mis-pairing (3f≈1r) in berry Library ID 13016 (Figure 1D). Where identified, these partial duplications and mismatched plates were handled just as the full duplications were, with the EST counts reduced to reflect the true number of independent clones involved.
The same analytical method was then extended to compare every plate in a library to all other plates in that library. One additional case of unexpected matching was found, where plates from one batch matched the plates of a different batch in the same library (Figure 1G). Lastly, we extended the method to compare every plate in every library against all other plates in all other libraries, even those annotated as arising from different tissues. From this, a single instance was found where a plate in the leaf Library ID 12752 (plate 4r) was identical to a pair of plates from berry Library ID 12754 (Figure 1H). The genes encoded on these plates were consistent with those found in a mature berry library (e.g., cell wall proteins, ripening-related proteins, and no photosynthesis genes), but not a leaf library, leading to the conclusion that a cDNA library misassignment error had occurred, and leading to the exclusion of these data from our analyses. To uncover other possible library assignment errors, every plate from all libraries in the present study was compared against all other libraries (e.g., bud tissues, petioles, flowers, and pathogen-infected leaves) that were not considered for our abiotic stress analysis, but no further spurious pairings were detected (data not shown). Upon exhaustively identifying all observable patterns of errors, 5' ESTs were paired with their 3' partners and the unique clones within each library were counted (Table 1). In total, errors in the identification/annotation of 5558 of 23,351 ESTs (24%) were discovered from the libraries listed in Table 2.
Figure 1 Correcting erroneous EST identities in bidirectionally sequenced leaf and berry libraries with dot plots. Contig names assigned to ESTs from bidirectionally sequenced libraries were plotted in two dimensions to identify "motifs of self-similarity" analogous to dot-plot sequence alignments. The sequencing batch, plate order, and well position were recapitulated from dbEST submission files as a sequential list arranged as 1f, 1r, 2f, 2r, 3f, 3r, 4f, 4r, and plotted against itself in the x and y axes. A) Diagonals indicate four sets of plates from Library ID 12948, batch 8 are named and paired correctly (blue); B) Library ID 12753, batch 1, all combinations of plates 1f, 1r, 2f and 2r are duplicates (salmon), plates 3 and 4 are correctly paired (blue); C) Library ID 12948, batch 10, plate 1f matches 1r (blue), plates 2f and 2r did not match, plate 3f matches 4r (salmon), 4f matches 3r (magenta); D) Berry Library ID 13016, batch 1, plate 3r matches with 2f and 2r (salmon), 1r matches with 3f (magenta), 1f has no match, plate 4 is paired correctly (blue); E) Library ID 13017, batch 2, plates 1 and 2 display partial matching (pink), plates 2 and 3 also partially match (purple); F) Berry Library ID 13017, batch 3, partial matching between all four plates (purple); G) Berry Library ID 13015, batch 2, plate 1 matches batch 5 plate 1r (salmon); other plate match errors are also apparent in the lower right-hand quadrant (magenta); H) Leaf Library ID 12752, batch 5, plate 4r matches Berry Library ID 12754, batch 5, plates 4fr (salmon).

Estimating gene expression by EST frequency
In order to measure differences in gene expression patterns among stressed and unstressed leaves and berries, the EST frequency within each GSVIV gene ID (or UniGene ID, in cases where no GSVIV gene model could be assigned) was calculated for each leaf, berry, stressed leaf, and stressed berry library. The EST frequencies of the five leaf libraries were combined by weighted mean, as were the 13 berry frequencies [52]. Differential gene expression was then calculated using the combined EST frequency counts for genes using the IDEG6 web tool [53]. The chi-squared test (χ²) was used as the test statistic, as recommended when conducting statistical comparisons of more than two groups [54]. At a p-value cutoff of < 0.001, 739 genes were estimated to have differential expression among the libraries compared. The 739 genes were then organized by hierarchical clustering, using a function of the Pearson correlation coefficient as the distance metric and the average agglomeration method (Figure 2). The sets of genes clustered first between tissue type, as seen by the first branching in the dendrogram, and then by control or abiotic stress condition, as seen in the next two branches. At this distance the four clusters generally correlated to transcript abundance profiles within a single library type with the largest cluster of 355 transcripts corresponding to tissues of stressed leaves (SL). The leaf cluster (L) contained 127 genes, whereas stressed berry (SB) and unstressed berry clusters (B) contained 127 and 130 genes, respectively. The annotation, gene models, and relative frequencies of all 739 genes are listed by cluster in Additional Files 1, 2, 3 and 4. The high number of transcripts present within the stressed leaf cluster might reflect the depth to which this library was sequenced, the variety of abiotic stresses to which these source tissues were subjected, and the diversity of transcripts expressed within the grape leaf transcriptome under abiotic stresses [10]. Of these 739 genes with differential expression among the cDNA library clusters, 637 were matched successfully to GSVIV gene/protein identifiers, which were then matched with the annotation files associated with VitisNet [55]. VitisNet networks were combined into categories of their major networks, with metabolic networks divided into primary metabolism, photosynthesis, secondary metabolism, and hormone biosynthesis, the latter category being grouped with the hormone signaling category. Gene IDs that were "out-of-network", but that had functional annotations associated with them in the VitisNet master list were also incorporated into the functional category designations. In Figure 3, the functional categories of genes identified within the four major clusters are shown.

Table 2
Error category | Library | Error type | Occurrences | Figure
Well pairing "slips" | Leaf Library ID 12948 | | 10 | Figure 1A
Well pairing "slips" | Leaf Library ID 12949 | | 1 |
Incorrect plate pairings | Leaf Library ID 12753 | Plate quadruplicated | 1 | Figure 1B
Incorrect plate pairings | Berry Library ID 12754 | Plate quadruplicated | 1 |
Incorrect plate pairings | Leaf Library ID 12948 | Plate pair swap | 2 | Figure 1C
Incorrect plate pairings | Berry Library ID 13015 | Plate pair swap | 2 |
Incorrect plate pairings | Berry Library ID 13015 | Plate triplicated | 3 | Figure 1G
Incorrect plate pairings | Berry Library ID 13016 | Plate triplicated | 1 | Figure 1D
Incorrect plate pairings | Berry Library ID 13016 | Plate pair swap | 1 |
Sequences originated from a different library | Leaf Library ID 12752 and Berry Library ID 12754 | Plate of "leaf" ESTs actually a triplicate of berry Lib. 12754 ESTs | 1 | Figure 1H
Errors in the supplied annotation of a set of cDNA clones that were sequenced bidirectionally were identified and corrected to generate accurate counts of EST frequency. Errors are categorized by the scope of the error, from "well slips" between single pairs of 5' and 3' 96-well plates of ESTs, through incorrectly identified pairs of plates of increasing scope. The number of times each error occurred (pairs or larger groups of 96-well plates affected) and was corrected is shown. Errors that are visualized by dot-plot in Figure 1 are cross-referenced.
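To make the chi-squared comparison of EST counts across library groups more concrete, the sketch below computes a Pearson chi-squared statistic for one gene from a 2 × k contingency table (ESTs of the gene versus all other ESTs, in each of k libraries). It is a generic textbook formulation written for illustration and is not claimed to reproduce the exact computation performed by IDEG6. The counts and library sizes are hypothetical; 16.266 is the standard critical value of the chi-squared distribution with 3 degrees of freedom at p = 0.001.

```python
def chi_squared_gene(gene_counts, library_totals):
    """Pearson chi-squared statistic for one gene across k libraries.

    gene_counts[i]    : ESTs of the gene observed in library i
    library_totals[i] : total ESTs (unique clones) in library i
    The statistic has k - 1 degrees of freedom.
    """
    total_gene = sum(gene_counts)
    total_all = sum(library_totals)
    if total_gene == 0 or total_gene == total_all:
        return 0.0  # gene absent everywhere (or the only gene): no variation to test
    stat = 0.0
    for x, n in zip(gene_counts, library_totals):
        expected_gene = total_gene * n / total_all
        expected_rest = (total_all - total_gene) * n / total_all
        stat += (x - expected_gene) ** 2 / expected_gene
        stat += ((n - x) - expected_rest) ** 2 / expected_rest
    return stat


# Hypothetical gene observed in four library groups (L, SL, B, SB),
# each assumed to contain 1000 unique clones.
stat = chi_squared_gene([30, 10, 10, 10], [1000, 1000, 1000, 1000])
CRITICAL_DF3_P001 = 16.266  # chi-squared critical value, df = 3, p = 0.001
is_significant = stat > CRITICAL_DF3_P001
```

With these made-up numbers the statistic is about 20.3, so the gene would pass a p < 0.001 cutoff; a gene present at the same proportion in every library yields a statistic of zero.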
Without over-interpretation, some key differences among the functional categories of genes prominent within each organ/condition are clearly apparent. For example, unstressed leaves (Figure 3A) were distinguished by a large proportion (28%) of primary metabolic genes, with some photosynthetic genes, such as RUBISCO small subunit and plastidic photosynthetic electron transport components, being extremely overrepresented. Transcripts for non-specific lipid-transfer protein, metallothionein, early light-induced protein (ELIP1), and several unknown genes were also highly represented within this cluster along with 23S rRNA (Additional File 2). In stressed leaf, 11% of transcripts encoded photosynthesis-related functions, including plastidic ATP synthase and electron transport chain subunits, suggesting that higher demands and/or damage might occur under stress that must be repaired (Figure 3B). Consistent with this suggestion is the overrepresentation of several families of low-molecular-weight heat shock proteins. Leaves under abiotic stress expressed a greater proportion of specific transport genes (21%) (Additional File 1). Interestingly, the activity of transposons is apparently de-repressed in stressed leaves as judged by the preponderance (7%) of a centromere-specific class of retrotransposons. Similar abiotic induction of retroelements in non-germline tissue has been described in Solanaceous species and the ABA-induction of the Tnt1A promoter in Arabidopsis thaliana [56]. The unstressed berry cluster possessed overrepresented transcripts encoding genes with functions involved in primary metabolism, translation, cell wall-related proteins (9%), and transport (12%) (Figure 3C, Additional File 4).
In contrast, the stressed berry cluster (Figure 3D) had the highest proportion of genes annotated as "stress-responsive" (17%) including overrepresented transcripts encoding xyloglucan endotransglucosylase/hydrolases, a DEAD box RNA helicase, and seed storage proteins including albumins and globulins and several highly abundant unknown proteins (Additional File 3).

Correlation with microarray data
Next, differences in transcript expression patterns estimated by EST frequency were compared with a second platform, the Affymetrix® Vitis GeneChip® microarray. Of the 739 transcripts described above, microarray probeset identifiers could be assigned for 489 of them. All differentially expressed genes available from microarray experiments in which similar stresses were imposed were collected. For leaf tissue, within which our stressed leaf library included a mixture of drought, NaCl, heat, and light stressed tissue, two experiments were used as a source for microarray data: an experiment in which drought and salt stress were applied over a 16 d period [10] and an experiment that analyzed rapid changes (≤ 24 h) in gene expression under osmotic stress (mannitol), NaCl, and chilling exposure [31]. For the berry libraries, microarray data from a drought stressed berry time course experiment of Chardonnay and Cabernet Sauvignon [27] were compared with EST frequency data. Following the example of van Ruissen and colleagues [57], probeset expression values were then compared with EST frequencies using only those probesets for which significant differences were observed between stressed and unstressed tissues in the original microarray experiments. Using this method, 184 comparisons of significantly different changes were plotted (Figure 4). Overall correlation between the microarray and frequency-based expression measures was modest. The non-parametric Spearman rank correlation was modestly positive (r_s = 0.2), but with P < 0.005, indicating that this similarity, while modest, is extremely unlikely to be due to chance alone. Pearson correlation was similar (r = 0.21). In other studies comparing microarray to EST or similar tag-based technologies, modest Spearman and Pearson correlations have been observed [58].
Following the example of Li and colleagues, the directional concordance, which is the directional agreement in either increased or decreased relative transcript abundance in response to stress treatment, was determined among the 184 significant genes common to both the microarray and EST sampling detection methods. In their comparison of SAGE tags with microarrays in multiple human tissues, these authors found 75% directional concordance among significant genes [58]. Similarly, for our 184 shared genes, the directional concordance was 69%, or more than two agreements per disagreement.
Figure 3 Functional categories of differentially expressed transcripts identified by EST frequency analysis. Functional assignments of genes found in the four major clusters of differentially expressed genes. At the chosen hierarchy depth / distance, the four clusters correspond, in large part, to maximal frequencies within A) Leaf, B) Stressed Leaf, C) Berry, and D) Stressed Berry cDNA libraries. Assignments are based upon the data available at VitisNet http://www.sdstate.edu/aes/vitis/pathways.cfm [93]. Chart colors progress clockwise from the top.
In order to verify the gene expression ratios determined by microarray analysis, qRT-PCR was performed on the set of genes listed in Additional File 5. These genes were selected at random and represented genes expressed preferentially in either leaf or berry tissues. Relative mRNA expression for 17 and 22 transcripts was assayed in drought-stressed and well-watered berry tissue and leaf tissue, respectively. A linear regression of the log2 ratios of those genes found strong correlation between transcript abundance measured by microarray and qRT-PCR methods (Pearson correlation, r = 0.85) and a very high degree of directional concordance (34/39 genes or 87%) (Figure 5).
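The three agreement measures used in this section (Pearson correlation, Spearman rank correlation, and directional concordance) can be computed from paired log2 ratios as in the following Python sketch. The input values are invented for illustration and are not data from this study; treating a ratio of exactly zero as a disagreement is an assumption of this sketch.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks, with tied values receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def directional_concordance(xs, ys):
    """Fraction of pairs whose log ratios share the same sign."""
    agree = sum(1 for x, y in zip(xs, ys) if x * y > 0)
    return agree / len(xs)


# Hypothetical log2 ratios (stressed / unstressed) from two platforms.
est_log2 = [1.0, 2.0, -1.0, -2.0, 0.5]
array_log2 = [0.5, 3.0, -0.5, 1.0, 2.0]
rho = spearman(est_log2, array_log2)
concordance = directional_concordance(est_log2, array_log2)
```

In this toy example four of five genes change in the same direction (80% concordance) even though the rank correlation is only 0.5, which mirrors how directional concordance can remain informative when correlation is modest.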

Identification of root-enriched genes
The 16,452 ESTs sequenced from the normalized abiotic stressed Cabernet Sauvignon root cDNA library (VVM) were matched to their VvGI ver. 6 consensus sequence contigs [59] and, when possible, to the 8.4X genomic GSVIV gene/protein identifiers and matched with the annotation files associated with VitisNet [55], resulting in the identification of 6424 non-redundant transcripts. Of these, 6002 were mapped successfully to 8.4X GSVIV gene models, whereas the remaining 307 singletons and 115 VvGI contigs did not match GSVIV gene models. The cDNA library normalization method was successful in generating a highly complex library, with 3449 (54%) unique transcripts being represented by EST singletons. Annotation of the 6424 non-redundant root transcripts revealed 4505 (70%) had known functions, 455 (7%) matched a previously annotated gene model, but the function was unclear, and 1464 (22.8%) had unknown functions, with no homology matches to any previously described gene (Figure 6A). The functional categories were assigned for the 4505 transcripts with known functions (Figure 6B). Overall, the VVM normalized library contained a high diversity of transcripts with the functional categories of primary metabolism, signal transduction, and transport systems being well represented (Figure 6B).
Next, the 16,452 VVM Cabernet Sauvignon root ESTs plus an additional 1657 ESTs from two Cabernet Sauvignon root libraries (Library ID 14445, 16696; Table 1) were analyzed for either root-specific or root-enriched transcripts. These root cDNA libraries were compared with a total of 291,233 ESTs from 114 libraries comprising the NCBI UniGene dataset http://www.ncbi.nlm.nih.gov/UniGene/lbrowse2.cgi?TAXID=29760 [48] with the exception of five EST libraries derived from in vitro or cell cultures (Library ID 10498, 15513), mixed organ (e.g., root and leaf together) cDNA libraries (Library ID 20007, 20010), or an amplified fragment length polymorphism (AFLP) cDNA library (Library ID 20099). Relative EST frequency counts were calculated as previously described using weighted averages for the combination of libraries grouped into either "root" or "non-root" groups.
Figure 4 Scatterplot of EST frequencies compared with microarray expression levels. Log2-transformed frequency distributions of ESTs from mixed stressed leaf (e.g., water deficit, NaCl, heat, high light) and berry (water deficit stress) and unstressed leaf and berry tissue were compared to 184 Affymetrix® Vitis GeneChip® log2-abundance ratios of chilling, osmotic (mannitol), and salt stress, and water-deficit-stressed leaf [10,31] and water-deficit-stressed whole berry tissues [24]. Differences in gene model EST frequencies between stressed and unstressed library pairs (i.e., stressed berries compared with unstressed berries) were plotted along a log2 scale as well. The Spearman rank correlation, r_s, was 0.2047 (P = 0.005). Filled and gray circles indicate agreement and disagreement in directional concordance, respectively. The total number of genes present in each Cartesian quadrant are shown in gray-shaded boxes.
EST frequency counts for genes with two or more ESTs within one or both of the library sets (singletons were removed) and corresponding differential gene expression patterns were calculated with the IDEG6 web tool using the Audic-Claverie statistic (AC), p-value < 0.01. Bonferroni multiple-testing correction was applied to consider only p-values < 3.0 × 10⁻⁶ [53,60]. The comparison of root ESTs against all non-root ESTs resulted in the initial identification of 255 genes that had p-values below the significance threshold. Furthermore, the AC statistic identified 135 "root-enriched" transcripts that showed greater frequencies in root compared with non-root tissues as listed in Table 3. In addition, 119 of the 255 genes were identified as being enriched in the non-root libraries. Because a normalized root cDNA library was analyzed, these 119 genes were not considered further as the normalization process was expected to result in a systematic underrepresentation of highly abundant root transcripts. Evaluation of the functional categories of the 135 root-enriched genes showed that genes for primary and secondary metabolism as well as transport processes were more numerous compared with the entire root EST collection (Figure 6C).
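For readers unfamiliar with the Audic-Claverie statistic, the sketch below implements the published conditional probability p(y | x) = (N2/N1)^y (x + y)! / [x! y! (1 + N2/N1)^(x + y + 1)] and sums it into a one-tailed p-value, followed by a Bonferroni-adjusted threshold. It is an illustration under stated assumptions rather than a reimplementation of IDEG6: the tail convention is one common choice, the library sizes in the example are simply the raw EST totals quoted in the text (not the weighted, redundancy-corrected counts actually analyzed), and the number of tests is hypothetical because it is not stated here.

```python
import math


def ac_prob(x, y, n1, n2):
    """Audic-Claverie probability of observing y tags in library 2 (size n2)
    given x tags in library 1 (size n1), computed in log space."""
    ratio = n2 / n1
    log_p = (
        y * math.log(ratio)
        + math.lgamma(x + y + 1) - math.lgamma(x + 1) - math.lgamma(y + 1)
        - (x + y + 1) * math.log(1 + ratio)
    )
    return math.exp(log_p)


def ac_pvalue(x, y, n1, n2):
    """One-tailed p-value: the smaller of P(Y <= y) and P(Y >= y).

    The upper tail is obtained by subtraction, so extremely small upper-tail
    values lose precision; this is adequate for a sketch.
    """
    lower = sum(ac_prob(x, k, n1, n2) for k in range(y + 1))
    upper = 1.0 - (lower - ac_prob(x, y, n1, n2))
    return min(lower, upper)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Illustrative inputs: 57 root ESTs versus 2 non-root ESTs, with assumed
# library sizes of 18,109 root and 291,233 non-root ESTs.
p_root_enriched = ac_pvalue(57, 2, 18109, 291233)
threshold = bonferroni_threshold(0.01, 3333)  # 3333 tests is a hypothetical count
```

With equal counts in equally sized libraries, `ac_pvalue(10, 10, 1000, 1000)` evaluates to 0.5, as expected when there is no evidence of differential abundance.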

Validation of root-enriched genes
In order to confirm root expression patterns estimated by EST frequency, a set of putative root-specific or root-enriched genes was selected for validation by qRT-PCR. Gene-specific primers were designed for ten of the 135 highly root-enriched transcripts. The selection included not only genes with very high root EST counts, but also genes with lower frequencies that were still considered statistically significant. The gene-specific primers used are listed in Additional File 6. Relative transcript abundance for each gene was tested within root and shoot tissue of Cabernet Sauvignon (Figure 7). Two-way ANOVA by gene and tissue was performed, and both were significant (P < 0.0001).
After ANOVA, Bonferroni-corrected t-statistics were computed for each individual gene between root and shoot tissues. Of these ten transcripts, six were found to be significantly more abundant in roots than shoots by Student's t-statistic (p < 0.01). Transcript abundances ranged from 3.8- to 730-fold greater in roots than shoots. The most highly root-enriched transcript encoded an uncharacterized Vitis tonoplast intrinsic protein TIP1;4 (GSVIVP00024394001) and was detected at 730-fold greater transcript abundance in roots than in shoot tissue. This correlates well with the expression estimated by EST frequencies, where it was found with a frequency of ~33.6 tags per ten thousand (tp10k) in roots compared with 0.1 tp10k in non-root tissues (57 root ESTs compared with 2 non-root ESTs). A resveratrol O-methyltransferase (ROMT, GSVIVP00018661001) that was found with a frequency of 17.7 tp10k in roots (30 root ESTs compared with 0 non-root ESTs) was expressed 120-fold greater in root than in shoot as estimated by qRT-PCR. Similarly, a terpene synthase (TPS) gene, (E, E)-alpha-farnesene synthase [61], was found with a frequency of 13.6 tp10k in roots (23 root ESTs compared with 0 non-root ESTs) and was 44-fold more abundant in root than shoot as assessed by qRT-PCR. A cinnamyl-alcohol dehydrogenase (9 root ESTs compared with 1 non-root EST) was expressed 27-fold greater in roots than in shoots. A flavonol 3-O-glucosyltransferase (10 root ESTs compared with 1 non-root EST) showed an 8.3-fold greater abundance in roots than in shoots. Lastly, a Myb transcription factor-like a gene (5 root ESTs compared with 0 non-root ESTs) was tested to evaluate the selected significance cutoff. This transcript was detected at 3.8-fold greater abundance in roots than in shoots (significant, p < 0.05).
In contrast, three of the genes tested (AP2/ERF114, NGATHA1, and Nitrate Reductase 2) failed to demonstrate a significant difference as measured by the multiple test-corrected t-statistic, and a single transcript, a second Myb transcription factor-like b gene (7 root ESTs compared with 1 non-root EST), was determined to be 2.6-fold less abundant in roots than in shoots (p < 0.05) (Figure 7). For all ten genes tested, the Spearman rank correlation between the two measures of gene expression (EST frequency compared with qRT-PCR) was high (r_s = 0.78, p = 0.005).
Although only ten genes were tested, estimation of transcript abundance by EST frequency was apparently effective in identifying genes with root-specific expression, despite the majority of root ESTs coming from a normalized library source.
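The rank correlation used in this comparison can be sketched as follows. This is an illustrative implementation (Pearson correlation of average ranks), not the software used in the study. The input uses only the seven genes for which both a root EST count and a qRT-PCR fold-difference are stated in the text, and it uses raw root EST counts rather than the EST frequency measure, so it is not expected to reproduce the reported r_s = 0.78 for all ten genes.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# Order: TIP1;4, ROMT, TPS, CAD, UGT, Myb-like a, Myb-like b
root_est_counts = [57, 30, 23, 9, 10, 5, 7]
# Root/shoot fold-differences; Myb-like b was 2.6-fold LESS abundant in roots.
fold_root_vs_shoot = [730, 120, 44, 27, 8.3, 3.8, 1 / 2.6]

print(round(spearman(root_est_counts, fold_root_vs_shoot), 3))
```

Because only ranks enter the calculation, the statistic is insensitive to the very large spread in fold-differences (3.8- to 730-fold).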

Discussion
Table 3 footnote: Root-enriched genes were identified by EST frequency comparison of Vitis vinifera roots compared with all other tissues, using the Audic-Claverie (AC) statistic [60].

Figure 7 Expression of candidate root-specific genes in roots and shoots of Cabernet Sauvignon. qRT-PCR analysis of ten selected transcripts in shoot (white bars) and root (gray bars) tissues. Transcript abundances derived from three biological replicates were normalized to an actin reference gene and fold differences were standardized to shoot expression values. Error bars represent standard error. Two-way ANOVA (gene, tissue) was performed followed by post-test Bonferroni-corrected t-statistics. Significant differences in gene expression (root compared with shoot) are indicated by asterisks. * denotes p < 0.05; ** denotes p < 0.01; *** denotes p < 0.001. Fold-differences (root/shoot) are drawn on a log scale. The tested genes are listed below in the order that they appear on the graph from left to right, with the number of root ESTs compared with non-root ESTs in parentheses. Myb family transcription factor-like b (7 compared with 1); Nitrate reductase 2,

Data mining to discover Vitis vinifera stress-adaptive genes
In order to identify novel transcripts that respond to multiple environmental stress treatments, EST libraries generated by us and those derived from public sources were carefully curated and mined to obtain estimates of transcript abundance based on EST frequencies. A total of 21,499 and 18,963 unique ESTs derived from non-normalized cDNA libraries from mixed abiotic stress leaf and water-deficit stressed berry tissues, respectively, were compared with 5277 and 24,953 unique ESTs derived from cDNA libraries generated with unstressed leaf and berry tissues (Table 1). Tag frequency-based detection of differentially expressed genes is a well-established methodology for ESTs [53,60,62], SAGE [57], and MPSS [7], and continues to be an important tool in the era of "next-generation" deep sequencing of transcriptomes [63]. Aside from the removal of redundant ESTs derived from bi-directional and/or same direction resequencing of individual cDNA clones, one of the main issues encountered during the data curation process was the discovery of various types of naming errors within and across plated clone libraries. With the aid of a simple dot-plot method analogous to that used for local nucleotide sequence alignments [51], gene IDs could be aligned and readily visualized to discover incorrectly paired plates (or portions of plates) containing "well slip" naming errors that would have overestimated the number of ESTs actually present within a particular cDNA library of interest due to duplicated sequencing of plates within the same library (Table 2, Figure 1A-G). Application of this technique also allowed for the discovery of a misassigned plate of ESTs from a leaf cDNA library to a berry cDNA library, an error that would have confounded the accuracy of EST counting with regard to a particular tissue (Figure 1H).

Comparing EST frequency counts from cDNA libraries of mixed or water-deficit stressed leaf and berry tissues, respectively, with those from cDNA libraries from unstressed leaf and berry tissues, a total of 739 transcripts were identified and grouped into four main clusters (Figure 2, Additional Files 1, 2, 3 and 4). Of these, 637 (86%) transcripts could be annotated and assigned to functional categories (Figure 3). Each cluster contained distinct functional groups that reflected clearly the tissue type and treatment condition in question. For example, transcripts encoding the CBL-interacting protein kinase 10 (CIPK10) were overrepresented in both the stressed leaf (SL) and stressed berries (SB) clusters. CIPK10 participates in the calcineurin B-like (CBL) calcium sensor protein-CIPK network that decodes calcium signals in response to environmental perturbations [64].
The Arabidopsis CIPK10 is localized to the nucleus and cytoplasm when expressed as a GFP fusion in Nicotiana benthamiana leaves [65]. CBL-CIPK interactions are crucial for the regulation of ion homeostasis during salinity stress and other forms of environmental stress, not only at the plasma membrane and tonoplast, but also in the cytoplasm and nucleus [65]. The increased abundance of CIPK10 transcripts in these stress-specific cDNA libraries indicates this CIPK might play a role in stress adaptation in both Vitis leaves and berries. Several other stress-specific transcripts appeared to be over-represented in both stress libraries including RD22, a salt-, dehydration-, and ABA-responsive gene in grape berries [66] (Additional Files 1 and 3). In addition to the genes discussed earlier that were enriched within the stressed berry (SB) cluster, several pathogenesis-related (PR) proteins, such as three thaumatin genes, a class IV chitinase gene, two osmotin genes, and Snakin-1, a cysteine-rich peptide that exhibits broad-spectrum antimicrobial activity in vitro and fungal and bacterial pathogen resistance in vivo [67], were also enriched in this cluster. The identification of this collection of PR proteins using the EST frequency counting approach outlined here clearly illustrates its practical utility in the discovery of genetic determinants important for biotic and abiotic stress responses. A large number of unknown genes with discrete, cluster-specific expression patterns were also identified, particularly within the stressed leaf (SL) cluster. Such unknown genes can serve as primary targets for future, detailed investigations into gene function.

Validation of EST frequency counts by microarray analysis
To validate the efficacy of the EST frequency counting method, the 739 transcripts were matched to the Affymetrix® Vitis GeneChip® microarray; 489 could be identified on the array and thus compared using these two distinct technical approaches. The remaining 250 transcripts had no match, and thus, were potentially not described previously as being abiotic stress responsive in Vitis. Between the two platforms, expression data could be compared for 184 transcripts for which significant differences in gene expression patterns were observed using both technologies. As in previous reports comparing tag and hybridization measures [63], a modest (r = 0.21), but significant correlation between the two platforms was observed (Figure 4). Further comparison between the two methods revealed a directional concordance of 69%, indicating that the two platforms agreed to a greater extent on the direction of expression changes than on their magnitude. What might account for these rather modest correlations? First, these low correlations might be related partly to differences in the reported magnitude of increased or decreased transcript abundance. However, for every two genes that were reported increased or decreased significantly by both platforms, one gene changed significantly in opposite directions (Figure 4). Thus, magnitude can only account for part of the disagreement. Second, the use of public data sets, which are highly diverse, might introduce biases in gene representation. Earlier studies that mined public datasets observed somewhat higher concordance: a comparison of EST reads generated by 454 pyrosequencing with microarray mRNA profiles in two porcine tissues found a four-to-one concordance ratio (160 compared with 38) [63], and a comparison of SAGE tags with microarray mRNA profiles within a set of human tissues found a three-to-one concordance ratio [58].
In the present study, while major systematic errors within the public data sets were corrected in an attempt to capture correct frequency counts for unstressed leaf and berry libraries (Figure 1; Tables 1, 2), these public data sets contained large differences in grapevine cultivar, age, developmental stage, season, terroir, and sample preparation that were likely to introduce biases in gene representation. Third, the relative complexity of our mixed stress leaf library might be a source of bias, because the source tissue for this library included RNA from UV- and heat-treated leaves, treatments for which corresponding microarray data were unavailable for comparison. The presence of genes strongly or exclusively regulated by UV or heat stress would be expected to contribute to the population of the significant-by-EST transcripts with which no corresponding microarray data could be compared.
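Directional concordance, as discussed above, is the fraction of genes whose expression changes in the same direction on both platforms. The sketch below shows one way to compute it together with per-quadrant counts; the function names and the six pairs of log2 ratios are hypothetical and are not data from this study.

```python
from collections import Counter

def quadrant(est_log2, array_log2):
    """Label the Cartesian quadrant of one gene (EST ratio, microarray ratio)."""
    est_dir = "up" if est_log2 > 0 else "down"
    array_dir = "up" if array_log2 > 0 else "down"
    return (est_dir, array_dir)

def directional_concordance(est_ratios, array_ratios):
    """Fraction of genes changing in the same direction on both platforms.

    Genes with a log2 ratio of exactly zero on either platform have no
    direction and are skipped (an assumption of this sketch).
    """
    pairs = [(e, a) for e, a in zip(est_ratios, array_ratios) if e != 0 and a != 0]
    counts = Counter(quadrant(e, a) for e, a in pairs)
    agree = counts[("up", "up")] + counts[("down", "down")]
    return agree / len(pairs), counts

# Hypothetical log2 ratios (stressed / unstressed) for six genes
est = [1.2, -0.8, 2.0, -1.5, 0.6, -0.4]
array = [0.9, -1.1, -0.3, -2.0, 1.4, 0.7]

fraction, counts = directional_concordance(est, array)
print(f"{fraction:.2f}", dict(counts))
```

A two-to-one ratio of agreeing to disagreeing genes corresponds to a concordance of 2/3, or about 67%, which is consistent with the 69% reported here.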

EST-based gene discovery in Vitis roots
To redress the relative paucity of available grape root sequence data, more than 16,000 ESTs were generated from a normalized cDNA library (VVM) constructed from Cabernet Sauvignon root tissues exposed to cold, salinity, and water deficit stress (Table 1). During its preparation, this library was normalized with the aim of increasing the number of different and low-abundance root genes identified [68]. The 16,452 ESTs assemble into 6424 unique transcripts, of which 3449 (>53%) were represented just once. Because normalized libraries are biased, resulting in an under-counting of abundant transcripts and over-counting of rare ones, they violate the assumption of random sampling, and as such, are not usually considered for use in tag frequency analyses of gene expression [6,42]. Recognizing that library normalization would, at a minimum, underestimate the true relative expression of most root transcripts, the identification of root-specific or root-abundant ESTs was attempted by EST frequency counting. A total of 18,109 root-derived ESTs were compared with 291,233 ESTs from 114 non-root cDNA libraries. This analysis resulted in the identification of 135 "root-enriched" transcripts with significantly greater EST frequencies in roots than other tissues as determined by the AC statistic (Table 3). Validation of a set of 10 candidate root genes with varying degrees of apparent root enrichment by qRT-PCR confirmed six genes to be significantly more abundant in grapevine roots than in shoots (Figure 7). The correlation between estimated EST frequencies and qRT-PCR expression ratios was strong (r_s = 0.78) and significant (P = 0.005). Shoot tissue was used to confirm broadly, but not exhaustively, that expression patterns were root-enhanced. Confirmation of the root-specific expression patterns of these candidate genes will require that additional non-root tissue types (e.g., stems, flowers, berries, etc.) be tested on a gene-by-gene basis.
Chief among the qRT-PCR-validated root genes is a gene encoding an aquaporin/tonoplast intrinsic protein 1;4 (VvTIP1;4) that was expressed as much as 730-fold more in roots than in shoots. VvTIP1;4 has been previously identified from genomic sequence by two groups [69,70], but has not yet been characterized functionally.
Another root-enriched gene, which showed 120-fold greater mRNA abundance in roots than in shoots by qRT-PCR, encodes a putative resveratrol-O-methyltransferase (ROMT), which is 78% identical and 88% similar to a known Vitis ROMT [71]. The ROMT characterized by Schmidlin and colleagues was observed to doubly O-methylate molecules of resveratrol into pterostilbene, a phytoalexin with 5-10 times greater in vitro fungitoxicity than resveratrol [71]. This root-expressed ROMT is also structurally distinct from a ROMT recently characterized in red berries. The red berry ROMT transcript was more abundant in the red grape Cabernet Sauvignon than the white Chardonnay and had peak expression two weeks after véraison in the red cultivar only [72]. A terpene synthase (TPS) was highly expressed in roots with a 44-fold greater relative abundance in roots than in shoots. Martin and colleagues identified this TPS to be an (E, E)-alpha-farnesene synthase in a thorough survey to characterize V. vinifera TPS genes [61]. This TPS exhibited activity that was unique among the 39 characterized, producing only (E, E)-alpha-farnesene when fed farnesyl diphosphate (FPP), rather than a mixture of multiple products. A cinnamyl-alcohol dehydrogenase (CAD) gene was also confirmed to be 27-fold more abundant in roots than in shoots. CAD genes are crucial for the synthesis of the lignin compounds in wood formation, but some CAD genes might possess other activities or functions. In Arabidopsis, the activity of the promoters of some AtCAD genes has been observed in cells where CAD-mediated lignification does not appear to take place, including young root tips [73]. Lastly, a UDP-glucose O-glucosyltransferase (UGT) gene was 8.3-fold more abundant in roots than in shoots.
When compared to the position-specific scoring matrices (PSSMs) found in NCBI's Conserved Domain Database (CDD) [74], this UGT was most similar to the PLN02554 group of UGTs, which are classified as flavonol 3-O-glucosyltransferases (EC 2.4.1.91). However, determining the exact catalytic activities of UGTs generally requires biochemical characterization as even single amino acid changes in UGT proteins can alter regioselectivity (e.g., which hydroxyl group is glycosylated) or UDP-sugar substrate preference [75,76]. Four other candidate genes were also surveyed, but none were found to exhibit significant, root-enriched mRNA expression at p < 0.01.

Conclusions
Abiotic stresses, especially water-deficit stress, have major impacts on vine growth and berry development that ultimately can impact wine quality. Here, EST frequency counts were exploited to identify candidate genes with mRNA expression profiles altered by abiotic stresses by comparing large EST collections from cDNA libraries prepared from leaves and berries harvested from vines subjected to mixed abiotic stresses to publicly available EST collections from these same tissues harvested from unstressed vines. This analysis identified 739 transcripts with significant differential expression in abiotically stressed leaves and berries. Comparison of EST frequency counts of these genes with available microarray expression data identified 184 genes that also showed significant differences between stressed and unstressed tissues. While the correlation in expression patterns was modest at best, 69% of genes exhibited directional concordance. Furthermore, the EST frequency counting approach led to the identification of many novel candidate genes whose stress-induced mRNA expression patterns had not been described previously. To identify genes preferentially or exclusively expressed in Vitis roots, a tissue that had previously been largely uncharacterized, 16,452 ESTs were characterized from a normalized, abiotically stressed cDNA library from Cabernet Sauvignon. Comparison of these ESTs with publicly available EST collections from non-root tissues allowed for the identification of 135 root-enriched transcripts, a majority of which showed root-preferential mRNA expression when validated by qRT-PCR. This root-enriched EST collection will serve as a rich resource not only for future studies into the abiotic stress-response networks operating within roots, but also for future genotyping efforts of Vitis rootstocks that differ in salinity or drought tolerance characteristics or for manipulation of rootstock traits in wine grape.

Plant material
Total RNA was extracted from abiotically stressed V. vinifera cv. Chardonnay leaf and berry tissue (8, 9, 11, 13, 15, and 16 weeks after flowering) using a modified Tris-LiCl protocol as previously described [77]. Root tissue was collected from 10-cm-high V. vinifera cv. Cabernet Sauvignon cuttings grown in autoclaved, sterile 77 mm × 77 mm × 97 mm (W × L × H) Magenta GA-7 boxes (Magenta Corp., Chicago, IL) containing 80 ml of 1% Plant Tissue Culture Agar (#A111, Phytotechnology Laboratories, Shawnee Mission, KS) with Murashige & Skoog modified Basal Medium w/ Gamborg Vitamins (#M404, Phytotechnology Laboratories), 1.5% sucrose at pH 5.7 [78,79]. Cuttings were grown under fluorescent lamps providing a photon flux density of 50 μmol m⁻² s⁻¹ on a 16-h light (24°C)/8-h dark (18°C) cycle. Roots were detached from non-stressed plants and subjected to control conditions (bathed in liquid MS media as above), water deficit stress conditions by exposure to air (for 2 and 4 h), cold (1.5°C), and 150 mM NaCl (in liquid MS media as above) stress for 2, 4 and 6 h. The 6 h time point for water-deficit stress exposure was not used because intact RNA could not be recovered from root tissue after 4 h of stress.

Leaf and Berry cDNA Library Construction, sequencing and processing
The preparation of the leaf (Library ID 10208) and berry (Library ID 12534) cDNA libraries was described previously [6]. The frozen, ground tissue of Chardonnay leaf and berry was homogenized in a buffer containing 200 mM Tris-HCl, pH 8.5, 1.5% (w/v) lithium dodecyl sulfate, 300 mM LiCl, 10 mM sodium EDTA, 1% w/v sodium deoxycholate, and 1% v/v NP-40. Following autoclaving, 2 mM aurintricarboxylic acid, 20 mM dithiothreitol (DTT), 10 mM thiourea, and 2% w/v polyvinylpolypyrrolidone were added immediately before use. Following precipitation with sodium acetate and isopropanol, samples were extracted once with 25:24:1 phenol:chloroform:isoamyl alcohol and then twice with 24:1 chloroform:isoamyl alcohol prior to performing LiCl precipitations to remove DNA contamination. Poly(A)+ RNA was purified from 500 mg of total RNA using the Micro-FastTrack™ 2.0 mRNA Isolation Kit (Invitrogen, Inc., Carlsbad, CA) according to the manufacturer's instructions. cDNA was synthesized from 1-5 μg of poly(A)+ RNA using a Lambda Uni-Zap-XR cDNA synthesis kit according to the manufacturer's recommended protocol (Stratagene, La Jolla, CA). The directionally cloned (EcoRI/XhoI) cDNA libraries generated were then mass-excised in vivo and the resulting plasmids (pBluescript II) were propagated in the E. coli SOLR host strain. Individual cDNA clones containing inserts were amplified using the TempliPhi DNA Sequencing Template Amplification kit (Amersham Biosciences Corp., Piscataway, NJ) and sequenced using the dideoxy chain-termination method on an Applied Biosystems 3700 automated DNA sequencing system using the Prism™ Ready Reaction Dyedeoxy™ Terminator Cycle Sequencing kit (Applied Biosystems Division, Perkin-Elmer, Foster City, CA). The T3 primer (5'-GGGAAATCACTCCCAATTAA-3') and the T7 primer (5'-GTAATACGACTCACTATAGGGC-3') were used for 5' reads and 3' reads of cDNA clones, respectively. Oligo-dT primer (T22M) was used for 3' sequencing reads of cDNA clones containing poly-A tails.
Raw single-pass sequence data were retrieved from a Geospiza Finch server and downloaded to the EST Analysis Pipeline (ESTAP) [80] for cleansing and analysis. Following removal of vector and low quality sequences, all sequences ≤ 50 bp in length were discarded. Remaining sequences were clustered using d2_cluster [81] and CAP3 algorithms [82] using default parameters established for ESTAP.
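The length filter described above can be sketched in a few lines. This is only an illustration of the stated threshold (sequences of 50 bp or fewer are discarded) and is not the ESTAP implementation; the function names and the example records are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (name, sequence) tuples."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records

def drop_short(records, max_discarded_len=50):
    """Keep only sequences strictly longer than max_discarded_len bases."""
    return [(name, seq) for name, seq in records if len(seq) > max_discarded_len]

# Hypothetical trimmed reads: one of exactly 50 bp and one of 51 bp
example = ">est_a\n" + "A" * 50 + "\n>est_b\n" + "ACGT" * 12 + "ACG\n"
kept = drop_short(parse_fasta(example))
print([name for name, _ in kept])
```

Note that the comparison is strict: a read of exactly 50 bp is removed, matching the "≤ 50 bp" wording.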

Root cDNA library construction
A third mixed cDNA library ("VVM", Library ID 22274) was constructed using total RNA from cold, water-deficit, 150 mM NaCl stressed and control condition roots.
Total RNAs from different treatments were extracted and equal quantities were pooled before mRNA selection. Poly(A)+ mRNA was isolated from total RNA using the Oligotex Direct mRNA kit (Qiagen, Valencia, CA). cDNA synthesis was conducted by converting poly(A)+ mRNA to double-stranded cDNA with the 5'-AACTGGAAGAATTCGCGGCCGCTCGCATTTTTTTTTTTTTTTTTTV-3' (V = A,C,G) primer and Superscript III reverse transcriptase (Invitrogen). Double-stranded cDNAs were size-selected (more than 600 bp), modified with EcoRI adaptors (AATTCCGTTGCTGTCG, Promega #C1291) and digested with NotI. The cDNAs were then directionally cloned into EcoRI-NotI digested pBluescript II SK+ phagemid vector (Stratagene, Inc., La Jolla, CA). The total number of white colony forming units (cfu) before amplification was 3.0 × 10⁶. Blue colonies (empty vectors) were less than 10% of the total colonies present on plates. Purified plasmid DNA from the primary library was converted to single-stranded circles and used as the template for PCR amplification using the T7 (5'-TAATACGACTCACTATAGGG-3') and T3 (5'-AATTAACCCTCACTAAAGGG-3') priming sites flanking the cloned cDNA inserts as previously described [68]. The purified PCR products, representing the entire cloned cDNA population, were used as a driver for normalization. Hybridization between the single-stranded library (50 ng) and the PCR products (500 ng) was carried out for 44 hours at 30°C. Unhybridized single-stranded DNA circles were separated from hybridized DNA rendered partially double-stranded and electroporated into Escherichia coli DH10B cells to generate the normalized library. The total number of clones with insert was 1.6 × 10⁶ cfu. Background levels of empty clones were less than 10%. cDNA library normalization and construction was performed by the W.M. Keck Center for Comparative and Functional Genomics at the Roy J. Carver Biotechnology Center at the University of Illinois at Urbana-Champaign.
Normalization efficiency was verified by random sampling and sequencing of 96 clones from the primary library and 285 clones from the normalized library, and comparing their redundancy rates.
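A redundancy comparison of this kind can be sketched as follows. The definition used here (one minus the ratio of distinct genes to sampled clones) is an assumption, since the text does not state how redundancy was calculated, and the gene labels are hypothetical.

```python
def redundancy_rate(gene_per_clone):
    """Fraction of sampled clones that repeat an already observed gene.

    Assumed definition: 1 - (number of distinct genes / number of clones).
    """
    if not gene_per_clone:
        raise ValueError("at least one sequenced clone is required")
    return 1 - len(set(gene_per_clone)) / len(gene_per_clone)

# Hypothetical gene assignments for clones sampled from each library
primary_sample = ["rbcS", "rbcS", "rbcS", "TIP1", "TIP1", "CAD", "ROMT", "rbcS"]
normalized_sample = ["rbcS", "TIP1", "CAD", "ROMT", "UGT", "TPS", "MYB", "TIP1"]

print(redundancy_rate(primary_sample), redundancy_rate(normalized_sample))
```

Under this definition, a successfully normalized library is expected to show a lower redundancy rate than the primary library it was derived from.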

Identification of differentially expressed transcripts
All available EST information for individual ESTs and library of origin were downloaded from UniGene to match paired EST reads from single clone origins. Redundantly represented clones (e.g., two or more ESTs derived from the same clone) were identified from matching clone information parsed from dbEST submission files and verified using the DotPlot (version 2.1.1) plug-in (http://sourceforge.net/projects/dotplot/) [89] for the Eclipse (version 3.4.2) software development environment (http://www.eclipse.org/) [90] with the unique name of their 8.4X gene models plotted plate-wise on two axes to verify pairs of clones. EST totals were then adjusted to reflect the correct totals [91].
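The adjustment of EST totals for redundantly sequenced clones can be illustrated with a minimal sketch in which each clone is counted once per library, regardless of how many reads (e.g., 5' and 3') were obtained from it. The record layout, identifiers, and function name are hypothetical and do not reflect the dbEST file format or the dot-plot verification step.

```python
from collections import defaultdict

def clone_adjusted_totals(est_records):
    """Count distinct clones per library from (est_id, clone_id, library_id) records."""
    clones_by_library = defaultdict(set)
    for _est_id, clone_id, library_id in est_records:
        clones_by_library[library_id].add(clone_id)
    return {lib: len(clones) for lib, clones in clones_by_library.items()}

# Hypothetical records: clone "P01_A01" was sequenced from both ends
records = [
    ("EST0001", "P01_A01", "leaf_lib"),
    ("EST0002", "P01_A01", "leaf_lib"),
    ("EST0003", "P01_A02", "leaf_lib"),
    ("EST0004", "P07_H12", "berry_lib"),
]

print(clone_adjusted_totals(records))
```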
The frequency of each gene in each library was calculated by dividing the EST count by library size. The EST frequencies of multiple libraries of the same type (e.g., the multiple unstressed berry libraries) were combined into a single frequency term by the weighted mean, as described by Haverty and colleagues [52]. Differences in gene expression were estimated by EST frequency for genes with at least four ESTs present in the dataset using the web tool "Identifying Differentially Expressed Genes 6" (IDEG6; http://telethon.bio.unipd.it/bioinfo/IDEG6_form/) [53] with the recommended chi-squared test for multiple library comparisons with a p-value cutoff of < 0.0001. With these settings, IDEG6 calculates the likelihood that the frequency distribution of each gene would be expected by chance and reports the frequencies (transcripts/10,000) of genes below the cut-off. Hierarchical clustering of differentially expressed genes was performed using the Cluster software package [92], using the function (1 - Pearson correlation coefficient) as the pairwise distance metric and the average agglomeration method. The differentially expressed genes were matched to probesets found on the Affymetrix Vitis GeneChip® microarray [55] and were then compared by Spearman rank correlation to the expression data of the significantly changed genes of multiple Affymetrix microarray experiments in which abiotic stress conditions were tested at multiple time points [10,27,31]. For the microarray probeset expression values, the time point/condition with the greatest fold-change was used for comparison and probesets with contradictory responses to stress (expression significantly increased in one condition, but significantly decreased in another) were not considered. Functional annotation was then assigned using the pathways, networks and out-of-network annotations found in VitisNet software http://www.sdstate.edu/aes/vitis/pathways.cfm [55,93].
The VVM library sequences were compared to non-root EST libraries in a separate analysis, again with the IDEG6 web tool (http://telethon.bio.unipd.it/bioinfo/IDEG6_form/) [94] using the recommended Audic-Claverie (AC) statistic for comparisons of pairs, p-value < 0.01, with Bonferroni multiple-testing correction adjustment determined by the IDEG6 software (adjusted p-value cutoff of < 3.0 × 10⁻⁶) [53,60].
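The frequency calculation described in this section can be sketched as follows. The sketch assumes that the weighted mean uses library size as the weight, in which case the combined frequency equals the pooled EST count divided by the pooled library size; the exact weighting in the cited method may differ. Function names and the example counts are hypothetical.

```python
def tags_per_10k(est_count, library_size):
    """EST frequency of one gene in one library, in tags per ten thousand."""
    return 10_000 * est_count / library_size

def weighted_mean_frequency(est_counts, library_sizes):
    """Combine per-library frequencies of one gene into a single value.

    Assumption: each library is weighted by its size.
    """
    freqs = [tags_per_10k(c, n) for c, n in zip(est_counts, library_sizes)]
    total = sum(library_sizes)
    return sum(f * n for f, n in zip(freqs, library_sizes)) / total

# Hypothetical gene observed in two libraries of the same tissue type
counts = [3, 1]
sizes = [10_000, 30_000]

print(tags_per_10k(counts[0], sizes[0]), weighted_mean_frequency(counts, sizes))
```

With size-based weights, small libraries cannot dominate the combined estimate, which matters when pooling public libraries of very different depth.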

Quantitative Real-time Reverse Transcriptase-PCR
Frozen leaf and shoot tissues were ground in liquid nitrogen by mortar and pestle and total RNA was extracted from the frozen powder using a Qiagen RNeasy plant mini kit (Qiagen Inc., Valencia, CA) with on-column DNase treatment according to the manufacturer's instructions. Frozen berry and root tissue RNA was extracted using a Qiagen RNeasy Plant Midi kit, except that the manufacturer's instructions were modified by the addition of 2% polyethylene glycol (MW > 20,000 kD, Sigma-Aldrich, Inc., St. Louis, MO) to reduce polyphenol contamination [77]. RNA integrity was confirmed by electrophoresis on 1.5% agarose gels containing formaldehyde. cDNA was synthesized using an iScript cDNA Synthesis Kit (Bio-Rad Laboratories, Inc., Hercules, CA) according to the manufacturer's instructions with a uniform 1 μg RNA/reaction volume reverse-transcribed. Gene-specific primers for real-time qRT-PCR were selected using Primer-BLAST at NCBI http://www.ncbi.nlm.nih.gov/tools/primer-blast/index.cgi?LINK_LOC=BlastHome [95] using RefSeq V. vinifera transcripts as input, screened against all other V. vinifera RefSeq sequences, and the following Primer3 [96] settings: Tm range 58-60°C, product size = 50-150 bp, primer size = 13-25 nt, max poly-X = 3, G/C content = 30-80%. Primer pairs were selected for an anti-GC clamp, such that no more than two of the last five 3' nucleotides were either G or C, as per qRT-PCR instrument recommendations. Quantitative real-time RT-PCR reactions were prepared using Fast SYBR® Green Master Mix and performed using an ABI PRISM® 7500 Sequence Detection System (Applied Biosystems, Inc., Foster City, CA). Expression was determined for triplicate biological replicates using the ΔΔCt method, referenced to an eIF4a endogenous control gene (GSVIV gene model, GSVIVP00034135001) for leaf and berry comparisons or to an actin 7 endogenous control gene (NCBI locus ID, LOC100232968) for shoot and root comparisons [97].
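The ΔΔCt calculation referred to above can be sketched as follows. The sketch assumes a doubling of product per cycle (100% amplification efficiency) and uses hypothetical Ct values; the function names are illustrative and are not taken from the instrument software.

```python
def mean(values):
    return sum(values) / len(values)

def fold_change_ddct(target_ct_sample, ref_ct_sample,
                     target_ct_calibrator, ref_ct_calibrator):
    """Relative expression of a target gene by the ΔΔCt method.

    Each argument is a list of Ct values from biological replicates.
    ΔCt = Ct(target) - Ct(reference); ΔΔCt = ΔCt(sample) - ΔCt(calibrator);
    fold change = 2 ** (-ΔΔCt), assuming 100% amplification efficiency.
    """
    dct_sample = mean(target_ct_sample) - mean(ref_ct_sample)
    dct_calibrator = mean(target_ct_calibrator) - mean(ref_ct_calibrator)
    ddct = dct_sample - dct_calibrator
    return 2 ** (-ddct)

# Hypothetical Ct values: root is the sample, shoot is the calibrator,
# and the reference is an endogenous control gene such as actin.
root_target, root_ref = [20.1, 19.9, 20.0], [18.0, 18.1, 17.9]
shoot_target, shoot_ref = [25.0, 25.2, 24.8], [18.0, 17.9, 18.1]

print(fold_change_ddct(root_target, root_ref, shoot_target, shoot_ref))
```

A value above 1 indicates greater transcript abundance in the sample (here, root) than in the calibrator (shoot), and a value below 1 indicates lower abundance.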
Primers designed and used in this study along with cognate gene descriptions are listed in Additional files 5 and 6.

Additional material
Additional file 5: List of primers used for real-time qRT-PCR of shoot and berry gene expression. Primers were generated for real-time qRT-PCR of genes for comparison with microarray and EST frequency results. Gene name, gene model or contig identifier, forward primer (FP) and reverse primer (RP) sequences, and product size are shown.
Additional file 6: List of primers used for real-time qRT-PCR of root gene expression. Primers were generated for real-time qRT-PCR corroboration of root-enriched gene expression estimated by EST frequency. Gene name, NCBI gene locus identifier, forward primer (FP) and reverse primer (RP) sequences and product size are shown.