Chestnut resistance to the blight disease: insights from transcriptome analysis

Background A century ago, Chestnut Blight Disease (CBD) devastated the American chestnut. Backcross breeding has been underway to introgress resistance from Chinese chestnut into surviving American chestnut genotypes. Development of genomic resources for the family Fagaceae has focused in this project on Castanea mollissima Blume (Chinese chestnut) and Castanea dentata (Marsh.) Borkh (American chestnut) to aid in the backcross breeding effort and in the eventual identification of blight resistance genes through genomic sequencing and map-based cloning. A previous study reported partial characterization of the transcriptomes from these two species. Here, further analyses of a larger dataset and assemblies including both 454 and capillary sequences were performed, and defense-related genes with differential transcript abundance (GDTA) in canker versus healthy stem tissues were identified.
Results Over one and a half million cDNA reads were assembled into 34,800 transcript contigs from American chestnut and 48,335 transcript contigs from Chinese chestnut. Chestnut cDNA showed higher coding sequence similarity to genes in other woody plants than in herbaceous species. The number of genes tagged, the length of coding sequences, and the numbers of tagged members within gene families showed that the cDNA dataset provides a good resource for studying the American and Chinese chestnut transcriptomes. In silico analysis of transcript abundance identified hundreds of GDTA in canker versus healthy stem tissues. A significant number of additional DTA genes involved in the defense response that were not reported in a previous study were identified here. These DTA genes belong to various pathways involving cell wall biosynthesis, reactive oxygen species (ROS), salicylic acid (SA), ethylene, jasmonic acid (JA), abscisic acid (ABA), and hormone signalling. DTA genes were also identified in the hypersensitive response and programmed cell death (PCD) pathways. These DTA genes are candidates for host resistance to the chestnut blight fungus, Cryphonectria parasitica.
Conclusions Our data allowed the identification of many genes and gene network candidates for host resistance to the chestnut blight fungus, Cryphonectria parasitica. The similar set of GDTAs in American chestnut and Chinese chestnut suggests that the variation in sensitivity to this pathogen between these species may be the result of different timing and amplitude of the response of the two species to the pathogen infection. Resources developed in this study are useful for functional genomics, comparative genomics, resistance breeding and phylogenetics in the Fagaceae.


Background
The family Fagaceae includes more than 900 species in nine genera [1] found in temperate, subtropical, and tropical regions of the world [2]. In the Western Hemisphere, the Fagaceae range from southern Canada to Colombia [1] where they grow as tall trees or, occasionally, as shrubs. The Fagaceae include major species such as chestnuts, oaks, and beeches [2]. The Fagaceae have significant ecological and economic value. They are predominant species of most hardwood forests in the Northern Hemisphere. Nuts of Fagus (beeches), Castanea (chestnuts), Quercus (oaks) and most Castanopsis species are important sources of food for forest animals and produce a high quality extractable oil [3]. The Fagaceae are important as a timber resource for construction, telephone poles, floors, furniture, cabinets, and other applications. American chestnut played an especially important economic and ecological role by providing food to various insects, birds, and mammals, and food, fiber, and wood for rural communities, prior to the introduction of the Chestnut Blight Disease (CBD) early in the 20th century [4].
Fossil dating reveals that species of the order Fagales existed 84 million years ago (Mya) [5]. The time of divergence of the families of the Fagales has been estimated to be as early as 103 Mya [6]. Fagaceae species hold a key position in the phylogeny of angiosperms (Figure 1); they cluster closely to Cucurbitales in the Eurosid I group, which includes several model plant species, such as Medicago truncatula, Glycine max, Populus trichocarpa, and Eucalyptus grandis, for which complete or nearly complete whole-genome sequences exist. The close phylogenetic relationship to several model plants should make members of the Fagaceae good models for comparative genomics between woody and herbaceous species.
Genomic resources for the Fagaceae also have practical applications. CBD is a stem canker disease caused by the fungus Cryphonectria parasitica. The pathogen infects stem tissues and kills the above-ground portions of trees by girdling the cambium. Today, American chestnut exists primarily as an understory shrub, repeatedly sprouting from the root collar of blight-topped trees [8,9]. Programs to breed timber-type blight-resistant chestnut were initiated after the appearance of the blight [8], but failed to produce the desired timber-type tree with acceptable levels of resistance. In 1983, a program was initiated by the American Chestnut Foundation [10] to introgress genes from blight-resistant Asian chestnut species into American chestnut via backcross breeding [11]. The American Chestnut Cooperators Foundation has made crosses between pure American chestnut trees that have limited resistance [8] for future selection. Introductions of hypovirulent fungal genotypes of C. parasitica [12] have successfully controlled the severity of blight in Europe, but have had only limited success in North America. Understanding the molecular basis of resistance to the CBD could enable the eventual restoration of American chestnut to the eastern North American forests [13].
The Genomic Tool Development for the Fagaceae (GTDF) project [14] was organized to develop genomic resources for several North American Fagaceae species to help address major forest health problems of American chestnut and other trees within this botanical family. The more than 1.5 million cDNA sequences generated in this study for American and Chinese chestnut have been used to advance knowledge about the genetics of resistance to the CBD. A preliminary comparison of transcriptome sequences of healthy and blight-infected American chestnut and Chinese chestnut resulted in the identification of potential candidate disease resistance genes [15]. However, the 2009 report was based on only 13,000 and 15,000 contigs for American chestnut and Chinese chestnut, respectively, available at the time. We now report on over 83,000 contigs in total for the two species, which include 9000 Sanger sequences from Chinese chestnut in addition to 454 sequence reads. Moreover, the assembly of transcript contigs was performed using the SeqMan™ NGen™ v1.2 software (DNAStar, Inc), which provided better assembly of the 454 reads than the version of the Newbler program available in 2008. The orthologs used for comparison of DTA genes between American and Chinese chestnut were better defined in this study through the use of a reciprocal best hits analysis. The DEGseq software package [16] used for assessing transcript abundance (TA) includes statistical analyses that make it a more powerful tool than the one used in the previous analysis.
This paper reports on further analysis of the transcriptome of chestnut and the identification of candidate genes involved in the defense against CBD. The coverage of the transcriptome was analyzed for number of genes identified, sequence length, gene family size, and similarity to the transcriptome of other woody and herbaceous species. The transcript contigs include full-length coding sequences from both American and Chinese chestnut, which represent a valuable resource for genomic studies in the Fagaceae. The analysis of the differential transcript abundance (DTA) of chestnut genes using DEGseq [16] allowed the identification of hundreds of defense-related candidate genes with DTA in canker versus healthy stem tissues. A small set of candidate genes for resistance to CBD was verified for DTA using real-time quantitative RT-PCR.

Sequencing summary
More than 1.5 million (1,526,670) reads were generated, corresponding to 400 million nucleotides of cDNA sequences from the two studied species. Transcript contigs were assembled from the pyrosequencing reads using Newbler software (Roche 454) and designated version 1 on the Fagaceae website [14]. A second assembly, performed on sequences generated from both pyrosequencing and capillary sequencing reads using SeqMan™ NGen™ v1.2 software (DNAStar, Inc), was designated version 2. The version 2 contig set included longer contigs, and more sequences were integrated into the contigs, when compared to the original 454 Newbler assemblies. The combination of Sanger sequences and 454 sequences also resulted in slightly fewer but longer contigs. General information about the sequences and contigs identified from each species is summarized in Tables 1 and 2.

Figure 1 Simplified phylogeny of the major groups of woody plants. Species for which genome sequencing has been completed are indicated in red. (Modified from [7].)

Analysis of the American and Chinese chestnut transcriptomes
For the two Fagaceae species in this study, over one and a half million sequencing reads were generated and yielded a total of 93,018 contigs for the separate assemblies of the cDNA libraries for the 10 tissues sampled. A small fraction of contigs matched mitochondrial (1.3%) and chloroplast (3%) genes. Similarly, ~2.5% of American and Chinese chestnut sequences obtained from canker tissues had best BLASTX alignments to the Cryphonectria parasitica proteome. Transcriptome assembly version 2 (using all of the reads combined across all tissues) led to the identification of 34,800 and 48,501 contigs from American and Chinese chestnut, respectively, from pyrosequencing alone, and 34,800 and 48,335 contigs, respectively, with the addition of the Sanger sequences (Table 2). GO annotation using the Arabidopsis thaliana proteome as reference showed that the transcriptome of these species covers a wide range of biological processes (Figure 2), suggesting that the cDNA libraries were unbiased and well-suited for studies of development and physiology. The distribution of biological processes of the identified contigs from American and Chinese chestnut (Figure 2) did not show any statistically significant differences (p-value > 0.05) [17]. BLASTX alignments to model system proteomes showed that ~60% of the transcript contig sequences from the chestnut species studied have strong similarity to predicted proteins in Arabidopsis thaliana or Populus trichocarpa. Of the contig sequences that did not have any significant matches to Arabidopsis thaliana genes, 5 to 6% had a match to Populus trichocarpa genes. The remaining contigs (~30%) did not match any sequence in either the Arabidopsis thaliana or Populus trichocarpa proteomes. We observed a bias toward longer sequences in the contigs with BLASTX alignments to the model proteomes. 
The distribution of contig length showed that ~85% of sequences without BLASTX alignments to the proteomes of the two model species were short (< 250 nt). In contrast, only about 50% of contigs with good BLASTX hits on the model species proteomes were shorter than 250 nt. BLASTX searches were then conducted against the proteomes of all of the plant species for which whole genome sequences were available at the time of this study, including Vitis vinifera, Carica papaya, Medicago truncatula, Oryza sativa, Populus trichocarpa, Physcomitrella patens, and Selaginella moellendorffii. A large fraction of chestnut contigs had better BLAST alignment scores to woody species than to the herbaceous species (Figure 3). For instance, over 35% of contigs from American and Chinese chestnut had best alignments to Vitis vinifera and Populus trichocarpa. Only ~5% of the contigs had best alignments to Arabidopsis thaliana. This bias cannot be attributed to a GC content difference between contigs with best hits to woody versus herbaceous species, as their GC contents have similar distributions (data not shown).

Coverage of the transcriptomes
Table 2 Results of mass assembly of sequence reads from all libraries for each of American chestnut and Chinese chestnut into contigs. Column groups: Reads; Assembly.

Figure 2 Histogram presentation of Gene Ontology classification of putative biological processes of contigs from American chestnut (AC), and Chinese chestnut (CC). The Y axis indicates the annotation count corresponding to each biological process indicated on the X axis.

From the 34,800 and 48,335 contigs, a total of 11,431 and 10,016 large transcript contigs (more than 800 nucleotides) were identified from American chestnut and Chinese chestnut, respectively. The size of the transcript contigs assembled including large ones ranged from 258 bp to 1038 bp with approx. 4-12% of the American and Chinese chestnut contigs covering at least 70% of the length of the coding sequences relative to the respective genes in Arabidopsis thaliana. Analyses of the length of contigs showed that 344 (6.0%) and 874 (6.7%) contigs were full length in American chestnut and Chinese chestnut, respectively. Analysis of gene family size, both in American and Chinese chestnut as well as in Populus trichocarpa, Arabidopsis thaliana, and Oryza sativa, showed that the numbers of genes per family identified in chestnuts is similar to their counterpart in the model plant species suggesting a good coverage of the transcriptome in these two Fagaceae species (Figure 4). For instance, the number of members per gene family correlates well between American chestnut, Chinese chestnut, and the other model plant species (Additional file 1: Figure S1), with correlations coefficients of R = 0.8 for Chinese chestnut versus Arabidopsis and R = 0.74 for Chinese chestnut versus Populus.
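The gene family size comparison above can be illustrated with a short sketch. The text reports correlation coefficients (R) but does not state which coefficient was used; the code below assumes a Pearson correlation, and the family sizes are made-up placeholder values, not data from this study.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long lists of gene family sizes."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical member counts for the same five gene families in two species.
chestnut_family_sizes = [3, 10, 5, 22, 8]
arabidopsis_family_sizes = [4, 14, 6, 30, 7]
print(round(pearson_r(chestnut_family_sizes, arabidopsis_family_sizes), 3))
```

A rank-based coefficient (Spearman) would be a reasonable alternative if a few very large families dominate the comparison.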

Defense related genes in American and Chinese chestnut
In silico analysis of transcript abundance using the DEGseq approach [16] identified 1715 GDTA in canker tissues versus healthy stems in American chestnut and 720 GDTA in Chinese chestnut (Figure 5, Additional file 2: Table S1, and Additional file 3: Table S2). The number of reads per transcript contig ranged between 7 and 6388 in canker and between 0 and 756 in healthy stem tissues. GO annotation distribution (Figure 5) showed that 177 and 86 of the identified genes from American and Chinese chestnut, respectively, were involved in response to abiotic or biotic stimuli. Twenty-two percent and twenty-three percent of American chestnut and Chinese chestnut genes from this functional category, respectively, were involved in defense against biotic stresses. Most of the gene transcripts were highly abundant in canker tissues of both species (Additional file 4: Table S3) and thus represent good candidates for defense against the CBD fungus. GO annotation distribution (Figure 5) showed that the most frequent molecular functions of the identified defense-related genes were hydrolases, protein binding, transferases, and transporters. Several annotation categories including "secondary metabolic process", "oxidoreductase", "cellulose and pectin-containing cell wall", "hydrolases", and "lyases activity" were significantly over-represented in Chinese chestnut relative to American chestnut. A statistical analysis using the GOstat program [18] confirmed the enrichment of the Chinese chestnut transcriptome in these functional categories (p-value < 0.01). Conversely, several functional categories, mainly house-keeping genes such as "structural constituent of the ribosome", "translation", "ribosome biogenesis and assembly", and "protein metabolic process", were over-represented in American chestnut relative to Chinese chestnut (p-value < 0.01). 
The over-representation of house-keeping GDTAs in American chestnut could be associated with the increase in protein synthesis at the infection site for defense against the pathogen. The list of the identified defense-related genes showing DTA involves several related pathways (Tables 3 and 4, Additional file 2: Table S1, Additional file 3: Table S2, and Additional file 4: Table S3). The first category includes genes involved in the biosynthesis of lignin and other cell wall components such as 4-coumarate:CoA ligase (4CL), cinnamyl-alcohol dehydrogenase (CAD), cinnamoyl CoA reductase (CCR), peroxidase, Myb transcription factor, and UDP-glucose:thiohydroximate S-glucosyltransferase. Genes involved in programmed cell death and hypersensitivity such as Myo-inositol-1-phosphate, ATPase transporter, voltage-dependent anion channel, 2-deoxy-D-arabinoheptulosonate 7-phosphate, and cysteine proteinase precursor-like protein were also identified in canker tissues. However, one of the highly represented categories was phytohormone signaling including ethylene, jasmonic acid (JA), salicylic acid (SA), and abscisic acid (ABA) (Tables 3 and 4, Additional file 1: Figure S1, Additional file 2: Table S1). For example, transcripts of 12 genes involved in JA response were differentially abundant in Chinese chestnut. These include allene oxide cyclase, JAZ1, lipoxygenase, 12-oxophytodienoate reductase, 3-ketoacyl-CoA thiolase, chitinase, plastidic fatty acid desaturase, and others. Lipoxygenase, chitinase, and ACC oxidase are among the genes with the most DTA in canker versus healthy stem (Additional file 3: Table S3). Genes involved in the response to SA include alpha-dioxygenase, mitochondrial [...] Table S3). Few genes were induced in American versus Chinese chestnut and vice versa (Additional file 4: Table S3).
Analyses of a small set of these candidate genes using quantitative RT-PCR in healthy stem tissue versus Cryphonectria parasitica inoculated stem tissues from American and Chinese chestnut confirmed the differential expression of several of these candidates ( Figure 6).

Discussion
In this study, over 1.5 million sequencing reads from various tissues were generated and assembled into 34,800 and 48,501 contigs, for American and Chinese chestnut respectively. The low level of contamination with organelle DNA, the fraction of complete or nearly complete full length cDNA sequences, and the depth coverage of genes involved in various biological processes indicate that pyrosequencing is an excellent tool for gene discovery, EST sequencing, and transcriptome analysis in nonmodel tree species. Combining 454 and Sanger sequences can improve contig construction as the assembly using both data sets resulted in more reads being integrated into contigs than by assembly of pyrosequences alone, resulting in slightly fewer but longer contig sequences.
Using a recently developed tool [19] that estimates the amount of sequencing needed to cover the transcriptome of a given species, taking into consideration the sequencing platform and the number of contigs generated from each species, we determined that as few as two or three additional plates of 454 sequence from different developmental stages and physiological conditions should allow for 100% coverage of the transcriptomes of Chinese and American chestnut. The sequencing effort of the GTDF increased the number of Fagaceae cDNAs available in GenBank from a few sequences to hundreds of thousands of sequences for each of the species studied. The cDNA sequences and contig sequences are publicly accessible at the website [14] hosted at Clemson University, which provides tools for Fagaceae species-specific BLAST searches. The cDNA sequences have been used to select molecular markers such as Simple Sequence Repeats (SSRs) and Single Nucleotide Polymorphisms (SNPs) for genetic mapping in Chinese and American chestnut (Kubisiak et al., in preparation), for ordering the physical map of Chinese chestnut (Fang et al., in preparation), and for characterizing microRNAs (Barakat et al., unpublished data). The sequences were also used to identify candidate genes involved in resistance to CBD [15], some of which are being functionally characterized (Powell et al., unpublished data). The genes with DTA are also being examined for colocalization with QTLs for resistance to CBD.
BLASTX alignments to the proteomes of two model systems (Arabidopsis or Populus) showed that ~60% of the transcript contig sequences from the Fagaceae species studied have strong similarity to predicted proteins. The remaining contigs did not match any sequence in the Arabidopsis or Populus proteomes. A large fraction of these are short sequences that may originate from 3' or 5' untranslated regions, which tend to be highly divergent between species (Additional file 1: Figure S1). Part of these sequences may also correspond to non-coding RNAs or potential chestnut-specific genes. BLAST searches against the proteomes of eight plant species with complete genome sequences (Vitis vinifera, Carica papaya, Medicago truncatula, Oryza sativa, Populus trichocarpa, Selaginella moellendorffii, Physcomitrella patens, and Chlamydomonas reinhardtii) showed that a large fraction of the Fagaceae contigs have better alignments with genes in woody species such as Vitis vinifera, Populus trichocarpa, and Carica papaya than with genes in the nonwoody angiosperm species. Similar results have been reported for other tree species such as Liriodendron tulipifera and Carica papaya [20,21]. Many long-lived woody plants, including Fagaceae species, exhibit extended phases of juvenile development before they reach flowering age, whereas most herbaceous plants reach flowering age in a single growing season. Thus, these observations could be associated with a slower evolutionary rate of genes in woody species versus herbaceous species.
In silico analysis of gene expression identified more than twice as many GDTAs in American chestnut as in Chinese chestnut. However, most of the difference between the two GDTA sets lies in the number of induced housekeeping genes associated with the increase in resource utilization for plant defense at the infection site. The difference in the number of house-keeping genes induced in American and Chinese chestnut may be due to the amplitude of response of the two species to C. parasitica infection. The number of genes belonging to the category "response to biotic and abiotic stimuli" in American and Chinese chestnut from this study is over 14 and 6 times higher, respectively, than the numbers reported previously using a partial dataset [22]. Also, several genes identified previously were not confirmed in this analysis. However, many new genes with DTA were identified in this study. The discrepancy between these results is linked to the larger dataset, which includes both 454 and capillary sequences, the better contig assembly, and the use of a more powerful tool for DTA analysis.
Genes with DTA identified in this study belong to well known plant pathways such as phenylpropanoid metabolism, phytohormone (JA, ABA, ethylene and SA) signaling, cell wall biosynthesis, proteolysis, and others. These genes and pathways function at different times in the plant response to pathogens. The category of genes involved in phenylpropanoid metabolism act early in plant defense, serving to inhibit or to block the penetration and the progression of the plant pathogen. This category includes genes for biosynthesis of monolignol and other phenolic compounds. Previous studies [23][24][25][26][27] showed that lignin biosynthesis is crucial for cell wall apposition, one of the first lines of plant defense against invading fungi. Besides lignin, the biosynthesis of other polymers such as callose seems to follow infection as suggested by the increased transcript abundance of UDP-glucose:thiohydroximate S-glucosyltransferase [28]. Other phenolic products that are involved in plant defense against pest and pathogens seem to be produced as well, as deduced by the presence of transcripts encoding genes such as flavanone 3-hydroxylase and flavonol 7-O-glucosyltransferase known to regulate flavonoid biosynthesis [29].
The second most important category of genes detected in response to the blight infection includes genes from phytohormone signaling pathways including JA, SA, and ethylene. These hormones trigger the activation of induced systemic resistance and systemic acquired resistance (SAR) to necrotrophic pathogens [30,31]. SAR is an effective defense mechanism against a broad range of pathogens and insects. Several genes from the JA response pathway such as methyl jasmonate esterase (MES1), acyl-CoA oxidase, a phyB pathway, and ATPase transporter were identified [32]. Genes involved in SA response such as hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate and HopW1-1-Interacting protein 1 (WIN1) were identified [33]. The SA pathway, which is considered one of the major pathways involved in defense against necrotrophic pathogens, regulates the expression of defense effector genes and systemic acquired resistance through the repression of the auxin signaling pathway [16,[33][34][35][36]. Another hormone that seems to play a role in the resistance of chestnut to CBD is abscisic acid (ABA). While ABA was described as a susceptibility factor, other studies [37,38] showed that it activates plant defense by priming for callose deposition or by restricting the progression of the fungus Cochliobolus miyabeanus in the mesophyll of rice [38]. Other signaling genes involved in SAR that induce numerous defense genes include apoplastic lipid transfer protein and basic chitinase [39].
The third category of genes with DTA in canker tissues includes genes involved in early response as part of the hypersensitive response (HR). Among these are transcripts encoding proteins such as ATPase transporter, kinases, carbonic anhydrase, AMMECR1, MIPS1, voltage-dependent anion channel, 2-deoxy-D-arabino-heptulosonate 7-phosphate (DAHP) synthase, and glutathione peroxidase that were reported previously to be involved in the HR and cell death in plants under pathogenic attack [18,31,[40][41][42][43]. Reactive oxygen species (ROS) seem to be induced following C. parasitica infection, as several genes involved in oxidative stress (alpha-dioxygenase, fumarase, cytosolic GAPDH (C subunit), cytosolic ascorbate peroxidase APX1) had more abundant transcripts. Furthermore, several pathogenesis-related (PR) genes such as elicitor-activated gene 3-1 (ELI3-1), aromatic alcohol:NADP+ oxidoreductase, thaumatin, pathogenesis-related, and antifungal chitin-binding protein had differentially abundant transcripts in canker versus healthy stem tissues. PR proteins, of which some have antimicrobial functions [44], are mainly induced in localized pathogen attack around HR lesions. It is unknown what roles the HR and cell death play in chestnut defense against a necrotrophic pathogen such as C. parasitica. Alternatively, some of the genes involved in the HR may activate a systemic response of the plant, or the pathogen may trigger HR to facilitate its colonization of the plant as reported for other pathogens [45]. Several other genes involved in defense such as eIF(iso)4E [46] were also implicated.
The candidate genes identified in this study represent a valuable resource for studying the genetic basis underlying resistance to CBD and the isolation of the fungal pathogen resistance genes. Comparative mapping of the blight resistance quantitative trait loci (QTL) of chestnut with peach disease resistance QTL is revealing that the genes for several of the differentially abundant chestnut transcripts in canker versus healthy stem tissues map to disease resistance QTL regions in both species (Fang and collaborators, in preparation). This suggests that some of the genes identified in this study may play a major role in plant defense against the CBD. Because most of the defense genes and gene networks were induced in both American and Chinese chestnut canker tissues, the question that remains is what then accounts for one of these species being susceptible and the other resistant to CBD. The timing of the response to the pathogen infection, and the amplitude of the response, could result in Chinese chestnut resistance to the blight. More information from transcript profiling during the time course of infection in these two species is required to address this question.

Conclusions
This project has generated over 83,000 transcript contigs from American and Chinese chestnuts and identified hundreds of genes and regulation pathways which may be involved in chestnut resistance to the pathogen C. parasitica. These resources have also been used for SNP and SSR marker development, genetic mapping, physical mapping, and genome organization comparison among chestnut species and between chestnut and other closely related species. Several of the candidate genes identified in this study are in the process of being analyzed for their function using transformation in planta. The cDNA database generated in this project is also being used to map expressed genes and to annotate the proteome in the Chinese chestnut genome that is being assembled during submission of this report (John Carlson, personal communication).

Plant materials
Ten cDNA libraries, representing a range of tissues from American and Chinese chestnut were prepared for EST sequencing by 454 technology [47] (Table 1). Tissue samples were submerged in liquid nitrogen immediately upon collection and stored at -80°C until use. To create cankers, the stems of chestnut trees were inoculated with the hypervirulent C. parasitica strain EP155 as described by Hebard and collaborators [15,48]. For American chestnut and Chinese chestnut, canker tissue was collected at 5 and 14 days post-inoculation as previously described [15,48]. These times correspond to early and late stages of interaction between the plant and the pathogen [48].
RNA preparation, cDNA library synthesis, and 454 sequencing
Total RNA was prepared as described previously [49], and assessed with a 2100 Bioanalyzer (Agilent Technologies). cDNA and 454 libraries were constructed as described [15]. The 454 libraries were sequenced using the model GS20, for the American chestnut canker and Chinese chestnut canker libraries, and an FLX model 454 DNA sequencer (Roche Diagnostics), as previously described [15], for the remaining libraries. The sequence data was deposited into the Short Read Archive at the National Center for Biotechnology Information (study accession SRP000395).

Sanger sequencing
To analyze the transcriptomes of American and Chinese chestnut, this project generated Sanger sequences by capillary sequencing for about 9000 cDNA clones from a subtractive library enriched in genes highly expressed in canker tissues in Chinese versus American chestnut. A total of 8,101 cDNA sequences were obtained after filtering reads for quality. RNA was prepared using the method described previously [49] and reverse transcribed using the SMART PCR cDNA Synthesis Kit (Clontech, Mountain View, CA). Subtractive libraries were constructed using Chinese chestnut as the tester and American chestnut as the driver following the manufacturer's recommendations (Clontech). Sequencing of the subtractive libraries was conducted at the Clemson University Genomics Institute by an automated Sanger sequencing protocol.

Transcript assembly and contig annotation
The 454 sequence reads were assembled into contigs using 454 Newbler (Roche Diagnostics) or SeqMan™ NGen™ v1.2 software (DNAStar, Inc), optimized for 454 next generation data. The new assemblies are available on the Fagaceae website [14]. cDNA libraries were constructed using random priming, which results in low poly A/T tail contamination, and therefore no filtering was performed. Also, SeqMan removes low quality ends including homopolymer runs of poly(A/T) that have lower qualities in 454 sequencing. Contamination with mitochondrial and chloroplast genes was assessed by running a BLASTX search against Arabidopsis mitochondrion and chloroplast proteomes. An assembly using 454 and 9000 Sanger sequences was performed and compared with the one that used 454 sequences only. Full-length contigs were identified by running a BLASTX search against the Arabidopsis thaliana proteome and comparing the lengths of the aligned portion of each contig and the putative proteins [50]. The annotation of contigs was performed by BLASTX [51] against the Arabidopsis thaliana proteome (e-value = 1e-5) and the Gene Ontology (GO) [52] system [15]. Comparison of GO annotation distribution between species was conducted using the GOstat program [17] set to the following parameters: GO-DB: tair; Min Sub-GO length: 3; P-Value Cutoff: 0.01; GO-Cluster Cutoff: -1; with no correction for multiple testing because the high dependence between GO terms will cause the test to be overly conservative. To determine which model species had the most best hits to Fagaceae transcript contigs, BLAST alignments were conducted by querying the Fagaceae contigs against the proteomes of algal, moss and higher plant species with fully sequenced genomes (Chlamydomonas reinhardtii, Physcomitrella patens, Selaginella moellendorffii, Oryza sativa, Vitis vinifera, Populus trichocarpa, Carica papaya, and Arabidopsis thaliana) and the e-values of the best hits from each species were compared.
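The final step above, comparing best-hit e-values across species, can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the input structure, the contig names, and the e-values are hypothetical, and the tie-breaking rule (alphabetical order) is an arbitrary choice made for this sketch.

```python
from collections import Counter
from typing import Dict


def best_species(evalues: Dict[str, float]) -> str:
    """Return the species whose best hit has the lowest e-value.

    Ties go to the alphabetically first species name; this is an
    arbitrary rule for the sketch, not something stated in the text.
    """
    return min(sorted(evalues), key=lambda species: evalues[species])


def tally_best_species(hits: Dict[str, Dict[str, float]]) -> Counter:
    """Count, per species, the contigs for which it gives the best hit.

    `hits` maps contig id -> {species: e-value of that species' best hit}.
    Contigs with no hit in any species are skipped.
    """
    return Counter(best_species(ev) for ev in hits.values() if ev)


# Hypothetical best-hit e-values for four contigs.
hits = {
    "contig_1": {"Vitis vinifera": 1e-80, "Arabidopsis thaliana": 1e-60},
    "contig_2": {"Populus trichocarpa": 1e-40, "Oryza sativa": 1e-12},
    "contig_3": {"Vitis vinifera": 1e-30, "Populus trichocarpa": 1e-25},
    "contig_4": {},  # no significant hit in any proteome
}
tally = tally_best_species(hits)
for species, count in tally.most_common():
    print(species, count)
```

In practice the per-species e-values would be parsed from tabular BLAST output; bit scores are often preferred for ranking because very strong hits all report an e-value of 0.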

Identification of DTA in canker tissues
DEGseq [16] was used to identify gene specific differences in transcript abundance. The DEGseq package was chosen because it integrates several statistical methods, can estimate a theoretical replicate when an experimental one is not provided, and has been used routinely to identify DTA [16,53,54]. The number of 454 reads per contig for each gene was compared between canker and healthy stem tissues in American and Chinese chestnut separately. Similar analyses were performed for gene orthologs from both species. Orthologs were identified using a reciprocal best hit approach. DEGseq employs a random sampling model based on the read count in canker and healthy stem tissue libraries and performs a hypothesis test based on that model. Two theoretical four-fold local standard deviation lines can be drawn on the expression MA-plot to estimate the noise level of genes with different intensities and identify gene expression differences in different libraries. Genes passing the threshold are identified as exhibiting DTA. GO enrichment analyses were performed using Blast2Go software [55].
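The reciprocal best hit approach mentioned above can be sketched in a few lines. The input format here is an assumption made for illustration: each dictionary holds only the single best BLAST hit per query, and all contig identifiers and scores are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical input type: query id -> (best subject id, bit score).
BestHits = Dict[str, Tuple[str, float]]


def reciprocal_best_hits(a_to_b: BestHits, b_to_a: BestHits) -> Dict[str, str]:
    """Return pairs where b is a's best hit and a is also b's best hit."""
    pairs = {}
    for a_id, (b_id, _score) in a_to_b.items():
        back = b_to_a.get(b_id)
        # Keep the pair only if the reverse search points back to a_id.
        if back is not None and back[0] == a_id:
            pairs[a_id] = b_id
    return pairs


# American chestnut (AC) contigs searched against Chinese chestnut (CC), and back.
ac_to_cc = {
    "AC_1": ("CC_7", 410.0),
    "AC_2": ("CC_9", 120.0),
    "AC_3": ("CC_7", 95.0),
}
cc_to_ac = {
    "CC_7": ("AC_1", 405.0),
    "CC_9": ("AC_5", 200.0),
}
orthologs = reciprocal_best_hits(ac_to_cc, cc_to_ac)
print(orthologs)  # {'AC_1': 'CC_7'}
```

Only AC_1 and CC_7 are retained: AC_3 also hits CC_7, but CC_7's best hit is AC_1, and CC_9's best hit is a contig other than AC_2.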

Validation tests of GDTA by real-time quantitative RT-PCR
Real-time quantitative RT-PCR tests were conducted to determine the extent to which the number of EST reads per gene obtained by shotgun sequencing accurately reflected transcript levels in the source tissues. RT-PCR estimates of transcript abundance were conducted on RNA from healthy and canker stem tissues from American chestnut and Chinese chestnut. RT-PCRs were performed as described previously [56,57]. Quantitative real-time PCRs (qRT-PCRs) were prepared using the SYBR Green Master Mix kit (Applied Biosystems) and run in an Applied Biosystems 7500 Fast Real-Time PCR system with default parameters. Primers were designed using Primer Express® software (Applied Biosystems). A gene encoding 18S rRNA was used as an endogenous standard to normalize template quantity. We used only one standard because we did not observe any tissue-specific differences in expression of the 18S rRNA gene in our study. In addition, RT-PCR analyses were performed to confirm the expression of GDTA already identified using in silico expression analysis. For each gene, three biological replicates (three different trees) and three technical replicates were performed. Statistical analyses to estimate the significance of the differences used Statistica 6.0 software (StatSoft Poland Inc., Tulsa, OK, USA).
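The text does not state which formula was used to convert qRT-PCR threshold cycles into relative transcript levels. As an illustration only, the sketch below uses the widely used 2^(-ΔΔCt) calculation with 18S rRNA as the reference gene; it assumes roughly 100% amplification efficiency for both primer pairs, and all Ct values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def relative_expression(
    ct_target_canker: Sequence[float],
    ct_ref_canker: Sequence[float],
    ct_target_healthy: Sequence[float],
    ct_ref_healthy: Sequence[float],
) -> float:
    """Fold change of a target gene in canker versus healthy tissue.

    Each argument is a list of replicate Ct values. The reference gene
    (here assumed to be 18S rRNA) normalizes for template quantity.
    """
    delta_canker = mean(ct_target_canker) - mean(ct_ref_canker)
    delta_healthy = mean(ct_target_healthy) - mean(ct_ref_healthy)
    delta_delta = delta_canker - delta_healthy
    # Assumes the amount of product doubles with every cycle.
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values for three replicates per condition.
fold = relative_expression(
    ct_target_canker=[22.0, 22.0, 22.0],
    ct_ref_canker=[10.0, 10.0, 10.0],
    ct_target_healthy=[25.0, 25.0, 25.0],
    ct_ref_healthy=[10.0, 10.0, 10.0],
)
print(fold)  # 8.0
```

A target that crosses the threshold three cycles earlier in canker tissue, with an unchanged reference gene, corresponds to an eight-fold higher transcript level under these assumptions.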