Characterization of expressed sequence tags obtained by SSH during somatic embryogenesis in Cichorium intybus L

Background Somatic embryogenesis (SE) is an asexual propagation pathway requiring a somatic-to-embryonic transition of differentiated somatic cells toward embryogenic cells capable of producing embryos in a process resembling zygotic embryogenesis. In chicory, genetic variability with respect to the formation of somatic embryos was detected between plants from a population of Cichorium intybus L. landrace Koospol. Though all plants from this population were self-incompatible, we managed by repeated selfing to obtain a few seeds from one highly embryogenic (E) plant, K59. Among the plants grown from these seeds, one plant, C15, was found to be non-embryogenic (NE) under our SE-inducing conditions. Because the two genotypes are closely related, we decided to exploit the difference in SE capacity between K59 and its descendant C15 to study gene expression during the early stages of SE in chicory.
Results Cytological analysis indicated that in K59 leaf explants the first cell divisions leading to SE were observed at day 4 of culture. In contrast, in C15 explants no cell divisions were observed and SE development seemed arrested before cell reactivation. Using mRNAs isolated from leaf explants from both genotypes after 4 days of culture under SE-inducing conditions, an E and a NE cDNA-library were generated by SSH. A total of 3,348 ESTs from both libraries turned out to represent a maximum of 2,077 genes. In silico subtraction analysis sorted only 33 genes as differentially expressed in the E or NE genotype, indicating that SSH had resulted in an effective normalisation. Real-time RT-PCR was used to verify the expression levels of 48 genes represented by ESTs from either library. The results showed preferential expression of genes related to protein synthesis and cell division in the E genotype, and of genes related to defence in the NE genotype.
Conclusion In accordance with the cytological observations, mRNA levels in explants from K59 and C15 collected at day 4 of SE culture reflected differential gene expression that is presumably related to processes accompanying early stages of direct SE. The E and NE libraries obtained thus represent important tools for subsequent detailed analysis of the molecular mechanisms underlying this process in chicory, and of its genetic control.


Background
Cells in complex multicellular organisms acquire their structural and functional attributes by differentiation, a genetic program specific for a given environment of the cells. In many higher organisms, differentiation is a unidirectional and irreversible process. In higher plants, however, most differentiated cells are able to transdifferentiate, i.e. start a new differentiation pathway. Somatic embryogenesis (SE) may be considered as the ultimate form of this plant cell totipotency in that fully differentiated somatic cells are induced to regenerate new plants via a developmental pathway that resembles zygotic embryogenesis [1]. Despite considerable efforts, the processes underlying the transition from a somatic to embryogenic cell, i.e. induction, dedifferentiation, and redifferentiation, are still poorly understood [2].
For twenty years, processes related to the induction of SE in chicory have been studied in our laboratory at the cytological, physiological, and molecular level using the interspecific hybrid '474' (Cichorium intybus L. × C. endivia L.). In contrast to many agronomically important varieties of chicory, somatic embryos are readily and rapidly formed in large numbers in different explants from the hybrid '474' when cultured under constant agitation in the dark at 35°C in a Murashige and Skoog culture medium containing low concentrations of auxin and cytokinin [3]. SE is direct under these conditions, i.e. without the development of a callus, and embryos are formed from single cells [4].
Using the chicory hybrid '474' SE model system, different methods have been applied to clone genes that might be involved in the early phases of SE, or that at least could serve as markers of SE induction. The genes corresponding to the cloned cDNAs were differentially expressed in explants of the hybrid '474' during SE and not or only weakly expressed in explants of non-embryogenic C. intybus L. varieties cultured under the same conditions, suggesting that the expression of these genes was related to SE and not to the stress due to the culture conditions [5][6][7][8]. However, determination of causal relationships between the differentially expressed genes and SE is hampered due to the interspecific status of hybrid '474', and its complete sterility.
More recently, genetic variability with respect to the formation of somatic embryos was found in a Hungarian landrace of C. intybus L., called 'Koospol', from which the C. intybus L. parent of the hybrid '474' also originated (M-C. Quillet, B. Delbreil, and B. Deprez, unpublished results). Upon screening plants from this landrace, embryogenic and non-embryogenic genotypes were identified, offering the possibility to introduce genetics as a tool to study the molecular mechanisms underlying the induction of SE in chicory. The plant K59 was selected as a highly embryogenic (E) genotype, and a few seeds were obtained after repeated selfing of this normally self-incompatible, and highly heterozygous, genotype. Amongst the plants grown from these seeds, the plant C15 was found to represent a non-embryogenic (NE) genotype, incapable of forming somatic embryos under our SE-inducing conditions. Sharing a similar genetic background, the genotypes K59 and its descendant C15 thus seemed an obvious choice as starting material for detailed analysis of differential gene expression during the early stages of SE in chicory.
In this study, we report the generation of an embryogenic (E) and a non-embryogenic (NE) cDNA library by applying suppression subtractive hybridization (SSH) [9] using mRNAs isolated from leaf explants of genotypes K59 and C15, cultured for 4 days under SE-inducing conditions. From the libraries 3,500 cDNA clones were selected, sequenced, and subjected to database searches to annotate the putative functions of the genes they represent. Differentially expressed genes were identified by in silico subtraction analysis and real-time RT-PCR. Several genes preferentially expressed in K59 seem to encode proteins involved in protein synthesis and cell division, whereas proteins encoded by genes preferentially expressed in C15 may be involved in defence. The results are discussed with respect to the quality of the libraries, and their use for future research on differential gene expression during SE in chicory.

Cellular events during the induction of somatic embryogenesis in K59 and C15
When leaf explants from the genotypes K59 and C15 were cultivated under SE-inducing conditions as developed for the hybrid '474' [3], somatic embryos were formed in the explants from K59, but not in those from C15 (Fig. 1). Microscopic examination of semi-thin sections of explants from both genotypes during SE culture revealed that SE development in K59 was similar to that described previously for the hybrid '474' [4]. The first visible response in the explants, starting after one day of culture, was cell reactivation; reactivating cells were characterized by enlarged nuclei and clearly distinguishable nucleoli. At this stage the nucleus is still pressed against the cell wall, situated between the plasmalemma and the tonoplast of the central vacuole, and is surrounded by chloroplasts. One day later, the first reactivated cells, with their nuclei positioned in the middle of the cell and surrounded by a fragmented vacuole, were observed. At day 4 of SE culture, reactivating and reactivated cells were observed in explants of K59, as well as some first cell divisions preceding somatic embryo formation (Fig. 1a, b). In contrast, in explants from C15 only cells that seemed to have started reactivation were observed (Fig. 1d, e), and in lower numbers than in K59. Observations at day 8 of the culture showed the presence of many proembryos in the explants of K59 (Fig. 1c), whereas in the explants from C15 there was still no development of reactivated or dividing cells (Fig. 1f). From these results it was concluded that differences in mRNA levels in explants from K59 and C15 collected at day 4 of SE culture are likely to reflect differential gene expression related to processes accompanying the early stages of SE in K59.

Generation of SSH libraries from an embryogenic and a non-embryogenic genotype
Messenger RNAs isolated from leaf explants of K59 and C15, collected at day 4 of culture under SE-inducing conditions, were used to construct two subtractive cDNA libraries by applying SSH. The E library, a library supposed to be enriched in cDNAs representing genes preferentially expressed during SE, was obtained by using cDNAs from K59 mRNAs as 'tester' and cDNAs from C15 mRNAs as 'driver'. The NE library was obtained by reversing 'tester' and 'driver' mRNA.
Sequencing was carried out for about 3,500 cDNA clones randomly selected from both libraries: 2,000 from the E library and 1,500 from the NE library. After removing poor-quality sequences, a total of 1,944 ESTs from the E library and 1,404 from the NE library were retained for further analyses. The average length of the 3,348 ESTs was 456 bp, and the GC content was 45% (Tab. 1).
A database named E/NE db was generated from all ESTs from the E and NE library. To generate OCs (Original Clusters, regrouping identical ESTs) two successive criteria were applied. First, all 3,348 sequences from the E/NE db were submitted to a BlastN search (E-value ≤ 1e-30) against this database. The sorted sequences were grouped in 2,174 primary OCs (coded OC0001 to OC2174). Next, sequences in primary OCs containing more than one EST were aligned and their mutual sequence identity determined. OCs containing ESTs that all shared at least 95% identity over a contiguous sequence of 150 bp retained their primary code. OCs that contained two or more groups of ESTs by this criterion were split into 2 or more OCs, respectively, each identified by the primary code followed by a letter (e.g. OC0603_a and OC0603_b; cf. Tab. 3, and Additional file 1). This analysis revealed a total of 2,302 OCs, of which 1,764 (52.7% of the total number of sequences) were singletons, and 538 contained between 2 and 40 ESTs.
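The first grouping step can be pictured as single-linkage clustering over pairwise BlastN hits. The sketch below is an illustration under that assumption only: the function name `cluster_by_hits` and the input format (pairs of EST identifiers whose mutual hit passed the E-value threshold) are hypothetical, and the second criterion (at least 95% identity over 150 contiguous bp) is not implemented here.

```python
from collections import defaultdict


def cluster_by_hits(est_ids, hit_pairs):
    """Group ESTs into primary clusters by single linkage (union-find).

    est_ids:   iterable of EST identifiers (hypothetical naming).
    hit_pairs: iterable of (est_a, est_b) pairs whose BlastN hit passed
               the chosen E-value threshold (assumed pre-filtered).
    """
    est_ids = list(est_ids)
    parent = {est: est for est in est_ids}

    def find(est):
        # Follow parent links to the cluster root, compressing the path.
        while parent[est] != est:
            parent[est] = parent[parent[est]]
            est = parent[est]
        return est

    for a, b in hit_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(list)
    for est in est_ids:
        clusters[find(est)].append(est)

    # Largest clusters first; singletons end up at the tail.
    return sorted(clusters.values(), key=lambda c: (-len(c), c))


example = cluster_by_hits(
    ["e1", "e2", "e3", "e4"],
    [("e1", "e2"), ("e2", "e3")],
)
singletons = [c for c in example if len(c) == 1]
```

In this toy input, `e1`, `e2` and `e3` are linked through shared hits and `e4` remains a singleton. A real pipeline would then align the members of each multi-EST cluster and split those that fail the identity criterion.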

Comparison of chicory ESTs with sequences of other species allowing the formation of contigs
Direct determination of the number of genes represented by the 2,302 OCs identified was not possible since the digestion of cDNA with RsaI during the SSH procedure left no overlapping sequences, and thus prevented the construction of contigs. To identify OCs that potentially represented the same gene, sequential BlastN searches (E-value ≤ 1e-30) were performed against assembled ESTs from lettuce (assembled ESTs from CGPD: Compositae Genome Project Database, [10]), sunflower (assembled ESTs from CGPD), and Zinnia elegans L. (assembled ESTs from PGDB: Plant Genome Database, [11]) (Tab. 2). The hierarchy of the searches was determined by the botanical proximity of the species; chicory and lettuce belong to the tribe Lactuceae of the subfamily Cichorioideae of the Asteraceae, whereas sunflower and Z. elegans belong to the subfamily Asteroideae [12]. In parallel, BlastX searches were performed using the non-redundant (NR) GenBank database [13], and the Arabidopsis translated coding sequences [14].
Figure. Cytological differences in leaf explants of chicory genotypes K59 and C15 cultured under SE-inducing conditions.
The results of the BlastN searches in the Asteraceae databases showed a high proportion of 'no hits found', i.e. 1,035 chicory ESTs (31%) were not represented in the Asteraceae databases. In comparison, only 13-17% of the chicory ESTs did not match any sequence in the Arabidopsis and GenBank NR databases (Tab. 2). Furthermore, the results quite often suggested that OCs that were clearly distinguished by our OC criteria, as well as by the results of the BlastX searches, matched the same Asteraceae contig. Taken together, this clearly indicated that the CGPD and PGDB databases were not yet sufficiently exhaustive to represent the Asteraceae transcriptomes.
It was therefore decided to use the results of the BlastX searches to assemble the chicory OCs into contigs. In a first attempt, OCs and singletons with the same best hit (E-value ≤ 1e-5) in the AtGDB were considered to represent the same gene. However, ESTs grouped in different OCs, because they had less than 95% sequence identity, were sometimes found to match the same Arabidopsis coding sequence. In these cases each OC very often matched a different sequence in the NR GenBank database (see Additional file 1). This may be indicative of a higher number of duplicated genes in chicory than in Arabidopsis. These analyses led to the formation of 189 contigs by the regrouping of 111 OCs with at least 2 ESTs, and 303 singletons. Together with the remaining 1,888 OCs, we estimated that the 3,348 ESTs selected from the E and NE library represent at maximum 2,077 genes (see Additional file 1). Of these 2,077 annotated genes, 1,061 genes (51%) were composed of ESTs exclusively originating from the E library, 730 genes (35%) of ESTs exclusively originating from the NE library, and 286 genes (14%) of ESTs present in both libraries.
The total number of genes is probably overestimated, since part of the OCs and contigs containing ESTs without significant matches may correspond to genes already accounted for, because they represent untranslated mRNA regions. If we consider OCs composed of non-matching ESTs to be part of genes already accounted for, and when we omit the criterion of 95% identity over a contiguous sequence of 150 bp to define an OC, a minimum of 1,698 genes is obtained.
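The maximum gene count follows from simple bookkeeping on the figures quoted above. The short tally below only re-derives those numbers; the variable names are illustrative and nothing here reflects the authors' actual scripts.

```python
# Figures taken from the text.
total_ocs = 2302          # OCs after the identity-based splitting step
ocs_regrouped = 111       # multi-EST OCs absorbed into contigs
singletons_regrouped = 303
contigs = 189

# OCs left untouched by the BlastX-based regrouping.
remaining_ocs = total_ocs - ocs_regrouped - singletons_regrouped

# Each remaining OC and each contig counts as one putative gene.
max_genes = remaining_ocs + contigs

# Split of the putative genes by library of origin.
by_origin = {"E only": 1061, "NE only": 730, "both": 286}
shares = {k: round(100 * v / max_genes) for k, v in by_origin.items()}
```

The three origin classes sum to the same total as the OC-based count, which is a useful consistency check on the reported percentages.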

Annotation and functional classification of ESTs
Annotation of the ESTs using BlastX searches against sequences in the NR GenBank database revealed that 55% of the ESTs had a high similarity with the best match (E-value ≤ 1e-30), and 32% a moderate similarity (1e-30 to 1e-5). The remaining 13%, with an E-value ≥ 1e-5 or with no match found, were classified as 'no hits found'. The lack of sequence homology may be related to the average length of 354 bp for these ESTs, about 120 bp less than the average sequence length of 471 bp for annotated ESTs (Tab. 3) [15].
Of the 279 ESTs representing genes related to protein synthesis, 202 (6% of the total number of ESTs) represent genes encoding ribosomal proteins: 122 and 79 ESTs from the E and the NE library, respectively. In comparison, only [...]
Table legend: Pred (prediction) indicates in which genotype (E or NE) the gene was found preferentially expressed. ER: expression ratios (log 2 ) between the embryogenic and the non-embryogenic genotype at day 4 of SE culture estimated by real-time RT-PCR. ER1 and ER2 designate the expression ratios obtained from 2 independent first strand cDNA synthesis reactions (see Methods). The asterisk indicates genes that were found to be differentially expressed according to the thresholds applied (see Methods).
The distributions of the putative functions of the annotated genes, except for those encoding ribosomal proteins, resembled those reported for genes or cDNAs in Arabidopsis and other plant species [17][18][19], including the approximately 25% of sequences with undetermined functions. Apparently the SSH procedure had not effected an enrichment of ESTs representing genes implicated in particular functions in one of the genotypes, at least not at this level of functional assignment.
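The three similarity classes used for annotation amount to a simple threshold rule on the best BlastX E-value. The helper below is a minimal sketch of that rule; the function name and the use of `None` for "no match returned" are assumptions made for illustration, and the handling of values exactly at a boundary follows the wording of the text as closely as it allows.

```python
def classify_best_hit(evalue):
    """Assign an EST to a similarity class from its best BlastX E-value.

    evalue: float E-value of the best hit, or None when no hit was
            returned (hypothetical convention for this sketch).
    """
    if evalue is None or evalue >= 1e-5:
        return "no hits found"
    if evalue <= 1e-30:
        return "high similarity"
    return "moderate similarity"


# Illustrative values only, not taken from the libraries.
labels = [classify_best_hit(e) for e in (1e-80, 1e-12, 0.5, None)]
```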

In silico screening and real time RT-PCR
The large proportion of the annotated genes represented by single ESTs (1,461 of 2,077, i.e. 70%) (Fig. 3) seems to indicate an efficient normalisation of the libraries; only 3% of the genes were represented by more than 5 ESTs. Furthermore, only 14% of the annotated genes in the different functional groups were represented by ESTs from both libraries (Tab. 1), as could be expected for a random selection of the ESTs from libraries that were effectively normalised.
To verify further the efficiency of the normalisation realised by the SSH procedure, we performed an in silico subtraction, or 'digital northern', on all the ESTs from our libraries. This analysis is based on the relation between the abundance of ESTs in a cDNA library and the differential expression of the corresponding genes [20,21], and is ordinarily performed on ESTs in cDNA libraries that are not normalised [22,23]. Conversely, applying this analysis may provide a measure of the normalisation achieved, and as normalisation is nearly always imperfect, in particular for very abundant transcripts [24], it may also reveal ESTs representing genes preferentially expressed in the E or NE genotype.
Using the significance test of Audic and Claverie [25], it was found that of the 2,077 annotated genes, only 33 (1.6%) were predicted to be differentially expressed: 6 genes preferentially expressed in the E genotype, and 27 in the NE genotype (Tab. 3). For 24 of the 33 predicted genes the abundance of transcripts was measured by real-time RT-PCR (Tab. 3). The results confirmed the differential expression of 14 genes, i.e. 2 preferentially expressed in explants from the E genotype, and 12 in explants from the NE genotype. For 2 genes that were predicted to be preferentially expressed in the NE genotype, it was found that they were actually preferentially expressed in the E genotype. The remaining 8 genes were found not to be differentially expressed.
The relatively low number of differentially expressed genes predicted by in silico subtraction suggested a high efficiency of the normalisation realised by SSH. It was reported that normalisation and enrichment by SSH is ineffective for abundant transcripts in tester or driver samples [24], leading to an elevated number of background clones. Indeed, the real-time RT-PCR experiments showed that in comparison to the level of transcripts for actin-2, transcripts of some genes were abundant (> 100-fold higher than actin-2) in explants of both K59 and C15, whereas for other genes the level of transcripts was considerably lower (< 50-fold) than for actin-2 (data not shown). These results indicated that there was no relation between transcript level and EST representation in the libraries.
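For readers unfamiliar with the Audic and Claverie test, the sketch below shows the published probability p(y|x) of observing y ESTs for a gene in one library given x ESTs in the other, for library sizes N1 and N2. The library sizes are the EST counts reported above; the tail-probability convention in `ac_tail` (the smaller of the two cumulative tails) is one common choice and an assumption here, as is the example pair of counts. The significance threshold actually applied is given in the Methods and is not reproduced in this sketch.

```python
import math


def ac_prob(x, y, n1, n2):
    """Audic-Claverie p(y | x) for library sizes n1 (for x) and n2 (for y)."""
    r = n2 / n1
    log_p = (
        y * math.log(r)
        + math.lgamma(x + y + 1)
        - math.lgamma(x + 1)
        - math.lgamma(y + 1)
        - (x + y + 1) * math.log1p(r)
    )
    return math.exp(log_p)


def ac_tail(x, y, n1, n2):
    """Smaller cumulative tail of p(. | x) at y (assumed convention)."""
    lower = sum(ac_prob(x, k, n1, n2) for k in range(y + 1))
    below = sum(ac_prob(x, k, n1, n2) for k in range(y))
    upper = 1.0 - below
    return min(lower, upper)


N_E, N_NE = 1944, 1404  # ESTs retained from the E and NE libraries

# Hypothetical gene: 10 ESTs in the E library, none in the NE library.
p_skewed = ac_tail(10, 0, N_E, N_NE)
# Hypothetical gene with similar counts in both libraries.
p_balanced = ac_tail(3, 3, N_E, N_NE)
```

A gene is flagged as differentially represented when its tail probability falls below the chosen threshold; with mostly singleton genes, as in a well-normalised library, very few counts are extreme enough to do so.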
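The expression ratios (ER, log 2) reported for real-time RT-PCR can be illustrated with the widely used comparative Ct calculation. This is a sketch under explicit assumptions, not the procedure from the Methods: it assumes actin-2 serves as the reference gene, that amplification efficiency is 100% (one doubling per cycle), and the Ct values shown are invented for illustration.

```python
def log2_expression_ratio(ct_gene_e, ct_ref_e, ct_gene_ne, ct_ref_ne):
    """log2(E / NE) expression ratio by the comparative Ct approach.

    Each Ct of the target gene is first normalised to the reference
    gene in the same sample; assumes perfect doubling per PCR cycle.
    """
    delta_e = ct_gene_e - ct_ref_e      # normalised Ct in the E genotype
    delta_ne = ct_gene_ne - ct_ref_ne   # normalised Ct in the NE genotype
    # A lower Ct means more transcript, hence the sign inversion.
    return -(delta_e - delta_ne)


# Hypothetical Ct values: target gene and reference in each genotype.
er = log2_expression_ratio(ct_gene_e=22.0, ct_ref_e=20.0,
                           ct_gene_ne=24.0, ct_ref_ne=20.0)
fold_change = 2 ** er
```

A positive value indicates preferential expression in the E genotype and a negative value preferential expression in the NE genotype.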

Discussion
An E and a NE cDNA-library were generated by SSH using mRNAs isolated from leaf explants from two chicory genotypes differing in SE capacity cultured for 4 days under SE-inducing conditions, and a total of 3,348 ESTs from both libraries turned out to represent a maximum of 2,077 genes. Real-time RT-PCR analyses of the expression of 48 annotated genes revealed that after 4 days of culture under SE-inducing conditions, 14 genes were preferentially expressed in the E genotype, and 20 genes in the NE genotype (Tab. 3). This indicated that the E and NE library contain ESTs representing genes differentially expressed in K59 or C15, even though ESTs present in one library did not necessarily represent a gene exclusively expressed in the corresponding genotype. In addition, some of the differences in gene expression between the two lines might be the result of genetic differences that have nothing to do with SE capacity. As we intend to use ESTs from both libraries for future studies on gene expression during SE, the most important contribution of SSH in the construction of the libraries was probably the normalisation achieved, which increases the chance of finding ESTs representing weakly expressed genes.
We chose to construct the E and NE library using mRNAs isolated from explants of K59 and C15 cultured for 4 days under SE conditions on the basis of cytological observations, in particular the occurrence of the first cell divisions in explants of K59. Seven genes encoding ribosomal proteins were tested by real-time RT-PCR, and were all found to be preferentially expressed in explants of K59, the embryogenic genotype (Tab. 3). This probably reflects ribosome biogenesis required for the preparation of cells in the explants to enter the SE transdifferentiation pathway, and in particular to reinitiate cell divisions. The relation between augmented expression of genes encoding ribosomal proteins and cell divisions has been documented in several studies (e.g. [28,16]). In Z. elegans, many genes encoding ribosomal proteins were found to be preferentially expressed during the transdifferentiation of mesophyll cells into xylem cells [18], and in aspen relatively high numbers of ESTs representing ribosomal protein genes were reported for cDNA libraries from meristematic tissues [29,30]. In addition to the ribosomal protein-encoding genes, preferential expression in K59 was detected for 2 genes implicated in cell cycling: a gene encoding a CDC48-like protein (OC1427_b) [31] and a gene encoding a G protein beta subunit-like protein (OC1929) [32] (Tab. 3).
Figure 3 Distribution and number of assembled sequences.
The above results indicated that the preferential expression of genes implicated in cell division in K59 concurs with the cytological observations. Cells in the explants of K59 that enter the cell division cycle have lost their original identity, and most of them seem to enter the SE pathway thereafter. A gene (Cont6402) [...] were found to represent a gene homologous to ZWILLE/PINHEAD in Arabidopsis. The early expression during SE of genes regulating stem cell maintenance may indicate that they also play a role in the transdifferentiation process accompanying SE (cf. [2]).
Another interesting result may be the preferential expression in K59 of a gene (OC0687) putatively encoding an arabinogalactan protein (AGP) similar to DcAGP1 from carrot. DcAGP1 encodes a non-classical AGP with strong similarity to a family of basic proline-rich proteins [34]. AGPs are thought to be involved in many signalling pathways [35], and were reported to be essential for the formation of somatic embryos in chicory [26].
The cytological studies indicated that in C15 cells reacted differently to the SE-inducing culture conditions, possibly by failing to progress in cell reactivation (Fig. 1). The differences in gene expression between C15 and K59 observed at day 4 of SE culture suggest that, in contrast to the opportunistic response observed for K59, cells in the explants from C15 reacted to the stresses applied with a defensive response. This was illustrated by the preferential expression in C15 of genes involved in the ethylene signalling pathway: two genes encode ACC oxidases (Cont0006 and OC0168), and one gene encodes an ethylene response element binding protein (OC1347). Some other genes preferentially expressed in C15, also related to defence, encode a metallothionein (Cont9039), a glutathione transferase (OC0023_a), and a leaf senescence-related protein (OC1068) [36,37].

Conclusion
The E and NE cDNA libraries described in this paper will be important new tools in our ongoing efforts to unravel the molecular mechanisms underlying the early stages of direct somatic embryogenesis in chicory. None of the genes identified in this study has been identified as such previously in our laboratory [5][6][7][8]. This is probably due to the limited number of clones from our libraries that were sequenced, to the differences in the timing and way of selection, and/or differences between the E and NE genotypes used for screening. The results of the real-time PCR analysis showed that our libraries contain ESTs representing genes differentially expressed in the E genotype K59 and the NE genotype C15. It remains to be established, however, which of these genes are implicated in the different responses of the explants from both genotypes to SE culture conditions, and in particular in the early stages of SE. A transcriptional analysis by cDNA microarray is currently being performed for explants of K59 and C15 during the first 6 days of SE culture. This should lead to the identification of genes differentially expressed during SE, and their expression patterns may provide clues on their roles in this process. Furthermore, preliminary experiments indicate that the number of somatic embryos formed in explants from plants in progenies obtained after crossing K59 with a compatible low embryogenic genotype shows a continuous quantitative distribution, i.e. behaves as a quantitative trait. These plants are polymorphic for a large number of molecular markers, and a molecular genetic map for this progeny has been constructed in our laboratory. This will serve to identify quantitative trait loci (QTL) for SE, as well as to map genes differentially expressed during SE. Co-localization of genes differentially expressed during SE with QTL for this process may help to identify those genes whose expression is causally implicated in direct SE in chicory.
This report also presents a medium-scale sequencing of cDNAs representing genes in chicory. In fact, of the 3,348 ESTs selected from the E and NE library (Additional file 1), only 13 showed homologies to 11 chicory sequences of the total 218 entries for chicory already present in the GenBank NR database. Though modest in comparison to databases for some other Asteraceae, like lettuce, sunflower, and Z. elegans, our database for chicory may serve as a source for comparative studies in this important plant family.

Somatic embryogenesis culture, tissue collection and RNA extraction
The Cichorium intybus embryogenic (K59) and non-embryogenic (C15) genotypes were grown in the greenhouse, and maintained by vegetative propagation. Leaves from plants at the six-leaf stage were surface sterilized, and cut into fine strips (2 cm × 0.2 cm). Each culture contained 15 explants from a single leaf in 20 ml M17S20 culture medium [3], and was placed in darkness at 35°C under constant agitation (80 rpm). Explants were collected at day 4 of SE culture and RNA from each culture was