Computational annotation of genes differentially expressed along olive fruit development

Background Olea europaea L. is a traditional tree crop of the Mediterranean basin with a high economic impact worldwide. Unlike other fruit tree species, little is known about the physiological and molecular basis of olive fruit development, and few sequences of genes and gene products are available for olive in public databases. This study deals with the identification of large sets of differentially expressed genes in developing olive fruits and their subsequent computational annotation by means of different software. Results mRNA from fruits of the cv. Leccino sampled at three different stages [i.e., initial fruit set (stage 1), completed pit hardening (stage 2) and veraison (stage 3)] was used for the identification of differentially expressed genes putatively involved in the main processes along fruit development. Four subtractive hybridization libraries were constructed: forward and reverse between stages 1 and 2 (libraries A and B), and between stages 2 and 3 (libraries C and D). All sequenced clones (1,132 in total) were analyzed through BlastX against non-redundant NCBI databases and about 60% of them showed similarity to known proteins. A total of 89 out of 642 differentially expressed unique sequences were further investigated by Real-Time PCR, which validated the SSH results at a rate as high as 69%. Library-specific cDNA repertories were annotated according to the three main vocabularies of the Gene Ontology (GO): cellular component, biological process and molecular function. BlastX analysis, GO term mapping and annotation analysis were performed using the Blast2GO software, a research tool designed mainly to enable GO-based data mining on sequence sets for which no GO annotation is yet available. Bioinformatic analysis revealed a significantly different distribution of the annotated sequences for each GO category when comparing the three fruit developmental stages.
The olive fruit-specific transcriptome dataset was used to query all known KEGG (Kyoto Encyclopaedia of Genes and Genomes) metabolic pathways for characterizing and positioning retrieved EST records. The integration of the olive sequence datasets within the MapMan platform for microarray analysis allowed the identification of specific biosynthetic pathways useful for the definition of key functional categories in time course analyses for gene groups. Conclusion The bioinformatic annotation of all gene sequences was useful to shed light on metabolic pathways and transcriptional aspects related to carbohydrates, fatty acids, secondary metabolites, transcription factors and hormones as well as response to biotic and abiotic stresses throughout olive drupe development. These results represent a first step toward both functional genomics and systems biology research for understanding the gene functions and regulatory networks in olive fruit growth and ripening.


Background
Fruit development is the result of genetically programmed processes influenced by environmental factors. To identify and characterize genes involved in these processes, different genomic approaches (ESTs, large-scale microarrays, deep transcriptome profiling, etc.) have been used in several fruit species [1] and the body of information concerning transcriptional networks and regulatory circuits involved in important physiological and developmental processes increased tremendously during the last two decades. In tomato, large-scale EST sequencing projects resulted in a better insight into molecular mechanisms of fruit ripening processes and in the identification of common transcription factors not previously associated with ripening [2,3]. Generation of ESTs and consequent discovery of genes with potential roles in fruit development have also been reported in grape berry [4,5]. In apple, an extensive analysis has been made using all EST sequences available in public databases to identify genes temporally or spatially regulated during fruit growth and development [6]. Other extensive EST sequencing projects focusing on fruit development have been set up in peach [7], melon [8] and kiwifruit [9]. Sequence information derived from advanced EST sequencing is an essential resource for functional genomics studies based on the use of microarray technology and real-time PCR. Following the pioneering work of Aharoni and co-workers [10] on strawberry, several papers have now been published on the use of microarrays in different fruit species.
Olea europaea L. is an evergreen species, traditionally cultivated in the Mediterranean basin. The oil that results from mechanical extraction of the fruits is a predominant component of the world-renowned 'Mediterranean diet', to which increasing attention is being paid for its health benefits and cancer-protective properties [11]. These attributes are closely related to the oil composition and to the concentration of active bio-molecules resulting from the catabolic and anabolic processes taking place throughout olive fruit development, which is a long process lasting several months. The oil content of olives can reach up to 30% (fresh weight) at full ripening [12]: it accumulates in the mesocarp and, to a lesser extent, in the seed [13]. Oil accumulation in the pulp increases slowly, reaching a plateau after veraison. A marked triacylglycerol (TAG) accumulation in seed and pulp occurs after endocarp lignification, when about 40 mg of oil per fruit per week can be synthesized. The fatty acid profile of the oil accumulating in the fruit is important in relation to its nutritional properties [11]. The main fatty acid is oleic acid (C18:1), which represents about 75% of total fatty acids, followed by linoleic (C18:2), palmitic (C16:0), stearic (C18:0) and linolenic (C18:3) acid. The pattern of fatty acid synthesis and desaturation varies during maturation and ripening, according to cultivar and environmental conditions [14,15]. Other important metabolites accumulate throughout olive fruit development. They include polyphenols [16], carotenoids [17], chlorophylls [18], sterols and terpenoids [19], all directly or indirectly affecting olive oil quality and its technological and nutritional properties.
Information concerning the genetic regulation of these metabolic processes in olive is still very limited. Only a few genes involved in fatty acid metabolism have been characterized [13,20-24]. A monosaccharide transporter (OeMST2), whose expression increases during fruit maturation, when a massive accumulation of sugars occurs, has recently been cloned [25]. Moreover, the gene encoding a geranylgeranyl reductase (OeCHLP) has been isolated and its role in organ development and stress response in relation to tocopherol action hypothesized [26]. Information at the molecular level about polyphenol and triterpenoid metabolism is lacking, as is information on the mechanisms involved in olive fruit development and ripening.
Among different strategies available for identifying differentially expressed genes, suppression subtractive hybridization (SSH) libraries have been successfully used in fruit science to elucidate mechanisms regulating anthocyanin metabolism in grape berries [27], proanthocyanidin biosynthesis in persimmon [28], processes involved in early growth and ethylene-induced ripening in banana [29,30], and in orange pigmentation [31]. This paper deals with the identification via SSH of large repertories of differentially expressed genes in developing olive fruits, and their computational annotation by means of different bioinformatic software. The identification and characterization of gene regulatory networks and key metabolic pathways during fruit growth and development represent a prerequisite for improving olive oil quality and its health-related properties.

Results
The study was based on the preparation of cDNA libraries using SSH which, like differential display (DD), is an efficient strategy for isolating genes with antagonistic expression patterns. This technique enabled the identification of transcripts of genes differentially expressed among three developmental stages of the olive fruit, corresponding to initial fruit set (30 DAF), completed pit hardening (90 DAF) and veraison (130 DAF) (Figure 1).
As far as the composition of the four subtractive libraries is concerned, the number of differentially expressed sequences randomly chosen varied from a minimum of 236 to a maximum of 317 per library, with a total number of clones equal to 1,132 (Table 1). The average length of the cDNA clones was 597 bp with a wide range of variation, from 48 up to 1,283 bp. The redundancy within each single library was relatively low, ranging between 1.7% and 5.3%. Taking into account the whole set of sequences, the overall redundancy calculated among the four libraries was equal to 3.7%. The sequences of each single library were preliminarily analyzed using the CAP3 program in order to isolate the singlets and assemble contiguous and overlapping clones into contigs. This affected the comparative redundancy that increased up to 6.2%.
Querying the non-redundant NCBI databases with the cDNA sequences yielded a BLAST hit for 79%, 91%, 88% and 78% of the clones belonging to the A, B, C and D libraries, respectively (Table 2). The average sequence similarity was around 76%, ranging from 72% to 80%, and the median E-value for each single library ranged from 1e-47 to 1e-77.
Around 75% of the BLAST hits of the olive fruit cDNA sequences were homologous to coding sequences present in the rice, Arabidopsis and grapevine genomes, with more than 1,000 hits per species. It is worth noting that until now (June 2009) only 47 BLAST hits for olive could be recorded (Figure 2).
The computational analysis of the whole EST collection using the Blast2GO software allowed the annotation of the expressed sequences according to the terms of the three main Gene Ontology vocabularies, i.e. cellular component, molecular function and biological process (Figure 3). As far as cellular components are concerned, the most represented were plastids and mitochondria, with more than 50% of the total annotations, followed by cytosol, plasma membrane, endoplasmic reticulum and nucleoplasm, whereas other cellular compartments were represented to a much lower extent (Figure 3A). Concerning molecular function, the most represented categories were nucleotide binding proteins, followed by proteins with transport, kinase and enzymatic activities. The other molecular functions were represented to a lesser extent (Figure 3B). More than 30 categories were found for the biological process vocabulary, the most represented being carbohydrate metabolism, response to biotic and environmental stresses, generation of precursor metabolites and energy, and catabolic processes (Figure 3C). Although numerically less represented, terms related to secondary metabolites, lipid metabolism, synthesis of amino acids and derivatives, metabolites and their precursors, and protein modification processes are also worth mentioning.
Figure caption: Olive fruit growth and developmental stages considered for SSH library construction.

Focusing on the GO annotation of each single subtractive library, a number of olive fruit stage-specific GO terms were identified (Figure 4). Among the 296 and 464 GO terms found in the A and B libraries, 75 and 101 were associated with down- and up-regulated genes, respectively. The most significant GO terms were related to elements of hormone biosynthesis and signal transduction mediated by ethylene, jasmonic acid, salicylic acid, and abscisic acid, as well as to the biosynthesis of secondary metabolites, such as terpenoids. In the C and D libraries, a total of 375 and 549 GO terms were recovered, 78 and 183 of which were related to down- and up-regulated genes, respectively. Among these, there are GO terms associated with environmental stress responses, catabolism of secondary metabolites (such as terpenes, limonene and carotene), response to hormones (gibberellins and cytokinins), and auxin signal transduction.
The analysis of GO terms shared by pair-wise library combinations makes it possible to determine which genes are continuously or transiently down- or up-regulated during the studied process. This analysis retrieved only 7 terms for the continuously down-regulated genes, whereas as many as 69 were collected among the continuously up-regulated ones. By contrast, comparable numbers (21 vs. 25) of GO terms associated with transiently up- and down-regulated genes among the three fruit developmental stages were found (Figure 5).
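The pair-wise comparison of libraries can be read as a set of intersections between the GO terms of each library. The following is a minimal sketch in Python, assuming that libraries A and C collect GO terms of genes down-regulated at the later stage of each comparison and that B and D collect those of up-regulated genes, as stated for Figure 4. The function name and the GO identifiers are hypothetical placeholders, not data from this study.

```python
def classify_shared_terms(lib_a, lib_b, lib_c, lib_d):
    """Classify GO terms shared by pairs of subtractive libraries.

    Assumed meaning of the libraries (see lead-in):
      A: down-regulated from stage 1 to 2   B: up-regulated from stage 1 to 2
      C: down-regulated from stage 2 to 3   D: up-regulated from stage 2 to 3
    """
    a, b, c, d = (set(x) for x in (lib_a, lib_b, lib_c, lib_d))
    return {
        "continuously_down": a & c,   # down in both intervals
        "continuously_up": b & d,     # up in both intervals
        "transiently_up": b & c,      # up first, then down
        "transiently_down": a & d,    # down first, then up
    }


# Placeholder GO identifiers used only to exercise the function
shared = classify_shared_terms(
    lib_a={"GO:0000001", "GO:0000002"},
    lib_b={"GO:0000003", "GO:0000004"},
    lib_c={"GO:0000001", "GO:0000004"},
    lib_d={"GO:0000002", "GO:0000003"},
)
for label, terms in shared.items():
    print(label, sorted(terms))
```

Counting the members of each set gives the kind of totals reported in the text for continuously and transiently regulated terms.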
Quantitative Real-Time PCR experiments were carried out to corroborate the expression patterns of a subset of sequences (i.e. 89 out of 642 unisequences, equal to 14%), corresponding to 61 different genes. Expression patterns related to the selected gene sequences and estimated in pair-wise comparisons between the three different fruit stages are reported in Figure 6. The Real-Time PCR analyses validated the results from SSH experiments for 42 out of 61 genes (about 69%). The validated genes were grouped according to different expression patterns (Additional file 1).
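The two percentages quoted in this paragraph follow directly from the reported counts. The short Python check below only restates that arithmetic; the helper name is a hypothetical convenience.

```python
def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Counts quoted in the text
tested_share = percent(89, 642)    # unisequences analyzed by Real-Time PCR
validated_share = percent(42, 61)  # genes whose SSH pattern was confirmed

print(f"tested: {tested_share:.1f}%")
print(f"validated: {validated_share:.1f}%")
```

Rounded to whole numbers, these values correspond to the 14% and 69% given above.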
The Kyoto Encyclopaedia of Genes and Genomes (KEGG) was queried for sequences encoding enzymes, and the deduced gene products were associated with specific metabolic and/or biosynthetic pathways related to carbohydrate, fatty acid and secondary metabolism. The most represented KEGG maps associated with carbohydrate and fatty acid biosynthesis and metabolism were organized in a simple network representing the major connections among the retrieved metabolic pathways (Additional files 3 and 4). As expected, several genes encoding enzymes related to carbohydrate and fatty acid compounds were transcriptionally up- or down-regulated during olive fruit development. Several components of carbon fixation in photosynthetic organisms (Map:00710) and of starch and sucrose metabolism (Map:00500) were modulated in their expression in the early stages of fruit development. KEGG map analysis pointed out an intense up-regulation of the majority of enzymes related to the pentose phosphate pathway (Map:00030) and to glycolysis and gluconeogenesis (Map:00010), along with starch and sucrose metabolism (Map:00500) (Additional file 3). Table 3 reports the enzymes involved in starch and sucrose metabolism, glycolysis and gluconeogenesis. Transcripts of enzymes involved in the synthesis of pyruvate from β-D-fructose-6-P, such as 6-phosphofructokinase [...] the precursors of palmitic and oleic acid, respectively, were up-regulated at veraison, as were a number of genes involved in FA metabolism (Table 4).

(Table footnote: The maximum E-value was 1e-6, corresponding to the cut-off adopted for GO annotation.)
KEGG maps were also produced for secondary metabolic pathways. Considering the flavonoids as an example, [...]

Figure caption fragment: the BLAST hits as retrieved from NCBI databases.

Figure caption fragment: [...] and biological processes (C) vocabularies. In (A) plastids and mitochondria were the most represented cellular compartments. In (B) the most represented categories were nucleotide binding proteins, followed by proteins with transport, kinase and enzymatic activities. In (C) more than 30 categories were found, the most represented being carbohydrate metabolism, response to biotic and environmental stresses, generation of precursor metabolites and energy, and catabolic processes.

The analysis of the olive EST dataset with the MapMan software made it possible to reconstruct overview metabolism maps (Additional file 6) and to group sequences in the main regulatory networks (Figure 8). In the in silico expression analysis of all olive EST clones linked to the Arabidopsis genechip sequences (Affymetrix), most genes proved to be associated with constantly or transiently regulated genes during fruit development involved in cell wall synthesis and breakdown, fatty acid biosynthesis and lipid breakdown, starch and sucrose metabolism, glycolysis, and secondary metabolism (i.e., terpenoids and flavonoids). Gene products involved in amino acid biosynthesis and metabolism were also identified (Additional file 6). Furthermore, several genes encoding transcription factors were mainly down-regulated throughout fruit development, while some others were related to protein modification and degradation (Figure 8). Genes related to hormone biosynthesis and action appeared to be differently regulated according to the type of hormone. A down-regulation of genes involved in auxin biosynthesis and metabolism, such as an oxido/reductase and an IAA-amino acid synthase, occurred between 30 and 90 DAF, while during late development the auxin responsive factors ARF1 and ARF7 were up- and down-regulated, respectively.
By contrast, the synthesis of abscisic acid (ABA) was clearly stimulated throughout development, since two key enzymes, encoded by ABA2 (ABA DEFICIENT 2) and AAO3 (ABSCISIC ALDEHYDE OXIDASE 3), were up-regulated at completed pit hardening and veraison, respectively. In addition, a down-regulation of a gene encoding a GA-regulated protein occurred between completed pit hardening and veraison. During the same developmental phase, an up-regulation of ARR1, a protein involved in the cytokinin response, as well as a down-regulation of a UGT2, encoding a UDP-glucosyl transferase actively involved in the metabolism of the hormone, were observed. Genes involved in jasmonic acid (JA) metabolism were up-regulated throughout fruit development. A transcriptional up-regulation of an ethylene receptor of the ERS type, as well as a down-regulation of genes involved in brassinosteroid (BR) biosynthesis and action, were observed during early development.
Taking into account genes related to biotic and abiotic stress responses, dual multiple contingency tests were performed to identify GO terms associated with sequences significantly and antagonistically distributed between libraries. This analysis revealed an abundance of sequences related to light stimulus (GO:0009416) and radiation (GO:0009314), as well as to biotic and abiotic stresses (Table 5). Sequences related to light stimulus and radiation were up-regulated from 30 to 60 DAF, while those associated with response to oxidative stresses were down-regulated during the same developmental period.
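The text does not specify the statistic behind the contingency tests. One common choice for asking whether a GO term is unevenly distributed between two sequence sets is Fisher's exact test on a 2x2 table, and the Python sketch below illustrates that choice only as an assumption. The function name and the counts in the example are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: sequences with the GO term in library 1
    b: sequences without the GO term in library 1
    c: sequences with the GO term in library 2
    d: sequences without the GO term in library 2
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x annotated sequences in library 1
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts: 12 of 100 sequences carry the term in one library, 2 of 100 in the other
p_value = fisher_exact_two_sided(12, 88, 2, 98)
print(f"p = {p_value:.4f}")
```

When many GO terms are tested at once, as is the case here, the resulting p-values would normally also need a multiple-testing correction.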

Discussion
Olea europaea L. is a common tree species of the Mediterranean basin that plays a distinctive role in characterizing the landscape and represents a major agricultural commodity as a source of olive oil. Olives are not only a significant food source, but also contribute to human health and are becoming popular in health-conscious diets far beyond the Mediterranean area where olive oil is traditionally used. Taking into account the increasing worldwide commercial interest in olive oil and the lack of information on its genomic features, a functional genomic approach, able to provide insights into the genetic and molecular aspects controlling fruit development and ripening, may be considered of primary interest.
A considerable amount of physiological and biochemical data is available on growth, development and ripening of the olive drupe but, unlike other fruit species such as peach, apple and grape, information on olive gene sequences and gene products is very limited in the main public gene databases. This study was carried out on cv. Leccino, one of the most widespread Italian varieties, characterized by a relatively short fruit developmental cycle and a high degree of synchronization of the processes defining ripening.
The SSH approach allowed the identification of 1,132 differentially expressed gene sequences in three selected developmental stages of the olive fruit, namely initial fruit set (30 DAF), completed pit hardening (90 DAF) and veraison (130 DAF). Olive fruits showed significant differences in the distribution of GO terms among ontological vocabularies and categories in relation to the developmental stages. GO terms involved in carbohydrate, fatty acid and flavonoid metabolism were analyzed by setting up KEGG maps. As far as carbohydrates are concerned, genes involved in carbohydrate metabolism were modulated in their expression and, in particular, the expression pattern of genes related to starch metabolism was consistent with a temporary role of starch as a storage compound during early fruit development. Sequences of enzymes involved in the pentose phosphate pathway, glycolysis and gluconeogenesis, along with starch and sucrose metabolism, were up-regulated. The up-regulation of genes encoding enzymes involved in the synthesis of pyruvate from different substrates during early development may indicate that the fruit at this stage is highly energy demanding. There are two sources of assimilates for fruit growth in olive. The major source is certainly the sugars translocated in the phloem from leaves or sites of storage, comprising mannitol, raffinose, stachyose, and sucrose [32]. The secondary source is sugars formed by photosynthesis in the developing fruits themselves, which remain green for a considerable period and retain active chlorophyll even when they change colour as they approach maturity. While chlorophyll is mostly in the exocarp, the mesocarp has been shown to contain significant amounts of phosphoenolpyruvate carboxylase [12], the CO2 fixation enzyme of the CAM and C4 photosynthetic pathways.
Genes related to fatty acid (FA) biosynthesis appeared to be up-regulated throughout development, although transcripts of specific enzymes accumulated to different extents depending on the developmental stage. Enzymes controlling FA chain elongation were up-regulated throughout fruit development: the synthesis of short chain FAs was impaired during early development whereas the synthesis of the precursors of palmitoleic and oleic acids was up-regulated during late development. This finding is consistent with the oil accumulation pattern in olive mesocarp, which starts around 40-60 DAF and reaches its highest level at ripening, confirming previous observations [20,21]. A number of genes involved in FA metabolism appeared to be differentially expressed throughout development, with a significant up-regulation at veraison. This might be interpreted as a homeostatic reaction to the large FA accumulation occurring at this developmental stage.
Taking into account the secondary metabolites, genes involved in phenylpropanoid and alkaloid biosynthesis and in caffeine, limonene and pinene metabolism appeared to be differentially expressed throughout fruit development. Dihydrokaempferol 4-reductase, flavanone 3-dioxygenase, naringenin-chalcone synthase, and leucocyanidin oxygenase, four enzymes controlling flavone and flavonol as well as anthocyanin biosynthesis, were up-regulated from 90 to 130 DAF, thus suggesting an increased accumulation of the related metabolites during late development. Flavonoids are important secondary metabolites and precursors of flavonols and anthocyanins, the latter being responsible for the color development occurring at ripening. Key genes related to the anthocyanin biosynthetic pathway (chalcone synthase, CHS; flavanone 3-hydroxylase, F3H; dihydroflavonol reductase, DFR; and anthocyanidin synthase, ANS) proved to be up-regulated at the veraison stage. A similar coordinated up-regulation of these genes has been observed in grape berries at the onset of ripening, at the time of pigmentation changes [33].
The relatively high abundance of transcripts related to hormones supports their key regulatory role in olive fruit development, as demonstrated in several other fruits [34]. The synthesis of ABA is clearly stimulated throughout fruit development, as demonstrated by the up-regulation of two key enzymes of its biosynthetic pathway. The pattern of expression changed according to the type of hormone. The observed down-regulation of genes involved in auxin biosynthesis and metabolism is consistent with the lowering of auxin content reported in other fruits throughout development. The different regulation of auxin responsive factors, such as ARF1 and ARF7, which act as negative and positive regulators of IAA responsive genes, respectively [35], might imply a decreased sensitivity to the hormone during late fruit development. This is consistent with the negative regulation exerted by auxin on the onset of the ripening syndrome observed in non-climacteric fruits. At the late developmental stage, a down-regulation of zeatin O-glucosyltransferase 2, actively involved in CK metabolism, as well as an up-regulation of ARR1, a protein involved in the CK response, were observed. ARR1 is a type-B ARR transcription factor involved in CK-responsive phenomena. It has been proposed that ARR1, together with ARR10 and ARR12, redundantly play pivotal roles in the AHK-dependent phosphorelay signaling in response to CK [36]. Taking into account that CK metabolic enzymes are up-regulated by the hormone, these transcriptional changes may reflect a lowering of CK concentration, along with an increase of fruit sensitivity to CK occurring during late development. Veraison is a developmental stage characterized by a strengthening of the fruit sink action that, also in olive, might be regulated by a complex hormone cross-talk and interaction, as demonstrated in other non-climacteric fruits such as grape berries [37]. In spite of any signal variation in terms of expression of genes related to ethylene biosynthesis, an up-regulation of an ERS type ethylene receptor has been observed during early fruit development. This might imply an increase in sensitivity of the fruitlet to ethylene, the hormone involved in the regulation of the immature fruit physiological drop that in olive occurs between fruit set and pit hardening [38]. Genes involved in jasmonate (JA) metabolism and brassinosteroid (BR) biosynthesis were up- and down-regulated, respectively. If the gene expression pattern is mirrored by a similar evolution of JA and BR concentration in fruit tissues, this would indicate that the role of JAs and BRs in olive is different from that in other fruit types such as peach and grape. In fact, in peach it has been demonstrated that JAs delay fruit ripening [39], while BRs stimulate the onset of veraison in grape berry [40].

Figure 8 Regulatory network map constructed with MapMan software using the olive fruit EST dataset. Several genes encoding transcription factors were mainly down-regulated throughout fruit development, while some others were related to protein modification and degradation. Genes related to the biosynthesis and action of IAA, ABA, GA, ethylene, cytokinins, JAs and SA appeared to be differently regulated according to the type of hormone.
The dual multiple contingency tests allowed the identification of GO terms related to biotic and abiotic stress responses significantly and antagonistically distributed between libraries. The transcriptional profile of genes related to light, temperature and biotic stresses parallels the dynamics of day length and light intensity, both peaking at the pit hardening stage. Moreover, these data may indicate that genes responding to environmental stimuli are transcriptionally regulated up to pit hardening, and post-transcriptionally up to the veraison.

Conclusion
As a concluding remark, the SSH technique allowed the identification of a set of 642 differentially expressed unique sequences. Among these, 89 (14%), corresponding to 61 different key genes, were further investigated by Real-Time PCR, which validated the SSH results at a rate as high as 69%. The bioinformatic annotation of all gene sequences was useful to shed light on metabolic pathways and to understand specific regulatory networks. In fact, the data reported here represent a significant contribution to the elucidation of transcriptional aspects related to carbohydrates, FAs, secondary metabolites, transcription factors and hormones, as well as response to biotic and abiotic stresses. Particularly interesting are the data related to hormones, which point out the complexity of the role played by these compounds in olive fruit development and ripening.
These molecular and bioinformatic data represent a first step toward both functional genomics and systems biology research for understanding the gene functions and regulatory networks in olive fruit growth, development and ripening.

RNA extraction
Total RNA was extracted from pericarp of about 16 fruits for each sampling date, using the RNeasy Plant Mini Kit (Qiagen). Contaminating genomic DNA was removed from total RNA by two DNase treatments. The first was performed using the RNase-Free DNase Set (Qiagen) during the RNA isolation procedure. After sample elution, total RNA was treated with DNase I (Promega). The RNAqueous method (Ambion) was applied to purify total RNA from phenolic compounds and other substances that [...] (Figure 1).

For the two forward libraries A and C, cDNA from samples collected at 30 and 90 DAF, respectively, was used as tester and cDNA from 90 and 130 DAF, respectively, was used as driver; the reverse was done for the reverse subtractive libraries B and D.

For DNA isolation, the alkaline lysis method was followed. The clones were grown in 0.5 ml of Terrific Broth (TB) overnight in 96-well plates. An aliquot was saved for glycerol stocks.
The remaining culture was centrifuged in a Sorvall HT-6000B centrifuge and re-suspended in buffer P1. Following re-suspension, buffer P2 was added to lyse the cells. After 2 min, buffer P3 was added to neutralize the solution. The lysate was transferred to 96-well filter plates and centrifuged at 2,000 g for 2 min. The clear lysate thus obtained was mixed with 0.7 volumes of ethanol and centrifuged at 6,500 g in the HT-6000B centrifuge for 20 min. The pellet was washed once with 70% ethanol, dried and re-suspended in 50 μL of sterile water.

The complete analysis was performed using CLC Combined Workbench 3 software. The sequence data were imported into the software. The software includes a databank of vectors commonly used for library construction; where the vector was not included, its sequence was imported from the manufacturer's web site (Invitrogen). The sequence data were aligned in batch with the vector, and the regions homologous to the vector were trimmed. The vector-trimmed sequences were then blasted in batches using the CLC software. The resulting files were saved and exported into Microsoft Excel format. The sequencing was performed on ABI3700 automatic DNA sequencers using 25-50 ng of DNA template according to the manufacturer's protocol. These experimental steps were performed at Rx Biosciences Lab (Rockville, MD, USA).

Gene Ontology annotation
Annotation of all sequences was performed by using default parameters on the two ranges of length previously described. Furthermore, InterPro Scan [45] was performed to find functional motifs and related GO terms by using the specific tool implemented in the Blast2GO software with the default parameters. Finally, the 'Augment Annotation by ANNEX' function was used to refine annotations (http://www.goat.no, [46]). The GOslim 'goslim_plant.obo' was used to obtain specific GO terms by means of a plant-specific reduced version of the Gene Ontology (http://geneontology.org).
The distribution of annotations among the libraries was represented by Venn diagrams, computed from all retrieved annotations with the VennMaster software using default parameters (http://www.informatik.uni-ulm.de/ni/staff/HKestler/vennm/doc.html).
Enzyme mapping of annotated sequences was done by direct GO-to-enzyme annotation and used to query the Kyoto Encyclopaedia of Genes and Genomes (KEGG, http://www.genome.jp/kegg/ [47][48][49]) to define the main metabolic pathways involved.
MapMan (http://gabi.rzpd.de/projects/MapMan/) analysis was done using the olive dataset properly rearranged as input files. The Arabidopsis proteome was downloaded from ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR8_genome_release/TAIR8_sequences/ and the olive transcriptome dataset was used as query for local BLASTX analysis. Once the BLAST search was completed, the Arabidopsis AGI codes of all BLAST hits with an E-value equal to or lower than 1e-6 were recovered by means of a home-made Perl script.
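The Perl script used for this filtering step is not shown in the text. As an illustration only, the Python sketch below performs a comparable filter under several assumptions: the BLASTX results are in the 12-column tabular format (query in column 1, subject in column 2, E-value in column 11), the subject identifier contains a TAIR-style AGI code, the threshold of 1e-6 acts as an upper bound on the E-value, and only the best hit per query is kept. The function name and the sample records are hypothetical.

```python
import re

# AGI locus codes look like AT1G01010 (chromosomes 1-5, C or M)
AGI_PATTERN = re.compile(r"AT[1-5CM]G\d{5}", re.IGNORECASE)


def best_agi_hits(blast_lines, max_evalue=1e-6):
    """Return {query_id: AGI code} for the best hit of each query.

    Hits with an E-value above max_evalue, or whose subject identifier
    carries no AGI code, are discarded.
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue > max_evalue:
            continue
        match = AGI_PATTERN.search(subject)
        if match is None:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (match.group(0).upper(), evalue)
    return {query: agi for query, (agi, _) in best.items()}


# Made-up tabular records used only to exercise the function
example = [
    "est_001\tAT1G01010.1\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-30\t120",
    "est_001\tAT2G02020.1\t85.0\t100\t15\t0\t1\t100\t1\t100\t1e-40\t150",
    "est_002\tAT3G03030.1\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.01\t30",
]
print(best_agi_hits(example))
```

Keeping every hit below the threshold instead of the best one would only require collecting the codes in a list per query.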
Retrieved AGI codes were then converted into ATH1-121501 genechip identifiers (Affymetrix) by using the PhyloGenie web interface (http://bioinfoserver.rsbs.anu.edu.au/utils/affytrees/; [50]). For each library, arbitrary expression values were assigned to all EST-linked Affymetrix identifiers, and the resulting dataset was used as input for the subsequent MapMan analysis. Finally, a time course representation of the whole olive EST dataset related to primary and secondary metabolites and cellular processes was obtained by mapping the datasets onto the appropriate MapMan pathways.
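The text does not report which arbitrary values were assigned, nor the exact layout of the input files. The Python sketch below therefore only illustrates the idea, assuming a two-column tab-separated table of probe-set identifier and value, with +1 for a library of up-regulated genes and -1 for a library of down-regulated genes. All identifiers, values and names are hypothetical placeholders.

```python
def mapman_table(agi_by_est, probe_by_agi, value):
    """Build tab-separated 'probe_id<TAB>value' rows for one library.

    agi_by_est:   {EST identifier: AGI code} for the ESTs of the library
    probe_by_agi: {AGI code: probe-set identifier} for the array
    value:        arbitrary expression value assigned to the whole library
    """
    probes = set()
    for agi in agi_by_est.values():
        probe = probe_by_agi.get(agi)
        if probe is not None:  # skip genes that are not on the array
            probes.add(probe)
    return [f"{probe}\t{value}" for probe in sorted(probes)]


# Placeholder identifiers, not real data
agi_by_est = {"est_001": "AT1G00001", "est_002": "AT1G00002", "est_003": "AT1G00001"}
probe_by_agi = {"AT1G00001": "000001_at"}

up_rows = mapman_table(agi_by_est, probe_by_agi, 1.0)     # e.g. an up-regulated library
down_rows = mapman_table(agi_by_est, probe_by_agi, -1.0)  # e.g. a down-regulated library
print("\n".join(up_rows))
```

Several ESTs pointing to the same gene collapse into a single row, which keeps one value per probe set.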

Real-Time PCR analysis
Quantitative Real-Time PCR experiments were carried out to validate some of the genes isolated by SSH and characterized by GO. Among the whole dataset of non-redundant sequences, 89 gene sequences belonging to key biosynthetic and metabolic pathways were selected according to length (over 100 bp) and E-value (lower than 1E-6). All cDNAs were prepared from fruits collected at three developmental stages (30, 90, and 120 DAF), corresponding to the ones used for the construction of SSH libraries, using the SuperScript Reverse Transcriptase kit (Invitrogen).
Specific primer pairs for each of the sequences were designed (Additional file 7) and tested for their activity at 60°C by conventional PCR. Different primers were also designed to discriminate putative isoforms. Quantitative Real-Time RT-PCR analyses were then performed using a 7300 Real-Time PCR System thermal cycler (Applied Biosystems) equipped with a 96-well plate system, with the SYBR Green PCR Master Mix reagent (Applied Biosystems). All Real-Time PCR experiments were performed with two independent sets of RNA samples: each analysis was performed in a final volume of 20 μl containing 2 μl of cDNA diluted 1:50, 0.3 μM of each primer, and 10 μl of 2× SYBR Green PCR Master Mix according to the manufacturer's instructions. The following thermal cycling profile was used for all PCRs: 95°C for 20 s, then 50 cycles of 95°C for 10 s and 60°C for 1 min. All quantifications were normalized to the Olea europaea Elongation Factor 1 gene, used as housekeeping gene and amplified under the same conditions. Data resulting from quantitative Real-Time PCR were corrected on the basis of the housekeeping gene by the ΔΔCt method. Pair-wise analyses of gene expression values at the three developmental stages were performed by comparing fruit stages (2 versus 1 and 3 versus 2). These two series of ratios were processed with the Cluster 3.0 software [51]. All data were normalized and transformed to a logarithmic scale to compare the expression levels of all genes and to group them according to expression patterns.
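The ΔΔCt calculation and the logarithmic transformation of the stage ratios can be sketched as follows. This minimal Python example assumes an amplification efficiency of 100% (a doubling per cycle), which the text does not state, and it uses made-up Ct values; the function names are hypothetical.

```python
from math import log2


def delta_delta_ct(ct_target_sample, ct_ref_sample, ct_target_calibrator, ct_ref_calibrator):
    """Return the ddCt of a target gene between a sample and a calibrator stage.

    Each Ct of the target gene is first normalized to the housekeeping
    (reference) gene measured in the same stage.
    """
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    return d_ct_sample - d_ct_calibrator


def fold_change(dd_ct):
    """Relative expression, assuming the product doubles at every cycle."""
    return 2.0 ** (-dd_ct)


# Made-up Ct values: stage 2 (sample) versus stage 1 (calibrator)
dd_ct = delta_delta_ct(
    ct_target_sample=22.0, ct_ref_sample=18.0,
    ct_target_calibrator=25.0, ct_ref_calibrator=18.0,
)
ratio = fold_change(dd_ct)
print(f"ddCt = {dd_ct}, ratio = {ratio}, log2 ratio = {log2(ratio)}")
```

With these assumptions the log2 ratio equals -ΔΔCt, so positive values mark up-regulation in the later stage and negative values mark down-regulation.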