Comparative mapping in the Fagaceae and beyond with EST-SSRs

Background: Genetic markers and linkage maps are basic prerequisites for comparative genetic analyses, QTL detection and map-based cloning. A large number of mapping populations have been developed for oak, but few gene-based markers are available for constructing integrated genetic linkage maps and for comparing gene order and QTL location across related species.
Results: We developed a set of 573 expressed sequence tag-derived simple sequence repeats (EST-SSRs) and located 397 markers (EST-SSRs and genomic SSRs) on the 12 oak chromosomes (2n = 2x = 24) on the basis of Mendelian segregation patterns in five full-sib mapping pedigrees of two species: Quercus robur (pedunculate oak) and Quercus petraea (sessile oak). Consensus maps for the two species were constructed and aligned; they showed a high degree of macrosynteny between these two sympatric European oaks. We assessed the transferability of EST-SSRs to other Fagaceae genera and mapped a subset of these markers in Castanea sativa, the European chestnut. Reasonably high levels of macrosynteny were observed between oak and chestnut. We also obtained diversity statistics for a subset of EST-SSRs, to support further population genetic analyses with gene-based markers. Finally, based on the orthologous relationships between the oak, Arabidopsis, grape, poplar, Medicago and soybean genomes and the paralogous relationships between the 12 oak chromosomes, we propose an evolutionary scenario for the 12 oak chromosomes from the eudicot ancestral karyotype.
Conclusions: This study provides map locations for a large set of EST-SSRs in two oak species of recognized biological importance in natural ecosystems. This first step toward the construction of a gene-based linkage map will facilitate the assignment of future genome scaffolds to pseudo-chromosomes. It also indicates the potential utility of new gene-based markers for population genetics and comparative mapping within and beyond the Fagaceae.


Background
Genetic linkage maps constitute an ideal framework for studies of the genetic architecture of quantitative traits [1,2] and of genome evolution [3,4]. They are also a prerequisite for map-based gene cloning [5-7] and for the ordering of physical scaffolds in genome sequencing projects [8]. Furthermore, they are essential tools for marker-assisted plant breeding [9].
Comparative analyses of genetic maps across phylogenetically related species depend on the development of transferable, orthologous genetic markers. Simple sequence repeats (SSRs) are the markers of choice, because they are reproducible, abundant in the genome, highly polymorphic and readily transferable between phylogenetically related species [10]. These properties are particularly pronounced in EST-derived SSRs, making these markers especially useful, as shown for Theobroma [11], Silene [12], Prunus [13], Dactylis [14] and Citrus [15]. SSRs are also easy to handle and, once developed, are cost-effective markers for high-throughput genotyping.
In the last 12 years, several linkage maps have been generated for the three main genera of the Fagaceae family: oaks (Quercus), beeches (Fagus), and chestnuts (Castanea). These long-lived species constitute important economic and ecological resources and have been the focus of genetic investigations relating to their evolution and more applied objectives, such as those of conservation and breeding programs [16]. Linkage maps have been established to support forward genetic approaches for studying the genetic architecture of adaptive traits (number, location and effect of QTLs) and to increase our knowledge of the structural features of the oak genome and its evolutionary history.
First-generation linkage maps were obtained with anonymous RAPD and AFLP markers for oak [17], chestnut [18,19] and beech [20]. QTL studies, mostly in oak, have focused on dissecting the genetic architecture of adaptive traits, such as growth and bud phenology [21-23], and of traits related to species divergence between pedunculate and sessile oaks, two species occurring in sympatry in Europe [24,25]. A limited number of genomic SSRs (about 50) and EST-based markers (about 50) [26,27] have also been added to these maps. These markers made it possible to align homologous linkage groups between oak and chestnut and to compare and validate the QTLs previously characterized in the two genera [27,28]. A first step toward the construction of a dense SSR-based genetic map was taken recently, with the development and mapping of 256 EST-SSRs [29]. The authors used a selective mapping strategy with a bin set of 14 highly informative offspring from a single full-sib (FS) mapping population for which an AFLP framework map was available. SSR markers were assigned to 44 bins of the female and 37 bins of the male parental map, spanning the entire genome.
The main goal of this study was to advance the establishment of a dense EST-SSR-based map for oak, by genotyping trees with a broader genetic background and using a larger set of genomic and EST-SSRs. Our specific objectives were as follows:

i) To optimize comparative mapping between two Quercus species by identifying a subset of SSRs that were transferable and orthologous across different mapping pedigrees. We genotyped a total of 400 offspring from five families obtained from controlled crosses of Q. robur and Q. petraea genotypes. We then generated 10 individual linkage maps (one for each of the parents used in the crosses) by the two-way pseudo-testcross mapping strategy [30] and constructed consensus maps for each species from 419 genomic and EST-based SSR markers.
ii) To determine gene content (synteny) and order (collinearity) between these two sympatric species [7,30-32].
iii) To assess the transferability of a subset of EST-SSRs in several Fagaceae and Nothofagaceae species and to describe the genetic diversity of several oak populations according to the type of repeat motif. We also mapped transferable EST-SSRs in European chestnut, for which two linkage maps were available [28], making it possible to refine the first comparative map for oak and chestnut [27].
iv) To unravel the evolutionary paleohistory of oak chromosomes, by genetic mapping of 321 EST-SSR and 60 SNP-based markers identified from oak transcriptome sequence information (31,798 Sanger-based unigenes from Ueno et al. [33]).
These four objectives are interconnected, as shown in Figure 1.

Functional annotation of EST-SSRs
The functional annotation of EST-SSRs was based on Gene Ontology [34] and was performed with Blast2GO [35], using a Blastx search against the NCBI non-redundant database with an e-value threshold of 1e-6.
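The e-value cutoff can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the Blastx results were exported in BLAST+ tabular format (`-outfmt 6`, with the e-value in the eleventh column); Blast2GO applies its threshold internally, so this is not its code, and the function name and example rows are made up.

```python
# Hypothetical sketch: retain the best Blastx hit per EST that passes the
# e-value cutoff used for annotation (1e-6). Assumes BLAST+ tabular output
# (-outfmt 6): qseqid sseqid pident length mismatch gapopen qstart qend
# sstart send evalue bitscore.
EVALUE_CUTOFF = 1e-6


def best_hits(lines, cutoff=EVALUE_CUTOFF):
    """Return {query_id: (subject_id, evalue)} for the lowest-e-value hit per query."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue > cutoff:
            continue  # too weak to support transfer of a GO annotation
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


rows = [
    "est_001\tXP_0001\t85.2\t120\t18\t0\t1\t360\t10\t129\t2e-40\t150",
    "est_001\tXP_0002\t60.0\t100\t40\t1\t1\t300\t5\t104\t3e-10\t60",
    "est_002\tXP_0003\t40.0\t50\t30\t2\t1\t150\t1\t50\t0.01\t30",
]
print(best_hits(rows))  # {'est_001': ('XP_0001', 2e-40)}
```

Keeping only the lowest-e-value hit per query is one simple choice; annotation tools typically consider several top hits before assigning GO terms.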
On the basis of GO categories, we assigned oak ESTs containing SSR motifs (Ueno et al. [33]) to three principal groups: biological processes, cellular components and molecular functions. The GO classifications obtained for four sets of SSR-containing sequences were compared with Expander software [36]: 3'UTRs (7,680 elements), 5'UTRs (8,646 elements), coding regions (13,899 elements) and non-coding regions (15,829 elements).
Mapping of SSRs in Q. robur and Q. petraea and construction of consensus species maps
Mapping populations
Five mapping pedigrees (P1-P5) of variable sample size were used (Table 1): one Q. robur × Q. petraea, one Q. petraea and three Q. robur full-sib families. These full-sib families were planted in the nurseries of INRA (Cestas, France), the University of Göttingen (Germany) and Alterra (Wageningen, The Netherlands). DNA was extracted from leaves with the DNeasy plant mini kit (Qiagen, Hilden, Germany), according to the manufacturer's instructions.

Development of SSR markers and genotyping
A subset of 573 EST-SSRs identified by Durand et al. [29] was screened for polymorphism in the 10 parents of the five mapping populations and four offspring per pedigree. We added to this screening step 93 genomic SSRs (gSSRs) described in previous studies ([37-44]; T. Kawahara, pers. comm.; see Additional file 1 for detailed information on markers and primer sequences). The polymorphic markers were then used to genotype the five mapping populations.

Individual map construction
We constructed 10 parental genetic linkage maps (7 for Q. robur and 3 for Q. petraea) by the two-way pseudo-testcross mapping strategy [30]. Linkage analysis was performed with JoinMap version 4.0 [46]. Polymorphic SSR loci were classified into three categories: testcross markers segregating in a 1:1 ratio, testcross markers segregating in a 1:1:1:1 ratio and intercross markers segregating in a 1:2:1 ratio. Chi-squared goodness-of-fit tests were used to identify markers whose segregation departed from Mendelian expectations, and loci with distorted ratios (P < 0.05) were excluded from linkage map construction. Individuals and loci with more than 50% missing data were also excluded from the analysis. A minimum LOD score of 3 and a maximum recombination fraction of 0.45 were set as the linkage thresholds for marker grouping. Maternal and paternal datasets were created with the "create maternal and paternal population nodes" command in JoinMap. The regression mapping algorithm was used for map construction. Recombination frequencies were converted into map distances in centimorgans (cM) with the Kosambi mapping function. Linkage groups were drawn with MapChart [47].
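The distortion filter is the standard chi-squared goodness-of-fit test. The following minimal Python sketch assumes that the genotype-class counts for a locus have already been tallied with missing genotypes removed; the function names and counts are illustrative, and JoinMap performs this test internally.

```python
import math

# Expected Mendelian ratios for the three marker classes described above.
EXPECTED_RATIOS = {"1:1": (1, 1), "1:1:1:1": (1, 1, 1, 1), "1:2:1": (1, 2, 1)}


def chi2_sf(x, df):
    """Upper-tail chi-squared probability; closed forms for df = 1, 2 and 3."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    raise ValueError("only 1 to 3 degrees of freedom are needed here")


def segregation_test(counts, ratio):
    """Goodness-of-fit of genotype-class counts to a Mendelian ratio -> (chi2, P)."""
    weights = EXPECTED_RATIOS[ratio]
    if len(counts) != len(weights):
        raise ValueError("number of genotype classes does not match the ratio")
    n = sum(counts)
    expected = [n * w / sum(weights) for w in weights]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    return chi2, chi2_sf(chi2, len(counts) - 1)


def is_distorted(counts, ratio, alpha=0.05):
    """True if the locus departs from Mendelian expectations at level alpha."""
    return segregation_test(counts, ratio)[1] < alpha


print(segregation_test([60, 40], "1:1"))    # chi2 = 4.0, P ≈ 0.0455
print(is_distorted([60, 40], "1:1"))        # True: locus would be excluded
print(is_distorted([24, 51, 25], "1:2:1"))  # False: locus would be retained
```

The closed-form tail probabilities avoid any dependency beyond the standard library, which is sufficient because the three segregation classes only require 1 to 3 degrees of freedom.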

Estimation of genome size
Genome length (L) was estimated from partial linkage data, according to the formula L = n(n-1)d/k, where n is the number of framework markers, d is the maximum distance between two adjacent markers (in cM) at a minimum LOD score for linkage, and k is the number of marker pairs with a LOD value exceeding a minimum threshold [48,49]. LOD score thresholds of 3, 4 and 5 were used to estimate genome length.
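As a worked illustration of L = n(n-1)d/k, the Python sketch below evaluates the formula for one map at the three LOD thresholds. The marker count, distances and pair counts are hypothetical values chosen only to show the arithmetic, not data from this study.

```python
def estimated_genome_length(n, d, k):
    """Genome length L = n(n - 1)d / k, in cM.

    n: number of framework markers
    d: maximum distance (cM) between two adjacent markers linked at the chosen LOD threshold
    k: number of marker pairs with a LOD score above that threshold
    """
    if n < 2 or k <= 0:
        raise ValueError("need at least two markers and one linked pair")
    return n * (n - 1) * d / k


# Hypothetical (d, k) inputs for one parental map at three LOD thresholds.
inputs = {3: (35.0, 600), 4: (28.0, 420), 5: (22.0, 300)}
for lod, (d, k) in inputs.items():
    print(f"LOD {lod}: L = {estimated_genome_length(120, d, k):.1f} cM")
# LOD 3: L = 833.0 cM
# LOD 4: L = 952.0 cM
# LOD 5: L = 1047.2 cM
```

Because both d and k change with the LOD threshold, each threshold yields a different estimate, which is why several thresholds were used.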
Construction of consensus genetic linkage maps for Q. robur and Q. petraea
Consensus species maps for Q. robur and Q. petraea were established by combining the parental map datasets for each species with the "join-combine groups for map integration" command of JoinMap, which creates a composite map from different linkage groups sharing common markers. We used the mapping parameters and options described above. We assessed the heterogeneity of recombination rates between SSR marker pairs, and SSR markers with highly heterogeneous recombination rates were excluded from the construction of the species-specific framework maps. Markers that could not be ordered with the same degree of confidence were added as accessory markers, using the two-point LOD scores and recombination fractions available from the "maximum linkage" table of JoinMap. Similarly, when several markers were co-located, only one was retained on the species framework map and the others were added as accessory markers.

Databases
Single-tree genotypic data for offspring and linkage maps are available from the QuercusMap database of the Quercus portal (https://w3.pierroton.inra.fr/QuercusPortal/index.php), the European genetic and genomic web resources for Quercus. DNA sequences and primer pairs for SSR loci are available from the SSR database at the same URL.
Transferability of EST-SSRs and comparative mapping of oak and chestnut
Transferability of EST-SSRs
We assessed the transferability of EST-SSR markers to other Fagaceae species by carrying out cross-species amplification in six species (Castanea sativa, Fagus sylvatica, Quercus faginea, Quercus pyrenaica, Quercus ilex and Quercus suber). We also assessed transferability to two species of the related family Nothofagaceae (Nothofagus pumilio and Nothofagus antarctica). Each species was represented by at least two individuals. SSR amplification and genotyping were performed as described above, with a subset of 243 EST-SSRs randomly selected from the list reported by Durand et al. [29]. The selected microsatellite markers included 137 di-, 90 tri-, 2 tetra-, 1 penta- and 13 hexanucleotide repeats.

Comparative mapping of Quercus and Castanea
In total, 96 offspring of a single full-sib pedigree of Castanea sativa were genotyped with Quercus EST-SSRs.
We assessed the amplification of 339 loci using the PCR conditions described above. Polymorphic SSRs were added to the polymorphic markers already available for this pedigree (including RAPD, AFLP, gSSR, EST-P markers; 393 loci in total). Individual parental maps and a consensus map were constructed with JoinMap, using the same procedure followed for Quercus. Finally, homologous linkage groups in Quercus and Castanea were identified from the location of orthologous markers displaying multiple and parallel linkages.

Diversity analysis
Two experiments were carried out to provide insight into the genetic diversity of EST-SSRs. The first focused on a large number of loci in a small number of individuals of the two sympatric species Q. robur and Q. petraea (Additional file 2). We assessed the polymorphism of the same set of 243 oak EST-SSRs in 12 individuals from each species. DNA was extracted from leaves with the DNeasy plant mini kit (Qiagen). An M13 tail (TGT AAA ACG ACG GCC AGT) was added to the 5' end of each forward primer, as described by Schuelke [45]. Genetic diversity was analyzed with FSTAT [50], which implements a sample-size-independent rarefaction analysis of allelic richness.
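Rarefaction standardizes allelic richness to a common sample size. The minimal Python sketch below implements the standard rarefaction formula A(g) = Σ_i [1 - C(N - N_i, g)/C(N, g)], where N is the number of gene copies at a locus and N_i the count of allele i. It is not FSTAT's code. In practice, g is usually set to the smallest number of gene copies typed in any sample. The allele counts are hypothetical.

```python
from math import comb


def allelic_richness(allele_counts, g):
    """Rarefied allelic richness: the expected number of distinct alleles in a
    random subsample of g gene copies drawn without replacement.

    allele_counts: number of copies of each allele observed at one locus
    g: rarefaction size, in gene copies (at most the total number of copies)
    """
    total = sum(allele_counts)
    if not 0 < g <= total:
        raise ValueError("g must be between 1 and the number of gene copies")
    # An allele is missed only if all g copies are drawn from the other alleles.
    return sum(1 - comb(total - n_i, g) / comb(total, g) for n_i in allele_counts)


# Hypothetical locus: 20 gene copies (10 diploid trees) carrying three alleles.
counts = [10, 6, 4]
print(round(allelic_richness(counts, 2), 4))  # 1.6526
print(allelic_richness(counts, 20))           # 3.0
```

When g equals the full sample size, every observed allele is counted, so the rarefied value reduces to the raw number of alleles.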
The second experiment was designed as a proof of concept for the use of EST-SSRs in the genetic analysis of the little-studied semi-deciduous oak species distributed around the Mediterranean basin. We genotyped 96 individuals from the two submediterranean oak species Quercus faginea and Quercus pyrenaica with 64 EST-SSRs evenly distributed among the 12 linkage groups (25 in common with the previous experiment). Additional file 3 shows the locations of the 8 populations per species that were selected to represent most of the geographic and ecological variation in the two species. DNA was extracted from leaf samples with a modified CTAB method, because many of the Q. pyrenaica samples clogged the columns of commercial DNA extraction kits. The main modification to the standard DNA extraction procedure was a thorough chloroform extraction (3-4 times with a 1:1 volume) following cell lysis in the CTAB buffer. PCRs were performed in a total volume of 10 μL containing 1× PCR buffer, 100 μM of each dNTP, 0.25 U Taq polymerase (KAPA Taq, Kapa Biosystems, Boston, USA), 2 mM MgCl2, 0.045 μM forward primer, 0.165 μM reverse primer, 0.165 μM M13-fluorescent primer and 10 ng of template DNA. Cycling conditions consisted of an initial denaturation step (94°C, 5 minutes) followed by 7 touch-down cycles (annealing temperature decreasing from 63.7 to 59.5°C), 20 cycles at an annealing temperature of 59.5°C, 12 cycles at an annealing temperature of 57.5°C and a final extension step (72°C, 10 minutes). The fluorescently labelled PCR products were separated by electrophoresis on an ABI3130 sequencer (Applied Biosystems) with the GS500LIZ size standard. Peak sizes were scored with GeneMapper v.4.0, and allele binning was performed with the MsatAllele R package [51]. Genetic diversity parameters (AR, Ho, He) were estimated with Fstat v.2.9.3 [50].
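Observed heterozygosity (Ho) is the proportion of heterozygous individuals. Expected heterozygosity (He) follows from allele frequencies. The minimal Python sketch below uses Nei's unbiased form, He = 2n/(2n - 1) × (1 - Σ p_i²). FSTAT's own estimators may include further small-sample corrections, so its values need not match exactly. The function name and allele sizes are hypothetical.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Observed (Ho) and unbiased expected (He) heterozygosity at one diploid locus.

    genotypes: list of (allele1, allele2) tuples; None marks a missing genotype.
    He = 2n / (2n - 1) * (1 - sum(p_i^2)), with n the number of typed individuals.
    """
    typed = [g for g in genotypes if g is not None]
    n = len(typed)
    if n < 2:
        raise ValueError("need at least two genotyped individuals")
    ho = sum(a != b for a, b in typed) / n
    allele_counts = Counter(allele for pair in typed for allele in pair)
    sum_p2 = sum((c / (2 * n)) ** 2 for c in allele_counts.values())
    he = 2 * n / (2 * n - 1) * (1 - sum_p2)
    return ho, he


# Hypothetical allele sizes (bp) for five trees, one of them not scored.
sample = [(150, 152), (150, 150), (152, 154), (150, 154), None]
ho, he = heterozygosity(sample)
print(ho, round(he, 4))  # 0.75 0.7143
```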

Oak genome evolution
Gene choice and genotyping of SNP-based markers
We constructed an integrated map for Quercus from the EST-SSRs and an additional set of SNP-based markers, to analyze the synteny between the oak linkage map and the genomes of previously sequenced eudicots.
We identified 105 candidate genes (set 1) for involvement in bud burst on the basis of the following criteria: i) differential expression between the periods before and after bud flush [52], ii) colocalization with bud burst QTLs [22], and iii) a known functional role in model plants. Two types of polymorphisms were identified: in vitro SNPs/indels, from gene fragments resequenced in a panel of nine oak populations ([22] and unpublished data), and in silico SNPs/indels, retrieved from expressed sequence tags [33] as described by Lepoittevin et al. [53]. In total, 78 in vitro and 306 in silico SNPs/indels were included in a 384-SNP assay, including 26 indels of 1 to 3 bp. A full description of the SNP array is provided by Alberto et al. [54].
Genotyping was carried out on three mapping populations comprising 177, 80 and 90 F1 plants from the P1, P2 and P3 pedigrees, respectively. DNA was extracted with the Invisorb DNA plants 96 kit from Invitek GmbH (Berlin, Germany), according to the manufacturer's instructions. Multiplex reactions were prepared with 250 ng of template DNA per sample. Genotyping was carried out with the Illumina GoldenGate SNP genotyping platform (Illumina, San Diego, CA, USA) at the Genome-Transcriptome Facility in Bordeaux, France (http://www4.bordeaux-aquitaine.inra.fr/pgtb). The intensity of the fluorescent signals was measured with the BeadXpress Reader (Illumina Inc, San Diego, USA) and analyzed with GenomeStudio v 3.1.14 (Illumina Inc). Quality scores were generated for each genotype, with a GenCall50 (GC50) score cutoff of 0.25 and a call rate (CR) threshold of 0.85. These scores reflect the quality of the genotype clusters (GC50) and the proportion of samples with a genotype called for a particular SNP (CR) [55]. Genotype clusters were adjusted manually when necessary.
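The two SNP quality thresholds can be sketched in Python as follows. The sketch assumes that GC50 is the median (50th percentile) GenCall score of a SNP across samples and that no-calls are exported as missing genotypes. The export layout, function names and data are hypothetical; GenomeStudio computes these metrics itself.

```python
from statistics import median

GC50_MIN = 0.25       # minimum median GenCall score for a SNP
CALL_RATE_MIN = 0.85  # minimum proportion of samples with a genotype call


def snp_qc(calls, gc50_min=GC50_MIN, call_rate_min=CALL_RATE_MIN):
    """calls: list of (genotype, gencall_score) for one SNP, one entry per sample;
    genotype is None for a no-call. Returns (passes, gc50, call_rate)."""
    gc50 = median(score for _, score in calls)
    call_rate = sum(geno is not None for geno, _ in calls) / len(calls)
    return gc50 >= gc50_min and call_rate >= call_rate_min, gc50, call_rate


# Hypothetical SNPs scored in ten offspring.
snp_a = [("AA", 0.82)] * 4 + [("AB", 0.78)] * 3 + [("BB", 0.80)] * 2 + [(None, 0.10)]
snp_b = [("AA", 0.50)] * 7 + [(None, 0.05)] * 3
print(snp_qc(snp_a))  # (True, 0.8, 0.9)
print(snp_qc(snp_b))  # (False, 0.5, 0.7)
```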
In addition to this first set of markers, seven candidate genes (set 2) for drought and hypoxia tolerance (from 25 genes initially screened) were found to be informative in one to three pedigrees. Two methods were used for genotyping: i) SSCP (single-strand conformation polymorphism) [56], used here for the first time on a Licor sequencer (see Additional file 4), and ii) primer extension with detection by fluorescence polarization [57], using the Acycloprime-FP SNP detection kit (Perkin Elmer Life Sciences, Boston, MA, USA). Genotyping was carried out in accordance with the kit manufacturer's instructions, and fluorescence was measured with a fluorescence polarization reader (Victor-Wallace, Perkin Elmer Life Sciences, Boston, MA, USA) at 20, 25 and 35 cycles.

Consensus map construction
The consensus map was constructed by joining the 10 parental maps derived from the mapping populations in which EST-SSRs (P1 to P5) and SNP-based markers (P1 to P3) were mapped. JoinMap was first used to calculate individual maps from raw segregation data, with the Kosambi mapping function. A minimum LOD score threshold of 3.0 was used to group all markers. An integrated map was then constructed for each linkage group by integrating the "bridge markers" common to two or more individual maps. For construction of the consensus map, we assumed that recombination rates were uniform between the two species and between male and female maps (but see [58]).
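The Kosambi function converts a recombination fraction r into a map distance, d = 25 ln[(1 + 2r)/(1 - 2r)] cM. Its inverse is r = 0.5 tanh(2d/100). The minimal Python sketch below implements both directions. The function names are illustrative; JoinMap applies the conversion internally.

```python
import math


def kosambi_cm(r):
    """Kosambi map distance (cM) for a recombination fraction 0 <= r < 0.5."""
    if not 0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return 25 * math.log((1 + 2 * r) / (1 - 2 * r))


def kosambi_r(d_cm):
    """Inverse Kosambi function: recombination fraction for a distance in cM."""
    return 0.5 * math.tanh(2 * d_cm / 100)


print(round(kosambi_cm(0.10), 2))  # 10.14
print(round(kosambi_cm(0.45), 2))  # 73.61, the distance at a recombination fraction of 0.45
```

The second call shows that the maximum recombination fraction of 0.45 used for grouping corresponds to roughly 74 cM under this mapping function.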
Using the regression algorithm of JoinMap, we obtained for each linkage group three maps with decreasing statistical support for marker order (denoted map1, map2 and map3). For macrosynteny analysis, we retained the most reliable map (map1) and added markers with lower LOD scores as accessory markers. The position of each accessory marker relative to its most probable framework marker was then determined from the two-point LOD scores and recombination fractions provided by the "maximum linkage" table of JoinMap. Finally, markers were assigned to 10 cM bins within each of the 12 linkage groups of the consensus map, to identify regions orthologous to sequences in Arabidopsis, grape, poplar, Medicago and soybean.
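Assigning markers to 10 cM bins amounts to integer division of each consensus-map position by the bin width. The sketch below assumes, for illustration, that bins start at 0 cM on every linkage group. The marker names and positions are hypothetical.

```python
from collections import defaultdict

BIN_SIZE_CM = 10.0


def assign_bins(markers, bin_size=BIN_SIZE_CM):
    """markers: iterable of (marker_id, linkage_group, position_cM).
    Returns {(linkage_group, bin_index): [marker_id, ...]}, where bin 0
    covers [0, 10) cM, bin 1 covers [10, 20) cM, and so on."""
    bins = defaultdict(list)
    for marker, lg, pos in markers:
        bins[(lg, int(pos // bin_size))].append(marker)
    return dict(bins)


# Hypothetical consensus-map positions.
markers = [("m1", "LG1", 3.2), ("m2", "LG1", 9.9), ("m3", "LG1", 10.0), ("m4", "LG2", 47.5)]
print(assign_bins(markers))
# {('LG1', 0): ['m1', 'm2'], ('LG1', 1): ['m3'], ('LG2', 4): ['m4']}
```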

Synteny and duplication analysis
We used BLAST to align genomes (i.e. CDS for sequenced genomes and ESTs for oak). Two parameters were used for these analyses, to take into account not only similarity but also the relative lengths of the aligned sequences: CIP (cumulative identity percentage) and CALP (cumulative alignment length percentage). $\mathrm{CIP} = \sum_i \left[ ID_i \times \frac{HSP_i}{AL} \right] \times 100$ corresponds to the cumulative percentage sequence identity observed for all the high-scoring sequence pairs (HSPs), where $ID_i$ is the identity of the i-th HSP, $HSP_i$ its length, and $AL$ the cumulative aligned length, i.e. the sum of all HSP lengths. $\mathrm{CALP} = AL / \text{query length}$ is the cumulative aligned length for all HSPs divided by the length of the query sequence. The use of these parameters for BLAST analysis selected the highest cumulative percentage identity over the longest cumulative length, thus maximizing stringency in the definition of conservation between the two genomes compared.
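The two parameters can be computed from the HSP list of one query-subject alignment, as in the minimal Python sketch below. It assumes that identity is expressed as a fraction for each HSP. The function name and the HSP values are hypothetical.

```python
def cip_calp(hsps, query_length):
    """CIP (%) and CALP for all HSPs of one query-subject alignment.

    hsps: list of (identity, length) per HSP, identity as a fraction (0-1).
    CIP  = sum(identity_i * length_i / AL) * 100, with AL = sum of HSP lengths.
    CALP = AL / query_length.
    """
    al = sum(length for _, length in hsps)  # cumulative aligned length
    if al == 0 or query_length <= 0:
        raise ValueError("need at least one HSP and a positive query length")
    cip = sum(identity * length / al for identity, length in hsps) * 100
    calp = al / query_length
    return cip, calp


# Hypothetical query of 500 bp with two HSPs.
cip, calp = cip_calp([(0.90, 200), (0.70, 100)], query_length=500)
print(round(cip, 2), calp)  # 83.33 0.6
```

Weighting each HSP's identity by its length means that a short, highly similar HSP cannot inflate CIP on its own, while CALP separately penalizes alignments that cover only a small part of the query.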

Distribution of paralogous and orthologous gene pairs
We estimated sequence divergence and dated speciation events on the basis of the rates of non-synonymous (Ka) and synonymous (Ks) substitutions calculated with MEGA-3 [59]. The mean substitution rate for grasses (r = 6.5 × 10⁻⁹ substitutions per synonymous site per year) was used to determine the ages of the genes considered [60,61]. The time (T) since gene insertion was then estimated with the formula T = Ks/r.
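The dating step is a single division. The minimal Python sketch below applies T = Ks/r as written above, with r = 6.5 × 10⁻⁹. The Ks value is hypothetical.

```python
R_GRASSES = 6.5e-9  # substitutions per synonymous site per year, as used in the text


def age_years(ks, r=R_GRASSES):
    """Time since gene insertion, T = Ks / r (formula as written in the text)."""
    if ks < 0:
        raise ValueError("Ks cannot be negative")
    return ks / r


# Hypothetical Ks value for a pair of paralogous oak genes.
print(f"{age_years(1.3) / 1e6:.0f} million years")  # 200 million years
```

Many studies instead divide by 2r when dating the split between two sequences, because synonymous substitutions accumulate independently along both lineages. The sketch keeps the formula given in the text.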

SSR-based map construction in Q. robur and Q. petraea and synteny analysis
Identification of polymorphic markers
In total, 573 EST-SSR primer pairs were tested for polymorphism in at least one pedigree (Table 2), 378 of which were informative. We also tested 93 of the gSSRs already available for the Fagaceae; 68 (73%) were polymorphic in at least one pedigree. Thus, a total of 446 polymorphic loci (68 gSSRs and 378 EST-SSRs) were available for mapping.

Construction of individual linkage maps
Genotypic data were available for 446 loci in one to five mapping populations, 397 of which were mapped. Depending on the pedigree, 50% to 85% of the loci tested were polymorphic. The interspecific pedigree (P2) was less polymorphic than the intraspecific pedigrees (Table 2). These differences in polymorphism between intra- and interspecific pedigrees are likely to be due to sampling effects, as the two species exhibit similar levels of genetic diversity and very low interspecific differentiation. The number of polymorphic loci varied considerably between parents, from 100 loci for P5-female (Q. robur) to 205 loci for P1-female (Q. robur). Distorted loci were more frequent for P2-female (Q. robur) from the interspecific pedigree (6.8%) than for the other pedigrees (Table 2). Unlinked loci were rare, except in both parental maps for P5, in which 45% of the loci were ungrouped. Because of the smaller dataset for P5, the following analyses focus on the four pedigrees P1 to P4.
Linkage group (LG) statistics
We constructed 12 LGs for each parental map, except for the interspecific P2-male parent, for which LG11 was missing because all its markers were distorted and therefore excluded a priori from linkage map construction. Interestingly, LG11 of the P2-female parent also included four distorted loci, suggesting the presence of loci involved in species incompatibility. The development and mapping of large numbers of SNP markers should provide new insights into the identification and mapping of loci involved in reproductive barriers between these two hybridizing oak species.
Over the 8 maps, the mean number of markers per LG ranged from 7.6 for LG4 to 26.9 for LG2 ("groups" with only two markers were not considered; Additional file 7). The mean map length of the LGs ranged from 42.2 cM for LG4 to 85.3 cM for LG2, with an overall mean of 58.4 cM over the 8 maps (Additional file 8).
Estimation of total genome length
The total observed map length ranged from 572.9 cM to 846.6 cM (Additional file 8).
Based on these partial linkage data, estimated genome sizes were obtained for various LOD score values (3, 4 and 5). They ranged from 945 to 1,611 cM (Additional file 9).

Number of alleles of mapped SSRs
The number of alleles observed in the 10 parents depended on the type of motif considered: the number of loci with three or four alleles was systematically higher for loci with dinucleotide repeats than for those with trinucleotide or hexanucleotide repeats (Table 3). This trend was conserved even if we excluded gSSRs from the analysis.
Mapping with several pedigrees
Genotyping several pedigrees substantially increased the number of loci identified as polymorphic (Figure 2). P1 was the most informative pedigree in terms of mapped markers (274 polymorphic loci). However, adding P3 to the analysis increased the number of markers identified as polymorphic by 24% (88 new markers). The successive inclusion of P2, P4 and P5 added a further 35, 11 and 16 polymorphic SSRs, respectively. For the 12 linkage groups obtained for each parental map, the number of markers common to at least two maps varied between 15 (LG4) and 54 (LG2) (Figure 3). Markers differed in the number of the 10 genotyped parents in which they were informative: 18.5% of polymorphic loci were informative in one parent, 21.4% in two, 15.4% in three, 14.4% in four, 9.1% in five, 11% in six and less than 5% in seven or more (Figure 4). The number of loci shared between linkage maps therefore decreased with the number of maps considered: 89 mapped SSRs were common to two parental maps, whereas only two were common to nine parental maps.

Consensus maps for Q. robur and Q. petraea and comparative mapping
A consensus map for Q. robur was constructed from the seven Q. robur parental maps. This map includes 398 markers (including 179 accessory markers) and spans 933 cM (Table 4). Similarly, a consensus map for Q. petraea was established from the three parental maps available for this species. It includes 275 markers (90 accessory markers) and spans 767 cM (Table 4, Figure 5).
Mean inter-locus distances for each LG in both species are given in Table 5. The consensus species maps were compared to analyze genomic organization and structural rearrangements. A high degree of macrocollinearity was observed between the two maps, based on 100 common markers evenly distributed over the 12 LGs (Figure 5). Some order discrepancies occurred in small sections of LGs, for example in LG2 and LG3. In addition, the positions of a few markers were inconsistent over larger distances: GOT009, for example, was localized to the top of LG1 in Q. petraea but to the center of this LG in Q. robur. LG11 was also split into two parts in Q. petraea.

Transferability of EST-SSRs to other members of the Fagaceae and Nothofagaceae and comparative mapping of Quercus and Castanea
Transferability of EST-SSRs
We assessed the transferability of 198 EST-SSR markers to Q. ilex and Q. suber, of 194 markers to C. sativa and F. sylvatica, and 126 markers to N. pumilio and N. antarctica (Additional file 10). A PCR product of the expected size was amplified in at least one of the Fagaceae or Nothofagaceae species for 91.8% (223/243) of the EST-SSRs tested. Within the Fagaceae family, transferability was greatest for

Comparative mapping
We mapped 555 polymorphic markers in Castanea (Table 6), 91 of which (63 EST-SSRs, 16 gSSRs and 12 EST-P) were shared with the consensus Quercus map. For each of the 12 Castanea LGs (LG-C), a homologous linkage group was identified in Quercus (LG-Q), with between four (LG4-C = LG5-Q) and 17 (LG1-C = LG2-Q) shared markers (Table 7). A set of 16 markers was located on linkage groups that were not homologous between Castanea and Quercus. Overall, macrosynteny was well conserved between the two genera, despite the inversion of a few markers (illustrated for one LG in Figure 6 and shown for all LGs in Additional file 11).

Diversity analysis
A high-quality amplification product was obtained for 83.8% (166) of the 198 markers studied in the first experiment, and 94.6% of these (157) were polymorphic in at least one natural population of Q. robur or Q. petraea. In the two populations considered, expected heterozygosity (He) ranged from low (0.100 and 0.091 for Q. robur and Q. petraea, respectively) to high values (0.939 and 0.964, respectively). Diversity levels (allelic richness and He) were similar in the two species. Diversity levels were also rather similar in the two submediterranean Quercus species (Additional file 12). All diversity estimates were highest for EST-SSRs with dinucleotide repeat motifs, whereas differences between tri- and hexanucleotide EST-SSRs were small in these two oaks. The same trend was observed for Q. robur and Q. petraea.

Oak genome evolution
Construction of a gene-based consensus linkage map
Individual maps were constructed from EST-SSRs (see above) and SNPs. For SNP-based markers, 105 (set 1) and 7 (set 2) candidate genes were genotyped in three mapping populations: 56 (set 1) and 4 (set 2) were localized on at least one of the six parental maps (Additional file 1 and [...] LG10 and LG11. Markers were assigned to 86 bins of 10 cM across the 12 linkage groups of the consensus map, to identify regions orthologous to regions from Arabidopsis, grape, poplar, Medicago and soybean.

Synteny and duplication analysis
Independent intraspecific (i.e. paralogs) and interspecific (i.e. orthologs) comparisons are required for the precise inference of paralogous or orthologous gene relationships between oak and other eudicots and to determine the precise history of oak evolution from the known ancestor of eudicot genomes.
The integration of independent analyses of duplications within, and synteny between, the five major eudicot genomes led to the precise characterization in oak of five of the seven paleoduplications recently identified as the basis for the definition of seven ancestral chromosomal groups in eudicots [63]. These ancestral shared duplications were found on the following chromosome combinations in oak, with the locations of the seven ancestral paleoduplications in grape also indicated: g1-g14-g17/o1-o2-o10; g2-g15-g12-g16/[not identified in oak]; g3-g4-g7-g18/o6-o11; g4-g9-g11/[partially fused into o2]; g5-g7-g14/o3-o6; g6-g8-g13/o7-o8; g10-g12-g19/o5-o8. Thus, five of the seven previously identified ancestral shared duplications are characterized here for the first time in oak. Based on the ancestral and lineage-specific duplications already reported for eudicots, an evolutionary scenario can be developed in which the 12 oak chromosomes evolved from the seven chromosomes of the eudicot ancestor or, more precisely, from the 21 chromosomes resulting from polyploidization of the paleohexaploid intermediate (Figure 9). We suggest that at least eight major ancestral chromosome fusions (Cf) occurred to yield the current 12-chromosome structure, and that this process involved an intermediate ancestor that also had 12 chromosomes (Figure 9).
Table 4 LG size (in cM) for both species, Q. robur and Q. petraea
Figure 5 LG consensus species maps of Q. robur and Q. petraea.

Discussion
Our results provide new biological information about certain features of oak EST-SSRs, the benefits of linkage mapping with multiple pedigrees, the macrosynteny between two interfertile oak species (Q. robur and Q. petraea) and between two closely related genera (Quercus and Castanea), and about the evolution of the oak genome from the ancestor of the eudicot genome.

Characteristics of oak EST-SSRs
As reported in other species [9,64], dinucleotide SSRs (di-SSRs) occurred preferentially within UTRs, whereas trinucleotide SSRs (tri-SSRs), which do not disrupt the reading frame, occurred mostly in the coding regions of oak ESTs. The rate of polymorphism was also higher for di-SSR loci than for tri-SSR loci (72% vs. 65%; Table 3), suggesting that SSRs located within UTRs are more polymorphic than those in coding regions. The number of alleles was also larger for di-SSR loci than for tri-SSR loci: 60% of di-SSRs presented three or four alleles per locus, versus 37% of tri-SSRs (Table 3). A similar pattern has been reported for other species, such as castor bean [65] and cotton [66].
EST-SSRs were highly transferable between Fagaceae species. This is consistent with findings for other dicots, such as Prunus [13], Camellia [67], Citrus [15] and other species (reviewed in [10]), which show that EST-SSR markers transfer across taxonomic boundaries more readily than genomic SSRs [68]. As expected, the transferability of Quercus EST-SSRs decreased with increasing phylogenetic distance between species. Furthermore, more than 75% of EST-SSR markers displayed high levels of genetic diversity in natural populations of Q. robur and Q. petraea. Thus, EST-SSR loci can generate enough polymorphism to constitute a valuable source of functional SSR markers for population genetic studies within the Fagaceae. As a proof of concept, we used two other Quercus species (Q. faginea and Q. pyrenaica). This lays the foundations for using a set of EST-SSR markers in comparative population genetic studies of the almost 20 species of deciduous oaks in the Mediterranean region. The high rates of transfer to these species and the high levels of polymorphism support the use of our set of EST-SSRs for such purposes.

[Table 5: Mean distance (cM) between two loci for each LG and both species, Q. robur and Q. petraea]
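This section does not state which diversity statistics were computed. The sketch below uses common per-locus summaries for a diploid SSR sample: number of alleles, observed heterozygosity (Ho), expected heterozygosity (He = 1 - Σp²) and Nei's small-sample unbiased He. The genotype data are made up for illustration.

```python
from collections import Counter


def allele_stats(genotypes):
    """Summarise diversity at one SSR locus in one population sample.

    genotypes: list of (allele1, allele2) tuples for diploid individuals,
    alleles coded as fragment sizes (bp).
    Returns (number of alleles, Ho, He, unbiased He).
    """
    alleles = [a for g in genotypes for a in g]
    n = len(alleles)  # number of gene copies (2N)
    counts = Counter(alleles)
    # Expected heterozygosity: 1 minus the sum of squared allele frequencies
    he = 1.0 - sum((c / n) ** 2 for c in counts.values())
    # Nei's small-sample correction: multiply by 2N / (2N - 1)
    he_unbiased = n / (n - 1) * he
    # Observed heterozygosity: share of individuals carrying two different alleles
    ho = sum(1 for a1, a2 in genotypes if a1 != a2) / len(genotypes)
    return len(counts), ho, he, he_unbiased


# Hypothetical sample of four trees genotyped at one EST-SSR
print(allele_stats([(150, 152), (150, 150), (152, 154), (150, 154)]))
```

With allele frequencies 0.5, 0.25 and 0.25, He is 0.625. Three of the four trees are heterozygous, so Ho is 0.75.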

Linkage mapping with multiple pedigrees
Mapping based on multiple segregating populations has several advantages over mapping based on a single pedigree. First, such strategies make it possible to map much larger numbers of markers. In this study, 274 loci were mapped in the most polymorphic mapping population (P1), and the analysis of four more pedigrees made it possible to map another 145 loci. Figure 4 shows an L-shaped distribution of the number of markers common to the different populations. This shows that the number of polymorphic markers suitable for mapping increases with the number of pedigrees considered.

A consensus map for oak is currently being constructed on a much larger scale, with SNP-based markers genotyped in four oak pedigrees totalling 1,100 offspring. The addition of several thousand gene-based markers will provide a valuable tool for aligning genomic scaffolds from the oak genome (which is currently being sequenced) with a linkage map, with a view to establishing pseudochromosomes.

Second, we compared the positions of the mapped markers across populations. This identified 26 loci (6%) with different linkage group positions in different populations: 25 were assigned to two LGs and one to three LGs. This suggests that different paralogs were amplified in different genetic backgrounds, probably due to nucleotide variability at priming sites. In most cases, the discrepancies concerned parental maps from different pedigrees. No such trend was found with respect to the species origin of the paralogous loci. Interestingly, eight of the 16 annotated sequences corresponding to proteins of known function belong to multigene families (e.g. ribosomal, RNA-binding, thioredoxin and O-methyltransferase proteins).

Finally, linkage maps for multiple pedigrees within a species are also a prerequisite for multi-pedigree QTL detection strategies, which aim to identify and validate QTLs in a broad genetic background [69][70][71][72][73]. To this end, 100 EST-SSRs that are evenly spaced and common to the six parental maps of P1, P2 and P3 have been chosen. They will be genotyped in 150 to 300 F1 progeny to identify QTLs for adaptive traits (e.g. bud phenology; unpublished results).

[Figure 6: Synteny between Quercus and Castanea for LG1. Loci in red are common to both species; loci in green are accessory loci (theta/LOD); parts of linkage groups shown in the same colour correspond to homologous segments between the two species.]
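Two of the comparisons described above can be reproduced from a table of marker placements. One is how many pedigrees each locus was mapped in, which gives the distribution summarised in Figure 4. The other is which loci were placed on different LGs in different parental maps, which indicates paralogs. The sketch below assumes a hypothetical record layout `(locus, pedigree, parent, lg)`. The locus names are invented.

```python
from collections import Counter, defaultdict


def summarise_maps(records):
    """records: iterable of (locus, pedigree, parent, lg) placements, one per parental map.

    Returns:
      sharing  - Counter: number of pedigrees a locus was mapped in -> number of loci
      multi_lg - dict: locus -> sorted list of LGs, for loci placed on more than one LG
    """
    pedigrees = defaultdict(set)
    lgs = defaultdict(set)
    for locus, pedigree, _parent, lg in records:
        pedigrees[locus].add(pedigree)
        lgs[locus].add(lg)
    sharing = Counter(len(p) for p in pedigrees.values())
    # loci whose LG differs between maps are candidate paralogs
    multi_lg = {loc: sorted(s) for loc, s in lgs.items() if len(s) > 1}
    return sharing, multi_lg


records = [
    ("EST1", "P1", "female", 1), ("EST1", "P1", "male", 1), ("EST1", "P2", "female", 1),
    ("EST2", "P1", "male", 4), ("EST2", "P3", "female", 9),
    ("EST3", "P5", "male", 2),
]
print(summarise_maps(records))
```

Here EST2 would be flagged as a likely paralogous locus, because it maps to LG4 in one pedigree and LG9 in another.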
Comparative mapping of oak species that hybridize naturally: Q. robur and Q. petraea, and beyond
We present here the first genetic maps for two interfertile white oak species, making it possible to trace chromosomal changes [74]. A relatively large proportion of the mapped loci (323/397; 81%) was common to at least two parental maps. The integrated species maps of 397 loci covered all 12 LGs. The mean distance between markers was 2.60 cM for Q. robur and 2.91 cM for Q. petraea. As expected, the total length of the integrated maps was greater than that of the individual maps, as previously reported for Vitis [75], Lactuca [72] and Picea [76]. These results suggest that the integrated maps probably cover distal chromosomal regions that the individual maps do not. The lengths of the two consensus maps were very different (933 cM for Q. robur and 767 cM for Q. petraea), despite the similar physical size of the two genomes [77]. This discrepancy may reflect differences in recombination rate between the two species, or between the particular genotypes mapped. The overall macrocollinearity between the two species maps was high, with little shuffling of marker order between homologous LGs. Some local inconsistencies in marker order were observed, as reported for other species [72,75,78,79]. However, no duplication or major chromosomal rearrangement (inversion, translocation) was detected.
This high degree of collinearity should facilitate the identification of genomic islands involved in species differentiation [80,81].

[Figure 7: Syntenic relationships between oak and the Arabidopsis, grape, poplar, Medicago and soybean genomes. The figure schematically represents the orthologs identified between the grape chromosomes (g1 to g19), used as a reference, and the Arabidopsis (a1 to a5), poplar (p1 to p19), Medicago (m1 to m8), soybean (s1 to s20) and oak (o1 to o12) chromosomes. Each line represents an orthologous gene. The seven colours used for the blocks reflect the origin of eudicot chromosomes from seven ancestral chromosomes.]
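Map length and mean inter-marker distance, as reported per LG in Table 5, can be computed from marker positions. The sketch below takes LG length as the span between the first and last marker. It takes the mean distance as that span divided by the number of adjacent intervals, so co-located markers count as zero-length intervals. The source does not state its exact convention, so this definition is an assumption, and the positions are invented.

```python
def map_summary(positions):
    """positions: dict LG -> list of marker positions (cM).

    Returns (per_lg, total_length, overall_mean_distance), where per_lg maps
    LG -> (length_cM, mean_distance_between_adjacent_markers_cM).
    """
    per_lg = {}
    total_len = 0.0
    total_intervals = 0
    for lg, pos in positions.items():
        pos = sorted(pos)
        length = pos[-1] - pos[0]
        intervals = len(pos) - 1  # adjacent marker pairs on this LG
        per_lg[lg] = (length, length / intervals if intervals else float("nan"))
        total_len += length
        total_intervals += intervals
    overall = total_len / total_intervals if total_intervals else float("nan")
    return per_lg, total_len, overall


print(map_summary({"LG1": [0.0, 10.0, 25.0], "LG2": [5.0, 5.0, 20.0, 35.0]}))
```

Under this convention, the genome-wide mean equals total map length divided by (number of markers minus number of LGs). A longer map with the same number of markers therefore yields a larger mean distance.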
A comparison of the consensus maps of Quercus and Castanea revealed a high degree of collinearity and synteny between the 12 homologous linkage groups, despite the divergence of their lineages 70 million years ago [82]. Comparative mapping could therefore guide a search for genes underlying similar QTLs in the two genera, making use of the sequencing data available for Castanea.

[Figure 9: Oak genome paleohistory. The oak chromosomes are represented with a seven-colour code to illustrate the evolution of segments from a common ancestor with seven chromosomes (A1-A7). The lineage-specific shuffling events (such as chromosome fusions, CF) that shaped the modern oak karyotype from the n = 7 or n = 21 ancestors are indicated on the figure.]
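This section does not say how collinearity between homologous LGs was quantified. One simple, commonly used option is Spearman's rank correlation between the positions of the loci shared by two maps. The sketch below computes it in plain Python, with tied positions given average ranks. It is a swapped-in illustration, not the authors' procedure, and the marker names and positions are hypothetical.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman_common(map_a, map_b):
    """Spearman's rho between positions (cM) of loci shared by two homologous LGs.

    map_a, map_b: dict locus -> position. Needs at least two distinct positions per map.
    Returns (number of shared loci, rho).
    """
    common = sorted(set(map_a) & set(map_b))
    ra = ranks([map_a[l] for l in common])
    rb = ranks([map_b[l] for l in common])
    n = len(common)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    # Pearson correlation of the ranks = Spearman's rho (handles ties)
    return n, cov / (var_a * var_b) ** 0.5


quercus_lg = {"a": 0, "b": 10, "c": 20, "d": 30, "x": 5}
castanea_lg = {"a": 2, "b": 12, "c": 30, "d": 25, "y": 1}
print(spearman_common(quercus_lg, castanea_lg))  # (4, 0.8)
```

Because LG orientation is arbitrary, a fully inverted but otherwise collinear LG gives rho = -1. The absolute value is therefore the more useful collinearity summary.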

Oak genome evolution
We have identified precise chromosomal relationships within the oak genome corresponding to the ancestral hexaploidization event reported in eudicots [83]. This made it possible to propose an evolutionary scenario describing the development of the modern oak genome from the ancestral eudicot karyotype over the last 100 million years. Such information is of prime importance for gene cloning and for functional studies, for example detecting gene function by complementation of Arabidopsis mutants. The ancestral hexaploidization event in eudicots generated two additional copies of every ancestral gene [3]. In modern eudicot species, these three homologous copies may or may not have retained the function of the ancestral gene. Candidate genes are sometimes cloned on the basis of synteny, in a translational genomics approach that uses sequenced reference genomes. In that case it is therefore essential to investigate all the duplicated copies, which may prove to be redundant or complementary in terms of their function and the phenotype they confer [84].
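To make "investigate all the duplicated copies" concrete, the grape/oak correspondences listed in the Results can be turned into a lookup. Given the grape chromosome carrying a candidate gene's ortholog, the lookup returns every oak chromosome that shares the same ancestral paleoduplication group and may therefore carry a paralogous copy. This is a minimal chromosome-level sketch. Real analyses would work at the level of syntenic blocks. The two groups without a clear oak counterpart (g2-g15-g12-g16 and g4-g9-g11) are left out.

```python
# Grape -> oak correspondences of ancestral shared duplications, as listed in the
# Results; only groups with identified oak counterparts are included.
PALEO_GROUPS = [
    ({"g1", "g14", "g17"}, {"o1", "o2", "o10"}),
    ({"g3", "g4", "g7", "g18"}, {"o6", "o11"}),
    ({"g5", "g7", "g14"}, {"o3", "o6"}),
    ({"g6", "g8", "g13"}, {"o7", "o8"}),
    ({"g10", "g12", "g19"}, {"o5", "o8"}),
]


def oak_chromosomes_to_screen(grape_chr):
    """Oak chromosomes that may carry copies of a gene whose grape ortholog lies on grape_chr."""
    hits = set()
    for grape_set, oak_set in PALEO_GROUPS:
        if grape_chr in grape_set:
            hits |= oak_set
    # sort numerically: o2 before o10
    return sorted(hits, key=lambda c: int(c[1:]))


print(oak_chromosomes_to_screen("g14"))  # ['o1', 'o2', 'o3', 'o6', 'o10']
```

A grape chromosome that belongs to two groups in the source list, such as g14, points to more oak chromosomes to screen.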

Conclusion
This study provides new insights into the distribution of EST-derived SSRs across five mapping populations of two oak species. It also shows the benefits of using multiple pedigrees to construct consensus maps. We mapped 397 loci, 81% of which were common to at least two different mapping populations. The level of conserved macrosynteny was very high between Q. robur and Q. petraea, as well as between Quercus spp. and Castanea sativa. This opens up prospects for QTL validation across phylogenetically related species, as demonstrated by Faivre Rampant et al. [85].
Functional annotation of these EST-derived oak SSRs revealed many genes with biological, cellular and molecular functions. Their positions are now being compared with those of previously mapped QTLs. This suggests putative positional candidate genes, which are being used as anchor markers to fine-map large-effect QTLs (e.g. for water use efficiency and bud burst). They are also being used to identify the underlying subgenomic regions with the BAC libraries available for Quercus robur [85].
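Comparing marker positions with previously mapped QTLs amounts, at its simplest, to finding markers on the same LG that fall inside a QTL's confidence interval. The sketch below does this with hypothetical record layouts `(name, lg, pos_cM)` and `(trait, lg, start_cM, end_cM)`, and invented markers and intervals. It assumes marker and QTL positions are expressed on the same map.

```python
from collections import defaultdict


def markers_in_qtl(markers, qtls):
    """markers: list of (name, lg, pos_cM); qtls: list of (trait, lg, start_cM, end_cM).

    Returns trait -> sorted marker names lying inside any QTL confidence interval
    for that trait; traits with no such marker are omitted.
    """
    hits = defaultdict(set)
    for trait, q_lg, start, end in qtls:
        for name, lg, pos in markers:
            if lg == q_lg and start <= pos <= end:
                hits[trait].add(name)
    return {trait: sorted(names) for trait, names in hits.items()}


markers = [("ssrA", 3, 12.0), ("ssrB", 3, 40.0), ("ssrC", 5, 8.5)]
qtls = [("bud_burst", 3, 10.0, 20.0), ("bud_burst", 5, 0.0, 10.0), ("wue", 3, 50.0, 60.0)]
print(markers_in_qtl(markers, qtls))  # {'bud_burst': ['ssrA', 'ssrC']}
```

Markers returned this way are only positional candidates. Their usefulness as anchors for fine mapping still depends on the width of the confidence interval.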