Development of new genomic microsatellite markers from robusta coffee (Coffea canephora Pierre ex A. Froehner) showing broad cross-species transferability and utility in genetic studies

Background Species-specific microsatellite markers are desirable for genetic studies and to harness the potential of MAS-based breeding for genetic improvement. Limited availability of such markers for coffee, one of the most important beverage tree crops, warrants newer efforts to develop additional microsatellite markers that can be effectively deployed in genetic analysis and coffee improvement programs. The present study aimed to develop new coffee-specific SSR markers and validate their utility in analysis of genetic diversity, individualization, linkage mapping, and transferability for use in other related taxa.
Results A small-insert partial genomic library of Coffea canephora was probed for various SSR motifs following the conventional approach of Southern hybridisation. Characterization of repeat-positive clones revealed a very high abundance of DNRs (1/15 Kb) over TNRs (1/406 Kb). The relative frequencies of different DNRs were found as AT >> AG > AC, whereas among TNRs, AGC was the most abundant repeat. The SSR-positive sequences were used to design 58 primer pairs, of which 44 pairs could be validated as single-locus markers using a panel of arabica and robusta genotypes. The analysis revealed an average of 3.3 and 3.78 alleles and 0.49 and 0.62 PIC per marker for the tested arabicas and robustas, respectively. It also revealed a high cumulative PI over all the markers using both sib-based (10^-6 and 10^-12 for arabicas and robustas, respectively) and unbiased corrected estimates (10^-20 and 10^-43 for arabicas and robustas, respectively). The markers were tested for Hardy-Weinberg equilibrium and linkage disequilibrium, and were successfully used to ascertain generic diversity/affinities in the tested germplasm (cultivated as well as species). Nine markers could be mapped on the robusta linkage map. Importantly, the markers showed ~92% transferability across related species/genera of coffee.
Conclusion The conventional approach of genomic library was successfully employed although with low efficiency to develop a set of 44 new genomic microsatellite markers of coffee. The characterization/validation of new markers demonstrated them to be highly informative, and useful for genetic studies namely, genetic diversity in coffee germplasm, individualization/bar-coding for germplasm protection, linkage mapping, taxonomic studies, and use as conserved orthologous sets across secondary genepool of coffee. Further, the relative frequency and distribution of different SSR motifs in coffee genome indicated coffee genome to be relatively poor in microsatellites compared to other plant species.


Background
The coffee tree, a member of the family Rubiaceae, belongs to the genus Coffea that comprises > 100 species. Of these, two species, the tetraploid Coffea arabica L. (i.e. arabica coffee; 2n = 4x = 44) and the diploid C. canephora Pierre ex A. Froehner (i.e. robusta coffee; 2n = 2x = 22), are cultivated commercially. Coffee, one of the most popular non-alcoholic beverages, is consumed regularly by 40% of the world population, mostly in the developed world [1], and thus occupies a strategic position in the world socioeconomy.
Efforts undertaken globally to improve coffee, though successful, have proven to be too slow and severely constrained owing to various factors. These include: genetic and physiological makeup (low genetic diversity and ploidy barrier in arabicas, and self-incompatibility/easy cross-species fertilization in robustas), long generation cycle, requirement of huge land resources, and equally the dearth of easily accessible and assayable genetic tools/techniques for screening/selection. The situation warrants recourse to newer, easy, practical technologies that can provide acceleration, reliability and directionality to the breeding efforts, and allow characterization of the cultivated/secondary genepool for proper utilization of the available germplasm in genetic improvement programs. In this context, development of DNA marker tools and availability of marker-based molecular linkage maps become imperative for MAS-based accelerated breeding of improved coffee genotypes.
Among the different types of DNA markers, the Short Sequence Repeats (SSR) based microsatellite markers promise to be the ideal ones due to their multiallelic nature, high polymorphism content, locus specificity, reproducibility, inter-lab transferability and ease of automation [2]. Microsatellite markers have been developed for a large number of plant species and are increasingly being used for ascertaining germplasm diversity, linkage analysis and molecular breeding [3]. Despite these advantages, only ~180 microsatellite markers have been reported to date for coffee [4][5][6][7][8][9][10][11][12], signifying the need for expanding the repertoire of these genetically highly informative markers for efficient management and improvement of coffee germplasm resources. Here we report a set of 44 novel microsatellite markers developed by radioactive screening of a small-insert partial genomic library of C. canephora (robusta coffee). Interestingly, all these markers exhibit broad cross-species transferability. We also demonstrate their utility as genetic markers for ascertaining the germplasm diversity, genotype individualization, linkage mapping and taxonomic affinities.

Results
The present study aimed to isolate new coffee-specific informative SSRs useful as genetic markers for characterizing the coffee genome and for linkage mapping studies. For this purpose, a partial small-insert genomic library was constructed from a commercially cultivated robusta variety 'Sln-274'. The library was screened using radioactive SSR oligo probes to isolate SSR-containing DNA fragments, which were sequenced and used for designing primer pairs from the flanking regions and subsequent conversion to PCR-based SSR markers. The designed primer pairs were standardized for PCR amplification, and then validated for utility as genetic markers using panels of elite coffee genotypes, a mapping population for linkage studies, and related taxa of coffee for cross-species transferability. In addition, sequence data of the screened and putative SSR-positive selected clones were used to assess the relative abundance of different SSR motifs in the robusta coffee genome. In total, 44 new highly informative SSR markers were developed.

Screening/Identification of SSR positive genomic sequences from the small insert partial genomic library of Sln-274
The small-insert partial genomic library constructed from robusta variety Sln-274 comprised 15,744 clones. Radioactive screening of the arrayed and blotted clones indicated 446 putative positives, of which good quality sequence data could be obtained for 199 clones. The average insert size of the sequenced clones was 773.5 bp. Considering the latter, and that the sequenced clones represented a random sample of the genomic library with respect to size, the total size of the cloned genome amounted to 12.2 Mb, which equaled ca. 1.5% of the robusta coffee genome [13] (Table 1). An SSR search of the clone sequences using the MISA search module detected 76 genuine SSR-positive clones (0.48% of the total library) containing both targeted and non-targeted SSR motifs. Overall, these clones contained 92 SSRs comprising DNRs (48.3%), TNRs (25.9%), and HO-NRs (4.8%), and 24 SSRs comprising only MNRs (20.7%) (Table 1, 2). Among the targeted repeat motifs (screened SSR-oligonucleotides), AG was the most abundant repeat (26.7%), followed by AC (12.9%) and AGC (7.8%), whereas CCG (0.9%) was the least abundant and ACT was not detected at all (Table 2). Similarly, among the non-targeted SSR motifs other than MNRs, AT was the most abundant repeat (8.6%, Table 2).
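The library-size arithmetic above can be reproduced with a short sketch. It uses only the counts quoted in this section; the two sequencing ratios it prints are derived here for illustration and are not figures reported in the study.

```python
# Counts quoted in the text above.
n_clones = 15_744            # clones in the small-insert library
mean_insert_bp = 773.5       # average insert size of the sequenced clones
putative_positives = 446     # clones flagged by radioactive screening
sequenced_ok = 199           # positives with good quality sequence
genuine_positives = 76       # clones confirmed SSR-positive by MISA

# Total cloned DNA, assuming the sequenced clones are a random size sample.
cloned_mb = n_clones * mean_insert_bp / 1e6

# Derived ratios (illustrative, not reported in the paper).
sequencing_yield_pct = 100 * sequenced_ok / putative_positives
confirmed_pct = 100 * genuine_positives / sequenced_ok

print(f"cloned genome: {cloned_mb:.1f} Mb")
print(f"positives sequenced successfully: {sequencing_yield_pct:.1f}%")
print(f"sequenced positives confirmed as SSR clones: {confirmed_pct:.1f}%")
```

The first value rounds to the 12.2 Mb stated in the text.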

Frequency and distribution of SSRs in coffee genome
A total of 76 targeted SSRs (DNRs and TNRs) and 10 nontargeted DNRs were assessed for their lengths, distribution in the present library, and their relative abundance in the robusta genome (Table 2). Average length (in terms of repeat units) for the DNRs and TNRs was 9.6 and 5.9, respectively. Among DNRs, AT and AG were comparable and longer than AC, whereas ACG and AGC were the longest of the TNRs ( Table 2). The size of cloned/screened genomic library and the observed data for identified SSRs were considered along with the earlier predicted size of the robusta genome [13] to derive relative estimates for frequency/distribution of different SSR motifs in the robusta genome. The analysis revealed coffee genome to be enriched in AT type DNRs (AT-DNR), which were estimated to be many fold more than any other SSR motifs (targeted and/or non-targeted). The results indicated one AT-DNR per 16 Kb (1/16 Kb) of robusta genome; this was almost 20-fold higher than the next most abundant DNR i.e. AG (ca. 1/393 Kb). The DNRs as a single class were estimated to be 1/15 Kb genome when AT (comprising 94% of the total DNRs) was included, and 1/265 Kb coffee genome for the remaining ones. In comparison, the overall frequency of TNRs was calculated to be 1/406 Kb with AGC being the most predominant (ca. 1/1300 Kb) and CCG the least (ca. 1/12200 Kb). In addition, a few other higher order SSRs (mainly the AT-rich) were also detected but these were not used for estimate calculations, as their numbers were very low. Thus, the present study indicated an abundance of one SSR (either DNR or TNR) per 15 Kb of robusta coffee genome, wherein the DNRs were ~27 times more abundant than the TNRs.
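As a worked check of how the density estimates relate to each other, the sketch below converts the "one SSR per N Kb" figures quoted above into SSRs per Mb and into fold differences. The dictionary keys and function names are illustrative choices, not terminology from the study.

```python
# "One SSR per N Kb" estimates quoted in the text above.
kb_per_ssr = {
    "DNR (all)": 15,
    "TNR (all)": 406,
    "AGC": 1300,
    "CCG": 12200,
}

def ssrs_per_mb(kb):
    """Convert a spacing of one SSR per `kb` kilobases into SSRs per Mb."""
    return 1000.0 / kb

def fold_difference(common, rare):
    """How many times more frequent `common` is than `rare`."""
    return kb_per_ssr[rare] / kb_per_ssr[common]

dnr_vs_tnr = fold_difference("DNR (all)", "TNR (all)")
print(f"DNRs are ~{dnr_vs_tnr:.0f} times more abundant than TNRs")
print(f"DNR density: {ssrs_per_mb(kb_per_ssr['DNR (all)']):.1f} per Mb")
```

The DNR-to-TNR ratio of about 27 matches the figure given at the end of the paragraph above.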

Development of microsatellite markers
Primer design was attempted for all the identified SSR-positive sequences for conversion to microsatellite markers, using 'SSR motif length' (≥ 7 repeats for DNRs and ≥ 5 repeats for higher-order SSRs) as one major criterion. As a result, only 56 of the total 92 identified SSRs (all except MNRs) were found suitable for primer design, indicating 60.9% primer suitability. These comprised 42.2% DNRs, 40.7% compound SSRs, 6.8% TNRs, 5.1% TtNRs and 1.7% HNRs. In addition, primers were also designed for 2 of the randomly chosen 14 MNRs to test their potential for conversion to SSR markers. Among the SSRs found unsuitable for primer design, 70.6% had shorter motif length and 29.4% had flanking regions unsuitable for primer modeling. Of the 58 potential primer pairs designed, 52 could be successfully amplified and 44 of these could further be validated (Table 3, 4) as useful markers, indicating a ~76% primer-to-marker conversion ratio.
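The ratios in this paragraph follow directly from the counts; a minimal sketch of the arithmetic is given below. The amplification rate (52 of 58) is computed here for completeness and is not a percentage stated in the text.

```python
# Counts quoted in the paragraph above.
identified_ssrs = 92       # SSRs other than MNRs
suitable_for_design = 56   # SSRs meeting motif-length and flank criteria
primer_pairs = 58          # the 56 above plus 2 MNR-based pairs
amplified = 52             # primer pairs giving a PCR product
validated = 44             # primer pairs validated as markers

def pct(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole

primer_suitability = pct(suitable_for_design, identified_ssrs)
amplification_rate = pct(amplified, primer_pairs)   # derived, not in the text
conversion_ratio = pct(validated, primer_pairs)

print(f"primer suitability: {primer_suitability:.1f}%")
print(f"amplification rate: {amplification_rate:.1f}%")
print(f"primer-to-marker conversion: {conversion_ratio:.1f}%")
```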

Validation of microsatellite markers for use in genetic studies
Germplasm characterization
Allelic diversity, heterozygosity status and extent of polymorphism
For ascertaining the useful attributes of genetic markers, all the 44 new microsatellite markers were tested on a panel of 16 elite robusta and arabica genotypes. Good [...] Figure 1a.
The PIC values varied considerably for the new markers across the tested genotypes. The mean PIC value for arabicas was 0.49 (range 0.12-0.81), which was significantly less than the 0.62 (0.23-0.83) observed for robustas (Table 4, Figure 1b). Further, Student's t-test revealed highly significant differences in the total number of amplified alleles (N_A) and PIC value estimates for arabica and robusta genotypes (N_A: t = 3.18, P = 0.00, and PIC: t = 3.46, P = 0.00) for the amplified and comparable markers.
The above SSR allelic data, when used to calculate the heterozygosity estimates, revealed highly significant differences between the observed and expected heterozygosity both for arabicas (mean H_o = 0.29 and mean H_e = 0.50; paired t value = 3.64; P = 0.00) and for robustas (mean H_o = 0.52 and mean H_e = 0.63; paired t value = -2.54; P = 0.01). The results thus suggested significant heterozygote deficiency in both germplasm sets. Further, only 15 of the 23 Pms (62.5%) were found to be in HW equilibrium in the case of arabicas, while the remaining eight showed significant heterozygote deficiency (Table 4), corroborating the heterozygosity data. Similarly, in robustas, 28 (65.2%) of the 41 Pms were found to be in HW equilibrium, and of the remaining 14 Pms, eight markers showed significant heterozygote deficiency while six markers showed heterozygote excess.
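For readers unfamiliar with PIC, the sketch below computes it from allele frequencies using the commonly used definition of Botstein et al. (1980). Whether the study used exactly this form, or a software package that implements it, is an assumption; the example frequencies are hypothetical.

```python
def pic(freqs):
    """Polymorphism information content of one locus.

    PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    `freqs` is a list of allele frequencies that sum to 1.
    """
    sum_sq = sum(p * p for p in freqs)
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1.0 - sum_sq - cross

# Hypothetical allele frequencies (illustration only).
print(pic([0.5, 0.5]))                 # two equally frequent alleles
print(pic([0.25, 0.25, 0.25, 0.25]))   # four equally frequent alleles
```

With this definition a locus with more, evenly distributed alleles scores higher, which is why markers with more alleles in robustas tend to show higher PIC.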
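A minimal sketch of how observed and expected heterozygosity can be obtained from diploid genotype calls at one locus is shown below. It uses the simple estimator H_e = 1 - sum(p_i^2) without a small-sample correction, which is an assumption about the method; the genotype list is hypothetical.

```python
from collections import Counter

def heterozygosity(genotypes):
    """Return (H_o, H_e) for one locus.

    `genotypes` is a list of (allele_a, allele_b) tuples, one per individual.
    H_o is the fraction of heterozygous individuals; H_e is computed from
    allele frequencies as 1 - sum(p_i^2) (no small-sample correction).
    """
    n = len(genotypes)
    h_obs = sum(a != b for a, b in genotypes) / n
    counts = Counter(allele for pair in genotypes for allele in pair)
    total_alleles = 2 * n
    h_exp = 1.0 - sum((c / total_alleles) ** 2 for c in counts.values())
    return h_obs, h_exp

# Hypothetical allele sizes (bp) for four individuals at one marker.
calls = [(150, 150), (150, 154), (154, 154), (150, 150)]
h_o, h_e = heterozygosity(calls)
print(h_o, h_e)  # H_o below H_e indicates a heterozygote deficit
```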
The LD test performed for all the Pms, showed 29.8% (82 of 275) and 25.0% (202 of 780) pair-wise comparisons in significant dis-equilibrium (P < 0.05) for arabicas and robustas respectively. On an average each Pm was found to be in dis-equilibrium with 3.4 (SD: ± 2.4, SE: ± 0.51) other Pms in case of arabicas and 4.9 (SD: ± 4.0, SE: ± 0.63) for robustas. The maximum LD was observed for the marker CaM24 (with six other markers) in arabicas and CaM26 (with eight other markers) in robustas.

Mappability of novel SSR markers
The new SSR markers were tested for their mappability on robusta linkage map. In total, 9 of the 44 new markers (20.5%) were found to be polymorphic for the parents of the robusta pseudo-testcross mapping population i.e. CXR and Kagganahalla. The nine markers (CaM03, 16, 20, 22, 32, 35, 42, 44 and 46) could be mapped on the robusta linkage map developed by us [12]. Notably, seven of the markers (except CaM16 and CaM46) were mapped on independent LGs, which indicated the new markers to be randomly distributed on the robusta genome ( Figure 2, Table 3).

Cross-species/-genera transferability and primer conservance
Cross species transferability of the new robusta derived SSR-markers was tested for 13 related Coffea and two Psilanthus species. In general, the markers resulted in robust cross-species amplifications with alleles of comparable sizes in the tested taxa (Table 4). Overall, an average transferability of ~92% was observed (Table 6, 7), which was higher for Coffea spp. (> 93%) than for the related Psilanthus spp. (~82%). Moreover, within different Coffea taxa, across its different botanical subsections, the transferability was comparable (> 91%). The data thus, indicated a very high marker conservance across the related coffee species, which was calculated to be ~91% over all the tested markers. Marker CaM54 exhibited lowest conservance of 23% (for Coffea species) and 27% (over all taxa), whereas 24 markers were found to be 100% conserved. The data also revealed the presence of some private alleles (PAs), which possibly could be species-specific. In total, 104 such alleles were found in Coffea (with a mean number of 8.7 PAs/species) and 35 in Psilanthus species (17.5 PAs/species), over all the 44 markers. These accounted for ~34% of amplified alleles in Coffea spp. and 45% of those amplified in Psilanthus spp.
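The two quantities discussed above, percent transferability and private alleles, can be sketched as follows. The data structures, taxon labels and allele sizes are hypothetical and only illustrate the bookkeeping; they are not the study's scoring data.

```python
def transferability(amplified):
    """Percent of marker x taxon combinations that amplified.

    `amplified` maps taxon -> {marker: True/False}.
    """
    calls = [ok for per_marker in amplified.values() for ok in per_marker.values()]
    return 100.0 * sum(calls) / len(calls)

def private_alleles(alleles_by_taxon):
    """Alleles seen in exactly one taxon.

    `alleles_by_taxon` maps taxon -> set of (marker, allele_size) pairs.
    """
    result = {}
    for taxon, alleles in alleles_by_taxon.items():
        others = set().union(
            *(a for t, a in alleles_by_taxon.items() if t != taxon)
        )
        result[taxon] = alleles - others
    return result

# Hypothetical scoring data for two taxa and two markers.
amplified = {
    "taxon_A": {"m1": True, "m2": True},
    "taxon_B": {"m1": True, "m2": False},
}
alleles = {
    "taxon_A": {("m1", 150), ("m2", 200)},
    "taxon_B": {("m1", 150), ("m1", 156)},
}
print(transferability(amplified))
print(private_alleles(alleles))
```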

Generic affinities within/between cultivated and wild coffee germplasm
The diploid microsatellite data were examined for their potential in genetic diversity studies by studying the variation and interrelationship between the cultivated as well as wild genepool. The average genetic distance values (calculated using the SSR allelic data) were found to be 0. [...] The NJ phenetic tree generated using the genetic distance estimates for eight genotypes each from arabica and robusta clearly resolved the tested germplasm in two distinct clusters, one representing all the tetraploid arabicas, while the other comprised all the diploid robusta genotypes (Figure 3) with significant branch support. The selections from pure arabicas formed a single cluster within arabicas, whereas selections from hybrids formed a different group. HdeT was found closest to S2790 and S2792, whereas Sln11 was found to be the most distant entry in arabicas. Similarly, a clustering analysis of 14 related species (12 Coffea and two Psilanthus spp.; Figure 4) along with two genotypes each from C. arabica and C. canephora formed coherent clusters of diploid Erythrocoffeas (C. [...]

Distribution and abundance of detected SSR motifs
The coffee-specific SSR markers described in this study were developed using the conventional approach of construction/screening of a partial small-insert genomic library. The success rate of any microsatellite development effort is indicated by the proportion of SSR-containing clones in the library, followed by the number of detected SSRs, the quality of SSR motifs and also the quality of flanking regions. In the present study, 76 good quality SSR-positive clones containing a total of 116 SSRs were obtained, from which 44 SSR markers were developed (Table 1, 3). The results thus suggested a success rate of 0.48% in the identification of potential target SSR-positive clones, and 0.28% in overall marker development. In a representative study to assess the success of the conventional library screening approach for microsatellite marker development in 16 different plant genera, it was found that the proportion of SSR-positive clones varied significantly (0.059% to 5.8% with an average of 2.5%) from species to species [14]. The observed SSR detection efficiency of the approach in this study was comparable with earlier reports in Acacia (0.32%, [15]) and peanut (0.43%, [16]), but was higher than in rice (0.22%, [17]), potato (0.06 to 0.15%, [18]) and wheat (0.11%, [19]), and less than in white spruce (0.62%, [20]).
The estimates derived from this study revealed that the robusta coffee genome is relatively poor in overall SSR abundance (1/160 Kb for targeted SSRs, and 1/15 Kb including the non-targeted SSRs; Table 2) compared to various other plant species such as Arabidopsis, rice, barley (1 every 6-8 Kb) [21] and mulberry (our unpublished data). Nevertheless, the relative frequency, repeat lengths, and distribution pattern of different types of genomic SSRs in the coffee genome (Table 2) were comparable to those reported in a number of plant species like apple [22], avocado [23], birch [24], peach [25], Acacia [15] and tomato [26]. Specifically, AG was detected in higher proportion (almost 2 times) than AC; AG repeat cores were, in general, found to be longer than any other SSR type. Repeat cores of TNRs were, in general, smaller than those of DNRs, and AT (the non-targeted SSR) was found to be the most abundant in comparison to any other DNR or TNR. In comparison, the AT-rich TNRs in the coffee genome were found to be relatively less abundant than seen in most plant species [16,27,28], but comparable to some of the tree species like avocado (ACC > AGG > AAG, [23]) and peach (abundant in AGG, [25]). A species-specific pattern of TNR abundance has also been demonstrated in closely related species like rice and wheat that belong to the same family but differ significantly in their genomic TNR content [29][30][31]. Some of the variation seen in the SSR estimates (relative frequency, distribution and abundance) across different studies, including the present one on coffee, can be ascribed to the differences in criteria used for SSR search viz., minimum length of repeat-core, the size of the genomic library screened, screening stringency, oligos used for screening and SSR mining tools, notwithstanding the innate differences in genomic organization of SSRs in different species.
A comparison of the relative abundance/distribution of genomic SSRs with that of genic-SSRs developed from the coffee transcriptome earlier by us [11] revealed two striking differences viz., an apparently higher abundance of SSRs in the transcriptome (1/2.16 Kb) and a near reverse pattern of TNR abundance/relative distribution in the two types of SSRs. Importantly, the two most abundant TNRs (AAG, ACT) in the genic-SSRs were least abundant or not detected in the genomic SSRs. The observation would suggest interesting possibilities of differential distribution/organization of TNRs, as well as of restriction sites for the enzymes used for library construction, across gene-rich and gene-deficient regions of the coffee genome. However, such possibilities can only be addressed by further detailed genomic studies.
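The two success rates quoted above, and the comparison with other species, can be checked with a short sketch. The per-species percentages are those cited in the paragraph; potato is omitted because it is reported as a range, and the variable names are illustrative.

```python
# Counts from this study.
library_clones = 15_744
ssr_positive_clones = 76
validated_markers = 44

detection_pct = 100 * ssr_positive_clones / library_clones
overall_pct = 100 * validated_markers / library_clones

# SSR-positive clone percentages cited above for other species.
reported_pct = {
    "Acacia": 0.32,
    "peanut": 0.43,
    "rice": 0.22,
    "wheat": 0.11,
    "white spruce": 0.62,
}

# Split the cited species by whether they fall below or above this study.
lower = sorted(s for s, v in reported_pct.items() if v < detection_pct)
higher = sorted(s for s, v in reported_pct.items() if v > detection_pct)

print(f"SSR-positive clones: {detection_pct:.2f}%")
print(f"overall marker development: {overall_pct:.2f}%")
print("lower than this study:", lower)
print("higher than this study:", higher)
```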
Figure 2. Relative position of the nine new SSR markers (20% of the total tested) mapped on a robusta coffee map [12]. The reference map was generated using a pseudo-testcross mapping population derived from a cross of 'CxR' (a commercial robusta hybrid) and Kagganahalla (a local selection from India). Note that the new mapped markers are distributed randomly across different linkage groups. The value at the base of each LG refers to its relative length in centiMorgans (cM).

Development of new SSR markers
In coffee, to the best of our knowledge, only ca. 180 genomic SSRs have been described in the literature to date [4][5][6][7][8][9][10][11], warranting continuous efforts to develop additional new markers to expand the existing repertoire for their efficient deployment in genetic studies in coffee. In this study 63% of the detected SSRs were found useful for primer design/marker conversion, a much higher success rate compared to that reported for apple (30% [22]), cassava (37.7% [32]), Elymus caninus (11.1% [33]), oat (25.2% [34]) and potato (26.9% [18]). The two main sequence attributes that rendered 36 identified SSRs unsuitable for primer design were found to be a shorter repeat core and a low-complexity flanking region (AT/GC-rich and/or regions prone to secondary structure formation) unsuitable for primer modeling. Interestingly, in the present study, not even a single failure was due to the location of the SSR core towards the end of the clone sequence, which is reported to be one major limiting factor in many earlier studies in cassava, tomato, oat and fir [26,32,34,35]. The higher success rate and fewer limiting factors in primer design observed in this study are expected to be due to the better suitability of the restriction enzymes, as well as the relatively longer genomic fragments (0.5 to 1.5 Kb) used for the genomic library construction. The importance of the size of the genomic fragments used for construction of the genomic library/SSR-marker development has also been shown earlier in groundnut [16].

[Table fragment] II. Parents and mapping population used for testing utility in mapping analysis: parents CXR (12) and Kagganahalla (9); mapping population of 175 segregating progenies. III. Species of Coffea and Psilanthus (related taxa of cultivated coffee) used for transferability studies.

The proportion of designed primers successfully producing amplification products gives a primer-to-marker conversion ratio and indicates the ultimate success of the library construction effort. In this study, of the 58 primer pairs designed, 44 could be validated as efficient SSR markers (see Tables 3, 4, and the discussion in the following sections), thus resulting in a ~75.8% primer-to-marker conversion ratio, broadly comparable to many earlier conventional genomic library-based studies viz., cucurbits [36], Elymus [33], peanut [16], tomato [26] and rice [17].
One of the lowest primer-to-marker conversion ratios, reported for Douglas fir (4.1%), was suggested to be due to the complexity unique to the conifer genomes [35][37][38][39]. Further, a survey of the literature suggests, in general, higher conversion ratios for small genomes like apple and peach, and a negative correlation between the genome size and the amplification efficiency of SSR primers due to mechanistic reasons [40].
Two of the 44 new SSR markers described here (CaM49, 55) were based on MNR repeats. In general, these markers warranted much more critical appraisal for ascertaining their individual alleles/sizing that in many cases were not easily distinguishable from the similar sized confounding stutter amplicons (data not shown). Therefore, it may be prudent to avoid use of such MNR-based markers despite these being informative, unless no other markers are available.

Utility of new SSRs as genetic markers
To date, there are a few studies describing development of coffee-specific SSR markers [4][5][6][7][8][9][10][11]; however, only a few of these provide data on the utility of the new SSRs in genetic studies [8,11]. Therefore, one major aim of the present study was to test the potential of the new markers reported here for their use in studies related to genetic diversity in cultivated coffee germplasm, linkage mapping, constructing reference panels/bar codes for individualization of genotypes, cross-species transferability, and taxonomic relationships in related taxa.

Germplasm characterization
Level of allelic polymorphism and genetic diversity
Various genetic parameters viz., allelic diversity, PIC, H_o, H_e, kurtosis/skewness, HWE and LD, calculated for all the new SSRs amply demonstrated their utility as genetic markers (see Results, Table 4). In general, the markers revealed low to moderate allelic/genetic diversity, which was comparable to, and in some cases more than, that reported for the earlier described coffee genomic SSRs [6,8], and as expected, invariably higher than that of the genic-SSRs [10,11,41]. The total number of alleles amplified by different markers in the tested arabicas and robustas was almost similar; however, the markers were found significantly more informative, with higher PIC values, for robustas. In addition, it was important to note that 13 of the tested markers amplified two distinct but similar sized alleles across all the tested arabicas, suggesting these to be the result of duplicated fixed loci in the arabica genome. The above observations are likely considering the reproductive behavior, genome evolution and domestication process of the two types of coffee. The robustas are expected to be genetically more diverse (leading to higher PIC for tested markers) due to their out-crossing behavior, in contrast to arabicas that are self-compatible and also known to suffer from a narrow genetic base resulting from the genetic bottleneck during the domestication process [8,11]. Similarly, the duplicate loci in the arabica genome are plausible as it is an allotetraploid resulting from hybridization of two homeologous diploid genomes (C. eugenioides and C. canephora) followed by diploidization and stabilization [42]. [...] [6,10,11], and indeed reflective of the genetic composition and mating behavior of the tested materials. Overall, these studies indicated that the tested robusta germplasm comprised allogamous, relatively unrelated genotypes (selections and one hybrid), while autogamous arabicas comprised mostly hybrid varieties/selections with overlapping/shared pedigrees. The results thus suggest the suitability of the new markers for reliably ascertaining genetic diversity in the coffee genepool.

NJ tree showing relationship between 14 Coffea and two Psilanthus taxa based on the allelic diversity generated using the new SSR markers.

Discriminatory power of new SSR markers
Individualization of plant germplasm resources has become important in the present day scenario for their proper management and utilization, as well as for IPR protection, which can be achieved by DNA typing techniques involving the use of highly polymorphic markers like SSRs. Germplasm characterization using such typing approaches remains a costly proposition, especially for a target species like coffee that has very limited diversity in its available genepool. To circumvent these problems and increase the utility of such efforts, it has been proposed to build reference DNA polymorphism data resources/panels for coffee germplasm using robust markers like SSRs and common experimental guidelines [12,43]. Such a reference resource can then readily be used for coffee genotype individualization, germplasm selection for breeding/improvement, and germplasm exchange in international collaborations [12,43]. In this context, it becomes important to ascertain the PI estimates (which provide very informative indicators of the discrimination potential) of the SSR markers before deployment in germplasm characterization studies. In general, the PI estimates for the new markers ranged from low to moderate when considered individually, but were highly informative for genotype discrimination when tested together (cumulative PI).
Moreover, the estimates were found to be reflective of the diversity status in the test germplasm, and accordingly were significantly different (lower) for arabicas than for the robustas (Table 5). The analysis in general indicated the need for use of 3-4 times more markers to achieve a comparable level of discrimination in the two coffee genepools. Moreover, the data suggested that from a practical point of view it might be prudent to calculate the sib-based PI (a more conservative estimate of discrimination) for deciding the number of markers that can provide sufficient variability for individualization of the test germplasm. This is expected as the sib-based PI discounts the possible similarities/relatedness in the target germplasm arising due to overlapping pedigrees/common parentage.
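To make the difference between the two kinds of PI concrete, the sketch below computes a per-locus PI for unrelated individuals and for full sibs, and multiplies them across loci. The formulas are the widely used textbook forms (as summarized, for example, by Waits et al. 2001); that the study used these exact forms is an assumption, the small-sample "unbiased" correction mentioned in the abstract is not implemented, and the allele frequencies are hypothetical.

```python
from math import prod

def pi_unrelated(freqs):
    """Probability that two unrelated individuals share a genotype at one locus."""
    homozygotes = sum(p ** 4 for p in freqs)
    heterozygotes = sum(
        (2 * freqs[i] * freqs[j]) ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return homozygotes + heterozygotes

def pi_sibs(freqs):
    """Probability that two full sibs share a genotype at one locus."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 0.25 + 0.5 * s2 + 0.5 * s2 ** 2 - 0.25 * s4

def cumulative_pi(per_locus_values):
    """Multiply per-locus values, assuming the loci are independent."""
    return prod(per_locus_values)

# Hypothetical allele frequencies for three loci (illustration only).
loci = [[0.5, 0.5], [0.6, 0.3, 0.1], [0.25, 0.25, 0.25, 0.25]]
print(cumulative_pi(pi_unrelated(f) for f in loci))
print(cumulative_pi(pi_sibs(f) for f in loci))
```

The sib-based value is always the larger (more conservative) of the two, so more markers are needed to reach a given cumulative threshold. The product across loci assumes independence, which is only approximate where markers are in linkage disequilibrium.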

Mappability of the new SSR markers
One of the major potential utilities of DNA markers is their use as robust genomic landmarks on the linkage groups that can subsequently be tagged to the gene(s) controlling important traits of interest providing possibilities of MAS-based breeding. This requires generation of reasonably dense linkage maps populated with large number of revisitable DNA markers for which the SSRs remain the most desired ones. Till date, very few SSRs are mapped on the robusta linkage map [7,44] warranting extensive efforts to generate more SSR markers usable for linkage analysis. In this regard, we tested the suitability of the new markers for linkage mapping using a pseudo-testcross mapping population of robusta coffee. Significantly, 20.5% of the markers were found to be polymorphic for the parents of the mapping population, and all of these could be successfully mapped (Figure 2). The mapped markers were distributed on different linkage groups, and some of these mapped towards the ends of the LGs as has been seen in the earlier studies [44]. The data, thus, strongly demonstrate that the new markers can be efficiently used for genetic linkage studies in coffee.

Cross-species/-generic transferability
The low-moderate level of diversity exhibited by the new markers in the cultivated coffee genepool, is more than compensated by their high potential for cross-species transferability. All the markers revealed robust cross-species/-generic amplifications with alleles of comparable sizes when tested for 13 Coffea and two Psilanthus taxa ( Table 7). The data revealed that the markers described here show much better taxa transferability than the earlier published genomic SSR markers [6,9,10], but relatively less than the genic SSR markers reported by us [10,11]. More importantly, the markers showed comparable transferability across related species of Coffea as well as 2 species of the related genus Psilanthus. This is significant as successful cross-species amplification is generally restricted to related species within a genus and reduces when tested for different genera [11,45]. Further, it was interesting to note that all the new SSRs that were monomorphic/uninformative for the tested arabica and robusta germplasm, exhibited considerable polymorphism across the tested related taxa. The only exception was the marker CaM54 that showed a very low conservance even across the Coffea spp. Thus, the new SSR markers described here strengthen the possibility of their use as Conserved Orthologous Sets (COS) for genetic characterization of different related wild coffee taxa, and also for coffee taxonomic/synteny studies.

Diversity analysis and genetic relatedness within/between Coffea and Psilanthus species
The genomic SSRs described in this study, despite revealing a low level of polymorphism, were able to group all the 16 genotypes belonging to the two cultivated germplasms in phenetic clusters that were indicative of species relationships and confirmed their known pedigrees (Figure 3). For example, the analysis confirmed the related origin of S2790 and S2792, which are two-way hybrids between HdeT and Taferikela.
Similarly, the analysis of 20 representative samples belonging to 14 Coffea and two Psilanthus species revealed generic affinities that were in general agreement with their known taxonomic relationships, based on their geographical distribution as well as Chevalier's botanical classification [46] (Figure 4). Accordingly, the phenetic tree based on the new marker data clearly grouped the analyzed related coffee species as per their respective botanical subsections (see Results). Importantly, the analysis distinctly separated the two Paracoffea species (P. bengalensis and P. wightiana) from all the other Coffea spp. These results are similar to those of earlier published studies undertaken to ascertain species relationships using SSRs [8,9,11], as well as other marker approaches [47-49]. A close relationship of C. kapakata to the Mozambicoffea taxa, and the status of the only Melanocoffea taxon, C. stenophylla, as seen here, were also indicated earlier in the EST-SSR and ISSR-based studies [11,47]. These results thus demonstrate that the new SSR markers developed in the present study can be highly informative in exploring the taxonomic relationships of the coffee species complex.

Conclusion
In summary, the present study describes 44 new microsatellite markers developed using the conventional approach of constructing and screening a partial small-insert genomic library. The approach was successful but difficult and experiment-intensive, with a low success rate of 0.48%. Analysis of the identified SSR-positive genomic clones provided insights into the relative abundance and distribution pattern of different SSR motifs in the coffee genome, which was found to be relatively poor in SSR abundance compared to many other plant genomes. Overall, the DNRs were much more abundant than the TNRs, and among the different types of SSR motifs, AT was the most abundant, followed by AG, AC, and ACG. The TNR CCG was the least abundant. More than 50% of the identified SSRs could be converted to usable markers, resulting in a high primer-to-marker conversion ratio. All 44 markers were found to be polymorphic in the tested coffee/related germplasm, and their utility as efficient genetic markers could be demonstrated for diversity analysis, germplasm individualization, linkage mapping, cross-species transferability and taxonomic studies. This study has thus enriched the small available repertoire of coffee SSR markers by 44 new SSRs, which are not only useful for cultivated coffee but are also expected to be equally useful for genetic studies involving related species that constitute the important secondary genepool for improvement of coffee.

Plant material and DNA extraction
In this study, sixteen elite genotypes belonging to C. arabica and C. canephora were used along with 14 related wild species belonging to the genera Coffea and Psilanthus (Table 7). Leaf samples for each of them were collected from the germplasm bank maintained at the Central Coffee Research Institute, Balehonnur, Karnataka, India, and DNA was isolated following the method described by Aggarwal et al. [50].

Construction of genomic library and isolation of SSR containing sequences
A partial small-insert genomic library was constructed using standard procedures [51] from total cellular DNA isolated from an elite robusta genotype, Sln-274. Approximately 10 µg of genomic DNA was digested with Rsa I and Hae III restriction endonucleases (NEB, USA) and fractionated in a 1% agarose gel. Genomic fragments of 500 to 1500 bp were gel-excised, purified using a GFX column (Amersham), and ligated to the pMOS Blunt-ended plasmid vector (Amersham) using T4 DNA ligase; the ligated genomic inserts were then cloned in Escherichia coli DH10B host cells by electroporation. The transformed cells were grown overnight, and recombinant white colonies were individually picked, maintained in forty-one 384-well microtiter culture plates, and replicated onto Hybond-N+ nylon membranes (Amersham Biosciences, USA) to obtain high-density hybridization filters for screening. All 15,744 arrayed recombinant clones were Southern hybridized to two γ-32P-labeled oligo pools (each comprising different synthetic oligonucleotides in equimolar concentration), viz. Pool-I: (CA)15, (GA)15, (CAA)10, (CAT)10, (ACT)10, (GATA)10, (AGA)10, (CATA)10; and Pool-II: (CTG)10, (GAC)10, (AGG)10, (GGT)10, (GCC)10, (GC)15. Hybridized clones were selectively picked and individually processed for plasmid isolation following the standard alkaline lysis method [51]. The genomic inserts were then amplified and sequenced on both strands using M13 universal primers on a 3700 DNA Analyzer with BigDye™ chemistry as per the manufacturer's instructions (Applied Biosystems, USA). The sequences were aligned and edited using Autoassembler (Applied Biosystems, USA) and finally saved in FASTA format.

Marker development
The identification and localization of microsatellites in the sequenced clones was performed using the microsatellite search module MISA (for more information please see Availability & requirements section below), followed by visual assessment. The criteria for the SSR search by MISA were repeat stretches having a minimum of 12 repeat units for MNRs, six repeat units for DNRs and five repeat units for HO-NRs. The microsatellites were classified considering the complementarity of the repeat motifs, e.g., AG, GA, TC and CT were considered as a single category. Primer pairs were designed for the SSR-containing sequences with a minimum of seven repeats for DNRs and/or five repeats for all other SSRs, using GENETOOL Lite version 1.0 (for more information please see Availability & requirements section below). The primers were commercially synthesized (Bioserve, India; for more information please see Availability & requirements section below), with the forward primers carrying the fluorescent label FAM or HEX. The details of these new markers, viz. locus designation, primer sequences, repeat motifs, allele attributes, PIC estimates and GenBank accession numbers, are summarized in Tables 3 and 4. The primer pairs were standardized and PCR was performed as described earlier [10,11]. The amplified products were run on a capillary-based 3730 DNA Analyzer (Applied Biosystems), and the products were precisely sized for major, comparable and conspicuous peaks using GeneMapper 3.7 (Applied Biosystems) with default parameters.
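The search criteria and motif classification described above can be illustrated with a short Python sketch. It is not the MISA program and makes no claim to reproduce its behaviour; it simply applies the stated thresholds (12 repeats for MNRs, six for DNRs, five for higher-order repeats) to perfect repeats and groups motifs with their rotations and reverse complements. The function names `canonical_motif` and `find_ssrs`, and the upper limit of hexanucleotide units, are assumptions made for illustration only.

```python
import re

# Minimum repeat counts per unit length, taken from the criteria in the text.
# ASSUMPTION: "higher-order" repeats are searched up to hexanucleotides.
MIN_REPEATS = {1: 12, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_motif(motif):
    """Collapse a motif, its rotations and its reverse complement into one class."""
    motif = motif.upper()
    revcomp = motif.translate(_COMPLEMENT)[::-1]
    candidates = []
    for m in (motif, revcomp):
        candidates.extend(m[i:] + m[:i] for i in range(len(m)))
    return min(candidates)


def _is_shorter_unit(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    size = len(motif)
    return any(
        motif == motif[:k] * (size // k)
        for k in range(1, size)
        if size % k == 0
    )


def find_ssrs(sequence):
    """Return sorted (start, motif_class, unit_length, repeat_count) tuples
    for perfect SSRs meeting the minimum repeat thresholds."""
    sequence = sequence.upper()
    hits = []
    for unit_len, min_rep in sorted(MIN_REPEATS.items()):
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        pos = 0
        while True:
            match = pattern.search(sequence, pos)
            if match is None:
                break
            motif = match.group(1)
            if _is_shorter_unit(motif):
                # Already reported (if long enough) under the shorter unit length.
                pos = match.start() + 1
                continue
            repeats = len(match.group(0)) // unit_len
            hits.append((match.start(), canonical_motif(motif), unit_len, repeats))
            pos = match.end()
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical insert sequence, for illustration only.
    demo = "GGC" + "AT" * 7 + "CCG" + "A" * 12 + "C" + "GA" * 5 + "T" + "AGC" * 5 + "G"
    for hit in find_ssrs(demo):
        print(hit)
```

In this toy sequence the (GA)5 stretch is not reported because it falls below the six-repeat threshold for DNRs. A stricter filter (for example, seven repeats for DNRs, as used for primer design) could be applied to the returned tuples.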

Statistical and genetic analysis
The allelic data for eight genotypes each of arabicas and robustas were used to calculate different statistical and genetic parameters. Statistical attributes such as mean, skewness, kurtosis, t-test etc. were calculated using Microsoft Excel function utilities. Observed heterozygosity ($H_o$) was calculated as the fraction of heterozygous genotypes over the total number of genotyped plants. Expected heterozygosity ($H_e$) was calculated according to the following formula [52]: $H_e = \frac{n}{n-1}\left(1 - \sum_i p_i^2\right)$.
PIC values were calculated according to Botstein et al. [53] as follows: $\mathrm{PIC} = 1 - \sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} 2p_i^2 p_j^2$, where n = the total number of alleles detected for a microsatellite marker, $p_i$ = the frequency of the $i$th allele, and $p_j$ = the frequency of the $j$th allele ($j = i+1, \ldots, n$) in the set of analyzed genotypes.
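As a worked illustration of the heterozygosity and PIC calculations, the following minimal Python sketch computes the three quantities for one locus from diploid genotype calls. It is not the Arlequin implementation used in the study. The function names are hypothetical, and the sketch assumes that the n in the sample-size correction n/(n-1) is the number of gene copies sampled (two per diploid plant), which the text does not state explicitly.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """genotypes: list of (allele_a, allele_b) tuples, one per diploid plant."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


def observed_heterozygosity(genotypes):
    """Fraction of heterozygous genotypes over all genotyped plants."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def expected_heterozygosity(genotypes):
    """(n / (n - 1)) * (1 - sum(p_i^2)).

    ASSUMPTION: n is the number of gene copies sampled (2 per diploid plant).
    """
    n = 2 * len(genotypes)
    freqs = allele_frequencies(genotypes).values()
    return (n / (n - 1)) * (1 - sum(p * p for p in freqs))


def pic(genotypes):
    """PIC = 1 - sum(p_i^2) - sum over i < j of 2 * p_i^2 * p_j^2."""
    p = list(allele_frequencies(genotypes).values())
    homozygosity = sum(x * x for x in p)
    cross_term = sum(
        2 * p[i] ** 2 * p[j] ** 2
        for i in range(len(p))
        for j in range(i + 1, len(p))
    )
    return 1 - homozygosity - cross_term


if __name__ == "__main__":
    # Hypothetical allele sizes (bp) for four plants at one locus.
    calls = [(150, 150), (150, 152), (152, 154), (150, 154)]
    print(observed_heterozygosity(calls))   # 0.75
    print(expected_heterozygosity(calls))
    print(pic(calls))                       # 0.5546875
```

For the toy data the allele frequencies are 0.5, 0.25 and 0.25, so the sum of squared frequencies is 0.375 and the PIC cross term is 0.0703125.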
The bi-allelic polymorphic data were also tested for Hardy-Weinberg equilibrium (HWE) using Fisher's exact test and a Markov chain algorithm with a forecasted chain length of 10,000,000 and 100,000 dememorization steps; the linkage disequilibrium (LD) test was performed using 1,000 permutations. For arabicas, the markers that showed invariable presence of 'double alleles' across the tested germplasm were considered as independent amplifications from duplicated loci present in two distinct copies and were excluded from the analysis of the allelic attributes described above. The $H_o$ and $H_e$ estimates and the HWE and LD tests were done using the program Arlequin ver 3.1 [52], and the probability of identity (PI) estimates were calculated using the program Gimlet ver 1.3.2 [54]. Private alleles (PAs) were determined using the software Convert ver. 1.3.1 [55] over all 30 genotypes. The discriminatory power of each microsatellite locus was calculated by estimating sib-based and unbiased corrected PI estimates, and the cumulative power of discrimination was calculated as the product of the PIs of successive informative markers arranged in decreasing order, as described by Waits et al. [56]. Cross-taxa transferability ($T_{mark}$) was calculated as the proportion of primers showing successful amplification vis-à-vis all the tested primers, whereas primer conservance ($C_{taxa}$) was calculated as the proportion of the species displaying successful amplification vis-à-vis all the tested markers.
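The cumulative PI and the transferability measures can be sketched in Python as follows. This is an illustrative sketch and not the Gimlet implementation. The sib-based PI uses the standard single-locus expression, 0.25 + 0.5Σp² + 0.5(Σp²)² - 0.25Σp⁴. The sketch assumes that "decreasing order" refers to decreasing informativeness (lowest PI first), that $T_{mark}$ is computed per taxon across markers, and that $C_{taxa}$ is computed per marker across taxa. All function names and the example markers and taxa are hypothetical.

```python
def pi_sibs(freqs):
    """Sib-based probability of identity for one locus from allele frequencies."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 0.25 + 0.5 * s2 + 0.5 * s2 ** 2 - 0.25 * s4


def cumulative_pi(per_locus_pi):
    """Running product of per-locus PI values.

    ASSUMPTION: loci are ordered from most to least informative (lowest PI first).
    The final product does not depend on the ordering.
    """
    running = 1.0
    products = []
    for value in sorted(per_locus_pi):
        running *= value
        products.append(running)
    return products


def transferability(amplified):
    """amplified: dict marker -> dict taxon -> bool (True = successful amplification).

    Returns (t_mark, c_taxa):
      t_mark[taxon]  = proportion of markers amplifying in that taxon
      c_taxa[marker] = proportion of taxa in which that marker amplifies
    """
    markers = list(amplified)
    taxa = sorted({taxon for row in amplified.values() for taxon in row})
    t_mark = {
        taxon: sum(bool(amplified[m].get(taxon, False)) for m in markers) / len(markers)
        for taxon in taxa
    }
    c_taxa = {
        m: sum(bool(amplified[m].get(taxon, False)) for taxon in taxa) / len(taxa)
        for m in markers
    }
    return t_mark, c_taxa


if __name__ == "__main__":
    print(pi_sibs([0.5, 0.5]))              # 0.59375
    print(cumulative_pi([0.5, 0.1, 0.2]))
    # Hypothetical amplification calls for three markers in two taxa.
    calls = {
        "M1": {"taxonA": True, "taxonB": True},
        "M2": {"taxonA": True, "taxonB": False},
        "M3": {"taxonA": False, "taxonB": False},
    }
    print(transferability(calls))
```

Because the per-locus values are multiplied, even moderately informative loci drive the cumulative PI down quickly as more markers are combined.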