- Research article
- Open Access
Development of new genomic microsatellite markers from robusta coffee (Coffea canephora Pierre ex A. Froehner) showing broad cross-species transferability and utility in genetic studies
BMC Plant Biology volume 8, Article number: 51 (2008)
Species-specific microsatellite markers are desirable for genetic studies and to harness the potential of MAS-based breeding for genetic improvement. Limited availability of such markers for coffee, one of the most important beverage tree crops, warrants newer efforts to develop additional microsatellite markers that can be effectively deployed in genetic analysis and coffee improvement programs. The present study aimed to develop new coffee-specific SSR markers and validate their utility in analysis of genetic diversity, individualization, linkage mapping, and transferability for use in other related taxa.
A small-insert partial genomic library of Coffea canephora was probed for various SSR motifs following the conventional approach of Southern hybridisation. Characterization of repeat-positive clones revealed a very high abundance of DNRs (1/15 Kb) over TNRs (1/406 Kb). The relative frequencies of different DNRs were found as AT >> AG > AC, whereas among TNRs, AGC was the most abundant repeat. The SSR-positive sequences were used to design 58 primer pairs, of which 44 pairs could be validated as single-locus markers using a panel of arabica and robusta genotypes. The analysis revealed an average of 3.3 and 3.78 alleles and 0.49 and 0.62 PIC per marker for the tested arabicas and robustas, respectively. It also revealed a high cumulative discriminatory power (probability of identity, PI) over all the markers using both sib-based (10⁻⁶ and 10⁻¹² for arabicas and robustas, respectively) and unbiased corrected estimates (10⁻²⁰ and 10⁻⁴³ for arabicas and robustas, respectively). The markers were tested for Hardy-Weinberg equilibrium and linkage disequilibrium, and were successfully used to ascertain generic diversity/affinities in the tested germplasm (cultivated as well as species). Nine markers could be mapped on the robusta linkage map. Importantly, the markers showed ~92% transferability across related species/genera of coffee.
The conventional approach of genomic library was successfully employed although with low efficiency to develop a set of 44 new genomic microsatellite markers of coffee. The characterization/validation of new markers demonstrated them to be highly informative, and useful for genetic studies namely, genetic diversity in coffee germplasm, individualization/bar-coding for germplasm protection, linkage mapping, taxonomic studies, and use as conserved orthologous sets across secondary genepool of coffee. Further, the relative frequency and distribution of different SSR motifs in coffee genome indicated coffee genome to be relatively poor in microsatellites compared to other plant species.
The coffee tree, a member of the family Rubiaceae, belongs to the genus Coffea, which comprises > 100 species. Of these, two species, the tetraploid Coffea arabica L. (i.e. arabica coffee; 2n = 4x = 44) and the diploid C. canephora Pierre ex A. Froehner (i.e. robusta coffee; 2n = 2x = 22), are cultivated commercially. Coffee, one of the most popular non-alcoholic beverages, is consumed regularly by 40% of the world population, mostly in the developed world, and thus occupies a strategic position in the world socio-economy.
Efforts undertaken globally to improve coffee, though successful, have proven to be too slow and severely constrained owing to various factors. The latter includes: genetic and physiological makeup (low genetic diversity and ploidy barrier in arabicas, and self incompatibility/easy cross-species fertilization in robustas), long generation cycle, requirement of huge land resources, and equally the dearth of easily accessible and assayable genetic tools/techniques for screening/selection. The situation warrants recourse to newer, easy, practical technologies that can provide acceleration, reliability and directionality to the breeding efforts, and allow characterization of cultivated/secondary genepool for proper utilization of the available germplasm in genetic improvement programs. In this context, development of DNA marker tools and availability of markers-based molecular linkage maps becomes imperative for MAS-based accelerated breeding of improved coffee genotypes.
Among the different types of DNA markers, the Short Sequence Repeats (SSR) based microsatellite markers promise to be ideal due to their multi-allelic nature, high polymorphism content, locus specificity, reproducibility, inter-lab transferability and ease of automation. Microsatellite markers have been developed for a large number of plant species and are increasingly being used for ascertaining germplasm diversity, linkage analysis and molecular breeding. Despite these advantages, only ~180 microsatellite markers have been reported to date for coffee [4–12], signifying the need for expanding the repertoire of these genetically highly informative markers for efficient management and improvement of coffee germplasm resources. Here we report a set of 44 novel microsatellite markers developed by radioactive screening of a small-insert partial genomic library of C. canephora (robusta coffee). Interestingly, all these markers exhibit broad cross-species transferability. We also demonstrate their utility as genetic markers for ascertaining the germplasm diversity, genotype individualization, linkage mapping and taxonomic affinities.
The present study aimed to isolate new coffee-specific informative SSRs useful as genetic markers for characterizing the coffee genome and for linkage mapping studies. For this purpose, a partial small-insert genomic library was constructed from a commercially cultivated robusta variety 'Sln-274'. The library was screened using radioactive SSR oligo probes to isolate SSR-containing DNA fragments, which were sequenced and used for designing primer pairs from the flanking regions and subsequent conversion to PCR-based SSR markers. The designed primer pairs were standardized for PCR amplification, and then validated for utility as genetic markers using panels of elite coffee genotypes, a mapping population for linkage studies, and related taxa of coffee for cross-species transferability. In addition, sequence data of the screened and putative SSR-positive selected clones were used to assess the relative abundance of different SSR motifs in the robusta coffee genome. In total, 44 new highly informative SSR markers were developed.
Screening/Identification of SSR positive genomic sequences from the small insert partial genomic library of Sln-274
The small-insert partial genomic library constructed from robusta variety Sln-274 comprised 15,744 clones. Radioactive screening of the arrayed and blotted clones indicated 446 putative positives, of which good quality sequence data could be obtained for 199 clones. The average insert size of the sequenced clones was 773.5 bp. Considering the latter, and that the sequenced clones represented a random sample of the genomic library with respect to size, the total size of the cloned genome amounted to 12.2 Mb, which equaled ca. 1.5% of the robusta coffee genome (Table 1). SSR search of the clone sequences using the MISA search module detected 76 genuine SSR-positive clones (0.48% of the total library) containing both targeted and non-targeted SSR motifs. Overall, these clones contained 92 SSRs comprising DNRs (48.3%), TNRs (25.9%), and HO-NRs (4.8%), and 24 SSRs comprising only MNRs (20.7%) (Table 1, 2). Among the targeted repeat motifs (screened SSR oligonucleotides), AG was the most abundant repeat (26.7%), followed by AC (12.9%) and AGC (7.8%), whereas CCG (0.9%) was the least abundant and ACT was not detected at all (Table 2). Similarly, among the non-targeted SSR motifs other than MNRs, AT was the most abundant repeat (8.6%, Table 2).
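The library-size and hit-rate figures quoted above can be rechecked with simple arithmetic. The following is a minimal sketch using only the numbers stated in the text; the function and variable names are illustrative and not part of the original analysis.

```python
# Figures quoted in the text above.
N_CLONES = 15_744          # clones in the partial genomic library
MEAN_INSERT_BP = 773.5     # average insert size of the sequenced clones (bp)
SSR_POSITIVE_CLONES = 76   # clones with genuine SSRs according to MISA


def cloned_genome_mb(n_clones, mean_insert_bp):
    """Total cloned DNA in Mb, assuming every clone carries an average insert."""
    return n_clones * mean_insert_bp / 1e6


def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


library_mb = cloned_genome_mb(N_CLONES, MEAN_INSERT_BP)
positive_pct = percent(SSR_POSITIVE_CLONES, N_CLONES)

print(f"Cloned genome: {library_mb:.1f} Mb")
print(f"SSR-positive clones: {positive_pct:.2f}% of the library")
```

Both values agree with the reported 12.2 Mb and 0.48%.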
Frequency and distribution of SSRs in coffee genome
A total of 76 targeted SSRs (DNRs and TNRs) and 10 non-targeted DNRs were assessed for their lengths, distribution in the present library, and their relative abundance in the robusta genome (Table 2). Average length (in terms of repeat units) for the DNRs and TNRs was 9.6 and 5.9, respectively. Among DNRs, AT and AG were comparable and longer than AC, whereas ACG and AGC were the longest of the TNRs (Table 2). The size of the cloned/screened genomic library and the observed data for identified SSRs were considered along with the earlier predicted size of the robusta genome to derive relative estimates for frequency/distribution of different SSR motifs in the robusta genome. The analysis revealed the coffee genome to be enriched in AT type DNRs (AT-DNR), which were estimated to be many fold more abundant than any other SSR motif (targeted and/or non-targeted). The results indicated one AT-DNR per 16 Kb (1/16 Kb) of robusta genome; this was almost 20-fold higher than the next most abundant DNR, i.e. AG (ca. 1/393 Kb). The DNRs as a single class were estimated at 1/15 Kb of genome when AT (comprising 94% of the total DNRs) was included, and at 1/265 Kb of genome for the remaining ones. In comparison, the overall frequency of TNRs was calculated to be 1/406 Kb, with AGC being the most predominant (ca. 1/1300 Kb) and CCG the least (ca. 1/12200 Kb). In addition, a few other higher order SSRs (mainly AT-rich) were also detected, but these were not used for estimate calculations as their numbers were very low. Thus, the present study indicated an abundance of one SSR (either DNR or TNR) per 15 Kb of robusta coffee genome, wherein the DNRs were ~27 times more abundant than the TNRs.
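The frequency estimates for the targeted motifs are consistent with dividing the size of the screened library by the number of SSRs detected for each motif. A minimal sketch of that arithmetic follows. The per-motif counts are not quoted in the text and are an assumption, back-calculated here from the reported percentages of 116 SSRs (e.g. 26.7% of 116 ≈ 31 AG repeats); the function name is illustrative. The estimate for the non-targeted AT repeats appears to rest on a different denominator and is not reproduced here.

```python
# Screened library size in Kb, from 15,744 clones x 773.5 bp average insert.
LIBRARY_KB = 15_744 * 773.5 / 1_000

# ASSUMPTION: counts back-calculated from the reported percentages of 116 SSRs.
ASSUMED_COUNTS = {"AG": 31, "AC": 15, "AGC": 9, "CCG": 1}


def kb_per_ssr(count, library_kb=LIBRARY_KB):
    """Average Kb of screened DNA per detected SSR of a given motif."""
    if count <= 0:
        raise ValueError("motif was not detected; frequency is undefined")
    return library_kb / count


for motif, count in ASSUMED_COUNTS.items():
    print(f"{motif}: about 1 per {kb_per_ssr(count):.0f} Kb")

# Relative abundance of DNRs over TNRs from the reported class frequencies
# (1/15 Kb for DNRs versus 1/406 Kb for TNRs).
dnr_to_tnr = 406 / 15
print(f"DNR:TNR abundance ratio is about {dnr_to_tnr:.0f}")
```

Under these assumed counts the sketch gives roughly 1/393 Kb for AG, in line with the reported value, and values close to the rounded figures quoted for AGC and CCG.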
Development of microsatellite markers
All the identified SSR-positive sequences were assessed for primer-pair design and conversion to microsatellite markers, using 'SSR motif length' (of ≥ 7 and 5 repeats for DNRs and higher order SSRs, respectively) as one major criterion. As a result, only 56 of the total 92 identified SSRs (all except MNRs) were found suitable for primer design, indicating 60.9% primer suitability. These comprised 42.2% DNRs, 40.7% compound SSRs, 6.8% TNRs, 5.1% TtNRs and 1.7% HNRs. In addition, primers were also designed for 2 of the randomly chosen 14 MNRs to test their potential for conversion to SSR markers. Among the SSRs found unsuitable for primer design, 70.6% had shorter motif length and 29.4% had flanking regions unsuitable for primer modeling. Of the 58 potential primer pairs designed, 52 could be successfully amplified and 44 of these could further be validated (Table 3, 4) as useful markers, indicating a ~76% primer-to-marker conversion ratio.
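The success rates at each step of marker development are simple proportions of the counts given above. The sketch below recomputes them; the helper name `pct` is illustrative.

```python
def pct(part, whole):
    """Percentage of whole represented by part, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


suitable = pct(56, 92)    # identified SSRs suitable for primer design
amplified = pct(52, 58)   # designed primer pairs that amplified
validated = pct(44, 58)   # designed primer pairs validated as markers

print(f"Primer suitability: {suitable}%")
print(f"Amplification success: {amplified}%")
print(f"Primer-to-marker conversion: {validated}%")
```

The first and last values correspond to the 60.9% and ~76% reported in the text; the amplification rate is not quoted in the text and follows only from the stated counts.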
Validation of microsatellite markers for use in genetic studies
Allelic diversity, heterozygosity status and extent of polymorphism
For ascertaining the useful attributes of genetic markers, all 44 new microsatellite markers were tested on a panel of 16 elite robusta and arabica genotypes. Good allelic amplification was obtained for all the markers across the tested genotypes, except for CaM54, which did not give any amplification for the arabicas. In general, the new markers revealed low to medium allelic diversity, and notably 13 of them (CaM02, 06, 15, 18, 21, 31, 34, 35, 39, 43, 55, 57, 58) resulted in double alleles in all the tested arabicas. Overall, a maximum of six and seven alleles (NA) with an average of 2.7 and 3.8 alleles/marker were obtained for the tested markers, of which 83.7% and 90.9% were polymorphic/informative for arabica and robusta genotypes, respectively (Table 4). Seven markers (CaM08, 09, 11, 12, 22, 23, 53) in the case of arabicas and four (CaM11, 13, 15, 23) for robustas were found to be monomorphic. The distribution of the number of alleles amplified by each polymorphic marker (Pm) was highly skewed for arabica genotypes (Kurtosis: 1.19 and Skewness: 1.22) in comparison with robustas (Kurtosis: -1.08 and Skewness: -0.57), as seen in Figure 1a.
The PIC values varied considerably for the new markers across the tested genotypes. The mean PIC value for arabicas was 0.49 (range 0.12 – 0.81), which was significantly less than the 0.62 (0.23 – 0.83) observed for robustas (Table 4, Figure 1b). Further, Student's t-test revealed highly significant differences in the total number of amplified alleles (NA) and PIC value estimates for arabica and robusta genotypes (NA: t = 3.18, P = 0.00, and PIC: t = 3.46, P = 0.00) for the amplified and comparable markers.
The above SSR allelic data, when used to calculate the heterozygosity estimates, revealed highly significant differences between the observed and expected heterozygosity both for arabicas (mean Ho: 0.29 and mean He: 0.50; paired t value = 3.64; P = 0.00) and for robustas (mean Ho: 0.52 and mean He: 0.63; paired t value = -2.54; P = 0.01). The results thus suggested significant heterozygote deficiency in both germplasm sets. Further, only 15 of the 23 Pms (62.5%) were found to be in HW equilibrium in the case of arabicas, while the remaining eight showed significant heterozygote deficiency (Table 4), corroborating the heterozygosity data. Similarly, in robustas, 28 (65.2%) of the 41 Pms were found to be in HW equilibrium and of the remaining 14 Pms, eight markers showed significant heterozygote deficiency while six markers showed heterozygote excess.
The LD test performed for all the Pms showed 29.8% (82 of 275) and 25.0% (202 of 780) of pair-wise comparisons in significant disequilibrium (P < 0.05) for arabicas and robustas, respectively. On average, each Pm was found to be in disequilibrium with 3.4 (SD: ± 2.4, SE: ± 0.51) other Pms in the case of arabicas and 4.9 (SD: ± 4.0, SE: ± 0.63) for robustas. The maximum LD was observed for the marker CaM24 (with six other markers) in arabicas and CaM26 (with eight other markers) in robustas.
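For readers unfamiliar with the PIC statistic, the sketch below computes it from allele frequencies at one locus using the widely used formula of Botstein et al. (1980), PIC = 1 − Σp_i² − Σ_{i<j} 2p_i²p_j². This is an assumption about the estimator: the text does not state which formula or software was used, and the function name and example frequencies are purely illustrative.

```python
from itertools import combinations


def pic(freqs):
    """Polymorphism information content for one locus (Botstein et al. 1980)."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Probability that two randomly drawn alleles are identical.
    homozygosity = sum(p * p for p in freqs)
    # Correction for uninformative matings between identical heterozygotes.
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross


# Illustrative loci with equally frequent alleles (not data from the study).
print(pic([0.5, 0.5]))                # two alleles
print(pic([0.25, 0.25, 0.25, 0.25]))  # four alleles
```

For equally frequent alleles the statistic rises with allele number (0.375 for two alleles, about 0.70 for four), which is why a marker set with more alleles per locus tends to show a higher mean PIC.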
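The comparison of observed and expected heterozygosity can be illustrated as follows. This is a minimal sketch assuming diploid genotype calls and the simple gene-diversity estimate He = 1 − Σp_i², without any small-sample correction; the estimator actually used in the study is not stated in the text, and the genotypes below are invented for illustration.

```python
from collections import Counter


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles (Ho)."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def expected_heterozygosity(genotypes):
    """Gene diversity He = 1 - sum(p_i^2) from sample allele frequencies."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical allele sizes (bp) for four individuals at one locus.
example = [(150, 150), (150, 154), (154, 154), (150, 150)]
ho = observed_heterozygosity(example)
he = expected_heterozygosity(example)
print(f"Ho = {ho:.3f}, He = {he:.3f}")
```

In this toy example Ho (0.250) falls below He (0.469), the same direction of difference that is interpreted above as heterozygote deficiency.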
Discriminatory power (individualization capacity) of novel SSR markers
The discriminatory power of all the new informative SSR markers for possible genotype individualization was inferred by calculating two types of 'probability of identity' (PI) estimates, i.e. sib-based and unbiased, considering the tested germplasm as related or unrelated, respectively. The PI estimates obtained (Table 5) show that the sib-based PI values for individual markers were around 10⁻¹ for both the arabicas and robustas, whereas the unbiased PI estimates ranged from 10⁻¹ to 10⁻⁴ for arabicas and from 10⁻¹ to 10⁻³ for robustas. In comparison, the cumulative PIs indicating discriminatory power of the new markers were found to be manifold higher for the tested robusta genepool compared to arabicas. The sib-based cumulative PIs calculated over 10, 20 and the total number of most informative markers (23 in the case of arabicas and 40 in the case of robustas) were: 4.28 × 10⁻⁴, 8.39 × 10⁻⁶, 5.29 × 10⁻⁶ for arabicas, and 5.1 × 10⁻⁵, 1.81 × 10⁻⁸, 1.22 × 10⁻¹² for robustas. Similarly, comparable unbiased cumulative PI estimates were: 2.14 × 10⁻¹⁵, 4.59 × 10⁻²⁰, 1.09 × 10⁻²⁰ for arabicas, and 2.68 × 10⁻²⁰, 4.54 × 10⁻³², 2.05 × 10⁻⁴³ for robustas.
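The way per-locus PI values combine into a cumulative estimate can be sketched as follows. The code uses the standard textbook expressions for the probability of identity among unrelated individuals and among full sibs, and multiplies per-locus values assuming independent loci. These are assumptions: the sample-size-corrected ("unbiased") estimator reported above is not reproduced, the function names are illustrative, and the allele frequencies are invented.

```python
from itertools import combinations
from math import prod


def pi_unrelated(freqs):
    """PI for two unrelated individuals: sum(p^4) + sum over i<j of (2*p_i*p_j)^2."""
    same_homozygote = sum(p ** 4 for p in freqs)
    same_heterozygote = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return same_homozygote + same_heterozygote


def pi_sibs(freqs):
    """PI for full sibs: 0.25 + 0.5*S2 + 0.5*S2^2 - 0.25*S4, with Sk = sum(p^k)."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 0.25 + 0.5 * s2 + 0.5 * s2 ** 2 - 0.25 * s4


def cumulative_pi(per_locus_pi):
    """Multi-locus PI as the product of per-locus values (independent loci)."""
    return prod(per_locus_pi)


# Hypothetical allele frequencies at three loci (not data from the study).
loci = [[0.5, 0.5], [0.5, 0.3, 0.2], [0.4, 0.3, 0.2, 0.1]]
print(cumulative_pi(pi_unrelated(f) for f in loci))
print(cumulative_pi(pi_sibs(f) for f in loci))
```

Because the sib-based value at each locus cannot fall below 0.25, the sib-based cumulative PI shrinks far more slowly with added markers than the unrelated-individual estimate, which matches the gap of many orders of magnitude between the two sets of cumulative values reported above.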
Mappability of novel SSR markers
The new SSR markers were tested for their mappability on the robusta linkage map. In total, 9 of the 44 new markers (20.5%) were found to be polymorphic for the parents of the robusta pseudo-testcross mapping population, i.e. CXR and Kagganahalla. The nine markers (CaM03, 16, 20, 22, 32, 35, 42, 44 and 46) could be mapped on the robusta linkage map developed by us. Notably, seven of the markers (except CaM16 and CaM46) were mapped on independent LGs, which indicated the new markers to be randomly distributed on the robusta genome (Figure 2, Table 3).
Cross-species/-genera transferability and primer conservance
Cross species transferability of the new robusta derived SSR-markers was tested for 13 related Coffea and two Psilanthus species. In general, the markers resulted in robust cross-species amplifications with alleles of comparable sizes in the tested taxa (Table 4). Overall, an average transferability of ~92% was observed (Table 6, 7), which was higher for Coffea spp. (> 93%) than for the related Psilanthus spp. (~82%). Moreover, within different Coffea taxa, across its different botanical subsections, the transferability was comparable (> 91%). The data thus, indicated a very high marker conservance across the related coffee species, which was calculated to be ~91% over all the tested markers. Marker CaM54 exhibited lowest conservance of 23% (for Coffea species) and 27% (over all taxa), whereas 24 markers were found to be 100% conserved. The data also revealed the presence of some private alleles (PAs), which possibly could be species-specific. In total, 104 such alleles were found in Coffea (with a mean number of 8.7 PAs/species) and 35 in Psilanthus species (17.5 PAs/species), over all the 44 markers. These accounted for ~34% of amplified alleles in Coffea spp. and 45% of those amplified in Psilanthus spp.
Generic affinities within/between cultivated and wild coffee germplasm
The diploid microsatellite data were examined for their potential in genetic diversity studies by studying the variation and interrelationship between the cultivated as well as wild genepool. The average genetic distance values (calculated using the SSR allelic data) were found to be 0.26 (SD: ± 0.06; SE: ± 0.01), 0.43 (SD: ± 0.06; SE: ± 0.01) and 0.51 (SD: ± 0.17; SE: ± 0.02) for the tested arabicas, robustas and over both the sets, respectively. Similar estimates calculated for different Coffea and Psilanthus species were: 0.57 (SD: ± 0.12; SE: ± 0.04) for Erythrocoffea (diploid + tetraploid), 0.54 (SD: ± 0.07; SE: ± 0.05) for Erythrocoffea (diploids), 0.58 (SD: ± 0.05; SE: ± 0.02) for Mozambicoffea, 0.63 (SD: ± 0.09; SE: ± 0.02) for Pachycoffea, 0.65 (only two species, thus no SD) for Paracoffea, and 0.72 (SD: ± 0.10; SE: ± 0.01) over all the compared species.
The NJ phenetic tree generated using the genetic distance estimates for eight genotypes each from arabica and robusta clearly resolved the tested germplasm into two distinct clusters, one representing all the tetraploid arabicas, while the other comprised all the diploid robusta genotypes (Figure 3), with significant branch support. The selections from pure arabicas formed a single cluster within arabicas, whereas selections from hybrids formed a different group. HdeT was found closest to S2790 and S2792, whereas Sln11 was found to be the most distant entry in arabicas. Similarly, a clustering analysis of 14 related species (12 Coffea and two Psilanthus spp.; Figure 4) along with two genotypes each from C. arabica and C. canephora formed coherent clusters of diploid Erythrocoffeas (C. canephora, C. congensis), tetraploid Erythrocoffea (C. arabica), Mozambicoffea (C. racemosa, C. eugenioides, C. salvatrix, C. kapakata), and Pachycoffea (C. liberica, C. dewevrei, C. abeokutae as one cluster and C. excelsa, C. arnoldiana, C. aruwemiensis as the other cluster). A single entry for Melanocoffea, represented by C. stenophylla, was the most divergent among the Coffea species and showed proximity to entries from the Paracoffea section (Psilanthus spp.).
Distribution and abundance of detected SSR motifs
The coffee-specific SSR markers described in this study were developed using the conventional approach of construction/screening of a partial small-insert genomic library. The success rate of any microsatellite development effort is indicated by the proportion of SSR-containing clones in the library, followed by the number of detected SSRs, the quality of SSR motifs and also the quality of flanking regions. In the present study, 76 good quality SSR-positive clones containing a total of 116 SSRs were obtained, from which 44 SSR markers were developed (Table 1, 3). The results thus suggested a success rate of 0.48% in the identification of potential target SSR-positive clones, and 0.28% in overall marker development. In a representative study to assess the success of the conventional library screening approach for microsatellite marker development in 16 different plant genera, it was found that the proportion of SSR-positive clones varied significantly (0.059% to 5.8% with an average of 2.5%) from species to species. The observed SSR detection efficiency of the approach in this study was comparable with earlier reports in Acacia (0.32%) and peanut (0.43%), but was higher than in rice (0.22%), potato (0.06 to 0.15%) and wheat (0.11%), and less than in white spruce (0.62%).
The estimates derived from this study revealed that the robusta coffee genome is relatively poor in overall SSR abundance (1/160 Kb for targeted SSRs, and 1/15 Kb including the non-targeted SSRs; Table 2) compared to various other plant species such as Arabidopsis, rice, barley (1 every 6–8 Kb) and mulberry (our unpublished data). Nevertheless, the relative frequency, repeat lengths, and distribution pattern of different types of genomic SSRs in the coffee genome (Table 2) were comparable to those reported in a number of plant species like apple, avocado, birch, peach, Acacia and tomato. Specifically, AG was detected in higher proportion (almost 2 times) than AC; AG repeat cores were, in general, found to be longer than any other SSR type. Repeat cores of TNRs were, in general, smaller than DNRs, and AT (the non-targeted SSR) was found to be the most abundant in comparison to any other DNR or TNR. In comparison, the AT-rich TNRs in the coffee genome were found to be relatively less abundant than seen in most plant species [16, 27, 28], but comparable to some of the tree species like avocado (ACC > AGG > AAG) and peach (abundant in AGG). A species-specific pattern of TNR abundance has also been demonstrated in closely related species like rice and wheat that belong to the same family but differ significantly in their genomic TNR content [29–31]. Some of the variation seen in the SSR estimates (relative frequency, distribution and abundance) across different studies, including the present one on coffee, can be ascribed to differences in the criteria used for SSR search, viz. minimum length of repeat core, the size of the genomic library screened, screening stringency, oligos used for screening and SSR mining tools, notwithstanding the innate differences in genomic organization of SSRs in different species.
A comparison of the relative abundance/distribution of genomic SSRs with that of genic SSRs developed from the coffee transcriptome earlier by us revealed two striking differences, viz. an apparent higher abundance of SSRs in the transcriptome (1/2.16 Kb) and a near reverse pattern of TNR abundance/relative distribution in the two types of SSRs. Importantly, the two most abundant TNRs (AAG, ACT) in the genic SSRs were least abundant or not detected in the genomic SSRs. The observation would suggest interesting possibilities of differential distribution/organization of TNRs, as well as of restriction sites for the enzymes used for library construction, across gene-rich and gene-deficient regions of the coffee genome. However, such possibilities can only be addressed by further detailed genomic studies.
Development of new SSR markers
In coffee, to the best of our knowledge, only ca. 180 genomic SSRs have been described in the literature to date [4–11], warranting continuous efforts to develop additional new markers to expand the existing repertoire for their efficient deployment in genetic studies in coffee. In this study 63% of the detected SSRs were found useful for primer design/marker conversion, a much higher success rate compared to that reported for apple (30%), cassava (37.7%), Elymus caninus (11.1%), oat (25.2%) and potato (26.9%). The two main sequence attributes that rendered 36 identified SSRs unsuitable for primer design were found to be a shorter repeat core, and a low-complexity flanking region (AT/GC-rich and/or regions prone to secondary structure formation) unsuitable for primer modeling. Interestingly, in the present study, not even a single failure was due to the location of the SSR core towards the end of the clone sequence, which is reported to be one major limiting factor in many earlier studies in cassava, tomato, oat and fir [26, 32, 34, 35]. The higher success rate and fewer limiting factors in primer design observed in this study are expected to be due to the better suitability of the restriction enzymes, as well as the relatively longer genomic fragments (0.5 to 1.5 Kb) used for the genomic library construction. The importance of the size of the genomic fragments used for construction of a genomic library/SSR marker development has also been shown earlier in groundnut.
The proportion of designed primers successfully producing amplification products gives a primer-to-marker conversion ratio and indicates the ultimate success of the library construction effort. In this study, of the 58 primer pairs designed, 44 could be validated as efficient SSR markers (see Tables 3, 4, and the discussion in the following sections), thus resulting in a ~75.8% primer-to-marker conversion ratio, broadly comparable to many earlier conventional genomic library-based studies, viz. cucurbits, Elymus, peanut, tomato and rice. One of the lowest primer-to-marker convertibility values, reported for Douglas fir (4.1%), was suggested to be due to the complexity unique to the conifer genomes [35, 37–39]. Further, a survey of the literature suggests, in general, higher conversion ratios for small genomes like apple and peach, and a negative correlation between genome size and the amplification efficiency of SSR primers due to mechanistic reasons.
Two of the 44 new SSR markers described here (CaM49, 55) were based on MNR repeats. In general, these markers warranted much more critical appraisal for ascertaining their individual alleles/sizing that in many cases were not easily distinguishable from the similar sized confounding stutter amplicons (data not shown). Therefore, it may be prudent to avoid use of such MNR-based markers despite these being informative, unless no other markers are available.
Utility of new SSRs as genetic markers
To date, there are a few studies describing the development of coffee-specific SSR markers [4–11]; however, only a few of these provide data on the utility of new SSRs in genetic studies [8, 11]. Therefore, one major aim of the present study was to test the potential of the new markers reported here for their use in studies related to genetic diversity in cultivated coffee germplasm, linkage mapping, constructing reference panels/bar codes for individualization of genotypes, cross-species transferability, and taxonomic relationship in related taxa.
Level of allelic polymorphism and genetic diversity
Various genetic parameters, viz. allelic diversity, PIC, Ho, He, Kurtosis/skewness, HWE and LD, calculated for all the new SSRs amply demonstrated their utility as genetic markers (see results, Table 4). In general, the markers revealed low to moderate allelic/genetic diversity, which was comparable to and in some cases more than that reported for the earlier described coffee genomic SSRs [6, 8], and as expected, invariably higher than the genic SSRs [10, 11, 41]. The total number of alleles amplified by different markers in the tested arabicas and robustas was almost similar; however, the markers were found significantly more informative, with higher PIC values, for robustas. In addition, it was important to note that 13 of the tested markers amplified two distinct but similar sized alleles across all the tested arabicas, suggesting these to be the result of duplicated fixed loci in the arabica genome. The above observations are plausible considering the reproductive behavior, genome evolution and domestication process of the two types of coffee. The robustas are expected to be genetically more diverse (leading to higher PIC for tested markers) due to their out-crossing behavior, in contrast to arabicas, which are self-compatible and also known to suffer from a narrow genetic base resulting from the genetic bottleneck during the domestication process [8, 11]. Similarly, the duplicate loci in the arabica genome are plausible as it is an allotetraploid resulting from hybridization of two homeologous diploid genomes (C. eugenioides and C. canephora) followed by diploidization and stabilization.
Different genetic parameters/tests such as Ho, He, LD and HWE are important indicators of the origin, evolution and distribution of diversity in the available genepool. The heterozygosity measures (Ho, He) for the new SSR markers indicated significant heterozygote decay (deficiency) in the tested germplasm. Kurtosis/skewness parameters indicated that the allelic diversity for the new SSRs does not follow a normal distribution. Similarly, the HWE and LD analysis of the polymorphic markers (Pms) revealed only about two-thirds of the markers (63–65%) in HW equilibrium and about 25–29% of markers showing significant LD in the analyzed arabicas and robustas. These results are in agreement with our earlier observations with genomic as well as genic SSRs [6, 10, 11], and are indeed reflective of the genetic composition and mating behavior of the tested materials. Overall, these studies indicated that the tested robusta germplasm comprised allogamous, relatively unrelated genotypes (selections and one hybrid), while the autogamous arabicas comprised mostly hybrid varieties/selections with overlapping/shared pedigrees. The results thus suggest the suitability of the new markers for reliably ascertaining genetic diversity in the coffee genepool.
Discriminatory power of new SSR markers
Individualization of plant germplasm resources has become important in the present day scenario for their proper management and utilization, as well as for IPR protection, which can be achieved by DNA typing techniques involving the use of highly polymorphic markers like SSRs. Germplasm characterization using such typing approaches remains a costly proposition, especially for a target species like coffee that has very limited diversity in its available genepool. To circumvent these problems and increase the utility of such efforts, it has been proposed to build reference DNA polymorphism data resources/panels for coffee germplasm using robust markers like SSRs and common experimental guidelines [12, 43]. Such a reference resource can then readily be used for coffee genotype individualization, germplasm selection for breeding/improvement, and germplasm exchange in international collaborations [12, 43]. In this context, it becomes important to ascertain the PI estimates (which provide very informative indicators of the discrimination potential) of the SSR markers before their deployment in germplasm characterization studies. In general, the PI estimates for the new markers ranged from low to moderate when considered individually, but were highly informative for genotype discrimination when tested together (cumulative PI).
Moreover, the estimates were found to be reflective of the diversity status in the test germplasm, and accordingly were significantly different (lower) for arabicas than for robustas (Table 5). The analysis in general indicated the need for 3–4 times more markers to achieve a comparable level of discrimination in the two coffee genepools. Moreover, the data suggested that from a practical point of view it might be prudent to calculate the sib-based PI (a more conservative estimate of discrimination) for deciding the number of markers that can provide sufficient variability for individualization of the test germplasm. This is expected, as the sib-based PI discounts the possible similarities/relatedness in the target germplasm arising from overlapping pedigrees/common parentage.
Mappability of the new SSR markers
One of the major potential utilities of DNA markers is their use as robust genomic landmarks on the linkage groups that can subsequently be tagged to the gene(s) controlling important traits of interest, providing possibilities for MAS-based breeding. This requires generation of reasonably dense linkage maps populated with a large number of revisitable DNA markers, for which the SSRs remain the most desired ones. To date, very few SSRs have been mapped on the robusta linkage map [7, 44], warranting extensive efforts to generate more SSR markers usable for linkage analysis. In this regard, we tested the suitability of the new markers for linkage mapping using a pseudo-testcross mapping population of robusta coffee. Significantly, 20.5% of the markers were found to be polymorphic for the parents of the mapping population, and all of these could be successfully mapped (Figure 2). The mapped markers were distributed on different linkage groups, and some of these mapped towards the ends of the LGs, as has been seen in earlier studies. The data thus strongly demonstrate that the new markers can be efficiently used for genetic linkage studies in coffee.
The low-moderate level of diversity exhibited by the new markers in the cultivated coffee genepool, is more than compensated by their high potential for cross-species transferability. All the markers revealed robust cross-species/-generic amplifications with alleles of comparable sizes when tested for 13 Coffea and two Psilanthus taxa (Table 7). The data revealed that the markers described here show much better taxa transferability than the earlier published genomic SSR markers [6, 9, 10], but relatively less than the genic SSR markers reported by us [10, 11]. More importantly, the markers showed comparable transferability across related species of Coffea as well as 2 species of the related genus Psilanthus. This is significant as successful cross-species amplification is generally restricted to related species within a genus and reduces when tested for different genera [11, 45]. Further, it was interesting to note that all the new SSRs that were monomorphic/uninformative for the tested arabica and robusta germplasm, exhibited considerable polymorphism across the tested related taxa. The only exception was the marker CaM54 that showed a very low conservance even across the Coffea spp. Thus, the new SSR markers described here strengthen the possibility of their use as Conserved Orthologous Sets (COS) for genetic characterization of different related wild coffee taxa, and also for coffee taxonomic/synteny studies.
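A transferability summary of this kind can be tabulated from a simple presence/absence amplification matrix. The sketch below is only illustrative: it assumes that transferability is scored per taxon as the fraction of tested primers that amplify, and that conservance is scored per marker as the fraction of tested taxa in which it amplifies. The marker and taxon names are hypothetical placeholders, not data from Table 7.

```python
def transferability(amplified):
    """Summarise a marker x taxon amplification matrix.

    amplified: dict mapping marker name -> dict mapping taxon name -> bool
    Returns (t_mark, c_taxa):
      t_mark[taxon]  = fraction of tested markers that amplified in that taxon
      c_taxa[marker] = fraction of tested taxa in which that marker amplified
    """
    markers = list(amplified)
    taxa = sorted({taxon for row in amplified.values() for taxon in row})
    t_mark = {
        taxon: sum(amplified[m].get(taxon, False) for m in markers) / len(markers)
        for taxon in taxa
    }
    c_taxa = {
        m: sum(amplified[m].get(taxon, False) for taxon in taxa) / len(taxa)
        for m in markers
    }
    return t_mark, c_taxa


# Hypothetical example: two markers scored on two taxa.
example = {
    "marker_1": {"taxon_A": True, "taxon_B": True},
    "marker_2": {"taxon_A": True, "taxon_B": False},
}
t_mark, c_taxa = transferability(example)
print(t_mark)  # per-taxon transferability
print(c_taxa)  # per-marker conservance
```

Averaging the per-taxon values gives one overall transferability figure for a marker set.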
Diversity analysis and genetic relatedness within/between Coffea and Psilanthus species
The genomic SSRs described in this study, despite revealing a low level of polymorphism, were able to group all 16 genotypes of the two cultivated species into phenetic clusters that reflected species relationships and confirmed their known pedigrees (Figure 3). For example, the analysis confirmed the related origin of S2790 and S2792, which are two-way hybrids between HdeT and Taferikela.
Similarly, the analysis of 20 representative samples belonging to 14 Coffea and two Psilanthus species revealed generic affinities that were in general agreement with their known taxonomic relationships, based on their geographical distribution as well as Chevalier's botanical classification (Figure 4). Accordingly, the phenetic tree based on the new marker data clearly grouped the analyzed related coffee species as per their respective botanical sub-sections (see results). Importantly, the analysis distinctly separated the two Paracoffea species (P. bengalensis and P. wightiana) from all the other Coffea spp. These results are similar to those of earlier published studies undertaken to ascertain species relationships using SSRs [8, 9, 11], as well as other marker approaches [47–49]. A close relationship of C. kapakata to the Mozambicoffea taxa, and the status of the only Melanocoffea taxon, C. stenophylla, as seen here, were also indicated earlier in the EST-SSR and ISSR-based studies [11, 47]. These results thus demonstrate that the new SSR markers developed in the present study can be highly informative in exploring the taxonomic relationships of the coffee species complex.
In summary, the present study describes 44 new microsatellite markers developed using the conventional approach of construction/screening of a partial small-insert genomic library. The approach was found to be successful but difficult and experiment-intensive, with a low success rate of ~0.48%. Analysis of the identified SSR-positive genomic clones provided insights into the relative abundance and distribution pattern of different SSR motifs in the coffee genome, which was found to be relatively poor in its SSR abundance compared to many other plant genomes. Overall, the DNRs were much more abundant than TNRs, and among different types of SSR motifs, AT was the most abundant followed by AG, AC, and ACG. The TNR CCG was the least abundant. More than 50% of the identified SSRs could be converted to usable markers, resulting in a high primer-to-marker conversion ratio. All the 44 markers were found to be polymorphic in the tested coffee/related germplasm, and their utility as efficient genetic markers could be demonstrated for diversity analysis, germplasm individualization, linkage mapping, cross-species transferability and taxonomic studies. This study has thus enriched the available small repertoire of coffee SSR markers by 44 new SSRs, which are not only useful for cultivated coffee but are also expected to be equally useful for genetic studies involving related species that constitute the important secondary genepool for improvement of coffee.
Plant material and DNA extraction
In this study, sixteen elite genotypes belonging to C. arabica and C. canephora were used along with 14 related wild species belonging to the Coffea and Psilanthus genera (Table 7). The leaf samples for each of them were collected from the germplasm bank maintained at the Central Coffee Research Institute, Balehonnur, Karnataka, India, and DNA was isolated following the method described by Aggarwal et al.
Construction of genomic library and isolation of SSR containing sequences
A partial small-insert genomic library was constructed using standard procedures from total cellular DNA isolated from an elite robusta genotype, Sln-274. Approximately 10 μg of genomic DNA was digested with Rsa I and Hae III restriction endonucleases (NEB, USA) and fractionated in a 1% agarose gel. Genomic fragments of 500 to 1500 bp were gel-excised, purified using the GFX column (Amersham), and ligated to the pMOS Blunt-ended plasmid vector (Amersham) using T4 DNA ligase; finally, the ligated genomic inserts were cloned in Escherichia coli DH10B host cells by electroporation. The transformed cells were grown overnight, and recombinant white colonies were individually picked, maintained in forty-one 384-well microtiter culture plates, and replicated onto Hybond-N+ nylon membranes (Amersham Biosciences, USA) to obtain high-density hybridization filters for screening. All the 15,744 arrayed recombinant clones were Southern hybridized to two γ-32P-labeled oligo pools (each comprising different synthetic oligonucleotides in equimolar concentration), viz. Pool-I: (CA)15, (GA)15, (CAA)10, (CAT)10, (ACT)10, (GATA)10, (AGA)10, (CATA)10; and Pool-II: (CTG)10, (GAC)10, (AGG)10, (GGT)10, (GCC)10, (GC)15. Hybridized clones were selectively picked and individually processed for plasmid isolation following the standard alkaline lysis method. The genomic inserts were then amplified and sequenced on both strands using M13 universal primers on a 3700 DNA Analyzer using BigDye™ chemistry as per the manufacturer's instructions (Applied Biosystems, USA). The sequences were aligned and edited using Autoassembler (Applied Biosystems, USA) and finally saved in FASTA format.
The identification and localization of microsatellites in the sequenced clones was performed using the microsatellite search module MISA (for more information please see Availability & requirements section below), followed by visual assessment. The criteria for the SSR search by MISA were repeat stretches having a minimum of 12 repeat units for MNRs, six repeat units for DNRs and five repeat units for HO-NRs. The microsatellites were classified considering the complementarities of the repeat motifs, e.g., AG, GA, TC and CT were considered as a single category. Primer pairs were designed for the SSR-containing sequences with a minimum of seven repeats for DNRs and/or five repeats for all other SSRs, using GENETOOL Lite version 1.0 (for more information please see Availability & requirements section below). The primers were commercially synthesized (Bioserve, India – for more information please see Availability & requirements section below) with forward primers carrying the fluorescent label FAM or HEX. The details of these new markers, viz. locus designation, primer sequences, repeat motifs, allele attributes, PIC estimates and GenBank accession numbers, are summarized in Tables 3 and 4. The primer pairs were standardized and PCR was performed as described earlier [10, 11]. The amplified products were run on a capillary-based 3730 DNA Analyzer (Applied Biosystems), and the products were precisely sized for major, comparable and conspicuous peaks using GeneMapper 3.7 (Applied Biosystems) with default parameters.
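The search thresholds and the motif grouping described above can be illustrated with a short regular-expression sketch. This is not MISA itself and makes no claim about how MISA is implemented: it only finds perfect repeats, the function names are hypothetical, and higher-order repeats are assumed here to mean tri- to hexanucleotide motifs.

```python
import re

# Minimum repeat counts from the text: 12 for mono-, 6 for di-, and 5 for
# higher-order repeats (assumed here to cover tri- to hexanucleotide motifs).
MIN_REPEATS = {1: 12, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One captured unit followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "ATAT"); the shorter unit length reports them.
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


def motif_class(motif):
    """Canonical label shared by all rotations of a motif on both strands."""
    reverse_complement = motif.translate(_COMPLEMENT)[::-1]
    rotations = [s[i:] + s[:i]
                 for s in (motif, reverse_complement)
                 for i in range(len(s))]
    return min(rotations)


# Hypothetical test sequence: a (GA)8 and a (CAT)5 stretch with short flanks.
sequence = "CCGT" + "GA" * 8 + "CTTC" + "CAT" * 5 + "GG"
for start, motif, count in find_ssrs(sequence):
    print(start, motif, count, motif_class(motif))
```

With this grouping, AG, GA, TC and CT all map to the single label "AG", which matches the classification rule stated in the text.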
Statistical and genetic analysis
The allelic data for eight genotypes each of arabicas and robustas were used to calculate different statistical and genetic parameters. Statistical attributes such as mean, skewness, kurtosis and t-test were calculated using Microsoft Excel functions. Observed heterozygosity (Ho) was calculated as the fraction of heterozygous genotypes over the total number of genotyped plants. Expected heterozygosity (He) was calculated according to the following formula:
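PIC values were calculated according to Botstein et al. as follows:
$$\mathrm{PIC} = 1 - \sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} 2\, p_i^2\, p_j^2$$
where
n = the total number of alleles detected for a microsatellite marker,
p_i = the frequency of the ith allele, and
p_j = the frequency of the jth allele (j running from i+1 to n) in the set of analyzed genotypes.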
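As a worked illustration of these per-locus statistics, the following sketch computes Ho, He and PIC from diploid genotypes. Two points are assumptions rather than the authors' stated method: He is taken as the common gene-diversity form 1 − Σp_i² (Arlequin may apply a sample-size correction), and PIC is taken as the standard Botstein et al. expression. The allele sizes in the example are hypothetical.

```python
from collections import Counter
from itertools import combinations


def locus_stats(genotypes):
    """Per-locus summary for diploid genotypes given as (allele, allele) tuples."""
    n_individuals = len(genotypes)
    # Observed heterozygosity: fraction of individuals with two different alleles.
    ho = sum(a != b for a, b in genotypes) / n_individuals

    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    freqs = [c / total for c in counts.values()]

    # Assumed form of expected heterozygosity: 1 - sum(p_i^2), uncorrected.
    he = 1.0 - sum(p * p for p in freqs)
    # Botstein et al. PIC: He minus the sum over allele pairs of 2 * p_i^2 * p_j^2.
    pic = he - sum(2 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return {"alleles": len(freqs), "Ho": ho, "He": he, "PIC": pic}


# Hypothetical locus: four plants, two alleles (sizes in bp) at equal frequency.
print(locus_stats([(150, 150), (150, 154), (154, 154), (150, 154)]))
# {'alleles': 2, 'Ho': 0.5, 'He': 0.5, 'PIC': 0.375}
```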
The bi-allelic polymorphic data were also tested for Hardy-Weinberg equilibrium (HWE) using Fisher's exact test and a Markov chain algorithm with a forecasted chain length of 10,000,000 and 100,000 dememorization steps, and the linkage dis-equilibrium (LD) test was performed using 1,000 permutations. For arabicas, the markers that showed invariable presence of 'double alleles' across the tested germplasm were considered as independent amplifications from duplicated loci present in two distinct copies and were excluded from the analysis of the allelic attributes described above. The Ho and He estimates and the HWE and LD tests were done using the program Arlequin ver 3.1, and the probability of identity (PI) estimates were calculated using the program Gimlet ver 1.3.2. Private alleles (PAs) were determined over all the 30 genotypes using the software Convert ver. 1.3.1. The discriminatory power of each microsatellite locus was calculated by estimating sib-based and unbiased corrected PI estimates, and the cumulative power of discrimination was calculated as the product of the PIs of successive informative markers arranged in decreasing order, as described by Waits et al. Cross-taxa transferability (T mark) was calculated as the proportion of primers showing successful amplification vis-à-vis all the tested primers, whereas primer conservance (C taxa) was calculated as the proportion of the species displaying successful amplification vis-à-vis all the tested markers.
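The way per-locus PI values combine into a cumulative estimate can be sketched as follows. The two per-locus expressions are the Hardy-Weinberg and sib-based forms given by Waits et al.; the sample-size corrected ("unbiased") estimator computed by Gimlet is not reproduced here, so this is an approximation for illustration only, and the allele frequencies are hypothetical.

```python
from itertools import combinations


def pi_hardy_weinberg(freqs):
    """Uncorrected PI for one locus: sum(p_i^4) + sum over pairs of (2 p_i p_j)^2."""
    return (sum(p ** 4 for p in freqs)
            + sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2)))


def pi_sib(freqs):
    """Sib-based PI for one locus (a more conservative, i.e. larger, value)."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 0.25 + 0.5 * s2 + 0.5 * s2 ** 2 - 0.25 * s4


def cumulative_pi(per_locus_pi):
    """Running product of PI values, most informative (smallest PI) locus first."""
    running = 1.0
    curve = []
    for value in sorted(per_locus_pi):
        running *= value
        curve.append(running)
    return curve


# Hypothetical allele frequencies for three loci.
loci = [[0.5, 0.5], [0.6, 0.3, 0.1], [0.25, 0.25, 0.25, 0.25]]
print(cumulative_pi([pi_hardy_weinberg(f) for f in loci]))
print(cumulative_pi([pi_sib(f) for f in loci]))
```

Reading off the position at which the running product first falls below a chosen threshold gives the number of markers needed for that level of discrimination; the sib-based curve reaches any threshold later than the Hardy-Weinberg one.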
The average genomic distance estimates between the detected SSR motifs were obtained by considering random sampling of the genome. Thus, for targeted SSRs, the size of the sampled genome was considered equal to the total size of the screened library, whereas for the non-targeted SSRs, the size of the genome actually sequenced was used to get the estimates, considering the haploid coffee genome equivalent to 809 Mb. Initially, the numbers of different DNRs and TNRs present in the robusta genome were estimated from the screened genome for targeted SSRs and from the sequenced genome for the non-targeted SSR, i.e. the AT-DNR. These were further used to estimate the distribution of different SSRs in terms of SSRs per Mb of the genome, and also as the spacing between two consecutive SSR repeats in the robusta genome.
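The arithmetic behind such density estimates is simple enough to sketch. The library size (forty-one 384-well plates) and the 809 Mb genome size come from the text; the average insert size and the SSR count used below are purely hypothetical placeholders, since the actual values are not restated in this section.

```python
GENOME_MB = 809  # haploid robusta genome size used in the text


def ssr_density(n_ssrs, sampled_bp, genome_mb=GENOME_MB):
    """Estimate SSR spacing and abundance from a randomly sampled genome fraction."""
    sampled_kb = sampled_bp / 1_000
    kb_per_ssr = sampled_kb / n_ssrs          # average spacing between SSRs
    ssr_per_mb = 1_000 / kb_per_ssr           # density per Mb
    return {
        "kb_per_ssr": kb_per_ssr,
        "ssr_per_mb": ssr_per_mb,
        "genome_total": ssr_per_mb * genome_mb,  # extrapolated genome-wide count
    }


# Library size from the text: forty-one 384-well plates.
n_clones = 41 * 384  # 15,744 clones

# Assumption for illustration: 1,000 bp average insert (midpoint of the
# 500-1500 bp size fraction) and a hypothetical count of 100 targeted SSRs.
assumed_insert_bp = 1_000
print(ssr_density(n_ssrs=100, sampled_bp=n_clones * assumed_insert_bp))
```

For targeted motifs the sampled size is the whole screened library, as in this example; for non-targeted motifs such as AT, `sampled_bp` would instead be the total length of sequence actually read.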
The linkage map was constructed using JoinMap ver 3.0, at LOD 5.0 and other default parameters as per the software instructions. The segregating allelic data was scored for the tested microsatellites as per the models specified in JoinMap for co-dominant marker-segregation in a pseudo-testcross population. The segregation data obtained in this study was used along with the mapping data available for the reference robusta population in the lab (unpublished).
Genetic Diversity Analysis
The SSR data from Pms were used to ascertain the generic relationships/affinities between the tested germplasm (cultivated genotypes/related species) using cluster analysis based on genetic distance values. Initially, 100 bootstrap distance matrices were generated using the bi-allelic microsatellite data analysis tool MicroSatellite Analyzer (MSA) and Nei's genetic distance measure. From these distance data, neighbour joining (NJ) trees were generated for each matrix separately using the 'neighbor' command of Phylip ver 3.6, followed by generation of consensus trees, one each for the cultivated germplasm and the inter-species relationships.
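The bootstrap step can be pictured with a small sketch that resamples loci with replacement and recomputes a pairwise distance matrix each time. It assumes Nei's 1972 standard distance applied to individual genotypes and does not reproduce the internals of MSA or PHYLIP; tree building and consensus are left to those programs. The sample names and allele sizes are hypothetical.

```python
import math
import random
from collections import Counter


def allele_freqs(genotype):
    """Within-individual allele frequencies at one locus (0.5 or 1.0 for diploids)."""
    counts = Counter(genotype)
    return {allele: c / len(genotype) for allele, c in counts.items()}


def nei_distance(x, y):
    """Nei's standard distance D = -ln(Jxy / sqrt(Jx * Jy)) over paired loci."""
    jx = jy = jxy = 0.0
    for gx, gy in zip(x, y):
        fx, fy = allele_freqs(gx), allele_freqs(gy)
        jx += sum(p * p for p in fx.values())
        jy += sum(p * p for p in fy.values())
        jxy += sum(p * fy.get(allele, 0.0) for allele, p in fx.items())
    if jxy == 0:
        return math.inf  # no shared alleles at any locus
    return -math.log(jxy / math.sqrt(jx * jy))


def bootstrap_matrices(samples, n_boot=100, seed=0):
    """Yield n_boot distance matrices, each from loci resampled with replacement."""
    names = sorted(samples)
    n_loci = len(samples[names[0]])
    rng = random.Random(seed)
    for _ in range(n_boot):
        loci = [rng.randrange(n_loci) for _ in range(n_loci)]
        yield [[nei_distance([samples[a][i] for i in loci],
                             [samples[b][i] for i in loci])
                for b in names] for a in names]


# Hypothetical genotypes: three samples scored at three loci.
samples = {
    "sample_1": [(150, 150), (200, 204), (310, 310)],
    "sample_2": [(150, 154), (200, 200), (310, 312)],
    "sample_3": [(154, 154), (204, 204), (310, 312)],
}
matrices = list(bootstrap_matrices(samples, n_boot=100))
print(len(matrices), "bootstrap matrices of size", len(matrices[0]))
```

Each matrix would then be written out in the distance-matrix format expected by the tree-building program.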
Availability & requirements
1 MIcroSAtellite: http://pgrc.ipk-gatersleben.de/misa/
2 GENETOOL Lite version 1.0: http://www.biotools.com/downloads/brochures/GeneTool2.pdf
3 Bioserve, India: http://www.bioserveindia.com/
C taxa: Conservation of markers across the tested taxa
COS: Conserved Orthologous Sets
HO-NRs: Higher Order Nucleotide Repeats
SSR: Simple Sequence Repeat
Number of Alleles
NJ: Neighbour Joining
PIC: Polymorphism Information Content
PI: Probability of Identity
T mark: Transferability of markers across all the studied taxa
Fitter R, Kaplinsky R: Who gains from product rents as the coffee market becomes more differentiated? A value chain analysis. IDS Bulletin (Special Issue). 2001, 32 (3): 69-82.
Powell W, Machray GC, Provan J: Polymorphism revealed by simple sequence repeats. Trends Plant Sci. 1996, 1: 215-222.
Gupta PK, Varshney RK: The development and use of microsatellite markers for genetic analysis and plant breeding with emphasis on bread wheat. Euphytica. 2000, 113: 163-185.
Combes MC, Andrzejewski S, Anthony F, Bertrand B, Rovelli P, Graziosi G, Lashermes P: Characterization of microsatellite loci in Coffea arabica and related coffee species. Mol Ecol. 2000, 9: 1178-1180.
Rovelli P, Mettulio R, Anthony F, Anzueto F, Lashermes P, Graziosi G: Microsatellites in Coffea arabica L. Coffee Biotechnology and Quality. Edited by: Sera T, Soccol CR, Pandey A, Roussos S. Kluwer Academic Publishers; 2000:123-133.
Baruah A, Naik V, Hendre PS, Rajkumar R, Rajendrakumar P, Aggarwal RK: Isolation and characterization of nine microsatellite markers from Coffea arabica L., showing wide cross-species amplifications. Mol Ecol Notes. 2003, 3: 647-650.
Coulibaly I, Revol B, Noirot M, Poncet V, Lorieux M, Carasco-Lacombe C, Minier J, Dufour M, Hamon P: AFLP and SSR polymorphism in a Coffea interspecific backcross progeny [(C. heterocalyx X C. canephora) X C. canephora]. Theor Appl Genet. 2003, 107: 1148-1155.
Moncada P, McCouch S: Simple sequence repeat diversity in diploid and tetraploid Coffea species. Genome. 2004, 47: 501-509.
Poncet V, Hamon P, Minier J, Carasco C, Hamon S, Noirot M: SSR cross-amplification and variation within coffee trees (Coffea spp.). Genome. 2004, 47: 1071-1081.
Bhat PR, Krishnakumar V, Hendre PS, Rajendrakumar P, Varshney RK, Aggarwal RK: Identification and characterization of expressed sequence tags-derived simple sequence repeats markers from robusta coffee variety 'CXR' (an interspecific hybrid of Coffea canephora & Coffea congensis). Mol Ecol Notes. 2005, 80-83.
Aggarwal RK, Hendre PS, Varshney RK, Bhat PR, Krishnakumar V, Singh L: Identification, characterization and utilization of EST-derived genic microsatellite markers for genome analyses of coffee and related species. Theor Appl Genet. 2007, 114: 359-372.
Hendre PS, Aggarwal RK: DNA markers: development and application for genetic improvement of coffee. Genomic Assisted Crop Improvement: Genomics Applications in Crops. Edited by: Varshney RK, Tuberosa R. Springer-Verlag, Germany. 2007, 2: 399-434.
Marie D, Brown SC: A cytometric exercise in plant DNA histograms, with 2C values for 70 species. Biol Cell. 1993, 78: 41-51. [http://data.kew.org/cvalues/searchguide.html]
Zane L, Bargelloni L, Patarnello T: Strategies for microsatellite isolation: a review. Mol Ecol. 2002, 11: 1-16.
Butcher PA, Decroocq S, Gray Y, Moran GF: Development, inheritance and cross-species amplification of microsatellite markers from Acacia mangium. Theor Appl Genet. 2000, 101: 1282-1290.
Ferguson ME, Burow MD, Schulze SR, Bramel PJ, Paterson AH, Kresovich S, Mitchell S: Microsatellite identification and characterization in peanut (A. hypogaea L.). Theor Appl Genet. 2004, 108: 1064-1070.
Chen X, Temnykh S, Xu Y, Cho YG, McCouch SR: Development of a microsatellite framework map providing genome-wide coverage in rice (Oryza sativa L.). Theor Appl Genet. 1997, 95: 553-567.
Ashkenazi V, Chani E, Lavi U, Levy D, Hillel J, Veilleux RE: Development of microsatellite markers in potato and their use in phylogenetic and fingerprinting analyses. Genome. 2001, 44: 50-62.
Bryan GJ, Collins AJ, Stephenson P, Orry A, Smith JB, Gale MD: Isolation and characterisation of microsatellites from hexaploid bread wheat. Theor Appl Genet. 1997, 94: 557-563.
Rajora OP, Rahman MH, Dayanandan S, Mosseler A: Isolation, characterization, inheritance and linkage of microsatellite DNA markers in white spruce (Picea glauca) and their usefulness in other spruce species. Mol Gen Genet. 2001, 264 (6): 871-882.
Cardle L, Ramsay L, Milbourne D, Macaulay M, Marshall D, Waugh R: Computational and Experimental Characterization of Physically Clustered Simple Sequence Repeats in Plants. Genetics. 2000, 156: 847-854.
Guilford P, Prakash S, Zhu JM, Rikkerink E, Gardiner S, Bassett H, Forster R: Microsatellites in Malus X domestica (apple): abundance, polymorphism and cultivar identification. Theor Appl Genet. 1997, 94: 249-254.
Sharon D, Cregan PB, Mhameed S, Kusharska M, Hillel J, Lahav E, Lavi U: An integrated genetic linkage map of avocado. Theor Appl Genet. 1997, 95: 911-921.
Pekkinen M, Varvio S, Kulju KK, Karkkainen H, Smolander S, Vihera-Aarnio A, Koski V, Sillanpaa MJ: Linkage map of birch, Betula pendula Roth, based on microsatellites and amplified fragment length polymorphisms. Genome. 2005, 48: 619-625.
Sosinski B, Gannavarapu M, Hager LD, Beck LE, King GJ, Ryder CD, Rajapakse S, Baird WV, Ballard RE, Abbott AG: Characterization of microsatellite markers in peach [Prunus persica (L.) Batsch]. Theor Appl Genet. 2000, 101: 421-428.
Areshchenkova T, Ganal MW: Comparative analysis of polymorphism and chromosomal location of tomato microsatellite markers isolated from different sources. Theor Appl Genet. 2002, 104: 229-235.
Lagercrantz U, Ellegren H, Andersson L: The abundance of various polymorphic microsatellite motifs differs between plants and vertebrates. Nucleic Acids Res. 1993, 21: 1111-1115.
Wang Z, Weber JL, Zhong G, Tanksley SD: Survey of plant short tandem DNA repeats. Theor Appl Genet. 1994, 88: 1-6.
Miyao A, Zhong HS, Monna L, Yano M, Yamamoto K, Havukkala I, Minobe Y, Sasaki T: Characterization and genetic mapping of simple sequence repeats in the rice genome. DNA Res. 1996, 3: 233-238.
Akagi H, Yokozeki Y, Inagaki A, Fujimura T: Microsatellite DNA markers for rice chromosomes. Theor Appl Genet. 1996, 93: 1071-1077.
Song QJ, Fickus EW, Cregan PB: Characterization of trinucleotide SSR motifs in wheat. Theor Appl Genet. 2002, 104: 286-293.
Mba REC, Stephenson P, Edwards K, Melzer S, Nkumbira J, Gullberg U, Apel K, Gale M, Tohme J, Fregene M: Simple sequence repeat (SSR) markers survey of the cassava (Manihot esculenta Crantz) genome: towards an SSR-based molecular genetic map of cassava. Theor Appl Genet. 2001, 102: 21-31.
Sun GL, Salomon B, Bothmer RV: Characterization and analysis of microsatellite loci in Elymus caninus (Tritiaceae: Poaceae). Theor Appl Genet. 1998, 96: 676-682.
Pal N, Sandhu JS, Domier LL, Kolb FL: Development and characterization of microsatellite and RFLP-derived PCR markers in oat. Crop Sci. 2002, 42: 912-918.
Slavov GT, Howe GT, Yakovlev I, Edwards KJ, Krutovskii KV, Tuskan GA, Carlson JE, Strauss SH, Adams WT: Highly variable SSR markers in Douglas-fir: Mendelian inheritance and map locations. Theor Appl Genet. 2004, 108: 873-880.
Danin-Poleg Y, Reis N, Tzuri G, Katzir N: Development and characterization of microsatellite markers in Cucumis. Theor Appl Genet. 2001, 102: 61-72.
Pfeiffer A, Oliveri AM, Morgante M: Identification and characterisation of microsatellites in Norway spruce (Picea abies K.). Genome. 1997, 40: 419-
Hicks M, Adams D, O'Keefe S, MacDonald E, Hodgegetts R: The development of RAPD and microsatellite markers in lodgepole pine (Pinus contorta var. latifolia). Genome. 1998, 41: 797-805.
Soranzo N, Provan J, Powell W: Characterisation of microsatellite loci in Pinus sylvestris L. Mol Ecol. 1998, 7: 1260-1261.
Garner TW: Genome size and microsatellites: the effect of nuclear size on amplification potential. Genome. 2002, 45: 212-215.
Varshney RK, Graner A, Sorrells ME: Genic microsatellite markers in plants: features and applications. Trends Biotechnol. 2005, 23: 48-55.
Lashermes P, Combes MC, Trouslot P, Anthony F, Charrier A: Molecular analysis of the origin and genetic diversity of Coffea arabica L.: implications for coffee improvement. Proceedings of EUCARPIA meeting on tropical plants. Montpellier, France; 1996:23-29.
Aggarwal RK, Rajkumar R, Rajendrakumar P, Hendre PS, Baruah A, Phanindranath R, Annapurna V, Prakash NS, Santaram A, Sreenivasan CS, Singh L: Fingerprinting of Indian coffee selections and development of reference DNA polymorphism panels for creating molecular IDs for variety identification. Proceedings of 20th international conference on coffee science (ASIC). Bangalore, India; 2004:751-755.
Lashermes P, Combes MC, Prakash NS, Trouslot P, Lorieux M, Charrier A: Genetic linkage map of Coffea canephora : effect of segregation distortion and analysis of recombination rate in male and female meioses. Genome. 2001, 44: 589-596.
Peakall R, Gilmore S, Keys W, Morgante M, Rafalski A: Cross-species amplification of soybean (Glycine max) simple sequence repeats (SSRs) within the genus and other legume genera: implications for the transferability of SSRs in plants. Mol Biol Evol. 1998, 15: 1275-1287.
Chevalier A: Les cafeiers du globe. III. Systematique des caféiers et Faux cafeiers. Maladies et insect nuisible. Encyclopedie de biologique 28. Edited by: Lechevalier P. Paris, France; 1947:356-
Ruas PM, Ruas CF, Rampim L, Carvaljo VP, Ruas EA, Sera T: Genetic relationship in Coffea species and parentage determination of interspecific hybrids using ISSR (inter-simple sequence repeat) markers. Genet Mol Biol. 2003.
Orozco-Castillo C, Chalmers KJ, Powell W, Waugh R: RAPD and organelle specific PCR re-affirms taxonomic relationships within the genus Coffea. Plant Cell Reports. 1996, 15: 337-341.
Lashermes P, Combes MC, Trouslot P, Charrier A: Phylogenetic relationship of coffee-tree species (Coffea L.) as inferred from ITS sequences of nuclear ribosomal DNA. Theor Appl Genet. 1997, 94: 947-955.
Aggarwal RK, Shenoy VV, Ramadevi J, Rajkumar R, Singh L: Molecular characterization of some Indian Basmati and other elite rice genotypes using fluorescent-AFLP. Theor Appl Genet. 2002, 105: 680-690.
Sambrook J, Fritsch EF, Maniatis T: Molecular cloning: A laboratory manual. New York: Cold Spring Harbor Press; 1989.
Excoffier L, Laval G, Schneider S: Arlequin ver 3.0: an integrated software package for population genetics data analysis. Evol Bioinform Online. 2005, 1: 47-50.
Botstein D, White RL, Skolnick M, Davis RW: Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet. 1980, 32: 314-331.
Valiere N: Gimlet: a computer program for analysing genetic individual identification data. Mol Ecol Notes. 2002, 2: 377-379.
Glaubitz JC: Convert: a user-friendly program to reformat diploid genotypic data for commonly used population genetic software packages. Mol Ecol Notes. 2004, 4: 309-310.
Waits LP, Luikart G, Taberlet P: Estimating the probability of identity among genotypes in natural populations: cautions and guidelines. Mol Ecol. 2001, 10: 249-256.
Dieringer D, Schlotterer C: MicroSatellite Analyser (MSA): a platform independent analysis tool for large microsatellite data sets. Mol Ecol Notes. 2003, 3: 167-169.
Nei M: Genetic distance between populations. Am Naturalist. 1972, 106: 283-292.
Felsenstein J: PHYLIP (Phylogeny Inference Package) version 3.6. Distributed by the author. Department of Genome Sciences, University of Washington, Seattle; 2004.
The authors thank the Department of Biotechnology (DBT), Government of India for the financial support under the National Network project on 'Coffee Genomics'; Director, CCMB, Hyderabad for the facilities to undertake the study; Director Research, Coffee Board, Bangalore for coffee materials; Mr Md Ashraf Ashfaq for help provided during the initial period of the work; Dr. T. Ramakrishna Murthy, Scientist, CCMB for correction and editing the manuscript. PSH acknowledges CSIR, India for junior and senior research fellowship during his doctoral research.
PSH constructed, screened the library and sequenced positive clones; developed and standardized majority of the markers; validated and analyzed the data; drafted the manuscript. PR, AL and AV standardized and validated some of the markers and helped in analysis. RKA conceptualized, planned, supervised, finalized, approved and communicated the final manuscript.