Development and characterization of microsatellite markers for Morus spp. and assessment of their transferability to other closely related species

Background Adoption of genomics-based breeding has emerged as a promising approach for achieving comprehensive crop improvement. Such an approach is especially relevant for perennial species like mulberry. However, the lack of co-dominant marker systems as a genomic resource has been the major constraint on adopting molecular breeding for the genetic enhancement of mulberry. The goal of this study was to develop and characterize a large number of locus specific genic and genomic SSR markers that can be used effectively for molecular characterization of mulberry species and genotypes.
Result We analyzed a total of 3485 DNA sequences, including genomic sequences and expressed sequence tags (ESTs), of mulberry (Morus alba L.). We identified 358 sequences suitable for developing microsatellite primer pairs, representing 222 genomic and 136 EST regions. Primers amplifying locus specific regions of Dudia white (a genotype of Morus alba L.) were identified, and 137 genomic and 51 genic SSR markers were standardized. A two pronged strategy was adopted to assess the applicability of these SSR markers, using mulberry species and genotypes along with a few closely related species belonging to the family Moraceae viz., Ficus (Ficus bengalensis), Fig (Ficus carica) and Jackfruit (Artocarpus heterophyllus). While 100% of these markers amplified specific loci on the mulberry genome, 79% were transferable to the related species, indicating the robustness of these markers and their potential for analyzing the molecular and genetic diversity of mulberry germplasm as well as of related species. The ability of these markers to detect heterozygosity, combined with a high average polymorphic information content (PIC) of 0.559 (range 0.076 to 0.943), demonstrates their potential as genomic resources for diversity analysis. Clustering of the dissimilarity coefficients by the Neighbor Joining method showed that the markers separated the mulberry species, genotypes and other related species into distinct clusters.
Conclusion We report a total of 188 genomic and genic SSR markers in Morus alba L. A large proportion of these markers (164) were polymorphic among mulberry species and genotypes. A substantial number of these markers (149) were also transferable to other related species like Ficus, Fig and Jackfruit. The extent of polymorphism revealed and the ability to detect heterozygosity among the cross pollinated mulberry species and genotypes render these markers an invaluable genomic resource that can be utilized in assessing molecular diversity, in QTL mapping and, subsequently, in mulberry crop improvement through MAS.


Background
Mulberry, a perennial out-breeding tree species, is distributed in varied environments ranging from tropical to sub-arctic regions. The wide distribution can be attributed to its capability to adapt to diverse agro-climatic conditions, fast regeneration and both sexual and asexual modes of propagation. The mulberry leaf serves as the sole source of food for the domesticated silkworm, Bombyx mori L., and hence contributes significantly to the success of the silk industry in India. It is predicted that around 27,000 MT of raw silk would need to be produced by the year 2030 to meet the demand in India [1]. This goal is strongly dependent on improving mulberry productivity. Enhancing the yield potential and minimizing the yield loss due to stresses are therefore the most viable strategies to achieve genetic enhancement of mulberry [2].
Despite the significant progress achieved so far, genetic improvement of mulberry yield potential through conventional breeding has been distressingly slow, mainly because of the perennial growth habit and complex inheritance pattern. Convincing evidence suggests that relevant traits need to be introgressed into an elite genetic background to achieve greater success in crop improvement endeavors. Thus, the application of modern molecular and genomic tools is expected to strongly complement the breeding efforts in enhancing the yield potential of mulberry [2]. Advances in PCR based genomic approaches have generated robust DNA marker systems [3,4], which offer an effective approach to augment breeding methods for mulberry improvement [5]. Randomly amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP) and inter simple sequence repeats (ISSR) have been the most frequently employed marker systems to study the genetic diversity among mulberry species and genotypes [6][7][8]. Though these marker systems provide a good option to resolve the evolutionary relationships among species [9], RAPD, AFLP and ISSR markers are dominant and therefore have limited application in marker assisted breeding, especially in heterozygous out-breeding perennial species like mulberry. The lack of a sufficient number of co-dominant markers keeps molecular breeding in mulberry a distant possibility.
Microsatellites or simple sequence repeats (SSR) are short stretches of tandemly repeated DNA sequences, distributed throughout the eukaryotic genome [10,11]. SSR markers display locus specificity, are co-dominant and highly transferable to other related species [12] and hence are the most attractive choice of marker systems for mulberry. Further, the higher ability to detect polymorphism by the SSR markers is an added advantage while analyzing closely related species and/or genotypes, which is often the case in breeding programs [13]. The efficiency of the SSR markers in genetic screening has been reported in tree species like peach, olive and fig [14][15][16].
Except for the reports of Aggarwal et al. [17] and Zhao et al. [18], there have not been many efforts to develop co-dominant markers in mulberry. Against this background, the main aim of this work was to generate SSR markers for characterizing mulberry germplasm and/or mapping populations. We report a large number of genic and genomic SSR markers for mulberry and examine their transferability to closely related species like Ficus (Ficus bengalensis), Fig (Ficus carica) and Jackfruit (Artocarpus heterophyllus).

Result and discussion
A pre-cloning enrichment strategy was adopted to isolate the genomic microsatellite regions, and a set of previously characterized expressed sequence tags (ESTs) [19][20][21] was analyzed to identify genic microsatellite regions. A total of 3485 sequences, including 1094 genomic and 2391 EST sequences, were analyzed for the presence of microsatellite regions. Locus specific primers were designed for the suitable target sequences to develop SSR markers.

Isolation and characterization of genomic microsatellites
Analysis of the genomic sequences revealed a total of 900 diverse microsatellite loci (Table 1). Among them, 167 (18.56%) sequences had mono nucleotide repeats (MNR) and 303 (33.67%) sequences had dinucleotide repeats (DNR). Tri nucleotide repeats (TNR) were found among 155 (17.22%) sequences, while tetra (TtNR), penta (PNR) and hexa (HNR) nucleotide repeats were relatively less frequent in the enrichment library (Figure 1). Besides these types, 52 (5.78%) microsatellite loci with repeat motifs having more than six nucleotide bases, referred to as long nucleotide repeats (LNR), were also identified. It is well accepted that di, tri, tetra, penta and hexa repeat motifs represent an appropriate marker system and can generally distinguish greater diversity [22]. Hence, the LNRs and MNRs were excluded from the design of locus specific primers. In our study, "TC/AG" repeats constituted the most frequent DNR microsatellite variant (25.5%) followed by "CT/GA". While "AT/TA" and "AG/TC" repeats have been reported as the most frequent in plant genomes [17][23][24][25][26][27][28][29], He et al. [30] identified "GA/CT" as the most frequently occurring di-repeat motif in groundnut. Our results revealed the presence of both types of DNR motifs, indicating a possibility that these markers would be able to distinguish greater diversity among mulberry accessions. The least abundant DNR motifs found in genomic SSRs were "CA/GT" and "CG/GC". The frequency of "GC" repeats is generally low in the genomic regions of most plants, as reported in peach [31], coffee [32], rubber tree [33], wheat [34] and soybean [35]. While "GAA" repeats were the most frequent (15.9%) among the TNRs, "AAAT" repeats were the most frequent tetra nucleotide repeats (16.6%). Similarly, "AAAAC" and "AAAAAG" repeat types were the most frequent among the PNR and HNR groups, respectively.
Based on the repeat sequences, the microsatellite regions were classified as perfect, interrupted (more than one of the same repeat motif spaced by a few base pairs) and compound repeats (different repeat motifs occurring tandemly and/or interrupted by a few base pairs). Details of the genomic SSR marker types, the repeat motifs detected in the enrichment library and the GenBank accession numbers are presented in Table 2. Of the repeat regions identified, 74.5% were perfect, 6.5% were interrupted and 19% were compound repeats. Repeat regions of the "perfect" type are more common in plant genomes than the "interrupted" or "compound" types [36,37]. Though a large share of compound repeat motifs is not common in plant genomes, such motifs seem to exhibit greater levels of polymorphism and hence have a distinct advantage in mapping and diversity analysis [38][39][40][41].
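The three-way classification described above can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the tract representation, the `max_gap` threshold standing in for "a few base pairs", and the choice to treat rotations of a motif (for example "GA" and "AG") as the same motif are all assumptions made for illustration.

```python
from typing import List, Tuple

# A repeat tract: (motif, start, end) with end exclusive. Hypothetical representation.
Tract = Tuple[str, int, int]


def canonical(motif: str) -> str:
    """Smallest rotation of a motif, so that 'GA' and 'AG' compare equal (assumption)."""
    return min(motif[i:] + motif[:i] for i in range(len(motif)))


def group_regions(tracts: List[Tract], max_gap: int = 10) -> List[List[Tract]]:
    """Group tracts separated by at most max_gap bases into one region.

    max_gap is a hypothetical stand-in for "a few base pairs".
    """
    regions: List[List[Tract]] = []
    for tract in sorted(tracts, key=lambda t: t[1]):
        if regions and tract[1] - regions[-1][-1][2] <= max_gap:
            regions[-1].append(tract)
        else:
            regions.append([tract])
    return regions


def classify_region(region: List[Tract]) -> str:
    """Return 'perfect', 'interrupted' or 'compound' for one region."""
    if not region:
        raise ValueError("empty region")
    if len(region) == 1:
        return "perfect"
    motifs = {canonical(motif) for motif, _, _ in region}
    return "interrupted" if len(motifs) == 1 else "compound"


tracts = [
    ("GA", 0, 20), ("GA", 24, 40),       # same motif, 4 bp apart
    ("CT", 100, 120),                    # isolated tract
    ("AAT", 200, 218), ("GA", 220, 236)  # different motifs, 2 bp apart
]
classes = [classify_region(r) for r in group_regions(tracts)]
```

With the toy coordinates above, the three regions come out as interrupted, perfect and compound, in that order.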

Development of genomic SSR markers
Although DNA sequences harboring microsatellite regions were captured using specific probes, primers could not be designed for all the sequences. Primers were not designed where the repeat stretch was shorter than 15 nucleotides or where the repeat region was close to the ends of the sequence. Thus, out of the 1094 genomic clones sequenced, 222 primer pairs could be developed (Table 1). The web-based program, Primer3 (http://bioinfo.ebc.ee/mprimer3/), was used to design primers for the identified regions with more than 15 nucleotide repeats so as to amplify fragments of at least 150 bp. The pre-cloning enrichment strategy captured specific genomic regions that were complementary to the microsatellite probes used. Thus, this approach enhanced the success of identifying specific loci that were unique in the genome. Of the set of 222 primer pairs developed, 137 (61.71%) showed locus specific amplification, reiterating the advantages of the pre-cloning enrichment strategy in discovering microsatellite regions [17,30,42,43]. These locus specific markers       were HNR types (Figure 1). These genomic SSR markers developed for mulberry have been deposited in the NCBI GenBank database and the details of all the locus specific primers are given in Table 2.
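The screening rules stated above (a repeat stretch of at least 15 nucleotides, a repeat that is not too close to either end of the clone, and an amplicon of at least 150 bp) can be written as a small eligibility check. This is only a sketch: the function name and the `min_flank_nt` value are hypothetical, since the text does not say how close to the ends is too close.

```python
def primer_design_eligible(
    seq_len: int,
    repeat_start: int,
    repeat_end: int,          # exclusive
    min_repeat_nt: int = 15,  # from the text: shorter stretches were skipped
    min_flank_nt: int = 25,   # hypothetical: room for a primer on each side
    min_amplicon_bp: int = 150,
) -> bool:
    """Return True if a repeat-containing clone could be passed on to primer design."""
    repeat_len = repeat_end - repeat_start
    left_flank = repeat_start
    right_flank = seq_len - repeat_end
    return (
        repeat_len >= min_repeat_nt
        and left_flank >= min_flank_nt
        and right_flank >= min_flank_nt
        and seq_len >= min_amplicon_bp  # clone must be long enough to yield the amplicon
    )


ok = primer_design_eligible(seq_len=400, repeat_start=180, repeat_end=204)
too_short = primer_design_eligible(seq_len=400, repeat_start=180, repeat_end=192)
near_end = primer_design_eligible(seq_len=400, repeat_start=5, repeat_end=35)
```

Clones that pass such a filter would still need actual primer design and quality checks; the sketch only reproduces the exclusion logic.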

Isolation and characterization of genic microsatellites
A set of 2391 stress specific EST sequences, obtained by subjecting K2, a leading mulberry variety, to water stress [19][20][21], was examined for the presence of repeat motifs, and 800 sequences were found to contain a total of 1155 genic microsatellite regions (Table 1). Of these, 254 sequences were found to contain more than one microsatellite locus. Mono nucleotide repeats were the most common among the sequences (Figure 1) followed by tri and hexa-repeat motifs (28.3% and 38.3% respectively). Among the factors that generate repeat sequences in the genome, replication slippage is often considered the major mechanism. Although slippage is a random phenomenon, in genic regions non-triplet repeats cause frame shift mutations and are therefore suppressed, resulting in the abundance of TNR and HNR motifs [44][45][46]. A total of 180 compatible microsatellite regions were identified, represented by 136 primer pairs (Figure 1). Of these, 87.5% were perfect, 5.8% were interrupted and 6.6% were compound repeats (Table 3). It appears that the forces causing tandem repeats, such as mutation and replication slippage, act more frequently in non-coding regions than in genic regions [22,45,47]. It is also possible that lethal mutations in genic regions eliminate the genotype while sequence variations in non-coding regions of the genome persist, resulting in a higher observed frequency of sequence variations in the non-coding genomic regions. Accordingly, repeat regions were more frequent in the genomic sequences (82%) than in the genic sequences (48%). A large number of clones with more than 15 bp of repeat motifs were found among the markers developed. The frequency of such markers was higher in the non-coding regions of the mulberry genome than in the genic regions [25].
The presence of longer repeats in the genome may have an evolutionary advantage leading to differences in the ability to adapt to new environments [48,49].

Validation of genomic and genic SSR markers
The genic and genomic SSR markers were validated using four contrasting genotypes of Morus alba that were chosen based on variations in certain physiological traits [50] and seven different mulberry species (all belonging to the genus Morus) (Table 4). Of the 222 genomic and 136 genic SSR markers screened, 137 (62%) genomic and 51 (37%) genic SSR markers showed single locus amplification in all the Morus species as well as the genotypes of Morus alba (Table 5). Further, genomic SSRs exhibited greater levels of polymorphism compared with the genic SSR markers. This phenomenon has also been reported in other plant species [51]. Of the 188 markers examined, 87 (46.2%) detected heterozygosity in the mulberry genotypes and species, with a maximum of 1.00 for markers MulSSR39, Mul3SSR26, Mul3SSR91 and Mul3SSR135 (Additional file 1). Around 41% of the genic markers also detected heterozygosity among the mulberry genotypes and species (Additional file 1). SSR markers are highly suited for mapping even in cross pollinated species because of their ability to detect heterozygosity. The markers developed in this study also detected significant levels of heterozygosity in mulberry species and genotypes.
Variations in the genic regions, though less frequent, would have a greater possibility of having a direct role in altering the phenotype of an organism [52]. The variability obtained for the SSR markers across mulberry species and genotypes was analyzed using Power Marker version 3.25 and the results are summarized in Table 6.
While most of the markers developed in the study amplified the genomic DNA of all mulberry species and genotypes, a few also revealed private or rare alleles. For instance, Mul3SSR153 amplified only a few particular mulberry species (M. lavigata, M. assambola) and one mulberry genotype (Dudia white). Such private/rare alleles have great utility in establishing the genetic authenticity of a particular species and/or genotype in germplasm characterization as well as in genetic screening experiments [54].
Most of the genic and genomic SSR markers developed in this study were highly informative, with an average PIC value of 0.543 (range 0.000 to 0.929) among mulberry species and genotypes (Table 6). The percentage of variation explained by the principal component analysis also revealed that 41% of the markers were effective in discriminating the variation among the mulberry species and genotypes, confirming their efficiency in detecting genetic variation even among closely related varieties.
Two mulberry genotypes viz., Dudia white and UP105 were identified in earlier studies as contrasting lines differing in root traits and water use efficiency (WUE) [50]. These lines were crossed and an F1 segregating population was developed. Of the 188 markers examined, 94 genomic and 22 genic markers were found to be polymorphic between these two parents. These polymorphic markers would be a very useful genomic resource for constructing a genetic linkage map for mulberry. This work is in progress and, when completed, will establish the linkage between markers and their positions on mulberry linkage groups.
In the present investigation, we report a large number of genic and genomic SSR markers that can be exploited to examine the diversity among mulberry genotypes and species. However, the relevance of the marker system would increase if the markers were transferable to other species.

Transferability of the SSR markers to other related species
The transferability of the mulberry SSR markers was examined using three species belonging to the family Moraceae viz., Ficus (F. bengalensis), Fig (F. carica) and Jackfruit (A. heterophyllus) (Table 4). Of all the markers evaluated, 78% (107) of the genomic and 82% (42) of the genic markers showed locus specific amplification in at least one of the three species studied (Table 5). Around 30% of the markers were transferable to all three species. Of the 107 genomic and 42 genic markers, 70% and 76%, respectively, were transferable to jackfruit. The transferability of these markers was relatively low in Fig and Ficus, ranging between 41 and 46% (Table 5). The genic regions of related genomes can be expected to be more conserved than the non-coding regions and hence to show higher transferability [55]. These markers would be highly useful for genome mapping and comparative genomics in mulberry and other closely related species belonging to Moraceae. Several reports confirm the molecular relatedness of mulberry with a few other plant species belonging to the family Moraceae [56,57]. Thus, the effective transferability of both genic and genomic SSR markers to these species can be expected. In this context, the present study is significant as a large proportion of the mulberry markers were found to be effectively transferable to these closely related species of the family Moraceae.
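The percentages quoted above follow directly from the marker counts, and the same ratio can be computed per species from a table of amplification calls. The sketch below is illustrative only: the function names and the toy call table (`marker_A` to `marker_D`) are hypothetical and are not data from Table 5.

```python
from typing import Dict


def transferability_pct(n_amplified: int, n_tested: int) -> float:
    """Percentage of tested markers that gave locus specific amplification."""
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    return 100.0 * n_amplified / n_tested


def per_species_transferability(calls: Dict[str, Dict[str, bool]]) -> Dict[str, float]:
    """calls maps marker -> {species: amplified?}; returns percent per species."""
    species = sorted({s for row in calls.values() for s in row})
    return {
        s: transferability_pct(
            sum(1 for row in calls.values() if row.get(s, False)), len(calls)
        )
        for s in species
    }


# Counts reported in the text: 107 of 137 genomic and 42 of 51 genic markers.
genomic_pct = transferability_pct(107, 137)
genic_pct = transferability_pct(42, 51)
overall_pct = transferability_pct(107 + 42, 137 + 51)

# Hypothetical amplification calls, for illustration only.
toy_calls = {
    "marker_A": {"Ficus": True, "Fig": False, "Jackfruit": True},
    "marker_B": {"Ficus": False, "Fig": False, "Jackfruit": True},
    "marker_C": {"Ficus": True, "Fig": True, "Jackfruit": True},
    "marker_D": {"Ficus": False, "Fig": False, "Jackfruit": False},
}
toy_pct = per_species_transferability(toy_calls)
```

Rounded to whole numbers, the three reported ratios give 78%, 82% and 79%, the last matching the overall figure quoted in the abstract.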

Diversity analysis
Genetic diversity among the mulberry and three closely related species from the family Moraceae was analyzed using the 188 locus specific markers. We used two clustering approaches viz., Unweighted Neighbor Joining (NJ) and factorial analysis (FA) to group the species and genotypes. The genetic relationships among the species and mulberry genotypes based on NJ and FA are presented in Figures 3 and 4. The two approaches were congruent and grouped the species and genotypes into four clusters. A. heterophyllus, F. bengalensis and F. carica segregated into a distinct cluster (I) while the mulberry species and genotypes clustered separately (II, III and IV). It was interesting to note that Dudia white clustered along with M. lavigata and M. assambola, while all other mulberry species and genotypes grouped into clusters III and IV. Though the dendrogram in Figure 3 shows clusters III and IV as different, the bootstrap values indicate that these clusters are not significantly distinct. It is therefore apparent that all the mulberry genotypes and species share common alleles except the genotype Dudia white and the mulberry species M. lavigata and M. assambola. The diversity structure represented by the factorial analysis also indicated a similar grouping pattern for the mulberry species and genotypes (Figure 4). Though Dudia white is often considered a genotype of M. alba, there is no firm molecular evidence for its origin.
The genetic relatedness of the 14 species and genotypes is presented in Table 7. Based on the dissimilarity matrix, Fig and UP105 showed the maximum dissimilarity (93.8%) and Fig and Ficus showed the least (38%). Among the mulberry species and genotypes, the minimum genetic dissimilarity (44.4%) was observed between M. alba and M. exotica and the highest dissimilarity (74.7%) was found between Dudia white and UP105. These two genotypes differed significantly in physiological traits such as root length and water use efficiency [50]. Overall, the diversity analysis clearly indicates that the markers reported in this study are well conserved across the taxa and can be effectively utilized to study the genetic relationships among varieties, genotypes and species of Moraceae.

Conclusion
Considering the commercial importance of mulberry and the complexity of trait based breeding, a focused molecular breeding strategy needs to be evolved for the genetic enhancement of this crop. Lack of sufficient genomic resources such as SSR markers has been one of the major constraints. We report a total of 188 robust locus specific genomic and genic SSR markers for mulberry. The markers displayed high levels of polymorphic information content (PIC) and heterozygosity, enhancing the opportunities of using these markers in diversity analysis as well as for tagging QTLs governing complex agronomic and physiological traits. All the markers developed have been deposited in the NCBI/EMBL database and are publicly available.

Plant materials and DNA extraction
Two strategies were adopted to generate microsatellite markers as a genomic resource for mulberry. Microsatellite motifs in the genomic regions were identified by adopting the pre-cloning enrichment strategy using genomic DNA isolated from the mulberry genotype Dudia white. Similarly, a library of stress expressed sequence tags (ESTs) was analyzed to identify microsatellite motifs in genic regions of the mulberry genome. Details of the methodology adopted are described below.
Pre-cloning enrichment strategy for the construction of genomic library and mining of microsatellite motifs
The SSR enriched genomic library was constructed by a modified method of Saghaimaroof et al. [58]. Four micrograms of high quality genomic DNA was extracted from the genotype Dudia white. This genotype was identified based on extensive phenotyping carried out with a diverse set of mulberry germplasm [50]. The genomic DNA was digested with the blunt-end generating restriction endonuclease RsaI (MBI Fermentas, USA). This restriction reaction generated a large number of fragments of approximately 500-1000 base pairs. The Super SNX linkers, consisting of a Super SNX 24-mer (5'-GTTTAAGGCCTAGCTAGCAGAATC-3') and a phosphorylated 28-mer (5'-pGATTCTGCTAGCTAGGCCTTAAACAAAA-3'), were ligated to the blunt termini of the restriction fragments for 2 hours at 37°C. To confirm linker ligation, 10 μl of the digested and ligated product was pre-amplified using 1.5 μl of Super SNX24 forward primer (10 μM), 150 μM of dNTPs, 2 mM MgCl2, 1 unit of Taq DNA polymerase and 25 μg/ml of BSA in a volume of 25 μl. Self-ligation of the linkers was avoided by adding 1 unit of the restriction enzyme XmnI. PCR amplification was carried out with a program consisting of an initial DNA denaturation step of 95°C for 2 min followed by 20 cycles of: DNA denaturation step at 95°C for 20 s, primer annealing cycle with the appropriate temperature for specific primer pairs for 20 s and a DNA

Figure 4 Factorial analysis for grouping of mulberry species, genotypes and three related species using genomic and genic SSR markers. Ficus (Ficus bengalensis), Fig (Ficus carica) and Jackfruit (Artocarpus heterophyllus) were the closely related species examined for the transferability of microsatellite markers developed. All species and genotypes belong to family Moraceae.

The touchdown PCR conditions allow the microsatellite probes to hybridize with complementary DNA repeat fragments (i.e., presumably long perfect repeats) when the reaction mixture is at or near the melting temperature of the microsatellite probes. Hybridized fragments were selectively isolated using Streptavidin coated paramagnetic beads (Roche, Mannheim, Germany). Enriched DNA fragments were amplified with Super SNX24 primers and purified using a PCR purification column (Sigma, USA). The purified enriched products were ligated to the pTZ57R/T vector (MBI Fermentas, USA) using T4 DNA ligase overnight at 16°C. The ligated genomic inserts were cloned in competent E. coli DH5α host cells and grown overnight at 37°C. The transformed colonies were confirmed by PCR using M13 universal primers (3 μM), 100 μM dNTPs, 2 mM MgCl2, 1 U Taq DNA polymerase and 1X PCR buffer, at an annealing temperature of 58°C for 30 cycles. PCR products of the recombinant clones were purified using a PCR purification column (Sigma, USA) and sequenced using M13 forward and reverse primers on an ABI 3700 sequencer.

Development of EST library to identify genic microsatellite markers
A stress transcriptome was developed by extracting the total mRNA from the leaves of water stressed and well watered mulberry plants. A widely adopted mulberry variety, K2, was used for this purpose. A modified guanidinium isothiocyanate protocol [59] was adopted to isolate total RNA from mature leaf tissue. Messenger RNA (mRNA) was then isolated from 1 mg of total RNA using an mRNA isolation kit (Promega). The mRNA was reverse transcribed into cDNA and the ESTs were isolated as reported earlier [19]. These EST sequences were used in this investigation to develop genic SSR markers.

SSR marker development
Initially, the sequences were analyzed to identify unique and non-redundant libraries of genic and genomic regions for designing primers. The nucleotide sequences were analyzed using Clustal-W, an on-line tool, to determine the complementarity between pairs of sequences. The non-redundant sequences were analyzed with the "Mreps" software (http://bioinfo.lifl.fr/mreps/mreps.php) to identify sequences containing microsatellite motifs. The repeat motifs identified ranged from a single nucleotide base (mono nucleotide repeat -MNR) to more than six bases (long nucleotide repeat -LNR). The MNR and LNR sequences were omitted from further analysis and primers were designed only for the sequences with repeat motifs of two nucleotides (di-nucleotide repeats -DNR) to six nucleotides (hexa-nucleotide repeats -HNR). Primer3, also an on-line tool, was used for designing appropriate primers [60]. The quality of primers was determined using the FAST PCR program and only those primers that would amplify a fragment of between 150 and 450 base pairs of template DNA were selected. Synthesis of these primers was outsourced (Bioserve India Pvt. Ltd., Hyderabad). Each of the primer pairs was standardized for locus specific amplification using the genomic DNA of Dudia white as a template. Gradient-PCR was carried out in a total volume of 15 μL containing 2 ng of DNA template, 1× Taq buffer, 2 mM MgCl2, 0.2 mM dNTPs, 1 U Taq DNA polymerase (MBI Fermentas, USA) and 3 μM each of forward and reverse primers. Amplification was performed in an epGradient Master cycler (Eppendorf, Hamburg) with the following PCR conditions: DNA denaturation at 95°C for 5 min followed by 30 cycles of 95°C for 1 min, primer annealing at 45-65°C for 45 s (depending on the Ta of each primer pair) and DNA extension at 72°C for 45 s, and a final extension step at 72°C for 8 min.
The details of the primer sequences, their annealing temperatures, expected amplicon sizes etc. are summarized in Table 2 and Table 3. The amplified products were resolved on 3% agarose gels. Only those primer pairs that produced unambiguous single band amplification were considered for the development of SSR markers in mulberry. This stringency ensured the development of robust SSR markers that can be effectively used for diversity analysis as well as for constructing genetic linkage maps. Only such markers were used further for validation.
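The motif mining step in this section (di to hexa nucleotide motifs, tracts of at least 15 nucleotides, MNRs excluded) can be approximated with a regular expression scan. The sketch below is a simplified stand-in written for illustration; it is not the Mreps algorithm, it finds perfect tandem repeats only, and the function name and defaults are assumptions.

```python
import re
from typing import List, Tuple


def find_ssrs(
    seq: str, min_motif: int = 2, max_motif: int = 6, min_tract_nt: int = 15
) -> List[Tuple[str, int, int]]:
    """Return (motif, start, end) for perfect tandem repeats; end is exclusive."""
    seq = seq.upper()
    hits = []
    for k in range(min_motif, max_motif + 1):
        # A k-mer followed by one or more exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1+" % k)
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if len(m.group(0)) < min_tract_nt:
                continue
            # Skip motifs that are copies of a shorter unit, e.g. 'AA' or 'GAGA',
            # so mono nucleotide runs and redundant hits are not reported.
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((motif, m.start(), m.end()))
    # The reported motif is the frame in which the leftmost match starts,
    # so a rotation of the expected motif may be returned for some inputs.
    return sorted(hits, key=lambda h: h[1])


toy_seq = "ACGT" + "GA" * 10 + "TTCAGGCC" + "AAT" * 6 + "CCGG"
ssrs = find_ssrs(toy_seq)
```

On the toy sequence, the scan reports the 20 nt GA tract and the 18 nt AAT tract and ignores their GAGA, GAGAGA and AATAAT readings.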

Validation of markers
Each of the markers was examined for its ability to amplify the genomic DNA from other mulberry species and genotypes. Genomic DNA was extracted from seven distinct mulberry species and four contrasting genotypes of mulberry using a modified CTAB method [61]. These four genotypes were selected based on extensive phenotyping of a set of 295 germplasm accessions for variability in root traits and water use efficiency. Thus, the four genotypes represent contrasts for these highly relevant drought adaptive traits. The list of the mulberry species and genotypes is given in Table 4. The template DNA from the different mulberry species and genotypes was amplified using each of the primer pairs for genic and genomic microsatellite markers. The PCR conditions were the same as those adopted for the gradient PCR explained above. All the amplified products were analyzed on the microchip based electrophoresis system MultiNA (Shimadzu biotech, Japan) and the highest peak detected by the fragment analyzer was scored for the presence of the expected band for each primer pair. The polymorphism data was scored and used for the determination of polymorphic information content (PIC) for each marker as per Liu and Muse [62]. Observed heterozygosity and allele diversity were computed using the Power Marker 3.25 software [62]. The most appropriate locus specific markers for revealing the variation among the species and genotypes were determined by principal component analysis (PCA).
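For readers who want to reproduce the per-marker statistics outside the software used here, the sketch below computes allele frequencies, PIC and observed heterozygosity from scored genotypes. It assumes the widely used Botstein-type formula, PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j²; whether this matches the exact implementation used in the study is an assumption, and the genotype encoding (tuples of allele sizes in bp) and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Hashable, Sequence, Tuple

Genotype = Tuple[Hashable, ...]  # e.g. (150, 160) for a diploid heterozygote


def allele_frequencies(genotypes: Sequence[Genotype]) -> Dict[Hashable, float]:
    """Relative frequency of each allele across all scored genotypes."""
    counts = Counter(allele for g in genotypes for allele in g)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def pic(freqs: Dict[Hashable, float]) -> float:
    """Polymorphic information content (Botstein-type formula, assumed here)."""
    p = list(freqs.values())
    sum_sq = sum(x * x for x in p)
    cross = sum(2 * (a * a) * (b * b) for a, b in combinations(p, 2))
    return 1.0 - sum_sq - cross


def observed_heterozygosity(genotypes: Sequence[Genotype]) -> float:
    """Fraction of individuals carrying more than one distinct allele."""
    return sum(1 for g in genotypes if len(set(g)) > 1) / len(genotypes)


# Hypothetical scores for one marker in four individuals (allele sizes in bp).
toy_genotypes = [(150, 150), (150, 160), (160, 160), (150, 160)]
toy_pic = pic(allele_frequencies(toy_genotypes))
toy_ho = observed_heterozygosity(toy_genotypes)
```

A monomorphic marker gives a PIC of zero under this formula, which is consistent with the lower bound of 0.000 reported for the mulberry-only data set.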

Genetic diversity and cross species transferability
Closely related species share significant levels of sequence homology, and hence a specific SSR marker may detect a similar locus in other related species. Establishing the transferability of markers to other related species is therefore important while developing locus specific marker systems. The transferability of these markers was examined in three closely related species belonging to the family Moraceae, namely Ficus (F. bengalensis), Fig (F. carica) and Jackfruit (A. heterophyllus) (Table 4).
The percentage of transferability of the markers was calculated for each species as the ratio of the number of target loci amplified to the total number of loci analyzed. The allelic diversity data obtained for all the microsatellite loci amplified were used to compute the genetic dissimilarity using the DARwin v.5.0 program [63]. The dissimilarity matrix was further used to group the species according to their genetic relatedness based on the Unweighted Neighbor Joining method and factorial analysis.
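The text does not state which dissimilarity index was selected in the software. As an illustration, the sketch below implements a simple matching dissimilarity for allelic data, d = 1 - (1/L) Σ_l (m_l / π), where L is the number of loci scored in both individuals, m_l the number of shared alleles at locus l and π the ploidy. The choice of this index, the diploid default and the toy genotypes are assumptions, not details taken from the study.

```python
from collections import Counter
from typing import Dict, Hashable, Optional, Tuple

Genotype = Optional[Tuple[Hashable, ...]]  # None marks a locus that did not amplify


def shared_alleles(g1: Tuple[Hashable, ...], g2: Tuple[Hashable, ...]) -> int:
    """Number of matching alleles, counted as a multiset intersection."""
    return sum((Counter(g1) & Counter(g2)).values())


def simple_matching_dissimilarity(
    ind1: Dict[str, Genotype], ind2: Dict[str, Genotype], ploidy: int = 2
) -> float:
    """Simple matching dissimilarity over loci scored in both individuals."""
    loci = [
        locus
        for locus in ind1
        if ind1[locus] is not None and ind2.get(locus) is not None
    ]
    if not loci:
        raise ValueError("no locus scored in both individuals")
    matched = sum(shared_alleles(ind1[l], ind2[l]) / ploidy for l in loci)
    return 1.0 - matched / len(loci)


# Hypothetical genotypes at three markers (allele sizes in bp).
ind_a = {"m1": (150, 150), "m2": (200, 210), "m3": (300, 310)}
ind_b = {"m1": (150, 160), "m2": (200, 210), "m3": (320, 330)}
d_ab = simple_matching_dissimilarity(ind_a, ind_b)
```

Loci that failed to amplify in either individual are skipped here; how such loci are treated affects the values, particularly for the related species in which not every marker was transferable.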