Characterization, validation, and cross-species transferability of EST-SSR markers developed from Lycoris aurea and their application in genetic evaluation of Lycoris species

Background The Lycoris genus includes many ornamentally and medicinally important species. Polyploidization and hybridization are considered modes of speciation in this genus, implying great genetic diversity. However, the lack of effective molecular markers has limited the genetic analysis of this genus. Results In this study, mining of EST-SSR markers was performed using transcriptome sequences of L. aurea, and 839 primer pairs for non-redundant EST-SSRs were successfully designed. A subset of 60 pairs was randomly selected for validation, of which 44 pairs amplified products of the expected size. Cross-species transferability of the 60 primer pairs was assessed in L. radiata Hreb, L. sprengeri Comes ex Baker, L. chinensis Traub and L. anhuiensis; between 38 and 77% of the primers amplified products in these Lycoris species. Furthermore, 20 and 10 amplification products were selected for sequencing verification in L. aurea and L. radiata, respectively. All products were validated as the expected SSRs. In addition, 15 SSRs, including 10 sequence-verified and 5 unverified SSRs, were selected and used to evaluate the genetic diversity of seven L. radiata lines. Among these, there were three sterile lines, three fertile lines and one line represented by the offspring of one fertile line. Unweighted pair group method with arithmetic mean (UPGMA) analysis demonstrated that the outgroup, L. aurea, was separated from the L. radiata lines and that the seven L. radiata lines were clustered into two groups, consistent with their fertility. Interestingly, even a dendrogram with 34 individuals representing the seven L. radiata lines was almost consistent with fertility. Conclusions This study supplies a pool of 839 potential non-redundant SSR markers for genetic analysis of the Lycoris genus. These markers show a high amplification rate, transferability and efficiency, and will facilitate genetic analysis and breeding programs in Lycoris.
Supplementary Information The online version contains supplementary material available at 10.1186/s12870-020-02727-3.


Background
The genus Lycoris, a member of the Amaryllidaceae family, contains 30 species distributed all over the world. In China, there are 15 species and 1 variety, among which L. radiata and L. aurea are the most widespread. Species of the Lycoris genus are used as medicinal herbs that produce unique Amaryllidaceae alkaloids, which exhibit a wide range of medicinal activities, such as anti-viral, antitumor and acetylcholinesterase-inhibitory activities [1][2][3][4]. Moreover, since flowers of this genus have a variety of colors and shapes, and some are also fragrant [5], they are very popular as ornamental plants. Hence, the demand for Lycoris species has been increasing.
Molecular markers have proven to be effective in the analysis of genetic diversity. Many types of DNA-based molecular markers have been employed for this purpose, such as RFLPs, RAPDs and SSRs [16,17], among which SSRs are very popular for their co-dominant inheritance, good reproducibility and cost-effectiveness [18,19]. SSRs are developed from two types of sequence resources. One is genomic DNA, yielding genomic SSRs, and the other is expressed sequence tags (ESTs), yielding EST-SSRs or genic SSRs. Compared to genomic SSRs, EST-SSRs are more transferable across related species [20]. Nowadays, more and more species have been sequenced using Next Generation Sequencing techniques, and thus a large number of reference ESTs are available in GenBank. Therefore, developing EST-SSR markers will be convenient for many species, especially for those whose genomic research progresses slowly. For example, EST-SSR markers have been developed from transcriptome data of Curcuma alismatifolia [21], Dendrobium officinale [22], Salix psammophila [23] and Robinia pseudoacacia [24].
In Lycoris, transcriptome sequencing has been performed in L. longituba, L. aurea and L. sprengeri [25][26][27][28], and abundant sequences containing SSRs have been obtained, which offer a convenient and cost-effective opportunity to develop EST-SSRs. Nonetheless, only 27 EST-SSRs have been developed from L. longituba [25] and L. sprengeri [29], and 27 genomic SSRs from L. radiata, as well as a hybrid between L. aurea and L. radiata [30,31]. Thus, more SSRs are needed in the Lycoris genus for genetic analysis and for facilitating research on germplasm conservation and breeding. Previously, using short-read sequencing technology (Illumina), we sequenced the transcriptome of L. aurea seedlings subjected to methyl jasmonate (MJ) treatment and assembled 59,643 unigenes [27]. In this study, EST-SSR mining was carried out based on these data. The major objective of this study was to supply a large pool of non-redundant SSR markers for the genetic analysis of the Lycoris genus and to facilitate the development and utilization of elite germplasm in this genus.

Development and characterization of EST-SSRs
Among the 59,643 unigenes assembled from transcriptome sequences of L. aurea [27], a total of 4637 SSRs were detected using the MISA program. To eliminate redundant SSRs arising from alternative transcripts of the same gene, sequence alignment was performed with BioEdit. Finally, primers were designed for a total of 839 SSRs, including 623 tri-nucleotide repeats, 147 di-nucleotide repeats, 9 tetra-nucleotide repeats, 13 penta-nucleotide repeats, and 47 hexa-nucleotide repeats. As shown in Table 1, tri- and di-nucleotide repeats were the main types of the detected SSRs, which was consistent with a previous report from an analysis using 454 pyrosequencing [26]. Our data showed that tri-nucleotide repeats accounted for 74.25%, among which (AAG)n/(CTT)n was the most common type. In addition, di-nucleotide repeats accounted for 17.52%, among which (AG)n/(CT)n was the most common type and (GC)n/(GC)n was the least common. Among the EST-SSR sequences, there were 28 ESTs containing 2 SSR loci, 4 ESTs containing 3 SSR loci and 1 EST containing 8 SSR loci. In all, a total of 839 pairs of SSR-specific primers were designed using Primer3 (http://bioinfo.ut.ee/primer3/), representing 796 unigenes (Table S1).

Validation of EST-SSRs
To validate the designed primers, a subset of 60 primer pairs was randomly selected and tested in L. aurea. The subset consisted of 9 di-nucleotide repeats, 46 tri-nucleotide repeats, 1 tetra-nucleotide repeat, 2 penta-nucleotide repeats and 2 hexa-nucleotide repeats. Forty-four primer pairs produced clear bands that matched the predicted sizes, corresponding to 73.3% effective amplification. To confirm whether the amplification products are SSRs, 20 amplification products observed as clear and strong bands on a high-resolution agarose gel were selected, cloned and sequenced (Table S6). The results showed that the repeat motifs were consistent with the predicted sequences for all 20 SSRs, though some SSRs varied slightly in repeat number. As shown in Table 2, the repeats in LaES12, LaES18, LaES25, LaES36 and LaES41 were reduced, whereas those in LaES20, LaES22, LaES31, and LaES53 were increased, while the others were consistent with the predicted repeat number. Furthermore, the 20 sequence-verified SSRs were employed for polymorphism analysis of 11 L. aurea individuals (Table S2). One hundred clear bands were amplified and 85% of them were polymorphic. The results demonstrated that these EST-SSRs are useful for genetic analysis in L. aurea.

Transferability of the EST-SSR markers within the Lycoris genus
To verify whether the primer pairs designed from the EST sequences of L. aurea could also effectively amplify the same SSR motifs in other Lycoris species, L. radiata and L. sprengeri with x = 11, as well as L. chinensis and L. anhuiensis with x = 8, were selected for analysis. As shown in Table 3, 44 of the 60 EST-SSRs amplified products of the expected size in L. aurea and L. radiata, while 45, 23 and 30 EST-SSRs were amplifiable in L. sprengeri, L. chinensis and L. anhuiensis, respectively, accounting for a 38-77% amplification rate. These results demonstrated that amplification rates differed greatly among the assessed species: high in L. radiata and L. sprengeri, and low in L. chinensis and L. anhuiensis. Additionally, two markers, LaES23 and LaES24, which gave no amplification products in L. aurea, produced amplicons in L. radiata and/or L. sprengeri, indicating that an insertion may have occurred in this region of L. aurea.
Further sequencing verification was performed to validate whether such amplification products correspond to the same SSRs in these Lycoris species. As shown in Table 2, ten of the 20 sequence-verified SSRs in L. aurea were selected and confirmed in L. radiata. Moreover, LaES36 and LaES53 were validated in all five species (Table S6). In general, the repeat motifs were consistent among the Lycoris species, although there were small differences in the number of repeats. Among the ten SSRs sequenced for L. radiata, the number of repeats decreased in five SSRs, increased in two and was unchanged in the remaining three (Table 2). Meanwhile, the sequences of LaES36 and LaES53 from all five species were aligned. As shown in Fig. 1, multiple sequence deletions were observed in the amplification product of LaES36 in L. radiata, but the flanking sequences of the repeat motifs, which were used as primer sequences, were conserved in all five species. Conservation of the SSR flanking sequences would account for the transferability of EST-SSRs [20]. The amplification products subjected to sequence verification were all validated as SSRs, which suggests that SSR primers yielding products of proper size probably amplify authentic SSRs, further indicating the high potential of this set of EST-SSRs.

GO (gene ontology) annotation of genes harboring the set of SSRs
EST-SSRs are derived from transcribed genes and hence the annotation of the corresponding genes will be helpful for their application in trait associated marker selection [32]. GO annotations of the genes harboring EST-SSRs are listed in  19 participate in processes such as Golgi organization and cysteine biosynthesis. As the transcriptome sequences were obtained from seedlings of L. aurea treated with MJ, it is acceptable that many of the genes were associated with processes involved in responses to stimulus or signals.

Genetic diversity and population structure analysis of L. radiata lines
To apply the newly developed SSR markers to genetic analysis, 7 L. radiata lines differing in fertility were selected for diversity analysis (Table 4). Of these, three are fertile, three are sterile, and one consists of the progeny of one fertile line, Pop6. Considering the GO annotations and transferability of the set of 60 EST-SSRs, 15 EST-SSR markers, comprising 10 sequence-verified and 5 unverified SSR markers (LaES3, LaES20, LaES31, LaES46 and LaES58), were chosen to explore the genetic diversity of the 7 L. radiata lines. A total of 88 bands were detected in the L. radiata lines, among which 80 were polymorphic. The parameters of genetic diversity and differentiation are described in Tables 5 and 6, respectively. Polymorphism information content (PIC) values of the EST-SSR markers used ranged from 0.374 to 0.850, except for LaES18 and LaES27, which showed no polymorphism in the L. radiata lines. The mean PIC values of the verified and unverified markers were 0.636 and 0.681, respectively (Table 5). Also, the genetic differentiation (Gst) and gene flow (Nm) parameters were similar between the verified and the unverified markers (Table 6), which suggested that both SSR groups are useful for genetic analysis in L. radiata. The average genetic differentiation parameter Gst was 0.752 among the 7 L. radiata lines, which means that diversity between populations accounted for 75.2% of total diversity, suggesting high genetic diversity among populations. Nevertheless, gene flow among those L. radiata lines was very low, as Nm was only 0.172 (Table 6).
Table 3 Cross-species amplification of the 60 microsatellite loci in Lycoris species (columns: Markers; amplification in L. aurea, L. chinensis, L. anhuiensis, L. radiata and L. sprengeri)
We propose that the low gene flow among L. radiata lines may result from asexual reproduction, which is the main reproduction mode of Lycoris. Therefore, genetic diversity assessment of Lycoris accessions collected from different habitats is important and valuable for the development of elite germplasm.
To elucidate the genetic relationships of these 7 L. radiata lines, L. aurea was added as an outgroup to construct a dendrogram based on Nei's genetic distance. As shown in Table 7, Nei's genetic distances varied widely between these lines, ranging from 0.0064 to 0.5568. The distance between L. aurea and Pop4 was the largest, while that between Pop2 and Pop3 was the smallest, followed by that between Pop6 and Pop7. A dendrogram was then constructed from the genetic distances using the unweighted pair group method with arithmetic mean (UPGMA). As shown in Fig. 2, the outgroup was separated from L. radiata, while the L. radiata lines were clustered into two groups. Interestingly, these two groups were consistent with their fertility. The three sterile lines comprised group I, while the three fertile lines plus the offspring population formed group II. These results suggest a high efficiency of the 15 EST-SSRs.
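The UPGMA clustering step can be illustrated with a minimal sketch. The distance values and line labels below are hypothetical placeholders chosen for illustration, not the values of Table 7, and the function `upgma` is an illustrative helper rather than the software used in this study; branch lengths and bootstrap support are omitted.

```python
def upgma(labels, matrix):
    """Return a nested-tuple tree built by UPGMA from a symmetric distance matrix."""
    # clusters: id -> (subtree, number of leaves in the subtree)
    clusters = {i: (label, 1) for i, label in enumerate(labels)}
    # dist: (i, j) with i < j -> distance between clusters i and j
    dist = {(i, j): matrix[i][j]
            for i in range(len(labels)) for j in range(i + 1, len(labels))}
    next_id = len(labels)
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of clusters
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        new_dist = {}
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            # Size-weighted mean, so every original leaf counts equally
            new_dist[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        # Drop distances that involve either merged cluster
        dist = {p: v for p, v in dist.items() if i not in p and j not in p}
        dist.update(new_dist)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree


# Hypothetical distances: one outgroup, two sterile lines, one fertile line
labels = ["Out", "S1", "S2", "F1"]
matrix = [
    [0.00, 0.50, 0.52, 0.45],
    [0.50, 0.00, 0.01, 0.20],
    [0.52, 0.01, 0.00, 0.22],
    [0.45, 0.20, 0.22, 0.00],
]
print(upgma(labels, matrix))
```

With these placeholder distances the two closest lines are joined first, and the outgroup joins last, which mirrors the general shape of an outgroup-rooted dendrogram.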
To further evaluate the relationships of the 34 L. radiata individuals, population structure was analyzed with Structure 2.3.4. The ΔK method showed that the optimal K value was K = 4 (Fig. 3a), which implied the existence of four main groups among the 34 L. radiata individuals. As shown in Fig. 3b, individuals from each line had the same ancestry except for lines 6 and 7, in which some individuals were scattered across more than one group, suggesting admixed ancestry of Pop6 and Pop7. The Q value profile of the populations (Table S3) also indicated that Pop6 and Pop7 were admixed. As shown in Fig. 3b, six individuals, one from Pop6 (the parent line) and five from the offspring line Pop7, were assigned as intermediate. It is reasonable that a parental line with an admixed ancestry would produce greater diversity in its offspring line.
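As an illustration of how a ΔK profile of this kind is obtained, the following sketch computes ΔK from replicate LnP(D) values. It is a simplified form that uses the mean LnP(D) at each K and the sample standard deviation across runs; the exact computation performed by Structure Harvester may differ, and the LnP(D) values are hypothetical, not those of this study.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_runs):
    """lnp_runs: dict K -> list of LnP(D) values from replicate runs.

    Returns dict K -> delta K for every K that has both neighbours K-1 and K+1.
    """
    avg = {k: mean(values) for k, values in lnp_runs.items()}
    delta = {}
    for k in sorted(lnp_runs):
        if k - 1 in avg and k + 1 in avg:
            # Absolute second-order difference of the mean log-likelihood
            second_diff = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1])
            delta[k] = second_diff / stdev(lnp_runs[k])
    return delta


# Hypothetical LnP(D) values, two replicate runs per K
runs = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-700.0, -702.0],
    4: [-690.0, -692.0],
    5: [-685.0, -687.0],
}
dk = evanno_delta_k(runs)
best_k = max(dk, key=dk.get)
print(best_k, dk)
```

The K with the largest ΔK is taken as the most likely number of clusters; ΔK is undefined for the smallest and largest K tested because both neighbours are required.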

Discussion
Genetic research on Lycoris species lags somewhat behind that on other species. So far, some molecular markers, such as SCoT and ITS, have been used to determine the genetic diversity and evolutionary relationships among Lycoris species [33,34]. However, these molecular markers are not sufficient for studies such as linkage group construction and marker-assisted selection (MAS). Instead, SSRs are powerful DNA markers for this type of research because of their co-dominant inheritance. Though 27 genomic SSR markers and 27 genic SSR markers are available in Lycoris species [25,29,30,31], they are far from adequate for thorough genetic analysis of the Lycoris genus. Since the Lycoris genus has a massive genome [15], a large number of DNA markers are necessary to cover the whole genome. Thus, the list of 839 potential EST-SSR markers released in this study will be helpful for genetic analysis, especially for genetic improvement of Lycoris.
Table 3 (totals row) Primer pairs yielding products of proper size: L. aurea 44, L. chinensis 23, L. anhuiensis 30, L. radiata 44, L. sprengeri 45. Note: '+' indicates products with proper sizes; '-' indicates no products or products with improper sizes.
Transcriptome sequencing technology makes it easier to develop EST-SSRs, and nowadays abundant EST-SSRs are available [35][36][37]. However, sequence redundancy is a major disadvantage of EST-SSRs due to multiple alternative transcripts for the same gene [21]. In addition, some SSRs occasionally exist in reverse complementary formats, but in fact they are identical. These common phenomena in EST-SSRs from transcriptome sequences are often ignored during EST-SSR development. Thus, the actual number of EST-SSRs obtained from transcriptome sequences should be much lower than the number detected. Moreover, such redundancy reduces the representativeness of the EST-SSRs developed, and in turn their efficiency in application. Bazzo et al. successfully developed a total of 418 SSRs from 7492 detected EST-SSRs after considering their redundancy from transcriptome sequences of macaúba palm (Acrocomia aculeata) [37]. In our study, thousands of EST-SSRs were detected from transcriptome sequences of L. aurea seedlings treated with MJ, but many of these EST-SSRs were redundant duplicates. By eliminating duplicated EST-SSRs, we identified 839 SSRs from 4637 detected EST-SSRs as a pool of non-redundant candidate EST-SSRs, which would be promising for SSR development and correspondingly for genetic analysis.
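The point that reverse-complementary motifs such as (AAG)n and (CTT)n describe the same repeat can be made concrete with a short sketch. The helper names `canonical_motif` and `count_motif_classes` are illustrative assumptions; this is not the alignment-based redundancy check performed with BioEdit in this study, only a way to group equivalent motif notations.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_motif(motif):
    """Return one representative for a motif, its rotations and its reverse complement."""
    motif = motif.upper()
    revcomp = motif.translate(_COMPLEMENT)[::-1]
    # All cyclic rotations of both strands describe the same tandem repeat
    rotations = [s[i:] + s[:i] for s in (motif, revcomp) for i in range(len(s))]
    return min(rotations)


def count_motif_classes(motifs):
    """Count motifs after collapsing equivalent notations into one class."""
    return Counter(canonical_motif(m) for m in motifs)


print(count_motif_classes(["AAG", "CTT", "GAA", "AG", "CT"]))
```

Here AAG, CTT and GAA fall into a single class, as do AG and CT, so five motif strings reduce to two motif classes.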
When EST-SSRs are developed from transcribed eukaryotic genes, introns may affect the amplification products. Additionally, assembly errors may also affect amplification of the SSRs [35,36]. Therefore, in this study, sequencing confirmation was conducted on the amplicons that generated clear bands of expected sizes, and the results confirmed that the designed primers amplified the expected loci. Assuming that the whole primer pool has the same amplification rate as the tested subset in L. aurea, about 615 primer pairs in the pool could be developed as EST-SSR markers for L. aurea, suggesting the great potential of the primer pool.
EST-SSRs have high transferability among closely related species [35,36,38,39]. When assaying the transferability of EST-SSRs, the amplification conditions, which affect the effectiveness and fidelity of amplification, are an important factor [19]. In this study, the annealing temperature was set at 58°C to avoid nonspecific amplification when performing PCR amplification in all five species. In total, 23 out of 44 EST-SSRs from L. aurea were transferable among L. radiata, L. sprengeri, L. chinensis and L. anhuiensis, which corresponds to 52.27% transferability, higher than that observed with genomic SSRs [30,31]. Interestingly, it was found that the transferability was high in L. radiata and L. sprengeri, whereas it was lower in L. chinensis and L. anhuiensis. It is known that L. aurea, L. chinensis, L. anhuiensis, L. radiata and L. sprengeri are common and primitive species of Lycoris and also that the Lycoris species that originated from hybridization are mostly hybrids derived from the above species [14,15]. Thus, the EST-SSR transferability observed in this study will supply informative and practical guidance for application of this set of EST-SSRs in Lycoris.
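The estimate of about 615 usable primer pairs and the 52.27% transferability quoted above follow from simple proportions. The short calculation below reproduces them from the counts given in the text; the variable names are illustrative.

```python
designed = 839      # primer pairs designed (from the text)
tested = 60         # randomly selected subset
amplified = 44      # pairs giving expected-size products in L. aurea
transferable = 23   # of the 44, reported as transferable among the four other species

amplification_rate = amplified / tested
expected_usable = designed * amplification_rate
transferability = transferable / amplified

print(f"amplification rate: {amplification_rate:.1%}")
print(f"expected usable primer pairs: {expected_usable:.0f}")
print(f"transferability: {transferability:.2%}")
```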
The polymorphism information content (PIC) is an important parameter that reflects the discriminating power of a DNA marker. In general, markers with PIC > 0.5 are defined as highly polymorphic [40]. In our case, the average PIC value of the 13 EST-SSR markers used, excluding LaES18 and LaES27, was 0.654, higher than that of many EST-SSR markers developed from transcriptome sequences, such as those for Torreya grandis (0.357) [41], Mucuna pruriens (0.24) [42] and Lagerstroemia spp (0.589) [43]. The high PIC values observed in our work suggest a high efficiency of this set of markers in genetic analysis. The genetic relationship of the 7 L. radiata lines and L. aurea was analyzed using the 15 EST-SSR markers. The results showed a large genetic distance between L. aurea and the L. radiata lines, with an average of 0.4547. The closest genetic distance, 0.0064, was found between Pop2 and Pop3, two lines from Nanjing Botanical Garden Mem. Sun Yat-Sen, which are possibly the same accession. The Q value profile of the Structure analysis also indicated the same ancestry for Pop2 and Pop3 (Table S3). The genetic distance between an offspring line, Pop7, and its parent line, Pop6, was 0.0304, also suggesting a close relationship. In addition, the Q value profile demonstrated that both Pop6 and Pop7 had an admixed ancestry of groups 1, 2, and 3, which is reasonable, since an admixed parent line can yield admixed offspring. In general, the genetic distances and genetic structure based on only 15 EST-SSRs were reasonable and acceptable.
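For readers unfamiliar with PIC, the following sketch computes it from allele frequencies using the widely used formulation attributed to Botstein and colleagues, PIC = 1 - Σp_i² - ΣΣ 2p_i²p_j² (i < j). This is an assumption about the form of the statistic; the exact procedure implemented in PIC_CALC, and its handling of the binary band data used here, is not described in the text. The function name `pic` and the example frequencies are illustrative.

```python
def pic(freqs):
    """Polymorphism information content for one locus from allele frequencies."""
    total = sum(freqs)
    p = [f / total for f in freqs]  # normalise so that frequencies sum to 1
    homozygosity = sum(x * x for x in p)
    cross = sum(2 * p[i] ** 2 * p[j] ** 2
                for i in range(len(p)) for j in range(i + 1, len(p)))
    return 1 - homozygosity - cross


print(pic([0.5, 0.5]))                 # two equally frequent alleles
print(pic([0.25, 0.25, 0.25, 0.25]))   # four equally frequent alleles
print(pic([1.0]))                      # monomorphic locus
```

A monomorphic locus gives a PIC of zero, which is why markers showing no polymorphism are excluded when averaging, and a locus needs more than two reasonably balanced alleles to exceed the 0.5 threshold for a highly polymorphic marker.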
Because EST-SSR markers are identified from transcribed RNA sequences and may be linked to functional genes with a possible impact on important traits, EST-SSRs may have advantages over genomic SSRs [19]. For example, González et al. focused on EST-SSRs from genes involved in some specific pathways, and three EST-SSRs were even able to discriminate different properties of the fruits [32]. In our case, the dendrogram based on only 15 EST-SSRs showed that the outgroup, L. aurea, was separated from the L. radiata lines and that the 7 L. radiata lines were clustered into two groups consistent with their fertility. Moreover, the UPGMA dendrogram of the 34 L. radiata individuals based on the 15 EST-SSR markers clustered all the individuals into two groups as well (Fig. 4), similar to that obtained from the analysis with the 7 L. radiata lines and an outgroup (Fig. 2). The first group consisted of all the individuals from the three sterile lines, plus two individuals from Pop7, and the other group comprised the remaining individuals from the fertile lines. Considering that LaES18 and LaES27 showed no polymorphism in the L. radiata lines, the dendrogram was effectively based on 13 EST-SSR markers, suggesting that those 13 EST-SSR markers are highly efficient. This high efficiency may be associated with the functions of the transcribed genes. Although the mechanism of fertility in L. radiata is unclear, many studies have demonstrated that gibberellic acid (GA) and auxin signaling are involved in regulating reproductive development [44,45]. Since crosstalk among ethylene, ABA, GA and auxin is well established [46], all these signals may play roles in reproductive development. Additionally, we speculate that processes of flower organ development, as well as vegetative phase change, may affect seed development and finally result in a reproductive trait: fertility or sterility.
Interestingly, among the 13 EST-SSRs, 4 are involved in hormonal signal response, 4 are associated with flower organ development and 3 participate in vegetative phase change. Furthermore, correlation analysis showed that 28 loci  (Table S5). Actually, these loci were from 10 of the above 11 EST-SSR markers, associated with flower organ development, vegetative phase change and hormonal signal response. In general, GO annotations of these 10 EST-SSRs may explain the high consistency between the clusters and fertility. Nonetheless, more work and efforts are needed to elucidate the mechanism of fertility in L. radiata.

Conclusions
This study included the mining of EST-SSR markers from transcriptome data of L. aurea. A potential pool of 839 non-redundant EST-SSR markers was supplied for Lycoris. Marker characterization and validation demonstrated that the newly developed SSR markers have high amplification rate and transferability. Moreover, nearly half of the set of SSRs have GO annotations, which would be useful for trait associated marker selection in Lycoris. Further, 15 EST-SSRs were selected for the genetic diversity analysis of 7 L. radiata lines, consequently the 7 L. radiata lines were clustered into two groups

Plant materials
The plant materials include 11 L. aurea individuals, 34 L. radiata individuals, and one individual each of L. sprengeri, L. chinensis and L. anhuiensis. Individuals of L. aurea, L. sprengeri, L. chinensis and L. anhuiensis were collected from Nanjing Botanical Garden Mem. Sun Yat-Sen (118°83′ E, 32°05′ N). Collection information and characteristics of L. radiata lines are provided in Table 4. The voucher specimens were identified by Prof. EST-SSR validation was performed in L. aurea (Table S2). Cross-species transferability analysis of EST-SSRs was performed in L. radiata, L. sprengeri, L. chinensis and L. anhuiensis. Thirty-four L. radiata individuals that belong to 7 lines were used for genetic diversity analysis (Table 4).

SSR mining and primer design
MISA [47] was used for microsatellite screening. Perfect di-, tri-, tetra-, penta-, and hexa-nucleotide motifs were detected by setting the parameters to a minimum of 6, 5, 4, 4, and 4 repeats, respectively. SSR sequences in different transcripts of the same gene were aligned using BioEdit to detect duplicate EST-SSRs. Primer pairs were designed using Primer3 (http://bioinfo.ut.ee/primer3/) with the parameters set as: primer length of 18-24
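To make the screening criteria for perfect SSRs concrete, the sketch below scans a sequence with regular expressions using the minimum repeat numbers stated above (6, 5, 4, 4 and 4 for di- to hexa-nucleotide motifs). It is a simplified illustration and not a reimplementation of MISA: the function name `find_perfect_ssrs` is hypothetical, compound SSRs are not handled, and the example sequence is invented.

```python
import re

# Minimum repeat counts per motif length, as stated in the text
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_perfect_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            # Skip motifs that are repeats of a shorter unit, e.g. "AA" or "AGAG"
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(1)) // k))
    return sorted(hits)


# Invented example: an (AG)7 run and an (AAG)5 run separated by a short spacer
example = "TTT" + "AG" * 7 + "CCGT" + "AAG" * 5 + "GGC"
print(find_perfect_ssrs(example))
```

Start positions are 0-based, and the motif is reported in the reading frame in which the run is first matched, so equivalent motifs such as AG and GA may both appear in larger data sets.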

DNA extraction
Young leaves of each individual were ground into powder in liquid nitrogen. DNA extraction was carried out according to the instructions of the plant genomic DNA Mini Kit (Tiangen, Beijing, China), and the DNA was checked on a 1% agarose gel to evaluate its quality and concentration. The total DNA samples were diluted to a concentration of 20 ng/μL with TE buffer and stored at −20°C for PCR amplification.

EST-SSR validation
PCR reactions were carried out in a 20 μL reaction volume containing 20 ng of genomic DNA, 0.5 μM of each primer, and 10 μL of 2x Taq Master Mix (Dye Plus) (Vazyme Biotech, Nanjing, China). The PCR reactions were performed in an Eppendorf Mastercycler ep gradient thermal cycler using the following program: 3 min at 94°C, followed by 35 cycles of 30 s at 94°C, 30 s at 58°C, and 30 s at 72°C, then a final extension at 72°C for 5 min. PCR products were analyzed via 2% MetaPhor™ agarose (Lonza.com) gel electrophoresis, and products with strong and clear bands were cloned into a T-vector and sequenced.

Genetic diversity analysis in L. radiata
PCR products were separated on 8% non-denaturing polyacrylamide gels for polymorphism analysis and visualized by silver staining. PCR products were manually scored based on allele size, with "0" recorded for the absence of a band and "1" for its presence. The binary data matrix was analyzed with POPGENE version 1.32 [48] to evaluate the population genetic parameters of the 7 L. radiata lines (Na, Ne, h, I) and the differentiation parameters (Ht, Hs, Gst and Nm); PIC was calculated with PIC_CALC [49]. The UPGMA dendrogram of lines was constructed with L. aurea as an outgroup based on the genetic similarities, with 1000 bootstrap replicates. Similarly, a UPGMA dendrogram was constructed with the 34 L. radiata individuals.
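The binary scoring described above leads directly to the count of polymorphic bands. The sketch below shows one way to derive it from a 0/1 matrix; the function name `polymorphic_bands` and the small matrix are hypothetical, and a band is treated as polymorphic when it is present in some but not all individuals.

```python
def polymorphic_bands(matrix):
    """matrix: rows = individuals, columns = bands, entries 0 (absent) or 1 (present).

    Returns (number of polymorphic bands, total number of bands).
    """
    columns = list(zip(*matrix))
    # A band is polymorphic if it is neither absent from all nor present in all individuals
    polymorphic = [col for col in columns if 0 < sum(col) < len(col)]
    return len(polymorphic), len(columns)


# Hypothetical scores for three individuals and three bands
scores = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
]
n_poly, n_total = polymorphic_bands(scores)
print(f"{n_poly} of {n_total} bands polymorphic ({n_poly / n_total:.1%})")
```

Applied to the counts reported in this study, 80 polymorphic bands out of 88 correspond to about 90.9% polymorphic bands.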
Statistical analyses were conducted using SPSS statistics (version 20), applying the Spearman correlation coefficient test.

Population structure analysis
An analysis of population structure and ancestry of the 34 L. radiata individuals based on Bayesian statistics, without prior assignment to populations, was performed using Structure v.2.3.4 [50,51]. In this study, SSRs were treated as dominant markers and the binary data (0, 1) were used. Because Structure 2.3.4 does not accept input data with mixed ploidy, Pop1, a triploid population, was treated as a diploid population. Batch runs with correlated and independent allele frequencies among inferred clusters were tested with population parameters set to the admixture model (burn-in 50,000; run-length 100,000). The program Structure Harvester (http://taylor0.biology.ucla.edu/structureHarvester/#) was used to estimate the final K value for the STRUCTURE analysis based on both the plot of mean posterior probability (LnP(D)) values and the ad hoc Evanno's ΔK statistic [52]. L. radiata individuals were allocated to a cluster if their Q values were greater than or equal to 0.70, and were otherwise considered intermediate or admixed.
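The allocation rule stated above (Q ≥ 0.70 for assignment, otherwise admixed) can be written as a short function. The name `assign_cluster` and the example Q vectors are illustrative and are not taken from Table S3.

```python
def assign_cluster(q_values, threshold=0.70):
    """Return the 1-based cluster index if the largest Q reaches the threshold,
    otherwise the label "admixed"."""
    best = max(range(len(q_values)), key=lambda i: q_values[i])
    if q_values[best] >= threshold:
        return best + 1  # clusters numbered from 1
    return "admixed"


# Hypothetical Q vectors for K = 4
print(assign_cluster([0.05, 0.90, 0.03, 0.02]))  # clear membership in cluster 2
print(assign_cluster([0.40, 0.35, 0.25, 0.00]))  # no cluster reaches 0.70
```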