The novel developed microsatellite markers revealed potential hybridization among Cymbidium species and the interspecies sub-division of C. goeringii and C. ensifolium

Background Orchids (Cymbidium spp.) exhibit significant variation in floral morphology, pollinator relations, and ecological habitats. Due to their exceptional economic and ornamental value, Cymbidium spp. have been commercially cultivated for centuries. SSR markers are extensively used genetic tools for biological identification and population genetic analysis. Results In this study, nine polymorphic EST-SSR loci were isolated from Cymbidium goeringii using RNA-Seq technology. All nine SSR loci showed transferability in seven other congeneric species, including 51 cultivars. The novel SSR markers detected inter-species gene flow among the Cymbidium species and intra-species sub-division of C. goeringii and C. ensifolium, as revealed by neighbor-joining and Structure clustering analyses. Conclusion In this study, we developed nine microsatellites using RNA-Seq technology. These SSR markers aided in detecting potential gene flow among Cymbidium species and identified the intra-species sub-division of C. goeringii and C. ensifolium. Supplementary Information The online version contains supplementary material available at 10.1186/s12870-023-04499-y.


Background
Orchidaceae is one of the most species-rich angiosperm families, constitutes approximately 10% of flowering plant species, and displays unique flower morphologies [1][2][3]. Orchids account for a large share of the global floriculture trade, both as cut flowers and as potted plants, and were estimated to comprise around 10% of the international fresh-cut flower trade [4,5]. In terms of sales, orchids are a fast-growing category of potted flowering plants in many countries [6]. Hybridization between species occurs both in nature and in cultivation [7,8]. The genus Cymbidium comprises 44 species that are widely distributed in East Asia [9][10][11]. Cymbidium spp. (Orchidaceae) are popular potted flowers of great ornamental and economic value and have been cultivated for several centuries [12]. Despite this value, orchid species richness has decreased dramatically, and many orchid species have become rare or endangered worldwide [5,13]. Because of the long history of cultivation and natural hybridization, genetic variation in Cymbidium spp. is high and complex [14]. Consequently, the taxonomic classification of Cymbidium is very difficult [15]. Although several approaches have been used to study genetic diversity [16][17][18][19], the genetic resources for characterizing Cymbidium are still insufficient. Some microsatellite markers developed for the genus Cymbidium have not been well tested across species [16,17,19]. Additionally, the genetic relationships among many of the major lineages of Cymbidium remain unclear [9,20]. Reliable markers are therefore needed to evaluate the genetic diversity and phylogenetic relationships of Cymbidium for effective conservation and utilization.
Microsatellites, or simple sequence repeats (SSRs), are a subcategory of tandem repeats whose motifs are 1-6 nucleotides long and are found in the genomes of all prokaryotes and eukaryotes [21,22]. Microsatellites have been widely used in recent years because they are highly informative, with a high mutation rate per locus per generation (10^−7 to 10^−3) [21], and are relatively selectively neutral [23,24]. Owing to their high polymorphism, abundance, co-dominance, selective neutrality, and transferability across species, microsatellite markers have been widely used in species and cultivar identification [25]. The availability of high-throughput sequencing technologies (RNA-Seq) has enabled researchers to identify a substantial number of microsatellites at less cost and effort than traditional SSR development processes [26].
In this study, nine novel microsatellite markers were developed and characterized based on RNA-Seq data. Combined with four SSR markers from published literature, thirteen SSR markers were used to determine: (1) how widely these SSR markers amplify across species; (2) whether intra-species population sub-division exists; and (3) whether inter-species hybridization occurs in the genus Cymbidium.

Sequencing and de novo assembly of transcriptome
In total, 11.07 Gb of clean data were obtained using the Illumina NovaSeq platform. RNA-Seq yielded 22,739,372 clean paired-end reads at least 150 bp in length, and 72,556 Unigenes were obtained from the clean reads by de novo assembly with Trinity. The average Unigene length was 835 bp, and the N50 of the Unigenes was 1,483 bp.
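For readers unfamiliar with the N50 statistic reported above, the following minimal sketch shows one common way to compute it from a list of sequence lengths. The function name and the toy lengths are illustrative assumptions; they are not the Unigene lengths from this study, and the assembly pipeline may report N50 with slightly different conventions.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy lengths (bp), not data from this study.
toy_lengths = [2000, 1500, 1000, 500, 300, 200]
print(n50(toy_lengths))  # 1500
```

Because longer sequences are weighted by their length, the N50 (1,483 bp here) is typically larger than the mean length (835 bp).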

Frequency and distribution of SSRs in the transcriptome
Using the MISA software, a total of 95,224 Unigenes were scanned and 15,244 SSR loci were detected (Table 2). The SSR loci discovered from the transcriptome data included six types: mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide repeat motifs. The abundance of the different types varied greatly. Di-nucleotide repeats were the most abundant type (44.19%), and penta-nucleotide repeats were the least abundant (1.11%) (Fig. 3). The counts of the four types of di-nucleotide and ten types of tri-nucleotide motifs are presented in Fig. 3.

Genetic polymorphisms of 13 SSR loci
In total, nine loci were selected from the transcriptome data of C. goeringii. The sequences of the nine loci were submitted to NCBI (https://www.ncbi.nlm.nih.gov/nuccore/OP480183-OP480192) (Table 3). Combined with the four loci from published literature [17,27], 167 alleles were detected in the C. goeringii population (Table 3). The observed heterozygosities varied dramatically across the 13 loci, ranging from 0.09 to 1. The expected heterozygosities of most loci were larger than 0.75; only locus H93 presented an observed heterozygosity of 0.54. Null allele frequency ranged from 0 to 0.8 across the 13 SSR markers (Table 4). Linkage disequilibrium was not detected between any pair of loci.

Cross-species analysis
The nine polymorphic SSR loci isolated from C. goeringii (Table 3) and four loci from published literature (Table S1) were tested for cross-species amplification with 72 individuals from eight Cymbidium species. All of these loci were successfully amplified across the eight Cymbidium species (Table 5). The genetic diversity of the eight Cymbidium species is listed in Table 5. Gene flow between species is presented in Fig. 4.

Principal coordinate (PCoA) analysis of four Cymbidium species
In the principal coordinate analysis (PCoA), the first two principal coordinates together accounted for 24.66% of the variation (Fig. 5).

The Neighbor-Joining phylogenetic analysis
Based on the Shared Allele distance, the Neighbor-Joining analysis presented the phylogenetic relationships of the 72 Cymbidium individuals (Fig. 6). Most individuals belonging to the same species clustered together in the Neighbor-Joining tree. For example, all the C. tortisepalum individuals clustered into the LPL clade, ten C. faberi individuals clustered in the HUL clade, and most C. goeringii and C. ensifolium individuals clustered in the CL and JL clades, respectively (Fig. 6). However, intra-species sub-division existed in both C. ensifolium (JL) and C. goeringii (CL), and each species contained three sub-clades (Fig. 6). The Neighbor-Joining tree also revealed some gene flow between Cymbidium species.

The population structure analysis
In the population structure analysis, the magnitude of Delta K as a function of K suggested the existence of four clusters for Cymbidium: Delta K was largest when K = 4 (Fig. 7). We present the Structure result for K = 4 (Fig. 8). The four clusters are shown in four colors (yellow, green, red, and blue), and the proportion of each color in an individual represents the proportion of that individual assigned to each cluster. The yellow cluster made up more than 90% of most C. tortisepalum (LPL) individuals, so C. tortisepalum was mainly constituted by the yellow cluster. Only one C. sinense individual (ML 99) was included in this work, and it was constituted mainly by the yellow cluster, very similar to C. tortisepalum. C. faberi was mainly constituted by the green cluster. The green cluster was also present in C. goeringii and C. ensifolium. The cluster composition of C. goeringii and C. ensifolium was very complex: both species contained all four clusters in the Structure analysis (Fig. 8).

Discussion
Novel microsatellite markers developed by transcriptome sequencing for Cymbidium
In this study, nine Cymbidium SSR markers were developed using the RNA-seq technique. The availability of high-throughput sequencing technologies has recently provided researchers with excellent opportunities in the life sciences [28]. Transcriptome data generated de novo by RNA sequencing have been successfully used for SSR marker development in non-model plants with no reference genome [29][30][31][32]. Compared with SSRs developed from genomic sequences, SSR markers isolated from transcripts (EST-SSRs) display high transferability among related species, high genetic differentiation, low error rates, and low null allele frequencies, but relatively low polymorphism [33].
In this study, the transcriptome data provide abundant SSR sites, which should be useful in studies on the genetic diversity and population genetics of C. goeringii and other congeneric species. The newly developed microsatellite markers are highly transferable within the genus Cymbidium: the nine SSR loci were successfully amplified across eight Cymbidium species (Table 5). Microsatellites are among the most widely used neutral molecular markers [21][22][23][24]. Because of their high level of polymorphism, high abundance, co-dominance, selective neutrality, and transferability across species, microsatellite markers have been utilized for a variety of applications in plant studies, including species/cultivar identification, paternity testing, gene mapping, construction of linkage maps, marker-assisted selection and back-crossing, population genetics, gene flow, phylogenetics, and conservation genetics [25,34,35]. The nine microsatellite markers also proved highly polymorphic. Up to 22 alleles were detected at two loci (Table 4). Given the urgent need for an identification method in the orchid trade, our newly developed microsatellite markers will be useful for discriminating and identifying Cymbidium species and cultivars in both business and research.

The intra-species sub-division
Phylogenetic analysis is frequently used to resolve the genetic variation and structure of Orchidaceae species [36][37][38][39][40]. Using the newly developed SSR markers, the population genetic analysis in the genus Cymbidium revealed intra-species divergence and inter-species hybridization. The phylogenetic analysis showed intra-species divergence in both C. ensifolium (JL clade I, JL clade II and JL clade III) and C. goeringii (CL clade I, CL clade II and CL clade III) (Fig. 6). In the PCoA analysis, unlike C. tortisepalum and C. faberi, whose individuals clustered together, the C. goeringii and C. ensifolium individuals were scattered (Fig. 5). The intra-species divergence was also evident in the STRUCTURE analysis (Fig. 8). The cultivars (individuals) of C. goeringii and C. ensifolium presented complex compositions. In natural populations, C. ensifolium and C. goeringii show a low level of genetic divergence between populations [41,42]. In this work, the genetic structure is more pronounced than in natural populations. This may be a consequence of artificial breeding accelerating genetic divergence, since all the C. ensifolium and C. goeringii individuals in this work are cultivars. Earlier genetic diversity analyses also found considerable genetic divergence in cultivars: using RAPD markers, two distinct groups were revealed among cultivars of C. goeringii [20], and high genetic diversity was found among 38 C. ensifolium cultivars using RAPD analysis [43]. This indicates that genetic diversity is higher in cultivars than in natural populations of C. ensifolium and C. goeringii.

Inter-species gene flow among Cymbidium species
Neutral molecular markers are frequently used to detect inter-species hybridization and gene flow [24,34,44]. This is not the first report of gene flow between Cymbidium species. In a previous molecular genetic analysis of the genus Cymbidium, one C. faberi cultivar, 'Ruyisu', was clustered into the C. goeringii group in a STRUCTURE analysis based on SSR markers [17]. Sympatric distribution may cause inter-species hybridization in orchids. The natural distributions of C. goeringii and C. faberi overlap frequently, and both species are distributed in southwest and southeast China [45]. Interspecific hybridization between sympatric species has also been found in another orchid genus: natural hybridization was detected and confirmed between the sympatrically distributed Geodorum eulophioides and G. densiflorum [46].
Artificial cross-breeding may be another reason for the inter-species gene flow in Cymbidium. Orchids have been cultivated for centuries, artificial cross-breeding in Cymbidium is quite frequent [47], and hybridization between species has happened multiple times during cultivation [7,8,12]. In this work, three cultivars of C. goeringii and C. faberi were clustered into the CL-HUL mixed clade, indicating the complex genetic background of these three cultivars (Fig. 6).

Conclusions
The newly developed microsatellite markers of Cymbidium goeringii, derived from RNA-seq data, were highly polymorphic and were successfully amplified across eight Cymbidium species. Based on the SSR markers, intra-species sub-division was detected in C. goeringii and C. ensifolium, and inter-species gene flow was detected among C. goeringii, C. ensifolium, and C. faberi. These SSR markers will be useful for cultivar and species identification and for population genetic studies in the genus Cymbidium.

Materials
Fresh leaves of Cymbidium goeringii 'da fu gui' were collected for RNA extraction and transcriptome sequencing. C. goeringii 'da fu gui' is a popular orchid cultivar and a classic representative of spring orchids with lotus-petal flowers. It was collected from natural forests in 1909. The plant used for transcriptome sequencing was originally bought from a seedling and plant company in Shaoxing, Zhejiang Province, China, and then cultured at the Orchid greenhouse of Zhejiang A&F University by Dr. Hui-Juan Ning. All of the Cymbidium specimens were collected from southeast and southwest China (Table S2), identified by Dr. Hui-Juan Ning (an author of this work), and preserved at the Orchid greenhouse of Zhejiang A&F University. Details of the collection locations, cultivar names, and morphology of all the samples are listed in Supplementary Table 1. The specimens used in this work were purchased from plant companies, and these eight species are not listed as national key protected plants. No permissions were required to collect the samples. Our sample collection and experimental research complied with local legislation and national and international guidelines. All the plant materials were cultured at the Orchid greenhouse of Zhejiang A&F University (ZAFU) or preserved and deposited at the herbarium of ZAFU. The voucher number of each specimen is listed in Table S2.

Fig. 8
Fig. 8 STRUCTURE genetic clusters of 72 individuals (up) and 8 species (down) of genus Cymbidium. Green, yellow, blue, and red represent the assignment probability for the four major clades

DNA extraction, RNA extraction, cDNA library construction and sequencing
The total RNA of one C. goeringii individual was extracted using a modified CTAB RNA extraction method for transcriptome sequencing [48]. The genomic DNA of all specimens was extracted using a modified DNA extraction method to detect polymorphisms at the isolated microsatellite loci [49]. The quality and quantity of the extracted DNA and RNA were assessed using 1.5% agarose gel electrophoresis and a NanoDrop 2000 spectrophotometer (Thermo Scientific, Wilmington, DE, USA). The RNA-Seq library was constructed using the Illumina TruSeq RNA Sample Preparation Kit (Illumina, San Diego, California, USA). The C. goeringii RNA was sequenced on the Illumina NovaSeq platform at BGI Tech (Shenzhen, China), generating 6.8 Gb of reads.

Transcriptome assembly and Unigenes annotation
The raw RNA-Seq data were subjected to quality assessment and filtering using Trimmomatic [50], and low-quality sequences were removed. Trinity was used for de novo assembly [51,52]. After assembly, the main transcript of each local region was selected as the Unigene [53].
Unigene sequences were compared against the NCBI nr (National Center for Biotechnology Information non-redundant protein sequences), NT (Nucleotide Sequence Database), KOG (EuKaryotic Orthologous Groups of proteins), SwissProt (Swiss-Prot Sequence Database), KEGG (Kyoto Encyclopedia of Genes and Genomes), Intersection, and Interpro databases to associate Unigenes with annotated proteins and functional information [54][55][56]. Gene ontology analyses were conducted using Blast2GO [57]. WEGO [58] was used to characterize and summarize the GO annotations and to describe the molecular functions, cellular components, and biological processes of the genes.

Microsatellites identification based on transcriptome data
The microsatellite identification tool MISA-web [59] was used to detect microsatellite loci with the following criteria [29]: at least 12 repeats for mono-nucleotide motifs, at least six repeats for di-nucleotide motifs, and at least five repeats for all other motif lengths.
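The repeat thresholds above can be illustrated with a short regular-expression scan. This is only a simplified sketch under stated assumptions, not the MISA implementation: it finds perfect repeats only, ignores compound SSRs, and the function names and the toy sequence are hypothetical.

```python
import re

# Minimum repeat counts per motif length, taken from the criteria in the text:
# mono >= 12, di >= 6, tri- to hexa-nucleotide >= 5.
MIN_REPEATS = {1: 12, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    n = len(motif)
    return not any(
        n % d == 0 and motif[:d] * (n // d) == motif for d in range(1, n)
    )


def find_ssrs(seq):
    """Return (motif, repeat_count, start) tuples for perfect SSRs, sorted by start."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # A motif of `size` bases followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip e.g. 'AA' x 6, which is really a mono-nucleotide run.
            if is_primitive(motif):
                hits.append((motif, len(m.group(0)) // size, m.start()))
    return sorted(hits, key=lambda h: h[2])


# Hypothetical toy sequence, not taken from the C. goeringii transcriptome.
toy = "GG" + "A" * 12 + "CC" + "AG" * 6 + "CC" + "CTT" * 5 + "GA"
for hit in find_ssrs(toy):
    print(hit)
```

On the toy sequence the scan reports one mono-, one di-, and one tri-nucleotide SSR; a run of only 11 adenines would fall below the mono-nucleotide threshold and be ignored.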

PCR amplification and genotyping
Twenty-one C. goeringii individuals were amplified to survey the polymorphism of the SSR loci. PCR amplification was performed at the appropriate annealing temperature for each locus (Table 2). The primers were labeled with FAM or HEX fluorescent dyes (Applied Biosystems, New York, USA). Fragment sizes were determined on an ABI 3100 Genetic Analyzer (Applied Biosystems). ROX 500 (Applied Biosystems) was used as the internal lane size standard.

SSR markers data analysis and cross-species analysis
GenALEX [62] was used to calculate the number of alleles (Na), the effective number of alleles (Ne), Shannon's information index (I), polymorphism information content (PIC), and the fixation index (F) of each locus based on the C. goeringii data. A likelihood ratio test was employed to test for linkage disequilibrium using Genepop [63], and P-values were adjusted using the Bonferroni correction. Null-allele frequencies were also estimated using Genepop [63].
To validate the transferability of the polymorphic loci isolated from C. goeringii, cross-species amplification was tested in the 72 individuals from eight Cymbidium species using the same procedures as above, except that the annealing temperature was reoptimized for each locus. The number of provenance samples (Nps), number of alleles (Na), effective number of alleles (Ne), Shannon's information index (I), observed heterozygosity (Ho), and expected heterozygosity (He) were calculated for each species using GenALEX [62]. Pairwise estimates of the number of migrants (Nm) were calculated among C. goeringii, C. ensifolium, C. tortisepalum, and C. faberi using GenALEX [62].
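As an illustration of the per-locus statistics named above, the sketch below computes them from diploid genotypes using standard textbook formulas (Ne = 1/Σp², I = −Σp ln p, He = 1 − Σp², F = 1 − Ho/He, and the Botstein form of PIC). These formulas are assumptions for illustration: the text does not state which estimators or sample-size corrections the software applied, and the function name and toy genotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def locus_stats(genotypes):
    """Summarize one diploid locus; genotypes is a list of (allele_a, allele_b) tuples."""
    if not genotypes:
        raise ValueError("need at least one genotype")
    counts = Counter(allele for g in genotypes for allele in g)
    total = sum(counts.values())
    freqs = [c / total for c in counts.values()]
    sum_sq = sum(p * p for p in freqs)
    he = 1.0 - sum_sq  # expected heterozygosity (uncorrected)
    ho = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    # PIC = 1 - sum(p_i^2) - sum over pairs of 2 * p_i^2 * p_j^2
    pic = he - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    return {
        "Na": len(counts),
        "Ne": 1.0 / sum_sq,
        "I": -sum(p * log(p) for p in freqs),
        "Ho": ho,
        "He": he,
        "F": 1.0 - ho / he if he > 0 else float("nan"),
        "PIC": pic,
    }


# Hypothetical allele sizes (bp) for four individuals, not data from this study.
toy = [(100, 100), (100, 104), (104, 104), (100, 104)]
print(locus_stats(toy))
```

With two equally frequent alleles and half of the individuals heterozygous, the toy locus gives Na = 2, Ne = 2, Ho = He = 0.5 and therefore F = 0.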
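The text does not state how Nm was derived. A commonly used relation under Wright's island model is Nm = (1/Fst − 1)/4, and the minimal sketch below assumes that relation; the function name and the example Fst value are hypothetical and are not results from this study.

```python
def migrants_per_generation(fst):
    """Estimate Nm from Fst, assuming the island-model relation Nm = (1/Fst - 1) / 4."""
    if not 0 < fst <= 1:
        raise ValueError("Fst must be in the interval (0, 1]")
    return (1.0 / fst - 1.0) / 4.0


# Hypothetical pairwise Fst, for illustration only.
print(migrants_per_generation(0.2))
```

Under this relation, smaller Fst between two species corresponds to a larger Nm, which is how the pairwise values in Fig. 4 can be read.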

Cluster analysis of eight Cymbidium species
GenALEX [62] was used to calculate the Pairwise Population Matrix of Nei Genetic Identity between populations, followed by PCoA analysis using the OmicShare website tool (https://www.omicshare.com/tools/).
Powermarker software [64] was used to calculate the genetic distance based on the Shared Allele algorithm, and then a phylogenetic tree was constructed based on the Neighbor-Joining method, and the final results were visualized with MEGA version X [65].
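To make the Shared Allele distance concrete, the sketch below computes one minus the proportion of shared alleles between two diploid individuals, averaged over loci. This is an assumed, commonly used form of the distance; the exact formula and missing-data handling in the software are not given in the text, and the function name and genotypes are hypothetical.

```python
from collections import Counter


def shared_allele_distance(ind_a, ind_b):
    """1 - mean proportion of shared alleles over loci.

    ind_a, ind_b: lists with one (allele_1, allele_2) tuple per locus,
    or None where a genotype is missing.
    """
    shared = []
    for ga, gb in zip(ind_a, ind_b):
        if ga is None or gb is None:
            continue  # skip loci with missing data in either individual
        overlap = Counter(ga) & Counter(gb)  # multiset intersection of alleles
        shared.append(sum(overlap.values()) / 2)
    if not shared:
        raise ValueError("no loci genotyped in both individuals")
    return 1.0 - sum(shared) / len(shared)


# Hypothetical genotypes at three loci, not data from this study.
a = [(100, 104), (200, 200), (150, 152)]
b = [(100, 100), (200, 202), None]
print(shared_allele_distance(a, b))  # 0.5
```

A matrix of such pairwise distances is the kind of input from which a Neighbor-Joining tree can be built.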
The population structure analysis was performed using Structure v2.3.4 [66]. The length of the burn-in period was set to 100,000 and the number of MCMC replicates after burn-in was set to 500,000. The optimal K value was calculated using the Structure Harvester website (https://taylor0.biology.ucla.edu/struct_harvest/). The replicate runs were then combined with CLUMPP [67], and the results were visualized with distruct [66].
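The Delta K criterion used to choose K can be sketched as follows, assuming the usual Evanno-style definition: the absolute second-order difference of the mean log-likelihood across K, divided by the standard deviation of the log-likelihood at K. The function name and the log-likelihood values are hypothetical; they are not output from this study, and the online tool may differ in details.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Return {K: Delta K} from replicate log-likelihoods.

    lnp_by_k maps each K to a list of ln P(D) values, one per replicate run.
    Delta K is undefined for the smallest and largest K tested.
    """
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # Delta K is undefined when replicates are identical
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / sd
    return result


# Hypothetical replicate log-likelihoods, for illustration only.
toy = {
    1: [-1000, -1002],
    2: [-900, -902],
    3: [-700, -702],
    4: [-690, -692],
    5: [-685, -687],
}
dk = delta_k(toy)
print(max(dk, key=dk.get))  # 3
```

In the toy data the log-likelihood rises steeply up to K = 3 and then plateaus, so Delta K peaks at K = 3; the same reasoning applied to the real runs gives the peak at K = 4 reported in Fig. 7.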

Fig. 1
Fig. 1 Cymbidium goeringii Unigenes annotation against eight databases (A) and against different species (B)

Fig. 4
Fig. 4 Gene flow among C. faberi, C. ensifolium, C. tortisepalum, and C. goeringii. The number on the line between two species indicates the number of migrants (Nm) between them

Fig. 7
Fig. 7 The magnitude of Delta K (B) as a function of K suggested the existence of 4 clusters for Cymbidium. Results are from 10 replicates for each of 1 ≤ K ≤ 9

Table 1
Unigenes annotation of C. goeringii against eight databases

Table 2
Prediction of SSRs out of the transcript datasets of C. goeringii

Fig. 2
Fig. 2 Cymbidium goeringii gene annotation based on GO, KOG, and KEGG databases. A Gene Ontology (GO) annotation graph of C. goeringii. B EuKaryotic Ortholog Groups (KOG) annotation graph of C. goeringii. C Kyoto Encyclopedia of Gene and Genomes (KEGG) annotation graph of C. goeringii

Table 3
Characteristics of 9 microsatellite loci isolated from

Table 4
Genetic diversity of 13 SSR markers based on C. goeringii. Na number of alleles, Ne effective number of alleles, fNA null-allele frequency, I Shannon's information index, PIC polymorphism information content, F fixation index, Ho observed heterozygosity, He expected heterozygosity

Table 5
Genetic diversity of eight Cymbidium species. Nps number of provenance samples, Na number of alleles, Ne effective number of alleles, I Shannon's information index, Ho observed heterozygosity, He expected heterozygosity