Skip to main content

Development of a robust SNP marker set for genotyping diverse gene bank collections of polyploid roses

Abstract

Background

Due to genetic depletion in nature, gene banks play a critical role in the long-term conservation of plant genetic resources and the provision of a wide range of plant genetic diversity for research and breeding programs. Genetic information on accessions facilitates gene bank management and can help to conserve limited resources and to identify taxonomic misclassifications or mislabelling. Here, we developed SNP markers for genotyping 4,187 mostly polyploid rose accessions from large rose collections, including the German Genebank for Roses.

Results

We filtered SNP marker information from the RhWag68k Axiom SNP array using call rates, uniformity of the four allelic dosage groups and chromosomal position to improve genotyping efficiency. After conversion to individual PACE® markers and further filtering, we selected markers with high discriminatory power. These markers were used to analyse 4,187 accessions with a mean call rate of 91.4%. By combining two evaluation methods, the mean call rate was increased to 95.2%. Additionally, the robustness against the genotypic groups used for calling was evaluated, resulting in a final set of 18 markers. Analyses of 94 pairs of assumed duplicate accessions included as controls revealed unexpected differences for eight pairs, which were confirmed using SSR markers. After removing the duplicates and filtering for accessions that were robustly called with all 18 markers, 141 out of the 1,957 accessions showed unexpected identical marker profiles with at least one other accession in our PACE® and SSR analysis. Given the attractiveness of NGS technologies, 13 SNPs from the marker set were also analysed using amplicon sequencing, with 76% agreement observed between PACE® and amplicon markers.

Conclusions

Although sampling error cannot be completely excluded, this is an indication that mislabelling occurs in rose collections and that molecular markers may be able to detect these cases. In future applications, our marker set could be used to develop a core reference set of representative accessions, and thus optimise the selection of gene bank accessions.

Peer Review reports

Background

The conservation of plant biodiversity is a crucial global issue. Genetic diversity is an important parameter for the maintenance of sustainable agriculture and horticulture, and is required for the creation of more productive and resistant plants [1]. Human intervention poses a severe threat to global biodiversity, particularly in light of climate change. Currently, the estimated number of plant species that are threatened by extinction has increased to 39.4% [2].

Due to genetic depletion in nature, gene banks play a critical role in the long-term conservation of plant genetic resources and the provision of a wide range of plant genetic diversity for research and breeding programs. There are approximately 1,750 gene banks worldwide that preserve more than seven million accessions of food crops [3]. In addition, there are approximately 2,700 botanic gardens [4] serving as gene banks for crops and ornamental plants that are important for both more sustainable breeding (e.g., resistance breeding) and for the preservation of cultural heritage. For this reason, the German Genebank for Roses was established in 2009, coordinated by the Europa Rosarium Sangerhausen. The world's largest rose collection currently comprises approximately 8,700 rose species and varieties (3,409 of which belong to the gene bank), which are available for breeding and research (G. Schulz, Europa Rosarium Sangerhausen, personal communication, April 29, 2024.). The availability of highly diverse genetic resources for roses is important, as only 8 to 10 of the 130 to 200 known wild rose species have contributed to more than 30,000 registered varieties [5,6,7]. Due to interspecific hybridisation and cross-fertilisation, the genomes of modern varieties are very complex, ranging from triploid (2n = 3x = 21) to octoploid (2n = 8x = 56) [8, 9]. However, tetraploid varieties (2n = 4x = 28) are mostly used for breeding [10].

For the efficient conservation of genetic resources in gene banks, the distinct identification of individual accessions is of the utmost importance. Gene bank collections may contain historical varieties with unknown breeding histories, leading to the cataloguing of varieties with synonymous or incorrect names. Confusion in labelling is also possible, particularly for vegetatively propagated plants like roses that have been in the field for years or need to be replaced due to damage. Among the 571 banana accessions from the gene bank of the CGIAR International Musa Germplasm Transit Centre that have undergone field verification, 10% were mislabelled, and an additional 6% were misclassified [11]. Using chloroplast markers, Zhang et al. (2021) [12] discovered that 17% of 53 rice seed accessions from gene banks or field collections were mislabelled. Therefore, it is crucial to develop effective methods for identifying and rectifying misclassifications in gene bank management. Often, authenticity verification is performed by documenting phenotypic data over many years. Because this procedure is strongly dependent on the experience of the observer as well as on the expression of phenotypic traits under different environments, the use of molecular genetic methods in combination with phenotypic data is crucial for improving gene bank management.

Molecular markers such as random amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), and simple sequence repeats (SSRs) are very useful for genetic diversity analysis [13]. However, it is difficult to automate and/or to score unambiguously these marker types and RAPDs are also difficult to reproduce (Jones et al., 1997). Therefore, high-throughput analyses and technology-independent genetic information that can be easily shared make single nucleotide polymorphisms (SNPs) the markers of choice [14]. Different techniques are available for SNP genotyping, such as PCR-based methods, genotyping by sequencing (GBS) and microarrays. For roses, the WagRhSNP chip with 68,893 markers was developed in 2015 and is based on sequence polymorphisms in expressed genes in garden and cut roses [15]. Due to the availability of high-quality rose genome sequences [16], these SNP data can be assigned to a position in the genome and used for genome wide association studies (GWAS) to identify markers of adventitious rooting, anthocyanin/carotenoid content and other floral traits [17,18,19]. However, SNP chips are very expensive if high genotype numbers are analysed with only a few markers.

PACE® and/or KASP™ markers are cost-effective alternatives for SNP detection in germplasm collections. These allele-specific PCR-based techniques can be easily scaled up to several thousand genotypes and to a flexible number of markers. In addition, they can be automated and display different ploidy levels. KASP™ and PACE® markers have already been used for variety identification in crops such as rice and faba bean [20, 21]. Additionally, Winfield et al. (2020) [22] developed a set of 21 KASP™ markers to distinguish 2,652 different apple varieties. To our knowledge, no reports on the development of PACE® or KASP™ marker sets for fingerprinting studies in garden roses have been published to date.

The aim of this study was to develop a robust, easy-to-use and cost-effective marker set that can be used to genetically characterise accessions in the German Genebank for Roses. For this purpose, SNPs with evenly distributed allele dosages and high calling rates were selected from the WagRhSNP chip, and PCR-based PACE® markers were developed and tested. The final marker set was tested for its suitability and resolution on 4,187 accessions, mainly from the gene bank stock. The results of different analytical approaches were compared, and five additional SSR markers were used to confirm suspected duplicates and genotypes with identical SNP profiles. In addition, 95 randomly selected genotypes were used to evaluate the additional information provided by amplicon sequencing and to examine an alternative SNP detection technology.

Methods

Plant material

A total of 4,187 individual rose accessions were sampled at the Europa Rosarium Sangerhausen, Germany (2974 cultivars and 127 species), and the Bundessortenamt (Federal Plant Variety Office) Hannover, Germany (1,086 cultivars). A list that assigns the identifiers to the origin is available under https://doi.org/10.6084/m9.figshare.25837849 and species names are listed in Additional File 1. Samples were collected in May/June 2021 and May 2022.

Young rose leaves that were not yet fully developed were sampled and stored at room temperature under dark and humid conditions to degrade polysaccharides. After two days, approximately 20 to 40 mg of leaf tissue was placed into the tubes of a 96-well rack and dried. On each rack, one tube was left empty to serve as a control. The remaining sampled leaf tissue was dried and stored as a backup sample.

DNA extraction

Samples were isolated with the Mag-Bind® M1128 Plant DNA Plus Kit from Omega Bio-tek, Inc., using the Phoenix Pure 96 System (Hangzhou Allsheng Instruments Co., Ltd.). Deviating from the manufacturer's instructions for the kit, the incubation time was increased to 40 min after addition of the CSPL buffer. Furthermore, an additional sorbitol prewash step was performed according to Inglis et al. (2018) [23] to improve DNA quality.

The DNA quality and concentration were assessed using a NanoDrop™ 2000/2000c Spectrophotometer (Thermo Fisher Scientific Inc.) at 260/280 nm. For a set of arbitrarily selected samples, the quality and concentration were determined using agarose gel electrophoresis with lambda DNA as a standard. The DNA was then diluted with elution buffer/PCR-grade water to a concentration of 6 ng/µL and stored at -20 °C.

PACE® genotyping assay

Two allele-specific forward primers and one common reverse primer were designed for each SNP based on the sequences 50 bp around the SNP using the PACE® Assay Design Template from 3CR Bioscience Ltd. The primer sequences can be found in Supplementary Table 1, Additional File 2. The PACE® assay was performed using PACE® 2.0 Genotyping Master Mix with a low Rox concentration according to the manufacturer's instructions. For a 384-well plate, 2.5 µL of DNA (15 ng) and 2.5 µL of master mix, including the assay mix, were combined in each reaction using the liquid handling system epMotion® 5075t from Eppendorf SE. Thermal cycling and fluorescent signal detection were conducted with a QuantStudio™ 6 Flex Real-Time PCR System (Thermo Fisher Scientific, Inc.). When using a newly designed assay mix, 26 to 39 PCR cycles were tested. To import the plate layout into the QuantStudio Real-Time PCR System (v1.3) software, an import file containing genotypes and markers was created using a custom-made Python script [24], which is available on request. Because the QuantStudio software used cannot score tetraploid organisms, the FitTetra package [25] from the statistical software R [26], version 4.0.4, was used for SNP dosage calling. Here, dip.filter = 0 and p.threshold = 0.95 were used instead of the default settings. The PACE® assay was repeated for DNA plates in which more than 20% of the genotypes could not be scored.

Calculation of evenness

The evenness of allele dosage groups for each marker was calculated according to the following formula:

$$Evenness (I)= \frac{{H}_{S} }{\text{ln}(S)}$$

where S is the ploidy level and HS is the Shannon‒Wiener index, which is calculated using the following formula:

$${Shannon-Wiener Index (H}_{S})=-\sum_{i=1}^{S}{p}_{i}*\text{log}{p}_{i}$$

where pi is the relative frequency of the single allele dosage.

Assessment of molecular identity

The genetic distance matrices were calculated using the dist(method = 'maximum') function in R [26] based on the score files from FitTetra [25].

Amplicon sequencing

Two PCRs were performed for amplicon sequencing. In the first PCR, gene-specific regions containing the SNP marker were amplified with M13-tailed primers. In the second PCR, unique sequence tags were added at the 5´ region of the primer binding to the M13 tail of the product from the first PCR. Gene-specific primers were designed using Primer3Plus [27]. The primers used for tagging were obtained from Pjevac et al. (2021) [28]. All primer sequences are provided in Supplementary Table 2 and Supplementary Table 3, Additional File 2.

PCR with gene-specific primers was performed in a 25-μL reaction containing 24 ng of DNA, 1 × Q5 buffer, 200 μM dNTPs, 0.5 μM forward and reverse primers and 0.02 U of Q5 High Fidelity DNA Polymerase (New England Biolabs GmbH). The second PCR with the tagging primers was also performed using Q5 High Fidelity DNA polymerase in a 50-μL reaction containing 2 μL of the PCR product from the first round. The cycling conditions were 98 °C for 30 s for initial denaturation; 25 (for the first PCR) and 12 (for the second PCR) cycles of 98 °C for 10 s, 60 °C (for the first PCR) and 69 °C (for the second PCR) for 20 s and 72 °C for 20 s; and a final extension of 2 min at 72 °C. Products from the second PCR were checked with agarose gel electrophoresis and pooled in equal amounts for each marker. The samples were purified using the NucleoSpin® Gel and PCR Clean-up Kit (MACHEREY–NAGEL GmbH & Co. KG) according to the manufacturer's instructions. The samples were diluted and sent to GENEWIZ Germany GmbH for Illumina sequencing.

Sequences were analysed using the AmpliSAT web tools [29], which are available at http://evobiolab.biol.amu.edu.pl/amplisat. AmpliCLEAN was used with a minimum Phred quality score of 30. The AmpliCHECK output was used for further analysis.

Microsatellites (SSRs)

SSRs were amplified using the previously published primer pairs RMS043, RMS070, Rdr1, RMS066 and RMS059 [30,31,32]. Among these markers, Rdr1 tags a highly complex locus of TNL genes on rose chromosome 1. The primer sequences can be found in Supplementary Table 4, Additional File 2. Two multiplex reactions were performed using these primers. Multiplex reaction A included the pooled primer pairs RMS043 (IRD700 labelled) and RMS070 (IRD800 labelled). In multiplex reaction B, the primer pairs Rdr1 (IRD700 labelled), RMS066 (IRD800 labelled), and RMS059 (IRD800 labelled) were combined. The PCR reaction consisted of 20 µL and contained 18 ng of DNA, 2 µl 10 × Williams buffer (containing 100 mM Tris/HCl ph 8.3, 500 mM KCl, 20 mM MgCl2, 0.01% gelatine), 200 µM dNTPs, 0.25 µM each of the forward and reverse primers, and 0.05 U/µL DCS polymerase. The PCR conditions were as follows: initial denaturation for 5 min at 94 °C, followed by 25 cycles of 30 s at 94 °C, 30 s at 56 °C and 30 s at 72 °C, and a final step at 72 °C for 5 min. Five microlitres of PCR product was diluted with 50 µL formamide and then incubated at 95 °C for 5 min. SSR marker bands were visually scored on a 6% polyacrylamide gel using vertical electrophoresis and a LI-COR 4200 DNA Analyser (LI-COR, Inc.) as described in Spiller et al. (2011) [31].

Results

Selection of the marker set based on chip data

Data generated using the 68 k WagRhSNPChip from previous studies [19] were used to determine the allele dosage of an association panel consisting of 95 garden rose cultivars. The 68 k WagRhSNPChip uses two specific probes, called the 'first probe' and 'second probe', to detect genetic variations. The probes bind on opposite sides of the SNP, providing two sets of fluorescence data. Using FitTetra, 37,514 of the 68 k SNPs on the chip were scored for probe 1 and 38,925 for probe 2, resulting in a total of 47,691 unique SNPs. To identify markers with a uniform allele distribution across the panel as well as a high calling rate, the markers were filtered based on the following criteria: 1) at least 10% of the genotypes are in each of the five possible allele dosage clusters, 2) 90% of all genotypes are assigned with a 99% probability to an allele dosage group, and 3) the average of the maximum probability of all genotypes belonging to any allele dosage group is 99%. This resulted in 189 markers of potential interest, 40 of which were filtered from both probes. Based on the allele distribution and calling rates, the markers were converted to PACE® markers and tested on 95 to 190 accessions until 20 of the 75 converted PACE markers could be selected for genotyping. The marker combination was selected in such a way that at least two markers were distributed on each of the seven chromosomes (Supplementary Fig. 1, Additional File 3).

Allele dosage distribution, calling rate and robustness of the marker set analysed in 4,187 rose accessions by comparing different plate layouts

For fluorescence-based detection of allele expression in a biallelic SNP, the allele dosage can only be classified in comparison to the fluorescence ratios of the other samples in the particular analysis. Consequently, the composition of samples in an analysis unit can have an effect on calling. Therefore, a total of 4,187 accessions were genotyped with the developed marker set of 20 PACE® markers, and fluorescence data were analysed with FitTetra using two different approaches (Fig. 1). In the first approach (the original plate layout), the data were used as they were placed in the 384-well plate layout. In general, 380 genotypes and four negative controls were analysed with one marker or 190 genotypes, and two negative controls were analysed with two markers per plate. Thus, 190 to 380 genotypes per marker were combined in one analysis. In the second approach (the single plate layout) the 190 to 380 genotypes on one 384-well plate were divided into groups of 95 genotypes according to the original arrangement on the 96 well DNA plates and then analysed separately. Additionally, both approaches were combined by using both scores and then compared with each other. Adding fluorescence values from separate plate runs was avoided, as different runs may slightly differ in fluorescence properties and therefore add additional noise to the analysis.

Fig. 1
figure 1

Determination of marker quality by combining different layouts. The DNA from 95 genotypes and one negative control was stored in 96-well plates. For the PACE genotyping assay, DNA from two to four plates was pipetted into 384-well plates. The statistical analysis was conducted with FitTetra using two different approaches. First, the original plate layout was used for the analysis. Second, the fluorescence data from the 384-well plate were subdivided into 96 samples stored on one DNA plate (single plate layout). For each of the subsets, an independent analysis was conducted. Then, the results of both approaches were compared (combined analysis)

The proportion of genotypes that were unambiguously assigned to an allele dosage with a p-value of less than 0.05 was on average 91.4% for the original plate layout, 93.6% for the single plate layout, and 95.2% for the combination of the two approaches (Table 1). The best marker for the single plate layout, RhK5_5648_324, scored 97.1% for all genotypes and was also the best marker for the original plate layout, with a calling rate of 96.3%. After combining the scores from the original plate layout and single plate layout, the calling rate increased to 97.3%. The highest calling rate after combining the two approaches was obtained for the marker Rh12GR_60359_1361 (97.9%). Before combining the two approaches, these calling rates were only 95.2% (single plate layout) and 94.8% (original plate layout). The two markers with the lowest calling rates were RhK5_125_737 (89.8% after combination) and RhMCRND_17396_291 (87.3% after combination). All other markers called at least for 94% of the genotypes after combining the two approaches.

Table 1 Proportion of called genotypes in the panel of 4,187 rose genotypes per plate layout

The combination of markers with an even distribution of allele dosages has greater discrimination potential than markers with strongly biased distributions. To assess whether the allele dosages were equally distributed, the proportion of the genotypes within a cluster was considered, and the evenness was calculated based on the Shannon‒Wiener index. In the single plate layout, the allele dosages of the genotypes were equally distributed among all five possible clusters (P0, P1, P2, P3, and P4) for the markers RhMCRND_17396_291 and RhK5_11100_1011, with evenness values of 0.996 and 0.988, respectively (Table 2). In contrast, the RhK5_8422_105 marker had the lowest evenness value of 0.740. Although 59% of the genotypes had allele dosage four, all other allele dosages were underrepresented, with proportions ranging from 3 to 16% (Table 2).

Table 2 Allele distribution across the five clusters per marker for the single plate layout (4,187 genotypes)

Using the two different scoring methods, the original and the single plate layout, the number of genotypes scored with all 20 markers could be increased from 1,812 to 2,537 of all 4,187 accessions (Fig. 2). Fourteen genotypes could not scored with any of the markers and analysis methods.

Fig. 2
figure 2

Number of genotypes with scored markers compared among different scoring approaches. The calculations were performed for the single plate layout, original plate layout and the combination of the scores from both layouts. The total number of all genotypes was 4,187

The robustness of the marker scoring was assessed by comparing the scores from the single plate layout analysis and the original plate layout analysis. For two markers no differences in scores were noted, maximum error rate of 5% was observed for 13 additional markers (Table 3). Two markers had differences in scores greater than 10%. The mean error rate of all markers was 3%. For all markers, the differences were mainly in only one difference in the allele dosage group (e.g. ABBB or AAAB instead of AABB). With the exception of marker RhK5_125_737, the proportion of differences in more than one allele dose group was less than 1%.

Table 3 Differences in the scores between single plate layout and original plate layout using 4,187 genotypes

Validation of the suspected duplicates

Among the 4,187 accessions, 94 cultivars occurred twice in the collections (once at each of the locations). The molecular identity of these 94 pairs was checked by PACE® genotyping assays. The markers RhK5_125_737 and RhK5_8422_105 were excluded from this analysis due to differences in scores of more than 10% between different plate layout analyses (Table 3), resulting in a core set of 18 robust markers. Overall, 58 of the 94 pairs (62%) had the same SNP pattern (Fig. 3). Fifteen of the pairs differed in one to three SNPs, and seven of the hypothetical duplicates had differences in more than three SNPs. For 14 pairs, more than four markers could not be scored, so these were excluded from the statistical analysis.

Fig. 3
figure 3

Analysis of 94 assumed genotype pairs using the 18-PACE marker set and additional SSR analyses. First, the PACE assay was performed on 4,187 accessions, after which the scores of the pairs were compared. To confirm the results, the DNA was isolated again, and the PACE assay was repeated. Additionally, SSR marker analysis with five markers was performed. “No results” means that more than four of the 18 SNPs or more than three of the five SSR markers failed

To confirm these results, the genomic DNA of 116 of the accessions (58 pairs) was isolated again, and the PACE® genotyping assays were repeated. This included all pairs in which a difference of at least one SNP was detected or in which no conclusion could be drawn due to failures in more than four SNPs. In addition to these 36 pairs, 22 randomly selected duplicate pairs with no marker differences from the first set of DNA samples were included. All putative pairs were arranged in the repetition in such a manner that the pairs were located on the same plate. In addition, SSR analysis with five markers was conducted for the 116 samples.

The 22 pairs in which no differences were initially found in the first set of samples were confirmed using the repeated PACE® assay and the SSR analyses (Fig. 3). Additionally, all seven pairs with differences in more than three SNPs in the first PACE® assay also had different marker patterns in the repeated PACE® assay and the SSR analysis. The percentage of differently called markers in the first and second PACE® assay was approximately 1.3%. The 19 markers scored differently between the first and second PACE assay were present in 15 accessions. Eleven of these cases showed deviations in one allele dosage group, three cases showed deviations in two allele dosage groups and five cases showed a non-scored marker (Additional File 4). For the pairs that could not be analysed in the first PACE® assay due to too many failures, eleven of the 14 pairs in the repeated PACE® assay showed no difference in the SNPs between all pairs, which was also confirmed with the SSR analysis. Three of the pairs still had more than four missing markers. The results of the repeated PACE® assay based on the 18-marker set and the SSR analysis were consistent for all pairs.

Number of other accessions with identical SNP profiles

To assess the discriminatory power of the 18-marker set, the number of unique SNP profiles was determined. For this purpose, only accessions in which all 18 markers were called without any missing values and in which the scores were identical between the single plate layout and original plate layout (proof of concept) were analysed. Of the 4,187 samples, 1,957 accessions fulfilled these requirements. All but 224 accessions showed a unique SNP profile according to the 18-marker set. These 224 accessions were grouped into 96 different clusters, each of which had the same SNP profiles (Fig. 4). Subtracting the known and verified duplicate accessions described in the previous chapter, resulted in 174 accessions grouped in 73 clusters. The clusters often consisted of two or three accessions. Interestingly, one cluster consisted of 11 accessions.

Fig. 4
figure 4

Number of accessions with identical SNP profiles. A total of 4,187 accessions (acc.) were analysed with 18 PACE markers and then filtered to exclude accessions with missing values and non-identical scores in both plate layouts. This resulted in 1,957 accessions, of which 1,733 had a unique SNP profile. The remaining 224 accessions clustered in 96 groups. After excluding known duplicates, 174 accessions in 73 clusters remained. Additional SSR analysis was performed, and the PACE assay was repeated based on newly isolated DNA

To confirm the accessions with identical PACE® marker profiles, SSR analyses with five markers were conducted, and the PACE® genotyping assay based on the 18-marker set was repeated. For the repetition, the DNAs from individuals of the same cluster were located on the same 384-well plate. Using this method, the molecular identity of 62 clusters, containing 140 accessions, out of the 73 clusters was confirmed. Of these 62 clusters, 58 clusters were further confirmed with the five SSR markers, and two groups showed no band patterns. However, two clusters with two accessions each, which were identical according to both PACE® assays, were found to exhibit differences in the band pattern for two of the five SSR markers or for all five markers respectively.

In seven of the 73 clusters, the repeated PACE® genotyping assay showed differences between the individuals, which were confirmed in five clusters using the SSR markers. In four of the five groups, the PACE® assay showed differences in one to two SNPs. However, in one case, differences in all 18 SNPs were detected, indicating that the sample was mixed up. No conclusion could be drawn for four clusters because the SSR markers did not produce fragments or the samples were no longer available.

Comparison of the performance of the marker set between the PACE® genotyping assay and amplicon sequencing data

PACE® genotyping assays are only one method for determining the allele configurations of SNPs, and direct sequencing with NGS platforms represents an alternative even for large numbers of genotypes. Therefore, primers were successfully designed for 13 SNPs out of the 20-marker set and used to analyse 95 out of the 4,187 accessions randomly selected by amplicon sequencing to compare the information content as well as the workload of the PACE® assays. After excluding genotypes with fewer than 100 reads and sequences with less than 10% of the total reads of the genotype, the results were compared with the PACE® genotyping results using AmpliSat and the R package FitTetra as allele dosages. For eight of the 13 markers, differences in the allele dosage estimations between the PACE® assay and amplicon sequencing were observed for a maximum of four genotypes (Table 4). For the markers RhK5_10792_6318 and RhK5_1295_1946, 18 to 38 accessions were called differently. To exclude sequencing errors and errors due to contamination or sample handling, amplicon sequencing was repeated for the markers RhK5_10792_6318 and RhK5_1295_1946. For RhK5_10792_6318, 25 genotypes were differently called also in the replicate. The marker RhK5_1295_1946 could not be successfully called by FitTetra in the repetition. Interestingly, five genotypes showed differences in the repeated amplicon scoring for the marker RhK5_10792_6318, which had previously been correctly assigned. The markers RhK5_8422_105 and RhMCRND_11585_178 could not be called using FitTetra because there were too few genotypes with more than 100 reads. The analysis with genotypes above 50 reads showed that the marker RhK5_8422_105 showed differences in scoring between PACE assay and Amplicon in 22 of the genotypes and the marker RhMCRND_11585_178 in only one of the genotypes (Additional File 5).

Table 4 Comparison between PACE® and amplicon scoring within the same SNP analysed in 95 genotypes

Discrimination accuracy of the markers from the marker set using the amplicon sequencing approach

The amplicon sequencing primers were designed so that the amplicons excluding the tags had lengths between 177 and 308 bp (Table 5). The sequence information was used to further identify polymorphisms located around the actual SNP in the 95 randomly selected genotypes. Two of the 95 genotypes had fewer than 100 reads for all 13 markers and were therefore excluded from further analyses. The marker Rh12GR_2923_1285 had the lowest number of polymorphic sites and different haplotypes, detecting eight different haplotypes and distinguishing 24 different genotypes based on six SNPs (Table 5, Additional File 6, and Additional File 7). The marker RhK5_8422_105, with the largest number of polymorphisms (56 polymorphic sites and 27 haplotypes) in a 177-bp product, was excluded from further analysis because more haplotypes per genotype were distinguished than expected after filtering and unspecific primer binding sites were found. The marker RhK5_978_1131 could distinguish the largest number of genotypes (65) with 24 polymorphic sites. The number of actual genotypes resulting from the possible combinations of haplotypes of a marker varied from 18 (RhK5_69_1627) to 65 (RhK5_978_1131).

Table 5 Number of variants for each marker according to amplicon sequencing in 95 genotypes

When combining only two different amplicon markers, the number of actual genotypes that could be distinguished increased to 47 (RhK5_69_1627 and RhK5_1295_1946 combined) up to 88 (RhK5_978_1131 and RhK5_5648_324) out of the 93 possible combinations (Table 6). On average, 76 of the 93 genotypes could be distinguished when two amplicon markers were combined. When all markers were used, all of the 93 genotypes could be distinguished.

Table 6 Number of actual variants after combining the markers from amplicon sequencing of the 95 genotypes

Discussion

In the present study, a set of 18 PACE® markers was developed based on SNP chip data and used to analyse 4,187 accessions, mainly from the German Genebank for Roses. The purpose of this marker set is to uniquely characterise the accessions to identify duplicates and to detect mislabelling in large rose collections. In the future, molecular information could also be used to calculate genetic distances and assess genetic diversity in collections. According to UPOV, the requirements for markers used for DNA profiling include high discriminatory power, reproducibility within and between different laboratories, and the possibility of varying the number of samples and/or the number of markers [33]. Therefore, we decided to use SNP markers in our study because they exhibit the following advantages over other marker types: 1) SNP markers can be automated, 2) the biallelic format is easy to process, 3) dosage information can be obtained for polyploids, and 4) these markers can be monitored on different platforms.

Considering how many markers are needed

The number of markers included in the genotyping set is a trade-off between cost/time and resolution. Assuming that up to five different SNP configurations of a biallelic marker can exist for tetraploid organisms (AAAA, AAAT, AATT, ATTT, and TTTT), the number of markers required to distinguish 4,187 genotypes is six (Supplementary Fig. 2, Additional File 3). Thus, 18 markers could theoretically discriminate \({5}^{18}\) accessions. For this calculation, an ideal marker set was assumed, consisting of markers that are completely independent of each other, all having a uniform allele distribution and an error-free calling rate of 100%. In practice, however, these assumptions are not fulfilled. The calling rate we achieved (95.2%) was greater than that of Tang et al. (2022) [34] who used 48 KASP™ markers from the LGC Genomics database to differentiate 518 rice varieties with a calling rate of 92.69%. Nevertheless, a 100% calling rate cannot be achieved for various reasons. Primers may not bind due to low DNA quality or rare variants. In addition, the allele dosage clusters may not be sufficiently separated; thus, some genotypes cannot be assigned with sufficient certainty. We analysed all accessions with FitTetra under the assumption that the accessions were tetraploids (4x = 28). However, cultivated roses can also be triploid (3x = 21), and wild species are often diploid (2x = 14) [35]. Distinguishing diploid and tetraploid yam accessions, Cormier et al. (2019) [36] reported good allele dosage assessment quality when using KASP™ assays and the five allele dosage clusters of tetraploids. But although diploid organisms are expected to show the same homozygous and balanced allele dosage ratios as tetraploids (0:2 = 0:4; 1:1 = 2:2; 2:0 = 4:0), triploid roses could show clusters between the expected allele dosage ratios (1:3; 3:1) and are therefore not called. Including ploidy level in the statistical evaluation may improve the calling rate, but analysing each sample with a flow cytometer would lead to much higher costs and considerably more personnel effort. Therefore, the calling rate in our study (95.2%) together with our evaluation of the robustness of the markers is a good compromise for a cost-effective marker assay with sufficiently high precision, even if ploidie levels are mixed in the samples.

In addition, the allele distribution in our study was not uniform. Although we only selected markers in the SNP chip where at least 10% of the 95 genotypes were scored in each of the allele dosage groups, the minor frequencies of the genotypes in the allele dosage groups in the 4,187 accessions screened by PACE assay often had significantly lower values (2–17%). We assume that the differences are because the population structure of the genotypes in the WagRhSNP chip data is not representative. Since roses are highly polymorphic and heterozygous, there may be bias in some SNPs of the chip, which is designed based on a small panel of 95 genotypes that will not cover other genotypic groups properly (ascertainment bias) [37, 38]. This feature also likely explains why only 20 out of the 75 markers converted from the SNP chip analysis were selected. This conversion rate of 27% is significantly lower than that reported in other studies. Shen et al. (2021) [39] were able to use 69.4% of their KASP™ markers for high-quality genotype clusters in broccoli, whereas Winfield et al. (2020) [22] showed that 21 out of 31 (68%) selected KASP™ markers had a high level of reproducibility and good discrimination capacity in apple varieties. However, both Winfield et al. (2020) [22] and Shen et al. (2021) [39] scored only diploid organisms and did not expect tetraploid clusters. Due to the low transformation rate of the markers in our study, we preferred to test the performance of individual markers first; however, many approaches are available to program a pipeline in advance to test marker combinations in silico [22, 40, 41].

Given that the assumptions of a 100% call rate and a uniform allele distribution are not met in reality, we tested three-fold more markers (18 markers) than required under these theoretical assumptions. We showed that the resolution of our marker set was comparable to that of the SSR markers. Only four out of the 1,957 accessions (0.2%) could not be distinguished with our developed marker set with 18 PACE® markers (including repetitions), but could be distinguished with SSR markers.

Robustness of the developed marker set

Assessing the robustness of the marker set is crucial, particularly when comparing numerous accessions that cannot be scored in a single analysis unit. The evaluation of the PACE® assays is not possible in relation to the individual sample; this can only be achieved by comparing the fluorescence ratios with other samples on the analysed plate. By comparing the scores of different plate layouts, we investigated the influence of the composition or number of samples on the scoring of the respective marker. In our study, the markers had an average disagreement of 3% between the different analysis units (the single plate layout and the original plate layout). Two markers, RhK5_8422_105 and RhK5_125_737, showed differences between the analysis units in more than 10% of all analysed accessions and were therefore excluded from the marker set. These deviations may be due to underrepresentation of an allele dosage in the population analysed. The RhK5_8422_105 marker had the lowest evenness with an allele dosage in which only 3% of all accessions occured. For the second poorly performing marker, RhK5_125_737, the calculated evenness of 0.979 initially seems relatively high; however, the evenness is clearly distorted by incorrect calling by FitTetra (Additional File 8). Therefore, it is not sufficient to only consider evenness. Our study demonstrates the importance of conducting robustness analyses for newly developed markers, particularly when comparing samples that are not located on the same plate and therefore not assessed in the same analysis unit. Some confirmed duplicate accessions had differences in one to three SNPs when analysed using the two different plate layouts. When comparing pairs on the same plate, the SSR and PACE® results were identical. This suggests the need to establish an error threshold to account for statistical uncertainties when analysing accessions on different analysis units, even when markers are carefully selected. In particular, samples that only differ in one allele dosage group for one SNP should be re-analysed with several markers or on the same 96 or 384 well plate, which can be statistically evaluated as one analysis unit. Evaluation of marker quality by comparing the scores of genotypes in relation to two differently sized groups was not found in the current literature on fingerprinting studies. In the published studies, the error rate was often determined by comparing a set of technical replicates or was not performed at all [21, 22, 36, 39, 42]. Therefore, we tested a new approach by analysing different plate layouts. However, our method involved subsampling, and the genotypes on the plate were not completely rearranged. It would be interesting to analyse a larger unit with 1,536 samples on one plate (for which a high-throughput cycler is required) and then subsample hundreds of times to evaluate the robustness of a marker based on the allele dosages of the accessions localised on the same plate.

Amplicons have a significantly higher resolution, but PACE® assays are currently more cost-effective, and the platform change is not trivial

SNPs can be detected using various techniques. As NGS technologies continue to evolve, precision and output constantly increase, whereas the cost per sequenced base decreases [43]. Consequently, amplicon sequencing potentially represents the first application of NGS to address the problems of systematic SNP genotyping in roses. In our study, we compared amplicons with PACE® markers for a small set of genotypes in terms of cost and resolution. For the amplicon sequencing of 95 genotypes, we spent approximately 380 € of consumables and sequencing (mainly for the proof reading Taq polymerase) per marker, whereas PACE® genotyping cost approximately 7 € for 95 genotypes (DNA isolation not included). Additionally, for amplicon sequencing, the workload is much greater given the need for additional steps in addition to the two PCRs for pooling and purification of the products. Nevertheless, there is no need to purchase and maintain a real-time PCR system. Furthermore, although the PACE® assays only provide information on the respective SNP, amplicon sequencing provides additional sequence information around the SNP. In our study, the amplicon sequences showed between six to 24 SNPs and/or INDELs compared to one bi-allelic SNP in the PACE® assay. Therefore, NGS technologies such as amplicon sequencing might use fewer loci in total to unambiguously characterise accessions of roses in the future, and costs and labour can be further reduced by using a multiplex PCR approach like it is already done in rice, soybean and other major agricultural crops or for pathogen detection [44,45,46].

However, the desired platform change for the detection of SNPs is not easily implemented. In our study, we investigated the comparability of SNP allele dosages between PACE® technology and amplicon sequencing. The agreement between PACE® and the amplicon markers for the analysed SNPs was only 76% When comparing SeqSNP and KASP™, agreement rates of 94% and greater have been reported in the literature [22]. One explanation for the low agreement rate is that too few samples were scanned, the proportion of incorrectly called samples was only 4%. In order to further increase the agreement rates and the general statistical certainty, it would be advisable to set the read threshold per genotype well above 100. Because the determined allele dosages within the investigated SNP differed strongly between the PACE® analyses and the amplicon sequencing data for the markers RhK5_8422_105, RhK5_1295_1946 and RhK5_10792_6318, the primer binding sites for the PACE® primers were examined on the basis of the sequence information. According to the amplicon sequence information, the binding sites for the PACE® markers RhK5_8422_105 and RhK5_1295_1946 are located in highly polymorphic regions that contain SNPs and/or INDELs (Supplementary Fig. 3 and Supplementary Fig. 4, Additional File 3), which can lead to difficulties in primer binding for some variants. Additionally, the amplicon primers for the marker RhK5_8422_105 bind not only to the region on chromosome 3 but also to a region on chromosome 1 and amplify a fragment of similar size (10 bp difference). Although the binding sites for the PACE® primer RhK5_10792_6318P showed no polymorphisms (Supplementary Fig. 5, Additional File 3), it is possible that the amplicon primers bind to highly polymorphic regions. This problem can be avoided by developing new amplicons that are not only depending on prior information from SNP chips but are additionally based on published genome sequences of species and cultivars [16, 47, 48] to identify conserved regions for primer binding sites around SNPs. The combination of different reference genomes can reduce the problem of ascertainment bias and errors at problematic sites that occur during assembly due to the highly polymorphic regions in the genome. However, in our study, the amplicons were developed based on PACE® markers so that they could be compared with each other.

Using our marker set for gene bank management: Identification of mislabelling and duplicate accessions

Mislabelling is a common problem in gene bank management worldwide [11, 12, 49]. It is critical to identify the degree of mislabelling in collections, as this affects the quality of the collection and has negative effects on resource allocation in maintaining the collection. Therefore, the aim of this study was to develop a marker set that effectively discriminates different rose genotypes. This should help in the confirmation of accessions deposited at different locations as well as in the identification of duplicates in the gene bank stock. Of the 94 assumed duplicates located at two different sites, 82 pairs of accessions were confirmed, and eight were refuted using our developed marker set and SSR markers. This is a strong indication of mislabelling given that contamination during the transfer of the samples into the tubes as well as during DNA isolation and downstream applications can be ruled out by re-isolating the DNA. Nevertheless, mixing of the samples may have occurred during harvesting and sample processing, even though the process was performed with the utmost care, including numerous quality control checks such as sampling by three people and transfer of the leaf samples into the reaction tubes by at least two people.

Of the 1,957 accessions analysed with all 18 markers without any failure and complete agreement between the different analysis approaches, 174 accessions grouped into 73 clusters after removing the known expected duplicates. Of these, 58 clusters were subsequently confirmed using five previously published SSR markers and repeated PACE® analyses. As mentioned above, the 15 clusters that showed different band patterns in the SSR analysis, containing 42 accessions, could be the result of a very low remaining error rate due to mixing of the samples during collection. Indeed, for seven groups, confusion of accessions or sampling errors are very likely as according to information from the Europa Rosarium Sangerhausen no close relationships between the accessions are known and the phenotypes do not look similar. However, our assessments indicate that the remaining clusters are mostly truly genetically identical accessions. For the accessions in 38 of the clusters, also the phenotypes of the accessions look very similar, and it is very possible that the accessions are identical and have only been given different trade names. This shows how important the information obtained from our marker set in combination with phenotypic comparisons is for gene bank management. However, it is also possible that the accessions are closely related, making it challenging to distinguish these accessions using molecular tools. These clusters might also include cases where sport mutations have been used to register new varieties (known or unknown). In this case, our analyses would not lead to differences in the marker patterns, as we expect only a few genetic changes to cause the sport mutant, and we only covered a very small part of the genome with our markers [50]. Indeed, two of the 59 clusters contained known sport mutants, which we were not able to distinguish. Additionally, one accession and its radiation-induced mutant showed the same marker pattern.

Conclusion

In our study, we developed a set of 18 robust PACE® markers for the fingerprinting of 4,187 rose accessions. The information provided by this marker set is very useful for gene bank management because it displays a high discriminative capacity. Given that PACE® markers always need to be analysed in larger genotypic groups, the robustness of markers can be improved by the choice of group sizes for allele calling. Mislabelling due to label or location changes is a major problem in gene banks worldwide. In addition, there is occasionally confusion regarding breeding history, and breeders often give the same commercial names to their varieties more than once. With the help of our marker set, the genetic identities can be determined and compared. In future applications, our marker set could be used to develop a core reference set of representative accessions, as recommended by Glaszmann et al. (2010) [51] and Kilian and Graner (2012) [52], and thus optimise the selection of gene bank accessions. Additionally, more than 120 wild rose species were successfully analysed with our marker set; therefore, these 18 PACE® markers might also be useful for future biodiversity studies.

As the price of NGS technologies continues to decrease, future analyses may utilise alternative technologies. Given that only two-thirds of the PACE® markers from our set could be converted into amplicon markers, amplicons independent of the SNP positions of the WagRhSNP chip might represent informative alternatives since more genetic information can be used and so ascertainment bias can be reduced. This should be taken into account when comparing the PACE® genotyping results with the amplicon data. However, the PACE® marker set we have developed is currently more cost- and time-effective for genotyping large rose collections.

Data availability

The datasets generated and/or analysed during the current study are available in the figshare repository with the DOIs https://doi.org/10.6084/m9.figshare.25837849 (PACE SNP data), https://doi.org/10.6084/m9.figshare.25846063 (SSR and repeated PACE assay results) and https://doi.org/10.6084/m9.figshare.25836727 (Amplicon sequences). The chip data are available from Wamhoff, D., Patzer, L., Schulz, D. F., Debener, T., & Winkelmann, T. (2023). GWAS of adventitious root formation in roses identifies a putative phosphoinositide phosphatase (SAC9) for marker-assisted selection. Plos one, 18(8), e0287452 under the following DOI: https://doi.org/https://doi.org/10.25835/ie9cdzgi.

The chip data are available at [17] under following DOI: https://doi.org/https://doi.org/10.25835/ie9cdzgi

Abbreviations

AFLP:

Amplified fragment length polymorphism

GBS:

Genotyping-by-sequencing

GWAS:

Genome Wide Association Studies

RAPD:

Random amplified polymorphic DNA

SNP:

Single nucleotide polyporphisms

SSR:

Simple sequence repeats

References

  1. Schulze E-D, Beck E, Buchmann N, Clemens S, Müller-Hohenstein K, Scherer-Lorenzen M. Plant Ecology. 2nd ed. Berlin: Springer Berlin Heidelberg; 2019. p. 865–99.

  2. Antonelli A, Smith RJ, Fry C, Simmonds MS, Paul J Kersey, H.W. Pritchard, et al. State of the World’s Plants and Fungi: Royal Botanic Gardens (Kew) Sfumato Foundation; 05.10.2020.

  3. FAO. Genebank standards for plant genetic resources for food and agriculture; 2014.

  4. O’Donnell K, Sharrock S. Botanic Gardens Complement Agricultural Gene Bank in Collecting and Conserving Plant Genetic Diversity. Biopreservation and Biobanking. 2018;16:384–90. https://doi.org/10.1089/bio.2018.0028.

    Article  PubMed  PubMed Central  Google Scholar 

  5. Wissemann V. Conventional taxonomy (wild roses) In: Roberts AV, Debener T, Gudin S, editors. Encyclopedia of rose science; 2003. p. 111–7.

  6. Young MA. Modern Roses–12: The Comprehensive List of Roses in Cultivation or of Historical or Botanical Importance In Shreveport: The American Rose Society; 2007.

  7. Wylie AP. The history of garden roses. Masters Memorial Lecture.With plates. J R Horticultural Soc. 1954;79:8–24.

  8. Roberts AV, Gladis T, Brumme H. DNA amounts of roses (Rosa L.) and their use in attributing ploidy levels. Plant Cell Rep. 2009;28:61–71. https://doi.org/10.1007/s00299-008-0615-9.

  9. Veluru A, Singh KP, [Nachname nicht vorhanden] N, Panwar S, [Nachname nicht vorhanden] G, Bhat KV, et al. Understanding genetic diversity, structure and population differentiation in selected wild species and cultivated Indian and exotic rose varieties based on microsatellite allele frequencies. IJGPB 2019. https://doi.org/10.31742/IJGPB.79.3.8.

  10. Smulders M, van de Weg WE, Bourke PM, Voorrips RE, Arens P, Maliepaard C. Some thoughts on how to use markers in tetraploid rose breeding. Acta Hortic. 2019:1–6. https://doi.org/10.17660/actahortic.2019.1232.1.

  11. van den Houwe I, Chase R, Sardos J, Ruas M, Kempenaers E, Guignon V, et al. Safeguarding and using global banana diversity: a holistic approach. CABI Agric Biosci. 2020;1:1–22. https://doi.org/10.1186/s43170-020-00015-6.

    Article  Google Scholar 

  12. Zhang W, Sun Y, Liu J, Xu C, Zou X, Chen X, et al. DNA barcoding of Oryza: conventional, specific, and super barcodes. Plant Mol Biol. 2021;105:215–28. https://doi.org/10.1007/s11103-020-01054-3.

    Article  CAS  PubMed  Google Scholar 

  13. Collard BCY, Jahufer MZZ, Brouwer JB, Pang ECK. An introduction to markers, quantitative trait loci (QTL) mapping and marker-assisted selection for crop improvement: The basic concepts. Euphytica. 2005;142:169–96. https://doi.org/10.1007/s10681-005-1681-5.

    Article  CAS  Google Scholar 

  14. Gupta PK, Rustgi S, Mir RR. Array-based high-throughput DNA markers for crop improvement. Heredity. 2008;101:5–18. https://doi.org/10.1038/hdy.2008.35.

    Article  CAS  PubMed  Google Scholar 

  15. Koning-Boucoiran CFS, Esselink GD, Vukosavljev M, van 't Westende WPC, Gitonga VW, Krens FA, et al. Using RNA-Seq to assemble a rose transcriptome with more than 13,000 full-length expressed genes and to develop the WagRhSNP 68k Axiom SNP array for rose (Rosa L.). Front. Plant Sci. 2015;6:249. https://doi.org/10.3389/fpls.2015.00249.

  16. Hibrand Saint-Oyant L, Ruttink T, Hamama L, Kirov I, Lakhwani D, Zhou NN, et al. A high-quality genome sequence of Rosa chinensis to elucidate ornamental traits. Nature Plants. 2018;4:473–84. https://doi.org/10.1038/s41477-018-0166-1.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  17. Wamhoff D, Patzer L, Schulz DF, Debener T, Winkelmann T. GWAS of adventitious root formation in roses identifies a putative phosphoinositide phosphatase (SAC9) for marker-assisted selection. PLoS ONE. 2023;18: e0287452. https://doi.org/10.1371/journal.pone.0287452.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  18. Schulz DF, Schott RT, Voorrips RE, Smulders MJM, Linde M, Debener T. Genome-Wide Association Analysis of the Anthocyanin and Carotenoid Contents of Rose Petals. Front Plant Sci. 2016;7:1798. https://doi.org/10.3389/fpls.2016.01798.

    Article  PubMed  PubMed Central  Google Scholar 

  19. Schulz D, Linde M, Debener T. Detection of Reproducible Major Effect QTL for Petal Traits in Garden Roses. Plants. 2021;10:897. https://doi.org/10.3390/plants10050897.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  20. Steele K, Tulloch MQ, Burns M, Nader W. Developing KASP Markers for Identification of Basmati Rice Varieties. Food Anal Methods. 2021;14:663–73. https://doi.org/10.1007/s12161-020-01892-3.

    Article  Google Scholar 

  21. Mulugeta B, Tesfaye K, Keneni G, Ahmed S. Genetic diversity in spring faba bean (Vicia faba L.) genotypes as revealed by high-throughput KASP SNP markers. Genet Resour Crop Evol. 2021;68:1971–86. https://doi.org/10.1007/s10722-021-01110-x.

  22. Winfield M, Burridge A, Ordidge M, Harper H, Wilkinson P, Thorogood D, et al. Development of a minimal KASP marker panel for distinguishing genotypes in apple collections. PLoS ONE. 2020;15: e0242940. https://doi.org/10.1371/journal.pone.0242940.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  23. Inglis PW, Pappas MdCR, Resende LV, Grattapaglia D. Fast and inexpensive protocols for consistent extraction of high quality DNA and RNA from challenging plant and fungal samples for high-throughput SNP genotyping and sequencing applications. PLOS ONE. 2018;13:e0206085. https://doi.org/10.1371/journal.pone.0206085.

  24. van Rossum G, Drake FL. PYTHON 2.6 Reference Manual. CreateSpace, Scotts Valley, CA; 2009.

  25. Voorrips RE, Gort G, Vosman B. Genotype calling in tetraploid species from bi-allelic marker data using mixture models. BMC Bioinformatics. 2011;12:172. https://doi.org/10.1186/1471-2105-12-172.

    Article  PubMed  PubMed Central  Google Scholar 

  26. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 2021. https://www.R-project.org/. Accessed 14 April 2024.

  27. Untergasser A, Nijveen H, Rao X, Bisseling T, Geurts R, Leunissen JAM. Primer3Plus, an enhanced web interface to Primer3. Nucleic Acids Res. 2007;35:W71–4. https://doi.org/10.1093/nar/gkm306.

    Article  PubMed  PubMed Central  Google Scholar 

  28. Pjevac P, Hausmann B, Schwarz J, Kohl G, Herbold CW, Loy A, Berry D. An Economical and Flexible Dual Barcoding, Two-Step PCR Approach for Highly Multiplexed Amplicon Sequencing. Front Microbiol. 2021;12: 669776. https://doi.org/10.3389/fmicb.2021.669776.

    Article  PubMed  PubMed Central  Google Scholar 

  29. Herdegen S, Migalska M, Radwan J. AMPLISAS: a web server for multilocus genotyping using next-generation amplicon sequencing data. Mol Ecol Resour. 2016;16:498–510. https://doi.org/10.1111/1755-0998.12453.

    Article  CAS  PubMed  Google Scholar 

  30. Yan Z, Denneboom C, Hattendorf A, Dolstra O, Debener T, Stam P, Visser PB. Construction of an integrated map of rose with AFLP, SSR, PK, RGA, RFLP, SCAR and morphological markers. Theor Appl Genet. 2005;110:766–77. https://doi.org/10.1007/s00122-004-1903-6.

    Article  CAS  PubMed  Google Scholar 

  31. Spiller M, Linde M, Hibrand-Saint Oyant L, Tsai C-J, Byrne DH, Smulders MJM, et al. Towards a unified genetic map for diploid roses. Theor Appl Genet. 2011;122:489–500. https://doi.org/10.1007/s00122-010-1463-x.

    Article  PubMed  Google Scholar 

  32. Terefe D, Debener T. An SSR from the leucine-rich repeat region of the rose Rdr1 gene family is a useful resistance gene analogue marker for roses and other Rosaceae. Plant Breeding. 2011;130:291–3. https://doi.org/10.1111/j.1439-0523.2010.01780.x.

    Article  CAS  Google Scholar 

  33. UPOV. GUIDELINES FOR DNA-PROFILING: MOLECULAR MARKER SELECTION AND DATABASE CONSTRUCTION ("BMT GUIDELINES"). 17th ed.; 2021. https://www.upov.int/edocs/infdocs/en/upov_inf_17.pdf. Accessed 12 Apr 2024.

  34. Tang W, Lin J, Wang Y, An H, Chen H, Pan G, et al. Selection and Validation of 48 KASP Markers for Variety Identification and Breeding Guidance in Conventional and Hybrid Rice (Oryza sativa L.). Rice (N Y). 2022;15:48. https://doi.org/10.1186/s12284-022-00594-0.

  35. Crane YM, Byrne DH. GENETICS | Karyology. In: Andrew V. Roberts, editor. Encyclopedia of Rose Science. Oxford: Elsevier; 2003. p. 267–273. https://doi.org/10.1016/B0-12-227620-5/00026-4.

  36. Cormier F, Mournet P, Causse S, Arnau G, Maledon E, Gomez R-M, et al. Development of a cost-effective single nucleotide polymorphism genotyping array for management of greater yam germplasm collections. Ecol Evol. 2019;9:5617–36. https://doi.org/10.1002/ece3.5141.

    Article  PubMed  PubMed Central  Google Scholar 

  37. Geibel J, Reimer C, Weigend S, Weigend A, Pook T, Simianer H. How array design creates SNP ascertainment bias. PLoS ONE. 2021;16: e0245178. https://doi.org/10.1371/journal.pone.0245178.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  38. Lachance J, Tishkoff SA. SNP ascertainment bias in population genetic analyses: why it is important, and how to correct it. BioEssays. 2013;35:780–6. https://doi.org/10.1002/bies.201300014.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  39. Shen Y, Wang J, Shaw RK, Yu H, Sheng X, Zhao Z, Gu H. Development of GBTS and KASP panels for genetic diversity, population structure, and fingerprinting of a large collection of broccoli (Brassica oleracea L. var …; 2021.

  40. Dou T, Wang C, Ma Y, Chen Z, Zhang J, Guo G. CoreSNP: an efficient pipeline for core marker profile selection from genome-wide SNP datasets in crops. BMC Plant Biol. 2023;23:580. https://doi.org/10.1186/s12870-023-04609-w.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  41. Fujii H, Ogata T, Shimada T, Endo T, Iketani H, Shimizu T, et al. Minimal marker: an algorithm and computer program for the identification of minimal sets of discriminating DNA markers for efficient variety identification. J Bioinform Comput Biol. 2013;11:1250022. https://doi.org/10.1142/S0219720012500229.

    Article  PubMed  Google Scholar 

  42. Gazendam I, Mojapelo P, Bairu MW. Potato Cultivar Identification in South Africa Using a Custom SNP Panel. Plants. 2022. https://doi.org/10.3390/plants11121546.

    Article  PubMed  PubMed Central  Google Scholar 

  43. Christensen KD, Dukhovny D, Siebert U, Green RC. Assessing the Costs and Cost-Effectiveness of Genomic Sequencing. J Pers Med. 2015;5:470–86. https://doi.org/10.3390/jpm5040470.

    Article  PubMed  PubMed Central  Google Scholar 

  44. Li B, Saingam P, Ishii S, Yan T. Multiplex PCR coupled with direct amplicon sequencing for simultaneous detection of numerous waterborne pathogens. Appl Microbiol Biotechnol. 2019;103:953–61. https://doi.org/10.1007/s00253-018-9498-z.

    Article  CAS  PubMed  Google Scholar 

  45. Onda Y, Takahagi K, Shimizu M, Inoue K, Mochida K. Multiplex PCR Targeted Amplicon Sequencing (MTA-Seq): Simple, Flexible, and Versatile SNP Genotyping by Highly Multiplexed PCR Amplicon Sequencing. Front Plant Sci. 2018;9:201. https://doi.org/10.3389/fpls.2018.00201.

    Article  PubMed  PubMed Central  Google Scholar 

  46. Song Q, Quigley C, He R, Wang D, Nguyen H, Miranda C, Li Z. Development and implementation of nested single-nucleotide polymorphism (SNP) assays for breeding and genetic research applications. The Plant Genome. 2024;17: e20491. https://doi.org/10.1002/tpg2.20491.

    Article  CAS  Google Scholar 

  47. Nakamura N, Hirakawa H, Sato S, Otagaki S, Matsumoto S, Tabata S, Tanaka Y. Genome structure of Rosa multiflora, a wild ancestor of cultivated roses. DNA Res. 2018;25:113–21. https://doi.org/10.1093/dnares/dsx042.

    Article  CAS  PubMed  Google Scholar 

  48. Chen F, Su L, Hu S, Xue J-Y, Liu H, Liu G, et al. A chromosome-level genome assembly of rugged rose (Rosa rugosa) provides insights into its evolution, ecology, and floral characteristics. Hortic Res. 2021;8:141. https://doi.org/10.1038/s41438-021-00594-z.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  49. Dreiseitl A, Zavřelová M. Non-Authenticity of Spring Barley Genotypes Revealed in Gene Bank Accessions. Plants. 2022. https://doi.org/10.3390/plants11223059.

    Article  PubMed  PubMed Central  Google Scholar 

  50. Debener T, Janakiram T, Mattiesch L. Sports and seedlings of rose varieties analysed with molecular markers. Plant Breeding. 2000;119:71–4. https://doi.org/10.1046/j.1439-0523.2000.00459.x.

    Article  CAS  Google Scholar 

  51. Glaszmann JC, Kilian B, Upadhyaya HD, Varshney RK. Accessing genetic diversity for crop improvement. Curr Opin Plant Biol. 2010;13:167–73. https://doi.org/10.1016/j.pbi.2010.01.004.

    Article  CAS  PubMed  Google Scholar 

  52. Kilian B, Graner A. NGS technologies for analyzing germplasm diversity in genebanks. Brief Funct Genomics. 2012;11:38–50. https://doi.org/10.1093/bfgp/elr046.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

Download references

Acknowledgements

We would like to thank the Bundessortenamt and its staff, especially Dr. Daniela Christ and Dr. Burkhard Spellerberg, and the Europa Rosarium Sangerhausen and its staff, especially Thomas Hawel and Gerhild Schulz, for providing the samples and for their support in sampling the rose accessions. We would also like to thank Gerhild Schulz for her excellent expertise in categorising the rose accessions and Dr. Daniela Christ, Dr. Swenja Tams, Katja Näthke and Dr. Frauke Lüddeke for their very helpful comments on the manuscript. We also thank Prof. Dr. Thomas Scheper, Dr. Frank Stahl, and Nina Scheler from the Institute of Technical Chemistry for providing and introducing us to the PhoenixPure System. We would like to thank Sara Krause, Laura Koschitzki and Julia Schröter for great technical support with DNA isolation and SSR markers. Additionally, we extend our gratitude to Klaus Dreier, Dr. Marian Uhe and the student assistants for their support in sampling the rose accessions.

Funding

Open Access funding enabled and organized by Projekt DEAL. This work/the project is supported (was supported) by funds of the Federal Ministry of Food and Agriculture (BMEL) based on a decision of the Parliament of the Federal Republic of Germany via the Federal Office for Agriculture and Food (BLE), grant number 2819HS010.

Author information

Authors and Affiliations

Authors

Contributions

TD and ML acquired the funding. TD, ML and LP designed the study. DS contributed the chip data. LP, TT, DW, DS produced and analysed PACE data. LP investigated the quality of the marker set and analysed amplicon data. LP, TD, ML wrote the manuscript with contributions from all authors.

Corresponding author

Correspondence to Thomas Debener.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Species names.

Additional file 2: Supplementary Table 1. 20-SNP marker set for genotyping roses.

12870_2024_5782_MOESM3_ESM.docx

Additional file 3: Supplementary Figure 1. Distribution of the PACE markers in the genome. Supplementary Figure 2. Theoretical consideration of the required number of markers for tetraploid organisms. Supplementary Figure 3. Binding sites for the PACE marker primers RhK5_8422_105Q. Supplementary Figure 4. Binding sites for the PACE marker primers RhK5_1295_1946P. Supplementary Figure 5. Binding sites for the PACE marker primers RhK5_10792_6318P. 

Additional file 4: Scatterplots from suspected duplicates with one or more SNP differences. 

12870_2024_5782_MOESM5_ESM.docx

Additional file 5: Additional File 5. Comparison between PACE® and amplicon scoring within the same SNP analysed in 95 genotypes with a threshold of 50 reads per genotype.

Additional file 6: Additional File 6. Haplotypes and haplotype frequencies for each marker.

Additional file 7: Additional File 7. Haplotype dosages of each genotype.

Additional file 8: Incorrect calling in FitTetra for the marker RhK5_125_737.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Patzer, L., Thomsen, T., Wamhoff, D. et al. Development of a robust SNP marker set for genotyping diverse gene bank collections of polyploid roses. BMC Plant Biol 24, 1076 (2024). https://doi.org/10.1186/s12870-024-05782-2

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s12870-024-05782-2

Keywords