SSR markers development and their application in genetic diversity of burdock (Arctium lappa L.) germplasm

Abstract

Background

Arctium lappa L., commonly known as burdock or bardana, is a medicinal and edible plant that belongs to the Asteraceae family and is widely distributed around the world. Genetic diversity assessment is essential for A. lappa germplasm resource conservation and breeding. Assessment techniques include morphological, biochemical, and DNA marker analyses. However, the DNA markers currently available are too few to support genetic diversity studies.

Results

In this study, we conducted RNA sequencing of the A. lappa cultivar 'Yanagikawa Ideal' and developed SSR markers to characterize the genetic diversity and population structure of 56 A. lappa accessions and 8 accessions of wild relatives. A total of 4,851 simple sequence repeat (SSR) loci were identified. The proportions of mono-, di- and tri-nucleotide repeat motifs were 30.40%, 21.50% and 33.10%, respectively. We developed 28 SSR core primer pairs and verified their reliability through electronic polymerase chain reaction (ePCR) and PCR amplification. The polymorphism information content (PIC) values of the 28 SSR core primer pairs ranged from 0.246 to 0.848, with 14 primer pairs displaying high polymorphism (PIC > 0.5). The 28 SSR core primer pairs showed 100% transferability to Arctium tomentosum Miller and 96.43% transferability to Synurus deltoides (Aiton) Nakai, indicating their high versatility. The average Shannon information index (I) was 1.231, the average observed heterozygosity (Ho) was 0.132, and the average expected heterozygosity (He) was 0.564. The 64 accessions were divided into three clusters at a genetic distance of 0.558. AMOVA showed that 83% of the genetic variation occurred within populations and 17% among populations, which has implications for conservation and breeding strategies.

Conclusion

Our study provides 28 new high-quality SSR markers to enhance genetic resource conservation and breeding programs for A. lappa, as well as to support comparative genomics and cross-species breeding strategies for related species.

Background

Arctium lappa L. (2n = 2x = 36) is an annual or biennial medicinal and edible plant that belongs to the Asteraceae family. It is widely distributed throughout Northern Asia, Europe, and North America. There are approximately 10 species in the genus Arctium, two of which, A. lappa and A. tomentosum Miller (A. tomentosum), are found in China. Many parts of biennial A. lappa (roots, leaves, fruits, etc.) have been used medicinally in Europe and Asia for thousands of years [1,2,3]. A. lappa has attracted a great deal of attention because it contains well-recognized bioactive metabolites with significant therapeutic potential [4,5,6,7,8,9]. A. lappa and its bioactive metabolites have demonstrated numerous pharmacological effects in vitro and in vivo, including antimicrobial, anti-obesity, antioxidant, anticancer, anti-inflammatory, anti-diabetic, anti-allergic, antiviral, gastroprotective, hepatoprotective, and neuroprotective activities [10,11,12,13,14,15,16,17,18,19,20,21,22,23,24]. In addition, the annual A. lappa root is eaten raw or cooked in a variety of food preparations, and the petioles, young leaves, and sprouts are also widely eaten in China, Japan, and South Korea for their health benefits. China is the biggest producer and exporter of A. lappa, contributing 80% of the world's A. lappa exports in 2023, with a total value of $30 million.

However, there are few good varieties of medicinal and vegetable burdock in China. Medicinal burdock production relies mainly on the cultivation of local varieties and the collection of wild resources. Vegetable burdock varieties were selected by radiation mutagenesis. Apart from 'Xin lin No.1', a new variety bred in China, Japanese varieties such as 'Yanagikawa Ideal', 'Rehmannia glutinosa', and 'Tsunei' are now widely grown in China. Genetic diversity analysis plays a pivotal role in the assessment and conservation of burdock germplasm, and some efforts have been made to identify burdock germplasm. Germplasm identification is a prerequisite for breeding superior varieties, and molecular marker technology can shorten the identification time and provide more accurate results. In China, the Liaoning team has primarily employed Traditional Chinese Medicine (TCM) quality markers, or Q-Markers, to select high-quality burdock varieties. International research has focused more on identifying the nutrients of local burdock varieties, and molecular markers and related studies remain scarce.

With the development of burdock genomics and molecular biology, it is now possible to identify burdock germplasm using molecular markers, study the genetic diversity of germplasm resources, localize QTLs for important traits, and conduct gene mining and breeding work. In previous studies, internal transcribed spacer (ITS) sequences [25], random amplified polymorphic DNA (RAPD) markers, and sequence-related amplified polymorphism (SRAP) markers were employed for cluster analysis of burdock germplasm and to generate fingerprints [26]. However, the number of accessions used was small. In addition, the late start of burdock germplasm collection, the large and complex genome, and the lack of effective molecular markers limit germplasm conservation and functional genomics research in A. lappa. This issue is particularly prominent in constructing high-density genetic maps and mining functional genomes, as highlighted in related studies [27,28,29,30].

Compared with the existing tools used in A. lappa studies, SSR markers are known for their high polymorphism, universality, and simplicity, which makes them an ideal tool for diversity analysis and marker-assisted molecular breeding for complex traits. It has been shown that SSR loci play a role in gene regulation, transcription, evolution, protein function, genome structure, DNA replication, and cellular recycling [31, 32]. SSR loci have become the preferred markers for genetic diversity and resource identification. Many studies [33, 34] have reported the widespread use of SSRs based on transcriptome or genome sequencing data in various crops worldwide. However, no relevant studies on SSR molecular markers for burdock have been conducted so far.

Therefore, we selected 28 SSR core molecular markers through SMRT sequencing and revealed the rich genetic diversity within the 64 accessions. Our study confirmed a lower genetic diversity in vegetable cultivars than in medicinal ones, while wild accessions exhibited the highest diversity. We recommend selecting wild burdock germplasm with a large genetic distance as parents in future breeding programs to develop new burdock varieties. This work not only supports the conservation and identification of these valuable resources but also facilitates the identification of elite genotypes and paves the way for molecular breeding strategies and medicinal use in burdock.

Results

Transcriptomics and SSR locus identification of A. lappa

RNA sequencing libraries were constructed and sequenced. A total of 29.27 Gb of raw data with an N50 of 164,821 bp was obtained. After processing, 26,373 polished consensus sequences were generated, with a maximum length of 7,443 bp, a minimum length of 370 bp, and an N50 of 2,739 bp. The number of multiple and uniquely mapped sequences was 25,494, which accounted for 96.67% of all processed transcripts, indicating high-quality transcriptome sequencing. Subsequently, the mapped transcripts were subjected to novel gene discovery and annotation, transcription factor analysis, lncRNA analysis, and fusion gene analysis (Figs. S1–S4). These high-quality transcripts provided a large number of potential SSR loci.

A total of 4,851 valid SSR loci with a length of at least 10 nt were identified from the RNA sequencing libraries using TBtools software. These consisted of 1,477 mononucleotide repeats, 1,044 dinucleotide repeats, 1,606 trinucleotide repeats, and 724 other repeat types (Fig. 1). The loci had an average density of 2.7958 SSR Mb−1. Following previous studies [32], we categorized the SSR loci into three major classes: Class I (hypervariable: > 30 nt) with 523 loci, Class II (potentially variable: 20–30 nt) with 948 loci, and Class III (variable: < 20 nt) with 3,380 loci. The dominant motif types for each repeat unit were A/T (1,454 repeats), AG/CT (338 repeats), and AAG/CTT (67 repeats), accounting for 29.97%, 6.97%, and 1.38% of the total SSR loci, respectively (Table 1). The repeat units were repeated 5–49 times, and the length of single SSR loci ranged from 10 to 262 bp, mainly concentrated in the range of 10–150 bp. The length of compound SSR loci ranged from 18 to 262 bp. Similar recent research in various plants, including rice, pigeonpea, and pomegranate, has reported significantly more Class I SSRs (> 30 bp) [32, 35, 36].
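The study identified SSR loci with the SSRminer function of TBtools. The following is not that implementation, only a minimal Python sketch of how perfect repeats can be detected with regular expressions and then assigned to the three length classes described above. The minimum repeat counts in `MIN_REPEATS`, the function names, and the example sequence are assumptions made for illustration.

```python
import re

# Assumed minimum repeat counts per motif length (not taken from the paper).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count, length_nt) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        # Inner group 2 is the motif; it must be followed by min_rep - 1 more copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (unit, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(2)
            # Skip motifs that are just a shorter motif repeated (e.g. "AA", "AGAG").
            if any(motif == motif[:k] * (unit // k)
                   for k in range(1, unit) if unit % k == 0):
                continue
            length = len(match.group(1))
            hits.append((match.start(), motif, length // unit, length))
    return sorted(hits)


def ssr_class(length_nt):
    """Length classes as defined in the text: I > 30 nt, II 20-30 nt, III < 20 nt."""
    if length_nt > 30:
        return "I"
    if length_nt >= 20:
        return "II"
    return "III"


# Made-up example transcript fragment containing three perfect repeats.
example_seq = "GC" + "A" * 12 + "CGT" + "AG" * 8 + "TCC" + "AAG" * 11 + "GC"
example_hits = find_ssrs(example_seq)
for start, motif, count, length in example_hits:
    print(start, motif, count, length, "Class", ssr_class(length))
```

In this toy sequence only the (AAG)11 repeat, at 33 nt, falls into Class I, the class from which the primers in this study were designed.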

Fig. 1 Repeat type, repeat number, and proportion of SSR motifs in the transcripts of 'Yanagikawa Ideal'

Table 1 Repeat type, repeat number, and proportion of SSR motifs identified from RNA-Seq transcripts of A. lappa

SSR core primer pairs development and selection

In this study, 485 SSR primer pairs were designed based on the 524 Class I SSR loci, which are likely to be highly variable. These SSR primers were validated and screened by ePCR, yielding 108 polymorphic SSR primer pairs. PCR amplification using four different burdock genotypes showed that 28 SSR core primer pairs produced clear, stable, and polymorphic bands (Fig. 2) and would be good candidates for analyzing genetic diversity in burdock (Table 2). The DNA of the 64 collected accessions was therefore extracted and amplified using the SSR core primers, yielding a total of 258 polymorphic sites, with an average of 9.21 polymorphic sites per primer pair. The average number of alleles per primer pair (Na) was 7.393, the average effective number of alleles (Ne) was 3.191, the average Shannon information index (I) was 1.231, the average observed heterozygosity (Ho) was 0.132, the average expected heterozygosity (He) was 0.564, and the average unbiased expected heterozygosity (uHe) and fixation index (F) were 0.570 and 0.755, respectively (Table 3). The fixation index (F) values ranged from 0.228 to 1, with a mean of 0.755 (Tables 2 and 3). Together, these indices indicated high genetic diversity and population differentiation. In addition, the PIC values of the 28 SSR core primer pairs ranged from 0.246 to 0.848 (Table 4), with 14 primer pairs having PIC values greater than 0.5, indicating high polymorphism. These 28 markers were tested in two other species, with transferability rates of 100% (28) in A. tomentosum and 96.43% (27) in S. deltoides, indicating their high versatility for analyzing the genetic diversity of species in the genus Arctium and of S. deltoides.
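The indices reported above were computed in GenAlEx. As a hedged illustration of what they measure, the sketch below applies the standard single-locus definitions (Ne = 1/Σp², I = −Σp ln p, He = 1 − Σp², uHe = 2N/(2N − 1) × He, F = (He − Ho)/He) to diploid genotypes. The function name and the four example genotypes are hypothetical, and the sketch is assumed rather than confirmed to match the software's exact computation.

```python
import math
from collections import Counter


def locus_stats(genotypes):
    """Diversity indices for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    """
    n = len(genotypes)
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    freqs = [c / (2 * n) for c in counts.values()]
    sum_p2 = sum(p * p for p in freqs)

    ho = sum(1 for a, b in genotypes if a != b) / n   # observed heterozygosity
    he = 1 - sum_p2                                    # expected heterozygosity
    return {
        "Na": len(freqs),                              # number of alleles
        "Ne": 1 / sum_p2,                              # effective number of alleles
        "I": -sum(p * math.log(p) for p in freqs),     # Shannon information index
        "Ho": ho,
        "He": he,
        "uHe": (2 * n / (2 * n - 1)) * he,             # unbiased expected heterozygosity
        "F": (he - ho) / he if he > 0 else float("nan"),  # fixation index
    }


# Hypothetical fragment sizes (bp) for four individuals at one SSR locus.
example_genotypes = [(150, 150), (150, 150), (154, 154), (154, 158)]
example_stats = locus_stats(example_genotypes)
for name, value in example_stats.items():
    print(f"{name}: {value:.4f}")
```

Applied to the reported averages, 1 − Ho/He = 1 − 0.132/0.564 ≈ 0.766, which is close to the reported mean F of 0.755; the two need not match exactly, because a mean of per-locus ratios differs from a ratio of means.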

Fig. 2 Fragment length (bp, X-axis) and fluorescence intensity (A.U., Y-axis) of AHL015 (08 H01), CaoW651 (09 A02), SHI-2008227 (04 D01), and AnHC0419 (39 G05) at SSR locus A30. Note: Denatured PCR products were separated using an ABI3730xL, and the results showed clear bands and good polymorphism

Table 2 Genetic diversity index of 28 SSR markers across the 64 accessions
Table 3 Genetic diversity index of SSR primers weighted average
Table 4 The characteristics of the 28 SSR core primer pairs

Genetic diversity analysis of A. lappa

SSR primers can reveal the genetic diversity characteristics of different populations and provide important evidence for germplasm resource identification and conservation. In this experiment, the 64 accessions, including closely related wild species, were analyzed using the 28 SSR core molecular markers. The genetic distance matrix showed that the maximum genetic distance between accessions was 1, the minimum was 0.0385, and the average was 0.7106. Clustering with MEGA 10 divided the 64 accessions into three groups at a genetic distance of 0.558 (Figs. 3 and 4). Cluster I contained 14 accessions, including three S. deltoides, some A. lappa accessions, and wild burdock germplasm. Cluster II contained 15 accessions, including all vegetable burdock. The remaining 35 accessions were classified into cluster III, which included three medicinal burdocks, one S. deltoides, and the remaining wild burdock germplasm. Principal coordinate analysis also separated the 64 accessions into three groups (Fig. 5), which was consistent with the results of the cluster analysis.

Fig. 3 Neighbor-Joining clustering dendrogram of 64 accessions. Note: The background colour of cluster I is yellow, that of cluster II is blue, and that of cluster III is red. Cluster I contains two subgroups, designated as Ia and Ib. Germplasm numbers correspond to the respective germplasm types and species names

Fig. 4 Neighbor-Joining clustering dendrogram of the 64 accessions based on their genetic distance matrix

Fig. 5 Principal coordinate analysis plot for 64 accessions showing the separation into three main clusters

In the population structure analysis of the 64 accessions, ΔK was largest at K = 2, so the 64 accessions were divided into 2 groups (Figs. 6 and 7). Group I contained 14 accessions, the same germplasm as in cluster I, including 3 accessions of S. deltoides and 4 accessions of A. tomentosum. Normally, when Q < 0.6, the germplasm is considered to be of mixed origin with high genetic diversity. In this study, one germplasm, 'WangCh243', showed Q < 0.8, along with 8 other germplasms: 'SCSB-W-232', 'WangCh243', 'ZhengBJ204', 'AnHC0419', 'M493', 'Ili A. lappa 12', 'Li burdock 18', and 'Rehmannia glutinosa'. Population structure analysis showed that 52 A. lappa and 1 S. deltoides germplasms clustered in Group II, consistent with the germplasm in clusters II and III, which contain all the vegetable and medicinal cultivars.
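The K value here was chosen with the Structure Harvester platform. As a rough illustration of the ΔK statistic such tools report, the sketch below computes ΔK(K) = |L(K+1) − 2L(K) + L(K−1)| / sd(L(K)) from replicate log-likelihoods, using the mean of L(K) across runs. The function name and all ln P(D) values are made up for illustration and are not results from this study.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute delta K for every K that has both neighbours K-1 and K+1.

    lnp_by_k: dict mapping K to a list of ln P(D) values from replicate runs.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # the second-order difference needs both neighbours
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / stdev(lnp_by_k[k])
    return result


# Hypothetical replicate log-likelihoods (three runs per K).
example_lnp = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4500.0, -4501.0, -4499.0],
    3: [-4450.0, -4455.0, -4445.0],
    4: [-4430.0, -4440.0, -4420.0],
}
example_dk = delta_k(example_lnp)
best_k = max(example_dk, key=example_dk.get)
print(example_dk, "best K:", best_k)
```

In this toy data the likelihood rises sharply from K = 1 to K = 2 and then flattens, so ΔK peaks at K = 2.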

Fig. 6 ΔK evaluations of the 64 accessions

Fig. 7 Genetic structure of 64 accessions as inferred by STRUCTURE based on 28 SSR primer pairs. Note: Group I is represented by the red corresponding germplasm number, while group II is represented by the green corresponding germplasm number

AMOVA of the 64 accessions was used to assess the variance components among and within populations. Genetic variation among the A. lappa populations accounted for 17% of the total, while variation within populations accounted for 83% (Table 5), indicating high genetic diversity among these 64 accessions.
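The AMOVA in this study was run in GenAlEx. To show how the among- and within-population percentages arise, the sketch below implements the standard one-factor partition of a squared-distance matrix into sums of squares and variance components. The function name, the population labels, and the four-individual toy data are assumptions for illustration and do not reproduce the study's data.

```python
def amova(sq_dist, groups):
    """One-factor AMOVA from a symmetric matrix of squared pairwise distances.

    sq_dist: list of lists, sq_dist[i][j] is the squared distance between i and j.
    groups:  population label for each individual, in matrix order.
    """
    n = len(groups)
    pops = {}
    for index, label in enumerate(groups):
        pops.setdefault(label, []).append(index)

    def sum_of_squares(indices):
        # Sum of squared distances over all pairs, divided by the group size.
        pair_sum = sum(sq_dist[i][j]
                       for pos, i in enumerate(indices)
                       for j in indices[pos + 1:])
        return pair_sum / len(indices)

    ss_total = sum_of_squares(list(range(n)))
    ss_within = sum(sum_of_squares(members) for members in pops.values())
    ss_among = ss_total - ss_within

    p = len(pops)
    ms_among = ss_among / (p - 1)
    ms_within = ss_within / (n - p)
    # Average-sample-size coefficient for unequal population sizes.
    n0 = (n - sum(len(m) ** 2 for m in pops.values()) / n) / (p - 1)

    var_among = (ms_among - ms_within) / n0
    var_within = ms_within
    total = var_among + var_within
    return {
        "var_among": var_among,
        "var_within": var_within,
        "pct_among": 100 * var_among / total,
        "pct_within": 100 * var_within / total,
    }


# Toy data: four individuals placed on a line, two per hypothetical population.
positions = [0.0, 1.0, 4.0, 5.0]
toy_sq_dist = [[(a - b) ** 2 for b in positions] for a in positions]
toy_result = amova(toy_sq_dist, ["pop1", "pop1", "pop2", "pop2"])
print(toy_result)
```

The toy populations are well separated, so most variance falls among populations; the study reports the opposite pattern, with 83% of the variation within populations.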

Table 5 Analysis of molecular variance (AMOVA) of the 64 accessions

Discussion

High-quality and high-transferability SSR markers identified for A. lappa

With the continuous development of sequencing technology, transcriptomics, and genomics, studies have been carried out on many crops and medicinal herbs. At present, SMRT RNA-seq is widely used to identify SSR markers [29, 31], and high-quality transcriptome data provide a reliable database for screening SSR loci (Figs. S1–S4). The SSR motif distribution found here (Fig. 1) agreed with studies on SSR motif distribution and polymorphism in Phytolacca acinosa and other medicinal plants [37, 38]. SSR core primers generally have good polymorphism, stability, repeatability, and distinguishability [39,40,41,42,43]. In this study, we found that A30, A60, A87, A216, A295, A297, A396, A425, A430, and A464 formed a minimal primer set that could completely distinguish all germplasm resources. As the number of germplasm resources increases, the number of core primers will inevitably need to increase. PIC values between 0.250 and 0.500 indicate a moderate level of polymorphism for SSR primer pairs, while values greater than 0.500 represent high polymorphism. Thus, 96.43% of the screened SSR primers (27 pairs) exhibited medium to high polymorphism in this study. These 28 SSR core primer pairs can be used for DUS testing and authentication of new burdock varieties, construction of fingerprint libraries of germplasm resources, and efficient management and utilization of existing A. lappa germplasm. Cluster analysis, principal coordinate analysis, and genetic structure analysis produced convergent classifications with the developed markers, supporting their validity.
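The PIC thresholds above can be made concrete with a short sketch. It assumes the widely used definition PIC = 1 − Σp_i² − Σ_{i<j} 2p_i²p_j², where p_i are allele frequencies at a locus; the paper does not state the formula its software applied, so this choice, the function names, and the example frequencies are assumptions.

```python
from itertools import combinations


def pic(freqs):
    """Polymorphism information content from allele frequencies summing to 1."""
    sum_p2 = sum(p * p for p in freqs)
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1 - sum_p2 - cross


def pic_level(value):
    """Thresholds as stated in the text: > 0.5 high, 0.25-0.5 moderate, else low."""
    if value > 0.5:
        return "high"
    if value >= 0.25:
        return "moderate"
    return "low"


# Hypothetical loci: two equally frequent alleles, and four equally frequent alleles.
two_allele_pic = pic([0.5, 0.5])
four_allele_pic = pic([0.25, 0.25, 0.25, 0.25])
print(two_allele_pic, pic_level(two_allele_pic))
print(four_allele_pic, pic_level(four_allele_pic))
```

Under these thresholds, the lowest PIC reported in this study (0.246) is classed as low and the highest (0.848) as high, which matches the statement that 27 of the 28 primer pairs show medium to high polymorphism.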

Many studies have suggested that SSR markers are transferable not only among closely related species [31, 32] but also between different genera, such as Arctium and Synurus. Cross-species amplification showed that 27 of the 28 SSR loci were transferable among A. lappa, A. tomentosum, and S. deltoides, demonstrating their universality in identifying burdock germplasm resources. Primer pair A421 showed no amplification in S. deltoides. We speculate that the amplification pattern of A421 could be combined with morphological markers as a criterion for distinguishing A. lappa from S. deltoides, indicating cross-species breeding possibilities. EST-SSR transferability provides a cost-effective source of markers for A. tomentosum and S. deltoides, which is important for taxa with low microsatellite frequencies or for those whose microsatellites are difficult to isolate [27, 44].

Genetic diversity analysis among 64 A. lappa accessions

In this study, we collected 64 accessions of the genus Arctium and the closely related genus Synurus, comprising 9 vegetable cultivars, 3 medicinal cultivars, and 52 wild accessions. The 28 SSR core primer pairs were applied for cluster analysis of these accessions. Among the well-distinguished 64 accessions, S. deltoides and A. tomentosum clustered together in cluster I, supporting the accuracy of the 28 SSR core primer pairs. The genetic distance matrix indicated that genetic diversity was lower in vegetable cultivars than in medicinal ones and was richest in wild accessions. These results are similar to those for Chrysanthemum × morifolium Ramat., Helianthus annuus L., and other medicinal plants [37, 45,46,47,48]. The AMOVA indicated that a high proportion of the total genetic variance was attributable to variation within populations (83%), which has implications for breeding and conservation strategies. Clustering and PCoA consistently revealed the genetic relationships among A. lappa accessions.

Lower genetic diversity in vegetable cultivars than in medicinal ones

A. lappa vegetable cultivars were grouped in cluster II, including the varieties 'Tokiwa', 'Xin lin No.1', 'Xunfeng', 'Yanagikawa Ideal', and 'Tian ma 2018', with small genetic distances ranging from 0.0714 to 0.1071, consistent with these varieties having been selected through radiation mutagenesis breeding. The clustering result also indicates that the parents of the local A. lappa vegetable cultivars 'Xin lin No.1' and 'Tian ma 2018' might have come from Japan. This suggests that long-term artificial selection may have reduced genetic diversity and narrowed the genetic base of A. lappa varieties. The genetic distances among the A. lappa medicinal cultivars 'Dalizi', 'Dongbei Dalizi', and 'Chuan Dali 1' [3] were 0.2143 to 0.3036, indicating that genetic diversity among A. lappa medicinal cultivars was higher than that among vegetable cultivars. Medicinal cultivars of A. lappa are still bred by traditional systematic selection, which is consistent with the genetic distance results. A similar pattern is seen in fruit crops, which are bred by cross breeding, irradiation breeding, and bud-sport selection [28, 40]. Based on the clustering results, we recommend selecting wild A. lappa germplasm with a large genetic distance as parents when breeding new A. lappa varieties, to enrich the genetic composition of A. lappa cultivars and improve their resistance and environmental adaptability.

High genetic diversity among wild A. lappa resources

The native habitats of the wild accessions are mainly located in Xinjiang, western Sichuan, and Yunnan-Guizhou, which lie on the second to third steps of China's stepped terrain. According to the clustering results, 10 of the 11 accessions in cluster Ia were from Xinjiang. We speculate that the closer the geographical distribution of wild A. lappa accessions, the closer their genetic affinity, which is partly related to the origin and spread of A. lappa. S. deltoides 'M493' differs greatly from the A. lappa wild accessions 'SHI- M493' and 'SHI- 2009275' in the genetic distance matrix, confirming that the molecular markers reflect their genetic relationship. A. tomentosum 'SHI-2008227', 'SHI-2009459', 'Zhangdy367', and 'Zhangdy244' were grouped in cluster Ib. The genetic distances between them were 0.4643, 0.5357, and 0.4643, respectively, and they are all from Xinjiang, indicating that geographic location influences genetic diversity.

Breeding recommendations and future directions

Analyzing the genetic diversity and population structure of germplasm resources can clarify the relationships between resources from different regions, reveal new valuable genes, and promote the selection and breeding of new high-yielding, high-quality, and multi-resistant varieties. Given the lack of identified burdock germplasm resources, we recommend greater efforts in the collection, identification, and protection of A. lappa germplasm resources. In this study, we used SMRT sequencing to sequence the A. lappa cultivar 'Yanagikawa Ideal', identified SSR loci, and developed 28 SSR core primer pairs that effectively distinguished 64 accessions. In addition, wild accessions exhibited higher genetic diversity, which is crucial for enriching cultivated varieties. Our findings provide guidance for breeding valuable new cultivars with strong resistance and adaptability from genetically diverse wild accessions, and for identifying distantly related parents to broaden the genetic base of new burdock varieties.

Our study was the first to identify 28 SSR primer pairs and analyze the genetic diversity of 64 accessions, but these 28 primer pairs cover only 14 of the 18 burdock chromosomes. Future research will focus on expanding germplasm identification and conservation to prevent genetic erosion, exploring additional genetic markers or whole-genome studies for deeper insights, and investigating genetic traits linked to adaptability, yield, and resistance. We hope our work will help develop new varieties with excellent traits and contribute to the breeding strategies of burdock and related species in the Asteraceae family.

Conclusions

The 28 newly developed SSR primer pairs with high polymorphism will be useful for further investigating population genetics and germplasm identification of A. lappa accessions and other members of this genus. Our study confirms that genetic diversity is lower in vegetable cultivars than in medicinal ones and is richest in wild accessions. We recommend selecting wild A. lappa germplasm with a large genetic distance as parents in future breeding of new A. lappa varieties, to enrich the genetic composition of A. lappa cultivars and improve their resistance and environmental adaptability.

Materials and methods

Germplasm material accessions and SMRT sequencing

In this study, we collected 64 accessions, including 56 A. lappa, 4 A. tomentosum, and 4 Synurus deltoides (Aiton) Nakai (S. deltoides) (Table 6). Forty-five accessions were collected from the germplasm bank of wild species, eight were collected from fields in the Ili Kazakh Autonomous Prefecture, one medicinal germplasm was provided by Sichuan Agricultural University, and the remaining ten accessions were collected from Xuzhou, Jiangsu Province, China.

Table 6 Basic information of the 64 accessions

'Yanagikawa Ideal', a variety of edible burdock, was selected for SMRT sequencing. Samples were collected from different parts of the plant at different stages, including small flowers (unopened and in full bloom), biennial leaves, petioles, stems, young annual leaves and petioles, mature annual leaves and petioles, annual roots, and seedlings. Total RNA samples of acceptable quality and concentration were obtained, followed by library construction. These mixed samples were sequenced using PacBio Sequel SMRT sequencing to assemble third-generation full-length reference transcripts [48,49,50], resulting in a comprehensive database of A. lappa SSR types from mixed samples of three individual plants.

PCR polymorphism amplification and primer screening

A. lappa samples were collected from young leaves and hypocotyls after seed germination. The tissues were frozen in liquid nitrogen and ground using a tissue grinder. Genomic DNA was extracted from the frozen tissues using a modified CTAB method [51]. The quality of the DNA samples was assessed using a NanoDrop 1000 spectrophotometer (Thermo Scientific, DE, USA) and electrophoresis on a 1% agarose gel. DNA samples were dissolved in ddH2O containing RNase H, diluted to a working concentration of 50 ng/μL, and stored at -80°C. The total PCR reaction volume was 10 μL, consisting of 5 μL of 2 × Taq MIX (Vazyme, China), 1 μL of forward primer (0.01 nmol/μL), 1 μL of reverse primer (0.01 nmol/μL), 1 μL of DNA template (50 ng/μL), and 2 μL of ddH2O. The PCR conditions were as follows: initial denaturation at 95°C for 3 min; 35 cycles of denaturation at 95°C for 15 s, annealing at 55–60°C for 15 s, and extension at 72°C for 1 min; and a final extension at 72°C for 5 min.

SSRs were rapidly identified in the transcript database using the SSRminer function of the TBtools software [52]. The Batch qPCR Primer Design function was then used to design SSR primers for the transcript sequences in bulk, and the Primer Check function was employed to screen for polymorphic SSR primers. The SSR motif length was set to > 30 nt, with amplification product sizes of 100–600 bp and Tm values of 55–65°C, avoiding hairpin structures and primer dimers. Four A. lappa genomic or transcriptomic datasets (BioProject ID PRJNA598011; GenBank ID GCA_023525745.1; BioProject ID PRJNA548834 and CNP0006581) were used to validate the designed primers by ePCR, and the primers were then screened using TBtools software. Four genotypes, 'Xin lin No.1', 'Rehmannia glutinosa', 'Ili A. lappa 1' and 'Chuan Dali 1', were used to verify the accuracy of the screened primers.

PCR amplification of the highly polymorphic SSRs was repeated, and the selected PCR products were separated by capillary electrophoresis using a fragment analyser (AATI). Specific bands were read using PROSize 3.0 to identify SSR core primer pairs (Table 4). The core primers were subsequently labeled at the 5′ end with the fluorescent dyes 6-FAM (blue), HEX (green), or ROX (red). These core primers were used for fluorescence PCR amplification to analyze the genetic diversity and population structure of the 64 accessions.
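The arithmetic behind the 10 μL PCR reaction described in this section can be checked with a small sketch. The per-reaction volumes and stock concentrations come from the text; the helper names, the 64-reaction batch size, and the 10% pipetting overage are assumptions for illustration.

```python
# Per-reaction volumes in microlitres, as listed in the text.
REACTION_UL = {
    "2x Taq MIX": 5.0,
    "forward primer": 1.0,
    "reverse primer": 1.0,
    "DNA template": 1.0,
    "ddH2O": 2.0,
}


def final_concentration(stock, added_ul, total_ul):
    """Concentration after dilution, in the same unit as the stock."""
    return stock * added_ul / total_ul


def master_mix(n_reactions, overage=0.10):
    """Volumes (uL) of shared components for one primer pair.

    The template is added per tube, so it is left out. The 10% overage is
    an assumed allowance for pipetting loss, not a value from the paper.
    """
    factor = n_reactions * (1 + overage)
    return {name: round(volume * factor, 1)
            for name, volume in REACTION_UL.items()
            if name != "DNA template"}


total_ul = sum(REACTION_UL.values())
primer_stock_um = 0.01 * 1000  # 0.01 nmol/uL is the same as 10 umol/L
primer_final_um = final_concentration(primer_stock_um, 1.0, total_ul)
dna_final_ng_per_ul = final_concentration(50.0, 1.0, total_ul)
mix_64 = master_mix(64)

print(f"total volume: {total_ul} uL")
print(f"each primer: {primer_final_um} uM final")
print(f"template: {dna_final_ng_per_ul} ng/uL final")
print(mix_64)
```

Under these figures each primer ends up at 1 μM and each reaction contains 50 ng of template DNA.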

Data statistics and analysis

The reliability of the primers was verified by ePCR in TBtools against A. lappa genome data imported from NCBI. The PCR products were analyzed on an ABI3730XL sequencer. The results were read in GeneMarker software to obtain the fragment database, which was imported into GenAlEx 6.503 [53] for principal coordinate analysis (PCoA). GenAlEx 6.503 was also used to calculate several genetic diversity parameters, including the number of alleles (Na), effective number of alleles (Ne), Shannon information index (I), observed heterozygosity (Ho), expected heterozygosity (He), unbiased expected heterozygosity (uHe), fixation index (F), and polymorphism information content (PIC), and to perform the analysis of molecular variance (AMOVA) among A. lappa populations.

To convert the fragment database into binary code (0/1), amplified products were scored as present (1) or absent (0). The binary database was imported into DARwin software to calculate the Jaccard genetic distance matrix. The genetic distance matrix was then imported into MEGA 10 to perform clustering analysis using the Neighbor-Joining (NJ) method. The resulting phylogenetic tree was annotated using Evolview 24 [54] (Interactive tree of life). Finally, Structure 2.3.4 was used to analyze the population structure of the 64 accessions. The population number K was set to range from 2 to 10, with 10 runs, a burn-in of 200,000 iterations, and 1,200,000 repetitions. The results were compiled and uploaded to the Structure Harvester online platform to determine the final K value [55].
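The binary scoring and Jaccard distance steps above were carried out in dedicated software. The following Python sketch only illustrates the idea: fragments are scored as present (1) or absent (0), and the distance between two accessions is 1 minus the ratio of shared bands to bands present in either. The accession names and fragment sizes are hypothetical; only the locus names A30 and A60 appear in the paper.

```python
def score_binary(calls):
    """Turn fragment calls into 0/1 vectors over all observed bands.

    calls: dict mapping accession name to a set of (locus, fragment_size) bands.
    Returns the ordered band list and a dict of 0/1 vectors.
    """
    bands = sorted(set().union(*calls.values()))
    vectors = {name: [1 if band in observed else 0 for band in bands]
               for name, observed in calls.items()}
    return bands, vectors


def jaccard_distance(u, v):
    """1 - shared bands / bands present in either profile."""
    shared = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(u, v) if a == 1 or b == 1)
    return 1 - shared / either if either else 0.0


# Hypothetical fragment calls at two loci for three accessions.
example_calls = {
    "acc1": {("A30", 150), ("A30", 154), ("A60", 200)},
    "acc2": {("A30", 150), ("A60", 200)},
    "acc3": {("A30", 158), ("A60", 210)},
}
bands, vectors = score_binary(example_calls)
names = sorted(vectors)
distance = {(a, b): jaccard_distance(vectors[a], vectors[b])
            for a in names for b in names}
for a in names:
    print(a, [round(distance[(a, b)], 4) for b in names])
```

Two accessions that share no bands have a distance of 1, which corresponds to the maximum genetic distance reported in the Results.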

Data availability

All data generated or analysed during this study are included in this article. The data that support the findings of this study have been deposited into CNGB Sequence Archive (CNSA) of China National GeneBank DataBase (CNGBdb) with accession number CNP0006581.

References

  1. Jin X, Liu S, Chen S, Wang L, Cui Y, He J, Fang S, Li J, Chang Y. A systematic review on botany, ethnopharmacology, quality control, phytochemistry, pharmacology and toxicity of Arctium lappa L. fruit. J Ethnopharmacol. 2023;308:116223.

  2. Li Z, Zhang Z, Ding J, Li Y, Cao G, Zhu L, Bian Y, Liu Y. Extraction, structure and bioactivities of polysaccharide from root of Arctium lappa L: a review. Int J Biol Macromol. 2024;265(2):131035.

  3. Jiang Y, Wang T, Zuo D, Wang L, Yang R, Liao J, Zhang L. A new Arctium lappa cultivar “Chuan Dali 1”. Acta Horticulturae Sinica. 2023;50(5):1171–2 (in Chinese).

  4. Chan YS, Cheng LN, Wu JH, Chan E, Kwan YW, Lee SM, Leung GP, Yu PH, Chan SW. A review of the pharmacological effects of Arctium lappa (burdock). Inflammopharmacology. 2011;19(5):245–54.

  5. da Silva LM, Allemand A, Mendes DA, Dos Santos AC, André E, de Souza LM, Cipriani TR, Dartora N, Marques MC, Baggio CH, Werner MF. Ethanolic extract of roots from Arctium lappa L. accelerates the healing of acetic acid-induced gastric ulcer in rats: involvement of the antioxidant system. Food Chem Toxicol. 2013;51:179–87.

  6. Hsieh CJ, Kuo PL, Hsu YC, Huang YF, Tsai EM, Hsu YL. Arctigenin, a dietary phytoestrogen, induces apoptosis of estrogen receptor-negative breast cancer cells through the ROS/p38 MAPK pathway and epigenetic regulation. Free Radic Biol Med. 2014;67:159–70.

  7. Corrêa RCG, Peralta RM, Haminiuk CWI, Maciel GM, Bracht A, Ferreira ICFR. New phytochemicals as potential human anti-aging compounds: reality, promise, and challenges. Crit Rev Food Sci Nutr. 2018;58(6):942–57.

  8. Gao Y, Gu C, Wang K, Wang H, Ruan K, Xu Z, Feng Y. The effects of hypoglycemia and weight loss of total lignans from fructus arctii in KKAy mice and its mechanisms of the activity. Phytother Res. 2018;32(4):631–42.

  9. de Souza ARC, de Oliveira TL, Fontana PD, Carneiro MC, Corazza ML, de Messias Reason IJ, Bavia L. Phytochemicals and biological activities of burdock (Arctium lappa L) extracts: a review. Chem Biodivers. 2022;19(11):e202200615.

  10. Wang D, Bădărau AS, Swamy MK, Shaw S, Maggi F, da Silva LE, López V, Yeung AWK, Mocan A, Atanasov AG. Arctium species secondary metabolites chemodiversity and bioactivities. Front Plant Sci. 2019;10:834.

  11. Yang RY, Tan JY, Liu Z, Shen XL, Hu YJ. Lappaol F regulates the cell cycle by activating CDKN1C/p57 in human colorectal cancer cells. Pharm Biol. 2023;61(1):337–44.

  12. Zhou Y, Liu L, Xiang R, Bu X, Qin G, Dai J, Zhao Z, Fang X, Yang S, Han J, Wang G. Arctigenin mitigates insulin resistance by modulating the IRS2/GLUT4 pathway via TLR4 in type 2 diabetes mellitus mice. Int Immunopharmacol. 2023;114:109529.

  13. Guo M, Liang J, Wu S. On-line coupling of counter-current chromatography and macroporous resin chromatography for continuous isolation of arctiin from the fruit of Arctium lappa L. J Chromatogr A. 2010;1217(33):5398–406.

  14. Gao Q, Yang M, Zuo Z. Overview of the anti-inflammatory effects, pharmacokinetic properties and clinical efficacies of arctigenin and arctiin from Arctium lappa L. Acta Pharmacol Sin. 2018;39(5):787–801.

  15. Lv M, Chen S, Shan M, Si Y, Huang C, Chen J, Gong L. Arctigenin induces activated HSCs quiescence via AMPK-PPARγ pathway to ameliorate liver fibrosis in mice. Eur J Pharmacol. 2024;974:176629.

  16. Wang Y, Zhang L, Wang D, Guo X, Wu S. Room temperature ionic liquids-based salting-in strategy for counter-current chromatography in the separation of arctiin. J Chromatogr A. 2016;1478:26–34.

  17. Gao D, Chen H, Li H, Yang X, Guo X, Zhang Y, Ma J, Yang J, Ma S. Extraction, structural characterization, and antioxidant activity of polysaccharides derived from Arctium lappa L. Front Nutr. 2023;10:1149137.

  18. Xiang W, Yin G, Liu H, Wei J, Yu X, Xie Y, Zhang L, XueTang, Jiang W, Lu N. Arctium lappa L. polysaccharides enhanced the therapeutic effects of nasal ectomesenchymal stem cells against liver fibrosis by inhibiting the Wnt/β-catenin pathway. Int J Biol Macromol. 2024;261(1):129670.

  19. Li L, Qiu Z, Bai X, Zhu W, Ali I, Ma C, Zheng Z, Qiao X. Integrated mechanism of immune response modulation by Arctium lappa L. fructans based on microbiome and metabolomics technologies. J Agric Food Chem. 2024;72(19):10981–94.

  20. Singh AK, Singla RK, Pandey AK. Chlorogenic acid: a dietary phenolic acid with promising pharmacotherapeutic potential. Curr Med Chem. 2023;30(34):3905–26.

  21. Susanti S, Iwasaki H, Itokazu Y, Nago M, Taira N, Saitoh S, Oku H. Tumor specific cytotoxicity of arctigenin isolated from herbal plant Arctium lappa L. J Nat Med. 2012;66(4):614–21.

  22. Sun Q, Liu K, Shen X, Jin W, Jiang L, Sheikh MS, Hu Y, Huang Y. Lappaol F, a novel anticancer agent isolated from plant Arctium lappa L. Mol Cancer Ther. 2014;13(1):49–59.

  23. Jiang YY, Yu J, Li YB, Wang L, Hu L, Zhang L, Zhou YH. Extraction and antioxidant activities of polysaccharides from roots of Arctium lappa L. Int J Biol Macromol. 2018;123:531–8.

  24. Chen Y, Su JY, Yang CY. Ultrasound-assisted aqueous extraction of chlorogenic acid and cynarin with the impact of inulin from burdock (Arctium lappa L.) roots. Antioxidants (Basel). 2022;11(7):1219.

  25. Xu L, Dou DQ, Wang B, Yang YY, Kang TG. Correlation between ITS sequences of Arctium lappa L. and the quality of medicinal herbs. China J Chinese Materia Medica. 2011;36(14):1931–5 (in Chinese).

  26. Geng GY. Identification of germplasm resources and development of high-efficiency cultivation technology in burdock (Arctium lappa L.). Master's thesis. Nanjing Agricultural University; 2019 (in Chinese).

  27. Li X, Liu X, Wei J, Li Y, Tigabu M, Zhao X. Development and transferability of EST-SSR markers for Pinus koraiensis from cold-stressed transcriptome through Illumina sequencing. Genes (Basel). 2020;11(5):500.

  28. Patil PG, Singh NV, Bohra A, Raghavendra KP, Mane R, Mundewadikar DM, Babu KD, Sharma J. Comprehensive characterization and validation of chromosome-specific highly polymorphic SSR markers from pomegranate (Punica granatum L.) cv. Tunisia genome. Front Plant Sci. 2021;12:645055.

  29. Liu N, Cheng FY, Guo X, Zhong Y. Development and application of microsatellite markers within transcription factors in flare tree peony (Paeonia rockii) based on next-generation and single-molecule long-read RNA-seq. J Integr Agr. 2021;20(7):1832–48.

  30. Yang Y, Li S, Xing Y, Zhang Z, Liu T, Ao W, Bao G, Zhan Z, Zhao R, Zhang T, Zhang D, Song Y, Bian C, Xu L, Kang T. The first high-quality chromosomal genome assembly of a medicinal and edible plant Arctium lappa. Mol Ecol Resour. 2022;22(4):1493–507.

  31. Bazzo BR, de Carvalho LM, Carazzolle MF, Pereira GAG, Colombo CA. Development of novel EST-SSR markers in the macaúba palm (Acrocomia aculeata) using transcriptome sequencing and cross-species transferability in Arecaceae species. BMC Plant Biol. 2018;18(1):276.

  32. Bharti R, Kumar S, Parekh MJ. Development of genomic simple sequence repeat (gSSR) markers in cumin and their application in diversity analyses and cross-transferability. Ind Crop Prod. 2018;111:158–64.

  33. Lu QX, Gao J, Wu JJ, Zhou X, Wu X, Li MD, Wei YK, Wang RH, Qi ZC, Li P. Development of 19 novel microsatellite markers of lily-of-the-valley (Convallaria, Asparagaceae) from transcriptome sequencing. Mol Biol Rep. 2020;47(4):3041–7.

  34. Biswas MK, Bagchi M, Biswas D, Harikrishna JA, Liu Y, Li C, Sheng O, Mayer C, Yi G, Deng G. Genome-wide novel genic microsatellite marker resource development and validation for genetic diversity and population structure analysis of banana. Genes (Basel). 2020;11(12):1479.

  35. Singh H, Deshmukh RK, Singh A, Singh AK, Gaikwad K, Sharma TR, Mohapatra T, Singh NK. Highly variable SSR markers suitable for rice genotyping using agarose gels. Mol Breeding. 2010;25(2):359–64.

  36. Bohra A, Jha R, Pandey G, Patil PG, Saxena RK, Singh IP, Singh D, Mishra RK, Mishra A, Singh F, Varshney RK, Singh NP. New hypervariable SSR markers for diversity analysis, hybrid purity testing and trait mapping in pigeonpea [Cajanus cajan (L.) Millspaugh]. Front Plant Sci. 2017;8:377.

  37. Cheng Y, Li P, Yang Y, Zhang J, Yan F, Wang H. Microsatellites for Phytolacca acinosa (Phytolaccaceae), a traditional medicinal herb. Appl Plant Sci. 2017;5(10):1700028.

  38. Guo Y, Zhai L, Long H, Chen N, Gao C, Ding Z, Jin B. Genetic diversity of Bletilla striata assessed by SCoT and IRAP marker. Hereditas. 2018;155:35.

  39. Niemandt M, Roodt-Wilding R, Tobutt KR, Bester C. Microsatellite marker applications in Cyclopia (Fabaceae) species. S Afr J Bot. 2018;116:52–60.

  40. Ouni R, Zborowska A, Sehic J, Choulak S, Hormaza J, Garkava-Gustavsson L, Mars M. Genetic diversity and structure of Tunisian local pear germplasm as revealed by SSR markers. Hortic Plant J. 2020;6(2):61–70.

  41. Fu YR, Liu FL, Li S, Tian DK, Dong L, Chen YC, Su Y. Genetic diversity of the wild Asian lotus (Nelumbo nucifera) from northern China. Hortic Plant J. 2021;7(5):488–500.

  42. Huang JJ, Liu YM, Han FQ, Fang ZY, Yang LM, Zhuang M, Zhang YY, Lv HH, Wang Y, Ji JL, Li ZS. Genetic diversity and population structure analysis of 161 broccoli cultivars based on SNP markers. Hortic Plant J. 2021;7(5):423–33.

  43. Xu YQ, Liu R, Xue JQ, Wang SL, Zhang XX. Genetic diversity and relatedness analysis of nine wild species of tree peony based on simple sequence repeats markers. Hortic Plant J. 2021;7(6):579–88.

  44. Samarina LS, Malyarovskaya VI, Reim S, Yakushina LG, Koninskaya NG, Klemeshova KV, Shkhalakhova RM, Matskiv AO, Shurkina ES, Gabueva TY, Slepchenko NA, Ryndin AV. Transferability of ISSR, SCoT and SSR markers for Chrysanthemum × morifolium Ramat and genetic relationships among commercial Russian cultivars. Plants (Basel). 2021;10(7):1302.

  45. Li P, Zhang F, Chen S, Jiang J, Wang H, Su J, Fang W, Guan Z, Chen F. Genetic diversity, population structure and association analysis in cut chrysanthemum (Chrysanthemum morifolium Ramat.). Mol Genet Genomics. 2016;291(3):1117–25.

  46. Mandel JR, Dechaine JM, Marek LF, Burke JM. Genetic diversity and population structure in cultivated sunflower and a comparison to its wild progenitor, Helianthus annuus L. Theor Appl Genet. 2011;123(5):693–704.

  47. Lavudya S, Thiyagarajan K, Ramasamy S, Sankarasubramanian H, Muniyandi S, Bellie A, Kumar S, Dhanapal S. Assessing population structure and morpho-molecular characterization of sunflower (Helianthus annuus L) for elite germplasm identification. PeerJ. 2024;12:e18205.

  48. Abdel-Ghany SE, Hamilton M, Jacobi JL, Ngam P, Devitt N, Schilkey F, Ben-Hur A, Reddy AS. A survey of the sorghum transcriptome using single-molecule long reads. Nat Commun. 2016;7:11706.

  49. Sun L, Luo H, Bu D, Zhao G, Yu K, Zhang C, Liu Y, Chen R, Zhao Y. Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts. Nucleic Acids Res. 2013;41(17):e166.

  50. Zheng Y, Jiao C, Sun H, Rosli HG, Pombo MA, Zhang P, Banf M, Dai X, Martin GB, Giovannoni JJ, Zhao PX, Rhee SY, Fei Z. iTAK: a program for genome-wide prediction and classification of plant transcription factors, transcriptional regulators, and protein kinases. Mol Plant. 2016;9(12):1667–70.

  51. Doyle JJ, Doyle JL. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem Bull. 1987;19(1):11–5.

  52. Chen C, Wu Y, Li J, Wang X, Zeng Z, Xu J, Liu Y, Feng J, Chen H, He Y, Xia R. TBtools-II: A “one for all, all for one” bioinformatics platform for biological big-data mining. Mol Plant. 2023;16(11):1733–42.

  53. Peakall R, Smouse PE. GenAlEx 6.5: genetic analysis in Excel. Population genetic software for teaching and research-an update. Bioinformatics. 2012;28(19):2537–9.

  54. Subramanian B, Gao S, Lercher MJ, Hu S, Chen WH. Evolview v3: a webserver for visualization, annotation, and management of phylogenetic trees. Nucleic Acids Res. 2019;47(W1):W270–5.

  55. Evanno G, Regnaut S, Goudet J. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol Ecol. 2005;14(8):2611–20.

Acknowledgements

We gratefully acknowledge Mr. Qing (The Germplasm Bank of Wild Species) for his valuable support in maintaining the wild plant materials for this study. We also thank the editor and reviewers for their work and for their valuable comments and suggestions.

Clinical trial study

Not applicable.

Funding

This work was supported by the Scientific Research Fund of Xuzhou Academy of Agricultural Sciences (XM2021007) and Species Variety Resource Conservation Project of MOAR (2023DUS11).

Author information

Authors and Affiliations

Contributions

Y.S., J.F., H.X., and Z.H. performed the experiments. Y.L., Y.L., X.Z. and Y.L. performed statistical analyses. Y.S. and J.L. drafted the manuscript. Y.S. and Y.L. contributed to the experimental design and the editing of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Yaju Liu.

Ethics declarations

Ethics approval and consent to participate

All research on the plant materials detailed in this manuscript complies with the IUCN Policy Statement on Research Involving Species at Risk of Extinction and the Convention on the Trade in Endangered Species of Wild Fauna and Flora. All plant materials collected in this study are conserved at the Jiangsu Xuhuai Regional Agricultural Academy of Sciences, China, and the seeds are freely available for scientific research. This study, including the handling of these plants and the collection of samples, was supported by the Scientific Research Fund of Xuzhou Academy of Agricultural Sciences (XM2021007) and Species Variety Resource Conservation Project of MOAR (2023DUS11).

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

12870_2025_6203_MOESM1_ESM.docx

Supplementary Material 1:

Fig. S1 KEGG metabolic pathway classification map of the gene function annotation of 'Yanagikawa Ideal'. Note: The numbers on the bars give the number of annotated genes; the other axis shows the codes of the Level 1 functional classes in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, and each code is explained in the legend.

Fig. S2 Transcription factor analysis of A. lappa cultivar 'Yanagikawa Ideal'.

Fig. S3 Venn diagram of the lncRNA prediction results for A. lappa cultivar 'Yanagikawa Ideal'. Note: The number of noncoding transcripts predicted by each software tool is drawn as a Venn diagram to show which transcripts are common to the methods and which are unique to one method. To ensure the accuracy of the predictions, the transcripts shared by the predictions of all tools were ultimately selected for subsequent analysis.

Fig. S4 Results of gene structure analysis in A. lappa cultivar 'Yanagikawa Ideal'. Note: The tracks of the Circos plot, from the outside to the inside, are: 1. chromosome sequence; 2. alternative splicing sites (stacked bar chart in which the types of alternative splicing are shown in different colors: light blue is RI, green is A3, yellow is A5, purple is SE, red is MX, brown is AF, dark blue is AL); 3. APA sites; 4. map of new transcripts (the closer the color is to red, the greater the density); 5. distribution map of new genes (the closer the color is to red, the greater the density); 6. lncRNA density distribution; 7. fusion genes (purple lines link fused genes on the same chromosome, yellow lines link fused genes on different chromosomes). Fusion genes are chimeric genes in which the coding regions of two or more genes are linked together and placed under the control of the same set of regulatory sequences (including promoters, enhancers, ribosome binding sequences, terminators, etc.).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Su, Y., Fu, J., Xie, H. et al. SSR markers development and their application in genetic diversity of burdock (Arctium lappa L.) germplasm. BMC Plant Biol 25, 196 (2025). https://doi.org/10.1186/s12870-025-06203-8

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s12870-025-06203-8

Keywords