Complete chloroplast of four Sanicula taxa (Apiaceae) endemic to China: lights into genome structure, comparative analysis, and phylogenetic relationships

Background The genus Sanicula comprises ca. 45 taxa widely distributed from East Asia to North America, and it is a taxonomically difficult genus of Apiaceae with high medicinal value. The systematic classification of the genus has long been controversial because key morphological traits are variable. China is one of its most important distribution centers, with ca. 18 species and two varieties. Chloroplast genomes are generally considered to be conserved and play an important role in the study of evolutionary relationships. To investigate plastome evolution and phylogenetic relationships in Chinese Sanicula, we comprehensively analyzed the structural characteristics of 13 Chinese Sanicula chloroplast genomes and reconstructed their phylogenetic relationships.
Results In the present study, four newly sequenced complete chloroplast genomes of Sanicula taxa, obtained by Illumina sequencing, are reported. They have the typical quadripartite structure and are 155,396–155,757 bp in size. They encode 126 genes, including 86 protein-coding genes, 32 tRNA genes and 8 rRNA genes. Genome structure, gene content, and the distributions of SDRs and SSRs were similar among the Sanicula taxa. Nineteen intergenic spacer regions (atpH-atpI, ndhC-trnM, petB-petD, petD-rpoA, petN-psbM, psaJ-rpl33, rbcL-accD, rpoB-trnC, rps16-trnQ, trnE-psbD, trnF-ndhJ, trnH-psbA, trnN-ndhF, trnS-psbZ, trnS-trnR, trnT-trnF, trnV-rps12, ycf3-trnS and ycf4-cemA) and one coding region (the ycf1 gene) were the most variable. Maximum likelihood analysis based on 79 unique coding genes of 13 Chinese Sanicula samples, with two Eryngium (Apiaceae-Saniculoideae) species as outgroup taxa, revealed that the samples were divided into four subclades belonging to two clades, and that one subclade was consistent with a section of the traditional classification system of Sanicula. The current morphology-based classification of Sect. Sanicla and Sect. Tuberculatae in Chinese Sanicula was not supported by the cp genome phylogeny.
Conclusions The chloroplast genome structure of Sanicula was similar to that of other angiosperms, with the typical quadripartite structure and a conserved genome arrangement and gene features. However, genome size varied owing to expansion/contraction of the IR/SC boundaries. Non-coding regions of the chloroplast genome were more variable than coding regions. Phylogenetic relationships among these Chinese Sanicula were inferred using the 79 unique coding genes. These results provide important data for future systematic, phylogenomic and evolutionary research on the genus.
Supplementary Information The online version contains supplementary material available at 10.1186/s12870-023-04447-w.


Introduction
Sanicula L. (Apiaceae-Saniculoideae) consists of ca. 45 taxa and is widely distributed from East Asia to North America [1,2]. China is one of its most important distribution centers, with ca. 18 species and two varieties [3–5]. It is known as a taxonomically complex genus, with variable morphological characters in rhizomes, leaves, inflorescences and fruits, and it is regarded as comparatively primitive within Subfam. Saniculoideae Burnett, itself a primitive group in Apiaceae [6–10]. Traditionally, based on features of the leaves, flowers and fruits, Shan & Constance [6] divided the world's Sanicula species into five sections, i.e. Tuberculatae, Pseudopetagnia, Sanicla, Sandwicenses and Sanicoria, and showed that the Chinese Sanicula taxa belong to the first three sections. This classification was accepted by many later authors [3,4,11].
Sanicula has consistently been viewed as a relatively natural genus within the family Apiaceae [12–14]. Molecular phylogenetic analyses have also suggested that the genus is monophyletic, although they included few Sanicula samples and relied on the nuclear ribosomal internal transcribed spacer (ITS) region and the chloroplast DNA (cpDNA) rpl16 intron, rpoC1 intron, and trnQ-rps16 and rps16-trnK intergenic spacers [8,13–17]. Later, a revised phylogeny of Apioideae and Saniculoideae in Apiaceae based on 90 whole plastome sequences, including only four Chinese Sanicula species, suggested that sectional relationships in Sanicula differ from the traditional classification system [2].
Furthermore, recent field observations in eastern, southern, and western China showed that the interspecific relationships of some groups in the genus are extremely perplexing [9,10], as also noted by many authors, including Chen et al. [14,17], Shan & Constance [6], and Yang et al. [2]. In addition, numerous species and varieties of Sanicula are poorly defined because of a lack of field studies and of consistent diagnostic characters in the literature [2,6,14,17]. Therefore, further exploration of more stable genetic variation and effective markers is critical for utilizing and protecting Sanicula plants.
Chloroplast (cp) genomes are highly conserved owing to their replication mechanism and uniparental inheritance, yet they retain a relatively high level of genetic variation resulting from low selective pressure [18]. In addition, with sequencing costs lowered by the rapid development of Illumina sequencing and assembly technologies, whole cp genomes have been more successful than individual fragments in resolving relationships among species at different taxonomic levels [3,19].
The plastomes of seven Chinese Sanicula species were reported previously, including S. astrantiifolia H. Wolff ex Kretschmer, S. chinensis Bunge, S. flavovirens Z. H. Chen, D. D. Ma & W. Y. Xie, S. giraldii R. H. Shan & S. L. Liou, S. lamelligera Hance, S. orthacantha S. Moore and S. rubriflora F. Schmidt [2,17,20,21]. However, the samples of S. chinensis and S. orthacantha used in a previous study were likely mixed up, as they share the same voucher information. Additionally, few Sanicula species have been included in molecular studies [2,22], and even fewer effective markers have been found for resolving their inter- and intraspecific relationships.
The aims of this study were to (1) determine the whole plastome sequences of 13 Chinese Sanicula taxa, including the four newly sequenced taxa; (2) compare the global structural patterns of the available Chinese Sanicula cp genomes; (3) examine variation in SSRs and repeat sequences among the 13 Sanicula cp genomes; and (4) reconstruct the phylogeny of Chinese Sanicula taxa and improve the understanding of their relationships and evolution.

Chloroplast genome structures of four taxa in Chinese Sanicula L. and one species in Eryngium L.
All five new cp genomes (Table 1) were similar to those of other Sanicula species and other genera in Apiaceae [17]. The four new Sanicula cp genomes ranged in size from 155,396 bp in S. orthacantha var. brevispina to 155,757 bp in S. caerulescens and exhibited a typical quadripartite structure, with two single-copy regions (LSC and SSC) separated by a pair of inverted repeats (IRa and IRb) (Fig. 1). The large single-copy (LSC) region ranged from 85,818 bp (S. orthacantha var. brevispina) to 86,209 bp (S. caerulescens), the small single-copy (SSC) region from 17,089 bp (S. hacquetiodes) to 17,106 bp (S. tienmuensis), and the IR regions from 26,225 bp (S. caerulescens) to 26,332 bp (S. hacquetiodes) (Table 1). The overall GC content ranged from 38.16% (S. caerulescens) to 38.21% (S. hacquetiodes).
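Because a quadripartite plastome consists of one LSC, one SSC and two IR copies, the reported lengths can be cross-checked with simple arithmetic. The sketch below is illustrative only: it takes the S. caerulescens values quoted above and derives the SSC length they imply, which is not stated for this taxon in the text. The function name is hypothetical.

```python
def implied_ssc_length(total_bp: int, lsc_bp: int, ir_bp: int) -> int:
    """Return the SSC length implied by total = LSC + SSC + 2 * IR."""
    return total_bp - lsc_bp - 2 * ir_bp


# Values quoted in the text for S. caerulescens (total, LSC, one IR copy).
total_bp, lsc_bp, ir_bp = 155_757, 86_209, 26_225

# Derived value (an inference from the quoted numbers, not a reported figure).
ssc_bp = implied_ssc_length(total_bp, lsc_bp, ir_bp)
print(ssc_bp)
```

The derived value falls inside the SSC range reported for the four new genomes (17,089 to 17,106 bp), so the quoted lengths are mutually consistent.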
All four newly sequenced Sanicula cp genomes encoded 103 unique genes, including 79 unique protein-coding genes (PCGs), 20 unique tRNA genes and four unique rRNA genes; 23 of these were duplicated, giving a total of 126 genes (Table 1, in which * marks the chloroplast genomes newly reported in this study). Thirteen genes contained one intron (atpF, ndhA, ndhB, petB, rpl16, rpl2, rpoC1, rps16, trnA-UGC, trnI-GAU) or two introns (clpP1, rps12, ycf3), and two of these were tRNA genes (Table 2, Fig. 1). Coding regions made up 55.92% to 56.07% of the cp genome, and non-coding regions, including both intergenic spacers and introns, made up 43.93% to 44.08% (Table 2). The genes were divided into four categories: photosynthesis, self-replication, other genes, and genes of unknown function (Table 2).

Repeat structures and simple sequence repeats
The characteristics of Simple sequence repeats (SSRs) in four newly sequenced Sanicula cp genomes (S.

Statistics of codon usage
According to the codon usage analysis, the total length of the PCGs was 67,857–67,863 bp in the four newly sequenced Sanicula genomes, encoding 22,673–22,690 codons (Additional File 4: Table S4). Leucine was encoded by the largest number of codons (2382 to 2390), followed by isoleucine (1909 to 1918), and cysteine by the fewest (237–239). The relative synonymous codon usage (RSCU) values varied slightly among the four newly sequenced Sanicula genomes (Fig. 7). Thirty-two codons were used frequently (RSCU ≥ 1) and 34 codons were used less frequently (RSCU < 1). AUG showed a preference in all four cp genomes. The codon UGG, encoding tryptophan (Trp), showed no bias (RSCU = 1).
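For readers unfamiliar with RSCU, the value for a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The following is a minimal sketch of that calculation, assuming the standard genetic code and a DNA alphabet (T in place of U); the function name and the toy sequence are hypothetical and are not the pipeline used in this study.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds: str) -> dict:
    """Return RSCU per sense codon for an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    # Group sense codons into synonymous families (stop codons excluded).
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined, so skip it
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


# Toy sequence: Met, Trp, then Leu encoded three times by CTT and once by CTG.
toy_values = rscu("ATGTGGCTTCTTCTTCTG")
```

Single-codon amino acids such as Met (AUG) and Trp (UGG) always have RSCU = 1 under this definition, because the observed and expected counts coincide.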

Phylogenetic analysis
Three datasets were constructed to investigate the phylogenetic relationships among the 13 Sanicula taxa, with Eryngium planum L. and E. foetidum as outgroup taxa: the whole cp genome sequences (Additional File 5: Fig. S1A), a concatenation of 126 unique IGS regions (Additional File 5: Fig. S1B), and a concatenation of the 79 unique PCGs (Fig. 8). Three phylogenetic trees were built from the respective datasets using the maximum likelihood (ML) method, and they were highly concordant with one another. Therefore, only the ML topology based on the concatenated 79 unique PCGs is shown here. This dataset was also analyzed using Bayesian inference (BI) and maximum parsimony (MP), which yielded only slight differences, and the ML/MP/BI support values [bootstrap support (bs) / bs / posterior probability (pp)] are given at each node (Fig. 8).
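Concatenated datasets of this kind are usually assembled by joining per-locus alignments end to end, one row per taxon. The sketch below illustrates the idea under stated assumptions: each locus is already aligned, a taxon missing from a locus is padded with gaps, and the function name, taxon labels and sequences are invented for illustration. It does not reproduce the software used in this study.

```python
def concatenate_alignments(alignments):
    """Join per-locus alignments into one supermatrix.

    alignments: list of dicts mapping taxon name -> aligned sequence.
    Returns (supermatrix, partitions), where partitions holds the
    1-based inclusive column range of each locus.
    """
    taxa = sorted(set().union(*alignments))
    rows = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for index, locus in enumerate(alignments):
        lengths = {len(seq) for seq in locus.values()}
        if len(lengths) != 1:
            raise ValueError(f"locus {index} is not aligned (unequal lengths)")
        length = lengths.pop()
        for taxon in taxa:
            # Pad taxa that lack this locus with gap characters.
            rows[taxon].append(locus.get(taxon, "-" * length))
        partitions.append((start, start + length - 1))
        start += length
    supermatrix = {taxon: "".join(parts) for taxon, parts in rows.items()}
    return supermatrix, partitions


# Toy example with two loci and three hypothetical taxa.
locus_1 = {"taxon_A": "ATG", "taxon_B": "ATA"}
locus_2 = {"taxon_A": "CCGG", "taxon_C": "CCGA"}
matrix, parts = concatenate_alignments([locus_1, locus_2])
```

Keeping the partition boundaries alongside the supermatrix allows each locus to be given its own substitution model later, if the chosen tree-building program supports partitioned analysis.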

The chloroplast genomic features, sequence variation and the potential molecular markers in Sanicula
The genus Sanicula L. can be easily distinguished from other genera of Subfam. Saniculoideae in Apiaceae by its basal leaves, which are orbicular, rounded-cordate or cordate-pentagonal and usually palmately lobed, and by its polygamous flowers and umbels arranged in racemose, cymose or corymbose inflorescences. However, understanding the taxonomy and phylogenetic relationships of Sanicula has been particularly difficult because of its variable morphological characters in rhizomes, leaves, inflorescences and fruits. As a result, previously reported chloroplast genomes of certain Sanicula species that were associated with ambiguous or incorrect information and potential misidentifications, including those of S. astrantiifolia, S. chinensis, S. giraldii, S. lamelligera and S. orthacantha, were not included in our analysis. In this study, 13 Sanicula genomes (four newly sequenced, five re-annotated, and four previously reported), representing nine species and one variety, together with two Eryngium species, were used to clarify the phylogenetic relationships.
The structure, gene order and GC content were highly conserved and nearly identical among the Sanicula samples analyzed here, and were consistent with those of cp genomes of other genera of Apiaceae and other angiosperms [2,17,20,21,23–26]. The size of the 13 cp genomes varied from 155,335 bp (S. flavovirens; NC_061752 and OP703176) to 155,764 bp (S. lamelligera; OP703174) (Table 1). The Sanicula cp genomes sequenced here all contained a total of 126 genes (including 103 unique genes), with a total GC content of 38.16% or 38.25% (Table 1). However, different numbers of genes had been reported for different samples of some species; for example, S. flavovirens (NC_061752), S. orthacantha var. stolonifera (MT561028), S. rubriflora (MT528260) and S. rubriflora (NC_060324) were reported to contain 129, 133, 133 and 130 genes, respectively, whereas all were annotated here with 126 genes. To eliminate the influence of the references and annotation software used, the 13 samples were re-annotated using Plastid Genome Annotator (PGA) and Geneious Prime 2020.0.5 with Heteromorpha arborescens (NC_053554) as the reference, and their tRNA genes were verified with tRNAscan-SE. Unexpectedly, all 13 re-annotated sequences contained only 126 genes, and no gene loss was found in this study (Table 2).
Variation in cp genome length usually reflects expansion or contraction of the IR regions, which has been useful in evolutionary studies of some taxa [23–25,31–33]. However, our findings indicated only minor variation among the Sanicula cp genomes examined, with no significant expansions or contractions. Among the 13 Sanicula cp genomes, the IR region was longest in S. rubriflora (26,340 bp; MT528260) and shortest in S. flavovirens (26,217 bp; NC_061752). Only in S. rubriflora did the ndhF gene extend 34 bp into the IRb region; in the remaining 12 Sanicula samples it was located entirely within the SSC region. Likewise, only in S. flavovirens (NC_061752) was the rps19 gene contracted by 27 bp away from the IRb region. These results were similar to the IR boundary shifts reported in the cp genomes of other Apiaceae species [18,34].
Genome composition, including factors such as gene sequence length, tRNA abundance and GC distribution, together with natural selection, are the two major factors affecting codon usage bias [27,28,35–37]. A total of 63 codons encoding 20 amino acids were present across the Sanicula cp genomes, and codon usage was biased towards A or U at the third codon position, consistent with other Apiaceae taxa [2,23,29,30]. Many studies have shown that SSR variation in cp genomes is widely used in population genetics, species identification and studies of evolutionary relationships [26,34,38]. In this study, the characteristics of SSRs and short dispersed repeats (SDRs) were also similar among these Sanicula cp genomes. Our results suggested that mononucleotide (A/T) repeats were the most abundant repeat type and that the IR regions contained fewer SDRs and SSRs than the LSC and SSC regions, consistent with analyses of other Apiaceae taxa [2,18,34]. This indicated that the LSC and SSC regions possess a high level of nucleotide variability and could provide potential polymorphic molecular markers for identification, phylogeny, and evolutionary studies in Sanicula.
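To make concrete what counting SSRs by motif length involves, the sketch below scans a sequence for perfect tandem repeats with a regular expression. The minimum repeat numbers (10, 5, 4, 3, 3 and 3 for mono- to hexanucleotide motifs) are illustrative assumptions and are not taken from the text; the function name and toy sequence are likewise hypothetical, and this is not the tool used in this study.

```python
import re

# Assumed minimum repeat counts per motif length (illustrative only).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for perfect SSRs in seq.

    start is a 0-based position; motifs that are themselves repeats of a
    shorter unit (e.g. "AA" or "ATAT") are skipped to avoid double counting.
    """
    seq = seq.upper()
    hits = []
    for unit_len, minimum in sorted(min_repeats.items()):
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, minimum - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            is_compound = any(
                unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            )
            if is_compound:
                continue
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return hits


# Toy sequence containing one A(12) run and one (AT)6 run.
toy_seq = "GC" + "A" * 12 + "CGC" + "AT" * 6 + "GGC"
toy_hits = find_ssrs(toy_seq)
```

Tallying such hits separately for the LSC, SSC and IR coordinates of an annotated genome would give per-region counts of the kind compared in the paragraph above.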

Phylogenetic analysis
Owing to their conserved nature and mode of inheritance, cp genomes are effective for inferring phylogenetic relationships at various taxonomic levels [40]. In our phylogenetic analysis using the 79 unique coding genes, the monophyly and infrageneric classification of Chinese Sanicula were investigated. The status of S. chinensis, S. hacquetiodes, and S. giraldii was also re-evaluated.
The systematics of Chinese Sanicula has been discussed on the basis of molecular phylogenetic research [2,8,13,14,17,21,41] and morphological study [6,41]. The phylogenetic trees obtained here were consistent with those reported by Vargas et al. [41] using the nuclear ribosomal DNA internal transcribed spacer (ITS). For instance, S. lamelligera and S. orthacantha formed a monophyletic lineage within Sect. Pseudopetagnia, as proposed by Wolff [12]. However, the other two sections, Sect. Tuberculatae and Sect. Sanicla, defined by Drude [42] and de Candolle [43], and the relationships among species within these sections suggested by Shan & Constance [6], were not supported.
The samples in clade I had small involucellate bracteoles shorter than the umbellets, 1 to 3 fertile flowers per umbellule, and fruits that were tuberculate, prickly, lamellate or squamose [3,6]. Two monophyletic subclades, A and B, were resolved in this clade. Subclade A contained six taxa belonging to Sect. Pseudopetagnia, which is characterized by small involucellate bracteoles shorter than the umbellets, only one fertile flower per umbellule, and fruits that are squamose, lamellate or covered with straight spicules. Subclade B included two species, S. giraldii of Sect. Sanicla and S. hacquetiodes of Sect. Tuberculatae, and was nested within clade I with full support. Furthermore, Shan & Constance [6] noted a noteworthy relationship between S. hacquetiodes and Sect. Pseudopetagnia based on similar morphological characters, including the presence of generally one fertile flower and a tendency towards a subracemose inflorescence structure. However, the two species in subclade B lack a consistent morphological synapomorphy apart from small involucellate bracteoles shorter than the umbellets.
Species in clade II can be easily distinguished from taxa in clade I by involucellate bracteoles that are often longer than the umbellets at flowering and by often 3 or more fertile flowers per umbellule [3,6]. In this study, clade II showed that Sanicula chinensis of Sect. Sanicla was nested within Sect. Tuberculatae (including S. flavovirens [5] and S. rubriflora) and formed a well-supported sister group to S. flavovirens. A previous publication [2] suggested that S. chinensis and S. orthacantha formed a strongly supported sister group to S. lamelligera. However, upon critical examination, the sample of S. chinensis (MK208987) used in that paper [2] might have been misidentified. Thus, caution is advisable when using this sequence in future, as the reliability of results obtained from it remains debatable. Within clade II, two subclades, C and D, were fully supported. Morphologically, species belonging to subclade D had considerably longer involucellate bracteoles than those in subclade C. Subclade C encompassed two species formerly assigned to two sections (Sect. Tuberculatae and Sect. Sanicla), which can be easily distinguished by flower characteristics. Subclade D contained two samples of S. rubriflora, a species previously classified within Sect. Tuberculatae. Shan & Constance [6] suggested that S. rubriflora may represent an ancestral species within the genus Sanicula. Notably, S. rubriflora was closer to S. flavovirens in having numerous pedicellate staminate flowers and fruits that are tuberculate at the base with stout uncinate prickles above, rather than to S. chinensis, which has few staminate flowers with deciduous pedicels and fruits covered only with uncinate prickles. Thus, to better address the inconsistency between the cp genome phylogeny and the morphology-based classification, it is crucial to obtain additional samples from other species of Sanicula. In particular, species of Sect. Tuberculatae and Sect. Sanicla should be included to validate their placement within the Chinese Sanicula.
Taxonomic inconsistencies in the delimitation of taxa continue to pose a challenge within the genus Sanicula. For instance, S. orthacantha var. brevispina was treated as a synonym of S. orthacantha var. orthacantha by Shan & Constance [6] and Hiroe [7], but was reinstated as a distinct variety by Liou [11], Fu [44], Wang [45], Sheh & Phillippe [3] and Pimenov [4]. Additionally, S. orthacantha var. stolonifera has been recognized only by Sheh & Phillippe (2005), who published it. In a previous study [10], we found that S. orthacantha var. orthacantha differed from S. orthacantha var. brevispina only in its short rhizome and oblique rootstock bearing elongated, fibrous roots, sometimes fleshy and stoloniferous (vs. slender, elongate, stoloniferous and with lignified nodes), and that S. orthacantha var. stolonifera was a synonym of S. orthacantha var. brevispina. In this study, the results strongly supported the clustering of S. orthacantha var. brevispina with S. orthacantha var. stolonifera, while only weakly supporting the relationship between S. orthacantha var. orthacantha and S. orthacantha var. brevispina. Thus, our findings provide substantial support for the treatment proposed by Li et al. [10].

Conclusion
This study reports four newly sequenced complete cp genomes of Sanicula taxa, i.e. S. caerulescens, S. hacquetiodes, S. orthacantha var. brevispina and S. tienmuensis, together with analyses of SSRs, codon usage, IR boundaries and sequence divergence in comparison with nine other Chinese Sanicula samples. Insight into the interspecific relationships among the 11 Chinese Sanicula taxa (13 samples) verifies, to some degree, the traditional morphology-based system. These results will help to clarify the relationships and evolution of Sanicula at the molecular level and will benefit the identification, utilization, and protection of this medicinal genus.

Plant materials, DNA extraction and sequencing of the chloroplast genomes
Eight species and one variety of Sanicula L. and one species of Eryngium L. were collected in the field in China (Table 3). Fresh, healthy leaf tissues were collected in the field and stored in silica gel. Voucher specimens were deposited in the herbarium of the Institute of Botany, Jiangsu Province and Chinese Academy of Sciences (NAS), and their deposition numbers were listed in the

Fig. 1 A circular gene map of four newly sequenced Sanicula chloroplast genomes. Genes shown outside the circle are transcribed clockwise, and those inside the circle are transcribed counterclockwise. Genes are color-coded to distinguish different functional groups. The dark grey and light grey plots in the inner circle correspond to GC content and AT content, respectively

Fig. 2 A circular gene map of the newly sequenced chloroplast genome of Eryngium foetidum. Genes shown outside the circle are transcribed clockwise, and those inside the circle are transcribed counterclockwise. Genes are color-coded to distinguish different functional groups. The dark grey and light grey plots in the inner circle correspond to GC content and AT content, respectively

Fig. 4 Plots of percent sequence identity of the chloroplast genomes of 12 Sanicula taxa, with S. orthacantha var. stolonifera (NCBI accession no. MT561028) as the reference

Fig. 5 The nucleotide diversity of the whole chloroplast genomes of the 13 Sanicula taxa. LSC indicates the large single-copy region, IR the inverted repeat region, and SSC the small single-copy region

Fig. 8 Phylogenetic tree of 13 Sanicula samples and two Eryngium species (as references) resulting from ML, MP and BI analyses based on the concatenation of 79 unique coding genes from the complete cp genomes. Bootstrap support values and posterior probability values are displayed on the branches in the order ML/MP/BI, and values less than 50/50/0.5 are not shown

Table 1 Summary of chloroplast genome features in this study, including four new chloroplast genomes of the Sanicula taxa and one new chloroplast genome of Eryngium

Column headings: number of unique genes (PCGs, tRNAs, rRNAs); number of total genes (PCGs, tRNAs, rRNAs); reference
region of 26,161 bp (Fig. 2). The overall GC content was 38.13%. It contained 127 genes, including 86 PCGs, 33 tRNA genes and 8 rRNA genes (Table 1; (a)

Table 2 List of annotated genes in the chloroplast genomes of four newly sequenced Sanicula taxa and one sample of Eryngium foetidum
Notes: Gene▲ indicates a gene found only in Eryngium foetidum; Gene* indicates a gene with one intron; Gene** indicates a gene with two introns; Gene(2) indicates that the number of repeat units is 2