Plastome structure, phylogenomics and evolution of plastid genes in Swertia (Gentianaceae) in the Qing-Tibetan Plateau

Background The genus Swertia is of great medicinal importance and one of the most taxonomically challenging taxa within Gentianaceae, largely due to the morphological similarities of species within this genus and with its closely related genera. Previous molecular studies confirmed its polyphyly but suffered from low phylogenetic resolutions because only limited sequence loci were used. Thus, we conducted the structural, gene evolutionary, and phylogenetic analyses of 11 newly obtained plastomes of Swertia. Our result greatly improved the phylogenetic resolutions in Swertia, shed new light on the plastome evolution and phylogenetic relationships of this genus. Results The 11 Swertia plastomes together with the published seven species proved highly similar in overall size, structure, gene order, and content, but revealed some structural variations caused by the expansion and contraction of the IRb region into the LSC region, due to the heterogeneous length of the ψycf1. The gene rps16 was found to be in a state flux with pseudogenes or completely lost. Similar situation was also documented in other genera of Gentianaceae. This might imply loss of the gene in the common ancestor of Gentianaceae. The distribution plot of ENC vs. GC3 showed all these plastomes arranging very close in the Wright line with an expected ENC value (49–52%), suggesting the codon usage of Swertia was mainly constrained by a GC mutation bias. Most of the genes remained under the purifying selection, however, the cemA was identified under positive selection, possibly reflecting an adaptive response to low CO2 atmospheric conditions during the Late Miocene. Our phylogenomic analyses, based on 74 protein-coding genes (CDS), supported the polyphyly of Swertia with its close allies in the subtribe Swertiinae, presumably due to recent rapid radiation. The topology inferred from our phylogenetic analyses partly supported the current taxonomic treatment. Finally, several highly variable loci were identified, which can be used in future phylogenetic studies and accurate identification of medicinal genuineness of Swertia. Conclusions Our study confirmed the polyphyly of Swertia and demonstrated the power of plastome phylogenomics in improvement of phylogenetic resolution, thus contributing to a better understanding of the evolutionary history of Swertia. Supplementary Information The online version contains supplementary material available at 10.1186/s12870-022-03577-x.

south-western China and the Himalaya region [1][2][3]. Ho described Swertia in China and classified them under seven sections [2]. Later, based on their previous studies, Ho and Liu published a worldwide classification of Swertia, comprising 168 species, under three subgenus, 11 sections [1].
Swertia can be recognized with rotated flowers, one or two nectaries per petal lobe, and a variety of different appendages around the nectaries. However, this genus is one of the most taxonomically difficult genera within Gentianaceae, largely due to the morphological similarities (i.e., nectariferous and rotate corolla lobes) of the species in Swertia and with its allies, namely, Halenia, Lomatogonium, Lomatogoniopsis and veratrilla [1,[4][5][6][7]. Over the past decades, continues efforts have been devoted to molecular phylogeny studies of Swertia based on ITS and plastid sequence loci. It has become clear that Swertia was highly polyphyletic, intertwining with other genera in the subtribe Swertiinae on the inferred trees. Some clades were left unresolved and usually with weak node support, e.g., the Comastoma-Lomatogonium-sect. Platynema, Halenia-sect. Swertopsis, most likely due to only a few molecular markers were employed [4][5][6]8]. More recently, a study based on 76 protein-coding genes generated from plastomes, supported Swertia's polyphyly with greatly improved resolution when compared with those earlier molecular studies, however, only a few Swertia species were included as the representative of this genus [9].
Species of Swertia have long been used in traditional herbal medicine in Asian countries, including China, India, Korea, and Japan. The crude drug of Swertia is applied to treat a variety of diseases, including hepatic, choleric, inflammatory diseases, and bitter stomachic [10][11][12]. These herbaceous plants are rich in xanthones, flavonoids, iridoid and seco-iridoid glycosides, terpenoids, and alkaloids [13]. Despite the excellent medicinal value and abundant medicinal studies, the accurate identification of genuineness of Swertia species has been proved difficult and still lacks of DNA barcoding method [14]. To this aim, barcoding by a comparative plastome method of Swertia authentication is needed to assure good medicinal quality.
Here, we report newly sequenced 11 plastomes of Swertia. Our aims were to: (a) infer the plastome structural evolution and adaptation of Swertia; (b) identify the most variable regions as candidate DNA barcodes for future Swertia species identification; (c) test the phylogeny of Swertia in regard to its current taxonomic treatment.
The 20 Swertia plastomes contained 129-131 genes, composing of 84-86 protein-coding genes (CDS), 8 rRNAs and 37 tRNAs (Fig. 1, Table 1). Twelve CDSs and six tRNAs contained one intron. The clpP and ycf3 contained two introns ( Table 2). The intron of trnK-UUU included the coding sequence for the matK. Rps12 was identified as a trans-spliced gene, with the first exon (5′ end of the sequence) in the LSC region (rpl20_clpP), and the remaining two exons (3′ end of the sequence) in the IR region (rps7_trnV-GAC ). Ycf1 sequence was duplicated in all the 20 Swertia, with one complete copy (1,014-5,517 bp) located on the SSC-IRa border served as a functional gene, and one incomplete copy (920-1,008 bp) on the IRb-SSC border identified as a pseudogene. The same situation was also found in rps19, one complete copy (279 bp) located on the LSC-IRb border acted as a functional gene, while the incomplete copy (161-303 bp) on the Ira-LSC border identified as a pseudogene. ψycf1 and ψrps19 were resulted from incomplete duplications of their functional copes. Furthermore, rps16 was either identified as a pseudogene (in 16 species) or completely lost (in four species). Also, ψinfA were observed in 11 Swertia plastomes. In addition, ψaccD were found in S. cordata and S. mussotii, ψrpl33 in S. hispidicalyx, and Ψycf2 in S. wolfganiana (Fig. 1, Table 3).     The boundaries between IR and SSC/LSC regions in the 20 plastomes were compared (Fig. 2). The LSC/ IRb junction was found only within the rps19. The Ira/ SSC junction was found within the ycf1. The boundary between IRb and LSC slightly varied: 12 species were within the ndhF gene, and seven species within the overlapping region of the ψycf1 and ndhF, only S. tetraptera was identified within the ψycf1. The Ira/LSC border was either within ψrps19 or adjacent to the trnH-GUG .

Codon preference analysis
The total sequence sizes of the CDSs for codon analysis were 66,522-79,371 bp in the 20 Swertia plastomes, and they encoded 22,174-26,457 codons (Additional file 3, Table S3). Isoleucine (Ile) was encoded by the highest number of codons (1,901-2,223), whereas cysteine (Cys) was the least (261-308). The relative synonymous codon usage (RSCU) values of all codons in the form of a heatmap were shown in Fig Table 4, Additional file 4, Table S4). Further, the ENC vs. GC3 plot ( Fig. 4) showed the 20 Swertia plastomes clustered just below the Wright Curve [24,25], and ranging around the expected ENC of 50% GC3.

Characterization of repeat sequences and simple sequence repeats (SSR)
In our study, a total of 980 repeat sequences were detected in the 20 Swertia plastomes, including four categories: forward, reverse, complement, and palindromic. The numbers of the four repeated types were as follows: 18-23 in forward repeats, 3-11 in reverse, 0-2 in complement, and 17-24 in palindromic ( Fig. 5 a, Additional file 5, Table S5). The sizes of repeats ranged from 17 to 509 bp ( Fig. 5 b, Table 5, Additional file 5, Table S5). According to the length of repeat sequences, we artificially divided them into six categories: 17-26 bp, 27-36 bp, 37-46 bp, 47-56 bp, 57-66 bp, and > 66 bp. Among them, the majority of repeats (81%) were 17-26 bp long, while only one repeat was found > 66 bp (509 bp forward repeat) in S. cordata. Slightly more than half (53.4%) of the repeat sequences were located in intergenic spacer (IGS).
A total of 588 simple sequence repeats (SSRs) were identified across the 20 Swertia plastomes ( Table 6). The majority (94.4%) of these SSRs were monomers, particularly A/T, and their lengths were mostly in the 12-19 bp range. There were only 2.7% of dimers (AT and TA) and 2.9% of trimers (GAA, TTA, and ATT), no tetramers, hexamers, and pentamers were found.

Selective pressure in plastid genes
The site-specific model showed none of the 50 CDSs of the 35 Gentianaceae species was under strong positive selection. Though, the cemA (ω = 0.59623) and ycf1 (ω = 0.95807) were found under slightly positive selection. The ω ratios of the most genes were less than 0.2, indicating that they were under a strong purifying selection (Additional file 7, Table S7). The branch-specific model detected that the cemA was under positive selection in most Swertia clades along the phylogenetic tree ( Fig. 8, Additional file 10, Fig. S2), including Clade I (ω = 290), Clade III (ω = 35) and Clade IV (ω = 999), but not for Clade II (ω < 1). For the cemA, the two-ratio model (m2) was significantly favoured over the one-ratio model (m0, 2ΔL = 54.96; P < 0.05). However, for the ycf1, both two-ratio model (m2) and free-ratio model (m1) were refused in the LRT test, indicating the same ω ratio of the ycf1 for all branches in the phylogeny (Additional file 7, Table S7).

Plastome structural evolution and likely adaptation of Swertia
The plastome sequences newly obtained from the 11 Swertia species, plus previously sequenced eight species (Addition file 1) were very similar in the overall structure, length, gene order, and contents (Table 1 and Fig. 1). However, the missing in the rest four species (S. cincta, S. dichotoma, S. pubescens, S. hispidicalyx). The pseudogene rps16 was caused by loss of the second exon and part of the intron. Losses of plastid-encoded rps16 have been reported across a wide range of seed plants [15]. Among Gentianaceae with published plastome sequences, the loss of rps16 has been previously documented for members of variable genera [26][27][28][29][30]. We assumed that the loss of functionality of rps16 might be a synapomorphy shared in the whole family of Gentianaceae, implying an ancient loss in the ancestor of Gentianaceae. The gene loss was likely resulted from the intracellular nuclear-coded rps16 that transferred from mitochondria [31][32][33].
The InfA was identified to be pseudogene in 11 Swertia species. Among them, there was no start codon in 10 species (S. cordata, S. bifolia, S. leducii, S. franchetiana, S. mussotii, S. hispidicalyx, S. verticillifolia, S. souliei, S. wolfganiana, S. erythrosticta), but in S. macrosperma, a 59 bplong-sequence has been deleted. The infA was also found to be pseudogene in seven Gentiana species [34], a sister clade of Swertia in Gentianaceae. In an intensive comparative research on the infA, the entirely missing and/or pseudogene have been reported occurred across 309 angiosperms, suggesting this gene has been transferred to the nucleus many times in different seed plants [35].
Mutation pressure on DNA sequences and natural selection are the two major factors that have been widely accepted to account for codon usage bias [36]. Our analysis of the 20 Swertia plastomes revealed a codon bias having A or U at the third position (Additional file 3, Fig. 3). In the absence of selection pressures, this A/U bias has increased the RSCU values for A/U ending codons. The fact that chloroplast genomes had similar biases of codon usage suggested that chloroplasts, in general, might have particular characteristics of codon usage [37], our finding was consistent with previous reports from many other chloroplast genomes [38][39][40]. The distribution plot of ENC vs. GC3 for the 20 Swertia plastomes showed all the plastomes lying very close in the Wright line [24] with an expected ENC value (49-52%), suggesting the codon usage of Swertia was mainly constrained by a GC mutation bias but not natural selection (Fig. 4).
Long repeat sequences of plastomes have been reported to play major roles in genomic rearrangements and sequence variations [41,42]. However, our study showed no correlation between large repeats and rearrangement events. SSRs have been used widely in plant genomic and evolutionary studies because of their high amounts of variability within species (Table 6) [43,44]. Our study showed the most abundant SSRs were A/T rich mononucleotide, which was consistent with previous reports that SSRs usually consisted of polyA or polyT repeats and rarely contain G or C repeats [45,46].
Our selective pressure analyses of 50 common CDSs (> 300 bp) extracted from the 35 Gentianaceae plastomes only identified the cemA being positively selected. The cemA was found to encode an envelope membrane protein [47]. More recent researches revealed the disruption of this gene led to increased light sensitivity and impacted the CO 2 transportation system in plants [48,49]. Genes related to a specific environment are normally assumed to be under positive selection [48]. Previous studies suggested that the origin and speciation of Swertiinae occurred within 10 Ma (the Late Miocene), and Swertia within 4 Ma (the Pliocene to the Quaternary), possibly triggered by the uplift of the Tibetan Plateau and the climate change associated [6,50,51]. During Mid to Late Miocene, vast areas on earth had experienced a climate cooling transition [52,53], its driven factor, as believed by many researchers, was the decline in atmospheric CO 2 levels, which subsequently led to a transition of the modern ecosystem, e.g., the global radiation of C4 plants [54][55][56][57][58]. Therefore, the positive selection detected in cemA might indicate it occurred in the adaptation and speciation processes in Swertia species during the Late Miocene as a response to low level of atmospheric CO 2. Interestingly, more positive selection was detected in species from higher-elevation regions in the Qing-Tibetan Plateau region of China (Clade IV) but no positive selection was found in species from lower-elevation regions (the eastern Qing-Tibetan Plateau and the south western China) (Clade II) (Additional file 10, Fig. S2), suggesting an adaptive response to the low atmospheric CO 2 environment of the Qinghai-Tibetan Plateau region. However, more Swertia species are needed to be included to test this hypothesis whether positive selection detected on cemA as a beneficial result in the evolution of Swertia genus. In addition, further studies are needed to uncover whether changes in selection pressure are associated with specific biochemical pathways or functions of the cemA in the adaptation and divergence of Swertia.

Phylogenetic analysis
Our plastome phylogenetic study did not produce any surprises topologically: the relationship inferred from the 74 common CDSs were consistent with clades resolved in previously published phylogenetic trees, Swertia was polyphyletic with its closely-related allies, namely Veratrilla, Halenia, Comastoma, Lomatogonium, and Lomatogoniopsis [4-6, 8, 9]. However, our study has greatly improved the phylogenetic resolution of the inferred clades in comparison with these earlier published trees, with most of the nodes having 100% support. The 20 Swertia species divided into four well-supported clades in the phylogenetic tree. Our molecular phylogeny study agreed with Ho TN taxonomic classification in sect. Swertopsis and sect. Platynema [1], besides, a wide range of incongruences between morphological and molecular data was observed (Fig. 8).
Clade I comprised five species: two species belonging to subgen. Poephila (sect. Poephila and sect. Macranthos), paralleling three species of sect. Platynema, Subgen. Ophilia. Clade I was the sister clade to Clade II consisted of six species of sect. Swertopsis, subgen. Ophilia, forming a well-supported monophyletic clade. However, S. leducii, too, belonging to the sect. Swertopsis, was separated from Clade II and positioned in a larger group consisted of Lomatogonipsis-Logmatogonium-Comastoma, Clade I, and Clade II.
Clade III comprised two Halenia species, S. dichotoma of sect. Japonicae, subgen. Ophelia, and S. tetraptera. These four species clustered together with a notably higher solutions in internodes (BS 98-100%, 1.00 PP) than in previously published trees [6,8], supported their close relationship. The taxonomic treatment of S. tetraptera has been in constant debates: Grisebach and Ma segregated it from Swertia as genus Anagallidium [62,63]. Ho, based on morphology, pollination mode, and molecular evidence, segregated this species from Swertia and elevated it to a separate genus as Sinoswertia, but was not used in the recent account of the genus by Flora of China [2,64]. Our phylogenetic study supported S. tetraptera' close relationship with S. dichotoma (sect. Japonicae). Furthermore, these two Swertia species are more closely related to Halenia than with any other Swertia species. The genus Halenia is distributed in Asian mountains but with its diversification in south America [4,5,64], yet we only included two QTP-distributed Halenia species in this study, further studies are needed to discriminate the relationship between Swertia and Halenia. Lastly, clade IV consisted of two paralleled groups: in one group, five species belonging to two different sections (sect. Apterae and sect. Apterae) of subgen. Swertia clustered together, being sister with S. cordata, a species from a different subgenus (sect. Ophelia, Subgen. Ophilia).
Our topology inferred from phylogenetic analyses included two clades that contained remarkably short terminal branches: S. kouitchensis-S. punicea-S. fracnetiana-S. mussotii in clade II and S. przewalskii-S. bifolia-S. souliei-S. wolfganiana in clade IV. This result indicated the emergence of these Swertia species might have occurred only recently, as confirmed by their extremely small genetic distance values, and also supported by our previous divergence time analysis based on rbcL and matK loci: most of the extant Swertia species were evolved during the Pliocene to the late Pleistocene period (4-0.1 Ma) [6]. Recent rapid radiations were detected in numerous studies, and can be explained by climate oscillations in the Quaternary [65][66][67][68][69]. Recently diverged species often have incomplete reproductive barriers and subsequently experience hybridization and/or introgression, especially where their habitats overlap [70]. Ongoing hybridization and introgression that accompanied recent rapid radiation have been detected in numerous studies, both plants and animals [71][72][73][74][75]. This could be the same scenario in Swertia and its allies in our study. Rapid radiations are among the most notorious phylogenetic problems, because retention and stochastic sorting of ancestral polymorphisms can yield incongruence between taxonomic circumscription and molecular data [76], as we observed in the case of Swertia and its closely related taxa, as well as in other rapidly diverged species [71][72][73][74][75]. To uncover the real evolutionary history of Swertia, we will use multi-locus molecular sequences that have evolved independently and population genomic approaches in our future study.

Conclusion
This work was the first to report full sequence data and features of the whole plastomes of 11 Swertia species. The comparison of these plastomes revealed high similarities in terms of the overall structure, long repeat sequence, SSR, and codon usage. However, expansion and contraction in the IRb-LSC region were detected, due to the heterogeneous length of the ψycf1. Our phylogenetic analyses based on 74 CDSs supported the polyphyly of Swertia, was consistent with previous molecular studies, but with significant improved solutions. Furthermore, our phylogenetic study yielded extremely short branches, indicating a recent rapid radiation, hypothetically triggered by the Quaternary climate fluctuations. Our result inferred from the molecular data only partly supported the current taxonomic treatment. Furthermore, the cemA was identified under positive selection in most of the Swertia species, possibly reflecting an adaptive response to low CO 2 atmospheric conditions during the Late Miocene. We proposed a set of 10 most variable loci (trnH-GUG_psbA, ndhC_trnV-UAC , petB intron, trnL-UAA intron, rps16, rps16_trnK-UUU , trnS-UCC_trnG-UCC , trnS-GGA_rps4, psbM_trnD-GUC , ccsA_ndhD) as markers for future phylogenetic and taxonomic studies of Swertia. Overall, our results demonstrated the power of plastome phylogenomic to improve phylogenetic resolution and contribute to a better understanding of plastid gene evolution in Swertia.

Sample collection, plastome sequencing, assembly, and annotation
We collected plant specimens in the Qing-Tibetan Plateau region (QTP) in China. Fresh leaves were collected in the field and dried in silica gel for the future DNA extraction. Total genomic DNA was extracted with the CTAB protocol [77]. Voucher specimens were deposited at the Herbarium of Northwest Institute of Plateau Biology, China Academy of Science (Xining, China) (Additional file 1, Table S1).
We sequenced the complete plastomes of the 11 Swertia species with an Illumina Hiseq 2500 sequencing system following instructions of DNA Nextera XT Sample Prep Kit (Illumina ™ ) at Novegene Co., Ltd. in Wuhan, China. Genomic DNA was fragmented randomly and then the required length of DNA fragments was obtained by electrophoresis. Adapters were ligated to DNA fragments followed by cluster preparation and sequencing. A paired-end library (150 × 2) was constructed with an insert size of 350 base pairs (bp), and then 150 bp paired reads were sequenced using the Illumina Hiseq 2500 sequencing system. The total number, length and quality of reads sequenced for each Swertia sample was analyzed with FastQC V. 0.11.9 [78]. Low-quality reads were filtered (> 50% of bases with a quality score ≤ 20 and ≥ 1% of missing bases) using SOAPnuke 2.1.0 [79]. High quality clean reads were assembled using NOVOPlasty v3.6 [80] with the default parameters. In order to validate the assembled plastome sequence error, the clean reads were mapped to the assembled plastomes using the Circlator V. 1.5.5 program [81]. The coverage depth of plastomes was calculated with the SAMtools V. 1.10 program [82]. Complete plastomes were annotated using the online program GeSeq [83] with two reference plastomes (Swertia mussotii, GenBank: KU641021 and Gentianopsis grandis, GenBank: NC_049879) using the default values to predict genes coding for proteins (CDS), transfer RNAs (tRNA), and ribosomal DNAs (rRNAs). The start and stop codon positions of the open reading frame for all the coding genes were manually checked and adjusted if necessary, with the Sequin program (NCBI). Organellar Genome DRAW [84] was employed to draw gene maps. The raw pair-end reads, the complete plastome sequence and gene annotation of the 11 newly assembled Swertia plastomes were submitted to the NCBI GenBank under BioProject No. PRJNA807092 (Additional file 1, Table S1).

Plastome features
For each plastome, the length and gene were identified; LSC, SSC, IRa, and IRb region were plotted with junction positions being compared; the Guanine-cytosine (GC) content was calculated for each region with Geneious R7 [85].

Codon usage
All CDSs for each Swertia plastome were extracted using Phylosuite v1.2.2 [86]. The amount of codon and relative synonymous codon usage (RSCU) ratio was calculated using Mega X [43]. The RSCU distribution was illustrated in the form of heatmaps using OriginPro2021 software (OriginLab Corporation, Northampton, MA, USA). The level of codon usage bias was determined by calculating ENC, GC1, GC2, GC3, and CAI with CodonW (http:// codonw. sourc eforge. net/). The ENC-GC3 (ENC vs. GC3) plot was carried out to examine whether the codon usage of a certain gene was affected by mutation or also by other factors such as natural selection. If the corresponding points were distributed around the expectation curve, it was possible to predict the mutation was the only factor affecting codon usage bias, as introduced by Wright [24] originally, and later improved by Liu [25]. The equation of the curve is as follows: pENC = 2.5-s + {29.5 / [s 2 + (1 − s) 2 ]} [24,25]. The ENC vs. GC3 plot was generated by ggplot2 in R v.3.6.3 (https:// www.r-proje ct. org/).

Characterization of repeat sequences and simple sequence repeats (SSR)
Forward, reverse, palindromic, and complementary repeats were identified by REPuter online program with the default settings, Maximum Computed Repeats 50 and Minimal Repeat Size 8 [87]. Simple sequence repeats (SSRs) were exploited using the MISA-web program [88] with the following parameters: ≥ 10 repeat units for mononucleotide SSRs, ≥ 6 for dinucleotide, and ≥ 3 for dinucleotide, tetranucleotide, pentanucleotide, and hexanucleotide SSRs. All overlapping repeats were removed from the final results.

Phylogenetic analysis
Phylogenetic analyses of Swertia were performed using both maximum likelihood (ML) and Bayesian inference (BI) methods based on the 74 CDSs shared between the 20 plastomes of Swertia (include eight published species) and altogether 11 species in subtribe Swertiinae and four species in Gentianaceae, with Coffea arabica (Rubiaceae) as the outgroup (Additional file 2, Table S2) [9,[92][93][94][95][96][97][98][99]. The nucleotide sequences of the common 74 CDSs were extracted from each plastome (with pseudogenes excluded), concatenated, and aligned with Phylosuite v1.2.2 [90], using default settings. The best-fit models of nucleotide substitution, the GTR + I + G were selected by jModeltest v.2.1.7 [100] with the Akaike Information Criterion (AIC) [101]. The ML analyses were performed with RAXML v.8.2.12 [102]. Search for the best ML tree (the "-f a" option) was using the GTRGAMMAI substitution model (GTR + I + G) that was selected by jModeltest, then drawing support values using the rapid bootstrap (1000 replicates). The BI analysis was performed for 20 million generations in MrBayes v.3.2.6 [103], with two runs of two independent Markov chain Monte Carlo (MCMC) chains. Each chain started with a random tree, default priors, and sampling trees every 100 generations, and the first 25% generation was discarded as burn-in. Moreover, the evolution pairwise distance matrices were constructed with MEGA X based on the whole plastome sequences and 74 CDSs among the 20 Swertia species, using the Kimura 2-parameter model [43,104].

Selective pressure analysis
Selective pressures were analyzed for 50 common CDSs that are longer than 300 bp of 35 Gentianaceae species included in the phylogenetic analysis (see above). Firstly, the ratio (ω = dN/dS) of nonsynonymous (dN) to synonymous nucleotide (dS) substitution rates was calculated using the Codeml program in PAML4.9 with the site-specific model (seqtype = 1, model = 0, NSsites = 0, 1, 2, 3, 7, 8) [105,106]. The codon frequencies were set by the F3 × 4 model. The likelihood ratio test (LRT) was used to identify positive selected sites in comparisons of M0 vs. M3, M1 vs. M2, M7 vs. M8. The Bayes Empirical Bayes (BEB) method was used to identify codons under positive selection. BEB values higher than 0.95 indicate sites that are potentially under positive selection. Furthermore, for CDSs that was detected under positive selection from the site-specific model, the branch model was used to detect signatures of positive selection along specific lineages by using the three models (one-ratio, free-ratio and two-ratio model).
The one-ratio model (m0), assumes the same dN/dS ratio for all branches in the phylogeny. This model was compared to the free-ratio model (m1) that assumes an independent ω ratio for each branch. The one-ratio model (m0) was then compared to the two-ratio model (m2) that assumes the Swertia clades (set as the foreground branch) have ω ratios different from the other branches (set as the background branch). Similarly, the LRT was used to identify positive selected branches in comparisons of m0 vs. m1and m0 vs. m2 [105,106].