Phylogenetic analysis and development of molecular markers for five medicinal Alpinia species based on complete plastome sequences

Background Alpinia species are widely used as medicinal herbs. To understand the taxonomic classification and plastome evolution of the medicinal Alpinia species and correctly identify medicinal products derived from Alpinia species, we systematically analyzed the plastome sequences from five Alpinia species. Four of the Alpinia species: Alpinia galanga (L.) Willd., Alpinia hainanensis K.Schum., Alpinia officinarum Hance, and Alpinia oxyphylla Miq., are listed in the Chinese pharmacopeia. The other one, Alpinia nigra (Gaertn.) Burtt, is well known for its medicinal values. Results The four Alpinia species: A. galanga, A. nigra, A. officinarum, and A. oxyphylla, were sequenced using the Next-generation sequencing technology. The plastomes were assembled using Novoplasty and annotated using CPGAVAS2. The sizes of the four plastomes range from 160,590 bp for A. galanga to 164,294 bp for A. nigra, and display a conserved quadripartite structure. Each of the plastomes encodes a total of 111 unique genes, including 79 protein-coding, 28 tRNA, and four rRNA genes. In addition, 293–296 SSRs were detected in the four plastomes, of which the majority are mononucleotides Adenine/Thymine and are found in the noncoding regions. The long repeat analysis shows all types of repeats are contained in the plastomes, of which palindromic repeats occur most frequently. The comparative genomic analyses revealed that the pair of the inverted repeats were less divergent than the single-copy region. Analysis of sequence divergence on protein-coding genes showed that two genes (accD and ycf1) had undergone positive selection. Phylogenetic analysis based on coding sequence of 77 shared plastome genes resolves the molecular phylogeny of 20 species from Zingiberaceae. In particular, molecular phylogeny of four sequenced Alpinia species (A. galanga, A. nigra, A. officinarum, and A. oxyphylla) based on the plastome and nuclear sequences showed congruency. 
Furthermore, a comparison of the four newly sequenced Alpinia plastomes and one previously reported Alpinia plastome (accession number: NC_048461) reveals 59 highly divergent intergenic spacer regions. We developed and validated two molecular markers, Alpp and Alpr, based on two regions: petN-psbM and psaJ-rpl33, respectively. The discrimination success rate was 100 % in validation experiments. Conclusions The results from this study will be invaluable for ensuring the effective and safe uses of Alpinia medicinal products and for the exploration of novel Alpinia species to improve human health. Supplementary Information The online version contains supplementary material available at 10.1186/s12870-021-03204-1.

plants and provide new probes for species identification [13,14]. Their comparatively conserved and well-defined genome structures allow the investigation of a wide range of crucial issues. Initially, genetic studies focused on characterizing individual plastid genomes, particularly their general features, such as genome size, gene content, and sequence repetition [15]. More recently, the role of plastomes in evolution and speciation has become evident, as demonstrated by sequence divergence, large inversions, differences in coding and intergenic regions, and evolutionary analyses [16,17].
To date, complete plastomes are available from more than 100 Zingiberaceae species, including four Alpinia species. Recently, a complete plastome of A. oxyphylla (NC_035895) was analyzed, and the plastome shared the highest sequence similarity of > 90 % to that of A. zerumbet [18]. Based on the single nucleotide polymorphism (SNP) matrix among 28 whole plastomes, including a plastome (NC_048461) of A. hainanensis and two plastomes (NC_035895, MK262729) of A. oxyphylla, a phylogenetic analysis showed that Alpinia and Amomum are closely related in the family Zingiberaceae [19]. Such results provided useful information to understand the Alpinia evolution. However, they have not focused on the species that are widely used for their medicinal values and there is no phylogenetic analysis using nuclear markers in Alpinia species.
In previous reports, phylogeny, biodiversity assessment within populations, and the authentication of Alpinia species have been studied using several molecular markers. Nuclear ribosomal DNA internal transcribed spacers (ITS) sequences have been used as markers to distinguish A. galanga from its adulterants (Zhao et al. 2001). Efficacy of DNA barcode internal transcribed spacer 2 (ITS2) was tested on species identification of Alpinia species from Peninsular Malaysia [20]. Also, the information of genetic relatedness was developed using seven plastid barcoding loci among wild Alpinia nigra (Gaertn.) B.L. Burtt populations [21].
Lately, chloroplast-derived DNA markers have been developed to authenticate medicinal plants. One example is the use of SNPs and insertion-deletion mutations (Indels) in the intergenic regions of the plastomes of Panax ginseng species [22,23]. However, there are no systematic studies to develop molecular markers for medicinal Alpinia species. Our short-term goal is to understand the taxonomic relationship of medicinal Alpinia species and develop molecular markers for their discrimination. Our long-term goal is to develop a method for ensuring the efficacy and safety of Alpinia medicinal products and to identify new Alpinia species for medicinal uses. In this study, we reported and compared the four complete plastome sequences of A. galanga, A. nigra, A. officinarum, and A. oxyphylla sampled from Guangxi, China. The phylogenetic relationships of medicinal Alpinia species were studied based on plastome sequences and single-copy nuclear genes. Molecular markers based on plastomes were further developed for the discrimination of the five Alpinia species and were validated successfully.

Features of the Alpinia species plastomes
The plastomes are circular structures of 160,590 bp (A. galanga), 164,294 bp (A. nigra), 162,140 bp (A. officinarum), and 161,394 bp (A. oxyphylla) long. The schematic representation of the plastomes is shown in Fig. 1 and Figures S1, S2 and S3, respectively. The four plastomes display the typical quadripartite characters and show a high degree of conservation in organization and structure. They consist of a Large Single-Copy (LSC) region (87,267 − 88,970 bp) and a Small Single-Copy (SSC) region (15, (Table S1), and are somewhat higher than those of the whole plastomes.

Sequence repetition in the Alpinia plastomes
Comparative analysis of sequence repetition between all four plastomes found that the overall distribution, types, and numbers of repeats are highly similar among the plastomes. Simple sequence repeats (SSRs) are sequences composed of repeats with motifs from 1 to 6 bp in length. They are widespread in plastomes and widely utilized for species identification, genetic linkage construction, and molecular breeding [24]. A total of 293–296 SSRs were found in the Alpinia plastomes (Table 2). The most abundant mononucleotide SSRs are polyadenine or polythymine repeat types. Interestingly, hexanucleotide SSRs were not found in the plastomes of A. galanga and A. oxyphylla but were detected in the other two Alpinia plastomes. Further analysis of the size and location of the different SSR units and comparison revealed that the composite SSR was variable among the four species, while the dinucleotide repeat of AT was conserved (Tables S10, S11, S12, S13 and S14).
Long repeat analyses of the four sequenced plastomes detected 45–49 dispersed repeats, comprising forward, reverse, complementary, and palindromic repeats (Table 3). Forward (direct) and palindromic (inverted) repeats were considerably more numerous than reverse and complementary repeats. Most of these repeats were 30 to 49 bp long and were located in intergenic spacer (IGS) regions (Tables S15, S16, S17 and S18). The dispersed repeats found within genes were mostly located in exons rather than introns. Such repeats can potentially facilitate structural rearrangements and generate variability among plastomes in a population [25].
The numbers of detected tandem repeats range from 28 in A. officinarum to 33 in A. oxyphylla. The copy numbers of these repeats range from 1.9 to 5.3 copies per tandem repeat, and the repeat sizes range from 30 to 158 bp per copy (Tables S19, S20, S21 and S22). The tandem repeats were found predominantly in the IGS regions.

Expansion of the IR regions in Alpinia plastomes
Variations in the sizes and boundaries of the single-copy and IR regions commonly reflect evolutionary events such as contraction and expansion in the plastome architecture [26]. We compared the IR and single-copy region boundaries among six species: one Zingiber species and five Alpinia species (the four sequenced in our research and A. hainanensis). Two previously sequenced A. oxyphylla genomes were also included in the analysis. Some divergences were identified among the plastomes of the four Alpinia species and Z. spectabile (Fig. 2). In particular, IR expansions were found at the LSC/IRa boundary of the four Alpinia species, which placed the complete rps19 gene within the IRs of these species.
In contrast, the rps19 gene is located in the LSC region of Z. spectabile. The distances between the border of rps19 and the IR/LSC junction were 13, 160, 119, 129, and 129 bp in the plastomes of Z. spectabile, A. galanga, A. nigra, A. officinarum, and A. oxyphylla, respectively. Another interesting observation is that the ycf1 gene is localized in the IRb region. The ycf1 gene sequence is considerably longer in the Alpinia species (3944 bp for A. nigra and 1428 bp for A. galanga) than in Z. spectabile (924 bp) (Fig. 2).

Hypervariable regions
We compared the plastome sequences of five Alpinia species (A. oxyphylla being represented by three accessions) and a Zingiber species to determine the overall variations among the Alpinia and Zingiber species. As shown in Fig. 3, the plastomes are highly conserved among these species. The IR regions were less divergent than the LSC and SSC regions. The coding regions were more conserved than the noncoding regions. However, the ndhA, petB, ycf1, and ycf2 genes showed a relatively high degree of sequence divergence. In contrast, the IGS regions were highly diverse, particularly in the following regions: rps16-trnQ, petN-psbM, psaC-ndhE, accD-psaI, psaJ-rpl33, matK-rps16, and psbH-petB (Fig. 3).
Hypervariable regions can be used to resolve phylogenies and to discriminate closely related plant species [27]. A pairwise comparison of intergenic spacer regions was conducted to identify divergence hotspot regions among the five Alpinia species using the Kimura 2-parameter (K2p) model. The average K2p distance ranged from 0.00 to 6.793 among 59 IGSs extracted from these species. Among them, the IGS regions psbE-petL, petN-psbM, accD-psaI, and petD-rpoA showed the largest distances of 6.79, 6.32, 5.51, and 5.27, respectively (Fig. 4, Table S23).
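To make the K2p comparison concrete, the following is a minimal sketch of the Kimura 2-parameter distance for one pair of aligned sequences. It is an illustration only, not the distmat implementation used in the study: the function name k2p_distance is hypothetical, gaps and ambiguous bases are simply skipped, and the value returned is in substitutions per site, so any rescaling needed to match a given program's output is left to the caller.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance (substitutions per site) for two aligned sequences.

    Hypothetical helper for illustration; columns containing gaps or
    ambiguous bases are ignored.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # proportion of transitional differences
    q = transversions / compared  # proportion of transversional differences
    # d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q))
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

For heavily diverged pairs the argument of the logarithm can become non-positive, in which case math.log raises a ValueError; a production script would need to handle such saturated comparisons explicitly.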

Phylogenomic analyses based on plastome data
The availability of more complete plastome sequences of Alpinia species allows us to conduct phylogenomic analyses with higher resolution in Zingiberaceae (Fig. 5). We performed a phylogenetic analysis using the Maximum likelihood (ML) method based on DNA sequences of 77 genes shared among 20 species from Zingiberaceae, including the four Alpinia species sequenced in the study (Table S24). The sister genus of Alpinia is Amomum with a Bootstrap score (BS) of 100. The species of Alpinia are distributed in two main clades. The first clade (BS: 100) is formed by A. galanga and A. nigra, both medicinal species. The second clade (BS: 100) contains most of the species sampled to date. These species are from the tropical and subtropical geographic regions and many species are of medicinal value. Two accessions of A. oxyphylla are clustered together (BS: 99), which are subsequently clustered together with A. officinarum (Fig. 5). Also, the phylogenetic positions of A. galanga and A. nigra were reported for the first time based on the plastomes. The bootstrap scores are high for all branches indicating the high degree of reliability of the phylogenetic tree.

Phylogenetic analysis based on nuclear markers
The low-coverage sequence data generated from this study allowed us to perform phylogenetic analysis using additional nuclear markers. We extracted nuclear genes from the sequence data using the Angiosperms-mega 353 gene set [28]. Of these genes, 352, 353, 353, and 352 had mapped reads, and the reads mapped to 173, 28, 93, and 59 genes were assembled into contigs for A. galanga, A. nigra, A. officinarum, and A. oxyphylla, respectively. Among these assembled contigs, only four genes (AT4G04780, AT3G53760, AT5G53800, AT1G06240) were shared among the four species. These four genes were used to construct a phylogenetic tree using the same method as that for the complete plastome sequences. The reconstructed ML tree with these four genes was well resolved overall, and two of the nodes were supported with bootstrap values of 75 and 69 % (Fig. 6). Among the four Alpinia species, A. galanga was sister to A. nigra, and A. officinarum was sister to A. oxyphylla. To test whether the relationships in the nuclear and plastome trees are consistent, a phylogenetic analysis of plastomes with the same taxon sampling as the nuclear tree was conducted. The result was consistent with the phylogenetic inferences obtained with nuclear markers (Fig. 6). This approach enabled us to further define the phylogenetic relationship between the four Alpinia species using nuclear genes.

Variation and evolutionary selection of protein-coding genes
Purifying/positive selection analyses of 77 protein-coding genes in the Alpinia plastomes showed that most genes exhibited ω values less than 0.5. Five genes (psbI, petN, psbM, petL, and psbT) had the lowest ω ratios, close to 0. In contrast, the ω values of ycf2, accD, rpl23, rps7, and ycf1 were greater than 1.00 (Table S25). The results showed that the genes accD and ycf1 were under positive selection. The likelihood ratio test identified three and five amino acid sites in accD and ycf1, respectively, that were positively selected (under posterior probability > (Table 4). These sites are also highly polymorphic in the two genes.
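As a reminder of how a likelihood ratio test of this kind is typically evaluated, the sketch below computes the test statistic 2(lnL_alt − lnL_null) and its p-value. It rests on two assumptions that are not stated in this section: that the nested models differ by two free parameters (so the statistic is compared with a chi-square distribution with 2 degrees of freedom), and that the log-likelihoods come from some external codon-model program. The function name and the example values are hypothetical.

```python
import math


def lrt_pvalue_df2(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by 2 parameters.

    Returns (statistic, p_value). For a chi-square distribution with
    2 degrees of freedom the survival function is exp(-x / 2), so no
    external statistics library is needed. Hypothetical helper.
    """
    statistic = 2.0 * (lnl_alt - lnl_null)
    # Numerical noise can make the statistic slightly negative; clamp it.
    statistic = max(statistic, 0.0)
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


# Hypothetical log-likelihoods, for illustration only.
stat, p = lrt_pvalue_df2(lnl_null=-1000.0, lnl_alt=-995.0)
```

A small p-value indicates only that the model allowing ω > 1 fits better; identifying the individual sites requires a separate posterior probability calculation.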

Molecular marker development based on Alpinia plastomes
To discriminate the five medicinal Alpinia species, we selected two hypervariable IGS regions, petN-psbM and psaJ-rpl33, to develop two DNA markers named Alpp and Alpr, respectively. The PCR primers used to amplify these two markers are shown in Table S26. PCR amplification of total DNAs from all five medicinal species samples resulted in products of the expected size (Fig. 7, Figure S12, Table S27). The DNA fragments were extracted from each band and then subjected to Sanger sequencing. The sequencing results were identical to the expected sequences (Figures S13 and S14). Marker Alpp, derived from the petN-psbM IGS region, has two specific SNP loci and one Indel locus. These three variable loci can be used to differentiate three of the five Alpinia species, but not A. officinarum from A. oxyphylla. The marker Alpr, derived from the psaJ-rpl33 IGS region, has two SNP loci and one Indel locus. When the SNP and Indel loci from both Alpp and Alpr are used, all five species can be differentiated successfully (Fig. 8). We also tested the new primers in silico on all ten available Alpinia plastomes obtained from NCBI and this study. These markers can discriminate all eight species based on the SNP and Indel loci from both Alpp and Alpr (Figures S15 and S16).
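The logic of combining two markers can be sketched as follows: for each aligned marker, collect the variable columns (SNPs and Indels), build a haplotype per species, and report any species that still share the same combined haplotype. This is a minimal illustration, not the procedure used in the study; the function names and the toy alignments (sp1 to sp4) are hypothetical and do not represent real Alpinia amplicons.

```python
def variable_sites(alignment):
    """Return 0-based columns where the aligned sequences differ.

    `alignment` maps a species name to an aligned sequence; a gap ('-')
    is treated as a character state, so Indels count as variable sites.
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")
    return [i for i in range(length) if len({s[i] for s in seqs}) > 1]


def haplotypes(alignment):
    """Map each species to the string of its states at the variable sites."""
    cols = variable_sites(alignment)
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def indistinguishable_groups(*alignments):
    """Return groups of species sharing one combined haplotype across all markers."""
    per_marker = [haplotypes(a) for a in alignments]
    groups = {}
    for name in alignments[0]:
        combined = tuple(h[name] for h in per_marker)
        groups.setdefault(combined, []).append(name)
    return [sorted(g) for g in groups.values() if len(g) > 1]


# Toy data: marker_a alone cannot separate sp3 from sp4; marker_b can.
marker_a = {"sp1": "ACGTA", "sp2": "ACGTT", "sp3": "AC-TA", "sp4": "AC-TA"}
marker_b = {"sp1": "GGC", "sp2": "GGC", "sp3": "GGC", "sp4": "GAC"}
```

With the toy data, the first marker leaves one unresolved pair, while the two markers together leave none, which mirrors the way Alpp and Alpr are used jointly.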

Discussion
Here, we studied five medicinal Alpinia species, Alpinia galanga, A. hainanensis, A. officinarum, A. oxyphylla, and A. nigra. We sequenced the plastomes of four of these five species. Three of them, belonging to Alpinia galanga, A. officinarum, and A. nigra, were reported for the first time. Two plastomes of A. oxyphylla were released during the study period. We carried out a detailed analysis of the genome features and performed phylogenetic analyses with plastid protein-coding sequences and nuclear markers. Lastly, we developed a set of two primer pairs that can distinguish these five medicinal species. Compared to the plastomes of previously published Alpinia species, all the plastomes presented in this study exhibited consistent genomic structure, gene order, and content, and there are no significant structural rearrangements, such as inversions or gene relocations (Fig. 1, Table S1). The size of the A. oxyphylla plastome (MK940824) in this study is almost identical to those of the other two reported plastomes: the three are 161,394 bp (MK940824), 161,410 bp (MK262729) [19], and 161,351 bp (NC_035895) [18]. We found that the most abundant mononucleotide SSRs are of polyadenine or polythymine repeat types in the four Alpinia species, consistent with those reported previously [29]. Plastomes are well-arranged, except for the expansion of the IR regions in the Alpinia species. Judged from comparative analysis with the plastome of Z. spectabile as a reference, the IR lengths of all the four Alpinia species plastomes were all increased to ≥ 160 kbp. One piece of evidence supporting this expansion is that the rps19 gene has moved to the IR regions. In the plastomes of other Alpinia species reported so far [18,19], the entire rps19 gene is also localized in the IR region, which is consistent with our findings. The analysis revealed that the four Alpinia species sequenced in our study have heteroplasmy sites in their plastomes. However, the positions of these detected heteroplasmic sites and the two developed molecular markers did not overlap.
Classifications and phylogenetic analysis among Zingiberaceae were previously reported based on morphological features and DNA sequences of the nuclear internal transcribed spacer (ITS) and plastid matK regions [1, 30–32]. We used four new plastome sequences to define the position of four Alpinia species in Zingiberaceae. The new accession of A. oxyphylla sequenced in this study (MK940824) was most closely related to the other A. oxyphylla plastome reported previously (NC_035895) [19]. To date, the phylogenetic inference of Alpinia species has mainly relied on plastid markers [18,19] and a few multi-copy nuclear ribosomal regions such as ITS [33]. Our phylogenetic analysis provides, for the first time, phylogenies of the four Alpinia species sequenced by us based on nuclear markers. In addition, phylogenetic analyses using plastome and nuclear sequences revealed identical phylogenetic relationships for the four Alpinia species.
Because of the lack of mobility, plants must deal with the challenge of abiotic stresses, such as soil salinity, drought, and extreme temperature. Many genes from plastomes, such as clpP [34], rbcL [35], matK [36], and ycf1 and ycf2 [37], have been positively selected. The positive selection of the plastome genes may serve as an adaptive evolution for adjusting to environmental changes. In the selective pressure analysis, five genes were positively selected, and their selection might reflect the adaptive evolution of these Alpinia species. The results are consistent with the reports that accD and ycf1 evolved under positive selection in the Zingiber plastomes [37]. Particular amino acids were identified to have been positively selected in two genes, accD and ycf1. The plastid accD is an essential gene required for leaf development [38], and ycf1 is crucial for plant viability [39]. In the current research, all four Alpinia species studied are distributed in tropical and subtropical areas. The high temperature and humidity of their living environment may be the reason for the positive selection of the accD and ycf1 genes.

Fig. 3 Sequence identity plot of the seven Alpinia plastomes with Zingiber spectabile as a reference by mVISTA. The species names are shown to the left. The grey arrows above the alignment indicate the transcription direction of genes. In the alignment box, the blue color box indicates protein-coding, the pink color box shows the conserved noncoding sequence, and the light green box indicates tRNAs and rRNAs. The x-axis represents the positions in the cp genome, and the y-axis represents the percent identity ranging from 50 to 100 %
One of our goals is to develop markers that can distinguish the five medicinal Alpinia species. DNA markers derived from the plastomes, including SNPs and InDels, have been widely used and are considered highly discriminatory for species identification in genera such as Panax and Cruciata [22,40]. So far, plastome-derived DNA markers have mostly been used for intraspecific diversity and phylogenetic analyses in Alpinia [20,21]. The most variable regions of the complete plastome can be used for DNA barcoding of closely related plant species [27]. Therefore, we developed specific markers for discriminating Alpinia species based on the plastomes' hypervariable regions. The hypervariable regions identified in our study, such as petN-psbM, psaC-ndhE, and accD-psaI, were similar to those reported previously [19]. We found two markers derived from the petN-psbM and psaJ-rpl33 IGS regions that successfully distinguished the five Alpinia species. The marker Alpp cannot discriminate between A. officinarum and A. oxyphylla, because they are more closely related to each other than to the other studied species. It has to be combined with the marker Alpr for successful discrimination of the five Alpinia species.
Only a handful of Alpinia plastomes are sequenced and available in databases. Because the genus includes more than 200 spp., the information on the phylogeny of the genus is still rather limited. The complete Alpinia plastome sequences provided in this study expanded the taxonomic sampling and allowed new hypotheses to be formulated about potential relationships among Alpinia taxa [41]. From this point forward, additional plastomes of Alpinia species should be sequenced, which will allow us to take a broad view of the evolutionary relationships and evolutionary processes of Alpinia species and lay the foundation for the further usage of these plants for the benefit of human lives. In this study, we developed molecular markers for the five Alpinia species that are of economic importance. With the identification of additional economically important Alpinia species, the same methodology can be used to identify their corresponding differentiating markers.

Conclusions
The complete plastomes of A. galanga, A. nigra, and A. officinarum are reported for the first time in this study. In addition, two molecular markers were developed from the hypervariable regions that can distinguish these five medicinal Alpinia species. The results obtained from these studies will contribute to our understanding of Alpinia classification, plastome evolution, and the discrimination of medicinal products derived from Alpinia species.

Tribes to which each species belongs were shown to the right side of the tree. Bootstrap values were calculated with 1000 replicates

Fig. 6 Phylogenetic trees based on common genes identified using HybPiper pipeline and the shared DNA sequences of 77 protein-coding genes in the plastomes for the same five species. The phylogenetic tree on the left panel was constructed with the sequences of 4 shared contigs for nuclear genes present in 4 Alpinia species found by the HybPiper pipeline using the maximum likelihood method implemented in Phylosuite. Oryza sativa L. was used as the outgroup. Bootstrap support scores were calculated from 1000 replicates. The phylogenetic tree on the right panel was constructed with the shared DNA sequences of 77 protein-coding genes in the plastomes of the same five species in the nuclear tree using the same methods in phylogenomic analysis

The samples were silica-dried and stored at the Herbarium of the Institute of Medicinal Plant Development (voucher numbers: Implad201910413, Implad201910414, Implad20180327, and Implad20180362). To develop molecular markers of Alpinia species, we collected fresh leaves of another group from Guangxi Medicinal Plant Garden in Nanning, Guangxi, China, and the ginger garden of South China Botanical Garden, China (113°36' E, 23°18' N, 510,650). All samples were collected with permission from the Garden authorities. Detailed information is shown in Table S27. A plant genomic DNA kit (Tiangen Biotech, Beijing, Co., Ltd.) was used to extract total DNAs. The purity of total DNA was evaluated using electrophoresis on 1.0 % agarose gels, and the concentration was measured using a Nanodrop spectrophotometer 2000 (Thermo Fisher Scientific Inc., Waltham, MA, USA). This study complies with relevant institutional, national, and international guidelines and legislation.

Plastome sequencing, assembly, and annotation
The sequencing libraries of total DNA from each species were prepared using the TruSeq DNA Sample Prep Kit (Illumina, Inc., San Diego, CA, USA) following the manufacturer's instructions. The total DNA was sheared into fragments approximately 500 bp long for paired-end library construction. The libraries were sequenced on an Illumina HiSeq 3000 platform (Illumina Inc., San Diego, CA, USA). After obtaining the paired-end reads (2 × 250 bp), we downloaded the plastid genomes from the GenBank database (https://www.ncbi.nlm.nih.gov/genome/organelle/). These plastome sequences were used to search against Illumina paired-end reads using BLASTn with an E-value cutoff of 1e-5. The filtered reads were considered plastome-related and used for the downstream genome assembly. SPAdes (v. 3.10.1) [42] and CLC Genomics Workbench (v. 7, QIAGEN, Aarhus, Denmark) were used for de novo assembly. A dot plot of the contigs against the reference genome was constructed and visualized to evaluate the assembly quality. The contigs were subjected to reassembly using the Seqman module of Lasergene (v. 11.0, Madison, Wisconsin). Only one contig was obtained for each of the Alpinia species.
The CpGAVAS2 web server [43] was used to annotate the four genomes. Cutoffs for the E-values of BLASTn and BLASTx were set to 1e-10. After the prefiltering step, the number of top hits for annotation included in the reference gene sets was set to 10. Manual corrections were performed to determine the positions of the start and stop codons and the intron/exon boundaries. Codon usage frequency and GC content (i.e., the relative content of guanines and cytosines) were calculated using custom scripts. The circular gene maps were drawn with the cpgview web server (http://www.herbalgenomics.org/cpgview/). The raw data and the annotated plastomes have been submitted to GenBank. The accession numbers of raw data were SRR9072115 (A. galanga), SRR9072120 (A. nigra), SRR9080445 (A.

Table 4 Likelihood ratio tests to identify positively selected sites within the accD and ycf1 genes across 21 Alpinia plastomes

Fig. 7 The gel electrophoresis results of the amplification of DNA barcodes using designed primers. Lane M is the DL1000 marker. The lanes from left to right correspond to products amplified from the first individual of A. galanga, A. hainanensis, A. nigra, A. officinarum, and A. oxyphylla by primers Alpp and Alpr, respectively. The original uncropped image is shown in Figure S12

Repeat sequence analysis
SSRs were detected using the MISA Perl Script (http://pgrc.ipk-gatersleben.de/misa/). The minimal numbers of repeat units were eight for mononucleotide repeats, four for di- and trinucleotide repeats, and three for tetra-, penta-, and hexanucleotide repeats. Long repeat sequences with a minimal length of 30 bp and a Hamming distance of 3 were predicted using REPuter [44]. Tandem repeat structures were scanned with Tandem Repeats Finder [45]. We set the parameters to 2 for matches and 7 for mismatches and Indels, and set the minimum alignment score and maximum period size to 50 and 500, respectively. The minimum repeat size was 30 bp, and the cutoff for similarities among the repeat units was 90 %. All of the identified repeat structures were verified manually. Nested or redundant repeats were removed.
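The SSR thresholds above can be illustrated with a small regular-expression scan. This is a simplified sketch under stated assumptions, not a reimplementation of MISA: the function name find_ssrs is hypothetical, only perfect repeats of A/C/G/T are reported, compound SSRs are not merged, and motifs that are themselves repeats of a shorter unit (for example "AA" or "ATAT") are skipped so that each SSR is reported once under its shortest motif.

```python
import re

# Minimum number of repeat units per motif length, as stated in the text:
# 8 for mono-, 4 for di- and tri-, 3 for tetra-, penta-, and hexanucleotides.
MIN_REPEATS = {1: 8, 2: 4, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq):
    """Return a sorted list of (start, motif, n_units) for perfect SSRs.

    `start` is 0-based. Hypothetical helper for illustration only.
    """
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A motif of unit_len bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs built from a shorter repeating unit.
            is_composite = any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            )
            if is_composite:
                continue
            n_units = len(match.group(0)) // unit_len
            hits.append((match.start(), motif, n_units))
    return sorted(hits)


# Toy sequence: a 10-bp poly-A run and five AT units; "GCGC" is too short to count.
example = "GC" + "A" * 10 + "GCGC" + "AT" * 5 + "CCG"
```

Running find_ssrs(example) reports the poly-A run and the AT repeat, while the two-unit GC stretch falls below the dinucleotide threshold.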

Identification of nuclear markers for phylogenetic analysis
To explore the phylogenetic relationship implied by single-copy nuclear markers, we used HybPiper v1.2 [50] to identify nuclear markers among the Angiosperms-mega 353 gene set [28] from our sequencing reads for the four Alpinia species and then used them for phylogenetic analysis. The command line is "./reads_first.py -b mega353.fasta -r sample_001.fastq sample_002.fastq --prefix sample_result --bwa". The HybPiper package contains an internal reference set of 353 genes. It can identify genes from high-throughput sequencing results that are homologous to these 353 genes and extract them for phylogenetic analysis. In particular, the expanded Angiosperms353 target file [28], which is a drop-in replacement for the original Angiosperms353 file [51] in the HybPiper analyses, was used to capture loci in our sequence reads. We identified the potential genes for phylogenetic analysis as follows. Firstly, we used the retrieval script in HybPiper to identify contigs matching each probe (https://github.com/mossmatters/HybPiper). This was done using the reads_first.py script. Secondly, the common genes among the four species were selected. Finally, the contigs of these genes were used to create a phylogeny. Briefly, a phylogenetic tree was constructed with the contigs of these nuclear genes as above by IQ-TREE v1.6.10 [48] with 1000 non-parametric bootstrap replications, except that the best-fit model was HKY + F and the outgroup was Oryza sativa L. The phylogenetic analyses based on these nuclear sequences were conducted for the same taxa as those based on the shared coding sequences of 77 protein-coding genes in the plastomes.
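The second step, selecting the genes common to all samples, amounts to a set intersection. The sketch below assumes that the list of genes with assembled contigs has already been collected for each sample (parsing of the HybPiper output is not shown); the function name and the gene identifiers in the example are hypothetical placeholders.

```python
def shared_genes(genes_by_sample):
    """Return the sorted list of genes recovered in every sample.

    `genes_by_sample` maps a sample name to an iterable of gene IDs for
    which contigs were assembled. Hypothetical helper for illustration.
    """
    gene_sets = [set(genes) for genes in genes_by_sample.values()]
    if not gene_sets:
        return []
    return sorted(set.intersection(*gene_sets))


# Placeholder gene IDs, not real recovery results.
recovered = {
    "sample_1": ["g1", "g2", "g3"],
    "sample_2": ["g2", "g3", "g4"],
    "sample_3": ["g3", "g2"],
}
common = shared_genes(recovered)
```

Because the intersection shrinks with every added sample, the sample with the fewest recovered genes bounds the number of loci available for the tree.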

Identification of the hypervariable regions
We conducted a comparative genome analysis for the complete Alpinia plastomes using the software mVISTA (http://genome.lbl.gov/vista/mvista/submit.shtml) in the Shuffle-LAGAN mode. The annotated Z. spectabile plastome (NC_020363) was used as the reference in the analysis.
To identify the most divergent regions, we wrote a custom script to extract the start and end of the IGS regions from the GenBank files for the five plastomes, together with the plastome of A. hainanensis. A total of 59 IGSs shared by the five Alpinia plastomes were identified. The sequences were extracted and aligned using the ClustalW2 (v. 2.0.12) program with options "-type=DNA -gapopen=10 -gapext=2" [54]. Pairwise distances were calculated using the K2p evolution model implemented in the distmat program from the EMBOSS package (v. 6.3.1) [55].
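The custom script itself is not given here. As a rough sketch of the underlying idea, the function below derives IGS coordinates from a list of annotated gene intervals and names each spacer after its flanking genes. It assumes 1-based inclusive coordinates on a linearized genome and ignores strand and the wrap-around spacer of the circular molecule; the function name and the example coordinates are hypothetical, and reading the intervals from GenBank files is not shown.

```python
def intergenic_spacers(genes):
    """Return (name, start, end) for each gap between consecutive genes.

    `genes` is an iterable of (gene_name, start, end) with 1-based
    inclusive coordinates. Overlapping or nested genes produce no spacer.
    Hypothetical helper for illustration only.
    """
    ordered = sorted(genes, key=lambda g: (g[1], g[2]))
    if not ordered:
        return []
    spacers = []
    prev_name, _, prev_end = ordered[0]
    for name, start, end in ordered[1:]:
        if start > prev_end + 1:
            # Spacer spans the bases strictly between the two genes.
            spacers.append((f"{prev_name}-{name}", prev_end + 1, start - 1))
        if end > prev_end:
            # Track the gene that extends furthest to the right.
            prev_name, prev_end = name, end
    return spacers


# Placeholder annotation, not real plastome coordinates.
annotation = [("g1", 1, 100), ("g2", 151, 300), ("g3", 280, 400), ("g4", 500, 600)]
igs = intergenic_spacers(annotation)
```

IGS regions shared by several plastomes could then be found by intersecting the spacer names obtained for each genome.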
Additional file 1: Figure S1. Schematic representation of the A. nigra plastome features. Figure S2. Schematic representation of the A. officinarum plastome features. Figure S3. Schematic representation of the A. oxyphylla plastome features. Figure S4. The schematic diagram of position and length of introns and exons for the splitting genes in the plastome of A. galanga. The gene rps12 was a trans-splicing gene. Figure S5. The schematic diagram of position and length of introns and exons for the splitting genes in the plastome of A. nigra. The gene rps12 was a trans-splicing gene. Figure S6. The schematic diagram of position and length of introns and exons for the splitting genes in the plastome of A. officinarum. The gene rps12 was a trans-splicing gene. Figure S7. The schematic diagram of position and length of introns and exons for the splitting genes in the plastome of A. oxyphylla. The gene rps12 was a trans-splicing gene. Figure S8. The VCF output for the A. officinarum. Figure S9. The VCF output for the A. oxyphylla. Figure S10. The VCF output for the A. oxyphylla. Figure S11. The VCF output for the A. nigra. Figure S12. The original and full-length gel electrophoresis results of the amplification of DNA barcodes using designed primers. Figure S13. The alignment of amplicons produced by designed Alpp primers. Figure S14. The alignment of amplicons produced by designed Alpr primers. Figure S15. The alignment of amplicons in 10 Alpinia plastomes produced by designed Alpp primers in silico. Figure S16. The alignment of amplicons in 10 Alpinia plastomes produced by designed Alpp primers in silico.
Additional file 2: Table S1. Base composition in the plastomes of four Alpinia species. Table S2. List of genes annotated in the plastome of A. galanga. Numbers in parentheses represent the number of copies of a gene. Superscript T: trans-splicing gene. Table S3. List of genes annotated in the plastome of A. nigra. Numbers in parentheses represent the number of copies of a gene. Superscript T: trans-splicing gene. Table S4. List of genes annotated in the plastome of A. officinarum. Numbers in parentheses represent the number of copies of a gene. Superscript T: trans-splicing gene. Table S5. List of genes annotated in the plastome of A. oxyphylla. Numbers in parentheses represent the number of copies of a gene. Superscript T: trans-splicing gene. Table S6. The length of introns and exons for the splitting genes in the plastome of A. galanga. The gene rps12 was a trans-splicing gene. Table S7. The length of introns and exons for the splitting genes in the plastome of A. nigra. The gene rps12 was a trans-splicing gene. Table S8. The length of introns and exons for the splitting genes in the plastome of A. officinarum. The gene rps12 was a trans-splicing gene. Table S9. The length of introns and exons for the splitting genes in the plastome of A. oxyphylla. The gene rps12 was a trans-splicing gene. Table S10. SSRs identified in the plastome of A. galanga. P1 = mononucleotide; P2 = dinucleotide; P3 = trinucleotide; P4 = tetranucleotide; P5 = pentanucleotide; P6 = hexanucleotide repeats; c = compound microsatellites. Table S11. SSRs identified in the plastome of A. nigra. P1 = mononucleotide; P2 = dinucleotide; P3 = trinucleotide; P4 = tetranucleotide; P5 = pentanucleotide; P6 = hexanucleotide repeats; c = compound microsatellites. Table S12. SSRs identified in the plastome of A. officinarum. P1 = mononucleotide; P2 = dinucleotide; P3 = trinucleotide; P4 = tetranucleotide; P5 = pentanucleotide; P6 = hexanucleotide repeats; c = compound microsatellites.
Table S13. SSRs identified in the plastome of A. oxyphylla. P1 = mononucleotide; P2 = dinucleotide; P3 = trinucleotide; P4 = tetranucleotide; P5 = pentanucleotide; P6 = hexanucleotide repeats; c = compound microsatellites. Table S14. Comparison of SSR markers found among four Alpinia species and one outgroup species, Zingiber spectabile. Zisp: Zingiber spectabile; Alga: Alpinia galanga; Alni: Alpinia nigra; Alof: Alpinia officinarum; Alox: Alpinia oxyphylla. Table S15. Dispersed repeat sequences in the plastome of A. galanga. Table S16. Dispersed repeat sequences in the plastome of A. nigra. Table S17. Dispersed repeat sequences in the plastome of A. officinarum. Table S18. Dispersed repeat sequences in the plastome of A. oxyphylla. Table S19. Tandem repeat sequences identified in the plastome of A. galanga. Table S20. Tandem repeat sequences identified in the plastome of A. nigra. Table S21. Tandem repeat sequences identified in the plastome of A. officinarum. Table S22. Tandem repeat sequences identified in the plastome of A. oxyphylla. a: coding sequences; b: intergenic spacers. Table S23. The distances among the shared intergenic spacer (IGS) regions from the five Alpinia plastomes. Alga: Alpinia galanga; Alha: Alpinia hainanensis; Alni: Alpinia nigra; Alof: Alpinia officinarum; Alox: Alpinia oxyphylla. Table S24. The list of accession numbers of the plastome sequences used in the phylogenetic analyses of the Zingiberaceae. Table S25. The dN, dS and dN/dS (ω) values of 77 common protein-coding genes from plastomes of 21 Alpinia species. Table S26. The two pairs of primers for the amplification of DNA barcodes. Table S27. The list of sample numbers of the samples used in the species discrimination analyses of Alpinia.