Assembly and comparative analysis of the first complete mitochondrial genome of Astragalus membranaceus (Fisch.) Bunge: an invaluable traditional Chinese medicine
BMC Plant Biology volume 24, Article number: 1055 (2024)
Abstract
Background
Astragalus membranaceus (Fisch.) Bunge is one of the most well-known tonic herbs in traditional Chinese medicine, renowned for its remarkable medicinal value in various clinical contexts. Its chloroplast (cp) and nuclear genomes have already been sequenced, providing valuable information for breeding and phylogenetic studies. However, the mitochondrial genome (mitogenome) of A. membranaceus remains unexplored, which hinders a comprehensive understanding of the evolution of its genome.
Results
For this study, we de novo assembled the mitogenome of A. membranaceus (Fisch.) Bunge var. mongholicus (Bunge) P. K. Hsiao using a strategy integrating Illumina and Nanopore sequencing technologies and subsequently performed a comparative analysis with its close relatives. The mitogenome has a multi-chromosome structure, consisting of two circular chromosomes with a total length of 398,048 bp and an overall GC content of 45.3%. It encodes 54 annotated functional genes, comprising 33 protein-coding genes (PCGs), 18 tRNA genes, and 3 rRNA genes. An investigation of codon usage in the PCGs revealed a clear preference for codons ending in A or U (T). RNA editing prediction identified 500 sites in the coding regions of mt PCGs, all of which are C-to-U conversions, a process that tends to convert hydrophilic amino acids into hydrophobic ones. A total of 399 SSRs, 4 tandem repeats, and 77 dispersed repeats were found in the mitogenome, indicating that A. membranaceus possesses fewer repeats than its close relatives with similarly sized mitogenomes. Selection pressure analysis indicated that most mt PCGs have been under purifying selection, while only five PCGs (ccmB, ccmFc, ccmFn, nad3, and nad9) have been under positive selection. Notably, positive selection on ccmB and nad9 was detected in all the pairwise species comparisons, suggesting a critical role for these genes in the evolution of A. membranaceus. Moreover, we inferred that 22 homologous fragments have been transferred from cp to mitochondria (mt), among which 5 cp-derived tRNA genes remain intact in the mitogenome. Further comparative analysis revealed that the syntenic regions and mt gene organization are relatively conserved among the analyzed legumes, whereas the comparison of gene content indicated that gene composition differs among Fabaceae mitogenomes. Finally, the phylogenetic tree established from this analysis is largely congruent with the taxonomic relationships of Fabaceae species and highlights the close relationship between Astragalus and Oxytropis.
Conclusions
We provide the first report of the assembled and annotated A. membranaceus mitogenome, which enriches the genetic resources available for the Astragalus genus and lays the foundation for comprehensive exploration of this invaluable medicinal plant.
Background
Mitochondria (mt) are indispensable organelles that serve as a hub for energy metabolism and play a crucial role in biological growth, development, and reproductive processes [1]. The mitochondrial genome (mitogenome) possesses characteristics of endosymbiotic origin and semi-autonomous inheritance, evolving independently of the nuclear genome [2, 3]. More recently, the continuous progress of sequencing technologies has fostered a meteoric rise in the deciphering of organellar genomes. Despite this, the completion of plant mitogenome sequencing has lagged severely due to the complexity of mitogenome assembly. In contrast to their counterparts in animals and the conserved plastid genomes, plant mitogenomes are more variable and diverse in terms of genome size, structural rearrangements, repeat elements, and gene relocation [4,5,6]. The size of the plant mitogenome fluctuates substantially, spanning a roughly 200-fold range from 66 kb in Viscum scurruloideum Barlow [7] to 12 Mb in Larix sibirica Ledeb. [8]. This divergence in size is primarily driven by the expansion of non-coding regions, especially the proliferation of diverse repeats and the integration of foreign DNA via horizontal or intracellular transfer, contributing to the emergence of larger and more complex plant mitogenomes [9, 10]. Furthermore, the structure of plant mitogenomes is intricate. Unlike the master circle model, which is applicable to most plastid genomes and animal mitogenomes, the actual architecture of plant mitogenomes appears to involve circular, linear, highly branched, or complex multi-chromosome configurations [11]. The mitogenome configuration of most terrestrial plants varies dramatically due to frequent recombination mediated by repeated sequences [12, 13]. Besides their large size, structural rearrangement is the most distinctive feature of plant mitogenomes. 
Plant mtDNA is prone to rearrangements due to its high homologous recombination activity. Mitogenome rearrangements can be neutral, affecting only non-coding regions of mtDNA. However, poor growth or lethality can result from the loss of portions of essential genes promoted by rearrangements within mitogenomes [14]. Importantly, mitogenome rearrangements can generate chimeric genes, which are potentially associated with cytoplasmic male sterility (CMS) [15]. Thus, the continuous enrichment and improvement of techniques are essential to precisely decipher the plant mitogenome configurations and display all information in a more intuitive way.
Although plant mitogenomes differ extraordinarily in size and structure, their gene contents are relatively conserved (typically 24 core genes and 17 variable genes) [16]. In particular, genes related to respiratory complexes and energy synthesis are highly conserved in angiosperm mitogenomes; however, in some species, such as V. scurruloideum, the mitogenome has lost all nad genes encoding NADH dehydrogenase [7]. Furthermore, plant mitogenomes evolve slowly in terms of sequence changes and present a low rate of synonymous substitutions [17]. Prior research generally holds that the mitogenome exhibits the lowest substitution rate of the three genomic compartments in plants. The presence of functional genes can also vary greatly due to post-transcriptional editing during mitogenome rearrangements [18]. RNA editing is a common phenomenon in land plants and is essential for establishing the massive diversity observed in gene sequences. Modification of RNA nucleotide sequences can greatly enrich the variety of proteins by affecting gene expression and RNA stability. Other distinct features of plant mitogenomes are horizontal gene transfer (HGT) and intracellular gene transfer (IGT). HGT, generally recognized as a mode of transmission of genetic material between organisms, is relatively rare during plant evolution and is mainly inferred from parasitic plants [19, 20]. In contrast, IGT from the chloroplast (cp) and nucleus to mt occurs continuously and dynamically, which may trigger alterations in the structure and size of mitogenomes [13, 21]. Historically, due to the presence of highly enriched repeats and frequent recombination, plant mitogenomes have been less popular in phylogenetic studies than cp genomes. 
However, with the continuous advancements in next-generation sequencing, especially long-read techniques, a more refined understanding of mitogenomes provides structured and statistically significant hints regarding plant phylogeny [22]. Consequently, comparative analysis of plant mitogenomes has become a hotspot for elucidating the evolutionary mechanisms of plants.
Astragalus L. is the largest genus in the Fabaceae family, comprising approximately 3,000 known species widely distributed in the Northern Hemisphere, South America, and Africa [23, 24]. There are an estimated 500 species of the genus in China, many of which have been employed in traditional Chinese medicine for thousands of years [23, 25]. Astragalus membranaceus (Fisch.) Bunge, also known as “Astragali radix” or “Huangqi”, is the source of one of the most well-known Chinese medicinal herbs, produced from its dry root. Huangqi, translated as “yellow leader” in Chinese, refers to the yellow color of the roots as well as its leading status as one of the most important tonic herbs in traditional Chinese medicine [26]. Modern phytochemical evidence indicates that A. membranaceus is rich in various bioactive ingredients, including flavonoids, saponins, polysaccharides, amino acids, and trace elements [27, 28]. Owing to its abundant chemical constituents, A. membranaceus exerts multiple pharmacological effects, such as anti-tumor, immunomodulatory, antioxidant, anti-inflammatory, and anti-depressant properties [25, 27]. Therefore, A. membranaceus has been proposed as one of the top ten most frequently used herbs in various clinical practices. A. membranaceus (Fisch.) Bunge var. membranaceus and A. membranaceus (Fisch.) Bunge var. mongholicus (Bunge) P. K. Hsiao are two native Chinese varieties with huge medicinal value [29] and have been tested and recorded in the 2020 Edition of the Pharmacopoeia of the People’s Republic of China. However, recent studies have indicated that the chemical composition of distinct A. membranaceus varieties differs significantly [29]. Inherent differences between these varieties might cause drug efficacy and safety issues; therefore, accurate identification of varieties is essential for the rational utilization of these invaluable medicinal plants. 
Meanwhile, the classification of species within the Astragalus genus has been historically difficult, given their diverse morphology resulting from the complexity of the genus. In recent years, remarkable success in species identification has been achieved through genomics approaches represented by DNA barcoding technology [30]. The successive acquisition of genome sequence information from cp [31, 32] and nuclei [33] represents noteworthy progress in the genomics research of A. membranaceus and provides valuable information for molecular evolution and taxonomic classification. The mitogenomes are deemed to provide more sufficient phylogenetic information for the elucidation of the genetic relationships among species and intractable phylogenies [34]. To date, mitogenome sequencing of A. membranaceus has not been performed. Therefore, knowledge of the mitogenome is urgently needed to provide genetic resources for further studies on the evolutionary understanding and genetic improvement of A. membranaceus.
For the current study, we employed an integrated Illumina short-read and Nanopore long-read strategy for de novo assembly of the mitogenome of A. membranaceus (Fisch.) Bunge var. mongholicus (for simplicity, unless otherwise specified, A. membranaceus refers to A. membranaceus (Fisch.) Bunge var. mongholicus in the remainder of this paper). The assembled A. membranaceus mitogenome was characterized in terms of gene annotation, codon usage, sequence variation, selection pressure, RNA editing events, synteny inference, gene loss, and phylogenetic position through comparison with the corresponding information for other published Fabaceae species. In addition, the possibility of IGT was investigated based on the occurrence of homologous sequences between the cp and mt genomes of A. membranaceus. The results reported here are expected to fill the knowledge gap regarding mitogenomic evolution in the Astragalus genus and lay a solid foundation for molecular evolution studies and the breeding of A. membranaceus.
Results
Mitogenome assembly and genomic features of A. membranaceus
In terms of topological structure, the A. membranaceus mitogenome consists of six contigs ranging from 4,006 bp to 181,678 bp in length (Fig. 1A). These mt contigs have overlapping regions and were used to construct the schematic graph for assembling the genome. To facilitate a more complete description and subsequent analysis, the A. membranaceus mitogenome was assembled into two complete circular chromosomes: chromosome 1 (chr 1) was organized with the order of contig 1–contig 4–contig 5–contig 6–contig 5–contig 4–contig 3–contig 1, while chromosome 2 (chr 2) was circularized in the order of contig 2–contig 3–contig 4–contig 2. Overall, the A. membranaceus mitogenome is a putative double-ring DNA with a total length of 398,048 bp and an overall GC content of 45.3% (Fig. 1B, C). The larger circular chromosome (chr 1) is 263,390 bp in length with a GC content of 45.3%, while the smaller chromosome (chr 2) is 134,658 bp in size and with 45.2% GC content (Table 1). The slight preference for AT in the A. membranaceus mitogenome is congruent with other angiosperms [6, 12]. The read mapping results showed that reads covered all positions of the mitogenome (Fig. S1), indicating that the assembled chromosomes were continuous.
Assembly results of the A. membranaceus mitogenome. (A) The assembly graph of the A. membranaceus mitogenome displayed in Bandage. The mt contigs are represented by different color segments. (B, C) Circular maps of the A. membranaceus mitogenome. Genes with different functions were depicted using different colors. Genes presented on the outside and inside of the circle are transcribed clockwise and counterclockwise, respectively. The innermost darker gray region corresponds to GC content with the middle gray line as the 50% threshold line
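The totals quoted for the two chromosomes can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the authors' pipeline: `gc_content` and `combined_gc` are hypothetical helper names introduced here for illustration, and the only inputs are the chromosome lengths and GC percentages reported in the text.

```python
def gc_content(seq: str) -> float:
    """Return the GC percentage of a nucleotide sequence (ambiguous bases ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


def combined_gc(chromosomes):
    """Combine (length_bp, gc_percent) pairs into a total length and a
    length-weighted overall GC percentage."""
    total = sum(length for length, _ in chromosomes)
    weighted = sum(length * gc for length, gc in chromosomes) / total
    return total, weighted


# Values reported in the text for chr 1 and chr 2.
total_len, overall_gc = combined_gc([(263_390, 45.3), (134_658, 45.2)])
print(total_len, round(overall_gc, 1))
```

The two chromosome lengths sum to the reported 398,048 bp, and the length-weighted GC value rounds to the reported 45.3%.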
The A. membranaceus mitogenome encodes 54 annotated functional genes, corresponding to 33 protein-coding genes (PCGs), 18 tRNA genes, and 3 rRNA genes (Table 2). Of the 33 PCGs, 25 are considered core genes responsible for electron transport, oxidative phosphorylation, cytochrome c biogenesis, and ATP synthesis. PCGs constitute the majority of the coding regions, accounting for 7.5% of the total mitogenome, while tRNA and rRNA genes account for only 0.3% and 1.1%, respectively (Table 1). The non-coding regions represent more than 90% of the whole-genome sequence, indicating the low genomic density of A. membranaceus. Introns emerged in seven of the annotated genes, namely, nad1, nad2, nad4, nad5, nad7, ccmFc, and rps10. Moreover, we identified 13 tRNAs as being mitochondrial native, while the other 5 are chloroplastic in origin (Table 2).
PCGs codon usage preference
In the complete mitogenome of A. membranaceus, we identified a total of 9,956 codons within the 33 predicted PCGs (Table S1). The codon number of the mt genes varies greatly, ranging from 75 in atp9 to 677 in matR. Not surprisingly, the genome uses the 64 standard codons, of which 61 sense codons encode all 20 amino acids, whereas the remaining 3 are translation termination signals. In A. membranaceus, almost all PCGs use ATG as their start codon, with the exception of nad1, which uses ACG. Three types of stop codons are used in the mt PCGs: TAA in 21 genes, TGA in 7 genes, and TAG in 5 genes. The codon distribution and relative synonymous codon usage (RSCU) of nine legume mitogenomes were analyzed to investigate the codon usage behavior of PCGs. We compared A. membranaceus with eight other Fabaceae species (i.e., Astragalus complanatus R. Br. ex Bunge, Caragana spinosa (L.) DC., Glycyrrhiza glabra L., Medicago sativa L., Oxytropis arctobia Bunge, Pisum fulvum Sibth. et Sm., Trifolium aureum Pollich, and Trigonella foenum-graecum L.). Like most deciphered plant mitogenomes, A. membranaceus exhibits a general preference for leucine (Leu) and serine (Ser) in amino acid utilization. In contrast, cysteine (Cys) and tryptophan (Trp) were relatively less abundant among the compared species (Fig. S2). Notably, the use of amino acid residues was found to be highly conserved in these legumes based on interspecies comparison. Codons with RSCU values greater than 1 are defined as high-frequency codons and are presumed to be preferentially used in amino acid encoding. Except for Trp (UGG) and methionine (Met, AUG), each of which is encoded by a single codon, the vast majority of amino acids presented a skewness in their codon usage patterns (Fig. 2). A total of 31 high-frequency codons were identified in the A. membranaceus mitogenome, suggesting a high bias in the usage of these codons. 
The codons UAA (End), GCU (alanine, Ala), and UAU (tyrosine, Tyr) displayed a high degree of codon usage preference, with their RSCU values exceeding 1.5. Apart from the two codons ACC (threonine, Thr) and UUG (Leu), the remaining preferentially used codons ended in A/U (T) bases, reflecting the AT bias in their codon usage patterns.
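To make the RSCU threshold concrete, the sketch below computes RSCU in its commonly used form: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid (or for the stop signal) were used equally. This is an illustrative sketch only. It assumes the standard genetic code, and `rscu` and `CODON_TABLE` are hypothetical names introduced here, not the software used in the study.

```python
from collections import Counter, defaultdict

# Standard genetic code, built in the conventional TCAG order (assumption).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def rscu(cds_list):
    """Return {codon: RSCU} for a list of in-frame coding sequences."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:  # skip codons with ambiguous bases
                counts[codon] += 1

    # Group synonymous codons by the amino acid (or stop, '*') they encode.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the input
        for c in codons:
            # observed count / (family total / number of synonymous codons)
            values[c] = counts[c] * len(codons) / total
    return values


# Toy sequence for illustration only: Met, Ala x4, stop.
toy = rscu(["ATGGCTGCTGCTGCATAA"])
high_frequency = sorted(c for c, v in toy.items() if v > 1)
print(high_frequency)
```

Under this definition, single-codon amino acids such as Met and Trp always have an RSCU of exactly 1, which is why they cannot show usage bias.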
Prediction of RNA editing events
To gain a deeper understanding of gene expression within the mitogenome, we identified potential RNA editing sites in A. membranaceus and compared them with those of the eight other Fabaceae species. A total of 500 sites were predicted to have undergone RNA editing in all 33 mt PCGs of A. membranaceus, all involving the conversion of the base C to U. The frequency of editing events is similar among the investigated species, ranging from 448 in M. sativa to 504 in G. glabra (Fig. 3A). The similar number of sites suggests that highly conserved PCGs are shared among the analyzed legumes. All identified edited sites were located at the first or second base position of a codon (Fig. 3B). The distributions of predicted sites among different genes were compared and found to be uneven (Fig. 3C). The numbers of edited sites in genes for cytochrome c biogenesis, complex I, and transport membrane proteins were higher on average, indicating that genes of these categories are more susceptible to RNA modification. Furthermore, we analyzed the amino acid changes generated by RNA editing to investigate the physicochemical modifications at the protein level. All 500 predicted editing sites in A. membranaceus were found to be non-synonymous. The non-synonymous sites correspond to 14 types of amino acid conversion, mainly Ser to Leu and proline (Pro) to Leu (Fig. 3A). Interestingly, the nature of amino acid changes in A. membranaceus is consistent with those in the other compared species. In addition, a total of 241 (48.2%) site changes involved the conversion of hydrophilic amino acids to hydrophobic amino acids (Table S2), which would be conducive to improving the stability of the proteins encoded by the A. membranaceus mitogenome.
Comparison of RNA editing events across the mt PCGs of A. membranaceus with eight different legumes. (A) The percentage of RNA editing sites that cause different amino acid conversions, with the number at the top of the box representing the total sites for each species. (B) Number of RNA editing sites at different codon positions. (C) The distribution of RNA editing sites across different PCGs
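The effect of a C-to-U edit on the encoded amino acid can be illustrated with a short sketch. It is a hypothetical example rather than the prediction tool used in the study: it assumes the standard genetic code, the helper name `edit_c_to_u` is introduced here, and the `HYDROPHOBIC` set is one common classification chosen as an assumption (the paper's own classification is given in Table S2 and may differ).

```python
# Standard genetic code, built in the conventional TCAG order (assumption).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Assumed hydrophobic residue set; other classifications exist.
HYDROPHOBIC = set("AVLIMFWP")


def edit_c_to_u(codon: str, position: int) -> dict:
    """Apply a C-to-U edit at a 1-based codon position and describe the result."""
    codon = codon.upper().replace("U", "T")
    idx = position - 1
    if codon[idx] != "C":
        raise ValueError(f"no C at position {position} of {codon}")
    edited = codon[:idx] + "T" + codon[idx + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[edited]
    return {
        "edited": edited,
        "before": before,
        "after": after,
        "non_synonymous": before != after,
        "hydrophilic_to_hydrophobic":
            before not in HYDROPHOBIC and after in HYDROPHOBIC,
    }


ser_to_leu = edit_c_to_u("TCA", 2)   # Ser -> Leu
pro_to_leu = edit_c_to_u("CCA", 2)   # Pro -> Leu

# In the standard code, NNC and NNU codons always encode the same amino acid,
# so a C-to-U edit at the third position is always synonymous.
third_position_silent = all(
    CODON_TABLE[c] == CODON_TABLE[c[:2] + "T"]
    for c in CODON_TABLE if c[2] == "C"
)
print(ser_to_leu, pro_to_leu, third_position_silent)
```

The last check is consistent with the observation that every non-synonymous predicted site falls on the first or second codon position.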
Identification of repeat sequences
In addition to the intergenic region differences, the extensive presence of repeat sequences and exogenous fragments is also an important factor in mitogenomic structural variances. The main specific repeat sequence types are simple sequence repeats (SSRs), tandem repeats, and dispersed repeats. In our investigation of A. membranaceus, a certain number of repeat sequences were found distributed in the mitogenome, including 399 SSRs, 4 tandem repeats, and 77 dispersed repeats. Among the identified SSRs, the mono- and di-nucleotide repeat sequences were the most abundant, with 164 loci (41.1%) and 160 loci (40.1%), respectively. The tetra-nucleotide repeats accounted for 13.5% of the total, followed by 4.5% of tri-, 0.5% of penta-, and 0.3% of hexa-nucleotide repeats (Fig. S3). Further analysis of the repeat units revealed that the mono-nucleotide repeats of A/T were more prevalent than the other repetitive types. Polynucleotide repeats only occupied a small portion of the entire SSRs and primarily occurred in the intergenic or intronic regions, excluding a single tri-nucleotide repeat of GAA that appeared in the coding region of rps1, as well as two tetra-nucleotide repeats of ATAA and CATT occurring in atp6 and rps3, respectively. Simultaneously, the comparison of A. membranaceus with eight other legumes indicated that the number of SSRs counted in the mitogenomes ranged from 262 in M. sativa to 419 in G. glabra (Fig. S3). The most abundant SSRs in these mitogenomes have mono- and di-nucleotide repeat units, comprising more than 80% of the total repeat number. The SSRs provided here could be developed as potential molecular markers for identification of A. membranaceus.
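SSR detection of the kind summarized above can be sketched with regular expressions. The minimum repeat counts in `MIN_REPEATS` are an assumption made for illustration (thresholds of this sort are common in mitogenome studies, but the values used by the authors are not stated in this section), and `find_ssrs` is a hypothetical helper, not the tool used in the study.

```python
import re

# Assumed minimum number of repeat units per motif length (1 to 6 bp).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "ATAT"), which belong to a shorter motif class.
            if any(motif == motif[:d] * (unit_len // d)
                   for d in range(1, unit_len) if unit_len % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)


# Toy sequences for illustration only.
mono = find_ssrs("G" + "A" * 12 + "C")
di = find_ssrs("GG" + "AT" * 6 + "CC")
tri = find_ssrs("C" + "GAA" * 4 + "T")
print(mono, di, tri)
```

Counting hits by motif length would then give the kind of mono- to hexa-nucleotide breakdown reported for the 399 SSRs.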
Tandem repeats refer to the sequences formed by duplication events, with approximately 1 to 200 bases as repeating units in tandem. In the A. membranaceus mitogenome, 4 tandem repeats ranging from 15 to 42 bp were detected (Table S3). All of these repeats were located in intergenic spacers and contained two copies of the sequence. In addition, the mitogenome of A. membranaceus possessed 77 pairs of dispersed repeats with lengths greater than 30 bp, comprising 32 pairs of forward repeats and 45 pairs of palindromic repeats (Table S4). No reverse or complementary repeats were found within the species. The dispersed repeat sequences ranged from 30 to 8,031 bp in length, and the majority are small-sized repeats with lengths between 30 bp and 100 bp. There were 13 repeats with lengths of 100 bp or more (R1-R13), only one of which was longer than 1 kb (R1). Additional comparison of different species indicates that the number of repeats in the compared plants varies from 55 in T. aureum to 137 in T. foenum-graecum, comprising 3.0–10.9% of the total genome (Table S5). In A. membranaceus, the 77 dispersed repeats covered 13,583 bp (3.4%) of the whole mitogenome. In terms of both the number of repeats and the proportion of repeated sequences in the entire genome, A. membranaceus ranks lower than the other species. Notably, A. membranaceus has fewer repeats, both in number and relative percentage, when compared to the similarly sized mitogenomes of A. complanatus, C. spinosa, and P. fulvum. Despite having the largest mitogenome (440,064 bp) among the analyzed species, G. glabra has the lowest relative percentage of repeats, accounting for only 3.0% of the total mitogenome size. In stark contrast, the most abundant repeats were unexpectedly detected in the small-sized mitogenome of T. foenum-graecum. 
Consequently, the total length of repeats may not be the major reason for the differences in the mitogenome size in legumes.
Estimation of selection pressure on PCGs
To estimate the evolutionary rate of mt PCGs, the Ka/Ks ratios of 27 single-copy orthologous genes of the A. membranaceus mitogenome were inspected and contrasted with those of eight other legumes. As shown in Fig. 4, the majority of PCGs had Ka/Ks values less than 1, implying purifying selection of these genes during evolution. In particular, notably low values for cob, cox1, cox2, cox3, nad2, nad4, nad5, nad6, rpl5, rpl16, and rps14 were identified in the presented comparisons, indicating that these 11 genes have undergone intense purification and may play vital roles in maintaining the indispensable functions of the mt. An additional five genes (ccmB, ccmFc, ccmFn, nad3, and nad9) were observed as having been subjected to positive selection pressure with Ka/Ks ratios greater than 1. Notably, the extraordinarily high values of ccmB and nad9 were detected in all the pairwise comparisons, suggesting the extremely critical role of these genes in the evolution of A. membranaceus. As many as seven positively selected genes were identified between A. membranaceus and C. spinosa, implying that these PCGs may have suffered from diverse selection pressures since the time of divergence of the two species from their last common ancestor. In stark contrast, only one positively selected gene was observed for the pair of A. membranaceus and O. arctobia, which hints strongly at their close evolutionary relationship.
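The interpretation of Ka/Ks used above (below 1 for purifying selection, above 1 for positive selection) can be expressed as a short sketch. The function names and the numeric values in `toy_ratios` are hypothetical and chosen only for illustration; they are not the values plotted in Fig. 4.

```python
def classify_selection(ka_ks: float) -> str:
    """Interpret a Ka/Ks ratio under the usual convention."""
    if ka_ks > 1:
        return "positive"
    if ka_ks < 1:
        return "purifying"
    return "neutral"


def positive_in_all(ratios_by_gene: dict) -> list:
    """Return genes whose Ka/Ks exceeds 1 in every pairwise comparison.

    ratios_by_gene maps a gene name to a list of Ka/Ks values,
    one per species pair.
    """
    return sorted(gene for gene, ratios in ratios_by_gene.items()
                  if ratios and all(r > 1 for r in ratios))


# Hypothetical Ka/Ks values for two pairwise comparisons (illustration only).
toy_ratios = {
    "ccmB": [1.8, 2.1],   # above 1 in both comparisons
    "nad3": [1.2, 0.7],   # above 1 in only one comparison
    "cox1": [0.1, 0.2],   # strongly constrained
}
consistently_positive = positive_in_all(toy_ratios)
print(consistently_positive, classify_selection(0.2))
```

A gene such as nad3 in this toy input would count as positively selected for one species pair but would not be reported as positively selected across all comparisons.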
Intracellular gene transfer of A. membranaceus organelle genomes
We searched for homologous sequences between the mt and cp genomes in A. membranaceus to identify potential IGT events. In the sequence similarity analysis, 22 fragments were identified as being homologous between the two organelle genomes (Fig. 5). The cp-derived fragments add up to 1,791 bp in length and occupy a proportion of 0.4% of the entire mitogenome and 1.4% of the cp genome. Among these fragments, the length ranged from 29 bp to 248 bp (Table S6). Additionally, regarding their distribution, the identified fragments were not found to be uniformly inserted into the circular chromosomes of A. membranaceus: of the 22 fragments, 15 are present on chr 2. Annotation analysis indicates that of these homologous fragments, five are intact tRNA genes (trnD-GUC, trnH-GUG, trnM-CAU, trnN-GUU, and trnW-CCA) and two are incomplete rRNA genes (rrn18/rrn16 and rrn26/rrn23). Nevertheless, we did not find cp-derived PCGs in the A. membranaceus mitogenome.
Synteny inference
A. membranaceus and its close relatives were subjected to synteny analysis to investigate their syntenic regions and rearrangements. Among the analyzed mitogenomes, numerous syntenic sequences were shared, especially between A. membranaceus and A. complanatus (Fig. 6). The overall mitogenome nucleotide homology of the two Astragalus plants is 77% (309 kb), which is consistent with their relationship as sister species. A high degree of synteny conservation is also exhibited between the mitogenomes of A. membranaceus and O. arctobia. Meanwhile, lower amounts of DNA are shared between A. membranaceus and G. glabra (59%; 236 kb) and C. spinosa (56%; 224 kb). In stark contrast, A. membranaceus shares merely 160 kb and 174 kb of DNA with T. aureum and M. sativa, respectively, indicating extensive differentiation among the Fabaceae mitogenomes. To investigate whether recombination events disrupted the gene organization of A. membranaceus, the genomic locations of orthologous genes were compared among the nine selected species. As indicated in Table S7, a total of nine conserved gene clusters (i.e., rps3–rpl16, atp4–nad4L, rrn18–rrn5, nad3–rps12, cob–rps14–rpl5, nad1–matR, nad5–rps1, rps10–cox1, and rrn26–trnM) were identified across the compared mitogenomes. Eight of those conserved clusters were retained in all of the compared species, possibly dating back to their ancestral mitogenome. The considerable number of shared clusters supports the close relationship among these species during the evolutionary process.
Gene loss comparison and phylogenetic analysis
We compared the mitogenome of A. membranaceus with those of 32 other species to investigate the different gene compositions among the Fabales mitogenomes. The information of the provided mitogenomes is detailed in Table S8. As depicted in Fig. 7, 29 PCGs appeared in all 33 mitogenomes. The core genes were highly conserved among the analyzed species, with the exception of nad4L, missing only in Sophora flavescens Aiton, and cox2, missing in Medicago truncatula Gaertn., Phaseolus vulgaris L., and Vigna unguiculata (L.) Walp. However, substantial loss events were identified with respect to ribosomal protein and succinate dehydrogenase genes. Six genes (i.e., rpl2, rpl10, rps7, rps13, rps19, and sdh3) were almost completely lost from the analyzed Fabaceae mitogenomes. Consequently, the frequent loss of ribosomal protein and sdh genes appears to be a dominant factor determining the high diversity of gene content among different angiosperm mitogenomes.
Subsequently, phylogenetic analysis of the 33 complete mitogenomes was conducted to explore the phylogenetic position of A. membranaceus in Fabaceae. Phylogenetic trees were constructed based on maximum likelihood (ML) and Bayesian inference (BI) methods using an aligned data matrix consisting of 10 single-copy orthologous genes from these species. The set of conserved PCGs comprises atp4, ccmB, cob, cox1, cox3, matR, nad4, nad6, rps4, and rps12. The ML and BI trees had a consistent topology, and most branches had relatively high node support values, indicating the strong reliability of the phylogenetic relationship revealed by the mt PCGs (Fig. 8). The overall structure of this mtDNA-based phylogeny closely mirrors the taxonomic relationships among these species. The trees strongly support the separation of Fabaceae from the clade composed of Surianaceae. All taxa of the studied genera in the Fabaceae family showed good clustering, and species from the same genus clustered closely together into one clade. Our findings indicate that A. membranaceus and A. complanatus are well clustered, and the two Astragalus species are sister to Oxytropis and nested in one branch with C. spinosa. In addition, A. membranaceus represents a relatively recent divergence, suggesting that the evolutionary rate among coding genes in the A. membranaceus mitogenome appears to be far slower than that in the other analyzed species, with the exception of S. flavescens, Vicia faba L., Arachis hypogaea L., M. sativa, Lupinus albus L., and A. complanatus.
The phylogenetic relationships among A. membranaceus and 32 Fabales species using the maximum likelihood (ML) and Bayesian inference (BI) methods. Suriana maritima was selected as an outgroup. Numbers on branches indicated bootstrap values of ML tree (left) and posterior probabilities of BI tree (right), respectively
Discussion
Mitochondria are the primary sites of respiration in plants, generating the energy needed for carrying out biological processes [35]. Plant mitogenomes present more complex and relatively conserved characteristics than the cp genome, which creates favorable conditions for better understanding the molecular evolution and phylogenetic relationships among different species, especially close relatives [36, 37]. The continuous development of genome-sequencing technology has accelerated the understanding of plant mitogenomes compared with the laborious methods originally used. The evolution of plant mitogenomes involves substantial genomic rearrangements, resulting in remarkable diversity in genome structure and size [38]. In this study, we employed a hybrid assembly strategy integrating second- and third-generation sequencing methods to complete, for the first time, the mitogenome of A. membranaceus, an invaluable Chinese medicinal plant from the Fabaceae family. The assembled mitogenome was found to have a multi-chromosome structure, comprising two distinct circular chromosomes with a total length of 398,048 bp. Compared to previously reported Fabaceae species, the mitogenome of A. membranaceus is moderate in size (Table S5). GC content is a good indicator of genome characterization, reflecting the composition of amino acids within protein groups [6, 39]. The entire mitogenome of A. membranaceus has a GC content of 45.3%, similar to that of other Fabaceae species such as P. vulgaris (45.1%) [40], T. foenum-graecum (45.3%) [41], and M. truncatula (45.4%) [42]. In summary, the mitogenomes of Fabaceae are conserved in GC content but vary widely in genome size, which is congruent with other angiosperms [43].
The observed fluctuations in size are primarily attributed to the variation in intergenic regions, especially the accumulation of diverse repeats and the migration of foreign sequences [9, 10]. Previous studies have provided evidence that repeat sequences are essential for inter- or intra-molecular recombination and play a crucial role in shaping the mitogenome [44, 45]. Dispersed repeats are critical for altering genome size and generating genetic structural variation, and thus have a significant impact on the dynamics of plant genomes. Positive correlations between repeat content and mitogenome size have been detected in monocots, indicating that the scale of dispersed repeats may have contributed to the increase in their mitogenome size [46]. In this study, the A. membranaceus mitogenome was found to contain 77 dispersed repeats with lengths greater than 30 bp, accounting for 3.4% of the whole genome. The comparative analysis revealed that repeats are poorly conserved in the Fabaceae mitogenomes (Table S5). The G. glabra mitogenome has only 3.0% repeat sequences despite having the largest genome (440,064 bp), while the smaller T. foenum-graecum genome (345,604 bp) contains more repeats (10.9%). Therefore, repeat content may not be a good indicator for evaluating the mitogenome size differentiation in leguminous plants. However, the mechanism for genome expansion may also be related to the length class of repeats. The Faboideae mitogenomes harbor relatively high numbers of small repeats (< 100 bp), whereas large repeats (> 1,000 bp) are limited in the Faboideae [47]. In this study, the A. membranaceus mitogenome contains only one large repeat pair of 8,031 bp, while the most abundant repeats are small-sized (Table S4). The limited number of large repeats might give rise to a smaller genomic size for A. membranaceus. 
Homologous fragments transferred from cp and nuclear DNA to the mitogenome also contribute to the variation in mitogenome size [13, 21]. A total of 22 homologous fragments, ranging from 29 bp to 248 bp in length, were identified as having been transferred from the cp genome to the mitogenome of A. membranaceus. The proportion of these transferred fragments in A. membranaceus (0.4%) is relatively low; for comparison, such fragments account for 1–10.3% of mtDNA in most angiosperm mitogenomes [48]. These results support the inference that plastid-derived sequences may not be the main cause of mitogenome expansion [8]. Further annotation of these fragments revealed that all five cp-derived tRNA genes are intact, suggesting that they remain functional in the mitogenome. Collectively, there is frequent sequence migration between the cp and mt genomes of A. membranaceus, but its significance for the adaptive evolution of the A. membranaceus mitogenome remains to be explored in further studies.
RNA editing is a pervasive nucleotide modification process that occurs primarily in the organelles of higher plants, enriching the variation in genetic information and thus the diversity of gene products [49]. RNA editing in plant organelles predominantly involves single-base transitions, with C-to-U conversions being the most common. In this study, a total of 500 RNA editing sites were predicted across the 33 mt PCGs of A. membranaceus. As expected, all the predicted sites were C-to-U conversions, resulting in single-base transitions and occasionally double-base substitutions. Notably, all edited sites resulted in non-synonymous codon alterations, with 45.2% (226 edits) of the changes converting the encoded amino acid to Leu (Fig. 3A). Similar patterns have also been observed in diverse species, such as Angelica biserrata (Shan et Yuan) Yuan et Shan [6], Artemisia argyi L. [50], and Taraxacum mongolicum Hand.-Mazz. [51]. RNA editing has markedly different biological functions, including altering protein function by changing the hydrophilicity or hydrophobicity of amino acids in the coding regions of genes, as well as promoting RNA splicing in the non-coding regions of genes [52, 53]. The presence of hydrophobic amino acids has a significant impact on protein stability. Conversely, an increase in the proportion of hydrophilic amino acids aids in protein folding [6]. In A. membranaceus, nearly half of the predicted sites (48.2%) involved the conversion of hydrophilic amino acids to hydrophobic amino acids (Table S2). The decrease in the proportion of hydrophilic amino acids would be expected to improve the overall stability of protein structures. In addition, the uneven distribution of predicted sites among A. membranaceus mt PCGs suggests that post-transcriptional modification may be crucial for maintaining the functions of genes for cytochrome c biogenesis, complex I, and transport membrane proteins (Fig. 3C). The frequency and type of RNA editing are generally conserved in closely related lineages [43, 54]. In the comparison involving multiple species, the number of editing events in different Fabaceae mitogenomes was found to be highly conserved, providing further evidence for their highly conserved PCGs and close evolutionary relationships.
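The effect of a single C-to-U edit on the encoded amino acid can be illustrated with a short Python sketch. It assumes the standard genetic code, writes the edit as C-to-T at the DNA level, and uses a simple hydrophobic set (A, V, L, I, M, F, W, P) that is an assumption of this example; the classification used for Table S2 may differ. The names `edit_effect` and `hydropathy` are hypothetical helpers.

```python
from itertools import product

# Standard genetic code, built in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumed hydrophobic residues; every other residue is treated as hydrophilic.
HYDROPHOBIC = set("AVLIMFWP")


def hydropathy(amino_acid: str) -> str:
    """Label an amino acid under the simplified two-class scheme above."""
    return "hydrophobic" if amino_acid in HYDROPHOBIC else "hydrophilic"


def edit_effect(codon: str, position: int) -> tuple[str, str, str]:
    """Apply a C-to-U edit (C->T in DNA) at a 0-based codon position.

    Returns (amino acid before, amino acid after, description of the change).
    """
    codon = codon.upper()
    if codon[position] != "C":
        raise ValueError("C-to-U editing requires a C at the edited position")
    edited = codon[:position] + "T" + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[edited]
    if before == after:
        change = "synonymous"
    elif after == "*":
        change = "stop gained"
    else:
        change = f"{hydropathy(before)}-to-{hydropathy(after)}"
    return before, after, change


if __name__ == "__main__":
    # Ser (TCA) edited at the second position becomes Leu (TTA).
    print(edit_effect("TCA", 1))
```

An edit at the second codon position of TCA converts Ser to Leu, which is the kind of hydrophilic-to-hydrophobic change discussed above.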
Analysis of the nucleotide substitution rate is a valuable means for understanding the dynamics of molecular evolution [55]. The Ka/Ks ratio is commonly used to estimate the selective pressure experienced by specific PCGs during evolution and to reveal the underlying genetic mechanism [56]. Generally speaking, most PCGs are subject to neutral or negative (purifying) selection owing to their highly conserved nature, so their Ka/Ks ratios do not exceed 1 [43, 57]. In our comparison of 27 single-copy orthologous genes between the mtDNA of A. membranaceus and eight other legumes, the majority of PCGs were negatively selected during the evolutionary process, implying that these genes are highly conserved in Fabaceae. In particular, significantly lower Ka/Ks values were observed for 11 genes (cob, cox1, cox2, cox3, nad2, nad4, nad5, nad6, rpl5, rpl16, and rps14) in the present investigation (Fig. 4), indicating that strong purifying selection has acted on these genes to maintain their indispensable functions in mitogenomes. However, within the nine compared plants, five genes (ccmB, ccmFc, ccmFn, nad3, and nad9) showed signs of positive selection. The genes ccmB, ccmFc, and ccmFn presumably participate in mt cytochrome c maturation by regulating the export and metabolism of heme [58, 59]. As nad3 and nad9 are vital members of NADH dehydrogenase and encode essential components of mt complex I, we speculate that the adaptive evolution of A. membranaceus may be related to improvements in respiratory electron transport and oxidative phosphorylation. Notably, the extraordinarily high Ka/Ks ratios of ccmB and nad9 indicate their critical roles in the evolution of A. membranaceus, possibly associated with environmental stress. Collectively, these five genes that have experienced positive selection might have developed novel functions and played an important role in the adaptability of A. membranaceus.
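The interpretation of Ka/Ks values described above can be summarized in a few lines of Python. This is a minimal sketch of the conventional thresholds (ratio below 1, equal to 1, above 1); the function name and the handling of Ks = 0 are assumptions of this example and do not reproduce the internals of KaKs_Calculator.

```python
import math


def classify_selection(ka: float, ks: float) -> str:
    """Classify selective pressure on a gene pair from its Ka and Ks values."""
    if ks <= 0:
        # The ratio is undefined when no synonymous substitutions are observed.
        return "undefined"
    ratio = ka / ks
    if math.isclose(ratio, 1.0):
        return "neutral"
    return "positive" if ratio > 1.0 else "purifying"


if __name__ == "__main__":
    # Illustrative values only, not results from this study.
    print(classify_selection(0.02, 0.10))
    print(classify_selection(0.30, 0.10))
```

In practice, a ratio slightly above or below 1 is usually assessed together with a significance test before selection is inferred.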
The arrangement of homologous regions has been extensively used to understand the phylogenetic relationships between species. Synteny analysis provides an effective method for deciphering the arrangement of sequences or genes [60]. In the present study, extensive synteny was found, demonstrating the high genetic similarity among the Fabaceae mitogenomes (Fig. 6). The mitogenome sequences of A. membranaceus and A. complanatus showed a remarkably high identity of 77%, indicating that species with closer genetic relationships share the majority of their sequences. However, these homologous sequences are arranged irregularly, indicating substantial gene rearrangements in the compared mitogenomes. Plant mitogenomes typically exhibit high variability in gene organization, due to the high frequency of homologous recombination [61]. However, from a phylogenetic perspective, gene orders and clusters remain conserved in evolutionarily close species [62]. In our study, a total of nine conserved gene clusters were identified across the compared mitogenomes (Table S7). Eight of those clusters were shared by all of the selected species, demonstrating their high conservation in Fabaceae mitogenomes. In contrast, the cluster sdh4–cox3–(atp8), which is extensively preserved in dicots [63], was absent from the analyzed legumes. These exceptions may be attributed to the frequent recombination events occurring during plant mitogenome evolution [62].
During the course of evolution, mitogenomes have experienced frequent loss of genes and apparent reduction of shared DNA among the extant angiosperms. As demonstrated in our study, the genes responsible for electron transport and oxidative phosphorylation subunits are quite stable, whereas genes encoding succinate dehydrogenases and ribosomal proteins are prone to being lost from plant mitogenomes (Fig. 7). The content of ribosomal protein genes differs considerably among plant mitogenomes, usually reflecting distinct losses and transfers to the nucleus [64]. Within Fabaceae, six genes (i.e., rpl2, rpl10, rps7, rps13, rps19, and sdh3) were almost completely absent from the mitogenomes, implying that the loss of these genes may have occurred in a common ancestor and their functions may have been compensated for by other genes during rapid evolution. Mitogenome analyses have shed light on phylogenetic relationships among higher plants, as evidenced in Apiales [6], Malpighiales [65], and Asteraceae [66]. To gain further insight into the phylogenetic position of A. membranaceus in Fabaceae, comprehensive phylogenetic analysis was performed using 10 single-copy orthologous genes from 33 species. The overall structure of this mtDNA-based phylogeny reflects the well-defined taxonomic relationships among these species. Our results support the availability of mitogenome data in revealing the taxonomic relationships among families, orders, or higher systematic levels of higher plants.
Conclusions
In the present study, the complete mitogenome of A. membranaceus was described and comparatively analyzed with other available Fabaceae species for the first time. We successfully deciphered the A. membranaceus mitogenome by integrating Nanopore and Illumina sequencing technology, revealing the multi-chromosome conformation of its mtDNA. The mitogenome shows a strong codon usage preference and preserves a considerable number of cp-derived intracellular gene transfer fragments, providing valuable genetic information. Extensive analyses were conducted by comparing the genomic characteristics of the A. membranaceus mitogenome with those of multiple Fabaceae mitogenomes, including prediction of RNA editing, repeat element identification, selective pressure estimation, synteny inference, phylogenetic analysis, and gene loss comparison. The discovery of significant genomic variability contributes to enriching genetic information and broadening our understanding of mitogenome diversity and evolution in higher plants. In conclusion, the knowledge gained from our mitogenomic study provides many new insights into Astragalus genetics and evolution, laying the foundation for further molecular-assisted classification and genetic improvement of A. membranaceus.
Methods
Plant materials and mitogenome sequencing
The sample material of Astragalus membranaceus (Fisch.) Bunge var. mongholicus (Bunge) P.K. Hsiao was collected from Hengshan mountain in Datong, Shanxi, China (39°38′N, 113°45′E) (Fig. S4). The plant sample was identified by Kun Zhang. The specimen of A. membranaceus has been deposited in the herbarium of Shanxi Datong University under the voucher number HSAM202301. High-quality genomic DNA was extracted from fresh young leaves using a modified CTAB procedure [67]. The quality and concentration of the DNA samples were monitored using agarose gel electrophoresis, a NanoDrop instrument (Thermo Fisher Scientific, USA), and a Qubit fluorometer (Thermo Fisher Scientific, USA). A hybrid short-read (Illumina) and long-read (Oxford Nanopore) sequencing strategy was implemented to obtain full-length mitogenome sequences.
First, optional fragmentation of large mtDNA sequences was performed using the BluePippin system (Sage Science, USA). A > 8 kb library was constructed using the SQK-LSK109 ligation kit (Oxford Nanopore Technologies, ONT, UK) and the standard protocol. The purified library was loaded into a Nanopore PromethION sequencing platform (ONT, UK) and 9.9 Gb in 1.2 million raw long reads (SRA accession SRR25949529) were produced. Then, 9.7 Gb clean data with an average read size of 8,673 bp remained after filtering and re-editing using NanoFilt and NanoPlot in Nanopack [68], which covered 101X average depth of the mitogenome. Meanwhile, libraries with an average fragment length of 350 bp were prepared utilizing the NEBNext® Ultra™ DNA Library Prep Kit (New England Biolabs, England) and then sequenced on the Illumina Novaseq 6000 platform (Illumina, USA) with 150 bp paired-end reads. The Illumina short-read sequencing yielded 15.5 Gb in 51.6 million reads (SRA accession SRR25949646). After editing using the NGS QC Tool Kit v2.3.3 [69], 15.1 Gb clean data were generated from 50.5 million reads, corresponding to an average depth of coverage of 240X across the entire mitogenome.
Mitogenome assembly and annotation
The initial assembly was performed using Miniasm v0.3-r179 [70]. After trimming adapter sequences with Porechop v0.2.4 [71] and polishing the resulting assembly with Racon v1.4.7 [72], a rough but computationally efficient assembly was obtained. Potential mt contigs with homology to the Astragalus complanatus mitogenome (NCBI Reference Sequence: NC_065024.1) were identified using Bandage [73]. Subsequently, we aligned the Nanopore reads to our draft A. membranaceus assembly using minimap2 [74]. The aligned reads were then separated and reassembled using Flye v2.6 [75] and Canu v2.1.1 [76] sequentially. The final mitogenome sequence was obtained by polishing with Pilon [77] using the Illumina Novaseq sequencing reads. Ultimately, the assembly procedures described above generated a structure of two circular chromosomes for the A. membranaceus mitogenome.
Using previously reported angiosperm mt genes from the NCBI database (https://www.blast.ncbi.nlm.nih.gov) as query sequences, the complete mitogenome of A. membranaceus was annotated using MITOFY [54] and MFANNOT [78]. The tRNA and rRNA genes were detected using tRNAscan-SE 2.0 [79] and BLASTN [80], respectively. The origins of the tRNA genes detected by tRNAscan-SE 2.0 were identified using BLASTN, with the matching rate set to ≥ 70%, the length set to ≥ 30, and an E-value of ≤ 1 × 10⁻³. Subsequently, the OGDRAW program [81] was used to visualize the circular and syntenic gene cluster maps of the A. membranaceus mitogenome. The assembled complete sequence of the A. membranaceus mitogenome has been deposited in GenBank (NCBI), under accession numbers OR567492.1 and OR567493.1.
Codon usage identification and prediction of RNA editing sites
The protein-coding sequences of the mitogenome were extracted using PhyloSuite v1.2.2 [82]. CodonW v1.4.4 was used to perform codon usage analysis on the PCGs and calculate the amino acid composition and RSCU values. In addition, we used the online PREP-Mt server suite (http://prep.unl.edu/) to predict putative RNA editing sites in the PCGs of the mitogenomes of A. membranaceus and other Fabaceae members (i.e., Astragalus complanatus, Caragana spinosa, Glycyrrhiza glabra, Medicago sativa, Oxytropis arctobia, Pisum fulvum, Trifolium aureum, and Trigonella foenum-graecum). To obtain a more accurate prediction for edits, the cutoff value was set to 0.2 [83].
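Relative synonymous codon usage (RSCU) is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally; values above 1 indicate preferred codons. The following Python sketch illustrates the calculation under stated assumptions: it uses the standard genetic code, excludes stop codons, and ignores incomplete trailing codons. It is not CodonW itself, whose handling of these cases may differ, and the function name `rscu` is illustrative.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, built in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(coding_sequences: list[str]) -> dict[str, float]:
    """Compute RSCU values pooled over a list of in-frame coding sequences."""
    counts = Counter()
    for cds in coding_sequences:
        seq = cds.upper().replace("U", "T")
        usable = len(seq) - len(seq) % 3  # drop an incomplete trailing codon
        for i in range(0, usable, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1

    # Group sense codons into synonymous families by amino acid.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid not observed; RSCU undefined
        expected = total / len(codons)
        for codon in codons:
            values[codon] = counts[codon] / expected
    return values


if __name__ == "__main__":
    # Toy sequence: three GCT and one GCA codon (all alanine).
    print(rscu(["GCTGCTGCTGCA"]))
```

For the toy input, alanine has four synonymous codons and four observations, so the expected count per codon is 1 and the RSCU of GCT is 3.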
Repeat elements and selective pressure analysis
Three types of repeats—SSRs, tandem repeats, and dispersed repeats—were assessed across the target species mitogenomes. SSRs were searched using the MISA webserver (https://webblast.ipk-gatersleben.de/misa/) [84] with eight, four, four, three, three, and three repeat units set as minimum thresholds corresponding to the motif size of one to six nucleotides, respectively. Tandem repeats were predicted using the Tandem Repeats Finder v4.09 program (http://tandem.bu.edu/trf/trf.html) [85] with parameter settings of two for matches and seven for mismatches and indels. The public tool REPuter (https://bibiserv.cebitec.uni-bielefeld.de/reputer) [86] was further used to identify dispersed repeats, with a minimal repeat size of 30 bp.
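The SSR thresholds given above (eight repeats for mononucleotide motifs, four for di- and trinucleotide motifs, and three for tetra- to hexanucleotide motifs) can be expressed as a simple regular-expression scan. The sketch below is an illustrative Python approximation, not the MISA implementation: it reports perfect repeats only, does not merge compound SSRs, and the function names are hypothetical.

```python
import re

# Minimum repeat counts per motif length, following the thresholds in the text.
MIN_REPEATS = {1: 8, 2: 4, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repetition of a shorter unit."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(sequence: str) -> list[tuple[int, int, str, int]]:
    """Return (start, end, motif, repeat count) with 1-based inclusive coordinates."""
    seq = sequence.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        pos = 0
        while True:
            match = pattern.search(seq, pos)
            if match is None:
                break
            motif = match.group(1)
            if is_primitive(motif):
                repeats = (match.end() - match.start()) // size
                hits.append((match.start() + 1, match.end(), motif, repeats))
                pos = match.end()
            else:
                # e.g. "AA" is a mononucleotide repeat in disguise; step past it.
                pos = match.start() + 1
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence containing an (A)9 and an (AT)5 repeat.
    print(find_ssrs("GC" + "A" * 9 + "GCG" + "AT" * 5 + "CC"))
```

The primitive-motif check prevents a run such as AAAAAAAA from being reported again as a dinucleotide (AA) repeat.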
To infer the direction and magnitude of natural selection acting on PCGs during the evolution of A. membranaceus, the pairwise nucleotide substitution rates were estimated for the shared PCGs in mtDNA of nine closely related Fabaceae species. The single-copy orthologous genes were captured using OrthoFinder v2.3.14 [87]. ParaAT2.0 was used with default settings to align and format the orthologous gene pairs [88]. KaKs_Calculator v.2.0 [89] was used to calculate the synonymous (Ks) and non-synonymous (Ka) substitution rates as well as Ka/Ks values.
Intracellular gene transfer analysis
The cp genome of A. membranaceus was de novo assembled using SPAdes v3.11.0 software [90], and the sequence was then annotated using the PGA software package [91], using the cp genome of Astragalus melilotoides Pall. (NCBI Reference Sequence: NC_072247.1) as the initial reference. The assembled complete cp genome sequence of A. membranaceus has been submitted to GenBank, and is openly available under the accession number OR525835.1. To identify the homologous fragments in the mt and cp genomes, we employed BLASTN software [80] with an E-value of 1 × 10⁻⁵ as the screening criterion. The potential transferred DNA fragments were then visualized using the Circos package v0.69 [92].
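Screening BLASTN hits by E-value and summing the mitogenome positions they cover can be sketched as follows. The example assumes BLAST tabular output with the twelve standard columns (as produced by `-outfmt 6`) and the mitogenome as the query; the function names and the sequence identifiers in the usage lines are hypothetical, and the hit lines are invented for illustration rather than taken from this study.

```python
def parse_blast_tabular(lines, max_evalue=1e-5, min_identity=0.0, min_length=0):
    """Parse 12-column BLAST tabular lines and keep hits passing the thresholds."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        identity = float(fields[2])   # percent identity
        length = int(fields[3])       # alignment length
        evalue = float(fields[10])
        if evalue <= max_evalue and identity >= min_identity and length >= min_length:
            hits.append({
                "query": fields[0],
                "subject": fields[1],
                "identity": identity,
                "length": length,
                "q_start": int(fields[6]),
                "q_end": int(fields[7]),
                "evalue": evalue,
            })
    return hits


def covered_length(hits):
    """Total query bases covered by hits, counting overlapping regions once."""
    by_query = {}
    for hit in hits:
        start, end = sorted((hit["q_start"], hit["q_end"]))
        by_query.setdefault(hit["query"], []).append((start, end))
    covered = 0
    for intervals in by_query.values():
        current_end = 0
        for start, end in sorted(intervals):
            start = max(start, current_end + 1)
            if end >= start:
                covered += end - start + 1
                current_end = end
    return covered


if __name__ == "__main__":
    example = [
        "mt_chr1\tcp\t98.5\t100\t1\t0\t101\t200\t5000\t5099\t1e-40\t170",
        "mt_chr1\tcp\t95.0\t60\t3\t0\t181\t240\t7000\t7059\t2e-20\t90",
        "mt_chr2\tcp\t80.0\t29\t5\t0\t10\t38\t900\t928\t0.5\t30",
    ]
    kept = parse_blast_tabular(example)
    print(len(kept), covered_length(kept))
```

Dividing the covered length by the total mitogenome length gives the proportion of transferred sequence; counting overlapping hits once avoids inflating that proportion.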
Synteny and phylogenetic inferences
To determine mtDNA shared between species, each pair of mitogenomes was searched using BLASTN with parameter settings of a word size of 7 and an E-value of 1 × 10⁻⁵. Genome synteny and rearrangements between A. membranaceus and closely related species were analyzed using Mauve v2.3.1 [93] with a cutoff value of 42. To understand the phylogenetic position of A. membranaceus in the Fabaceae family, the mitogenomes of closely related species were selected for constructing phylogenetic trees. A total of 33 species mitogenomes were downloaded from NCBI, including Suriana maritima L. (Surianaceae) as the outgroup (Table S8). The nucleic acid sequences of core orthologous genes among the analyzed species were extracted and concatenated using OrthoFinder v2.3.14 [87], and then aligned using MUSCLE v3.8.1551 [94]. The optimal evolutionary model of CpREV + I + G + F was selected using ProtTest v3.4.2 [95] based on the Bayesian Information Criterion (BIC). Subsequently, based on the matrix of concatenated sequences, the maximum likelihood (ML) phylogenetic tree was constructed using IQ-TREE v1.6.12 [96] with 1,000 bootstrap replicates. For the Bayesian inference (BI) tree, we used ModelFinder [97] to find the best-fit model as CpREV + F + I + G4. Then, the BI analysis was performed using MrBayes v3.2.6 [98] with the Markov Chain Monte Carlo method for 200,000 generations, sampling trees every 1,000 generations. The first 25% of trees were discarded as burn-in, and the remaining trees were used to generate a consensus tree. Finally, tree visualization was achieved using iTOL v6 (https://itol.embl.de/) [99].
Data availability
The assembled complete sequence of the A. membranaceus mitogenome has been deposited in GenBank (NCBI), under accession numbers OR567492.1 and OR567493.1. The associated BioProject, SRA and Bio-Sample numbers are PRJNA1014209, SRR25949529, SRR25949646, SAMN37322111, and SAMN37320107, respectively.
Abbreviations
- BI: Bayesian inference
- Cp: Chloroplast
- cpDNA: Chloroplast DNA
- ML: Maximum likelihood
- Mt: Mitochondria
- mtDNA: Mitochondrial DNA
- NCBI: National Center for Biotechnology Information
- PCGs: Protein-coding genes
- RSCU: Relative synonymous codon usage
- SSRs: Simple sequence repeats
References
Liberatore KL, Dukowic-Schulze S, Miller ME, Chen CB, Kianian SF. The role of mitochondria in plant development and stress tolerance. Free Radical Biol Med. 2016;100:238–56.
Timmis JN, Ayliffe MA, Huang CY, Martin W. Endosymbiotic gene transfer: organelle genomes forge eukaryotic chromosomes. Nat Rev Genet. 2004;5(2):123–35.
Wu ZQ, Sloan DB. Recombination and intraspecific polymorphism for the presence and absence of entire chromosomes in mitochondrial genomes. Heredity (Edinb). 2019;122(5):647–59.
Sloan DB, Alverson AJ, Chuckalovcak JP, Wu M, McCauley DE, Palmer JD, Taylor DR. Rapid evolution of enormous, multichromosomal genomes in flowering plant mitochondria with exceptionally high mutation rates. PLoS Biol. 2012;10(1):e1001241.
Kozik A, Rowan BA, Lavelle D, Berke L, Schranz ME, Michelmore RW, Christensen AC. The alternative reality of plant mitochondrial DNA: one ring does not rule them all. PLoS Genet. 2019;15(8):e1008373.
Wang L, Liu X, Xu YJ, Zhang ZW, Wei YS, Hu Y, Zheng CB, Qu XY. Assembly and comparative analysis of the first complete mitochondrial genome of a traditional Chinese medicine Angelica biserrata (Shan et Yuan) Yuan et Shan. Int J Biol Macromol. 2024;257(Pt 1):128571.
Skippington E, Barkman TJ, Rice DW, Palmer JD. Miniaturized mitogenome of the parasitic plant Viscum scurruloideum is extremely divergent and dynamic and has lost all nad genes. Proc Natl Acad Sci USA. 2015;112(27):E3515–24.
Putintseva YA, Bondar EI, Simonov EP, Sharov VV, Oreshkova NV, Kuzmin DA, Konstantinov YM, Shmakov VN, Belkov VI, Sadovsky MG, Keech O, Krutovsky KV. Siberian larch (Larix sibirica Ledeb.) mitochondrial genome assembled using both short and long nucleotide sequence reads is currently the largest known mitogenome. BMC Genomics. 2020;21(1):654.
Goremykin VV, Lockhart PJ, Viola R, Velasco R. The mitochondrial genome of Malus domestica and the import-driven hypothesis of mitochondrial genome expansion in seed plants. Plant J. 2012;71(4):615–26.
Donnelly K, Cottrell J, Ennos RA, Vendramin GG, A’Hara S, King S, Perry A, Wachowiak W, Cavers S. Reconstructing the plant mitochondrial genome for marker discovery: a case study using Pinus. Mol Ecol Resour. 2017;17(5):943–54.
Smith DR, Keeling PJ. Mitochondrial and plastid genome architecture: reoccurring themes, but significant differences at the extremes. Proc Natl Acad Sci USA. 2015;112(33):10177–84.
Yang HY, Ni Y, Zhang XY, Li JL, Chen HM, Liu C. The mitochondrial genomes of Panax notoginseng reveal recombination mediated by repeats associated with DNA replication. Int J Biol Macromol. 2023;252:126359.
Wang J, Kan SL, Liao XZ, Zhou JW, Tembrock LR, Daniell H, Jin SX, Wu ZQ. Plant organellar genomes: much done, much more to do. Trends Plant Sci. 2024;29(7):754–69.
Kubo T, Newton KJ. Angiosperm mitochondrial genomes and mutations. Mitochondrion. 2008;8(1):5–14.
Schnable PS, Wise RP. The molecular basis of cytoplasmic male sterility and fertility restoration. Trends Plant Sci. 1998;3(5):175–80.
Adams KL, Palmer JD. Evolution of mitochondrial gene content: gene loss and transfer to the nucleus. Mol Phylogenet Evol. 2003;29(3):380–95.
Drouin G, Daoud H, Xia JN. Relative rates of synonymous substitutions in the mitochondrial, chloroplast and nuclear genomes of seed plants. Mol Phylogenet Evol. 2008;49(3):827–31.
Sloan DB, Wu ZQ. Molecular evolution: the perplexing diversity of mitochondrial RNA editing systems. Curr Biol. 2016;26(1):R22–4.
Mower JP, Stefanović S, Young GJ, Palmer JD. Plant genetics: gene transfer from parasitic to host plants. Nature. 2004;432(7014):165–6.
Petersen G, Anderson B, Braun HP, Meyer EH, Møller IM. Mitochondria in parasitic plants. Mitochondrion. 2020;52:173–82.
Gualberto JM, Newton KJ. Plant mitochondrial genomes: dynamics and mechanisms of mutation. Annu Rev Plant Biol. 2017;68:225–52.
Van de Paer C, Bouchez O, Besnard G. Prospects on the evolutionary mitogenomics of plants: a case study on the olive family (Oleaceae). Mol Ecol Resour. 2018;18(3):407–23.
Bratkov VM, Shkondrov AM, Zdraveva PK, Krasteva IN. Flavonoids from the genus Astragalus: phytochemistry and biological activity. Pharmacogn Rev. 2016;10(19):11–32.
Tian CY, Li XS, Wu ZN, Li ZY, Hou XY, Li FY. Characterization and comparative analysis of complete chloroplast genomes of three species from the Genus Astragalus (Leguminosae). Front Genet. 2021;12:705482.
Guo ZZ, Lou YM, Kong MY, Luo Q, Liu ZQ, Wu JJ. A systematic review of phytochemistry, pharmacology and pharmacokinetics on Astragali radix: implications for Astragali radix as a personalized medicine. Int J Mol Sci. 2019;20(6):1463.
Auyeung KK, Han QB, Ko JK. Astragalus membranaceus: a review of its protection against inflammation and gastrointestinal cancers. Am J Chin Med. 2016;44(1):1–22.
Fu J, Wang ZH, Huang LF, Zheng SH, Wang DM, Chen SL, Zhang HT, Yang SH. Review of the botanical characteristics, phytochemistry, and pharmacology of Astragalus membranaceus (Huangqi). Phytother Res. 2014;28(9):1275–83.
Sheik A, Kim K, Varaprasad GL, Lee H, Kim S, Kim E, Shin JY, Oh SY, Huh YS. The anti-cancerous activity of adaptogenic herb Astragalus membranaceus. Phytomedicine. 2021;91:153698.
Lin LZ, He XG, Lindenmaier M, Nolan G, Yang J, Cleary M, Qiu SX, Cordell GA. Liquid chromatography-electrospray ionization mass spectrometry study of the flavonoids of the roots of Astragalus mongholicus and A. membranaceus. J Chromatogr A. 2000;876(1–2):87–95.
Antil S, Abraham JS, Sripoorna S, Maurya S, Dagar J, Makhija S, Bhagat P, Gupta R, Sood U, Lal R, Toteja R. DNA barcoding, an effective tool for species identification: a review. Mol Biol Rep. 2023;50(1):761–75.
Lei WJ, Ni DP, Wang YJ, Shao JJ, Wang XC, Yang D, Wang JS, Chen HM, Liu C. Intraspecific and heteroplasmic variations, gene losses and inversions in the chloroplast genome of Astragalus membranaceus. Sci Rep. 2016;6:21669.
Wang B, Chen HM, Ma HP, Zhang H, Lei WJ, Wu WW, Shao JJ, Jiang M, Zhang H, Jia ZP, Liu C. Complete plastid genome of Astragalus membranaceus (Fisch.) Bunge var. membranaceus. Mitochondrial DNA B Resour. 2016;1(1):517–9.
Chen Y, Fang T, Su H, Duan SF, Ma RR, Wang P, Wu L, Sun WB, Hu QC, Zhao MX, Sun LJ, Dong XH. A reference-grade genome assembly for Astragalus mongholicus and insights into the biosynthesis and high accumulation of triterpenoids and flavonoids in its roots. Plant Commun. 2023;4(2):100469.
Feng YL, Xiang XG, Akhter D, Pan RH, Fu ZX, Jin XH. Mitochondrial phylogenomics of Fagales provides insights into plant mitogenome mosaic evolution. Front Plant Sci. 2021;12:762195.
Millar AH, Whelan J, Soole KL, Day DA. Organization and regulation of mitochondrial respiration in plants. Annu Rev Plant Biol. 2011;62:79–104.
Yu XL, Duan ZG, Wang YJ, Zhang QX, Li W. Sequence analysis of the complete mitochondrial genome of a medicinal plant, Vitex rotundifolia Linnaeus f. (Lamiales: Lamiaceae). Genes. 2022;13(5):839.
Li J, Tang H, Luo H, Tang J, Zhong N, Xiao LZ. Complete mitochondrial genome assembly and comparison of Camellia sinensis var. assamica cv. Duntsa. Front Plant Sci. 2023;14:1117002.
Wu ZQ, Liao XZ, Zhang XN, Tembrock LR, Broz A. Genomic architectural variation of plant mitochondria-a review of multichromosomal structuring. J Syst Evol. 2020;60(1):160–8.
Ma QY, Wang YX, Li SS, Wen J, Zhu L, Yan KY, Du YM, Ren J, Li SX, Chen Z, Bi CW, Li QZ. Assembly and comparative analysis of the first complete mitochondrial genome of Acer truncatum Bunge: a woody oil-tree species producing nervonic acid. BMC Plant Biol. 2022;22(1):29.
Bi CW, Lu N, Xu YQ, He CP, Lu ZH. Characterization and analysis of the mitochondrial genome of common bean (Phaseolus vulgaris) by comparative genomic approaches. Int J Mol Sci. 2020;21(11):3778.
He YF, Liu WY, Wang JL. Assembly and comparative analysis of the complete mitochondrial genome of Trigonella foenum-graecum L. BMC Genomics. 2023;24(1):756.
Bi CW, Wang XL, Xu YQ, Wei SY, Shi Y, Dai XG, Yin TM, Ye N. The complete mitochondrial genome of Medicago truncatula. Mitochondrial DNA B Resour. 2016;1(1):122–3.
Ke SJ, Liu DK, Tu XD, He X, Zhang MM, Zhu MJ, Zhang DY, Zhang CL, Lan SR, Liu ZJ. Apostasia mitochondrial genome analysis and monocot mitochondria phylogenomics. Int J Mol Sci. 2023;24(9):7837.
Mackenzie S, He S, Lyznik A. The elusive plant mitochondrion as a genetic system. Plant Physiol. 1994;105(3):775–80.
Guo WH, Zhu AD, Fan WS, Mower JP. Complete mitochondrial genomes from the ferns Ophioglossum californicum and Psilotum nudum are highly repetitive with the largest organellar introns. New Phytol. 2017;213(1):391–403.
Zhang K, Wang YH, Zhang X, Han ZP, Shan XF. Deciphering the mitochondrial genome of Hemerocallis citrina (Asphodelaceae) using a combined assembly and comparative genomic strategy. Front Plant Sci. 2022;13:1051221.
Shi YC, Liu Y, Zhang SZ, Zou R, Tang JM, Mu WX, Peng Y, Dong SS. Assembly and comparative analysis of the complete mitochondrial genome sequence of Sophora japonica ‘JinhuaiJ2’. PLoS ONE. 2018;13(8):e0202485.
Sloan DB, Wu ZQ. History of plastid DNA insertions reveals weak deletion and at mutation biases in angiosperm mitochondrial genomes. Genome Biol Evol. 2014;6(12):3210–21.
Sun T, Bentolila S, Hanson MR. The unexpected diversity of plant organelle RNA editosomes. Trends Plant Sci. 2016;21(11):962–73.
Chen HM, Huang LF, Yu J, Miao YJ, Liu C. Mitochondrial genome of Artemisia argyi L. suggested conserved mitochondrial protein-coding genes among genera Artemisia, Tanacetum and Chrysanthemum. Gene. 2023;871:147427.
Jiang M, Ni Y, Li JL, Liu C. Characterisation of the complete mitochondrial genome of Taraxacum mongolicum revealed five repeat-mediated recombinations. Plant Cell Rep. 2023;42(4):775–89.
Guo WH, Grewe F, Mower JP. Variable frequency of plastid RNA editing among ferns and repeated loss of uridine-to-cytidine editing from vascular plants. PLoS ONE. 2015;10(1):e0117075.
Wu B, Chen HM, Shao JJ, Zhang H, Wu K, Liu C. Identification of symmetrical RNA editing events in the mitochondria of Salvia miltiorrhiza by strand-specific RNA sequencing. Sci Rep. 2017;7:42250.
Alverson AJ, Wei XX, Rice DW, Stern DB, Barry K, Palmer JD. Insights into the evolution of mitochondrial genome size from complete sequences of Citrullus lanatus and Cucurbita pepo (Cucurbitaceae). Mol Biol Evol. 2010;27(6):1436–48.
Li J, Zhang Z, Vang S, Yu J, Wong GK, Wang J. Correlation between Ka/Ks and Ks is related to substitution model and evolutionary lineage. J Mol Evol. 2009;68(4):414–23.
Zhang Z, Li J, Zhao XQ, Wang J, Wong GK, Yu J. KaKs_Calculator: calculating ka and ks through model selection and model averaging. Genom Proteom Bioinf. 2006;4(4):259–63.
ZFR, KWJ, LYR, CXY, ZT, XBJ, QLM, YZY, MYT. Comprehensive analysis of the complete mitochondrial genomes of three Coptis species (C. chinensis, C. deltoidea and C. omeiensis): the important medicinal plants in China. Front Plant Sci. 2023;14:1166420.
Giegé P, Grienenberger JM, Bonnard G. Cytochrome c biogenesis in mitochondria. Mitochondrion. 2008;8(1):61–73.
Verrier PJ, Bird D, Burla B, Dassa E, Forestier C, Geisler M, Klein M, Kolukisaoglu U, Lee Y, Martinoia E, Murphy A, Rea PA, Samuels L, Schulz B, Spalding EJ, Yazaki K, Theodoulou FL. Plant ABC proteins–a unified nomenclature and updated inventory. Trends Plant Sci. 2008;13(4):151–9.
Zhang X, Shan YY, Li JL, Qin QL, Yu J, Deng HP. Assembly of the complete mitochondrial genome of Pereskia aculeata revealed that two pairs of repetitive elements mediated the recombination of the genome. Int J Mol Sci. 2023;24(9):8366.
Palmer JD, Adams KL, Cho Y, Parkinson CL, Qiu YL, Song K. Dynamic evolution of plant mitochondrial genomes: mobile genes and introns and highly variable mutation rates. Proc Natl Acad Sci USA. 2000;97(13):6960–6.
Liao XF, Zhao YH, Kong XJ, Khan A, Zhou BJ, Liu DM, Kashif MH, Chen P, Wang H, Zhou RY. Complete sequence of kenaf (Hibiscus cannabinus) mitochondrial genome and comparative analysis with the mitochondrial genomes of other plants. Sci Rep. 2018;8(1):12714.
Siqueira SF, Dias SM, Hardouin P, Pereira FR, Lejeune B, de Souza AP. Transcription of succinate dehydrogenase subunit 4 (sdh4) gene in potato: detection of extensive RNA editing and co-transcription with cytochrome oxidase subunit III (cox3) gene. Curr Genet. 2002;41(4):282–9.
Adams KL, Qiu YL, Stoutemyer M, Palmer JD. Punctuated evolution of mitochondrial gene content: high and variable rates of mitochondrial gene loss and transfer to the nucleus during angiosperm evolution. Proc Natl Acad Sci USA. 2002;99(15):9905–12.
Wang X, Zhang RG, Yun QZ, Xu YY, Zhao GC, Liu JM, Shi SL, Chen Z, Jia LM. Comprehensive analysis of complete mitochondrial genome of Sapindus mukorossi Gaertn.: an important industrial oil tree species in China. Ind Crop Prod. 2021;174:114210.
Wu ZH, Yang TG, Qin R, Liu H. Complete mitogenome and phylogenetic analysis of the Carthamus tinctorius L. Genes (Basel). 2023;14(5):979.
Porebski S, Bailey LG, Baum BR. Modification of a CTAB DNA extraction protocol for plants containing high polysaccharide and polyphenol components. Plant Mol Biol Rep. 1997;15(1):8–15.
De Coster W, D’Hert S, Schultz DT, Cruts M, Van Broeckhoven C. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics. 2018;34(15):2666–9.
Patel RK, Jain M. NGS QC toolkit: a toolkit for quality control of next generation sequencing data. PLoS ONE. 2012;7(2):e30619.
Li H. Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences. Bioinformatics. 2016;32(14):2103–10.
Wick RR, Judd LM, Gorrie CL, Holt KE. Completing bacterial genome assemblies with multiplex MinION sequencing. Microb Genom. 2017;3(10):e000132.
Vaser R, Sović I, Nagarajan N, Šikić M. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 2017;27(5):737–46.
Wick RR, Schultz MB, Zobel J, Holt KE. Bandage: interactive visualization of de novo genome assemblies. Bioinformatics. 2015;31(20):3350–2.
Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34(18):3094–100.
Kolmogorov M, Yuan J, Lin Y, Pevzner PA. Assembly of long, error-prone reads using repeat graphs. Nat Biotechnol. 2019;37(5):540–6.
Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017;27(5):722–36.
Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, Cuomo CA, Zeng QD, Wortman J, Young SK, Earl AM. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE. 2014;9(11):e112963.
Gautheret D, Lambert A. Direct RNA motif definition and identification from multiple sequence alignments using secondary structure profiles. J Mol Biol. 2001;313(5):1003–11.
Schattner P, Brooks AN, Lowe TM. The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucleic Acids Res. 2005;33:W686–9.
Chen Y, Ye WC, Zhang YD, Xu YS. High speed BLASTN: an accelerated MegaBLAST search tool. Nucleic Acids Res. 2015;43(16):7762–8.
Greiner S, Lehwark P, Bock R. OrganellarGenomeDRAW (OGDRAW) version 1.3.1: expanded toolkit for the graphical visualization of organellar genomes. Nucleic Acids Res. 2019;47(W1):W59–64.
Zhang D, Gao FL, Jakovlić I, Zou H, Zhang J, Li WX, Wang GT. PhyloSuite: an integrated and scalable desktop platform for streamlined molecular sequence data management and evolutionary phylogenetics studies. Mol Ecol Resour. 2020;20(1):348–55.
Mower JP. The PREP suite: predictive RNA editors for plant mitochondrial genes, chloroplast genes and user-defined alignments. Nucleic Acids Res. 2009;37:W253–9.
Beier S, Thiel T, Münch T, Scholz U, Mascher M. MISA-web: a web server for microsatellite prediction. Bioinformatics. 2017;33(16):2583–5.
Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999;27(2):573–80.
Kurtz S, Choudhuri JV, Ohlebusch E, Schleiermacher C, Stoye J, Giegerich R. REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 2001;29(22):4633–42.
Emms DM, Kelly S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 2019;20(1):238.
Zhang Z, Xiao JF, Wu JY, Zhang HY, Liu GM, Wang XM, Dai L. ParaAT: a parallel tool for constructing multiple protein-coding DNA alignments. Biochem Biophys Res Commun. 2012;419(4):779–81.
Wang DP, Zhang YB, Zhang Z, Zhu J, Yu J. KaKs_Calculator 2.0: a toolkit incorporating gamma-series methods and sliding window strategies. Genom Proteom Bioinf. 2010;8(1):77–80.
Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol. 2012;19(5):455–77.
Qu XJ, Moore MJ, Li DZ, Yi TS. PGA: a software package for rapid, accurate, and flexible batch annotation of plastomes. Plant Methods. 2019;15:50.
Zhang HG, Meltzer P, Davis S. RCircos: an R package for Circos 2D track plots. BMC Bioinformatics. 2013;14:244.
Darling AE, Mau B, Perna NT. progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS ONE. 2010;5(6):e11147.
Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32(5):1792–7.
Darriba D, Taboada GL, Doallo R, Posada D. ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics. 2011;27(8):1164–5.
Nguyen LT, Schmidt HA, von Haeseler A, Minh BQ. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 2015;32(1):268–74.
Kalyaanamoorthy S, Minh BQ, Wong TKF, von Haeseler A, Jermiin LS. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat Methods. 2017;14(6):587–9.
Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 2012;61(3):539–42.
Letunic I, Bork P. Interactive tree of life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 2021;49(W1):W293–6.
Acknowledgements
We thank the editor and the anonymous reviewers for their insightful comments and suggestions on the manuscript. The authors thank Shenzhen Huitong Biotechnology Co., Ltd., China for genome sequencing and MDPI for English language editing.
Funding
This work was funded by the Scientific Research Project of Shanxi Datong University, China (Grant No. 2022CXY22) and the Provincial Natural Science Foundation of Liaoning, China (Grant No. 2023-BSBA-286).
Author information
Contributions
K.Z. conceived the idea and wrote the manuscript. K.Z. and G.Q. designed the experiment and performed data analysis. G.Q. and Y.Z. contributed to editing and revising the manuscript. J.L. provided the material and supervised the research. All authors have read and agreed to the final version of this manuscript.
Ethics declarations
Ethics approval and consent to participate
The plant materials used in this study were collected on public land with permission. The collection and use of plant materials comply with relevant institutional, national, and international guidelines and legislation. Because A. membranaceus is not an endangered wild plant, its collection does not require specific permits. This article does not contain any studies with human participants or animals and does not involve any endangered or protected species.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zhang, K., Qu, G., Zhang, Y. et al. Assembly and comparative analysis of the first complete mitochondrial genome of Astragalus membranaceus (Fisch.) Bunge: an invaluable traditional Chinese medicine. BMC Plant Biol 24, 1055 (2024). https://doi.org/10.1186/s12870-024-05780-4
DOI: https://doi.org/10.1186/s12870-024-05780-4







