Assembly and comparative analysis of the complete mitochondrial genome of Ilex metabaptista (Aquifoliaceae), a Chinese endemic species with a narrow distribution

Background Ilex metabaptista is a woody tree species with strong waterlogging tolerance and is also admired as a landscape plant with high development prospects and scientific research value. Unfortunately, populations of this species have declined due to habitat loss. Thus, it is a great challenge for us to efficiently protect I. metabaptista resources from extinction. Molecular biology research can provide the scientific basis for the conservation of species. However, the study of I. metabaptista genetics is still in its infancy. To date, no mitochondrial genome (mitogenome) in the genus Ilex has been analysed in detail. Results The mitogenome of I. metabaptista was assembled based on the reads from Illumina and Nanopore sequencing platforms; it was a typical circular DNA molecule of 529,560 bp with a GC content of 45.61% and contained 67 genes, including 42 protein-coding genes, 22 tRNA genes, and 3 rRNA genes. Repeat sequence analysis and prediction of RNA editing sites revealed a total of 286 dispersed repeats, 140 simple repeats, 18 tandem repeats, and 543 RNA editing sites. Analysis of codon usage showed that codons ending in A/T were preferred. Gene migration was observed to occur between the mitogenome and chloroplast genome via the detection of homologous fragments. In addition, Ka/Ks analysis revealed that most of the protein-coding genes in the mitogenome had undergone negative selection, and only the ccmB gene had undergone potential positive selection in most asterids. Nucleotide polymorphism analysis revealed the variation in each gene, with atp9 being the most notable. Furthermore, comparative analysis showed that the GC contents were conserved, but the sizes and structure of mitogenomes varied greatly among asterids. Phylogenetic analysis based on the mitogenomes reflected the exact evolutionary and taxonomic status of I. metabaptista. Conclusion In this study, we sequenced and annotated the mitogenome of I. metabaptista and compared it with the mitogenomes of other asterids, which provided essential background information for further understanding of the genetics of this plant and helped lay the foundation for future studies on molecular breeding of I. metabaptista.
Supplementary Information The online version contains supplementary material available at 10.1186/s12870-023-04377-7.


Background
Ilex L. (holly), from the monogeneric family Aquifoliaceae, is one of the largest woody dioecious angiosperm genera, and it contains approximately 600 species widely distributed from the tropics to temperate regions [1]. As an evergreen shrub, I. metabaptista Loes. ex Diels grows on river beaches at altitudes of 300-1200 m and is only found in Chongqing, Guangxi, Guizhou, Hubei, Hunan and Sichuan in China [2]. It displays a strong waterlogging tolerance capacity and has high horticultural value [3]. As a valuable endemic species with small populations, it is regarded as a natural resource with potential economic and ecological importance. Unfortunately, populations of this species have declined due to the continuing loss of habitat area and extent [4]. Thus, it is a great challenge to efficiently protect I. metabaptista resources from extinction. The investigation of the molecular diversity and evolution of this species will help establish more effective conservation countermeasures for the future [5]. However, there has been little progress in the industrial development of I. metabaptista for a long time due to a lack of genomic resources and unclear genetic relationships.
Mitochondria and chloroplasts are organelles with a semiautonomous genetic system in higher plant cells, and they carry relevant genetic information [6,7]. The nuclear genomes carry the overwhelming majority of information, but the chloroplast and mitochondrial genomes are nonetheless also indispensable in eukaryotes [8]. The plant mitogenomes have undergone rapid and tremendous structural changes since the initial endosymbiotic event [9-11]. Thus, the mitogenomes of plants are approximately 100-10,000 times larger and more structurally complex than those of animals [12]. The mitogenomes of land plants demonstrate large genome size variation, ranging from 66 kb in Viscum scurruloideum [13] to 11.7 Mb in Larix sibirica [14], which can be attributed to the frequent recombination of repetitive sequences and incorporation of foreign sequences via intracellular or horizontal transfer [9,15]. The number of genes in land plant mitogenomes varies widely, typically between 32 and 67 [16,17]; however, the functional genes exhibit substantial conservation [9,11]. Additionally, structural complexity is another important feature of plant mitogenomes. Although plant mitogenomes have low mutation rates when compared to plastid (3-5 times lower) and nuclear genomes (10-20 times lower), the structures and gene orders are highly variable in plants [17-20].
Mitogenomes are a powerful tool for studying the origin of species, genetic diversity, and phylogenetics [12,21]. However, plant mitogenome studies remain comparatively challenging: plant mitochondria are difficult to purify because preparations are often contaminated by chloroplasts and other plastids [15], and their genomes are difficult to assemble due to their complex structure [16,22]. To date, more than 5000 plant chloroplast genomes have been sequenced, but only approximately 400 plant mitogenomes have been published in the NCBI database [12]. In addition, the sequenced species are taxonomically uneven, with a strong bias towards crops [23], and only one complete mitogenome of a species from the order Aquifoliales has been reported [24]. Plant mitogenomes vary greatly in genome structure and content, nucleotide substitution rates, and repeat recombination levels [18,25]. These variations in mitogenomes are observed not only between plant species but also within the same species [12,26], in stark contrast to the conserved structure of plant chloroplast genomes [22]. Thus, the mitogenome is a valuable source of genetic information for the study of plant phylogeny and essential cellular processes [6]. Furthermore, the mitogenome is widely used in evolutionary analysis and interspecies discrimination studies, especially for the construction of ancient phylogenetic relationships and those among close species, because its genetic system is typically inherited maternally, relatively independent of the nucleus and relatively conserved [15,27-29].
To date, the complete chloroplast genome sequences of a total of 55 Ilex species have been made available in the NCBI GenBank database (accessed on 4 May 2023), and nuclear genome sequencing has been performed in I. latifolia [30], I. asprella [31], and I. polyneura [32]. However, apart from the mitogenome sequence of I. pubescens released in 2019 [24], no mitogenome in the genus Ilex has been analysed in detail, which might greatly hinder a deep understanding of the evolution of mitogenomes in this large family. The complete chloroplast genome of I. metabaptista has already been assembled (GenBank Accession number: NC_069021.1); however, no report on the mitochondrial and nuclear genomes of this species has yet been published.
Therefore, in this study, the I. metabaptista mitogenome was sequenced and annotated for the first time. In addition, we conducted a comprehensive analysis with regard to genomic characteristics, repetitive sequences, RNA editing, codon preference, migration sequences and comparative genomics with other asterids and performed a phylogenetic analysis. These results will help better understand the structure and function of the I. metabaptista mitogenome and provide useful molecular markers for conservation biology, population genetics, and evolutionary studies on this species.

Sequencing and genomic features of the I. metabaptista mitogenome
The total DNA of I. metabaptista was sequenced, yielding 12.45 G of Illumina sequencing data and 14.41 G of Nanopore PromethION sequencing data with an average read length of 8,863 bp (Table S1). We then assembled the complete mitogenome of I. metabaptista, which was a circular sequence with a length of 529,560 bp. The functional classifications and physical locations of the annotated genes are shown in Fig. 1. In the I. metabaptista mitogenome, 67 genes, including 42 protein-coding genes (PCGs), 22 tRNA genes, and 3 rRNA genes, were annotated. Additionally, 3,122 open reading frames (ORFs) were identified.
The length of all PCGs was 33,123 bp, accounting for only 6.25% of the total mitogenome length. There were 55 genes with no introns, accounting for 82.09% of the total. In addition, 26 introns were found in the other 12 I. metabaptista mitochondrial genes; nad1, nad2, nad5, and nad7 each had 4 introns, and nad4 had 3 introns.
The nucleotide composition of the whole mitogenome (Table 2) was A (27.27%), T (27.12%), C (22.70%), and G (22.91%). The entire mitogenome had a GC content of 45.61%, with GC contents of 43.18% in PCGs, 51.83% in rRNAs, and 50.82% in tRNAs. Strikingly, the GC content of the PCGs was lower than that of the other gene classes (tRNAs and rRNAs). The GC skew was positive in CDS regions and in the mitogenome.
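The composition statistics above follow from simple base counts. The sketch below is illustrative only: the function names are hypothetical, and GC skew is taken as (G - C)/(G + C), which is the usual definition but is an assumption here because the text does not state the formula used. Plugging in the reported percentages reproduces the 45.61% GC content and a small positive skew.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases of a sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ATCG")
    if total == 0:
        raise ValueError("sequence contains no A/T/C/G bases")
    return 100.0 * (counts["G"] + counts["C"]) / total


def gc_skew(seq: str) -> float:
    """GC skew, assumed here to be (G - C) / (G + C)."""
    counts = Counter(seq.upper())
    g, c = counts["G"], counts["C"]
    if g + c == 0:
        raise ValueError("sequence contains no G/C bases")
    return (g - c) / (g + c)


def skew_from_percent(g_percent: float, c_percent: float) -> float:
    """Same skew formula, applied to reported base percentages."""
    return (g_percent - c_percent) / (g_percent + c_percent)


# Percentages reported for the whole mitogenome (Table 2).
reported = {"A": 27.27, "T": 27.12, "C": 22.70, "G": 22.91}
reported_gc = reported["G"] + reported["C"]
reported_skew = skew_from_percent(reported["G"], reported["C"])
```

With the reported values, the skew is positive but close to zero, which is consistent with the statement that G slightly exceeds C in the mitogenome.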

Repeat sequence analysis
Repeat sequences are abundant in the plant mitogenome, including simple sequence repeats (SSRs), tandem repeats and dispersed repeats [10,16]. Different types of repeat sequences found in I. metabaptista are shown in Fig. 2. Dispersed repeats are repetitive sequences that are scattered throughout the genome [21]. In the I. metabaptista mitogenome, a total of 286 dispersed repeats were identified with a length greater than or equal to 29 bp; of these, 144 were forward repeats and 142 were palindromic repeats. The lengths of the longest forward repeat and the longest palindromic repeat were 810 and 413 bp, respectively. The total length of the dispersed repeats was 19,931 bp, accounting for 3.76% of the total length of the mitogenome. The abundance of both types of repeats was the highest when repeats were in the range of 30-39 bp (Fig. 3).
SSRs are DNA fragments composed of tandemly repeated motifs 1-6 bp in length, and they are widely used in species research due to their advantages, which include polymorphism, codominant inheritance, relative abundance, and wide genome coverage [16]. As shown in Table 3, we identified 140 SSRs in the I. metabaptista mitogenome, and the detected SSR sites included monomer, dimer, trimer, tetramer, and pentamer repeats. Tetramer repeats were the most abundant SSR type, constituting 42.14% of the total identified SSRs, followed by dimer and hexamer repeats, which accounted for 23.57% and 21.43% of the total, respectively.
Tandem repeats, also known as satellite DNAs, are core repeating units of 1-200 bases repeated several times in tandem and are widely present in eukaryotic and some prokaryotic genomes [34]. As shown in Table 4, a total of 18 tandem repeats ranging in length from 9 to 39 bp that had a match degree greater than 81% were found in the genome.
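An SSR search of the kind described above can be sketched with regular expressions. This is a minimal illustration, not the tool used in this study: the function name and the minimum repeat thresholds (10, 5, 4, 3, 3 and 3 copies for motifs of 1 to 6 bp) are assumptions chosen for the example, since the thresholds are not given in this section.

```python
import re

# Assumed minimum copy numbers per motif length (hypothetical thresholds).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, n_min in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least n_min - 1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, n_min - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are just a shorter motif repeated (e.g. "AA").
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


example = "GG" + "A" * 12 + "CC" + "AT" * 6 + "GG" + "AGCT" * 3 + "T"
example_hits = find_ssrs(example)
```

Start positions are 0-based. Only perfect repeats are reported; imperfect or compound SSRs would need additional handling.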

Prediction of RNA editing sites
In all eukaryotes, the addition, loss, or substitution of bases in the coding region of the transcribed RNA is called RNA editing [28]. In this study, a total of 543 RNA editing sites were predicted within 39 PCGs of the I. metabaptista mitogenome (Table 5). The RNA editing sites were unevenly distributed among different genes, ranging from 2 (rps14, rps7, and sdh3) to 39 (nad4) (Fig. 4). After RNA editing, 43.09% of amino acids were predicted to remain unchanged in hydrophobicity, 8.47% to change from hydrophobic to hydrophilic, and 47.51% to change from hydrophilic to hydrophobic.
Fig. 2 Distribution of repetitive sequences in the I. metabaptista mitogenome. The outermost circle shows the SSRs, followed by the tandem repeats, and the innermost connecting lines show the dispersed repeats
There were only 30 codon transfer types, corresponding to 14 amino acid transfer types. Among all codon transfer types, TCA => TTA was the most common, with 84 sites. The predictions also showed that edited codons most often encoded leucine after RNA editing; 46.22% (251 sites) of amino acids were converted to leucine. All RNA-editing sites in the I. metabaptista mitogenome were of the C-to-T type; among these, 30.57% (166) of the editing sites were located on the first base of the triplet codon, and 65.75% (357) of the editing sites were located on the second base of the triplet codon. There were two particular editing cases in which both the first and second bases of the triplet codon were edited, resulting in the conversion of proline (CCC, CCT) to phenylalanine (TTC, TTT). However, no editing occurred at the third position of the triplet codons. In addition, 0.92% of the amino acids were edited into a stop codon (TAG, TGA).
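The codon-level consequences of C-to-T editing described above can be worked through directly. The sketch below is illustrative and assumes the standard genetic code; the helper name apply_c_to_t is hypothetical. It reproduces the cases named in the text: TCA => TTA (serine to leucine), the double edit of CCC to TTC (proline to phenylalanine), and edits that create TAG or TGA stop codons.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def apply_c_to_t(codon, positions):
    """Edit C to T at the given 1-based codon positions.

    Returns (edited_codon, original_amino_acid, edited_amino_acid),
    with '*' standing for a stop codon.
    """
    codon = codon.upper()
    bases = list(codon)
    for pos in positions:
        if bases[pos - 1] != "C":
            raise ValueError("position %d of %s is not a C" % (pos, codon))
        bases[pos - 1] = "T"
    edited = "".join(bases)
    return edited, CODON_TABLE[codon], CODON_TABLE[edited]


most_common = apply_c_to_t("TCA", [2])     # second-position edit
double_edit = apply_c_to_t("CCC", [1, 2])  # first and second positions
new_stop = apply_c_to_t("CAG", [1])        # edit creating a stop codon
```

Because second-position edits such as TCA => TTA and TCG => TTG turn serine codons into leucine codons, the leucine bias and the predominance of second-position edits reported above are related observations.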

Analysis of codon usage
We analysed the codon composition of the I. metabaptista mitogenome (Table 6). The number of codons in all coding genes was 11,041, and the GC1, GC2, and GC3 contents and the average GC content of the 3 bases (all GC) were less than 50%, indicating that the codons of the I. metabaptista mitogenome were biased towards A and T bases. The effective codon number (Nc) was 53.24, which indicated that the codon preference of the I. metabaptista mitogenome was weak [21]. As shown in Table 1, most PCGs used ATG as the start codon, whereas nad4L and rps10 used ACG as the start codon, presumably a consequence of alteration by RNA editing [16], and rps4 used TTG as the start codon. The utilization rates of the TAA, TGA, and TAG stop codons were 56.41, 33.33, and 10.26%, respectively; thus, TAA was the most frequently used stop codon.
The codon usage bias in the I. metabaptista mitogenome was measured by calculating the relative synonymous codon usage (RSCU) (Table S2). An RSCU of 1 indicates that codon usage is unbiased, an RSCU < 1 indicates that the codon is used less frequently than its synonymous codons, and an RSCU > 1 indicates that it is used more frequently than its synonymous codons [21]. As shown in Fig. 5, there were 30 codons with RSCU > 1, indicating that the usage frequency of these codons was greater than that of other synonymous codons. Among these, 27 codons ended with an A or T base, accounting for 90.00% of the codons with RSCU > 1.
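The RSCU definition above can be made concrete with a short calculation: for each amino acid, the observed count of a codon is divided by the count expected if all of its synonymous codons were used equally. The sketch below is a minimal illustration assuming the standard genetic code; the function names are hypothetical, stop codons are left out by choice, and the alanine counts are invented for the example rather than taken from Table S2.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_codons(cds):
    """Count in-frame codons of a coding sequence (trailing bases ignored)."""
    cds = cds.upper()
    return Counter(cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3))


def rscu(counts):
    """Return {codon: RSCU} for sense codons of amino acids that occur."""
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # stop codons are excluded in this sketch
            families[aa].append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid absent from the data
        expected = total / len(codons)
        for c in codons:
            values[c] = counts.get(c, 0) / expected
    return values


# Invented counts for the four alanine codons (illustration only).
ala_counts = {"GCT": 6, "GCC": 2, "GCA": 3, "GCG": 1}
ala_rscu = rscu(ala_counts)
```

In this toy example GCT has RSCU = 2 (preferred), GCA has RSCU = 1 (unbiased), and GCC and GCG fall below 1. The RSCU values within one amino acid family always sum to the number of synonymous codons.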

Phylogenetic analysis
To understand the evolutionary status of the I. metabaptista mitogenome, phylogenetic analysis was performed on the I. metabaptista mitogenome together with the published mitogenomes of 29 other plants, including 28 asterids and Spinacia oleracea (designated as the outgroup). A phylogenetic tree was obtained based on these species, as shown in Fig. 7. As an outgroup, S. oleracea was distinct from the asterids. The taxa of all 7 studied orders (Ericales, Gentianales, Solanales, Lamiales, Aquifoliales, Asterales and Apiales) were well clustered. Moreover, the phylogenetic tree strongly supported the separation of campanulids from lamiids and the separation of the basal groups from campanulids and lamiids. In addition, the target species I. metabaptista and I. pubescens, which both belong to the genus Ilex in the Aquifoliaceae family, were clustered into a narrow branch with a high bootstrap support value (100%) and formed a sister cluster with the clade of Asterales and Apiales with a high bootstrap support value of 99% (Fig. 7). Consistent with the APG IV taxonomic tree [35], this study also found that Aquifoliales was placed at the base of the campanulids. In general, the clustering in the phylogenetic tree is consistent with the relationships of these species at the order level, indicating that the mitogenome-based clustering results are reliable. Based on the phylogenetic relationships among the 30 species, different groups of plants were selected for further comparative analysis.

Substitution rates of protein-coding genes
To evaluate the selective pressures acting on PCGs during evolution among closely related species, the ratio of nonsynonymous (Ka) to synonymous (Ks) substitution rates (Ka/Ks) was calculated. Under neutral evolution, Ka = Ks and Ka/Ks = 1. A Ka value higher than the Ks value (Ka/Ks > 1) is indicative of positive selection, while Ka < Ks (Ka/Ks < 1) is indicative of negative selection [36,37]. The 39 PCGs from the I. metabaptista mitogenome were compared with those of 7 other asterids for Ka/Ks calculation. As shown in Fig. 8, for the gene-specific substitution rates, Ka/Ks ranged from 0.024 at the atp9 gene to 5.684 at the atp9 gene. The ccmB gene exhibited the highest average Ka/Ks value (1.112), which was higher than 1, suggesting that positive selection occurred during evolution. However, the Ka/Ks values of most genes were less than 1 in most species, suggesting that they had undergone negative selection during evolution. The atp1 gene had the smallest average Ka/Ks value (0.185), less than 1.0 in all species, indicating strong purifying selection and high conservation during the evolutionary process in asterid plants [38].
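The interpretation rule stated above can be written as a small helper. This is only a sketch of the decision rule, not of how Ka and Ks themselves are estimated; the function name and the returned labels are hypothetical, and the Ka and Ks inputs in the example are invented values chosen to give the average ratios reported for ccmB (1.112) and atp1 (0.185).

```python
def classify_selection(ka, ks):
    """Label a gene comparison by its Ka/Ks ratio.

    Returns 'undefined' when Ks is zero, because the ratio cannot be formed.
    """
    if ks <= 0:
        return "undefined"
    ratio = ka / ks
    if ratio > 1:
        return "positive"
    if ratio < 1:
        return "negative"
    return "neutral"


# Invented Ka and Ks values that give the reported average ratios.
ccmb_label = classify_selection(0.1112, 0.1)  # Ka/Ks = 1.112
atp1_label = classify_selection(0.0185, 0.1)  # Ka/Ks = 0.185
```

In practice, ratios close to 1 are usually interpreted with caution, and pairwise comparisons with Ks = 0 are excluded rather than classified.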

Nucleotide diversity
Nucleotide diversity (Pi) can be used to evaluate the variation in nucleic acid sequences of different species, and regions with higher variability can be selected as potential molecular markers for population genetics [39]. The nucleotide diversity of the 39 PCGs and 3 rRNA genes among the eight asterids is shown in Fig. 9. The Pi values of 42 genes ranged from 0.026 to 0.114, and most of the Pi values were lower than 0.1. Among the PCGs, atp9 (Pi = 0.114) displayed the highest variability, and sdh3 (Pi = 0.066) and cox2 (Pi = 0.061) were also highly variable. In contrast, the most conserved PCGs were nad2 (Pi = 0.017) and nad7 (Pi = 0.017). Moreover, three rRNA genes were all conserved, with values of 0.0102 in rrn5, 0.012 in rrn26 and 0.015 in rrn18. Overall, the nucleotide diversity of the PCGs was highly variable among the eight asterids.
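Nucleotide diversity for one aligned gene can be illustrated as the average proportion of differing sites over all pairs of sequences. The sketch below assumes equal-length, gap-free aligned sequences and a hypothetical function name; it does not reproduce the software or settings used in this study, and the three short sequences are invented for the example.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Mean pairwise proportion of differing sites in an alignment."""
    if len(aligned) < 2:
        raise ValueError("at least two sequences are required")
    length = len(aligned[0])
    if length == 0 or any(len(s) != length for s in aligned):
        raise ValueError("sequences must be non-empty and of equal length")
    per_pair = []
    for a, b in combinations(aligned, 2):
        differences = sum(1 for x, y in zip(a, b) if x != y)
        per_pair.append(differences / length)
    return sum(per_pair) / len(per_pair)


# Invented toy alignment: 3 sequences, 10 sites.
toy_alignment = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
toy_pi = nucleotide_diversity(toy_alignment)
```

Here the three pairs differ at 1, 1 and 2 of 10 sites, so Pi = (0.1 + 0.1 + 0.2) / 3, roughly 0.133. Real alignments contain gaps and ambiguous bases, which this sketch does not handle.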

Comparison of mitogenome sizes and GC contents with those of other asterids
The size and GC content of the I. metabaptista mitogenome were compared with those of 28 other published asterid mitogenomes (Table S3). As shown in Fig. 10, the genome sizes of the selected asterids varied greatly, ranging from 211,002 bp (Chrysanthemum boreale) to 1,249,593 bp (Platycodon grandiflorus). The I. metabaptista mitogenome was similar in size to that of I. pubescens and was of moderate size relative to most asterid mitogenomes (Fig. 10). However, the GC contents of the mitogenomes differed relatively little and were approximately 45%.

Comparison of the genome structure with other asterid mitogenomes
Because only one mitogenome of a species in Aquifoliaceae has been reported, the mitogenome of I. metabaptista was compared with those of only seven other asterids, one each from Ericales, Gentianales, Solanales, Lamiales, Aquifoliales, Asterales and Apiales, to further investigate the genome structural variations. As shown in Fig. 11, closely related species shared the most sequences, even outside of the coding regions; species belonging to different groups shared fewer sequences.

Synteny analysis
As shown in Fig. 12, the dot-plot analysis showed that longer synteny sequences with higher similarity were found between I. metabaptista and I. pubescens than between I. metabaptista and other asterids. Pairwise synteny analysis (Fig. 13) showed that there were a large number of homologous colinear blocks, which were not arranged in the same order among individual mitogenomes. These large rearrangement events indicated that the mitogenomes are extremely nonconserved in structure among these eight asterids. Homologous sequences were distributed along the plant mitogenomes, and closely related species shared the most homologous sequences.

Characterization of the I. metabaptista mitogenome
Mitochondria provide plant cells with the energy needed for life processes [15]. Plant mitogenomes are fascinating molecules whose variations in noncoding regions and low conservation across species have generated major interest [40]. However, sequencing and analysis of plant mitogenomes are more difficult due to a relatively complex genome characterized by the accumulation of repetitive sequences, incorporation of chloroplast DNA, and extensive rearrangements, which hinder genome assembly [15,18]. With the rapid development of high-throughput sequencing and assembly technologies, there has been rapid growth in plant mitogenome projects and high-quality mitogenome assemblies in the past several years [16]. The key features of the I. metabaptista mitogenome are described in this article. Because of the high recombination frequency, plant mitogenomes have a dynamic structure with various configurations, such as major loops, sub loops and linear molecules, in mitochondria [8,11]. The I. metabaptista mitogenome reported in this study had the typical circular structure of land plant mitogenomes with a length of 529,560 bp and GC content of 45.61%, which were similar to those of I. pubescens (517,520 bp; 45.55%) [24].
Repeats are widely present in mitogenomes and are important sources of information for developing markers for population and evolutionary analyses [16]. Repeats in mitochondrial DNA are generally vital for intermolecular recombination, which plays a crucial role in shaping the mitogenome [33,41]. Numerous repetitive sequences have been discovered in the mitogenome of I. metabaptista, which might indicate frequent intermolecular recombination that could dynamically alter the structure and conformation of the mitogenome during evolution [28]. The identified monomer SSRs were mainly composed of A and T bases, which pair via two hydrogen bonds and therefore require less energy to separate than G-C pairs [21].
RNA editing occurs during a posttranscriptional process in the mitogenome and chloroplast genome of higher plants and can alter genetic information at the mRNA level [8,16]. The study of RNA editing sites aids in the comprehension of plant mitochondrial gene expression [33]. In this study, the number of RNA editing sites (543 sites) predicted in the I. metabaptista mitogenome was similar to those of other angiosperm plants, such as Photinia serratifolia (488) [12], Diospyros oleifera (515) [22], and Sapindus mukorossi (487) [9], but less than those of gymnosperms, such as Taxus cuspidata (974) [42]. However, there were fewer codon transfer types and amino acid transfer types (30 codon types; 14 amino acid types) than in other angiosperm plants (50-60; approximately 30) [22]. Thus, the I. metabaptista mitogenome has somewhat more predicted RNA-editing sites but fewer editing types than these angiosperms. Consistent with previous studies, the most abundant transfer type in I. metabaptista was TCA => TTA [7,21], and the selection of editing sites showed a strong bias, with all editing sites being of the C-to-T type, which is the most common editing type in plant mitogenomes [22]. Additionally, the second base of the triplet codon was most prone to RNA editing events, and the predicted edited codons showed a tendency towards leucine after RNA editing [28]. In addition, RNA editing could introduce premature stop codons in the I. metabaptista mitogenome, thus altering the function of the affected genes [21].

Mitogenome comparison in asterids
With the rapid development of sequencing technology, an increasing number of complete plant mitogenomes have been assembled and reported recently, facilitating the comparative analysis of mitogenome features among multiple plant species [16,21]. We compared the genome of I. metabaptista to those of other asterids to learn more about its structure and organization. The mitogenomes have undergone extensive rearrangements and are extremely nonconserved in structure among asterids, which might be the main reason for the evolution and diversification of plant mitogenomes [27].
The Ka/Ks analysis and the comparison of genomic features with other plant mitogenomes should contribute to a comprehensive understanding of plant mitochondrial evolution [17]. Generally, consistent with previous studies [21,22,28], most of the PCGs in I. metabaptista underwent negative selection during evolution, indicating that the PCGs in the mitogenome were relatively well conserved. However, the ccmB gene was the only gene that underwent positive selection during evolution, consistent with the finding in Suaeda glauca [28]. Other plant mitogenomes also have PCGs with Ka/Ks ratios > 1, and genes with high Ka/Ks ratios are of particular interest for further studies on gene selection and the evolution of species [38]. Such genes are likewise important for studies of gene selection and evolution in the Aquifoliaceae family [21].
Size and GC content are primary features used for assessing species [7]. We also compared the size and GC content of the I. metabaptista mitogenome with those of other asterids. The genome sizes differ greatly, but their GC contents are relatively consistent among asterids, which supports the conclusion that GC contents are highly conserved during the evolutionary process of higher plants [9,28]. In conclusion, the mitogenome of I. metabaptista shares features that are common among other asterids.

Patterns of codon use bias
Codons play a vital role in the transfer of genetic information [15]. There is a wide variation in the rate of genomic codon usage among different species and organisms, which is thought to be the result of a relative equilibrium that gradually develops within the cell over a long period of evolutionary selection [43]. In I. metabaptista, most PCGs used the typical ATG start codon, and the distribution of amino acid compositions was similar to that of other angiosperms [21,28]. Codon composition analysis showed that the codon preference of the I. metabaptista mitogenome was weak; there were 30 codons for which the RSCU > 1, and most of these ended with A/T bases. The results indicated a strong A or T bias in the third position of the codon in the PCGs of the I. metabaptista mitogenome; this is commonly observed in plant mitogenomes [21].

Intergenomic sequence transfers
The evolution of the mitogenome involves many structural rearrangements and gene transfer events [44]. An important feature of plant mitogenome evolution is the transfer of genes between the mitochondria and the chloroplast genomes [16,45,46]. Therefore, tracking intergenomic transfer between organellar genomes is essential for understanding the evolution of plant mitogenomes [11,47]. During mitochondrial evolution, the length and sequence similarity of the migrated fragments vary among higher plants [48]. In this study, the proportion of the transferred fragments between the mitochondria and the chloroplast genomes in I. metabaptista was similar to the previously reported data for Vitex rotundifolia (2.36%) [49] and B. chinense (2.56%) [21] but lower than that for Ipomoea batatas (7.35%) [29]. In addition, tRNA genes are most commonly transferred from the chloroplast genome to the mitogenome in angiosperms [28,45]. We found that tRNA genes were frequently transferred from chloroplasts to mitochondria in I. metabaptista, which was similar to the results in S. glauca [28] and Acer truncatum [16]. These findings indicated that tRNA genes were more conserved than PCGs and rRNA genes during evolution since they might remain functional in the mitogenome [43].
Fig. 5 Relative synonymous codon usage (RSCU) in the I. metabaptista mitogenome. The different amino acids are shown on the x-axis. RSCU values are the number of times a particular codon is observed relative to the number of times that codon would be expected for uniform synonymous codon usage

Phylogenetic inference
Because of its many advantages, including maternal inheritance, rapid evolution, low recombination rates, and many available molecular markers, the mitogenome has become a useful tool for the study of taxonomy, phylogeny, evolution, population genetics, and comparative genomics [27,29]. Ilex L. exhibits notable morphological diversity, and the boundaries of some species have not been clearly defined in this genus due to similar morphological features [50,51]. Thus, further research is needed to understand the origin and evolutionary relationships of this genus. Recently, several studies have characterized the genus Ilex by means of phylogeny and biogeography [1], complete chloroplast genome assembly [51], SSR analysis [52], and nuclear genome assembly [30-32]. Aside from this, the taxonomy of the genus is still not clear, and the mitogenomes can help to understand the evolutionary relationships existing among species of the Aquifoliaceae family and the putative hybrid origin for many species within the genus. In the current study, based on the information obtained from the mitogenome, a phylogenetic analysis of the I. metabaptista mitogenome and the published mitogenomes of 29 plant species was performed. The evolutionary relationships among these species were consistent with the topology of the phylogenetic tree, indicating the consistency of traditional and molecular taxonomy, which illustrated the possibility of employing information acquired from mitogenomes in plant phylogenetic studies. In addition, these results will lay the foundation for identifying further evolutionary relationships within Aquifoliaceae. However, due to the lack of adequate representative mitogenomes, more mitogenomes of Aquifoliaceae need to be sequenced to better resolve the phylogeny and evolutionary biology within this large family [22].
Fig. 6 Distribution of homologous fragments between mitochondria and chloroplasts in I. metabaptista. The green arcs of the circle represent the chloroplast genome, and the yellow arcs represent the mitogenome. The blue lines between the arcs correspond to the genomic fragments that are homologous

Conclusions
This study produced the first detailed characterization of a complete mitogenome in Ilex. The mitogenome of I. metabaptista was sequenced, assembled, and annotated, and the DNA and amino acid sequences of annotated genes were analysed thoroughly. The I. metabaptista mitogenome was circular and 529,560 bp in length. In addition, 67 genes, including 42 PCGs, 22 tRNA genes, and 3 rRNA genes, were annotated in the mitogenome. Then, the repeat sequences, RNA-editing sites, homologous fragments between mitochondria and chloroplasts, patterns of biased codon usage, and selective pressure were analysed. Additionally, Ka/Ks analysis, nucleotide polymorphism analysis, and comparative analysis of genomic features were performed to provide a more comprehensive understanding of mitogenome evolution in asterids. Furthermore, the evolutionary status of I. metabaptista was verified by phylogenetic analysis based on the mitogenomes of this species and 29 other taxa. This study provides extensive information regarding the I. metabaptista mitogenome, which is helpful for future research on the genetic variation, systematic evolution, and breeding of I. metabaptista. Therefore, these results help us lay a solid foundation for the cultivation, exploitation, and utilization of this multifunctional tree species.

Plant materials, mitochondrial DNA isolation and genome sequencing
Fresh young leaves of I. metabaptista were collected from a female tree (Figure S1 Total genomic DNA was extracted using a plant genomic DNA kit (Tiangen Biotech, Beijing, China).The DNA purity was detected with a 1.0% agarose gel.Then, the qualified library was sequenced and assembled by applying second-and third-generation sequencing platforms by Nanjing Genepioneer Technology Co., Ltd.(Nanjing, China).
Sequencing was performed following the protocol for the Illumina NovaSeq 6000 platform and the library protocol for Nanopore PromethION sequencing. To obtain a high-quality I. metabaptista mitogenome, we used fastp (v0.20.0, https://github.com/OpenGene/fastp) to filter the raw data: adapter and primer sequences were trimmed from the reads, reads with an average quality value below Q5 were discarded, and reads containing more than 5 ambiguous (N) bases were discarded, leaving the high-quality reads. The third-generation sequencing data were filtered using Filtlong (v0.2.1, https://link.zhihu.com/?target=https%3A//github.com/rrwick/Filtlong) and summarized using Perl scripts.
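The two read-level thresholds quoted above (mean quality below Q5, more than 5 N bases) can be illustrated with a minimal Python sketch. This is not fastp's implementation; the function names, the Phred+33 encoding, and the example reads are assumptions made for illustration only.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Arithmetic mean of the Phred scores in a FASTQ quality string (Phred+33 assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def keep_read(seq: str, quality: str, min_mean_q: float = 5.0, max_n: int = 5) -> bool:
    """Return True if a read passes both thresholds described in the text."""
    if mean_phred(quality) < min_mean_q:
        return False  # average quality below Q5
    if seq.upper().count("N") > max_n:
        return False  # more than 5 ambiguous bases
    return True


# Hypothetical reads, for illustration only.
reads = [
    ("ACGTACGTAC", "IIIIIIIIII"),  # Phred 40 at every base, no N
    ("ACGTACGTAC", "$$$$$$$$$$"),  # Phred 3 at every base
    ("NNNNNNACGT", "IIIIIIIIII"),  # six N bases
]
kept = [(seq, qual) for seq, qual in reads if keep_read(seq, qual)]
print(len(kept))
```

Only the first read passes both checks; the second fails the quality threshold and the third exceeds the N limit.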

Assembly and annotation of the mitogenome
Plant mitochondrial genes are highly conserved. Taking advantage of this feature, the long-read alignment software Minimap2 (v2.1) [53] was used to align the raw third-generation data against reference gene sequences (plant mitochondrial core genes), and sequences with an aligned length greater than 50 bp were retained as candidates. The candidate with the most aligned genes (a single read may contain multiple core genes) and the highest alignment quality (covering more complete core genes) was selected as the seed sequence. The raw long-read data were then aligned to the seed sequence; reads with a minimum overlap of 1 kb and at least 70% similarity were added to the seed, and the raw data were iteratively realigned to the extended seed to recover all long reads belonging to the mitogenome. Then, the third-generation assembly software Canu [54] was used to correct the long reads obtained, and Bowtie2 (v2.3.5.1) [55] was used to align the second-generation data to the corrected sequences. Unicycler (v0.4.8) with default parameters was used to combine the aligned second-generation data and the corrected third-generation data for assembly. Finally, the circular I. metabaptista mitogenome was obtained, and the average depth of the assembled mitogenome was 325×.
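The seed-selection and extension criteria described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the pipeline actually used: the hit tuples, the tie-break by total aligned length, and the function names `pick_seed` and `can_extend` are all hypothetical.

```python
from collections import defaultdict


def pick_seed(hits, min_len=50):
    """Pick the read covering the most distinct core genes.

    hits: iterable of (read_id, gene, aligned_len) tuples.
    Alignments of min_len bp or shorter are ignored. Ties are broken by
    total aligned length (an assumption standing in for "alignment quality").
    """
    genes = defaultdict(set)
    total = defaultdict(int)
    for read_id, gene, aligned_len in hits:
        if aligned_len > min_len:
            genes[read_id].add(gene)
            total[read_id] += aligned_len
    if not genes:
        return None
    return max(genes, key=lambda r: (len(genes[r]), total[r]))


def can_extend(overlap_bp, identity, min_overlap=1000, min_identity=0.70):
    """Extension rule from the text: overlap of at least 1 kb and at least 70% similarity."""
    return overlap_bp >= min_overlap and identity >= min_identity


# Hypothetical alignment hits, for illustration only.
hits = [
    ("read1", "cox1", 1500),
    ("read1", "nad5", 40),   # too short, ignored
    ("read2", "cox1", 900),
    ("read2", "atp9", 200),
    ("read3", "rps3", 60),
]
seed = pick_seed(hits)
print(seed, can_extend(1200, 0.85), can_extend(800, 0.95))
```

In a real run, the extension step would be repeated until no further reads satisfy `can_extend` against the growing seed.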
Mitogenome annotation was performed using the following steps: the encoded proteins and rRNAs were compared to published plant mitochondrial sequences using BLAST, and further manual adjustments were made based on closely related species. The tRNAs were annotated using tRNAscan-SE (http://lowelab.ucsc.edu/tRNAscan-SE/) with default settings. ORFs were annotated using Open Reading Frame Finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html). The circular mitochondrial map was drawn using the Draw Organelle Genome Maps online software (OGDRAW v1.3.1, https://chlorobox.mpimp-golm.mpg.de/OGDraw.html).

Fig. 10 Sizes and GC contents of 29 asterid mitogenomes

Codon usage analysis
The codon composition of the I. metabaptista mitogenome was analysed using a custom Perl script that screened for unique CDSs and determined the number of codons per gene, the GC content at each codon position (GC1, GC2, and GC3), the average GC content across the three positions (GC all), the effective number of codons (Nc), and the relative synonymous codon usage (RSCU).
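Two of the statistics listed above can be made concrete with a short Python sketch: RSCU, computed as the observed count of a codon divided by the mean count across its synonymous codon family, and the position-specific GC contents. This is not the authors' Perl script; it assumes the standard genetic code, counts stop codons as a family of their own, and uses a made-up CDS for illustration. Nc is not covered here.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(cds):
    """Split a CDS into complete codons, dropping any trailing partial codon."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def rscu(cds_list):
    """RSCU = codon count * family size / total count of the synonymous family."""
    counts = Counter(c for cds in cds_list for c in codons(cds) if c in CODON_TABLE)
    family = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        family[aa].append(codon)
    values = {}
    for members in family.values():
        total = sum(counts[m] for m in members)
        if total == 0:
            continue  # amino acid absent from the input
        for m in members:
            values[m] = counts[m] * len(members) / total
    return values


def gc_by_position(cds_list):
    """Return (GC1, GC2, GC3) as fractions over all valid codons."""
    gc, n = [0, 0, 0], 0
    for cds in cds_list:
        for codon in codons(cds):
            if codon not in CODON_TABLE:
                continue  # skip codons with ambiguous bases
            n += 1
            for i, base in enumerate(codon):
                if base in "GC":
                    gc[i] += 1
    return tuple(x / n for x in gc) if n else (0.0, 0.0, 0.0)


# Hypothetical CDS, for illustration only: ATG GCT GCT GCA TAA
example = ["ATGGCTGCTGCATAA"]
values = rscu(example)
gc1, gc2, gc3 = gc_by_position(example)
print(round(values["GCT"], 3), round(values["GCA"], 3), gc1, gc2, gc3)
```

An RSCU above 1 indicates that a codon is used more often than expected under equal usage of synonymous codons, which is how a preference for A/T-ending codons would show up.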

Homologous fragment analysis
The chloroplast genome sequence of I. metabaptista (NC_069021.1) was downloaded from the NCBI Organelle Genome Resources Database. BLAST software on NCBI was used to identify the homologous fragments between the mitogenome and chloroplast genome. Screening criteria were set as a matching rate ≥ 70%, E-value ≤ 1e-5, and length ≥ 30 bp. The results were visualized using Circos (v0.69-5).
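The screening criteria above can be applied to BLAST tabular output (`-outfmt 6`, whose columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) with a few lines of Python. This is a minimal sketch, assuming that "matching rate" corresponds to the percent identity column; the sequence names and hit values are made up for illustration.

```python
def filter_hits(lines, min_identity=70.0, max_evalue=1e-5, min_length=30):
    """Keep BLAST outfmt-6 hits with identity >= 70%, E-value <= 1e-5, length >= 30 bp."""
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        pident = float(fields[2])
        length = int(fields[3])
        evalue = float(fields[10])
        if pident >= min_identity and evalue <= max_evalue and length >= min_length:
            kept.append({
                "query": fields[0],
                "subject": fields[1],
                "identity": pident,
                "length": length,
                "q_start": int(fields[6]),
                "q_end": int(fields[7]),
                "s_start": int(fields[8]),
                "s_end": int(fields[9]),
                "evalue": evalue,
            })
    return kept


# Hypothetical hits (mitogenome as query, chloroplast genome as subject).
sample = [
    "mito\tcp\t98.50\t1200\t18\t0\t1000\t2199\t500\t1699\t0.0\t2100",
    "mito\tcp\t65.00\t400\t140\t0\t5000\t5399\t900\t1299\t1e-20\t150",   # identity too low
    "mito\tcp\t95.00\t25\t1\t0\t7000\t7024\t300\t324\t2e-03\t40",        # too short, E-value too high
]
hits = filter_hits(sample)
print(len(hits), hits[0]["length"])
```

The retained coordinates (query and subject start and end) are the fields a Circos link file would need to draw each homologous fragment.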

RNA editing analyses
The editing sites in the mitochondrial RNA of I. metabaptista were identified using the mitochondrial gene-encoded proteins of plants as reference proteins. The analysis was conducted using the Plant Predictive RNA Editor (PREP) suite [58] (http://prep.unl.edu/).

Phylogenetic tree construction
To determine the phylogenetic position of I. metabaptista, 29 plant mitogenomes (Table S4) were downloaded from the NCBI Organelle Genome Resources database (http://www.ncbi.nlm.nih.gov/genome/organelle/). These species were chosen because their complete mitogenome sequences were available in NCBI, their taxonomic placement was clear, and they are widely used. The shared CDSs of the 30 species from different families were aligned using MAFFT (v7.427, --auto mode) [59]. The aligned sequences were concatenated end-to-end and trimmed with trimAl (v1.4.rev15, parameter: -gt 0.7), and jModelTest (v2.1.10) was then used to select the substitution model, which was determined to be of the GTR type. Then, RAxML (v8.2.10) [60] (https://cme.h-its.org/exelixis/software.html) was used to build the maximum likelihood tree under the GTRGAMMA model with 1000 bootstrap replicates. Spinacia oleracea was designated as the outgroup.
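The concatenation and gap-trimming steps can be illustrated with a small Python sketch. It mimics the effect of a gap threshold of 0.7 (a column is kept only if at least 70% of the sequences have a non-gap character there) but is not trimAl itself; the function names and the toy alignments are assumptions for illustration.

```python
def concatenate(alignments):
    """Join per-gene alignments end-to-end.

    alignments: list of dicts {species: aligned_sequence}; every dict is
    assumed to contain the same species (i.e. only shared CDSs are used).
    """
    species = sorted(alignments[0])
    return {sp: "".join(aln[sp] for aln in alignments) for sp in species}


def trim_gappy_columns(alignment, gap_threshold=0.7):
    """Drop columns in which the fraction of non-gap characters is below gap_threshold."""
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    n_seq = len(seqs)
    keep = [
        i for i in range(len(seqs[0]))
        if sum(seq[i] != "-" for seq in seqs) / n_seq >= gap_threshold
    ]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


# Toy alignments of two genes across four hypothetical taxa.
gene1 = {"A": "ATG-", "B": "ATGC", "C": "AT-C", "D": "A--C"}
gene2 = {"A": "GG", "B": "GG", "C": "G-", "D": "GG"}

supermatrix = concatenate([gene1, gene2])
trimmed = trim_gappy_columns(supermatrix)
for name, seq in trimmed.items():
    print(name, seq)
```

Here the third column of the concatenated alignment is removed because only two of four sequences (50%) carry a base at that position, while columns with 75% occupancy are retained.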

Fig. 1 Circular map of the I. metabaptista mitogenome. Gene map showing 67 annotated genes of different functional groups. Genes shown on the outside and inside of the circle are transcribed clockwise and counterclockwise, respectively. The dark grey region in the inner circle depicts the GC content

Fig. 3 Distribution of lengths of interspersed repeats in the I. metabaptista mitogenome

Fig. 4 Distribution of RNA editing sites in protein-coding genes of the I. metabaptista mitogenome

Fig. 7 Fig. 8 Fig. 9
Fig. 7 The phylogenetic relationships of I. metabaptista with 29 other plant species. Spinacia oleracea served as an outgroup. The bootstrap values are listed at each node. The number after the species name is the GenBank accession number. Colours indicate the groups to which the specific species belong

Fig. 11 Comparison of I. metabaptista mitochondrial structures relative to asterids. The two outermost circles depict the gene length and orientation of the genome; the inner circles represent the similarity results with other reference genomes; the black circles represent the GC content

Fig. 12 Dot-plot graphs indicating synteny sequences between mitogenomes in asterids compared to I. metabaptista as the reference

Table 1
Functional classifications and physical locations of genes in the I. metabaptista mitogenome

Table 2
Composition and skewness of the I. metabaptista mitogenome

the total SSRs, respectively; the number of trimer and pentamer repeats was the lowest. Monomer repeats composed of A/T bases accounted for 93.33% of monomer SSRs, and dimer repeats composed of AG/CT bases accounted for 51.52% of dimer SSRs. There were no hexanucleotide repeats in the I. metabaptista mitogenome.

Table 3
The SSR types detected in the I. metabaptista mitogenome

Table 4
Distribution of tandem repeats in the I. metabaptista mitogenome

Table 5
Prediction of RNA editing sites

Table 6
Overall characteristics of codon usage in the I. metabaptista mitogenome

Table 7
Homologous fragments between mitochondria and chloroplasts in I. metabaptista