New insights into the plastome evolution of Lauraceae using herbariomics

Background The family Lauraceae comprises ca. 50 genera and 2,500–3,000 species distributed pantropically. Only half of the genera of the family were represented in previously published plastome phylogenies because research materials are difficult to obtain. Plastomes of Hypodaphnideae and the Mezilaurus group, two lineages with unusual phylogenetic positions, had not been reported, which limited our understanding of the plastome evolution of the family. Herbariomics, enabled by next-generation sequencing technology, can make full use of herbarium specimens and provides opportunities to fill this sampling gap. Results In this study, we sequenced five new plastomes (including four genera reported for the first time, viz. Chlorocardium, Hypodaphnis, Licaria and Sextonia) from herbarium specimens using genome skimming, and conducted a comprehensive analysis of plastome evolution in Lauraceae with representatives of all major clades of the family. We recognized six plastome types and showed that at least two independent loss events at the IR-LSC boundary and an independent expansion of the SSC occurred during the plastome evolution of the family. Hypodaphnis possesses the ancestral plastome type of Lauraceae, with trnI-CAU, rpl23 and rpl2 duplicated in the IR regions (Type-I). The Mezilaurus group shares the same plastome structure as the core Lauraceae group, having lost trnI-CAU, rpl23 and rpl2 from the IRa region (Type-III). Two new types were identified in the Ocotea group: (1) insertion of trnI-CAU between trnL-UAG and ccsA in the SSC region of Licaria capitata and Ocotea bracteosa (Type-IV), and (2) insertion of trnI-CAU and a pseudogenized rpl23 in the same region of Nectandra angustifolia (Type-V). Our phylogeny suggests that Lauraceae are divided into nine major clades, largely in accordance with the plastome types. 
The Hypodaphnideae are the earliest-diverging lineage, supported by both a robust phylogeny and the ancestral plastome type. The monophyletic Mezilaurus group is sister to the core Lauraceae. Conclusions Using herbariomics, we built a more complete picture of the plastome evolution and phylogeny of the family, providing a convincing case for the further use of herbariomics in phylogenetic studies of Lauraceae. Supplementary Information The online version contains supplementary material available at 10.1186/s12870-023-04396-4.


Background
Lauraceae, belonging to the Laurales of magnoliids, contain ca. 50 genera and 2,500-3,000 species [1][2][3]. Species of this family are mostly woody, with the exception of the herbaceous parasite Cassytha, and are widely distributed in tropical and subtropical regions [4]. Tall tree species are dominant in the evergreen broad-leaved forests of the tropics and are important in maintaining local communities [4][5][6][7]. In addition, many Lauraceae species are economically valuable as sources of medicines, excellent timber, fruits, spices and perfumes [4,8,9].
The phylogeny of Lauraceae remains poorly resolved because of the low resolution of molecular markers and inadequate species sampling. Over the past two decades, published phylogenetic studies of Lauraceae were mainly based on single or multiple molecular markers [2,10–18]. Owing to the low divergence of commonly used markers, inter- and intrageneric phylogenetic relationships within the family have not been fully resolved [19][20][21].
Plastome sequences have been successfully used to infer the phylogeny of green plants at different taxonomic levels owing to their rich sequence variation [22][23][24][25]. Plastome sequences have also been used to resolve the inter- and intrageneric phylogeny of Lauraceae [19,20,24,26–28]. At the family level, both Song et al. [19] and Liu et al. [20] recognized nine clades of Lauraceae (i.e., Hypodaphnideae, Cryptocaryeae, Caryodaphnopsideae, Neocinnamomeae, Cassytheae, Mezilaurus group, Perseeae, Cinnamomeae and Laureae), although two of them, Hypodaphnis and the Mezilaurus group, were not sampled in their phylogenomic studies. Insufficient sampling of important lineages has been an obstacle to a better understanding of the plastome evolution of Lauraceae. Plastomes of 190 species in 27 genera of Lauraceae are available in NCBI (Table S1; accessed 22 March 2022); over 90% of them belong to Cryptocaryeae and the core Lauraceae group, and most are from Asia (Fig. 1) [29]. Neotropical species of Cinnamomeae remain poorly represented: only one plastome of Nectandra and seven plastomes of Ocotea have been sequenced [26,29]. In particular, the African Hypodaphnis and the American Mezilaurus group represent evolutionarily distinct lineages of Lauraceae but are still missing from plastome studies: Hypodaphnis (Hypodaphnideae) is the earliest-diverging lineage of Lauraceae, and the Mezilaurus group is sister to the core Lauraceae [2,10,11]. This sampling bias is largely attributable to the unavailability of research materials.
Content, structure and gene organization of plastomes are important for understanding the evolutionary relationships of plants [30,31]. Plastomes of Lauraceae show a relatively conserved quadripartite structure and contain 128-130 genes, except in Cassytha, which has only 113 genes [26,29,32]. Recent studies have suggested that at least four types of plastomes exist in Lauraceae, based on variation of the ycf2-rpl2 region at the IR-LSC boundary [26,28,29,32]. The plastome of Cryptocaryeae lost the rpl2 gene in the IRb. Plastomes of Caryodaphnopsideae, Neocinnamomeae and the core Lauraceae group lost a segment of ycf2 and the entire trnI-rpl23-rpl2 region in the IRa. The parasitic genus Cassytha is unique in having lost the entire IR region. The fourth type, found only in the plastomes of Caryodaphnopsis henryi and one sample of Cinnamomum chartophyllum (a synonym of Camphora chartophylla), contains two copies of rpl2 in the IR regions [26,28]. At least two independent events caused by IR reduction may have occurred during the plastome evolution of Lauraceae [26,32].
Herbaria are a "treasure trove", harboring thousands of specimens with accurately identified material and a wealth of associated information [33,34]. Museum specimens could improve material availability and overcome sampling biases in phylogenetic studies if they could be used in sequencing studies. However, it is difficult to obtain sequences with the Sanger method because museum DNA is highly degraded and fragmented, and DNA extraction and gene amplification in Lauraceae are also challenging because plant tissues are rich in polysaccharides and polyphenols [1,35]. Driven by next-generation sequencing (NGS) technology, herbariomics (herbarium genomics) is a promising field [35,36]. By using herbarium specimens, this new approach can largely solve the problems of sampling bias and taxonomic identification, and it is also cost-efficient and time-saving [36,37]. Herbarium specimens have rarely been used in phylogenomic studies of Lauraceae, although plastomes were successfully obtained from specimens of Phoebe neurantha and Cin. bodinieri preserved for 79 and 59 years, respectively [38].
In this study, we obtained five plastomes representing five genera (Licaria, Ocotea, Chlorocardium, Sextonia and Hypodaphnis) from herbarium specimens, filling the sampling gap for Hypodaphnis and the Mezilaurus group. We tested the applicability of herbariomics in phylogenomic studies of Lauraceae and explored the plastome evolution of the family.

Plastome variation of Lauraceae
The plastomes of Lauraceae fall into at least six major types according to the number and position of the rpl2, rpl23 and trnI-CAU genes (Fig. 2). Type-I is characteristic of the Hypodaphnideae, with rpl2, rpl23 and trnI-CAU located in both IR regions. Type-II is restricted to the Cryptocaryeae, in which one copy of rpl2 is missing owing to contraction of the IRb boundary. In contrast, Type-III plastomes lost not only rpl2 but also rpl23, trnI-CAU and part of ycf2 owing to contraction of the IRa boundary. This type was found in all remaining Lauraceae species except Cassytha filiformis, whose IR was lost (Type-VI), and three American species of the Ocotea group. Ocotea bracteosa and L. capitata displayed a new type (Type-IV), in which the plastome gained an additional copy of trnI-CAU near ccsA in the SSC region compared with Type-III. Moreover, our re-annotation of the plastome of Nectandra angustifolia showed that it not only acquired an additional copy of trnI-CAU but also had a pseudogenized rpl23 gene inserted between trnI-CAU and ccsA in the SSC region; we defined this as Type-V.
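The type definitions amount to a decision rule on where the three marker genes (trnI-CAU, rpl23, rpl2) are annotated. The Python sketch below encodes that rule as we read it from the descriptions; the `PlastomeLayout` structure, its field names and the "rpl23-psi" label for a pseudogenized copy are hypothetical conventions for illustration, and the rule ignores details such as partial ycf2 loss.

```python
from dataclasses import dataclass, field
from typing import Set

# Marker genes whose copy number and position define the plastome types.
MARKERS = frozenset({"trnI-CAU", "rpl23", "rpl2"})


@dataclass
class PlastomeLayout:
    """Hypothetical summary of which marker genes were annotated where.

    A pseudogenized rpl23 copy is recorded as "rpl23-psi" (our convention).
    """
    has_ir: bool = True
    ira: Set[str] = field(default_factory=set)
    irb: Set[str] = field(default_factory=set)
    ssc: Set[str] = field(default_factory=set)


def classify(layout: PlastomeLayout) -> str:
    """Assign Type-I to Type-VI from marker-gene positions (simplified)."""
    if not layout.has_ir:
        return "Type-VI"  # IR lost, as in Cassytha
    ira, irb = layout.ira & MARKERS, layout.irb & MARKERS
    if ira == MARKERS and irb == MARKERS:
        return "Type-I"  # all three markers present in both IRs
    if ira == MARKERS and irb == MARKERS - {"rpl2"}:
        return "Type-II"  # only rpl2 missing, from the IRb
    if not ira and irb == MARKERS:
        # IRa lacks the trnI-rpl23-rpl2 segment; the SSC separates III-V
        if {"trnI-CAU", "rpl23-psi"} <= layout.ssc:
            return "Type-V"
        if "trnI-CAU" in layout.ssc:
            return "Type-IV"
        return "Type-III"
    return "unclassified"
```

For example, `classify(PlastomeLayout(irb=set(MARKERS), ssc={"trnI-CAU"}))` returns "Type-IV", the arrangement reported for L. capitata and O. bracteosa.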
Pairwise alignments of the sampled plastome sequences of Lauraceae showed high similarity (over 84.7%; Fig. 5, Table S6), except for the parasitic Cas. filiformis, which displayed extremely low similarity (63.5-65.5%) to other genera. Two clusters were identified based on similarity. One cluster comprised the core Lauraceae and the Mezilaurus group, which showed higher similarity (≥ 94.0%) to one another; almost all species of the core Lauraceae displayed a pairwise similarity of over 98.0%. Notably, N. angustifolia had the lowest pairwise similarity among the core Lauraceae, ranging from 96.2 to 97.3%. The other cluster consisted of the Cryptocaryeae, with pairwise similarity over 94.0%. Caryodaphnopsis tonkinensis, Neocinnamomum delavayi and H. zenkeri were relatively distinct, with similarities to other species of Lauraceae lower than 92%, 90.1% and 89.6%, respectively.
To investigate plastome variation at the gene level, we calculated the percentages of variable characters for coding and non-coding regions of all sampled species. Overall, coding regions were more conserved than non-coding regions (Fig. S3C). Fourteen coding regions exhibited high variation (percentage of variation > 20%; Fig. S3A): matK, rps16, rpoC2, accD, rpl20, rpoA, rps8, rpl22, rpl2, rpl23, ycf2, ycf1, rpl32 and ccsA. In the SSC region, only two genes (ycf1 and rpl32) had a percentage of variation over 30%. Seven non-coding regions exhibited high variation (percentage of variation > 40%; Fig. S3B): psbA_trnH-GUG, trnG-UCC_trnR-UCU, rpl2_rps19, trnI-CAU_ycf2, rpl32_trnL-UAG, ccsA_trnL-UAG and rps15_ycf1. Highly divergent regions were mainly located near the IR boundaries. The region between trnI-CAU and ycf2 showed the highest variation, at 59%.
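To illustrate the percentage-of-variable-characters statistic, the sketch below counts variable alignment columns within one region. It assumes a plain dictionary of equal-length aligned sequences and 0-based, half-open region coordinates, and it ignores gaps, N and ? when deciding whether a column is variable; how gaps were treated in the original analysis is not stated here, so this is an assumption.

```python
from typing import Dict, Tuple

# Characters treated as missing data rather than as a nucleotide state.
GAP_OR_MISSING = set("-N?")


def percent_variable_sites(alignment: Dict[str, str], region: Tuple[int, int]) -> float:
    """Percentage of columns in [start, end) showing more than one nucleotide."""
    start, end = region
    seqs = list(alignment.values())
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    if end <= start:
        raise ValueError("empty region")
    variable = 0
    for i in range(start, end):
        # Collect the distinct nucleotide states in this column, minus gaps.
        states = {s[i].upper() for s in seqs} - GAP_OR_MISSING
        if len(states) > 1:
            variable += 1
    return 100.0 * variable / (end - start)
```

Applying the function to each annotated gene or spacer interval gives one value per region, which is the kind of distribution summarized in the violin plots.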

Phylogenomics of Lauraceae
The complete plastomes and protein-coding genes of 35 species of Lauraceae were used to reconstruct phylogenetic trees. The two aligned data matrices were 150,930 bp and 66,843 bp long, respectively, and contained 13,907 (9.2%) and 5,731 (8.6%) parsimony-informative sites, respectively.
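The parsimony-informative counts follow the usual definition: a column is informative when at least two character states each occur in at least two sequences. A minimal sketch, assuming only unambiguous A/C/G/T states are counted (gaps and ambiguity codes treated as missing), together with a check of the reported percentages:

```python
from collections import Counter
from typing import Dict

NUCLEOTIDES = set("ACGT")


def count_parsimony_informative(alignment: Dict[str, str]) -> int:
    """Count columns with >= 2 states that each occur in >= 2 sequences."""
    seqs = [s.upper() for s in alignment.values()]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    informative = 0
    for i in range(length):
        # Tally unambiguous nucleotides in column i; gaps/ambiguities skipped.
        counts = Counter(s[i] for s in seqs if s[i] in NUCLEOTIDES)
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            informative += 1
    return informative


if __name__ == "__main__":
    # Reported totals: informative sites / alignment length -> 9.2% and 8.6%
    for informative, length in [(13_907, 150_930), (5_731, 66_843)]:
        print(f"{informative}/{length} = {100 * informative / length:.1f}%")
```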

Structural variation of plastomes in Lauraceae
By adding the five newly sequenced plastomes, we obtained representatives of all nine clades of Lauraceae and thus a more comprehensive view of the plastome structure of the family. Plastomes of Lauraceae are conserved, with sequence similarity of no less than 84.7% between clades (except Cassytha; Fig. 5; Table S6), but gains and losses of DNA fragments provide characters that classify the plastomes of the family into six types, which are largely congruent with intrafamilial phylogenetic relationships (Fig. 2). Four of these six types have been reported in recent studies [26,32], corresponding to Types I, II, III and VI recognized in this study (Fig. 2); here we recognize two new plastome types in the Ocotea group, i.e., Type-IV and Type-V (Fig. 2).
Hypodaphnis is the earliest-diverging branch of Lauraceae, and its plastome had not been reported previously. This genus has the Type-I plastome (Fig. 2), which contains two copies of rpl2 in the IR regions, and it possesses the largest number of genes (131) and protein-coding genes (86) in Lauraceae (Table 1) [29,32]. In addition, the plastome of Hypodaphnis has fewer SSRs and the lowest GC content in Lauraceae (Tables 1 and S4) [29]. This type of plastome was reported as an exceptional variant by Song et al. [26] and Xiao and Ge [28].
The Mezilaurus group had not been included in previous phylogenomic studies, so we sequenced two species of the group, i.e., C. rodiei and S. rubra. Both species have the Type-III plastome (Fig. 2), which largely agrees with the plastome structure of Neocinnamomeae, Caryodaphnopsideae and the core Lauraceae group [26], with a few exceptions discussed below. Besides sharing this plastome structure, they show low sequence divergence and high similarity (≥ 90.1%; Fig. 5; Table S6) with other species that possess the Type-III plastome.
Plastomes of the Ocotea group show considerable variation. Whereas all previously published Ocotea plastomes are of Type-III [29], our newly sequenced samples show a different arrangement and belong to a new type. In O. bracteosa and L. capitata, trnI-CAU is inserted between the trnL-UAG and ccsA genes in the SSC region (Fig. 2). This variation of gene organization in the SSC has not been reported in plastomes of Lauraceae before. To confirm this unusual variation, we designed specific primers for the inserted trnI-CAU and conducted PCR amplification, which confirmed the presence of trnI-CAU in the SSC region (Fig. S2). We define this variation as the Type-IV plastome. Notably, the Neotropical dioecious O. bracteosa has a plastome structure distinct from two closely related Ocotea species (O. guianensis and O. tabacifolia) belonging to the same dioecious clade of the Ocotea group [17,29,39], but shares the same plastome type as the monoecious L. capitata [40]. This suggests that the Ocotea group may contain further plastome types, which is likely given that the group is speciose [1]. Moreover, the plastome of N. angustifolia was published five years ago [26]. We re-annotated this plastome and found that a trnI-CAU gene and a pseudogenized rpl23 are inserted in the SSC region; we consider this variation the Type-V plastome of the family (Fig. 2). The inserted rpl23 copy was identified as a pseudogene, as previously reported for rpl23 in Cassytha [32], because it shares 98% similarity with the other rpl23 copy but differs from it in having two internal stop codons. More samples representing different lineages of the Ocotea group are needed to better understand plastome evolution in this group.
Although we recognized six plastome types, structural variation can clearly occur within a genus or even within a species. Unusual structural variations of plastomes were found in Caryodaphnopsis and Cam. chartophylla (≡ Cin. chartophyllum; Fig. S5) [28]. The four published plastomes of Caryodaphnopsis belong to two different types: three of them are Type-III (MF939343, MN698962, NC_050345), but one (MF939346) is Type-I, like Hypodaphnis. Despite this structural variation, the sampled Caryodaphnopsis plastomes fall in the same clade in the plastome phylogeny [19]. Similar structural variation was found in Cam. chartophylla: one sample (OL943972) belongs to Type-I, while the other (MW421301) belongs to Type-III [28]. So far, three genera of the family have been found to show infrageneric or infraspecific plastome structural variation, and the two genera discussed here show a reversal to the Type-I plastome. It remains unclear how and why this exceptional reversal occurs and whether it is rare or common. More samples are clearly needed to verify this structural variation.

Plastome evolution in Lauraceae
Previous studies have suggested that at least two independent evolutionary events, including different loss events at the IR-LSC boundary, occurred during the plastome evolution of Lauraceae [26,32]. In this study, by adding the plastome structures of Hypodaphnis and the Mezilaurus group, we found a more complicated evolutionary history and drew a more comprehensive picture of plastome evolution in Lauraceae (Figs. 2 and 7).
The plastome of Hypodaphnis is key to understanding the plastome evolution of Lauraceae. This genus possesses the Type-I plastome, which is similar to those of Amborella trichopoda [41] and of magnoliids including Piper (Piperales), Liriodendron and Magnolia (Magnoliales), and Illigera (Laurales) [26,30,42,43]. The structural similarity between the plastomes of Hypodaphnis and these lineages suggests that the Type-I structure is ancestral and that the other plastome types of Lauraceae may have been derived from it.
The plastomes of Lauraceae show an overall trend of contraction, with at least two gene loss events at the IR-LSC boundary, followed by an independent expansion of the SSC region in the Ocotea group alone (Figs. 2 and 7). The Type-II plastome of Cryptocaryeae may have lost rpl2 in the IRb region independently, owing to contraction of the IRb. For the IR loss in the Cassytha plastome (Type-VI), which occurred after Cassytha diverged from the Neocinnamomeae (Type-III), Caryodaphnopsideae (Type-III), the Mezilaurus group (Type-III) and the core Lauraceae (Types III, IV and V), two scenarios are possible, as proposed by Wu et al. [32]. In the first, contraction of the IRa caused the loss of one copy of rpl2-ycf2 in the common ancestor of Types III, IV, V and VI, and subsequent contractions of the IRa and IRb resulted in the Type-VI plastome, in which the IR was completely lost in Cassytha. Alternatively, the Type-VI plastome evolved by losing one copy of the IR region independently, while the common ancestor of Types III, IV and V lost one copy of rpl2-ycf2 owing to contraction of the IRa.
Surprisingly, the Ocotea group experienced independent expansions of the SSC region, giving rise to the two newly recognized plastome types, i.e., Type-IV and Type-V (Figs. 2 and 7). Unlike the variation at the IR-LSC boundary seen across many Lauraceae species, these transitions can be explained by three scenarios. First, insertion of trnI-CAU into the SSC region of the ancestral plastome of the Ocotea group caused the transition from Type-III to Type-IV; subsequent insertion of a pseudogenized rpl23 gene, or of an rpl23 gene that later became pseudogenized, may then have caused the transition from Type-IV to Type-V in Nectandra. Second, a trnI-CAU_rpl23 segment was inserted into the SSC region of the ancestral plastome of the Ocotea group, causing the transition from Type-III to Type-V, and subsequent loss of the rpl23 gene resulted in the transition from Type-V to Type-IV. Third, Type-IV and Type-V evolved independently from Type-III through separate insertions of trnI-CAU and trnI-CAU_rpl23 segments. Repeat analyses (Fig. 3) showed that the longest repeat (153 bp) occurred in both O. bracteosa and L. capitata, which may have contributed to the presence of trnI-CAU in the SSC region. This result is consistent with the suggestion of Xiao and Ge [28] that the longer repeats in plastomes of the Ocotea group, compared with other species of Cinnamomeae, may have led to a different evolutionary pattern in this tribe. As the Ocotea group is speciose and contains variable plastome types, more plastome patterns and more complicated evolutionary histories may be discovered as more species are sampled.
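The two main repeat categories in the repeat analysis, direct and palindromic (Fig. 3), can be illustrated with a naive exact-match search. This is not the tool used in the study: the hypothetical `exact_repeat_pairs` function reports only exact k-mer pairs of one fixed length, with no mismatches, no extension to maximal repeats and no tandem-repeat detection.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def exact_repeat_pairs(seq: str, k: int) -> Tuple[Set[Tuple[int, int]], Set[Tuple[int, int]]]:
    """Return (direct, palindromic) sets of (i, j) start positions, i < j."""
    seq = seq.upper()
    positions: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    direct: Set[Tuple[int, int]] = set()
    palindromic: Set[Tuple[int, int]] = set()
    for kmer, starts in positions.items():
        # Direct repeat: the same k-mer occurs at two different positions.
        for a, i in enumerate(starts):
            for j in starts[a + 1:]:
                direct.add((i, j))
        # Palindromic repeat: the k-mer's reverse complement occurs elsewhere.
        for j in positions.get(reverse_complement(kmer), []):
            for i in starts:
                if i != j:
                    palindromic.add((min(i, j), max(i, j)))
    return direct, palindromic
```

For instance, in "ACGGTTCCCCAACCGT" with k = 6, "ACGGTT" at position 0 and its reverse complement "AACCGT" at position 10 form a palindromic pair (0, 10).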
A dated phylogeny helps to establish the time frame of plastome evolution in Lauraceae (Fig. 7). Our age estimates are largely congruent with previous studies [2,13,24]. The stem age of Lauraceae falls in the Early Cretaceous (ca. 107.7 mya). The two independent loss events that gave rise to Type-II in Cryptocaryeae and Type-VI in Cassytheae occurred at ca. 100 mya and ca. 90 mya, respectively (Fig. 7), whereas the SSC expansion event occurred in the Late Eocene (ca. 38.8 mya; Fig. 7). We have not identified any geological events associated with these structural changes in Lauraceae plastomes.

Phylogenomics of Lauraceae
Our plastid phylogenomic results confirm that Lauraceae contain nine major clades, corresponding to the eight previously described tribes and the Mezilaurus group. The CDS and CPG phylogenies show an overall congruent topology, except for the tribe Laureae, which is one of the most complicated clades, with conflicting phylogenetic signals in the plastome [27]. The relationships among the nine clades of Lauraceae are consistent with previous plastome phylogenies [19,20,29,44] and are also supported by the plastome types (Figs. 2 and 7).
Hypodaphnis is restricted to tropical Africa and contains only one extant species (H. zenkeri) [2]. Morphologically, it is the only genus in Lauraceae with a truly inferior ovary. Previous studies based on plastid and nuclear markers placed Hypodaphnis as sister to all other extant Lauraceae; however, this position received rather low support [2,10,11]. Song et al. [19] obtained a robust phylogeny of Lauraceae using complete plastomes and nine plastid markers (matK, psbA-trnH, rbcL, rpl16, rpoB, rpoC1, trnL, trnL-trnF and trnT-trnL) for sampling purposes, and confirmed the sister relationship between Hypodaphnis and the remainder of the family with high support. Based on both morphological and molecular evidence, the clade of Hypodaphnis was described as Hypodaphnideae [19]. Here, our new phylogenomic results, together with the ancestral plastome type of Hypodaphnis, corroborate its early-diverging position in Lauraceae.
The Mezilaurus group is monophyletic and consists of six genera: Anaueria, Chlorocardium, Clinostemon, Mezilaurus, Sextonia and Williamodendron [2]. Previous molecular studies based on nuclear and plastid markers placed this group as sister to the core Lauraceae clade [2,10,11,19,45]. Our phylogenomic tree confirms that the Mezilaurus group is monophyletic, and its sister relationship to the core Lauraceae clade receives high support (Figs. 6 and S4). However, no morphological or anatomical synapomorphy has been identified for the clade to date, owing to high variability [45,46]. Nor does plastome structure provide taxonomic characters that unite all genera of this group. Further studies are needed to identify synapomorphies of the group.

Herbariomics in Lauraceae
Phylogenetic studies of Lauraceae are still at an early stage owing to the lack of plant materials. Herbaria worldwide house numerous accurately identified plant specimens and are a potential material source for species sampling [34,35]. Herbariomics and genome skimming based on NGS offer a powerful, efficient and promising approach to obtaining more species and DNA sequences [33,34]. Museum specimens usually yield low-quality DNA because of degradation and fragmentation, and tissues of Lauraceae are rich in polysaccharides and polyphenols [1,34]. These factors have limited the full use of herbarium specimens in phylogenetic studies of Lauraceae [1,35]. Our results suggest that the mCTAB method is sufficient for extracting DNA from herbarium samples of Lauraceae: 20-30 mg of leaf tissue from herbarium specimens can yield over 1,000 ng of DNA (Table 2) [47]. We successfully obtained five plastomes of Lauraceae from specimens collected 15 years ago (Table 2) and filled the sampling gap in the phylogeny of Lauraceae by adding plastomes of Hypodaphnideae and the Mezilaurus group. Our study suggests that herbariomics opens new opportunities for plastome phylogenomic studies of Lauraceae.

Conclusion
Using leaf tissue from herbarium specimens, we obtained five new plastomes of Lauraceae representing five genera (Licaria, Ocotea, Chlorocardium, Sextonia and Hypodaphnis) in three different clades of the family, i.e., Hypodaphnideae, the Mezilaurus group and the Ocotea group. Hypodaphnis possesses the ancestral plastome type of the family, with rpl2, rpl23 and trnI-CAU duplicated in the IR regions. The Mezilaurus group possesses the same plastome type as the core Lauraceae group. Two new plastome types were recognized in the Ocotea group. Unlike their relatives, Licaria capitata and O. bracteosa possess plastomes with trnI-CAU inserted between trnL-UAG and ccsA in the SSC region (Type-IV), whereas N. angustifolia has a plastome with trnI-CAU and a pseudogenized rpl23 inserted in the same region (Type-V). Adding the plastomes of Hypodaphnis and the Mezilaurus group to phylogenomic studies, and thereby filling the sampling gap for these unusual lineages, has improved our understanding of plastome evolution in Lauraceae. We also show that herbariomics is a powerful tool for extensive species sampling from accurately identified herbarium specimens in phylogenetic studies of a difficult family such as Lauraceae.

Taxon sampling
We obtained leaf samples of L. capitata, O. bracteosa, C. rodiei, S. rubra and H. zenkeri from herbarium specimens deposited in the Herbarium of the Missouri Botanical Garden (MO) and the Harvard University Herbaria (A, GH) (Table 2). To infer the plastome phylogeny of Lauraceae, plastome sequences of the family were also downloaded from NCBI (accessed 13 October 2021). In general, we downloaded one plastome sequence per genus where available. Multiple sequences were selected for genera of Laureae with ambiguous phylogenetic relationships, following Song et al. [26]. In total, 35 plastomes were selected, including 31 genera representing all nine clades of Lauraceae. Calycanthus chinensis, Chimonanthus nitens and Chim. praecox (Calycanthaceae, Laurales) were chosen as outgroups. Sequence information and accession numbers are listed in Table S7.

DNA extraction and genomic sequencing
Genomic DNA was extracted from 20-30 mg of leaves from herbarium specimens using a modified CTAB method (mCTAB) [47], in which 3% CTAB was used and approximately 2% polyvinyl polypyrrolidone (PVP) and 0.1% β-mercaptoethanol were added. To make full use of the leaf material, DNA extraction was repeated once and the DNA solutions were combined at the end. DNA quality was assessed with an Agilent 5400 (Agilent Technologies Inc., U.S.A.). Short-insert libraries were prepared following the manufacturer's manual (Illumina) without ultrasonic fragmentation of the total DNA, because DNA from herbarium specimens is already degraded into short fragments. The DNA libraries were sequenced on an Illumina NovaSeq 6000 at Novogene Co., Ltd (Beijing, China). Approximately 2 Gb of 150 bp paired-end reads was obtained for each sample.

Genome structure identification
To verify the structure of the newly sequenced plastomes, a pair of gene-specific primers (1-F: GCCGCCATGGTGAAATTGGTAGA; 1-R: GCATCCATRGCTGAATGGTTAAAG) was designed to determine the presence of trnI-CAU in L. capitata and O. bracteosa. Sextonia rubra was used as a control (Fig. S2A).
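A quick in-silico check of such a primer pair can be sketched as follows, using the 1-F and 1-R sequences written without hyphens. The sketch expands the degenerate base (R = A or G), searches for the forward primer on a template and for the reverse complement of the reverse primer downstream of it, and allows no mismatches, which is stricter than real primer binding. The function names and the toy template are hypothetical.

```python
from itertools import product

# IUPAC nucleotide codes; R (A or G) is the one used in primer 1-R.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "CG", "W": "AT", "K": "GT", "M": "AC", "N": "ACGT"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def expand(primer):
    """All concrete sequences encoded by a degenerate primer."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in primer.upper()))]


def revcomp(seq):
    """Reverse complement of a concrete (non-degenerate) DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_amplicon(template, fwd, rev):
    """Return (start, end) of the first exact-match amplicon, or None."""
    template = template.upper()
    for f in expand(fwd):
        start = template.find(f)
        if start < 0:
            continue
        for r in expand(rev):
            # The reverse primer binds the opposite strand downstream.
            site = template.find(revcomp(r), start + len(f))
            if site >= 0:
                return start, site + len(r)
    return None


FWD = "GCCGCCATGGTGAAATTGGTAGA"
REV = "GCATCCATRGCTGAATGGTTAAAG"
```

In practice the template would be the assembled SSC region; a hit in L. capitata or O. bracteosa and no hit in the S. rubra control would mirror the expected PCR pattern.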

Genome structure analysis and genome comparisons
Structural analyses and genome comparisons can improve our understanding of Lauraceae plastomes. We first calculated pairwise distances among genera based on the complete sequences and visualized the similarity with a heat map generated on the ImageGP website [68]. We then calculated the percentage of variable sites in coding and non-coding regions to visualize variation at the gene level. The violin plot was generated with the R package ggplot2 3.3.5 [69]. The five newly sequenced plastomes and eight species covering the nine clades of Lauraceae were selected for structural comparison. Because plastome structure has already been explored in Perseeae and Laureae [19,32], only Persea americana was chosen as their representative here. More than one sequence was selected from Cinnamomeae to compare plastome structure within the tribe.
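The pairwise similarity matrix behind such a heat map can be sketched as below. The gap handling is an assumption (columns where both sequences have a gap are skipped, and a gap opposite a base counts as a difference), and the dictionary-keyed matrix is a hypothetical stand-in for whatever table was passed to ImageGP.

```python
from itertools import combinations
from typing import Dict, Tuple

GAP = "-"


def pairwise_identity(a: str, b: str) -> float:
    """Percent identity between two aligned sequences (gap rules assumed)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == GAP and y == GAP:
            continue  # shared gap: column not informative for this pair
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else float("nan")


def identity_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Symmetric matrix of pairwise identities keyed by (name, name)."""
    names = list(alignment)
    matrix = {(n, n): 100.0 for n in names}
    for m, n in combinations(names, 2):
        value = pairwise_identity(alignment[m], alignment[n])
        matrix[(m, n)] = matrix[(n, m)] = value
    return matrix
```

The resulting values can be written out as a square table, one row and column per taxon, for plotting.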

Divergence time estimation
The ML tree generated from the concatenated CDS dataset was used for dating analyses. We selected five macrofossils for calibration following Li et al. [24]. First, the middle Albian fossil Virginianthus calycanthoides was employed to calibrate the crown age of Laurales at the root node of the tree. We defined a minimum age of 107.7 mya according to Massoni et al. [70] and set the upper boundary age of the Albian, 113 mya, as the maximum age of this node (C1: 107.7-113 mya). Second, Jerseyanthus calycanthoides was used to calibrate the split between Calycanthus and Chimonanthus. This fossil is considered to date from the Coniacian-Santonian boundary (C2: 85.8-86.8 mya) [70]. Third, the Cretaceous fossil taxon Neusenia tetrasporangiata was applied to calibrate the stem age of Neocinnamomum, with the Santonian-Campanian boundary age (C3: 72.1-86.3 mya) [71] set as the age range of the fossil. Fourth, Alseodaphne changchangensis was applied to calibrate the crown age of the Persea group. This fossil dates from the late Early Eocene to the early Late Eocene (C4: 37-48 mya) [72]. Fifth, Machilus maomingensis was used to calibrate the stem age of Machilus. The locality of this fossil was dated to the Eocene-Oligocene boundary (C5: 33.7-33.9 mya) [73].
Dating analyses were carried out with the approximate likelihood calculation in MCMCTree (PAML 4.9j) [74]. The time unit was set to 100 million years, and the default soft tail of 2.5% was applied to the minimum and maximum bounds of all calibration points. For the root node and the nodes whose ages were well estimated (C1, C2 and C5), we used both lower and upper bounds, so that the node age falls between the calibrations with maximum probability. The remaining calibration nodes (C3 and C4) were given minimum (lower) bounds, with the offset (p) and scale parameter (c) set to 0.1 and 0.2, respectively. The substitution rate was first roughly estimated with BASEML (in PAML). Then the ML estimates of branch lengths, the gradient vector and the Hessian matrix were calculated in MCMCTree under the GTR + G substitution model (model = 7). The priors rgene_gamma and sigma2_gamma were set to G(1, 33.3) and G(1, 4.5), respectively, based on the previous estimate. A relaxed-clock model (clock = 2) was used. Two independent MCMC runs were conducted with burnin = 2,000,000, sampfreq = 100 and nsample = 100,000. Stationarity and convergence of each run were checked in Tracer v.1.7.1 [75] to ensure that all parameters had effective sample sizes (ESS) above 200.

Fig. 1
Fig. 1 Visualization of available plastomes of Lauraceae in NCBI. (A) The systematic distribution of available species; the species number and relative percentage of each clade are shown in the pie charts. (B) The systematic distribution of available plastomes; the plastome number and relative percentage of each clade are shown in the pie charts. Different clades are indicated by different colors

Fig. 2

Fig. 3
Fig. 3 Repeats of the five newly sequenced plastomes. A. Number of three types of repeats; B. Length of three types of repeats. P = palindromic repeat, D = direct repeat, T = tandem repeat

Fig. 4
Fig. 4 SSR analysis of the five newly sequenced plastomes. A. Composition of simple sequence repeat units; B. Distribution of repeats in the large single copy (LSC) region, the small single copy (SSC) region, the inverted repeat regions (IRs), the intergenic spacer regions (IGS), the coding DNA sequences (CDS) and other regions

Fig. 5
Fig. 5 Similarity plot based on pairwise comparison of plastomes from the untrimmed whole-genome alignment. Similarity scores are color-coded from white (40% sequence identity) to black (100% sequence identity)

Fig. 6
Fig. 6 Maximum-likelihood (ML) tree inferred from CDS genes. Different tribal clades are highlighted with different colors. The five newly sequenced species are indicated with a red star. UFBoot and SH-aLRT support values are shown above and below each branch, respectively. Clades with 100% support in both tests are indicated by a black circle at the node. The phylogenetic tree with branch lengths is shown on the upper left

Fig. 7
Fig. 7 Chronogram of Lauraceae estimated with MCMCTree. Blue bars on the nodes indicate the 95% HPD, the mean age of each node is indicated above the bar, and calibration nodes are shown by red circles. The five newly sequenced species are indicated with a red star. In the geological timescale, PL + Q denotes the Pliocene and the Quaternary. The six plastome types are indicated by differently colored rectangles at the tips

Table 2
Vouchers and accession numbers of the five newly sequenced plastomes in this study
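The calibration scheme described under Divergence time estimation can be expressed in MCMCTree's soft-bound notation, B(tL, tU, pL, pU) for two-sided bounds and L(tL, p, c, pL) for minimum-age bounds, with ages divided by the 100-Myr time unit. The sketch below is an illustration only: the helper names are hypothetical, and taking the younger end of each fossil range (72.1 and 37 mya) as the minimum age for C3 and C4 is our assumption. It also shows that the rgene_gamma prior G(1, 33.3) has a mean of 1/33.3 ≈ 0.03 substitutions per site per 100 Myr.

```python
TIME_UNIT_MYR = 100.0  # one MCMCTree time unit = 100 million years


def bound(min_myr: float, max_myr: float, p_lower: float = 0.025,
          p_upper: float = 0.025) -> str:
    """Two-sided soft bound B(tL, tU, pL, pU) with 2.5% tails by default."""
    t_l, t_u = min_myr / TIME_UNIT_MYR, max_myr / TIME_UNIT_MYR
    return f"B({t_l:.4g}, {t_u:.4g}, {p_lower}, {p_upper})"


def lower(min_myr: float, p: float = 0.1, c: float = 0.2,
          p_lower: float = 0.025) -> str:
    """Minimum-age bound L(tL, p, c, pL) with offset p and scale c."""
    return f"L({min_myr / TIME_UNIT_MYR:.4g}, {p}, {c}, {p_lower})"


def gamma_mean(alpha: float, beta: float) -> float:
    """Mean of a gamma prior G(alpha, beta), i.e. alpha / beta."""
    return alpha / beta


# Ages in mya from the calibration list; C3 and C4 use minimum ages only
# (assumed to be the younger end of each fossil's age range).
CALIBRATIONS = {
    "C1 crown Laurales (root)": bound(107.7, 113),
    "C2 Calycanthus-Chimonanthus split": bound(85.8, 86.8),
    "C3 stem Neocinnamomum": lower(72.1),
    "C4 crown Persea group": lower(37),
    "C5 stem Machilus": bound(33.7, 33.9),
}

if __name__ == "__main__":
    for label, prior in CALIBRATIONS.items():
        print(f"{label}: {prior}")
    rate = gamma_mean(1, 33.3)  # substitutions per site per time unit
    print(f"mean rate prior: {rate:.4f} per site per 100 Myr "
          f"(~{rate / 1e8:.1e} per site per year)")
```

The strings would be attached to the corresponding nodes of the input tree; the exact tree-file syntax should be checked against the PAML manual.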