Full-length transcriptomic identification of R2R3-MYB family genes related to secondary cell wall development in Cunninghamia lanceolata (Chinese fir)

Background R2R3-MYB is a class of transcription factor crucial in regulating secondary cell wall development during wood formation. The regulation of wood formation in gymnosperm has been understudied due to its large genome size. Using Single-Molecule Real-Time sequencing, we obtained full-length transcriptomic libraries from the developmental stem of Cunninghamia lanceolata, a perennial conifer known as Chinese fir. The R2R3-MYB of C. lanceolata (hereafter named as ClMYB) associated with secondary wall development were identified based on phylogenetic analysis, expression studies and functional study on transgenic line. Results The evolutionary relationship of 52 ClMYBs with those from Arabidopsis thaliana, Eucalyptus grandis, Populus trichocarpa, Oryza sativa, two gymnosperm species, Pinus taeda, and Picea glauca were established by neighbour-joining phylogenetic analysis. A large number of ClMYBs resided in the woody-expanded subgroups that predominated with the members from woody dicots. In contrast, the woody-preferential subgroup strictly carrying the members of woody dicots contained only one candidate. The results suggest that the woody-expanded subgroup emerges before the gymnosperm/angiosperm split, while most of the woody-preferential subgroups are likely lineage-specific to woody dicots. Nine candidates shared the same subgroups with the A. thaliana orthologs, with known function in regulating secondary wall development. Gene expression analysis inferred that ClMYB1/2/3/4/5/26/27/49/51 might participate in secondary wall development, among which ClMYB1/2/5/26/27/49 were significantly upregulated in the highly lignified compression wood region, reinforcing their regulatory role associated with secondary wall development. ClMYB1 was experimentally proven a transcriptional activator that localised in the nucleus. The overexpression of ClMYB1 in Nicotiana benthamiana resulted in an increased lignin deposition in the stems. The members of subgroup S4, ClMYB3/4/5 shared the ERF-associated amphiphilic repression motif with AtMYB4, which is known to repress the metabolism of phenylpropanoid derived compounds. They also carried a core motif specific to gymnosperm lineage, suggesting divergence of the regulatory process compared to the angiosperms. Conclusions This work will enrich the collection of full-length gymnosperm-specific R2R3-MYBs related to stem development and contribute to understanding their evolutionary relationship with angiosperm species. Supplementary Information The online version contains supplementary material available at 10.1186/s12870-021-03322-w.

molecular mechanism of wood formation in C. lanceolata was attempted by constructing a cDNA library and expressed sequence tag (EST) database of C. lanceolata var. Dugaushau [23] and a transcriptome database containing more than 80,000 sequences established using Illumina high-throughput sequencing [24]. Additionally, candidate genes related to lignin and cellulose biosynthesis were analysed in Wang et al. (2011) [25]. Despite that, identi cation and characterization of gene functions in C. lanceolate have been complicated by incomplete and redundant sequences obtained from Illumina shortread sequencing analysis. We used the PacBio long-read sequencing platform to obtain full-length transcriptomic libraries from the developmental stem of C. lanceolata. As such, the R2R3-MYBs of C. lanceolata (hereafter named as ClMYBs) were identi ed. Protein sequence characteristics, phylogenetic relationships, and expression patterns of ClMYB encoding genes were subsequently investigated. Based on their evolutionary relationship with the known genes from Arabidopsis thaliana and woody species, we identi ed nine ClMYBs associated with secondary cell wall development with some degree of diversi cation of tissue speci city compared to the angiosperm species. The involvement of ClMYB1 in secondary wall development was identi ed and characterized by ectopic expression in tobacco. These results provide essential genetic resources for improving the wood quality of C. lanceolata at the molecular level.

Plant materials
The plant materials were harvested from two-year-old living plants belonging to C. lanceolata clone 2015-2, obtained from the Forest Germplasm Resource Conservation Nursery (E119.726, N30.267) located at the State Key Laboratory of Subtropical Silviculture, Zhejiang A&F University. The nursery was maintained by our laboratory; hence no permission for the sample collection was required. To our knowledge there was no herbarium available for the deposition of the voucher specimen of this speci c plant. Plant identi cation was carried out by the co-author, Professor Zaikang Tong. Plants were grown at 26-28 o C under a light intensity of 1,500 Lux and a photoperiod of 12 h daily. Eleven different plant organs and tissues were collected from mid-March until the end of June in 2017. Among these organs, female cones (FC) and male cones (MC) were harvested from upper branches at similar heights. The roots (RT) were cut from hydroponically grown clone 2015-2 plants. Leaves (L) consisted of the one-year-old mature coniferous needles. Five stem segments were collected from the top to bottom of an ongoing-year branch, with each segment being 2 cm long and marked sequentially as S1, S2, S3, S4, and S5. Phloroglucinol-HCl staining was performed on the sequential stem cross-sections to observe vascular development (Supplementary Figure S1). The bark (B) of a two-year-old stem segment was collected and the 1.0-mm-thick xylem (X) layer was scraped from the outer xylem. All samples were quickly frozen in liquid nitrogen and stored at -80°C. Nicotiana benthamiana was used for genetic transformation for subsequent analysis of the gene functions.

Compression wood induction
An arti cial bending treatment was performed to induce compression wood [26]. In early May of 2017, stem of C. lanceolata plants that displayed relatively uniform growth were bent at an angle of approximately 45° to the vertical direction of the ground. The trunk segments at the bending site were sampled after 10, 30 and 60 days of bending. After the bark was removed, the outer xylem was scraped from the compression wood (CW) and opposite wood (OW) and was quickly frozen in liquid nitrogen.
Untreated, upright plants were used as controls (CK). Five biological replicates were tested. Transverse sections (thickness of 10 μm) were prepared with the use of stem segments harvested after 60 days of bending. To visualize the thickening of the cell wall, the sections were stained with safranin. In addition, the double-wall thickness of tracheid was measured using a modi cation of Franklin's method [27]. At least 90 tracheids from each wood type were measured using a light microscope, and the data were analysed for signi cant differences using SPSS 19.0 (IBM) with Student's t-test.

Preparation of full-length transcriptomic library
The RNA from ve stem segments were subjected to PacBio library construction and sequencing. The cDNAs were synthesized with Clontech SMARTer PCR cDNA Synthesis Kit according to the manufacturer's protocol and the PCR products were sequenced with the PacBio Sequel System. The longread sequence was analysed by Single-Molecule Real-Time (SMRT) analysis software (v. 2.3.0) using the RS_IsoSeq.1 protocol (https://github.com/Paci cBiosciences/). High-quality consensus isoforms were obtained based on the Iterative Clustering for Error Correction (ICE) algorithm and subsequent Quiver polishing. All these isoforms were further clustered and ltered using CD-Hit (v. 4.6) [28] to obtain unigenes.
Identi cation of R2R3-MYB family members in C. lanceolata The Hidden Markov Model (HMM) le for the MYB domain (PF00249) was downloaded from the Pfam database (http://pfam.sanger.ac.uk/). The putative MYB members obtained from the full-length transcriptome of C. lanceolata were identi ed by HMMER 3.0 [29]. A total of 116 putative sequences containing MYB domain were identi ed. The protein sequences were analysed via SMART (http://smart.embl-heidelberg.de/) and InterPro (http://www.ebi.ac.uk/interpro/) to examine the topological arrangement of MYB domains in the sequences. BLASTp searches in conjunction with the nonredundant (NR) library (https://blast.ncbi.nlm.nih.gov/Blast.cgi) were used to aid in the annotation of the sequences. As a result, 70 sequences harbouring R2R3-MYB domain were identi ed. ClustalX multiple sequence alignment tool (v. 2.1) [30] was employed to remove redundant sequences. High similarity sequences were ltered-out and as a result, 52 putative R2R3-MYB sequences remained in the dataset for further analysis.
Protein sequence analysis and phylogenetic tree construction The ExPASy database (https://expasy.org) was used to analyse the physicochemical properties of R2R3-MYBs in C. lanceolata. Multiple sequence alignments were performed using ClustalX, and sequence logos of the R2 and R3 domains were plotted via the WebLogo (http://weblogo.Berkeley.edu/logo.cgi). The motifs of the R2R3-MYBs in C. lanceolata were analysed using the MEME program (http://memesuite.org/tools/meme). The motif width was set to 6-50, and the number of motifs was 20, with the default values for the remaining parameters.
Expression pro le analysis of ClMYBs based on RNA sequencing (RNA-Seq) and quantitative real-time PCR (qRT-PCR) The RNA from various organs and tissues such as female cone (FC), male cone (MC), leaves (L), roots (RT), ve stem segments (named as S1to S5), bark (B) and xylem (X) were used to construct RNA-Seq libraries with three biological replicates were included. Total RNA from C. lanceolata samples was extracted according to the PureLink Plant RNA Reagent Kit manual (ThermoFisher Scienti c, USA). Paired-end 150-bp sequencing was performed on the Illumina HiSeq Xten platform. The transcript abundance of the ClMYBs in the different organs and tissues was reported in terms of fragments per kilobase of exon model per million mapped reads (FPKM). After they were log-transformed, the expression data were subjected to hierarchical clustering, and an expression pro le heatmap was constructed by TBtools software [32].
The qRT-PCR analysis was performed using a PrimeScript Reagent Kit in conjunction with a SYBR Premix Ex Taq Kit (Takara, Japan). The qRT-PCR system consisted of 5 μl of SYBR Premix Ex Taq (2×), 0.2 μl of forward primer (10 μM), 0.2 μl of reverse primer (10 μM), 0.8 μl of the rst-strand cDNA, and enough double-distilled water (ddH 2 O) to bring the total volume of 10 μl. The reaction was performed with the following the temperature program: 95°C for 2 min; 40 cycles of 95°C for 5 sec, and 60°C for 30 sec. The data obtained were analyzed by the 2 -ΔΔCt method, with the housekeeping gene ClActin used as an internal control. All calculations were performed via CFX96 Manager version 1.6 (Bio-Rad, USA) and Excel 2007 software. The qRT-PCR primer sequences are listed in Supplementary Table S1.

Subcellular localization, transcriptional activation assay and ectopic expression analysis of ClMYB1
The open reading frame (ORF) of ClMYB1 was ampli ed by PCR with gene-speci c primers and then inserted into a pCAMBIA1302 vector by restriction enzyme ligation, yielding a ClMYB1::sGFP (synthetic green uorescent protein) fusion expression cassette, which was transformed into onion epidermal cells via the particle bombardment method. The uorescence signals were observed and imaged via a Zeiss LSM510 confocal microscope (Carl Zeiss, Oberkochen, Germany).
The ORF of ClMYB1 without its termination codon was inserted into a pGBKT7 vector (Clontech, USA) harbouring GAL4-BD via Gateway technology. The recombinant plasmid was subsequently transformed into the yeast (Saccharomyces cerevisiae) strain Gold2. Positive clones were selected on SD/-Trp media, and the transformants were plated and cultured on SD/-Trp/-His/-Ade/X-a-gal media for transcriptional activation analysis.
ClMYB1 was inserted into a pCAMBIA13011 vector downstream to the 35S promoter by restriction enzyme digestion and ligation. This p35S::ClMYB1 expression cassette was transformed into N. benthamiana via the leaf disc method, mediated by Agrobacterium tumefaciens strain GV3101. Phenotypic analysis of two-month-old transgenic plants (T3) was performed. The DNA was extracted by the cetyl-trimethylammonium bromide (CTAB) method, and the transformation events of ClMYB1 were analysed via PCR. Total RNA was extracted via a TRIzol kit (ThermoFisher Scienti c, USA), and the expression of ClMYB1 was measured via semiquantitative RT-PCR (semi-qRT-PCR) as described by [24]. The relevant primer sequences are listed in Supplementary Table S1. Phloroglucinol-HCl staining was applied to observe the lignin deposition in the stems, and the relative lignin content in the stems was determined according to the lignin-thioglycolic acid (LTGA) method [33].

Identi cation and domain analysis of ClMYBs
A total of 116 candidate sequences with MYB domains were selected from the full-length transcriptomic library, including three 4R-MYB, seven 3R-MYB, 70 R2R3-MYB, and 36 1R-MYB proteins. After the removal of redundant sequences, we identi ed 52 R2R3-MYB with complete ORF, which were sequentially named as ClMYB1-52 (Supplementary Table S2).
A comparative phylogenetic study of R2R3-MYBs in C. lanceolata and other plant species To analyse the evolutionary relationship of ClMYBs and known R2R3-MYBs from A. thaliana, E. grandis, P. trichocarpa, O. sativa, P. taeda, and P. glauca, we constructed the neighbour-joining phylogenetic tree ( Fig. 2; Supplementary Figure S2). According to the topological structure, all the proteins could be divided into 47 subgroups, 21 of which harboured R2R3-MYBs from C. lanceolata. Over 50% of the subgroups were supported by bootstrap value of > 70%. Lower bootstrap values were found in the subgroups with mixed sequences of gymnosperm and angiosperm origin. This can be justi ed because many gene duplication events in the R2R3-MYBs occurred after divergence from a common ancestor [34] and rapid diversi cation of the family after the division [35]. Nomenclatures of the subgroups were previously reported for A. thaliana [8,36], later these were modi ed by   [37] following a genomewide evolutionary analysis of the R2R3-MYBs from the woody species, E. grandis. The sequences of A. thaliana and E. grandis were correctly placed in the previously assigned subgroups. Three C. lanceolata speci c subgroups were identi ed and they were named as subgroup 13b and ClMYB15, which harboured single candidate, ClMYB24 and ClMYB15, respectively. Subgroup ClMYB17 contained two candidates, ClMYB17 and ClMYB18. Furthermore, subgroup 9c containing ClMYB28 and PgMYB11 represented a gymnosperm speci c lineage.
From the evolutionary analysis of R2R3-MYBs from E. grandis, MYBs of strictly woody species are detected in a total of ve woody-preferential subgroups [12]. In the present study, we found that only ClMYB21 present in the woody-preferential subgroup III (Supplementary Figure S2). Subgroup 5 (S5) and AtMYB5 (SAtM5) contained a high proportion of MYBs from woody perennial plant species and were referred to as woody-expanded subgroups, with eight and six ClMYBs assigned to these two subgroups, respectively. Motif analysis of R2R3-MYB proteins of C. lanceolata In addition to the R2 and R3 domains, 17 conserved motifs were identi ed via MEME; these motifs were designated as motifs 1 to 17 in increasing order of their corresponding E-values ( Fig. 3; Supplementary Data S1). Motif 1 was upstream of the R2 domain, and the other motifs were downstream of the R3 domain (Fig. 3B). Among the different subgroups, the amount and type of motifs were signi cantly different. Most of the members belonging to the same subgroup shared one or more motifs (Fig. 3A). For example, ClMYB8, ClMYB9, ClMYB11 and ClMYB29 belonging to S22 contained motif 5, motif 8, and motif 17. Motif 15 carrying the core sequence of (E/I)LIHMADFFQ was unique to ClMYB3, ClMYB4 and ClMYB5 in the subgroup S4. Whereas, motif 17 with the sequence of (D/E)(I/V)NLDL(C/S)ISLP was the ERF-associated amphiphilic repression (EAR) motif that common between angiosperm and gymnosperm in subgroup S4 and dictated the repressive role of the members on the phenylpropanoid metabolism [59].
Expression pro les of ClMYBs in different organs and tissues To explore the involvement of ClMYB in the secondary wall development, expression pro les of the 52 ClMYB encoding genes in the six organs and tissues were analyzed using the available transcriptome data. Forty-three ClMYB were expressed in all the organs and tissues tested (FPKM > 0), eight of which (ClMYB5, 8, 9, 11, 17, 23, 25, and 47) were constitutively expressed (FPKM > 2 in all samples) ( Fig. 4; Supplementary Table S3). Compared with the other ClMYBs, ClMYB9 showed relatively higher expression levels in the organs and tissues examined, with a corresponding mean FPKM value of 100.4. In some cases, the expression pro les of closely related members were similar. For example, the expression patterns of ClMYB22, ClMYB23, and ClMYB25 in the subgroup SAtM5, one of the woody-expanded subgroups, they were clustered in two sub-clades that were highly expressed in female cone. However, in most cases, the expression pro les of the genes present in the same subgroup differed, suggesting their speci c roles in different organs. This notion was manifested by ClMYB14 and ClMYB45, though they also belonged to subgroup SAtM5, they were highly expressed in the male cone. Likewise, the expression of another SAtM5 ortholog, ClMYB20 was highly expressed in root and bark tissues.
The expression pro les of the nine candidates that might involve in regulating secondary wall development in ligni ed tissues could be divided into three categories: ClMYB1, ClMYB26, ClMYB27, ClMYB49, and ClMYB51 were dominantly expressed in the stem xylem. (ClMYB3 and ClMYB5) and (ClMYB2 and ClMYB4) were segregated in sub-clades that exhibited a high expression in the root and female cone, respectively. Since the R2R3-MYBs may act as an activator or repressor to secondary wall development, we followed their expression pro le in different stem segments with maturation increasing from S1 to S5 (Fig. 5, Supplementary Table S3). For the xylem speci c regulators (ClMYB1, ClMYB26, ClMYB49, and ClMYB51), their expression increased congruent with stem maturation; except ClMYB27 that its expression was indifferent in the developmental stem tissues. The expression of ClMYB3, ClMYB4, and ClMYB5 in the stems tended to decrease with stem maturation, suggesting they may play negative regulatory roles in wood formation. Interestingly, we found that although ClMYB21 belonged to the woody-preferential subgroup III, its gene expression was not signi cantly different in the developmental stems. The RNA-Seq data were further veri ed by qRT-PCR experiments on eight representative samples for 12 selected MYB genes (Supplementary Figure S3). Correlation coe cients of 0.845-0.956 (p < 0.01) indicated that the gene expression patterns obtained from the RNA-Seq data were consistent with the qRT-PCR results.

Expression patterns of ClMYBs involved in compression wood development
Trees develop reaction wood in response to a gravistimulus. In gymnosperms, reaction wood usually develops on the lower side of leaning stems or branches and is referred to as compression wood. Typical features of C. lanceolate compression wood could be observed when plants were mechanically bent at a 45° angle from the vertical axis. The tracheid walls of compression wood were signi cantly thicker than those of normal wood (Fig. 6A, Supplementary Data S2) and contained more lignin but less cellulose [60]. By analysing the expression of ClMYBs encoding genes associated with secondary wall development in the compression wood, we could follow how they were coordinated or reprogrammed that might cause perturbation of anatomical and chemical features of woody tissues in Chinese r (Fig. 6B, Supplementary Data S3).
Xylem speci c ClMYBs generally exhibited increased expression level in compression wood as the bending period prolonged albeit with minor differences. The expression level of ClMYB26, ClMYB27 and ClMYB49 exhibited similar expression patterns, as the expression level in the compression wood was signi cantly greater than that in the upright wood and reached its maximum after 60 days of bending. The expression level of ClMYB1 in the compression wood was rst increased but then decreased with the progression of the bending; also, these levels were signi cantly higher than those in the upright wood. No signi cant difference in expression levels between the opposite wood and upright wood, except that ClMYB49 in opposite wood had shown a signi cant higher expression level than that in the upright wood. The expression level of ClMYB51 in both the compression wood and opposite wood uctuated and did not exhibit a distinct increment in the compression wood as that shown by other xylem speci c MYBs.
The responses of the potential negative regulators ClMYB3, ClMYB4, and ClMYB5 to bending were also different from each other during compression wood induction. The expression level of ClMYB3 in the compression wood was signi cantly lower than that in the control, but the expression level in the opposite wood was not signi cantly different from that of upright wood. The expression level of ClMYB4 in the compression wood decreased signi cantly during the rst 10 days and then returned to values similar to that in the upright wood, while the corresponding values of the opposite wood were signi cantly lower than those of the control and were 0.45 and 0.44 at 30 and 60 days after bending, respectively. Additionally, the expression of ClMYB5 in the compression wood gradually increased, reaching a maximum (2.11) after 60 days of bending. Only after ten days of bending, the expression of this gene in the opposite wood was signi cantly different from that in the control.

Subcellular Localization And Transcriptional Activity Analysis Of Clmyb1
The phylogenetic and gene expression analysis suggested that the function of the xylem speci c ClMYBs related to the regulation of secondary wall development might be conserved in C. lanceolata. To verify the protein function, we examined the functionality of ClMYB1 (GenBank accession number: JQ904045) by subcellular localization and overexpression in tobacco plant. ClMYB1 was highly homologous to PtMYB1 and PtrMYB152 (Fig. 7A) and was expressed predominantly in the developing stem, suggesting that this gene participated in secondary wall development in C. lanceolata. The subcellular localization showed that the control GFP was distributed throughout the whole epidermal cell, while the ClMYB1:GFP fusion protein accumulated exclusively in the nucleus (Fig. 7B). The results indicated that ClMYB1 is localised in the nucleus, which is consistent with its function as a transcription factor. As shown in Fig. 7C, the yeast transformants harboring the negative control plasmid pGBKT7 and the recombinant plasmid pGBKT7-ClMYB1 were both able to grow on SD/-Trp media, but only the yeast containing the recombinant plasmid grew well on the selection media and induced the expression of the α-galactosidase reporter gene, indicating that ClMYB1 is a transcriptional activator. Taken together, these results indicated that ClMYB1 functions as a transcriptional activator in the nucleus.

Ectopic expression of ClMYB1 in N. benthamiana
To investigate its regulatory functions in secondary vascular development, ClMYB1 gene was successfully transformed into N. benthamiana. Eight putative transgenic lines were obtained, and two ClMYB1-overexpressed lines were selected for in-depth characterization. PCR revealed that ClMYB1 had been successfully introduced into the N. benthamiana genome and was transcribed in transformed lines (Fig. 8B). Compared with the wild-type (WT), the transgenic lines exhibited smaller leaves with an undulate margin and a smaller angle between the petiole of the upper leaves and the main stem (Fig. 8A).
Cross-sections of the top, middle and bottom parts of the stems were prepared for phloroglucinol-HCl staining. No signi cant difference was observed from staining results between the upper stem segments of the transgenic lines and WT, but darker staining was observed in the xylem regions of the middle and bottom stem segments from transgenic lines (Fig. 8C). A subsequent assay showed that the lignin content of the stems from transgenic lines was signi cantly higher than that of wild type (Fig. 8D, Supplementary Data S4). These results indicated that ClMYB1-overexpression could promote lignin deposition in the xylem regions of stems.

Discussion
Large genome size and limited gene information severely hinder the study of molecular mechanisms on critical biological processes related to organ development and stress resistance in many valuable coniferous timber species. Using an advanced sequencing technology, 52 R2R3-MYB encoding genes with complete ORFs were obtained from the full-length transcriptomic libraries of C. lanceolata. Although not as many as in Arabidopsis (126 AtMYBs) and Populus (187 PtrMYBs), the number of identi ed ClMYBs were distinctly greater than that previously reported in the conifers such as P. taeda and P. glauca [6,59]. This work effectively updated and improved the collection of full-length gymnosperms speci c R2R3-MYB transcription factors related to stem development.
Evolutionary relationship of R2R3-MYBs from C. lanceolata R2R3-MYBs are crucial in the regulation of plant development [8]. The number of R2R3-MYBs increased gradually in basal land plants and eventually expanded rapidly after the emergence of seed plants (spermatophyte) [21]. Depending on the mechanism of gene duplication, some duplicated genes may undergo sub-or neo-functionalization that implicated in the diversi cation of gene function [61]. The comparative analysis of gene mapping on the spruce genome with those of monocots and A. thaliana suggests that the number of gene duplication shared by angiosperm and gymnosperm is greater than that of gymnosperm speci c gene expansion [62]. Similarly, the R2R3-MYBs from C. lanceolata belonged to most subgroups shared by monocot and dicots, indicating that these subgroups might share the common ancestor. Only a small number of subgroups were speci c to gymnosperm lineage. There is an excellent scope for characterization of the candidate genes to hypothesize new functions related to the coniferous developmental growth. The dominant expression of gene encoding ClMYB24 of subgroup S13b in xylem tissues, inferred it a potential regulator of stem development in gymnosperms.
It has been shown that the divergence between monocot and dicotyledonous species, which accompanied by whole-genome duplications [63], resulted in the speciation of subgroups limited to monocots and/or dicotyledonous lineages [9,12]. Out of the ve woody-preferential subgroups identi ed by Soler et al., (2015) [12], only one of them harboured the ClMYB21 from C. lanceolata. In contrast, the woody-expanded subgroups that were typically enriched in R2R3-MYBs of woody species, but also carrying one or two sequences from A. thaliana or O. sativa, contained a high number of sequences from C. lanceolata. In particular, the woody-expanded subgroup SAtM5 contained up to six ClMYBs, with their expression being variably expressed in different part of organs. There are two possible reasons for the absence of ClMYBs in majority of the woody-preferential subgroup: 1) the corresponding transcripts were not detected in the present transcriptomic libraries; 2) most of the woody-preferential subgroups have emerged after the gymnosperm/angiosperm split. The overrepresentation of ClMYBs in the woodyexpanded subgroups suggested that the subgroup appears predating the divergence of angiospermgymnosperm and underwent asymmetric gene duplication and diversi cation in woody species [20].

Involvement of ClMYBs in the regulation of secondary wall development
The ClMYBs resided in some of the A. thaliana secondary cell wall-related subgroups based on the phylogeny analysis. We also observed their expression in different organ and developmental stem, as well as compression wood to justify their involvement in secondary wall development. It was hypothesized that the functional role of the xylem speci c ClMYB1/26/49/51 might be directly involved in the upregulation of secondary wall development in the stem of C. lanceolata. Their categorisation with those of A. thaliana and woody species was supported by a high bootstrap value. ClMYB1 and ClMYB51 were clustered in subgroup SAtm85 (bootstrap value 83%), which included several MYBs related to secondary wall development, such as AtMYB85, PtrMYB152, and PtMYB1. AtMYB85, a target gene of AtMYB46/83, regulate lignin biosynthesis in A. thaliana [40] and is highly expressed in the xylem of in orescence stems and roots [64]. PtrMYB152 is a transcriptional activator speci c for lignin biosynthesis in P. trichocarpa and is expressed mainly in ligni ed tissues [65]. MYB1 of P. taeda activates the transcription of genes encoding lignin biosynthetic enzymes by binding with AC elements and is mainly expressed in developing xylem [19]. ClMYB1 and ClMYB51 were expressed mainly in the developing xylem of the stem, and their expression levels in the stem increased with ligni cation. In response to the bending treatment, the expression of ClMYB1 was upregulated, corresponding to the lignin content in the compression wood. To date, no available genetic transformation system has been established for C. lanceolata, so the biological functions of most candidates involved in the wood formation of this species remain mostly unexplored. In this study, we selected ClMYB1 for further functional characterization, based on subcellular localization and transcriptional activity analysis.
Functional analysis by overexpression of ClMYB1 in N. banthamiana indicated that it could increase lignin deposition in the stems. The results indicated the retention of function between ClMYB1, AtMYB85 and PtrMYB152, although the conservation of their activators and downstream targets remains to be studied further.
The expression patterns of ClMYB49 in different organs, tissues and during the formation of compression wood were essentially consistent with those of ClMYB1, indicating that they may also be involved in the regulation of secondary wall development in C. lanceolata. The sequences of ClMYB49 were segregated in subgroup SAtm46 together with PtMYB4 and AtMYB46/83 (bootstrap value 91%).
AtMYB46/83 activated the transcriptional activity of secondary wall biosynthesis and was directly controlled by the NAC domain-containing SND1 master regulator in A. thaliana [39,52,66]. PtMYB4 was expressed mainly in the xylem and its protein can bind to AC elements. The overexpression of this gene in tobacco lead to an increase in lignin content [18]. ClMYB26 was clustered in the subgroup SAtM103 with AtMYB103, PtrMYB010, and PtrMYB128 (bootstrap value 97%). Intriguingly, AtMYB103 in uenced syringyl-lignin synthesis by regulating the expression of ferulate-5-hydroxylase (F5H) gene [40,67]. Slignin is typically enriched in the secondary wall of dicots, while gymnosperms have only a small amount of S-lignin [68]. The regulatory process of ClMYB26 will be subjected to future investigation to identify the downstream targets and the in uence on lignin composition. The speci city of the regulatory mechanism of ClMYB26 and AtMYB103 towards the biosynthesis of monolignol would clarify whether divergence of function has occurred after the angiosperm/gymnosperm split.

Repressive Role Of Clmybs In Subgroup S4
The expression of ClMYB3, ClMYB4, and ClMYB5 in the stems decreased with increasing ligni cation, and these genes clustered with AtMYB4 and EgrMYB1 in Compared with angiosperm, subgroup S4 was asymmetrically expanded in gymnosperm species such as P. glauca and P. taeda [59]. ClMYB3/4/5 carried a unique motif with core sequence of (E/I)LIETADFFQ.
The similar motif was also detected when aligning with the P. glauca orthologs (Pg5/10/13). Still, none were found in the orthologs of O. sativa, E. grandis, and P. trichocarpa, suggesting this motif was speci c to gymnosperm lineage. Since the role of this unique motif is unknown, the regulatory process involving ClMYB3/4/5 may en-route differently, though they share the repressive EAR motif responding to external stimuli and hormonal signals [69] with angiosperms. Moreover, in response to the bending treatment, ClMYB3 and ClMYB5 were down-and upregulated respectively with the development of compression wood. These all observations suggested that they might be recruited differently in the regulatory process, hence targeting different downstream targets, and playing different roles in wood formation responding to defence and stress stimulation. There was no additional external funding received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Availability of data and material
The raw sequence data reported in this paper have been deposited in the Genome Sequence Archive [70] in BIG Data Center [71], Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, under accession numbers CRA003755 that are publicly accessible at http://bigd.big.ac.cn/gsa. The datasets supporting the conclusions of this article are included within the article and its supplemental les.
Ethics approval and consent to participate  Figure S1 Cross-sections stained with Phloroglucinol-HCl for the different ligni cation stems of C. lanceolata. Five stem segments from the top to bottom of a one-year-old branch were collected and marked sequentially S1, S2, S3, S4, and S5. Figure S3 Expression pro les of 12 ClMYB genes in eight organs and tissues using qRT-PCR. Data were normalised to ClActin gene. The expression level in S1 was used as a reference and was set at 1. Error bars indicate standard deviation of three biological replicates. The corresponding FPKM values were listed. The r represented the correlation coe cient between qRT-PCR and RNA-Seq data, and the p was the P-value. L: leaves, S1-5: stem segments from top to bottom, X: xylem, B: bark.
Table S1 Primers used in the analysis of qRT-PCR, subcellular localization, yeast hybridization, overexpression, and semi-qRT-PCR. Data S1 Descriptions and sequence logo plots of motifs in ClMYB proteins.
Data S2 Tracheid wall thickness of different samples.