The Transcriptome of Cunninghamia lanceolata male/female cone reveal the association between MIKC MADS-box genes and reproductive organs development

Background Cunninghamia lanceolata (Chinese fir), a member of the conifer family Cupressaceae, is one of the most popular cultivated trees for wood production in China. Continuous research is being performed to improve C. lanceolata breeding values. Given the high rate of seed abortion (one of the reasons being the failure of ovule and pollen development) in C. lanceolata, the proper formation of female/male cones could theoretically increase the number of offspring in future generations. MIKC MADS-box genes are well-known for their roles in the flower/cone development and comprise the typical/atypical floral development model for both angiosperms and gymnosperms. Results We performed a transcriptomic analysis to find genes differentially expressed between female and male cones at a single, carefully determined developmental stage, focusing on the MIKC MADS-box genes. We finally obtained 47 unique MIKC MADS-box genes from C. lanceolata and divided these genes into separate branches. 27 out of the 47 MIKC MADS-box genes showed differential expression between female and male cones, and most of them were not expressed in leaves. Out of these 27 genes, most B-class genes (AP3/PI) were up-regulated in the male cone, while TM8 genes were up-regulated in the female cone. Then, with no obvious overall preference for AG (class C + D) genes in female/male cones, it seems likely that these genes are involved in the development of both cones. Finally, a small number of genes such as GGM7, SVP, AGL15, that were specifically expressed in female/male cones, making them candidate genes for sex-specific cone development. Conclusions Our study identified a number of MIKC MADS-box genes showing differential expression between female and male cones in C. lanceolata, illustrating a potential link of these genes with C. lanceolata cone development. On the basis of this, we postulated a possible cone development model for C. lanceolata. The gene expression library showing differential expression between female and male cones shown here, can be used to discover unknown regulatory networks related to sex-specific cone development in the future.

(Continued from previous page) discover unknown regulatory networks related to sex-specific cone development in the future.
Keywords: Cone development, C. lanceolata, transcriptome, MADS-box gene, Floral development model Background Cunninghamia lanceolata (Lamb.) Hook (Chinese fir) is one of the most commercially important timber trees in the south of China. It has been cultivated for thousands of years because of its outstanding wood properties and high growth rate [1], with its plantation area accounting for 24% of the total plantation area in China [2]. C. lanceolata is a monoecious conifer species, with female cones distributed in the upper and middle crown and male cones distributed in the middle and lower crown [3]. C. lanceolata female/ male cones differ greatly from classic angiosperm flowers (Fig. 1a). Specifically, the female cone is comprised of bract-scales with ovules produced at their base. The bract-scales will gradually open and ovules will then receive pollen to complete fertilization ( Fig. 2b-c) [8]. In C. lanceolata, male cones are aggregated into a compound structure consisting of several strobili, each of which is wrapped by many microsporophylls that contain the pollen sac, which release pollen grains after they have matured ( Fig. 2a and c) [8]. Due to its high commercial value, C. lanceolata is continuously improved by breeders. One of the important breeding goals is to increase its reproductive efficiency; as~50-70% of its seeds may be abortive, a fundamental reason for its low germination rate [9]. Why seeds abort is not yet fully understood; two possible causes are pollen abortion and abnormal ovule development [10]. To understand the underlying processes causing seed abortion, it is therefore necessary to study the molecular mechanisms of flower/cone development.
Flower development is a complex biological process and is affected both by genetic and environmental factors [11]. In angiosperms, the classic homeotic ABC model explains how local gene expression is able to control flower identity (Class A genes, SQUA/AP1; Class B genes, AP3, PI; Class C genes, AG) [12]. After its initial conception, the ABC model was later expanded upon by adding class D (Class D genes, SHP, STK) [13] and E (Class E genes, SEP1, 2, 3, 4) [14] genesthe ABCDE modelas more genes were found to be involved. The model works as such that unique combinations of each homeotic gene class (A to E) that are expressed in a certain region of the developing flower, give rise to a specific flower tissue type [5]. For example, class A and E are involved in sepal formation, the combination of ABE affects petal development, BCE controls stamen formation, CE affects carpel development and DCE is involved in establishing ovule (Fig. 1a) [15,16]. However, in gymnosperms, the development of female and male cones is assumed to be controlled by tetramers of class B and C proteins only [5,6]. However, still two alternative perspectives on the gymnosperm floral homeotic model  [4]. b B(C) model was proposed to control the development of male and female cones in gymnosperms, (C) indicates C + D [4][5][6]. c Two type of MADS-box proteins are shown: type I (SRF-like) and type II (MEF2-like). "The scale indicates the number of amino acids of the protein. The "?" indicates that the C-terminal is not well defined yet" [7] (redrawn from Fig. 1(https://www.pnas.org/content/97/10/5328), Copyright (2000) National Academy of Sciences, U.S.A.) exist: the B(C) system (Fig. 1b) and the (A)B(C) system, in which (C) represents class C + D, (A) represents class A + E [5].
Excepting the AP2 gene, genes belonging to the ABCDE model are members of the MADS-box gene family, which have crucial functions in floral organ development [17]. All MADS-box proteins harbor a highly conserved domain, the MADS domain, which can be grouped into two main lineages: type I (SRF-like) and type II (MEF2-like), based on sequence conservation [7]. Both MADS lineages can be found in plants, animals and fungi. However, there are some special structures, such as the K domain, that are only found in the type II MADS-box genes of plants ( Fig. 1c; redrawn from Fig.1 (https://www.pnas.org/content/97/10/5328) "Copyright (2000) National Academy of Sciences, U.S.A." ) [7]. So far, type II MADS-box genes have been more thoroughly studied for their functions in plant flower development [18]. A distinguishing feature of type II MADSbox genes in plants is that they harbor three more domains than type I MADS genes: an intervening (I) domain, a keratin-like coiled-coil (K) domain, and a Cterminal (C) domain (Fig. 1c) [19]. The highly conserved MADS domain is one of the main features of this gene family, which determines DNA binding and dimerization [20]. The K domain likely mediates protein-protein interactions, and is possibly also involved in the direct interaction with other proteins [20]. The MADS domain and K domain are linked by a short intervening I domain [21]. In some MADS-box proteins the C-terminal region is involved in the transcriptional activation or ternary complex formation [22,23]. These genes are classified as MIKC-type and are specific to plants [7].
Previous research in C. lanceolata has mainly focused on the regulation of cambial activity [24], EST-SSR markers development [25][26][27], genes associated with growth and development [28], cellulose and lignin biosynthesis [29] and proteome analysis of early seed development [1]. Until now, little is known about the molecular mechanisms of its female/male cones development. Here, we conducted an RNA-Seq transcriptomic approach to identify genes that are differentially expressed between immature female and male cones of C. lanceolata. This study provides a valuable resource for gymnosperm cone development-related genes and may aid in breeding trees with increased seed numbers in upcoming Chinese fir improvement programs.

Plant material
Immature female and male cones were collected in late February from two different living trees (No.3-15-31, No.4-9-31) that belong to a single C. lanceolata clone (6421 ) [24]. The trees were located at the Yangkou forest station of the Chinese fir National Germplasm Bank in Fujian, China. This station has a cooperative relationship with Nanjing Forestry University. To avoid the impact of sample differences, the female/male cones were sampled at a similar state and height in the trees. The collecting state of the female cone is that covered with green scale leaves and slightly opened bracts (Fig. 2b), while the male cones are consisting of several strobili (Fig. 2a). At this time, the ovule is already appeared but not fully formed, and so as the pollen. The author Renhua Zheng was responsible for the formal identification of the samples. However, to our knowledge, there is no herbarium to deposite the voucher specimen of this specific , integument (f'), nucellus (g') [8] material. For transcriptomic analysis, female/male cones were immediately frozen in liquid nitrogen and stored at − 80°C until RNA extraction. For SEM analysis, fresh cones were collected and fixed using 2.5% glutaraldehyde (0.1 M PBS, pH 7.2). All materials were obtained with permission.

Scanning electron microscopy
Female/male cones fixed in 2.5% glutaraldehyde were flushed with 0.1 M phosphate buffer, dehydrated using a series of graded ethanol solutions, dried using a critical point dryer (K850, EMITECH, England), mounted with double-sided adhesive tape on stubs, and coated with aurum in a sputter coater (E-1010, HITACHI, Japan). Samples were observed on a Quanta 200 scanning electron microscope (FEI, America) [30].

RNA extraction and mRNA library construction
An ethanol precipitation protocol and CTAB-PBIOZOL reagent was used for the purification of total RNA according to the manufacturer's instructions. Total RNA was quality controlled and quantified by a NanoDrop and Agilent 2100 bioanalyzer (Thermo Fisher Scientific, MA, USA). Oligo (dT)-attached magnetic beads were used to purify mRNA. mRNA was then fragmented, after which first-and secondstrand cDNA was generated using the First Strand reaction system. Afterwards, the purified cDNA was ligated to specific adapter sequences. Then, cDNA fragments were amplified by PCR, then purified using Ampure XP Beads. An Agilent 2100 Bioanaylzer and ABI StepOnePlus Real-Time PCR System were used for quantification and quality control of the sample library. The library was then sequenced using an Illumina HiSeq 4000 platform (BGI-Shenzhen, China) (reads length 151 bp). Sequenced reads were deposited in the NCBI Sequence Read Archive (SRA) with the accession number SRR10161401, SRR10161402, SRR10161403, SRR10161404.

Transcriptome data assembly and functional annotation
The raw data was first filtered to obtain high-quality clean data. Adapter sequences, low-quality reads (we define a low quality read as having more than 20% of its bases with a quality score below 10) and reads with more than 5% of their bases unknown were removed from the raw reads. Clean reads were then quality controlled by FastQC v0.11.7 [31]. Clean reads extended into contigs through the overlap between sequences by running Trinity (v2.0.6) [32]. Then, according to pairedend sequence information, contigs were assembled into transcript sequences.
Coding regions of assembled unigenes were annotated by mapping them to several public databases, respectively, using TransDecoder, after which a blastp algorithm [33] was run against uniprot_sprot [34] and HMMER databases with Pfam-A.hmm (Hidden Markov Model) [35] to identify conserved proteins.
Functional annotation of these sequences was performed by running blast against protein sequences from Arabidopsis thaliana, Populus trichocarpa, Oryza sativa, and Swiss Prot [34]. The final Gene Ontology (GO) [36] annotation result merged data from both A. thaliana and P. trichocarpa. Due to our interest in transcription factors (TFs) specifically, a gene type parameter was added to the annotation process. In all cases, the BLAST algorithm [33] was applied with an E-value parameter not greater than 10 − 5 .

Differential expression analysis
Gene expression levels were estimated by mapping clean reads to the Trinity transcript assembly using RSEM [37] for each sample. The abundance of all genes was normalized and calculated using uniquely mapped reads via the FPKM method [38]. The software edgeR [39] was used to identify differentially expressed genes (DEGs). The resulting P-value thresholds were adjusted for false discovery rate (FDR) via a multiple testing approach [40]. The condition for filtering significantly differentially expressed genes (up-and down-regulated genes) was FDR < 0.01 & fold change > 2. An R package was used for visualization of results and read dispersion. Significantly DEGs were also subjected to a GO enrichment analyses through the TopGO R package [41]. To detect which transcriptional factor families were significantly enriched (P-value < 0.01) at this developmental stage, a Chi-square test was used.

Identification of MADS-box transcription factors and MADS-box DEGs and phylogeny reconstruction
To identify C. lanceolata MADS-box sequences, two reported Hidden Markov Model profiles SRF (PF00319) and K-box (PF01486) were obtained from Pfam [35]. Using HMMER software [42] with these two profiles and filter condition E-value ≤1.0E-04 candidate sequences were obtained, then further verified sequences using SMART [43].
To faithfully identify differentially expressed MADSbox genes in female/male cones of C. lanceolata, lowly expressed MADS-box genes were removed from the DEG list by the edgeR analysis package [39], leaving 27 MADS-box unigenes. R packages pheatmap (1.0.12) and MEGA 7.0 were used to analyze expression levels and construct phylogenetic tree, which shown as heatmap clusters. Sequence raw data of one-year-old leaves [44] were downloaded from the NCBI Sequence Read Archive (SRA) database (SRX2586190) to use as a vegetative organ expression comparison.
MADS-box sequences of A. thaliana, O. sativa and Vitis vinifera were obtained from the Plant Transcription Factor Database (http://planttfdb.cbi.pku.edu.cn/index. php), while sequences of Cryptomeria japonica, Picea abies, Pinus taeda (http://congenie.org) were gained separately from three articles by Futamura et al. [45], Carlsbecker et al . [46] and Chen et al. [47]. All the reference sequences were listed in additional file 1. Subsequently, full length of multiple sequence were aligned using MAFFT [48], after which the RAxML v8.2.11 [49] was used to construct a phylogenetic tree with the PROT-GAMMAAUTO mode and 100 bootstrap replications. To support phylogenetic analysis, the alignment of MADS-box genes M, I, K, C domain in V. vinifera, P. abies, P. taeda, C. japonica, and C. lanceolata were selected and showed by Texshade [50].

qRT-PCR analysis
Several MADS-box genes were selected to validate our DEGs detection. Total RNA was obtained from immature female/male cones using a Bioteke plant total RNA extraction kit (RP3301), only replacing the lysis buffer by CTAB. Total RNA integrity was determined by gel electrophoresis (1% gel) and RNA concentration was measured using a Nanodrop-2000 spectrophotometer (Thermo, Inc.). cDNA was synthesized through a reverse transcriptase approach using the Vazyme HiScript 1st Strand cDNA Synthesis Kit(R211-02), then quantified using a Qubit 2.0 (Invitrogen). Quantitative real-time PCR (qRT-PCR) reactions were performed in triplicate using the Vazyme AceQ qPCR SYBR Green Master Mix (without ROX) (Q121-02) on a LightCycler 480 II (Roche). Gene expression analysis was performed based on three technical and biological replicates and normalized with the reference gene CleIF3. Expression data were calculated through the Livak calculation method, and show as log (2 -ΔΔCt ) [51].

Results
Female and male cones development in C. lanceolata Seed abortion is a non-negligible aspect of C. lanceolata breeding. To improve breeding values of C. lanceolata, it is necessary to study the molecular mechanisms of cone development. This is only exacerbated by the fact that C. lanceolata is a gymnosperm, and the structure of its female/male cone differs greatly from that of angiosperm flowers ( Fig. 1a and Fig. 2) [52]. In order to better understand the morphological characteristics of C. lanceolata female/male cones, we used a Scanning Electron Microscope (SEM) to observe the female and male cones, especially to observe the ovule and pollen. At the stage we sampled, each female cone contains a high number of bract-scales, with an ovuliferous scale at the base of the bract-scale, and 2-3 ovules located at the ovuliferous scale (Fig. 3b). The lobes, nucellus, and integument (Fig. 3c) have already formed but not yet completely differentiated, one lobe of ovuliferous scale develops for each ovule (Fig. 3c) [30]. The male cone is composed of a high number of microsporophylls, one at the central position and the remaining in a surrounding spiral arrangement. Each microsporophyll bears 2-3 pollen sacs (Fig. 3d-e). Each pollen (Fig. 3f) contains a pollen aperture.

Sequence assembly and annotation of C. lanceolata
We next conducted a whole transcriptomic approach to identify transcripts that are differentially regulated between the development processes of female and male cones in C. lanceolata. We therefore isolated total RNA from whole female/male cones and used Illumina sequencing technology to determine the transcriptome. We obtained a total of 22 (Table S1) were retained for further assembly. In total, 24.14GB RNA-Seq data were generated from sequencing. We assembled a total of 97,856 transcripts with a contig N50 length of 1925 bp and 63,223 unigenes with a contig N50 length of 1721 bp (Table S2). The median contig length of all transcripts and unigenes was 784 bp and 620 bp, the average length of all transcripts and unigenes was 1228 bp and 1066 bp, respectively (Table S2). All of the 63,223 assembled unigenes (Table S2) were functionally annotated, 2117 transcription factors were identified (Table S3).

Differential gene expression between female and male cones
In order to identify specific differentially expressed transcripts of C. lanceolata female and male cones at the same developmental stage. Among all the assembled unigenes, unigenes with low expression level were removed, resulting in 18,045 unigenes. We filtered these unigenes based on a selection criteria of FDR < 0.01 and Fold Change > 2. Then, we further characterized these genes using GO terms and functional classification. We found 5016 unigenes that were significantly differentially expressed, of which 2506 unigenes were down-regulated and 2510 unigenes were up-regulated in the male cones compared with the female cones (Fig. 4).
A GO enrichment analysis successfully categorized 2217/2168 of the up−/down-regulated unigenes into three GO subgroups (Fig. 5). We plotted the top 20 enriched GO terms of each subgroup, separately. The down-regulated genes (male < female, cool-toned) were involved in DNA replication (BP), nucleus (CC), protein binding (MF) etc., while the up-regulated (male > female, warm-toned) genes were involved in pollen exine formation (BP), cell wall (CC), oxidoreductase activity (MF) etc. (Fig. 5). These results indicated that cell division is active in the vigorous growth stage of female cones, while the male cone we sampled is mostly involved in pollen development.
Focusing on our previously identified 2117 C. lanceolata TFs, we found three gene families to be significantly enriched (P < 0.01): AP2, MYB-related, and MADS-box ( Table 1). The significant expression of MADS-box genes during C. lanceolata cone development is consistent with their roles during flower formation in other plant species, suggesting that this gene family plays an important role in C. lanceolata as well.

MIKC MADS-box transcription factors in C. lanceolata
Using the method above, we finally obtained 47 unique MIKC MADS-box genes from C. lanceolata (Table S4) and divided these genes into several branches, based on previous research (Fig. 6a, Tables 2 and S4) [46,47,53,54]. Meanwhile, the comparison results of MADS-box proteins domain in C. lanceolata, P. abies, P. taeda and V. vinifera, making the phylogenetic analysis available (Fig. S1). Most MADS branches can be found in C. lanceolata, like AP3/PI (class B), SEP (class E), AG (class C), STK/ SHP (class D), which are involved in flower organ identity [16]. However, branches like FLC, BS, FUL and AP1 (class A) cannot be found. This may be explained by the low expression of these homologous genes during the selected period. Another possible reason is that the C. lanceolata genome does not contain these genes. Due to not having a certain typical flower structure, floral organ identity related genes, like AP1, which contributes to the sepal and petal formation in angiosperms, could have been lost during evolutionary time. Specific cases of such potential gene loss require further research to illustrate.
On the contrary, some MADS branches, like the TM8 genes, are not found in Arabidopsis and Rice, but can be found in C. lanceolata, C. japonica [45], V. vinifera [53]. These results suggest that TM8 genes were established in the common ancestor of angiosperms and gymnosperms and that they have been lost independently during the relatively recent evolution history of some plant lineages [55].
We also identified a small number of MIKC MADSbox genes that can be classified into GGM7 branches, and not found in angiosperms [46]. In contrast, AGL15 and AGL12 genes were found in C. lanceolata, and Pinus taeda [47], as well as in angiosperms like A. thaliana [56] and V. vinifera [53], indicating that these genes might be functionally conservative and important for both angiosperms and gymnosperms flower/cone development. Meanwhile, there is a gene that cannot be classified into any branches. We searched this gene in NCBI (https://www.ncbi.nlm.nih.gov/) using blastp and found  that it was partial identity to AG-like gene. However, the classification cannot be gained in our phylogenetic tree. Thus, we named it with its number: MADS41, which make it a novel candidate gene.

MIKC MADS-box DEGs in C. lanceolata female and male cones
We next used our differential gene expression data to identify which MADS-box genes are differentially expressed between female and male cones, using expression data from leaves as a comparison of non-reproductive tissue. We reasoned that genes involved in the development of reproductive organs should be more specifically expressed in those organs. Out of the 47 C. lanceolata MIKC MADS-box genes, 27 genes differentially expressed between male and female cones, of which 18 (out of 27) are not expressed in leaves and 9 (out of 27) are not significantly expressed in leaves. (Fig.  6b and Table S5). Most B-class genes (AP3/PI) (4) were up-regulated in the male cone, similar to what was found in previous studies performed in other plant species [57,58], while TM8 genes were clearly expressed at a higher level in the female cone and more likely to be involved in female cone development.
Since we found that some AG (class C + D) genes are upregulated in male cones and others in female cones, it seems likely that these genes are involved in the development of either cones. We identified three SEP (class E) and four AGL6 genes in C. lanceolata. However, SEP genes showed a very low expression level, which is difficult to determine their differential expression across cones. Nevertheless, we do find two AGL6 genes expressed in both female and male cones. In fact, during the stage we collected, AGL6 genes showed higher expression level in the female cones.
GGM7 genes can be subdivided into 2 categories according to the phylogenetic tree: DAL10-like and DAL21-like. They have different expression patterns in female and male cones of P. abies [46], as well as in C. lanceolata. While in A. thaliana AGL15 expressed in   Fig. 6a.
leaf, inflorescences, anthers and pollen [59]; SVP expressed in young leaves, floral primordia and early coflorescences [60], AGL15 and SVP are highly expressed in male and female cones of C. lanceolata, respectively. Besides, MADS41 is a special gene with no obviously classification. But its high expression level in female cones, making it a candidate gene that may be involved in female cone development.
Validation of the C. lanceolata female/male cone transcriptomes In order to validate the differences observed between female and male cone libraries, we selected a limited number of C. lanceolata MIKC MADS-box genes from the differentially expressed gene list (Table S5) and performed qRT-PCR analysis on whole cone RNA (Fig. 7). This set includes genes known to be involved in carpel or stamen development in model organisms (AG, AP3/ PI), as well as genes not found in some angiosperms (such as Arabidopsis and rice) (TM8). These results were in close agreement with the RNA-Seq data, for example, the expression level of ClMADS34 gene in male cones was about 100 times that of female cones, and the expression of ClMADS10 in female cones was almost 10 times that of male cones, which was consistent with the results of the transcriptome data, suggesting the reliability of our transcriptomic profiling data ( Fig. 6b and Fig. 7).

Discussion
As an important timber species, C. lanceolata reproduction has always been one of the traits sought to be improved by breeding programs. Seed abortion is a common occurrence in C. lanceolata and can be caused by improperly formed ovules and pollen. Here, we studied the molecular mechanisms of C.
lanceolata cone development though a transcriptome analysis.
Based on these data, we performed sequential analyses to identify the differences between female and male cones, then we focused on the MADS-box gene family in C. lanceolata to reveal the potential specific genes involved in C. lanceolata cone development and the manifestation of the ABC model in gymnosperms.
We found class B, C, D and E genes in C. lanceolata, and for those genes which significantly up-regulated in male or female cone, were mostly not expressed in leaves. The B-class genes, AP3/PI (ClMADS44, 45,46,47) were mostly up-regulated in the male cone, which is most likely to influence male organ development. Similar results have been reported in angiosperm Quercus suber [57] and gymnosperm C. japonica [61], for example, the CjMADS1 gene (B-type MADS-box gene), expressing in male cone through its development in C. japonica. As is known from Norway spruce, B-type MADS-box genes, which are active in male organ primordia [62], are homologous to the B-class genes in angiosperms [63]. These findings indicated that B-type genes are maintained in both gymnosperms and angiosperms and may be conserved throughout seed plants.
C and D-class genes cannot be separated clearly in C. lanceolata, and are expressed in both reproductive organs. AG genes expressed in both cones of P. abies [64] and Gnetum gnemon [65] (gymnosperms). This is consistent with findings in Quercus suber (angiosperm), where Cclass genes are expressed at a similar level in both male and female flowers [57]. These results indicated that these C-type genes may play a similar role in both gymnosperms and angiosperms, which act as supply for both female and male cone/flower development [4]. Unfortunately, we were unable to identify the expression of E-class genes, as them are not significantly expressed in male or female Fig. 7 Relative expression of differentially expressed C. lanceolata female and male genes chosen to validate RNA-Seq results. ClMADS7, 10, 16, 26, 34, 47 were selected for validation, CleIF3 was used as a reference gene. The y-axis indicates the expression level (2 -ΔΔCt ), which was calculated using the Livak's method [47] and then transformed to a log10 scale (log10 (2 -ΔΔCt )). Error bars indicate the standard error (SE) cones. For this reason, we speculate that E-class genes are not necessary during this developmental process.
Additionally, we identified the expression of AGL6 genes in C. lanceolata, which are expressed in both female and male cones, higher in female cones, but not in leaves, similar as the expression pattern of their homologous genes in G. gnemon [65]. GGM7 genes had captured great deal of attention from us since they were only found in gymnosperms [66]. In P. abies, the GGM7 branch contains 2 genes: DAL10 and DAL21. DAL10 is specifically active in seed cones and pollen cones [66], and DAL21 is not detected in male cones or vegetative shoots, but in ovuliferous scale of female cones. Meanwhlie, ClMADS 30, 31, 32, which were classified into DAL21 branch, expressed in an obviously high level in female cones but not in male cones and leaves, with a similar expression pattern of that in P. abies. But things changed when it comes to DAL10 genes. DAL10 genes (ClMADS 39, 40) in C. lanceolata expressed in both female and male cones, and even higher in female cones. It reflects that there are both functional conservatism and functional differentiation in genes of different species.
AGL15 and SVP gene act as repressor of floral transition in A. thaliana [59,60], while in C. lanceolata, AGL15 and SVP are highly expressed in male and female cones, respectively. It could be an interesting research issue and may imply a similar inhibitor in Chinese fir, restricting the development range of cones. Furthermore, we identified several genes which may play an important role in female cone development. We detected TM8 genes which were all up-regulated in the female cone and basically not expressed in leaves. Researchers have found that in E. grandis, EgTM8 is expressed in the early and late floral bud [67]. And in tomato, TM8 may be important for ovary and fruit formation [68]. Gramzow et al. [69] showed that TM8 genes could be found in many gymnosperms, but little research has revealed its function in organ development in gymnosperms. Considering that ovules and pollen of C. lanceolata were still under development at the time of collection, we speculate that these genes are very likely to influence ovule development and can be further studied.
Based on our results, we tend to agree with the B(C) model of gymnosperm cone development proposed by Theißen et al. [5], which A and E-class genes may not involve in cone development. In order to verify the applicability of this model in C. lanceolata, more experiments are needed to confirm the function of B(C) genes and rule out the involvement of other genes (A, E-class genes). In a general way, we study the gene function by overexpressing and knockout this gene in the species. Unfortunately, a mature transgenic system for C. lanceolata has not yet been developed, and performing transgenesis experiments in this species would have the added downside that the flowering of woody plants takes a long time. Thus, other method should be considered, for example, expressed C. lanceolata B-type genes in model organisms such as Arabidopsis, so as to study the degree of functional conservation of those genes. But it must be emphasized that the gene function studies will eventually return to the species itself. Yet considering the difficulty of generating transgenic gymnosperms and their long generation times, these studies would need a lot of time and efforts.
Due to our limitation of material selection, the results were limited to the differential genes between female and male cones at a certain developmental period. Although some noteworthy genes were indeed found through our study, some information for cone development may be lost, and participation of those MADS-box genes in the entire developmental process cannot be obtained. Further research could monitor the entire developmental process, from cone initiation to female cone fertilization, to potentially find all MADS-box genes involved, and perform a more complete interpretation.

Conclusions
In summary, we performed an RNA-Seq analysis of female and male cones in C. lanceolata and analyzed the gene expression differences between female and male cones. We identified 47 MIKC MADS-box genes in C. lanceolata, and identified some MADS-box genes related to cone development in C. lanceolata, possibly conforming to the previous B(C) model for gymnosperms. We also identified additional genes that may play an important role in female/male cone development. In addition, we provided a library of gene data that shows differential expression between the female and male cones, which can be used as a basis for discovering unknown regulatory networks in the future. proteins in V. vinifera, C. japonica, P. abies, P. taeda and C. lanceolata (not fully length).
Additional file 6 Table S6. List of the primers used for qRT-PCR.
Abbreviations FPKM: Fragments per kilobase of transcript per million mapped reads; RSEM: RNA-seq by expectation maximization; edgeR: Empirical analysis of digital gene expression data in R.