Transcriptome analysis reveals a composite molecular map linked to unique seed oil profile of Neocinnamomum caudatum (Nees) Merr.

BACKGROUND
Neocinnamomum caudatum (Nees) Merr., a biodiesel tree species in the subtropical areas of South China, India and Burma, is distinct from other species in the Lauraceae family in that its seed oil is rich in linoleic acid (18:2) and stearic acid (18:0). However, little genetic information is available for this species. In this study, a transcriptomic analysis of developing N. caudatum seeds was conducted to elucidate the molecular mechanisms involved in the control of fatty acid (FA) and triacylglycerol (TAG) biosynthesis.


RESULTS
Transcriptome analysis revealed 239,703 unigenes with an average length of 436 bp, including 137 putative biomarkers related to FA formation and TAG biosynthesis. The expression patterns of genes encoding β-ketoacyl-acyl carrier protein synthase I (KASI), β-ketoacyl-acyl carrier protein synthase II (KASII), stearoyl-ACP desaturase (SAD), fatty acid desaturase 2 (FAD2), fatty acid desaturase 8 (FAD8) and acyl-ACP thioesterase A/B (FATA/B) were further validated by qRT-PCR, and the two assays yielded very similar expression patterns for these genes. Moreover, sequence analysis of FATBs from diverse plant species revealed that NcFATB is structurally different from its counterparts that produce medium-chain saturated FAs in other species. Consistently, heterologous expression in E. coli BL21 (DE3) showed that the expressed NcFATB protein prefers long-chain saturated fatty acyl-ACPs over medium-chain fatty acyl-ACPs as substrates.


CONCLUSIONS
Transcriptome analysis of developing N. caudatum seeds revealed a composite molecular map linked to FA formation and oil biosynthesis in this biodiesel tree species. The substrate preference of NcFATB for long-chain saturated acyl-ACPs is likely to contribute to its unique seed oil profile rich in stearic acid. Our findings demonstrate that, in the Lauraceae family, FATB enzymes producing long-chain FAs are structurally distinct from those producing medium-chain FAs, suggesting that FATB genes may serve as biomarkers for the classification of tree species in this family.


Background
Neocinnamomum caudatum (Nees) Merr., a species widely distributed in the subtropical areas of South China, India and Burma, belongs to the genus Neocinnamomum in the family Lauraceae [1,2]. As one of the most enigmatic species of the Lauraceae family, N. caudatum shares many morphological similarities with species of the genus Cinnamomum. However, phylogenetic analysis based on chloroplast genomes shows that the genus Neocinnamomum is monophyletic and evolutionarily distant from Cinnamomum [3,4]. The genus Neocinnamomum comprises only six species endemic to tropical Asia and is closely related to the genus Caryodaphnopsis [3,5]. In China, N. caudatum is also known as "Baigui", and its bark and leaves have long been used in traditional Chinese medicine [2]. In addition, mature seeds of N. caudatum contain up to 57.4 % of the storage lipid TAG on a dry weight basis [6]. In sharp contrast to many well-documented Lauraceae species that produce predominantly medium-chain fatty acids (MCFA) in their seeds, such as caprylic acid (8:0), capric acid (10:0) and lauric acid (12:0), seeds of N. caudatum contain exclusively long-chain fatty acids (LCFA). Palmitic acid (16:0), stearic acid (18:0), oleic acid (18:1), linoleic acid (18:2) and linolenic acid (18:3) account for 11.3, 21.2, 15.8, 35 and 13.1 % of the total FAs (expressed as mole percent), respectively [6,7]. Notably, such a high proportion of 18:0 is rare in the family Lauraceae and even in the subclass Magnoliidae, implying that the molecular mechanisms governing FA formation and triacylglycerol biosynthesis in N. caudatum seeds are likely to differ from those in other well-documented Lauraceae species. Recently, N. caudatum has received much attention for its high seed oil content, distinctive FA profile and abundant fruit yield, and it has been recommended as a potential source of biodiesel in China [8].
Surprisingly, however, little genomic information is available for this species so far.
In higher plants, FA biosynthesis initially takes place in the plastids, starting with pyruvate generated by glycolysis. In the plastids, pyruvate is oxidized to acetyl-CoA, which is then carboxylated by acetyl-CoA carboxylase (ACC) to generate malonyl-CoA, the building block of FA synthesis [9]. FA assembly occurs on acyl carrier protein (ACP) via a repeated cycle of four reactions, with each cycle extending the acyl chain by two carbons. After seven cycles, the resulting saturated 16:0-ACP can either be hydrolyzed by an acyl-ACP thioesterase (FAT) or further elongated to 18:0-ACP by β-ketoacyl-ACP synthase II (KASII). 18:0-ACP in turn has two fates: direct hydrolysis by a FAT enzyme, or desaturation by stearoyl-ACP desaturase (SAD) to generate 18:1-ACP, which is then hydrolyzed. The free FAs released from acyl-ACPs are then exported from the plastids to the cytosol for further desaturation or elongation [10,11]. FAT enzymes are generally accepted to be among the key determinants of FA chain length [12]. Based on substrate preference, plant FATs fall into two types, FATA and FATB: FATA prefers unsaturated acyl-ACPs (such as 16:1-ACP and 18:1-ACP), whereas FATB prefers saturated acyl-ACPs (such as 16:0-ACP and 18:0-ACP) [12]. In the Lauraceae family, MCFA-specific FATBs have been identified and characterized in Umbellularia californica, Cinnamomum camphora, Cinnamomum longepaniculatum and Lindera communis [13][14][15][16]. The residues or domains presumably responsible for the substrate specificities of these FATBs have also been studied by site-directed mutagenesis and domain-swapping experiments [14,17,18]. Nevertheless, owing to the lack of information on FATB homologs from LCFA-rich species in the family Lauraceae, a major knowledge gap remains that limits our understanding of the molecular mechanisms underlying the drastic differences in FA composition among tree species in this family.
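The chain-length bookkeeping described above (two carbons per cycle, seven cycles to 16:0-ACP, one KASII step to 18:0-ACP, then hydrolysis or desaturation) can be made explicit with a small Python sketch. It encodes only the textbook model summarized in the paragraph, assuming acetyl-CoA supplies a C2 primer; the function names are hypothetical and no kinetics or regulation are modeled.

```python
from typing import Dict, List

# Assumption (textbook model, not measured in this study): the first condensation
# uses acetyl-CoA as a C2 primer, and every four-reaction elongation cycle adds
# two carbons donated by malonyl-ACP.
PRIMER_CARBONS = 2


def chain_length(cycles: int) -> int:
    """Acyl chain length (number of carbons) after a given number of elongation cycles."""
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return PRIMER_CARBONS + 2 * cycles


def acyl_acp_fates() -> Dict[str, List[str]]:
    """Map each acyl-ACP branch point to the enzymes that can act on it."""
    c16 = chain_length(7)  # seven cycles give 16:0-ACP
    c18 = c16 + 2          # one further KASII-catalyzed condensation
    return {
        f"{c16}:0-ACP": ["FAT (hydrolysis to free 16:0)", "KASII (elongation to 18:0-ACP)"],
        f"{c18}:0-ACP": ["FAT (hydrolysis to free 18:0)", "SAD (desaturation to 18:1-ACP)"],
        f"{c18}:1-ACP": ["FAT (hydrolysis to free 18:1)"],
    }


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"cycle {n}: C{chain_length(n)}")
    for substrate, enzymes in acyl_acp_fates().items():
        print(substrate, "->", "; ".join(enzymes))
```

Keeping the branch points in one table makes it easy to see that FAT competes with KASII at 16:0-ACP and with SAD at 18:0-ACP, which is the competition the rest of the paper discusses.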
In this study, we analyzed the seed oil content and FA composition of eleven species of the family Lauraceae and found that seeds of N. caudatum contain high proportions of 18:2 and 18:0. Transcriptome analysis of its developing seeds was subsequently conducted to identify candidate genes involved in LCFA formation and triacylglycerol biosynthesis in this species. Furthermore, heterologous expression analysis revealed that NcFATB, which is structurally different from its counterparts in MCFA-rich species, prefers long-chain saturated acyl-ACPs as substrates, consistent with the abundance of 18:0 in N. caudatum seeds. Collectively, this study generated, for the first time, comprehensive molecular information on seed oil biosynthesis in N. caudatum, which may help guide future efforts to manipulate oil production in tree species.

Results
The two fatty acids 18:2 and 18:0 occurred in high proportions in the seed of N. caudatum
The initial aim of our study was to identify tree species as potential sources for biofuel production. Hence, we analyzed the total lipid content and FA composition of seeds collected from 11 species of the family Lauraceae by gas chromatography. In eight of these species, C. camphora, U. californica, A. forrestii, L. cubeba, L. communis, L. angustifolia, P. americana and N. caudatum, seed oil content exceeded 25% on a dry weight basis, versus less than 5% in M. yunnanensis, P. cavaleriei and C. tonkinensis. Furthermore, six of these oil-rich species, i.e. C. camphora, U. californica, A. forrestii, L. cubeba, L. angustifolia and L. communis, produce seed oils consisting predominantly of MCFA. In contrast, the seed oils of P. americana and N. caudatum are mainly composed of LCFA, and their FA composition is very similar to that of three other tested species, i.e. M. yunnanensis, P. cavaleriei and C. tonkinensis (Table 1). Notably, N. caudatum had the highest proportion of 18:0 among all tested species (more than 20% of total FAs) (Table 1). Interestingly, although the FA composition of N. caudatum seed oil is very similar to that of C. tonkinensis, a close relative of N. caudatum [5], the seed oil content of N. caudatum was much higher than that of C. tonkinensis (Table 1).
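The MCFA- versus LCFA-dominant grouping used above can be reproduced from a composition table with a few lines of Python. This is a sketch under a stated assumption: FAs with 12 or fewer carbons are counted as MCFA and longer ones as LCFA (the paper does not state its cut-off, and 14:0 in particular is classified differently by different authors). The example uses the N. caudatum composition given in the Background; the function names are hypothetical.

```python
from typing import Dict, Tuple


def parse_fa(name: str) -> Tuple[int, int]:
    """Split lipid shorthand such as '18:2' into (carbons, double bonds)."""
    carbons, double_bonds = name.split(":")
    return int(carbons), int(double_bonds)


def summarize_profile(profile: Dict[str, float], max_medium: int = 12) -> Dict[str, object]:
    """Normalize a FA profile to 100 % and classify it by chain length.

    Assumption: FAs with <= max_medium carbons count as MCFA, longer ones as LCFA.
    """
    total = sum(profile.values())
    if total <= 0:
        raise ValueError("profile must contain positive values")
    mcfa = lcfa = saturated = 0.0
    for name, value in profile.items():
        carbons, double_bonds = parse_fa(name)
        if carbons <= max_medium:
            mcfa += value
        else:
            lcfa += value
        if double_bonds == 0:
            saturated += value
    return {
        "MCFA_pct": 100 * mcfa / total,
        "LCFA_pct": 100 * lcfa / total,
        "saturated_pct": 100 * saturated / total,
        "class": "MCFA-dominant" if mcfa > lcfa else "LCFA-dominant",
    }


if __name__ == "__main__":
    # N. caudatum seed FA composition (% of total FAs) as given in the Background [6,7]
    n_caudatum = {"16:0": 11.3, "18:0": 21.2, "18:1": 15.8, "18:2": 35.0, "18:3": 13.1}
    print(summarize_profile(n_caudatum))
```

Because the reported values sum to 96.4 % rather than 100 %, the sketch renormalizes before computing fractions; the saturated share (16:0 + 18:0) then comes out at roughly one third of the listed FAs.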
To further understand lipid metabolism in N. caudatum, different tissues of three N. caudatum trees growing at Xishuangbanna Tropical Botanical Garden were collected at various developmental stages, and their total lipid content and FA composition were analyzed (Fig. 1a). As shown in Fig. 1b, the fruits expanded rapidly after flowering, with the average fruit diameter reaching 0.5 and 1.0 cm at 52 and 96 days after flowering (DAF), respectively. After 96 DAF, most fruits stopped expanding and the seed capsule began to turn from green to red. In parallel, lipid analysis showed that the leaves and flower buds, as well as developing seeds at the early stages (20 and 52 DAF), contained only limited amounts of extractable lipids (2 to 4% on a dry weight basis) (Fig. 1b, Additional file 1). The predominant FAs in the seeds at 20 and 52 DAF were 18:2 and 16:0, which together accounted for 40 to 59 % of the total FAs. As the embryos continued to develop, seed oil biosynthesis accelerated, and the oil content rose to 9, 22, 31 and 42 % on a dry weight basis at 81, 96, 126 and 146 DAF, respectively (Fig. 1b). Concomitantly, the FA composition also changed remarkably. While 18:2 remained a major FA at 81 and 126 DAF (40 % of total FAs), 16:0 declined significantly. Meanwhile, 18:0 increased to 15 and 20 % of total FAs. In addition, the proportion of polyunsaturated fatty acids (PUFA, 18:2 and 18:3) peaked at 81 DAF and then decreased slightly, from 57 to 53 % of total FAs, by 146 DAF (Fig. 1c).

RNA-Seq, de novo assembly of unigenes and functional annotation
To elucidate the molecular mechanisms by which lipid metabolism is regulated in N. caudatum, seeds at three developmental stages were used for RNA-Seq analysis. In total, 4,034 million paired-end reads were generated from nine libraries, amounting to 60.5 gigabases (Gb) of data (Table 2). The Q20 and GC percentages of the reads in each library are also listed in Table 2. After removing adapter-containing and low-quality reads, 417 to 474 million clean reads were obtained from each sample. From these high-quality clean reads, 529,269 contigs with an average length of 284 bp were assembled de novo, yielding 239,703 unigenes with an average length of 436 bp (Table 2). The unigene length distribution showed that 52.5 % (125,761) of the unigenes were shorter than 300 bp and only 1.4 % (3,417) were longer than 2000 bp (Additional file 2A).
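Length statistics of the kind reported above (mean length, the share of unigenes shorter than 300 bp or longer than 2000 bp) can be computed directly from an assembly FASTA file. The sketch below is plain Python and assumes a standard FASTA layout; it also reports N50, a common assembly metric that is not given in the text. The function names and the file name in the usage note are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return sequence lengths from FASTA-formatted lines (e.g. a Trinity output file)."""
    lengths: List[int] = []
    current: Optional[int] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def length_summary(lengths: Iterable[int]) -> Dict[str, float]:
    """Mean length, N50 and the percentages of short (<300 bp) and long (>2000 bp) sequences."""
    lens = sorted(lengths, reverse=True)
    if not lens:
        raise ValueError("no sequences")
    total = sum(lens)
    running, n50 = 0, lens[-1]
    for length in lens:
        running += length
        if 2 * running >= total:  # first length at which half of all bases are covered
            n50 = length
            break
    n = len(lens)
    return {
        "count": n,
        "mean_length": total / n,
        "n50": n50,
        "pct_below_300": 100 * sum(1 for x in lens if x < 300) / n,
        "pct_above_2000": 100 * sum(1 for x in lens if x > 2000) / n,
    }
```

Usage would look like `with open("unigenes.fasta") as fh: print(length_summary(read_fasta_lengths(fh)))`, where the file name is only a placeholder. Streaming line by line keeps memory use low even for an assembly of several hundred thousand sequences.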
To explore the metabolic functions and interactions of the detected unigenes, we performed a pathway-based analysis using KEGG (Kyoto Encyclopedia of Genes and Genomes) [20]. A total of 5,374 unigenes were assigned to 35 groups in five major categories: metabolism, genetic information processing, environmental information processing, cellular processes and organismal systems. The most represented pathways were related to translation (556 unigenes, 10.3%), carbohydrate metabolism (410 unigenes, 7.6%) and signal transduction (409 unigenes, 7.6%) (Additional file 6). Notably, 237 unigenes (4.4%) were annotated to lipid-related pathways, including FA biosynthesis, glycerolipid metabolism, linoleic acid metabolism and α-linolenic acid metabolism (Additional file 7).

Identification of unigenes associated with FA formation and triacylglycerol biosynthesis through DEG analysis
Since the oil content and FA composition of N. caudatum seeds vary greatly with developmental stage, pairwise differentially expressed gene (DEG) analyses were performed among three fruit developmental stages (52, 96 and 146 DAF) to understand the mechanisms underlying these changes. In brief, a total of 5,493, 7,183 and 1,150 unigenes were differentially expressed in 96 vs. 52 DAF (Contrast Group I), 146 vs. 52 DAF (Contrast Group II) and 146 vs. 96 DAF (Contrast Group III), respectively (Fig. 2). More specifically, 3,274, 4,038 and 581 unigenes were up-regulated, and 2,219, 3,150 and 569 unigenes were down-regulated in each contrast group, respectively (Fig. 2a). As shown by the Venn diagram, 166 up-regulated unigenes were shared by all three contrast groups, while 845, 1,394 and 199 up-regulated unigenes were specific to Contrast Groups I, II and III, respectively (Fig. 2b). Among the down-regulated unigenes, 48 were shared by all three contrast groups, and 559, 1,375 and 409 were specific to Contrast Groups I, II and III, respectively (Fig. 2c).
Fig. 2 (caption fragment): … The distribution of up-regulated genes at different seed developmental stages. c The distribution of down-regulated genes at different seed developmental stages.
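The Venn counts above (unigenes shared by all three contrast groups versus those specific to one group) follow from simple set operations. The authors used VENNY 2.1; the Python sketch below only illustrates the same set logic, with hypothetical unigene IDs.

```python
from typing import Dict, Set


def venn_partition(groups: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return unigenes shared by all groups and those unique to each group."""
    if not groups:
        raise ValueError("at least one group is required")
    result = {"shared_by_all": set.intersection(*groups.values())}
    for name, members in groups.items():
        # union of every other contrast group
        others = set().union(*(m for other, m in groups.items() if other != name))
        result[f"specific_to_{name}"] = members - others
    return result


if __name__ == "__main__":
    # Hypothetical up-regulated unigene IDs for Contrast Groups I, II and III
    up = {"I": {"g1", "g2", "g3"}, "II": {"g1", "g2", "g4"}, "III": {"g1", "g5"}}
    for key, ids in venn_partition(up).items():
        print(key, sorted(ids))
```

Note that a unigene shared by exactly two groups (g2 here) is counted neither as shared by all nor as group-specific, which is why the shared and specific counts in a three-way Venn diagram do not add up to each group's total.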
Based on a local BLAST search against 81 key Arabidopsis genes involved in FA formation and triacylglycerol synthesis (Additional file 10), 137 unigenes were identified with high sequence similarity (E-value < 1.0E-5) to their Arabidopsis homologs. Among these 137 unigenes, 15 were predicted to encode components of the pyruvate dehydrogenase complex (PDHC), 9 to encode ACCase, 5 SAD and 4 FAD8. In contrast, only a single homolog was identified for each of FAD2, FAD3, FAD6, FAD7, FATA and FATB in the N. caudatum seed transcriptome (Additional file 10). According to the log2-transformed FPKM values of these unigenes (Fig. 3), FATB, SAD, FAD2 and FAD8 genes displayed much higher expression levels than FATA, KASII and FAD3. Moreover, qRT-PCR analysis revealed that KASI, KASII, FATA, FATB, SAD1, SAD2, FAD2 and FAD8 were expressed in all tissues tested, including leaves, flowers and developing fruits. Overall, the expression patterns of these genes were consistent with the RNA-Seq results (Fig. 4a-h). Transcripts of KASII, FAD2, FAD8 and SAD2 accumulated to higher levels during the stages of rapid oil synthesis (81-96 DAF) and then decreased drastically as the fruit ripened. In contrast, the expression levels of FATB and SAD1 were relatively high at the early embryo development stage (52 DAF) and decreased rapidly during fruit maturation (Fig. 4). Interestingly, FATA expression in the fruits dropped to its lowest level at 81 DAF and then increased with fruit maturation (146 DAF), in agreement with 18:1 accumulation in the seeds.
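Homolog identification by local BLAST with an E-value cut-off, as described above, typically ends with a filtering step over the tabular output. The paper does not state its BLAST output settings, so this sketch assumes BLAST+ tabular output (`-outfmt 6`, default 12 columns) and keeps, for each unigene, the highest-scoring hit below the 1.0E-5 threshold. The subject names in the test data (e.g. "AtFATB") are hypothetical labels, not real database identifiers.

```python
from typing import Dict, Iterable, Tuple

# Column positions in BLAST+ tabular output (-outfmt 6, default 12 columns):
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
QSEQID, SSEQID, EVALUE, BITSCORE = 0, 1, 10, 11


def best_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> Dict[str, Tuple[str, float, float]]:
    """Keep, for each query unigene, the highest-scoring hit with E-value below max_evalue."""
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        evalue, bitscore = float(fields[EVALUE]), float(fields[BITSCORE])
        if evalue >= max_evalue:
            continue
        query, subject = fields[QSEQID], fields[SSEQID]
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

Ranking by bit score rather than E-value is a design choice here: bit scores are independent of database size, so they compare hits more directly when only 81 reference sequences are searched.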

Functional characterization of NcFATB in E. coli BL21 (DE3) strain
In this study, only one unigene (ID: c90067_g1) annotated as FATB was identified in developing N. caudatum seeds, and its transcripts accumulated to detectable levels in all tested fruit samples. The full-length CDS of this unigene, designated NcFATB, was subsequently cloned and sequenced. The deduced NcFATB polypeptide has 416 amino acid residues, with a molecular weight of 45.87 kDa and a theoretical pI of 6.29. Multiple sequence alignment revealed that NcFATB shares 90, 81 and 80% similarity with its counterparts from L. communis (AHF72806), U. californica (AAC49001) and C. camphora (AAC49151), respectively (Fig. 5). Asn312, His314 and Cys348, located in the C-terminus of NcFATB, are presumed to be the catalytic residues, as suggested for other FATB proteins [14,21]. Phylogenetic analysis showed that NcFATB is evolutionarily closer to FATBs acting on long-chain acyl-ACPs than to those acting on medium-chain acyl-ACPs, such as CcFATB, UcFATB and ClFATB (Fig. 6), suggesting that NcFATB may also prefer long-chain acyl-ACPs as substrates.
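The molecular weight of a deduced polypeptide such as NcFATB is the sum of its average residue masses plus one water molecule for the free termini. The sketch below assumes the standard average residue masses (to our knowledge the values used by ExPASy-style tools); it does not attempt the theoretical pI, which depends on the choice of pKa set. Since the NcFATB sequence is not reproduced in the text, only toy sequences are used.

```python
# Standard average (not monoisotopic) amino acid residue masses in daltons
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water per chain accounts for the free N- and C-termini


def protein_mass_da(sequence: str) -> float:
    """Average molecular mass (Da) of an unmodified polypeptide."""
    seq = sequence.upper().replace("*", "")  # drop a trailing stop symbol if present
    unknown = set(seq) - set(AVG_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


if __name__ == "__main__":
    # Rule-of-thumb check: 416 residues x ~110 Da per residue is about 45.8 kDa,
    # in line with the 45.87 kDa reported for NcFATB.
    print(416 * 110 / 1000)
    print(round(protein_mass_da("MGA"), 2))
```

The roughly 50 kDa band observed on SDS-PAGE is somewhat larger than this value, which is plausible if the pET-28a(+) construct adds fusion tags, although the construct details are not given here.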
To experimentally determine its function, NcFATB under the control of a T7 promoter was heterologously expressed in E. coli BL21 (DE3). Five hours after induction with various IPTG concentrations (0, 0.1, 0.5 and 1 mM) at 30°C, a clear band near 50 kDa, corresponding to the NcFATB protein, was detected by SDS-PAGE in the supernatant of the bacterial lysate (Additional file 11). At 24 hours after IPTG induction, total FAs in the cell culture expressing NcFATB increased to 59.3 mg/L, versus 21 mg/L for the pET-28a(+) empty-vector control. The FA composition also differed markedly between these cell lines (Fig. 7). In the supernatant of the NcFATB-expressing cell culture, the amounts of 16:0 and 18:0 changed most dramatically, increasing from approximately 4.  (Fig. 7). Our data suggest that NcFATB has a high capacity to release long-chain saturated FAs (16:0 and 18:0) from the respective acyl-ACP substrates.

Discussion
The exploitation of new plant resources with good oil-producing characteristics and low demand for farmland has received much attention over the past 30 years [19,20,22]. China is rich in wild oil-producing plant resources, and more than 400 Lauraceae species have been discovered in China [6], many of which accumulate predominantly MCFA in their seeds [7,23]. These MCFA-rich lipids could serve as indispensable raw materials for numerous industrial products, such as surfactants, lubricants and detergents [6,24]. In this study, C. camphora, U. californica, A. forrestii, L. cubeba and L. communis were found to have relatively high seed oil content (>30% DW), with MCFA as the dominant FAs in their seeds (Table 1). In contrast, LCFA was dominant in the seeds of N. caudatum, L. angustifolia, P. americana, M. yunnanensis, P. cavaleriei and C. tonkinensis. Strikingly, the FAs of N. caudatum seeds were mainly composed of 18:2 and 18:0, a profile remarkably different from that of other well-documented Lauraceae species such as C. camphora and U. californica (Table 1). To our knowledge, few species in the plant kingdom have been reported to exhibit such a unique seed oil profile [25][26][27]. In light of this, we speculated that a distinct mechanism of lipid metabolism may operate in N. caudatum, and that studying this mechanism may help identify target plants suitable for future genetic engineering of seed oil production.
RNA-Seq is an effective means of studying the molecular regulation of a particular trait in a new plant species that lacks a reference genome [19,28,29]. Recent advances in this technology have made it relatively easy to identify critical genes and pathways associated with lipid metabolism in several Lauraceae species, such as C. camphora [30], Litsea cubeba [31], Persea americana [32,33] and Lindera glauca [34]. Given that little genomic information about N. caudatum is currently available, we conducted high-throughput transcriptome sequencing of developing seeds of this species. The RNA-Seq data, combined with subsequent validation by qRT-PCR, help generate a comprehensive picture of the possible causes of the characteristic FA profile of N. caudatum seed oil.
Figure caption fragment: … AAA34215) and Cinnamomum longepaniculatum. The conserved catalytic residues in the C-terminus are marked with asterisks.
In higher plants, the 18:0 level can be regulated by three types of enzymes [9]. The first is the plastid-localized KASII, which catalyzes the elongation of 16:0-ACP to 18:0-ACP [35]. A previous study showed that overexpression of a KASII gene in sunflower seeds could significantly elevate 18:0 levels [34]. Unexpectedly, the FPKM values of NcKASII were relatively low in N. caudatum seeds (Additional file 10). In agreement, qRT-PCR showed that KASII expression decreased remarkably at the fruit maturation stage (126 DAF), an important stage for 18:0 accumulation (Fig. 4b). Nevertheless, it remains unclear whether NcKASII is a key determinant of the high 18:0 accumulation in N. caudatum seeds. The second type of enzyme controlling 18:0 levels is FAT, which releases FAs from acyl-ACPs. The FPKM value of NcFATB was much higher than that of NcFATA in all the developing seeds tested (Additional file 10). Accordingly, qRT-PCR showed that NcFATB expression gradually increased after 81 DAF (Fig. 4d), which correlated with the 18:0 levels in the seeds (Fig. 1c), suggesting that NcFATB may play an important role in controlling 18:0 accumulation in N. caudatum seeds. The third type of enzyme is SAD, which catalyzes the conversion of 18:0-ACP to 18:1-ACP in the plastids. The resulting 18:1-ACP is hydrolyzed primarily by FATA [9]. Since the total amount of 18:1, 18:2 and 18:3 is higher than that of 18:0 in N. caudatum seed lipids, it can be inferred that high NcSAD activity is present in the developing seeds.
This view is supported by the high FPKM values of NcSAD1 and NcSAD2 (Fig. 3, Additional file 10). Nevertheless, the DEG and qRT-PCR analyses showed that the expression levels of NcSAD1 and NcSAD2 varied greatly among seed developmental stages (Fig. 4e & f). The two NcSAD genes displayed opposite expression patterns, suggesting that they may play distinct roles in FA accumulation (Fig. 4). Finally, although it is unknown to what extent plastidial acyltransferases control 18:0 levels in the seed oil, such a role cannot be excluded, since they may compete with NcFATB and NcSAD for the same substrate, 18:0-ACP.
Fig. 6 Phylogenetic analysis of NcFATB protein sequences in various species. A neighbor-joining tree based on 31 FATB protein sequences from 29 plant species was constructed in MEGA 6.0. The YP387830 protein of Desulfovibrio alaskensis was selected as the outgroup.
FAD2 is generally accepted as the key determinant of 18:2 levels in oilseeds. In agreement with this, the FPKM values of NcFAD2 in developing seeds were very high (>400, Additional file 10), consistent with the high level of this FA in the seeds. Nevertheless, NcFAD2 expression varied across stages of seed development (Fig. 4g). Interestingly, the FPKM value of NcFAD8 (107670_g4), which encodes a plastidial desaturase, was much higher than that of NcFAD3 (Additional file 10). In addition, the high NcFAD8 expression shown by qRT-PCR appears to coincide with the relatively high 18:3 accumulation in developing seeds (Fig. 1 and Fig. 4h). This intriguing result raises the question of whether the conversion of 18:2 to 18:3 by this enzyme contributes significantly to the final 18:3 level in the seed oil, which merits further investigation.
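FPKM values such as the >400 reported for NcFAD2 normalize fragment counts for both transcript length and sequencing depth. The sketch below shows the standard definition and a log2 transform of the kind used for the expression heat map; the pseudocount of 1 is our assumption, since the text does not say how zero values were handled, and the function names are hypothetical.

```python
import math


def fpkm(fragments: int, length_bp: float, total_mapped_fragments: int) -> float:
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # fragments / (length in kb) / (library size in millions) == fragments * 1e9 / (length * total)
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def log2_fpkm(value: float, pseudocount: float = 1.0) -> float:
    """log2(FPKM + pseudocount); the pseudocount avoids log2(0) for unexpressed unigenes."""
    return math.log2(value + pseudocount)


if __name__ == "__main__":
    # Hypothetical unigene: 1,000 fragments on a 2,000 bp transcript in a 10-million-fragment library
    value = fpkm(1000, 2000, 10_000_000)
    print(value, round(log2_fpkm(value), 2))
```

Because the length term is in the denominator, a short unigene and a long unigene with the same FPKM can differ several-fold in raw fragment counts, which is why FPKM rather than raw counts is compared across genes such as NcFAD8 and NcFAD3.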
In this study, we also found that the deduced amino acid sequence of NcFATB shares high homology with UcFATB1/2, CcFATB and LcFATB (Fig. 5). NcFATB belongs to the FAT family (PF01643) and contains a helix/multi-stranded sheet motif (hotdog fold) with three conserved catalytic residues in the C-terminus (N312, H314 and C348) [21,36]. Interestingly, like LcFATB, NcFATB has 35-37 extra residues compared with CcFATB, UcFATB and ClFATB (Fig. 5). These extra residues appear to be conserved and are distributed in the N- and C-termini of the NcFATB protein (Fig. 5). Considering that seeds of U. californica, C. camphora and C. longepaniculatum contain predominantly MCFA, while L. communis sarcocarps and N. caudatum seeds are rich in LCFA [16,37], these extra 35-37 residues in the FATB sequence may influence substrate specificity. Our phylogenetic analysis also showed that NcFATB and LcFATB grouped together and were closely related to FATBs from LCFA-dominant species, whereas the three MCFA-specific FATBs grouped together, apart from the LCFA-specific FATBs (Fig. 6). In future studies, site-directed mutagenesis or domain swapping of NcFATB may help define the roles of individual residues in controlling the substrate preference of this protein.
Heterologous expression of target genes in E. coli has proven to be an effective way to determine gene function [38]. The substrate specificity of a FATB can be inferred from the free FA content of the culture medium, as in a previous study in which expression of UcFATB in the E. coli FA degradation mutant strain K27 resulted in a large increase in MCFA (more than 80 % of total FAs) in the medium [39]. In this study, heterologous expression of NcFATB in E. coli BL21 (DE3) cells resulted in a 2.86-fold increase in total free FAs. LCFAs (16:0, 18:0, 16:1 and 18:1) constituted 66.7% of the total FAs in the culture medium, whereas the two MCFAs, 10:1 and 12:0, accounted for only 20% (Fig. 7). Since free LCFAs were 3.37-fold more abundant than free MCFAs in the culture medium, it is reasonable to speculate that NcFATB prefers LCFA-ACPs over MCFA-ACPs as substrates, which is consistent with the very limited amount of MCFAs in N. caudatum seeds. As for the low-level accumulation of free MCFAs in the bacterial culture medium, one probable explanation is that high-level expression of NcFATB protein in BL21 (DE3) cells may result in weak hydrolysis of MCFA-ACPs even though the enzyme has low affinity for them (Additional file 11). This assumption is supported by a previous study of FATB from L. communis, whose sarcocarps contain predominantly LCFAs [39]: heterologous expression of LcFATB resulted in a similar amount of MCFA accumulation in the culture medium of the E. coli BL21 (DE3) fad88 mutant strain [39]. Together, our results suggest that NcFATB prefers LCFA-ACPs as substrates, although we cannot rule out the possibility that heterologously expressed NcFATB may use MCFA-ACPs to some extent, especially under favorable conditions.
To define the substrate preference of NcFATB more accurately, the in planta function of this gene will need to be dissected in the future.
Figure caption fragment: Values presented are means ± SD of three biological replicates.

Conclusions
In this study, the lipid content and FA composition of eleven species from the Lauraceae family were first quantified, and the transcriptome of developing N. caudatum seeds was analyzed. NcFATB was shown to possess a unique structure and a high capacity to use long-chain saturated fatty acyl-ACPs as substrates, providing an explanation for the 18:0-rich oil profile of N. caudatum seeds. Our study provides a comprehensive view of triacylglycerol biosynthesis in N. caudatum seeds, which may be useful for exploiting this species as an industrial resource.

Lipid analysis
To analyze lipid content and FA composition, seed kernels from nine species of the Lauraceae family were collected and ground into a fine powder in liquid nitrogen. Lipid extraction was performed as previously described [40]. Glyceryl triheptadecanoate (Cat# T2151, Sigma-Aldrich, USA) was added as an internal standard (50 μg per sample). The extracted lipids were converted into FA methyl esters (FAMEs) and analyzed by GC-FID (Agilent 7890B gas chromatograph equipped with a DB-23 column, 60 m × 0.25 mm × 0.25 μm, Agilent, USA). The oven temperature program started at 160°C for 1.5 min, increased to 240°C at 20°C/min, and was then held at 240°C for 10 min. Total lipid content and FA composition were calculated by comparing the peak areas of the target FAs with that of the internal standard (methyl heptadecanoate). Data presented are means ± SD of three biological replicates. The seed oil content and FA composition of U. californica and L. angustifolia were taken from previously published studies for comparison [37,41].
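Internal-standard quantification of the kind described above converts each FAME peak area into a mass by proportion to the 17:0 standard peak. The sketch below assumes equal FID response factors for all FAMEs, complete transmethylation of the triheptadecanoin standard (three methyl heptadecanoate molecules per TAG), and that total FAME mass approximates oil mass; the molar masses are computed from molecular formulas and the function names are hypothetical. Note that it yields weight percent, whereas the Background reports mole percent.

```python
from typing import Dict, Tuple

# Molar masses computed from molecular formulas (g/mol); treat as approximate.
TRI17_MW = 849.42  # glyceryl triheptadecanoate, C54H104O6
ME17_MW = 284.48   # methyl heptadecanoate, C18H36O2


def istd_fame_ug(tag_ug: float = 50.0) -> float:
    """Mass of 17:0 methyl ester expected from complete transmethylation of the TAG standard."""
    return tag_ug / TRI17_MW * 3 * ME17_MW  # each TAG molecule yields three FAMEs


def fame_amounts_ug(peak_areas: Dict[str, float], istd_area: float, istd_ug: float) -> Dict[str, float]:
    """Convert FAME peak areas to micrograms, assuming equal FID response factors."""
    return {fa: area / istd_area * istd_ug for fa, area in peak_areas.items()}


def summarize(peak_areas: Dict[str, float], istd_area: float, istd_ug: float,
              sample_dw_mg: float) -> Tuple[float, Dict[str, float]]:
    """Return (oil content as % of dry weight, FA composition as weight %)."""
    amounts = fame_amounts_ug(peak_areas, istd_area, istd_ug)
    total = sum(amounts.values())
    composition = {fa: 100 * ug / total for fa, ug in amounts.items()}
    oil_pct_dw = 100 * total / (sample_dw_mg * 1000)  # µg FAME per µg dry weight
    return oil_pct_dw, composition


if __name__ == "__main__":
    print(round(istd_fame_ug(), 2))  # 17:0 FAME equivalent of 50 µg triheptadecanoin
    # Hypothetical peak areas (17:0 standard excluded) for a 1 mg dry-weight sample
    print(summarize({"16:0": 2.0, "18:0": 1.0}, istd_area=1.0, istd_ug=50.0, sample_dw_mg=1.0))
```

In practice, published protocols often apply FAME-specific response factors or correct FAME mass back to TAG mass; the equal-response assumption is the simplest starting point.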

RNA extraction and cDNA library construction
Three important stages of N. caudatum fruit development, the early cotyledon development stage (52 DAF), the fast oil accumulation stage (96 DAF) and the full maturation stage (146 DAF), were sampled from three individuals for RNA-Seq analysis [2]. Total RNA of the fruits was extracted using the RNeasy Plant Mini Kit (Qiagen, USA), and DNase I (RQ1, Promega, USA) was added to remove any genomic DNA contamination. Total RNA was quantified using a NanoDrop ND-2000 spectrophotometer (Nanodrop Technologies, USA); all samples showed a 260/280 nm ratio of 1.8 to 2.1. Poly(A) mRNA was purified from total RNA using the Dynabeads™ mRNA Purification Kit (Cat # 61006, Thermo Fisher Scientific, USA). First-strand cDNA was synthesized with random primers and converted into double-stranded cDNA. Fragments of the desired length (200-300 bp) were purified, end-repaired and ligated to sequencing adapters via A/T complementary base pairing. The sequencing library was then amplified by polymerase chain reaction (PCR). The resulting cDNA libraries were normalized to 10 nM, serially diluted and quantified to 4-5 pM.
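Normalizing libraries to 10 nM and diluting them to 4-5 pM relies on converting a mass concentration into molarity and then applying C1V1 = C2V2. This sketch assumes the common approximation of 660 g/mol per base pair of double-stranded DNA; the 2 ng/µL concentration and 300 bp mean fragment length in the test are illustrative values, not data from this study.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (common approximation)


def library_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA library concentration from ng/µL to nM."""
    # ng/µL equals mg/L; dividing by g/mol gives mmol/L, and the factor 1e6 converts to nmol/L
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_fragment_bp)


def dilution_volume(c_stock: float, c_final: float, v_final: float) -> float:
    """C1*V1 = C2*V2: stock volume needed for v_final at c_final (same concentration units)."""
    if c_final > c_stock:
        raise ValueError("cannot dilute to a higher concentration")
    return c_final * v_final / c_stock


if __name__ == "__main__":
    print(round(library_nM(2.0, 300), 2))  # about 10 nM
    # 10 nM = 10,000 pM stock diluted to 4.5 pM in 1,000 µL
    print(dilution_volume(10_000, 4.5, 1000), "µL of stock")
```

The second calculation shows why the libraries were diluted in steps: a single 10 nM to 4.5 pM dilution would require pipetting well under 1 µL of stock.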

Deep sequencing, unigenes assembly and gene annotation
Nine cDNA libraries were deep-sequenced on the Illumina NextSeq™ 500 platform at Shanghai Personal Biotechnology Co., Ltd., generating 424-476 million paired-end raw reads per library. After filtering out low-quality reads (mean quality < Q20) and reads containing adapters, 417-474 million clean reads per library were assembled de novo using Trinity (version r20140717, k-mer 25 bp) [42].

Differentially expressed gene (DEG) analysis
DEG screening was conducted with DESeq (version 1.18.0) [45]. Genes were considered significantly differentially expressed between any two tested developmental stages if |log2(fold change)| > 1 and P < 0.05. Co-expression analysis of DEGs and Venn diagrams were generated with VENNY 2.1 (http://bioinfogp.cnb.csic.es/tools/venny/index.html). GO functional categorization of DEGs was performed with the BGI WEGO program [46] (http://wego.genomics.org.cn/cgi-bin/wego/index.pl). The ten most-represented GO terms in each category are shown in the figures. KEGG annotation of DEGs was performed on KASS as described above.
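The significance rule above can be expressed as a small classifier applied to each unigene's fold change and P value for one contrast. This sketch assumes the fold-change threshold applies in both directions (as implied by the separate up- and down-regulated counts) and that the P values are those reported by DESeq; whether raw or adjusted P values were used is not stated. The unigene IDs in the test are hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple


def classify_deg(log2_fc: float, p_value: float,
                 min_abs_log2_fc: float = 1.0, alpha: float = 0.05) -> str:
    """Label one unigene as 'up', 'down' or 'not significant' for a single contrast."""
    # NaN fold changes or P values fail both comparisons and fall through to 'not significant'
    if p_value < alpha and abs(log2_fc) > min_abs_log2_fc:
        return "up" if log2_fc > 0 else "down"
    return "not significant"


def tally(results: Dict[str, Tuple[float, float]]) -> Counter:
    """Count up- and down-regulated unigenes from {unigene: (log2FC, P)}."""
    return Counter(classify_deg(fc, p) for fc, p in results.values())
```

Applied to, for example, the 96 vs. 52 DAF contrast, `tally` would return the up- and down-regulated counts that make up Contrast Group I; the per-group ID sets can then be passed to a Venn-style comparison.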

Identification and expression analysis of unigenes
To identify putative genes associated with long-chain saturated FA formation and triacylglycerol biosynthesis, a local BLAST search was performed between N. caudatum unigenes and 81 Arabidopsis genes crucial for FA formation and triacylglycerol assembly [47]. Unigenes with high sequence similarity (E-value < 1.0E-5) to the Arabidopsis homologs were identified (Additional file 10). Their Fragments Per Kilobase of transcript per Million mapped reads (FPKM) values were log2-transformed and visualized with Multi Experiment Viewer (MeV) software. The expression levels of FATA/B, FAD2/8, KASI/II and SAD1/2 in the samples were further validated by qRT-PCR (CFX96 Real-Time PCR System, Bio-Rad, USA; SYBR Premix Ex Taq™ Cat # RR420, TaKaRa, Japan). Primers were designed based on the conserved sequences of these unigenes (Additional file 13), and ACT11 was used as the reference gene. The relative expression level of each target gene was calculated by the delta-delta Cq method [48] and normalized to its expression level in leaves (set to 1). Three biological replicates were performed, and data are presented as means ± SD.
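The delta-delta Cq calculation described above normalizes each target gene to the reference gene (ACT11) and then to the calibrator tissue (leaves, set to 1). The sketch below implements the standard 2^-ΔΔCq formula, which assumes close to 100 % amplification efficiency for both target and reference genes; the Cq values in the test are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def relative_expression(cq_target: float, cq_ref: float,
                        cq_target_cal: float, cq_ref_cal: float) -> float:
    """2^-ΔΔCq relative quantity; assumes ~100 % amplification efficiency for both genes."""
    delta_sample = cq_target - cq_ref              # normalize to the reference gene (e.g. ACT11)
    delta_calibrator = cq_target_cal - cq_ref_cal  # same normalization for the calibrator (leaves)
    return 2.0 ** (-(delta_sample - delta_calibrator))


def mean_sd(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across biological replicates."""
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    # Hypothetical Cq values: the target crosses threshold 2 cycles earlier (relative to ACT11)
    # in a fruit sample than in leaves, i.e. about 4-fold higher expression.
    print(relative_expression(25.0, 20.0, 27.0, 20.0))
```

The calibrator always evaluates to exactly 1, matching the convention that leaf expression is set to 1; replicate values can then be summarized with `mean_sd` to give the mean ± SD shown in the figures.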