Transcriptome analysis of metabolic pathways associated with oil accumulation in developing seed kernels of Styrax tonkinensis, a woody biodiesel species

Background Styrax tonkinensis (Pierre) Craib ex Hartwich has great potential as a woody biodiesel species having seed kernels with high oil content, excellent fatty acid composition and good fuel properties. However, no transcriptome information is available on the molecular regulatory mechanism of oil accumulation in developing S. tonkinensis kernels. Results The dynamic patterns of oil content and fatty acid composition at 11 time points from 50 to 150 days after flowering (DAF) were analyzed. The percent oil content showed an up-down-up pattern, with yield and degree of unsaturation peaking on or after 140 DAF. Four time points (50, 70, 100, and 130 DAF) were selected for Illumina transcriptome sequencing. Approximately 73 million high quality clean reads were generated, and then assembled into 168,207 unigenes with a mean length of 854 bp. There were 5916 genes that were differentially expressed between different time points. These differentially expressed genes were grouped into 9 clusters based on their expression patterns. Expression patterns of a subset of 12 unigenes were confirmed by qRT-PCR. Based on their functional annotation through the Basic Local Alignment Search Tool and publicly available protein databases, specific unigenes encoding key enzymes, transmembrane transporters, and transcription factors associated with oil accumulation were determined. Three main patterns of expression were evident. Most unigenes peaked at 70 DAF, coincident with a rapid increase in oil content during kernel development. Unigenes with high expression at 50 DAF were associated with plastid formation and earlier stages of oil synthesis, including pyruvate and acetyl-CoA formation. Unigenes associated with triacylglycerol biosynthesis and oil body development peaked at 100 or 130 DAF. Conclusions Transcriptome changes during oil accumulation show a distinct temporal trend with few abrupt transitions. Expression profiles suggest that acetyl-CoA formation for oil biosynthesis is both directly from pyruvate and indirectly via acetaldehyde, and indicate that the main carbon source for fatty acid biosynthesis is triosephosphate originating from phosphohexose outside the plastid. Different sn-glycerol-3-phosphate acyltransferases are implicated in diacylglycerol biosynthesis at early versus late stages of oil accumulation. Triacylglycerol biosynthesis may be accomplished by both diacylglycerol and by phospholipid:diacylglycerol acyltransferases.

synthase (FAS) complex, and acyl-CoA:DAG acyltransferase (DGAT) [18][19][20]. Zhang et al. [15] have previously shown that PDC and ACC activities support FA biosynthesis in developing S. tonkinensis kernels, and DGAT supports TAG biosynthesis. Different parts of the process are localized in different compartments; for example, glycolysis in the cytosol and plastids, FA biosynthesis in plastids, and TAG biosynthesis in endoplasmic reticulum (ER) [20]. Thus, there are many transmembrane transporters needed to mediate the transfer of essential metabolites between compartments in developing oil seeds [21][22][23][24].
Understanding of the pathways involved in oil accumulation during S. tonkinensis kernel development requires the exploration of functional genes, transcription factors and transmembrane transporters. RNA-seq, one of the next-generation sequencing (NGS) technologies, is widely used for exploring functional genes, constructing expression profiles and transcriptional regulatory pathways, and investigating comparative and evolutionary genomics [25][26][27]. It provides a readily available, cheap, and comprehensive method to acquire large-scale genomic and transcriptomic data for non-model plant species and has been used to analyze the transcriptomic profiles and construct summary overviews of FA and TAG biosynthesis during oil accumulation in woody oil plants, such as Elaeis guineensis Jacq., Prunus sibirica L., Pistacia chinensis Bunge, Symplocos paniculata Miq., and Lindera glauca (Siebold & Zucc.) Blume [28][29][30][31][32]. However, functional gene expression and regulatory profiles of oil biosynthesis and metabolism in S. tonkinensis have not yet been analyzed.
In this study, we explored changes in gene expression, oil content and FA composition during seed kernel development in S. tonkinensis over an extended time course. The aims of the study were to: (1) confirm the dynamic pattern of oil content and FA composition during kernel development; (2) assess transcriptional profiles at four time points leading up to kernel maturity; (3) identify functional unigenes encoding vital enzymes related to FA and TAG biosynthesis; and (4) confirm the metabolic pathways associated with oil accumulation in developing kernels indicated by functional unigenes.

Dynamic patterns of oil content and fatty acid composition
In this study, the dynamic pattern of oil concentration based on fresh mass and dry mass during kernel development was evaluated (Fig. 1a). There was a dramatic increase in the per cent oil content from 60 to 70 days after flowering (DAF). Thereafter, per cent oil content followed an up-down-up pattern with peaks at 80 DAF and at kernel maturity. The final oil content represented about 44 and 56% of the kernel fresh mass and dry mass, respectively.

Sequence analysis and de novo transcriptome assembly
Based on the observed changes in oil content and quality reported in Fig. 1, and related changes in morphological and physiological indicators in previous studies [15,17], we defined four stages of kernel development: an initial stage prior to the major rise in content (50)(51)(52)(53)(54)(55)(56)(57)(58)(59)(60), an active stage of rising oil content (60-80 DAF), a third stage with decreased oil content and high polyunsaturated FAs (80-120 DAF), and a final maturation stage with stable oil content and composition (120-150 DAF). To explore changes in transcription, we sampled these stages at 50, 70, 100 and 130 DAF. Three cDNA libraries were constructed from independent kernel RNA samples obtained at each of these four time points.
More than 5 GB raw data were respectively generated from every RNA-Seq sample. An average 72,584,106 (97.15%) of clean reads was obtained after removing the adaptor and low-quality reads (Table 1). A total of 365, 312 contigs were obtained with a mean length of 1146 bp and GC content of 40.45% using Trinity software. Subsequently, a final contigs assembly produced 168,205 unigenes (N50: 1437) with a mean length of 854 bp. The number of unigenes assembled in this study was comparable to numbers assembled in studies of kernel development in other woody plants [29][30][31][32]. Of the unigenes we found, 42.25% (71,071) were 200-400 bp in length, 32.17% (54,119) were 400-1000 bp, 15.62% (26,268) were 1000-2000 bp, and 9.96% (6747) exceeded 2000 bp (Table 2).

Functional annotation and classification
To annotate and identify putative functions, all the assembled unigenes were annotated using the Basic Local Alignment Search Tool (BLAST) program against five publicly available protein databases with a cut-off E-value of 10 − 5 . In total, there were 74,524 (44.30%), 63,788 (37.92%), 60,017 (35.81%), 49,305 (29.31%), and 20,424 (12.14%) unigenes that had significant hits with known proteins in nonredundant protein (NR), the Swiss-Prot protein (SwissProt), Gene Ontology (GO), Clusters of Orthologous Groups (COG), and the Kyoto Encyclopedia of Genes and Genomes (KEGG) databases, respectively. Of all the assembled unigenes, 89,387 (53.14%) had significant BLAST matches in at least one of the five databases, whereas 14,105 (8.39%) had significant hits in all databases. However, the remaining 78, 818 (46.86%) unigenes had no significant annotation hit, either because they represent novel tissue-specific genes not in a database or because there were a large number of short sequences (71,071, 42.25%) lacking a characterized protein domain to get BLAST hits.

Analysis of gene expression
To investigate the dynamic expression patterns of the specific genes associated with oil accumulation, the transcriptome profiles from different time points were compared. All 5916 differentially expressed genes (DEGs) were subject to a clustering analysis using Mfuzz to yield 9 clusters (Additional file 3). Cluster I, containing 1115 DEGs, consisted mainly of unigenes that had a rapid decrease in expression between 50 and 70 DAF, remaining low thereafter. The major pathway enrichment of unigenes in cluster I was focused on biosynthesis of amino acids, glycerolipid metabolism and starch and sucrose metabolism. Cluster II contained 1035 DEGs, representing the unigenes with continuous downward trends from 50 through to 100 DAF. The unigenes of cluster II were mainly enriched into the pathways of biosynthesis of amino acids, glycolysis/gluconeogenesis and purine metabolism. Cluster III, containing 839 DEGs, represented the unigenes with higher expression levels at both 50 and 70 DAF. The pathway enrichment of unigenes in cluster III was mainly focused on fatty acid biosynthesis, fatty acid metabolism, glycolysis/gluconeogenesis, the pentose phosphate pathway (PPP), protein processing in endoplasmic reticulum, pyruvate metabolism and the TCA cycle. Clusters IV and V, containing 485 and 580 DEGs respectively, both peaked at 70 DAF during the most active period of rising oil content. Cluster IV was enriched in unigenes for ABC transporters, amino sugar and nucleotide sugar metabolism, fatty acid biosynthesis and the PPP. Cluster V rose to the peak at 70 DAF more abruptly than cluster IV and was enriched in unigenes for glycerophospholipid metabolism, pyruvate metabolism and tyrosine metabolism. Cluster VI contained 226 DEGs, and represented the unigenes with expression levels that mostly continued to increase from 70 to 100 DAF, as oil accumulation was slowing down. This cluster was mainly enriched in the pathways of glycolysis/gluconeogenesis, pyruvate metabolism and ubiquitin mediated proteolysis. Cluster VII contained 339 DEGs, representing the unigenes that showed a more-or-less gradual increase during all of kernel development and peaking at either 100 or 130 DAF. Cluster VII was enriched in unigenes for alpha-linolenic acid metabolism, glycerolipid metabolism, glycerophospholipid metabolism, purine metabolism and ribosomal proteins. Cluster VIII contained 565 DEGs, representing the unigenes with somewhat up-down-up change trends during kernel development and was mainly enriched in the pathways of fatty acid metabolism, glycolysis/gluconeogenesis, protein processing in endoplasmic reticulum and starch and sucrose metabolism. Cluster IX, containing 731 unigenes, represented the unigenes with complex change trends and lower expression levels at 70 or 100 DAF. The pathway enrichment of unigenes in cluster IX was mainly focused on biosynthesis of amino acids, fatty acid degradation, fatty acid metabolism, glycerolipid metabolism, oxidative phosphorylation, phenylpropanoid biosynthesis, protein processing in endoplasmic reticulum and pyrimidine metabolism.
To validate the RNA-Seq results, relative expression and temporal transcription profiles of 12 identified key genes were analyzed by qRT-PCR. There was very good consistency between the relative expression levels of these genes and the RPKM comparative ratios (Additional file 4). Thus, the dynamic expression patterns of kernel unigenes uncovered by RNA-Seq can be considered reliable.

Unigenes related to pyruvate and acetyl-CoA formation
Unigenes representing key enzymes in the glycolytic pathway and the pentose phosphate pathway (PPP) related to pyruvate and acetyl-CoA formation include ATP-dependent phosphofructokinase ( Ten unigenes were identified to encode PFK, including two for PFK2, four for PFK3, two for PFK4, and two for PFK5. Nine unigenes were identified for PFP (six for PFP-α and three for PFP-β). Twenty-three unigenes were found to encode PK, which included nine for PK1, six for PK2, four for PK4, one for PK6, one for PK9, and two for PK10. Moreover, three unigenes were identified to encode ACS and 22 unigenes were found to encode four subunits of the PDC, including two for PDC E1-α, five for PDC E1-β, eight for PDC E2, and seven for PDC E3. Seven unigenes encoding G6PD (four for G6PD1 and three for G6PD2) and five unigenes encoding PGD (two for PGD1, one for PGD2, and two for PGD3) were identified. For each protein, the unigene with the highest RPKM value was selected to examine expression levels over time (Additional file 5). The selected unigenes showed two major expression patterns (Fig. 6), with peaking at 50 DAF (pfk3, g6pd1, pgd3, pdc E1-β and pdc E3), or at 70 DAF (pfp, pk, acs, g6pd2, pdg1, pdg2, pdc E1-α and pdc E2). All of these unigenes had a comparatively low expression level at the last two time points (100 and 130 DAF). Nine unigenes were found to encode ACC including three for accA, three for accB, and three for accC. One unigene was found for MAT. Thirty-eight unigenes were identified as essential cofactors of FAS components, including 20 for KAS (six for KASI, five for KASII, and nine for KASIII), eight for KAR, three for HAD, and seven for EAR. Six unigenes were found to encode SAD. Moreover, eight unigenes encoding FAT (two for FATA and six for FATB) and 21 unigenes encoding ACSL (three for ACSL1, three for ACSL2, one for ACSL3, seven for ACSL4, one for ACSL5, three for ACSL6, one for ACSL7, one for ACSL8, and one for ACSL9) were identified. Three unigenes were identified to encode FAD2, five for LPCAT1 and seven for PLA2. Additionally, eight unigenes for ACP were identified by the functional annotation analyses. The unigene expression trends of enzymes involved in FA biosynthesis were analyzed (Fig. 6). Except KAR and PLA2, all the FA biosynthesis unigenes had dynamic expression patterns similar to the unigenes related to pyruvate and acetyl-CoA formation, with peak expression at the second time point (70 DAF). The fatb unigene had low expression at the later stages (100 and 130 DAF), while fad2 had low expression at the initial stage (50 DAF). . We identified two unigenes for encoding GK, thirteen for GPAT (four for GPAT1, two for GPAT3, one for GPAT4, three for GPAT6, one for GPAT8, and two for GPAT9), eighteen for LPAT (five for LPAT1, five for LPAT2, three for LPAT4, and five for LPAT5), and seven for PAP. Nine unigenes for DGAT and seven for PDAT (six for PDAT1 and one for PDAT2) were identified. The change trends of unigenes related to TAG biosynthesis were complex (Additional file 5). GK had a downward trend during kernel development, while GPAT showed an up-down-up trend peaking at 70 DAF. LPAT and PAP had a similar change trend with a peak at 70 DAF, while PDAT and DGAT peaked at 100 and 130DAF, respectively.

Unigenes related to triacylglycerol and oil body biosynthesis
Unigenes encoding the oil body membrane proteins oleosin (OLE), caleosin (CLE) and steroleosin (SLE) were analyzed. Nine unigenes encoding OLE were identified (six for OLE1, two for OLE3, and one for OLE5), but only one unigene encoding CLE was found, and none for SLE. With kernel development, ole1 expression levels increased continuously but the other unigenes were low and did not show much change.
Five enzymes related to carbon partitioning in the glycoxidation pathway were analyzed: glucose-  1.37). There were nine, seven, five, four and seven unigenes identified to encode APG, SS, GPI, GAPDH and MDH, respectively. All except APG had unigene expression peaks at 70 DAF (Fig. 6).

Discussion
Oil accumulation and biodiesel fuel property assessment in developing S. tonkinensis kernels Oil content is a principal factor for evaluating the potential of biodiesel feedstocks [4,9]. Previous studies have shown that the oil content of fully ripened S. tonkinensis kernels ranges from 58.6 to 63.0% [9,14,15], which is higher than many other oil plants including L. glauca (31.6%), P. chinensis (35.1%), J. curcas (39.8%), and Vernicia fordii (Hemsl.) Airy Shaw (40.0%) [32,[35][36][37]. In this study, the dynamic pattern of oil content based on dry mass showed an up-down-up trend with peaks at 80 and 140 DAF. Conversion of these data relative to fresh mass, based on water contents at corresponding times in Zhang et al. [15], resulted in a very similar pattern (ρ = 0.936, p < 0.01). A rapid increase in oil content from 60 to 70 DAF coincides with the initiation of rapid growth in kernel biomass. Thereafter, over the period where kernel biomass is known to increase steadily [15], oil content fluctuated (Fig. 1a). However, considering both content and size, maximum oil yield is expected by 140 DAF.

Clustering analysis of DEGs
According to the functional annotation and classification of the unigenes and KEGG pathway assignment [31,33], multiple unigenes encoding key enzymes associated with oil accumulation were identified. All unigenes that were differentially expressed between different sampling dates, irrespective of pathway assignment, were grouped into 9 clusters based on their temporal expression patterns (Additional file 3). These clusters show distinct patterns in expression throughout kernel development, but also significant temporal overlap in terms of associated pathways and processes. For example, clusters I and II represented DEGs with high expression at 50 DAF and were enriched in DEGs associated with amino acid biosynthesis, whereas DEGs related to FA biosynthesis were mainly found in clusters III and IV which represented the unigenes having high expression levels mostly at 70 DAF. These patterns are consistent with protein accumulation beginning sooner than FA accumulation [15].
The DEGs related to TAG biosynthesis were mainly found in clusters V, VII and IX. Clusters V and VII represented the unigenes having high expression levels that were delayed relative to clusters III and IV, but there was no clear temporal relationship with oil accumulation for cluster IX. Furthermore, the unigenes in cluster VIII, whose expression change trends best coincided with the up-down-up pattern of oil accumulation during kernel development, were obviously enriched in glycolysis/gluconeogenesis, starch and sucrose metabolism and protein process in endoplasmic reticulum rather than FA or TAG biosynthesis. It's possible that some of these glycolysis/gluconeogenesis and starch and sucrose metabolism-related unigenes may be involved in the provision of substrate to support both phases of FA biosynthesis.

Pyruvate and acetyl-CoA formation
To support FA biosynthesis, sucrose, the major form of sugar transported in most plants, is decomposed into fructose and glucose to yield pyruvate by way of the glycolytic pathway [19,21,41]. Three enzymes in this pathway, PFK, PFP, and PK, play key regulatory roles in the formation of pyruvate [21,29]. Pyruvate is used to produce acetyl-CoA by the PDC directly, and ACS can also produce acetyl-CoA by utilizing acetate produced from pyruvate by way of acetaldehyde. In addition, the PPP operates parallel to glycolysis, starting at G6PD, while also supplying fatty acid biosynthesis with NADPH by way of PDG [19,21]. The enzymes PFK and PFP both catalyze the phosphorylation of fructose-6-phosphate, the first committed step of glycolysis, but PFK has been identified in both cytosol and plastid, while PFP is restricted to plastids [28,42]. In concurrence, G6PD1, which is cytosolic, had a unigene expression pattern similar to PFK, while G6PD2, which is located in the plastid, had a unigene expression pattern similar to PFP. These expression patterns are consistent with the existence of phosphohexose in both cytosol and plastids. The dynamic patterns of most unigenes encoding PK were coincident with pfp, including pk1, pk2, pk4, pk6 and pk10, which are restricted to the plastid. The unigenes for plastidial PDC E1-α and PDC E2 had similar patterns, as did ACS, indicating that acetyl-CoA could be formed by two different pathways, which contrasts with results in P. sibirica that suggested a less important role for ACS [29]. Like pfk, and consistent with P. sibirica, expression patterns of unigenes for non-plastidial PDC E1-β and PDC E3 trended downwards from 50 DAF. Most unigenes related to pyruvate and acetyl-CoA formation in the glycolytic pathway had higher expression levels at the first two time points (50 DAF and 70 DAF), indicating greater emphasis on the formation of these primary substrates during the early stages of S. tonkinensis kernel development.

Fatty acid biosynthesis and oil accumulation
Focusing on key enzymes involved in oil biosynthesis and accumulation, a summary overview of oil accumulation processes in developing kernels was constructed (Fig. 8). The de novo FA biosynthesis from precursors of acetyl-CoA occurs in plastids and adds two carbon units to malonyl-acyl carrier protein (ACP) in every condensation reaction cycle catalyzed by the FAS complex [43][44][45]. ACC, as the first committed key and a rate-limiting enzyme, converts acetyl-CoA to malonyl-CoA [29]. MAT converts malonyl-CoA to malonyl-ACP, which is the primary substrate for a subsequent series of condensation reactions [45]. In each condensation cycle, the FAS complex adds two additional carbon units to malonyl-ACP [46] to yield either palmitic acid-ACP (16:0-ACP) or stearic acid-ACP (18:0-ACP) after six or seven cycles, respectively. Additionally, SAD converts 18:0-ACP to oleic acid-ACP (18:1-ACP). Then, FAT releases free FA from ACP; specifically, FATA releases free C18:1 from C18:1-ACP, and FATB releases free C16:0 from C16:0-ACP, while either FATA or FATB can release C18:0 from C18:0-ACP. ACSL on the outer membrane of the plastid esterifies free FAs to generate an acyl-CoA pool [19,47]. After that, some acyl-CoAs will combine with glycerol-3-phosphate (G3P) to form TAG, while others, especially C18:1-CoA, will enter the phosphatidylcholine (PC) pool and be esterified to form acyl-PCs by LPCAT. In the PC pool, FAD2 converts C18:1-PC to C18:2-PC, an additional substrate for TAG biosynthesis [31,46]. PLA2 recycles PC to lysophosphatidylcholine (Lyso-PC) [46].
In this study, most FA biosynthesis unigenes had dynamic expression patterns similar to the unigenes related to pyruvate and acetyl-CoA formation; however, the rise in expression levels from 50 DAF to 70 DAF was in some cases more pronounced for FA biosynthesis-related enzymes (i.e., MAT, HAD, EAR, and SAD) than for unigenes related to pyruvate and acetyl-CoA formation, in accordance with the carbon flow from sugar to oil [15,19]. The unigenes of FATB, the main enzyme responsible for free saturated FA release, had low expression levels at the last two time points (100 and 130 DAF), consistent with the actual FA profiles (especially C16:0 and C18:0) reported in Fig. 1b. The unigenes of FAD2, the main enzyme responsible for desaturation in unsaturated FA biosynthesis [48], showed a rapid rise in expression levels between 50 and 70 DAF and then a relatively slow decline later on, consistent with the actual FA profiles (cf., the increase in unsaturated relative to saturated FAs from 50 to 70 DAF in Fig. 1b, d). The identification of StFAD2 would be useful in genetic engineering of S. tonkinensis for producing seed oils optimal for biodiesel fuel properties.
TAG is the major form of carbon storage in oil seeds. De novo TAG biosynthesis from precursors of G3P occurs in the ER via the Kennedy pathway [20,49]. TAG is stored in oil bodies surrounded by a lipid monolayer and associated oil-body-membrane proteins including OLE, CLE and SLE, which are released from the ER into the cytoplasm [50,51]. GPAT, as the first committed key enzyme in the Kennedy pathway, catalyzes G3P acylation to form lysophosphatidic acid (LAP) in the sn-1 position [52]. LPAT catalyzes LAP acylation to form phosphatidic acid (PA) in the sn-2 position. PAP converts PA to diacylglycerol (DAG). After that, rate-limiting enzymes DGAT and PDAT catalyze DAG acylation at the sn-3 position with acyl-CoA or acyl-PC to form TAG [47,53]. In addition, G3P is converted from glycerol by GK [31].
Among the 13 gpat unigenes, the sustained expression of gpat9 in S. tonkinensis kernels suggests that GPAT9 is the key initial acyltransferase for TAG biosynthesis [29,53]. In contrast, gpat8, involved in cutin biosynthesis [54], was strongly expressed early in kernel development (50 and 70 DAF). The unigenes related to PC, PA and DAG biosynthesis such as lpcat, fad2, pla2, gk, gpat9, lpat2, and pap also had peaks at the earlier stages (50 DAF or 70 DAF), whereas the unigenes related to TAG biosynthesis such as dgat and pdat2, peaked later on (100 DAF or 130 DAF) (Figs. 6 and 8). The maximum expression of pdat2 and dgat was at 100 DAF and 130 DAF (Fig. 8), respectively, indicating that TAG biosynthesis was accomplished both by DGAT with acyl-CoA via the Kennedy pathway and by PDAT with acyl-PC from the PC pool [31,55].
As FA and TAG accumulated during kernel development, unigenes related to oleosin synthesis (ole1, ole3 and ole5), and thus oil body formation, were steeply and steadily upregulated (Fig. 6). In particular, the expression level of ole1 was very high. In contrast, cle expression remained quite low throughout kernel development. These patterns are consistent with results reported for developing seed kernels of P. sibirica [29]. The expression pattern of ole1 was very similar to the evolution of total soluble protein content during kernel development reported by Zhang et al. [15].

Fatty acid and triacylglycerol degradation and metabolism
During times of carbon and energy deficiency in higher plants, TAG is hydrolyzed to release free FAs and glycerol, which is vital for cellular energy balance, lipid homeostasis, membrane proliferation, cell growth, and survival [56]. There are two TAG lipases, TAGL and MAGL, responsible for the majority of oil breakdown [31,56]. The released free FAs and/or acyl-CoA esters are converted to acetyl-CoA in the glyoxylate cycle and gluconeogenesis via the β-oxidation pathway with three core enzymes consisting of ACOX, MFP2 and ACAA [57]. In S. tonkinensis, the expression patterns of unigenes related to FA and TAG degradation and metabolism had peaks at 70 DAF (Fig. 6), similar to the major unigenes associated with FA and TAG biosynthesis. Consequently, the unigenes for DGAT, PDAT, and TAGL, three vital enzymes in the TAG pathway, do not have the same expression pattern, suggesting opportunities for genetic cloning and modification to improve cellular oil content and quality. ACOX1, active in the desaturation of both longand medium-chain acyl-CoAs, and ACOX2, targeting longchain acyl-CoAs, showed high unigene expression levels early in kernel development, indicating that free FA (especially C16:0 and C18:0) may be broken down to provide acetyl-CoA, the key metabolite for energy production.

Transmembrane transporters and transcription factors involved in oil accumulation
In general, plant glycolysis occurs in both cytosol and plastid, while FA and TAG biosynthesis occur in plastid and ER, respectively [43,45]. Therefore, the interchange of glycolytic intermediates and FA between different organelles becomes necessary. Several highly selective plastidic transporters, including GLT, GPT, TPT, PPT and PLAP, transport glycolytic metabolites generated in the cytosol to plastids for plastidial glycolysis [19,21,24,29,58]. ACBP and ABC are possible transporters related to FA or acyl-CoA transfer [23,59]. In the present study, expression levels of glt1 and gpt2 were highest at 50 DAF, whereas tpt and ppt1 were highest at 70 DAF, the latter showing a pattern similar to glycolysis-related unigenes in developing kernels (Figs. 1 and 6). These RPKM levels and patterns suggest that the main carbon source for FA biosynthesis in S. tonkinensis kernels is triosephosphate originating from phosphohexose outside the plastid, despite Expression of PLAP6, which may be related to plastid formation [60], was largely restricted to earlier stages. Expression levels of unigenes for ACBP4 and ACBP5, as intracellular carriers of acyl-CoA esters [59], were both sustained through 70 to 100 DAF, suggesting they transport acyl-CoA to the ER during oil accumulation. The unigene for ABCA9, which also transports FA or acyl-CoA into the ER [23], had a small peak at 70 DAF but greater expression at the last time point, verifying that ABCA9, as well as ACBP4 and ACBP5, plays an important role when storage lipids accumulate in developing kernels.
According to previous studies, TFs play positive or negative roles in seed oil synthesis and deposition [61][62][63][64][65]. In this study, expression patterns for ABI3, ABI4, HSI2/VAL1, L1L, PKL, and WRI1 suggest that they play positive roles in regulating genes for oil biosynthesis, while other TFs may act as negative or ambiguous regulatory factors. Among them, WRI1 displayed a coordinated transcriptional profile with pk, pdc, acc, kas, sad, and fad2, consistent with its function as a central regulator that directs carbon flux from glycolysis into FA biosynthesis during seed oil accumulation [65,66]. Similar results were seen in previous investigations on E. guineensis and P. sibirica [28,29].

Carbon competition during kernel development
In developing seeds, carbon partitioning determines the eventual starch/oil balance [44]. Zhang et al. [15] suggested that there is carbon competition between fatty acid and starch biosynthesis during S. tonkinensis kernel development. They found that between 40 and 60 DAF, the total soluble sugar concentration increased from 120 to 200 mg/g FW, but it then decreased to approximately the initial level by 90 DAF, where it remained until kernel maturation. Concurrently, starch concentration remained low (about 3 mg/g FW) between 30 and 80 DAF but then began to increase steadily to 18 mg/g FW. This phenomenon may result from the combined action of a series of enzymes related to glycoxidation, such as APG, SS, GPI, GAPDH, MDH and the PDC. The expression patterns of the unigenes for these enzymes were similar to changes in their respective activities reported in earlier research [15]. The unigenes for SS, the PDC and ACC, which are associated with the biosynthesis of starch or fatty acid, had peaks at 70 DAF. However, the subsequent decrease in expression levels of pdc and acc at 100 and 130 DAF were more obvious than for ss, and this might favor greater carbon flux to starch accumulation at middle to later stages of kernel development.

Conclusions
In this study, the accumulation of oil content during kernel development showed an up-down-up pattern with two peaks at 80 and 140 DAF, respectively, while FA composition remained relatively static. Kernel oil from mature seed had good biodiesel fuel properties. Consequently, both oil quantity and quality are maximized if kernels are harvested when development is complete. Four time points were selected for further transcriptome analysis, and a total of 168,205 unigenes were assembled and annotated successfully. Differentially expressed unigenes were grouped into 9 clusters based on their expression patterns. Three main patterns of expression in unigenes related to oil accumulation were evident. The majority of unigenes encoding enzymes associated with FA biosynthesis peaked at 70 DAF, on the cusp of a rapid increase in oil content. Expression of these unigenes then declined to lower levels while oil synthesis was sustained during kernel growth. Unigenes peaking or with relatively high expression at 50 DAF tended to be associated with plastid formation and earlier stages of oil synthesis in the cytosol, such as pyruvate and acetyl-CoA formation (directly from pyruvate and indirectly via acetaldehyde), indicating that the main carbon source for FA biosynthesis is triosephosphate originating from phosphohexose outside the plastid. Relatively few unigenes peaked at 100 or 130 DAF, but notable exceptions were unigenes associated with TAG biosynthesis and oil body development. TAG biosynthesis is likely accomplished both by DGAT via the Kennedy pathway and by PDAT from the PC pool. Expression patterns were also consistent with some carbon competition for starch accumulation at middle to late stages of kernel development, as reported by Zhang et al. [15]. This is the first comprehensive analysis of transcriptome and sequence information for genes related to oil accumulation in developing S. tonkinensis kernels. Results of this study will serve as an important foundation to more deeply explore the regulatory mechanism of oil accumulation in this species, and may provide reference for other potential biodiesel species.

Plant materials and RNA extraction
The S. tonkinensis plants (from Jishui, Jiangxi Province) used in this study were purchased by Jiangsu Guoxing Co., Ltd., in 2011, and planted in the Styracaceae Germplasm Repository, Luhe District, Nanjing, China (32°32′ N, 118°50′ E), where they grew under natural conditions. In early May, 2017, 15 trees were tagged for sampling. Beginning July 15th (50 DAF), fresh fruits were randomly collected every 10 days. The kernels stripped from fruits and seed coats were immediately frozen in liquid nitrogen and stored at − 70°C until use. After drying in a 65°C oven for 72 h and weighing, oil content and FA composition, including proportions of saturated, monounsaturated and polyunsaturated FA, were determined by extraction on a Soxhlet apparatus followed by gas chromatography-mass spectrometry (GC/MS), using methods previously described by Zhang et al. [15]. Biodiesel fuel properties including density (ρ), kinematic viscosity (η), cetane number (CN), iodine value (IV), and cold filter plugging point (CFPP) were predicted from the FA composition according to Wang et al. [40] and expressed using a triangular prediction model based on the percentages of saturated, monounsaturated, and polyunsaturated acids [40].
Kernels from four representative time points were used for comparative deep transcriptome analysis. There were three biological replicates for each time point, with each replicate consisting of material pooled from five different trees. Total RNA was extracted using Plant RNA Kit (Omega Bio-Tek, Doraville, GA, USA) according to the manufacturer's instructions. The quantity and quality of total RNA were assessed using 1% agarose gels and a Nanodrop ND 2000 Spectrophotometer (Nanodrop Technologies, Wilmington, DE, USA). The integrity and concentration of total RNA were assessed using a Bioanalyzer 2100 RNA 6000 Nano Kit (Agilent Technologies, Santa Clara, CA, USA).

cDNA library construction and sequencing
The poly(A) mRNA was isolated from total RNA using an Oligotex mRNA Mini Kit (Qiagen, Inc., Valencia, CA, USA) according to the manufacturer's instructions. cDNA library construction and normalization were performed using methods previously described by Niu et al. [29]. The 12 cDNA libraries were sequenced on the Illumina Hiseq 4000 Sequencing platform (Illumina, Inc., San Diego, CA, USA) and 150 bp paired-end reads were generated. The raw data were processed by removing the low-quality sequences (reads with more than 50% Q < 19 bases), the adaptor-pollute sequences, and sequences with ambiguous base reads accounting for more than 5%. The clean reads were assembled into unigenes by Trinity software (Trinity Release v2.4.0, Broad Institute of MIT and Harvard, Cambridge, MA, USA) [67].

Functional annotation of unigenes
To understand their functions, the assembled unigenes were aligned to the publicly available protein databases including the NCBI non-redundant protein (NR), the Swiss-Prot protein (SwissProt), Gene Ontology (GO), Clusters of Orthologous Groups (COG), and the Kyoto Encyclopedia of Genes and Genomes (KEGG) using Basic Local Alignment Search Tool (BLAST) with a cutoff E-value of 10 − 5 [68].

Expression analysis of unigenes
The expression levels of the unigenes were calculated as Reads Per Kilobase Million Mapped Reads (RPKM), which eliminates the effect of sequencing depth and gene length on gene expression levels and permits direct data comparisons by the DESeq method [69]. Expression levels of unigenes involved in metabolic pathways associated with seed oil accumulation were calculated. The DEGs between different time points were identified with p adj < 0.05 and |log2 (foldchange value) | ≥ 1 [69]. Clustering analysis using Mfuzz [70] was performed based on the expression patterns of all the identified DEGs (https://www.omicsolution.org/).

qRT-PCR validation
Twelve unigenes related to oil accumulation were selected for validation via quantitative real-time PCR (qRT-PCR). The amplification primers were designed by Primer Premier 5.0 software (Premier Biosoft International, Palo Alto, CA, USA). All reactions were carried out on a StepOne Real-Time PCR System (Applied Biosystems, Foster City, CA, USA) using SYBR Green Dye (Takara, Dalian, China) according to the manufacturer's instructions. Relative gene expression was calculated by the 2 -ΔΔCt method with 18S ribosomal RNA as an internal control [32]. All primers used in this study are listed in Additional file 6.