Explore the gene network regulating the composition of fatty acids in cottonseed

Background Cottonseed is one of the major sources of vegetable oil. Analysis of the dynamic changes of fatty acid components and the genes regulating the composition of fatty acids of cottonseed oil is of great significance for understanding the biological processes underlying biosynthesis of fatty acids and for genetic improving the oil nutritional qualities. Results In this study, we investigated the dynamic relationship of 13 fatty acid components at 12 developmental time points of cottonseed (Gossypium hirsutum L.) and generated cottonseed transcriptome of the 12 time points. At 5–15 day post anthesis (DPA), the contents of polyunsaturated linolenic acid (C18:3n-3) and saturated stearic acid (C18:0) were higher, while linoleic acid (C18:2n-6) was mainly synthesized after 15 DPA. Using 5 DPA as a reference, 15,647 non-redundant differentially expressed genes were identified in 10–60 DPA cottonseed. Co-expression gene network analysis identified six modules containing 3275 genes significantly associated with middle-late seed developmental stages and enriched with genes related to the linoleic acid metabolic pathway and α-linolenic acid metabolism. Genes (Gh_D03G0588 and Gh_A02G1788) encoding stearoyl-ACP desaturase were identified as hub genes and significantly up-regulated at 25 DPA. They seemed to play a decisive role in determining the ratio of saturated fatty acids to unsaturated fatty acids. FAD2 genes (Gh_A13G1850 and Gh_D13G2238) were highly expressed at 25–50 DPA, eventually leading to the high content of C18:2n-6 in cottonseed. The content of C18:3n-3 was significantly decreased from 5 DPA (7.44%) to 25 DPA (0.11%) and correlated with the expression characteristics of Gh_A09G0848 and Gh_D09G0870. Conclusions These results contribute to our understanding on the relationship between the accumulation pattern of fatty acid components and the expression characteristics of key genes involved in fatty acid biosynthesis during the entire period of cottonseed development. Supplementary Information The online version contains supplementary material available at 10.1186/s12870-021-02952-4.


Background
The quality and nutritional value of plant oils are largely defined by their fatty acid compositions. Because of the low freezing point of saturated fatty acids (SFAs), vegetable oils with a high content of SFAs will coagulate, turbid, semi solidified or even solidified at a lower temperature. From health perspective, SFAs may increase the risk of cardiovascular diseases [1][2][3][4]. Polyunsaturated fatty acids (PUFAs), e. g. linoleic acid (C18: 2n-6) and linolenic acid (C18:3n-3), are beneficial to human health, but they are easily oxidized and become rancid when exposed to oxygen under general storage conditions [4][5][6][7]. Chemical hydrogenation of PUFAs can improve the oxidation stability of oil, but unfortunately, it produces harmful trans fatty acids as by-products [8,9]. Monounsaturated fatty acid (MUFA) oleic acid (C18: 1n-9) is stable and considered to be beneficial to human health [3,[8][9][10][11][12]. Edible oil with a high proportion of C18:1n-9 represents "healthy oil", such as olive oil, due to low content of SFAs and no need of chemical hydrogenation [13].
Cottonseed kernel contains 25-35% oil and is one of the major sources of edible vegetable oil. Its nutritional components and quality have thus attracted much attention. The major fatty acid components of cottonseed oil are linoleic acid (C18:2n-6), palmitic acid (C16:0), oleic acid (C18:1n-9) and stearic acid (C18:0). Generally, the relative content of C18:1n-9 in cottonseed oil is~15%. Because of its nutritional benefit, increasing the relative content of C18:1n-9 is one of the major targets of genetic improvement of plant oils. In higher plants, C18:1n-9 is converted from C18:1n-9-ACP by one of the acyl-ACP thioesterases, FATA, and is then used as substrate to synthesize C18:2n-6, the most abundant fatty acid component of cottonseed oil, by fatty acid desaturase 2 (FAD2) [14]. Genetically modified plants with high oleic acid (C18:1n-9) content have been developed in several crops, e.g. soybean [15,16], rapeseed [17,18], cotton [3,10,19] and jatropha [20], by decreasing FAD2 expression. Changes in fatty acid composition improved the nutritional value but may have unexpected side effects, such as reduced total oil content or seed vigor [3,8], probably due to the natural compositions of various fatty acids in seeds as a result of natural selection being necessary for seed germination and seedling growth. Composition of seed fatty acids is also closely related to the self-protection and physiological defense mechanisms of offspring [3,21]. Therefore in-depth analysis of the gene network regulating fatty acid components of cottonseed oil is of great significance for their genetic modification to meet the needs of both human health and the fitness of cotton plants.
In this study, we first investigated the dynamic changes of fatty acid components in the developing seeds of an Upland cotton (Gossypium hirsutum L.) cultivar Xinluzao 33 and generated transcriptomes of developing seeds at multiple developmental time points. We then analyzed the expression profiles of the key genes involved in fatty acid biosynthesis and performed gene network analysis using differentially expressed genes (DEGs). The results deepened our understanding of the genes associated with the fatty acid metabolic network and the temporal regulation of cottonseed fatty acid components, and provided gene expression information that may be helpful for identification of targets for improvement of cottonseed oil quality.

Transcriptome analysis
To understand how the dynamic changes in fatty acid components during cottonseed development are regulated, we generated seed transcriptome profiles using RNA-Sequencing (RNA-seq). RNA-seq libraries were generated from 12 different developmental time points (5,10,15,20,25,30,35,40,45,50, 55 and 60 DPA) and sequenced. A total of~599 million raw reads were generated. The average Q30 of the reads from all samples was 94.04% and the read GC content was 44.66%. Raw reads were filtered to remove low quality ones and a total of 576,668,718 clean reads were finally used in alignment. Overall, from 93.35 to 96.38% of the clean reads could be aligned to the reference genome with 87.80-90.09% of them being uniquely aligned (Table  S3). These results suggested that the RNA-seq data were high quality and suitable for further analyses.
We identified genes differentially expressed at 10-60 DPA seeds using 5 DPA seeds as a control. From 10 DPA to 20 DPA, the number of DEGs increased from 396 to 2451. The number of DEGs at 25, 30 and 35 DPA three middle developmental time points was similar and started to increase at 40 DPA with a significant increase observed at 55 and 60 DPA (Fig. S2A). Regarding the number of up-and down-regulated DEGs, two transition points were observed at 10-15 DPA and 40-45 DPA. There were more down-regulated DEGs than up-regulated ones at 10 DPA and 45-60 DPA, but more up-regulated DEGs than down-regulated ones at 15-40 DPA (Fig. S2A). The period from 10 to 30 DPA is critical for oil accumulation. We further compared DEGs at 10, 20 and 30 DPA three time points. In the comparison group of 20 DPA vs 10 DPA, 30 DPA vs 20 DPA and 30 DPA vs 10 DPA, 1268, 1607, and 2517 DEGs were upregulated, and 430, 1030 and 1302 DEGs were downregulated, respectively. Among these DEGs, 501, 667 and 1413 were unique to each comparison, respectively, and 113 genes showed differential expression in all three comparisons (Fig. S2B).

Expression profiles of the genes involved in biosynthesis of fatty acids during cottonseed development
The de novo biosynthesis of fatty acids takes place mainly in the plastid. We first investigated the expression dynamics of the major genes involved in fatty acid biosynthesis ( Fig. 1). The first step of fatty acid biosynthesis is the formation of malonyl coenzyme by acetyl-CoA carboxylase (ACC), which is a key rate-limiting enzyme in fatty acid biosynthesis [23] and has a great impact on the final level of seed fatty acid content. The cotton genome contains two pairs of homoeologous ACC genes (Gh_A01G1574 and Gh_D01G1885, Gh_A05G2294 and Gh_D05G2554).  [22] All four genes had a higher expression level in the early to middle developmental stages than in the late stages but the two pair homoeologs had a different profile, with Gh_ A01G1574 and Gh_D01G1885 being mainly expressed in 5-15 DPA seeds while Gh_A05G2294 and Gh_D05G2554 being mainly expressed in 20-25 DPA seeds. Malonyl-CoA is catalyzed by fatty acid synthase (FAS) to produce 16:0 carbon palmitoyl-ACP via the recurring reactions. The elongation of the 16-carbon palmitoyl-ACP to the 18carbon stearoyl-ACP is catalyzed by β-ketoacyl-ACP synthase II [24]. Cotton has four pairs of homoeologous genes encoding β-ketoacyl-ACP synthase II (KASII). They are Gh_A13G1675 and Gh_D13G2493, Gh_A08G2201 and Gh_D08G2566, Gh_A06G0711 and Gh_D06G0821, and Gh_A09G0720 and Gh_D09G2463. Gh_A13G1675 and Gh_D13G2493 were mainly expressed in 5-15 DPA seeds, Gh_A08G2201 and Gh_D08G2566 were mainly expressed in 20-25 DPA seeds, and Gh_A06G0711 and Gh_ D06G0821 had their highest expression levels observed at 30 DPA. These results suggest that Gh_A13G1675 and Gh_D13G2493 might be the major genes contributing to the conversion from C16:0-ACP to C18:0-ACP in cottonseeds. C18:0-ACP is desaturated by Δ9 -stearoyl-ACP desaturase (SAD) to form monounsaturated C18:1n-9-ACP. Twelve SAD genes (Gh_A02G0894, Gh_D02G1097, Gh_ A02G1788, Gh_D03G0588, Gh_A05G2865, Gh_D05 G3162, Gh_A10G1286, Gh_D10G1197, Gh_A05G2312, Gh_D05G2574, Gh_A12G0128 and Gh_D05G3167) were identified in the cotton genome. Most had a higher expression level in 5-25 DPA seeds than in 30-60 DPA seeds, with the exception of two pairs of homoeologs (Gh_ A02G1788 and Gh_D03G0588, Gh_A05G2312 and Gh_ D05G2574), which had high expression levels at 20-25 DPA (Fig. 1).
In plants, the acyl group from acyl-ACP can be hydrolyzed by two different classes of acyl-ACP thioesterases (FATs), FATA and FATB, to release free fatty acids. During cottonseed development, one pair of homoeologous FATA gene (Gh_A08G1740 and Gh_D08G2090) was highly expressed, particularly in 5-25 DPA seeds, and three of the four pairs of homoeologous FATB (Gh_A08G2137 and Gh_D08G2504, Gh_A06G0514 and Gh_D06G0571, Gh_A13G1750 and Gh_ D13G2098, and Gh_A07G0026 and Gh_D07G0034) were also mainly expressed in 5-25 DPA seeds. The specificity of FATs is the major determinant of the chain length and the level of saturated fatty acids (SFAs) found in most plant tissues [25]. It had been shown that FATB displayed high catalytic efficiencies towards 16:0-ACP, whereas FATA had a high activity with oleoyl-ACP and a low activity with 16:0-ACP (~75-fold lower than FATB) [25]. At the early developmental stage of cottonseed, the expression levels of KASII and FATA were high, resulted in a higher accumulation of C18:0 in 0-15 DPA seeds. However, C18:0-ACP is also the substrate for synthesis of C18:1n-9-ACP. Meanwhile, the expression level of FATBs was significantly higher than that of FATAs at the same developmental stage during cottonseed development, particularly in 15-25 DPA seeds, consequently, the accumulation of C16:0 was much greater than that of C18:0 in cottonseed, and C16:0 being the major SFA in mature cottonseeds.
The major polyunsaturated fatty acid (PUFA) in cottonseed is synthesized by fatty acid desaturase 2 (FAD2, C18:1n-9 to C18:2n-6 desaturation) and polyunsaturated linolenic acid (C18:3n-3) is desaturated from C18:2n-6 by fatty acid desaturase 3 (FAD3). These reactions take place in the endoplasmic reticulum (ER). Two pairs of homoeologous GhFAD2 (Gh_A13G1850 and Gh_ D13G2238, Gh_A11G2814 and Gh_D11G3169) were expressed in developing cottonseeds with different expression patterns. Gh_A13G1850 and Gh_D13G2238, named as GhFAD2-1 [26], were mainly expressed in 25-50 DPA seeds, while Gh_A11G2814 and Gh_ D11G3169 were mainly expressed in 5-20 DPA seeds. The relative content of C18:2n-6 increased significantly from 15 to 25 DPA and remained high afterwards, overlapping with the period of high expression of Gh_ A13G1850 and Gh_D13G2238, indicating that these two homoeologs are the major contributors of C18:2n-6 biosynthesis, consistent with previous results [26,27]. GhFAD3 are encoded by two pairs of homoeologous genes (Gh_A09G0848 and Gh_D09G0870, Gh_ A07G0946 and Gh_D07G1026). These two pairs also had a different expression pattern in developing cottonseed. Gh_A09G0848 and Gh_D09G0870 had a relatively high expression level in 5 to 15 DPA seeds, particular in 5 DPA. After 20 DPA, their expression level decreased significantly and was almost undetectable in mature cottonseed. In contrast, expression of Gh_A07G0946 and Gh_D07G1026 was observed from 5 to 50 DPA, with the highest expression level observed in 40 DPA seeds. C18: 3n-3 was apparently accumulated in the early stage of seed development and reached its peak (7.44%) at 5 DPA. The relative content of C18:3n-3 was significantly decreased from 1.16% at 15 DPA to 0.11% at 25 DPA and remained very low afterwards. These results indicate that the content of C18:3n-3 in cottonseed was determined by Gh_A09G0848 and Gh_D09G0870 rather than Gh_A07G0946 and Gh_D07G1026.

Confirmation of DEGs by qRT-PCR analysis
To validate the RNA-seq results and identified DEGs, 12 genes related to fatty acid synthesis, including 6 genes related to fatty acid desaturation, were selected for confirmation by qRT-PCR. The relative expression levels and expression tendency of the selected genes determined by qRT-PCR were highly consistent with that of RNA-seq (Fig. 2), indicating highly reliable RNA-seq results.

Gene ontology analysis of DEGs
To get insights into the biological function of the DEGs, we performed Gene Ontology (GO) analysis using the genes differentially expressed between 5 DPA and other time points. In total, 15,647 non-redundant DEGs were identified and they were enriched with 308 GO terms (Table S4). The relative content of each fatty acid component changed significantly from 5 to 30 DPA and tended to be relatively stable after 30 DPA. The 11,645 non-redundant DEGs identified in 10-30 DPA were enriched with 178 GO terms. Individually, 20, 21, 77, 112 and 111 GO terms were enriched at 10, 15, 20, 25 and 30 DPA, respectively, with 8, 1, 24, 26 and 29 GO terms unique to each time point (Fig. 3a). The number of enriched GO items was significantly fewer at 10 and 15 DPA than at other time points (Fig. 3a, b). Nine commonly enriched GO terms were found in DEGs from 10 to 30 DPA and were related to fatty acid biosynthesis and redox-related process (Fig. 3d). In addition to these GO terms enriched in all time points, 1, 4 and 37 commonly enriched GO terms were found between 15 and 20 DPA, between 20 and 25 DPA, and between 25 and 30 DPA, respectively (Fig. 3a), and 26 GO items were commonly enriched in 20, 25 and 30 DPA three time points (Fig. 3c, Table S5). In 35-60 DPA, 34 GO terms were significantly enriched. In addition, fatty acid synthase activity, fatty acid metabolism and fatty acid biosynthesis processes were the commonly enriched GO items in 35-60 DPA (Table S6).
Through GO analysis of DEGs among different groups, it was found that, in terms of biological processes, these DEGs were mainly related to fatty acids, lipids and other metabolic processes; regarding cell components, they were involved in cell parts and membranes; and their main molecular functions were binding, splitting and catalytic activity. These results suggest that fatty acid synthesis and metabolism-related pathways are mainly enriched in the early stages of cottonseed development, while genes related to cellular component and molecular functions were enriched in the middle development stages.

KEGG pathways of DEGs
Pathway analysis was further performed to determine the functions of DEGs. A total of 122 enriched KEGG pathways were found for the 15,647 non-redundant DEGs from 10 to 60 DPA (Table S7). Many significantly changed pathways were related to metabolism of fatty acid elongation, biosynthesis of unsaturated fatty acids, and α-linolenic acid metabolism, which are associated with the synthesis of fatty acid components (Table S8). Generally, C18:3n-3 was mainly synthesized at the early stage (around 5 DPA) of seed development. The enriched KEGG pathways at 10 DPA included those involved in metabolism of phenylalanine, alanine, aspartate and glutamate, and biosynthesis of phenylpropanoid, monoterpenoid, glucosinolate and cutin, suberine and wax. At 20 DPA, pathways related to biosynthesis of flavonoid, diterpenoid, and unsaturated fatty acids were enriched (Fig. 4a, Table S8). FAD2 (Gh_D13G2238 and Gh_A13G1850) and FAD3 (Gh_A09G0848 and Gh_ D09G0870) were components of these pathways (Fig.  S3B). The pathways enriched at 25 DPA were also related to biosynthesis of unsaturated fatty acids and elongation of fatty acids (Fig. 4b, Table S8). Among the 13 DEGs in the fatty acid extension pathway, 12 encodes 3-ketoyl-CoA synthetase (Fig. S3C). The enriched pathways at 30 DPA were similar to that at 25 DPA. We found a remarkable feature that the elongation of fatty acid pathway was enriched in 35, 40, 45, and 50 DPA cottonseed, while the fatty acid degradation pathway was enriched in 60 DPA (Table S9), which might explain why the relative content of the fatty acids tended to be stabilized at the late development stages.

Gene network analysis with WGCNA
WGCNA analysis was performed to obtain a comprehensive understanding the gene network regulating fatty acid synthesis during cottonseed development. The 15,647 non-redundant DEGs identified between 5 DPA and other time points were used in the analysis and a total of 10 gene modules associated with the specific expression profiles of different samples were identified (Fig. 5a). Among these 10 modules, 6 (yellow, brown, green, pink, magenta and blue) were significantly associated with middle and/or late stages of cottonseed development, and no gene module was significantly associated with the early developmental stage (5-15 DPA) although the turquoise module was marginally significant at 5 DPA (Fig. 5b). The yellow module with 861 genes was highly associated with 20 DPA. The brown (870 genes) module was significantly associated with 25 DPA. The green (400 genes) and pink (74 genes) modules were significantly associated with 30 DPA and 40 DPA, respectively. There were 33 genes in the magenta module that was significantly associated with 50 DPA, and 1037 genes in the blue module that was significantly associated with 55 DPA (Fig. 5b). KEGG analysis was performed on 3275 DEGs from these six modules. The modules of yellow, brown, pink and magenta were all enriched with genes involved in the α-linolenic acid metabolism. The relative content of C18:3n-3 was relatively high in 5 DPA seeds (7.44%) and reduced significantly to 0.55% in 20 DPA seeds (Table 1), consistent with significant down-regulation of GhFAD3 (Gh_A09G0848 and Gh_D09G0870) at 20 DPA (Fig. 1) and C18:3n-3 being a precursor substance for production of jasmonic acid and oxyacids as genes of the α-linolenic acid metabolism pathway involved in biosynthesis of jasmonic acid and oxyacids were up-regulated at 20 DPA (Table S10). In addition, at 25 DPA, the brown module was also related to fatty acid metabolism and biosynthetic pathways; both the pink module and the  (Table S10).

Co-expression gene networks and hub genes associated with fatty acid synthesis
According to the changes of the relative content of fatty acids during cottonseed development and the results of gene network analysis, the yellow and brown modules were chosen for co-expression analysis to identify the hub genes of the modules. Based on the criteria of eigengene-based connectivity (K ME ) value ≥0.89 and edge weight value ≥0.5, 29 genes were found to be coexpressed in the brown module. Gh_D03G0588 and Gh_ A02G1788 encoding SAD were identified to be the hub genes of the brown module (Fig. 6). At 25 DPA, Gh_ D03G0588 and Gh_A02G1788 were significantly upregulated (Fig. 6, Table S11). At this time point, the relative content of C18:0 decreased significantly, indicating that most C18:0-ACP might has been used in synthesis of C18:1n-9-ACP due to significantly increased expression level of SAD genes. However, the relative content of C18:1n-9 did not increase significantly, indicating rapid desaturation and conversion of C18:1n-9-ACP to C18: 1n-9 and finally C18:2n-6, the major unsaturated fatty acid in cottonseed (Table 1).

Discussion
Cotton is not only the most important fiber crop, but also an important oil crop. Our results showed that the content of SFAs in cottonseed oil is relatively high (over 20% of C16:0). It means that long-term consumption of cottonseed oil might increase the incidence of cardiovascular diseases. PUFAs that are easy to be oxidized and unstable under the conditions of light and high temperature are also relatively high (up to 60%), while the content of C18:1n-9, a fatty acid being stable and beneficial to human health, is only 15.6% in cottonseed oil, far lower than 75.7% in olive oil [28], 55.9-72.0% in rapeseed oil [29], 28.0% in sunflower oil [2], 41.5% in sesame oil [2], 42.9% in palm oil and 29.1% in soybean oil [28]. Therefore, it has great practical value to enhance effectively the content of C18:1n-9 in cottonseed to improve the nutritional quality of cottonseed oil.
Understanding plant lipid metabolism is the basis of improving oil quality and increasing oil yield. The biochemical pathways of storage oil production in plants have been well documented. However, the factors regulating fatty acid synthesis and controlling total oil content in oilseed crops remain to be elucidated [30,31]. According to our study, C16:0, C18:0 and C18:3n-3 were mainly synthesized at 5-15 DPA during the development of cottonseed. Synthesis of C18:2n-6 happened mainly in the middle and late stages of seed development. These fatty acid components showed an obvious correlation with each other during oil accumulation. In general, increase of C18:2n-6 was accompanied by decrease of other fatty acid components, particularly C18: 3n-3, which decreased from 7.44% at 5 DPA to 0.02% at 60 DPA. Compared with their highest content at 15 DPA, C16:0 and C18:0 decreased by 50% and nearly 6 times in mature cottonseed, respectively. C18:1n-9 and C18:2n-6 accounted for more than 70% of all fatty acid components in mature cottonseed and were the main fatty acid components determining the total oil content of cottonseed.
The range of C18:1n-9 content in cottonseed is rather narrow among cotton germplasm resources [32], and it is difficult to breed high C18:1n-9 cotton varieties by conventional breeding methods [33]. Understanding the expression characteristics and functions of the key genes in fatty acid synthesis is the prerequisite for improving cottonseed oil quality. Expression of genes encoding key enzymes for fatty acid synthesis have been shown to be tissue and developmental stage specific [34]. We found  S2A), consistent with the results reported in a previous transcriptome sequencing experiment [22]. The relative content of most fatty acid components started to decrease after 30 DPA and C18:2n-6, the most abundant component, stabilized after 25 DPA (Table 1). In agree with this observation, down-regulated DEGs in the 30 DPA vs 20 DPA comparison were enriched with GO terms related to fatty acid biosynthesis (Table S12). Similar results have also been reported in soybean [35]. Through the analysis of GO enrichment and KEGG pathways of DEGs, it was found that fatty acid biosynthesis, fatty acid metabolism, and cutin and wax biosynthesis are involved in the early development of cottonseed. It has been proposed that GhFAD2 expression increases to rapidly accumulate large amount of C18:2n-6, when the content of C16:0 and C18: 0 exceeds a certain threshold [3]. We found that the expression level of GhFAD2 increased significantly from 20 DPA (Fig. 1). Accordingly, the content of C18:2n-6 increased significantly from 20 DPA and reached its maximum at 40 DPA (Table 1). When the amount of substrate was lower than the threshold, the expression level of GhFAD2 was down-regulated and tended to be relatively stable. And then, the very long chain fatty acid components and their derivatives, such as wax constituents, were synthesized in the seeds. Through the analysis of gene expression profile during cottonseed development, GhFAD3 (Gh_A09G0848 and Gh_D09G0870) was highly expressed at 5 DPA and decreased significantly at 20 DPA. After a further reduction at 25 DPA, GhFAD3 maintained a very low expression level from 30 to 60 DPA. The expression profiles of GhFAD2 and GhFAD3 indicate these two enzymes play an important role in regulating the content of fatty acid components during cottonseed development.
Cotton has a complex allotetraploid genome and gene replication can lead to functional redundancy and diversification. It has been demonstrated that expression of3 0% of the cotton homoeologs are significantly A-or Dbiased during fiber development [36]. We found that DEGs during the rapid oil accumulation period (10-30 DPA) are biased towards the D subgenome (Table S13). This was consistent with the result that high expression levels of fatty acid synthesis genes were found in the Dt homeologs [37]. Based on the analyses of WGCNA and the co-expression gene networks, the SAD genes (Gh_ A02G1788 and Gh_D03G0588) encoding the soluble desaturase enzyme that converts saturated C18:0 into unsaturated C18:1n-9 [38] were identified as the candidate hub genes highly associated with unsaturated fatty acid content at 25 DPA. SAD is highly active in developing cottonseeds, resulting in only 2-3% of C18:0 in the seed oil at maturity, but the C18:0 content can be significantly increased by silencing expression of SAD genes [10]. It is worth noting that the highest expression level of SAD (Gh_A02G1788 and Gh_D03G0588) at 25 DPA was consistent with their preferential expression in developing ovules, especially at 25 DPA [39,40]. The continuous decrease of C18:0 content from 15 to 25 DPA was accompanied by the increase of C18:1n-9 content, which is consistent with the result that SAD plays a decisive role in determining the ratio of saturated fatty acids to unsaturated fatty acids [38]. Moreover, the activity of SAD genes and consequently the C18:1n-9 content play important roles in senescence regulation, fungal infection resistance, and mechanical damage reactions [39,[41][42][43][44].
Because of its low specific gravity and high energy content, fat is the most economical and effective storage form of nutrients. During cottonseed development, the relative content of saturated C16:0 is always above 20%, indicating the importance of C16:0 in seed development and physiological activity. C16:0 is also a precursor for subsequent biosynthesis of other fatty acids. PUFAs are not only essential structural components of cells, but also play important roles in seed germination, physiological activity and resistance to adverse environment [3]. The content of PUFAs increased gradually during the period of seed maturation and the C18:2n-6 content was over 50% in mature cottonseed. C18:3n-3 is also an important substrate for biosynthesis of many other lipids and Jasmonic acid (JA) through the octadecanoid pathway.
Although the content of C18:3n-3 was dramatically reduced from 20 DPA, it was thought that the threshold requirement for C18:3n-3 to synthesize JA is very low in cotton based on the observation of cotton anthers with only 1-2% of C18:3n-3 remaining fertile [25]. A high level of C18:2n-6 contributes to seed germination and early seedling growth under low temperature and salt conditions [45]. The highest content of PUFAs in cottonseed was C18:2n-6, not C18:3n-3. The ratio of C18:2n-6 to C18:3n-3 during cottonseed development was shown in Fig. S4. This accumulation pattern of fatty acid components in cottonseed is not accidental, but a result of nature selection closely related to the mechanisms of plants' selfdefense and protection of their offspring.

Conclusions
Investigating the correlation between the content of various fatty acid components and the expression pattern of key genes responsible for fatty acid biosynthesis is a necessary step to understand the regulation of fatty acid metabolism and is also an effective way to explore the physiological significance of the key genes and the strategy for improving cottonseed oil quality. In this study, we have gained certain insights into the relationship between the dynamic changes of the expression levels of fatty acid biosynthesis genes and the accumulation of the main fatty acids in cottonseed, although the biological function of most DEGs is yet to be further investigated in future.

Plant materials
Cotton (G. hirsutum L. cv Xinluzao 33) plants were grown in the field in Shihezi, China. Flowers were tagged on the day of blooming. Bolls were collected at 5, 10,15,20,25,30,35,40,45,50,55, and 60 day post anthesis (DPA), respectively. No permissions were needed to study and collect these samples. Seeds were separated from bolls and stored in refrigerator at − 80°C until further use. 5-60 DPA de-husked seeds were used in measurement of fatty acid content and transcriptome analysis.

Fatty acid analysis
The content of fatty acid components of each sample was analyzed by external standard method. Thirty-seven fatty acid methyl esters (Sigma-Aldrich) with known concentration were diluted by 2, 5, 10, 20, 40 times with hexane (Table S1), respectively, and used to establish the quantitative standard curve of C14:0, C16:0, C18:0, C18:1n-9, C18: 2n-6, and C18:3n-3 (Fig. S1). Seeds were ground into powder with liquid nitrogen, and then were freeze-dried using a vacuum freeze dryer. For each analysis, 1 g freeze-dried sample was transferred into glass tube, and oil was extracted using Soxtherm apparatus (Gerhadt). Then, 5 ml 0.4 M KOH-methanol and 5 ml hexane were added and mixed. The solution was transferred into vials and shaken for 30 min at 40°C. Subsequently, the upper hexane layer was used in GC-MS analysis. The qualitative and quantitative analyses were performed using GCMS-QP2020 (Shimadzu, Japan) at an electron ionization of 70 eV with an HP-88 column (100 m × 0.2 mm) and film thickness of 0.2 μm according to Liu et al. (2019) [25]. The details were as follows: after split/splitless injection at 250°C and incubation in oven for 2 min at 40°C, the temperature was increased to 240°C at a rate of 4°C/min and then held at 240°C for 15 min. Helium was used as the carrier gas with a constant flow rate of 2 mL/min. A volume of 1.0 μL was injected and the injection mode was split with a ratio of 10:1. The ion source and interface temperature was set to 200°C and 250°C, respectively. The mass scan range was m/z 40-500 and the solvent delay time was set to 13 min. The quantification was carried out according to the response value of quantitative ions and the established standard curve. Each test was repeated three times, and the content of each fatty acid component was calculated as the percentage of total measured fatty acids.

RNA isolation, library preparation and sequencing
Total RNA was extracted using Trizol reagent (Invitrogen) following the manufacturer's protocol. RNA purity was checked using the NanoPhotometer® spectrophotometer (IMPLEN). RNA concentration was measured using Qubit® RNA Assay Kit in Qubit® 2.0 Flurometer (Life Technologies). RNA integrity was assessed using the RNA Nano 6000 Assay Kit with the Bioanalyzer 2100 system (Agilent). A total amount of 3 μg RNA per sample was used as input material for library preparation using NEBNext® Ultra™ RNA Library Prep Kit for Illu-mina® (NEB). The libraries were sequenced on an Illumina Hiseq platform. Using Hisat2 v2.0.5, paired-end clean reads were aligned to the TM-1 reference genome [23]. FeatureCounts v1.5.0-p3 was used to count the read numbers mapped to each gene. FPKM of each gene was calculated based on the length of the gene and the number of reads mapped to the gene.

Differential expression, gene function annotation and enrichment analysis
Differential expression analysis was performed using the DESeq R package (1.18.0) through negative binomial distribution and calculation of false discovery rate [46]. The adjusted p-value < 0.05 and (|log 2 (fold change) | ≥2 were used as the criteria for identification of DEGs among samples. Gene Ontology (GO) enrichment analysis of DEGs was implemented by the GO seq R package, in which gene length bias was corrected. GO terms with a corrected p-value less than 0.05 were considered to be significantly enriched. KOBAS software was used to test the statistical enrichment of DEGs in KEGG pathways.

Quantitative RT-PCR validation of DEGs
Gene specific primers were designed using cDNA sequences of the target genes with the NCBI Primer-BLAST program (Table S2). qRT-PCR was carried out as described by Cheng et al. (2020) [47]. Three independent biological experiments were performed for samples of each time point. The relative expression levels were calculated with cotton ubiquitin gene (GhUBI, XM_012634824) as a reference according to Cheng et al. (2020) [47].

Constructing co-expression networks of DEGs
The Weighted Correlation Network Analysis (WGCNA) package was used to construct the co-expression networks among DEGs associated with fatty acid synthesis in seeds from different developmental stages. The hub genes were screened on the basis of the module K ME values and high-weight values. The correlation networks were drawn using Cytoscape 3.6.1.