Quantitative trait loci analysis of seed oil content and composition of wild and cultivated soybean

Background Soybean oil is a major source of edible oil, and the domestication of wild soybean has resulted in significant changes in oil content and composition. Extensive efforts have been made to identify genetic loci that are related to soybean oil traits. The objective of this study was to identify quantitative trait loci (QTLs) related to soybean seed oil and compare the fatty acid composition between wild and cultivated soybean. Results Using the specific-locus amplified fragment sequencing (SLAF-seq) method, a total of 181 recombinant inbred lines (RILs) derived from a cross between wild soybean ZYD00463 (Glycine soja) and cultivated soybean WDD01514 (Glycine max) were genotyped. Finally, a high-density genetic linkage map comprising 11,398 single-nucleotide polymorphism (SNP) markers on 20 linkage groups (LGs) was constructed. Twenty-four stable QTLs for seed oil content and composition were identified by model-based composite interval mapping (CIM) across multiple environments. Among these QTLs, 23 overlapped with or were adjacent to previously reported QTLs. One QTL, qPA10_1 (5.94–9.98 Mb) on Chr. Ten is a novel locus for palmitic acid. In the intervals of stable QTLs, some interesting genes involved in lipid metabolism were detected. Conclusions We developed 181 RILs from a cross between wild soybean ZYD00463 and cultivated soybean WDD01514 and constructed a high-density genetic map using the SLAF-seq method. We identified 24 stable QTLs for seed oil content and compositions, which includes qPA10_1 on Chr. 10, a novel locus for palmitic acid. Some interesting genes in the QTL regions were also detected. Our study will provide useful information for scientists to learn about genetic variations in lipid metabolism between wild and cultivated soybean.


Background
Soybean (Glycine max L. Merr.) is one of the most important protein and oil crops [1], the yields of which accounted for nearly 60% of the world's oilseed production in 2018 [2]. Soybean oil is mainly composed of five fatty acids, which include palmitic acid (C16:0), stearic acid (C18:0), oleic acid (C18:1), linoleic acid (C18:2), and linolenic acid (C18:3), which are present at approximate average concentrations of 10, 4, 18, 55, and 13%, respectively [3,4]. The quality of soybean oil depends on the composition of the fatty acids, which affects the nutritional value, flavor, and stability of the soybean oil. Unsaturated fatty acids play an important role in immune system regulation, blood clotting, neurotransmission, cholesterol metabolism, and the structure of membrane phospholipids in the brain and retina [5]. However, unsaturated fatty acids are susceptible to oxidation, resulting in an off-flavor and reducing oil shelf life [6,7]. Soybeans are thus currently bred to have higher amounts of monounsaturated fatty acids (oleic acid) and lower amounts of polyunsaturated fatty acids (linoleic and linolenic acid), which increases the oxidative stability and is also better for human health [4,8].
Soybean seed oil content and composition are controlled by multiple quantitative trait loci (QTLs)/genes and are also affected by environmental factors [9,10]. To date, the QTLs associated with seed oil and fatty acids in soybean have been extensively investigated [11][12][13][14][15][16][17][18][19]. Since the first research attempted to discover oil QTLs in soybean [10], more than 322 oil QTLs and 228 fatty acid QTLs have been identified across all 20 chromosomes (Chr.) in the SoyBase database [20]. Among these QTLs, some are stable and have been detected in different bi-parental populations and environments, including the QTL regions of 1.64-2.09 Mb and 33. 35-35.95 Mb on Chr. Twenty for seed oil content [14,[21][22][23][24][25] and the QTL region of 44.58-48.58 Mb on Chr. Fourteen for seed linolenic acid [11,13,24,26]. However, most of these identified QTLs have low selection accuracy and have not been effectively used in markerassisted selection (MAS) in soybean for seed oil due to insufficient linkage disequilibrium with desirable QTL alleles and the genetic complexity of the trait [27,28].
Cultivated soybean seeds have an oil content of approximately 18-22%, whereas wild soybean seeds contain about 8-10% oil [25]. In an attempt to identify genes controlling seed oil content and composition, QTL analyses for oil content between cultivated and wild soybean need to be conducted. In the present study, large-scale SNP markers were developed using SLAF-seq to construct the linkage group and map the QTLs controlling seed oil and composition using a population of 181 recombinant inbred lines (RILs) developed from an interspecific cross between wild soybean ZYD00463 (Glycine soja) and the cultivated soybean WDD01514 (Glycine max). Our results could inform scientists about the genetic variation in seed oil between wild and cultivated soybean during the domestication process and further improve oil quantity and quality by molecular breeding.

Phenotypic variation
The seed oil content and five predominant fatty acid compositions of the parents and progenies were determined from 2015 to 2016 in Wuhan, Hubei Province and Xuchang, Henan Province. Table 1 shows that the average seed oil content of WDD01514 is 23.76%, which is approximately two-fold higher than ZYD00463 (11.89%). In terms of seed oil composition, WDD01514 has higher oleic acid (20.57%) and lower linolenic acid (7.60%) content than ZYD00463, which are 11.39 and 16.60%, respectively. Large variations in oil content and composition were observed among the 181 RILs, ranging from 11.36 to 13.11% for oil content, 17.07 to 22.68% for oleic acid content, and 11.05 to 14.41% for linolenic acid content.
The normality test using the Shapiro-Wilk (w) statistic indicated that oil content, palmitic acid, stearic acid (except 2015/X), and linolenic acid were normally distributed with P-values > 0.05. However, oleic acid and linoleic acid were not normally distributed (P < 0.05) ( Table 1; Additional file 1). Oleic acid was skewed toward ZYD00463, while linoleic acid was skewed toward WDD01514 (Additional file 1). Moreover, transgressive segregation was observed in the progenies for palmitic, stearic, oleic, and linoleic acid (Table 1; Additional file 1), suggesting that comprehensive genetic recombination of alleles had occurred between parents.
The broad-sense heritability (h 2 ) of the oil and fatty acids ranged from 80.2 to 92.0% in the combined environment, which indicated that the genetic variations accounted for a major proportion of the observed phenotypic variations (Table 1). ANOVA showed that the F-value of the G × E interaction was significant for all traits (P < 0.001). However, the F-value was less than the genotype ( Table 2).
A positive correlation was observed between oil content and oleic acid in all four environments (P < 0.01), whereas a negative correlation was observed between oil content and palmitic acid, oil content and linolenic acid, palmitic acid and oleic acid, oleic acid and linoleic acid, and oleic acid and linolenic acid in all four environments (P < 0.01) (Additional file 2). This suggested that the important genetic factors controlling these traits are tightly linked.

Construction of a genetic map with SNP markers developed using SLAF-seq
The SLAF-seq method was applied to develop SNP markers between the two parents. Ultimately, a total of 11, 398 SNP markers distributed over 20 linkage groups (LGs) were used to construct the genetic linkage map (Additional files 3, Additional files 4). These SNP markers encompassed 2913.78 cM of the soybean genome, with a mean distance of 0.26 cM between markers. The genetic distances of 20 LGs spanned from 126.23 cM (Chr. 11) to 226.60 cM (Chr. 03), with mean marker intervals ranging from 0.15 cM to 0.60 cM. The largest LG (Chr. 01) contained 891 SNP markers, while the smallest one (Chr. 13) had 248 SNP markers (Additional file 5).

QTLs for oil content
A total of 22 QTLs for seed oil content were mapped in this study (Additional file 6). Among these, seven stable QTLs with an LOD > 3.6 were identified across multiple environments ( Fig. 1; Table 3) and were mapped to Chr. 02 (qOC2_ 1), 08 (qOC8_1 and qOC8_2), 15 (qOC15_1 and qOC15_2), and 20 (qOC20_1 and qOC20_2) (Additional file 7). The QTL qOC2_1 on Chr. 02 explained an average of 6.7% of the phenotypic variance for oil content; qOC8_1 and qOC8_ 2 on Chr. 08 explained an average of 8.9 and 6.6% of the phenotypic variance, respectively; qOC15_1 and qOC15_2 on Chr. 15 explained 12.9 and 11.0% of the phenotypic variance on average, respectively; and qOC20_1 and qOC20_2 on Chr. 20 explained 12.2 and 19.3% of the phenotypic variance on average, respectively. All of the QTLs showed negative additive effects, indicating the negative effect on oil content for the allele from the wild soybean parent ZYD00463. In comparison with the reported QTL regions, six QTLs, including qOC8_1, qOC8_2, qOC15_1, qOC15_2, qOC20_1, and qOC20_2, overlapped with previous QTLs ( Fig. 2; Additional file 8). The QTL qOC2_1, which was located within the interval of 5.08-6.27 Mb on Chr. 02, was adjacent to the mapped QTL within the interval of 6.86-9.67 Mb ( Fig. 2; Additional file 8).

Discussion
In the present study, we identified 24 stable QTLs for seed oil and composition. By comparing their mapped regions with previous reports on the soybean reference genome ( Fig. 2; Additional files 8 and 11), we discovered that qPA10_1 did not overlap with or was not adjacent to any of the previously reported QTLs. Furthermore, it did not contain QTNs associated with palmitic acid obtained by GWAS. Due to a higher density genetic map constructed with 11,398 SNP markers, the intervals of the QTL regions were significantly reduced in comparison with those previously reported. For example, for oil content, the physical distance of qOC8_1, qOC8_2, qOC15_2, and qOC20_2 in our study was 7.52-9. 44   In comparison with previously reported QTLs (http:// www.soybase.org) (Additional files 8 and 11), 23 out of the 24 QTLs were close to or overlapped with previously reported QTLs (Fig. 2). For example, the QTL qOC2_1 for oil content on Chr. 02 was located around 5.08-6.27 Mb, which was adjacent to a reported QTL (6.86-9.67 Mb) [50]. In addition, qOC8_1 on Chr. 08, qOC8_2 on Chr. 08, qOC15_1 on Chr. 15 [18]. Although these QTLs are located in similar regions, whether their responsible genes are identical requires further investigation. In contrast, qPA10_1 on Chr. 10 was mapped to the region of 5.94-9.98 Mb, which was distal from the reported 0.98-1.87 Mb [24] (Fig. 2; Additional file 8) and did not contain any reported QTNs that were associated with palmitic acid by GWAS [58] (Additional file 11). This result indicated that this QTL is a novel locus for palmitic acid.
By annotating all genes in 24 stable QTL intervals with the Gene Annotation Tool of the SoyBase database [20], 12 important enzyme-encoding genes involved in lipid metabolism were identified (Additional file 9). For example, GmACCase (Glyma.05 g221100) encoded an acetyl-CoA carboxylase that catalyzes the formation of malonyl-CoA from acetyl-CoA as the direct substrate of the de novo biosynthesis of fatty acids [59]. GmFabD (Glyma.11 g164500) encoded a malonyl-CoA:ACP malonyltransferase that is responsible for transferring the malonyl group of malonyl-CoA to an acyl carrier protein (ACP). GmACP (Glyma.05 g201300 and Glyma.16 g011300) encoded an acyl carrier protein that transports the growing fatty acid chain between the enzymatic domains of fatty acid synthase [60]. GmFabB (Glyma.05 g218600 and Glyma.13 g128000) and GmFabF (Glyma.13 g112700) encoded a ketoacyl-ACP synthases I and II, respectively, which are mainly used to produce palmitoyl-ACP and stearoyl-ACP as the condensing enzyme of fatty acid chain elongation, respectively. GmFabG (Glyma.08 g102100) and GmFabZ (Glyma.15 g052500) encoded a 3ketoacyl-ACP reductase and 3-hydroxyacyl-ACP dehydrase, which catalyze the reduction and dehydration reaction of 3-ketoacyl-ACP, respectively [61]. In addition, GmFAD3 (Glyma.11 g174100) encoded an omega-3 fatty acid desaturase 3 that catalyzes a third double bond into linoleic acid to produce linolenic acid [40,62]. GmDGAT (Glyma.13 g106100 and Glyma.13 g118300) encoded a diacylglycerol acyltransferase that catalyzes the formation of TAGs from fatty acids and glycerol 3-phosphate [63,64]. The presence of these genes within the stable QTL suggests that these may contribute to soybean seed lipid metabolism. However, whether these enzyme-encoded genes are responsible for the corresponding QTLs requires confirmation using transgenic methods. In addition to the key enzyme genes involved in lipid metabolism, some transcription factor genes also have important roles in regulating fatty acid biosynthesis. We identified 11 transcription factor genes involved in lipid metabolism within the 24 stable QTL intervals (Additional file 10), including GmB3, GmC3H, GmDBB, GmHLH, GmMYB, GmNF-YA, GmWRKY, and GmZIP. Several transcription factors involved in regulating oil and fatty acid biosynthesis have been studied in soybean. For example, the overexpression of GmNFYA in Arabidopsis significantly increases seed oil content [46]. GmMYB73 promotes lipid accumulation in transgenic Arabidopsis, possibly through the suppression of GLABRA2, a transcription factor of HD-ZIP [49]. The roles of transcription factors in our QTL regions in the domestication of the seed oil trait in soybean should be further studied.
In the interval of the novel QTL qPA10_1, three candidate genes potentially involved in lipid metabolism were also identified, including GmPK (Glyma.10 g065000), GmB3 (Glyma.10 g076100), and GmZIP (Glyma.10 g071700). GmPK (Glyma.10 g065000) encodes a pyruvate kinase (PK). During soybean seed development, phosphoenolpyruvate carboxylase and pyruvate kinase activities contribute to a complex interaction that regulates the metabolic flow of glycolytic carbon into precursors for both protein and oil biosynthesis [65]. In Arabidopsis thaliana seeds, reducing the plastidic pyruvate kinase activity by disruption of the gene encoding the β1 subunit of pyruvate kinase resulted in a 60% reduction of seed oil content [66]. GmB3 (Glyma.10 g076100) encodes a B3 domain family of transcription factor. In Brassica napus, the BnFUSCA3 (BnFUS3) mutant, a member of B3 domain transcription factors, repressed seed oil levels and increased levels of linoleic acid, possibly due to the reduced expression of ω-3 FADESATURASE (FAD3) [67]. In soybean, GmLEC2a, a B3 domain transcription factor, complemented the defects of the Arabidopsis atlec2 mutant in seedling development and triacylglycerol accumulation. The overexpression of GmLEC2a in Arabidopsis seeds increased triacylglycerol content by 34% and the composition of long chain fatty acids by 4% relative to the control seeds [68]. GmZIP (Glyma.10 g071700) encodes a homeobox-leucine zipper protein. Song et al. (2013) reported that GmbZIP123 (Glyma.06 g010200) enhances lipid content in transgenic Arabidopsis seeds by promoting expression of sucrose transporter genes (SUC1 and SUC5) and cell wall invertase genes (cwINV1, cwINV3, and cwINV6) [48]. In A. thaliana, bZIP67 binds to G-boxes in the FATTY ACID DESA-TURASE3 (FAD3) promoter, enhances FAD3 expression, and increases linolenic acid seed content [69].

Conclusions
By means of SLAF-seq technology, we constructed a high-density genetic map comprising 11,398 SNP markers and identified 24 stable QTLs for soybean seed oil content and fatty acid composition using a 181 RIL population derived from a wild soybean and a cultivated soybean. Among these QTLS, one QTL qPA10_1 did not overlap with or was not close to previously reported QTLs and also did not contain any QTNs associated with palmitic acid, indicating that it is a novel locus. Some interesting genes in the QTL regions were also identified and are worthy of further investigation. Our study provides a valuable information that may contribute to the elucidation of oil biosynthesis in soybean.

Plant materials
The RIL mapping population comprised 181 F 7 progenies derived from a cross between the wild soybean ZYD00463 (Glycine soja) and the cultivated soybean WDD01514 (Glycine max). The parents of ZYD00463 (Glycine soja) and WDD01514 (Glycine max) were provided by the Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences (Wuhan, China). All 181 RILs and their parents were planted at two experimental stations in Hubei Province, Wuhan (N30°35′, E114°33′) and Henan Province, Xuchang (N34°02′, E113°81′) in 2015 and 2016. The two sites possess different climatic conditions, with Wuhan experiencing higher temperatures and more rain and Xuchang experiencing lower temperatures and less rain. Three replicates of the parents and progenies were planted following a randomized complete block design. Each plot comprised a 2.5-m row with 1.0-m spacing between rows and 0.5m spacing between adjacent plants. For each genotype, seeds were harvested from five plants from each plot at the R8 growth stage (full-maturity stage) [70]. The soybean seeds were air-dried until a constant weight, then the traits were assessed as described below.

Oil and fatty acid determination
The seed oil content and fatty acid composition were determined according to Wei et al. [71] with minor modifications. Briefly, 30 soybean seeds from each line were ground to a fine powder. Twenty milligrams of each powdered sample were transferred to a 10-mL glass tube. Sulfuric acid-methanol (5%, 2 mL), butylated hydroxytoluene (BHT) (0.2%, 25 μL), methylbenzene (300 μL), and internal standard (IS) (methyl heptadecanoate, Sigma Aldrich, St. Louis, USA, 2.5-5 mg/mL, 100 μL) were added to the samples for fatty acid methyl ester (FAME) preparation, and the sample was esterified in a water bath at 90-95°C for 1.5 h. After cooling to room temperature, sodium chloride (0.9%, 1 mL) and n-hexane (1 mL) were added to the extracts. The supernatant was obtained for gas chromatography (GC) analysis.
The esters were separated by a GC (Agilent 6890 N, USA) fitted with a capillary column (FFAP, 30 m, 0.25 mm i.d., 0.50-μm film thickness). Nitrogen was used as the carrier gas at an inlet pressure of 25 psi. The temperatures of the injection port and detector (FID) were maintained at 250°C and 260°C, respectively, and the temperature program for the column was as follows: 210°C (1 min), increasing to 230°C at 10°C/min (22 min). The computer software used for statistical analysis was STATISTICA version 6.0 (Statsoft Inc., Oklahoma, USA). The peaks were identified based on their retention times using authentic standards of fatty acid methyl esters. The relative peak area was used for quantification of the contents of the fatty acids. The formula for calculating seed oil content was as follows: where A t and A s are the total peak area and the internal standard peak area identified based on their retention times, respectively; and m s and m i are the weights of the internal standard and dry seed. The samples of each line were determined thrice.

Statistical analysis
All the phenotypic data were analyzed using PROC MIXED program in SAS 9.3 (SAS Institute Inc., Cary, NC, USA). Pearson's correlation coefficients among all traits were calculated from lines in single environments using the PROC CORR function in SAS 9.3. The heritability in a single environment was estimated as follows: The heritability across environments was calculated as follows: where δ 2 g is the genotypic variance component for traits per plot among the RILs, δ 2 e is the error variance, r is the number of replications for the trait, and n is the number of environments [72,73]. ANOVA was conducted across environments to determine the significance of genotype, environment, and their interactions. The error components of variance ( δ 2 e ), genotype × environment interaction ( δ 2 gy ), and genotype ( δ 2 g ) were analyzed using the general linear model procedure (PROC GLM) in SAS 9.3 (SAS Institute Inc.). All parameters were estimated from the expected mean squares in the ANOVA.

Genetic map construction
The SLAF-seq method was applied to develop SNP markers between the two parents [74]. The SNP markers were used to construct the high-density genetic linkage map using the Kosambi mapping function of the JoinMap version 4.0 software [75]. The SNP markers were grouped based on a LOD score of 3.0 and then ordered by the input algorithm to estimate the recombination frequencies. Recombination frequencies between linked loci were transformed into distances (cM) [76]. The collinearity of the LGs with the soybean reference genome was analyzed by aligning the sequence of each SNP marker with the genome sequences of Williams 82 [77].

QTL mapping and candidate gene prediction
Composite interval mapping (CIM) incorporated into WinQTL cartographer version 2.5 was used to detect additive QTLs [78]. For each trait, the threshold for the identification of a significant QTL with a LOD > 3.6 was estimated by permutation tests with 1000 repetitions at P < 0.05. Cofactors were taken into account, and a window size of 10 cM around the test interval was selected for CIM analysis. The distribution of the QTLs on the genetic linkage map was mapped using MapChart version 2.2 [79]. The detected QTLs were denoted by combining a letter or letters representing the abbreviation of traits with a chromosome number [80]. The QTLs that were repeatedly detected in at least two environments, Wuhan and Xuchang across 2 years, were defined as stable QTLs in this study. The QTLs that were previously reported by other groups were defined as reported QTLs. The predicted genes within the stable QTL intervals were obtained from the SoyBase database according to the annotation of the soybean reference genome (Wm82.a2.v1.1). Gene Ontology (GO) enrichment analysis of the predicted genes was performed using the GO website with default settings [81].