QTL mapping reveals key factors related to the isoflavone contents and agronomic traits of soybean (Glycine max)

Background Soybean is a valuable source of edible protein and oil, as well as secondary metabolites that can be used in food products, cosmetics, and medicines. However, because soybean isoflavone content is a quantitative trait influenced by polygenes and environmental interactions, its genetic basis remains unclear. Results This study was conducted to identify causal quantitative trait loci (QTLs) associated with soybean isoflavone contents. A mutant-based F2 population (190 individuals) was created by crossing the Korean cultivar Hwanggeum with low isoflavone contents (1,558 µg g−1) and the soybean mutant DB-088 with high isoflavone contents (6,393 µg g−1). A linkage map (3,049 cM) with an average chromosome length of 152 cM was constructed using the 180K AXIOM® SoyaSNP array. Thirteen QTLs related to agronomic traits were mapped to chromosomes 2, 3, 11, 13, 19, and 20, whereas 29 QTLs associated with isoflavone contents were mapped to chromosomes 1, 3, 8, 11, 14, 15, and 17. Notably, the qMGLI11, qMGNI11, qADZI11, and qTI11, which located Gm11_9877690 to Gm11_9955924 interval on chromosome 11, contributed to the high isoflavone contents and explained 11.9% to 20.1% of the phenotypic variation. This QTL region included four candidate genes, encoding β-glucosidases 13, 14, 17–1, and 17–2. We observed significant differences in the expression levels of these genes at various seed developmental stages. Candidate genes within the causal QTLs were functionally characterized based on enriched GO terms and KEGG pathways, as well as the results of a co-expression network analysis. A correlation analysis indicated that certain agronomic traits (e.g., days to flowering, days to maturity, and plant height) are positively correlated with isoflavone content. Conclusions Herein, we reported that the major QTL associated with isoflavone contents was located in the interval from Gm11_9877690 to Gm11_9955924 (78 kb) on chromosome 11. Four β-glucosidase genes were identified that may be involved in high isoflavone contents of soybean DB-088. Thus, the mutant alleles from soybean DB-088 may be useful for marker-assisted selection in developing soybean lines with high isoflavone contents and superior agronomic traits. Supplementary Information The online version contains supplementary material available at 10.1186/s12870-023-04519-x.

Despite their importance, determining the genetic basis of isoflavone biosynthesis and accumulation has been difficult, with previous studies having limited success in accurately identifying the associated quantitative trait loci (QTLs).The isoflavone content is considered to be a quantitative trait primarily influenced by minor QTLs.Earlier research revealed the isoflavone content may vary by up to 3-fold depending on the effects of the genotype, environment, and genotype × environment interaction [14][15][16].Linkage mapping is a typical method used to identify genomic loci related to complex qualitative/quantitative traits, with the recombination frequency of bi-parental populations (e.g., F 2 , DH, RIL, and NIL populations) used to identify causal alleles [17,18].Although the F 2 generation undergoes only a single meiotic cycle, which makes linkage mapping challenging, it is useful for quickly and easily generating populations [19].Additionally, QTL mapping using an F 2 population is an efficient way to analyze qualitative traits, such as disease resistance, as well as quantitative traits, including metabolite contents (e.g., protein, oil, and sucrose).However, secondary metabolite contents are generally less heritable than the contents of other major compounds, but the high heritability of specific isoflavone contents has been reported [20][21][22][23][24]. Accordingly, QTL analyses may be applicable for these isoflavone forms in soybean seeds.
During the past decade, molecular markers have been used to construct linkage maps for QTL analyses.Many researchers have tried to reveal the genetic factors underlying complex traits using various molecular markers, including restriction fragment length polymorphisms, simple sequence repeats (SSRs), and single nucleotide polymorphisms (SNPs) [25][26][27].The first linkage mapping study for isoflavones, which was conducted by Njiti et al. [28], involved the RIL mapping population of 'Essex' × 'Forrest' and 201 SSR markers.Meksem et al. [29] constructed a linkage map (2,823 cM) with an average marker spacing of 26 cM using 133 SSR markers and the RIL population used by Njiti et al. [28].They detected major QTLs on linkage groups A1, B1, and N. Primomo et al. [30] detected 17 isoflavone-related QTLs on 11 linkage groups using 207 RILs from ' AC756' × 'RCAT Angora' and 99 SSR markers.Zeng et al. [31] constructed a linkage map (2,020 cM) using 130 RILs and 125 SSR markers from multiple environments and detected novel QTLs (QDZF_1, QGTF_1, and QTIF_1) on linkage group F in Satt144.Gutierrez-Gonzalez et al. [32] created a linkage map (2,151 cM) comprising five QTLs using 446 SNP markers and five SSR markers and 188 RILs derived from 'Magellan' × 'PI 437654' .
In the past, low-density linkage maps were constructed because of insufficient polymorphic markers and technological limitations, resulting in large gaps between markers.However, recent advances in next-generation sequencing technologies, including genotyping-bysequencing, restriction-site-associated DNA sequencing (RAD-seq), and specific locus amplified fragment sequencing (SLAF-seq), have facilitated the identification of many SNPs relevant for the high-density genotyping of crops [33][34][35][36].On the basis of SLAF-seq and RAD-seq analyses, Li et al. [37], Pei et al. [38], and Cai et al. [39] detected novel QTLs (qIF20-2, aG8, qMD19, qMG18, qTIF19, and qIF5-1) that explained 4.2%-59.95% of the phenotypic variation (PV) and included isoflavone-related genes and transcription factor genes.The complete soybean genome sequence was successfully assembled in 2010 [40], which accelerated the development of high-throughput SNP chip platforms, including SoySNP6K Illumina Infinium BeadChip [41], SoySNP50K Illumina Infinium BeadChip [42], 180K AXIOM ® SoyaSNP array [43], NAJU 355K SoySNP array [44], and SoySNP618K array [45].Kim et al. [46] identified a novel GmSACPD-C allele using the RIL population derived from a high stearic acid soybean mutant, while also completing the genotyping for QTL mapping using the SoySNP6K platform and a KASP assay.Vuong et al. [47] genotyped 553 soybean plants using SoySNP50K and revealed 14 QTLs containing 60 SNPs associated with the resistance to the soybean cyst nematode through a GWAS. Lee et al. [48] genotyped 430 soybean lines from a core collection using the 180K AXIOM ® SoyaSNP array and determined that 14 novel SNPs responsible for soya-saponin biosynthesis are closely related to Glyma.07g524600 through a GWAS.Hu et al. [49] identified qPDH1 as a pod dehiscencerelated QTL that includes Glyma09g06290 (bHLH transcription factor) on chromosome 16 using 211 soybean accessions and the NAJU 355K SoySNP array.Li et al. [48] performed a GWAS and combined the results with QTL mapping data and identified candidate genes (E2, GmPRR3b, and GmVIP5) associated with the flowering time and the circadian clock in 265 soybean landraces using SoySNP618K.
Mutation breeding is an effective strategy for developing new cultivars with ideal agronomic traits and functional compounds.Gamma-rays are one of the most commonly used physical mutagens (60%) for mutation breeding [50].More than 3,402 mutant varieties of 210 species are registered in the International Atomic Energy Agency/Mutant Variety Database (http:// www.iaea.org/).We previously created a mutant diversity pool (MDP) comprising 208 mutant soybean lines from seven wild-type cultivars that were irradiated with 60 Co gamma rays (250 Gy).We subsequently performed target region amplification polymorphism (TRAP) and transposable element (TE)-TRAP marker analyses to clarify genetic associations as well as gene expression and GWAS analyses of agronomic traits and phytochemical contents using our MDP and selected a soybean mutant with high isoflavone contents (average of 7,351 μg g −1 over a 3-year period) [51][52][53].Because there has been relatively little breeding for isoflavone contents, a QTL analysis was conducted using only a few varieties (e.g., Angora, Luheidou2, Wayao, and Zhongdou 27) with isoflavone contents of 2,246-3,791 μg g −1 as the parental plants [30,37,39,54].
The objective of this study was to address the abovementioned problems by (1) narrowing the large confidence interval of QTLs underlying isoflavone contents; (2) eliminating the genetic bottleneck by using a mutantbased mapping population; and (3) increasing the low isoflavone contents of the Korean soybean cultivar Hwanggeum, which has superior agronomic traits.An F 2 mapping population consisting of 190 plants was constructed by crossing the paternal parent DB-088 with the maternal parent Hwanggeum.To reveal the genetic basis of the ideal agronomic traits and isoflavone contents, the mutant-based F 2 population was phenotyped and genotyped using the 180K AXIOM ® SoyaSNP array for the fine-mapping and QTL analyses.

Mapping population
The soybean mutant DB-088, which has high isoflavone contents, was selected from the MDP consisting of 208 soybean mutants at the Radiation Breeding Farm of the Korea Atomic Energy Research Institute (KAERI; 35.5699°N 126.9772°E,Jeongeup, Jeollabuk, Republic of Korea) [52].Briefly, 1,000 seeds were collected from seven soybean cultivars and irradiated with 60 Co gamma rays (250 Gy) at KAERI in 2008.We subsequently propagated 1,695 mutants from the M 1 generation to the M 5 generation via single seed descent.In the M 5 generation, we selected mutants with distinct morphological characteristics and agronomic traits (e.g., high yield, growth type, and climate adaptability).We then constructed a mutant population consisting of 208 genotypes without redundant phenotypes and propagated it until the M 10 generation; the population was designated as the MDP.The isoflavone contents were measured in the M 10 generation as well as in the M 11 and M 12 generations to determine the year-to-year variation over a 2-year period.Finally, DB-088 was selected following the gamma irradiation of Danbaek [55].
To construct the mutant-based mapping population, F 1 plants were generated by crossing DB-088 with Hwanggeum in the soybean breeding field of Chonnam National University (36°17′N, 126°39′E, Gwangju, Republic of Korea).A total of 190 F 2 plants were grown in plastic pots (15 cm diameter) filled with commercial soil (Hungnong, Pyeongtaek, Republic of Korea) at the Radiation Breeding Research Farm of KAERI for the production and harvest of mature seeds.

Descriptive statistics and visualization of phenotypic data
Descriptive statistics including minimum, maximum, mean, standard deviation (SD), skewness, kurtosis, and coefficient of variation were calculated using Microsoft EXCEL 2016.The Pearson's correlation coefficient (PCC) and analysis of variance (ANOVA) were estimated in R as follows: corr_coef (data, verbose = TRUE) function and aov (formula, data = NULL, projections = FASLE, qr = TRUE, contrasts = NULL) function in METAN package, respectively.The broad-sense heritability (H 2 ) was calculated through the variability package as follows: , where V g is the genotypic variance and V p is the phenotypic variance.A histogram for the phenotypes was generated using the hist function in R.

DNA extraction and SNP genotyping
Fully expanded trifoliate leaves from 190 F 2 plants and two parental plants were placed in 2 ml tubes and frozen immediately using liquid nitrogen.The leaves were ground to a fine powder using TissueLyser II (Qiagen, Valencia, CA, USA).Genomic DNA was extracted from 100 mg ground material using the DNeasy Plant Mini Kit (Qiagen).The genomic DNA quality and concentration were determined using the NanoDrop ND-1000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA).In addition, 190 F 2 plants and two parental plants (Hwanggeum and DB-088) were genotyped using the 180K AXIOM ® SoyaSNP array (180,961 SNP markers) according to the manufacturer instructions at DNALINK Inc. (Seoul, Republic of Korea).

Construction of a genetic linkage map and QTL mapping
A total of 26,760 SNP markers were filtered on the basis of a lack of heterozygous alleles and non-polymorphism between the parental plants.To construct the linkage map, polymorphic SNP markers were modified to the following input formats according to the JoinMap v4.1 (Kyazma, Wageningen, Netherlands) instructions: a, maternal allele; h, hetero allele; and b, paternal allele.The JoinMap v4.1 software was used to filter and identify the significant SNPs and genetic linkage according to physical positions.All SNPs that were selected had a segregation ratio that fit the expected 1:2:1 ratio as determined by the Chi-squared test.Next, the SNPs were filtered by screening for a lack of identical loci and genetic distortion according to locus similarity and the Chi-squared test, respectively, for the linkage mapping analysis.The genetic position was estimated on the basis of the recombination frequency using Kosambi's mapping function.Using MapQTL v6.0 (Kyazma), the map file, genotype file, and phenotype file based on the maximum logarithm of the odds (LOD) scores were determined to identify the significant QTLs using 1,000 permutations with a significance level of 0.05.On the basis of the permutations, the LOD score was set as a fixed value to detect the presence of QTLs across all chromosomes.The initial interval mapping (IM) and multiple QTL mapping (MQM) methods were used for QTL mapping.Using a map file, including genotypes and genetic positions, and the genetic linkage map with LOD peaks, the SNP positions in QTLs were reorganized with MapChart v2.2.Markers were named using the following format: q (i.e., QTL) + chromosome number + trait name + order.Information regarding the QTLs was obtained from SoyBase (https:// www.soyba se.org/) and Phytozome (https:// phyto zome-next.jgi.doe.gov/).

Gene Ontology and Kyoto Encyclopedia of Genes and Genomes enrichment and co-expression network analyses
The candidate genes within the causal QTLs were functionally annotated according to the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases and a gene co-expression network analysis was performed using the ClueGO [57] and STRING [58] Cytoscape plug-ins.To detect the enriched GO terms and KEGG pathways, the strengths of the relationships between terms and pathways were determined according to chance-corrected kappa statistics and visualized using ClueGO v.2.5.9 and CluePedia v.1.5.9.The term-term associations represent clusters of enriched functions and functional diversity based on a predefined kappa score.The load marker and network specificity were adjusted for Glycine max (Wm82.a2.v1) and the Global option, respectively.Gene-gene interaction networks were constructed using StringApp v.2.0.1.A confidence cutoff value of 0.65 was applied to simplify networks and eliminate duplications.Information regarding the identifiers and gene descriptions were retrieved from the String database.Data were analyzed and visualized using Cytoscape 3.9.1.

RNA extraction and quantitative real-time PCR
Immature and mature Hwanggeum and DB-088 seeds were sampled at different developmental stages (R5, R6, R6.5, R7, and R8), with three biological replicates per sample.The seeds were immediately frozen and ground to a fine powder using liquid nitrogen and a mortar and pestle.Total RNA was extracted from the ground material using the TRIzol reagent (Invitrogen, Carlsbad, CA, USA).The RNA quantity and quality were determined using a NanoDrop ND-1000 spectrophotometer (Thermo Fisher Scientific).First-strand cDNA was synthesized from the high-quality RNA using the SuperScript III First-Strand Synthesis SuperMix (Invitrogen).Gene expression levels were analyzed by quantitative real-time PCR, which was completed using the CFX96 Real-time PCR system (Bio-Rad, Hercules, CA, USA) and the iTaq Universal SYBR Green Supermix (Bio-Rad).All steps were performed as described by the manufacturer.Relative expression levels were calculated using the 2 −ΔΔCt method [59], with Fbox serving as the reference gene.The primer sets (Table S1) were designed using Primer-3plus (https:// www.prime r3plus.com/).
In contrast to Danbaek, DB-088 is characterized by indeterminate growth, resulting in delayed flowering and maturation, which makes cultivation and breeding challenging.Therefore, we attempted to increase the isoflavone contents of Hwanggeum, which has large seeds, matures relatively early, and exhibits determinate growth.Compared with DB-088 during the growth, maturity, and harvesting stages, Hwanggeum had superior agronomic traits, including FT, DM, GH, and HSW (Table S3).Seven quantitative traits (DF, DM, PH, LW, LL, NN, and HSW) differed significantly between the parental plants (Figure S1).In the F 2 population, 13 agronomic traits were evaluated (Figure S2), of which only HC fit the 3:1 segregation ratio (χ 2 = 0.3439 and P = 0.5576).
The analysis of the correlations between the agronomic traits and 12 isoflavone forms indicated DF and DM were positively correlated with the isoflavone contents, with the exception of the aglycone and AGNI contents.Furthermore, compared with the other agronomic traits, DF (r = 0.34, P < 0.001), DM (r = 0.50, P < 0.001), and LW (r = 0.34, P < 0.001) had stronger positive correlations with the TI content.The average TI, DF, DM, and LW values for the bottom/top 20% of the F 2 population were 1,655.72/5,279.66μg g −1 , 46.4/51.95days, 127.25/151.05days, and 7.29/8.65cm, respectively.The differences were significant (P < 0.001).

Genetic linkage map for the F 2 population using a high-density SNP chip array
To construct the linkage map, the raw data (180,961 SNPs) were filtered using the following criteria: no missing data and monomorphic regions in the two parental plants and the F 2 population (Table 2 and Figure S3).Of the 26,760 SNPs (14.79%) that were retained, 5,382 (20.11%) with a segregation ratio of 1:2:1 were mapped to 20 chromosomes.The number of SNPs mapped to a chromosome ranged from 114 (chromosomes 11 and 12) to 479 (chromosome 2), with an average of 269 per chromosome.The relatively even distribution of the SNP markers among the 20 chromosomes was appropriate for constructing a linkage map.The chromosome length ranged from 87.81 cM (chromosome 4, 142 SNP markers) to 241 cM (chromosome 8, 377 SNP markers), with an average length of 152.46 cM.The total length of the linkage map was 3,049.14cM.The average interval between adjacent markers ranged from 0.38 cM (chromosome 2) to 1.58 cM (chromosome 11), whereas the largest gap between SNP markers varied from 5.80 cM (chromosome 8) to 31.26 cM (chromosome 11).On average, 98.14% of the SNP markers on each chromosome were separated by < 5 cM [range: 92.11% (chromosome 11) to 99.74% (chromosome 3)], implying the mapped SNP markers were generally clustered together on the chromosomes.

Expression patterns of the candidate genes affecting the isoflavone contents and agronomic traits
The Hwanggeum and DB-088 seed TI contents were measured at five seed developmental stages (Figure S5).The increase in the TI content varied among the examined stages.In addition, the TI content was significantly higher (P < 0.001) in the DB-088 seeds than in the Hwanggeum seeds at all developmental stages.Moreover, the Hwanggeum and DB-088 seed TI contents increased rapidly in the R6.5, but the TI contents in R6.5 and R7 were 3.12-fold and 3.7-fold higher in the DB-088 seeds than in the Hwanggeum seeds, respectively.
The expression levels of four candidate genes within the major interval Gm11_9877690-Gm11_9955924 were analyzed in different seed developmental stages (Fig. 4).Among the four BGLU (β-glucosidase) genes, the BGLU13 (Glyma.11g129800)and BGLU17-1 (Glyma.11g129900)expression levels in Hwanggeum increased by more than 7-fold in R7, whereas the BGLU14 (Glyma.11g130000)and BGLU17 (Glyma.11g130100)expression levels in DB-088 tended to be highest during the early stages (R5, R6, and R6.5).The candidate gene expression levels differed significantly between the parental plants.The interactions between genes with similar expression patterns, which were revealed by the co-expression analysis, suggest the genes may contribute to the regulation of isoflavone contents and agronomic traits.

Co-expression networks for the candidate genes responsible for isoflavone contents and agronomic traits
The 267 candidate genes within the QTLs related to isoflavone contents were annotated with 102 GO terms, resulting in 298 connected gene nodes and 596 edges.These genes were categorized into 58 biological processes, 19 cellular components, and 28 molecular functions (Fig. 5).A total of 128 functionally characterized genes were annotated with 15 GO terms, including catalytic activity, nucleic acid binding, and cellular response to chemical stimulus.Many genes were annotated with multiple GO terms.The 174 candidate genes within the QTLs related to agronomic traits were used to construct the co-expression networks on the basis of gene-gene interactions.According to the enriched GO terms and KEGG pathways, the candidate genes were categorized into 49 biological processes, 17 cellular components, and 13 molecular functions (Figure S6).Detailed regarding these GO terms are provided in Tables S5 and S6.The co-expression network analysis indicated 40 genes were either directly or indirectly linked through one or more interactions with the isoflavone contents, reflecting the similarity in their functions (Fig. 6).Additionally, within the causal QTLs, one gene related to agronomic traits and four genes associated with isoflavone contents were involved in gene-gene interactions.Specifically, within the major interval for agronomic traits on chromosome 11, Glyma.11g129900(β-glucosidase 17, BGLU17) interacted with four BGLU genes, namely Glyma.11g129500(BGLU13), Glyma.11g129700(BGLU17), Glyma.11g129300(BGLU15), and Glyma.11g129600(BGLU17), which were responsible for DF, GH, PT, and LL (Figure S7).In terms of the isoflavone contents, an interaction was detected between two BGLU genes, Glyma.11g129800(BGLU13) and Glyma.11g129900(BGLU17), which were associated with MGLI, MGNI, ADZI, and TI (Fig. 6).

Discussion
Isoflavones are polyphenolic compounds that are highly abundant in soybean [10].Soybean isoflavones are multifunctional compounds important for legume-microbe interactions and the estrogenic effects of 17B-estradiol [3,12].However, the practical breeding strategies for developing soybean varieties with high isoflavone contents are inefficient and inadequate.We previously developed the DB-088 mutant with high isoflavone Table 3 QTL analysis for isoflavone contents in the F 2 population DZE Daidzein, GNE Genistein, DZI Daidzin, GLI Glycitin, GNI Genistin, MDZI Malonyl daidzin, MGLI Malonyl glycitin, MGNI Malonyl genistin, ADZI Acetyl daidzin, AGNI Acetyl genistin, TI Total isoflavone, QTL Quantitative trait locus, Chr Chromosome; 1.5-LOD interval Indicates the significant location of a QTL, Bin no.Number of bin markers, GD Genetic distance, PD Physical distance, Peak Major peak in interval, LOD Logarithm of odds, and the ratio of the probability that a QTL is present, R 2 The ratio (%) of the QTL explains for the phenotypic variation, AE The additive effect, and negative value means that it is reduced by allele of HG, Ref Reference contents via gamma irradiation [52].Compared with the corresponding contents in Danbaek and Hwanggeum, there was an approximately 4-fold increase in the TI content (1,580.51/1,558.04μg g −1 , respectively) in DB-088 (aglycones: 52.61 μg g −1 and glucosides: 6,340.75μg g −1 ); the contents of the aglycones and conjugated forms were 3.7-fold/2.3-fold(14.34/21.98μg g −1 , respectively) and 4-fold/4.1-fold(1,566.17/1,536.06μg g −1 , respectively) higher in DB-088, respectively (Table S2).
Isoflavone conjugates, which are the main constituents in seeds, are relatively stable and soluble, making them appropriate for storage in vacuoles [64].Additionally, isoflavone conjugates can be converted into aglycones via fermentation, thereby increasing their bioavailability for the production of pharmaceuticals as well as moisturizing and anti-aging cosmetics [65,66].
In the current study, our objective was to identify QTLs associated with high isoflavone contents in DB-088.The SoyBase database contains relatively few QTLs related to secondary metabolites.For example, it includes 297, 81, and 31 QTLs related to the isoflavone, tocopherol, and saponin contents, respectively, whereas there are 1,032 QTLs associated with the oil content.Most earlier QTL analyses were conducted using homozygous populations, but maintaining large populations can be cumbersome, time-consuming, and costly [38,39,67].This compelled us to conduct a QTL mapping analysis of isoflavone contents and agronomic traits using a heterozygous population (F 2 generation).By identifying causal QTLs with potential candidate genes, we revealed the potential applicability of these traits.The heritability of the contents of 12 isoflavones varied, from the relatively low heritability of the aglycone contents (35%-72%) to the higher heritability of the glucoside (63%-98%), malonyl-glucoside (72%-96%), and TI (44%-99%) contents [5,30,68,69].Additionally, according to our data, the heritability of individual isoflavone contents varied from 52% (GLE) to 98% (GNE), with an average of 86.4%.The similarity in the heritability of isoflavone contents between the heterozygous and homozygous populations suggests that the isoflavone contents will be maintained at similar levels between generations.The correlation analysis showed that the TI content was highly positively correlated with the DZI, GNI, MDZI, MGNI, and ADZI contents (r > 0.89).Moreover, the contents of individual isoflavones and the TI content were positively correlated with the flowering  [4] also reported that MDZI (r = 0.92), MGNI (r = 0.90), DZI (r = 0.81), and GNI (r = 0.89) contents are positively correlated with the TI content, but their analysis detected a negative correlation between the ADZI and TI contents.In the present study, a delay in DF and DM by approximately 5 and 24 days, respectively, in the mapping population resulted in a 3.2fold increase in the isoflavone content.However, aglycone and AGNI contents were negatively correlated with almost all of the examined agronomic traits, including DF [r = − 0.12 (DZE) to − 0.27 (GLE)] and DM [r = − 0.16 (DZE) to − 0.37(GLE)].Our findings are consistent with those of a previous study that showed the TI content can increase by up to 60.41% depending on the maturity time [70].Moreover, Wang et al. [71] detected a positive correlation between the isoflavone content and maturity time, with isoflavone levels increasing during the seed filling stage under cool conditions.However, the GNI (r = − 0.27), DZI (r = − 0.17), and GNE (r = − 0.33) contents were negatively correlated with DM.
We verified 29 QTLs for isoflavone contents using a mutant-based F 2 mapping population and 180,961 SNP markers.The QTLs mapped to chromosomes 1, 3, 8, 11, 14, 15, and 17 had LOD scores of 4.36-17.7 and explained 6.4%-29.6% of the PV, suggestive of the presence of a major QTL underlying the isoflavone contents of soybean seeds.Of these QTLs, six related to isoflavone contents were also identified in earlier studies.Major QTLs for isoflavone contents were previously localized to chromosome 5.More specifically, gen-A1, dai-A1, and tot-A1, which are related to the GNE, DZE, and TI contents, are located in a 7-13.9cM interval between Satt236 and Sat_271 and account for 5.8%-9.2% of the PV [72], whereas qGEN5, qDAI5, and TOT5 (Satt236-Sat_271 and Satt174-Sat_236) explain 5.5%-8.3% of the PV and have a positive additive effect (50.3-161.7)[60].Gutierrez-Gonzalez et al. [32] identified major QTLs (qGNE5, qDAI5, and qTOT5) at the same position on chromosome 5 (BARC-042999-08498-BARC-016279-02316) using a different set of RILs.Yang et al. [73] also reported that the major QTL for the DZE, GNE, and TI contents is located near Sat_217 on chromosome 5.Furthermore, Cai et al. [39] confirmed that qIF5-1 is a major isoflavone-related QTL (611.4 kb) that contains 81 candidate genes, including genes encoding transcription factors and ubiquitin-related proteins.However, in the present study, although no significant QTLs were detected on chromosome 5, eight QTLs affecting more than two types of isoflavones were mapped to chromosome 11; these QTLs explained up to 20.1% of the PV.Kassem et al. [63,74] identified Satt197, Satt197, Satt251, and Sct26 on chromosome 11 as the major QTLs for the seed GLE content across multiple environments.These QTLs are localized to a 2.9-cM interval and explain 21%-50.2% of the PV.Additionally, qGEN11 (15.6 cM), qDAI11 (13.6 cM), and qTOT (3.7 cM) are located between BARC-04299-08498 and BARC-016279-02316 (623.4 kb) and account for 7% of the PV, whereas qGLY11 (responsible for 3% of the PV) was identified in an interval between BARC-054421-12081 and BARC-050069-09363 that includes a gene encoding 4-coumarate:CoA ligase (4CL) [32].In the current study, 22 QTLs associated with more than two isoflavone forms were revealed as closely related and were located on chromosomes 1, 8, 11, 15, and 17.The QTLs in the intervals had LOD scores of 4.37-11.7 and explained 6.4%-20.1% of the PV, suggestive of the broad contributions of both parental plants and the presence of additive effects on various types of isoflavones.Some of the identified genes encode proteins involved in the phenylpropanoid pathway (e.g., flavonoid and isoflavone biosynthesis).For example, the qGLI15 and qMGLI15 QTLs appear to include a gene (Glyma.15g240700)encoding MAP kinase 19.Genes encoding glycosyl hydrolase family members and acyl-CoA synthetase/ ligases were identified in qGNE01, qMGNI01, qMGLI11, qMGNI11, qADZI11, and qTI11.The Gm11_9877690-Gm11_9955924 interval related to the MGLI, MGNI, ADZI, and TI contents explained 11.9%-20.1% of the PV (Table 3).This interval was also related to agronomic traits, including DF, DM, GH, PT, LL, and NN.In fact, the variation in the isoflavone contents differed significantly depending on DF and DM, which is consistent with the results of our correlation analysis.Ultimately, four putative genes encoding a β-glucosidase (BGLU) were identified within this 78-kb interval on chromosome 11.The additive effect of this region on the TI content (613.6 μg g −1 ) was due to the mutant alleles in the paternal plant.
To date, there has been insufficient research on the effects of BGLU genes on isoflavone accumulation in soybean seeds.The BGLU13 and BGLU14 amino acid sequences are respectively 98% and 87% similar to that of BGLU17, which belongs to a group of hydrolases that specifically target flavonoid and isoflavonoid conjugates [75,76].Previous studies characterized a β-glucosidase (β-D-glucoside glucohydrolase; EC 3.2.1.21)as a glycosyl hydrolase contributing to the plant defense system (e.g., against biotic and abiotic stresses and herbivores), lignification, and cell wall remodeling.This enzyme can catalyze the hydrolysis of glucosides to their aglycones in two steps (glycosylation and deglycosylation), resulting in structural and functional diversity [77].It cleaves the β-glucosidic linkages of glucose conjugates in microorganisms, animals, and plants [78,79].A total of 47, 51, and 64 BGLU genes have been identified in Arabidopsis thaliana, Medicago truncatula, and Brassica rapa, respectively [76,77,80].Ishihara et al. [81] determined that BGLU6 is an acyl-glucose-dependent glucosyltransferase homolog that is co-expressed with phenylpropanoid biosynthetic genes; the mutation of this gene in the transgenic bglu6 mutant line decreases the flavonol 3-O-gentiobioside 7-O-rhamnoside (F3GG7R) content in accordance with the lack of F3GG7R associated with Ler BGLU6 genes.Our study findings were consistent with the results of previous studies showing that the total isoflavone content is approximately 4-fold higher in fully mature seeds (R7) than in immature seeds (R5) and green full-sized seeds (R6), with more than 80% of the TI content comprising glycosides and malonyl-glucosides.The glycosylation of isoflavones (glycosylation and malonylation) leads to increased solubility and stability, which is conducive to transport and storage in vacuoles.Thus, the overexpression of BGLU13 (Glyma.11g129800)and BGLU17 (Glyma.11g129900) in Hwanggeum likely inhibits the formation and accumulation of isoflavone-glucosides in the late seed developmental stage because of the deglycosylation.Conversely, these genes are expressed at extremely low levels in DB-088, which enables glucosides to accumulate in the seeds.

Conclusion
We mapped the causal QTLs underlying agronomic traits and isoflavone contents using an F 2 mapping population derived from the soybean mutant DB-088 and a high-density SNP chip array.The QTLs identified in this study included novel QTLs as well as previously reported QTLs.In the major interval on chromosome 11, we identified the following candidate genes associated with favorable agronomic traits and high TI contents: BGLU13 (Glyma.11g129800),BGLU14 (Glyma.11g130000),and BGLU17 (Glyma.11g129900and Glyma.11g130100).The study results may be useful for future research aimed at developing molecular markers applicable for the breeding of new soybean lines with optimized isoflavone contents and agronomic performance.

Fig. 1
Fig. 1 Distribution of isoflavone contents in the F 2 population.The red and blue dotted lines indicate Hwanggeum and DB-088, respectively.The green line indicates the average for the F 2 mapping population

Fig. 3
Fig. 3 Analysis of QTLs related to isoflavone contents using a high-density linkage map of 20 chromosomes.The SNP position and genetic distance are provided on the right and left sides, respectively.The QTLs are positioned to the right of the chromosome.The QTLs in the same color are in identical positions on the chromosome

Fig. 4 Fig. 5 (
Fig. 4 Expression levels of four candidate genes in the major interval in different seed developmental stages.The expression level was normalized to the reference gene Fbox.Error bars represent the standard error.*, **, and *** indicate significance at P < 0.05, 0.01, and 0.001, respectively

Fig. 6
Fig. 6 Gene-gene interaction network comprising the candidate genes associated with isoflavone contents.Edges (gray line) indicate gene-gene interactions.Node (gene) colors were arbitrarily selected.High confidence scores are indicated by thick and dark lines

Table 2
Information of linkage map in F 2 population