Identification and validation of quantitative trait loci for kernel traits in common wheat (Triticum aestivum L.)

Background Kernel weight and morphology are important traits affecting cereal yields and quality. Dissecting the genetic basis of thousand kernel weight (TKW) and its related traits is an effective method to improve wheat yield. Results In this study, we performed quantitative trait loci (QTL) analysis using recombinant inbred lines derived from the cross ‘PuBing3228 × Gao8901’ (PG-RIL) to dissect the genetic basis of kernel traits. A total of 17 stable QTLs related to kernel traits were identified, notably, two stable QTLs QTkw.cas-1A.2 and QTkw.cas-4A explained the largest portion of the phenotypic variance for TKW and kernel length (KL), and the other two stable QTLs QTkw.cas-6A.1 and QTkw.cas-7D.2 contributed more effects on kernel width (KW). Conditional QTL analysis revealed that the stable QTLs for TKW were mainly affected by KW. The QTLs QTkw.cas-7D.2 and QKw.cas-7D.1 associated with TKW and KW were delimited to the physical interval of approximately 3.82 Mb harboring 47 candidate genes. Among them, the candidate gene TaFT-D1 had a 1 bp insertions/deletion (InDel) within the third exon, which might be the reason for diversity in TKW and KW between the two parents. A Kompetitive Allele-Specific PCR (KASP) marker of TaFT-D1 allele was developed and verified by PG-RIL and a natural population consisted of 141 cultivar/lines. It was found that the favorable TaFT-D1 (G)-allele has been positively selected during Chinese wheat breeding. Thus, these results can be used for further positional cloning and marker-assisted selection in wheat breeding programs. Conclusions Seventeen stable QTLs related to kernel traits were identified. The stable QTLs for thousand kernel weight were mainly affected by kernel width. TaFT-D1 could be the candidate gene for QTLs QTkw.cas-7D.2 and QKw.cas-7D.1.


Background
Common wheat (Triticum aestivum L.) is one of the most important cereal crops for feeds 40% of population in the world (http://www.fao.org/). Wheat yield is determined by thousand kernel weight (TKW), kernel number per spike, and effective tiller number [1]. Among them, TKW is the most stable and highest heritable trait, and it is also an important selection target for the genetic improvement of wheat yield [2]. Kernel weight is a complex yield component, which is mainly affected by kernel length (KL), kernel width (KW), kernel length / kernel width (KL/W) and kernel thickness [3]. Therefore, exploring the genetic variation of TKW and its related traits is an effective approach to increase wheat yield [4].
Previous researches have shown that conditional QTL mapping has been used to study genetic basis of complex traits in crops [22,23]. In wheat, conditional QTL analysis were carried out to evaluate the static genetic control of traits at different growth stages for kernel size and weight [23,24] and yield [25]; to reveal the dynamic genetic factors of plant height [26,27]; and to reveal the genetic contribution of different nitrogen and phosphorus supplement environments factors to QTL expression by dissecting QTLs based on trait values conditioned [28].
Using a RIL population derived from 'PuBing 3228 (P3228) × Gao8901 (G8901)', the objectives of this study were to (i) identify stable and major QTLs for TKW, KL, KW and KL/W under different field conditions; (ii) reveal the contribution of the other kernel traits to TKW using conditional QTL analysis; (iii) predict candidate gene(s) for targeted QTLs interval based on reference genome annotation information; (iv) develop KASP markers of the candidate gene(s) and verified by PG-RIL and a natural population consisted of 141 cultivar/lines for markerassisted selection in high-TKW wheat breeding.

Phenotypic performance and correlation analysis
The 176 RIL population and their two parents P3228, G8901 were planted in four environments to identify stable and major QTLs for kernel-related traits. The means and ranges of four kernel-related traits (TKW, KL, KW and KL/W) are listed in Table 1. Compared with P3228, G8901 had wider KW, but shorter KL ( Fig. 1 and Table 1). For the RIL population, the frequency of kernel traits in all environments and best linear unbiased predictors (BLUP) showed a continuous distribution with ranges from 27.33 to 44.97 g in TKW, 5.64 to 7.09 mm in KL, 2.84 to 3.39 mm in KW and 1.78 to 2.43 in KL/W (Table  1 and Fig. 2). The Shapiro-Wilk test and Pearson's correlation coefficients of the four traits were calculated based on the BLUP data of four individual environments, indicating that TKW, KL, KW and KL/W showed normal distributions in multiple environments ( Fig. 2 and Table 2). Moreover, TKW was positively correlated with KL and KW, and negatively correlated with KL/W ( Table 2). The variance for genotype, environment and genotype × environment (GE) interaction effects were highly significant in TKW, KL, KW and KL/W (Additional file 1: Table S1). All the broad-sense heritability (H) of four traits were higher than 0.60 ( Table 2), indicating that these traits were mainly determined by genetic factors.

QTL mapping
A total of 47 putative QTLs were detected for TKW, KL, KW and KW/L (Figs. 3a-3d and Additional file 1: Table  S2). Among them, 25, eight and 13 QTLs were located on the A, B and D genome, respectively. The single QTL explained 1.79-22.41% of the phenotypic variance with threshold log-of-odds (LOD) value ranging from 2.54 to 11 (Additional file 1: Table S2). Seventeen stable QTLs could be detected in more than two individual environments (Fig. 3a-e and Table 3).

Epistasis and QTL × environment interaction
A total of 15 pairs of epistasis QTLs for TKW, KL, KW and KW/L were detected, involving 30 QTLs on 15 chromosomes (Additional file 1: Table S3). Three pairs of epistasis interaction QTLs for TKW with PVE of 11.20, 7.10, and 8.93% were detected on chromosomes 1B/2D, 4D/6D, and 5A/6D, respectively, indicating that the interactions between those QTLs had no significant main effect on TKW (Additional file 1: Table S3). Three pairs of epistasis interaction sites of KL were detected, among which the interactions on chromosomes 4A/3B was between the major and non-major QTLs, while the interactions on 2D/3A and 6B/ 6D were between non-majors, and all of the three QTLs could increase KL (Additional file 1: Table S3). Four pairs of epistasis interactional QTLs for KW were detected, and they were all interactional between non-major QTLs. The two combinations of 3B/6A and 5B/6D could increase the KW, while the two combinations of 4B/6B and 5D/6B could decrease the KW. Five pairs of epistasis interactional QTLs for KL/W were detected, all of which were interactional between non-major QTLs. The two combinations of 6D/6D and 1B/6D could reduce KL/W, while the other three combinations could increase KL/W. QTL × environment (QE) interactions were detected at 43 loci for TKW, KL, KW and KW/L (Additional file 1: Table  S4). They overlapped with 47 putative QTLs of four traits, indicating that the TKW, KL, KW and KL/W were affected by environment. Among them, the largest environmental effect was detected in the interval AX-109416575-AX-108738265 (PVE (AbyE) = 21.93%), indicating that the major QTLs QTkw.cas-4A and QKl.cas-4A for TKW and KL, respectively, were significantly affected by the environment (Additional file 1: Table S4). Ten pairs of epistasis interactions were detected for additive-additive-environment (AAE), including three, one, three and three pairs of epistasis QTLs for TKW, KL, KW and KL/W, respectively (Additional file 1: Table S3).

QTL analysis for TKW conditioned on kernel-related traits
To dissect genetic effects of the KL, KW and KL/W on the expression of QTLs for TKW, conditional QTL analysis were conducted. After conditioned on KL, KW or KL/W, a total of 23 conditional QTLs comprising 47 QTL × environments were detected for TKW (Additional file 1: Table S5). Among them, 19 QTLs were identified as unconditional analysis, while the other 10 QTLs were newly detected, with four QTLs identified in at least two environments (Additional file 1: Table S5).
The QTLs QTkw.cas-2A.1, QTkw.cas-4A and QTkw.cas-4D were detected when TKW was conditioned on KW and KL/W instead of KL (Table 4 and Additional file 1: Table S5). This result indicated that these QTLs may be associated with KL, but independent of KW and KL/W. Four QTLs (QTkw.cas-5A, QTkw.cas-6A.1, QTkw.cas-7A and QTkw.cas-7D.2) were identified to be associated with KW, but independent of KL and KL/W (Table 4 and Additional file 1: Table S5). The QTL QTkw.cas-1A.2, was detected when TKW was conditioned on KL, but absent when conditioned on KW or KL/W ( Table 4), suggesting that it may be independent of KL, but was associated with either one or both of KW and KL/W. The stable QTL QTkw.cas-5D was not detected when TKW was conditioned on KL, KW or KL/W (Table 4).

Important QTL clusters
A total of seven QTL clusters were identified, all of them were related to more than one trait ( Fig. 3a-d and Table 5). Three intervals harboring various QTLs can be identified in at least three environments ( Fig. 3a-d, Tables 3 and 5). The interval AX-110540586-AX-108840708 on chromosome 4A affected TKW and KL across all the four environments and BLUP data, and the additional effects were derived from G8901 ( Fig. 3a-d, Tables 3 and 5). The interval AX-109892808-AX-110438513 on chromosome 6A affected TKW and KW across the three environments and BLUP data, with P3228 conferring the favorite allele ( Fig. 3c and Table 5). The interval AX-111061288-AX-111184541 on chromosome 7D showed significant effects on TKW and KW across three environments and BLUP data and on KL/ W in one environment and BLUP data (Table 5 and Fig.   Fig. 1 Phenotypic characterization of two parents and some representative RIL 3d). In this interval, the G8901-derived allele increased TKW and KW and decreased KL/W ( Table 3).
Subsequently, we annotated 47 genes in the 3.82 Mb region (Additional file 2: Table. S3). Among them, TaFT-D1 (TraesCS7D02G111600), a homolog of Arabidopsis FLOWERING LOCUS T, was considered as the candidate gene for QTkw.cas-7D.2 and QKw.cas-7D.1 (Additional file 1: Tables S6). Then, we designed genome-specific    primers for sequencing to analyse the genome sequence of TaFT-D1 from G8901 and P3228 (Additional file 1: Tables S9), and found that there was a 1 bp deletion at position + 840 in the third exon of TaFT-D1 in P3228.
Protein sequence alignment revealed that this deletion caused frameshift mutation with loss function of the TaFT-D1 protein in P3228 (Additional file 2: Fig. S2). We further analyzed the expression profiles of 47 candidate genes in different tissues using the Chinese Spring cv-1 development (pair) database [50]. As shown in Additional file 2: Fig. S3, the expression of TaFT-D1 was highest in leaves and young spikes, slightly lower in stems and substantially lower in root and developing grain.

Development of KASP markers and analysis for alleles of TaFT-D1
Two SNPs markers (AX-111061288 and AX-111184541) closely linked to the two stable QTLs (QTkw.cas-7D.2 and QKw.cas-7D.1) and 1 bp InDel of TaFT-D1 were further converted to KASP markers (Fig. 4a, Additional file 2: Fig. S4 and Additional file 1: Tables S9). After screening PG-RIL and a natural population consisted of 141 cultivar/ lines using these KASP markers, we found that the KASP marker of TaFT-D1 was co-segregated with SNPs marker AX-111184541. This result further proved that TaFT-D1 was an important candidate gene for the QTkw.cas-7D.2 and QKw.cas-7D.1. Furthermore, two-tailed t test was performed between the InDel of TaFT-D1 and four kernelrelated traits collected from multiple environments. The results showed that the InDel of TaFT-D1 was significantly correlated with TKW, KW and KL/W but not with KL for PG-RIL (Fig. 4b-e). For the natural population consisted of 141 cultivar/lines, the InDel of TaFT-D1 was associated with TKW and KW in the three environments, except that no significant differences were observed in the KL and KL/ W of G8901-allele (TaFT-D1(G)-allele) and P3228-allele TaFT-D1(G)-allele underwent positive selection during Chinese wheat breeding To determine whether the two TaFT-D1 alleles were subjected to selecting, we investigated the geographic distribution of the TaFT-D1 alleles in 150 Chinese wheat landraces and 172 modern cultivars. The Chinese wheat production area is divided into 10 agro-ecological wheat production regions according to environment, type of cultivars and growing season [51,52]. Compared with landraces, the proportion of TaFT-D1(G)-allele in modern cultivars was higher in the seven agro-ecological wheat production regions (except for regions IV, VIII and IX), suggesting that TaFT-D1(G)-allele have undergone positive selection during wheat breeding process ( Fig. 5a and  b). This confirmed that the favorable TaFT-D1(G)-allele can be used in different wheat production regions.

Unconditional QTLs and conditional QTLs effects
Previous researches have shown that the combination of QTL mapping and conditional genetic analysis enable the identification of the influence of one trait on another [22,28] (Table 4). Notably, QTkw.cas-5D was not detected when TKW was conditioned on KL or KW ( Table  4). The total PVE of the four QTLs conditioned on KW was significantly higher than the two on KL, indicating that KW contributes more than KL to TKW in the PG-RIL population ( Table 4). The unconditional QTL analysis showed that the major QTL QTkw.cas-4A on chromosome 4A was co-located with QTL QKl.cas-4A for KL, with G8901-derived allele increasing both TKW and KL (Table 3 and Fig. 3b). Using conditional QTL analysis, we found that the QTkw.cas-4A was entirely contributed by KL, partially by KW and entirely independent by KL/W (

QTL comparison
To date, a large number of QTLs for TKW and kernel morphological traits have been mapped in common wheat [45,48]. To investigate whether there were overlapping QTLs in different genetic backgrounds, we compared the QTLs interval in this study with those in the previous studies. Some stable QTLs have been reported in the previous studies. For example, the interval AX-108835689-AX-110438513 on chromosome 6A contained QTkw.cas-6A.1 and QKw.cas-6A, corresponding to the reported QTLs for kernel weight in different RIL population [44][45][46]. The gene TaGW2-A1 was also located in this interval, and it affects TKW by regulating the KW of bread wheat [13,52]. It was also reported that the major stable QTLs QTkw.cas-4A and QKl.cas-4A were in the interval AX-108738265-AX-109416575 (Table 5), overlapping with the locus for TKW in the previous study [40,47]. The QTL QTkw.cas-7D in the interval AX-111061288-AX-111184541 on chromosome 7D has also reported previously [39,43,53,54]. Therefore, these important QTLs that were not affected by genetic background are important selection targets in wheat breeding.

Advantages of high-density genetic maps
Previous genetic maps were mainly constructed by gelbased markers. Moreover, the confidence intervals associated with detected QTLs were relatively large and the numbers of markers was limited, which restricted further fine mapping of QTLs and their applications in breeding [27,38]. Compared with gel-based markers, high-density SNP arrays have the advantage of abundant markers and can further reduce the confidence interval for QTL localization.
In this study, we used the wheat 660 K high-density SNP chips to screen the PG-RIL population, and found that the confidence interval for most QTLs was less than 3 cM (Table 3 and Additional file 1: Table S2). Furthermore, the SNP markers in the confidence interval have clear base sequence and position information, which is effective for fine mapping using the reference genome [27]. For instance, the stable QTL QTkw.cas-7D.2 and QKw.cas-7D.1 were colocated in interval between 92.756-93.059 cM, and the physical interval of the Chinese Spring reference genome V1.0 is 65.50-69.32 Mb (Table 3 and Fig. 3).
Functional prediction of candidate genes for QTkw.cas-7D.2 and QKw.cas-7D.1 In crops, genes that regulated flowering have diverse functions, some affecting the yield-related traits [54]. Kernel weight can be manipulated by altering the duration of kernel filling, which is greatly influenced by flowering-related genes. For instance, overexpression of TaGW8, the positive regulator of cell proliferation and grain filling, results in early flowering and enhanced kernel width and yield in wheat [55,56]. Overexpression of TaZIM-A1 represses the expression of TaFT1, leading to a delay in heading date and decreased TKW in common wheat [57]. In the present  Table S6). Among them, compared with G8901, frameshift mutation of TaFT-D1 in P3228 leads to loss of protein function (Additional file 2: Fig. S2). TaFT1, a homolog gene of Arabidopsis FLOWERING LOCUS T, is a major gene that regulates wheat flowering [58,59]. It has diverse functions on regulating different reproductive traits, such as flowering time, spike development and seed development [60,61]. The loss function of TaFT-D1 in P3228-allele lines resulted in delayed flowering and decreased TKW, while the high expression of TaFT-D1 in the G8901-allele lines leads to accelerated flowering time and increased TKW. Notes: a A trait name in bold type indicates that major QTLs were detected for the corresponding trait, and a trait name in underlined type indicates that stable QTLs were detected for the corresponding traits. (+) indicates that the most favorable allele is derived from the parent P3228, (−) indicates that the most favorable allele is derived from the parent G8901 Note: a denotes the additive effect of a conditional QTL, in absolute values, that reduces or increase less than 10% compared to the corresponding unconditional QTL b denotes the additive effect of a conditional QTL, in absolute values, that reduces more than 10% compared to the corresponding unconditional QTL c denotes the additive effect of a conditional QTL, in absolute values, that increase more than 10% compared to the corresponding unconditional QTL. d denotes the QTL couldn't be detected in unconditional analysis, but can be detected in conditional analysis (+) indicates that the most favorable allele is derived from the parent P3228, (−) indicates that the most favorable allele is derived from the parent G8901. E and numerals in parentheses indicate the environment in which the QTL was detected and the percentage of phenotypic variance explained (PVE) by the additive effects of the mapped QTLs, respectively

Diagnostic marker and marker-assisted selection
Abundance of diagnostic markers in wheat enables breeders to create better combinations and select favorable cultivars to meet local breeding goals [62]. To date, numerous SNP loci related to kernel traits have been identified in wheat by high-throughput SNP chips combined with bi-parental populations [1,34,39]. In the present study, a KASP marker was developed to distinguish two alleles of TaFT-D1 and verified in PG-RIL and a natural population consisted of 141 cultivar/lines (Fig. . Furthermore, the alleles of TaFT-D1 were significantly associated with TKW and KW in both PG-RIL and natural populations (Fig. 4). G8901-allele, the favorable allele that produces higher TKW, was gradually accumulated during the wheat breeding process (Fig. 5). Therefore, the KASP marker can facilitate map-based cloning of QTkw.cas-7D.2 and QKw.cas-7D.1 and molecular-assisted selection breeding for high-yield in wheat.

Conclusions
In this study, we performed QTL analysis using the PG-RIL population in four environments for kernel-related traits (TKW, KL, KW and KL/W), which were mainly distributed on chromosomes 1A, 1B, 4A, 5D, 6A, 7A and 7D (Fig. 3 and Additional file 2: Table S1). A total of 17 stable QTLs were identified in more than two individual environments (Table 3). Notably, the stable QTLs for TKW were mainly affected by KW (Table 4). Furthermore, the QTLs QTkw.cas-7D.2 and QKw.cas-7D.1 were delimited to the physical interval of approximately 3.82 Mb, and TaFT-D1 was considered as the candidate gene. Based on a 1 bp InDel of TaFT-D1 between the two parents, a KASP marker of TaFT-D1 allele was developed and verified by PG-RIL and a natural population. The favorable TaFT-D1 (G)-allele associated with TKW and KW has been positively selected during Chinese wheat breeding. In addition, the current study provided new options for dissecting the genetic basis of yield and molecular-assisted breeding.

Plant materials and field trials
A mapping population composed of 176 F 6-9 RILs derived from 'PuBing3228 × Gao 8901 'was developed by single seed descent method. The wheat germplasm P3228 was developed by Dr. Lihui Li (Chinese Academy of Agricultural Sciences). G8901 is a commercial cultivar released by Gaocheng institute of agricultural science, Hebei, China. The P3228 has higher kernel number per spike and the G8901 has higher thousand kernel weight (Fig. 1). A natural population consisted of 141 cultivar/ lines (maintained in our laboratory, Additional file 1: Table S7) was used for the KASP marker screening and two-tailed t test. The 176 RILs, with two parents and the natural population were grown at the Luancheng Agroecosystem Station, Chinese Academy of Sciences (37°53′ 15″N, 114°40′47″E) during four growing seasons from 2013 to 2014 to 2016-2017. In each environment, the RILs, two parents and the natural population were planted in a randomized complete block design with three replicates. A 1.5 m 2 subplot with four 1.5 mlong rows, 0.25 m apart, and 30 seeds for each row were used. The water, fertilizer and other management of all field trials were carried out in accordance with local standard practices. In addition, 150 landraces of the Chinese wheat mini-core collection [63] and 172 modern cultivars (maintained in our laboratory) were used to analyze the geographic distribution of TaFT-D1 alleles (Additional file 1: Table S8). The 150 landraces and 88 modern cultivars of the Chinese wheat mini-core collection were kindly provided by Dr. Xueyong Zhang (Chinese Academy of Agricultural Sciences).

Phenotypic evaluation and statistical analysis
For the four environments, 10 representative plants were sampled from each plot to investigate kernel-related traits. At seed maturity, the TKW and kernel morphometric traits (KL, KW and KL/W) of at least 500 kernels were measured three times using the rapid SC-G grain appearance quality image analysis system (WSeen Detection, Hangzhou, China

QTL mapping
The 'PuBing3228 × Gao 8901' RIL population and the two parents were genotyped by the Affymetrix wheat 660 K SNP array [64]. A total of 101,136 loci showed polymorphisms between P3228 and G8901. The linkage map comprised 23 linkage groups that consisted of 4477 bins, spanning 3529.5 cM in length, with an average interval distance of 0.782 cM between the adjacent markers. Linkage analysis was performed using JoinMap v4 [65], and the genetic map was drawn by Mapchart 2.0 [66]. The QTLs were scanned with QTL ICIMapping V4.1 [67] through inclusive composite interval mapping of additive and dominant QTL (ICIM-ADD) [67]. The LOD score to detect the presence of a QTL was above at 2.50 [68]. Digenic epistasis and environment interaction of QTLs were analyzed using QTL ICIMapping V4.1 through inclusive composite interval mapping of epistatic QTL (ICIM-EPI) [69]. The LOD score to detect the digenic epistasis QTL was above at 5.0 [68]. The QTL × environment interactions were scanned with QTL ICIMapping V4.1 through inclusive composite interval mapping of additive and dominant QTL (ICIM-ADD) [70]. QTLs with overlapping confidence intervals were regarded as the congruent QTLs. The QTLs were named based on McIntosh et al. [71], 'cas' represents Chinese Academy of Science. Conditional QTL analysis was performed to analyze the genetic contributions of kernel-related traits to TKW, by the procedure of inclusive composite interval mapping [23]. The conditional phenotypic values (y (TKW|KL) ) of TKW in wheat were obtained by the mixed-model approach. The conditional phenotypic value can be partitioned as. y (TKW|KL)= μ (TKW|KL) + G (TKW|KL) + E (TKW|KL) + e (TKW|KL) . where (TKW|KL) denote TKW conditional on KL; y (TKW|KL) is the conditional phenotypic value of TKW on KL; μ (TKW|KL) is the conditional population mean, G (TKW|KL) is the conditional general genotypic effect; E (TKW|KL) is the conditional effect for the environment and e (TKW|KL) is the conditional residual error.

Comparison of QTLs related to kernel traits
We used flanking SNP markers sequence of QTLs to BLAST against the reference genome of Chinese Spring to acquire the physical position of the region [49]. High confidence candidate genes in the target interval were retrieved based on coding sequences (IWGSC_RefSeq_Annotations_ v1.0), and were further analyzed on NCBI Non-redundant protein sequences for function annotations. The expression profile database of nine candidate genes was blasted based on Chinese Spring cv-1 Development (pair) [50].
Additional file 1 Table S1 Analysis of variance for the investigated traits of the PG-RIL in four environments.