Association of yield-related traits in founder genotypes and derivatives of common wheat (Triticum aestivum L.)

Background Yield improvement is an ever-important objective of wheat breeding. Studying and understanding the phenotypes and genotypes of yield-related traits has potential for genetic improvement of crops. Results The genotypes of 215 wheat cultivars including 11 founder parents and 106 derivatives were analyzed by the 9 K wheat SNP iSelect assay. A total of 4138 polymorphic single nucleotide polymorphism (SNP) loci were detected on 21 chromosomes, of which 3792 were mapped to single chromosome locations. All genotypes were phenotyped for six yield-related traits including plant height (PH), spike length (SL), spikelet number per spike (SNPS), kernel number per spike (KNPS), kernel weight per spike (KWPS), and thousand kernel weight (TKW) in six irrigated environments. Genome-wide association analysis detected 117 significant associations of 76 SNPs on 15 chromosomes with phenotypic explanation rates (R2) ranging from 2.03 to 12.76%. In comparing allelic variation between founder parents and their derivatives (106) and other cultivars (98) using the 76 associated SNPs, we found that the region 116.0–133.2 cM on chromosome 5A in founder parents and derivatives carried alleles positively influencing kernel weight per spike (KWPS) that were rarely found in other cultivars. Conclusion The identified favorable alleles could mark important chromosome regions in derivatives that were inherited from founder parents. Our results help unravel the genetic basis of yield in founder genotypes and provide tools for marker-assisted selection for yield improvement. Electronic supplementary material The online version of this article (10.1186/s12870-018-1234-4) contains supplementary material, which is available to authorized users.


Background
Wheat, the most widely grown grain crop providing the food requirements of about 35% of the global population, generates the largest total harvest and is the most traded grain commodity [1][2][3]. Studying and understanding the phenotypes and genotypes of its agronomic traits may help to improve its yield stability.
Single nucleotide polymorphisms (SNPs), as third-generation molecular markers, are superior in automated genotyping [4][5][6]. There are many reports on the use of high-density Illumina iSelect 90 K SNP chips in generating linkage maps [7][8][9]. For example, Gao et al. [7] built a genetic linkage map of hexaploid wheat that included 5536 polymorphic SNP markers covering a genetic length of 3609.4 cM using the 90 K iSelect SNP array. Jin et al. [9] identified 46,961 polymorphic SNPs in a 176-RIL population derived from a Gaocheng 8901/Zhoumai 16 cross using the 90 K and 660 K SNP arrays, and they produced a genetic map with a total length of 4121 cM and marker density of 0.09 cM/marker in bread wheat.
Analysis of the breeding history of many crop species revealed the presence and roles of founder parents. Molecular markers were used to analyze the contributions of the genetic bases of founder parents in improvement of barley [16], sugarcane [17], rice [18][19][20], and wheat [21,22]. For example, Li et al. [19] and Tan et al. [20] built genetic maps of rice showing that quantitative trait loci (QTLs) of kernel number per spike, thousand-grain weight, and yield in the founder parent Minghui 63 were transmitted to the progenies over generations. By pedigree tracking of the founder parent Beijing 8, Li et al. [21] found that the frequencies of alleles unique to Beijing 8 varied from 0 to 0.96 in its 51 descendants, suggesting that some of them underwent rigorous artificial selection. Jiang et al. [22] confirmed that Ningmai 9 could serve as a founder parent and found some significant chromosome regions that might be used in wheat breeding.
In this study we genotyped 215 wheat cultivars using the iSelect 9 K SNP array, including 11 founder parents and 106 derivatives. Based on multi-environmental trial data we used GWAS to identify favorable alleles of yield-related traits through sequential generations of breeding. Favorable alleles identified in derivatives could be used to detect important chromosome regions inherited from the founder parents. This information might be used for marker-assisted selection (MAS) in wheat breeding.

Phenotypic assessment
The average coefficients of variation for phenotypic traits in each environment ranged from 6.29 to 26.35%, indicating considerable phenotypic variation (Table 1). There were significant positive correlations between traits across environments (P < 0.01; Additional file 1: Table S1).
The founder parents Funo, Bima 4, and Nanda 2419 and their derivatives over following generations were compared in terms of yield-related traits, including plant height (PH), spike length (SL), spikelet number per spike (SNPS), kernel number per spike (KNPS), kernel weight per spike (KWPS), and thousand kernel weight (TKW).
PH gradually declined and TKW increased with advancing generations, while SL, SNPS, KNPS, and KWPS showed no significant changes. This indicated continuing selective pressure on PH and TKW during breeding (Additional file 2: Table S2).

Allelic diversity and genetic structure
Genotyping of the 215 wheat cultivars using the 9 K SNP array identified 4138 polymorphic SNPs, of which 3792 were mapped to single chromosome positions. Among them, 1795 were located on the A genome chromosomes, 1787 on the B genome, and only 210 on the D genome (Additional file 3: Table S3). Genetic diversity was analyzed using the 3792 SNPs. Gene diversity and polymorphism information content (PIC) ranged from 0.009 to 0.500 and from 0.009 to 0.375, with averages of 0.319 and 0.256, respectively. Major allele frequencies reached a maximum of 0.995, with an average of 0.762 (Additional file 3: Table S3), indicating that the germplasm was highly diverse.
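As an illustration of the diversity statistics reported above, the following minimal sketch computes gene diversity and PIC for a single biallelic SNP from its major allele frequency. It assumes the textbook definitions (gene diversity as 1 − Σp², PIC as gene diversity minus Σ 2p²q² over allele pairs); the exact estimator applied by PowerMarker, including any sample-size correction, is not stated in this section, and the function names are hypothetical.

```python
def gene_diversity(freqs):
    """Expected heterozygosity at one locus: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content:
    1 - sum(p_i^2) - sum over pairs i<j of 2 * p_i^2 * p_j^2."""
    cross = 0.0
    for i in range(len(freqs)):
        for j in range(i + 1, len(freqs)):
            cross += 2.0 * freqs[i] ** 2 * freqs[j] ** 2
    return gene_diversity(freqs) - cross


def snp_stats(major_allele_freq):
    """Return (gene diversity, PIC) for a biallelic SNP.
    Assumption: exactly two alleles, frequencies p and 1 - p."""
    p = major_allele_freq
    freqs = [p, 1.0 - p]
    return gene_diversity(freqs), pic(freqs)


# A SNP with both alleles at frequency 0.5 gives the upper bounds
# for a biallelic marker under these definitions.
gd_max, pic_max = snp_stats(0.5)
```

Under these definitions a biallelic SNP cannot exceed a gene diversity of 0.500 or a PIC of 0.375, which agrees with the maxima reported above.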
The number of subpopulations (K) was plotted against the ΔK values calculated from the Structure output, and the peak of the line graph was observed at K = 2 (Fig. 1a, b). The neighbor-joining method was used to classify the 215 wheat cultivars based on Nei's standard genetic distance [23], and they were divided into two groups (Fig. 1c). The first group (162) mainly consisted of Funo, Nanda 2419, and their derivatives, which mainly originated from Anhui, Henan, Hunan, Jiangsu, Shaanxi, and Sichuan provinces. The second group (53) mainly consisted of Bima 4 and its derivatives, which mainly originated from Beijing and Shandong. This further demonstrated that the population was basically divided into two subpopulations.
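The ΔK statistic used to choose K can be sketched as follows. This is a simplified illustration of the Evanno criterion, assuming ΔK is computed as the absolute second-order difference of the mean log-likelihood across replicate runs, divided by the standard deviation of the log-likelihood at that K. The log-likelihood values below are made up for demonstration and are not taken from this study.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute Evanno's delta K for every K that has both neighbours.

    lnp_by_k maps K -> list of ln P(D) values from replicate runs.
    Assumption: each K has at least two replicates with non-zero spread,
    otherwise the standard deviation is undefined or zero.
    """
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2.0 * means[k] + means[k - 1])
            result[k] = second_diff / stdev(lnp_by_k[k])
    return result


# Hypothetical replicate log-likelihoods for K = 1..4 (illustrative only).
example = {
    1: [-1000.0, -1002.0, -998.0, -1001.0, -999.0],
    2: [-800.0, -801.0, -799.0, -800.5, -799.5],
    3: [-780.0, -790.0, -770.0, -785.0, -775.0],
    4: [-775.0, -795.0, -760.0, -780.0, -770.0],
}
dk = delta_k(example)
best_k = max(dk, key=dk.get)
```

ΔK is undefined at the smallest and largest K tested, so the most likely K is read from the interior values only.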

Associations between yield-related traits and SNPs
Of the 3792 SNP markers, 3271 had a frequency higher than 0.05. Association analyses between the six yield-related traits and SNP markers showed that there were 117 significantly associated signals (P < 3.06 × 10⁻⁴) among the 76 associated SNP loci, including 20, 35, 6, 23, 24, and 9 signals associated with PH, SL, SNPS, KNPS, KWPS, and TKW, respectively (Fig. 2). The phenotypic explanation rates (R2) ranged from 2.03 to 12.76%. The associated loci were located on 15 chromosomes (Table 2). Significant associations were found in two or more environments for 25 SNP loci; for example, wsnp_Ex_c49211_53875575-5A (SL) was significantly associated in all six environments, whereas others were significant in two to five environments (Table 2).

Phenotypic effects of yield-related alleles
The phenotypic effects of alleles were further analyzed (Table 3). Favorable alleles with larger genetic effects on PH, SL, SNPS, KNPS, KWPS and TKW were ; and wsnp_Ku_rep_c69511_68887456-3A TT (increases in TKW by 1.41 g in 09TA, 1.01 g in 09YL, 1.48 g in 09YZ, and 1.33 g in BLUP), respectively. The frequencies of these alleles at associated loci ranged from 6.05 to 97.21%, and exceeded 50% for 64 alleles, indicating strong selection on those alleles in breeding.

Transmission of favorable alleles from founder parents
All 76 alleles with a positive effect on yield-related traits identified in the association analysis were used to analyze the transmission rates of alleles from founder parents to progenies, as well as the frequencies of favorable alleles in later generations. Transmission rates from the first generation of Funo to the fifth generation were between 81.88 and 65.48%, and the frequencies of favorable alleles in different generations changed from 71.99 to 78.21%. Transmission rates from the first generation of Bima 4 to the fourth generation were between 79.94 and 64.38%, and frequencies of favorable alleles increased from 74.79 to 79.49%. Likewise, transmission rates for first to fifth generation derivatives of Nanda 2419 were between 64.25 and 50.72%, while the corresponding frequencies of favorable alleles increased from 68.91 to 78.21% (Fig. 3). Although the transmission rates of alleles from founder parents decreased with the number of generations, the percentage of favorable alleles increased.
Overall analysis of chromosome regions involving the 76 favorable alleles showed that among the 15 chromosomes with association signals for agronomic traits, only three regions, 95.5-97.8 cM on 3B, 136.2-144.1 cM on 4A, and 116.0-133.2 cM on 5A, had high frequencies of alleles with a positive influence on yield traits (Fig. 4a). In particular, the 3B region was associated with SL and PH (Fig. 4b), while the 4A region was associated with SL (Fig. 4c). Additionally, the 116.0-133.2 cM region on 5A was present at high frequency in derivative cultivars and was associated with KWPS (Figs. 4d and 5).

Genetic diversity of founder parents and derivatives
One hundred and seventeen of the 215 cultivars investigated in this study were first to fifth generation derivatives of Funo, Nanda 2419 and Bima 4 that were bred in different provinces of China. The 215 cultivars were divided into two groups: the first (162 accessions, 75.3%) included Funo, Nanda 2419, and their derived offspring, while the second (53, 24.7%) included Bima 4 and many of its derived offspring. Pedigree analysis showed that the first generation Funo derivatives Sumai 2 and Sumai 3, as well as the second generation derivatives Ning'ai 8628 and Wu 7815-4-1, clustered together. Moreover, the first generation Funo derivatives obtained from a cross with Neixiang 5 (Zhengzhou 17, Kaifeng 10 and Xuchang 26), and the second generation derivatives obtained from crosses involving Zhengzhou 17 (Sudi 8112, Zhengzhou 741, Huapei 128-8 and Xiangmai 5) were also in the same cluster. Thus, different cultivars from the same original cross had high similarity, indicating little genetic difference in the traits analyzed. Moreover, genetically related lines mostly grouped in the same cluster, indicating that the results were consistent with the genealogy (Additional file 4: Table S4). However, a few lines with a direct pedigree relationship to a particular founder did not fall into the same group. For instance, 16 of the 17 first generation Funo lines belonged to the first group, but Linnong 14, which is 50% related to Funo, fell into the second group, showing that high-performance offspring with large differences could be selectively bred from the same founder parent.
Manhattan and Q-Q plots of six phenotypic traits with 3792 genome-wide SNP markers shown as dot plots of compressed MLM at P < 3.06 × 10⁻⁴. Red horizontal line corresponds to the threshold value for significant association. Green and orange colors separate different chromosomes. Significantly associated SNP markers are labeled with blue dotted lines.
(a) SNPs wsnp_Ex_c12048_19288999 and wsnp_Ku_c99567_87349060 associated with PH were consistently detected in 5 and 3 environments, respectively; (b) SNPs wsnp_Ex_rep_c67779_66463916, wsnp_Ex_c3463_6348659 and wsnp_Ex_c49211_53875575 associated with SL were consistently detected in 4, 3 and 6 environments, respectively; (c) SNPs associated with SNPS were detected in fewer than two environments; (d) both wsnp_Ku_c29102_39008953 and wsnp_Ex_c13154_20784674 associated with KNPS were detected in 3 environments; (e) SNP wsnp_Ex_c12341_19693570 associated with KWPS was detected in 3 environments; (f) SNP wsnp_Ku_rep_c69511_68887456 associated with TKW was detected in 4 environments

Dissection of founder parents by favorable alleles
Previous studies found that the genes controlling important traits tended to be clustered rather than randomly distributed on chromosomes [24][25][26][27]. For example, Huang et al. [25] identified QTLs for TKW and KNPS in the Xgwm334a-Xgwm1043 region on chromosome 6A, PH, KNPS, and TKW near Xgwm786 on chromosome 7D, and KNPS, spike weight, heading date, TKW, and PH in the Xgwm1220-Xgwm1002 region also on chromosome 7D. Li et al. [26] localized eight QTLs for TKW, spike number per square meter, sterile spikelet number per spike and fertile spikelet number per spike near markers Xwmc31, Xgdm67, and Xgwm428 on chromosome 7D.
We investigated favorable allele combinations carried by the founder parents and found that among the 76 associated loci, Bima 4, Funo, and Nanda 2419 carried 58, 56 and 48 favorable alleles, respectively. Among the 25 associated loci detected in multiple environments, Bima 4, Funo and Nanda 2419 carried 20, 19 and 14 favorable alleles, respectively. In particular, the wsnp_CAP8_c1419_836050-wsnp_Ex_c6129_10723019 region on chromosome 3B was associated with both SL and KNPS, and the wsnp_Ex_c1563_2987002-wsnp_Ex_c3463_6348659 region on chromosome 4A was associated with SL. Favorable alleles in these two segments were present with high frequency in Bima 4, Funo, and Nanda 2419, indicating that these varieties have

Conclusions
Two hundred and fifteen wheat cultivars were genotyped by the 9 K SNP iSelect assay and all were phenotyped for six yield-related traits in six environments. Comparisons of yield-related traits in founder parents Funo, Bima 4, Nanda 2419, and their derivatives indicated that breeders applied a strong selective pressure on PH and TKW. Major allele frequency, PIC and gene diversity analysis using 3792 SNP markers showed high genetic diversity. Genome-wide association analysis of yield-related traits detected 117 significant associations at 76 SNP loci on 15 chromosomes. Twenty-five loci showed associations in two or more environments. Three regions with high frequencies of favorable alleles were identified at 95.5-97.8 cM on chromosome 3B, 136.2-144.1 cM on chromosome 4A, and 116.0-133.2 cM on chromosome 5A. The region on chromosome 5A associated with KWPS was highly distinctive in that its favorable alleles were common in founder and derived lines but rare in other cultivars. Our findings partially identify the genetic basis of the role of founder parents in crop breeding, and provide information for future wheat improvement by marker-assisted selection.

Phenotyping
The whole germplasm set was planted at three locations (Taian in Shandong; Yangling in Shaanxi; and Yangzhou in Jiangsu) in two growing seasons (2008-2009 and 2009-2010). Field management followed local practices. The six irrigated environments were designated 09TA, 10TA, 09YL, 10YL, 09YZ and 10YZ. Field experiments were laid out in randomized block designs with three replications. Each line was planted in a five-row plot with rows 2 m long, 40 kernels per row, and a row spacing of 20 cm. The agronomic traits PH and TKW were measured at maturity. Thirty spikes of each line were randomly collected from the middle row and used for measurements of SL, SNPS, KNPS, and KWPS.

Genotyping and statistical analysis
Genomic DNA was extracted using the CTAB method [34]. Descriptive statistical analysis and analysis of variance (ANOVA) of phenotypic data were performed using SPSS 21.0 (http://www.brothersoft.com/ibm-spss-statistics-469577.html). The best linear unbiased prediction (BLUP) method was used to calculate the mean values of each trait [35][36][37].
Fig. 4 Distributions of favorable alleles associated with agronomic traits in 215 cultivated varieties. Green indicates the favored allele at each locus, yellow indicates alternative alleles, white indicates missing data. a comparative distribution of favorable alleles associated with agronomic traits in 117 founder parents and derivatives and 98 independent varieties; b frequencies of favorable alleles associated with SL and PH in the chromosome 3B region 95.5-97.8 cM in the founder parents and derivatives and other cultivated varieties; c frequencies of favorable alleles associated with SL in the 4A region 136.2-144.1 cM after comparing founder parents and derivatives with an independent variety group; d frequencies of favorable alleles associated with KWPS in 5A region 116.0-133.2 cM after comparing founder parents and derivatives with cultivated varieties
SNP genotyping was performed on BeadStation and iScan instruments at the Genome Center of the University of California at Davis according to the manufacturer's protocols (Illumina, USA) [5]. Data correction, input and output were performed using GenomeStudio v2011.1 [38]. Information on the chromosome location of polymorphic SNPs was obtained from Cavanagh et al. [5]. PowerMarker V3.25 was used to estimate genetic diversity of SNPs [39]. Population structure of the 215 cultivars was evaluated with 3792 SNP markers distributed on all 21 chromosomes using Structure 2.3.4 with a burn-in period of 50,000 iterations followed by 500,000 Markov Chain Monte Carlo (MCMC) replications [40]. Five independent runs were performed for each number of clusters (K), with K varying from 1 to 10, leading to 50 Structure outputs. The number of populations was then estimated on the basis of the Evanno criterion [41]. GWAS was performed on the yield-related traits and SNP marker data using the Q + K model [42,43] in TASSEL 5.0 software [31] (http://www.maizegenetics.net).
After exclusion of SNP loci with frequencies < 0.05, a uniform suggestive genome-wide significance threshold (1/3271 = 3.06 × 10⁻⁴, i.e. P < 3.06 × 10⁻⁴, −log10(P) > 3.51) was applied.
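The arithmetic behind this threshold can be checked with a short sketch. It assumes only what is stated above: the threshold is the reciprocal of the number of markers retained after frequency filtering. The function name is hypothetical.

```python
import math


def suggestive_threshold(n_markers):
    """Return (P threshold, -log10 threshold) for a 1/n suggestive cutoff."""
    p = 1.0 / n_markers
    return p, -math.log10(p)


# 3271 SNPs remained after removing loci with frequency below 0.05.
p_cut, neglog_cut = suggestive_threshold(3271)
print(f"P < {p_cut:.2e}, -log10(P) > {neglog_cut:.2f}")
```

Because the cutoff depends only on the marker count, a different filtering step would change the threshold accordingly.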
The 215 wheat cultivars were grouped by the neighbor-joining method in MEGA 5.0 [32]. The transmission rates of alleles from founder parents to later generations and the frequencies of favorable alleles were computed in this study. The transmission rate was defined as the percentage of average numbers of alleles carried by one generation derived from the founder parent relative to the total number of alleles detected. The frequency of favorable alleles was defined as the percentage of average numbers of favorable alleles carried by one generation relative to the total number of favorable alleles detected.
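One possible reading of these two definitions is sketched below. It assumes that each cultivar is represented by a list of genotype calls over the same ordered set of associated loci, that a derivative "carries" an allele when its call matches the reference call at that locus, and that missing calls (None) count as not carried. The handling of missing data, the function names, and the toy genotypes are all assumptions for illustration, not details taken from the study.

```python
def percent_matching(reference, generation):
    """Average number of loci at which derivatives match the reference,
    expressed as a percentage of the total number of loci."""
    n_loci = len(reference)
    counts = [
        sum(1 for ref, call in zip(reference, geno)
            if call is not None and call == ref)
        for geno in generation
    ]
    return 100.0 * (sum(counts) / len(counts)) / n_loci


def transmission_rate(founder_calls, generation):
    """Percentage of founder alleles carried, averaged over one generation."""
    return percent_matching(founder_calls, generation)


def favorable_allele_frequency(favorable_calls, generation):
    """Percentage of favorable alleles carried, averaged over one generation."""
    return percent_matching(favorable_calls, generation)


# Toy data: four loci, one founder, two derivatives (illustrative only).
founder = ["AA", "TT", "GG", "CC"]
favorable = ["AA", "CC", "GG", "AA"]
first_generation = [
    ["AA", "TT", "GG", "AA"],
    ["AA", "CC", "TT", "CC"],
]
rate = transmission_rate(founder, first_generation)
fav = favorable_allele_frequency(favorable, first_generation)
```

The two statistics differ only in the reference they are compared against, which is why transmission from a founder can fall across generations while the share of favorable alleles rises.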