Genome-wide association study reveals the genetic basis of yield- and quality-related traits in wheat

Background Identifying the loci and dissecting the genetic architecture underlying wheat yield- and quality-related traits are essential for wheat breeding. A genome-wide association study was conducted using a high-density 90 K SNP array to analyze the yield- and quality-related traits of 543 bread wheat varieties. Results A total of 11,140 polymorphic SNPs were distributed on 21 chromosomes, including 270 significant SNPs associated with 25 yield- and quality-related traits. Additionally, 638 putative candidate genes were detected near the significant SNPs based on BLUP data, including three (TraesCS7A01G482000, TraesCS4B01G343700, and TraesCS6B01G295400) related to spikelet number per spike, diameter of the first internode, and grain volume. The three candidate genes were further analyzed using stage- and tissue- specific gene expression data derived from an RNA-seq analysis. These genes are promising candidates for enhancing yield- and quality-related traits in wheat. Conclusions The results of this study provide a new insight to understand the genetic basis of wheat yield and quality. Furthermore, the markers detected in this study may be applicable for marker-assisted selection in wheat breeding programs. Supplementary Information The online version contains supplementary material available at 10.1186/s12870-021-02925-7.


Background
Bread wheat (Triticum aestivum L.), which is a widely cultivated cereal crop that is highly adaptable, provides approximately 21% of the total calories and 23% of protein in the human diet (www.fao.org/faostat/en). As a staple food for about 35-40% of the global population, wheat is a good source of nutrients and has unique gluten properties, making it useful for producing diverse food products [1]. The increasing global population and improvements in the standard of living for many people worldwide have forced breeders to continually aim to produce new high-quality and high-yielding wheat varieties [2].
Yield and quality are complex traits. Additionally, the limited genetic diversity of bread wheat has resulted in breeding bottlenecks, and the application of traditional breeding methods has led to gradual increases in wheat yield and quality [3]. Genome sequencing and highthroughput chip-based genotyping platforms are critical for clarifying the mechanisms regulating the wheat yield potential and quality as well as for enhancing breeding methods [4]. Several SNP arrays (e.g., 9 K, 35 K, 90 K, 660 K, and 820 K) have recently been developed. They have been used to analyze bi-parental populations and identify loci (QTLs) controlling yield-and qualityrelated traits [5][6][7][8][9]. However, traditional QTL mapping methods are usually based on specific characteristics of parental populations, and are time-consuming and laborious [10].
GWAS are common in breeding programs because they are more efficient and require less effort in analyzing complex traits under various environmental conditions than other research methods [11]. Specifically, GWASs have been useful for detecting yield-associated loci in wheat, including plant height (PH), kernel number per spike (KNPS) and thousand grain weight (TGW) [12,13]. However, because of the need for many seeds and the substantial time required to assess some quality traits, there have been relatively few GWASs regarding wheat quality traits such as wet gluten content (WGC) and grain protein content (GPC) [14,15]. Moreover, there are few reports describing a GWAS conducted to investigate lodging resistance, which is an important factor influencing wheat yield and quality.
For a GWAS, the size and diversity of the panel are important because a small panel and large linkage disequilibrium (LD) blocks may lead to the identification of false positive associations [16]. Regarding wheat, only a few GWAS for yield and quality traits have involved large natural populations and SNP chips. Furthermore, wheat has been cultivated in China for more than 4000 years and has now been cultivated in 10 major agroecological zones [17]. Due to the long evolutionary period, Chinese wheat germplasms have been artificially selected in different regions and have regional genetic characteristics [18,19]. Accordingly, the objectives and requirements for improving wheat varieties differ considerably among these regions. Thus, we performed a GWAS of wheat yield and quality involving 543 representative bread wheat cultivars, including 531 Chinese wheat cultivars from 10 provinces, and a wheat 90 K SNP array following phenotypic analyses in six environments.
The aim of this study was to identify the stable loci and candidate genes significantly associated with wheat yield and quality. The results described herein may be useful for revealing the genetic basis of yield and quality. The corresponding SNP markers that were identified may ultimately facilitate the breeding of new highquality and high-yielding wheat varieties.

Phenotypic variation and correlation analysis
The phenotypic data for the 543 wheat lines characterized regarding growth-and development-related traits, yield-related traits, and quality-related traits in six environments are listed in Table S1. The phenotypic variations among genotypes were determined based on the heritability, range, mean, standard deviation, and the coefficient of variation. There were obvious variations for all traits, especially the coefficient of variation for thrust (TH) (49.28%) in the E5 environment. Table S2 provides the estimated correlation coefficients for this combined analysis. The broad-sense heritability (h 2 ) for most traits was approximately 0.80, with the highest and lowest heritabilities detected for PH (0.92) and PET (0.70). Accordingly, most traits were stable and largely determined by genetic factors. The correlation coefficient was highest (0.970) between wet gluten content (WGC) and grain protein content (GPC), but was also relatively high between TGW and GPR (0.919), wet gluten content (WGC) and flour content (FC) (0.842).

Quality-related traits
For the grain volume (GV), 17 significant SNP loci were detected on five chromosomes (1B, 3B, 3D, 6A, and 6B) and explained about 2.52-5.01% of the phenotypic variation. The two significant SNP loci for GPC detected on chromosomes 1A and 3D accounted for 2.35-2.53% of the phenotypic variation. Two significant SNP loci for WGC were detected on chromosomes 3D and 5D and accounted for 2.43-2.90% of the phenotypic variation. Ten significant SNP loci for the flour content (FC) were detected on chromosomes 1B, 4B, 5A, 5B, 6A, and 7B, accounting for 1.80-3.27% of the phenotypic variation.

Putative candidate gene analysis and expression data
In our study, the 200-, 380-, and 600-kb sequences flanking the related SNPs in subgenomes A, B, and D, respectively, were identified as potential candidate gene regions. A total of 638 putative candidate genes detected of the significant SNPs flanking-regions based on BLUP data were identified by screening the annotated genes in the recently released genome sequence (IWGSC RefSeq v1.0) (Table S4). We performed subsequent haplotype and expression analysis for the following three critical traits: SNS (Fig. 1), FD (Fig. 2), and GV (Fig. 3).
Regarding the SNPs associated with SNS, an association peak that included four significantly associated SNPs was detected on chromosome 7A (Fig. 1a, b). On the basis of the SNPs on chromosome 7A, three haplotypes were identified, with the mean SNS for haplotype II (19.57 ± 1.14) significantly lower than the corresponding values for haplotypes I (19.78 ± 1.33) and III (20.31 ± 1.43) (Fig. 1c, d). One of the four significant SNPs (BS00026622_51) was located at 942 bp in the third exon of the gene TraesCS7A01G482000. This locus was detected in four environments and in the BLUP model. SNP (C/A) at this location in the relevant genomic regions cause amino acids to change from leucine to isoleucine (Fig.  1e). The expression of the TraesCS7A01G482000 candidate gene when two nodes were detectable was significantly upregulated according to the RNA-seq analysis of the spike (Fig. 1f).
Among the SNPs associated with FD, four similar SNPs were distributed on chromosome 4B and all were detected in the BLUP model and in at least one environment (Fig. 2a). These four SNPs were BS00023035_51, Tdurum_contig48366_1324, Tdurum_contig50783_67, and Tdurum_contig50783_285, and four SNPs were separated 470.67 kb (Fig. 2b). Because of their close genetic relationship and significant correlation, these SNPs were used for the subsequent haplotype analysis, which revealed two distinct haplotypes (I and II). A total of 301 wheat materials were included haplotype I, with an FD of 3.25 ± 0.48 cm, whereas 208 wheat varieties were included haplotype II, with an FD of 3.17 ± 0.45 cm (Fig.  2c, d). The FD of haplotype I was significantly greater than that of haplotype II, implying lodging is less likely for wheat varieties with haplotype I than for varieties with haplotype II. Furthermore, three of the four significant SNPs were detected in the CDS of TraesCS4B01G343700. This gene contains two exons and the CDS comprises 2450 bp. The two SNPs located in the second exon resulted in amino acid changes from arginine to histidine and from aspartic acid to asparagine (Fig. 2e). The TraesCS4B01G343700 expression level increased in the stem and internode during wheat growth and development, peaking in the milk stage. This suggests that this gene helps mediate wheat internode growth and development (Fig. 2f). The changes in GV were highly correlated with a set of SNPs in a 1.95 Mb genomic region (528.62-530.57 Mb) on chromosome 6B (Fig. 3a, b). These loci were detected in multiallelic and BLUP contexts. Three haplotypes were detected in 407 wheat lines on the basis of genotyping results. More specifically, haplotypes I, II, and III were detected in 339, 40, and 28 accessions, respectively. The mean GV of haplotype II (789.88 ± 8.58) was significantly lower than that of haplotypes I (798.50 ± 8.29) and III (799.19 ± 4.60) (Fig. 3c, d). Moreover, one of the four significant SNPs (Excalibur_c11245_880) was detected at 3315 bp in the TraesCS6B01G295400 coding sequence (CDS). The SNP (G/A) leads to changes in amino acids from arginine to histidine (Fig. 3e). The RNA-seq data revealed that this gene was highly expressed in developing seeds at 6 days post-anthesis (Fig. 3f).

Discussion
GWAS have been conducted to analyze the yield and quality of various crops, including rice [21], soybean [22], cotton [23], and sorghum [24]. The size of the study population is closely related to the accuracy of the association analysis. Long et al. [25] reported that increasing the population size increases the number of individuals with rare alleles, thereby increasing the accuracy and efficiency of the positioning of rare alleles. Therefore, our GWAS for yield-and quality-related traits involved a wheat 90 K SNP array for 543 wheat accessions in multiple environments.
We completed a broad-scale comparison of the results of our study and those of previous investigations. For PH, a locus (RAC875_rep_c105718_585) on chromosome 4D identified in four environments based on BLUP data is about one LD from a QTL (QPH.caas-4DS) described by Li et al. [26], This previously identified QTL was also stably identified in two Chinese bread wheat populations. A QTL (QKNS.caas-4AL) for KNPS between markers Kukri_rep_c106490_583 and RAC875_c29282_566 on chromosome 4A was detected earlier in four environments by Gao et al. [13]. Two sites (Excalibur_c9370_966 and wsnp_Ra_rep_c70233_67968353) related to KNPS revealed in our study are located within this QTL. Gao et al. also reported a stable QTL (QTKW.caas-4BS.1) for TGW between markers BobWhite_c162_145 and Kukri_ c66885_230 on chromosome 4B. In the present investigation, a predicted TGW-associated SNP locus (Tdurum_ contig97386_207) was detected between two markers on chromosome 4B. Additionally, TaAPO-A1/WAPO-A1 (TraesCS7A02G481600) was identified as a candidate gene for SNS through map-based cloning [27][28][29]. Two loci . This gene is within one LD of a stable SNS-related locus (GENE-0787_85) revealed in the current study. For the quality-related traits, our GPC locus (GENE-0411_ 807) overlaps the GPC QTL mapped to chromosome 1A by Kumar et al. [31]. Li et al. [32] identified a locus (IWB41869) related to starch granules on chromosome 7B, which is close to the FC locus (RAC875_c26057_370) we detected on the same chromosome.
In addition, we have also identified new stable SNPs that affect specific genes and have been detected in multiple environments. For example, TraesCS7A01G482000, which is related to SNS, was predicted to encode a haloacid dehalogenase (HAD)-like hydrolase domaincontaining protein. In rice, the overexpression of OsHAD1 reportedly leads to enhanced phosphatase activity and increased total and soluble P contents in Pideficient transgenic seedlings during the early panicle development stage. Increasing the P uptake rate can promote spikelet formation [33,34]. The RNA-seq data available in an online wheat gene expression database indicated this gene is highly expressed when two nodes are detectable, which coincides with a key period for wheat spikelet development.
Lodging resistance is another important yieldrelated trait. Four significant SNPs for internode diameter were identified on chromosome 4B. These SNPs were detected within TraesCS4B01G343700. In Arabidopsis, AtVPS25 (vacuolar protein sortingassociated protein) regulates auxin biosynthesis via its effects on the expression of specific auxin-related genes. An increase in the auxin content of wheat plants may lead to increased stalk diameters and enhanced lodging resistance [35][36][37]. A transcriptomelevel analysis of TraesCS4B01G343700 in different tissues and developmental stages proved that this gene is highly expressed during the stem and internode development stage. The expression level peaked in the milk grain stage, implying the changes occurring in this stage have important implications for the lodging resistance of wheat plants.
Regarding the quality-related trait GV, we identified TraesCS6B01G295400, which encodes a LisH domainlike protein, as a candidate gene. In rice, OsLIS-L1 (lissencephaly type-1-like 1 protein) is important for male gametophyte formation, with mutations to this gene resulting in abnormal development. Additionally, the protein encoded by this gene influences grain characteristics and is closely related to the floral development and grain-filling stages [38]. An analysis of publicly available wheat RNA-seq data revealed that this gene is highly expressed in developing seeds, specifically in the starchy endosperm from day 6 to day 14 post-anthesis and showed a gradually decreasing trend. Accordingly, this gene is important for the early grain-filling stage.

Conclusions
In summary, we conducted a GWAS based on the wheat 90 K SNP array to investigate the yield-and qualityrelated traits in 543 major wheat accessions. The resulting data were analyzed to identify relevant SNP loci and candidate genes. We are currently developing Kompetitive Allele-Specific PCR markers for the significant loci. These markers will enable researchers and breeders to efficiently transfer alleles into elite wheat genotypes [39]. Additionally, a more thorough characterization of the candidate genes described herein may enhance our understanding of the molecular mechanisms regulating wheat yield and quality.

Plant materials and experimental design
A bread wheat panel of 543 genotypes including cultivars, regional test lines, and introduced parental lines was used, the details have been published in our previous paper [20].  Table S1. The phenotypic traits were assessed in all six environments. The phenotypic data for each environment and the BLUP data were used for the genome-wide association analysis.

Phenotypic data analysis
The descriptive statistical analysis and correlation analysis for the phenotypic data were completed using the SPSS 25.0 software. Pearson's correlation coefficients were calculated to evaluate the correlations among the traits.

SNP genotyping
The wheat 90 K Illumina Infinium SNP array was used to genotype the association panel containing 543 accessions. The SNP data were clustered and automatically called using the Illumina BeadStudio genotyping software (Illumina, San Diego, CA, USA). The data were filtered to remove alleles with a detection rate less than 0.1 and a minor allele frequency less than 0.05 [12]. Additionally, samples with a loss rate greater than 10% and a heterozygosity frequency greater than 20% were eliminated.

Genome-wide association analysis
The population structure, relative kinship, and LD were analyzed in a previous study [20]. In the current study, we completed a GWAS using the GAPIT package [40] in the R software. A mixed linear model program (Q + K) [41], with the population stratification results and kinship as covariates, was used to minimize false positives [40]. The P value threshold was calculated based on the number of markers (P = 1/n, n = total number of SNPs used) as described by Li et al. [42]. Regarding the GWAS results, a P value of 1/11,140 (−log 10 P = 4.05) was used as the criterion for identifying significant SNPs.

Prediction of candidate genes and expression analysis
The 'Chinese Spring' Genome database (IWGSC RefSeq v1.0, http: //www.wheatgenome.org/) was used for predicting candidate genes for the significant sites revealed by the genome-wide association analysis. Specifically, candidate genes around the significant sites were identified according to the differences in the LD decay distance among chromosomal groups. The expression profiles of putative candidate genes were analyzed using a wheat gene expression database available online (http://www.wheat-expression.com/). This database, which includes 850 wheat RNA-sequencing samples and an annotated genome, reveals the similarities and differences between homoeolog expression levels in diverse tissues, developmental stages, and cultivars [33,43].
Additional file 2: Table S1 Categories and descriptive statistics for the 25 yield-and quality-related traits evaluated in six different environments. Table S2 Correlations among 25 yield-and quality-related traits. Table  S3 Basic details regarding the SNP markers used for a genome-wide association study of 543 wheat genotypes. Table S4 Information regarding the candidate genes identified based on BLUP data.