Genome-wide association studies of seven agronomic traits under two sowing conditions in bread wheat

Background Wheat is a cool seasoned crop requiring low temperature during grain filling duration and therefore increased temperature causes significant yield reduction. A set of 125 spring wheat genotypes from International Maize and Wheat Improvement Centre (CIMMYT-Mexico) was evaluated for phenological and yield related traits at three locations in Pakistan under normal sowing time and late sowing time for expose to prolonged high temperature. With the help of genome-wide association study using genotyping-by-sequencing, marker trait associations (MTAs) were observed separately for the traits under normal and late sown conditions. Results Significant reduction ranging from 9 to 74% was observed in all traits under high temperature. Especially 30, 25, 41 and 66% reduction was observed for days to heading (DH), plant height (PH), spikes per plant (SPP) and yield respectively. We identified 55,954 single nucleotide polymorphisms (SNPs) using genotyping by sequencing of these 125 hexaploid spring wheat genotypes and conducted genome-wide association studies (GWAS) for days to heading (DH), grain filled duration (GFD), plant height (PH), spikes per plant (SPP), grain number per spike (GNS), thousand kernel weight (TKW) and grain yield per plot (GY). Genomic regions identified through GWAS explained up to 13% of the phenotypic variance, on average. A total of 139 marker-trait associations (MTAs) across three wheat genomes (56 on genome A, 55 on B and 28 on D) were identified for all the seven traits studied. For days to heading, 20; grain filled duration, 21; plant height, 23; spikes per plant, 13; grain numbers per spike, 8; thousand kernel weight, 21 and for grain yield, 33 MTAs were detected under normal and late sown conditions. Conclusions This study identifies the essential resource of genetics research and underpins the chromosomal regions of seven agronomic traits under normal and high temperature. Significant relationship was observed between the number of favored alleles and trait observations. Fourteen protein coding genes with their respective annotations have been searched with the sequence of seven MTAs which were identified in this study. These findings will be helpful in the development of a breeder friendly platform for the selection of high yielding wheat lines at high temperature areas. Electronic supplementary material The online version of this article (10.1186/s12870-019-1754-6) contains supplementary material, which is available to authorized users.


Background
As the world population will be increased to 9 billion at the end of twenty-first century, it is predicted that the food demand, especially for wheat, will be increased by 50% by 2030 and 70% by 2050 [1,2]. On the other hand, the mean temperature in South Asia will increase by 4°C until 2050 or by end of this century [3]. Significant wheat yield losses of 32 to 39% worldwide [4] and 40% in less developed-irrigated wheat growing areas [5] have been reported. During grain filling duration, high temperature (> 30°C) also occurs in 40% of the temperate zones that grow 36 million ha of wheat [6].
Around 6% decline in global wheat production [7] and 3-4% in indo-gigantic planes [8] has been assessed for each degree rise in temperature during the reproductive stage of the crop. High temperature decreases chlorophyll content and photosynthetic capacity of leaves [9], decreases the grain number per spike due to floret abortion, and lowers the grain weight [10]. Grain filling rate and ultimately the yield are highly affected as the temperature increases above 30°C during anthesis [11]. Stone and Nicolas [12] reported that exposure of 32°C for 4 days reduced yield by 23%. Losses due to heat stress are phenomenal and compelling wheat researchers to focus more on heat stress as compared to other abiotic stresses such as drought, metal or radiation etc.
To cope with high temperature in wheat during reproductive stage is desired by the breeders. Traits affected by high temperature are investigated to develop heat tolerant wheat cultivars. It is therefore essential to grow wheat lines under normal and heat stressed conditions to examine the variability. Using conventional wheat breeding approaches, valuable traits related to stress tolerance have been incorporated. However, it is time consuming due to limited field screening methods and laborious due to low probability of combining coveted alleles [13]. Progeny screening can be more precise and genetic gains can be accelerated if the conventional approaches are aided by molecular based techniques.
To dissect the genetic basis of the quantitative traits associated with heat tolerance, loci for grain filling duration and heat tolerance were mapped [14,15] with few simple sequence repeats (SSRs). Other quantitative trait loci (QTL) for heat tolerance using SSRs were then identified [16,17]. These studies were based on low marker density using SSRs. QTL mapping identifies genomic regions with low resolution; it hinders their use in diverse germplasm [18].
In order to obtain better genomic dissection of complex quantitative traits like heat tolerance, a high-throughput and robust genotyping approach is in demand [19]. Most recently Diversity Arrays Technology (DArT) markers were used to map agronomic traits in heat prone areas [20]. Genotyping with SSRs and DArT is cost-ineffective, laborious and less accuracy of prediction [19]. Genotypingby-sequencing (GBS) is a high-throughput, cost effective, next generation sequence-based technology [21] which detects single nucleotide polymorphisms (SNP). GBS markers allow researchers to obtain high prediction accuracy as compared to previously used array of hybridization-based markers [22].
Based on such high density SNP markers, genome-wide association study (GWAS) can inspect large gene pools representative of diverse breeding reservoirs. GWAS is the most suitable approach to locate robust QTLs that show effect in both normal and stressed conditions. Recently GWAS has been effectively used to map QTLs for grain weight, drought stress, floret fertility, grain architecture and starch granule size [23][24][25][26][27].
Our objective was to use a high-density (GBS) platform through GWAS to identify chromosomal regions potentially associated with days to heading (DH), grain filling duration (GFD), plant height (PH), spikes per plant (SPP), grain number per spike (GNS), thousand kernel weight (TKW) and grain yield (GY) under normal and late sown conditions. Information from this scientific work will help wheat breeders for genetic improvement in the development of heat tolerant lines.

Phenotypic assessment
Description of seven studied trait at three locations under two treatments (NS: Normal and LS: Late sown) indicates significant (p-value < 0.01) effect on 125 genotypes (Table 1). For each trait, the difference recorded between NS and LS condition across all the environments was above the critical difference i.e. least significant difference (LSD at p-value 0.05). It was confirmed that the late sown trials faced high temperature from anthesis to physiological maturity that affects each studied trait. Yield losses due to high temperature ranged from 66 to 76% on average at three locations. Plant height was reduced by 25, 30 and 20% on average at Islamabad (ISD), Sargodha (SGD) and Bahawalpur (BWP) respectively. Average days to heading were decreased from 126 to 88 at ISD, 123 to 81 at SGD, 108 to 78 at BWP. Grain filling duration was decreased by 9 to 34%. Mean square of genotypes was also separately calculated for each location and treatment as a one-way source of variation to observe the inherent potential of the lines under study. Mean square of genotypes, locations and treatments for all studied traits were found to be significant (p-value < 0.001) with the degree of precision (CV%) ranging from 6.5 to 18.52 (Table 2). Interaction mean squares for all the traits except GFD were significant.
Biplots were constructed for NS and LS conditions separately. The two principal components explained 45.27% variability in NS and 53.09% variability in LS conditions. Traits TKW, GNS, PH, GY, SPP occupied the same section in the biplot which indicates similar pattern of variability among these traits. These traits were found correlated to each other. On the other hand GFD and DH were opposite to each other in two dimensional biplot spread indicates their highly negative  In total 87,096 SNPs were identified but 55,954 GBS markers were filtered with MAF > 0.05, average polymorphism information content (PIC) and diversity was 0.28 and 0.32, respectively. Out of these 55,954 markers, 35% were from A genome, 41% from B genome and 24% from D (Table 3). Chromosome 2B has the highest number of markers (4248), while 4D has the lowest (1180).  Average physical distance between two adjacent SNP loci (SNP-density) along each chromosomal length can be found in Additional file 2: Figure S2. Five sub-groups ( Fig. 4a) were expressed by 125 genotypes after the distribution along two principal components (PCs) and this finding was further validated through AHC analysis which confirmed five distinct clusters. Sub-group 3, 5 and 2 were distinctly spread in two dimensional space (Fig. 4a). Range of dissimilarity index was 0.05 to 0.25 which elucidated five clusters at cutoff value of 0.12  Table S1. Trait legends are same as in Table 1 Fig Between-class variance decomposition for the optimal classification was 0.022 (15.51%). Range of distance between each class centroid was 0.178 to 0.406. Clusters 1 to 5 contained 43, 17, 14, 22 and 29 genotypes respectively. Each genotype with its corresponding score of five PCs and its designated cluster can be seen in Additional file 1: Table S1.
Linkage disequilibrium was calculated from 3,154,450 pairs using 100 marker sliding window operation out of which 16% of the pairs were with zero LD and 12% of the pairs were found in significant (p-value < 0.05) LD. Out of these significant pairs, inter-and intra-chromosomal LD was calculated for 3838 and 320,510 pairs respectively. Out of 320,510, there were 162,362 (51%) intra-chromosomal pairs which showed > 0.1 LD with an average of 0.293 within 16.70 M base pair (Mb) throughout the whole genome. There were 504 inter-chromosomal pairs with > 0.1 r 2 LD with an average of 0.151. Intra-chromosomal LD ranged from 0.1 at chromosome 4D within 38.84 Mb pair distance to 0.24 r 2 at chromosome 1D within 13.33 Mb pair distance (Fig. 5a). Average LD on A, B and D-genome was 0.177, 0.17 and 0.168 with in the average 14.62, 14.58 and 20.19 Mb pairs distance respectively. Inter-chromosomal LD was noted between 0.06 to 0.08 r 2 (Fig. 5b). The extent of (threshold LD > 0.1 R 2 ) LD-decay was observed at 25 Mb pair distance (Fig. 5c). A total of 162,866 pairs were identified having > 0.1 r 2 LD (Fig. 5d).

Marker-trait associations (MTAs)
A total of 139 MTAs were identified out of which 69 were identified in NS experiments and 70 were found under LS conditions. Among them, 56, 55 and 28 were on genome A, B and D, respectively (Table 4). Marker S4A_610095520 was found to be associated with two (DH, GFD) traits in LS conditions; S7B_700803008 was linked with SPP in NS and GY in LS conditions and marker S6A_453869891 was identified for TKW in both NS and LS conditions (Table 4).
Six genomic regions were identified in NS and 14 were detected in LS conditions that affected days to heading. For GFD, 12 marker in NS and 11 markers in LS conditions were detected. The trait PH was found to be associated with 12 genomic regions in NS and 12 in LS conditions; SPP with seven regions in NS and six in LS conditions. Three markers were found associated with GNS in NS and five in LS conditions. Fourteen MTAs were found in NS and seven in LS conditions that affected TKW. Sixteen MTAs in NS and 17 MTAs in LS conditions were detected that control GY (  6 Mb to 735 Mb) on chromosome 7A was found to be of prime importance due to association with multiple agronomic traits including DH, TKW and GY traits (Fig. 6). Effect of favorable alleles was estimated for all seven traits studied under LS conditions. The highest allelic effect (61%) was observed for days to heading under LS conditions (Fig. 7). Genotypes (serial number 34, 109, 115, 120; Additional file 1: Table S1) with nine favorable alleles exhibited the highest (> 0.06 kg/plot) grain yield. Favorable alleles for grain yield explained 45% variability. Increase in grain yield was observed as the number of favored allele increased additively (R 2 = 0.45). Number of favored alleles in each genotype for seven traits under LS conditions is given in Additional file 1: Table S1. Seven MATs were selected on the basis of > 0.3 MAF and their sequences were searched using Basic Local Alignment Search Tool (BLAST) through the link (https://plants.ensembl.org /Multi/Tools/Blast?db = core). With these seven sequences, 144 hits were found. Fourteen gene hits with greater than 90% sequence similarity were selected with their respective annotations (

Discussion
Source to sink ratio is disturbed when heat stress affects the photosynthetic apparatus due to alteration in carbon assimilatory processes [28] which causes reduction in yield and yield related agronomic traits. Hence the present study was designed to evaluate the heat resistance potential of 125 spring wheat genotypes under three different locations. In this detailed phenotypic experiment, the traits were examined under two sowing conditions (NS and LS) that provided the copious opportunity to observe the difference exhibited by a trait under normal and heat-stressed (late sown) conditions. Traits studied in different locations were averaged to understand the stability across the environment as performed by Ogbonnaya et al. [20]. In this way their combined contribution under overall environmental variation in the detection of associated genomic regions has been highlighted under normal and heat-stressed conditions. Thus our study showed the MTAs for normal and late sown conditions separately, so that actively participating genomic regions controlling the traits under heat stress can be emphasized.
After reviewing the previous literature, only those traits were selected for evaluation and eventually genome-wide mapped which were reportedly affected by the heat stress [16,17]. Significant reduction in all studied traits was observed due to heat stress. Except DH and PH, all traits were affected by genotype x environment interaction and their ranges show that the traits are quantitatively inherited, as has been in prior investigation [17] also showed such trends. Heritability estimates for the traits DH, SPP and GNS were lower in late sown than normal sown trials same as reported by Tiwari et al. [17] while GFD, PH, TKW and GY have higher heritability in late sown than normal sown conditions as shown by Paliwal et al. [16], similar heritability estimates for TKW have been reported by Pinto et al. [29].
Two-way analysis of variance showed acceptable ranges for CV% indicated that the design of the experiment is appropriate, similar trends in CV % have been shown by Ogbonnaya et al. [20]. Under heat-stressed conditions, GY is significantly affected by PH, GFD, SPP and GNS in a positive manner while PH significantly affected DH, SPP and TKW. Under heat stressed conditions DH is inversely related with SPP, GFD and GY as well while GFD if reduced that can cause ultimate reduction in the grain yield. Similar trends have also been reported by Paliwal et al. [16]. Early maturing wheat lines have maximum GFD that can cause an increase in GY.
The biplot was constructed to see the expression of genotypes in normal and heat-stressed conditions with respect to the studied traits. The biplot distinctly which genotypes performed well under normal conditions, under heat-stressed and under both. It has also been observed that the first principal component in LS conditions explains higher variability (30.14%) as than NS conditions (24.52%), and the same trend was followed by second principal component in NS and LS conditions. Variation among the genotypes has been well explained under heat-stressed conditions as compared to under normal conditions. Genotypes with good height were high yielding and those genotypes having high number of spikes per plant (SPP) have been found with high GNS under heat-stressed conditions and delayed heading caused shrinkage in GFD.
As proven by the LSD value for each trait, the effect of heat stress was significant, so the MTAs detected in both normal and heat-stressed conditions were underlined. The main focus of the study was to identify the genomic regions under normal and heat-stressed conditions so the heat susceptibility indices for the traits as calculated and mapped by some of the previous studies [16,17] was not the objective of this study. Mapping for heat susceptibility indices of the traits cannot detect those genomic regions specifically expressed under heat stress, so the main interest was to identify the MTAs under heat stress and to pinpoint the involvement of different chromosomal regions due to normal and heat-stressed conditions.
In the present trait DH was associated with six markers in normal and 14 markers on 8 different chromosomes in heat-stressed conditions. Two markers on 1D and two    markers on 5D along with three markers on 6B and three markers on 7A were identified as controlling the DH in LS conditions. Four markers, one on each of 2A, 4A, 5B and 6A were also identified meanwhile. Fifteen percent variation in DH has been explained under heat stressed conditions by the marker (S6A_71077435) on 6A; Ogbonnaya et al. [20] also reported a region on 6A with 7.48 R 2 for DH. Using functional markers VRN-A1, VRN-B1 and VRN-D1 on chromosomes 5A, 5B and 5D Ogbonnaya et al. [20] explained the variation in DH. In the present study, four markers (one on 5B, three on 5D) explained 15% variation in DH. Grain filling duration was controlled by the regions on some multiple chromosomes (Table 4) in NS and LS conditions. Above 12% variation has been explained by most of the regions in both sowing conditions. Paliwal et al. [16] reported the distance between two markers (Xgwm935-Xgwm1273) on chromosome 2B long arm linked with GFD, later on Tiwari et al. [17] identified other QTLs for GFD were identified and most recently Ogbonnaya et al. [20] detected a region on 1B explaining 4.5% of GFD. Present study identified some GFD operating regions on 2B, 5B, 5D and 7D which validates previous [16,17] findings and also underlined some additional regions on chromosomes 3B and 4A. A region on chromosome 1B that explained 15% variation in GFD under heat-stressed conditions also corroborate with the findings of Ogbonnaya et al. [20].
Using DArTseq, plant height was mapped by Bellucci et al. [30] on chromosome 2A and 6A explaining 6 to 7% variation. Ogbonnaya et al. [20] reported a reported an MTA for PH on 5A with 4.85 R 2 . We report a 6 Mb region on 5A (426 to 732 Mb) associates with PH and explains 12 to 14% variation along with another locus on the same (5A) chromosome with the 23% R 2 . Under heat stress, there is another region of 1 Mb (32 Mb to 33 Mb) on chromosome 3B that is associated with PH. In European winter wheat, using SSR, MTA on 6A for PH was reported [31,32]. In our study a genomic region on chromosome 6A at 212 Mb is linked with PH in NS conditions but a region on the same (6A) chromosome4 00 Mb away (at 611 Mb) has also been detected controlling plant height under heat stress explaining 16 and 10% variation respectively. Spikes per plant was associated with genetic regions on 5A, 5B, 6A, 7A, 7B in NS conditions but in LS conditions the trait variation has been explained up to 14% with regions on 1B, 4B and 7A. A two Mb spanned genetic region on 2D (72-74 Mb) explained 12% variation in SPP under heat stress. Number of spikes per plant is a trait of prime importance for yield [33], so the information given by these MTAs can be used to predict the number of spikes under heat stress. For grain number per spike, in addition to the previously reported [20] region on 7B, we report MTAs for GNS on 3A and 5D under NS and on 2A, 2D, 6A and 6D under LS conditions with explained R 2 from 10 to 19%.
Thousand kernel weight is reportedly governed by the regions on 5A, 6A [18], 1A, 1B, 2D, 5A, 6A [20]. Paliwal et al. [16] detected MTAs for TKW on 2B, 7B and 7D. Tiwari et al. (2013) also showed the MTAs for TKW in normal and heat-stressed conditions on 2B, 6B and 2B, 6A, 7D respectively. In the present research a 4 Mb region on chromosome 6B uncovered by four GBS markers explained 12 to 14% variation in TKW, not only these results validating the previous findings as well as revealing the function of additional genomic region on 6B in NS conditions. But in heat-stressed conditions, a chromosomal location on 3B revealed by three GBS markers explained 10 to 13% variation in TKW. An MTA for TKW on 2B identified in this study is in agreement with the findings of Paliwal et al. [16] and Tiwari et al. [17]. Another set of three markers on 7A in the region from 718 to 735 Mb explained 10 to 11% variation in TKW.
Grain yield is a complex quantitative trait and has MTAs spread over different chromosomes; in both NS  [16] in the heat stress studies, a chromosomal region on 2B is of prime importance. In the present study, 11 MTAs for different traits have been detected on 2B. Using reference genome, GBS-markers associated with the traits of interest were ordered on the basis of physical mega base pair distance

Conclusion
To assess the thermo tolerance of the wheat germplasm, there is a dire need to highlight those regions involved in the adaptation to high temperature. This can be done by finding the association of genomic regions with the help of next generation SNP genotyping (such as GBS) so that the detected sequences can be utilized for the development of high throughput user friendly markers which may be used in marker assisted or genomic selection. Our study contributed in the effort of finding the genomic regions expressed during high temperature. Seven chromosomal regions with minor allele frequencies greater than 0.3 have been identified and their sequence was searched with BLAST which showed 14 gene hits with > 90% similarity. The genotypes with high number of favorable alleles detected in this study can be utilized in wheat breeding programs.

Material
Germplasm was comprised of 125 spring wheat genotypes (Additional file 1: Table S1) obtained from International Maize and Wheat Improvement Centre (CIMMYT). Field trials were conducted at three different locations in Pakistan; i) National Agriculture Research Center (NARC), Islamabad; ii) University of Sargodha, Sargodha and iii) Regional Agriculture Research Institute (RARI) Bahawalpur. Geographical coordinates for the experimental location at Islamabad (ISD) were 33o40'34″N 73o7'54″E, Sargodha (SGD) 32o7'50″ N 72o40'50″E and Bahawalpur (BWP) 29o23'11″N 71o39'12″E. At each location, one complete set of germplasm was sown at the optimum time (during third week of November) considered as normal and the second complete set was sown late (during third week of December) to uncover the genotype response to high temperature during anthesis. The meteorological data has been presented in the Additional file 4: Figure S10.

Phenotyping
Alpha lattice design with two replications was followed for all the experiments. Plots of two meter two rows with 30 cm row spacing were specified for each entry. Each trait under study was recorded separately for normal and late sown trials. Days to heading (DH) which is the period from sowing to the appearance of heads was noted when more than 50% of the plants of each genotype displayed heads at Zadoks stage 59 [34]. Grain filling duration (GFD) was recorded as the phase between heading and physiological maturity; the stage from sowing date to the time when green color of more than 50% of the spikes disappeared at Zadoks stage 89 [34], which is referred to as physiological maturity. After reaching physiological maturity, plant height (PH) in centimeters was measured with meter rule from the base of the plant up to the top of the spike excluding the awn length. Number of spikes per plants (SPP) was also estimated by taking the average of the number of spikes of five randomly sampled plants of each entry. Grain number per spike (GNS) was calculated, as it was the average of kernels in the main stem spikes of each of ten randomly selected plants from each entry. Grains of all spikes of ten randomly selected plants of each entry were bulked separately. Thousand kernel weight (TKW) was measured in grams by counting thousand grains randomly from each bulk using electrical weighing balance. Grain yield in kilograms for each entry was recorded after threshing whole 2 meter two rows plot.

GBS-SNP genotyping
All 125 lines were genotyped using genotyping-by-sequencing (GBS). DNA extraction kit from Prima Scientific (Bangkok, Thailand) was used to extract genomic DNA. GBS was performed following the Mascher et al. [35] protocol using two-enzyme (MspI-PstI) approach. GBS-SNPs were identified from sequence tags by aligning the sequence reads to the reference genome of Chinese Spring (CS) wheat. A pipeline of TASSEL 5 software was used for SNP calling against the whole wheat (CS) genome assembly (WGA 0.4, International Wheat Genome Sequencing Consortium). SNPs were called by setting minimum/maximum minor allele frequencies (MAF, 0.02 and 0.5) and minimum locus coverage was set at 0.2. Identified SNPs were named as "chromosome number_ physical position" i.e. 6D_13778811. The SNP markers with > 10% missing data and > 20% heterozygosity were not considered for further analysis. The genotypic panel of 55,954 SNP marker used in the study has average minor allele frequency 0.197; proportion of overall heterozygosity was 0.107 while the average missing proportion was 0.052.

Statistical analysis
By using the software Meta-R, best linear unbiased estimates (BLUEs) for three locations were calculated from restricted maximum likelihood (REML) analysis. The Table 5 Gene hits of marker-trait association (MTAs) regions identified with basic local alignment search tool (BLAST) and their annotation with high confidence by IWGSC assembly BLASTX (NCBI BLAST)