Haplotype block analysis of an Argentinean hexaploid wheat collection and GWAS for yield components and adaptation

Background Increasing wheat (Triticum aestivum L.) production is required to feed a growing human population. In order to accomplish this task a deeper understanding of the genetic structure of cultivated wheats and the detection of genomic regions significantly associated with the regulation of important agronomic traits are necessary steps. To better understand the genetic basis and relationships of adaptation and yield related traits, we used a collection of 102 Argentinean hexaploid wheat cultivars genotyped with the 35k SNPs array, grown from two to six years in three different locations. Based on SNPs data and gene-related molecular markers, we performed a haplotype block characterization of the germplasm and a genome-wide association study (GWAS). Results The genetic structure of the collection revealed four subpopulations, reflecting the origin of the germplasm used by the main breeding programs in Argentina. The haplotype block characterization showed 1268 blocks of different sizes spread along the genome, including highly conserved regions like the 1BS chromosome arm where the 1BL/1RS wheat/rye translocation is located. Based on GWAS we identified ninety-seven chromosome regions associated with heading date, plant height, thousand grain weight, grain number per spike and fruiting efficiency at harvest (FEh). In particular FEh stands out as a promising trait to raise yield potential in Argentinean wheats; we detected fifteen haplotypes/markers associated with increased FEh values, eleven of which showed significant effects in all three evaluated locations. In the case of adaptation, the Ppd-D1 gene is consolidated as the main determinant of the life cycle of Argentinean wheat cultivars. Conclusion This work reveals the genetic structure of the Argentinean hexaploid wheat germplasm using a wide set of molecular markers anchored to the Ref Seq v1.0. Additionally GWAS detects chromosomal regions (haplotypes) associated with important yield and adaptation components that will allow improvement of these traits through marker-assisted selection.


Background
World population is projected to grow to nearly 10 billion by 2050 and more than 11 billion by 2100 and, on a global scale, agriculture expansion has slowed down and production increases have been achieved mainly through agricultural intensification [1]. Satisfying the increasing feed and food demand will mainly come from yield improvement: it will be required that wheat yield (and other staple crops) be increased by at least 50% in the next few decades [2], which will depend, among other components, on improving yield potential [3]. This scenario becomes more complex if we consider that each degree-Celsius increase in global mean temperature would, on average, reduce global and in particular Argentinean wheat yields by 6.0% [4,5]. Furthermore, crop management will need to be environmentally more sustainable in the future [1]. Under this restrictive context, yield genetic gain will be required to increase by 1.16-1.31% per year to satisfy the projected demand of cereals for food, feed and biofuels by 2050 [6]. Unfortunately, genetic gains reported in wheat from different countries for the last decades seem to have been increasing less than required [7][8][9].
In Argentina, with 4.46 Mha, wheat is the third most important crop in terms of planted area after soybean and corn, average 2012-2016 [10], spanning a wide range of environments. Since 1999 the genetic gain of local cultivars has shown signs of stagnation, as indicated by genetic gain values of only 0.18% per year, mostly explained by a stabilization in grain number per unit area without changes in grain weight. Modern cultivars have increased the number of grains per spike without changes in spike number per unit area; fruiting efficiency (a.k.a. spike fertility index) was the trait that better explained the changes in grain number per unit area, and its improvement might be a way to promote increments in grain number without penalization in grain weight [11][12][13].
Quantitative trait locus (QTL) mapping is currently key for understanding the genetic basis of complex traits [14], including yield components and adaptation. Cost-effective, high-throughput genotyping tools for genetic study, such as single nucleotide polymorphism (SNP) arrays, are now available for all major crop species, including wheat [15], enabling the characterisation of (i) genome-wide population diversity and structure; (ii) selective sweeps and directional selection; and (iii) marker-trait relationships in genome-wide association studies (GWAS) [16,17]. Nonetheless, in this latter context, certain limitations are found: (i) SNPs only provide biallelic information [18]; and (ii) associations are likely to be due to loci in linkage disequilibrium (LD) with a gene or a controlling sequence, rather than to underlying causal reasons [19]. To overcome the biallelic problem and increase the resolution of candidate regions, the analysis of haplotypes can be employed, i.e. of the combination of co-inherited markers from polymorphic sites within a certain chromosome region. Exploring haplotypes in this way can take full advantage of ancient recombination events in order to identify the genetic loci underlying traits at a relatively high resolution [20].
The recently released wheat reference genome assembly (IWGSC Ref. Seq. v1.0), developed by the International Wheat Genome Sequencing Consortium (IWGSC) [21], allows a more accurate comparison of the chromosome regions associated with agronomic traits from independent research studies and helps accelerate cloning of further important genes for wheat breeding.
In this study, 102 Argentinean wheat cultivars were genotyped by using the 35k Axiom Wheat Breeder's Genotyping Array [22]. The polymorphic SNPs were physically anchored to the wheat reference genome assembly [21] and the genetic structure of the collection was determined. Based on multi-environmental trial data, haplotype-based GWAS was conducted to identify chromosome regions affecting crop adaptation and the building of yields. Adaptation traits include heading date, a relevant trait since it will determine the environment to which the crop will be exposed during its reproductive stages, and plant height, which is the trait most associated with lodging in Argentinean cultivars [23]. Yieldrelated traits include thousand grain weight and number of grains per spike, currently called "numerical components of yield", which are the main factors proposed by [24] to analyze the differences in yield between cultivars. Additionally, [25] proposed considering the number of grains (GN) as the product between two variables currently known as "ecophysiological components": the spike dry weight at anthesis (SDWA) and the fruiting efficiency (FE, aka spike fertility index, ie the quotient between GNS and SDWA). Later [26] proposed the use of spike dry weight at harvest without grains (or chaff ) as a substitute to SDWA. Then, the fruiting efficiency is calculated in this work as the grain number produced per unit of chaff (FEh). The FEh has shown a high association with the GN in both Argentinean [13,[27][28][29] and foreign [30] cultivars and moderately high heritability [12,28,29]. GWAS provides valuable information for a better understanding of the major genetic components involved in adaptation and yield components for bread wheat, with major implications for breeding programs aimed at increasing yield potential of new cultivars, in order to contribute to food security.

SNP-based genetic structure of the Argentinean wheat collection
The genotyping of the 102 Argentinean hexaploid wheat cultivars collection yielded a total of 7972 polymorphic SNPs with MAF ≥ 10%. A set of 6486 polymorphic SNPs was genetically anchored to specific wheat chromosomes based on mapping information from five biparental populations described previously [22]. A second set of 909 additional SNPs were anchored to specific chromosomes using three local biparental populations and an additional 577 SNPs were anchored using the association method mentioned above (Additional file 1: Table S1). Of all the anchored SNPs, 3034 were on the A subgenome, 4010 on the B subgenome and the lowest number, 928 SNPs, on the D subgenome. To reveal the genetic relationships among the 102 Argentinean hexaploid wheat cultivars collection, we conducted a model-based approach using R STRUCTURE software [31]. A model-based approach is a cluster analysis that evaluates genetic similarity among genotypes without using prior information. For the analysis we used the 7972 SNPs anchored to the wheat genome described before. We detected four subpopulations in the collection reflecting the origin of the germplasm used by the main breeding programs in Argentina (Additional file 2: Table S2). A graphical representation of the STRUC-TURE Q matrix was included (Additional file 3: Figure  S1). Subpopulation 1 (Table 1) included only introduced cultivars of French origin (100%), with the Nidera, Syngenta and Sursem breeding companies exclusively represented in this group. Subpopulation 2 included most of the old cultivars (9 out of 12 released between 1930 and 1990), also known as traditional germplasm [32,33] and all cultivars from the Don Mario breeding company released up until 2010. Subpopulation 3 included cultivars released by the Klein breeding company (46.15%) and, in lower frequency, cultivars belonging to the ACA, Buck, INTA and Relmó breeding companies, most of them with pedigrees including CIMMYT germplasm. Finally, subpopulation 4 included cultivars released by INTA (33.33%), Buck, Klein, ACA and Relmó, with most of them also having pedigrees including CIMMYT germplasm.

SNP and gene marker-based haplotype blocks construction and characterization of the Argentinean wheat cultivar collection
A detailed information about the haplotype-based map construction for the Argentinean wheat collection is found in the Table 2. A total of 1268 haplotype blocks (HBs) were built in our wheat cultivars collection, of which 518 were on the A subgenome, 641 on the B subgenome and 109 on the D subgenome. The 94% of the SNP and gene markers were involved in these HBs and only 495 SNP/gene-related markers were not included in any HB. The average block sizes were significantly larger on the D subgenome chromosomes than on the A or B subgenomes, with observed averages of 13.5, 6.7 and 4.8 Mb, respectively ( Table 2). In general, within each chromosome, the largest HBs (> 30 Mb) were placed in the centromeric region, where recombination is lower, while the smallest HBs were located towards the telomeric regions, where the recombination rate and gene density are higher (Fig. 1a). The average distance between HBs on the D subgenome was 8 times larger than that for the B or A subgenomes, with an average of 16.1, 2.7 and 2.4 Mb, respectively. The maximum spacing between segments was also significantly higher in the D subgenome with respect to B and A, with average values of 106.6, 57.1 and 44.2 Mb, respectively ( Table 2). An interesting case can be described for chromosome 1 B, where we observed several large HBs on the short arm, from HB 9 to 20 (HB average size 24 Mb in the region). A possible explanation is the presence of the 1BL/1RS wheat/rye translocation which is maintained at high frequency in Argentinean cultivars. The lack of recombination events within the translocation generates a highly conserved block involving almost the entire short arm of chromosome 1B (Fig. 1b) (Table 2).

Broad sense heritability and correlations among traits
All evaluated traits showed high heritability: the crop adaptation traits HD and PH showed broad sense heri-   (Table 3).
Regarding correlations between traits across the tested locations, the most consistent pattern was observed for FEh vs GNS, with positive correlations in most years at the Southern locations, Azul and Balcarce, and in all years at the Northern location Marcos Juárez ( Table 4). The relationship between FEh vs TGW showed no significant correlations in all tested environments, except Azul 2014.
The PH showed a positive correlation with TGW in Azul (one year) and in all years in Marcos Juárez with higher significant values. On the other hand, the PH showed negative correlations with GNS in most years at the Southern locations, contrasting with Marcos Juárez with no significant correlation in most tested years. A similar pattern can be described for PH vs FEh, with negative correlations in most years at the Southern locations, contrasting with Marcos Juárez with no significant correlations in all years ( Table 4).
The HD showed a positive correlation with GNS in the Southern locations (2 years), contrasting with Marcos Juárez with negative correlations in three years. The HD also showed negative and highly significant correlations with TGW in Azul (1 year) and Marcos Juárez (2 years). A positive correlation between HD and FEh was observed in Azul (2 years), differing from Marcos Juárez with no significant correlations in most tested years. The HD vs PH showed contrasting correlations at the Southern locations and no significant correlations in most years in Marcos Juárez. The TGW vs GNS correlation showed positive values in Marcos Juárez (3 years), with no significant correlations in most Southern environments (Table 4).
Contrasting correlations observed across the Southern and Northern locations for PH vs GNS, PH vs FEh,

Haplotype-based gWAS analysis Fruiting efficiency at harvest (FEh):
We identified 17 haplotypes/markers associated with FEh of which one belongs to chromosome 1A, seven to 2A, one to 3B, two to 4A, two to 5A, three to 6A and one to 7A (Table 5; Additional file 6: Table S5). Eleven haplotypes associated with FEh were significant in at least one year in each of the three locations, i.e. Southern (Azul and Balcarce) and Northern (Marcos Juárez). Five haplotypes were significant exclusively in Southern locations and only the SNP AX-94874921 located on chromosome 2A (707.1Mb) showed significant association exclusively in Marcos Juárez.
An example of a haplotype with significant association with FEh across Southern and Northern locations is Chr5A-B43-Hap1, located on chromosome 5A (476.4-476.7 Mb) (Fig. 2a). The HB Chr5A-B43 is composed of seven SNPs, and the significantly associated haplotype Hap1.The Chr5A-B43-Hap1 was detected in 16 out of 102 evaluated cultivars (Fig. 2b). The presence of Chr5A-B43-Hap1 significantly increased FEh in the three tested locations, albeit with different intensities, with a stronger effect in Southern locations (Fig. 2c).
Three haplotypes significantly associated with FEh were detected as colocating with additional traits: haplotype  Table 5, and the cultivars with the haplotype for higher FEh also presented higher GNS. In the case of Chr5A-B33-Hap4, the correlation with PH was negative (Table 5) and cultivars with the haplotype for higher FEh values showed lower values for PH. Finally, for Chr6A-B24-Hap2, the correlation was also negative ( Table 5): cultivars with the haplotype for higher FEh values showed lower values for TGW.

Thousand grain weight (TGW):
We identified 11 haplotypes/markers associated with TGW, of which one belongs to chromosome 2D, two to 3A, one to 3B, four to 6A and three to 7B ( Table 5). The SNP AX-95257035 located on chromosome 3A (686.8 Mb) was significantly associated with TGW only in Marcos Juárez in four of five tested years. The remaining ten haplotypes/markers showed significant associations with TGW across Southern and Northern locations with, in general terms, higher significance values at the Northern location (Table 5). Two haplotypes significantly associated with TGW and colocating with other traits were detected. The Chr3B-B111-246 Hap4 on chromosome 3B (817.4-817.8 Mb) colocated with PH, with the previously described Chr6A-B24-247 Hap2 on chromosome 6A (205.1-233.3Mb) colocating with FEh. In the case of Chr3B-B111-Hap4, the correlation with PH was positive, as cultivars carrying the haplotype for higher TGW also expressed higher PH (Table 5).

grain number per spike (GNS):
We identified two haplotypes associated with GNS, one belonging to chromosome 2A and the second to 4A ( Table 5). The Chr2A-B49-Hap2 was significantly associated across all locations and Chr4A-B19-Hap5 was significantly associated in Azul and Marcos Juárez, though with no significant effect in Balcarce. The haplotype Chr2A-B49-Hap2 colocated with FEh and its effect was previously described (Table 5).

Heading date (HD):
A total of 16 haplotypes/markers showed significant associations with HD. The regions were distributed as follows: three on chromosome 1B, two on 2B, one on 2D, four on 3A, one on 3D, one on 5B, two on 6A, one on 6B and one on chromosome 7A. For this trait, it is important to highlight the consistent effect of the functional marker for the Ppd-D1 gene on chromosome 2D, which showed significant associations over the three tested locations during in all evaluated years. The highest associations were detected in Marcos Juárez with latitude 5°lower than Southern locations (Azul and Balcarce). Cultivars carrying the Ppd-D1 sensitive allele (42 of the 102 cultivars tested) delayed flowering in all environments. Also it is important to mention that all haplotypes affecting HD showed significant associations at the Northern location in two or more years, contrasting with Southern locations Azul, showing three haplotypes with no significant effects on HD, and Balcarce, with nine haplotypes in the same situation. Additionally, we identified one haplotype colocated with PH, Chr6B-B26-Hap3 (157.6-157.8Mb), that increased HD values and negatively affected PH (Table 5).

Plant height (PH):
In the haplotype-based GWAS, 51 chromosome regions were associated with this trait. Among these, one was located on chromosome 1A, three on 1B, three on 1D, two on 2A, eight on 2B, three on 3A, five on 3B, one on 4A, two on 4B, four on 5A, three on 5B, six on 6A, seven on 6B, one on 7A and two on chromosome 7B. Forty-    (Table 5).

Genetic structure and haplotype characterization
In this work, results of a model-based cluster analysis differentiated four subpopulations, whereas previously [32], using the same 102 cultivars and a small number of markers associated with genes of agronomic interest and neutral markers (SSRs and ISBPs), detected three subpopulations, which nonetheless and as expected overlapped with the four identified here. In general terms, subpopulations 1 and 2 in our study matched with subpopulations 1 and 3 of the previous work, in including introductions of European origin and traditional germplasm, respectively. Subpopulations 3 and 4 in our study matched subpopulation 2 of the previous work, all including in most cases germplasm with pedigrees of CIMMYT origin. Differences between subpopulations 3 and 4 may be related to the presence of the 1BL/1RS wheat-rye translocation, as 24/26 cultivars in subpopulation 3 carry this, whereas no cultivar is in a similar situation in subpopulation 4 (data not shown). This translocation has been widely used in breeding to achieve resistance to several pathogens and insects, to broaden adaptation and to increase yield [34].
Argentina is no exception, as modern cultivars carrying 1BL/1RS have shown: Klein Gladiador, Klein Nutria, Klein Yarará, LE 2333, LE 2341 and ACA 906, released between 2009 and 2010, confirm the consistent contribution of the translocation to biotic and abiotic stress tolerance. In spite of these advantages, special attention should be given to the use of 1BL/1RS, due to its detrimental effects on gluten strength and bread-making quality, albeit that they vary depending upon the genetic background [35]. Wheat has been exposed to intense artificial and natural selection since its domestication, resulting in large HBs as observed in elite germplasm [18,36]. This issue can be observed in our work with the just mentioned presence of the 1BL/1RS wheat-rye translocation (Fig. 1b). In any case, we must also take into account the position of the HB in the chromosome, as we see in Fig. 1a, where large haplotypes (> 30 Mb) are formed in centromeric and pericentromeric regions of each chromosome. As previously reported in the work of [37], these regions can be a challenge when introducing allelic variants of interest in the region and reducing the size of the HBs.

GWAS analysis Yield potential related traits:
Fruiting efficiency has been proposed as a promising trait for increasing yield potential in wheat [12, 13, 28, 38, Table 5). The strongest genetic associations for FEh were detected in the Southern locations Azul and Balcarce, with lower associations observed in Marcos Juárez. These results imply FEh could provide an avenue to increase yield in regions where yield potential is high, as in the case of the Southern locations in our study.
In a previous study, a positive correlation between FEh and grain number was detected [13,39]. Here, we detected a haplotype Chr2A-B49-Hap2 (704.8-705.8Mb) on chromosome 2A, that positively modifies both the traits FEh and GNS (Table 5). In a further study, [40] detected a SNP associated with FE and grains per spikelet (RFL_Contig3780_64), located at 676.24 Mb on chromosome 2A. They proposed CONSTANS 4 (CO4) and TaVrs1 genes as candidates to explain the variation detected. However, based upon the IWGSC Ref Seq. V1.0, CO4 is located at 594.58 Mb (81.66 Mb proximal to RFL_Contig3780_64). Our haplotype Chr2A-B49-Hap2 is located 28.56 Mb distal from RFL_Contig3780_64. The physical distance between the Chr2A-B49-Hap2 and CO4 is estimated as 110. 22 Mb, and, taking into account the critical value for the decay of linkage desequilibrium (LD) by 0.1 is estimated as an average of 50 Mb for the wheat genome [41], would discard CO4 as a candidate for the FEh QTL detected at Chr2A-B49-Hap2. On the other hand, the gene TaVrs1, also called GNI-A1, was recently cloned and characterized by [42]; unfortunately the GNI-A1 gene is not assembled on IWGSC Ref Seq. V1.0 chromosome 2A and its location cannot be compared to the Chr2A-B49-Hap2 haplotype.
Little is known about the relationship between FEh and PH. We detected a haplotype on chromosome 5A, Chr5A-B33-Hap4 (445.2 Mb) that associates with both traits, but in opposite directions.
Another relevant trait in the determination of yield potential is grain weight. We detected six significant haplotypes/markers with positive effects and five with negative effects on TGW (Table 5). This list includes four haplotypes on chromosome 6A associated with TGW, one of them Chr6A-B24-Hap2 (205. ). This haplotype, negatively associated with TGW and positively associated with FEh, was located 4.5Mb proximal to the gene TaGW2-A1, which is located at 237.75Mb on chromosome 6A and has been shown to be related to grain weight [43]. Non-functional mutants for this gene increase the weight and size of wheat grains [44]. Shared function and physical location would turn TaGW2-A1 into a promising candidate gene for Chr6A-B24-Hap2. On the other hand, the gene TaGS5-3A located at 176.55Mb on chromosome 3A has also been associated with larger grain size and higher thousand grain weight [45]. We found one haplotype and one SNP associated with TGW on chromosome 3A (Table 5), but they were located too far away from TaGS5-3A (123 and 510Mb apart, respectively) to be considered as a gene candidate. In the opposite direction to that observed for FEh, the strongest genetic associations for TGW were detected in the Northern location Marcos Juárez, promoting TGW as an interesting trait to increase yield for those latitudes.
For GNS, [46] used GWAS to detect a region located at 691.22Mb on chromosome 2A that significantly affects the number of grains per spike. We detected the haplotype Chr2A-B49-Hap2, located at 704. 8-705.8 Mb, that positively affects GNS, only 13 Mb distal from the [46] region.

Crop adaptation related traits:
For HD, we found that the Ppd-D1 gene marker [47] is the strongest associated with the trait, especially in the location Marcos Juárez, situated at the lowest latitude in our study. These results agree with those obtained by [33], where the Ppd-D1 gene was found to be the main determinant of life cycle in Argentinean wheat cultivars.
On chromosome 1B, we detected three haplotypes associated with HD. [48] proposed the gene TaFT-B3 (581.4Mb) as a HD modifier for short days on this chromosome. Our nearest significant haplotype, Chr1B-B76-Hap2 (Table 5), is located 61.7 Mb distal from TaFT-B3, being slightly farther than the critical value of 0.1 for LD decay estimated as 50 Mb [41].
On the other hand, [49] showed that loss of function mutants of the PHYTOCLOCK 1 (WPCL1) gene, located at 740.1Mb on chromosome 3A, were associated with extra-early flowering time. We detected four haplotypes on chromosome 3A associated with HD. Our nearest significant haplotype, Chr3A-B-Hap4 (Table 5), is located 39.5Mb proximal to WPCL1 positioning WPCL1 as candidate for the Chr3A-B-Hap4 HD association.
Finally, no significant associations were detected for HD with the genes Vrn-A1 (5A) [50], Vrn-B1 (5B), Vrn-D1 (5D) [51] and Ppd-B1 (2B) [52], indicating that this set of genes did not appear to affect heading time in our cultivar collection. The most likely explanation is that may come from the field conditions where the collection was evaluated, with satisfied vernalization requirements, or potentially high epistatic interactions among vernalization genes, causing difficulties in the detection of minor effects on flowering time.
In the case of PH, we did not find significant associations between the trait and the molecular markers for the "green revolution" dwarfism genes Rht-B1 and Rht-D1, described by [53]. Although the collection presented a wide variation in PH, it should be mentioned that the wheat collection used in our study is mainly composed of semi-dwarf elite germplasm and that the Rht-B1 and Rht-D1 genes are balanced.
Within the 51 significant associations detected, the chromosome 1B haplotype Chr1B-B17-Hap4 and the chromosome 6B SNP AX-94943227 (234.8 Mb) were particularly noteworthy, since they were significant across the three locations in all analyzed years. To our knowledge, there are no plant height related genes described on these chromosomes, which suggest these consistent PH-HB/SNP associations are promising targets for further gene cloning projects.
We detected four haplotypes on chromosome 5A associated with PH. In a mapping work [54] located the GAresponsive Rht genes Rht9 and Rht12 on chromosome arm 5AL, where Rht9 was linked to the SSR barc151 (558.34 Mb) and Rht12 was located 5.4cM from the SSR Xgwm291 (698.19 Mb). Our haplotype Chr5A-B54-Hap3 mapped at 516.1-527.9 Mb might be associated with Rht9 linked to SSR barc151 at 558.34Mb. However, such an association with Rht12 has to be discarded in our study, since additional PH haplotypes are placed in genetic regions distant to this PH gene.
On chromosome 6A, we detected six haplotypes associated with PH; in particular, the haplotype Chr6A-B54-Hap5 (610.0 Mb) was significantly associated with PH in the three locations and in all evaluation years, except Balcarce 2015. However, none of the associated haplotypes was located in physical positions close to any of the known PH genes on 6A, namely Rht25 (144.0-148.3 Mb) detected by [55] and Rht18/Rht14/Rht24(413.73 Mb) described previously [56].
It is important to highlight that small association panels have been shown to increase both type 1 and type 2 error rates, failing to detect true associations while also generating a higher rate of false positive associations [57]. In this work, we used a small association panel (102 cultivars), but we used a conservative approach for GWAS, in order to reduce spurious associations. However, we may be failing to detect genomic regions that have low rates of explanation of phenotypic values or are found in low frequency in the panel. It is recommended to validate the associations reported in independent populations before being used in wheat breeding programs.

Conclusion
The genetic structure of the Argentinean hexaploid wheat collection was determined by the use of SNP markers, in which a strong relationship was displayed between the subpopulations obtained and the germplasm origin used by the main breeding programs in the country. Based on SNP and gene-related markers physically anchored to the IWGSC Ref Seq v1.0, a haplotype map was constructed, allowing the detection of highly conserved and selected regions like the 1BL/1RS translocation. A GWAS detected ninety-seven chromosome region associated with yield components and adaptation. In the case of yield components, we highlight regions on chromosomes 1A, 2A, 3B, 4A, 5A, 6A and 7A associated with FEh, particularly at higher yield potential locations. For adaptation, the most relevant effect on HD was the Ppd-D1 gene on chromosome 2D, being the main determinant in the variations in the life cycle of the Argentinean wheats. The use of the IWGSC Ref Seq v1.0 allowed us to precisely compare all detected associated regions with genes and QTLs reported in previous studies.

Plant material
A previously described [32] set of 102 bread wheat cultivars that includes old (such as cv. Barletta 77 released in 1927) and recent (up to 2010) commercial cultivars were selected from the main wheat breeding companies in Argentina and used for the haplotype block construction and GWAS analysis . Seed stocks were kindly provided by the Instituto Nacional de Tecnología Agropecuaria (INTA) Experimental Station Marcos Juárez, Wheat Germplasm Bank (Marcos Juárez, Argentina).

SNP-based haplotype construction
The SNP-based haplotype structure for each wheat chromosome was evaluated using the Haploview 4.2 software package [70]. The package defines haplotype blocks (HB) and provides the number of haplotypes and their physical length (bp) for each block, as well as the number of tagged SNPs based on the solid spine of linkage disequilibrium (LD) (Extend spine if D' > 0.8). This means that the first and the last marker in a block are in strong LD with the intermediate markers that are not necessarily in LD with each other [18,70]. The HBs from Haploview were converted to a table by using an in-house python script available on GitHub 2 . The script transforms the table file results into a suitable input for the 'Genome Association and Prediction Integrated Tool' (GAPIT) software [71]. Detailed information about the haplotype map constructed is given in Additional file 4: Table S3.  More detailed information of the field experiments carried out for GWAS is described in Table 6. The collection was evaluated for two crop adaptation traits: heading date (HD) and plant height (PH), and three yield component related traits: fruiting efficiency at harvest (aka spike fertility index; FEh), grain number per spike (GNS) and thousand grain weight (TGW). The HD (in days) was measured from emergence until 50% of the spike had emerged from the flag leaf [72]. The PH (in cm) was determined after maturity as the mean of 10 to 20 randomly selected plants (according to experiment) by measuring the main tiller of each plant from the ground to the top of the spike, excluding awns. At maturity, 5 to 15 random spikes (according to experiment), were sampled. They were cut at the lowest spikelet level, weighed and threshed. Spike chaff dry weight (in g) was calculated as the difference between total spike dry weight before threshing and total grain weight. The FEh (in grains g-1) was calculated at harvest as the quotient between grain number and spike chaff dry weight per sample according to [26]. The GNS was measured as the mean of the total number of grains in the selected spikes. The TGW (in g) was determined by weighing 1000 grains. In MsJz, the phenotypic data was collected using Phenobook software [73]. All the phenotypic data used for GWAS is given in Additional file 5: Table S4.