Genetic diversity, population structure and marker-trait associations for agronomic and grain traits in wild diploid wheat Triticum urartu

Background Wild diploid wheat, Triticum urartu (T. urartu) is the progenitor of bread wheat, and understanding its genetic diversity and genome function will provide considerable reference for dissecting genomic information of common wheat. Results In this study, we investigated the morphological and genetic diversity and population structure of 238 T. urartu accessions collected from different geographic regions. This collection had 19.37 alleles per SSR locus and its polymorphic information content (PIC) value was 0.76, and the PIC and Nei’s gene diversity (GD) of high-molecular-weight glutenin subunits (HMW-GSs) were 0.86 and 0.88, respectively. UPGMA clustering analysis indicated that the 238 T. urartu accessions could be classified into two subpopulations, of which Cluster I contained accessions from Eastern Mediterranean coast and those from Mesopotamia and Transcaucasia belonged to Cluster II. The wide range of genetic diversity along with the manageable number of accessions makes it one of the best collections for mining valuable genes based on marker-trait association. Significant associations were observed between simple sequence repeats (SSR) or HMW-GSs and six morphological traits: heading date (HD), plant height (PH), spike length (SPL), spikelet number per spike (SPLN), tiller angle (TA) and grain length (GL). Conclusions Our data demonstrated that SSRs and HMW-GSs were useful markers for identification of beneficial genes controlling important traits in T. urartu, and subsequently for their conservation and future utilization, which may be useful for genetic improvement of the cultivated hexaploid wheat. Electronic supplementary material The online version of this article (doi:10.1186/s12870-017-1058-7) contains supplementary material, which is available to authorized users.


Background
Bread wheat (Triticum aestivum L.) is one of the most important crops in the world, providing about 20% of the calories consumed globally (FAOSTAT 2011; http:// faostat.fao.org/). To meet the world's growing demand for food, it is urgent to develop high yielding varieties with good end-product making quality [1]. A better understanding of the genetic basis of yield and its components is a pre-requisite, though genomic research in bread wheat remains a major challenge because of the complexity associated with its hexaploid structure and huge genome size [2,3]. Wild diploid wheat, T. urartu (2n =2× =14; AA), the A-genome donor of cultivated tetraploid (2n =4× =28; genome AABB) and hexaploid wheat (2n =6× =42; AABBDD) [4], played an important role in the development and evolution of cultivated bread wheat. With the available reference genome [5], it is more feasible to exploit T. urartu as the reference sub-genome of common wheat, which will obviously provide considerable valuable information for the improvement of the latter.
Although T. urartu possesses the A genome in common with bread wheat, its genetic diversity has not been well investigated as Einkorn wheat (T. monococcum), another diploid progenitor [6]. Nowadays, the genetic diversity within wheat cultivars has been drastically reduced in the process of domestication and breeding [7,8], and it is essential to apply new contributing genes for wheat improvement. As a wild diploid progenitor of hexaploid wheat, T. urartu harbors rich allelic diversity for numerous important traits, including agronomic characteristics, grain quality and biotic stress tolerance [9][10][11]. Genetic variation of T. urartu has been investigated using various markers such as isozyme, restriction fragment length polymorphism (RFLP), amplified fragment length polymorphism (AFLP), and random amplified polymorphic DNA (RAPD) markers [12][13][14][15][16]. Such genetic diversities can be exploited to elucidate the genetic basis of natural variation of important quantitative traits. Hence, more accessions widespread should be employed to provide a more comprehensive on the characteristic of the whole population.
Most agronomic traits in common wheat, such as yield and its components, dough quality and resistant characters, are controlled by multiple genes and influenced substantially by the environment, which hinders the dissection of their genetic basis [17]. Classical linkage mapping based on bi-parental populations was a conventional approach to dissect the genetic bases of complex traits [18]. Various studies have identified a set of major effect quantitative trait loci (QTL) for important agronomic traits in wheat, such as kernel weight and dough quality [17,[19][20][21][22][23][24]. As an alternative way to QTL mapping, association mapping uses diverse material to associate genetic markers with a phenotype of interest, which presents higher mapping resolution of the phenotypes at a population level [25]. It has been exploited successfully to identify genomic regions contributing to numerous traits in diverse crops, such as maize [26,27], rice [28], sorghum [29] and soybean [30]. There is increasing interests in identifying novel marker-trait associations using association mapping in wheat [31,32]. For example, Breseghello and Sorrells [17] found significant associations between some simple sequence repeats (SSR) markers and wheat kernel traits, including weight, length and width of kernels.
The objectives of this study were to investigate the genetic diversity, population structure and relationships among a collection of 238 T. urartu accessions collected from the Fertile Crescent region and to identify marker loci associated with important agronomic and grain traits. Our results would provide further insights into the utility of association mapping for marker-assisted selection and its potential application in bread wheat breeding.

Plant material
A total of 238 T. urartu accessions, which covered most of the original areas, were subjected to SSR and highmolecular-weight glutenin subunits (HMW-GS) analysis with SDS-PAGE. This panel was obtained from the Institute of Genetics and Developmental Biology, Chinese Academy of Sciences (IGDB, CAS), the National Small Grain collection (USDA-ARS) and the International Center for Agricultural Research in the Dry Areas (ICARDA). Among these accessions, 84 were originated from Lebanon, 80 from Turkey, 37 from Syria, 12 from Armenia, 11 from Jordan, 11 from Iran, and three from Iraq ( Fig. 1; Additional file 1: Table S1).

Field experiment and phenotyping
The collected T. urartu accessions were planted at the experimental station of the Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing (40°5′56″N and 116°25′8″E), that of Henan Agricultural University, Zhengzhou (34°51′52″N, 113°35′45″E) and that of Dezhou Academy of Agricultural Sciences, Dezhou (37°45′69″N, 116°30′23″E) in two consecutive years Each field trial was managed in a completely randomized block design with two replications. All the plants were grown in a single 2-m row with 40 cm between rows and 20 cm between individuals. Nine traits were evaluated and analyzed, including heading date (HD), plant height (PH), spike length (SPL), spikelet number per spike (SPLN), tiller angle (TA), grain length (GL), grain width (GW), grain length/width ratio (GLW) and thousand-grain weight (TGW). The HD was counted as days from sowing to heading, and the date of heading was subsequently recorded when half of the spikes emerged from the flag leaf in each accession. The TA was measured between the last developed tillers and the ground level with a protractor at the maximum tillering stage, while the other agronomic traits were determined on the primary tiller. After the harvest, a minimum of 200 grains from each sample was photographed on a flat-bed scanner and the images were obtained as the aerial view of the ventral side of the grains. The GL, GW and GLW were calculated using the grain analyzer software (SC-G Scanner, Wanshen Detection Technology Inc., Hangzhou, China), and the TGW was measured by the average of two 1000 kernel-weights.

DNA isolation and nested-PCR amplification
Genomic DNA was isolated from young leaf tissue of two-week-old seedlings (one individual per accession) using the cetyl trimethyl ammonium bromide (CTAB) method [33]. DNA concentration was determined and diluted to a working solution of 50 ng/μL. Primer sequences of 62 SSR markers (Additional file 1: Table  S2) used in this experiment were obtained from the GrainGenes database (https://wheat.pw.usda.gov/cgibin/graingenes/browse.cgi?class=marker). The nested-PCR amplifications were performed using fluorescent dye labeling system according to Schuelke [34]. In brief, amplification reactions were performed in a final volume of 15 μL, containing 3 μL template DNA, 0.2 μL forward primer with the M13 tail at its 5′-end (2.0 μM), 1.0 μL M13 primer (labeled with 6-FAM, NED, VIC or PET), 1 Amplified PCR products were pooled and then purified with 3.0 M sodium acetate and 70% ethanol before adding HiDi-formamide. Fluorescently labeled DNA fragments were separated by capillary electrophoresis in an ABI 3730xlDNA Analyzer with GeneScan-500 LIZ size standard (Applied Biosystems, Foster City, CA, USA). SSR polymorphism was analyzed by Gen-eMapper software version 4.0 (Applied Biosystems, Foster City, CA, USA) according to the manufacturer's instructions.

Protein extraction and SDS-PAGE analysis
In each accession, the HMW-GS proteins were extracted from three seeds according to previous procedure [35], and SDS-PAGE was performed to fractionate the HMW-GSs using 10% (w/v) separating and 3% (w/v) stacking gels. Electrophoresis was conducted at a constant current of 18 mA at room temperature for about 8 h, and then the SDS-PAGE gels were stained overnight with Coomassie Brilliant Blue G-250 in a solution containing 20% (v/v) methanol and 10% (v/v) acetic acid. De-staining was carried out with tap water and then the gels were subjected to image capturing on a highresolution scanner (GE ImageScanner III). The identification of HMW-GSs and alleles were based on the methodology described by [36].

Genetic diversity and phylogenetic analysis
For 62 SSR polymorphic loci, the genetic parameters of variability were estimated via the POPGENE1.32 software [37], including number of observed (Ao) and expected (Ae) alleles per locus, observed (Ho) and expected (He) heterozygosity, Shannon's information index (I), Nei's Gene diversity (GD) and Polymorphism information content (PIC). The accuracy of these genotyping data was also manually checked for the scoring errors and null alleles using the Micro-Checker software [38]. All the parameters were computed for both the whole collection and the subsets considering their geographic origin and population structure category. Genetic distance and phylogenetic analyses of T. urartu accessions were conducted based on the Jaccard's coefficient similarity matrix obtained from the proportion of shared SSR fragments and HMW-GS bands, then a dendrogram was drawn with NTSYSpc 2.11 program [39] using UPGMA algorithm. The fitness of the dendrogram was assessed by bootstrap analysis running 1000 replications.

Population structure
The model-based (Bayesian) STRUCTURE version 2.3.1 was applied to identify clusters of genetically similar individuals on the basis of their genotypes [40]. The program was run five times independently for K value (number of subpopulations) ranging from 1 to 15, adopting the admixture model and correlated allele frequencies, with a burn-in period of 50,000 and the number of replications at 100,000. The normal logarithm of the probability was calculated against each K value, and the optimal number of subpopulations was determined using the ΔK approach described by Evanno et al. [41]. Then each T. urartu individuals could be assigned to the putative subpopulations according to their average membership coefficient, which was calibrated using CLUMPP [42,43]. A nested analysis of population structure with the same software and parameters was carried out to distinguish the next level of subpopulations.

Analysis of molecular variance (AMOVA)
Analysis of molecular variance (AMOVA) was performed to estimate the genetic variance within and among inferred populations in T. urartu accessions. Population differentiation was assessed by calculating pairwise Fst values and Nei's genetic distances for different regional population pairs. The threshold for statistical significance was determined by running 10,000 permutations. Principal coordinate analysis (PCoA) was also carried out based on binary genetic distance. The generated eigenvalues and accumulated percentage of the variation were applied to plot the scatter diagram of these representative accessions, with the first two principal coordinates which accounted for the highest variation. All the calculations mentioned above were implemented in the GenAlex 6.5 software [44].

Association analysis
The markers with minor allele frequency less than 5% were removed in order to reduce false positive associations. The pairwise kinship (K matrix) among samples was generated from the program SPAGeDi [45]. Linkage disequilibrium estimates for each pair of loci and marker-trait association analysis were conducted using TASSEL 2.1 software [46], via a general linear model (GLM) and a mixed linear model (MLM). In GLM model, population structure of the T. urartu mapping panel was included as fixed effects, while association was estimated by simultaneous accounting of multiple levels of population structure (Q matrix) and relative kinship among the individuals (K matrix) in MLM. The significant threshold for associations between loci and traits was set at P < 0.01. The Bonferroni correction of multiple testing was performed based on the q value using false discovery rate (FDR, α c = 0.05). For all associated markers, the average phenotypic effects of different alleles were estimated using the method proposed by Breseghello and Sorrells [17].

Phenotypic variation and correlations among traits
The phenotypic data of investigated traits across the six environments, including mean values, minimum and maximum values, standard deviations and the heritability estimates (h 2 ), were calculated ( Table 1). The data revealed a broad variation for all traits in these T. urartu accessions, e.g. PH had an average of 112.43 cm (minimum 86.10 cm and maximum 149.10 cm) with 12.23 cm standard deviation in E1, and TGW ranged from 4.29 to 18.87 g with 2.96 g standard deviation in E5. The phenotypic data for HD, PH, SPL, SPLN, GL, GW, GLW and TGW followed the normal distribution, suggesting that these traits were controlled by multiple loci. The broad-sense heritability (h 2 ) of all the traits was relatively high, ranging from 68.91% for TA (E2) to 94.05% for SPLN (E4), which confirmed that most of the phenotypic variance was genetically determined. Moreover, significant correlations coefficients among different environments were also detected, implying the less genotype x environment interactions of these traits (Additional file 1: Table S3). Depending on the collection site, all of the T. urartu accessions could be roughly split as Eastern Mediterranean coastal and Mesopotamia-Transcaucasia group, and the details of phenotypic performances illustrated that most of the traits investigated in this study differed greatly between the two major subsets (Additional file 1: Table S4). The correlation coefficient (r) analysis revealed that several traits were correlated (Additional file 1: Table  S5). The highest positive correlation was detected between TGW and GW (r ranging from 0.68 to 0.81 in different environments, significant at P < 0.01), followed by TGW versus GL (0.60-0.69, P < 0.01), indicating that grain weight was largely influenced by grain size in T. urartu. In addition, significant positive correlations of HD with PH, SPL and SPLN were observed in all environments, suggesting that late-heading varieties were prone to have a high statue and large spike with many spikelets. On the other hand, HD was also negatively correlated with GL and GLW, demonstrating that a T. urartu accession with late heading date almost followed with small grain length and low grain length/width ratio. Notably, TA had a significant positive correlation with PH and SPLN, which reflected the tendency for prostrate type to have a high plant and a large number of spikelets.

Genetic diversity revealed by SSR markers
To evaluate the genetic diversity of the T. urartu population, 62 SSR primers (loci) distributed on seven chromosomes ( Table 2) were selected to perform the nested-PCR amplifications and detected with fluorescent dye labeling system. In 238 T. urartu accessions, a total of 1201 alleles ranging from 4 (Xbarc138) to 42 (Xgwm136) were amplified, with an average of 19.37 alleles per locus (Ao). 881 rare alleles (73.36%) were detected with the frequency lower than 5%, but none was fixed with the frequency more than 90%, resulting in an average expected allele (Ae) of 7.29. This suggested that the higher genetic variations of alleles were present in T. urartu accessions. Consequently, the major allele frequencies varied from 0.13 (Xcfd15) to 0.86 (Xbarc206), with the overall mean of 0.32.
We observed 18 heterozygous loci for the SSR markers assayed particularly, with the observed heterozygosity (Ho) and expected heterozygosity (He) ranged between 0.04-0.72 and 0.24-0.85, respectively. As a result, a total of 1357 genotypes were deduced with an average of 21.89 per SSR marker. According to the polymorphic information content (PIC), 53 SSR loci (85.48%) were highly informative (PIC >0.5), eight (12.90%) were reasonably informative (0.5 > PIC >0.25) and only one (1.61%) was slightly informative (PIC <0.25). Calculation of the Nei's gene diversity (GD) for 62 loci demonstrated that Xcfa2134 preserved the highest GD (GD = 0.94) and Xbarc206 did the lowest (GD = 0.25), with the mean value of 0.80. Hardy-Weinberg equilibrium testing of these markers indicated that T. urartu population is not mating randomly, probably owing to the self-pollination in diploid wheat (Additional file 1: Table S6).
The panel of analyzed T. urartu accessions possessed high polymorphic information, covering most original areas. In order to explore and compare the variability inherent in genetic diversity, variability parameters were calculated in eight sample subsets ( Table 3). The number of alleles amplified in the Mesopotamia-Transcaucasia group was higher than that in the Eastern Mediterranean coastal group (16.38 versus 11.02, P < 0.01), resulting in a general decrease in GD from 0.76 to 0.64. Likewise, the Shannon's information indices (I) were counted as 1.97 versus 1.48 ( Table 3). Beyond that, the Mesopotamia-Transcaucasia group preserved more rare alleles (frequency < 5%) (498) than the Eastern Mediterranean coastal group (377). Our data demonstrated that the Mesopotamia-Transcaucasia group had much higher genetic diversity, and these regions might be the diversity center of T. urartu.
With respect to the geographic regions analyzed separately, the subpopulation of accessions collected from Turkey exhibited the highest diversity, followed by that from Northern Syria, Southwestern Syria, Lebanon, Iran, Jordan, Armenia and Iraq, due to their differences in number of alleles and genetic diversity (Table 3). GD in each population showed a similar trend ranging from

Genetic diversity revealed by HMW-GSs
The pattern of HMW-GSs from most of the T. urartu accessions is formed by two distinct electrophoretic moving zones, including one major 1Ax subunit zone with slower mobility and one major 1Ay subunit zone with faster mobility (Fig. 2). Among the 238 T. urartu accessions, the 1Ay subunit was not expressed in 69 accessions (28.99%), and only one (0.42%) was found silenced for the 1Ax subunit. All the 1Ax subunits in T. urartu showed faster electrophoretic mobility than the subunit 1Ax1 detected in bread wheat Xiaoyan 54 (XY), and four 1Ax subunits in 46 accessions displayed slower electrophoretic mobility than the 1Ax2 * present in bread wheat Cheyenne (CNN) (Fig. 2). In most T. urartu accessions, the electrophoretic mobility of the 1Ay subunits was faster than the 1Dy10 subunit of Cheyenne, except for one Armenian and two Turkish accessions (U17). A total of eleven 1Ax and eight 1Ay subunits were detected, resulting in 18 HMW-GS genotypes (U1-U18) ( Table 4). U1, U6 and U10 appeared exclusively in Turkish accessions, U14, U15 and U16 only in Lebanese accessions, U5 merely in southwestern Syria, and U12 and U18 were solely present in northern Syria. Concerning the frequencies of these HMW-GS genotypes, U7 was the most abundant (65 accessions, 27.31%), followed by U8 (42 accessions, 17.65%), U2 (35 accessions, 14.71%) and U14 (21 accessions, 8.82%). The remaining 14 patterns totally accounted for 31.51% of accessions, of which U1, U6 and U18 were each present in only one accession. When considering the HMW-GS locus as a co-dominant marker, its PIC and GD were 0.86 and 0.88, respectively, which were comparable with these of the SSR markers.

Genetic relationships among the accessions
The information about genetic variation determined from SSR data combining with the SDS-PAGE analysis of HMW-GS was employed to estimate similarity matrix value. Based on Jaccard's coefficient, an UPGMA dendrogram was constructed to reveal the genetic relationships (Fig. 3). The phylogenetic tree clearly assigned the 238 T. urartu accessions into two clusters: Cluster I mainly distributed in the east of Mediterranean coastal regions, including Lebanon, Jordan and southwestern Syria, and Cluster II tended to occur in the Mesopotamia and Transcaucasia regions, including Turkey, Iraq, Iran, Armenia and northern Syria. The genetic distance in the dendrogram revealed that Cluster II exhibited more diversity than Cluster I, which was consistent with its extensive geographic distribution. In Cluster I, Lebanese accessions (84) gathered tightly and were further distinguished from the other accessions, thus this cluster split into two major subclasses. Cluster II was also divided into two subclasses, one of which contained all the Iraqi and Iranian accessions (14), most of Turkish and northern Syrian accessions (84), and one Armenian accession, whereas the other one contained eight Turkish accessions, six northern Syrian accessions and eleven Armenian accessions. Typically, the sequenced accession, PI428198 (G1812) [5] grouped with most Turkish materials at the first subclass of Cluster II. The Mantel test revealed a high and significant cophenetic correlation (r = 0.92, P < 0.001), indicating a good fit between the dendrogram and its original similarity matrix. The range of similarity coefficients (0.03-0.97) showed abundant genetic variation in this collection, which is supported by the high means observed for the number of alleles per locus and the PIC values.
Principal coordinates analysis (PCoA) was also performed in order to assess the individual differences (Fig. 4). The first two axes accounted for 42.25 and 17.28% of the genetic variation, respectively, and occupied 59.53% in total. The first coordinate clearly discriminated Eastern Mediterranean coastal accessions from Mesopotamia-Transcaucasia accessions, while the second coordinate separated the two large clusters gradually into small groups due to the latitude variation. The PCoA data confirmed the UPGMA analysis.

Population structure
In order to understand the population stratification of T. urartu, the model-based Bayesian cluster analysis was tested using the software program STRUCTURE (Fig. 5). The data was analyzed by successively increasing the number of subpopulations (K) from two to fifteen, with five independent runs for each K value. The CLUMPP alignment of independent solutions showed the high posterior probability in assignment of accessions among runs. At K = 2, we detected a division between 112 eastern Mediterranean coastal accessions (Cluster I) and 126 Mesopotamia-Transcaucasia accessions (Cluster II). At K = 3, Cluster II split into 61 Turkish accessions and the remainders, while 58 Lebanese accessions were segregated from the rest of Cluster I when K = 4. With increasing K, minor subpopulations from different geographic regions such as Jordan, Iran, Iraq and Armenia could be separated gradually. The optimum number of subpopulations (K) was identified based on lnP (D) value and the delta K (ΔK) method suggested K = 2 as the best fitting one in our study (Additional file 2: Fig. S1). This structure-based data was mainly consistent with the genetic relationships of the traditional clusters.
Based on AMOVA analysis, most genetic variation was detected among individuals (60.04%), followed by variation among geographic regions (18.93%), variation within individuals (12.87%) and variation among the large two Clusters (8.16%). The overall Fst among the Clusters and geographic regions were 0.1790 and 0.2506, respectively (P < 0.05). The pairwise Fst value, interpreted as standardized population distances between two populations, ranged from 0.0658 (between Turkish and northern Syrian populations) to 0.4419 (between Jordanian and Armenian populations). The Nei's genetic distance data consisted with the Fst estimates, in which Turkish population showed the smallest genetic distance with northern Syrian population (0.0289), whereas the greatest genetic distance was observed between Jordanian and Armenian populations (0.7947) ( Table 5).

Linkage disequilibrium and marker-trait association analysis
The SSR and HWM-GS data were subjected to evaluate the linkage disequilibrium (LD) on a whole genome level. Across all the 63 loci, 1953 locus pairs were detected in the entire T. urartu collection, with 257 possible linked (in the same linkage groups) and 1696 unlinked locus pairs (from different linkage groups), respectively. Among these linked locus pairs, the pairwise r 2 values varied from 0.00 to 0.25 with a median of 0.04, and only nine (3.50%) marker pairs were at significant LD level (r 2 > 0.1 and P < 0.001), which suggested seldom LD among the analyzed loci.
The association analysis between markers and phenotypic data was carried out using GLM and MLM models in the software TASSEL. Among the 63 × 9 marker-trait comparisons, 67 significant associations (11.82%) were identified using GLM approach with Q-matrix. However, the MLM analysis reduced to 10 significant associations (1.76%) once considering K-matrix as co-variate. The Qmatrix (K = 2) inferred from STRUCTURE defined the ancestry coefficient of individuals in the population and the K-matrix was subjected to correct for their genetic relatedness. Besides, the averages of the phenotypic variations calculated by the two models for association mapping were 15.89% (Q) and 22.74% (Q + K), indicating that the MLM could explain more genetic variation than GLM. Henceforth, our attention will focus on associations incorporating both Q-matrix and K-matrix since they were more conservative and accurate. Considering the MLM statistics, significant markertrait associations for six traits were identified with six SSRs and protein marker HMW-GSs (HD: Xcfa2193-3A; PH: Xgwm328 Table 6). The Xcfa2193-3A on chromosome 3A was highly associated with HD in four environments, explaining 9.39 to 15.74% of the phenotypic variation, while Xgwm328-2A on chromosome 2A was closely associated with PH in all the six environments, accounting for 23.47 to 32.15% of the phenotypic variation. Xgwm328-2A was also simultaneously associated with SPL and SPLN in at least five environments, having the phenotypic variations of 32.43 to 37.90% and 28.42 to 38.77%, respectively. This multiple marker-trait association might be attributed to the pleiotropic effects of the genetic locus or the consequential relationship among these associated traits. Xgwm63-7A was significantly associated with SPL and SPLN in five environments, with the phenotypic variations of 20.67 to 27.80% and 19.24 to 23.93%, respectively, and Xbarc138-4A showed a stable association with SPLN in four environments, with the phenotypic variations of 14.43 to 18.18%. In particular, the HMW-GSs, encoded by Glu-1 locus on the long arm of chromosome 1A, were associated with TA in three environments, and it explained phenotypic variations ranging from 28.18 to 34.46%. As for grain traits, Xcfa2219-1A and Xgwm293-5A were associated with grain length (GL) in five environments, and could explain 15.99 to 20.33% and 23.55 to 28.23% of the phenotypic variation, respectively.
We further investigated the relationships between the genotype and haplotype with the phenotypic traits analyzed (Fig. 6). For HD, the 214-bp genotype of Xcfa2193-3A, which was observed in 10 Armenian accessions, exerted a positive effect on delaying the heading date, whereas the 202-bp genotype in 72 accessions was linked to medium heading date. Furthermore, the 200bp allele present at Xgwm328-2A locus in 11 accessions was strongly associated with high values of PH, which also correlated with the increase of SPL and SPLN. In contrast, the 208-bp allele was preferentially shared by genotypes in 29 accessions with low PH, SPL and SPLN. Similarly, 55 accessions carrying the 170-bp allele at Xgwm63-7A produced remarkably longer SPL and more SPLN than accessions with other alleles, and the 196-bp allele at Xbarc138-4A could also bring more SPLN in 17 accessions than others. At Glu-1 locus, the HMW-GS encoded U7 pattern was associated with the erect plant architecture (65 accessions), whereas the U14 pattern with the prostrate type (21 accessions). Regarding GL, the significant associations were attributed to the 224-bp allele at Xcfa2219-1A and the 205-bp allele at Xgwm293-5A being specific to genotypes with large kernel length.

Discussion
Nowadays, the genetic variability of wheat cultivars is decreasing as a consequence of the genetic erosion of cultivated hexaploid wheat. As a wild diploid progenitor of hexaploid wheat, T. urartu harbors rich allelic diversity for numerous important traits, including agronomic  characteristics, dough quality and biotic stress tolerance [9][10][11]. Thus, this wild species could be exploited in yield potential and quality improvement of bread wheat.

Morphological and genetic diversity
In recent years, SSRs markers have been proven to be an efficient tool for molecular and genetic studies in wheat [47][48][49]. Many SSRs covering the A genome of common wheat were employed in diploid wheat, due to their transferability among closely related species [50]. In this work, we observed a higher level of polymorphisms in T. urartu using SSR markers developed in common wheat. A total of 1201 alleles were identified from the 62 SSR loci in the 238 accessions, with an average of 19.37 alleles per locus, which is much higher than that observed in an earlier study (19.37 vs. 8.00) of 23 T. urartu accessions using 25 SSRs [51]. In addition, low genetic variation of T. urartu has also been reported with RFLP and RAPD markers [12][13][14]. This is the first time that such a high level of genetic diversity was characterized in T. urartu using SSR markers, probably because of a relatively large number of accessions collected in a wider geographic region. Moreover, compared to other wheat species, T. urartu possesses higher level of genetic diversity than the A genome of tetraploid and hexaploid wheat [52,53].
HMW-GSs are the major seed storage proteins that determine dough viscoelastic properties and breadmaking quality [54,55], which could also represent useful markers for assessment of genetic variability in wild diploid wheat collections [56,57]. In present study, a total of eleven 1Ax and eight 1Ay subunits were detected, resulting in 18 HMW-GS combinations in 238 accessions. Even though 1Ay subunit is totally inactive in common wheat [58], 71.01% T. urartu accessions were found to express the 1Ay subunit, which was consistent with previous reports [56,59,60]. The polymorphic information content (PIC) of our collections was 0.86, and the GD value was 0.88, which suggested that genetic variation revealed by HMW-GSs was comparable with that by SSR markers, probably because of the post-translational modification of protein markers.
The genetic diversity observed in this study is well reflected by the variation in multiple biological traits.  Genetic relationships and population structure T. urartu species are endemic to the major geographic regions of the Fertile Crescent [61,62]. In our study, the UPGMA dendrogram divided the diverse panel into two major subpopulations consistent with their geographic origin and ecological distribution (Fig. 1). The accessions from Eastern Mediterranean coast belonged to Clusters I, spreading from Lebanon to Jordan, of where the climate was described as poor rainfall and low temperature [63]. The levels of population differentiation in the Bekaa valley of Lebanon, southwestern Syria and the plains alongside Jordan River were low owing to high degrees of gene flow, occasional migration and crosspollination among these accessions [15]. The other possibility is that Lebanese, southwestern Syrian and Jordan populations most probably originated from the same ancestral population, which could be inferred from the rare alleles shared by multiple loci in this study. The rest of accessions from Mesopotamia and Transcaucasia belonged to Clusters II, including Turkey, Iraq, Iran, Armenia and northern Syria. Of them, most Turkish accessions collected along the east-west road from Nusaybin to Viransehir, were in the area of South Eastern Anatolian basin characterized as mild winter and warm-dry summer [64]. The Iranian samples, located in the Zagros mountain range with a similar climatic environment, were also included in the same branch of phylogenetic tree. Other places, such as Urfa, Gaziantep and Mus of Turkey, northern Syria and part of Iraq, had a continental cold and long winter with mild summer-dry climate [64], thus, these T. urartu accessions were grouped together. Specially, populations originated from Armenian, Nusaybin east of Turkey and Malikiyah of northern Syria exhibited high frequencies for the rare alleles such as 127-bp at Xgwm614-2A and 214-bp at Xcfa2193-3A, which separated the populations from other regions. The restricted distribution of these alleles indicated that variation at the associated loci possibly has some adaptive significance, which was supported by previous report [14]. On the whole, genetic differentiation among the accessions was also clearly demonstrated by the first and the second principal coordinates (Fig. 4).
In order to avoid distortion of the relationships among members and the spurious association mapping between phenotype and genotype, we examined population structure of the representative T. urartu collection used in this study. The result separated these accessions into Eastern Mediterranean coastal and Mesopotamia-Transcaucasia populations, generally in agreement with the phylogenetic tree and PCoA analysis. However, the loss of genetic diversity and decrease of population size were detected from Cluster II to Cluster I, since bottleneck and genetic drift may generate in the duration of natural selections. Given the subdivision, accessions from Turkey and northern Syria exhibited higher diversity than others, which had also been reported previously [15,51], and huge genetic variations were subsequently emerged among different originated populations. Our results provided inspirations for preservation and sampling of natural T. urartu populations in these regions.

Marker-trait associations
Agronomic traits such as PH, HD, SPL and TA play important roles in wheat life cycle and environment adaptability, which are closely associated with the yield potential [19,65]. Grain shape, as specified by GL, GW and GLW, is a crucial determinant of grain appearance quality and grain weight in wheat [66]. Therefore, these quantitative traits have drawn major attention in the process of wheat breading over the world. Compared to traditional QTL mapping, association mapping is less time-consuming as no segregating population needs to be developed and no segregating offspring needs to be grown [25]. Under such circumstances, a number of marker-trait associations have already been identified for significant meta-QTLs in bread wheat [67][68][69][70]. Nevertheless, a genome-wide association mapping study of agronomic traits based on elite diploid wheat germplasm is still lacking. In this study, six SSR and HMW-GS markers were detected to be highly associated with six agronomic and grain shape related traits in T. urartu ( Table 5). The Xcfa2193-3A on chromosome 3A associated with HD explained >35.41% of phenotypic variation in four environments, which had also been reported in common wheat [71]. Xgwm328-2A and Xgwm63-7A associated with SPL/SPLN in our study appeared to increase both SPL and SPLN in common wheat [72,73]. As for GL, we also detected a closely associated SSR marker Xcfa2219-1A, which had been reported to locate in an additive-effect QTL controlling GL in common wheat [74]. Xgwm293 on chromosome 5A was recently characterized associated with GL and TKW in a doubled haploid (DH) mapping population [75]. Moreover, Xbarc138-4A for SPLN was associated with yellow rust in ITMI-mapping population [76]. The data demonstrated that these genetic regions may be conservative between T. urartu and hexaploid wheat, and T. urartu could be explored in underlying gene characterization through mapping based cloning for its relative simple genome [5]. Interestingly, the marker of HMW-GSs showed a significant association with TA after the correction of FDR. In this case, the U7 pattern containing only one 1Ax subunit was associated with the erect type, whereas the U14 pattern containing both 1Ax and 1Ay subunits was associated with the prostrate type. However, the common wheat varieties rarely express 1Ay subunit and tend to develop an erect plant, this differentiation may be due to the evolution and domestication from T. urartu to hexaploid wheat.
Except the phenotypic traits mentioned above, the A genome of wheat is also known to contain QTLs or genes affecting other agronomic traits, heat and drought tolerance, pathogen resistance and so on [77][78][79]. The allelic variation between T. urartu and T. aestivum indicates the great potential for discovery and utilization of wild diploid relatives in wheat breeding. The associations determined in this study would be very useful for marker-assisted selection (MAS) in wheat breeding programs, although more effort is needed to validate these associations in other populations. Along with the progress and wide applications of comparative genomics approaches, further work to elucidate the genetic basis of complex traits will be accelerated.

Conclusion
Genetic diversity and population structure of 238 T. urartu accessions were analyzed through SSR and HMW-GS markers and their associations with phenotypes were detected. Six markers, associated with HD, PH, SPL, SPLN, TA and GL were determined, which should be beneficial to effectively exploit new genetic variations of the wild diploid T. urartu in yield and quality improvement of common wheat using MAS programs. Our results also provide further insights into conservation and future utilization of wild wheat resources.

Additional files
Additional file 1: Table S1. T. urartu accessions used in the present. Table S2. List of SSR primers used for genetic diversity and association analysis. Table S3. Correlations coefficients for the investigated traits of 238 T. urartu accessions in six environments. Table S4. Details of phenotypic performances and ANOVA analysis of differences between the Eastern Mediterranean coastal and Mesopotamia-Transcaucasia groups. Table S5. Correlation coefficients between the investigated traits in 238 T. urartu accessions in six environments. Table S6. Summary of Hardy-Weinberg equilibrium testing for SSR markers used in this study. (DOCX 75 kb)