Phenotypic evaluation of grain morphology using digital imaging
Seed shape and size are among the most important agronomic traits because of their significant effects on grain weight, milling yield, and market price. Manual measurement limits the number of data points, the quality of the measurements, and the variety of shape data that can be collected. By contrast, computational methods based on digital imaging (DI) can automatically measure robust size descriptors (grain length, width, perimeter and area) and elliptic Fourier descriptors (EFDs), which capture shape variation such as roughness, asymmetric skewing, or other two-dimensional aspects not encompassed by the axes or by differences in overall object area. Only a few studies based on DI analysis of seed size and shape in wheat are available [15, 16, 19, 26, 29]. Among these, Gegas et al., Williams et al., and Williams and Sorrells treated shape variation as a target trait influencing grain size and weight, and their results are comparable to ours. In this study, low correlations between the major grain dimensions and EFDs indicate that each phenotyping method captured different aspects of grain morphology, which could likely be selected independently. Because EFDs were more highly correlated with TKW and grain length than with other traits, kernel shape would be the preferred selection criterion for increasing length and TKW. The correlations between EFDs and TKW suggest that EFDs relate the uniformity and smoothness of the kernel to grain weight, because roughness or shriveling would be expected to reduce the ratio of internal volume to surface area of the kernel. EFDs recorded from kernels imaged on end (vertical images in this study) can also characterize variation in the depth or angle of the crease of a wheat seed, which affects the volume to surface area relationship of a grain.
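As an illustration of the size descriptors named above, the following minimal sketch derives length, width, perimeter and area from a closed grain outline. It is not the imaging software used in this study: the function names, the ideal-ellipse test outline and its semi-axes are hypothetical, and the outline is assumed to be already rotated so that its long axis lies along x.

```python
import math

def contour_area(points):
    """Area enclosed by a closed contour, via the shoelace formula."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the contour
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0

def contour_perimeter(points):
    """Sum of straight-line distances between consecutive contour points."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))

def axis_extents(points):
    """Length and width as x and y extents (assumes the long axis lies along x)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return max(xs) - min(xs), max(ys) - min(ys)

def ellipse_contour(semi_major, semi_minor, n_points=360):
    """Hypothetical test outline: an ideal ellipse sampled at n_points angles."""
    return [
        (semi_major * math.cos(2 * math.pi * k / n_points),
         semi_minor * math.sin(2 * math.pi * k / n_points))
        for k in range(n_points)
    ]

# Made-up semi-axes (in mm), used only to exercise the functions.
outline = ellipse_contour(3.5, 1.6)
length, width = axis_extents(outline)
area = contour_area(outline)
perimeter = contour_perimeter(outline)
```

A real grain outline deviates from this ideal ellipse, and it is that deviation which shape descriptors such as EFDs are meant to capture.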
A large number of significant correlations were observed among the remaining size and shape traits (Table 1). For ease of understanding, we discuss only the important relationships that give new insight into the complex composition of grain size and shape components. In a nutshell: i) grain roundness had a significant negative relationship with grain length, indicating that the two traits influence grain weight independently; ii) horizontal and vertical deviations from the optimal ellipse were positively correlated with grain length and width, respectively, indicating that deviation from the ellipse enhances grain length and width, and ultimately TKW; iii) grain length and width had a weak but significant positive correlation, indicating that some SHWs may have both wider and longer grains, which may lead to above-average TKW. Co-localized QTLs influencing both grain length and width are therefore expected, and are discussed below. However, grain width had a stronger positive effect on TKW than grain length. Although previous studies reported moderate correlations between grain weight, length, and width (r = 0.51–0.68 and r = 0.21–0.75), our results agree with Lee et al., who reported a strong correlation (r = 0.83) between kernel weight and size. Studies have shown that kernel weight is positively correlated with grain yield and kernel growth rate; however, Xiao et al. found TKW to be less correlated with grain yield in 1B.1R × non 1B.1R crosses across environments.
Previous reports have described grain size and shape as independent traits in primitive and improved wheat germplasm, similar to the results obtained in this study using the D genome synthetic hexaploids. However, the significant reduction of phenotypic variation in grain shape in the breeding germplasm pool is probably the result of a relatively recent evolutionary and domestication bottleneck. Consequently, the phenotypic variability offered by SHWs may fill this gap, making them a good choice of germplasm for improving the grain weight of wheat and hence grain yield.
The association of grain size and shape descriptors with TKW was further resolved by path coefficient analysis, which depicted the phenotypic model with more precision. This revealed that grain thickness had the largest direct effect on grain weight, followed by VArea, whereas HArea had a relatively small direct effect. Some principal components, such as HPC1, HPC2, VPC3, and VPC4, had direct negative effects on grain weight, and the loci controlling them should undergo negative selection to obtain genotypes with superior grain weight. The efficiency of indirect selection depends on the correlation between the selected trait and the target trait as well as on the heritability of the selected trait. Gegas et al. confirmed that kernel size and shape were largely independent traits in a study of six wheat populations. The results showed that the phenotypic correlations among these traits were caused by closely linked genes or genes with pleiotropic effects.
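The decomposition behind path coefficient analysis can be sketched as follows. In the standard formulation, the direct effects p solve R p = r, where R holds the correlations among the predictor traits and r their correlations with the target trait; the indirect effect of trait i through trait j is R[i][j] * p[j]. The code below is a minimal sketch under that formulation, not the analysis pipeline of this study, and the two-trait correlation values are made up for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def path_effects(r_xx, r_xy):
    """Return direct effects and the matrix of indirect effects."""
    direct = solve_linear(r_xx, r_xy)
    n = len(direct)
    indirect = [
        [r_xx[i][j] * direct[j] if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    return direct, indirect

# Hypothetical correlations: two predictor traits and one target trait.
r_xx = [[1.0, 0.5],
        [0.5, 1.0]]
r_xy = [0.8, 0.6]
direct, indirect = path_effects(r_xx, r_xy)
```

For each trait, the direct effect plus the sum of its indirect effects reproduces its correlation with the target, which is why a trait can show a strong correlation with grain weight while having only a small direct effect.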
Genetic diversity and population structure in synthetic hexaploid wheats
Genetic diversity within Ae. tauschii and synthetic hexaploids has been studied using several marker systems. Recently, Sohail et al. analyzed diversity using 4,449 polymorphic DArT markers and found that Ae. tauschii ssp. strangulata, the origin of the D genome of bread wheat, contains only a limited part of the whole diversity of Ae. tauschii. Thus, SHWs produced by crossing tetraploid wheat with any subspecies of Ae. tauschii include an untapped amount of genetic variation in which genes useful for bread wheat breeding should be present. Our results indicate that five subgroups were appropriate for delineating the population structure within the SHWs used in this study. The assignment of the SHWs to the five subgroups was largely in agreement with their Ae. tauschii parent and less so with the durum parent. Recently, Mulki et al. studied a wide array of synthetic hexaploids and found that seven subgroups were appropriate for delineating the population structure. The minor difference from our results may be attributed to the higher number of accessions used in that study. The frequency of Ae. tauschii accessions among the SHWs varied from one to a maximum of five, while that of the durum elite lines ranged from 1 to 45, an indication of the complexity of the crosses. It has been suggested that the STRUCTURE algorithm does not converge to an optimal K when complex genetic structures exist, such as strong relatedness within some germplasm.
Linkage disequilibrium patterns
Linkage disequilibrium is influenced by recombination rate, allele frequency, population structure and selection. In this study, LD generally decreased with increasing genetic distance, with very strong LD between pairs of loci observed at genetic distances of up to 9 cM, suggestive of LD maintained by genetic linkage. Our results are consistent with those of previous studies in wheat. In a similar study using a subset of 91 SHWs, Emebiri et al. reported that the general trend was high LD up to 15 cM and a decline thereafter. LD was estimated to extend to about 10 cM among 43 United States bread wheat elite varieties and breeding lines. Crossa et al. reported that some LD blocks extended up to 87 cM in a set of 170 bread wheat breeding lines. Breseghello and Sorrells suggested that LD may differ among populations and may need to be evaluated for each population on a case-by-case basis. Nevertheless, it is important to characterize the extent of LD in germplasm in order to study its genetic diversity. Overall, the observed LD was low in the SHWs, and only ~10.2% of the marker pairs reached the r² threshold of 0.2 in the collection (N = 13453 marker pairs). Generally, self-fertilization leads to more extensive LD because of severely reduced effective recombination. The lower LD values observed in the SHWs are in concordance with those previously reported by Chao et al. using the SNP marker system. They reported that CIMMYT wheat populations had the lowest LD among completely linked loci and the slowest rate of LD decay, possibly as a consequence of intensive use of synthetic wheat lines. Synthetic wheats and their derivatives have greatly increased genetic diversity in hexaploid wheat, particularly in the D genome. A similar case is observed in these SHWs, where the unusual patterns of LD, the rate of LD decay and the lower pairwise r² values are attributed to the genomic constitution of the germplasm. It is well known that the introduction of new haplotypes from a divergent population can increase the extent of LD.
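The pairwise r² statistic and the share of marker pairs reaching a threshold can be illustrated with a short sketch. It assumes presence/absence (1/0) marker scores on inbred lines, so that r² = D² / (pA(1 − pA) pB(1 − pB)) with D = pAB − pA pB; the marker names and scores are invented, and this is not the software used to produce the estimates reported here.

```python
from itertools import combinations

def ld_r2(marker_a, marker_b):
    """r^2 between two 0/1-scored markers; None marks a missing score."""
    pairs = [(a, b) for a, b in zip(marker_a, marker_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        return None
    p_a = sum(a for a, _ in pairs) / n
    p_b = sum(b for _, b in pairs) / n
    p_ab = sum(1 for a, b in pairs if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None  # a monomorphic marker carries no LD information
    d = p_ab - p_a * p_b
    return d * d / denom

def fraction_reaching(markers, threshold=0.2):
    """Share of informative marker pairs whose r^2 reaches the threshold."""
    values = []
    for name_a, name_b in combinations(sorted(markers), 2):
        r2 = ld_r2(markers[name_a], markers[name_b])
        if r2 is not None:
            values.append(r2)
    if not values:
        return None
    return sum(1 for v in values if v >= threshold) / len(values)

# Hypothetical scores for three markers on four lines.
markers = {
    "marker_1": [1, 1, 0, 0],
    "marker_2": [1, 1, 0, 0],
    "marker_3": [1, 0, 1, 0],
}
share = fraction_reaching(markers, threshold=0.2)
```

In this toy data set only the first two markers are in LD, so one of the three pairs reaches the threshold.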
Marker-trait associations and co-linearity with identified QTLs
The MTAs identified in this study can be categorized as those affecting (1) individual dimensions of the seed and TKW, (2) multiple dimensions of the seed (meaning a single QTL that affects more than one dimension of the seed, such as length and width simultaneously), and (3) individual dimensions of the seed but not TKW. This study is the first report of association mapping for grain size and shape that employs quantitative photometric measurements. In total, 38 MTAs for grain length, width, thickness, and TKW are the most important because of their immediate effect on grain yield. Co-linearity of MTAs for different traits was observed on chromosomes 1A, 2B, 3A, 3D, and 5B, indicating that these regions are stable. A region on chromosome 2B from 51 to 69.9 cM harbors 31 MTAs for important grain phenotypes, which is strong evidence for the presence of functional genes affecting grain phenotype within this interval. Previously, three meta-QTLs were identified on chromosome 2B, but none co-localizes with this region. TaSus-B1 on chromosome 2B lies close to some of the MTAs identified in this study (Table 4). Co-linearity with important genes and QTLs also revealed the presence of Ppd-B1, Q.Yld.crc-2B and QTkw.sfr-2B in the proximity of this region. Selection based on these DArT markers can identify SHWs carrying better grain size and shape phenotypes, which can be exploited for wheat genetic improvement.
Previously, only two association mapping studies focusing solely on mapping grain weight QTLs were available [22, 31]. Neither study used DArT markers; hence, it is difficult to align and compare the QTLs they detected with ours. However, we were able to identify five loci in the proximity of QTLs identified by Williams and Sorrells using consensus DArT map information. These QTLs include Q56 (FFD, grain thickness, grain area) on chromosome 3D, Q09 (grain length) on chromosome 2A, Q17 (grain thickness, width) on chromosome 1A, Q30 (TKW) on chromosome 5B and Q42 (grain width) on chromosome 2B. QTLs affecting seed size have previously been identified across all chromosomes of wheat, with varying degrees of effect for individual QTLs [16, 26–28, 30], and many of them lie within the same regions identified in the present study (Table 3).
It is expected that the 26 MTAs on chromosomes 2D, 3D, 5D, 6D, and 7D may carry novel allelic variability for the measured traits. Horizontal area and FFD were associated with markers in the same genetic region (104 cM) on chromosome 2D, indicating the relative importance of this region for grain size and weight. Zhang et al. identified a meta-QTL related to grain weight within the same region. Chromosome 3D appeared to have two genomic regions associated with TKW, grain thickness, width, volume and VArea. Additionally, the known wheat grain weight gene TaCKX-D1 was associated with VArea, grain volume and VDFE, indicating that this locus may contribute to grain weight by enhancing the vertical area of the grain. Similarly, five very important traits (FFD, grain width, thickness, TKW and VArea) were clustered on the distal portion of chromosome 3D. Several QTLs related to grain weight have been identified on chromosome 3D and are reported in the literature.
Haplotype analysis of other known grain weight genes, TaCwi-2A, TaSus-6B, TaGW-2B and TaSAP-A1, also revealed their potential role in enhancing grain weight through different photometric measurements of grain size and shape. MTAs solely for TKW, grain length and width were identified on chromosomes 1A, 2B, 3A, 3B, 3D, 4A, 5B, 7A and 7B. Several QTLs were previously reported for kernel width and length on different chromosomes; for example, Campbell et al. reported QTLs on chromosomes 1A, 2A, 2B, 2DL, and 3DL. Breseghello and Sorrells reported QTLs on 1B, 2D, and 5B. Sun et al. reported QTLs on chromosomes 4A and 6A that were absent in this study. Xiao et al. identified a cluster of QTLs for grain length, width and weight on chromosome 6D, which was also absent in this study. The likely reason for the absence of these QTLs is the very different genetic background of SHWs, which have A and B genomes from durum wheats and the D genome from wild accessions of Ae. tauschii. Therefore, several of the QTLs identified here are likely to be novel additions to existing information, and where they are co-linear with existing QTLs, the SHWs may carry new alleles.
Quantitative analyses of the photometric data revealed that grain size and shape are largely independent traits. This is unlikely to be the result of artificial selection during breeding, since size and shape are also independent variables in primitive wheat. At the developmental level, this phenomenon may reflect differential modulation of growth (or growth arrest) along the main axes of the grain at different developmental stages. The notion that certain developmental constraints during grain growth could lead to morphological changes is further corroborated by recent studies on grain size and shape genes in rice [58, 59]. The GS3 locus was found to have major effects on grain length and weight and smaller effects on grain width, and the longer grains can be attributed to relaxed constraints during grain elongation. The GW2 gene was shown to alter grain width and weight and, to a lesser extent, grain length, owing to changes in the width of the spikelet hull. Similarly, the SW5 gene has been reported to affect grain width by modulating the size of the outer glume.
The results of our study demonstrate the value of genome-wide association mapping for identifying MTAs for grain size, shape and weight using genetic resources such as the SHWs. Given the diversity of MTAs identified, SHWs possessing potentially novel alleles at different genomic regions could be used as parents in a marker-assisted backcrossing scheme to develop genotypes with higher grain weight, and hence higher yield, in elite wheat backgrounds. For potentially new loci associated with grain phenotype, the development of appropriate genetic stocks using bi-parental populations, backcross families, near-isogenic lines and physical and chemical mutagenesis would enable the importance of these loci in enhancing grain weight to be delineated. Most DArT marker clones have been sequenced, and this information is available in the public domain; it can help geneticists convert DArT markers into STS markers, which would facilitate the incorporation of favorable loci into elite wheat germplasm.
Relationship between number of favorable alleles and grain phenotypes
One of the relative advantages of AM is the validation of favored alleles in natural germplasm collections [22, 40]. Zhang et al. found that allele Xgwm130 underwent very strong positive selection during modern breeding. Xgwm130 maps between Xgwm295 and Xgwm1002, at a genetic distance of 1.1 cM from Xgwm295. Similar results were obtained for the TaSus-B1 gene for TKW, where most of the Chinese wheat germplasm carried the favorable allele, indicating high selection pressure. Thus, the identification of favored alleles will help in choosing parents for crossing programs, to ensure maximum levels of favored alleles across sets of loci targeted for selection, and to promote fixation at these loci. Although the linear correlations between the major grain phenotypes (TKW, grain length, width and thickness) and the number of favored alleles indicate additive effects of QTLs or genes, the possibility of other genetic effects should not be discounted. However, the power to detect allelic effects is reduced when the number of germplasm lines is very small (Figure 6b,c).
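The relationship between the number of favored alleles and a grain phenotype can be examined with a simple count-and-correlate calculation, sketched below. The marker names, favored-allele assignments, genotypes and TKW values are all hypothetical and serve only to show the computation; they are not data from this study.

```python
import math

def count_favored(genotype, favored):
    """Number of loci at which a line carries the favored allele."""
    return sum(1 for marker, allele in favored.items()
               if genotype.get(marker) == allele)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical favored allele (1 = present, 0 = absent) at three loci.
favored = {"locus_1": 1, "locus_2": 1, "locus_3": 0}

# Hypothetical genotypes and TKW values (g) for four lines.
lines = [
    ({"locus_1": 0, "locus_2": 0, "locus_3": 1}, 38.0),
    ({"locus_1": 1, "locus_2": 0, "locus_3": 1}, 41.0),
    ({"locus_1": 1, "locus_2": 1, "locus_3": 1}, 43.0),
    ({"locus_1": 1, "locus_2": 1, "locus_3": 0}, 47.0),
]

counts = [count_favored(genotype, favored) for genotype, _ in lines]
tkw = [value for _, value in lines]
r = pearson_r(counts, tkw)
```

A correlation close to 1 in such a calculation is what a purely additive model predicts; with only a handful of lines per allele-count class, however, the estimate is unstable, which is the loss of power noted above.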
One interesting phenomenon in wheat is that genes or markers associated with yield vary across latitudes, such as TaSus2 on chromosome 2B , TaGW2 on chromosome 6A  and gpw7596 (EST-SSR) on chromosome 7B . Favored alleles usually occur at relatively lower latitudes. This might indicate that the functional genes at these loci, including mapped alleles and those linked with markers, might be responsive to sunlight and temperature during the growing season [58, 65]. Recently, Jones et al.  devised a strategy to exploit Ae. tauschii diversity for wheat improvement in relation to climatic and environmental conditions of a specific geography. This informed and rational strategy can be applied to SHWs by identifying the Ae. tauschii accessions in the pedigree of SHWs lines with desirable characteristics. This will enhance the breeding values of SHWs and breeders will be able to offer novel diversity tailored to the environment in any regional breeding program. Nevertheless, current results are encouraging and wider options are available to exploit SHWs to enhance grain yield.
Functional analysis of trait associated DArTs and draft genome sequence of Ae. tauschii
DArT markers have been widely used in many plant species, including wheat. For many years they were used as anonymous markers; however, the availability of DArT marker sequences has made them a useful tool for co-linearity studies, fine mapping of loci of interest, and identification of candidate genes in association mapping. The in silico identification of the putative function of DArT loci associated with grain phenotype is a step towards exploiting these loci for practical wheat improvement. Nevertheless, many of the DArT sequences submitted to BLAST similarity searches returned no hits or, in some cases, identified genes of unknown function. In other cases, however, the results are encouraging. The low to medium rate of positive BLAST results in this study is in agreement with Tinker et al., where only 40% of the DArT sequences showed significant BLAST similarities to genes in public databases. The rate was somewhat higher for wheat DArT sequences, 64% of which matched genes in public databases. In the present study, about 75% of the sequences displayed significant BLAST similarities and 32% of the sequences were fully annotated. The cluster of DArT marker sequences on chromosome 2B translated into genes with valid biological functions, which may be important candidates for future studies. Similarly, some grain shape parameters (such as VPC3) have a negative effect on grain traits, and down-regulation of the predicted biological function of the associated DArT sequences (wPt-2533, wPt-3389, etc.) may be the proper interpretation of these results. Overall, knowledge of the functional meaning of these widespread markers will provide a very useful tool for identifying candidate genes for the traits under investigation.
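A percentage of sequences with significant BLAST similarities, such as those quoted above, can be computed from tabular BLAST output as in the following sketch. It assumes the standard 12-column tabular layout (query ID in the first column, E-value in the eleventh); the E-value cutoff of 1e-10, the query IDs and the subject IDs are assumptions made for illustration, since the significance threshold is not stated in this section.

```python
def significant_fraction(query_ids, blast_tabular, max_evalue=1e-10):
    """Fraction of queries with at least one hit at or below max_evalue."""
    significant = set()
    for line in blast_tabular.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id = fields[0]
        evalue = float(fields[10])  # eleventh column holds the E-value
        if evalue <= max_evalue:
            significant.add(query_id)
    hits = significant & set(query_ids)
    return len(hits) / len(query_ids)

def row(query_id, subject_id, evalue):
    """Build one hypothetical 12-column tabular hit line."""
    return "\t".join([query_id, subject_id, "98.0", "200", "4", "0",
                      "1", "200", "1", "200", evalue, "350"])

# Hypothetical query and subject identifiers; query_4 has no hits at all.
queries = ["query_1", "query_2", "query_3", "query_4"]
report = "\n".join([
    row("query_1", "gene_A", "1e-30"),
    row("query_2", "gene_B", "0.5"),
    row("query_3", "gene_C", "2e-12"),
    row("query_3", "gene_D", "3.0"),
])
fraction = significant_fraction(queries, report)
```

Because the fraction depends directly on the chosen cutoff and on the database searched, percentages from different studies are only roughly comparable.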
The strategy for the functional analysis of D-genome specific DArTs was slightly different and ultimately yielded more powerful results. DArT sequences were used as BLAST queries against the draft genome sequence of Ae. tauschii to locate the scaffolds carrying those sequences and to identify the genes within those scaffolds. This also identified the position of each scaffold on the chromosome, based on the genetic map provided in the supplementary information of Jia et al. The candidate regions within scaffolds were explored for flanking genes, and almost all queries gave positive results. A summary of the results and the genes present in the flanking sequences is given in Table 5. The strong association of markers wPt-8463, wPt-0485, wPt-2923, and wPt-8164 with several grain phenotype parameters, together with the presence of important genes with valid biological functions, makes them priority candidates for fine mapping and subsequent cloning of the genes responsible for enhancing grain size and weight. The same applies to other D-genome specific DArT sequences. Overall, this approach proved to be very useful for targeting sequences that might be orthologous to genes in other cereals. Marone et al. used a similar approach to identify genomic regions containing the NBS-LRR domain superfamily, which encodes tolerance to biotic stresses in plants, while more than 61 DArT sequences showed significant similarity to gene sequences in the public databases of model species such as Brachypodium and rice. Similarly, DArT markers associated with insect pest resistance were searched against different bioinformatics databases to assign putative functions to the matching sequences. Webster et al. used the specific WECPDF domain within the cell wall invertase gene (IVR1) as a query to search for its homologues in the wheat genome survey sequence database and found five potential isoforms on multiple chromosomes.
In conclusion, this approach proved to be very useful and may serve as a template for gene cloning and further deployment in wheat breeding.