Genetic dissection of quantitative and qualitative traits using a minimum set of barley Recombinant Chromosome Substitution Lines

Background Exploring the natural occurring genetic variation of the wild barley genepool has become a major target of barley crop breeding programmes aiming to increase crop productivity and sustainability in global climate change scenarios. However this diversity remains unexploited and effective approaches are required to investigate the benefits that unadapted genomes could bring to crop improved resilience. In the present study, a set of Recombinant Chromosome Substitution Lines (RCSLs) derived from an elite barley cultivar ‘Harrington’ as the recurrent parent, and a wild barley accession from the Fertile Crescent ‘Caesarea 26–24’, as the donor parent (Matus et al. Genome 46:1010–23, 2003) have been utilised in field and controlled conditions to examine the contribution of wild barley genome as a source of novel allelic variation for the cultivated barley genepool. Methods Twenty-eight RCSLs which were selected to represent the entire genome of the wild barley accession, were genotyped using the 9 K iSelect SNP markers (Comadran et al. Nat Genet 44:1388–92, 2012) and phenotyped for a range of morphological, developmental and agronomic traits in 2 years using a rain-out shelter with four replicates and three water treatments. Data were analysed for marker traits associations using a mixed model approach. Results We identified lines that differ significantly from the elite parent for both qualitative and quantitative traits across growing seasons and water regimes. The detailed genotypic characterisation of the lines for over 1800 polymorphic SNP markers and the design of a mixed model analysis identified chromosomal regions associated with yield related traits where the wild barley allele had a positive response increasing grain weight and size. In addition, variation for qualitative characters, such as the presence of cuticle waxes on the developing spikes, was associated with the wild barley introgressions. Despite the coarse location of the QTLs, interesting candidate genes for the major marker-trait associations were identified using the recently released barley genome assembly. Conclusion This study has highlighted the role of exotic germplasm to contribute novel allelic variation by using an optimised experimental approach focused on an exotic genetic library. The results obtained constitute a step forward to the development of more tolerant and resilient varieties. Electronic supplementary material The online version of this article (10.1186/s12870-018-1527-7) contains supplementary material, which is available to authorized users.


Background
Breeding programmes have successfully increased yield and quality traits of major cereals including barley [1,2]. However, the domestication and breeding history of this ancient crop has drastically reduced the genetic diversity and current improvements in crop performance and yield have stagnated and reached a plateau [3]. Recently there has been renewed interest in wild germplasm collections as sources of beneficial alleles to mitigate the effects of reduced diversity and to ensure the future efficiency of agriculture under a changing climate [4][5][6].
Barley has a broad geographical distribution and because of its long history has adapted to a wide range of different environments [7,8]. Over 460,000 accessions of Hordeum are preserved in gene banks worldwide, second only to rice (773,948) and wheat (856,168) (FAO 2010) and provide an enormous potential resource for allele mining. There have been many descriptive and comparative studies examining diversity in wild and landrace barleys, but few examples utilising the allelic diversity have been reported [3]. Importantly, crosses between cultivated and wild barleys are fertile and several research groups have developed experimental genetic stocks. These include introgression lines of chromosomal segments from the wild barley genome into domesticated barley [9][10][11][12][13] and multi-parental nested populations with wild donor genomes [14][15][16]. Several of these populations have been used to introgress favourable alleles from wild barley into breeding populations for resistance to various diseases and to improve agronomic traits under post-anthesis drought. For example, disease resistance [17,18] and increased performance under drought and saline conditions [19,20] have been associated with genetic polymorphisms for which wild alleles contribute favourable genotypic variation for plant growth and yield under stress. In addition, the introgression of novel exotic allelic variants has been found to constitutively improve economically important traits related with the malting characteristics of the crop [21] as well as key yield components such as grain weight and seeds per spike [9,22,23].
The utilisation of high-throughput genotyping platforms in some of these studies together with field trial data over several seasons and/or sites facilitated targeting gene-based molecular markers associated with interesting traits that could potentially be deployed in crop breeding programmes. Recently, Saade et al. [20] and Cu et al. [21] were able to identify diagnostic SNP markers for which wild barley allelic variants were associated with significant increases in crop yield performance in saline environments and malting quality traits such as α-amylase activity and fermentability, respectively. However, the identification and utilisation of novel exotic alleles for improving barley commercially still remains limited in relation to other cereal crops since its contribution is mostly restricted to experimental populations [4,24]. Nonetheless, in other cereal crops such as rice and wheat, [24,25], the use of wild relatives' genomes has proved crucial in the release of new varieties with improved tolerance to biotic and abiotic threats.
Therefore, optimising the identification of exotic alleles for barley is still a major undertaking for most crop improvement programmes. The recent advances in genomic technologies and sequencing have resulted in the identification of millions of single nucleotide polymorphisms (SNPs) [6,26], and an almost complete genome sequence [27] provides the necessary means to identify candidate genes underlying important quantitatively inherited traits. In this context, traditional introgression breeding schemes through marker-assisted backcrossing and selection need to be reconsidered in the contribution of genetic gains to the crop. Several rounds of backcrossed generations to reduce the effect of undesirable alleles and phenotyping of hundreds (in biparental crosses) or over 1000 (in multiparental crosses) lines in experimental trials can make these studies time consuming and labour intensive [4,28,29]. However, this approach offers the opportunity to test novel allele combinations derived from wide crosses and the detection of rare alleles associated with important traits that would not be detected in large collections of germplasm [30]. Furthermore, the establishment of unique homozygous lines simplifies the genetic dissection of complex traits since these can be measured across locations and/or growing seasons in a genetically stable germplasm [31].
The selection of minimum sets of introgression lines representing the entire genome of the wild donor parent has been suggested as an approach for simplifying the exploration of novel exotic allelic variation while overcoming the trade-offs with using large numbers of recombinant lines [32]. In barley, subsets of introgression lines have been defined for the main AB-populations derived from biparental crosses, which cover most if not all the donor genotype and these have been used to identify QTLs by simple statistical approaches, facilitating the rapid screening of genetic variation for traits requiring detailed phenotypic evaluations. For example, Schnaithmann and Pillen [33] and Naz et al. [34] identified QTLs for Nitrogen Use Efficiency (NUE) and root morphological traits in subsets of 28 and 72 ingrogression lines (BC 3 S 4:6 ) respectively.
For the present study, we used the Recombinant Chromosome Substitution Lines (RCSLs) developed by Matus et al. [9]. This was one of the first AB-populations developed using barley as a model crop and several studies have shown the potential of the lines for exploring novel allelic combinations for malting traits as well as for drought resistance and agronomic traits [9,[35][36][37]. Considering these previous studies, our main goal was to effectively decipher the genetics underlying the variation for qualitative and quantitative traits using a minimum group of lines representing the genome of the exotic wild barley accession,  in the genetic background of the elite North American spring malting barley, cv. Harrington, used as recurrent parent of the population. By using a suitable field experimental approach, we aimed to assess the effect of the donor genome in the performance of the elite parent. Morphological, developmental and agronomic aspects of the crop were evaluated aiming to determine the potential of the wild genome to contribute constitutive improvements to the crop as well as to its adaptability in the conditions of our experiment. Additionally, we attempted to identify chromosome regions explaining the phenotypic variation encountered by the design of a suitable marker-trait association analysis for this minimum group of lines. Finally, we pursued the identification of candidate genes underlying the qualitative and quantitative variation of traits using the newest and most complete reference sequence barley map.

Genotypic characterisation of 28 RCSLs
The subset of 28 RCSLs, cv. Harrington and Caesarea 26-24 were successfully genotyped using the 7000 SNP-markers from the 9 K Infinium iSelect SNP platform [38]. The resulting genetic map consisted of 1848 SNP markers with a total length of 988.8 cM, with map positions determined using the Morex × Barke RILs population [38]. On average, the percentage of wild barley genome introgressed in the RCSLs corresponded to 13.4% but ranged from as low as 3.7% (OSU053) to 26.6% (OSU033) corresponding genetically from a few centiMorgans to almost half a chromosome ( Fig. 1) (Additional file 1: Table S1).
In order to identify genomic regions associated with a range of phenotypic traits, the RCSLs were sown under a 'rainout' shelter specifically designed to deliver different amounts of water under the same experimental conditions. Over two seasons and contrasting water regimes 14 quantitative traits defining crop performance and 5 qualitative traits were scored (Table 1). Considerable variation was observed for all the measured traits, which were consistent across growing seasons and treatments, particularly for the qualitative traits. The effect of the water treatment and growing season significantly affected quantitative phenotypic scores, shaping the response of the lines for important morphological and agronomic traits as revealed by the Genotype by Treatment (GxT) and Genotype by Year (GxY) interaction effects ( Table 2).

Phenotypic variation and genetic dissection of qualitative traits
Five traits were measured visually, including glossy spike and colour, seed shattering, grain threshability and lodging, all of which are common characteristics of wild barley. The last three of these are undesirable traits for crop production and were observed in a few of the lines, OSU12 still shattered, OSU15 awns remained during threshing and OSU65 was prone to lodging. Interestingly, glossy spike, which may play a role in regulating water loss, was present in 5 of the RCSLs (OSU012, OSU047, OSU060, OSU074, and OSU090) and purple coloration of the spike was observed in OSU012, OSU065, and OSU144. For both spike characters we were able, looking at the overlapping exotic chromosome regions shared across groups of lines showing the same phenotype, to identify candidate regions for both traits. An association with a locus (GLS1) on top of chromosome 1H (0 to 3.2 cM), flanked by 12_30969 and 12_30715 was identified for glossy spike. From the pseudomolecules [27] we were able to physically define a 584,212 bp size region with 20 annotated genes. The annotation and descriptors of the genes in and around the locus were assessed as candidates potentially associated with the wax biosynthesis pathway. Three interesting candidates were identified based on its predicted function: an "esterase/lipase/thioesterase family protein" (HORVU1Hr 1G000320.14) an "O-acyltransferase (WSD1-like) family protein" (HORVU1Hr1G000150.13) and a "Fatty acyl-CoA reductase 1" (HORVU1Hr1G000190.3). The last two were located at 551,441 bp and 453,531 bp from the position of the marker 12_30969. Similarly, a region associated with the purple coloration of the grain was defined on chromosome 2H (86 cM to 97.8 cM). The physical position of the flanking markers (SCRI_RS_116694 and SCRI_RS_153793) determined a 6.5 Mbp region with 215 associated transcripts. A "basic helix-loop-helix (bHLH) DNA-binding superfamily protein" (HORVU2Hr1G095 870.1) and a "Myc-like anthocyanin regulatory protein" (HORVU2Hr1G096810.9) were identified as interesting candidate genes. Finally, chromosome regions represented by single RCSLs were found associated with known domestication traits such as the brittle spike (3H, 23.9 cM to 35.1 cM) and awn retention or reduced grain threshability (1H, 95.9 cM to 100.9 cM).

Phenotypic variation and genetic dissection of quantitative traits Morphological and developmental traits
The water treatment had a marked effect on plant growth. Overall, plants under drought were shorter than under irrigation. Considering collar height (COL) as total plant height, water stress reduced average plant height by 12.6 cm in 2013 and 10.1 cm in 2014 as compared with non-stress treatment (Table 2a). In this case the effect of the year was not significant, and the mean values reached were very similar across growing seasons. In regard to the length of the peduncle (PdL) and spike (EAR), these were significantly shorter in 2014 than in 2013. In addition, the water stress reduced peduncle length considerably both years, particularly in 2013 where the difference between non-stressed and stressed was 6.4 cm (3.1 cm in 2014). Furthermore, the spikes were significantly shorter under drought in 2013, with a slight difference of 0.6 cm as compared with the irrigated plots, but similar across water treatments in 2014. In general, the RCSLs were considerably taller than Harrington and showed longer peduncles and spikes.
Additionally, crop development varied across the trials (Table 2b). The 2014 growing season was significantly shorter than in 2013: plants reached heading 5 days earlier on average within a period of 10 days (18 days in 2013).       (Table 2c). The partially irrigated treatment was considered as the control condition to measure the effect of drought in reference to a non-stressed water regime. These plots reached consistently greater yields across seasons providing a more appropriate potential yield value for the experimental conditions. An excess of moisture in the soil profile of fully irrigated plots appeared to be sub-optimal for crop growth and yield, particularly in 2014 growing season (with fully irrigated plots showing yield values in the range of the stressed plots (DY (kg/ha) of 1288.6-4234.3 compared to 1151.8-4294.2 for drought (Table 2c)). The greater accumulated rainfall and rainy days prior to the beginning of the experiment in 2014 could have had an effect on the increased groundwater moisture registered regularly throughout the growing season, particularly at the greater depth in the soil profile (i.e. 40 cm, Additional file 2: Figure S1 & Additional file 3: Figure S2)(Additional file 4: Table S2). For yield, the mean values for the RCSLs tended to be smaller than those for cv. Harrington in all water treatments (Table 2c)  water stressed conditions (Table 2c). In contrast, the mean value for Thousand Grain Weight (TGW) was greater in the RCSLs compared to cv. Harrington in the three water treatments, with maximum weights of 55.9 g and 56.1 g under water stressed in 2013 and 2014 respectively (48.4 g and 49.6 g in cv. Harrington) (Table 2c).

GE interaction and stability of agronomic traits
Because we identified significant variation for dry yield and TGW in response to the environmental conditions (growing season and water treatment), we used an AMMI analysis to interpret our findings further. For dry yield, the environment was the main source of variation explaining 44.1% of the model sum of squares, whereas genotype accounted for 17.6%. Conversely, the weight of the genotype on the sum of squares for TGW (32.8%) was larger than that for environments (29.3%). In addition, the GE interaction for dry yield was minor (9.3%) were overall not significant (P = 0.0682) as compared to the GE for TGW (10.4%, P < 0.001). Yet, the significant interaction with IPCA1 (P < 0.001) for dry yield, suggests slight differences in the genotypes specific adaptation for the trait. Nevertheless, the variation in the genotypes relative performance in different environments was more noticeable for grain weight than for dry yield (Table 3). These results are illustrated by the AMMI2 biplots which reflect the variation across genotypes in the GE interaction and the relationships between the environments (data not shown). However, we focused on the results of the AMMI1 biplots for dry yield and TGW for the identification of RCSLs with improved performance and stability in relation to the elite variety cv Harrington (Fig. 2). Harrington showed good yield performance and relatively stable for the conditions of the experiment, however its mean values for grain weight were lower and less stable than those for the majority of the RCSLs. Genotypes such as OSU047, OSU053 and OSU060 not only had genotype main performances in the range of Harrington but were broadly adapted to the environments tested showing more overall stability in TGW (IPCA1 close to zero).

QTL identification
A total of 177 stable marker-trait associations across experiments and treatments were identified for 14 quantitative traits (P < 0.05) using the mixed model association analysis that tested the effect of SNP polymorphisms on the performance of quantitative traits (Table 4). Eighty-five loci-trait associations were defined for the marker main effect (major QTLs) and 80 on its interaction with treatment (minor QTLs). On average, chromosome regions affecting the phenotype spanned 14.5 cM with the main effect peak QTL genomic region of 5.9 cM. The exotic genome was found to increase the traits' mean performance in 55.3% of the loci associations compared to the 44.7% where the phenotypic mean was reduced. The wild alleles were associated with increases in the estimated means of morphological traits such as collar height and peduncle length whereas agronomic traits such as dry yield and harvest index showed diminished performance (Table 2; Additional file 6: Table S4).

Plant developmental and morphological traits
The most numerous marker-trait associations were found for collar height (20 loci) with a generalised increased of the phenotypic means associated with the wild alleles. The QTLs COL10 (3H), COL14 (5H) and COL16 (6H) were identified as the most significant (P < 0.001) associations for the marker main effect (Additional file 6: Table S4). Each of these three loci explained a large proportion of the phenotypic variance increasing collar height by 27.1 cm (R 2 = 30.5%), 14.5 cm (R 2 = 31.2%) and 21.5 cm (R 2 = 38.1%), respectively. Many of these associations were found to co-locate with major QTLs for closely related traits such as peduncle length and ear length. In contrast, the exotic alleles decreased plant height at 7 associated QTLs, the greatest observed at COL20 on chromosome 7H (10.73 cm shorter, R 2 = 11.3%). For heading date, 12 loci on 6 of the 7 chromosomes (none on 6H) were significantly associated with the trait    Table S4). The exotic alleles contributed to reduce the estimated time to reach heading at seven of the marker-traits associations identified. The strongest associations (P < 0.001) for the trait were found on chromosomes 2H (HEA2, R 2 = 51.1%) and 7H (HEA11, R 2 = 46.8%) where the exotic alleles decreased the estimated time to reach heading by six days. At five marker-trait associations the exotic alleles were found to delay heading date, HEA4 on chromosome 2H delayed by 2.1 days (R 2 = 10.8%). Similarly, the exotic genome was found to decrease the number of tillers at most of the QTLs defined for the trait. The largest effects were associated with TILL2 (2H, P < 0.001) and TILL8 (6H, P < 0.01) with 20.9 and 34.0 fewer tillers as compared to the elite barley alleles estimated effect. The latter was found concomitant with the chromosome region affecting plant height (COL16) and ear length (EAR9) on 6H.

Agronomic traits
Numerous marker-trait associations were found for agronomic traits including dry yield, biomass, harvest index and grains per m 2 for which the wild barley alleles generally diminished the crop mean performance. For example, the exotic genome was found to reduce dry yield by up to 16.4% of the estimated yield for the elite alleles at DY3 on chromosome 2H (R 2 = 23.7%) (Additional file 6: Table S4). A positive effect of the wild genome was only detected for minor QTLs such as DY4 (3H, R 2 = 4.3%) and DY9 (4H, R 2 = 6.8%), associated with yield increases of 16.3 and 11.6% under full irrigation respectively. Additionally, exotic alleles were found to slightly improve yield (2.7%) under drought conditions at DY8 (R 2 = 3%) loci on 4H, however a 9.2% reduction under full irrigation was also associated with the locus. Similar results were obtained for chromosome regions associated with harvest index where maximum decrease on trait performance (from 42.6 to 35.5%) was associated with the mean effect of wild barley alleles at HI5 on 2H (R 2 = 44.2%). This QTL collocated with DY3 on 2H (106.4 to 119.8 cM) (Additional file 6: Table S4). In contrast, most QTLs defined for TGW were associated with positive effects of the exotic genome on the trait mean performance (14 out of 16 QTLs, 8 out of 9 positive associations at the marker main effect level). The wild alleles were found to increase grain weight up to 7.9% (i.e. 3.6 g) of the trait mean performance at TGW13 (5H) and up to 10.1% (i.e. 4.6 g) under partial irrigation at the minor association at TGW3 (2H) (Additional file 6: Table S4). Interestingly, some of the major associations found for the trait were concomitant with QTLs for seed parameters such as seed width. For example, an increase in 4.3% (i.e. 0.15 mm) of seed width associated to the wild genome was found for SdW11 in the same chromosome region of 5H for TGW13. Similar observations were made on chromosomes 3H for SdW5 and TGW8 (3.9 and 5.9% coincident increase on seed width and weight) and on chromosome 4H for SdW7 and TGW10 (2.9 and 4.9% increases respectively).

Pleiotropic effects of major developmental loci
Groups of QTLs were found to cluster at several genomic locations. Major associations for heading date and plant height were generally showing the strongest effect at these chromosome regions. For example, HEA2 on chromosome 2H (R 2 = 51.1%, P < 0.001) was found to collocate with a major QTLs for biomass (BY1) as well as for collar height (COL3), peduncle length (PdL2) and seed parameters (SdL2 and SdA2) in the marker-treatment interaction effect (Fig. 3). Here the exotic alleles reduced time to heading by 6 days (P < 0.001), decreased biomass production (7% less, P < 0.001) and were shorter (up to 11.7 cm shorter plant under full irrigation, P < 0.01) but were associated with longer peduncles and larger seeds, particularly under drought (3.9 cm and 0.82 mm 2 estimated increase respectively, P < 0.05). In addition, chromosome regions associated with TGW (TGW3 and TGW4) were found near HEA2 where the exotic alleles increased the mean trait performance by 6.4% (i.e. 2.9 g). Similar effects were observed for a cluster of QTLs located on chromosome 7H sharing a peak marker region from 84.6 cM to 89.5 cM. Exotic alleles decreased time to heading by 6 days (HEA11) and reduced yield by 10.2% (DY14) while lengthening the plant peduncles PdE9 and PdL12 (4.4 cm and 4.9 cm respectively). The QTLs COL20, BY9 and TGW16 were found to collocate with the HEA11 locus (Fig. 3). Here the wild alleles were associated with 10.7 cm reduction on plant height, 911.5 kg ha − 1 less biomass yield and up to 3 g increase on TGW under drought respectively. Major developmental genes associated with plant phenology were thought to be responsible for the pleiotropic effect manifested across plant morphological and agronomic traits.

Identification of candidates for grain weight and size improvement
In order to investigate putative candidate genes that could explain the phenotypic variation found for QTLs where wild alleles could contribute favourable alleles for improved crop performance, a few chromosome regions unlinked to developmental loci were targeted. The concomitant effect of seed weight and size was used as an exemplar to examine gene annotations of potential candidates that could be responsible for an increase in seed plumpness and therefore, seed size and weight.
The physical position of the flanking markers defining the SdW5/TGW8 locus on 3H and the SdW11/TGW13 locus on 5H defined a 3.6 Mbp and 2.9 Mbp regions, respectively, for which 92 and 51 annotated genes were found using the barley new genome assembly [27]. For the SdW5/TGW8 locus on 3H, the transcript HOR-VU3Hr1G010530.2 (corresponding to MLOC_61457.1 in the previous assembly [39] predicted to encode for a 14-3-3 protein was identified as an interesting candidate gene due to the regulative role of these proteins in the starch biosynthesis pathway in Arabidopsis [40] and during grain development in cereal crops such as wheat [41], rice [42] and barley [43]. In addition, HORVU3Hr1G010570.2 (MLOC_74351.2 in the previous assembly) was also targeted due to its function as an oligopeptide transporter (PTR2 family protein) and its potential role in the mobilisation of peptides in grain-filling seeds. Finally, for the SdW11/ TGW13 locus on 5H, the HORVU5Hr1G093350.11 and HORVU5Hr1G093390.2 transcripts (MLOC_71738.1 and MLOC_76072.2 in the previous assembly respectively), both described as 'Solute carrier family 22 member 1' , were identified as potential candidates for the locus.

Discussion
Given the amount of valuable exotic genetic resource with potential for barley crop improvement and the need for enriching the crop genetic base for sustainable agriculture, effective pre-breeding strategies to systematically mine novel allelic variation are required [44]. In this study we genotyped, using the high throughput 9 K SNP chip [38], a representative set of 28 lines from a second backcrossed generation of RCSLs [9] to identify QTLs of agronomical relevance. Replicated and reproducible field trials in a semi-controlled and realistic environment, provided by a rain-out shelter, allowed precise phenotypic data for morphological, developmental and agronomic traits to be measured. In addition, we generated an improved genetic map for over 1800 SNP markers using the 9 K iSelect chip for barley [38], and designed an association analysis, by means of a mixed model approach, which allowed a coarse but accurate dissection of the genetics underlying quantitative and qualitative aspects of the crop. Putative candidate genes were identified using the recently published barley genome assembly [27].

Improved wild genome marker coverage with a small number of representative lines
Using this approach we were able to achieve a greater degree of marker saturation, which improved the estimation of the extent and overlap of the substituted segments from the wild barley donor, doubling the number of markers mapped previously on the full set of 140 lines ( [45]; 765 mapped SNPs). This meant that several traits, which had been observed but unmapped in a previous field evaluation [9] could be located to a specific position on the genetic map. For example, we were able to associate awn retention (an undesirable trait for cultivated barley) within a 5 cM region on chromosome 1H. Furthermore, the locus was found coincidental with the threshability locus thresh-1 identified by Schmalenbach et al. [46] using a large set of 91 lines (BC 4 S 2 ), providing similar mapping resolution as the 28 RCSLs used in our study. For some traits, however, the small sample size could only locate the QTL to large chromosomal regions. An example of this is the QTLs identified on 2H (HEA2) and 7H (HEA11), which were found to reduce heading date by 6 days when the wild barley allele is present. We identified the main determinant of long day photoperiod response locus (Ppd-H1) and an earliness per se locus (eps7L) [47] as good candidate genes for these major QTLs, respectively. However, the extent of the QTL effect spanned more than 40 cM in both cases, which could have partially masked the effect of other flowering genes in the proximity of these loci such as HvCEN on 2H [38] and the photoperiod response gene HvCO1 on 7H [48,49]. Only by 'breaking up' the chromosome introgressions through new backcrosses in groups of Near Isogenic Lines (NILs) could some of these QTLs be located to smaller genomic regions.

Detailed field phenotyping in a subset of 28 RCSLs evaluated under rain-out shelter
Most detailed phenotypic analyses required controlled experimentation and considerable replication which limits the number of lines that can be scored and is In both cases, heading date accounted for the strongest association at the loci (HEA2 and HEA11) explaining the two largest phenotypic variations found for the trait often unrepresentative of field conditions [50]. As a compromise we optimised the number of lines that represented the complete wild barley donor genome to perform detailed replicated and reproducible phenotyping in realistic semi-controlled trials under rain-out shelters over two consecutive seasons. The results obtained were supported by four replicates per genotype and treatment each growing season, despite the coarse estimation of QTLs derived from using fewer numbers of lines; we could achieve a good analytical power in the study. In addition, the experimental setup allowed an effective control over the water treatments while other environmental variables varied across the different treatments in the field trials. Moreover, monitoring the soil profile moisture throughout the two growing seasons and gathering climatic data from the onsite meteorological weather station was key in the understanding of the seasonal variations found for crop performance particularly for the irrigated treatments. An excess of irrigation delivered in the full irrigated plots could have caused hypoxia and waterlogging, having a negative impact on optimal crop production under irrigation. Nevertheless, establishing two irrigated regimes allowed us to define yield potential values in a non-stressed environment that could be used as a reference. This contrasts with previous QTL field studies of the entire RCSLs population, or a set of 80 lines, in which plants were grown in contrasting sites differing not only in annual rainfall (some under severe drought conditions) but also in soil characteristics, altitude and temperature [36,51]. In previous studies, the scale of the experiments restricted the evaluations to a maximum of two replicates per genotype and season which could have limited the statistical power of the genotypic or environmentally -induced QTLs identified [52]. There are limitations to using small population sizes for QTL analyses and the power to detect small effect QTLs may be lost. The choice of population sizes has been discussed for decades and theoretically larger sizes are more likely to detect small effect QTLs, practically this is not always possible, and compromises are required. Yang et al. [53] addressed this question using a population of double haploids and conclude that with increased marker density, small populations were suitable for detecting QTLs. Because the introgressed regions are genetically well defined and almost the entire donor parent is represented, using a well characterised subset we can identify QTL to particular genomic locations. Validation and further fine mapping, using Near Isogenic Lines (NILs) is required to identify candidate genes for specific traits.
RCSLs in the identification of putative candidate genes associated with the accumulation of spike cuticle waxes Although the study was designed to identify quantitative traits, we were able to characterise and genetically dissect adaptive traits associated with the wild barley donor that had not been reported in previous studies of the RCSLs [9,[35][36][37]51]. One such trait, which requires detailed observations, is spike glossiness. This is commonly associated with wild barley [16] and wild wheat [54]. The phenotype is caused by a depletion of β-diketones during the formation of spikes cuticle wax components which leads to a bright green non-glaucous phenotype due to modifications in the microcrystalline structure of the cuticular wax layer [55,56]. Reduced glaucoussness increases tissues permeability to water loss, affects the angle at which irradiation is captured and increases tissues temperature, affecting the effective protection of the epicuticle as barrier against abiotic stresses such as drought or heat [42,57]. From the literature, it is unclear the role that this phenotype would play in wild barley adaptation. We speculate that the increased wettability of non-glaucous spikes would favour seed imbibition and so seed germination in the water limited conditions typical from wild barley natural habitats. However, King and von Wettstein-Knowles [58] have shown that increased ear wetting could promote pre-harvest sprouting in humid environments and be detrimental to crop production. The segregation for spike glaucosity in the 28 RCSLs was localised to a 3.2 cM region between 0 and 3.2 cM on chromosome 1H for which wild alleles were associated with the glossy non-glaucous phenotype. The locus seems to correspond to the eceriferum gene locus Cer-yy reported by Lundqvist and Von Wettstein-Knowles [59] for which no candidate genes have been found. In our study, we identified three putative genes for the locus based on their annotations and their conceivable function in the wax biosynthesis pathway taking other cer mutants described for barley as a reference [54,60]. An esterase/ lipase/thioesterase family protein, an O-acyltransferase (WSD1-like) family protein and a fatty acyl-CoA reductase were tagged as potential candidates involved in the generation and transport of fatty acids conforming the wax cuticle [55,60,61]. However, it is interesting to note that, because of the dominant inheritance of the gene, the function of Cer-yy has been described as an active regulator in the differentiation process of spike lemmas rather than in the process of wax synthesis and deposition by itself [62]. Therefore, other candidate genes that might be involved in the regulation of events occurring during spike differentiation would need to be considered. Investigations are underway to fine map the locus and elucidate its function in sets of NILs differing for the Cer-yy locus.
Major developmental loci determined the largest phenotypic variations found in the quantitative variation of traits The strongest marker-trait associations found were related to known major developmental genes such as Ppd-H1 and eps7L flowering genes and the sdw1-denso semi-dwarfing mutation. Yield related QTLs cluster around these loci due to the strong pleiotropic effects that flowering has on plant performance. In most of the cases, the introgression of exotic alleles at these loci limit vegetative growth or increase plant height, which in general negatively affect traits such as grain yield and harvest index. The effect agrees with observations in other backcross populations using germplasm or landraces from the Fertile Crescent as donor genome [22,33,[63][64][65][66] which, to some extent, reflects the strong selective pressures that these traits have experienced during domestication and particularly through the crop breeding process [6,[67][68][69]. Interestingly, however, TGW, which is often used as a measure of yield, was shown to be greater in most of the RCSLs, an observation also noted by Matus et al. [9]. Several of the marker-trait associations for thousand grain weight (TGW4 on 2H and TGW16 on 7H) appear coincidental with QTLs identified using the larger population of 137 RCSLs genotyped with 1536 SNP markers [70] and phenotyped in contrasting Chilean environments over one growing season [51]. In our study, the wild alleles increased grain weight by 6.4% for the marker main effect at TGW4 and in response to water stress at TGW16. However, it is most likely that these are determined by the pleiotropic effects of the major developmental loci, HEA2 and HEA11, in the region for Ppd-H1 and eps7L flowering time genes.
Potential candidate genes associated with transport and accumulation of carbohydrates were identified for locus where Caesarea 26-24 alleles contributed gains in grain size and weight of the elite cv. Harrington Manipulating source-sink relationships and deciphering the genetics and physiological mechanisms associated with dry matter partitioning in grain development is one of the major targets for achieving significant genetic gains in yield of major cereal crops [71,72]. Genes associated with effective transport of sucrose [73] into the growing grain as well as those responsible for cell growth and expansion [74] and carbohydrates metabolism during grain development [75] have been directly associated with modifications on grain weight and its components (grain length and width) and therefore, potentially useful for optimising grain development in cereal crops. However, the segregation for main developmental loci seems to largely interfere in the localisation of loci associated with stable variations on grain weight and size in QTL studies in biparental populations [76,77]. In our study we were able to identify two relatively small non-developmental loci on chromosomes 3H and 5H where the wild alleles were associated with a 5.9 and 7.9% increase in main grain weight, respectively (i.e. mean additive effects of 2.7 g and 2.2 g over 1000 grains respectively), showing concomitant positive effects on grain width. Evidence from the literature pinpointed the region on 3H as a particularly interesting target to investigate in the future. We identified an interesting potential candidate gene which encodes a 14-3-3 protein reported to be actively involved in the regulation of the carbohydrate metabolism in barley, predominantly during grain filling and germination [43,78]. Indeed, Alexander and Morris [43] showed how increased levels of 14-3-3 protein in barley endosperms inhibited the activity of sucrose synthase proteins responsible for the storage of starch in sink tissues. Similar roles for these proteins have been reported in wheat [41] and rice [79] grain development for which accumulation of 14-3-3 transcript and formation and accumulation of starch were negatively correlated leading to poorly filled grains. In addition, 14-3-3 proteins have also been found as regulators in the formation of trehalose 6 phosphate (T6P), an intermediate compound in the trehalose biosynthesis pathway [80,81]. This key molecule acts as a signal for sucrose availability having a huge influence on the stimulation and rate of starch biosynthesis in plant sink tissues [82][83][84] which is particularly important in the development of the starchy endosperm during grain development of cereal crops such as wheat [75]. Therefore, despite the synthesis of polysaccharides in cereals endosperm being rather intricate, we could hypothesise the exotic allelic variant for this QTL contributes to differences in the elite barley sink strength or the ability to accumulate photo-assimilates in the grain which could contribute constitutive genetic gains in crop yield performance. This hypothesis was hinted at in previous studies, suggesting genotypic differences in the translocation and accumulation of carbohydrates and osmolytes in the RCSLs [37,85], however further investigations using NILs for the targeted regions of these QTL to fine map or transcriptomic analysis on groups of RCSLs harbouring the introgression on 3H would be necessary to support this hypothesis.
The utilisation of the new barley genome assembly [27] was key in the identification of interesting candidate genes explaining the phenotypic variation associated with these loci. However, it should be noted that the abundance of marker-trait associations detected (177 in total) could represent some overestimations that may confound the results of the analysis possibly as a consequence of epistatic interaction between wild barley introgressions [86,87]. Nevertheless, we see our approach as a useful first step towards the genetic dissection of traits, which require detailed phenotyping and replication as well as those that vary across seasons. Indeed, favouring greater replications and precision phenotyping in small but genetically well-characterised sets of lines has been seen as an effective optimisation for QTL-mapping rather than increasing the number of lines evaluated at the expense of replicates [53]. More and more novel secondary traits with direct impact on crop production and sustainability appear to be interesting targets for breeding programmes, however little is known about their genetic control given that, in many cases, their evaluation requires sophisticated or costly evaluations less suitable for large mapping population. For instance, rhizodeposition and soil microbial interactions are complex traits which have been found to affect soil C content and availability of water and nutrients for plants. Recently, Mwafulirwa et al. [88], showed variability in soil carbon dynamics associated with genotypic differences in a small group of six RCSLs used in the present study. In this first evaluation no marker-trait association study could be conducted, however the principle of the current study could be useful for identification of marker-trait associations in a manageable group of lines. Similarly, genotypic differences for traits associated with improved photosynthetic performance under severe drought conditions such as grain Δ 13 C, have been reported in small groups of RCSLs [37]. These differences could eventually relate to genotypic differences in root system development across the population. In fact, in a recent evaluation of few RCSLs we found important genotypic variations of root growth parameters in a 2D pouch experimental approach [89]. Hence, precise phenotyping of these traits in the subset of 28 RCSLs could lead to the identification of major loci governing phenotypic variation of new interesting traits requiring thorough and expensive analysis. However, it is important to keep in mind that only genetic dissection of highly heritable quantitative traits might be achievable [53] and that for increasing the accuracy of the QTL detection and reduce the number of misleading associations more RCSLs would need to be included in the study.

Conclusions
Here we have proposed an approach to dissect quantitative traits using an exotic genetic library selected from an early backcrossed generation (BC 2 ). A single locus analysis by means of a mixed model approach was found effective for coarse QTL location in a reduced group of recombinant lines with more than one alien introgression per line and/or chromosome. We were able to identify stable marker trait associations for relevant agronomic characters that were compared with the literature and previous QTL studies in the RCSLs [51]. Most of the QTLs clustered at developmental loci where exotic alleles had a strong effect on plant phenology and development. Nevertheless, non-developmental loci associated with favourable effects of the exotic genome on crop performance could also be identified.
Finally, wild barley accessions are being recognised as a novel source of allelic variation, which could contribute to enhance crop performance under challenging environments [3,4]. In addition to introgession lines using wild barley as a donor, other advanced backcross populations are being developed together with state-of-the-art genomics technologies will provide that step change required using this yet untapped variation.

Plant material and genotypic characterisation
Twenty-eight Recombinant Chromosome Substitution Lines (RCSLs) were selected to represent the entire genome of the Israeli wild barley accession, Caesarea 26-24, in the uniform elite genetic background of the North American two-row spring malting barley cv. Harrington [9]. Briefly, the lines were obtained through an advanced backcross strategy [90] by which, after two backcrosses with the recurrent parent (cv. Harrington) and six generations of self-pollination (BC 2 F 6 ), the wild donor genome was segmented and introgressed in 137 RCSLs [9]. The selection of a subset of 28 RCSLs was based on the genotypic architecture of the lines and a minimum tilling panel for each of the barley chromosomes using SNP marker data generated as part of a Generation Challenge Programme [45] for the Barley Oligo Pool Assay 1 (BOPA1) [70]. For the present study, the set of preselected RCSLs was advanced to the BC 2 F 9 generation and characterised for a larger set of markers from 9 K Infinium iSelect SNP platform (7864 gene-based SNP markers) described by Comadran et al. [38].
Genomic DNA was extracted from leaf samples of ten-day-old seedlings (around 100 mg wet weight plant material) from the 28 RCSLs, cv. Harrington and Caesarea 26-24 using the QiagenDNeasy Plant Mini Kit (Qiagen). Genotyping was conducted by TraitGenetics GmbH (Gatersleben, Germany) using DNA samples at a concentration of 50 ng μl − 1 . The genotype calls were analysed using the Illumina GenomeStudio Genotyping (GT) module (Illumina, San Diego, CA) and Flapjack [91]. Polymorphic SNP marker data was imported to Graphical Genotypes software, GGT 2.0 [92], to visualise and characterise the proportion of exotic chromosomal regions from the donor parent introgressed into the elite genetic background for each of the lines.

Field experimental set-up and phenotypic characterisation
The 28 RCSLs and the recurrent parent, cv. Harrington, were grown in field trials over two growing seasons (2013 and 2014) at The James Hutton Institute (Latitude 56.45°N, Longitude 3.06°W). In order to characterise the RCSLs under different water regimes, the experiments were carried out using a rain-out shelter that protected the plots subjected to water deficit from rainfall providing better control over the soil water content in the irrigated treatments. The experiments were established in a row column design with water treatments and replicates superimposed. Four replications and three water treatments were established each growing season at different location within the field trial. Experimental plots were arranged in 72 rows and 5 columns along the trial. Each plot consisted of six 0.8 m long rows of plants with a 0.2 m gap between plots. Guard plots (cv. Concerto) were sown in extra rows between the replicates within each water treatment and along the sides of the trial to minimize the edge effect. In addition, an extra row of guard plots was sown at the opened ends of the rain-out shelter and along the sides of the field trial to reduce the amount of water coming from rainfall (Fig. 4).
Sowing dates were 15th and 17th April in 2013 and 2014, respectively. Seeds were sown using a Wintersteiger Seedmatic drill at 4 cm depth. In 2013 the granulated fertilizer HYDRO Sulphur cut (22% N, 4% P 2 O 5 , 14% K 2 O, and 7.5% SO 3 ) was used as a seed dressing and in 2014 as a top-dress fertilizer. The fertilizer was applied at 270 kg ha − 1 the day of sowing and up to 500 kg ha − 1 a week after sowing following the local agronomic practices for spring barley. To minimize the effect of pests, field trials were treated with fungicides Bravo 500 (1 l ha), Sil-traXPro (0.6 l ha − 1 ), Vegos (0.25 l ha − 1 ) and Justice (0.15 l ha − 1 ) on the 7th and 9th June and for aphids on 11th and 17th July in 2013 and 2014, respectively. Once the seedlings had established in the field at growth stages GS13-GS14 [93], the rain-out shelter was built and skinned with polythene film 0.15 mm thick Clear-High UV (Visqueen, UK). This occurred about a month after sowing, the 16th May 2013 and 14th May 2014, when the sprinkler irrigation system was set up defining different water treatments from the 23rd May and 21st May in 2013 and 2014, respectively.

Water treatment
Two irrigated (full irrigation and partial irrigation) and non-irrigated treatments (drought) were established to assess the response to water deficit in the development of the crop. The irrigated plots were watered at the equivalent of 2.5 mm rainfall per day based on the Met Office rainfall average records for east Scotland from 1910 during the barley growing season (http://www.metoffice.gov.uk/). The irrigation was stopped at different growth stages [93] in these two treatments: at grain milk development (GS 73-77) in fully irrigated plots (23rd July in 2013 and the 24th July in 2014) and before anthesis (GS 50-55) in the partially irrigated plots (24th June in 2013 and 23rd June in 2014) and no more water was added to the plots. The non-irrigated plots constituted the drought treatment and they were not watered after the rain-out shelter was built.
The soil moisture content in the field was monitored on a weekly basis using a capacitance probe PR2/4 (Delta-T Services Ltd) at different depths within the soil profile in 53 access tubes evenly distributed across the field trial. In addition, meteorological data were obtained from The James Hutton Institute weather station records. Daily air and soil temperature and rainfall accumulation were collated to define the climate conditions prior to the setup of the rain-out shelter and throughout the field experiments in each growing season.

Phenotypic traits measured
The RCSLs were characterised for thirteen morphological, developmental and agronomic traits in the 2013 and 2014 field trials (Table 1). Additionally, qualitative traits such as seed shattering, grain threshability, purple grain and glossy spike were recorded as presence or Fig. 4 Field experimental setup. a Schematic layout of the row column experimental design with three water treatments superimposed on top and four replicates (doted boxes) within each water treatment. Droughted (light blue) and fully irrigated (dark blue) plots faced north and south respectively in 2013. This arrangement was swapped in 2014. Partially irrigated plots (medium blue) were established in between the other two water treatments. Guard plots (green) were sown between water treatments and the edges of the rainout shelter. b Guard plots view from along the outer long edge of the rain-out shelter. c Top view of the field trial. d Sprinkler irrigation system absence throughout crop development. At plant physiological maturity, the two middle rows of each experimental plot were hand-harvested and used to determine yield and yield component data for each genotype.

Data analysis Phenotypic variation of quantitative traits
Residual maximum likelihood method (REML) was used to estimate fixed effects and random effect parameters in the traits measured [94]. A three factorial mixed model analysis was used to assess the effect of genotype, treatment, growing season and their interaction on the phenotypic variation observed in thirteen quantitative traits. Statistical significance for the fixed model effects was assessed by using a chi-squared based Wald-test. The random term included the replicate and the column to take into account the spatial variation in the field trials. Therefore, in a first approach the marker information was not included, and the genotypes were considered in the fixed term of the three factorial mixed model analysis that is presented in the following section.
In order to identify genotypes that significantly differed from the recurrent parent, a Fisher's least significance difference (LSD) test was used to compare the Best Linear Unbiased Estimators (BLUEs) of the fixed terms at 0.01 levels of probability. The VMCOMPARI-SON procedure in Genstat 17 (VSN International, UK) was computed for the analysis.

Marker-trait association analysis
Marker-trait associations for qualitative measurable traits were visually identified as overlapping exotic chromosome introgressions shared among RCSLs showing the wild phenotype. For quantitative traits, identification of associations required the development of an analytic approach that integrated the molecular marker information into a mixed model to test the effect of SNPs on the phenotypic variation. The estimates for fixed and random parameters of the mixed model were obtained by residual maximum likelihood (REML) method [94] using GenStat 17th Edition (VSN International, UK). The trait response was calculated according to the following hierarchical model: where μ is the general mean, M i is the fixed effect of the i-th SNP marker, T j is the fixed effect of the j-th water treatment, M i *T j is the fixed effect of the interaction of the i-th SNP marker and the j-th treatment, Y k is the fixed effect of the k-th year, M i *Y k is the fixed effect of the interaction of the i-th genotype and the k-th year, T j *Y k is the fixed effect of the interaction of the j-th treatment and the k-th year, R l (T j *Y k ) is the random effect of the l-th replicate nested in j-th treatment and k-th year, C m (T j *Y k *R l ) is the random effect of the m-th column nested in the j-th treatment, k-th year and l-th replicate, G n is the random effect of the n-th genotype and ε o(ijklmn) is the residual term of X ijklmno . The analysis was conducted for one marker at a time in a loop computed for each trait. For the analysis, blocks of contiguous markers that were polymorphic for the same RCSLs were treated as a single entity. Since the wild barley introgressions overlapped across contiguous regions within a chromosome, the length of each genomic region tested was determined as the genetic distance between the first SNP markers defining adjacent loci. This simplified the computational process of the analysis by removing redundant marker information. Nevertheless, mapping information of the entire set of markers was taken into account to define the size of the chromosome regions associated with the phenotype observed or QTLs.

QTL location and identification of candidate genes
Blocks of markers with significant main effects and/or interactions with the water treatment at 0.05(*), 0.01(**) and 0.001(***) levels of significance were considered for QTL location. Neighbouring blocks of markers showing significant effects in the same direction were assumed to be part of the same QTL and used to define the significant region of the marker-trait association. The p value of the most strongly associated SNP marker-trait within a chromosome region was used to define the peak region of the QTL. Since we were interested in identifying only stable marker-trait association across experiments and treatments, the SNP × Year interaction were not investigated further.
The Best Linear Unbiased Estimates (BLUEs) for each trait for the donor ([Hsp]) and the recurrent parent ([Hv]) alleles obtained for the significant peak region were used to calculate the relative contribution (RP) of the exotic parent alleles on the trait performance as follows: The position of major determinant genes related to plant phenology and morphology was estimated from previous studies and public gene databases to support the results obtained from this analysis. Additionally, some genomic regions accounting for a significantly large proportion of a trait's variance (R 2 ) were used to identify possible putative candidate genes underlying the predicted phenotypic variation. In this case the number of genes in the targeted regions was first estimated by looking at the physical position of the iSelect SNP markers in the newly released barley genome assembly [27]. Then putative gene content was compiled by examining the annotated high-confidence genes for each defined region. Finally, possible candidates were identified by gene ontology annotations and the homologies found in other crop relative species genomes using Basic Local Alignment Search Tool (BLAST) analysis on the NCBI website.

Stability analysis of agronomic traits
Values for yield (DY) and grain weight (TGW) for each of the RCSLs were used to investigate the effect of the exotic genome in the performance and stability of the crop across the environmental conditions imposed in the field. Genotype by environment (GE) interaction was studied by means of an additive main effects and multiplicative interaction model (AMMI) for which six environments were defined as the combination of growing season and water treatment. Using this method, the overall variation observed for yield traits such as dry yield and thousand grain weight was partitioned into genotype main effects, environment main effects and GE interaction [95]. The model combines a principal component analysis (PCA) of the GE interaction that is obtained as a result of a two factorial analysis of variance taking genotype and environment as the main effect. The AMMI model is: where Y GE is the yield value of genotype G on environment E, μ is the general mean, G i is the genotype effect, E j is the environment effect, λ k is the singular value (eigenvalue) of the k-th principal component axis, γ ik and δ jk are the genotype and the environment scores (eigenvectors) for the k-th principal component axis, ρ ij is the interaction residual and ε ijk is the random error. As a result, the interaction principal components generated (IPCA1 and IPCA2) are used to graphically summarise (biplots) the GE variation observed. The AMMI2 biplot uses the genotypes and environments scores for the first two IPCA components (IPCA1 on the X-axes and IPCA2 on the Y-axes) giving information about the GE patterns observed. Genotypes with IPCA scores close to zero are more stable or widely adapted to the tested environments whereas specific adaptation of the genotypes is determined by the length of the orthogonal projection of the genotype points onto the environmental vectors. In addition, the AMMI1 biplot tests the genotypes yield potential and stability simultaneously by plotting in the same diagram the average yields (X-axes) and the first dimension measure of GE interaction (IPCA1) for both genotypes and environments (Y-axes). AMMI model was computed using Genstat 17 (VSN International, UK).