Genome-wide association mapping for root traits in a panel of rice accessions from Vietnam

Background Despite recent sequencing efforts, local genetic resources remain underexploited, even though they carry alleles that can bring agronomic benefits. Taking advantage of the recent genotyping with 22,000 single-nucleotide polymorphism markers of a core collection of 180 Vietnamese rice varieties originating from provinces from North to South Vietnam and from different agrosystems characterized by contrasted water regimes, we have performed a genome-wide association study for different root parameters. Roots contribute to water stress avoidance and are a still underexploited target for breeding purpose due to the difficulty to observe them. Results The panel of 180 rice varieties was phenotyped under greenhouse conditions for several root traits in an experimental design with 3 replicates. The phenotyping system consisted of long plastic bags that were filled with sand and supplemented with fertilizer. Root length, root mass in different layers, root thickness, and the number of crown roots, as well as several derived root parameters and shoot traits, were recorded. The results were submitted to association mapping using a mixed model involving structure and kinship to enable the identification of significant associations. The analyses were conducted successively on the whole panel and on its indica (115 accessions) and japonica (64 accessions) subcomponents. The two associations with the highest significance were for root thickness on chromosome 2 and for crown root number on chromosome 11. No common associations were detected between the indica and japonica subpanels, probably because of the polymorphism repartition between the subspecies. Based on orthology with Arabidopsis, the possible candidate genes underlying the quantitative trait loci are reviewed. Conclusions Some of the major quantitative trait loci we detected through this genome-wide association study contain promising candidate genes encoding regulatory elements of known key regulators of root formation and development. Electronic supplementary material The online version of this article (doi:10.1186/s12870-016-0747-y) contains supplementary material, which is available to authorized users.


Background
Vietnam is a tropical country in Southeast Asia with a rice-based agricultural economy. Rice is grown on 82 % of the agricultural area, which corresponds to 7.75 M ha for a production of 43.6 million tons in 2012 [1]. Vietnam is the world's second rice exporter (6.4 million tons in 2012). Rice is mainly grown under irrigated conditions in the river deltas, notably the Mekong delta in South Vietnam (52 % of Vietnam rice production) and the Red River delta in North Vietnam (18 % of Vietnam rice production); however, because three-quarters of Vietnam's territory is made up of mountainous and hilly regions, other ecosystems are also represented (upland, rainfed lowland and mangrove).
Vietnam is among countries most threatened by climate change [2]. In particular, between spring and summer, all of the central areas of Vietnam are subject to periods of recurrent and severe drought that affect rice plantlets just after planting or plants during grain filling and can result in important yield losses. To improve rice drought resistance, an ideotype with a large number of deep and thick roots and a high root-to-shoot ratio was advocated, assuming that there was water at depth in the soil profile [3]. However, because roots develop underground and are not easily observed, this ideotype is difficult to select for. One way to achieve this goal would be to use indirect selection based on markers that are tightly linked to genes that control these root traits [4]. Knowledge of the genetic control of root development in rice is rapidly improving. Numerous root quantitative trait loci (QTLs) have been detected in various mapping populations ( [5] for a review). Three QTLs that are involved in water and nutrient uptake by roots have recently been cloned [6][7][8]. Furthermore, other QTLs have been finely mapped, and the underlying genes are close to being identified [9,10]. The rice orthologs of several genes that were initially identified in Arabidopsis have also been shown to have an effect on root development in rice (reviewed in [11][12][13][14]). However, this useful information is still far from giving a clear overall pattern of the network of genes that are involved. Genome-wide association studies (GWAS) are a way to directly identify new candidate genes or, more reasonably, to narrow down the chromosomal segments that carry functional factors to much smaller intervals [15]. Because of the lower linkage disequilibrium (LD) that is encountered in natural populations, the resolution of QTL detection in such populations is higher than that obtained by classical mapping populations of the same size. However, the corollary of this low LD is that the average distance between the markers that are used to genotype the population needs to be shorter than the LD decay distance to properly cover the whole genome. Such high marker density has only become accessible, in most species, with the development of new sequencing technologies, notably genotyping by sequencing (GBS). Genotyped panels representing a broad geographic diversity have been developed [16,17] and used in GWAS for root traits [17,18]. However, although their size is on the order of 150 to 400 accessions, these panels still explore only a small fraction of the large rice diversity. Accessions from Vietnam are not widely represented in world-wide panels although local genetic resources, notably from geographically diverse countries, have been shown to bear unexploited but interesting variations for useful traits [19,20]. Even among the 3000 rice genomes that were recently sequenced, only 55 Vietnamese accessions were included [21]. To take advantage of the allelic richness that can be encountered locally, we have developed a panel that is exclusively composed of accessions from Vietnam (Additional file 1: Table S1). This panel of 182 accessions has been genotyped with approximately 22,000 single nucleotide polymorphisms (SNPs) using GBS, and its structure and the decay of LD have been analyzed in depth [22]. The panel is composed of twothirds indica, one-third japonica and a few admixed accessions. Several subpopulations (6 in the indica subpanel and 4 in the japonica one) were detected within each subpanel. The average distances between polymorphic markers are 18 kb, 28 kb and 44 kb, for the whole panel, the indica and the japonica subpanels, respectively. On average, the pairwise LD, measured by r2, reaches 0.52 and 0.71 at 25 kb in the indica and japonica subpanels, respectively, and decays faster to background levels in the indica subpanel (r2 < 0.2 at 100 kb) than in the japonica subpanel (r2 < 0.2 at 425 kb). Because the distance between markers is shorter than the LD decay, the marker coverage is sufficient to undertake GWAS in all panels. Because the accessions came from different ecosystems, ranging from upland to mangrove, that were subject to specific but severe stresses (e.g., drought for upland or rainfed lowland rice or salinity for irrigated or mangrove rice), this panel constitutes an excellent resource for studying the genetic control of root system architecture and abiotic stress resistance.
In this paper, we performed an association study on root traits using our panel of Vietnamese varieties. Using a soil column system, different root parameters (maximum root depth, root biomass in different soil layers, crown root number, and crown root thickness) were investigated. Several QTLs were detected in the indica and japonica subpanels or in the whole panel. Among these QTLs, one associated with crown root thickness on chromosome 2 and one associated with crown root number on chromosome 11 had the highest levels of significance.

Phenotyping
The results of the analysis of variance (ANOVA) are presented in Table 1. The variety effect was highly significant for all of the traits. The broad-sense heritability of the traits, ranging from 0.65 to 0.90, was moderate to high, with the exception of two related traits (deepest point reached by roots (DEPTH) and maximum root length (MRL)) for which values of 0.35 and 0.46, respectively, were registered. The replication effect was often significant, and the block effect was almost always highly significant, indicating some internal heterogeneity within replicates that the design helped to control. This environmental heterogeneity may be due to slight differences in light intensity due to the shade from neighbor trees and to the disposition of the blocks in the screenhouse, some peripheral, some central. The accession means were therefore adjusted from block effects. The mean, standard deviation, range and coefficient of variation (CV) of the whole panel are presented in Additional file 2: Table S2. A graphical representation of the plant architecture of each accession is shown in Additional file 3: Figure S1. A moderate to large variation was observed for most of the traits, as seen through the CVs of the panel varying from 20 % to 63 %, with the exception of longest leaf length (LLGHT), DEPTH, MRL, root thickness (THK) and shallow root proportion (SRP) whose CVs were less than 20 %. The same elements for the indica and japonica subpanels are presented in Table 2. For most of the shoot and root biomass traits, including deep root traits (shoot dry weight (SDW), MRL, root mass in the 00-20 cm segment (DW0020), root mass in the 20-40 cm segment (DW2040), root mass in the 40-60 cm segment (DW4060), root mass below 60 cm (DWB60), root dry weight (RDW), deep root mass (<40 cm) weight (DRW) and plant dry weight (PDW)), the mean values of the indica accessions were higher than those of the japonica accessions. The indica accessions had on average a much larger biomass, shorter leaves, more tillers and many more crown roots but had thinner roots and fewer resources allocated to roots, notably to deep roots (lower root to shoot ratio (R_S) and slightly lower deep root proportion (<40 cm) (DRP)). However, the trait distributions ( Fig. 1) showed that the range of variation of the indica and japonica accessions was largely overlapping. To confirm these results and assess to what extent the observed phenotypic variability was determined by the genetic structure, a mean comparison was conducted between groups within the whole panel and between subpopulations within each subpanel for the genotyped accessions (Additional file 4: Table S3). For the majority of the traits except for DEPTH, MRL, DWB60, SRP, DRP and R_S, the phenotypic differences between the indica and japonica subpanels within the whole panel were highly significant. There were also difference between subpopulations within each subpanel for most of the traits except for DEPTH for the indica subpanel, number of tillers (TIL), SDW, number of crown root per tiller (NR_T) and PDW for the japonica subpanel and DW0020 for both subpanels. The percentage of phenotypic variance that was explained by the panel structure, which provides an alternate estimate of the relationships between genetic structure and phenotype for a given trait, gave similar results, with high percentages generally associated with the highest within-subpanel phenotypic differentiation (Additional file 4: Table S3). The mean comparisons showed that subpopulations I3 and, to a lesser extent, I6 in the indica subpanel and subpopulations J1 and J3 in the japonica subpanel had the deepest and thickest roots while subpopulations I1 and I5 as well as J2 and J4 registered the poorest performances in this respect.
The correlation coefficients among traits were highly significant and similar in direction within the whole panel and the two subpanels (Additional file 5: Table  S4). The magnitude of the differences between the indica and japonica subpanels varied from trait to trait but was generally small, except for combinations involving number of crown roots (NRC). The high positive correlations between root dry masses in different layers (greater than 0.8; data not shown) were derived from their pyramidal relationships. NCR was highly correlated with TIL (0.72 in the whole panel), as expected because the root and tiller emissions are synchronized in rice. To determine whether it was possible to disentangle these two traits, the NR/T ratio was calculated. TIL, NCR and NR/T were not correlated with the root depth (whether MRL or LENGTH).
A principal components analysis (PCA) was run on the adjusted means of all of the accessions. Together, the two first axes of the PCA explained 69.6 % of the variation. As shown by the circle of correlations (Fig. 2), almost all traits, with the exception of SRP and NR_T, which are ratios, were positively correlated with axis 1. Axis 1 can be viewed as an axis of increasing vigor opposing small and large plants when examining the accession positions on the first plane (Fig. 3). R_S was the only trait not correlated to axis 1. The second axis was characterized by an opposition between TIL, NCR, PDW, SDW, SRP and DW0020, corresponding to superficial biomass, and DEPTH, MRL, DRP and DWB60, corresponding to root biomass in the deepest layer. Root biomass in the intermediate layers (DW2040 and DW4060) was not correlated to axis 2. R_S, THK and LLGTH were also strongly correlated to axis 2, indicating that deep rooted varieties had also thick roots, long leaves and a high root to shoot ratio, all features that are characteristics of the tropical japonica group. The distribution of the accessions on the first plane ( Fig. 3) confirmed these interpretations. The two top and the bottom-right quadrants were mostly occupied by indica accessions (in red), while japonica accessions (in blue) were mostly found in the lower-left quadrant, showing a much clearer separation than when considering each trait separately. However, the indica and japonica clouds overlapped to some extent, and some indica accessions were found in the middle of the japonica accessions and vice versa. When repeated for the indica and japonica panels separately, the patterns were highly similar to that of the whole panel (data not shown).

Association mapping
We performed successive association mappings for the whole panel and then separately for the indica and japonica subpanels. The mixed model that included both the structure and kinship matrices exerted good control over false positive rates for most traits as shown by the quantile-quantile plots for the whole set of accessions, the indica set and the japonica set, respectively ( Fig. 4a to c). On these graphs, for most traits, the cumulative distribution of observed P-values fitted well with the expected uniform distribution that was represented by the diagonal, at least for the smallest log (P-values). There were two exceptions, DEPTH for the whole panel and THK for the japonica subpanel, for which the curves moved away from the diagonal. The inflation factor lambda was computed to quantitatively assess the extent of these deviations. Lambda was in the range of 0.95 to 1.07 for all traits except these two (1.25 for DEPTH in the whole panel and 1.50 for THK in the japonica subpanel, respectively). For these two trait x panel combinations, a larger number of false positives is expected compared with other combinations.
In the whole panel, and the indica and japonica subpanels, 66, 20 and 26 markers, respectively, were significant at P ≤ 1e-04 (Table 3). The higher number of QTLs that were detected in the whole panel is most likely the result of its larger size. The most significant associations were recorded for DEPTH on chromosome 1 (q17; P = 2.67e-07) and NCR on chromosome 11 (q45; P = 6.59e-07) for the whole panel, THK on chromosome 2 (q57; P = 4.77e-07) for the indica subpanel, and TIL  , respectively. In a few cases, several physically close but not always adjacent markers showed the exact same level of significance. After verification, these markers appeared to be in full LD. In such cases, the extreme markers are given as an interval (Site1-Site2) in Table 3. Most of these intervals were small (on the order of 1 to 200 kb), but in at least one case (q76) on chromosome 6, the interval covered 2.5 Mb. In another case, for NR_T with the japonica panel, significant markers belonging to different chromosomes were in full LD. This situation involved 89 markers distributed across chromosomes 2, 3, 6, 7, 8 and 12. The corresponding QTL was not kept in the results table because it was not possible to unambiguously attribute it to a chromosome. Some of the QTLs were common between the whole panel and the two subpanels, more so for the indica subpanel (7 occurrences), which represents 2/3 of the whole panel accessions, than for the japonica subpanel (2 occurrences) at P < 1e-04. These numbers increased to 34 and 13, respectively, when decreasing the threshold to P <1e-03 for the significant   markers (in italics in Table 3). Surprisingly, no association shared by the indica and japonica panels was detected, but half of the markers that were significant in one subpanel were monomorphic (Minor Allele Frequency (MAF) below 5 %) in the other and, were therefore, not tested (noted as nP in Table 3). In all three panels, the number of significant markers varied from trait to trait, but the range of variation was higher for the whole panel (from 1 to 14). The number of associations was greater than 5 for DEPTH (14 associations) and NCR (10 associations) in the whole panel, and for TIL (6 associations) in the japonica panel. For the remaining traits, this number was equal to or less than 5. Some of the significant markers were associated with several traits as shown in Table 4. Taking the markers that were common between panels or between traits as a single QTL, a total of 88 different sites or segments were significant at P < 1e-04 in this study.

Function of genes that were linked to significant markers
Among the 88 different sites identified, 33 were in genes with predicted functions. Given the level of LD in the panel, the genes that were within an interval of +/-25 kb on both sides of the significant markers were also surveyed using the query tools of OrygenesDB [23] to retrieve 889 additional genes, of which 407 had predicted functions (Additional file 6: Table S5). This list of QTL-associated genes was first compared to the list of approximately 200 genes that were recorded in EURoot database [24] that are known, mostly via mutant analysis, to play roles in rice root architecture, root development or water and nutrient transport. No correspondence was found except for PLASMA MEMBRANE INTRINSIC PROTEIN 2;1 (OsPIP2;1) gene located 16 kb from q61 on chromosome 7, which was significant for THK in the japonica panel ( Table 5). The list of QTL-associated genes was similarly compared with a list of genes that are specifically expressed during crown root formation or development [25,26]. Eleven of the QTL-associated genes corresponded to genes that are specifically expressed in different zones of the crown root such as the root cap, the lateral root differentiation zone and the mature zone ( Table 5). Most of these genes had a predicted biochemical function, but no precise information could be found regarding their biological functions. Selecting only those genes associated with root trait QTLs, the literature was then scanned to determine whether information about their biological function or that of their predicted Arabidopsis ortholog(s) was available. This approach revealed 13 additional interesting candidate genes, which are also listed in Table 5.

Discussion and conclusions
We have phenotyped the root traits of a panel of 182 Vietnamese varieties in a soil-based phenotyping system to analyze the genetic control of root architecture. The phenotypic variation of the panel was analyzed at the light of its genetic structure for GWAS purpose. The japonica subpanel showed on average poor performance, with lower mean values than the indica subpanel for deep root traits and biomasses. However, the analysis The P-value of the test in the three panels up to P = 1e-03 is given in italics. In bold, QTLs with q-values < 0.05 per sub-population shows that the overall subpanel performance masked important differences between subpopulations. These differences seem to be mostly related to ecosystem adaptation and risks of drought, as also shown by Lafitte et al. [26] with a sample of varieties originating from across Asia. This is true for the japonica subpanel, for which subpopulations J1 and J3, with the deepest and thickest roots, include mostly upland rice varieties, while subpopulations J2 and J4, with thin and shallow roots, correspond to irrigated and mangrove rice varieties, respectively [22]. More surprisingly, the same is also true for the indica subpanel: the two best subpopulations (I3 and I6) are almost exclusively composed of upland accessions (I3) and from a mixture of upland and rainfed lowland accessions (I6), whereas the two subpopulations (I1 and I4) with the poorest performance originated from irrigated ecosystems [22]. The indica types from group I3 constitute interesting donors of deep and thick roots that may be easier to use as parents in crosses with other indica vaccessions than the upland japonica accessions, by reducing the risks of F 1 inter subspecies sterility. The global organization of root variability in this panel highlights the need to control population structure and, because the panel phenotypic differences partly overlapped with the genetic structure for some of the traits, to perform individual analyses of each subpanel rather than only performing an analysis of the whole panel.
In this study, we detected QTLs for all the examined traits. Their number is generally limited, but some QTLs show both good P-values (P < 1e-06) and q-values (q < 0.05) and seem to merit further research, particularly the   QTL for THK on chromosome 2 and for NCR on chromosome 11. We found some significant markers located on different chromosomes but in full LD. In such situation, it is impossible to know where the functional mutation is located. This situation of full LD between markers far apart or on different chromosomes is probably much more frequent than was observed in our study but is difficult to detect for non-significant markers and generally goes unnoticed. The risk of such situations is likely higher when the panel size is low, as is the case here for the japonica subpanel.
Generally, the markers that were significant for a given trait in the whole panel were also significant for the same trait in the indica or the japonica panel, albeit with lower levels of significance. However, we found ten markers significant in the whole panel but not significant in any of the two subpanels. We did not find any markers associated with the same traits in both the indica and japonica subpanels. These differences in associations detected between panels may be partly due to the limited size of the subpanels, notably of the japonica subpanel, which limits our detection power. A panel size below 100 accessions is considered as suboptimal for association detection, notably for low MAF markers [15]. However, as shown in the example of Zhao et al. [27] with subpanel sizes (57 to 97 accessions) similar to that of our japonica subpanel, significant associations can still be detected when the marker effects are large. The differences in associations detected between panels may also partly result from the large variation in allelic frequencies in the different panels observed for many markers. These variations themselves result from the strong bipolar organization of the genetic diversity in rice Arabidopsis MYB60, root development [42] due to the independent domestications of indica and japonica subspecies followed by limited introgressions from one subspecies to the other, mainly in areas of low divergence [28,29]. This second possibility is supported by the reduction of the set of polymorphic markers (MAF greater than 5 %) from 21,623 in the whole panel to 13,814 in the indica subpanel and 8821 in the japonica subpanel. Many of the markers that were polymorphic in one subpanel were considered monomorphic (MAF below 5 %) in the other. Only 20.1 % of the 21,623 initial markers were polymorphic in both subpanels, while 15.4 % were monomorphic in both but with different alleles (markers discriminating indica from japonica accessions). When comparing the significant associations identified in this study with those identified by [17] for similar root traits in a japonica panel with no common accessions except the checks, we found one case of co-localization (d <50 bp). A marker that was significantly associated with NR_T in our study (q50 at position 42.706 Mb on chr 1) was associated with R_S in their study. These authors did not measure NR_T. However, we cannot exclude that this co-localization occurred by chance because the same marker is very far from the threshold of significance for R_S in any panel in our study.
We assessed the function of the genes in which or near which significant markers were located. OsPIP2;1 (Os07g26690), at 16 kb of q61, is more specifically expressed in the root exodermis and up-regulated specifically in roots of upland varieties after osmotic stress treatment [30,31]. However, the three markers that were found between the QTL and the gene showed a decreasing level of significance when approaching the gene, and it is difficult to establish a link between the function of this protein and the phenotype (THK) that is associated with the related QTL.
The literature searches for the biological functions of the QTL-associated genes and their predicted Arabidopsis ortholog(s) led to interesting candidates. For example, crown root initiation and development in rice are known to involve local auxin flux regulation by the relocation of PIN-FORMED (PIN) auxin efflux proteins, resulting in the activation by auxin of the gene encoding the LATERAL ORGAN BOUNDARY DOMAIN (LBD) transcription factor CROWN ROOT LESS 1 (CRL1) [11,13,14,32]. Some genes that are involved in this regulatory pathway were included in the confidence interval of QTLs that were related to NCR or NR_T. Os01g49690 (NCR, q36) is an ortholog of Arabidopsis PHYTOCHROME-ASSOCIATED SERINE/THREONINE PROTEINPHOSPHATASE1 (FYPP1). The Fypp1; Fypp2 double mutant is characterized by an elevation of a phosphorylated PIN protein that results in a basal-to-apical subcellular PIN accumulation, an increase in root basipetal auxin transport, and a phenotype showing shorter roots and less lateral root formation [33]. SORTING NEXIN2 (SNX2a), the Arabidopsis ortholog of Os05g27920 (NR T_T, q51), shares partial functional redundancy with SNX1, which is involved in the endocellular transport of PIN2 via the formation of SNX1-containing endosomes [34]. AUXIN/INDOLE-3-ACETIC ACID (IAA) are negative regulators of the auxin response and are involved in CRL1gene expression regulation by auxin [32]. The over-expression of OsIAA4 (Os01g18360, NR_T, q49) results in a reduction of the number of crown roots that form after auxin treatment, suggesting that this gene may be involved in the auxin signaling pathway that controls the formation of crown roots [35]. CRL1 regulates genes involved in meristem formation and patterning, such as SCARECROW (SCR), which acts together with SHORTROOT (SHR) to control the division of the cortex-endodermis initial cell in Arabidopsis or with QUIESCENT CENTER HOME OBOX (QHB), which is an ortholog of the WUSCHELrelated WOX5 gene, to contribute to the quiescent center and root stem cell specification and maintenance [36][37][38]. Finally, Os07g37010 (NR_T, q53) is the ortholog of Arabidopsis CYCLIN D6;1 that was identified as a direct target of SHR and SCR and is involved in the cell division events leading to root meristem formation and function [39]. All of these genes may be involved upstream or downstream of CRL1 in the regulation of crown root formation. OsMADS15 (Os0 7g01820, NCR, q42) over-expressing plants display alterations in development, including the over development of crown roots associated with enhanced expression of the transcription factor WOX11, another gene that controls the formation and development of crown roots through a CRL1-independent pathway [40,41]. Os02 g51710 is an ortholog of Arabidopsis POLYCYSTIN-1, LIPOXYGENASE, ALPHA-TOXIN AND TRIACYLGLYC-EROL LIPASE 1 (PLAT1). PLAT1 is induced by abiotic stress and positively regulates lateral root development [42].
Some genes whose function is related to root development and that are close to QTLs for deep-root biomass have been also identified. Os12g03150 (DW4060, q71) is an ortholog of Arabidopsis AtMYB60, which controls stomatal closure and stimulates root development in response to drought [43]. Os12g03110 (DW4060, q71) is the ortholog of MINI ZINC FINGER 1 (MIF1), which is a negative regulator of plant development, including root development [44]. Os01g10900 (DWB60, RDW, q73) is the ortholog of Arabidopsis MORPHOGENESIS OF ROOT HAIR 1 (MRH1), which regulates root-hair development [45]. Finally, Os02g50372 (DWB60, q74), is an expressed protein of unknown function but was previously reported as a candidate gene underlying a QTL that is related to root mass at depth, based on expression studies [46].
Concerning root thickness, Os06g09280 (THK, q60) is an ortholog of ROOT INITIATION DEFECTIVE 1 (RID1). RID1 is involved in different developmental processes, such as meristem maintenance, and leaf and root morphogenesis. In roots, RID1 contributes to the proper expression of SCR and WOX5, two key genes involved in meristem shaping and root-tissue patterning [47]. Os03g63970 (THK, q59) corresponds to GIBBERELIC ACID 20 OXIDASE 1 (OsGA20ox1), which is a gene underlying a previously identified QTL for vigor at early developmental stages [48]. In Arabidopsis, GA, together with SCR and SHR, plays a key role in regulating the differentiation of supernumerary cortex cell layers in roots [49]; thus, the function of OsGA20ox1 should be further investigated. Similarly, cytokinins play a key role with auxin in root vascular patterning [50]. Os07g26740 (THK, q61), which encodes the RESPONSE REGULATOR 7 (OsR R7), may be involved in this process.
Several genes were identified whose reported biological function is consistent with the associated phenotype, the rice gene or its predicted Arabidopsis ortholog (Table 5). Knock-down or gain-of-function mutants of these genes should be generated to further investigate their function in the related phenotypes. Nevertheless, the fact that two genes are predicted to be orthologs in rice and Arabidopsis based on sequence homology does not guarantee that their function is conserved between the two species. In addition, several genes supporting QTLs initially had unknown functions [51,52]. Therefore, to further validate these QTLs, it will be interesting to conduct an exhaustive differential expression study between contrasting varieties for the related phenotype for all genes included in the confidence interval of a given QTL. This approach has often been used as a criterion to restrict the number of candidate genes underlying a QTL independently of their putative function [46,48,53]. Another way to validate these QTLs, which were detected on a statistical basis, is to develop mapping populations segregating for the QTL. Although time consuming, this approach is a necessary step towards the positional cloning of the QTL. Marker haplotypes were established in the regions of the QTLs for NCR on chromosome 11 and THK on chromosome 2, which are the two most significant QTLs. Accessions of the panel with contrasting haplotypes will be used in a future study to make crosses and develop mapping populations in an indica background for NCR and in a japonica background for THK.

Materials
The material that was used in the experiment was composed of a panel of 197 accessions from Vietnam and 3 controls (Nipponbare, IR64 and Azucena). The seeds from the Vietnamese accessions came from the Plant Resource Center, Hanoi, Vietnam. These accessions were mostly traditional lines originating from various regions of Vietnam. The accession name, ID number, province of origin, ecosystem of origin, varietal group (indica (I), japonica (J) or intermediate (m)) and subpopulation within the varietal group (I1 to I6 for the indica accessions and J1 to J4 for the japonica accessions), when known, are listed in Additional file 1: Table S1. Other characteristics of the accessions can be found in Phung et al. [22]. The accessions were seed increased for one generation under field conditions, and one plant representative of the plot was selected for DNA extraction and further phenotyping.

Phenotyping experiment
The experiment was conducted at Van Giang Agricultural Station (latitude 20°39′ N, longitude 106°3′ W) near Hanoi, Vietnam, in August-September 2012. The experimental design was an alpha-lattice with 3 replicates. A block factor (10) was introduced to control for possible environmental variations within replicates. A block included 20 accessions. The experimental unit was composed of one plant.

Conditions of the experiment
The plants were grown under screenhouse conditions in well-drained black plastic bags, 80 cm long and 16 cm wide, that were filled with fine river sand and supplemented with fertilizer (5/10/3 NPK at the rate of 20 kg/ m 3 of sand). The seeds were sown on watered filter paper, incubated at 28°C for 3 days at obscurity and transplanted into bags (one seed per bag). Three accessions did not germinate. The plants were watered four times a day (6 am, 10 am, 2 pm, and 6 pm) except during rain. Fertilizer (2.5 g 15/15/15 N/P/K per plant,) was added three weeks after transplanting following solubilization in water. The root system was collected 45 days after sowing.
For each plant, the length of the longest leaf was measured (LLGTH). The number of tillers per plant (TIL) was counted. The shoot part was dried for 3 days at 72°C in an oven and weighed (SDW). The deepest point that was reached by the roots in the bag was measured (DEPTH). Then, the roots were carefully washed. The maximum root length (MRL) was measured. The number of crown roots (NCR) below the tillering plateau was counted, and the number of crown roots per tiller (NR_T) was computed as NCR/TIL. The root thickness (THK) was measured using a micrometer on 5 roots per plant, 2 cm below the tillering plateau, and averaged. The root system was divided into 4 segments from the top to the bottom (0 to 20 cm (DW0020), 20 to 40 cm (DW2040), 40 to 60 cm (DW4060) and below 60 cm (DWB60)) and the segments were oven-dried for 3 days at 72°C. The root dry weight (RDW) and deep root weight (DRW) were computed as the sum of the four segments and the sum of the two deeper segments (DW4060 + DWB60), respectively. The shallow root weight proportion (SRP) was computed as DW0020* 100/RDW. The deep root weight proportion (DRP) was computed as DRW*100/RDW. The root-to-shoot ratio (R_S) was computed as SDW*100/RDW.
To represent the measured phenotypic traits on a synthetic figure, a tracing software named "RASTA" (for Root And Shoot TrAcing) was developed using the PHP hypertext preprocessor (PHP, The PHP Group, http:// www.php.net) coding language and a drawing set of instructions from the Graphics Draw Graphics Library (http:// www.libgd.org). The software is available from the corresponding author. The shoot and root parts of each accession were drawn as two polygons whose dimensions were proportional to the considered traits (see Additional file 3: Figure S1 for detailed explanations).

Genotyping by Sequencing (GBS)
GBS data were available for 182 Vietnamese accessions of the 197 that germinated, and for the three controls. The GBS method and the selection of the resulting SNP markers are described in detail in [22]. Briefly, the genome complexity reduction was done using PstI/TaqI restriction digest, a combination of enzymes enabling a good sequencing depth, and was followed by Illumina short read sequencing. Markers that had no position on the Nipponbare sequence or that had more than 20 % missing data were discarded. Because of the sensitivity of GWAS to the presence of unbalanced genotypic classes markers that had a minor allele frequency (MAF) of less than 5 % were also discarded. The remaining missing data (5.2 % in the whole date set) were imputed using Beagle v3.3.2 [54]. The final data set was composed of 21,623 markers without missing data. Indica (115 accessions) and japonica (64 accessions) subpanels were also individualized and contained 13,814 and 8821 markers, respectively, that were polymorphic and with an MAF of less than 5 % in their specific subpanel.

Statistical analyses
An analysis of variance (ANOVA) was performed considering the variety, replicate and block effects as fixed. The fixed effects were tested using SAS 9.2. (SAS Institute, Cary NC, USA) and least square variety means were computed to adjust for imbalance of the block effects. The phenotypic correlations between traits were computed using these adjusted means. To analyze the organization of the phenotypic variability, a Principal Component Analysis (PCA) was undertaken on the adjusted means of all of the traits. To further analyze the relationships between phenotypic variability and genetic structure, which can deeply affect the results of GWAS, three independent ANOVAs were performed on the adjusted means of each trait considering a group effect as the main factor within the whole panel, and considering a subpopulation effect as the main factor within each of the two subpanels, and using a Newman and Keuls test to detect differences between group means or between subpopulation means. For each trait-by-panel combination, the percentage of phenotypic variance that was explained by the structure was computed by a regression of the phenotype on the percentages of admixtures that were obtained by Phung et al. [22] for all accessions. These analyses were performed using SAS v9.2, except for the PCA, for which XLStat [55] was used.

Association mapping
We performed successive association analyses on the whole panel (182 accessions * 21,623 markers), and on the indica (115 accessions * 13,814 markers) and japonica (64 accessions * 8821 markers) subpanels using Tassel v3 [56]. Three accessions of the whole panel classified as admixed (% of admixture below 75 % in any subpanel [22]) were excluded from the subpanels. For all of the traits, we used a mixed model with a structure matrix that was considered as a fixed effect and a kinship matrix considered as a random effect as covariates to control the false-positive rate. We chose the options of no compression and reevaluation of variance components for each marker. The structure matrices of the three panels were determined by running a PCA on their respective marker data sets. For the three panels, the first six PCA axes were retained, and the scores of the accessions on these six axes were used as the structure matrix. The kinship matrices were established using the pairwise Identity by State (IBS) method proposed by Tassel. Quantile-quantile plots (QQ plots) were drawn using Tassel. QQ plots enable to graphically evaluate the number of false positives observed with the chosen model, based on deviations from the uniform law. The inflation factor lambda was computed to quantitatively assess the extent of these deviations using the "estlambda" function from the R GenABEL package. The expected value of lambda is 1 for no inflation situations. We used a threshold of 1e-04 to declare an association significant. To determine this threshold, the effective number of independent tests (Meff) was first computed using the R SimpleM package and was used as the denominator in the Bonferroni correction formula as proposed by Gao [57]. The method is said to give close approximation to a permutation threshold. Meff were 8545, 5649, and 2310 for the whole panel, the indica and the japonica subpanels, respectively, corresponding to P values of approximately 5e-06, 1e-05, and 5e-05. Since we wanted to be able to make comparisons across populations and across traits, we chose the less stringent common P value of 1e-04. Then, we computed the q-value corresponding to each P-value for all of the traits in all three panels using the R Q-value package v1.0 [58] as a measure of the falsediscovery rate. The associations that were significant at a q-value < 0.05 were represented in bold in Table 3. The Manhattan plots that graphically represent the significance of all markers were drawn using Tassel.