Genome-wide Association Study (GWAS) of mesocotyl elongation based on re-sequencing approach in rice

Background Mechanized dry seeded rice can save both labour and water resources. Rice seedling establishment is sensitive to sowing depth while mesocotyl elongation facilitates the emergence of deeply sown seeds. Results A set of 270 rice accessions, including 170 from the mini-core collection of Chinese rice germplasm (C Collection) and 100 varieties used in a breeding program for drought resistance (D Collection), was screened for mesocotyl lengths of seedlings grown in water (MLw) in darkness and in 5 cm sand culture (MLs). Twenty six accessions (10.53 %) have MLw longer than 1.0 cm. Eleven accessions had the highest mesocotyl lengths, i.e. 1.4 – 5.05 cm of MLw and 3.0 – 6.4 cm in 10 cm sand culture, including 7 upland landraces or varieties. The genotypic data of 1,019,883 SNPs were developed by re-sequencing of those accessions. A whole-genome SNP array (Rice SNP50) was used to genotype 24 accessions as a validation panel, giving 98.41 % of consistent SNPs with the re-sequencing data in average. GWAS based on compressed mixed linear model was conducted using GAPIT. Based on a threshold of -log(P) ≥8.0, 13 loci were associated to MLw on rice chromosome 1, 3, 4, 5, 6 and 9, respectively. Three associated loci, on chromosome 3, 6, and 10, were detected for MLs. A set of 99 associated SNPs for MLw, based on a compromised threshold (−log(P) ≥7.0), located in intergenic regions or different positions of 36 annotated genes, including one cullin and one growth regulating factor gene. Conclusions Higher proportion and extension of elongated mesocotyls were observed in the mini-core collection of rice germplasm and upland rice landraces or varieties, possibly causing the correlation between mesocotyl elongation and drought resistance. GWAS found 13 loci for mesocotyl length measured in dark germination that confirmed the previously reported co-location of two QTLs across populations and experiments. Associated SNPs hit 36 annotated genes including function-matching candidates like cullin and GRF. The germplasm with elongated mesocotyl, especially upland landraces or varieties, and the associated SNPs could be useful in further studies and breeding of mechanized dry seeded rice. Electronic supplementary material The online version of this article (doi:10.1186/s12870-015-0608-0) contains supplementary material, which is available to authorized users.

seeding can save water, but are labour costing. So mechanized dry seeding is probably the most efficient way of rice seedling establishment, saving 30 % labour than machine-transplanting rice (MTPR) as estimated in Korean trials [3].
In rainfed areas or areas of inadequate irrigation, transplanting rice could completely fail or delay in years with less and/or delayed rainfall. As an example, a minimum of 600 mm of cumulative rainfall was required to complete field puddling and transplanting of rice in the Philippines, much higher than 150 mm cumulative rainfall required by dry seeding [4]. In 1 year of every 4 years, a delay of 20 days for dry seeding could happen, much shorter than 40-day delay for transplanting [5]. MDSR has been widely adopted and will expand to much larger area if effective managements are available to control weeds and to maintain uniform plant density, e.g. fine tillage, better land levelling, more appropriate seed placement, improved nutrient application, varieties with higher seedling vigor and lodge resistance [6]. So far, the appropriate techniques are not fully available yet to ensure the perfect seedling establishments.
Rapid and well seedling establishment is important for weed competitiveness and good harvesting of DSR, determined by sowing depth and a few other factors. The seedling establishment and shoot dry weight were critically affected by the depths of soil and water layer in lowland wet seeded rice [7]. Hanviriyapant et al. reported the well establishment and strong seedlings of a tall, vigorous-growing cultivar and higher sensitivity of semidwarf cultivar to sowing depth and time of sowing after irrigation [8]. An experiment of gradient sowing depths showed that the seedling establishment of wheat was not affected by sowing depths from 2.3 to 8.3 cm, but declined to about 6 % at 14.3 cm [9].
Elongation of both mesocotyl and coleoptile can facilitate the seedling establishment of rice when sown deep in soil or under water layer [10,11]. Mgonja et al. reported the association between mesocotyl elongation and seedling vigor [12]. Alibu et al. found that coleoptile length was more enhanced under submergence while mesocotyl elongated more in soil-sand culture. Sown 8 cm deep, the emergence of only a few genotypes was determined by varied mesocotyl elongation, not the variation of coleoptile lengths [13], similar to an early observation in indica rice [14]. Mesocotyl elongation has been found to be the cause of deep-seeding tolerance in maize [15,16].
Genome-wide association study (GWAS) based on SSR [28] or single nucleotide polymorphism (SNP) markers [29][30][31][32][33] has been widely used in model plant species including rice. Extremely high resolution can be achieved by dense SNPs identified in diverse germplasm panels based on the 2nd generation genome sequencing or SNP array approaches [29][30][31][32][33][34][35]. In this study, GWAS based on resequencing approach was conducted in a set of rice landraces or varieties for mesocotyl elongation as a key character enhancing rice seedling emergence, especially after dry seeding with relatively higher sowing depth.

Phenotypic variations of mesocotyl elongation among rice germplasm accessions
A wide range of mesocotyl lengths in different rice germplasm accessions, from almost no elongation to a maximum of 5.05 cm, were observed in the dark germination experiment. Mesocotyl length varied from nearly zero to a maximum of 2.05 cm among those rice accessions when measured in 5 cm sand culture. ANOVA showed highly significant variance among rice germplasm accessions, together with less or no significant variance between replications for ML in dark germination with water (MLw) and ML in sand culture (MLs) ( Table 1).
As shown in Fig. 1, only a low proportion of germplasm accessions had largely elongated mesocotyl. The MLw of 26, 29 and 192 accessions were higher than 1.0 cm, in the range of 0.5-1.0 cm and shorter than 0.5 cm, respectively. MLs showed similar general trend with MLw, but had some deviation around MLw (Fig. 1). The mesocotyl lengths measured in dark germination (MLw) and in sand culture (MLs) had highly significant correlation (r = 0.784**; Additional file 1: Table S1). A third experiment was conducted to confirm previous results and to check the reaction of mesocotyl elongation to higher depth of sand or soil covering layers, using 30 landraces or varieties representing accessions with low, medium and high mesocotyl elongation. As sorted by MLw on the axis of abscissa (Fig. 2), ascending lines showed consistent trends between the measurements of mesocotyl lengths in all experiments. The seedlings had similar mesocotyl lengths in either sand or soil culture. The reaction of mesocotyl elongation to two seeding depths showed different patterns among rice accessions. The first 10 accessions (on the left in the chart) had almost same mesocotyl lengths for both depths, i.e. no more increase under 10 cm sand culture as a more favoured condition, implying that the measurements here represented the maximum capacity of mesocotyl elongation of those accessions. Another 10 accessions (in the middle) had a little longer mesocotyl lengths under 10 cm than under 5 cm covering layers, suggesting their maximum capacity up to 2.5-3 cm that was equivalent to or a little higher than the detectable limit in experiment of 5 cm sand or soil culture. For the last 10 accessions, mesocotyl lengths were higher in 10 cm than in 5 cm depth. It is obvious that those landraces or varieties had capacities of mesocotyl elongation from 3 to 6 cm, fully expressed in 10 cm, but not in 5 cm culture. The low measurements (2-3 cm) in 5 cm sand or soil culture were perhaps the result of light inhibition after the emergence of coleoptiles or leaves of the seedlings.

SNP validation and population structure analysis
A subset of 24 accessions, including 9 from C collection and 15 from D collection, were genotyped using the RiceSNP50 whole-genome SNP array [31]. There are 10,851 SNP loci shared by the genotypic data sets from re-sequencing SNP calling and SNP array. Each accession has effective data on 8,313-10,746 common SNP loci after excluding loci with missing data in either SNP calling or array. The accuracy of SNP calling and missing  Table S2).
The population structure was estimated using a subset of 144,995 SNP loci with less than 10 % missing data in D collection before imputation (as the total SNP number called from the sequencing reads of the accessions in the D collection is much lower than that in the C collection). Using genotypic data before imputation could avoid the possible influence from imputed values on genetic distance and LD levels. A two sub-population structure, highly matching the two subspecies in rice, was observed among those accessions in this study ( Fig. 3; Additional file 3: Figure S1). Among 4 aus accessions, DULAR and N22 were grouped into indica while AUS 454 and LAMBAYE-QUE into japonica subpopulation.

Genome-wide association study (GWAS)
Forward model selection procedure provided the largest Bayesian information criteria (BICs) for both traits when zero PCs/covariates were included in the GWAS models (Additional file 4: Table S3). This result suggested that the PCs estimated from SNP data had weak covariance with the phenotypic data. Using -log(P) ≥8.0 as the threshold at a significant level of 0.01 after Bonferroni multiple test correction, a total of 13 loci were declared to have highly significant association with the mesocotyl lengths (MLw). Those associated loci were located on 6 chromosomes of rice, including 3, 3, 1, 2, 2, 2 loci on chromosome 1, 3, 4, 5, 6 and 9, respectively (Fig. 4a). Seven peaks with -log(P) values larger than 10 in Manhattan plot indicated very strong signals of association between the trait and the chromosomal regions, especially four regions on chromosome 3, 5, 6 and 9 which host sharp -log(P) peaks.
The Manhattan plot of MLs shows totally different pattern (Fig. 4b). Only three associated SNPs were detected at the significant level of -log(P) ≥8.0, including two SNPs locating in the same regions associated to MLw on chromosome 3 and 6, one SNP on chromosome 10 with no association to MLw.
As Bonferroni correction was recognized to be too conservative [36], a compromised threshold of -log(P) ≥7.0 was used to screen out a set of 99 SNPs associating to MLw and 7 SNPs to MLs (Additional file 5: Table S4). Among MLw associated SNPs, 52, 16, 24, 3, 3, 1 SNPs located in intergenic regions, intron, promoter, CDSsynonymous, CDS-nonsynonymous and 5′ UTR regions of 36 annotated genes, respectively. Two MLs associated SNPs hit the promoter region of LOC_Os03g40390 while another SNP and the remaining four SNPs located in the intron of LOC_Os10g20860 and the intergenic regions, respectively.
In about 15.7Kb interval (29288539-29304267) on rice chromosome 1, five MLw associated SNPs located in the promoter, CDS-nonsynonymous or intergenic regions of three putative genes (LOC_Os01g50970, LOC_Os01 g50980, LOC_Os01g50990). Those genes have been annotated as expressed protein with unknown function, putatively expressed cullin and FBD domain containing protein, respectively. One associated SNP (0430137498) located in the promoter of rice gene LOC_Os04g51190, annotated as a growth-regulating factor.

Discussion
Retrieving the character of mesocotyl elongation to develop varieties for mechanized dry seeded rice In the past several decades, many labour-saving methods of seedling establishment have been developed and widely used in rice production in Asian countries where hand transplanting rice became common during 1950- Fig. 3 Neighbor joining tree of 270 rice accessions showed a two-subpopulation structure in consistence with the classification of indica (in red) and japonica (in blue) subspecies. Four aus accessions (in green) were grouped into two subpopulations 70s. Among them, mechanized dry seeded rice (MDSR) is probably the system using the least water and labour resource [3][4][5]. As the majority of modern rice varieties were developed for transplanting system in irrigated environments, their performance has not been optimized for direct seeding, especially in drought-prone environments. Early maturing, high-yielding rice varieties that can withstand drought and compete with weeds are urgently required in the dry-seeded rice system. In this case, well establishment and vigorous growth of the rice seedlings become very important [4].
In order to obtain quick and uniform seedling emergence, shallow sowing with a narrow range of depth (e.g. 2-3 cm) is required in drill seeding for most semidwarf rice varieties. Seedling establishment decreases remarkably, together with the delayed seedling emergence and poor early growth, when seeding depth is higher than 5 cm [3]. But shallowly sown seeds are vulnerable to bird damage while the derived plants are possibly sensitive to lodging at late stage [36]. In drought prone areas, the quick lost of moisture in shallow soil layer would cause delayed or failed seed germination and seedling emergence. This is the major reason why the period from pre-irrigation to sowing has critical influence on seedling establishment of DSR [8]. Narrow tolerant range of seeding depth will cause high risk of inadequate management in mechanized seeding if the soil was not finely tilled and levelled or the seed drill did not give precise seed placement. So rice varieties with tolerance to varied seeding depth, would reduce such kind of risk or additional requirements to farm machinery, then facilitate the expanding of mechanized dry seeded rice.
An early observation confirmed the association of mesocotyl elongation with seedling vigor in rice [12] and a wide range of genetic variation of this trait among rice germplasm [11,13,[17][18][19], even though the percentage of germplasm with mesocotyl length higher than 1.0 cm was low, e.g. less than 1 % in a set of 1500 accessions [19]. In this study, 26 accessions had mesocotyl length (MLw) higher than 1.0 cm, showing much higher percentage (10.53 %) than previous reports (Fig. 1). Among 11 accessions with most elongated mesocotyl in this study, there are 7 upland accessions (4 landraces and 3 varieties), accounting for a quite high proportion. Larger genetic variation could be expected in core or mini-core collection of germplasm. And it seems reasonable that more upland rice accessions have highly elongated mesocotyl [18].
A few publications described the failed emergence of semidwarf rice varieties and/or the successful emergence of tall, vigorously growing varieties when sown deep [8,10]. It should be true that most modern rice varieties, developed for transplanting cultivation, have lost the character of mesocotyl elongation. But an important question is whether mesocotyl elongation is tightly linked to plant height. Mgonja et al. found no correlation between mesocotyl elongation and characters of mature plants like plant height and internode length L1 [20]. In this study, the same set of rice accessions were evaluated in field for drought resistance using water regimes (data not shown). Both MLw and MLs are correlated to plant height in both conditions (r = 0.250~0.349; P ≤ 0.01; Additional file 1: Table S1); correlated to grain yield and spikelet fertility in drought treatment, but not in well watered condition. These results did not necessarily indicate the linkage or pleiotropism of loci controlling mesocotyl elongation and plant height or drought resistance. It is more likely the consequences of the high proportion of upland landraces or varieties in the population which had longer mesocotyl, higher plant height and drought resistance at the same time. So development of semidwarf varieties possessing both mesocotyl elongation and drought resistance is necessary for mechanized dry seeded rice and achievable by using those potential germplasm screened in this study.

Mesocotyl elongation QTLs and candidate genes
Among 3-8 QTLs for mesocotyl length reported in different mapping populations [22][23][24][25][26][27], two QTLs (qMel-1, qMel-3) on rice chromosome 1 and 3 were repeatedly detectable and showed large effects across experiments [22-24, 26, 27, 37]. Substitution mapping confined qMel-1 into a 3,799Kb interval from RM5448 to RM5310 and qMel-3 into a 6,964Kb region from RM3513 to RM1238, containing 490 and 700 putative genes, respectively [27]. In this study, one SNP marker at the bottom of chromosome 1 was associated with MLw (P = 2.57E-09), about 0.17 Mb away from the interval of RM5448-RM5310. Strong association signals were detected in qMel-3 region represented by the sharp -log(P) peaks in the Manhattan plots for both MLw and MLs (Fig. 4), including 3 SNPs within a 50 Kb region. The positions of those associated SNPs were not within, but about 2.59 Mb beyond the interval between RM3513 and RM1238. If confirmed in further studies like candidate gene cloning, the results demonstrate the high power of GWAS based on high dense SNPs.
The threshold of genome-wide association test using a large number of SNP markers remains an issue under controversy. Nakagawa suggested that both standard and adjusted Bonferroni procedures should be abandoned because of reduced statistical power [38]. Controlling of false discovery rate (FDR) was introduced by Benjamini [39] and recommended as a better statistical reference to set the threshold of associated loci. In this study, both P values and FDR adjusted P values showed similar effect in locating loci if referring to the peaks of significance above -log(P) ≥6 or -log(FDR adjusted P) ≥3 (Additional file 6: Figure S2A). In general, −log(FDR adjusted P) values increased as -log(P) values did (Additional file 6: Figure S2B). However, −log(FDR adjusted P) values remained unchanged around 3 while -log(P) varied from 6 to 7. Declared at the threshold of -log(FDR adjusted P) ≥3, the number of associated SNPs, 401 for MLw, seems too large. So a compromised threshold at -log(P) ≥7 were used to select significant SNPs (99 for MLw; 7 for MLs). Forty seven SNPs located in different positions of 36 annotated genes (itional file 5, Table S4). Among them, one cullin gene and OsGRF3 had putative functions related to growth regulation. Cullin proteins was found as part of the scaffolds of multiple E 3 ligase [40], including the E 3 ubiquitin ligase SCF TIR1 that mediates ubiquitination of auxin/IAA proteins [41]. The first growth regulating factor gene (OsGRF1) was identified as a transcript factor in rice, responding to gibberellin (GA) and showing potential regulatory role in stem growth [42]. Choi et al. [43] analyzed the expression patterns of OsGRF1 and its 11 homologs in the rice genome. Seven genes showed induced expression by GA 3 . Almost all OsGRF genes had high expression in primary leaves and the highest node containing shoot apical meristem or intercalary meristem and part of the elongation zone. As a candidate gene hit by the associated SNP in our study, OsGRF3 was the only GRF gene that had strong level of expression in mesocotyls and coleoptiles.

Conclusions
Higher proportion and extension of mesocotyl elongation were observed in a population of landraces and varieties from the mini-core collection of Chinese rice germplasm and a collection of parental varieties for drought tolerant rice breeding. High proportion of upland rice accessions within those having top mesocotyl lengths (7 of 11 accessions) could be the cause of the correlation between mesocotyl elongation and drought resistance, implying the important role and reservation of this character in upland rice germplasm. GWAS found 13 loci for mesocotyl length measured in dark germination that confirmed the previously reported co-location of two QTLs across populations and experiments. Associated SNPs hit 36 annotated genes including putatively function-matching candidates like cullin and GRF. The germplasm with elongated mesocotyl, especially upland landraces or varieties, and the associated SNPs could be useful in further studies and breeding of mechanized dry seeded rice.

Rice germplasm and phenotypic experiments
The materials used in this study consisted of two sets of rice germplasm. One is part of the mini-core collection of Chinese rice germplasm, provided by Huazhong Agricultural University and China Agricultural University (170 accessions, denoted as C Collection) [33,44] and a set of varieties collected for the breeding program of water-saving and drought -resistant rice (WDR) [45] by Shanghai Agrobiological Gene Center (100 accessions, denoted as D Collection) (Additional file 7: Table S5).
Two experiments were conducted to measure the mesocotyl length of rice seedlings grown in water (MLw, cm) in darkness or under 5 cm sand layer (MLs, cm) for 10 days. In each of two replications of the dark germination experiment, 20 seeds of each accession were sterilized with 3 % H 2 O 2 solution, rinsed by tap water three times, submerged in water for pre-soaking by 24 h. Then seeds were put on one layer of filter paper above a sponge sheet in a plastic box with cover (L × W × H = 12 × 12 × 2 cm). The boxes were kept in darkness in carton boxes that were placed in the incubator with constant temperature of 25°C. The mesocotyl lengths of five normal seedlings from each box were measured using rulers.
The sand culture experiments had two replications that were arranged with 3d interval to allow quick finish of the measurements in each replication. Stainless steel boxes without bottom (L × W × H = 90 × 30 × 30 cm) were placed on a levelled sand bed. After adding 5 cm sand layer, 12 seeds from each accession were placed on sand surface in a single row (about 2 cm apart between seeds) along the width of the box. The space between two rows is about 5 cm. Another 5 cm sand layer was added over the seeds and saturated with water by sprinkler until leaking from the bottom of the boxes. Mesocotyl lengths of 10 seedlings were measured using rulers after all seedlings were taken out from the sand and washed by water. This experiment was conducted in late May to early June in a green house. The air temperature was within the range from 20 to 38°C while the temperature in sand layer ranged from 20 to 31°C. There were 247 accessions that had effective phenotypic data of both MLw and MLs after removing accessions with missing data caused by inadequate seed samples or failed germination in one experiment or both experiments.
Thirty accessions, including those with longest MLw and a few accessions with low or moderate mesocotyl elongation, were used in an additional experiment to check the mesocotyl elongation when seeds germinated under 5-10 cm layers of sand or soil. This experiment was conducted using the same boxes and procedure as described above, but setting two depth of cover layer and using dry soil as another medium.
ANOVA and Pearson's correlation analysis with twotailed significance were conducted using SPSS v16.0.

Genotyping by re-sequencing and SNP validation
Whole genome re-sequencing was conducted for two germplasm sets using Solexa Hiseq 2000 system. Accessions in the C Collection and D collection were re-sequenced for 2.5 and 5× average genome coverage, respectively. The same pipelines with similar parameters [33], using the softwares BWA, SAMtools and BCFtools [46,47], were used to call SNPs from sequencing reads for both collections using the rice reference genome of Nipponbare (MSU Rice Genome Annotation Project Release 6.1) [48,49]. A merged genotypic data set was built by obtaining the intersectional loci of the two SNP data sets from C and D collections. Imputation procedure was conducted by using FillGenotype program (Filling missing genotype (Fimg), http://www.ncgr.ac.cn/fimg/intr.html) based on K-nearest neighbor (KNN) algorithm, using the default parameters (w = 80, p = −7, k = 5, and f = 0.7) [29]. For the whole set of germplasm, the final genotypic data consists of 1,019,883 SNP loci.
In order to evaluate accuracy of SNP calling and imputation pipeline, a high-density whole-genome SNP array, RiceSNP50 [34], was used to genotype a validation panel of 24 accessions including 9 from C collection and 15 from D collection. DNA amplification, fragmentation, chip hybridization, single base extension, staining and scanning were conducted by Life Science and Technology Center, China National Seed Group Co., LTD (Wuhan, China), according to Infinium HD Assay Ultra Protocol (http://www.illumina.com/). The RiceSNP50 array contains about 51K evenly distributed SNP markers [34]. About 43K SNPs with high quality were used in the comparison with the SNP calls from re-sequencing. The percentages of consistent SNP loci were calculated by dividing the number of identical SNPs by the effective SNP number within the common set of SNP loci (n = 10,851) between array and SNP calls from re-sequencing (Additional file 2: Table S2).

Population structure analysis and genome-wide association mapping
Based on a subset of 144,994 SNPs that had less than 10 % missing data in D Collection (with much lower total SNP number than in C collection) before imputation, we used the Dnadist program to generate a pairwise distance matrix that was used to construct the unrooted and unweighted neighbour-joining tree by the Neighbor program from the software PHYLIP (V3.695, http://evolution.gene tics.washington.edu/phylip.html) [50]. The exported phylogenetic tree in Newick format was modified in format using an online tool Interactive Tree of Life [51]. In addition, the genetic structure of rice population was estimated by the model-based program STRUCTURE version 2.3.4 (http://pritch.bsd.uchicago.edu/structure.html) [52,53]. Adopting an admixture model allowing for correlated allele frequencies among populations, with no linkage model, we used the run-length parameters as the burn-in period of 2,000 and the number of MCMC replications after burn-in of 5,000. Ten independent simulations using K-value ranging from 2 to 11, with eight replications, yielded consistent results. The inferred groups between successive K values were decided to identify the real number of clusters of individuals based on Evanno's methods [54].