Genome-wide association study of important agronomic traits within a core collection of rice (Oryza sativa L.)

Zhang, Peng; Zhong, Kaizhen; Zhong, Zhengzheng; Tong, Hanhua

doi:10.1186/s12870-019-1842-7

Research article
Open access
Published: 17 June 2019

Genome-wide association study of important agronomic traits within a core collection of rice (Oryza sativa L.)

Peng Zhang ORCID: orcid.org/0000-0003-2224-9942¹^na1,
Kaizhen Zhong¹^na1,
Zhengzheng Zhong¹ &
…
Hanhua Tong¹

BMC Plant Biology volume 19, Article number: 259 (2019) Cite this article

4088 Accesses
28 Citations
1 Altmetric
Metrics details

Abstract

Background

Cultivated rice (Oryza sativa L.) is one of the staple food for over half of the world’s population. Thus, improvement of cultivated rice is important for the development of the world. It has been shown that abundant elite genes exist in rice landraces in previous studies.

Results

A genome-wide association study (GWAS) performed with EMMAX for 12 agronomic traits measured in both Guangzhou and Hangzhou was carried out using 150 accessions of Ting’s core collection selected based on 48 phenotypic traits from 2262 accessions of Ting’s collection, the GWAS included more than 3.8 million SNPs. Within Ting’s core collection, which has a simple population structure, low relatedness, and rapid linkage disequilibrium (LD) decay, we found 32 peaks located closely to previously cloned genes such as Hd1, SD1, Ghd7, GW8, and GL7 or mapped QTL, and these loci might be natural variations in the cloned genes or QTL which influence potentially agronomic traits. Furthermore, we also detected 32 regions where new genes might be located, and some peaks of these new candidate genes such as the signal on chromosome 11 for heading days were even higher than that of Hd1. Detailed annotation of these significant loci were shown in this study. Moreover, according to the estimated LD decay distance of 100 to 350 kb on the 12 chromosomes in this study, we found 13 identical significant regions in the two locations.

Conclusions

This research provided important information for further mining these elite genes within Ting’s core collection and using them for rice breeding.

Background

Cultivated rice (Oryza sativa L.) is one of the staple foods for over half of the world’s population. Uncovering the genetic basis of natural variations in important agronomic traits in rice landraces is indispensable for ensuring the world’s food supply.

In general, linkage mapping is a conventional method for gene mining in rice. However, association mapping based on linkage disequilibrium (LD) has been widely used in rice studies since it was firstly reported in maize [1, 2]. Association mapping could overcome the limitations (i.e., limited alleles, high cost and poor mapping resolution) of linkage mapping [3] and enable researchers to use modern genetic technologies for exploiting natural genetic diversity and identifying elite genes in the genome [4]. Furthermore, many candidate genes or loci have been identified in rice through genome-wide association study (GWASs) of agronomic traits [5,6,7,8,9,10], abiotic stress tolerance [11,12,13] and metabolites [14, 15].

A population with diverse landraces or cultivars which could be used in crops GWASs is supposed to be a permanent resource and be rephenotyped for many traits [2]. Sampling populations (e.g., core collections and mini core collections) created from rice landraces might be a suitable choice for rice GWASs [16]. Rice landraces are easier to be utilized for breeding than wild rice because they have greater genetic diversity than elite cultivars and represent an intermediate stage of domestication history between wild rice and cultivars [17]. As early as 1920–1964, Ying Ting collected more than 7128 rice landraces from all over China and from some of the other main rice-cultivating countries. This collection is one of the earliest collections of rice germplasm resources in China and was named Ting’s collection [18]. Moreover, a rice core collection called Ting’s core collection and consisting of 150 accessions selected based on 48 phenotypic traits has been constructed from 2262 accessions of Ting’s collection [18]. In Ting’s core collection, the average polymorphism information content (PIC) is 0.48, and the average genetic diversity is 0.54 [19]. Furthermore, Ting’s core collection has been used in association mapping of 12 agronomic traits [20] and aluminum tolerance [21] with 274 SSR markers. However, no association mapping with higher resolution has been performed for agronomic traits within Ting’s core collection.

In the present study, a GWAS of 12 rice agronomic traits was carried out using Ting’s core collection of rice landraces with more than 3.8 million high-quality 3.8 million SNPs by whole-genome re-sequencing. Regions identified by the GWAS were compared with those identified as QTL and candidate genes in previous studies. This information will be very useful for rice breeders to improve elite cultivars.

Results

Comparison between Ting’s core collection and other populations used in GWASs

Ting’s core collection consists of 150 rice landraces that were collected from 20 different provinces of China and from North Korea, Japan, the Philippines, Brazil, Sulawesi, Java, Oceania, and Vietnam (Additional file 2: Table S1). The number of varieties in Ting’s core collection is lower than that in a population of Chinese rice landraces [5], a global collection [9] and a mini core collection of japonica rice [8], however, the phenotypic diversity in several agronomic traits in Ting’s core collection are comparable to those in above mentioned collections or even higher for some agronomic traits (Fig. 1).

Genome re-sequencing and SNP identification

Whole-genome re-sequencing of Ting’s core collection was performed, resulting in a total of 522.4 Gb of clean data with an average sequencing depth of 7.3× and an average coverage of 82.9% of the reference genome (Additional file 2: Table S2). The distribution of SNP positions along each chromosome are shown in Additional file 1: Figure S1. A total of 3,808,730 SNPs and 391,756 InDels with a minor allele frequency > 0.05 were generated, and 386,562 SNPs were found in the CDS region (Additional file 2: Table S3).

Phenotypic variation

A wide range of phenotypic variation in the 12 agronomic traits was revealed in Ting’s core collection both in Guangzhou and Hangzhou (Fig. 1). Plant height, grain length, grain width, grain length/width, 100 grains weight, flag leaf length, flag leaf width and flag leaf length/width showed similar distributions in the two locations, while heading days, seed set rate, panicle length and panicle number per plant had different distributions in the two locations. The broad-sense heritability ranged from 56.2% (Heading days) to 96.5% (Grain length) for these traits (Fig. 1).

Population structure and LD estimation in Ting’s core collection

We performed PCA to identify the population structure of Ting’s core collection with all SNPs data, and we observed two subpopulations in Ting’s core collection (Fig. 2). The discrimination obtained via a NJ tree based on the SNP data was not identical to that based on Cheng’s index method (Additional file 2: Table S1) [19] and showed fairly consistent results with that from the PCA (Fig. 3). Moreover, the LD dropped to the half of its maximum value at a distance of 100~350 kb on the 12 chromosomes, which is agreement with previous measurements [5, 9, 22, 23] (Additional file 1: Figure S2).

Relative kinship among varieties in Ting’s core collection and the effect of controlling type I error using EMMAX

In Ting’s core collection, most kinship estimates between varieties were zero, and none of the kinship values were larger than 0.5, indicating that these varieties were unrelated (Additional file 1: Figure S3).

Observed versus expected P values for each signal were graphed for estimating the effect of controlling for type I errors. As deviations from expected values demonstrate that the statistical analysis may cause spurious associations [24]. Our result indicated that the false positives were unlikely for all traits except grain length/width for the EMMAX method used in this study (Additional file 1: Figure S4).

GWAS of 12 agronomic traits

A total of 3,808,730 SNPs were included in a GWAS of 12 agronomic traits using the EMMAX method. Only one association signal’s -log₁₀(P) value was higher than 6.58 (this value was the significant threshold in this study, please see methods section)—a signal for heading days (Fig. 4a). Thus, we used -log₁₀(mBF) = 4.97 as the significance threshold for different traits in our study. A total of 1308 and 4272 significant loci were identified for the 12 agronomic traits in Guangzhou and Hangzhou, respectively (Table 1). The top-ranking candidate gene-based association signals for each trait are shown in Additional file 3: Table S4.

Table 1 Summary of association mapping results for 12 agronomic traits using EMMAX

Full size table

Furthermore, Si et al. (2016) indicated that they considered analyzing the 11 predicted genes within the 260-kb interval centered on the index SNP from the GWAS given the estimated LD decay rate of about 100 to 200 kb [25]. Thus, we analyzed whether some of the significant detections for each trait were identical in the two locations according to the estimated distance of LD decay of 100 to 350 kb on the 12 chromosomes (Additional file 1: Figure S2). Three significant regions (located on chromosomes 5, 6 and 7) for seed set rate were detected both in Guangzhou and Hangzhou. Moreover, two significant regions for flag leaf length/width were detected (located on chromosomes 10 and 12) in both locations (Figs. 4b, d, 5a, b, c, d and Table 1). Moreover, we chose the top 16 most significant signals (P value < 1 × 10^− 6) for in-depth analysis (Tables 2 and 3). The significant association signals with smaller P values and higher consecutive peaks for each trait are summarized in Table 3, Figs. 4 and 5, these signals might be located in candidate genes/regions. In addition, a detailed distribution of these new gene-based association signals is included in Additional file 4: Table S5 To confirm the effect of different alleles at the top 16 significant SNPs in the present study, we performed allelic analysis to these SNPs. Accessions in Ting’s core collection carrying different alleles for most of the 16 SNPs showed distinct discrepancies of phenotypes (Fig. 6).

Table 2 Information of new genome-wide significant association signals using EMMAX

Full size table

Table 3 Top highest genome-wide significant association signals of agronomic traits using EMMAX

Full size table

In our study, we also identified some genes that were reported in previous studies according to the estimated distance of LD decay of 100 to 350 kb on the 12 chromosomes. We think a SNP is close to a cloned gene when it locates in 350 kb from the cloned gene. For heading days, significant association signals close to OsMADS51 on chromosome 1, OsPRR1 [26] on chromosome 2, DTH3 [27] on chromosome 3, CKI [28] on chromosome 3, HAF1 [29] on chromosome 4, Hd1 [30] on chromosome 6 and OsMADS13 [31] on chromosome 12 were detected (Fig. 4a and Table 4). For plant height, significant association signals close to SD1 [32], Ghd7 [33] and Ghd8 [34] were identified (Fig. 4b and Table 4). For seed set rate, signals close to SPP1 [35] and Rf-1 [36] were found (Fig. 4c and Table 4). For panicle length, significant association signals close to OsBRI1 [37], LP [38], SSD1 [39], FZP [40], LP1 [41] and SP1 [42] were found (Fig. 4d and Table 4). For grain length, significant association signals close to GS3 [43] and TGW6 [44] were detected (Fig. 4e and Table 4). For grain width, significant association signals close to GW2 [45], GS2 [46], GL3.2 [47], GS5 [48], GS6 [49], TGW6 [44], OsSPL16-GW8 [50] and SLG [51] were detected (Fig. 5a and Table 4). For 100 grains weight, significant association signals close to GW5 [52], TGW6 [43], GL7 [53] and OsSPL16 [50] were identified (Fig. 5b and Table 4).

Table 4 Top highest genome-wide significant association signals of agronomic traits using EMMAX

Full size table

Discussion

The abundant genetic variation in Ting’s core collection makes it an important reservoir of genetic diversity and potential source of beneficial alleles for rice breeding (Fig. 1). It is very difficult to mine and utilize the exotic genes in all the rice accessions (i.e., 775,000) in the world [54] by either linkage mapping or association mapping. The maximum population size used for GWAS was 1495 rice accessions in a previous study [10]. One of the methods of utilizing a large set of germplasm in a GWAS is to construct a core collection [16]. A rice core collection consisting of 150 accessions selected based on 48 morphological traits from 2262 accessions of Ting’s collection has been constructed and used in rice association mapping with low resolution [19, 20]. Therefore, we performed a GWAS by whole-genome re-sequencing for getting higher resolution within Ting’s core collection.

Although the population size of Ting’s core collection is smaller than that of three other populations [5, 8, 9], the phenotypic diversity of several agronomic traits was comparable to that of these populations or even higher for some agronomic traits. Moreover, more than 3.8 million SNPs in Ting’s core collection were developed. The ratio of SNPs to population size in Ting’s core collection is higher than that in previous studies in which the ratio were approximately 3.6 million SNPs to 517 rice landraces [5], 0.04 million SNPs to 413 diverse landraces and cultivars [9], 4.1 million SNPs to 950 worldwide varieties [6], 1.6 million SNPs to 1495 elite hybrid varieties [10] and 0.04 million SNPs to 176 japonica varieties [8]. Furthermore, a simpler population structure (Figs. 2 and 3), more rapid LD decay (Additional file 1: Figure S2) and more distant relatedness (Additional file 1: Figure S3) among accessions were found in Ting’s core collection than in other collections. The above mentioned information illuminates and supports the fact that Ting’s core collection is suitable for GWASs.

Population structure in the present study was not identical to that in our previous study [55, 56]. This discrepancy might be due to molecular markers density used in two studies. In our previous study, 274 SSR markers were included to detect the population structure while about 3.8 million SNPs were used in present study.

A total of 3,808,730 SNPs from 150 varieties were used for the GWAS (Additional file 2: Table S3). A mixed model was performed using EMMAX software [55, 56]. EMMAX not only can correct for a wide range of sample structures by explicitly accounting for pairwise relatedness between individuals, using high-density markers to model the phenotype distribution. But also can reduce computational time [55, 56]. The value obtained from a rough Bonferroni correction of P = 1/n, where n is the total number of markers used in the GWAS, is widely applied as the threshold P value for significance [5,6,7,8, 10]. The threshold P value for significance in our study was P ≤ 2.63 × 10^− 7, corresponding to -log₁₀(P) = 6.58. However, only one peak, i.e., one on chromosome 4 for heading days was higher than this threshold value in Fig. 4a. Hence, we chose a lower -log₁₀(mBF) value as the significance threshold for different traits in our study (Table 1) because there will be no significant locus according to the theoretical threshold P value. We speculated that this result might due to population size in our study. However, Ting’s core collection is suitable for GWASs because the peaks located in well-known genes such as SD1, GS2, GS3, GS5, GL7, GW8 and TGW6 were also much lower than the theoretical threshold value (Figs. 4 and 5).

In our study, some significant association signals were identified through a GWAS of Ting’s core collection. First, loci significantly associated with agronomic traits were uncovered close to cloned genes such as Hd1, SD1, Ghd7, GW8, and GL7 (Figs. 4, 5 and Table 4) that were reported in previous studies. Moreover, some of these loci were located by coincidence in these genes, and they might be natural variations of these genes, which could be functional (Table 2 and Additional file 3: Table S4). Second, Si et al. [25] indicated that some significant loci within the distance of LD decay might be identical to each other. However, there were no identical significant loci in the two locations overall (Table 1), but some identical significant regions were discovered in the two locations when the estimated distance of LD decay of 100 to 350 kb was considered in Ting’s core collection (Table 1, Figs. 4 and 5). Third, some new significant association signals that might be candidate genes were detected in our study (Figs. 4, 5 and Additional file 4: Table S5). Some peaks of these candidate genes such as the peak on chromosome 4 for heading days (Fig. 4a) were even higher than the threshold value. Further, the peak on chromosome 11 for heading days (Fig. 4a) was higher than that of some famous genes such as Hd1. It would be valuable to test the functions of these candidate genes because some loci or regions were also detected by previous studies. For instance, the region on chromosome 8 for plant height, the region at position 23,300,000 on chromosome 1 for heading days and the region at position 21,650,000 on chromosome 2 were found to be significantly associated with related traits in the study of Zhao et al. [9].

Conclusions

In this study, Ting’s core collection showed abundant genetic variation for agronomic traits and was proved to be a suitable natural population that could be comparable to other populations used in previous GWASs. Moreover, according to this study, core collections constructed from large natural populations of other plants might be good choices for GWASs. Furthermore, some natural variations in cloned genes were founded in this study, and these variations could be used for functional analysis of these genes. In addition, new candidate genes identified in this study could be very useful for rice improvement. In sum, this study provided important information for further mining these elite genes within Ting’s core collection and using them for rice breeding.

Methods

Plant material

Ting’s core collection with 150 accessions of rice landraces [18], was used in this study. The information for these accessions is shown in Additional file 2: Table S1.

Phenotyping

In total, 12 agronomic traits of Ting’s core collection were measured in two locations. The methods of measuring these 12 agronomic traits were identical to those described in detail in our previous study [20].

A randomized complete block design with three replications was used in two locations. First, Ting’s core collection was cultivated at the farm of South China Agricultural University, Guangzhou (23°16′’ N, 113°8′ E), during the late season (July–November) in 2009. The design and methods of this research in Guangzhou were described in detail in our previous study [20]. Second, Ting’s core collection was cultivated at the farm of China National Rice Research Institute, Hangzhou (30°3′ N, 120°2′ E), during the late season (May–October) in 2016. A randomized complete block design with three replications, as in Guangzhou, was used during this season in Hangzhou. The space between rows and between plants was set to 26 and 20 cm, respectively. Twenty-four plants of each variety were grown in four rows with 6 plants per row. For each block, the five plants in the middle position of the second and third row of each variety were selected to prevent edge effects. The broad-sense heritability (H²) was calculated as \( {H}^2={\sigma}_{\mathrm{g}}^2/\left({\sigma}_{\mathrm{g}}^2+{\sigma}_{\mathrm{e}}^2\right) \), where \( {\sigma}_{\mathrm{g}}^2 \) is the genetic variance, \( {\sigma}_{\mathrm{e}}^2 \) is the environmental variance.

DNA isolation and genome sequencing

Total genomic DNA was extracted using a modified SDS method. Then, each landrace’s DNA was sheared randomly into ~ 500-bp fragments by Covaris, and the DNA fragments were loaded on 2% agarose gels. Fragments of ~ 500 bp were recovered and purified, and adapters were then added to each fragment. After making libraries for the clusters, they were loaded into an Illumina HiSeq™ 4000 for 2× 150-bp paired-end sequencing at 6~7-fold genome coverage.

The 150-bp paired-end reads were mapped onto the rice reference genome (IRGSP 1.0) using bwamem with the –M option in BWA software [57]. The mapped reads were realigned by using RealignerTargetCreator and IndelRealigner in GATK [58]. UnifiedGenotyper in GATK was used with the −glm BOTH option to label SNPs and indels. After removing nucleotide variants with a missing rate ≥ 0.25 and a minor allele frequency > 0.05, a total of 3,808,730 SNPs and 391,756 indels were generated.

Population genetic analyses

Principal component analysis (PCA), construction of a neighbor-joining (NJ) tree, determination of LD decay level and kinship analysis among landraces were performed based on SNPs. The population structure of the 150 varieties was estimated with PCA by using the software EIGENSTRAT [59]. PHYLIP version 3.695 software (http://evolution.genetics.washington.edu/phylip/getme-new1.html) was used to construct the NJ tree on the basis of similarity measures. The software MEGA V5.2 was used to observe the NJ tree [60]. The LD in Ting’s core collection was evaluated using squared Pearson’s correlation coefficients (r²) calculated with the −r2 command in the software PLINK [61]. A Q matrix was obtained from the membership probability of each variety using ADMIXTURE Version 1.22 software [62]. The Q matrix was used for further association mapping. The Loiselle algorithm was chosen to construct a kinship matrix (K) with the software SPAGeDi [63]. Moreover, all negative kinship values were set to zero.

GWAS

A total of 3,808,730 SNPs from 150 varieties were used for GWAS. A mixed model was performed using EMMAX software [56]. P ≤ 2.63 × 10^− 7 (P = 1/n, n = total number of markers used [7], which is a rough Bonferroni correction, corresponding to -log₁₀(P) = 6.58). However, no significant loci were detected based on this threshold, hence, we calculated another significance threshold, i.e., a minimum Bayes factor (mBF), based on the P value threshold for significance. The mBF was calculated using the following formula: mBF = −e_*P_*ln(P) [64]. Thus, the significance threshold in this study was -log₁₀(P) = 4.97.

Availability of data and materials

The datasets used during the current study are available from the corresponding author on reasonable request.

Abbreviations

CDS:: Coding sequence
DNA:: Deoxyribonucleic acid
EMMAX:: Efficient mixed model association eXpedited
IRGSP:: International Rice Genome Sequencing Project
QTL:: Quantitative trait locus
SDS:: Sodium dodecyl sulfate
SNPs:: Single Nucleotide Polymorphisms
SSR:: Simple Sequence Repeats

References

Remington DL, Thornsberry JM, Matsuoka Y, Wilson LM, Whitt SR, Doebley J, Kresovich S, Goodman MM, Buckler ET. Structure of linkage disequilibrium and phenotypic associations in the maize genome. Proc Natl Acad Sci U S A. 2001;98(20):11479–84.
Article CAS PubMed PubMed Central Google Scholar
Huang X, Han B. Natural variations and genome-wide association studies in crop plants. Annu Rev Plant Biol. 2014;65:531–51.
Article CAS PubMed Google Scholar
Kraakman A, Niks RE, Van den Berg P, Stam P, Van Eeuwijk FA. Linkage disequilibrium mapping of yield and yield stability in modern spring barley cultivars. Genetics. 2004;168(1):435–46.
Article CAS PubMed PubMed Central Google Scholar
Zhu C, Gore M, Buckler ES, Yu J. Status and prospects of association mapping in plants. Plant Genome. 2008;1(1):5–20.
Article CAS Google Scholar
Huang X, Wei X, Sang T, Zhao Q, Feng Q, Zhao Y, Li C, Zhu C, Lu T, Zhang Z, et al. Genome-wide association studies of 14 agronomic traits in rice landraces. Nat Genet. 2010;42(11):961–76.
Article CAS PubMed Google Scholar
Huang X, Zhao Y, Wei X, Li C, Wang A, Zhao Q, Li W, Guo Y, Deng L, Zhu C, et al. Genome-wide association study of flowering time and grain yield traits in a worldwide collection of rice germplasm. Nat Genet. 2012;44(1):32–53.
Article Google Scholar
Yang W, Guo Z, Huang C, Duan L, Chen G, Jiang N, Fang W, Feng H, Xie W, Lian X, et al. Combining high-throughput phenotyping and genome-wide association studies to reveal natural genetic variation in rice. Nat Commun. 2014;5:5087.
Article CAS PubMed Google Scholar
Yano K, Yamamoto E, Aya K, Takeuchi H, Lo P, Hu L, Yamasaki M, Yoshida S, Kitano H, Hirano K, et al. Genome-wide association study using whole-genome sequencing rapidly identifies new genes influencing agronomic traits in rice. Nat Genet. 2016;48(8):927.
Article CAS PubMed Google Scholar
Zhao K, Tung C, Eizenga GC, Wright MH, Ali ML, Price AH, Norton GJ, Islam MR, Reynolds A, Mezey J, et al. Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa. Nat Commun. 2011;(9):467.
Huang X, Yang S, Gong J, Zhao Y, Feng Q, Gong H, Li W, Zhan Q, Cheng B, Xia J, et al. Genomic analysis of hybrid rice varieties reveals numerous superior alleles that contribute to heterosis. Nat Commun. 2015;6:6258.
Article CAS PubMed Google Scholar
Famoso AN, Zhao K, Clark RT, Tung C, Wright MH, Bustamante C, Kochian LV, McCouch SR. Genetic architecture of aluminum tolerance in rice (Oryza sativa) determined through genome-wide association analysis and QTL mapping. PLoS Genet. 2011;7:e10022218.
Article Google Scholar
Ueda Y, Frimpong F, Qi Y, Matthus E, Wu L, Hoeller S, Kraska T, Frei M. Genetic dissection of ozone tolerance in rice (Oryza sativa L.) by a genome-wide association study. J Exp Bot. 2015;66(1):293–306.
Article CAS PubMed Google Scholar
Norton GJ, Douglas A, Lahner B, Yakubova E, Guerinot ML, Pinson SRM, Tarpley L, Eizenga GC, McGrath SP, Zhao F, et al. Genome wide association mapping of grain arsenic, copper, molybdenum and zinc in rice (Oryza sativa L.) grown at four international field sites. PLoS One. 2014;9:e896852.
Google Scholar
Chen W, Gao Y, Xie W, Gong L, Lu K, Wang W, Li Y, Liu X, Zhang H, Dong H, et al. Genome-wide association analyses provide genetic and biochemical insights into natural variation in rice metabolism. Nat Genet. 2014;46(7):714–21.
Article CAS PubMed Google Scholar
Matsuda F, Nakabayashi R, Yang Z, Okazaki Y, Yonemaru J, Ebana K, Yano M, Saito K. Metabolome-genome-wide association study dissects genetic architecture for generating natural variation in rice secondary metabolism. Plant J. 2015;81(1):13–23.
Article CAS PubMed Google Scholar
Zhang P, Zhong K, Shahid MQ, Tong H. Association analysis in rice: from application to utilization. Front Plant Sci. 2016;7:1202.
PubMed PubMed Central Google Scholar
Londo JP, Chiang YC, Hung KH, Chiang TY, Schaal BA. Phylogeography of Asian wild rice, Oryza rufipogon, reveals multiple independent domestications of cultivated rice, Oryza sativa. Proc Natl Acad Sci U S A. 2006;103(25):9578–83.
Article CAS PubMed PubMed Central Google Scholar
Li X, Lu Y, Li J, Xu H, Shahid MQ. Strategies on sample size determination and qualitative and quantitative traits integration to construct core collection of rice (Oryza sativa). Rice Sci. 2011;18(1):46–55.
Article Google Scholar
Zhang P, Li J, Li X, Liu X, Zhao X, Lu Y. Population structure and genetic diversity in a rice core collection (Oryza sativa L.) investigated with SSR markers. PLoS One. 2011;6:e2756512.
Article Google Scholar
Zhang P, Liu X, Tong H, Lu Y, Li J. Association mapping for important agronomic traits in core collection of rice (Oryza sativa L.) with SSR markers. PLoS One. 2014;9:e11150810.
Google Scholar
Zhang P, Zhong K, Tong H, Shahid MQ, Li J. Association mapping for aluminum tolerance in a core collection of rice landraces. Front Plant Sci. 2016;7:1415.
PubMed PubMed Central Google Scholar
Mather KA, Caicedo AL, Polato NR, Olsen KM, McCouch S, Purugganan MD. The extent of inkage disequilibrium in rice (Oryza sativa L.). Genetics. 2007;177(4):2223–32.
Article CAS PubMed PubMed Central Google Scholar
McNally KL, Childs KL, Bohnert R, Davidson RM, Zhao K, Ulat VJ, Zeller G, Clark RM, Hoen DR, Bureau TE, et al. Genomewide SNP variation reveals relationships among landraces and modern varieties of rice. Proc Natl Acad Sci U S A. 2009;106(30):12273–8.
Article CAS PubMed PubMed Central Google Scholar
Yang X, Yan J, Shah T, Warburton ML, Li Q, Li L, Gao Y, Chai Y, Fu Z, Zhou Y, et al. Genetic analysis and characterization of a new maize association mapping panel for quantitative trait loci dissection. Theor Appl Genet. 2010;121(3):417–31.
Article PubMed Google Scholar
Si L, Chen J, Huang X, Gong H, Luo J, Hou Q, Zhou T, Lu T, Zhu J, Shangguan Y, et al. OsSPL13 controls grain size in cultivated rice. Nat Genet. 2016;48(4):447–56.
Article CAS PubMed Google Scholar
Kim SL, Lee S, Kim HJ, Nam HG, An G. OsMADS51 is a short-day flowering promoter that functions upstream of Ehd1, OsMADS14, and Hd3a. Plant Physiol. 2008;147(1):438.
Article CAS Google Scholar
Murakami M, Ashikari M, Miura K, Yamashino T, Mizuno T. The evolutionarily conserved OsPRR quintet: Rice pseudo-response regulators implicated in circadian rhythm. Plant Cell Physiol. 2003;44(11):1229–36.
Article CAS PubMed Google Scholar
Bian X, Liu X, Zhao Z, Jiang L, Gao H, Zhang Y, Zheng M, Chen L, Liu S, Zhai H, et al. Heading date gene, dth3 controlled late flowering in O. Glaberrima Steud. By down-regulating Ehd1. Plant Cell Rep. 2011;30(12):2243–54.
Article CAS PubMed Google Scholar
Dai C, Xue H. Rice early flowering1, a CKI, phosphorylates DELLA protein SLR1 to negatively regulate gibberellin signalling. EMBO J. 2010;29(11):1916–27.
Article CAS PubMed PubMed Central Google Scholar
Yang Y, Fu D, Zhu C, He Y, Zhang H, Liu T, Li X, Wu C. The RING-finger ubiquitin ligase HAF1 mediates heading date 1 degradation during photoperiodic flowering in rice. Plant Cell. 2015;27(9):2455–68.
Article CAS PubMed PubMed Central Google Scholar
Yano M, Katayose Y, Ashikari M, Yamanouchi U, Monna L, Fuse T, Baba T, Yamamoto K, Umehara Y, Nagamura Y, et al. Hd1, a major photoperiod sensitivity quantitative trait locus in rice, is closely related to the arabidopsis flowering time gene CONSTANS. Plant Cell. 2000;12(12):2473–83.
Article CAS PubMed PubMed Central Google Scholar
Hu Y, Liang W, Yin C, Yang X, Ping B, Li A, Jia R, Chen M, Luo Z, Cai Q, et al. Interactions of OsMADS1 with floral homeotic genes in rice flower development. Mol Plant. 2015;8(9):1366–84.
Article CAS PubMed Google Scholar
Sasaki A, Ashikari M, Ueguchi-Tanaka M, Itoh H, Nishimura A, Swapan D, Ishiyama K, Saito T, Kobayashi M, Khush GS, et al. Green revolution: a mutant gibberellin-synthesis gene in rice. Nature. 2002;416(6882):701–2.
Article CAS PubMed Google Scholar
Xue W, Xing Y, Weng X, Zhao Y, Tang W, Wang L, Zhou H, Yu S, Xu C, Li X, et al. Natural variation in Ghd7 is an important regulator of heading date and yield potential in rice. Nat Genet. 2008;40(6):761–7.
Article CAS PubMed Google Scholar
Yan W, Wang P, Chen H, Zhou H, Li Q, Wang C, Ding Z, Zhang Y, Yu S, Xing Y, et al. A major QTL, Ghd8, plays pleiotropic roles in regulating grain productivity, plant height, and heading date in rice. Mol Plant. 2011;4(2):319–30.
Article CAS PubMed Google Scholar
Liu T, Mao D, Zhang S, Xu C, Xing Y. Fine mapping SPP1, a QTL controlling the number of spikelets per panicle, to a BAC clone in rice (Oryza sativa). Theor Appl Genet. 2009;118(8):1509–17.
Article CAS PubMed Google Scholar
Akagi H, Nakamura A, Yokozeki-Misono Y, Inagaki A, Takahashi H, Mori K, Fujimura T. Positional cloning of the rice Rf-1 gene, a restorer of BT-type cytoplasmic male sterility that encodes a mitochondria-targeting PPR protein. Theor Appl Genet. 2004;108(8):1449–57.
Article CAS PubMed Google Scholar
Nakamura A, Fujioka S, Sunohara H, Kamiya N, Hong Z, Inukai Y, Miura K, Takatsuto S, Yoshida S, Ueguchi-Tanaka M, et al. The role of OsBRI1 and its homologous genes, OsBRL1 and OsBRL3, in rice. Plant Physiol. 2006;140(2):580–90.
Article CAS PubMed PubMed Central Google Scholar
Li M, Tang D, Wang K, Wu X, Lu L, Yu H, Gu M, Yan C, Cheng Z. Mutations 1 in the F-box gene LARGER PANICLE improve the panicle architecture and enhance the grain yield in rice. Plant Biotechnol J. 2011;9(9):1002–13.
Article CAS PubMed Google Scholar
Asano K, Miyao A, Hirochika H, Kitano H, Matsuoka M, Ashikari M. SSD1, which encodes plant-specific novel protein, controls plant elongation by regulating cell division in rice. Proc Jpn Acad Ser B Phys Biol Sci. 2010;86(3):265–73.
Article CAS PubMed PubMed Central Google Scholar
Komatsu M, Chujo A, Nagato Y, Shimamoto K, Kyozuka J. FRIZZY PANICLE is required to prevent the formation of axillary meristems and to establish floral meristem identity in rice spikelets. Development. 2003;130(16):3841–50.
Article CAS PubMed Google Scholar
Liu E, Liu Y, Wu G, Zeng S, Thi TGT, Liang L, Liang Y, Dong Z, She D, Wang H, et al. Identification of a candidate gene for panicle length in rice (Oryza sativa L.) via association and linkage analysis. Front Plant Sci. 2016;7:596.
PubMed PubMed Central Google Scholar
Li S, Qian Q, Fu Z, Zeng D, Meng X, Kyozuka J, Maekawa M, Zhu X, Zhang J, Li J, et al. Short panicle1 encodes a putative PTR family transporter and determines rice panicle size. Plant J. 2009;58(4):592–605.
Article CAS PubMed Google Scholar
Fan C, Xing Y, Mao H, Lu T, Han B, Xu C, Li X, Zhang Q. GS3, a major QTL for grain length and weight and minor QTL for grain width and thickness in rice, encodes a putative transmembrane protein. Theor Appl Genet. 2006;112(6):1164–71.
Article CAS PubMed Google Scholar
Ishimaru K, Hirotsu N, Madoka Y, Murakami N, Hara N, Onodera H, Kashiwagi T, Ujiie K, Shimizu B, Onishi A, et al. Loss of function of the IAA-glucose hydrolase gene TGW6 enhances rice grain weight and increases yield. Nat Genet. 2013;45(6):707.
Article CAS PubMed Google Scholar
Song X, Huang W, Shi M, Zhu M, Lin H. A QTL for rice grain width and weight encodes a previously unknown RING-type E3 ubiquitin ligase. Nat Genet. 2007;39(5):623–30.
Article CAS PubMed Google Scholar
Hu J, Wang Y, Fang Y, Zeng L, Xu J, Yu H, Shi Z, Pan J, Zhang D, Kang S, et al. A rare allele of GS2 enhances grain size and grain yield in rice. Mol Plant. 2015;8(10):1455–65.
Article CAS PubMed Google Scholar
Xu F, Fang J, Ou S, Gao S, Zhang F, Du L, Xiao Y, Wang H, Sun X, Chu J, et al. Variations in CYP78A13 coding region influence grain size and yield in rice. Plant Cell Environ. 2015;38(4):800–11.
Article CAS PubMed Google Scholar
Li Y, Fan C, Xing Y, Jiang Y, Luo L, Sun L, Shao D, Xu C, Li X, Xiao J, et al. Natural variation in GS5 plays an important role in regulating grain size and yield in rice. Nat Genet. 2011;43(12):1266–9.
Article CAS PubMed Google Scholar
Sun L, Li X, Fu Y, Zhu Z, Tan L, Liu F, Sun X, Sun X, Sun C. GS6, a member of the GRAS gene family, negatively regulates grain size in rice. J Integr Plant Biol. 2013;55(10):938–49.
CAS PubMed Google Scholar
Wang S, Wu K, Yuan Q, Liu X, Liu Z, Lin X, Zeng R, Zhu H, Dong G, Qian Q, et al. Control of grain size, shape and quality by OsSPL16 in rice. Nat Genet. 2012;44(8):950.
Article CAS PubMed Google Scholar
Feng Z, Wu C, Wang C, Roh J, Zhang L, Chen J, Zhang S, Zhang H, Yang C, Hu J, et al. SLG controls grain size and leaf angle by modulating brassinosteroid homeostasis in rice. J Exp Bot. 2016;67(14):4241–53.
Article CAS PubMed PubMed Central Google Scholar
Liu J, Chen J, Zheng X, Wu F, Lin Q, Heng Y, Tian P, Cheng Z, Yu X, Zhou K, et al. GW5 acts in the brassinosteroid signalling pathway to regulate grain width and weight in rice. Nat Plants. 2017;3:17043.
Article CAS PubMed Google Scholar
Wang Y, Xiong G, Hu J, Jiang L, Yu H, Xu J, Fang Y, Zeng L, Xu E, Xu J, et al. Copy numbe variation at the GL7 locus contributes to grain size diversity in rice. Nat Genet. 2015;47(8):944.
Article CAS PubMed Google Scholar
FAO, The second report on the state of the world's plant genetic 1 resources for food and agriculture. Commission on genetic resources for food and agriculture, 2010.
Google Scholar
Kang HM, Sul JH, Service SK, Zaitlen NA, Kong SY, Freimer NB, Sabatti C, Eskin E. Variance component model to account for sample structure in genome-wide association studies. Nat Genet. 2010;42(4):348–54.
Article CAS PubMed PubMed Central Google Scholar
Li H, Durbin R. Fast and accurate long-read alignment with burrows-wheeler transform. Bioinformatics. 2010;26:589.
Article PubMed PubMed Central Google Scholar
DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, Philippakis AA, Del Angel G, Rivas MA, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011;43:491.
Article CAS PubMed PubMed Central Google Scholar
Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006;38(8):904–9.
Article CAS PubMed Google Scholar
Tamura K, Dudley J, Nei M, Kumar S. MEGA4: molecular evolutionary genetics analysis (MEGA) software version 4.0. Mol Biol Evol. 2007;24(8):1596–9.
Article CAS PubMed Google Scholar
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–75.
Article CAS PubMed PubMed Central Google Scholar
Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19(9):1655–64.
Article CAS PubMed PubMed Central Google Scholar
Hardy OJ, Vekemans X. SPAGeDi: a versatile computer program to analyse spatial genetic structure at the individual or population levels. Mol Ecol Notes. 2002:618.
Article Google Scholar
Goodman SN. Of p-values and Bayes: a modest proposal. Epidemiology. 2001;12:295–7.
Article CAS PubMed Google Scholar

Download references

Acknowledgments

We are grateful to Dr. Jinquan Li from Max Planck Institute for Plant Breeding Research for his advice and assistance and Dr. Xiangdong Liu from South China Agricultural University for supplying Ting’s core collection. We would like to thank the anonymous reviewers for valuable suggestions and American Journal Experts (https://www.aje.com) for English language editing.

Funding

This work was supported by three funds of the National Natural Science Foundation of China (31701401, 31872862 and 31601287), a fund from Zhejiang Province Public Welfare Technology Application Research Project (LGN19C130005), a fund from the State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources (SKLCUSA-b201713), as well as a fund from Shanghai Agrobiological Gene Center (201503). The funding bodies did not play any role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.

Author information

Peng Zhang and Kaizhen Zhong contributed equally to this study.

Authors and Affiliations

State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, 310006, China
Peng Zhang, Kaizhen Zhong, Zhengzheng Zhong & Hanhua Tong

Authors

Peng Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Kaizhen Zhong
View author publications
You can also search for this author in PubMed Google Scholar
Zhengzheng Zhong
View author publications
You can also search for this author in PubMed Google Scholar
Hanhua Tong
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Conceived and designed the experiments: PZ and HT. Performed the experiments: PZ and KZ. Analyzed the data: PZ, KZ, ZZ and HT. Contributed reagents/materials/analysis tools: PZ and HT. Wrote the paper: PZ, KZ and HT. All authors read and approved the final manuscript

Corresponding authors

Correspondence to Peng Zhang or Hanhua Tong.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1:

Figure S1. SNP distribution along position in each chromosome. Figure S2. Genome-wide average LD decay estimated in Ting’s core collection on 12 chromosomes. Figure S3. Distribution of pairwise relative 1 kinship values based on 3.8 million SNPs in Ting’s core collection. The height of blue bar represents the percentage of varieties in different range of kinships. Figure S4. Plots of observed versus expected P-values using EMMAX for 12 agronomic traits. Red symbol represents expected P-values, and Blue symbol represents observed P-values. (DOCX 1084 kb)

Additional file 2:

Table S1. Accessions, variety names, origin and germplasm types of 150 rice varieties in Ting’s core collection. Table S2. Re-sequencing average read depth and coverage in Ting’s core collection. Table S3. Summary of categorized SNPs and InDels. (DOC 246 kb)

Additional file 3:

Table S4. List of all P-value ranked genes in the gene-based association analysis of heading days/plant height/seed set rate/panicle length/grain length/100 grains weight/flag leaf length/flag leaf width/panicle number per plant. (XLSX 169 kb)

Additional file 4:

Table S5. List of new loci in association analysis of heading days/plant height/seed set rate/panicle length/grain length/grain width/100 grains weight/ flag leaf length/panicle number per plant. (XLSX 51 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Cite this article

Zhang, P., Zhong, K., Zhong, Z. et al. Genome-wide association study of important agronomic traits within a core collection of rice (Oryza sativa L.). BMC Plant Biol 19, 259 (2019). https://doi.org/10.1186/s12870-019-1842-7

Download citation

Received: 16 July 2018
Accepted: 21 May 2019
Published: 17 June 2019
DOI: https://doi.org/10.1186/s12870-019-1842-7

Genome-wide association study of important agronomic traits within a core collection of rice (Oryza sativa L.)

Abstract

Background

Results

Conclusions

Background

Results

Comparison between Ting’s core collection and other populations used in GWASs

Genome re-sequencing and SNP identification

Phenotypic variation

Population structure and LD estimation in Ting’s core collection

Relative kinship among varieties in Ting’s core collection and the effect of controlling type I error using EMMAX

GWAS of 12 agronomic traits

Discussion

Conclusions

Methods

Plant material

Phenotyping

DNA isolation and genome sequencing

Population genetic analyses

GWAS

Availability of data and materials

Abbreviations

References

Acknowledgments

Funding

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Additional information

Publisher’s Note

Additional files

Additional file 1:

Additional file 2:

Additional file 3:

Additional file 4:

Rights and permissions

About this article

Cite this article

Share this article

Keywords

BMC Plant Biology

Contact us