Multi-locus genome-wide association studies for five yield-related traits in rice
BMC Plant Biology volume 21, Article number: 364 (2021)
Improving the overall production of rice with high quality is a major target of breeders. Mining potential yield-related loci have been geared towards developing efficient rice breeding strategies. In this study, one single-locus genome-wide association studies (SL-GWAS) method (MLM) in conjunction with five multi-locus genome-wide association studies (ML-GWAS) approaches (mrMLM, FASTmrMLM, pLARmEB, pKWmEB, and ISIS EM-BLASSO) were conducted in a panel consisting of 529 rice core varieties with 607,201 SNPs.
A total of 152, 106, 12, 111, and 64 SNPs were detected by the MLM model associated with the five yield-related traits, namely grain length (GL), grain width (GW), grain thickness (GT), thousand-grain weight (TGW), and yield per plant (YPP), respectively. Furthermore, 74 significant quantitative trait nucleotides (QTNs) were presented across at least two ML-GWAS methods to be associated with the above five traits successively. Finally, 20 common QTNs were simultaneously discovered by both SL-GWAS and ML-GWAS methods. Based on genome annotation, gene expression analysis, and previous studies, two candidate key genes (LOC_Os09g02830 and LOC_Os07g31450) were characterized to affect GW and TGW, separately.
These outcomes will provide an indication for breeding high-yielding rice varieties in the immediate future.
Rice (Oryza sativa L.) is one of the three major food crops supporting more than 50% of the whole population worldwide . In 2018, the total rice output accounted for 32.24% of the total grain production in China followed by maize (http://www.stats.gov.cn/). While in the world, the average production of rice from 1994 to 2019 is 654.78 million tonnes per year, accounting for 27.28% of total cereals output (http://www.fao.org/faostat/en/#data/QC/visualize). The growing global population and the deteriorating environment issue new challenges to the breeding of high-yielding crops . Rice yield is a complex quantitative agronomic trait multiplicatively governed by three major components as the number of grains per panicle, thousand-grain weight, and the number of panicles per plant . Besides, grain size (including grain length, width, and thickness) are also closely related to rice productivity . A previous study reported that rice yield was a representative quantitative trait regulated by several minor genes, and a more efficient tool was needed to develop for exploiting these minor QTLs . Many genes have been reported controlling grain size, grain number, and yield. For example, the GS3  is a major gene controlling rice grain length (GL). A mutation in the second exon changes a cysteine codon (TGC) to a termination codon (TGA) at the protein level, resulting in a diversity of rice GL. The GW5  is an IQ calmodulin-binding motif family protein, which regulates rice grain width and weight. Loss-of-function gw5 showed wider grain compared to the wild type. The WTG1/ OsOTUB1  encodes an otubain-like protease with deubiquitination activity, which expresses in developing grains and panicles. The overexpression of WTG1 showed narrow, thin, and long rice grains as a result of slim cells. The OsSPL13 is a SQUAMOSA promoter-binding-like protein [9, 10], which was reported controlling rice grain length, grain number, grain size, and yield.
The genome-wide association study (GWAS) has become a powerful tool for mining QTL associated with complex traits [11, 12]. Single-locus GWAS (SL-GWAS) methods such as mixed linear model (MLM) , efficient mixed-model association eXpedited (EMMAX) , and factored spectrally transformed linear mixed models (FaST-LMM)  have been widely used to investigate tremendous genetic variants for agronomic traits. However, these SL-GWAS methods are limited in detecting marginal effects QTNs influenced by the polygenic background and stringent Bonferroni correction .
To address the shortcomings of SL-GWAS, multi-locus GWAS (ML-GWAS) has been developed as a multi-dimensional genome scan method in which the effects of all markers are estimated at the same time . In particular, to solve the problem associated with co-factor selection in the ML-GWAS model when there are many markers, the mrMLM package was proposed, which containing the following six ML-GWAS methodologies: mrMLM (multi-locus random-SNP-effect MLM) , FASTmrMLM (fast mrMLM) , ISIS EM-BLASSO (iterative modified-sure independence screening expectation-maximization-Bayesian least absolute shrinkage and selection operator) , pKWmEB (integration of Kruskal-Wallis test with empirical Bayes) , FASTmrEMMA (fast multi-locus random-SNP-effect efficient mixed model analysis) , and pLARmEB (polygenic-background-control-based least angle regression plus empirical Bayes) . ML-GWAS also has a lower false-positive rate and has been applied successfully to identify significant QTNs with subtle contributions for several agronomic [23,24,25,26]. But no studies have focused on ML-GWAS for yield-related traits in rice as yet. In general, the QTL (Quantitative Trait Loci) refers to the signal identified by single-locus methods, such as GLM, MLM, etc. In such QTL, it mostly contained numerous associated SNPs (Single nucleotide polymorphisms). While in multi-locus GWAS methods, when all the potentially associated markers were identified in the first step. These markers were submitted into a model in further analysis and true QTNs (Quantitative Trait Nucleotides) were further confirmed by the likelihood ratio test .
In the current study, a large-scale natural population of 529 rice accessions with five yield-related traits and 607,201 SNPs was conducted by a hybrid method of one SL-GWAS (MLM) algorithm and five ML-GWAS (mrMLM, FASTmrEMMA, pLARmEB, pKWmEB, and ISIS EM-BLASSO) models. We aim to investigate common QTNs via multiple methodologies and then deduce potential candidate genes to accelerate molecular marker-assisted breeding and boost rice production.
Five yield-related traits (including GL, GW, GT, TGW, and YPP) were selected to examine whether significant phenotypic variances exist in the yield among the 529 rice varieties. The results manifested that the parameters were varied for accessions to their corresponding traits (Table S1). For instance, the GL ranged from 6.13 to 10.97 mm, with a mean of 8.57 mm. The YPP had a great variation ranging from 4.41 to 92.66 g, whereas the GT possessed the smallest range from 1.61 to 2.45 mm with a CV of 6.86%. The CV of GL, GW, TGW, and YPP were 10.25, 12.36, 16.09, and 44.90%, respectively. Also, the frequency distributions of all five traits obeyed approximately the normal distributions.
Furthermore, the Pearson correlation coefficients (PCC) among the five traits were also estimated (Fig. 1). All paired traits showed statistically significant differences at the p-value< 0.001 except the relationship between TGW and YPP (p-value< 0.05). GL and GW had a negative relation with PPC = − 0.40, which was corresponding with a previous study . GL was also associated negatively with GT (PCC = -0.21) while positively with TGW (PCC = 0.41) and YPP (PCC = 0.17), respectively. In addition, TGW was observed to positively correlate to GT (PCC = 0.54), GL (PCC = 0.41), and GW (PCC = 0.36), indicating grain size might make a major contribution to grain weight. These results exhibited that there was a close relationship among the five rice traits, which played an important role in regulating the rice grain shape and productivity.
Population structure and linkage disequilibrium analysis
To understand the population structure of the panel, PCA analysis was performed using 607,201 SNPs, which was mentioned in the Methods section. Five conceivable subpopulations were respectively distinguished via PC1, PC2, and PC3 (Fig. 2a and b). Next, a maximum likelihood phylogenetic tree was analysed by their genetic distances, which was derived from the SNP differences in these genotypes. The population could be divided into six distinct subgroups, 95 indica I (IndI), 74 indica II (IndII), 43 tropical japonica (TrJ), 93 temperate japonica (TeJ), 46 aus, and 178 admixture of the others (Adm), respectively (Fig. 2c). Based on the results from both the phylogenetic tree and PCA, the panel was separated into six groups.
The LD decay distance was further estimated using the identified SNPs. As delineated in Fig. 2d, the genome-wide LD decay rate of all populations was approximately 43 kb, where the r2 dropped to half of the maximum value. Due to that, the theoretical average marker density was one SNP per 9 kb. In fact, the actual value of SNP density in the genome already reached 0.6 (kb/SNP) as the distribution of these SNPs within the whole genome was summarized in Fig. S1. Therefore, we concluded that these markers were sufficiently dense for detecting the associated QTNs.
SL-GWAS and ML-GWAS analyses
All five yield-related traits were analyzed using one SL-GWAS (MLM) to identify QTL and five ML-GWAS (mrMLM, FASTmrMLM, pLARmEB, pKWmEB, and ISIS EM-BLASSO) methods to identify QTNs (Fig. S2). As for MLM, 152, 106, 12, 111, and 64 SNPs corresponding to 3, 19, 8, 53, and 56 QTLs were found to be tightly associated with GL, GW, GT, TGW, and YPP under the cut-off criterion of p-value =1.65 × 10− 6, respectively (Table S2). As figured in QQ plots (Fig. S2f and 2i), the curves of GL and TGW were consistent with optimal trends, implying that the false-positive errors were controlled well and the results of the MLM model were reliable.
A total of 74 significant QTNs (LOD ≥ 3) were simultaneously defined to be associated with the above five objective traits by at least two ML-GWAS methods (Table 1). Among these QTNs, 19, 9, 7, 22, and 17 were found to be associated with GL, GW, GT, TGW, and YPP, respectively. A total of 22 correlated QTNs were distinguished for TGW, which were widely located on all 12 chromosomes. For GW, 9 candidate QTNs were distributed on chromosomes 3, 4, 5, 9, 10, and 12. A total of 17 QTN hotspots were detected significantly related to YPP, spread over 2, 3, 4, 5, 6, 7, 8, 10, 11, and 12 chromosomes. Of these, six QTNs were found simultaneously using at least three ML-GWAS methods (qYPP-3-1, qYPP-4-2, qYPP-5-2, qYPP-7-2, qYPP-10-1, and qYPP-10-2). Notably, qYPP-7-2 was determined across all five ML-GWAS approaches, explaining the 0.28 ~ 1.79% of the phenotypic alteration. Five QTNs (qTGW-5-1, qTGW-5-2, qGW-5-2, qYPP-7-2, and qYPP-10-2) were mapped by four or more ML-GWAS models.
Moreover, we compared 161 published rice yield-related genes’ locations with 74 significant QTNs and their genomic ranges (300 kb up- and down-stream around the associated QTNs) (Fig. 3 and Table S3). Nearly one-third of QTNs overlapped with the known genes in total. For example, qGL-3-3, qGL-3-4, and qGL-3-5 overlapped with GS3 (Grain Size 3). Stunningly, three QTLs, qGL-5-1, qGW-5-1, and qGW-5-2, controlling multiple traits (GL and GW) simultaneously existed in the same region on chromosome 5, which were adjacent to GW5 (Grain Width 5). These regions are generally regarded as pleiotropic. Additionally, we found 51 novel QTLs such as qYPP-4-1, qGL-10-1, qTGW-11–2, and qTGW-12–1 without coinciding with or adjacent to the known genes.
In addition, we compared the results of ML-GWAS and SL-GWAS and a total of 20 common QTNs were identified (Table S4). Four, four, seven, and five QTNs were discovered associated with GL, GW, TGW, and YPP, respectively. While no common QTN for GT was identified in this study.
Prediction of potential candidate genes
Only the QTNs simultaneously detected by both ML-GWAS and SL-GWAS were further analysed. The qTGW-7-1, which was located at 18,639,992 bp on chromosome 7, was identified associated with TGW using both SL-GWAS and ML-GWAS methods. This QTN was detected tightly related to TGW with LARmEB, pKWmEB, and ISIS EM-BLASSO methods with the LOD ranged from 3.57 to 11.64 (Table 1). In the MLM method, this SNP was also significantly (p-value = 3.62× 10− 9) associated with TGW with an R2 of 7.79% (Table S2). Then, a local LD block (18,623,910–18,644,333 bp) was defined (Fig. 4a) with the step we mentioned in the Method section. Based on genome-wide annotation information, LOC_Os07g31440 and LOC_Os07g31450 were extracted from this region. Among them, LOC_Os07g31440 encodes an expressed protein with unknown function. The LOC_Os07g31450, also known as CHR729/CRL6, is a CHD (Chromodomain, helicase/ATPase, and DNA-binding domain) protein. Then, we defined four haplotypes of LOC_Os07g31450 (HapA, HapB, HapC, and HapD) based on the missense mutations in the gene. The accessions with the favorable HapD displayed significantly higher TGW than those with the HapA, HapB, and HapC types (Fig. 4b). These findings revealed that the grain weights of the accessions with favorable haplotype variations were predominantly improved compared to those with unfavourable variations. A previous study reported that CHR729 expresses ubiquitously, such as in stems, leaves, leaf sheaths, young panicles, and flower organs . To further explore the expression pattern of LOC_Os07g31450 in different tissues, we utilized the CREP database to analyse and found LOC_Os07g31450 had the highest expression level in the young panicles (< 1 mm, 3–5 mm, and 10–15 mm) (Fig. 4c).
The qGW-9-2 was another QTN simultaneously detected by both SL-GWAS and ML-GWAS. This QTN was identified significantly associated with GW using pLARmEB and pKWmEB methods with the LOD value of 6.74 and 7.74, respectively. Moreover, this site was also detected by the MLM method with the p-value of × 10− 8. A 50.7 kb LD block (1,314,833–1,365,534 bp) was defined for this QTN. In this region, LOC_Os09g02830 is annotated as OsMADS78 which belongs to the MADS-box family. OSMADS78 has been confirmed to be an important regulator of early seed developmental transition and impacts both rice seed size and quality . Nevertheless, the biological function of OsMADS78 is far from being understood. Here, our study speculated that OsMADS78 might contribute to rice grain width regulation.
Comparison of SL-GWAS and ML-GWAS results
The conventional single-locus methods like the general linear model (GLM) and MLM have been widely implemented to identify genetic variants in many cereals [30,31,32]. However, these models have certain shortcomings as they neglect the overall effects of multiple loci and suffer from the problem of multiple test corrections for critical values. For example, the stringent threshold leads to missing many robust QTLs, particularly small-effect QTLs in MLM . ML-GWAS methodologies therewith have been developed, such as mrMLM, FASTmrMLM, LASSO , and FarmCPU (Fixed and random model Circulating Probability Unification) . After comparing the statistical power of ML-GWAS with SL-GWAS methods, several studies demonstrated that multi-locus methods have lower false-positive error and higher statistical power than single-locus ones [17, 35, 36]. According to each ML-GWAS algorithm has its own characteristics and different QTL detection, investigators generally combine the merits of several ML-GWAS methods to mine target QTL for complex agronomic traits [26, 37, 38].
In the present study, we adopted the MLM model and five ML-GWAS methods to analyse five yield-related traits of 529 rice core germplasms. Consequently, 152, 106, 12, 111, and 64 significant SNPs, while 3, 19, 8, 53, and 56 QTLs were detected by MLM underlying GL, GW, GT, TGW, and YPP, respectively (Table S2).
Likewise, 161, 136, 160, 189, and 171 significant QTNs were identified using ML-GWAS methods linked with the above five traits successively. We noted that the number of QTLs mapped by MLM was less than the QTNs identification of five ML-GWAS algorithms, especially of those about GT and YPP. The previous study observed similar findings in GWAS analysis of soybean seed size, suggesting that the recognition results of the five ML-GWAS methods outperformed those of the two SL-GWAS programs. In addition, the QTNs distribution detected by ML-GWAS approaches were more dispersed compared with MLM. For instance, as described in Fig. S2a, the significant loci identified by MLM were concentrated near GS3 on chromosome 3. Whereas, many loci on other chromosomes failed to meet the threshold, indicating that it was difficult to find new loci from other chromosomes when applied traditional MLM. Afterward, a lot of significant QTNs presented across ML-GWAS models were not only situated on chromosome 3, but also widely distributed on other chromosomes, among which QTNs examined by at least three methods are worthy of further research. These data explicated that the ML-GWAS algorithms are considered more effective, powerful, and robust when applying to investigate the small-effect QTNs for yield-related traits.
Comparison of QTLs or QTNs detected in our study and previous studies
Over the past decade, numerous rice yield-related genes such as GS3, GW2, and GW5 have been identified and their functional roles were deeply elucidated [39, 40]. Among them, GS3, the first molecular characterized QTL for grain size, controls grain weight and length, with minor impacts on grain width and thickness . GW2 (Grain width 2), negatively regulates grain width  and GW2 homologs in common wheat plays a critical role in the genetic control of grain weight and protein content traits . GW5 influences grain width and weight acting in the brassinosteroid pathway and overexpression of GSE5/GW5 resulted in narrow grains [44, 45].
In the current study, we characterized 74 QTNs for five yield-related traits that were simultaneously identified using two or more ML-GWAS methods. Compared with the mapped loci of the previous studies, 23 QTNs and its ±300 kb genomic ranges overlapped the known genes. On chromosome 3, the QTN hotspot (qGL-3-3, qGL-3-4, and qGL-3-5) was located near the region of GS3 for grain length. The QTN cluster on chromosome 5 (qGW-5-1, qGW-5-2, and qGL-5-1) was mapped nearby GW5. Moreover, there were 51 novel QTNs excluded in the genomic regions of the past reports. Therefore, these identified makers may be the potential QTNs controlling rice grain and productivity.
Dissecting two candidate genes of yield-related traits
Using the efficient mixed-model association, two robust QTLs (qTGW-7-1 and qGW-9-2) were validated with major effects associated with yield-related traits using both SL- and ML-GWAS approaches. In the candidate qTGW-7-1 related to grain weight, LOC_Os07g31450/CHR729 is a kind of CHD protein, which encoded protein contains 2259 amino acids, belonging to the CHD3 subfamily . CHR729 has been reported to play an essential role in multiple aspects of rice root and seed development. As an example, CHR729 can control seedling development through the gibberellin pathway  and affect crown root formation through the auxin signalling pathway . In this study, GWAS results inferred that CHR729 might be related to grain weight, and transcriptional expression analysis reflected that CHR729 was highly expressed in young panicles. The role played by CHR729 in regulating rice grain and even productivity is worthwhile for further study and confirmation.
In the candidate qGW-9-2 involved in grain width, gene LOC_Os09g02830 (OsMADS78) belongs to the MADS transcription factor family. MADS family is a large family with conserved MADS-box domains, whose members widely take part in the key regulatory pathways of plant growth and reproduction (including flower formation) . The OsMADS family in rice participates in controlling flowering time, development of root and seed, especially of flower organs [48, 49]. For instance, OsMADS16/SPW1, which is homologous to APETALA3 in Arabidopsis, belongs to the Class B in the ABC model of flower organ development, determining the properties of slates and stamens in rice flower organs . Moreover, OsMADS13 controls ovule identity . OsMADS26 negatively regulates resistance to rice blast and drought tolerance . And OsMADS23, OsMADS25, OsMADS27, OsMADS57, and OsMADS61 determine root development . Although there are 75 members in the OsMADS family , nearly half of the members’ functions are still unknown. In this study, our findings give a clue that OsMADS78 may be related to grain width in rice. Recently, Paul et al. reported that OSMADS78 modulates early seed developmental transition and impacts rice grain length, grain width, 1000 grain weight, and grain quality . This conclusion showed the high reliability of our results, corresponding with this published research which verified the biological function of OSMADS78 by over-expression experiment .
In this study, a total of 74 QTN hotspots were simultaneously detected for five yield-related traits by two or more ML-GWAS methods. Among them, qTGW-7-1 and qGW-9-2, closely associated with TGW and GW separately, were presented across both SL-GWAS and ML-GWAS analyses. Besides, two key annotated genes (LOC_Os07g31450 and LOC_Os09g02830) underlying the above two target genomic ranges were mined. In summary, many robust QTLs and two candidate genes were supposed to potentially modulate grain shape and productivity in rice. This research made a beneficial attempt by a combinatory approach of ML-GWAS methods and will facilitate the detection of yield-related QTNs.
Phenotyping data and statistical analyses
The complete phenotypic records of 529 rice core accessions were downloaded from RiceVarMap v2.0 , an integrated dataset of rice genomic variations denoted by Huazhong Agricultural University. This set of germplasm contains diverse rice cultivars. Thereinto, the materials were classified into 299 indica (95 indica I, 74 indica II, 13 indica III, and 117 indica intermediate types), 156 japonica (93 temperate japonica, 43 tropical japonica, and 20 japonica intermediate types), 46 aus, 14 aromatic, and 14 intermediate types.
The yield-related agronomic characters include grain length (GL), grain width (GW), grain thickness (GT), thousand-grain weight (TGW), and yield per plant (YPP) were downloaded from RiceVarMap2 website (http://ricevarmap.ncpgr.cn/phenos/). The detailed information, including experimental design, years, replicate, could be found in a study . Meanwhile, the minimum, maximum, mean, standard deviation, range, and coefficient of variation (CV) for each trait were calculated in Table S1. Pearson correlation analysis for phenotypic data was conducted using the SAS 9.4 software (http://www.sas.com/).
Genotyping data processing
High-quality re-sequencing raw data of 529 germplasms were derived from RiceVarMap (http://ricevarmap.ncpgr.cn/v1/) . The detailed information of data processing and BCF files could be found on the RiceVarMap webpage. Raw single nucleotide variants (SNV) were processed by PLINK 1.9 software with parameter --maf 0.05 --geno 0.05 --snps-only. Then the genotypic data was imputed by Beagle 5.0 and a total of 607,201 SNPs were left for further analysis .
Clustering analysis, population structure, and LD analysis
The matrix of pairwise genetic distance derived from 607,201 SNPs was implemented to construct phylogenetic trees by SNPhylo with the default parameter . Principal component analysis (PCA) and kinship matrix were performed by the Tassel 5.2 program to estimate the population structure . Linkage disequilibrium (LD) between SNPs was estimated as the squared correlation coefficient (r2) of alleles, meanwhile the whole population and sub-populations were implemented by software PopLDdecay .
Genome-wide association study
In our case, GWAS was performed in 529 rice varieties with 607,201 high-quality SNPs. A mixed linear model (MLM) was carried out for single-locus method to evaluate the trait-SNP association analysis for agriculture traits using the Tassel software. The first three principal components (PCs) and kinship matrix were used as covariates to correct population structure for decreasing false-positive rate in MLM. The genome-wide significance threshold (p-value = 1.65 × 10− 6) was calculated by negative log(1/n, n is the number of SNPs).
Five ML-GWAS methods within the mrMLM R package (https://cran.r-project.org/web/packages/mrMLM/index.html) were used to map candidate QTNs, including mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, and ISIS EM-BLASSO. All parameters were set at default values, and the critical LOD score was set to 3 for robust QTNs at the last stage. All these five methods used the PCA and kinship matrices in our study. The Manhattan and QQ plots for GWAS were displayed using the R package CMplot (https://github.com/YinLiLin/R-CMplot). QTNs were named as Q + the initial letter of traits name abbreviations + chromosome number + occurrence sequence .
Identification of putative genes
The QTNs identified by at least two different ML-GWAS methods were regarded as the putative candidate loci. Local LD blocks containing at least two SNPs were calculated with all imputed SNP using the PLINK 1.9 software . The local LD blocks of each significant QTN were determined via confidence intervals described by Gabriel . The LD heatmap was visualized using Haploview software . All the genes located in the LD block of QTNs were extracted for further analysis. By comprehensive analysis of gene annotation (MSU 6.1, http://rice.plantbiology.msu.edu/pub/data/Eukaryotic_Projects/o_sativa/annotation_dbs/pseudomolecules/version_6.1/), protein domain function in previous reports, and transcriptome information (data were deposited in the CREP database, Collections of Rice Expression Profiling, http://crep.ncpgr.cn/), the candidate genes for each trait were further mined.
Phenotypic difference of candidate genes
The haplotypes of the candidate gene were determined based on the missense SNP, then the Wilcox-test was used to test the phenotypic difference among each haplotype. The characters of each trait were visualized with box plots using the R 3.6.1 language .
Availability of data and materials
The genotype and phenotype datasets analysed during the current study are available in the RiceVarMap (http://ricevarmap.ncpgr.cn/). The raw sequence data are available in NCBI with accession number PRJNA171289.
Chromodomain, helicase/ATPase, and DNA-binding domain
Collections of Rice Expression Profiling
Coefficient of variation
Efficient mixed model association eXpedited
Fixed and random model Circulating Probability Unification
Factored spectrally transformed linear mixed models
Fast multi-locus random-SNP-effect efficient mixed model analysis
General linear model
Grain size 3
Grain width 2
Grain width 5
Genome-wide association study
- ISIS EM-BLASSO:
Iterative modified-sure independence screening expectation-maximization-Bayesian LASSO
Least absolute shrinkage and selection operator
Logarithm of Odds
Multi-locus genome-wide association studies
Mixed linear model
multi-locus random-SNP-effect MLM
Principal component analysis
Pearson correlation coefficient
polygenic-background-control-based least angle regression plus empirical Bayes
Integration of Kruskal-Wallis test with empirical Bayes
Quantitative trait locus
Quantitative trait loci
Quantitative trait nucleotide
- R2 :
The proportion of total phenotypic variance explained by each QTN
The squared correlation coefficient
Single-locus genome-wide association studies
Single nucleotide variant
Yield per plant
Muthayya S, Sugimoto JD, Montgomery S, Maberly GF. An overview of global rice production, supply, trade, and consumption. Ann N Y Acad Sci. 2014;1324(1):7–14. https://doi.org/10.1111/nyas.12540.
Challinor AJ, Watson J, Lobell DB, Howden SM, Smith DR, Chhetri N. A meta-analysis of crop yield under climate change and adaptation. Nat Clim Chang. 2014;4(4):287–91. https://doi.org/10.1038/nclimate2153.
Xing Y, Zhang Q. Genetic and molecular bases of rice yield. Annu Rev Plant Biol. 2010;61(1):421–42. https://doi.org/10.1146/annurev-arplant-042809-112209.
Zhong H, Liu C, Kong W, Zhang Y, Zhao G, Sun T, et al. Effect of multi-allele combination on rice grain size based on prediction of regression equation model. Mol Gen Genomics. 2020;295(2):465–74. https://doi.org/10.1007/s00438-019-01627-y.
Bai X, Wu B, Xing Y. Yield-related QTLs and their applications in Rice genetic improvement. J Integr Plant Biol. 2012;54(5):300–11. https://doi.org/10.1111/j.1744-7909.2012.01117.x.
Fan C, Yu S, Wang C, Xing Y. A causal C-A mutation in the second exon of GS3 highly associated with rice grain length and validated as a functional marker. Theor Appl Genet. 2009;118(3):465–72. https://doi.org/10.1007/s00122-008-0913-1.
Weng J, Gu S, Wan X, Gao H, Guo T, Su N, et al. Isolation and initial characterization of GW5, a major QTL associated with rice grain width and weight. Cell Res. 2008;18(12):1199–209. https://doi.org/10.1038/cr.2008.307.
Wang S, Wu K, Qian Q, Liu Q, Li Q, Pan Y, et al. Non-canonical regulation of SPL transcription factors by a human OTUB1-like deubiquitinase defines a new plant type rice associated with higher grain yield. Cell Res. 2017;27(9):1142–56. https://doi.org/10.1038/cr.2017.98.
Si L, Chen J, Huang X, Gong H, Luo J, Hou Q, et al. OsSPL13 controls grain size in cultivated rice. Nat Genet. 2016;48(4):447–56. https://doi.org/10.1038/ng.3518.
Zhong H, Kong W, Gong Z, Fang X, Deng X, Liu C, et al. Evolutionary analyses reveal diverged patterns of Squamosa promoter binding protein-like (Spl) gene family in Oryza genus. Front Plant Sci. 2019;10 May:1–10.
Prom LK, Ahn E, Isakeit T, Magill C. GWAS analysis of sorghum association panel lines identifies SNPs associated with disease response to Texas isolates of Colletotrichum sublineola. Theor Appl Genet. 2019;132(5):1389–96. https://doi.org/10.1007/s00122-019-03285-5.
Li N, Zheng H, Cui J, Wang J, Liu H, Sun J, et al. Genome-wide association study and candidate gene analysis of alkalinity tolerance in japonica rice germplasm at the seedling stage. Rice. 2019;12(1):24. https://doi.org/10.1186/s12284-019-0285-y.
Zhang YM, Mao Y, Xie C, Smith H, Luo L, Xu S. Mapping quantitative trait loci using naturally occurring genetic variance among commercial inbred lines of maize (Zea mays L.). Genetics. 2005;169:2267–75.
Kang HM, Sul JH, Service SK, Zaitlen NA, Kong SY, Freimer NB, et al. Variance component model to account for sample structure in genome-wide association studies. Nat Genet. 2010;42(4):348–54. https://doi.org/10.1038/ng.548.
Lippert C, Listgarten J, Liu Y, Kadie CM, Davidson RI, Heckerman D. FaST linear mixed models for genome-wide association studies. Nat Methods. 2011;8(10):833–5. https://doi.org/10.1038/nmeth.1681.
Wang SB, Feng JY, Ren WL, Huang B, Zhou L, Wen YJ, et al. Improving power and accuracy of genome-wide association studies via a multi-locus mixed linear model methodology. Sci Rep. 2016;6.
Cui Y, Zhang F, Zhou Y. The application of multi-locus GWAS for the detection of salt-tolerance loci in rice. Front Plant Sci. 2018;9 October:1–9. https://doi.org/10.3389/fpls.2018.01464.
Tamba CL, Zhang YM. A fast mrMLM algorithm for multi-locus genome-wide association studies. bioRxiv. 2018;October.
Tamba CL, Ni YL, Zhang YM. Iterative sure independence screening EM-Bayesian LASSO algorithm for multi-locus genome-wide association studies. PLoS Comput Biol. 2017;13(1):e1005357. https://doi.org/10.1371/journal.pcbi.1005357.
Ren WL, Wen YJ, Dunwell JM, Zhang YM. PKWmEB: integration of Kruskal-Wallis test with empirical Bayes under polygenic background control for multi-locus genome-wide association study. Heredity (Edinb). 2018;120(3):208–18. https://doi.org/10.1038/s41437-017-0007-4.
Wen YJ, Zhang H, Ni YL, Huang B, Zhang J, Feng JY, et al. Methodological implementation of mixed linear models in multi-locus genome-wide association studies. Brief Bioinform. 2018;19(4):700–12. https://doi.org/10.1093/bib/bbw145.
Zhang J, Feng JY, Ni YL, Wen YJ, Niu Y, Tamba CL, et al. PLARmEB: integration of least angle regression with empirical Bayes for multilocus genome-wide association studies. Heredity (Edinb). 2017;118(6):517–24. https://doi.org/10.1038/hdy.2017.8.
Xu Y, Yang T, Zhou Y, Yin S, Li P, Liu J, et al. Genome-wide association mapping of starch pasting properties in maize using single-locus and multi-locus models. Front Plant Sci. 2018;9 September:1–10. https://doi.org/10.3389/fpls.2018.01311.
Misra G, Badoni S, Domingo CJ, Cuevas RPO, Llorente C, Mbanjo EGN, et al. Deciphering the genetic architecture of cooked rice texture. Front Plant Sci. 2018;9 October:1–16. https://doi.org/10.3389/fpls.2018.01405.
Lü H, Yang Y, Li H, Liu Q, Zhang J, Yin J, et al. Genome-wide association studies of photosynthetic traits related to phosphorus efficiency in soybean. Front Plant Sci. 2018;9 August. https://doi.org/10.3389/fpls.2018.01226.
Liu S, Zhong H, Meng X, Sun T, Li Y, Pinson SRM, et al. Genome-wide association studies of ionomic and agronomic traits in USDA mini core collection of rice and comparative analyses of different mapping methods. BMC Plant Biol. 2020;20(1):441. https://doi.org/10.1186/s12870-020-02603-0.
Ngangkham U, Samantaray S, Yadav MK, Kumar A, Chidambaranathan P, Katara JL. Effect of multiple allelic combinations of genes on regulating grain size in rice. PLoS One. 2018;13(1):1–20. https://doi.org/10.1371/journal.pone.0190684.
Ma X, Ma J, Zhai H, Xin P, Chu J, Qiao Y, et al. CHR729 is a CHD3 protein that controls seedling development in rice. PLoS One. 2015;10.
Paul P, Dhatt BK, Miller M, Folsom JJ, Wang Z, Krassovskaya I, et al. MADS78 and MAdS79 are essential regulators of early seed development in RICE. Plant Physiol. 2020;182(2):933–48. https://doi.org/10.1104/pp.19.00917.
Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006;38(8):904–9. https://doi.org/10.1038/ng1847.
He L, Xiao J, Rashid KY, Yao Z, Li P, Jia G, et al. Genome-wide association studies for pasmo resistance in flax (Linum usitatissimum L.). Front Plant Sci. 2019;9 January:1982. https://doi.org/10.3389/fpls.2018.01982.
Sant’Ana GC, Pereira LFP, Pot D, Ivamoto ST, Domingues DS, Ferreira RV, et al. Genome-wide association study reveals candidate genes influencing lipids and diterpenes contents in Coffea arabica L. Sci Rep. 2018;8:1–12.
Xu Y, Xu C, Xu S. Prediction and association mapping of agronomic traits in maize using multiple omic data. Heredity (Edinb). 2017;119(3):174–84. https://doi.org/10.1038/hdy.2017.27.
Liu X, Huang M, Fan B, Buckler ES, Zhang Z. Iterative usage of fixed and random effect models for powerful and efficient genome-wide association studies. PLoS Genet. 2016;12(2):e1005767. https://doi.org/10.1371/journal.pgen.1005767.
Segura V, Vilhjálmsson BJ, Platt A, Korte A, Seren Ü, Long Q, et al. An efficient multi-locus mixed-model approach for genome-wide association studies in structured populations. Nat Genet. 2012;44(7):825–30. https://doi.org/10.1038/ng.2314.
Hu X, Zuo J, Wang J, Liu L, Sun G, Li C, et al. Multi-locus genome-wide association studies for 14 main agronomic traits in Barley. Front Plant Sci. 2018;871(November):1–14. https://doi.org/10.3389/fpls.2018.01683.
Wu X, Li Y, Shi Y, Song Y, Zhang D, Li C, et al. Joint-linkage mapping and GWAS reveal extensive genetic loci that regulate male inflorescence size in maize. Plant Biotechnol J. 2016;14(7):1551–62. https://doi.org/10.1111/pbi.12519.
Ma Z, He S, Wang X, Sun J, Zhang Y, Zhang G, et al. Resequencing a core collection of upland cotton identifies genomic variation and loci influencing fiber quality and yield. Nat Genet. 2018;50(6):803–13. https://doi.org/10.1038/s41588-018-0119-7.
Sakamoto T, Matsuoka M. Identifying and exploiting grain yield genes in rice. Curr Opin Plant Biol. 2008;11(2):209–14. https://doi.org/10.1016/j.pbi.2008.01.009.
Zuo J, Li J. Molecular genetic dissection of quantitative trait loci regulating rice grain size. Annu Rev Genet. 2014;48(1):99–118. https://doi.org/10.1146/annurev-genet-120213-092138.
Fan C, Xing Y, Mao H, Lu T, Han B, Xu C, et al. GS3, a major QTL for grain length and weight and minor QTL for grain width and thickness in rice, encodes a putative transmembrane protein. Theor Appl Genet. 2006;112(6):1164–71. https://doi.org/10.1007/s00122-006-0218-1.
Song XJ, Huang W, Shi M, Zhu MZ, Lin HX. A QTL for rice grain width and weight encodes a previously unknown RING-type E3 ubiquitin ligase. Nat Genet. 2007;39(5):623–30. https://doi.org/10.1038/ng2014.
Zhang Y, Li D, Zhang D, Zhao X, Cao X, Dong L, et al. Analysis of the functions of TaGW2 homoeologs in wheat grain weight and protein content traits. Plant J. 2018;94(5):857–66. https://doi.org/10.1111/tpj.13903.
Duan P, Xu J, Zeng D, Zhang B, Geng M, Zhang G, et al. Natural variation in the promoter of GSE5 contributes to grain size diversity in Rice. Mol Plant. 2017;10(5):685–94. https://doi.org/10.1016/j.molp.2017.03.009.
Liu J, Chen J, Zheng X, Wu F, Lin Q, Heng Y, et al. GW5 acts in the brassinosteroid signalling pathway to regulate grain width and weight in rice. Nat Plants. 2017;3(5). https://doi.org/10.1038/nplants.2017.43.
Wang Y, Wang D, Gan T, Liu L, Long W, Wang Y, et al. CRL6, a member of the CHD protein family, is required for crown root development in rice. Plant Physiol Biochem. 2016;105:185–94. https://doi.org/10.1016/j.plaphy.2016.04.022.
Callens C, Tucker MR, Zhang D, Wilson ZA. Dissecting the role of MADS-box genes in monocot floral development and diversity. J Exp Bot. 2018;69(10):2435–59. https://doi.org/10.1093/jxb/ery086.
Yu C, Su S, Xu Y, Zhao Y, Yan A, Huang L, et al. The effects of fluctuations in the nutrient supply on the expression of five members of the AGL17 clade of MADS-box genes in rice. PLoS One. 2014;9.
Jeon JS, Lee S, Jung KH, Yang WS, Yi GH, Oh BG, et al. Production of transgenic rice plants showing reduced heading date and plant height by ectopic expression of rice MADS-box genes. Mol Breed. 2000;6(6):581–92. https://doi.org/10.1023/A:1011388620872.
Xiao H, Wang Y, Liu D, Wang W, Li X, Zhao X, et al. Functional analysis of the rice AP3 homologue OsMADS16 by RNA interference. Plant Mol Biol. 2003;52(5):957–66. https://doi.org/10.1023/A:1025401611354.
Dreni L, Jacchia S, Fornara F, Fornari M, Ouwerkerk PBF, An G, et al. The D-lineage MADS-box gene OsMADS13 controls ovule identity in rice. Plant J. 2007;52(4):690–9. https://doi.org/10.1111/j.1365-313X.2007.03272.x.
Khong GN, Pati PK, Richaud F, Parizot B, Bidzinski P, Mai CD, et al. OsMADS26 negatively regulates resistance to pathogens and drought tolerance in rice. Plant Physiol. 2015;169(4):2935–49. https://doi.org/10.1104/pp.15.01192.
Arora R, Agarwal P, Ray S, Singh AK, Singh VP, Tyagi AK, et al. MADS-box gene family in rice: genome-wide identification, organization and expression profiling during reproductive development and stress. BMC Genomics. 2007;8(1):242. https://doi.org/10.1186/1471-2164-8-242.
Zhao H, Yao W, Ouyang Y, Yang W, Wang G, Lian X, et al. RiceVarMap: a comprehensive database of rice genomic variations. Nucleic Acids Res. 2015;43(D1):D1018–22. https://doi.org/10.1093/nar/gku894.
Xie W, Wang G, Yuan M, Yao W, Lyu K, Zhao H, et al. Breeding signatures of rice improvement revealed by a genomic variation map from a large germplasm collection. Proc Natl Acad Sci U S A. 2015;112(39):E5411–9. https://doi.org/10.1073/pnas.1515919112.
Browning BL, Zhou Y, Browning SR. A one-penny imputed genome from next-generation reference panels. Am J Hum Genet. 2018;103(3):338–48. https://doi.org/10.1016/j.ajhg.2018.07.015.
Lee TH, Guo H, Wang X, Kim C, Paterson AH. SNPhylo: a pipeline to construct a phylogenetic tree from huge SNP data. BMC Genomics. 2014;15(1):162. https://doi.org/10.1186/1471-2164-15-162.
Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. 2007;23(19):2633–5. https://doi.org/10.1093/bioinformatics/btm308.
Zhang C, Dong SS, Xu JY, He WM, Yang TL. PopLDdecay: a fast and effective tool for linkage disequilibrium decay analysis based on variant call format files. Bioinformatics. 2019;35(10):1786–8. https://doi.org/10.1093/bioinformatics/bty875.
McCouch SR. Gene nomenclature system for rice. Rice. 2008;1(1):72–84. https://doi.org/10.1007/s12284-008-9004-9.
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–75. https://doi.org/10.1086/519795.
Gabriel SB, Schaffner SF, Nguyen H, Moore JM, Roy J, Blumenstiel B, et al. The structure of haplotype blocks in the human genome. Science. 2002;296(5576):2225–9. https://doi.org/10.1126/science.1069424.
Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005;21(2):263–5. https://doi.org/10.1093/bioinformatics/bth457.
R Core Team. R: A language and environment for statistical computing: R Foundation for Statistical Computing; 2019.
The authors thank the lab members for their assistance.
This study was supported by the National Special Key Project for Transgenic Breeding (2016ZX08001001) and the National Key Research and Development Program of China (2016YFD0100400). The funding agencies played no role in the design of the study, data collection, analysis, and interpretation or writing the manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Distribution of SNP markers on Chromosomes. The x-axis represents the number of SNPs in the 1 Mb window, and the y-axis represents 12 chromosomes of rice. Different colors represent different numbers of SNPs.
Manhattan plots of the SL-GWAS and ML-GWAS for yield. (A-E) Manhattan maps representing five traits of GL, GW, GT, TGW, and YPP, respectively; The x-axis displays the chromosome label, and the y-axis displays -log10 (p-value). The dotted and solid gray lines show significant associations between SNPs and phenotype value with threshold levels of p-value < 1.65 × 10− 6 and p-value < 8.25 × 10− 8, separately. Red, green, blue, purple, yellow, and grey dots represent mrMLM, FASTmrEMMA, pLARmEB, pKWmEB, ISIS EM-BLASSO, and MLM models. (F-J) QQ plots represent MLM analysis of the above five traits.
Statistical analysis of five rice yield-related traits.
SNPs and QTLs mapping information in MLM analysis.
The 161 published genes and their functional information of yield traits in rice.
QTN simultaneously detected by both ML-GWAS and MLM methods.
About this article
Cite this article
Zhong, H., Liu, S., Sun, T. et al. Multi-locus genome-wide association studies for five yield-related traits in rice. BMC Plant Biol 21, 364 (2021). https://doi.org/10.1186/s12870-021-03146-8