Skip to main content

Genome-wide association studies and whole-genome prediction reveal the genetic architecture of KRN in maize

Abstract

Background

Kernel row number (KRN) is an important trait for the domestication and improvement of maize. Exploring the genetic basis of KRN has great research significance and can provide valuable information for molecular assisted selection.

Results

In this study, one single-locus method (MLM) and six multilocus methods (mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, pKWmEB and ISIS EM-BLASSO) of genome-wide association studies (GWASs) were used to identify significant quantitative trait nucleotides (QTNs) for KRN in an association panel including 639 maize inbred lines that were genotyped by the MaizeSNP50 BeadChip. In three phenotyping environments and with best linear unbiased prediction (BLUP) values, the seven GWAS methods revealed different numbers of KRN-associated QTNs, ranging from 11 to 177. Based on these results, seven important regions for KRN located on chromosomes 1, 2, 3, 5, 9, and 10 were identified by at least three methods and in at least two environments. Moreover, 49 genes from the seven regions were expressed in different maize tissues. Among the 49 genes, ARF29 (Zm00001d026540, encoding auxin response factor 29) and CKO4 (Zm00001d043293, encoding cytokinin oxidase protein) were significantly related to KRN, based on expression analysis and candidate gene association mapping. Whole-genome prediction (WGP) of KRN was also performed, and we found that the KRN-associated tagSNPs achieved a high prediction accuracy. The best strategy was to integrate all of the KRN-associated tagSNPs identified by all GWAS models.

Conclusions

These results aid in our understanding of the genetic architecture of KRN and provide useful information for genomic selection for KRN in maize breeding.

Background

Maize (Zea mays L.) arose from a single domestication event from its wild progenitor, teosinte, in southern Mexico approximately 9000 years ago and is now one of the most important cereal crops worldwide [1]. During domestication, its morphological characteristics, especially inflorescence architectures, differed profoundly [2, 3]. The shift from small ears in teosinte to larger ears in modern maize was accompanied by a dramatic increase in kernel row number (KRN) [4]. Thus, constant efforts have been made to explore the genetic basis underlying the striking diversities in inflorescence architecture and KRN in maize.

KRN is an important ear trait and is formed by multiple meristem types during female inflorescence development, including inflorescence meristems (IMs), spikelet pair meristems (SPMs), spikelet meristems (SMs) and floral meristems (FMs) [5]. To date, some genes have been cloned and found to be involved in complex regulatory networks responsible for meristem development and KRN modification by studying mutants [6,7,8,9,10]. However, these classical mutants show negative pleiotropy for other traits related to plant architecture and are difficult to directly use in maize breeding [11]. Therefore, linkage mapping and association mapping have been performed in naturally varying populations with the aim of identifying more elite natural alleles controlling KRN.

Although many quantitative trait loci (QTLs) related to KRN were identified by linkage mapping in biparental segregating populations, few have been successfully cloned due to their small genetic effects, except for KRN4 [12] and KRN1 [13]. Genome-wide association studies (GWASs) of KRN have also been conducted and revealed many quantitative trait nucleotides (QTNs) [14,15,16]. At the same time, GWAS results can be easily influenced by population structure and rare variants in natural populations [17]. Therefore, many statistical models have been developed to improve power for identifying genotype-phenotype associations when using the GWAS approach, such as the single-locus mixed linear model (MLM) method [18, 19] and the multilocus methods mrMLM [20], ISIS EM-BLASSO [21], pLARmEB [22], FASTmrEMMA [23], pKWmEB [24], and FASTmrMLM [25]. The MLM method is a single-locus fixed-single nucleotide polymorphism (SNP)-effect approach used in the case of a polygenic background to control population structure [18, 19]. To reduce the false positive rate (FPR), stringent Bonferroni correction is used for multiple testing correction in the MLM approach [26]. The multilocus method is an alternative GWAS procedure that is based on a random-SNP-effect model, and no multiple testing correction is needed [26]. There are two steps in this model. First, a reduced number of SNPs is selected through different algorithms, and the SNPs are then used in the multilocus model to detect true signals [20,21,22,23,24,25,26]. Recently, a few studies have implemented the above GWAS methods to detect important loci controlling different traits in rice [27], maize [28], flax [29], bread wheat [30] and upland cotton [31, 32].

Previous studies have revealed that KRN is quantitatively inherited and that the effects of a single genetic locus are generally small, which poses challenges for genetic improvement in maize breeding. Therefore, the best approach is to improve the ability to predict KRN by integrated analysis of more markers distributed throughout the whole genome. Genomic selection (GS), or whole-genome prediction (WGP), has the capacity to use full-genome data to increase breeding efficiency [33]. In previous studies, WGPs of KRN were performed in F1 hybrids between recombinant inbred lines [34], interconnected biparental maize populations [35] and 339 maize inbred lines [36], all of which showed that KRN was a trait suitable for genome-wide prediction. Liu et al. [15] showed that approximately 300 top KRN-associated tagSNPs were sufficient for predicting the KRN of inbred lines and hybrids using ridge regression best linear unbiased prediction (rr-BLUP). Based on these analyses, we are faced with determining how to select fewer markers to accurately predict KRN. Several studies reported that selecting association markers from the results of GWASs and including them as fixed effects in WGP models resulted in better performance than that achieved with single WGP models [37,38,39]. This might provide a way to simultaneously model different aspects of genetic architecture and is especially accessible to breeders [39].

In this study, we performed a GWAS of an association panel including 639 maize inbred lines based on the MaizeSNP50 BeadChip by using one single-locus method, the MLM method, and six multilocus methods, mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, pKWmEB and ISIS EM-BLASSO. The common significant QTNs codetected by different methods and across different environments were analyzed, and the candidate genes related to KRN were further predicted. WGP was also performed using various KRN-related tagSNPs to dissect the genetic architecture of KRN.

Results

Natural variation in KRN within the association panel

KRN was measured within our association panel, which included 639 maize inbred lines, in XX (Xinxiang in Henan Province, 35.19°N, 113.53°E), BJ (Beijing, 39.48°N, 116.28°E) and GZL (Gongzhuling in Jilin Province, 43.50°N, 124.82°E) in 2011 (Table S1). The results showed that KRN was normally distributed in each environment, and the KRNs among environments were highly positively correlated, with correlations ranging from 0.73 between XX and BJ to 0.79 between XX and GZL (Fig. 1a). KRN exhibited high broad-sense heritability (H2 = 0.90, Table 1), which was similar to the results of previous studies [14, 16]. Comparing KRN among the different environments, we found that it showed the smallest average (13.69), minimum (8.60) and maximum (20.60) values in XX, where all accessions were planted in summer (June). With increasing latitude, where the accessions were planted in spring (May), the average KRN increased (14.65 in BJ and 14.59 in GZL). The largest range (max - min) in KRN appeared in GZL (12.60), which had the longest day length (Table 1). Based on previous results [40], our association panel could be divided into five subgroups: Reid, tangsipingtou (TSPT), lvdahonggu (LRC), Lancaster and P. The KRN statistical analysis results of various subgroups are shown in Table S2. There were no significant differences in KRN among the five subgroups (Fig. 1b). These results indicated that KRN was a quantitative trait and that the phenotypic variation among the tested inbred lines in the association panel was beneficial for dissecting the genetic architecture of KRN.

Fig. 1
figure1

Phenotypic analysis. a Correlation analysis of the KRN phenotype among XX, BJ and GZL. The frequency distribution diagrams of KRN in three environments were plotted, and the correlation coefficient between each pair of environments was calculated. b Violin plots of KRN in the subgroups (P, Lancaster, TSPT, LRC, and Reid) of this association mapping panel

Table 1 Phenotypic variance in KRN for 639 maize inbred lines in three environments

QTNs for KRN identified by different methods

Single-locus analysis of KRN (MLM)

Based on the MaizeSNP50 BeadChip, we obtained 42,667 high-quality SNPs distributed on 10 maize chromosomes. Under the P < 0.0001 and P < 0.001 thresholds, 3/56, 3/46, 1/24, and 3/51 KRN-associated QTNs were found in XX (Fig. 2a), in BJ (Fig. 2b), in GZL (Fig. 2c) and with BLUP (Fig. 2d), respectively. To account for overcorrection in this model, the P < 0.001 threshold was selected to identify KRN-associated QTNs. Finally, 177 QTNs were found to be associated with KRN, and the proportion of phenotypic variance explained (PVE) by these individual QTNs ranged from 1.84 to 4.01% (Table S3).

Fig. 2
figure2

Genome-wide distribution of significant QTNs detected by different models under four conditions. a XinXiang (XX), Henan Province by the MLM method; b Beijing (BJ) by the MLM method; c Gongzhuling (GZL), Jilin Province, by the MLM method; d BLUP across the three environments by the MLM method; e The genome-wide distribution of all the significant QTNs identified by seven methods: the four circles from outside to inside show the distribution of significant QTNs identified in XX, BJ, and GZL and with BLUP, respectively. Dots of different colors represent QTNs mined by different GWAS models: red dots, MLM; green dots, mrMLM; blue dots, FASTmrMLM; black dots, FASTmrEMMA; pink dots, pLARmEB; purple dots, pKWmEB; pale goldenrod dots, ISIS EM-BLASSO

Multiple-locus analysis of KRN

Using different multiple-locus models, we identified different numbers of significant QTNs for KRN in XX, BJ, and GZL and together with BLUP across all locations. These QTNs were unevenly distributed on 10 chromosomes, with the most QTNs on Chr. 1 and the fewest on Chr. 8 (Fig. 2e). Specifically, 15 (FASTmrEMMA)-177 (mrMLM) QTNs in XX, 11 (FASTmrEMMA)-30 (ISIS EM-BLASSO) QTNs in BJ, 12 (FASTmrEMMA)-55 (mrMLM) QTNs in GZL and 11 (FASTmrEMMA)-106 (mrMLM) QTNs for BLUP were identified by the six different methods (Table S4). Comparative analysis of the GWAS results among different statistical approaches showed that FASTmrEMMA detected the fewest QTNs in all the environments, while mrMLM detected the most QTNs in all the environments, except for BJ (Table S4). QTN overlap analysis among the seven methods indicated that the common QTNs codetected by at least two methods accounted for more than 40% of the QTNs in different environments (Figure S1a and Table S5, 42% in XX, 62% in BJ, 58% in GZL and 47% with BLUP). For example, 65 common QTNs representing 30 loci were codetected by two methods in XX, and 39 common QTNs representing 13 loci, 28 common QTNs representing 7 loci, 25 common QTNs representing 5 loci, and 6 common QTNs representing 1 locus were codetected by three, four, five and six methods, respectively (Figure S1a and Table S5). No QTNs were identified by all 7 methods in different locations. Overall, ISIS EM-BLASSO, which detected the third largest number of QTNs, identified the most codetected QTNs, followed by FASTmrMLM (Figure S1a and Table S5). Comparative analysis of the GWAS results among the different environments showed that the majority of the QTNs identified by the MLM method and ISIS EM-BLASSO were repeatedly detected in different locations (Figure S1b, Table S6).

Overall, comparing our GWAS results with those of previous studies, we found that some important genes controlling inflorescence architecture in maize were located within 200 kb of the significant QTNs, including CT2 (Zm00001d027886), FEA3 (Zm00001d040130), BAD1 (Zm00001d005737), RA1 (Zm00001d020430), and VT2 (Zm00001d008700) (Table S13).

Annotation and expression of candidate genes for KRN

To obtain reliable significant QTNs and predict the candidate genes for KRN, only the QTNs simultaneously identified by at least three methods (either single-locus or multilocus) and in at least two environments were used for the next analysis. Finally, seven QTNs controlling KRN were obtained (Table 2). The seven QTNs were located on chromosomes 1, 2, 3, 5, 9, and 10, and the PVE by these QTNs ranged from 1.06 to 5.21%. Based on the linkage disequilibrium (LD) in the association panel (Figure S2), 49 genes around the QTNs (200 kb upstream and downstream) were obtained, and their expression varied widely in different maize tissues (Fig. 3a and Table S7). For example, Zm00001d016760, which encodes the abscisic acid stress ripening 6 protein, is highly expressed in the roots, and Zm00001d031426, which encodes serine/threonine-protein kinase, and Zm00001d043298, which encodes a P-loop containing nucleoside triphosphate hydrolase superfamily protein, are highly expressed in tassels and anthers. Among the 49 genes, 22 were differentially expressed in different spike development mutants (Table S8); i.e., the ra1, ra2 and ra3 mutants had abnormal highly branched tassels and ears, with the ears displaying a very large KRN [41]; the kn1 mutant had smaller ears and fewer spikelets [42]. This result suggested that these 22 genes might be involved in ear development in maize.

Table 2 Significant KRN-associated QTNs codetected in at least two environments and by at least three models
Fig. 3
figure3

Candidate gene analysis of KRN. a Expression heatmap of the genes located in the codetected regions. All expression data were collected from inbred B73. Leaf 1 means the leaf base; leaf 2 means a 1-cm leaf; leaf 3 means a 4-cm leaf; leaf 4 means the leaf tip; leaf 5 means the leaf at 20 days after pollination (DAP); S10 means the kernel at 10 DAP. b ARF29 (Zm00001d026540) gene association mapping using the Ames 228 panel. c CKO4 (Zm00001d043293) gene association mapping

Interestingly, we found that Zm00001d026540 (encoding auxin response factor 29, ARF29), which was located within 200 kb downstream of PZE-110106563 on Chr. 10 and was detected by the MLM method and all six multilocus GWAS methods (Table 2), had higher expression in SAMs and ears than in other tissues (Table S7). Candidate gene association mapping was also performed. The SNPs within ARF29 and the 10-kb promoter and 10-kb region downstream of ARF29 were obtained from maize HapMap3 [43]. The KRN of 282 inbred lines was measured in six environments (see Methods), and the BLUP values were calculated. The MLM mapping result showed that five SNPs (two SNPs in the gene and three SNPs in the region upstream of the gene) around ARF29 were significantly related to KRN (Fig. 3b and Table 3). ARF29 can bind the Bif1 (which is related to SAM development and final KRN) promoter by recognizing the TTTCGG motif [44, 45]. The S10_147,122,969 SNP, located within the gene body, was significantly associated with KRN. Two alleles for this SNP (A/T) were present in this panel, with the A allele conferring a higher KRN. Cytokinins also play an important role in the development of immature spikes and the formation of final KRN [46]. For example, UB3 regulates KRN by the cytokinin pathway and CLAVATA-WUSCHEL pathway [46]. In this study, CKO4 (Zm00001d043293, encoding cytokinin oxidase protein) was detected as being located within 200 kb upstream of PUT-163a-110,967,306-138 on Chr. 3 by four GWAS methods (MLM, mrMLM, pLARmEB, and pKWmEB, Table 2), and candidate gene association mapping of CKO4 was also conducted. The SNPs and KRN were also obtained from HapMap3 and 282 inbred lines. The MLM results showed that two SNPs located upstream of CKO4 were significantly associated with KRN (Fig. 3c and Table 3). The S3_191,837,578 SNP had two alleles (T/G), and the T allele was associated with a higher KRN but had a lower frequency. Therefore, this allele may not be widely useful in maize breeding.

Table 3 Candidate gene association analysis

Whole-genomic prediction of KRN

We first analyzed the LD blocks of all markers using the threshold value r2 > 0.2 and obtained 27,688 tagSNPs in our association panel. Then, we randomly selected different numbers of tagSNPs, from 5 to 27,000, in the whole genome to calculate the prediction accuracies for KRN of the inbred lines, which was calculated as a correlation between predicted and true values from the simulations. The results showed that the prediction accuracies increased as the number of tagSNPs increased (Fig. 4a and Table S9). More specifically, the prediction accuracies sharply increased when the number of tagSNPs increased from 5 to 500 and then slowly increased when the number of tagSNPs increased from 400 to 2000. Once the number exceeded 2000, the prediction accuracies maintained a consistently high level. Although a large number of tagSNPs were used to predict KRN, the prediction accuracies were still less than 0.5. The effects of training population size on the prediction accuracy were also assessed based on a marker number of 14,000 (approximately 50% of the total tagSNPs). In the association panel, the prediction accuracies improved with increasing training population size. When the training population size increased from 50 to 90%, a slight increase in prediction accuracy was observed (Fig. 4b and Table S10).

Fig. 4
figure4

Whole-genome prediction of KRN in the inbred lines. a The KRN prediction accuracy for different numbers of randomly selected tagSNPs (from 5 to 27,000) based on BLUP values by using the rrBLUP model. b KRN prediction accuracy for different training population sizes. c Comparison of prediction accuracy between the top tagSNPs and random tagSNPs. **, P < 0.01. d Comparison of the prediction accuracy of different tagSNPs identified by different models. **, P < 0.01

To better understand the genetic architecture of KRN and improve the ability to predict it, we ranked the 27,688 tagSNPs according to their significance in relation to KRN, as obtained by the MLM method, to obtain the top tagSNPs. We found that these top tagSNPs had a higher prediction accuracy (ranging from 0.58 for the top 100 tagSNPs to 0.66 for the top 700 tagSNPs) than randomly selected tagSNPs (ranging from 0.22 for 100 random tagSNPs to 0.33 for 700 random tagSNPs) (Fig. 4c and Table S11).

The tagSNPs representing the significant QTNs detected by different models based on BLUP were collected and used to calculate prediction accuracies for KRN in our association panel. The results showed that these tagSNPs identified by different methods had different prediction accuracies ranging from 0.43 (FASTmrEMMA) to 0.60 (ISIS EM-BLASSO) (Fig. 4d and Table S12). We also found that the tagSNPs associated with KRN identified by the same method showed different prediction accuracies in diverse environments (Figure S3 and Table S12). To explore whether using the codetected QTNs in different GWAS methods could increase prediction accuracies for KRN, we selected the common QTNs identified by at least two, three, four, five or six methods to obtain the predictions. The results showed that only the common QTNs identified by at least two methods (common ≥2) could maintain predictability at a high level; other common QTNs had no advantage in predicting KRN, which may be due to the smaller QTN numbers (Figure S3 and Table S12).

Additionally, to improve the prediction ability, we put the KRN-related tagSNPs detected by seven methods together in a single environment (204 in XX, 87 in BJ, 118 in GZL and 167 for BLUP), namely, M-total tagSNPs, to conduct KRN prediction. As a result, we found that the prediction accuracies were improved sharply and reached 0.74 in XX, 0.66 in BJ, 0.75 in GZL and 0.75 for BLUP (Fig. 4d and Table S12). These predictabilities were much higher than those of the single method in each environment (Table S12). Then, we collected the tagSNPs associated with KRN from all methods and all environments, namely, E-M-total tagSNPs, and obtained 439 tagSNPs in total. However, there was only a slight increase in prediction accuracy (ranging from 0.68 in BJ to 0.79 for BLUP for the 439 tagSNPs) when we used the much higher number of E-M-total tagSNPs compared to the fewer M-total tagSNPs (Fig. 4d and Table S12).

Discussion

To date, the GWAS approach has been widely used to investigate the genetic basis of important traits in many species by calculating the association between genotypic and corresponding phenotypic variations [47]. To identify true association signals, many statistical methods based on different algorithms have been established. In this study, we selected one single-locus method, MLM, and six multilocus methods, mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, pKWmEB and ISIS EM-BLASSO, to perform comprehensive GWAS mapping of KRN in our association panel. Among the seven methods, mrMLM identified the largest number of QTNs, FASTmrEMMA identified the fewest QTNs, and ISIS EM-BLASSO identified the most codetected QTNs, which were consistent with the results reported by Cui et al. [27] for salt-tolerance loci in rice. Therefore, multilocus models are valuable alternative methods for GWASs of KRN in maize. Additionally, a small number of common QTNs codetected by different methods was also observed in the study of Peng et al. [30] for free amino acid levels in bread wheat.

Comparing our GWAS results with those of previous studies, we found that some important genes controlling inflorescence architecture in maize were located within 200 kb of significant QTNs (Table S13), including CT2 (Zm00001d027886), FEA3 (Zm00001d040130), BAD1 (Zm00001d005737), RA1 (Zm00001d020430), and VT2 (Zm00001d008700). Among these genes, CT2 [7] and FEA3 [10] function in CLAVATA-WUSCHEL feedback signaling, and their mutations result in enlarged and fasciated ear primordia and increased KRN. BAD1 [48] and RA1 [41], both of which encode transcription factors, are involved in the genetic regulation of the floral branch system by the ROMASO pathway in maize. VT2 [49] functions in auxin biosynthesis and has dramatic effects on vegetative and reproductive development, and mutant ears show obvious defects. Additionally, approximately 60% of the significant QTNs within LD regions were codetected by previous GWAS mapping of inflorescence development, and some of these loci were pleiotropic [14, 15].

WGP is also an effective method in animal breeding and plant improvement [50]. Because KRN is mainly controlled by additive loci, we selected the rrBLUP additive model to conduct WGP [51]. As expected, prediction accuracy increased as the number of randomly selected tagSNPs increased, which was consistent with the finding of Liu et al. [15] and determined by the influence of marker density on WGP [50]. However, the randomly selected tagSNPs showed a low predictive ability, and thus, we decided to combine the GWAS results with WGP to explore the best marker dataset for KRN prediction. As a result, higher prediction levels were easily reached when using the significant tagSNPs, and the moderate to high values were consistent with those reported by Liu et al. [15], Guo et al. [34], Riedelsheimer et al. [35] and Xu et al. [36]. This result suggested that integrating significant signals from GWASs into WGP models as fixed effects was effective for enhancing the prediction of KRN. A similar conclusion was reached by Liu et al. [15] for KRN, by Bian and Holland [52] for resistance to southern leaf blight (SLB) and gray leaf spot (GLS) and plant height (PHT) in maize and by Spindel et al. [39] for tropical rice improvement. Although different evaluations of WGP models incorporating peak GWAS signals have been performed in maize and sorghum [53], our research indicated that the use of QTNs passing a certain threshold in the above GWAS methods as fixed effects in the rrBLUP model is a powerful tool for KRN prediction, which was a trait-specific consideration in the given population in this study.

Based on the results of this study, we suggest that KRN is controlled by many additive loci and that the rrBLUP model can be used for KRN prediction in maize inbred lines. The combined utilization of different GWAS methods is helpful for predicting candidate genes and KRN in maize breeding.

Conclusions

In this study, multiple GWAS methods were used to identify significant QTNs for KRN in maize. The seven GWAS methods revealed different numbers of KRN-associated QTNs, ranging from 11 to 177. Based on these results, seven important regions for KRN located on chromosomes 1, 2, 3, 5, 9, and 10 were identified by at least three methods and in at least two environments. Moreover, 49 genes from the seven regions were expressed in different maize tissues. Among the 49 genes, ARF29 (Zm00001d026540, encoding auxin response factor 29) and CKO4 (Zm00001d043293, encoding cytokinin oxidase protein) were significantly related to KRN, based on expression analysis and candidate gene association mapping. WGP of KRN was also performed, and we found that the KRN-associated tagSNPs achieved a high prediction accuracy. The best strategy was to integrate the total KRN-associated tagSNPs identified by all GWAS models. These results will facilitate our understanding of the genetic basis of KRN and provide important candidate genes for further research on this important trait.

Methods

Plant materials and phenotyping

An association panel of 639 maize inbred lines, representing a wide range of genetic diversity of temperate inbred lines in China [54], was collected for GWASs. We declare that all plant materials comply with the ‘Convention on the Trade in Endangered Species of Wild Fauna and Flora’ in this study. The plant materials used in this study were conserved in our lab.

All the accessions were planted following a randomized block design of three replicates in three environments in 2011: Gongzhuling in Jilin Province (43.50°N, 124.82°E), Xinxiang in Henan Province (35.19°N, 113.53°E) and Beijing (39.48°N, 116.28°E) in 2011. For descriptive purposes, the three environments were designated GZL, XX and BJ, respectively. At each location, the field experiments include in a single row 3 m in length, with 0.6 m between adjacent rows and 12 individual plants per row. The Institute of Crop Science of the Chinese Academy of Agricultural Sciences has established experimental field bases at all the above locations. The Institute of Crop Science approved the field experiments, and field management followed local maize management practices. In this study, the field studies did not involve endangered or protected species.

Five ears were harvested from each line, and KRN was evaluated in the middle part of the ears [54]. BLUP values were calculated using the SAS PROC MIXED model, with genotype, environment and replicate as random effects [14, 55]. The broad-sense heritability (H2) of KRN was calculated according to Wu et al. [40]. The coefficient of variation was calculated as CV (%) = SD/mean, where SD and mean refer to the standard deviation and mean, respectively, of KRN in each environment [55].

DNA extraction and genotyping

Young leaves of five plants of each maize line according were collected for genomic DNA extraction. We extract the genomic DNA followed the cetyltrimethylammonium bromide (CTAB) method [56]. All samples were quality checked and genotyped using the MaizeSNP50 BeadChip, which is an Illumina BeadChip array of 56,110 maize SNPs developed from the B73 reference sequence [57]. Then, the successfully called SNPs with a missing rate of more than 20% and minor allele frequency (MAF) of < 0.05 were excluded from the genotyping dataset [58]. After that, 42,667 high-quality SNPs were used in further analysis.

GWAS mapping

One single-locus method, MLM, and six multilocus methods, including mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, pKWmEB, and ISIS EM-BLASSO, were used in this study. Alleles of each polymorphic locus with a minor frequency > 0.05 were used for further analysis. A kinship matrix was calculated and principal component analysis (PCA) was performed with the TASSEL 5.2 program [59]. An MLM controlling for population structure (Q) and kinship (K) (MLM Q + K) was also generated in TASSEL 5.2 [18, 19]. Six multilocus GWAS mapping methods were used along with the software package mrMLM.GUI v3.2 in the R environment (http://127.0.0.1:5846/) [26]. All parameters were set at default values, the critical threshold of significant associations for the MLM was set at –log10P ≥ 3, and the logarithm of odds (LOD) score for the six multilocus methods was set at ≥3 [26].

Candidate gene analysis

The LD decay with physical distance in our association panel was calculated in TASSEL 5.2 to be 200 kb (Figure S2). The candidate genes in the 200-kb region around significant QTNs detected by at least three models and in two environments were identified based on the B73 reference genome V4 from MaizeGDB (https://www.maizegdb.org/). Expression data for these genes were collected from previous studies [42, 60]. Genome fragments containing the SNPs within the selected genes, including the 10-kb promoter region, the gene bodies and the 10-kb region downstream of the genes, were obtained from the maize HapMap3 dataset [43]. The candidate gene mapping analyses were conducted on a global maize association mapping panel of 282 diverse lines. The phenotypes of this association panel were provided in our previous report [40], and KRN was measured in six environments, including Beijing, Xinxiang in Henan, and Urumqi in Xinjiang in 2009 and 2010. Association analysis was conducted by the MLM method in TASSEL 5.2, controlling for population structure (Q) and kinship (K). The first three principal components (PCs), which were analyzed in a previous study [40], were used as covariants to control for existing population structure in the 282-line association mapping panel. Significant marker-trait associations were declared at –log10P > 3.

Genomic prediction of KRN

To predict the KRN of the inbred lines, we estimated predictability by WGP. We grouped the LD blocks in PLINK software [61] using the threshold value r2 > 0.2 and identified tagSNPs according to the LD blocks. The ridge regression best linear unbiased prediction (rrBLUP) package was used to perform genomic prediction in R [62]. We randomly selected half of the lines of our association panel as the training population (320 inbred lines) and the remaining 319 inbred lines as the validation population [15]. We used the KRN-related tagSNPs identified by different methods to perform genomic prediction of KRN for the inbred lines under four conditions (XX, BJ, GZL and BLUP). Simultaneously, 5 to 27,000 randomly selected tagSNPs, the total tagSNPs related to KRN identified by the seven methods in a single environment (M-total tagSNPs), the total tagSNPs for KRN from all methods and environments (E-M-total tagSNPs) and the common tagSNPs for KRN detected by at least two, three, four, five, or six methods were also used for the same procedure. The random sampling of tagSNP numbers, the training and validation populations and the predictions were all repeated 100 times.

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Abbreviations

BJ:

Beijing

FMs:

Floral meristems

GWAS:

Genome-wide association study

GZL:

Gongzhuling

IM:

Inflorescence meristem

KRN:

Kernel row number

LD:

Linkage disequilibrium

MLM:

Mixed linear model

QTL:

Quantitative trait locus

QTN:

Quantitative trait nucleotide

SMs:

Spikelet meristems

SNP:

Single nucleotide polymorphism

SPMs:

Spikelet pair meristems

XX:

Xinxiang

References

  1. 1.

    Matsuoka Y, Vigouroux Y, Goodman MM, Sanchez GJ, Buckler E, Doebley J. A single domestication for maize shown by multilocus microsatellite genotyping. Proc Natl Acad Sci U S A. 2002;99:6080–4.

    CAS  Article  PubMed  Google Scholar 

  2. 2.

    Iltis HH. From teosinte to maize: the catastrophic sexual transmutation. Science. 1983;222:886–94.

    CAS  Article  PubMed  Google Scholar 

  3. 3.

    Iltis HH. Homeotic sexual translocations and the origin of maize (Zea Mays, Poaceae): a new look at an old problem. Econ Bot. 2000;54:7–42.

    Article  Google Scholar 

  4. 4.

    Doebley J. The genetics of maize evolution. Annu Rev Genet. 2004;38:37–59.

    CAS  Article  PubMed  Google Scholar 

  5. 5.

    Thompson BE, Hake S. Translational biology: from Arabidopsis flowers to grass inflorescence architecture. Plant Physiol. 2009;149:38–45.

    CAS  Article  PubMed  Google Scholar 

  6. 6.

    Taguchi-Shiobara F, Yuan Z, Hake S, Jackson D. The fasciated ear2 gene encodes a leucine-rich repeat receptor-like protein that regulates shoot meristem proliferation in maize. Genes Dev. 2001;15:2755–66.

    CAS  Article  PubMed  Google Scholar 

  7. 7.

    Bommert P, Je BI, Goldshmidt A, Jackson D. The maize Galpha gene COMPACT PLANT2 functions in CLAVATA signalling to control shoot meristem size. Nature. 2013;502:555–8.

    CAS  Article  Google Scholar 

  8. 8.

    Bommert P, Nagasawa NS, Jackson D. Quantitative variation in maize kernel row number is controlled by the FASCIATED EAR2 locus. Nat Genet. 2013;45:334–7.

    CAS  Article  Google Scholar 

  9. 9.

    Chuck GS, Brown PJ, Meeley R, Hake S. Maize SBP-box transcription factors unbranched2 and unbranched3 affect yield traits by regulating the rate of lateral primordia initiation. Proc Natl Acad Sci U S A. 2014;111:18775–80.

    CAS  Article  PubMed  Google Scholar 

  10. 10.

    Je BI, Gruel J, Lee YK, Bommert P, Arevalo ED, Eveland AL, et al. Signaling from maize organ primordia via FASCIATED EAR3 regulates stem cell proliferation and yield traits. Nat Genet. 2016;48:785–91.

    CAS  Article  PubMed  Google Scholar 

  11. 11.

    Li M, Zhong W, Yang F, Zhang Z. Genetic and molecular mechanisms of quantitative trait loci controlling maize inflorescence architecture. Plant Cell Physiol. 2018;59:448–57.

    CAS  Article  PubMed  Google Scholar 

  12. 12.

    Liu L, Du Y, Shen X, Li M, Sun W, Huang J, et al. KRN4 controls quantitative variation in maize kernel row number. PLoS Genet. 2015;11:e1005670.

    Article  CAS  PubMed  Google Scholar 

  13. 13.

    Wang J, Lin Z, Zhang X, Liu H, Zhou L, Zhong S, et al. KRN1, a major quantitative trait locus for kernel row number in maize. New Phytol. 2019;223:1634–46.

    CAS  Article  PubMed  Google Scholar 

  14. 14.

    Brown PJ, Upadyayula N, Mahone GS, Tian F, Bradbury PJ, Myles S, et al. Distinct genetic architectures for male and female inflorescence traits of maize. PLoS Genet. 2011;7:e1002383.

    CAS  Article  PubMed  Google Scholar 

  15. 15.

    Liu L, Du Y, Huo D, Wang M, Shen X, Yue B, et al. Genetic architecture of maize kernel row number and whole genome prediction. Theor Appl Genet. 2015;128:2243–54.

    Article  PubMed  Google Scholar 

  16. 16.

    Xiao Y, Tong H, Yang X, Xu S, Pan Q, Qiao F, et al. Genome-wide dissection of the maize ear genetic architecture using multiple populations. New Phytol. 2016;210:1095–106.

    CAS  Article  PubMed  Google Scholar 

  17. 17.

    Flint-Garcia SA, Thuillet AC, Yu J, Pressoir G, Romero SM, Mitchell SE, et al. Maize association population: a high-resolution platform for quantitative trait locus dissection. Plant J. 2005;44:1054–64.

    CAS  Article  PubMed  Google Scholar 

  18. 18.

    Zhang Y-M, Mao Y, Xie C, Smith H, Luo L, Xu S. Mapping quantitative trait loci using naturally occurring genetic variance among commercial inbred lines of maize (Zea mays L.). Genetics. 2005;169:2267–75.

    CAS  Article  PubMed  Google Scholar 

  19. 19.

    Yu J, Pressoir G, Briggs WH, Vroh Bi I, Yamasaki M, Doebley JF, et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet. 2006;38:203–8.

    CAS  Article  PubMed  Google Scholar 

  20. 20.

    Wang S-B, Feng J-Y, Ren W-L, Huang B, Zhou L, Wen Y-J, et al. Improving power and accuracy of genome-wide association studies via a multi-locus mixed linear model methodology. Sci Rep. 2016;6:19444.

    CAS  Article  PubMed  Google Scholar 

  21. 21.

    Tamba CL, Ni Y-L, Zhang Y-M. Iterative sure independence screening EM-Bayesian LASSO algorithm for multi-locus genome-wide association studies. PLoS Comput Biol. 2017;13:e1005357.

    Article  CAS  PubMed  Google Scholar 

  22. 22.

    Zhang J, Feng JY, Ni YL, Wen YJ, Niu Y, Tamba CL, et al. pLARmEB: integration of least angle regression with empirical Bayes for multilocus genome-wide association studies. Heredity (Edinb). 2017;118:517–24.

    CAS  Article  Google Scholar 

  23. 23.

    Wen YJ, Zhang H, Ni YL, Huang B, Zhang J, Feng JY, et al. Methodological implementation of mixed linear models in multi-locus genome-wide association studies. Brief Bioinform. 2018;19:700–12.

    Article  PubMed  Google Scholar 

  24. 24.

    Ren W-L, Wen Y-J, Dunwell JM, Zhang Y-M. pKWmEB: integration of Kruskal–Wallis test with empirical Bayes under polygenic background control for multi-locus genome-wide association study. Heredity. 2018;120:208–18.

    CAS  Article  PubMed  Google Scholar 

  25. 25.

    Tamba CL, Zhang YM. A fast mrMLM algorithm for multi-locus genome-wide association studies. bioRxiv. 2018. https://doi.org/10.1101/341784.

  26. 26.

    Zhang Y-M, Jia Z, Dunwell JM. The applications of new multi-locus GWAS methodologies in the genetic dissection of complex traits. Front Plant Sci. 2019;10:100.

    Article  PubMed  Google Scholar 

  27. 27.

    Cui Y, Zhang F, Zhou Y. The application of multi-locus GWAS for the detection of salt-tolerance loci in rice. Front Plant Sci. 2018;9:1464.

    Article  PubMed  Google Scholar 

  28. 28.

    Xu Y, Yang T, Zhou Y, Yin S, Li P, Liu J, et al. Genome-wide association mapping of starch pasting properties in maize using single-locus and multi-locus models. Front Plant Sci. 2018;9:1311.

    Article  PubMed  Google Scholar 

  29. 29.

    He L, Xiao J, Rashid KY, Yao Z, Li P, Jia G, et al. Genome-wide association studies for pasmo resistance in flax (Linum usitatissimum L.). front. Plant Sci. 2019;9:1982.

    Google Scholar 

  30. 30.

    Peng Y, Liu H, Chen J, Shi T, Zhang C, Sun D, et al. Genome-wide association studies of free amino acid levels by six multi-locus models in bread wheat. Front Plant Sci. 2018;9:1196.

    Article  PubMed  Google Scholar 

  31. 31.

    Li C, Fu Y, Sun R, Wang Y, Wang Q. Single-locus and multi-locus genome-wide association studies in the genetic dissection of fiber quality traits in upland cotton (Gossypium hirsutum L.). Front Plant Sci. 2018;9:1083.

    Article  PubMed  Google Scholar 

  32. 32.

    Su J, Ma Q, Li M, Hao F, Wang C. Multi-locus genome-wide association studies of fiber-quality related traits in chinese early-maturity upland cotton. Front Plant Sci. 2018;9:1169.

    Article  PubMed  Google Scholar 

  33. 33.

    Heffner E, Sorrells M, Jannink J-L. Genomic selection for crop improvement. Crop Sci. 2009;49:1–12.

    CAS  Article  Google Scholar 

  34. 34.

    Guo T, Li H, Yan J, Tang J, Li J, Zhang Z, et al. Performance prediction of F1 hybrids between recombinant inbred lines derived from two elite maize inbred lines. Theor Appl Genet. 2013;126:189–201.

    CAS  Article  PubMed  Google Scholar 

  35. 35.

    Riedelsheimer C, Endelman JB, Stange M, Sorrells ME, Jannink JL, Melchinger AE. Genomic predictability of interconnected biparental maize populations. Genetics. 2013;194:493–503.

    Article  PubMed  Google Scholar 

  36. 36.

    Xu Y, Xu C, Xu S. Prediction and association mapping of agronomic traits in maize using multiple omic data. Heredity (Edinb). 2017;119:174–84.

    CAS  Article  Google Scholar 

  37. 37.

    Zhang Z, Ober U, Erbe M, Zhang H, Gao N, He J, et al. Improving the accuracy of whole genome prediction for complex traits using the results of genome wide association studies. PLoS One. 2014;9:e93017.

    Article  CAS  PubMed  Google Scholar 

  38. 38.

    Arruda MP, Lipka AE, Brown PJ, Krill AM, Thurber C, Brown-Guedira G, et al. Comparing genomic selection and marker-assisted selection for Fusarium head blight resistance in wheat (Triticum aestivum L.). Mol Breed. 2016;36:84.

    Article  CAS  Google Scholar 

  39. 39.

    Spindel JE, Begum H, Akdemir D, Collard B, Redona E, Jannink JL, et al. Genome-wide prediction models that incorporate de novo GWAS are a powerful new tool for tropical rice improvement. Heredity (Edinb). 2016;116:395–408.

    CAS  Article  Google Scholar 

  40. 40.

    Wu X, Li Y, Shi Y, Song Y, Zhang D, Li C, et al. Joint-linkage mapping and GWAS reveal extensive genetic loci that regulate male inflorescence size in maize. Plant Biotechnol J. 2016;14:1551–62.

    CAS  Article  PubMed  Google Scholar 

  41. 41.

    Vollbrecht E, Springer PS, Goh L, Buckler ES, Martienssen R. Architecture of floral branch systems in maize and related grasses. Nature. 2005;436:1119–26.

    CAS  Article  PubMed  Google Scholar 

  42. 42.

    Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O'Connor D, Grotewold E, et al. Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev. 2012;26:1685–90.

    CAS  Article  PubMed  Google Scholar 

  43. 43.

    Bukowski R, Guo X, Lu Y, Zou C, He B, Rong Z, et al. Construction of the third-generation Zea mays haplotype map. Gigascience. 2017;7:1–12.

    CAS  Google Scholar 

  44. 44.

    Barazesh S, McSteen P. Barren inflorescence1 functions in organogenesis during vegetative and inflorescence development in maize. Genetics. 2008;179:389–401.

    CAS  Article  PubMed  Google Scholar 

  45. 45.

    Galli M, Khakhar A, Lu Z, Chen Z, Sen S, Joshi T, et al. The DNA binding landscape of the maize AUXIN RESPONSE FACTOR family. Nat Commun. 2018;9:4526.

    Article  CAS  PubMed  Google Scholar 

  46. 46.

    Du Y, Liu L, Li M, Fang S, Shen X, Chu J, et al. UNBRANCHED3 regulates branching by modulating cytokinin biosynthesis and signaling in maize and rice. New Phytol. 2017;214:721–33.

    CAS  Article  Google Scholar 

  47. 47.

    Xiao Y, Liu H, Wu L, Warburton M, Yan J. Genome-wide association studies in maize: praise and stargaze. Mol Plant. 2017;10:359–74.

    CAS  Article  Google Scholar 

  48. 48.

    Bai F, Reinheimer R, Durantini D, Kellogg EA, Schmidt RJ. TCP transcription factor, BRANCH ANGLE DEFECTIVE 1 (BAD1), is required for normal tassel branch angle formation in maize. Proc Natl Acad Sci U S A. 2012;109:12225–30.

    CAS  Article  PubMed  Google Scholar 

  49. 49.

    Phillips KA, Skirpan AL, Liu X, Christensen A, Slewinski TL, Hudson C, et al. Vanishing tassel2 encodes a grass-specific tryptophan aminotransferase required for vegetative and reproductive development in maize. Plant Cell. 2011;23:550–66.

    CAS  Article  PubMed  Google Scholar 

  50. 50.

    Desta ZA, Ortiz R. Genomic selection: genome-wide prediction in plant improvement. Trends Plant Sci. 2014;19:592–601.

    CAS  Article  PubMed  Google Scholar 

  51. 51.

    Zhang Z, Erbe M, He J, Ober U, Gao N, Zhang H, et al. Accuracy of whole-genome prediction using a genetic architecture-enhanced variance-covariance matrix. G3 (Bethesda). 2015;5:615–27.

    Article  Google Scholar 

  52. 52.

    Bian Y, Holland JB. Enhancing genomic prediction with genome-wide association studies in multiparental maize populations. Heredity (Edinb). 2017;118:585–93.

    CAS  Article  Google Scholar 

  53. 53.

    Rice B, Lipka AE. Evaluation of RR-BLUP genomic selection models that incorporate peak genome-wide association study signals in maize and sorghum. Plant Genome. 2019;12:180052.

    Article  CAS  Google Scholar 

  54. 54.

    Li X, Li Y, Wu X, Bai N, Song Y, Zhang D, et al. Characteristic analysis of kernel related traits in important maize inbred lines of China. J Plant Genet Resour. 2016;17:1–5.

    Google Scholar 

  55. 55.

    Li YX, Li C, Bradbury PJ, Liu X, Lu F, Romay CM, et al. Identification of genetic variants associated with maize flowering time using an extremely large multi-genetic background population. Plant J. 2016;86:391–402.

    CAS  Article  PubMed  Google Scholar 

  56. 56.

    Murray MG, Thompson WF. Rapid isolation of high molecular weight plant DNA. Nucleic Acids Res. 1980;8:4321–5.

    CAS  Article  PubMed  Google Scholar 

  57. 57.

    Ganal MW, Durstewitz G, Polley A, Berard A, Buckler ES, Charcosset A, et al. A large maize (Zea mays L.) SNP genotyping array: development and germplasm genotyping, and genetic mapping to compare with the B73 reference genome. PLoS One. 2011;6:e28334.

    CAS  Article  PubMed  Google Scholar 

  58. 58.

    Wu X, Li Y, Shi Y, Song Y, Wang T, Huang Y, et al. Fine genetic characterization of elite maize germplasm using high-throughput SNP genotyping. Theor Appl Genet. 2014;127:621–31.

    Article  PubMed  Google Scholar 

  59. 59.

    Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. 2007;23:2633–5.

    CAS  Article  PubMed  Google Scholar 

  60. 60.

    Yi F, Gu W, Chen J, Song N, Gao X, Zhang X, et al. High temporal-resolution transcriptome landscape of early maize seed development. Plant Cell. 2019;31:974–92.

    CAS  Article  PubMed  Google Scholar 

  61. 61.

    Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–75.

    CAS  Article  PubMed  Google Scholar 

  62. 62.

    Endelman J. Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome. 2011;4:250–5.

    Article  Google Scholar 

Download references

Acknowledgments

Not applicable.

Funding

Thanks to the National Natural Science Foundation (91735306, 31801373), we were able to conduct genotyping of the association mapping panel. Thanks to the Ministry of Science and Technology of China (2016YFD0100303, 2016YFD0100103) and the CAAS Innovation Program, we were able to conduct phenotype identification of the association mapping panel. None of these funding bodies have any relationship with the publication of this manuscript.

Author information

Affiliations

Authors

Contributions

Y. A. and L. C. performed the GWAS and WGP and drafted the manuscript; Y-x. L. and C. L. conceived the study and helped discuss the results. Y. S. and D. Z. led the planning of this study. T. W. and Y. L. designed the research and edited the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Yu Li or Tianyu Wang.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1: Figure S1.

Common QTNs codetected with different models and in different environments. a, The common QTNs codetected by different methods. The X-axis represents different environments. The Y-axis represents the corresponding number of significant QTNs detected by only one method and by at least two, three, four, five, six or seven methods. b, The common QTNs codetected across different locations.

Additional file 2: Figure S2.

LD decay with physical distance in our association panel.

Additional file 3: Figure S3.

Whole-genome prediction of KRN in the inbred lines. The bars with different colors represent prediction accuracies for the KRN when using tagSNPs identified by different models. P-values were estimated based on the two-tailed Student’s t-test. ***: P-value < 0.0001; NS: P-value > 0.05.

Additional file 4: Table S1

. A list of material information in our association panel.

Additional file 5: Table S2.

Descriptive statistics of KRN from the subgroups in the association panel.

Additional file 6: Table S3.

The significant QTNs for KRN identified by the MLM method.

Additional file 7: Table S4.

The significant QTNs for KRN identified by six multilocus methods.

Additional file 8: Table S5

. The common QTNs codetected by different methods.

Additional file 9: Table S6

. The common QTNs codetected across different locations.

Additional file 10: Table S7

. The expression of the candidate genes in different maize tissues.

Additional file 11: Table S8.

Genes related to spike mutation in maize.

Additional file 12: Table S9

. KRN prediction accuracies for different numbers of randomly selected tagSNPs (from 5 to 27,000).

Additional file 13: Table S10

. KRN prediction accuracies for different training population sizes.

Additional file 14: Table S11

. Comparison of prediction accuracy between the top tagSNPs and random tagSNPs.

Additional file 15: Table S12

. The prediction accuracies for KRN of the inbred lines obtained using the tagSNPs representing the significant QTNs identified by different methods.

Additional file 16: Table S13

. Comparison of our GWAS results with QTNs detected in previous studies.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

An, Y., Chen, L., Li, YX. et al. Genome-wide association studies and whole-genome prediction reveal the genetic architecture of KRN in maize. BMC Plant Biol 20, 490 (2020). https://doi.org/10.1186/s12870-020-02676-x

Download citation

Keywords

  • Maize
  • Kernel row number
  • Genome-wide association study
  • Quantitative trait nucleotide
  • Whole-genome prediction