Genome-wide association studies and whole-genome prediction reveal the genetic architecture of KRN in maize

Background Kernel row number (KRN) is an important trait for the domestication and improvement of maize. Exploring the genetic basis of KRN has great research significance and can provide valuable information for molecular assisted selection. Results In this study, one single-locus method (MLM) and six multilocus methods (mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, pKWmEB and ISIS EM-BLASSO) of genome-wide association studies (GWASs) were used to identify significant quantitative trait nucleotides (QTNs) for KRN in an association panel including 639 maize inbred lines that were genotyped by the MaizeSNP50 BeadChip. In three phenotyping environments and with best linear unbiased prediction (BLUP) values, the seven GWAS methods revealed different numbers of KRN-associated QTNs, ranging from 11 to 177. Based on these results, seven important regions for KRN located on chromosomes 1, 2, 3, 5, 9, and 10 were identified by at least three methods and in at least two environments. Moreover, 49 genes from the seven regions were expressed in different maize tissues. Among the 49 genes, ARF29 (Zm00001d026540, encoding auxin response factor 29) and CKO4 (Zm00001d043293, encoding cytokinin oxidase protein) were significantly related to KRN, based on expression analysis and candidate gene association mapping. Whole-genome prediction (WGP) of KRN was also performed, and we found that the KRN-associated tagSNPs achieved a high prediction accuracy. The best strategy was to integrate all of the KRN-associated tagSNPs identified by all GWAS models. Conclusions These results aid in our understanding of the genetic architecture of KRN and provide useful information for genomic selection for KRN in maize breeding.

genes have been cloned and found to be involved in complex regulatory networks responsible for meristem development and KRN modification by studying mutants [6][7][8][9][10]. However, these classical mutants show negative pleiotropy for other traits related to plant architecture and are difficult to directly use in maize breeding [11]. Therefore, linkage mapping and association mapping have been performed in naturally varying populations with the aim of identifying more elite natural alleles controlling KRN.
Although many quantitative trait loci (QTLs) related to KRN were identified by linkage mapping in biparental segregating populations, few have been successfully cloned due to their small genetic effects, except for KRN4 [12] and KRN1 [13]. Genome-wide association studies (GWASs) of KRN have also been conducted and revealed many quantitative trait nucleotides (QTNs) [14][15][16]. At the same time, GWAS results can be easily influenced by population structure and rare variants in natural populations [17]. Therefore, many statistical models have been developed to improve power for identifying genotypephenotype associations when using the GWAS approach, such as the single-locus mixed linear model (MLM) method [18,19] and the multilocus methods mrMLM [20], ISIS EM-BLASSO [21], pLARmEB [22], FASTmrEMMA [23], pKWmEB [24], and FASTmrMLM [25]. The MLM method is a singlelocus fixed-single nucleotide polymorphism (SNP)-effect approach used in the case of a polygenic background to control population structure [18,19]. To reduce the false positive rate (FPR), stringent Bonferroni correction is used for multiple testing correction in the MLM approach [26]. The multilocus method is an alternative GWAS procedure that is based on a random-SNP-effect model, and no multiple testing correction is needed [26]. There are two steps in this model. First, a reduced number of SNPs is selected through different algorithms, and the SNPs are then used in the multilocus model to detect true signals [20][21][22][23][24][25][26]. Recently, a few studies have implemented the above GWAS methods to detect important loci controlling different traits in rice [27], maize [28], flax [29], bread wheat [30] and upland cotton [31,32].
Previous studies have revealed that KRN is quantitatively inherited and that the effects of a single genetic locus are generally small, which poses challenges for genetic improvement in maize breeding. Therefore, the best approach is to improve the ability to predict KRN by integrated analysis of more markers distributed throughout the whole genome. Genomic selection (GS), or whole-genome prediction (WGP), has the capacity to use full-genome data to increase breeding efficiency [33]. In previous studies, WGPs of KRN were performed in F 1 hybrids between recombinant inbred lines [34], interconnected biparental maize populations [35] and 339 maize inbred lines [36], all of which showed that KRN was a trait suitable for genome-wide prediction. Liu et al. [15] showed that approximately 300 top KRN-associated tagSNPs were sufficient for predicting the KRN of inbred lines and hybrids using ridge regression best linear unbiased prediction (rr-BLUP). Based on these analyses, we are faced with determining how to select fewer markers to accurately predict KRN. Several studies reported that selecting association markers from the results of GWASs and including them as fixed effects in WGP models resulted in better performance than that achieved with single WGP models [37][38][39]. This might provide a way to simultaneously model different aspects of genetic architecture and is especially accessible to breeders [39].
In this study, we performed a GWAS of an association panel including 639 maize inbred lines based on the MaizeSNP50 BeadChip by using one single-locus method, the MLM method, and six multilocus methods, mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, pKWmEB and ISIS EM-BLASSO. The common significant QTNs codetected by different methods and across different environments were analyzed, and the candidate genes related to KRN were further predicted. WGP was also performed using various KRN-related tagSNPs to dissect the genetic architecture of KRN.  (Table S1). The results showed that KRN was normally distributed in each environment, and the KRNs among environments were highly positively correlated, with correlations ranging from 0.73 between XX and BJ to 0.79 between XX and GZL (Fig. 1a). KRN exhibited high broad-sense heritability (H 2 = 0.90, Table 1), which was similar to the results of previous studies [14,16]. Comparing KRN among the different environments, we found that it showed the smallest average (13.69), minimum (8.60) and maximum (20.60) values in XX, where all accessions were planted in summer (June). With increasing latitude, where the accessions were planted in spring (May), the average KRN increased (14.65 in BJ and 14.59 in GZL). The largest range (max -min) in KRN appeared in GZL (12.60), which had the longest day length (Table 1). Based on previous results [40], our association panel could be divided into five subgroups: Reid, tangsipingtou (TSPT), lvdahonggu (LRC), Lancaster and P. The KRN statistical analysis results of various subgroups are shown in Table S2. There were no significant differences in KRN among the five subgroups (Fig. 1b). These results indicated that KRN was a quantitative trait and that the phenotypic variation among the tested inbred lines in the association panel was beneficial for dissecting the genetic architecture of KRN.

Multiple-locus analysis of KRN
Using different multiple-locus models, we identified different numbers of significant QTNs for KRN in XX, BJ, and GZL and together with BLUP across all locations. These QTNs were unevenly distributed on 10 chromosomes, with the most QTNs on Chr. 1 and the fewest on Chr. 8 (Fig. 2e). Specifically, 15 (FASTmrEMMA)-177 (mrMLM) QTNs in XX, 11 (FASTmrEMMA)-30 (ISIS EM-BLASSO) QTNs in BJ, 12 (FASTmrEMMA)-55 (mrMLM) QTNs in GZL and 11 (FASTmrEMMA)-106 (mrMLM) QTNs for BLUP were identified by the six different methods (Table S4). Comparative analysis of the GWAS results among different statistical approaches showed that FASTmrEMMA detected the fewest QTNs in all the environments, while mrMLM detected the most QTNs in all the environments, except for BJ (Table S4). QTN overlap analysis among the seven methods indicated that the common QTNs codetected by at least two methods accounted for more than 40% of the QTNs in different environments ( Figure S1a and Table S5, 42% in XX, 62% in BJ, 58% in GZL and 47% with BLUP). For example, 65 common QTNs representing 30 loci were codetected by two methods in XX, and 39 common QTNs representing 13 loci, 28 common QTNs representing 7 loci, 25 common QTNs representing 5 loci, and 6 common QTNs representing 1 locus were codetected by three, four, five and six methods, respectively (Figure S1a and Table S5). No QTNs were identified by all 7 methods in different locations. Overall, ISIS EM-BLASSO, which detected the third largest number of QTNs, identified the most codetected QTNs, followed by FASTmrMLM ( Figure S1a and Table S5). Comparative analysis of the GWAS results among the different environments showed that the majority of the  QTNs identified by the MLM method and ISIS EM-BLASSO were repeatedly detected in different locations ( Figure S1b, Table S6).

Annotation and expression of candidate genes for KRN
To obtain reliable significant QTNs and predict the candidate genes for KRN, only the QTNs simultaneously identified by at least three methods (either single-locus or multilocus) and in at least two environments were used for the next analysis. Finally, seven QTNs controlling KRN were obtained ( Table 2). The seven QTNs were located on chromosomes 1, 2, 3, 5, 9, and 10, and the PVE by these QTNs ranged from 1.06 to 5.21%. Based on the linkage disequilibrium (LD) in the association panel ( Figure S2), 49 genes around the QTNs (200 kb upstream and downstream) were obtained, and their expression varied widely in different maize tissues ( Fig. 3a and Table S7). For example, Zm00001d016760, which encodes the abscisic acid stress ripening 6 protein, is highly expressed in the roots, and Zm00001d031426, which encodes serine/threonine-protein kinase, and Zm00001d043298, which encodes a P-loop containing nucleoside triphosphate hydrolase superfamily protein, are highly expressed in tassels and anthers. Among the 49 genes, 22 were differentially expressed in different spike development mutants (Table S8); i.e., the ra1, ra2 and ra3 mutants had abnormal highly branched tassels and ears, with the ears displaying a very large KRN [41]; the kn1 mutant had smaller ears and fewer spikelets [42]. This result suggested that these 22 genes might be involved in ear development in maize.  Interestingly, we found that Zm00001d026540 (encoding auxin response factor 29, ARF29), which was located within 200 kb downstream of PZE-110106563 on Chr. 10 and was detected by the MLM method and all six multilocus GWAS methods (Table 2), had higher expression in SAMs and ears than in other tissues (Table  S7). Candidate gene association mapping was also performed. The SNPs within ARF29 and the 10-kb promoter and 10-kb region downstream of ARF29 were obtained from maize HapMap3 [43]. The KRN of 282 inbred lines was measured in six environments (see Methods), and the BLUP values were calculated. The MLM mapping result showed that five SNPs (two SNPs in the gene and three SNPs in the region upstream of the gene) around ARF29 were significantly related to KRN (Fig. 3b and Table 3). ARF29 can bind the Bif1 (which is related to SAM development and final KRN) promoter by recognizing the TTTCGG motif [44,45].
The S10_147,122,969 SNP, located within the gene body, was significantly associated with KRN. Two alleles for this SNP (A/T) were present in this panel, with the A allele conferring a higher KRN. Cytokinins also play an important role in the development of immature spikes and the formation of final KRN [46]. For example, UB3 regulates KRN by the cytokinin pathway and CLAV ATA-WUSCHEL pathway [46]. In this study, CKO4 (Zm00001d043293, encoding cytokinin oxidase protein) was detected as being located within 200 kb upstream of PUT-163a-110,967,306-138 on Chr. 3 by four GWAS methods (MLM, mrMLM, pLARmEB, and pKWmEB, Table 2), and candidate gene association mapping of CKO4 was also conducted. The SNPs and KRN were also obtained from HapMap3 and 282 inbred lines. The MLM results showed that two SNPs located upstream of CKO4 were significantly associated with KRN ( Fig. 3c and Table 3). The S3_191,837,578 SNP had two alleles  (T/G), and the T allele was associated with a higher KRN but had a lower frequency. Therefore, this allele may not be widely useful in maize breeding.

Whole-genomic prediction of KRN
We first analyzed the LD blocks of all markers using the threshold value r 2 > 0.2 and obtained 27,688 tagSNPs in our association panel. Then, we randomly selected different numbers of tagSNPs, from 5 to 27,000, in the whole genome to calculate the prediction accuracies for KRN of the inbred lines, which was calculated as a correlation between predicted and true values from the simulations. The results showed that the prediction accuracies increased as the number of tagSNPs increased ( Fig. 4a and Table S9). More specifically, the prediction accuracies sharply increased when the number of tagSNPs increased from 5 to 500 and then slowly increased when the number of tagSNPs increased from 400 to 2000. Once the number exceeded 2000, the prediction accuracies maintained a consistently high level. Although a large number of tagSNPs were used to predict KRN, the prediction accuracies were still less than 0.5. The effects of training population size on the prediction accuracy were also assessed based on a marker number of 14,000 (approximately 50% of the total tagSNPs). In the association panel, the prediction accuracies improved with increasing training population size. When the training population size increased from 50 to 90%, a slight increase in prediction accuracy was observed ( Fig. 4b and Table S10).
To better understand the genetic architecture of KRN and improve the ability to predict it, we ranked the 27,688 tagSNPs according to their significance in relation to KRN, as obtained by the MLM method, to obtain the top tagSNPs. We found that these top tagSNPs had a higher prediction accuracy (ranging from 0.58 for the top 100 tagSNPs to 0.66 for the top 700 tagSNPs) than randomly selected tagSNPs (ranging from 0.22 for 100 random tagSNPs to 0.33 for 700 random tagSNPs) ( Fig. 4c and Table S11).
The tagSNPs representing the significant QTNs detected by different models based on BLUP were collected and used to calculate prediction accuracies for KRN in our association panel. The results showed that these tagSNPs identified by different methods had different prediction accuracies ranging from 0.43 (FAS-TmrEMMA) to 0.60 (ISIS EM-BLASSO) ( Fig. 4d and Table S12). We also found that the tagSNPs associated  Table S12). To explore whether using the codetected QTNs in different GWAS methods could increase prediction accuracies for KRN, we selected the common QTNs identified by at least two, three, four, five or six methods to obtain the predictions. The results showed that only the common QTNs identified by at least two methods (common ≥2) could maintain predictability at a high level; other common QTNs had no advantage in predicting KRN, which may be due to the smaller QTN numbers ( Figure S3 and Table S12).
Additionally, to improve the prediction ability, we put the KRN-related tagSNPs detected by seven methods together in a single environment (204 in XX, 87 in BJ, 118 in GZL and 167 for BLUP), namely, M-total tagSNPs, to conduct KRN prediction. As a result, we found that the prediction accuracies were improved sharply and reached 0.74 in XX, 0.66 in BJ, 0.75 in GZL and 0.75 for BLUP ( Fig. 4d and Table S12). These predictabilities were much higher than those of the single method in each environment (Table S12). Then, we collected the tagSNPs associated with KRN from all methods and all environments, namely, E-M-total tagSNPs, and obtained 439 tagSNPs in total. However, there was only a slight increase in prediction accuracy (ranging from 0.68 in BJ to 0.79 for BLUP for the 439 tagSNPs) when we used the much higher number of E-M-total tagSNPs compared to the fewer Mtotal tagSNPs (Fig. 4d and Table S12).

Discussion
To date, the GWAS approach has been widely used to investigate the genetic basis of important traits in many species by calculating the association between genotypic and corresponding phenotypic variations [47]. To identify true association signals, many statistical methods based on different algorithms have been established. In this study, we selected one single-locus method, MLM, and six multilocus methods, mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, pKWmEB and ISIS EM-BLASSO, to perform comprehensive GWAS mapping of KRN in our association panel. Among the seven methods, mrMLM identified the largest number of QTNs, FASTmrEMMA identified the fewest QTNs, and ISIS EM-BLASSO identified the most codetected QTNs, which were consistent with the results reported by Cui et al. [27] for salt-tolerance loci in rice. Therefore, multilocus models are valuable alternative methods for GWASs of KRN in maize. Additionally, a small number of common QTNs codetected by different methods was also observed in the study of Peng et al. [30] for free amino acid levels in bread wheat.
Comparing our GWAS results with those of previous studies, we found that some important genes controlling inflorescence architecture in maize were located within 200 kb of significant QTNs (Table S13), including CT2 (Zm00001d027886), FEA3 (Zm00001d040130), BAD1 (Zm00001d005737), RA1 (Zm00001d020430), and VT2 (Zm00001d008700). Among these genes, CT2 [7] and FEA3 [10] function in CLAVATA-WUSCHEL feedback signaling, and their mutations result in enlarged and fasciated ear primordia and increased KRN. BAD1 [48] and RA1 [41], both of which encode transcription factors, are involved in the genetic regulation of the floral branch system by the ROMASO pathway in maize. VT2 [49] functions in auxin biosynthesis and has dramatic effects on vegetative and reproductive development, and mutant ears show obvious defects. Additionally, approximately 60% of the significant QTNs within LD regions were codetected by previous GWAS mapping of inflorescence development, and some of these loci were pleiotropic [14,15].
WGP is also an effective method in animal breeding and plant improvement [50]. Because KRN is mainly controlled by additive loci, we selected the rrBLUP additive model to conduct WGP [51]. As expected, prediction accuracy increased as the number of randomly selected tagSNPs increased, which was consistent with the finding of Liu et al. [15] and determined by the influence of marker density on WGP [50]. However, the randomly selected tagSNPs showed a low predictive ability, and thus, we decided to combine the GWAS results with WGP to explore the best marker dataset for KRN prediction. As a result, higher prediction levels were easily reached when using the significant tagSNPs, and the moderate to high values were consistent with those reported by Liu et al. [15], Guo et al. [34], Riedelsheimer et al. [35] and Xu et al. [36]. This result suggested that integrating significant signals from GWASs into WGP models as fixed effects was effective for enhancing the prediction of KRN. A similar conclusion was reached by Liu et al. [15] for KRN, by Bian and Holland [52] for resistance to southern leaf blight (SLB) and gray leaf spot (GLS) and plant height (PHT) in maize and by Spindel et al. [39] for tropical rice improvement. Although different evaluations of WGP models incorporating peak GWAS signals have been performed in maize and sorghum [53], our research indicated that the use of QTNs passing a certain threshold in the above GWAS methods as fixed effects in the rrBLUP model is a powerful tool for KRN prediction, which was a trait-specific consideration in the given population in this study.
Based on the results of this study, we suggest that KRN is controlled by many additive loci and that the rrBLUP model can be used for KRN prediction in maize inbred lines. The combined utilization of different GWAS methods is helpful for predicting candidate genes and KRN in maize breeding.

Conclusions
In this study, multiple GWAS methods were used to identify significant QTNs for KRN in maize. The seven GWAS methods revealed different numbers of KRNassociated QTNs, ranging from 11 to 177. Based on these results, seven important regions for KRN located on chromosomes 1, 2, 3, 5, 9, and 10 were identified by at least three methods and in at least two environments. Moreover, 49 genes from the seven regions were expressed in different maize tissues. Among the 49 genes, ARF29 (Zm00001d026540, encoding auxin response factor 29) and CKO4 (Zm00001d043293, encoding cytokinin oxidase protein) were significantly related to KRN, based on expression analysis and candidate gene association mapping. WGP of KRN was also performed, and we found that the KRN-associated tagSNPs achieved a high prediction accuracy. The best strategy was to integrate the total KRN-associated tagSNPs identified by all GWAS models. These results will facilitate our understanding of the genetic basis of KRN and provide important candidate genes for further research on this important trait.

Plant materials and phenotyping
An association panel of 639 maize inbred lines, representing a wide range of genetic diversity of temperate inbred lines in China [54], was collected for GWASs. We declare that all plant materials comply with the 'Convention on the Trade in Endangered Species of Wild Fauna and Flora' in this study. The plant materials used in this study were conserved in our lab.
All the accessions were planted following a randomized block design of three replicates in three environments in 2011: Gongzhuling in Jilin Province (43.50°N, 124.82°E), Xinxiang in Henan Province (35.19°N, 113.53°E) and Beijing (39.48°N, 116.28°E) in 2011. For descriptive purposes, the three environments were designated GZL, XX and BJ, respectively. At each location, the field experiments include in a single row 3 m in length, with 0.6 m between adjacent rows and 12 individual plants per row. The Institute of Crop Science of the Chinese Academy of Agricultural Sciences has established experimental field bases at all the above locations. The Institute of Crop Science approved the field experiments, and field management followed local maize management practices. In this study, the field studies did not involve endangered or protected species.
Five ears were harvested from each line, and KRN was evaluated in the middle part of the ears [54]. BLUP values were calculated using the SAS PROC MIXED model, with genotype, environment and replicate as random effects [14,55]. The broad-sense heritability (H 2 ) of KRN was calculated according to Wu et al. [40]. The coefficient of variation was calculated as CV (%) = SD/ mean, where SD and mean refer to the standard deviation and mean, respectively, of KRN in each environment [55].

DNA extraction and genotyping
Young leaves of five plants of each maize line according were collected for genomic DNA extraction. We extract the genomic DNA followed the cetyltrimethylammonium bromide (CTAB) method [56]. All samples were quality checked and genotyped using the MaizeSNP50 BeadChip, which is an Illumina BeadChip array of 56, 110 maize SNPs developed from the B73 reference sequence [57]. Then, the successfully called SNPs with a missing rate of more than 20% and minor allele frequency (MAF) of < 0.05 were excluded from the genotyping dataset [58]. After that, 42,667 high-quality SNPs were used in further analysis.

GWAS mapping
One single-locus method, MLM, and six multilocus methods, including mrMLM, FASTmrMLM, FAS-TmrEMMA, pLARmEB, pKWmEB, and ISIS EM-BLASSO, were used in this study. Alleles of each polymorphic locus with a minor frequency > 0.05 were used for further analysis. A kinship matrix was calculated and principal component analysis (PCA) was performed with the TASSEL 5.2 program [59]. An MLM controlling for population structure (Q) and kinship (K) (MLM Q + K) was also generated in TASSEL 5.2 [18,19]. Six multilocus GWAS mapping methods were used along with the software package mrMLM.GUI v3.2 in the R environment (http://127.0.0.1:5846/) [26]. All parameters were set at default values, the critical threshold of significant associations for the MLM was set at -log 10 P ≥ 3, and the logarithm of odds (LOD) score for the six multilocus methods was set at ≥3 [26].

Candidate gene analysis
The LD decay with physical distance in our association panel was calculated in TASSEL 5.2 to be 200 kb ( Figure  S2). The candidate genes in the 200-kb region around significant QTNs detected by at least three models and in two environments were identified based on the B73 reference genome V4 from MaizeGDB (https://www. maizegdb.org/). Expression data for these genes were collected from previous studies [42,60]. Genome fragments containing the SNPs within the selected genes, including the 10-kb promoter region, the gene bodies and the 10-kb region downstream of the genes, were obtained from the maize HapMap3 dataset [43]. The candidate gene mapping analyses were conducted on a global maize association mapping panel of 282 diverse lines. The phenotypes of this association panel were provided in our previous report [40], and KRN was measured in six environments, including Beijing, Xinxiang in Henan, and Urumqi in Xinjiang in 2009 and 2010. Association analysis was conducted by the MLM method in TASSEL 5.2, controlling for population structure (Q) and kinship (K). The first three principal components (PCs), which were analyzed in a previous study [40], were used as covariants to control for existing population structure in the 282-line association mapping panel. Significant marker-trait associations were declared atlog 10 P > 3.

Genomic prediction of KRN
To predict the KRN of the inbred lines, we estimated predictability by WGP. We grouped the LD blocks in PLINK software [61] using the threshold value r 2 > 0.2 and identified tagSNPs according to the LD blocks. The ridge regression best linear unbiased prediction (rrBLUP) package was used to perform genomic prediction in R [62]. We randomly selected half of the lines of our association panel as the training population (320 inbred lines) and the remaining 319 inbred lines as the validation population [15]. We used the KRN-related tagSNPs identified by different methods to perform genomic prediction of KRN for the inbred lines under four conditions (XX, BJ, GZL and BLUP). Simultaneously, 5 to 27,000 randomly selected tagSNPs, the total tagSNPs related to KRN identified by the seven methods in a single environment (M-total tagSNPs), the total tagSNPs for KRN from all methods and environments (E-M-total tagSNPs) and the common tagSNPs for KRN detected by at least two, three, four, five, or six methods were also used for the same procedure. The random sampling of tagSNP numbers, the training and validation populations and the predictions were all repeated 100 times.