Germplasm and experimental design
In 2010, the NCRPIS collection of inbred lines [26] was evaluated for disease resistance at the Central Crops Research Station in Clayton, NC. The 2010 field experiment consisted of 2572 inbred line entries and was arranged in an augmented single replicate design. Experimental entries were divided into 18 sets of differing sizes based on maturity and field assignment, and sets were then randomly subdivided into incomplete blocks (where the maximum block size across sets was 23 plots). Each block within each set was augmented with a B73 check plot in a random position, and five other checks of varying maturities (IL14H, Ki11, P39, SA24, and Tx303) were included once per set in a random position.
In 2011 and 2012, a novel association mapping panel consisting of 771 diverse inbred line entries was evaluated for disease resistance in Clayton, NC. Based on phenotypic information from the 2010 field experiment, a subset of 681 inbred lines from the NCRPIS collection representing a range of both pedigrees and disease severity scores was chosen for the panel. An additional 90 lines, mostly modern public lines available from North Carolina State University as well as a few lines developed by private industry with recently expired Plant Variety Protection Act (exPVPA) coverage that had become available through the NCRPIS in the spring of 2011 were included. The complete panel of 771 entries was divided into eight sets based on maturity and replicated across the two years using an augmented design. Within years, sets were randomized within the field, and each set was blocked using an α-lattice design [38]. Similar to the NCRPIS evaluation, each block was augmented by a randomly assigned B73 check plot, and five other checks representing a range of maturities and disease reactions (GE440, NC358, 794, B47, and Tx303) were included once per set.
Topcross F1 hybrids representing a subset of inbred lines from the 2011–2012 association panel were also evaluated in Clayton, NC in 2011 and 2012. Due to seed availability, topcross seed was limited to a sample of 405 inbred lines from the total 771 entries of the association panel. F1 hybrid seed was generated by crossing inbred lines to either the stiff stalk exPVPA inbred tester PHB47 or the non-stiff stalk exPVPA inbred tester PHZ51 (or both). Overall, 92 lines were crossed only to B47, 162 lines were crossed only to PHZ51, and 151 lines were crossed to both testers, resulting in a total of 556 F1 hybrid entries in the topcross panel. In the 2011 and 2012 field experiments, topcross entries were classified by tester and maturity (early or late, for a total of four tester × maturity combinations), and each tester × maturity combination was randomly subdivided into three groups. One random group of each tester × maturity combination was assigned to a set, for total of three sets (with four groups per set). Similar to the inbred association panel, sets were randomized within the field in each year, groups were randomized within set, and each group was then subdivided into incomplete blocks, but the topcross hybrids were grown in different field blocks than the inbreds. Each block was augmented with a B73 × PHZ51 topcross check plot in a random position, and two other hybrids that exhibited relatively good resistance to Fusarium ear rot in previous experiments (Pioneer 31G66 and NC478 × GE440) were included once per group. Lastly, one additional check plot of P39 × PHZ51 or CML52 × PHZ51 was included once per group depending on maturity (early or late, respectively).
Inoculation and phenotyping methods
The 2010 NCPRIS collection experiment and the 2011/2012 inbred association panel experiments were inoculated with local toxigenic Fusarium verticillioides isolates using the toothpick method [12],[22]. Approximately one week after flowering, a toothpick containing dried F. verticillioides conidia was inserted near the base of the primary ear of five plants in each plot. At maturity, inoculated ears were harvested and visually scored for Fusarium ear rot symptoms. Scores were assigned to each ear in increments of 5% from 0% to 100% diseased based on the percentage of the ear displaying disease symptoms [19].
Topcross hybrid experiments in 2011 and 2012 were inoculated with a suspension of F. verticillioides conidia using the method described by Robertson et al. [19]. Approximately one week after flowering, 5 mL of a liquid suspension containing 2 × 106 conidia mL−1 was injected into the silk channel of the primary ear of five plants in each plot. One week following the first inoculation, 5 mL of the conidia suspension was injected near the base of the primary ear of the same plants inoculated in the first week. At maturity, inoculated ears were harvested and visually scored using the same protocol as the inbred disease experiments. Raw data from both the inbred and topcross experiments are provided in supplemental datasets File S1 and File S2 in Additional file 1, respectively.
Genotypic data
The genotypic data used in this study consisted of 200,978 SNPs filtered from the GBS markers developed by Romay et al. [26]. The original set of markers consisted of 681,257 SNPs generated by the approach described by Elshire et al. [27] and Glaubitz et al. [28] with missing data imputed using the haplotype-based imputation method described by Romay et al. [26]. SNP data are available at http://panzea.org/db/gateway?file_id=Romay_etal_2013_imputed_geno_data. In addition, the Romay et al. [26] marker set was augmented with GBS data for the ninety inbred lines in the 2011/2012 association panel that were not present in the NCPRIS collection in 2010. GBS data for the aforementioned lines were obtained through the Institute for Genomic Diversity at Cornell Unversity, Ithaca, NY (http://www.igd.cornell.edu). Even after haplotype-based imputation, some missing genotypes exist because the imputation method of Romay et al. [26] does not impute missing data when the observed scores within a test haplotype window do not sufficiently match the reference haplotype set. Therefore, the augmented SNP marker set was then filtered to include only those markers that had less than 20% missing data (after haplotype-based imputation) and a minor allele frequency (MAF) greater than 5%. Duplicate samples present in the Romay et al. [26] data set were also removed from the augmented data set; after this filtering step, genotypic data were available for a total of 2480 inbred lines from across all years combined. The final genotypic data set used in the GWAS analyses is provided in supplemental dataset File S3 in Additional file 1.
Statistical analyses
Estimation of least square means
Fusarium ear rot data from the 2010 NCPRIS collection experiment and the 2011/2012 inbred association panel experiments were first analyzed separately to determine the best fitting spatial model within each year, and then the best models within each year were combined together to form a single multi-environment trial analysis. Within each year, a model was first fit with a fixed entry effect, fixed first, second, third, and fourth order polynomial trend effects in both the row and column directions [39], and flowering time as a fixed linear covariate. Only those fixed trend effects significant at P < 0.01 were chosen to remain in the model, and flowering time was also dropped from the model if it was not significant at P < 0.05. Once significant fixed effects were selected, random effects were chosen using Akaike’s Information Criterion [40] to compare four different models within each year: a model fitting only the significant fixed effects; a model fitting significant fixed effects and random set and block within set effects; a model fitting fixed effects and an anisotropic correlated error structure [39]; and a model fitting fixed effects, random set and block within set effects, and an anisotropic correlated error structure. All models were weighted by the number of ears scored within each plot, and a natural logarithmic transformation of raw ear rot scores was used in all analyses due to an association between the magnitude of predicted ear rot values and residuals. All analyses were performed using ASReml version 3 software [41].
Once the best model within each year was selected, a single multi-environment trial analysis was conducted by nesting the various best spatial models within year. Fixed effects from the individual year analyses were checked again for significance in the combined model, and those which became insignificant in the combined model were dropped. The combined model had the form:
The effects in this model were a fixed entry (line) effect (LINEl), random year (YEARi) and line × year effects, a heterogeneous error variance structure within each year εijkl (with unique variances in each year), and the various spatial effects nested within their respective years: a random set effect in 2010 (SET(YEAR)ij), a random block within set effect in 2010 (BLOCK(SET × YEAR)ijk), a fixed first order trend in the row direction in 2011 (βrow with associated indicator variable, x r-ijkl, indexing the row position in the field), and a fixed second order trend in the column direction in 2011 (βcol with associated indicator variable, x 2
c
- ijkl, indexing the column position in the field). Of the 2480 inbred lines with available genotypic data, least squares means were estimated for 1687 lines from the combined model (File S4 in Additional file 1). Means were not estimable for the remaining lines due to missing phenotypic observations in all years (typically due to extreme time to maturity or poor seed production). Given the imbalance in the number of experimental entries in 2010 versus 2011/2012, a second filtered least squares mean data set was created that included only the 734 inbred lines for which we had data from all three years of testing (File S4 in Additional file 1).
Ear rot data from the 2011/2012 topcross experiments were analyzed using the same model selection protocol as the inbred experiments. The only difference in model selection in the topcross experiments was the testing of random set, group within set, and block within group effects in addition to other fixed and random effects tested in the inbred models. The combined model for the topcross experiments consisted of a fixed entry effect, random year and entry × year effects, a heterogeneous error variance structure within each year, and the significant spatial and experimental design factors nested within years: a fixed flowering time covariate in both years, an anisotropic correlated error structure in the row direction in both years, and a fixed first order trend in the row direction in 2011. From the combined model, least squares means were estimated for all 556 topcross hybrid entries. Means were then divided into two separate data sets based on tester. The B47 topcross set contained 243 means, and the PHZ51 topcross set contained 313 means (File S4 in Additional file 1).
Heritability of Fusarium ear rot resistance was estimated within the inbred association panel and topcross hybrid experiments. The same models used to estimate least square means were used to estimate heritability except entries were treated as random effects to obtain estimates of genetic variance. Entry mean-basis heritability was estimated as
where is the average prediction error variance for all pairwise comparisons of entries and is the estimated genetic variance [42]. Five entry mean-basis heritabilities were estimated: across the full inbred association panel, within the filtered inbred subset of 734 lines, across all topcross hybrids, within the B47 topcrosses, and within the PHZ51 topcrosses.
Genotypic correlations between inbred rot resistance and hybrid rot resistance were estimated using individual location least square means for inbred entries and their corresponding topcross hybrids in a multivariate mixed model in ASReml. The least squares means used to calculate genetic correlations were only from years in which both inbred entries and hybrids were evaluated simultaneously (2011 and 2012). The model statement in ASReml was specified as
where Y
INB
is the inbred per se rot score variate, Y B47 is the B47 topcross hybrid rot score variate, Y PHZ51 is PHZ51 topcross hybrid rot score variate, Trait fits the mean for all three disease variates, Trait.Year fits a fixed year effect for each disease variate, and Trait.Entry fits the random genotype effect for each disease variate. Each term in the model was associated with one variance component for each trait and three covariance components between the three traits.
Association analyses
A genetic kinship matrix (K; File S5 in Additional file 1) for all 2480 inbred lines based on observed allele frequencies ([43]; method 1) was created using R software version 3.0.1 [44]. A subset of 10,241 SNP markers from the entire genotypic data set of 2480 inbred lines was used to produce K. The subset of markers was created by selecting markers from the complete marker set with less than 1% missing data. Missing genotypes remaining in the marker subset were imputed using a stochastic approach described by Zapata-Valenzuela et al. [45]. This method imputes a categorical genotype based on the frequency of all genotypes observed at the same locus across all individuals. This method imputes genotypic values that are expected to maintain the genotypic frequencies observed across the non-missing data. A principal components analysis in R was used to obtain the first two principal components of K in order to study the association of population structure with mean Fusarium ear rot scores.
The R package GAPIT version 3.35 [46] was used for the genome-wide association analyses based on a compressed mixed linear model [47]. Analyses were conducted on four sets of means: the entire set of inbred lines with phenotype data (1687 entries); the filtered set of inbred lines tested in all years (734 entries); the B47 topcross set (243 entries); and the PHZ51 topcross set (313 entries). In each set of means, missing values were included to allow for the same kinship matrix to be used across all analyses. The mixed linear model implemented by GAPIT was
where y is the vector of ear rot least squares means on the natural-log scale, β is a vector of fixed effects including SNP marker effects, u is a vector of random additive genetic effects from background QTL for lines, X and Z are design matrices, and e is a vector of random residuals. The variance of the u vector was modeled as
where K is the 2480 × 2480 matrix of pairwise kinship coefficients and is the estimated additive genetic variance [47]. The full K matrix was used for all analyses.
Restricted maximum likelihood estimates of variance components were obtained using the optimum compression level and population parameters previously determined (P3D) options in GAPIT [47]. The positive false discovery rate (FDR) across all 200,978 tests of association between one SNP and ear rot resistance was estimated by GAPIT using the Benjamini - Hochberg method [48]. The MaizeGDB genome browser [49] was used to identify predicted genes either containing or located within 0.5 Mb of significant SNP hits from the GWAS. Annotations of predicted genes were combined from the maize reference sequence 5b filtered gene set (available from MaizeGDB; http://ftp.maizegdb.org/MaizeGDB/FTP/B73_RefGen_v2_dumps/) and the 6a reference sequence available at Phytozome V10 (http://phytozome.jgi.doe.gov/pz/portal.html) [31]. SNP positions were also converted to RefGen V3 positions to permit use of the Ensembl variant effect predictor tool (http://plants.ensembl.org/Zea_mays/Info/Index) to determine the type of mutation caused by SNPs [50].
The 1687 lines of the full inbred panel with phenotype data were grouped into one of five major maize subpopulations (stiff stalk, non-stiff stalk, tropical, popcorn, and sweet corn) based on pedigree information compiled by Romay et al. ([26]; http://genomebiology.com/content/supplementary/gb-2013-14-6-r55-s1.xlsx). Pedigree descriptors of the additional North Carolina State University lines added to the experiment in 2011 were obtained from http://www.cropsci.ncsu.edu/maize/germplasm.html and appended to the Romay et al. [26] data set. Lines of mixed ancestry (“unclassified”) were dropped from the analysis. Landraces were also dropped due to very small sample size. The frequencies of alleles that reduced disease severity at significantly associated SNPs from the GWAS were estimated within each subpopulation in R software, and a Fisher’s exact test was used to test the null hypothesis that the frequency of the allele conferring increased disease resistance was the same across all five subpopulations.
Data resampling analysis
To measure the robustness of GWAS associations detected in the full inbred panel analysis, we generated 50 subsample data sets, each containing phenotypic data from a random sample of about 80% of the inbred lines. Subsample data sets were generated in 10 replications in each replication the complete data set was partitioned into five folds, each fold containing an approximately equally sized random sample of lines. GWAS was conducted on each of the 50 subsample data sets in the same manner as for the full data set. The resample model inclusion probability (RMIP; [51]) for each SNP was computed as the frequency across the 50 data subsamples with which the SNP’s association test had a p-value less than 10−5.
Availability of supporting data
The data sets supporting the results of this article are available at the Panzea.org repository: http://www.panzea.org/db/gateway?file_id=Zila_etal_2014_data_and_supp.