Polymorphisms in monolignol biosynthetic genes are associated with biomass yield and agronomic traits in European maize (Zea mays L.)

Background Reduced lignin content leads to higher cell wall digestibility and, therefore, better forage quality and increased conversion of lignocellulosic biomass into ethanol. However, reduced lignin content might lead to weaker stalks, lodging, and reduced biomass yield. Genes encoding enzymes involved in cell wall lignification have been shown to influence both cell wall digestibility and yield traits. Results In this study, associations between monolignol biosynthetic genes and plant height (PHT), days to silking (DTS), dry matter content (DMC), and dry matter yield (DMY) were identified by using a panel of 39 European elite maize lines. In total, 10 associations were detected between polymorphisms or tight linkage disequilibrium (LD) groups within the COMT, CCoAOMT2, 4CL1, 4CL2, F5H, and PAL genomic fragments, respectively, and the above mentioned traits. The phenotypic variation explained by these polymorphisms or tight LD groups ranged from 6% to 25.8% in our line collection. Only 4CL1 and F5H were found to have polymorphisms associated with both yield and forage quality related characters. However, no pleiotropic polymorphisms affecting both digestibility of neutral detergent fiber (DNDF), and PHT or DMY were discovered, even under less stringent statistical conditions. Conclusion Due to absence of pleiotropic polymorphisms affecting both forage yield and quality traits, identification of optimal monolignol biosynthetic gene haplotype(s) combining beneficial quantitative trait polymorphism (QTP) alleles for both quality and yield traits appears possible within monolignol biosynthetic genes. This is beneficial to maximize forage and bioethanol yield per unit land area.


Background
Elevating the polysaccharide to lignin ratio is one possible approach to improve the quality of biofeedstocks for ethanol conversion [1]. It is believed that cell wall lignin content is negatively correlated with forage digestibility [2] and bioethanol production [3]. Removing lignin by oxidative pretreatment could significantly increase the release of available sugars in subsequent enzyme hydrolysis compared to the untreated control [4]. In maize, a 1% increase in available cellulose is expected to increase the potential ethanol production from 101.6 to 103.3 gallons per dry ton of biomass, as calculated using the U.S. Department of Energy's Theoretical Ethanol Yield Calculator and Feedstock Composition Database [5]. Theoretical maximum ethanol yields from biomass are highly correlated (r² = 0.9) with acid detergent lignin concentration [6]. According to Lorenz et al. [1], variation in ethanol yield is driven by glucan convertibility, which is highly correlated with ruminal digestibility and lignin content. Besides the lignin content, other aspects of cell wall lignification like the ratio of syringyl to guaiacyl lignin units affect cell wall digestibility [7,8] and, therefore, likely ethanol production from biofeedstocks. The syringyl to guaiacyl ratio impacts the efficiency of cell wall hydrolysis in forage sorghums [9]. In summary, modification of cell wall lignification is a promising route to improve the quality of bioenergy crops.
However, reduced lignin content can influence the overall plant performance. Generally, reduced lignin content results in weaker stalks, reduced stover and grain yield, and delayed maturity [10]. In maize, brown midrib (bm) mutants show a decreased lignin content and increased cell wall digestibility [11]. For instance, lignin content is reduced by one third and cell wall digestibility is increased by 9% in bm3 lines or hybrids [12]. However, maize bm lines or hybrids show reduced vigor during vegetative growth, a high incidence of stalk breakage at maturity, and decreased grain and stover yield [13][14][15][16]. Similarly, bm hybrids of Sudan grass and sorghum also show reduced dry matter yield [17,18]. Genetically engineered tobacco with reduced CCoAOMT [19] or PAL activities [20], poplar with down-regulated CCR activity [21], and Arabidopsis with mutations in the CCR1 [22], C3H [23], and C4H genes [24], or with double mutations in the COMT1 and CCoAOMT1 genes [25], showed reduced plant size. By silencing the HCT gene in Arabidopsis, Besseau et al. [26] obtained mutants with modified lignin structure as well as repressed plant growth. Silencing of HCT resulted in redirection of the metabolic flux into flavonoids, which suppressed auxin transport.
Decreased lignin content does not necessarily have negative effects on plant growth. After divergent selection for fiber concentration in maize, Wolf et al. [27] found only weak and inconsistent correlations between lignin content and various agronomic traits. Weller et al. [28] found no yield difference between bm3 and wildtype isolines. He et al. [29] developed O-methyltransferase down-regulated maize with a 17% decrease in lignin content, increased digestibility, without effect on dry matter yield. In aspen, repression of 4CL led to a 45% reduction in lignin content [30]. While the structural integrity at both the cellular and whole-plant level was not affected, enhanced leaf, root, and stem growth were observed, as well as increased cellulose content [30]. By simultaneously silencing HCT and CHS genes, Besseau et al. [26] obtained normal growing Arabidopsis plants with substantially altered lignin composition. In summary, cell wall lignification is generally, but not always, negatively correlated with biomass yield and other agronomic traits. These correlations can be due to: (1) linkage of genes controlling monolignol biosynthesis and biomass yield, (2) pleiotropy at the level of genes but not QTPs within monolignol biosynthetic genes affecting both groups of traits, and (3) pleiotropic effects of QTP(s) within monolignol biosynthetic genes. The underlying genetic cause(s) for these correlations impact the strategy for breeding of bioenergy crops.
Ten enzymes are involved in converting phenylalanine to monolignols in maize, and the majority are encoded by two or more genes [31]. Four genes encode PAL proteins in Arabidopsis, which catalyze the first step in the phenylpropanoid pathway [32]. In maize, PAL has both phenylalanine and tyrosine ammonia lyase activity [33] and at least five contigs with PAL/TAL annotation were identified [31]. The other enzymes involved in biosynthesis of monomers include cinnamate 4-hydroxylase (C4H), 4-coumarate:CoA ligase (4CL), hydroxycinnamoyl-CoA transferase (HCT), p-coumarate 3-hydroxylase (C3H), caffeoyl-CoA O-methyltransferase (CCoAOMT), cinnamoyl-CoA reductase (CCR), ferulate 5-hydroxylase (F5H), caffeic acid O-methyltransferase (COMT), and cinnamyl alcohol dehydrogenase (CAD), for which at least two, seven, two, one, five, eight, two, one, and seven sequences were identified, respectively [31]. Association mapping is a promising approach to identify candidate QTPs for traits of interest [34][35][36][37]. The CCoAOMT2 gene is co-localized with a QTL for cell wall digestibility and lignin content [38], and an 18-bp indel in the first exon was found to be associated with cell wall digestibility [34]. In addition, associations have been identified between neutral detergent fiber (NDF) and polymorphisms within PAL, 4CL1, C3H, and F5H genes, between in vitro digestibility of organic matter (IVDOM) and polymorphisms within PAL, 4CL1, and C3H, and between digestibility of neutral detergent fiber (DNDF) and polymorphisms in C3H and F5H genes [35,36]. However, genes encoding any of these 10 enzymes have so far not been studied in relation to biomass yield-related traits. In this study, the relationships between 10 monolignol biosynthetic genes belonging to eight enzyme-encoding genes or gene families and the biomass yield-related traits plant height (PHT), days to silking (DTS), dry matter content (DMC), and dry matter yield (DMY) were analyzed. Only one or two gene member(s) of each gene family were amplified. Our objectives were to investigate (1) whether candidate quantitative trait polymorphisms (QTPs) for these four traits can be identified in monolignol biosynthetic genes, and (2) whether candidate QTPs for biomass yield-related traits and cell wall digestibility traits act pleiotropically, by comparing the results of this study with results from previous forage trait association studies [35,36; Brenner et al.: Polymorphisms in O-methyltransferase genes are associated with stover cell wall digestibility in European maize (Zea mays L.), submitted]. The results are discussed with respect to implications for breeding of maize for forage and lignocellulosic ethanol production.

Phenotypic data analyses
Mean phenotypic values for individual lines across four environments ranged from 109.3 to 197.1 cm for PHT, 68.5 to 85.7 days for DTS, 23.2% to 36.0% for DMC, and 2.5 to 8.4 t/ha for DMY. Overall mean values were 152.0 cm, 78.6 days, 28.5%, and 5.3 t/ha, respectively, for these four traits (Table 1). Variance components for genotype and interactions between genotype and environment were significant (P = 0.01) and variance components for environment were significant (P = 0.01) for PHT, DTS, and DMC. Heritabilities were 88.0%, 92.0%, 85.7%, and 81.9% for PHT, DTS, DMC, and DMY, respectively (Table 1). Means of dent lines were significantly higher than means of flint lines for DTS (P = 0.01), DMY (P = 0.01), and PHT (P = 0.05), whereas DMC was not significantly different between dent and flint lines.

Association analyses
Association analyses revealed that six genes, coding for COMT, CCoAOMT2, 4CL1, 4CL2, F5H, and PAL proteins, were associated with at least one of the four biomass yield-related traits. Ten associations were identified by GLM when including population structure in the analysis and controlling for multiple testing. Among those, seven were validated by MLM (Tables 3 and 4), which, in addition to population structure, corrects for finer scale relative kinship. However, none of these polymorphisms identified by MLM remained significant after controlling for multiple testing by FDR. At the PAL locus, a tight LD group containing 17 polymorphisms with r² = 1 was associated with days to silking (DTS). The 39 lines were classified into two groups by this LD group. The lines including AS1-8, 11-22, 24, and 29 silked six days earlier than the remaining lines. This LD group explained 7% of the total DTS variation in our population. At the 4CL2 locus, a tight LD group consisting of two SNPs (at positions 192 and 217) in complete LD explained 14.3% of the phenotypic variation for PHT. The SNP at position 217 led to an amino acid change. The lines with the TG allele at these two positions were on average 17 cm taller than the lines with the CA allele. At the CCoAOMT2 locus, three polymorphisms (an indel starting at position 75 and two SNPs at positions 144 and 406) were in a tight LD group. These two indels both resulted in reading frame shift, with one of those being a singleton. Lines with an Adenine insertion at position 454 silked on average three days earlier than the remaining lines. The COMT gene has been shown to strongly affect cell wall digestibility and plant height. However, only one COMT polymorphism showed a trait association, which was with DTS. This indel in the 3'UTR was detected only by GLM and explained 10.3% of the phenotypic variation for DTS. Finally, one trait association was detected at the F5H locus, which was a missense substitution at position 65 and explained 22.4% of the phenotypic variation for DTS.

Pleiotropic polymorphisms affecting biomass yield and forage quality
In order to increase the chance of finding potential pleiotropic QTPs affecting both biomass yield-related and digestibility traits, associations of monolignol biosynthetic genes [35,36; Brenner et al.: Polymorphisms in O-methyltransferase genes are associated with stover cell wall digestibility in European maize (Zea mays L.), submitted] were determined without multiple test adjustment. In our study, two additional trait associations were detected only by MLM: one between a synonymous SNP in the COMT gene and PHT, the other between a tight LD group in the F5H gene (two SNPs at positions 5 and 6 in complete LD) and DMY. Despite these relaxed statistical test conditions, only two polymorphisms in 10 monolignol biosynthetic genes were associated with both biomass yield-related and cell wall digestibility traits. The indel starting at position 810, resulting in a reading frame shift in the 4CL1 gene, was associated with IVDOM [36] and with DTS, as identified by both GLM and MLM. It was also associated with NDF, as identified by GLM [36]. The tight LD group with two SNPs in complete LD in the F5H gene, resulting in a substitution from Proline to Arginine, was associated with both DMY (by MLM) and NDF (by GLM) [36]. In addition, the tight LD group in the PAL gene showing association with DTS in our study was also associated with NDF [35]. However, the association between this LD group and NDF was only detected when population structure was not considered. In summary, no pleiotropic polymorphisms associated with DNDF and DMY or PHT were identified.
Table footnotes: These lines are the same 40 lines (except D_AS34) used by Andersen et al. [36]. Flint and dent lines are denoted by F_ and D_ prefixes, respectively. PHT: plant height (cm); DTS: days to silking; DMC: dry matter content of stover; DMY: dry matter yield of stover (tons per hectare); LSD5: least significant difference at 5% level between lines; CI: confidence interval; *, **: significant at 5% and 1% level, respectively.

Impact of the association analysis method on QTP identification
Two statistical approaches (GLM and MLM) were employed as in previous association studies for better comparison across quality [35,36; Brenner et al.: Polymorphisms in O-methyltransferase genes are associated with stover cell wall digestibility in European maize (Zea mays L.), submitted] and yield-related traits (this study). In those former studies, the same line panel, gene sequences, and marker data were used. Inclusion of both population structure and relative kinship reduces the number of false positive associations compared to including population structure alone [40]. In the present study, most of the associations identified for biomass yield and other agronomic traits by GLM were also identified by MLM, although none of the associations identified by MLM remained significant after controlling for multiple testing. Therefore, we cannot exclude the possibility that familial relatedness resulted in false positives. However, this result might also suggest that inclusion of relative kinship information can in some cases mask genuine associations, comparable to the likely false negatives for flowering time caused by inclusion of population structure for the Dwarf8 gene in European maize [41]. In this example, likely true effects of QTPs on flowering time were confounded with the presence of one particular allele set in flint lines and the other in dent lines.

Characterization of polymorphisms associated with biomass yield and agronomic traits
We compared trait-associated (27) with not-associated polymorphisms (255) within the 10 monolignol biosynthetic genes regarding (i) the distribution among SNPs and indels, and (ii) the distribution among coding and non-coding sequences. Based on Chi-square tests, trait-associated polymorphisms for biomass yield-related traits were not preferentially due to either SNPs or indels, and not primarily located in either coding or non-coding gene regions. Polymorphisms in conserved motifs with impact on protein function or abundance are more likely candidates for causative QTPs [42]. Within the PAL gene in our study, one of the 17 polymorphisms in the LD group associated with DTS was located within a possible bipartite RAV1 binding site [43,44]. RAV1 has been suggested as a negative regulator of plant growth and development [45]. In addition, five polymorphisms in the same LD group were located within Dof-like motifs [43]. Dof transcription factors play a critical role in plant growth and development [46]. Those six polymorphisms are more likely candidates for causative QTPs, whereas the remaining 11 significant associations within the same LD group are more likely due to linkage. To pinpoint causative polymorphisms, further dissection based on additional alleles at low LD is required. In the CCoAOMT2 gene, a 40-60 bp indel at position 663 was just six base pairs upstream of a 3' splicing donor site, spanning a potential "branching site" for splicing. Consequently, this indel might affect splicing and in this way interfere with the mRNA sequence and function of CCoAOMT2. Moreover, this indel also spanned part of a bipartite RAV1 binding site [43]. Interestingly, this site was associated with three biomass yield-related traits. Although LD decay was rapid in CCoAOMT2, the indel and two SNPs, which are at positions 75, 144, and 406, respectively, were tightly linked (r² > 0.89). 
The indel resulted in two amino acid (Asparagine and Glycine)  The other two SNPs were either synonymous or intron located SNPs. Thus, the indel is a more promising candidate QTP compared to the other two SNPs. Two DTS associated polymorphisms in 4CL1, which were both single nucleotide indels, led to frame shift mutations.
One indel starting at position 810 introduced a premature stop [36]. The other indel changed the peptide sequence substantially, since it is located close to the transcription initiation site. In 4CL2, two polymorphisms in complete LD were associated with PHT. One of them changed the amino acid sequence and is, therefore, a more likely candidate QTP. In the F5H gene, Leucine to Proline and Proline to Arginine substitutions were associated with DTS and DMY, respectively. Both are expected to change protein structure dramatically based on the BLOSUM62 substitution matrix [47]. Proline is very different from other amino acids due to its aliphatic side chain bonded to both nitrogen and α-carbon atoms. In summary, some of the above-mentioned trait-associated polymorphisms or LD groups likely change protein sequence and expression dramatically, and are consequently the most likely QTPs affecting agronomic traits. However, future studies with maize populations with very low LD or alternative approaches are required for validation.

Pleiotropic effects of monolignol biosynthetic genes
Besides biosynthesis of lignin monomers, the monolignol biosynthetic pathway is involved in biosynthesis of salicylates, coumarins, hydroxycinnamic amides, pigments, UV light protectants, antioxidants, and flavonoids [48]. Jone [49] concluded that phenylpropanoid compounds are involved in controlling plant development, growth, xylogenesis, and flowering. For example, chalcone and naringenin, two intermediates in the phenylpropanoid metabolism in plants, inhibit 4CL activity [50] and suppress the growth of at least 20 annual plant species including maize [51]. Moreover, mutants in genes coding for C3H, C4H, PAL, CCoAOMT1, CCR1 and HCT show effects on plant growth [19][20][21][22][23][24][25][26]. This is likely due to redirection of metabolic flux and accumulation of compounds such as naringenin, flavonoids, and chalcone, which have the potential to perturb hormone homeostasis and ultimately affect plant growth. In our study, polymorphisms affecting both biomass yield and cell wall digestibility were identified in six monolignol biosynthetic genes (encoding COMT, CCoAOMT2, 4CL1, 4CL2, F5H, and PAL). These findings indicate that at least some of the monolignol biosynthetic genes act pleiotropically on both lignin content or composition and biomass yield or other agronomic traits. However, only two polymorphisms, the indel at position 810 in the 4CL1 gene and the LD group with SNPs resulting in substitution from Proline to Arginine [36] in the F5H gene, were found to be associated with both biomass yield and cell wall digestibility traits without controlling for multiple testing. After controlling for multiple testing, only the indel in the 4CL1 gene was associated with both DTS and IVDOM. Thus, the majority of QTPs identified in our study affected only one of the two groups of traits. Intragenic linkage of respective QTPs was more abundant than pleiotropic QTPs. 
According to our findings, most QTPs for both groups of traits are expected to segregate independently in germplasms with low LD.
Another important implication from our results is that pleiotropy identified by comparison of wild-type with knock-out alleles might in several cases turn out to be due to close linkage of intragenic QTPs with effects on different pathways and traits. An example is the well-studied Dwarf8 gene. This gene has been shown to affect plant height, when comparing mutant and wild type alleles [52]. However, association analyses with a range of wildtype alleles revealed candidate QTPs for flowering time, but not for plant height [37]. In Dwarf8, the DELLA domain is thought to affect plant height [52], while other polymorphisms affect flowering time. The DELLA domain was conserved in the 92 inbred lines used for an association analysis [37]. Similarly, previous bm3 mutant studies implied that the COMT coding gene acts pleiotropically on both forage quality and yield characters. However, after adjustment for multiple testing only one polymorphism was associated with DTS in our analysis, whereas eight different polymorphisms were associated with DNDF [Brenner et al.: Polymorphisms in O-methyltransferase genes are associated with stover cell wall digestibility in European maize (Zea mays L.), submitted]. Since earlier reports on pleiotropy of bm mutations were based on isogenic lines, another explanation might be closely linked genes in introgressed donor segments affecting either quality or yield characters.
Table 4. Polymorphism character and position in reference sequence.

Implications for plant breeding
Although the genetic correlation between DNDF and DMY was significant (P = 0.05), it was very low (r = -0.24) in these 39 inbred lines. Hence, it is very likely that the majority of genes affecting either biomass yield or cell wall digestibility traits are different. Our results support that monolignol biosynthetic genes affect both biomass yield-related and cell wall digestibility traits. Intragenic linkage of QTPs was the more frequent cause for "pleiotropy" compared to pleiotropic polymorphisms. No QTP in our study was associated with PHT and DNDF, or DMY and DNDF. Considering these correlations and association data together, we conclude that breeders can employ optimal wildtype alleles for monolignol biosynthetic genes to improve cell wall digestibility, without penalty on DMY.
[36]. In our study, four biomass yield-related traits PHT, DTS, DMC, and DMY were analyzed. PHT was measured as distance from soil level to the lowest tassel branch after flowering. DTS was measured as days from sowing to silking. Dry matter content (DMC) of stover (g/kg) (ears were manually removed) was determined 50 days after flowering and dry matter yield (DMY) was measured in tons per hectare.

Phenotypic data analyses
Mean values, heritability, and variance components of each biomass yield-related trait and correlations between the above-mentioned eight traits were calculated in PLABSTAT version 3A [53]. Briefly, analyses of variance were performed for each experiment separately. Adjusted entry means and effective error mean squares were used to compute the combined variances and covariances across environments for each trait. The sums of squares for entries were subdivided into variation among inbred lines, environments, interaction between inbred lines and environments, and error. Variance components were computed for lines and environments, considering them as random effects in the statistical model: Phenotype = effects of lines + effects of environments + effects of lines by environment (P = mean + L + E + L×E). F-tests were employed for testing the homogeneity of lines, environments and interactions between lines and environments according to the approximation given by Satterthwaite [54]. Heritabilities (h²) for each trait were calculated on an entry-mean basis, and confidence intervals for h² were obtained according to Knapp et al. [55]. Phenotypic and genotypic correlations between eight traits were calculated by standard procedures [56].
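As an illustration of the entry-mean heritability described above, the following minimal Python sketch estimates the line variance component and h² from a lines-by-environments table of adjusted entry means. It is not the PLABSTAT procedure: the function name and the toy values are hypothetical, and because only one mean per line and environment is used, the L×E term here is confounded with the error of the entry means.

```python
def entry_mean_heritability(means):
    """Estimate h2 on an entry-mean basis from a lines x environments table.

    means: list of rows, one row per inbred line, one value per environment.
    Model assumed: P = mean + L + E + LxE, with LxE serving as residual.
    Returns (line variance component, LxE mean square, h2).
    """
    n_lines = len(means)
    n_env = len(means[0])
    grand = sum(sum(row) for row in means) / (n_lines * n_env)
    line_means = [sum(row) / n_env for row in means]
    env_means = [sum(row[j] for row in means) / n_lines for j in range(n_env)]

    # Sum of squares among lines and for the line-by-environment residual.
    ss_lines = n_env * sum((m - grand) ** 2 for m in line_means)
    ss_lxe = sum(
        (means[i][j] - line_means[i] - env_means[j] + grand) ** 2
        for i in range(n_lines)
        for j in range(n_env)
    )
    ms_lines = ss_lines / (n_lines - 1)
    ms_lxe = ss_lxe / ((n_lines - 1) * (n_env - 1))
    if ms_lines == 0:
        raise ValueError("no variation among line means")

    # Line variance component; the phenotypic variance of entry means
    # is ms_lines / n_env, so h2 equals 1 - ms_lxe / ms_lines.
    var_lines = (ms_lines - ms_lxe) / n_env
    h2 = var_lines / (ms_lines / n_env)
    return var_lines, ms_lxe, h2


if __name__ == "__main__":
    # Hypothetical plant heights (cm): 4 lines in 3 environments.
    toy = [
        [150.0, 160.0, 155.0],
        [170.0, 176.0, 171.0],
        [130.0, 142.0, 133.0],
        [160.0, 165.0, 166.0],
    ]
    var_lines, ms_lxe, h2 = entry_mean_heritability(toy)
    print(f"var(lines)={var_lines:.2f}  MS(LxE)={ms_lxe:.2f}  h2={h2:.3f}")
```

A negative estimate of the line variance component can occur when the interaction mean square exceeds the line mean square; in practice such estimates are usually reported as zero.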

DNA extraction, amplification, and sequencing
Leaves of each of the 39 lines were harvested in the greenhouse three weeks after germination for DNA extraction by the Maxi CTAB method [57].

Population structure and association analysis
A total of 101 publicly available simple sequence repeat (SSR) markers (http://www.maizegdb.org/ssr.php), evenly distributed across the whole maize genome, were employed to genotype the 39 inbred lines. SSR data were used to infer the population structure in Structure 2.0 software [62,63]. Individual lines were grouped based on marker profiles by the Bayesian clustering method of Structure 2.0. The membership coefficients for each individual in each subpopulation were calculated with a burn-in length of 50,000 followed by 50,000 iterations and stored in a Q matrix. Inbreds were treated as haploids. Based on these SSR marker data, finer-scale relative kinship (K), expressed as Loiselle kinship coefficients [64] between lines, was calculated in SPAGeDi [65]. Values on the diagonal of the K matrix were set to 2, and negative values in the matrix, indicating that two individuals were less related than randomly chosen individuals [65], were set to 0. Association analyses were carried out using the general linear model (GLM) and the mixed linear model (MLM) in TASSEL 2.01 software [41] to test associations between polymorphisms of the 10 monolignol biosynthetic genes and four biomass traits. The threshold for P-values was set to 0.05. In all models, the Q matrix was used to account for overall population structure. For GLM, 10,000 permutations were used to determine the P-value for association of each polymorphism. The P-value adjusted for multiple tests was obtained by a step-down MinP procedure [66], implemented in TASSEL. For MLM, the K matrix was included to account for relative kinship between individuals [41]. Trait-associated polymorphisms with r² > 0.85 and D' > 0.9 were assigned to a tight LD group [67]. The phenotypic variation explained by a tight LD group was considered to be equal to the phenotypic variation explained by the polymorphism with the largest effect in that group. The False Discovery Rate (FDR) was determined to correct for multiple testing by MLM [68].
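The tight-LD criterion above (r² > 0.85 and D' > 0.9) can be illustrated with a minimal Python sketch for two biallelic sites scored in inbred lines treated as haploids. The function names, the 0/1 allele coding, and the toy genotypes are assumptions for illustration only; this is not the TASSEL implementation, and it covers only the pairwise test, not how pairs were assembled into groups.

```python
def ld_stats(site_a, site_b):
    """Return (r2, D') for two biallelic sites coded 0/1, one value per line."""
    if len(site_a) != len(site_b):
        raise ValueError("sites must be scored on the same lines")
    n = len(site_a)
    p_a = sum(site_a) / n                      # frequency of allele 1 at site A
    p_b = sum(site_b) / n                      # frequency of allele 1 at site B
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                       # raw disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("LD is undefined for a monomorphic site")
    r2 = d * d / denom
    # D' rescales |D| by its maximum possible value given allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return r2, abs(d) / d_max


def in_tight_ld(site_a, site_b, r2_min=0.85, d_prime_min=0.9):
    """Apply the pairwise thresholds stated in the text."""
    r2, d_prime = ld_stats(site_a, site_b)
    return r2 > r2_min and d_prime > d_prime_min


if __name__ == "__main__":
    # Hypothetical genotypes for eight lines at three sites.
    snp1 = [0, 0, 0, 1, 1, 1, 1, 0]
    snp2 = [0, 0, 0, 1, 1, 1, 1, 0]   # identical pattern to snp1
    snp3 = [0, 1, 0, 1, 0, 1, 0, 1]   # unrelated pattern
    print(in_tight_ld(snp1, snp2), in_tight_ld(snp1, snp3))
```

Sites in complete LD, such as the 17 PAL polymorphisms with r² = 1, share the same two-group split of the lines, which is why a single representative polymorphism can stand in for the whole group in the association test.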
Additional file 1: Supplementary table. Haplotype number, average, minimum, and maximum of biomass yield-related trait values for each monolignol biosynthetic gene. Haplotype numbers and inbred lines included in each haplotype group: see [35] for PAL, [36]