QTL mapping of agronomic traits in tef [Eragrostis tef (Zucc.) Trotter]

Background Tef [Eragrostis tef (Zucc.) Trotter] is the major cereal crop in Ethiopia. Tef is an allotetraploid with a base chromosome number of 10 (2n = 4× = 40) and a genome size of 730 Mbp. The goal of this study was to identify agronomically important quantitative trait loci (QTL) using recombinant inbred lines (RIL) derived from an inter-specific cross between E. tef and E. pilosa (30-5). Results Twenty-two yield-related and morphological traits were assessed across eight different locations in Ethiopia during the growing seasons of 1999 and 2000. Using composite interval mapping and a linkage map incorporating 192 loci, 99 QTLs were identified on 15 of the 21 linkage groups for 19 traits. Twelve QTLs on nine linkage groups were identified for grain yield. Clusters of more than five QTLs for various traits were identified on seven linkage groups. The largest cluster (10 QTLs) was identified on linkage group 8; eight of these QTLs were for yield or yield components, suggesting linkage or pleiotropic effects of loci. Tests for potential epistasis identified 15 two-way interactions between loci, and 75% of these interactions were for yield and shoot biomass. Thirty-one percent of the QTLs were observed in multiple environments; two yield QTLs were consistent across all agro-ecology zones. For 29.3% of the QTLs, the alleles from E. pilosa (30-5) had a beneficial effect. Conclusion The extensive QTL data generated for tef in this study will provide a basis for initiating molecular breeding to improve agronomic traits in this staple food crop for the people of Ethiopia.

well on both waterlogged Vertisols in the highlands and water-stressed areas in the semi-arid regions throughout the country is one of the reasons why tef is preferred over other grain crops such as maize or barley [2]. In addition, tef generally suffers less from biotic stresses than most other cereal crops grown in Ethiopia, and it contains high levels of protein and minerals [3].
Tef is an allotetraploid species with a base chromosome number of 10 (2n = 4× = 40). It belongs to the family Poaceae, sub-family Eragrostidae and genus Eragrostis. The genus contains approximately 350 species [4]. The exact diploid progenitors of tef are still unknown; however, most researchers agree that E. pilosa is the species most closely related to E. tef and is considered the direct wild tetraploid progenitor of tef [5]. It is also the only species known to be cross-compatible with modern tef varieties. Flow cytometry research has shown that tef has a genome size of 730 Mbp [6], which is roughly the same size as diploid sorghum and about 60% larger than the diploid rice genome. It has also the smallest chromosomes reported among the Poaceae ranging from 0.8 to 2.9 μm [6], which has significantly hindered the cytogenetic research of this species.
Understanding the genetic control of agronomic traits is essential for the sustained improvement of tef. Lodging is the number one cause of yield loss in tef, even with good crop management practices. Recent studies in tef have shown strong correlations between lodging, panicle type, culm thickness, and grain yield [2,7]. Important agronomic traits in tef, as in most crop species, are quantitatively inherited [7,8], which complicates genetic analysis. Quantitative trait locus (QTL) analysis allows the identification of discrete chromosome segments controlling complex traits [9]. Identifying QTLs that correspond with particular traits is significant because the information can be used in marker-assisted selection (MAS) programs. This is the most comprehensive report of QTL analyses for agronomic traits in tef to date.
Cultivated tef and the wild species, E. pilosa, differ greatly for most agronomic traits, and the close relationship between these two species facilitates hybridization, providing a unique opportunity to develop a new pool of genetic variation. The study by Tefera et al. [7] demonstrated that E. pilosa has contributed useful breeding traits, such as earliness and short stature. Therefore, utilization of E. pilosa as a donor in an inter-specific cross is a useful strategy for broadening the genetic diversity of the existing gene pool in cultivated tef.
The purpose of this research was to identify and characterize QTLs controlling 22 agronomic traits (eight yield-related traits and 14 morphological traits) in the inter-specific cross between E. tef cv. Kaye Murri and E. pilosa.

Trait analysis
Effects of years and locations were highly significant (p < 0.001) for all traits evaluated in multiple locations (data not shown). The variance among lines was highly significant (p < 0.001) for all traits except RPR1, RPR2, and Crush1 (data not shown). The mean values of the two parents, Kaye Murri and E. pilosa, were significantly different for all 22 traits (Table 1). As expected for an interspecific cross, the distribution of phenotypic values in the progeny showed bi-directional transgressive segregants for all traits except Crush1 and Crush2, which showed transgressive segregants towards the E. pilosa parent only.
Phenotypic correlations were estimated between the overall means of the 22 phenotypic traits. All traits, except RPR1 and RPR2, were highly correlated (p < 0.001) with at least one other trait. Significant positive correlations were identified between yield and most agronomic traits except PedL and Dia in this population (Table 2). Lodging was not correlated with traits presumed to be lodging related, such as PH, RPR1, RPR2, Crush1 and Crush2 (Table 2). The frequency distributions of most traits fit the normal distribution; however, seven traits (PWt, PSWt, GY, SB, HD, RPR1 and RPR2) were significantly skewed, and transformation was applied prior to QTL analysis, except for RPR1 and RPR2. RPR1, RPR2 and Crush1 did not show significant variance among lines and were excluded from QTL analysis; thus, 19 traits were evaluated.
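The correlation and skewness checks described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the SAS procedure used in the study; the function names and the sample values are hypothetical, and the skewness statistic stands in for the formal normality test.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def skewness(values):
    """Third standardized moment (population form); 0 for symmetric data."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed line means (illustrative values only).
raw = [1.0, 2.0, 3.0, 4.0, 50.0]
logged = [math.log(v) for v in raw]
print(skewness(raw), skewness(logged))
```

A log transformation pulls in the long right tail, which is why it is a common choice for skewed yield and biomass data before QTL analysis.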
A total of 99 QTLs for 19 traits were identified in common by three analyses: SMR, CIM and MT-CIM. The map positions of the QTLs, together with the additive effects and R² values from CIM, are presented in Fig. 1 and Table 3. The QTLs were distributed over all linkage groups except 4, 5, 12, 14, 15, and 17 (Fig. 1). Two or more QTLs were identified for all traits except HD, CD2 and Dia. The number of chromosomes with significant QTLs for a specific trait ranged from one (HD, CD2 and Dia) to 12 (GY). The number of significant QTLs on a specific chromosome ranged from zero (LG4, 5, 12, 14, 15, and 17) to 14 (LG2) (Fig. 1). Alleles from the wild relative, E. pilosa, had an increasing effect at 29.3% of the QTLs in the present study.
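The summary counts above can be cross-checked with simple arithmetic. The sketch below uses only figures stated in the text; the count of roughly 29 QTLs with favorable E. pilosa alleles is inferred from the reported percentage and is an assumption, not a number given by the authors.

```python
# Figures taken from the text.
total_linkage_groups = 21
groups_without_qtl = [4, 5, 12, 14, 15, 17]
total_qtl = 99
pilosa_fraction = 0.293  # reported share of QTLs with increasing E. pilosa alleles

# Linkage groups carrying at least one QTL.
groups_with_qtl = total_linkage_groups - len(groups_without_qtl)

# Inferred (assumed) number of QTLs behind the reported percentage.
pilosa_qtl = round(total_qtl * pilosa_fraction)

print(groups_with_qtl, pilosa_qtl)
```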
A test for potential interactions between significant QTL marker loci for all traits identified a relatively small number of epistatic interactions between loci. A total of 20 interactions consisting of 18 marker loci for four traits were identified across nine linkage groups and three unlinked loci (Table 4).

QTL for grain yield and yield related traits
Heading date (HD) and maturity date (MD)
Two MD QTLs were identified at three locations representative of all three agro-ecologies. The MD QTL on LG2 at 24.8 cM had an R² of 0.34 and was associated with yield related traits such as PWt and SB (Fig. 1). Early maturity is a common characteristic of wild relatives of tef and E. pilosa

Grain yield (GY)
Among the traits studied, the largest number of QTLs was identified for GY. Twelve QTLs were identified in nine linkage groups. The highest LOD score was 6.39 for ISSR549b, with an R² of 0.2. Two QTLs in LG3, 50 cM apart, were significant in six locations representing three agro-ecologies. The E. pilosa alleles in LG18 (ISSR840b) and LG20 (RZ588) increased grain yield. The rest of the QTLs were positively affected by the Kaye Murri alleles.

Shoot biomass (SB)
The most significant QTLs for SB were found on LG3, 8 and 10 with a LOD > 6 and R² > 0.19. One QTL on LG20 (RZ588) had an R² of 0.22, and the positive allele was from E. pilosa. This QTL co-located with PWt, PSWt and GY QTLs, all with the same positive alleles from E. pilosa.

Lodging index (Lodg)
Three QTLs were located on LG1 and 8, and two QTLs were associated with unlinked loci. All five QTL alleles contributed by Kaye Murri increased lodging. The two QTLs (PALb and TCD323) on LG8 were located in the distal region of the linkage group. PALb showed the highest R² (0.38) and highest LOD score (5.5) and co-segregated with MD. TCD323 co-located with SB and GY, and was located near eight other QTLs, including lodging related traits, such as Crush2.

QTL for morphological and plant height related traits
Culm length (CulmL)
Eight significant CulmL QTLs were identified on seven linkage groups and one unlinked locus (Table 3). The R² ranged from 0.12 to 0.34. Except for RZ251 on LG13, increasing effects of all significant QTLs came from Kaye Murri. The strongest CulmL QTL was TCD95 on LG3, with a LOD score of 5.92 and an R² value of 0.21. This locus was associated with PSWt, Inter2, GY and SB.

Culm diameter of the 1st and 2nd internodes (CD1 and CD2)
Two QTLs were associated with CD1 and one with CD2; all were identified only in the C2-2 agro-ecology zone. These traits share common QTL regions on LG2, and the allele for thicker culms was contributed by Kaye Murri.

Peduncle length (PedL)
Eleven significant QTLs were identified on six linkage groups and five of the QTLs were associated with unlinked loci. The R² for PedL ranged from 0.11 to 0.35. At seven QTLs, E. pilosa alleles increased PedL. Among these, two QTLs in LG10 and 21 were negatively associated with other traits (100sw and SB in LG10 and GY in LG21).

Plant height (PH)
Four significant QTLs were identified with R² ranging from 0.13 to 0.26. Kaye Murri alleles at QTLs in LG2, 7, and 8 increased PH, while the E. pilosa allele increased PH at RZ588 (LG20). All PH QTLs were associated with QTLs for multiple yield-related traits.

Number of internodes (Ninter)
Three QTLs were associated with Ninter. The most significant QTL (LOD = 4.97, R² = 0.20) was on LG2, where it was associated with PH.
Three and seven QTLs were identified for Inter1 and Inter2, respectively. These QTLs overlapped in LG13 where the R² was about 0.24, and longer internode length resulted from the E. pilosa allele. The unlinked locus RZ961 was also associated with both of these traits.

Crown diameter (Dia)
Only one QTL, ISSR548a in LG8, was detected for Dia. This locus was associated with QTLs for nine different traits; PWt, PSWt, CulmL, PanL, PH, GY, SB, Lodg and Crush2 (Fig. 1). Most of these QTLs were unique to the DZBS location. Kaye Murri alleles increased crown diameter.

Crushing strength at the 2nd internode (Crush2)
Two QTLs were identified for Crush2. RPR and Crush were measured to assess culm strength as an indicator of lodging resistance. However, QTLs for Crush2 (BCD1087a and ISSR548a) were not co-localized with QTLs for Lodg. RPR1, RPR2 and Crush1 did not show phenotypic variance among lines; thus, QTL analyses were not performed for these traits.

Discussion
Single marker analysis (SMR) detects associations between individual markers and traits; therefore, it does not require a genetic map. In this study we used SMR as a preliminary test of significance for all polymorphic markers. For the loci that mapped into linkage groups [10], composite interval mapping (CIM) could be applied for detection and mapping of QTLs. Permutation tests were conducted to establish significance thresholds for CIM, reducing the chance of reporting false QTLs. In addition, multiple-trait analysis (MT-CIM) was used to analyze QTLs over experiments, for detection of loci that consistently affected the phenotype across environments. The significant QTLs identified in common by all three analyses are presented herein (Table 3).
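To make the idea of single marker analysis concrete, the following is a minimal sketch of a regression of trait values on one marker in a RIL population, where each line carries one of two parental alleles. It is not the QTL Cartographer implementation; the function name, the 'A'/'B' allele coding and the example values are assumptions made for illustration.

```python
def single_marker_regression(genotypes, phenotypes):
    """Regress a trait on one marker coded +1 ('A') or -1 ('B').

    Lines with a missing genotype or phenotype (None) are skipped.
    Returns (additive_effect, r_squared), where the additive effect is
    half the difference between the two homozygous class means.
    """
    pairs = [(1.0 if g == "A" else -1.0, p)
             for g, p in zip(genotypes, phenotypes)
             if g in ("A", "B") and p is not None]
    n = len(pairs)
    if n < 3:
        raise ValueError("too few lines with complete data")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("marker or trait does not vary")
    additive_effect = sxy / sxx          # slope for +/-1 coding
    r_squared = sxy * sxy / (sxx * syy)  # share of variance explained
    return additive_effect, r_squared

# Hypothetical data: four lines, allele 'A' increases the trait.
effect, r2 = single_marker_regression(["A", "A", "B", "B"], [10, 12, 6, 8])
print(effect, r2)
```

Because the test uses one marker at a time, it applies equally to linked and unlinked loci, which is why it can serve as a first pass before interval-based methods.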
Tef improvement has relied mostly on mass selection from landraces for the development of new varieties. The grain yield of tef has risen from 3,425 to 4,599 kg/ha over 35 years of breeding [11]. The average rate of yield increase per year for the period of 1960 to 1995 was estimated at 27.16 kg/ha (0.79%), using linear regression of mean grain yield of cultivars on year of release. This gain is similar to rates reported for spring barley, oat and spring durum wheat in Ethiopia [11]. However, the national average grain yield of tef is still about 0.8 t/ha [1] and is not competitive with that of other major grain crops.
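As a quick check on the reported rate of gain, the percentage can be reproduced from the figures above. This sketch assumes that the 0.79% is expressed relative to the starting yield of 3,425 kg/ha; the text does not state the denominator explicitly.

```python
# Figures from the text; the choice of denominator is an assumption.
initial_yield_kg_ha = 3425.0
annual_gain_kg_ha = 27.16  # regression slope of yield on year of release

relative_gain_percent = annual_gain_kg_ha / initial_yield_kg_ha * 100
print(round(relative_gain_percent, 2))
```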
Grain yield was significantly correlated with all traits except PedL ( Table 2). The associations of GY with HD, MD, PWt, PSWt, 100sw, SB, CulmL, CD1, CD2, PanL, PH, Inter1, Inter2 and Crush2 indicated that later maturing, taller, more vigorous, and larger plants resulted in more grain yield. Tefera et al. [7,8] showed most yield and yield related traits had high broad-sense heritability (H) in the population used in this study, and moderate to high H values were obtained in a population derived from an intra-specific cross. As expected, improvement of yield potential in tef has been associated with an increase of biomass yield and yield components. Among the 99 QTLs identified, 12 GY QTLs were detected in nine different linkage groups (Fig. 1). The map positions of the QTLs for yield related traits and SB on the same chromosomes overlapped, thus supporting the significant phenotypic correlations (p < 0.001) ( Table 2).
Several chromosomal regions were associated with more than two traits, indicating either linkage or pleiotropic effects. Clusters of QTLs (more than five QTLs) for various traits were identified on LG2, 3, 7, 8, 10, 13 and 20 (Fig. 1). Previous studies in cereal crops such as rice and wheat have also shown a clustering of agronomic QTLs [12-15]. The same chromosome region on LG21 was associated with positive and negative QTL alleles from E. tef for GY and PedL, respectively (Fig. 1), although the correlation between those two traits was non-significant (Table 2). The PedL QTL showed a similar relationship on LG10 with those of 100sw and SB, which are yield related components. The association of two positive QTL effects in the same chromosomal region was reported for studies involving O. rufipogon in rice [13,16]. The allele of O.  [13]. However, in some cases beneficial QTLs from O. rufipogon were associated with undesirable QTLs. For example, a QTL increasing panicle length was in the same region as a QTL increasing the proportion of broken grains [16]. Where desirable and undesirable agronomic QTLs are associated in the same chromosomal regions, careful selection would be needed to avoid undesirable characteristics in the derived lines.
Epistasis is part of the genetic architecture of grain yield and other agronomic traits. Gene interaction has also been reported for a few phenotypic traits of tef [17-19]; thus, it is not surprising to detect it for more complex quantitative characters in this study [20]. An analysis to identify the potential epistatic interactions between QTLs identified 20 marker loci resulting in 15 two-way interactions (Table 4). GY QTLs had five two-way interactions, and TCD95 and lfm256 were actively involved in the epistasis. The most interesting interaction was between TCD95 on LG3 and TCD227a on LG8 for GY QTLs, because the same interaction was also shown for SB QTLs (Fig. 1 and Table 4). In addition, QTLs on LG3 for GY and SB were detected in all three agro-ecology zones where agronomic traits were measured for this study. Likewise, the GY QTL (CNL53) on LG2 was detected across all three agro-ecologies and had a significant interaction with TCD227a in LG8. Therefore, to improve grain yield, these three QTLs may need to be selected together.
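A two-way interaction between loci can be illustrated with the additive-by-additive contrast among the four two-locus genotype classes of a RIL population. The sketch below is a simplified stand-in and does not reproduce the likelihood and Monte Carlo procedure of the Epistat program; the function name, allele coding and trait values are hypothetical.

```python
def additive_by_additive(locus1, locus2, phenotypes):
    """Interaction contrast among the four two-locus classes.

    locus1, locus2: per-line alleles coded 'A' or 'B'.
    Returns (mean_AA + mean_BB - mean_AB - mean_BA) / 4, which is zero
    when the two loci act independently and additively.
    """
    sums, counts = {}, {}
    for g1, g2, p in zip(locus1, locus2, phenotypes):
        key = (g1, g2)
        sums[key] = sums.get(key, 0.0) + p
        counts[key] = counts.get(key, 0) + 1
    classes = [("A", "A"), ("A", "B"), ("B", "A"), ("B", "B")]
    if any(c not in counts for c in classes):
        raise ValueError("all four genotype classes must be observed")
    mean = {c: sums[c] / counts[c] for c in classes}
    return (mean[("A", "A")] + mean[("B", "B")]
            - mean[("A", "B")] - mean[("B", "A")]) / 4.0

# Hypothetical class values: purely additive, then with an interaction.
l1 = ["A", "A", "B", "B"]
l2 = ["A", "B", "A", "B"]
print(additive_by_additive(l1, l2, [13, 11, 9, 7]))   # additive case
print(additive_by_additive(l1, l2, [15, 11, 9, 7]))   # interacting case
```

A nonzero contrast means that the effect of one locus depends on the allele at the other, which is the situation in which interacting QTLs may need to be selected together.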
Genotype × environment interaction could influence the ability to detect QTLs, even though tef displays versatile agro-ecological adaptation with good resilience to both low and high moisture stress. Individual QTLs were not consistently detected across environments; such inconsistent detection is attributed to QTL × environment interaction, which has been commonly observed in other grain yield QTL studies in cereal crops. Out of 12 GY QTLs, only two QTLs (LG2 and 3) were consistent across three agro-ecology zones. Three QTLs were detected in two agro-ecological zones: on LG7 (zones C2-1 and C3-3), LG8 (zones C2-2 and C3-3) and LG16 (zones C2-1 and C2-2). Even though five GY QTLs were detected in multiple agro-ecology zones, no QTL was significant in all locations. The traits HD and MD, as yield component traits, are known to be sensitive to altitude because of day length. However, the HD and MD QTLs did not show discernible differences among different altitudes in this study. Assefa et al. [21] demonstrated the diversity of yield related traits using 36 different germplasm populations collected from northern and central regions in Ethiopia, corresponding to the same agro-ecology zones as in this study. Regional differences in various traits of tef germplasm have been reported, but altitude had no significant influence on diversity levels in tef germplasm populations. Similar results were found in Ethiopian wheat, barley and sorghum germplasm [21].
Different soil types probably influenced QTL detection in this study. Two soil types were used in Debre Zeit: light soil (DZLS, Andosol, e04) and black soil (DZBS, Vertisol, e03 and e11). Plants were more vigorous and taller on the loamy Andosols than on the heavy textured Vertisol, even though the rainfall amount and temperature are the same for both soil types (Hailu Tefera, personal communication). The QTLs for PWt, PSWt, and Ninter were identified only at DZLS (e04), whereas the QTLs for 100sw, Lodg, PanL, and Inter2 were identified only at DZBS, 1999 (e03) (Table 3). Since those experiments were conducted under very similar conditions, it is likely that soil type was the major factor interacting with the QTLs. Teklu and Tefera [11] conducted a yield potential experiment in which 10 agronomic traits were examined for 11 tef varieties on two soil types. The most significant (p < 0.05) variety and soil type interactions were found for plant height and panicle length. Among four PH QTLs in this study, one was detected on LG7 at DZLS (e04) and one on LG8 at DZBS (e03). However, three QTLs for PanL were identified only in DZBS (e03), not in DZLS (Table 3). The environmentally sensitive QTLs for yield and yield components detected in this study clearly illustrate the importance of determining whether QTL × environment interactions are due to changes in magnitude or are crossover interactions before using MAS to select for QTLs. Identifying and selecting the proper allele at QTLs with crossover interactions requires careful evaluation in target environments. Inappropriate allele identification or selection could result in the indirect selection of QTL alleles with detrimental effects in some target environments.
Low grain yield of tef is partly due to the low basic productivity of currently available cultivars, together with susceptibility to lodging, which has been the most serious agronomic problem. Lodging index showed positive and highly significant (p < 0.001) correlations with PSWt, 100sw, GY and SB, and negative correlations with PedL; thus, high yielding RILs tended to lodge (Table 2). Two of the Lodg QTLs, on LG8, were associated with PH, GY and yield related traits, and the other three QTLs were independent of yield related traits (Fig. 1). The positive correlation of lodging with yield and other important yield component traits indicates that improving lodging resistance in tef will be challenging for breeders. For all five Lodg QTLs, the alleles increasing lodging came from Kaye Murri, the taller, higher yielding and more lodging resistant parent (lodging score 65.13 vs 81.50 for E. pilosa (30-5)) (Table 1). This results from the unusual patterns of correlations of several traits differentiating the cultivated and wild parents of this cross. The weak or non-significant correlations of Lodg with CD1, CD2, PedL, PanL, PH, Inter1, RPR1, RPR2, Crush1, and Crush2 were counterintuitive. On the other hand, CulmL, Ninter, and Inter2 were positively correlated with Lodg, while Dia was negatively correlated, as would be expected. The lack of significance of the negative correlation coefficients with RPR and Crush traits can be attributed to the small number of replicates and environments as well as the difficulty in measuring those traits. However, field observations of the wild and cultivated parents suggest that the very thin culms, small crown diameter, and weak straw of the wild parent, rather than plant height, are the traits contributing most to its lodging susceptibility.
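The distinction between magnitude and crossover interactions can be expressed as a simple rule on the sign of a QTL's additive effect across environments. The sketch below is illustrative only: it ignores estimation error, the function name and tolerance parameter are hypothetical, and the effect values are invented (only the environment codes e03 and e04 come from the text).

```python
def classify_qtl_by_environment(effects, tolerance=0.0):
    """Classify one QTL from its additive effect in each environment.

    effects: dict mapping environment code -> additive effect estimate.
    Effects with absolute value <= tolerance are treated as undetected.
    """
    detected = [e for e in effects.values() if abs(e) > tolerance]
    if not detected:
        return "not detected"
    if any(e > 0 for e in detected) and any(e < 0 for e in detected):
        return "crossover"   # favorable allele changes with environment
    sizes = [abs(e) for e in detected]
    if len(detected) < len(effects) or max(sizes) - min(sizes) > tolerance:
        return "magnitude"   # same favorable allele, different effect size
    return "consistent"

# Hypothetical additive effects at the two Debre Zeit soil types.
print(classify_qtl_by_environment({"e03": 0.5, "e04": -0.3}))
print(classify_qtl_by_environment({"e03": 0.5, "e04": 0.1}))
```

Only the crossover case changes which parental allele should be selected, which is why it calls for evaluation in each target environment.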
Several studies have found that QTLs for lodging and plant height are linked or located in the same chromosomal regions and could be used as indirect selection parameters for barley [22], rice [23], wheat [12], maize [24] and Italian ryegrass [25]. However, a reduction in plant height to improve lodging resistance may reduce the photosynthetic capacity of a canopy. In addition, the susceptibility to lodging differed among cultivars with similar plant height in wheat and rice [26,27]. Other factors such as stem cellulose or lignin content are related to stem rigidity [28] but were not measured in this study. One of the lignin biosynthesis genes, PAL (phenylalanine ammonia-lyase from rice, X16099), co-localized with a Lodg QTL in LG8 (Fig. 1), suggesting that it may be a candidate gene for this trait.
The development of inter-specific populations is one strategy to broaden the genetic diversity of cultivated crops and to identify QTLs associated with beneficial traits, such as yield, grain quality and disease resistance [29]. E. pilosa    [15], with the same species. There were two QTLs identified on LG18 and LG20 with an increase in yield from the E. pilosa  alleles ( Figure 1). The QTL on LG18 was not linked to any known undesirable QTLs and the E. pilosa  allele would be directly useful for developing breeding materials. However, the GY QTL interval (less than 10 cM) in LG20 was associated with a large increase in plant height, resulting in lodging. The GY QTL in LG20 may still be useful if the negative linkage can be broken or counteracted by other QTL reducing plant height. If markers can be successfully used to reduce linkage drag, the positive QTLs from E. pilosa  will be potentially useful for improving cultivated tef. Therefore, this study suggests that E. pilosa , and possibly other wild accessions, could be useful for diversifying the cultivated tef germplasm pool.

Conclusion
The primary objective of this study was to determine the number and location of QTLs for important agronomic traits in tef. An inter-specific population was used to map 99 QTLs for 19 traits across 15 linkage groups. The interactions of genotypes and environments among QTLs were reported here to evaluate alleles for target breeding environments. The results of this QTL study are a first step towards the design of a marker-assisted selection program for tef improvement.

[7]. The eight locations were chosen based on their representation of the three major agro-ecosystems of tef in Ethiopia [31]. The humid zone (C1) in the Western regions of Ethiopia has a tef-growing period of more than 150 days and a growing season rainfall of more than 850 mm. The wet semi-arid zone (C2) in the Central parts of the country is subdivided into two minor areas, high altitude (C2-1), more than 1900 masl, and low altitude (C2-2), 1700-1900 masl. These areas receive a growing season rainfall of 450-850 mm and the growing period is between 100 and 150 days. The dry semi-arid zone or Northern Rift Valley (C3) consists of three minor areas designated as high altitude (C3-1), more than 1900 masl; mid-altitude (C3-2), 1700-1900 masl; and low altitude (C3-3), less than 1700 masl. The eight locations for QTL analysis were as follows: i) C2-1: Akaki, Chefe, Holetta; ii) C2-2: Debre Zeit Light Soil, Debre Zeit Black Soil, Denbi; and iii) C3-3: Alemtena, Melkassa.

Trait evaluations
The RIL population was evaluated for 22 traits during the 1999 and 2000 growing seasons (Table 1). Ten plants per line were randomly selected at physiological maturity, and the following measurements were taken: (1) Table 1.

Statistical analyses
The genetic linkage map for the 94 RILs reported by Yu et al. [10] was used in this study. Briefly, 142 molecular markers produced 192 segregating loci; among those, 156 loci linked into 21 groups and 36 loci were unlinked. The map was constructed using restriction fragment length polymorphism (RFLP), simple sequence repeats derived from expressed sequence tags (EST-SSR), single nucleotide polymorphism/insertion and deletion (SNP/INDEL), intron fragment length polymorphism (IFLP), targeted region amplification polymorphism (TRAP) and inter-simple sequence repeat amplification (ISSR). The map covered 2,081.5 cM with a mean marker interval of 12.3 cM.
Phenotypic data were analyzed in SAS System V.8 [33]. Normality of the phenotypic data was tested with the Shapiro-Wilk test at α = 0.01, and some traits required log or square-root transformation. Analysis of variance was done for each experiment, and line means were used for QTL analysis. Pearson's correlation coefficient was computed among phenotypic traits. QTL analyses were implemented in QTL Cartographer Version 2.5 [34]. First, data were analyzed to identify markers associated with variation for each trait by single marker analysis (SMR), using all linked and unlinked loci at a statistical threshold of p < 0.01. Second, trait data were analyzed by composite interval mapping (CIM) [35], using a reduced set of unlinked marker loci containing significant loci detected by SMR analysis. The parameter settings for CIM were model 6, forward and backward stepwise regression with a threshold of p < 0.01 to select cofactors, a window size of 10 cM and a walking speed of 2 cM along chromosomes. QTLs were verified by comparing LOD scores to an empirical genome-wide significance threshold calculated from 1,000 permutations for p < 0.01 to control type-I error. QTL position, LOD score, coefficient of determination (R²), and additive effect were estimated by CIM for each QTL. Third, the multiple-trait analysis method (MT-CIM) was used to jointly analyze QTLs over experiments, using the value of a trait in different experiments as correlated traits [36]. The MT-CIM analysis was performed using the parameter settings above, with LOD = 3.5 for declaring QTLs. Fourth, the Epistat program [37] was used to identify and evaluate pairs of loci whose combined effects cannot be explained by independent and additive action, using maximum likelihood together with Monte Carlo simulations.
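The empirical genome-wide threshold described above can be sketched as follows. This illustration substitutes a single-marker regression LOD for the CIM statistic computed by QTL Cartographer, so it shows the permutation logic rather than the study's exact procedure; the function names, the +1/-1 marker coding and the default seed are assumptions.

```python
import math
import random

def marker_lod(codes, phenotypes):
    """LOD score from regressing the trait on one marker coded +1/-1."""
    n = len(codes)
    mean_x = sum(codes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in codes)
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(codes, phenotypes))
    r2 = min(sxy * sxy / (sxx * syy), 1.0 - 1e-12)
    # LOD = (n / 2) * log10(RSS_null / RSS_marker) = -(n / 2) * log10(1 - R^2)
    return -(n / 2.0) * math.log10(1.0 - r2)

def permutation_threshold(markers, phenotypes, n_perm=1000, alpha=0.01, seed=0):
    """Genome-wide LOD threshold from the permutation null distribution.

    markers: list of markers, each a list of +1/-1 codes (one per line).
    Each permutation shuffles phenotypes across lines, which breaks any
    marker-trait association, and records the largest LOD over all markers.
    """
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        maxima.append(max(marker_lod(m, shuffled) for m in markers))
    maxima.sort()
    index = int(math.ceil((1.0 - alpha) * n_perm)) - 1
    index = max(0, min(n_perm - 1, index))
    return maxima[index]
```

Taking the maximum over all markers in each permutation is what makes the threshold genome-wide: a QTL is declared only if its LOD exceeds what the best marker achieves by chance in all but a fraction alpha of the shuffled data sets.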