Patterns of molecular and phenotypic diversity in pearl millet [Pennisetum glaucum (L.) R. Br.] from West and Central Africa and their relation to geographical and environmental parameters

Background The distribution area of pearl millet in West and Central Africa (WCA) harbours a wide range of climatic and environmental conditions as well as diverse farmer preferences and pearl millet utilization habits, which have the potential to lead to local adaptation and thereby to population structure. The objectives of our research were to (i) assess the geographical distribution of genetic diversity in pearl millet inbreds derived from landraces, (ii) assess the population structure of pearl millet from WCA, and (iii) identify those geographical parameters and environmental factors from the location at which landraces were sampled, as well as those phenotypic traits, that may have affected or led to this population structure. Our study was based on a set of 145 inbred lines derived from 122 different pearl millet landraces from WCA. Results Five sub-groups were detected within the entire germplasm set by STRUCTURE. The phenotypic traits flowering time, relative response to photoperiod, and panicle length were significantly associated with population structure, whereas the environmental factors expected to influence these traits in natural populations, such as latitude, temperature, or precipitation, were not. Conclusions Our results suggested that, for pearl millet, natural selection is less important than artificial selection in shaping populations.


Background
Pearl millet [Pennisetum glaucum (L.) R. Br.] is an annual, diploid, highly allogamous cereal with seven chromosome pairs [1]. It can be grown under a vast range of environmental conditions, including environments characterized by frequent drought events and poor soil fertility [2]. This is one reason why pearl millet is one of the most important staple food crops in West and Central Africa (WCA) [3]. Another reason is that pearl millet grain has a relatively high nutritional value for a cereal: it has higher protein and fat contents than wheat or rice, and its amino acid composition is more appropriate for human nutrition than that of wheat or polished rice [4][5][6].
Cultivated pearl millet displays tremendous phenotypic variability for traits such as flowering time, panicle length, grain and stover characteristics, tolerance to drought, pests, and diseases, as well as nutritional value (e.g., [7]). Efficient and systematic exploitation of this diversity is key to any crop improvement program [8]. This, however, first requires an assessment of the genetic diversity and population structure of the species under consideration.
For pearl millet, several studies have examined these issues. [9] determined the influence of farmer management on pearl millet landrace diversity in two villages in north-eastern Nigeria. [10] assessed the genetic diversity within and between ten Indian pearl millet landraces. The genetic diversity of 46 wild and 421 cultivated genotypes of pearl millet from Niger was analyzed by [11], and [12] examined the phylogeny and origin of pearl millet. However, to our knowledge, no earlier study has examined the genetic diversity and population structure of pearl millet across a wide geographic range in WCA.
The distribution area of pearl millet in WCA harbours a wide range of climatic and environmental conditions as well as diverse farmer preferences and pearl millet utilization habits (cf. [13]). This may lead to local adaptation and thereby to population structure. However, no earlier study has systematically examined the forces that may have affected or led to the observed population structure in pearl millet.
The objectives of our research were to (i) assess the geographical distribution of genetic diversity, (ii) assess the population structure of pearl millet from WCA, and (iii) identify those geographical parameters and environmental factors from the location at which landraces were sampled, as well as those phenotypic traits that may have affected or led to this population structure.

Results
The heritabilities of the four phenotypic traits assessed for the 145 pearl millet inbreds ranged from 0.64 for SV to 0.93 for PL, and were 0.80 for PH and 0.89 for FT. The geographic and environmental parameters as well as the phenotypic traits available for all pearl millet inbreds showed a continuous distribution (Figure 1). The correlations among the geographic and environmental parameters and the examined phenotypic traits ranged from -0.828 between latitude and precipitation to 0.667 between latitude and mean annual temperature.
For the 20 SSR markers examined in our study, the number of alleles ranged from 9 to 28 (Table 1), with an average of 16.4. Gene diversity D was lowest for markers PSMP2249 and PSMP2267 (0.49) and highest for marker PSMP2063 (0.92). The average D value across all pearl millet inbreds was 0.74.
The first two principal components, which explained 4.2 and 3.9% of the total genetic variation, revealed no obvious clusters (Figure 2). The STRUCTURE analysis resulted in five sub-groups and one admixed group. These groups, which comprised between 14 and 40 inbreds (Table 2), were located in different sectors of the PCA. The gene diversity D of the sub-groups ranged from 0.62 (sub-group 1) to 0.72 (sub-group 4), and the number of group-specific alleles varied between 15 (sub-group 3) and 44 (sub-group 4). The overall fixation index Fst was 0.08. For the pearl millet subsets of size 5, 10, 15, ..., 40 that maximize gene diversity D, the number of alleles per locus ranged from 4.5 to 12.5, while D ranged from 0.74 to 0.82 (Table 3).
The AMOVA with country of origin as the hierarchy level revealed that most of the genetic variation was present between inbreds derived from landraces of the same country and within landraces, whereas only a small proportion of the total genetic variation was attributable to countries (Additional file 1). The same trend was observed with agro-ecological zone of origin as the hierarchy level (Additional file 1). Plotting the STRUCTURE results on the geographic map revealed no obvious association between sub-group membership probability and country or agro-ecological zone of origin (Figure 3). The pairwise Pearson's correlation coefficient between MRD and the geographic or phenotypic trait-based distances was highest for SBD (0.220) and lowest for PH (-0.028) (Table 4). The highest correlation with SBD was observed for PL (0.077) and the lowest for annual temperature (-0.023). The tests of association between the Q matrix from STRUCTURE and the geographic or environmental parameters as well as the phenotypic traits were significant (α = 0.05) for FT, RRP, PL, and country of origin.

Discussion
Genetic diversity of the examined pearl millet germplasm
Irrespective of the hierarchy level considered, the AMOVA revealed that about four times more variation was present between landraces than within landraces (Additional file 1). This finding is in good accordance with the results of [10], who observed about 2.5 times more variation between than within Indian pearl millet landraces. This suggests that we obtained realistic estimates of within- and between-landrace variation, even though we examined inbreds derived from landraces and the average number of inbreds per landrace was only 1.2.
Long-term selection gain requires genetic variability [8]. Therefore, it was important to examine the genetic diversity of the pearl millet inbreds analyzed in our study. Since estimates of gene diversity D are not affected by differences in sample size, direct comparisons between different studies are possible. Across the 145 pearl millet inbreds examined, we observed a total gene diversity D of 0.74 (Table 2). This value is higher than the estimate of 0.49 reported by [11] for 421 genotypes of 140 allogamous cultivated pearl millet landraces from Niger based on SSR markers. This difference might be explained by the fact that the latter study examined only genotypes from Niger, whereas the inbreds of our study were derived from landraces from a much larger area of WCA. Furthermore, we observed a higher D value than [12] (0.60) for a pearl millet world collection. This might be due to the fact that we used a higher proportion of dinucleotide-repeat SSR markers than [12]; such markers tend to be more variable than SSRs with longer repeat motifs (cf. [14]).
Our observation on D was supported by the results on the average number of alleles per locus. Although our population was smaller than those of the other studies and we examined inbreds instead of heterozygous genotypes, we observed a considerably higher number of alleles per locus (16.4) (Table 2) than previously reported by [11] (6.2) and [12] (9.6). These findings on D and the average number of alleles per locus suggested that the pearl millet inbreds examined in this study are a valuable resource for increasing the genetic diversity in pearl millet breeding programs.

Spatial distribution of genetic diversity
We observed differences in the genetic diversity parameters gene diversity D, average number of alleles per locus, and number of group-specific alleles A among the pearl millet inbreds originating from different WCA countries (Table 2). However, these differences are largely due to the considerable differences in the number of inbreds originating from each of the ten countries (data not shown). Furthermore, the AMOVA revealed that less than one percent of the variation was found between countries (Additional file 1). This observation contrasts with the findings of [15], who observed statistically different D values for pearl millet genotypes from different countries. In addition, that study reported coefficients of differentiation between the genotypes from different countries (0.07-0.23) that suggested considerable variation between countries. These contrasting findings might be due to the fact that our sampling was extremely unbalanced with respect to the number of inbreds per country. Another explanation might be that our study was based on SSR markers, whereas [15] examined isozyme markers, which may not be selectively neutral and reveal only a low number of alleles per locus. In WCA, pearl millet is cultivated throughout three agro-ecological zones [16]. Adaptation of pearl millet to these different environments has the potential to lead to genetic differentiation. However, the AMOVA revealed that almost none of the observed SSR genotype variation could be attributed to differences between the agro-ecological zones (Additional file 1). This observation suggests that establishing core collections based on the agro-ecological zone of origin (e.g., [7]) might be sub-optimal. This was supported by our observation that the three agro-ecological zones contributed different numbers of inbreds to the pearl millet subsets maximizing gene diversity D (Table 3).
Our findings might be explained by the habit of farmers, especially in the sudano-sahelian and sudanian zones, of growing not just one landrace but several (Bettina I.G. Haussmann, personal communication). To bridge the hungry period, an early-maturing landrace is cultivated, in addition to a late-maturing landrace that has a high yield potential in years with good growing conditions but might fail in years with severe terminal drought. Moreover, the highly allogamous nature of the crop, combined with overlapping flowering times of early landraces and late but photoperiod-sensitive landraces in certain years with late planting dates, has the potential to dilute differentiation between agro-ecological zones. Finally, our observation might be explained by the fact that the environmental conditions within each agro-ecological zone are too heterogeneous (high inter-annual climate variability) to permit detectable genetic differentiation between landraces of different agro-ecological zones.

Inference of population structure
Because neither country nor agro-ecological zone of origin revealed a clear sub-grouping of the pearl millet inbreds examined in our study (Additional file 1), we used the software STRUCTURE [17] to infer the population structure. The results of this analysis indicated that the 145 pearl millet inbreds of our study belong to five sub-groups (Additional file 2). The grouping of the pearl millet inbreds by STRUCTURE was in fair accordance with the results of the PCA (Figure 2): the five groups did not form distinct clusters but were located in different sectors of the PCA. This finding, together with an overall Fst value of 0.08, suggested that the sub-groups of pearl millet are not as differentiated as those in maize (e.g., [18]). Our finding of five sub-groups for pearl millet inbreds from WCA was within the range of previously reported numbers of sub-groups. Based on isozyme markers, [15] identified two distinct sub-groups in pearl millet inbreds from West Africa, whereas [19] reported three sub-groups for pearl millet landraces from Niger. In contrast, [1] identified ten clusters of pearl millet landraces from Burkina Faso based on phenotypic data for morphological and disease resistance traits. These differences from our study in the number of sub-groups are most likely due to the fact that earlier studies examined fewer accessions than we did. In addition, those studies were based either on phenotypic data or on a relatively low number of isozyme markers, whereas we used SSR markers to examine pearl millet inbreds.

Association of population structure and geographical, environmental, and phenotypic parameters
In the harshest production environments in WCA, farmers rely on highly cross-pollinated pearl millet landraces with site-specific adaptation and good production stability, which are rarely outyielded by improved cultivars bred on-station [20]. Farmers, together with environmental factors, thus shape pearl millet populations (cf. [9,21]). Identifying the variables that are correlated with population structure therefore has the potential to help identify adaptive traits as well as the environmental conditions that drive adaptation.
We observed that the phenotypic traits FT, RRP, and PL were significantly associated with the Q matrix from STRUCTURE (Table 4) using a multivariate linear regression. […] [22,23] or Arabidopsis [24]. This finding can be explained by the fact that, in most plant species, FT is under divergent selection, as it is the key adaptive trait enabling plants to flower at the optimum time for pollination and seed development [25]. In WCA, the growing conditions of pearl millet are characterized, among other hazards, by highly variable onsets of the rainy season [26]. Photoperiod-sensitive flowering, measured in our study as RRP, has the potential to enhance adaptation to such environments, because it leads to simultaneous flowering of genotypes in the target region, independent of the individual sowing date in different fields. This, in turn, is expected to lead to divergence and might thereby explain why RRP is associated with population structure in pearl millet (Table 4).
Table 3 Properties of the pearl millet subsets maximizing gene diversity D
Our finding that PL was significantly associated with the Q matrix from STRUCTURE (Table 4) might be explained by the fact that farmers' preferences appear to differ considerably among regions of WCA (cf. [27]), which has the potential to lead to population structure. The preference for long panicles in certain regions of WCA is mostly due to practical reasons: in these areas, the harvested pearl millet panicles are usually tied into bundles for transport from the field to the grain store, which is easier with long panicles, whereas short panicles usually require a bag for transportation.
We observed that the phenotypic traits FT, RRP, and PL were significantly associated with the Q matrix from STRUCTURE, but not the environmental factors (Table 4) that are expected to influence these traits in natural populations, such as latitude, temperature, or precipitation (nor their monthly averages; data not shown).
This also held when the phenotypic traits were used as cofactors in examining the environmental factors (data not shown). Our results suggested that, for pearl millet landraces, natural selection is less important than human selection in shaping populations, which in turn can be explained by the fact that pearl millet landraces are not natural populations.
Table 4 (last row and footnote): Agro-ecological zone, 0.1139. * P < 0.05, ** P < 5 × 10^−5, *** P < 5 × 10^−10. † P value for the association of the corresponding parameters with the Q matrix from STRUCTURE based on a multivariate linear regression.

Conclusions
Our findings of high D values as well as a high average number of alleles per locus suggested that the pearl millet inbreds examined in this study are a valuable resource for increasing the genetic diversity in pearl millet breeding programs. Furthermore, the results of this study suggested that, for pearl millet landraces, natural selection is less important than human selection in shaping populations.

Plant materials
A set of 145 inbred lines (14 inbreds in S3, 131 inbreds in S4), derived from 122 different pearl millet landraces, was used in this study (Additional file 3). We based our study on inbred lines derived from landraces because, for this type of germplasm, phenotypic evaluation can be performed on several plants per genotype and, thus, with higher heritability. Landraces were considered different if they had a different local name or were collected from different locations (cf. [9]). The number of inbreds per landrace ranged from one to two, with an average of 1.2. The landraces had been assembled during joint pearl millet collection missions involving the "Insti-

Environmental parameters and phenotypic evaluation
For each collection site, mean annual precipitation and mean annual temperature were calculated using the gridded bivariate interpolation method [28] based on more than 160 years of data available from ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2/. Based on mean annual precipitation, each collection site was assigned to one of three agro-ecological zones (sahelian, sudano-sahelian, sudanian) [16]. In addition, the altitude of each collection site was obtained by cross-referencing its geographic coordinates with the WORLDCLIM database [29] using the DIVA-GIS software [30].
All 145 pearl millet inbreds were grown in 2007 at the ICRISAT research station in Sadore (Niger) with two sowing dates (15 June and 16 July). The two experiments were located next to each other. Each experiment was laid out as an α-lattice design with two replicates. Experimental units were one-row plots 4.8 m long with 0.75 m between rows. The recorded traits were seedling vigor (SV, score from 1 to 5; 1 = best, 5 = worst), flowering time (FT, Julian days), plant height (PH, cm), and panicle length (PL, cm). For each trait, adjusted entry means across both sowing dates were calculated assuming heterogeneous error variances. Heritability h² was computed as
$$h^2 = \frac{\sigma^2_g}{\sigma^2_g + \bar{v}/2},$$
where σ²_g is the genotypic variance and v̄ the mean variance of a difference of two adjusted treatment means [31]. Furthermore, for each inbred, the relative response to photoperiod (RRP) was calculated as $RRP = 1 - FT_2/FT_1$ [32], where FT_1 and FT_2 are the adjusted entry means for FT (Julian days) observed in the first and second experiment, respectively. This parameter is unitless and deviates from 0 for genotypes responding to photoperiod.
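Both quantities in this paragraph are simple to compute once the mixed-model outputs are available. The sketch below assumes the heritability has the form h² = σ²_g / (σ²_g + v̄/2), which is consistent with the definitions of σ²_g and v̄ given here; in practice σ²_g and v̄ would come from the analysis of the α-lattice trials. The function names and the input numbers are hypothetical.

```python
def heritability(sigma2_g: float, vbar: float) -> float:
    """Entry-mean heritability, assuming h^2 = sigma2_g / (sigma2_g + vbar / 2).

    sigma2_g: genotypic variance; vbar: mean variance of a difference of two
    adjusted treatment means.
    """
    if sigma2_g < 0 or vbar < 0:
        raise ValueError("variances must be non-negative")
    if sigma2_g + vbar == 0:
        raise ValueError("at least one variance must be positive")
    return sigma2_g / (sigma2_g + vbar / 2.0)


def relative_response_to_photoperiod(ft1: float, ft2: float) -> float:
    """RRP = 1 - FT2 / FT1 for adjusted FT means from sowing dates 1 and 2."""
    if ft1 == 0:
        raise ValueError("FT1 must be non-zero")
    return 1.0 - ft2 / ft1


# Hypothetical values, for illustration only.
print(round(heritability(10.0, 2.0), 3))              # 0.909
print(relative_response_to_photoperiod(60.0, 45.0))   # 0.25
print(relative_response_to_photoperiod(60.0, 60.0))   # 0.0
```

A genotype with the same adjusted FT after both sowing dates gets RRP = 0, matching the statement that RRP deviates from 0 only for genotypes responding to photoperiod.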

Molecular markers
Total genomic DNA was extracted from leaf tissue using a modified CTAB protocol [33]. A total of 20 simple sequence repeat (SSR) markers [34][35][36][37] (Additional file 4) were used to genotype the 145 pearl millet inbreds. The SSRs were grouped into multiplex sets of three, with forward primers labeled with fluorescent dyes (6-FAM, HEX, and TET; Biomers GmbH, Germany). Amplicons were generated using an amplification program of 94°C for 3 min, followed by 30 cycles of 94°C for 45 s, the optimum annealing temperature T_opt for 1 min (Additional file 4), and 72°C for 45 s, and a final extension step of 72°C for 10 min. PCR products were denatured and size-fractionated by capillary electrophoresis on a MegaBACE sequencer (Amersham Biosciences, Sweden). MegaBACE Fragment Profiler v1.2 (Amersham Biosciences, Sweden) was used to size peak patterns, with the internal size standard ROX 400 HD used for allele calling (Additional file 5). Each of the 20 SSRs had fewer than 25% missing values. The map positions of these markers were extracted from the Gramene database.

Statistical analyses
Because the SSR markers used in this study are distributed across the genome (Additional file 4) and linkage disequilibrium decays rapidly in pearl millet [38], linkage disequilibrium between markers was neglected in all statistical analyses described below. The number of alleles per locus, the number of group-specific alleles A, and the gene diversity D [39] were determined. Modified Rogers distance (MRD) was calculated according to [40], and an Fst analysis was performed according to [39] using the observed and expected heterozygosities for the population under consideration. Principal component analyses (PCA) of the 145 inbreds were carried out based on (i) the SSR allele frequency matrix and (ii) the geographical and environmental parameters together with the evaluated phenotypic traits (Additional file 6). Analyses of molecular variance (AMOVA) were performed using Arlequin [41].
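The diversity statistics named in this paragraph can be sketched with the Python standard library. This is a minimal sketch under stated assumptions, not the authors' code: it assumes Nei's gene diversity 1 − Σp² per locus with an n/(n − 1) small-sample correction, the common definition of modified Rogers distance MRD = sqrt(Σ_k Σ_a (p_xa − p_ya)² / (2m)) over m loci, and Nei's G_ST = (H_T − H_S)/H_T as a stand-in for the Fst analysis; [39] and [40] may differ in detail. The data layout (one allele per locus per inbred, None for missing) and all function names are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List, Optional, Sequence

# Hypothetical layout: one list per inbred with one allele (e.g. fragment size
# in bp) per SSR locus; None marks a missing value. Inbreds are treated as
# fully homozygous, so each contributes a single allele per locus.
Inbred = Sequence[Optional[int]]


def allele_freqs(inbreds: List[Inbred], locus: int) -> Dict[int, float]:
    """Allele frequencies at one locus, ignoring missing values."""
    alleles = [g[locus] for g in inbreds if g[locus] is not None]
    return {a: c / len(alleles) for a, c in Counter(alleles).items()}


def gene_diversity(inbreds: List[Inbred], unbiased: bool = True) -> float:
    """Mean over loci of 1 - sum(p_a^2), optionally scaled by n / (n - 1)."""
    values = []
    for k in range(len(inbreds[0])):
        n = sum(g[k] is not None for g in inbreds)
        if n < 2:
            continue  # too few scored inbreds at this locus
        h = 1.0 - sum(p * p for p in allele_freqs(inbreds, k).values())
        values.append(h * n / (n - 1) if unbiased else h)
    return sum(values) / len(values)


def mean_alleles_per_locus(inbreds: List[Inbred]) -> float:
    """Average number of distinct alleles observed per locus."""
    n_loci = len(inbreds[0])
    return sum(len(allele_freqs(inbreds, k)) for k in range(n_loci)) / n_loci


def group_specific_alleles(group: List[Inbred], others: List[Inbred]) -> int:
    """Number of alleles (summed over loci) seen in `group` but not in `others`."""
    total = 0
    for k in range(len(group[0])):
        inside = {g[k] for g in group} - {None}
        outside = {g[k] for g in others} - {None}
        total += len(inside - outside)
    return total


def modified_rogers_distance(x: Inbred, y: Inbred) -> float:
    """sqrt(sum_k sum_a (p_xa - p_ya)^2 / (2m)) over the m loci scored in both."""
    squared, m = 0.0, 0
    for a, b in zip(x, y):
        if a is None or b is None:
            continue  # skip loci missing in either inbred
        m += 1
        # Homozygous inbreds: different alleles give (1 - 0)^2 + (0 - 1)^2 = 2.
        squared += 0.0 if a == b else 2.0
    if m == 0:
        raise ValueError("no locus scored in both inbreds")
    return sqrt(squared / (2 * m))


def nei_gst(groups: List[List[Inbred]]) -> float:
    """(H_T - H_S) / H_T from expected heterozygosities summed over loci."""
    h_s, h_t = 0.0, 0.0
    for k in range(len(groups[0][0])):
        freqs = [allele_freqs(g, k) for g in groups]
        alleles = set().union(*freqs)
        # H_S: mean within-group expected heterozygosity.
        h_s += sum(1.0 - sum(p * p for p in f.values()) for f in freqs) / len(freqs)
        # H_T: expected heterozygosity of the unweighted mean frequencies.
        mean_p = [sum(f.get(a, 0.0) for f in freqs) / len(freqs) for a in alleles]
        h_t += 1.0 - sum(p * p for p in mean_p)
    return (h_t - h_s) / h_t


inbreds = [[100, 200], [100, 202], [104, 200], [104, 202]]  # hypothetical data
print(gene_diversity(inbreds, unbiased=False))          # 0.5
print(nei_gst([inbreds[:2], inbreds[2:]]))              # 0.5
```

With fully homozygous inbreds, MRD reduces to the square root of the fraction of jointly scored loci at which two inbreds carry different alleles.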
To identify the subsets of r = 5, 10, 15, ..., 40 pearl millet inbreds that maximize gene diversity D, we used an algorithm based on an iterative maximization procedure [42]. Briefly, a subset of r inbreds was first chosen at random from the entire set of 145 pearl millet inbreds. In step one, all subsets of size r − 1 of the current subset were examined, and the one with the highest D was retained. In step two, the inbred among the remaining inbreds that brought the greatest increase in D was added. These two steps were repeated until the gene diversity D of the subset reached a maximum.
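The two-step exchange procedure can be sketched as follows. This illustrates the steps as described, not the implementation of [42]: it assumes D is 1 − Σp² per locus averaged over loci (without a small-sample correction, which does not change the ranking of equal-sized subsets when no data are missing), uses a fixed random seed for the starting subset, and lets the inbred dropped in step one compete again in step two so that D never decreases. Function names and the toy data are hypothetical.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Inbred = Sequence[Optional[int]]  # one allele per locus, None = missing


def diversity(inbreds: List[Inbred]) -> float:
    """Mean over loci of 1 - sum(p_a^2), ignoring missing values."""
    total, used = 0.0, 0
    for k in range(len(inbreds[0])):
        alleles = [g[k] for g in inbreds if g[k] is not None]
        if not alleles:
            continue
        n = len(alleles)
        total += 1.0 - sum((c / n) ** 2 for c in Counter(alleles).values())
        used += 1
    return total / used if used else 0.0


def maximize_diversity(pool: List[Inbred], r: int, seed: int = 0,
                       max_rounds: int = 1000) -> Tuple[List[int], float]:
    """Return indices of an r-subset of `pool` with locally maximal D."""
    if not 2 <= r <= len(pool):
        raise ValueError("need 2 <= r <= len(pool)")

    def d_of(indices: List[int]) -> float:
        return diversity([pool[i] for i in indices])

    rng = random.Random(seed)
    current = rng.sample(range(len(pool)), r)  # random starting subset
    best = d_of(current)
    for _ in range(max_rounds):
        # Step one: among all (r - 1)-subsets of `current`, keep the one with the highest D.
        drop = max(current, key=lambda i: d_of([j for j in current if j != i]))
        reduced = [j for j in current if j != drop]
        # Step two: add the inbred outside `reduced` that increases D the most.
        outside = [i for i in range(len(pool)) if i not in reduced]
        add = max(outside, key=lambda i: d_of(reduced + [i]))
        candidate = reduced + [add]
        new_d = d_of(candidate)
        if new_d <= best + 1e-12:
            break  # no further improvement: D has reached a (local) maximum
        current, best = candidate, new_d
    return sorted(current), best


# Toy pool: three identical inbreds plus two distinct ones (hypothetical).
pool = [[1, 1], [1, 1], [1, 1], [2, 2], [3, 3]]
print(maximize_diversity(pool, r=3, seed=42))
```

Because the start is random, running the procedure from several seeds and keeping the best subset may be a cheap safeguard against local maxima.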
A model-based approach implemented in the software package STRUCTURE [17] was used to determine the presence of population structure and to assign pearl millet inbred lines to sub-groups. The set of 145 inbreds was analyzed with the number of sub-groups set from one to 20, with five repetitions each. For each run of STRUCTURE, both the burn-in period and the number of iterations of the Markov chain Monte Carlo algorithm were set to 100,000. We used the ad hoc criterion described by [43] to estimate the number of sub-groups. Of the five repetitions with the estimated number of sub-groups, the one with the maximum likelihood was used to assign lines with membership probabilities of 0.80 or more to sub-groups. Inbreds with membership probabilities below 0.80 for all individual sub-groups were assigned to an admixed group.
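The assignment rule in this paragraph (keep the repetition with the highest likelihood, then place an inbred in sub-group l only if its membership probability is at least 0.80) can be written as a short function. The sketch assumes the Q matrices and log-likelihoods have already been read from the STRUCTURE output; parsing that output is not shown, and the function names and example values are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

QMatrix = List[Sequence[float]]  # rows: inbreds; columns: sub-groups


def best_run(runs: List[Tuple[float, QMatrix]]) -> QMatrix:
    """Pick the Q matrix of the repetition with the highest log-likelihood."""
    return max(runs, key=lambda run: run[0])[1]


def assign_to_subgroups(q: QMatrix, threshold: float = 0.80) -> List[Optional[int]]:
    """1-based sub-group per inbred, or None (admixed) if no membership >= threshold."""
    labels: List[Optional[int]] = []
    for row in q:
        top = max(range(len(row)), key=lambda col: row[col])
        labels.append(top + 1 if row[top] >= threshold else None)
    return labels


# Hypothetical output of two repetitions for three inbreds and K = 3.
runs = [
    (-5230.4, [[0.90, 0.05, 0.05], [0.50, 0.30, 0.20], [0.10, 0.80, 0.10]]),
    (-5241.9, [[0.70, 0.20, 0.10], [0.40, 0.40, 0.20], [0.10, 0.85, 0.05]]),
]
print(assign_to_subgroups(best_run(runs)))  # [1, None, 2]
```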
Pairwise geographic distances between all 145 pearl millet inbreds were calculated from the geographic coordinates. Furthermore, for each geographic and environmental parameter, each phenotypic trait, and the Q matrix from STRUCTURE (the resulting distance is denoted SBD), distances ED between all pairs of pearl millet inbreds were calculated as
$$ED_{ij} = \sqrt{\sum_{k=1}^{n}\left(\frac{t_{ik}-\bar{t}_k}{\sigma_{t_k}} - \frac{t_{jk}-\bar{t}_k}{\sigma_{t_k}}\right)^2},$$
where ED_ij is the distance between inbreds i and j, n the number of dimensions of the examined parameter, t_ik and t_jk the parameter values of inbreds i and j for the kth dimension, t̄_k the mean parameter value of the kth dimension across all inbreds, and σ_{t_k} the standard deviation of the parameter values for the kth dimension across all inbreds. For the Q matrix from STRUCTURE, n = 4, whereas n = 1 for the other parameters.
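From the definitions given in this paragraph, ED appears to be a Euclidean distance between parameter values standardized by their mean and standard deviation; the sketch below computes it under that assumption. It further assumes the sample standard deviation (n − 1 denominator) for σ_{t_k}, which the text does not specify. The function name and example values are hypothetical. For a single parameter (n = 1) the distance reduces to |t_i − t_j| / σ_t.

```python
from math import sqrt
from statistics import mean, stdev
from typing import List, Sequence


def standardized_distances(values: List[Sequence[float]]) -> List[List[float]]:
    """Pairwise ED_ij for rows of `values` (rows: inbreds, columns: the n dimensions).

    Each dimension k is centred on its mean and divided by its sample standard
    deviation before the Euclidean distance is taken.
    """
    n_dim = len(values[0])
    columns = [[row[k] for row in values] for k in range(n_dim)]
    means = [mean(c) for c in columns]
    sds = [stdev(c) for c in columns]
    if any(sd == 0 for sd in sds):
        raise ValueError("a dimension has zero variance")
    z = [[(row[k] - means[k]) / sds[k] for k in range(n_dim)] for row in values]
    size = len(values)
    return [
        [sqrt(sum((z[i][k] - z[j][k]) ** 2 for k in range(n_dim))) for j in range(size)]
        for i in range(size)
    ]


# One-dimensional example (n = 1), e.g. a single trait for three inbreds.
d = standardized_distances([[1.0], [2.0], [3.0]])
print(d[0][1], d[0][2])  # 1.0 2.0
```

The same function accepts a Q matrix with several columns (n = 4 in the text); each column is standardized separately before the distance is computed.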
Pearson's correlation coefficients were calculated for all combinations of MRD with the above-mentioned distances, as well as of SBD with the above-mentioned distances. In addition, we used the following multivariate linear regression model:
$$Q_{il} = \mu_l + \beta_l t_i + e_{il},$$
where Q_il is the probability that the ith pearl millet inbred belongs to the lth sub-group (i.e., the value in the ith row and lth column of the Q matrix from STRUCTURE), μ_l the intercept for the lth column of the Q matrix from STRUCTURE, β_l the regression coefficient for the lth column, t_i the parameter value of the ith pearl millet inbred, and e_il the residual. We examined the geographic and environmental parameters as well as the phenotypic traits with this model in order to identify the parameters that best explain the variation in the Q matrix from STRUCTURE.
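Both analyses in this paragraph can be sketched with the Python standard library. Pearson's r between two distance matrices is computed here over their upper triangles (each pair of inbreds counted once), which is an assumption about how the pairs were formed. The regression is fitted column by column by least squares, assuming the model Q_il = μ_l + β_l t_i + e_il for a numeric parameter t. Because the text does not state which multivariate test produced the P values in Table 4, the sketch substitutes a permutation test on the pooled share of Q-matrix variation explained by t; this is a swapped-in technique, not necessarily the authors' test, and all names are hypothetical.

```python
import random
from math import sqrt
from statistics import mean
from typing import List, Sequence, Tuple


def upper_triangle(dist: List[Sequence[float]]) -> List[float]:
    """Entries above the diagonal, so each pair of inbreds is counted once."""
    return [dist[i][j] for i in range(len(dist)) for j in range(i + 1, len(dist))]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_q_regression(q, t) -> List[Tuple[float, float]]:
    """Least-squares (mu_l, beta_l) for Q_il = mu_l + beta_l * t_i + e_il, per column l."""
    mt = mean(t)
    stt = sum((x - mt) ** 2 for x in t)
    fits = []
    for col in zip(*q):
        mq = mean(col)
        beta = sum((x - mt) * (y - mq) for x, y in zip(t, col)) / stt
        fits.append((mq - beta * mt, beta))
    return fits


def explained_share(q, t) -> float:
    """Share of the total Q-matrix sum of squares explained by t, pooled over columns."""
    ss_model = ss_total = 0.0
    for col, (mu, beta) in zip(zip(*q), fit_q_regression(q, t)):
        mq = mean(col)
        ss_total += sum((y - mq) ** 2 for y in col)
        ss_model += sum((mu + beta * x - mq) ** 2 for x in t)
    return ss_model / ss_total


def permutation_p_value(q, t, n_perm: int = 999, seed: int = 1) -> float:
    """P value from shuffling t across inbreds (a stand-in, not the authors' test)."""
    rng = random.Random(seed)
    observed = explained_share(q, t)
    shuffled = list(t)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if explained_share(q, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: three inbreds, two sub-groups, one parameter.
print(fit_q_regression([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]], [0.0, 1.0, 2.0]))
# [(1.0, -0.5), (0.0, 0.5)]
```

Categorical parameters such as country of origin would need dummy coding instead of a single slope, which this sketch does not cover.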
If not stated differently, all analyses were performed with the statistical software R [44].

Additional material
Additional file 1: Analysis of molecular variance for the 145 pearl millet inbred genotypes of this study. Analysis of molecular variance for the 145 pearl millet inbred genotypes with respect to their country and agro-ecological zone of origin, where DF are the degrees of freedom, SSD the sum of squared deviations, σ² the variance component, and % the percentage of variance contributed by each source of variation.
Additional file 2: Graphical representation of the results of STRUCTURE. Graphical representation of the results of STRUCTURE, where K is the number of sub-groups.
Additional file 3: Details for the 145 pearl millet inbred genotypes examined in this study. Details of the 145 pearl millet inbred genotypes examined in this study.
Additional file 4: Simple sequence repeat markers used in this study. Simple sequence repeat markers used in this study, where LG is the linkage group, Pos. the position in cM, and T opt the optimized annealing temperature.
Additional file 5: Screen shot of the MegaBACE Fragment Profiler. Screen shot of the MegaBACE Fragment Profiler to illustrate the procedure of allele calling.
Additional file 6: Principal component analysis of the 145 pearl millet inbreds examined in our study based on the corresponding geographical and environmental parameters as well as the evaluated phenotypic traits. Principal component analysis of the 145 pearl millet inbreds examined in our study based on the corresponding geographical and environmental parameters as well as the evaluated phenotypic traits. PC1 and PC2 are the first and second principal component, respectively, and the values in brackets give the proportion of explained variance. The different colored segments of the pie charts give the probability that a certain individual belongs to one of the five sub-groups identified by STRUCTURE.