Population genomics of Zea species identifies selection signatures during maize domestication and adaptation
BMC Plant Biology volume 22, Article number: 72 (2022)
Maize (Zea mays L. ssp. mays) was domesticated from teosinte (Zea mays ssp. parviglumis) about 9000 years ago in southwestern Mexico and adapted to a range of environments worldwide. Researchers have depicted the maize domestication and adaptation processes over the past two decades, but efforts have been limited either in sample size or genetic diversity. To better understand these processes, we conducted a genome-wide survey of 982 maize inbred lines and 190 teosinte accessions using over 40,000 single-nucleotide polymorphism markers.
Population structure, principal component analysis, and phylogenetic trees all confirmed the evolutionary relationship between maize and teosinte, and determined the evolutionary lineage of all species and subspecies within teosinte. Shared haplotype analysis showed similar levels of ancestral alleles from Zea mays ssp. parviglumis and Zea mays ssp. mexicana in maize. Scans for selection signatures identified 394 domestication sweeps by comparing wild and cultivated maize and 360 adaptation sweeps by comparing tropical and temperate maize. Permutation tests revealed that publicly available association signals for flowering time were highly enriched in the domestication and adaptation sweeps. A genome-wide association study identified 125 loci significantly associated with flowering-time traits, ten of which harbored candidate genes that have undergone selection during maize adaptation.
In this study, we characterized the history of maize domestication and adaptation at the population genomic level and identified hundreds of domestication and adaptation sweeps. This study extends our understanding of the molecular mechanisms of maize domestication and adaptation and provides resources for basic research and genetic improvement in maize.
Maize (Zea mays L. ssp. mays) is the most widely planted crop species for food, feed, and industrial materials. Maize, along with its wild relatives, also serves as an excellent model organism for understanding the genetic and functional mechanisms of plant domestication and adaptation. Maize and teosinte make up the genus Zea, which consists of five species distributed from northern Mexico through Central America [2,3,4]. The five species are Zea nicaraguensis (hereafter nicaraguensis), Zea luxurians (hereafter luxurians), Zea diploperennis (hereafter diploperennis), Zea perennis (hereafter perennis), and Zea mays. Of these, diploperennis and perennis are diploid and tetraploid perennial teosinte, respectively, whereas the others are diploid annual species. The annual species Zea mays consists of four subspecies, including the domesticated maize, the lowland adapted Zea mays ssp. parviglumis (hereafter parviglumis), the highland adapted Zea mays ssp. mexicana (hereafter mexicana), and the mid-altitude adapted Zea mays ssp. huehuetenangensis (hereafter huehuetenangensis). A refined understanding of the genetic relationship within the genus Zea can help elucidate the trajectories of maize domestication and adaptation.
Previously published genetic and archaeological data clearly reveal that maize was domesticated from parviglumis in a single domestication event in southern Mexico ~ 9000 years ago [5,6,7]. During this period, maize underwent dramatic phenotypic changes in both morphological and physiological characteristics [8,9,10]. The genetic basis of the morphological differences between maize and teosinte has been intensely investigated by quantitative trait locus (QTL) mapping using maize-teosinte populations [11,12,13,14,15]. However, only a limited number of domestication QTLs have been resolved to their underlying genes, including teosinte branched1 (tb1) controlling branching [16,17,18], teosinte glume architecture1 (tga1) controlling the formation of the stony fruit case [19, 20], and grassy tillers1 (gt1) affecting prolificacy. In addition to the cloning of single genes, population genetic comparisons of maize and teosinte have revealed evidence for positive selection on hundreds of genes during maize domestication [3, 22].
After its domestication, maize spread from southern Mexico into North and South America, where it adapted to diverse environmental conditions [4, 5]. One of the most important events in this adaptation process was the divergence between tropical and temperate lines around 3400–6700 years ago. Environmental differences between temperate and tropical regions, such as temperature and day length, shaped maize diversity and facilitated its movement, and the footprints of this adaptation process were recorded in its genome. As with the domestication process, genome-wide genotypic datasets provide an excellent resource for characterizing the genetic basis of adaptation. Studies linking such datasets to maize phenotypes and metabolic pathways have identified a large number of selected loci, revealing the complex genetic architecture of adaptation [23,24,25].
Flowering time is a key component in the adaptation of maize to local conditions as it moved to higher latitudes after domestication. Five cis-variants in four genes have been identified that contributed to phenotypes allowing the pre-Columbian spread of maize throughout the Americas: a miniature inverted-repeat transposable element (MITE) located ~ 70 kb upstream of ZmRap2.7, a CACTA-like transposon in the ZmCCT10 promoter, a Harbinger-like transposon located ~ 57 kb upstream of ZmCCT9, and SNP-1245 and InDel-2339 in the promoter of ZCN8. Analysis of these interacting genes suggests that the SNP-1245A allele of ZCN8 may have been the first to be selected, whereas the other four early-flowering alleles made specific contributions to northward expansion in North America. These results suggest that maize adaptation was a complex process involving numerous genetic loci that were selected at different evolutionary times for local adaptation.
During the past two decades, researchers have depicted the history of maize domestication and adaptation using genetic information from cultivated maize and its wild relatives [3, 5, 7, 22,23,24], but these efforts have been limited in either sample size or geographic range. Here, a collection of 982 maize inbred lines representing global tropical, subtropical, and temperate germplasm and 190 teosinte accessions from Mexico and Central America was genotyped using the Illumina MaizeSNP50 BeadChip. We used this resource to determine the evolutionary relationships within the genus Zea and to identify loci that have undergone selection during maize domestication and adaptation. We then performed co-localization analyses of the selective sweeps with known selected genes and with genes associated with adaptation traits identified via genome-wide association studies (GWASs). We found that some of the selected loci were associated with domestication and adaptation traits. This study provides insights into maize evolutionary history, and the genetic resource should facilitate future maize breeding.
Genetic structure within the genus Zea
Using 42,204 high-quality single-nucleotide polymorphisms (SNPs), population structure analysis unambiguously assigned all 1172 materials (982 maize inbred lines and 190 teosinte accessions) to either the maize or the teosinte cluster (Fig. 1A; Data S1). Membership probabilities of teosinte individuals in the maize cluster (0 < P < 0.5) reflected common ancestry between some teosinte accessions and maize. Maize inbred lines were further divided into tropical/subtropical (hereafter tropical; 669 lines) and temperate (157 lines) subgroups (Fig. 1B; Data S1), consistent with the historical separation of these two subgroups [5, 30]. A substantial mixed group (156 lines) reflects more recent breeding efforts to expand diversity within each breeding pool by introducing germplasm from the other. The twelve nicaraguensis accessions and the single luxurians accession clustered into one subgroup (Fig. 1B, C; Data S1), suggesting genetic similarity between these two species. Accessions from mexicana and parviglumis clustered independently, forming distinct subgroups of 96 and 75 accessions, respectively (Fig. 1B, C; Data S1). The diploperennis, perennis, and huehuetenangensis accessions clustered into a mixed subgroup, and the membership probabilities of diploperennis and perennis in the mexicana and nicaraguensis subgroups were similar (Fig. 1C; Data S1). Further analysis of mexicana and parviglumis alone revealed two mexicana clusters and four parviglumis clusters, in agreement with races classified by geographical distribution (Fig. 1D; Fig. S1; Data S1).
In addition to the population structure analysis, we performed a principal component analysis (PCA) on the same SNP data set, and the PCA results strongly supported the classification of species, subspecies, and races obtained from the population structure analysis of the genus Zea (Fig. S2). However, in the PCA plots the extreme maize points were B73 and Mo17, and the spread of the maize points was distorted and overstretched. This pattern might be caused by SNP ascertainment bias, particularly from the Syngenta SNPs. To evaluate the effect of this bias, we re-analyzed population structure and principal components using the 30,974 non-Syngenta SNPs. In the ADMIXTURE results, membership probabilities calculated from all SNPs and from non-Syngenta SNPs were highly correlated (R² > 0.99) for each assigned group (Fig. S3A, B). The PCA plots also showed a similar distribution of maize lines (Fig. S3C, D). Furthermore, the polymorphic information content (PIC) calculated for each SNP indicated very similar genetic diversity between the two datasets (Fig. S3E). Taken together, these results suggest that ascertainment bias from the Syngenta SNPs did not affect global estimates of genetic relationships and genetic diversity in the genus Zea, although it did affect the genetic distances among maize inbred lines.
To identify the primary sources of maize genetic diversity, we constructed a neighbor-joining phylogenetic tree that included all entries in this study (Fig. 2). In the tree, the luxurians accession was closest to nicaraguensis (chosen as the root), followed in order by diploperennis and perennis, huehuetenangensis, mexicana, parviglumis, and, finally, maize. These groupings reflect the evolutionary lineage of all Zea species and subspecies. The monophyletic clade containing all maize lines (Fig. 2A) strongly supports a single domestication event in maize. The parviglumis accessions from the Central Balsas race were closest to maize (Fig. 2), favoring the Balsas River valley as the center of maize domestication [5, 6, 31, 32]. In addition, the groups formed by the mexicana and parviglumis accessions appeared interconnected in a manner consistent with their geographical overlap (Fig. 2B; Fig. S1). Collectively, the evolutionary relationships among all Zea species and subspecies inferred by the three methods (population structure analysis, PCA, and phylogenetic tree) are fully consistent with the current taxonomy of the genus Zea.
Shared and unique haplotypes in maize and teosinte
Because of their proximity to maize, further analyses focused on mexicana and parviglumis teosinte in comparison with tropical and temperate maize. These comparisons allowed us to distinguish genetic variation acquired by maize from teosinte during domestication from variation partitioned during its adaptation from tropical to temperate environments. High pairwise FST among these four subgroups (0.10 < FST < 0.21) indicated strong population differentiation (Table S1). Furthermore, high pairwise FST between teosinte and maize and relatively low FST between tropical and temperate maize reflect the history of maize domestication and adaptation. However, haplotype richness in parviglumis was similar to that in tropical maize (Table 1). To exclude bias in haplotype estimates caused by differences in sample size, we randomly sampled 75 individuals from each group in 100 bootstrap replicates; parviglumis, the smallest group (75 accessions), was used in full. As expected, the window-based haplotype number was much greater in teosinte than in modern maize, in the order parviglumis > mexicana > tropical maize > temperate maize (Fig. S4). These findings indicate that genetic diversity in maize, especially temperate maize, was dramatically reduced during maize domestication and adaptation.
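The sample-size correction described above can be illustrated with a minimal Python sketch. It assumes that each group's haplotypes in one window are available as a list of strings and that individuals are drawn without replacement (the text does not say which); the function names are hypothetical and this is not the authors' code.

```python
import random
from statistics import mean


def count_haplotypes(haplotypes):
    """Number of distinct haplotypes (alleles) observed in one window."""
    return len(set(haplotypes))


def rarefied_haplotype_count(haplotypes, n=75, replicates=100, seed=1):
    """Mean number of distinct haplotypes over random subsamples of size n.

    A group with exactly n samples (parviglumis in the text) is used in full,
    because every subsample would contain the whole group.
    """
    if len(haplotypes) < n:
        raise ValueError("group is smaller than the subsample size")
    if len(haplotypes) == n:
        return float(count_haplotypes(haplotypes))
    rng = random.Random(seed)
    # Assumption: sampling without replacement within each replicate.
    counts = [count_haplotypes(rng.sample(haplotypes, n)) for _ in range(replicates)]
    return mean(counts)
```

Running this per window and per group gives distributions of haplotype numbers that can be compared across groups on an equal-sample-size footing.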
Many group-specific haplotypes were also observed in the four subgroups: parviglumis, mexicana, tropical maize, and temperate maize (Fig. 3; Table 1). The relatively small number of maize-specific haplotypes suggests that most of the diversity in the domesticated maize gene pool was contributed by teosinte rather than created de novo since domestication. Both tropical and temperate maize shared a large proportion of haplotypes with parviglumis and mexicana (Fig. 3), suggesting that both parviglumis and mexicana contributed ancestral alleles to domesticated maize. However, the contribution of parviglumis to maize during domestication may be overestimated because of the rapid expansion of the initial maize progenitor population.
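A minimal sketch of how group-specific and shared haplotypes might be tallied for a single window using Python sets. The input layout (group name mapped to a list of haplotype strings), the function name, and the toy window are hypothetical.

```python
def partition_haplotypes(groups):
    """Split the haplotypes of one window into group-specific and shared sets.

    groups: dict mapping group name -> iterable of haplotype strings.
    Returns (specific, shared): specific[g] holds haplotypes seen only in g;
    shared[(a, b)] holds haplotypes seen in both a and b (names sorted).
    """
    sets = {g: set(h) for g, h in groups.items()}
    specific = {}
    for g, s in sets.items():
        others = set().union(*(t for k, t in sets.items() if k != g))
        specific[g] = s - others
    names = sorted(sets)
    shared = {
        (a, b): sets[a] & sets[b]
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
    return specific, shared


# Hypothetical toy window with four groups.
window = {
    "parviglumis": ["AC", "AT", "GT"],
    "mexicana": ["AC", "GG"],
    "tropical": ["AC", "AT"],
    "temperate": ["AC"],
}
specific, shared = partition_haplotypes(window)
```

Summing the sizes of these sets over all windows yields genome-wide counts of group-specific and shared haplotypes.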
Footprints of selection in the genome
The domestication of maize from its wild progenitor resulted in extreme morphological changes in plant and ear architecture, followed by further changes as a result of selection during crop adaptation [8, 33]. To determine whether these changes left detectable footprints of selection in the maize genome, we applied two between-population statistics, FST and the cross-population composite likelihood ratio (XP-CLR), to sliding windows in two comparisons: teosinte versus maize and tropical versus temperate maize (Fig. 4; Table 2; Data S2). Based on the top 0.5% of XP-CLR and FST values, we identified 141 and 295 domestication-related regions, respectively, of which 42 were detected by both methods (Fig. 4C; Table 2). Similarly, we identified 138 and 268 adaptation-related regions, respectively, of which 46 were detected by both methods (Fig. 4D; Table 2). The small proportion of sweeps shared between methods (~ 30% of the XP-CLR regions) may reflect the different aspects of the data the two methods capture. FST is a single-marker statistic with a large variance, whereas XP-CLR is a model-based, multilocus extension of FST that uses linkage disequilibrium (LD) in the reference population to weight SNPs and thereby reduce the high rate of false positives. Collectively, we identified 394 regions with domestication features and 360 regions with adaptation features, covering 5.7% (131 Mb) and 5.5% (127 Mb) of the genome, respectively (Table 2). For domestication, the selection footprint regions ranged in size from 100 kb to 1.7 Mb, with a mean of 333 kb, and harbored 2218 genes; fewer selection footprint regions, with a similar average size (352 kb), were detected for the adaptation process (Fig. 4E, F; Table 2).
In addition, 69 of the domestication-related selective sweeps also showed evidence of selection during adaptation, indicating that around 17% of the domestication loci may also have contributed to adaptation-related phenotypes (Data S2).
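The relationship between the per-method region counts and the combined totals in the selection scan can be checked with a short Python sketch. It assumes regions are (chromosome, start, end) tuples and that each region shared by the two methods overlaps exactly one region from the other method; the helper names are hypothetical.

```python
def overlaps(a, b):
    """True if two half-open regions (chrom, start, end) overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def combine_methods(xpclr_regions, fst_regions):
    """Return (regions found by both methods, total distinct regions).

    Assumes one-to-one overlaps, so the union is |XP-CLR| + |FST| - |shared|.
    """
    shared = sum(
        1 for r in xpclr_regions if any(overlaps(r, s) for s in fst_regions)
    )
    return shared, len(xpclr_regions) + len(fst_regions) - shared


# Consistency check against the domestication counts quoted in the text.
domestication_union = 141 + 295 - 42   # 394 regions
domestication_mb = 394 * 333 / 1000     # ~131 Mb at a 333-kb mean region size
```

The same arithmetic applied to the adaptation counts (138 + 268 - 46) gives the 360 adaptation regions.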
To test whether genetic variation within selected regions contributed to phenotypic changes during maize domestication and adaptation, we collected 29 previously reported genes with evidence of selection during domestication and adaptation (Table S2) and performed a co-localization analysis (Fig. 5; Data S2). Of the 29 genes, nine fell within the selective sweeps detected in our study, and eight genes previously reported as domestication-related were physically located within the domestication-related selective sweeps identified here, e.g., tb1 [18, 35], pbf1, and zagl1 [37, 38]. That we did not recover all 29 known selected genes may be a consequence of the low marker density or differences in germplasm. Taken together, our results provide evidence that some of the selective sweeps identified here are associated with domestication traits, although the causative genes in most selective sweeps remain unknown.
Selection footprint regions associated with adaptation traits
To identify additional loci or genes under selection during adaptation, we used flowering time as a representative adaptation trait, because flowering time played a key role in the adaptation that allowed maize to spread so widely [27,28,29]. We performed an additional co-localization analysis between the selective sweeps and genomic regions associated with flowering-time traits in published GWASs [27, 39, 40]. A total of 32 domestication (8.1%) and 39 adaptation (10.8%) sweeps co-localized with GWAS signals for flowering time (Fig. 5; Data S2). We then carried out a permutation test with 1000 permutations, in which randomly sampled genomic regions of the same number and sizes as the selective sweeps were compared with these public GWAS hits for flowering time [27, 39, 40]. The GWAS signals for flowering time were highly enriched in both the domestication and adaptation sweeps (permutation test, P < 0.001) (Fig. S5). Notably, three reported flowering-time genes, ZmMADS69, PhyB1 [42, 43], and zmm3 [37, 44], were detected within both the GWAS signals and the selective sweeps. These results suggest that the genes underlying these co-localized regions for flowering-time traits might have undergone selection during maize domestication and adaptation.
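The permutation test above might be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: each random region is placed uniformly on the same chromosome as the sweep it replaces, GWAS hits are treated as single positions, and an add-one correction is used for the empirical P-value. All names are hypothetical.

```python
import random


def count_hits(regions, hits):
    """Number of regions (chrom, start, end) containing at least one hit (chrom, pos)."""
    return sum(
        1
        for chrom, start, end in regions
        if any(c == chrom and start <= pos < end for c, pos in hits)
    )


def permutation_test(sweeps, hits, chrom_lengths, n_perm=1000, seed=42):
    """Empirical enrichment P-value for GWAS hits falling in selective sweeps.

    Each permutation keeps the number and sizes of the sweeps but moves every
    region to a uniformly random position on its own chromosome (assumption).
    """
    rng = random.Random(seed)
    observed = count_hits(sweeps, hits)
    as_extreme = 0
    for _ in range(n_perm):
        shuffled = []
        for chrom, start, end in sweeps:
            size = end - start
            new_start = rng.randrange(0, chrom_lengths[chrom] - size + 1)
            shuffled.append((chrom, new_start, new_start + size))
        if count_hits(shuffled, hits) >= observed:
            as_extreme += 1
    # Add-one correction: with 1000 permutations the smallest P is 1/1001.
    return observed, (as_extreme + 1) / (n_perm + 1)
```

With 1000 permutations and no permuted set reaching the observed count, the reported P-value is 1/1001, i.e. below 0.001.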
In addition to characterizing selected regions potentially related to flowering time, we compared our selected regions with a marker-trait association analysis of four flowering-time traits in a set of 508 maize inbred lines genotyped with ~ 1.25 million SNPs. At P ≤ 6.05 × 10⁻⁶ (1/165,202), a total of 10, 6, 11, and 4 loci were significantly associated with days to anthesis (DTA), days to silking (DTS), anthesis photoperiod response (APR), and silking photoperiod response (SPR), respectively, using best linear unbiased prediction (BLUP) values (Data S3). Comparing our selective sweeps with this GWAS of flowering-time traits in the 508 inbred lines grown at seven locations at diverse latitudes was also instructive: that GWAS identified 188 additional SNPs that resolved to 106 loci, ten of which co-localized with adaptation-related selective sweeps (Table 3; Data S3). The functions of these ten candidate flowering-time genes that underwent selection during maize adaptation were annotated as transcription factors, flavonol synthase, MYB DNA-binding domain superfamily protein, and others (Table 3). Among these loci, both association and adaptation-related selection signals were detected at the gene GRMZM2G169293 (Fig. 6A, B), which encodes a ceramide and inositol phosphotransferase. At the SNP showing the most significant association at the GRMZM2G169293 locus (S8_167550959), 77% of tropical inbred lines carried the C allele, compared with 99% of temperate inbred lines (Fig. 6C). These contrasting frequencies suggest that the C allele of SNP S8_167550959 might be associated with distinct patterns of geographic dispersal. Interestingly, SNP S8_167550959 was significantly associated with flowering time only at high latitudes, and its effect increased with latitude (except within Yunnan, China; Fig. 6D).
Although the role of GRMZM2G169293 in flowering time requires more solid evidence, e.g., from overexpression or mutant analysis, these findings provide a basis for characterizing genes responsible for adaptation from tropical to temperate regions.
The germplasm analyzed here comprises an ecologically diverse collection of taxa, including domesticated maize from tropical and temperate regions and its close wild relatives. These taxa provide an excellent genetic resource to address multiple questions about speciation and evolution, structural and functional genomics, and the utilization of teosinte germplasm in maize breeding. Cultivated maize has experienced a long period of artificial selection for desirable traits such as high yield (e.g., large seeds), nutrient richness (e.g., high levels of starch, oil, and carotenoids), and ease of harvest [8,9,10, 15, 46]. This productivity-directed selection generally results in a loss of genetic diversity in maize and increased vulnerability to biotic and abiotic stresses.
Previous comparisons of polymorphism data between maize landraces and teosinte reported a substantial loss (17%) of diversity during the domestication bottleneck. Following further and more intense artificial selection, modern maize lost even more (18.6%) genetic diversity relative to teosinte. Thus, compared with cultivated maize, its wild relative teosinte is a reservoir of genetic variation and often exhibits favorable nutritional attributes, stress resilience [48, 49], and even favorable agronomic and yield performance [12,13,14, 50]. Multiple favorable alleles from teosinte have been mined, such as ZmWAK for resistance to head smut, ZmNAC111 for drought tolerance in maize seedlings, and UPA2 for leaf angle. Notably, the teosinte UPA2 allele that reduces leaf angle, which occurs at a low frequency (4.4%) in teosinte and had not been used in modern maize, was introgressed into the elite modern maize hybrid Nongda108 via marker-assisted selection and enhanced maize yield under dense planting. This is a successful example of incorporating teosinte germplasm into maize breeding. These findings suggest that other beneficial variants useful for maize genetic improvement may be hidden in teosinte. Five teosinte taxa, parviglumis, mexicana, huehuetenangensis, diploperennis, and luxurians, can be hybridized with modern maize, enabling the transfer of favorable alleles from wild relatives into modern maize breeding pools.
With the development of efficient genotyping technology, teosinte has become an attractive system for studying the population and ecological genomics of maize domestication, introgressive hybridization, and local adaptation [3, 53]. In our study, ADMIXTURE analysis, PCA, and phylogenetic tree analysis clearly elucidated the genetic relationships between maize and its wild relatives based on over 40,000 SNPs across the genome. Consistent with previous studies [2, 5, 6, 31, 32], our results confirm a single domestication event in maize from the Central Balsas parviglumis race and favor the Balsas River valley as the center of maize domestication. Notably, paleogenomic data indicate that the domestication process was both gradual and complex, with different genetic loci selected at different time points, and that the transformation of teosinte into maize was completed within the last 5000 years. Beyond the evolutionary relationship between maize and teosinte, we also determined the evolutionary lineage of all taxa within teosinte, namely that parviglumis is closest to mexicana, followed in order by huehuetenangensis, diploperennis and perennis, luxurians, and nicaraguensis. These findings address a fundamental question in the taxonomic classification of teosintes that has been debated over the last five decades [2, 55,56,57,58,59].
Our comparative genomic analyses between wild and modern maize, and between tropical and temperate maize, identified 5.7% of the genome as having been selected during maize domestication and 5.5% as having been selected during adaptation. Because maize landraces were not included, our data cannot distinguish selective sweeps with domestication features from those with improvement features. Compared with previous studies, the selected genomic regions we identified are smaller, and only 24% (95/394) of the putative domestication-related selective sweeps overlapped with the results of Hufford et al., and 17% (62/360) of the putative adaptation-related sweeps overlapped with the results of Liu et al. These low percentages may result from differences in germplasm, sample size, and SNP density, as well as in the quality of the reference genome (Table S3). Although the SNP density was relatively low, the larger sample size in our study captures greater genetic diversity (Fig. S6) and could increase the power to detect selection signals. With newer sequencing technology, re-sequencing our germplasm together with a set of maize landraces will refine our conclusions about maize domestication, improvement, and adaptation.
Maize underwent drastic morphological and physiological changes during domestication that now differentiate it from its teosinte progenitor. Given these changes, the selective sweeps identified in this study could be associated with domestication and adaptation traits. This is supported by the co-localization of the selective sweeps identified here with eight domestication genes (e.g., tb1 [16,17,18] and pbf1) and with a set of GWAS signals for flowering-time traits (Fig. 5). Beyond the known genes and existing GWAS signals reported in previous studies, ten candidate genes were identified that co-localized with both GWAS and selection signals. For example, GRMZM2G169293 had a genetic effect on flowering time that depended on latitude (Fig. 6D). Similar trends have been observed for known adaptation genes such as ZmCCT10 and ZmCCT9. Such environment-dependent adaptation loci could be important for maize breeding in the face of climate change. Therefore, identifying selective sweeps from maize domestication and adaptation will extend our understanding of these processes and could greatly benefit maize breeding if this information is incorporated into maize improvement.
In summary, we showed that genetic structure reflects the historical evolutionary relationships among Zea species and subspecies: maize is closest to parviglumis, followed by mexicana, huehuetenangensis, diploperennis and perennis, luxurians, and nicaraguensis. Our comparative population genomic analyses identified more than 600 domestication and adaptation sweeps, and existing GWAS hits for flowering time were highly enriched in these selective sweeps. By integrating GWAS results, we identified ten candidate genes that were significantly associated with adaptation traits and that have undergone selection during maize adaptation. Notably, the candidate gene GRMZM2G169293 is located within an adaptation-related selective sweep and was associated with photoperiod responses. Taken together, our results provide new insights into the evolutionary history of maize and should benefit maize breeding.
Materials and methods
A set of 982 maize lines and 190 teosinte accessions was used in this study. The maize lines, representing tropical, subtropical, and temperate germplasm, were collected from maize breeding programs of the International Maize and Wheat Improvement Center (CIMMYT) (n = 691), China (n = 221), the USA (n = 66), Thailand (n = 3), and Peru (n = 1) (Data S4). The teosinte accessions, representing the entire geographical distribution of teosinte across Mexico and Central America, included 12 nicaraguensis, one luxurians, three diploperennis, two perennis, one huehuetenangensis, 96 mexicana, and 75 parviglumis accessions (Data S4). Based on their geographical distribution, the mexicana accessions were further divided into five groups (Puebla, Central Plateau, Chalco, Durango, and Nobogame), and the parviglumis accessions into five groups (Southern Guerrero, Oaxaca, Eastern Balsas, Central Balsas, and Jalisco) (Fig. S1 and Data S4).
Genotyping and SNP quality control
DNA was extracted from leaves pooled from at least six individuals for each maize line and from one individual per teosinte accession. All maize lines and teosinte accessions were genotyped using the Illumina MaizeSNP50 BeadChip (Illumina Inc., San Diego, CA, USA), which contains 56,110 SNPs. SNP genotypes were manually checked as reported previously. A total of 2353 SNPs with poor performance were removed from subsequent analyses. In addition, only SNPs whose probe sequences mapped uniquely to the B73 reference genome (B73 RefGenV3) using the Burrows-Wheeler Aligner (BWA) were retained. A final set of 42,204 polymorphic, single-copy SNPs with < 20% missing data across all 1172 accessions was used in the final analyses. The PIC for each SNP was calculated using PowerMarker version 3.25.
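The missing-data and polymorphism filters and the PIC calculation can be sketched in Python. PIC here uses the standard Botstein et al. (1980) formula, PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², which we assume corresponds to what PowerMarker reports. Genotypes are assumed to be coded as 0/1/2 allele dosages with None for missing calls; the probe-mapping (BWA) filter is not modeled, and the function names are hypothetical.

```python
from itertools import combinations


def allele_freqs(genotypes):
    """Reference and alternative allele frequencies from 0/1/2 dosages (None = missing)."""
    called = [g for g in genotypes if g is not None]
    alt = sum(called) / (2 * len(called))
    return [1 - alt, alt]


def pic(freqs):
    """Polymorphic information content: 1 - sum(p_i^2) - sum_{i<j} 2 p_i^2 p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    correction = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1 - homozygosity - correction


def passes_qc(genotypes, max_missing=0.20):
    """Keep a SNP if it has < 20% missing calls and is polymorphic among called samples."""
    missing = sum(g is None for g in genotypes) / len(genotypes)
    if missing >= max_missing:
        return False
    _, alt = allele_freqs(genotypes)
    return 0 < alt < 1
```

For a biallelic SNP the maximum PIC is 0.375, reached when both alleles have frequency 0.5.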
Population structure analysis
Hierarchical population structure of all maize lines and teosinte accessions was estimated with the program ADMIXTURE, which implements a STRUCTURE-like, model-based maximum-likelihood clustering algorithm. The maize lines and teosinte accessions were subsequently analyzed separately. For maize, lines with a membership probability ≥ 0.70 for a group were assigned to that group, and lines with probabilities < 0.70 for both the temperate and tropical groups were assigned to a mixed group. For teosinte, entries were assigned to the corresponding subspecies and geographical groups based on their known origins and the ADMIXTURE results. Individual assignments from ADMIXTURE were plotted using R version 3.1.1 (www.R-project.org).
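The 0.70 assignment rule for maize can be sketched as follows. ADMIXTURE writes a .Q file with one row of space-separated ancestry proportions per individual; which column corresponds to the tropical or temperate cluster is not fixed by the program, so the column indices below are assumptions to be set after inspecting the results. The function name is hypothetical.

```python
def assign_maize_groups(q_lines, tropical_col=0, temperate_col=1, threshold=0.70):
    """Assign each line to tropical, temperate, or mixed from ADMIXTURE .Q rows.

    q_lines: iterable of text rows, one per individual (K = 2 assumed).
    """
    groups = []
    for line in q_lines:
        q = [float(x) for x in line.split()]
        if q[tropical_col] >= threshold:
            groups.append("tropical")
        elif q[temperate_col] >= threshold:
            groups.append("temperate")
        else:
            # Below the threshold for both groups.
            groups.append("mixed")
    return groups
```

In practice the rows would come from reading the .Q file line by line, in the same sample order as the genotype input.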
Visualization of relationships
PCA was performed at the individual level using the GCTA software. Separate analyses were run for four sets of samples: all maize and teosinte accessions, maize inbred lines only, teosinte accessions only, and teosinte accessions split into two subgroups. The first three principal components were used to visualize genetic relatedness among individuals and to examine the groups. The identity-by-state (IBS) distance matrix between each pair of lines was calculated with PLINK Version 1.7 and imported into the MEGA6 program to construct a neighbor-joining phylogenetic tree. One nicaraguensis accession was used as the outgroup.
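A minimal sketch of an identity-by-state distance (1 minus the proportion of alleles shared IBS, over SNPs called in both lines), assuming 0/1/2 dosage coding with None for missing calls. PLINK's own implementation may differ in details such as missing-data handling, the function names are hypothetical, and building the neighbor-joining tree in MEGA6 is not shown.

```python
def ibs_distance(a, b):
    """1 - proportion of alleles identical by state over SNPs called in both samples."""
    shared, total = 0, 0
    for x, y in zip(a, b):
        if x is None or y is None:
            continue
        shared += 2 - abs(x - y)  # 2, 1, or 0 alleles shared at this SNP
        total += 2
    if total == 0:
        raise ValueError("no SNPs called in both samples")
    return 1 - shared / total


def distance_matrix(samples):
    """Symmetric IBS distance matrix for a dict of name -> dosage list."""
    names = list(samples)
    matrix = [
        [0.0 if i == j else ibs_distance(samples[a], samples[b])
         for j, b in enumerate(names)]
        for i, a in enumerate(names)
    ]
    return names, matrix
```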
Haplotype phasing and visualization
Haplotype phasing was done independently for each chromosome with SHAPEIT Version 2.12 [69, 70] using a 2-Mb window size, 20 burn-in iterations, 20 pruning-stage iterations, and 30 main iterations. The genome was then divided into 50-kb windows to determine the haplotypes of linked SNPs in each window. If a window contained more than five SNPs, a random subset of five SNPs was selected for haplotype analysis, and the same randomly selected SNPs were used for all individuals. As a result, the number of SNPs used for haplotype analysis in each window ranged from one to five. For subsequent analyses, each haplotype window was defined as a locus, and each unique haplotype within the window was defined as an allele. In total, 17,109 window-based haplotype loci were visualized.
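The window-based haplotype definition can be sketched as follows, assuming phased haplotypes for one chromosome are stored as allele strings aligned with a sorted list of SNP positions. The function name and input layout are hypothetical, and the phasing itself (SHAPEIT) is not reproduced.

```python
import random
from collections import defaultdict


def window_haplotypes(positions, haplotypes, window=50_000, max_snps=5, seed=0):
    """Define window-based haplotype loci from phased data on one chromosome.

    positions: sorted SNP positions (bp).
    haplotypes: dict mapping haplotype ID -> allele sequence aligned with positions.
    Returns {window_index: {haplotype_id: allele_string}}.
    """
    rng = random.Random(seed)
    by_window = defaultdict(list)
    for idx, pos in enumerate(positions):
        by_window[pos // window].append(idx)
    loci = {}
    for w, idxs in sorted(by_window.items()):
        # One random subset per window, reused for every haplotype.
        chosen = sorted(rng.sample(idxs, max_snps)) if len(idxs) > max_snps else idxs
        loci[w] = {
            hid: "".join(alleles[i] for i in chosen)
            for hid, alleles in haplotypes.items()
        }
    return loci
```

Each window is one locus, and the number of alleles at that locus is the number of distinct strings in `loci[w].values()`.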
Genome scanning for regions that have undergone selection
To maximize statistical power, XP-CLR (hosted on GitHub) was used together with the fixation index FST, calculated with VCFtools, to detect loci that may have undergone selection during maize domestication and adaptation. For XP-CLR, we used a 100-kb sliding window with a 10-kb step. To ensure comparability of the composite likelihood score across windows, we fixed the number of SNPs assayed in each window to five with the settings ‘--maxsnps 5 --minsnps 5’. To keep the genomic windows consistent with the XP-CLR analysis, weighted FST values were estimated with the settings ‘--fst-window-size 100000 --fst-window-step 10000’, retaining only windows containing at least five SNPs. Pairwise differentiation between populations (FST) was calculated using the “hierfstat” package of R.
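VCFtools reports Weir and Cockerham's weighted FST; as a simpler stand-in, the sketch below computes Hudson's FST as a ratio of averages (Bhatia et al. 2013) in 100-kb windows with a 10-kb step, skipping windows with fewer than five SNPs. It illustrates the windowing scheme only and will not reproduce VCFtools values exactly; the input layout and function names are hypothetical.

```python
def hudson_components(p1, n1, p2, n2):
    """Per-SNP numerator and denominator of Hudson's FST (Bhatia et al. 2013).

    p1, p2: allele frequencies in the two populations;
    n1, n2: numbers of sampled chromosomes (must be > 1).
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def windowed_fst(snps, size=100_000, step=10_000, min_snps=5):
    """Ratio-of-averages FST in sliding windows along one chromosome.

    snps: (position, p1, n1, p2, n2) tuples sorted by position.
    Yields (start, end, n_snps, fst) for windows with at least min_snps SNPs;
    windows start at 0 (an arbitrary choice for this sketch).
    """
    if not snps:
        return
    last_pos = snps[-1][0]
    start = 0
    while start <= last_pos:
        end = start + size
        inside = [s for s in snps if start <= s[0] < end]
        if len(inside) >= min_snps:
            parts = [hudson_components(*s[1:]) for s in inside]
            den = sum(d for _, d in parts)
            if den > 0:
                yield start, end, len(inside), sum(n for n, _ in parts) / den
        start += step
```

Averaging numerators and denominators separately, rather than averaging per-SNP FST values, keeps windows with many low-frequency SNPs from being dominated by noisy single-SNP estimates.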
Evidence for selection across the genome was evaluated in two separate comparisons: teosinte versus maize for domestication, and tropical versus temperate maize lines for adaptation. For each statistic, adjacent windows with values in the top 10% were merged into a single region, and the top 0.5% outliers were taken as putative selection signals. Finally, adjacent sweeps separated by less than 100 kb were merged into a single selected locus.
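The outlier-calling rules above leave some details open, so the following Python sketch makes them explicit as assumptions: windows are (chrom, start, end, score) tuples sorted by position, "adjacent" means consecutive windows on the same chromosome, a merged region's score is its maximum window score, and the top 0.5% cutoff is taken from the genome-wide distribution of window scores. `call_sweeps` is a hypothetical helper, not code from the study.

```python
import numpy as np


def call_sweeps(windows, tail_pct=90.0, outlier_pct=99.5, max_gap=100_000):
    """windows: (chrom, start, end, score) tuples sorted by chromosome and start."""
    scores = np.array([w[3] for w in windows], dtype=float)
    tail_cut = np.percentile(scores, tail_pct)        # top 10% of windows
    outlier_cut = np.percentile(scores, outlier_pct)  # top 0.5% of windows

    # 1) merge runs of consecutive top-10% windows into regions (score = max)
    regions, last_i = [], None
    for i, (chrom, start, end, score) in enumerate(windows):
        if score < tail_cut:
            continue
        if regions and last_i == i - 1 and regions[-1][0] == chrom:
            c, s, e, best = regions[-1]
            regions[-1] = (c, s, max(e, end), max(best, score))
        else:
            regions.append((chrom, start, end, score))
        last_i = i

    # 2) keep regions whose peak reaches the top-0.5% cutoff
    sweeps = [reg for reg in regions if reg[3] >= outlier_cut]

    # 3) merge sweeps on the same chromosome separated by < max_gap bp
    merged = []
    for chrom, start, end, score in sweeps:
        if merged and merged[-1][0] == chrom and start - merged[-1][2] < max_gap:
            c, s, e, best = merged[-1]
            merged[-1] = (c, s, max(e, end), max(best, score))
        else:
            merged.append((chrom, start, end, score))
    return merged
```

With 100-kb windows and a 10-kb step, two peaks whose top-10% runs are interrupted by a few lower-scoring windows are first called separately and then joined in step 3 if the gap is under 100 kb.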
Genome-wide association mapping for flowering-time traits
The 508 diverse inbred lines of an association mapping panel were planted in seven environments: six long-day (> 13 h) and one short-day (< 13 h) growing-season environment. Flowering time was recorded as days to anthesis (DTA) and days to silking (DTS), and these values were converted into growing degree days (GDDs). The anthesis and silking photoperiod responses (APR and SPR) were calculated as the difference between GDDs under long-day and short-day conditions for pollen shed and silking, respectively. Best linear unbiased prediction (BLUP) values for each trait were used for the marker-trait association analysis. Using ~1.25 million previously reported SNPs with a minor allele frequency (MAF) ≥ 0.05, marker-trait associations were tested with a mixed linear model implemented in TASSEL 5.2, which accounts for population structure and relative kinship. Because the GWAS SNPs are in linkage disequilibrium (LD) to varying degrees, we first pruned the 1.25 million SNPs for LD with PLINK (window size 50, step size 50, r² ≥ 0.2), obtaining 165,202 independent SNPs. The Bonferroni-corrected threshold of 6.05 × 10⁻⁶ (P < 1/165,202) was therefore used as the genome-wide significance cutoff. Marker-trait associations were also analyzed with this dataset for flowering time in each environment.
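The phenotype conversion and significance cutoff can be illustrated with a short Python sketch. The text does not state the GDD formula, so this assumes the common maize convention of a 10 °C base and 30 °C ceiling applied to daily maximum and minimum temperatures; all function names are hypothetical. The threshold function simply reproduces P < 1/M, where M is the number of LD-pruned independent SNPs.

```python
def daily_gdd(tmax_c, tmin_c, base=10.0, ceiling=30.0):
    """GDD (°C) for one day, ASSUMING a 10 °C base and 30 °C ceiling."""
    tmax = min(max(tmax_c, base), ceiling)  # clamp daily maximum to [base, ceiling]
    tmin = min(max(tmin_c, base), ceiling)  # clamp daily minimum to [base, ceiling]
    return (tmax + tmin) / 2.0 - base


def gdd_to_flowering(tmax_series, tmin_series, days_to_flower):
    """Accumulate GDD from planting up to the flowering day (DTA or DTS)."""
    pairs = list(zip(tmax_series, tmin_series))[:days_to_flower]
    return sum(daily_gdd(tx, tn) for tx, tn in pairs)


def photoperiod_response(gdd_long_day, gdd_short_day):
    """APR or SPR: GDD to flowering under long days minus under short days."""
    return gdd_long_day - gdd_short_day


def bonferroni_threshold(n_independent_snps, alpha=1.0):
    """Genome-wide cutoff alpha / M; alpha = 1 reproduces P < 1/165,202."""
    return alpha / n_independent_snps
```

With 165,202 independent SNPs, `bonferroni_threshold(165202)` gives about 6.05e-6, matching the cutoff above.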
SNP data for this study have been deposited in the European Variation Archive (EVA) and can be retrieved under project accession PRJEB41335 (http://wwwdev.ebi.ac.uk/eva/?eva-study=PRJEB41335).
Abbreviations
QTL: Quantitative trait locus
PCA: Principal component analysis
XP-CLR: Cross-population composite likelihood ratio
GWAS: Genome-wide association studies
DTA: Days to anthesis
DTS: Days to silking
GDD: Growing degree days
APR: Anthesis photoperiod response
SPR: Silking photoperiod response
BLUP: Best linear unbiased prediction
MAF: Minor allele frequency
PIC: Polymorphic information content
Ranum P, Pena-Rosas JP, Garcia-Casal MN. Global maize production, utilization, and consumption. Ann N Y Acad Sci. 2014;1312(1):105–12.
Fukunaga K, Hill J, Vigouroux Y, Matsuoka Y, Sanchez J, Liu KJ, et al. Genetic diversity and population structure of teosinte. Genetics. 2005;169(4):2241–54.
Hufford MB, Bilinski P, Pyhajarvi T, Ross-Ibarra J. Teosinte as a model system for population and ecological genomics. Trends Genet. 2012;28(12):606–15.
Manchanda N, Snodgrass SJ, Ross-Ibarra J, Hufford MB. Evolution and adaptation in the maize genome. In: The Maize Genome: Springer; 2018. p. 319–32.
Matsuoka Y, Vigouroux Y, Goodman MM, Sanchez J, Buckler E, Doebley J. A single domestication for maize shown by multilocus microsatellite genotyping. Proc Natl Acad Sci U S A. 2002;99(9):6080–4.
Piperno DR, Ranere AJ, Holst I, Iriarte J, Dickau R. Starch grain and phytolith evidence for early ninth millennium BP maize from the central Balsas River valley, Mexico. Proc Natl Acad Sci U S A. 2009;106(13):5019–24.
van Heerwaarden J, Doebley J, Briggs WH, Glaubitz JC, Goodman MM, Gonzalez JDS, et al. Genetic signals of origin, spread, and introgression in a large sample of maize landraces. Proc Natl Acad Sci U S A. 2011;108(3):1088–92.
Doebley J. The genetics of maize evolution. Annu Rev Genet. 2004;38:37–59.
Doebley JF, Gaut BS, Smith BD. The molecular genetics of crop domestication. Cell. 2006;127(7):1309–21.
Flint-Garcia SA. Kernel evolution: from teosinte to maize. In: Maize Kernel Development; 2017. p. 1–15.
Liu Z, Cook J, Melia-Hancock S, Guill K, Bottoms C, Garcia A, et al. Expanding maize genetic resources with predomestication alleles: Maize-teosinte introgression populations. Plant Genome. 2016;9(1):plantgenome2015-07.
Chen Q, Yang CJ, York AM, Xue W, Daskalska LL, DeValk CA, et al. TeoNAM: a nested association mapping population for domestication and agronomic trait analysis in maize. Genetics. 2019;213(3):1065–78.
Fu Y, Xu G, Chen H, Wang X, Chen Q, Huang C, et al. QTL mapping for leaf morphology traits in a large maize-teosinte population. Mol Breeding. 2019;39:103.
Liu L, Huang J, He L, Liu N, Du Y, Hou R, et al. Dissecting the genetic architecture of important traits that enhance wild germplasm resource usage in modern maize breeding. Mol Breeding. 2019;39:157.
Fang H, Fu X, Wang Y, Xu J, Feng H, Li W, et al. Genetic basis of kernel nutritional traits during maize domestication and improvement. Plant J. 2020;101(2):278–92.
Doebley J, Stec A, Gustus C. Teosinte branched1 and the origin of maize: evidence for epistasis and the evolution of dominance. Genetics. 1995;141(1):333–46.
Doebley J, Stec A, Hubbard L. The evolution of apical dominance in maize. Nature. 1997;386(6624):485–8.
Studer A, Zhao Q, Ross-Ibarra J, Doebley J. Identification of a functional transposon insertion in the maize domestication gene tb1. Nat Genet. 2011;43(11):1160–3.
Doebley J, Stec A. Inheritance of the morphological differences between maize and teosinte: comparison of results for two F2 populations. Genetics. 1993;134(2):559–70.
Wang H, Nussbaum-Wagler T, Li B, Zhao Q, Vigouroux Y, Faller M, et al. The origin of the naked grains of maize. Nature. 2005;436(7051):714–9.
Wills DM, Whipple CJ, Takuno S, Kursel LE, Shannon LM, Ross-Ibarra J, et al. From many, one: genetic control of prolificacy during maize domestication. PLoS Genet. 2013;9:e1003604.
Wright SI, Bi IV, Schroeder SG, Yamasaki M, Doebley JF, McMullen MD, et al. The effects of artificial selection on the maize genome. Science. 2005;308(5726):1310–4.
Liu H, Wang X, Warburton ML, Wen W, Jin M, Deng M, et al. Genomic, transcriptomic, and phenomic variation reveals the complex adaptation of modern maize breeding. Mol Plant. 2015;8(6):871–84.
Pyhäjärvi T, Hufford MB, Mezmouk S, Ross-Ibarra J. Complex patterns of local adaptation in teosinte. Genome Biol Evol. 2013;5(9):1594–609.
Takuno S, Ralph P, Swarts K, Elshire RJ, Glaubitz JC, Buckler ES, et al. Independent molecular basis of convergent highland adaptation in maize. Genetics. 2015;200:1297–312.
Salvi S, Sponza G, Morgante M, Tomes D, Niu X, Fengler KA, et al. Conserved noncoding genomic sequences associated with a flowering-time quantitative trait locus in maize. Proc Natl Acad Sci U S A. 2007;104(27):11376–81.
Yang Q, Li Z, Li W, Ku L, Wang C, Ye J, et al. CACTA-like transposable element in ZmCCT attenuated photoperiod sensitivity and accelerated the postdomestication spread of maize. Proc Natl Acad Sci U S A. 2013;110(42):16969–74.
Huang C, Sun H, Xu D, Chen Q, Liang Y, Wang X, et al. ZmCCT9 enhances maize adaptation to higher latitudes. Proc Natl Acad Sci U S A. 2018;115(2):E334–41.
Guo L, Wang X, Zhao M, Huang C, Li C, Li D, et al. Stepwise cis-regulatory changes in ZCN8 contribute to maize flowering-time adaptation. Curr Biol. 2018;28(18):3005–15.
Vigouroux Y, Glaubitz JC, Matsuoka Y, Goodman MM, Jesus SG, Doebley J. Population structure and genetic diversity of New World maize races assessed by DNA microsatellites. Am J Bot. 2008;95(10):1240–53.
Doebley J. Molecular evidence and the evolution of maize. Econ Bot. 1990;44(3):6–27.
Ranere AJ, Piperno DR, Holst I, Dickau R, Iriarte J. The cultural and chronological context of early Holocene maize and squash domestication in the central Balsas River valley, Mexico. Proc Natl Acad Sci U S A. 2009;106(13):5014–8.
Hake S, Ross-Ibarra J. Genetic, evolutionary and plant breeding insights from the domestication of maize. eLife. 2015;4:e05861.
Chen H, Patterson N, Reich D. Population differentiation as a test for selective sweeps. Genome Res. 2010;20(3):393–402.
Clark RM, Wagler TN, Quijada P, Doebley J. A distant upstream enhancer at the maize domestication gene tb1 has pleiotropic effects on plant and inflorescent architecture. Nat Genet. 2006;38(5):594.
Lang Z, Wills DM, Lemmon ZH, Shannon LM, Bukowski R, Wu Y, et al. Defining the role of prolamin-box binding factor1 gene during maize domestication. J Hered. 2014;105(4):576–82.
Zhao Q, Weber AL, McMullen MD, Guill K, Doebley J. MADS-box genes of maize: frequent targets of selection during domestication. Genet Res. 2011;93(1):65–75.
Wills DM, Fang Z, York AM, Holland JB, Doebley JF. Defining the role of the MADS-box gene, Zea Agamous-like1, a target of selection during maize domestication. J Hered. 2018;109(3):333–8.
Lu F, Romay MC, Glaubitz JC, Bradbury PJ, Elshire RJ, Wang T, et al. High-resolution genetic mapping of maize pan-genome sequence anchors. Nat Commun. 2015;6(1):1–8.
Li YX, Li C, Bradbury PJ, Liu X, Lu F, Romay CM, et al. Identification of genetic variants associated with maize flowering time using an extremely large multi-genetic background population. Plant J. 2016;86(5):391–402.
Liang Y, Liu Q, Wang X, Huang C, Xu G, Hey S, et al. ZmMADS69 functions as a flowering activator through the ZmRap2.7-ZCN8 regulatory module and contributes to maize flowering time adaptation. New Phytol. 2019;221(4):2335–47.
Sheehan MJ, Farmer PR, Brutnell TP. Structure and expression of maize phytochrome family homeologs. Genetics. 2004;167(3):1395–405.
Sheehan MJ, Kennedy LM, Costich DE, Brutnell TP. Subfunctionalization of PhyB1 and PhyB2 in the control of seedling and mature plant traits in maize. Plant J. 2007;49(2):338–53.
Studer AJ, Wang H, Doebley JF. Selection during maize domestication targeted a gene network controlling plant and inflorescence architecture. Genetics. 2017;207(2):755–65.
Liu H, Luo X, Niu L, Xiao Y, Chen L, Liu J, et al. Distant eQTLs and non-coding sequences play critical roles in regulating gene expression and quantitative trait variation in maize. Mol Plant. 2017;10(3):414–26.
Flint-Garcia SA, Bodnar AL, Scott MP. Wide variability in kernel composition, seed characteristics, and zein profiles among diverse maize inbreds, landraces, and teosinte. Theor Appl Genet. 2009;119(6):1129–42.
Hufford MB, Xu X, van Heerwaarden J, Pyhajarvi T, Chia JM, Cartwright RA, et al. Comparative population genomics of maize domestication and improvement. Nat Genet. 2012;44(7):808–11.
Mano Y, Omori F. Breeding for flooding tolerant maize using "teosinte" as a germplasm resource. Plant Root. 2007;1:17–21.
Wang L, Yang A, He C, Qu M, Zhang J. Creation of new maize germplasm using alien introgression from Zea mays ssp. mexicana. Euphytica. 2008;164(3):789–801.
Tian J, Wang C, Xia J, Wu L, Xu G, Wu W, et al. Teosinte ligule allele narrows plant architecture and enhances high-density maize yields. Science. 2019;365(6454):658–64.
Zuo W, Chao Q, Zhang N, Ye J, Tan G, Li B, et al. A maize wall-associated kinase confers quantitative resistance to head smut. Nat Genet. 2015;47(2):151–7.
Mao H, Wang H, Liu S, Li Z, Yang X, Yan J, et al. A transposable element in a NAC gene is associated with drought tolerance in maize seedlings. Nat Commun. 2015;6(1):1–13.
Aguirre-Liguori JA, Tenaillon MI, Vazquez-Lobo A, Gaut BS, Jaramillo-Correa JP, Montes-Hernandez S, et al. Connecting genomic patterns of local adaptation and niche suitability in teosintes. Mol Ecol. 2017;26(16):4226–40.
Ramos-Madrigal J, Smith BD, Moreno-Mayar JV, Gopalakrishnan S, Ross-Ibarra J, Gilbert MTP, et al. Genome sequence of a 5,310-year-old maize cob provides insights into the early stages of maize domestication. Curr Biol. 2016;26(23):3195–201.
Wilkes HG. Teosinte: the closest relative of maize. 1967.
Iltis HH, Doebley JF, Guzmán R, Pazy B. Zea diploperennis (Gramineae): a new teosinte from Mexico. Science. 1979;203(4376):186–8.
Sanchez JJ, De la Cruz L, Vidal VA, Ron J, Taba S, Santacruz-Ruvalcaba F, et al. Three new teosintes (Zea Spp., Poaceae) from México. Am J Bot. 2011;98(9):1537–48.
Pena GT, Larios LD, Gonzales JDS, Corral JAR, Nava JJC, Santacruz-Ruvalcaba F, et al. Relationships among teosinte populations (Zea spp.) from Mexico, Guatemala and Nicaragua. Acta Bot Mex. 2015;111:17–45.
Rivera-Rodriguez DM, Gonzalez JDS, Larios LD, Santacruz-Ruvalcaba F, Corral JAR. Morphological and climatic variability of teosinte (Zea spp.) and relationships among taxa. Syst Bot. 2019;44(1):41–51.
Pickrell JK, Coop G, Novembre J, Kudaravalli S, Li JZ, Absher D, et al. Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 2009;19(5):826–37.
Ganal MW, Durstewitz G, Polley A, Berard A, Buckler ES, Charcosset A, et al. A large maize (Zea mays L.) SNP genotyping array: development and germplasm genotyping, and genetic mapping to compare with the B73 reference genome. PLoS One. 2011;6:e28334.
Yan J, Yang X, Shah T, Sanchez-Villeda H, Li J, Warburton M, et al. High-throughput SNP genotyping with the Golden Gate assay in maize. Mol Breeding. 2010;25(3):441–51.
Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25(14):1754–60.
Liu K, Muse SV. PowerMarker: an integrated analysis environment for genetic marker analysis. Bioinformatics. 2005;21(9):2128–9.
Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19(9):1655–64.
Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011;88:76–82.
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–75.
Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol. 2013;30(12):2725–9.
Delaneau O, Marchini J, Zagury JF. A linear complexity phasing method for thousands of genomes. Nat Methods. 2012;9(2):179–81.
Delaneau O, Zagury JF, Marchini J. Improved whole-chromosome phasing for disease and population genetic studies. Nat Methods. 2013;10(1):5–6.
Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27(15):2156–8.
Goudet J. HIERFSTAT, a package for R to compute and test hierarchical F-statistics. Mol Ecol Notes. 2005;5(1):184–6.
Yang X, Gao S, Xu S, Zhang Z, Prasanna BM, Li L, et al. Characterization of a global germplasm collection and its potential utilization for analysis of complex quantitative traits in maize. Mol Breeding. 2011;28(4):511–26.
Yu J, Pressoir G, Briggs WH, Bi IV, Yamasaki M, Doebley JF, et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet. 2006;38(2):203–8.
Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. 2007;23(19):2633–5.
Li Q, Yang X, Xu S, Cai Y, Zhang D, Han Y, et al. Genome-wide association studies identified three independent polymorphisms associated with α-tocopherol content in maize kernels. PLoS One. 2012;7:e36807.
We greatly appreciate Dr. Jianbing Yan at Huazhong Agricultural University and Drs. S. Taba, B.M. Prasanna, S.D. Nicolas and H.H. Kuang of the International Maize and Wheat Improvement Center (CIMMYT) for their critical comments on an early version of this manuscript.
This work was supported by the National Natural Science Foundation of China (91935302, 31722039, 32022064) and Beijing Outstanding Young Scientist Program (BJJWZYJH01201910019026).
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Additional file 1: Figure S1. Geographical distribution of all teosinte accessions. Figure S2. Genetic relationships of maize and teosinte assessed by PCA. Figure S3. Evaluation of the ascertainment bias caused by Syngenta SNPs. Figure S4. Haplotype richness in maize and teosinte groups estimated via window-based methods. Figure S5. Co-localization of putative selective sweeps with public GWAS hits for flowering time. Figure S6. Genetic relationships of maize and teosinte assessed by PCA using 36,839 common SNPs between this study and Hufford et al.’s study. Table S1. Population divergence among maize and teosinte subgroups estimated by pairwise FST values between different groups. Table S2. List of known domestication, improvement and adaptation genes in maize. Table S3. Comparisons of selective sweeps identified in this study and previous studies, and the factors affecting the identification of selective sweeps.
Additional file 2: Data S1. Genetic relationship of maize and teosinte inferred by ADMIXTURE.
Additional file 3: Data S2. Summary of selection sweeps with domestication and adaptation features.
Additional file 4: Data S3. Summary of SNPs significantly associated with flowering-time traits detected by GWAS.
Additional file 5: Data S4. List of plant materials used in this study.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Cite this article
Xu, G., Zhang, X., Chen, W. et al. Population genomics of Zea species identifies selection signatures during maize domestication and adaptation. BMC Plant Biol 22, 72 (2022). https://doi.org/10.1186/s12870-022-03427-w
- Evolutionary relationship
- Genome-wide association study
- Flowering time