
Association mapping in multiple yam species (Dioscorea spp.) of quantitative trait loci for yield-related traits



Yam (Dioscorea spp.) comprises multiple species with various ploidy levels and is considered a cash crop in many producing areas. Phenotypic selection for yield and related traits, such as plant vigor and resistance to yam mosaic virus and anthracnose, is lengthy in multiple yam species; however, marker information has been shown to enhance selection efficiency.


In this study, a panel of 182 yam accessions distributed across six yam species was assessed for genetic diversity and marker-trait associations using SNP markers generated on the Diversity Arrays Technology (DArT) platform. For the trait association analysis, the kinship (relationship) matrix and population structure were used as covariates to reduce false discoveries in the multi-locus random-SNP-effect mixed linear model (mrMLM), followed by gene annotation.


Accessions differed significantly (p < 0.001) for all traits, and broad-sense heritability (H2) was high. Phenotypic and genotypic correlations were positive between yield and vigor but negative between yield and yam mosaic disease severity. Population structure analysis revealed K = 6 as the optimal number of clusters, largely corresponding to species. A total of 22 SNP markers were associated with yield, vigor, and resistance to mosaic and anthracnose diseases. Gene annotation of the significant SNP loci identified putative genes involved in primary metabolism, pest resistance, and resistance to anthracnose disease; in maintaining NADPH for biosynthetic reactions, especially those involving nitro-oxidative stress, for resistance to mosaic virus; and in seed development, photosynthesis, nutrient use efficiency, stress tolerance, and vegetative and reproductive development for tuber yield.


This study provides valuable insights into the genetic control of plant vigor, anthracnose and mosaic virus resistance, and tuber yield in yam, and thus opens an avenue for developing additional genomic resources for marker-assisted selection across multiple yam species.



Yam is, along with cassava and potato, among the principal root and tuber crops widely grown and consumed as subsistence staples in sub-Saharan Africa, which accounts for over 90% of global production [1,2,3]. It is a multi-species group of monocots with a basic chromosome number of x = 20, cultivated for its starchy underground tubers and aerial bulbils in the yam belt of West and Central Africa [2]. In the Democratic Republic of Congo (DRC), yams play an important role in food security, primarily for the rural populace [4]. The DRC is home to several yam species, including D. alata (water/greater yam), D. cayenensis (yellow Guinea yam), D. dumetorum (bitter yam), D. rotundata (white Guinea yam), D. bulbifera (aerial yam), D. burkilliana (wild yam), and D. praehensilis (bush yam) [4,5,6]. However, D. alata, D. cayenensis, and D. rotundata are more widely cultivated by farmers than the other species [4, 7].

Despite the contribution of yam to rural livelihoods in the DRC, production faces several seasonal constraints, including poor agronomic performance (yield and related traits) and major diseases, namely yam mosaic disease (YMD) and yam anthracnose disease (YAD). These constraints have consistently affected the performance of many cultivated landraces, contributing to a loss of interest in yam production in many producing areas [4]. YMD is caused by yam mosaic virus (YMV), while YAD is caused by Colletotrichum gloeosporioides (Penz.). Yield losses attributable to the combined effect of these diseases have been reported to exceed 50% [8, 9]. Developing and delivering improved varieties with greater vigor and yield potential and better tolerance to YMD and YAD could increase the productivity of resource-poor farmers, who typically use few external farm inputs.

Breeding improved varieties requires a thorough understanding of the genetic basis of the target traits. However, the lack of information on genetic diversity and on the genetic architecture of key economic traits has been a major hindrance to improved cultivar development in the DRC. Studies from other yam-producing regions have reported quantitative inheritance of key economic traits in yam [10, 11]. Genome-wide association studies (GWAS) are well suited to dissecting the genetic control of complex traits because they exploit historical recombination events accumulated over many generations. GWAS has been used successfully to dissect economically important yam traits such as tuber yield and mosaic virus resistance in D. rotundata [10], sex determination and cross-compatibility in D. alata [11], and tuber dry matter content and oxidative browning in D. alata [12]. These studies have shown the value of GWAS for identifying genomic regions and candidate genes associated with key economic traits in yam; however, they have been species-specific, so their findings may not transfer well from one species to another. This is due to the pre- and post-zygotic barriers previously reported in crosses between different yam species [11]. Identifying genomic regions associated with economically important traits across multiple yam species could therefore have a greater breeding impact. This approach has not yet been explored in yam, although it has been applied in other crops with multiple species. Realizing it would facilitate the development of molecular markers that can be relied upon for early-generation trait selection in multiple Dioscorea species.

To further improve this breeding approach, identifying genomic regions associated with yield and related traits for marker development across multiple yam species would offer an advantage over the current reliance on species-specific markers. The objective of this study was to dissect the genetic control of tuber yield and related traits (vigor, YMD, and YAD) in a panel of yam accessions representing six species.


Phenotypic variation, correlation, and heritability among the 182 yam accessions

A significant year × accession interaction was observed for yield (p < 0.05) and for the other traits (p < 0.001). The accession effect was significant (p < 0.001) for all traits, while the year effect was significant for YAD (p < 0.01) and YMD severity (p < 0.001) (Table 1). GCV estimates ranged from moderate (10–20%) for YMD, YAD, and vigor to high for yield (43.71%). PCV estimates ranged from moderate for YMD and YAD (16.22% and 17.03%, respectively) to high for yield and vigor (52.32% and 20.75%, respectively). H2 estimates were high for all traits (Table 1). Both phenotypic and genotypic correlations revealed a positive relationship between yield and plant vigor. YMD was negatively correlated with plant vigor, whereas YAD was positively correlated with plant vigor (Fig. 1).

Table 1 Estimates of variance, coefficients of variation and heritability in a panel of 182 yam accessions
Fig. 1

Genotypic (left) and phenotypic (right) relationships among traits. Vigor, plant vigor; YAD, yam anthracnose severity; YMD, yam mosaic severity; Yield_plant, tuber yield per plant

Summary statistics and genetic diversity assessment

A total of 20,275 SNPs were generated by the DArTseq protocol, of which 11,722 were retained after filtering for MAF, maximum missingness, genotype quality, and read depth. MAF ranged from 0.052 to 0.50 (mean 0.231), gene diversity from 0.09 to 0.50 (mean 0.324), and observed heterozygosity from 0 to 0.576 (mean 0.254). Polymorphism information content (PIC) ranged from 0.09 to 0.375 (mean 0.264).
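These per-SNP statistics follow standard biallelic formulas. As a minimal sketch, assuming genotypes coded 0/1/2 with missing calls as NaN (a diploid-style coding chosen for illustration; the study computed these statistics with PLINK 2), gene diversity is taken as expected heterozygosity 1 − p² − q² and PIC as 1 − p² − q² − 2p²q², whose maximum of 0.375 at p = 0.5 matches the upper bound reported above. The function name `snp_summary` is hypothetical.

```python
import numpy as np


def snp_summary(genotypes):
    """Per-SNP diversity statistics for biallelic markers.

    genotypes: accessions x SNPs array coded 0/1/2 (copies of the alternate
    allele), with np.nan for missing calls. This coding is an assumption.
    """
    g = np.asarray(genotypes, dtype=float)
    n_called = (~np.isnan(g)).sum(axis=0)          # accessions with a call, per SNP
    p = np.nansum(g, axis=0) / (2 * n_called)      # alternate-allele frequency
    q = 1 - p
    maf = np.minimum(p, q)                         # minor allele frequency
    gene_diversity = 1 - p**2 - q**2               # expected heterozygosity (= 2pq)
    obs_het = (g == 1).sum(axis=0) / n_called      # share of heterozygous calls
    pic = gene_diversity - 2 * p**2 * q**2         # PIC for two alleles (max 0.375)
    return {"maf": maf, "gene_diversity": gene_diversity,
            "obs_het": obs_het, "pic": pic}
```

Averaging each returned array (for example `snp_summary(G)["maf"].mean()`) gives panel-level means comparable to those reported above.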

Population structure analysis revealed K = 6 (Fig. 2; Sup. Figure 1) as the optimal number of clusters. Approximately 89% of the yam accessions were assigned to one of the clusters, while the remaining 11%, distributed across four species (Sup. Table 1), were considered admixed because their highest membership probability was below 0.5.

Fig. 2

Population structure of the yam accessions based on admixture analysis, with the number of populations set at K = 6. Colors represent the six groups, assigned using a membership coefficient of ≥ 50%: group 1 (red), group 2 (green), group 3 (gray), group 4 (blue), group 5 (purple), and group 6 (yellow)

Principal component analysis showed that the first two PCs accounted for 65.2% of the total variation. The species-based PCA plot separated the six yam species, except for a few cases of possible admixture between D. alata and D. bulbifera and between D. praehensilis and D. rotundata (Fig. 3). Pairwise differentiation between species showed that D. cayenensis and D. bulbifera were the most distantly related (0.768, p < 0.001), while D. alata and D. bulbifera were the most closely related (0.016, p < 0.038) (Table 2). The close relationship between D. alata and D. bulbifera was further confirmed by phylogenetic analysis, which grouped both species into the same clusters (Fig. 4).
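The share of variation captured by the leading PCs can be illustrated with a short NumPy sketch. It assumes a 0/1/2 genotype matrix, imputes missing calls with the column mean (an illustrative choice), and only centres the SNP columns; FactoMineR's `PCA()`, used in the study, standardises variables by default, so its percentages would differ. `pca_variance_explained` is a hypothetical helper, not part of any package.

```python
import numpy as np


def pca_variance_explained(genotypes, n_components=2):
    """Return PC scores and the proportion of total variance per component."""
    g = np.asarray(genotypes, dtype=float)
    # Mean imputation of missing calls (assumption made for illustration).
    col_means = np.nanmean(g, axis=0)
    g = np.where(np.isnan(g), col_means, g)
    centred = g - g.mean(axis=0)                   # centre each SNP column
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    var = s**2                                     # variance carried by each PC
    ratio = var / var.sum()
    scores = u[:, :n_components] * s[:n_components]
    return scores, ratio[:n_components]
```

Summing the returned ratios for two components gives the quantity analogous to the 65.2% reported above, subject to the scaling difference noted.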

Fig. 3

Species-based scatter plot of yam accessions using 11,722 SNP markers. Each color represents a species, and each dot represents an accession within that species

Table 2 Pairwise species differentiation among 182 accessions of yam landraces
Fig. 4

Phylogeny diagram of the panel of 182 yam accessions based on yam species as a co-factor

The phylogenetic tree revealed six genetic groups (Fig. 4). The first group had 40 members belonging to D. alata (60%), D. bulbifera (7.5%), and D. rotundata (32.5%), with genetic distances ranging from 0.006 to 0.345 (mean 0.269). The second group, with 12 members, was the smallest; its members belonged to D. alata (75%) and D. bulbifera (25%), with genetic distances ranging from 0.008 to 0.339 (mean 0.302). The third group had 28 members belonging to D. cayenensis (35.7%), D. dumetorum (42.9%), and D. praehensilis (21.4%), with genetic distances ranging from 0.005 to 0.345 (mean 0.251). The fourth group, with 44 members, was the largest; its members belonged to D. cayenensis (4.5%), D. dumetorum (4.5%), D. praehensilis (20.5%), and D. rotundata (70.45%), with genetic distances ranging from 0.008 to 0.344 (mean 0.261). The fifth group had 26 members, all belonging to D. rotundata, with genetic distances ranging from 0.006 to 0.332 (mean 0.264). The sixth group had 32 members, also all belonging to D. rotundata, with genetic distances ranging from 0.005 to 0.343 (mean 0.263) (Fig. 4).
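Within-group distance ranges like those above could be summarised as in the following sketch. The text does not state which genetic distance was used, so this example assumes a simple allele-sharing distance (mean absolute 0/1/2 difference divided by 2, bounded between 0 and 1) computed over loci called in both accessions; both function names are hypothetical.

```python
import itertools

import numpy as np


def allele_sharing_distance(a, b):
    """Mean per-locus allelic difference between two 0/1/2-coded accessions,
    ignoring loci missing in either one (0 = identical, 1 = opposite homozygotes)."""
    mask = ~(np.isnan(a) | np.isnan(b))
    return float(np.mean(np.abs(a[mask] - b[mask])) / 2)


def cluster_distance_summary(genotypes, labels):
    """(min, max, mean) pairwise distance within each cluster label."""
    g = np.asarray(genotypes, dtype=float)
    summary = {}
    for cluster in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cluster]
        d = [allele_sharing_distance(g[i], g[j])
             for i, j in itertools.combinations(idx, 2)]
        if d:  # a single-member cluster has no pairwise distances
            summary[cluster] = (min(d), max(d), sum(d) / len(d))
    return summary
```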

Genome-wide scan for traits

Plant vigor, yam anthracnose severity, yam mosaic severity and tuber yield

In 2021, GWAS revealed five SNP markers on chromosomes 2, 3, 7, 8, and 17 significantly associated with plant vigor, with LOD values ranging from 3.35 to 5.42 and MAF from 0.07 to 0.45; marker chr_7356 explained the largest proportion of phenotypic variance (63%). For anthracnose severity, one SNP marker was detected on chromosome 10 (LOD 4.01, MAF 0.17), explaining 19% of the phenotypic variance. For mosaic severity, one SNP marker was detected on chromosome 1 (LOD 3.74, MAF 0.16), explaining 18% of the phenotypic variance. For tuber yield, one SNP marker was detected on chromosome 19 (LOD 3.63, MAF 0.29), explaining less than 1% of the phenotypic variance (Table 3 and Fig. 5).

Table 3 SNP markers associated with plant vigor, yam anthracnose severity, yam mosaic severity, and tuber yield
Fig. 5

Genome-wide association analysis of plant vigor (vigor), yam anthracnose severity (YAD), yam mosaic severity (YMD), and tuber yield per plant (yield) for the 2021 evaluation. Manhattan and QQ plots showing SNPs associated with vigor (a, b), YAD (c, d), YMD (e, f), and yield (g, h). The y-axis shows the p-value of each marker-trait association on a −log10 scale

In 2022, five SNP markers on chromosomes 7, 8, 17, and 20 were associated with plant vigor, with LOD values ranging from 3.01 to 5.51 and MAF from 0.32 to 0.48; marker chr_17_10919 explained the largest proportion of phenotypic variance (13%). For anthracnose severity, eleven significant SNP markers were found on chromosomes 1, 4, 8, 13, 14, 15, and 20, with LOD values ranging from 3.64 to 8.57 and MAF from 0.05 to 0.37; marker chr_4_3941 explained the largest proportion of phenotypic variance (11%). For mosaic severity, six significant SNP markers were found on chromosomes 2 and 15, with LOD values ranging from 4.63 to 9.69 and MAF from 0.14 to 0.36; marker chr_2_11441 explained the largest proportion of phenotypic variance (18%). For tuber yield, three significant SNP markers were found on chromosomes 6, 8, and 20, with LOD values ranging from 3.48 to 5.15 and MAF from 0.08 to 0.34; marker chr_16_47690 explained the largest proportion of phenotypic variance (43%) (Table 3 and Fig. 6).

Fig. 6

Genome-wide association analysis of plant vigor (vigor), yam anthracnose severity (YAD), yam mosaic severity (YMD), and tuber yield per plant (yield) for the 2022 evaluation. Manhattan and QQ plots showing SNPs associated with vigor (a, b), YAD (c, d), YMD (e, f), and yield (g, h). The y-axis shows the p-value of each marker-trait association on a −log10 scale

The combined analysis revealed seven SNP markers associated with plant vigor on chromosomes 12, 14, 15, 17, 18, and 20, with LOD values ranging from 3.13 to 5.69 and MAF from 0.11 to 0.48; marker chr_12_31668 explained the largest proportion of phenotypic variance (23%). For anthracnose severity, five significant SNP markers were found on chromosomes 6, 10, and 16, with LOD values ranging from 3.14 to 3.75 and MAF from 0.11 to 0.28; marker chr_16_14899 explained the largest proportion of phenotypic variance (11%). For mosaic severity, five significant SNP markers were found on chromosomes 1, 15, and 20, with LOD values ranging from 4.35 to 6.73 and MAF from 0.16 to 0.21; marker chr_15_19801 explained the largest proportion of phenotypic variance (56%). For tuber yield, three significant SNP markers were found on chromosomes 8, 9, and 13, with LOD values ranging from 3.31 to 5.28 and MAF from 0.08 to 0.14; marker chr_9_3704 explained the largest proportion of phenotypic variance (42%) (Table 3 and Fig. 7).

Fig. 7

Genome-wide association analysis of plant vigor (vigor), yam anthracnose severity (YAD), yam mosaic severity (YMD), and tuber yield per plant (yield) for the combined 2021 and 2022 evaluation period. Manhattan and QQ plots showing SNPs associated with vigor (a, b), YAD (c, d), YMD (e, f), and yield (g, h). The y-axis shows the p-value of each marker-trait association on a −log10 scale

Identification of existing putative genes

Of the 13 GWAS hits for plant vigor, seven SNP markers were located near genes of interest on the yam reference genome. The hit on chromosome 2 lies in a region harboring Cytochrome P450 (Cyt_P450); on chromosome 6, the MCM OB domain (MCM_OB); on chromosome 8, the FBD domain (FBD); on chromosome 12, the Vps16 C-terminal domain (Vps16_C) and Fructose-1,6-bisphosphatase class I, N-terminal (FBPase_N); on chromosome 14, the Sugar phosphate transporter domain (Sugar_P_trans_dom) and the UAA transporter family (UAA); and on chromosome 18, Reverse transcriptase (RVT_2) (Table 4).

Table 4 Candidate genes within chromosomal regions associated with plant vigor, yam anthracnose severity, yam mosaic severity and tuber yield

For YAD, the region around the SNP marker identified on chromosome 10 harbors the NAD-dependent epimerase/dehydratase (Epimerase_deHydtase), while the region around the SNP marker on chromosome 16 harbors the NB-ARC domain (NB-ARC) and the Rx N-terminal domain (Rx N-terminal). For YMD, the region around the SNP marker identified on chromosome 15 harbors the Ribonuclease H domain (RNaseH_domain) (Table 4).

For tuber yield, the region around the associated SNP marker on chromosome 8 harbors the Gnk2-homologous domain (Ginkbilobin-2-GNK2); that on chromosome 9 harbors the CO/COL/TOC1 domain (CCT_CS) and the Jas motif; and that on chromosome 13 harbors the Fumarate reductase/succinate dehydrogenase flavoprotein-like, C-terminal domain (Fum_Rdtase/Succ_DH_flav-like_C) and Cytochrome P450 (Cyt_P450) (Table 4).


Phenotypic variability

The natural variability among the accessions for the traits studied was high and informative. The high broad-sense heritability estimates of 69% for plant vigor, 74% for anthracnose severity, 84% for mosaic virus severity, and 74% for tuber yield per plant indicate the potential for a strong response to selection. In general, traits with high heritability can be modified more easily through selection and breeding than traits with low heritability [25]. In addition, the observed genetic variation indicates that these materials are relevant for yam genetic studies in the DRC.

Population differentiation

Knowledge of the population structure of the yam panel used in this study is important for correcting spurious marker-trait associations in GWAS. Based on delta K, the population structure analysis identified six as the optimal number of subpopulations. Although the level of admixture in the germplasm was low, the two admixed accessions found in D. cayenensis and D. dumetorum and the nine found in D. praehensilis may reflect the inclusion in the germplasm of progeny from a few generations of hybridization between these species. For D. rotundata, the eight admixed accessions suggest that its genome has not yet reached fixation, as D. rotundata has been reported to be a hybrid of D. praehensilis and D. abyssinica [26, 27]. The high genetic variability indicates the potential of the studied accessions for genetic improvement of plant vigor, yam anthracnose disease resistance, yam mosaic disease resistance, and tuber yield. The phylogenetic analysis revealed the same number of clusters (six) as the population structure analysis, supporting their use in preventing spurious associations in GWAS [12, 28].

Marker-traits association and identification of putative genes

The genome-wide scan of phenotypic and allelic variation in plant vigor, yam anthracnose disease resistance, yam mosaic disease resistance, and tuber yield identified 22 genomic regions on 15 chromosomes with significant LOD scores (≥ 3). To correct for false-positive associations in the mixed model, we used both the Q matrix, representing population structure, and the K matrix, representing kinship. In total, 13 SNP markers were associated with plant vigor, three with anthracnose disease resistance, three with mosaic disease resistance, and three with tuber yield; these markers could be useful for implementing marker-informed selection for these traits. Previous studies have also reported GWAS hits on some of the chromosomes on which this study found significant marker-trait associations.
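The mrMLM family of methods reports LOD scores rather than p-values. A rough conversion, assuming LOD = LRT/(2 ln 10) and a chi-square null distribution with one degree of freedom (an assumption made for illustration, not a statement of how mrMLM computes its LOD), gives p = P(χ²₁ > 2 ln(10) · LOD), so the LOD ≥ 3 cut-off corresponds to p ≈ 2 × 10⁻⁴. The helper names below are hypothetical.

```python
import math
from statistics import NormalDist

LN10 = math.log(10)


def lod_to_p(lod):
    """p-value for a LOD score, assuming LOD = LRT / (2 ln 10) and a
    chi-square null distribution with one degree of freedom."""
    lrt = 2 * LN10 * lod                   # likelihood-ratio statistic
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(lrt / 2))


def p_to_lod(p):
    """Inverse of lod_to_p under the same assumptions."""
    z = -NormalDist().inv_cdf(p / 2)       # two-sided standard-normal quantile
    return z**2 / (2 * LN10)
```

Computing `-NormalDist().inv_cdf(p / 2)` rather than `inv_cdf(1 - p / 2)` avoids losing precision for very small p-values.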

No genomic regions associated with plant vigor had previously been reported in any yam species; this study found 13 SNPs distributed across 11 chromosomes. Of these 13 loci, four were found to harbor genes (Cyt_P450, Vps16_C, FBPase_N, and Sugar_P_trans_dom) that play essential roles in promoting vegetative growth in plants [13], root viability [14], and root vigor for water uptake [15].

For yam anthracnose disease resistance, Agre et al. [29] reported significant SNP associations on chromosomes 7, 15, and 18 in D. alata, around which a total of five genes were found. This study found two additional SNP loci, on chromosomes 10 and 16, harboring genes (Epimerase_deHydtase, NB-ARC, and Rx N-terminal) with essential roles in cell surface properties, virulence, and extracellular enzyme production [17], pathogen recognition and activation of innate immune responses [18], and the production of R-proteins that confer resistance to plant diseases [19].

For yam mosaic disease resistance, significant SNP markers have been reported in D. rotundata on chromosome 15 and four other chromosomes, near which several genes essential for plant defense and growth were found [10]. This study also found a significant SNP on chromosome 15, in a region harboring the RNaseH_domain, which plays an essential role in antiviral defense in plants [20].

For tuber yield, significant SNPs have been reported in D. rotundata on chromosome 8 and eight other chromosomes, near which two genes (AUX/IAA protein and glycine-rich protein) were identified [10]. This study also found a significant SNP on chromosome 8, as well as two other SNP loci on chromosomes 9 and 13, in regions harboring genes (GNK2, CCT_CS, Jas motif, Fum_Rdtase/Succ_DH_flav-like_C, and Cyt_P450) that are essential for plant defense, growth and development, and the production and partitioning of photoassimilates.


Useful genetic variability exists in the panel of 182 yam accessions from the DRC considered in this study. The genetic architecture of plant vigor, YAD, YMD, and tuber yield involves multiple SNPs distributed unevenly across the 20 chromosomes of the yam species studied. The SNP markers associated with these traits could potentially be used for targeted, accelerated improvement of vigor, mosaic virus and anthracnose resistance, and tuber yield per plant in the yam species considered. This information could help design new breeding strategies to capture superior alleles for improved vigor, mosaic virus and anthracnose disease resistance, and tuber yield per plant in future marker-based breeding in the DRC.

Materials and methods

Experimental site and planting materials

A panel of 182 yam accessions distributed across six yam species (Sup. Table S1), obtained from a previous germplasm collection exercise [4], was used for this study. The panel was evaluated for two years (2021 and 2022) at the University of Kisangani research field (latitude 0°33′05.9″N, longitude 25°05′17.3″E; altitude 396 m a.s.l., elevation 397 m a.s.l.). The site is characterized by dense humid forest vegetation and rainfall irregularly distributed throughout the year (3156 mm annually). The soils are mostly oxisols (ferralsols in the FAO classification) [30], and mean minimum and maximum temperatures are 21 °C and 35 °C, respectively.

Phenotypic data collection

The accessions were planted in a 12 × 16 lattice design with two replicates. Each experimental plot consisted of five plants on a five-meter ridge, with 1 m spacing within and between ridges. The 182 accessions were phenotyped over two planting seasons. Tuber yield, plant vigor, YMD, and YAD were assessed following Asfaw [31] and the yam crop ontology (accessed on 20 November 2022). The fresh tuber weight per plant of each accession was recorded as yield per plant. Assessment of plant vigor, YMD, and YAD is described in Table 5. The area under the disease progress curve (AUDPC), a quantitative summary of disease severity over time, was estimated for YMD and YAD using the trapezoidal method [32]. This method discretizes time and multiplies the average disease severity between each pair of adjacent assessments by the interval between them:

$$AUDPC=\sum_{i=1}^{N-1}\frac{Y_i+Y_{i+1}}{2}\,(t_{i+1}-t_i)$$

where N is the number of assessments, Y_i is the anthracnose or virus severity score at assessment i, t_i is the date (in days) of assessment i, and t_{i+1} − t_i is the interval in days between assessments i and i + 1.
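The trapezoidal AUDPC can be computed directly from the assessment dates and severity scores. This minimal sketch assumes dates are given in days from a common origin, in increasing order; `audpc` is a hypothetical helper name, and the values in the usage line are made up, not data from this study.

```python
def audpc(times, severities):
    """Area under the disease progress curve (trapezoidal rule).

    times: assessment dates in days, in increasing order.
    severities: severity score Y_i recorded at each date.
    """
    if len(times) != len(severities) or len(times) < 2:
        raise ValueError("need at least two paired dates and severity scores")
    area = 0.0
    for i in range(len(times) - 1):                 # N assessments give N - 1 intervals
        mean_severity = (severities[i] + severities[i + 1]) / 2
        area += mean_severity * (times[i + 1] - times[i])
    return area
```

For example, `audpc([0, 30, 60, 90], [1, 2, 3, 3])` returns 210.0 (45 + 75 + 90).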

Table 5 Assessment of plant vigor and of yam mosaic and anthracnose disease severity


Leaf samples were collected over 20 g of silica gel in covered plastic containers and kept in the dark at room temperature for one week to dry adequately. Dried yam leaves were sent to Bioscience-IITA, Ibadan, Nigeria, where genomic DNA was extracted using a slightly modified CTAB protocol. DNA quality was assessed using a NanoDrop spectrophotometer before the samples were sent to Diversity Arrays Technology (DArT) Pty Ltd, Canberra, Australia, for sequencing. High-throughput genotyping was conducted using the 96-plex DArTseq protocol, and SNPs were called using DArT's proprietary software DArTsoft, as described by Kilian et al. [33]. The generated reads were aligned to the D. rotundata reference genome v2 [26].

Phenotypic data analysis

Analysis of variance (ANOVA) was conducted with a mixed linear model using the lmerTest package in R [21], with genotype as a fixed effect and year, replication, and block as random effects, according to the model below.

$$Y_{ijkl}=\mu+G_i+Rep_j+Blk(Rep)_{k(j)}+Yr_l+(G\times Yr)_{il}+e_{ijkl}$$

where Y_ijkl is the phenotypic performance of accession i for the trait under consideration, µ is the grand mean, G_i is the effect of accession i, Rep_j is the effect of replication j, Blk(Rep)_k(j) is the effect of block k nested in replication j, Yr_l is the effect of year l, (G × Yr)_il is the effect of the interaction between accession i and year l, and e_ijkl is the residual effect.

The degree of relationship among the assessed traits was determined using Pearson's correlation coefficient and visualized with the ggpairs function of the GGally package [34]. Broad-sense heritability (H2), the phenotypic coefficient of variation (PCV), and the genotypic coefficient of variation (GCV) were calculated from the respective variance components. H2 was classified as low (< 30%), medium (30–60%), or high (> 60%) according to Johnson et al. [35]. Following Deshmukh et al. [36], PCV and GCV estimates greater than 20% were rated as high, those between 10 and 20% as medium, and those below 10% as low.

$$H^2=\frac{\delta_g^2}{\delta_g^2+\frac{\delta_{gl}^2}{l}+\frac{\delta_e^2}{rl}}\times 100=\frac{\delta_g^2}{\delta_g^2+\delta_E^2}\times 100=\frac{\delta_g^2}{\delta_P^2}\times 100$$
$$PCV=\frac{\sqrt{\delta_P^2}}{\mu}\times 100$$
$$GCV=\frac{\sqrt{\delta_g^2}}{\mu}\times 100$$

where δ²_P = δ²_g + δ²_E is the phenotypic variance, δ²_g is the genotypic variance, δ²_E = δ²_gl/l + δ²_e/(rl) is the error variance on an entry-mean basis, δ²_gl is the genotype × year interaction variance, δ²_e is the residual variance, r is the number of replications, l is the number of years, and µ is the grand mean of the trait.
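The heritability and coefficient-of-variation formulas, together with the classification thresholds of Johnson et al. and Deshmukh et al. described above, can be combined into one small helper. This is a sketch assuming the variance components have already been extracted from the fitted mixed model; the function names are hypothetical and the variance values in the usage note are invented for illustration.

```python
import math


def variance_summary(var_g, var_gy, var_e, mean, reps, years):
    """Entry-mean broad-sense heritability, PCV and GCV, all in percent.

    var_g: genotypic variance; var_gy: genotype x year variance;
    var_e: residual variance; mean: grand mean of the trait.
    """
    var_p = var_g + var_gy / years + var_e / (reps * years)   # phenotypic variance
    h2 = 100 * var_g / var_p
    pcv = 100 * math.sqrt(var_p) / mean
    gcv = 100 * math.sqrt(var_g) / mean
    return h2, pcv, gcv


def classify_h2(h2):
    """Johnson et al. classes: low < 30, medium 30-60, high > 60 (percent)."""
    if h2 > 60:
        return "high"
    return "medium" if h2 >= 30 else "low"


def classify_cv(cv):
    """Deshmukh et al. classes: low < 10, medium 10-20, high > 20 (percent)."""
    if cv > 20:
        return "high"
    return "medium" if cv >= 10 else "low"
```

With δ²_g = 4, δ²_gl = 2, δ²_e = 4, µ = 10, r = 2 and l = 2, `variance_summary(4.0, 2.0, 4.0, 10.0, 2, 2)` gives H2 ≈ 66.7% (high), PCV ≈ 24.5% (high) and GCV = 20.0% (medium).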

Genotypic data analysis

Sequence data generated on the DArTseq platform were processed with DArT's proprietary analytical pipelines. The HapMap file received from DArT was converted to variant call format (VCF). A total of 20,275 SNP markers were identified from the raw data and filtered with VCFtools (Danecek et al., 2011) for minor allele frequency (MAF ≥ 0.05), read depth (> 5), missing rate (80%), genotype quality (GQ ≥ 20), biallelic sites (minimum and maximum number of alleles = 2), and no indels. A total of 11,721 SNP markers were retained after filtering for downstream analysis. Summary statistics, including MAF, polymorphism information content (PIC), and observed and expected heterozygosity (Ho/He), were computed using PLINK 2 [37].
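The MAF and missingness filters can be mimicked on a genotype matrix. The study applied them with VCFtools on the VCF; this sketch assumes a 0/1/2 matrix with NaN for missing calls, interprets the 80% missing-rate setting as a minimum call rate of 0.8 (an assumption), and does not reproduce the read-depth, GQ, biallelic, or indel filters, which need per-call VCF fields. `filter_snps` is a hypothetical name.

```python
import numpy as np


def filter_snps(genotypes, min_maf=0.05, min_call_rate=0.8):
    """Keep SNPs passing MAF and call-rate thresholds.

    genotypes: accessions x SNPs array coded 0/1/2 with np.nan for missing.
    Returns the filtered matrix and the boolean mask of retained SNPs.
    """
    g = np.asarray(genotypes, dtype=float)
    called = ~np.isnan(g)
    call_rate = called.mean(axis=0)                  # share of accessions with a call
    # All-missing columns give 0/0 = nan; nan comparisons evaluate to False.
    with np.errstate(invalid="ignore", divide="ignore"):
        p = np.nansum(g, axis=0) / (2 * called.sum(axis=0))
        maf = np.minimum(p, 1 - p)
        keep = (call_rate >= min_call_rate) & (maf >= min_maf)
    return g[:, keep], keep
```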

Genetic diversity and population structure analysis

Genetic diversity and population structure were assessed using three methods: model-based maximum-likelihood estimation of ancestral subpopulations with ADMIXTURE [38]; phylogenetic analysis with the ape (Analyses of Phylogenetics and Evolution) package [39], with the dendrogram plotted using the ggtree package [40]; and PCA with the FactoMineR package [41] in R. Structure simulations used a burn-in period of 20,000 iterations followed by 20,000 Markov chain Monte Carlo (MCMC) iterations. A binary file generated with PLINK was subjected to cross-validation to determine the optimal K value. Accessions were assigned to groups using a membership-probability cut-off of 50%, and population structure was plotted using the bar plot function in R. For the PCA, the optimal number of clusters was assessed using the "silhouette" method implemented in the FactoMineR package [29].
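Group assignment with the 50% membership cut-off can be sketched as follows. ADMIXTURE writes per-accession membership proportions to a `.Q` file with one whitespace-separated row of K values per accession; the helpers `read_q_file` and `assign_groups` are hypothetical names, and accessions whose largest proportion falls below the cut-off are labelled admixed.

```python
def read_q_file(path):
    """Read an ADMIXTURE .Q file: one row of K membership proportions per accession."""
    with open(path) as handle:
        return [[float(x) for x in line.split()] for line in handle if line.strip()]


def assign_groups(q_matrix, cutoff=0.5):
    """Return a 1-based group number per accession, or 'admixed' when no
    membership proportion reaches the cutoff."""
    assignments = []
    for row in q_matrix:
        best = max(range(len(row)), key=row.__getitem__)  # index of largest proportion
        assignments.append(best + 1 if row[best] >= cutoff else "admixed")
    return assignments
```

Counting the `"admixed"` labels over the full panel gives the admixed share of the panel.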

Traits association analysis and gene annotation

Marker-trait associations were computed with a mixed linear model implemented in the GAPIT package in R [42]:

$$\mathrm{y }=\mathrm{ Xb }+\mathrm{ Zu }+\mathrm{ e}$$

where y is the vector of phenotypic observations, X is the matrix of SNP genotypes (fixed effects), b is the vector of estimated SNP effects, Z is the incidence matrix linking observations to the random additive genetic effects u, whose covariance is structured by the kinship (co-ancestry) matrix, and e is the vector of random residual errors.
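How the kinship matrix was built is not specified here. One common choice, shown as a sketch under that assumption, is VanRaden's (2008) first method, K = WWᵀ / (2 Σ p_j(1 − p_j)), where W is the 0/1/2 genotype matrix centred by twice each SNP's allele frequency. It expects a complete matrix, so missing calls must be imputed first; `vanraden_kinship` is a hypothetical name.

```python
import numpy as np


def vanraden_kinship(genotypes):
    """VanRaden (2008) method-1 genomic relationship matrix.

    genotypes: complete accessions x SNPs array coded 0/1/2 (impute first).
    """
    m = np.asarray(genotypes, dtype=float)
    p = m.mean(axis=0) / 2                  # alternate-allele frequency per SNP
    w = m - 2 * p                           # centre each SNP by its expected dosage
    scale = 2 * np.sum(p * (1 - p))         # VanRaden method-1 scaling factor
    return w @ w.T / scale
```

The resulting accessions × accessions matrix is symmetric and would serve as the covariance structure of u in the model above.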

The co-ancestry (Q) matrix from ADMIXTURE and the kinship matrix were included as covariates in the multi-locus random-SNP-effect mixed linear model (mrMLM) to reduce spurious associations. Association analyses were conducted on BLUEs for each year separately and for both years combined. Significant SNP markers were detected using the Bonferroni threshold, as described by Cheng et al. [43], with six multi-locus methods: the multi-locus random-SNP-effect mixed linear model (mrMLM) [44]; fast multi-locus random-SNP-effect EMMA (FASTmrEMMA) [45]; iterative sure independence screening EM-Bayesian LASSO (ISIS EM-BLASSO) [46]; polygenic-background-control-based least angle regression plus empirical Bayes (pLARmEB) [47]; polygenic-background-control-based Kruskal–Wallis test plus empirical Bayes (pKWmEB) [48]; and fast mrMLM (FASTmrMLM) [46]. Quantile–quantile (QQ) plots of observed −log10(p) values against their expected values under the null hypothesis of no association were used to assess the fit of the GWAS models and how well they accounted for population structure.
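The Bonferroni threshold divides the family-wise error rate by the number of tests. As a worked sketch, assuming α = 0.05 and the roughly 11,700 SNPs retained after filtering, the per-marker cut-off is 0.05/11,722 ≈ 4.3 × 10⁻⁶, or about 5.37 on the −log10 scale. Both function names are hypothetical.

```python
import math


def bonferroni_threshold(n_markers, alpha=0.05):
    """Per-marker p-value cut-off and its -log10 value under Bonferroni."""
    p_cut = alpha / n_markers
    return p_cut, -math.log10(p_cut)


def significant_markers(p_values, alpha=0.05):
    """Indices of markers whose p-value passes the Bonferroni cut-off."""
    p_cut, _ = bonferroni_threshold(len(p_values), alpha)
    return [i for i, p in enumerate(p_values) if p <= p_cut]
```

Because the cut-off scales with the number of tests, it becomes stricter as more markers are retained after filtering.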

Candidate genes in each significant QTL region were searched within a 1 Mb window (500 kb upstream and downstream of the significant SNP) using the Generic Feature Format (GFF3) annotation file of the yam reference genome [26], from which the IDs of the genes in each region were retrieved. The functions of the putative genes were obtained from public databases such as InterPro [49] and the European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI) [50].
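Extracting genes within ±500 kb of a significant SNP from a GFF3 file amounts to an interval-overlap test on gene features. The sketch below assumes tab-separated GFF3 records whose first column uses the same chromosome naming as the SNPs and whose gene IDs are stored in the `ID` attribute; `genes_near_snp` is a hypothetical helper, and the real annotation may use different feature types or naming.

```python
def genes_near_snp(gff3_lines, chrom, pos, flank=500_000):
    """Return IDs of 'gene' features overlapping [pos - flank, pos + flank]
    on the given chromosome."""
    win_start, win_end = max(1, pos - flank), pos + flank
    hits = []
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):    # skip blanks and directives
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[0] != chrom or fields[2] != "gene":
            continue
        start, end = int(fields[3]), int(fields[4])     # 1-based, inclusive
        if start <= win_end and end >= win_start:       # interval overlap test
            attrs = dict(item.split("=", 1)
                         for item in fields[8].split(";") if "=" in item)
            hits.append(attrs.get("ID", "unknown"))
    return hits
```

The returned IDs could then be looked up in InterPro or other EMBL-EBI resources for functional annotation.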

Availability of data and materials

Data can be obtained upon request from the corresponding author. VCF data available on the


  1. Bassey EE. Constraints and prospects of yam production in Nigeria. Eur J Phys Agric Sci. 2017;5(1):55–64.

  2. Asiedu R, Sartie A. Crops that feed the World 1. Yams. Food Secur. 2010;2(4):305–15.


  3. FAOSTAT. Food and Agriculture Organization of the United Nations statistics database (QC). 2022.

  4. Adejumobi I, et al. Diversity, trait preferences, management and utilization of yams landraces (Dioscorea species): an orphan crop in DR Congo. Sci Rep. 2022;12(1252):1–16.

  5. Bukatuka F, et al. Bioactivity and Nutritional Values of Some Dioscorea Species Traditionally Used as Medicinal Foods in Bandundu, DR Congo. European J Med Plants. 2016;14(1):1–11.

  6. Jeancy NL, Paul M, Alasca EL, Yves-dady B. Yam production on the sandy soil of Bateke Plateau (DR Congo). J Appl Biosci. 2021;17(163):16886–96.

  7. Adejumobi I, Agre AP, Onautshu OD, Adheka GJ, Cipriano MI, Jean-Claude LK, Monzenga Joseph L. Assessment of yam landraces (Dioscorea spp.) of DR Congo for reaction to pathological diseases, yield potential and tuber quality characteristics. Agronomy. 2022;12(599):1–20.

  8. Egesi CN, Odu BO, Ogunyemi S, Asiedu R, Hughes J. Evaluation of water yam (Dioscorea alata L.) germplasm for reaction to yam anthracnose and virus diseases and their effect on yield. J Phytopathol. 2007;155(9):536–43.

  9. Egesi CN, Onyeka TJ, Asiedu R. Severity of anthracnose and virus diseases of water yam (Dioscorea alata L.) in Nigeria I: Effects of yam genotype and date of planting. Crop Prot. 2007;26(8):1259–65.

  10. Agre P, Norman PE, Asiedu R, Asfaw A. Identification of Quantitative Trait Nucleotides and Candidate Genes for Tuber Yield and Mosaic Virus Tolerance in an Elite Population of White Guinea Yam (Dioscorea rotundata) Using Genome-Wide Association Scan. BMC Plant Biol. 2021:1–16.

  11. Mondo JM, Agre PA, Asiedu R, Akoroda MO, Asfaw A. Genome-wide association studies for sex determination and cross-compatibility in water yam (Dioscorea alata L.). Plants. 2021;10(7):1–18.

  12. Gatarira C, et al. Genome-wide association analysis for tuber dry matter and oxidative browning in water Yam (Dioscorea alata L.). Plants. 2020;9(8):1–19.

  13. Enoki S, Tanaka K, Moriyama A, Hanya N, Mikami N, Suzuki S. Grape cytochrome P450 CYP90D1 regulates brassinosteroid biosynthesis and increases vegetative growth. Plant Physiol Biochem. 2023;196.

  14. Baker RW, Jeffrey PD, Hughson FM. Crystal Structures of the Sec1/Munc18 (SM) Protein Vps33, Alone and Bound to the Homotypic Fusion and Vacuolar Protein Sorting (HOPS) Subunit Vps16. PLoS One. 2013;8(6).

  15. Ma C, et al. Exogenous Melatonin and CaCl2 Alleviate Cold-Induced Oxidative Stress and Photosynthetic Inhibition in Cucumber Seedlings. J Plant Growth Regul. 2022.

  16. Jiang H, et al. A novel short-root gene encodes a glucosamine-6-phosphate acetyltransferase required for maintaining normal root cell shape in rice. Plant Physiol. 2005;138(1).

  17. Islam R, Brown S, Taheri A, Dumenyo CK. The gene encoding NAD-dependent epimerase/dehydratase, wcaG, affects cell surface properties, virulence, and extracellular enzyme production in the soft rot phytopathogen, Pectobacterium carotovorum. Microorganisms. 2019;7(6).

  18. Van Ooijen G, Mayr G, Kasiem MMA, Albrecht M, Cornelissen BJC, Takken FLW. Structure-function analysis of the NB-ARC domain of plant disease resistance proteins. J Exp Bot. 2008;59(6).

  19. Jia G, et al. A haplotype map of genomic variations and genome-wide association studies of agronomic traits in foxtail millet (Setaria italica). Nat Genet. 2013;45(8).

  20. Moelling K, Broecker F, Russo G, Sunagawa S. RNase H as gene modifier, driver of evolution and antiviral defense. Front Microbiol. 2017;8.

  21. Miyakawa T, Miyazono KI, Sawano Y, Hatano KI, Tanokura M. Crystal structure of ginkbilobin-2 with homology to the extracellular domain of plant cysteine-rich receptor-like kinases. Proteins Struct Funct Bioinforma. 2009;77(1).

  22. Strayer C, et al. Cloning of the Arabidopsis clock gene TOC1, an autoregulatory response regulator homolog. Science. 2000;289(5480).

  23. Norastehnia A, Sajedi RH, Nojavan-Asghari M. Inhibitory effects of methyl jasmonate on seed germination in maize (Zea mays): effect on α-amylase activity and ethylene production. Appl Plant Physiol. 2007;33(2):13–23.

  24. Yankovskaya V, et al. Architecture of succinate dehydrogenase and reactive oxygen species generation. Science. 2003;299(5607).

  25. Piaskowski J, Hardner C, Cai L, Zhao Y, Iezzoni A, Peace C. Genomic heritability estimates in sweet cherry reveal non-additive genetic variance is relevant for industry-prioritized traits. BMC Genet. 2018;19(1).

  26. Sugihara Y, et al. Genome analyses reveal the hybrid origin of the staple crop white Guinea yam (Dioscorea rotundata). Proc Natl Acad Sci U S A. 2020;117(50).

  27. Sugihara Y, et al. Population Genomics of Yams: Evolution and Domestication of Dioscorea Species. 2021.

  28. Agre P, Norman PE, Asiedu R, Asfaw A. Identification of Quantitative Trait Nucleotides and Candidate Genes for Tuber Yield and Mosaic Virus Tolerance in an Elite Population of White Guinea Yam (Dioscorea rotundata) Using Genome-Wide Association Scan. 2021.

  29. Agre PA, et al. Identification of QTLs Controlling Resistance to Anthracnose Disease in Water Yam (Dioscorea alata). Genes (Basel). 2022;13(2).

  30. Adjumati GT, Pembele AI, Ocan D. Use of charcoal (biochar) to enhance tropical soil fertility: A case of Masako in Democratic Republic of Congo. J Soil Sci Environ Manag. 2020;11(1):17–29.

  31. Asfaw A. Standard Operating Protocol for Yam Variety Performance Evaluation Trial. April 2016. p. 27.

  32. Campbell CL, Madden LV. Introduction to Plant Disease Epidemiology. New York: John Wiley and Sons; 1990.

  33. Kilian A, Sanewski G, Ko L. The application of DArTseq technology to pineapple. Acta Hortic. 2016;1111.

  34. Schloerke B, et al. GGally: Extension to ggplot2. R package version 0.5.0. 2020.

  35. Johnson HW, Robinson HF, Comstock RE. Genotypic and Phenotypic Correlations in Soybeans and Their Implications in Selection. Agron J. 1955;47(10).

  36. Deshmukh S, Basu M, Reddy P. Genetic variability, character association and path coefficients of quantitative traits in Virginia bunch varieties of groundnut. Indian J Agric Sci. 1986;56(12).

  37. Chen ZL, et al. A high-speed search engine pLink 2 with systematic evaluation for proteome-scale identification of cross-linked peptides. Nat Commun. 2019;10(1).

  38. Earl DA, vonHoldt BM. STRUCTURE HARVESTER: A website and program for visualizing STRUCTURE output and implementing the Evanno method. Conserv Genet Resour. 2012;4(2).

  39. Paradis E, Schliep K. ape 5.0: An environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics. 2019;35(3).

  40. Yu G, Smith DK, Zhu H, Guan Y, Lam TTY. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol Evol. 2017;8(1).

  41. Lê S, Josse J, Husson F. FactoMineR: An R package for multivariate analysis. J Stat Softw. 2008;25(1).

  42. Wang J, Zhang Z. GAPIT Version 3: Boosting Power and Accuracy for Genomic Association and Prediction. Genomics Proteomics Bioinformatics. 2021;19(4).

  43. Cheng S, et al. Distinct Aspects of Left Ventricular Mechanical Function Are Differentially Associated With Cardiovascular Outcomes and All-Cause Mortality in the Community. J Am Heart Assoc. 2015;4(10).

  44. Wang SB, et al. Improving power and accuracy of genome-wide association studies via a multi-locus mixed linear model methodology. Sci Rep. 2016;6.

  45. Wen YJ, et al. Erratum: Methodological implementation of mixed linear models in multi-locus genome-wide association studies. Brief Bioinform. 2017;18(5).

  46. Tamba CL, Zhang YM. A fast mrMLM algorithm for multi-locus genome-wide association studies. bioRxiv. 2018.

  47. Zhang J, et al. pLARmEB: integration of least angle regression with empirical Bayes for multilocus genome-wide association studies. Heredity (Edinb). 2017;118(6):517–24.

  48. Ren WL, Wen YJ, Dunwell JM, Zhang YM. pKWmEB: integration of Kruskal–Wallis test with empirical Bayes under polygenic background control for multi-locus genome-wide association study. Heredity (Edinb). 2018;120(3).

  49. Blum M, et al. The InterPro protein families and domains database: 20 years on. Nucleic Acids Res. 2021;49(D1).

  50. Madeira F, et al. The EMBL-EBI search and sequence analysis tools APIs in 2019. Nucleic Acids Res. 2019;47(W1).

Acknowledgements

The authors acknowledge the MOUNAF project for partly funding this research through the University of Kisangani, and the International Foundation for Science (IFS) for co-funding the research.

Funding

This study was partially supported by the Bill & Melinda Gates Foundation (BMGF), which covered the publication fees.

Author information

Authors and Affiliations


Contributions

Conceptualization, A.I.I. and P.A.A.; Methodology, A.I.I. and P.A.A.; Supervision, D.O.O., P.A.A. and J.G.A.; Writing (original draft), A.I.I. and P.A.A.; Manuscript review and editing, A.I.I., P.A.A., T.E.S., D.O.O., J.G.A., J.L.K., I.M.C. and A.S.A.

Corresponding author

Correspondence to Paterne A. Agre.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that the research was conducted in the absence of any potential conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit the Creative Commons website. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Adejumobi, I., Agre, P.A., Adewumi, A. et al. Association mapping in multiple yam species (Dioscorea spp.) of quantitative trait loci for yield-related traits. BMC Plant Biol 23, 357 (2023).

  • Received:

  • Accepted:

  • Published:

  • DOI: