Identification of quantitative trait nucleotides and candidate genes for tuber yield and mosaic virus tolerance in an elite population of white guinea yam (Dioscorea rotundata) using genome-wide association scan

Background Improvement of tuber yield and tolerance to viruses are priority objectives in white Guinea yam breeding programs. However, phenotypic selection for these traits is quite challenging due to phenotypic plasticity and cumbersome screening of phenotypic-induced variations. This study assessed quantitative trait nucleotides (QTNs) and the underlying candidate genes related to tuber yield per plant (TYP) and yam mosaic virus (YMV) tolerance in a panel of 406 white Guinea yam (Dioscorea rotundata) breeding lines using a genome-wide association study (GWAS). Results Population structure analysis using 5,581 SNPs differentiated the 406 genotypes into seven distinct sub-groups based delta K. Marker-trait association (MTA) analysis using the multi-locus linear model (mrMLM) identified seventeen QTN regions significant for TYP and five for YMV with various effects. The seveteen QTNs were detected on nine chromosomes, while the five QTNs were identified on five chromosomes. We identified variants responsible for predicting higher yield and low virus severity scores in the breeding panel through the marker-effect prediction. Gene annotation for the significant SNP loci identified several essential putative genes associated with the growth and development of tuber yield and those that code for tolerance to mosaic virus. Conclusion Application of different multi-locus models of GWAS identified 22 QTNs. Our results provide valuable insight for marker validation and deployment for tuber yield and mosaic virus tolerance in white yam breeding. The information on SNP variants and genes from the present study would fast-track the application of genomics-informed selection decisions in breeding white Guinea yam for rapid introgression of the targeted traits through markers validation. Supplementary Information The online version contains supplementary material available at 10.1186/s12870-021-03314-w.

Agre et al. BMC Plant Biology (2021) 21:552 by smallholder farmers for its starchy underground tuber and aerial bulbils [2,3]. The global estimated mean annual yam production and gross values are approximately 73 million tons and 14 billion US dollars, respectively, with West Africa accounting for 92% of the total yam production [4,5]. There are over 600 Dioscorea species, of which 11 are economically significant [6]. White Guinea yam (D. rotundata), indigenous to Africa, is the most produced and consumed among cultivated species, supporting the livelihood of over 300 million people [2]. Yam is also important in many key life ceremonies in the major producing areas of West Africa [7].
Despite its socio-economic importance, a significant yield increase has not been achieved over the decades compared to cereal crops [1]. Improved varieties are vital for attaining increased productivity in farmers' fields. The development of improved yam varieties requires a better understanding of the genetic control of traits contributing to the increased yield and acceptable quality by growers and consumers. However, the breeding efforts have not adequately explored the genetic basis of tuber yield and virus resistance traits to fast-track improved cultivar development. Genes controlling key traits such as resistance to pests and diseases, tuber yield, and tuber quality traits exhibit quantitative inheritance. They may not be linked in a preferred direction, making improving these traits challenging using conventional breeding techniques [8]. In QTL mapping studies, the variation in virus resistance is attributed to a single major locus with a modest contribution [9]. Two random amplification of polymorphic DNA (RAPD) markers tightly linked in the coupling phase with Ymv-1 locus on the same linkage group were reported in resistant genotypes of D. rotundata.
For tuber yield, limited knowledge exists regarding QTL mapping studies [8]. The QTLs detected for YMV in yam were mainly based on conventional family-based linkage mapping. In contrast, the GWAS strategy using naturally occurring variants is a more robust and efficient method for identifying significant loci and the genes involved in the genetic control of complex traits. The GWAS strategy has increasingly been utilized in many crops, including root and tuber crops, to dissect the underlying genetic control mechanism in complex traits. However, GWAS mapping for tuber yield and YMV tolerance in yam has not been reported to date.
Supporting yam breeding efforts based on quantitative genetics principles and genomics tools is indispensable to increase the program's effectiveness for increasing productivity. Yam cultivar development using conventional strategies spans at least ten years from crossing to variety release recommendation [4,6]. The complementation of the traditional breeding techniques with advanced molecular tools has reduced the breeding cycle in crops [10]. In theory, genotypic information from molecular markers, when associated with phenotypic traits of interest, may be extensively used to select individuals with higher genetic value through marker-assisted selection (MAS) [11].
This study's objective was to dissect the genetic control of tuber yield and YMV tolerance in white Guinea yam.

Plant materials
The study panel comprised 406 white Guinea yam clones, of which, 36 were trait progenitors, 49 elite clones, and 321 early generation breeding lines from the IITA's yam breeding program (Supplementary Table 1). All the genotypes are from the International Institute of Tropical Agriculture, IITA Ibadan Nigeria and are maintained by the Yam Breeding Improvement Unit.

Phenotyping
Phenotypic data on tuber yield per plant (TYP) and yam mosaic virus (YMV) severity were recorded on the plant materials assessed at different breeding stages at IITA in Nigeria. The TYP and YMV severity were recorded on plants in the field using the procedure described in yam ontology (http:// www. cropo ntolo gy. org/ ontol ogy/ CO_ 343/ Yam) and yam standard operation protocol [12]. Tuber yield was recorded in kilogram on a plant basis at harvest (eight months after planting). The YMV severity score was assessed at 30-day intervals from 2 to 6 months after planting based on a visual assessment of the relative area of plant leaf surfaces affected by the mosaic virus disease using a five-ordinal scale of 1-5. A score of 1 represented no visible symptoms of virus infection, 2 for mild mosaic, vein-banding, green spotting or flecking, curling and mottling on few leaves but no leaf distortion, 3 for low incidence (25-50%) of the mosaic virus on the entire plant, 4 for the severe mosaic on most leaves and leaf distortion, and 5 for severe mosaic and bleaching with severe leaf distortion and stunting. The virus severity score values were converted to percentages and then used to estimate the area under disease progress curve (AUDPC) values as described by Forbes et al. [13]: where y i = disease severity at the i th observation, t i = time (days) at the i th observation, and n = total number of observations.

Phenotypic data analysis
We applied a one-step linear mixed model that used G-matrix to compute the best linear unbiased predictor (BLUP) values of an individual clone for a trait from the best fit model using the average information criterion (AIC) in restricted maximum likelihood (REML) algorithm [14] in the ASReml-R version 4 package [15]. The model used was: where y ij is the phenotypic value, μ is the overall average (shared by all observations), β i is the effect of block i, τ j is the specific effect to genotype j, γk is the specific effect to trials k and ℇ ij is an effect specific to each experimental unit (combination block and genotype ) and Z u u is the the vectors of random additive and nonadditive genetic within location effects, respectively, with corresponding design matrix Zu. Accordingly, the genetic variance was partitioned into the additive effects, which were associated with a covariance structure proportional to genetic relationships derived from the molecular markers and the non-additive genetic effect. The non-additive genetic variance is explained by individual identity rather than the genomic relationship matrix following the approach described by Borgognone et al. [16] and Ovenden et al. [17].
Broad sense heritability (H 2 ) estimates for the traits were calculated from phenotypic variance (σ 2 p) and the genotypic variance (σ 2 g). The BLUP values of the genotypes for the traits extracted from the best fit model were used as input for the GWAS model.

Genotyping and SNP data analysis
For each genotype, total genomic DNA was isolated from lyophilized young and fully expanded healthy leaves. Deoxyribonucleic acid (DNA) was extracted from the leaf samples using the CTAB procedure with slight modification [18]. DNA quality and concentration were assessed using agarose gel and nanodrop, respectively, following the methods described in Aljanabi and Martinez [19]. High-throughput genotyping was conducted in 96 plex DArTseq protocol, and SNPs were called using the DArT's proprietary software, DArTSoft, as described by Killian et al. [20]. Reads and tags found in each sequencing result were aligned to the Dioscorea rotundata reference genome version 2 (https:// drive. google. com/ drive/ folde rs/ 1H5T4 xjKAE l9LliR-4qK_ IR6Ty pCDe8 nj) with Hisat2 [21]. The raw HapMap file generated was first converted to a Variant Call Format (VCF) and filtered for missing value and polymorphic SNPs using quality control criteria of low sequence depth <5; SNP markers with missing values >20%; minor allele frequency (MAF) <0.05 and heterozygosity >50. Of the 16,242 SNP markers subjected to the filtering quality y ij = π + β i + τ j + γ k + ε ij + Z u u criteria, 5,581 good-quality SNPs were retained for various analyses.

Population genetic analysis
Various population genetic analysis methods were conducted to explore the structure and level of genetic diversity in the study material. The SNP distribution and the density were estimated using the 'Cmplot' function implemented in the CMplot R package [22]. For the SNP mutation from the reference to the alternative, SNPlay open website was used to estimate the rate of the transition and transversion across the retained SNP. Statistics such as the minor allele frequency (MAF), the observed and the expected heterozygosity, and the polymorphism information content were estimated using the function "--freq" and "--hardy" using PLINK V1.90 [23].
The genetic relationship among the plant materials was explored using the principal component analysis (PCA) in FactorMiner R package [24]. For the PCA, the origin of the plant (early generation and parental profile) was used as factor.
Structure software version 2.3.3 [25,26] was used to cluster samples into populations. Structure simulations were carried out using an admixture model with a burnin period of 20000 iterations and a Markov chain Monte Carlo (MCMC) set at 20000. The simulations were repeated 3 times for K-values of 1 to 10. The optimal subpopulation model was investigated in several ways: (1) by applying the informal pointers (i.e. geographical origin) proposed by Pritchard et al. [25] and Falush et al. [27]; (2) by considering ΔK, a second order rate change with respect to K, as defined in Evanno et al. [28], as implemented in STRU CTU RE HARVESTER [29] and thus the most likely value of K determined. Structure population was then plotted using barplot function implemeneted in R. The phylogeny tree was done using ape version 5.0 implemented in R [30].
In the mrMLM analysis, we accounted for population structure (Q) generated from Structure analysis. For each trait, the optimal number Q value included in the GWAS models was determined based on the highest ΔK value. The percentage of variation explained by the associated marker (R 2 ) and the markers effect were estimated in the mrMLM (v 4.0.2) R package (https:// cran.r-proje ct. org/ web/ packa ges/ mrMLM/ index. html).

Identification of existing putative genes
The possible candidate genes within the significant QTL region were searched in the defined range window of 1 MB at 500 Kb (downstream and upstream) from the yam Generic File Format (GFF3) file. Linkage disequilibrium (LD) was assessed between the significant SNPs using the LDheatmap library [38]. The yam generic feature format (GFF3) of the reference genome was used to identify the main gene in the inter-genic region using the SNPReff. Functions of the genes associated with the identified SNPs were determined using the public database Interpro, European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI) [39].

Haplotype estimation and SNP markers effect prediction
Haplotype associated with significant QTL was developed using "rstatix" package implemented in R, and the sequence of each haplotype was defined based on the 406 genetic material considered as testing and or identification population. The variant effect prediction was evaluated through the adjusted posterior probability, and the markers with high segregation were identified. Marker effects were then plotted for vizualization.  Table 2).

Genetic diversity, population structure and linkage disequilibrium
The DArT genotyping of 406 white Guinea yam clones detected the highest number of SNPs (637) mapped on chromosome 5 and the lowest of 123 on chromosome 11 ( Supplementary Fig. 1A). Transition SNPs (60.13%, 3,356 SNPs) were more frequent than transversions (39.87%, 2225 SNPs) ( Supplementary Fig. 1B). The observed heterozygosity value ranged from 0.029 to 0.622, with an average of 0.336 ( Supplementary Fig. 1C). The expected heterozygosity value ranged from 0.09 to 0.5, with an average of 0.331 ( Supplementary Fig. 1D). The minor allele frequency ranged from 0.05 to 0.5, with a mean of 0.24 ( Supplementary Fig. 1E). The polymorphic information content (PIC) ranged from 0.087 to 0.335, with an average value of 0.267 ( Supplementary Fig. 1F).
The population structure analysis of the yam diversity panel shows that the delta K values from the mean log-likelihood probabilities plateaued at K=7 (1306.47) (Fig. 1A). At K=7, the 406 yam diversity panel was divided into 7 sub-populations (Fig. 1C). Using the 50% cutoff criterion of membership probability threshold, 305 accessions were successfully assigned to the 7 different sub-populations. The remaining 101 accessions with a probability of associations less than 50% were designated as an admixed population. The phylogenetic tree also showed seven sub-populations with higher degrees of admixture similar to the delta K plot from the STRU CTU RE (Fig. 1B).
Exploring the the genetic relashionship through principal component analysis showed that the first two PCs account for 63.7% of the total variation (Fig. 2). The PCA clearly showed a higher degree of admixture between the early generation and parental profile clones. Both the early generation and the parental profile clones were distributed along PC1 and the PC2 (Fig. 2).

Genome-wide scan for traits Tuber yield
We found seventeen SNPs markers distributed on 9 chromosomes, significantly associated with tuber yield (kg plant -1 ) ( Table 2; Fig. 3). The LOD values for these SNPs ranged from 5.07 to 10.88 with minor allele frequency (MAF) ranging from 0.09 to 0.50. Of the 17 SNP markers associated with tuber yield, four were mapped on chromosome 4, two on chromosome 5, two each on chromosomes 8, 10, 14, and 17 and a single SNP each on chromosomes 13, 15, and 19 ( Table 2). The SNP marker chr05_24682916 explained the highest total phenotypic variance 8.47%.

Yam mosaic virus resistance
We found five SNP loci that showed a significant association with the reaction to mosaic virus infection ( Table 2, Fig. 4). Of the significant SNPs associated with YMV, three markers named chr03_6338751, chr05_30671001 and chr16_1482029 displayed negative quantitative trait nucleotide effects (Table 2). Using different genetic model for the SNP association SNP marker chr15_3906069 located on chromosome 15 was identified by two methods pLARmEB and pKWmEB. The total phenotypic variance explained by the markers associated with the yam mosaic virus varied from 0.33% to 5.96%. The minor allele frequency (MAF) of the associated SNP marker ranged from 0.16 to 0.49.

SNP-trait association mapping
Four multi-locus models (MLMs) including FASTm-rMLM, mrMLM, pKWmEB and pLARmEB detected a total of 22 QTNs across the 20 chromosomes of white yam for TYP and YMV traits (

Identification of existing putative genes Tuber yield
We explored the association of the identified QTN regions on the physical map with the potential candidate genes and their functions using the white Guinea between the specific SNPs in the vicinity of the peak adjacent to the putative gene (Fig. 5). On chromosome 4, the significant SNP for tuber yield is located on the genomic regions harboring six putative genes (Gibberellin regulated protein, AP2/ERF domain, NB-ARC, Dirigent protein, Membrane transport protein, and Importin subunit beta-1, plants) with known functions. On chromosome 5, we detected three putative genes (Expansin, AUX/IAA protein and AP2/ERF domain). On chromosome 8, we identified two putative genes (AUX/IAA protein; Glycine-rich protein) (Supplementary Table 4). Several putative genes were identified on chromosome 14 (Supplementary Table S4). On chromosome 15, which displayed average correlation through the Ldheatmap, five genes were identified in the vicinity of the targeted SNP marker. The LD heatmap for the SNP found in association with tuber yield on chromosome 19 revealed the presence of 9 putative genes (ABC transporter-like, Exportin-1/Importin-beta-like, Sodium/calcium exchanger membrane region, AUX/IAA protein, Geminivirus AL3 coat protein, AP2/ERF domain, Major facilitator, sugar transporter-like, and Expansin).

Yam mosaic virus resistance
We identified four candidate genes, namely AP2/ERF domain, Major facilitator, sugar transporter-like, and AUX/IAA protein on chromosome 3 near the SNP found in association with the YMV. The four identified candidate genes, AP2/ERF domain and AUX/IAA protein, were reported to confer essential gene functions related to plant defense and growth. The pairwise LD between the SNP of chromosome 3, 5, 10, 15 and 16 situated in genomic regions associated with YMV displayed a higher correlation with the three main haplotypes block (Fig. 6). On chromosome 10, fifteen different putative genes were identified near

Haplotype SNP distribution and SNP markers effect prediction
The frequencies and marker prediction effects of various haplotypes associated with tuber yield and resistance  Table 3. Of the seventeen SNP markers associated with the tuber yield, six SNP markers including chr04_6236404, chr05_24237388, chr08_7046574, chr13_13467988, chr14_11128124 and chr17_15363223 displayed high haplotype segregation among the different variants. Accordingly, the SNP markers on chromosomes 4, 5, 8, 13, 14 and 17 identified variants CC and CT to be associated with genotypes with higher tuber yield, whereas variants TT and AT were found to be associated with lower tuber yield (Fig. 7). Of the five SNP markers associated with the YMV, two (chr10_1116193 and chr16_1482029) were found to have high significant haplotype variations (Table 3). On chromosome 10, SNP markers associated with the YMV located at 1116193 bp showed that variants GG and AG were linked to lower predicted YMV value, while variant AA was identified to predict the higher YMV score (Fig. 8A). For the marker chr16_1482029 associated with YMV located at 1482029 bp variants TT and AT were linked to lower predicted YMV value (Fig. 8B). .

Phenotypic variation
The natural variation among the studied traits was high and very informative. Relatively high broad-sense heritability of 0.708 for tuber yield per plant and 0. 903 for yam mosaic virus severity score demonstrated substantial genetic variation in traits between the different clones. Therefore, the studied traits are amenable to genetic improvement through selection [40]. Furthermore, the observed natural genetic variation in the study materials signifies their relevance for genetic studies.

Population differentiation
Understanding population structure within the studied clones is imperative to determine how it affects the ability of GWAS to infer marker-trait association. The population structure of the present study based on the delta reveals 7 sub-populations, indicating high genetic variability. The high genetic variability indicates the potentials of the studied clones for genetic improvement aimed at tuber yield per plant and yam mosaic virus. The the phylogeny analysis reveals similar results as the populature structure analysis, indicating their relevance in preventing sham associations in GWAS in this study [41,42]. Thus, the marker density, diversity, and sample size demonstrated that the yam breeding panel used for this study is sufficiently powered to capture allelic variations for the studied traits.

Genome-wide association studies
The whole-genome scan for phenotypic and allelic variation in tuber yield and yam mosaic virus resistance identified genome regions on ten chromosomes (chromosomes 4, 5, 8, 10, 13, 14, 15, 16, 17 and 19) with significant −log10 values. Both Q matrix (population structure) were considered in a mixed linear model for the association analysis to reduce false-positive associations. The model used for tuber yield and tolerance to yam mosaic virus showed no inflation of p-values indicating that the structure of relationships was well accounted for in the GWAS analysis. These findings are consistent with the view that traits with no inflation of p-values show that the structural relationship is adequate for GWAS analysis [42]. Genome-wide association mapping has been used in exploring the elite alleles of many agronomic traits such as tuber dry matter and oxidative browning [42] in water yam (Dioscorea alata). In the present study, the phenotypic effect values of the favorable alleles of TYP and YMV were evaluated and inferred to positively and negatively affect the individual traits. Based on the stringent criterion of −log10, we identified 17 significant markers trait associations ranging between 1.01 e-20 and The information on SNP variants from the present study would fast-track the application of genomics-informed selection decisions in breeding white Guinea yam for higher tuber yield and resistance to mosaic virus. Such great potential of GWAS has been reported for some root and tuber crops such as cassava [43], potatoes [44] and water yam [42].

Detection of QTNs by multi-locus models (MLMs)
This study used different MLMs (FASTmrMLM, mrMLM, pKWmEB and pLARmEB) to identify genomic region associated with TYP and YMV. A total of 17 SNPs were significantly associated with TYP by the four MLM models across 9 out of the 20 chromosomes viz: chrs 4, 5, 8, 10, 14, 15, 17 and 19. Each of the four models detected different and complemeneted numbers of the SNPs: pKWmEB and pLARmEB (7 QTNs each) > FASTmrMLM, mrMLM (2 QTNs each). This indicates varied detection of each model. The MLMs used in this study detected putative candidate genes for the studied traits indicating its usefulness in GWAS. These results support the view that MLMs are useful for identifying QTNs and candidate genes in plants [45]. The findings of this study established a link between quantitative traits such as tuber yield and yam mosaic virus and single nucleotide polymorphisms. The variations observed in the population pannels constitute a pool of quantitative trait nucleotides (QTNs) that modulate tuber yield and yam mosaic virus traits in white yam.

Identification of putative genes
Our results identified SNP markers that associate significantly with allelic variation for tuber yield and YMV tolerance in white yam. The detected markers offer good targets for further validation and analysis due to their location in proximity to candidate genes regulating growth, development and disease resistance. The SNP in chromosome 3 is near to AP2/ERF domain, AUX/IAA protein, major facilitator, sugar transporter-like genes. Zarei et al. [46] reported that the AP2/ERF-domain transcription factor ORA59 acts as the integrator of the jasmonic acid (JA) and ethylene (ET) signaling pathways and is the key regulator of JA-and ET-responsive PLANT DEFENSIN1.2 (PDF1.2) expression. The SNP in chromosome 4 is near to Geminivirus AL1 replicationassociated protein, catalytic domain, AP2/ERF domain, NB-ARC, Dirigent protein, and membrane transport protein genes. The NB-ARC domain is noted to play a role in ATPase domain that comprises NB, ARC1, and ARC2 subdomains, which in its nucleotide-binding state regulates the R protein activity or resistance in plants [47]. The plant defense is induced by the R proteins in response to specific pathogen-derived molecules, called avirulence (AVR) proteins, thereby restricting pathogen proliferation [48]. The SNP in chromosome 10 is near to Geminivirus AL1 replication-associated protein, catalytic domain, Geminivirus Rep catalytic domain, Geminivirus AL3 coat protein, AP2/ERF domain, NB-ARC, Chlorophyll A-B binding protein, plant and chromista.   ns=non-significant, *, **, ***, and **** indicate significant association between haplotypes and markers disease resistance genes in our study require further validation and testing in yam germplasm. This could be done by converting these MTAs into low cost Kompetitive Allele-Specific PCR (KASP) markers that can efficiently transfer alleles into elite yam genotypes as reported for wheat [58]. These valuable genomic resources and PCR based markers (KASP markers) could greatly support selection initiatives for key traits in yam breeding through marker-assisted selection (MAS). These will also support the systematic study of the genetics, comparative genomics and evolution of yam, aimed at expediting the isolation and characterization of genes that control agronomically important traits such as tuber yield and yam mosaic virus. The SNP marker-TYP trait association exhibited high haplotype segregation. The marker effects alleles CC and CT are responsible for predicting high tuber yield per plant in the diversity panel used in the study, while alleles TT and GG were identified to associate with low yield. For the YMV, we found alleles GG, AG and TT to be responsible for low YMV disease scoring prediction. These findings suggest that data mining of favorable alleles is essential for improving the quantitative trait for tuber yield and YMV in yam using marker-assisted selection. Moreover, the results could be helpful for marker validation and deployment in yam breeding. Our findings agree with the view that information on marker effect based on segregation pattern is fundamental for marker validation and deployment in a breeding program [47,59]. Association mapping has been utilized to explore elite alleles present in many agronomic traits, including yield and related attributes in bread wheat [60].

Conclusion
Useful genetic variability exists in the 406 genotypes studied. The genetic architecture of TYP and YMV are regulated by varied QTNs unevenly distributed on the 20 chromosomes of white yam. Among the 4 MLM models, pKWmEB and pLARmEB are most robust in identifying more QTNs. The associated SNP markers could be potentially employed for targeted and accelerated tuber yield per plant and YMV resistance in white yam. The information from our study could help design new breeding strategies to hoard superior alleles for tuber yield per plant and yam mosaic virus in future marker-based breeding. The chromosomal regions controlling these studied traits could be exploited for selection and effective pyramiding of favorable alleles in white yam population improvement. Findings are relevant for population improvement of desirable TYP and YMV traits using marker assisted breeding (MAB) and haplotype-based scheme.