GWAS of pod morphological and color characters in common bean

Common bean (Phaseolus vulgaris L.) is an important legume species which can be consumed as immature pods and dry seeds after re-hydration and cooking. Many genes and QTL, and epistatic interactions among them, condition pod morphological traits. However, not all them have been mapped or validated nor candidate genes proposed. We sought to investigate the genomic regions conditioning pod morphological and color characters through GWAS. Single and multi-locus genome wide association analysis was used to investigate pod traits for a set of 301 bean lines of the Spanish Diversity Panel (SDP). The SDP was genotyped with 32,812 SNPs obtained from Genotyping by Sequencing. The panel was grown in two seasons and phenotypic data were recorded for 17 fresh pods traits grouped in four pod characters: pod length, pod cross-section, pod color, and number of seeds per pod. In all, 23 QTL for pod length, 6 for cross-section, 18 for pod color, 6 for number of seeds per pod and 9 associated to two or more pod characters were detected. Most QTL were located in the telomeric region of chromosomes Pv01, Pv02, Pv04, Pv08, Pv09 and Pv10. Eighteen detected QTL co-localized with 28 previously reported QTL. Twenty-one potential candidate genes involving developmental processes were detected underlying 11 QTL for pod morphological characters, four of them homologous to A. thaliana genes FIS2, SPL10, TTG2 and AML4 affecting silique size. Eight potential candidate genes involved in pigment synthesis, were found underlying five QTL for pod color. GWAS for pod morphological and color characters in the bean Spanish Diversity Panel revealed 62 QTL, 18 co-localized with previously reported QTL, and 16 QTL were underlain by 25 candidate genes. Overall 44 new QTL identified and 18 existing QTL contribute to a better understanding of the complex inheritance of pod size and color traits in common bean and open the opportunity for future validation works.


Background
Common bean (Phaseolus vulgaris L.) is an important legume species domesticated in two different areas of Latin America representing distinct Mesoamerican and Andean gene pools [1]. Cultivated genotypes of common bean exhibit wide diversity for growth habit, flower color, and shape, size and color of pods and seeds. Immature pod phenotypic diversity involves variation in length and curvature (straight vs curved), cross section (diameter, flat, round, sieve size), and color (yellow, green, purple) before the seeds start to develop. Immature pods of some bean genotypes are consumed as fresh green beans (syn. Garden, green, pole, snap, haricot or French beans) when the pods have reached maximum length while the seed is still forming, in contrast with dry beans that are consumed as mature seeds after rehydration and cooking. In all, 2.29 Mha were destined to snap bean crop in 2019 while 33.8 Mha were used in the dry bean crop (http://www.fao.org/faostat/). The snap bean group includes different market classes such as 'string snap bean' referring to types where the pod suture strings must be removed before consumption; 'yellow wax' and 'green bean', referring to yellow and green pod, respectively; 'Romano type' with a very large and flat pod; and 'blue lake type' with dark green pods that remain stringless and fibreless [2]. Furthermore, snap bean can be classified according to processing adaptation: frozen, canned, or fresh market.
A few studies report on the quantitative inheritance of pod length, thickness and width, and identification of QTL controlling these traits mapped across all 11 bean chromosomes [12][13][14][15]. Hagerty et al., used a dry bean x snap bean recombinant inbred population, to map: St (pod suture string) to Pv02; overlapping pod wall fiber, width, and thickness to Pv04; and pod length to Pv09 [14]. Murube et al., using two nested populations, found four genomic regions located on chromosomes Pv01, Pv02, Pv07 and Pv11 with overlapping QTL for pod size characters and number of seeds per pod [15].
The genetic control for color of immature pods is influenced by the Y and Arg genes: Y Arg exhibits green pod, y Arg yellow wax pod, Y arg greenish gray (silvery) pod, and y arg white pod [4]. The y allele conferring yellow pod color was mapped to Pv02 by Koinange et al. [8]. The B gene which regulates the production of precursors of anthocyanins pathway above the level of dihydrokaempferol formation also resides on Pv02 [16]. The genes Pur and Ro influence a range of pod colors from rose to purple pods [17]. The Ace gene produces shiny pod [18]. Myers et al. identified quantitative trait nucleotides (QTNs) associated with CIE L*, a*, b* color space values for pod color on Pv02, Pv03 and Pv05 in a panel of 149 snap bean accessions [19].
In summary, many genes and QTL, and epistatic interactions among them, condition pod morphological traits. However, not all genes have been mapped nor candidate genes proposed. Moreover, QTL need to be validated in different genetic backgrounds and environments before they can be implemented directly in plant breeding or used to search for underlying candidate genes. The reference genome for P. vulgaris [20] provides the framework for fine mapping genes and QTL conditioning pod morphological traits and to identify candidate genes. The reference genome combined with high throughput genotyping, improving statistical programs for detecting marker -trait associations, and access to diversity panels which have greater variation than bi-parental populations, enhances opportunities to identify putative genomic regions controlling specific traits [21].
A Spanish common bean diversity panel (SDP) of 308 lines was established from the local Spanish germplasm collection that included landraces and old and elite cultivars used for pod consumption [22]. The main aim of this work was to investigate genomic regions controlling pod size and color traits through genome wide association analysis (GWAS) of the SDP. Results will contribute to discovery of new genomic regions associated with pod characters, validation of reported QTL, and identification of candidate genes for the investigated traits.

Phenotypic variation, correlations and heritability
A total of 301 SDP lines were successfully characterized for the 17 morphological traits. The results show a wide and continuous variation for the 16 quantitative traits evaluated (see Figure S1 and Table S1). For instance, PL and NSP, two traits related to yield, ranged between 7.1 to 26.4 cm and 2.2 to 8.3 seeds, respectively. The SDP exhibited wide variation for color with green (241 lines), yellow (38), purple (3), green mottled (16) and yellow mottled (2) pods. Pod color measured by the CIE scale exhibited wide variation as well for the L*, a*, and b* vectors. For example, b* varied from − 4.38 to 40.8. The H 2 estimations for the 16 quantitative traits were high ranging from 0.31 for PSW to 0.91 for PLW (see Table S1).
Correlation analyses indicated significant relationships between many evaluated traits (Fig. 1). There were significant and positive correlations among the six pod section traits and a significant negative correlation for PSH/PSW, PSC and PSW. Most of the six pod length variables were significantly correlated except PL/PLC with PLP and PLA. Correlation analyses also revealed significative correlation among section and length traits except in five cases; PSC with PLP, PL and PLC and PSW with PLW and PL/PLC. NSP was significantly correlated with four pod length traits (PLA, PLP, PL and PLC). Finally, the three pod color variables (L*, a*, b*) were also significantly correlated.

Characterization and detection of SNPs
Sequencing of the GBS libraries yielded approximately 418 million reads in total for the 301 SDP lines. About 76.3% of the reads were successfully aligned to the common bean reference genome, 21.5% of the reads mapped to more than one locus, and 23.7% were unmapped. The NGSEP genotyping pipeline produced 346,819 biallelic SNPs in the 11 chromosomes and scaffolds of the reference genome. 32,812 SNPs distributed across the eleven bean chromosomes were retained after filtering parameters ( Figure S2). Most of these SNPs were present in coding regions (51.1%) and represented 46.1% silent mutations, 32.6% missense, 5.4% non-sense and 15.9% prime UTR regions. While intronic and intergenic regions contained 31.2 and 17.8%, respectively. A genomewide transition/transversion (Tr/Tv) ratio of 1.17 was observed.
GWAS SL-GWAS (MLM) revealed 63 significant QTNs, 57 of them grouped in 9 genomic regions (QTI): 7 for pod length, 1 for cross-section, and 1 for pod color. Six QTN showed a single association (Table S2). QTNs were not detected for PCOL and NSP. Interestingly, twenty-eight QTNs for pod morphological traits were detected at the distal end of chromosome Pv01 (45,582,871 to 48,454, 962) and 9 QTNs for the color vector b* in the telomere of chromosomes Pv07 (32026373-32,413,401).
ML-GWAS, using the six multi-locus models in the mrMLM package, revealed 103 QTN (Tables S3, S4, S5). QTNs were not detected for the index PSC. The mrMLM method detected the most associations (37) while FASTmrEMMA detected the fewest associations (21). In all, 14 significant QTNs were found for pod section traits (Table S3) and the QTN number per character ranged from 5 for PSH and only one for PSA and PSW. For pod length traits, 52 QTN were detected (Table S4), with 18 of them identified by at least two different GWAS methods. The number of QTNs ranged from 2 for PL/PLC a and 10 for PLP, PLA, PLC and NSP. These QTNs were mostly located in the telomeric regions of Pv01 and Pv02. Concerning pod color measured by CIE space, a total of 27 QTNs were detected (11 for the vector L*, 9 for a* and 7 for b*) while 10 QTNs were detected for pod color measured visually as  Table 1). Non-significant correlations (α = 0.05) are indicated with X a qualitative character (Table S5). These QTNs were mostly located in telomeric regions of chromosomes Pv02 (7) and Pv08 (5 The 166 QTNs detected 62 QTL, 23 for pod length, 6 for cross-section, 18 for pod color and 6 for number of seeds per pod as well as 9 QTL associated with multiple characters (Table 1). Most QTL were located on chromosomes Pv02 (12), Pv04 (7), Pv08 (7), and Pv10 (11) (see Fig. 2).

Co-location of QTL
Genomic positions for 96 previously reported QTL [10,12,14,15,19] for pod morphological traits in common bean were examined for overlap with the QTL identified in this work (Fig. 2). There were 15 genomic regions where a reported QTL and QTL detected in this study for pod traits overlapped ( Table 2; Fig. 2). These regions were located on seven chromosomes (Pv01, Pv02, Pv03, Pv04, Pv05, Pv06 Pv08 and Pv11). The beginning of chromosome Pv02 (542087-959,169) only co-located QTL for pod color, whereas the other overlapping QTL were associated with pod morphological traits or both morphological and pod color traits.

Discussion
Pod morphology and color are important traits in common bean because they influence consumer preference for pods which are eaten as green beans for many genotypes. This study identified genomic regions controlling pod traits in the Spanish Diversity Panel. This panel encompasses wide genetic [22] and phenotypic variation for pod color, pod size, pod cross section, and number of seeds per pod (see Figure S1). For instance, variation in pod length ranged between 26.5 and 7.5 cm, for SDP203 a Romano type with a very large green pod and SDP138 with a very short and flat green pod, respectively. Pod color varied from green to yellow to purple and by quantitative classification (CIE scale). The experimental design used (randomized complete block with a repetition per season) may affect the accuracy of trait estimation, particularly in a large trial. However, the recorded traits have high heritability and for most traits, H 2 estimates were high, suggesting a few major genes were involved. Results of correlation analysis support the grouping of the traits in four characters (pod length, pod cross section, and pod color traits, and seeds per pod). Most traits within a pod character were significantly correlated (Fig. 2). Number of seeds per pod (NSP), a major yield component [28], was only significantly correlated with pod length, pod area, pod perimeter and pod length curved.
Most existing methods used in association studies are based on single marker association in genome-wide scans with population structure and consider stringent methods to control false positive rate [29,30] so that some associations may not be detected by single locus models [29,30]. ML-GWAS showed a total of 103 associations with pod traits (14 for pod cross section characters, 42 for pod length character, 10 for number of seeds per pod and 37 for pod color) while SL-GWAS revealed 63 associations (3 for pod cross section, 50 for pod length, 10 for number of seeds per pod and 10 for pod color). All these association were grouped in a 62 QTL; 23 QTL involved in pod length characters, 6 in pod cross section characters, 18 in pod color, 6 in NSP and 9 in two more characters (Table 1; Fig. 2).
We observed that 18 QTL were co-located with earlier described QTL for pod size in various populations. QTL  [15].
Phvul.001G173700, a homologue of the Arabidopsis gene TTG2 (At2G37260), which affects seed size and weight in Arabidopsis and underlies a QTL for silique length in Brassica napus [26], is a candidate gene in this region.
In summary, GWAS revealed new and known genomic regions with QTL influencing pod size, pod color and number of seeds per pod. The 44 newly identified regions involved in the genetic control of pod size or color should be verified in future genetic analysis. The eightteen regions overlapping to previously reported QTL provide relevant information for the development of breeding programs and genetic analysis focused on these characters.

Plant material
The Spanish Diversity Panel (SDP) of 308 bean lines was described by Campa et al. [22]. Briefly, the SDP includes: 220 landraces, mostly from the updated Spanish Core Collection; 51 elite cultivars, mostly cultivated in Europe for snap bean consumption; and 37 lines representing traditional old cultivars and well-known breeding lines. The sequenced bean genotypes, G19833 [20] and BAT93 [41] were included as representatives of the Andean and Mesoamerican gene pools. The panel exhibits wide phenotypic variation for pod traits (see Figure S3). The population structure and linkage disequilibrium, described previously by Campa et al. [22], indicates two main groups corresponding to the Andean and Mesoamerican gene pools and a third group with admixture of both gene pools.

Phenotyping
The SDP was phenotyped in the greenhouse at Villaviciosa, Spain (43°2901 N, 5°2611 W; elevation 6.5 m)   Table 4). Fresh pods were harvested at the beginning of R8 stage when pods had reached maximum length and seeds began to enlarge. Twelve quantitative characters included pod longitudinal (PLP, PLA, PLW, PL, PLC, PL/PLC) and  Table 4 List of the 17 pod traits analysed. The code assigned to each character is indicated in parentheses. * 1, measured from digital images; 2, manually measured Classified as green, yellow, purple, mottled green, and mottled yellow cross section (PSP, PSA, PSW, PSH, PSH/PSW, PSC) dimensions that were obtained from 10 scanned fresh pods per line with the help of Tomato Analyzer software [42]. The external pod color was visually recorded as green, yellow, mottled green, mottled yellow and purple.
To record variation within phenotypic classes, the fresh pod color was also quantified with Tomato Analyzer software measuring three vectors in the CIE scale: L* detects the brightness from 0 (black) to 100 (white), a* represents color from green (negative values) to red (positive values), and the b* measures blue (negative values) to yellow (positive values). In parallel, the fresh pod color was visually recorded as green, yellow, mottled green, mottled yellow and purple. Finally, the number of seeds per pod (NSP) was manually recorded as an average from 10 pods. Mean values were adjusted identifying outliers through the coefficient of variation (CV). CV over 25% were not accepted. The phenotypic variation for individual traits was visualized by frequency distributions generated by ggplot2 [43]. Pearson's correlation coefficients among the traits were also investigated using the package 'corrplot' [44]. The broad-sense heritability (H 2 ) for each trait was estimated using the package 'heritability' [45]. The heritability and subsequent statistical analyses of the phenotypic data were conducted in R platform [46].

Genotyping
Genotyping-by-sequencing (GBS), as described by Elshire et al. [47], was conducted at BGI-Tech (Copenhagen, Denmark) using the ApeKI restriction enzyme. A GBS sequencing library was prepared by ligating the digested DNA to unique nucleotide adapters (barcodes) followed by PCR with flow-cell attachment site tagged primers. Sequencing was performed using Illumina HiSeq4000 and 100x Paired-End. The sequencing reads from different genotypes were deconvoluted using the barcodes and aligned to the Phaseolus vulgaris L. v2 reference genome (https://phytozome.jgi.doe.gov/ pz/portal.html#!info?alias=Org_Pvulgaris).
SNP discovery and genotype calling were conducted using NGSEP-GBS pipeline [48,49]. Maximum base quality score was set to 30 and minimum quality for reporting a variant was set to 40. All SNP markers detected with less than 50% missing values and a minor allele frequency (MAF) 0.05 were retained to perform imputation with ImputeVCF module into NGSEP, which is a reimplementation of the Hidden Markov Model (HMM) implemented in the package fastPHASE (http:// stephenslab.uchicago.edu/software.html). Annotation of variants was performed using the command Annotate by NGSEP. The distribution of the SNPs along chromosomes was visualized with the CMplot package (https:// github.com/YinLiLin/R-CMplot) of the R project [46]. SNPs were named considering physical position in the bean genome: chromosome and genomic position (bp).
Association analysis were carried out in the three data set; two seasons (spring 2017 and 2018) and the mean of two seasons. Critical threshold of significance was -log(p) > 5 for SL-GWAS and LOD > 5 for ML-GWAS. Significant trait-SNP (QTN) associations were considered when detected in the three analysis. Quantitative trait intervals (QTI) were defined when several QTNs were located at distance less than 0.3 Mbp. Significant QTN were classified as QTL according to the pod character (PodL, PodS, PodCol and NSP) and named considering the genomic position (chromosome and position Mbp).

QTL alignment
For QTL alignments, published mapping data from four independent studies that reported QTL for pod morphological traits in common bean [10,[12][13][14][15]19] were considered. Physical position was used to investigate the correspondence between the genomic regions identified in this work with the previously reported QTL. The physical position of QTL from the literature were based on flanking or underlying markers which were aligned with the bean reference (G19833) genome sequence v2.1 using the BLASTN algorithm (https://phytozome.jgi.doe.gov/pz/ portal.html). Marker sequences were obtained from PhaseolusGenes (http://phaseolusgenes.bioinformatics. ucdavis.edu/) or tag sequences containing the SNP supplied by the GBS analysis. ShinyCircos package [52] was used to visualize the position of each QTL in the bean genome from the underlying markers.

Candidate genes mining
Potential candidate genes were investigated in the bean genome v2.1 (www.phytozome.net) through exploration of the functional annotation of the genes underlying the detected QTL. In the case of single QTN, a window ±75, 000 bp from the QTN position was considered. Genes with a known function in developmental processes were considered. In addition, homologous genes to genes involving in the control of silique traits in Arabidopsis thaliana model species were examined [26,36].