Genome wide association mapping for agronomic, fruit quality, and root architectural traits in tomato under organic farming conditions

Background Opportunity and challenges of the agriculture scenario of the next decades will face increasing demand for secure food through approaches able to minimize the input to cultivations. Large panels of tomato varieties represent a valuable resource of traits of interest under sustainable cultivation systems and for genome-wide association studies (GWAS). For mapping loci controlling the variation of agronomic, fruit quality, and root architecture traits, we used a heterogeneous set of 244 traditional and improved tomato accessions grown under organic field trials. Here we report comprehensive phenotyping and GWAS using over 37,300 SNPs obtained through double digest restriction-site associated DNA (dd-RADseq). Results A wide range of phenotypic diversity was observed in the studied collection, with highly significant differences encountered for most traits. A variable level of heritability was observed with values up to 69% for morphological traits while, among agronomic ones, fruit weight showed values above 80%. Genotype by environment analysis highlighted the strongest genotypic effect for aboveground traits compared to root architecture, suggesting that the hypogeal part of tomato plants has been a minor objective for breeding activities. GWAS was performed by a compressed mixed linear model leading to 59 significantly associated loci, allowing the identification of novel genes related to flower and fruit characteristics. Most genomic associations fell into the region surrounding SUN, OVATE, and MYB gene families. Six flower and fruit traits were associated with a single member of the SUN family (SLSUN31) on chromosome 11, in a region involved in the increase of fruit weight, locules number, and fruit fasciation. Furthermore, additional candidate genes for soluble solids content, fruit colour and shape were found near previously reported chromosomal regions, indicating the presence of synergic and multiple linked genes underlying the variation of these traits. Conclusions Results of this study give new hints on the genetic basis of traits in underexplored germplasm grown under organic conditions, providing a framework for the development of markers linked to candidate genes of interest to be used in genomics-assisted breeding in tomato, in particular under low-input and organic cultivation conditions. Supplementary Information The online version contains supplementary material available at 10.1186/s12870-021-03271-4.

Tripodi et al. BMC Plant Biol (2021) 21:481 Background The expectations of the society for the next decades on the agricultural systems rely on food chains able to guarantee healthy and secure food productions. While the needs concern the adequate levels of supply for the expected growing population, food availability and security are threatened by climate changes and increasing pressures on natural resources [1]. The awareness of more sustainable food systems towards the reduction of the loss of biodiversity and environmental degradation increased the attention to organic farming as a system able to minimize excess of external chemical inputs such as synthetic fertilizers and pesticides [2]. The organic agriculture sector is developing rapidly and its acreage increased by 55% over the past decade, covering globally over 1.5% of the total farmland. In this frame, the European Union represents the second biggest market with Spain, France, and Italy ranking among the eight World top countries in terms of organic cultivation acreage [3].
Tomato (Solanum lycopersicum L.) is one of the most important horticultural crops largely appreciated for its high health and nutritional contribution and for its economic value. As a vegetable crop, it represents one of the pillars of the Mediterranean diet being second only to potato in terms of total production [4]. Since the beginning of the twentieth century, intense breeding activities have been principally focused on the development of morphologically diverse and high-yielding varieties [5]. These efforts have been mostly addressed for high input conditions while few attempts have been carried out for low input and/or organic farming. It has been estimated that over 95% of organic production is based on varieties bred for the conventional sector and lacking relevant features for the organic low-input production condition [6]. Therefore, there is an urgent need to establish breeding programs for the development of resilient cultivars able to increase the competitiveness of organic systems [6]. A large reservoir of traits is present in the existing array of traditional varieties adapted to natural and/or marginal environments in which they have been differentially selected by local farmers across years. This germplasm is considered highly valuable in terms of characters related to agro-ecological adaptation as well as consumer preferences [7]. Indeed, the related phenotypes are the complex sum of factors somehow connected to the evolutionary and developmental pressure that determined the adaptability in response to diverse environmental conditions [8]. A comprehensive description of phenotypic profiles is the prior step for promoting the use of the extant diversity as well as a cornerstone for establishing new breeding programs. Therefore, its exploration is fundamental for adding value to local genotypes rediscovering those traits lost in modern varieties that have been mostly developed for their yielding capacity and attitudes to processing [9].
The increasing availability of large-scale genotypic data of germplasm resources for many crops has been benefited by the innovation in genomics and cutting-edge genotyping platforms. Linking phenomic information to genomic sequence data has provided invaluable opportunities for the dissection of the genetic basis of complex traits implemented in genome-wide association mapping studies (GWAS). GWAS has emerged as a powerful approach for the identification of genetic variants present at a significantly higher frequency for a target phenotype in unrelated individuals [10]. The advantage over bi-parental mapping population relies on a broader genetic base to exploit and the higher map resolution due to the larger number of recombination events occurring in natural populations. Since its development, the GWA computational efficiency has been bolstered by integrating the population structure covariates and the family kinship inference into the linear mixed model (LMM). This approach has been successfully applied in crops for both morphologic and agronomic traits [11]. It reduces the rate of false positives increasing the power of association through correction based on the relationships of individuals [12]. Furthermore, the compression approach (CLMM) and the population parameters previously determined (P3D), by clustering individuals into groups and eliminating the re-computation of variance components, improves statistical power by reducing bias due to any occurring substructures in genetic association datasets [13].
Tomato has been a pioneering crop for QTL mapping on crossing populations for agronomic and yield related traits. In the last decade, several GWA studies have been performed resulting in the identification of novel loci mostly for fruit quality, metabolites and flavor-related compounds [14][15][16]. Further efforts to investigate major agronomic traits have been performed in core sets of improved and wild cultivars [17]. More recently, studies have been directed to the dissection of several morphoagronomic and plant architecture features in panels of approximately 160 wild, cultivated and cerasiforme genotypes [18] as well as in 126 landraces and vintage cultivars [19].
All those studies have been focalized on the epigeous part of the plant leading to the identification of several associated genomic regions, whereas fewer attempts have been done for root traits. The root system is critical for water absorption and nutritional elements supply, providing physical support for plant growth. Most of the improved cultivars being adapted to high input conditions result in the development of smaller root systems [20]. Exploring root architecture, therefore, facilitates the identification of genotypes able to perform more efficiently in suboptimal conditions such as limited availability of nutrients, drought, or salinity.
In the present study, we performed a broad phenotypic characterization for morphological, agronomic, and root traits in 244 cultivated tomato genotypes grown in two organic field trials under typical Mediterranean environmental conditions. Then we linked the diversity observed at the phenotypic level with genomic data obtained through Restriction Site Associated DNA Sequencing by implementing a GWAS analysis. The population's size used is adequate to provide a comprehensive catalogue of phenotypic information and to improve the detection power of GWAS. The obtained results will shed light on the potentiality of the collection studied for the organic farming sector and for planning further breeding programs. Furthermore, the novel associated regions may provide useful insight into functional studies.

Phenotypic diversity
A wide range of phenotypic diversity was observed in the considered germplasm collection (Fig. 1). Significant differences between accessions (P < 0.001) were found for 25 traits, whereas pests in foliage and fruits showed a lower significance threshold (P < 0.01) ( Table 1). Only the length and diameter of the three largest roots, as well as the diseases in foliage and fruits, were not significant. On average, higher heritability was found for morphological plant and fruit traits, although two agronomic traits, fruit set/truss and fruit weight displayed heritability values Fig. 1 Overview of the phenotypic diversity of the collection studied. a Fruits on the plant and their diversity; b Leafy and not leafy inflorescence; c Flowers with exerted and inserted styles; d Variability for fruit size, shape, colour and green shoulder; e Puffiness and locules number; f Density of fine roots and radicular crown angle; g Type of inflorescence with mature fruits: uniparous and compound above 70 and 80%, respectively. For all traits, the coefficient of phenotypic variation was from moderate to high with values ranging from 9.56% (pH) to 176.72% (fruit weight). Values of the phenotypic coefficient of variance (PCV) were higher than those of the genotypic coefficient of variance (GCV), with a ratio PCV/GCV ranging from 1.02 to 1.52 for fruit weight and disease in fruits, respectively. The highest GCV values were found for puffiness appearance, fruit set/truss and fruit weight (Table 1). These traits also exhibited the highest genetic advance as a percentage of the mean (GAM) with values above 98%.
On average, GAM was comparable between morphological and agronomic traits, although, slightly higher in the former. Lower general values were instead observed for root traits. The lowest levels (< 10%) were observed for pH and Radicular Crown Angle.

Morphological plant and fruit descriptors
Regarding traits in qualitative scale, the collection was largely composed of indeterminate accessions with inflorescences mostly uniparous or compound with a reduced number of accessions having a fishbone inflorescence Table 1 Descriptive statistics for pseudoqualitative and quantitative traits, broad sense heritability (H 2 ), phenotypic and genotypic coefficients of variation (PCV and GCV, respectively), and genetic advance as percent of population mean (GAM), for traits analysed in two environments (Alcasser, Spain; Monsampolo del Tronto, Italy) on 244 tomato cultivars a ***, ** indicate significance at P < 0.001, P < 0.01, respectively; NS not significant (Additional File 1: Figure S1). The external fruit colour at maturity was mostly red with several long shelf life (LS) accessions having pink external colour. Genotypes with yellow fruits were instead found among heirlooms, landraces for fresh consumption and cultivars. A wide diversity was found for berry morphology with a different rate of variability between accessions groups (AG) being higher in heirlooms (HL) and lower in breeding lines (BL). We observed the occurrence of all the classes of shape, although the most represented were the flattened or circular ones. Blossom end scar was mostly closed in all AG with a high proportion of cultivars, landraces for fresh consumption as well as heirlooms holding both open and closed blossom end. (Additional File 1: Figure S1). As for pseudo-qualitative traits, many accessions showed intermediate foliage density and a medium or strong vigor of plants (Fig. 2a, b). Few accessions had leafy inflorescences with shoots, whereas the style position was mostly inserted or at the same level as the stamens favoring self-pollination (Fig. 2c, d). The fruit set sequence was on average higher in Spain for all accession  (Fig. 2e), whereas less differences were found for green shoulder which ranged from absent to dark green, with most accessions having light green or medium green shoulders (Fig. 2f ). Remarkably, the accessions had a low level of puffiness and medium level of firmness (Fig. 2g, h). Larger diversity was instead observed for the ribbing at the calyx end, although only a minor portion of genotypes had a strong or very strong ribbing (Fig. 2i).
For three morphological traits (style position, external colour, plant vigor), the range of variation did not cover the entire descriptors scale, in fact, the evaluated panel did not include individuals with highly exerted style position, green colour of mature fruits, and very strong plant vigor. Among accessions groups, breeding lines did not show any accessions with shoots in leafy inflorescence, very good fruit set sequence, and severe puffiness, while landraces for fresh consumption and heirlooms encompassed the largest variability in fruit shapes (Additional File 1: Figure S1, Table S1).

Agronomic traits
Plants and fruits were evaluated for agronomic traits, including their global health status during trials conduction. All accessions showed a slightly higher level of radial cracking with respect to concentric cracking ( Fig. 3) with similar levels at both locations. The same trend was observed for blossom-end rot which was generally absent or low at both locations. On the contrary, fruit fasciation resulted higher in Spain reaching intermediate or severe values in several accessions of the considered cultivar classes. At both sites, the lowest levels of fruit fasciation were found for long shelf-life types, with no significant differences among cultivar groups in Italy (Table S1). The incidence of symptoms of pests in foliage was very low in all cultivar groups with higher values in Spain than in Italy. The opposite trend was observed for pests in fruits and diseases in foliage and fruits, all showing higher average values in Italy. Overall, the incidence of pests and diseases was low and quite similar among varietal groups, with no significant differences observed for pests in Italy and for disease in foliage in Spain.
Marked divergences were found among accessions groups and locations for quantitative trait performances (Fig. 4). An earlier fruit maturation was observed in Italy with an average difference of 11 days over Spain (Fig. 4a, Table S2). Among cultivar groups, the LS types matured later than the others at both locations. Ripening uniformity was instead homogeneous for all accessions (Fig. 4b) being 7.99 and 8.72 days in Italy and Spain, respectively (Table S2).
In average, landraces for fresh consumption had the largest locules number as well as the greatest variability (Fig. 4c). The same trend was observed for fruit weight which, in addition, exhibited the highest value in all cultivar groups in Spain (Fig. 4d).
For both fruit weight and number of fruits per truss (Fig. 4e), the variation among accessions was very high, reaching differences of 500-fold between individuals with  Table 2. a-b ripening related traits; c-f yield related traits; g-i chemical traits; j-n root traits the largest and the smallest berry. We observed overall yields higher in Spain than in Italy with average increases in all cultivar groups ranging from 39% for the Heirlooms to 47% for LS types (Fig. 4f, Table S1). On the contrary, the soluble solids content was higher in Italy than in Spain with average values of 5.13 and 4.49, respectively (Fig. 4g, Table S2). On average, breeding lines had the highest average values for soluble solids content although landraces for fresh consumption with high values were found at both locations (Table S1). In general, the accessions with high soluble solids content had very low fruit weight (e.g., BT00500 showed the highest soluble solid content with 9.35 of Brix° and fruit weight 8.72 g in Italy and 7.46 of Brix° and fruit weight 9.32 g in Spain; data not shown). Although the pH was almost similar among accessions not showing any significant differences between growth sites, the total acidity was higher in Italy for all accession groups (Fig. 4h, i). Overall, in both environments, we found several local varieties with better performance for yield-and quality-related traits with respect to heirlooms and fresh cultivars. A remarkable variation was found for root traits. Among them, only radicular crown angle did not show any significant variation between cultivation sites and among accessions groups (Fig. 4j, Table S1). The measures carried out on the 3 main roots exhibited the highest values in Italy for all cultivar groups with average values 61 and 74% higher than in Spain for the length and the diameter, respectively (Fig. 4k, l, Tables S1-S2). On the contrary, the diameter of the main root and the density of fine roots were on average greater in Spain although the highest values were found in some heirlooms and cultivars grown in Italy (Fig. 4m, n, Table S1).

Genotypic and environmental components underlying trait variation
Results of the combined analysis of variance for the traits evaluated across growing sites are given in Table 2. A high level of significance (P < 0.001) was found for the genotypic component (G) for all traits except for pests in fruits and for the field scored diseases. For most traits, the main source of variability was due to G, which accounted on average for 40.46% of the total sums of squares (TSS%), ranging from 13.11% for DFr to 81.05% for FW. Among morphological characteristics, on average G accounted for 47.15%, while for the agronomic and root traits, on average G accounted for 42.06 and 43.75%, of TSS% respectively.
Partitioning of TSS% in the other components affecting the variation indicated that E and G × E accounted for an average of 12.61 and 23.99%, respectively, resulting in a generally lower influence on the analyzed parameters. Foliage density, blossom-end rot and radicular crown angle did not show any significant variation due to the site, whereas significance at P < 0.05 was found for GS, LN, RU and PH. Four traits including LI, BR, PFr and DFr were not influenced by G × E factors, while a low significance was observed on CC, FW and PH.
The length and the diameter of roots were the most affected by site environmental differences showing a TSS of 41.36 and 42.82%, respectively. The G × E interaction was on average higher for root traits (TSS equal to 15.07%). respect to agronomic (TSS equal to 13.88%) and morphological (TSS equal to 13.79%) ones. Fruit fasciation and ripening uniformity were those for which the variation was highly influenced by G × E with TSS values of 31.30 and 23.12%, respectively.

Multivariate analysis and correlations
The principal component analysis (PCA) in the first two dimensions explained 30.13% of the total variance ( Fig. 5, Additional file 1: Figure S2). The first six components explained over 50% of the variation. The first component, accounting for 18.93% of the total variance, was positively correlated to 22 of the traits scored and included all pseudo-qualitative agronomic traits. The second component which explained 11.20% of the total variance was positively correlated to 24 traits and included all root characterizations. Morphological traits and agronomic quantitative measures showed a wide distribution being present in both negative and positive parts of the biplot. Only fruit set sequence and fruit set truss were negatively correlated to both components. The eigenvalues and the variable contribution for principal components are reported in Table S3. Fruit weight and diameter of 3 Largest Roots were the main factors discriminating the genotypes under study accounting for 12 and 17.13% of the total variation of the first and second components, respectively.
The projection of the accessions on the two-dimensional PCA graph evidenced the wide variability of accessions groups according to phenotypic characterizations highlighting a partial differentiation of the long shelf-life types from the rest, which were mostly distributed in the positive axis of the second component. Fresh consumption landraces were instead mainly distributed in the positive loadings of the first component. Cultivars and heirlooms were more interspersed although centroids cluster closely in the negative plot of the first two PCs.
The Spearman rank correlation coefficients after Bonferroni correction calculated for qualitative and quantitative traits revealed how some traits were rather independent, whereas a group of traits clustered together because of a reciprocal tight correlation for the same category of measures (e.g., root traits, yieldrelated traits) (Fig. 6). Interestingly, we found at both locations positive significant correlations between fruit weight and total yield with different morphological features, suggesting how plant architecture could be a good indicator for the selection of performing genotypes. Soluble solid content was negatively correlated at both sites with yield, fruit weight and locule number. As expected, the number of fruit locules was strongly correlated with the increase of fruit weight. Finally, fruit acidity was negatively correlated with total yield and fruit weight, and positively correlated with soluble solids.

Population stratification
Partitioning of the studied germplasm collection based on 37,317 SNPs and independent STRU CTU RE imputations at a range of K-values from 1 to 14 and Evanno's test (Additional file 1: Figure S3) showed K = 5 as the most likely number of subpopulations (Fig. 7a). Two Table 2 Analysis of variance and significant levels for genotypic (G) and environmental effects due to location (E) for tested lines and combined effects for the pseudoqualitative and quantitative traits evaluated *, **, ***, significant at P < 0.05, P < 0.01, P < 0.001, respectively; NS not significant df degrees of freedom, TSS total sum of squares, F F ratio large subgroups were detected including 85 (K = 2) and 114 (K = 3) individuals, whereas the remaining subpopulations were composed by 18 (K = 1), 12 (K = 4), and 2 (K = 5) accessions (Table S4). The remaining 12 genotypes were classified as admixed. All the accessions groups were represented in the detected subpopulations except for K5 which included only two LS genotypes. The maximum likelihood-based phylogenetic analysis based on Tamura-Nei model divided the collection into two main clusters: a smaller one mostly including LS accessions, while the remaining genotypes were clustered in different subgroups in the second branch. The dendrogram better highlighted the presence of groups based on varietal types (Fig. 7b). The genetic relationships based on PCoA (Fig. 7c) confirmed the presence of specific sub clusters for the established accessions groups, highlighting a major dispersion of the improved cultivars. The degree of relationship between elements of phenotypic and molecular matrices showed significant correlations between morpho-agronomic and genetic data (r = − 0.836, P < 0.0001).
The estimate of r 2 for all pairs of linked SNP loci were used to assess the extent of linkage disequilibrium decay (Additional File 1: Figure S4). Across the genome, a rapid LD decay was observed (r 2 < 0.20) within less than 50 kb genomic regions.

Genome-wide association mapping
Genome wide association analysis was performed following a univariate composite linear mixed model implementing structure covariates (five sub-populations) as fixed effects and genetic relatedness (kinship matrix) as random effect. For fifteen traits, 47 significantly associated variants on 8 chromosomes, were consistently identified in both environments (Table 3). Twelve additional associations were found for six traits in only one of the sites (Table S5). Most peaks were in intergenic regions at a distance of 2.75 kb to 1.6 Mb from known genes, and associations underlying seven genomic regions were found on five chromosomes. A circular plot summarizing GWA analysis and singular Manhattan and QQ plots are shown in Fig. 8a and Additional File 1: Figure S5. The largest number of associations were found for fruit puffiness appearance (Fig. 8b). A ribosome maturation factor located on the top of chromosome 8 was associated to the increase of puffiness (Fig. 9a). The underlying candidate gene was part of a cluster of variants encoding for specific metabolic enzymes and for zinc  Table 1 finger proteins which are involved in several plant growth mechanisms as well as abiotic stress response [21]. In the same region, ~ 6.0 Mbp downstream, an intergenic significant association for fruit shape was detected (Fig. 8b).
Colocalization of associations for several flower and fruit traits including, inflorescence, style position fruit fasciation, number of locules, fruit weight and ribbing at calyx end was found for sun-like protein 31 (SlSUN31). In all instances, the minor allele determines an increase of the trait (Fig. 9b-g). In the same genomic region, 71.15 kb upstream, a reduction of locules numbers was found associated with a thylakoid lumenal protein (Fig. 9h, i).
An additional cluster in the range 11-23 Mbp on the long arm of chromosome 5 enclosed SNPs associated to style position, puffiness, and diameter of the main root, whereas two SNPs for fruit weight were found associated downstream at 36-38 Mbp. All detected variants fall in intergenic regions closely linked to candidate Fig. 6 Spearman's rank correlation coefficients between pairs of phenotypes. Correlation coefficients are indicated in each cell. Coloured correlations are those with P value < 0.05 after Bonferroni correction. Colour intensity is directly proportional to the coefficients. On the right side of the correlogram, the legend colour shows the correlation coefficients and the corresponding colours. Correlogram for traits scored in Spain is placed below the diagonal, the correlogram for traits scored in Italy is placed upside the diagonal. Trait acronyms are listed in Table 1 genes underlying common physiological and developmental processes in plants.
Significant associations within genes were found on the long arms of chromosomes 1, 2 and 8 for fruit colour, soluble solid contents, and growth habit, respectively. For fruit colour, a chromatin remodeling protein (chr24) was putatively responsible for the decrease of intensity of fruit colour from red to pink (Fig. 9j), whereas a haloacid dehalogenase-like hydrolase (had) protein was associated to a consistent increase of soluble solids content (Fig. 9k). The determined growth habit (Fig. 9l) was associated to a single genetic variant on chromosome 8 underlying adenylate isopentenyltransferase, an enzyme involved in the biosynthetic pathway of cytokinin in higher plants [22].
Associations in six chromosomes were found within the Spanish' environment for ripening uniformity, foliage density and radial cracking, whereas two variants associated to radicular crown angle and total yield were found on chromosomes 8 and 11 for the Italian growth site. In all cases, SNPs were in intergenic regions in a range of − 12.25 + 45.48 kb from the candidate gene (Table S5). For density of fine roots, two independent associations were found on the chromosomes 1 and 7 in Italy and Spain, respectively. For the latter, a basichelix-loop-helix (bhlh) transcription factor was a candidate for the increase of density of fine roots.  Upstream and downstream SNPs close to candidate genes are specified with "-" and "+, " respectively. 0 indicates that SNPs falls within the candidate gene

Breeding potentialities for the established cultivated tomato panel
Since its domestication, tomato has undergone extensive changes leading to the development of diverse morphotypes for plant architecture and fruit features. The advent of modern breeding has targeted the increase of yield through disease resistant cultivars able to satisfy the everincreasing global demand. Dynamic market trends and changes in consumers' preferences of recent decades have subsequently widened breeding targets to quality-related traits and shelf-life, for which the potentialities are principally held in vintage cultivars such as local varieties and heirlooms [23]. Years of selection across often marginal environments and alien introgression transfer from wild ancestors broadened the genetic and phenotypic diversity of these germplasm resources providing a suitable reservoir of alleles for breeding in unadapted genotypes and/ or for low-input cultivation conditions [24].
In recent years, farms converted to organic systems and related consumer's demand for organic products have both drastically increased. In this scenario, Southern European countries such as Italy, Spain, and France, represent almost 50% of the total organic acreage area of Europe [25]. While the profitability and convenience of organic farming is still a matter of debate [26], there is an undoubted need to develop suitable cultivars for the sector. Objectives for organic breeding include relevant agronomic, morphological and quality traits' performance under low-input conditions. To that end, we have explored a diversified cultivated tomato collection representative of the diversity enclosed in ancient and modern varieties, in two organic farms of typical Mediterranean environments.
The considerable level of diversity found for the 37 traits analyzed suggests the presence of suitable cultivars keeping outstanding levels of productivity and fruit quality for cultivation and marketing in the organic sector, as well as an interesting source of variation for breeding purposes. As expected, high heritability values coupled with low variation due to E and G × E interaction were observed for several morphological traits. Interestingly, we found strong genotypic control for fruit weight. This trait is one of the most extensively domesticated in tomatoes and is regulated by major genes/QTLs responsible for the high range of variation [27]. The observed high genetic advance (GAM) combined with high heritability suggests an additive gene action on their expression enhancing conditions for selection [28]. On the contrary, the lower levels of heritability coupled with the low GAM for root architecture highlights how the development of the hypogeal part of plants has a more complex control, with high environmental effect on it. The diameter of the main root and radicular crown angle were lesser influenced by the differences between cultivation sites suggesting the stronger genotypic control for the parts of root anatomy ensuring primary function in plant anchorage and nutrient absorption. This was supported by positive correlations with growth habit, ripening earliness and plant vigour found at both sites, which although non-significant, suggest a link among the main root and traits related to plant fitness. Furthermore, the observed correlations suggest that the root apparatus could play a role in ripening earliness and in the accumulation of soluble solids. Although these two traits are not intercorrelated, the level of sucrose is reported to enhance the ethylene receptor genes and their signaling during tomato fruit maturation [29]. The inverse correlation between °Brix and yield-related traits may be likely due to the minor concentration of the total soluble solids in fruits increasing their size and/or in plants increasing their yield. In addition, we found higher values for chemical traits (SS, AC) and for precocity of ripening in all cultivar groups grown during the late spring-end summer cycle (Italy). This trend could be linked to a synergic effect of increased sunlight intensity and temperature [30,31]. Overall, the multivariate approach of investigation allowed a better understanding of phenotypic performance in low input conditions for the studied tomato collection as well as of genotypic and environmental factors underlying the variation of traits. Such abundant variation made the core collection suitable for investigation of the genetic basis of traits assessed. Therefore, we performed a genome-wide association mapping analysis.

Association mapping for candidate genes identification
The tomato panel herein reported is part of a larger collection of 288 individuals previously characterized for its genomic diversity and population structure [32]. By removing the highly redundant accessions and segregating lines, we selected a core set of 244 genotypes used for genome-wide association mapping. The analysis with STRU CTU RE performed in the present study, identified a likely number of subpopulations and admixed individuals in agreement with the previous results obtained by using ADMIXTURE [32], confirming the absence of any misinterpretation of ddRAD data in diploid species using both the bayesian approach and the maximum likelihood framework [33]. The collection consisted of five groups of accessions selected based on pre-existing phenotyping information (e.g., fruit colour and morphology, tolerance to stresses), geographic origin and market preferences. Both genomic and phenotypic data highlighted a greater variability within the cultivars group than in the rest of groups, likely due to the more intense breeding activity respect to landraces and heirlooms. This observation is supported by Shouten et al. [34] who reported two boosts of diversity in tomato from the second half of the twentieth century aimed at the introgression of disease resistance alleles from wild relatives as well as at the improvement of quality. Genome-wide association analysis through the CLMM model allowed to identify a total of 59 significant signals associated to 21 traits. This method represents an improvement of the linear mixed model in terms of computational and statistical powers being most suitable to minimize false positives due to population structure [10]. We used the Bonferroni correction for multiple testing to controls the family-wise error. Although conservative, this method allows avoiding any excessive claims, being suitable when background LD exists between SNPs, but they are assumed to be independent (low LD) [35].
Analyses allowed to identify associations in previously reported genomic regions as well as new candidates. Growth habit in tomato is regulated by self-pruning, for which, the loss-of-function mutated gene (sp) is one of the most common used for breeding modern cultivars [36]. We found a single mutation not previously reported for the candidate gene underlying cytokine biosynthesis, essential hormones for growth and development [22], suggesting a possible role for plant habit.
Fruit shape, weight, colour, and content of soluble solids are among the most studied traits in tomato for which underlying genes and their function have been described [37]. Major genes responsible for the increase of fruit shape are ovate and sun conferring fruit elongated as well as fas and lc controlling locule number. The sunlike protein 31 (SlSUN31) detected in this study is part of the 34-member sun-like gene family controlling fruit elongation and being reported to be expressed also in buds and young flowers [38]. Similarly, a cluster of flowers and fruit traits associated to the same candidate gene (Solyc11g071840) have been described [39]. Furthermore, Mata-Nicolás et al. [18] found the same genomic region associated to the increase of locules number and to uniparous inflorescence, whereas Ruggeri et al. [14]. attributed a fruit weight association to the same locus. This gene is located 2 Mbp upstream of cell size regulator (CSR, Solyc11g071940) reported as a candidate of fw11.3, a major QTL controlling fruit weight by increasing cell size and found in a similar region as fas [40]. We also found fruit fasciation associated to SlSUN31 confirming previous reports mapping fas in the same region of CSR. Furthermore, SlSUN31 is located at approximately 3 Mbp from the region (Solyc11g064850) involved in branched inflorescences of tomato-fin mutants [41], although the effect found in our study leads to a compound inflorescence compared to what was previously reported [18]. The associations found on chromosome 11 correspond to regions previously reported to be under selection within wild and cultivated tomato germplasm. Razifard et al. [42] reported selective sweeps for locule numbers at 1.879 Kb from SlSUN31. Lin et al. [43] identified improvement sweeps leading to the increasing of fruit mass in correspondence of fw11.3. All these findings highlighted how the distal end of the long arm of chromosome 11 carry selective sweeps signals involved in the domestication and improvement of tomato confirming SlSUN31 as a potential candidate gene for improving traits related to flower and fruit morphology.
Two additional associations were detected for fruit weight on chromosomes 4 and 5, the former in the region of fw4.1 reported as a major increasing fruit weight in tomato [44], whereas the latter is likely a new genomic region although QTLs for fruit weight on chromosome 5 have been reported in a tomato mapping population [45].
A single association was found for fruit shape, for which the variant was in the range of 29 Mbp from fs8.1 conferring a blocky and slightly elongated shape [46], and at 47 Mbp from SlOFP16, a member of the ovate family proteins key regulators of fruit shape [37]. Furthermore, on chromosome 8, several peaks upstream of the association for fruit shape might be responsible for the variation of puffiness. These findings confirm the presence of a broad region on chromosome 8 containing clusters of genes involved in the variation of morphological characteristics of fruits [47]. Moreover, we found associations on different chromosomes for the puffiness, evidencing the presence of different involved genomic regions as observed earlier [48].
Among quality-related traits, the soluble solids content is important for taste and sweetness in tomato. We detected a single variant on chromosome 2 matching with Solyc02g081820.3.1, which encodes for an Haloacid dehalogenase-like hydrolase. The candidate gene falls 3 Mbp upstream of a previously identified region encoding for an UV excision repair protein (Solyc02g085840.2) [15] and in the range of 1.0-3.5 Mbp downstream from transporters of glucose (Solyc02g079220.2) [49], transporters of amino acids (Solyc02g070270) [50], and proteasome degradation protein (Solyc02g081700) [19]. Furthermore, the association maps in the same interval of ssc2.1 and ssc2.2, two QTLs found in tomato RIL and MAGIC populations [51] as well as close to Brix 2.2 described in a mapping population involving cherry tomato [52]. In addition, within the same genomic region, SSC has been found to colocalize with locule number and fruit weight [19,53]. Although we did not observe such clustering, our results shed light on the presence of multiple linked genes underlying the soluble solids variation in the interval of 40-45 Mbp of tomato chromosome 2.
Another strong association responsible for the reduced intensity of colour was detected on chromosome 1, at ~ 2 Mbp from the previously identified interval underlying fruit colour and β-carotene variation [19]. The candidate gene chromatin remodeling 24 (Solyc01g068280.3.1) falls in the same genomic region of SlMYB12, a gene harbouring the y mutation responsible for the reduction of accumulation of naringenin chalcone leading to pink-coloured fruits [54]. The most represented fruits in the studied collection were pink-and red-coloured, therefore, the above-mentioned association suggests the presence of possible additive genes underlying this phenotype on the short arm of chromosome 1.
The root apparatus system represents the hidden part of the plant and in tomato fewer efforts are reported in comparison to other crops for the identification of the genetic basis of its architecture. The number of associations were quite low compared to those found in the epigeal part. Interestingly, for the diameter of the main root, we found a peak on chromosome 11 falling at 50 Kb from a bidirectional sugar transporter sweet, which is involved in sucrose efflux through the phloem contributing to its migration in plant tissues and as transient storage reserves in the stem and root [55].
Overall, the extent of the phenotypic and genetic variability enclosed in the present diversified collection and exploited in the association mapping approach allowed to narrow down the genomic regions underlying important traits for tomato breeding as well as identify novel ones. By confirming the previously identified SNPs, the association model adopted validates the approach employed for exploiting our mapping panel suggesting the robustness of the genome-wide associations detected for marker assisted selection and breeding under organic conditions.

Conclusion
In this study, we performed phenotypic characterization and genome wide association analysis in a broad and heterogeneous cultivated tomato panel grown under organic conditions. We found a broad range of variation in the traditional and improved gene-pool, confirming their potentialities as a reservoir of traits to exploit in genetic improvement. The information obtained consolidate those from previous association mapping studies using mixture of sub-varieties of S. lycopersicum (e.g., var. cerasiforme) or close related wild species (e.g., S. pimpinellifolium). We confirmed the strong involvement of different genomic regions for the traits studied, highlighting a major one on chromosome 11 underlying several flower and fruit traits. These results open the perspective for further approaches through functional validation. Furthermore, these variants, if appropriately validated, are a powerful tool for genomic-assisted breeding in tomato, particularly to obtain new cultivars adapted to organic cultivation.

Plant material
The collection used in this study is composed of 244 cultivated tomato (S. lycopersicum) accessions retrieved from the Universitat Politècnica de València (UPV, Spain), the Research Centre for Vegetable and Ornamental Crops (CREA, Italy) and the Tomato Genetics Resource Center (TGRC; USA) genebanks. The panel is part of a larger germplasm collection described by Esposito et al. [32] and consists of five accession groups (AG) including 60 landraces for fresh consumption (FC), 58 long shelf-life landraces (LS) known also as 'da Serbo' , 'de Penjar' , 'de Ramellet' , 44 heirloom varieties (HL), 71 elite cultivars (CL) and 11 breeding lines (BL). Landraces mainly represent the Mediterranean area whereas heirlooms and cultivars come from different regions of the world. Detailed information regarding the accessions is reported in Table S4.

Field trials and Phenotyping
Accessions were grown in two organic certified fields during the spring-summer season of 2019. The first field was in a private farm of the municipality of Alcàsser (province of Valencia, Spain) (39°37′ N, − 0°44′ E). The second was located at the Research Centre for Vegetable and Ornamental Crops (CREA, Monsampolo del Tronto, AP, Italy) (42°53′ N, 13°48′ E) following an agro-ecological management approach based on both conservation tillage and crop diversification strategies as described by Campanelli and Canali (2012) [56]. At both sites, the average temperatures during the growing season were ~ 20-22 °C with peaks of 38 °C and 40 °C in Spain and Italy, respectively.
According to the typical tomato cycles of the two growing areas, plants were sown at the end of February (Spain) and at the beginning of April (Italy) and transplanted to the open field after 6 weeks. A biodegradable plastic mulch film was used to avoid competition with weeds. Plants were grown with at a density of 2.6 plant/m 2 using a system of double rows in which canes from adjacent rows were tied together forming a triangle-shaped structure. A drip irrigation system was used for water supply whereas fertilization consisted of the application of an organic fertilizer. At both locations, a randomized block design was followed with three blocks and one replicate, consisting of four plants, per block.
All plants in each block were phenotyped for 5 qualitative and 9 pseudo-qualitative traits related to plant architecture and fruit characteristics as well as for 23 agronomic and root traits of which 9 are pseudo-qualitative and 14 are quantitative. All traits were scored at both locations except root weight which was scored only in Italy. Roots were assessed at the end of the cycle (about 150 days after transplant) firstly removing soil and taking care not to damage the root system, then removing the substrate residues by carefully shaking the roots. Details of traits analyzed scale and method of measurement are listed in Table S6. Details of root characterization are described in Additional file 1: Figure S6. All data have been manually reviewed and curated prior to analysis.

Phenotypic data analyses
The generalized linear model (GLM) was used to analyze pseudo-qualitative and quantitative traits through the F test in the ANOVA to detect significant differences. Trait means differences among the AG and within each AG were compared by using Tukey HSD (honest significant difference) test at P = 0.05. Broad-sense heritability (H 2 ) was estimated as the proportion of phenotypic variation σ 2 p due to genotypic variation σ 2 G : The phenotypic coefficient of variation (PCV) in percentage was calculated as the ratio of the square root for the phenotypic variation (σ P ) on the mean (μ) for the considered trait: whereas the genotypic coefficient of variation (GCV) was estimated as: The genetic advance as percent of population mean was also derived as: Where k is the selection intensity at 5% (2.06). (1) A two factorial linear model was used to determine the effects of the genotype, environment, and their interaction on trait performance: where μ is the grand mean, "G" is the random effect of genotype "i", "E" is the environmental main effect corresponding to environment "j", ε represent the error term.
Mean square values (MS), were used to estimate the magnitude of the observed effect while the total sum of squares in percentage (TSS%) was calculated dividing the TSS of the effect by the total TSS. Experimental data were statistically elaborated using R. Correlations across the genotypes for phenotypic traits were calculated using the Spearman's test at P < 0.05 after Bonferroni's correction for multiple comparisons. The correlogram was constructed and visualized using the Corrplot package implemented in R version 3.0.2 (https:// github. com/ taiyun/ corrp lot). Principal component analysis (PCA) was carried out to determine which are the most effective descriptors in discriminating among accessions using the computer package XLSTAT 2012.1 and visualizing the similarities among accessions. The relationships between pairs of phenotypic and molecular data matrices were computed by the Mantel test using Pearson's r-value.

Population structure analysis
The accessions included in the tomato panel were genotyped by Esposito et al. [32] using Double Digest Restriction Associated DNA (ddRAD) Sequencing. A set of 238,644 SNPs were identified in the set of 244 accessions from mapping raw reads to the Solanum lycopersicum SL 4.0 pseudomolecule assembly (www. solge nomics. net). Missing genotype calls were imputed using LD k-nearest neighbor imputation (LD-kNNi) implemented in TAS-SEL. 13 Markers were filtered using VCFtools (http:// vcfto ols. sourc eforge. net/) with a call rate value lower than 95% and with minor allele frequency (MAF) lower than 5%. After filtering for call rate and MAF, a total of 37,317 SNPs was subsequently used for the downstream analysis. The population structure was inferred using a Bayesian model-based approach integrated by neighbor joining phylogenetic analyses. The model-based analysis was performed using Structure v2.3.4 and admixture model with allele frequencies correlated among populations [57]. Runs were done using 20,000 burn-in cycles followed by 10,000 Monte Carlo -Markov Chain (MCMC) iterations, a number of sub-populations (K) ranging between 1 and 14 with five independent runs for each K. The most probable numbers of sub-populations were determined according to Evanno's method using Structure Harvester (http:// taylo r0. biolo gy. ucla. edu/ struc tureH arves ter/). Once defined the optimal K value, a second Structure (5) µij = µ + Gi + Ej + (G × E)ij + ε run was repeated at the best K value to maximize the accuracy in determining the membership of each accession. The same parameters as above were used, except for the number of burn-in and MCMC iterations (150,000 and 100,000 respectively). Accessions were considered to belong to a specific sub-population if its membership coefficient (qi) was ≥ 0.50, whereas the genotypes with qi lower than 0.5 at each assigned K were considered as admixed.
A phylogenetic tree was drawn using the Maximum Likelihood method and the Tamura-Nei model with 100 bootstraps. Analyses were conducted in MEGA X software (https:// www. megas oftwa re. net/). Principal coordinate analysis (PCoA) was conducted with the Past 3.04 software using Euclidean distance [58].
The r 2 statistic was estimated for each pair of SNPs using Plink v1.09 [59] to investigate the linkage disequilibrium (LD) decay in the entire population by using 37,317 SNP markers. Mean r 2 was computed into 1 Kb intervals and LD decay curves was fitted using a non-linear model, with distance (Mb) values on the x-axis and r 2 values on the y-axis.

Genome wide association study
Genome-wide association mapping using a compressed mixed linear model (CMLM) with population parameters previously defined (P3D) was performed in TAS-SEL [13,60]. To minimize the confounding effects (e.g., population stratification, unequal relatedness among individuals) the model incorporated population structure covariates (Q matrix, according to the best number of 5 sub-populations) and the kinship (K matrix) estimated using the centered identity by state (IBS) for accounting relationships among individuals. The analysis has jointly implemented phenotypic data from the two locations. For traits for which no association was found, the analysis was performed considering each grown site independently. The significance threshold for marker-trait association was determined after Bonferroni multiple test correction with genome-wide α = 0.05. The P values was estimated according to the formula: Considering 37,317 SNPs, the marker was considered significant when the P value was less than 5.873 (−log 10 P = 1.339 × 10 − 6 ). The phenotypic variation explained by each marker was the R 2 -value obtained from CLMM model. Circular Plot, Manhattan plot and QQ plots for GWAS results were produced using the R Package CMplot. The chromosomal location of the genome-wide significantly associated SNPs was (6) Pvalue = 0.05 no. of markers displayed using PhenoGram (https:// ritch ielab. org/ softw are/ pheno gram).

In silico candidate gene identification
Physical mapping of significantly associated SNPs and functional annotation of the predicted underlying genes were performed using the Solanum lycopersicum SL4.0 genome browser (https:// solge nomics. net/ jbrow se_ solge nomics). For associated SNPs mapping within intron regions, we considered the nearest genes located upstream or downstream of the significant markers. Gene models were blasted against Tomato Genome Proteins (ITAG release 4.0) to determine the gene annotation.

Availability of data and materials
The genomic raw sequence data analysed during the current study are available in the NCBI-SRA (National Center for Biotechnology Information Short Read Archive) repository database under the accession number BioProject ID PRJNA638535. The phenotypic raw datasets used and analysed during the current study available from the corresponding author on reasonable request.