Skip to main content

Genome wide association mapping for agronomic, fruit quality, and root architectural traits in tomato under organic farming conditions



Opportunity and challenges of the agriculture scenario of the next decades will face increasing demand for secure food through approaches able to minimize the input to cultivations. Large panels of tomato varieties represent a valuable resource of traits of interest under sustainable cultivation systems and for genome-wide association studies (GWAS). For mapping loci controlling the variation of agronomic, fruit quality, and root architecture traits, we used a heterogeneous set of 244 traditional and improved tomato accessions grown under organic field trials. Here we report comprehensive phenotyping and GWAS using over 37,300 SNPs obtained through double digest restriction-site associated DNA (dd-RADseq).


A wide range of phenotypic diversity was observed in the studied collection, with highly significant differences encountered for most traits. A variable level of heritability was observed with values up to 69% for morphological traits while, among agronomic ones, fruit weight showed values above 80%. Genotype by environment analysis highlighted the strongest genotypic effect for aboveground traits compared to root architecture, suggesting that the hypogeal part of tomato plants has been a minor objective for breeding activities. GWAS was performed by a compressed mixed linear model leading to 59 significantly associated loci, allowing the identification of novel genes related to flower and fruit characteristics. Most genomic associations fell into the region surrounding SUN, OVATE, and MYB gene families. Six flower and fruit traits were associated with a single member of the SUN family (SLSUN31) on chromosome 11, in a region involved in the increase of fruit weight, locules number, and fruit fasciation. Furthermore, additional candidate genes for soluble solids content, fruit colour and shape were found near previously reported chromosomal regions, indicating the presence of synergic and multiple linked genes underlying the variation of these traits.


Results of this study give new hints on the genetic basis of traits in underexplored germplasm grown under organic conditions, providing a framework for the development of markers linked to candidate genes of interest to be used in genomics-assisted breeding in tomato, in particular under low-input and organic cultivation conditions.

Peer Review reports


The expectations of the society for the next decades on the agricultural systems rely on food chains able to guarantee healthy and secure food productions. While the needs concern the adequate levels of supply for the expected growing population, food availability and security are threatened by climate changes and increasing pressures on natural resources [1]. The awareness of more sustainable food systems towards the reduction of the loss of biodiversity and environmental degradation increased the attention to organic farming as a system able to minimize excess of external chemical inputs such as synthetic fertilizers and pesticides [2]. The organic agriculture sector is developing rapidly and its acreage increased by 55% over the past decade, covering globally over 1.5% of the total farmland. In this frame, the European Union represents the second biggest market with Spain, France, and Italy ranking among the eight World top countries in terms of organic cultivation acreage [3].

Tomato (Solanum lycopersicum L.) is one of the most important horticultural crops largely appreciated for its high health and nutritional contribution and for its economic value. As a vegetable crop, it represents one of the pillars of the Mediterranean diet being second only to potato in terms of total production [4]. Since the beginning of the twentieth century, intense breeding activities have been principally focused on the development of morphologically diverse and high-yielding varieties [5]. These efforts have been mostly addressed for high input conditions while few attempts have been carried out for low input and/or organic farming. It has been estimated that over 95% of organic production is based on varieties bred for the conventional sector and lacking relevant features for the organic low-input production condition [6]. Therefore, there is an urgent need to establish breeding programs for the development of resilient cultivars able to increase the competitiveness of organic systems [6]. A large reservoir of traits is present in the existing array of traditional varieties adapted to natural and/or marginal environments in which they have been differentially selected by local farmers across years. This germplasm is considered highly valuable in terms of characters related to agro-ecological adaptation as well as consumer preferences [7]. Indeed, the related phenotypes are the complex sum of factors somehow connected to the evolutionary and developmental pressure that determined the adaptability in response to diverse environmental conditions [8]. A comprehensive description of phenotypic profiles is the prior step for promoting the use of the extant diversity as well as a cornerstone for establishing new breeding programs. Therefore, its exploration is fundamental for adding value to local genotypes rediscovering those traits lost in modern varieties that have been mostly developed for their yielding capacity and attitudes to processing [9].

The increasing availability of large-scale genotypic data of germplasm resources for many crops has been benefited by the innovation in genomics and cutting-edge genotyping platforms. Linking phenomic information to genomic sequence data has provided invaluable opportunities for the dissection of the genetic basis of complex traits implemented in genome-wide association mapping studies (GWAS). GWAS has emerged as a powerful approach for the identification of genetic variants present at a significantly higher frequency for a target phenotype in unrelated individuals [10]. The advantage over bi-parental mapping population relies on a broader genetic base to exploit and the higher map resolution due to the larger number of recombination events occurring in natural populations. Since its development, the GWA computational efficiency has been bolstered by integrating the population structure covariates and the family kinship inference into the linear mixed model (LMM). This approach has been successfully applied in crops for both morphologic and agronomic traits [11]. It reduces the rate of false positives increasing the power of association through correction based on the relationships of individuals [12]. Furthermore, the compression approach (CLMM) and the population parameters previously determined (P3D), by clustering individuals into groups and eliminating the re-computation of variance components, improves statistical power by reducing bias due to any occurring substructures in genetic association datasets [13].

Tomato has been a pioneering crop for QTL mapping on crossing populations for agronomic and yield related traits. In the last decade, several GWA studies have been performed resulting in the identification of novel loci mostly for fruit quality, metabolites and flavor-related compounds [14,15,16]. Further efforts to investigate major agronomic traits have been performed in core sets of improved and wild cultivars [17]. More recently, studies have been directed to the dissection of several morpho-agronomic and plant architecture features in panels of approximately 160 wild, cultivated and cerasiforme genotypes [18] as well as in 126 landraces and vintage cultivars [19].

All those studies have been focalized on the epigeous part of the plant leading to the identification of several associated genomic regions, whereas fewer attempts have been done for root traits. The root system is critical for water absorption and nutritional elements supply, providing physical support for plant growth. Most of the improved cultivars being adapted to high input conditions result in the development of smaller root systems [20]. Exploring root architecture, therefore, facilitates the identification of genotypes able to perform more efficiently in suboptimal conditions such as limited availability of nutrients, drought, or salinity.

In the present study, we performed a broad phenotypic characterization for morphological, agronomic, and root traits in 244 cultivated tomato genotypes grown in two organic field trials under typical Mediterranean environmental conditions. Then we linked the diversity observed at the phenotypic level with genomic data obtained through Restriction Site Associated DNA Sequencing by implementing a GWAS analysis. The population’s size used is adequate to provide a comprehensive catalogue of phenotypic information and to improve the detection power of GWAS. The obtained results will shed light on the potentiality of the collection studied for the organic farming sector and for planning further breeding programs. Furthermore, the novel associated regions may provide useful insight into functional studies.


Phenotypic diversity

A wide range of phenotypic diversity was observed in the considered germplasm collection (Fig. 1). Significant differences between accessions (P < 0.001) were found for 25 traits, whereas pests in foliage and fruits showed a lower significance threshold (P < 0.01) (Table 1). Only the length and diameter of the three largest roots, as well as the diseases in foliage and fruits, were not significant. On average, higher heritability was found for morphological plant and fruit traits, although two agronomic traits, fruit set/truss and fruit weight displayed heritability values above 70 and 80%, respectively. For all traits, the coefficient of phenotypic variation was from moderate to high with values ranging from 9.56% (pH) to 176.72% (fruit weight). Values of the phenotypic coefficient of variance (PCV) were higher than those of the genotypic coefficient of variance (GCV), with a ratio PCV/GCV ranging from 1.02 to 1.52 for fruit weight and disease in fruits, respectively. The highest GCV values were found for puffiness appearance, fruit set/truss and fruit weight (Table 1). These traits also exhibited the highest genetic advance as a percentage of the mean (GAM) with values above 98%. On average, GAM was comparable between morphological and agronomic traits, although, slightly higher in the former. Lower general values were instead observed for root traits. The lowest levels (< 10%) were observed for pH and Radicular Crown Angle.

Fig. 1
figure 1

Overview of the phenotypic diversity of the collection studied. a Fruits on the plant and their diversity; b Leafy and not leafy inflorescence; c Flowers with exerted and inserted styles; d Variability for fruit size, shape, colour and green shoulder; e Puffiness and locules number; f Density of fine roots and radicular crown angle; g Type of inflorescence with mature fruits: uniparous and compound

Table 1 Descriptive statistics for pseudoqualitative and quantitative traits, broad sense heritability (H2), phenotypic and genotypic coefficients of variation (PCV and GCV, respectively), and genetic advance as percent of population mean (GAM), for traits analysed in two environments (Alcasser, Spain; Monsampolo del Tronto, Italy) on 244 tomato cultivars

Morphological plant and fruit descriptors

Regarding traits in qualitative scale, the collection was largely composed of indeterminate accessions with inflorescences mostly uniparous or compound with a reduced number of accessions having a fishbone inflorescence (Additional File 1: Figure S1). The external fruit colour at maturity was mostly red with several long shelf life (LS) accessions having pink external colour. Genotypes with yellow fruits were instead found among heirlooms, landraces for fresh consumption and cultivars. A wide diversity was found for berry morphology with a different rate of variability between accessions groups (AG) being higher in heirlooms (HL) and lower in breeding lines (BL). ​We observed the occurrence of all the classes of shape, although the most represented were the flattened or circular ones. Blossom end scar was mostly closed in all AG with a high proportion of cultivars, landraces for fresh consumption as well as heirlooms holding both open and closed blossom end. (Additional File 1: Figure S1).

As for pseudo-qualitative traits, many accessions showed intermediate foliage density and a medium or strong vigor of plants (Fig. 2a, b). Few accessions had leafy inflorescences with shoots, whereas the style position was mostly inserted or at the same level as the stamens favoring self-pollination (Fig. 2c, d). The fruit set sequence was on average higher in Spain for all accession groups (Fig. 2e), whereas less differences were found for green shoulder which ranged from absent to dark green, with most accessions having light green or medium green shoulders (Fig. 2f). Remarkably, the accessions had a low level of puffiness and medium level of firmness (Fig. 2g, h). Larger diversity was instead observed for the ribbing at the calyx end, although only a minor portion of genotypes had a strong or very strong ribbing (Fig. 2i).

Fig. 2
figure 2

Variation for morphological plant and fruit descriptors across cultivar groups in two locations. Stacked bars indicate the proportion of each class for the considered traits on a total scale 0–1. a-d Plant and flower traits; e-i Fruit traits. Details of measurement scale for each trait are in Supplementary Table 2. Locations: IT = Italy, ES = Spain; Cultivar groups: BL = breeding lines, CL = elite cultivars, LS = long shelf-life landraces, FC = landraces for fresh consumption, HL = heirloom varieties

For three morphological traits (style position, external colour, plant vigor), the range of variation did not cover the entire descriptors scale, in fact, the evaluated panel did not include individuals with highly exerted style position, green colour of mature fruits, and very strong plant vigor. Among accessions groups, breeding lines did not show any accessions with shoots in leafy inflorescence, very good fruit set sequence, and severe puffiness, while landraces for fresh consumption and heirlooms encompassed the largest variability in fruit shapes (Additional File 1: Figure S1, Table S1).

Agronomic traits

Plants and fruits were evaluated for agronomic traits, including their global health status during trials conduction. All accessions showed a slightly higher level of radial cracking with respect to concentric cracking (Fig. 3) with similar levels at both locations. The same trend was observed for blossom-end rot which was generally absent or low at both locations. On the contrary, fruit fasciation resulted higher in Spain reaching intermediate or severe values in several accessions of the considered cultivar classes. At both sites, the lowest levels of fruit fasciation were found for long shelf-life types, with no significant differences among cultivar groups in Italy (Table S1). The incidence of symptoms of pests in foliage was very low in all cultivar groups with higher values in Spain than in Italy. The opposite trend was observed for pests in fruits and diseases in foliage and fruits, all showing higher average values in Italy. Overall, the incidence of pests and diseases was low and quite similar among varietal groups, with no significant differences observed for pests in Italy and for disease in foliage in Spain.

Fig. 3
figure 3

Assessment of cultivar groups in Italy and Spain for (pseudo) qualitative agronomic traits. Histograms with error bars indicating ± standard deviations, showing average values for each cultivar group in each location. Qualitative measures are reported on the Y-axis for each trait. For all traits related to agronomic fruit quality, the scale varies from 1 (absent) to 7 (abundant); for pests and diseases in foliage, the scale varies from 1 (very scarce) to 9 (very severe). For cultivar group codes, see Fig. 2

Marked divergences were found among accessions groups and locations for quantitative trait performances (Fig. 4). An earlier fruit maturation was observed in Italy with an average difference of 11 days over Spain (Fig. 4a, Table S2). Among cultivar groups, the LS types matured later than the others at both locations. Ripening uniformity was instead homogeneous for all accessions (Fig. 4b) being 7.99 and 8.72 days in Italy and Spain, respectively (Table S2).

Fig. 4
figure 4

Quantitative agronomic and root trait characterization. Notched box-plots showing median values and quartiles for the different cultivar group in each location. The measurement scale for each trait is reported on the Y-axis, details in Supplementary Table 2. a-b ripening related traits; c-f yield related traits; g-i chemical traits; j-n root traits

In average, landraces for fresh consumption had the largest locules number as well as the greatest variability (Fig. 4c). The same trend was observed for fruit weight which, in addition, exhibited the highest value in all cultivar groups in Spain (Fig. 4d).

For both fruit weight and number of fruits per truss (Fig. 4e), the variation among accessions was very high, reaching differences of 500-fold between individuals with the largest and the smallest berry. We observed overall yields higher in Spain than in Italy with average increases in all cultivar groups ranging from 39% for the Heirlooms to 47% for LS types (Fig. 4f, Table S1). On the contrary, the soluble solids content was higher in Italy than in Spain with average values of 5.13 and 4.49, respectively (Fig. 4g, Table S2). On average, breeding lines had the highest average values for soluble solids content although landraces for fresh consumption with high values were found at both locations (Table S1). In general, the accessions with high soluble solids content had very low fruit weight (e.g., BT00500 showed the highest soluble solid content with 9.35 of Brix° and fruit weight 8.72 g in Italy and 7.46 of Brix° and fruit weight 9.32 g in Spain; data not shown). Although the pH was almost similar among accessions not showing any significant differences between growth sites, the total acidity was higher in Italy for all accession groups (Fig. 4h, i). Overall, in both environments, we found several local varieties with better performance for yield- and quality-related traits with respect to heirlooms and fresh cultivars.

A remarkable variation was found for root traits. Among them, only radicular crown angle did not show any significant variation between cultivation sites and among accessions groups (Fig. 4j, Table S1). The measures carried out on the 3 main roots exhibited the highest values in Italy for all cultivar groups with average values 61 and 74% higher than in Spain for the length and the diameter, respectively (Fig. 4k, l, Tables S1-S2). On the contrary, the diameter of the main root and the density of fine roots were on average greater in Spain although the highest values were found in some heirlooms and cultivars grown in Italy (Fig. 4m, n, Table S1).

Genotypic and environmental components underlying trait variation

Results of the combined analysis of variance for the traits evaluated across growing sites are given in Table 2. A high level of significance (P < 0.001) was found for the genotypic component (G) for all traits except for pests in fruits and for the field scored diseases. For most traits, the main source of variability was due to G, which accounted on average for 40.46% of the total sums of squares (TSS%), ranging from 13.11% for DFr to 81.05% for FW. Among morphological characteristics, on average G accounted for 47.15%, while for the agronomic and root traits, on average G accounted for 42.06 and 43.75%, of TSS% respectively.

Table 2 Analysis of variance and significant levels for genotypic (G) and environmental effects due to location (E) for tested lines and combined effects for the pseudoqualitative and quantitative traits evaluated

Partitioning of TSS% in the other components affecting the variation indicated that E and G × E accounted for an average of 12.61 and 23.99%, respectively, resulting in a generally lower influence on the analyzed parameters. Foliage density, blossom-end rot and radicular crown angle did not show any significant variation due to the site, whereas significance at P < 0.05 was found for GS, LN, RU and PH. Four traits including LI, BR, PFr and DFr were not influenced by G × E factors, while a low significance was observed on CC, FW and PH.

The length and the diameter of roots were the most affected by site environmental differences showing a TSS of 41.36 and 42.82%, respectively. The G × E interaction was on average higher for root traits (TSS equal to 15.07%). respect to agronomic (TSS equal to 13.88%) and morphological (TSS equal to 13.79%) ones. Fruit fasciation and ripening uniformity were those for which the variation was highly influenced by G × E with TSS values of 31.30 and 23.12%, respectively.

Multivariate analysis and correlations

The principal component analysis (PCA) in the first two dimensions explained 30.13% of the total variance (Fig. 5, Additional file 1: Figure S2). The first six components explained over 50% of the variation. The first component, accounting for 18.93% of the total variance, was positively correlated to 22 of the traits scored and included all pseudo-qualitative agronomic traits. The second component which explained 11.20% of the total variance was positively correlated to 24 traits and included all root characterizations. Morphological traits and agronomic quantitative measures showed a wide distribution being present in both negative and positive parts of the biplot. Only fruit set sequence and fruit set truss were negatively correlated to both components. The eigenvalues and the variable contribution for principal components are reported in Table S3. Fruit weight and diameter of 3 Largest Roots were the main factors discriminating the genotypes under study accounting for 12 and 17.13% of the total variation of the first and second components, respectively.

Fig. 5
figure 5

Phenotypic variability of the 244 cultivated tomato genotypes. Scatter plot of the first (PC1) and second (PC2) principal components showing the variation for 31 pseudo-qualitative and quantitative morphological, agronomic and root traits scored in two environments. Based on cultivar groups, accessions are represented by different coloured symbols indicated in the legend. The first and second component centroids for each cultivar groups are indicated by filled yellow symbols with shape and edge colour according to cultivar groups (see legend). The direction from the centre of the biplot indicate how each trait contributes to the first two components. Trait acronyms are listed in Table 1

The projection of the accessions on the two-dimensional PCA graph evidenced the wide variability of accessions groups according to phenotypic characterizations highlighting a partial differentiation of the long shelf-life types from the rest, which were mostly distributed in the positive axis of the second component. Fresh consumption landraces were instead mainly distributed in the positive loadings of the first component. Cultivars and heirlooms were more interspersed although centroids cluster closely in the negative plot of the first two PCs.

The Spearman rank correlation coefficients after Bonferroni correction calculated for qualitative and quantitative traits revealed how some traits were rather independent, whereas a group of traits clustered together because of a reciprocal tight correlation for the same category of measures (e.g., root traits, yield-related traits) (Fig. 6). Interestingly, we found at both locations positive significant correlations between fruit weight and total yield with different morphological features, suggesting how plant architecture could be a good indicator for the selection of performing genotypes. Soluble solid content was negatively correlated at both sites with yield, fruit weight and locule number. As expected, the number of fruit locules was strongly correlated with the increase of fruit weight. Finally, fruit acidity was negatively correlated with total yield and fruit weight, and positively correlated with soluble solids.

Fig. 6
figure 6

Spearman’s rank correlation coefficients between pairs of phenotypes. Correlation coefficients are indicated in each cell. Coloured correlations are those with P value < 0.05 after Bonferroni correction. Colour intensity is directly proportional to the coefficients. On the right side of the correlogram, the legend colour shows the correlation coefficients and the corresponding colours. Correlogram for traits scored in Spain is placed below the diagonal, the correlogram for traits scored in Italy is placed upside the diagonal. Trait acronyms are listed in Table 1

Population stratification

Partitioning of the studied germplasm collection based on 37,317 SNPs and independent STRUCTURE imputations at a range of K-values from 1 to 14 and Evanno’s test (Additional file 1: Figure S3) showed K = 5 as the most likely number of subpopulations (Fig. 7a). Two large subgroups were detected including 85 (K = 2) and 114 (K = 3) individuals, whereas the remaining subpopulations were composed by 18 (K = 1), 12 (K = 4), and 2 (K = 5) accessions (Table S4). The remaining 12 genotypes were classified as admixed. All the accessions groups were represented in the detected subpopulations except for K5 which included only two LS genotypes.

Fig. 7
figure 7

Genomic diversity of the cultivated tomato collection. a STRUCTURE analysis of 244 S. lycopersicum genotypes with 37,317 SNP markers in the case of five clusters (K). The vertical coordinates of each subpopulation indicate the membership coefficient for each individual; each vertical bar represents one genotype. The coloured blocks correspond to the different clusters. b Maximum likelihood phylogenetic tree (unrooted). The initial tree for the heuristic search was obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the Tamura-Nei model, and then selecting the topology with superior log likelihood value. The tree with the highest log likelihood (− 1,427,221.59) is shown. Internal colours correspond to clusters K1 to K5 (see panel a). c PCoA visualization of the genetic relationships between members of the association panel. Coloured symbols in the phylogenetic tree and the PCoA represent accessions from different cultivar group indicated in the legend

The maximum likelihood-based phylogenetic analysis based on Tamura-Nei model divided the collection into two main clusters: a smaller one mostly including LS accessions, while the remaining genotypes were clustered in different subgroups in the second branch. The dendrogram better highlighted the presence of groups based on varietal types (Fig. 7b). The genetic relationships based on PCoA (Fig. 7c) confirmed the presence of specific sub clusters for the established accessions groups, highlighting a major dispersion of the improved cultivars. The degree of relationship between elements of phenotypic and molecular matrices showed significant correlations between morpho-agronomic and genetic data (r = − 0.836, P < 0.0001).

The estimate of r2 for all pairs of linked SNP loci were used to assess the extent of linkage disequilibrium decay (Additional File 1: Figure S4). Across the genome, a rapid LD decay was observed (r2 < 0.20) within less than 50 kb genomic regions.

Genome-wide association mapping

Genome wide association analysis was performed following a univariate composite linear mixed model implementing structure covariates (five sub-populations) as fixed effects and genetic relatedness (kinship matrix) as random effect. For fifteen traits, 47 significantly associated variants on 8 chromosomes, were consistently identified in both environments (Table 3). Twelve additional associations were found for six traits in only one of the sites (Table S5). Most peaks were in intergenic regions at a distance of 2.75 kb to 1.6 Mb from known genes, and associations underlying seven genomic regions were found on five chromosomes. A circular plot summarizing GWA analysis and singular Manhattan and QQ plots are shown in Fig. 8a and Additional File 1: Figure S5. The largest number of associations were found for fruit puffiness appearance (Fig. 8b). A ribosome maturation factor located on the top of chromosome 8 was associated to the increase of puffiness (Fig. 9a). The underlying candidate gene was part of a cluster of variants encoding for specific metabolic enzymes and for zinc finger proteins which are involved in several plant growth mechanisms as well as abiotic stress response [21]. In the same region, ~ 6.0 Mbp downstream, an intergenic significant association for fruit shape was detected (Fig. 8b).

Table 3 Significant genome-wide associations robustly detected in two environments for morphological, agronomic and root traits using a compressed mixed linear model. For each trait, chromosomal position (in bp), main statistic and effects of each association, corresponding annotated gene or close annotated gene are shown
Fig. 8
figure 8

Genome wide association analysis. a Circos plot diagram showing associations for 15 traits. The dashed red line indicates significant threshold (−log10 p-value). For each chromosome is showed the (SNP) density in the tomato collection. The legend scale indicates the number of SNPs within 1 Mbp window size. Significant peaks are represented by blue dots. Each trait is represented by a concentric circle: a) Growth Habit, b) Inflorescence, c) Style Position, d) Fruit Shape, e) Green Shoulder, f) Fruit Color, g) Puffiness, h) Ribbing Calyx End, i) Concentric Cracking, j) Blossom-end rot, k) Fruit fasciation, l) Locules Number, m) Fruit Weight, n) Soluble Solids, o) Diameter of the Main Root. b Significantly associated SNPs, their chromosomal position and cluster of variants on eight tomato chromosomes. Circles represent the association between one genetic variant and one trait. Colors distinguish phenotypes

Fig. 9
figure 9

The effect of SNPs falling within gene regions. Notched box-plots showing the phenotypic performances of traits due to associations within genes. Red boxplot indicates the major allele, green boxplot indicates the mutated minor allele

Colocalization of associations for several flower and fruit traits including, inflorescence, style position fruit fasciation, number of locules, fruit weight and ribbing at calyx end was found for sun-like protein 31 (SlSUN31). In all instances, the minor allele determines an increase of the trait (Fig. 9b-g). In the same genomic region, 71.15 kb upstream, a reduction of locules numbers was found associated with a thylakoid lumenal protein (Fig. 9h, i).

An additional cluster in the range 11–23 Mbp on the long arm of chromosome 5 enclosed SNPs associated to style position, puffiness, and diameter of the main root, whereas two SNPs for fruit weight were found associated downstream at 36–38 Mbp. All detected variants fall in intergenic regions closely linked to candidate genes underlying common physiological and developmental processes in plants.

Significant associations within genes were found on the long arms of chromosomes 1, 2 and 8 for fruit colour, soluble solid contents, and growth habit, respectively. For fruit colour, a chromatin remodeling protein (chr24) was putatively responsible for the decrease of intensity of fruit colour from red to pink (Fig. 9j), whereas a haloacid dehalogenase-like hydrolase (had) protein was associated to a consistent increase of soluble solids content (Fig. 9k). The determined growth habit (Fig. 9l) was associated to a single genetic variant on chromosome 8 underlying adenylate isopentenyltransferase, an enzyme involved in the biosynthetic pathway of cytokinin in higher plants [22].

Associations in six chromosomes were found within the Spanish’ environment for ripening uniformity, foliage density and radial cracking, whereas two variants associated to radicular crown angle and total yield were found on chromosomes 8 and 11 for the Italian growth site. In all cases, SNPs were in intergenic regions in a range of − 12.25 + 45.48 kb from the candidate gene (Table S5). For density of fine roots, two independent associations were found on the chromosomes 1 and 7 in Italy and Spain, respectively. For the latter, a basic-helix-loop-helix (bhlh) transcription factor was a candidate for the increase of density of fine roots.


Breeding potentialities for the established cultivated tomato panel

Since its domestication, tomato has undergone extensive changes leading to the development of diverse morphotypes for plant architecture and fruit features. The advent of modern breeding has targeted the increase of yield through disease resistant cultivars able to satisfy the ever-increasing global demand. Dynamic market trends and changes in consumers’ preferences of recent decades have subsequently widened breeding targets to quality-related traits and shelf-life, for which the potentialities are principally held in vintage cultivars such as local varieties and heirlooms [23]. Years of selection across often marginal environments and alien introgression transfer from wild ancestors broadened the genetic and phenotypic diversity of these germplasm resources providing a suitable reservoir of alleles for breeding in unadapted genotypes and/or for low-input cultivation conditions [24].

In recent years, farms converted to organic systems and related consumer’s demand for organic products have both drastically increased. In this scenario, Southern European countries such as Italy, Spain, and France, represent almost 50% of the total organic acreage area of Europe [25]. While the profitability and convenience of organic farming is still a matter of debate [26], there is an undoubted need to develop suitable cultivars for the sector. Objectives for organic breeding include relevant agronomic, morphological and quality traits’ performance under low-input conditions. To that end, we have explored a diversified cultivated tomato collection representative of the diversity enclosed in ancient and modern varieties, in two organic farms of typical Mediterranean environments.

The considerable level of diversity found for the 37 traits analyzed suggests the presence of suitable cultivars keeping outstanding levels of productivity and fruit quality for cultivation and marketing in the organic sector, as well as an interesting source of variation for breeding purposes. As expected, high heritability values coupled with low variation due to E and G × E interaction were observed for several morphological traits. Interestingly, we found strong genotypic control for fruit weight. This trait is one of the most extensively domesticated in tomatoes and is regulated by major genes/QTLs responsible for the high range of variation [27]. The observed high genetic advance (GAM) combined with high heritability suggests an additive gene action on their expression enhancing conditions for selection [28]. On the contrary, the lower levels of heritability coupled with the low GAM for root architecture highlights how the development of the hypogeal part of plants has a more complex control, with high environmental effect on it. The diameter of the main root and radicular crown angle were lesser influenced by the differences between cultivation sites suggesting the stronger genotypic control for the parts of root anatomy ensuring primary function in plant anchorage and nutrient absorption. This was supported by positive correlations with growth habit, ripening earliness and plant vigour found at both sites, which although non-significant, suggest a link among the main root and traits related to plant fitness. Furthermore, the observed correlations suggest that the root apparatus could play a role in ripening earliness and in the accumulation of soluble solids. Although these two traits are not intercorrelated, the level of sucrose is reported to enhance the ethylene receptor genes and their signaling during tomato fruit maturation [29]. The inverse correlation between °Brix and yield-related traits may be likely due to the minor concentration of the total soluble solids in fruits increasing their size and/or in plants increasing their yield. In addition, we found higher values for chemical traits (SS, AC) and for precocity of ripening in all cultivar groups grown during the late spring-end summer cycle (Italy). This trend could be linked to a synergic effect of increased sunlight intensity and temperature [30, 31]. Overall, the multivariate approach of investigation allowed a better understanding of phenotypic performance in low input conditions for the studied tomato collection as well as of genotypic and environmental factors underlying the variation of traits. Such abundant variation made the core collection suitable for investigation of the genetic basis of traits assessed. Therefore, we performed a genome-wide association mapping analysis.

Association mapping for candidate genes identification

The tomato panel herein reported is part of a larger collection of 288 individuals previously characterized for its genomic diversity and population structure [32]. By removing the highly redundant accessions and segregating lines, we selected a core set of 244 genotypes used for genome-wide association mapping. The analysis with STRUCTURE performed in the present study, identified a likely number of subpopulations and admixed individuals in agreement with the previous results obtained by using ADMIXTURE [32], confirming the absence of any misinterpretation of ddRAD data in diploid species using both the bayesian approach and the maximum likelihood framework [33]. The collection consisted of five groups of accessions selected based on pre-existing phenotyping information (e.g., fruit colour and morphology, tolerance to stresses), geographic origin and market preferences. Both genomic and phenotypic data highlighted a greater variability within the cultivars group than in the rest of groups, likely due to the more intense breeding activity respect to landraces and heirlooms. This observation is supported by Shouten et al. [34] who reported two boosts of diversity in tomato from the second half of the twentieth century aimed at the introgression of disease resistance alleles from wild relatives as well as at the improvement of quality. Genome-wide association analysis through the CLMM model allowed to identify a total of 59 significant signals associated to 21 traits. This method represents an improvement of the linear mixed model in terms of computational and statistical powers being most suitable to minimize false positives due to population structure [10]. We used the Bonferroni correction for multiple testing to controls the family-wise error. Although conservative, this method allows avoiding any excessive claims, being suitable when background LD exists between SNPs, but they are assumed to be independent (low LD) [35].

Analyses allowed to identify associations in previously reported genomic regions as well as new candidates. Growth habit in tomato is regulated by self-pruning, for which, the loss-of-function mutated gene (sp) is one of the most common used for breeding modern cultivars [36]. We found a single mutation not previously reported for the candidate gene underlying cytokine biosynthesis, essential hormones for growth and development [22], suggesting a possible role for plant habit.

Fruit shape, weight, colour, and content of soluble solids are among the most studied traits in tomato for which underlying genes and their function have been described [37]. Major genes responsible for the increase of fruit shape are ovate and sun conferring fruit elongated as well as fas and lc controlling locule number. The sun-like protein 31 (SlSUN31) detected in this study is part of the 34-member sun-like gene family controlling fruit elongation and being reported to be expressed also in buds and young flowers [38]. Similarly, a cluster of flowers and fruit traits associated to the same candidate gene (Solyc11g071840) have been described [39]. Furthermore, Mata-Nicolás et al. [18] found the same genomic region associated to the increase of locules number and to uniparous inflorescence, whereas Ruggeri et al. [14]. attributed a fruit weight association to the same locus. This gene is located 2 Mbp upstream of cell size regulator (CSR, Solyc11g071940) reported as a candidate of fw11.3, a major QTL controlling fruit weight by increasing cell size and found in a similar region as fas [40]. We also found fruit fasciation associated to SlSUN31 confirming previous reports mapping fas in the same region of CSR. Furthermore, SlSUN31 is located at approximately 3 Mbp from the region (Solyc11g064850) involved in branched inflorescences of tomato-fin mutants [41], although the effect found in our study leads to a compound inflorescence compared to what was previously reported [18]. The associations found on chromosome 11 correspond to regions previously reported to be under selection within wild and cultivated tomato germplasm. Razifard et al. [42] reported selective sweeps for locule numbers at 1.879 Kb from SlSUN31. Lin et al. [43] identified improvement sweeps leading to the increasing of fruit mass in correspondence of fw11.3. All these findings highlighted how the distal end of the long arm of chromosome 11 carry selective sweeps signals involved in the domestication and improvement of tomato confirming SlSUN31 as a potential candidate gene for improving traits related to flower and fruit morphology.

Two additional associations were detected for fruit weight on chromosomes 4 and 5, the former in the region of fw4.1 reported as a major increasing fruit weight in tomato [44], whereas the latter is likely a new genomic region although QTLs for fruit weight on chromosome 5 have been reported in a tomato mapping population [45].

A single association was found for fruit shape, for which the variant was in the range of 29 Mbp from fs8.1 conferring a blocky and slightly elongated shape [46], and at 47 Mbp from SlOFP16, a member of the ovate family proteins key regulators of fruit shape [37]. Furthermore, on chromosome 8, several peaks upstream of the association for fruit shape might be responsible for the variation of puffiness. These findings confirm the presence of a broad region on chromosome 8 containing clusters of genes involved in the variation of morphological characteristics of fruits [47]. Moreover, we found associations on different chromosomes for the puffiness, evidencing the presence of different involved genomic regions as observed earlier [48].

Among quality-related traits, the soluble solids content is important for taste and sweetness in tomato. We detected a single variant on chromosome 2 matching with Solyc02g081820.3.1, which encodes for an Haloacid dehalogenase-like hydrolase. The candidate gene falls 3 Mbp upstream of a previously identified region encoding for an UV excision repair protein (Solyc02g085840.2) [15] and in the range of 1.0–3.5 Mbp downstream from transporters of glucose (Solyc02g079220.2) [49], transporters of amino acids (Solyc02g070270) [50], and proteasome degradation protein (Solyc02g081700) [19]. Furthermore, the association maps in the same interval of ssc2.1 and ssc2.2, two QTLs found in tomato RIL and MAGIC populations [51] as well as close to Brix 2.2 described in a mapping population involving cherry tomato [52]. In addition, within the same genomic region, SSC has been found to colocalize with locule number and fruit weight [19, 53]. Although we did not observe such clustering, our results shed light on the presence of multiple linked genes underlying the soluble solids variation in the interval of 40–45 Mbp of tomato chromosome 2.

Another strong association responsible for the reduced intensity of colour was detected on chromosome 1, at ~ 2 Mbp from the previously identified interval underlying fruit colour and β-carotene variation [19]. The candidate gene chromatin remodeling 24 (Solyc01g068280.3.1) falls in the same genomic region of SlMYB12, a gene harbouring the y mutation responsible for the reduction of accumulation of naringenin chalcone leading to pink-coloured fruits [54]. The most represented fruits in the studied collection were pink- and red-coloured, therefore, the above-mentioned association suggests the presence of possible additive genes underlying this phenotype on the short arm of chromosome 1.

The root apparatus system represents the hidden part of the plant and in tomato fewer efforts are reported in comparison to other crops for the identification of the genetic basis of its architecture. The number of associations were quite low compared to those found in the epigeal part. Interestingly, for the diameter of the main root, we found a peak on chromosome 11 falling at 50 Kb from a bidirectional sugar transporter sweet, which is involved in sucrose efflux through the phloem contributing to its migration in plant tissues and as transient storage reserves in the stem and root [55].

Overall, the extent of the phenotypic and genetic variability enclosed in the present diversified collection and exploited in the association mapping approach allowed to narrow down the genomic regions underlying important traits for tomato breeding as well as identify novel ones. By confirming the previously identified SNPs, the association model adopted validates the approach employed for exploiting our mapping panel suggesting the robustness of the genome-wide associations detected for marker assisted selection and breeding under organic conditions.


In this study, we performed phenotypic characterization and genome wide association analysis in a broad and heterogeneous cultivated tomato panel grown under organic conditions. We found a broad range of variation in the traditional and improved gene-pool, confirming their potentialities as a reservoir of traits to exploit in genetic improvement. The information obtained consolidate those from previous association mapping studies using mixture of sub-varieties of S. lycopersicum (e.g., var. cerasiforme) or close related wild species (e.g., S. pimpinellifolium). We confirmed the strong involvement of different genomic regions for the traits studied, highlighting a major one on chromosome 11 underlying several flower and fruit traits. These results open the perspective for further approaches through functional validation. Furthermore, these variants, if appropriately validated, are a powerful tool for genomic-assisted breeding in tomato, particularly to obtain new cultivars adapted to organic cultivation.


Plant material

The collection used in this study is composed of 244 cultivated tomato (S. lycopersicum) accessions retrieved from the Universitat Politècnica de València (UPV, Spain), the Research Centre for Vegetable and Ornamental Crops (CREA, Italy) and the Tomato Genetics Resource Center (TGRC; USA) genebanks. The panel is part of a larger germplasm collection described by Esposito et al. [32] and consists of five accession groups (AG) including 60 landraces for fresh consumption (FC), 58 long shelf-life landraces (LS) known also as ‘da Serbo’, ‘de Penjar’, ‘de Ramellet’, 44 heirloom varieties (HL), 71 elite cultivars (CL) and 11 breeding lines (BL). Landraces mainly represent the Mediterranean area whereas heirlooms and cultivars come from different regions of the world. Detailed information regarding the accessions is reported in Table S4.

Field trials and Phenotyping

Accessions were grown in two organic certified fields during the spring-summer season of 2019. The first field was in a private farm of the municipality of Alcàsser (province of Valencia, Spain) (39°37′ N, − 0°44′ E). The second was located at the Research Centre for Vegetable and Ornamental Crops (CREA, Monsampolo del Tronto, AP, Italy) (42°53′ N, 13°48′ E) following an agro-ecological management approach based on both conservation tillage and crop diversification strategies as described by Campanelli and Canali (2012) [56]. At both sites, the average temperatures during the growing season were ~ 20–22 °C with peaks of 38 °C and 40 °C in Spain and Italy, respectively.

According to the typical tomato cycles of the two growing areas, plants were sown at the end of February (Spain) and at the beginning of April (Italy) and transplanted to the open field after 6 weeks. A biodegradable plastic mulch film was used to avoid competition with weeds. Plants were grown with at a density of 2.6 plant/m2 using a system of double rows in which canes from adjacent rows were tied together forming a triangle-shaped structure. A drip irrigation system was used for water supply whereas fertilization consisted of the application of an organic fertilizer. At both locations, a randomized block design was followed with three blocks and one replicate, consisting of four plants, per block.

All plants in each block were phenotyped for 5 qualitative and 9 pseudo-qualitative traits related to plant architecture and fruit characteristics as well as for 23 agronomic and root traits of which 9 are pseudo-qualitative and 14 are quantitative. All traits were scored at both locations except root weight which was scored only in Italy. Roots were assessed at the end of the cycle (about 150 days after transplant) firstly removing soil and taking care not to damage the root system, then removing the substrate residues by carefully shaking the roots. Details of traits analyzed scale and method of measurement are listed in Table S6. Details of root characterization are described in Additional file 1: Figure S6. All data have been manually reviewed and curated prior to analysis.

Phenotypic data analyses

The generalized linear model (GLM) was used to analyze pseudo-qualitative and quantitative traits through the F test in the ANOVA to detect significant differences. Trait means differences among the AG and within each AG were compared by using Tukey HSD (honest significant difference) test at P = 0.05. Broad-sense heritability (H2) was estimated as the proportion of phenotypic variation σ2p due to genotypic variation σ2G:


The phenotypic coefficient of variation (PCV) in percentage was calculated as the ratio of the square root for the phenotypic variation (σP) on the mean (μ) for the considered trait:

$$\mathrm{PCV}=\left(\frac{\sigma_P\ }{\mu}\right)\times 100$$

whereas the genotypic coefficient of variation (GCV) was estimated as:

$$GCV=\left(\frac{\sigma_G\ }{\mu}\right)\times 100$$

The genetic advance as percent of population mean was also derived as:

$$GAM=\left(\frac{\sigma_p\times k\times H2}{\mu}\right)\times 100$$

Where k is the selection intensity at 5% (2.06).

A two factorial linear model was used to determine the effects of the genotype, environment, and their interaction on trait performance:

$$\mu ij=\mu + Gi+ Ej+\left(G\times E\right) ij+\varepsilon$$

where μ is the grand mean, “G” is the random effect of genotype “i”, “E” is the environmental main effect corresponding to environment “j”, ε represent the error term. Mean square values (MS), were used to estimate the magnitude of the observed effect while the total sum of squares in percentage (TSS%) was calculated dividing the TSS of the effect by the total TSS. Experimental data were statistically elaborated using R. Correlations across the genotypes for phenotypic traits were calculated using the Spearman’s test at P < 0.05 after Bonferroni’s correction for multiple comparisons. The correlogram was constructed and visualized using the Corrplot package implemented in R version 3.0.2 ( Principal component analysis (PCA) was carried out to determine which are the most effective descriptors in discriminating among accessions using the computer package XLSTAT 2012.1 and visualizing the similarities among accessions. The relationships between pairs of phenotypic and molecular data matrices were computed by the Mantel test using Pearson’s r-value.

Population structure analysis

The accessions included in the tomato panel were genotyped by Esposito et al. [32] using Double Digest Restriction Associated DNA (ddRAD) Sequencing. A set of 238,644 SNPs were identified in the set of 244 accessions from mapping raw reads to the Solanum lycopersicum SL 4.0 pseudomolecule assembly ( Missing genotype calls were imputed using LD k-nearest neighbor imputation (LD-kNNi) implemented in TASSEL.13 Markers were filtered using VCFtools ( with a call rate value lower than 95% and with minor allele frequency (MAF) lower than 5%. After filtering for call rate and MAF, a total of 37,317 SNPs was subsequently used for the downstream analysis. The population structure was inferred using a Bayesian model-based approach integrated by neighbor joining phylogenetic analyses. The model-based analysis was performed using Structure v2.3.4 and admixture model with allele frequencies correlated among populations [57]. Runs were done using 20,000 burn-in cycles followed by 10,000 Monte Carlo – Markov Chain (MCMC) iterations, a number of sub-populations (K) ranging between 1 and 14 with five independent runs for each K. The most probable numbers of sub-populations were determined according to Evanno’s method using Structure Harvester ( Once defined the optimal K value, a second Structure run was repeated at the best K value to maximize the accuracy in determining the membership of each accession. The same parameters as above were used, except for the number of burn-in and MCMC iterations (150,000 and 100,000 respectively). Accessions were considered to belong to a specific sub-population if its membership coefficient (qi) was ≥ 0.50, whereas the genotypes with qi lower than 0.5 at each assigned K were considered as admixed.

A phylogenetic tree was drawn using the Maximum Likelihood method and the Tamura-Nei model with 100 bootstraps. Analyses were conducted in MEGA X software ( Principal coordinate analysis (PCoA) was conducted with the Past 3.04 software using Euclidean distance [58].

The r2 statistic was estimated for each pair of SNPs using Plink v1.09 [59] to investigate the linkage disequilibrium (LD) decay in the entire population by using 37,317 SNP markers. Mean r2 was computed into 1 Kb intervals and LD decay curves was fitted using a non-linear model, with distance (Mb) values on the x-axis and r2 values on the y-axis.

Genome wide association study

Genome-wide association mapping using a compressed mixed linear model (CMLM) with population parameters previously defined (P3D) was performed in TASSEL [13, 60]. To minimize the confounding effects (e.g., population stratification, unequal relatedness among individuals) the model incorporated population structure covariates (Q matrix, according to the best number of 5 sub-populations) and the kinship (K matrix) estimated using the centered identity by state (IBS) for accounting relationships among individuals. The analysis has jointly implemented phenotypic data from the two locations. For traits for which no association was found, the analysis was performed considering each grown site independently. The significance threshold for marker-trait association was determined after Bonferroni multiple test correction with genome-wide α = 0.05. The P values was estimated according to the formula:

$$Pvalue=\frac{0.05}{no.\kern0.5em of\ markers\ }$$

Considering 37,317 SNPs, the marker was considered significant when the P value was less than 5.873 (−log10P = 1.339 × 10− 6). The phenotypic variation explained by each marker was the R2-value obtained from CLMM model. Circular Plot, Manhattan plot and QQ plots for GWAS results were produced using the R Package CMplot. The chromosomal location of the genome-wide significantly associated SNPs was displayed using PhenoGram (

In silico candidate gene identification

Physical mapping of significantly associated SNPs and functional annotation of the predicted underlying genes were performed using the Solanum lycopersicum SL4.0 genome browser ( For associated SNPs mapping within intron regions, we considered the nearest genes located upstream or downstream of the significant markers. Gene models were blasted against Tomato Genome Proteins (ITAG release 4.0) to determine the gene annotation.

Availability of data and materials

The genomic raw sequence data analysed during the current study are available in the NCBI-SRA (National Center for Biotechnology Information Short Read Archive) repository database under the accession number BioProject ID PRJNA638535. The phenotypic raw datasets used and analysed during the current study available from the corresponding author on reasonable request.



Genome-wide association study


Minimum allele frequency


Compression linear mixed model


Linear mixed model


Linkage disequilibrium


Markov Chain Monte Carlo


Phenotypic coefficient of variance


Genotypic coefficient of variance


Genetic advance as a percentage of the mean


Long shelf life




Breeding lines


Principal coordinate analysis


Total sums of squares


  1. Anderson R, Bayer PE, Edwards D. Climate change and the need for agricultural adaptation. Curr Opin Plant Biol. 2020;56:197–202.

    Article  PubMed  Google Scholar 

  2. Le Campion A, Oury FX, Heumez E, Rolland B. Conventional versus organic farming systems: dissecting comparisons to improve cereal organic breeding strategies. Org Agr. 2020;10:63–74.

    Article  Google Scholar 

  3. The World of Organic Agriculture In: Statistics Sessions at BIOFACH 2020 Nürnberg, Germany. 2020. Accessed 20 June 2021.

  4. FAOSTAT 2019. Accessed 20 June 2021.

  5. Higashide T, Heuvelink E. Physiological and morphological changes over the past 50 years in yield components in tomato. J Am Soc Hortic Sci. 2009;134:460–5.

    Article  Google Scholar 

  6. Lammerts van Bueren ET, Backes G, de Vriend H, Østergård H. The role of molecular markers and marker assisted selection in breeding for organic agriculture. Euphytica. 2010;175:51–64.

    Article  Google Scholar 

  7. Pérez-Caselles C, Brugarolas M, Martínez-Carrasco L. Traditional varieties for local markets: a sustainable proposal for agricultural SMEs. Sustain. 2020;12:1–19.

    Google Scholar 

  8. Pham B, McConnaughay KÌ. Plant phenotypic expression in variable environments. In: Monson R, editor. Ecology and the environment. The plant sciences, vol. 8. New York: Springer; 2014.

    Chapter  Google Scholar 

  9. Barrios-Masias FH, Jackson LE. California processing tomatoes: morphological, physiological and phenological traits associated with crop improvement during the last 80 years. Eur J Agron. 2014;53:45–55.

    Article  Google Scholar 

  10. Cortes LT, Zhang Z, Yu J. Status and prospects of genome-wide association studies in plants. Plant Genome. 2021;14:1–17.

    Google Scholar 

  11. Wang M, Jiang N, Jia T, Leach L, Cockram J, Waugh R, et al. Genome-wide association mapping of agronomic and morphologic traits in highly structured population of barley cultivars. Theor Appl Genet. 2012;124:233–46.

    Article  PubMed  Google Scholar 

  12. Runcie DE, Crawford L. Fast and flexible linear mixed models for genomewide genetics. PLoS Genet. 2019;15(2):e1007978.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  13. Zhang Z, Ersoz E, Lai CQ, Todhunter RJ, Tiwari HK, et al. Mixed linear model approach adapted for genome-wide association studies. Nat Genet. 2010;42:355–60.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  14. Ruggieri V, Francese G, Sacco A, D'Alessandro A, Manuela M, Parisi MG, et al. An association mapping approach to identify favoruable alleles for tomato fruit quality breeding. BMC Plant Biol. 2014;14:337.

    Article  PubMed Central  PubMed  CAS  Google Scholar 

  15. Sauvage C, Segura V, Bauchet G, Stevens R, Thi Do P, Nikoloski Z, et al. Genome wide association in tomato reveals 44 candidate loci for fruit metabolic traits. Plant Physiol. 2014;165:1120–32.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  16. Zhao J, Sauvage C, Zhao J, et al. Meta-analysis of genome-wide association studies provides insights into genetic control of tomato flavor. Nat Commun. 2019;10:1534.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  17. Bauchet G, Grenier S, Samson N, Bonnet J, Grivet L, Causse M. Use of modern tomato breeding germplasm for deciphering the genetic control of agronomical traits by genome wide association study. Theor Appl Genet. 2017;130(5):875–89.

    Article  CAS  PubMed  Google Scholar 

  18. Mata-Nicolás E, Montero-Pau J, Gimeno-Paez E, et al. Exploiting the diversity of tomato: the development of a phenotypically and genetically detailed germplasm collection. Hortic Res. 2020;7:66.

    Article  PubMed Central  PubMed  CAS  Google Scholar 

  19. Rodriguez M, Scintu A, Posadinu CM, Xu Y, Nguyen CV, Sun H, et al. GWAS based on RNA-Seq SNPs and high-throughput Phenotyping combined with climatic data highlights the reservoir of valuable genetic diversity in regional tomato landraces. Genes. 2020;23:1387.

    Article  CAS  Google Scholar 

  20. White PJ, George TS, Gregory PJ, Bengough AG, Hallett PD, McKenzie BM. Matching roots to their environment. Ann Bot. 2013;112:207–22.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  21. Wang K, Ding Y, Cai C, Chen Z, Zhu C. The role of C2H2 zinc finger proteins in plant responses to abiotic stresses. Physiol Plant. 2018.

  22. Kieber JJ, Schaller GE. Cytokinin signaling in plant development. Development. 2018;478:337–42.

    Google Scholar 

  23. Tieman D, Zhu G, Resende M Jr, Lin T, Nguyen C, Bies D, et al. A chemical genetic roadmap to improved tomato flavor. Science. 2017;355:391–4.

    Article  CAS  PubMed  Google Scholar 

  24. Foolad MR. Panthee DR marker-assisted selection in tomato breeding. Crit Rev Plant Sci. 2012;31:93–123.

    Article  Google Scholar 

  25. Eurostat. 2020. Accessed 20 June 2021.

  26. Röös E, Mie A, Wivstad M, Salomon E, Johansson B, Gunnarsson S, et al. Risks and opportunities of increasing yields in organic farming. A review. Agron Sustain Dev. 2018;38:331.

    Article  CAS  Google Scholar 

  27. Bai Y, Lindhout P. Domestication and breeding of tomatoes: what have we gained and what can we gain in the future? Ann Bot. 2007;100:1085–94.

    Article  PubMed Central  PubMed  Google Scholar 

  28. Falconer DS. Introduction to Quantitative Genetics, Ed. 3. Harlow, Essex, UK/New York: Longmans Green/John Wiley & Sons; 1989.

    Google Scholar 

  29. Li DD, Mou WS, Wang YS, Li L, Mao LC, Ying TJ, et al. Exogenous sucrose treatment accelerates postharvest tomato fruit ripening through the influence on its metabolism and enhancing ethylene biosynthesis and signaling. Acta Physiol Plant. 2016;38:225.

    Article  CAS  Google Scholar 

  30. Beckles DM. Factors affecting the postharvest soluble solids and sugar content of tomato (Solanum lycopersicum) fruit. Postharvest Biol Tec. 2012;63:129–40.

    Article  CAS  Google Scholar 

  31. Vijayakumar A, Shaji S, Beena R, Sarada S, Sajitha Rani T, Stephen R, et al. High temperature induced changes in quality and yield parameters of tomato (Solanum lycopersicum L.) and similarity coefficients among genotypes using SSR markers. Heliyon. 2021;7:e05988.

    Article  PubMed Central  PubMed  Google Scholar 

  32. Esposito S, Cardi T, Campanelli G, et al. ddRAD sequencing-based genotyping for population structure analysis in cultivated tomato provides new insights into the genomic diversity of Mediterranean ‘da serbo’ type long shelf-life germplasm. Hortic Res. 2020;7:134.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  33. Skotte L, Korneliussen TS, Albrechtsen A. Estimating individual admixture proportions from next generation sequencing data. Genetics. 2013;195:693–702.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  34. Schouten HJ, Tikunov Y, Verkerke W, Finkers R, Bovy A, Bai Y, et al. Breeding has increased the diversity of cultivated tomato in the Netherlands. Front Plant Sci. 2019;10:1606.

    Article  PubMed Central  PubMed  Google Scholar 

  35. Nyholt DR. A simple correction for multiple testing for single-nucleotide polymorphisms in linkage disequilibrium with each other. Am J Hum Genet. 2004;74:765–9.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  36. Carmel-Goren L, Liu YS, Lifschitz E, Zamir D. The SELFPRUNING gene family in tomato. Plant Mol Biol. 2003;52:1215–22.

    Article  CAS  PubMed  Google Scholar 

  37. Rodríguez G.R, Kim HJ, van der Knaap E. Mapping of two suppressors of OVATE (sov) loci in tomato. Heredity 2013;111, 256–264.

  38. Huang Z, Van Houten J, Gonzalez G, Xiao H, van der Knaap E. Genome-wide identification, phylogeny and expression analysis of SUN, OFP and YABBY gene family in tomato. Mol Gen Genomics. 2013;288:111–29.

    Article  CAS  Google Scholar 

  39. Sacco A, Ruggieri V, Parisi M, Festa G, Rigano MM, Picarella ME, et al. Exploring a tomato landraces collection for fruitrelated traits by the aid of a high-throughput genomic platform. PLoS One. 2015;10(9):e0137139.

    Article  PubMed Central  PubMed  Google Scholar 

  40. Mu Q, Huang Z, Chakrabarti M, Illa-Berenguer E, Liu X, Wang Y, et al. Fruit weight is controlled by cell size regulator encoding a novel protein that is expressed in maturing tomato fruits. PLoS Genet. 2017;13:e1006930.

    Article  PubMed Central  PubMed  CAS  Google Scholar 

  41. Xu C, Liberatore KL, MacAlister CA, et al. A cascade of arabinosyl transferases controls shoot meristem size in tomato. Nat Genet. 2015;47:784–92.

    Article  CAS  PubMed  Google Scholar 

  42. Razifard H, Ramos A, Della Valle AL, et al. Genomic evidence for complex domestication history of the cultivated tomato in Latin America. Mol Biol Evol. 2020;37:1118–32.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  43. Lin T, Zhu G, Zhang J, Xu X, Yu Q, Zheng Z, et al. Genomic analyses provide insights into the history of tomato breeding. Nat Genet. 2014;46:1220–6.

    Article  CAS  PubMed  Google Scholar 

  44. Grandillo S, Ku HM, Tanksley SD. Identifying the loci responsible for natural variation in fruit size and shape in tomato. Theor Appl Genet. 1999;99:978–87.

    Article  CAS  Google Scholar 

  45. Causse M, Duffe P, Gomez MC, Buret M, Damidaux R, Zamir D, et al. A genetic map of candidate genes and QTLs involved in tomato fruit size and composition. J Exp Bot. 2004;55(403):1671–85.

    Article  CAS  PubMed  Google Scholar 

  46. Ku HM, Grandillo S, Tanksley SD. fs8.1 a major QTL sets the pattern of tomato carpel shape well before anthesis. Theor Appl Genet. 2000;101:873–8.

    Article  CAS  Google Scholar 

  47. Sun L, Rodriguez GR, Clevenger JP, Illa-Berenguer E, Lin J, Blakeslee JJ, et al. Candidate gene selection and detailed morphological evaluations of fs8.1, a quantitative trait locus controlling tomato fruit shape. J Exp Bot. 2015;66:6471–82.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  48. Frary A, Fulton TM, Zamir D, Tanksley SD. Advanced backcross QTL analysis of a Lycopersicon esculentum X L. pennellii cross and identification of possible orthologs in the Solanaceae. Theor Appl Genet. 2004;108:485–96.

    Article  CAS  PubMed  Google Scholar 

  49. Ye J, Li WF, Ai G, Li CX, Liu GZ, Chen WF, et al. Genome-wide association analysis identifies a natural variation in basic helix-loop-helix transcription factor regulating ascorbate biosynthesis via D-mannose/L-galactose pathway in tomato. PLoS Genet. 2019;15(5):e1008149 pmid: 31067226.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  50. Albert E, Segura V, Gricourt J, Bonnefoi J, Derivot L, Causse M. Association mapping reveals the genetic architecture of tomato response to water deficit: focus on major fruit quality traits. J Exp Bot. 2016;67:6413–30.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  51. Pascual L, Albert E, Sauvage C, Duangjit J, Bouchet JP, Bitton F, et al. Dissecting quantitative trait variation in the resequencing era: complementarity of bi-parental, multi-parental and association panels. Plant Sci. 2016;242:120–30.

    Article  CAS  PubMed  Google Scholar 

  52. Lecomte L, Duffe P, Buret M, Servin B, Hospital F, Causse M. Marker-assisted introgression of five QTLs controlling fruit quality traits into three tomato lines revealed interactions between QTLs and genetic backgrounds. Theor Appl Genet. 2004;109:658–68.

    Article  CAS  PubMed  Google Scholar 

  53. Ranc N, Muños S, Xu J, Le Paslier MC, Chauveau A, Bounon R, et al. Genome-wide association mapping in tomato (Solanum lycopersicum) is possible using genome admixture of Solanum lycopersicum var. cerasiforme. G3. 2012;2:853–64.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  54. Ballester AR, Molthoff J, de Vos R. Biochemical molecular analysis of pink tomatoes, deregulated expression of the gene encoding transcription factor SlMYB12 leads to pink tomato fruit color. Plant Physiol. 2010;152:71–84 pmid: 19906891.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  55. Chen LQ. SWEET sugar transporters for phloem transport and pathogen nutrition. New Phytol. 2014;201:1150–5.

    Article  CAS  PubMed  Google Scholar 

  56. Campanelli G, Canali S. Crop production an environmental effects in conventional and organic vegetable farming systems: the case of a long-term experiment in Mediterranean conditions (Central Italy). J Sustain Agric. 2012;36(6):599–619.

    Article  Google Scholar 

  57. Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics. 2000;155:945–59.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  58. Hammer Ø, Harper DAT, Ryan PD. PAST: paleontological statistics software package for education and data analysis. Palaeontol Electron. 2001;4:9.

    Google Scholar 

  59. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7.

    Article  PubMed Central  PubMed  CAS  Google Scholar 

  60. Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. 2007;23(19):2633–5.

    Article  CAS  PubMed  Google Scholar 

Download references


The authors wish to acknowledge Nazzareno Acciarri for the genetic material provided and for helping to plan the experimental trial, Sandro Fabrizi, Maria Assunta Dattoli and Vittoria Di Filippo for technical support.


This research was supported by the European Union Horizon 2020 Research and Innovation program for funding this research under grant agreement No 774244 (Breeding for Resilient, Efficient and Sustainable Organic Vegetable Production; BRESOV) and by ‘RGV-FAO’project funded by the Italian Ministry of Agriculture, Food and Forestry. The funding bodies were not involved in the design of the study, collection, analysis, and interpretation of data, and in writing the manuscript.

Author information

Authors and Affiliations



P.T., J.P., T.C., M.J.D. conceived the project and coordinate experiments. M.J.D., G.C., S.Se., S.So., M.R.F., provided germplasm and outlined the phenotyping protocol. F.L., G.C., designed field experimental layout and managed field trial in Italy; S.Si., F.L., A.B., A.P., D.P., C.P., performed phenotypic characterization in IT, G.C. provided the plant photos in Italy. S.So., M.R.F., C.C. designed field experimental layout and managed field trial in Spain; M.R.F., C.C., E.S., L.P.-D., R.B., E.R.-M., performed phenotypic characterization in Spain. P.T. analyzed phenotypic data and performed the genetic analysis, P.T. and S.E. performed the bioinformatic analysis. P.T. drafted the manuscript, T.C., M.J.D., and J.P. carefully revised the manuscript. All authors discussed the results and commented on the manuscript. The authors read and approved the final manuscript.

Corresponding authors

Correspondence to Pasquale Tripodi or Jaime Prohens.

Ethics declarations

Ethics approval and consent to participate

Experimental research and field studies on tomato plants comply with relevant institutional, national, and international guidelines and legislation.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Tripodi, P., Soler, S., Campanelli, G. et al. Genome wide association mapping for agronomic, fruit quality, and root architectural traits in tomato under organic farming conditions. BMC Plant Biol 21, 481 (2021).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: