The main objective of this study was to explore the genetic diversity and structure of cultivated grapevine and to link them to cultivar utilization, putative geographic origin and historical events. Microsatellite marker data were available for 2,323 unique cultivars collected and maintained at the French grapevine collection of Vassal (INRA, France). Inferences of population structure were derived with both a Bayesian and a hierarchical clustering method. Since clustering methods may be sensitive to sampling bias, we followed three strategies to improve our chances of detecting true structure patterns. i) First, we focused only on the 2,096 genotypes (out of the 2,323 unique cultivars) without missing SSR data, excluding putative clones and close mutants (with only one or two allele differences over the 40 alleles); indeed, missing data may bias the clustering procedure, and nearly identical SSR genotypes can be considered redundant for our purposes. ii) Second, we evaluated the possible bias due to unbalanced geographical representation in our sample by running the STRUCTURE analysis on two data sets, one with the entire sample and the other balanced in terms of cultivar geographical origin (cultivars being randomly picked within each geographical group). STRUCTURE provided a very consistent attribution of genotypes to clusters independently of the data set, so only the full set of genotypes was analyzed further. iii) Third, since the STRUCTURE clustering method can be disputed because human manipulation of cultivars (displacements, breeding, clonal propagation) could have generated a deviation from Hardy-Weinberg equilibrium, we complemented the STRUCTURE analysis with the hierarchical clustering method developed by Ward, which is independent of any assumptions on population dynamics. According to Odong et al., the two methods are complementary, so they can conveniently be used together and compared.
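The first two sampling strategies can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the data layout (one pair of alleles per locus), the greedy filtering order, the function names and the `per_group` size are all assumptions made for illustration. Only the "one or two allele differences" criterion and the random picking within each geographical group come from the text.

```python
import random
from collections import Counter, defaultdict

def allele_differences(g1, g2):
    """Count differing alleles between two genotypes.

    Each genotype is a list of (allele, allele) pairs, one pair per locus.
    Alleles within a locus are unordered, so pairs are compared as multisets.
    """
    total = 0
    for pair1, pair2 in zip(g1, g2):
        shared = sum((Counter(pair1) & Counter(pair2)).values())
        total += 2 - shared
    return total

def drop_near_duplicates(genotypes, max_diff=2):
    """Greedy filter (an assumption): keep a genotype only if it differs
    from every genotype kept so far by more than max_diff alleles."""
    kept = {}
    for name, alleles in genotypes.items():
        if all(allele_differences(alleles, other) > max_diff
               for other in kept.values()):
            kept[name] = alleles
    return kept

def balanced_sample(origin_by_name, per_group, seed=0):
    """Randomly pick at most per_group cultivars from each geographic group.

    per_group is a hypothetical parameter; the text does not give the
    group size used for the balanced data set.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for name, origin in origin_by_name.items():
        groups[origin].append(name)
    sample = []
    for origin in sorted(groups):
        members = sorted(groups[origin])
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample
```

Running the clustering on both the full set and the output of `balanced_sample` and then comparing cluster assignments mirrors the consistency check described above.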
The 2,096 cultivars of the Vassal collection studied here originate from 52 countries around the world, making our sample highly representative of the cultivated grapevine gene pool. Our data confirmed the high levels of diversity and heterozygosity of the cultivated grapevine, in agreement with a number of previous studies [4, 5, 8, 17, 18]. This may be due to a combination of factors: i) a weak bottleneck effect during domestication as observed for maize and wheat [31, 32], probably in relation to ii) vegetative propagation and diffusion of cuttings across geographic regions, iii) several putative domestication events from different gene pools [9, 10], subsequently intermixed by humans through breeding and selection, and iv) diversifying selection in plant breeding. The large diversity found in grapevine opens an avenue for further selection and breeding. Among the 2,096 genotypes studied here, over one half are still poorly characterized from a viticultural and oenological point of view and may carry new genes and traits of interest for breeding and selection.
STRUCTURE identified one main level of population subdivision at Ks = 3 and a secondary subdivision at Ks = 5. A PCA and Ward’s hierarchical clustering confirmed this finding. Both the STRUCTURE and Ward methods indicated inconsistencies in clustering for K = 4 and K = 6, suggesting that these two levels are not appropriate for subdividing the grapevine gene pool. While confirming the main subdivision, Ward’s clustering also pointed to a finer structure linked to grapevine uses, family structure or local geographic groups.
The analysis of family relationships also revealed that STRUCTURE clustered together a significant proportion of family-related genotypes, nearly double the fraction found in the admixed group. By contrast, almost no parentage was found among genotypes from different K3 groups (inter-group level). These findings probably reflect the history of grapevine, in which breeding focused mostly on local varieties.
In the admixed group we could identify approximately 3% of genotypes with parents classified in two different STRUCTURE clusters, such as the wine grape Tarrango, known to be a cross between Touriga (a wine grape from Portugal, S-3.1 group) and Sultanina (a seedless table grape from Turkey, S-3.2 group). Crossing among genotypes from different STRUCTURE groups probably corresponds to recent breeding activity in search of novelty and hybrid vigor, although it remains proportionally marginal.
We also detected significantly more family relationships within the already known grapevine kin groups of i) Gouais [15, 34, 35], ii) Savagnin and Cabernet franc, iii) Chasselas and Muscat, and iv) Pinot and Riesling, and found traces of two additional groups, each composed of a mix of several families, such as the W-12.6 and W-12.7 groups, comprising family-related table grapes with muscat flavor released by modern breeding.
The interaction of genetic structure and family relationship is known to be difficult to resolve, and 20 microsatellite loci are probably not sufficient to avoid false positives, despite the large number of alleles. Nevertheless, our family relationship analysis, intended as an attempt to understand large-scale population patterns and not to detect each single family pair precisely, provided a coherent global picture. This analysis was also consistent with the more specific study by Lacombe et al. (2012), who explored direct parentage using an exclusion-probability algorithm on a slightly different sample, which explains the minor differences.
Geography and history
The three main clusters revealed by our study, with both the STRUCTURE and Ward methods, confirmed previously obtained molecular results [5, 9] and the eco-geographic grouping proposed by Negrul, in particular the correspondences between the “proles” occidentalis and the S-3.1/W-3.1 groups, pontica and the S-3.3/W-3.3 groups, and orientalis and the S-3.2/W-3.2 groups. Our results allow us to describe these clusters according to the putative geographical origins of the cultivars: i) West and Central Europe (S-3.1), ii) East Mediterranean, Caucasus, Middle and Far East (S-3.2), and iii) Balkans and East Europe (S-3.3). Clustering at K = 5 identified two new groups, an Iberian Peninsula group and a group of bred table grape cultivars with Italian Peninsula and Central Europe origins.
Genetic characterization of the groups clearly showed the East table grape group (S-3.2 and S-5.2 for K = 3 and 5 respectively) to be the most diverse in terms of mean number of alleles, number of private alleles, and unbiased heterozygosity. This is consistent with the hypothesis that grapevine domestication initially occurred in Eastern regions (Caucasus and Fertile Crescent), as suggested earlier [2–4, 9], with repeated introduction of genes from the wild. The high frequency of private alleles in S-3.2 and S-5.2 could also be explained by a history of limited exchanges from East to West, as attested by the high differentiation values (Dest) between these regions, and by a slower development of grape breeding in the East, as indicated by the lower frequency of family-related genotypes in that region than in other regions, which points to a weaker selection bottleneck there. However, given the high genetic diversity of grapevine at all subdivision levels, the selection and breeding bottlenecks seem in general to be weak for this crop.
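Two of the diversity statistics mentioned above can be sketched for a single locus as follows. This is an illustration only: the text does not state which estimator or software was used, so the sketch assumes Nei's small-sample correction, He = (2n / (2n - 1)) (1 - Σ p_i²), for unbiased heterozygosity, and a simple set difference for private alleles. Function names and the data layout are hypothetical.

```python
from collections import Counter

def unbiased_heterozygosity(genotypes):
    """Unbiased expected heterozygosity at one locus (assumed estimator:
    Nei's correction, 2n/(2n-1) * (1 - sum of squared allele frequencies)).

    genotypes: list of (allele, allele) pairs, one per diploid individual.
    Requires at least one individual.
    """
    n_gene_copies = 2 * len(genotypes)
    counts = Counter(allele for pair in genotypes for allele in pair)
    homozygosity = sum((c / n_gene_copies) ** 2 for c in counts.values())
    return (n_gene_copies / (n_gene_copies - 1)) * (1 - homozygosity)

def private_alleles(alleles_by_group):
    """For each group, return the alleles observed in that group only."""
    result = {}
    for group, alleles in alleles_by_group.items():
        others = set()
        for other_group, other_alleles in alleles_by_group.items():
            if other_group != group:
                others |= set(other_alleles)
        result[group] = set(alleles) - others
    return result
```

Averaging `unbiased_heterozygosity` over loci and counting the sets returned by `private_alleles` per group would give group-level summaries comparable in kind to those discussed above.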
The second most diverse group was the West and Central Europe wine grape group, probably as a result of this area’s long history of grapevine cultivation and development, in combination, as already stated by other authors, with gene flow from local wild or primo-domesticated grapevines [9, 10, 18]. The Balkans and East Europe cluster also formed a well-identified STRUCTURE group with intermediate diversity. The two additional groups at Ks = 5 (the Iberian Peninsula group and the group of bred table grape cultivars) appeared as secondary groups with lower overall diversity.
More generally, the full hierarchical partitioning obtained with the STRUCTURE and Ward methods, as well as the Dest differentiation statistics, appeared consistent with historical data, such as the diffusion of viticulture around the Mediterranean Sea, with one route connecting Eastern (W-3.2) to Western Europe through the Balkans and Central Europe (W-3.3, W-3.1) [2, 9], and a Southern route to the Maghreb and the Iberian Peninsula (W-3.2 / W-5.1 / W-12.4).
The Balkans and Eastern Europe group and the Western and Central Europe group were both characterized by a large proportion of genotypes belonging to one STRUCTURE group only, probably corresponding to separate regional development and selection of grapevine cultivars. In contrast, other regions such as Russia and Ukraine, the Iberian Peninsula, and the New World countries contain a mix of two or three STRUCTURE groups, in relation to their regional position. In particular, varieties found in Russia and Ukraine appear to have either East (S-3.2) or Balkans and East Europe (S-3.3) origins, consistent with what we know of the centralizing impact that Russian agricultural research had during the Soviet period. Similarly, the Iberian Peninsula group includes cultivars from West Europe (S-3.1), East (S-3.2) and Maghreb (S-5.1) as well as a high proportion of admixed genotypes, in keeping with the long history of exchange this region had with both Europe and North Africa. Based on maternally inherited chloroplast markers, Arroyo-Garcia et al. suggested that the Iberian Peninsula could be a secondary center of domestication. Our results add a new view of Spain and Portugal as platforms of centralization, intermixing and exchange of varieties throughout history.
Finally, at Kw = 12, the genotypes from the eastern regions (proles orientalis) further subdivided into two sub-groups, one mainly composed of wine cultivars of Caucasian origin (including Georgia, Armenia, Azerbaijan and Turkey, W-12.12), and the other comprising table cultivars from Central Asia (Tajikistan, Uzbekistan, Turkmenistan) together with Iran and Afghanistan (W-12.11). The separation of these two groups may be a trace of divergent selection for the main local use of grapevine (table vs. wine). On the other hand, the absence of admixture in the Middle and Far East group, in particular for the 72 cultivars from Uzbekistan, Afghanistan, Tajikistan, Turkmenistan and Iran, and the high K scores of its genotypes may indicate that the corresponding center of domestication was larger than formerly believed (several authors indeed placed it in a geographic region between the Black Sea and Iran [2, 3, 40, 41]), a hypothesis already proposed in 1976 by Olmo but not confirmed by later studies. It is difficult to decide between these two scenarios since the information available on grapevine crop development is quite limited for Central Asian countries.
A large proportion of admixed genotypes was found by STRUCTURE, at both Ks = 3 and Ks = 5. A previous study on maize indicated that, in crops, STRUCTURE grouping is generally coherent for first-cycle inbreds with simple parentage relationships, while the presence of multiple levels of family relationships and cohort overlapping in more advanced breeding systems leads to different grouping possibilities and low STRUCTURE stability. We can infer that our sample contains both types of material, with a number of ancient varieties anchoring the main clusters (founders), and recent breeds complicating structure resolution. The stability of the Ks = 3 and Ks = 5 groupings and the individual percentage of cluster ancestry allowed us to discriminate between these two types of material. The geographic distribution of the admixed genotypes is not “random” (Table 1): the Middle-Far East is the region displaying the lowest level of admixture, while Italy in particular, and secondly the Iberian Peninsula, display the largest proportions of admixed genotypes. We were unable to find other traits characterizing the admixed group: it contains even proportions of the phenotypic classes for grape use, berry color, flavor, berry seed number and sex.
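The use of individual ancestry percentages to separate cluster members from admixed genotypes can be sketched as follows. The 0.8 threshold, the function names and the record layout are hypothetical and are not taken from the study; the sketch only shows the general idea of thresholding STRUCTURE-style membership coefficients and tallying the admixed fraction per region.

```python
from collections import defaultdict

def assign_cluster(q, threshold=0.8):
    """Return the index of the dominant cluster, or None if admixed.

    q: list of ancestry coefficients for one genotype (summing to about 1).
    threshold: hypothetical cut-off; a genotype whose largest coefficient
    falls below it is treated as admixed.
    """
    best = max(range(len(q)), key=lambda k: q[k])
    return best if q[best] >= threshold else None

def admixed_fraction_by_region(records, threshold=0.8):
    """Fraction of admixed genotypes per region.

    records: iterable of (region, q) pairs.
    """
    totals = defaultdict(int)
    admixed = defaultdict(int)
    for region, q in records:
        totals[region] += 1
        if assign_cluster(q, threshold) is None:
            admixed[region] += 1
    return {region: admixed[region] / totals[region] for region in totals}
```

Comparing the per-region fractions returned by `admixed_fraction_by_region` is one simple way to check whether admixture is evenly distributed across regions.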
While confirming and reinforcing the observation of geographic structure in the cultivated gene pool already described by other authors [5, 9, 16, 18, 38], our results are also consistent with the study of Cipriani et al., which suggested that Italian varieties present weak or no structure: indeed, in our study the Italian cultivars appear to be admixed, probably as a result of the role that Roman culture played in inter-regional exchange.
Our results also provide information about the effect that human selection on morphological traits had in shaping the genetic diversity of cultivated grapevine. Table and wine grapes differ in berry size and bunch shape, both important traits used for cultivar classification. Table and wine grapes are clearly separated by STRUCTURE at K = 3. At K = 5, only the group including Iberian and Maghreb cultivars (S-5.1) is composed of a mix of table and wine cultivars, which is likely the result of artificial selection and close intermixing of cultivars in this area.
The black color of berries is considered an ancestral trait compared with the other colors, at both the phenotypic and the molecular level. The molecular basis of the appearance of red, rose, grey and white berry colors has been previously documented [43–45], and the diffusion of the major causal mutations (the Gret1 insertion and the K980 mutation) within the cultivated compartment was described by Fournier-Level et al. In the present STRUCTURE analysis, the Central and West Europe subgroup (S-5.3) is composed of a majority of black cultivars. This can be explained by the isolation of these regions from the Eastern cultivars, by local domestication and gene flow from endemic black-berried V. v. sylvestris, or by human selection. All other subgroups include a large number of white cultivars, reinforcing the idea of a wide and strong diffusion of Gret1 over the whole geographic range of grapevine. Most of the intermediate phenotypes (red, rose and grey) are concentrated within two groups, Balkans and Central Europe (S-5.5) and East (S-5.2), confirming these regions as putative sources of color variation.
The geographical origin of Muscat flavor is assumed to be Greece or the Balkan Peninsula [46, 47]. Thereafter, human selection aimed to spread this desirable trait in both table and wine grapes. With STRUCTURE, we found the majority of Muscat founders within the Central Europe table group (S-5.4). Only a small number of them were involved in breeding, essentially in the Balkans, forming kin groups with other known parents such as Chasselas.
Seedless cultivars clustered essentially with cultivars of Turkish, Caucasian and Asian origins, belonging to the proles orientalis, consistent with available historical data about their origins in Turkey and the Near East.