Maximizing genetic representation in seed collections from populations of self and cross-pollinated banana wild relatives

Background: Conservation of plant genetic resources, including the wild relatives of crops, plays an important and well-recognised role in addressing some of the key challenges faced by humanity and the planet, including ending hunger and biodiversity loss. However, the genetic diversity and representativeness of ex situ collections, especially that contained in seed collections, is often unknown. This limits meaningful assessments against conservation targets, impairs targeting of future collecting and limits their use. We assessed genetic representation of seed collections compared to source populations for three wild relatives of bananas and plantains. Focal species and sampling regions were M. acuminata subsp. banksii (Papua New Guinea), M. balbisiana (Viet Nam) and M. maclayi s.l. (Bougainville, Papua New Guinea). We sequenced 445 samples using suites of 16–20 existing and newly developed taxon-specific polymorphic microsatellite markers. Samples of each species were from five populations in a region; 15 leaf samples from different individuals and 16 seed samples from one infructescence (‘bunch’) were analysed for each population.
Results: Allelic richness of seeds compared to populations was 51, 81 and 93% (M. acuminata, M. balbisiana and M. maclayi respectively). Seed samples represented all common alleles in populations but omitted some rarer alleles. The number of collections required to achieve the 70% target of the Global Strategy for Plant Conservation was species dependent, relating to mating systems. Musa acuminata populations had low heterozygosity and diversity, indicating self-fertilization; many bunches were needed (> 15) to represent regional alleles to 70%; over 90% of the alleles from a bunch are included in only two seeds. Musa maclayi was characteristically cross-fertilizing; only three bunches were needed to represent regional alleles; within a bunch, 16 seeds represent alleles. Musa balbisiana, considered cross-fertilized, had low genetic diversity; seeds of four bunches are needed to represent regional alleles; only two seeds represent alleles in a bunch.
Conclusions: We demonstrate empirical measurement of the representation of genetic material in seed collections in ex situ conservation towards conservation targets. Species mating systems profoundly affected genetic representation in seed collections and should therefore be a primary consideration to maximize genetic representation. Results are applicable to sampling strategies for other wild species.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12870-021-03142-y.


Background
Conservation of crop wild relatives (CWRs), wild plant species related to crops, is increasingly recognized as a vital component of both sustainable development for food security (Target 2.5 of the Sustainable Development Goals) [1] and biodiversity conservation (Target 9 of the Global Strategy for Plant Conservation) [2,3]. Importantly, this should include targeting conservation at the intraspecific level [4], essential for the functioning and flourishing of species, ecosystems [5] and crop breeding [6]. Associated with policy recognition is the need for assessments against indicators or targets. However, assessment of conservation at the genetic level is often lacking and poorly understood [4,7].
Conservation of CWRs should include complementary in situ and ex situ approaches [8]. Ex situ seed conservation can maintain numerous genotypes with minimal input [9]. However, the genetic representativeness of ex situ seed collections, that is, the proportion of alleles of wild populations also present in ex situ collections, has been studied for only a very small number of species. In fact, a recent meta-analysis of ex situ and in situ genetic comparisons found only six studies that included seed bank collections [10]. Only two of these were of wild rather than cultivated species: a Mediterranean aquatic [11], and a temperate dioecious European tree species [12]. There is clearly, therefore, an evidence gap for reporting on the genetic representation of species in ex situ seed collections.
Presently, for seed collectors to maximise genetic capture in collections, sampling guidance is often broad, encompassing all species, or inferred from taxonomically or ecologically related species [13][14][15]. The general nature of such protocols does not always account for several key factors that shape genetics of populations and seeds, such as the existing genetic diversity of populations, the spatial distribution of plants in the environment [16][17][18], and species' reproductive systems [14]. It is therefore important to increase evidence-based sampling strategies to inform targeted future seed collections. Such evidence also provides valuable ecological information for in situ conservation and increases the value of seed collections, as it improves the selection and targeting of seed samples in breeding or phenotyping experiments.
Seed conservation of banana CWRs (Musa L.) is a case in point. Bananas, together with related plantains (both are Musa), are the most important fruit and among the most important crops in the world [19]. Global production is estimated to be 116 million tonnes annually, worth $31 billion (average of 2017-19) [19]. Worryingly, several biotic threats, such as Fusarium Wilt Tropical Race 4 and Banana Bunchy Top Virus, threaten banana production. The small genepool of bananas makes them, and the many millions of people who rely on them, particularly vulnerable [20,21]. There are around 80 species in the genus Musa [22]. They are tall herbaceous monocarpic monocotyledons native to tropical and subtropical Asia and the western Pacific. Most cultivated bananas and plantains derive from two species: M. acuminata subsp. and M. balbisiana [23][24][25][26]. The Fe'i bananas of Pacific regions are a distinct cultivated group, deriving from M. maclayi [27]. The focal species included in our study (Table 1) are therefore of interest to breeders (e.g. [40]).
Conservation of banana CWRs is increasingly important because they are under threat. In a recent study [41], 15% of species were provisionally assessed as endangered and an additional 19% vulnerable to extinction. Furthermore, 95% of Musa species were assessed as insufficiently conserved ex situ [41]. There are only 163 genotype accessions of 35 species maintained in genebanks as living plants [42]. Additionally, there are 131 seed accessions, multiple seeds collected from the same individual or population, of 10 species, stored at the Millennium Seed Bank, UK [43]. Many Musa species are therefore not represented in genebanks at all or are represented with little or as yet unknown representativeness. An evaluation of the genetic representation of present collections will help target future conservation efforts.
The objectives of the present study are to assess and compare the genetic capture in seed collections compared to their source populations, at both regional and local scales, for three focal species; to provide guidance about how to maximize genetic capture for future seed collections; and to provide direction for seed distribution on how to provide representative seed samples.

Results
Representation of populations in seeds at the regional level
Allelic richness of seeds as a proportion of populations was 51, 81 and 93% (M. acuminata, M. balbisiana and M. maclayi respectively, Table 2). Allelic richness (AR) of populations and seeds of M. maclayi was much higher than that of the other two species. In general, populations had many private alleles (PA), whereas seeds had only a few, indicating that some pollination was by plants not present in the population samples. Only two alleles in M. acuminata seeds were private.
For all species, populations were characterized by having more rare alleles than seeds; and seeds had more common alleles (Fig. 1a). Seeds, therefore, captured most of the common alleles of populations, yet less so the rarer alleles. Musa maclayi had a notably high number of rare alleles in both populations and seeds.
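The two regional measures reported above, rarefied allelic richness and private alleles, can be illustrated with a minimal Python sketch. It assumes the standard hypergeometric rarefaction formula applied to a single locus; the allele counts are hypothetical and the function names are illustrative, so this is not the authors' R pipeline or the study data.

```python
from math import comb

def rarefied_allelic_richness(allele_counts, n):
    """Expected number of distinct alleles in a random subsample of n gene copies.

    allele_counts maps each allele to its number of copies at one locus.
    Assumes the standard rarefaction formula (an assumption, not confirmed by the source).
    """
    total = sum(allele_counts.values())
    if n > total:
        raise ValueError("subsample size exceeds the number of gene copies")
    # For each allele: 1 minus the probability that none of its copies is drawn.
    return sum(1 - comb(total - c, n) / comb(total, n) for c in allele_counts.values())

def private_alleles(a, b):
    """Alleles present in sample a but absent from sample b."""
    return set(a) - set(b)

# Hypothetical single-locus counts (allele size in bp -> copies), not study data.
population = {180: 14, 184: 10, 188: 4, 192: 2}  # 30 gene copies
seeds = {180: 20, 184: 11, 196: 1}               # 32 gene copies

# Rarefy both samples to the smaller number of gene copies.
n = min(sum(population.values()), sum(seeds.values()))
ar_pop = rarefied_allelic_richness(population, n)
ar_seed = rarefied_allelic_richness(seeds, n)

print(f"AR population = {ar_pop:.2f}, AR seeds = {ar_seed:.2f}")
print(f"seed AR as a proportion of population AR = {100 * ar_seed / ar_pop:.0f}%")
print("private to population:", sorted(private_alleles(population, seeds)))
print("private to seeds:", sorted(private_alleles(seeds, population)))
```

In a multi-locus data set the per-locus values would be summed or averaged across loci before the seed-to-population proportion is taken.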

Representation of populations in seeds at the local level
Allelic richness of local seeds as a proportion of the local populations from which they were collected was 56 ± 20%, 76 ± 42% and 78 ± 18% (mean and standard deviation; M. acuminata, M. balbisiana and M. maclayi respectively, Table 3). M. acuminata seed collections had very low observed heterozygosity (H_o), including two bunches (infructescences) where H_o was zero (Table 3). Expected heterozygosity (H_exp) varied considerably among local populations of M. acuminata. Nuru was the most diverse (H_exp = 0.36); notably, seeds from this population were completely homozygous (H_o = 0). Sandaun was the least diverse M. acuminata population (H_exp = 0.06). The inbreeding coefficient (F_is) was high for all M. acuminata populations. Vanimo was the most inbred (F_is = 0.97), and its seeds were also completely homozygous. The Ramu population was the least inbred (F_is = 0.41) and had seeds with the highest heterozygosity (H_o = 0.05). Musa balbisiana populations were also characterized by low diversity, yet inbreeding coefficients were much lower than in M. acuminata. By contrast, populations of M. maclayi were characterized by high heterozygosity and genetic diversity and low inbreeding coefficients. Populations of M. balbisiana and seeds of M. maclayi had negative F_is, meaning an excess of heterozygotes.

Targeting local seed collections
Most variance found in population samples was within local populations rather than between local populations, according to AMOVA (M. acuminata 70%, M. balbisiana 75%, M. maclayi 82%). This means that in order to maximize genetic capture in seed collections, targeting local populations is less important than increasing the number of bunches collected. This also reflects the real experience in seed collecting because the genetic structure of subpopulations is usually not known at the time of collecting. Knowing how many local collections to make is therefore more informative.
To assess the cumulative addition of local seed collections as a proportion of total allelic richness, the total allelic richness of populations and seeds was estimated, and then the mean and standard error of each local population and bunch were added cumulatively. To capture 70% of the alleles estimated to be present in the region, at least four bunches need to be collected for M. balbisiana (Fig. 1b); for 90% of alleles, at least five bunches are necessary. For M. maclayi, three bunches are needed to sample 70% of regional AR, and four bunches for 90%, despite the much higher AR. Allelic sampling for M. acuminata had a different profile. It was not possible to collect even 70% of regional AR, and each additional bunch gave minimal gain in allelic capture. If many more bunches were sampled it may be possible to capture up to 70% of regional AR, but this would probably require > 15 bunches, based on extrapolation (Fig. 1b). Alleles in seeds of M. acuminata are largely shared by all bunches and by the regional population (Fig. 2). Each additional bunch adds only a few alleles. Populations with minimal shared alleles, and therefore maximum coverage, include combinations of Ramu and Vanimo, and Ramu and Nuru. Musa balbisiana displayed a similar overlapping pattern of shared alleles to M. acuminata, with very little gain each time a bunch was added. Seeds from the Can Cau population had the fewest shared alleles. Bunches from M. maclayi had less overlap, suggesting genetic structure and isolation by distance. No alleles were shared by all bunches, yet a large proportion of regional alleles was covered by the bunch ellipses.
Genetic differentiation of local populations was detected using a permutation test on the AMOVA of population samples (M. acuminata φ = 0.29, p = 0.001, [...]). Genetic distance [46] was calculated pairwise between all local populations and local seeds (Fig. 3). For M. acuminata genetic distance was low between all samples. Populations and seeds from Vanimo were most distant from other samples. Seeds [...]

Selecting seeds from the same local collection
Accumulation of AR of seeds per bunch was estimated (Fig. 1c). For M. acuminata, a single seed contained over 90% of the AR of the whole bunch. The allele accumulation curve is virtually flat; seeds from a bunch are therefore more or less genetically identical. For M. balbisiana, over 70% of alleles are found in only two seeds, and 90% of alleles in 10 seeds. For M. maclayi, 70% of the estimated total alleles in the bunch are captured by 16 seeds. To reach 90% of the total, the accumulation trend line must be extrapolated considerably beyond the data, to around 35-50 seeds.

Discussion
Genetic capture in seed collections compared to their source populations, for three focal species
Allelic richness of seeds as a proportion of populations met the conservation target of 70% from the Global Strategy for Plant Conservation [2] for two out of the three focal species (M. balbisiana 81% and M. maclayi 93%). M. acuminata only achieved 51% proportional allelic richness. In all cases several seed collections were required from different local populations to maximize genetic capture. The number of collections necessary to achieve the 70% target was species dependent. For M. acuminata it was > 15 local seed collections, for M. balbisiana four and M. maclayi three.
All common alleles of populations were included in seed collections, but the level of representation was lower for rare alleles, with some rare alleles missing from seed collections (Fig. 1a). Brown and Marshall's sampling strategy [15], used by about two thirds of leading seed conservation institutions [13], advocates sampling 30 individuals for out-crossing and 59 for selfing species, with the aim of having a 95% chance of capturing alleles with frequency > 0.05. The results of the present study show that collecting from a much lower number of mother plants (a total of five) resulted in relatively high genetic capture, even including alleles rarer than the threshold set by Brown and Marshall. Furthermore, measured against the key success criterion proposed by Brown and Marshall [15], capturing locally common alleles (because globally common alleles are easily collected in any sample, and globally and locally rare alleles are ultimately limited by the sample size), our results show that seed collections of M. maclayi and, to a lesser extent, M. acuminata and M. balbisiana, were successful (Figs. 2 and 3).
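The figures of 30 and 59 individuals quoted above can be recovered with a short calculation. The Python sketch below assumes the usual independent-gamete model, in which an allele at frequency p is present in at least one of n random gametes with probability 1 - (1 - p)^n. The function name and the gamete-per-plant assumptions are illustrative, not taken from the source.

```python
from math import ceil, log

def min_gametes(freq, confidence):
    """Smallest n such that 1 - (1 - freq)**n >= confidence."""
    return ceil(log(1 - confidence) / log(1 - freq))

# Allele frequency threshold 0.05, captured with 95% probability.
gametes = min_gametes(0.05, 0.95)

# Assumption: a random-mating diploid plant contributes two independent gametes.
outcrossing_plants = ceil(gametes / 2)

# Assumption: a fully selfing, homozygous plant contributes one distinct gamete.
selfing_plants = gametes

print(gametes, outcrossing_plants, selfing_plants)  # 59 30 59
```

Under these assumptions the calculation is independent of population size, which is why the same guideline is applied across species. It does not account for pollen donors represented in open-pollinated seed.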
A high degree of homozygosity and a low level of diversity, apparent in populations and seed collections of M. acuminata subsp. banksii in our study as well as in that of Christelova et al. [47], corroborate a typical genetic signal associated with self-fertilization [48]. Unlike cultivated and most wild banana species (including other M. acuminata species), seed-bearing M. acuminata subsp. banksii are characterized by self-compatible hermaphroditic flowers, particularly in the upper hands on the inflorescence [31,37]; these likely self-pollinate by autogamy prior to flower bract opening [49]. Similar floral morphology, and therefore probable self-pollination, is also observed in M. acuminata var. chinensis [50], M. boman [29], M. jackeyi (interestingly closely related to M. maclayi) [31], M. ingens [29], M. rubinea [51], M. schizocarpa [29], M. yunnanensis [50] and M. zainfui [52]. Seed collections derived by self-fertilization are naturally more representative of the mother plant than of the population. Therefore, to capture the genetic diversity in populations of self-pollinating Musaceae species, many mother plants must be sampled.
By contrast, populations and seeds of M. maclayi were characterized by higher levels of heterozygosity and diversity, consistent with cross-fertilization [53]. Male and female flowers of M. maclayi are temporally and physically isolated on the same inflorescence. Female flowers are produced first, followed by male flowers, as the peduncle grows [29]. Genetic capture in M. maclayi seeds therefore represents both the mother plant and pollen donors within the population. As a result, fewer bunches need to be collected to represent the population compared to M. acuminata. The level of diversity evident in the small number of populations of the present study was somewhat surprising, because of the narrow distribution and relatively recent diversification and dispersal of the former Australimusa group to which M. maclayi belongs [54]. This demonstrates the strong effect of mating system on genetic diversity in populations and seeds.
Populations of M. balbisiana, in our results, had low heterozygosity and diversity, in keeping with several previous studies [24,34,47,55,56]. Moreover, the heterozygosity of seed batches was much lower than within populations. Our results were similar to those found by Bawin et al. [57] for M. balbisiana seeds collected from ex situ field collections or feral populations, but our seeds were less diverse than those collected from native populations in Yunnan (China). Even though M. balbisiana basal flowers are functionally female [29] and do not produce seeds when pollinators are excluded [35], flowers may effectively be selfed from a different flower of the same genotype on the same mat or from vegetatively reproduced or planted neighbouring plants [58]. Furthermore, apomictic seed development has been described in M. acuminata [59] and induced in Ensete superbum with pollen from M. balbisiana [60], and may additionally explain the levels of homozygosity and apparent clonality observed in M. acuminata and M. balbisiana seeds.
The low diversity in populations of M. balbisiana, in our results, may be caused by a genetic bottleneck and/or founder effect. This hypothesis was also proposed by Ge et al. [34] and Shepherd [61] and is in keeping with Musa ecology: being early successional or disturbance-adapted [62,63]. Additionally, the intensive deforestation and reforestation that has occurred in Viet Nam over the past 50 years [64] may also be causal. Indeed, according to a recent study [65], the ecological traits of M. balbisiana make it particularly vulnerable to genetic erosion from anthropogenic disturbance. Furthermore, as M. balbisiana has many uses by local communities [66], plants are often planted or encouraged in vacant land. Finally, seed collections may indeed result from introgression from neighbouring cultivated bananas, as is perhaps evident in the Can Cau population. These possibilities illustrate some of the challenges associated with conservation of CWRs by seed.
Variation in genetic capture of different species of the Musa genus demonstrates the profound effect of mating system on genetic capture in seed collection. Taxonomic relatedness, therefore, is not a good proxy for a sampling strategy [67]. In support of our results, a recent study by Hoban et al. [68] found species in the same genus required on average 50% more individuals to reach desired levels of capture compared to others. Furthermore, depending on mating system, dispersal distance, life cycle and the sampling strategy employed, up to 5 times as many individuals may need to be sampled for the same level of genetic capture [14].
Guidance about how to maximize genetic capture for future seed collections
To maximize genetic capture in Musa seed collections, firstly, we recommend that species mating systems should be considered to inform sampling strategies. Our results are therefore in support of Brown and Marshall's sampling strategy discussed above [15].
For self-pollinated Musa species, seeds should be sampled from as many mother plants as possible. For species with wide distributions, populations should be spatially dispersed; however, this is less important than increasing the number of plants collected from. Collecting seeds of adequate quality for long-term storage from many individuals is highly challenging; it is not straightforward to find mature seeds in the forest suitable for storage [69,70]. It would certainly not be possible to collect from the 59 individuals proposed by Brown and Marshall [15], or even the 15 proposed here, in one collecting trip. As bananas produce fruit throughout the year, seed collections may therefore require repeated temporal sampling from populations.
To target collections of fully out-crossing species, fewer collections are required to represent regional alleles. We recommend collections should be focussed on increasing the number of local populations collected from rather than the number of mother plants in a population. Local populations should be spatially dispersed to maximize genetic capture. This will also allow for locally distributed alleles to be captured [15]. The amount of both rare and locally distributed alleles therefore depends on resources for collection, but there are diminishing returns associated with such effort.
For all species, but especially for out-crossing species, it is also important to target collections that are far from agriculture and human interference. Large and well established populations should be prioritised [65]. This will likely maximize genetic diversity in source populations [71], and avoid unwanted introgression from cultivated forms [72].
Direction for seed distribution on how to provide representative seed samples
To ensure enough seeds are conserved, self-pollinated species only require one or two seeds from a bunch to be part of a core collection. There is also very little point in using many samples of self-pollinated seeds in experiments. This contrasts with fully out-crossed seeds, where more seeds should be conserved in core collections per bunch or used as samples in experiments. For M. maclayi 16 seeds represent 70% of alleles, and 35-50 seeds represent 90% of alleles. Even so, these numbers of seeds are easily achieved, for most Musa species at least, where a bunch can contain hundreds to thousands of seeds. However, for some species we have collected (e.g. M. ingens), only a few seeds were found in a bunch, perhaps due to inadequate pollination. Additionally, these findings mean that despite low levels of survival in storage of some collections [62,69], population genetic diversity can be protected in a few seeds.

Limitations
The present study was constrained in that only one mother plant was used per local population, and only 5 per region. It was therefore not possible to test the effect of additional local seed collections on genetic capture. This was because accessing bunches at the right level of maturity for germination and storage is one of the key challenges for seed conservation of banana CWRs [69]; often mature bunches are not to be found in a forest population. Furthermore, in the present study we compared genetic capture in seed collections at the regional level. This does not account for the full level of diversity across species distributions, which may be much wider than that sampled here, particularly in the case of M. balbisiana (Table 1). Further research should assess isolation by distance of source populations and seed genetic capture, to optimise sampling strategies that are based on species distributions across ecozones (e.g. [73]). Additionally, sampling did not consider temporal effects, such as collecting from the same populations at different time points; this may prove important, at least for cross-fertilized species.
It is also important to emphasize that whilst broad comparisons between species are of interest, direct comparison between species from our results should be treated with caution because different taxon-specific microsatellite markers were employed. Observed allelic variation may result from the specific markers used rather than from actual differences that are meaningful at species level. However, as we used suites of 16-19 markers per species in the present study, this effect is minimised. Even so, any comparative interpretation should be made with caution. Importantly, direct comparison between species was not our primary purpose; rather, our aims were to assess genetic capture in seed collections compared to their source populations for three focal species.

Conclusions
Seed banks are efficient ways of conserving genetic diversity present in wild populations and making it available for future use in breeding programmes or conservation. However, because very little is known about both population and seed genetic diversity, the representativeness, and therefore the value and use, of seed collections is limited. We have demonstrated the measurement of genetic capture in seed collections of three of the most important wild relatives of the most important fruit crop in the world. We have shown how targeted seed sampling should be species specific and genetically informed; notably, species mating systems and evolutionary history (whether natural or anthropogenic) have a profound effect on the level of genetic diversity in seed collections. The results of the present study may be applied in sampling strategies of other wild species, in that species mating systems should be a primary consideration to maximize genetic representation in seed collections.

Focal species
We focused on three wild [...] (Table 1). In this study M. maclayi includes the closely related taxa M. bukensis and M. maclayi subsp. maclayi, which occur on the island of Bougainville. Based on the description of both taxa and personal observations there is evidence of introgression between the two taxa on the island, and it is unclear whether they are two different species or a single one [29].

Study region and populations
Natural populations of focal species in their respective native ranges were sampled during several collecting missions that took place between 2016 and 2019 (Fig. 4; Table 1). Collection of M. acuminata was carried out in Papua New Guinea (PNG) in June 2017 and May 2019 [69,74]. M. balbisiana was collected in Viet Nam during November 2018 and April 2019. Musa maclayi was collected on the island of Bougainville (PNG) in October 2016 [75,76].

Plant material
Leaf and seed samples were collected from wild natural populations. All seeds, leaves and data were collected and transferred according to local legislation, with permission, and supplied for non-commercial use and research under the Standard Material Transfer Agreement in accordance with the International Treaty on Plant Genetic Resources for Food and Agriculture. None of the species included in the present study are CITES listed. Formal field identification was carried out by Steven B. Janssens (Meise Botanic Garden, Belgium). Leaf samples were collected randomly from 5 local populations per species (Fig. 4, Table S1). From each population, leaves from 15 plants on average were sampled and further used in this study. Dried leaf samples were taken to the laboratory following the field mission for DNA extraction. A single seed-containing bunch (infructescence) was also collected from each population. Groups of fruits (hands), each from a former cluster of flowers subtended by one bract, were separated and processed separately after shipping to Meise Botanic Garden, as described by Kallow et al. [69]. Bunches collected in Viet Nam were not separated into hands and were processed in a similar way in the laboratory of the Plant Resources Center (Ha Noi, Viet Nam). In both cases, seeds were stored at 15% relative humidity and −20 °C prior to germination and DNA extraction.
To overcome barriers associated with low and unpredictable in vivo germination, seeds were germinated by embryo rescue as described by Kallow et al. [6]. Seeds were selected randomly from two to three hands per bunch, or, for three bunches of M. acuminata and all bunches of M. balbisiana, from pooled seeds from the whole bunch. Due to low seed numbers and viability of M. balbisiana accessions DNA was extracted directly from their embryos. An average of 16 seeds per bunch were used in this study.
For each population, exact coordinates were recorded with a Garmin GPS device. Detailed taxonomic field notes, and notes on geography and ecology, were recorded for each sample. Photographs of mother plants (the plant from which the bunch was taken) and of bunches were taken. Seed samples from PNG and Bougainville were accessioned into the Meise Botanic Garden seed bank (Meise, Belgium). Seeds from Viet Nam were accessioned into the seed bank of the Plant Resources Center (Ha Noi, Viet Nam).

Microsatellite PCR
We isolated DNA using a method adapted from Doyle and Doyle [77] and then sequenced samples using a suite of taxon-specific polymorphic microsatellite markers arranged in multiplexes (Table S2). For M. acuminata we developed multiplexes from previous studies [34,78-83]. A total of 86 primer pairs were tested for amplification individually and then arranged in a total of 15 multiplexes using Multiplex Manager [84] and Multiple Primer Analyzer [85] with 12 M. acuminata samples. From this, 20 markers arranged in four multiplexes were selected. For M. balbisiana, we used the multiplex arrangement of Bawin et al. [57]. This included 18 SSR markers organized into four multiplexes. For M. maclayi, a total of 16 specific SSR markers were newly developed and optimized by Genoscreen (Lille, France) and arranged in four multiplexes. We used an M13 labelling protocol [86] to arrange multiplexes. We used the Type-it Microsatellite PCR Kit (Qiagen, Venlo, the Netherlands) to amplify microsatellite regions. We then sequenced the resultant PCR product on an ABI 3730 sequencer (Applied Biosystems, Foster City, California, US). See Supplementary Methods for detailed methodology.

Data analysis

Fragment length analysis and quality check
We analyzed microsatellite fragment lengths using Geneious v 8.1.9 software. Loci and samples with more than 25% missing data were excluded from the analysis so that missingness was similar for seeds and populations. This resulted in the exclusion of one locus for M. acuminata, eight for M. balbisiana and two for M. maclayi (Table S3). Several loci were missing from M. balbisiana, presumably because of low DNA concentrations resulting from extraction from embryos rather than leaves. The remaining missing data amounted to 3.9% for M. acuminata, 8.1% for M. balbisiana and 4.7% for M. maclayi. We then assessed allele data for null allele excess, large allele drop out and error due to stuttering using the Microchecker software [87].
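The 25% missing-data rule described above can be sketched in Python as follows. The order of filtering (loci first, then samples), the data layout and all names are assumptions made for illustration; the source does not specify them, and the genotypes shown are hypothetical.

```python
def missing_fraction(calls):
    """Fraction of genotype calls that are missing (None)."""
    calls = list(calls)
    if not calls:
        return 1.0
    return sum(call is None for call in calls) / len(calls)

def filter_missing(genotypes, threshold=0.25):
    """Drop loci, then samples, with more than `threshold` missing data.

    genotypes maps sample -> {locus: (allele, allele) or None}.
    Filtering loci before samples is an assumption of this sketch.
    """
    loci = sorted({locus for calls in genotypes.values() for locus in calls})
    kept_loci = [
        locus for locus in loci
        if missing_fraction(calls.get(locus) for calls in genotypes.values()) <= threshold
    ]
    filtered = {}
    for sample, calls in genotypes.items():
        if missing_fraction(calls.get(locus) for locus in kept_loci) <= threshold:
            filtered[sample] = {locus: calls.get(locus) for locus in kept_loci}
    return filtered, kept_loci

# Hypothetical fragment lengths (bp); None marks a failed amplification.
genotypes = {
    "S1": {"L1": (180, 180), "L2": (200, 204), "L3": (150, 150), "L4": (120, 122), "L5": (300, 300)},
    "S2": {"L1": (180, 184), "L2": (200, 200), "L3": (150, 152), "L4": (120, 120), "L5": None},
    "S3": {"L1": (184, 184), "L2": (204, 204), "L3": (150, 150), "L4": (122, 122), "L5": None},
    "S4": {"L1": None, "L2": None, "L3": (152, 152), "L4": (120, 122), "L5": (300, 304)},
}

filtered, kept_loci = filter_missing(genotypes)
print("loci kept:", kept_loci)
print("samples kept:", sorted(filtered))
```

In this toy table locus L5 is missing in half of the samples and is dropped, after which sample S4 is missing at two of the four remaining loci and is dropped as well.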

Genetic assessment
Genetic assessment was carried out at two levels: the regional level, whereby samples were pooled by either all local populations or all seeds per species; and the local level, whereby samples were not pooled but kept separate for each local population and each bunch per species. At the regional level we calculated several indices to represent genetic diversity of populations and seeds. All computations were carried out in the R environment [88]. As a broad estimate of the amount of genetic material present, we determined AR rarefied to equal sample size [89], using the pegas package [90]. We counted PA, present in populations and not seeds and vice versa, using the poppr package [91]; and, in order to assess the rarity of alleles in samples, assessed the relative frequency of alleles, computed in the adegenet package [92]. To represent the genotypic diversity of samples and to assess inbreeding we calculated expected heterozygosity (H_exp) [44]. We also measured observed heterozygosity (H_o) to assess population genotypic diversity and inbreeding. The number of multilocus genotypes (MLG) was computed as an indicator of clonality. Several commonly used diversity indices were also calculated using the poppr package [91]: H′ (Shannon-Wiener index), λ (Simpson's index) and E_5 (evenness). At the local level, we repeated calculations of AR, H_o and H_exp, and additionally calculated the inbreeding coefficient (F_is) [45] in the hierfstat package [93]. Indices were compared using two-sample t tests.
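As an illustration of the heterozygosity and inbreeding measures named above, the following Python sketch computes them for a single locus. It assumes Nei's unbiased gene diversity for expected heterozygosity and the simple ratio form of the inbreeding coefficient, 1 - H_o / H_exp. These estimators may differ in detail from those implemented in the R packages the authors used, and the genotypes are hypothetical.

```python
from collections import Counter

def locus_stats(genotypes):
    """Return (H_o, H_exp, F_is) for one locus.

    genotypes is a list of (allele, allele) tuples, one per diploid individual.
    H_exp uses Nei's unbiased estimator (an assumption of this sketch).
    """
    n = len(genotypes)
    h_obs = sum(a != b for a, b in genotypes) / n
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    sum_p2 = sum((count / (2 * n)) ** 2 for count in counts.values())
    h_exp = (2 * n / (2 * n - 1)) * (1 - sum_p2)
    f_is = 1 - h_obs / h_exp if h_exp > 0 else float("nan")
    return h_obs, h_exp, f_is

# Hypothetical selfing-like sample: two alleles present, no heterozygotes.
selfing = [(180, 180)] * 5 + [(184, 184)] * 3
# Hypothetical sample with a heterozygote excess.
outcrossing = [(180, 184)] * 4

for label, sample in (("selfing-like", selfing), ("heterozygote excess", outcrossing)):
    h_obs, h_exp, f_is = locus_stats(sample)
    print(f"{label}: H_o = {h_obs:.2f}, H_exp = {h_exp:.2f}, F_is = {f_is:.2f}")
```

The first sample gives F_is of 1 (all diversity is held in homozygotes), while the second gives a negative F_is, the pattern described in the text as an excess of heterozygotes.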

Cumulative proportional allelic richness
We assessed the level of genetic variance between and within local populations by performing AMOVA on population samples. As most variance was within populations rather than between, we considered genetic capture could primarily be maximized by increasing the number of local seed collections made. We therefore measured how many bunches are required to capture 70% (based on Target 9 of the Global Strategy for Plant Conservation) [2] and 90% (an arbitrary but sometimes used threshold) of alleles in the region per species. We did this firstly by calculating the total regional population AR using bootstrap resampling [94]. Then the AR of local populations and bunches was estimated, and the mean and standard deviation of these were added cumulatively using bootstrapping, separately for local populations and bunches. Estimates were normalised as percentages of total extrapolated regional AR.
A similar approach was employed to estimate proportional cumulative AR of seeds per bunch. For this, total AR of bunches was extrapolated [95], and then seeds were added cumulatively as described. Computations were made in the vegan package [96]. Trend lines were plotted using the loess method in ggplot2 [97].
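The cumulative approach described in this section can be sketched as an allele accumulation curve. The Python example below averages the proportion of regional alleles captured over every possible order of adding bunches. It is a simplification: it uses the observed set of regional alleles as the denominator, whereas the study extrapolated total AR and used bootstrapping. All data and names are hypothetical.

```python
from itertools import permutations

def accumulation_curve(bunches, regional_alleles):
    """Mean proportion of regional alleles captured after 1..k bunches.

    bunches is a list of allele sets, one per bunch. The mean is taken over
    all orderings, which is feasible only for a handful of bunches.
    """
    totals = [0.0] * len(bunches)
    orders = list(permutations(bunches))
    for order in orders:
        seen = set()
        for k, alleles in enumerate(order):
            seen |= alleles & regional_alleles
            totals[k] += len(seen) / len(regional_alleles)
    return [total / len(orders) for total in totals]

def bunches_needed(curve, target):
    """First number of bunches whose mean capture reaches target, else None."""
    for k, proportion in enumerate(curve, start=1):
        if proportion >= target:
            return k
    return None

# Hypothetical allele labels, not study data.
regional = set("ABCDEFGHIJ")
bunches = [set("ABCD"), set("ABEF"), set("ACGH")]

curve = accumulation_curve(bunches, regional)
print([round(p, 3) for p in curve])
print("bunches for 70%:", bunches_needed(curve, 0.70))
print("bunches for 90%:", bunches_needed(curve, 0.90))
```

A result of `None` from `bunches_needed` corresponds to the situation reported for M. acuminata, where the target is not reached within the sampled bunches and the curve has to be extrapolated.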

Allele groupings and genetic structure
An initial assessment of population differentiation was made by carrying out a permutation test (999 permutations) on the AMOVA described above. Secondly, we made allele groupings for local seeds and regional populations (http://bioinformatics.psb.ugent.be/webtools/Venn) and plotted them as Euler diagrams using the eulerr package [98]. Thirdly, we calculated pairwise genetic distance of local populations and seeds [46], and produced a heat map with dendrogram using complete linkage hierarchical clustering. Finally, we assessed isolation by distance by comparing Euclidean distances of coordinates and population matrices of seeds and populations (separately) using the Mantel test.
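The Mantel test mentioned above can be sketched in plain Python. The example correlates the upper triangles of a geographic and a genetic distance matrix and builds a null distribution by permuting the rows and columns of one matrix together. The one-sided p-value, the default of 999 permutations, the coordinates and the genetic distances are all assumptions of this sketch rather than details given in the source.

```python
import random
from math import dist, sqrt

def upper_triangle(matrix):
    """Values above the diagonal of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mantel(d1, d2, n_permutations=999, seed=0):
    """Mantel r and a one-sided permutation p-value (r_permuted >= r_observed)."""
    rng = random.Random(seed)
    n = len(d1)
    x = upper_triangle(d1)
    observed = pearson(x, upper_triangle(d2))
    hits = 0
    index = list(range(n))
    for _ in range(n_permutations):
        rng.shuffle(index)
        # Permute rows and columns of d2 with the same index order.
        permuted = [[d2[index[i]][index[j]] for j in range(n)] for i in range(n)]
        if pearson(x, upper_triangle(permuted)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_permutations + 1)

# Hypothetical site coordinates and a genetic distance that scales with geography.
coords = [(0, 0), (3, 4), (6, 8), (10, 0), (0, 12)]
geographic = [[dist(a, b) for b in coords] for a in coords]
genetic = [[0.01 * d for d in row] for row in geographic]

r, p = mantel(geographic, genetic)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

With only five local populations per species there are just 120 distinct permutations, so the smallest attainable p-value is limited regardless of how many permutations are drawn.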