Characterization of ploidy and genomic constitution
Flow cytometry was used to determine the genome size (2C content) and ploidy level of 224 accessions. Of the 221 section Musa accessions, only five (2.3%) presented results that conflicted with the passport data. Similar discrepancies between ploidy estimated by morphological characterization and by flow cytometry have been reported [61, 62]. Previously, it was believed that nuclear DNA content would be a good predictor of genomic constitution , as the BB genome was thought to be on average 12% smaller than the AA genome . However, in our study the estimated size of the A or B genome did not differ among ploidies and genomic groups; therefore, C values estimated by flow cytometry alone could not distinguish genomic constitution. The predictive value of genome size for genomic constitution might be affected by minute differences in the size of individual A and B genomes; variation in the number of chromosome sets from distinct genomes in triploids or tetraploids, including the occurrence of aneuploids ; the involvement of other Musa genomes, such as the S or T genomes (from M. schizocarpa or M. textilis, respectively) in some cultivars ; or a lack of additivity in genome size caused by recombination, resulting in different proportions of the A and B genomes [66, 67].
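To make the additivity argument concrete, the sketch below computes expected 2C values for several genomic constitutions under a purely additive model, in which each chromosome set contributes its monoploid (1Cx) size. The sizes used (0.60 pg for A and a B set assumed to be 12% smaller) are hypothetical placeholders that only mirror the historical 12% assumption; they are not estimates from this study.

```python
# Toy model: expected 2C DNA content when genome size is strictly additive.
# The 1Cx sizes used below are hypothetical placeholders, not study estimates.


def expected_2c(constitution: str, sizes: dict[str, float]) -> float:
    """Sum the monoploid (1Cx) size of every chromosome set in the nucleus.

    `constitution` lists one letter per chromosome set, e.g. "AAB".
    """
    return sum(sizes[genome] for genome in constitution)


def monoploid_size(two_c: float, ploidy: int) -> float:
    """Mean monoploid genome size (1Cx) implied by a 2C value and a ploidy."""
    return two_c / ploidy


if __name__ == "__main__":
    # Hypothetical sizes in pg: the B set is assumed to be 12% smaller than A.
    sizes = {"A": 0.60, "B": 0.60 * 0.88}
    for constitution in ["AAA", "AAB", "ABB", "AAAA", "AAAB"]:
        two_c = expected_2c(constitution, sizes)
        one_cx = monoploid_size(two_c, len(constitution))
        print(f"{constitution}: 2C = {two_c:.3f} pg, 1Cx = {one_cx:.3f} pg")
```

Under these assumed values, AAA and AAB differ by only 0.072 pg (about 4%), so any of the sources of variation listed above could plausibly blur the distinction.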
Determination of genomic constitution by molecular markers has long been sought, with attempts using RAPD  or SSR markers [23, 28, 47, 69, 70], but with limited precision in determining genome dosage. When we evaluated the ITS PCR-RFLP approach using standard cultivars, it was possible to identify all expected digested fragments except the smallest one (50 bp) reported by Nwakanma et al. , which was not predicted by in silico digestion (not shown). Simulations of the various A and B genome constitutions and dosages indicated that most genome combinations (BB, AAB, ABB and AB) could be distinguished; however, AAB could not be distinguished from AAAB, nor ABB from ABBB, possibly because of amplification competition. Successful adoption of this approach therefore requires prior knowledge of ploidy . When the ITS PCR-RFLP approach was applied to the whole collection, the genomic constitution of most accessions was congruent with the available morphological classification, as previously reported . Our data indicate that, with few exceptions, determination of ploidy and genomic constitution using morphological descriptors can still be considered reliable and useful.
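The dosage problem can be illustrated with a simple proportional model. Assuming, hypothetically, that PCR amplifies each genome's ITS copies in proportion to its dosage, the sketch gives the expected share of A-profile signal for each constitution. AAB and AAAB both show A and B fragments, so they differ only in band intensity (2/3 versus 3/4 of the signal), a margin that amplification competition could easily erase.

```python
from fractions import Fraction


def a_signal_fraction(constitution: str) -> Fraction:
    """Expected share of ITS amplicons carrying the A-genome RFLP profile.

    Assumes (hypothetically) that each genome copy is amplified in proportion
    to its dosage, with no amplification competition between A and B copies.
    """
    return Fraction(constitution.count("A"), len(constitution))


if __name__ == "__main__":
    for constitution in ["AB", "AAB", "AAAB", "ABB", "ABBB"]:
        a_share = a_signal_fraction(constitution)
        b_share = 1 - a_share
        print(f"{constitution}: A {a_share} / B {b_share}")
```

In this model AB, AAB and ABB are well separated (1/2, 2/3 and 1/3 of the signal), whereas adding a fourth A or B set shifts the ratio only slightly.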
Notably, a few accessions behaved unexpectedly, such as ‘Yangambi nº 2’ (28) and ‘BRS Tropical’ (79), which are recognized as AAB and AAAB, respectively, but exhibited typical AAA and AAAA digestion profiles. These changes in the restriction profiles of ‘Yangambi nº 2’ and ‘BRS Tropical’ (a tetraploid hybrid derived from ‘Yangambi nº 2’) might derive from a variant of the B genome rDNA locus. Other unusual alleles were identified. For example, ‘Tugoomomboo’ (102), considered AAA, exhibited an ABB PCR-RFLP profile but was classified as AAB by cluster analysis, suggesting the occurrence of the B genome allele for the ITS region in one of the A genomes. The AA diploid ‘Madu’ (195) showed an AB profile, with a slight change in the size of the B genome restriction fragment. This size alteration derived from a change in the RsaI restriction site, later confirmed by sequencing (not shown). This accession also exhibited ancestry from group VI (AAB and AAAB) and group XVIII (AAA/AA/AAB) (Figure 4). Such results may be related to recombination between the A and B genomes [5, 66, 67].
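How a mutated RsaI site changes fragment sizes can be checked with an in silico digestion such as the one sketched below. RsaI recognizes 5'-GT^AC-3' and cuts between T and A; the sequences are made-up toy examples, not Musa ITS sequences, and the sketch assumes complete digestion of a linear molecule.

```python
import re

RSAI_SITE = "GTAC"  # RsaI recognition sequence, 5'-GT^AC-3'
CUT_OFFSET = 2      # RsaI cuts after "GT", leaving blunt ends


def rsai_fragments(seq: str) -> list[int]:
    """Fragment lengths (bp), 5' to 3', from complete RsaI digestion of a linear sequence."""
    seq = seq.upper()
    cuts = [match.start() + CUT_OFFSET for match in re.finditer(RSAI_SITE, seq)]
    bounds = [0, *cuts, len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    reference = "AAGTACCCGTACTT"  # toy sequence with two RsaI sites
    variant = "AAGTACCCGTCCTT"    # hypothetical point mutation removes the second site
    print(rsai_fragments(reference))  # [4, 6, 4]
    print(rsai_fragments(variant))    # [4, 10]
```

Losing one site merges two fragments into one larger band, which is the kind of size shift described for ‘Madu’.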
Incomplete concerted evolution of ITS sequences observed in Musa hybrids, with the predominance of the original parental alleles, might derive from the absence of sexual reproduction . However, the observation of unexpected genotypes in sequence analyses of the ITS and ETS regions of rDNA has pointed to recombination between the A and B genomes or between genomes of M. acuminata subspecies [5, 20, 71]. Homeologous pairing and recombination between A and B chromosomes have indeed been observed in meiosis of triploid hybrid accessions (AAB and ABB) and an allotetraploid (AABB), and appear to occur at some frequency [66, 70].
Therefore, although small differences in genome size between M. acuminata and M. balbisiana are recognized, chromosome recombination and multivalent pairing during meiosis, leading to unbalanced genome segregation, could generate a continuum of genome sizes among accessions that masks these differences and impairs the ability to distinguish genomic constitution, as corroborated by our results and others [61, 62]. Similarly, our PCR-RFLP results for ITS sequences pointed to the occurrence of recombinants: B alleles were absent in two hybrid accessions (AAB and AAAB), and a B genome allele was present in one of the A genomes in an accession with an ABB profile and in an AA accession. Such exceptions to the commonly observed incomplete concerted evolution might be associated with sexual reproduction, since meiosis offers the opportunity for homeologous chromosome pairing and the generation of recombinant chromosomes.
Genetic diversity and clustering analysis
Sixteen SSR loci were used, revealing 182 alleles with an average of 11.5 alleles per locus, whereas Christelová et al.  detected averages of 15.4 and 14 alleles for 70 diploid and 38 triploid accessions, respectively. Within each ploidy level, the BB genome group presented a higher proportion of accessions with only one allele (homozygosity), as previously reported , suggesting lower genetic variability  or a large number of null alleles among the accessions evaluated. Conversely, in cultivated AA accessions, structural heterozygosity [9, 73] might explain the larger average heterozygosity (62.4%), as well as the limited fertility [7, 9, 73, 74], compared with wild diploids (mean 56.4%) (Additional file 1: Figure S1). Previous studies reported heterozygosity of 61% for cultivated AA and 53% for wild diploid accessions based on SSR markers , and 61% for cultivated AA and 53% for wild AA using RFLP markers .
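One common heterozygosity estimator for SSR data is Nei's gene diversity, He = 1 - sum(p_i^2), averaged over loci. This section does not state which estimator produced the percentages above, so the sketch is only illustrative; it also assumes allele calls can be counted directly, which is a simplification for polyploid SSR profiles where allele dosage is unknown. Locus names and allele sizes are hypothetical.

```python
from collections import Counter


def gene_diversity(alleles: list[str]) -> float:
    """Nei's gene diversity for one locus: He = 1 - sum(p_i^2)."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def mean_gene_diversity(loci: dict[str, list[str]]) -> float:
    """Average He over loci; each value lists the allele calls observed at that locus."""
    return sum(gene_diversity(calls) for calls in loci.values()) / len(loci)


if __name__ == "__main__":
    # Hypothetical allele calls (fragment sizes in bp) at two loci.
    loci = {
        "locus_1": ["150", "154", "150", "158"],
        "locus_2": ["201", "201", "201", "205"],
    }
    print(round(mean_gene_diversity(loci), 3))  # 0.5
```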
In our study, a high proportion (more than 75%) of triploid accessions showed only one or two alleles. Banana triploid cultivars are thought to have originated from crosses between an unreduced 2n gamete (from first- or second-division restitution) and a reduced n gamete. The formation of unreduced gametes tends to be higher when two different genomes are involved, as in AB hybrids or in AA hybrids between subspecies of M. acuminata, such as the cultivated diploids [8, 9]. Triploids most likely resulted from crosses between heterozygous diploids, such as cultivated diploids producing unreduced (2n) gametes, and another individual (n) carrying an allele similar to one found in the other parent. This hypothesis is supported by the finding that the most frequent alleles in cultivated AA diploids were observed at increasing frequency in triploid and tetraploid accessions carrying increasing dosages of the M. acuminata genome (Figure 5A). The association with cultivated diploids is consistent with the presence in cultivated triploids and tetraploids of domestication traits such as parthenocarpy, sterility and pulp yield . Further, Ortiz  investigated the occurrence of unreduced gametes and observed that all genotypes that produced 2n gametes also produced fruits by parthenocarpy. Many cultivated triploids presented the same mitochondrial and chloroplast patterns as the cultivated diploids . The subspecies M. acuminata ssp. banksii and M. a. ssp. errans, characterized as cultivated diploids, have been involved in the development of almost all cultivated diploids and triploids and parthenocarpic cultivars [2, 9, 10].
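The reasoning about one- and two-allele triploids can be made explicit with a toy enumeration. It assumes, as a simplification, that the SSR locus is tightly linked to its centromere, so first-division restitution (FDR) transmits both parental alleles in the 2n gamete and second-division restitution (SDR) transmits two copies of one allele. Allele names are hypothetical fragment sizes.

```python
def unreduced_gametes(parent: tuple[str, str], mode: str) -> list[tuple[str, str]]:
    """Possible 2n gametes at a centromere-linked locus (no crossover assumed)."""
    first, second = parent
    if mode == "FDR":  # homologues stay together: parental heterozygosity retained
        return [(first, second)]
    if mode == "SDR":  # sister chromatids stay together: gamete is homozygous
        return [(first, first), (second, second)]
    raise ValueError("mode must be 'FDR' or 'SDR'")


def triploid_allele_counts(parent_2n: tuple[str, str], parent_n: tuple[str, str], mode: str) -> list[int]:
    """Number of distinct SSR alleles visible in each possible 2n + n offspring."""
    counts = []
    for gamete_2n in unreduced_gametes(parent_2n, mode):
        for allele_n in sorted(set(parent_n)):
            counts.append(len(set(gamete_2n) | {allele_n}))
    return counts


if __name__ == "__main__":
    # Hypothetical parents sharing allele "150".
    print(triploid_allele_counts(("150", "154"), ("150", "162"), "FDR"))  # [2, 3]
    print(triploid_allele_counts(("150", "154"), ("150", "162"), "SDR"))  # [1, 2, 2, 2]
```

When the parents share an allele, most offspring classes show only one or two distinct alleles, in line with the pattern observed among the triploids.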
Although there was a trend toward the participation of cultivated AA diploids (AA(C)) in some accessions, only 34% (ABB), 39% (AAB), 57% (AAA), 42% (AAAB) and 70% (AAAA) of the accessions contained such alleles. This reinforces the previous observation from PCR-RFLP that the origin of cultivated bananas might have involved recombination events (inter- and intraspecific) and backcrosses between species, as well as human intervention. Therefore, a cultivar may not carry the whole allelic complement of a specific A or B genome . On the other hand, 40% of the alleles present in the eight BB accessions were not detected in ABB accessions, most likely because the BB accessions encompass a larger diversity than that involved in the formation of ABB. Hippolyte et al.  also verified a larger diversity in the B genome of interspecific hybrids, such as ABB, than in BB, suggesting an under-representation of M. balbisiana diversity or the extinction of the parental donor of the B genome in these hybrids. Our study also detected these differences (Additional file 1: Figure S2), but compared with BB, ABB was more uniform in the Structure analysis (q > 0.91 for 62.5% of BB and 87.5% of ABB accessions) (Figures 4 and 7).
Converting the SSR genotypes into binary data and using them to estimate dissimilarities among genotypes revealed broad genetic variability among Musa accessions (Additional file 1: Table S2). The SSR loci separated the accessions into two major clusters, one with accessions carrying at least one copy of the B genome and the other with accessions carrying exclusively the A genome, and further grouped them according to genomic constitution. Subsequent subdivision generally corroborated the classification into banana subgroups (‘Pome’, ‘Plantain’, ‘Cavendish’, ‘Gros Michel’, ‘Bluggoe’, ‘Silk’ and ‘Pisang awak’). The most diverse accessions were AA diploids, and the least diverse were subgroups of commercial interest, such as ‘Pome’, ‘Plantain’, ‘Cavendish’, ‘Gros Michel’ and ‘Bluggoe’, corroborating previous studies [21, 22, 28, 29, 70, 77–79]. Banana subgroups comprise genotypes that share similar agronomic and fruit quality traits  and are believed to originate from a common ancestor, that is, from a single meiotic event followed by a total lack of a sexual stage in their evolution , which explains the small genetic differences. However, large morphological differences, maintained by asexual propagation, are observed in the field [78–80]. Epigenetic regulation might help to explain phenotypic differences within subgroups that are not correlated with genetic differences [66, 76].
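The binary conversion step can be sketched as follows: each (locus, allele) pair becomes a presence/absence column, which discards allele dosage, and a pairwise dissimilarity is then computed. Jaccard distance is used here only as an example; the coefficient actually applied to these data is not stated in this section, and the accession and allele names are hypothetical.

```python
def to_binary(
    profiles: dict[str, dict[str, set[str]]],
) -> tuple[list[tuple[str, str]], dict[str, list[int]]]:
    """Score each (locus, allele) pair as present (1) or absent (0) per accession."""
    columns = sorted(
        {
            (locus, allele)
            for profile in profiles.values()
            for locus, alleles in profile.items()
            for allele in alleles
        }
    )
    matrix = {
        accession: [int(allele in profile.get(locus, set())) for locus, allele in columns]
        for accession, profile in profiles.items()
    }
    return columns, matrix


def jaccard_distance(x: list[int], y: list[int]) -> float:
    """1 - (shared presences / presences in either profile)."""
    shared = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return 1.0 - shared / either if either else 0.0


if __name__ == "__main__":
    profiles = {  # hypothetical accessions and allele sizes
        "acc_1": {"locus_1": {"150", "154"}, "locus_2": {"201"}},
        "acc_2": {"locus_1": {"150"}, "locus_2": {"201", "205"}},
    }
    columns, matrix = to_binary(profiles)
    print(columns)
    print(jaccard_distance(matrix["acc_1"], matrix["acc_2"]))  # 0.5
```

Jaccard ignores shared absences, which is often preferred when an absent band may simply reflect an allele that was not sampled.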
In addition to identifying duplicated accessions and defining the ploidy level and genomic constitution of the accessions, the SSR-based cluster analysis also enabled us to infer the subgroup to which natural triploid accessions belong, according to their position in the phenogram. This is a key aspect because it enabled us to separate accessions with similar agronomic attributes. Breeding programs can use this information to develop hybrids that must meet the agronomic or quality requirements of particular subgroups. However, two clusters (identified as ‘unknown’; Figure 3) need further investigation for proper categorization.
Population structure and genetic relationships of accessions
To our knowledge, this is the first work to explore the co-dominant nature of SSR markers in Musa accessions with distinct ploidy levels using the Bayesian model implemented in Structure. Establishing the relationships and evolution of the genomes of modern cultivars, landraces and their wild relatives is of great importance for determining the effect of human intervention on domestication and for understanding the geographic dimension of diversity and of the domestication of wild species . Because many species have undergone a long and complex period of domestication and breeding with limited gene flow, a complex population structure is expected [81, 82].
Here, we propose the separation of the 224 accessions into 21 subpopulations (groups) based on the method proposed by Evanno et al. . Such a large number of groups was expected, considering that accessions with different genomic constitutions (AA, BB, ABB, AAB, AAA, AAAA and AAAB) and from distinct subgroups (‘Pome’, ‘Plantain’, ‘Cavendish’, ‘Gros Michel’, etc.) of the various genomic groups were analyzed. In general, the grouping by Structure, even with some alleles omitted, was congruent for most groups (especially triploid and tetraploid accessions) in the phenogram generated from SSRs scored as dominant markers (without exclusion of alleles). The agreement between both datasets indicated that these adaptations did not compromise the information from the alleles used in the Structure analysis, which additionally provides ancestry estimates for each group.
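For reference, the Evanno et al. criterion selects the K that maximizes Delta K = mean|L(K+1) - 2L(K) + L(K-1)| / sd[L(K)], where L(K) is the ln P(D) reported by Structure over replicate runs. The sketch below is a minimal version that applies the second-order difference to the mean ln P(D) per K (a common simplification); the input values are invented.

```python
from statistics import mean, stdev


def evanno_delta_k(ln_pd: dict[int, list[float]]) -> dict[int, float]:
    """Delta K for each K that has runs at K-1 and K+1 and at least two replicates.

    ln_pd maps K to the ln P(D) values of replicate Structure runs. The
    second-order difference is taken on the per-K means (a simplification).
    """
    delta = {}
    for k, runs in ln_pd.items():
        if k - 1 not in ln_pd or k + 1 not in ln_pd or len(runs) < 2:
            continue
        spread = stdev(runs)
        if spread == 0:
            continue  # Delta K is undefined when replicate runs agree exactly
        second_diff = mean(ln_pd[k + 1]) - 2 * mean(runs) + mean(ln_pd[k - 1])
        delta[k] = abs(second_diff) / spread
    return delta


if __name__ == "__main__":
    ln_pd = {  # invented ln P(D) values, two replicate runs per K
        1: [-100.0, -100.0],
        2: [-60.0, -62.0],
        3: [-58.0, -58.5],
        4: [-57.0, -57.4],
    }
    delta = evanno_delta_k(ln_pd)
    print(delta, "best K:", max(delta, key=delta.get))
```

The smallest and largest K tested have no Delta K, because the second-order difference needs a neighbour on each side.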
There is emerging evidence that cultivated bananas might not have evolved simply by hybridization followed by selection and clonal propagation (“single-step domestication”), but that episodes of meiosis, recombination and fertilization might occasionally have occurred [5, 66, 71]. In our study, evidence of mixed ancestry, given by membership values q ≤ 90%, was found for wild and cultivated diploids, similar to what was observed for tetraploid hybrids from breeding programs. For triploid accessions, there was evidence of admixture (12.5% of ABB accessions, 39.5% of AAA and 42.1% of AAB), with ancestry mostly in two groups or in many groups (with minimal ancestry in each), suggesting multiple origins and/or more frequent recombination than expected. However, accessions from the subgroups ‘Plantain’ (group V), ‘Cavendish’ and ‘Gros Michel’ (group X) and ‘Pome’ (group XX) were highly homogeneous, with a few exceptions.
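The admixture criterion used here (membership q ≤ 0.90) can be applied to a Structure Q matrix as sketched below. The accession names and membership values are hypothetical, and parsing of actual Structure output files is not shown.

```python
def is_admixed(q: list[float], threshold: float = 0.90) -> bool:
    """True when no group reaches more than `threshold` membership (max q <= 0.90)."""
    return max(q) <= threshold


def admixed_share(q_matrix: dict[str, list[float]]) -> float:
    """Fraction of accessions classified as admixed."""
    flags = [is_admixed(q) for q in q_matrix.values()]
    return sum(flags) / len(flags)


if __name__ == "__main__":
    q_matrix = {  # hypothetical accessions, memberships over three groups
        "acc_1": [0.96, 0.02, 0.02],
        "acc_2": [0.55, 0.40, 0.05],
        "acc_3": [0.30, 0.35, 0.35],
        "acc_4": [0.05, 0.93, 0.02],
    }
    print(admixed_share(q_matrix))  # 0.5
```

Applied separately to each genomic group, the same function would yield per-group admixture percentages like those reported above.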
The subgroup ‘Pome’ (AAB; group XX; Figure 4) contains the most widely cultivated accessions in Brazil, and Embrapa's breeding program has focused on developing tetraploids from crosses between a partially fertile cultivated female parent (AAB) producing unreduced (2n) gametes and a diploid male pollen donor (AA) carrying novel desirable characters, such as disease resistance. Here, all these ‘Pome’ tetraploid hybrids from Embrapa showed ancestry to the parental diploids ‘M53’ (group IV) or ‘Calcutta 4’ (group XI). Similarly, all the improved AAAA hybrids from ‘Gros Michel’ (94, 95 and 96) showed ancestry to diploid groups VII or II. In the ‘Pome’ subgroup (XX), of five triploids inferred as admixed, only 59 and 193 displayed clear ancestry to groups XVI and II, respectively. Curiously, ‘FHIA-02’ (91) is reported to be an AAAA hybrid from a cross between ‘Williams’ and the diploid ‘SH3393’, with characteristics of the ‘Cavendish’ subgroup , but here only 22% of its genome was assigned to ‘Cavendish’, suggesting that it belongs to ‘Pome’ (Table 1; Figures 3 and 4). Other FHIA hybrids, whose diploid parents were probably not represented in this study, displayed ancestry in groups X (‘Cavendish’/‘Gros Michel’), XVI (‘Silk’/‘Mysore’) and XIX (Figure 4).
‘Cavendish’ and ‘Gros Michel’ were separated into two close subgroups in the cluster analysis (Figure 3); however, according to Structure (Figure 4), representative accessions from these subgroups appeared in the same group, most likely because they share common alleles [2, 8]. Similar results have been obtained using RFLP , microsatellite  and DArT markers , and the two subgroups share the same organellar genome cytotype based on PCR-RFLP . Hippolyte et al.  proposed that accessions from the ‘Cavendish’ and ‘Gros Michel’ subgroups derive from a common 2n gamete donor and most likely from two different, but genetically close, n gamete donors. Raboin et al.  proposed the accessions ‘Sa’ and ‘KhaiNai On’ as the probable n gamete donors for the ‘Gros Michel’ subgroup. In our study, two diploids with these names (173 and 186) were allocated to group IX, but only accession 136 (‘Amritsagar’) from group X (‘Cavendish’/‘Gros Michel’) presented ancestry (q ~ 18%) to group IX, which supports the proposed diploid origins of the ‘Cavendish’ and ‘Gros Michel’ subgroups. In addition, the diploid ‘Lareina BT100’ (205) was placed in group X and could be a potential 2n gamete donor for ‘Cavendish’ and ‘Gros Michel’. Therefore, diploids from group IX and ‘Lareina BT100’ appear to be potentially related parents of ‘Cavendish’ and ‘Gros Michel’, and could be used in crossing programs or chromosome manipulation (doubling) to obtain or re-synthesize ‘Gros Michel’/‘Cavendish’ hybrids.
Notably, some AAB and AAA triploid accessions showed ancestry to other groups containing accessions with similar genomic constitution. Some hybrids are known to show varying degrees of residual fertility, and their evolution may have involved episodes of sexual reproduction, as suggested by the backcross hypothesis .
Our results indicate that Structure was efficient in detecting the ancestry of tetraploid hybrids with defined genealogies recently developed by breeding programs in Brazil (‘Pome’) and Jamaica (‘Gros Michel’), and of some triploid cultivars. However, this approach appeared less efficient in detecting the ancestry of most of the primeval triploid accessions, which make up the main commercial subgroups (‘Pisang awak’, ‘Gros Michel’, ‘Cavendish’, ‘Pome’ and ‘Plantain’). This failure to detect ancestry might be explained by several hypotheses.
One possibility is that potential parental diploids of the main commercial subgroups were under-represented in the collection, as suggested by the absence of ancestry in diploid groups for some recent FHIA tetraploid hybrids evaluated in this study (Figure 4). Secondly, the long and uncertain evolutionary period these triploid cultivars have gone through since their origin might have resulted in changes or mutations at these loci, which could completely eliminate or modify the alleles from one of the parents. The ability to detect ancestry for recently developed tetraploid hybrids is important evidence supporting this hypothesis. Allopolyploidization can lead to activation of retrotransposons, elimination and rearrangement of parental chromosomes [86, 87], and DNA sequence losses, apparently from the larger parental genome [66, 88] and from highly repetitive regions . Such events might have occurred in M. acuminata, which has a larger genome [62, 63] and more repetitive sequences than M. balbisiana. Thirdly, the limited number of loci used may also explain the lack of precision in identifying the ancestry of commercial accessions, as a larger number of loci would increase the chance of finding shared alleles at conserved polymorphic loci between the cultivated triploids and the ancestral diploids. For example, other researchers did not find differences among accessions of the ‘Cavendish’ subgroup , but differences among accessions of this subgroup were identified here and by Christelová et al. , most likely because of the larger number of alleles identified per locus.
The relationship between diploids and AAB accessions could have been affected by recombination between homeologous chromosomes with distinct structural organization, contributing to large genetic changes in allopolyploids . Recombination between the A and B genomes can occur and may be frequent in triploid hybrids, and it might lead to unbalanced genome transmission with respect to the parental species [66, 67]. This would explain variation in AAB genomes and in the morphological expression of A and B characters, as well as the lack of additivity, as hybrids may carry different recombinant A and B chromosomes (e.g., AB and BA) . Therefore, all these processes, occurring alone or in combination, especially in M. acuminata subspecies, can hinder the inference of ancestry for most of the triploid accessions.
Concerning diploids, the groups formed by cluster analysis behaved differently from those observed for the triploid and tetraploid accessions. In the Structure approach, groups are defined by likelihood, using the allele frequencies that characterize each population , which makes this method more reliable for evaluating groups of individuals. In our study, a limited number of accessions of the distinct subspecies were analyzed (seven accessions of ssp. malaccensis in groups I, VII, VIII and XIX; one ssp. errans in XVIII; five ssp. banksii in group IX; three ssp. burmannica/burmannicoides in XI and XVIII; four ssp. siamea in VII, XI and XVIII; two ssp. microcarpa in XI and XVIII; and three ssp. zebrina in XI and XVIII), which limits inferences about the relationships among these subspecies. Further, some of these AA diploids can intercross, their classification into subspecies was based merely on spatial and temporal isolation, and some of the accessions might have an inter-subspecific origin .
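To illustrate how allele frequencies can assign individuals to groups, the sketch below computes, for a diploid genotype, the likelihood under each population's allele frequencies (assuming Hardy-Weinberg and linkage equilibrium) and normalizes these into assignment probabilities with equal priors. This is a deliberately simplified no-admixture calculation with known, invented frequencies and a hypothetical floor for unseen alleles; Structure itself estimates frequencies and memberships jointly by MCMC and can model admixture.

```python
import math

MIN_FREQ = 1e-6  # hypothetical floor for alleles unseen in a population


def log_likelihood(genotype: dict[str, tuple[str, str]], freqs: dict[str, dict[str, float]]) -> float:
    """log P(diploid genotype | allele frequencies), assuming HWE and unlinked loci."""
    total = 0.0
    for locus, (a, b) in genotype.items():
        p_a = freqs[locus].get(a, MIN_FREQ)
        p_b = freqs[locus].get(b, MIN_FREQ)
        total += math.log(p_a * p_a if a == b else 2 * p_a * p_b)
    return total


def assignment_probabilities(
    genotype: dict[str, tuple[str, str]],
    populations: dict[str, dict[str, dict[str, float]]],
) -> dict[str, float]:
    """Normalize likelihoods across populations with equal priors (no admixture)."""
    logs = {name: log_likelihood(genotype, freqs) for name, freqs in populations.items()}
    best = max(logs.values())  # subtract the maximum to avoid underflow
    weights = {name: math.exp(value - best) for name, value in logs.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


if __name__ == "__main__":
    populations = {  # invented allele frequencies at two loci
        "group_1": {"L1": {"150": 0.9, "154": 0.1}, "L2": {"201": 0.8, "205": 0.2}},
        "group_2": {"L1": {"150": 0.1, "154": 0.9}, "L2": {"201": 0.3, "205": 0.7}},
    }
    genotype = {"L1": ("150", "150"), "L2": ("201", "205")}
    print(assignment_probabilities(genotype, populations))
```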
Despite the limited number of accessions for each subspecies, inferences from previous studies were supported. For instance, the grouping of the five ssp. banksii accessions (group IX) with cultivated diploids has been reported [2, 84], with a clear distinction from other subspecies . Musa acuminata ssp. banksii originated in Papua New Guinea and the northern Indonesian islands, geographically isolated from the other subspecies, and it is preferentially autogamous . Accessions of this subspecies presented low average heterozygosity (55.8%) and PIC value (36.6%). Homozygous loci in banksii and cultivated diploids were also reported by Grapin et al. . Compared with the other subspecies, banksii presented high membership values (Figure 4).
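PIC values such as the one quoted above are commonly computed with the formula of Botstein et al., PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2. This section does not specify the exact formula or software used, so the sketch below is illustrative only, with invented allele frequencies.

```python
from itertools import combinations


def pic(freqs: list[float]) -> float:
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2 (Botstein et al. formula)."""
    homozygosity = sum(p * p for p in freqs)
    pair_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - pair_term


if __name__ == "__main__":
    print(pic([0.5, 0.5]))  # 0.375
    print(round(pic([0.7, 0.2, 0.1]), 4))
```

Because of the pair term, PIC is always somewhat lower than gene diversity for the same frequencies, which is why a locus can show heterozygosity around 55% but a PIC in the 30s.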
In general, diploids showed diverse behavior, with accessions of the same subspecies in different groups and/or grouped with other subspecies, as verified for groups XI and XVIII (Figure 4). These two groups contained a few accessions of ssp. burmannica/burmannicoides, ssp. siamea, ssp. microcarpa and ssp. zebrina, corroborating the grouping obtained with DArT , and the closer relationship between ssp. errans and ssp. microcarpa. However, these subspecies showed distinct cytotypes based on PCR-RFLP . The assembly of distinct subspecies into the same cluster has been reported [2, 9, 84]. This behavior could be associated with the broad variability within M. acuminata or with the presence of many rare alleles in the subspecies  that may obscure genetic relationships. Further, differences in markers and methods of analysis, together with differing accession names  and the still questionable assignment of some accessions to a given subspecies , make direct comparison between studies difficult.