Allopolyploid origin in Rubus (Rosaceae) inferred from nuclear granule-bound starch synthase I (GBSSI) sequences

Background Polyploidy and hybridization are ubiquitous in Rubus L., a large and taxonomically challenging genus. Chinese Rubus are mainly concentrated into two major sections, the diploid Idaeobatus and the polyploid Malachobatus. However, it remains unclear to be auto- or allo- polyploid origin of polyploids in Rubus. We investigated the homoeologs and the structure of the GBSSI-1 (granule-bound starch synthase I) gene in 140 Rubus individuals representing 102 taxa in 17 (out of the total 24) subsections of 7 (total of 12) sections at different ploidy levels. Results Based on the gene structure and sequence divergence, we defined three gene variants, GBSSI-1a, GBSSI-1b, and GBSSI-1c. When compared with GBSSI-1a, both GBSSI-1b and GBSSI-1c have a shorter fourth intron, and GBSSI-1c had an additional deletion in the fifth intron. For diploids, either GBSSI-1a or GBSSI-1b was detected in 56 taxa consisting of 82 individuals from sect. Idaeobatus, while both alleles existed in R. pentagonus and R. peltatus. Both homoeologs GBSSI-1a and GBSSI-1b were identified in 39 taxa (48 individuals) of Malachobatus polyploids. They were also observed in two sect. Dalibardastrum taxa, in one sect. Chamaebatus taxon, and in three taxa from sect. Cylactis. Interestingly, all three homoeologs were observed in the three tetraploid taxa. Phylogenetic trees and networks suggested two clades (I and II), corresponding to GBSSI-1a, and GBSSI-1b/1c sequences, respectively. GBSSI-1 homoeologs from the same polyploid individual were resolved in different well-supported clades, and some of these homoelogs were more closely related to homoelogs in other species than they were to each other. This implied that the homoeologs of these polyploids were donated by different ancestral taxa, indicating their allopolyploid origin. Two kinds of diploids hybridized to form most allotetraploid species. The early-divergent diploid species with GBSSI-1a or -1b emerged before polyploid formation in the evolutionary history of Rubus. Conclusion This study provided new insights into allopolyploid origin and evolution from diploid to polyploid within the genus Rubus at the molecular phylogenetic level, consistent with the taxonomic treatment by Yü et al. and Lu. Electronic supplementary material The online version of this article (10.1186/s12870-019-1915-7) contains supplementary material, which is available to authorized users.


Background
The genus Rubus L. belongs to the subfamily Rosoideae of the family Rosaceae, with 750-1000 species distributed worldwide except Antarctica [1][2][3]. Focke [1][2][3] established the widely adopted Rubus taxonomy that contained 12 subgenera, with the three largest subgenera of Idaeobatus, Malachobatus, and Rubus (Additional file 1). The number of Rubus species in China accounts for 97% of the total in Asia. More than 200 species have been recorded in China, of which 139 species are indigenous [4]. Basing upon the evolutionary tendency of morphological features, chromosome numbers of certain species and the distribution patterns of species, taxonomists in China [4][5][6] proposed a new systematic arrangement of Chinese Rubus, with eight sections (Additional file 1). The two taxonomic systems are concordant in the classification of most species, while the arrangement of sections is presented in a reverse order to those of Focke's system (Additional file 1). Most species are assigned into two major sections, Idaeobatus and Malachobatus, including 11 and 13 subsections, respectively [5]. Section Idaeobatus is characterized by its shrub habit armed with sharp prickles, aciculae or setae, leaves pinnately compound or simple, stipules attached to the petioles, flowers hermaphroditic and often in terminal or axillary inflorescences, very rarely solitary, and drupelets separating from the receptacles [5,6]. In contrast, members of sect. Malachobatus are usually woody with prickles, simpleleaved, stipules free, flowers bisexual and in cymose panicles, subracemes, and drupelets adhering to receptacles [5,6].
The evolutionary history of Rubus species inferred from different analyses has been argued for a long time. Based on morphological and chromosomal data, Lu [4] suggested that evolution in Rubus proceeded from woody to herbaceous plants, and from species with compound leaves to simple leaves. This proposal was consistent with the view of Kalkman [7]. However, ITS data conflicted with these hypotheses: primarily semi-herbaceous, simpleleaved species occupied early-diverging positions in the trees [8].
To reconstruct the evolutionary history of plant polyploid species using molecular data, it is necessary to deal with the presence and the evolutionary fate of multiple gene copies resulting from paralogs and orthologs [26]. Identification of homoeologs in polyploids is crucial for reliable phylogeny reconstruction, and also informative for identifying parental lineages and inferring auto-or allo-formation of polyploids [27]. Low-copy nuclear genes that succeeded in other Rosaceae are potentially ideal nuclear markers for phylogenetic analysis of Rubus complex. The GBSSI gene, coding for granule-bound starch synthase I, is single copy in most diploid angiosperms [28]. The entire gene consists of 13 translated exons and 12 introns. Phylogenetic studies have shown that GBSSI exons and introns are useful in resolving relationships among closely related genera and species [26], especially in detecting ancient hybridization events of polyploids [29,30]. In Rubus and most Rosaceae, the GBSSI gene is represented by two paralogous loci, GBSSI-1 and GBSSI-2, which can be differentiated by specific indels [29,31]. Partial GBSSI-2 sequences, as a single copy gene, have provided high phylogenetic resolution within Rubus [25]. Additionally, two different alleles of GBSSI-1 were detected in octoploid R. chamaemorus, inferring it to be an ancient allopolyploid that resulted from multiple hybridization events [30]. It is believed that GBSSI-1 gene is extremely helpful to reveal the origin and evolution for Rubus polyploids.
In this study, we explored the utility of GBSSI-1 to elucidate the evolutionary history of genus Rubus and particularly the auto-or allo-polyploid origin of the polyploids. Our objectives were (i) to investigate the number of GBSSI-1 variants within Rubus at different ploidy levels, (ii) to analyze the gene structure and conduct homoeolog identification, and (iii) to provide new insights into the polyploid origin and evolutionary history within Rubus by reconstructing the phylogeny.

Results
Gene variants and orthology identification of GBSSI-1 within Rubus As shown in Fig. 1 and Additional file 2, we obtained different GBSSI-1 variants (GBSSI-1a, GBSSI-1b and GBSSI-1c) within Rubus at different ploidy levels. Based on the definition of ortholog by Yu et al. [32], we carried out the orthology assessment. The different GBSSI-1 variants shared > 90% identity at the amino acid sequence level with a significant E-value (< 10 − 10 ), and distributed on the same zone of chromosome 7 by alignment with reference genome of diploid R. occidentalis L. [33] (Additional file 3). Orthology of the Rubus diploid sequences was also assessed using phylogenetic analysis. The dataset was obtained from our GBSSI-1 sequences from Rubus diploids and from GBSSI (1 and 2) coding region sequences of Rosaceae species available in GenBank. This matrix included 378 nucleotides sites, of which 141 were constant and 164 were phylogenetically informative. The phylogenetic tree (Fig. 2) grouped all the Rosaceae GBSSI sequences into two well-supported clades with bootstrap values of 96 and 95%, respectively. These clades represented paralogous genes, corresponding to GBSSI-1 and GBSSI-2 according to Evans et al. [29]. In the GBSSI-1 clade, all the Rubus diploid sequences fell in a wellsupported clade (99% BS), which provided evidence that these sequences were orthologous.
After treating the gaps as missing data, we obtained 195 sequences for GBSSI-1 gene (Table 1). GBSSI-1a existd in 83 individuals whereas GBSSI-1b was found in 58 individuals. Three taxa containing five individuals possess GBSSI-1c. The final aligned GBSSI-1a consisted of 1296 nucleotides with length ranging from 1139 to 1234 base pairs. There were 441 (34.03%) variable characters, of which 257 (19.83%) were parsimony-informative. The aligned intron 4 was composed of 517 bp with length ranging from 403 to 484 bp, which had 188 variable sites. Seven indels were present in the entire gene alignment. The indels consisted of 1-303 nucleotides. Two relatively large ones (an insertion of 136 bp, and an insertion of 303 bp) were found in GBSSI-1a group.
The length of GBSSI-1b varied from 942 to 1001 bases. There were 234 (22.76%) variable sites, of which 134 (13.04%) were parsimony-informative in 1028 aligned nucleotides. The intron 4 contained 252 aligned nucleotides from 191 to 249 bp, and 65 variable sites. The alignment of the entire gene had four indels, each including 1 to 9 nucleotides. The aligned GBSSI-1c contained 913 bp with length range from 760 to 822 bp, of which just 11 were variable. JModelTest suggested that the best-fit model selected by Akaike Information Criterion (AIC) was TIM2 + G for GBSSI-1 dataset.

Orthologs of GBSSI-1 gene in Rubus
Orthology assessment is an important concern when using nuclear genes to reconstruct phylogeny, since paralogous sequences may lead to erroneous phylogenetic inferences [34,35]. We carried out sequence alignment and phylogenetic analysis to test the orthology and paralogy of GBSSI-1. Rousseau-Gueutin et al. [26] hypothesized orthology of the DHAR sequences because they shared similar positions in both diploid and the cultivated octoploid strawberry genomes. The GBSSI-1 sequences from Rubus shared the same location among different genomes (Additional file 3). From the phylogenetic analysis (Fig. 2), we observed the Rubus sequences belonged to the same gene copy, GBSSI-1, which supported their orthologous status. Compared with single copy GBSSI-2 in Rubus [25], GBSSI-1 gene was complex within the genus. Either GBSSI-1a or GBSSI-1b was detected in most diploids, while both of them were detected in R. pentagonus and R. peltatus, indicating their probable interspecific hybrid origin. Interestingly, different orthologs were identified based on gene structure within subsect. Corchorifolii of sect. Idaeobatus (Fig. 1, Additional file 2). Four taxa had GBSSI-1a and the other three had GBSSI-1b, which were clustered into subclades E and H1, respectively (Fig. 5). The two subclades belonged to clade II in gene trees, incongruent with their structure difference. We speculated that the GBSSI-1b originated from GBSSI-1a in some diploids by mutation. The two homoeologs also existed in majority of polyploids of sects. Malachobatus, Dalibardastrum, Chamaebatus, and Cylactis (unknown ploidy levels). Several sect. Malachobatus species even had GBSSI-1a, GBSSI-1b and GBSSI-1c. Tetraploid R. crassifolius (sect. Malachobatus) and blackberry cultivar ' Arapaho' (sect. Rubus) were exceptions with just one copy.
Of the 195 GBSSI-1 sequences in this study, seven contained stop codons and might have become pseudogenes, containing GBSSI-1a in R. fragarioides var. pubescens, GBSSI-1b in R. lambertianus and five GBSSI-1c sequences in R. lambertianus, R. lambertianus var. paykouangensis and R. panduratus (Additional file 2). All of them had deletions or insertions in the exon regions, leading to the nonsense mutation. The five GBSSI-1c sequences, with the missing fifth intron, might have become pseudogenes, but they might raise in quite recent since they had not yet led to long branches (the brief phylogram in the upper left corner in Fig. 5). Phylogenetic tree revealed that GBSSI-1c sequences were nested within GBSSI-1b clade (Fig. 5). It was reasonable to conclude that the GBSSI-1c type was directly originated from GBSSI-1b by mutation. Intron losses had been found in GBSSI-1 genes of diverse taxa, like subfamily Maloideae [29,31] and Pooideae [36]. In some species of Poeae, the GBSSI intron loss was interpreted as a nonhomoplasious synapomorphy [36]. Hu [37] proposed the 'intron exclusion hypothesis' , which suggested that a single intron could be precisely removed by double strand breaks (DSB) from a multiple-intron gene. This model of intron loss may explain the present results.

Incongruence between GBSSI-1-based phylogeny and traditional Rubus classification
Overall, GBSSI-1-based phylogeny largely supported Yü's rather than Focke's taxonomy. The results also generated some conflicts with the traditional morphology-based taxonomy, consistent with our previous study by chloroplast and single copy nuclear genes [25]. These incongruences probably suggested the need for a taxonomic revision using modern approaches.
The taxonomic treatments of R. ellipticus, R. ellipticus var. obcordatus, and R. pinfaensis have long been fraught with controversy. The dispute has mainly focused on two aspects, whether R. ellipticus and R. pinfaensis should be combined or not, and R. ellipticus var. obcordatus should be treated as a species R. obcordatus or a variety of R. ellipticus [2,5,6,[38][39][40]. In terms of character differences, R. ellipticus has dense pubescentia in blade back and R. pinfaensis has sparse villus [5]. On the contrary, the differences between R. ellipticus and R. ellipticus var. obcordatus not only focus on the leaflet shape and size, but also on the growth habits and habitat, inflorescence and flowering time [39]. Moreover, significant differences also exhibited in the pollen features, rDNA chromosomal distribution and genomic relationships by molecular cytogenetics [12,39,40]. In this study, three R. pinfaensis samples formed a strongly supported clade with the cluster of R. ellipticus and R. ellipticus var. obcordatus. The clade revealed obvious genetic divergence with any other species from both subsects. Stimulantes and Pungentes (Fig. 5, Additional file 5). Therefore, we supported to place them into a separate series Elliptici, sect. Idaeanthi, subg. Idaeobatus, as Focke proposed [2].

Allopolyploid origin of Rubus polyploids
Hybridization is believed to play an important role in plant speciation and evolution [24]. Chromosome numbers provide preliminary evidence for the possible hybrid origin of the sect. Malachobatus. The majority of the species from the sect. Idaeobatus present the chromosome number of 2n = 2x = 14 [9]. On the other hand, species in the sects. Malachobatus, Dalibardastrum and Chamaebatus have been reported to have higher ploidy levels (e.g., 2n = 4x = 28 for most species; R. amphidasys, 2n = 6x = 42; R. buergeri, 2n = 8x = 56) [9]. It is predicted that many speciation events in Rubus are associated with a change in ploidy levels. Thus, polyploidization may have played an important evolutionary role in the origin of the three sections. This study further offered the potential for new insights into the allopolyploid origin, especially in sect. Malachobatus.

Section Malachobatus
In our previous studies, bivalent pairing was the most predominant form in meiotic configuration, with just very few multivalents in some Malachobatus polyploids [13]. Moreover, polymorphism of 45S rDNA signal intensities by FISH were detected among them, implying different repeat copy numbers among different rDNA sites [12]. These results suggested that some sect. Malachobatus species be probable of allopolyploid origin. Here, GBSSI-1 homoeologs from the same polyploid individual dispersed in different well-supported clades in the GBSSI-1 gene tree (Figs. 4, 5, 6, Additional file 5), and some of these homoeologs were more closely related to homoeologs in other species than they were to each other, indicating that the homoeologs were donated by different ancestral taxa. As Wendel & Doyle [45] and Fortune [46] proposed, the sequences duplicated by polyploidy should be each other's closest clades in autopolyploids, whereas be distributed in different clades in allopolyploids. This mechanism has been clearly illustrated in the origin of allotetraploid rice by Ge et al. [47]. Therefore, our findings provided strong evidence for allopolyploid origin of most sect. Malachobatus species. This hypothesis indicated that two kinds of diploids hybridized to form most allotetraploid species.

Section Dalibardastrum
Section Dalibardastrum species are also allopolyploids because of the co-occurrence of GBSSI-1a and -1b homoeologs. Rubus tsangorum and R. amphidasys share some morphological similarities, such as weak, densely bristly, prostrate stems, simple leaves, and terminal or axillary inflorescences, subracemes with 5 to 15 flowers, whereas they were reported as a tetraploid and hexaploid, respectively [9]. Both of them were strongly nested within sect. Malachobatus group (Figs. 4, 5, 6, Additional file 5), which suggested that they share parental ancestors from sect. Malachobatus. In addition, no other homoeologs besides GBSSI-1a and -1b were found in the hexaploid. As a consequence, the hexaploid might be derived from tetraploid without further hybridization, but only through unreduced gamete of tetraploid (4x and 2x).

Section Cylactis
Members of sect. Cylactis formed a clearly polyphyletic group (Figs. 4, 5, 6, Additional file 5). They are creeping herbs with 3-or 5-foliolate compound leaves and several flowers in clusters or solitary [6]. This section contains various ploidy levels with diploid, tetraploid, and mixoploid [9]. Unfortunately, chromosome numbers of the examined taxa have never been reported. They all have two alleles of the GBSSI-1 gene, suggesting that hybridization events may have been involved in the origin. Specifically in sect. Cylactis, apomixis has also been found [48], hence various ploidy levels may be generated.
The role of diploid sect. Idaeobatus in the evolution within Rubus Diploid sect. Idaeobatus is one of the largest sections in Rubus, which has been resolved as a polyphyletic group with several different evolutionary routes [25]. Here, GBSSI-1-based phylogeny strongly support our previous results (Figs. 4, 5, 6, Additional file 5). This was congruent with its morphological diversity [5,6]. The majority of diploids with GBSSI-1a are composed of imparipinnately 3-9(− 11)-foliolate leaves and flowers in mainly corymbs, while subsect. Corchorifolii with GBSSI-1a consist of simple leaves in 1-flowered, and the remaining diploids with GBSSI-1b with imparipinnately 3-5(− 9)foliolate or simple leaves and flowers in subracemes. Particularly, R. pentagonus and R. peltatus with both GBSSI-1a and -1b is solitary flower with relative large diameter, with palmately 3-foliate and simple leaves, respectively. Furthermore, Idaeobatus species exhibit both sexual and asexual reproduction, and some species could freely hybridize with each other and produce fertile offspring [15][16][17]19]. This probably contribute to the formation of new species, among which polyploids are contained.
Based on the structure difference and phylogeny, GBSSI-1b originated from GBSSI-1a in some diploids by mutation, then polyploidization happened between species with GBSSI-1a and -1b. Therefore, to some extent, the early-divergent diploid species with GBSSI-1a or -1b emerged before polyploid formation in the evolution of Rubus. Then they probably experience their own distinct evolutionary history with various evolutionary rates [25]. During the process, various but common diploidization events might occur in these polyploids [24], hence the allotetraploid is the most frequent and stable form within Rubus [9].

Conclusions
This study presented phylogenies of genus Rubus based on low-copy nuclear GBSSI-1 gene with a comprehensive taxon sampling with 140 Rubus individuals representing 102 taxa in 17 (out of the total 24) subsections of 7 (total of 12) sections at different ploidy levels. Either GBSSI-1a or GBSSI-1b was detected in most diploids (except for R. pentagonus and R. peltatus with both two alleles) of sect. Idaeobatus and blackberry cultivar of sect. Rubus. Both homoeologs (1a and 1b) were observed in majority of polyploids from sect. Malachobatus, as well as in sects. Dalibardastrum, Chamaebatus, and Cylactis species. Phylogenetic trees showed two clades I and II, corresponding to GBSSI-1a, and GBSSI-1b/1c sequences. GBSSI-1 homoeologs from the same polyploid individual dispersed in different well-supported clades in the GBSSI-1 gene tree, and some of these homoeologs were more closely related to homoeologs in other species than they were to each other, indicating that the homoeologs were donated by different ancestral taxa. Based on the structure difference and phylogeny, GBSSI-1b originated from GBSSI-1a in some diploids by mutation, then polyploidization happened between species with GBSSI-1a and -1b. Two kinds of early-divergent ancestral diploids hybridized to form most extent allotetraploid species. This study provided new insights into allopolyploid origin and evolution from diploid to polyploid within genus Rubus at the molecular phylogenetic level, consistent with the taxonomic treatment by Yü et al. and Lu.
PCR products were verified in a 1% agarose gel, and the target products were separated and purified by UNIQ-10 Column MicroDNA Gel Extraction Kit (Sangon, Shanghai, China). For diploids, purified products were directly sequenced with BigDye 3.1 reagents on an ABI PRISM 3730 automatic sequencer (Applied Biosystems, Foster City, California, USA) from both directions. Special attention was paid to those sites with overlapping peaks in the chromatograms, because they may indicate intra-individual variation (polymorphisms) [53]. If an obviously overlapping signal was detected in both the forward and reverse chromatograms, the site was considered to be putatively polymorphic between alleles or copies. Those products with polymorphic sites were cloned using TA cloning after Atailing and ligated to pMD20-T vector with a kit (Takara, Dalian, China). More than three clones per sample were sequenced using M13 + , M13 − primers. For polyploids and R. peltatus, R. pentagonus, two or more amplification bands were cloned separately to obtain sequences. All the sequences have been submitted to the GenBank database with accession numbers of MF595603-MF595796 (Additional file 2). In addition, GBSSI-1 sequences of R. odoratus and other Rosaceae species were downloaded from GenBank (Additional file 2) [29,31,54].

Orthology identification
To identify the orthology of GBSSI-1 gene sequences, we conducted gene sequence similarities and performed phylogenetic analysis. According to Yu's [32] definition of ortholog, the identity at the amino acid sequence level was employed by alignment with the reference genome of diploid R. occidentalis L. [33]. Sequence orthology analysis was also confirmed by phylogenetic analysis using exon sequences of the two GBSSI copies published from Rosaceae [29] together with corresponding sequences generated in this study from diploid Rubus. Sequences from Pisum sativum [55] and Rhamnus catharticus [29] were used as outgroups.

Phylogenetic analyses
We used CLC Genomics Workbench v7.5 (CLC bio, Qiagen, Boston, MA) for sequence editing and assemblying. The boundaries between exons and introns were determined by aligning with GBSSI-1 sequence of R. odoratus [29] and preservations of the 'GT' and ' AG' at two ends of introns. Sequences were aligned with Muscle [56] and manually adjusted in the Molecular Evolutionary Genetics Analysis software (MEGA 7.0) [57] with gaps treated as missing data. Sequence variation within and between different homoeologs was calculated by MEGA 7.0.
The obtained sequences from all species were first blasted (BlastN) against the released Rubus occidentalis to confirm that they are derived from the same GBSSI-1 locus. For those species with two or more forms of amplicons, all cloned and sequenced sequences were included in multisequence alignment in MEGA (v7.0) to genotype the patterns. Since all sequences despite of various length exclusively hit the GBSSI-1 region, they were treated as different alleles from the same gene of GBSSI-1. Three major variants denoted as GBSSI-1a, GBSSI-1b, and GBSSI-1c were obtained and all analyzed in phylogeny reconstruction. If two or more homoeologs were detected in one species, all of them were included for this species. The best fitting substitution model for GBSSI-1 was determined with the Akaike Information Criterion (AIC) [58] using JModelTest v2.1.1 [59]. The maximum likelihood (ML) tree was conducted using IQ-TREE v1.4.2 [60,61]. One thousand regular bootstrap replicates were performed to obtain confidence values for the branches. Bayesian inference (BI) was performed with MrBayes v3.2.1 [62]. The Markov chains Monte Carlo (MCMC) algorithm was run for 6,000,000 generations with one cold and three heated chains, at sample frequency of 100. The first 1,500,000 generations were discarded as burn-in. Clade posterior probabilities (PP) were calculated from the combined sets of trees. All tree visualizations and annotations were achieved with iTOL v3 (Interactive Tree Of Life) online tool [63].
Phylogenetic networks can reflect the conflicting evolutionary signals and highlight reticulate evolution. Here, a network was constructed for the GBSSI-1 dataset with SplitsTree 4.14.2, using a NeighborNet diagram based on uncorrected-P distance matrix [64]. Bootstrap support was estimated with 1000 replicates.