Phenotypic variation of floral organs in flowering crabapples and its taxonomic significance

Background In angiosperms, phenotypic variation of floral organs is often considered as the traditional basis for the evolutionary relationship of different taxonomic groups above the species level. However, little is known about that at or below the species level. Here, we experimentally tested the phenotypic variation of Malus floral organs using combined methods of intraspecific uniformity test, interspecific distinctness analysis, principal component analysis, Pearson correlation analysis, and Q-type cluster analysis. The ancestor-inclined distribution characteristic analysis of Malus species and cultivars floral attributes was also carried out, so as to explore its taxonomic significance. Results 15/44 phenotypic traits (e.g., flower shape, flower type, flower diameter, ...) were highly consistent, distinguishable, and independent and could be used as the basis for Malus germplasm taxonomy. The studied 142 taxa were divided into two groups (A, B) and five sub-groups (A1, A2, B1, B2, B3), with significantly variable floral phenotypic attributes between groups and within sub-groups. Malus natural species were relatively clustered in the same section (series) while homologous cultivars showed evidence of ancestor-inclined distribution characteristics. However, no significant correlation between the evolutionary order of sections (Sect. Docyniopsis → Sect. Chloromeles → Sect. Sorbomalus → Sect. Eumalus) and group/sub-groups (B3 → B2 → B1 → A). Conclusions Phenotypic variation of floral organs could better explore the genetic relationship between Malus taxa. The findings improved our cognition of floral phenotypic variation taxonomic significance under the species level.

Flowers, as a unique and highly conserved morphological feature of angiosperms and are often considered as the traditional traits for complex phenotypic identification of different taxonomic groups, as well as evaluating the interplay between evolution and developmental bias [13][14][15]. Since the information obtained from floral morphological characterizations is consisted of large qualitative and quantitative traits datasets, multivariate analyses are considered to be the most suitable analytical tools for their evaluation [16,17]. Numerical taxonomy, as one of the multivariate analyses, accelerates the application of systematic taxonomy in plant evolution by quantitatively evaluating the morphological similarity between taxonomic groups [18]. However, the objectivity of the taxonomic results is greatly affected by the selected morphological traits. Recently, in ornamental plant germplasm numerical taxonomy studies, principal component analysis (PCA) is often used to reduce data dimensionality and can be supplemented with one-way analysis of variance (ANOVA) and correlation analysis (R-type cluster analysis: the classification of data objects into similarity groups) [19,20]. No scientific system has been formed for trait selection. Moreover, taxonomic units of some studies were solely established above the species level, and the conducted analyses were simply limited to the germplasm identification or clustering group division. This resulted in a failure to correctly locate the role of species, as to conduct in-depth discussion of genetic/evolutionary relationship analysis at or below the species level [21][22][23][24][25][26][27].
Based on floral organ phenotypes, we performed numerical taxonomy of 142 Malus taxa to address the following objectives: 1) establishing a scientific system for Malus taxonomic traits selection; 2) revealing the extent of Malus floral organs phenotypic diversity; and 3) clarifying the taxonomic significance (genetic or evolutionary relationships) of Malus floral variation.

Intraspecific uniformity test and interspecific distinctness analysis of floral phenotypic traits
Except for the pistil number per flower, the remaining 43 qualitative and quantitative traits had significant intraspecific uniformity ( MF ≥90%, C.v.≤10%), meeting the requirement for taxa classification (Fig. 1A, B). As for the interspecific distinctness of qualitative traits, it was found that only 15 floral traits (petal surface wrinkle, sepal deflexed, sepal apex shape, flower shape, petal relative position, sepal color, receptacle pubescence, receptacle color, peduncle pubescence, peduncle color, relative position of stigmas and anthers, style color, petal shape, petal outside color, and petal color at the balloon stage) showed a high degree of distinctness among taxa ( MF ≤ k 1 f ). The remaining 16 qualitative traits, all had no significant interspecific distinctness, and thus should not be further considered in the analyses. It is worth mentioning that although flower type (variable that reflects the number of petal whorls or petals) was less differentiated among the taxa, still it was retained in this analysis for its high recognition value (Fig. 1C). All 13 quantitative traits had a high degree of distinctness (C. v.≥15%), and could be used as taxonomic trait candidates (Fig. 1D).

Principal component analysis of floral phenotypic traits
Based on the 28 floral organs phenotypic traits (16 qualitative and 12 quantitative) selected by the above-mentioned intraspecific uniformity test and interspecific distinctness analysis, principal component analysis was then performed. Taking eigenvalue λ> 0.85 as the extraction threshold, a total of 10 principal components mainly consisted of 22 related floral traits were extracted with a cumulative variation contribution of 83.71%, reflecting most of the information of the original floral dataset ( Table 1).
As expected, the first principal component (PC1) was the most prominent and accounted for 20% of variation. Traits integrated by PC1 mainly reflected floral organs color (ordered of their importance: petal color at the balloon stage, peduncle color, sepal color, petal outside color, receptacle color, and style color). PC2 interpreted 18% of variation and was influenced by flower diameter, petal length, and petal width, which mainly reflected the flower size. PC3 contributed 12% of the variance mainly through petal length / petal width, petal shape, and petal relative position. PC4 had 9% of the variance, and partially affected petal number per flower and flower type. PC5 explained 7% of variation in the pubescence of floral organs (receptacle pubescence and peduncle pubescence). PC6 (5%) integrated two traits related to the sepal morphology (sepal length / sepal width and sepal apex shape). While the remaining components (PC7: 4%, PC8: 4%, PC9: 3%, and PC10: 3%), were affected by peduncle length, petal surface wrinkle, relative position of stigmas and anthers, and flower shape, respectively.

Pearson correlation analysis of Malus floral phenotypic traits
Pearson correlation analysis (taking r > 0.80 as the critical value) was performed on 22 floral traits selected by the principal component analysis described above (Fig. 2). It was found that most of the traits were independent of each other, and only few were completely or closely related (e.g., flower diameter and petal length (r = 0.98), flower diameter and petal width (r = 0.82), petal length and petal width (r = 0.81), petal number per flower and flower type (r = 0.87), petal outside color and petal color at the balloon stage (r = 0.94), peduncle color and sepal color (r = 0.83), and receptacle pubescence and peduncle pubescence (r = 0.83)). For these highly relevant traits, we opted to choose either one of the two traits for taxa classification. It should be pointed out that although lower correlation coefficient existed between petal shape and petal length / petal width (r = 0.61), these two traits were logically related in principle and also in this case, either one could be used.

Cluster analysis of Malus taxa based on important phenotypic traits of floral organs
Finally, 15 important phenotypic traits of floral organs were selected, that are the peduncle color, petal outside color, receptacle color, style color, flower diameter, petal shape, petal relative position, flower type, peduncle pubescence, sepal length / sepal width, sepal apex shape, peduncle length, petal surface wrinkle, relative position of stigmas and anthers, and flower shape. Figure 3 shows the cluster dendrogram of the studied 142 taxa using flexible-beta method based on these 15 floral traits. At Euclidean distances of 21.31 and 11.63, all taxa could be divided into two groups (A, B) and five sub-groups (A 1 , A 2 , B 1 , B 2 , and B 3 ), and the characters of floral organs varied significantly between groups and within sub-groups.
Group A: included a total of 64 taxa (45%) characterized by red flowers and consisted of two sub-groups (A 1 (50, 35%) and A 2 (14, 10%)). Taxa in sub-group A 1 are attractive for their single or semi-double flowers (4.15 ± 0.64 cm in diameter) that are shallow cupshaped or deep cup-shaped. The petals are red-to dark red-purple, as well as the color of receptacles and Fig. 1 Intraspecific uniformity test and interspecific distinctness analysis of Malus floral phenotypic traits. A The intraspecific uniformity of floral organs qualitative traits using 90% as the criteria. If MF≥90%, then the qualitative trait has met the uniformity requirements. B The intraspecific uniformity of floral organs quantitative traits using 10% as the criteria. If C.v.≤10%, then the quantitative trait has met the uniformity requirements. C The interspecific distinctness of floral organs qualitative traits using k 1 f as the reference. k is a coefficient that depends on the number of ranks (f) of each trait appeared in all taxa. The specific assignment is indicated in the C. If MF ≤ k 1 f , the qualitative trait is more differentiated among all taxa. D The interspecific distinctness of floral organs quantitative traits using 15% as the criteria. If C. v.≥15%, it was considered that the differentiation degree of this trait is high among all the taxa. For more accurate expression, the one-way ANOVA (Tukey's method) was performed. * represents that the difference of quantitative traits reached a significant level between different Malus taxa (P < 0.05) peduncles. The relative position of petals is touching or overlapping. Sepal shape of flowers in this sub-group is lanceolate, and their apexes are acuminate in majority. Peduncles are medium in length (2.75 ± 0.75 cm), mostly with no or sparse pubescence. Relative to anthers, stigmas are above or the same height. In subgroup A 2 , the flowers are large (4.87 ± 0.75 cm in diameter), charming with double (15-27 petals), light pink to deep pink, wrinkled petals. The flower shape is flat or deep cup-shaped, and the petals is elliptic. Relative position of petals is overlapping. The sepals are triangular, and their apexes are mostly acute. Peduncles are long (3.16 ± 0.52 cm). In this sub-group, the relative position of stigmas and anthers varied.
Group B: included 78 taxa (55%) and are distinguished by their single, pinkish white or white flowers, gradually changing from pink to rose-red or pure white buds. It contained three sub-groups (B 1 (21, 15%), B 2 (36, 25%) and B 3 (21, 15%). The degree of petal color rhythm (petal color changes during the different flowering stages) of the three sub-groups was B 1 > B 3 > B 2 . In sub-group B 1 , taxa are unique for their small (3.44 ± 0.88 cm) flowers that are flat or shallow cup-shaped. Petal shapes are mostly round to ovate, Table 1 Eigenvalue, contribution rate and cumulative contribution rate of each principal component The cumulative contribution rate means the representativeness of extracted factors for all variables. Generally, 80% is regarded as the critical value. And the larger the value, the stronger the representativeness. The meaning of each principal component is determined by the absolute value of load coefficient. Variables with an absolute value greater than 0.7 can be considered as representative ones of the principal component

Ancestor-inclined distribution characteristic analysis of Malus taxa
In this study, ancestor-inclined distribution characteristics of Malus taxa were analyzed from two aspects: species and cultivars. In accordance with the Malus species taxonomy system proposed by Rehder [28], Yu et al. [29], and Li et al. [30], the 31 species involved in our study belonged to seven sections ( (Fig. 4). The distribution of Malus species belonged to the same section (series) was relatively concentrated. The ancestor-inclined distribution probability reached up to 87% in the two groups and 61% in five sub-groups. From the literature, 33 out of the 111 tested Malus cultivars could be completely or partially traced back to their parental taxa (11 species with the floral organ phenotypic data involved in this study) [8][9][10]12]. Statistical analysis indicated that the studied 33 cultivars also showed obvious ancestor-inclined distribution characteristics in two groups (A, B) and five sub-groups (A 1 , A 2 , B 1 , B 2 , B 3 ), with inclined probability reaching up to 73 and 64%, respectively (Table 2). Based on the above distribution characteristics of Malus species in the two groups (A, B) and five subgroups (A 1 , A 2 , B 1 , B 2 , B 3 ), it was inferred that the evolutionary order of the four groups/sub-groups (A, B 1 , B 2 , and B 3 ) in this study might be: B 3 → B 2 → B 1 → A. According to the order from original to evolved, these four groups/sub-groups and the four sections of Malus species in classic taxonomy system were assigned by values: B 3 (1) → B 2 (2) → B 1 (3) → A (4), Sect. Docyniopsis (1) → Sect. Chloromeles (2) → Sect. Sorbomalus (3) → Sect. Eumalus (4). It was also found that there was no significant correlation between these two sets of evolutionary data (R 2 = 0.068, P = 0.156).

Establishment of a screening system for Malus floral taxonomic traits
Typical angiosperm flowers are composed of sterile sepals and petals and fertile stamens and carpels [31][32][33]. The significant differences in the number, type, size, shape, color, arrangement, and smell of each part determine its multi-dimensional and complex characteristics [14,34,35]. Uni-dimensional variables are usually difficult to describe in its entirety, while the specificity of different groups could be easily masked when several variables were simultaneously considered. Currently, dimensionality reduction of traits is often performed by principal component analysis (PCA) or correlation analysis (R-type cluster analysis), or by artificial screening based on intuitive experience [2,36,37]. No systematic and scientific system has been formed for trait screening. In this study, for meeting the requirements of Malus taxonomy and aesthetic, a theoretical and technical system (intraspecific uniformity test → interspecific distinctness analysis and one-way ANOVA → principal component analysis → Pearson correlation analysis) was established in accordance with the order of uniformity → distinctness → independence. Twelve qualitative (petal outside color, peduncle color, receptacle color, style color, petal shape, petal relative position, flower type, peduncle pubescence, sepal apex shape, flower shape, petal surface wrinkle, and relative position of the stigmas and anthers) and three quantitative traits (flower diameter, peduncle length, and sepal length/sepal width) were finally extracted, and these traits could reflect most of the information present in the original Malus floral dataset. This theoretical and technical trait screening system also has important reference for the extraction of phenotypic characteristics in taxonomy of other ornamental plant resources.

Taxonomic significance of phenotypic variation of Malus floral organs
Floral variation is a positive response of plants to the selection pressure [38][39][40][41]. Studies on floral variation not only contributed to our understanding of species evolution [42][43][44][45], but also revealed the genetic rules and variation degrees of populations / groups [46][47][48][49][50], which in turn provided a theoretical basis for the protection of species. In this study, through the cluster analysis of the 142 Malus taxa based on their floral organs phenotypic traits, we found that the distribution of Malus species belonged to the same section (series) was relatively concentrated, with ancestor-inclined distribution  Table 2 Parents traceability and identification of ancestor-inclined distribution characteristics of Malus cultivars probability reaching up to 87% in two groups (A, B) and 61% in five sub-groups (A 1 , A 2 , B 1 , B 2 , B 3 ). Among the 33 cultivars that could be traced to all or part of their parents, evident ancestor-inclined distribution characteristics were also observed in the above mentioned groups / sub-groups (the ancestor-inclined distribution probability reaching up to 73 and 64%, respectively). Our results agreed with the classical Malus taxonomy system established by Rehder [28], indicating that the phenotypic variation of floral organs could be well applied to the genetic relationship exploration between Malus taxa. However, by comparing the evolutionary order of Malus sections (Sect. Docyniopsis → Sect. Chloromeles → Sect. Sorbomalus → Sect. Eumalus) proposed by Langenfel'D [51] based on classic phenotypic traits with that of Malus groups / sub-groups (B 3 → B 2 → B 1 → A) inferred from the cluster dendrogram of the 142 Malus taxa, it was found that there was no significant correlation between them (R 2 = 0.068, P = 0.156). This indicated that floral variation is unable to reveal the evolutionary relationships of Malus species. In fact, the variation or change in different floral organs usually occurs at different taxonomic levels (family, genus, species, ranks below species) [52]. Size, color, smell, and the taste of floral organs are often quite different in species or lower levels [45,[53][54][55]. Jin [56] concluded that in the taxonomy of subgenus Tsutsusz (Rhododendron), the tree habit, shape and the size of corollas, could be used to distinguish grades above the species level. The pubescence type of young twigs, number of stamens, size of calyx lobes, pubescence condition of filament or corolla, etc. could be applied in the delimitation of species (taxa below species). In some cases, the pubescence condition of style could be limitedly adopted, while some important traits such as whether stamens are longer than pistils and whether stamens are equal in length should be better avoided. For exploring the evolutionary relationship of Malus species based on the phenotypes of floral organs, the screening of taxonomic traits varied at the species level is therefore playing the key role.

Conclusions
This study innovatively established a scientific system (intraspecific uniformity test → interspecific distinctness analysis and one-way ANOVA → principal component analysis → Pearson correlation analysis) for Malus taxonomic traits screening in accordance with the order of uniformity → distinctness → independence. This scientific system also has important reference for the extraction of phenotypic characteristics in taxonomy of other ornamental plant resources. Based on numerical taxonomy, phenotypic variation of Malus floral organs was then clearly clarified, as well as its taxonomic significance: Phenotypic variation of floral organs could better explore the genetic relationship between Malus taxa. These findings improved our cognition of floral phenotypic variation taxonomic significance under the species level.

Plant materials
A total of 142 Malus taxa (including 31 species and 111 cultivars) were collected from the National Repository of Malus spp. Germplasm (Yangzhou City, Jiangsu Province, China) ( Table 3). All trees were 7 to 10 years old and entered the full blooming phase. Each cultivar was represented by 30 individuals planted with 3 m between rows and 2 m within rows. According to the requirements of randomized block experiment design, 10 plants were taken as one block, and three blocks were set for each taxon.

Trait measurement, description, and coding
For each cultivar, 10 plants were randomly selected and three consistent, typical and standard full-bloom flowers for each plant were collected, yielding 30 samples in total.
All flowers were gathered from the middle of the tree and the branch exposed to the sun. Then, they were placed in a cooler and taken to the laboratory for immediate measurement. Phenotypic traits evaluation was carried out as recommended by the guidelines for Malus distinctness, uniformity and stability test [57] and additional traits were specifically selected for their identification value. All together 44 phenotypic traits of Malus floral organs were investigated in this study, including 31 qualitative traits (dimorphic traits and polymorphic traits that can only be observed and present discontinuous variation) and 13 quantitative traits (traits that can be differentiated by quantity and present continuous variation) [58] (Table 4).
For four consecutive years (2017 to 2020: end of March to mid-April or late April), the 44 traits were repeatedly assessed for correction. Qualitative traits were directly observed in the field, and the final values of quantitative traits were calculated as the mean value of 30 replicates. Hierarchical number coding system was applied for the qualitative traits following the order from ancestral to evolutionary as far as possible. Consecutively arranged non-negative integers 0, 1, 2, 3, ..., were taken for expression. The dimorphic traits with an evolutionary relationship that was difficult to determine were generally coded as 1 (Yes) and 0 (No) [59,60]. No coding was applied for the quantitative traits and the mean values of the 30 replicates were directly used for further analysis ( Table 4).

Screening of taxonomic traits
To obtain the traits that are highly consistent, distinguishable, and independent, a scientific system for Malus taxonomic traits screening was established (Fig. 5).

Intraspecific uniformity test
The intraspecific uniformity test for qualitative traits is expressed by the mean mode frequency ( MF ), and for quantitative traits is expressed by the mean coefficient of variation ( C.v. ). If MF ≥90% or C.v.≤10%, then the qualitative (quantitative) trait has met the uniformity requirements.
where n denotes the number of taxa; m denotes the number of repetitions; M 0 , S i and X i denotes the rank with highest frequency of occurrence, standard deviation of the observed values, and the mean observed values of each trait in each taxa's m repetitions, respectively.

Interspecific distinctness analysis
The interspecific distinctness analysis of qualitative traits is simply measured by the closeness of the mode frequency (MF) and theoretical frequency ( 1 f ) of the rank that each trait shows in all taxa. If MF ≤ k 1 f , the qualitative trait is more differentiated among all taxa. For quantitative traits, it is expressed by the coefficient of variation (C. v.) of the mean value of each trait in all taxa. If C. v.≥15%, it was considered that the differentiation degree of this trait is high among all the taxa. Oneway ANOVA (Tukey's method) should be performed on quantitative traits as well.
where, M 0 ′ denotes the rank of each trait that appears the most in all taxa; k is a coefficient that depends on the number of ranks (f) of each trait appeared in all taxa; S ′ and X ′ , respectively, denote the standard deviation and the average of observed mean values of each trait in all taxa.

Principal component analysis and Pearson correlation analysis
On the premise of higher uniformity and distinctness, principal component analysis (PCA) and Pearson correlation analysis were used to further reduce the dimensionality of the selected traits. In order to eliminate the impact of different dimensions on data analysis, the standard deviation (STD) normalization process was performed in advance on the original numerical matrix; that is, the orthonormal process.

Cluster analysis and ancestor-inclined distribution characteristics of Malus taxa
Based on the extracted taxonomic traits that could reflect the phenotypes of floral organs, the 142 taxa were quantitatively classified using flexible average method so as to reveal the phenotypic diversity of Malus floral organs. And meanwhile, the ancestor-inclined distribution characteristics was analyzed from two aspects: species and cultivars, aiming at clarifying Malus floral variation taxonomic significance.

Data processing
Origin 9.0, DPS 9.5, R 3.6.1, and Adobe Illustrator CS5 were used for data processing and graph plotting.