Evidence for extensive hybridisation and past introgression events in feather grasses using genome-wide SNP genotyping

Background The proper identification of feather grasses in nature is often limited due to phenotypic variability and high morphological similarity between many species. Among plausible factors influencing this issue are hybridisation and introgression recently detected in the genus. Nonetheless, to date, only a bounded set of taxa have been investigated using integrative taxonomy combining morphological and molecular data. Here, we report the first large-scale study on five feather grass species across several hybrid zones in Russia and Central Asia. In total, 302 specimens were sampled in the field and classified based on the current descriptions of these taxa. They were then genotyped with high density genome-wide markers and measured based on a set of morphological characters to delimitate species and assess levels of hybridisation and introgression. Moreover, we tested species for past introgression and estimated divergence times between them. Results Our findings demonstrated that 250 specimens represent five distinct species: S. baicalensis, S. capillata, S. glareosa, S. grandis and S. krylovii. The remaining 52 individuals provided evidence for extensive hybridisation between S. capillata and S. baicalensis, S. capillata and S. krylovii, S. baicalensis and S. krylovii, as well as to a lesser extent between S. grandis and S. krylovii, S. grandis and S. baicalensis. We detected past reticulation events between S. baicalensis, S. krylovii, S. grandis and inferred that diversification within species S. capillata, S. baicalensis, S. krylovii and S. grandis started ca. 130–96 kya. In addition, the assessment of genetic population structure revealed signs of contemporary gene flow between populations across species from the section Leiostipa, despite significant geographical distances between some of them. Lastly, we concluded that only 5 out of 52 hybrid taxa were properly identified solely based on morphology. Conclusions Our results support the hypothesis that hybridisation is an important mechanism driving evolution in Stipa. As an outcome, this phenomenon complicates identification of hybrid taxa in the field using morphological characters alone. Thus, integrative taxonomy seems to be the only reliable way to properly resolve the phylogenetic issue of Stipa. Moreover, we believe that feather grasses may be a suitable genus to study hybridisation and introgression events in nature. Supplementary Information The online version contains supplementary material available at 10.1186/s12870-021-03287-w.


Background
The proper delimitation of species plays an important role in taxonomy as well as in studies related to speciation, biogeography and ecology, leading to effective conservation and management of biodiversity. In the last two decades, traditional approaches relying mostly on morphological features have been supplemented by molecular data that boosted the discovery of new species. Although one estimate of the number of plant species is around 298,000 [1], recently it has been shown that the plant kingdom is comprised of at least 374,000 taxa [2]. Nowadays, many systematicists emphasise the need to apply multidisciplinary data, so-called integrative approaches or integrative taxonomy [3]. For instance, information from a variety of disciplines, e.g., morphology, biochemistry, cytogenetics and 'omics studies, increases the reliability and validity in identifying taxa [4][5][6].
To date, among the molecular methods, DNA barcoding has been a widely utilised tool to identify taxa at different levels aiming not only to facilitate revisionary taxonomy, but also to broaden our understanding of molecular phylogenetics and population-level variations [7][8][9]. Among standard plant markers, chloroplast regions rbcL and matK and the nuclear internal transcribed spacer (ITS) locus have been proposed for DNA barcoding of land plants [10,11]. Additionally, several non-coding plastid regions have been suggested as supplementary loci where further resolution is required [10].
Stipa is one of the largest genera in the grass subfamily Pooideae (Poaceae), currently comprising nearly 150 cool-season species with C 3 photosynthesis common in Eurasia and North Africa [12,13]. Based on ITS and the plastid trnK region, the genus has been proven to be monophyletic [14,15]. Nonetheless, the traditional barcodes are not able to validate the sectional subdivision in Stipa proposed, e.g., by Tzvelev [16,17] or Freitag [18]. Recently, the nuclear intergenic spacer (IGS) region [19] and marker sets derived from whole chloroplast genomes of 19 species [20] were proposed for phylogenetic studies of feather grasses. Although these markers are more phylogenetically informative in comparison to the previously used barcodes, they are still unable to discriminate all taxa, causing unresolved nodes in the reconstructed trees [19,20].
One of the plausible explanations for this unresolved branching in Stipa is that many feather grasses are of hybrid origin [16,[21][22][23]. Presently, hybridisation is considered to be widespread among at least 25% of plant taxa, mostly the youngest species [24]. This phenomenon is often accompanied by introgression via repeated backcrossing to one or both parental species that can lead to diversification and speciation [25,26]. In grasses, hybrid speciation has been explicitly studied at the genome level, e.g., in Triticum [27] and Brachypodium [28]. Nonetheless, previously many hybrids and introgressed individuals were characterised exclusively based on morphology that limited their successful identification in nature [29,30]. In addition, hybridisation may lead new organisms not only to intermediate traits of parental species but also to extreme, or transgressive, phenotypes [31] that complicate their proper taxonomic treatment. In feather grasses, the hypothesis of hybrid origin of some species was initially tested using multivariate morphological analyses [23,32] and, more recently, applying molecular markers among genetically closely related species in the Stipa heptapotamica complex [33] as well as within two genetically distant species, S. krylovii and S. bungeana [34]. Furthermore, due to the usage of integrative taxonomic approaches, it was shown that some Stipa taxa, previously assigned to S. richteriana and S. grandis, appeared to be cryptic species [33,35]. Thus, taking into account that ca. 30% of feather grasses could be of hybrid origin [13] and that cryptic species are present, integrative taxonomy seems to be the only reliable way to properly resolve the phylogenetic issue of Stipa.
Importantly, the advent of next generation sequencing technologies, primarily Illumina, and the continuously reducing sequencing cost have facilitated the implementation of genomic data in studies of non-model organisms. For instance, high-throughput techniques based on restriction enzymes, e.g., RADseq [36] and genotyping-by-sequencing [37], which have been foremost used in agricultural species [38], are currently widely applied in phylogenetics and studies related to hybridisation in many wild plant genera with little or no previous genomic information [39][40][41]. Recently, a promising result has also been demonstrated in Stipa where the usage of the DArTseq technique resulted in an increased number of markers that was several 100-fold higher than in the previous genomic studies [34].
During field studies on steppe communities, we observed high morphological variability in populations of genetically related plants. In particular, we noticed that some specimens of feather grasses representing S. capillata, S. grandis and S. krylovii seemingly share mixed morphological characteristics between these species, while taxon S. baicalensis is frequently observed within populations of the aforesaid taxa and resembles an intermediate phenotype between S. grandis and S. krylovii or S. capillata and S. krylovii. The above-mentioned species have wide distribution ranges ( Fig. 1 and Supplementary  Table S1). Specifically, S. capillata is the most widespread taxon within the genus, grows on the dry grasslands and is common in Siberia, Western Asia and is also present in a limited number of refugia in Europe and North Africa. Two species, S. baicalensis and S. grandis, share similar ranges in southwestern Siberia, in the Baikal region, in the south part of Zabaykalsky Krai, in Mongolia and in northeastern China; S. baicalensis is also present in the south part of the Russian Far East, whereas S. grandis occurs in Central China and Tibet. Finally, S. krylovii grows in Siberia, Mongolia, China, Northern Nepal, Southern Tajikistan, Eastern Kazakhstan and Eastern Kyrgyzstan [13,42].
We hypothesise that the observed variability in S. baicalensis, S. capillata, S. grandis and S. krylovii is due to the presence of interspecific hybrids and that may lead to species misidentification based on the current descriptions of these taxa [16,[43][44][45]. Thus, in the current study, we aim to use an integrative taxonomy approach to (1) delimitate species and test if S. baicalensis is a hybrid between S. grandis × S. krylovii or S. capillata × S. krylovii; (2) assess levels of hybridisation and introgression (if present) between the examined taxa and populations at the molecular level; (3) estimate divergence times between the studied taxa; (4) obtain insight  Table S1 into the extent of hybridisation between these species at the morphological level and (5) assess whether morphological characters can be used to identify hybrids in the field.

DNA-based species delimitation
The DArTseq technique was applied to obtain a total of 8660 SNP markers to infer the genetic structure of 302 Stipa specimens. Firstly, analyses of genetic clustering with an unweighted pair group method using arithmetic average (UPGMA) and fastSTRU CTU RE revealed five major clades corresponding to morphospecies S. glareosa, S. capillata, S. grandis, S. krylovii and S. baicalensis (Fig. 2). According to the fastSTRU CTU RE analysis, the first and the fourth clades consisted exclusively of pure specimens of S. glareosa and S. krylovii, respectively. The remaining clades beside pure specimens of S. capillata, S. grandis and S. baicalensis included hybrid individuals. In particular, the second clade comprised pure specimens of S. capillata and the admixed individuals S. capillata × S. baicalensis and S. capillata × S. krylovii. The third cluster consisted of pure specimens of S. grandis and hybrids S. grandis × S. krylovii and S. grandis × S. baicalensis. The fifth clade included pure specimens of S. baicalensis and the admixed individuals S. baicalensis × S. krylovii.
In total, fastSTRU CTU RE inferred 52 individuals with an admixture of two genetic clusters including an exception of S. capillata × S. baicalensis (0454631) that had a minor proportion (0.02) of a third cluster representing S. krylovii. Among pure individuals only one specimen of S. krylovii (0454646) had an insignificant admixed proportion (0.03) of S. grandis. Noteworthy, the vast majority of admixed individuals (49 or 94%) had a proportion of membership in the range from 0.46 to 0.54 indicating F1 hybrids or later generations of hybrids that have no backcrossing to the parental species. The remaining admixed samples represented: (1) one individual (0477009) was formed by a 0.78-0.22 admixture between S. baicalensis and S. krylovii evidencing a first-generation backcross (F1× S. baicalensis), (2) one individual (000948) was shared between S. grandis (0.88) and S. krylovii (0.12) indicating a second-generation backcross (first-generation backcross × S. grandis), (3) one individual (000956) was admixed between S. grandis (0.64) and S. krylovii (0.36) that may suggest a first-generation backcross (F1× S. grandis) or a more complex backcross to S. grandis via different intermediate combinations.
A consistent result was also found with a principal coordinates analysis (PCoA). The first three axes explained 29.6, 19.9 and 19.2% of the total genetic divergence within the studied taxa, respectively. According to the PCoA, pure individuals were grouped into five

Hybrid generation identification
The NewHybrids analysis revealed a more complex pattern of hybridisation than it was inferred with fastSTRU CTU RE. Among 16 admixed specimens of S. baicalensis × S. krylovii, previously assigned as F1, six individuals with posterior probabilities (PP) in a range of 0.84 and 1.00 were identified being F2 (F1× F1) hybrids suggesting that F1 hybrids are able to reproduce further. One specimen, S. baicalensis × S. krylovii (0477009), was proven to be a first-generation backcross (F1× S. baicalensis) having PP of 0.81. In addition, five mixed individuals had PP between two categories (F1 and F2 hybrids) in a range of 0.22-0.78 suggesting uncertainty in the assignment ( Fig. 4a and Supplementary Table S2). These mixed assignment individuals may represent a more advanced hybrid generation than can be detected by NewHybrids. Within 14 hybrids of S. capillata × S. krylovii, previously assigned to F1 hybrids, we detected six individuals of F1 (PP of 0.87-1.00) and one specimen was identified as an F2 hybrid with PP of 0.83. The remaining seven individuals had mixed assignments in a range of 0.39-0.73 for the F1 class and 0.27-0.61 for the F2 class, respectively (Fig. 4b). The analysis also demonstrated that 13 out 14 hybrids of S. capillata × S. baicalensis were F1 (PP of 0.86-1.00) and one individual remained unclassified sharing PP (0.54 and 0.46) between F1 and F2 categories (Fig. 4c). Among S. grandis × S. krylovii only one F1 hybrid was detected (PP of 0.91), two individuals were assigned to the F2 class (PP of 0.83 and 1.00) and two specimens were classified as first generation backcrosses (F1× S. grandis) having PP of 0.88 and 0.99. Although the specimen 000948 was inferred to be an F1 backcross, it is more plausible that it represents rather a second-generation backcross established by fastSTRU CTU RE due to NewHybrids being unable to detect more advanced backcrosses than F1. Additionally, one individual had mixed assignments between the F1 (PP of 0.27) and F2 (PP of 0.72) classes (Fig. 4d). Finally, the only hybrid detected for S. grandis × S. baicalensis appeared to be a first generation hybrid with PP of 1.00 (Fig. 4e).

Testing for introgression
A total of 6894 SNP markers were used to test for reticulation events between the studied taxa. Due to five different admixed combinations detected by fastSTRU CTU RE and NewHybrids, we tested all possible four-species combinations regardless of their phylogenetic positions (Fig. 2). The results of the f 4 statistic suggest no gene flow between S. capillata and the remaining species because of negligible deviations from the expected 50/50 ratio of BABA/ABBA patterns and the lowest Z-scores of any tests (Table 1). This finding disagrees with the presence of contemporary hybrids S. capillata × S. krylovii and S. capillata × S. baicalensis inferred with fastSTRU CTU RE and NewHybrids. Nonetheless, it can be explained by the fact that all identified admixed individuals were excluded from this analysis. On the other hand, introgression events were suggested between S. grandis and  S. baicalensis (combinations 5 and 11), S. grandis and S. krylovii (combinations 4, 6, 7 and 8), S. krylovii and S. baicalensis (combinations 9 and 12). Additionally, when S. grandis, S. krylovii and S. baicalensis were analysed together (combinations 4, 7 and 10) the ratio of BABA/ ABBA patterns were either almost equal (combination 10) or relatively lower (combinations 4 and 7) compared to the other tests that indicated gene flow among these species. One potential explanation is that these species are involved in introgression at the same rate, which theoretically cancel out each other.  Table S3) regardless of the fact that the distance between them is more than 1000 km. Additionally, the second most likely K according to STRU CTU RE was K = 3 indicating that this number of clusters is also a likely option. Relatively strong differentiation was also shown for populations of S. capillata with an exception of populations 5 and 6 with a moderate Fst value of 0.13, suggesting potential gene flow. According to the PCoA, almost all individuals were grouped together excluding population 3 and a few specimens of populations 2 and 5. Nonetheless, the first two axes of PCoA explained only 18% of the total genetic divergence within the specimens. On the other hand, STRU CTU RE supported K = 4 as the best fitting model, while two and nine clusters were also among the probable options ( Fig. 5b and Supplementary Fig. S2). Within S. grandis, evidence for weak differentiation was shown for geographically close populations 5 and 6 (Fst of 0.10) as well as for 7 and 8 (Fst of 0.08), while the first two axes of the PCoA explained 23.3% of the variation and revealed the close genetic relationship between geographically distant populations 1 and 2 (Fst of 0.42). In

Divergence-time estimation
The SNAPP phylogeny based on 2717 SNP markers among pure individuals revealed largely the same topology as the UPGMA dendrogram, except for the pair S. grandis and S. krylovii that were grouped together, while S. baicalensis was a sister taxon ( Fig. 6

Morpho-molecular analysis
As non-parametric Spearman correlation coefficients did not demonstrate any strong correlation (> 0.90) between the measured variables ( Supplementary Fig. S4), we retained all morphological characters (Table 3) for a factor analysis of mixed data (FAMD) and analyses of notch plots. Subsequently, to investigate if the observed phenotypes are congruent with molecular data, we supplemented the result of the FAMD analysis with the genetic clusters inferred by fastSTRU CTU RE for the best K = 5. As a result, the FAMD revealed five markedly differentiated groups (Fig. 7) of operational taxonomic units (OTUs) in accordance with the detected clusters using the SNP data (Figs. 2, 3 and 4). The first four dimensions explained 31.0, 18.9, 10.9 and 8.1% of the total variability, respectively. The first three axes are mainly composed by the contribution of 11 quantitative and four qualitative variables (Supplementary Table S4). Importantly, due to having the genetic clusters assigned by fastSTRU CTU RE, the two-dimensional plot revealed the slight overlapping of OTUs belonging to the pure species S. baicalensis and S. krylovii, whereas OTUs representing the admixed specimens S. baicalensis × S. krylovii were present in both clouds of the parental taxa (Fig. 7a). Furthermore, hybrids S. capillata × S. baicalensis were also present in both clouds of the pure species. On the other hand, all the admixed individuals S. capillata × S. krylovii were grouped together with only one parental species, S. capillata. Interestingly, hybrids between S. grandis and S. krylovii were mainly grouped together with OTUs of S. baicalensis with an exception of the first-and the second-generation backcrosses. The only hybrid detected between S. grandis and S. baicalensis was clustered together with the pure individuals of the former taxon. A more clear dispersal of the pure species can be seen in the three-dimensional plot, where differences between the studied species are explained by the third principal axis ( Fig. 7b; an interactive version of the plot available at https:// plot. ly/ ~eugen ebaya hmetov/ 42/). Additionally, all combinations of two axes plots for the four dimensions are present in Supplementary Fig. S5.
The result of FAMD and notch plots of the quantitative variables demonstrated that the studied species can be differentiated morphologically mainly based on the length of the lower segment of the awn (Col1L), the distance from the end of the dorsal line of hairs to the top of the lemma (DDL), the length of the anthecium (AL), the length of the middle segment of the awn (Col2L), the length of the callus (CL), the length of ligules of the internal vegetative shoots (LigIV), the length of ligules of the middle cauline leaves (LigC) and the length of hairs on the top of the lemma (LHTA). In addition, the length of hairs on the lower segment of the awn (HLCol1), the length of hairs on the middle segment of the awn (HLCol2) and the length of the callus base (CBL) can aid to distinguish S. glareosa from the remaining taxa ( Supplementary Fig. S6). For instance, the notch plot of Col1L showed significant differences between means and strong evidence of differing medians within the pure species except the pair S. baicalensis and S. capillata, while the hybrid individuals had mostly intermediate positions between the parental species except specimens of S. capillata × S. baicalensis that were significantly different from the samples of the pure taxa. The differences across all quantitative variables between individuals of the pure species and the hybrids can be better evaluated in the interactive box plots presented in Supplementary File 1.
Among the qualitative variables, the main contribution to the axes of the FAMD had the abaxial surface of vegetative leaves (AbSVL), the type of hairs on the top of the anthecium (HTTA), the type of the awn geniculation (AG) and the presence of hairs below nodes (PHBN) (Supplementary Table S4 Fig. S7).
Thus, based on the results of the molecular analyses and the FAMD combining both phenotypic and SNP data, we were able to differentiate the pure species and the hybrid individuals at the morphological level. We established that using the traditional identification keys [16,[43][44][45] 71 out of 302 specimens had been misidentified, mostly due to their hybrid nature (Supplementary  Table S5). In particular, 47 samples previously identified as pure species appeared to be hybrids. The remaining 24 specimens earlier were classified either as hybrids (15 samples) or misleadingly assigned to S. baicalensis (9 samples). Interestingly, in the latter case, all individuals were previously reported from the northeastern part of Kazakhstan [46].
The remaining doubtful samples morphologically were either misleadingly assigned as the pure species or as taxa of hybrid nature. For instance, 4 specimens of S. capillata were hybrids between S. capillata and S. krylovii, while specimens of S. grandis (2 samples) and S. krylovii (2 samples) were shown to be genetically admixed as S. grandis × S. krylovii. In opposite, 9 individuals identified as S. capillata × S. krylovii appeared to be the pure species S. capillata. Interestingly, only 5 out of 52 hybrid taxa were properly identified based on morphology including S. baicalensis × S. krylovii (3 samples) and S. grandis × S. krylovii (2 samples). The only one taxon that had not any questionable individuals was S. glareosa representing the section Smirnovia.

Discussion
The current understanding of taxonomy and species limits in Stipa is still largely based on morphological characters. Our study highlights the necessity of using molecular tools to properly identify taxa and detect processes underlying speciation. This is of particular relevance in hybrid zones where ongoing hybridisation and introgression may lead admixed individuals to phenotypes similar to one of the parental species, complicating identification of such taxa using morphological characters alone. Furthermore, integrative studies suggest that apparently intermediate phenotypes between two species are not necessarily hybrids [47]. On the other hand, as indicated in the present work, although some interspecific hybrids have intermediate characters between parental taxa, their phenotypic traits can also overlap with non-parental species leading to misidentification. Given the circumstances, here we utilised an integrative approach combining genome-wide data and morphology to delimitate species and ascertain the extent of hybridisation in feather grasses.
The study clearly illustrates that a molecular-based analysis, e.g., such as fastSTRU CTU RE, combined with a factor analysis of mixed data, utilising both quantitative and qualitative variables, can largely resolve the problem with species identification in the face of ongoing hybridisation and introgression. Primarily, such an approach allows to visualise and easily trace if the observed phenotypes are congruent with molecular data. Besides that, this approach may aid in selecting a set of traits that can be useful for species identification in the field. Our findings clearly demonstrate that the studied individuals can be clustered into five species groups representing S. capillata, S. baicalensis, S. glareosa, S. grandis and S. krylovii. Thus, here we found no evidence that S. baicalensis is of hybrid origin from S. grandis × S. krylovii or S. capillata × S. krylovii but is instead a genetically distinct species. The general branching of the phylogenetic trees is in good agreement with the current taxonomic classification. In particular, a representative of the section Smirnovia, S. glareosa, was genetically distant to the remaining species from the section Leiostipa. Nevertheless, our result contradicts a previous research, where S. capillata, S. krylovii and S. baicalensis represented one clade and S. grandis was a sister taxon [19], while here, such a sister taxon was S. capillata. The current result is likely more accurate due to applying several thousand SNPs across the genome in comparison to only one nuclear locus in the above-mentioned study. Additionally, we demonstrated that the potential split between S. capillata and the remaining representatives of Leiostipa took place approximately 1.07 Mya, which is similar to our previous estimate of 1.73 Mya based on the nucleolar organising regions [48]. Here, we also reported for the first time that diversification within species S. capillata, S. baicalensis, S. krylovii and S. grandis started ca. 130-96 kya (95% HPD: 181-63 kya). These ages may correspond to the potential window of time between the Last Interglacial period, which began around 130 kya, and the Last Glacial Period (LGP), which started about 110 kya. Thus, the observed pattern is similar to dispersing events reported for different taxa across the plant kingdom [49,50] suggesting climatic changes as a feasible factor in the current distribution of feather grasses. Of note, due to the divergence times that were inferred in SNAPP, which uses the multi-species coalescent model ignoring possible introgression, the confidence may be exaggerated. On the other hand, although introgression does cause biased errors in coalescent-based species tree inference [51][52][53], it should not affect the estimates in the present study, since all admixed individuals were excluded from the analysis.
Importantly, the results of molecular analyses were congruent and provided evidence for the existence of hybridisation between pairs S. capillata × S. baicalensis, S. capillata × S. krylovii, S. grandis × S. krylovii, S. grandis × S. baicalensis and S. baicalensis × S. krylovii. The presence of F2 or more advanced hybrid generations suggests that F1 individuals are able to reproduce further. This observation is in agreement with our previous findings on hybridisation within the genus [32,33], where a direct approach proved that a hybrid taxon S. heptapotamica produces fertile pollen grains and is capable of backcrossing to primarily one parental species [33]. Indeed, here we detected backcrosses S. baicalensis × S. krylovii and S. grandis × S. krylovii to their former parental species. Moreover, the analysis of BABA/ABBA patterns among species revealed signs of past introgression between S. baicalensis and S. krylovii, S. baicalensis and S. grandis, S. krylovii and S. grandis. Taking into account the diversification times of these species, we can hypothesise that if such a gene flow occurred it was relatively recent in evolutionary terms and seemingly still present between S. baicalensis and S. krylovii, as well as in pair S. grandis and S. krylovii. Nevertheless, we treat our BABA/ ABBA analysis with caution due to such a test originally being applied in human studies whose genome is available at the chromosome level and the number of SNPs was remarkably higher than used here. Thus, we intend to reassess the past gene flow within these taxa when a more continuous genome will be obtained. Additionally, although here we did not detect any backcrosses between S. baicalensis and S. grandis, it cannot be excluded that such a combination exists in nature, especially since both species are mostly common in Mongolia and China; however specimens from there were not presented in the study. Lastly, our study revealed only unidirectional backcrossing of hybrid taxa either to S. baicalensis or S. grandis, but not to S. krylovii. Similarly, due to the relatively small sample size and the limited number of localities used here, we cannot draw any reliable conclusions concerning possible barriers to gene flow to the latter taxon.
According to our results, the populations' expansion seemingly started during the LGP. Nonetheless, the assessment of genetic structures revealed signs of contemporary gene flow between populations across all species, despite significant geographical distances between some of them. For instance, populations of S. baicalensis, S. capillata and S. grandis from the eastern part of Khakassia and the southwestern area of Buryatia represented either one genetic cluster or were admixed between two. Among potential explanations are shifting their distributions in response to climate change, or seasonal migration of wild animals and livestock grazing. While we currently do not possess enough data to verify the first assumption, seeds of feather grasses usually are spread naturally by wind or water. On the other hand, they also can be frequently dispersed by the wool of mammals including, e.g., sheep, goats and horses [16,18]. Although this study was not intended to explore population differences in detail, we believe that our findings regarding gene flow merit further studies in order to better understand the intraspecific variation and relationships among populations as well as to discuss potential consequences of such events.
Our results also illustrated a complex association between species at the morphological level. There are usually few phenotypic characters differentiating hybridising species and these characters are often functionally or developmentally correlated [54]. In the present case, although the studied species had a set of distinctive characters, the current identification keys do not provide a solution on distinguishing admixed individuals in the section Leiostipa. As a result, only 5 out of 52 hybrid taxa were properly identified based on morphology. Furthermore, the results demonstrated that the most problematic taxon is S. baicalensis, which was frequently misidentified either with S. capillata or a cross S. capillata × S. krylovii, while a few individuals were misleadingly assigned to S. grandis × S. krylovii. Additionally, several hybrids between S. baicalensis and S. capillata were determined as pure S. krylovii. Therefore, we believe that the identification keys should be revised in order to properly delimitate pure species and propose a taxonomic treatment for the hybrid taxa identified in the study. Moreover, for a more comprehensive morphological assessment we suggest using scanning electron microscopy that can assist in finding unique ultrastructures among pure and admixed individuals.
Although here we highlight that morphological characters alone cannot be used to properly identify hybrids and backcrossed individuals in the field, it is a common issue in plants rather than an exception in feather grasses. For instance, in a study on three species of willows it was shown that based on phenotype only 5% of specimens were classified as introgressed individuals, which was much less than the 19% detected using SNP data [55]. Another research on tropical trees demonstrated that even limited genomic sampling, when combined with morphology and geography, can greatly improve estimates of species diversity for clades where hybridisation contributes to taxonomic difficulties [56]. Recently, an investigation on several pine species using SNP data derived from the DArTseq pipeline revealed that one species, previously considered of hybrid origin, is a genetically distinct species and provided insights into the challenges of solely using morphological traits when identifying taxa with cryptic hybridisation and variable morphology [57].
In grasses, hybridisation and introgression phenomena are still mainly studied in crop species, e.g., rice [58], wheat [59] and sugarcane [60]. To date, we have detected hybrids not only between genetically closely related species [33], but also among genetically distant Stipa taxa [13,34]. The results present here and our previous findings helps us to shift toward thinking of the Stipa phylogeny as reticulate webs rather than a strictly bifurcating tree. Nonetheless, studying hybridisation in feather grasses is not only of particular interest to plant taxonomists. The presence of parental species, multiple generation hybrids and backcrosses in different proportions in a hybrid zone may indicate renewed sympatry providing important data for studying species boundaries and patterns of speciation [61]. From this point of view, Stipa may be a suitable genus to study these phenomena. Despite the increasing interest in feather grasses at the molecular level [19,20,48,[62][63][64][65], there is still a lack of substantial knowledge regarding, e.g., chromosome numbers of admixed and pure taxa in hybrid zones, fertility of pollen grains in F1 and later generation hybrids and backcrosses and genomic information related to specific loci contributing to reproductive barriers. We believe that only an integrative approach combining the aforesaid data can properly interpret evolutionary patterns and processes in feather grasses.

Conclusions
In the current study we revealed a complex taxonomic issue in feather grasses with variable morphology exhibited due to extensive hybridisation. Based on SNPs derived from genome-wide genotyping we detected five genetic groups representing separate morphospecies and showed that S. baicalensis is a genetically distinct species instead of a taxon of hybrid origin as it was previously hypothesised. We demonstrated the presence of F1 hybrids between S. capillata × S. baicalensis, S. capillata × S. krylovii, S. grandis × S. krylovii, S. grandis × S. baicalensis, S. baicalensis × S. krylovii and F2 individuals in S. capillata × S. krylovii, S. grandis × S. krylovii, S. baicalensis × S. krylovii indicating low levels of reproductive isolation in these species. We also discovered a few backcrosses S. baicalensis × S. krylovii and S. grandis × S. krylovii to their former parental species suggesting possible introgression among the taxa.
Furthermore, we detected reticulation events between S. baicalensis and S. krylovii, S. baicalensis and S. grandis, S. krylovii and S. grandis. On the other hand, we revealed signs of contemporary gene flow between populations of the species from the section Leiostipa. Another important outcome of the research is divergence date estimates inferred at the species and population levels. Here we deduce that diversification within the studied species started ca. 130-96 kya and hypothesise that climatic changes during the LGP were a driving force behind the current distribution of feather grasses. Importantly, here we also emphasise the usefulness of applying integrative approaches combining molecular and morphological data to delimitate species and detect hybridisation and introgression events in feather grasses. Finally, we conclude that Stipa may be a suitable genus to study these phenomena.

Plant material
In total, 302 fully developed Stipa samples were used for molecular and morphological studies. We gathered individuals representing the section Leiostipa (S. baicalensis, S. capillata, S. grandis and S. krylovii) from localities where these taxa grow together as well as from areas where they grow separately from each other (Fig. 1, Supplementary Table S1). Additionally, we included two populations of S. baicalensis previously reported from the northeastern part of Kazakhstan [46] and 13 specimens of S. glareosa belonging to the section Smirnovia that were found in localities 10, 12, 13 and 16. All voucher specimens used in the study are preserved at TK and KRA. All maps were visualised using ArcGIS Pro 2.

DNA extraction, amplification and DArT sequencing
This section was performed according to the previously reported procedures [34]. In brief, genomic DNA was isolated from dried leaf tissues using a Genomic Mini AX Plant Kit (A&A Biotechnology, Poland) and sent to Diversity Arrays Technology Pty Ltd. (Canberra, Australia) for the following genome complexity reduction using restriction enzymes and high-throughput polymorphism detection [66].
All DNA samples were processed in digestion/ligation reactions as described previously [66], but replacing a single PstI-compatible adaptor with two different adaptors corresponding to two different restriction enzyme overhangs. The PstI-compatible adapter was designed to include Illumina flowcell attachment sequence, sequencing primer sequence and "staggered", varying length barcode region, similar to the sequence previously reported [37]. The reverse adapter contained a flowcell attachment region and MseI-compatible overhang sequence. Only "mixed fragments" (PstI-MseI) were effectively amplified by PCR using an initial denaturation step of 94 °C for 1 min, followed by 30 cycles with the following temperature profile: denaturation at 94 °C for 20 s, annealing at 58 °C for 30 s and extension at 72 °C for 45 s, with an additional final extension at 72 °C for 7 min. After PCR equimolar amounts of amplification products from each sample of the 96-well microtiter plate were bulked and applied to c-Bot (Illumina, USA) bridge PCR followed by sequencing on Hiseq2500 (Illumina, USA). The single read sequencing was performed for 77 cycles.
Sequences generated from each lane were processed using proprietary DArT analytical pipelines. In the primary pipeline, the fastq files were first processed to filter away poor quality sequences, applying more stringent selection criteria to the barcode region compared to the rest of the sequence. In that way the assignments of the sequences to specific samples carried in the "barcode split" step were reliable. Approximately 2.5 mln sequences per barcode/sample were identified and used in marker calling.

DArTseq data analysis
For the downstream analyses, we applied co-dominant single nucleotide polymorphism (SNP) markers processed in the R-package dartR v.1.5.5 [67] with the following parameters: (1) a scoring reproducibility of 100%, (2) SNP loci with read depth < 5 or > 50 were removed, (3) at least 95% loci called (the respective DNA fragment had been identified in greater than 95% of all individuals), (4) monomorphic loci were removed, (5) SNPs that shared secondaries (had more than one sequence tag represented in the dataset) were randomly filtered out to keep only one random sequence tag.

DNA-based species delimitation
Five approaches were used to analyse the genetic structure at the species level: (1) Unweighted Pair Group Method with Arithmetic Mean (UPGMA), (2) fastSTRU CTU RE analysis, (3) Principal Coordinates Analysis (PCoA), (4) NewHybrids analysis, (5) calculation of the f 4 statistic. In addition, to assess the genetic differentiation at the population level within S. baicalensis, S. capillata, S. grandis and S. krylovii we performed PCoA and STRU CTU RE analyses and calculated F ST . Furthermore, we used SNAPP to estimate divergence times within the studied species and populations.
Firstly, a UPGMA cluster analysis based on Jaccard's distance matrix was performed using R-packages dartR and visualised with stats v.3.6.2 [68]. Next, the genetic structure was investigated using fastSTRU CTU RE v.1.0, which implements the Bayesian clustering algorithm STRU CTU RE, assuming Hardy-Weinberg equilibrium between alleles, in a fast and resource-efficient manner [69]. A number of clusters (K-values) ranging from 1 to 10 were tested using the default parameters with ten replicate runs per dataset. The most likely K-value was estimated with the best choice function implemented in fastSTRU CTU RE. The output matrices for the best K-values were reordered and plotted using the R package pophelperShiny v.2.1.0 [70]. We applied the threshold of 0.10 < q < 0.90 as the most widely utilised measure for the assessment of hybridisation [71,72] with q-values > 0.9 being pure species and 0.45 < q < 0.55 being F1 hybrids, while first-and second-generation backcrosses with one parent were considered at q-values 0.25 and 0.125, respectively [73]. Then, a PCoA on a Euclidean distance matrix was performed using the R-package dartR and visualised with ggplot2 v.3.3.0 [74] to show the first two components and plotly v.4.9.2 [75] to illustrate the first three components.

Hybrid generation identification
Next, to assign individuals to a genetic category (pure, hybrid F1 or F2, backcross hybrid) based on their multilocus genotypes, we performed analyses using a Bayesian model-based clustering method implemented in NewHybrids v.1.1 [76] via the R package dartR. The program utilises Markov chain Monte Carlo (MCMC) simulations to compute the posterior probability of an individual belonging to pre-defined categories comparing two parental genotypes at a time. Thus, we used five data sets representing parental species and their hybrids defined by the UPGMA, fastSTRU CTU RE and PCoA outputs from the previous steps. Specifically, we had the following datasets: information content (option "AvgPIC") were chosen via dartR. Then, to ensure the convergence of the algorithm, we used ten different 200-SNP subsets selected in random (option "random"). The Jeffreys prior was used for θ and π and a burn-in of 10,000 MCMC generations followed by 10,000 post burn-in sweeps. The resulting posterior probabilities were calculated based on 11 runs and a probability threshold > 0.8 was set for the assignment into a genetic category. The calculated posterior probabilities for the assigned categories were visualised using the R package pophelperShiny.

Testing for introgression
Further, to calculate the f 4 statistic we retained only pure individuals determined via the UPGMA, fastSTRU CTU RE and PCoA analyses. All filtering steps were conducted using the R-package dartR with the above-mentioned sequence. Next, the processed data was converted into the EIGENSTRAT format using the R-package dartR.
The subsequent calculation of f 4 statistics was performed in ADMIXTOOLS v. 7.0.1 [77] using the R package admixr v.0.9.1 [78]. In brief, the f 4 statistic [79] is similar to the D statistic [80,81] and measures the average correlation in allele frequency differences between four populations, e.g., A, B, C and D [82]. If a divergent outgroup is provided as population A, we can test for gene flow between B and C (if the statistic is negative and Z-score < − 3) or B and D (if the statistic is positive and Z-score > 3). Due to such a test originally being applied to continuous genome-wide data, it is important to mention some limitations of the analysis while working with non-model organisms. If a draft genome is available, it is possible to specify positions of SNPs along contigs or scaffolds by a special parameter "blockname" implemented in ADMIXTOOLS. Nonetheless, the package currently does not support more than 600 contigs/scaffolds [83] decreasing the potential number of used SNPs for very fragmented genomes. That was the case of Stipa, where only one draft genome comprising 5931 contigs is currently available [48]. Additionally, SNPs positions and genetic distances are used for a block jackknife method to test for a significant deviation from the null expectation of the f 4 statistic. Thus, in the absence of a reference genome or the presence of a very fragmented genome the f 4 /D statistics should be treated with caution. Here we consider signatures of past gene flow events only if BABA or ABBA patterns are greater than 50. All possible fourspecies combinations were tested, while one species, S. glareosa, was selected as an outgroup (A) for all runs.

Population differentiation
To perform PCoA and STRU CTU RE analyses and calculate F ST at the population level we retained only pure species and kept populations with more than three individuals per population. To maintain as many populations as possible, we merged several individuals growing relatively close to each other (Table 2 and Supplementary  Table S1). All filtering steps were conducted using the R-package dartR with the above-mentioned sequence. PCoA analyses were also performed according to the aforesaid flow. Next, we used STRU CTU RE v.2.3.4 [84] instead of fastSTRU CTU RE due to the former software being markedly superior to the latter one under weak population differentiation [85]. To overcome the relatively slow speed of the analysis we applied parallel computing using StrAuto v.1.0 [86]. Five replicate runs were performed for each number of clusters (K) from one to ten with a burn-in of 50,000 iterations followed by 500,000 MCMC iterations. The optimal K value was identified based on Evanno's method of ΔK statistics [87] as implemented in Structure Harvester [88]. The calculated posterior probabilities for the assigned categories were visualised using the R package pophelperShiny. Then, the fixation index Fst was calculated using the R-package dartR with 1000 bootstraps to obtain p-values. This measure assesses genetic differentiation among populations, where values of 0.00-0.05 indicate low differentiation, 0.05-0.15 indicate moderate differentiation, while Fst > 0.15 indicates high levels of differentiation [89].

Divergence-time estimation
Finally, to estimate divergence times between the studied taxa, we retained only one pure individual per species and population using dartR and followed the abovementioned sequence of filtering steps with an exception of the called loci (100% instead of 95%). Then, the SNP data was converted to the Nexus format using dartR. Subsequently, we created an XML file using the SNAPPspecific template provided in Stange et al., 2018. SNAPP v.1.5.1 [90] utilises the multi-species coalescent approach and is well-suited for analyses of genome-wide data deducing the species tree and divergence times directly from SNPs [91]. We applied one time calibration, setting a log-normal distribution with a mean of 4.39 Mya and a standard deviation (SD) of 0.18 for the split between S. glareosa (section Smirnovia) and S. capillata (section Leiostipa) as it was inferred in our previous study [48]. The analysis was performed three times independently, 1.25 million MCMC generations for each run using BEAST2 v.2.6.3 [92].

Morphological analysis
A total of 302 specimens were examined under a light microscope SMZ800 (Nikon, Japan) across the 17 most informative quantitative and six qualitative morphological characters commonly used in keys and taxonomic descriptions of Stipa (Table 3). Firstly, the Shapiro-Wilk test was used in the R-package MVN v.5.8 [96] to assess the normality of the distribution of each characteristic. Secondly, the non-parametric Spearman's correlation test was applied using R-packages stats and Hmisc v.4.3-1 [97] to examine relations between the studied characters. The combined correlogram with the significance test was visualised using the R-package corrplot v.0.84 [98]. Next, a Factor Analysis of Mixed Data (FAMD) [99] was accomplished using the R-package FactoMineR v.2.3 [100] to characterise the variation within and among groups of taxa without a priori taxonomic classification and to extract the variables that best identified them. The number of principal components used in the analysis was chosen based on Cattell's scree test [101]. R-packages factoextra v.1.0.6 [102] and plotly were used to visualise the first two and the first three principal components, respectively. Subsequently, the plots were supplemented with the result of the fastSTRU CTU RE analysis for the best K = 5.
Additionally, to evaluate distributional relationships between each response variable and the studied taxa, notch plots and interactive box plots were created using R-packages ggplot2 v.3.3.0 and plotly v.4.9.2, respectively. The notched box plots display a confidence interval around the median, which is normally based on the median ± 1.57 × interquartile range/square root of n. According to this graphical method for data analysis, if the notches of the two boxes do not overlap, there is "strong evidence" (95% confidence) that their medians differ. In addition, to reveal significant differences between means of particular characters across all examined taxa the nonparametric Kruskal-Wallis test followed by the post-hoc Wilcoxon test with Bonferroni correction were performed using the R-package stats v.3.6.2.

Additional file 1. Interactive box plots.
Additional file 2: Supplementary Table S1. List of samples used in the study. Table S2. Average posterior probabilities inferred in NewHybrids for first (F1) and second (F2) generation hybrids and backcrosses (F1xP1 and F1xP2). Table S3. Pairwise Fst values for population differentiation across the four studied species. Fst > 0.15 indicating high levels of differentiation are in bold type. Table S4. Contribution (%) by dimension of each character (abbreviations according to Table 3) in FAMD. The first five characters contributing the most are in bold type. Abbreviations of the qualitative variables and their contributions to the principal axes are underlined. Table S5. The assigned species names based on morphological and molecular data. Mismatches are shown in bold type. Supplementary Figure S1 Venn diagram representing polymorphic SNPs among four pure Stipa species. The admixed individuals and S. glareosa, which did not show patterns of hybridisation, were omitted in the metric's calculation.