Genetic diversity and domestication origin of tea plant Camellia taliensis (Theaceae) as revealed by microsatellite markers

Background: Tea is one of the most popular beverages in the world. Many species in the Thea section of the genus Camellia can be processed for drinking and have been domesticated. However, few investigations have used credible wild and planted populations of a single species to examine the genetic consequences of domestication and the geographic origin of tea landraces. C. taliensis provides a unique opportunity to explore these issues.

Results: Fourteen nuclear microsatellite loci were employed to determine the genetic diversity and domestication origin of C. taliensis, which was represented by 587 individuals from 25 wild, planted and recently domesticated populations. C. taliensis showed a moderately high level of overall genetic diversity. A greater reduction of genetic diversity and stronger genetic drift were detected in the wild group than in the recently domesticated group, indicating a loss of genetic diversity in wild populations due to overexploitation and habitat fragmentation. Instead of the endangered wild trees, recently domesticated individuals were therefore compared with the planted trees to detect the genetic consequences of domestication. A slight, non-significant reduction in genetic diversity was found during domestication. The long life cycle, selection for leaf traits and gene flow between populations will delay the emergence of a bottleneck in planted trees. Both phylogenetic and assignment analyses suggested that planted trees may have been domesticated from the adjacent central forest of western Yunnan and dispersed artificially to distant places.

Conclusions: This study contributes to the knowledge of the levels and distribution of genetic diversity of C. taliensis and provides new insights into the genetic consequences of domestication and the geographic origin of planted trees of this species. As an endemic tea source plant, wild, planted and recently domesticated C. taliensis trees should all be protected for their unique genetic characteristics, which are valuable for tea breeding.


Background
Plant domestication is one of the most important events in human history. People still depend on the staple cereal crops that were domesticated more than 6000 years ago in Central America [1], the Near East [2,3] and Eastern Asia [4]. During initial domestication, cultivated traits and a genetic bottleneck may emerge in cultivars only after more than 1000 generations [3,5]. Reductions in genetic diversity have been found in cultivated rice [6,7], maize [8], soyabean [9,10] and other crops [11,12]. However, several instances in which no decline in the genetic diversity of planted populations occurred serve as a reminder of how complicated the situation is [13]. The biological nature of the plant, whether it is annual or perennial and whether it is propagated clonally or sexually, affects the results of domestication [14]. Differences in domestication activities, such as single or multiple domestication, also cause differences in the levels of genetic diversity in cultivars [15,16].
Both vital food plants and species that can be used for medicine or beverages, such as tea, have been domesticated for convenience. Tea was used at least as far back as 2,000 years ago in China [17]. It is one of the most popular beverages and has generated health, wealth and job opportunities throughout the world [18-20]. There are approximately 120 species in the genus Camellia [21], but tea in its commercial beverage form is usually produced from C. sinensis (L.) O. Kuntze. Since tea was first introduced into Europe about 400 years ago [18], C. sinensis has gradually become familiar to people worldwide. However, C. taliensis (W. W. Smith) Melchior, an important plant for producing tea, has only been recognized outside of its native areas for a few decades [22,23]. Several studies have investigated the genetic diversity of wild and planted trees of C. taliensis [24-26], and some have detected a reduction in chloroplast DNA (cpDNA) diversity during domestication [25,26]. However, none of these studies has detailed the domestication origin of this plant. Although C. sinensis is cultivated worldwide, almost no genetic research has addressed its domestication, because credible wild populations of C. sinensis have seldom been found [27]. In contrast, C. taliensis provides a unique opportunity to document the domestication origin of tea plants.
C. taliensis, a shrub or small tree (2-10 m) native to the subtropical mountain evergreen forests at altitudes of 1300-2700 m, is endemic to the region from western Yunnan province of China to northern Myanmar [21]. It is generally distinguishable from C. sinensis by its glabrous or sparsely pubescent terminal buds and five-loculed ovary (Figure 1a), whereas C. sinensis has silvery-grey sericeous terminal buds and a three-loculed ovary [21]. In western Yunnan, where it is mainly found, C. taliensis is called 'ye cha' (wild tea) or 'ben shan cha' (local mountain tea) by the local people (Figure 1b and c) [26]. Its leaves have been collected to produce a beverage that is similar to the tea from C. sinensis var. assamica but has its own characteristic constituents [28,29]. Tea probably made from C. taliensis was recorded 1300 years ago [17,30], and this species has been cultivated throughout western Yunnan for at least hundreds of years [30]. However, many tea gardens of C. taliensis have disappeared or been replaced by gardens of C. sinensis var. assamica, and the currently cultivated plants of C. taliensis are mainly located in the Lancang River basin and Dali city [25,30].
About a dozen years ago, due to the high price that 'wild tea' commanded in the local market, a large number of C. taliensis trees in the natural forest were cut down to collect leaves [24,26]. In addition, wild C. taliensis trees were directly domesticated locally, either by clearing out the other plants on a parcel of natural forest and keeping only specimens of C. taliensis (Figure 1d) or by digging out the wild trees and planting them in gardens (Figure 1e). We refer to these directly domesticated trees as 'recently domesticated'. The cultivated trees derived from seeds that were gathered in tea gardens are called 'planted' (Figure 1f), and trees living in the natural forest are called 'wild' (Figure 1b and c). Unlike crops with clear cultivated traits, such as the non-shattering spikelet in rice [4], tea plants do not have clear morphological characters that differentiate cultivated from wild trees. Rigorous field investigations and local social surveys were therefore implemented to differentiate between cultivated and wild forms of tea plants.
Archaeological evidence is usually crucial for documenting domestication origins [4,31]. However, the archaeological findings associated with crop origins are limited. Widely used molecular genetic approaches such as microsatellites can now be used to determine domestication origins [32] and to produce a more detailed crop history [16]. An accurate outline of the domestication process emerges when genetic analyses are consilient with ethnobotanical approaches [14]. In the present study, we analyse genetic diversity and population structure in the wild, planted and recently domesticated populations of C. taliensis based on 14 nuclear microsatellite markers. We aimed to assess the relative levels of genetic diversity of C. taliensis compared with that of C. sinensis, which has been investigated using landraces and improved cultivars [33,34]. We then examined whether a reduction of genetic diversity occurred in the planted populations of C. taliensis relative to the wild populations, and estimated the genetic consequences of domestication. Finally, we addressed the geographical origin of planted trees and discussed the domestication process in more detail. As C. taliensis is an endemic tea source plant, knowledge of its population genetics and domestication history is of great importance for the effective conservation and utilization of the landraces and wild germplasm and for facilitating the genetic improvement of tea plants.

Results
Genetic diversity and variance
A total of 178 alleles were detected in 25 populations of C. taliensis for the 14 loci analysed (Additional file 1). The average number of alleles per locus was 12.7. There were 15 private alleles in nine populations, including 12 alleles in the wild group and three alleles in the recently domesticated group. There were no private alleles in the planted group (Table 1), suggesting a common gene pool shared by planted and natural trees. The rare alleles (frequency ≤ 0.05) [35] accounted for 109 (61.2%) of the total 178 alleles revealed in all loci.
C. taliensis showed a moderately high level of overall gene diversity (H_S = 0.597) (Table 1). Among the populations analysed, the highest level of genetic diversity was found in the YJD population (allelic richness corrected for sample size: A = 5.524; H_S = 0.682), and the lowest in the JCW population (A = 3.428, H_S = 0.541). Inbreeding coefficient (F_IS) values of the 25 populations ranged from 0.029 to 0.275. The global F_IS was 0.160, suggesting a low inbreeding rate in the populations of C. taliensis.
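As an illustration of how gene diversity (H_S) and the inbreeding coefficient (F_IS) relate to each other, the following minimal sketch computes both for a single locus in a single population. It assumes the common small-sample correction 2n/(2n-1) for expected heterozygosity and the relation F_IS = 1 - H_O/H_S; the estimator used by FSTAT may differ in detail, and the function name and genotype values are hypothetical.

```python
from collections import Counter


def gene_diversity_and_fis(genotypes):
    """Return (H_O, H_S, F_IS) for one locus in one population.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    H_S uses the 2n/(2n-1) small-sample correction (an assumption of this
    sketch, not necessarily the estimator implemented in FSTAT).
    """
    n = len(genotypes)
    gene_copies = 2 * n
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    sum_p_squared = sum((c / gene_copies) ** 2 for c in allele_counts.values())
    h_s = (gene_copies / (gene_copies - 1)) * (1.0 - sum_p_squared)
    h_o = sum(1 for a, b in genotypes if a != b) / n
    f_is = 1.0 - h_o / h_s if h_s > 0 else float("nan")
    return h_o, h_s, f_is


# Hypothetical genotypes (allele sizes in bp) for four individuals.
example = [(150, 150), (150, 154), (154, 154), (150, 154)]
h_o, h_s, f_is = gene_diversity_and_fis(example)
print(f"H_O = {h_o:.3f}, H_S = {h_s:.3f}, F_IS = {f_is:.3f}")
```

A positive F_IS, as in this toy example, reflects a deficit of heterozygotes relative to the expectation under random mating.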
In the group comparison tests, the wild group contained the greatest number of rare alleles of the three groups: only nine were absent from the wild group, whereas 40 were absent from the planted trees (χ² = 25.30, df = 1, P < 0.001) and 44 were absent from the recently domesticated individuals (χ² = 30.54, df = 1, P < 0.001). There was no significant difference in the number of rare alleles between the planted and recently domesticated groups (χ² = 0.31, df = 1, P = 0.578) (Table 2). A was significantly lower (one-tailed P < 0.05, 5000 permutations) in the wild group (4.400) than in the planted (4.911) or recently domesticated (4.993) groups. H_S was significantly higher (one-tailed P = 0.017) in the recently domesticated group (0.634) than in the wild group (0.583). The group comparison tests of A and H_S indicated that wild populations had a lower level of genetic diversity than recently domesticated and planted populations. The observed heterozygosity (H_O) and F_IS showed no significant differences between the three groups (one-tailed P > 0.05). The genetic differentiation (F_ST) was 70% higher in the wild group than in the planted or recently domesticated groups (one-tailed P ≤ 0.01) (Table 2), indicating greater among-population genetic variation in the larger sample of the wild group.
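The reported χ² values for rare alleles can be reproduced from the counts given in the text. The sketch below assumes a 2 × 2 contingency table of rare alleles present versus absent in each pair of groups (109 rare alleles in total), a Pearson χ² without continuity correction, and df = 1; the function name is hypothetical.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) and P value for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # For df = 1 the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


RARE_TOTAL = 109  # rare alleles (frequency <= 0.05) across all loci
absent = {"wild": 9, "planted": 40, "recently domesticated": 44}
rows = {g: (RARE_TOTAL - k, k) for g, k in absent.items()}  # (present, absent)

wild_vs_planted = chi_square_2x2((rows["wild"], rows["planted"]))
wild_vs_domesticated = chi_square_2x2((rows["wild"], rows["recently domesticated"]))
planted_vs_domesticated = chi_square_2x2((rows["planted"], rows["recently domesticated"]))

for label, (chi2, p) in [
    ("wild vs planted", wild_vs_planted),
    ("wild vs recently domesticated", wild_vs_domesticated),
    ("planted vs recently domesticated", planted_vs_domesticated),
]:
    print(f"{label}: chi2 = {chi2:.2f}, P = {p:.3g}")
```

Under these assumptions the three comparisons give χ² values that round to 25.30, 30.54 and 0.31, in agreement with the values reported above.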
Subsample genetic comparisons were performed between selected adjacent populations. In terms of the number of rare alleles absent, no significant difference was detected between YXP and LLP versus YXW and GSW (χ² = 3.19, df = 1, P = 0.074) or between OJW and YJW versus OJD and YJD (χ² = 3.39, df = 1, P = 0.065). No significant difference was found in the comparisons of the other genetic parameters, including A, H_S, H_O, F_IS and F_ST (Table 2), suggesting similar levels of genetic diversity, inbreeding and genetic differentiation between adjacent populations.
In the analysis of molecular variance (AMOVA), no variation was found among the wild, planted and recently domesticated groups, suggesting the same genetic basis for these groups. Most of the variation was detected within populations (70.6% within individuals and 16.5% among individuals within populations), and 12.9% of the variation was found among populations (Table 3).

Genetic drift of each population
The mean F values of the wild populations ranged from 0.0991 (MHW) to 0.2686 (JCW) with an average of 0.1656. The mean F values of the recently domesticated populations ranged from 0.0673 (YJD) to 0.1221 (LXD) with an average of 0.1072. Populations YXP (0.0677) and DLP (0.1702) had the minimum and maximum mean F values, respectively, of the planted populations, and the average F value for this group was 0.1108 (Additional file 2). The genetic drift values suggested that the genetic composition of the wild populations had changed about 1.5-fold faster than that of the planted and recently domesticated populations since they diverged from the common ancestor.

The ancestry assignment of the planted trees (Figure 4) indicated that the trees of populations DLP, CNP, FQP and YXP were genetically similar to the natural individuals from Cluster III, and that trees of population LLP were genetically similar to those from Cluster II. However, 6.2%-26.9% of the genomes of planted trees were similar to those of Cluster I. AMOVA detected 1.5% of the variation among these clusters across all samples, suggesting weak differentiation at this level in C. taliensis (Table 3).
To further illustrate the phylogenetic relationships between wild, recently domesticated and planted populations, the Neighbor-joining method was employed to reconstruct the phylogenetic tree of all 25 populations. Populations FQP and CNP were phylogenetically close to population NMW, population DLP was close to population YDW, population LLP was close to populations GSW and TCW, and population YXP was close to populations YXW, MHW, OJD and OJW (Figure 5). Combining the phylogenetic results with the genotype assignment of planted trees, we propose that planted trees of C. taliensis might have been domesticated from the central forest of western Yunnan, around the geographic area of TCW, GSW, YDW, NMW and YXW (Figure 2), and dispersed artificially to distant places.

Discussion
Genetic diversity of C. taliensis
Microsatellites have shown high variation in tea plants [33,37,38] as well as in other species of Camellia [39,40]. High levels of haplotype diversity and nucleotide diversity were reported for the nuclear PAL and cpDNA rpl32-trnL regions in C. taliensis [26]. In the present study, the overall gene diversity in C. taliensis (0.597) (Table 1) was lower than that reported for C. japonica (0.84) [39]. The gene diversity of planted C. taliensis (0.606) was lower than that found in the cultivars and six of eight landraces of C. sinensis in Japan (0.617-0.723) [33] but higher than that revealed in the Chinese improved cultivars of C. sinensis (0.588) [34]. The landraces and wild tea plants reported by Yao et al. [34] comprised several different species of Camellia, which had higher gene diversity than C. taliensis.
An outcrossing breeding system, long life cycle and large geographic range may play central roles in shaping the high genetic diversity of tea plants [41]. Low genetic differentiation implies high gene flow between populations [42], which is consistent with the majority of genetic diversity being preserved within populations (Table 3) [20,26]. However, human activities are additional factors that have been affecting the genetic diversity of tea plants, and the adverse effects of human encroachment are continuously increasing. Felling the wild trees of C. taliensis to collect leaves for producing wild tea (Figure 1c) [24,26] and further deforestation to make way for farming, grazing and construction have caused persistent and serious damage to the natural sources of this tea plant [43]. The lower genetic diversity and the higher F values in the wild populations may indicate stronger genetic drift due to these causes (Table 2, Figure 5 and Additional file 2).

Genetic consequence of domestication
The genetic drift analysis (Figure 5, Additional file 2) and both the group and subsample tests showed a lower level of genetic diversity in the wild populations (Table 2). Does this mean that the planted populations have gained genetic diversity, which would be exceptional in plant domestication [5]? It is not possible to identify the real genetic consequence of domestication by comparing endangered wild populations [26] with protected planted populations. The decline in the genetic diversity of wild trees caused by human activity may mask the genetic bottleneck in planted individuals. However, the recently domesticated trees that came directly from the natural forest may partly represent the wild plants that were free from damage. The comparison with the recently domesticated group indicates that the wild group has lost genetic diversity rather than that the planted group has gained it. Furthermore, although the differences in genetic diversity between the planted and recently domesticated groups were not significant, the small reduction of A and H_S and the slight increase of the F value in the planted group would indicate a slight, non-significant genetic bottleneck during domestication, which points to the complicated situation in the domestication of tea plants (Table 2, Figure 5 and Additional file 2).
Information from both the chloroplast and nuclear genomes helped us to understand the consequences of domestication comprehensively. Analyses of the cpDNA rpl32-trnL intergenic spacer in three planted populations and 21 wild populations of C. taliensis showed a reduction of genetic diversity during domestication [25,26]. Because cpDNA is maternally inherited, this result would suggest limited seed sources for planted C. taliensis during domestication. However, the analysis of cpDNA does not always give results that are consistent with those from nuclear DNA [44]. Almost all of the cpDNA variation (98.75%) was distributed among C. taliensis populations, in sharp contrast to the results from nuclear DNA markers (Table 3) [26]. It is reasonable to consider that the number of populations sampled will affect the comparison of cpDNA diversity between different groups [44]. The loss of cpDNA diversity in the three planted populations compared with the 21 wild populations may be partly due to the much smaller number of planted populations.
Tea plants have a long life cycle of 5-10 years [45] and have been selected for leaf traits during their domestication. Artificial selection based on leaf characteristics may have less of an impact on the genomes of tea plants, especially as they are xenogamous plants and reproduce from seed. Additionally, gene flow among local planted, wild and recently domesticated trees would cause introgression among the different groups and reduce the genetic differences between them (Table 3) [42]. The planted population DLP, which is located at the northern frontier of the natural distribution of C. taliensis (Figure 2), is recorded as having a long period of cultivation [45]. We did not find wild C. taliensis trees in the local forests in the DLP area. Isolation from natural trees and a long period of cultivation seem to be the major causes of the lowest genetic diversity and the highest drift values of population DLP among the planted populations (Table 1, Figure 5 and Additional file 2). Trees in population YXP had a high genetic diversity, which may be the result of gene flow between YXP and YXW. The tea garden from which population YXP was derived contained several cultivars of C. sinensis. Mixed cultivation may have made genetic introgression between the two species more feasible [46].

Geographical origins of the planted trees
In the present study, both phylogenetic and assignment analyses indicated that the planted trees of C. taliensis may be derived from the central forest of western Yunnan and dispersed artificially to distant places (Figures 2, 3, 4 and 5). Four of five planted populations (LLP, CNP, FQP and YXP) came from this area, suggesting that C. taliensis has been mainly domesticated from the adjacent natural forests. This area has a long history of tea plant domestication [30]. A legend of 'dou cha' (tea fight) can be heard in the village of population YXP. People from different places gathered in the village for the tea fight, bringing their tea and seeds with them. The person who won the competition had to supply their elite seeds of C. taliensis to others for planting [30]. Through these human activities, the landraces of C. taliensis were selected and spread to more distant places. The artificial dispersal of landraces would explain the close genetic relationship between some planted trees in population YXP and the natural trees of Cluster I in southern Yunnan, as well as the relationship between DLP and YDW (Figures 2, 4 and 5).
Most of the crops that spread worldwide due to their unique values were initially derived from the native habitats of their wild ancestors, which can be traced back through both archaeological and genetic approaches [16,31]. The in situ plant domestication process is still underway [47]. From the current activities of recent domestication, it may be reasonable to consider that the origin of planted trees of C. taliensis was not a single event but an extended multistage process in which wild trees were added sequentially over hundreds of years. The non-significant reduction in the genetic diversity of planted trees supports this inference (Table 2). However, in the field investigation and local social survey, we found that a large number of endemic planted trees of C. taliensis had been replaced by the ecdemic improved cultivars of C. sinensis var. assamica in the last one hundred years [25]. This suggests that improved cultivars of tea were valued for their higher quality. In the last dozen years, the domestication of wild C. taliensis was driven principally by the high price of wild tea, which had been hyped for its cultural values and was claimed to be produced without pesticides or chemical fertilizers [24,26]. It is hard to believe that, over hundreds of years, people were willing to abandon the improved landraces in their gardens and frequently introduce wild trees from the natural forests instead. The more likely process is successive domestication within tea gardens, accompanied by the occasional introduction of a few wild seeds or seedlings of C. taliensis.

Conservation strategies and utilization in tea breeding
Although the planted and recently domesticated populations had a greater genetic diversity, it is the wild populations that have preserved the most private alleles and rare alleles, making them the most important reservoirs of genetic variation (Tables 1 and 2). Taking natural trees and planting them in private gardens or clearing out other shrubs and transforming a plot of wild forest into one's own tea garden destroys not only wild resources of C. taliensis, but also the natural forest in general (Figure 1d and e). Without effective restriction, each individual action of initial domestication would add up to the substantial damage of common resources, not unlike the tragedy of the commons described by Hardin [48].
It is essential to conserve common natural resources, including wild tea plants, through efficient management. However, C. taliensis is an important landrace source that could generate new developments in tea breeding, for which wild genetic resources should be indispensable. The paradox of protection and production could be addressed through rapid reproduction from cuttings [49] of wild trees. The planted trees of C. taliensis should also be protected for their selected genetic characteristics and endemic culture, and they will facilitate the further breeding of tea plants.

Conclusions
In this study, we illustrated the domestication origin of a tea plant with genetic approaches for the first time. Fourteen nuclear microsatellite loci detected a moderately high genetic diversity in C. taliensis. Using credible wild, planted and recently domesticated populations of this tea plant, we discussed the genetic consequences of domestication and the geographic origin of the planted trees. Group and subsample tests indicated that a slight, non-significant bottleneck occurred during domestication. The phylogenetic and assignment analyses suggested that the planted trees may have been domesticated from the adjacent central forest of western Yunnan and dispersed artificially to distant places. As an important tea source plant in Yunnan province of China, C. taliensis should be protected and utilized for its unique genetic characteristics, which are valuable for the genetic improvement of tea plants. Our study will help to distinguish the genetic results of different collection and domestication activities of tea plants, and will give further insight into the customs and history of tea domestication.

Methods
Sampling of C. taliensis
Our sampling localities encompassed almost the entire range of C. taliensis in western Yunnan and the surrounding areas (Figure 2, Additional file 4). Wild trees were sampled from the natural forests (Figure 1b and c). Planted trees were collected from tea gardens and identified as seedling plants (Figure 1f ). Recently domesticated trees were sampled in tea gardens and the owners verified them as having come directly from natural forests (Figure 1d and e). We collected 587 individual plants from 16 wild populations (W), five planted populations (P) and four recently domesticated populations (D). Leaves were preserved in silica gel for DNA extraction. Voucher specimens were deposited at the Herbarium of the Kunming Institute of Botany, Chinese Academy of Sciences (KUN) (Additional file 4).

DNA extraction and microsatellite analysis
Total genomic DNA was extracted using a modified protocol of Doyle and Doyle [50]. Thirty-seven primer sets were selected from the known microsatellite loci in C. taliensis [51], C. sinensis [37,52,53], C. japonica [54] and other species [55]. After the primary screening, we retained 14 nuclear microsatellite loci, of which only five primer sets were transferred from other species of Camellia; the rest were developed specifically for C. taliensis (Additional file 1). The 37 selected primer sets included two chloroplast microsatellite loci: ccmp6 [52] and PS-ID [55]. Primary screening revealed no mutation in PS-ID and one mutation in ccmp6. However, when all 587 individuals were genotyped with ccmp6, only a single mutation was found, in population TCW. These two chloroplast microsatellite loci were therefore excluded from the subsequent analyses.
PCR amplification was carried out according to the standard protocol, and the products were separated on 8% denaturing polyacrylamide gels and visualized by silver staining. Two or three samples for each primer set were sequenced to ensure that the markers targeted the same microsatellite regions as reported. The alleles were scored against specific references that contained 1-5 alleles of each locus from single or mixed PCR products and a 100 bp DNA ladder (Tiangen Biotech, Beijing, China). About 30% of the total data were run on additional gels for cross-checking and repeated scoring.

Genetic diversity estimation
The differences between the numbers of rare alleles (frequency ≤ 0.05) [35] present in the wild, planted and recently domesticated populations were tested using a χ² contingency table test [56]. The allelic richness corrected for sample size (A), the observed heterozygosity (H_O), the gene diversity (H_S) and the F-statistics were determined in FSTAT V2.9.3.2 [57]. This program was also used to perform comparison tests of each genetic parameter between the wild, planted and recently domesticated groups. The one-tailed P values were estimated using the random permutation method.
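To make the idea of "allelic richness corrected for sample size" concrete, the sketch below uses the rarefaction formula of El Mousadik and Petit, A = Σ_i [1 - C(N - N_i, n) / C(N, n)], where N is the number of gene copies sampled at a locus, N_i the count of allele i and n the common (smallest) sample size. This is an assumption about the kind of correction applied; the function name and allele counts are hypothetical.

```python
from math import comb


def rarefied_allelic_richness(allele_counts, n):
    """Expected number of distinct alleles in a random subsample of n gene copies.

    allele_counts: observed copy number of each allele at one locus.
    n: standardised subsample size (gene copies), at most sum(allele_counts).
    """
    total = sum(allele_counts)
    if n > total:
        raise ValueError("subsample size exceeds the number of gene copies")
    # For each allele, 1 - P(allele is missing from the subsample).
    return sum(1.0 - comb(total - c, n) / comb(total, n) for c in allele_counts)


# Hypothetical locus: one common allele (3 copies) and one rare allele (1 copy).
print(rarefied_allelic_richness([3, 1], 2))  # expected alleles in 2 gene copies
print(rarefied_allelic_richness([3, 1], 4))  # full sample: both alleles observed
```

Standardising every population to the same n lets richness be compared between populations that differ in the number of trees sampled.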
In the group comparison tests, the large differences in the number of trees between the wild and planted groups, as well as between the wild and recently domesticated groups, may bias the results. Subsample tests were therefore performed to avoid this potential statistical bias and to obtain more detailed results. Certain adjacent populations in different groups were selected for the subsample comparisons: YXP and LLP versus YXW and GSW, and OJW and YJW versus OJD and YJD (Figure 2). These genetic comparisons were also carried out in FSTAT V2.9.3.2 [57].
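The logic of a one-tailed permutation test between two groups of populations can be sketched as follows. This is a simplified illustration, not FSTAT's implementation: it assumes that whole populations are reshuffled between groups and that the test statistic is the difference between group means of a per-population value. The function name and the per-population values are hypothetical.

```python
import random
from statistics import mean


def one_tailed_permutation_p(group_a, group_b, n_perm=5000, seed=0):
    """P value for the hypothesis that group_a has a LOWER mean than group_b.

    group_a, group_b: per-population values of one statistic (e.g. allelic richness).
    Populations are randomly reassigned to groups, keeping the group sizes fixed.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    size_a = len(group_a)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        permuted = mean(pooled[:size_a]) - mean(pooled[size_a:])
        if permuted <= observed:
            as_extreme += 1
    # Count the observed arrangement itself so that P is never exactly zero.
    return (as_extreme + 1) / (n_perm + 1)


# Hypothetical per-population allelic richness values.
wild_like = [4.1, 4.3, 4.0, 4.4, 4.2, 4.5]
planted_like = [5.0, 5.1, 4.9, 5.2]
p_lower = one_tailed_permutation_p(wild_like, planted_like)
p_reverse = one_tailed_permutation_p(planted_like, wild_like)
print(p_lower, p_reverse)
```

Because group sizes are preserved in every permutation, the test itself is valid for unequal groups, although small groups limit the smallest attainable P value, which is one motivation for the additional subsample comparisons.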

Genetic drift analysis
The F model, performed with the program STRUCTURE V2.3.3 [58], was used to estimate the rate of drift away from a common ancestor for each wild, planted and recently domesticated population. A Bayesian approach was implemented to infer the ancestral allele frequencies and the rates of drift away from the ancestral allelic state in each population (F values). For all pairs of wild, planted and recently domesticated populations, we set the prior mean F to 0.1. Three parallel Markov chains were run with a burn-in of 10^4 iterations and a run length of 10^5 iterations for each comparison. Regions of 90% credibility were computed from the distribution of F values estimated in the final run. The mean F values for each population were calculated across all runs and all other populations that belonged to different groups [32].

Ancestry analysis of the planted trees
Using the program STRUCTURE V2.3.3 [58,59], we estimated the number of genetic clusters of natural C. taliensis to which we would assign the planted trees. Both the wild and recently domesticated samples were included to estimate the genetic clusters, because the recently domesticated trees came directly from the natural forest and the broader natural sample would make the subsequent assignment analysis more accurate. We used the admixture model and assumed that the allele frequencies were correlated among the populations. The simulations were run with a burn-in of 500,000 iterations and a run length of 10^6 iterations from K = 1 through 20. Runs for each K were replicated 10 times, and the true K was determined according to the method described by Evanno et al. [36]. After the true K value was determined, the wild and recently domesticated individuals were specified as belonging to each of the K clusters, but no prior information was specified as to the origin of planted trees, which established a new dataset. Using this new dataset and the admixture model, ten parallel Markov chains were run for the correlated allele frequency model with a burn-in of 500,000 iterations and a run length of 10^6 iterations to estimate the proportion of every planted tree's genome possessing ancestry in each of the K clusters [10,32]. The results of the genetic clustering and ancestry analysis were summarized with CLUMPP V1.1.2 [60] and displayed with DISTRUCT [61].
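The choice of K by the method of Evanno et al. [36] can be sketched as below. The sketch assumes the commonly used form ΔK = |L(K+1) - 2L(K) + L(K-1)| / sd[L(K)], with L(K) taken as the mean log-likelihood over replicate runs at each K; the function name and the log-likelihood values are hypothetical and are not results from this study.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    lnp_by_k: dict mapping K to the list of log-likelihoods from replicate runs.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # delta K is undefined at the ends of the tested range
        spread = stdev(lnp_by_k[k])
        if spread == 0:
            continue  # avoid division by zero when replicate runs are identical
        second_difference = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta[k] = second_difference / spread
    return delta


# Hypothetical replicate log-likelihoods for K = 1..5 (two runs per K).
runs = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-800.0, -802.0],
    4: [-790.0, -792.0],
    5: [-785.0, -787.0],
}
delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)
```

In this toy data the likelihood stops improving steeply after K = 3, so ΔK peaks there. Note that ΔK cannot be evaluated at the smallest or largest K tested.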

Phylogenetic analysis
Genetic distances (D_A) [62] between all 25 populations were calculated in DISPAN [63] with 1000 bootstrap replicate data sets. Using the pairwise D_A values, a Neighbor-joining tree of the 25 populations was constructed in MEGA 5.1 [64].
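For reference, the D_A distance of Nei et al. between two populations X and Y is D_A = 1 - (1/r) Σ_j Σ_i sqrt(x_ij · y_ij), where r is the number of loci and x_ij, y_ij are the frequencies of allele i at locus j. The following minimal sketch computes it from per-locus allele frequency tables; the function name and frequencies are hypothetical, and the bootstrap and tree-building steps performed in DISPAN and MEGA are not reproduced.

```python
from math import sqrt


def nei_da_distance(freqs_x, freqs_y):
    """Nei's D_A distance between two populations.

    freqs_x, freqs_y: lists with one dict per locus mapping allele -> frequency.
    Both lists must cover the same loci in the same order.
    """
    if len(freqs_x) != len(freqs_y) or not freqs_x:
        raise ValueError("both populations need the same, non-empty set of loci")
    shared_sum = 0.0
    for locus_x, locus_y in zip(freqs_x, freqs_y):
        # Only alleles present in both populations contribute to the sum.
        for allele in set(locus_x) & set(locus_y):
            shared_sum += sqrt(locus_x[allele] * locus_y[allele])
    return 1.0 - shared_sum / len(freqs_x)


# Hypothetical allele frequencies at two loci for two populations.
pop_a = [{150: 0.5, 154: 0.5}, {200: 0.25, 204: 0.75}]
pop_b = [{150: 0.5, 154: 0.5}, {200: 1.0}]
print(nei_da_distance(pop_a, pop_b))
```

D_A ranges from 0 for identical allele frequencies to 1 when the populations share no alleles at any locus.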