Adaptive genetic differentiation in Pterocarya stenoptera (Juglandaceae) driven by multiple environmental variables were revealed by landscape genomics

Background The investigation of the genetic basis of local adaptation in non-model species is an interesting focus of evolutionary biologists and molecular ecologists. Identifying these adaptive genetic variabilities on the genome responsible can provide insight into the genetic mechanism of local adaptation. Results We investigated the spatial distribution of genetic variation in 22 natural populations of Pterocarya stenoptera across its distribution area in China to provide insights into the complex interplay between multiple environmental variables and adaptive genetic differentiation. The Bayesian analysis of population structure showed that the 22 populations of P. stenoptera were subdivided into two groups. Redundancy analysis demonstrated that this genetic differentiation was caused by the divergent selection of environmental difference. A total of 44 outlier loci were mutually identified by Arlequin and BayeScan, 43 of which were environment-associated loci (EAL). The results of latent factor mixed model analysis showed that solar radiation in June (Sr6), minimum temperature of the coldest month (Bio6), temperature seasonality (Bio4), and water vapor pressure in January (Wvp1) were associated with the highest numbers of EAL. Sr6 was associated with the ecological habitat of “prefered light”, and Bio6 and Wvp1 were associated with the ecological habitat of “warm and humid environment”. Conclusions Our results provided empirical evidence that environmental variables related to the ecological habitats of species play key roles in driving adaptive differentiation of species genome. Electronic supplementary material The online version of this article (10.1186/s12870-018-1524-x) contains supplementary material, which is available to authorized users.


Background
Recently, the investigation of the genetic basis of local adaptation in non-model species has become an interesting focus of evolutionary biologists and molecular ecologists [1,2]. Locally adapted species are facing selection pressures from temporal climate fluctuations and spatial environment heterogeneity. In response to these selective pressures, species will undergo adaptive changes in phenotypes and phenology [3]. Behind these phenotypic and phenological changes is the adaptive differentiation of genes on the genome. Genome scans enable us to identify these adaptive genes responsible for local adaptation using population genetic analyses. Identifying the genes that control these phenotypic and phenological changes can provide insight into the genetic mechanism of local adaptation [4]. However, the identification of these adaptive genes on the genome responsible for local adaptation remains a great challenge for most non-model species due to the lack of genomic information [5].
Landscape genomics was proposed by Joost et al. [6]. It has been used to uncover the relationship between adaptive genes on the genome and heterogeneous environment variables among natural populations of species [7]. Although most non-model species have no genomic information, molecular markers that do not need prior information and have high density coverage of the genome are suitable for landscape genomic studies of non-model species. Thus, amplified fragment length polymorphisms (AFLPs), inter-simple sequence repeats (ISSRs), and start codon targeted polymorphisms (SCoTs) all conform to the above conditions [4]. Simultaneously, these markers also have the advantages of high polymorphism and high repeatability. However, AFLP and ISSR are neutral molecular markers; the adaptive loci detected by these markers are more likely to be linked to adaptive genes. SCoT is a molecular marker developed based on the short conserved initial codon; it is a non-neutral bias marker that is more biased toward genes [8]. Therefore, it is more suitable for adaptive evolution research compared with the other two markers. In recent years, several reduced-representation genome sequencing (RRGS) technologies have been developed to improve the genome coverage density of molecular markers. These sequencing technologies include genotyping by sequencing [9], restricted site associated DNA [10], and specific-locus amplified fragment sequencing [11]. Although these markers yielded by the RRGS are different from the three molecular markers mentioned above, they contain DNA sequence information. Most of them cannot be annotated because they are in non-coding regions on the genome or the DNA sequence is too short and there is no whole-genome information. In any case, they can give some information about those genes that can be annotated. Recently, landscape genomics studies using these molecular markers have been carried out on many plant and animal species [4].
A large number of landscape genomic studies have proved that environmental variables would drive the adaptive differentiation of some loci on the genome of locally adapted species [12,13]. However, we do not know why adaptive differentiation occurred in these genes and why environmental factors played key roles in driving adaptive differentiation. During local adaptation, different species in different regions have different adaptive differentiation genes and different driving factors. Are there common reasons behind these differences? Several recent landscape genomic studies have proposed a hypothesis that environmental variables related to ecological habitats play key roles in driving species adaptive evolution; in other words, the genes associated with these environmental variables will undergo adaptive differentiation [2,14,15]. More landscape genomic research is needed to test whether this hypothesis applies to other species as well. Another question worthy of discussion is whether divergent selection on the genome from environmental variables plays a decisive role on the spatial genetic structure of species. Given that previous surveys on species population structure used more neutral molecular markers, we pay more attention to the effects of population demographic history and gene flow on them [16]. If the whole genome markers are used, that is, both neutral and non-neutral markers, the genetic differentiation that driven by natural selection from environmental variables will be detected. Population demographic history, gene flow, and natural selection, which will have a greater impact on the spatial genetic structure of species? The answers to these questions will help deepen our understanding of adaptive evolution of species.
Pterocarya stenoptera C. DC (Juglandaceae) is a deciduous broad-leaved tree growing in forests below 1500 m above sea level along the stream bank or wet hillside land. It is widely distributed in warm temperate and subtropical zones of China. Pterocarya stenoptera prefers light, tolerates waterlogging, likes to grow in warm and humid environment, and can grow on acidic to slightly alkaline soil [17]. Here, 22 natural populations of P. stenoptera across its distribution region in China were sampled to investigate the relationship between adaptive genetic variations on the genome and environmental variables by using landscape genomic approach.
In this study, we employed SCoT markers to scan the genome of P. stenoptera and identified the adaptive loci by performing correlations between local environmental variables and selected SCoT alleles. The present study aimed to (i) identify the spatial genetic structure of P. stenoptera, (ii) evaluate the role of environmental variations on the spatial genetic structure of P. stenoptera, and (iii) examine the effects of environmental variables on adaptive differentiation of P. stenoptera genome.

Population genetic structure
A total of 510 individuals of P. stenoptera from 22 wild populations were successfully scored using the 9 SCoT primers, and 1006 unambiguous fragments were identified with sizes varying from 100 bp to 1200 bp. The number of alleles in 9 primers ranged from 53 (SCoT35) to 156 (SCoT25). The lowest number and percentage of polymorphic alleles (N A = 126, PPA = 12.5) were found in AHXN (P5) population and the highest (N A = 261, PPA = 25.9) in JSBH (P16) population. Nei's genetic diversity (H E ) per population varied from 0.0459 in AHXN (P5) to 0.083 in HNTM (P20). Overall, the summary statistics of the genetic diversity analyses of 22 populations of P. stenoptera are shown in Table 1.
The Bayesian analysis of the population structure of P. stenoptera (Fig. 1) demonstrated that the highest Delta K value (Fig. 2) was obtained when 22 populations were clustered into two groups. The first group is Group A (P1 to P11), and the second group is Group B (P12 to P22). Despite the 22 populations of P. stenoptera could be divided into two groups, the genetic variations between the two groups was very low (6.69%, F CT = 0.067, P < 0.001; Table 2), and most genetic variations occurred within populations (75.67%, F ST = 0.243, P < 0.001; Table 2). The value of gene flow (Nm) among all populations was 2.022. To detect the roles of the 30 environmental variables in this genetic differentiation and their relative contribution on this differentiation, a constrained linear ordination analysis, redundancy analysis (RDA), was performed. The results of RDA are shown in Table 3 and Fig. 3.  Correlations between the genetic variables of 1006 alleles and the 25 environmental variables in axes 1 and 2 were both 1.000. The ratios of the eigenvalues of axes 1 and 2 were 27.4 and 12.0%, respectively. RDA analysis showed that these environmental variables could divide the populations of P. stenoptera into two groups, whereas the genetic differentiation between them is very weak (Fig. 3). This is consistent with the result of STRUCTURE and analysis of molecular variance (AMOVA). Five environmental variables were significantly correlated with RDA axes 1 and 2 (Table 3). Among these five environmental variables, mean diurnal range (Bio2), temperature seasonality (Bio4), and minimum temperature of the coldest month (Bio6) were related to temperature; solar radiation in June (Sr6) was related to light; and water vapor pressure in January (Wvp1) was related to air humidity. Bio6 and Sr6 showed the strongest correlation with genetic variables among the five environmental variables.

Characterization of environment-associated loci (EAL)
A total of 81 outlier loci (8.1% of 1006 SCoT alleles) with F ST P-value below 0.05 were identified by using the hierarchical island model in Arlequin ( Fig. 4; Additional file 1). Moreover, 168 outlier loci (16.7% of 1006 SCoT alleles) with posterior probability above 0.76 (i.e., log10PO > 0.5) were identified by using Bayesian method in BayeScan ( Fig. 4; Additional file 1). To reduce the false positive rate, the loci detected by both methods were considered as outlier loci. Here, a total of 44 mutual loci (4.4% of 1006 SCoT alleles) were detected by both methods. Latent factor mixed model (LFMM) analysis was subsequently performed to verify whether these outlier loci were driven by environmental variables. As a result, 43 EAL (4.0% of 1006 SCoT alleles) associated with at least one environmental variable were identified (Table 4). Among the 25 environment variables that we detected, Sr6, Bio6, Bio4, and Wvp1 were associated with the highest numbers of EAL. The results suggested that they play major role in the genetic differentiation of P. stenoptera.

Discussion
In this study, we investigated the spatial distribution of genetic variation in wild populations of P. stenoptera across its distribution area in China to provide insights into the complex interplay between multiple environmental variables and adaptive genetic differentiation of this widespread broad-leaved tree species and thus improve our understanding of the genetic mechanism of local adaptation. Landscape genomics has developed rapidly in the past decade, which has been proved to be an effective method for studying adaptive evolution of species [4]. Here, we revealed the adaptive evolution  of P. stenoptera in response to environmental variables by using 1006 SCoT alleles. The role of environmental factors in shaping the spatial genetic structure of species has been a key issue in landscape genomics research [18,19]. Spatial genetic structure of species is the result of the interaction of multiple factors, e.g., population demographic history, geographical or ecological barriers, transmission mode of seeds and pollen, geological events, and divergent selection of environmental factors [2,20]. In previous studies on population genetics, more attention was paid to the effects of the first few factors on the population genetic structure [21]. However, the efficient gene flow of species tends to obscure previous genetic structures, especially those based on nuclear genes [16]. Meanwhile, a completely new population genetic structure will be formed because of the divergent selection, genetic drift, and inbreeding. Although genetic drift and inbreeding can also rapidly alter the genetic structure of species, they occur more often in small and isolated populations [22]. However, P. stenoptera seems not suitable for this situation, due to a large population size. Meanwhile, there is high gene flow among populations of P. stenoptera (Nm = 2.022). The efficient gene flow (Nm > 1) would avoid the isolation between populations [23]. Therefore, the current genetic structure for P. stenoptera based on SCoT marker can hardly be attributed to genetic drift and inbreeding. Our survey showed that the 22 populations of P. stenoptera were divided into two groups, i.e., Group A (P1 to P11) and Group B (P12 to P22). Three possible reasons can be used to explain this intraspecific population differentiation. The first is caused by population demographic history. There were two refugia for P. stenoptera during the Quaternary glacial period; the present spatial genetic pattern was formed by the spread and redistribution of the population from the two refugia after glaciation. The second is due to geographical or ecological barriers. There is a geographical or ecological barrier between the two groups, which blocks or interferes with the gene flow and leads to the genetic differentiation between the two groups. The third is caused by divergent selection of environmental factors. The two groups are in different habitats, and different environmental factors lead to the allele frequency difference of the naturally selected genes, which eventually leads to the genetic differentiation between the two groups. Assuming that the first reason is true for P. stenoptera, the populations as refugia have the highest genetic diversity, and the population that is away from the refugia will gradually reduce its genetic diversity because of the founder effect. However, neither Group A nor Group B has gradient descent from the population with the highest genetic diversity. More importantly, Group A is discontinuous, which is separated by Group B. Thus, it is impossible to have two refugia for P. stenoptera. Overall, the first possible reason does not explain the genetic differentiation between the two groups of P. stenoptera. The second possible reason is considered to explain the genetic differentiation between the two groups of P. stenoptera. There must be ecological or geographical barriers between the two groups. In fact, the populations of the two groups are continuously distributed, and there are no geographical or geographical barriers. Even if the differentiation is caused by the second possible reason, the differentiation between them must be significant. Our results of hierarchical AMOVA (F CT = 0.067, P < 0.001) do not support this interpretation. By assuming the genetic divergence of P. stenoptera caused by the third possible reason, a weak genetic differentiation between the two groups would be expected because of the interaction between Statistically significant correlation by * P < 0.05 and ** P < 0.01 strong gene flow and divergent selection of environmental factors. The results of hierarchical AMOVA were consistent with this expectation. Our RDA results (Fig. 3) also showed that environmental variables could slightly separate the two groups of P. stenoptera. The populations of Group A live at higher temperature of the coldest month and higher water vapor pressure in January than those of Group B, whereas populations of Group B live at higher value of temperature seasonality and higher solar radiation in June than those of Group A. The genetic differentiation caused by the divergent selection of environmental differences will be maintained due to the existence of heterogeneous environments, which is different from the genetic differentiation caused by genetic drift. Taken together, the third possible reason is more suitable for P. stenoptera. Because of the inherent limitations of SCoT markers, sequence information cannot be obtained [4]. Thus, the genes cannot be annotated for their function. Although we do not know what genes these loci are or which genes they are linked to, we can know which environmental factors these loci are related to. Here, the results of LFMM analysis showed that Sr6, Bio6, Bio4, and Wvp1 were associated with the highest numbers of EAL, which suggested that these environmental variables play a major role on the genetic differentiation of P. stenoptera genome. Recent landscape genomic studies have proposed a hypothesis that environmental variables related to ecological habitats of species play key roles in driving adaptive differentiation of species genome [2,14,15]. Among the four environmental variables associated with the largest number of adaptive loci, Sr6 was associated with the ecological habitat of "prefered light", and Bio6 and Wvp1 were associated with the ecological habitat of "warm and humid environment". P. stenoptera blooms from April to May, and its fruits ripen from August to September. Therefore, we speculated that P. stenoptera might be more sensitive to light during fruit development. Our RDA results indicate that there is a significant difference in the amount of solar radiation in June (Sr6) between the two groups, which promoted the adaptive differentiation of these related adaptive loci. Bio6 and Wvp1 refer to the minimum temperature of the coldest   month and water vapor pressure in January, respectively. Similarly, there are significant differences of the two environmental variables between the two populations. In China, January is the driest month. These two variables were associated with larger numbers of adaptive loci, suggesting that temperature and water vapor in extreme environments were the main causes of their adaptive differentiation. In fact, previous studies have proved that there is a significant difference in the cold resistance of P. stenoptera from different provenance [24]. Surprisingly, seasonal variations in temperature (Bio4) might also have a significant impact on the adaptive differentiation of P. stenoptera. Overall, the hypothesis that environmental variables related to ecological habitats of species play key roles in driving adaptive differentiation of species genome is also suitable for P. stenoptera.

Conclusions
In the present study, SCoT markers were used to investigate adaptive genetic differentiation in P. stenoptera. Our survey showed that the 22 populations of P. stenoptera were divided into two groups. Although spatial genetic structure of species is the result of the interaction of multiple factors, our results suggested that the divergent selection of environmental differences play a major role on the genetic differentiation of P. stenoptera. Our results also provided empirical evidence that environmental variables related to ecological habitats of species play key roles in driving adaptive differentiation of species genome.

Sample collection
A total of 510 individuals from 22 natural populations were sampled from the entire distribution range of P. stenoptera in China (Fig. 5). Each population sample contained 20 to 24 individuals (

Molecular protocols
Genomic DNA was extracted from approximately 30 mg of dried leaves by using the standard Plant DNA Extraction Kit (Tiangen, Beijing, China) protocol. DNA was then quantified using the ND5000 Ultra Micro UV-Vis spectrophotometer (BioTeke, Beijing, China). All individuals were genotyped using SCoT markers. Despite its inherent defects lacking DNA sequence information, the SCoT marker has been used for landscape genomic studies because it has the advantages of non-requirement for genomic information, high repeatability, and high throughput [2,14,15]. After preliminary screening of the polymorphism and reproducibility of all SCoT primers [25], nine primers (SCoT5, SCoT9, SCoT16, SCoT18, SCoT25, SCoT27, SCoT30, SCoT31, and SCoT35) were selected for polymerase chain reactions (PCRs

Data analysis
The SCoT fragments were identified based on the presence or absence of peaks viewed in GeneMarker 2.2.0 (SoftGenetics, State College, Pennsylvania, USA). The raw information was then transformed into a 1/0 matrix. To reduce the error reading rate of SCoT fragments, the peaks within 100-1200 bp and relative fluorescent units above 200 were scored. Subsequent population genetic analyses were carried out with the basis of the 1/0 matrix from SCoT markers.
Genetic parameters, including polymorphic allele number (N A ), allele frequencies, Nei's measure of gene diversity (H E ) [26], and percentage of polymorphic alleles (PPA), were calculated using AFLP-SURV 1.0 [27]. Genetic structure of the 22 populations of P. stenoptera was assessed using the Bayesian-based program STRUCTURE 2.3.4 [28]. The program was run with K values from 1 to 10 with 10 replicates for each K, and a burn-in period of 10,000 and 10,000 Markov chain Monte Carlo iterations. The admixture model with independent allele frequencies was used for this analysis. The optimal K value with the most suitable population clusters was judged according to the ΔK values introduced by Evanno et al. [29], and this method was executed by Structure Harvester [30]. The average value of the admixture coefficients over 10 runs was calculated using CLUMPP 1.1 [31]. The barplots of STRUCTURE were obtained by DISTRUCT 1.1 [32]. The distribution of genetic differentiation at various levels for the 22 populations of P. stenoptera was characterized using hierarchical AMOVA within Arlequin 3.5 [33]. To calculate gene flow (Nm) among populations, we also calculated F ST based on neutral loci (i.e. excluding all outlier loci identified by Arlequin and BayeScan). The value of gene flow was estimated according to 1/4(1/F ST − 1). A total of 43 environmental variables (Additional files 2 and 3), including 11 temperature variables, 8 precipitation variables, 12 solar radiation variables, and 12 water vapor pressure variables, were downloaded from Worldclim (http://www.diva-gis.org/climate) from 1970 to 2000 at 2.5 arcmin resolution and further extracted using DIVA-GIS 7.5 [34]. The strongly correlated environmental variables with a Pearson correlation coefficient above 0.95 were eliminated. The correlation analysis was performed in SPSS 19 (SPSS Inc., Chicago, IL, USA). Thus, the remaining environmental variables were used for the subsequent RDA and environmental association analysis. After removing the strongly correlated environmental variables, 8 temperature variables (Bio1, Bio2, Bio3, Bio4, Bio5, Bio6, Bio8, and Bio9), 5 precipitation variables (Bio12, Bio13, Bio14, Bio15, and Bio18), 8 solar radiation variables (Sr1, Sr3, Sr5, Sr6, Sr7, Sr9, Sr10, Sr11), and 4 water vapor pressure variables (Wvp1, Wvp4, Wvp7, Wvp9) were retained. To infer the influence of the environmental variables to population genetic differentiation, we performed a constrained linear ordination analysis, RDA, in CANOCO 4.5 [35]. Here, allele frequencies per population (Additional file 4) were used as response variable and the remaining 30 uncorrelated environmental variables were used as explanatory variables.
To minimize the false positive rate, two methods were used to identify the mutual outlier loci for subsequent environmental association analysis. The first method is a hierarchical island model in Arlequin 3.5 [33]. The advantage of this approach is that it has better sensitivity to samples with common history and substructure. The program parameters used were as follows: 20,000 coalescent simulations, 100 simulated demes, and the number of simulated groups suggested by the results of STRUCTURE based on all loci. The loci with F ST P-value below 0.05 were considered as outlier loci, whereas those with total allele frequencies below 0.05 or above 0.95 were removed from the final