Identification and evolution of C4 photosynthetic pathway genes in plants

Background NADP-malic enzyme (NADP-ME) and pyruvate orthophosphate dikinase (PPDK) are important enzymes that participate in C4 photosynthesis. However, the evolutionary history of these genes in C4 plants, and the forces driving their evolution, are not completely understood. Results We identified 162 NADP-ME and 35 PPDK genes in 25 species and constructed their respective phylogenetic trees. We classified NADP-ME genes into four branches, A1, A2, B1 and B2, whereas PPDK genes were classified into two branches, with monocots in branch I and dicots in branch II. Analyses of selective pressure on the NADP-ME and PPDK gene families identified four positively selected sites, 94H and 196H in the a5 branch of NADP-ME and 95A and 559E in the e branch of PPDK, at a posterior probability threshold of 95%. The positively selected sites were located in helix and sheet regions. Quantitative RT-PCR (qRT-PCR) analyses revealed that expression levels of 6 NADP-ME and 2 PPDK genes from foxtail millet were up-regulated after exposure to light. Conclusion This study revealed positively selected sites associated with NADP-ME and PPDK evolution in C4 plants. It provides information on the classification and positive selection of plant NADP-ME and PPDK genes, and the results should be useful in further research on the evolutionary history of C4 plants.


Background
Photosynthesis is the process used by plants to convert solar energy into chemical energy. This enables them to produce their own food for development [1]. Photosynthesis in higher plants can be classified into C3, C4 and Crassulacean acid metabolism (CAM) types based on how carbon is fixed, which leads to different initial photosynthetic products. The majority of land plants use the C3 pathway, whereas C4 and CAM plants evolved from C3 plants [2,3]. C4 plants utilize CO2 more efficiently than C3 plants, leading to superior adaptation to subtropical and tropical environments, lower concentrations of CO2, and more stressful environments [4]. Numerous studies have focused on understanding the efficiency and the mechanism of carbon fixation in C4 plants [5,6].
Among the many enzymes involved in the C4 photosynthesis pathway, PPDK and NADP-ME are considered to be the most important [7,8].
PPDK is a critical enzyme that controls the photosynthetic rate in C4 plants [9]. Many PPDK genes in C4 and CAM plants have been cloned, exemplified by those in maize and Mesembryanthemum crystallinum [10,11]. A phylogenetic study suggested that PPDK genes in sorghum and rice are homologous [12]. Detailed comparison of PPDK isoform sequences from the Poaceae and Arabidopsis indicated that they share a chloroplast transit peptide (cTP) of about 20 amino acids, suggesting that the PPDK genes had evolved before divergence of monocots and dicots [12].
NADP-ME genes can be classified into photosynthetic and non-photosynthetic types. The former mostly function in the chloroplasts [13] and improve photosynthetic efficiency by facilitating the release of CO2 from decarboxylation of malate in proximal bundle-sheath cells, and in C4 plants by providing CO2 to Rubisco for carbon fixation [14,15]. Genomic and phylogenetic analyses showed that the NADP-ME gene family in the Poaceae has four branches, with one branch (NADP-ME IV) being expressed in the plastids. The C4-specific NADP-ME has some codons under positive selection and is independent of the NADP-ME IV family [16,17].
Natural selection, a key factor in biological evolution, includes positive selection, purifying selection, and neutral selection [18]. The ratio of non-synonymous to synonymous substitution rates (ω = dN/dS) is an index of selection pressure that is typically used to understand the direction and strength of selection acting on a coding sequence. If ω > 1, a gene might have undergone positive selection, that is, the presence of a new amino acid offers a fitness advantage; ω = 1 is indicative of neutral selection; and ω < 1 indicates purifying selection [19]. As an important basis of adaptive evolution, positive selection acts in a population through favorable transmission and increased frequency of a mutant allele [18].
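The interpretation of ω described above can be sketched in a few lines of Python. This is only an illustration: the function name `classify_omega` and the tolerance `tol` used to treat values close to 1 as neutral are assumptions made here, not part of the study, and in practice the distinction is assessed with likelihood ratio tests rather than a fixed tolerance.

```python
def classify_omega(omega: float, tol: float = 0.05) -> str:
    """Interpret omega = dN/dS (illustrative sketch).

    `tol` is an assumed tolerance around 1; it is not taken from the study.
    """
    if omega < 0:
        raise ValueError("omega cannot be negative")
    if omega > 1 + tol:
        return "positive"
    if omega < 1 - tol:
        return "purifying"
    return "neutral"


def omega_from_rates(dn: float, ds: float) -> float:
    """Compute omega from non-synonymous (dN) and synonymous (dS) rates."""
    if ds <= 0:
        raise ValueError("dS must be positive to compute omega")
    return dn / ds
```

For instance, `classify_omega(0.091)` returns `"purifying"`, which matches the usual reading of an ω value well below 1.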
Positive selection often implies the emergence of a new function [19,20]. In the transition from the C3 to the C4 pathway, positive selection mainly occurred in key enzymes of C4 photosynthesis, such as Rubisco, phosphoenolpyruvate carboxylase (PEPC), NADP-ME, and PPDK [12,21-26]. For example, two positively selected large subunit (LSu) amino acid substitutions, M309I and D149A, distinguish C4 Rubiscos from those of the ancestral C3 species [21]. With the switch to C4, 21 amino acids evolved under positive selection and converged to similar or identical amino acids in most of the grass C4 PEPC lineages [22]. Acquisitions of C4 PEPC in sedges (Cyperaceae) were driven by positive selection on at least 16 codons [23]. Previous studies used variation in amino acids to study rates of evolution in the C4-NADP-ME pathway, and a number of residues were found to be under significant positive selection [24]. During independent evolution of NADP-ME in C4 plants, strong positive selection led to sequence convergence [25]. For example, among the 29 residues of C4 NADP-MEs and non-C4 NADP-MEs, residues 284, 450 and 539 were identified as having been under positive selection during evolution of C4-NADP-ME in grasses, suggesting they were important in explaining kinetic and structural differences between C4 and non-C4 groups [26]. Phylogenetic analysis also suggested that the maize PPDK gene and its sorghum ortholog were under significant positive selection, implying possible functional changes [12].
The underlying molecular mechanisms of C4 photosynthesis are poorly understood, and few studies have addressed whether positive selection was associated with evolution of NADP-ME and PPDK in C4 plants. Completion of the whole genome sequences of C4 plants such as sorghum and maize [27,28], and improved knowledge of photosynthetic pathways and evolution, have set a solid foundation for study of the evolution and expression of key C4 enzyme genes. A comparison of the PPDK and NADP-ME gene families in C4 plants could advance knowledge of the evolutionary, functional and metabolic roles of these genes during photosynthesis. This study investigated the evolutionary processes in NADP-ME and PPDK in algal, moss, Lycopodiophyta, monocotyledon and dicotyledon species, providing new information regarding C4 photosynthesis.

Results
Numbers of NADP-ME and PPDK genes in plants
A total of 162 NADP-ME and 35 PPDK sequences were found in 25 species, including one algal, one moss, one Lycopodiophyta, 10 monocot (including 6 C4), and 12 dicot (including 1 C4) species (Additional file 1: Table S1; Additional file 2: Table S2). There were 14 NADP-ME genes in soybean. Carrot, cotton and poplar each had 9 NADP-ME genes and Selaginella moellendorffii had 3 (Additional file 1: Table S1). The number of PPDK genes was far smaller, with the largest number being 3 in the banana species Musa acuminata. Most other species had only 1 or 2 PPDK genes (Additional file 2: Table S2).

Phylogeny of NADP-ME and PPDK
We constructed a phylogenetic tree for all 162 NADP-ME genes from 25 species and found that they shared a common ancestor. The algal NADP-ME was the most ancient gene and was divergent from the rest of the clade. Subfamilies A and B separated after whole genome duplication (Additional file 7: Figure S3). In subfamily A, the NADP-ME gene in algae branched off first, and the rest were classified into subfamilies A1 and A2. A clear clustering of monocot and dicot plants was observed within each subfamily. Among the A1 and A2 monocot branches, NADP-ME in Musa acuminata and Ananas comosus branched off before the Poaceae. Within the Poaceae, NADP-ME genes in C4 plants were more closely related to each other (Fig. 1). In the B subfamily, the NADP-ME genes of algae again branched off first, followed by those of the land plants Physcomitrella patens and Selaginella moellendorffii. Among angiosperm species, NADP-ME from dicots (B2 subfamily) branched first, and NADP-ME in the monocots diverged after gene duplication and formed the B1 subfamily, which underwent three whole genome duplication events. As in the A subfamily, the NADP-ME genes in Musa acuminata and Ananas comosus branched off earlier than their counterparts in the Poaceae, in which there were four branches, namely NADP-ME-B-M1, NADP-ME-B-M2, NADP-ME-B-M3 and NADP-ME-B-M4 (Fig. 2). We found that the NADP-ME genes were clustered and closely related within each of the C3 and C4 species groups.
All 35 PPDK genes from 25 species were used to construct a phylogenetic tree. The PPDK gene in green algae was first to branch off, and there was further divergence into subfamilies I and II. Subfamily I consisted of monocots and subfamily II consisted of dicots. The PPDK gene in subfamily I first appeared in Musa acuminata and Ananas comosus and later diverged to the Poaceae. Whole genome duplication occurred after this divergence, and two main branches were formed, with one branch including barley, maize and Brachypodium distachyon showing loss of the PPDK gene or lack of a conserved PPDK structure. PPDK genes in C4 plants were also found to be closely related (Fig. 3).
Analysis of selection pressure on NADP-ME and PPDK genes
Selection pressures within each of the A and B subfamilies of NADP-ME genes were investigated. For subfamily A, the M0 and M3 models were compared under the site model. Under the M0 model ω was 0.091, indicating purifying selection. The P-value from the chi-squared test comparing the M0 and M3 models was 0.000, suggesting that the ω values were not constant across sites (Table 1). For the branch model, seven branches, a1-a7, were assigned as foreground branches. The branch model results showed that the ω values for all foreground branches were < 1. Likelihood ratio tests (LRT) showed that branches a1, a2 and a3 were significantly different from the other branches, with all ω values < 1, thus suggesting purifying selection (Fig. 1; Table 1). The branch-site model revealed that the proportions of positively selected sites at a1-a5 were 5, 0.2, 5.8, 11 and 1.5%, respectively, whereas the proportions at a6 and a7 were close to 0. The numbers of positively selected sites for a1-a7 were 8, 5, 14, 5, 4, 2 and 2 at a posterior probability of 0.6. The LRT result suggested that branches a1, a3 and a5 were significantly different from the M1 model (P < 0.05). Interestingly, the a1 branch, ancestral to subfamilies A1 and A2, was stabilized after positive selection at both the a1 and a3 branches. In contrast, a4 and a5 still had positively selected sites following positive selection at the a2 branch. This suggested that subfamily A1 had undergone different levels of positive selection at different branches. The a5 and a7 branches comprised mostly monocots and C4 plants (Fig. 1; Table 1).
For subfamily B, the ω values under the site model were similar to those of subfamily A. LRT indicated that subfamily B was also under purifying selection, with ω values varying among sites (Table 2). Branches b1-b5 were under strong purifying selection, with proportions of positively selected sites of 5.5, 0, 0.6, 8.9 and 1.8% and ω values much smaller than 1 based on the branch and branch-site models (Table 2). The numbers of positively selected sites for b1-b5 were 8, 0, 2, 0 and 5 at a posterior probability of 0.6 (Fig. 2; Table 2). It was concluded that b1 is the most ancient branch of NADP-ME genes in subfamily B, with a total of 8 positively selected sites. The b3 and b5 branches had 2 and 5 positively selected sites, whereas the b2 and b4 branches, both comprising dicots, had no positively selected sites at a posterior probability of 0.6. This indicates that the b2 and b4 branches were more conserved than the b3 and b5 branches and that the evolutionary steps from b1 to b3 and b5 in subfamily B were rather complex (Fig. 2; Table 2).
For the PPDK gene family, the M0 and M3 models compared by LRT yielded a P-value of 0.000 based on the site model. This indicated that the ω values were not constant across sites, similar to the NADP-ME gene family results (Table 3). Branches a-e were assigned as foreground branches in the branch model, with their ω values much smaller than 1, suggesting purifying selection. Interestingly, the ω values increased gradually from a to e, with a (0.0006) < b (0.024) < c (0.026) < d (0.078) < e (0.284). This trend suggests that PPDK genes were under strong purifying selection in lower plants prior to divergence of monocots and dicots. Even [...] (Table 3). The branch-site model showed that the proportion of positively selected sites on branches a-d was close to 0, whereas on branch e it was 4.4%. The numbers of positively selected sites on a-e were 1, 1, 0, 0, and 8 at a posterior probability of 0.6. Branch e had significantly more positively selected sites than the other four branches (P < 0.0001) (Fig. 3; Table 3).
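The likelihood ratio tests used for the model comparisons above can be illustrated with a short, standard-library-only Python sketch. It computes the test statistic 2ΔlnL for two nested models and its chi-squared P-value. The log-likelihood values in the usage note are hypothetical, and the degrees of freedom are an assumption made here (4 is the conventional value for M0 versus M3 with three site classes); neither is reported in the text.

```python
import math
from typing import Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-squared variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^i / i!.
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite sum over half-integer powers.
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int) -> Tuple[float, float]:
    """Return (2 * delta lnL, P-value) for two nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(max(stat, 0.0), df)
```

With hypothetical log-likelihoods of -1000.0 for the simpler model and -990.0 for the richer one, `likelihood_ratio_test(-1000.0, -990.0, 4)` gives a statistic of 20.0 and a P-value below 0.001, which a table printed to three decimals would show as 0.000.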

Protein structural characteristics of NADP-ME and PPDK
Based on the above phylogenetic relationships and positive selection analysis, we conducted detailed structural and functional studies using the protein sequence alignments of NADP-ME at the a5 branch and PPDK at the e branch, which contain monocots and C4 plants, respectively. Cre06.g268750.t1.2 in the a5 branch and Cre10.g424750.t1.2 in the e branch were used as reference sequences for further analyses. Sites 94H and 196H in the a5 branch (Fig. 4) and 95A and 559E in the e branch (Fig. 5) were significantly positively selected at a posterior probability threshold of 95%. Conserved and highly conserved regions were distinguished.
Distribution of positively selected sites on three dimensional structures of NADP-ME and PPDK
We took the three-dimensional (3D) models of seita.9G200600.1 and seita.9G354600.1 as examples and analyzed the positively selected sites. As shown in Fig. 6a, the positively selected sites 94H and 196H in the a5 branch of NADP-ME-A were mapped to sites 148S and 370W of seita.9G200600.1. Similarly, the positively selected sites 95A and 559E in the e branch of PPDK were mapped to sites 147R and 663H of seita.9G354600.1 (Fig. 6b). In the 3D models, yellow indicates helix regions, red represents sheet regions, and blue corresponds to specific amino acids. Sites 148S, 147R and 663H were located in helix regions, and 370W was located in a sheet region (Fig. 6).
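Mapping a site numbered on the reference sequence onto another sequence, as described above, amounts to walking along the two aligned rows and counting residues while skipping gaps. The Python sketch below illustrates that idea only; the function name and the toy sequences are hypothetical and are not the alignment used in the study.

```python
from typing import Optional, Tuple


def map_reference_site(ref_aln: str, target_aln: str, ref_pos: int) -> Optional[Tuple[int, str]]:
    """Map a 1-based residue position in the reference onto the target.

    Both arguments are rows of the same alignment, with '-' marking gaps.
    Returns (target_position, target_residue), or None when the target
    has a gap in the column holding the reference residue.
    """
    if len(ref_aln) != len(target_aln):
        raise ValueError("aligned sequences must have equal length")
    ref_count = 0
    target_count = 0
    for ref_char, target_char in zip(ref_aln, target_aln):
        if ref_char != "-":
            ref_count += 1
        if target_char != "-":
            target_count += 1
        if ref_char != "-" and ref_count == ref_pos:
            if target_char == "-":
                return None
            return target_count, target_char
    raise ValueError("ref_pos is beyond the end of the reference sequence")


# Toy alignment (hypothetical): the reference has a two-column gap.
reference_row = "MH--KA"
target_row = "MSQLKA"
```

Here `map_reference_site(reference_row, target_row, 3)` returns `(5, "K")`: the third reference residue sits in the fifth alignment column, which holds the fifth residue of the target. The same bookkeeping explains why a site numbered on one sequence carries a different number, and possibly a different residue, on another.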
Expression analysis of foxtail millet NADP-ME and PPDK genes determined by qRT-PCR
Based on the phylogenetic relationships (Figs. 1, 2 and 3), we selected 6 NADP-ME and 2 PPDK foxtail millet genes for qRT-PCR after light treatment. Expression levels of all these genes were up-regulated after light exposure for 1 h. With the exception of the NADP-ME genes Seita.5G314300.1 and Seita.9G200600.1, the genes had higher expression levels after light treatment for 6 h (Fig. 7; Additional file 8: Table S5).

Discussion
Evolution of the NADP-ME and PPDK gene families
C4 photosynthesis evolved approximately 30 million years ago [29]. Angiosperm C4 plant species then underwent 62 independent evolutionary events [30]. Most C4 plants are monocots, including 4600 grass and 1600 sedge species, whereas only 1600 C4 species from 16 families are dicots, with 75% of them in the families Chenopodiaceae, Amaranthaceae, Euphorbiaceae, and Asteraceae [31]. Previous research concluded that, besides the specialized cell structure of C4 plants, the enzymes PEPC, NADP-ME and PPDK were essential for C4 photosynthesis [32,33]. Interestingly, increases in the numbers of NADP-ME and PPDK genes occurred later in evolution. Various studies have suggested that multiple duplication events occurred during plant evolution, including the γ event that separated monocots and dicots [34], the ρ event that occurred before divergence of wheat, maize and rice but after divergence of grasses and pineapple [35], and the τ and σ events that occurred in the Poaceae [36]. In this study, 14 and 7 NADP-ME genes were identified in soybean and maize, respectively (Additional file 1: Table S1). Although the maize genome (2300 Mb) is more than twice the size of the soybean genome (1100 Mb) [28,37], maize has fewer NADP-ME genes than soybean, indicating that expansion of the NADP-ME gene family was not caused by genome duplication but by different expansion patterns after divergence of monocot and dicot species [38,39]. Of the 35 PPDK genes identified in 25 species in this study, most species had only one or two members (Additional file 2: Table S2). Compared with NADP-ME, PPDK genes were fewer in number but more stable during evolution. NADP-ME and PPDK genes are widely present in photosynthetic plant species such as algae, mosses, ferns, gymnosperms and angiosperms [40,41]. From the phylogenetic trees constructed in this study we concluded that NADP-ME genes branched into subfamilies A and B.
The B2 branch containing all dicot species evolved earlier than the B1 branch containing all monocot species, suggesting that the B subfamily evolved independently after divergence of the monocots and dicots, a step known as the γ event (Fig. 2) [34]. The phylogenetic tree of the PPDK gene family showed that monocots branched off and formed subfamily I before dicots formed subfamily II, indicating that the PPDK gene family evolved independently after divergence of monocots and dicots [34]. In the Poaceae there was clear clustering within monocots and dicots. For example, the NADP-ME and PPDK genes of C4 plants were more closely related to each other than to those of C3 plants (Figs. 1, 2 and 3). We inferred that both the NADP-ME and PPDK gene families in the Poaceae underwent independent evolution after the ρ event in monocots [36]. In addition, NADP-ME and PPDK in C4 plants are more closely clustered than in C3 plant species, possibly due to the higher photosynthetic efficiency of C4 plants.
Fig. 4 Multi-alignment of the amino acid sequences of NADP-ME in the a5 branch. Cre06.g268750.t1.2, GRMZM2G085747_P05, Sobic.001G201700.1, Seita.9G200600.1, Sevir.9G199800.1, Bradi3g30230.1, HORVU1Hr1G045720.1, LOC_Os10g35960.1, Gorai.007G097100.1, Aco007622.1, and GSMUA_Achr1P00210_001 represent NADP-ME genes of Chlamydomonas reinhardtii, Zea mays, Sorghum bicolor, Setaria italica, Setaria viridis, Brachypodium distachyon, Hordeum vulgare, Oryza sativa, Gossypium raimondii, Ananas comosus, and Musa acuminata, respectively. Positively selected sites for NADP-ME in the above 11 monocotyledons were marked and displayed through espript3.0 (http://espript.ibcp.fr/ESPript/cgi-bin/ESPript.cgi). Cre06.g268750.t1.2 was used as the reference sequence. Posterior probabilities (P) are indicated: *, P > 95%; **, P > 99%. Conserved regions are boxed, highly conserved loci are in red
This study used site, branch and branch-site models to investigate the effects of selection pressure on the NADP-ME and PPDK gene families. Both site and branch models failed to detect any positively selected sites, possibly because their signal was masked by purifying selection and neutral drift [42,43]. The branch-site model is the most accurate and can detect rare positively selected sites on specific branches [44]. The branch-site model detected a total of 55 sites at a posterior probability of 0.6 that had undergone positive selection in the NADP-ME gene family (Tables 1 and 2). We found 8, 5, 14, 5, 4, 2, and 2 positively selected sites for the a1-a7 branches, respectively, in subfamily A (Fig. 1; Table 1). In subfamily B we found 8, 0, 2, 0 and 5 positively selected sites for the b1-b5 branches (Fig. 2; Table 2). The branch model for the PPDK gene family revealed that the ω values were much smaller than 1 for the five foreground branches, indicating strong purifying selection (Table 3). The branch-site model detected 1, 1, 0, 0 and 8 positively selected sites for branches a-e (Table 3).
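The total of 55 sites follows directly from the per-branch counts given above, as the following short Python tally shows. The variable names are illustrative; the counts are those reported for the NADP-ME branch-site analysis.

```python
# Positively selected sites per foreground branch (posterior probability 0.6),
# as reported for NADP-ME subfamilies A and B.
sites_subfamily_a = {"a1": 8, "a2": 5, "a3": 14, "a4": 5, "a5": 4, "a6": 2, "a7": 2}
sites_subfamily_b = {"b1": 8, "b2": 0, "b3": 2, "b4": 0, "b5": 5}

total_a = sum(sites_subfamily_a.values())  # 40
total_b = sum(sites_subfamily_b.values())  # 15
total_sites = total_a + total_b            # 55

print(f"Subfamily A: {total_a}, subfamily B: {total_b}, total: {total_sites}")
```

Subfamily A contributes 40 sites and subfamily B contributes 15, giving the reported total of 55.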
Both site and branch models suggested that the NADP-ME and PPDK gene families had undergone mostly purifying selection while maintaining normal gene function. Detection of a few positively selected sites by the more accurate branch-site model demonstrated that only a few beneficial mutations had occurred during evolution in order to adjust to changing environments [45]. C4 plants are capable of utilizing lower amounts of CO2 compared to their C3 counterparts. This might be related to the positively selected sites found in both the NADP-ME and PPDK families in C4 plants.
Positive selection is the retention and spread of advantageous mutations throughout a population and has long been considered synonymous with shifts in protein function [45]. Determining the amount of positive selection has wide-ranging implications for understanding genome function and maintenance of genetic variation [46]. In this study, four positively selected sites were identified at a posterior probability threshold of 95%: 94H and 196H in the a5 branch of NADP-ME, and 95A and 559E in the e branch of PPDK (Figs. 4 and 5). Previous studies showed that minimal changes in the primary structure were responsible for the different kinetic behavior of each NADP-ME and PPDK isoform [47,48]. To clarify the roles of positively selected sites in C4 plant evolution and explore the relationship between positively selected sites and high photosynthetic rates in C4 plants, 3D models of seita.9G200600.1 and seita.9G354600.1 were drawn. As shown in Fig. 6, positively selected sites 148S, 147R, and 663H were located in helix regions, whereas 370W was located in a sheet region. These positively selected amino acid sites might reflect the functional divergence between C4 and C3 plants that caused C4 plants to possess higher photosynthetic capacity. These results also indicated that the amino acid sites of NADP-ME and PPDK family members changed during plant evolution, and that the evolutionary rates were different. They also provide a basis for prioritizing further analysis of the functions of NADP-ME and PPDK.
Further analysis of genes in the a5 branch of NADP-ME and the e branch of PPDK showed that the genes from C4 plants in the a5 branch include GRMZM2G085747_P05, Sobic.001G201700.1, Sevir.9G198800.1, Pahal.9G197100.1, Aco007622.1, and Seita.9G200600.1 (Fig. 1). A previous study showed that maize GRMZM2G085747 was involved in the Calvin cycle through carbon fixation in the bundle sheath cells of leaves of maize (a C4 species) during photosynthesis [49]. The sorghum NADP-ME gene Sobic.001G201700 showed high transcript abundance in the C4 pathway [50]. Furthermore, a comparison of one C3 and 11 C4 grass species (Poaceae) showed that the transcript abundance of Sobic.001G201700 was consistently elevated in C4 species [24]. The members of the e branch of PPDK all belonged to C4 plants, including Seita.9G354600.1, Sevir.9G360400.1, Pahal.9G416400.1, and Sobic.001G326900.1 (Fig. 3). A previous study reported that Sobic.001G326900 showed high transcript abundance in the C4 pathway [50]. In this study, the sites 94H and 196H in the a5 branch of NADP-ME and 95A and 559E in the e branch of PPDK were identified as positively selected at a posterior probability threshold of 95% (Figs. 4 and 5). GRMZM2G085747 and Sobic.001G201700 in the a5 branch of NADP-ME, and Sobic.001G326900 in the e branch of PPDK, were all involved in C4 photosynthesis [24,49,50]. Our results suggested that these sites were positively selected for high photosynthetic rates during C4 evolution.

Conclusions
One hundred and sixty-two NADP-ME and 35 PPDK genes characterized in 25 species had highly similar motif compositions within subfamilies. Phylogenetic analysis showed that the NADP-ME and PPDK genes can be placed in four and two branches, respectively. The NADP-ME and PPDK genes in C4 species had closer evolutionary relationships than those in C3 species. Analyses of selective pressure on the NADP-ME and PPDK gene families identified four positively selected sites, 94H and 196H in the a5 branch of NADP-ME and 95A and 559E in the e branch of PPDK, at a posterior probability threshold of 95%. The positively selected sites were located in helix and sheet regions. It was inferred that positive selection was driving the evolution of NADP-ME and PPDK in C4 species. This study contributes to an increased understanding of the roles of NADP-ME and PPDK in C3 and C4 species, and provides insights into the evolutionary biology of C4 plants.

Dataset
Conserved NADP-ME and PPDK protein sequences of Arabidopsis and rice were obtained from the public databases UniProt (https://www.uniprot.org/) and TAIR (https://www.arabidopsis.org/). All NADP-ME and PPDK protein sequences and CDS (coding sequences) of 25 species, including representatives of algal, moss, Lycopodiophyta, monocotyledon and dicotyledon species, were obtained from Phytozome V12 (https://phytozome.jgi.doe.gov/pz/portal.html) and incorporated into a local database. Each sequence was compared to the NADP-ME and PPDK protein sequences from other species and those from Arabidopsis and rice using blastp with a threshold of E < 1e-5. CDD and Pfam were used to investigate whether the sequences contained conserved NADP-ME and PPDK protein structures. Sequences with incomplete protein structures were removed.
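The E-value filtering step can be sketched as follows. The text does not state which blastp output format was used, so this example assumes the common tab-separated format (`-outfmt 6`) with its default twelve columns, in which the E-value is the eleventh field. The function name and the sequence identifiers in the usage note are hypothetical.

```python
from typing import Iterable, List, Tuple

# Zero-based index of the e-value in default BLAST tabular output (-outfmt 6):
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
EVALUE_COLUMN = 10
EXPECTED_FIELDS = 12


def filter_blast_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> List[Tuple[str, str, float]]:
    """Return (query, subject, evalue) for hits with evalue below max_evalue."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < EXPECTED_FIELDS:
            raise ValueError(f"expected {EXPECTED_FIELDS} tab-separated fields, got {len(fields)}")
        evalue = float(fields[EVALUE_COLUMN])
        if evalue < max_evalue:
            hits.append((fields[0], fields[1], evalue))
    return hits


# Hypothetical example rows; identifiers and scores are made up for illustration.
example_rows = [
    "queryA\tsubject1\t85.0\t500\t75\t0\t1\t500\t1\t500\t1e-150\t900",
    "queryA\tsubject2\t30.0\t80\t56\t2\t10\t90\t5\t84\t0.002\t35.0",
]
```

Calling `filter_blast_hits(example_rows)` keeps only the first row, because 0.002 is not below the 1e-5 threshold. Candidates retained this way would still need the domain check against CDD and Pfam described above.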

Construction of phylogenetic trees and analysis of conserved protein sequences
Multiple alignments of candidate NADP-ME and PPDK protein sequences were made using the software MUSCLE 3.8.31 [51]. Neighbor joining (NJ) trees were constructed with the software MEGA 7.0 using the Poisson model with 1000 bootstrap replications; gaps were treated using pairwise methods, and other parameters were left at default values [52]. Maximum likelihood (ML) trees were constructed for NADP-ME and PPDK with the software IQ-TREE 1.6.5 using the Bayesian Information Criterion (BIC) and 1000 bootstrap replications [53]. The optimal model for the ML trees was estimated using the option -m TESTONLY. The constructed phylogenetic trees were visualized with FigTree.

Analysis of natural selection pressure
The protein sequences of NADP-ME and PPDK were aligned using MUSCLE 3.8.31, and the CDS and aligned protein sequences were submitted to the online tool PAL2NAL (http://www.bork.embl.de/pal2nal/) for codon alignment. Selection pressure was calculated using the software PAML4.9e, with ω < 1 indicating purifying selection, ω = 1 indicating neutrality, and ω > 1 indicating positive selection [55]. Three methods were applied to calculate selection pressure: (1) site-specific models, in which the M3 and M0 models were compared; (2) branch-specific models, which compare the foreground branches to the background branches to test for positive selection; and (3) branch-site models (Model A), which test for positively selected sites. Statistical analyses were performed using chi-squared tests.
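The idea behind the codon alignment step is that each aligned amino acid is replaced by its three-nucleotide codon and each protein gap by three gap characters. The Python sketch below illustrates that threading step only. It is a simplified illustration, not PAL2NAL's implementation: it does not check that codons translate to the aligned residues, and it assumes no frameshifts. The function name and the toy sequences are hypothetical.

```python
def thread_codons(aligned_protein: str, cds: str) -> str:
    """Build one codon-alignment row from an aligned protein row and its CDS.

    Simplified sketch: assumes the CDS encodes the protein exactly,
    optionally followed by one stop codon, with no frameshifts.
    """
    cds = cds.upper()
    n_residues = sum(1 for residue in aligned_protein if residue != "-")
    # Accept a CDS with or without a trailing stop codon.
    if len(cds) not in (3 * n_residues, 3 * n_residues + 3):
        raise ValueError("CDS length does not match the protein sequence")
    codons = []
    position = 0
    for residue in aligned_protein:
        if residue == "-":
            codons.append("---")  # a protein gap becomes a three-column gap
        else:
            codons.append(cds[position:position + 3])
            position += 3
    return "".join(codons)
```

For example, `thread_codons("M-K", "ATGAAA")` returns `"ATG---AAA"`. Keeping codons intact in this way is what allows codon-based models to count synonymous and non-synonymous changes column by column.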
The full-length protein sequences of foxtail millet (Setaria italica) NADP-ME and PPDK were submitted to the I-TASSER server (https://zhanglab.ccmb.med.umich.edu/I-TASSER/) to predict their 3D structures. Positively selected sites were tested at a posterior probability threshold of 95% in the branch-site model and mapped onto the surface of the 3D structures using PyMOL v2.3 (http://PyMOLwiki.org).