Functional divergence of the NIP III subgroup proteins involved altered selective constraints and positive selection
© Liu and Zhu. 2010
Received: 2 March 2010
Accepted: 20 November 2010
Published: 20 November 2010
Skip to main content
© Liu and Zhu. 2010
Received: 2 March 2010
Accepted: 20 November 2010
Published: 20 November 2010
Nod26-like intrinsic proteins (NIPs) that belong to the aquaporin superfamily are unique to plants. According to homology modeling and phylogenetic analysis, the NIP subfamily can be further divided into three subgroups with distinct biological functions (NIP I, NIP II, and NIP III). In some grasses, the NIP III subgroup proteins (NIP2s) were demonstrated to be permeable to solutes with larger diameter, such as silicic acid and arsenous acids. However, to date there is no data-mining or direct experimental evidences for the permeability of such larger solutes for dicot NIP2s, although they exhibit similar three-dimensional structures as those in grasses. It is therefore intriguing to investigate the molecular mechanisms that drive the evolution of plant NIP2s.
The NIP III subgroup is more ancient with a divergence time that predates the monocot-dicot split. The proliferation of NIP2 genes in modern grass species is primarily attributed to whole genome and segmental chromosomal duplication events. The structure of NIP2 genes is relatively conserved, possessing five exons and four introns. All NIP2s possess an ar/R filter consisting of G, S, G, and R, except for the cucumber CsNIP2;2, where a small G in the H2 is substituted with the bulkier C residue. Our maximum likelihood analysis revealed that NIP2s, especially the loop A (LA) region, have undergone strong selective pressure for adaptive evolution. The analysis at the amino acid level provided strong statistical evidences for the functional divergence between monocot and dicot NIP III subgroup proteins. In addition, several SDPs (Specificity Determining Positions) responsible for functional specificity were predicted.
The present study provides the first evidences of functional divergence between dicot and monocot NIP2s, and suggests that positive selection, as well as a radical shift of evolutionary rate at some critical amino acid sites is the primary driver. These findings will expand our understanding to evolutionary mechanisms driving the functional diversification of monocot and dicot NIP III subgroup proteins.
NOD26-like intrinsic proteins (NIPs) are plant specific integral membrane proteins, belonging to the aquaporin water channel superfamily. NIPs can be traced back to the early developmental stage of primitive land plants , indicating an important role during their evolution. Amongst plant aquaporins, only NIPs have the glycerol transport activity . It was reported that there are 9, 13, 9 and 6 NIP genes encoded in the Arabidopsis, rice, sorghum, and Cucumis sativus genomes, respectively . The divergence and proliferation of NIPs may be an adaptive response to an ever-changing environment .
Recent experimental evidences suggest that NIPs could perform a diverse range of functions, including a wider range for selectivity [4, 5]. It was demonstrated that two constriction points within the pore, referred to as the conserved dual NPA motifs and the aromatic/arginine (ar/R) selectivity filter, primarily determine the substrate selectivity of plant aquaporins . However, it appears that the ar/R filter, which consists of four amino acid residues from helix H2, helix H5, and loop LE1 + LE2, plays a crucial role in determining the substrate selectivity of NIPs . Based on difference in the ar/R filter motif and on our phylogenetic tree reconstruction, it is evident that NIPs should be divided into three distinct subgroups [1, 7]. The properties of the four residues making up the ar/R filter are remarkably different, leading to the significant variation of permeation ability for the proteins within each subgroup. In comparison to the NIP I and NIP II subgroups, NIP III proteins possess the largest constriction size of the pore (≥6Å), which allows larger solutes such as silicic acid (diameter 4.38Å) to permeate [6, 8].
With a few exceptions, the function of most of the NIP III subgroup proteins remains unknown. Of plant aquaporins, rice genes OsNIP2;1 (Lsi1) and OsNIP2;2 (Lsi6), and barley HvNIP2;1 (HvLsi1), which belong to the NIP III subgroup and localize in the plasma membrane, were demonstrated to function as a transporter of silicon across the biomembrane [9–11], a compound that can enhance the resistance of plants to biotic and abiotic stress. Moreover, it was found that rice OsNIP2;1 could be permeable to water, urea, boric acid, and silicic acid, but not glycerol. However, competition experiments indicated that this gene is a highly specific transporter of silicic acid . When expressed in oocytes, both OsNIP2;1 and OsNIP2;2 were shown to have transport activity for arsenite but not arsenate [12, 13], with the former gene having a greater role in the major pathway for arsenite uptake . In addition, the rice OsNIP2;1 was found to be responsible, at least partly, for the permeability to methylated arsenic species MMA and DMA . All three arsenous acids are slightly smaller than silicic acid [13, 14], which may explain the permeability of OsNIP2;1. Given that arsenite and glycerol are structurally similar , it is surprising that rice OsNIP2;1 does not transport glycerol [6, 13], and suggests that other structural features may be involved in the process of efficient substrate discrimination [6, 12].
NIP genes generally exhibit a low expression level, and often show a tissue- and/or cell type specific expression pattern [1, 15, 16]. The rice gene OsNIP2;1 is specifically expressed in roots, and the expression level can be transiently enhanced during the heading stage [10, 17]; whereas OsNIP2;2 is expressed in both the shoots and roots, and its expression level in roots is much weaker than that of OsNIP2;1 . The difference in expression pattern is consistent with their distinct silicon transport functions [10, 11]. Moreover, it was observed that the accumulation of silicon in different rice cultivars varies extensively, which might result from the differences in the expression of OsNIP2;1 and other silicon transporter genes in rice roots . The barley HvLsi1, like rice OsNIP2;1, is specifically expressed in the basal root, and a weak correlation between silicon uptake and the expression level of HvLsi1 in eight tested cultivars was observed . Moshelion et al.  examined the expression pattern of 33 aquaporin genes using macro-array hybridization, and found that ZmNIP2;1 is weakly expressed in maize suspension cultured cells. However, the function of ZmNIP2;1 remains to be further characterized precisely. In addition, the expression of rice OsNIP2;1 can be regulated by abiotic stresses. For instance, OsNIP2;1 expression is down-regulated during periods of dehydration and in response to the presence of ABA , but is induced by salt stress .
Homology modeling suggests that NIP proteins share a common general three-dimensional structure . The ar/R filter of Cicer arietinum CaNod26 and Cucurbita pepo CpNod26 (defined as CaNIP2;1 and CpNIP2;1 respectively in this study) consists of Gly (H2), Ser (H5), Gly (LE1) and Arg (LE2) , identical to those NIP III proteins reported in grass species [6, 7], and suggests that they also function in silicon uptake and/or translocation . However, for NIP III proteins in dicot species there is no any experimental evidence for the function in silicon uptake and/or translocation, although previous research revealed that the C. pepo CpNIP2;1, which is prominent in leaf veins, could catalyze the transmembrane flow of urea and water, but not glycerol . Therefore, it is intriguing to further investigate whether dicot NIP III genes can transport larger solutes like silicic acid. Here we demonstrated shifts of selective constraint and positive selection may have been involved in the evolution of the NIP III proteins, which may correspond to functional differences observed between dicot and monocot NIP III genes.
The protein and CDS sequences of approximately 20 plant genomes, as well as the cDNA and genomic DNA sequences of six other plant species (Eragrostis tef, Festuca pratensis, Elaeis oleifera, Musa acuminate, Zingiber officinale, Curcuma longa) that belong to Poaceae, Arecaceae, Musaceae, and Zingiberaceae, respectively, were searched. We found that there is at least one NIP2 gene presenting in all examined plant species except for two Arabidopsis spp. (Additional file 1). In monocotyledonous plants, there are two copies of NIP2 encoded within each grass species, except for maize, where three NIP III proteins were characterized ; while only one NIP2 gene (or fragment) was identified in Elaeis oleifera, Musa acuminate, Zingiber officinale and Curcuma longa. However, in the most tested dicotyledonous plants, besides G. max and cucumber, only a single copy of NIP2 gene was identified. In addition, a homologous NIP2-like protein was found in the licophyte Selaginella moellendorffii and the gymnosperm species Pinus taeda and Picea sitchensis. These results imply that NIP2 genes are relatively important for grasses.
In the monocot-specific clade, two subclades, based on the support of bootstrap values (91 and 99), were further presented: the NIP2;1 and NIP2;2 subclades (Figure 1). Within each subclade, the orthologous NIP2 genes derived from rice, maize, sorghum, barley, wheat, switchgrass, and B. distachyon were tightly clustered together, a result suggesting that the proliferation and diversification of NIP2s in Poaceae occurred before the divergence of grasses. Since previous studies revealed that the modern grasses diverged from a common grass ancestor , and a whole genome duplication (WGD) event was estimated to have occurred about 70 Mya [25, 26], we reason that the duplication of NIP2 genes in grasses may result from the corresponding WGD. To validate this, we analyzed the syntenic relationship between the chromosomes where the two NIP2 genes are located. We observed that three gene pairs including SbNIP2;1/SbNIP2;2, BdNIP2;1/BdNIP2;2 (see Additional file 3) and OsNIP2;1/OsNIP2;2 are located into chromosomal regions that were supposed to have undergone large-scale segmental duplications [1, 27]. In maize, ZmNIP2;2 and ZmNIP2;3, sharing an identity of 93.3% and 91.9% at the protein and DNA sequence level respectively, are closely phylogenetic related (Figure 1), although they are located in different chromosomes (6 and 9). Therefore, the production of ZmNIP2;2 and ZmNIP2;3 was probably through the recent WGD event occurred in maize [26, 28] after its divergence with sorghum from a common ancestor ~12 Mya .
4DTv distance (D4DTv) between paralogous NIP2 genes in monocot and dicot plants
In grasses, the orthologous NIP III subgroup proteins are significantly homologous to each other (Figure 1). The average identity of sequences for the NIP2;1 and NIP2;2 subclade proteins is 85.9% and 87.5% respectively, indicating that the NIP2 orthologs within each subclade can perform the same or similar functions. In agreement with this postulation, barley HvNIP2;1 (HvLsi1), like rice OsNIP2;1 (OsLsi1), was demonstrated to function in mediating the influx of silicic acid in roots [9, 10]. In addition, we calculated the D4DTv values for the paralogous genes in grasses, and found that with one exception (ZmNIP2;2/ZmNIP2;3 with a D4DTv value 0.128), other paralogous gene pairs are highly diverged, having a D4DTv value not less than 0.26 (Table 1). Therefore, the sequence variation should be partly responsible for their functional diversification, as indicated by the functional studies of OsNIP2;1 and OsNIP2;2 [10, 11, 13].
In contrast to the high similarity in grasses, large sequence variation was observed among dicot NIP2s that share an average identity of 61.6%. In addition, we found that the overall average identity between monocot and dicot NIP2s is decreased up to 60.4%. Nonetheless, both monocot and dicot NIP2s show similar gene structures, possessing five exons and four introns (Additional file 4). Although the introns vary extensively in length, three exons (from 2 to 4) are remarkablely conserved in tested plants (see Additional file 4), suggesting that strong functional constraints should impose on the corresponding exonic regions. In the two soybean GmNIP2s, both the exons and introns are slightly different in length, supporting of their recent expansion. However, substantial variation is found in the intron lengths, especially in the second and fourth introns for the ZmNIP2;2/ZmNIP2;3 gene pair, implying that their diversification may be ongoing. The similar cases were also observed in cucumber (CsNIP2;1/CsNIP2;2) and other paralogous gene pairs in grasses.
Results of positive selection analysis using a variety of codon substitution models
Positively selected sitesa,b
M0 (one ratio)
ω = 0.197
0.97% sites: ω = 1.85;
99.03% sites: 0.01<ω < 0.80
1162.2 (M3 vs M0)
p < < 0.01
ω = 0.243
M8 (β+ω > 1)
1.72% sites: ω = 1.54; 98.28% sites: 0.0004<ω < 0.76
10.2 (M8 vs M7)
p < 0.01
28T*, 30P, 38A, 77S*, 245M
The comparison of M3 vs M0 reveals that ω is not uniformly distributed along the NIP2 coding DNA sequences, and about 0.97% codon sites may be under the influence of positive selection (ω = 1.85). Similarly, compared with M7 model, the M8 model suggests that ~1.72% of codons fall in a category with estimated ω value 1.54, a result indicative of strong positive selection.
Using a variety of codon site models, we demonstrated the influence of positive selection in the evolution of NIP III subgroup genes (Table 2). Compared with the helix regions, the N- and C-terminus are highly divergent in NIP2 proteins (see Additional file 6). Nevertheless, it was found that plant aquaporins were regulated by phosphorylation within the cytosolic termini and loop regions [35, 38]. We used the NetPhos2.0 Server  to predict any possible serine and threonine phosphorylation sites in NIP2 proteins, and indeed found many predicted candidates in the corresponding regions (Additional file 7). These results suggest that the N-terminus and loop regions are functionally important, although they are not conserved as much as the helix regions. The site 77 is located in the loop between helix H1 and H2 (LA). In rice OsNIP2;1, the residue at the corresponding site is Ser (S), which was predicted to be a Serine phosphorylation site with high possibility (score = 0.89; Additional file 7); whereas in other NIP2s, the site is occupied by variant amino acids with different physiochemical properties (Figure 2). Therefore, positive selection in these regions may act as one major determinant in driving the functional divergence of NIP2s.
To test whether variable ω ratios are present among lineages, we performed the likelihood ratio test (LRT) to compare the two extreme models: the one-ratio model that assumes a unique rate ratio for all branches, and the free-ratio model that assumes an independent ω ratio for each branch . The log-likelihood value under the one-ratio model is -11945.3, while the value is -11796.9 for the free-ratio model. Twice the log likelihood differences, 2Δℓ = 296.8, is strongly statistically significant (p < 0.01), revealing a heterogeneous selective pressures among lineages. We further observed that some branches of the NIP III gene phylogeny include several internal branches having ω >1 (Figure 1), showing strong evidence for adaptive evolution.
In a gene family, the fate of new genes produced by duplication would either evolve a new function under positive selection, or be lost during evolution . As reported, plants have evolved more NIP proteins with multi-functions [1, 16, 22]. We revealed that positive selection was involved in the functional diversification of NIP subfamily genes , as well as in the evolution of NIP III subgroup genes (Figures 1 and 2, and Table 2). The grasses encoded two or three NIP2 genes, where the NIP2;2 gene has evolved new functions differentiating from that of NIP2;1s, which may represent an evolutionary advantage for grasses to uptake and translocate silicon efficiently.
The Gu (1999; 2006) methods [41, 42] implemented in DIVERGE2  were used to evaluate Type-I (shifted evolutionary rate) and Type-II (altered amino acid physiochemical property) functional divergence between gene clusters of interests in the NIP III subgroup. The advantage of these methods is that they use amino acid sequences, and thereby is not sensitive to saturation of synonymous sites [41, 42].
Based on the distinct expression patterns and functions of OsNIP2;1 and OsNIP2;2 [10, 11, 13], we supposed that functional divergence in NIP2s should have occurred after gene duplication in grasses. Unexpectedly, we found no statistical evidence for Type-I functional divergence between NIP2;1 s and NIP2;2s in grasses, because the coefficient of Type-I functional divergence was insignificant (θ I = 0.086 ± 0.145; LRT = 0.44; p > 0.05). However, when the posterior probability (Q k ) of divergence was determined for each site by DIVERGE2 , 7 sites with high probability (0.9 > Q k > 0.8) were identified to be Type-II functional divergence related (Additional file 8), indicative of a radical shift of amino acid properties . It appears that site-specific changes of amino acid physiochemical properties may act as one of the major evolutionary powers in driving the functional divergence of NIP2 proteins after their duplication in grasses. Of the seven sites identified, 5 of them are located in helixes and two in loop regions (LB and LE) (Additional file 8). Further, we observed that although the three sites located in helix H2 (V/T), H5 (A/P), and LE (G/A) are physically close to the four residues of the ar/R filter (G, S, G, and R) in the amino acid sequences, only G/A in LE is predicted to spatially interact with R in LE2. In addition, we found that the two residues in LB (A/S) and LE (G/A) are not only physically adjacent to the dual NPAs, but spatially interact with the third residue (A) in the NPA motifs. LB and LE are functional loops that participate in the formation of the aqueous pore , implying that the A/S and G/A residues should be structurally important. Moreover, a contact between LE (G/A) and helix H3 (S/A) was also observed. Therefore, the radical change of physiochemical property of the 7 sites may be the primary contributor to the functional divergence of NIP2s in grasses.
We further employed the DIVERGE2 to assess the coefficients of Type-I and Type-II functional divergence (θ I and θ II) between monocotyledonous and dicotyledonous NIP2s. We found that the null hypothesis (no functional divergence) can be strongly rejected, because the θ I value is statistically significant (θ I = 0.145 ± 0.041; LRT = 12.6; p < 0.01). This indicates that shifted selective constraints must strongly operate on some amino acid sites in monocot and dicot NIP2s, and thereby leading to a lineage-specific functional evolution after their divergence from an ancient common ancestor. Furthermore, two critical amino acid sites (CAASs) were predicted to be highly Type-I functional divergence related (Q k > 0.9), where one site is located in LA, and the other in the helix H4 (Figures 2 and 3). The CAAS in LA (76L) is proximal to the identified positively selected site 77 S. Using InterMap3 D , the two sites are predicted to spatially contact with each other, and have probably coevolved during evolution. In dicot NIP2s, the CAAS in H4 is invariant Val (V), whereas it is substituted with Ile (I) in some monocot NIP2s. Nevertheless, no obviously statistical evidence for the Type-II functional divergence was found (θ II = -0.030 ± 0.096; p > 0.05). Together, these observations indicated that amino acid site-specific shifts of evolutionary rate and changes of amino acid property should not uniformly act on the NIP2s after the split of monocots and dicots.
Specificity determining positions (SDPs) in the monocot and dicot lineage-specific NIP2 proteins
In addition, we noticed that when using OsNIP2;1 as the reference sequence, the residue at the site 84 was invariant Gln (Q) in monocots including Musa acuminate and Elaeis oleifera, except for the Zingiber officinale ZoNIP2, where the corresponding site was occupied by Ala (72A) (Additional files 1 and 6). Whether the substitution of Q with A in ZoNIP2 is sequencing or assembling error is unknown. However, if the residue 72A in ZoNIP2 was replaced by G, this site was also predicted to be a SDP with high confidence (p value = 0.000; Table 3). Moreover, it was found that the residue 84Q lies on the channel side (Additional file 9), and exhibits an interaction with the first residue of the ar/R filter in H2. This suggests that the site 84Q may be involved in the formation of a functional pore, although the ar/R residues are identical in rice OsNIP2;1 and cucumber CsNIP2;1 (Figure 2 and Additional file 5). If it was true, the Z. officinale ZoNIP2 gene, like dicots, may perform a dissimilar function from other monocots.
Notably, we found that the predicted SDPs for the NIP III subgroup are mainly distributed in two spatial regions. As discussed above, the site 84Q was postulated to be involved in the formation of the channel (Additional file 9). By contrast, most of other SDPs lie on the surface of the protein and likely participate in establishing the tetrameric structure.
The current NIP III subgroup proteins were originated and diverged from an ancient common ancestor that emerged before the divergence of monocots and dicots. Subsequently, monocot- and dicot-specific expansion of NIP2 genes occurred. Plant NIP2s show relatively conserved gene structures, each containing five exons and four introns. With only one exception (CsNIP2;2), the ar/R filter of NIP2 proteins consists of G, S, G, and R.
All tested grasses encoded at least two NIP2 proteins in their genomes. These paralogous NIP2s were probably produced via the whole genome duplication (WGD) or segmental chromosomal duplication event occurred in the common ancestor of modern grasses. The proliferation and diversification of NIP2s in grasses may be, at least partially, responsible for their highly efficient influx of silicon in roots and then transporting them out of xylem. By contrast, most of tested dicot plants have only one NIP2 gene. Due to the small constriction size of the pore, CsNIP2;2 might be unable to transport silicon, although the cucumber genome encoded two NIP2 genes. In particular, Arabidopsis spp. should have lost their NIP III subgroup genes during evolution.
The NIP III subgroup genes have experienced strong positive selection and diverged in function between monocots and dicots. Several SDPs were identified to be responsible for the determination of functional specificity of monocot and dicot NIP2 proteins. These findings provide deeper insights into understanding the evolutionary mechanisms of NIP III subgroup proteins and their functional diversification.
The rice (OsNIP2;1 and OsNIP2;2), sorghum (SbNIP2;1 and SbNIP2;2), maize (ZmNIP2;1, ZmNIP2;2 and ZmNIP2;3), barley HvLsi1 (HvNIP2;1), C. arietinum CaNIP2;1 and C. pepo CpNIP2;1 proteins that belong to the NIP III subgroup were collected according to the published literatures [1, 7, 9–11, 22]. The Ricinus communis RcNIP2;1 (AN: EEF27965) was identified on the basis of a BLASTP search against the nr database with e value 0.01 in National Center for Biotechnology Information (NCBI). These proteins were used as query to search against the amino acid and DNA sequences of the Brachypodium distachyon, Triticum aestivum, Hordeum vulgare, Vitis vinifera, Carica papaya, Medicago truncatula, Glycine max, Lycopersicon esculentum, Populus trichocarpa, Cucumis sativus, Panicum virgatum, Mimulus guttatus, Prunus persica, and Manihot esculenta genomes using BLASTP and TBLASTN programs, respectively. In addition, an exhaustive search against the cDNA and/or genomic DNA sequences of the genomes Eragrostis tef, Festuca pratensis, Elaeis oleifera, Musa acuminate, Zingiber officinale, Curcuma longa, Picea sitchensis, and Selaginella moellendorffii were performed also using BLASTN and TBLASTN programs, respectively. Programs InterProScan  and ConPred II  were utilized to detect conserved domains and predict the putative transmembrane regions (TMs), respectively. FgeneSH http://linux1.softberry.com/ was employed to predict the gene structures of candidates identified.
The NIP III subgroup protein sequences were aligned using the program L-INS-i implemented in MAFFT v6.6 , with the parameters: Scoring matrix for amino acid sequences, BLOSUM62; Gap opening penalty, 2.0; and Gap extension penalty, 0.2. The resulting protein alignment was subsequently employed to generate the codon-alignment of corresponding coding DNA sequences using a custom PERL script. In the codon-based alignment, the codon sites at which most sequences have data except for one or two sequences were kept while sites at which all sequences except for one or two have alignment gaps were removed. The phylogenetic trees were reconstructed with MEGA v4.0  using the Minimum Evolution (ME) and Neighbor-joining (NJ) methods with the parameters of pairwise deletion of gaps/missing data and the p-distance substitution model where only the transversions were taken into account. The reliability of interior branches was assessed with 1000 bootstrap resamplings. Phylogenetic trees were displayed using MEGA v4.0 .
We employed the CODEML program implemented in the PAML v4.4 software package  to test the hypothesis of positive selection in the NIP III subgroup genes. In the analysis, two pairs of models were contrasted to test for heterogeneous selective pressures at codon sites. First, models M0 (one ratio) and M3 (discrete) were compared, using a test for heterogeneity between codon sites in the d N/d S ratio value, ω. The second comparison was M7 (beta) vs M8 (beta+ω >1); this is the most stringent test of positive selection . M0 and M7 belong to null models that do not allow for any codons with ω >1. In such comparisons, positive selection is indicated if the models that allow for selection (M3 and M8) are significantly better than the null model (no selection) in the likelihood ratio test (LRT). When the LRT suggests positive selection, the Bayes empirical Bayes (BEB) method was used to calculate the posterior probabilities that each codon is from the site class of positive selection under models M3 and M8 .
To test for heterogeneous selective pressures among lineages , models of variable ω ratios among lineages were fitted by maximum likelihood (ML) to the NIP III subgroup sequence alignment. The ratio of nonsynonymous-to-synonymous for each branch under two models (one-ratio and free-ratio for branches) was calculated, and the two models were compared using the LRT test to see whether the ω ratios are different among lineages. Accordingly, positive selection is indicated if the free-ratio model that allows for selection is significantly better than the one-ratio model (no selection) in the LRT analysis.
The software DIVERGE2  was employed to detect functional divergence between clusters of interests in the plant NIP III subgroup. The coefficients of Type-I and Type-II functional divergence (θ I and θ II) between any two interesting clusters were calculated. If θ I or θ II is significantly >0, it means site-specific altered selective constraints or a radical shift of amino acid physiochemical property after gene duplication and/or speciation [41, 42]. Moreover, a site-specific profile based on the posterior probability (Q k ) was used to predict critical amino acid residues that were responsible for functional divergence. In this analysis, large Q k indicates a high possibility that the functional constraint (or the evolutionary rate) and/or the radical change of amino acid property of a site is different between two clusters [41, 42].
The SDPpred  was utilized to identify the specificity determining positions (SDPs) that determined differences in the functional specificity of homologous proteins. Based on a given multiple alignment, SDPpred computes Z-score for each alignment column. The higher Z-score, the more likely the position is a true specificity determinant. The Bernoulli estimator was incorporated into SDPpred to automatically produce a recognition cutoff (B-cutoff) to evaluate the significance of the Z-scores in order to assess whether the observed Z-score is sufficiently high to indicate an SDP. Positions scoring higher than this cutoff are predicted to determine the specificity.
The homology models of NIP III subgroup proteins were generated with HHpred  using the method of HMM-HMM comparison of the queried protein and the templates deposited in the PDB database. The generated models were prepared and viewed with UCSF Chimera , and the critical amino acid residues identified were mapped onto the corresponding structures accordingly.
4DTv distance (D4DTv) that stands for fourfold synonymous third-codon transversion was calculated to assess the genetic distances between paralogous pairs. In such analysis, the paralogous proteins for each species were pairwise aligned using MAFFT v6.6 , and subsequently the corresponding codon-alignment was created according to the resulting protein alignment using a custom PERL script. Based on these alignments, we firstly identified the conserved fourfold degenerate amino acids. And then the corresponding codons were extracted from the codon-alignment and used to calculate the 4DTv distance between each aligning pair. D4DTv ranges from 0 for recently duplicated peptides, to ~0.5 for paralogs with an ancient evolutionary past.
We thank the anonymous reviewers for their valuable and constructive suggestions for the manuscript, and Dr. Zhongyi Cheng (University of Chicago) for his helping to polish and improve the English language. This work was financially supported by grants from the Zhejiang Provincial Natural Science Foundation of China (No. Y3090152), National Natural Science Foundation of China (No. 31000170), the Foundation of Zhejiang Educational Committee, China (No. Y200805229), the intramural fund from Zhejiang A & F University to Q. Liu (No. 2010FR060), and the Special Fund for Grade B Innovative Research Team from Zhejiang A & F University to Z. Zhu.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.