Genome-wide analysis of major intrinsic proteins in the tree plant Populus trichocarpa: Characterization of XIP subfamily of aquaporins from evolutionary perspective

Background Members of the major intrinsic protein (MIP) family include water-conducting aquaporins and glycerol-transporting aquaglyceroporins. MIPs play an important role in plant-water relations. The model plants Arabidopsis thaliana, rice and maize each contain more than 30 MIPs, which phylogenetic analysis divides into at least four subfamilies. Populus trichocarpa is a model tree species and provides an opportunity to investigate several tree-specific traits. In this study, we have investigated Populus MIPs (PtMIPs) and compared them with their counterparts in Arabidopsis, rice and maize.
Results Fifty-five full-length MIPs have been identified in the Populus genome. Phylogenetic analysis reveals that Populus has a fifth, uncharacterized subfamily (XIPs). Three-dimensional models of all 55 PtMIPs were constructed by homology modeling. Aromatic/arginine (ar/R) selectivity filters, characteristics of the loops responsible for solute selectivity (loop C) and gating (loop D), and group conservation of small and weakly polar interfacial residues have been analyzed. The majority of the non-XIP PtMIPs are similar to those in Arabidopsis, rice and maize. Additional XIPs were identified by database search, and 35 XIP sequences from dicots, fungi, moss and protozoa were analyzed. The ar/R selectivity filters of dicot XIPs are more hydrophobic than those of fungi and moss XIPs, and hence dicot XIPs are likely to transport hydrophobic solutes. Loop C is longer in one of the subgroups of dicot XIPs and most probably has a significant role in solute selectivity. Loop D in dicot XIPs has a higher number of basic residues. Intron loss is observed on two occasions: once between two subfamilies of eudicots and monocot, and a second time when dicot and moss XIPs diverged from fungi. Expression analysis of Populus MIPs indicates that Populus XIPs do not show any tissue-specific transcript abundance.
Conclusion Due to whole genome duplication, Populus has the largest number of MIPs identified in any single species. Non-XIP MIPs are similar in all four plant species considered in this study. Small and weakly polar residues at the helix-helix interface are group conserved, presumably to maintain the hourglass fold of MIP channels. Substitutions in the ar/R selectivity filter, insertions/deletions in loop C, the increasingly basic nature of loop D and loss of introns are some of the events that occurred during the evolution of dicot XIPs.


Background
The integral membrane channel protein aquaporin, a member of the Major Intrinsic Protein (MIP) superfamily, contributes significantly to water transport in different parts of a plant [1]. In addition to their role in plant soil-water relations [2,3], members of this family are also implicated in plant reproduction [4,5], cell elongation [6], plant cell osmoregulation [7] and seed germination [8]. Aquaporins also influence leaf physiology and leaf movements [9,10], drought resistance [11], salt tolerance [12,13] and fruit ripening [14] in plants. The MIP family consists of both aquaporins [15] and aquaglyceroporins [16,17]. A large number of MIP genes have been identified in plants and they seem to be diverse. Arabidopsis [18], maize [19] and rice [20,21] each have more than 30 MIP genes. Phylogenetic analysis reveals that the MIP genes can be divided into at least four different subfamilies, classified as plasma membrane intrinsic proteins (PIPs), tonoplast intrinsic proteins (TIPs), nodulin-26 intrinsic proteins (NIPs) and small basic intrinsic proteins (SIPs) [18,19,21,22]. Three additional subfamilies have been reported recently. In the nonvascular moss Physcomitrella patens, a primitive land plant, a novel plant MIP (GIP) homologous to glycerol channels found in gram-positive bacteria has been identified [23]. Two other subfamilies found recently in the same species are hybrid intrinsic proteins (HIPs) and unrecognized X intrinsic proteins (XIPs) [24]. Substrate specificity, expression and localization of many members of PIPs, TIPs and NIPs have been investigated. Plant MIPs localize in plasma membranes (PIPs and some NIPs) [25][26][27], tonoplast (TIPs) [28], endoplasmic reticulum (SIPs) [29] and other subcellular compartments [30]. In addition to water and glycerol [31][32][33], PIPs, TIPs and NIPs facilitate the transport of other unconventional neutral solutes and gases [34]. These include urea [35][36][37], lactic acid [38] and metalloids like boron [27,39], silicon [26], arsenic and antimony [40,41]. Carbon dioxide [42], hydrogen peroxide [43] and NH3 [44,45] are among the other molecules that are transported by plant MIPs. The transport activity of these MIPs is regulated by many factors including cotranslational and post-translational modifications [46][47][48], gating [49] and subcellular trafficking [50,51]. Members of XIPs and HIPs are the least characterized and need further investigation regarding solute transport, expression and other properties.
Three-dimensional structures of proteins belonging to the MIP family have been determined from several organisms [52][53][54][55][56][57][58], including a plant aquaporin, SoPIP2;1 from spinach [49]. All MIP structures exhibit a conserved hourglass fold with an α-helical bundle comprising six transmembrane (TM) helices (H1 to H6) and two half-helices. The half-helices forming the seventh TM helix are from loops B and E (LB and LE), which also possess the signature sequence Asn-Pro-Ala (NPA). These conserved motifs from the two half-helices meet approximately at the center of the membrane, giving rise to one of the two pore constrictions. The second constriction, also known as the aromatic/arginine (ar/R) selectivity filter, is formed by four residues towards the extracellular side, approximately 8 Å from the NPA region. The four residues in this selectivity filter are contributed by transmembrane helices H2 and H5 and by loop LE. The molecular mechanism of water and glycerol transport, the exclusion of charged groups and the specificity of solute transport have been investigated by computational [59][60][61][62][63] and experimental studies [64,65]. Recently, homology modeling was carried out on Arabidopsis, rice and maize MIPs [21,66] and the structures were classified based on the residues in the ar/R selectivity filter. The diversity of pore configurations indicated that the plant MIPs could transport much more diverse solutes than their counterparts in mammals.
The genome sequence of the model tree plant Populus trichocarpa (black cottonwood) has been recently determined [67]. Phylogenetically, Populus is more closely related to Arabidopsis than to the model cereal plant rice. Populus is a eudicot, and both Populus and Arabidopsis are clustered in the angiosperm Eurosid I clade [68]. The availability of the genomes of Arabidopsis, Populus and rice will facilitate comparative biology across all three species. As the second eudicot genome sequence, with a modest genome size, Populus trichocarpa offers a unique opportunity to study some aspects that cannot be studied in model annual plants [68]. Examples include wood development, seasonality, flowering and natural variation [69]. Apart from the genomic sequence, other Populus genomic resources such as EST sequences, full-length cDNA sequences and DNA microarrays also offer tools to study Populus biology [70][71][72]. Populus is also a good model system in which long distance transport of water and nutrients can be investigated. However, there are only a few studies on poplar aquaporins and their role in long distance transport of water and other nutrients. Seven aquaporins have been investigated in mycorrhized poplar plants and it has been shown that there is a strong increase in the water transport capacity of the plasma membrane of root cells [73]. Analysis of EST sequences from the root of hybrid cottonwood described the expression levels of Populus PIP and TIP members during different stages of adventitious root development [74]. A recent study by Danielson and Johanson [24] identified a group of aquaporins from Populus belonging to the unrecognized XIP category. As in Arabidopsis and rice, the availability of the Populus genome sequence gives an opportunity to identify and characterize the whole repertoire of MIPs in this species. In this paper, we have carried out a genome-wide analysis of Populus MIPs from the genomic sequence and characterized them.
We have identified 55 full-length MIP genes in Populus and this is the largest number of MIP genes identified in any single species to date. We have compared several features of Populus MIPs with their counterparts in Arabidopsis, rice and maize. The unique features identified in Populus MIPs are discussed in this paper.

MIP genes in Populus genome
The whole genome shotgun (WGS) sequence of Populus trichocarpa [67] available at NCBI [75] was searched using TBLASTN [76] for genes coding for MIPs. The initial query sequence, rice OsPIP2;1, resulted in the identification of 41 Populus MIPs (PtMIPs). Five other query sequences representing PIP, TIP, NIP, SIP and XIP family members from the initial search results yielded additional MIP proteins. A list of more than 50 full-length MIP proteins from Populus WGS contigs was obtained (Table 1) after discarding sequences that had missing transmembrane regions or were interrupted by a stop codon in the middle of the sequence, as predicted by the program GeneMark [77,78]. The Populus genome paper [67] reported 67 genes belonging to the major intrinsic protein family (Table S12 in Tuskan et al. [67]), although the details are not given. The Joint Genome Institute (JGI) lists 63 aquaporin genes (KOG ID: 0223) for Populus trichocarpa. We carefully compared the MIP proteins from our TBLASTN search with those 63 from JGI and found that 50 sequences are common to both sets. Nine of the 63 MIP proteins from JGI had to be discarded for various reasons (Additional file 1: Table S1). Four JGI sequences were not found in our search. One sequence from our search (NCBI accession no. AARH01008299) is not present in the JGI list. Thus, we finally obtained 55 full-length MIP protein sequences from Populus trichocarpa, the largest set of MIP sequences identified from any single species so far; they are listed in Table 1. The available data show that forty-four Populus MIP genes are spread nearly uniformly over 13 of the 19 haploid chromosomes. Nine of these 13 chromosomes have at least 3 MIPs each, with the highest number, eight MIPs, observed on chromosome IX (Table 1). The remaining genes are located on scaffolds not yet assigned to a chromosome.
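The bookkeeping behind the final count of 55 can be checked with a few lines of arithmetic. The sketch below is only one reading of the numbers quoted above: it assumes that the four JGI sequences missed by the TBLASTN search were retained in the final set, which the text does not state explicitly but which reproduces both the JGI total of 63 and the final total of 55. The function and key names are illustrative.

```python
def reconcile_mip_counts():
    """Reconcile the sequence counts quoted in the text (illustrative names)."""
    jgi_total = 63       # aquaporin genes listed by JGI (KOG ID: 0223)
    common = 50          # found both by the TBLASTN search and in the JGI list
    jgi_discarded = 9    # JGI entries rejected (Additional file 1: Table S1)
    jgi_only = 4         # JGI entries not recovered by the TBLASTN search
    search_only = 1      # AARH01008299, absent from the JGI list

    # The three JGI categories should account for every JGI entry.
    jgi_accounted = common + jgi_discarded + jgi_only

    # Assumption: the four JGI-only sequences were kept in the final set.
    final_set = common + jgi_only + search_only

    mapped = 44          # genes placed on 13 of the 19 chromosomes
    return {
        "jgi_accounted": jgi_accounted,
        "jgi_total": jgi_total,
        "final_set": final_set,
        "unmapped": final_set - mapped,
    }


if __name__ == "__main__":
    for key, value in reconcile_mip_counts().items():
        print(f"{key}: {value}")
```

Under this reading, the genes not yet assigned to a chromosome are simply the difference between the final set and the forty-four mapped genes.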

Comparison of Populus MIPs with MIPs of Arabidopsis, rice and maize
PtMIPs were compared individually with MIPs from Arabidopsis (AtMIPs), rice (OsMIPs) and maize (ZmMIPs). Then all MIPs from the four plant species were compared together. Multiple sequence alignments of full-length proteins were generated with the program T-COFFEE [79] on different sets of MIP sequences, namely (i) PtMIPs, (ii) PtMIPs and AtMIPs, (iii) PtMIPs and OsMIPs, (iv) PtMIPs and ZmMIPs and (v) PtMIPs, AtMIPs, OsMIPs and ZmMIPs. The trees built from these alignments by the neighbor-joining (NJ) method show that PtMIPs can be classified into five subfamilies. PIPs, TIPs, NIPs and SIPs from Populus clustered with the respective subfamilies from Arabidopsis, rice and maize (Figure 1, Additional files 2 to 4). The fifth subfamily belongs to the uncharacterized XIP family and is not observed in the other three plant species. No sequences belonging to the HIP or GIP families [23,24] are found in any of the four plant species. When MIPs from all four plant species were considered together, the corresponding non-XIP subfamily members clustered together and the XIPs, observed only in Populus, clustered separately (Additional file 4). The results of the NJ method were very similar to those obtained by heuristic distance, parsimony and maximum likelihood methods, with the clustering more or less maintained in all three methods (data not shown).
Each subfamily was further subdivided into groups according to the clustering in the phylogenetic tree and the similarity with known MIPs from other plants. As in other plants, Populus PIPs and TIPs have two (PtPIP1 and PtPIP2) and five (PtTIP1 to PtTIP5) subgroups respectively. However, Arabidopsis NIPs form the maximum of seven subgroups, whereas Populus, maize and rice NIPs have only three to four subgroups. Two PtNIP members (PtNIP3;1 and PtNIP3;2) have substitutions in both NPA motifs.
Although two subgroups are observed for the PtSIP subfamily, as in the other plants under study, the number of SIP proteins found in Populus is the highest observed so far. The Ala residue in the first NPA motif is substituted by Thr or Leu in four out of six PtSIPs. The uncharacterized XIP family, found only in Populus among the four species, has two subgroups, PtXIP1 and PtXIP2. While sequences from other subfamilies have been analyzed and studied experimentally, little is known about the XIP family members. We have identified additional members of the XIP family, and further sequence analysis and homology modeling helped us to characterize this subfamily; the details are explained below.
Table 1 footnotes: The sequences for these genes predicted by GeneMark are longer than those found in JGI; the GeneMark-predicted sequences have extensions at the N- or C-terminus. In this study, we have used the GeneMark-predicted sequences due to their similarity with Arabidopsis sequences. The symbol '*' indicates that these genes are located on a scaffold not yet assigned to a chromosome.
Danielson and Johanson [24] have reported 19 XIP members, including 5 Populus XIPs. Among the other XIPs, 10 were from dicot plants other than Populus, three were from moss and one was from a protozoan. No XIP homolog was found in monocots. We examined all these sequences and found that the sequence from Nicotiana benthamiana (GenBank ID: CK295158) lacks the first transmembrane segment. Similarly, one of the EST sequences for Liriodendron tulipifera (GenBank ID: DT60037) has no NCBI record. Hence, these two sequences were excluded from further analysis of XIP sequences. In addition to the 6 Populus XIPs identified in the present study (5 of which were also reported by Danielson and Johanson [24]), we have considered the 12 additional XIP sequences from plants, moss and the protozoan reported earlier [24].

XIP subfamily members in other species
In order to identify additional XIP members, we used each of the six PtXIP sequences as a query and searched the plant EST databases using TBLASTN [80]. We identified an additional 8 XIP sequences from dicot plants (Table 2). We also carried out TBLASTN searches on various completed and partial genome sequences of different organism groups available in NCBI. To our surprise, many hits were obtained from organisms classified as fungi, with e-values ranging from 4.0E-17 to 1.0E-04. The program GeneMark [77,78] was used to identify the coding regions, and we found 9 full-length (Table 3) and 5 partial fungi MIP sequences based on the GeneMark predictions. Partial fungi sequences were not considered for further analysis (Additional file 1). When only XIP members are considered, the fungi and moss XIPs form two independent clusters separate from the dicot XIPs (Figure 2). All the dicot XIPs fall into one of two subgroups, XIP1 or XIP2. The lone XIP from protozoa does not fall into any of the four groups. Analysis of pairwise sequence alignments indicates that the XIP sequences within each subgroup are highly similar. The average sequence identities between pairs of sequences within the XIP1 and XIP2 groups are ~71% and ~70% respectively (Table 4). However, the sequence variation between the two XIP groups is significant, and the average sequence identity falls to ~40% when sequences are compared across the two groups.
Table 2 footnotes (partially recovered): a [...] [24]. See also Figure 2. b Deviation of the NPA signature motif in loops B and E is reported. '-' indicates that the NPA motif is conserved.
[...] that PtXIPs have diverged significantly from other subfamilies. Notably, substitutions are observed in the conserved NPA motif in loop B in almost all XIPs. However, the recent crystal structure of an MIP homolog from Plasmodium falciparum [58], in which both NPA motifs are substituted, indicates that mutations in the conserved NPA motif are compensated by covariant mutations throughout the protein.
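The within-group and between-group identities quoted for XIP1 and XIP2 can be computed from any set of aligned sequences. The following is a minimal sketch, assuming that identity is defined as the number of identical residues divided by the number of alignment columns in which neither sequence has a gap; other denominators are in common use and would give somewhat different values. The toy sequences are invented for illustration and are not XIP sequences.

```python
from itertools import combinations, product


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences, ignoring gapped columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def mean_identity(group_a, group_b=None) -> float:
    """Average pairwise identity within one group or across two groups."""
    if group_b is None:
        pairs = list(combinations(group_a, 2))
    else:
        pairs = list(product(group_a, group_b))
    return sum(percent_identity(x, y) for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical aligned fragments standing in for two subgroups.
    group1 = ["MKVGALT-A", "MKVGALTSA", "MRVGALT-A"]
    group2 = ["MKLSPLTQG", "MKLSPITQG"]
    print("within group 1:", round(mean_identity(group1), 1))
    print("within group 2:", round(mean_identity(group2), 1))
    print("across groups: ", round(mean_identity(group1, group2), 1))
```

Passing one group returns the within-group average; passing two groups returns the cross-group average, which corresponds to the comparison across XIP1 and XIP2.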

Comparison of ar/R selectivity filters in PtMIPs and XIPs
Table 3 footnotes: a The names are prefixed with "F-" to distinguish fungi XIPs from other XIPs. b Deviation of the NPA signature motif in loops B and E is reported. '-' indicates that the NPA motif is conserved.
Figure 2. Phylogenetic analysis of XIPs. All 35 XIPs from dicot plants, fungi, moss and protozoa have been used to construct the phylogenetic tree using the NJ method. The multiple sequence alignment for creating the phylogenetic tree was generated by T-COFFEE. XIPs from dicot plants, fungi and moss cluster separately. All dicot XIPs cluster into two subgroups, XIP1 and XIP2. All fungi XIPs and some dicot XIPs (Table 2) have been identified in this study. Other dicot XIPs, moss XIPs and the lone XIP from protozoa were identified by Danielson and Johanson. The names of Populus XIPs, dicot XIPs identified in this study and fungi XIPs are given in Tables 1, 2 [...]
Knowledge of three-dimensional structure helps to understand the mechanism of a protein's function at the molecular level. To date, the structure of only one plant MIP protein (SoPIP2;1) has been determined experimentally at the atomic level [49]. Homology modeling has been used to build three-dimensional models of plant MIPs, and it helped to identify different structural subclasses based on the residues in the ar/R selectivity filter [21,66]. Such an approach also helped to identify the group conservation of small/weakly polar residues at the helix-helix interface.
We have modeled all the PtMIP proteins and the additional XIPs found in other dicot plants, fungi, moss and protozoa. We have analyzed the ar/R selectivity filters of all PtMIPs with a specific focus on XIP proteins. The non-XIP proteins from Populus have been compared with the XIPs. Our structure-based sequence alignments of PtMIPs and XIPs help us to identify features in XIP proteins that distinguish them from MIPs of other subfamilies.
Analysis of ar/R selectivity filters in PtPIPs, PtTIPs, PtNIPs and PtSIPs indicates that the residues forming the selectivity filter region are very similar to their counterparts in the other three plants compared in this study. Only three out of 49 non-XIPs show some distinct features in this region (Table 5). With the lone exception of PtPIP2;10, all PIPs from Arabidopsis, rice and maize and 14 out of 15 PtPIPs have Phe from helix H2, His from helix H5, and Thr and Arg from loop E (LE1 and LE2 positions) forming the ar/R selectivity region. PtPIP2;10 has Asn in place of Phe at the H2 position, making the pore constriction more hydrophilic (Figure 3A). Among the PtNIPs, PtNIP1;5 is somewhat similar to the other members of the PtNIP1 subgroup. However, it has two small residues at positions H5 and LE1, making the constriction at this point relatively larger. Similarly, PtSIP1;1 has a unique substitution in the ar/R tetrad in which the conserved Arg in loop E is replaced by the bulky hydrophobic Phe. With the other three positions occupied by hydrophobic residues (Ile in H2, Val in H5 and Pro in LE1), this could be one of the most hydrophobic pore constrictions among the MIP members.
The ar/R selectivity filters of the dicot XIPs fall into four groups (Table 6). In the first group, thirteen XIP sequences have Val/Ile (H2), Thr (H5), Ala (LE1) and Arg (LE2) as the ar/R signature. This is similar to the ar/R filter of PtNIP3;1 and PtNIP3;2, in which the hydrophobic and Thr (or Ser) residues are interchanged between positions H2 and H5. In the second group, the Ala at the LE1 position of the first group is replaced by Val, making it more hydrophobic than the first group. In the third group, the hydrophobic residues Val and Ile occupy three out of four positions (H2, H5 and LE1), with the conserved Arg at LE2 retained. This results in a highly hydrophobic environment at the pore constriction (Figure 4A) and is somewhat similar to PtSIP1;1.
The last group with one protein (PtXIP1;4) has ar/R tetrad similar to some of the NIP members of rice and maize (OsNIP2;1, OsNIP2;2, OsNIP3;2, OsNIP4;1, ZmNIP2;1 and ZmNIP2;2). Small residues Ala/Thr are observed in three out of four positions making the constriction larger.
In general, dicot XIP members from groups II and III significantly deviate from other subfamilies of PtMIPs and display more hydrophobic character at the ar/R selectivity filter compared to other PtMIPs.
Comparison of ar/R filters in moss XIPs (Table 6) indicates that all three of them have different signatures and hence each one can be considered as a separate group. PpXIP1;1 has a signature similar to a TIP protein from rice and maize (OsTIP4;2 and ZmTIP4;3). Similarly, ar/R tetrad of PpXIP1;2 has resemblance to another TIP protein from rice and maize (OsTIP5;1 and ZmTIP5;1). Interestingly, these two ar/R motifs are not found in Arabidopsis. The third XIP from moss has a Tyr at H2 position and Tyr residue has not been observed as part of the ar/R signature in any of the 160 plant MIPs analyzed from the four plant species. The ar/R filters of all three moss XIPs are more hydrophilic than their counterparts in dicot plants.
The only example from the protozoa has bulky residues in all four positions that form the ar/R filter. Danielson and Johanson [24] have observed that this non-plant sequence from amoeba has some of the sequence characteristics such as NPA boxes and ar/R filter different from other XIPs.
The majority of fungi XIP sequences (7 out of 9, forming group I) have an ar/R tetrad in which the H2 position is occupied by Asn (Table 6). Small residues are found at the H5 and LE1 positions and the highly conserved Arg is observed at LE2 (F-TaXIP has a Lys residue in this position; Figure 4B). This signature is very different from that of dicot plant XIPs, which are more hydrophobic. However, the group I fungi XIPs show a striking similarity with the ar/R filter of a moss XIP (PpXIP1;1), which in turn is similar to some of the rice and maize TIPs. Asn at the H2 position is replaced by Gln in PpXIP1;1 and the other features of the ar/R filter are retained. Similarly, F-TsXIP from group II of fungi MIPs has an ar/R signature similar to that of PpXIP1;2. The weakly polar and hydrophobic residues at the H5 and LE1 positions are interchanged in the moss XIP. The XIP forming the third group in fungi (F-TvXIP2) is the only example that shows some similarity to group I dicot XIPs. One hydrophobic residue and two small/weakly polar residues, together with the conserved Arg at LE2, characterize the ar/R motif in this group, which is also shared by some members of Populus NIPs (PtNIP3;1 and PtNIP3;2).
In summary, PtMIPs that do not belong to the XIP subfamily have ar/R selectivity filters similar to those found in Arabidopsis, rice and maize. The residues forming the ar/R tetrad in fourteen dicot XIP sequences are similar to those of NIP sequences from Populus, rice and maize. The ar/R selectivity filters of the remaining eight dicot XIPs are more hydrophobic in nature and lack counterparts in the other subfamilies of the plants considered in this study. On the other hand, the moss and fungi XIPs have ar/R constrictions that are more hydrophilic and similar to those of rice and maize TIPs. The analysis of ar/R selectivity filters based on homology modeling shows a clear distinction between dicot XIPs and moss/fungi XIPs.
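The qualitative statements about "more hydrophobic" and "more hydrophilic" filters can be made concrete with a simple score. The sketch below ranks a few of the tetrads named above by their mean Kyte-Doolittle hydropathy index. This scale is brought in here purely for illustration; the comparison in this study rests on inspection of the modeled residues, not on this index. The tetrads are written in the order H2, H5, LE1, LE2, and where the text allows Val or Ile at H2 the sketch arbitrarily uses Val.

```python
# Kyte-Doolittle hydropathy index; higher values are more hydrophobic.
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9,
    "R": -4.5,
}


def mean_hydropathy(tetrad: str) -> float:
    """Average hydropathy of the four ar/R residues (H2, H5, LE1, LE2)."""
    return sum(KYTE_DOOLITTLE[residue] for residue in tetrad) / len(tetrad)


# Tetrads taken from the description in the text (order: H2, H5, LE1, LE2).
TETRADS = {
    "typical PIP": "FHTR",
    "PtPIP2;10": "NHTR",
    "PtSIP1;1": "IVPF",
    "dicot XIP group I": "VTAR",   # Val chosen for the Val/Ile position
    "dicot XIP group II": "VTVR",  # Ala at LE1 replaced by Val
}

ranked = sorted(TETRADS, key=lambda name: mean_hydropathy(TETRADS[name]),
                reverse=True)

if __name__ == "__main__":
    for name in ranked:
        score = mean_hydropathy(TETRADS[name])
        print(f"{name:20s} {TETRADS[name]}  {score:+.2f}")
```

A four-residue average ignores side-chain orientation and pore geometry, so it is only a rough proxy for the environment seen in the three-dimensional models.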

Comparison of loops in XIPs and other MIP subfamily members
Although the transmembrane segments in aquaporins provide the structural scaffold and define the channel environment, the loops connecting the TM helices also have a significant role in channel function such as gating [49] and could possibly be involved in selectivity as well [58,81]. Among the five loops (A to E), the high conservation of residues observed in loops B and E is due to these loops possessing the NPA signature motif and their residues defining the channel interior and selectivity filter. Loop A, connecting H1 and H2, was used to discriminate the groups within the Populus PIP family [73]. Loops C and D have been implicated in solute selectivity [58] and gating [49] respectively. Hence, features observed in these loops could be an important factor in (i) giving rise to different MIP subgroups, (ii) determining the nature of the solute that is transported and (iii) the functioning of the channel itself. We specifically focused on loops C and D to find out whether they could be used to discriminate PtXIPs from the other Populus subfamily members. We also analyzed dicot XIPs and fungi/moss XIPs separately. We first used structure-based sequence alignment to segregate the sequences in the loop regions and then used the T-COFFEE [79] method to align only the part belonging to the respective loop regions, both from all MIP sequences and independently from the subfamilies.
[Figure caption: Ar/R selectivity filters of PtPIP2;10 and PtSIP1;1.]

Loop C
Among the four known plant MIP subfamilies, loop C is longest in PIPs (> 20 residues) and shortest in a subgroup of SIPs (SIP2s; 14 residues) (Table 7). Exceptions are observed in a few members. For example, ZmTIP5;1 has 29 residues. The same analysis for XIP members shows some interesting features. In general, the length of loop C can be used to distinguish the dicot and moss XIPs from other plant MIP subfamilies. All 18 dicot XIPs belonging to the first subgroup (XIP1s) have a much longer loop C with 33 residues (Figure 5). The same loop in XIP2 members is shorter by 8 residues, but still 5 residues longer than in plant PIPs. Surprisingly, loop C of all moss XIPs is similar to that of the dicot XIP1s, with more than 30 residues in every case. Fungi XIPs, on the other hand, have the shortest loop C among all XIPs, and its length, 20 residues, is comparable to that of plant PIPs (Figure 5).
Analysis of loop C residues indicates that some MIP families are enriched in Gly residues in this loop. All XIPs have at least three Gly residues and dicot XIPs have more Gly residues than any other MIPs (Table 7). Twenty out of 22 dicot XIPs have at least five Gly residues in loop C (Figure 5). Similarly, loop C in 52 out of 54 PIPs contains at least four Gly residues (Additional file 6). However, TIPs and NIPs possess fewer Gly residues in loop C than PIPs and XIPs, although some exceptions are seen. For example, OsNIP1;2 and OsNIP1;5 have 9 and 7 Gly residues respectively in loop C. SIPs have the fewest Gly residues (2 or 1) in this loop. The longer loop and larger number of Gly residues indicate that loop C in dicot XIPs is much more flexible than in other MIP members.
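Loop length and Gly content of the kind summarized in Table 7 are straightforward to tabulate once the loop segments have been extracted from the structure-based alignment. The sketch below assumes the loop C segments are already available as ungapped strings grouped by subfamily; the two sequences in the example are invented placeholders, not real loop C sequences, and the function and key names are illustrative.

```python
from statistics import mean


def loop_c_summary(loops_by_group):
    """Summarize loop length and Gly content for each group of loop segments."""
    summary = {}
    for group, loops in loops_by_group.items():
        lengths = [len(seq) for seq in loops]
        gly_counts = [seq.upper().count("G") for seq in loops]
        summary[group] = {
            "min_len": min(lengths),
            "max_len": max(lengths),
            "mean_len": mean(lengths),
            "min_gly": min(gly_counts),
            "max_gly": max(gly_counts),
        }
    return summary


if __name__ == "__main__":
    # Hypothetical loop segments: a long Gly-rich loop and a shorter one.
    example = {
        "long-loop group": ["GSGAGLTGGVNPARSFGPAVGGDSTLKYWEGHA"],
        "short-loop group": ["GFQKGLYESNGGGANVVAPG"],
    }
    for group, stats in loop_c_summary(example).items():
        print(group, stats)
```

Reporting the minimum Gly count per group mirrors statements of the form "at least five Gly residues", while the length range exposes exceptions such as unusually long loops within a subfamily.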
When we analyzed loop C of the human counterparts, four out of thirteen human aquaporins (AQP3, AQP7, AQP9 and AQP10) were found to contain 35 residues in loop C, and all four also possess at least 3 Gly residues (Table 7). These human MIP homologs are known to be glycerol transporters, a feature also shared by the prototype glycerol transporter, the bacterial GlpF. With 39 residues, loop C of GlpF is one of the longest known in the aquaporin family. Most of the other human aquaporins have a loop C of 20 to 23 residues, shorter by more than 10 residues than in their glycerol-transporting counterparts. Although it is tempting to correlate the length of loop C with the glycerol-transporting property, several plant NIPs are known to transport glycerol [65] even though their loop C is much shorter, only half the length observed in dicot XIPs and GlpF. However, the idea that loop C residues play a role in the selectivity of solute transport has support from experimental studies (see Discussion).

Loop D
The crystal structure of a plant plasma membrane aquaporin clearly demonstrates the involvement of loop D in gating of the channel [49]. Loop D is, in general, shorter than loop C. Among the four major non-XIP subfamilies, PIPs have the longest loop D with 13 to 14 residues (Additional file 7). D loops in SIPs are the shortest with 8 to 9 residues (Table 7). There are some exceptions, such as AtPIP1;4 and AtNIP1;1, which have more than 20 residues in loop D. Analysis of loop D sequences in XIPs indicates that all of them have a slightly longer loop D (15 to 16 residues) than plant MIPs from the other subfamilies (Figure 6).
Computational studies on mammalian AQP1 have indicated that the basic residues in loop D could be significant for cation transport in the central channel formed by the tetramer [82]. We have examined the occurrence of charged residues in loop D of all plant MIP families (Table 7). In general, loop D in dicot XIPs is more basic, with at least three basic residues, than its counterparts in moss and fungi (Figure 6). Loop D of all the fungi XIPs is rich in proline residues, whereas no proline is observed in the same loop in the majority of dicot XIPs. Among non-XIP members, PIPs have four basic residues compared to two or fewer in TIPs, NIPs and SIPs (Additional file 7). Similarly, two out of four glycerol-transporting human AQPs have fewer basic residues than other human homologs. This analysis indicates that the possible influence of loop D on gating of the central channel could differ between species.
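The residue classes discussed for loop D (basic residues and proline) can be counted per sequence as follows. This is a minimal sketch that assumes Lys and Arg are the basic residues of interest and treats His as optional, since the text does not say whether His was counted; the example loop is a made-up string rather than a real loop D sequence.

```python
def loop_d_composition(loop: str, include_his: bool = False) -> dict:
    """Count basic, acidic and proline residues in one loop segment."""
    loop = loop.upper()
    basic_set = {"K", "R"}
    if include_his:
        basic_set.add("H")  # assumption: His is counted only on request
    return {
        "length": len(loop),
        "basic": sum(residue in basic_set for residue in loop),
        "acidic": sum(residue in {"D", "E"} for residue in loop),
        "proline": loop.count("P"),
    }


if __name__ == "__main__":
    hypothetical_loop = "SKRHLTDPARKGVAA"  # placeholder, not a real loop D
    print(loop_d_composition(hypothetical_loop))
    print(loop_d_composition(hypothetical_loop, include_his=True))
```

Keeping His behind a flag makes it easy to see whether a conclusion such as "at least three basic residues" depends on that choice.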

Group conservation of residues at the helix-helix interface
Analysis of high-resolution crystal structures of MIP homologs showed that small and weakly polar residues (Ala, Gly, Ser, Thr and Cys) occur at the helix-helix interface of the transmembrane helix bundle [21,54,83]. Structure-based sequence alignment of 105 MIP sequences from Arabidopsis, rice and maize indicated that these residues are conserved as a group at 17 positions at the helix-helix interface in MIP proteins [21]. A high abundance of such residues helps to mediate helix-helix interactions and close packing of helices [84]. In this study, we have analyzed the group conservation at those 17 positions by considering the 55 Populus MIPs and all the XIPs using structure-based sequence alignment. Our results show that in Populus MIPs, too, small and weakly polar residues are group conserved at the helix-helix interface (Table 8). As observed in the other three plant species, PtPIPs have the highest conservation, with all 17 positions 100% group conserved (Additional file 8). They are followed by PtTIPs (82-100%) and PtNIPs (91-100%). Group conservation at the helix-helix interface is in general high in PtSIPs and PtXIPs, although some positions are poorly conserved. The conservation of Ala 78, Gly 82 and Ser 181 (the numbering followed here is that of 1Z98, the structure of SoPIP2;1) is below 50% in PtSIPs. Similarly, the positions corresponding to Thr 55, Ala 103, Ser 181 and Ala 256 are either poorly conserved (< 25%) or not conserved at all in PtXIPs. It must be mentioned that the number of sequences considered for PtXIPs is only six; however, analysis of all 22 dicot XIP sequences gives rise to a similar observation (Table 8).
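Group conservation at a position is the percentage of sequences that carry any member of the small/weakly polar set (Ala, Gly, Ser, Thr, Cys) in the corresponding alignment column. The sketch below assumes each column is supplied as a string with one character per sequence and that a gap counts as not conserved; both are conventions chosen for illustration, and the example columns are hypothetical.

```python
SMALL_WEAKLY_POLAR = frozenset("AGSTC")


def group_conservation(column: str) -> float:
    """Percentage of sequences with a small/weakly polar residue in a column."""
    if not column:
        raise ValueError("column must contain at least one residue")
    hits = sum(residue in SMALL_WEAKLY_POLAR for residue in column.upper())
    return 100.0 * hits / len(column)


def conservation_table(columns_by_position):
    """Map each alignment position to its rounded group conservation."""
    return {pos: round(group_conservation(col))
            for pos, col in columns_by_position.items()}


if __name__ == "__main__":
    # Hypothetical columns for 22 sequences: residues differ, but most
    # belong to the small/weakly polar group at the first position.
    columns = {
        "position A": "AGSTAGSTAGSTAGSTALLVVI",
        "position B": "AGSTCLLLLLLLVVVVVIIIFF",
    }
    print(conservation_table(columns))
```

Because the residues are pooled into one class, a column can score highly even when no single amino acid dominates, which is the sense in which these positions are described as conserved as a group.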
Analysis of the 9 fungi XIPs indicates that the group conservation of small and weakly polar residues is 100% for 9 positions and very high for another 5 positions. There are differences between dicot and fungi XIPs. For example, at position 181, although the group conservation is only 23% in dicot XIPs, Gly is 100% conserved in fungi XIPs. However, we observed the opposite at position 82. In the dicot XIPs, the group conservation at this position is 77%, while in the fungi XIPs there is no conservation at all. Similarly, position 55 is reasonably well conserved in fungi XIPs and poorly conserved in dicot XIPs.
[Figure caption: Ar/R selectivity filters of PtXIP2;1 and F-FoXIP.]
In the previous analysis, we observed that subfamilies show a strong preference for one or another amino acid at certain positions [21]. A similar trend is observed in Populus MIPs. Notably, position 226 is occupied by either Ser or Ala in PtPIPs, PtTIPs, PtNIPs and PtSIPs, whereas PtXIPs show a strong preference for Cys at that position (Additional file 8). Similarly, at position 253 Ala/Gly is predominantly found in the four non-XIP subfamilies and a preference for Cys is found in PtXIPs. This is also confirmed in the analysis of 35 XIPs, all of which have Cys at position 226. At position 253, only dicot XIPs show a strong preference for Cys (Table 8).
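The group-conservation percentages discussed above can be illustrated with a short sketch. It assumes the aligned sequences are available as equal-length strings and that gaps count as non-conserved; the function names and the toy alignment are hypothetical and are not taken from the original analysis.

```python
# Small and weakly polar residues treated as a single group (Ala, Gly, Ser, Thr, Cys).
SMALL_WEAKLY_POLAR = frozenset("AGSTC")


def group_conservation(column):
    """Percentage of residues in one alignment column that belong to the group.

    Assumption: gap characters are counted as non-conserved.
    """
    if not column:
        raise ValueError("empty alignment column")
    hits = sum(1 for residue in column if residue.upper() in SMALL_WEAKLY_POLAR)
    return 100.0 * hits / len(column)


def conservation_profile(aligned_seqs, positions):
    """Map each 0-based alignment column in `positions` to its group conservation."""
    if len({len(seq) for seq in aligned_seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return {p: group_conservation([seq[p] for seq in aligned_seqs]) for p in positions}


# Hypothetical toy alignment (not real MIP sequences).
toy_alignment = ["AGLS", "SGVT", "GAIC", "TG-L"]
profile = conservation_profile(toy_alignment, [0, 1, 2, 3])
```

In this toy case the first two columns are fully group conserved, the third is not conserved at all, and the fourth is conserved in three of four sequences. In practice the column indices would have to be mapped from the 1Z98 residue numbering to the alignment columns.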

Gene Structure of MIPs

Non-XIP Populus MIPs
The availability of three plant genomes, two dicotyledons and one monocotyledon, enabled us to analyze and compare the gene structures of MIP genes belonging to different subfamilies and different species. Recently, gene structures of MIPs from the avascular plant Physcomitrella patens have also been analyzed [24]. Although the exon-intron organization of AtMIPs has been reported [18], a comparison of MIP gene structures across the three plant species has not been carried out. We have compared the gene structures of PtMIPs with those of OsMIPs and AtMIPs. In general, the comparison shows that the number and positions of introns are characteristic of, and conserved within, each subfamily of a given species. However, major differences are observed when the subfamilies from dicots are compared with those from the monocot.
Comparison of members of the PIP subfamily shows that the majority of PtPIPs have three introns, similar to AtPIPs (Figure 7). However, only 3 out of 11 OsPIPs have the same organization. Eight OsPIPs have lost at least one intron (two of the OsPIPs belonging to the indica-cultivar group have been excluded from this analysis). OsPIP1;3 and OsPIP2;7 have only one intron and OsPIP2;8 has no intron. In most of the OsPIPs, the intron between the helices H2 and H3 has been lost. A similar result is observed for the NIP subfamily (Figure 7).

Populus XIPs versus moss/fungi XIPs
The exon-intron pattern of five out of six PtXIPs has already been reported and compared with that of two moss XIPs [24]. Two introns in the N-terminal region are observed in six out of seven XIPs. Due to the high degree of variation observed in the N-termini, no conclusion was reached regarding the conservation of intron positions between the moss and Populus. Since the fungi XIPs have been identified from their respective genome sequences, it is possible to derive the gene structures of these MIP sequences and compare them with those of Populus and Physcomitrella. It is interesting to note that, in addition to the N-terminal intron, six out of nine fungi XIPs have at least one additional intron (Figure 8). In all six of them, an intron is present between helices H5 and H6. In three cases, additional introns are present between helices H2 and H3 and also between H3 and H4.

Transcript abundance of non-PtXIPs and PtXIPs
Expression levels of all Populus MIPs were analyzed using Affymetrix microarray-based Poplar genome arrays [85] as described in the Methods section. We have reanalyzed the Populus transcript abundance data generated by Wilkins et al. [85]. Transcript abundance data for 50 out of 55 PtMIPs are available in the microarray dataset. There were no probe sets for two TIPs (PtTIP5;1 and PtTIP5;2) and three NIPs (PtNIP1;1, PtNIP1;2 and PtNIP1;5). A heatmap (Figure 9) was produced for the remaining Populus MIPs using the expression profiles obtained for nine different tissues (seedlings grown under three different light conditions, young and mature leaves, female and male catkins, roots and xylem). Probe sets were grouped by hierarchical clustering of their transcript abundance patterns, and the heatmap was displayed in this clustered order using the program Heatplus [86]. The major PtMIPs expressed in xylem, the tissue that forms the woody stem, are PtPIPs and PtTIPs. A similar result is observed in root tissues. The largest number of PtTIPs is expressed in seedlings grown under different light conditions. PtNIPs and PtSIPs are the predominant members expressed in male and female catkins. No appreciable accumulation of NIP and SIP transcripts is found in mature leaf or in seedlings grown in continuous darkness. The same is true for PIPs in female catkins and in seedlings grown in continuous darkness and then transferred to light for 3 hrs. PtXIPs are expressed in seven of the nine tissues studied; only in xylem and female catkins is no XIP member found to be expressed. Transcripts of two XIPs are found in male catkins, root and the three seedling tissues grown under different light conditions. A single XIP is expressed in mature leaf (PtXIP1;5) and in young leaf (PtXIP2;1).

Discussion
Due to whole-genome duplication events, the number of protein-coding genes in Populus is higher than that observed in Arabidopsis [67]. In the present study, we have found 55 Populus MIP genes, considerably more than the 35 Arabidopsis MIPs. Populus thus has ~1.6 times as many MIP genes as Arabidopsis. This agrees with the reported observation, based on comparative genomics studies, that 1.4 to 1.6 putative Populus homologs are found for each Arabidopsis gene [67]. The number of MIPs in rice and in maize is also less than forty [19][20][21].

Non-XIP MIPs from eudicot genomes have similar ar/R filters and gene structures
Homology modeling was used to analyze the aromatic/arginine selectivity filters of plant MIPs. Ar/R tetrads from non-XIP PtMIPs were analyzed and compared with those of their counterparts from Arabidopsis, rice and maize. The ar/R filters of only three out of 49 non-XIP PtMIPs seem to be different from those of the other three plants. Although the larger number of TIPs in Populus suggested possible diversity in the solutes transported by this subfamily, analysis of the ar/R selectivity filters of all PtTIPs indicated otherwise. They are identical to those of AtTIPs, and no PtTIP member was found to have an ar/R filter that can be described as novel. Similarly, nine out of 11 PtNIP members have counterparts in AtNIPs. One or two examples are found among NIP and SIP members where the ar/R filter is identical or similar to that of rice/maize members. The analysis of ar/R selectivity filters did not reveal any surprises and shows that the majority of non-XIP PtMIPs are similar to their counterparts in Arabidopsis.
The availability of two eudicot genomes (Populus and Arabidopsis) and one monocot genome (rice) helps us to analyze and compare the gene structures of plant MIPs. Differences observed in the exon-intron organization of MIPs from these three plant species can shed light on the evolution of the eudicot MIP gene family and also on the divergence of monocot MIPs from dicots. Intron loss is observed in the majority of the OsPIPs and OsNIPs compared to the same subfamilies in Arabidopsis and in Populus. The loss of introns observed in OsPIPs and OsNIPs might have occurred independently during the evolution of rice to achieve genome slimming [87]. It is also tempting to speculate that the intron loss in rice might have happened during the divergence of monocotyledonous and dicotyledonous plants that occurred about 200 million years ago (Mya) [88]. However, such a generalization is possible only after analyzing plant MIP gene structures from a large number of monocot and dicot plants. In this context, we would like to point out the recent work of Roy and Penny [89], who have observed a high degree of intron loss along a wide variety of eukaryotic lineages. They have also found that intron losses have outnumbered intron gains during the evolution of plants.

XIPs in dicots and fungi differ in Ar/R selectivity filter, loop C and gene structure
Our TBLASTN search on plant EST databases and fungi genomic sequences identified additional XIPs from dicot plants and fungi. In total, we considered 35 XIPs from dicots, fungi and moss for characterizing this new subfamily. We analyzed several features including the nature of ar/R selectivity filters, loop lengths, conservation of residues at the helix-helix interface and gene structures and these features were compared between different species groups within XIPs to understand the evolution of this uncharacterized subfamily. Comparison was also made between XIPs and other four subfamilies of Populus.

Ar/R filters in XIPs are hydrophilic in moss/fungi and more hydrophobic in dicot plants
Homology models of XIPs were analyzed and divided into structural subclasses based on the nature of residues that constitute the ar/R selectivity filters. The 22 dicot XIPs were divided into four structural subclasses. Fourteen of them from two groups are similar to the NIP subgroups from the four plants analyzed in this paper. Eight dicot XIPs from the remaining two groups have bulky hydrophobic residues occupying two/three of the four positions. Some SIPs from Populus, rice and maize have such arrangement although they lack the conserved Arg at LE2 position. All the three XIPs from moss and eight out of 9 fungi XIPs have hydrophilic residues occupying two out of four positions. Small residues are found in the remaining two positions of most of the fungi and all the moss XIPs. This arrangement is very similar to some of the rice and maize TIPs but it is not found in Populus and Arabidopsis.
In general, the ar/R selectivity filters of fungi and moss XIPs are more hydrophilic than those of their dicot counterparts. This suggests that the solutes transported by dicot XIPs are likely to differ considerably from those transported by their counterparts in fungi/moss.
Ar/R filters of some of the XIPs are presented in the recent work of Danielson and Johanson [24]. Two possibilities are given for the residue at the H5 position in their work. In the present study, aligning the target sequence with the template sequences during the homology modeling procedure aligned the conserved Gly in H5; hence, in our models, the residue at the H5 position of the ar/R filter is the alternate residue reported in their paper [24]. This residue is Ser/Thr in most of the cases. Even if we consider the other possibility for the H5 position (Val/Ile for dicot XIPs) as reported in [24], the ar/R filter of dicot XIPs becomes even more hydrophobic compared to fungi/moss XIPs. There is also a disagreement in the ar/R tetrad reported for SmXIP1;1. Our model shows that the H2 position in this moss XIP is occupied by a Tyr residue, whereas Danielson and Johanson [24] have reported a Leu residue at this position. Our structure-based sequence alignment shows that Tyr is more likely to occupy this position (data not shown), which also makes the ar/R filter more hydrophilic, as observed in the other two moss XIPs (PpXIP1;1 and PpXIP1;2).

Can the length of loop C be used as an indicator of XIP subfamilies?
Although the significance of loops B and E in aquaporin channel function is well established, recent crystal structures from spinach [49] and the malarial parasite Plasmodium falciparum [58] have indicated roles for two other loops, D and C, in the channel's gating and selectivity. Loop C connects the two halves of the channel protein, linking the transmembrane segments H3 and H4. The length of this loop in known crystal structures varies from 20 residues in water-transporting human AQP1 [52] to 39 residues in glycerol-transporting GlpF [53]. This loop tucks into the channel core towards the ar/R selectivity filter and comes in close contact with the Arg residue at the LE2 position of the ar/R tetrad. The nature of residues in loop C is suggested to influence the solute molecules approaching the extracellular vestibule [58,81]. The length of loop C seems to be characteristic of different plant MIP subfamilies. Analysis of loop C in XIP members shows variation among the XIP subfamilies. Dicot XIP1s, moss XIPs, glycerol-specific GlpF and all four glycerol-transporting human AQP homologs have a loop C that is more than 30 residues long. Dicot XIP2s and fungi XIPs have a shorter loop C with 20 to 25 residues, as observed in other human AQP homologs. Loop C in GlpF has been suggested to provide an attractive site for glycerol in the periplasmic vestibule [81]. Although there seems to be a correlation between the nature of solute transport and the length of loop C, this relationship could not be clearly established. For example, it appears that all glycerol-transporting MIPs will have a long loop C with > 30 residues. However, several plant NIPs have been shown to transport glycerol [65] and some of them have a much shorter loop C with fewer than 20 residues. Similarly, if we assume that all water-transporting channels have a shorter loop C with < 25 residues, then the moss XIPs with hydrophilic ar/R selectivity filters, which are likely to transport water along with other hydrophilic solutes, do not fit this pattern because their loop C has more than 30 residues. A clearer picture is likely to emerge if we have functional data on more MIPs that can be directly linked with the length of loop C.

Exon-intron pattern observed in Populus and fungi XIPs

Figure 9 Relative transcript abundance profiles of Populus MIPs. A heat map showing the transcript abundance of MIPs from all Populus subfamilies [85] is displayed using the program "Heatplus" [86]. This heat map is produced using the expression data obtained from the Populus eFP browser [103]. The transcript abundance levels for the Populus MIPs were clustered using hierarchical clustering based on Pearson correlation coefficients. Each row corresponds to the normalized expression profile of a particular gene, whose name is shown. Data obtained for nine different tissues for each gene are represented in columns. Symbols in the map are as follows: ML, mature leaf; YL, young leaf; R, root; DG, dark-grown seedlings, etiolated; DL, dark-grown seedlings, etiolated and then exposed to light for 3 hrs; CL, continuous light-grown seedlings; FC, female catkins; MC, male catkins; X, xylem. The data are normalized for each gene (row-normalized). The relative transcript accumulation is represented by a color code, with green and red showing lower and higher levels of transcript accumulation, respectively.

We have also recognized another interesting feature: some of the MIP subfamilies are enriched with Gly residues in loop C. Dicot XIPs and PIPs have at least five and four Gly residues, respectively, in loop C. These Gly residues may be present to impart flexibility to the loop, or they may adopt conformations that are not allowed for other residues. We have examined the conformations of the three Gly residues that are part of the "GGG" motif in both chains of the spinach PIP structure (PDB ID: 1Z98; [49]). All three Gly residues have a positive φ value (+66 to +102°) and a ψ value close to zero (-12 to +17°), a conformation that is not accessible to other residues. Hence it is possible that the Gly residues in the "GGGxN" and "GGC" motifs in loop C of PIPs and XIPs play an important conformational role.
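The backbone angles quoted above are ordinary dihedral angles: φ is defined by the atoms C(i-1), N(i), Cα(i), C(i) and ψ by N(i), Cα(i), C(i), N(i+1). The following is a minimal sketch of how such an angle can be computed from four atomic coordinates. The helper names are hypothetical, the range check simply encodes the φ/ψ intervals reported in the text, and the coordinates used are an artificial geometry, not atoms from 1Z98.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points (x, y, z)."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)   # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)   # normal of the plane through p1, p2, p3
    y = _dot(b2, _cross(n1, n2))
    x = math.sqrt(_dot(b2, b2)) * _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def in_reported_gly_range(phi, psi):
    """True if (phi, psi) lies in the intervals reported in the text for the GGG motif."""
    return 66.0 <= phi <= 102.0 and -12.0 <= psi <= 17.0


# Artificial right-angle geometry used only to check the sign convention.
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

For a real structure, the four points would be the backbone atom coordinates read from the PDB file for each residue of interest.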

Intron loss is observed in moss and Populus XIPs
The exon-intron pattern helps us to understand the evolution of XIP genes. Since all the Populus and fungi genes were identified from their respective genome sequences and the gene structures of Physcomitrella XIPs have already been reported [24], it was possible to compare the exon-intron organization of these 17 XIPs (6 from Populus, 9 from fungi and 2 from P. patens). While six out of 9 fungi XIPs have at least two introns, a single intron at the N-terminus is found in Populus and the moss XIPs. It appears that intron loss occurred when the moss plants diverged from fungi, with moss XIPs retaining only the N-terminal intron. When the lineage leading to dicotyledons diverged further from the mosses, no introns were inserted in the coding region of XIPs. While the gene structures of moss and dicot XIPs are similar, the fungi XIPs, with more introns, have a different pattern of exon-intron organization.

Evolution of dicot XIPs
Several reports, including fossil studies and molecular clock estimates, have speculated about the animal and plant evolutionary lines. Recent protein sequence analysis has estimated that major lineages of fungi were present more than 1000 million years ago and land plants appeared after 300 million years [90]. Analysis of MIPs from primitive organisms to higher animals will help to understand the evolution of these channel proteins and their mechanisms for transporting diverse solutes at the molecular level. Analysis of ar/R selectivity filters, loops and exon-intron organization of 34 XIPs from fungi, moss and dicot plants has given an idea about the evolution of this subfamily of aquaporins from fungi to higher plants (the XIP from protozoa is not included in this Discussion). The hydrophilic ar/R selectivity filter in the fungi and moss XIPs indicates that these MIPs are likely to be involved in the transport of hydrophilic solutes including water. The emergence of higher plants could possibly indicate more diversity in the solutes that are transported. The amino acids in the hydrophilic ar/R selectivity filters of fungi and moss XIPs were substituted by hydrophobic residues during the divergence of higher plants, and this selectivity filter has become more hydrophobic in the dicot XIPs. As a result, the dicot XIPs are likely to be involved in the transport of solutes that are more hydrophobic than those transported by their counterparts in fungi and moss. Since no XIP homolog is found in monocots, the XIPs might have been replaced there by some of the TIPs and NIPs with similar ar/R selectivity filters and transcript abundance profiles. Loop C in moss XIPs is longer than that of fungi XIPs, and hence an insertion of more than 10 residues has occurred in loop C of moss XIPs. While this length is retained in the XIP1 group of dicot plants, the dicot XIP2s have a loop C that is shorter by 8 residues.
Hence, a deletion event seems to have occurred when dicot XIP2s evolved from moss or diverged from dicot XIP1s. As far as loop D is concerned, dicot XIPs have more basic residues in loop D than their counterparts in fungi/moss, and the only other subfamily with a high number of basic residues in loop D is the PIPs. As suggested for AQP1 [82], loop D in XIPs could be involved in activating the central tetrameric ion channel upon binding to some signaling molecule. Although evolution has made its mark on the selectivity filters and loops, the group conservation of small and weakly polar residues at the helix-helix interface of the α-helical bundle observed in other MIP subfamilies is largely maintained in XIPs. Analysis of the exon-intron pattern suggests that intron loss occurred in XIP genes when fungi diverged from the lineage leading to primitive and higher plants. In summary, during divergence from fungi and moss, the ar/R selectivity filters of dicot XIPs have become more hydrophobic, loop C has become longer in a subgroup of dicot XIPs and loop D has become more basic. Moreover, analysis of gene structure indicates that moss/Populus XIPs lost introns when they evolved from fungi. The evolutionary features observed for dicot XIPs are summarized in Figure 10. Some of the observations made in this study will be strengthened as more genome sequences become available for different kingdoms, giving a better understanding of the evolution of MIPs at the molecular level.

Conclusion
We have analyzed 55 Populus MIP sequences and compared them with those from Arabidopsis, rice and maize. In addition to the four known MIP subfamilies, Populus has an additional uncharacterized XIP subfamily. The non-XIP Populus members are similar to their counterparts in the other three plants. The ar/R selectivity filters of majority of PtMIPs and the characteristics of loops C and D are similar to AtMIPs, OsMIPs and ZmMIPs. As far as the gene structures are concerned, the number and positions of introns are conserved within each subfamily of a given species. However, the inter-species comparison indicates that PIPs and NIPs of monocots lost introns when they diverged from eudicots.
We have also characterized 35 XIPs belonging to four different taxonomic groups. Our results show that, in comparison to the hydrophilic selectivity filters in fungi and moss XIPs, substitutions in ar/R selectivity filters led to a more hydrophobic constriction in dicot XIPs. A longer loop C due to insertion is observed in moss XIPs and in a subgroup of dicot XIPs relative to fungi XIPs. Intron loss is observed in moss and dicot XIPs relative to fungi XIPs. Analysis of microarray data indicates that Populus XIPs are expressed in almost all the tissues studied and do not show any unique tissue-specific expression. While substitutions in the ar/R tetrad and insertion/deletion events in loops reflect the divergence of these channel proteins, a high conservation of small and weakly polar residues as a group at the helix-helix interface is observed in all MIP subfamilies. Presumably, such group conservation helps to maintain the structural integrity of this channel protein during evolution. Our results indicate that, in comparison to their counterparts in fungi and moss, dicot XIPs are likely to transport more hydrophobic solutes. Loop C in dicot XIPs in general, and in the XIP1 subgroup in particular, is likely to influence solute selectivity.

Identification of Populus MIP genes
The genome sequence of the Populus trichocarpa female individual "Nisqually 1" clone [67] was searched for MIP genes using TBLASTN [76,80]. The whole genome shotgun sequence (WGS) of Populus available at the National Center for Biotechnology Information (NCBI) [75] was used for this purpose with a rice MIP protein sequence (OsPIP2;1) as the query sequence. The hits thus obtained were subjected to phylogenetic clustering (see below). One representative sequence from each cluster was chosen as a query sequence to identify additional and more distantly related Populus MIP homologs. Regions in Populus WGS contigs containing MIP genes were used to determine the gene structure using the program GeneMark.hmm ES-3.0 [77,78]. This version of the GeneMark program is based on a self-training algorithm for prediction of genes from novel eukaryotic genomes. There is significant similarity between Populus and Arabidopsis at the genome level and also in the relative frequency of protein domains [67]. Codon usage is also similar between these two organisms [70]. Hence, for gene prediction in Populus, Arabidopsis was chosen as the model organism in GeneMark. The predicted MIP genes were further compared with the Populus EST sequences available at NCBI and also with the Populus EST database "PopulusDB" [70,72]. The KOG (euKaryotic Orthologous Groups) browser at the Joint Genome Institute (JGI) [91] was also searched for Populus MIP genes.
The program T-COFFEE [79] was used to perform multiple sequence alignment of MIP protein sequences, which was then used to generate the phylogenetic tree. Three different methods were used to construct the evolutionary relationship among the sequences: the neighbor-joining method as implemented in Clustal (Version 1.82) [92], and heuristic searches using distance and parsimony methods as available in PAUP* version 4.0.0d55 in the GCG package (Wisconsin Package version 10.3, Accelrys Inc., San Diego, California). The stability of branches in the resulting trees was confirmed by 100 bootstrap trials for all three methods. The program TreeView [93] was used to display the trees.

Homology modeling of plant MIPs
Three-dimensional models of Populus MIPs and other MIP proteins were built using the same protocol described in our earlier studies [21] to build models of Arabidopsis, rice and maize MIPs and it is briefly described below. Modeling procedure consisted of two stages. In the first stage, the software package MODELLER [94,95] was used to construct homology models of plant MIPs. In the second stage, the program SCWRL3 [96] was used to predict the side-chain conformation. MODELLER derives a set of spatial restraints on the structure of the target sequence using its alignment with the sequence of template structure(s).
The resulting model is derived by minimizing the violations of all spatial restraints. The quality of the model is usually improved by considering more than one structure as template. In this study, we used the structures of mammalian AQP1 [52], bacterial GlpF [53] and archaeal AQPM [55] simultaneously as templates in the comparative modeling procedure. Their PDB IDs are 1J4N, 1FX8 and 2F2B, respectively. All are high-resolution structures (resolution 1.7 to 2.2 Å) and show different water permeabilities [96]. Using the program 'GAP' available in the GCG package and the scoring matrix BLOSUM62, we obtained pairwise sequence alignments of all Populus MIPs with the three template sequences. The average pairwise sequence identities between Populus MIPs and the three templates range from 21 to 45%. Template sequences were first aligned based on a multiple structural superposition and then the target sequence was aligned. The target-template alignment was manually checked for any gap in the middle of a transmembrane helical region or in the conserved loops B or E. If necessary, this alignment was manually refined. We have also analyzed more than 800 MIP sequences from diverse organisms (Gupta and Sankararamakrishnan, Manuscript in preparation) and found that at least one residue in each transmembrane segment (E17, G59, Q103, E144, G175 and P218 respectively in TM1 to TM6; 1J4N numbering) is very highly conserved. We have exploited this information during the alignment of target and template sequences and hence there is less ambiguity in transmembrane segments. The models were built with the resultant target-template alignment using a 'very fast' simulated annealing optimization protocol. Ten models were built for each target sequence and the one with the lowest MODELLER objective function was selected.
The refinement of loops and the side-chain conformations of nonconserved residues were carried out by MODELLER's loop optimization procedure and the graph theory-based SCWRL3 [96] method respectively. Finally, the model was minimized using GROMACS [97,98] and its stereochemical quality was evaluated using PROCHECK [99]. Pore diameter profile of the model along its pore axis was calculated using the program HOLE [100] as described in Bansal and Sankararamakrishnan [21].
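Two bookkeeping steps in the protocol above, computing pairwise sequence identity from an alignment and choosing the model with the lowest objective function, can be sketched as follows. This is an illustration only: the identity definition used here (matches divided by the number of columns with no gap in either sequence) is an assumption and may differ from what GAP reports, and the sequences, model names and objective values are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap.

    Assumption: this denominator is one common convention, not necessarily
    the one used by the GAP program.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def pick_best_model(objective_by_model):
    """Return the name of the model with the lowest objective function value."""
    if not objective_by_model:
        raise ValueError("no models supplied")
    return min(objective_by_model, key=objective_by_model.get)


# Hypothetical aligned fragments and objective values, for illustration only.
identity = percent_identity("MA-GSTLLV", "MAKGSA-LV")
best = pick_best_model({"model_01": 1523.4, "model_02": 1498.7, "model_03": 1510.2})
```

Here seven columns are gap-free in both fragments and six of them match, and `model_02` is selected because it has the smallest objective value.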

Populus microarray analysis
The transcript abundance of all Populus MIPs was analyzed using PopGenExpress, an Affymetrix microarray-based resource for poplar transcriptome analysis [85]. Expression data were obtained from biological triplicate RNA samples extracted from nine tissues by Malcolm Campbell and coworkers [85], and we have reanalyzed these transcript abundance data to find out whether there is any pattern of transcript accumulation among Populus MIP members. The microarray data corresponding to these experiments can be accessed in NCBI's GEO database [101] (accession number: GSE13990). Probe sets corresponding to the putative Populus MIPs were identified using Probe Match, a tool available as part of the NetAffx Analysis Center [102]. The identified probe sets were then used in the Populus electronic fluorescent pictograph browser (Poplar eFP browser) [103] to obtain the transcript abundance levels. For genes with more than one probe set, the median of the expression values was used. When two genes share the same probe set, they are considered to have the same level of transcript accumulation. The probe sets were then clustered using hierarchical clustering based on Pearson correlation coefficients, and the program Heatplus, available in the Bioconductor package [86], was used to display the expression pattern.
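The two numerical steps described above, taking the median across multiple probe sets of a gene and comparing expression profiles by Pearson correlation, can be sketched as below. The original analysis used Heatplus in R/Bioconductor; this Python version is a stand-in for illustration. Using 1 - r as the clustering distance is an assumption (the text only states that clustering was based on Pearson coefficients), and all expression values are invented.

```python
import math
from statistics import median


def collapse_probe_sets(probe_profiles):
    """Combine several probe-set profiles of one gene by taking the per-tissue median."""
    return [median(values) for values in zip(*probe_profiles)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def pearson_distance(x, y):
    """Distance for hierarchical clustering (assumed form: 1 - r)."""
    return 1.0 - pearson(x, y)


# Invented values: three probe sets for one gene measured in three tissues.
gene_a = collapse_probe_sets([[1, 2, 3], [3, 4, 5], [2, 3, 10]])
gene_b = [4, 6, 10]   # same shape as gene_a at twice the level
gene_c = [5, 3, 2]    # opposite trend
```

Because Pearson correlation ignores overall scale, `gene_a` and `gene_b` end up at essentially zero distance, while `gene_c` is negatively correlated with `gene_a` and would be placed in a different cluster.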

Authors' contributions
RS conceived the project. RS and ABG designed the work. ABG carried out the work. RS and ABG wrote the manuscript. Both authors approved the final version of the manuscript.

Figure 10 Evolution of dicot XIPs. A likely scenario for the evolution of dicot XIPs. The evolutionary events are indicated at the point where the XIPs diverged. Dicot XIPs evolved from fungi and moss through substitutions at the ar/R selectivity filter, insertion/deletion in loop C and loss of an intron. However, small and weakly polar residues occurring at the helix-helix interface are highly group-conserved.