Identification and structural characterization of FYVE domain-containing proteins of Arabidopsis thaliana

Background FYVE domains have emerged as membrane-targeting domains highly specific for phosphatidylinositol 3-phosphate (PtdIns(3)P). They are predominantly found in proteins involved in various trafficking pathways. Although FYVE domains may function as individual modules, dimers or in partnership with other proteins, structurally, all FYVE domains share a fold comprising two small characteristic double-stranded β-sheets, and a C-terminal α-helix, which houses eight conserved Zn2+ ion-binding cysteines. To date, the structural, biochemical, and biophysical mechanisms for subcellular targeting of FYVE domains for proteins from various model organisms have been worked out but plant FYVE domains remain noticeably under-investigated. Results We carried out an extensive examination of all Arabidopsis FYVE domains, including their identification, classification, molecular modeling and biophysical characterization using computational approaches. Our classification of fifteen Arabidopsis FYVE proteins at the outset reveals unique domain architectures for FYVE containing proteins, which are not paralleled in other organisms. Detailed sequence analysis and biophysical characterization of the structural models are used to predict membrane interaction mechanisms previously described for other FYVE domains and their subtle variations as well as novel mechanisms that seem to be specific to plants. Conclusions Our study contributes to the understanding of the molecular basis of FYVE-based membrane targeting in plants on a genomic scale. The results show that FYVE domain containing proteins in plants have evolved to incorporate significant differences from those in other organisms implying that they play a unique role in plant signaling pathways and/or play similar/parallel roles in signaling to other organisms but use different protein players/signaling mechanisms.


Background
The FYVE lipid-binding domains were named after the first letter of the four proteins in which they were originally discovered: Fab1, YOTB, Vac1, and EEA1 [1]. FYVE proteins have primarily been associated with functions related to endosomal trafficking e.g. Hrs is involved in sorting of down-regulated receptor molecules in early endosomes [2], Vacuolar protein sorting mutant 27 phenotype (Vps27p) in endosome maturation [3], EEA1 in endocytic membrane fusion [4] and regulation of endosome-to-TGN retrograde transport via phosphatidylinositol 3-phosphate 5-kinase (PIKfyve) [5].
However, they may play other important roles in cell signaling as exemplified by Faciogenital dysplasia 1 in cytoskeletal regulation [6], Fab1p in regulation of membrane homeostasis [7][8][9] and Smad Anchor for Receptor Activation (SARA) [10] as well as endofin in growth factor signaling [11][12][13]. Structurally, FYVE domains share a fold comprising two small double-stranded β-sheets and a C-terminal α-helix, as deduced from experimentally solved structures such as the crystal structure of the FYVE domain from yeast Vps27p [14]. The fold is stabilized by eight Zn2+-coordinating cysteine residues, which bind Zn2+ in pairs such that the first and third pairs bind one zinc atom, while the second and fourth pairs bind the other zinc atom [14]. The FYVE domains have been characterized as phosphoinositide-binding domains that are highly specific for phosphatidylinositol 3-phosphate (PtdIns(3)P) [15][16][17][18]. This ligand recognition is Zn2+-dependent [19] and stems primarily from a conserved ligand-binding motif, i.e. (R/K)(R/K)HHCR, surrounding the third and fourth cysteine residues [14]. Mutagenesis of either the cysteines involved in Zn2+ coordination or the conserved ligand-binding residues results in decreased affinity for PtdIns(3)P [15,19-21].
The PtdIns(3)P-binding signature contains three classic conserved regions: the N-terminal WxxD, the central R(R/K)HHCR and the C-terminal R(V/I)C motifs [14]. Combined, they drive the PtdIns(3)P-specific membrane recruitment of FYVE domains. However, several factors in addition to PtdIns(3)P-binding are thought to contribute to the membrane affinity of FYVE domains: nonspecific electrostatic interactions between the basic face of the domain and the anionic membrane surface [22][23][24], hydrophobic interactions between the residues located in the "turret loop" near the PtdIns(3)P binding pocket and the membrane bilayer [14,23-26], dimerization [19,27] and pH [28]. In addition to working out the structural and functional role of various amino acids comprising the binding motifs, it has also been shown that the binding of PtdIns(3)P to the ligand-binding pocket of FYVE domains neutralizes nearby basic residues, which reduces the local positive potential and allows conserved hydrophobic residues to penetrate the membrane interface, enhancing membrane attachment [22,24,25]. Recently, a molecular dynamics simulation study explored the interactions of the EEA1-FYVE domain and verified that it undergoes a decrease in dynamic flexibility upon binding to its PtdIns(3)P ligand and a phospholipid bilayer [29].
The PtdIns(3)P-binding FYVE domains are well conserved in various organisms and have been studied extensively in different model organisms except plants. Plants possess several FYVE domain-containing proteins and PtdIns(3)P has been shown to be present in various compartments [30] as well as membranes [11] of plant cells. It is possible to envision that plant cells utilize the same or highly similar lipid-binding and membrane-targeting mechanisms [30] for FYVE domains given that both the FYVE domains and type III PI3-kinase, which makes PtdIns(3)P, are present in plant cells [31]. However, some recent reports suggest that PtdIns(3)P may not be the only phosphoinositide ligand recognized by plant FYVE domains; for example, the FYVE domain of EEA1 has been shown to be capable of binding to PtdIns(5)P [32,33].
We have undertaken a comprehensive examination of all FYVE domains of the model plant Arabidopsis thaliana (At) to understand the structural basis for the mechanism of their function and to explore their similarities and differences with respect to other organisms. We describe the 15 different FYVE domain-containing proteins that are expressed in Arabidopsis, all of which are largely unexplored. Our detailed sequence analysis and biophysical characterization of the structural models of the FYVE domains in Arabidopsis suggest membrane interaction mechanisms and their subtleties. Moreover, the study also reveals unique biophysical properties of plant FYVE domains, a new binding motif specific only to the variant class of plant FYVE domains and novel domain architectures unique to plant FYVE proteins.

Results
Identification, characterization and chromosomal localization of FYVE domain-containing proteins encoded in the Arabidopsis genome
The total number of FYVE domain-containing proteins seems to be directly correlated with the total estimated number of genes for a given organism, e.g. 27 FYVE encoding genes in a total of 42,000 in H. sapiens, 13 in a total of 18,000 in C. elegans and 5 in a total of 6,000 in S. cerevisiae [34]. We identified 15 AtFYVE proteins in the Arabidopsis protein sequence database i.e. TAIR first genome release (version TAIR 6.0, Nov 2005). Later genome releases built upon the gene structures of TAIR6 release as well as community input regarding missing and incorrectly annotated genes and they do not contain any new genes encoding FYVE proteins. Our finding of 15 FYVE proteins encoded within predicted 25,500 genes [35] of the Arabidopsis genome falls in line with the above observation. The initial identification was done using an automated pipeline [36]. Later, the total number of AtFYVE proteins and their individual accession numbers were verified through manual searches performed in various databases. The 15 FYVE domains present in various Arabidopsis proteins (representing the entire family of AtFYVE proteins) aligned with human EEA1 FYVE domain (PDB: 1JOC chain A [37]) are shown in Fig. 1A. Fig. 1B displays the schematic localization of the 15 AtFYVE proteins within the Arabidopsis genome. The 15 identified sequences of AtFYVE proteins are dispersed throughout the Arabidopsis genome, being located on all chromosomes except chromosome 2 (Fig. 1B). The disagreement of our total with previously reported totals of nine [32], over ten [38] and most recently, sixteen [39] FYVE domains stems from misannotations. For example, AT1G61620, AT1G66040, AT1G66050 and AT5G39550 proteins are all annotated as FYVE proteins but do not actually possess FYVE domains based on various sequence analysis methods.
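
The proportionality noted above can be checked with simple arithmetic. The following is a minimal sketch that uses only the gene totals quoted in this section; the dictionary and function names are illustrative and are not part of the original analysis.

```python
# FYVE-encoding gene counts and estimated total gene counts, as quoted in the text.
GENOME_COUNTS = {
    "H. sapiens": (27, 42_000),
    "C. elegans": (13, 18_000),
    "S. cerevisiae": (5, 6_000),
    "A. thaliana": (15, 25_500),
}


def fyve_per_thousand_genes(counts):
    """Return the number of FYVE-encoding genes per 1,000 genes for each organism."""
    return {org: 1000.0 * fyve / total for org, (fyve, total) in counts.items()}


if __name__ == "__main__":
    for organism, density in fyve_per_thousand_genes(GENOME_COUNTS).items():
        print(f"{organism:15s} {density:.2f} FYVE genes per 1,000 genes")
```

With these figures, all four organisms fall between roughly 0.6 and 0.8 FYVE-encoding genes per 1,000 genes, which is the sense in which the Arabidopsis total is in line with the other genomes.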

Domain Architecture of Arabidopsis FYVE proteins
On the basis of domain architecture of the proteins, we propose five classes of AtFYVE proteins (Fig. 2). Class I comprises two out of four documented Arabidopsis Fab1p homologues expressed in plants, i.e. AT3G14270 and AT4G33240 [40,41]. The other two Fab1 homologues do not contain a FYVE domain [40,41]. Class I members, i.e. AT3G14270 and AT4G33240, contain a FYVE domain, followed by Fab1_TCP (chaperonin-like) and PIPKc domains. AT3G14270 and AT4G33240 are annotated in the NCBI database as "phosphatidylinositol-4-phosphate 5-kinase family proteins" while in UniProtKB/TrEMBL as "putative uncharacterized proteins." Our Blast analysis reveals similarity of both class I members to ppk-3 (C. elegans), Fab1p (S. cerevisiae), and phosphatidylinositol-3-phosphate 5-kinase type III (H. sapiens) (see supplementary material). Ppk-3 and Fab1p proteins share domain architecture identical to class I members and phosphatidylinositol-3-phosphate 5-kinase type III protein has an additional DEP domain (see supplementary material). Class II is represented by two sequences, AT3G43230 and AT1G29800, which possess two domains: a FYVE and a Domain of Unknown Function (DUF500). Class III comprises the AT1G61690 protein and class IV comprises the AT1G20110 protein.
Both classes are unique in that they contain only a FYVE domain but they differ in the placement of the FYVE domain (N-terminus versus C-terminus) and also their biophysical properties (this study). UniProtKB/TrEMBL annotates function for class II-IV as putative uncharacterized. The representation of class II-IV members in the literature is full of contradictions. They are not mentioned in the classification by Drobak and Heras [38] and AT1G29800 of class II together with AT1G61690 of class III are omitted from the classification by Jensen et al. [32]. Moreover, class IV protein was identified as AtAAF79901 and shown to contain a FYVE domain followed by a plant-specific SGNH-plant-lipase-like domain [32]. Our analysis of the sequence suggests, however, that class IV protein is over 300 amino acids shorter than AtAAF79901, and it does not contain a SGNH-plant-lipase-like domain. Class II sequences seem to contain an additional DUF500 domain not represented by van Leeuwen [39]. Class V is the largest class. It includes nine AtFYVE proteins, which share similar domain architecture, i.e. Pleckstrin Homology of Phospholipase C (PH_PLC), followed by Regulator of Chromosome Condensation 1 (RCC1) regions/blades (overlapping with Alpha Tubulin Suppressor 1 (ATS1)) and FYVE domains. In addition, seven out of nine class V proteins are characterized by the presence of a DZC motif found near the C-terminus. UniProtKB/TrEMBL annotates function for class V members as either disease resistance protein-like, e.g. AT5G42140 and AT4G14370, Ran GTPase binding/chromatin binding/zinc ion binding, e.g. AT1G65920, AT1G69710, AT3G23270 and AT5G12350, or putative uncharacterized, e.g. AT3G47660, AT1G76950, and AT5G19420.
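
The five-class scheme described above can be captured in a small lookup table. The sketch below is illustrative only: the accession numbers and domain names are taken from the text, whereas the dictionary layout and the helper name `class_of` are assumptions made here. The domain lists follow the order in which the text names them and do not assert N- to C-terminal positions where the text leaves the order unstated (class II, and FYVE placement in classes III and IV). The DZC motif is omitted because the text does not say which seven class V members carry it.

```python
# Classification of the 15 AtFYVE proteins as described in the text.
ATFYVE_CLASSES = {
    "I": {
        "domains": ["FYVE", "Fab1_TCP", "PIPKc"],
        "members": ["AT3G14270", "AT4G33240"],
    },
    "II": {
        "domains": ["FYVE", "DUF500"],  # relative order not stated in the text
        "members": ["AT3G43230", "AT1G29800"],
    },
    "III": {"domains": ["FYVE"], "members": ["AT1G61690"]},
    "IV": {"domains": ["FYVE"], "members": ["AT1G20110"]},
    "V": {
        "domains": ["PH_PLC", "RCC1", "FYVE"],
        "members": [
            "AT5G42140", "AT4G14370", "AT1G65920", "AT1G69710", "AT3G23270",
            "AT5G12350", "AT3G47660", "AT1G76950", "AT5G19420",
        ],
    },
}


def class_of(accession):
    """Return the class label for an accession, or None if it is not an AtFYVE protein."""
    for label, info in ATFYVE_CLASSES.items():
        if accession.upper() in info["members"]:
            return label
    return None


if __name__ == "__main__":
    total = sum(len(info["members"]) for info in ATFYVE_CLASSES.values())
    print(f"{total} AtFYVE proteins in {len(ATFYVE_CLASSES)} classes")
    print("AT1G20110 ->", class_of("AT1G20110"))
    print("AT1G61620 ->", class_of("AT1G61620"))  # misannotated entry, not in any class
```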
The SMART database recognizes between three and five RCC1 regions within class V AtFYVE proteins, whereas the CD-search additionally identifies the ATS1 domain, a yeast domain with similarity to the human RCC1 domain, overlapping the RCC1 blades (Fig. 2). In some cases, only the ATS1 domain is detected by the CD-search or the number of RCC1 blades does not correspond to the number obtained from the SMART database (data not shown). These inconsistencies prompted further enquiry into the number and nature of the putative RCC1 repeats identified in class V of AtFYVE proteins. Up to now, RCC1 and RCC1-like domains that have been described are within cytoplasmic proteins associated with membrane structures, e.g. endosomes (Alsin) [42] and Golgi apparatus (HERC1) [43]. Fig. 3 shows an internal sevenfold sequence repeat of 51-68 residues present in the solved structure of human RCC1 [44] aligned with putative RCC1 regions of class V AtFYVE proteins. In human RCC1, one half of the first sequence repeat, the C and D repeats, is made from the N-terminal end of the protein, and the other half, the A and B repeats, is made from the C-terminal end [44]. It has been suggested that this arrangement stabilizes the circular arrangement of secondary structural elements through a molecular clasp mechanism similar to a belt closure [44]. Our data show that putative RCC1 blades of AtFYVE proteins align well with six of the seven human RCC1 blades. In fact, the seven highly conserved residues, i.e. four glycines, a tyrosine, a leucine and a cis-proline, identified in human RCC1 repeats are also mostly conserved among putative AtRCC1 blades (boxed residues). However, it appears that the first blade of human RCC1 shares little or no primary and/or secondary sequence similarity with most putative AtRCC1 blades. The first blade of putative AtRCC1 may not even be a potential repeat for at least seven out of nine class V AtFYVE proteins because they share a low sequence similarity with human RCC1 in the corresponding region as compared to other regions.
Figure 3 (caption, partial): Residues absolutely conserved within the secondary structures are black on a colored background. Residues moderately conserved within the secondary structures and/or among Arabidopsis repeats and not human RCC1 are white on a colored background. Boxed residues correspond to amino acids which are highly conserved among each blade of the seven-bladed propeller structure [44].
Figure caption (partial): [...] 27 isoform β-FYVE (PDB: 1X4U). Their overall net charge is highly positive, but varies from +9 to +16. The electrostatic profile of class IV AtFYVE model shows the weakest positive potential observed among AtFYVE domain models. Additional electrostatic profiles for the alternative models, their PDB coordinate files and verification profiles are available online (see supplementary material).

Sequence motifs of the Arabidopsis FYVE domains
AtFYVE domains can be divided into two distinct groups based on different consensus sequences identified via CLUSTALW multiple sequence alignment (Fig. 5). Fig. 5A depicts AtFYVE domains that belong to class I-IV. These Arabidopsis domains were previously referred to as classic FYVE domains because they contain three classic conserved regions: the N-terminal WxxD, the central R(R/K)HHCR and the C-terminal R(V/I)C motifs [32] implicated in binding the phosphoinositide ligand PtdIns(3)P. Class I-IV AtFYVE proteins have a classic FYVE domain (Fig. 5A) with a conserved motif for PtdIns(3)P-binding that is found in FYVE domains of H. sapiens [34], S. cerevisiae, C. elegans [45] and various other organisms, e.g. P. troglodytes, M. musculus, R. norvegicus, C. familiaris, B. taurus, G. gallus. Class V AtFYVE domains do not share the N-terminal WxxD motif. Instead they have a WxxG motif, only a G residue or residues that share no similarity to the WxxD or WxxG motifs (Fig. 5). Moreover, the central R(R/K)HHCR motif is replaced by a (K/R)(R/K)HNCY motif, which is atypical and hence the name "variant binding motif" of FYVE domains [32]. We observe that the variable turret loop prior to the R(R/K)HHCR motif, which is associated with membrane penetration of the FYVE domain, and the putative dimerization interface region are made up of residues that are quite diverse in the various class I-IV FYVE domains. Despite the observed differences in residues, however, all class I-IV AtFYVE domains share at least one hydrophobic residue within the turret loop and highly hydrophobic dimerization interface regions. AT1G29800-FYVE and AT3G43230-FYVE have an insertion of an additional hydrophobic residue within the turret loop. Class V AtFYVE domains contain a conserved phenylalanine residue in the second position (with the exception of AT3G47660-FYVE) and a conserved arginine in the last position within the turret loop.
As in the case of class I-IV AtFYVE domains, the putative dimerization interface region of class V AtFYVE domains is highly hydrophobic. Unlike those of class I-IV AtFYVE domains, however, the dimerization interface regions of class V AtFYVE domains seem highly conserved, with at least three absolutely conserved residues, i.e. AxxAP.
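
The classic and variant signatures described in this section translate directly into pattern searches. The following is a minimal sketch, assuming that a strict regular-expression match of the consensus motifs is an acceptable first-pass screen; it is not the alignment-based procedure used in this study. The two test sequences are hypothetical toy strings built only to contain the motifs and are not real AtFYVE sequences. Domains with a nearly complete signature, such as class II domains lacking the first arginine of the central motif, would be reported as "unassigned" by this strict screen.

```python
import re

# Consensus motifs quoted in the text, written as regular expressions ("x" = any residue).
MOTIFS = {
    "WxxD": re.compile(r"W..D"),                    # classic N-terminal motif
    "WxxG": re.compile(r"W..G"),                    # N-terminal motif seen in class V
    "R(R/K)HHCR": re.compile(r"R[RK]HHCR"),         # classic central motif
    "(K/R)(R/K)HNCY": re.compile(r"[KR][RK]HNCY"),  # variant central motif
    "R(V/I)C": re.compile(r"R[VI]C"),               # classic C-terminal motif
}


def motif_report(sequence):
    """Return a dict stating which consensus motifs occur anywhere in the sequence."""
    seq = sequence.upper()
    return {name: bool(pattern.search(seq)) for name, pattern in MOTIFS.items()}


def classify_fyve(sequence):
    """Label a domain 'classic', 'variant' or 'unassigned' from its central motif."""
    report = motif_report(sequence)
    if report["R(R/K)HHCR"]:
        return "classic"
    if report["(K/R)(R/K)HNCY"]:
        return "variant"
    return "unassigned"


# Hypothetical toy sequences (NOT real protein sequences), for illustration only.
TOY_CLASSIC = "AAWAEDAAARRHHCRAAARVCAA"
TOY_VARIANT = "AAWSSGAAAKRHNCYAAALYRAA"

if __name__ == "__main__":
    for label, seq in (("toy classic", TOY_CLASSIC), ("toy variant", TOY_VARIANT)):
        print(label, classify_fyve(seq), motif_report(seq))
```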

FYVE domains have the potential to bind headgroups of both PtdIns(3)P and PtdIns(5)P
Preliminary docking studies depicted in Fig. 5 and Fig. 6 show that class I-IV AtFYVE domains have a potential to bind headgroups of both PtdIns(3)P and PtdIns(5)P using the same set of residues previously identified to bind the headgroup of PtdIns(3)P in other FYVE domains, i.e. the RHHxR motif and the arginine residue of RVC motif (Fig. 5). Class V AtFYVE domains use the variant signature of residues, i.e. xRKxHNxY motif, and a (L/F/P)YR motif, which overlaps the classic RVC motif, to potentially bind headgroups of PtdIns(3)P and PtdIns(5)P (Fig. 5B). In addition to the variant residues, our data indicate that a (H/K/N)xx(S/T)(S/N)(K/R)K motif located immediately prior to the dimerization region is also used by class V AtFYVE domains to recognize either headgroup (Fig. 5B).

Discussion
Proteins that contain FYVE zinc finger domains have so far been known as effectors of PtdIns(3)P playing a major role in endocytic and vesicular trafficking [46][47][48]. PtdIns(3)P is a phosphoinositide that is present at very low levels in plant cells [49][50][51][52]. It is synthesized by phosphatidylinositol 3-kinase (PI3K). Both PtdIns(3)P and PI3K are essential for normal plant growth [31] and have been implicated in diverse physiological functions, including root nodule formation [53], auxin-induced production of reactive oxygen species (ROS) and root gravitropism [54], root hair curling and Rhizobium infection in M. truncatula [55], maintenance of the processes essential for root hair cell elongation [56], increased plasma membrane endocytosis and the intracellular production of ROS in the salt tolerance response [57], stomatal closing movement [57,58], and possibly cytokinesis [11]. If we envision plant FYVE domains as being potential effectors of PtdIns(3)P, they could play important roles in various physiological processes. In this study we have modeled the structure of all AtFYVE domains and predicted their membrane targeting behavior based on the biophysical profiles of the modeled structures.
Based on the domain architecture and homology to proteins of known function, we have classified AtFYVE proteins into five distinct classes (Fig. 2). Similar domain-based classifications previously performed for FYVE domain-containing proteins in H. sapiens, C. elegans and S. cerevisiae genomes [34,45] suggested a certain degree of correspondence among the different FYVE proteins in various organisms [34]. However, AtFYVE proteins are striking in showing no obvious similarities or correspondence to the FYVE proteins included in these domain architecture-based classifications. More specifically, only one class of AtFYVE proteins corresponds to what was reported in other organisms, i.e. class I in our classification and the corresponding PIKfyve, MmPIKfyve and ScFab1p groups in the other classifications [34,40,45]. Even that correspondence, however, is partial since the Arabidopsis counterparts lack the disheveled, Egl-10, and pleckstrin (DEP) domain observed in mammals and worms [34,40,45,59]. The remaining members of AtFYVE proteins, class II-V, are unique and exhibit completely different domains, suggesting that FYVE domains in plants play a unique role in plant signaling pathways and/or play similar/parallel roles in signaling as in other organisms but use different protein players or signaling mechanisms.
The two class I sequences are homologues of the PIKfyve/Fab1 family of phosphatidylinositol phosphate 5-kinases that phosphorylate the D-5 position in phosphatidylinositol (PtdIns) and PtdIns(3)P to make PtdIns(5)P and PtdIns(3,5)P2, respectively [60]. PIKfyve/Fab1 proteins bind PtdIns(3)P with high specificity through their FYVE domains [15,60] and are known to participate in several aspects of endosomal trafficking functions [61], transduction of osmotic shock signals [62] and other cellular functions in mammals and yeast [40] as well as in plants [63][64][65]. Recently, the two class I AtFYVE, PIKfyve proteins, were found to participate in vacuolar rearrangement essential for successful pollen development [63] and our molecular models provide structural insight into their mode of function. Both members possess the complete classic signature for PtdIns(3)P-binding and the conserved hydrophobic motif, suggesting that they likely bind membranes using the general mechanism of non-specific electrostatic interactions, followed by membrane penetration of hydrophobic residues close to the PtdIns(3)P-binding pocket facilitated by an electrostatic switch coupled with specific interactions with PtdIns(3)P, as proposed by previous computational modeling studies [22]. These studies have shown that all human FYVE domains have electrostatic equipotential profiles similar to those of Hrs and EEA1 FYVE domains. This electrostatic polarity seems to be characteristic for class I AtFYVE domains and their S. cerevisiae and C. elegans homologues (Fig. 4 and Fig. S1 (Supplementary material)). Despite the overall electrostatic profile similarity, AT3G14270-FYVE has a higher net charge (+7 at pH 6.5; Zn ions included) than AT4G33240-FYVE (+3 at pH 6.5; Zn ions included) (Fig. 3). Based on the net charge difference, we predict that AT4G33240-FYVE will have a reduced non-specific electrostatic contribution to membrane targeting.
Moreover, we predict that its hydrophobic contribution will also be reduced because the conserved hydrophobic motif of AT4G33240-FYVE possesses a valine residue instead of a leucine residue found in AT3G14270-FYVE (Fig. 4). Additionally, FYVE domain dimerization might be important for functional membrane association of AT4G33240-FYVE.
Class II-IV proteins have not yet been assigned a function and remain annotated as "putative uncharacterized proteins" in various sequence databases. All of them share the complete or nearly complete conserved PtdIns(3)P-binding motif and a large basic binding pocket except for class IV AT1G20110-FYVE, which has a significantly reduced basic surface patch in the potential ligand-binding pocket (Fig. 4; class II-IV domains have net charges of +6, +8, +11, and +2, respectively). Class II FYVE domains possess a classic FYVE domain electrostatic profile but their binding signature is missing the first of the arginines in the R(R/K)HHCR motif, which is known to recognize the 1-phosphate of the PtdIns(3)P headgroup [20]. Even though this residue does not participate in the direct recognition of the 3-phosphate, mutational studies suggest that substitution of this arginine substantially reduces the FYVE domain's affinity for PtdIns(3)P-containing membranes and potential for membrane localization. The altered signature may slightly reduce the local basic charge in the vicinity of the hydrophobic motif and lower the barrier to membrane penetration. In this class, we predict a classic FYVE domain membrane-targeting behavior with subtle differences that could be verified using mutational studies. The class III FYVE domain, on the contrary, has the full binding signature and an electrostatic equipotential profile similar to those of Hrs and EEA1 FYVE domains. We predict that this domain will localize to PtdIns(3)P-containing membranes using the classic mechanism of action of previously studied FYVE domains with a strong contribution from non-specific electrostatic interactions.
Class IV AtFYVE domain has the most reduced basic surface patch and the lowest net charge of +2 among AtFYVE domains. Hydrophobic contribution through membrane insertion will likely be an important component of membrane binding for this class, similar to FENS-FYVE [22], which localizes to endosomal membrane [66] even though it has a weaker positive potential than other known FYVE domains [22].
Class V proteins are the most interesting class of the FYVE domain-containing proteins although much remains to be understood about their function. Out of the 18 human RCC1 superfamily proteins, none corresponds in its domain architecture to class V FYVE proteins [67]. The closest match, the PAM protein, has 3 RCC1 repeats and a FYVE domain but in a different order and accompanied by domains other than those found in class V AtFYVE proteins [67]. In contrast to the traditional seven canonical repeats found in most RCC1-like proteins, there are six RCC1 repeats in some proteins such as WBSCR16, Nek9, RPGR [67] and some AtFYVE proteins (this study). Since β-propellers (including RCC1 repeats) could be made of a variable number of blades and are thought to evolve by blade duplication and deletion [68], there could be three alternative explanations for the absence of the first canonical RCC1 repeat in some class V AtFYVE proteins: 1) the second half of blade 1 and the first half of blade 7 engage with one another to form a symmetrical 6-bladed β-propeller; 2) an "open" ring-propeller forms as known for the C-terminal domain of ParC subunit [69] and suggested for the short form of Alsin [70]; or 3) the first repeat is a non-canonical RCC1 repeat as seen in other proteins [67]. Therefore, despite the sequence differences, it is possible that the 6 RCC1 repeats found in some AtFYVE proteins adopt a β-propeller structure similar to β-propeller structures found in proteins from other organisms.
Previously, it has been suggested that association with membrane(s) may be crucial for the functioning of this class of AtFYVE proteins given the presence of two phosphoinositide-binding domains, i.e. PH and FYVE domains [32]. Experimental data suggest that the class V AT1G65920 PH domain binds to PtdIns(4,5)P2 while its FYVE domain binds to PtdIns(3)P as well as PtdIns(5)P [32]. The various members of class V AtFYVE domains show a high degree of sequence conservation with an enrichment of basic residues throughout the length of the FYVE domain (Fig. 5). The most striking feature of these FYVE domains is the presence of a variant phosphoinositide-binding motif (Fig. 5B), which seems to be unique to plants as is the overall domain architecture of these proteins ([32]; Fig. 5B). When the variant (K/R)(R/K)HNCY motif of class V FYVE domains is used to search for other FYVE domains, only sequences from plants are retrieved, e.g. Q1SA17 (M. truncatula). The obvious question that comes to mind is whether this variant signature is responsible for an altered binding specificity in this class of FYVE proteins and therefore associated with a novel pattern of membrane/sub-cellular targeting. Within mammalian cells, FYVE domains are highly conserved and seem to select PtdIns(3)P over other phosphoinositides [16]. Despite the conservation in the overall mechanism, there are significant differences in the specificity and affinity of individual FYVE domains towards phosphoinositides. In fact, EEA1 has affinity for PtdIns(5)P as well, perhaps because PtdIns(3)P and PtdIns(5)P are similar in all aspects except having the phosphomonoester in a different position [38]. Consequently, PtdIns(5)P has been shown to induce small but important chemical shift changes similar to those induced by PtdIns(3)P in the binding motif residues with the exception of one arginine, which remains practically unaltered by PtdIns(5)P [38].
PtdIns(3)P-specific recognition by the FYVE domain seems to involve indirect recognition of this specific ligand by exclusion of alternatively phosphorylated phosphoinositides: the two residues implicated in this are the aspartic acid of the N-terminal WxxD motif and the second histidine of the central HHCR motif [20]. Both of these motifs are substituted in class V variant AtFYVE domains (Fig. 5) by the WxxG and HNCY motifs, respectively. This opens up the possibility that class V FYVE domains may have the potential to interact equally or better with phosphoinositide ligands other than PtdIns(3)P. Our preliminary docking analysis of classic as well as variant motif-containing AtFYVE domains seems to suggest that both have the potential to interact with PtdIns(3)P and PtdIns(5)P headgroups using practically the same set of residues (Fig. 5). Additionally, our analysis reveals a highly conserved putative ligand-association motif located immediately prior to the dimerization region present only within the class V proteins (Fig. 5). Class V AtFYVE domains are also different in exhibiting very large basic surface patches with prominent hydrophobic motifs. These patches are the largest observed among FYVE domains classified to date [71,72]. We predict that class V AtFYVE domains target to the membrane with highly significant contributions from non-specific electrostatics and hydrophobic interactions, coupled with specific interactions with PtdIns(3)P and/or PtdIns(5)P using the variant binding residues and an additional conserved motif specific to this class of FYVE domains.
Based on experimental studies, it has been suggested that the strength of the positive potential and the identity of the hydrophobic residues near the binding site may be two key factors that are critical in determining which FYVE domains act alone, undergo dimerization or require additional partners before anchoring to the membrane [22]. For example, SARA-FYVE was predicted and verified experimentally to associate with the membrane with significant contributions from non-specific electrostatic and hydrophobic interactions given its net charge of +12 (zinc ions included) as well as the presence of phenylalanine at the conserved hydrophobic position [22,71,73]. Our data suggest that AtFYVE domains engage in both non-specific electrostatic and PtdIns(3)P-induced hydrophobic interactions for membrane localization, the contribution differing for individual domains as described earlier. Additionally, dimerization may play an important role in the membrane recruitment of FYVE domains [21,74] and it appears that the free energy contributions to the membrane association are additive for each monomer of the EEA1-FYVE dimer [22]. The dimer interface regions of AtFYVE domains are longer and more hydrophobic (Fig. 5) than the equivalent region of EEA1-FYVE and the predicted region of SARA-FYVE [72,75], suggesting that all AtFYVE domains have the potential to dimerize and associate with membrane(s) as dimers.

Conclusions
Overall, AtFYVE proteins are quite distinct from FYVE proteins of other organisms, exhibiting unique domain architectures and biophysical properties as well as altered binding motifs. The biophysical profiles of the modeled FYVE domains in Arabidopsis suggest membrane-targeting mechanisms ranging from the previously described classic modes to the novel binding mode of the class V FYVE domains, which seem to be found only in plants. Our predictions provide a foundation for designing directed mutational studies to confirm these behaviors, which is crucial to understanding the role of these domains in important plant signaling pathways, something that has so far not been explored.

Arabidopsis FYVE proteins
The accession numbers of the AtFYVE proteins were identified using a computational pipeline for automated high-throughput modeling [36], which was run against the Arabidopsis protein sequence database (TAIR6_pep_20051108). The AtFYVE protein sequences corresponding to the identified accession numbers were retrieved from KEGG GENES [76,77] and verified for the presence of a FYVE domain with SMART [78][79][80].

Modeling methodology
No single homology modeling program or routine has been established as the best method for comparative modeling [97]. To generate high-quality models for the AtFYVE domains, we used a number of programs to create many alternative alignments and models, followed by quality assessment and a selection process. We used two separate approaches: automated and manual. The automated approach involved a high-throughput computational pipeline, which uses its own built-in alignment, modeling and evaluation methods [36], as well as Pudge for modeling and evaluation [98]. The manual approach is based on choosing several alternative options for each step in the process of creating the homology models, as previously detailed by Singh and Murray [99]. The scheme involves the use of multiple approaches at each step: 1) choice of a suitable structural template, 2) alignment of the template and target sequences, 3) model building, and 4) model evaluation and refinement. The programs used were 3D-JIGSAW [100][101][102], Modeller 8v1 [103,104], NEST [105], LOOPP [106][107][108], HOMER [109], CPH [110] and PHYRE [111], together with manual editing using GeneDoc [112], guided by Verify3D [113,114] and Prosa [115]. Loop refinement and prediction of side-chain conformations were performed using the individual modeling programs whenever available. In addition, loop refinement was done with Loopy [116] and the prediction of side-chain conformations with SCWRL3.0 [117] and SCAP [118][119][120].
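The selection step described above, in which many candidate models are generated and then assessed, can be illustrated with a minimal Python sketch. The text does not state how scores from the different evaluation programs were combined, so the rank-sum rule used here is purely an assumption, and the class name, field names and example values are hypothetical. The sketch relies only on the general conventions that a higher fraction of residues passing the Verify3D profile and a more negative ProSA z-score indicate a better model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateModel:
    name: str                 # hypothetical label, e.g. program plus template
    verify3d_fraction: float  # fraction of residues passing Verify3D (higher is better)
    prosa_z: float            # ProSA z-score (more negative is better)


def rank_models(models):
    """Order candidates by the sum of their ranks on the two metrics.

    Assumption: an unweighted rank sum; ties are broken by name.
    """
    by_v3d = sorted(models, key=lambda m: m.verify3d_fraction, reverse=True)
    by_prosa = sorted(models, key=lambda m: m.prosa_z)
    v3d_rank = {m.name: i for i, m in enumerate(by_v3d)}
    prosa_rank = {m.name: i for i, m in enumerate(by_prosa)}
    return sorted(models, key=lambda m: (v3d_rank[m.name] + prosa_rank[m.name], m.name))


if __name__ == "__main__":
    # Illustrative values only, not results from the study.
    candidates = [
        CandidateModel("model_A", 0.90, -6.0),
        CandidateModel("model_B", 0.80, -5.0),
        CandidateModel("model_C", 0.95, -4.0),
    ]
    for model in rank_models(candidates):
        print(model.name)
```

A rank sum avoids having to put the two scores on a common scale; any other combination rule could be substituted in `rank_models` without changing the surrounding workflow.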

Analysis of the models
The models were analyzed for their sequence, structural and biophysical properties. The biophysical properties, including the electrostatics, hydrophobicity and shape of each model, were analyzed using the surface property analysis tools in the program GRASP [121]. The pKa values of ionizable amino acid side chains in AtFYVE domains, as well as total charges, were computed using the automated system H++ [122][123][124], which is based on solutions to the Poisson-Boltzmann equation. The calculations were performed using default settings. The reported total charges were calculated at pH 6.5 because EEA1-FYVE was estimated to exist in the bound state at a low pH of 6.0-6.6, and only half of the protein was estimated to remain active at the cytosolic pH of 7.3 [20].
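The pH dependence of the total charge can be illustrated with a simplified Python sketch. This is not the H++ procedure: H++ derives pKa values from Poisson-Boltzmann calculations on the three-dimensional structure, whereas the sketch below applies the Henderson-Hasselbalch relation to fixed textbook pKa values for isolated side chains. The pKa table, the function names and the example sequence are assumptions made for illustration; termini, cysteines and bound zinc ions are not treated and can only be supplied through the hypothetical `extra_charge` argument.

```python
# Approximate model pKa values for isolated side chains (assumed, not H++ output).
BASIC_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}
ACIDIC_PKA = {"D": 3.9, "E": 4.3}


def fraction_protonated(pka, ph):
    """Henderson-Hasselbalch fraction of a titratable group that is protonated."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def net_charge(sequence, ph, extra_charge=0.0):
    """Estimate the net side-chain charge of a sequence at a given pH.

    extra_charge lets the caller add charges that are not titrated here
    (for example bound ions); how to set it is left to the caller.
    """
    charge = extra_charge
    for residue in sequence.upper():
        if residue in BASIC_PKA:
            # Basic groups carry +1 when protonated.
            charge += fraction_protonated(BASIC_PKA[residue], ph)
        elif residue in ACIDIC_PKA:
            # Acidic groups carry -1 when deprotonated.
            charge -= 1.0 - fraction_protonated(ACIDIC_PKA[residue], ph)
    return charge


if __name__ == "__main__":
    demo = "RKHHDEHK"  # hypothetical sequence, not an AtFYVE domain
    for ph in (6.5, 7.3):
        print(f"pH {ph}: net charge {net_charge(demo, ph):+.2f}")
```

Even in this crude form the sketch shows why the choice of pH matters: histidine side chains, with pKa values near 6, lose much of their positive charge between pH 6.5 and pH 7.3, while lysine and arginine remain essentially fully charged.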

Phosphoinositides docking and analysis of resulting interactions
Rigid and flexible docking were performed using DOCK 6.1 [126] and the programs of the DOCK 6.1 suite. A molecular surface of the receptor was created with DMS [127,128]. Spheres were generated with Sphgen_cpp v1.2, which was modified by Andrew Magis from its original version called Sphgen [126]. The resulting file was edited to include only spheres grouped within the first cluster. Grids were generated with GRID [129,130]. Contact scores and energy scores were calculated using an energy cutoff distance of 5.0 Å. Our docking technique was validated by docking Ins(1,3)P2 of known FYVE domains into their corresponding solved structures. Although FYVE domains are suggested to bind only Ins(1,3)P2 and Ins(1,5)P2, we also docked Ins(1,4)P2, Ins(1,4,5)P3, Ins(1,3,5)P3, Ins(1,3,4)P3, and Ins(1,3,4,5)P4 as controls. Following the initial validation, we used our approach to dock three Ins(1,3)P2 and three Ins(1,5)P2 ligands, using rigid and flexible docking scenarios, with the predictive models of AtFYVE domains. In the end, each predictive model was subjected to twelve docking runs, six for each headgroup. A given residue is reported to interact with the headgroup only if it does so 50% or more of the time (i.e. 3 or more times) as evaluated by the Ligand-Protein Contacts (LPC) server [131].
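The reporting rule in the last sentence (a residue is listed only if it contacts the headgroup in at least half of the six runs for that headgroup) can be expressed as a short Python sketch. The function name, the input layout (one set of residue labels per docking run, as might be transcribed from LPC output) and the residue labels in the example are hypothetical.

```python
from collections import Counter


def consensus_contacts(runs, min_fraction=0.5):
    """Return residues seen in at least min_fraction of the docking runs.

    runs: list of collections of residue labels, one collection per run.
    """
    if not runs:
        return []
    # Count each residue at most once per run.
    counts = Counter(residue for run in runs for residue in set(run))
    threshold = min_fraction * len(runs)
    return sorted(residue for residue, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    # Six hypothetical runs for one headgroup; labels are illustrative only.
    runs_for_headgroup = [
        {"R1", "H2"},
        {"R1"},
        {"R1", "K3"},
        {"H2"},
        {"H2", "K3"},
        set(),
    ]
    print(consensus_contacts(runs_for_headgroup))
```

With six runs and the default threshold of 0.5, a residue must appear in three or more runs to be reported, which matches the criterion stated above; the function would be applied separately to the six runs for each headgroup.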

Electronic supplementary material
The sequences and coordinate files representing our models for all AtFYVE domains as well as other supplementary information (GRASP images and structure verification plots, and alignment files) are available at the following website: http://userhome.brooklyn.cuny.edu/ ssingh/arabidopsis/FYVE/fyve.html.