Genome-wide and molecular evolution analyses of the phospholipase D gene family in Poplar and Grape

Background The Phospholipase D (PLD) family plays an important role in the regulation of cellular processes in plants, including abscisic acid signaling, programmed cell death, root hair patterning, root growth, freezing tolerance and other stress responses. PLD genes constitute an important gene family in higher plants. However, until now our knowledge concerning the PLD gene family members and their evolutionary relationship in woody plants such as Poplar and Grape has been limited. Results In this study, we have provided a genome-wide analysis of the PLD gene family in Poplar and Grape. Eighteen and eleven members of the PLD gene family were identified in Poplar and Grape respectively. Phylogenetic and gene structure analyses showed that the PLD gene family can be divided into 6 subgroups: α, β/γ, δ, ε, ζ, and φ, and that the 6 PLD subgroups originated from 4 original ancestors through a series of gene duplications. Interestingly, the majority of the PLD genes from both Poplar (76.5%, 13/17) and Grape (90.9%, 10/11) clustered closely together in the phylogenetic tree to the extent that their evolutionary relationship appears more tightly linked to each other, at least in terms of the PLD gene family, than it does to either Arabidopsis or rice. Five pairs of duplicated PLD genes were identified in Poplar, more than those in Grape, suggesting that frequent gene duplications occurred after these species diverged, resulting in a rapid expansion of the PLD gene family in Poplar. The majority of the gene duplications in Poplar were caused by segmental duplication and were distinct from those in Arabidopsis, rice and Grape. Additionally, the gene duplications in Poplar were estimated to have occurred from 11.31 to 13.76 million years ago, which are later than those that occurred in the other three plant species. Adaptive evolution analysis showed that positive selection contributed to the evolution of the PXPH- and SP-PLDs, whereas purifying selection has driven the evolution of C2-PLDs that contain a C2 domain in their N-terminal. Analyses have shown that the C2-PLDs generally contain 23 motifs, more than 17 motifs in PXPH-PLDs that contain PX and PH domains in N-terminal. Among these identified motifs, eight, (6, 8, 5, 4, 3, 14, 1 and 19) were shared by both the C2- and PXPH-PLD subfamilies, implying that they may be necessary for PLD function. Five of these shared motifs are located in the central region of the proteins, thus strongly suggesting that this region containing a HKD domain (named after three conserved H, K and D residues) plays a key role in the lipase activity of the PLDs. Conclusion As a first step towards genome wide analyses of the PLD genes in woody plants, our results provide valuable information for increasing our understanding of the function and evolution of the PLD gene family in higher plants.


Background
Plants are exposed to widely varying environmental conditions and because of their sessile nature they can only survive and thrive by adapting to the changes in their surroundings. Thus, higher plants have the ability to adapt to periods of stress by employing specific responses underpinned by defined modifications of their cellular processes. Phospholipase D (PLD) plays an important role in the regulation of diverse cellular processes in plants, including abscisic acid signaling, programmed cell death, root hair patterning, root growth, freezing tolerance and other stress responses [1]. PLD hydrolyzes phospholipids into a head group alcohol and phosphatidic acid (PA), which is an important intracellular messenger in plants, microorganisms and mammals [2].
The gene encoding PLD was first identified in plants more than 50 years ago [3], but did not receive detailed attention until the 1980s [4,5]. Multiple PLD genes encoding isoforms that could be classified into different subgroups with distinct biochemical, regulatory and catalytic properties have now been identified. Six Arabidopsis PLDs (α, β, γ, δ, ε and ζ) have been characterized molecularly and biochemically and can be differentiated depending on their requirements and/or affinities for Ca 2+ , phosphatidylinositol 4,5-bisphosphate (PIP 2 ) and free fatty acids [6,7]. The predominant isoenzyme is the αtype PLD, which can be detected in both the leaves and seeds of plants and is responsible for the majority of the baseline PLD activity found therein. PLDα does not require phosphoinositides for its activity when assayed in the presence of mM levels of Ca 2+ ions. It exhibits optimum activity at pH values between 5 and 6 and at high, non-physiological Ca 2+ concentrations between 30 and 100 mM [8,9]. In contrast, the β, γ, δ and ε PLD isoenzymes from Arabidopsis show their highest activity at μM Ca 2+ concentrations and require the presence of PIP 2 to be fully active [10]. The activity of plant PLDζ appears to occur independently of Ca 2+ ions, but requires PIP 2 to selectively hydrolyze phosphatidylcholine. In rice, an additional isoenzyme, PLDφ, has been identified but poorly characterized as of yet [11]. The PLD gene family encodes proteins with a number of cellular functions. For example, it has been suggested that PLDβ is involved in the regulation of seed germination and may act as a negative regulator of defence responses and disease resistance in rice [11,12], whereas PLDδ has been shown to play an important role in drought-induced hydrogen peroxide synthesis, responses to freezing and UV irradiation, and in the reorganization of microtubules at plasma membrane [1,13].
Despite these apparent differences in their biochemical functions, all the eukaryotic PLDs share the presence of an N-terminal phospholipid-binding region and two highly conserved C-terminal domains where two catalytic HxKxxxxD (HKD) motifs interact to promote the lipase activity [14,15]. The plant PLD family can also be divided into two further subfamilies (C2 and PXPH) based on the composition of their N-terminal phospholipid-binding domains. The C2-PLD subfamily comprises PLDs containing a C2 domain in their N-termini, while the N-termini of those of the PXPH-PLD subfamily contain both a phox homology (PX) domain and a pleckstrin homology (PH) domain. The C2, PX and PH domains have been implicated in protein-protein interactions, but perhaps their best described function involves their ability to modulate membrane targeting of proteins. The C2 domain of the C2-PLDs mediates the localization of soluble proteins to membranes by binding phospholipids in a Ca 2+ dependent manner [16], while the PX and PH domains of the PXPH-PLDs have been shown to mediate membrane targeting and are closely linked to polyphosphoinositide signalling [17]. The C2-PLDs only exist in plants, whereas the PXPH-PLDs exist both in plants and other organisms such as Caenorhabditis elegans and Homo sapiens. Presumably, the genes encoding the C2-PLDs and their progenitors have been lost from the evolutionary lineages leading to animals and fungi [18]. Furthermore, one additional small PLD subfamily (SP-PLDs) exists in which members comprise PLDs possessing an N-terminal signal peptide in place of the usual C2 or PXPH domains and the resulting specific cellular localizations may relate to their particular physiological functions in modulating plant growth, development and defence [11]. The isoforms α, β, γ, δ, and ε are C2-PLDs, the ζ isoform is a PXPH-PLDs and the φ isoform is a SP-PLDs.
The PLD gene family had been well studied in Arabidopsis and rice. However, there is far less information about this family for woody plant species such as Poplar and Grape. The recent provision of draft genome sequences for Poplar and Grape offered the opportunity to investigate the PLD gene family in these species. In this study, we first identified the PLD gene family members in Poplar and Grape and then performed detailed evolutionary analyses of these identified genes in comparison with those existing in Arabidopsis and rice.

PLD gene family in Poplar and Grape
In order to identify members of the PLD gene family in Poplar and Grape, the corresponding sequence information from Arabidopsis was used to perform multiple searches of the relevant DNA databases using the blast and tblastn algorithms, keyword searches and protein domain searches. The Poplar and Grape sequences returned by such searches were confirmed as encoding PLDs by using the programs PFAM and SMART. Following this strategy, we identified 18 genes encoding PLDs in Poplar (including a pseudogene) ( Table 1) and 11 PLD genes in Grape (Table 2). These numbers where similar to the number of PLD genes present in the rice (17 PLD genes) and Arabidopsis (12 PLD genes) genomes. Since there was no standard annotation assigned to these newly identified genes, we assigned each of them an identity based on the order of their location on each of either the Poplar or Grape chromosomes.
Based on the presence of C2, PX and PH motifs within their N-terminal domains, all the PLD family members in Poplar and Grape were assigned to two main subgroups, C2-PLDs and PXPH-PLDs. Additionally, one gene encod-ing an SP-PLD with an N-terminal signal peptide replacing the C2, PX and PH domains was identified for each of these species. Corresponding SP-PLD genes were also found in other species, including Caenorhabditis elegans  (CAE72017, NP_504824), Dictyostelium discoideum (XP_637114), Homo sapiens (AAH00553, AAH15003) and rice (Os06g44060).

Chromosomal location of PLD genes on Poplar and Grape genomes
Chromosomal location analyses showed that PLD genes of Poplar and Grape were dispersed throughout the respective genomes. Five Poplar PLD genes were localized to unassembled genomic sequence scaffolds and thus were not mapped to any particular chromosome. In Poplar, chromosomes I, II, VI and XVIII were found to possess two PLD genes each, and each of chromosomes III, V, X, XIII and XIV to possess a single PLD gene (Figure 1). For Grape, 11 PLD genes were found to be present on 8 of the 19 chromosomes; chromosomes II, V, XII, XV and XVIII were all found to possess one PLD gene each, whereas chromosome IV, IX and XI possessed two PLD genes each ( Figure 2).

Phylogenetic relationships of PLD gene family in Poplar and Grape
In order to classify the PLD genes identified for Poplar and Grape and investigate their evolutionary relationships, their derived protein sequences and those of Arabidopsis and rice [6,11] were subjected to phylogenetic analyses. One rice PLD gene, OsPLDκ (Os02g02790), was excluded from the analysis since it appeared to encode a protein missing one HKD domain at its C-terminus, indi-cating that this gene may be a pseudogene or constitute either a sequencing or assembly error. After excluding other cases of such pseudogenes or incorrectly assembled genes, a total of 56 PLD genes were used in the analyses, 17 from Poplar (excluding the pseudogene PtPLD18), 11 from Grape, 12 from Arabidopsis and 16 from rice [6,11] ( Figure 3). A phylogenetic tree based on protein sequences was constructed using the neighbor-joining (NJ) method with p-distance and complete deletion option. For statistical reliability, we conducted bootstrap analysis with 1000 replicates. The NJ phylogenetic tree showed that all the PLD genes from the four higher plants divided into 6 well-supported clades (bootstrap values from 64% to 100%). Among these, the previously classified β and γ isoforms clustered closely together and were not explicitly separated from each other. Accordingly, the tree clades were classified into six subgroups, α, β/γ, δ, ε, ζ and φ ( Figure 3). Among these, the α subgroup constituted the largest clade containing 19 members, and the β/ γ subgroup formed the second largest clade containing 12 members (bootstrap value, 100%). Additionally, the β/γ and δ subgroups further clustered forming a larger clade and implying that they originated from a common ancestor by frequent gene duplication. Among these subgroups, the α, β/γ, δ, ε comprised C2-PLDs while the ζ and φ subgroups comprised PXPH-PLDs and SP-PLDs. Interestingly, although phylogenetically members of the ε subgroup comprised C2-PLDs, they appeared somewhat divergent from this class of PLDs and were indeed inter- mediary between the C2-PLDs and PXPH-PLDs. Distinct from the other C2-PLDs, PLDε appeared to possess the C2 structural fold, but this contained none of the acidic amino acid residues thought to be involved in Ca 2+ binding, suggesting that the phospholipid binding of PLDε is less Ca 2+ -dependent than the other C2-PLDs [19]. This feature of the PLDε C2 domain appeared to be conserved between Poplar, Grape and Arabidopsis. Surprisingly, PLDε does not appear to exist in rice.
Structural analyses can provide valuable information concerning duplication events when interpreting phylogenetic relationships within gene families. Thus, the exon/intron structure of each member of the PLD family was analyzed (right panel in Figure 3). The number of exons determined for members of the PLD gene family ranged from 2 in OsPLDα6 to 22 in PtPLD16. Most members within the individual subgroups shared similar intron/exon numbers and predicted coding sequence (CDS) lengths, consistent with the phylogenetic classification of the PLDs into the subgroups depicted in the left panel of Figure 3. For example, both β/γ and δ subgroups included members with 9-12 exons with CDS lengths of between 792 to 1296 codons, consistent with the observation that they originated by continuous gene duplication. Interestingly, the genes VvPLD9 and VvPLD10 appeared longer than the other members of the β/γ and δ subgroups because of the presence of a single long intron that contained repeated retrotransposon elements [20]. Similarly, members of the α and ε subgroups possessed 3-4 exons, with some introns extended by retrotransposon elements, suggesting that they also had a common ancestor. Members of the ζ subgroup which comprised PXPH-PLDs, were distinct from the C2-PLDs clade in that they possessed between 19 to 21 exons (except AtPLDζ2 that Similarly, all members of the PXPH-PLD φ subgroup had 7 exons, also implying they originated via an evolutionary path separate to that of the C2-PLDs. Thus, the phylogenetic and gene structure analysis suggested that the 6 PLD subgroups originated from 4 ancestors via a series of gene duplications. PLD genes from each of the subgroups were found in all four species of the higher plants examined with the exception of members of the two small subgroups, ε and φ, which were absent in the rice and Arabidopsis genomes, respectively. Presumably, the main subgroups of the plant PLD gene family were established before the dicot-monocot lineage parted and before further division of the dicotyledonous non-woody and woody herbaceous lineage. The majority of the PLD genes from Poplar (76.5%, 13/17) and Grape (90.9%, 10/11) clustered more closely together in the phylogenetic tree than they did with those from Arabidopsis and rice (Figure 3), suggesting that two woody plants had a closer evolutionary relationship than with the non-woody herbaceous dicot and the monocot [21]. Five pairs of Poplar PLD genes (PtPLD10 and PtPLD4, PtPLD17 and PtPLD15, PtPLD6 and PtPLD3, PtPLD13 and PtPLD2, and PtPLD16 and PtPLD8) formed 5 well-supported subclusters (bootstrap values of 100%) (left panel in Figure 3), indicating that they were evolutionarily very closely related. Each pair of genes in each of the 5 subclusters had very similar structures (right panel in Figure 3), indicating that they originated from relatively recent gene duplications. Four of these five subclusters also clustered relatively closely with a similar Grape PLD gene. At least one pair of Grape PLD genes (VvPLD7 and VvPLD8) clustered sufficiently closely to suggest that they too arose from a recent duplication event. This Grape subcluster also clustered closely with a PLD gene from Poplar. Collectively, these results indicate that frequent gene duplications occurred following the divergence of the Poplar and Grape species and that in Poplar this resulted in a rapid expansion of the size of the PLD gene family.

Evolutionary patterns of PLD gene family in Arabidopsis, rice, Poplar and Grape
Segmental duplication, tandem duplication and transposition events such as retroposition and replicative transposition are the main reasons for gene family expansion [22]. Two tandem PLD gene duplications have previously been identified in Arabidopsis (AtPLDγ2-AtPLDγ1-AtPLDγ3) and rice (OsPLDα3-OsPLDα4-OsPLDα5) [6,11]. Chromosomal location analyses of the PLD gene family in Polar and Grape showed that the majority of the genes appeared randomly scattered throughout the genome with the exception of one pair of Grape PLD genes (VvPLD7/VvPLD8) which were tightly co-located and thus most likely resulted from a tandem duplication ( Figure 2). This suggests that tandem duplication is not a major contributory event leading to the expansion of the PLD gene family in higher plants. Thus, we hypothesized that, at least in Arabidopsis, rice, Poplar and Grape, segmental duplication and transposition events may have played a more leading role in the evolution of the PLD gene family.
To validate this hypothesis, we first selected 13 paralogous PLD gene pairs from the phylogenetic tree ( Figure 3) and subsequently explored the degree to which the protein-coding genes flanking each paralogous pair were similar. There were 5 pairs of paralogous PLD genes identified in the phylogenetic tree for Poplar ( Figure 3). The identities of the genes flanking both sides of all 5 pairs of the paralogous Poplar PLD genes were found to be highly conserved ( Table 3), suggesting that all of the paralogous PLD genes in Poplar arose from segmental duplication events. Similarly, in rice the protein-coding genes flanking each of the three pairs of PLD paralogous genes identified (OsPLDβ1/OsPLDβ2, OsPLDα1/OsPLDα2, and OsPLDδ2/OsPLDδ3) were found to be conserved (Table  3). To better explore the mechanisms of the PLD gene family expansion in Grape, a phylogenetic analysis of only Grape PLD genes was used to identify paralogous gene pairs (see Additional file 1). Two additional gene pairs (VvPLD1/VvPLD10 and VvPLD9/VvPLD11) were thus identified and protein-coding gene identity was found to be highly conserved in the regions flanking the genes VvPLD1 and VvPLD10. Similarly, in Arabidopsis one PLD gene pair (AtPLDα1/AtPLDα2) with conserved proteincoding genes in their flanking regions was identified (Table 3).
Taken together, these findings indicate that the mechanisms underlying the gene duplications that have contributed to the expansion of the PLD gene family differ between the four higher plants examined. In Poplar, segmental duplication accounted for the majority of the gene duplications identified. In evolutionary terms, most of these Poplar PLD gene duplications appeared to have occurred relatively recently and may be associated with novel functional divergence and adaptation. However, in Arabidopsis, rice and Grape, both segmental duplication and transposition events appear to have contributed to the duplication of the PLD genes. It is worth noting that some 41.4% of the Grape genome is composed of repetitive/transposable elements [20]. Thus, it is prudent to propose that transposition events could have been an important factor governing the expansion of PLD gene family in this species.
To estimate the evolutionary dates of the segmental duplication events, Ks was used as the proxy for time and the conserved protein-coding genes flanking the PLD gene pairs were thus subjected to Ks calculation ( Table 3). The protein-coding genes flanking the 5 pairs of duplicated genes in Poplar had very consistent mean Ks values (from 0.2059 to 0.2505), suggesting that the segmental duplication events in this species occurred within the last 11.31 to 13.76 million years. This time period is subsequent to the time at which the evolutionary lineage of Poplar and Arabidopsis divided, circa 100-120 million years ago (Ma), and is consistent with the time (13 Ma) when a recent large scale genome duplication event is thought to have occurred in Poplar [23]. The implication is that, relative to other species, the rapid expansion of the PLD gene family in Poplar resulted from higher order genome level processes.
The PLD gene segmental duplication in Grape was estimated to have occurred about 25.09 Ma (mean Ks = 0.7527), which is similar to when this was observed in Poplar. The observation that there are fewer PLD genes in Grape compared to Poplar may be due to the fact that Grape experienced two genome wide duplication (GWD) events during evolution compared to three in Poplar [24,25].
For rice, the segmental duplication event was estimated to have occurred between 69.41 to 76.70 Ma, which is subsequent to the time of divergence of the monocots and eudicots (170-235 Ma), but precedent to the time of the origin of the grasses (55-70 Ma) [26][27][28]. The earliest observed segmental duplication event occurred in the PLD genes of Arabidopsis around 88.39 Ma. It is interesting, therefore, that despite similar levels of GWD, Arabidopsis has comparably fewer PLD genes than Poplar. It is likely that this may due to the fact that the Arabidopsis genome has subsequently suffered a high level of gene loss [20,29].

Functional divergence and driving forces for genetic divergence
Site-specific shift rates (Type-I functional divergence) reflect the difference in the evolutionary rate of change of specific amino acid sites in proteins following gene duplication [30,31]. In order to detect the Type-I functional divergence occurring in the PLDs, we determined the differences in the site-specific evolutionary rates of amino acid changes between the C2-PLD and PXPH-PLD clades ( Figure 3) using the program DIVERGE. The results showed a significant evidence of type I functional divergence between the C2-PLDs and PXPH-PLDs (θ I = 0.64, P < 0.01, see Additional file 2). When the threshold values of posterior probability (Q k ) were set to either 0.80 or 0.90, 75 and 40 amino acid sites, respectively, were determined to be associated with the functional divergence of the C2-and PXPH-PLDs (see Additional file 2).
Positive Darwinian selection has been reported to be associated with gene duplication and functional divergence. To explore whether positive selection drove evolution of the PLD gene family, the coding regions of thirteen PLD gene paralogs from Arabidopsis, rice, Poplar and Grape were subjected to sliding window analyses. The nonsynonymous (dN)/synonymous substitution (dS) ratio (ω = dN/dS) is generally used to identify positive selection. A dN/dS (also known as Ka/Ks) ratio >1, <1 and = 1 indicates positive, negative, or purifying selection, and neutral evolution, respectively [32]. We calculated the dN/dS ratios for all the paralogs depicted in the phylogenetic tree reported in Figure 3 with a sliding window of 300 bp and a moving step of 50 bp. The resulting pairwise comparison data showed that all the paralogous genes have dN/dS ratios of <1 except for the comparisons OsPLDα4 vs. OsPLDα5 and OsPLDβ1 vs. OsPLDβ2 (see Additional file 3), strongly suggesting that the PLD gene family had mainly experienced strong purifying selection pressure. Here, the action of such purifying selection on the duplicated Poplar PLD genes supports the observation that the rapid expansion of the PLD family in this species resulted from higher order genome level processes. The gene pair, OsPLDα4 and OsPLDα5, clustered closely together (bootstrap values of 100%) and exhibited very similar exon/intron structures (Figure 3), suggesting that they were derived from relatively recent gene duplication event. The gene pair OsPLDβ1 and OsPLDβ2 also appeared to be similarly derived. Pairwise comparisons between OsPLDα4 and OsPLDα5 and between OsPLDβ1 and OsPLDβ2 exhibited ω values >1 in some regions, especially in the N termini of the proteins (see Additional file 3), suggesting that a more recent episode of positive selection has occurred after the gene duplication event.
To further investigate the evolutionary selection pressures acting on the PLDs, a site-specific model was formulated using the Codeml program of PAML 4.0 [33] with sequences from the C2-, PXPH-and SP-PLD clades. Consistent with the pairwise comparison results, when using the robust codon-substitution model in PAML, purifying selection was also determined to have acted on the C2-PLDs (see Additional file 4). Such a selection pressure may indicate that strong functional constraints have a bearing on the evolution of the C2-PLDs, supporting the notion that this group of the PLDs have important and essential roles in the regulation of plant cellular processes. Conversely, the concept that purifying selection is the main evolutionary mode of amino acid change in the C2-PLDs, along with the fact that the majority (12/13) of duplicated PLD genes belong to this clade, implies that C2-PLD gene duplication is unlikely to be associated with the formation of PLDs of either novel or divergent function.
In contrast, positive selection was observed to have occurred during the evolution of the PXPH-PLDs and SP-PLDs. Weak (ω = 1.34) and strong (ω = 8.14) positive selections were determined to have acted on the PXPH-PLDs and SP-PLDs, respectively. Although not reaching significant levels (posterior probabilities >0.90), one (site 627) and 3 (sites 3, 18 and 23) positively selected amino acid sites were identified in the PXPH-and SP-PLDs, respectively (see Additional file 4).
Plants possess relatively few PXPH-and SP-PLD genes in comparison to the numbers of C2-PLD genes found in their genomes (Figure 4). Thus, the positive selection that has acted on the PXPH-and SP-PLD genes may imply that their functional diversification has resulted from the need to adapt to a changing environment.

Domains and Motifs analyses in PLD gene family
The proteins encoded by the newly identified Poplar and Grape PLD genes were subjected to protein domain analyses. The program hmmpfam in HMMer [34] was used initially to identify the major domains of the PLD proteins. Such domain analyses for Arabidopsis and rice PLDs has been previously performed [6,11]. Here, the analyses showed that the all the PLDs in Poplar and Grape possessed the two characteristic and structurally conserved HKD domains essential for their lipase activity. As in the case of Arabidopsis and rice, the Poplar and Grape PLDs could be classified into the three subgroups (C2-, PXPH-and SP-PLD), based on the presence of the subgroup-specific domains ( Figure 4). As expected, in their N-terminal regions the C2-PLDs contained one C2 domain while the PXPH-PLDs contained both PX and PH domains and an N-terminal signal peptide was identified in each of the SP-PLDs (Figure 4).
Such domain search tools are suitable for defining the presence or absence of roughly recognisable domains, but they are unable to recognize either smaller individual motifs or more divergent patterns. Thus, we used the motif search tool MEME/MAST to mine for more detailed motif information (see Additional file 5) in the PLDs of the four higher plants examined. The thirty motifs identified by MEME were annotated by Inter-ProScan [35]. The result showed that, except for motifs 20, 25, 27 and 29, the majority of these domains were functionally associated with PLD activity (see Additional file 6). For analytical convenience, we divided the data into parts covering three regions of the PLDs: the N-terminal region before the first HKD domain, the middle region including the region in and between the two HKD domains, and the C-terminal region after the second HKD domain ( Figure 5).
The N-terminal region of the C2-PLDs contained 10 motifs, compared with 6 motifs in the same region of the PXPH-PLDs ( Figure 5A). Four motifs found in the N-termini of the C2-PLDs (20, 17, 24 and 12) appeared specific to this PLD clade (see Additional file 7) and have been suggested to take part in the formation of an eight βstrand switch involved in Ca 2+ -binding [36]. Motifs 28 and 29 appeared to be specific to the PX domain and motif 30 to the PH domain of the PXPH-PLDs (see Additional file 8), and are thought to be associated with the binding of phosphatidylinositol lipids [36]. One observed exception was AtPLDγ3 that contained an additional motif, 13, in the C2 domain. Additionally, in OsPLDα7, PtPLD10, PtPLD14, VvPLD5, AtPLDα3, AtPLDε and AtPLDγ2 the C2 domain appeared to have lost either one or two of the four motifs mentioned above that are associated with the binding of Ca 2+ (see Additional file 5). The N-terminal region of the C2-PLDs, the region behind C2 domain (referred to as the post-C2 region) usually included the 6 motifs 22, 10, 6, 8, 9, and 15. A degree of loss of some of these motifs was observed in some of the C2-PLDs from each of the four species examined. For example, AtPLDε, VvPLD5, VvPLD2, PtPLD14, PtPLD17, PtPLD6, PtPLD3 and OsPLDα7 appear to have lost either one or both of motifs 22 and 15 (see Additional file 5). In contrast, no motif loss was observed in the PXPH-PLDs, possibly due to the small number of members of this subgroup and the relatively small number of motifs in the Nterminal region of these PLDs. Both the C2-PLDs and PXPH-PLDs shared two conserved motifs, 6 and 8, implying a crucial role for these in PLD function.
The middle region of the PLDs contained 11 and 7 motifs in the C2-PLDs and PXPH-PLDs, respectively. There appeared to be a relatively higher level of conservation in this region between the C2-PLDs and PXPH-PLDs as they shared 5 motifs (5, 4, 3, 14 and 1). The middle region of the C2-PLDs and PXPH-PLDs started from the C-terminal ends of motifs 15 and 26, respectively, and ended with motif 1 that formed the HKD2 domain. The C-terminal sequences of motifs 15 and 26 are identical and, together with motif 5, formed the HKD1 domain (see Additional file 9). Thus, these two sections of the middle region appeared identical in both the C2-PLDs and PXPH-PLDs ( Figure 5A). Comparative sequence alignment of the two HKD domains revealed that the HKD1 domain sequence was relatively more diverse than   Both the HKD1 and HKD2 domains contained three highly conserved amino acids (6H, 8K and 13D), implying that these have a key functional importance within these domains. It would appear that the two HKD domains have co-evolved within the PLD family. Phylogenetic analysis of the HKD1 and HKD2 domains, respectively, produced two trees that exhibited similar topology (see Additional file 10) to that revealed by the same analysis of the PLD gene family. The HKD domain trees clustered similarly into 5 subgroups, with gene members clustering in an almost identical fashion to that observed when the full length PLDs were so analysed.
Three other motifs (4, 3 and 14) in the middle region were also shared by both the C2-PLDs and PXPH-PLDs. Variations in the sequence of this motif exhibit different PIP 2 binding affinity [8,37,38]. Sequence alignment of motif 4 from the individual PLDs showed that 73.2% (30/ 41) of the amino acid sites were highly conserved with 10 of them being fully conserved, suggesting that they may play an essential role in the binding of both C2-PLDs and PXPH-PLDs to PIP 2 (see Additional file 11). Motif 3 contained a regular-expression sequence "IYIENQ[FY]F" ( Figure 5B). The seventh amino acid of this regularexpression sequence, Phe (F), appeared in all PXPH-PLDs, but was often substituted by Tyr (Y) in the C2-PLDs (see Additional file 12). This short sequence was only found in the PLD family members, and has been postulated to increase the rate of catalysis and ensure substrate specificity [8]. This suggests that the sequence   "IYIENQ[FY]F" may be almost as critical as the HKD motif for PLD activity [39,40]. In addition, motif 14 was also found in both the C2-PLDs and PXPH-PLDs (see Additional file 13). Four amino acid sites in this motif were shown to be highly conserved, especially the eighth amino acid, tyrosine, which was fully conserved. With the exception of these four conserved amino acid sites, the remainder of motif 14 exhibited a high degree of sequence polymorphism.
Apart from these shared motifs, the C2-PLDs and PXPH-PLDs possessed a number of clade-specific motifs within their middle regions. The C2-PLDs had 5 such motifs (21, 27, 11, 2 and13) and the PXPH-PLDs one motif, 23 ( Figure 5A). The C2-PLD-specific motif 2 contained a core triplet of amino acids, "ERF", followed by a highly conserved hydrophobic region, "VYVVV" (see Additional file 14). in AtPLDα1, this motif was reported as being able to bind to the Gα subunit of the Arabidopsis heterotrimeric G protein [41]. When the sequences of motif 2 from the different C2-PLDs were aligned, the ERF triplet appeared to be relatively more conserved than the "VYVVV" region. However, mutations that have occurred in OsPLDα7, AtPLDγ2, and OsPLDδ2 changed the second residue of the ERF triplet from the basic amino acid R into the non-charged amino acids S, N and Q, respectively, implying a possible change in the ability of these PLDs to bind to the heterotrimeric G protein Gα subunit [41].
Overall, 8 motifs (6, 8, 5, 4, 3, 14, 1 and 19) were shown to be shared by both the C2-and PXPH-PLDs subgroups, implying that they are likely to be necessary for PLD function. The majority of these conserved motifs appeared to exist in the middle region of the PLDs, strongly suggesting that the two HKD domains in this region play a key role in the lipase activity of the PLDs.

Conclusion
In this study, we have provided a genome-wide identification and analysis of the PLD gene family in Poplar and Grape. Eighteen and 11 members of the PLD gene family were identified in Poplar and Grape, respectively. Phylogenetic and gene structure analyses showed that the PLD gene family can be divided into 6 subgroups (α, β/γ, δ, ε, ζ, and φ) and that these 6 PLD subgroups originated from 4 original ancestors through a series of gene duplications. Phylogenetically, the majority of the PLD genes from Poplar (82.8%, 14/17) and Grape (90.9%, 10/11) clustered particularly closely, suggesting a close evolutionary relationship between these two species. Five pairs of duplicated PLD genes were identified in Poplar, more than those identified in Grape, suggesting that frequent gene duplication occurred after the species diverged resulting in a rapid expansion of the PLD gene family in Poplar. The majority of gene duplications in Poplar appeared to have been caused by segmental duplication, distinguishing it from the other three plant species, Arabidopsis, rice and Grape, where both segmental duplication and transposition events appeared to have contributed to the duplication of the PLD genes. Furthermore, the PLD gene duplications in Poplar were estimated to have occurred between 11.31 to 13.76 Ma, substantially later than the time when duplications occurred in the other three plant species (25.09 to 88.39 Ma). Adaptive evolution analysis showed that purifying selection has driven evolution of the C2-PLDs, whereas positive selection has contributed, at least in part, to the evolution of the remaining PLDs, especially after gene duplication.
The PLD gene family is divided into two main subfamilies, C2-PLDs and PXPH-PLDs, and one smaller subfamily, SP-PLDs. Motif analyses show that the C2-PLDs and PXPH-PLDs generally contain 23 and 17 motifs, respectively. Among these, 8 motifs were shared by both the C2and PXPH-PLDs subfamilies, implying that they may be necessary for PLD function. The majority of these shared motifs exist in the middle region of the PLDs, suggesting that the two HKD domains also play a core role in PLD activity.
This detailed analysis of the PLD gene family in these two woody plants has provided the data that will form the basis for future hypothesis-driven experiments involving either loss-or gain-of-function studies aimed at clarifying the role of the different PLDs in the growth, development and survival of Poplar and Grape. Thus, this new knowledge of the PLD gene family in these species may lead to the possibility of modulating PLD gene expression and function in order to control specific aspects of the physiology and development of woody plants.

Identification of PLD gene families in Poplar and Grape
To identify members of the PLD gene family in Poplar and Grape, multiple database searches were performed. Arabidopsis PLD gene sequences were retrieved from http://www.arabidopsis.org and used as queries to perform repetitive blast searches against the Poplar Genome (V1.1) database http://genome.jgi-psf.org/Poptr1_1/ Poptr1_1.home.html and the Genoscope Genome Project Grape genome database http://www.cns.fr/. Blast searches were also performed against nucleic acid sequence data repositories at the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih. gov. Genes annotated as "Phospholipases D" or "PLD" were also collected by keyword searches in Genbank. Additionally, a Hidden Markov Model (HMM) search was performed in the proteome databases of Poplar and Grape using HKD domain HMM profiles (PFAM, PF00614). Profile searches were performed using the HMMER 2.3.2 software package [34]. All protein sequences derived from the candidate PLD genes collected were examined using the domain analysis programs, PFAM http://pfam.sanger.ac.uk/ and SMART http://smart.embl-heidelberg.de/ with the default cut off parameters. Gene sequences with two HKD domains were considered to be members of the PLD gene family. Pseudogenes were determined according to their gene annotation or when their coding sequences were obviously terminated by premature stop codons.

Sequence and phylogenetic analyses of PLD gene family
PLD gene sequences were aligned using the program Clustal X with BLOSUM30 as the protein weight matrix. The program MUSCLE (version 3.52) was also used to perform multiple sequence alignments to confirm the Clustal X data output [42]. Phylogenetic trees based on the protein sequences of the PLDs were constructed using the neighbor-joining (NJ) method of the program MEGA4 [43] with p-distance and the complete deletion option parameters engaged. The reliability of the trees obtained was tested using bootstrapping with 1000 replicates. Images of the phylogenetic trees were also drawn using MEGA4.

Chromosomal location and Gene structure of PLD genes
PLD gene chromosomal locations were determined using the Poplar genome browser http://genome.jgi-psf.org/ Poptr1_1/optr1_1.home.html and Grape genome browser http://www.cns.fr/externe/Genome Browser/ Vitis/, respectively. Gene intron/extron structure information was collected from the genome annotations of Poplar and Grape from NCBI.

Protein Motif analysis
In order to investigate protein motifs in more detail, the PLD protein sequences were analyzed using the MEME/ MAST software http://meme.sdsc.edu/ [44,45]. The functional annotation of the identified motifs was implemented by InterProScan http://www.ebi.ac.uk/Tools/ InterProScan/.

Analysis of PLD gene expansion patterns
Segmental (chromosomal segments) duplication, tandem duplication (duplications in a tandem pattern) and transposition events result in gene family expansion [46]. Transposition occurs when a segment from one chromosome becomes unaligned with the corresponding segment from the other chromosome. Because it is difficult to identify transposition events based on gene sequence analysis, in this study we focused on the processes of segmental and tandem duplication. To categorize expansion of the PLD gene family, we examined the chromosomal locations of all members of this family in Arabidopsis, rice, Poplar and Grape. Tandem duplication was characterized by multiple gene family members occurring within either the same or neighboring intergenic regions.
A method similar to that of Maher et al. [47] was used to identify segmental duplications. First paralogous PLD genes were identified at the terminal nodes of the phylogenetic tree. Next, 10 protein-coding genes upstream and downstream of each pair of paralogs were obtained from the annotated genomes of Arabidopsis, rice, Poplar and Grape. Lastly, the similarity between the genes flanking one PLD gene and those flanking the other PLD gene in each pair of paralogs was determined. A pair of paralogous PLD genes was considered to have originated from a duplication event if both resided within a region of conserved protein-coding genes.

Calculating Ks to date the duplication events and adaptive evolution analysis of PLD gene family
Pairwise alignment of nucleotide sequences of PLD paralogs was performed using Clustal X1.83. Gaps in the alignments were removed manually by Bioedit. The Ka and Ks values of the paralogous genes were estimated by the program K-Estimator 6.0 [48]. To better explain the patterns of macro-evolution, estimates of the evolutionary rates were considered extremely useful. Assuming a molecular clock, the synonymous substitution rates (Ks) of duplicated genes would be expected to be similar over time [49]. Thus, Ks could be used as the proxy for time and the conserved flanking protein-coding genes was used to estimate the dates of the segmental duplication events. The mean Ks value was calculated for each of duplicated gene pairs and then used to date the duplication events. Ks values greater than 2.0 were discarded in order to avoid the risk of saturation. The Ks values were then used to calculate the approximate date of the duplication event(T = Ks/2λ), assuming clock-like rates (λ) of synonymous substitution of 1.5 × 10 -8 substitutions/synonymous site/year for Arabidopsis [50], 6.5 × 10 -9 for rice [51], 9.1 × 10 -9 for Poplar [52], and 6.5 × 10 -9 for Grape [53]. To investigate whether Darwinian positive selection was involved in driving gene divergence after duplication, first Sliding Window analysis (300 bp window, 50 bp slide) was performed on the coding regions of paralogous PLD genes from the four plant species studied and was then used to calculate the Ka/Ks ratio. Subsequently, the codon-based site model of codeml in PAML [33] was used to perform adaptive evolution analysis on the three different types of PLD genes separately.