- Research article
- Open Access
Genome-wide and molecular evolution analyses of the phospholipase D gene family in Poplar and Grape
© Liu et al; licensee BioMed Central Ltd. 2010
- Received: 11 November 2009
- Accepted: 18 June 2010
- Published: 18 June 2010
The Phospholipase D (PLD) family plays an important role in the regulation of cellular processes in plants, including abscisic acid signaling, programmed cell death, root hair patterning, root growth, freezing tolerance and other stress responses. PLD genes constitute an important gene family in higher plants. However, until now our knowledge concerning the PLD gene family members and their evolutionary relationship in woody plants such as Poplar and Grape has been limited.
In this study, we have provided a genome-wide analysis of the PLD gene family in Poplar and Grape. Eighteen and eleven members of the PLD gene family were identified in Poplar and Grape, respectively. Phylogenetic and gene structure analyses showed that the PLD gene family can be divided into 6 subgroups: α, β/γ, δ, ε, ζ, and φ, and that the 6 PLD subgroups originated from 4 original ancestors through a series of gene duplications. Interestingly, the majority of the PLD genes from both Poplar (76.5%, 13/17) and Grape (90.9%, 10/11) clustered closely together in the phylogenetic tree, to the extent that the two species appear more closely related to each other, at least in terms of the PLD gene family, than either is to Arabidopsis or rice. Five pairs of duplicated PLD genes were identified in Poplar, more than those in Grape, suggesting that frequent gene duplications occurred after these species diverged, resulting in a rapid expansion of the PLD gene family in Poplar. The majority of the gene duplications in Poplar were caused by segmental duplication and were distinct from those in Arabidopsis, rice and Grape. Additionally, the gene duplications in Poplar were estimated to have occurred from 11.31 to 13.76 million years ago, which is later than those that occurred in the other three plant species. Adaptive evolution analysis showed that positive selection contributed to the evolution of the PXPH- and SP-PLDs, whereas purifying selection has driven the evolution of the C2-PLDs, which contain a C2 domain in their N-termini. Analyses have shown that the C2-PLDs generally contain 23 motifs, more than the 17 motifs found in the PXPH-PLDs, which contain PX and PH domains in their N-termini. Among these identified motifs, eight (6, 8, 5, 4, 3, 14, 1 and 19) were shared by both the C2- and PXPH-PLD subfamilies, implying that they may be necessary for PLD function.
Five of these shared motifs are located in the central region of the proteins, thus strongly suggesting that this region containing a HKD domain (named after three conserved H, K and D residues) plays a key role in the lipase activity of the PLDs.
As a first step towards genome wide analyses of the PLD genes in woody plants, our results provide valuable information for increasing our understanding of the function and evolution of the PLD gene family in higher plants.
- Gene Duplication
- Tandem Duplication
- Segmental Duplication
- Amino Acid Site
- Pleckstrin Homology Domain
Plants are exposed to widely varying environmental conditions and, because of their sessile nature, they can only survive and thrive by adapting to the changes in their surroundings. Thus, higher plants have the ability to adapt to periods of stress by employing specific responses underpinned by defined modifications of their cellular processes. Phospholipase D (PLD) plays an important role in the regulation of diverse cellular processes in plants, including abscisic acid signaling, programmed cell death, root hair patterning, root growth, freezing tolerance and other stress responses. PLD hydrolyzes phospholipids into a head group alcohol and phosphatidic acid (PA), which is an important intracellular messenger in plants, microorganisms and mammals.
The gene encoding PLD was first identified in plants more than 50 years ago, but did not receive detailed attention until the 1980s [4, 5]. Multiple PLD genes encoding isoforms that could be classified into different subgroups with distinct biochemical, regulatory and catalytic properties have now been identified. Six Arabidopsis PLDs (α, β, γ, δ, ε and ζ) have been characterized molecularly and biochemically and can be differentiated depending on their requirements and/or affinities for Ca2+, phosphatidylinositol 4,5-bisphosphate (PIP2) and free fatty acids [6, 7]. The predominant isoenzyme is the α-type PLD, which can be detected in both the leaves and seeds of plants and is responsible for the majority of the baseline PLD activity found therein. PLDα does not require phosphoinositides for its activity when assayed in the presence of mM levels of Ca2+ ions. It exhibits optimum activity at pH values between 5 and 6 and at high, non-physiological Ca2+ concentrations between 30 and 100 mM [8, 9]. In contrast, the β, γ, δ and ε PLD isoenzymes from Arabidopsis show their highest activity at μM Ca2+ concentrations and require the presence of PIP2 to be fully active. The activity of plant PLDζ appears to occur independently of Ca2+ ions, but requires PIP2 to selectively hydrolyze phosphatidylcholine. In rice, an additional isoenzyme, PLDφ, has been identified but remains poorly characterized. The PLD gene family encodes proteins with a number of cellular functions. For example, it has been suggested that PLDβ is involved in the regulation of seed germination and may act as a negative regulator of defence responses and disease resistance in rice [11, 12], whereas PLDδ has been shown to play an important role in drought-induced hydrogen peroxide synthesis, responses to freezing and UV irradiation, and in the reorganization of microtubules at the plasma membrane [1, 13].
Despite these apparent differences in their biochemical functions, all the eukaryotic PLDs share the presence of an N-terminal phospholipid-binding region and two highly conserved C-terminal domains where two catalytic HxKxxxxD (HKD) motifs interact to promote the lipase activity [14, 15]. The plant PLD family can also be divided into two further subfamilies (C2 and PXPH) based on the composition of their N-terminal phospholipid-binding domains. The C2-PLD subfamily comprises PLDs containing a C2 domain in their N-termini, while the N-termini of those of the PXPH-PLD subfamily contain both a phox homology (PX) domain and a pleckstrin homology (PH) domain. The C2, PX and PH domains have been implicated in protein-protein interactions, but perhaps their best described function involves their ability to modulate membrane targeting of proteins. The C2 domain of the C2-PLDs mediates the localization of soluble proteins to membranes by binding phospholipids in a Ca2+-dependent manner, while the PX and PH domains of the PXPH-PLDs have been shown to mediate membrane targeting and are closely linked to polyphosphoinositide signalling. The C2-PLDs only exist in plants, whereas the PXPH-PLDs exist both in plants and other organisms such as Caenorhabditis elegans and Homo sapiens. Presumably, the genes encoding the C2-PLDs and their progenitors have been lost from the evolutionary lineages leading to animals and fungi. Furthermore, one additional small PLD subfamily (SP-PLDs) exists, whose members possess an N-terminal signal peptide in place of the usual C2 or PXPH domains; the resulting specific cellular localizations may relate to their particular physiological functions in modulating plant growth, development and defence. The isoforms α, β, γ, δ, and ε are C2-PLDs, the ζ isoform is a PXPH-PLD and the φ isoform is an SP-PLD.
The PLD gene family has been well studied in Arabidopsis and rice. However, there is far less information about this family for woody plant species such as Poplar and Grape. The recent provision of draft genome sequences for Poplar and Grape offered the opportunity to investigate the PLD gene family in these species. In this study, we first identified the PLD gene family members in Poplar and Grape and then performed detailed evolutionary analyses of these identified genes in comparison with those existing in Arabidopsis and rice.
PLD gene family in Poplar and Grape
PLD genes identified in Poplar
PLD genes identified in Grape
Based on the presence of C2, PX and PH motifs within their N-terminal domains, all the PLD family members in Poplar and Grape were assigned to two main subgroups, C2-PLDs and PXPH-PLDs. Additionally, one gene encoding an SP-PLD with an N-terminal signal peptide replacing the C2, PX and PH domains was identified for each of these species. Corresponding SP-PLD genes were also found in other species, including Caenorhabditis elegans (CAE72017, NP_504824), Dictyostelium discoideum (XP_637114), Homo sapiens (AAH00553, AAH15003) and rice (Os06g44060).
Chromosomal location of PLD genes on Poplar and Grape genomes
Phylogenetic relationships of PLD gene family in Poplar and Grape
Structural analyses can provide valuable information concerning duplication events when interpreting phylogenetic relationships within gene families. Thus, the exon/intron structure of each member of the PLD family was analyzed (right panel in Figure 3). The number of exons determined for members of the PLD gene family ranged from 2 in OsPLDα6 to 22 in PtPLD16. Most members within the individual subgroups shared similar intron/exon numbers and predicted coding sequence (CDS) lengths, consistent with the phylogenetic classification of the PLDs into the subgroups depicted in the left panel of Figure 3. For example, both the β/γ and δ subgroups included members with 9-12 exons and CDS lengths of between 792 and 1296 codons, consistent with the observation that they originated by continuous gene duplication. Interestingly, the genes VvPLD9 and VvPLD10 appeared longer than the other members of the β/γ and δ subgroups because of the presence of a single long intron that contained repeated retrotransposon elements. Similarly, members of the α and ε subgroups possessed 3-4 exons, with some introns extended by retrotransposon elements, suggesting that they also had a common ancestor. Members of the ζ subgroup, which comprised the PXPH-PLDs, were distinct from the C2-PLD clade in that they possessed between 19 and 21 exons (except AtPLDζ2, which had 16 exons), suggesting an independent evolutionary lineage. Similarly, all members of the SP-PLD φ subgroup had 7 exons, also implying that they originated via an evolutionary path separate from that of the C2-PLDs. Thus, the phylogenetic and gene structure analyses suggested that the 6 PLD subgroups originated from 4 ancestors via a series of gene duplications.
PLD genes from each of the subgroups were found in all four species of the higher plants examined, with the exception of members of the two small subgroups, ε and φ, which were absent in the rice and Arabidopsis genomes, respectively. Presumably, the main subgroups of the plant PLD gene family were established before the dicot and monocot lineages diverged and before the further division of the dicots into herbaceous (non-woody) and woody lineages. The majority of the PLD genes from Poplar (76.5%, 13/17) and Grape (90.9%, 10/11) clustered more closely together in the phylogenetic tree than they did with those from Arabidopsis and rice (Figure 3), suggesting that the two woody plants have a closer evolutionary relationship with each other than with the herbaceous dicot or the monocot. Five pairs of Poplar PLD genes (PtPLD10 and PtPLD4, PtPLD17 and PtPLD15, PtPLD6 and PtPLD3, PtPLD13 and PtPLD2, and PtPLD16 and PtPLD8) formed 5 well-supported subclusters (bootstrap values of 100%) (left panel in Figure 3), indicating that they were evolutionarily very closely related. The two genes in each of the 5 subclusters had very similar structures (right panel in Figure 3), indicating that they originated from relatively recent gene duplications. Four of these five subclusters also clustered relatively closely with a similar Grape PLD gene. At least one pair of Grape PLD genes (VvPLD7 and VvPLD8) clustered sufficiently closely to suggest that they too arose from a recent duplication event. This Grape subcluster also clustered closely with a PLD gene from Poplar. Collectively, these results indicate that frequent gene duplications occurred following the divergence of the Poplar and Grape species and that in Poplar this resulted in a rapid expansion of the size of the PLD gene family.
Evolutionary patterns of PLD gene family in Arabidopsis, rice, Poplar and Grape
Segmental duplication, tandem duplication and transposition events such as retroposition and replicative transposition are the main causes of gene family expansion. Two tandem PLD gene duplications have previously been identified, one in Arabidopsis (AtPLDγ2-AtPLDγ1-AtPLDγ3) and one in rice (OsPLDα3-OsPLDα4-OsPLDα5) [6, 11]. Chromosomal location analyses of the PLD gene family in Poplar and Grape showed that the majority of the genes appeared randomly scattered throughout the genome, with the exception of one pair of Grape PLD genes (VvPLD7/VvPLD8) which were tightly co-located and thus most likely resulted from a tandem duplication (Figure 2). This suggests that tandem duplication is not a major contributory event leading to the expansion of the PLD gene family in higher plants. Thus, we hypothesized that, at least in Arabidopsis, rice, Poplar and Grape, segmental duplication and transposition events may have played a more prominent role in the evolution of the PLD gene family.
Duplicated PLD genes and the number of conserved protein-coding genes flanking them in Arabidopsis, rice, Poplar and Grape (table columns: Duplicated PLD gene 1; Duplicated PLD gene 2; Number of conserved flanking protein-coding genes; Date (million years ago))
Taken together, these findings indicate that the mechanisms underlying the gene duplications that have contributed to the expansion of the PLD gene family differ between the four higher plants examined. In Poplar, segmental duplication accounted for the majority of the gene duplications identified. In evolutionary terms, most of these Poplar PLD gene duplications appeared to have occurred relatively recently and may be associated with novel functional divergence and adaptation. However, in Arabidopsis, rice and Grape, both segmental duplication and transposition events appear to have contributed to the duplication of the PLD genes. It is worth noting that some 41.4% of the Grape genome is composed of repetitive/transposable elements. Thus, it is reasonable to propose that transposition events could have been an important factor governing the expansion of the PLD gene family in this species.
To estimate the evolutionary dates of the segmental duplication events, Ks was used as the proxy for time and the conserved protein-coding genes flanking the PLD gene pairs were thus subjected to Ks calculation (Table 3). The protein-coding genes flanking the 5 pairs of duplicated genes in Poplar had very consistent mean Ks values (from 0.2059 to 0.2505), suggesting that the segmental duplication events in this species occurred between 11.31 and 13.76 million years ago (Ma). This time period is subsequent to the time at which the evolutionary lineages of Poplar and Arabidopsis divided, circa 100-120 Ma, and is consistent with the time (13 Ma) when a recent large-scale genome duplication event is thought to have occurred in Poplar. The implication is that, relative to other species, the rapid expansion of the PLD gene family in Poplar resulted from higher-order genome-level processes.
The PLD gene segmental duplication in Grape was estimated to have occurred about 25.09 Ma (mean Ks = 0.7527), which is similar to when this was observed in Poplar. The observation that there are fewer PLD genes in Grape compared to Poplar may be due to the fact that Grape experienced two genome wide duplication (GWD) events during evolution compared to three in Poplar [24, 25].
For rice, the segmental duplication event was estimated to have occurred between 69.41 and 76.70 Ma, which is subsequent to the time of divergence of the monocots and eudicots (170-235 Ma), but precedes the time of the origin of the grasses (55-70 Ma) [26–28]. The earliest observed segmental duplication event occurred in the PLD genes of Arabidopsis around 88.39 Ma. It is interesting, therefore, that despite similar levels of GWD, Arabidopsis has comparatively fewer PLD genes than Poplar. It is likely that this is due to the fact that the Arabidopsis genome has subsequently suffered a high level of gene loss [20, 29].
Functional divergence and driving forces for genetic divergence
Site-specific shift rates (Type-I functional divergence) reflect the difference in the evolutionary rate of change of specific amino acid sites in proteins following gene duplication [30, 31]. In order to detect Type-I functional divergence occurring in the PLDs, we determined the differences in the site-specific evolutionary rates of amino acid changes between the C2-PLD and PXPH-PLD clades (Figure 3) using the program DIVERGE. The results showed significant evidence of Type-I functional divergence between the C2-PLDs and PXPH-PLDs (θI = 0.64, P < 0.01, see Additional file 2). When the threshold value of the posterior probability (Qk) was set to 0.80 or 0.90, 75 and 40 amino acid sites, respectively, were determined to be associated with the functional divergence of the C2- and PXPH-PLDs (see Additional file 2).
Positive Darwinian selection has been reported to be associated with gene duplication and functional divergence. To explore whether positive selection drove evolution of the PLD gene family, the coding regions of thirteen PLD gene paralogs from Arabidopsis, rice, Poplar and Grape were subjected to sliding window analyses. The ratio of nonsynonymous (dN) to synonymous (dS) substitutions (ω = dN/dS) is generally used to identify positive selection. A dN/dS (also known as Ka/Ks) ratio of >1, <1 or =1 indicates positive selection, negative (purifying) selection or neutral evolution, respectively. We calculated the dN/dS ratios for all the paralogs depicted in the phylogenetic tree reported in Figure 3 with a sliding window of 300 bp and a moving step of 50 bp. The resulting pairwise comparison data showed that all the paralogous genes have dN/dS ratios of <1 except for the comparisons OsPLDα4 vs. OsPLDα5 and OsPLDβ1 vs. OsPLDβ2 (see Additional file 3), strongly suggesting that the PLD gene family has mainly experienced strong purifying selection pressure. Here, the action of such purifying selection on the duplicated Poplar PLD genes supports the observation that the rapid expansion of the PLD family in this species resulted from higher-order genome-level processes. The gene pair OsPLDα4 and OsPLDα5 clustered closely together (bootstrap value of 100%) and exhibited very similar exon/intron structures (Figure 3), suggesting that they were derived from a relatively recent gene duplication event. The gene pair OsPLDβ1 and OsPLDβ2 also appeared to be similarly derived. Pairwise comparisons between OsPLDα4 and OsPLDα5 and between OsPLDβ1 and OsPLDβ2 exhibited ω values >1 in some regions, especially in the N-termini of the proteins (see Additional file 3), suggesting that a more recent episode of positive selection has occurred after the gene duplication event.
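The windowing scheme and the interpretation of ω described above can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the function names are hypothetical, the handling of alignments shorter than one window is an assumption, and the estimation of dN and dS within each window (performed by dedicated software) is deliberately left out.

```python
def sliding_windows(length, window=300, step=50):
    """Return (start, end) coordinates (0-based, end-exclusive) of windows
    tiling an alignment of `length` bp with the stated window and step."""
    if length < window:
        # Assumption: a short alignment is treated as a single window.
        return [(0, length)]
    return [(s, s + window) for s in range(0, length - window + 1, step)]


def classify_omega(omega, tol=1e-9):
    """Interpret a dN/dS ratio: >1 positive, <1 purifying, =1 neutral."""
    if abs(omega - 1.0) <= tol:
        return "neutral"
    return "positive" if omega > 1.0 else "purifying"
```

For a pair of aligned coding sequences, each window returned by `sliding_windows` would be passed to a dN/dS estimator and the resulting ratio interpreted with `classify_omega`.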
To further investigate the evolutionary selection pressures acting on the PLDs, a site-specific model was formulated using the Codeml program of PAML 4.0 with sequences from the C2-, PXPH- and SP-PLD clades. Consistent with the pairwise comparison results, when using the robust codon-substitution model in PAML, purifying selection was also determined to have acted on the C2-PLDs (see Additional file 4). Such a selection pressure may indicate that strong functional constraints have a bearing on the evolution of the C2-PLDs, supporting the notion that this group of PLDs has important and essential roles in the regulation of plant cellular processes. Conversely, the concept that purifying selection is the main evolutionary mode of amino acid change in the C2-PLDs, along with the fact that the majority (12/13) of duplicated PLD genes belong to this clade, implies that C2-PLD gene duplication is unlikely to be associated with the formation of PLDs of either novel or divergent function.
In contrast, positive selection was observed to have occurred during the evolution of the PXPH-PLDs and SP-PLDs. Weak (ω = 1.34) and strong (ω = 8.14) positive selections were determined to have acted on the PXPH-PLDs and SP-PLDs, respectively. Although not reaching significant levels (posterior probabilities >0.90), one (site 627) and 3 (sites 3, 18 and 23) positively selected amino acid sites were identified in the PXPH- and SP-PLDs, respectively (see Additional file 4).
Domains and Motifs analyses in PLD gene family
The proteins encoded by the newly identified Poplar and Grape PLD genes were subjected to protein domain analyses. The program hmmpfam in HMMer was used initially to identify the major domains of the PLD proteins. Such domain analyses for Arabidopsis and rice PLDs have been performed previously [6, 11]. Here, the analyses showed that all the PLDs in Poplar and Grape possessed the two characteristic and structurally conserved HKD domains essential for their lipase activity. As in the case of Arabidopsis and rice, the Poplar and Grape PLDs could be classified into three subgroups (C2-, PXPH- and SP-PLD), based on the presence of the subgroup-specific domains (Figure 4). As expected, in their N-terminal regions the C2-PLDs contained one C2 domain, the PXPH-PLDs contained both PX and PH domains, and an N-terminal signal peptide was identified in each of the SP-PLDs (Figure 4).
The N-terminal region of the C2-PLDs contained 10 motifs, compared with 6 motifs in the same region of the PXPH-PLDs (Figure 5A). Four motifs found in the N-termini of the C2-PLDs (20, 17, 24 and 12) appeared specific to this PLD clade (see Additional file 7) and have been suggested to take part in the formation of an eight β-strand switch involved in Ca2+ binding. Motifs 28 and 29 appeared to be specific to the PX domain and motif 30 to the PH domain of the PXPH-PLDs (see Additional file 8), and are thought to be associated with the binding of phosphatidylinositol lipids. One observed exception was AtPLDγ3, which contained an additional motif, 13, in the C2 domain. Additionally, in OsPLDα7, PtPLD10, PtPLD14, VvPLD5, AtPLDα3, AtPLDε and AtPLDγ2 the C2 domain appeared to have lost either one or two of the four motifs mentioned above that are associated with the binding of Ca2+ (see Additional file 5). Within the N-terminal region of the C2-PLDs, the region following the C2 domain (referred to as the post-C2 region) usually included the 6 motifs 22, 10, 6, 8, 9, and 15. A degree of loss of some of these motifs was observed in some of the C2-PLDs from each of the four species examined. For example, AtPLDε, VvPLD5, VvPLD2, PtPLD14, PtPLD17, PtPLD6, PtPLD3 and OsPLDα7 appear to have lost either one or both of motifs 22 and 15 (see Additional file 5). In contrast, no motif loss was observed in the PXPH-PLDs, possibly due to the small number of members of this subgroup and the relatively small number of motifs in the N-terminal region of these PLDs. Both the C2-PLDs and PXPH-PLDs shared two conserved motifs, 6 and 8, implying a crucial role for these in PLD function.
Three other motifs (4, 3 and 14) in the middle region were also shared by both the C2-PLDs and PXPH-PLDs. Motif 4 contained a regular-expression sequence "[GK]GPR[EQ]PWHD[LIV]H[CS][KR][IL][ED]GPA[YW]DVLTNFE[QE]RWRK[AQ]G[G][PW][KD]GLVK" (Figure 5B) which is thought to form the binding site of PIP2. Variations in the sequence of this motif confer different PIP2 binding affinities [8, 37, 38]. Sequence alignment of motif 4 from the individual PLDs showed that 73.2% (30/41) of the amino acid sites were highly conserved, with 10 of them being fully conserved, suggesting that they may play an essential role in the binding of both C2-PLDs and PXPH-PLDs to PIP2 (see Additional file 11). Motif 3 contained a regular-expression sequence "IYIENQ[FY]F" (Figure 5B). The seventh amino acid of this regular-expression sequence, Phe (F), appeared in all PXPH-PLDs, but was often substituted by Tyr (Y) in the C2-PLDs (see Additional file 12). This short sequence was only found in the PLD family members, and has been postulated to increase the rate of catalysis and ensure substrate specificity. This suggests that the sequence "IYIENQ[FY]F" may be almost as critical as the HKD motif for PLD activity [39, 40]. In addition, motif 14 was also found in both the C2-PLDs and PXPH-PLDs (see Additional file 13). Four amino acid sites in this motif were shown to be highly conserved, especially the eighth amino acid, tyrosine, which was fully conserved. With the exception of these four conserved amino acid sites, the remainder of motif 14 exhibited a high degree of sequence polymorphism.
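The two regular-expression sequences quoted above can be used directly to scan a protein sequence. The sketch below uses Python's standard `re` module; the helper name `find_motif` and the toy sequence are hypothetical and serve only to illustrate the matching, not to represent any real PLD protein.

```python
import re

# Patterns copied verbatim from the text (motif 4 and motif 3).
MOTIF4 = re.compile(
    r"[GK]GPR[EQ]PWHD[LIV]H[CS][KR][IL][ED]GPA[YW]DVLTNFE"
    r"[QE]RWRK[AQ]G[G][PW][KD]GLVK"
)
MOTIF3 = re.compile(r"IYIENQ[FY]F")


def find_motif(pattern, protein):
    """Return (start, matched_text) for every non-overlapping occurrence
    of `pattern` in `protein` (0-based start positions)."""
    return [(m.start(), m.group()) for m in pattern.finditer(protein)]


# Hypothetical toy sequence: motif 3 (Tyr variant) followed by one
# sequence satisfying the motif 4 pattern.
toy = "MA" + "IYIENQYF" + "AA" + "GGPREPWHDLHCKIEGPAYDVLTNFEQRWRKAGGPKGLVK"
motif3_hits = find_motif(MOTIF3, toy)
motif4_hits = find_motif(MOTIF4, toy)
```

The seventh residue of each motif 3 hit (`hit[1][6]`) distinguishes the Phe and Tyr variants discussed in the text.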
Apart from these shared motifs, the C2-PLDs and PXPH-PLDs possessed a number of clade-specific motifs within their middle regions. The C2-PLDs had 5 such motifs (21, 27, 11, 2 and 13) and the PXPH-PLDs one motif, 23 (Figure 5A). The C2-PLD-specific motif 2 contained a core triplet of amino acids, "ERF", followed by a highly conserved hydrophobic region, "VYVVV" (see Additional file 14). In AtPLDα1, this motif was reported as being able to bind to the Gα subunit of the Arabidopsis heterotrimeric G protein. When the sequences of motif 2 from the different C2-PLDs were aligned, the ERF triplet appeared to be relatively more conserved than the "VYVVV" region. However, mutations that have occurred in OsPLDα7, AtPLDγ2, and OsPLDδ2 changed the second residue of the ERF triplet from the basic amino acid R into the non-charged amino acids S, N and Q, respectively, implying a possible change in the ability of these PLDs to bind to the heterotrimeric G protein Gα subunit.
In the C-terminal region, the C2-PLDs contained 4 motifs (25, 7, 19 and 16) and the PXPH-PLDs 2 motifs (18 and 19) (Figure 5), thus sharing the single motif 19.
Overall, 8 motifs (6, 8, 5, 4, 3, 14, 1 and 19) were shown to be shared by both the C2- and PXPH-PLD subgroups, implying that they are likely to be necessary for PLD function. The majority of these conserved motifs were located in the middle region of the PLDs, strongly suggesting that the two HKD domains in this region play a key role in the lipase activity of the PLDs.
In this study, we have provided a genome-wide identification and analysis of the PLD gene family in Poplar and Grape. Eighteen and 11 members of the PLD gene family were identified in Poplar and Grape, respectively. Phylogenetic and gene structure analyses showed that the PLD gene family can be divided into 6 subgroups (α, β/γ, δ, ε, ζ, and φ) and that these 6 PLD subgroups originated from 4 original ancestors through a series of gene duplications. Phylogenetically, the majority of the PLD genes from Poplar (76.5%, 13/17) and Grape (90.9%, 10/11) clustered particularly closely, suggesting a close evolutionary relationship between these two species. Five pairs of duplicated PLD genes were identified in Poplar, more than those identified in Grape, suggesting that frequent gene duplication occurred after the species diverged, resulting in a rapid expansion of the PLD gene family in Poplar. The majority of gene duplications in Poplar appeared to have been caused by segmental duplication, distinguishing it from the other three plant species, Arabidopsis, rice and Grape, where both segmental duplication and transposition events appeared to have contributed to the duplication of the PLD genes. Furthermore, the PLD gene duplications in Poplar were estimated to have occurred between 11.31 and 13.76 Ma, substantially later than the time when duplications occurred in the other three plant species (25.09 to 88.39 Ma). Adaptive evolution analysis showed that purifying selection has driven evolution of the C2-PLDs, whereas positive selection has contributed, at least in part, to the evolution of the remaining PLDs, especially after gene duplication.
The PLD gene family is divided into two main subfamilies, C2-PLDs and PXPH-PLDs, and one smaller subfamily, SP-PLDs. Motif analyses show that the C2-PLDs and PXPH-PLDs generally contain 23 and 17 motifs, respectively. Among these, 8 motifs were shared by both the C2- and PXPH-PLD subfamilies, implying that they may be necessary for PLD function. The majority of these shared motifs are located in the middle region of the PLDs, suggesting that the two HKD domains also play a core role in PLD activity.
This detailed analysis of the PLD gene family in these two woody plants has provided the data that will form the basis for future hypothesis-driven experiments involving either loss- or gain-of-function studies aimed at clarifying the role of the different PLDs in the growth, development and survival of Poplar and Grape. Thus, this new knowledge of the PLD gene family in these species may lead to the possibility of modulating PLD gene expression and function in order to control specific aspects of the physiology and development of woody plants.
Identification of PLD gene families in Poplar and Grape
To identify members of the PLD gene family in Poplar and Grape, multiple database searches were performed. Arabidopsis PLD gene sequences were retrieved from http://www.arabidopsis.org and used as queries to perform repeated BLAST searches against the Poplar Genome (V1.1) database (http://genome.jgi-psf.org/Poptr1_1/Poptr1_1.home.html) and the Genoscope Genome Project Grape genome database (http://www.cns.fr/). BLAST searches were also performed against nucleic acid sequence data repositories at the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov). Genes annotated as "Phospholipases D" or "PLD" were also collected by keyword searches in GenBank. Additionally, a Hidden Markov Model (HMM) search was performed in the proteome databases of Poplar and Grape using HKD domain HMM profiles (PFAM, PF00614). Profile searches were performed using the HMMER 2.3.2 software package. All protein sequences derived from the candidate PLD genes collected were examined using the domain analysis programs PFAM (http://pfam.sanger.ac.uk/) and SMART (http://smart.embl-heidelberg.de/) with the default cut-off parameters. Gene sequences with two HKD domains were considered to be members of the PLD gene family. Pseudogenes were determined according to their gene annotation or when their coding sequences were obviously terminated by premature stop codons.
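The two-HKD membership criterion can be illustrated with a deliberately crude stand-in for the profile search: scanning a protein sequence for the HxKxxxxD consensus with a regular expression. This is not the HMM-based PFAM/SMART domain detection used in this study; a short consensus pattern can match by chance and will miss divergent domains. The function names are hypothetical, and requiring exactly two hits is a literal reading of the criterion.

```python
import re

# HxKxxxxD consensus; a lookahead is used so overlapping hits are counted.
HKD_LOOKAHEAD = re.compile(r"(?=H.K.{4}D)")


def count_hkd(protein):
    """Count occurrences of the HxKxxxxD consensus in a protein sequence."""
    return len(HKD_LOOKAHEAD.findall(protein))


def is_candidate_pld(protein):
    """Crude screen: keep sequences with exactly two HxKxxxxD matches."""
    return count_hkd(protein) == 2
```

In practice such a screen would only pre-filter candidates, which would then be confirmed by the profile-based domain analysis described above.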
Sequence and phylogenetic analyses of PLD gene family
PLD gene sequences were aligned using the program Clustal X with BLOSUM30 as the protein weight matrix. The program MUSCLE (version 3.52) was also used to perform multiple sequence alignments to confirm the Clustal X data output. Phylogenetic trees based on the protein sequences of the PLDs were constructed using the neighbor-joining (NJ) method of the program MEGA4 with the p-distance and complete deletion options engaged. The reliability of the trees obtained was tested using bootstrapping with 1000 replicates. Images of the phylogenetic trees were also drawn using MEGA4.
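The distance measure underlying the NJ trees can be sketched as follows: under complete deletion, every alignment column containing a gap in any sequence is removed, and the p-distance between two sequences is the proportion of remaining sites at which they differ. This is an illustrative re-implementation with hypothetical function names, not MEGA4 code, and it assumes that gaps are written as "-" and that all aligned sequences have equal length.

```python
def complete_deletion(aligned):
    """Drop every alignment column that contains a gap in any sequence."""
    n_cols = len(aligned[0])
    keep = [i for i in range(n_cols) if all(seq[i] != "-" for seq in aligned)]
    return ["".join(seq[i] for i in keep) for seq in aligned]


def p_distance(a, b):
    """Proportion of sites at which two gap-free aligned sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(aligned):
    """Pairwise p-distances after complete deletion of gapped columns."""
    cleaned = complete_deletion(aligned)
    n = len(cleaned)
    return [[p_distance(cleaned[i], cleaned[j]) for j in range(n)]
            for i in range(n)]
```

A matrix of this kind is the input from which a neighbor-joining algorithm builds the tree; bootstrapping repeats the procedure on resampled alignment columns.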
Chromosomal location and Gene structure of PLD genes
PLD gene chromosomal locations were determined using the Poplar genome browser (http://genome.jgi-psf.org/Poptr1_1/Poptr1_1.home.html) and the Grape genome browser (http://www.cns.fr/externe/GenomeBrowser/Vitis/), respectively. Gene intron/exon structure information was collected from the genome annotations of Poplar and Grape at NCBI.
Protein Motif analysis
In order to investigate protein motifs in more detail, the PLD protein sequences were analyzed using the MEME/MAST software (http://meme.sdsc.edu/) [44, 45]. The functional annotation of the identified motifs was implemented by InterProScan (http://www.ebi.ac.uk/Tools/InterProScan/).
Analysis of PLD gene expansion patterns
Segmental duplication (duplication of chromosomal segments), tandem duplication (duplications in a tandem pattern) and transposition events result in gene family expansion. Transposition occurs when a segment from one chromosome becomes unaligned with the corresponding segment from the other chromosome. Because it is difficult to identify transposition events based on gene sequence analysis, in this study we focused on segmental and tandem duplication. To categorize the expansion of the PLD gene family, we examined the chromosomal locations of all members of this family in Arabidopsis, rice, Poplar and Grape. Tandem duplication was characterized by multiple gene family members occurring within either the same or neighboring intergenic regions. A method similar to that of Maher et al. was used to identify segmental duplications. First, paralogous PLD genes were identified at the terminal nodes of the phylogenetic tree. Next, 10 protein-coding genes upstream and downstream of each pair of paralogs were obtained from the annotated genomes of Arabidopsis, rice, Poplar and Grape. Lastly, the similarity between the genes flanking one PLD gene and those flanking the other PLD gene in each pair of paralogs was determined. A pair of paralogous PLD genes was considered to have originated from a segmental duplication event if both resided within a region of conserved protein-coding genes.
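The flanking-gene comparison described above might be sketched in Python as follows. This is only an outline under stated assumptions: similarity between flanking genes is reduced to a precomputed gene-to-family mapping (`family_of`), and the `MIN_SHARED` threshold is hypothetical, since the text does not state how many conserved flanking genes were required. Gene names and orders are toy data.

```python
def flanking(gene_order, target, k=10):
    """Return up to k genes on each side of `target` along one chromosome."""
    i = gene_order.index(target)
    return gene_order[max(0, i - k):i] + gene_order[i + 1:i + 1 + k]


def shared_flanking_families(order_a, gene_a, order_b, gene_b, family_of, k=10):
    """Families found among the flanking genes of both paralogs."""
    fam_a = {family_of[g] for g in flanking(order_a, gene_a, k) if g in family_of}
    fam_b = {family_of[g] for g in flanking(order_b, gene_b, k) if g in family_of}
    return fam_a & fam_b


# Toy gene orders on two chromosomes (hypothetical identifiers).
chrom_1 = ["a1", "a2", "PLD_x", "a3", "a4"]
chrom_2 = ["b1", "b2", "PLD_y", "b3", "b4"]

# Hypothetical homology assignment, e.g. derived from protein similarity searches.
family_of = {
    "a1": "F1", "a2": "F2", "a3": "F3", "a4": "F4",
    "b1": "F1", "b2": "F9", "b3": "F3", "b4": "F4",
}

MIN_SHARED = 3  # assumed threshold, not stated in the text

shared = shared_flanking_families(chrom_1, "PLD_x", chrom_2, "PLD_y", family_of, k=2)
is_segmental = len(shared) >= MIN_SHARED
print(sorted(shared), is_segmental)
```

With real annotations, `k` would be left at 10 to match the window described in the text.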
Calculating Ks to date the duplication events and adaptive evolution analysis of PLD gene family
Pairwise alignment of the nucleotide sequences of PLD paralogs was performed using Clustal X 1.83. Gaps in the alignments were removed manually in BioEdit. The Ka and Ks values of the paralogous genes were estimated with the program K-Estimator 6.0. Estimates of evolutionary rates are useful for explaining patterns of macroevolution. Assuming a molecular clock, the synonymous substitution rates (Ks) of duplicated genes would be expected to be similar over time. Thus, Ks could be used as a proxy for time, and the conserved flanking protein-coding genes were used to estimate the dates of the segmental duplication events. The mean Ks value was calculated for each duplicated gene pair and then used to date the duplication events. Ks values greater than 2.0 were discarded to avoid the risk of saturation. The Ks values were then used to calculate the approximate date of the duplication event (T = Ks/2λ), assuming clock-like rates (λ) of synonymous substitution of 1.5 × 10⁻⁸ substitutions/synonymous site/year for Arabidopsis, 6.5 × 10⁻⁹ for rice, 9.1 × 10⁻⁹ for Poplar, and 6.5 × 10⁻⁹ for Grape. To investigate whether Darwinian positive selection was involved in driving gene divergence after duplication, a sliding window analysis (300 bp window, 50 bp slide) was first performed on the coding regions of paralogous PLD genes from the four plant species studied, and the Ka/Ks ratio was calculated for each window. Subsequently, the codon-based site model of codeml in PAML was used to perform adaptive evolution analysis on the three different types of PLD genes separately.
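The dating formula T = Ks/2λ and the sliding-window layout can be written out as a short Python sketch. The rates and the Ks cut-off of 2.0 are those given above; the example Ks values, the 1000 bp alignment length and the function names are hypothetical and serve only to show the arithmetic.

```python
# Synonymous substitution rates (substitutions/synonymous site/year) from the text.
RATES = {
    "Arabidopsis": 1.5e-8,
    "rice": 6.5e-9,
    "Poplar": 9.1e-9,
    "Grape": 6.5e-9,
}
KS_SATURATION = 2.0  # Ks values above this are discarded


def duplication_age_years(ks, species):
    """Return T = Ks / (2 * lambda) in years, or None if Ks is saturated."""
    if ks > KS_SATURATION:
        return None
    return ks / (2 * RATES[species])


def sliding_windows(length, size=300, step=50):
    """Return (start, end) coordinates of full windows along a coding alignment."""
    return [(s, s + size) for s in range(0, length - size + 1, step)]


# Hypothetical mean Ks for a Poplar paralog pair.
age = duplication_age_years(0.25, "Poplar")
print(f"{age / 1e6:.2f} million years")

# A saturated value is dropped rather than dated.
print(duplication_age_years(2.5, "rice"))

# Window coordinates for a hypothetical 1000 bp alignment.
windows = sliding_windows(1000)
print(len(windows), windows[0], windows[-1])
```

Ka/Ks would then be estimated separately within each window; that estimation step is left to dedicated software and is not reproduced here.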
This project was supported by grants from the National Science Foundation of China (No. 30871704 and No. 30971452 to Hu X) and the 100 Talents Program of CAS to Hu X.
- Wang X: Regulatory functions of phospholipase D and phosphatidic acid in plant growth, development, and stress responses. Plant Physiol. 2005, 139 (2): 566-573. 10.1104/pp.105.068809.
- Munnik T: Phosphatidic acid: an emerging plant lipid second messenger. Trends Plant Sci. 2001, 6 (5): 227-233. 10.1016/S1360-1385(01)01918-5.
- Hanahan DJ, Chaikoff IL: A new phospholipid-splitting enzyme specific for the ester linkage between the nitrogenous base and the phosphoric acid grouping. J Biol Chem. 1947, 169: 669-705.
- Cockcroft S: Ca2+-dependent conversion of phosphatidylinositol to phosphatidate in neutrophils stimulated with fMet-Leu-Phe or ionophore A23187. Biochim Biophys Acta. 1984, 795 (1): 37-46.
- Bocckino SB, Blackmore PF, Wilson PB, Exton JH: Phosphatidate accumulation in hormone-treated hepatocytes via a phospholipase D mechanism. J Biol Chem. 1987, 262 (31): 15309-15315.
- Qin C, Wang X: The Arabidopsis phospholipase D family: characterization of a Ca2+-independent and phosphatidylcholine-selective PLDζ1 with distinct regulatory domains. Plant Physiol. 2002, 128: 1057-1068. 10.1104/pp.010928.
- Qin W, Pappan K, Wang X: Molecular heterogeneity of phospholipase D (PLD). Cloning of PLDgamma and regulation of plant PLDgamma, -beta, and -alpha by polyphosphoinositides and calcium. J Biol Chem. 1997, 272 (45): 28267-28273. 10.1074/jbc.272.45.28267.
- McDermott M, Wakelam MJ, Morris AJ: Phospholipase D. Biochem Cell Biol. 2004, 82 (1): 225-253. 10.1139/o03-079.
- Sharma S, Gupta MN: Purification of phospholipase D from Daucus carota by three-phase partitioning and its characterization. Protein Expr Purif. 2001, 21 (2): 310-316. 10.1006/prep.2000.1357.
- Qin C, Wang C, Wang X: Kinetic analysis of Arabidopsis phospholipase Ddelta. Substrate preference and mechanism of activation by Ca2+ and phosphatidylinositol 4,5-biphosphate. J Biol Chem. 2002, 277 (51): 49685-49690. 10.1074/jbc.M209598200.
- Li G, Lin F, Xue HW: Genome-wide analysis of the phospholipase D family in Oryza sativa and functional characterization of PLD beta 1 in seed germination. Cell Res. 2007, 17 (10): 881-894. 10.1038/cr.2007.77.
- Yamaguchi T, Kuroda M, Yamakawa H, Ashizawa T, Hirayae K, Kurimoto L, Shinya T, Shibuya N: Suppression of a phospholipase D gene, OsPLDbeta1, activates defense responses and increases disease resistance in rice. Plant Physiol. 2009, 150 (1): 308-319. 10.1104/pp.108.131979.
- Testerink C, Munnik T: Phosphatidic acid: a multifunctional stress signaling lipid in plants. Trends Plant Sci. 2005, 10 (8): 368-375. 10.1016/j.tplants.2005.06.002.
- Koonin EV: A duplicated catalytic motif in a new superfamily of phosphohydrolases and phospholipid synthases that includes poxvirus envelope proteins. Trends Biochem Sci. 1996, 21 (7): 242-243.
- Exton JH: Phospholipase D-structure, regulation and function. Rev Physiol Biochem Pharmacol. 2002, 1: 1-94.
- Kopka J, Pical C, Hetherington AM, Muller-Rober B: Ca2+/phospholipid-binding (C2) domain in multiple plant proteins: novel components of the calcium-sensing apparatus. Plant Mol Biol. 1998, 36 (5): 627-637. 10.1023/A:1005915020760.
- van Leeuwen W, Okresz L, Bogre L, Munnik T: Learning the lipid language of plant signalling. Trends Plant Sci. 2004, 9 (8): 378-384. 10.1016/j.tplants.2004.06.008.
- Elias M, Potocky M, Cvrckova F, Zarsky V: Molecular diversity of phospholipase D in angiosperms. BMC Genomics. 2002, 3 (1): 2. 10.1186/1471-2164-3-2.
- Hong Y, Devaiah SP, Bahn SC, Thamasandra BN, Li M, Welti R, Wang X: Phospholipase D epsilon and phosphatidic acid enhance Arabidopsis nitrogen signaling and growth. Plant J. 2009, 58 (3): 376-387. 10.1111/j.1365-313X.2009.03788.x.
- Jaillon O, Aury JM, Noel B, Policriti A, Clepet C, Casagrande A, Choisne N, Aubourg S, Vitulo N, Jubin C, et al: The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature. 2007, 449 (7161): 463-467. 10.1038/nature06148.
- Hedges SB: The origin and evolution of model organisms. Nat Rev Genet. 2002, 3 (11): 838-849. 10.1038/nrg929.
- Kong H, Landherr LL, Frohlich MW, Leebens-Mack J, Ma H, dePamphilis CW: Patterns of gene duplication in the plant SKP1 gene family in angiosperms: evidence for multiple mechanisms of rapid gene birth. Plant J. 2007, 50 (5): 873-885. 10.1111/j.1365-313X.2007.03097.x.
- Sterck L, Rombauts S, Jansson S, Sterky F, Rouze P, Van de Peer Y: EST data suggest that poplar is an ancient polyploid. New Phytol. 2005, 167 (1): 165-170. 10.1111/j.1469-8137.2005.01378.x.
- Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A, et al: The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science. 2006, 313 (5793): 1596-1604. 10.1126/science.1128691.
- Bowers JE, Chapman BA, Rong J, Paterson AH: Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature. 2003, 422 (6930): 433-438. 10.1038/nature01521.
- Wolfe KH, Gouy M, Yang YW, Sharp PM, Li WH: Date of the monocot-dicot divergence estimated from chloroplast DNA sequence data. Proc Natl Acad Sci USA. 1989, 86 (16): 6201-6205. 10.1073/pnas.86.16.6201.
- Crane PR, Friis EM, Pedersen KR: The origin and early diversification of angiosperms. Nature. 1995, 374 (6517): 27-33. 10.1038/374027a0.
- Kellogg EA: Evolutionary history of the grasses. Plant Physiol. 2001, 125 (3): 1198-1205. 10.1104/pp.125.3.1198.
- Ku HM, Vision T, Liu J, Tanksley SD: Comparing sequenced segments of the tomato and Arabidopsis genomes: large-scale duplication followed by selective gene loss creates a network of synteny. Proc Natl Acad Sci USA. 2000, 97 (16): 9121-9126. 10.1073/pnas.160271297.
- Gu X: Statistical methods for testing functional divergence after gene duplication. Mol Biol Evol. 1999, 16 (12): 1664-1674.
- Gu X: Maximum-likelihood approach for gene family evolution under functional divergence. Mol Biol Evol. 2001, 18 (4): 453-464.
- Li WH, Gojobori T: Rapid evolution of goat and sheep globin genes following gene duplication. Mol Biol Evol. 1983, 1 (1): 94-108.
- Yang Z: PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007, 24 (8): 1586-1591. 10.1093/molbev/msm088.
- Eddy SR: Profile hidden Markov models. Bioinformatics. 1998, 14 (9): 755-763. 10.1093/bioinformatics/14.9.755.
- Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das U, Daugherty L, Duquenne L, et al: InterPro: the integrative protein signature database. Nucleic Acids Res. 2009, 37 (Database issue): D211-215. 10.1093/nar/gkn785.
- Sutton RB, Davletov BA, Berghuis AM, Sudhof TC, Sprang SR: Structure of the first C2 domain of synaptotagmin I: a novel Ca2+/phospholipid-binding fold. Cell. 1995, 80 (6): 929-938. 10.1016/0092-8674(95)90296-1.
- Pappan K, Qin W, Dyer JH, Zheng L, Wang X: Molecular cloning and functional analysis of polyphosphoinositide-dependent phospholipase D, PLDbeta, from Arabidopsis. J Biol Chem. 1997, 272 (11): 7055-7061. 10.1074/jbc.272.11.7055.
- Pappan K, Austin-Brown S, Chapman KD, Wang X: Substrate selectivities and lipid modulation of plant phospholipase D alpha, -beta, and -gamma. Arch Biochem Biophys. 1998, 353 (1): 131-140. 10.1006/abbi.1998.0640.
- Sung TC, Roper RL, Zhang Y, Rudge SA, Temel R, Hammond SM, Morris AJ, Moss B, Engebrecht J, Frohman MA: Mutagenesis of phospholipase D defines a superfamily including a trans-Golgi viral protein required for poxvirus pathogenicity. EMBO J. 1997, 16 (15): 4519-4530. 10.1093/emboj/16.15.4519.
- Wang C, Wang X: A novel phospholipase D of Arabidopsis that is activated by oleic acid and associated with the plasma membrane. Plant Physiol. 2001, 127 (3): 1102-1112. 10.1104/pp.010444.
- Zhao J, Wang X: Arabidopsis phospholipase Dα1 interacts with the heterotrimeric G-protein α subunit through a motif analogous to the DRY motif in G-protein-coupled receptors. J Biol Chem. 2004, 279: 1794-1800.
- Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32 (5): 1792-1797. 10.1093/nar/gkh340.
- Kumar S, Nei M, Dudley J, Tamura K: MEGA: a biologist-centric software for evolutionary analysis of DNA and protein sequences. Brief Bioinform. 2008, 9 (4): 299-306. 10.1093/bib/bbn017.
- Bailey TL, Elkan C: Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol. 1994, 2: 28-36.
- Bailey TL, Gribskov M: Combining evidence using p-values: application to sequence homology searches. Bioinformatics. 1998, 14 (1): 48-54. 10.1093/bioinformatics/14.1.48.
- Cannon SB, Mitra A, Baumgarten A, Young ND, May G: The roles of segmental and tandem gene duplication in the evolution of large gene families in Arabidopsis thaliana. BMC Plant Biol. 2004, 4: 10. 10.1186/1471-2229-4-10.
- Maher C, Stein L, Ware D: Evolution of Arabidopsis microRNA families through duplication events. Genome Res. 2006, 16 (4): 510-519. 10.1101/gr.4680506.
- Comeron JM: K-Estimator: calculation of the number of nucleotide substitutions per site and the confidence intervals. Bioinformatics. 1999, 15 (9): 763-764. 10.1093/bioinformatics/15.9.763.
- Shiu SH, Karlowski WM, Pan R, Tzeng YH, Mayer KF, Li WH: Comparative analysis of the receptor-like kinase family in Arabidopsis and rice. Plant Cell. 2004, 16 (5): 1220-1234. 10.1105/tpc.020834.
- Blanc G, Wolfe KH: Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. Plant Cell. 2004, 16 (7): 1667-1678. 10.1105/tpc.021345.
- Yu J, Wang J, Lin W, Li S, Li H, Zhou J, Ni P, Dong W, Hu S, Zeng C, et al: The Genomes of Oryza sativa: a history of duplications. PLoS Biol. 2005, 3 (2): e38. 10.1371/journal.pbio.0030038.
- Lynch M, Conery JS: The evolutionary fate and consequences of duplicate genes. Science. 2000, 290 (5494): 1151-1155. 10.1126/science.290.5494.1151.
- Gaut BS, Morton BR, McCaig BC, Clegg MT: Substitution rate comparisons between grasses and palms: synonymous rate differences at the nuclear gene Adh parallel rate differences at the plastid gene rbcL. Proc Natl Acad Sci USA. 1996, 93 (19): 10274-10279. 10.1073/pnas.93.19.10274.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.