The evolution of WRKY transcription factors
© Rinerson et al.; licensee BioMed Central. 2015
Received: 26 November 2014
Accepted: 13 February 2015
Published: 27 February 2015
The availability of increasing numbers of sequenced genomes has necessitated a re-evaluation of the evolution of the WRKY transcription factor family. Modern day plants descended from a charophyte green alga that colonized the land between 430 and 470 million years ago. The first charophyte genome sequence from Klebsormidium flaccidum filled a gap in the available genome sequences in the plant kingdom between unicellular green algae that typically have 1-3 WRKY genes and mosses that contain 30-40. WRKY genes have been previously found in non-plant species but their occurrence has been difficult to explain.
Only two WRKY genes are present in the Klebsormidium flaccidum genome and the presence of a Group IIb gene was unexpected because it had previously been thought that Group IIb WRKY genes first appeared in mosses. We found WRKY transcription factor genes outside of the plant lineage in some diplomonads, social amoebae, fungi incertae sedis, and amoebozoa. This patchy distribution suggests that lateral gene transfer is responsible. These lateral gene transfer events appear to pre-date the formation of the WRKY groups in flowering plants. Flowering plants contain proteins with domains typical for both resistance (R) proteins and WRKY transcription factors. R protein-WRKY genes have evolved numerous times in flowering plants, each type being restricted to specific flowering plant lineages. These chimeric proteins contain not only novel combinations of protein domains but also novel combinations and numbers of WRKY domains. Once formed, R protein WRKY genes may combine different components of signalling pathways that may either create new diversity in signalling or accelerate signalling by short circuiting signalling pathways.
We propose that the evolution of WRKY transcription factors includes early lateral gene transfers to non-plant organisms and the occurrence of algal WRKY genes that have no counterparts in flowering plants. We propose two alternative hypotheses of WRKY gene evolution: The “Group I Hypothesis” sees all WRKY genes evolving from Group I C-terminal WRKY domains. The alternative “IIa + b Separate Hypothesis” sees Groups IIa and IIb evolving directly from a single domain algal gene separate from the Group I-derived lineage.
Keywords: WRKY transcription factor; Evolution; Lateral gene transfer; Resistance protein; Charophyte
Over twenty years ago, research began into an unknown group of DNA-binding proteins, and since then we have learned much about WRKY transcription factors. Based on the first amino acid sequences, several suggestions were made that subsequent research has shown to be correct. For example, the conserved cysteines and histidines in the WRKY domain do indeed form a novel zinc finger-like motif, and the WRKY amino acid sequence binds directly to its cognate cis-acting element, the W box (TTGACC/T). As soon as the WRKY domain was characterized, it was suggested that it contained a novel zinc finger structure, even though the spacing of the zinc-chelating amino acids was unusual. The first evidence to support a zinc finger structure came from studies with 2-phenanthroline, which chelates zinc ions. Addition of 2-phenanthroline to gel retardation assays using WRKY proteins resulted in a loss of binding to the W box target sequence. The other main suggestion was that the WRKY signature amino acid sequence at the N-terminus of the WRKY domain binds directly to the W box sequence in the DNA of target promoters. This was shown to be correct by publication of the solution structure of the C-terminal WRKY domain of the Arabidopsis WRKY4 protein, determined in the absence of a W box. The WRKY domain was found to form a four-stranded β-sheet. Soon afterwards, a crystal structure of the C-terminal WRKY domain of the Arabidopsis WRKY1 protein was also reported. This revealed a structure similar to the solution structure, except that the WRKY domain may contain an additional β-strand at the N-terminus of the domain. An important breakthrough was recently reported with the first structural determination of the WRKY domain bound to its W box cis-acting element. This revealed that part of a four-stranded β-sheet enters the major groove of DNA in an atypical mode that was called a β-wedge. This sheet is almost perpendicular to the DNA helical axis.
As initially predicted, amino acids in the conserved WRKYGQK signature motif contact the W box DNA bases mainly through extensive apolar contacts with thymine methyl groups. These structural data provide the molecular basis to explain the previously noted conservation of both the WRKY signature sequence at the N-terminus of the WRKY domain and the W box DNA sequence.
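As an illustration of the W box consensus described above (TTGACC/T), the following minimal sketch scans both strands of a DNA sequence for the motif. The function names and the example promoter sequence are hypothetical and are not taken from the study; real promoter analyses would also consider flanking sequence and clustering of W boxes.

```python
import re

# W box core as given in the text: TTGAC followed by C or T.
# The lookahead lets overlapping matches be reported.
W_BOX = re.compile(r"(?=(TTGAC[CT]))")
W_BOX_LEN = 6

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_w_boxes(seq: str):
    """Return sorted (start, strand, motif) tuples.

    start is 0-based and always refers to the forward strand.
    """
    seq = seq.upper()
    n = len(seq)
    hits = [(m.start(), "+", m.group(1)) for m in W_BOX.finditer(seq)]
    for m in W_BOX.finditer(reverse_complement(seq)):
        # Map the reverse-strand coordinate back onto the forward strand.
        hits.append((n - (m.start() + W_BOX_LEN), "-", m.group(1)))
    return sorted(hits)


# Made-up sequence for illustration only.
promoter = "AAATTGACCGGGGTCAATT"
print(find_w_boxes(promoter))
```

Reporting reverse-strand hits in forward-strand coordinates keeps the output directly comparable with gene model positions.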
There has been interest in the evolution of the WRKY gene family as it promises to yield insights into how biotic and abiotic stress responses and signalling evolved as plants went from single-celled aquatic algae to multicellular flowering plants. The first work defined the seven major groups of WRKY genes found in flowering plants (Groups I, IIa, IIb, IIc, IId, IIe, and III). This classification was only partly based on phylogenetic analyses but has proven over time to be an accurate representation of the major groups of WRKY genes in flowering plants [5,6]. In 2005, Zhang and Wang used the availability of an increasing number of plant genome sequences to propose a hypothesis of the evolution of WRKY genes in plants. They hypothesized that a proto-WRKY gene with a single WRKY domain underwent domain duplication to produce Group I WRKY genes. Subsequent loss of the N-terminal WRKY domain led to Group IIc genes from which all other WRKY genes evolved, Group III genes being the last. Since this paper, the first genome sequences of a moss and a spike moss have been published, and with these extra data it became clear that Group IIa genes, not Group III genes, were the last to evolve. Babu et al. looked for WRKY-like genes outside of the plant kingdom and showed that WRKY domains share a similar zinc finger domain and four-strand fold with GCM1 and FLYWCH domains, and suggested that they may be derived from a BED finger and ultimately a C2H2 zinc finger domain. This appears at least partly true. The zinc finger structures of these proteins do appear to have some similarities at the primary amino acid level, suggesting that they are related. However, there appear to be no similarities in the WRKY signature portion of the domains.
It is possible that the zinc finger portions of WRKY, GCM1 and FLYWCH proteins do share a common ancestor, but structural features other than the zinc finger show no similarity at the primary amino acid sequence level. The most recent work on the evolution of the WRKY gene family again proposed an ancestral Group I WRKY gene, with Group IIa evolving from a Group I gene, Group IId evolving from Group IIa, and Group III genes being evolutionarily the youngest. However, several lines of evidence from sequenced genomes show that this cannot be the case. Firstly, Group IIa WRKY genes were the last to evolve as they are the only group absent from the spike moss Selaginella moellendorffii. This means that Group IId genes could not have evolved from Group IIa genes because Group IId genes predate Group IIa genes. Similarly, Group III-like genes are present in the moss Physcomitrella patens and Group III genes are present in S. moellendorffii. Group III genes therefore predate both Group IIa genes and Group IIe genes and cannot be the youngest group.
The many diverse species of modern day plants all descended from a single charophyte green alga that colonized the land between 430 and 470 million years ago. The recent availability of the first genome sequence from a member of the Charophyta (the filamentous terrestrial alga Klebsormidium flaccidum) fills in a gap in the evolutionary history of WRKY genes associated with the colonization of land by plants and reveals some unexpected new insights into WRKY evolution.
Our phylogenetic and comparative genomic studies show here that there has indeed been a lineage-specific expansion of WRKY transcription factors in plants but they are not found exclusively in plants. WRKY transcription factors most likely evolved very early in the green lineage and then a number of lateral gene transfer events have occurred in diplomonads, social amoebae, fungi incertae sedis, and amoebozoa. These non-plant WRKY genes do not belong to any of the seven groups of WRKY genes found in flowering plants suggesting that these lateral gene transfers are ancient and may provide insights into the early ancestral single domain WRKY genes. Based on our phylogenetic analyses and genomic searches we propose that there are four major WRKY transcription factor lineages in flowering plants, Groups I + IIc, Groups IIa + IIb, Groups IId + IIe, and Group III. Group I WRKY proteins have two WRKY domains, whereas Group II proteins have a single domain, and Group III proteins have a single domain with a C-C-H-C zinc finger structure rather than C-C-H-H. We propose two alternative hypotheses of WRKY gene evolution: The “Group I Hypothesis” sees all WRKY genes evolving from Group I C-terminal domains with IIb genes evolving before the appearance of the conserved PR intron. The alternative “IIa + b separate Hypothesis” sees Groups IIa and IIb with their hallmark VQR intron evolving directly from a single domain ancestral algal WRKY gene separate from the other Group I-derived lineage. We also show that one other type of WRKY gene has evolved in flowering plants and these proteins contain domains typical for both resistance (R) proteins and WRKY transcription factors. We have classified these R protein-WRKY genes into eight groups (RW1-RW8). R protein-WRKY genes are not present in all plant genomes but have evolved numerous times in flowering plants. Each type of R protein-WRKY gene is restricted to specific flowering plant lineages. 
These chimeric proteins contain not only novel combinations of protein domains but also novel combinations and numbers of WRKY domains.
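The coarse distinction between Groups I, II and III given above (number of WRKY domains and zinc finger type) can be written down as a simple rule. This is only a sketch: the function name is hypothetical, it does not separate the Group II subgroups (IIa to IIe), and proteins with unusual domain numbers, such as the R protein-WRKYs, are deliberately left unclassified.

```python
def coarse_wrky_class(n_wrky_domains: int, zinc_finger: str) -> str:
    """Assign a coarse WRKY group from domain count and zinc finger type.

    zinc_finger is "C-C-H-H" or "C-C-H-C" (hyphens optional).
    """
    zf = zinc_finger.upper().replace("-", "")
    if zf not in ("CCHH", "CCHC"):
        raise ValueError(f"unexpected zinc finger pattern: {zinc_finger!r}")
    if n_wrky_domains == 1:
        # Single-domain proteins: Group III has the C-C-H-C finger.
        return "Group III" if zf == "CCHC" else "Group II"
    if n_wrky_domains == 2 and zf == "CCHH":
        return "Group I"
    # Anything else (e.g. chimeric proteins with novel domain numbers).
    return "unclassified"


print(coarse_wrky_class(2, "C-C-H-H"))
print(coarse_wrky_class(1, "C-C-H-C"))
```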
Results and discussion
Distribution of WRKY genes in the tree of life
To gain further insights into the evolution of WRKY transcription factors, we used our extensive data set of WRKY domains for phylogenetic analyses. A MUSCLE alignment of the WRKY domains was produced in MEGA6 (Additional file 2). Inspection of the alignment and comparisons to similar alignments produced using CLUSTALW showed that the MUSCLE alignment was better at correctly aligning the zinc coordinating amino acids and fewer manual adjustments were required than with CLUSTALW results (data not shown). CLUSTALW yielded the lowest accuracy for full-length sequences in almost all test cases compared to eight other popular multiple sequence alignment programs, and the choice of MUSCLE to create sequence alignments instead of CLUSTALW or CLUSTALX is likely to result in more robust phylogenetic analyses.
Figure 2 shows both Neighbor Joining and Maximum Likelihood trees of the data set of WRKY domains. Other phylogenetic trees such as Minimum Evolution and Maximum Parsimony produced similar results. All seven flowering plant WRKY groups are present as separate clades together with several other groups that appear not to be present in flowering plants. These additional groups vary in their positions in the NJ and ML trees, probably because these proteins are not members of any of the higher plant groups. One of these non-flowering plant groups contains all the WRKY genes from fungi incertae sedis. These WRKY genes are the most divergent of WRKY transcription factors. The other groups consist of WRKY genes from unicellular green algae, diplomonads, social amoebae, and amoebozoa. These observations establish that some WRKY genes are found outside of the plant kingdom and that apart from some Group I-like genes in social amoebae, and amoebozoa, these non-plant genes are not representatives of any of the seven flowering plant WRKY groups.
Non-plant WRKY genes
Fungi such as the fungi incertae sedis are ancient, probably over 1,000 million years old, and have been suggested as playing an important role in the evolution of land plants. Rhizopus microsporus is a widely distributed soil fungus that can cause mucormycosis in immunocompromised humans and seedling blight in rice. Fungi incertae sedis species such as R. irregularis (formerly Glomus intraradices) are mycorrhizal fungi and actually penetrate plant cells. It is therefore possible that an ancient WRKY gene was transferred from a plant cell early during the evolution of land plants and that this gene has given rise to the fungal type of WRKY gene. Fossil evidence shows arbuscular mycorrhizal symbiosis to be at least as old as the earliest land plants (470–480 million years ago) and to predate plant roots. It is likely that colonization of the land by plants was therefore dependent on fungal provision of inorganic nutrients and water. It is possible that these first terrestrial symbioses with fungal cells led to an early lateral gene transfer of a WRKY gene to a non-plant host.
WRKY genes in unicellular green algae
The WRKY genes in unicellular green algae fall into three groups based on phylogenetic analyses (Figure 2). One group corresponds to the Group I genes found in flowering plants. These genes have been postulated to be ancestral to all higher plant WRKY genes, largely because the only WRKY gene present in the unicellular green alga C. reinhardtii is of this type. The situation in C. reinhardtii may, however, be a little misleading as other unicellular green algae, such as M. pusilla, O. lucimarinus, and O. tauri have more than a single WRKY gene but these do not include genes that are members of Group I.
We therefore propose that Group I WRKY genes may not be the universal ancestor of WRKY genes in higher plants and that other groups of WRKY genes may have evolved directly from ancient single domain WRKY gene(s). The other groups of WRKY genes in unicellular green algae might possibly have been the direct ancestors of Groups IIa and IIb in flowering plants (see below) and certainly seem to have been ancestral to some WRKY genes found outside of the plant kingdom. Consistent with this suggestion, the diplomonad WRKY genes cluster with a group of unicellular green alga WRKY genes that have a single domain and are not part of the Group I clade (Figure 5), suggesting that it was lateral gene transfer from this class of algal genes that led to diplomonad genes. This single domain type of algal WRKY gene does not appear to be represented in higher plant genomes and this suggests that these WRKY genes have no counterparts in higher plants. This is also consistent with their apparent loss in C. reinhardtii. The presence of these single domain WRKY genes in unicellular green algae and similar two domain versions in diplomonads suggests that Group I genes were not the only early WRKY genes that could have given rise to WRKY groups in higher plants or non-plant organisms.
WRKY genes in multicellular green algae
Until recently, there was a large gap in the available genome sequences in the plant kingdom between unicellular green algae such as C. reinhardtii that typically have 1–3 WRKY genes and mosses such as P. patens that have 30–40 genes. This situation changed with the publication of the K. flaccidum genome sequence . We have searched the available K. flaccidum genome sequence and it contains just two WRKY genes (Figures 1 and 2). The first is a Group I gene (kfl00096) that contains two WRKY domains similar to the single C. reinhardtii gene. Unexpectedly, the second gene (kfl00189) is a Group IIb gene. Phylogenetic analyses show that kfl00189 clusters with other Group IIb WRKY genes (Figure 2). The amino acid sequence of the kfl00189 WRKY domain also has hallmarks of Group IIb proteins from flowering plants such as the C-X5-C spacing in the zinc finger motif, the sequence QVQR in the middle of the finger, and the sequence DGCx immediately before the WRKY amino acid signature (Figure 1). All of these primary amino acid sequences are features of Group IIb WRKY proteins . Strikingly, kfl00189 also contains the conserved QVQR type intron that flowering plant Group IIb and IIa genes possess rather than the PR type intron shared by all other flowering plant WRKY genes .
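The three sequence hallmarks of Group IIb WRKY domains listed above can be checked with simple pattern matching. The sketch below is illustrative only: the function name and the test sequences are hypothetical, the WRKY signature is assumed to be WRKYGQK, and the C-X5-C pattern is searched anywhere in the domain rather than specifically at the zinc-coordinating cysteines, so a positive result is a hint and not a classification.

```python
import re

# Hallmarks of Group IIb WRKY domains as described in the text.
HALLMARKS = {
    "C-X5-C spacing": re.compile(r"C.{5}C"),
    "QVQR motif": re.compile(r"QVQR"),
    # DGCx immediately before the signature (assumed here to be WRKYGQK).
    "DGCx before WRKY signature": re.compile(r"DGC.WRKYGQK"),
}


def iib_hallmarks(domain: str) -> dict:
    """Report which Group IIb hallmarks occur in a WRKY domain sequence."""
    domain = domain.upper()
    return {name: bool(p.search(domain)) for name, p in HALLMARKS.items()}


# Synthetic sequences for illustration; these are not real WRKY domains.
iib_like = "DGCQWRKYGQKAAACAAAAACAAQVQRAA"
other = "AAAWRKYGQKAAACAAAACAA"

print(iib_hallmarks(iib_like))
print(iib_hallmarks(other))
```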
These observations necessitate a re-evaluation of the current view of WRKY gene evolution. Previously, it had been assumed that Group IIb genes evolved from Group IIc-like genes later in the evolution of plants. Now it is clear that only Group I genes predate them. However, the new information that we provide here showing an early evolution of Group IIb genes poses a new question. Did the Group IIb genes evolve from a Group I gene or did they evolve independently from a single WRKY domain-containing unicellular green alga gene? These are the two types of WRKY gene that were present before the colonization of land and multicellularity and so IIb genes must have evolved from one type or the other. We have called these two different possibilities the “IIa + b Separate Hypothesis” and the “Group I Hypothesis”. The “IIa + b Separate Hypothesis” suggests that Group IIa and IIb WRKY genes did not evolve from Group I genes whereas the “Group I Hypothesis” suggests that all WRKY genes in higher plants evolved from Group I genes. Group I WRKY genes from unicellular green algae do not appear to have the conserved PR intron that is a hallmark of most multicellular plant WRKY genes (Figure 6). Both the PR intron and the VQR intron first appear in filamentous green algae at about the time of the colonization of land. Further information is required to determine how Group IIb WRKY genes evolved in the first multicellular green algae.
WRKY genes in mosses and spike mosses
The further evolution of WRKY genes in multicellular plants is rather clearer. The first available genome sequence of a moss (P. patens) showed that it contains Group I, Group IIb, Group IIc, Group IId, and Group III-like genes . The newly evolved Group IIc WRKY genes appear to have evolved from Group I genes by loss of the N-terminal domain. It is likely that both Group IId and Group III genes evolved from Group IIc/Group I C-terminal domain genes based on the presence of the conserved PR intron (Figure 6).
P. patens has Group III-like genes that show distinct features that are not present in Group III genes from more advanced plants. For example, the single domain genes from fungi appear to be closer phylogenetically to these P. patens Group III genes than other WRKY genes (Figure 2). The consensus amino acid sequence of the WRKY signature from these variant moss Group III proteins is WKNNGNT, compared to WKKYGNK in fungal genes and WRKYGQK in flowering plant Group III genes (Figure 3). In filamentous green algae, there appear (based on the single available genome) to be only Group I and Group IIb genes. It is therefore likely that Group III genes evolved from Group I genes and not IIb genes because Group I and Group III share the PR intron (Figure 6). It is now clear that previous suggestions that Group III genes were the last group to evolve [7,11] are certainly incorrect as Group III genes predate Group IIa and Group IIe genes (Figure 2).
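The three consensus WRKY signatures quoted above can be compared position by position. The following sketch simply counts mismatches between the seven-residue signatures; the dictionary labels and function name are hypothetical, and a mismatch count over seven residues is not a phylogenetic distance, so it complements rather than replaces the trees in Figure 2.

```python
from itertools import combinations

# Consensus WRKY signatures as quoted in the text.
SIGNATURES = {
    "moss Group III-like": "WKNNGNT",
    "fungal": "WKKYGNK",
    "flowering plant Group III": "WRKYGQK",
}


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length signatures differ."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    return sum(x != y for x, y in zip(a, b))


for (name_a, sig_a), (name_b, sig_b) in combinations(SIGNATURES.items(), 2):
    print(f"{name_a} vs {name_b}: {mismatches(sig_a, sig_b)} of {len(sig_a)} positions differ")
```

On these consensus sequences the moss signature differs from the fungal signature at fewer positions than it does from the flowering plant signature, although the fungal and flowering plant signatures are themselves the most similar pair.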
The genome sequence of a spike moss, S. moellendorffii, provided a view of the WRKY gene family in a primitive vascular plant. The approximately 40 million years of evolution that separate the mosses from S. moellendorffii have seen two major changes in the WRKY gene family. Firstly, Group IIe genes appeared. Secondly, vascular plants, starting with S. moellendorffii, have Group III WRKY genes that are similar to those in flowering plants, with a similar zinc finger structure and WRKY signature amino acid sequence.
WRKY genes in flowering plants
All of the main groups of WRKY genes that are present in flowering plants are present in S. moellendorffii except for Group IIa genes, which were therefore the last to evolve and appear to have arisen from Group IIb genes (Figure 2). Group IIa genes are the group with the smallest number of members and appear to play many important roles in regulating stress responses (both biotic and abiotic).
The relationship with FLYWCH, GCM1, and BED proteins
One of the most unusual features of the WRKY gene family in flowering plants is the existence of chimeric proteins comprising domains typical for both R proteins and WRKY transcription factors . These R protein domains include toll interleukin 1 receptor (TIR), leucine-rich repeat (LRR), nucleotide-binding site (NBS), and APAF-1, R proteins, and CED-4 domain (ARC). With the sequencing of the A. thaliana genome, three such R protein-WRKY genes were found (AtWRKY16, AtWRKY19, and AtWRKY52) and it seemed likely that R protein-WRKY genes were a feature of most plant genomes. The majority of plant resistance (R) genes encode a class of innate immune receptors (NLRs) with nucleotide binding and leucine-rich repeat domains. R-gene evolution is thought to be facilitated by the formation of R-gene clusters, which permit sequence exchanges via recombinatorial mispairing and generate high haplotypic diversity. This pattern of evolution may also generate diversity at other loci that contribute to the R-complex .
R protein-WRKY genes
Gene model and comments
AT5G45260, RRS1, ATWRKY52, SLH1
Chr5: 18326203 - 18332609
AT5G45050, TTR1, ATWRKY16
Chr5: 18176914 - 18181805
AT4G12020, ATWRKY19, MAPKKK11
Chr4: 7201656 - 7208766
scaffold_8: 2126189 - 2132181
scaffold_8: 2768079 - 2773021
scaffold_8: 2623905 - 2628796
Chr02: 12369911 - 12381876
Chr08: 53517353 - 53522939
Chr02: 52695615 - 52704484
scaffold_8: 1163667 - 1169265
Oryza sativa japonica
LOC_Os07g17230. FgenesH prediction different
Oryza sativa indica
Chromosome 11: 21,830,082-21,837,218
Oryza sativa japonica
gi|108864659 Retrotransposon at 3 prime end
LG7: 18263740 - 18277966
LG7: 22236162 - 22242036
LG7: 9804380 - 9813690
LG6: 802048 - 808355
Gm05: 35364051 - 35374699
Chr08: 48587929 - 48599304
Chr08: 48660141 - 48668419
scaffold_2: 820964 - 828678
scaffold_2: 805474 - 814178
scaffold_2: 845249 - 851848
scaffold_2: 26415481 - 26421105
MRNA21370. Tandem repeat with FvRWRKY1
LG7: 18263740 - 18277966
Group RW1: TIR-NB ARC-LRR-WRKY (IIe). Found in Capsella and Arabidopsis.
Group RW2: TIR-NB ARC-LRR-WRKY (III)-[WRKY (III)]. May have one or two Group III WRKY domains and may possibly be two groups. Found in strawberry.
Group RW3: PAH-WRKY (I NT)-WRKY (I CT)-NB ARC. AtWRKY19 is the only member of this family. It also contains a MAP kinase kinase kinase domain at the C-terminal end of the protein. This region has very high sequence similarity to MAP kinase kinase kinases found in Arabidopsis.
Group RW4: TIR-NB ARC-LRR-WRKY (III). Has the same domains as Group RW2 but different architecture (different positions of the domains) and does not cluster with the Group RW2 proteins. Found in soybean.
Group RW5: [B3]-LRR-NB ARC-LRR-WRKY (IIe). Some, but not all, have a B3 DNA-binding domain. Found in the monocots rice, sorghum, and switchgrass.
Group RW6: NB ARC-LRR-WRKY (III)-WRKY (III). These are found in the monocots sorghum, barley, rice, foxtail millet, and Tausch’s goatgrass. One of the proteins has only one WRKY domain and another has an additional NAC DNA-binding domain.
Group RW7: LRR-WRKY (III)-WRKY (IId)-Calmodulin binding domain-WRKY (IIc). The two members of this group are found in G. raimondii (a possible progenitor species of tetraploid cotton). The WRKY domains from GrRWRKY1 are truncated and difficult to classify.
Group RW8: WRKY (III)-NB ARC-LRR. Found in cacao.
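The domain architectures of Groups RW1 to RW8 listed above lend themselves to a small lookup structure. The sketch below copies the architecture strings from the list and provides hypothetical helper functions for asking which RW groups carry a WRKY domain of a given type; square brackets mark optional domains, as in the list.

```python
import re

# Domain architectures (N- to C-terminus) as listed in the text.
ARCHITECTURES = {
    "RW1": "TIR-NB ARC-LRR-WRKY (IIe)",
    "RW2": "TIR-NB ARC-LRR-WRKY (III)-[WRKY (III)]",
    "RW3": "PAH-WRKY (I NT)-WRKY (I CT)-NB ARC",
    "RW4": "TIR-NB ARC-LRR-WRKY (III)",
    "RW5": "[B3]-LRR-NB ARC-LRR-WRKY (IIe)",
    "RW6": "NB ARC-LRR-WRKY (III)-WRKY (III)",
    "RW7": "LRR-WRKY (III)-WRKY (IId)-Calmodulin binding domain-WRKY (IIc)",
    "RW8": "WRKY (III)-NB ARC-LRR",
}


def parse(arch: str):
    """Split an architecture string into (domain, is_optional) pairs."""
    pairs = []
    for token in arch.split("-"):
        optional = token.startswith("[") and token.endswith("]")
        pairs.append((token.strip("[]"), optional))
    return pairs


def wrky_groups(arch: str):
    """Return the WRKY group of each WRKY domain, e.g. ['III', 'IId', 'IIc']."""
    return [label.split()[0] for label in re.findall(r"WRKY \(([^)]+)\)", arch)]


def groups_containing(wrky_group: str):
    """Return the RW groups with at least one WRKY domain of the given type."""
    return [rw for rw, arch in ARCHITECTURES.items() if wrky_group in wrky_groups(arch)]


print(groups_containing("III"))
print(parse(ARCHITECTURES["RW5"]))
```

Queried this way, five of the eight architectures contain at least one Group III WRKY domain.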
It is clear that these genomic rearrangements are associated with specific plant lineages and appear therefore to be relatively recent events. For example, Groups RW2 and RW4 are found in the Fabidae, and RW5 and RW6 in the grasses (Figure 8). Other groups such as RW2, RW7, and RW8 have only been found in a single species and, even considering the limited availability of plant genome sequences, it is likely that they are present in only a small number of related species. This suggests that the formation of many of these R protein-WRKY genes was a recent event, which is consistent with information showing that many R-genes are fast-evolving and characterized by chimeric structures resulting from frequent sequence exchanges among group members.
Sequence exchange between R-gene paralogues is considered to be the dominant mechanism for generating variations of type I resistance genes. In addition, it has been known for some time that novel disease resistance specificities result from sequence exchange between tandemly repeated genes. It may be significant that many of the R protein-WRKY genes contain one or more Group III WRKY domains because we have previously shown that tandem repeats of Group III WRKY genes exist in species such as B. distachyon. It is possible that the existence of R protein-WRKY genes reflects the frequent recombination associated with some R-genes but it may also reflect a high level of recombination at some WRKY gene loci, especially tandem repeats. The strawberry NBS-LRR-WRKY genes FvRWRKY1 and FvRWRKY5 illustrate the relative instability of these genes in the genome (Figures 8, 9 and 10). FvRWRKY1 and FvRWRKY5 are found on linkage group 7 between 18245853 and 18295852. FGENESH predicts a single large polypeptide of 2,854 amino acids. However, this predicted polypeptide contains what could be two very similar proteins with a TIR-NBS-LRR-WRKY (III) structure. The proteins are similar but not identical, with blocks of similarity separated by dissimilar regions. Strikingly, an N-terminal segment of 186 amino acids from the first TIR-NBS-LRR-WRKY protein, from amino acid 12 onwards, is present as an identical 186 amino acids in the second TIR-NBS-LRR-WRKY protein (data not shown). Clearly, there has been a genomic rearrangement and duplication of some TIR-NBS-LRR-WRKY sequences. This illustrates that novel R protein-WRKY combinations appear to be formed through rearrangements including duplications.
We suggest that, once formed, some R protein-WRKY genes are under positive selection as they combine different components of signaling pathways that may either create new diversity in signaling or accelerate signaling by short circuiting signaling pathways. In favour of this hypothesis are the identities of other domains that have been incorporated in R protein-WRKYs. These domains do not seem to be random segments of protein coding genes but rather other signaling components such as B3 and NAC DNA-binding domains and calmodulin-binding domains (Figure 10).
It has also been observed that many transposable elements are found at R-gene loci, including retrotransposons, transposons, and miniature inverted transposable elements. This may provide one mechanism by which R-gene loci are rearranged. We found transposable elements next to at least one of the 29 R protein-WRKY genes, OsjRWRKY2 (data not shown), and transposable elements may therefore play a role in the creation of some R protein-WRKY genes.
It is possible that a small number of the predicted R protein-WRKY genes do not actually form chimeric proteins that contain all of the domains predicted by gene prediction programs such as FGENESH and Hidden Markov Models (HMM). Further research will be required to determine the exact protein architecture produced from each individual gene and also whether there may be instances of alternative splicing. However, it is clear from studies of the Arabidopsis NBS-LRR-WRKY genes that these genes do indeed encode chimeric proteins and that the WRKY domain is indeed functional and binds to DNA.
A re-writing of WRKY transcription factor evolution
Early in the green lineage, a BED finger-like C2H2 zinc finger domain evolved into a WRKY domain by the addition of a WRKY-like motif N-terminal to the zinc finger. This single domain WRKY transcription factor served as the progenitor for all other WRKY genes. There appear to have been at least four independent lateral gene transfer events to non-plants during the early evolution of the WRKY gene family. The first may have occurred as long ago as 480 million years. During the colonization of land by plants the first terrestrial symbioses and other interactions with fungal cells led to lateral gene transfer of a WRKY gene to a non-plant host. These single domain fungal type WRKY genes from fungi incertae sedis are ancient and reflect the single WRKY domain present in the oldest form of WRKY transcription factors from unicellular organisms. The second lateral gene transfer saw an ancestral WRKY gene with a single WRKY domain transfer to a diplomonad, and the third saw a similar gene transfer to an amoebozoa species (Figure 2). The final lateral gene transfer event was the transfer of an early Group I WRKY gene from an alga to an amoebozoa. All of these events appear to have occurred before or during the conquest of land and there may have been multiple instances of similar transfers. As more genomic sequences become available, it may be possible to identify additional lateral gene transfer events of WRKY genes to non-plant species and to more accurately date these transfer events.
In the early multicellular terrestrial algae, Group IIb genes evolved either from a single WRKY domain-containing ancestor or from a Group I gene. We propose two alternative hypotheses of Group IIb WRKY gene evolution: The “Group I Hypothesis” sees all WRKY genes evolving from Group I C-terminal domains. The alternative “IIa + b Separate Hypothesis” sees Groups IIa and IIb with their hallmark VQR intron evolving directly from a single domain ancestral algal WRKY gene separate from the other Group I-derived lineage. The conserved PR intron, that is a hallmark of most multicellular plant WRKY genes, was not a feature of the C-terminal domains of the first Group I genes from unicellular green algae but evolved in the period that saw multicellular algae colonize the land. The presence of this PR intron in Group IId, IIe and III WRKY genes supports the hypothesis that these groups evolved from the group I C-terminal domain.
We are aware that the phylogenetic trees in Figure 2 are potentially at odds with the other data because it is possible that Group I + IIc, IIa + IIb, IId + IIe and III all evolved independently from ancestral WRKY genes. However, one observation argues strongly against this. If later groups such as IId, IIe, and III evolved after Group I but independently from other ancestral WRKY genes, then where are these other ancestral genes? They appear to be absent from all sequenced genomes that contain the earlier Group I/IIc/IId genes. Put simply, we cannot find any ancestral WRKY genes in multicellular green algae or mosses from which later groups could independently evolve other than Group I or Group IIb.
It appears from our phylogenetic analyses and genomic searches that there are four major WRKY transcription factor lineages in flowering plants, Groups I + IIc, Groups IIa + IIb, Groups IId + IIe, and Group III. In addition, there are several other groups of WRKY genes that are found only in unicellular green algae. WRKY genes that are present in non-plant species due to ancient lateral gene transfer are either from the algal types of WRKY genes or early Group I-like genes.
During the evolution of flowering plants, one other type of WRKY gene evolved that contains domains typical for both R proteins and WRKY transcription factors. These R protein-WRKYs are not found in all plant genomes but have evolved many times and with differing domain structures (Figures 8, 9 and 10). The formation of these R protein-WRKY genes is recent, with classes being restricted to specific flowering plant lineages. Once formed, R protein-WRKYs may be selected for as they combine different components of signaling pathways that may either create new diversity in signaling or accelerate signaling by short circuiting signaling pathways.
Based on our phylogenetic analyses and genomic searches we propose a new hypothesis on the evolution of WRKY transcription factors that includes early lateral gene transfers to some non-plant organisms and algal WRKY genes that have no counterparts in flowering plants. There are four major WRKY transcription factor lineages in flowering plants, Groups I + IIc, Groups IIa + IIb, Groups IId + IIe, and Group III. We propose two alternative hypotheses of WRKY gene evolution: The “Group I Hypothesis” sees all WRKY genes evolving from Group I C-terminal domains with IIb genes evolving before the appearance of the conserved PR intron. The alternative “IIa + b Separate Hypothesis” sees Groups IIa and IIb with their hallmark VQR intron evolving directly from a single domain ancestral algal WRKY gene separate from the other Group I-derived lineage. Further genome sequences may help us determine which of these two alternatives is likely to best reflect the evolution of WRKY transcription factors.
Availability of supporting data
All supporting data are included as additional files.
The amino acid sequences of the complete WRKY gene families from the organisms used were taken from Phytozome (http://www.phytozome.net/) or NCBI (http://www.ncbi.nlm.nih.gov/). The amino acid sequences of the WRKY domains (Additional file 1: Table S1) or the complete amino acid sequences of the R protein-WRKYs (Additional file 3: Table S2) were used for phylogenetic analyses. The data set of R protein-WRKY genes was obtained using blastp, PSI-BLAST and tblastn searches at NCBI (http://www.ncbi.nlm.nih.gov/) [36,37]. Additionally, Hidden Markov Models were developed for each R protein-WRKY. Genomic DNA sequences were analysed by FGENESH (http://www.softberry.com/) to perform ab initio gene prediction in order to find any alternative protein predictions from those in gene models.
Alignments were constructed using MUSCLE with the following parameters. Gap penalties: gap open −2.9, gap extend 0, hydrophobicity multiplier 1.2. Memory/iterations: max memory 4095 MB, max iterations 8. Clustering method (iterations 1, 2): UPGMB; clustering method (other iterations): UPGMB; min. diag. length (lambda): 24. The alignment for Figure 2 is presented as Additional file 2.
For each Neighbor-Joining tree, the percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) was determined. The evolutionary distances were computed using the Poisson correction method and are in the units of the number of amino acid substitutions per site. All ambiguous positions were removed for each sequence pair. Evolutionary analyses were conducted in MEGA6. All positions containing alignment gaps and missing data were eliminated in pairwise sequence comparisons. In Figure 2B, the Maximum Likelihood tree with the highest log likelihood (−20809.5522) is shown. Initial tree(s) for the heuristic search were obtained by applying the Neighbor-Joining method to a matrix of pairwise distances estimated using a JTT model. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. The analysis involved 664 amino acid sequences. All positions with less than 95% site coverage were eliminated. There were a total of 58 positions in the final dataset.
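The two site-handling rules described above can be illustrated with a minimal Python sketch. It is not MEGA6 code and makes several assumptions: sequences are already aligned to equal length, the characters "-", "?" and "X" are treated as gaps, missing data or ambiguous residues, and the function names are hypothetical. The first function computes the Poisson-corrected distance d = −ln(1 − p) with pairwise deletion, where p is the proportion of differing sites among those compared. The second keeps only alignment columns that meet a site coverage threshold.

```python
import math

# Assumption: these characters mark gaps, missing data, or ambiguous residues.
GAP_CHARS = {"-", "?", "X"}


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected amino acid distance, d = -ln(1 - p).

    Sites where either sequence has a gap or ambiguous character are
    skipped for this pair only (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue  # pairwise deletion: ignore this site for this pair
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    p = differences / compared
    if p >= 1.0:
        return math.inf  # the correction is undefined when every site differs
    return -math.log(1.0 - p)


def filter_by_site_coverage(alignment, min_coverage=0.95):
    """Keep columns where at least min_coverage of sequences have a residue."""
    n_seqs = len(alignment)
    n_cols = len(alignment[0])
    keep = [
        col
        for col in range(n_cols)
        if sum(seq[col] not in GAP_CHARS for seq in alignment) / n_seqs
        >= min_coverage
    ]
    return ["".join(seq[col] for col in keep) for seq in alignment]


if __name__ == "__main__":
    # Toy alignment built around the WRKYGQK motif (illustrative only).
    toy = ["WRKYGQK", "WRKYGKK", "W-KYGQK"]
    print(poisson_distance(toy[0], toy[1]))  # one difference in seven sites
    print(filter_by_site_coverage(toy))      # the gapped column is dropped
```

Pairwise deletion retains more sites for each distance estimate, whereas a coverage threshold gives every sequence the same set of columns, which is why a large alignment can shrink to a small number of positions under the second rule.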
Multiple sequence alignments and consensus sequences
Multiple sequence alignments and consensus sequences were produced using ClustalW2 (http://hmmer.janelia.org) with the default settings and visualized using Jalview (www.jalview.org). The multiple sequence alignment for Figure 2 is presented in Additional file 2.
Hidden Markov Model analyses were performed on the complete amino acid sequences of the R protein-WRKYs using the protein sequence vs profile-HMM database tool at Janelia.org (http://hmmer.janelia.org) with the default settings, searching the Pfam, Gene3D, and Superfamily databases.
Intron/exon boundary analysis
Intron/exon boundaries of individual WRKY genes were obtained from Phytozome (http://www.phytozome.net/). The consensus amino acid sequences of each WRKY group from higher plants were modified from Rushton et al. and obtained using all members from Arabidopsis thaliana.
B3: B3 DNA binding domain
HMM: Hidden Markov Models
A class of innate immune receptors with nucleotide binding and leucine-rich repeat domains
PAH: Paired amphipathic helix domain
R proteins: Resistance proteins
TIR: Toll interleukin 1 receptor
NB-ARC: APAF-1, R proteins, and CED-4
The authors would like to thank Mani Kant Choudhary, Marissa Miller, Naveen Kumar, Malini Rao, Deena Rushton and Nikhil Kesarla in the Rushton lab. We would also like to thank all of those who have contributed to the field of WRKY transcription factor evolution, especially Yuanji Zhang, Liangjiang Wang, Thomas Eulgem, Imre Somssich, Luise Brand, Dierk Wanke, Madan Babu and Lakshminarayanan Aravind. This project was supported in part by National Research Initiative grants 2008-35100-04519 and 2008-35100-05969 from the USDA National Institute of Food and Agriculture.
- Rushton PJ, Macdonald H, Huttly AK, Lazarus CM, Hooley R. Members of a new family of DNA-binding proteins bind to a conserved cis-element in the promoters of alpha-Amy2 genes. Plant Mol Biol. 1995;29(4):691–702.
- Yamasaki K, Kigawa T, Inoue M, Tateno M, Yamasaki T, Yabuki T, et al. Solution structure of an Arabidopsis WRKY DNA binding domain. Plant Cell. 2005;17(3):944–56.
- Duan MR, Nan J, Liang YH, Mao P, Lu L, Li L, et al. DNA binding mechanism revealed by high resolution crystal structure of Arabidopsis thaliana WRKY1 protein. Nucleic Acids Res. 2007;35(4):1145–54.
- Yamasaki K, Kigawa T, Watanabe S, Inoue M, Yamasaki T, Seki M, et al. Structural basis for sequence-specific DNA recognition by an Arabidopsis WRKY transcription factor. J Biol Chem. 2012;287(10):7683–91.
- Eulgem T, Rushton PJ, Robatzek S, Somssich IE. The WRKY superfamily of plant transcription factors. Trends Plant Sci. 2000;5(5):199–206.
- Rushton PJ, Somssich IE, Ringler P, Shen QJ. WRKY transcription factors. Trends Plant Sci. 2010;15(5):247–58.
- Zhang Y, Wang L. The WRKY transcription factor superfamily: its origin in eukaryotes and expansion in plants. BMC Evol Biol. 2005;5:1.
- Rensing SA, Lang D, Zimmer AD, Terry A, Salamov A, Shapiro H, et al. The Physcomitrella Genome Reveals Evolutionary Insights into the Conquest of Land by Plants. Science. 2008;319(5859):64–9.
- Banks JA, Nishiyama T, Hasebe M, Bowman JL, Gribskov M, dePamphilis C, et al. The Selaginella genome identifies genetic changes associated with the evolution of vascular plants. Science. 2011;332(6032):960–3.
- Babu MM, Iyer LM, Balaji S, Aravind L. The natural history of the WRKY–GCM1 zinc fingers and the relationship between transcription factors and transposons. Nucleic Acids Res. 2006;34(22):6505–20.
- Brand LH, Fischer NM, Harter K, Kohlbacher O, Wanke D. Elucidating the evolutionary conserved DNA-binding specificities of WRKY transcription factors by molecular dynamics and in vitro binding assays. Nucleic Acids Res. 2013;41(21):9764–78.
- Timme RE, Bachvaroff TR, Delwiche CF. Broad phylogenomic sampling and the sister lineage of land plants. PLoS One. 2012;7(1):e29696.
- Hori K, Maruyama F, Fujisawa T, Togashi T, Yamamoto N, Seo M, et al. Klebsormidium flaccidum genome reveals primary factors for plant terrestrial adaptation. Nat Commun. 2014;5:3978.
- Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32(5):1792–7.
- Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol Biol Evol. 2013;30(12):2725–9.
- Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994;22(22):4673–80.
- Pais FS-M, de Ruy P, Oliveira G, Coimbra R. Assessing the efficiency of multiple sequence alignment programs. Algorithms Mol Biol. 2014;9:4.
- Fitzpatrick DA. Horizontal gene transfer in fungi. FEMS Microbiol Lett. 2012;329(1):1–8.
- Kunin V, Goldovsky L, Darzentas N, Ouzounis CA. The net of life: reconstructing the microbial phylogenetic network. Genome Res. 2005;15(7):954–9.
- Andersson JO, Sjogren AM, Davis LA, Embley TM, Roger AJ. Phylogenetic analyses of diplomonad genes reveal frequent lateral gene transfers affecting eukaryotes. Curr Biol. 2003;13(2):94–104.
- Clarke M, Lohan AJ, Liu B, Lagkouvardos I, Roy S, Zafar N, et al. Genome of Acanthamoeba castellanii highlights extensive lateral gene transfer and early evolution of tyrosine kinase signaling. Genome Biol. 2013;14(2):R11.
- Richards TA, Soanes DM, Foster PG, Leonard G, Thornton CR, Talbot NJ. Phylogenomic analysis demonstrates a pattern of rare and ancient horizontal gene transfer between plants and fungi. Plant Cell Online. 2009;21(7):1897–911.
- Hedges SB, Blair JE, Venturi ML, Shoe JL. A molecular timescale of eukaryote evolution and the rise of complex multicellular life. BMC Evol Biol. 2004;4(2):279–84.
- Simon L, Bousquet J, Lévesque RC, Lalonde M. Origin and diversification of endomycorrhizal fungi and coincidence with vascular land plants. 1993.
- Lackner G, Hertweck C. Impact of endofungal bacteria on infection biology, food safety, and drug development. PLoS Pathog. 2011;7(6):e1002096.
- Tisserant E, Malbreil M, Kuo A, Kohler A, Symeonidi A, Balestrini R, et al. Genome of an arbuscular mycorrhizal fungus provides insight into the oldest plant symbiosis. Proc Natl Acad Sci. 2013;110(50):20117–22.
- Schüßler A, Schwarzott D, Walker C. A new fungal phylum, the Glomeromycota: phylogeny and evolution. Mycol Res. 2001;105(12):1413–21.
- Friedman AR, Baker BJ. The evolution of resistance genes in multi-protein plant resistance systems. Curr Opin Genet Dev. 2007;17(6):493–9.
- Kuang H, Wei F, Marano MR, Wirtz U, Wang X, Liu J, et al. The R1 resistance gene cluster contains three groups of independently evolving, type I R1 homologues and shows substantial structural variation among haplotypes of Solanum demissum. Plant J. 2005;44(1):37–51.
- Kuang H, Woo S-S, Meyers BC, Nevo E, Michelmore RW. Multiple genetic processes result in heterogeneous rates of evolution within the major cluster disease resistance genes in lettuce. Plant Cell Online. 2004;16(11):2870–94.
- Parniske M, Hammond-Kosack KE, Golstein C, Thomas CM, Jones DA, Harrison K, et al. Novel Disease Resistance Specificities Result from Sequence Exchange between Tandemly Repeated Genes at the Cf-4/9 Locus of Tomato. Cell. 1997;91(6):821–32.
- Tripathi P, Rabara RC, Langum TJ, Boken AK, Rushton DL, Boomsma DD, et al. The WRKY transcription factor family in Brachypodium distachyon. BMC Genomics. 2012;13(1):270.
- Salamov AA, Solovyev VV. Ab initio gene finding in Drosophila genomic DNA. Genome Res. 2000;10(4):516–22.
- Noutoshi Y, Ito T, Seki M, Nakashita H, Yoshida S, Marco Y, et al. A single amino acid insertion in the WRKY domain of the Arabidopsis TIR–NBS–LRR–WRKY-type disease resistance protein SLH1 (sensitive to low humidity 1) causes activation of defense responses and hypersensitive cell death. Plant J. 2005;43(6):873–88.
- Goodstein DM, Shu S, Howson R, Neupane R, Hayes RD, Fazo J, et al. Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res. 2012;40(Database issue):D1178–86.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215(3):403–10.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25(17):3389–402.
- Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987;4(4):406–25.
- Felsenstein J. Confidence limits on phylogenies: an approach using the bootstrap. Evolution. 1985;39(4):783–91.
- Zuckerkandl E, Pauling L. Molecules as documents of evolutionary history. J Theor Biol. 1965;8(2):357–66.
- Finn RD, Clements J, Eddy SR. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 2011;39(Web Server issue):W29–37.
- Clamp M, Cuff J, Searle SM, Barton GJ. The Jalview Java alignment editor. Bioinformatics. 2004;20(3):426–7.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.