Transcriptome mining, functional characterization, and phylogeny of a large terpene synthase gene family in spruce (Picea spp.)

Background: In conifers, terpene synthases (TPSs) of the gymnosperm-specific TPS-d subfamily form a diverse array of mono-, sesqui-, and diterpenoid compounds, which are components of the oleoresin secretions and volatile emissions. These compounds contribute to defence against herbivores and pathogens and perhaps also protect against abiotic stress.
Results: The availability of extensive transcriptome resources in the form of expressed sequence tags (ESTs) and full-length cDNAs in several spruce (Picea) species allowed us to estimate that a conifer genome contains at least 69 unique and transcriptionally active TPS genes. This number is comparable to the number of TPSs found in any of the sequenced and well-annotated angiosperm genomes. We functionally characterized a total of 21 spruce TPSs: 12 from Sitka spruce (P. sitchensis), 5 from white spruce (P. glauca), and 4 from hybrid white spruce (P. glauca × P. engelmannii). These comprised 15 monoterpene synthases, 4 sesquiterpene synthases, and 2 diterpene synthases.
Conclusions: The functional diversity of these characterized TPSs parallels the diversity of terpenoids found in the oleoresin and volatile emissions of Sitka spruce and provides a context for understanding this chemical diversity at the molecular and mechanistic levels. The comparative characterization of Sitka spruce and Norway spruce diterpene synthases revealed the natural occurrence of TPS sequence variants between closely related spruce species, confirming a previous prediction from site-directed mutagenesis and modelling.


Background
Conifer trees (order Coniferales; gymnosperms) are extremely long-lived plants that must confront a multitude of biotic and abiotic stresses that vary with the season and over their lifetime. Conifers have evolved several resistance mechanisms that repel, kill, inhibit, or otherwise reduce the success of herbivores and pathogens. These mechanisms include both mechanical and chemical defences that can be present constitutively or induced upon challenge [1,2]. As a major part of their constitutive and inducible defensive repertoire, conifers produce an abundant and complex mixture of terpenoids in the form of oleoresin secretions and volatile emissions [2,3]. The diversity of conifer terpenoids suggests that, as in other plants [4], an arms race has unfolded in the interactions of conifers with other organisms through the production of specialized (i.e., secondary) metabolites. Conifer terpenoids are predominantly monoterpenes, sesquiterpenes and diterpenes, which are formed by a family of terpene synthases (TPSs) and may be further functionalized by other enzymes, such as cytochromes P450 [2,5].
Despite much work on individual conifer TPSs [2], the total number of TPSs present in any one conifer species is not yet known since no conifer genome has been sequenced to date. In contrast, the sequenced and annotated genomes of several angiosperm species provide an indication of the diversity of TPSs we might expect to see in any one plant species. For example, the genes encoding putatively active mono-, sesqui-, and di-TPSs number at least 32 in the Arabidopsis (Arabidopsis thaliana) genome [6], at least 31 in the rice (Oryza sativa) genome [7], at least 32 in the poplar (Populus trichocarpa) genome [8], and at least 69 in the genome of a highly inbred grapevine (Vitis vinifera) Pinot Noir variety [9,10]. All of these angiosperm genomes contain clusters of duplicated TPS genes. The large genome size of conifers and the diversity of their terpenoid profiles may suggest a similarly sized or potentially larger TPS gene family in conifer species. However, targeted BAC sequencing of a few conifer TPSs from white spruce (Picea glauca) did not reveal any genomic clustering of multiple TPS genes in this conifer genome [11,12].
Most of our current knowledge of the size, functional diversity and phylogeny of gymnosperm TPSs is based on targeted cDNA cloning and characterization in two conifer species, grand fir (Abies grandis) and Norway spruce (P. abies), along with a few TPSs in other gymnosperms [2]. In grand fir, 11 different TPS genes have been functionally characterized [13]. Martin et al. [14] described a set of 9 different TPSs in Norway spruce and examined the phylogeny of 29 gymnosperm TPSs, all of which fell into the gymnosperm-specific TPS-d subfamily. A deeper understanding of the diversity and functional complexity of the conifer TPS-d subfamily requires additional gene discovery by transcriptome mining. Large collections of expressed sequence tags (ESTs) and full-length cDNAs (FLcDNAs) exist for several conifer species [15][16][17] and provide a rich resource for identifying and functionally characterizing new TPSs.
Here, we have analyzed the ESTs and FLcDNAs from Sitka spruce (P. sitchensis), white spruce (P. glauca), and hybrid white spruce (P. glauca × P. engelmannii) to identify a comprehensive set of expressed members of the spruce TPS gene family. We have functionally characterized several members from each species for a total of 21 newly characterized spruce TPSs. This work complements previous work in Norway spruce [14] and provides a molecular basis from which to explain much of the chemical complexity of the oleoresin and volatile terpenoids in spruce. Results of the functional gene characterization are discussed in the context of previously reported terpenoid metabolite profiles of oleoresin and volatile emissions in Sitka spruce.

Results and Discussion
Identification of unique TPS sequences and isolation of full-length TPS cDNA clones
The in silico analysis of 443,665 spruce ESTs identified a total of 506 ESTs corresponding to putative TPSs (Table 1). Assembly of these ESTs into contigs and singlets allowed us to estimate the minimum number of actively expressed TPS genes in each of the three spruce species analyzed: 69 unique TPS sequences in white spruce, 55 in Sitka spruce, and 20 in hybrid white spruce. Although the rate of gene discovery depended on the depth of EST sequencing (Table 1), the substantially deeper EST coverage in white spruce (242,931 ESTs) did not yield a proportional increase in TPS discovery relative to Sitka spruce (174,384 ESTs) and hybrid white spruce (26,350 ESTs). This suggests that most of the TPSs expressed in the sequenced tissues were captured at the sequencing depths reached in white spruce and Sitka spruce. The estimate of at least 69 TPSs in white spruce is comparable to the number of putatively active TPS genes found in the sequenced genomes of angiosperms and is perhaps a good approximation of the total number of transcriptionally active TPS genes in a conifer species. From the set of assembled TPS sequences, we examined approximately 170 of the corresponding cDNA clones by restriction digest, colony PCR and/or sequencing to identify those that contained full ORFs. Eighteen FLcDNA clones were selected for subcloning and functional characterization. In addition, three full-length TPS cDNA clones were obtained by RACE cloning or homology-based PCR cloning. Because the Treenomix project [16], which generated the available cDNA clones, focused its FLcDNA program on Sitka spruce, the majority of the full-length TPS cDNA clones (12 FLcDNAs) were from this species. Five full-length TPS cDNA clones originated from white spruce and four from hybrid white spruce.
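The diminishing return described above can be checked with simple arithmetic on the reported counts. The sketch below is illustrative only: the "unique TPSs per 10,000 ESTs" rate and the function names are our own choices, not metrics reported in Table 1.

```python
# EST depth and unique TPS counts per species, as reported in the text.
est_counts = {
    "white spruce": 242_931,
    "Sitka spruce": 174_384,
    "hybrid white spruce": 26_350,
}
unique_tps = {
    "white spruce": 69,
    "Sitka spruce": 55,
    "hybrid white spruce": 20,
}


def tps_per_10k_ests(species: str) -> float:
    """Unique TPS sequences recovered per 10,000 ESTs (illustrative metric)."""
    return unique_tps[species] / est_counts[species] * 10_000


def ratio_vs(species_a: str, species_b: str) -> tuple:
    """(fold change in EST depth, fold change in unique TPSs) of species_a vs species_b."""
    depth = est_counts[species_a] / est_counts[species_b]
    found = unique_tps[species_a] / unique_tps[species_b]
    return depth, found


if __name__ == "__main__":
    total = sum(est_counts.values())
    print(f"total ESTs: {total:,}")  # 443,665, matching the text
    for sp in est_counts:
        print(f"{sp}: {tps_per_10k_ests(sp):.2f} TPSs per 10,000 ESTs")
    depth, found = ratio_vs("white spruce", "Sitka spruce")
    print(f"white vs Sitka: {depth:.2f}x ESTs -> {found:.2f}x TPSs")
```

For example, white spruce has about 1.39 times as many ESTs as Sitka spruce but yielded only about 1.25 times as many unique TPSs, and its discovery rate (about 2.84 TPSs per 10,000 ESTs) is lower than that of Sitka spruce (about 3.15) or the shallowly sequenced hybrid white spruce (about 7.59).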

Functional characterization of recombinant TPS enzymes
Most previously described conifer TPSs are multiproduct enzymes [14,18], and because the identity and relative abundance of TPS products are very sensitive to small changes in amino acid sequence [19][20][21][22][23][24], it is not possible to accurately predict function based solely upon amino acid sequence similarity or phylogeny. While it might be possible to infer a TPS gene function from the chemical phenotype of a corresponding plant mutant, the genetic resources for such an approach are available for only a very few model systems, such as Arabidopsis [25]. Instead, in most systems, the functional annotation of each TPS requires expression of recombinant protein and enzyme characterization.
Table 1 note: Conifer TPS protein sequences available from NCBI were used to query the three species-specific EST databases using the tBLASTn module of WU-BLAST 2.0 and an E-value cutoff of 1 × 10⁻⁵. The resulting outputs were filtered to exclude duplicates and then assembled separately by species using CAP3 [49]. The total number of TPSs represents an estimated minimum number of unique TPSs found in each species.
Recombinant spruce TPSs were expressed in E. coli and purified by Ni-affinity chromatography before each was assayed individually with geranyl diphosphate (GPP), farnesyl diphosphate (E,E-FPP), and geranylgeranyl diphosphate (E,E,E-GGPP), the respective trans-prenyl diphosphate substrates of conifer monoterpene synthases, sesquiterpene synthases, and diterpene synthases. Because two recent reports described the occurrence and conversion of cis-prenyl diphosphate substrates in tomato [26][27][28], we also assessed whether spruce is likely to produce these additional TPS substrates. Mining of all available spruce EST sequences did not reveal prenyltransferases for the formation of cis-prenyl diphosphate substrates (D. Hall and J. Bohlmann; unpublished results).
In the following sections we describe the functional characterization of the 21 spruce TPSs (Figure 1). With one exception, each of these TPSs made significant use of only one of the substrates. Based upon functional characterization, the 21 TPSs comprised 15 monoterpene synthases, 4 sesquiterpene synthases, and 2 diterpene synthases. The product identities and abundances for each TPS, including the quantitative composition of multi-product profiles, are shown in Table 2, and representative GCMS traces are shown in Figures 2, 3, and 4. A summary of the functional annotation, along with NCBI GenBank accession numbers, appears in Table 3. Results of the functional TPS characterization are discussed in the context of previously reported terpenoid metabolite profiles in Sitka spruce genotype FB3-425 [see Supplemental Tables in [29]], from which many of the functionally characterized TPS FLcDNAs were isolated. Terpenoid profiles are also available from a collection of 111 Sitka spruce accessions [30].
Figure 1 Phylogeny of functionally characterized gymnosperm TPSs. The ent-kaurene synthase from Physcomitrella patens was included as an outgroup. TPSs described in this paper are shown with white background. Protein alignments were prepared using MUSCLE [54] and phylogenetic trees were constructed using the neighbour-joining method with 100 bootstrap repetitions (asterisks mark clades supported by bootstrap values of 80% or higher), within CLC Main Workbench (CLC bio, Århus, Denmark).
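The substrate-based classification used above can be expressed as a small decision rule: an enzyme that converts GPP (C10) is a monoterpene synthase, one that converts (E,E)-FPP (C15) is a sesquiterpene synthase, and one that converts (E,E,E)-GGPP (C20) is a diterpene synthase. The following is a hypothetical sketch: the `classify` function, its 10% "significant use" threshold, and the activity values in the example are assumptions for illustration, not the study's assay criteria or data.

```python
from enum import Enum
from typing import Dict, List


class TPSClass(Enum):
    MONO = "monoterpene synthase"      # C10 products from GPP
    SESQUI = "sesquiterpene synthase"  # C15 products from (E,E)-FPP
    DI = "diterpene synthase"          # C20 products from (E,E,E)-GGPP


# Substrate -> enzyme class, following the substrate assignments in the text.
SUBSTRATE_CLASS = {"GPP": TPSClass.MONO, "FPP": TPSClass.SESQUI, "GGPP": TPSClass.DI}


def classify(activity: Dict[str, float], threshold: float = 0.1) -> List[TPSClass]:
    """Return the class for every substrate that is used 'significantly'.

    activity maps a substrate name to total product formed (arbitrary units).
    'Significant' is assumed here to mean >= threshold * the best substrate's
    activity; this cutoff is hypothetical, not the study's criterion.
    """
    best = max(activity.values(), default=0.0)
    if best <= 0:
        return []  # no detectable activity with any substrate
    return [SUBSTRATE_CLASS[s] for s, v in activity.items() if v >= threshold * best]


if __name__ == "__main__":
    # Hypothetical enzyme that converts only GPP appreciably.
    print(classify({"GPP": 100.0, "FPP": 2.0, "GGPP": 0.0}))
    # Hypothetical enzyme using two substrates, cf. the single exception noted above.
    print(classify({"GPP": 40.0, "FPP": 100.0, "GGPP": 0.0}))
```

Returning a list rather than a single class keeps multi-substrate enzymes visible instead of silently forcing them into one category.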

Functional characterization of monoterpene synthases: (-)-α/β-pinene synthases
We characterized one new (-)-α/β-pinene synthase in Sitka spruce (PsTPS-Pin) and two in white spruce (PgTPS-Pin-1 and PgTPS-Pin-2; both originating from the same genotype) (Tables 2 and 3). These three enzymes clustered closely with the two previously characterized (-)-α/β-pinene synthases from Sitka spruce [32] and Norway spruce [14] in the TPS-d1 clade (Figure 1). The topology of this group of five (-)-α/β-pinene synthases suggests that they represent orthologs in the three spruce species of our comparison. The two pairs of (-)-α/β-pinene synthase genes in white spruce and in Sitka spruce may represent recently duplicated genes or allelic variants in each of these two species: the two white spruce enzymes differed from each other in only four amino acids, and the two Sitka spruce enzymes in only six. The white spruce (-)-α/β-pinene synthases were approximately 96% identical to both the Norway spruce and the Sitka spruce (-)-α/β-pinene synthases, and the Sitka spruce (-)-α/β-pinene synthases shared approximately 95% identity with the Norway spruce enzyme. The (-)-α-pinene synthase from loblolly pine (Pinus taeda) [33] and the (-)-α/β-pinene synthase from grand fir [34] clustered outside the group of spruce (-)-α/β-pinene synthases (Figure 1); these pine and grand fir (-)-pinene synthases may be the corresponding orthologs outside the spruce genus. Despite differing at only four positions (Q/R94, R/G217, S/N221, and E/G599), the two white spruce enzymes PgTPS-Pin-1 and PgTPS-Pin-2 showed opposing patterns in the relative amounts of α- and β-pinene produced by the recombinant enzymes (67:33 and 29:71 α-pinene:β-pinene, respectively; Table 2). Based upon homology modelling with the limonene synthase from Mentha spicata as a template [35], we examined whether any of the four differing residues were in or near the active site.
Only the residue at position 599 (corresponding to M572 of the template) was near the active site. Although this residue was not on the surface of the modelled active site, it was directly behind the residues that contribute
Table 2 Product profiles of recombinant TPS enzymes based upon total ion current of GCMS analysis on a DB-WAX column (continued).
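Pairwise comparisons such as "four amino acid differences" or "approximately 96% identical" are derived from aligned protein sequences. A minimal sketch of how such figures can be computed is shown below, assuming the two sequences have already been aligned (e.g., with MUSCLE) to equal length. Conventions for percent identity differ between tools, for example in how gap columns are counted; this sketch simply ignores any column containing a gap, and the toy sequences are hypothetical, not spruce TPS sequences.

```python
from typing import List


def substitutions(seq_a: str, seq_b: str, gap: str = "-") -> List[str]:
    """List differing columns as '<a>/<b><position>', in the style of 'Q/R94'.

    Positions are 1-based alignment columns; they equal residue numbers of
    seq_a only if seq_a contains no gaps up to that column.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{a}/{b}{pos}"
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b and gap not in (a, b)  # report substitutions only, skip indels
    ]


def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    a, b = "MQARSE", "MRARNE"  # hypothetical toy sequences
    print(substitutions(a, b))               # ['Q/R2', 'S/N5']
    print(round(percent_identity(a, b), 1))  # 66.7
```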
The monoterpenes (-)-α-pinene and (-)-β-pinene are prominent resin compounds in Sitka spruce [29,30] and in Norway spruce [36,37]. In Norway spruce, induced accumulation of these compounds in bark tissue of MeJA-treated stems is the result of increased enzyme activity, protein abundance, and transcript levels of (-)-α/β-pinene synthase [38]. Previous work in Sitka spruce also showed strong accumulation of transcripts detected with a (-)-α/β-pinene synthase probe in MeJA- and insect-treated stems, both at the site of insect feeding and some distance away [29].
(-)-Linalool was previously detected as the major volatile emitted by MeJA-treated and weevil-attacked Sitka spruce saplings of genotype FB3-425 [29], similar to the MeJA-induced emission of linalool from Norway spruce [37]. Transcripts detected with a PaTPS-Lin probe were strongly induced in needles of MeJA-treated Sitka spruce [29]. Linalool volatiles are thought to function in indirect defence against herbivores. The (-)-linalool emissions in spruce apparently do not originate from the oleoresin reservoirs of severed resin ducts, but from induced de novo biosynthesis in other tissues. The cloning of (-)-linalool synthase genes from Sitka spruce and white spruce makes it possible to investigate, in future work, the localization of these enzymes and the corresponding transcripts in needles using laser-assisted tissue microdissection [39] or immunofluorescence localization [40].

Functional characterization of monoterpene synthases: 1,8-Cineole synthases
In each of the three spruce species studied we identified and characterized a single 1,8-cineole synthase: PgTPS-Cin, Pg×eTPS-Cin, and PsTPS-Cin (Tables 2 and 3, Figure 2). The three enzymes shared approximately 99% sequence identity with each other and form a distinct group in the TPS-d1 clade, most closely related to the linalool synthases. The 1,8-cineole synthases and the linalool synthases are among only a few known conifer monoterpene synthases that produce mainly oxygenated monoterpenes instead of olefins. All three 1,8-cineole synthases were multi-product enzymes, with the major product, 1,8-cineole, ranging from approximately 60% of total product for PsTPS-Cin to approximately 90% for PgTPS-Cin. The three spruce enzymes also had similar profiles of minor products, including (-)-α-terpineol, (+)-α-pinene, β-pinene and myrcene (Table 2 and Figure 2). Although 1,8-cineole has been identified as a monoterpenoid component in needles and in MeJA-induced volatile emissions of Norway spruce [37], and has recently been shown to inhibit both the field attraction of a spruce beetle to its pheromone and the response of one of its olfactory receptor neurons to that pheromone [43], this is the first characterization of gymnosperm TPSs that produce this compound.

Functional characterization of sesquiterpene synthases
A complex blend of sesquiterpenes is found in minor quantities in the oleoresin of conifers, including Sitka spruce [29] and Norway spruce [37]. Sesquiterpenes are also present in the MeJA-induced volatile emissions of Norway spruce [37] and in the MeJA- and weevil-induced volatile emissions of Sitka spruce [29].
PgTPS-Hum, Pg×eTPS-Lonf, and PsTPS-Lonp belong to the TPS-d2 clade of the gymnosperm TPS-d subfamily, together with other conifer multi-product sesquiterpene synthases (Figure 1). The hybrid white spruce Pg×eTPS-Far/Oci appeared to be orthologous to the farnesene synthases from loblolly pine and Norway spruce in the TPS-d1 clade.
PsTPS-LAS and PsTPS-Iso play an important role in the overall diterpene resin acid defence systems of Sitka spruce. The six products of the two Sitka spruce diterpene synthases are present as the corresponding diterpene resin acids in the oleoresin of Sitka spruce stem tissues [29]. Accumulation of all of these diterpene resin acids was induced by MeJA treatment or insect attack, along with increased transcript levels detected with the orthologous PaTPS-LAS and PaTPS-Iso probes [29].
The sequences of PsTPS-LAS and PaTPS-LAS differed by only 12 amino acids, and those of PsTPS-Iso and PaTPS-Iso by only 35 amino acids. In a detailed investigation of the PaTPS-LAS and PaTPS-Iso enzymes using reciprocal site-directed mutagenesis and domain swapping, we recently showed that four amino acid residues determine the different product profiles of these Norway spruce diterpene synthases [24]. These product-determining residues are identical between the levopimaradiene/abietadiene synthases of Sitka and Norway spruce (PsTPS-LAS and PaTPS-LAS), consistent with their similar product profiles. However, only three of these residues are identical between the isopimaradiene synthases of Sitka and Norway spruce (PsTPS-Iso and PaTPS-Iso); the fourth residue (V732) is the same as that found in the Norway spruce levopimaradiene/abietadiene synthase. In our previous study [24], the corresponding reciprocal L725V mutation introduced into PaTPS-Iso by site-directed mutagenesis resulted in the formation of sandaracopimaradiene as a minor product. This change in product profile is consistent with the new observation that the Sitka spruce isopimaradiene synthase (PsTPS-Iso) naturally produced sandaracopimaradiene as a minor compound (Table 2, Figure 4). Overall, these results show that product-determining substitutions identified by laboratory mutagenesis also occur in nature and do give rise to altered TPS product profiles between species or genotypes.

Phylogeny of gymnosperm TPSs
All known conifer TPSs of specialized (i.e., secondary) metabolism are members of the gymnosperm-specific TPS-d subfamily, which is a distinct clade of the larger plant TPS gene family [47]. The TPS-d subfamily has been subdivided into three clades, TPS-d1 through TPS-d3, based on a previous phylogeny of 29 gymnosperm TPSs [14]. Here, we have substantially expanded the phylogeny of functionally characterized gymnosperm TPSs to a total of 72 members (Figure 1), of which 41 are from spruce species, including 20 different TPSs from Sitka spruce. The number of TPSs functionally characterized in Sitka spruce is one of the largest for any species, but it does not yet approach our in silico minimum estimate for the number of TPSs in a spruce genome (at least 69 transcriptionally active TPS genes). The diverse set of newly characterized spruce TPSs broadly represents the major TPS-d1, TPS-d2 and TPS-d3 clades and allowed us to identify groups of likely orthologous TPS genes across the spruce species. Examples of such groups are the (-)-α/β-pinene synthases, the (-)-linalool synthases and the (E,E)-α-farnesene synthases in the TPS-d1 clade; the longifolene synthases in the TPS-d2 clade; and the levopimaradiene/abietadiene synthases and isopimaradiene synthases in the TPS-d3 clade. These groups represent genes whose functions apparently evolved before speciation within the spruce genus. In the TPS-d3 group of conifer diterpene synthases, the basal function of a multi-product levopimaradiene/abietadiene synthase apparently evolved before the divergence of the conifer genera Abies, Pinus and Picea, as this function exists in a group of closely related genes from all three genera.
Overall, the large diversity of gene functions among the many closely related genes of the conifer TPS-d1 group illustrates the many events of gene duplication and sub- or neofunctionalization that have occurred in the evolution of this large family of conifer genes of specialized metabolism. The functionally identified spruce TPS genes account for many of the major and minor terpenoid compounds of the defensive oleoresin and volatile emissions. However, based upon the terpenoid components identified in oleoresin, several distinct types of spruce TPSs remain to be found. Based on the current phylogeny of functionally characterized spruce TPSs, we predict that most of the remaining TPSs will be highly similar in sequence to previously identified TPSs, but may nevertheless differ in function owing to relatively minor sequence divergence.
In contrast to the many duplicated TPS-d genes of terpenoid specialized metabolism, the related spruce TPS genes of general gibberellin phytohormone biosynthesis, ent-copalyl diphosphate synthase (TPS-c) and ent-kaurene synthase (TPS-e), appear to be expressed as single-copy genes [12]. These primary-metabolism TPS genes are basal to the specialized-metabolism genes and descend from an ancestral plant diterpene synthase similar to the one found in the non-vascular plant Physcomitrella patens [12,48]. The mechanisms that suppress the occurrence or retention of TPS gene duplicates in diterpenoid primary metabolism, and those that favour TPS gene duplication and functional diversification in specialized metabolism in a conifer genome, are not known but merit future investigation. The high functional plasticity of the TPS-d family and the great diversity of terpenoids produced may confer fitness advantages against a multitude of pests and pathogens. We speculate that duplicated TPS-d genes of specialized metabolism are less likely, or slower, to become inactive pseudogenes than duplicated genes of primary metabolism, which have less functional plasticity.

Conclusions
Based upon estimates from EST and FLcDNA sequencing in three species of spruce, the TPS gene family in conifers appears to be at least of comparable size to those found in angiosperms with sequenced genomes. This study highlights the great diversity of TPSs of specialized metabolism in conifers, which resulted from gene duplication and functional diversification. Functional differences can occur naturally due to small differences in amino acid sequence.

Methods
In silico identification of spruce terpene synthases in the EST and FLcDNA databases
Quality-trimmed and filtered nucleotide sequences were obtained from spruce genomic resources developed in the Genome Canada-funded Treenomix (http://www.treenomix.ca) and Arborea (http://www.arborea.ulaval.ca) projects as follows: white spruce (242,931 ESTs), Sitka spruce (174,384 ESTs), and hybrid white spruce (also referred to as interior spruce; 26,350 ESTs) [15,16]. Conifer TPS protein sequences available from NCBI were used to query the three species-specific databases using the tBLASTn module of WU-BLAST 2.0 with an E-value cutoff of 1 × 10⁻⁵. The resulting outputs were filtered to exclude duplicates and then assembled separately by species using CAP3 [49] with an overlap length cutoff of 40 bp and an overlap percent identity cutoff of 95%. The assembled TPS candidate sequences were then tentatively annotated by NCBI BLASTx searches against the nr database (downloaded Oct. 2008).
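The mining workflow above (tBLASTn search, E-value filtering, removal of duplicate hits, and per-species CAP3 assembly) can be sketched as follows. This is an illustrative reconstruction, not the authors' scripts: it assumes the tBLASTn report has already been parsed into (query, EST, E-value) tuples, and it assumes a `cap3` executable on the PATH. The `-o` (overlap length cutoff) and `-p` (overlap percent identity cutoff) options are CAP3's standard parameters; the helper names and file names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

E_VALUE_CUTOFF = 1e-5  # tBLASTn cutoff reported in the text

Hit = Tuple[str, str, float]  # (query protein ID, EST ID, E-value); assumed parsed form


def filter_hits(hits: Iterable[Hit], cutoff: float = E_VALUE_CUTOFF) -> List[str]:
    """Return unique EST IDs with at least one hit at or below the cutoff.

    Several TPS query proteins typically hit the same EST; keeping each EST
    only once corresponds to the duplicate-removal step.
    """
    seen = set()
    kept: List[str] = []
    for _query, est_id, evalue in hits:
        if evalue <= cutoff and est_id not in seen:
            seen.add(est_id)
            kept.append(est_id)
    return kept


def write_fasta(est_ids: List[str], seqs: Dict[str, str], path: Path) -> None:
    """Write the selected ESTs to a FASTA file for assembly."""
    with path.open("w") as handle:
        for est_id in est_ids:
            handle.write(f">{est_id}\n{seqs[est_id]}\n")


def cap3_command(fasta: Path, overlap: int = 40, identity: int = 95) -> List[str]:
    """CAP3 command line with the overlap length and identity cutoffs from the text."""
    return ["cap3", str(fasta), "-o", str(overlap), "-p", str(identity)]


def run_cap3(fasta: Path) -> None:
    """Assemble one species' candidate ESTs (requires CAP3 to be installed)."""
    subprocess.run(cap3_command(fasta), check=True)
```

One assembly would be run per species, for example on a hypothetical `sitka_tps_ests.fasta`; CAP3 writes its contigs and singlets to files named after the input (`.cap.contigs`, `.cap.singlets`).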

Selection of FLcDNA clones for functional characterization
Authentic cDNA clones corresponding to the above-identified TPS candidate sequences were examined further by restriction digest, colony PCR, and/or sequencing. Clones that potentially contained a full-length TPS cDNA (i.e., a complete ORF) were fully sequenced, and if a unique full-ORF TPS was found, the insert was subcloned for expression as described below. In one case, two full ORFs (WS00725_G07c1, WS00725_G07c2) were obtained by 5'-RACE, and the full-length genes were subsequently cloned into pCR Blunt II TOPO (Invitrogen).
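The full-ORF criterion used to select clones can be illustrated with a simple forward-strand scan for the longest ATG-initiated open reading frame that ends in an in-frame stop codon. This is a simplified sketch, assuming directionally cloned cDNAs read 5' to 3' and the standard genetic code; the clones themselves were screened by restriction digest, colony PCR and/or sequencing, and the example sequence is hypothetical. A read for which no complete ORF is found would be flagged as a likely incomplete transcript.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def longest_orf(seq: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest complete ORF on the forward strand.

    Coordinates are 0-based and end-exclusive, and the span includes the stop
    codon. Returns None if no ATG is followed by an in-frame stop codon.
    """
    seq = seq.upper()
    best: Optional[Tuple[int, int]] = None
    for frame in range(3):
        start: Optional[int] = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop opens an ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None  # look for the next ORF in this frame
    return best


if __name__ == "__main__":
    cdna = "CCATGAAATTTTAGCC"  # hypothetical insert: 5'-UTR, short ORF, 3'-UTR
    print(longest_orf(cdna))  # (2, 14)
```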

Cloning of PsTPS-Iso
Because of our particular interest in conifer diTPSs [12,24] and the low abundance of putative diTPSs in the ESTs, we chose to isolate an isopimaradiene synthase (PsTPS-Iso) cDNA from Sitka spruce using homology-based cloning to allow functional comparison with its putative levopimaradiene/abietadiene synthase paralog (PsTPS-LAS, WS0299_C21, described here). Examination of the spruce EST resources [15,16] identified a 3'-read for clone WS00752_D05 from white spruce with high similarity to the isopimaradiene synthase from Norway spruce (PaTPS-Iso; [14]). Full sequencing of this cDNA clone indicated that it was an incomplete transcript. Using PCR with primers designed for the 3'-UTR of this sequence and the 5'-UTR of WS0299_C21, we amplified a 2,700 bp cDNA from the bark of methyl jasmonate-treated Sitka spruce (genotype Haney 898). The amplicon was cloned into pCR Blunt II TOPO and fully sequenced (PsTPS-Iso, pSW06061903).
Plasmids were transformed into chemically competent C41 E. coli cells (http://www.overexpress.com) containing the pRARE 2 plasmid (coding for rare tRNAs) prepared from Novagen Rosetta 2 cells (EMD Biosciences, Inc., Madison, WI, USA). Luria-Bertani medium (5 mL) containing appropriate antibiotics was inoculated with three individual colonies and cultured overnight at 37°C, 220 rpm. Terrific Broth medium (50 mL) containing appropriate antibiotics was then inoculated with 0.5 mL of the overnight culture and grown in a 250 mL baffle flask at 37°C and 300 rpm until an optical density at 600 nm of at least 0.8 was reached. Cultures were then cooled to 16°C, induced with 0.2 mM IPTG, and then cultured for approximately 16-20 h at 16°C and 220 rpm before pelleting and freezing.
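The culture volumes in this protocol follow from simple dilution arithmetic (C1V1 = C2V2). As a worked example, assuming a hypothetical 1 M IPTG stock (the stock concentration is not stated in the text), reaching 0.2 mM in 50 mL of culture requires 10 µL of stock; likewise, 0.5 mL of overnight culture in 50 mL of Terrific Broth corresponds to roughly a 1:100 inoculum. The helper names below are illustrative, and the sketch ignores the small volume change caused by each addition.

```python
def stock_volume_ul(final_mM: float, culture_mL: float, stock_mM: float) -> float:
    """Stock volume (µL) needed so that culture_mL reaches final_mM.

    Uses C1*V1 = C2*V2 and ignores the volume added by the stock itself.
    """
    if stock_mM <= 0:
        raise ValueError("stock concentration must be positive")
    volume_mL = final_mM * culture_mL / stock_mM
    return volume_mL * 1000.0  # mL -> µL


def dilution_factor(inoculum_mL: float, medium_mL: float) -> float:
    """Approximate fold dilution of an inoculum into medium (medium volume / inoculum)."""
    return medium_mL / inoculum_mL


if __name__ == "__main__":
    # 0.2 mM IPTG in 50 mL, from a hypothetical 1 M (1000 mM) stock.
    print(f"{stock_volume_ul(0.2, 50.0, 1000.0):.1f} uL IPTG stock")  # 10.0 uL IPTG stock
    # 0.5 mL overnight culture into 50 mL Terrific Broth.
    print(f"1:{dilution_factor(0.5, 50.0):.0f} inoculum")  # 1:100 inoculum
```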
Assay products were analyzed on an Agilent HP-5ms column (5% phenyl methyl siloxane, 30 m × 250 μm ID, 0.25 μm film) at 1 mL min⁻¹ He on an Agilent 6890N gas chromatograph with a 7683B series autosampler (vertical syringe position of 8 to sample the pentane layer) and a 5975 Inert XL MS detector. The GC temperature program was: 40°C, hold 1 min; 7.5°C min⁻¹ to 250°C, hold 2 min; pulsed splitless injector held at 250°C. Samples were also analyzed on an Agilent DB-WAX column (polyethylene glycol, 30 m × 250 μm ID, 0.25 μm film) with the following temperature program: 40°C, hold 3 min; 10°C min⁻¹ to 240°C, hold 15 min; pulsed splitless injector held at 240°C. Compounds were identified by comparison of mass spectra and retention indices with authentic standards where available; otherwise, identification relied on retention indices and/or mass spectra from Adams [52] and NIST, and on combined mass spectral and retention index library searches in MassFinder [53].
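Two quantities implied by this paragraph can be computed directly: the total oven run time of each temperature program, and the retention indices used for compound identification. The sketch below computes run time from the stated holds and ramps (31 min for the HP-5ms program and 38 min for the DB-WAX program, excluding cool-down and equilibration). For retention indices it assumes the widely used linear (van den Dool and Kratz) index for temperature-programmed runs, calibrated against an n-alkane ladder; the text does not state which index formula was used, and the alkane retention times in the example are hypothetical.

```python
from typing import Dict, List, Tuple

Ramp = Tuple[float, float, float]  # (rate in °C/min, target in °C, hold at target in min)


def run_time_min(initial_c: float, initial_hold_min: float, ramps: List[Ramp]) -> float:
    """Total oven program time in minutes (holds plus ramp durations)."""
    total = initial_hold_min
    temp = initial_c
    for rate, target, hold in ramps:
        total += abs(target - temp) / rate + hold
        temp = target
    return total


def linear_retention_index(t_x: float, alkanes: Dict[int, float]) -> float:
    """Linear retention index for temperature-programmed GC.

    alkanes maps n-alkane carbon number -> retention time (same units as t_x).
    RI = 100 * (n + (N - n) * (t_x - t_n) / (t_N - t_n)) for the bracketing
    alkanes n < N; with consecutive alkanes (N = n + 1) this is the usual form.
    """
    ladder = sorted(alkanes.items())
    for (n_lo, t_lo), (n_hi, t_hi) in zip(ladder, ladder[1:]):
        if t_lo <= t_x <= t_hi:
            return 100.0 * (n_lo + (n_hi - n_lo) * (t_x - t_lo) / (t_hi - t_lo))
    raise ValueError("retention time lies outside the alkane ladder")


if __name__ == "__main__":
    print(run_time_min(40, 1, [(7.5, 250, 2)]))   # HP-5ms program: 31.0
    print(run_time_min(40, 3, [(10, 240, 15)]))   # DB-WAX program: 38.0
    # Hypothetical C10/C11 alkane times and analyte time, in minutes.
    print(linear_retention_index(11.0, {10: 10.0, 11: 12.0}))  # 1050.0
```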

Phylogenetic analysis
Protein alignments were prepared using MUSCLE [54] and phylogenetic trees were constructed using the neighbour-joining method with 100 bootstrap repetitions, both within CLC Main Workbench 5.6.1 (CLC bio, Århus, Denmark).
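For readers unfamiliar with the method, the neighbour-joining algorithm can be sketched in a few dozen lines. The following is a textbook implementation (Saitou and Nei), not CLC Main Workbench's code: it assumes uncorrected p-distances computed from an existing alignment (the distance measure used in CLC is not stated in the text), returns an unrooted tree in Newick format, and includes a column-resampling helper of the kind used to generate bootstrap replicates. Rooting on the outgroup and counting how often each clade recurs across the 100 replicate trees, which yields the support values, are not shown.

```python
import random
from typing import List


def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Proportion of differing sites over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    return sum(x != y for x, y in pairs) / len(pairs) if pairs else 0.0


def neighbor_joining(names: List[str], dist: List[List[float]]) -> str:
    """Build an unrooted neighbour-joining tree and return it as a Newick string."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    d = [list(map(float, row)) for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # net divergence of each node
        # Pick the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the joined pair to their new parent node.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        parent = f"({nodes[i]}:{li:.4f},{nodes[j]}:{lj:.4f})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new parent to every remaining node.
        to_parent = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[x][y] for y in keep] + [to_parent[row]] for row, x in enumerate(keep)]
        d.append(to_parent + [0.0])
        nodes = [nodes[k] for k in keep] + [parent]
    # Join the final three nodes at a single central node.
    a, b, c = nodes
    la = 0.5 * (d[0][1] + d[0][2] - d[1][2])
    lb = 0.5 * (d[0][1] + d[1][2] - d[0][2])
    lc = 0.5 * (d[0][2] + d[1][2] - d[0][1])
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"


def bootstrap_alignment(seqs: List[str], rng: random.Random) -> List[str]:
    """Resample alignment columns with replacement to make one bootstrap replicate."""
    length = len(seqs[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(s[c] for c in cols) for s in seqs]
```

Under these assumptions, one would build a `p_distance` matrix for each of 100 `bootstrap_alignment` replicates, run `neighbor_joining` on each, and report, for every clade in the tree from the original alignment, the percentage of replicate trees that contain it.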