The P450 multigene family of Fontainea and insights into diterpenoid synthesis

Background Cytochrome P450s (P450s) are enzymes that play critical roles in the biosynthesis of physiologically important compounds across all organisms. Although they have been characterised in a large number of plant species, no information relating to these enzymes are available from the genus Fontainea (family Euphorbiaceae). Fontainea is significant as the genus includes species that produce medicinally significant epoxy-tigliane natural products, one of which has been approved as an anti-cancer therapeutic. Results A comparative species leaf metabolome analysis showed that Fontainea species possess a chemical profile different from various other plant species. The diversity and expression profiles of Fontainea P450s were investigated from leaf and root tissue. A total of 103 and 123 full-length P450 genes in Fontainea picrosperma and Fontainea venosa, respectively (and a further 127/125 partial-length) that were phylogenetically classified into clans, families and subfamilies. The majority of P450 identified are most active within root tissue (66.2% F. picrosperma, 65.0% F. venosa). Representatives within the CYP71D and CYP726A were identified in Fontainea that are excellent candidates for diterpenoid synthesis, of which CYP726A1, CYP726A2 and CYP71D1 appear to be exclusive to Fontainea species and were significantly more highly expressed in root tissue compared to leaf tissue. Conclusion This study presents a comprehensive overview of the P450 gene family in Fontainea that may provide important insights into the biosynthesis of the medicinally significant epoxy-tigliane diterpenes found within the genus. Supplementary Information The online version contains supplementary material available at 10.1186/s12870-021-02958-y.


Background
Diterpenes, also known as diterpenoids or isoprenoids, are a structurally diverse class of small molecules that are widespread throughout the plant kingdom. Diterpenes exhibit many and varied biological activities and consequently there is significant commercial interest in their potential applications as pharmaceuticals, food products, and industrial and agricultural chemicals [1][2][3][4][5]. Tigilanol tiglate (TT), a novel epoxy-tigliane diterpene ester extracted from the fruit of Fontainea picrosperma (family Euphorbiaceae) [3,6], is of particular current interest due to its effectiveness as a local treatment for a range of cancers in humans and companion animals [3,[7][8][9]. Recently, TT was approved by regulatory authorities in Europe and the USA as a veterinary pharmaceutical for the treatment of non-metastatic canine mast cell tumours.
TT cannot be synthesised on a commercial scale and instead is obtained by purification from the fruit of F. picrosperma [10]. Despite this, all tissues of plants may accumulate diterpenoids [11], so whilst the fruit of F. picrosperma is the raw material for the purification of TT, TT is likely present in all tissues of the plant. To this point, the natural biosynthetic pathway leading to the biosynthesis of tigliane esters such as TT is currently unknown, but as is the case for other macrocyclic diterpenes (e.g. jatrophane & ingenane), the roots may be the site of biosynthesis [12,13]. Given the important role cytochrome P450s (P450s) play in the biosynthetic pathways of both primary and secondary metabolite production [14], it is likely that they are critical to the biosynthesis of TT and epoxy-tiglianes more generally.
P450s are widely distributed in eukaryotes, where they form a large and diverse class of enzymes consisting of more than 35,000 members [15,16] and play vital roles in biosynthesis of natural products, degradation of xenobiotics, biosynthesis of steroid hormones, drug metabolism and synthesis of secondary metabolites [14,15]. They catalyse reactions in biosynthetic pathways of many compounds such as alkaloids, flavonoids, lignans, isoprenoids, phenolics, antioxidants and phenylpropanoid [17,18]. In plants, the P450 superfamily is one of the largest gene families of enzyme proteins, where for instance, it is the third largest gene family present in Arabidopsis [19]. At present, 5100 plant P450s have been annotated and clustered into two different categories (A type and non-A-Type) and 11 different clans [20]. The Atype P450 enzymes are grouped as the CYP71 clan, whereas the non-A type are subdivided into 10 clans -CYP51, CYP72, CYP74, CYP85, CYP86, CYP97, CYP710, CYP711, CYP727, and CYP746 [14,15,19] according to the standard nomenclature system [21]. The CYP71 clan includes more than 50% of all plant P450s [22,23].
Despite the abundance of diterpenoids in plants, there are still large gaps in our understanding of their biosynthesis pathways. To date, CYP71D and CYP726A members (within clan CYP71) have been functionally characterized as diterpene modifying in Euphorbiaceae [24,25]. Furthermore, these P450 genes are present in genome biosynthetic clusters, also with casbene synthase, and this organisation is conserved across the Euphorbiacae species investigated (i.e. for those species from which genome is available) [26][27][28][29]. This has led to the theory that P450s are the driving force for plant diterpene diversity [26]. Based on only a small pool of experimental gene expression data, these diterpenoid biosynthetic genes have broad tissue expression, from leaf to root, flower and stem [25,30].
To date, there are no reports of P450 genes in Fontainea, yet the next-generation sequencing (NGS) approach provides the ideal tool towards their elucidation in this genus. Fontainea picrosperma and Fontainea venosa are closely related species that presumably produce similar arrays of natural products, including epoxy-tigliane diterpenes. In this study, we report the general metabolomic profiles of these two Fontainea species, and compared to non-Fontainea species. Towards better understanding these similarities and differences, we used NGS trancriptomics to elucidate the Fontainea P450 family and their relative gene expression in leaf and root tissue, with particular focus on those predicted to be involved in diterpenoid synthesis.

Metabolomic analysis and P450 identification
Targeted analysis for TT in various tissues (root, leaf, bark and fruit) of F. picrosperma demonstrated its presence in all tissues (Additional Fig. 1). Based on this result, as well as prior evidence that diterpenoid biosynthesis enzymes are broadly expressed [31,32], the leaf was chosen for a multi-species metabolomics comparison. Metabolomics analysis of leaf tissue from F. picrosperma, F. venosa and 4 other non-Fontainea plant species provided a total of 49,098 mass spectral ions extracted from the LC-MS dataset. A partial least squarediscriminant analysis (PLS-DA) was performed to analyse the chemodiversity among samples (DOI: https:// doi.org/10.25907/00049). The PLS-DA model with three components accounting for 20.8, 25.0 and 15.8% of the total variance showed that F. picrosperma and F. venosa were notably separated from other species (Fig. 1a). This untargeted metabolomic analysis indicated that Fontainea species were considerably more closely related from a chemical perspective compared to two Euphorbiaceae (Manihot esculenta, Ricinus communis) and two non-Euphorbiaceae (Arabidopsis thaliana, Solanum lycopersicum) plants.
To investigate if chemodiversity is correlated with P450 diversity, the F. picrosperma and F. venosa P450 were initially identified through NGS, de novo reference assembly and gene ontology. Transcriptome libraries were constructed from combined root and leaf tissue for both F. picrosperma and F. venosa plants using Illumina HiSeq 2500 trimmed reads (150-200 bp). In total, 12 Gb and 30 Gb raw reads were generated for F. picrosperma and F. venosa libraries, respectively, which were assembled into 192,639 (N50 1450 bp) and 246,608 (N50 1248 bp) contigs. From these reference Fontainea transcriptomes, we identified 103 and 123 full-length P450 genes (and 127 and 125 partial-length) in F. picrosperma and F. venosa, respectively, all of which contained the conserved cytochrome P450 domain. The 4 different recognised P450 motif/regions (I-helix, K-helix, PERF and heme-binding [33]) are presented with noted conservation (Fig. 1b). A comparative P450 sequence identity analysis of the same 6 species used in metabolomic analysis, based on 90% sequence identity, demonstrated that Fontainea share considerably more P450 identity (48%) compared to other species of Euphorbiaceae and non-Euphorbiaceae (Fig. 1c). The non-Euphorbiaceae species of A. thaliana and S. lycopersicum shared no P450 (> 90% identity) with F. picrosperma or F. venosa.
F. picrosperma P450 genes were classified into 10 clans, within which there were 37 families and 45 subfamilies (Table 1). In F. venosa, P450 genes were classified into 9 clans, containing 37 families and 67 subfamilies ( Table 2). The general characteristics of Fontainea full-length P450 proteins were also investigated, including the amino acid (aa) length, molecular weight, isoelectric point (pI) and presence of secretory signal peptide. Length varied from 300 to 618 aa in F. picrosperma and 301-632 aa in F. venosa, with molecular weights ranging from~27-71 kDa. The pI values ranged from 5.22-9.8 in F. picrosperma and 5.29-9.91 in F. venosa. We also calculated their instability index (II) and found that 18 F. picrosperma and 52 F. venosa P450 were stable (stability factor < 40). The GRAVY values were negative for all P450 proteins, indicating them to be hydrophilic. In F. picrosperma, 68 P450 were predicted to have a secondary pathway signal peptide, whereas only 5 sequences (FpCYP74B2, FpCYP707A2, FpCYP707A3, FpCYP707A4 and FpCYP88A1) contained chloroplast-targeting peptides. In F. venosa, 78 P450 had pathway signal peptides and 3 (FvCYP707A1, FvCYP707A2 and FvCYP707A4) had chloroplasttargeting signal peptides. There were no P450 with mitochondrial-targeting signal peptides.

Phylogenetic and putative functional analysis of P450
A phylogenetic analysis containing 1042 P450s from 6 species (F. picrosperma, F. venosa, R. communis, M. esculenta, A. thaliana and S. lycopersicum) confirmed that the majority of F. picrosperma and F. venosa P450 do not show substantial relatedness to P450 from species outside of the Euphorbiaceae family (Additional Fig. 2). There were 72 P450 genes exclusive to Fontainea (identity > 92%). A Fontainea-specific P450 phylogeny showed that in F. picrosperma, 16 CYP85 were assigned into 7 families that form a single clade, while the CYP72 clan contained 17 genes assigned to 3 families (Fig. 2a). In F. venosa, 14 genes were assigned into 7 families that formed a single clade for CYP85. In the CYP72 clade, 17 P450 clustered into 4 families. A single P450 was represented in clan CYP710 (FpCYP710A1, FvCYP710A1) and CYP51 (FpCYP51G, FvCYP51G), which are phylogenetically most related to CYP85. The majority of F. picrosperma P450 belong to the CYP71 clan (57 genes; 57.68%), followed by the CYP72 and CYP85 clans, which is also known as A-type (Fig.  2b). The CYP71 clan is responsible for alkaloid, sesquiterpenoid, cyclic terpenoid and flavonoid biosynthesis. The majority of F. venosa P450 also belong to the CYP71 clan (68 genes; 55.28%), followed by the CYP72 and CYP85 clans (Fig. 2b). The non-A types encompass the remaining 46 P450, which belong to 9 P450 clans and 21 families in F. picrosperma. In F. venosa, there were 55 non-A type P450, which belong to 8 clans and 19 families. Of note, representative P450 from CYP711 were absent from F. venosa, while CYP97 were more well represented in F. venosa compared to F. picrosperma.

P450 gene candidates involved in diterpenoid metabolism
Phylogenetic analysis of the Fontainea P450 classified within clans CYP71D and CYP726A were found to be closely related to those found in other Euphorbiaceae species (Jatropha curcus, Euphorbia peplus, Euphorbia latex and R. communis), confirming their position within these putative diterpenoid P450 subfamilies (Fig. 4a). This was additionally supported by an expanded conserved motif analysis (in addition to I-helix, K-helix, PERF and heme-binding motifs). Of the 4 F. picrosperma CYP726A, FpCYP726A4 was most divergent. A single To investigate tissue expression of the common Fontainea diterpenoid P450 genes CYP726A1, CYP726A2 and CYP71D1, an additional 9 biological samples of F. picrosperma were quantitatively analysed (12 in total) by mapping RNA-seq to the reference transcriptome. These data showed statistically significantly higher levels of gene expression for all 3 genes in the root tissue compared to leaf tissue (Fig. 4b, Additional File 3). The housekeeping genes, glyceraldehyde-3-P dehydrogenase (GAPC) and elongation factor 1-alpha (EF1α) showed consistency in both leaf and root tissues and higher expression in root tissues compared to leaf tissue [12].

Discussion
Cytochrome P450s are evolutionarily conserved enzymes that are involved in the catalysis of numerous reactions, required for growth, development, defence [34] and secondary metabolism [14]. Prior to this study, no P450 genes had been identified, let alone characterized, in any species of the genus Fontainea. This is significant as P450 genes are likely to be important for future understanding of the biosynthetic pathways that produce medicinally significant diterpene esters, such as TT, which are unique to Fontainea. Towards that aim, we have identified and classified putative full-length P450 encoding genes in two species of Fontainea, F. picrosperma and F. venosa. Phylogenetic analysis allowed us to identify groups of genes for further evaluation. Moreover, their expression profiles in leaf and root tissues were investigated, with a particular focus on the P450 genes linked with diterpenoid biosynthesis, potentially involved in the production of TT.
We report 103 and 123 full-length P450 genes from F. picrosperma and F. venosa, respectively, that were classified into clans, which cumulatively consisted of 37 families and 67 subfamilies that fit into conformed plantderived functions, most prominently with diterpenoid, flavonoid and other functions. An ortholog comparison showed that P450 genes of Fontainea species are largely unique when compared to other plant species of both Euphorbiaceae and non-Euphorbiaceae. In support of our metabolomics analysis, the S. lycopersicum and A. thaliana P450 showed low overall similarity to Fontainea species. This may be attributed to the unique biosynthesis of diterpenoid derivatives that are phorbol The total number of Fontainea P450 identified in this study was consistent with that found in other plant transcriptomics P450 research, including the total number of full-length sequences, clans, families and subfamilies [30,35,36]. For example, transcriptomic studies allowed the elucidation of 151 full-length P450 genes in Lonicera japonica [35], 118 full-length in Taxus chinensis [36] and 116 full-length in Salvia miltiorrhiza [30]. However, in S. miltorrhiza, the tissues used for transcriptomics included leaves, roots and flowers, while in L. japonica, flower and buds were used. Our study identified 127 and 125 partial-length P450 gene from F. picrosperma and F. venosa, respectively. To obtain the full-length sequence, additional RNA-seq from the stem, flower and fruit would be helpful, as well as from different stages of development. In addition, this could be complemented by genome sequencing.
If a genome is available for a species, it does provide an alternate mechanism for P450 gene identification, through genome-wide interrogation. In A. thaliana, this approach identified 246 genes that clustered into 9 clans and 47 families [20,37], while a much larger number were identified from the soybean (G. max) and rice (Japonica) genomes, containing 332 and 355 P450 genes, respectively [38]. Far fewer were present in the legume (Medicago truncatula), where 151 putative P450 genes were identified, including 135 novel P450 [39]. We expect that once a genome is available for Fontainea, a more complete list of full-length P450 genes will be established.
Our results using these predictive protein characterisation analyses (i.e. molecular weight, cell localization, average z-score value expression profile in Fontainea species. Red circle, represents significantly different (p < 0.05) between root and leaf tissues in F. picrosperma. Green circle represents significantly different (p < 0.05) between root and leaf tissues in F. venosa. See Additional Fig. 4 for significance values. Venn diagrams shown below indicates those P450 exclusive and shared between leaf and root tissues function) were in line with prior studies of P450 proteins. P450s are typically anchored on the surface of the endoplasmic reticulum [40] and some may target to the plastids or mitochondria [41]. It is common that animal CYPs are anchored to mitochondria, but there is no report of any plant CYP with mitochondrial localization [39] except maize, where 3 CYPs have been reported (Zea mays) [42]. In our study of F. picrosperma or F. venosa, no deduced P450 proteins were predicted to have mitochondrial targeting peptides. CYP74A1, CYP74B1 and CYP74B2 in F. picrosperma and F. venosa and CYP726A4 and CYP71B12 in F. picrosperma were found in the chloroplast. In other plants, such as Triticum araraticum, Z. mays all members of CYP74 and CYP701 were targeted to chloroplast [42].
The diversity of different P450s between F. picrosperma and F. venosa, and other species, likely contributes to the observed differences in their chemical profiles, including diterpenes [25]. In all plant species that have been researched to date, the largest P450 clan is CYP71 [21]. The families and subfamilies within the clan have diverged remarkably during plant evolution, many of which are known to be involved in secondary metabolite biosynthesis of flavonoids and alkaloids [43].
Similarly, the CYP71 family is the largest P450 clan in Fontainea (see Fig. 2). On the contrary, two CYP711 representatives were identified from F. picrosperma, but were absent in F. venosa, although CYP711 have been described in other plant species [23]. In our phylogenetic tree, the CYP74 clan is adjacent to CYP711 family, suggesting that Fontainea CYP711 may also function within the metabolism of oxylipins and strigolactone signals [23], as strigolactones have been identified as branching inhibition hormones in plants, and several CYP711 have been experimentally confirmed as strigolactones biosynthetic enzymes [44,45]. We additionally found that F. venosa had more CYP97 genes compared to F. picrosperma; the CYP97 clan is involved in the hydroxylation of carotenoids [46]. Carotenoids are a group of widely distributed pigments derived from the ubiquitous isoprenoid biosynthetic pathway and play diverse roles in plant primary and secondary metabolism. Carotenoids contain two pigments, carotene and lutein, which absorb and transfer energy to protect chlorophyll [37]. We speculate that this may partially explain why F. venosa have darker leaves compared to F. picrosperma.
Based on other studies, P450 biosynthesis genes are relatively more highly expressed in root tissue compared to leaf tissue [25,[47][48][49]. In our study, we also found that Fontainea (F. picrosperma and F. venosa) P450 genes were more highly expressed in root tissue compared to leaf tissue (see Fig. 3). Among those significantly more highly expressed in the root (n = 3) have been associated with fertility reduction (CYP78A), UV stress tolerance (CYP84A), gibberellin metabolism (CYP714A) and jasmonic acid metabolism (CYP74A) [50][51][52][53]. Those significantly more highly expressed in the leaf include those previously associated with catalysing successive oxidation steps of the plant hormone jasmonoyl-isoleucine for catabolic turnover (CYP94), expression of ABA 8′-hydroxylase and affects ABA levels to control seed dormancy (CYP707A), hydroxylation of carotenoids (CYP97), biosynthesis of castasterone in the brassinosteroid biosynthetic pathway (CYP85A) and glucosinolate metabolism (CYP83) [40,46,[54][55][56]. Some species variation existed in P450 homolog tissue expression (see Fig. 3b). This may be explained by the different growth and developmental stage of plants from which the tissue was sampled, as P450s are involved in the regulation of plant hormone metabolism, growth and development and hormones are involved in formation and development of flowers, leaves, stems and fruits [57].
Diterpenoids are one of the most widespread classes of secondary metabolites in higher plants, which are synthesized from basic isoprene units (C 5 H 8 ) and further modified by various oxidoreductases, acyltransferases, dehydrogenases and glucosyltransferases [58]. P450dependent oxidative modification is essential for the biosynthesis of diterpenes [58]. There are countless products formed in plants, among them diterpenoids are one of the most diverse groups, consisting of more than 12,000 metabolites [27] that have proven to be valuable as therapeutic drugs [59]. Our phylogenetic analysis of diterpenoid biosynthesis CYP71 clan members revealed that Fontainea have representatives within the CYP71D and CYP726A.
All F. picrosperma diterpenoid biosynthesis P450 genes were significantly more highly expressed in root tissue compared to leaf tissue. The expression of genes can be affected by the developmental stage of plants, environmental conditions, seasonal and diurnal effects as well as biotic and abiotic stress [60]. Therefore, future research should explore the expression of the identified P450 genes under these different scenarios and in additional tissues. In other plants, diterpenoid biosynthesis P450 genes were highly expressed in root tissue compared to leaf and flower [30]. Nonetheless, Fontainea CYP71D and CYP726A genes are excellent candidates for involvement in diterpenoid biosynthesis pathways, in particular, the biosynthesis of epoxy-tigliane diterpene esters, which are only found in species of Fontainea, although further experiments are required to confirm this hypothesis.
Their identification allows for experimental analysis of their function, for example, in vitro expression of the proteins followed by gene expression detection, or by knock-in and knock-out can be completed. This may be followed by gene expression detection in vivo, depending on the availability of a robust experimental system. Also, the analysis of high TT producing F. picroserma, compared with low producers, will provide guidance about P450 (and other genes) that potentially regulate TT production.

Conclusions
This research represents an important first step to understanding the role of P450 genes associated with biosynthesis of diterpene esters in Fontainea species. A metabolome analysis showed that Fontainea species possess a chemical profile different from other plant species. This could at least partially be explained by the diversity of unique P450 found in Fontainea. Further intra-genus chemical variation could be due to variation in P450, specifically those that are predicted to be involved in diterpenoid metabolism (CYP71D1, CYP726A1 and CYP726A2), which are significantly more highly expressed in the root tissue compare to leaf tissue. These P450 are strong candidates as key enzymes required for the biosynthesis of medicinally significant diterpene esters of the epoxy-tigliane class.

Plant tissue collection for tigilanol tiglate analysis
Root, leaf, bark and fruit tissue samples were collected from F. picrosperma seedlings grown in the University of the Sunshine Coast (USC, Sippy Downs) greenhouse. Plants were grown in independent pots and kept in the greenhouse at ambient temperature and humidity according to Mitu et al. [12]. Dried (~150 mg) F. picrosperma root, bark, fruit and leaf tissues were extracted using methanol and the presence of TT confirmed using standard HPLC-UV techniques. Briefly, using an Agilent 1260 HPLC System, at 249 nm, with a Halo RP Amide 2.7 μm, 150 mm × 4.6 mm column and acetonitrile/water solvent scheme, UV profiles were compared to linear standard curves prepared using TT (supplied by EcoBiotics) between 0.0003 and 0.3 mg/mL.

Metabolome analysis
Fresh mature leaves of F. picrosperma, F. venosa, M. esculenta, R. communis, S. lycopersicum and A. thaliana were collected from at least 2 individual plants. The leaves were cleaned and placed over silica-gel in zip-lock bags for 2 weeks in dark at room temperature. From 0.1 g of dried ground leaves, 3 mL of methanol was added. Ultrasound extraction was carried out for 1 min at room temperature prior to filtration onto a pre-weighed 20 mL borosilicate glass scintillation vial. The residue was extracted again with 3 mL of methanol in the same conditions. The methanol in the filtrate was evaporated using a Genevac centrifugal EZ-2 evaporator. The dried residue was then reconstituted in methanol to prepare a solution at 1 mg/mL and stored at − 20°C in dark. Separation was performed on an ExionLC uHPLC system (Shimadzu) equipped with a Kinetex C 18 column (2.1 × 150 mm, 100 Å). The column compartment was maintained at 40°C and the auto-sampler was kept at 15°C. The mobile phase was composed of water with 0.1% formic acid (v/v) (solvent A) and acetonitrile with 0.1% formic acid (v/v) (solvent B), and was run at a flow rate of 0.5 mL/min. The linear gradient started at 2% solvent B for 0.1 min, increased to 100% solvent B for 14.5 min and then was kept at this level for 1.4 min. Starting conditions were achieved in 0.5 min and re-equilibrated for 3.5 min, resulting in a total uHPLC run time of 20 min. The injection volume was 5 μL. The X500R QTOF mass spectrometer (Sciex) equipped with an ESI source in a positive mode was controlled by the Sciex OS software. The curtain gas, gas 1, gas 2 were set at 25, 40 and 50 psi, respectively. The spray voltage, declustering potential and collision energy were applied at 5500, 100 and 35 V, respectively. Spectral data was recorded in the mass range of m/z 100-1500 Da. The MS ions were extracted from LC-MS data using MZmine 2 (version 2.53) with setting parameters including retention time 0.6-17.5 min, retention time tolerance of 0.1 min, mass range m/z 100-1500 Da, mass tolerance of 0.02 Da, a noise threshold of 100. The pre-processed peak table data matrix was submitted onto a web-based service MetaboAnalyst (https://www.metaboanalyst.ca/), for metabolomic data analysis.

Plant tissue collection for transcriptomics
F. picrosperma and F. venosa seedlings were provided by EcoBiotics Ltd. and plants were grown in the USC (Sippy Downs) greenhouse. Plants were grown in independent pots and kept in the greenhouse at ambient temperature and humidity according to Mitu et al. [12]. The plants were used from 2 to 4 years old and healthy, fully expanded leaves and actively growing root tips, including the apical meristem and root caps were dissected (single leaf from each plant) and (1 cm 2 root) from plants of each species and preserved following the procedure described by Mitu et al. [12].

RNA isolation
Total RNA was isolated from~100 mg of leaf and root tissue of 12 individual F. picrosperma and 3 individual F. venosa using the Qiagen mini plant kit (Hilden, Germany), according to the manufacturer's protocol. The initial yield and purity of RNA were measured using a Nanodrop spectrophotometer 2000c (Thermo Scientific, Waltham, MA, USA) at 260 and 280 nm and agarose gel electrophoresis. An Agilent Bioanalyzer 2100 (Agilent Technologies, USA) was used to analyse the RNA integrity number (RIN). High-quality total RNA (RIN > 7) was provided to Novogene (Beijing, China) for cDNA synthesis (cDNA Rapid Library Preparation Kit, Roche, Mannheim, Germany) and paired-end Illumina HiSeq 2500 sequencing (Illumina, San Diego, CA, USA).

De novo assembly and functional annotation
Two reference transcriptome libraries were prepared from two individual plant samples of F. picosperma and F. venosa leaf and root tissue. Quality of raw reads of each library were checked separately using FastQC [61] and Trimmomatic [62]. Trimmed reads of the different RNA-seq libraries for F. picrosperma and F. venosa were merged separately prior to assembly using Trinity [63], which applies a de novo reconstruction method. Quality of the assembly was assessed using the built-in Trinity Perl script to generate an N50 value. Alignment coverage rate was calculated using the program Bowtie [64] with a cut-off set at 70%. Following assembly, Transdecoder was used to predict open reading frames (ORFs) with default parameter of minimum 100 amino acids ORF length. Sequence datasets can be found in the NCBI (NCBI; www.ncbi.nlm.nih.gov), Sequence Read Archive (SRA) database under the accession number PRJNA687112. The assembled protein sequences were used as queries against NR protein database (NCBI nonredundant protein sequences), Nt (NCBI non-redundant nucleotide sequences), Swiss-Prot (a manually annotated and reviewed protein sequence database), KO (KEGG Ortholog database) and GO (Gene Ontology) to identify putative protein functions using the BLAST algorithm with an E-value cut-off of 10 − 5 [65].

Classification and characterization of Fontainea P450 genes
An integrated HMM-search and InterProScanverification approach was applied to identify the putative P450 gene families in Fontainea species. The P450 family HMM model was used through HMMER3 [66]. The filtered sequences were further blasted using NCBI (https://www.ncbi.nlm.nih.gov/) with the cut-off E-value of 10 − 5 . Sequences annotated as P450 members were collected. The full-length P450 proteins were identified manually using nucleotide sequences in ExPASy [67] based on Chen et al. [30] and after annotation those sequences that did not match with any plant species were discarded. After filtering, the coding sequences of the resultant subjects were retrieved. Finally, results from the two methods were integrated and corrected manually. A BLAST search of F. picrosperma and F. venosa P450 genes against 814 previously identified P450 genes from 4 different plant species (A. thaliana, R. communis, M. esculenta, S. lycopersicum) was performed for P450 gene classification.
All full-length P450 genes were named according to the standard P450 nomenclature [21]. Briefly, 40, 55 and 97% sequence identities were used as cut-offs for family, subfamily and allelic variants, respectively. According to a previous study [57], functions of P450 clans were identified. We also calculated their instability index (II) using ExPASy tool (https://web.expasy.org/protparam/). Theoretical iso-electric points (PI) and molecular weight (kDa) were used to assess the physicochemical properties of putative P450s for each full-length P450 protein, as predicted by the ExPASy tool (http://www.expasy.org/ tools/). The GRAVY values were calculated using GRAV Y calculator (http://www.gravy-calculator.de/). The cellular locations were predicted using the TargetP1.1 server with specificity > 0.95 (http://www.cbs.dtu.dk/ services/TargetP/). Furthermore, P450 motifs were confirmed by Multiple Expectation Maximization for Motif Elicitation (MEME) program with the motif number set to 10 and all other parameters were default (http://alternate.memesuite. org/) [68]. P450 protein sequences were collected from F. picrosperma, F. venosa and other species that have appropriate genomic data available: A. thaliana, R. communis, M. esculenta, S. lycopersicum. F. picrosperma. A pairwise BLASTp comparison was performed using Geneious R11 11.0.2 with default parameters.

Phylogenetic analysis of P450
One hundred and three (103) full-length genes in F. picrosperma and 123 genes in F. venosa were used for phylogenetic representation in Fontainea species. Sequence alignment was performed using Geneious 11.02 software performing ClustalW alignment. The phylogenetic tree was constructed using FastTree and maximum likelihood (ML) algorithm. The statistical bootstrap support of each branch was assessed by re-sampling the amino acid positions 1000 times. The maximum likelihood phylogenetic tree and evolutionary analyses were carried out using iTOL web server (https://itol.embl.de/) [69]. For conserved domain identification, multiple sequence alignment of full-length Fontainea species protein sequences were carried out using ClustalX program using default parameters [70]. The alignment file was submitted to Web Logo generator software for generating the logo of conserved domains available at (http:// weblogo.berkeley.edu/) [71].

Relative P450 gene expression
Total RNA was extracted from leaf and root tissue of 3 different F. picrosperma and F. venosa plants using an RNAeasy plant extraction kit from QIAGEN® (Hilden, Germany) according to Mitu et al. [12]. The expression levels of P450 genes were calculated using the CLC genomic 11.01 software package following default parameters. Raw counts for RNA-sequencing data of F. picrosperma and F. venosa genes were normalized to Transcripts Per Million (TPM). Levels of expression were represented as the log2 ratio of transcript abundance between leaf and root tissues. Next, we generated a z-score for sequencing depth normalized reads counts. Expression of each enzyme in leaf and root tissue was analysed by normal clustering. Relative expression profiles of P450 genes were presented in the form of a heatmap, which was constructed using z-score with Clustvis (https://biit.cs.ut.ee/clustvis/) [72], using default parameters and a hierarchical clustering analysis to assess biological sample relatedness. We also identified homolog sequences in both Fontainea species (percentage of identity 92%). Another heatmap was constructed with zscore of average TPM value of 3 different plant of each species. To determine the statistically significant differences between leaf and root tissue of homolog sequences, we used Microsoft Excel software 2013 to conduct Student's t-test. Values are reported as average z-score from three different plant of Fontainea species. Significant differences: p < 0.05.

Phylogenetic and quantitative analysis of diterpenoid P450
Previously identified diterpenoid P450s genes from members of the Euphorbiaceae family were acquired from NCBI (NCBI; www.ncbi.nlm.nih.gov), then used to identify Fontainea diterpenoid P450s genes, by homology. Identified genes were used to construct a phylogeny tree using Geneious 11.02 software following ClustalW alignment. The phylogenetic tree was constructed using the maximum likelihood method, as described above. For quantitative analysis, RNA-seq from leaf and root tissue from 12 F. picrosperma individuals were used. The high-quality cleaned reads were aligned to the F. picrosperma reference transcriptomes using CLC genomic workbench 11.01 following default settings. To identify their gene expression values, TPMs value was calculated, as described above and a bar graph was created with their average TPM value. Diterpenoid genes showing statistically significant differential expression between root and leaf tissue of F. picrosperma were determined by Student's t-test using Microsoft Excel software 2013. Values were reported as average TPM value from 12 different plant of F. picrosperma. Significant differences: p < 0.01.