Sequencing and assembly
A total of 16,842,368 paired-end reads (2x150) were generated (5,276,841 for leaves, 5,000,558 for stem and 6,240,602 for roots). Prior to the assembly process, the paired reads were trimmed and/or merged together using the SeqPrep pipeline (see methods for more details). A de novo assembly was generated using Oases [19], a Bruijn graph-based assembler designed as an extension of Velvet [20] mainly used to assemble short-read sequences derived from transcriptomics data. Velvet/Oases produced a total of 61,620 contigs ranged from 0.1 to 10 kb, with an average length of 547.28 bp (Additional file 1). The GC contents of the contig set was approximately 44.7 %, which is similar to the GC content of the coding regions from other species within the Malpighiales order (reviewed in [21]). The N50 of these contigs was also estimated and resulted in a moderately high value of 867 bp. A fairly large number (40,727) of assembled contigs (40,727, which represents a 66.01 % of the total), were between 200 bp and 500 bp in length, indicating the presence of assembled fragments. For practical purposes, in the presented work, all contigs from the dataset are referred to as unigenes. The BLASTx algorithm [22] was used to annotate the unigenes based on the traditional top-BLAST-hit annotation method. As references, a collection of protein databases including the Arabidopsis thaliana (Arabidopsis) and plant RefSeq databases were used for this purpose. A significant value (e-value) of 10−5 was applied as threshold in the BLASTx similarity searches (Additional file 2: Table S1).
Functional annotation
Based on the Arabidopsis top hits, Gene Ontology (GO) annotations for the C. brasiliense unigenes were obtained. WEGO software [23] was used to perform GO functional classification into the three major categories (biological process, molecular function, and cellular components). Among the unigenes with Arabidopsis hits, 42,090 (68.30 %) were assigned to different gene ontology categories with a total of 367,994 functional terms, of which 103,865 are unique. Biological processes comprised the majority of the functional terms (178,629; 48.54 %), followed by cellular components (95,428; 25.93 %) and molecular functions (93,937; 25.52 %) (Additional file 3: Figure S1; see also Additional file 4: Table S2). Top-ranked categories of GO biological processes were the sub-categories corresponding to cellular (27,090 unigenes) and metabolic (24,653 unigenes) processes. Interestingly, response to stimulus (14,101 unigenes) and biological regulation (12,646 unigenes) were also prominently represented among GO biological processes categories. In addition to functional annotation based on GOs, C. brasiliense unigenes were classified based on metabolic pathways available and described in Kyoto Encyclopedia of Genes and Genomes (KEGG). KEGG Automatic Annotation Server (KAAS; [24]) was used to assign to C.brasiliense unigenes the KEGG Orthology (KO) codes and enzyme commission (EC) numbers. KO codes were assigned to 4,881 unigenes, of which 1,733 could be associated to specific EC numbers related to 226 different metabolic pathways (Additional file 2: Table S1).
Gene expression profiles of C. brasiliense organs
The gene expression profiles of roots, stems, and leaves were analyzed by mapping paired reads to the transcriptome assembly. An expression profile matrix containing the contigs (rows) and the number of mapped reads in each transcriptome sample (columns) was created (Additional file 4: Table S3). To make read counts comparable among samples, a normalization of reads per kilobase per million (RPKM) was performed. In order to distinguish expressed genes in at least one of the organs sampled from background, a threshold of RPKM ≥ 3 was used [25]. According to the stringency levels, 26,299 unigenes (59.26 % of the total) were expressed in all three tissues sampled (ubiquitous genes). The remaining unigenes were expressed in a single tissue (organ specifics) or they were distributed among different tissue combinations. The likelihood ratio statistic (R-statistic) [26] was calculated and used to identify those genes with significant expression level differences among sampled organs. Higher R-values indicate a greater probability of differential expression, whereas R-values near zero represent constitutive expression. In this context, it was considered as preferentially expressed genes the unigenes with R-values ≥ 8 (true positive rate of ~98 %) [26]. A total of 3,173 (7.15 %) C. brasiliense unigenes were identified as preferentially expressed genes, 1,274 in leaves, 965 in stems, and 934 in roots, some of which (247, 143 and 231, respectively) could be considered as organ-specific. (Fig. 1 and Additional file 4: Table S4).
In order to gain insight into the organ-function connection, the top 10 organ-specific unigenes were surveyed. In leaves, three C. brasiliense unigenes (UN36044, UN28345 and UN34582), which are homologous to Arabidopsis members of the glycine-aspartic acid-serine-leucine motif lipase/ hydrolase (GDSL lipase) family, were highly expressed as well as some homologs (UN09544 and UN13106) of 3-ketoacyl-CoA synthase proteins (KSC). Consistently, members of the GDSL lipase gene family, such as AT5G33370 and AT3G04290, are co-regulated with genes involved in cutin biosynthesis as CER6/KCS6/CUT1 (AT1G68530), which has a dominant role in the elongation of very-long-chain fatty acids for cuticular wax synthesis [27–29]. The lipoxygenase/peroxygenase pathway is also involved in biosynthesis of cutin monomers [30]; this could explain the expression profile of the UN00603, which was identified as a leaf-specific unigene and annotated as homologous to chloroplast lipoxygenase LOX2 (AT3G45140), an enzyme required for wound-induced jasmonic acid accumulation in Arabidopsis [31]. The presence of all of these genes in the leaf transcriptome was expected considering that epicuticular waxes are produced either exclusively during leaf development and expansion, or during the entire lifetime of the leaf [18].
Regarding stems, a homolog of fasciclin-like arabinogalactan protein FLA12 (AT5G60490; unigene UN35075) exhibited one of the highest expression levels as a stem-specific gene, which is consistent with previous reports showing that the expression of some members of the FLA gene family are correlated with the onset of secondary-wall cellulose synthesis in Arabidopsis stems, and with wood formation in the stems and branches of trees. This data suggests that unigene UN35075 may play a biological role in C. brasiliense stem development [32]. Additionally, genes encoding enzymes related to monolignol biosynthesis, such as phenylalanine ammonia-lyase (PAL1; UN01310), caffeoyl-CoA 3-O-methyltransferase (CCoAOMT; UN21637 and UN12250), and 4-coumarate: CoA ligase (4CL; UN01988), were identified as preferentially expressed in stems, although transcripts could be detected in all three organs sampled. Lignin, which plays a crucial role in conducting tissue in plant stems, is synthesized from the oxidative coupling of monolignols [33]. In addition, 16 unigenes homologous to Arabidopsis IRX proteins were also classified as specifically or preferentially expressed in stems; this was expected considering that the irregular xylem (irx) mutant is characterized by a reduction in cellulose in stem tissue [34]. Finally, two C. barsiliense unigenes (UN02173 and UN03226), which were homologous to members of the family of high affinity phosphorous transporters (PHT1), were identified as highly expressed only in roots, as well as the UN03833 unigene, a homolog of ARSK (AT2G26290), a poorly characterized gene encoding a root-specific kinase [35].
In order to validate the expression profiles obtained by normalized read counts, RT-qPCR was performed using nine chosen genes. All genes evaluated showed RT-qPCR expression profiles in complete agreement with the profiles derived from read counts analyses (Fig. 1).
Functional annotation of preferentially expressed genes
An enrichment analysis of GO terms was performed in order to identify, among the biological process categories, those subcategories that were over-represented in at least one of the organs analyzed (Additional file 4: Table S5). To compare the abundance of some preferentially expressed genes with specific biological processes, a modification of the same method applied to the preferentially expressed genes was used (the percentage of unigenes annotated with a given GO term in the organ-preferentially expressed genes was compared with the percentage of genes annotated with the same GO term in the complete set of the C. brasiliense transcriptome; Additional file 4: Table S6). Amoung the GO terms assigned to the sub-set of 3,173 of C. brasiliense preferentially expressed unigenes within biological process category, some over-represented biological processes were photosynthesis, alkaloid, and terpenoid metabolic-related (Additional file 3: Figure S2; see also Additional file 4: Table S7). This is consistent with the notion that the carbon skeletons of all secondary plant products are derived from carbohydrates synthesized by photosynthesis and that the synthesis of some of them depends on primary metabolites (Fig. 2). It should be noted that some biological processes showed no significant differences when the number of unigenes grouped for each process in each organ were compared. This was the case for the phenylpropanoid biosynthesis pathway (GO: 0009699), which showed an average of 39 genes that were grouped within this category in each organ sampled (Additional file 4: Table S6).
The biosynthesis of umbelliferone, a key precursor in the calanolides formation
Coumarins are synthesized in plants via the shikimate pathway, in which phenylalanine is an end product that also gives rise to the aromatic amino acids tyrosine and tryptophan and other small molecules such as flavonoids and hydroxycinnamic acid conjugates [36]. Successive para- and ortho- hydroxylation of trans-cinnamate (conjugate base of trans-cinnamic acid) leads to the formation of coumarin via 2-coumarate, or via 4-coumarate, to the formation of hydroxycoumarins such as umbelliferone (7-hydroxycoumarin). Other hydroxycoumarins lacking oxygenation at C-7 also share the trans-cinnamic acid as its precursor. According to EC numbers assigned to C. brasiliense unigenes, with only one exception (glutamate-prephenate aminotransferase; EC: 2.6.1.79), homologs from all enzymes required for the formation of L-phenylalanine via the shikimate pathway were identified (Additional file 3: Figure S3).
The key enzymes involved in the umbelliferone biosynthetic pathway (via 4-coumarate) are: a) phenylalanine ammonia lyase [PAL; EC:4.3.1.24], b) cinnamate 4 hydroxylase [C4H; EC:1.14.13.11], c) 4-coumarate: CoA ligase [4CL; EC:6.2.1.12], and d) 4-coumaroyl 2′-hydroxylase [EC:1.14.11.-] (Fig. 3). These enzymes involved in umbelliferone biosynthesis have been functionally characterized in Ruta graveolens species [39].
Proteins encoded by PAL gene family members can catalyze the conversion of L-phenylalanine to ammonia and trans-cinnamic acid, and are found in angiosperms and gymnosperms and also in some mosses, algae, fungi, and bacteria [37]. Based on the annotations extracted from KEGG, unigene UN01310 was identified as a homolog of PAL (K10775, EC:4.3.1.24). The WISE2 program [38] (http://www.ebi.ac.uk/Tools/psa/genewise/) and the Arabidopsis PAL1 protein (AT2G37040) were used to identify coding sequences in their correct open reading frames (this program uses a protein sequence as a template to predict a related protein sequence in the analyzed DNA sequence), then using the SeaView program [39], the protein-coding nucleotide sequences were aligned based on their corresponding amino acid translations to calculate the percent of identity at nucleotide and amino acid levels (Additional file 5). The identity between C. brasiliense/Arabidopsis PAL genes was estimated to be 69.05 % at the nucleotide level (CDS) and 79.9 % at the amino acid level. The distinctive aromatic amino acid lyase conserved domain (PF00221) was identified by motif/domain search against the Pfam database [40] (http://pfam.xfam.org/). Through a BLAST search based on the PF00221 domain, a total of eighteen C. brasiliense unigenes were identified as homologous to PAL genes, all of them annotated as homologs of some of the four existing Arabidopsis PAL genes ([41] PAL1; AT2G37040), PAL2; AT3G53260, PAL3; AT5G04230 and PAL4; AT3G10340) during the top-blast-hit annotation process (Additional file 2: Table S1). A phylogenetic tree from C. brasiliense PAL proteins was generated. Only the unigenes for which the proteins derived from their corresponding coding sequences represent at least 65 % of the length of their homologs were included (UN01310, UN03680 and UN02730). The PAL family members from Arabidopsis were also included in phylogenetic analysis. The multiple sequence alignments (Additional file 6) show that C. brasiliense proteins have high identity (ranged from 49.2 to 80.0 %) with PAL family proteins from Arabidopsis, and a three-dimensional structure analysis indicated that these proteins also contain a highly similar putative structure to other PAL plant proteins previously reported (Fig. 4). The high similarities of protein sequences among the C. brasiliense and Arabidopsis homologs and 3D models of the PAL family suggest that they may play similar functional roles.
Similar approaches to those described above were used to identify the remaining C. brasiliense genes involved in umbelliferone biosynthesis. A total of three unigenes were identified as homologous to the only trans-cinnamate 4-hydroxylase (AT2G30490; C4H) in the Arabidopsis genome (Additional file 3: Figure S4 and Additional file 7). C4H is a plant-specific cytochrome P450 (PF00067) that catalyzes the second step of the multibranched phenylpropanoid pathway [42]. Regarding 4-coumarate CoA ligase [4CL; EC:6.2.1.12], a total of six unigenes were detected as homologous to these proteins; however, only in three of them (UN01603, UN01988 and UN01725), were complete open reading frames identified (Additional file 8). The motif/domain searches revealed that both the AMP-binding C-terminal domain (PF13193) and common AMP-binding central domain (PF00501) are present in the translated sequences corresponding to these C. brasiliense unigenes (Additional file 3: Figure S5). Commonly, most angiosperms encode to a small family of 4CL (e.g., seven members in case of Arabidopsis). These enzymes are involved in the last step of the general phenylpropanoid pathway, and in addition to using 4-coumarate as substrate, they also convert p-coumaric acid, ferulic acid, caffeic acid and 5-OH-ferulic acid with different catalytic efficiency [43, 44].
Finally, the bi-directional best hit (BBH) method was used to identify a homolog of 4-coumaroyl 2′-hydroxylase [EC:1.14.11.-] from Ruta graveolens. The 4-coumaroyl 2′-hydroxylase isolated/characterized from R. graveolens (Accession JF799117.1) is the only enzyme that has been specifically assigned to coumarin synthesis, and to a lesser extent this enzyme also accepts 4-coumaroyl-CoA to produce umbelliferone [45]. The C. brasiliense unigene UN02124, in which the cytochrome P450 conserved domain PF00067 is present, showed 87.2 % identity with the published protein from R. graveolens (Additional file 3: Figure S6 and Additional file 9). The final step of the in vivo pathway for the synthesis of umbelliferone involves a trans-cis isomerization followed by a subsequent lactonization of the 2, 4-dihydroxy-cinnamoyl-CoA that closes the side chain, and this reaction occurs spontaneously (Fig. 3).
Considering that we were able to identify homologs to all genes involved in the umbelliferone biosynthetic pathway via 4-coummarate, and due the absence of several transcripts that potentially encode for enzymes such as cinnamate 2-hydroxylase (EC: 1.14.13.14), 2-coumarate O-β-glucosyltransferase (EC: 2.4.1.114), 2-coumarate β-D-glucoside isomerase (EC: 5.2.1.-) and coumarinic acid glucoside β-glucosidase (EC:3.2.1.21), which were first characterized in Melilotus alba [46, 47] were they are all involved in coumarin biosynthetic pathway via 2-coumarate, we suggest that in C. brasiliense, umbelliferone is synthesized via the 4-coumarate pathway. This was expected, because in contrast with mammals, only in a few plant species (e.g. Catharanthus roseus and Conium maculatum) has it been suggested that enzymes capable of carrying out the hydroxylation of coumarin in C-7 to produce umbelliferone exist (reviewed in [48]).
Analysis of putative candidate genes involved in the calanolide (angular pyranocoumarins) biosynthetic pathway
The calanolides are pyranocoumarins, one class of coumarin derivates. The biosynthesis of pyranocoumarins is still poorly understood, but it has been reported that remarkable structural similarities between furanocoumarins and pyranocoumarins exist. Both classes of coumarins (furano- and pyrano- coumarins) can be grouped into linear and angular types. The determination of the type, either linear or angular, is based on the prenylation position of the common precursor (umbelliferone). In the case of the linear type, the pyran (or furan) ring is attached at C-6 and C-7, while in the angular type, the substitution is carried out at C-7 and C-8 [49]. Both types of pyranocoumarins are derived from the same precursors that give rise to linear and angular furanocoumarins (demethylsuberonosin and osthenol, respectively) (Fig. 5).
Previous studies conducted in Pimpinella magna and Pastinaca sativa plants could be considered the first experimental evidence to prove that linear and angular furanocuoumarins are derived from umbelliferone, prenylated at either the C-6 (leading to the formation of demethylsuberonosin; DMS) or C-8 position (osthenol), respectively [50]. In later years, additional investigations revealed that the cyclization of demethylsuberosin leads to (+)-marmesin formation, taking place through an enzymatic reaction that occurs in the presence of NADPH and molecular oxygen [51]. These ‘mamersin synthases’ have been identified as cytochrome P450 monooxygenases in Petroselinum crispum and Ammi maju plant species (reviewed in [52]). The range of reactions catalyzed by P450s include the epoxidation of olefins by insertion of an ‘oxen’ [53], and the reactive product of this reaction often inactivates the enzyme by alkylation of the prosthetic heme group [54]. However, no intermediate was released from mamersin synthase reaction, and it is likely that the 7-hydroxyl group of demethylsuberosin delocalizes the double bond electrons and favors the instantaneous cyclization to the dihydrofuranocoumarin (Fig. 5a). Model mechanisms have been proposed for the reactions mediated by catalytic action of the P450 enzymes, and one of these mechanisms consists of primary interaction of the catalytic P450 oxo-derivative, formed by heterolytic cleavage of the oxygen-oxygen bond in the ferric-hydroperoxy species, with aliphatic double bonds [52, 54]. This mechanism of reaction is compatible with such a cyclization of demethylsuberosin to (+)-marmesin, avoiding the formation of an intermediate epoxide (Fig. 5a). The formation of (−)-columbianetin from osthenol is catalyzed in an analogous fashion and the subsequent activities of both enzymes, angelicin and psoralen synthases, have been supported experimentally [55, 56]. It appears feasible that the synthesis of linear and angular pyranocoumarins such xanthyletin [57], graveolone [58], or seselin [59] may be produced concomitantly with furanocoumarins in a very similar way as from demethylsuberosin or osthenol, respectively (Fig. 5b).
Here, we propose two closely related biosynthetic pathways for calanolide biosynthesis (Fig. 6). These metabolic pathways imply some steps in the biosynthesis upon the basis of the presence of some related-compounds previously identified in other plant species. Starting from umbelliferone (1), a highly abundant metabolite identified in several plant species belonging to the Rutaceae and Compositae families [60], it is likely that the C-8 prenylated derivative (osthenol), which has been previously considered as a precursor at angular pyranocoumarins biosynthesis, could be one of the first intermediates. However, considering that some umbelliferone-related metabolites such as esculetin (2), daphnetin (3), 6,7,8-trimethoxycoumarin (4) and 5,7,8-trimethoxycoumarin (5), have been previously identified in several different plant species (reviewed in [49, 61]), it may be suggested that the formation of 5,7-dihydroxycoumarin (6) could be an initial step before the c-prenylation of umbelliferone at C-6, which leads to osthenol. Additional steps in calanolide biosinthesis include the epoxidation of 5,7-dihydroxy-6-(3-methyl-2-butenyl) coumarin (7) to (8), followed by the cyclization of (8) to (9), which after a dehydration step leads to (10). (10) could be a precursor of alloxanthoxyletin (11) [61], or in a similar way, a second c-prenylation at C-8 (12) may lead to (13) after their corresponding epoxidation-cyclization. Successive steps from (13) may involve the C-3 alkylation and the mono-hydroxylation of the pyran ring to produce (14), and then by Wagner-Meerwein rearrangement (1,2-methyl shifts), followed by the subsequent hydroxylation, the calanolides A, B and C could be produced (Fig. 6). Details from Wagner-Meerwein rearrangements that might lead to a calanolide formation (perhaps favoring the biosynthesis of one over the others) are shown in Additional file 3, Figure S7. The second suggested pathway could occur in a very similar way, starting with the c-prenylation of umbelliferone at C-8, which would lead to seselin production [59] and followed by a second c-prenylation at position C-4 (Fig. 6). Apparently, unlike the biosynthesis of furanocoumarins, in which the formation of an intermediary epoxide is avoided, the pyran ring formation in the pyranocoumarin biosynthetic pathway depends precisely on that intermediate.
Considering the remarkable structural similarities between furanocoumarins and pyranocoumarins (linear and angular), and the fact that both classes of compounds are derived from the same precursors (DMS and osthenol), it is possible to hypothesize that the enzymes involved in these biosynthetic pathways might share common ancestry. This hypothesis is consistent with the observation that furanocoumarins and pyranocuoumarins do not usually coexist in the same species. Even when many unknowns about biosynthesis and the function of furanocoumarins remain unresolved, it is clear that our knowledge about closely related compounds, such as pyranocoumarins, appears to be even less.
Concerning furanocoumarins, it has been reported that they are produced by a wide variety of plants in response to pathogen or herbivore attack. They are activated by ultraviolet light and can be highly toxic to certain vertebrate and invertebrate herbivores due to their integration into DNA, which contributes to rapid cell death [49, 62]. It has been suggested that linear and angular furanocoumarins are the results of a co-evolutionary process between plant and insects. Plant-insect interaction studies reveal that linear furanocoumarins are more toxic than angular ones; however, angular structures apparently produce a synergistic effect when they are combined with linear ones [62]. When mixed, linear and angular furanocoumarins result in a combination that is more difficult for insects to detoxify [62]. Apparently, during evolution, angular furanocoumarins appeared later than linear ones, a hypothesis that finds support based on the observation that angular furanocoumarins are always found concomitantly with linear structures, while linear types can be found alone (reviewed in [49, 63]).
During recent years, many enzymes involved in furanocoumarin biosynthesis have been described at the molecular level, including three P450 monooxygenases (psoralen-, angelicin-, and (+)-marmesin- synthases [55, 56]), bergaptol O-methyltransferase [64], as well as the key enzyme, a prenyltransferase [63]) involved in the critical step leading to precursors synthesis of linear and angular furanocoumarins (DMS and osthenol respectively; see Figs. 5 and 6). Additional P450-dependent enzymatic steps have remained unresolved, but the participation of this class of enzymes has been suggested in many steps of furanocoumarin biosynthesis. (Figs. 5 and 6, see [49] for an extended revision). Furanocoumarins are produced by a wide variety of plants in response to pathogen or herbivore attack.
The biosynthesis of pyranocoumarins has not yet been investigated. However, considering that psoralen synthase shows 70 % identity with angelisin synthase, and that its participation in linear and angular furanocoumarins biosynthesis has been previously demonstrated [55, 56], and based on the assumption that enzymes involved in pyranocoumarin biosynthesis might share a common ancestor of unknown functionality (perhaps as a result of gene duplications and subsequent molecular evolution), we used these and other enzymes involved in furanocoumarin biosynthesis as sequence references to identify homologs in the C. brasiliense unigenes set.
First, the prenyltransferase (PT) identified in parsley (Petroselinum crispum) [63], the angelicin synthase (CYP71AJ4) from Pastinaca sativa, the psoralen synthase (CYP71AJ1) identified from Ammi majus and their orthologs (CYP71AJ2 and CYP71AJ3 from Apium graveolens and Pastinaca sativa, respectively) [55, 56], were used as references in tBLASTn similarity searches (e-value 10−6) against the C. brasiliense unigene dataset. Additionally, the poorly characterized CYP82H1 isolated from Ammi majus, which increases its expression levels in plant-fungi interactions and accompanying furanocoumarin biosynthesis, was also included [65]. Sequences of unigenes that upon translation show at least 20 % identity over a resides window that represents a complete coding sequence (CDS), or at least 70 % of the homologous protein, were retained for future analysis.
A total of six PT-like sequences were identified. The subsequent comparison against PFAM databases confirmed the presence of the UbiA prenyltransferase domain (PF01040). The unigene UN01964 (with 35 % shared identity) was resolved in the same clade as the parsley prenyltransferase (Additional file 3: Figure S8 and Additional file 10). Considering these results, we propose that UN01964 should be considered a leading candidate to encode a putative prenyl transferase involved in biosynthesis of the precursors that lead to synthesis of linear and angular pyranocoumarins.
Regarding the subsequent steps in pyranocoumarin biosynthesis, a total of 34 unigenes were identified as homologous to the previously characterized P450 monooxygeneases involved in the furanocoumarin biosynthetic pathway (stringency levels: coverage ≥ 70 %, identity ≥ 20 %). Three major clades were recognized on the phylogenetic tree, and one of these included only three C. brasiliense unigenes. The second clade brought together all CYP71AJ proteins that were included in phylogenetic analyses and a total of fourteen C. brasiliense unigenes. Finally, the third clade included the CYP82H1 P450 monooxygenase and the remaining C. brasiliense CYP-like sequences that were identified (Additional file 3: Figure S9, Additional file 11). The CYP71-related clade comprises two sister clades named as classes I and II, respectively. The translated Class I unigenes showed a percent identity that ranged from 31 to 46 % with respect to CYP71AJ proteins, while class II members were on average ~10 % less similar (Additional file 4: Table S8). Low percent similarities at the protein level were expected considering that the presence of furanocoumarins has not been reported in C. brasiliense, which is instead capable of producing pyranocoumarins (linear and angular, including the calonolide compounds) in young seedlings (mainly at leaves) and in callus cultures [13]. According to the expression profiles of these genes, 44 % were identified as preferentially expressed in some of the organs sampled. UN02363, UN03063, UN04124, and UN02841 were selected as preferentially expressed in leaves. With only one exception (UN02841), these genes were classified as CYP71-related; two of them (UN02363 and UN03063) grouped in class II and the other one into class I (UN04124). In addition, UN04124 possesses 35.6 and 35.8 % identity with the psoralen and angelicin synthases of Pastinaca sativa. Altogether, this data suggests that UN04124 can be considered a prime candidate for involvement in angular and/or linear pyranocoumarin biosynthesis.