Identification of QTLs affecting scopolin and scopoletin biosynthesis in Arabidopsis thaliana

Background Scopoletin and its glucoside scopolin are important secondary metabolites synthesized in plants as a defense mechanism against various environmental stresses. They belong to the coumarins, a class of phytochemicals with significant biological activities that are widely used in medical applications and the cosmetics industry. Although numerous studies have shown that a variety of coumarins occur naturally in several plant species, the details of coumarins biosynthesis and its regulation are not well understood. It was shown previously that coumarins (predominantly scopolin and scopoletin) occur in Arabidopsis thaliana (Arabidopsis) roots, but until now nothing was known about natural variation of their accumulation in this model plant, and the genetic architecture of coumarins biosynthesis in Arabidopsis had not been studied. Results Here, the variation in scopolin and scopoletin content was assessed by comparing seven Arabidopsis accessions. Subsequently, quantitative trait locus (QTL) mapping was performed with an Advanced Intercross Recombinant Inbred Lines (AI-RILs) mapping population EstC (Est-1 × Col). In order to reveal the genetic basis of both scopolin and scopoletin biosynthesis, two sets of methanol extracts were made from Arabidopsis roots and one set was additionally subjected to enzymatic hydrolysis prior to quantification by high-performance liquid chromatography (HPLC). We identified one QTL for scopolin and five QTLs for scopoletin accumulation. The identified QTLs explained 13.86% and 37.60% of the observed phenotypic variation in scopolin and scopoletin content, respectively. In silico analysis of genes located in the associated QTL intervals identified a number of possible candidate genes involved in coumarins biosynthesis. Conclusions Together, our results demonstrate for the first time that Arabidopsis is an excellent model for studying the genetic and molecular basis of natural variation in coumarins biosynthesis in plants.
They additionally provide a basis for fine mapping and cloning of the genes involved in scopolin and scopoletin biosynthesis. Importantly, we have identified new loci for this biosynthetic process. Electronic supplementary material The online version of this article (doi:10.1186/s12870-014-0280-9) contains supplementary material, which is available to authorized users.


Background
Plants produce a great variety of secondary metabolites. It is estimated that between 4000 and 20 000 metabolites per species can be expected [1]. This great biochemical diversity reflects the variety of environments in which plants live, and the way they have to deal with different environmental stimuli. The production of specialized secondary metabolites is assumed to protect plants against biotic and abiotic stresses [2]. Although Arabidopsis is a small plant with a short generation time and a highly reduced genome, it has a set of secondary metabolites that is as abundant and diverse as those of other plant taxa [3]. In recent years, this model plant has been used extensively to identify genes and enzymes working in the complex network involved in the biosynthesis and regulation of secondary metabolites [4].
Currently, genetic variation found between natural Arabidopsis accessions is an important basic resource for plant biology [5][6][7]. Arabidopsis, with its extensive natural genetic variation, provides an excellent model to study variation in the biosynthesis of secondary metabolites in natural populations. Recent genetic analyses of natural variation in untargeted metabolic composition uncovered many qualitative and quantitative differences in metabolite accumulation between Arabidopsis accessions [8][9][10]. Numerous studies [8][10][11][12] demonstrated the presence of abundant genetically controlled variation for various classes of secondary metabolites. Coumarins (scopoletin, scopolin, skimmin and esculetin) are one of the secondary metabolite classes found in Arabidopsis roots [13][14][15][16]. However, up to now, nothing is known about natural variation in coumarins content between Arabidopsis accessions.
Coumarins are a group of important natural compounds that provide for the plant antimicrobial and antioxidative activities, and are produced as a defence mechanism against pathogen attack and abiotic stresses [17]. Importantly, coumarins are widely recognized in the pharmaceutical industry for their wide range of therapeutic activities and are an active source for drug development. Numerous coumarins have medical application in the treatment of burns and rheumatoid diseases. Furanocoumarins, which are coumarin derivatives, are used in the treatment of leucoderma, vitiligo and psoriasis [18], due to their photoreactive properties. Moreover, they are used in symptomatic treatment of demyelinating diseases, particularly multiple sclerosis [19]. Furanocoumarin-producing plants that are currently studied are non-model organisms [20] and many approaches to identify the genes underlying genetic variation in coumarins accumulation are not yet available in those species. Scopoletin, which is a major coumarin compound of Arabidopsis, has been found in many plant species [21][22][23][24][25][26][27][28][29], and was clearly shown to have antifungal and antibacterial activities important for medical purposes [30]. All these properties make coumarins attractive from the commercial point of view.
Coumarins are derived from the phenylpropanoid pathway, which serves as a rich source of metabolites in plants [31,32]. It was suggested that in Arabidopsis several branch pathways leading from phenylpropanoid compounds to coumarins are probable [14]. Scopoletin and scopolin biosynthesis was shown to be strongly dependent on CYP98A3 [14], which is the cytochrome P450 catalyzing 3′-hydroxylation of p-coumarate units in the phenylpropanoid pathway [33]. Feruloyl-CoA was suggested to be a major precursor in scopoletin biosynthesis [15]. A key enzyme involved in the final step of scopoletin biosynthesis, which is the conversion of feruloyl-CoA into 2-hydroxyferuloyl-CoA, is encoded by a member of the iron (Fe) II- and 2-oxoglutarate-dependent dioxygenase (2OGD) family, designated as F6′H1 [15]. Despite the advances that have been made in previous years [15][34][35][36][37][38][39][40][41][42] (Figure 1), many questions with regard to coumarins biosynthesis are still open [43]. In particular, the regulation of the biosynthesis of coumarins is not well understood. Up to now, all studies investigating coumarins biosynthesis in the model plant Arabidopsis were done with one laboratory accession, Col-0, which was used as the genetic background of all mutant and transgenic plants.
To gain an understanding of the genetic architecture of coumarins biosynthesis, we screened a set of Arabidopsis accessions for variation in scopolin and scopoletin content, and subsequently conducted quantitative trait locus (QTL) mapping. Our study addressed the following questions. Is there natural variation in the accumulation of scopolin and scopoletin between Arabidopsis accessions, and which genetic regions are responsible for the observed differences? Which candidate genes possibly underlie the QTLs involved in scopolin and scopoletin biosynthesis?

Phenotypic variation between accessions
A set of seven natural Arabidopsis accessions, which are the parents of existing RIL populations and represent accessions from different locations, was used in the initial screening for variation in scopolin and scopoletin accumulation. Accessions were grown in vitro in liquid cultures in order to obtain optimal growth of plant roots. Under these conditions, most of the scopoletin is stored in the vacuoles of root cells as its glycoside form, scopolin. In order to reveal the content of both scopolin and scopoletin, a subset of the methanol extracts made from Arabidopsis roots was subjected to enzymatic hydrolysis to hydrolyze the glycoside forms of coumarins. Using high-performance liquid chromatography (HPLC), we detected scopoletin (sct in Figure 2) as well as scopolin (scl in Figure 2BC) in the roots. The identification of scopoletin in the HPLC fraction (Figure 3A) was further confirmed using gas chromatography/mass spectrometry (GC/MS) by comparison to a spectrum library (Figure 3B). The quantification of coumarins in methanol root extracts made from seven Arabidopsis accessions clearly showed the presence of natural variation in scopolin content before enzymatic hydrolysis (Figure 4A) and in scopoletin content after hydrolysis (Figure 4B). Because a scopolin standard was not available, and in order to unify further analysis, we measured the amounts of both scopolin and scopoletin as area% of total chromatogram signals. The statistically significant differences between group means for scopolin and scopoletin accumulation were determined by one-way ANOVA (p < 0.001 and p < 0.0001, respectively). Values that are not significantly different based on the post hoc test (least significant differences [LSD]) are indicated by the same letters (Figure 4).
Based on the obtained results we have selected an Advanced Intercross Recombinant Inbred Lines (AI-RILs) mapping population derived from the cross between Col-0 and Est-1, because these parents significantly differed in coumarins content. Further genetic analysis was performed using values for the accumulation of scopolin before enzymatic hydrolysis and the content of scopoletin after hydrolysis of methanol extracts.

Genetic analyses of scopolin and scopoletin accumulation
The scopoletin and scopolin content values were determined for three biological replicates of AI-RILs (n = 144 and n = 140, respectively) and parental lines, which were grown in independent flasks in liquid cultures. A set of lines (AI-RILs) showed a wider range of scopolin (Figure 5A) and scopoletin (Figure 5B) values than the ones observed for both parental lines (Col-0 and Est-1), which indicated the presence of transgressive segregation and suggested that multiple loci contribute to variation in the EstC population. The lowest scopolin content within AI-RILs was 1.90 (measured as area% of total chromatogram signals), which corresponds to 20% of the minimum Col-0 value. The maximal relative value of scopolin was 45.13, which corresponds to 159% of the maximal Est-1 value. For scopoletin content, these values were 7.82 (54% of the minimum Col-0 value) and 54.93 (159% of the maximal Est-1 value), respectively (Table 1). Having a commercially available scopoletin standard, we were able to quantify the scopoletin contents as μg/g fresh weight (μg/gFW) in both parental lines of the AI-RILs mapping population (Col-0 and Est-1) before and after enzymatic hydrolysis. The scopoletin levels in root samples not subjected to hydrolysis were ~3 μg/gFW and ~10 μg/gFW in Col-0 and Est-1, respectively, and ~16 μg/gFW and ~86 μg/gFW in samples after hydrolysis. These values correspond to ~18, 54, 82 and 449 nmol/gFW, respectively, which is within the range reported in the literature, where values vary from ~1 to 1200 nmol/gFW depending on the plant culture used [14]. The calculated quantities for the parental lines (Table 2) can be used as references for the overall quantity of the products in the whole mapping population.
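The conversion between μg/gFW and nmol/gFW quoted above can be illustrated with a minimal sketch. The molar mass of scopoletin (C10H8O4, about 192.17 g/mol) is an assumption introduced here; it is not stated in the text. The inputs are the rounded values quoted above, so the outputs differ slightly from the nmol/gFW figures in the text, which were presumably derived from the unrounded values in Table 2.

```python
# Assumed molar mass of scopoletin (C10H8O4) in g/mol; not given in the text.
SCOPOLETIN_G_PER_MOL = 192.17


def ug_per_g_to_nmol_per_g(ug_per_g, molar_mass=SCOPOLETIN_G_PER_MOL):
    """Convert a content in ug/gFW to nmol/gFW.

    ug divided by g/mol gives umol; multiplying by 1000 gives nmol.
    """
    return ug_per_g / molar_mass * 1000.0


# Rounded scopoletin contents quoted in the text (ug/gFW).
rounded_contents = {
    "Col-0, no hydrolysis": 3,
    "Est-1, no hydrolysis": 10,
    "Col-0, after hydrolysis": 16,
    "Est-1, after hydrolysis": 86,
}

for sample, content in rounded_contents.items():
    print(f"{sample}: ~{ug_per_g_to_nmol_per_g(content):.0f} nmol/gFW")
```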
In order to identify the fraction of variation that is genetically determined, the broad sense heritability ($H^2$) for scopolin and scopoletin content was estimated as described in the Methods section. In the AI-RIL population, the broad sense heritability ranged from 0.45 for scopoletin to 0.50 for scopolin content (Table 1). To explore the relationship between scopolin content in methanol root extracts before enzymatic hydrolysis and scopoletin levels in extracts subjected to hydrolysis, the mean values of coumarins for each AI-RIL were used as phenotype values in trait correlation analysis. A relatively strong genetic correlation ($R^2$ = 0.6634) was observed between the level of coumarins measured before and after hydrolysis in the AI-RILs population, indicating genetic co-regulation of scopolin and scopoletin biosynthesis (Figure 6).
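As an illustration of the trait correlation analysis, the following sketch fits a least-squares line to per-line mean values and reports the coefficient of determination. The numbers are hypothetical placeholders, not data from this study, and the function name is introduced here for illustration only.

```python
def linear_fit_r_squared(x, y):
    """Return (slope, intercept, R^2) of the least-squares line y = slope*x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical per-line means (area%): scopolin before hydrolysis,
# scopoletin after hydrolysis.
scopolin_means = [5.0, 12.0, 20.0, 27.0, 35.0]
scopoletin_means = [14.0, 22.0, 25.0, 38.0, 41.0]

slope, intercept, r2 = linear_fit_r_squared(scopolin_means, scopoletin_means)
print(f"y = {slope:.3f}x + {intercept:.3f}, R^2 = {r2:.4f}")
```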

Mapping QTLs for scopolin and scopoletin accumulation
Six QTLs were identified: one for scopolin and five for scopoletin accumulation (Table 3). The QTL effect sizes ranged from 7.0% to 16.7% of the phenotypic variance explained (PVE), with three of the six QTLs having effect sizes below 10% PVE. One QTL (SCL1) was detected for scopolin accumulation at the bottom of chromosome 5 (Figure 7), explaining 13.86% PVE (Table 3), and five QTLs (SCT1-SCT5) for scopoletin accumulation were identified on chromosomes 1, 3 and 5 (Figure 8, Table 3). No QTLs were detected on chromosomes 2 and 4. To improve the QTL model explaining variation in scopoletin content, the MQM approach was performed using two QTLs (SCT4 and SCT5) as cofactors. We included the QTL on chromosome 1 (SCT1) in the model, although its LOD score (3.327) was slightly below the threshold. The whole model explains 37.6% of the variance in scopoletin content. No epistasis between the main effect loci was detected.
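For readers relating LOD scores to PVE, a commonly used relation for a normal model is PVE = 100 × (1 − 10^(−2·LOD/n)), where n is the number of phenotyped lines. The sketch below applies it in both directions. Whether the PVE values in Table 3 were obtained exactly this way is not stated in the text, so this is an assumption; the LOD value and line count in the example are illustrative only.

```python
import math


def pve_from_lod(lod, n_lines):
    """Percent variance explained for a given LOD score and n lines."""
    return 100.0 * (1.0 - 10.0 ** (-2.0 * lod / n_lines))


def lod_from_pve(pve_percent, n_lines):
    """Inverse relation: LOD score implied by a given PVE and n lines."""
    return -(n_lines / 2.0) * math.log10(1.0 - pve_percent / 100.0)


# Illustrative values only: a LOD of 4.5 in a population of 140 lines.
example_pve = pve_from_lod(4.5, 140)
print(f"PVE = {example_pve:.2f}%")
print(f"LOD recovered = {lod_from_pve(example_pve, 140):.2f}")
```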

QTL mapping identifies known and new loci for coumarins biosynthesis
Some of the mapped QTLs underlying variation in scopolin (SCL1) and scopoletin (SCT1 and SCT2) accumulation in the AI-RILs population, co-localize with the genes annotated to be involved in coumarin biosynthetic process (Plant Metabolic Network, http://plantcyc.org/, Figure 1). We detected seven cloned and characterized genes encoding enzymes for scopoletin and scopolin biosynthesis that co-localize with detected QTLs (see Additional file 1). Within the SCL1 interval, which is characterized by one of the highest LOD score values, there are two very good candidates. One of them is At5g48930 encoding a shikimate O-hydroxycinnamoyltransferase (HCT), while the other one (At5g54160) encodes caffeic acid/5-hydroxyferulic acid O-methyltransferase (OMT1). Importantly, both genes are expressed in roots (SCL1 in Table 4). Within the SCT1 and SCT2 intervals underlying variation in scopoletin content more possible candidate genes were detected: At1g33030, At1g51990, At1g67980 and At1g67990 (TSM1) encoding proteins from O-methyltransferase family; At1g51680 and At1g65060 encoding isoforms of 4-coumarate:CoA ligase (4CL1 and 4CL3 respectively); At1g62940 encoding acyl-CoA synthetase (ACOS5); and At1g55290 encoding feruloyl CoA ortho-hydroxylase 2 (F6′H2).
In order to reveal other candidate genes possibly underlying detected QTLs, two QTLs for scopoletin content (SCT4 and SCT5) and one QTL associated with scopolin (SCL1) accumulation were chosen for further in silico analyses. The selected intervals are characterized by the highest percentage of phenotypic variance explained by each QTL and the highest LOD score values. The annotated functions for all genes located in the selected QTL intervals were checked. As a result, we selected genes encoding transcription factors that might be induced by environmental stresses and enzymes that, according to their annotated functions, could possibly be involved in scopolin and scopoletin biosynthesis. Subsequently, we performed in silico analysis of the tissue distribution and level of expression of selected genes. Only genes that were expressed in roots were selected as possible candidates for further studies. As a result, we selected a set of genes that deserve close attention as possible new loci underlying variation in scopolin and scopoletin accumulation (Table 4). Among candidates possibly involved in scopoletin accumulation, a particularly interesting one is the CYP81D11 gene (At3g28740) encoding a member of the cytochrome P450 family, which is located within the QTL on chromosome 3 (SCT4 in Table 4).
According to the 1001 Genomes Project database (www.1001genomes.org) and re-sequencing data of Est-1 from our laboratory (see Additional files 2 and 3, indicated as Est-1*), the CYP81D11 gene contains several SNPs and one indel in the coding sequences of the parental lines of EstC mapping population and in the other accessions tested in this study (see Additional file 2).
Other interesting candidates are three genes (At5g14340, At5g14750, At5g15130) located within the QTL interval on chromosome 5 (SCT5 in Table 4), which encode members of the MYB and WRKY transcription factor families. These genes are relatively highly expressed in roots and their expression is induced by various environmental stresses [44]. A particularly interesting candidate that could possibly be linked to scopolin accumulation was detected within the QTL on chromosome 5 (SCL1 in Table 4). It is At5g53990, encoding a UDP-glycosyltransferase, which is relatively highly expressed in Arabidopsis roots [44].
According to the 1001 Genomes Project and our re-sequencing data of Est-1, this gene contains several SNPs in the coding sequences of tested accessions including the parental lines (see Additional file 3). Interestingly, the CYP81D11 and UDP-glycosyltransferase sequences originating from Est, Est-1 (both taken from the 1001 Genomes Project database) and Est-1*, which was re-sequenced in our laboratory, are not identical (see Additional files 2 and 3). This needs to be further verified.

Discussion
Here, we report a QTL mapping study of variation in scopoletin and scopolin accumulation between two Arabidopsis accessions and thereby we demonstrate the usefulness of Arabidopsis natural variation in elucidating the genetic and molecular basis of coumarins biosynthesis. A large number of Arabidopsis recombinant inbred line (RIL) populations are available and extensively used for identification of numerous QTLs controlling various traits such as growth, development or resistance to different biotic and abiotic stresses as well as the content of chemical compounds [5,7,9,45,46]. In most studies, the average number of QTLs identified is between one and 10 and at least one major QTL is detected [47]. Here, one QTL for scopolin and five QTLs for scopoletin accumulation were detected, which is in agreement with the average result in the field. An AI-RILs mapping population has an advantage over RILs because the opportunity for recombination is increased before genotypes are fixed upon selfing [48]. As a result, the AI-RILs mapping population, which captures an increased number of recombination events [48], enabled us to detect QTLs with effect sizes as low as 7.0% PVE.
Once a QTL has been identified, the next challenge is to identify the gene(s) underlying it. In most cases, the large number of genes present in the QTL interval cannot be directly tested for candidacy. In order to reduce the mapped region, fine mapping is performed, in which many individuals are genotyped for markers around the QTL. More accurate QTL localization might lead to the selection of candidate genes. Nonetheless, fine mapping may be practically difficult if the QTL effect is relatively small [49]. When multiple data sets are available, which is the case for Arabidopsis, it is possible to improve accuracy and to test the candidacy of genes within mapped QTL intervals [49] based on the available information. Therefore, it seems realistic to identify candidate genes underlying a QTL by using high-throughput expression data and the complete genome sequences of the numerous Arabidopsis accessions that were used to construct mapping populations. There are successful examples of using expression arrays to identify genes causally associated with quantitative traits of interest, both in plants and animals [50,51]. In this study, possible candidate genes were found within mapped QTL intervals for scopolin and scopoletin content, including known and novel loci. Further functional analyses, including re-sequencing, characterization of loss-of-function alleles and gene complementation either by crossing or genetic transformation, are required to prove the role of the selected candidate genes in coumarins biosynthesis and its regulation. Expanding molecular understanding of coumarins biosynthesis at an ecological level will be beneficial for the future discovery of the physiological mechanisms of action of genes involved in coumarins biosynthesis.
It was suggested recently that some members of the 2′-OG dioxygenase family, including F6′H1, which is a key enzyme in scopoletin biosynthesis, may be involved in Fe deficiency responses and metabolic adjustments linked to Fe homeostasis in plant cells [52]. Other recent studies showed that Fe deficiency induces the secretion of scopoletin and its derivatives by Arabidopsis roots [53], and that F6′H1 is required for the biosynthesis of coumarins that are released into the rhizosphere as part of the strategy I-type Fe acquisition machinery [54]. Previously, the existence of natural variation in root exudation profiles was clearly detected among eight Arabidopsis accessions [55]. These findings make the study of coumarins biosynthesis in Arabidopsis using naturally occurring intraspecific variation even more promising and timely.

Conclusions
In summary, we have shown here for the first time the presence of naturally occurring intraspecific variation in the accumulation of scopoletin and its glucoside, scopolin, among seven Arabidopsis accessions. Even though these accessions do not completely represent the wide genetic variation existing in Arabidopsis, it is assumed that they reflect genetic adaptation to local environmental factors [6]. A QTL mapping study of scopoletin and scopolin variation within the EstC mapping population was conducted, leading to the identification of new loci. The results presented here suggest that natural variation in coumarins content in Arabidopsis has a complex molecular basis. Importantly, they also provide a basis for fine mapping and cloning of the genes involved in coumarins biosynthesis.

Growth conditions
The seeds were surface sterilized by soaking in 70% ethanol for two min and subsequently kept in 5% calcium hypochlorite solution for eight min. Afterwards, seeds were rinsed three times in autoclaved millipore water and planted on half-strength (0.5×) Murashige and Skoog (MS) medium containing 1% sucrose and 0.8% agar, supplemented with 100 mg/l myo-inositol, 1 mg/l thiamine hydrochloride, 0.5 mg/l pyridoxine hydrochloride and 0.5 mg/l nicotinic acid. For stratification, plates were kept in the dark at 4°C for 72 h and then placed under defined growth conditions. All plants were grown in vitro in plant growth chambers under a photoperiod of 16 h light (35 μmol m⁻² s⁻¹) at 20°C and 8 h dark at 18°C. After 10 days, seedlings were transferred from agar plates into 200 ml glass culture vessels (5.5 cm diameter × 10 cm high, glass jars with magenta B caps) containing 8 ml sterile liquid medium. Plants grown in liquid cultures were incubated on rotary platform shakers at 120 rpm. After 17 days, plants were harvested (28th day of culture), and leaves and roots were frozen separately in liquid nitrogen and stored at −80°C. All genotypes were grown in three biological replicates (in independent flasks). The growth conditions were monitored by a HOBO U12 data logger (Onset Computer Corporation, Bourne, MA) that recorded temperature, light intensity and relative humidity at five-minute intervals.

Preparation of methanol extracts from Arabidopsis roots
The root tissue was homogenized using steel beads and sonication. The coumarins were extracted at 4°C with 80% methanol. After 24 h, two sets of methanol extracts were centrifuged for 20 min at 13000 rpm; one set was additionally subjected to enzymatic hydrolysis using β-glucosidase from almonds (Sigma-Aldrich) dissolved in acetate buffer, according to a modified protocol of [56].

Scopoletin and scopolin quantification by High-Performance Liquid Chromatography (HPLC)
The methanol extracts of Arabidopsis roots with and without enzymatic treatment were analyzed (Figure 2) using a Perkin Elmer series 200 HPLC system comprising a quaternary LC pump, autosampler, column oven and a UV detector. All samples were filtered with 0.22 μm filters before loading. The volume injected was 10 μl. Gradient elution on a Perkin Elmer C18 column SC18 (250 × 4.6 mm) was performed at a flow rate of 0.7 ml/min with the following solvent system: (A) 50 mM ammonium acetate pH 4.5, (B) methanol, starting from 30% B for 2 min, then 30-80% B in 40 min, followed by isocratic elution and column regeneration. The fluorescence detector was set to an excitation wavelength of 340 nm and an emission wavelength of 460 nm. The data analysis consisted of relative quantification of scopoletin and scopolin (area percent of total chromatogram).
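The relative quantification used here (area percent of the total chromatogram) amounts to dividing each integrated peak area by the sum of all peak areas. A minimal sketch follows; the peak names and area values are hypothetical and serve only to show the arithmetic.

```python
def area_percent(peak_areas):
    """Express each peak area as a percentage of the total chromatogram area.

    peak_areas: mapping of peak name to integrated area (arbitrary units).
    """
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total chromatogram area must be positive")
    return {name: 100.0 * area / total for name, area in peak_areas.items()}


# Hypothetical integrated peak areas from one chromatogram.
chromatogram = {"scopolin": 1250.0, "scopoletin": 310.0, "other peaks": 3440.0}

for name, percent in area_percent(chromatogram).items():
    print(f"{name}: {percent:.2f} area%")
```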

Scopoletin identification by Gas Chromatography/Mass Spectrometry (GC/MS)
The HPLC fractions containing the scopoletin peak were collected and scopoletin identification was confirmed (Figure 3A) with Gas Chromatography/Mass Spectrometry (GC/MS) by comparison to a spectrum library (Figure 3B). GC/MS analysis was performed using a Perkin-Elmer GC XL Gas Chromatograph interfaced to a Mass Spectrometer equipped with an Elite-5MS (5% diphenyl/95% dimethyl polysiloxane) fused to a capillary column (30 × 0.25 μm ID × 0.25 μm df). For GC/MS detection, an electron ionization system was operated in electron impact mode with an ionization energy of 70 eV. Helium was used as the carrier gas at a constant flow rate of 1 ml/min, and an injection volume of 2 μl was employed (a split ratio of 10:1). The ion-source temperature was 250°C; the oven temperature was programmed from 100°C (isothermal for 5 min), with an increase of 10°C/min to 300°C. Mass spectra were taken at 70 eV with a scan interval of 0.5 s, recording fragments from 30 to 450 Da. The solvent delay was 1 to 2 min, and the total GC/MS running time was 38 min. The mass detector used in this analysis was a Turbo-Mass Gold (Perkin-Elmer), and the MS software was Turbo-Mass ver. 5.1.

Quantitative traits
Coumarins were quantified in the methanol root extracts of three biological replicates (cultivated in independent flasks) of all AI-RILs individuals. Methanol extracts subjected to enzymatic hydrolysis were used for scopoletin quantification, while scopolin contents were determined in methanol extracts without hydrolysis.

Quantitative genetic analyses
The scopolin and scopoletin mean values for each AI-RIL were used in QTL mapping and trait correlation analysis. The regression equation and $R^2$ were calculated by plotting scopolin and scopoletin mean values against one another in a scatterplot (Microsoft Excel). The broad sense heritability ($H^2$) was estimated according to the formula $H^2 = V_G / (V_G + V_E)$, where $V_G$ is the among-genotype variance component and $V_E$ is the residual (error) variance.
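One way to obtain the two variance components in this formula is from a balanced one-way ANOVA with genotype as the factor, where the residual variance is the within-genotype mean square and the among-genotype component is (MS_between − MS_within) / r for r replicates. The text does not state how the components were estimated, so this method-of-moments approach is an assumption, and the replicate values below are hypothetical.

```python
def broad_sense_heritability(replicates_by_line):
    """Estimate H^2 = V_G / (V_G + V_E) from a balanced one-way layout.

    replicates_by_line: mapping of line name to a list of replicate values;
    every line must have the same number of replicates (at least 2).
    """
    groups = list(replicates_by_line.values())
    n_lines = len(groups)
    n_reps = len(groups[0])
    if n_lines < 2 or n_reps < 2 or any(len(g) != n_reps for g in groups):
        raise ValueError("need >= 2 lines with equal replicate counts >= 2")

    line_means = [sum(g) / n_reps for g in groups]
    grand_mean = sum(line_means) / n_lines

    # Mean squares of the one-way ANOVA.
    ms_between = n_reps * sum((m - grand_mean) ** 2 for m in line_means) / (n_lines - 1)
    ms_within = sum(
        (y - m) ** 2 for g, m in zip(groups, line_means) for y in g
    ) / (n_lines * (n_reps - 1))

    v_e = ms_within
    # Negative estimates are truncated at zero.
    v_g = max((ms_between - ms_within) / n_reps, 0.0)
    if v_g + v_e == 0:
        raise ValueError("no variation in the data")
    return v_g / (v_g + v_e)


# Hypothetical area% values for three replicates of four lines.
example = {
    "line_1": [10.2, 12.1, 11.0],
    "line_2": [20.5, 18.9, 22.3],
    "line_3": [15.0, 17.2, 13.8],
    "line_4": [30.1, 27.4, 29.0],
}
print(f"H^2 = {broad_sense_heritability(example):.2f}")
```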

QTL analyses in the AI-RIL population
Normality of the phenotypic data was assessed with the Shapiro-Wilk test; the data were normally distributed at the significance level α = 0.05. QTL mapping was performed using R software (R Core Team, 2012, www.R-project.org) with the R/qtl package ([57,58]; http://www.rqtl.org/). QTL mapping was performed with Simple Interval Mapping (SIM) (data not shown) followed by the Multiple QTL mapping (MQM) procedure. The QTLs with the highest logarithm of odds (LOD) scores detected by SIM were subsequently used to build the QTL model by MQM. The final QTL model was obtained by backward elimination of cofactors with a window size of 10 cM and a maximum of 5 cofactors. The LOD significance threshold (P < 0.05) for QTL presence was estimated from 10 000 permutations and is 3.4. The "addint" function was used to add pairwise interactions, one at a time, to the multiple-QTL model. No interaction was detected.
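The idea behind a permutation-based LOD threshold can be sketched as follows. This is a simplified illustration using single-marker regression in plain Python, not the R/qtl interval mapping or MQM procedure used in this study; the genotype coding (0/1 for the two parental alleles), the toy data and all function names are assumptions introduced here. Phenotypes are shuffled relative to genotypes, the genome-wide maximum LOD is recorded for each shuffle, and the threshold is taken as the (1 − α) quantile of those maxima.

```python
import math
import random


def marker_lod(phenotypes, genotypes):
    """LOD score of a single marker: (n/2) * log10(RSS_null / RSS_marker)."""
    n = len(phenotypes)
    mean = sum(phenotypes) / n
    rss_null = sum((y - mean) ** 2 for y in phenotypes)
    if rss_null == 0:
        return 0.0
    rss_marker = 0.0
    for allele in set(genotypes):
        group = [y for y, g in zip(phenotypes, genotypes) if g == allele]
        group_mean = sum(group) / len(group)
        rss_marker += sum((y - group_mean) ** 2 for y in group)
    if rss_marker == 0:
        return math.inf
    return (n / 2.0) * math.log10(rss_null / rss_marker)


def permutation_threshold(phenotypes, markers, n_perm=1000, alpha=0.05, seed=0):
    """(1 - alpha) quantile of the maximum LOD over markers under permutation."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_lods = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        max_lods.append(max(marker_lod(shuffled, m) for m in markers))
    max_lods.sort()
    index = min(math.ceil((1.0 - alpha) * n_perm) - 1, n_perm - 1)
    return max_lods[index]


# Toy data: 40 lines, 20 random markers, phenotype influenced by marker 0.
data_rng = random.Random(42)
toy_markers = [[data_rng.randint(0, 1) for _ in range(40)] for _ in range(20)]
toy_pheno = [10.0 + 3.0 * g + data_rng.gauss(0.0, 2.0) for g in toy_markers[0]]

threshold = permutation_threshold(toy_pheno, toy_markers, n_perm=500)
print(f"LOD threshold (alpha = 0.05): {threshold:.2f}")
print(f"LOD at marker 0: {marker_lod(toy_pheno, toy_markers[0]):.2f}")
```

A marker whose observed LOD exceeds this threshold would be declared significant at the chosen genome-wide level in this simplified setting.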

Candidate genes selection
The physical positions of genes annotated to be involved in the coumarin biosynthetic process (Plant Metabolic Network, http://plantcyc.org/) were checked according to TAIR (http://www.arabidopsis.org/). To reveal other candidate genes possibly underlying detected QTLs, a list of candidates was constructed using the following criteria: (1) genes encoding enzymes belonging to families involved in coumarins biosynthesis and genes encoding transcription factors that might be induced by environmental stresses (http://www.arabidopsis.org/); (2) genes that are expressed in roots (http://bar.utoronto.ca/). The list of potential candidates was compiled by searching TAIR (http://www.arabidopsis.org/) and the Arabidopsis eFP Browser (http://bar.utoronto.ca/) (Table 4).

Statistical analysis
All treatments included at least three (or two in case of parental lines used in the genetic mapping) biological replicates. Data processing and statistical analyses (one way ANOVA, post-hoc test: least significant difference test [LSD]) were carried out using Microsoft Excel. Error bars representing standard deviation (SD) are shown in the figures; the data presented are means.

DNA samples preparation and sequencing
RNA was isolated from roots with the RNeasy® Plant Mini Kit (Qiagen) following the instructions of the manufacturer, including an on-column DNA digestion step with the RNase-Free DNase Set (Qiagen) to eliminate genomic DNA contamination. 0.5 μg of RNA was used for reverse transcription with the Maxima First Strand cDNA Synthesis Kit (Thermo Scientific). The amplification of gene coding sequences was carried out in a 20 μl reaction mixture containing cDNA synthesized from RNA isolated from roots, 0.4 U of Platinum® Taq DNA Polymerase (Invitrogen), 200 μM dNTP, 1 μM primers, 1 × PCR Buffer and 1.5 mM Mg²⁺. The reaction mixture was denatured at 94°C for 2 min, and then the PCR amplification was performed using 34 cycles of 94°C for 30 sec, 52°C for 30 sec, and 72°C for 90 sec in the Thermal Cycler C1000 Touch (Bio-Rad). Gene-specific primers used for AT5G53990 UDP-glycosyltransferase amplification were 5′-ATGGGCCAAAATTTTCACGCT-3′ and 5′-TCATTCAAGATTTGTATCGTTGACT-3′, and for AT3G28740 CYP81D11 5′-ATGTCATCAACAAAGACAATAATGG-3′ and 5′-TTATGGACAAGAAGCATCTAAAACC-3′. PCR products were cloned into the pCR8 vector (Invitrogen). For plasmid amplification and maintenance, the Escherichia coli strain One Shot® (Invitrogen) was used. Positive clones were sequenced using the vector-specific primers M13fwd and M13rev and BigDye® Terminator v3.1 (Life Technologies). Sequencing reaction products were separated and analyzed on a 3730xl DNA Analyzer. All sequences were aligned using CLUSTALW [59].
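As a quick sanity check on the primer design, the sketch below verifies that each forward primer begins with a start codon and that the reverse complement of each reverse primer ends with a stop codon, which is what would be expected for primers amplifying a full coding sequence. The primer sequences are taken from the text with the line-break spaces removed; the helper names are introduced here for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Primer pairs (forward, reverse), written 5' to 3' as given in the text.
PRIMERS = {
    "AT5G53990": ("ATGGGCCAAAATTTTCACGCT", "TCATTCAAGATTTGTATCGTTGACT"),
    "AT3G28740": ("ATGTCATCAACAAAGACAATAATGG", "TTATGGACAAGAAGCATCTAAAACC"),
}


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return sequence.translate(COMPLEMENT)[::-1]


def spans_full_cds(forward, reverse):
    """True if the forward primer starts at ATG and the reverse primer ends at a stop codon."""
    starts_at_atg = forward.startswith("ATG")
    ends_at_stop = reverse_complement(reverse)[-3:] in STOP_CODONS
    return starts_at_atg and ends_at_stop


for gene, (forward, reverse) in PRIMERS.items():
    print(gene, "covers start to stop codon:", spans_full_cds(forward, reverse))
```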