Plant material
Two Theobroma cacao varieties, Scavina 6 and Amelonado, were used for this study. Cacao plants were grown in a greenhouse as previously described [44]. Leaf and flower tissues were collected from Scavina 6 plants. For leaf tissues, young red-stage leaves were collected; the leaf developmental stages were defined previously [45]. Cacao pods were obtained by hand-pollinating Amelonado (a self-compatible variety) in order to reduce the effect of genetic variation on seed traits. Pods harvested 18 weeks after pollination were dissected into exocarp (outer fruit tissue) and seeds for separate analysis. Exocarp samples represent the outer 1–3 mm layer of the fruit, obtained using a fruit peeler. All samples were frozen in liquid nitrogen upon collection and stored at −80°C until extraction.
Transgenic and wild-type tobacco plants (Nicotiana tabacum var. Samsun, provided by Wayne Curtis, Department of Chemical Engineering, The Pennsylvania State University) were grown in a greenhouse under the same conditions as the cacao plants. Arabidopsis plants (Arabidopsis thaliana) were grown in soil at 22°C, 50% humidity and a 16 h/8 h light/dark photoperiod in a growth chamber (Conviron, Pembina, ND, USA). Plants grown aseptically were plated on MS medium [46] with 2% (w/v) sucrose solidified with 0.6% (w/v) agar. Arabidopsis ecotype Columbia (Col-0) plants were used as the wild type. T-DNA insertion mutants ban (SALK_040250) and ldox (SALK_028793) were obtained from The Arabidopsis Biological Resource Center (Columbus, OH, USA).
Nucleic acid purification and cDNA synthesis
Total RNA from leaves of Theobroma cacao (Scavina 6) was isolated using a modified cetyl trimethyl ammonium bromide (CTAB) extraction method as previously described [47], with the following modifications. RNA recovered from the LiCl precipitation of the CTAB extraction was further purified and concentrated using RNeasy columns (Qiagen, Valencia, CA, USA), and the phenol/chloroform extraction and sodium acetate/ethanol precipitation step was omitted. RNA quality was verified by checking the absorbance ratios A260/A280 (1.8 to 2.0) and A260/A230 (1.8 to 2.2) and by separating 200 ng RNA samples on 0.8% agarose gels to confirm intact ribosomal bands. First-strand cDNA was synthesized using the SMART RACE cDNA amplification kit (Clontech, Mountain View, CA, USA).
Isolation of cDNA and genomic clones from Theobroma cacao
The putative expressed sequence tag (EST) sequences of cacao anthocyanidin reductase (TcANR), anthocyanidin synthase (TcANS) and leucoanthocyanidin reductase (TcLAR) genes were obtained by searching the Theobroma cacao EST database (http://esttik.cirad.fr/) [31] using the tBLASTn program [48]. The query sequences were the protein sequences of BANYULS and LDOX from Arabidopsis thaliana and DuLAR from Desmodium uncinatum (accession numbers NP_176365, Q96323 and CAD79341, respectively). Based on the sequences of the EST contigs from the ESTtik database (EST Treatment and Investigation Kit; http://esttik.cirad.fr), PCR primers were designed to amplify the entire coding sequence of each gene: ANR_F (5′-AGCCATGGCCAGCCAGACCGTAGG-3′) and ANR_R (5′-GCGGCCGCTCACTTGAGCAGCCCCTTAGC-3′), ANS_F (5′-CCATGGTGACTTCAATGGCCCCCAG-3′) and ANS_R (5′-GCGGCCGCCTCAATTAGACAGGCCATC-3′), and LAR_F (5′-CCATGGATATGAAATCAACAAACATGAATGGTTC-3′) and LAR_R (5′-GCGGCCGCTCATGTGCATATCGCAGTG-3′). NcoI sites were added at the 5′ end of each start codon and NotI sites at the 3′ end of each stop codon to facilitate subsequent cloning into binary T-DNA vectors. The coding sequences were amplified with these primers from cacao cDNA prepared from young leaves (genotype Scavina 6) using the Advantage cDNA PCR Kit (Clontech, Mountain View, CA, USA). PCR reactions were carried out in a total volume of 20 μL with the following program: 94°C for 5 min; 5 cycles of 94°C for 30 sec, 55°C for 30 sec, and 72°C for 1 min; another 23 cycles of 94°C for 30 sec, 60°C for 30 sec, and 72°C for 1 min; and a final extension at 72°C for 5 min. PCR products were gel purified and cloned into the pGEM-T easy vector (Promega, Madison, WI, USA). The correct open reading frame (ORF) of each of the resulting constructs (pGEMT-TcANR, pGEMT-TcANS and pGEMT-TcLAR) was confirmed by DNA sequencing.
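As an illustrative check that is not part of the original protocol, the following Python sketch scans the cloning primers listed above for the NcoI (CCATGG) and NotI (GCGGCCGC) recognition sequences. The primer sequences are copied from the text; the names SITES, PRIMERS, find_sites and check_primers are hypothetical and introduced only for this example.

```python
# Recognition sequences of the two restriction enzymes used for cloning.
SITES = {"NcoI": "CCATGG", "NotI": "GCGGCCGC"}

# Cloning primers as listed in the text (5' to 3').
PRIMERS = {
    "ANR_F": "AGCCATGGCCAGCCAGACCGTAGG",
    "ANR_R": "GCGGCCGCTCACTTGAGCAGCCCCTTAGC",
    "ANS_F": "CCATGGTGACTTCAATGGCCCCCAG",
    "ANS_R": "GCGGCCGCCTCAATTAGACAGGCCATC",
    "LAR_F": "CCATGGATATGAAATCAACAAACATGAATGGTTC",
    "LAR_R": "GCGGCCGCTCATGTGCATATCGCAGTG",
}


def find_sites(seq):
    """Return {enzyme: 0-based position of first match} for sites present in seq."""
    seq = seq.upper()
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


def check_primers(primers):
    """Map each primer name to the restriction sites it carries."""
    return {name: find_sites(seq) for name, seq in primers.items()}


if __name__ == "__main__":
    for name, hits in check_primers(PRIMERS).items():
        print(name, hits)
```

Under these assumptions, each forward primer should report only an NcoI site and each reverse primer only a NotI site, consistent with the directional cloning strategy described above.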
The DNA sequences of the TcANR, TcANS and TcLAR genes were obtained by isolation and sequencing of genomic clones. Briefly, two high-density filters were each arrayed with 18,432 colonies of Theobroma cacao (genotype LCT-EEN 37) bacterial artificial chromosome (BAC) clones (library and filters constructed by the Clemson University Genomics Institute, CUGI; https://www.genome.clemson.edu/). The filters were hybridized to the full-length cDNA of each gene labeled with 32P using the MEGA Labeling Kit (GE Healthcare, Piscataway, NJ, USA). DNA hybridizations were carried out at 65°C in 1 mM ethylenediaminetetraacetic acid (EDTA), 7% sodium dodecyl sulfate (SDS), 0.5 M sodium phosphate (pH 7.2) for 16–18 h. Filters were washed twice at 65°C in 1 mM EDTA, 1% SDS, 40 mM Na2HPO4 for 20 min, twice at 65°C in 1.5× sodium chloride/sodium citrate (SSC), 0.1% SDS for 20 min and twice at 65°C in 0.5× SSC, 0.1% SDS for 20 min. Two or more BAC clones were identified for each gene and confirmed by PCR using plasmid DNA from individual colonies and gene-specific primers. High-purity plasmid DNA from individual BAC clones was then isolated using a NucleoBond BAC 100 kit (Macherey-Nagel Inc., Bethlehem, PA, USA), and both strands of DNA were sequenced using primers designed from the cDNAs and genomic DNA. DNA sequencing results were analyzed and assembled using the Contig Assembly application of Vector NTI software (Invitrogen, San Diego, CA, USA). The DNA sequence of each gene was then compared to the corresponding coding sequence using the BLAST2 online tool (http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi) to obtain exon and intron locations for the gene organization analyses.
Phylogenetic analysis
Deduced protein sequences of all Arabidopsis IFR-like genes were retrieved from The Arabidopsis Information Resource (TAIR) database (http://www.arabidopsis.org/) by querying the TAIR protein database with the Desmodium LAR protein sequence (CAD79341) using the WU-BLAST2 (BLASTP) program. Protein sequences from other species were retrieved from GenBank (http://www.ncbi.nlm.nih.gov/Genbank/). Accession numbers are indicated in the figure legend.
Multiple sequence alignment of proteins was performed with the ClustalX algorithm [49] using default parameter settings (gap opening penalty: 10, gap extension penalty: 0.2, delay divergent cutoff: 30%, protein weight matrix: Gonnet series), and this alignment was used to construct the phylogenetic tree by the neighbor-joining method in the MEGA package [50]. Five hundred bootstrapped datasets were used to estimate the confidence of each tree clade.
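To illustrate what a bootstrapped dataset is in this context, the sketch below resamples the columns of an alignment with replacement to produce pseudo-replicates. This shows only the general bootstrap idea. It is not MEGA's implementation, and the function names and the default seed are hypothetical. Tree building on each replicate (for example by neighbor-joining) is not shown.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one pseudo-replicate by sampling alignment columns with replacement.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    """
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    # The same column indices are applied to every sequence so that
    # homologous positions stay together.
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def make_replicates(alignment, n=500, seed=0):
    """Return n bootstrap pseudo-replicates (500 matches the number in the text)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n)]
```

The fraction of replicate trees that recover a given clade would then serve as the bootstrap support for that clade.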
Determination of proanthocyanidins (PAs) and anthocyanins
For quantitative analysis of anthocyanin levels in transgenic tobacco flowers, fresh petals (0.3 to 0.5 g fresh weight) from three flowers were immersed in 5 mL ethanol: 6 M HCl (1:1) and incubated at 4°C overnight. The extract solution was transferred to a new tube and the petals were extracted a second time using the same method. Absorbance of the pooled extract solution was then measured at 526 nm, and total anthocyanin levels were calculated using a standard molar absorbance curve prepared with cyanidin-3-glucoside (Sigma-Aldrich, MO, USA).
To extract soluble PAs from cacao and tobacco tissues, 0.3 to 0.5 g of frozen tissue was ground into a fine powder in liquid nitrogen and then extracted with 5 mL of extraction solution (70% acetone: 29.5% water: 0.5% acetic acid) by vortexing for 5 seconds followed by water bath sonication for 15 min using a bench top ultrasonic cleaner (Model 2510, Bransonic, Danbury, CT, USA). To extract soluble PAs from Arabidopsis seeds and siliques, the same extraction solution and method were applied, except that 100 to 500 mg dry seeds and 10 green siliques were used as grinding samples, and 500 μL of extraction solution was used. After sonication, samples were vortexed again and centrifuged at 2500 g for 10 min. The supernatant was transferred to a new tube and the pellet was re-extracted twice as above. Pooled supernatants were extracted twice with hexane to remove fat and chlorophyll and then filtered through a 0.45 μm polytetrafluoroethylene (PTFE) syringe filter (Millipore, Billerica, MA, USA). The number of biological replicates depended on the availability of plant samples: at least five for cacao, seven or more for tobacco, and three for Arabidopsis.
To quantify soluble PA levels, extracts were reacted with p-dimethylaminocinnamaldehyde (DMACA), which reacts specifically with PA monomers and polymers to form blue pigments [51]. Briefly, 50 μL aliquots of samples were mixed with 200 μL of DMACA (Sigma-Aldrich, MO, USA) reagent (0.1% DMACA, 90% reagent-grade ethanol, 10% HCl) in 96-well microtiter plates. Absorbance was measured at 640 nm at one-minute intervals for 20 min to obtain the highest reading. Three technical replicates were performed to obtain mean values. Total PA levels were calculated using a standard molar absorbance curve prepared with procyanidin B2 (Indofine, NJ, USA).
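The calculation implied by this assay can be sketched as follows, under stated assumptions: the standard curve is taken to be linear and fitted by ordinary least squares, the peak of the 20 one-minute readings is used for each well, and the technical replicates are averaged. The function names and the concentration units are hypothetical and do not come from the original work.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantify(replicate_reads, slope, intercept):
    """Convert kinetic A640 readings to a concentration.

    replicate_reads: one list of A640 readings per technical replicate well.
    The peak reading of each well is taken, the peaks are averaged, and the
    mean is converted through the standard curve (assumed linear).
    """
    peaks = [max(reads) for reads in replicate_reads]
    mean_peak = sum(peaks) / len(peaks)
    return (mean_peak - intercept) / slope
```

In use, fit_line would be called once on the procyanidin B2 standards (known concentrations against their peak A640 values), and quantify would then be applied to each sample with the fitted slope and intercept.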
For quantitative analysis of insoluble PAs from cacao tissues, the residues from soluble PA extractions were air dried in an exhaust hood for two days and weighed. Then 5 mL of butanol-HCl reagent (95% butan-1-ol: 5% concentrated HCl) was added, and the mixture was sonicated for one hour and centrifuged at 2500 g for 10 min. An aliquot of clear supernatant was diluted 40-fold in butanol-HCl reagent and its absorbance was measured at 550 nm to determine the background absorbance. The samples were then boiled for 1 hour with vortexing every 20 min, cooled to room temperature and centrifuged again at 2500 g for 10 min. The supernatant from the boiled sample was diluted 40-fold in butanol-HCl reagent and its absorbance was measured at 550 nm. The values were normalized by subtracting the background absorbance, and PA levels were calculated as cyanidin equivalents using cyanidin-3-glucoside (Sigma-Aldrich, MO, USA) as the standard.
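As a worked illustration of the background correction, the sketch below subtracts the pre-boiling A550 from the post-boiling A550 and scales the result by the 40-fold dilution. The linear standard curve parameters (slope and intercept from cyanidin-3-glucoside standards) are assumptions, as is the function name; any further normalization, for example to residue weight, is not shown.

```python
DILUTION = 40  # both A550 readings were taken on 40-fold dilutions


def insoluble_pa(a550_boiled, a550_background, slope, intercept=0.0):
    """Return cyanidin equivalents in the undiluted butanol-HCl extract.

    slope and intercept describe an assumed linear standard curve
    (A550 = slope * concentration + intercept) for cyanidin-3-glucoside.
    """
    corrected = a550_boiled - a550_background
    if corrected < 0:
        raise ValueError("background exceeds boiled-sample absorbance")
    return (corrected - intercept) / slope * DILUTION
```

For example, with hypothetical readings of 0.50 after boiling and 0.10 before, and a hypothetical slope of 0.02 absorbance units per concentration unit, the corrected absorbance of 0.40 corresponds to 20 units in the diluted sample and 800 units in the undiluted extract.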
To visualize the presence of PAs in Arabidopsis seeds, dry seeds were immersed for 2 days in the 0.1% DMACA reagent described above and then washed 3 times with 70% ethanol as described previously [35]. Catechin and epicatechin content was determined by reverse-phase HPLC using an Alliance separations module (Model 2695; Waters, Milford, MA, USA) equipped with a multi λ fluorescence detector (Model 2475; Waters, Milford, MA, USA). Samples of soluble PA extracts (10 μL) were separated on a 250 mm × 4.6 mm Luna 5-μm Phenyl Hexyl column (Phenomenex, Torrance, CA, USA) and then assayed by fluorescence emission at 315 nm following excitation at 280 nm. The HPLC separation used a binary mobile phase gradient in which mobile phase A was 0.5% trifluoroacetic acid (TFA) (v/v with water) and mobile phase B was 0.5% TFA (v/v with methanol). The gradient conditions were: 0 min, 16% mobile phase B; 4 min, 16% mobile phase B; 14 min, 50% mobile phase B; 18 min, 50% mobile phase B; 22 min, 100% mobile phase B; 26 min, 100% mobile phase B; 30 min, 16% mobile phase B. The column was maintained at 30°C and the flow rate was 1 mL/min. Catechin and epicatechin standards were purchased from Sigma-Aldrich (St. Louis, MO, USA). This work was performed at the Hershey Technical Center (Hershey, PA, USA).
Transformation of tobacco and Arabidopsis
The coding sequences of TcANS, TcANR and TcLAR were excised from the cloning vector (pGEM-T easy) (Promega, Madison, WI, USA) with NcoI and NotI restriction enzymes and cloned into the pE2113-EGFP [44] intermediate vector to replace the original EGFP coding sequence. As a result, the cacao coding sequences are located immediately downstream of E12-Ω, an enhanced expression promoter modified from CaMV35S [52], and upstream of the CaMV35S terminator. The over-expression cassettes of TcANR and TcANS were excised from the pE2113-TcANR and pE2113-TcANS constructs, respectively, with the HaeII restriction enzyme, blunt-ended with T4 polymerase and then introduced into the pCAMBIA-1300 binary vector (CAMBIA, Canberra, Australia) linearized with the SmaI restriction enzyme; the over-expression cassette of TcLAR was excised from the pE2113-TcLAR construct with HindIII and PvuII restriction enzymes and ligated into the pCAMBIA-1300 binary vector linearized with HindIII and SmaI restriction enzymes. All binary transformation constructs were introduced into Agrobacterium tumefaciens strain AGL1 [53] by electroporation as described previously [54].
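The gradient table above can be expressed as a small lookup function. This sketch assumes linear ramps between the listed time points, which the text does not state explicitly; the names GRADIENT and percent_b are hypothetical.

```python
# (time in min, % mobile phase B) breakpoints as listed in the text.
GRADIENT = [(0, 16), (4, 16), (14, 50), (18, 50), (22, 100), (26, 100), (30, 16)]


def percent_b(t, table=GRADIENT):
    """Return % mobile phase B at time t (min), assuming linear ramps."""
    if not table[0][0] <= t <= table[-1][0]:
        raise ValueError("time outside the gradient program")
    # Walk consecutive breakpoint pairs and interpolate within the matching one.
    for (t0, b0), (t1, b1) in zip(table, table[1:]):
        if t0 <= t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
```

Under this assumption, the composition at 9 min is halfway along the ramp from 16% to 50% B, that is 33% B, and the remainder of the mobile phase (100 minus the returned value) is mobile phase A.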
Tobacco leaf disc transformation was performed as previously described [55] and transgenic shoots were regenerated on MSs (MS shooting) media supplemented with 25 mg/L hygromycin. Only one shoot was selected from each explant to ensure independent transformants. After rooting for 2 weeks in MSr (MS rooting) media supplemented with 25 mg/L hygromycin, hygromycin-resistant plantlets were transferred to soil and grown in a greenhouse as described above.
Arabidopsis transformation was carried out using the floral dip method [56], and T1 transgenic plants were selected on MS media supplemented with 2% sucrose, 0.65% agar and 25 mg/L hygromycin. Hygromycin-resistant T1 seedlings were transferred to soil 7 days after germination and grown in a growth chamber as described above.
Expression analysis of TcANS, TcANR and TcLAR
Total RNA from leaves, flowers, whole pods, pod exocarp and ovules of Theobroma cacao (Scavina 6 and Amelonado) was isolated as described above. Total RNA from young leaves of transgenic and wild-type tobacco plants as well as Arabidopsis plants was isolated using the RNeasy Plant mini kit (Qiagen, Valencia, CA, USA). cDNA was synthesized from 1 μg of total RNA in a total volume of 20 μL using M-MuLV Reverse Transcriptase (NEB, Ipswich, MA, USA) according to the supplier’s protocols, and 2 μL were used in the subsequent reverse transcription-PCR (RT-PCR) reactions. The primers for RT-PCR were designed to amplify across at least one intron, giving products of approximately 500 bp from cDNA and 700 bp to 1500 bp from genomic DNA. These primer sets were used to check all cDNAs for genomic DNA contamination. The primers used for TcANS were TcANSRT_F (5′-ACCTTGTTAACCATGGGATCTCGG-3′) and TcANSRT_R (5′-GACGGTGTCACCAATGTGCATGAT-3′); the primers used for TcANR were TcANR_F (5′-TGCTTGAGAAGGGCTACGCTGTTA-3′) and TcANR_R (5′-AAAGATGTGGCAAGGCCAATGCTG-3′); the primers used for TcLAR were TcLAR_F (5′-AATTCCATTGCAGCTTGGCCCTAC-3′) and TcLAR_R (5′-GGCTTGCTCACTGCTTTGGCATTA-3′). TcActin was used as an internal standard for cacao gene expression using primer set Tc46RT_F (5′-AGCTGAGAGATTCCGTTGTCCAGA-3′) and Tc46RT_R (5′-CCCACATCAACCAGACTTTGAGTTC-3′). AtUbi and NtrRNA were chosen as constitutive expression controls for Arabidopsis (ubiquitin) and tobacco (rRNA), respectively, with primer pairs AtUbi_F (5′-ACCGGCAAGACCATCACTCT-3′) and AtUbi_R (5′-AGGCCTCAACTGGTTGCTGT-3′) [57], and NtrRNA_F (5′-AGGAATTGACGGAAGGGCA-3′) and NtrRNA_R (5′-GTGCGGCCCAGAACATCTAAG-3′) [58].
The number of PCR cycles was tested between 20 and 32 to find a cycle number at which amplification was in the linear range; 28 cycles were chosen for all RT-PCR reactions. PCR reactions were carried out in a total volume of 20 μL with the following program: 94°C for 5 min; 28 cycles of 94°C for 30 sec, 55°C for 30 sec, and 72°C for 45 sec; and a final extension at 72°C for 5 min. The PCR products were visualized on 1% agarose gels stained with ethidium bromide (EtBr) and documented using a Molecular Imager Gel Doc XR+ System equipped with a 16-bit CCD camera (Bio-Rad Laboratories, Hercules, CA, USA), and bands were quantified using Quantity One 1-D Analysis Software (Bio-Rad Laboratories, Hercules, CA, USA).
Assay of LAR activities
The open reading frame (ORF) of the TcLAR gene was PCR amplified from pGEMT-TcLAR using Advantage 2 polymerase mix (Clontech, Mountain View, CA, USA) and the following primers: TcLARCDF1 (5′-GAGCTCatggatatgaaatcaacaaacatg-3′; the SacI site is in uppercase and is followed by the start codon) and TcLARCDR2 (5′-CTCGAGtgtgcatatcgcagtg-3′; the XhoI site is in uppercase, and the stop codon was removed to incorporate the C-terminal His-tag sequence of the expression vector at the 3′ end of the TcLAR ORF). The product was then subcloned into the SacI and XhoI sites of the pET-21a expression vector (Novagen, Gibbstown, NJ, USA). After confirmation by sequencing, the resulting vector pET21a-TcLAR was transformed into Escherichia coli strain Rosetta (DE3) (Novagen, Gibbstown, NJ, USA). For protein expression, a single bacterial colony was inoculated into Luria-Bertani medium (10 g/L tryptone, 5 g/L yeast extract, 10 g/L NaCl) containing 100 μg/mL ampicillin and grown at 37°C overnight. The overnight culture was then diluted into terrific broth (TB) medium (12 g/L tryptone, 24 g/L yeast extract, 0.4% glycerol, 2.31 g/L KH2PO4, 12.54 g/L K2HPO4) containing 100 μg/mL ampicillin and grown at 37°C until the OD600 reached 0.6 to 0.8, at which time IPTG (isopropyl β-D-1-thiogalactopyranoside) was added to a final concentration of 1 mM to induce protein expression. Recombinant TcLAR protein with a 6×His tag at the C terminus was purified using a Magne-His kit (Promega, Madison, WI, USA), and the protein concentration was measured by the Bradford method [59]. This work was performed at the Samuel Roberts Noble Foundation (Ardmore, Oklahoma, USA).
3H-3,4-cis-leucocyanidin was synthesized as described previously [60]. Assay of recombinant TcLAR protein with 3H-3,4-cis-leucocyanidin was carried out in a final volume of 100 μL containing 10% (w/v) glycerol, 100 mM potassium phosphate (pH 7.0), 4 mM dithiothreitol (DTT), 0.5 mM NADPH, 0.4 mM 3H-leucocyanidin and 30 μg of purified recombinant TcLAR protein. The reaction was initiated by the addition of enzyme and incubated at 30°C for 1 h. The assay was terminated by the addition of 20 μL of methanol followed by centrifugation. Products were analyzed by HPLC, with absorbance monitoring at 280 nm. Products eluting at retention times between 13 and 31 min were collected (1 min/tube), and the fractions containing labeled products were identified by liquid scintillation counting. Boiled pure protein was used as a control.
Reverse-phase HPLC analysis of enzymatic products was performed using an Agilent HP1100 HPLC (Agilent Technologies, Inc., Santa Clara, CA, USA) with the following gradient using solvents A (1% phosphoric acid) and B (acetonitrile) at a 1 mL/min flow rate: 0 to 5 min, 6% B; 5 to 10 min, 6% to 10% B; 10 to 20 min, 10% to 11% B; 20 to 25 min, 11% to 12.5% B; 25 to 45 min, 12.5% to 37% B; 45 to 48 min, 37% to 100% B; 48 to 58 min, 100% B; 58 to 60 min, 100% to 6% B. Absorbance data were collected at 280 nm. Identifications were based on comparison of chromatographic behavior and UV spectra with authentic standards. This work was performed at the Samuel Roberts Noble Foundation (Ardmore, Oklahoma, USA).
Sequence data from this article can be found in the GenBank/EMBL data libraries under accession numbers GU324347 (TcANR), GU324349 (TcANS) and GU324351 (TcLAR).