An analysis of expressed sequence tags of developing castor endosperm using a full-length cDNA library
© Lu et al. 2007
Received: 21 January 2007
Accepted: 31 July 2007
Published: 31 July 2007
Castor seeds are a major source for ricinoleate, an important industrial raw material. Genomics studies of castor plant will provide critical information for understanding seed metabolism, for effectively engineering ricinoleate production in transgenic oilseeds, or for genetically improving castor plants by eliminating toxic and allergic proteins in seeds.
Full-length cDNAs are useful resources for annotating genes and for the functional analysis of genes and their products. We constructed a full-length cDNA library from developing castor endosperm, and obtained 4,720 ESTs from the 5'-ends of the cDNA clones, representing 1,908 unique sequences. The most abundant transcripts are genes encoding storage proteins, ricin, agglutinin and oleosins. Several other sequences are also highly abundant, including two acidic triacylglycerol lipases and the oleate hydroxylase (FAH12) gene that is responsible for ricinoleate biosynthesis. The role(s) of the lipases in developing castor seeds are not clear, and co-expression of a lipase with FAH12 did not result in significant changes in hydroxy fatty acid accumulation in transgenic Arabidopsis seeds. Only one oleate desaturase (FAD2) gene was identified in our cDNA sequences. Sequence and functional analyses of the castor FAD2 were carried out since it had not been characterized previously. Overexpression of castor FAD2 in a FAH12-expressing Arabidopsis line resulted in decreased accumulation of hydroxy fatty acids in transgenic seeds.
Our results suggest that transcriptional regulation of the FAD2 and FAH12 genes may be one of the mechanisms that contribute to the high level of ricinoleate accumulation in castor endosperm. The full-length cDNA library will be used to search for additional genes that affect ricinoleate accumulation in seed oils. Our EST sequences will also be useful for annotating the castor genome, whose whole sequence is being generated by shotgun sequencing at The Institute for Genomic Research (TIGR).
The hydroxy fatty acid ricinoleate (12-hydroxy-octadeca-cis-9-enoic acid: 18:1-OH) is an important natural raw material with great value as a petrochemical replacement in a variety of industrial processes. Its derivatives are found in products such as lubricants, nylon, dyes, soaps, inks, adhesives, and biodiesel. The seeds of the castor plant (Ricinus communis L.) are the major source of ricinoleate, which constitutes about 90% of the total fatty acids of the seed oil. However, oilseed castor cultivation is limited to tropical and sub-tropical regions, and seeds are laboriously harvested by methods that are difficult to adapt to large-scale production. In addition, castor seeds contain the poisonous ricin as well as strongly allergenic 2S albumins, which pose health threats for workers during planting, harvesting and processing. It is therefore highly desirable to produce ricinoleate in temperate oilseed crops through genetic engineering.
Ricinoleate biosynthesis in castor seeds is catalyzed by an oleate Δ12-hydroxylase (FAH12), a close homologue of the oleate Δ12-desaturase (FAD2). FAH12 adds a hydroxy group (-OH) to the twelfth carbon of oleic acid moieties esterified to the sn-2 position of phosphatidylcholine. Expression of FAH12 in transgenic tobacco and Arabidopsis caused the accumulation of hydroxy fatty acids, but only to about 17% of total seed oil, far less than in native castor seeds [4–6]. To increase ricinoleate in transgenic oilseeds and create a castor oil replacement, it is necessary to better understand the mechanisms of lipid metabolism in castor seed. We are specifically interested in the expression profile of genes that are co-expressed with the FAH12 gene because some of these gene products may also contribute to ricinoleate accumulation in developing castor seeds. Expressed sequence tag (EST) analysis provides a convenient and efficient gateway for identifying genes expressed in specific tissues and cells, and for characterizing the level of transcript expression. Despite the availability of a small number (744) of ESTs from developing castor endosperm, and a larger EST collection from leaves recently released by The Institute for Genomic Research (TIGR), gene expression information for developing castor endosperm is limited. Nor was any full-length cDNA resource available for castor. In this report, we sequenced the 5' ends of about 5,000 cDNA clones from a full-length cDNA library derived from developing castor endosperm, the storage organ in castor seed. We analyzed the abundance of specific cDNAs from 4,720 EST sequences. We found that the castor oleate desaturase (RcFAD2) sequence is much less abundant than that of FAH12 in our cDNA sequences, suggesting transcriptional control of these two genes in castor endosperm that favors ricinoleate accumulation.
Table 1. The most abundant sequences from a full-length cDNA library of developing castor endosperm
No. of ESTs
Functional description of gene product
seed storage protein [Ricinus communis]
2S albumin precursor (Allergen Ric c 1)
Agglutinin precursor (RCA)
16.9 kDa oleosin
2S albumin precursor (Allergen Ric c 1)
40S ribosomal protein S9 (RPS9C)
Probable nonspecific lipid-transfer protein AKCS9 precursor (LTP)
lipase (class 3) family
60S ribosomal protein L10A (RPL10aA)
Thiazole biosynthetic enzyme, chloroplast precursor
Enolase (2-phosphoglycerate dehydratase)
Vacuolar processing enzyme precursor (VPE)
subtilisin-like serine protease, putative
oleate 12-hydroxylase – castor bean
elongation factor 1-alpha (EF-1-ALPHA)
putative 40S ribosomal protein S12
Acyl carrier protein 1, chloroplast precursor (ACP 1)
acyl- [acyl-carrier-protein] desaturase (stearoyl-ACP desaturase)
Protein disulfide isomerase precursor (PDI)
Triosephosphate isomerase, cytosolic (TIM)
60S ribosomal protein L18 (RPL18B)
malonyl-CoA:Acyl carrier protein transacylase
ADP, ATP carrier protein 1, mitochondrial precursor
cytosolic phosphoglycerate kinase 1
embryonic protein BP8
L3 Ribosomal protein
proteinase inhibitor se60-like protein
stress related protein -related
40S ribosomal protein S9 (RPS9C)
Annexin-like protein RJ4
glutathione peroxidase, putative
OSJNBa0067K08.3 [Oryza sativa (japonica cultivar-group)]
Translationally controlled tumor protein homolog (TCTP)
Oil-body oleosin genes are also highly expressed, making up about 4% of the total sequences. The 209 ESTs for oleosins in the sequenced clones represent 6 different genes according to sequence similarity to Arabidopsis oleosin homologues. These genes are expressed at different levels. The castor oleosin RcOLE2 (accession No. AAR15172), a homologue of the Arabidopsis At4g25140, is the most abundant (170 ESTs). There are 34 ESTs representing RcOLE1 (accession No. AAR15171), a homologue of At3g01570. The others are much less abundant: only two ESTs are homologous to At5g51210, and one EST each corresponds to the oleosins homologous to At2g25890, At3g18570, and At3g27660. In contrast, expression levels of different oleosins in developing Arabidopsis seeds vary less dramatically. For example, the EST counts for At4g25140, At5g40420 and At3g27660 are 9, 38 and 49, respectively, from 10,522 sequences. The relatively highly abundant 21-kDa oleosin gene (At5g40420) in Arabidopsis seeds is absent from our castor cDNA sequences. These findings suggest that different oleosins may play different roles in oil accumulation in castor and Arabidopsis seeds. In our high-throughput screening experiment, we found that co-expressing RcOLE2 (an At4g25140 homologue) with FAH12 resulted in moderately increased hydroxy fatty acid accumulation in transgenic Arabidopsis seeds. At4g25140 plays an important role in regulating oil body size in Arabidopsis seed. The abundance of RcOLE2 in our EST collection suggests it may play a similar role in castor seed.
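The comparison above rests on simple arithmetic with EST counts. The following Python sketch reproduces it from the numbers quoted in the text; the normalization to counts per 10,000 ESTs and the function names are illustrative assumptions, not part of the original analysis.

```python
# EST counts quoted in the text; totals are the number of ESTs in each collection.
CASTOR_TOTAL = 4720
ARABIDOPSIS_TOTAL = 10522

# Castor oleosin ESTs, keyed by the Arabidopsis homologue of each gene.
castor_oleosins = {
    "At4g25140 (RcOLE2)": 170,
    "At3g01570 (RcOLE1)": 34,
    "At5g51210": 2,
    "At2g25890": 1,
    "At3g18570": 1,
    "At3g27660": 1,
}

# Arabidopsis developing-seed EST counts given for three oleosins.
arabidopsis_oleosins = {"At4g25140": 9, "At5g40420": 38, "At3g27660": 49}


def per_10k(count, total):
    """Normalize a raw EST count to counts per 10,000 sequenced ESTs (assumed unit)."""
    return 10000 * count / total


def fold_range(counts):
    """Ratio of the most abundant to the least abundant member of a gene family."""
    return max(counts.values()) / min(counts.values())


oleosin_total = sum(castor_oleosins.values())
oleosin_percent = 100 * oleosin_total / CASTOR_TOTAL

if __name__ == "__main__":
    print(f"castor oleosin ESTs: {oleosin_total} ({oleosin_percent:.1f}% of library)")
    for gene, n in castor_oleosins.items():
        print(f"  {gene}: {per_10k(n, CASTOR_TOTAL):.1f} per 10k")
    print(f"castor fold range: {fold_range(castor_oleosins):.0f}")
    print(f"Arabidopsis fold range: {fold_range(arabidopsis_oleosins):.1f}")
```

The fold range makes the contrast explicit: the castor counts span two orders of magnitude, whereas the three Arabidopsis counts differ by less than one.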
Besides storage proteins, oleosins, ricin and a metallothionein-like protein as listed in Table 1, there are several genes that are moderately abundant in our cDNA library. These include lipid transfer proteins, genes encoding components of the protein biosynthetic apparatus such as alanine aminotransferase, ribosomal proteins, and elongation factor 1-alpha, as well as proteins involved in carbohydrate metabolism such as glyceraldehyde-3-phosphate dehydrogenase, enolase, and triosephosphate isomerase. The genes in this class also include the oleate hydroxylase (FAH12) and other genes of lipid metabolism such as acyl carrier protein (ACP), stearoyl-ACP desaturase, and malonyl-CoA:ACP transacylase.
Interestingly, as listed in Table 1, we identified a class-3 triacylglycerol lipase (cn82) that is highly abundant (23 ESTs) in our cDNA library. This gene, which we termed RcTGL3, was recently characterized as an acidic triacylglycerol (TAG) lipase of the castor bean. A close homologue of this gene (RcTGL3-2) with 87% sequence identity was also identified (cn81), and its full-length sequence was determined (GenBank accession No. EF071862). The RcTGL3-2 gene is moderately abundant in our cDNA library (8 ESTs). The more abundant RcTGL3 gene is specifically expressed in developing castor endosperm, as revealed by RT-PCR analysis (data not shown). The function of a TAG lipase is to hydrolyze TAG into fatty acids and the intermediate products diacylglycerol or monoacylglycerol. The high level of expression of the TAG lipases along with many lipid synthetic genes in developing endosperm of castor seeds raised questions about their roles in seed development or lipid accumulation. Speculating that they might play a role in ricinoleate accumulation in castor endosperm, we transformed the two lipase homologues independently into a FAH12-expressing Arabidopsis line, CL37, and the fatty acid methyl esters of the transgenic seeds were analyzed by GC. The fatty acid compositions of the transgenic seeds that co-expressed FAH12 and either lipase gene showed no significant difference from those of CL37 (data not shown). This result suggested that the lipases might not contribute significantly to fatty acid synthesis in transgenic Arabidopsis seeds. We did not pursue further studies of the transgenic lines since the lipases had no effect on hydroxy fatty acid accumulation. Whether the transgenic lines have altered lipase activities, and what the consequences are for seed metabolism and physiology, remain subjects for future investigation.
It is not clear why the lipases are expressed at such a high level in developing seeds while lipid synthesis is actively taking place. The acidic lipase protein has also been detected in dry and germinating castor seeds, suggesting a role in the breakdown of storage lipids to support post-germinative seedling development. However, the presence of a neutral or alkaline TAG lipase in castor seed and its predominant role in lipolysis conflict with this simple interpretation. Reverse-genetic analysis by knockout or knock-down of these genes in the castor plant may provide an answer to the function(s) of the acidic lipases in developing seeds, as transformation technology has recently been extended to castor.
One of our purposes in analyzing ESTs was to identify genes that are important to lipid metabolism in castor endosperm. In contrast to the very high abundance of oleosins, and the moderately high abundance of some genes including FAH12 and others listed in Table 1, most genes involved in lipid metabolism occur once or a few times in our EST data. Although about 3% of the genes we identified encode proteins involved in various aspects of lipid metabolism, they represent a small proportion of the approximately 150 lipid metabolism genes expressed in Arabidopsis seeds. For example, genes encoding enzymes such as diacylglycerol acyltransferase and others known to play major roles in TAG biosynthesis were not detected by our EST analysis, although some were detected by PCR analysis of our library.
Fatty acid compositions of the hydroxylase-transgenic line CL37 and selected lines that were transformed with the additional castor FAD2 gene. Data represent mean values of three independent GC analyses
Fatty acid composition (mol%)
We report here an analysis of the ESTs derived from a full-length cDNA library of developing castor endosperm. The ESTs are enriched in genes encoding storage proteins, ricin, and oleosins, as well as housekeeping cellular components such as those for protein synthesis. We identified two genes encoding castor acidic TAG lipases, which are abundantly expressed in developing castor endosperm. Expression of these lipases did not increase ricinoleate accumulation in transgenic Arabidopsis seeds, and their function in developing castor seed remains unclear. In contrast to FAH12, FAD2 is much lower in abundance in our cDNA library, suggesting that regulation of FAD2 and FAH12 expression in castor endosperm may contribute to high-level accumulation of ricinoleate in castor oil; our results in transgenic Arabidopsis plants support this possibility.
A full-length cDNA resource is particularly valuable for the correct annotation of genomic sequences and for the functional analysis of genes and their products [6, 21, 22]. Recently, The Institute for Genomic Research (TIGR) initiated a project to generate redundant sequence coverage of the castor genome (http://castorbean.tigr.org). Our results contribute to a better understanding of the castor plant at the genomic level, especially with respect to seed metabolism. Future EST work will focus on subtractive or normalized cDNA libraries to expedite gene discovery and functional genomic studies. We will also include EST analyses using mRNA extracted from different stages of seed development. Our ultimate goal is to identify genetic factors contributing to increased ricinoleate accumulation in seed oils, first in Arabidopsis and ultimately in oilseed crops.
A full-length cDNA library was constructed in a lambda vector incorporating the Gateway cloning system. Briefly, developing castor seeds were harvested at 20 days after pollination at developmental stage IV, when the endosperm undergoes rapid dimensional growth and gain in weight. The embryos were removed and total RNA was extracted from the endosperm. After mRNA purification, first-strand full-length cDNA was generated with Superscript III reverse transcriptase (Invitrogen) and the primer 5'-GAGAGAGAGAGAGAGAGAGGATCCACTCGAGTTTTTTTTTTTTTTTTVN-3' (including the restriction sites for BamHI and XhoI), followed by the cap-trapping procedure described by Carninci and Hayashizaki. Second-strand cDNA was synthesized using the Single-Strand Linker Ligation Method. The resulting double-stranded cDNA was digested with SstI and XhoI, then ligated into the digested arms of the λGW cloning vector. The ligation product was packaged with Max Plax (Epicentre, Madison, WI) according to the manufacturer's protocol. A full-length cDNA library containing ~5 × 10^5 clones was obtained.
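The first-strand primer ends in the degenerate bases VN, which anchor it at the start of the poly(A) tail. As a minimal sketch of what that notation implies, the Python code below expands the primer into its concrete variants using the standard IUPAC codes (V = A, C or G; N = any base) and locates the BamHI (GGATCC) and XhoI (CTCGAG) recognition sequences. The function names are illustrative assumptions.

```python
from itertools import product

# IUPAC nucleotide codes needed for this primer (V = not T, N = any base).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "V": "ACG", "N": "ACGT"}

# First-strand primer from the text, written 5' to 3' without spaces.
PRIMER = "GAGAGAGAGAGAGAGAGAGGATCCACTCGAGTTTTTTTTTTTTTTTTVN"

# Recognition sequences of the two enzymes named in the text.
SITES = {"BamHI": "GGATCC", "XhoI": "CTCGAG"}


def expand_degenerate(primer):
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer]
    return ["".join(combo) for combo in product(*choices)]


def site_positions(primer, sites):
    """Map each enzyme name to the 0-based start of its site (-1 if absent)."""
    return {name: primer.find(seq) for name, seq in sites.items()}


variants = expand_degenerate(PRIMER)
positions = site_positions(PRIMER, SITES)

if __name__ == "__main__":
    print(f"{len(variants)} concrete primer sequences")
    for name, pos in positions.items():
        print(f"{name} site starts at position {pos}")
```

Because V excludes T, none of the expanded primers can anneal further inside the poly(A) tract, which is the usual reason for anchoring an oligo(dT) primer this way.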
For sequencing, the cDNA library was transferred into the plasmid vector pDONR201 (Invitrogen) by the BP cloning process, then transformed into E. coli DH10B by electroporation. With the assistance of the Research Technology Support Facility at Michigan State University, colonies were picked randomly, inoculated into 96-well plates containing 1 mL of LB medium and incubated at 37°C for 18 hr. DNA from bacterial cultures was purified using a Qiagen 3000 robot, and cDNA inserts were sequenced once from the 5' end of each clone using the BigDye terminator kit and an automated DNA capillary sequencer (ABI 3730, Applied Biosystems). The sequencing primer (5'-AAAAGCAGGCTGAGCTCGTCG-3') was designed to overlap the cDNA insertion site so that vector sequences were not included in the EST sequences.
The 5' EST sequence chromatogram data were base-called using the program Phred; EST reads were quality trimmed using the Phred quality score at a position where five ambiguous bases (phred quality > 2 and at least 200 bp) were found within 15 consecutive bases. EST sequences were clustered using the software stackPACK (provided by SANBI). Groups that contained only one sequence were classified as singletons. EST sequences longer than 200 bp were compared to the NCBI and TAIR databases using the BLASTX program.
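The bookkeeping that follows clustering can be sketched in a few lines of Python. This is not stackPACK and does not reproduce its output format; it assumes a hypothetical mapping from EST identifier to cluster identifier, and all identifiers and sequences shown are made-up toy data.

```python
from collections import Counter


def summarize_clusters(assignments):
    """Summarize a clustering given as {est_id: cluster_id} (assumed input shape).

    Clusters with exactly one member are reported as singletons; the number of
    unique sequences is the number of distinct clusters, singletons included.
    """
    sizes = Counter(assignments.values())
    singletons = sorted(c for c, n in sizes.items() if n == 1)
    multi = {c: n for c, n in sizes.items() if n > 1}
    return {
        "unique_sequences": len(sizes),
        "singletons": singletons,
        "multi_member": multi,
    }


def blast_candidates(sequences, min_len=200):
    """Keep only ESTs strictly longer than min_len bp, as stated in the text."""
    return {est: seq for est, seq in sequences.items() if len(seq) > min_len}


# Toy data: hypothetical EST and cluster identifiers, not taken from the study.
toy_assignments = {"est1": "cn1", "est2": "cn1", "est3": "cn2", "est4": "cn1", "est5": "cn3"}
toy_sequences = {"est1": "A" * 350, "est2": "C" * 200, "est3": "G" * 201}

summary = summarize_clusters(toy_assignments)
kept = blast_candidates(toy_sequences)

if __name__ == "__main__":
    print(summary)
    print(sorted(kept))
```

The size of each multi-member cluster is the EST count used as a rough measure of transcript abundance throughout this report.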
The corresponding open reading frame (ORF) of the castor FAD2 gene was amplified by PCR using Phusion DNA polymerase (New England Biolabs) and the following pair of specific primers: 5'-GCAAGCTTATGGGTGCTGGTGGCAGAAT-3' and 5'-GATCTAGATCAAAATTTGTTGTTATACCAG-3'. For ligation behind the inducible GAL1 gene promoter of the yeast expression vector pYES2 (Invitrogen, CA), the primers were extended by a HindIII (AAGCTT) or an XbaI (TCTAGA) restriction site, respectively. The resulting 1.2-kb PCR product was cloned into the vector pYES2 and transformed into the Saccharomyces cerevisiae strain DBY747 using the Frozen-EZ Yeast Transformation kit (Zymo Research, CA). Complete minimal drop-out medium lacking uracil and containing 2% glucose as the exclusive carbon source was inoculated with a single colony and grown at 30°C overnight. FAD2 expression was induced by transferring the cells into the same medium containing 2% galactose instead of glucose, and growing them overnight. Yeast cells were harvested by centrifugation at 1500 g for 5 min at 4°C, and washed once with distilled water. Fatty acid analyses were conducted as described below.
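The layout of the two cloning primers can be checked with a short Python sketch. It splits each primer at its restriction site into a 5' clamp, the site itself, and the gene-specific part, then confirms that the forward primer begins the ORF at ATG and that the reverse primer, read on the coding strand, ends in a stop codon. The helper names are illustrative assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Primers from the text, written 5' to 3'.
FORWARD = "GCAAGCTTATGGGTGCTGGTGGCAGAAT"
REVERSE = "GATCTAGATCAAAATTTGTTGTTATACCAG"

HINDIII = "AAGCTT"
XBAI = "TCTAGA"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def split_primer(primer, site):
    """Split a primer into (5' clamp, restriction site, gene-specific part)."""
    start = primer.find(site)
    if start < 0:
        raise ValueError(f"site {site} not found in primer")
    return primer[:start], site, primer[start + len(site):]


fwd_clamp, fwd_site, fwd_gene = split_primer(FORWARD, HINDIII)
rev_clamp, rev_site, rev_gene = split_primer(REVERSE, XBAI)

# Read on the coding strand, the reverse primer covers the 3' end of the ORF.
orf_3prime_end = reverse_complement(rev_gene)

if __name__ == "__main__":
    print("forward:", fwd_clamp, fwd_site, fwd_gene)
    print("reverse:", rev_clamp, rev_site, rev_gene)
    print("ORF starts with ATG:", fwd_gene.startswith("ATG"))
    print("ORF ends with stop codon:", orf_3prime_end[-3:] in STOP_CODONS)
```

The two-base clamps in front of each site give the restriction enzymes a little flanking sequence to bind, which is a common design choice for primers of this kind.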
For RT-PCR analysis of FAD2, 1 μg of mRNA extracted from developing castor endosperm was reverse transcribed in a 20 μL volume using the SuperScript III first-strand cDNA synthesis system for RT-PCR following the manufacturer's instructions (Invitrogen, CA). PCR was conducted using the above primers specific to the castor FAD2 gene and 0.5 μL cDNA from the RT reaction. The PCR reaction was initiated by one cycle of 94°C for 3 min, followed by 15 or 25 cycles of 94°C for 30 s, 55°C for 30 s and 72°C for 1 min. For amplification of the FAH12 gene, the following pair of gene-specific primers was used: 5'-ATGGGAGGTGGTGGTCGCAT-3' and 5'-TTAATACTTGTTCCGGTACC-3'. The primers 5'-ATGGCTGAGCATCAACAATCAC-3' and 5'-TCAGCCCTGTCCTTCATCTC-3' were used to amplify the oleosin OLE2 gene. All three resulting PCR products correspond to the full-length open reading frames.
We have previously described the Arabidopsis transgenic line CL37, expressing the castor oleate hydroxylase FAH12. Full-length cDNA clones of the RcFAD2 and lipase genes were cloned into the plant expression vector pGate-DsRed-Phas by the Gateway LR cloning process following the manufacturer's instructions (Invitrogen), and transformed into CL37 by an Agrobacterium-mediated floral dip method. Transgenic seeds were screened using the DsRed fluorescent protein marker [6, 30]. Transgenic red seeds were sorted for comparison to non-transgenic seeds from the same T1 plant, and the fatty acids were analyzed by gas chromatography. Fatty acid methyl esters were prepared by heating ~20 seeds at 80°C in 1 ml 2.5% H2SO4 (v/v) in methanol for 90 min, followed by extraction with 200 μl hexane and 1.5 ml of 0.9% NaCl (w/v); 100 μl of the organic phase was then transferred to autoinjector vials. Samples of 1 μl were injected into an Agilent 6890 GC fitted with a 30-m × 0.25-mm DB-23 column (Agilent). The GC was programmed for an initial temperature of 190°C for 2 min, followed by an increase of 8°C per min to 230°C, which was maintained for a further 6 min.
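The GC oven program described above implies a fixed run time per sample. The Python sketch below works it out from the stated values and returns the oven temperature at any point in the run; the function names are illustrative assumptions, and cool-down or equilibration time between injections is not included.

```python
def gc_run_time(initial_temp, initial_hold, ramp_rate, final_temp, final_hold):
    """Total oven program time in minutes for a single linear ramp.

    Temperatures in degrees C, holds in minutes, ramp_rate in degrees C per minute.
    """
    ramp_time = (final_temp - initial_temp) / ramp_rate
    return initial_hold + ramp_time + final_hold


def oven_temperature(t, initial_temp, initial_hold, ramp_rate, final_temp):
    """Oven temperature (degrees C) at time t minutes into the program."""
    if t <= initial_hold:
        return initial_temp
    return min(final_temp, initial_temp + ramp_rate * (t - initial_hold))


# Values from the text: 190 C for 2 min, 8 C/min to 230 C, held for 6 min.
total_minutes = gc_run_time(190, 2, 8, 230, 6)
mid_ramp_temp = oven_temperature(4.5, 190, 2, 8, 230)

if __name__ == "__main__":
    print(f"program length: {total_minutes} min")
    print(f"oven temperature at 4.5 min: {mid_ramp_temp} C")
```

With these settings the ramp takes 5 min, so the final temperature is reached 7 min into the run.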
The authors thank the Research Technology Support Facility at Michigan State University for cDNA sequencing and bioinformatics services. This research was supported by the Dow Chemical Co. and Dow AgroSciences, the National Research Initiative of the USDA Cooperative State Research, Education and Extension Service grant no. 2006-03263, and the Agricultural Research Center at Washington State University to JB. Support for CL also came from the Concurrent Technologies Cooperation and the Bio-based Product Institute at Montana State University.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.