Mining the bitter melon (Momordica charantia L.) seed transcriptome by 454 analysis of non-normalized and normalized cDNA populations for conjugated fatty acid metabolism-related genes

Background Seeds of Momordica charantia (bitter melon) produce high levels of eleostearic acid, an unusual conjugated fatty acid with industrial value. Deep sequencing of non-normalized and normalized cDNAs from developing bitter melon seeds was conducted to uncover key genes required for biotechnological transfer of conjugated fatty acid production to existing oilseed crops. It is expected that these studies will also provide basic information regarding the metabolism of other high-value novel fatty acids. Results Deep sequencing using 454 technology with non-normalized and normalized cDNA libraries prepared from bitter melon seeds at 18 DAP resulted in the identification of transcripts for the vast majority of known genes involved in fatty acid and triacylglycerol biosynthesis. The non-normalized library provided a transcriptome profile of the early stage in seed development that highlighted the abundance of transcripts for genes encoding seed storage proteins as well as for a number of genes for lipid metabolism-associated polypeptides, including Δ12 oleic acid desaturases and fatty acid conjugases, class 3 lipases, acyl-carrier protein, and acyl-CoA binding protein. Normalization of cDNA by use of a duplex-specific nuclease method not only increased the overall discovery of genes from developing bitter melon seeds, but also resulted in the identification of 345 contigs with homology to 189 known lipid genes in Arabidopsis. These included candidate genes for eleostearic acid metabolism such as diacylglycerol acyltransferase 1 and 2, and a phospholipid:diacylglycerol acyltransferase 1-related enzyme. Transcripts were also identified for a novel FAD2 gene encoding a functional Δ12 oleic acid desaturase with potential implications for eleostearic acid biosynthesis. Conclusions 454 deep sequencing, particularly with normalized cDNA populations, was an effective method for mining of genes associated with eleostearic acid metabolism in developing bitter melon seeds. 
The transcriptomic data presented provide a resource for the study of novel fatty acid metabolism and for the biotechnological production of conjugated fatty acids and possibly other novel fatty acids in established oilseed crops.


Background
A target of plant biotechnology has been the engineering of novel fatty acid production in seeds of established crops to enhance the industrial value of vegetable oils [1]. This research has involved the identification of genes for the synthesis of novel fatty acids from nonagronomic species and the subsequent transfer of these genes to crops for seed-specific expression. Targets for this research have included epoxy and hydroxylated fatty acids [1][2][3]. With only a few exceptions, these efforts have resulted in the production of novel fatty acids at levels significantly lower than those found in native sources. The modest success of this research has underscored the lack of knowledge in the specialized metabolism associated with the production and storage of novel fatty acids in oilseeds.
Our research has centered on fatty acids containing conjugated, or non-methylene-interrupted, double bonds as a system for addressing gaps in our understanding of novel fatty acid metabolism. Oils enriched in conjugated fatty acids can be used as drying agents in coating materials such as paints, inks, and varnishes. The conjugated double bonds of these fatty acids are highly prone to oxidation, which enhances rates of polymerization or "drying" of coating materials [4]. The most widely used oil for these applications is tung oil extracted from seeds of Vernicia fordii. The value of this oil as a drying agent arises from its high content of the conjugated fatty acid α-eleostearic acid (18:3 Δ9cis,11trans,13trans), which comprises > 80% of tung oil [5]. Eleostearic acid also comprises ~65% of the seed oil of Momordica charantia (bitter melon) [6]. Other conjugated fatty acids, including calendic (18:3 Δ8trans,10trans,12cis), catalpic (18:3 Δ9trans,11trans,13cis), and punicic (18:3 Δ9cis,11trans,13cis) acids, can be found in seed oils from species of at least nine different plant families [5,7,8,9].
Efforts to transfer eleostearic acid production to seeds of temperate crops have been facilitated by the identification of genes encoding variant forms of the Δ12 oleic acid desaturase (or FAD2) termed "conjugases" [10][11][12]. These enzymes catalyze the removal of hydrogen atoms from the carbon atoms that flank the Δ12 double bond of linoleic acid, and convert the Δ12 double bond into two conjugated Δ11, Δ13 double bonds [10]. The product of this reaction is a conjugated triene with Δ9, 11, 13 unsaturation. In addition to Δ12-specific conjugases, Δ9 conjugases have been described in Calendula officinalis and Dimorphotheca sinuata that convert the Δ9 double bond of linoleic acid into conjugated Δ8, Δ10 double bonds [13][14][15].
Transgenic expression of Δ9 and Δ12 conjugase genes under control of strong seed-specific promoters in Arabidopsis and soybean has yielded conjugated fatty acid levels of 10 to 15% of the total seed oils [9]. These levels are well below amounts of conjugated fatty acids that naturally accumulate in seeds of plants such as tung and bitter melon. In the engineered Arabidopsis and soybean seeds, conjugated fatty acids not only accumulate in storage form in triacylglycerols (TAGs) but are also detected in aberrantly high amounts in membrane phospholipids (10% to 25% of the total fatty acids of these lipids), especially phosphatidylcholine [9]. In contrast, conjugated fatty acids are only minor components of phospholipids (< 1.5% of the total phospholipid fatty acids) in seeds from plants that naturally accumulate conjugated fatty acids to levels approaching 85% of the total fatty acids [9,16].
Although conjugases are of central importance for producing conjugated fatty acids, these results indicate that additional enzymes are required for the metabolism and accumulation of conjugated fatty acids in seeds of transgenic plants. Similar conclusions have been reached in efforts to engineer the production of hydroxy, epoxy, and acetylenic fatty acids in seeds [1,17,18,19]. These fatty acids are also produced by variant forms of the Δ12 oleic acid desaturase. As with conjugases, the variant FAD2 hydroxylases, epoxygenases, and acetylenases use fatty acids bound to phosphatidylcholine and possibly other phospholipids, such as phosphatidylethanolamine, as substrates [16,20,21]. The products of these enzymes must be efficiently metabolized from phospholipids for storage at high levels in TAGs. This can occur either by the direct removal of the unusual fatty acid from phosphatidylcholine or by removal of the phosphocholine head group of phosphatidylcholine to produce the diacylglycerol for TAG synthesis [1]. Findings from seeds engineered with conjugases as well as with acetylenases suggest that specialized enzymes have evolved for the metabolism of unusual fatty acids from their site of synthesis on phosphatidylcholine to their storage in TAGs [9,19]. These enzymes are presumably absent from seeds of plants such as Arabidopsis and soybean that do not normally produce unusual fatty acids. These may include specialized phospholipases, acyltransferases, and enzymes associated with the removal or transfer of phospholipid head groups.
The production of high levels of conjugated fatty acids and other unusual fatty acids formed by FAD2 variants in seeds of transgenic plants will undoubtedly require the identification of genes for these specialized metabolic enzymes. To facilitate this effort, we have undertaken 454 pyrosequencing studies to obtain a comprehensive profile of the transcriptome of developing bitter melon seeds during a period of rapid synthesis and accumulation of eleostearic acid. Bitter melon seeds offer a useful system to study the functional genomics of eleostearic acid synthesis relative to tung seeds, which accumulate higher levels of this fatty acid, because bitter melon plants can be grown under controlled conditions and seeds can be more easily staged for eleostearic acid accumulation. As described here, we have identified 14,000 unique gene transcripts from normalized and non-normalized cDNA populations, including transcripts for the majority of enzymes involved in lipid biosynthesis and metabolism. Candidate genes for potential enzymes involved in eleostearic acid metabolism are highlighted, and also a divergent class of FAD2 that may be specialized for eleostearic acid biosynthesis in bitter melon seeds is described.

Results and Discussion
Determination of a seed developmental stage for rapid biosynthesis of eleostearic acid
Bitter melon seeds were initially analyzed at different time points after floral pollination to determine the developmental stages at which active synthesis and accumulation of eleostearic acid occur. It was anticipated that this information would provide the basis for selection of an optimal time point during seed development for the identification of genes associated with eleostearic acid metabolism. For these studies, lipids were extracted from developing seeds, consisting primarily of embryo and progressively lesser amounts of endosperm during seed development. Developing seeds were sampled at intervals from 17 to 30 days after pollination (DAP). The extracted lipids were partitioned by silica solid phase extraction into fractions of neutral lipids, comprised primarily of triacylglycerols (TAGs), and phospholipids. From analysis of fatty acids from these fractions, it was observed that rapid accumulation of eleostearic acid begins between 18 and 19 DAP, and the accumulation of eleostearic acid is detected almost exclusively in the TAG-enriched neutral lipid fraction. This accumulation coincides with the growth and expansion of the embryo. Accompanying the accumulation of eleostearic acid are large decreases in palmitic acid (16:0) and α-linolenic acid (18:3) content in neutral lipids and phospholipids and large increases in stearic acid (18:0) content in the neutral lipids. Consistent with the transition in fatty acid profile, expression of genes for the bitter melon conjugase, which produces eleostearic acid, and the oil body-associated oleosin was detected by RT-PCR initially at 18 DAP (Figure 1B). These collective data suggest that seeds at 18 DAP are enriched in the transcripts for enzymes involved in the synthesis and accumulation of eleostearic acid. This stage in seed development therefore is a suitable time point for transcriptomic analyses to identify metabolic genes specialized for eleostearic acid accumulation.

Construction of non-normalized and normalized cDNA libraries
Transcriptomic studies were conducted to identify genes associated with the synthesis and metabolism of eleostearic acid in bitter melon seeds. From previous expressed sequence tag analysis of developing bitter melon seeds [10], it was known that genes for enzymes such as acyltransferases that may be specialized for eleostearic acid metabolism are not highly expressed in developing seeds of this plant. Therefore, in order to enhance gene discovery, 454 pyrosequencing was conducted using non-normalized and normalized cDNA populations prepared from bitter melon seeds at 18 DAP. 454 pyrosequencing is now an established platform for deep sequencing of genomes and transcriptomes, and normalization of cDNAs was anticipated to enrich for low abundance mRNA transcripts in developing bitter melon seeds.
Prior efforts to normalize cDNA libraries have involved a number of different protocols including those based on the use of subtractive hybridization, hydroxyapatite (HAP)-bound column chromatography, and duplex-specific nuclease (DSN) treatment [22][23][24]. The normalization procedure used (as outlined in Figure 2) was adapted from the Evrogen Trimmer-Direct kit, which uses Kamchatka crab-derived DSN to reduce amounts of abundant cDNAs that more rapidly re-anneal following melting and hybridization. This kit is also compatible with the Clontech SMART cDNA library, allowing for cloning of normalized cDNA inserts into a bacterial vector. This method has been previously used to normalize cDNA libraries from a number of organisms including lake sturgeon, Asian seabass, and Medicago [25][26][27].
Prior to the 454 sequencing, a pilot experiment was conducted to analyze the composition of transcripts from both non-normalized and normalized cDNA pools by sequencing of clones from each library. From the non-normalized cDNA library, 40 independent colonies were isolated and sequenced (Additional File 1). About 40% of these sequences were found to encode seed storage proteins such as napin- and trypsin inhibitor-related polypeptides. In the normalized cDNA library, a large reduction in the percentage of abundant transcripts encoding seed storage proteins and a wider range in size of transcripts were observed (Figure 3). Sequences derived from the normalized library revealed many transcripts unrelated to the seed storage proteins, confirming the effectiveness of normalization (Additional File 1). Following the pilot study, transcript sequences were analyzed in both the normalized and non-normalized cDNA populations using 454 deep sequencing. The initial run of 454 pyrosequencing generated 404,468 reads from the non-normalized cDNA pools and 255,687 reads from the normalized cDNA pools (Table 1). After trimming and screening, about 228,000 and 177,991 clean reads remained in the non-normalized and normalized cDNA pools, respectively. About 22% of the reads in both populations were singletons. The remaining 78% of reads were assembled into 10,072 and 18,245 contigs from the non-normalized and normalized cDNA populations, respectively (Figure 4A and 4B). In both libraries, the average length of ESTs was approximately 800 nucleotides (Table 1). The approximately 80% greater number of assembled contigs in the normalized cDNA pools suggests that normalization played a critical role in enriching low-abundance unique transcripts and increasing the total number of cDNAs.
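The contig gain attributed to normalization follows directly from the two contig totals reported above. A minimal sketch of that arithmetic, using only the counts given in the text (the function name is illustrative and not part of the original analysis):

```python
def percent_increase(baseline: int, value: int) -> float:
    """Return the percentage by which `value` exceeds `baseline`."""
    return (value - baseline) / baseline * 100.0


# Contig totals reported in the text for each cDNA population.
non_normalized_contigs = 10_072
normalized_contigs = 18_245

gain = percent_increase(non_normalized_contigs, normalized_contigs)
print(f"Normalized library yields about {gain:.0f}% more contigs")
```

The result is consistent with the roughly 80% increase quoted in the text.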
These assembled contigs were then searched against both Arabidopsis TAIR7 and the Viridiplantae subdivision of the NCBI protein database using the BLASTX program, with an e-value cutoff of 1e-10, to find their homologs. In both libraries, about 50% of the assembled contigs did not have a match in either database (designated "no hit" transcripts). In the normalized cDNA library, about half of these "no hit" sequences were fragments between 200 and 300 nucleotides (nt). For transcript contigs smaller than 200 nt, about 74% were "no hit". In contrast, 90% of transcripts at 500-1,000 nt and 99% of transcripts longer than 1,000 nt were identified with homologs in Arabidopsis and/or green plant genomes. The majority of these "no hit" sequences probably result from primer dimer and homopolymer formation, causing the short sequences not to give high-scoring matches to known proteins. These could also encode 5'- and 3'-untranslated regions of transcripts, or small RNA species that do not encode proteins. The number of transcripts generated from the normalized library with either Viridiplantae or Arabidopsis homologs was almost three times greater than from the non-normalized populations (Table 1). This, together with the observation that normalization yielded 80% more contigs, strongly suggests that normalization played an important role in increasing the detection of low-abundance transcripts and that the increase in the total number of unique transcripts will facilitate gene mining. These sequences are publicly available for web-based BLAST searches at http://genomics.msu.edu/JO/blast/blast.html.
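The "hit" versus "no hit" designation described above can be sketched as a simple filter over BLASTX results. The example below is an illustration only: it assumes the search output is in the standard 12-column tabular format (e-value in the eleventh column), treats a hit as any alignment at or below the 1e-10 cutoff, and uses hypothetical contig and subject identifiers. The text does not specify how the original filtering was implemented.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-10  # cutoff stated in the text


def classify_contigs(contig_ids, blast_tabular, cutoff=E_VALUE_CUTOFF):
    """Split contigs into those with a qualifying hit and "no hit" contigs.

    `blast_tabular` is assumed to be 12-column tab-delimited BLAST output:
    query ID in column 1, e-value in column 11.
    """
    with_hit = set()
    for row in csv.reader(io.StringIO(blast_tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query_id, evalue = row[0], float(row[10])
        if evalue <= cutoff:
            with_hit.add(query_id)
    hits = [c for c in contig_ids if c in with_hit]
    no_hit = [c for c in contig_ids if c not in with_hit]
    return hits, no_hit


# Hypothetical contigs and alignments, for illustration only.
example_ids = ["ctg1", "ctg2", "ctg3"]
example_blast = (
    "ctg1\tsubjA\t85.0\t120\t18\t0\t1\t360\t10\t129\t3e-45\t170\n"
    "ctg2\tsubjB\t40.0\t50\t30\t0\t1\t150\t5\t54\t0.002\t35\n"
)
hits, no_hit = classify_contigs(example_ids, example_blast)
print("hits:", hits)
print("no hit:", no_hit)
```

In this toy input, ctg2 has an alignment that fails the cutoff and ctg3 has no alignment at all, so both fall into the "no hit" group.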
Non-normalized cDNA populations reflect an early stage of seed development
Deep sequencing of the non-normalized cDNAs allowed for analysis of gene expression profiles during an early stage of seed development in bitter melon. After comparing each contig with the non-redundant (nr) protein database of the NCBI and Arabidopsis proteins at TAIR using the BLASTX program, 4,459 contigs (representing 152,989 total reads) were identified with 3,093 unique Viridiplantae homologs. Among the 50 most abundant contigs in the library, 20 had no identified homologs (Additional File 2). The remaining contigs encoded primarily seed storage protein- or ribosome-inactivating protein-related polypeptides.
To further focus the analysis of developing seeds of bitter melon, contigs with homologs for the same gene were combined as a single cluster and used to rank the most abundant sequences (> 250 read counts) in the non-normalized cDNA population. The most abundant sequences encoded seed storage proteins, which represented 25% of the total contigs (Table 2; Additional File 2). These included transcripts for genes related to napin (most abundant, 169 contigs with 23,678 reads), napin-like protein large chain, legumin-like seed storage protein, and 11S globulin and its precursor. The second group of highly expressed genes (about 15% of total) encode ribosome-inactivating proteins, including type I and type II, MAP30, α-momorcharin [28], trichosanthin and its precursor, and NeuAc-gal/GalNAc-binding lectin.
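The clustering and ranking step described above (combining contigs that share a homolog, summing their reads, and keeping clusters with more than 250 reads) can be sketched as follows. This is a minimal illustration under stated assumptions: the input tuples, gene names, and read counts are hypothetical toy values, and the function name is not from the original study.

```python
from collections import defaultdict


def rank_clusters(contigs, min_reads=250):
    """Group contigs by best homolog and rank clusters by total read count.

    `contigs` is an iterable of (contig_id, homolog, read_count) tuples.
    Only clusters with more than `min_reads` total reads are returned.
    """
    totals = defaultdict(lambda: {"contigs": 0, "reads": 0})
    for _contig_id, homolog, reads in contigs:
        totals[homolog]["contigs"] += 1
        totals[homolog]["reads"] += reads
    ranked = sorted(totals.items(), key=lambda kv: kv[1]["reads"], reverse=True)
    return [(h, t["contigs"], t["reads"]) for h, t in ranked if t["reads"] > min_reads]


# Hypothetical toy data, not taken from the sequencing results.
toy_contigs = [
    ("c1", "napin", 300),
    ("c2", "napin", 120),
    ("c3", "oleosin", 90),
    ("c4", "lectin", 260),
]
ranking = rank_clusters(toy_contigs)
for homolog, n_contigs, n_reads in ranking:
    print(homolog, n_contigs, n_reads)
```

Summing reads per homolog rather than per contig avoids under-ranking genes whose transcripts were split across many contigs during assembly.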
Also abundant in non-normalized cDNAs from bitter melon seeds at 18 DAP were reads for transcripts encoding structural proteins associated with cell development, including latex protein, extensin, ribosomal associated membrane protein 4 (RAMP4), glycoprotein, and sec61 protein (Table 2; Additional File 2). Genes involved in gibberellin biosynthesis, such as gibberellin 20-oxidase and gibberellin 7-oxidase, are highly expressed in the early developmental stage of bitter melon seeds. Homologs for these genes were previously shown to be upregulated during the fruit maturation of morning glory [29]. Highly expressed genes were also detected for polypeptides involved in redox balance in seeds, including riboflavin biosynthase, oxygenase, cytochrome P450, glutaredoxin, type-2 metallothionein, cytosolic ascorbate peroxidase, and cytochrome P450 monooxygenase.
Transcripts for many lipid-related genes are abundant in developing bitter melon seeds
Genes encoding enzymes and other polypeptides associated with lipid biosynthesis and metabolism were detected among the 50 most abundant reads from the non-normalized cDNAs (Table 2; Additional File 2). These included lipoxygenase, Δ12 oleic acid desaturase (FAD2), Δ12 fatty acid conjugase, and Class 3-type lipases. The most abundant reads included those encoding lipid or fatty acid-binding proteins, such as a phosphatidylethanolamine binding protein (PEBP) homolog, acyl-CoA-binding protein, and acyl-carrier protein.
Interestingly, sequences for PEBP (gi|157343662) (1,921 reads and 6 contigs) were the 11th most abundant in the non-normalized cDNAs. PEBP has been implicated in signal transduction in mammalian systems, and its role in eleostearic acid metabolism, if any, is not clear. Genes for oil body-associated proteins were also highly represented in the bitter melon seed transcriptome. Most notably, 212 reads representing three contigs were detected for caleosin genes. Of lesser abundance were oleosin genes representing homologs of Arabidopsis OLEO1 and OLEO4 (81 reads in two contigs and 16 reads in one contig were detected for OLEO1 and OLEO4 homologs, respectively). Given the variability of the amphipathic domain found in oleosins from diverse sources [30,31], it is possible that one or more of the bitter melon oleosins has specificity for eleostearic-rich TAGs to promote their efficient packaging and accumulation in oil bodies. Reads for Class 3 TAG lipase were also detected at high abundance in the non-normalized cDNA pool (Table 2). Two groups of these homologs were present in the bitter melon libraries: gi|157335527 (1,000 reads, 3 contigs) and gi|157345129 (616 reads, 1 contig). Although the role for Class 3 TAG lipases in developing bitter melon seeds is not known, transcripts for this enzyme class were also found to be abundant in the endosperm of developing castor bean (Ricinus communis) seeds [32].
Table 2 List of gene products for the 50 most abundant transcript reads in the non-normalized cDNA library from bitter melon seeds. The numbers of reads and contigs and most related homolog for each gene product are also shown.
Normalization enhances gene discovery in developing bitter melon seeds
A major goal of this study was to deeply mine the transcriptome of developing bitter melon seeds for fatty acid biosynthetic and metabolic genes. From this pool, candidate genes that are specialized for eleostearic acid metabolism can be identified. Normalization of the bitter melon cDNA pools was employed as a technique to facilitate gene discovery efforts.
To assess the efficacy of normalization, the bitter melon sequences with Arabidopsis homologs were counted in the normalized and non-normalized libraries, giving 6,737 and 2,187, respectively (Table 1). Distribution of these homologs in Arabidopsis covered all aspects of cellular components, molecular functions, and biological processes (Figure 5). The fold increase of genes in the normalized versus non-normalized library across categories ranges from 1.3 (ribosome, abundant transcripts) to 6 (signal transduction, low-abundance transcripts), averaging 2.9, suggesting a higher efficiency in gene discovery using the normalized library. High levels of protein metabolism, cell organization and biogenesis, and transport in molecular process were also observed. Most genes in molecular function categories encode hydrolase, transferase, and protein and nucleotide binding activities, which are typically associated with active embryo expansion and oil biosynthesis.
The top 50 genes with the largest numbers of reads in the non-normalized library included primarily genes for seed storage proteins, including those related to napin, legumin, and 11S globulin (Table 2). As an indication of the power of normalization for reducing the representation of these abundant genes, reads for transcripts of napin genes were reduced from 23,678 in the non-normalized library to 1,928 after normalization (Table 3). As a result, napin genes dropped from the most abundant in numbers of reads in the non-normalized cDNAs to the fifth most abundant in the normalized cDNAs. Normalization also uncovered a number of genes that were not detected in the non-normalized library, including transcripts for genes encoding β-luffin, RPL24A, RPS17, PATL1, enoyl-reductase, and LRR proteins.
Numbers of reads and contigs and most related homolog for each gene product are also shown. For comparison, the numbers of reads and contigs for each transcript in the non-normalized cDNA library are indicated.
Notably, β-luffin ranked number 3 in the normalized library with 2,110 reads, despite its absence from sequences in the non-normalized library. As a first step toward understanding eleostearic acid biosynthesis and metabolism, we attempted to identify all genes involved in lipid metabolism in developing bitter melon seeds. Normalized library sequences were compared with those in the annotated lipid metabolism gene database compiled at Michigan State University (http://lipids.plantbiology.msu.edu/) [33]. From this analysis, bitter melon cDNAs encoding enzymes for known lipid biosynthetic and catabolic pathways were identified. Almost all categories of lipid metabolism were represented, including synthesis of lipid in plastids and endomembranes, metabolism of acyl-lipids in mitochondria, synthesis and degradation of storage oil, lipid signaling, fatty acid elongation, wax and cutin metabolism, and a group of miscellaneous genes (Table 1, Additional File 3). Among the genes not detected in either library were those for enzymes in lipid degradation pathways, including DAD1-like acylhydrolase, allene oxide cyclase, and patatin-like acyl-hydrolase. Transcripts for several other hydrolases, including wax ester hydrolase, fatty acid ω-hydrolase, and alcohol-forming fatty acyl coenzyme A reductase, were also not detected in the bitter melon seed transcriptome. Most of these enzymes are involved in the synthesis of surface lipids, such as waxes in leaves and flowers that are not present in the embryo. In total, we identified 345 bitter melon transcripts that share homology with 189 known lipid genes in Arabidopsis (Table 4). This information is provided in a searchable database (Additional File 4).

Transcriptomic analysis and gene mining for eleostearic acid metabolism in bitter melon seeds
To gain a global perspective of lipid metabolism in bitter melon seeds, the abundance of transcript reads for enzymes involved in fatty acid and TAG biosynthesis was compiled from 454 data obtained from non-normalized and normalized cDNA populations (Figure 6). With regard to de novo fatty acid synthesis in plastids, the largest numbers of reads in the non-normalized cDNAs were transcripts for acyl carrier protein (ACP, 811 reads), 3-keto-acyl-ACP synthase (KAS) II (190 reads) and 3-ketobutyl-ACP reductase (160 reads) (Figure 6). It is also notable that 68 reads were detected for the FatA acyl-ACP thioesterase, but no reads were detected for the FatB acyl-ACP thioesterase in the non-normalized cDNAs. This difference in numbers of reads and the known substrate preferences of FatA (i.e., most active with 18:0- and 18:1-ACP) and FatB (i.e., most active with 16:0-ACP) [34][35][36] likely account for the relatively high content of stearic acid (18:0) and low content of palmitic acid (16:0) in bitter melon seeds (Figure 1). Compared to de novo fatty acid synthesis, transcript reads for ER-associated lipid enzymes were of low abundance in the non-normalized cDNAs, except for FAD2 Δ12-oleic acid desaturases (1,089 reads) and fatty acid conjugase (284 reads) (Table 4, Figure 6).
Table 4 Numbers of total transcript reads and contigs comprising different categories of lipid genes from 454 analysis of normalized cDNA from developing bitter melon seeds. Also shown are the numbers of unique homologs of known Arabidopsis genes in the assembled contigs. The categories are based on those used in the Arabidopsis Lipid Gene database [33].
For ER-associated TAG synthesis enzymes, normalization yielded significant enrichment of transcripts for enzymes including glycerol 3-phosphate acyltransferase 9 (GPAT9), phosphatidic acid phosphatase (PA Pase), diacylglycerol acyltransferase 1 (DGAT1), and phospholipid:diacylglycerol acyltransferase 1 (PDAT1). Normalization also uncovered transcripts not detected in the non-normalized cDNA population, including those for diacylglycerol acyltransferase 2 (DGAT2), phospholipase C-type enzymes (PLC), and CDP-choline:diacylglycerol cholinephosphotransferase (AAPT1). Notably, transcripts for the recently reported phosphatidylcholine:diacylglycerol cholinephosphotransferase (PDCT), a key enzyme in polyunsaturated fatty acid synthesis in Arabidopsis seeds [37], were not detected in either the non-normalized or normalized libraries. Similar to the findings reported here, no transcripts for PDCT were detected in a transcriptomic analysis of developing tung seeds, which also accumulate high levels of eleostearic acid (Shockey, unpublished data). These findings suggest that metabolic pathways independent of PDCT are associated with eleostearic acid metabolism. FAD3 transcripts for the Δ15 linoleic acid desaturase were also not detected in either cDNA library, which is consistent with the near absence of α-linolenic acid in developing bitter melon seeds. Eleostearic acid is synthesized by Δ12 conjugase activity on linoleic acid bound primarily to phosphatidylcholine [9].
Despite its site of synthesis, eleostearic acid is found nearly exclusively in TAG and accounts for < 2% of the fatty acids in phospholipids throughout the development of bitter melon seeds ( Figure 1A). This implies that eleostearic acid is efficiently metabolized after its synthesis on phosphatidylcholine to its point of accumulation in TAG. To study the flux of eleostearic acid from its synthesis on phosphatidylcholine to storage in TAG, genes for lipid metabolism enzymes involved in fatty acid esterification and removal from glycerol backbones and those catalyzing removal or transfer of phospholipid head groups are of particular interest. Among these enzymes are the two classes of diacylglycerol acyltransferases: DGAT1 and DGAT2. These enzymes catalyze the esterification of fatty acids from acyl-CoA pools to the sn-3 position of DAG to form TAG. DGAT1 has been shown to be a major enzyme associated with TAG synthesis in Arabidopsis seeds [38,39]. In contrast, studies of DGAT2 T-DNA mutants have failed to identify a role for this enzyme in TAG accumulation in Arabidopsis seeds [38]. However, transgenic expression of DGAT2-type enzymes from castor bean and Vernonia galamensis have been shown to enhance the accumulation of ricinoleic acid in Arabidopsis seeds and vernolic acid in soybean seeds, respectively [40,41]. Results from tung also suggest that DGAT2 may be important in eleostearic acid metabolism in seeds of this species [42,43]. In the 454 sequence data, five reads for transcripts for DGAT1 and zero reads for DGAT2 were detected in the non-normalized cDNA library ( Figure 6). Following normalization, 35 reads for DGAT1 transcripts and five reads for DGAT2 transcripts were detected ( Figure 6). It is unclear whether the larger number of reads for DGAT1 transcripts indicates that this enzyme, rather than DGAT2, is of greater importance for eleostearic acid metabolism in bitter melon seeds. 
We are currently exploring this question by coexpression of bitter melon DGAT1 and DGAT2 with the bitter melon fatty acid conjugase in Arabidopsis to compare the relative abilities of these enzymes to enhance eleostearic acid accumulation. To facilitate these studies, full-length cDNAs for bitter melon DGAT1 and DGAT2 have been isolated. These sequences represent a single gene for each DGAT class. Alignments of the amino acid sequences of these enzymes with known DGAT1 and DGAT2 polypeptides are shown (Figure 7A and Additional File 5). The bitter melon DGAT1 is most closely related to the grape DGAT1 (70% identity) and Euonymus alatus DGAT1 (68% identity) but is more distantly related to DGAT1 enzymes from Arabidopsis (64% identity), castor (65% identity), and tung (66% identity) (Figure 7B). The bitter melon DGAT2 is most closely related to its homolog in Arabidopsis (57% identity), but is more distantly related to DGAT2 enzymes from castor (50% identity) and tung (49% identity). As shown in Figure 7A and Additional File 5, the N-terminal regions of the bitter melon DGATs and other known DGATs are the most variable portion of these polypeptides. One possibility is that the N-termini of DGATs are important determinants of the substrate specificities of these enzymes, especially with regard to the metabolism of unusual fatty acids.
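Percent identity values such as those quoted above are derived from pairwise alignments. The sketch below shows one common way to compute the figure from two already-aligned sequences, counting only columns in which neither sequence has a gap. This denominator is an assumption: the text does not state which convention or software was used, and the short sequences in the example are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over ungapped columns of a pairwise alignment.

    Both inputs must be the same length, with '-' marking gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical aligned fragments, for illustration only.
identity = percent_identity("MKT-AL", "MRTWAL")
print(f"{identity:.1f}% identity")
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, so values from different tools are not always directly comparable.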
In addition to DGATs, phospholipid:diacylglycerol acyltransferases (PDATs) are important enzymes for the final acylation step in TAG synthesis. In this regard, the activities of PDAT1 (At5g13640) and DGAT1 were recently shown to account for the bulk of TAG synthesis in Arabidopsis seeds [38]. Arabidopsis also contains a PDAT1-related gene (At3g44830) that displays seed-specific expression [44]. The polypeptide encoded by At3g44830 shares 57% amino acid sequence identity with PDAT1; however, the function of this polypeptide has yet to be established.
PDATs catalyze the transacylation of fatty acids from phospholipid to the sn-3 position of DAG and share homology with the well-studied enzyme lecithin:cholesterol acyltransferase (LCAT), which catalyzes sterol ester synthesis in blood plasma [45]. In plants, PDAT activity with high specificity for the transfer of ricinoleic acid was identified in microsomes of castor bean [46], suggesting the possibility that PDAT-type activity may also be an important contributor to eleostearic acid metabolism in bitter melon seeds. In our normalized cDNA library, two contigs (McCtg3028 and McCtg2714) were identified with closest relation to the PDAT1-like gene At3g44830. These two contigs were confirmed by PCR amplification to derive from one gene, which was designated McPDAT1 (data not shown). No close homologs for the At5g13640-encoded PDAT1 were detected in the non-normalized or normalized bitter melon sequence data. The role of McPDAT1 in eleostearic acid metabolism is currently being explored.
In addition to DGATs and PDATs, transcripts for numerous other enzymes that may be specialized for eleostearic acid metabolism were detected in the non-normalized and normalized libraries. These include transcripts for lysophosphatidic acid acyltransferase (LPAT), phospholipase A2- and phospholipase C-related enzymes, and lysophosphatidylcholine acyltransferase (LPCAT).

Identification of a FAD2 variant in developing bitter melon seeds
An unexpected finding from the transcriptomic analysis of developing bitter melon seeds was the discovery of two divergent forms of FAD2, the Δ12 oleic acid desaturase. A detailed analysis of FAD2 contigs revealed two related but different sequences. One, designated "McFAD2", was reported previously [10], and the second is an evolutionarily divergent FAD2 designated "McFAD2v" ("v" for variant). McFAD2 and McFAD2v share 69% amino acid sequence identity (Figure 8A). In addition, McFAD2 and McFAD2v share 60% and 63% amino acid sequence identity with the bitter melon conjugase, respectively. McFAD2v is most closely related to a variant FAD2 identified in seeds of snake gourd (Trichosanthes kirilowii) that synthesizes the conjugated fatty acid punicic acid (18:3Δ9cis,11trans,13cis) (Figure 8B) [12]. Although it is more distantly related to McFAD2, McFAD2v lacks amino acid substitutions in the proximity of the catalytic His box domains that are characteristic of functionally variant FAD2-type enzymes (e.g., conjugase, epoxygenase, hydroxylase) [47], suggesting that McFAD2v is likely a typical Δ12 oleic acid desaturase (Figure 8A).
To establish the functions of McFAD2 and McFAD2v, the open reading frames of these enzymes were placed under control of the GAL10 promoter in the pESC-URA vector and expressed in Saccharomyces cerevisiae. In galactose-induced cultures for both desaturases, production of 16:2 and 18:2 was detectable (Figure 8C). Neither fatty acid was detected in induced cultures containing the pESC-URA vector lacking a cDNA insert. In addition, no conjugated fatty acids were detected in the induced McFAD2 or McFAD2v cultures. As a control, the bitter melon conjugase was also expressed in S. cerevisiae. Unlike McFAD2 and McFAD2v, the conjugase displayed mixed functionality, generating small amounts of 16:2 and 18:2 from Δ12 desaturase activity as well as eleostearic acid from conjugase activity with 18:2 (data not shown). These results indicate that both McFAD2 and McFAD2v function as Δ12 oleic acid desaturases despite their divergent sequences.
Gene expression studies were conducted to understand the basis for two functional Δ12 oleic acid desaturases in developing bitter melon seeds. Using RT-PCR, expression profiles of genes for McFAD2, McFAD2v, and the conjugase were obtained during seed development (Figure 8D). Interestingly, expression of the conjugase gene most closely mirrored the timing for expression of McFAD2v during seed development. By comparison, expression of McFAD2 was detected earlier in seed development. Given the similarity in their gene expression patterns, McFAD2v may have evolved to function in concert with the conjugase for eleostearic acid synthesis in bitter melon seeds. In this regard, the Δ12 oleic acid desaturase provides the linoleic acid substrate for production of eleostearic acid by the conjugase. It is notable that transgenic expression of the bitter melon conjugase in Arabidopsis seeds and soybean somatic embryos results in large increases in the relative content of oleic acid, in a manner consistent with the apparent inhibition of native Δ12 oleic acid desaturase activity [9,10]. For example, relative amounts of oleic acid in seeds of non-transformed Arabidopsis Col-0 fad3/fae1 increase from ~28% of the total fatty acids to nearly 55% of the total fatty acids in seeds that express the bitter melon conjugase [9]. Although a number of biochemical scenarios could be proposed, one possibility for future study is that McFAD2v and the conjugase functionally interact to maintain efficient synthesis of eleostearic acid in bitter melon seeds. The role of two FAD2-related enzymes in the synthesis of an unusual fatty acid has previously been demonstrated in the synthesis of dimorphecolic acid in Dimorphotheca sinuata seeds [15].

Conclusions
Deep sequencing of developing bitter melon seeds was conducted to identify candidate genes that are associated with the synthesis of the conjugated fatty acid eleostearic acid and the efficient metabolism of eleostearic acid from its synthesis on phosphatidylcholine to storage in TAG. By use of 454 pyrosequencing of non-normalized cDNAs derived from bitter melon seeds at 18 DAP, 190 contigs with homology to 83 known lipid genes in Arabidopsis were obtained from 10,072 total contigs. The discovery of lipid genes was significantly enhanced through the normalization of cDNAs based on the use of duplex-specific nuclease. 454 sequence data from a normalized library generated 345 contigs with homology to 189 known lipid genes in Arabidopsis from 18,245 total contigs, although the total number of clean reads from the normalized library was 22% lower than that obtained from the non-normalized library. Overall, transcriptomic analysis of bitter melon seeds using 454 technology yielded sequence data for genes encoding all of the known fatty acid biosynthetic enzymes and nearly all of the known ER-associated fatty acid modification and metabolic enzymes, including acyltransferases such as DGAT1, DGAT2, and a PDAT1-related enzyme that are likely central to efficient metabolism of eleostearic acid. Also identified in the transcriptomic analysis was a divergent FAD2 that was demonstrated to have Δ12 oleic acid desaturase activity and may be important in the synthesis of eleostearic acid. The sequence information from developing bitter melon seeds has been made publicly available in a searchable format (http://genomics.msu.edu/JO/blast/blast.html; Additional File 4) and will likely serve as a useful resource for studies of unusual fatty acid metabolism in plants and for engineering of conjugated fatty acid production in oilseed crops.

Growth conditions of plants and collection of seeds
Momordica charantia L. was grown under short-day conditions with 8 h light at 25°C/16 h dark at 21°C, 50% humidity, and 600 μmol m⁻¹ s⁻¹ of light. Independent male flowers were used for hand pollination of female flowers. Embryos were dissected from seeds of fruits collected at specific DAP and frozen immediately in liquid nitrogen. Embryos were stored at -80°C until use in RNA isolation or lipid analysis.

Lipid analysis of bitter melon embryos
Total lipids were extracted from frozen bitter melon embryos as described [9] using a modified version of the method reported by Bligh and Dyer [48]. Neutral lipids (consisting predominantly of TAGs), glycolipids, and phospholipids were partitioned from the total lipids by solid phase extraction (SPE) using commercially prepared silica columns (500 mg silica bed; Fisher Scientific). The total lipid extract was dissolved in one ml of chloroform and applied to an SPE column that had been equilibrated in chloroform. Neutral lipids were eluted with ten ml of chloroform and five ml of chloroform:acetone (80:20 v/v). Glycolipids were then eluted with seven ml of acetone. Phospholipids were subsequently eluted with five ml of methanol:chloroform:water (100:50:40 v/v/v). To the phospholipid fraction, 1.3 ml of water and 1.3 ml of chloroform were added. After mixing and centrifugation, the lower organic phase containing phospholipids was recovered. The neutral and phospholipid fractions in glass screw cap test tubes (13 × 100 mm) were dried under nitrogen and then transesterified with the addition of 1.5 ml of 1% sodium methoxide in methanol (w/v) and 0.2 ml of toluene. For quantification of fatty acids, triheptadecanoin (Nu-Chek, Elysian, Minnesota USA) was also added to each fraction as an internal standard. Transesterification and subsequent recovery and analysis of fatty acid methyl esters by gas chromatography were conducted as previously described [9].
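As an illustration of how an internal standard supports quantification, the following minimal sketch converts gas chromatography peak areas into fatty acid amounts relative to a known quantity of 17:0 derived from triheptadecanoin (three 17:0 chains per molecule). It assumes an equal detector response per mole for all fatty acid methyl esters. The function names, peak areas, and amount of standard are hypothetical and are not values from this study.

```python
def standard_fatty_acid_nmol(triacylglycerol_nmol):
    """Each triheptadecanoin molecule yields three 17:0 methyl esters."""
    return 3.0 * triacylglycerol_nmol


def quantify_fatty_acids(peak_areas, standard_name, standard_nmol):
    """Convert peak areas to nmol using the internal standard peak.

    Assumption (hypothetical): equal detector response per mole for every
    fatty acid methyl ester, so amount scales linearly with peak area.
    """
    standard_area = peak_areas[standard_name]
    if standard_area <= 0:
        raise ValueError("internal standard peak area must be positive")
    nmol_per_area = standard_nmol / standard_area
    return {
        name: area * nmol_per_area
        for name, area in peak_areas.items()
        if name != standard_name
    }


def mole_percent(amounts):
    """Express each fatty acid as a percentage of the total amount."""
    total = sum(amounts.values())
    return {name: 100.0 * value / total for name, value in amounts.items()}


# Hypothetical peak areas (arbitrary units); 17:0 is the internal standard.
example_areas = {"16:0": 200.0, "18:1": 300.0, "18:2": 500.0, "17:0": 1000.0}
# Hypothetical spike of 50 nmol triheptadecanoin, i.e. 150 nmol of 17:0.
example_standard = standard_fatty_acid_nmol(50.0)
example_nmol = quantify_fatty_acids(example_areas, "17:0", example_standard)
example_percent = mole_percent(example_nmol)
```

With these made-up inputs, the three fatty acids scale in proportion to their peak areas, and the standard itself is excluded from the composition.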

RNA extraction and RT-PCR analysis
RNA was extracted from bitter melon seeds using the Trizol reagent as described by the manufacturer (Invitrogen). RT-PCR was carried out using the Advantage RT-for-PCR kit from BD Biosciences Clontech. In brief, 1 μg of total RNA was reverse transcribed, and the cDNA was used in PCR reactions to amplify the corresponding genes with FAD2-, conjugase-, oleosin-, or β-tubulin-specific primers. The conjugase primers were 5'

cDNA library construction and normalization
Developing bitter melon seeds were ground to a fine powder in liquid nitrogen, and total RNA was extracted using Trizol reagent (Invitrogen) according to the manufacturer's protocol. mRNA was purified from ~1 mg of total RNA by two passes through oligo-dT cellulose columns by use of the Illustra mRNA purification kit (GE Healthcare). cDNA libraries were constructed using the SMART PCR cDNA synthesis kit (Clontech). First-strand cDNA was synthesized with 150 ng mRNA in a volume of 10 μl using the provided SMART II primer, a modified CDS III/3' cDNA synthesis primer (5'-AAGCAGTGGTATCAACGCAGAGTGGCCGAGGCGGCCGACATGTTTTGTTTTTTTTTCTTTTTTTTTTVN-3'), and Superscript II reverse transcriptase (Invitrogen). Double-stranded cDNA was prepared by PCR (18 cycles) using 2 μl of the first-strand reaction in a 50 μl reaction volume. Following Proteinase K treatment, four PCR reactions were pooled before SfiI digestion and size fractionated on the provided CHROMA SPIN-400 column. Only fractions containing fragments larger than 500 bp were collected, precipitated, and resuspended in TE buffer. Library normalization of this cDNA was conducted by use of the Trimmer-Direct cDNA normalization kit (Evrogen). Briefly, four 250 ng aliquots of cDNA were hybridized at 98°C for 2 min followed by 68°C for 5 h. The hybridized cDNAs were then treated with 0, 0.25, 0.5, and 1 μl duplex-specific nuclease (DSN), respectively, before the reactions were stopped with the DSN stop buffer. cDNA (1 μl) from each aliquot was subjected to PCR amplification. Based on the results from the sample lacking DSN, the cycle number (9+2 cycles) was determined for the first-round amplification of DSN-treated samples. After examination of the cDNAs on an agarose gel, the selected aliquot of cDNA was diluted 10-fold and subjected to a second round of PCR using 2 μl in a 100 μl reaction (12 cycles). The amplified cDNA pool was then treated with proteinase K, fractionated, and precipitated as for the non-normalized cDNA library construction.
For the pilot study, cDNA PCR fragments were digested with SfiI enzyme and cloned into SfiIA and SfiIB sites of pDNR-LIB vector (Clontech).

Sequencing and data analysis
DNA sequencing was performed at the Michigan State University Research Technology Support Facility using the GS FLX sequencer (Roche). Reads were trimmed to remove low-quality and primer sequences using SeqClean [49]. The reduced dataset then underwent two rounds of assembly with CAP3. First-round CAP3 parameter settings for percent match, overlap length, maximum overhang percent, gap penalty, and base quality cutoff for clipping were -p 90 -o 50 -h 15 -g 2 -c 17, respectively. For the second round, -o was changed to 100. The resultant contigs were then annotated with a translated BLAST against TAIR7 and the Viridiplantae subdivision of the NCBI nonredundant protein databases.
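The two assembly rounds described above can be written down as command lines. The sketch below only builds the argument lists and does not run the assembler. The flag values come from the text, while the input file names, the executable name on the search path, and the choice of what is fed into the second round are assumptions made for illustration.

```python
# CAP3 settings stated in the text; only the overlap length (-o) differs
# between the two rounds.
SHARED_OPTIONS = {
    "-p": 90,  # percent match
    "-h": 15,  # maximum overhang percent
    "-g": 2,   # gap penalty
    "-c": 17,  # base quality cutoff for clipping
}
OVERLAP_LENGTH_BY_ROUND = {1: 50, 2: 100}


def cap3_command(fasta_path, assembly_round):
    """Return the CAP3 argument list for one assembly round."""
    overlap = OVERLAP_LENGTH_BY_ROUND[assembly_round]
    command = ["cap3", fasta_path, "-p", str(SHARED_OPTIONS["-p"]),
               "-o", str(overlap)]
    for flag in ("-h", "-g", "-c"):
        command += [flag, str(SHARED_OPTIONS[flag])]
    return command


# Hypothetical file names: trimmed reads for round one, and the round-one
# output sequences for round two (the text does not specify this input).
round_one = cap3_command("trimmed_reads.fasta", 1)
round_two = cap3_command("round_one_sequences.fasta", 2)

print(" ".join(round_one))
print(" ".join(round_two))
```

Each list could be handed to `subprocess.run` on a system where CAP3 is installed; keeping the shared options in one place makes it explicit that only the overlap length changes between rounds.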
Sequence data have been deposited in the GenBank Short Read Archive (SRA). The accession number for the project in NCBI SRA is SRP004091. The accession numbers in NCBI SRA for the individual experiments are SRX030203 (normalized sequence data) and SRX030204 (non-normalized sequence data). The assembled sequence data are also available in a searchable format at http://genomics.msu.edu/JO/blast/blast.html, and lipid gene data are compiled in Additional File 4.

Sequence alignment and phylogenetic analysis
Protein sequences were aligned using the Clustal W Multiple Sequence Alignment Program [50] with the Gonnet protein weight matrix (gap open penalty = 10, gap extension penalty = 0.2, gap separation distance = 4) and displayed by GeneDoc [51]. Phylogenetic trees of protein sequences (aligned with Clustal W) were generated in MEGA4.0.1 [52] using the neighbor-joining method [53]. Pairwise deletion was used for handling of sequence gaps, and 2000 bootstrap replicates were performed. The evolutionary distances were computed using the Poisson correction method [54].
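To make the distance calculation concrete, the following minimal sketch computes a Poisson-corrected distance, d = -ln(1 - p), for one pair of aligned protein sequences, where p is the proportion of differing sites. Pairwise deletion is modeled by skipping any column in which either sequence of the pair has a gap. This is an illustrative re-implementation rather than the MEGA code, and the two toy sequences are hypothetical.

```python
import math


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites, ignoring columns gapped in this pair."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for residue_a, residue_b in zip(seq_a, seq_b):
        if residue_a == gap or residue_b == gap:
            continue  # pairwise deletion: skip only for this pair
        compared += 1
        if residue_a != residue_b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected number of substitutions per site."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        raise ValueError("distance is undefined when every site differs")
    return -math.log(1.0 - p)


# Hypothetical aligned fragments: one gapped column and one substitution.
toy_a = "MKV-LAGT"
toy_b = "MKVALSGT"
toy_p = p_distance(toy_a, toy_b)
toy_d = poisson_distance(toy_a, toy_b)
```

In the toy pair, seven ungapped columns are compared and one differs, so p is 1/7 and the corrected distance is slightly larger than p, reflecting the allowance for multiple substitutions at the same site.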