Transcriptome analysis of ectopic chloroplast development in green curd cauliflower (Brassica oleracea L. var. botrytis)

Background: Chloroplasts are the green plastids where photosynthesis takes place. Chloroplast biogenesis requires the coordinated expression of both nuclear and chloroplast genes and is regulated by developmental and environmental signals. Despite extensive study of this process, the genetic basis and the regulatory control of chloroplast biogenesis and development remain to be elucidated.
Results: The green curd cauliflower mutant develops chloroplasts ectopically in the curd tissue, turning the otherwise white curd green. To investigate the transcriptional control of chloroplast development, we compared gene expression between green and white curds using the RNA-seq approach. Deep sequencing produced over 15 million 86-bp reads from each cDNA library. A total of 7155 genes exhibited at least 3-fold changes in expression between green and white curds. These included light-regulated genes, genes encoding chloroplast constituents, and genes involved in chlorophyll biosynthesis. Moreover, the cauliflower ELONGATED HYPOCOTYL5 gene (BoHY5) was expressed at a higher level in green curds than in white curds, and 2616 HY5-targeted genes, comprising 1600 up-regulated and 1016 down-regulated genes, were differentially expressed in green curd tissue relative to white curd tissue. All 1600 up-regulated genes were HY5-targeted genes in the light.
Conclusions: Genome-wide profiling of gene expression by RNA-seq in green curds identified large numbers of genes associated with chloroplast development, and suggested that regulatory genes high in the hierarchy of light signaling pathways mediate the ectopic chloroplast development in the green curd cauliflower mutant.


Background
Chloroplast biogenesis from proplastids requires coordinated expression of nuclear and chloroplast genes [1], and is largely regulated by developmental and environmental cues such as light. Approximately 3000 proteins in chloroplasts are encoded by the nucleus [2]. They participate in a large number of functional processes that are required for chloroplast biogenesis. These processes include import of nuclear-encoded proteins through the Toc/Tic complexes, protein assembly and disassembly with chaperone proteins, thylakoid formation, pigment synthesis, plastid division, and retrograde signaling [3,4]. In addition, a great number of proteins localized outside chloroplasts, such as photoreceptors, light-signaling transducers, and transcription factors, have been shown to be involved in chloroplast development [3,4]. Most genes belonging to these two classes are essential for chloroplast development, since suppression of their expression leads to impaired chloroplasts. However, some light signaling pathway genes, such as Constitutive photomorphogenic 1 (COP1), COP10, COP11, De-etiolated 1 (DET1) and Phytochrome-interacting transcription factor 3 (PIF3), function as suppressors of light-regulated gene expression, and loss-of-function mutations of these genes result in ectopic chloroplast development [5][6][7]. In contrast, Elongated Hypocotyl 5 (HY5), which acts downstream of multiple families of photoreceptors [8][9][10], has been genetically characterized as a positive regulator of photomorphogenesis under a broad spectrum of light and affects chloroplast development [4,11]. Overexpression of HY5-ΔN77 has been shown to result in precocious development of chloroplasts in the hypocotyls [12]. Determining how these genes are coordinately expressed during chloroplast development requires a genome-wide examination of gene expression during the transition from non-colored plastids into chloroplasts.
Mutations in model and other plant species are important resources for functional genomics studies. Analyses of plastid development mutants have identified important regulatory genes of plastid development. For example, ARC6, the first gene discovered to have a global effect on plastid differentiation in higher plants, was identified from the Arabidopsis mutant arc6 [13]. The Orange (Or) gene, which encodes a protein containing a zinc-finger DnaJ cysteine-rich domain, was isolated from the orange curd cauliflower mutant and has been shown to be responsible for the conversion of leucoplasts into chromoplasts [14]. The green curd cauliflower mutant is a spontaneous mutant with an abnormal pattern of chloroplast development in curds. Compared with other mutants in which chloroplast development is impaired, the green curd mutant is unique in turning otherwise non-photosynthetic white tissue green through the ectopic development of chloroplasts in the inflorescence meristematic cells. The mutation in the green curd cauliflower could involve gene(s) sufficient for chloroplast development, although it is also possible that white curd cauliflower carries a genetic mechanism for suppressing chloroplast development, which the green curd mutation would in turn suppress.
In the present study, we profiled gene expression in green and white curds on the genome scale using the RNA-seq approach. We assembled 118,000 unigenes with an average length of 406 bp from cDNA libraries of green and white curds and detected 7155 differentially expressed genes with a change in expression of at least 3-fold. Among them are a large number of genes associated with chloroplast development. We also observed that BoHY5 was expressed at a higher level in green curds than in white curds and that 2616 HY5-targeted genes were differentially expressed. Among these HY5-targeted genes, all 1600 up-regulated genes were found to be HY5-targeted genes in the light in Arabidopsis, suggesting a role of BoHY5 in the ectopic chloroplast development in the green curd cauliflower mutant.

Cauliflower mutant with green curds
Cauliflower curd is composed of inflorescence meristems that normally contain proplastids and leucoplasts and is therefore white [15]. In the commercially available green cauliflower mutant, chloroplasts develop in the curd, turning the otherwise white tissue green (Figure 1a and 1b). While the mutant plants produced green curds under normal growth conditions in the greenhouse and in the field, the intensity of the green hue in the curd tissues was affected by light intensity. Under field growth conditions, the curd tissues exposed to direct sunlight were dark green and those grown in shade were less green. Autofluorescence of chlorophyll in chloroplasts was clearly observed in the green curd cells under the confocal microscope (Figure 1c and 1d). To investigate chloroplast development in the green curd mutant, we first measured chlorophyll content in young leaf and curd tissues. A higher level of total chlorophyll was detected in leaf tissue of green cauliflower plants than in that of the white control. The concentration of chlorophyll in green curd cauliflower leaves was 1780.4 μg/g fresh weight (FW), while that in the white curd leaves was 1056.6 μg/g FW. Although the two samples differed in total chlorophyll, the chlorophyll a/b ratio in leaves of the white control and the green mutant was similar at 2.70:1 and 2.75:1, respectively. In comparison to leaf tissue, the chlorophyll level in the curd of green cauliflower was lower at 344.4 μg/g FW. The chlorophyll a/b ratio was 3.43:1, showing that the accumulation of chlorophyll a was much greater than that of chlorophyll b in green curds (Figure 1e). As expected, no chlorophyll accumulation was detected in the white curd tissue. The green curd cauliflower mutant therefore serves as an excellent model system for investigating the genetic basis of chloroplast biogenesis in plants.

Comparative analysis of gene expression between green and white curd cauliflower
To investigate the transcriptional control of chloroplast development, RNA-seq was employed to monitor differences in gene expression between the green curd mutant and the white cauliflower. A single lane of an Illumina GAII run was used for each library, and more than 15 million 86-bp reads were produced from each lane. Since no full genome sequence is currently available for cauliflower (Brassica oleracea) and the genomics resources from other Brassica species are not applicable due to the short length of RNA-seq reads, we developed a novel analysis strategy for our RNA-seq data as described in the Methods section. A total of 118,000 unigenes (including alternatively spliced isoforms) with an average length of 406 bp were obtained. Statistical analysis identified 7155 unigenes that were differentially expressed between the green curd mutant and the white curd control. Among them, 4436 genes (3.76%) were expressed at least 3-fold higher (Additional file 1) and 2719 genes (2.30%) were expressed at least 3-fold lower in green curd than in white curd (Additional file 2). Functional categorization revealed that these genes were largely involved in cellular process (1317), response to stress (980), metabolic process (810), response to abiotic stimulus (654), and biosynthetic process (574). Yet, a large group of genes (3602) remained unclassified (Figure 2).
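The counts and percentages quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers reported in the text; the function and constant names are illustrative and not part of the original analysis.

```python
# Counts reported in the text.
TOTAL_UNIGENES = 118_000
UP_IN_GREEN = 4436     # at least 3-fold higher in green curd
DOWN_IN_GREEN = 2719   # at least 3-fold lower in green curd


def percent_of_unigenes(count, total=TOTAL_UNIGENES):
    """Return `count` as a percentage of all assembled unigenes."""
    return 100.0 * count / total


# Total number of differentially expressed unigenes.
total_de = UP_IN_GREEN + DOWN_IN_GREEN

print("differentially expressed:", total_de)
print("up-regulated   (%):", round(percent_of_unigenes(UP_IN_GREEN), 2))
print("down-regulated (%):", round(percent_of_unigenes(DOWN_IN_GREEN), 2))
```

The up- and down-regulated sets sum to the 7155 differentially expressed unigenes, and the percentages are relative to the full set of 118,000 unigenes.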

Metabolic pathway changes
To identify the metabolic pathways that were affected in the green curd mutant, a cauliflower metabolic pathway database was created based on annotation of the assembled cauliflower unigenes. The significantly affected pathways were identified by using the Plant MetGenMAP analysis system http://bioinfo.bti.cornell.edu/cgi-bin/MetGenMAP/home.cgi [17]. A total of 198 specific metabolic pathways were significantly changed in green curd mutant (p < 0.01) (Additional file 3). As expected, many metabolic pathways involved in chloroplast biogenesis and function were significantly altered. These included those associated with chlorophyll biosynthesis, such as chlorophyllide a biosynthesis I, chlorophyll a biosynthesis I, chlorophyll a biosynthesis II, chlorophyll a degradation, and chlorophyll cycle, as well as with carotenoid biosynthesis (Additional file 3). In addition, those pathways associated with photosynthesis, such as oxygenic photosynthesis, Calvin cycle, and photorespiration, and with other metabolic processes that take place in chloroplasts, such as amino acid biosynthesis and starch biosynthesis, were also significantly changed (Additional file 3).

Genes involved in chloroplast formation
Chlorophylls and carotenoids compose the photosynthetic pigments that play key roles in photosynthesis. Many genes involved in chlorophyll biosynthesis were found to be expressed highly in green curd in comparison with white (Figure 3). The upregulated genes included Mg-chelatase that plays a key regulatory role in chlorophyll biosynthesis. Genomes uncoupled 4 (GUN4, PP005347) and Genomes uncoupled 5 (GUN5, PP031929) involved in chlorophyll biosynthesis were also expressed at higher levels in green curds. These two genes are among those that produce plastid-to-nuclear retrograde signaling molecules [18,19]. The upregulation of many genes in chlorophyll biosynthesis resulted in the accumulation of chlorophyll a and b in chloroplasts. Concomitantly, a number of genes involved in carotenoid biosynthesis were also up-regulated (Table 2), suggesting an increased capacity for the synthesis of photosynthetic pigments. Consistent with the accumulation of chlorophyll a and b in green curds, genes encoding chlorophyll binding proteins were also upregulated (Table 2). Moreover, genes encoding photosystem I and photosystem II proteins were among the upregulated genes (Table 2), indicating the development of chloroplast structures in the green curd tissue.
In addition to the enhanced biosynthesis of photosynthetic apparatus, genes involved in a number of other chloroplast biogenesis processes were also differentially expressed in green curd mutant. TRANSLOCON AT THE OUTER ENVELOPE MEMBRANE OF CHLOROPLASTS 34 (Toc34) and Toc159 are important parts of the Toc/Tic complexes mediating protein import from cytosol [1]. High levels of Toc34 (PP019500 and PP051864) and Toc159 (PP013646 and PP007289) transcripts were observed in the green curds. Proteins imported into chloroplasts need to be properly assembled and folded, a process that is mediated by a group of chaperone proteins, such as HSP70 and Cpn60, and protein disulfide isomerase [3,20,21]. Accordingly, chaperone HSP70 (PP031462, PP020739, and PP094292) and protein disulfide isomerase (PP012584, PP000051, and PP028760) were found to be significantly upregulated in green curds (Additional file 1).
Chlorophyllase catalyzes degradation of chlorophyll a to yield chlorophyllide and phytol [22]. Chlorophyllase (PP095462) was expressed at a lower level in green curds than in white curds, which could account for the accumulation of chlorophyll a in green curds (Figure 3).

Signaling genes for chloroplast biogenesis
The large number of differentially expressed genes between the green curd mutant and white curd cauliflower suggests that genes high in the hierarchy of the signal transduction cascade could be involved. COP/DET/FUS proteins are a group of evolutionarily conserved proteins that represent central repressors of photomorphogenesis including chloroplast development [11]. No changes were detected in the expression of COP1, COP10, COP11, and DET1. The COP9 complex acts as a suppressor of chloroplast development [5,23]. Unexpectedly, we found that COP9 (PP010178) and FUSCA 12 (FUS12)/COP9 signalosome complex subunit 2 (PP014936) were expressed at higher levels in green curds than in white curds. Such higher expression could result from negative feedback, as in the case of SPA1, a partner of COP1, which is frequently found to be light induced [24,25]. PIFs are another group of regulators that repress photomorphogenesis. No changes were observed in the expression of PIF3 and PIF4 in green vs. white curds. Interestingly, the transcript level of PIL2 (PP058986) was increased in the green curd mutant.
In contrast to those photomorphogenesis repressors, HY5 is a key regulator that promotes photomorphogenic development in all light conditions and directly regulates light-responsive gene expression [8,9,26-28]. Here, we found that BoHY5 (PP014970 and PP017071) and BoHY5-HOMOLOG (PP001428) were expressed at higher levels in green curds than in white curds (Table 1 and Figure 4c). A recent study on genome-wide mapping of the HY5-mediated gene networks in Arabidopsis reveals that HY5 could potentially bind to 11,797 genes, of which 2770 and 2191 are light- and dark-regulated genes, respectively [26]. Sequence comparison with the HY5-targeted genes in Arabidopsis revealed that a total of 2616 cauliflower homologs of HY5-targeted genes were differentially expressed in green curds (Figure 4a). These included 1600 up-regulated genes and 1016 down-regulated genes (Additional file 4 and 5). All of the 1600 up-regulated genes were found to be HY5-targeted genes in the light, while 48 down-regulated genes were HY5-targeted genes in the dark (Figure 4b).
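The comparison described above amounts to intersecting the differentially expressed unigenes with a list of HY5 targets through a homolog mapping. The following is a minimal sketch of that bookkeeping, not the authors' pipeline: the function name, the data layout, and all gene identifiers in the toy example are hypothetical, and the step that assigns each unigene its Arabidopsis homolog is assumed to have been done beforehand.

```python
def classify_hy5_targets(de_genes, homolog_of, hy5_targets):
    """Split differentially expressed unigenes whose homolog is an HY5 target.

    de_genes:    dict, unigene ID -> log2 fold change (green vs. white curd)
    homolog_of:  dict, unigene ID -> Arabidopsis gene ID of its homolog
    hy5_targets: set of Arabidopsis gene IDs reported as HY5 targets
    Returns (up, down): two sets of unigene IDs.
    """
    up, down = set(), set()
    for unigene, log2fc in de_genes.items():
        # Unigenes without a mapped homolog yield None and are skipped.
        if homolog_of.get(unigene) in hy5_targets:
            (up if log2fc > 0 else down).add(unigene)
    return up, down


# Toy data with made-up identifiers, for illustration only.
de = {"U1": 2.1, "U2": -1.8, "U3": 3.0, "U4": 1.7}
homolog = {"U1": "AT_A", "U2": "AT_B", "U3": "AT_C"}
targets = {"AT_A", "AT_B"}

up, down = classify_hy5_targets(de, homolog, targets)
print("up-regulated HY5-target homologs:", sorted(up))
print("down-regulated HY5-target homologs:", sorted(down))
```

Restricting `hy5_targets` to the light-regulated or dark-regulated subset would give the kind of breakdown summarized for Figure 4b.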

Discussion
Large-scale transcriptome sequencing by next generation sequencing platforms such as the Illumina GA sequencing system has proven to be a powerful and efficient approach for gene expression analysis at the genome level and offers several advantages over microarray technologies [29]. Since the RNA-seq approach provides a digital representation of gene abundance and the statistics are well modeled by the Poisson distribution, even a single replication has been shown to be adequate [30]. Currently, the RNA-seq approach is widely used to investigate transcriptomes of plants and animals, especially those with whole genome sequences [31]. A number of tools to map RNA-seq data to reference genomes and to quantify the expression of transcripts have been developed [32]. However, relatively few studies have applied the RNA-seq approach to organisms without reference genomes. In this report we employed the RNA-seq approach to investigate the gene expression changes in a green curd mutant in order to elucidate the genetic basis of chloroplast biogenesis and development. RNA-seq reads along with publicly available ESTs of cauliflower were assembled de novo using a novel assembly strategy as described in the Methods section. A total of 118,000 unigenes were obtained and 7155 genes showed at least 3-fold changes in expression in the green curd mutant. Among them, a large number of genes involved in photomorphogenesis including chloroplast development were revealed, demonstrating a successful use of the RNA-seq approach to profile gene expression in a species without a fully sequenced genome.
Chloroplast biogenesis and development proceed with the coordinated action of many processes [3,4]. Both environmental signals and plastidic/nuclear factors affect these processes. Light regulation of chloroplast development has been well documented [3,4,33]. The light signaling pathways are composed of phytochromes, transcription factors and numerous intermediates which control photomorphogenesis including chloroplast development. The COP/DET/FUS proteins are suggested to have a function in suppressing chloroplast development in non-photosynthetic tissues [4]. Loss-of-function mutations of these regulators, such as cop1 and det1, have been shown to result in ectopic chloroplast development, leading to greening in Arabidopsis roots [5,6]. The fact that the transcripts of COP1 and DET1 remained unchanged while a large number of light-responsive genes were altered in green curds of cauliflower suggests that other regulatory genes in the hierarchy of photomorphogenic regulation are responsible for chloroplast development in the green curd.
In the light signaling cascade, HY5 plays an important role in light signaling and chloroplast development. HY5 receives upstream signals and activates a large number of genes by directly binding to the G-box in the promoters of these genes [9,26,27]. Here, we observed a higher level of HY5 transcript in the green curd mutant. Furthermore, 2616 cauliflower homologs of HY5-targeted genes were differentially expressed in green curds. Notably, among the 2616 genes, 1600 were up-regulated in green curd cauliflower. The fact that all 1600 up-regulated genes were HY5-targeted genes in the light suggests an important role of elevated expression of BoHY5 in mediating chloroplast development in the green curd cauliflower mutant. Furthermore, it is known that COP1 negatively controls HY5 activity [12]. Although COP1 was expressed at the same level in green curds and white curds, we found that CIP1 expression was significantly reduced in green curds (Figure 4c). Arabidopsis CIP1 is associated with the cytoskeleton and has been hypothesized to affect partitioning of COP1 between the nucleus and cytoplasm [34]. It is possible that COP1 activity in the nucleus might be affected by the low level of CIP1, causing ectopic chloroplast development in green curds. Thus, BoHY5 and/or other genes high in the hierarchy of the signal transduction cascade could be responsible, alone or in concert, for regulating chloroplast biogenesis and development in otherwise white tissue to give rise to the striking green curd mutant phenotype.
Ultimately, the development of chloroplasts requires the coordinated action of a number of processes, including the biosynthesis of photosynthetic complexes, transportation of nuclear encoded proteins into chloroplasts, processing of the imported proteins, and assembly of the photosynthetic apparatus [3,4]. Indeed, many genes involved in photosynthetic pigment biosynthesis along with pigment-binding proteins such as chlorophyll a/b binding proteins were discovered to be upregulated in our genome-wide profiling of green curd cauliflower. The majority of chloroplast proteins are nucleus-encoded and enter the chloroplasts via the Toc/Tic translocon complexes [1]. The increased expression of Toc genes in the green curd mutant supports an enhanced activity of chloroplast-targeted protein import. The imported proteins are folded and processed to form functional proteins. Molecular chaperones HSP70 and Cpn60 have long been known to be involved in this process [3,20]. A recent study shows that a protein disulfide isomerase is also required for protein folding [21]. Consistent with the increased activity of protein import, genes associated with protein folding and assembling were expressed highly in the green curd mutant for chloroplast development.

Conclusions
In the present study, we compared gene expression on a genome-wide scale by using RNA-seq in a species without a reference genome. This study identified a great number of genes associated with chloroplast development and suggested a potential role of elevated expression of BoHY5 and/or other regulatory genes high in the hierarchy of light signaling pathways in the ectopic chloroplast development in green curd cauliflower. Our results indicate that RNA-seq, as a powerful tool in the genomic era, could accelerate the functional identification of genes and aid in dissecting the genetic basis of naturally occurring variations in crops.

Plant materials
White curd cauliflower cultivar Stovepipe (Brassica oleracea L. var. botrytis) and the green curd mutant line ACX800 were used in this study. Cauliflower plants were grown either in a greenhouse under 14-h-light/10-h-dark cycle at 23°C or in a field. In the greenhouse, the natural daylight was supplied by full-spectrum lamps with the light intensity at 400 μmol photons m⁻² s⁻¹. Fresh curd tissues were harvested, immediately frozen in liquid nitrogen, and stored at -80°C for RNA extraction and chlorophyll extraction.

RNA extraction and construction of cDNA library for sequencing
Total RNA was extracted from pooled curd tissue using the TRIzol reagent according to the manufacturer's instructions (Invitrogen, Carlsbad, CA), and was further purified with the RNeasy® Plant Mini Kit (Qiagen, Valencia, CA). The cDNA libraries of green and white cauliflower were constructed from five micrograms of total RNA using the mRNA Sequencing Sample Preparation Kit following the manufacturer's instructions (Illumina, San Diego, CA, USA). Sequencing was carried out on an Illumina/Solexa Genome Analyzer II system at the Cornell University Life Sciences Core Laboratories Center.

RNA-seq data processing and analysis
The raw Illumina RNA-seq reads were first processed to remove low quality regions and adaptor sequences using an in-house Perl script. To eliminate rRNA sequence contamination, the reads were then aligned to cauliflower ribosomal RNA (rRNA) sequences using Bowtie [35], allowing up to two mismatches. A total of 60,000 cauliflower Sanger ESTs were collected from GenBank in June 2010. These ESTs were screened against the NCBI UniVec database, the Escherichia coli genome, and cauliflower rRNA sequences to remove contaminant sequences. The resulting high quality ESTs were assembled into unigenes using iAssembler http://bioinfo.bti.cornell.edu/tool/iAssembler. The processed Illumina reads were then aligned to the cauliflower EST-unigenes using Bowtie [35], allowing up to two mismatches. A de novo assembly of the unaligned reads was then performed using ABySS [36]. The unigenes assembled from ESTs and from unaligned Illumina reads were then assembled together using iAssembler. Following mapping to EST-unigenes and de novo assembly, the read counts for each unigene were normalized to RPKM (reads per kilobase of exon model per million mapped reads) [37] and compared to obtain relative expression levels. The significance of differential gene expression between the green and white curds was determined using the R statistical method described by Stekel et al. [38], and raw p-values were adjusted for multiple tests using the false discovery rate [39]. Genes with a false discovery rate ≤ 0.01 and a fold change of at least 3 were identified as differentially expressed between green and white curds.
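The normalization and filtering steps at the end of this paragraph can be illustrated in a few lines. This is a sketch under stated assumptions rather than the pipeline used in the study: it takes raw p-values as given (the Stekel et al. statistic is not implemented), it assumes the false discovery rate adjustment is the Benjamini-Hochberg procedure, which the text does not name explicitly, and the `floor` used to avoid division by zero in the fold change is a hypothetical choice. All function names are illustrative.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the text only says 'false discovery rate'; BH is one
    common way to compute it.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def fold_change(rpkm_a, rpkm_b, floor=0.01):
    """Symmetric fold change (>= 1); `floor` is a hypothetical guard
    against division by zero for unexpressed unigenes."""
    high, low = max(rpkm_a, rpkm_b), min(rpkm_a, rpkm_b)
    return max(high, floor) / max(low, floor)


def is_differential(rpkm_green, rpkm_white, fdr, min_fold=3.0, max_fdr=0.01):
    """Apply the thresholds stated in the text: FDR <= 0.01, fold >= 3."""
    return fdr <= max_fdr and fold_change(rpkm_green, rpkm_white) >= min_fold


# Toy example: one unigene of 1000 bp with 100 reads in a library of
# one million mapped reads.
print(rpkm(100, 1000, 1_000_000))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
print(is_differential(rpkm_green=30.0, rpkm_white=5.0, fdr=0.001))
```

In practice the adjusted p-values would be computed once over all unigenes and then combined with the per-unigene fold change, as `is_differential` does for a single unigene.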
To identify biological processes affected in the green curd mutant, the differentially expressed genes were annotated by assigning gene ontology (GO) terms. Potential roles of differentially expressed genes in some specific biological processes were identified. In addition, we created a metabolic pathway database based on the annotation information of the assembled cauliflower unigenes using the Pathway Tools [40]. The pathway database was then integrated into the Plant MetGenMAP system [17] to identify the significantly affected pathways.

Verification of RNA-Seq by quantitative RT-PCR
The cDNA was synthesized using oligo-dT primers and Superscript® reverse transcriptase III (Invitrogen, Carlsbad, CA). qRT-PCR was conducted by using the SYBR Green PCR master mix (Applied Biosystems, CA). The cycling conditions involved denaturation at 95°C for 10 min, followed by 40 cycles of 95°C for 15 s and 60°C for 60 s. The dissociation curves were analyzed to verify the specificity of RT-PCR. The relative expression of selected genes was normalized to a cauliflower actin gene [14]. Values reported represent the average of two biological repeats with three independent trials. Gene-specific primers used are listed in Additional file 6.
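Normalization of a target gene to a reference gene such as actin is often computed with the comparative Ct (2^-ΔΔCt) approach. The text does not state which calculation was used, so the sketch below is an assumption offered only to illustrate the idea; it further assumes roughly 100% amplification efficiency for both primer pairs, and the Ct values in the example are made up.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Relative expression of a target gene by the comparative Ct method.

    Each Ct is first normalized to the reference gene (e.g. actin) in the
    same sample; the sample is then compared to the control.
    Assumes a doubling of product per cycle for both genes.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values: target gene in green curd (sample) versus
# white curd (control), each paired with the actin Ct from the same cDNA.
fold = relative_expression(ct_target_sample=22.0, ct_ref_sample=18.0,
                           ct_target_control=25.0, ct_ref_control=18.0)
print(fold)
```

A value above 1 indicates higher expression in the sample than in the control after correcting for the reference gene; averaging over biological repeats would be done on values computed this way.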

Chlorophyll determination
Fifty milligrams of curds were ground in liquid nitrogen, and 1 mL of 80% acetone was added to extract chlorophyll. After centrifugation at 12,000 g for 5 min, the supernatant was transferred into a new tube and measured at OD645 and OD663. Chlorophyll concentrations were calculated by using MacKinney's coefficients in the following equations: Chlorophyll a = 12.7 × OD663 − 2.69 × OD645 and Chlorophyll b = 22.9 × OD645 − 4.48 × OD663 [41].
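The two equations translate directly into code. The sketch below uses the coefficients exactly as given in the text. Two points are assumptions: that the equations return concentrations in μg per mL of extract, and that the conversion to μg/g fresh weight simply scales by the extract volume (1 mL) and tissue mass (50 mg) stated above. The absorbance readings in the example are invented.

```python
def chlorophyll_ug_per_ml(od663, od645):
    """Chlorophyll a and b in the acetone extract, using the
    coefficients quoted in the text (assumed units: ug/mL)."""
    chl_a = 12.7 * od663 - 2.69 * od645
    chl_b = 22.9 * od645 - 4.48 * od663
    return chl_a, chl_b


def per_gram_fresh_weight(conc_ug_per_ml, extract_ml=1.0, tissue_g=0.05):
    """Convert an extract concentration to ug per g fresh weight.

    Defaults follow the protocol in the text: 1 mL of 80% acetone
    and 50 mg of tissue.
    """
    return conc_ug_per_ml * extract_ml / tissue_g


# Hypothetical absorbance readings.
chl_a, chl_b = chlorophyll_ug_per_ml(od663=1.0, od645=0.5)
print("chlorophyll a (ug/g FW):", per_gram_fresh_weight(chl_a))
print("chlorophyll b (ug/g FW):", per_gram_fresh_weight(chl_b))
print("a/b ratio:", chl_a / chl_b)
```

The a/b ratio is independent of the extract volume and tissue mass, so it can be read directly from the two concentrations.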
Authors' contributions
XZ designed the research, conducted molecular and biochemical analyses, and wrote the manuscript. ZF performed the bioinformatics data analysis. TWT participated in the initial design and discussion of the project, and editing of the manuscript. LL conceived the research and participated in the writing of the manuscript. All authors read and approved the final version of the manuscript.