Transcriptome analysis of stem development in the tumourous stem mustard Brassica juncea var. tumida Tsen et Lee by RNA sequencing
© Sun et al; licensee BioMed Central Ltd. 2012
Received: 31 December 2011
Accepted: 21 April 2012
Published: 21 April 2012
The Erratum to this article has been published in BMC Plant Biology 2013 13:90
Tumourous stem mustard (Brassica juncea var. tumida Tsen et Lee) is an economically and nutritionally important vegetable crop of the Cruciferae family that also provides the raw material for Fuling mustard. The genetics breeding, physiology, biochemistry and classification of mustards have been extensively studied, but little information is available on tumourous stem mustard at the molecular level. To gain greater insight into the molecular mechanisms underlying stem swelling in this vegetable and to provide additional information for molecular research and breeding, we sequenced the transcriptome of tumourous stem mustard at various stem developmental stages and compared it with that of a mutant variety lacking swollen stems.
Using Illumina short-read technology with a tag-based digital gene expression (DGE) system, we performed de novo transcriptome assembly and gene expression analysis. In our analysis, we assembled genetic information for tumourous stem mustard at various stem developmental stages. In addition, we constructed five DGE libraries, which covered the strains Yong'an and Dayejie at various development stages. Illumina sequencing identified 146,265 unigenes, including 11,245 clusters and 135,020 singletons. The unigenes were subjected to a BLAST search and annotated using the GO and KO databases. We also compared the gene expression profiles of three swollen stem samples with those of two non-swollen stem samples. A total of 1,042 genes with significantly different expression levels occurring simultaneously in the six comparison groups were screened out. Finally, the altered expression levels of a number of randomly selected genes were confirmed by quantitative real-time PCR.
Our data provide comprehensive gene expression information at the transcriptional level and the first insight into the understanding of the molecular mechanisms and regulatory pathways of stem swelling and development in this plant, and will help define new mechanisms of stem development in non-model plant organisms.
KeywordsTranscriptome Tumourous stem mustard Development RNA sequencing Brassica juncea var. tumida
The tumourous stem mustard, Brassica juncea (Cruciferae), is an important vegetable and the raw material for Fuling mustard. The genetics breeding, physiology, biochemistry and classification of mustards have been extensively studied, but little work has been done at the molecular level. The three diploid (B. nigra, B. oleracea, and B. campestris) and three allotetraploid (B. carinata, B. juncea, and B. napus) species of Brassica, related according to the U-triangle theory, are one of the best model systems to study polyploidy and genetic relationships . Thus, investigating tumourous stem mustard at the molecular level may help to clarify the genetic relationships mentioned above.
Development of the stem in tumourous stem mustard, which is pickled or eaten fresh, is directly related to the quality and yield of tumourous mustards. Stem swelling is a key characteristic of tumourous stem formation, and although many new cultivars of tumourous stem mustard have been bred, the molecular mechanism underlying stem swelling is unclear. Tumourous stem mustard is an annual plant, and the stem does not swell except in plants sown between mid-September and mid-October in Chongqing and the other valleys of the Yangtze River, China; thus, the production period of edible stems is limited [2, 3]. Identification of the genes controlling stem swelling and the regulatory network would facilitate molecular breeding and increase the yield and quality of this crop.
Tumourous stem mustard shares a close genetic relationship with the model plants Arabidopsis thaliana and Arabidopsis lyrata, whose genetic backgrounds are more clearly understood. Thus, A. thaliana may aid in our understanding of the mechanism of swelling in tumourous stem mustard. In recent years, novel, high-throughput, deep sequencing technologies have allowed the efficient generation of large-scale ESTs and improved the speed of gene discovery . In addition, A. thaliana microarrays may be used to study stem swelling because of the close genetic relationship between that organism and tumourous stem mustard. RNA sequencing (RNA-Seq) generates absolute rather than relative gene expression measurements and provides greater insight and accuracy than microarrays [5–7].
In this study, we generated over 5 billion bases of high-quality DNA sequence using Illumina technology and demonstrated the suitability of short-read sequencing for the de novo assembly and annotation of genes expressed without prior genome information. In addition, we constructed five digital gene expression (DGE) libraries and compared the gene expression profiles of tumourous stem mustard at different developmental stages with those of different varieties. The assembled annotated transcriptome sequences and gene expression profiles will facilitate the identification of genes involved in tumourous stem mustard swelling and be a useful reference for other Cruciferae developmental studies.
The tumourous stem mustard (Brassica juncea var. tumida Tsen et Lee) strains Yong'an (with inflated tumourous stems) and a mutant variety Dayejie (without inflated stems) were collected from the Chongqing Fuling Institute of Agricultural Sciences, Chongqing, China. After seeding on October 5th, Dayejie and four developmental stages of Yong'an were collected in February of the next year. Dayejie stems were collected 22 weeks after seeding (daye3bianzhong). The stems of Yong'an were collected 18, 20, 22, and 25 weeks after seeding (respectively: yongan1hao, uninflated; yongan2hao, one week before the start of inflation; yongan3hao, one week after the start of inflation; and yongan4hao, one month after the start of inflation). After washing in physiological saline and 0.1% DEPC-treated water, the fresh samples were stored in liquid nitrogen until the extraction of total RNA.
cDNA preparation and Illumina sequencing
Total RNA was extracted using a Plant RNA Kit according to the manufacturer's instructions (Watson Biotech, Shanghai, China). The quantity of RNA was verified by an ultraviolet spectrometer and electrophoresis on a denaturing formaldehyde agarose gel. To obtain complete gene expression information, a pooled RNA sample from tissues of different developmental stages (stems of the yongan1hao, yongan2hao, yongan3hao, and yongan4hao stages) was used for transcriptome analysis. Oligo(dT) beads were used to isolate poly(A) + mRNA from the total RNA. Fragmentation buffer was added to disrupt the mRNA into short fragments. Taking these short fragments as templates, a random hexamer primer was used to synthesise first-strand cDNA. Second-strand cDNA was synthesised using buffer, dNTPs, RNase, and DNA polymerase I. The resulting short fragments were purified with a QiaQuick PCR extraction kit and resolved with EB buffer for end repair and addition of a poly(A) tail. Afterwards, the short fragments were connected with sequencing adapters. Following agarose gel electrophoresis, suitable fragments were selected as templates for PCR. Finally, the library was sequenced using Illumina HiSeq™ 2000.
The raw data from the images were collected using Solexa GA pipeline 1.6 by removing low-quality reads (reads with unknown sequences 'N'), adaptor sequence fragments, and empty reads. Next, de novo assembly of the transcriptome into unigenes was carried out with SOAPdenovo, a short-reads assembly program . Briefly, SOAPdenovo first combines reads with a particular overlap to form longer fragments without N, which are called contigs. Then, the reads are mapped back to contigs. The program is able to detect contigs from the same transcript as well as the sequences between these contigs using paired-end reads. Next, SOAPdenovo connects the contigs using N to represent unknown sequences between each two contigs, and scaffolds are made. Paired-end reads are used again for gap filling of the scaffolds to obtain sequences with the lowest Ns and which cannot be extended on either end. Such sequences are defined as unigenes.
Subsequently, a BLASTX alignment (e-value < 0.00001) was performed between the unigenes and protein databases, including the non-redundant (nr), Swiss-Prot, Kyoto Encyclopaedia of Genes and Genomes (KEGG), and COG databases, and the best alignments were used to decide the sequence direction of the unigenes. If the results from the different databases conflicted, a priority order of nr, Swiss-Prot, KEGG, and COG was followed to decide the sequence direction of the unigenes. When a unigene happened to be unaligned in any of the above databases, ESTScan was used to predict its coding regions and decide its sequence direction . In the final step, using nr annotation, the Blast2GO program was used to obtain the Gene Ontology (GO) and KEGG annotations of the unigenes . After obtaining the GO annotation for each unigene, WEGO software was used to classify the unigenes by function and to determine the distribution of gene functions in the species at the macro level . All raw transcriptome data have been deposited at the sequence read archive (SRA) of NCBI.
DGE library preparation and sequencing
Total RNA was extracted from the stems of yongan1hao, yongan2hao, yongan3hao, yongan4hao, and daye3bianzhong using a Plant RNA Kit (Watson Biotech) according to the manufacturer's instructions. Next, a DGE library was prepared using an Illumina gene expression sample prep kit. Briefly, total RNA from the samples was used for mRNA capture with magnetic oligo(dT) beads. Double-stranded cDNAs were synthesised directly on the poly(A) + RNA-bound beads and then digested with NlaIII. Those cDNA fragments with 3' ends were purified from the magnetic beads, and Illumina adaptor 1 was added to their 5' ends. After digestion with MmeI, which recognises the junction between Illumina adaptor 1 and the sequence CATG, 21-bp tags containing adaptor 2 were ligated to the 3' ends of the tags to create a tag library. The library was amplified by PCR over 15 cycles, and 85-bp strips were purified by PAGE. The single-stranded molecules were attached to the Illumina chip for sequencing. All raw tag data were deposited at the SRA of NCBI.
The raw image data were transformed by base calling into sequence data. To map the DGE tags, the sequenced raw data were filtered to remove adaptor sequences, low-quality sequences (tags with unknown sequences), empty tags (no tag sequence between the adaptors), and tags with only one copy number (probable sequencing error). For tag annotation, clean tags containing CATG and the 21-bp tag sequences were mapped to our transcriptome reference database, allowing no more than one nucleotide mismatch. The clean tags were designated as unambiguous clean tags. For gene expression analysis, the number of expressed tags was calculated and normalised to the number of transcripts per million tags.
In this equation, N represents the number of genes with a GO/KO annotation, n represents the number of differentially expressed genes in N, M represents the number of genes in each GO/KO term, and m represents the number of differentially expressed genes in each GO/KO term. For GO enrichment analysis, all of the p-values were subjected to Bonferroni correction. A corrected p-value of < 0.05 was selected as the threshold for determining significant enrichment of the gene sets.
Quantitative real-time PCR (qRT-PCR) validation
Total RNA was extracted as described for DGE library preparation and sequencing. Total RNA (1 μg) from each sample was reverse-transcribed in a 10-μl reaction using the AMV RNA PCR Kit 3.0 (Takara). The sequences of the primers used are given in Additional file 1: Table S1. The 18 s rRNA gene of tumourous stem mustard was used as an internal control. qRT-PCR was performed using SYBR Premix Ex Taq™ Kit (Takara) according to the manufacturer's protocol. The selected genes were verified using a Bio-Rad iQ5 real-time PCR detection system with a cycling temperature of 57°C and with a single peak on the melting curve to ensure a single product. At least three replicates were tested per sample.
Illumina sequencing and sequence assembly
To evaluate the quality of the dataset, the ratio of the gap length to the length of the assembled unigenes was analysed (Figure 1b). Most of the unigenes showed gap lengths < 5% of the total length, which accounted for 94.79% of the total number of unigenes (146,265).
Annotation of predicted proteins
Functional classification using the KEGG database
Functional classification and pathway assignment was performed using the KEGG database . First, the 146,265 unigenes with an e-value ≤ 1e-05 were compared using BLASTX against the KEGG database. In total, 39,203 unigenes were assigned to 119 KEGG pathways (see Additional file 3: Table S3). The major pathways were 'metabolic pathways [ko01100]', 'biosynthesis of secondary metabolites [ko01110]', 'plant-pathogen interaction [ko04626]', 'spliceosome [ko03040]' and 'starch and sucrose metabolism [ko00500]'; the gene numbers and percentages assigned to these pathways were 8,939 (22.8%), 4,470 (12.17%), 2,792 (7.12%), 1,786 (4.56%) and 1,180 (3.01%), respectively.
DGE library sequencing
Statistics of DGE sequencing
Total Mapped Reads
< = 2 bp mismatch
Total Unmapped Reads
Different components of the raw reads in each sample
Gene expression variations among the different samples
Validation of RNA-Seq-based gene expression
Tumourous stem mustard, an important cash crop and the raw material for Fuling pickles, is a world-famous vegetable crop. Presently, many varieties of tumourous stem mustard have been bred, but the regulatory pathway and molecular mechanism of mustard tumourous stem development are unclear, and molecular biological research on tumourous stem mustard is rare.
By transcriptome sequence analysis, we obtained 54,577,780 reads corresponding to about 5 Gb of raw sequence data. The predicted 146,265 unigenes were subjected to BLAST annotation, and 72% of the unigenes returned a significant BLAST result. As expected, most unigenes shared the highest sequence similarity with crucifers (A. thaliana, A. lyrata subsp. lyrata, and Brassica). The number of genes similar to Brassica genes was lower than the number of similar genes between A. thaliana and A. lyrata subsp. lyrata (Figure 2), possibly due to the characteristics of the B. juncea transcriptome in the NCBI protein database. Arabidopsis thaliana is an important model plant with a clear genetic background that is very useful for researching gene functions in tumourous stem mustard. Our transcriptome analysis is the first high-throughput sequencing of tumourous stem mustard and will serve as a basis for other studies.
To investigate the regulatory pathway and molecular mechanism of tumour swelling, we created five DGE libraries from plants at different developmental stages and samples from a non-swollen mutant to analyse the gene expression patterns at various developmental stages. The quality of the DGE libraries was further confirmed by qRT-PCR analysis. Because the tumours continue to swell for a long period after initiation, different swelling stages were selected for each experimental group, and non-swollen stages of Yong'an and the non-swollen mutant strain Dayejie were used for comparison. The DGE profiles of the swelling stages were compared with the controls. Different genes may be involved in tumourous stem formation, and the common genes in the six comparison groups reduced the number of differentially expressed genes that might be related to tumourous stem development.
Compared with the GO annotation results of the DGE screening of genes in the transcriptome data, we found no genes distributed in the three molecular function subcategories (enzyme regulator activity, structural molecular activity, or translation regulator activity) in the DGE group, indicating that these three subcategories are not related to tumour swelling. A comparison of the results of Additional file 3: Table S3 and Table S6 showed that the pathway order number of 'Plant hormone signal transduction' was about 10 (Additional file 6: Table S6), suggesting that these genes were screened out and that the pathway 'plant hormone signal transduction' is related to tumour swelling. Mapping the DGE data back to the transcriptome database revealed that about 30% of the reads were mapped and that > 60% remained transcribed sequences. Although large amounts of data were obtained by transcriptome sequencing, the reference sequences may still be insufficient and may have caused the lower mapped ratio, which could be resolved by increasing the sequencing depth and enhancing the accuracy of the assembly.
Although 1,042 differentially expressed genes were discovered using the above method, the key genes related to tumourous stem formation need to be analysed further. The genes with the greatest changes in expression were selected for further study; a log 2 ratio value > 10 was used as a threshold to select thirteen genes for further analysis. Seven of the genes have unknown functions and six genes have a functional annotation based on sequence similarity. Of the six annotated genes, four gene functions require further clarification, and two genes, phytochrome kinase substrate 1 (PKS1) (Unigene142399_num2_yongan) and ABR1 (ABA REPRESSOR1) (Unigene140028_num2_yongan), have functions whose details are relatively well known. Girdhar et al.  determined that ABR1 functions in the negative regulation of ABA responses during seed germination and plays a role in the ABA signalling pathway in Arabidopsis. In this study, the expression level of ABR1 was significantly higher in the inflation stage of the stem tumour. This suggests that ABA signalling is related to stem inflation. As Additional file 6: Table S6 shows, four genes may be involved in the 'plant hormone signal transduction [ko04075]' pathway, suggesting that plant hormones are related to stem swelling; however, ABR1 was not one of the four genes annotated in the 'plant hormone signal transduction' pathway in this study. During the inflation stage of stem tumours, plant hormones play key roles in the number of cells in the stem, which increase rapidly accompanied by cell splintering. PKS1 expression was upregulated during the tumour inflation stage. To our knowledge, PKS1 is phosphorylated in a phytochrome-dependent manner and negatively regulates phytochrome light signalling in Arabidopsis . Other reports have shown that PKS1 regulates root phototropism and gravitropism and leaf flattening and positioning, and that it affects the state of phytochrome A in etiolated Arabidopsis seedlings [16–20]. Chen et al. [21, 22] found that illumination and temperature were the main factors affecting the formation of stem tumours, but they did not determine whether PKS1 plays key roles in illumination. Is PKS1 related to stem inflation in tumourous stem mustard? Whether or not the overexpression or knockout/down of ABR1 and PKS1 alters the trend of inflation requires further research.
Although the molecular functions of individual tumourous mustard genes and the associated signal transduction pathways remain largely unknown, the present transcriptome analysis provides valuable information regarding tumourous stem development, which may facilitate future investigations of the detailed regulatory mechanisms and pathways. Additionally, we obtained about 5 Gb of raw sequence data and predicted 146,265 assembled unigenes, and annotated 105,555 unigenes using Illumina sequencing technology. This is the first large-scale transcriptome and expression study of the tumourous stem mustard, Brassica juncea var. tumida Tsen et Lee for which the data has been deposited in GenBank (as of October 2011). We believe that our data not only provide more molecular information for researchers, but will also help to accelerate gene expression and function research in Brassica juncea as well as that on the evolutionary relationships of the Cruciferae.
We acknowledge the Beijing Genomics institute at Shenzhen for its assistance in original data processing and related bioinformatics analysis. This work was supported by the Natural Science Foundation of China (NSFC grant No. 31071461) and Chongqing Natural Science Foundation(CSTC,2011AB1095; CSTC,2009BA1088).
- Maluszynska J, Hasterok R: Identification of individual chromosomes and parental genomes in Brassica juncea using GISH and FISH. Cytogenet Genome Res. 2005, 109 (1-3): 310-314. 10.1159/000082414.PubMedView ArticleGoogle Scholar
- Lin HQ, Zhou GF, Fan YH, Wang XY, Liu YH, Wang B: Effects of sowing date on the main properties in tumorous stem mustard. Southwest China J Agricul Sci. 2005, 18 (3): 365-367. Chinese.Google Scholar
- Liu YH, Zhang ZR, Leng R, Li J: Response and Filtration to Sowing Date in Tumorous Stem Mustard (Brassicajuncea var. mmida Tsen et Lee). Southwest China J Agricul Sci. 2010, 23 (3): 805-809. Chinese.Google Scholar
- Ansorge WJ: Next-generation DNA sequencing techniques. N Biotechnol. 2009, 25 (4): 195-203. 10.1016/j.nbt.2008.12.009.PubMedView ArticleGoogle Scholar
- Hoen PA, Ariyurek Y, Thygesen HH, Vreugdenhil E, Vossen RH, de Menezes RX, Boer JM, van Ommen GJ, den Dunnen JT: Deep sequencing-based expression analysis shows major advances in robustness, resolution and inter-lab portability over five microarray platforms. Nucleic Acids Res. 2008, 36 (21): e141-10.1093/nar/gkn705.PubMedPubMed CentralView ArticleGoogle Scholar
- Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y: RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 2008, 18 (9): 1509-1517. 10.1101/gr.079558.108.PubMedPubMed CentralView ArticleGoogle Scholar
- Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008, 5 (7): 621-628. 10.1038/nmeth.1226.PubMedView ArticleGoogle Scholar
- Li R, Zhu H, Ruan J, Qian W, Fang X, Shi Z, Li Y, Li S, Shan G, Kristiansen K, Li S, Yang H, Wang J, Wang J: De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 2010, 20 (2): 265-272. 10.1101/gr.097261.109.PubMedPubMed CentralView ArticleGoogle Scholar
- Iseli C, Jongeneel CV, Bucher P: ESTScan: a program for detecting, evaluating, and reconstructing potential coding regions in EST sequences. Proc Int Conf Intell Syst Mol Biol. 1999, 138-148.Google Scholar
- Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, Robles M: Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005, 21 (18): 3674-3676. 10.1093/bioinformatics/bti610.PubMedView ArticleGoogle Scholar
- Ye J, Fang L, Zheng H, Zhang Y, Chen J, Zhang Z, Wang J, Li S, Li R, Bolund L, Wang J: WEGO: a web tool for plotting GO annotations. Nucleic Acids Res. 2006, W293-297. 34 Web Server.
- Audic S, Claverie JM: The significance of digital gene expression profiles. Genome Res. 1997, 7 (10): 986-995.PubMedGoogle Scholar
- Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M: Database issue, 280: The KEGG resource for deciphering the genome. Nucleic Acids Res. 2004, 32: D277-10.1093/nar/gkh063.PubMedPubMed CentralView ArticleGoogle Scholar
- Pandey GK, Grant JJ, Cheong YH, Kim BG, Li L, Luan S: ABR1, an APETALA2-domain transcription factor that functions as a repressor of ABA response in Arabidopsis. Plant Physiol. 2005, 139 (3): 1185-1193. 10.1104/pp.105.066324.PubMedPubMed CentralView ArticleGoogle Scholar
- Fankhauser C, Yeh KC, Lagarias JC, Zhang H, Elich TD, Chory J: PKS1, a substrate phosphorylated by phytochrome that modulates light signaling in Arabidopsis. Science. 1999, 284 (5419): 1539-1541. 10.1126/science.284.5419.1539.PubMedView ArticleGoogle Scholar
- Lariguet P, Schepens I, Hodgson D, Pedmale UV, Trevisan M, Kami C, de Carbonnel M, Alonso JM, Ecker JR, Liscum E, Fankhauser C: Phytochrome kinase substrate 1 is a phototropin 1 binding protein required for phototropism. Proc Natl Acad Sci USA. 2006, 103 (26): 10134-10139. 10.1073/pnas.0603799103.PubMedPubMed CentralView ArticleGoogle Scholar
- Boccalandro HE, De Simone SN, Bergmann-Honsberger A, Schepens I, Fankhauser C, Casal JJ: Phytochrome kinase substrate1 regulates root phototropism and gravitropism. Plant Physiol. 2008, 146 (1): 108-115.PubMedPubMed CentralView ArticleGoogle Scholar
- Molas ML, Kiss JZ: PKS1 plays a role in red-light-based positive phototropism in roots. Plant Cell Environ. 2008, 31 (6): 842-849. 10.1111/j.1365-3040.2008.01797.x.PubMedView ArticleGoogle Scholar
- Carbonnel M, Davis P, Roelfsema MR, Inoue S, Schepens I, Lariguet P, Geisler M, Shimazaki K, Hangarter R, Fankhauser C: The Arabidopsis phytochrome kinase substrate2 protein is a phototropin signaling element that regulates leaf flattening and leaf positioning. Plant Physiol. 2010, 152 (3): 1391-1405. 10.1104/pp.109.150441.PubMedPubMed CentralView ArticleGoogle Scholar
- Sineshchekov V, Fankhauser C: PKS1 and PKS2 affect the phyA state in etiolated Arabidopsis seedlings. Photochem Photobiol Sci. 2004, 3 (6): 608-611. 10.1039/b315431a.PubMedView ArticleGoogle Scholar
- Chen ZJ, Cong L, Wang BL, Wu GL: Studies on the formation of stem swelling and the bud differentiation and their relationship in tuber mustard. J Zhejiang Agricul University. 1994, 20 (3): 267-272. Chinese.Google Scholar
- Chen ZJ, Wang BL, Gong I, Wu GL, Zhang MF: The effect of temperature and night on development of tuber mustard. Acta Agriculturae Zhejiangensis. 1997, 9 (1): 10-15. Chinese.Google Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.