High-throughput sequencing of black pepper root transcriptome
© Gordo et al.; licensee BioMed Central Ltd. 2012
Received: 26 January 2012
Accepted: 29 August 2012
Published: 17 September 2012
Black pepper (Piper nigrum L.) is one of the most popular spices in the world. It is used in cooking and the preservation of food and even has medicinal properties. Losses in production from disease are a major limitation in the culture of this crop. The major diseases are root rot and foot rot, which are results of root infection by Fusarium solani and Phytophtora capsici, respectively. Understanding the molecular interaction between the pathogens and the host’s root region is important for obtaining resistant cultivars by biotechnological breeding. Genetic and molecular data for this species, though, are limited. In this paper, RNA-Seq technology has been employed, for the first time, to describe the root transcriptome of black pepper.
The root transcriptome of black pepper was sequenced by the NGS SOLiD platform and assembled using the multiple-k method. Blast2Go and orthoMCL methods were used to annotate 10338 unigenes. The 4472 predicted proteins showed about 52% homology with the Arabidopsis proteome. Two root proteomes identified 615 proteins, which seem to define the plant’s root pattern. Simple-sequence repeats were identified that may be useful in studies of genetic diversity and may have applications in biotechnology and ecology.
This dataset of 10338 unigenes is crucially important for the biotechnological breeding of black pepper and the ecogenomics of the Magnoliids, a major group of basal angiosperms.
Black pepper (Piper nigrum) is one of the most popular and oldest spices in the world, with culinary uses and more recently as a food preservative . Due its medicinal properties, it is used in traditional medicine for its antioxidant, anti-inflammatory and anticancer properties [2, 3]. This species is the second most widely traded spice in the world, accounting for a value of 929 million US dollars from a production of 445900 tons in 2009 .
P. nigrum is a perennial, climbing vine indigenous to the southwestern region of India. Its production is limited to rainy tropical regions such as those found in Asia and South America. Tropical climates favor the development of diseases, and crop losses due to pests and diseases are a major constraint in the production of black pepper . Resistance against the two major pathogens, Phytophtora capsici and Fusarium solani f. sp. piperis, has not been achieved by classical breeding of the germplasm of cultivated black pepper and remains a major challenge that may be amenable to improvement using plant biotechnology . Despite the agricultural and economic importance of black pepper, knowledge of its genetics is presently very limited .
For non-model plants such as black pepper with little or no molecular information available, next-generation sequencing (NGS) technologies offer a great opportunity for the rapid access to genetic information [6–9]. In plant biology, due to large sizes of genomes and high levels of ploidy, the characterization of transcriptomes is a powerful tool for the identification of proteins. RNA-seq can sequence, in an extremely high-throughput and quantitative manner, the transcriptome of an organism or tissue . Next-generation sequencing technologies generate large numbers of reads with high sampling rates of cDNA libraries, providing a deeper and more complete view of transcriptomes . This transcriptome-wide description of RNA sequences identifies the coding sequence, single nucleotide polymorphisms (SNPs), other polymorphisms, splicing variants and relative levels of expression by counting reads, a method known as digital gene expression (DGE) [9, 10].
In this work, we used RNA-seq technology to analyze the transcriptome of black pepper. About 71 millions reads were generated and 22363 de novo assembled transcripts were obtained from tissue of the stem-root. The transcripts were functionally annotated and represent the first sequences derived from the transcriptome of P. nigrum. About 257 new simple sequence repeats (SSRs) were described. These results, the first dataset of sequences of the Magnoliid group, will open new perspectives for studies of diversity and ecology and will benefit breeding and genomic programs aiming to understand plant-pathogen interactions in the root, the organ that is central in the process of infection by F. solani and P. capsici, the two major pathogens of black pepper.
Overview of sequencing and assembly
Total number of reads
Number of unique high-quality reads
Number of contigs obtained with Velvet/Oases/STM-
Total size (bp)
Largest transcript length (bp)
N50 length (bp)
No ATGC characters
Average coverage per base
Predicted CDS (partial/complete)
Reads mapped in transcripts
Most expressed transcript (rpkm)
De novotranscriptome assembly
A final dataset of 22363 contigs was obtained, corresponding to 3.8 Mbp. After contigs assembly, the contigs dataset was filtered using SeqClean tool and were removed 1357 contigs with rDNA sequences, 1640 with mitochondrial, 539 with chloroplastidial and 154 plant DNA repeats sequences (http://seqclean.sourceforge.net/). All contigs with a length inferior to 100 bp were excluded.
Functional annotation of the transcriptome
Percent of predicted proteins with homology to protein sequences from databases of other species
% with homology
Characterization of proteome predicted from CDS of root libraries
Arabidopsis thaliana root library 5536
Black pepper root (our work)
OrthoMCL proteins families
Arabidopsis thaliana root library 5536
Black pepper root (our work)
Detection of microsatellites
Biotechnological methods are being successfully employed in plant breeding  to confer, for example, increased resistances to pathogens. To use biotechnology in breeding programs, however, prior genetic and genomic knowledge is needed to isolate genes and characterize genotypes and phenotypes . Plant genomics, especially for non-model plants, is always a challenge due to the high levels of ploidy, large genome sizes, low genome complexities and large proportions of repeat sequences. Instead of sequencing the complete genome, sequencing the transcriptome is a good alternative for rapidly and efficiently accessing the expressed genes and for characterizing phenotypes .
The characterization of transcriptomes can use hybrid- or sequenced-based technologies, but all hybrid-based technologies require prior knowledge of the transcriptome. In plant biology, Expressed Sequence Tags (ESTs) based on the sequencing of cDNA libraries have been a great advance and have contributed significantly to the study of plant genomics. The sequencing of cDNA libraries, however, has many limitations, such as the cost in time and money involved in cloning and Sanger sequencing. The serial analysis of gene expression (SAGE) and the improved variant SuperSAGE have been used to increase throughput and sampling, but the reliability of the identification of transcripts is one of their limitations . The second generation of sequencing (2-GS) technology circumvents these limitations and generates a large number of reads or datasets of short reads in a shorter time and with a lower reagent cost per base than Sanger sequencing. 2-GS technologies have opened great prospectives for genetic studies of non-model plants .
The genome of black pepper is very poorly characterized, with only 184 sequences deposited in Genbank (access 3/11/2011). Previous genetic studies have focused on the diversity, phylogeny and taxonomy of cultivated black pepper and its wild relatives [19–22]. Despite the agricultural and economic importance of this spice, only DNA sequences of phylogenetics markers are available in molecular databases (Figure 1). Another difficulty for the genomic study of black pepper is the lack of a sequenced genome from the Magnoliid group, describe as basal angiosperms group [23, 24] (Figure 1). All full-sequenced genomes available in databases are from monocot or dicot plants, plants phylogenetically distant from black pepper, as show in Figure 1[23, 24].
About 20 Gb of the data can be mapped, corresponding to 71 million short reads of 50 bp. De novo assembly generated 22363 contigs, 10338 unigenes with an N50 of 168 bp. The N50 is slightly less than the 202 bp obtained in an Illumina 75 bp dataset for the Ipomoea batatas transcriptome and the 208 bp obtained in a 75 bp single- and paired-end dataset from Camellia sinensis[25, 26]. The number of contigs, however, is lower than in these other studies: 56516 and 127094 contigs for I. batatas and C. sinensis, respectively. Although that the read number is higher in our study, (71 million vs 59 and 35 million for I. batatas and C. sinensis), the contig number is inferior. This result could be explain by the read lengths were different (50 bp vs 75 pb) and SOLiD dataset is lower in quality, only 50–60% of reads mapped against a transcriptome reference [27, 28].
Estimating the amount of coverage of the transcriptome is complex due to the lack of genetic data from black pepper. Black pepper is tetraploid with 2n = 52. The size of its genome, estimated by cytogenetic studies, is about 1220 Mbp (1C = 1.25 pg), which is about ten-fold larger than the genome of Arabidopsis thaliana. In A. thaliana, about 6% of the genome is transcribed, representing 41671 transcripts . In the A. thaliana TGI database, however, two root-derived cDNA libraries have largely been sequenced and have identified 5609 and 5884 transcripts (11.3 and 9.3 Mbp), which seems to indicate that only 14% of predicted genes are expressed in the root . Assuming a similar proportion of transcription in P. nigrum, the entire transcriptome is estimated to be 73.2 Mbp (6%) and the root transcriptome to be 10.3 Mbp (0.84%).
The cDNA library was sequenced with a high coverage per base, 62x on average, summing 3.8 Mbp. Shown as an example in Figure 4, the transcript NODE-882-2-0_930 has a good read coverage, up to 20x, and shows a high identity and a good co-linearity with an mRNA sequence from P. trichocarpa, indicating a good quality of sequencing and assembly. The comparison of predicted CDSs with the proteomes of other plants demonstrates that the percentage of sequences with significant homology is higher with dicots (51–53%) than with monocots (15–23.4%). This low degree of homology may be due to the phylogenetic distance between magnoliids and others groups (eucots and dicots) and to the different average transcript size (211 pb), as described during the sequencing of the I. batatas transcriptome . The method of k-mer additive assembly is efficient in maximizing the number of transcripts, and interestingly, the proportion of transcripts with significant homology (22–35%), the N50 (129 to 236 bp) and the average length of transcript (156–214 bp) are relatively well conserved for each k-mer (19–43 bp) (Figure 2). Comparison with other studies of plant transcriptomes is difficult because the read length used with the Illumina platform is longer (75 bp) and the library is paired-end, which clearly facilitates the process of assembly, as demonstrated for I. batatas. Finally, limitations of the 2-GS platform produce high numbers of short contigs, but our dataset has 2144 unigenes over 200 bp in length and about 72% of reads with a significant homology, which represent a significant advance in our biological knowledge of black pepper.
The functional annotation of the dataset of transcripts is informative for the physiology of stems and roots of black pepper. NGS datasets from plant root tissue are only available for ginseng and sweetpotato. The “response to stimulus” and “localization” categories of Figure 5 are highly represented in the transcriptomes from roots of both sweetpotato and black pepper . The profile of molecular functions is conserved between the two root plants. In both, the function “transporter activity” is quite high. This feature may be explained by the fact that roots are very important in the absorption of microelements. This preliminary annotation of the transcriptome from black pepper is very important and should lead to the identification of new genes coding for transporters, transcription factors specific to root and stem tissue or proteins used for defence.
In model plants, NGS has increased the ability to generate and analyze the expression and detection of most transcripts, to identify genes and their forms of alternative splicing and to analyze DNA methylation and histone modifications on a genomic scale [6, 8]. The technology based on Sanger sequencing has generated much data from model species, and plants with sequenced genomes and transcriptomes have provided a robust database reflecting gene expression. The transcriptomic resources for non-model plants, particularly with NGS, however, are limited, especially for the analysis of expressed sequences, alternative splicing and SNPs.
The sequencing of P. nigrum using the SOLiD platform (short-reads) produced the first dataset of sequences for a non-model species belonging to the Magnoliid group and order Piperales. Assembly without a reference genome or transcriptome is considered difficult, but a partial transcriptome of the non-model species P. nigrum was assembled. This work was aided by the use of some de novo assemblies performed using NGS data from non-model plants such as chickpea Cicer arietinum L. , Eucalyptus grandis x Eucalyptus urophylla hybrid , rubber Hevea brasiliensis, sweetpotato Ipomoea batatas and buckwheat Fagopyrum.
Analysis of SSRs in the data generated has great potential for applications in studies of genetic diversity, plant breeding and reproduction. The sequences produced in this study are very relevant, since little information about black pepper is available in the biological databases, and the data produced can be used in ecological studies and biotechnology.
The tissue samples were obtained from basal root region of eight 70-day-old plantlets. All plantlets used in this study are from Brazilian cultivar of P. nigrum, Bragantina ( Additional file 3: Figure S3). The Bragantina cultivar is an ecotype of the Asian Panniyur-1 cultivar, which is a hybrid of the Uthirankotta and Cheriyakaniyakadan cultivars. Stakes obtained from black pepper plants were rooted in trays containing sand and grown in pots containing vermiculite previously autoclaved twice. The plantlets used in this study were grown and acclimatized in a greenhouse for about around two months.
cDNA preparation and sequencing
Total RNA was extracted from 35 mg of tissue from the stem-root region of P. nigrum using the Illustra RNAspin Mini Kit (GE Healthcare, USA). To remove any DNA contamination, the sample was cleaned with the Oligotex Direct mRNA Mini Kit (Quiagen, USA). The rRNAs were depleted using a RiboMinus Eukaryote Kit for RNA-Seq and a RiboMinus Plant Kit for RNA-Seq (Invitrogen, USA). The concentration of the mRNA was determined using a Qubit analyzer (Invitrogen, USA).
An aliquote (126.5 ng/μL) of depleted RNA was fragmented and used to generate one cDNA library for high-throughput sequencing. The library of fragments was obtained using the SOLiD Total RNA-Seq Kit (Invitrogen, USA). The global procedure of this kit is based on the hybridization of adapters with degenerate ends, followed by reverse transcription and library amplification by PCR . The cDNAs were selected by size on a polyacrylamide gel before and after the library amplification. The sequencing of the cDNA library was performed in a well of a flow cell.
Pre-processing sequencing data
The SOLiD adapter sequences were trimmed from reads using CutAdapt (0.9.5) , and the sequences of reads were converted from SOLiD encoding color space into double-encoded colors in FASTQ format. PRINSEQ (lite 0.14.4)  was used to remove redundant reads, to trim low quality reads (average QV < 20 for a 3-nt window, and QV < 18) and to remove small pre-processed reads (read length < 30 bp) . Statistics of the dataset of reads was obtained using FastQC software (V. 0.9.2) (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/).
De novotranscriptome assembly and functional annotation
Transcriptome assembly was performed using Velvet and Oases [38, 39]. For optimizing the de novo assembly, the method of additive multiple-k  was used to combine the properties of multiple assemblies using different k-mers (19–43).
The program ESTScan was used to detect potential coding regions in the transcript sequences obtained by assembly . In order to obtain the protein sequence the FrameD tool was employed . The program BWA  was used to align raw reads against all contigs obtained by Velvet assembly for calculating the coverage of the sequencing.
Local BLASTX (E-value 1e-05) was used to search homologous sequences against plant databases from PlantGDB (ftp://ftp.plangdb.org), nr-viridiplantae (NCBI), V. vinifera, A. thaliana, P. trichocarpa, G. max and O. sativa.
The functional annotation was based on BLASTX results with the nr-viridiplantae database, and the program BLAST2GO was used to assign biological functions, cellular components and cellular processes to the transcripts . The groups of orthologous and paralogous proteins were predicted using orthoMCL tools .
Detection of microsatellites
The MISA perl script was used to identify the position of SSRs in the sequence dataset (http://pgrc.ipk-gatersleben.de/misa/misa.html). The search for SSRs in our transcriptome was performed using the detection of bi-, tri- and tetranucleotide repeats, present more than 5x and with a minimum distance of 100 bp.
The authors thank, MSR. Barbosa for technical help. The work was supported by the Rede Paraense de Genômica e Proteômica (Governo do Estado do Pará/SEDECT/FAPESPA), PROPESP/UFPA, FADESP, FINEP and CAPES (Coordenação de Aperfeicoamento de Pessoal de Nivel Superior).
- Nair KPP: Agronomy and Economy of Black Pepper and Cardamom The “King” and “Queen” of Spices. Elsevier, New York 2011.
- Nishimura Y, Kitagishi Y, Yoshida H, Okumura N, Matsuda S: Ethanol extracts of black pepper or turmeric down-regulated SIRT1 protein expression in Daudi culture cells. Mol Med Report. 2011, 4: 727-730.
- Liu Y, Yadev VR, Aggarwal BB, Nair MG: Inhibitory effects of black pepper (Piper nigrum) extracts and compounds on human tumor cell proliferation, cyclooxygenase enzymes, lipid peroxidation and nuclear transcription factor-kappa-B. Nat Prod Commun. 2010, 5: 1253-1257.PubMedView Article
- Food and Agriculture Organization of the United Nations: Statistics of Agricultural Production. FAO, Rome 2011.
- Dias JS, Ortiz R: Plant Breeding Reviews. Edited by: Janick J. 2011, Wiley Blackwell, New York, 151-224. Janick J. (Series Editor), Transgenic Vegetable Crops: Progress, Potentials, and Prospects, Plant Breeding Reviews,View Article
- Hawkins RD, Hon GC, Ren B: Next-generation genomics: an integrative approach. Nat Rev Genet. 2010, 11 (7): 476-486.PubMedPubMed Central
- Lister R, Gregory BD, Ecker JR: Next is now: new technologies for sequencing of genomes, transcriptomes, and beyond. Curr Opin Plant Biol. 2009, 12 (2): 107-118. 10.1016/j.pbi.2008.11.004.PubMedPubMed CentralView Article
- Morozova O, Hirst M, Marra MA: Applications of new sequencing technologies for transcriptome analysis. Annu Rev Genomics Hum Genet. 2009, 10: 135-151. 10.1146/annurev-genom-082908-145957.PubMedView Article
- Wang Z, Gerstein M, Snyder M: RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009, 10 (1): 57-63. 10.1038/nrg2484.PubMedPubMed CentralView Article
- Nagalakshmi U, Waern K, Snyder M: RNA-Seq: a method for comprehensive transcriptome analysis. Curr Protoc Mol Biol. 2010, SUPPL. 89: 4.11.1-4.11.13.
- Babraham Bioinformatics: FASTQC, a quality control tool for the high throughput sequence data. 2012, http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/,
- Surget-Groba Y, Montoya-Burgos JI: Optimization of de novo transcriptome assembly from next-generation sequencing data. Genome Res. 2010, 20: 1432-1440. 10.1101/gr.103846.109.PubMedPubMed CentralView Article
- Zheng Y, Zhao L, Gao J, Fei Z: iAssembler: a package for de novo assembly of Roche-454/Sanger transcriptome sequences. BMC Bioinformatics. 2011, 12: 453-10.1186/1471-2105-12-453.PubMedPubMed CentralView Article
- He S, Liu C, Skogerbø G, Zhao H, Wang J, Liu T, Bai B, Zhao Y, Chen R: NONCODE v2.0: decoding the non-coding. Nucleic Acids Res. 2008, 36: D170-D172. 10.1093/nar/gkn294.PubMedPubMed CentralView Article
- Gotz S, Garcia-Gomez JM, Terol J, Williams TD, Nagaraj SH, Nueda MJ, Robles M, Talon M, Dopazo J, Conesa A: High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Res. 2008, 36 (10): 3420-3435. 10.1093/nar/gkn176.PubMedPubMed CentralView Article
- Langridge P, Fleury D: Making the most of 'omics' for crop breeding. Trends Biotechnol. 2011, 29 (1): 33-40. 10.1016/j.tibtech.2010.09.006.PubMedView Article
- Wang L, Li P, Brutnell TP: Exploring plant transcriptomes using ultra high-throughput sequencing. Brief Funct Genomic Proteomic. 2010, 9 (2): 118-128.View Article
- Gong W, Shen YP, Ma LG, Pan Y, Du YL, Wang DH, Yang JY, Hu LD, Liu XF, Dong CX, et al: Genome-wide ORFeome cloning and analysis of Arabidopsis transcription factor genes. Plant Physiol (Rockv). 2004, 135 (2): 773-782. 10.1104/pp.104.042176.View Article
- Jaramillo MA, Manos PS: Phylogeny and patterns of floral diversity in the genus Piper (Piperaceae). Am J Bot. 2001, 88 (4): 706-716. 10.2307/2657072.PubMedView Article
- Menezes IC, Cidade FW, Souza AP, Sampaio IC: Isolation and characterization of microsatellite loci in the black pepper, Piper nigrum L. (piperaceae). Conserv Genet Resour. 2009, 1 (1): 209-212. 10.1007/s12686-009-9051-6.View Article
- Sen S, Skaria R, Abdul Muneer PM: Genetic diversity analysis in Piper species (Piperaceae) using RAPD markers. Mol Biotechnol. 2010, 46 (1): 72-79. 10.1007/s12033-010-9281-6.PubMedView Article
- Joy N, Abraham Z, Soniya EV: A preliminary assessment of genetic relationships among agronomically important cultivars of black pepper. BMC Genet. 2007, 8: 42-PubMedPubMed CentralView Article
- Paterson AH: Leafing through the genomes of our major crop plants: strategies for capturing unique information. Nat Rev Genet. 2006, 7 (3): 174-184.PubMedView Article
- Jiao Y, Wickett NJ, Ayyampalayam S, Chanderbali AS, Landherr L, Ralph PE, Tomsho LP, Hu Y, Liang H, Soltis PS, et al: Ancestral polyploidy in seed plants and angiosperms. Nature. 2011, 473 (7345): 97-100. 10.1038/nature09916.PubMedView Article
- Wang Z, Fang B, Chen J, Zhang X, Luo Z, Huang L, Chen X, Li Y: De novo assembly and characterization of root transcriptome using Illumina paired-end sequencing and development of cSSR markers in sweet potato (Ipomoea batatas). BMC Genomics. 2010, 11: 726-10.1186/1471-2164-11-726.PubMedPubMed CentralView Article
- Shi CY, Yang H, Wei CL, Yu O, Zhang ZZ, Jiang CJ, Sun J, Li YY, Chen Q, Xia T, et al: Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds. BMC Genomics. 2011, 12: 131-10.1186/1471-2164-12-131.PubMedPubMed CentralView Article
- Goff L, Davila J, Swerdel M, Moore J, Cohen R, Wu H, Sun Y, Hart R: Ago2 immunoprecipitation identifies predicted microRNAs in human embryonic stem cells and neural precursors. PLoS One. 2009, 4 (9): e7192-10.1371/journal.pone.0007192.PubMedPubMed CentralView Article
- Ribeiro-dos-Santos A, Khayat AS, Silva A, Alencar DO, Lobato J, Luz L, Pinheiro DG, Varuzza L, Assumpcao M, Assumpcao P, et al: Ultra-deep sequencing reveals the microRNA expression pattern of the human stomach. PLoS One. 2010, 5 (10): e13205-10.1371/journal.pone.0013205.PubMedPubMed CentralView Article
- Soltis DE, Soltis PS, Bennett MD, Leitch LJ: Evolution of genome size in the angiosperms. Am J Bot. 2003, 90: 1596-1603. 10.3732/ajb.90.11.1596.PubMedView Article
- Quackenbush J, Cho J, Lee D, Liang F, Holt I, Karamycheva S, Parvizi B, Pertea G, Sultana R, White J: The TIGR Gene Indices: analysis of gene transcript sequences in highly sampled eukaryotic species. Nucleic Acids Res. 2001, 29 (1): 159-164. 10.1093/nar/29.1.159.PubMedPubMed CentralView Article
- Garg R, Patel RK, Jhanwar S, Priya P, Bhattacharjee A, Yadav G, Bhatia S, Chattopadhyay D, Tyagi AK, Jain M: Gene Discovery and Tissue-Specific Transcriptome Analysis in Chickpea with Massively Parallel Pyrosequencing and Web Resource Development. Plant Physiol (Rockv). 2011, 156: 1661-1678. 10.1104/pp.111.178616.View Article
- Mizrachi E, Hefer CA, Ranik M, Joubert F, Myburg AA: De novo assembled expressed gene catalog of a fast-growing Eucalyptus tree produced by Illumina mRNA-Seq. BMC Genomics. 2010, 11: 681-10.1186/1471-2164-11-681.PubMedPubMed CentralView Article
- Xia Z, Xu H, Zhai J, Li D, Luo H, He C, Huang X: RNA-Seq analysis and de novo transcriptome assembly of Hevea brasiliensis. Plant Mol Biol. 2011, 77 (3): 299-308. 10.1007/s11103-011-9811-z.PubMedView Article
- Logacheva MD, Kasianov AS, Vinogradov DV, Samigullin TH, Gelfand MS, Makeev VJ, Penin AA: De novo sequencing and characterization of floral transcriptome in two species of buckwheat (Fagopyrum). BMC Genomics. 2011, 12: 30-10.1186/1471-2164-12-30.PubMedPubMed CentralView Article
- Applied_Biosystems: SOLiD™ Total RNA-Seq Kit. 2010
- Martins M: Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal. 2011, 17 (1): 10-12.View Article
- Schmieder R, Edwards R: Quality control and preprocessing of metagenomic datasets. Bioinformatics. 2011, 27: 863-864. 10.1093/bioinformatics/btr026.PubMedPubMed CentralView Article
- Schulz MH, Zerbino DR, Vingron MA, Birney EC: Oases: Robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics. 2012, 28 (8): 1086-1092. 10.1093/bioinformatics/bts094.PubMedPubMed CentralView Article
- Zerbino DR, Birney E: Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008, 18: 821-829. 10.1101/gr.074492.107.PubMedPubMed CentralView Article
- Iseli C, Jongeneel CV, Bucher P: ESTScan: a program for detecting, evaluating, and reconstructing potential coding regions in EST sequences. Proc Int Conf Intell Syst Mol Biol. 1999, ISMB99: 138-148.
- Schiex T, Gouzy J, Moisan A, de Oliveira Y: FrameD: A flexible program for quality check and gene prediction in prokaryotic genomes and noisy matured eukaryotic sequences. Nucleic Acids Res. 2003, 31 (13): 3738-3741. 10.1093/nar/gkg610.PubMedPubMed CentralView Article
- Li H, Durbin R: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009, 25 (14): 1754-1760. 10.1093/bioinformatics/btp324.PubMedPubMed CentralView Article
- Conesa A, Gotz S: Blast2GO: A comprehensive suite for functional analysis in plant genomics. Int J Plant Genomics. 2008, 2008: 619832-PubMedPubMed CentralView Article
- Fischer S, Brunk BP, Chen F, Gao X, Harb OS, Iodice JB, Shanmugam D, Roos DS, Stoeckert CJ: Using OrthoMCL to assign proteins to OrthoMCL-DB groups or to cluster proteomes into new ortholog groups. Curr Protoc Bioinformatics. 2011, 6: 6.12.11-6.12.19.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.