Biotechnological methods are being successfully employed in plant breeding to confer, for example, increased resistance to pathogens. To use biotechnology in breeding programs, however, prior genetic and genomic knowledge is needed to isolate genes and characterize genotypes and phenotypes. Plant genomics, especially for non-model plants, is often challenging because of high levels of ploidy, large genome sizes, low genome complexities and large proportions of repeat sequences. Instead of sequencing the complete genome, sequencing the transcriptome is a good alternative for rapidly and efficiently accessing the expressed genes and for characterizing phenotypes.
Transcriptomes can be characterized with hybridization- or sequence-based technologies, but all hybridization-based technologies require prior knowledge of the transcriptome. In plant biology, Expressed Sequence Tags (ESTs) based on the sequencing of cDNA libraries have been a great advance and have contributed significantly to the study of plant genomics. The sequencing of cDNA libraries, however, has many limitations, such as the time and cost involved in cloning and Sanger sequencing. Serial analysis of gene expression (SAGE) and its improved variant SuperSAGE have been used to increase throughput and sampling, but the reliability of transcript identification is one of their limitations. Second-generation sequencing (2-GS) technology circumvents these limitations and generates large datasets of short reads in a shorter time and with a lower reagent cost per base than Sanger sequencing. 2-GS technologies have opened great prospects for genetic studies of non-model plants.
The genome of black pepper is very poorly characterized, with only 184 sequences deposited in GenBank (accessed 3/11/2011). Previous genetic studies have focused on the diversity, phylogeny and taxonomy of cultivated black pepper and its wild relatives [19–22]. Despite the agricultural and economic importance of this spice, only DNA sequences of phylogenetic markers are available in molecular databases (Figure 1). Another difficulty for the genomic study of black pepper is the lack of a sequenced genome from the magnoliid group, described as a basal angiosperm group [23, 24] (Figure 1). All fully sequenced genomes available in databases are from monocot or dicot plants, which are phylogenetically distant from black pepper, as shown in Figure 1 [23, 24].
About 20 Gb of the data can be mapped, corresponding to 71 million short reads of 50 bp. De novo assembly generated 22363 contigs and 10338 unigenes with an N50 of 168 bp. The N50 is slightly lower than the 202 bp obtained from an Illumina 75 bp dataset for the Ipomoea batatas transcriptome and the 208 bp obtained from a 75 bp single- and paired-end dataset from Camellia sinensis [25, 26]. The number of contigs, however, is lower than in these other studies: 56516 and 127094 contigs for I. batatas and C. sinensis, respectively. Although the number of reads is higher in our study (71 million vs 59 and 35 million for I. batatas and C. sinensis), the number of contigs is lower. This result could be explained by the difference in read length (50 bp vs 75 bp) and by the lower quality of SOLiD datasets, in which only 50–60% of reads map against a reference transcriptome [27, 28].
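Because the comparison above rests on the N50 statistic, a minimal sketch of how it is commonly computed may be useful. The N50 is taken here as the length of the contig at which the cumulative length, summed from the longest contig downwards, first reaches half of the total assembly length. The function name `n50` and the toy contig lengths are illustrative assumptions, not values or code from this study.

```python
def n50(lengths):
    """Return the N50 (in bp) of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    # Sort from longest to shortest contig.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that brings the cumulative length to at least
        # half of the assembly defines the N50.
        if running >= half_total:
            return length


# Hypothetical contig lengths, for illustration only.
toy_lengths = [100, 150, 200, 250, 300]
print(n50(toy_lengths))
```

With these toy values the total length is 1000 bp, and the two longest contigs (300 + 250 bp) already exceed 500 bp, so the N50 is 250 bp. Many short contigs pull the N50 down, which is why assemblies from short single-end reads tend to score lower.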
Estimating the coverage of the transcriptome is complex because of the lack of genetic data from black pepper. Black pepper is tetraploid with 2n = 52. The size of its genome, estimated by cytogenetic studies, is about 1220 Mbp (1C = 1.25 pg), which is about ten-fold larger than the genome of Arabidopsis thaliana. In A. thaliana, about 6% of the genome is transcribed, representing 41671 transcripts. In the A. thaliana TGI database, however, two root-derived cDNA libraries have been extensively sequenced and have identified 5609 and 5884 transcripts (11.3 and 9.3 Mbp), which seems to indicate that only about 14% of predicted genes are expressed in the root. Assuming similar proportions in P. nigrum, the entire transcriptome is estimated to be 73.2 Mbp (6% of the genome) and the root transcriptome to be 10.3 Mbp (0.84% of the genome).
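The arithmetic behind these estimates can be retraced with a short sketch. It only restates the proportional reasoning of the paragraph above: the genome size, the 6% transcribed fraction and the rounded 14% root fraction are the values quoted in the text, while the function name `estimate_transcriptome` and its parameters are hypothetical.

```python
GENOME_MBP = 1220.0          # P. nigrum genome size quoted in the text
TRANSCRIBED_FRACTION = 0.06  # share of the A. thaliana genome that is transcribed
ROOT_FRACTION = 0.14         # rounded share of predicted genes expressed in root


def estimate_transcriptome(genome_mbp, transcribed_fraction, tissue_fraction):
    """Scale a genome size down to whole and tissue-specific transcriptome sizes."""
    whole_mbp = genome_mbp * transcribed_fraction
    tissue_mbp = whole_mbp * tissue_fraction
    # Tissue transcriptome expressed as a share of the whole genome.
    tissue_share = transcribed_fraction * tissue_fraction
    return whole_mbp, tissue_mbp, tissue_share


whole, root, share = estimate_transcriptome(
    GENOME_MBP, TRANSCRIBED_FRACTION, ROOT_FRACTION
)
print(f"whole transcriptome: {whole:.1f} Mbp")
print(f"root transcriptome: {root:.2f} Mbp ({share:.2%} of the genome)")
```

With the rounded 14% figure the root estimate comes out slightly below the 10.3 Mbp quoted above; using the unrounded ratio of root transcripts to total transcripts (for example 5884/41671) moves it closer to that value.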
The cDNA library was sequenced with a high coverage per base, 62x on average, summing 3.8 Mbp. As an example, the transcript NODE-882-2-0_930 shown in Figure 4 has good read coverage, up to 20x, and shows high identity and good co-linearity with an mRNA sequence from P. trichocarpa, indicating good quality of sequencing and assembly. The comparison of predicted CDSs with the proteomes of other plants demonstrates that the percentage of sequences with significant homology is higher with dicots (51–53%) than with monocots (15–23.4%). This low degree of homology may be due to the phylogenetic distance between magnoliids and other groups (monocots and dicots) and to the different average transcript size (211 bp), as described during the sequencing of the I. batatas transcriptome. The method of k-mer additive assembly is efficient in maximizing the number of transcripts, and interestingly, the proportion of transcripts with significant homology (22–35%), the N50 (129 to 236 bp) and the average transcript length (156–214 bp) are relatively well conserved across k-mer sizes (19–43 bp) (Figure 2). Comparison with other studies of plant transcriptomes is difficult because the reads used with the Illumina platform are longer (75 bp) and the libraries are paired-end, which clearly facilitates assembly, as demonstrated for I. batatas. Finally, limitations of the 2-GS platform produce high numbers of short contigs, but our dataset has 2144 unigenes over 200 bp in length and about 72% of reads with significant homology, which represents a significant advance in our biological knowledge of black pepper.
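For readers unfamiliar with per-base coverage, the following sketch shows one way such values can be derived from read alignments on a single contig. It is not the pipeline used in this study: the alignments are assumed to be given as 0-based, half-open (start, end) intervals, and the function names and toy intervals are hypothetical.

```python
def per_base_depth(contig_length, alignments):
    """Return the read depth at every position of a contig.

    alignments: iterable of (start, end) pairs, 0-based and half-open.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (contig_length + 1)
    for start, end in alignments:
        start = max(start, 0)
        end = min(end, contig_length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depth = []
    running = 0
    for position in range(contig_length):
        running += diff[position]
        depth.append(running)
    return depth


def mean_depth(depth):
    """Average coverage per base over the contig."""
    return sum(depth) / len(depth)


# Three hypothetical reads aligned to a 10 bp contig.
toy_depth = per_base_depth(10, [(0, 5), (3, 8), (4, 10)])
print(toy_depth, mean_depth(toy_depth), max(toy_depth))
```

The mean of the per-base depths corresponds to an average coverage such as the 62x reported for the library, whereas the maximum on a single contig corresponds to a statement such as "up to 20x" for one transcript.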
The functional annotation of the transcript dataset is informative for the physiology of stems and roots of black pepper. NGS datasets from plant root tissue are only available for ginseng and sweetpotato. The “response to stimulus” and “localization” categories of Figure 5 are highly represented in the transcriptomes from roots of both sweetpotato and black pepper. The profile of molecular functions is conserved between the two plants. In both, the function “transporter activity” is highly represented. This feature may be explained by the importance of roots in the absorption of microelements. This preliminary annotation of the black pepper transcriptome is very important and should lead to the identification of new genes coding for transporters, transcription factors specific to root and stem tissue, or proteins used for defence.