Phaseolus vulgaris, or common bean, is the most important edible food legume in the world. It provides 15% of the protein and 30% of the caloric requirement of the world's population, and represents 50% of the grain legumes consumed worldwide. Common bean has several market classes, which include dry beans, canned beans, and green beans. The related legume soybean (Glycine max), which is one of the most important sources of seed protein and oil, belongs to the same group of papilionoid legumes as common bean. Common bean and soybean diverged nearly 20 million years ago, around the time of the major duplication event in soybean [2, 3]. Synteny analysis indicates that most segments of any one common bean linkage group are highly similar to two soybean chromosomes. Since P. vulgaris is a true diploid with a genome size estimated to be between 588 and 637 megabase pairs (Mbp) [5–7], it will serve as a model for understanding the ~1,100 Mbp soybean genome. Common bean is also related to other members of the papilionoid legumes including cowpea (Vigna unguiculata) and pigeon pea (Vigna radiata). Therefore, better knowledge of the common bean genome will facilitate understanding of other important legumes as well as the development of comparative genomics resources.
The common bean genome is currently being sequenced. Once sequencing is complete, the expressed genes in common bean will need to be predicted, annotated, and validated. Large sets of annotated sequences, derived by identifying, sequencing, and validating genes expressed in common bean, will help in developing an accurate and complete structural annotation of the genome, a valid transcriptome map, and an understanding of the genetic basis of agriculturally important traits. The transcriptome sequences will also help in identifying transcription factors and small RNAs, in understanding gene families, and, very importantly, in developing molecular markers for common bean.
To date, there have been several important publications on common bean transcriptome sequencing and bioinformatics analysis. Ramirez et al. sequenced 21,026 ESTs from various cDNA libraries (nitrogen-fixing root nodules, phosphorus-deficient roots, developing pods, and leaves) derived from the Meso-American common bean genotype Negro Jamapa 81, and from leaves of the Andean genotype G19833. Approximately 10,000 of these ESTs were classified into 2,226 contigs and 7,969 singletons.
Melotto et al. constructed three cDNA libraries from the common bean breeding line SEL1308. These libraries were derived from 19-day-old trifoliate leaves, 10-day-old shoots, and 13-day-old shoots (inoculated with Colletotrichum lindemuthianum). From the 5,255 single-pass sequences obtained in this work, trimming and clustering identified 3,126 unigenes, of which only 314 showed similarity to sequences in the existing database.
Tian et al. constructed a suppression subtractive cDNA library to identify genes involved in the response to phosphorus starvation. Six-day-old seedlings of the genotype G19833 were exposed to low and high phosphorus (5 and 1,000 μmol/L, respectively), and the poly(A+) RNA derived from total shoot and root RNA of plants grown in these conditions was used to construct the libraries. After dot-blot hybridization and identification of differentially expressed clones, full-length cDNAs were identified from cDNA libraries constructed from the low and high P exposure experiments. Differentially expressed genes were grouped into five functional categories, and the authors were able to further classify 72 genes by comparison to the GenBank non-redundant database using BLASTx E-values less than 1.0 × 10⁻².
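As an illustration of the kind of threshold-based classification described above, the following minimal sketch keeps, for each query sequence, the best database hit whose E-value falls below a cutoff. It is not the authors' pipeline: it assumes that the BLASTx threshold quoted above is an E-value cutoff of 1e-2, that hits are available in the standard 12-column tabular BLAST output (E-value in the eleventh column), and it uses hypothetical clone and subject identifiers.

```python
import csv
import io

EVALUE_CUTOFF = 1e-2  # assumed reading of the BLASTx threshold quoted in the text


def best_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Return {query: (subject, evalue)} for the best hit below the cutoff.

    Expects tab-separated BLAST output with the default 12 columns:
    qseqid, sseqid, pident, length, mismatch, gapopen,
    qstart, qend, sstart, send, evalue, bitscore.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue >= cutoff:
            continue  # hit is too weak to support a classification
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Hypothetical example rows (identifiers and values are made up).
rows = [
    ["clone_01", "subj_A", "88.0", "120", "14", "0", "1", "360", "5", "124", "3e-40", "160"],
    ["clone_01", "subj_B", "60.0", "90", "36", "0", "1", "270", "9", "98", "1e-5", "50"],
    ["clone_02", "subj_C", "35.0", "40", "26", "0", "1", "120", "7", "46", "0.5", "25"],
]
sample = "\n".join("\t".join(r) for r in rows)
hits = best_hits(sample)
print(hits)
```

In this made-up example, clone_01 is assigned to its strongest hit and clone_02 is left unclassified because its only hit does not pass the cutoff.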
Thibivilliers et al. identified 6,202 new common bean ESTs (out of a total of 10,221 ESTs) by using a subtractive cDNA library constructed from the rust-resistant common bean cultivar Early Gallatin. This cultivar was inoculated with rust races 49 (avirulent on genotypes such as Early Gallatin that carry the rust resistance locus Ur-4) and 41 (a virulent race that is not recognized by Ur-4). To find differentially expressed genes, suppression subtractive expression experiments were carried out to identify sequences that were up-regulated in the susceptible and resistant host-pathogen interactions.
Despite these studies, there is still a paucity of common bean ESTs and genes deposited in GenBank (~83,448 ESTs as of September 2010) compared to other legume and plant models. Therefore, there is a need for deeper coverage and for EST sequences from diverse common bean tissues and genotypes.
Sequencing technologies have evolved from traditional dideoxynucleotide sequencing, through capillary-based sequencing, to current "next-generation" sequencing [12, 13]. The emergence of next-generation sequencing technologies has substantially advanced plant genome research, particularly for non-model plant species. Next-generation sequencing strategies can typically generate millions of sequence reads at a time without the need to clone the fragment libraries; they are faster than traditional capillary-based methods, which may be limited to 96 samples in a run and require the nucleic acid material (DNA or complementary DNA; cDNA) to be cloned into a plasmid and amplified in Escherichia coli (E. coli). The cloning bias that is typically present in genome sequencing projects can therefore be avoided, although, depending on the platform used, other platform-specific biases may be involved. An advantage of some next-generation sequencing technologies is that a priori information on genome organization and layout may not be necessary. The Roche 454 method uses the pyrophosphate molecule released when nucleotides are incorporated by DNA polymerase into the growing DNA chain to fuel reactions that end in the detection of light, which is emitted when luciferase oxidizes luciferin to oxyluciferin. Using an emulsion PCR approach, it can sequence 400 to 500 nucleotides of paired ends and produces approximately 400-600 Mbp per run. This method has been applied to genome and transcriptome [17–19] sequencing because of its high throughput, coverage, and cost savings.
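The per-run figures quoted above can be turned into a rough read count with simple arithmetic. The sketch below is only a back-of-the-envelope estimate: it assumes every read has the stated length and ignores quality filtering, and the function name is introduced here purely for illustration.

```python
def reads_per_run(yield_mbp: float, read_length_nt: float) -> float:
    """Approximate number of reads in a run from total yield and read length."""
    return yield_mbp * 1_000_000 / read_length_nt


# Figures quoted in the text: 400-600 Mbp per run, 400-500 nt per read.
fewest = reads_per_run(400, 500)  # smallest yield, longest reads
most = reads_per_run(600, 400)    # largest yield, shortest reads
print(f"{fewest:,.0f} to {most:,.0f} reads per run")
# prints "800,000 to 1,500,000 reads per run"
```

Under these assumptions a single run corresponds to roughly one million reads, compared with the 96 samples per run cited for capillary-based methods.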
In Arabidopsis thaliana, pyrosequencing has been tested to determine whether the technology provides an unbiased representation of transcripts relative to the sequenced genome. Using messenger RNA (mRNA) derived from Arabidopsis seedlings, Weber and colleagues identified 541,852 ESTs, which accounted for 17,449 gene loci and thus provided very deep coverage of the transcriptome. The analysis also revealed that all regions of the mRNA transcript were equally represented, which removes concerns about bias; very importantly, over 16,000 of the ESTs identified in this research were novel and absent from the existing EST database. These researchers therefore concluded that the pyrosequencing platform can aid gene discovery and expression analysis in non-model plants, and could be used for both genomic and transcriptomic analysis.
In the legume Medicago truncatula, the 454 technology has been used to generate 252,384 reads with an average (cleaned) read length of 92 nucleotides, yielding a total of 184,599 unique sequences after clustering and assembly. Gene ontology (GO) assignments from matches to the completed Arabidopsis sequence showed broad coverage of the GO categories. Cheung and colleagues were also able to map 70,026 reads generated in this research to 785 Medicago BAC sequences. In their analysis of the maize shoot apical meristem, Emrich and colleagues discovered 261,000 ESTs, annotated more than 25,000 maize genomic sequences, and identified ~400 maize transcripts for which homologs have not been identified in any other species. The value of this approach for novel gene/EST discovery is underlined by the fact that nearly 30% of the ESTs identified in this study did not match the ~648,000 maize ESTs in the databases. Velasco and colleagues generated a draft genome of grape, Vitis vinifera Pinot Noir, by using a combination of Sanger sequencing and 454 sequencing. They identified 29,585 predicted genes, of which 96.1% could be assigned to genetic linkage groups (LGs). Many of the genes identified have potential implications for grapevine cultivation, including those that influence wine quality and the response to pathogens. Detailed analysis was also carried out to identify sequences related to disease resistance, phenolic and terpenoid pathways, transcription factors, repetitive elements, and non-coding RNAs (including microRNAs, transfer RNAs, small nuclear RNAs, ribosomal RNAs and small nucleolar RNAs).
Sequences obtained in common bean by deep sequencing can be placed on common bean maps by using syntenic relationships between common bean and soybean; these two species diverged over 19 million years ago (MYA). McClean et al. determined syntenic relationships between common bean and soybean by taking genetically positioned transcript loci and mapping them to the soybean 1.01 pseudochromosome assembly. Prior evidence has shown that almost every common bean locus maps to two soybean locations (a reflection of diploidy in common bean and recent polyploidy in soybean), and since a genome assembly is not yet available for common bean, this synteny can be put to effective use. By referencing common bean loci with unknown physical map positions (in common bean) to syntenic regions in soybean, and then referencing back to the common bean genetic map, approximate locations of common bean transcript loci were determined. Using this method, the authors determined the median physical-to-genetic distance ratio in common bean to be ~120 Kb/cM (based on the soybean physical distance derived from the pseudochromosome assembly). This allowed ~15,000 EST contigs and singletons to be placed on the common bean map, and the strategy will allow the discovery and chromosomal placement of genes controlling important traits in both common bean and soybean. Therefore, until the common bean genome is completed, synteny with soybean can be used to determine more accurate locations of common bean transcripts.
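To make the physical-to-genetic distance ratio concrete, the sketch below computes a median Kb/cM value from adjacent markers in a single syntenic block. It is a simplified illustration, not the procedure used by McClean et al.: the function name is hypothetical, the marker positions are invented, and each marker is assumed to carry a genetic position (cM) on the common bean map and a physical position (bp) in the syntenic soybean region.

```python
from statistics import median


def median_kb_per_cm(markers):
    """Median physical-to-genetic ratio (Kb/cM) over adjacent marker pairs.

    markers: list of (genetic_position_cM, physical_position_bp) tuples
    belonging to one syntenic block.
    """
    ordered = sorted(markers)  # order markers along the genetic map
    ratios = []
    for (cm_a, bp_a), (cm_b, bp_b) in zip(ordered, ordered[1:]):
        d_cm = cm_b - cm_a
        if d_cm > 0:  # skip co-segregating markers (zero genetic distance)
            ratios.append(abs(bp_b - bp_a) / 1000 / d_cm)
    return median(ratios)


# Hypothetical markers: (cM on the bean map, bp in the soybean block).
markers = [(0.0, 1_000_000), (2.0, 1_240_000), (5.0, 1_600_000), (6.0, 1_700_000)]
ratio = median_kb_per_cm(markers)
print(ratio)  # 120.0 for these invented values

# With a ratio of ~120 Kb/cM, a 5 cM interval spans roughly 600 Kb.
interval_kb = 5 * ratio
```

The median is used here because a few intervals with unusually high or low recombination would otherwise dominate a mean.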