Construction and EST sequencing of full-length, drought stress cDNA libraries for common beans (Phaseolus vulgaris L.)
© Blair et al; licensee BioMed Central Ltd. 2011
Received: 21 January 2011
Accepted: 25 November 2011
Published: 25 November 2011
Common bean is an important legume crop with only a moderate number of short expressed sequence tags (ESTs) made with traditional methods. The goal of this research was to use full-length cDNA technology to develop ESTs that would overlap with the beginning of open reading frames and therefore be useful for gene annotation of genomic sequences. The library was also constructed to represent genes expressed under drought, low soil phosphorus and high soil aluminum toxicity. We also undertook comparisons of the full-length cDNA library to two previous non-full clone EST sets for common bean.
Two full-length cDNA libraries were constructed: one for the drought tolerant Mesoamerican genotype BAT477 and the other one for the acid-soil tolerant Andean genotype G19833 which has been selected for genome sequencing. Plants were grown in three soil types using deep rooting cylinders subjected to drought and non-drought stress and tissues were collected from both roots and above ground parts. A total of 20,000 clones were selected robotically, half from each library. Then, nearly 10,000 clones from the G19833 library were sequenced with an average read length of 850 nucleotides. A total of 4,219 unigenes were identified consisting of 2,981 contigs and 1,238 singletons. These were functionally annotated with gene ontology terms and placed into KEGG pathways. Compared to other EST sequencing efforts in common bean, about half of the sequences were novel or represented the 5' ends of known genes.
The present full-length cDNA libraries add to the technological toolbox available for common bean, and our sequencing of these clones substantially increases the number of unique EST sequences available for the common bean genome. All of this should be useful for functional gene annotation, analysis of splice site variants and intron/exon boundary determination by comparison to soybean genes or with common bean whole-genome sequences. In addition, the library has a large number of transcription factors and will be interesting for discovery and validation of drought or abiotic stress related genes in common bean.
The legume family is the second most important crop family as a human food source after cereals and in addition provides scores of other products including fodder and feedstock, valuable timber, vegetable oil, bio-fuels, important medicines and even poisons. Legumes are unequalled for stabilization and reforestation of degraded land due to their ability to fix nitrogen, compete with other plants, repel herbivory and grow on acid soils in a range of environments. Many legumes are major elements of international trade because they are high value and a source of protein, calories and oil. Within the legume family, common bean (Phaseolus vulgaris) is the most important crop for direct human consumption and is third in overall production after soybean (Glycine max L.) and peanuts (Arachis hypogaea L.). However, unlike these species, beans are primarily grown on small- to medium-scale farms and are not used for industrial processing.
Expressed sequence tags (ESTs) are partial sequences of transcribed genes and represent gene expression in different tissues and often different genotypes depending on the plant treatment and development stage at which the mRNA was extracted. ESTs are derived from transcribed mRNA which is cloned into cDNA libraries that are then sequenced en masse. Therefore, a large effort has gone into constructing many different cDNA libraries for major legume crops such as soybean and model legume species such as Lotus japonicus and barrel medic, Medicago truncatula.
The number of ESTs found for all plant species now is over 21 million sequences. For the legumes a total of over 3 million sequences have been generated with the largest numbers in soybean (1.5 million) and the model legumes barrel medic (280,000) and lotus (242,000). This compares to over 6 million sequences in the Gramineae and nearly 3 million in the Brassicaceae.
Comprehensive libraries have been made for rice and Arabidopsis thaliana, for example. Among the legumes, relatively fewer ESTs are found for the crop legumes than for the model legumes and soybean. Among the more minor legume crops, only a recent effort in cowpea (Vigna unguiculata) by Muchero et al. nears the threshold of 200,000 total ESTs while common bean has about half that.
In common bean, there have been very few large scale efforts at cDNA cloning or EST sequencing and the current number of ESTs is 114,139 as of December 2010. Preparation of ESTs for common bean began with moderate numbers of GenBank entries by groups from CIAT, UNESP and UNAM [10–12] organizations in Colombia, Brazil and Mexico, respectively, showing the importance of this crop to Latin America.
Additional ESTs have been sequenced or analyzed in US universities such as Univ. of Minnesota  and Univ. of Missouri . Among these studies the first medium sized collections by Melotto et al.  and Ramírez et al.  consisted of 5,243 and 15,333 ESTs or unigenes, respectively. However, these represented EST sequencing of three and five different cDNA libraries, respectively.
The tissues sampled in common bean have represented mainly disease-infected seedling tissue for the set of libraries from Melotto et al. and then a range of tissues from nodules and nodulated roots to leaves and pods for Ramírez et al. Since then there has been the publication of one large scale EST collection of 37,919 un-trimmed ESTs by Thibivilliers et al. from beans infected with rust (Uromyces appendiculatus) and one additional set of ESTs from two root libraries. In addition, a large number of ESTs (391,150) have been developed for the suspensor cells of the related species P. coccineus by UCLA. Finally, a Canadian group at the Univ. of Saskatchewan has sequenced 10,272 ESTs from P. angustissimus, another relative of common bean.
Among other tropical legumes, pigeonpea (Cajanus cajan L.) has had an EST project of around 10,000 sequences, which are of interest due to the close relationship with common bean and its adaptation to the same dry to sub-humid conditions beans face. Cultivated peanut, Arachis hypogaea, with 86,935 ESTs plus two ancestral species of peanut with around 32,000 ESTs each are the only other tropical legumes that have been emphasized.
Of these EST collections, only one from Ramírez et al. and another from Blair et al. have represented tolerance to abiotic stresses so far, with both research groups emphasizing genes expressed under low phosphorus conditions in roots. However, some efforts have been made to evaluate metabolic pathways and clone transcription factors or to sequence differentially expressed cDNAs from drought-treated tissues [18, 19]. Therefore, there is a need for additional EST sequencing in common bean and other tropical legumes, especially for tissues affected by the drought and soil or weather stresses that are very important issues for productivity of these crops.
Among the legumes and for common beans in particular, one aspect of transcriptome analysis and EST sequencing that has been missing is the cloning of full-length cDNA clones. This technology, as first described by Seki et al. and Carninci et al., consists of capturing mRNAs through their 5' caps and stabilizing the full transcript during ligation into an appropriate vector and during reverse transcription from the poly A tail. Full-length cDNA libraries have been made for a large range of Arabidopsis tissues [24, 25] and for several starch-rich crops [26, 27] but fewer for legumes, except for soybean.
Full length cDNA libraries are extremely useful for analysis of the transcriptome and for comparative genomics and genome sequence validation given that they represent entire transcription units rather than partial gene sequences like most other cDNA libraries . They are especially valuable in that they uncover the transcriptional start site for most genes and EST sequencing of their 5'ends uncovers the un-translated region and methionine-encoding, ATG codon, translational start signal. They can then be used along with non-full length cDNA sequenced clones to cover entire gene sequences allowing scientists to determine where the open reading frame starts and ends and anchoring all this information to genomic sequences.
These characteristics give full-length cDNA sequences essential roles in discovering alternative splicing patterns and promoter regions . In some cases, full length cDNA clones have been used to construct microarrays to characterize the binding of transcription factors to promoter elements within the 5' UTRs of genes . Full length cDNAs also have utility in functional and physical analysis of protein activity and structure through their use as expression vectors as reviewed in . Several examples exist of 3D crystal structure being determined through the use of these clones [29–32].
In addition, full length cDNA clones have a role in characterizing gene structure in different species. For example their 5' and 3'sequences can be used to compare GC content and folding capacity in 5'UTR (un-translated regions) versus ORF (open reading frames) and 3'UTR regions .
Finally, as with other sorts of ESTs, full-length cDNA clone sequencing can be used to develop many types of genetic markers including simple sequence repeats (SSRs), which tend to be in greater supply in 5'UTR sequences, and single nucleotide polymorphisms (SNPs), especially for different parts of ORFs [33, 34]. It is important to use standard genotypes such as those from genome sequencing efforts in the construction of full length cDNA libraries, as the genome to gene comparisons become more straightforward when this occurs. In summary, full length cDNA technology can be very important for gene annotation, for sequencing of the transcriptome and for comparative genomics.
The objectives of this research, therefore, were to make full-length cDNA libraries that would be useful for gene discovery in common bean, genome annotation of the sequenced genotypes and for an understanding of abiotic stress tolerance in the crop. Multiple treatments were sampled including unstressed, drought, low phosphorus and aluminum stressed plants so as to enhance the activation of the transcriptome machinery and naturally normalize the sampling of mRNAs. Furthermore, two genotypes were used in this initial fl-cDNA library construction, one known to be drought tolerant (BAT477) and the other which is the subject of full-length genomic sequencing (G19833). A total of nearly 10,000 ESTs were generated from the second library to show the utility of this technique in determining gene structure.
This EST sequencing project was performed as part of a breeding project to discover molecular markers in common beans for marginal areas of Sub-Saharan Africa and the process of marker discovery from full-length cDNA sequences is discussed. We also aimed to compare the ESTs from the full-length cDNA library to two previous large EST sets for common bean and show the advantages this technology has for genomic tool development in this less-well studied species.
Two elite common bean genotypes were selected based on their attributes for stress resistance and use in genomics studies. First, the Mesoamerican gene pool advanced line BAT477 was selected based on its deep rooting ability and known drought tolerance, and second, the Andean gene pool genotype G19833 was selected based on resistance to both Al-toxicity and low phosphorus soil stresses [36, 37]. This latter genotype has been selected for whole genome sequencing based on the physical map made by Schlueter et al. and refined by Cordoba et al. The DOR364 × G19833 mapping population described in Blair et al. has also been used for determining the location of interesting genes such as those for red seed color and recessive resistance to BCMV as well as for QTL for nutritional traits.
Treatments, experimental conditions and sampling times
The genotypes were subjected to drought and irrigated (control) conditions as main treatments. Three types of soils with specific properties were collected at different locations including Palmira (highly compacted), Darién (low P content), and Quilichao (high Al content) for a total of six treatments as shown in Additional file 1. The experiment was established under greenhouse conditions using plastic PVC-tubes of 0.8 m inserted in non-translucent sleeves and filled with the specific soil mixed with sand at a 2:1 soil:sand ratio. Irrigation was stopped at 10 days after seed germination to simulate natural drought stress under all drought treatments. In a split plot design with 2 replicates, a control treatment was normally irrigated throughout the experiment for each soil type. Aerial and root tissues were harvested, washed and frozen immediately with liquid nitrogen for the subsequent total RNA isolations. Harvests of tissues were performed at five-day intervals until reaching a 35-day drought period, sampling from each of the development stages of the plants: seedlings (cotyledons and shoots), growing stage (leaves, stems, shoots and roots), and reproductive stage (flowers and small pods). Roots were carefully obtained by washing away the sand-soil mixture with a light stream of water and then rinsing in a plastic tub. Tissues of the irrigated control were only collected at 15, 30, and 45 days after germination, which were representative of the stages of growing and flowering (see Additional file 2 for explanation of the time course for tissue harvest and for a photograph of the deep root, cylinder culture system).
Total RNA isolations
Frozen tissues were ground mechanically to a fine powder using liquid nitrogen. Aerial and root tissues with their corresponding treatments and sampling times were processed separately. Total RNA isolations were carried out using the TRIzol® reagent (Invitrogen, Cat. # 15596-018) following the manufacturer's guidelines. Total RNA pellets were re-suspended in RNAse-free water and quantified by spectrophotometry. After quantification, the RNA obtained from each sampling time for the drought and irrigated treatments was pooled separately within each target genotype. Total RNA from aerial and root tissues was also pooled separately. RNA quality was determined through denaturing agarose gels (1.5%) containing formaldehyde and stained with ethidium bromide. Two different primer tags, one specific for BAT477-derived mRNAs and one specific for G19833-derived mRNAs, were used to identify the genotype of the isolated cDNA clones.
Library construction and EST sequencing
Library construction was performed at the RIKEN institute of Yokohama, Japan with the following abbreviated method using pooled mRNAs from each genotype in separate reactions: Poly (A)+ RNA was prepared with a μMACS mRNA Isolation Kit (Miltenyi Biotec) following the protocol of the manual. A full-length cDNA library was constructed from the biotinylated poly (A)+ RNA using the CAP trapper method and trehalose-thermoactivated reverse transcriptase. The resultant double-stranded cDNAs were digested with BamHI and XhoI, and ligated into the BamHI and SalI sites of a Lambda-based pFLCIII-cDNA vector. During the first strand cDNA synthesis an individualized tag primer sequence was incorporated into the libraries for each genotype: namely 5'-CTGATACG-3' for BAT477 and 5'-GTCATACG-3' for G19833, each identified by its placement between the poly A tail and a SfiI (GGCCNNN·NGGCC) site.
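As an illustration of how the two genotype tags could be used to assign clones, the following minimal Python sketch searches a read for either 8-nt tag on both strands. Only the two tag sequences come from the text; the function names and the "ambiguous"/"unknown" labels are hypothetical, and a real pipeline would probably also require the tag to sit next to the poly A tail and SfiI site, since an 8-mer can occur by chance elsewhere in an insert.

```python
# Tag sequences are taken from the text; all names below are illustrative.
GENOTYPE_TAGS = {"CTGATACG": "BAT477", "GTCATACG": "G19833"}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def classify_genotype(read: str) -> str:
    """Assign a read to a genotype if exactly one tag is found on either strand."""
    read = read.upper()
    found = set()
    for strand in (read, reverse_complement(read)):
        for tag, genotype in GENOTYPE_TAGS.items():
            if tag in strand:
                found.add(genotype)
    if len(found) == 1:
        return found.pop()
    # Both tags present, or none: do not guess.
    return "ambiguous" if found else "unknown"
```

Checking both strands means the same function can be applied regardless of which end of the insert a read was primed from.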
Once transformed into Escherichia coli bacteria, clones were picked by an automated robotic colony picker to a total of 10,000 clones per library (20,000 clones in all, half from each library). A total of 384 individual clones were sized and sequenced from both ends at RIKEN to determine their insert size and quality before sending a total of 9,984 clones from the G19833 library (approximately half the library) for sequencing at the Washington University sequencing center in St. Louis, Missouri. A total of 26 plates were sequenced from glycerol stocks on automated capillary DNA sequencers (ABI 3730× from Applied Biosystems). Sequencing was performed using the M13-21 primer (5'-TGTAAAACGACGGCCAGT-3') to evaluate the clones from the 5' end of the full-length cDNA inserts. Although we made two libraries (one Andean from G19833 and one Mesoamerican from BAT477) we only sequenced from the first of these given funding constraints and given the relative importance of G19833, which is being sequenced by a whole genome shotgun approach (S. Jackson, pers. communication).
Sequence reads were trimmed for low quality and vector contamination. This was accomplished for each full length EST sequence using Phred software to eliminate low quality regions with more than one N in 100 bases or a stretch of bases with Phred quality score < 20. These were generally found at the 3'end of the sequences and were discarded. Vector sequences, meanwhile, were eliminated by TrimVector (Sequencher, Ann Arbor, MI) using a database of vectors found at NCBI followed by manual confirmation with local cleaning using the software program Geneious (Biomatters Ltd, Auckland, New Zealand) to check for poor sequences. Poly-N stretches were masked and small insert clones (< 100 nt) or hits to E. coli sequences were disregarded. Poly-A tails were identified and trimmed at their adjacent base if followed by vector sequences. The software program BLAST2GO, working with Blast 2.2.23, was used to remove any hits to ribosomal, chloroplast and mitochondrial sequences along with non-plant hits. The software CAP3 with default parameters (gap penalty factor, N > 6; segment pair score cutoff, N > 40; overlap length score cutoff, N > 80, etc.) was used to assemble the newly-created and cleaned full-length ESTs into contigs and unigenes. Comparisons were made between the distribution of cleaned sequences in the full-length cDNA library and those of Ramírez et al. and Thibivilliers et al.
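The quality criteria described above can be sketched in a few lines of Python. This is a simplified stand-in, not the algorithm used by Phred or Sequencher: it trims bases from the 3' end until the last base has a quality of at least 20 and the final 100-base window holds no more than one N, and then applies the 100-base minimum length. The function names and the exact windowing rule are assumptions made for illustration.

```python
def trim_3prime(seq, quals, min_q=20, window=100, max_n=1):
    """Trim the 3' end of a read (illustrative rule, not Phred's own).

    A base is removed from the 3' end while either its quality is below
    min_q or the last `window` bases contain more than max_n ambiguous
    (N) calls. Returns the trimmed sequence and quality list.
    """
    end = len(seq)
    while end > 0:
        tail = seq[max(0, end - window):end].upper()
        if quals[end - 1] < min_q or tail.count("N") > max_n:
            end -= 1
        else:
            break
    return seq[:end], quals[:end]


def passes_length(seq, min_len=100):
    """Keep only reads at least min_len bases long after trimming."""
    return len(seq) >= min_len


# Example: 150 good bases followed by three low-quality N calls.
read = "A" * 150 + "NNN"
scores = [30] * 150 + [5] * 3
trimmed, trimmed_scores = trim_3prime(read, scores)
```

In this example the three trailing N calls are removed and the remaining 150 bases pass the length filter. Trimming only from the 3' end reflects the observation above that low-quality regions were generally found there.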
Gene annotation and comparative genomics
BLAST2GO was again used with the assembly of EST sequences to determine the top-hit distribution of each unigene and contig from the full-length cDNA database. Searches were made against the corresponding non-redundant (nr) database with blastx against all higher plant proteins. The E-value and positive alignment length distributions were then determined with a high-scoring segment pair (HSP) cutoff of 33 and an E-value threshold of 1E-3. Gene ontologies (GO) were assigned based on Harris et al. and by evaluating against Uniprot, TAIR, GR-protein and Ecosys databases. Genes were then evaluated for their likely molecular and cellular function, cellular localization and involvement at four gene ontology levels. Gene annotations for the 5'untranslated region (UTR) and open-reading frame (ORF) border were made by comparing the BLAST2GO report for the beginning of known proteins and matching these with the start codon for each unigene using an E-value threshold for hits of 1E-6 up to a cutoff of 1E-55. Comparisons were also made to sequences from Ramírez et al. and Thibivilliers et al. and with the annotation of the soybean (G. max) genome and soybean ESTs to determine intron/exon boundaries and to confirm probable start codons. In addition, KEGG annotation was used to determine KO ontologies and to determine the position of the unigenes and singletons from the full assembly described above in various biochemical pathways, and directed acyclic graphing (DAG) was used to determine the gene relationships. Finally, simple sequence repeats were identified in the full-length sequences based on their locations within the 5'UTR or ORF using first the software program RepeatFinder and then more definitively SciRoKo.
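The filtering step described above (an E-value threshold of 1E-3 and an HSP cutoff of 33) can be illustrated with a short Python sketch that keeps, for each unigene, the best hit passing both thresholds. The record layout (dictionaries with "query", "subject", "evalue" and "hsp_len" keys) is hypothetical, and the HSP cutoff is interpreted here as a minimum aligned length, which is an assumption about how the setting was applied.

```python
def top_hits(hits, max_evalue=1e-3, min_hsp_len=33):
    """Return the lowest E-value hit per query among hits passing both cutoffs.

    `hits` is an iterable of dicts with the (hypothetical) keys
    "query", "subject", "evalue" and "hsp_len".
    """
    best = {}
    for hit in hits:
        if hit["evalue"] > max_evalue or hit["hsp_len"] < min_hsp_len:
            continue  # fails the E-value or HSP length cutoff
        current = best.get(hit["query"])
        if current is None or hit["evalue"] < current["evalue"]:
            best[hit["query"]] = hit
    return best


# Made-up records for illustration only.
example_hits = [
    {"query": "unigene1", "subject": "protA", "evalue": 1e-40, "hsp_len": 210},
    {"query": "unigene1", "subject": "protB", "evalue": 1e-12, "hsp_len": 150},
    {"query": "unigene2", "subject": "protC", "evalue": 0.5, "hsp_len": 80},
    {"query": "unigene3", "subject": "protD", "evalue": 1e-8, "hsp_len": 20},
]
best_by_unigene = top_hits(example_hits)
```

With these made-up records only unigene1 retains a hit (protA); unigene2 fails the E-value threshold and unigene3 fails the HSP length cutoff. Tallying the species of each retained subject would then give a top-hit distribution.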
Two full-length cDNA libraries were successfully constructed using the CAP-trapper technology, one for the drought tolerant Mesoamerican genotype BAT477 and one for the acid-soil tolerant Andean genotype G19833. The libraries were based on totals of 3.789 mg and 4.258 mg of high-quality total RNA obtained from the six different irrigation × soil treatments for these two genotypes, respectively, which was sufficient for the highly complex process of full-length mRNA selection from polyA mRNAs. For the libraries, a total of 20,000 clones were selected robotically (half from each genotype) and in preliminary sequencing of 5' and 3' ends the clones of both libraries (a 384 well plate each) were shown to average 1.5 kb in length and to have non-chimeric sequences mostly with poly A tails at their 3' end. Additional file 3 shows the distribution of the initial clones in terms of total length.
Table. Comparison of major EST sequencing efforts in common bean. Rows: sequence reads (after low-quality and vector trimming); proportion of unigenes per sequence (%); average EST length (nt); average contig length (nt); average singleton length (nt). (Table values not preserved.)
In our functional analysis of the unigenes discovered in the full-length cDNA library, we found that BLAST2GO found a range of hits from our low threshold of 1E-10 to 1E-175 and similarity values ranging from 40 to 100% alignment within a range of nucleotide windows. The top species hit for the full-length unigenes was soybean (over 1,500 unigenes) and then grape (over 500 hits). These were followed by 250 to 400 hits each with medicago, poplar and castor bean.
Table. Differences in categorization of the unigenes from the full length cDNA library compared to two other recent EST libraries of common bean. Columns: Full-length library (this study); Root library (Blair et al., in press). (Table values not preserved.)
Biological process 2: cellular component organization; cellular component biogenesis; immune system process; multi-cellular organismal process; response to stimulus.
Molecular function 2: electron carrier activity; enzyme regulator activity; molecular transducer activity; nutrient reservoir activity; structural molecular activity; translation regulator activity; transcription regulator activity.
Cellular component 3: organelle - membrane bounded; organelle - non membrane bound.
Additional differences were observed for the analysis of the full-length cDNA library here and the libraries from Ramírez et al.  for leaf tissue and for that of Thibivilliers et al. . This shows the high degree of normalization of the full-length cDNA library given its mix of root, shoot and leaf tissues from various water treatments and soil growth conditions. Response to abiotic stress, generalized stress, chemical stimuli, cellular as well as macro-molecular metabolic, primary metabolic, oxidation/reduction and catabolic processes were all important in the third level of biological processes (data not shown). The frequency of aquaporins in the library may reflect the adaptation to drought stress in half the tissue sources used for full-length cDNA library construction.
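Table. Simple sequence repeats found by two software programs in the unigene set of full-length cDNA sequences. Row label preserved: SciRoKo w/o mono-nt (SciRoKo without mono-nucleotide repeats). (Table values not preserved.)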
The major success of this research was the construction of two full-length cDNA libraries from two important common bean genotypes grown under three types of abiotic stress and the preliminary sequencing of the libraries with approximately 7,000 ESTs, with an average of 564 nucleotides in length. The success rate of the EST sequencing effort is equivalent to that of Ramírez et al. but slightly lower than that of Thibivilliers et al., which is to be expected given the library movement from RIKEN to Washington University for sequencing. The high rate of unigenes per HQ sequence is unusual among cDNA libraries established for common bean to date and reflects the broad representation of tissues used to create the full-length cDNA library and its predominant coverage of the 5'end of the ESTs.
A similar approach was taken by Umezawa et al.  when constructing a full-length cDNA library for soybean in which they used a total of seven stresses (drought, salt, freezing, low temperature, P starvation, flooding and nematodes) along with three specialized tissue types (flower buds, nodules and developing seeds) for their RNA extracts. We used a similar strategy and sampled all organs from both below ground and above ground parts up to 35 days of growth under the important common bean stresses of low phosphorus, aluminum toxicity and drought [2, 20]. We did not include developing seeds or pods because of the limitations of growing plants under cylinder pot culture where reproductive stage tissue is of lower quality than vegetative stage tissue. We sampled roots extensively by careful separation from soil.
In addition to the library construction itself, the EST sequencing now brings the total of ESTs in common bean to almost 120,000 sequences. Considering that most previous ESTs were in the range of 500 to 600 nt and were from regular cDNA libraries that only represented specific tissues and partial sequences starting from random points in the gene sequences (usually in the ORF) this work had the advantage of providing more diverse ESTs for the species. In addition we obtained many longer EST reads (from 700 to 900 nt long) that mostly represented the 5'UTR plus the often most crucial start codon and first exon of the ORFs, rather than the 3' ends typical of regular EST projects. This is a true benefit of full-length cDNA library construction and allowed us to conduct proper gene modeling of exons and introns when cDNA sequences were compared against genomic sequences of soybean for example. It is expected that the full-length cDNA sequences will be useful for annotation of the genome. Another advantage of our efforts in EST sequencing was derived from the fact that the multiple-tissue sampling that we did was found to increase the proportion of unigenes in certain gene ontology categories compared to other libraries. For example, it was notable that the number of total unigenes of the full-length library was almost two thirds the number of Ramírez et al.  despite half the sequencing effort, showing the success of mixed tissue libraries and full-length cDNA construction as a strategy for gene discovery.
This advantage of the full-length cDNA library and cap-trapping developed at RIKEN is based on the chemical introduction of a biotin group into the diol residue of the cap structure of eukaryotic mRNA. This step is followed by digestion by RNase I, a ribonuclease that can cleave single-stranded RNA at any site, and by selection of full-length cDNA. The libraries produced by this method contained a very high proportion of full-length cDNAs and produced an excellent yield without involving PCR amplification, which could introduce bias in the representativeness of the clones. Carninci et al. also found that introducing the disaccharide trehalose to the reverse transcriptase reaction at the high reaction temperature of 60°C resulted in the synthesis of even longer full-length cDNAs, and higher representation of long full-length cDNAs in the library. In summary, this method of selecting full-length cDNAs by biotinylation of the mRNA cap and streptavidin capture, followed by the use of the trehalose-thermostabilized reverse transcriptase, made it possible to prepare longer full-length cDNAs and at the same time to remove non-full-length cDNAs [24, 25]. This method has shown itself to be ideal for construction of high-content full-length cDNA libraries for various crops, to analyze domain order and determine start codons. Common bean is now the second legume, after soybean, to have one of these libraries available.
Recently, emphasis on EST sequencing has waned due to the advent of next generation sequencing techniques that can quickly dissect a transcriptome. However, full length cDNA sequences will remain useful in the discovery of alternative splice sites and for unraveling paralogs within gene families. For common bean, full length cDNAs retain even more relevance as it is a basal member of the Phaseoleae tribe, which contains other important species such as soybean, cowpea and pigeonpea [1, 50]. The comparisons of ESTs in diploid common bean and ancestral tetraploid soybean are likely to be important so as to discover which exact copies of genes are orthologs and in that manner make use of each species as a model for the other. Currently the number of ESTs within the Phaseoleae tribe is the highest among the legumes (2.2 M reads) compared to other legume tribes (Cicereae, Dalbergieae, Fabeae, Galegeae, Loteae and Trifolieae) which only have from 30 to 300 thousand each. Indeed, the tropical genera Cajanus, Glycine, Phaseolus and Vigna all represent economically important food or industrial crops while in the cool season legumes much of the EST sequencing has been done in model or forage species of arguably lesser importance. Finally, the increasing number of sequences in the Phaseoleae reflects a bias towards the Papilionoideae legumes compared to the Caesalpinioid and Mimosoid subfamilies with far fewer ESTs each.
The number of sequences available in related clades is increasingly correlated with the ease of marker development in any one member of each plant species or clade. We have found the full-length cDNA library useful for finding the 5'UTR and start codons of various genes from pathways that are important to legumes, such as abiotic stress response or nutrient accumulation. In this work we initiated the search for a new set of EST-SSRs with the thought that, given the high proportion of 5'UTR sequences in the ESTs we have obtained, there would be new classes and perhaps more polymorphic SSRs. So far we have applied two programs, finding between 175 and 1,400 SSRs, and are contemplating the use of other software for a more complete analysis before embarking on primer design. ESTs are known to be rich sources of SSRs. The difference in EST-SSR frequency found between RepeatFinder and SciRoKo could be due to the algorithm used in SSR identification, especially as the second software found mono-nucleotide repeats (without these only 1,464 SSRs were identified, equivalent to 18.1% of ESTs) while the first software found 2.5% of ESTs to have SSRs. In other future work, we plan to take advantage of the two libraries we have made, which are tagged at the 3' end with two different tags that distinguish each of the genotypes we used for library construction. Our plan is to sequence from the 3'end of the full-length clones already sequenced to construct scaffolds for each gene and then to compare the populations of cDNA clones from each genotype for single nucleotide polymorphisms (SNPs). We are beginning to evaluate assemblies of full-length cDNA sequences with regular EST sequences from common bean and related species to identify possible SNPs through comparison of Andean versus Mesoamerican or P. vulgaris versus P. coccineus assembled ESTs.
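To show how the treatment of mono-nucleotide repeats alone can change SSR counts, the following Python sketch detects perfect tandem repeats with regular expressions. It does not reproduce the RepeatFinder or SciRoKo algorithms; the minimum repeat numbers in MIN_REPEATS and the function names are hypothetical values chosen for illustration.

```python
import re

# Hypothetical minimum repeat counts per motif length (1 to 6 nt).
MIN_REPEATS = {1: 12, 2: 6, 3: 4, 4: 3, 5: 3, 6: 3}


def is_periodic(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    return motif in (motif + motif)[1:-1]


def find_ssrs(seq, include_mono=True):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        if k == 1 and not include_mono:
            continue
        # One motif of length k followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_periodic(motif):
                continue  # already reported under the shorter motif
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


# Made-up sequence with one (AT)6 repeat and one poly-A run of 12.
example = "GGC" + "AT" * 6 + "CCG" + "A" * 12 + "TTG"
with_mono = find_ssrs(example, include_mono=True)
without_mono = find_ssrs(example, include_mono=False)
```

On this made-up sequence the poly-A run is counted only when mono-nucleotide repeats are included, so the two settings report two and one SSRs respectively, which mirrors in miniature how the choice of repeat classes affects totals.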
This is possible since many of the EST libraries made so far for common bean represent a range of varieties from Andean snap beans (Early Gallatin) to Mesoamerican dry beans (Negro Jamapa).
In addition we will soon have the genomic sequences of the Andean landrace G19833 genotype and the Mesoamerican breeding line BAT93 genotype for these comparisons. Our plan is to sequence additional 5'and 3'ESTs from the two libraries in a forthcoming paper so as to determine the genotype of each clone and use these for bioinformatics analysis of single nucleotide polymorphisms in the comparison of BAT477 and G19833 with each other and with the sequenced BAT93 genome. The other aspect of genome annotation that we are interested in and which might be readily assisted by more sequencing of these libraries is in the uncovering of multi-gene families by distinguishing them at their 5'ends and in determining promoter sequences that are with in expressed transcripts. In this sense the aquaporin we identified as an example of alternate splicing is interesting for further characterization. The study of how completely the full-length clones covered parts of individual KEGG pathways was also interesting because it gave evidence of pathways that are turned on during stress generally (the perixosime pathway for example) or that have been shown to be both constitutive an related to specific stresses (such as the citrate cycle pathway which is important for generating organic acids during aluminum stress). Another activity will be to determine how representative the full-length cDNA clones are of all genes predicted to be expressed from the whole genome sequence.
The value of full-length cDNA libraries lies in their utility for the correct annotation of genomic sequences and functional analysis of genes, because they represent the 5' UTR and full ORF of most genes, unlike other cDNA cloning or EST sequencing efforts (Seki et al. 2002). When sequenced from both the 5' and 3' ends they can be used to create physical scaffolds that definitively determine transcript length. Since they are each unique clones, they are more useful for determining alternate splicing sites. This is the second set of full-length cDNA libraries made for a legume and one of the first outside of cereals, cassava or Arabidopsis. As such it is important to make use of the information gathered from this library for marker development and genome characterization useful for plant breeding.
Finally, the present full-length cDNA library adds substantially to the number of unique EST sequences available for the common bean genome and, in particular, provides 5' end sequences that are more distinctive and useful for gene identification. These ESTs should be useful for functional gene annotation, analysis of splice site variants, intron/exon determination, and evaluation of gene homologies or KEGG pathway confirmation, especially as future whole-genome sequences become available.
We are grateful to José Polonia and Idupulapati Rao for organizing the greenhouse experiments, to Natalia Hurtado for help with GenBank sequence submission, and to the Washington University sequencing center for EST generation. We acknowledge funding from the Generation Challenge Program through the Bill and Melinda Gates Foundation grant to the Tropical Legumes I project. We would also like to thank the Government of Japan and the RIKEN institute for counterpart funding.
- Varshney RK, Close TJ, Singh NK, Hoisington DA, Cook DR: Orphan legume crops enter the genomics era!. Curr Opin Plant Biol. 2009, 12: 202-210.
- Graham PH, Vance CP: Update on legume utilization, Legumes: Importance and Constraints to Greater Use. Plant Physiol. 2003, 131: 872-877.
- Broughton WJ, Hernandez G, Blair MW, Beebe S, Gepts P, Vanderleyden J: Beans (Phaseolus spp.) - model food legumes. Plant and Soil. 2003, 252: 55-128.
- Hatey F, Tosser-Klopp G, Clouscard-Martinato K, Mulsant P, Gasser F: Expressed sequence tags for genes: a review. Genet Selec Evol. 1998, 30: 521-524.
- Vodkin LO, Khanna A, Robin Shealy R, Steven J Clough SJ, Gonzalez DO, Philip R, Gracia Zabala G, Thibaud-Nissen F, Sidarous M, Strömvik MW, Shoop E, Schmidt C, Retzel E, Erpelding J, Shoemaker RC, Rodriguez-Huete AM, Polacco JC, Coryell V, Keim P, Gong G, Liu L, Pardinas J, Schweitzer P: Microarrays for global expression constructed with a low redundancy set of 27,500 sequenced cDNAs representing an array of developmental stages and physiological conditions of the soybean plant. BMC Genomics. 2004, 5: 73.
- Asamizu E, Nakamura Y, Sato S, Tabata S: Characteristics of the Lotus japonicus gene repertoire deduced from large-scale expressed sequence tag (EST) analysis. Plant Mol Biol. 2004, 54: 405-414.
- Cheung F, Haas BJ, Goldberg SMD, May GD, Xiao Y, Town CD: Sequencing Medicago truncatula expressed sequenced tags using 454 Life Sciences technology. BMC Genomics. 2006, 7: 272.
- Asamizu E, Nakamura Y, Sato S, Tabata S: A large scale analysis of cDNA in Arabidopsis thaliana: generation of 12,028 non-redundant expressed sequence tags from normalized and size-selected cDNA libraries. DNA Res. 2000, 7: 175-180.
- Muchero W, Diop NN, Bhat PR, Fenton RD, Wanamaker S, Pottor M, Hearne S, Cisse N, Fatokun C, Ehlers JD, Roberts PA, Close TJ: A consensus genetic map of cowpea [Vigna unguiculata (L) Walp.] and synteny based on EST-derived SNPs. Proc Natl Acad Sci. 2009, 27: 18159-18164.
- Blair MW, Muñoz-Torres M, Giraldo MC, Pedraza F: Development and diversity assessment of Andean-derived, gene-based microsatellites for common bean (Phaseolus vulgaris L.). BMC Plant Bio. 2009, 9: 100.
- Melotto M, Monteiro-Vitorello CB, Bruschi AG, Camargo LEA: Comparative bioinformatic analysis of genes expressed in common bean (Phaseolus vulgaris) seedlings. Genome. 2005, 48: 562-570.
- Ramírez M, Graham MA, Blanco-López L, Silvente S, Medrano-Soto A, Blair MW, Hernández G, Vance CP, Lara M: Sequencing and analysis of common bean ESTs: Building a foundation for functional genomics. Plant Phys. 2005, 137: 1211-1227.
- Graham MA, Ramírez M, Valdés-Lopez V, Lara M, Tesfaye M, Vance CP, Hernández G: Identification of candidate phosphorus stress induced genes in Phaseolus vulgaris through clustering analysis across several plant species. Func Pl Bio. 2006, 33: 789-797.
- Thibivilliers S, Joshi T, Campbell KB, Scheffler B, Xu D, Cooper B, Nguyen HT, Stacey G: Generation of Phaseolus vulgaris ESTs and investigation of their regulation upon Uromyces appendiculatus infection. BMC Plant Bio. 2009, 9: 46.
- Blair MW, Fernandez AC, Pedraza F, Muñoz-Torres MC, Kapu Sella N, Brown K, Lynch JP: Parallel sequencing of ESTs from two cDNA libraries for high and low phosphorus adaptation in common beans. The Plant Genome. 2011.
- Raju NL, Gnanesh BN, Lekha P, Jayashree B, Pande S, Hiremath PJ, Byregowda M, Singh NK, Varshney RK: The first set of EST resource for gene discovery and marker development in pigeonpea (Cajanus cajan L.). BMC Plant Bio. 2010, 10: 45.
- Hernández G, Ramírez M, Valdés-López O, Tesfaye M, Graham MA, Czechowski T, Schlereth A, Wandrey M, Erban A, Cheung F, Wu HC, Lara M, Town CD, Kopka J, Udvardi MK, Vance CP: Phosphorus Stress in Common Bean: Root Transcript and Metabolic Responses. Plant Phys. 2007, 144: 752-767.
- Barrera-Figueroa BE, Peña-Castro JM, Acosta-Gallegos JA, Ruiz-Medrano R, Xoconostle-Cázares B: Isolation of up-regulated genes in roots of a drought tolerant bean (Phaseolus vulgaris cv. Pinto Villa) under early water deficit stress, and characterization of a new member of group 3 LEA genes. Func Plant Biol. 2007, 34: 368-381.
- Kavar T, Maras M, Kidric M, Sustar-Vozlic J, Meglic V: Identification of genes involved in the response of leaves of Phaseolus vulgaris to drought stress. Mol Breed. 2008, 21: 159-172.
- Graham PH, Rosas JC, Estevez de Jensen C, Peralta E, Tlusty B, Acosta-Gallegos J, Arraes Pereira PA: Addressing edaphic constraints to bean production: the bean/cowpea CRSP project in perspective. Field Crop Res. 2003, 82: 179-192.
- Seki M, Carninci P, Nishiyama Y, Hayashizaki Y, Shinozaki K: High-efficiency cloning of Arabidopsis full-length cDNA by biotinylated CAP trapper. Plant J. 1998, 15: 707-720.
- Carninci P, Shabata Y, Hayatsu N, Itoh M, Shiraki T, Hirozane T, Watahiki A, Shibata K, Muramatsu M, Hayashizaki Y: Balanced size and long-size cloning of full-length, cap-trapped cDNAs into vectors of the novel lambda-FLC family allows enhanced gene discovery rate and functional analysis. Genomics. 2001, 77: 79-90.
- Seki M, Shinozaki K: Functional genomics using RIKEN Arabidopsis thaliana full-length cDNAs. J Plant Res. 2009, 122: 355-366.
- Seki M, Narusaka M, Kamiya A, Ishida J, Satou M, Sakurai T, Nakajima M, Enju A, Akiyama K, Oono Y, Muramatsu M, Hayashizaki Y, Kawai J, Caninci P, Itoh M, Ishii Y, Arakawa T, Shibata K, Shinagawa A, Shinozaki K: Functional annotation of a full-length Arabidopsis cDNA collection. Science. 2002, 296: 141-145.
- Seki M, Satou M, Sakurai T, Akiyama K, Iida K, Ishida J, Nakajima M, Enju A, Narusaka M, Fujita M, Shinosaki K: RIKEN Arabidopsis full-length (RAFL) cDNA and its applications for expression profiling under abiotic stress conditions. J Exp Bot. 2004, 55: 213-223.
- Ogihara Y, Mochida K, Kawaura K, Murai K, Seki M, Kamiya A, Shinozaki K, Carninci P, Hayashizaki Y, Shin IT: Construction of a full-length cDNA library from young spikelets of hexaploid wheat and its characterization by large-scale sequencing of expressed sequence tags. Genes Genet Syst. 2004, 79: 227-232.
- Sakurai T, Plata G, Rodríguez-Zapata F, Seki M, Salcedo A, Toyoda A, Ishiwata A, Tohme J, Sakaki Y, Shinozaki K, Manabu Ishitani M: Sequencing analysis of 20.000 full-length cDNA clones from cassava reveals lineage specific expansions in gene families related to stress response. BMC Plant Bio. 2007, 7: 66.
- Umezawa T, Sakurai T, Totoki Y, Toyoda A, Seki M, Ishiwata A, Akiyama K, Kuotani A, Yoshida T, Mocida K, Kasuga M, Todaka D, Maruyama K, Nakahsim K, Enju A, Mizukado S, ahmend S, Yoshiwara K, Funatsuki H, teraishi M, Osaki M, Shinano T, Akashi R, Sakaki Y, Yamaguchi-Shinosaki K, Shinozaki K: Sequencing and analysis of approximately 40,000 soybean cDNA clones from a full length enriched cDNA library. DNA Res. 2008, 15: 333-346.
- Yamasaki K, Kigawa T, Inoue M, Tateno M, Yamasaki T, Yabuki T, Aoki M, Seki E, Matsuda T, Nunokawa E, Ishizuka Y, Terada T, Shirouzu M, Osanai T, Tanaka A, Seki M, Shinozaki K, Yokoyama S: A novel zinc-binding motif revealed by solution structures of DNA-binding domains of Arabidopsis SBP-family transcription factors. J Mol Biol. 2004, 337: 49-63.
- Yamasaki K, Kigawa T, Inoue M, Yamasaki T, Tateno M, Yabuki T, Aoki M, Seki E, Masuda T, Tomo Y, Hayami N, Terada T, Shirouzu M, Osanai T, Tanaka A, Seki M, Shinozaki K, Yokoyama S: Solution structure of the WRKY DNA-binding domain. Plant Cell. 2005, 17: 944-956.
- Yamasaki K, Kigawa T, Inoue M, Yamasaki T, Yabuki T, Aoki M, Seki E, Matsuda T, Tomo Y, Terada T, Shirouzu M, Tanaka A, Seki M, Shinozaki K, Yokoyama S: Solution structure of the major DNA-binding domain of Arabidopsis ETHYLENE INSENSITIVE3-LIKE3. J Mol Biol. 2005, 348: 253-264.
- Yokoyama S, Hirota H, Kigawa T, Yabuki T, Shirouzu M, Terada T, Ito Y, Matsuo Y, Kuroda Y, Nishimura Y, Kyogoku Y, Miki K, Masui R, Kuramitsu S: Structural genomics projects in Japan. Nature Struct Biol. 2000, 7: 943-945.
- Kofler R, Schlötterer C, Lelley T: SciRoKo: a new tool for whole genome microsatellite search and investigation. Bioinformatics. 2007, 13: 1683-1685.
- Galeano CH, Fernández AC, Gómez M, Blair MW: Single strand conformation polymorphism based SNP and Indel markers for genetic mapping and synteny analysis of common bean (Phaseolus vulgaris L.). BMC Genomics. 2009, 10: 629.
- Sponchiado BN, White JW, Castillo JA, Jones PG: Root growth of four common bean cultivars in relation to drought tolerance in environments with contrasting soil types. Exp Agric. 1989, 25: 249-257.
- Yan X, Liao H, Beebe SE, Blair MW, Lynch JP: QTL mapping of root hairs and acid exudation traits and their relationship to phosphorus uptake in common bean. Plant and Soil. 2004, 265: 17-29.
- López-Marín HD, Rao IM, Blair MW: Quantitative trait loci for root morphology traits under aluminum stress in common bean (Phaseolus vulgaris L.). Theor Appl Genet. 2009, 119: 449-458.
- Schlueter JA, Goicoechea JL, Collura K, Gill N, Lin J-Y, Yu Y, Vallejos E, Muñoz M, Blair MW, Tohme J, Tomkins J, McClean P, Wing R, Jackson SA: BAC-end sequence analysis and a draft physical map of the common bean (Phaseolus vulgaris L.) genome. Trop Plant Bio. 2008, 1: 40-48.
- Córdoba JM, Chavarro MC, Schleuter JJ, Jackson SA, Blair MW: Integration of physical and genetic maps of the common bean genome through microsatellite markers. BMC Genomics. 2010, 11: 436.
- Blair MW, Pedraza F, Buendia HF, Gaitán-Solís E, Beebe SE, Gepts P, Tohme J: Development of a genome-wide anchored microsatellite map for common bean (Phaseolus vulgaris L.). Theor Appl Genet. 2003, 107: 1362-1374.
- Bassett MJ, Miklas PN, Caldas GV, Blair MW: A dominant gene for garnet brown seed coats at the Rk locus in 'Dorado' common bean and mapping Rk to linkage group 1. Euphytica. 2010, 176: 281-290.
- Miklas PN, Larsen RC, Riley R, Kelly JD: Potential marker assisted selection for bc-1(resistance to bean common mosaic potyvirus in common bean). Euphytica. 2000, 116: 211-219.
- Blair MW, Astudillo C, Grusak M, Graham R, Beebe S: Inheritance of seed iron and zinc content in common bean (Phaseolus vulgaris L.). Mol Breed. 2009, 23: 197-207.
- Ewing B, Hillier L, Wendl MC, Green P: Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 1998, 8: 175-185.
- Conesa A, Götz S: Blast2GO: A Comprehensive Suite for Functional Analysis in Plant Genomics. Internat J Plant Genoms. 2008, 12-Article ID 619832.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-10.
- Huang X, Madan A: CAP3: A DNA sequence assembly program. Genome Res. 1999, 9 (9): 868-877.
- Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C: The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 2004, 32: D258-261.
- Volfovsky N, Haas BJ, Salzberg SL: A clustering method for repeat analysis in DNA sequences. Genome Biology. 2001, 2: 0027.1-0027.11.
- Cannon SB, May GD, Jackson SA: Update on comparative genomics of legumes. Three sequenced legume genomes and many crop species: rich opportunities for translational genomics. Plant Physiol. 2009, 109: 144-159.
- Varshney RK, Singh NK, Kulwal PL, Penmetsa RV, Saxena RK, Datta S, Rosen B, Farmer AD, Dubey A, Saxena KB, Fakrudin B, Singh MN, Wanjar KBi, Killian A, May GD, McCombie R, Jackson SA, Cook DR: Pigeonpea genomics initiative (PGI): an international effort towards improving crop productivity of pigeonpea (Cajanus cajan L.). Mol Breed. 2009, 23: 1-16.
- Kantety RV, La Rota M, Matthews DE, Sorrells ME: Data mining for simple sequence repeats in expressed sequence tags from barley, maize, rice, sorghum and wheat. Plant Mol Bio. 2002, 48: 501-510.