Development of ESTs from chickpea roots and their use in diversity analysis of the Cicergenus
BMC Plant Biology volume 5, Article number: 16 (2005)
Chickpea is a major crop in many drier regions of the world where it is an important protein-rich food and an increasingly valuable traded commodity. The wild annual Cicer species are known to possess unique sources of resistance to pests and diseases, and tolerance to environmental stresses. However, there has been limited utilization of these wild species by chickpea breeding programs due to interspecific crossing barriers and deleterious linkage drag. Molecular genetic diversity analysis may help predict which accessions are most likely to produce fertile progeny when crossed with chickpea cultivars. While, trait-markers may provide an effective tool for breaking linkage drag. Although SSR markers are the assay of choice for marker-assisted selection of specific traits in conventional breeding populations, they may not provide reliable estimates of interspecific diversity, and may lose selective power in backcross programs based on interspecific introgressions. Thus, we have pursued the development of gene-based markers to resolve these problems and to provide candidate gene markers for QTL mapping of important agronomic traits.
An EST library was constructed after subtractive suppressive hybridization (SSH) of root tissue from two very closely related chickpea genotypes (Cicer arietinum). A total of 106 EST-based markers were designed from 477 sequences with functional annotations and these were tested on C. arietinum. Forty-four EST markers were polymorphic when screened across nine Cicer species (including the cultigen). Parsimony and PCoA analysis of the resultant EST-marker dataset indicated that most accessions cluster in accordance with the previously defined classification of primary (C. arietinum, C. echinospermum and C. reticulatum), secondary (C. pinnatifidum, C. bijugum and C. judaicum), and tertiary (C. yamashitae, C. chrossanicum and C. cuneatum) gene-pools. A large proportion of EST alleles (45%) were only present in one or two of the accessions tested whilst the others were represented in up to twelve of the accessions tested.
Gene-based markers have proven to be effective tools for diversity analysis in Cicer and EST diversity analysis may be useful in identifying promising candidates for interspecific hybridization programs. The EST markers generated in this study have detected high levels of polymorphism amongst both common and rare alleles. This suggests that they would be useful for allele-mining of germplasm collections for identification of candidate accessions in the search for new sources of resistance to pests / diseases, and tolerance to abiotic stresses.
Chickpea (Cicer arietium L.) is one of the world's more important but less studied leguminous food crop with over 10 M ha grown across the Americas, the Mediterranean basin, East Africa, the Middle East, Asia and Australia . As a grain legume it plays a significant role in the nutrition of the rural and urban poor in the developing world, as it provides a protein-rich supplement to cereal-based diets particularly of vegetarians and subsistence farmers who cannot afford meat. Despite its economic importance, chickpea productivity has been low because of yield losses to foliar and soil-borne fungal diseases (ascochyta blight, fusarium wilt and botrytis grey mould), insect pests (helicoverpapod borer) and abiotic stresses such as drought, cold and salinity. Sources of resistance and tolerance to these constraints exist in the wild Cicer germplasm yet remain largely unused by conventional breeding programs [2–5]. Three annual Cicer gene-pools have been defined based on the available hybridization reports, biochemical and molecular diversity analysis : species within the primary gene-pool (C. arietinum, C. reticulatum and C. echinospermum) can be readily crossed usually generating fully fertile progeny; while species within the secondary gene-pool (C. bijugum, C. pinnatifidum and C. judaicum) can be successfully crossed with the cultigen C. arietinum, providing hybrid embryos are rescued. However, the progeny of crosses between primary and second gene-pools are frequently sterile; finally, species within the tertiary gene-pool (C. cuneatum, C. chorassanicum, C. yamashitae and others) have not yet been successfully crossed with the cultigen Cicer arietinum.
Cultivated chickpea has a relatively low level of diversity, due to a series of bottlenecks caused by restricted distribution of the wild progenitors and the founder effect associated with domestication . In addition, chickpea breeding programs have limited themselves to a small number of cultivated genotypes having sources of biotic stress resistance and abiotic stress tolerance with little or no use of wild species . This has resulted in limited durability of resistances to many of the major pests and diseases, and limited progress in abiotic stress tolerance breeding. Wild germplasm accessions held at the International Centre for Agricultural Research in Dry Areas (ICARDA) in Syria, the International Institute for the Semi-Arid Tropics (ICRISAT) in India and other genebanks in USA, Europe and Australia contain valuable sources of novel genetic variation for improvement of these traits . The wild Cicer species from the secondary gene-pool (particularly C. bijugum, C. pinnatifidum and C. judaicum) are known to possess multiple sources of pest and disease resistance and tolerance to abiotic stresses including drought and cold [7, 8].
Where species barriers can be overcome, molecular markers will facilitate rapid and efficient transfer of economically important traits into cultivated breeding pools. This will have the added advantage of broadening the narrow genetic base of this crop and thereby reducing its vulnerability to evolving pest and disease pressures, while simultaneously providing stable increases in yield far beyond the current 0.6% annual improvement .
Most types of molecular markers have been tested in chickpea including isozymes [9–12], restriction fragment length polymorphism (RFLP) [13, 14] random amplified polymorphic DNA markers (RAPDs) [15, 16] amplified fragment length polymorphisms (AFLPs) , sequence characterized amplified regions (SCARs) , inter-simple sequence repeat (ISSRs) , simple sequence repeat (STMS or SSR) [19–22], resistance gene analogs (RGAs)  DNA amplification fingerprinting (DAF) , and expressed sequence tags (ESTs) . However, there is a low level of polymorphism detected in cultivated chickpea using isozyme/allozyme markers [9, 10] and RFLP analysis [13, 14]. In contrast, SSR markers have been shown to be highly polymorphic within the cultigen C. arietinum  and have been routinely utilized for creating genetic linkage maps [27–29]. These SSR markers have also been used as reference points for integration of the different chickpea linkage groups derived from inter- and intra- specific crosses [28, 29]. To date there are over 500 SSR markers developed from chickpea genomic libraries [19, 20, 22]. In general, 30–50% of the chickpea SSR markers are polymorphic in any given breeding or intra-specific mapping population (HK Buhariwalla, unpublished data).
SSR markers remain the marker of choice for marker assisted selection in many breeding programs. However, SSR motifs may evolve too rapidly to be valuable as the sole assay for interspecific diversity analysis. In addition, markers shown to have tight genetic linkage to target genes in interspecific mapping populations may lose their selective power when used in backcross programs based on interspecific derivatives. Thus, there is a need for the development and utilization of gene-based markers that better serve molecular breeding applications and diversity analysis of germplasm.
In this context, the most important criteria for new markers are high reproducibility, detection of co-dominance polymorphism and suitability for rapid large-scale low cost screening. EST-based markers fulfill these criteria and since they are associated with the coding regions of the genome they also enhance molecular germplasm evaluation by capturing variation across transcribed regions and in genes of known function. This potentially resolves the problem of limited genomic coverage suffered by traditional single gene phylogenetic studies [30, 31]. Targeted EST development has already begun in chickpea, focusing on ABA-related mechanisms of water-deficit tolerance in epicotyl tissue .
Our objective in this study was to increase the public EST resource for chickpea with a particular focus on root tissue, as constitutive drought tolerance mechanisms (such as root system development) are a common target for the crop physiology community  and are increasingly identified by genomics studies as important components of stress tolerance mechanisms [32, 34]. Thus, an EST library was constructed after SSH of root tissue from two very closely related chickpea (C. arietinum) genotypes the landrace ICC 4958 and the popular local variety Annigeri, both considered to possess important sources of drought tolerance [35, 36]. Following sequencing and functional annotation, we have generated over 100 EST markers which we have used in preliminary diversity analysis of wild and cultivated Cicer germplasm.
Results and discussion
The SSH process resulted in over 4000 clones expected to be largely associated with ICC 4958 (SSH tester) but could conceivably also include some highly constitutively expressed genes from Annigeri (SSH driver). Typically the cDNA inserts ranged from 0.5–1 kb. All clones were partially or fully sequenced through a single pass read from the 3' end and trimmed of vector and low quality sequences. A length threshold of 170 bp was set in view of the minimum expected size of a functional gene-encoding region (exon) . Removal of short sequences (below 170 bp: largely comprising of problematic repeats and sequences predominantly composed of poly-A) resulted in 2858 high quality EST sequences with an average length of 480 bp all available at the ICRISAT Chickpea EST Database .
Analysis of ESTs and assembly of consensus sequences
Gene annotations were based on similarities to either known or putative ESTs in the public databases. All annotations were based on Blast searches, with a score threshold of ≥ 200 for BLASTn. For tBLASTx a score threshold of >100 was set, as these generally had e-values <10-5 with a minimum of 50% identity over at least 30% of the length of the protein, which are the commonly used thresholds for reliable sequence annotation [39, 40]. Tentative functional annotations of EST sequences based on tBLASTx were grouped under 12 general categories based on the biochemical functions of the predicted proteins (Figure 1). The 2858 sequences were assembled into 210 contigs with 267 ESTs remaining unassociated (singletons). Putative identifications could be assigned to 2179 of the 2858 sequences (76%), representing 73% of the unigenes (178 singletons and 162 contigs), and all of these proteins were represented in other plant species. The 2858 sequences were assembled into 210 contigs with 267 ESTs remaining unassociated (singletons). Assuming that in most cases each contig represents one gene  a maximum number of 477 genes are represented in this dataset (available at NCBI: accession numbers CK148643–CK149150). The total length of contigs ranged from 240 to 1291 bases (with an average of 648 bases) while the largest number of ESTs assembled into one contig was 255. EST assembly leading to the generation of tentative consensus (TC) sequences has dramatically reduced the redundancy in this database, as reported elsewhere  (see Additional file 1).
Over three quarters of the 210 consensus sequences (77% equivalent to 164 contigs) were found to have significant similarity (tBLASTx >100) to sequences in public databases. Many of these annotations (47 consensus sequences) were validated through comparison with the CDD database using RPS-BLAST which compares a protein query sequence to a position-specific score matrix prepared from the underlying conserved protein domain alignment . Conserved motifs consistent with tBLASTx annotations were identified in 47 consensus sequences providing an additional level of confidence for these assigned gene functions (see Additional file 2).
As expected, we observed a substantial number of the EST contigs and singletons to have no significant homology (NSH: 48 contigs and 89 singletons; 29%) to either nucleotide or protein sequences in public databases at the time of analysis. This compares well to the higher proportion of sequences (44%) reported with little or no homology from chickpea leaf tissue . In addition, a substantial number of ESTs (approximately 19% of the unigenes) were found to have similarity with unannotated hypothetical, putative or unknown proteins contained in the public databases and were, therefore, placed in the unidentified function (UF) class.
PCR optimisation and unit costs
Optimization of PCR conditions is a critical precursor for accurate and robust large-scale marker screening and offers additional benefits for reducing the unit cost of genotyping . As optimization relies on the sequential investigation of each reaction variable, the process can take a long time, thus the conditions of optimization are rarely identified empirically and individual components are rarely all tested simultaneously. However, Cobb and Clarkson  devised a strategy in which the components of a PCR reaction could be tested together using the least number ofexperiments based on the Taguchi principle . We adapted the Cobb and Clarkson , 9 reactions of 4 PCR components to a 5 reactions of 5 PCR components, the additional component being enzyme concentration. This further reduced the time and cost of optimization, whilst also minimizing the cost of the PCR screening by also considering enzyme concentration. Final primer (0.15 - 0.5 pmol) and Taq DNA polymerase (0.2 – 0.5 U) concentrations resulted in a reagent unit cost of US$0.08 per PCR sample. This compares very favorable with other reports of PCR unit costs [48, 49]. Clearly DNA extraction costs are an equally important component for molecular breeding programs and can also be greatly reduced as has been reported elsewhere . The primer sequences and optimization parameters of all the polymorphic markers are given in Table 2. Optimization also resulted in fewer spurious amplification products and clearer polymorphic bands amongst the accessions studied (Figure 2).
EST marker diversity analysis
A total of 106 EST primers were designed, optimized and screened, of which 48 gave good amplification products in ICC 4958 (the SSH tester accession in the EST generation process) while 58 were deemed unsuitable as they either did not produce an amplification product or generated a complex pattern of bands which were difficult to evaluate. Of the 48 primers pairs screened across 1 cultivated and 8 wild Cicer species, 44 were polymorphic (Table 2). A total of 167 polymorphic fragments ranging in size from 200 to >1000 bp was scored on polyacrylamide or agarose electrophoresis gels depending on product size (Figure 2). The number of alleles detected per primer ranged from 2 – 8 with the polymorphic information content (PIC) ranging 0.03 – 0.89 (Table 5).
Fourteen of the 106 EST's contained SSR motifs, of which 10 contained perfect trinucleotide repeats. A relatively large proportion of these EST-SSR markers (8) were derived from the no significant homology (NSH) and unknown protein functional (UF) classes, based on sequence comparisons (BLASTn tBLASTx and PSI-BLAST). Markers from these EST classes have readily detected polymorphism in the diverse germplasm tested (PIC 0.20–0.82). Since over 48% of the EST's generated from the chickpea root library fall into NSH and UF classes, these are clearly useful markers with potentially important, yet unknown functions.
Eleven of the polymorphic EST's were from stress annotated transcripts (AGLC2, AGLC16, AGLC 20, AGLC29, AGLC 34, AGLC45, AGLC52, AGLC53, AGLC55, AGLC66, AGLC68 and AGLC93); of these four contained SSR motifs (AGLC 34, AGLC55, AGLC67 and AGLC84). These stress related EST markers may be useful for allele-mining of germplasm collections for the identification of candidate accessions with new sources of agronomically important traits and for candidate gene mapping .
The EST data-set generated in this study was used to compute pair-wise genetic distances between species according to Band similarity coefficient and UPGMA clustering. In order to display, with minimal distortion, the genetic relationships between species, a Principle Coordinate Analysis (PCoA) was carried out. The first two dimensions of the PCoA plot indicate the presence of 5 clusters, which account for 20% and 17% of the total variation (Figure 3). Dollo and polymorphism parsimony analysis (DOLLOP) conducted on this data-set found one most parsimonious tree with 301 steps (Figure 4) corresponding well with the PCoA analysis.
Both PCoA and parsimony analysis provide clear separation of most species. Both analysis grouped C. echinospermum, C. reticulatum and C. arietinum, together. These three species are classified together as the primary gene pool based on crossability studies. Similarly, all previous diversity studies based on allozymes , RAPD [16, 53], microsatellite , ISSR [15, 55] and AFLP  markers have also grouped C. echinospermum, C. reticulatum and C. arietinum together. The EST-based diversity analysis presented here shows C. arietinum (believed to be the wild annual progenitor  of cultivated chickpea) to be slightly distant to C. echinospermum and C. reticulatum but still within the same cluster (best reflected by the PCoA analysis Figure 3). All previous molecular diversity studies have clustered C. bijugum, C. pinnatifidum and C. judaicum together, although sometimes not well differentiated from the primary gene-pool species based on molecular diversity analysis. These three species are commonly considered to represent the secondary gene-pool as they require embryo rescue to recover viable progeny when crossed with the cultigen . Both parsimony and PCoA analyses of the EST data-set clustered C. judaicum, C. pinnatifidum and C. bijugum together. This is in agreement with previous studies which demonstrated the close association of these three species. The current analysis suggests that C. pinnatifidum and C. judaicum are more closely related. This is in agreement with previous reports based on morphology analysis , seed storage protein analysis  and isozyme analysis . In contrast, ecogeographical studies of the wild Cicer germplasm have suggested that C. judaicum to be quite diverged , which may explain the outlying position of one C. judaicum accession studied here (IG 69986) (Figure 4). Most other species generally fall into one of two clusters generally collectively referred to as the tertiary gene pool (Figure 3 and Figure 4) from which no viable hybrids have been reported from crosses with the cultigen.
Parsimony analysis grouped C. chorassanicum with C. cuneatum, but placed the C. yamashitae accessions as outliers (Figure 4). This data supports the tertiary group proposed by Croser et al.  to consist of; C. chorassanicum which has never been successfully crossed with cultivated chickpea and C. yamashitae and C. cuneatum which have been crossed with the cultigen but the resultant progeny have failed to flower or proven sterile. However, this is not reflected in the PCoA (Figure 3).
The distribution of accessions from the same species within secondary and tertiary clusters suggests that these two gene-pools may not be as distinct as previously expected. This may indicate that there are some tertiary gene-pool species that might be more readily crossed with the cultigen than their taxonomic classification would otherwise indicate. However, since the current study is based on a limited numbers of accessions for both C. cuneatum and C. yamashitae further research is required to fully define the relationships between these species and the scope of new opportunities for plant breeders.
This is the first report of the use of EST-based markers for estimating genetic distances between annual Cicer species. We report on the development and characterization of 106 EST markers from chickpea sequences with functional annotations or unknown functions, some of which contain SSR motifs. These markers have detected high levels of polymorphism amongst the wild species studied here and initial results indicate that around 20% are polymorphic in intraspecific (C. arietinum) mapping populations, either directly or as cleaved amplified polymorphic sequences (CAPS) based markers .
The EST marker-based diversity analysis reported here broadly supports the Cicer taxonomy based on allozymic data and conclusions by Croser et al. . The concurrence of our EST data with that of allozymic studies in particular suggests that many gene-based loci do have a common ancestory across Cicer species, as proposed but not substantiated by Choumane et al. . Clustering patterns based on RAPD, AFLP and ISSR are generally very similar [15, 17, 55] but are substantially different to those based on the EST marker analysis reported here and the previously reported allozyme-based analyses . This may be a consequence of RAPD, AFLP and ISSR markers sampling fundamentally different regions of the genome (under different evolutionary pressures) as compared with markers based on expressed genes. Unfortunately a detailed comparison of different assays is confounded by the virtual absence of common accessions and the adoption of different statistical analyses in each study .
The EST database developed in this study provides a preliminary profile of some differentially expressed genes that may be associated with constitutive mechanisms important for stress tolerance and root development in chickpea roots. These include transcripts with putative annotations for proteases, T6P synthase, non-specific lipid transfer proteins, MRP-like ABC transporters, chaperones- HSP70, TCP-1-alpha, bZIP transcription factor, calcium ATPases, protein kinases, MRP4 glutathione-conjugate transporter, glutathione S-transferase, phosphoenol pyruvate carboxylase, S-adenosyl methionine synthetase (see Additional files 1 and 2 for full details). It is envisaged that these differentially expressed genes can be validated by the chickpea community and the most promising ESTs used in candidate gene mapping and allele mining. The resultant annotated EST markers will then be of substantial value for marker-assisted introgression programs based on interspecific crosses using wild Cicer species that harbour agronomically valuable genes. One of the aims of chickpea breeding is to address the continuing need for cultivars adapted to particular geographical regions or with specific new ideotypes to ensure sustainability and profitability of production. An important means of continuing to achieve such breeding objectives is through the use of novel germplasm . Many breeding programs, in particular those involving rice and wheat, have successfully utilized molecular and statistical approaches in accelerating the introduction of novel genes from wild species through marker accelerated backcross breeding.
The availability of increasing amounts of sequence data in many legume species now offers the potential for routine development of gene-based markers. These provide the ultimate assay (or so-called 'perfect marker') for indirect trait selection and map-based cloning. Combining gene-based markers together with highly polymorphic flanking SSR markers will greatly assist in reducing linkage drag and increasing the speed and efficiency of subsequent introgression programs. Finally, molecular genetic diversity analysis based on EST markers may also provide a means of predicting which accessions are most likely to produce fertile progeny when crossed with chickpea cultivars.
Plant material and growth conditions for SSH
Fifteen seeds from each chickpea accession (ICC 4958 and Annigeri) were sterilized with 25% chlorex (v/v) for 10 mins, rinsed twice with sterile distilled water, before placing in pre-sterilized plant pots containing sterilized vertisol. The soil was collected from the field, passed through a 2 mm sieve and autoclaved in plastic bags using a 60 min cycle at 120°C and repeated three times on consecutive days. Seeds were sown immediately after the last autoclave cycle.
Three seeds were sown per pot; the plants were maintained in a Conviron growth chamber (Conviron, Winnipeg, Man.) under optimal physiological conditions for chickpea growth: 25°C day (11 h) and 12°C night (13 h), 455 μmoles m-2 sec-1 illumination intensity, 80% day and 40% night relative humidity, and irrigation as required.
Roots were harvested 28 days after germination (with flowering expected after 30 days). The soil was washed off the roots in running water, the intact whole plant was lifted out of the soil and soil particles were gently rubbed off in running water with gloved hands. The root tissue was subsequently treated with (0.1% v/v) DEPC solution and immersed in sterile Milli Q water, excess water from the roots was blotted onto filter paper and roots were weighed (approx. 2 g), flash frozen in liquid nitrogen and stored at -80°C.
RNA isolation and suppressed subtractive hybridization
The subtraction of the two genomes was achieved through a process of PCR-based suppression of genes common to both genotypes, and genes that are differentially expressed are enriched. As the process is based on PCR, low copy number cDNA can be detected from the tester (ICC 4958). RNA isolation and library construction was carried out by Avestha Gengraine Technologies (Bangalore, India). Total RNA was isolated from the root tissue of ICC 4958 and Annigeri using Trizol reagents (GIBCO-BRL) for RNA isolation, mRNA was purified using Oligotex (Qiagen GmbH, Hilden, Germany).
The suppression subtractive hybridization (SSH) process was carried out using a Clontech PCR-Select cDNA subtraction kit (Clontech INC. USA). Tester (ICC 4958) and driver (Annigeri) cDNA was restricted with RsaI for 90 mins and cDNA from the ICC 4958 was ligated with the required adaptors. Adaptor ligation was confirmed by PCR, followed by one round of over-night hybridisation with Annigeri as the driver and ICC 4958 as the tester. A second round of hybridisation was followed by primary PCR (27 cycles) and secondary PCR (12 cycles) according to the manufacturer's instructions for the PCR-Select. The PCR products of the subtraction were analyzed by gel electrophoresis and cloned using pCR4-TOPO cloning kit (Invitrogen Inc., USA). The cDNA clones were amplified in E. coli DH10B cells by electroporation and insert containing clones were selected by plating on LB agar containing 100 ug/ml ampicillin. Large colony forms were picked and used to regenerate single clone cultures in 96-well microtitre plates. After growth over-night at 37°C, glycerol was added to a final concentration of 15% and cultures were stored at -80°C.
Cultures of transformed E. coli were grown over-night in 100 ul LB media containing 100 ug/ml ampicillin, the cells were centrifuged and suspended in sterile distilled water, heat denatured for 10 mins at 95°C, 1 ul of supernatant was used for insert amplification with M13 forward and reverse primers. Inserts (5 ul) were separated by electrophoresis on 1.2% agarose gel containing ethidium bromide to confirm amplification of a single insert and for quantification. After SAP/exonuclease I digest, 4 ul of the insert was sequenced using T7 primer and 1/8 fraction of the recommended volume of dye terminator reagent (Big Dye terminator cycle sequencing kit v2, ABI-Foster City) for cycle sequencing. After removal of dye blobs by ethanol precipitation, DNA sequences were analyzed by capillary-electrophoresis using an Applied Biosystems genetic analyzer model ABI 3700 or ABI 3100.
Sequence analysis for predicting gene function
Base calling was performed using the ABI DNA Sequence Analysis Software (v3.7) or by Chromas v2.2 (Technelysium Pty Ltd. Australia). All sequences were scanned visually for quality of the peak shapes and corresponding base call. Vector sequences were trimmed using Sequencher v.4 (Gene Codes, Ann Arbor, MI) software and low quality bases (quality score <20) were trimmed from both ends of sequences. A total of 4000 sequences were analyzed, only sequences of more than 170 bp with less than 5% ambiguity were processed. All sequences containing interspersed or simple repeats were masked with the RepeatMasker software  using Arabidoposis as reference. The masked sequences were screened against the following DNA databases: Arabidopsis thaliana, Medicago truncatula, Glycine max, Zea mays and Oryza sativa, in the TIGR Gene Indices using default settings for BLASTn and tBLASTx searches . The same sequences were also used to search the NCBI EST database (dbEST) using Viridiplantae as a limiter .
The comparison of chickpea sequences with those in public databases was carried out during 2002 – 2003, and blast searches with unique sequences were repeated in November 2004. An in-house script was used to retrieve best matches from the outputs of BLASTn and tBLASTx text files, retaining the following information: gene identifier, description, BLAST score, percent identity, alignment length, reading frame, and database name from the top-scoring alignments. Alignments with scores of ≥200 for BLASTn and ≥100 for tBLASTx were extracted into an MS-Access table. The individual ESTs were assembled into tentative consensus sequences (TCs) using Sequencher, the assembly parameters used for 'dirty data' were (minimum percentage match of 90 and minimum overlap 40 bp). A sequence was retained in a contig if it matched at least one sequence already in the contig based on pair-wise BLAST , with a percentage identity threshold of 90%. A consensus sequence was then derived for each cluster. Consensus sequences were translated into six possible reading frames with Transeq software , using the standard genetic code. The translated sequences were then submitted to Reverse Position Specific (RPS)-BLAST  to search for matches to the conserved domain database (CDD; all 4540-PSSMs) using default parameters. The RPS-BLAST outputs were similarly processed using an in-house script, retaining the following information from the top-scoring alignments: gene identifier, description, score and e-value. Each result was manually interpreted and classified according to the biochemical function of the predicted protein.
Plant material and DNA isolation
Fifteen chickpea accessions representing 9 annual Cicer species, including the cultivated species C. arietinum, were used in this study (Table 1). Seeds of these accessions were germinated in Jiffy pots in a growth room and DNA was extracted from 4–5 pinnules of 15–20 day old individual plants with 2% CTAB extraction buffer containing 0.03% mercaptoethanol as described in . DNA concentrations were determined by comparing the sample intensity (1 μl) with that of known amounts of uncut lambda DNA by electrophoresis in 1.2% 'Ready-to-run' agarose gels (Amersham) containing ethidium bromide. DNA was diluted to a working stock concentration of 10 ng/μl and checked before use.
Primer design of EST markers
A total of 106 EST sequences were chosen that represented diverse functional/stress annotations, a selection of which contained SSR motifs. In addition, primers were also designed from a random selection of sequences categorized as having 'no significant homology' (NSH) and 'unknown protein function' (UF) when compared with either nucleotide or protein sequences in public databases.
Primer sequences were designed using Genefisher 1.1 software  using the following criteria 57 – 63°C melting temperature (Tm), 40–50% GC content, 17–24 bp primer length and the difference in Tm between forward and reverse primer pairs was limited to 2°C. For those EST sequences containing repeat motifs, primers were designed to flank the microsatellite region. Optimal primer pairs were selected using Netprimer, the primer design algorithm which analyzes all possible secondary structures including hairpins, self-dimers and cross-dimers . The Tm and product length of 8 forward and reverse pairs per sequence from Genefisher software were evaluated by Netprimer software for minimal or no secondary structure. All primer and EST sequences are accessible through the ICRISAT Chickpea EST Database .
PCR optimization and evaluation
The chickpea accession ICC 4958 (from which the EST sequences were derived) was used as the reference accession for primer optimization. A total of 106 EST primers (synthesized by MWG, Germany) were optimized simultaneously for the following PCR components using a modified (5×5) grid . On this basis initial optimization of the five major components in a PCR (concentrations of primer, template DNA, Mg++, dNTP and enzyme) were empirically determined for each primer used in this study (Table 3). Optimal touch-down temperature and number of amplification cycle profiles were also determined for each primer pair (Table 4). This procedure is important to minimize non-specific amplification, maximize data accuracy and minimizes cost.
PCRs were performed in a total volume of 5 μl containing 5–10 ng of DNA and optimized concentration of the following: primer (0.15–0.3 pmoles), dNTP (0.1–0.2 mM), MgCl2 (1.5–2 mM), Qiagen Taq (0.2–0.5 u) and 1 x buffer. PCR amplifications were carried out using a GeneAmp model 9700 thermocycler (Perkin Elmer-Applied Biosystems). As the annealing temperatures of the markers ranged from 47–65°C we opted to use 3 catagories of "touchdown" temperature cycles (see Table 4).
PCR amplified products with a size range of 200–600 bp were resolved by non-denaturing polyacrylamide gels electrophoresis (6–10% acrylamide/bisacrylamide, 20:1 in TBE). Bands were visualized through a modified silver staining protocol  as follows: gels were immersed in water for 3 mins, followed by 20 mins in 0.1% CTAB solution and 0.3% ammonia solution for 15 mins. Silver staining solution was freshly prepared each day, consisting of 0.1 % (w/v) AgNO3 in 4 mM NaOH solution, 0.5 to 0.6 ml of 25% ammonia was titrated until the cloudy suspension became clear. Gels were gently agitated in the silver nitrate solution for 30 mins, and developed in 1.5% (w/v) sodium carbonate and 0.02% (v/v) formamide solution until bands appeared. The gels were rinsed in water, fixed in 1.5% glycerol solution and documented by scanning. PCR products above 600 bp were resolved by agarose gels electrophoresis (0.8–1%) containing ethidium bromide.
EST marker data collection and analysis
The amplification profile from each EST marker was checked for reproducibility and subsequently scored visually (and independently rescored) for the presence (1) or absence (0) of polymorphic bands. The degree of polymorphism was quantified using the polymorphic information content (PIC) calculator  based on the following formula:
PIC = 1 - ∑ Pi2 - ∑ ∑ Pi2Pj2
i = 1 i = 1 j = i+1
where Pi is the frequency of an individual genotype.
Pair-wise genetic distances were calculated with NTSYS-pc software version 2.0  as Sxy = 1 - (2nxy/nx + ny) as Sxy = 1 - (2nxy/nx + ny) based on band coefficient , where nxy are shared bands amongst the individuals, nx and ny are the number of fragments exhibited by each individual. Distance based on Band coefficient is similar to the genetic distance (GD) formula by Nei and Li : GD = 1 - Sxy. The 15 × 15 similarity matrix (based on Band coefficient) was subjected to sequential agglomerative hierarchical nested (SAHN) clustering using UPGMA (unweighted pair-group method analysis). Principle coordinate analysis based on these distance estimates was performed using NTSYS, using consecutive commands of 'DCENTRE' and 'EIGEN' in order to generate a PCoA plot which is more informative in highlighting differences between major taxonomic groups than dendrogram representations. Maximum parsimony analysis was conducted using the PHYLIP program Dollo and polymorphism parsimony analysis version 3.63  and the consensus tree viewed in tree view.
FAOSTAT data. 2004, [http://faostat.fao.org].
Erskine W, Siddique K, Khan T, Cowling W: Utilization of grain legume diversity. Plant genetic resourses of legumes in the Mediterranean. Edited by: Maxted N, Bennett SJ. Kluwer Academic, The Netherlands; 2001:311-326.
Knight E, Brinsmead R, Fordyce M, Wood J, Kelly A, Harden S: Use of wild relative Cicer echinospermum in chickpea improvement. Proceedings of the 12 th Australian Plant Breeding Conference. Perth Australia. 2002, 150-154.
Muehlbauer FJ, Kaiser WJ, Simon CJ: Potential for wild species in cool season food legume breeding. Euphytica. 1994, 73: 109-114. 10.1007/BF00027187.
Singh KB, Malhotra RS, Halila H, Knight EJ, Verma MM: Current status and future strategy in breeding chickpea for resistance to abiotic and biotic stresses. Euphytica. 1994, 73: 137-149. 10.1007/BF00027190.
Croser JS, Ahmad F, Clarke HJ, Siddique KHM: Utilisation of wild Cicer in chickpea improvement- progress, constraints and prospects. Aust J Agric Res. 2003, 54: 429-444. 10.1071/AR02157.
Berger J, Abbo S, Turner NC: Ecogeography of annual wild Cicer species: the poor state of the world collection. Crop Sci. 2003, 43: 1076-1090.
Singh KB, Ocampo B, Robertson LD: Diversity for abiotic and biotic stress resistance in the wild annual Cicer species. Genet Res Crop Evol. 1998, 45: 9-17. 10.1023/A:1008620002136.
Gaur PM, Slinkard AE: Inheritance and linkage of isozyme codominant genes in chickpea. J Hered. 1990, 81: 455-461.
Gaur PM Slinkard AE: Genetic control and linkage relations of additional isozyme markers in chickpea. Theor Appl Genet. 1990, 80: 648-656.
Kazan K, Muehlbauer FJ, Weeden NF, Ladizinsky G: Inheritance and linkage relationships of morphological and isozyme loci in chickpea. Theor Appl Genet. 1993, 86: 417-426. 10.1007/BF00838556.
Labdi M, Robertson LD, Singh KB, Charrier A: Genetic diversity and phylogenetic relationships among the annual Cicer species as revealed by isozyme polymorphism. Euphytica. 1996, 88: 181-188. 10.1007/BF00023889.
Udupa SM, Sharma A, Sharma RP, Pai RA: Narrow genetic variability in Cicer arietinum L. as revealed by RFLP analysis. J Plant Biochem Biotech. 1993, 2: 84-86.
van Rheenen HA, Reddy MV, Kumar J, Haware MP: Breeding for resistance to soil-borne diseases in chickpea. Disease Resistance Breeding in Chickpea. Edited by: Singh KB, Saxena MC. ICARDA, Aleppo, Syria; 1992:55-70.
Iruela M, Rubio J, Cubero JI, Gil J, Millán T: Phylogenetic analysis in the genus Cicer and cultivated chickpea using RAPD and ISSR markers. Theor Appl Genet. 2002, 104: 643-651. 10.1007/s001220100751.
Sudupak M, Akkaya M, Kence A: Analysis of genetic relationships among perennial and annual Cicer species growing in Turkey using RAPD markers. Theor Appl Genet. 2002, 105: 1220-1228. 10.1007/s00122-002-1060-8.
Nguyen TT, Taylor PWJ, Reeden RJ, Ford R: Genetic diversity estimates in Cicer using AFLP analysis. Plant Breeding. 2004, 123: 173-179. 10.1046/j.1439-0523.2003.00942.x.
Mayer MS, Tullu A, Simon CJ, Kumar J, Kraft JM, Muehlbauer FJ: Development of a DNA marker for fusarium wilt resistance in chickpea. Crop Sci. 1997, 37: 1625-1629.
Hüttel B, Winter P, Weising K, Choumane W, Weigand F: Sequence-tagged microsatellite site markers for chickpea (Cicer arientinum L.). Genome. 1999, 42: 210-217. 10.1139/gen-42-2-210.
Lichtenzveig J, Scheuring C, Dodge J, Abbo S, Zhang H-B: Construction of BAC and BIBAC libraries and their applications for generation of SSR markers for genome analysis of chickpea, Cicer arietinum L. Theor Appl Genet. 2005, 110: 492-510. 10.1007/s00122-004-1857-8.
Niroj SK, Bhumika S, Bhatia S: Isolation and characterization of a sequence-tagged microsatellite site markers in chickpea (Cicer arietinum L.). Mol Ecol Notes. 2003, 3: 428-430. 10.1046/j.1471-8286.2003.00472.x.
Winter P, Pfaff T, Udupa SM, Hüttel B, Sharma PC, Sahi S, Arreguin-Espinoza R, Weigand F, Muehlbauer FJ, Kahl G: Characterisation and mapping of sequence-tagged microsatellite sites in the chickpea (Cicer arietinum L.) genome. Mol Gen Genet. 1999, 262: 90-101. 10.1007/s004380051063.
Hüttel B, Santra D, Muehlbauer F, Kahl G: Resistance gene analogues of chickpea (Cicer arietinum L.): isolation, genetic mapping and association with a Fusarium resistance gene cluster. Theor Appl Genet. 2002, 105: 479-490. 10.1007/s00122-002-0933-1.
Benko-Iseppon AM, Winter P, Huettel B, Staginnus CM, Muehlbauer FJ, Kahl G: Molecular markers closely linked to fusarium resistance genes in chickpea show significant alignment to pathogenesis-related genes located on Arabidopsis chromosomes 1 and 5. Theor Appl Genet. 2003, 107: 379-386. 10.1007/s00122-003-1260-x.
Pfaff T, Kahl G: Mapping of gene-specific markers on the genetic map of chickpea (Cicer arietinum L.). Mol Genet Genomics. 2003, 269: 243-251.
Udupa SM, Robertson LD, Weigand F, Baum M, Kahl G: Allelic variation at (TAA)n microsatellite loci in a world collection of chickpea (Cicer arietinum L.) germplasm. Mol Gen Genet. 1999, 261: 354-63. 10.1007/s004380050976.
Tekeoglu M, Rajesh N, Muehlbauer FJ: Integration of sequence tagged microsatellite sites to the chickpea genetic map. Theor Appl Genet. 2002, 105: 847-854. 10.1007/s00122-002-0993-2.
Udupa SM, Baum M: Genetic dissection of pathotype-specific resistance to ascochyta blight disease in chickpea (Cicer arietinum L.) using microsatellite markers. Theor Appl Genet. 2003, 106: 1196-1202.
Winter P, Benko-Iseppon AM, Hüttel B, Ratnaparkhe M, Tullu A, Sonnante G, Pfaff T, Tekeoglu M, Santra D, Sant VJ, Rajesh PN, Kahl G, Muehlbauer FJ: A linkage map of the chickpea (Cicer arietinum L.) genome based on recombinant inbred lines from a C. arietinum × C. reticulatum cross: localization of resistance genes for fusarium wilt races 4 and 5. Theor Appl Genet. 2000, 101: 1155-1163. 10.1007/s001220051592.
Doyle JJ, Doyle JL, Harbison C: Chloroplast-expressed glutamine synthease in Glycine and related Leguminosae: Phylogeny, gene duplication and ancient polyploidy. Systematic Botany. 2003, 28: 567-577.
Kajita T, Ohashi H, Tateishi Y, Bailey CD, Doyle JJ: rbcL and legume phylogeny, with particular reference to Phaseoleae, Millettieae, and Allies. Systematic Botany. 2001, 26: 515-536.
Romo S, Labrador E, Dopico B: Water stress-regulated gene expression in Cicer arietinum seedlings and plants. Plant Physiol Biochem. 2004, 39: 1017-1026. 10.1016/S0981-9428(01)01318-3.
Sinclair TR, Muchow RC: System analysis of plant traits to increase grain yield on limited water supplies. Agron J. 2001, 93: 263-270.
Colmenero-Flores JM, Moreno LP, Smith CE, Covarrubias AA: Pvlea-18, a member of a new late-embryogenesis-abundant protein family that accumulates during water stress and in the growing regions of well-irrigated bean seedlings. Plant Physiol. 1999, 120: 93-103. 10.1104/pp.120.1.93.
Saxena NP, Krishnamurthy L, Johansen C: Registration of a drought resistant chickpea germplasm. Crop Sci. 1993, 33: 1424.
Saxena NP: Management of drought in chickpea – a holistic approach. Management of Agricultural drought – Agronomic and Genetic Options. Edited by: Saxena NP. Oxford & IBH Publishers Co Pvt. Ltd, New Delhi, India; 2003:103-122.
Glick B, Pasternak J: Molecular biotechnology: Principles and applications of recombinant DNA. American Society for Microbiology; Washington DC, USA; 1998.
ICRISAT chickpea EST database:. [http://www.icrisat.org/gt1/cpest/home.asp].
Quackenbush J, Liang F, Holt I, Pertea G, Upton J: The TIGR Gene Indices: reconstruction and representation of expressed gene sequences. Nucl Acids Res. 2000, 28: 141-145. 10.1093/nar/28.1.141.
Ronning CM, Stegalkina SS, Ascenzi RA, Bougri O, Hart AL, Utterbach TR, Vanaken SE, Riedmuller SB, White JA, Cho J, Pertea GM, Lee Y, Karamycheva S, Sultana R, Tsai J, Quackenbush J, Griffiths HM, Restrepo S, Smart CD, Fry WE, Van Der Hoeven R, Tanksley S, Zhang P, Jin H, Yamamoto ML, Baker BJ, Buell CR: Comparitive analysis of potato expressed sequence tag libraries. Plant Physiol. 2003, 131: 419-429. 10.1104/pp.013581.
White JA, Todd J, Newman T, Focks N, Girke T, Ilarduya OM, Jaworski JG, Ohlrogge JB, Benning C: A new set of Arabidopsis Expressed Sequence Tags from developing seeds. The metabolic pathway from carbohydrates to seed oil. Plant Physiol. 2000, 124: 1582-1594. 10.1104/pp.124.4.1582.
Rounsley SD, Glodek A, Sutton G, Adams MD, Somerville CR, Venter JC, Kerlavage AR: The construction of Arabidopsis expressed sequence tag assemblies. A new resource to facilitate gene identification. Plant Physiol. 1996, 112: 1177-1183. 10.1104/pp.112.3.1177.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucl Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
Coram T, Pang E: Ascochyta blight of chickpea – searching for resistance gene analogues and characterizing an enriched EST library. Proceedings of the 12 th Australian Plant Breeding Conference, 15–20 September, Perth, Australia. 2002, 17-22.
Beaulieu M, Larson GP, Geller L, Flanagan SD, Krontiris TG: PCR candidate region mismatch scanning: adaption to quantitive, high-throughput genotyping. Nucleic Acid Res. 2001, 29: 1114-1124. 10.1093/nar/29.5.1114.
Cobb BD, Clarkson JM: A simple procedure for optimising the polymerase chain reaction (PCR) using modified Taguchi methods. Nucl Acids Res. 1994, 22: 3801-3805.
Taguchi G: Introduction to Quality Engineering, Asian Productivity Organization. American Supplier Institute Inc., Dearborn, MI; 1986.
Dreher K, Khairallah M, Ribaut JM, Morris M: Money matters (I): costs of field and laboratory procedures associated with conventional and marker-assisted maize breeding at CIMMYT. Mol Breeding. 2003, 11: 221-234. 10.1023/A:1022820520673.
Ragot M, Hoisington DA: Molecular markers for plant breeding: comparisons of RFLP and RAPD genotyping costs. Theor Appl Genet. 1993, 86: 975-984. 10.1007/BF00211050.
Mace ES, Buhariwalla HK, Crouch JH: A high throughput DNA extraction protocol for tropical molecular breeding programs. Plant Mol Biol Rep. 2004, 21: 459a-459h.
Kresovich S, Luongo AJ, Schloss SJ: Mining the Gold: Finding Allelic variants for improved crop conservation and use. Managing Plant Genetic Diversity. Edited by: Engels JMM, Ramanatha Rao V, Brown AHD, Jackson MT. IPGRI; 2002:379-386.
Kazan K, Muehlbauer FJ: Allozyme variation and phylogeny in annual species of Cicer (Leguminosae). Plant Syst Evol. 1991, 175: 11-21. 10.1007/BF00942142.
Ahmad F: Random amplified polymorphic DNA (RAPD) analysis reveals genetic relationships among the annual Cicer species. Theor Appl Genet. 1999, 98: 657-663. 10.1007/s001220051117.
Choumane W, Winter P, Weigand F, Kahl G: Conservation and variability of sequence-tagged microsatellite sites (STMSs) from chickpea (Cicer aerietinum L.) within the genus Cicer. Theor Appl Genet. 2000, 101: 269-278. 10.1007/s001220051479.
Sudupak M: Inter and Intra-species Inter Simple sequence Repeat (ISSR) variation in the genus Cicer. Euphytica. 2004, 135: 229-238. 10.1023/B:EUPH.0000014938.02019.f3.
Ladizinsky G, Adler A: Genetic relationships among the annual species of Cicer L. Theor Appl Genet. 1976, 48: 197-203. 10.1007/BF00527371.
Ahmad F, Slinkard AE: Genetic relationships in the genus Cicer L. as revealed by polyacrylamide gel electrophoresis of seed storage proteins. Theor Appl Genet. 1992, 84: 688-692. 10.1007/BF00224169.
Buhariwalla HK, Gaur PM, Pande S, Crouch JH: Asian Development Bank. Half-yearly progress report in chickpea for project 'Rapid crop improvement for poor farmers in the semi-arid tropics'. ICRISAT, India; 2002.
Gargav S, Gaur PM: Phylogenetic relationships of perennial chickpea species Cicer microphylhum (Benth.) with its annual relatives as revealed by allozyme polymorphism. Indian Journal of Genetics. 2001, 61: 218-222.
Heslop-Harrison JS: Exploiting novel germplasm. Aust J Agric Res. 2002, 53: 873-879. 10.1071/AR02078.
RepeatMasker software. [http://repeatmasker.genome.washington.edu].
TIGR BLASTn and tBLASTx analysis. [http://tigrblast.tigr.org].
NCBI BLASTn analysis:. [http://www.ncbi.nlm.nih.gov].
Transeq software:. [http://www.ebi.ac.uk/emboss/transeq].
Giegerich R, Meyer F, Schleiermacher C: GeneFisher – Software Support for the Detection of Postulated Genes. Proceedings of the Fourth International Conference on Intelligent Systems for Molecular Biology. AAAI Press, USA; 1996:68-77.
Netprimer analysis. [http://www.premierbiosoft.com].
Tegelstrom H: Detection of mitochondrial DNA fragments in molecular genetic analysis of populations. A Practical Approach. Edited by: Hoelzel AR. IRL Press, Oxford; 1992:89-114.
Polymorphic information content calculator. [http://www.agri.huji.ac.il/~weller/Hayim/parent/PIC].
Rohlf FJ: NTSYS-pc. Numerical taxonomy and multivariate analysis system, Version 2.0. Applied Biostatistics, New York; 1998.
Lynch M: The similarity index and DNA fingerprinting. Mol Biol Evol. 1990, 7: 478-484.
Nei M, Li WH: Mathematical model for studying genetic variation in terms of restriction endonucleases. Proc Natl Acad Sci. 1979, 74: 5267-5273.
Felsenstein J: PHYLIP (Phylogeny Inference Package), version 3.63. the author, Univ. of Washington, Seattle, Washington; 1995.
This work was supported by ICRISAT core funds including earmarked unrestricted grants from the governments of UK, Japan and the European Union. The authors would like to thank Dr HC Sharma for providing access to chickpea material, Pratibha R. for technical assistance with initial EST marker screening, Dr HD Upadhyaya for provision of gene-bank passport data for accessions used in this study, Seetha K for lab management, PVNS Prasad and KDV Prasad for assistance with figures and to Drs R Serraj and R Folkertsma for helpful suggestions during manuscript development. We greatly appreciate suggestions from an anonymous reviewer concerning this manuscript.
HKB coordinated library construction and sequencing of a large proportion of the ESTs, experimental design, marker analysis and manuscript development. JB coordinated and/or carried out all bioinformatic analyses and participated in all aspects of manuscript development. KE carried out all aspects of marker optimization and screening, and contributed to various aspects of experimental design and data analysis. JHC participated in all aspects of experimental design, data interpretation and manuscript development and provided overall coordination of the project. All authors read and approved all versions of the manuscript.
Hutokshi K Buhariwalla, Jayashree B contributed equally to this work.
Electronic supplementary material
Additional File 1: List of the most abundant EST from the chickpea root EST database. Number of chickpea ESTs forming tentative consensus sequences (TC) with corresponding TC number and where known, putative annotation*. (DOC 40 KB)
Additional File 2: Table of chickpea unigene sequences classified through RPS-Blast. Functional categories, with corresponding genbank ID numbers, Blastx descriptors, domains identified through RPS-Blast and e-values of chickpea unigene sequences. (DOC 118 KB)
About this article
Cite this article
Buhariwalla, H.K., B, J., Eshwar, K. et al. Development of ESTs from chickpea roots and their use in diversity analysis of the Cicergenus. BMC Plant Biol 5, 16 (2005). https://doi.org/10.1186/1471-2229-5-16
- Suppression Subtractive Hybridization
- Polymorphic Information Content
- Subtractive Suppressive Hybridization
- Cicer Species
- Principle Coordinate Analysis