The narrow genetic base of pigeonpea has hindered the wide use of molecular marker technology for crop improvement. In the present study, two BAC libraries were developed with an estimated ~11× genome coverage of pigeonpea. Sequencing of 50,000 BAC clones from both insert ends provided 88,860 BESs. Removal of cytoplasmic organellar BESs and cluster analysis facilitated the maximum possible recovery of nuclear genomic sequences, comprising 41,329 singletons and 10,601 non-redundant contigs. To understand the constitution of SSR-containing BAC clones, BESs were run through an annotation pipeline. A major proportion of the sequences remained non-annotated and may be considered 'novel' C. cajan sequences. The overall repetitive fraction resulting from BES analysis (22.15%) was intermediate when compared with the percentage of repetitive elements in BESs of other legumes such as Trifolium (8.5%), soybean (33.5%) and common bean (49.3%). BES annotation analyses have shown considerable variability in the repetitive fraction in different crop species such as tomato (49.3%), papaya (16%), banana (36%) and citrus (25%). This variation in the amount of repetitive elements in BESs is indicative of the repetitive content of the genome of a species. Varying levels of annotation in different species may also be responsible for differences in the reported repetitive fraction. The proportion of the annotated genic fraction was found to be more or less similar to that observed in BES analyses of other crop species such as Phaseolus (29.3%), apple (10.9%), banana (11%), Brassica (11%) and papaya (19.%).
BESs have been very useful for developing SSR markers in several plant species, including legumes like soybean, common bean and Medicago. In terms of SSR abundance, the overall density of 1 SSR per 5.64 kb is in good agreement with earlier reports in plant genomes. Similar results, showing SSR frequencies of 1 SSR per 4 to 10 kb, were obtained in different plant species like Medicago, soybean, Lotus, Arabidopsis and rice. The differences observed among studies may be attributed to (i) the amount of sequence data analyzed, (ii) the criteria used for SSR identification, and (iii) the different sources of the derived sequences. It is also important to note that, after excluding non-annotated BESs, the majority (70.21%) of SSRs were found to be associated with genes. These observations are in agreement with the comprehensive study in plant genomes where SSRs were found to be associated mainly with genes.
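The dependence of SSR density on the identification criteria can be illustrated with a minimal sketch. The minimum repeat counts in `MIN_REPEATS` and the helper names `find_ssrs` and `ssr_density_kb` are assumptions made for illustration; they are not the thresholds or the software used in this study, and only perfect di-, tri- and tetra-nucleotide repeats are considered.

```python
import re

# Hypothetical minimum repeat counts per motif length (an assumption for this
# sketch, not the criteria used in the study).
MIN_REPEATS = {2: 5, 3: 4, 4: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # One motif of unit_len bases followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit,
            # e.g. 'TATA' (already counted as 'TA') or 'AAA'.
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return hits


def ssr_density_kb(n_ssrs, total_bp):
    """Average amount of sequence (in kb) per SSR."""
    return total_bp / 1000.0 / n_ssrs


if __name__ == "__main__":
    # Toy sequence, not real pigeonpea data.
    toy = "GGC" + "TA" * 6 + "CCG" + "TAA" * 5 + "GC"
    for start, motif, count in find_ssrs(toy):
        print(start, motif, count)
```

Raising or lowering the values in `MIN_REPEATS` changes the number of SSRs reported for the same sequence, which is one reason why densities from different studies are not directly comparable.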
In terms of the distribution of SSRs, unlike the common occurrence of the 'CG' motif in monocot species, 'CG' motifs were the least abundant in the pigeonpea genome, as previously observed in other legume species (Medicago, Lotus and soybean). Such low abundance of 'CG' di-nucleotide repeats may be attributed to their tendency to form secondary structures (hairpins), leading to selective pressure against 'CG' accumulation in genomes.
While converting the identified SSRs into genetic markers, 3,072 SSR primer pairs were synthesized, of which 2,964 (96.48%) yielded scorable amplicons. This rate of successful amplification is considerably higher than earlier reported in pigeonpea [10–13]. All the repeat classes showed more than 98% amplification except di-nucleotide repeats, which had a comparatively lower rate of amplification (95.98%).
All the successfully amplified primer pairs were screened for polymorphism on a set of 22 diverse pigeonpea genotypes representing parents of 13 mapping populations segregating for various traits. These mapping populations represented the best cross combinations based on diversity revealed through morphological attributes and available marker data. The overall frequency of length polymorphism was found to be 28.40%, which is lower than reported in earlier studies, i.e. 50%, 81.3% and 95%. This can be attributed to the use of only one wild species genotype in this study, unlike earlier studies. The occurrence of a very low level of DNA polymorphism among pigeonpea cultivars is not unexpected, as several studies have documented such results [33–35].
As expected, the degree of marker polymorphism was lower in intra-specific populations than in the inter-specific mapping population (ICP 28 × ICPW 94). The frequency of marker polymorphism increased dramatically with SSR loci longer than 200 bp. PIC values for SSR markers were also analyzed in relation to repeat length and unit type. In terms of repeat length, Class I SSRs were more polymorphic than Class II SSRs, which may be attributed to the hyper-variable nature of Class I SSRs. Among the different repeat unit classes, tetra-nucleotide repeats, in general, showed the highest average PIC value (0.64), followed by di-nucleotide repeats (0.57). It was also observed that, within the tri-nucleotide repeat class, the 'TAA' repeat motifs displayed higher polymorphism (average PIC value = 0.59). Similarly, 'TA' repeat motifs in the di-nucleotide repeat class had a higher average PIC value (0.59) than the others. Similar trends were also observed in other legumes such as chickpea, where SSR markers with 'TAA' or 'TA' repeat motifs were both abundant and highly polymorphic. The higher average PIC value of compound SSRs (0.58) can be attributed to the fact that markers with compound SSRs have more than one SSR motif, which increases their chance of being polymorphic.
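For readers unfamiliar with the measure, the following sketch shows how a PIC value can be computed from the alleles scored at one marker. It assumes the commonly used formula PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j², where p_i is the frequency of the i-th allele; the text above does not state which formula was applied, and the function name `pic` and the allele sizes in the example are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pic(alleles):
    """Polymorphism information content for one marker.

    `alleles` is a list with one allele call (e.g. amplicon size in bp)
    per genotype. Assumes PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2.
    """
    counts = Counter(alleles)
    n = sum(counts.values())
    freqs = [c / n for c in counts.values()]
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


if __name__ == "__main__":
    # Hypothetical allele sizes (bp) scored on eight genotypes.
    print(round(pic([210, 210, 214, 214, 218, 218, 222, 222]), 3))
```

A monomorphic marker gives a PIC of 0, and the value rises as more alleles occur at similar frequencies, which is why markers with many repeat units or several motifs tend to score higher.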
This study provides a list of polymorphic markers for different mapping populations that segregate for a number of traits important for pigeonpea improvement, such as Fusarium wilt (FW), sterility mosaic disease (SMD) and fertility restoration (Rf). Genotyping of these mapping populations with the identified polymorphic markers, together with phenotyping data, should provide markers associated with QTLs (quantitative trait loci)/gene(s) for traits of interest that can be used to enhance breeding efficiency through marker-assisted selection.
To develop a reference genetic map, an inter-specific cross was used so that a larger number of segregating loci could be integrated into the genetic map. SSR markers are usually co-dominant and follow Mendelian inheritance. However, deviation from the expected segregation ratio for SSR markers is not uncommon in inter-specific crosses, especially in F2 populations. The significant distortion observed in the marker data may be attributed to several possible causes, such as the abortion of male or female gametes or the selective exclusion of a particular gametic genotype from fertilization, owing to incompatibility, incongruity, certation, or zygote selection. The percentage of distortion observed in the present study is comparable with that reported in previous studies on inter-specific crosses.
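Segregation distortion at a co-dominant marker in an F2 population is typically assessed with a chi-square goodness-of-fit test against the expected 1:2:1 ratio. The sketch below illustrates that test under stated assumptions: the function names, the 5% significance threshold (critical value 5.991 for 2 degrees of freedom) and the genotype counts are illustrative and are not taken from this study.

```python
# Chi-square critical value for 2 degrees of freedom at P = 0.05.
CHI2_CRIT_DF2_P05 = 5.991


def chi_square_f2(n_a, n_h, n_b):
    """Chi-square statistic for a co-dominant F2 marker against 1:2:1.

    n_a, n_h, n_b are the counts of individuals homozygous for parent A,
    heterozygous, and homozygous for parent B, respectively.
    """
    total = n_a + n_h + n_b
    expected = (0.25 * total, 0.5 * total, 0.25 * total)
    observed = (n_a, n_h, n_b)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def is_distorted(n_a, n_h, n_b, critical=CHI2_CRIT_DF2_P05):
    """True if the marker deviates significantly from 1:2:1 segregation."""
    return chi_square_f2(n_a, n_h, n_b) > critical


if __name__ == "__main__":
    # Hypothetical genotype counts for two markers.
    print(chi_square_f2(25, 50, 25), is_distorted(25, 50, 25))
    print(chi_square_f2(40, 50, 10), is_distorted(40, 50, 10))
```

Markers flagged in this way can be held back while a skeletal map is built from the remaining loci and then integrated afterwards.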
In the present study, the genetic map derived from the inter-specific cross ICP 28 × ICPW 94 included eleven discrete linkage groups corresponding to the basic chromosome number of the genus (x = 11). Initial construction of a skeletal map with un-skewed markers, followed by integration of distorted markers, helped to minimize the possibility of spurious marker assignments. The final map comprised 239 marker loci with a total map length of 930.90 cM and an average spacing of 3.8 cM between two marker loci. This is the first report on the construction of an SSR-based genetic map in pigeonpea. This map should therefore serve as a 'reference map' for future genetic maps of pigeonpea. Moreover, as the SSR markers are derived from BAC-end sequences, these markers and the map should be a very useful resource for linking the genetic map with a 'future' physical map of pigeonpea.
The large set of SSR markers developed here should be very useful for applied aspects of genetics and breeding in pigeonpea, especially as the cultivated gene pool has narrow genetic diversity. In pigeonpea, CMS-hybrid technology is becoming popular as a way to tackle low crop productivity. For assessing the genetic purity of hybrids, a grow-out test (GOT) based on morphological criteria is generally used. However, the GOT is limited by its accuracy, time and labour cost. In this context, for each of two hybrids (ICPH 2671 and ICPH 2438), a set of 42 markers has been identified that can be used for purity assessment of hybrid seeds. SSR markers have been found to be very effective for determining hybrid purity in many species like rice, maize and cotton. In fact, in the case of the ICPH 2438 hybrid, two diagnostic SSR markers were also identified for purity assessment in an earlier study. Although some studies report that even a single marker is suitable for hybrid purity assessment [43, 47, 48], this study substantially increases the number of diagnostic markers for ICPH 2438 and also identifies a set of diagnostic markers for another pigeonpea hybrid, ICPH 2671. Moreover, the identification of different marker groups for multiplex assays, especially the group of markers common to both hybrids (CcM0257, CcM1559, CcM1825 and CcM1895), adds further value for hybrid purity assessment.
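The logic of a marker-based purity test can be sketched as follows. The sketch assumes that a co-dominant marker is diagnostic when the two parental lines carry different alleles, and that a true hybrid seed shows both parental alleles at every scored diagnostic marker. The function names and all allele sizes are hypothetical; only the marker names are taken from the text above.

```python
def is_diagnostic(female_allele, male_allele):
    """A co-dominant marker is informative if the parental alleles differ."""
    return female_allele != male_allele


def classify_seed(seed_genotype, parents):
    """Classify one seed as 'hybrid', 'off-type' or 'undetermined'.

    seed_genotype: {marker: set of allele sizes observed in the seed}
    parents:       {marker: (female_allele, male_allele)}
    """
    scored = 0
    for marker, (female, male) in parents.items():
        if not is_diagnostic(female, male):
            continue  # monomorphic between parents, carries no information
        observed = seed_genotype.get(marker)
        if observed is None:
            continue  # marker not scored for this seed
        scored += 1
        if set(observed) != {female, male}:
            return "off-type"
    return "hybrid" if scored else "undetermined"


if __name__ == "__main__":
    # Hypothetical parental allele sizes (bp) at two markers.
    parents = {"CcM0257": (210, 222), "CcM1559": (180, 186)}
    true_hybrid = {"CcM0257": {210, 222}, "CcM1559": {180, 186}}
    selfed_female = {"CcM0257": {210}, "CcM1559": {180}}
    print(classify_seed(true_hybrid, parents))
    print(classify_seed(selfed_female, parents))
```

Scoring several diagnostic markers per seed, for example in a multiplex assay, reduces the chance that a single failed or ambiguous amplification leads to a wrong call.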