Chromosome arm-specific BAC end sequences permit comparative analysis of homoeologous chromosomes and genomes of polyploid wheat
© Sehgal et al.; licensee BioMed Central Ltd. 2012
Received: 5 December 2011
Accepted: 9 April 2012
Published: 4 May 2012
Bread wheat, one of the world’s staple food crops, has the largest, highly repetitive and polyploid genome among the cereal crops. The wheat genome holds the key to crop genetic improvement against challenges such as climate change, environmental degradation, and water scarcity. To unravel the complex wheat genome, the International Wheat Genome Sequencing Consortium (IWGSC) is pursuing a chromosome- and chromosome arm-based approach to physical mapping and sequencing. Here we report on the use of a BAC library made from flow-sorted telosomic chromosome 3A short arm (t3AS) for marker development and analysis of sequence composition and comparative evolution of homoeologous genomes of hexaploid wheat.
The end-sequencing of 9,984 random BACs from a chromosome arm 3AS-specific library (TaaCsp3AShA) generated 11,014,359 bp of high quality sequence from 17,591 BAC-ends with an average length of 626 bp. The sequence represents 3.2% of t3AS with an average DNA sequence read every 19 kb. Overall, 79% of the sequence consisted of repetitive elements, 1.38% as coding regions (estimated 2,850 genes) and another 19% of unknown origin. Comparative sequence analysis suggested that 70-77% of the genes present in both 3A and 3B were syntenic with model species. Among the transposable elements, gypsy/sabrina (12.4%) was the most abundant repeat and was significantly more frequent in 3A compared to homoeologous chromosome 3B. Twenty novel repetitive sequences were also identified using de novo repeat identification. BESs were screened to identify simple sequence repeats (SSR) and transposable element junctions. A total of 1,057 SSRs were identified with a density of one per 10.4 kb, and 7,928 junctions between transposable elements (TE) and other sequences were identified with a density of one per 1.39 kb. With the objective of enhancing the marker density of chromosome 3AS, oligonucleotide primers were successfully designed from 758 SSRs and 695 Insertion Site Based Polymorphisms (ISBPs). Of the 96 ISBP primer pairs tested, 28 (29%) were 3A-specific and compared to 17 (18%) for 96 SSRs.
This work reports on the use of wheat chromosome arm 3AS-specific BAC library for the targeted generation of sequence data from a particular region of the huge genome of wheat. A large quantity of sequences were generated from the A genome of hexaploid wheat for comparative genome analysis with homoeologous B and D genomes and other model grass genomes. Hundreds of molecular markers were developed from the 3AS arm-specific sequences; these and other sequences will be useful in gene discovery and physical mapping.
Plant genomes may vary over 2,000-fold in genome size (Genlisea aurea, ~64 Mb to Paris japonica, ~149 Gb). Within the grass family, bread wheat, also known as common or hexaploid wheat (Triticum aestivum L., 2n = 6x = 42), has a large genome size of 17 Gb because of two allopolypoidization events and also a very high repetitive DNA content (reviewed in Gill et al. ). Each chromosome of wheat, at an average size of 809 Mb (17 Gb/21), is twice the size of the rice genome (390 Mb). Therefore, the physical mapping and sequencing of the wheat genome poses a technical challenge and is cost prohibitive, as discussed at an NSF-sponsored workshop . The wheat genomics community decided on a chromosome-based approach to physical mapping and sequencing to overcome these problems and to provide an opportunity for division of labor and cost sharing by establishing an international consortium of scientists (http://www.wheatgenome.org/). This approach is feasible because of the ability to purify wheat chromosomes and/or chromosome arms by flow cytometric sorting. Particular chromosomes and chromosome arms can be isolated at over 90 % purity and in sufficient quantity for making chromosome and chromosome arm-specific BAC libraries [2–4].
Among the wheat chromosomes, 3B is the largest, sorts out as a single peak on a flow karyotype and its BAC library  was used to construct the first wheat chromosome physical map . Another peak contains three of the smallest wheat chromosomes 1D, 4D and 6D, and their physical map construction is underway (http://wggrc.plantpath.ksu.edu/wheat/Dgenome/). For the remaining wheat chromosomes, lines carrying telocentric chromosomes, which represent individual arms, must be used to increase sorting resolution and library construction [2, 3]. Chromosome and chromosome arm-specific BAC libraries now have been constructed for most of the wheat chromosomes (http://lmcc.ieb.cz/dna-libraries/cereals).
Flow-sorted chromosomes and libraries have been utilized for generating chromosome and chromosome arm-specific sequence information for gene discovery and comparative genome analysis. BAC end sequences (BESs) have been analyzed from BAC libraries and can be highly informative in determining genome content and organization, enhance the value of BACs as a genomic resource and also provide random sequence information. BESs have been used to estimate the distribution of the repetitive elements including the retrotransposons, DNA transposons and SSRs in several plant species [7–14]. Paux et al.  generated 11 Mb of random BESs from chromosome 3B of bread wheat and reported 86% of the sequences as repetitive elements, 1.2% as coding region and 13% was unknown. BESs can be an excellent source for marker development for genetic and physical mapping. BESs were used to develop simple sequence repeats, genic sequence-based markers and, more recently, inserted transposable elements junction based ISBP markers [9, 10, 13–15].
Sequences from the whole-genome amplification of flow-sorted chromosomes or chromosome-arms have been published for several wheat and barley chromosomes [16–22]. Wicker et al.  sequenced flow-sorted wheat group-1 chromosomes by Roche/454 technology at 1.3–2.2x coverage and used the sequence data to estimate gene syntenic relationships of the Triticeae with Brachypodium, rice and sorghum and suggested a large number of sequences that are nonsyntenic to model grasses are probably pseudogenes. Berkman et al. [17, 21] recently sequenced the flow sorted chromosomes 7BS and 7DS of wheat to isolate the low copy and genic sequences and observed ~ 60% rate of colinearity with Brachypodium. Recently, Rustenholnz et al. studied the gene space organization in wheat by mapping the expressed portion of chromosome 3B using barley microarrays  and a wheat unigene microarray  and observed a 2-fold increase in gene density from centromere towards the telomeres. A BES  and transcript map  of chromosome 3B provide an opportunity for comparative analysis of homoeologous chromosomes.
In this study, we used the first arm-specific library in wheat from chromosome 3AS  for which we are developing a physical map. The 3AS BAC library consists of 55, 296 clones (physical map 3ASv1.0; http://wggrc.plantpath.ksu.edu/wheat/3A/3A_index.html). We randomly selected nearly 10,000 BACs and end sequenced them to generate 11 Mb of sequence. These sequences were analyzed to obtain insight into the sequence composition of wheat chromosome arm 3AS and its evolutionary relationship with other homoeologous chromosomes and model grass genomes. Furthermore, we identified DNA motifs potentially useful as molecular markers to saturate chromosome 3AS.
Results and discussion
BAC-end sequencing and data processing
Origin and distribution of 3AS BAC clones used in BAC end sequencing in the first FPC assembly of chromosome 3AS of bread wheat
No. of end-sequenced BACs
Average number of end sequenced BACs/Contig
Size >200 kb
Size >500 kb
Analysis of the repeat fraction and identification of novel repeat elements
Occurrence and distribution of repetitive DNA in the BAC end sequences of bread wheat chromosome arm 3AS
Class, subclass, family
No. of hits
No. of bases (bp)
Fraction of chromosome arm 3AS (%)
Class I elements (retrotransposons)
Class II elements (DNA transposons)
Other known repeats
The frequency and distribution of SSRs in wheat homoeologous chromosomes The SSRs on chromosome arm 3AS were obtained from BESs analysis and SSRs from chromosome 3B similar analysis from BESs 
Wheat 3AS (355 Mb)
Wheat 3B (1000 Mb)
Public repeat databases were used to mask the BESs matching to known DNA repeat sequences. However, the repeat detection could be limited by the size and diversity of the repeat database, such as the absence of genome and species-specific elements. Furthermore, the rapid evolution of repetitive DNA elements may lead to the origin of several unknown repetitive DNA elements. To identify unknown and wheat-specific repetitive DNA elements, the BESs were masked with known repetitive elements and then compared to each other. The BESs with multiple hits (more than eight) at high stringency (at least 100 bp aligned contiguously) were identified as repeat sequences. Families of repeat sequences were obtained computationally from BESs and then subjected to multiple sequence alignment to obtain 122 consensus sequences. The consensus sequences were merged at overlapping regions and extended using CAP3 software  and detected 31 putative novel repeats with sizes ranging from 199 bp to 1,420 bp (Additional file 1). The putative repeat elements could be considerably longer but, because the average BES length was 623 bp, only a part of the repeats could be identified. The searching of all databases, including TREP, RepBase and Genbank, against putative repeats resulted in the identification of 11 protein coding genes representing large gene families, 10 of which also matched transposable proteins. The remaining 20 repeat elements showed no similarity in BLASTN or BLASTX  and were termed as Triticum novel repeat sequences (TRES). The novel, repeat sequences (TRES) were analyzed for their presence in wheat chromosome 3A by another BLAST against BESs. These TRES were repeated at a higher frequency compared to the initial search in the BESs because of the extended length of the repeat sequences. TRES2 had maximum number of hits in the BESs (84 at e <10-25), and the novel repeats totaled 84 matches in the BESs database (Additional file 2). Seventeen of the 20 TRESs had a hit on chromosome 3B BESs when BLASTN searched in NCBI nt/nr database.
Gene content and functional annotation
After masking the known and putatively new repetitive sequences, the remaining BESs were used to estimate the gene content of chromosome arm 3AS. The sequences were compared with wheat transcript assemblies (Triticum aestivum release 2) using TBLASTX and BLASTN algorithms . We found 519 unique sequences that were similar (e-value 10-50) to an EST with a cumulative match length of 152,662 bp, representing about 1.38 % of the data set. Considering the size of 3AS as 355 Mb, the estimated transcribed fraction of 1.38 % corresponds to 4.8 Mb in 3AS and 11.4 Mb for the entire 3A chromosome. Analysis of full-length cDNAs in several grass species have shown the average lengths of FLcDNA varying between 1,000-2,000 bp. In 37,139 Oryza sativa and 7,455 maize FLcDNAs, the average length was found to be 1,746 ± 945 bp  and 1,442 bp , respectively. In the Triticeae tribe, the average length of 22,651 Uni-FLcDNAs in barley was 1,711 ± 863 bp . Considering 1.7 kb as the average length of a gene based on the average in barley, our results suggest the presence of about 2,850 genes on the short arm of chromosome 3A and ~6,700 genes on chromosome 3A. This estimate is a little lower than the prediction of 7–8,000 genes in chromosome 3A based on preliminary results from Roche/454 sequences of sorted 3A chromosome arms (our unpublished results).
The gene estimates for the homoeologous 3B chromosome, which is 20% bigger than 3A, varies from 6,000 genes based on BES analysis  to over 8,400 to 9,500 genes, based on sequence analysis of 17 Mb of contiguous BAC sequences  and ~6,300 genes based on 1.2x coverage of sorted chromosome 3B by Solexa/Illumina sequencing . The estimated number of genes on chromosomes 3A and 3B is much higher than the 4,856 genes predicted for syntenic rice chromosome 1 . However, the predicted gene numbers may be overestimated, because it is difficult to distinguish bona fide genes from pseudogenes or gene fragments due to the incompleteness of genome or chromosome sequence data.
Comparative sequence composition of homoeologous A, B and D genomes
The homoeologous genomes of wheat have undergone dramatic sequence diversification in the intergenic regions, but have still retained the backbone of gene synteny with their ancestral genomes. A significant amount of representative sequence data was generated from chromosome 3A in the present study and from chromosome 3B [9, 24] and the D genome . Assuming that a chromosome arm could act as a good representative sample of the genome, we utilized 11 Mb BESs each from chromosome arm 3AS, chromosome 3B  and a 2.9 Mb of shotgun sequence from the D genome  to compare the composition of homoeologous genomes. Our estimate of the repeat content in chromosome 3AS (79.1%) is similar to the 73-82% repeat content of other A-genome chromosomes sequenced by next generation technology from flow-sorted chromosome-arm specific MDA DNA[19, 20, 22]. However, MDA is known to introduce a bias in the representation of individual amplified regions [39, 40], and a more accurate estimation must await quantitative analysis of the repeat content from flow-sorted chromosome arms DNA without the MDA step. However, BES information from chromosomes 3AS and 3B are the best available sequence samples to compare with homoeologous chromosomes and genomes and, together with next generation sequencing of wheat chromosomes 1A, 4A, 5A, 1B, 7BS, 1D and 7DS, provide a snapshot of the gene content of wheat.
Based on the analysis of 3B BESs and the reanalysis of the D-genome shotgun sequence , Paux et al. hypothesized that the Triticeae genomes have evolved from an ancestral genome of approximately 1 Gb. The portion of the genome not related to the repetitive sequences in the A genome based on 3AS and 5A sequence data is estimated at 1.1 Gb (~20% of the 5.5 Gb A genome). However, most probably, this is an overestimate, because the wheat repeated sequence database is incomplete and the non-repetitive portion of the D genome based on evidence from bioinformatics and hybridization analyses was just 8.4% .
GC content is an important feature and was quite comparable between chromosomes 3AS (44.5%) and the other A-genome chromosomes 1AL (44.7%, ) 4AS (44.7%, ) however, the GC content of chromosome 4AL (41.4%, ) was considerably lower possibly because of biasness caused by MDA. The GC content in the B-genome chromosome 3B (44.6%, ) was quite similar to that of the A-genome chromosomes of wheat, and the GC content of wheat is comparable to other grasses, such as rye (45.9%, ), maize (46.5%, ) and rice (44.0%, ).
The BESs from chromosome 3AS were BLASTN against the 3,000-loci transcript map recently developed for chromosome 3B , and 100 BESs matched to 77 unigenes from chromosome 3B. Of these, 77 BESs (matching 77 unigenes), 45 had a hit in rice (40/77) or Brachypodium (40/77) with 70% (28/40) of the hits in rice and 77% (31/40) in Brachypodium in syntenic regions (Additional file 5). Rustenholz et al.  reported 35-42% of unigenes genes mapped on chromosome 3B were non-syntenic in model grasses genomes. Our results, based on the sequence from 3A together with 3B unigenes  showed that 23-30% of the genes are non-syntenic in wheat, which is consistent with data from chromosome arms 1AL, 1BL, and 1DL that showed 20-45% were non-syntenic . A similar rate of colinearity was observed for chromosomes 7BS and 7DS with the corresponding region of the Brachypodium by sequence analysis of sorted chromosome arms . This erosion of colinearity could be explained by a high rate of transposable element activity in hexaploid wheat, such as TE-driven gene movement . Nearly, 40% (32/77) BESs having a hit to 3B unigenes had no hit in rice or Brachypodium. Rustenholz et al.  reported that 89% of these unigenes were expressed, and we found 11 of the 32 BESs had a hit with barley full-length cDNA suggesting these genes are less likely to be pseudogenes. Most probably, these genes are less conserved and, therefore, remain undetected in model genomes, however, a few of them could be novel wheat/Triticeae-specific genes. In the recent barley genome sequence analysis by Meyer et al. , similar results were observed where 23% of the barley genes had no sequence similarity in model genomes, although they were expressed in barley.
Based on our BES analysis, we estimated one gene per 123 kb and ~44,600 genes in the A genome of wheat. Vitulo et al.  recently sequenced the MDA from flow-sorted chromosome 5A of wheat and estimated 5,088 genes of chromosome 5A and with a gene density of one gene per 162 kb. The coding fraction for chromosome 3AS (1.38%) was comparable to those of chromosome 5A (1.08-1.30%, ) and 1AL (1.03%, ). In another sequence analysis and comparison between the chromosome 4A of wheat and model genome Brachypodium, Hernandez et al.  estimated 9,500 genes on chromosome 4A and ~61,500 in the A genome. However, taking into account the large number of pseudogenes and paralogs, they estimated a minimum number of 4,300 genes on chromosome 4A and 28,000 in the A genome of wheat . The number of genes estimated in our analysis is higher than conservative estimates based on chromosomes 4A and 5A, however, it corresponds well with the recently estimated numbers of genes in the A genome based on BESs from chromosome 1AL (50,000 genes ) and the number of genes in the B genome based on 17 Mb of BAC sequence of chromosome 3B (41,000-50,000 genes ). These estimates are much lower than predicted in earlier studies based on the random sequencing of methylfiltrated libraries  and significantly higher than the estimated 36,000 genes for the D-genome donor Ae. tauschii based on BAC sequencing .
Berkman et al.  made a very stringent estimate of about 25,000-26,000 genes per genome of wheat based on sequence analysis of wheat chromosome arms 7BS and 7DS. However, the estimates made by Berkman et al.  and Hernandez et al.  are based on the genome zipper approach. These estimates do not take into account about 23-30% of the genes that lack sufficient sequence similarity to any gene in the three model grass genomes as suggested by Meyer et al.  in barley and our analysis of 3AS BESs and 3B unigenes in wheat.
On the other hand, the estimate of gene content by several other studies show that the relatively higher gene content in wheat (100,000-350,000) [9, 41] could be an inflation in the actual number of genes due to misannotations of pseudogenes as all these estimates are based on partial gene sequences or a relatively small data set compared to the large wheat genome. Furthermore, recent sequence analysis of the group-1 chromosomes of wheat showed that a large number of nonsysntenic genes are present in only one of three wheat genomes and only 60 % of them showed transcriptional evidence, suggesting that most of them are pseudogenes  making annotation of genes based on sequence similarity suboptimal. We may have a better estimate of the gene content once full-length gene sequences and a gold standard reference sequence of wheat becomes available allowing us to differentiate between genes and gene fragments.
Comparison to model grass genomes
Molecular markers from 3AS
In order to identify new putative markers for physical and genetic mapping, 17,591 BESs were analyzed for the presence of SSRs, which could be converted into microsatellite markers. A total of 1,057 perfect repeats of di-, tri- and tetranucleotide repeats were identified. The frequency of SSRs derived from the BESs was about one SSR per 10.4 kb (Table 3). The most abundant of these SSRs were dinucleotide 779 (73.8%) and trinucleotide 268 (25.2%); tetranucleotides were rare (1.0%). Nearly 28% of SSRs (both di- and trinucleotide) were AT rich (i.e., TA, AT and TAA). A comparison of the frequency and distribution of the SSRs between the A and B genomes of wheat indicates a higher density of SSRs on chromosome 3B (one SSR per 6.1 kb) compared to chromosome 3AS (one SSR per 10.4 kb). Furthermore, the dinucleotide motif (TA)n was most abundant (12.6%) on chromosome 3AS in contrast to the trinucleotide motif (AAG), which was the most abundant in wheat chromosome 3B . Other species, such as Brachypodium, rice and maize, also have a higher frequency of trinucleotide motifs, however, species such as papaya show a higher frequency of dinucleotide motifs and soybean had higher frequency of tetranucleotide motifs . In our study, the frequency of SSRs on chromosome 3AS is far lower than that of chromosome 3B. The higher frequency of AT-rich di- and trinucleotide motifs in wheat and maize suggests that AT-rich region are associated with repetitive sequences, in contrast to GC-rich regions, which are associated with the transcribed part of the genome.
Microsatellite primer pairs were successfully developed from 758 out of 1,057 SSRs. Of the 758 SSRs primer pairs, 155 were developed from class I microsatellites with more than nine motif repeats (Additional file 7). The BESs were also processed to identify junctions between TE and other sequences using ISBPfinder.pl . In total, 7,928 insertion sites were identified in 17,591 BESs, and primer pairs were successfully designed from 695 suitable junctions (Additional file 7). A repeat was identified every 1.39 kb on t3AS, whereas a repeat was identified every 0.89 kb on t1AL , suggesting that repeat-based makers could play an important role in the saturation mapping of the wheat genome. A set of 96 class I SSRs and 96 ISBP primer pairs were tested by PCR amplification for 3AS specificity and utility across species.
Nearly, 53% (23/44) of ISBPs designed from 1AL BESs were 1AL specific , suggesting that because the first primer is designed from repeat and the second one from unique sequences, the high specificity of the unique sequences becomes very significant for specific amplification.
The set of 96 class I SSRs and 96 ISBP primer pairs were also tested by PCR amplification on wheat genotypes, the parents of three mapping populations: cultivated diploid wheat (T. momococcum subsp. monococcum) and wild diploid wheat (T. monococcum subsp. aegilopoides); tetraploid wheat (T. turgidum subsp. durum) genotypes Langdon and Langdon DIC-3A; and bread wheat genotypes CS, Opata 85 and synthetic W7984 (Figure 4). More than 93% of the SSR and 97% of ISBP primer pairs gave amplification from diploid, tetraploid and hexaploid wheat. Twenty-six percent (25/96) ISBPs and 16% (15/96) SSRs were found polymorphic between parents of diploid mapping population T. monococcum and T. aegilopoides on 3% agarose gels. To eliminate the effect of amplification from 3B and 3D, we are mapping these markers in a diploid wheat (AA) T. monococcum subsp. monococcum/subsp. aegilopoides RIL population (data not shown). Amplification of the primer pairs from diploid, tetraploid and hexaploid wheat shows their reproducibility and wide adaptability of these markers.
In summary, a total of 1,453 new (758 SSR + 695 IBSP) putative markers have been developed spanning the 3AS arm, which adds over 200% more markers than mapped in all previous studies (<600 markers) on this arm. The universality and variability of SSRs make these sequences an attractive source of developing microsatellite markers. However, because most of the genome (>80%) is repetitive, the ISBP repeat insertion-based markers could be very important in saturating markers in this portion of the genome and, thus, could enhance the efficiency of breeding programs. The potential of such repeat-based markers in crop breeding for high-throughput, marker-assisted selection has been suggested by Paux et al. . More than 30 genes and QTLs related to domestication traits, such as brittle rachis; tillering; disease resistance genes against stem rust, leaf rust, tanspot, and Septoria tritici blotch; and agronomic traits such as preharvest sprouting and root-shoot biomass, have been genetically mapped on chromosome 3AS of wheat (http://www.shigen.nig.ac.jp/wheat/komugi/genes/symbolClassList.jsp) . In addition to the 1,453 molecular markers (SSR and ISBP) developed in the present study, we have obtained sequence tags of 161 of the estimated ~2,850 genes on chromosome 3AS. The 1,453 molecular markers plus 161 sequence tags of genes will aid in marker-assisted selection and the map-based cloning of above-mentioned agronomic traits on 3AS arm. The molecular markers will also help to align the physical and genetic maps of chromosome 3AS.
BESs are a valuable genomic resource for genome organization analysis, comparative genomics and marker development. Analysis of 17,591 3AS BESs revealed differences in the repeat landscape of the A- and B-genome homoeologous chromosomes and estimate of gene content, Triticum-specific and genome-specific genes. We also developed BESs-derived microsatellite and ISBP putative markers as a resource for integrating physical and genetic maps and for mapping and cloning of a large number of agronomically important genes on 3AS arm.
We used the first of the two BAC libraries constructed from the short arm of chromosome 3A (3AS). The library (code TaaCsp3AShA) consists of 55,296 clones arranged in 144 384-well plates. With the average insert size of 80 kb and 11% contamination with other chromosomes, the library represents 10.9x equivalents of the chromosome arm .
BAC end sequencing
Twenty-six 384-well plates (9,984 BAC clones) randomly selected from the 3AS library were end sequenced (5’ & 3’) at the J. Craig Venter Institute (Rockville, MD). BAC clones of HindIII 96-deep-well blocks containing 1.2 ml/well of 2xYT medium were grown overnight at 37°C with shaking at 300 rpm. The cells were harvested by centrifugation, and the BAC DNA was purified using a REAL Prep 96 Plasmid Kit (Qiagen, Valencia, CA). For BAC-end sequencing, 5 μl of purified BAC DNA (~ 0.2 to 0.5 μg) was used in a sequencing reaction with ABI BigDye terminators (Applied Biosystems, Foster City, CA). Template DNA was sequenced from both directions with pCC1BAC/pIndigoBAC-5 Forward and Reverse End-Sequencing Primers (Epicentre, Madison, WI). Electrophoresis of the sequencing reaction was carried out with a 3730xl DNA Analyzer (Applied Biosystems, Foster City, CA).
Sequence data processing
The sequences and quality files from trace files were read by the Phred program for base calling and trimmed to remove vector and low quality bases [28, 29]. The high-quality sequence data were then filtered for sequences contaminated with Escherichia coli or with plant organelle genomes based on matches to the wheat mitochondria and chloroplast sequences.
The high-quality BESs were compared with several repeat databases, including the Triticeae repetitive (TREP) sequence database (http://wheat.pw.usda.gov/ITMI/Repeats/)[47, the TIGR plant repeat database (ftp://ftp.tigr.org/pub/data/TIGR_Plant_Repeats/), and RepBase at GIRI. (http://www.girinst.org/). Self-BLASTN and BLASTX at an e-value cutoff of 10−5 was used . The composition and contents of repeat element in BESs was assessed using the RepeatMasker program (http://www.repeatmasker.org/) with the default settings using each of the above databases as the custom library option. Finally, a cutoff score of 250 was used to identify the repetitive sequences. The BESs were annotated based on their best match to the repeat database and categorized according to (http://wheat.pw.usda.gov/ITMI/Repeats/gene_annotation.pdf). Sequences matching known repeats were masked as N and repeat masked sequences were further used to identify novel repeats. Self-BLASTN was performed on repeat-masked BESs to identify sequences that had multiple strong matches to other BESs with an e-value <10−50. The blast output file was processed using RECON software (http://selab.janelia.org/recon.html)  to identify families with eight or more sequences. The sequences were extracted and aligned by ClustalW . Consensus sequences for each family with a minimum 100-bp alignment and 80 % identity were selected. The consensus sequences were compared to each other and aligned using the CAP3 software  to merge overlapping regions and extend the sequences. The putative novel repeats were searched in BLAST against nonredundant nucleotide, EST and protein databases to remove any characterized sequences. The repeat-masked sequences were subjected to gene content analysis by a homology search using TBLASTX versus the T. aestivum Transcript assembly 2.0 (e-value 10−50) (http://www.jcvi.org/wheat/downloads.php). The cumulative match lengths were used to calculate the coding fraction, as described for repetitive elements. The repeat masked sequences were subjected to Blast2GO for functional annotation using default parameters .
Comparative sequence analysis
The rice (http://rice.plantbiology.msu.edu/), Brachypodium (http://ftp.brachypodium.org/) and Sorghum (http://www.phytozome.net/sorghum) sequences were downloaded from their respective databases. The 3B BESs were obtained from NCBI Genbank and 3B unigene sequences were requested from authors . The repeat-masked BESs sequences were searched in model grass genomes by BLASTN against Brcahypodium chromosome 2, rice chromosome 1 and Sorghum chromosome 3 at an e-value of 1e-5 and a minimum hit length of 50 bp. The unique hits for each BESs were used for developing a circos map for visualization of synteny . The 3AS BESs were compared with chromosome 3B unigenes, barley full length cDNA by BLASTN at an e-value of 1e-10.
Development of molecular markers from BESs
The BESs were screened computationally for the presence of di-, tri- and tetranucleotides SSRs using the SSRIT program (ftp://ftp.gramene.org/pub/gramene/software/scripts/ssr.pl). SSRs motifs that span ten or more nucleotides were recorded, and SSR location data in sequence were extracted from the SSR search output files. Sequences containing SSR were used to design primers using an in-house perl script and Pimer 3.0 with default settings, and ISBP primers were designed using IsbpFinder.pl . All the SSR and ISBP primer pairs were amplified using the 0.8 μM of primer and 0.25 mM of dNTP, 1.25 mM MgCl2, 0.1 U Taq polymerase (Bioline, CA, USA) with 1× buffer and 50 ng template DNA. PCR amplification was conducted using the following procedure: nine cycles (95°C for 30 sec, 58°C for 60 sec minus 0.5°C each cycle, 72°C for 60 sec), 25 cycles (95°C for 30 sec, 53°C for 45 sec, 72°C for 60 sec) and one additional cycle (72°C for 7 minutes). Genomic DNA amplified with SSR and ISBP primers were separated on 3 % GenePure HiRes Agarose (ISC Bioexpress, Kaysville, UT, USA) gels.
BSG and WL: conceived the study; SKS and WL: carried out the sequence data analysis; PR and AC: BAC-end sequencing; HS and JD: provided the BAC library and participated in manuscript preparation; SKS, BSG and WL wrote the manuscript. All authors read and approved the final manuscript.
bacterial artificial chromosome
BAC end sequences
expressed sequence tag
high-scoring segment pair
multiple displacement amplification
simple sequence repeat
insertion site based polymorphism
recombinant inbred line..
This work was supported by United States Department of Agriculture (USDA-NIFA grant nos. 2006-35604-17248 and 2008-35300-04588). HS and JD were supported by the Czech Ministry of Education, Youth and Sports and the European Regional Development Fund (Operational Program Research and Development for Innovations No. ED0007/01/01). We thank Jayachandran Pillamari for customized scripts and developing figures. This is contribution no. 12-240-J from the Kansas Agricultural Experiment Station, Manhattan, KS 66506–5502, USA.
- Gill BS, Appels R, Botha-Oberholster AM, Buell CR, Bennetzen JL, Chalhoub B, Chumley F, Dvorak J, Iwanaga M, Keller B, Li WL, McCombie WR, Ogihara Y, Quetier F, Sasaki T: A workshop report on wheat genome sequencing: international genome research on wheat consortium. Genetics. 2004, 168 (2): 1087-1096. 10.1534/genetics.104.034769.PubMedView ArticleGoogle Scholar
- Kubalakova M, Vrana J, Cihalikova J, Simkova H, Dolezel J: Flow karyotyping and chromosome sorting in bread wheat (Triticum aestivumL.). Theor Appl Genet. 2002, 104 (8): 1362-1372. 10.1007/s00122-002-0888-2.PubMedView ArticleGoogle Scholar
- Dolezel J, Kubalakova M, Paux E, Bartos J, Feuillet C: Chromosome-based genomics in the cereals. Chromosome Res. 2007, 15 (1): 51-66. 10.1007/s10577-006-1106-x.PubMedView ArticleGoogle Scholar
- Safar J, Simkova H, Kubalakova M, Cihalikova J, Suchankova P, Bartos J, Dolezel J: Development of chromosome-specific BAC resources for genomics of bread wheat. Cytogenet Genome Res. 2010, 129 (1–3): 211-223.PubMedView ArticleGoogle Scholar
- Safar J, Bartos J, Janda J, Bellec A, Kubalakova M, Valarik M, Pateyron S, Weiserova J, Tuskova R, Cihalikova J, Vrana J, Simkova H, Faivre-Rampant P, Sourdille P, Caboche M, Bernard M, Dolezel J, Chalhoub B: Dissecting large and complex genomes: flow sorting and BAC cloning of individual chromosomes from bread wheat. Plant J. 2004, 39 (6): 960-968. 10.1111/j.1365-313X.2004.02179.x.PubMedView ArticleGoogle Scholar
- Paux E, Sourdille P, Salse J, Saintenac C, Choulet F, Leroy P, Korol A, Michalak M, Kianian S, Spielmeyer W, Lagudah E, Somers D, Kilian A, Alaux M, Vautrin S, Berges H, Eversole K, Appels R, Safar J, Simkova H, Dolezel J, Bernard M, Feuillet C: A physical map of the 1-gigabase bread wheat chromosome 3B. Science. 2008, 322 (5898): 101-104. 10.1126/science.1161847.PubMedView ArticleGoogle Scholar
- Messing J, Bharti AK, Karlowski WM, Gundlach H, Kim HR, Yu Y, Wei FS, Fuks G, Soderlund CA, Mayer KFX, Wing RA: Sequence composition and genome organization of maize. Proc Natl Acad Sci USA. 2004, 101 (40): 14349-14354. 10.1073/pnas.0406163101.PubMedView ArticleGoogle Scholar
- Hong CP, Plaha P, Koo D-H, Yang T-J, Choi SR, Lee YK, Uhm T, Bang J-W, Edwards D, Bancrofts I, Park B-S, Lee J, Lim YP: A survey of theBrassica rapagenome by BAC-end sequence analysis and comparison withArabidopsis thaliana. Mol Cells. 2006, 22 (3): 300-307.PubMedGoogle Scholar
- Paux E, Roger D, Badaeva E, Gay G, Bernard M, Sourdille P, Feuillet C: Characterizing the composition and evolution of homoeologous genomes in hexaploid wheat through BAC-end sequencing on chromosome 3B. Plant J. 2006, 48 (3): 463-474. 10.1111/j.1365-313X.2006.02891.x.PubMedView ArticleGoogle Scholar
- Bartos J, Paux E, Kofler R, Havrankova M, Kopecky D, Suchankova P, Safar J, Simkova H, Town CD, Lelley T, Feuillet C, Dolezel J: A first survey of the rye (Secale cereale) genome composition through BAC end sequencing of the short arm of chromosome 1R. BMC Plant Biol. 2008, 8 (95): .
- Datema E, Mueller LA, Buels R, Giovannoni JJ, Visser RGF, Stiekema WJ, van Ham RCHJ: Comparative BAC end sequence analysis of tomato and potato reveals overrepresentation of specific gene families in potato. BMC Plant Biol. 2008, 8 (34): .
- Gonzalez VM, Rodriguez-Moreno L, Centeno E, Benjak A, Garcia-Mas J, Puigdomenech P, Aranda MA: Genome-wide BAC-end sequencing ofCucumis melousing two BAC libraries. BMC Genomics. 2010, 11 (618): .
- Dolezel J, Greilhuber J, Suda J: Estimation of nuclear DNA content in plants using flow cytometry. Nat Protoc. 2007, 2 (9): 2233-2244. 10.1038/nprot.2007.310.PubMedView ArticleGoogle Scholar
- Lucas S, Simkova H, Safar J, Jurman I, Cattonaro F, Vautrin S, Bellec A, Berges H, Dolezel J, Budak H: Functional features of a single chromosome arm in wheat (1AL) determined from its structure. Funct Integr Genomics. 2011, 12 (1): 173-182.PubMedView ArticleGoogle Scholar
- Shultz JL, Kazi S, Bashir R, Afzal JA, Lightfoot DA: The development of BAC-end sequence-based microsatellite markers and placement in the physical and genetic maps of soybean. Theor Appl Genet. 2007, 114 (6): 1081-1090. 10.1007/s00122-007-0501-9.PubMedView ArticleGoogle Scholar
- Mayer KFX, Taudien S, Martis M, Simkova H, Suchankova P, Gundlach H, Wicker T, Petzold A, Felder M, Steuernagel B, Scholz U, Graner A, Platzer M, Dolezel J, Stein N: Gene content and virtual gene order of barley chromosome 1 H. Plant Physiol. 2009, 151 (2): 496-505. 10.1104/pp.109.142612.PubMedView ArticleGoogle Scholar
- Berkman PJ, Skarshewski A, Lorenc MT, Lai K, Duran C, Ling EYS, Stiller J, Smits L, Imelfort M, Manoli S, McKenzie M, Kubalakova M, Simkova H, Batley J, Fleury D, Dolezel J, Edwards D: Sequencing and assembly of low copy and genic regions of isolatedTriticum aestivumchromosome arm 7DS. Plant Biotech J. 2011, 9 (7): 768-775. 10.1111/j.1467-7652.2010.00587.x.View ArticleGoogle Scholar
- Mayer KFX, Martis M, Hedley PE, Simkova H, Liu H, Morris JA, Steuernagel B, Taudien S, Roessner S, Gundlach H, Kubalakova M, Suchankova P, Murat F, Felder M, Nussbaumer T, Graner A, Salse J, Endo T, Sakai H, Tanaka T, Itoh T, Sato K, Platzer M, Matsumoto T, Scholz U, Dolezel J, Waugh R, Stein N: Unlocking the barley genome by chromosomal and comparative genomics. Plant Cell. 2011, 23 (4): 1249-1263. 10.1105/tpc.110.082537.PubMedView ArticleGoogle Scholar
- Wicker T, Mayer KFX, Gundlach H, Martis M, Steuernagel B, Scholz U, Simkova H, Kubalakova M, Choulet F, Taudien S, Platzer M, Feuillet C, Fahima T, Budak H, Dolezel J, Keller B, Stein N: Frequent gene movement and pseudogene evolution is common to the large and complex genomes of wheat, barley, and their relatives. Plant Cell. 2011, 23 (5): 1706-1718. 10.1105/tpc.111.086629.PubMedView ArticleGoogle Scholar
- Vitulo N, Albiero A, Forcato C, Campagna D, Dal Pero F, Bagnaresi P, Colaiacovo M, Faccioli P, Lamontanara A, Simkova H, Kubalakova M, Perrotta G, Facella P, Lopez L, Pietrella M, Gianese G, Dolezel J, Giuliano G, Cattivelli L, Valle G, Stanca AM: First Survey of the wheat chromosome 5A composition through a next generation sequencing approach. PLoS One. 2011, 6 (10): e26421-10.1371/journal.pone.0026421.PubMedView ArticleGoogle Scholar
- Berkman PJ, Skarshewski A, Manoli S, Lorenc MT, Stiller J, Smits L, Lai K, Campbell E, Kubalakova M, Simkova H, Batley J, Dolezel J, Hernandez P, Edwards D: Sequencing wheat chromosome arm 7BS delimits the 7BS/4AL translocation and reveals homoeologous gene conservation. Theor Appl Genet. 2012, 124 (3): 423-432. 10.1007/s00122-011-1717-2.PubMedView ArticleGoogle Scholar
- Hernandez P, Martis M, Dorado G, Pfeifer M, Galvez S, Schaaf S, Jouve N, Simkova H, Valarik M, Dolezel J, Mayer KFX: Next-generation sequencing and syntenic integration of flow-sorted arms of wheat chromosome 4A exposes the chromosome structure and gene content. Plant J. 2012, 69 (3): 377-386. 10.1111/j.1365-313X.2011.04808.x.PubMedView ArticleGoogle Scholar
- Rustenholz C, Hedley PE, Morris J, Choulet F, Feuillet C, Waugh R, Paux E: Specific patterns of gene space organisation revealed in wheat by using the combination of barley and wheat genomic resources. BMC Genomics. 2010, 11 (714): .
- Rustenholz C, Choulet F, Laugier C, Safar J, Simkova H, Dolezel J, Magni F, Scalabrin S, Cattonaro F, Vautrin S, Bellec A, Berges H, Feuillet C, Paux E: A 3,000-loci transcription map of chromosome 3B unravels the structural and functional features of gene islands in hexaploid wheat. Plant Physiol. 2011, 157 (4): 1596-1608. 10.1104/pp.111.183921.PubMedView ArticleGoogle Scholar
- Safar J, Simkova H, Kubaláková M, Suchankova P, Cihalikova J, Bartos J, Fiocchetti F, Roselli M, Gill BS, Dolezel J, Lucretti S: Generating resources for genomics of wheat homoeologous chromosome group 3: 3AS- and 3DS-specific BAC libraries. J Genet Breed. 2007, 61: 151-160.Google Scholar
- Gill BS, Friebe B, Endo TR: Standard karyotype and nomenclature system for description of chromosome bands and structural-aberrations in wheat (Triticum aestivum). Genome. 1991, 34 (5): 830-839. 10.1139/g91-128.View ArticleGoogle Scholar
- Soderlund C, Longden I, Mott R: FPC: a system for building contigs from restriction fingerprinted clones. CABIOS. 1997, 13: 523-535.PubMedGoogle Scholar
- Lazo GR, Tong J, Miller R, Hsia C, Rausch C, Kang Y, Anderson OD: Software scripts for quality checking of high-throughput nucleic acid sequencers. Biotechniques. 2001, 30 (6): 1300.PubMedGoogle Scholar
- Chou HH, Holmes MH: DNA sequence quality trimming and vector removal. Bioinformatics. 2001, 17 (12): 1093-1104. 10.1093/bioinformatics/17.12.1093.PubMedView ArticleGoogle Scholar
- Goff SA, Ricke D, Lan TH, Presting G, Wang RL, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H, Hadley D, Hutchinson D, Martin C, Katagiri F, Lange BM, Moughamer T, Xia Y, Budworth P, Zhong JP, Miguel T, Paszkowski U, Zhang SP, Colbert M, Sun WL, Chen LL, Cooper B, Park S, Wood TC, Mao L, Quail P, Wing R, Dean R, Yu YS, Zharkikh A, Shen R, Sahasrabudhe S, Thomas A, Cannings R, Gutin A, Pruss D, Reid J, Tavtigian S, Mitchell J, Eldredge G, Scholl T, Miller RM, Bhatnagar S, Adey N, Rubano T, Tusneem N, Robinson R, Feldhaus J, Macalma T, Oliphant A, Briggs S: A draft sequence of the rice genome (Oryza sativa L. ssp japonica). Science. 2002, 296 (5565): 92-100. 10.1126/science.1068275.PubMedView ArticleGoogle Scholar
- Haberer G, Young S, Bharti AK, Gundlach H, Raymond C, Fuks G, Butler E, Wing RA, Rounsley S, Birren B, Nusbaum C, Mayer KFX, Messing J: Structure and architecture of the maize genome. Plant Physiol. 2005, 139 (4): 1612-1624. 10.1104/pp.105.068718.PubMedView ArticleGoogle Scholar
- Huang XQ, Madan A: CAP3: A DNA sequence assembly program. Genome Res. 1999, 9 (9): 868-877. 10.1101/gr.9.9.868.PubMedView ArticleGoogle Scholar
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215 (3): 403-410.PubMedView ArticleGoogle Scholar
- Matsumoto T, Tanaka T, Sakai H, Amano N, Kanamori H, Kurita K, Kikuta A, Kamiya K, Yamamoto M, Ikawa H, Fujii N, Hori K, Itoh T, Sato K: Comprehensive sequence analysis of 24,783 barley full-length cdnas derived from 12 clone libraries. Plant Physiol. 2011, 156 (1): 20-28. 10.1104/pp.110.171579.PubMedView ArticleGoogle Scholar
- Soderlund C, Descour A, Kudrna D, Bomhoff M, Boyd L, Currie J, Angelova A, Collura K, Wissotski M, Ashley E, Morrow D, Fernandes J, Walbot V, Yu Y: Sequencing, mapping, and analysis of 27,455 maize full-length cDNAs. PLoS Genetics. 2009, 5 (11): e1000740-10.1371/journal.pgen.1000740.PubMedView ArticleGoogle Scholar
- Choulet F, Wicker T, Rustenholz C, Paux E, Salse J, Leroy P, Schlub S, Le Paslier MC, Magdelenat G, Gonthier C, Couloux A, Budak H, Breen J, Pumphrey M, Liu SX, Kong XY, Jia JZ, Gut M, Brunel D, Anderson JA, Gill BS, Appels R, Keller B, Feuillet C: Megabase level sequencing reveals contrasted organization and evolution patterns of the wheat gene and transposable element spaces. Plant Cell. 2010, 22 (6): 1686-1701. 10.1105/tpc.110.074187.PubMedView ArticleGoogle Scholar
- Gotz S, Garcia-Gomez JM, Terol J, Williams TD, Nagaraj SH, Nueda MJ, Robles M, Talon M, Dopazo J, Conesa A: High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Res. 2008, 36 (10): 3420-3435. 10.1093/nar/gkn176.PubMedView ArticleGoogle Scholar
- Li WL, Zhang P, Fellers JP, Friebe B, Gill BS: Sequence composition, organization, and evolution of the core Triticeae genome. Plant J. 2004, 40 (4): 500-511. 10.1111/j.1365-313X.2004.02228.x.PubMedView ArticleGoogle Scholar
- Pinard R, de Winter A, Sarkis GJ, Gerstein MB, Tartaro KR, Plant RN, Egholm M, Rothberg JM, Leamon JH: Assessment of whole genome amplification-induced bias through high-throughput, massively parallel whole genome sequencing. BMC Genomics. 2006, 7 (216): .
- Pugh TJ, Delaney AD, Farnoud N, Flibotte S, Griffith M, Li HI, Qian H, Farinha P, Gascoyne RD, Marra MA: Impact of whole genome amplification on analysis of copy number variants. Nucleic Acids Res. 2008, 36 (13): e80-10.1093/nar/gkn378.PubMedView ArticleGoogle Scholar
- Rabinowicz PD, Citek R, Budiman MA, Nunberg A, Bedell JA, Lakey N, O'Shaughnessy AL, Nascimento LU, McCombie WR, Martienssen RA: Differential methylation of genes and repeats in land plants. Genome Res. 2005, 15 (10): 1431-1440. 10.1101/gr.4100405.PubMedView ArticleGoogle Scholar
- Massa AN, Wanjugi H, Deal KR, O'Brien K, You FM, Maiti R, Chan AP, Gu YQ, Luo MC, Anderson OD, Rabinowicz PD, Dvorak J, Devos KM: Gene space dynamics during the evolution ofAegilops tauschii,Brachypodium distachyon,Oryza sativa, andSorghum bicolorGenomes. Mol Biol Evol. 2011, 28 (9): 2537-2547. 10.1093/molbev/msr080.PubMedView ArticleGoogle Scholar
- Erayman M, Sandhu D, Sidhu D, Dilbirligi M, Baenziger PS, Gill KS: Demarcating the gene-rich regions of the wheat genome. Nucleic Acids Res. 2004, 32 (12): 3546-3565. 10.1093/nar/gkh639.PubMedView ArticleGoogle Scholar
- Huo N, Lazo GR, Vogel JP, You FM, Ma Y, Hayde DM, Coleman-Derr D, Hill TA, Dvorak J, Anderson OD, et al: The nuclear genome ofBrachypodium distachyon: analysis of BAC end sequences. Funct Integr Genomics. 2008, 8 (2): 135-147. 10.1007/s10142-007-0062-7.PubMedView ArticleGoogle Scholar
- Paux E, Faure S, Choulet F, Roger D, Gauthier V, Martinant J-P, Sourdille P, Balfourier F, Le Paslier M-C, Chauveau A, Cakir M, Gandon B, Feuillet C: Insertion site-based polymorphism markers open new perspectives for genome saturation and marker-assisted selection in wheat. Plant Biotech J. 2010, 8 (2): 196-210. 10.1111/j.1467-7652.2009.00477.x.View ArticleGoogle Scholar
- McIntosh RA, Yamazaki Y, Devos KM, Dubcovsky J, Rogers WJ, Appels R: Catalogue of gene symbols for wheat. Wheat Inf Serv. 2003, 97: 27-37.Google Scholar
- Wicker T, Matthews DE, Keller B: TREP: A database for Triticeae repetitive elements. TrendsPlant Sci. 2002, 7 (12): 561-562.View ArticleGoogle Scholar
- Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J: Repbase update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005, 110 (1–4): 462-467.PubMedView ArticleGoogle Scholar
- Bao ZR, Eddy SR: Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 2002, 12 (8): 1269-1276. 10.1101/gr.88502.PubMedView ArticleGoogle Scholar
- Thompson JD, Higgins DG, Gibson TJ: CLUSTAL-W - Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22 (22): 4673-4680. 10.1093/nar/22.22.4673.PubMedView ArticleGoogle Scholar
- Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA: Circos: An information aesthetic for comparative genomics. Genome Res. 2009, 19 (9): 1639-1645. 10.1101/gr.092759.109.PubMedView ArticleGoogle Scholar