Sequencing of 6.7 Mb of the melon genome using a BAC pooling strategy
- Víctor M González†1,
- Andrej Benjak†2,
- Elizabeth Marie Hénaff1,
- Gisela Mir2,
- Josep M Casacuberta1,
- Jordi Garcia-Mas2 and
- Pere Puigdomènech1
© González et al; licensee BioMed Central Ltd. 2010
Received: 9 August 2010
Accepted: 12 November 2010
Published: 12 November 2010
Cucumis melo (melon) belongs to the Cucurbitaceae family, whose economic importance among horticultural crops is second only to Solanaceae. Melon has high intra-specific genetic variation, morphological diversity and a small genome size (454 Mb), which make it suitable for a great variety of molecular and genetic studies. A number of genetic and genomic resources have already been developed, such as several genetic maps, BAC genomic libraries, a BAC-based physical map and EST collections. Sequence information would be invaluable to complete the picture of the melon genomic landscape, furthering our understanding of this species' evolution from its relatives and providing an important genetic tool. However, to this day little sequence data is available: only a few melon genes and genomic regions are deposited in public databases. The development of massively parallel sequencing methods makes it possible to envisage new strategies to obtain long fragments of genomic sequence at higher speed and lower cost than previous Sanger-based methods.
In order to gain insight into the structure of a significant portion of the melon genome we set out to perform massive sequencing of pools of BAC clones. For this, a set of 57 BAC clones from a double haploid line was sequenced in two pools with the 454 system using both shotgun and paired-end approaches. The final assembly consists of an estimated 95% of the actual size of the melon BAC clones, with most likely complete sequences for 50 of the BACs, and a total sequence coverage of 39x. The accuracy of the assembly was assessed by comparing the previously available Sanger sequence of one of the BACs against its 454 sequence, and the polymorphisms found involved only 1.7 differences every 10,000 bp that were localized in 15 homopolymeric regions and two dinucleotide tandem repeats. Overall, the study provides approximately 6.7 Mb or 1.5% of the melon genome. The analysis of this new data has allowed us to gain further insight into characteristics of the melon genome such as gene density, average protein length, or microsatellite and transposon content. The annotation of the BAC sequences revealed a high degree of collinearity and protein sequence identity between melon and its close relative Cucumis sativus (cucumber). Transposon content analysis of the syntenic regions suggests that transposition activity after the split of both cucurbit species has been low in cucumber but very high in melon.
The results presented here show that the strategy followed, which combines shotgun and BAC-end sequencing together with anchored marker information, is an excellent method for sequencing specific genomic regions, especially from relatively compact genomes such as that of melon. However, in agreement with other results, this map-based, BAC approach is confirmed to be an expensive way of sequencing a whole plant genome. Our results also provide a partial description of the melon genome's structure. Namely, our analysis shows that the melon genome is highly collinear with the smaller one of cucumber, the size difference being mainly due to the expansion of intergenic regions and proliferation of transposable elements.
In recent years an important effort has been made to increase the tools available for the genomic analysis of major plant crop species. Since the first plant genome sequence, that of Arabidopsis thaliana, became available, several others have been published. They include model plants such as Brachypodium but, increasingly, species that have been chosen for their importance in agriculture. For example, the rice, maize, sorghum and soybean genomes are complex, but the wealth of genetic information matches their economic interest. Consequently, for both scientific and economic reasons an increasing number of plant genomes are being analyzed, providing important resources for their biological study and breeding.
Several species of interest from both scientific and economic perspectives belong to the Cucurbitaceae family. These include melon, cucumber, watermelon and squashes, all of which have been the object of biological and agricultural interest for centuries. In recent years various molecular tools have been established. For instance, a first assembly of the cucumber genome has been published, and an increasing number of genetic and genomic resources have been developed for melon, a diploid species with a relatively compact genome (around 454 Mb). These include a collection of more than 129,000 ESTs [10, 11], BAC libraries [12, 13], oligo-based microarrays [14, 15], TILLING and EcoTILLING platforms [16, 17], a set of near isogenic lines (NILs) and several melon genetic maps [11, 19–25]. Recently, we have built a physical map with 0.9x genomic coverage using both a BAC library and a genetic map previously developed in our laboratories [http://melonomics.upv.es/public_files], the first report of such a genomic resource for a Cucurbitaceae species. This physical map has also been integrated with the genetic map by anchoring a number of physical contigs (representing 12% of the melon genome) to 175 known genetic markers. These tools have been useful in the study of agronomically interesting traits such as virus or fungus resistance [27, 28], sex determination [29, 30] or the control of ripening [31, 32]. These results demonstrate that molecular genetic approaches can successfully be used in melon to address basic questions of biological or agronomic relevance.
More extensive sequence information would be invaluable to complete the picture of the melon genomic landscape. Indeed, the sequences of only a few selected genomic regions have been published, totaling no more than 500 kb [29, 33–35], and as of May 2010 no more than 173 melon genes can be found in GenBank, although a collection of ESTs probably representing more than 70% of the transcriptome is currently available. The sequencing of the Sorghum genome (730 Mb) has shown the feasibility of sequencing a plant genome larger than that of melon using a Sanger-based whole genome shotgun approach. However, the development of new massively parallel sequencing technologies makes it possible to envisage the complete sequencing of the species at higher speed and lower cost than with previous Sanger-based methods. To this end, both whole genome sequencing approaches and map-based, BAC-to-BAC strategies have been proposed to sequence plant genomes [36, 37].
Only a small number of research projects involving 454 sequencing of BAC clones have been published so far. In a pioneering study aimed at analyzing how 454 technology would perform on template derived from large genomes rich in repetitive content, four barley BAC clones 102-120 kb long, two of which had been previously sequenced using Sanger technology, were sequenced using 454. The results showed that gene-containing regions could be assembled into contigs efficiently and accurately, even at read coverages as low as 10x.
In a later work, eight BACs belonging to a minimum tiling path covering ca. 1 Mb of the Atlantic salmon genome were sequenced using 454 technology, the first published use of paired-end reads for de novo sequence assembly. This study demonstrated that although the inclusion of paired-end reads greatly improved sequence assembly, a significant number of gaps remained when compared to Sanger-generated sequencing data. Thus the authors concluded that, when it comes to de novo sequencing of complex genomes, 454 sequencing should be restricted, at least for the time being, to establishing a set of ordered sequenced contigs.
Although these studies show that 454 sequencing can be used to assemble gene-containing regions from genomic sequences using a BAC-to-BAC approach, the cost of 454-sequencing individual BACs has led researchers to consider pooling individual clones as a means to increase throughput and reduce the cost of genome sequencing. In one published study, 166 BACs totalling 20 Mb, which formed a minimum tiling path covering an entire chromosome arm from Oryza barthii, were divided into six pools of overlapping BACs and 454-sequenced, aided by paired-end sequencing. The report shows that pooling BACs does not increase the complexity to a degree that makes assembly impossible, which makes this approach a feasible strategy for reducing the cost of BAC sequencing. In another work, 91 barley BAC clones, pooled in sets of 12 or 24, were sequenced using 454 technology. The introduction of short sequence tags to fragmented BAC DNA prior to pooling and sequencing helped to resolve the assembly of multiplex sequencing data by establishing relationships between BAC clones and sequence reads, reducing sample complexity.
Here we present a pilot project aiming to sequence two pools of 35 and 23 melon BACs using the 454 system and a combination of shotgun and paired-end sequencing. The goal of the study was twofold: obtain sequence data for a significant proportion of the melon genome and thus insight into its structure, and test the strategy of massively sequencing pools of BACs. The results obtained allow an accuracy assessment of 454 sequence and assembly data as compared with sequence data produced using classical Sanger technology. Overall, the study provides approximately 7 Mb or 1.5% of the melon genome as a first step towards the complete sequence. The analysis of this data has provided insight into characteristics of the melon genome such as gene density, transposon content and synteny with cucumber.
Results and discussion
Selection of BAC clones for pooling and sequencing
Two pools of DNA prepared from BACs were sequenced using the 454 pyrosequencing method. These BACs had been produced from DNA of the double haploid line PIT92 obtained from the cross of PI 161375 and T111 as described in .
A second batch of 20 BACs anchored to genetic markers distributed throughout the genome but different from those corresponding to the first set of 35 BACs was also chosen for 454-sequencing (see Figure 1). Three additional BACs were included in this second pool: the above-mentioned BAC Cm43_H20, and two randomly chosen BAC clones not linked to any known genetic marker (BACs Cm21_I02 and Cm12_I23). In all, the second pool consists of 23 BACs mapping to at least 21 different genetic loci.
Table 1. Correspondence between sequenced BAC clones, genetic markers and assembled contigs/scaffolds.* [Table body not recovered; surviving column header: Stretches of Ns.]
Sequencing and assembly
Table 2. Details of the 454 FLX runs from which sequence data were obtained. [Table body not recovered; surviving column header: No. of reads.]
Table 3. Metrics for BAC assemblies and final results after manual correction, 57 BACs (two pools together).* [Table values not recovered. Rows: No. of contigs a; No. of bases in contigs; Average contig size (bp); N50 contig size (bp); Largest contig size (bp); Q40 plus bases; No. of scaffolds; No. of scaffolds larger than 20 kb; No. of bases in scaffolds; Average scaffold size; N50 scaffold size; Largest scaffold size; No. of unscaffolded contigs c; No. of bases in unscaff. contigs; Average unscaff. contig size.]
The assignment of contigs and scaffolds to BACs was performed using anchored genetic markers and BAC-end sequences as described in the Methods section. Also, the information from the C. melo FPC physical map  together with BAC-end sequences from some BAC clones in FPC contigs allowed us to manually edit two scaffolds of the final assembly. The physical map was also useful in assigning BACs Cm21_I08 and Cm12_I23 to their corresponding scaffolds, as no genetic markers correspond to these BACs. Finally, the previously Sanger-sequenced BAC Cm60_K17 (Acc. No.: AF499727.1, ) was added to the alignment of the sequenced BACs from the MRGH63 contig in order to extend the sequence used for subsequent analysis (see Additional file 1 Figure S1).
The final assembly consists of 73 scaffolds totaling 6.3 Mb, 73% of which are longer than 60 kb, with an average scaffold size of 86.8 kb and the largest scaffold 304 kb long; in addition, 744 unscaffolded contigs totaling 382 kb of sequence remain (Table 3). The sequence coverage of the final assembly is 39x, calculated as the ratio between the total length of the sequence reads and the assembly sequence length. Paired-end reads are used in the process of sequence assembly to join contigs (formed by read alignments) into structures called scaffolds, which represent sorted and correctly orientated contigs separated by gaps whose sizes are estimated from the average paired-end size. The N50 contig size of our assembly was rather small (30.6 kb) compared to the N50 scaffold size (107.6 kb). This result confirms the importance of paired-end reads when it comes to assembling a complex genome using 454 sequences.
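The two summary statistics used above can be illustrated with a minimal sketch. It assumes the common definition of N50 (the length L such that sequences of length L or longer contain at least half of all assembled bases) and the coverage definition given in the text; the function names and the toy numbers are illustrative and are not taken from the study's data or from the assembler.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no lengths given")


def sequence_coverage(total_read_bases, assembly_bases):
    """Coverage as defined in the text: total read length / assembly length."""
    return total_read_bases / assembly_bases


# Toy contig lengths (illustrative only, not data from the study).
toy_contigs = [8, 8, 4, 3, 3, 2, 2, 2]
print(n50(toy_contigs))            # the two largest contigs already hold half of the 32 bases
print(sequence_coverage(390, 10))  # 390 read bases over a 10-base assembly
```

Because scaffolds join several contigs through paired-end links, the same function applied to scaffold lengths will normally return a larger value than when applied to contig lengths, which is the contrast reported above.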
Regarding the assignment of sequences to particular BACs, BAC Cm47_C02 could not be assigned to any scaffold or contig and BAC Cm46_I24 was assigned to a small contig of less than 1 kb using the genetic marker sequence information, and to another two small scaffolds using both BAC-end sequences. All other BACs were assigned to a unique scaffold or contig, two of which were smaller than 15 kb, another five in the 60-90 kb range while the rest was over 90 kb long (Table 1).
The search for BAC ends in the final set of contigs and scaffolds suggests that at least 42 scaffolds cover the complete sequence of 44 BACs (including the three BACs belonging to the scaffold MRGH63). An average of seven stretches of Ns (produced as a result of contig scaffolding) was found per scaffold, and the total length of all Ns accounts for 4.8% of the final assembly length (see Additional file 3 Table S2). Nine additional scaffolds assigned to as many BAC clones were found to contain only one BAC border each; however, six of these scaffolds were larger than 100 kb, and so they probably represent complete BAC sequences except for small deletions at their borders, while the remaining three measured between 60 and 80 kb and could represent a significant proportion of their corresponding BAC sequences. Finally, BAC borders were absent from two BAC sequences (corresponding to BACs Cm57_M11 and Cm59_C10), both smaller than 11 kb and therefore most likely incomplete.
In summary, of a total of 57 pooled BACs, most likely complete sequences were produced for 50 BAC clones, three were incomplete but in the range of 60-80 kb, and four BACs were attributed very limited sequence information. As the assignment was performed using a small amount of sequence information, namely the marker and BAC-end sequences (not available for all BACs), any sequence shorter than the full BAC insert size has little chance of being assigned to any particular BAC. This is evident for BAC Cm46_I24, where each BAC-end sequence and the marker sequence assign three rather small sequences (Additional file 3 Table S2). In our results, a total of 374 kb distributed in 20 contigs/scaffolds longer than 2,000 bp remained unassigned after the final assembly and could account for most of the sequence of those four problematic BACs. All markers but one (F149), and all available BAC-end sequences but three, matched a contig or scaffold. The nucleotide sequences of contigs and scaffolds assigned to BACs, as well as of those unassigned assembly sequences larger than 2 kb, have been deposited in the GenBank database and their accession numbers can be found in Additional file 3 Table S2.
The number of gaps per Mb (61) and the estimated amount of missed sequence in our main assembly (5%) are lower than the values from the above-mentioned studies using 454 sequencing of BAC clones [37–39]. This is most probably due to the absence of paired-end sequencing in [38, 40], to the short reads that were being produced at the earlier stages of 454 technology (100 bp on average), to the complexity of the barley and salmon genomes as compared with that of melon [38–40], and to the higher amount of assembled sequence in the case of O. barthii. In summary, although using shotgun and paired-end libraries of pooled BACs remains a costly proposition for sequencing a whole genome, it is well adapted to certain situations. Indeed, our results show that it is a feasible and cost-efficient strategy for sequencing particular regions of interest of relatively compact genomes like that of melon. This approach would also be useful in genome walking strategies for gene cloning, or for resolving a particular region where a physical map is available.
Sequence accuracy assessment
Table 4. Differences between Sanger- and 454-sequences of BAC Cm13_J04. [Most table values not recovered.]
Length of Sanger-sequence:
Stretches of Ns on 454-sequence: 3,572 bp (3.6%)
454 differences 2:
It has already been described that Sanger and 454 technologies have a generally comparable level of accuracy regarding genic regions or other single-copy sequences, homopolymeric stretches being the main source of read errors in both techniques when low copy regions are considered [37, 38, 43, 44]. Previous reports have also shown that longer stretches of A and T are more likely to cause problems when using pyrosequencing. Indeed, there is a tendency for homopolymers to be shorter in the 454 sequence than in the Sanger reads, although at least one report exists where the stretches were consistently found to be one nucleotide longer in the 454 sequences [38, 43]. In summary, the polymorphisms detected between the melon 454 and Sanger sequences in a 100 kb interval involved only 1.7 differences every 10,000 bp, a figure close to previously reported values [37, 38].
Besides homopolymers, repetitive DNA is known to be more problematic for 454 sequencing than for Sanger due to the shorter length of the 454 reads. Repetitive regions can be collapsed into one consensus contig causing gaps to appear in the final assembly. This may be the main reason behind the gaps accounting for an estimated loss of ca. 5% of melon sequence in our final assembly. Indeed, all five stretches of Ns in Cm13_J04 consensus sequence are found in two regions that contain repetitive sequences such as a transposable element and a TIR-NBS-LRR resistance gene (data not shown).
Ab initio prediction of protein coding, tRNA and rRNA genes was carried out as described in the Methods section. The predictions were validated by homology with protein sequences in the NCBI databases and with ESTs from the melon unigene v3 database at ICUGI. A census of simple sequence repeats (SSRs) was also performed using the msatcommander software.
Table 5. C. melo BAC sequences characteristics.a [Several values were not recovered and are left blank; row labels in square brackets are inferred from the values.]
Total sequence length:
Sequence length excluding stretches of Ns:
Number of predicted protein coding genes b:
Number of predicted protein coding genes with homology to C. melo ESTs:
Gene density c: 9.9 genes/100 kb (1.5 - 19.7, SD: 4.3)
Average exon length:
Average intron length:
Exons per gene: 4.9 (1-29, SD: 4.4) (74% of genes ≤ 6 exons)
Average protein length d: 386 (34-2,156, SD: 268)
Average % of introns in coding sequence e: 45.6 (4.3 - 95.5, SD: 20.6)
GC content (%): 33 (30.2 - 38.7, SD: 1.34)
[SSRs]: 4,430 (74,590 bp, 1.25% of total sequence)
[SSR density]: 1 SSR/1.3 kb
Transposable elements g:
The recent publication of the Cucumis sativus genome sequence invites a comparison of the sequence and annotation characteristics of both cucurbit species. Overall, the statistics of protein-coding genes from both cucurbits are quite similar. The predictions for the cucumber genome are a gene density of 10 per 100 kb, a mean protein length of 349 amino acids, an average number of exons per gene, exon length and intron length of 4.8, 238 bp and 483 bp, respectively, and a tRNA gene density of 2.9 per Mb. While the gene density, mean exon length and average number of exons per gene are very similar in both species, in cucumber the protein length is only slightly smaller (0.9x), and the mean intron length is just 1.2 times greater.
The apparently similar gene density, together with the similarity in average protein length, number of exons and average exon and intron lengths, seems to contradict the difference in genome size between both species. Indeed, the estimated size of the melon genome is 1.3x that of cucumber [7, 9]. It has to be taken into account, however, that the cucumber gene density was calculated based on as much as 70% of the complete genomic sequence, which most probably included gene-poor regions, while the melon gene density has been estimated using BAC clones that have gene- or EST-based genetic markers and thus probably represent gene-rich regions. Therefore, it might be the case that the actual melon gene density is lower than that of cucumber, a hypothesis that is supported by the analysis of syntenic regions from both genomes (see below in the "Analysis of microsynteny" section).
Transposon content of the sequenced BACs
Transposons were annotated using sequence similarity searches with previously characterized transposons as well as by ab initio methods based on transposon structural characteristics. As expected, most of the elements found belong to the retrotransposon class of mobile elements, with the Gypsy family being the most represented. However, the fraction of the genome these elements occupy is apparently smaller than in other genomes of similar size. Indeed, while retrotransposons account for 20% of the genomes of grapevine (504.6 Mb) and Lotus japonicus (472 Mb) [45, 46], these elements seem to account for only 7.2% of the melon genome (454 Mb) (Table 6). Retrotransposons are not randomly distributed in genomes: while some elements preferentially integrate in gene-rich regions, others target heterochromatic regions for integration, in particular those belonging to the Gypsy family, which are usually present at higher copy number. Thus, the apparently low retrotransposon copy number could be due to the fact that heterochromatic regions are under-represented in the 1.5% fraction of the genome analyzed, which was selected to be representative of the gene-rich regions of the melon genome.
Transposon content in the C. melo sequenced BACs.a
Analysis of microsynteny
Besides, the orientation of the putative syntenic genes was found to be conserved in all cases. However, a number of genes were duplicated in melon. These included the expansion of a cluster of NBS-LRR genes present in scaffold MRGH63, which is particularly interesting as the Vat gene and other disease resistance genes have been mapped to this region . NBS-LRR genes are the main family of resistance genes in plants, and are frequently found in clusters . Highly conserved gene order and content together with 95% of sequence similarity over coding regions has already been reported by Huang et al. based on the comparison of four sequenced BAC clones against the sequenced cucumber genome .
Besides the duplication of several genes, a major difference between cucumber and melon syntenic regions is the number of transposon insertions (Figure 2). The cucumber sequences analysed contain only two retrotransposon insertions, one of which seems very old as it is highly degenerated. In contrast, the melon syntenic regions contain three DNA transposons (two hATs and one MULE) and 15 retrotransposons (most of them from the Gypsy superfamily), including the degenerated retrotransposon found in cucumber. In particular, transposon activity appears to account for the expansion of ca. 60 kb in the melon scaffold0077 relative to its cucumber counterpart. In scaffold MRGH63, a localised transposon number amplification together with duplication of melon resistance gene homologs (see below) accounts for an 88 kb-long expansion of the sequence of melon relative to that of cucumber. Also, transposons were found to be putatively involved in gene disruption processes in scaffolds 9 and MRGH63.
These results suggest that transposition activity after the divergence of the ancestors of melon and cucumber has been low in cucumber but very high in melon. This transposon amplification and mobilization could be a reason for the 1.8× increase in size of the melon syntenic regions. Bearing in mind that the melon genome is estimated to be 1.3× the size of that of cucumber, it can tentatively be assumed that transposon activity may be mainly responsible for that difference in genome sizes.
It is interesting to note that almost half of the melon specific transposons are interspersed with NBS-LRR predicted genes that potentially form resistance gene clusters. Gene duplications and transposon insertions have been proposed to provide a structural environment that permits unequal crossovers and interlocus gene conversion allowing rapid evolution of resistance genes . In addition, the presence of active retrotransposons interspersed with resistance genes may also contribute to the resistance gene regulation by silencing related mechanisms . A detailed analysis of syntenic regions containing putative resistance genes between melon and cucumber may provide new information on the evolution of resistance genes and the development of new resistances in cultivated crops.
A set of 57 BAC clones from a double haploid line of melon was sequenced in two pools with the 454 system using both shotgun and paired-end approaches followed by bioinformatic assembly of the fragments obtained. From this assembly it was possible to obtain most likely complete sequences for 50 of these BACs, as judged by the length and the presence of BAC-end sequences, with a final coverage of 39×. The accuracy of the assembly was excellent, compared with a BAC clone already sequenced with the Sanger method, except in a small number of repetitive sequences, consistent with other 454 sequencing projects [37, 38]. These results show that 454-sequencing of pooled BACs, using both shotgun and paired-end libraries, is a feasible strategy for sequencing long stretches of genomic sequence from medium-size genomes such as that of melon. However, correction using other sequencing techniques would be needed for medium to high repetitive content regions.
The analysis of the fraction (around 1.5%) of the melon genome obtained provides a pilot overview of this species' genomic structure. Predicted gene annotations were confirmed in 73% of the cases by comparison with EST collections. This is probably a good measure of the completeness of the transcriptome information currently available for this species. The analysis of the sequences provides an interesting overview of the features such as microsatellite content, gene density and average protein length, revealing similarity to that of its close relative, cucumber.
Finally, the comparison of four melon regions totalling 782 kb against the genomic sequence of cucumber (the only other Cucurbit species where a draft genome sequence is available) reveals a high degree of collinearity between both species. The analysis of the detected syntenic regions suggests that the size difference of the two genomes is due to the expansion of intergenic regions, mainly through the activity of transposable elements in melon after the divergence of the two species. It is particularly interesting to note that almost half of the detected melon-specific transposons are interspersed with NBS-LRR predicted genes that potentially form resistance gene clusters. We have confirmed the utility of this sequencing method for small genomic fractions, and the analysis of the data thus obtained has expanded our understanding of the melon genome structure and the mechanisms underlying its evolution.
A BamHI BAC library from the double-haploid melon line 'PIT92' (PI 161375 × T111) was previously developed in our laboratory using pECBAC1 as cloning vector [http://hbz7.tamu.edu/homelinks/bac_est/vector/sequence/sequence.htm]. With 23,040 BAC clones distributed in sixty 384-well plates, an average insert size of 139 kb and 20% empty clones, the library represents 5.7 genomic equivalents of the haploid melon genome.
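The arithmetic behind the library coverage can be sketched as follows. This is an illustration only: the function name is hypothetical, and the 454 Mb genome size is the estimate quoted elsewhere in this article, which may not be the exact value used to derive the published figure.

```python
def genome_equivalents(n_clones, empty_fraction, mean_insert_kb, genome_mb):
    """Total cloned DNA (clones carrying an insert x mean insert size) per haploid genome."""
    clones_with_insert = n_clones * (1 - empty_fraction)
    cloned_mb = clones_with_insert * mean_insert_kb / 1000
    return cloned_mb / genome_mb


# Values quoted in the text; the genome size is an assumption (see above).
coverage = genome_equivalents(n_clones=23040, empty_fraction=0.20,
                              mean_insert_kb=139, genome_mb=454)
print(f"{coverage:.2f} genome equivalents")  # prints "5.64 genome equivalents"
```

With these inputs the sketch gives about 5.6, close to the 5.7 reported; the small difference presumably reflects the genome size estimate or the rounding behind the published value.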
Two pools of 35 and 23 BACs were selected for the analysis. Individual precultures were grown in 1 ml 1 × LB plus 12.5 μg/ml chloramphenicol at 300 rpm, 37°C, for 17 h. The following day, 30 μl of each BAC clone preculture was added to a 50 ml tube containing 20 ml 1 × LB plus 12.5 μg/ml chloramphenicol, and grown at 37°C, 300 rpm for 15 h. The grown cultures were then mixed to produce two separate volumes representing the two BAC pools, and the bacterial cells were harvested by centrifugation at 6,000 × g for 15 min at 4°C.
Genomic DNA-free BAC DNA extraction was performed using the QIAGEN® Large-Construct Kit (Cat. No. 12462) following the manufacturer's instructions. Both final DNA pellets were resuspended in 500 μl TE pH 8.0 each.
All sequencing was performed with a Roche 454 Genome Sequencer machine using FLX chemistry. Two DNA extractions were done from the 35-BACs pool, one to create a shotgun library and the other one to create a 3 kb paired-end library. The shotgun library was used for one titration run and one full run performed by Lifequencing S. L. at their premises in Valencia, Spain. The paired-end library was sequenced on two quarters of a plate followed by a full run performed at our 454 sequencing facility. For the 23-BACs pool, one DNA extraction was done which served to create a shotgun and a 3 kb paired-end library. The shotgun library was sequenced with a full run while the paired-end library was sequenced on three eighths of a plate; both runs were performed at our 454 sequencing facility.
Sequence assembly was done using Newbler version 2.3 with default parameters. Reads from all BACs were processed together in one assembly run. The sequence of E. coli strain DH10B (NC_010473.1) was used as screening database and the vector pECBAC1 as trimming database, but without 30 bp of sequence flanking either side of the BamHI restriction site (see below) http://hbz7.tamu.edu/homelinks/bac_est/vector/sequence/sequence.htm. Additional assemblies of each BAC pool were independently done using Newbler versions 2.3 and 2.0 (data not shown); the results of these assemblies served for comparison purposes and only in some cases helped to manually correct scaffolds in the global assembly.
Sequences of the genetic markers previously anchored to the analyzed BACs, as well as some BAC-end sequences previously available in our laboratories (GenBank Acc. Nos. can be found in Additional file 3 Table S2), were used to assign a sequence to a specific BAC. Based on this information, in some cases we could join two scaffolds that corresponded to the same BAC into a single superscaffold representing the whole BAC insert. In these cases a gap was introduced between the scaffolds so that the final sequence had the average insert size of the BAC library. The manually introduced gaps accounted for 7.25% of all the gaps in the assembly. The sizes of these gaps in nucleotides are as follows: 500 in Scaffold52B05; 1,209 in Scaffold45K01; 1,538 in Scaffold11I12; 1,831 in Scaffold54E01; 2,288 in Scaffold55F19; 2,586 in Scaffold59B11; and 12,064 in Scaffold54J04.
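The superscaffold construction described above can be sketched as follows. This is a hypothetical illustration, not the procedure actually used: the function name, the `min_gap` safeguard for scaffolds that together already exceed the target, and the toy sequences are all assumptions. The gap sizes in the dictionary are those listed in the text.

```python
def join_with_gap(left, right, target_len=139_000, min_gap=1):
    """Join two scaffolds with a run of Ns so the result reaches target_len.

    target_len defaults to the average insert size of the library (139 kb);
    min_gap is an assumed safeguard, not something stated in the text.
    """
    gap = max(target_len - len(left) - len(right), min_gap)
    return left + "N" * gap + right, gap


# Manually introduced gap sizes (bp) as listed in the text.
MANUAL_GAPS = {
    "Scaffold52B05": 500, "Scaffold45K01": 1209, "Scaffold11I12": 1538,
    "Scaffold54E01": 1831, "Scaffold55F19": 2288, "Scaffold59B11": 2586,
    "Scaffold54J04": 12064,
}

# Toy example with short sequences and a small target length.
joined, gap = join_with_gap("A" * 60, "C" * 30, target_len=100)
print(gap, len(joined))
print(sum(MANUAL_GAPS.values()), "bp of manually introduced gaps in total")
```

The listed gaps sum to 22,016 bp, which is in line with the stated 7.25% share if the stretches of Ns amount to 4.8% of a roughly 6.3 Mb assembly.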
In order to study how many of the assembled contigs and scaffolds represented the complete sequence of BACs, those sequences were searched for BAC borders in the following ways: 1) by searching at their extremes the 30 bp sequence corresponding to pECBAC1; 2) by blasting against individual reads containing the 30 bp sequence and 3) by blasting against BAC-end sequences that were already available for some of the sequenced BACs [see Additional File 3 Table S2].
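The first of these searches can be sketched as a plain string lookup at both scaffold ends. This is a simplified illustration under stated assumptions: it uses exact matching rather than BLAST, the `window` size is arbitrary, the function names are hypothetical, and `PLACEHOLDER_FLANK` is an invented 30 bp string, not the real pECBAC1 sequence.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (N is kept as N)."""
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]


def find_borders(scaffold, vector_flank, window=200):
    """Report which scaffold ends contain the vector flank within `window` bp."""
    scaffold = scaffold.upper()
    flank = vector_flank.upper()
    probes = {flank, reverse_complement(flank)}
    ends = {"left": scaffold[:window], "right": scaffold[-window:]}
    return {name: any(p in end for p in probes) for name, end in ends.items()}


# Invented 30 bp stand-in for the vector sequence flanking the cloning site.
PLACEHOLDER_FLANK = "ACGTTGCAAGCTTGGATCCTCTAGAGTCGA"

toy_scaffold = PLACEHOLDER_FLANK + "T" * 500 + "G" * 300
print(find_borders(toy_scaffold, PLACEHOLDER_FLANK))
```

A scaffold for which both ends return True would be a candidate complete BAC insert; one with a single border would correspond to the partially covered cases discussed in the Results.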
Ab initio gene prediction was performed using the command-line version of Augustus 2.3 software http://augustus.gobics.de/ using A. thaliana as plant model. The melon unigene v3 collection at ICUGI  was used to improve the Augustus prediction, setting the minimum identity parameter to 92. In some cases, the FGENESH annotation software at http://linux1.softberry.com/berry.phtml, with Arabidopsis as plant model, was used to complement or improve the Augustus annotation. The predicted coding sequences were checked against the non-redundant protein databases at NCBI using BLASTP searching for protein homologs.
tRNA genes were predicted using the tRNAscan-SE 1.21 software (http://lowelab.ucsc.edu/tRNAscan-SE/) and rRNA genes were identified with the RNAmmer 1.2 server (http://www.cbs.dtu.dk/services/RNAmmer/). Simple sequence repeats (SSRs) were searched for using the msatcommander 0.8.2 software (http://code.google.com/p/msatcommander/); the minimum repeat lengths considered were: 10 bp (mononucleotides), 12 bp (di- and trinucleotides), 16 bp (tetranucleotides), 20 bp (pentanucleotides) and 24 bp (hexanucleotides).
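To make the length thresholds concrete, the following Python sketch finds perfect SSRs that meet the minimum tract lengths listed above. It is not the msatcommander algorithm; it is a simplified regular-expression search written for illustration, and the function names are hypothetical. Motifs that are themselves repeats of a shorter unit (for example AA counted as a dinucleotide) are skipped so that each tract is reported once under its shortest unit.

```python
import re

# Minimum tract length in bp for each repeat-unit length, as listed in the text.
MIN_TRACT_BP = {1: 10, 2: 12, 3: 12, 4: 16, 5: 20, 6: 24}


def is_primitive(motif):
    """True if the motif is not a repetition of a shorter unit (e.g. 'ATAT' is not)."""
    n = len(motif)
    for k in range(1, n):
        if n % k == 0 and motif[:k] * (n // k) == motif:
            return False
    return True


def find_ssrs(seq):
    """Return (start, end, motif, repeat_count) for perfect SSRs; end is exclusive."""
    seq = seq.upper()
    hits = []
    for unit, min_bp in MIN_TRACT_BP.items():
        min_repeats = min_bp // unit
        # One captured unit followed by at least (min_repeats - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                repeats = (match.end() - match.start()) // unit
                hits.append((match.start(), match.end(), motif, repeats))
    return sorted(hits)


if __name__ == "__main__":
    example = "GC" + "A" * 10 + "CGT" + "AG" * 6 + "TTC"
    for hit in find_ssrs(example):
        print(hit)
```

Because the search only accepts perfect repeats, interrupted or compound microsatellites would be missed or split by this sketch.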
Transposons were annotated using ab initio and sequence similarity searches integrated in a pipeline based on DAWGPAWS. The programs used for de novo prediction of LTR retrotransposons were LTR_STRUC, LTR_finder and LTR_seq, which differ in the type of structures they look for, their stringency and their search algorithms. The homology-based approach consisted of searching for sequences that show a high degree of similarity to known transposons. For this, we compiled nucleotide databases of already characterized transposons obtained from the RepBase database (http://www.girinst.org/repbase/index.html) as well as NCBI. Likewise, we constructed protein sequence databases of transposases from various transposon families, searching NCBI for combinations of keywords such as "transposase" and "CACTA", "hAT", "Mariner", "Mutator" or "PIF". This approach is useful for corroborating results obtained from the de novo programs, as well as for identifying other types of transposons such as DNA transposons. The output of these various programs was converted into gff3 format and uploaded into the Apollo genome viewer and annotation tool, along with the gene annotations, for manual curation. As a first step, each scaffold was examined and putative transposons were identified according to the computational evidence. In a second step, these were manually inspected to look for LTRs or TIRs, to query NCBI to determine which family they belong to, and to resolve instances of nested or truncated elements. In a third step, these bona fide transposons were used to query the set of scaffolds in similarity searches, aiming at identifying partial or degenerated copies and defining transposon families. This third step is particularly relevant when a large amount of sequence data is available, as aligning many copies of an element helps to define its borders precisely and to find consensus sequences. At this point, with the current fraction of the genome available, we have not found enough copies of a single element to perform this part of the analysis.
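The conversion of prediction output into gff3 format can be sketched as below. This is a minimal illustration that assumes each prediction has already been parsed into a Python dictionary; the scaffold name, feature identifier and coordinates are hypothetical, and attribute values are written without the escaping of reserved characters that a full gff3 writer would need.

```python
def gff3_line(seqid, source, feature_type, start, end, strand, attributes,
              score=".", phase="."):
    """Format one gff3 feature line: nine tab-separated columns, 1-based coordinates."""
    if start < 1 or end < start:
        raise ValueError("gff3 coordinates are 1-based with start <= end")
    attr_text = ";".join(f"{key}={value}" for key, value in attributes.items())
    columns = [seqid, source, feature_type, str(start), str(end),
               str(score), strand, str(phase), attr_text]
    return "\t".join(columns)


def to_gff3(predictions):
    """Build a gff3 document from a list of prediction dictionaries."""
    lines = ["##gff-version 3"]
    for prediction in predictions:
        lines.append(gff3_line(**prediction))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical prediction record; names and coordinates are placeholders.
    predictions = [{
        "seqid": "scaffold_example",
        "source": "LTR_finder",
        "feature_type": "LTR_retrotransposon",
        "start": 1200,
        "end": 6800,
        "strand": "+",
        "attributes": {"ID": "te_1"},
    }]
    print(to_gff3(predictions), end="")
```

Recording the originating program in the source column keeps the evidence from each predictor distinguishable once the tracks are loaded together for manual curation.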
Four annotated melon scaffolds were analysed for homology with the Cucumis sativus genome assembly deposited in Phytozome v5 (http://www.phytozome.net/), using the BLASTN algorithm. The selected cucumber regions were annotated in the same way as the melon BACs. Pairs of homologous genes were tentatively selected on the basis of the gene annotation and then confirmed by performing BLASTP alignments of the corresponding predicted proteins. Syntenic regions were defined as contiguous regions containing two or more homologous genes in C. melo and C. sativus, irrespective of orientation and exact order of genes, based on the results of BLASTP comparisons. The relative syntenic quality in a region, expressed as a percentage, was calculated by dividing the sum of the conserved genes in both syntenic regions by the sum of the total number of genes in both regions, excluding transposable elements and collapsing tandem duplications.
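The syntenic quality calculation described above amounts to a simple ratio, shown here as a short Python sketch. The function name and the example counts are hypothetical; the gene counts passed in are assumed to have already had transposable elements excluded and tandem duplications collapsed.

```python
def syntenic_quality(conserved_melon, conserved_cucumber, total_melon, total_cucumber):
    """Relative syntenic quality (%) of a pair of syntenic regions."""
    total = total_melon + total_cucumber
    if total == 0:
        raise ValueError("regions contain no genes")
    return 100.0 * (conserved_melon + conserved_cucumber) / total


if __name__ == "__main__":
    # Toy counts: 6 conserved genes out of 8 in each region.
    print(syntenic_quality(6, 6, 8, 8))
```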
The C. melo BAC nucleotide sequences are available in the DDBJ/EMBL/GenBank databases under the accession numbers HM854749-HM854824. The raw data can be found in the SRA archive of the NCBI under the accession number SRA024701.1.
We gratefully acknowledge Lifesequencing S. L. for technical assistance in 454-sequencing one of the DNA pools. This project was funded by the Plan Nacional de Investigación Científica of the Spanish Ministerio de Educación y Ciencia (Projects BIO2007-61789 to PPR and AGL2006-12780-C02-01 to JGM), by the Consolider-Ingenio 2010 Programme of the Spanish Ministerio de Ciencia e Innovación (CSD2007-00036 "Center for Research in Agrigenomics"), and by the Departament d'Innovació, Universitats i Empresa de la Generalitat de Catalunya. We acknowledge the valuable technical help from Roche 454 and Roche Spain.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.