PineappleDB: An online pineapple bioinformatics resource
- Richard L Moyle†1,
- Mark L Crowe†2,
- Jonni Ripi-Koia1, 3,
- David J Fairbairn1 and
- José R Botella1Email author
© Moyle et al; licensee BioMed Central Ltd. 2005
Received: 09 June 2005
Accepted: 05 October 2005
Published: 05 October 2005
A world first pineapple EST sequencing program has been undertaken to investigate genes expressed during non-climacteric fruit ripening and the nematode-plant interaction during root infection. Very little is known of how non-climacteric fruit ripening is controlled or of the molecular basis of the nematode-plant interaction. PineappleDB was developed to provide the research community with access to a curated bioinformatics resource housing the fruit, root and nematode infected gall expressed sequences.
PineappleDB is an online, curated database providing integrated access to annotated expressed sequence tag (EST) data for cDNA clones isolated from pineapple fruit, root, and nematode infected root gall vascular cylinder tissues. The database currently houses over 5600 EST sequences, 3383 contig consensus sequences, and associated bioinformatic data including splice variants, Arabidopsis homologues, both MIPS based and Gene Ontology functional classifications, and clone distributions. The online resource can be searched by text or by BLAST sequence homology. The data outputs provide comprehensive sequence, bioinformatic and functional classification information.
The online pineapple bioinformatic resource provides the research community with access to pineapple fruit and root/gall sequence and bioinformatic data in a user-friendly format. The search tools enable efficient data mining and present a wide spectrum of bioinformatic and functional classification information. PineappleDB will be of broad appeal to researchers investigating pineapple genetics, non-climacteric fruit ripening, root-knot nematode infection, crassulacean acid metabolism and alternative RNA splicing in plants.
In terms of commercial production, pineapple [Ananas comosus (L.) Merrill] is the third most important tropical fruit after banana and mango. Pineapple fruits are classified as non-climacteric, as there is no respiratory burst or spike in ethylene production during ripening and exogenous application of ethylene does not rapidly accelerate fruit ripening. Much has been learnt about the control of fruit ripening in climacteric fruit using tomato as a model system. In particular, manipulation of genes involved in the ethylene biosynthetic pathway and a MADS box transcription factor have led to altered ripening characteristics [1–5]. Conversely, almost nothing is known of how non-climacteric fruit ripening is controlled. Efforts to identify genes controlling non-climacteric fruit ripening are hampered by the small number of non-climacteric fruit gene sequences available for study. Thus, as a first step toward understanding the molecular basis of non-climacteric fruit ripening in pineapple, an EST sequence project has been initiated to isolate expressed sequences from mature green unripe and yellow ripened pineapple fruits .
Many crop species, including pineapple, are susceptible to root-knot nematode infection. Crop losses due to nematode infections are estimated to be more than 100 billion dollars each year . Additionally, the toxic soil fumigants used to control nematodes are becoming increasingly banned in many countries. Understanding the molecular mechanisms governing the nematode-plant interaction is of utmost importance in developing alternative strategies for the control of nematode infection. As such, we have constructed EST sequencing libraries from pineapple root and gall vascular cylinder tissue infected with the root-knot nematode Meloidogyne javanica. The vascular cylinder contains the giant cell structures that the nematode feeds upon, and can be dissected from the root cortex and stripped of nematodes with relative ease. Sequencing EST clones from such libraries is a first step toward isolating and identifying gene sequences involved in the plant-nematode interaction.
The collection of EST sequence information requires accurate gene annotation as well as dedicated platforms for storage, processing, curation and data retrieval. Ideally, collected sequence information should be easily accessed, and presented in a user-friendly format that provides the tools to mine the data efficiently. PineappleDB was developed to provide the research community with access to a curated and searchable bioinformatics resource housing fruit, root and gall derived EST sequences, contig sequences, clone annotations, functional classification and Gene Ontology information – all via a user-friendly web interface. PineappleDB will be a valuable resource of broad appeal to researchers studying pineapple genetics, crassulacean acid metabolism, non-climacteric fruit ripening, alternative RNA splicing in plants and root-knot nematode infection.
Construction and content
The database was developed using MySQL 4.0, and implemented on a server running RedHat 9.0. The web interface uses cgi scripts written in Perl 5.8.1. Perl scripts were also used for data processing and uploading into the database. Field descriptions and a full database schema are provided on the help page of the website.
PineappleDB web interface
The pineapple bioinformatic resource can be accessed through a web interface . The introductory page contains information about the pineapple bioinformatic resource, access to the pineapple EST database, lists of contigs containing full-length coding sequences, alternatively spliced clones, putative nematode sequences, and links to pineapple related web pages.
PineappleDB will be periodically upgraded as annotation, functional classification, and GO information are updated.
EST sequencing and bioinformatic analysis pipeline
Users may refer to the pineappleDB flow diagram on the homepage of the website for an overview of the bioinformatics pipeline. In total, 7296 clones from five libraries were 5' end sequenced. The libraries were constructed from uninfected root tips (~2 cm), dissected vascular cylinders of galls from early infection (1–4 weeks post infection), dissected vascular cylinders of galls from late infection (5–10 weeks post infection), mature green fruit and mature yellow fruit. Over 75% of clones returned an average Phred20 score of more than 700 bp. Raw sequences were manually edited for sequence quality and trimmed of plasmid contaminant and polyA tail in the sequence viewer program Chromas v2.13 (Technelysium). 5861 edited sequences were retrieved at an average read length of 769 bp. The 1615 clones with poor sequence quality and/or yielding less than 150 bp of insert sequence were eliminated from further bioinformatic analysis.
The 408 green fruit, 1140 yellow fruit, 343 root tip, 1298 early infection and 2461 late infection edited EST sequences were clustered into 3383 contigs, using SeqMan sequence assembly software (DNASTAR Inc. Madison, USA) and key parameters of minimum 90% match over at least 45 bp overlap. All edited EST sequences have been submitted to the GenBank dbEST [GenBank: CO730741-CO732287, DT335767-DT339792] . Each sequence was assigned a putative identification by BLASTX alignment of contig consensus sequences to the GenBank non-redundant (nr) protein database . Those clones that did not retrieve a BLASTX match better than the 10-20 E-value cut-off were annotated as an undiscovered sequence. The sequences were also BLASTX searched against MATDB to retrieve putative MIPS based functional classifications for clones with a homologous gene from the model plant organism Arabidopsis thaliana . GO classifications were obtained by downloading GO results from a multiple search of the TAIR Arabidopsis resource . A semi-automated process of parsing BLAST hits and manually curating the putative annotations resulted in a spreadsheet of information including cloneID, contig number, number of clones in each contig, nearest BLASTX match, accession number of match, length of match, percent similarity, putative annotation and functional classifications.
Identification of full length clones
Contig consensus sequences containing a polyA tail were analyzed for open-reading frames using EditSeq sequence analysis software (DNASTAR Inc. Madison, USA) and were BLASTX searched against the GenBank nr protein database . Contig consensus sequences were identified as containing a putative full length coding sequence by alignment to known full-length protein coding sequences, and/or by the presence of stop codons upstream of a significant open reading frame. A list of the putative full length coding sequences identified is present in the pineapple bioinformatic resource.
Identification of splice variants
Edited clone sequences generally assembled into contigs with 97–100% homology. However, the contig assembly report occasionally revealed incidences where some clones clustered with between 90–97% homology or that some clones did not cluster into existing contigs due to homology somewhat below the 90% threshold. An inspection of these clone sequences and contigs revealed the presence of apparently unspliced intron sequence, and/or the absence of exon sequence in some of the clones. A comparative analysis to other clone sequences within the contig alignment and to homologous protein coding sequences in GenBank verified that 120 clones contain an apparent "mis-splicing" event. The putative splice variant clones containing un-spliced intron sequence and/or missing spliced exon sequence are listed in the pineapple bioinformatic resource. The presence or absence of a putative splice variant is also reported in contig/clone search outputs.
Identification of putative nematode sequences
Despite precautions to remove nematodes from root tissues prior to library construction, it was anticipated that there would be some contamination of the pineapple gall libraries with nematode derived sequences. All contig consensus sequences containing root and gall EST's were BLASTN searched against the GenBank dbEST and BLASTX searched against the GenBank nr database. Matches to known nematode sequences were manually inspected and 77 contigs identified as containing a putative nematode sequence. All contigs containing putative nematode sequence are listed within the online pineapple bioinformatic resource.
The pineapple EST sequencing project was initiated as a first step toward identifying genes involved in and the molecular basis of non-climacteric fruit ripening and the nematode-plant interaction. The online pineapple bioinformatics resource was developed to house EST sequence information and associated bioinformatic data in a user-friendly format. PineappleDB can be freely accessed via the internet, and currently contains BLAST and text search tools to efficiently mine the dataset for clones and contigs of interest. The resulting search outputs contain comprehensive information on the clone and contigs including cloneID, contig number, number of clones in each contig, nearest BLASTX match, accession number of match, length of match, percent similarity, putative annotation, splice variants, MIPS based functional classifications, Gene Ontology classifications, and the distribution of clones from each library (fig. 1). Links are also provided to other clones within the same contig, the GenBank BLASTX nearest neighbour, and to homologous coding sequences from the model organism Arabidopsis thaliana.
PineappleDB houses the first reported collection of EST sequences isolated from pineapple. PineappleDB will grow as more EST sequence information becomes available. Furthermore, we have initiated a pineapple microarray project and it is anticipated that gene expression data will be incorporated into the online pineapple bioinformatics resource in the future. The EST database will periodically be upgraded as annotation, functional classification, and gene ontology information is updated.
Availability and requirements
The PineappleDB resource can be accessed via http://www.pgel.com.au
Contact: Dr. José R Botella at email@example.com
The pineapple bioinformatic resource was funded in part by an ARC-Linkage grant with Golden Circle Limited. M.L.C. was funded by the Australian Research Council Special Research Centre for Functional and Applied Genomics.
- Ben Amor M, Flores B, Latchs A, Bouyazen M, Pech JC: Inhibition of ethylene biosynthesis by antisense ACC oxidase RNA prevents chilling injury in Charentais cantaloupe melons. Plant Cell Environ. 1999, 22: 1579-1586. 10.1046/j.1365-3040.1999.00509.x.View ArticleGoogle Scholar
- Hadfield KA, Dang T, Guis M, Pech JC, Bouzayen M, Bennett AB: Characterization of ripening-regulated cDNAs and their expression in ethylene-suppressed charentais melon fruit. Plant Physiol. 2000, 122: 977-983. 10.1104/pp.122.3.977.PubMedPubMed CentralView ArticleGoogle Scholar
- Theologis A, Oeller PW, Wong LM, Rottmann WH, Gantz DM: Use of a tomato mutant constructed with reverse genetics to study fruit ripening, a complex developmental process. Dev Genet. 1993, 14: 282-295. 10.1002/dvg.1020140406.PubMedView ArticleGoogle Scholar
- Vrebalov J, Ruezinsky D, Padmanabhan V, White R, Medrano D, Drake R, Schuch W, Giovannoni J: A MADS-box gene necessary for fruit ripening at the tomato ripening-inhibitor (rin) locus. Science. 2002, 296: 343-346. 10.1126/science.1068181.PubMedView ArticleGoogle Scholar
- Adams-Phillips L, Barry C, Giovannoni J: Signal transduction systems regulating fruit ripening. Trends Plant Sci. 2004, 9: 331-338. 10.1016/j.tplants.2004.05.004.PubMedView ArticleGoogle Scholar
- Moyle R, Fairbairn DJ, Ripi J, Crowe M, Botella JR: Developing pineapple fruit has a small transcriptome dominated by metallothionein. J Exp Bot. 2005, 56: 101-112.PubMedGoogle Scholar
- Abad P, Favery B, Rosso MN, Castagnone-Sereno P: Root-knot nematode parasitism and host response: molecular basis of a sophisticated interaction. Mol Plant Path. 2003, 4: 217-224. 10.1046/j.1364-3703.2003.00170.x.View ArticleGoogle Scholar
- PineappleDB: The Online Pineapple Bioinformatics Resource. [http://www.pgel.com.au].
- Boguski MS, Lowe TM, Tolstoshev CM: dbEST--database for "expressed sequence tags". Nat Genet. 1993, 4: 332-333. 10.1038/ng0893-332.PubMedView ArticleGoogle Scholar
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410. 10.1006/jmbi.1990.9999.PubMedView ArticleGoogle Scholar
- Schoof H, Ernst R, Nazarov V, Pfeifer L, Mewes HW, Mayer KF: MIPS Arabidopsis thaliana Database (MAtDB): an integrated biological knowledge resource for plant genomics. Nucleic Acids Res. 2004, 32: D373-6. 10.1093/nar/gkh068.PubMedPubMed CentralView ArticleGoogle Scholar
- The Arabidopsis Information Resource. [http://www.arabidopsis.org].
- National Center for Biotechnology Information. . [http://www.ncbi.nih.gov].
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.