PMRD: a curated database for genes and mutants involved in plant male reproduction
BMC Plant Biology volume 12, Article number: 215 (2012)
Male reproduction is an essential biological event in the plant life cycle separating the diploid sporophyte and haploid gametophyte generations, which involves expression of approximately 20,000 genes. The control of male reproduction is also of economic importance for plant breeding and hybrid seed production. With the advent of forward and reverse genetics and genomic technologies, a large number of male reproduction-related genes have been identified. Thus it is extremely challenging for individual researchers to systematically collect, and continually update, all the available information on genes and mutants related to plant male reproduction. The aim of this study is to manually curate such gene and mutant information and provide a web-accessible resource to facilitate the effective study of plant male reproduction.
Plant Male Reproduction Database (PMRD) is a comprehensive resource for browsing and retrieving knowledge on genes and mutants related to plant male reproduction. It is based upon literature and biological databases and includes 506 male sterile genes and 484 mutants with defects of male reproduction from a variety of plant species. Based on Gene Ontology (GO) annotations and literature, information relating to a further 3697 male reproduction related genes was systematically collected and included, and gene expression and phenotypic information was captured from the literature by text curation. PMRD provides a web interface which allows users to easily access the curated annotations and genomic information, including full names, symbols, locations, sequences, expression patterns, functions of genes, mutant phenotypes, male sterile categories, and corresponding publications. PMRD also provides mini tools to search and browse expression patterns of genes in microarray datasets, run BLAST searches, convert gene IDs and generate gene networks. In addition, a MediaWiki engine and a forum have been integrated within the database, allowing users to share their knowledge, make comments and discuss topics.
PMRD provides an integrated link between genetic studies and the rapidly growing genomic information. As such this database provides a global view of plant male reproduction and thus aids advances in this important area.
Male reproduction is a complex and highly coordinated biological process that includes the development of the male reproductive organ, the stamen, which contains the microspores/pollen, as well as subsequent pollen release, pollination, pollen tube growth, guidance, reception, gamete migration and finally fertilization [1–6]. The stamen comprises an anther with multiple specialized cells/tissues for the production of viable pollen and a filament that supports the anther. Microspore/pollen development requires meiotic and subsequent mitotic divisions, and numerous cooperative functional interactions between the gametophytic and sporophytic tissues within the anther. Pollen development needs precise spatiotemporal expression of genes, orchestrated activity and localized control of enzymes, cell-to-cell communication, cell development and differentiation [2, 6]. Furthermore, disruption of gene expression by environmental effects, or genetic mutations, frequently results in reduced fertility, or complete male sterility, causing loss of agricultural yield. Control of plant fertility is also of economic importance, with some male sterile lines used in agriculture for crop improvement, for example in breeding of super hybrid rice.
Due to the importance of male reproduction, much effort has been applied to understanding the molecular regulation of plant male reproduction. Transcriptome analysis has indicated that more than 20,000 genes are expressed in developing anthers of rice (Oryza sativa) and about 18,000 in Arabidopsis (Arabidopsis thaliana) pollen [8–10], suggesting extensive gene expression changes during anther development and pollen formation. Furthermore, recent forward and reverse genetic studies have identified a large number of male sterile mutants and related genes [1, 12, 13]. However, it is time-consuming and inefficient for individual researchers to access accurate information on male reproduction in plants. This is particularly relevant in the context of comparative analysis between species.
To bridge the gap between genetic studies and genomic information in plant male reproduction, we systematically collected male sterile mutant and gene information by manual curation, and created the Plant Male Reproduction Database (PMRD). This database provides a bi-directional integration of the rapidly growing genomic data and knowledge from genetic studies, which should improve our understanding of the mechanisms of plant male reproduction. PMRD functions not only as a high quality curated database for browsing and retrieving knowledge on genes and mutants in plant male reproduction, but also as a dynamic website with built-in bioinformatics tools to access genomic information. Moreover, PMRD is designed with knowledge sharing features that include wiki and forum tools to facilitate community annotation, information sharing and education.
Construction and content
Collection of plant male reproduction related genes
Genes included in PMRD are divided into two categories: male sterile genes (MS genes) and male reproduction related genes (MR genes). The difference between the two is that the function of MS genes has been demonstrated by analyzing mutants showing reduced male fertility or transmission efficiency, whereas MR genes have a putative function in male reproduction without genetic evidence. MS genes were identified from literature and biological database searches. MR genes were identified based upon GO annotations, the phenotypes of TAIR germplasms and expression information in the literature. In order to establish a repository of literature for manual curation, we extensively collected publications on genetic and molecular studies of plant male reproduction through PubMed and journal-specific database searches. A total of 370 full-text publications were retrieved, including 143 papers for rice (Oryza sativa), 187 papers for Arabidopsis thaliana and 40 papers for a further 31 plant species. From this local repository of literature 343 MS genes and 321 MS mutants were identified. Next, we collected 163 MS mutants from two rice databases: Oryzabase and China Rice Data Center [15, 16]. To identify MR genes, we collected 41 GO terms associated with plant male reproduction from the GO Consortium. Subsequently we mapped the 41 GO terms onto annotations from the RAP-DB, TAIR and PLAZA websites [17–19] (See Additional file 1). Together with the MR genes identified in the literature, this gave 3697 MR genes. When combined with the MS genes, we have therefore identified 4203 genes and 484 mutants in 33 species that are implicated in plant male reproduction (Table 1).
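The GO-based selection of MR genes can be pictured as a filter over gene-to-GO annotations. The following is a minimal sketch under stated assumptions, not the pipeline used to build PMRD: the three GO identifiers merely stand in for the 41 curated terms, the gene identifiers and the function name `select_mr_candidates` are hypothetical, and excluding genes already classed as MS genes is an assumption based on MR genes being counted separately from MS genes.

```python
# Illustrative subset standing in for the 41 curated GO terms (assumption).
MALE_REPRODUCTION_GO_TERMS = {
    "GO:0009555",  # pollen development
    "GO:0048653",  # anther development
    "GO:0009860",  # pollen tube growth
}


def select_mr_candidates(annotations, go_terms, ms_genes):
    """Return genes carrying at least one target GO term that are not MS genes."""
    candidates = set()
    for gene_id, terms in annotations.items():
        has_target_term = bool(go_terms & set(terms))
        if has_target_term and gene_id not in ms_genes:
            candidates.add(gene_id)
    return candidates


# Hypothetical gene-to-GO annotations; real input would come from
# annotation files of resources such as RAP-DB, TAIR or PLAZA.
annotations = {
    "GENE_A": ["GO:0009555", "GO:0005634"],
    "GENE_B": ["GO:0006412"],
    "GENE_C": ["GO:0048653"],
    "GENE_D": ["GO:0009860"],
}
ms_genes = {"GENE_C"}  # hypothetical gene with mutant (genetic) evidence

mr_candidates = select_mr_candidates(
    annotations, MALE_REPRODUCTION_GO_TERMS, ms_genes
)
print(sorted(mr_candidates))

# Consistency check of the totals reported in the text.
assert 506 + 3697 == 4203   # MS genes + MR genes
assert 321 + 163 == 484     # literature mutants + rice database mutants
```

Here `GENE_B` is dropped because it has no male reproduction term, and `GENE_C` is dropped because it is already an MS gene.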
Data entry and curation
Curation of information from publications into a well-structured searchable repository of knowledge is a critical step in biological database construction. This included manual review of papers, identification of biological entities, definition of the experimental methods used, conversion of experimental results and phenotypic observations into a standard format, and summarizing gene function data. In the PMRD curation process, papers were first examined to determine whether the genes they described were appropriate for inclusion as MS/MR genes in PMRD. The criterion for inclusion as an MS gene was that mutation of the gene must cause defects in male reproduction. Once a gene was identified, its full name, gene symbol and a brief description were obtained. Information on the expression pattern of the gene product and on its molecular and biological function was then collated. Genes in rice and Arabidopsis were mapped onto RAP-DB and TAIR loci, respectively, and included in PMRD. For other species, gene names mentioned in the papers were used. Gene expression assays in both rice and Arabidopsis were curated in detail using controlled anatomy and stage vocabularies. If the papers included genetic or transgenic studies of mutants, the curators captured the following information: mutant names, mutated genes, mutagenesis methods, dominance, mutant phenotypes and male sterility categories. All curated information was checked and confirmed by senior experts in this field.
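The mutant fields captured during curation lend themselves to a structured record that is checked against controlled vocabularies. The sketch below is illustrative only: the class name `MutantRecord`, the vocabulary values and the example mutant are hypothetical and do not reflect the actual PMRD schema.

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabularies (assumptions, not the PMRD lists).
MUTAGENESIS_METHODS = {"T-DNA insertion", "EMS", "radiation", "transposon"}
DOMINANCE_VALUES = {"dominant", "recessive", "unknown"}


@dataclass
class MutantRecord:
    """One curated mutant entry with the fields named in the text."""
    mutant_name: str
    mutated_gene: str
    mutagenesis_method: str
    dominance: str
    phenotype: str
    # A mutant may carry several male sterility categories.
    male_sterile_categories: list = field(default_factory=list)

    def validate(self):
        """Return a list of problems; an empty list means the record passes."""
        problems = []
        if self.mutagenesis_method not in MUTAGENESIS_METHODS:
            problems.append(f"unknown mutagenesis method: {self.mutagenesis_method}")
        if self.dominance not in DOMINANCE_VALUES:
            problems.append(f"unknown dominance value: {self.dominance}")
        if not self.phenotype.strip():
            problems.append("phenotype description is empty")
        return problems


# Hypothetical example entry.
record = MutantRecord(
    mutant_name="example-1",
    mutated_gene="GENE_A",
    mutagenesis_method="T-DNA insertion",
    dominance="recessive",
    phenotype="No viable pollen at anthesis.",
    male_sterile_categories=["pollen-free"],
)
print(record.validate())
```

Returning a list of problems rather than raising on the first one lets a second curator see every issue in a record at once, which suits a review step like the expert check described above.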
Database design and implementation
PMRD functions as a database system that brings together three main sources of knowledge: 1) general genomic information from public databases; 2) detailed curation of genetic studies from the literature; 3) public annotation from the research community (Figure 1). In the genomic annotation section, the chromosomal location, sequence, GO terms, KEGG pathway information and InterPro annotations are displayed [14, 15, 17, 18, 20, 21]. Plant male reproduction-associated microarray datasets were downloaded from GEO. To provide detailed anatomical information on mutant phenotype and gene expression, we first designed tags and a controlled vocabulary (CV), which were then used to normalize the information during the curation process. The controlled vocabulary for developmental stages and anatomy was set according to publicly accepted standards [1, 23]. The curated information in PMRD includes: summaries of gene function, gene expression patterns, mutant background, mutagenesis methods, descriptions of mutant phenotypes and male sterile type definitions. Genes for anther development and pollen formation were collated and the information organized in a two-dimensional module displayed on a webpage, which associates genes and mutants with stages and tissues, allowing multiple ways to browse genes and mutants of interest. For other male reproduction processes we capitalized upon community annotation, and created four online data collection tables: “Pollination”, “Pollen Germination and Tube Growth”, “Guidance and Perception”, and “Migration and Fusion”. We also integrated the MediaWiki engine into PMRD, allowing users to contribute their knowledge on mutants, development stages and anatomy, and to create other topics of interest. Finally, a forum was set up to facilitate discussions.
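The two-dimensional module that associates genes and mutants with stages and tissues can be thought of as an index keyed by (stage, tissue) pairs. The following is a small sketch of that idea rather than PMRD's implementation: the stage and tissue labels, the entry names and the functions `build_stage_tissue_index` and `browse` are all hypothetical.

```python
from collections import defaultdict


def build_stage_tissue_index(entries):
    """Group entry names by (stage, tissue) so either axis can be browsed."""
    index = defaultdict(set)
    for name, stage, tissue in entries:
        index[(stage, tissue)].add(name)
    return index


def browse(index, stage=None, tissue=None):
    """Return sorted entry names matching a stage, a tissue, or both."""
    hits = set()
    for (entry_stage, entry_tissue), names in index.items():
        stage_ok = stage is None or entry_stage == stage
        tissue_ok = tissue is None or entry_tissue == tissue
        if stage_ok and tissue_ok:
            hits |= names
    return sorted(hits)


# Hypothetical curated associations: (gene or mutant, stage, tissue).
entries = [
    ("GENE_A", "meiosis", "tapetum"),
    ("GENE_B", "meiosis", "microsporocyte"),
    ("GENE_C", "young microspore", "tapetum"),
    ("mutant-1", "young microspore", "microspore"),
]

index = build_stage_tissue_index(entries)
print(browse(index, stage="meiosis"))
print(browse(index, tissue="tapetum"))
print(browse(index, stage="meiosis", tissue="tapetum"))
```

Leaving either argument unset selects a whole row or column of the grid, which corresponds to the "multiple ways to browse" mentioned in the text.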
Utility and discussion
Database web interface
We developed a user-friendly web interface for searching and browsing information in PMRD. Users can easily search genes by names, identifiers, sequences, expression, phenotypes and male sterile categories of related mutants. Because the data structure differs between species, web pages for searching and browsing are grouped into rice, Arabidopsis and other species in the main PMRD website menu. To make information retrieval convenient and precise, search pages are designed to include both simple and advanced options. The web page displaying information on MS genes contains six sections (Figure 2). The first section displays “Basic Information” of the gene, including gene symbols, gene names, descriptions of genes from external databases and function as curated by PMRD staff. The second section contains “Genomic Information”, such as locations, gene structures and sequences. The third section displays “General Annotation” retrieved from external databases, including GO terms, KEGG pathway information and InterPro protein signatures [14, 20, 21]. The fourth section displays the “Expression Pattern” of the gene. Expression information for rice and Arabidopsis was obtained from literature curation and TAIR annotation. The fifth section summarizes “Mutant” information of the gene, including mutants, phenotypes, and male sterile categories. Male sterile categories indicate the pollen abortion type, and were set according to the Plant Ontology (PO) and the Rice Knowledge Bank [27, 28]. A mutant can be assigned to more than one category. If detailed male sterile information for the mutant could not be obtained from data sources, it was assigned as “not defined”. The sixth section shows the “Publications” related to the gene. Web pages displaying mutant information are organized into five sections (Figure 3). The first section, basic information, includes mutagenesis method, dominance, background and a short description of the mutant. The second section includes information and links for the mutated genes. The third section displays curation of mutant phenotype observations. The last two sections display male sterility information and related publications. On very long pages, the user can collapse or expand the panel for each section. Because the data sources are heterogeneous, not all entries contain complete data for every section described above.
PMRD also provides a variety of tools for information retrieval and display. A “Browse” page was created (Figure 4) as a hub page to integrate information on genes, mutants, expression and phenotypes into a single interface according to stages and tissues during different male reproduction processes. The Ajax technique was employed to navigate through stages and tissues without refreshing the page. To enable more intuitive and informative multiple-keyword searches, we developed a tool to visualize keyword-gene relationships using CanvasXpress and CytoscapeWeb (Figure 5) [24, 29]. This draws a connection between an MS gene and a keyword if the keyword appears in the data entries related to the gene, including gene description, expression, related mutant phenotypes and GO annotations. For microarray data, the user can browse and search expression information on the microarray visualization page and microarray search page (Figure 6). We also provide a BLAST search tool for rice and Arabidopsis to help users find genes by sequence, and an ID converter for different databases [18, 30–33]. Finally, a wiki page with an easy-to-use editor plug-in has been set up to promote community contributions; we encourage users to contribute their knowledge on the wiki page and to recommend literature to us.
Comparative functional genomics is an emerging approach that relies upon applying the vast accumulated knowledge available for model species to less characterized species. Recently, a number of comparative or functional genomics websites for plants have been developed, such as PLAZA, Phytozome, the Floral Genome Project, MoccaDB, SolRgene and BRAD [19, 34–38]. As more plant genome sequences become available, it will be interesting to extend and apply the current knowledge in PMRD for comparative studies. Future versions of PMRD will provide cross-species tools for comparing and mining male reproduction related genes. There is also an urgent need for automatic literature curation, since manual text curation is challenging for annotators and requires considerable expertise and dedication. A number of gateway databases for model species have adopted text-mining tools. Mouse Genome Informatics has introduced a dictionary-based text-mining tool to help biocuration. FlyBase has developed natural language processing and automatic experimental information categorization tools to aid curation [40, 41]. At the moment the data sources of PMRD are mostly literature from genetic and molecular studies. In such papers, information is often organized into discernible sections, such as initial characterization of a gene, gene expression assays, and morphological phenotype observations. Two text-mining tools are currently available for Arabidopsis [42, 43]; it is hoped that such text-processing software will be used in future updates and maintenance of the database.
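The rule behind the keyword-gene network (an edge is drawn when a keyword appears in any data entry of a gene) can be sketched in a few lines. This is an illustration only: the function name `keyword_gene_edges`, the field names and the example entries are hypothetical, and case-insensitive substring matching is a simplifying assumption about how keywords are matched.

```python
def keyword_gene_edges(gene_entries, keywords):
    """Return sorted (gene, keyword) pairs where the keyword occurs in any field."""
    edges = []
    for gene, fields in gene_entries.items():
        # Combine description, expression, phenotype and GO text for matching.
        combined_text = " ".join(fields.values()).lower()
        for keyword in keywords:
            if keyword.lower() in combined_text:
                edges.append((gene, keyword))
    return sorted(edges)


# Hypothetical data entries for two MS genes.
gene_entries = {
    "GENE_A": {
        "description": "transcription factor required for tapetum development",
        "expression": "tapetum, meiosis stage",
        "phenotype": "no pollen",
        "go": "pollen development",
    },
    "GENE_B": {
        "description": "callose synthase",
        "expression": "microspore",
        "phenotype": "defective pollen wall",
        "go": "pollen development",
    },
}

edges = keyword_gene_edges(gene_entries, ["tapetum", "pollen wall"])
for gene, keyword in edges:
    print(f"{gene} -- {keyword}")
```

The resulting edge list is the kind of structure a network viewer would take as input; genes sharing a keyword end up linked through that keyword node.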
Finally, plant male reproduction covers a wide range of biological processes and the improvement of PMRD requires continuous effort and community contributions. The first version of PMRD is based on data collected mainly from anther and pollen development. For future updates, we have opened online data collection tables to extend the detailed coverage of related topics.
Plant Male Reproduction Database (PMRD) is a comprehensive resource for browsing and retrieving knowledge about genes and mutants related to plant male reproduction. Currently, PMRD holds information for 4203 genes and 484 mutants associated with plant male reproduction across 33 plant species. The two major model plant species, rice and Arabidopsis, have the greatest number of entries and most detailed curation. The ultimate goal of the database is to extend this further to provide a dynamic and comprehensive information resource with associated data mining tools to aid research in plant male reproduction.
Availability and requirements
The PMRD database is freely accessible at .
Abbreviations
- GEO: Gene Expression Omnibus
- KEGG: Kyoto Encyclopedia of Genes and Genomes
- MR gene: Male reproduction related gene
- MS gene: Male sterile gene
- RAP-DB: The Rice Annotation Project Database
- TAIR: The Arabidopsis Information Resource
Zhang D, Luo X, Zhu L: Cytological analysis and genetic control of rice anther development. J Genet Genomics. 2011, 38 (9): 379-390. 10.1016/j.jgg.2011.08.001.
Zhang D, Wilson ZA: Stamen specification and anther development in rice. Chin Sci Bull. 2009, 54 (14): 2342-2353. 10.1007/s11434-009-0348-3.
Chang F, Wang Y, Wang S, Ma H: Molecular control of microsporogenesis in Arabidopsis. Curr Opin Plant Biol. 2011, 14 (1): 66-73. 10.1016/j.pbi.2010.11.001.
Berger F, Hamamura Y, Ingouff M, Higashiyama T: Double fertilization - caught in the act. Trends Plant Sci. 2008, 13 (8): 437-443. 10.1016/j.tplants.2008.05.011.
Cheung AY, Wu HM: Structural and functional compartmentalization in pollen tubes. J Exp Bot. 2007, 58 (1): 75-82.
Twell D: A blossoming romance: gamete interactions in flowering plants. Nat Cell Biol. 2006, 8 (1): 14-16. 10.1038/ncb0106-14.
Cheng SH, Zhuang JY, Fan YY, Du JH, Cao LY: Progress in research and development on hybrid rice: a super-domesticate in China. Ann Bot. 2007, 100 (5): 959-966. 10.1093/aob/mcm121.
Fujita M, Horiuchi Y, Ueda Y, Mizuta Y, Kubo T, Yano K, Yamaki S, Tsuda K, Nagata T, Niihama M, et al: Rice expression atlas in reproductive development. Plant Cell Physiol. 2010, 51 (12): 2060-2081. 10.1093/pcp/pcq165.
Pina C, Pinto F, Feijo JA, Becker JD: Gene family analysis of the Arabidopsis pollen transcriptome reveals biological implications for cell growth, division control, and gene expression regulation. Plant Physiol. 2005, 138 (2): 744-756. 10.1104/pp.104.057935.
Deveshwar P, Bovill WD, Sharma R, Able JA, Kapoor S: Analysis of anther transcriptomes to identify genes contributing to meiosis and male gametophyte development in rice. BMC Plant Biol. 2011, 11: 78. 10.1186/1471-2229-11-78.
Chen C, Farmer AD, Langley RJ, Mudge J, Crow JA, May GD, Huntley J, Smith AG, Retzel EF: Meiosis-specific gene discovery in plants: RNA-Seq applied to isolated Arabidopsis male meiocytes. BMC Plant Biol. 2010, 10: 280. 10.1186/1471-2229-10-280.
Suzuki G: Recent progress in plant reproduction research: the story of the male gametophyte through to successful fertilization. Plant Cell Physiol. 2009, 50 (11): 1857-1864. 10.1093/pcp/pcp142.
Ma X, Feng B, Ma H: AMS-dependent and independent regulation of anther transcriptome and comparison with those affected by other Arabidopsis anther genes. BMC Plant Biol. 2012, 12 (1): 23. 10.1186/1471-2229-12-23.
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25 (1): 25-29. 10.1038/75556.
Kurata N, Yamazaki Y: Oryzabase. an integrated biological and genome information database for rice. Plant Physiol. 2006, 140 (1): 12-17.
China Rice Data Center: [http://www.ricedata.cn].
Ohyanagi H, Tanaka T, Sakai H, Shigemoto Y, Yamaguchi K, Habara T, Fujii Y, Antonio BA, Nagamura Y, Imanishi T, et al: The Rice Annotation Project Database (RAP-DB): hub for Oryza sativa ssp. japonica genome information. Nucleic Acids Res. 2006, 34: 741-744. 10.1093/nar/gkj094.
Swarbreck D, Wilks C, Lamesch P, Berardini TZ, Garcia-Hernandez M, Foerster H, Li D, Meyer T, Muller R, Ploetz L, et al: The Arabidopsis Information Resource (TAIR): gene structure and function annotation. Nucleic Acids Res. 2008, 36: D1009-D1014.
Van Bel M, Proost S, Wischnitzki E, Movahedi S, Scheerlinck C, Van de Peer Y, Vandepoele K: Dissecting plant genomes with the PLAZA comparative genomics platform. Plant Physiol. 2012, 158 (2): 590-600. 10.1104/pp.111.189514.
Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das U, Daugherty L, Duquenne L, et al: InterPro: the integrative protein signature database. Nucleic Acids Res. 2009, 37: 211-215. 10.1093/nar/gkn785.
Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M: KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 1999, 27 (1): 29-34. 10.1093/nar/27.1.29.
Barrett T, Troup DB, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, et al: NCBI GEO: archive for functional genomics data sets--10 years on. Nucleic Acids Res. 2011, 39: 1005-1010. 10.1093/nar/gkq1184.
Sanders PM, Bui AQ, Weterings K, McIntire KN, Hsu YC, Lee PY, Truong MT, Beals TP, Goldberg RB: Anther developmental defects in Arabidopsis thaliana male-sterile mutants. Sex Plant Reprod. 1999, 11 (6): 297-322. 10.1007/s004970050158.
YUI Library: [http://developer.yahoo.com/yui/].
The Plant Ontology Consortium and plant ontologies. Comp Funct Genomics. 2002, 3 (2): 137-142. 10.1002/cfg.154.
Rice Knowledge Bank: [http://www.knowledgebank.irri.org/rice.htm].
Lopes CT, Franz M, Kazi F, Donaldson SL, Morris Q, Bader GD: Cytoscape Web: an interactive web-based network browser. Bioinformatics. 2010, 26 (18): 2347-2348. 10.1093/bioinformatics/btq430.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215 (3): 403-410.
Ouyang S, Zhu W, Hamilton J, Lin H, Campbell M, Childs K, Thibaud-Nissen F, Malek RL, Lee Y, Zheng L, et al: The TIGR Rice Genome Annotation Resource: improvements and new features. Nucleic Acids Res. 2007, 35: 883-887. 10.1093/nar/gkl976.
Pruitt KD, Tatusova T, Brown GR, Maglott DR: NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res. 2012, 40: 135. 10.1093/nar/gks395.
Boutet E, Lieberherr D, Tognolli M, Schneider M, Bairoch A: UniProtKB/Swiss-Prot. Methods Mol Biol. 2007, 406: 89-112.
Goodstein DM, Shu S, Howson R, Neupane R, Hayes RD, Fazo J, Mitros T, Dirks W, Hellsten U, Putnam N, et al: Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res. 2012, 40: 1178-1186. 10.1093/nar/gkr944.
Albert VA, Soltis DE, Carlson JE, Farmerie WG, Wall PK, Ilut DC, Solow TM, Mueller LA, Landherr LL, Hu Y, et al: Floral gene resources from basal angiosperms for comparative genomics research. BMC Plant Biol. 2005, 5: 5. 10.1186/1471-2229-5-5.
Plechakova O, Tranchant-Dubreuil C, Benedet F, Couderc M, Tinaut A, Viader V, De Block P, Hamon P, Campa C, de Kochko A, et al: MoccaDB - an integrative database for functional, comparative and diversity studies in the Rubiaceae family. BMC Plant Biol. 2009, 9: 123. 10.1186/1471-2229-9-123.
Vleeshouwers VG, Finkers R, Budding D, Visser M, Jacobs MM, van Berloo R, Pel M, Champouret N, Bakker E, Krenek P, et al: SolRgene: an online database to explore disease resistance genes in tuber-bearing Solanum species. BMC Plant Biol. 2011, 11: 116. 10.1186/1471-2229-11-116.
Cheng F, Liu S, Wu J, Fang L, Sun S, Liu B, Li P, Hua W, Wang X: BRAD, the genetics and genomics database for Brassica plants. BMC Plant Biol. 2011, 11: 136. 10.1186/1471-2229-11-136.
Dowell KG, McAndrews-Hill MS, Hill DP, Drabkin HJ, Blake JA: Integrating text mining into the MGI biocuration workflow. 2009, Database (Oxford), 19.
Karamanis N, Seal R, Lewin I, McQuilton P, Vlachos A, Gasperin C, Drysdale R, Briscoe T: Natural language processing in aid of FlyBase curators. BMC Bioinformatics. 2008, 9: 193. 10.1186/1471-2105-9-193.
Fang R, Schindelman G, Van Auken K, Fernandes J, Chen W, Wang X, Davis P, Tuli MA, Marygold S, Millburn G, et al: Automatic categorization of diverse experimental information in the bioscience literature. BMC Bioinformatics. 2012, 13 (1): 16. 10.1186/1471-2105-13-16.
Bajic VB, Veronika M, Veladandi PS, Meka A, Heng MW, Rajaraman K, Pan H, Swarup S: Dragon Plant Biology Explorer. a text-mining tool for integrating associations between genetic and biochemical entities with genome annotation and biochemical terms lists. Plant Physiol. 2005, 138 (4): 1914-1925. 10.1104/pp.105.060863.
Krallinger M, Rodriguez-Penagos C, Tendulkar A, Valencia A: PLAN2L: a web tool for integrated text mining and literature-based bioentity relation extraction. Nucleic Acids Res. 2009, 37: 160-165. 10.1093/nar/gkp484.
Plant Male Reproduction Database: [http://www.pmrd.org].
We thank Prof. Hugh Dickinson, Prof. Weicai Yang, Prof. Chris Franklin, Dr. Bing Zhang, Prof. Lijia Qu, Prof. David Twell, Prof. Yaoguang Li, Prof. De Ye and Dr. Jie Xu for helpful discussions. We would also like to thank the BBSRC China Partnership Scheme for providing the opportunity to link together colleagues working in Plant Reproduction in the UK and China, and initiating the idea for a reproduction database.
This work was supported by funds from the National Natural Science Foundation of China [31110103915 and 30830014 to DB.Z.]; from the National Basic Research Program of China [2013CB129602 to DB.Z.]; the Chinese Transgenic Project [2011ZX08012-002 to DB.Z.]; the National 863 High-Tech Project [2011AA10A101 and 2012AA10A302 to DB.Z.]; and a project funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions.
The authors declare that they have no competing interests.
XC, DW and HY designed and implemented the database, website pages and on-line tools. XC, DW, ZAW, DZ and WY were responsible for data collection, manual curation and quality control. PS and CW participated in the design of the database schema. ZAW and DZ conceived the study. XC, ZAW and DZ drafted the manuscript. All authors read and approved the manuscript.
Cite this article
Cui, X., Wang, Q., Yin, W. et al. PMRD: a curated database for genes and mutants involved in plant male reproduction. BMC Plant Biol 12, 215 (2012). https://doi.org/10.1186/1471-2229-12-215