OPTIMAS-DW: A comprehensive transcriptomics, metabolomics, ionomics, proteomics and phenomics data resource for maize
- Christian Colmsee1,
- Martin Mascher1,
- Tobias Czauderna1,
- Anja Hartmann1,
- Urte Schlüter2,
- Nina Zellerhoff3,
- Jessica Schmitz3,
- Andrea Bräutigam4,
- Thea R Pick4, 5,
- Philipp Alter6,
- Manfred Gahrtz6,
- Sandra Witt7,
- Alisdair R Fernie7,
- Frederik Börnke2,
- Holger Fahnenstich8,
- Marcel Bucher3,
- Thomas Dresselhaus6,
- Andreas PM Weber4,
- Falk Schreiber1, 9,
- Uwe Scholz1Email author and
- Uwe Sonnewald2
© Colmsee et al.; licensee BioMed Central Ltd. 2012
Received: 14 September 2012
Accepted: 12 December 2012
Published: 29 December 2012
Maize is a major crop plant, grown for human and animal nutrition, as well as a renewable resource for bioenergy. When looking at the problems of limited fossil fuels, the growth of the world’s population or the world’s climate change, it is important to find ways to increase the yield and biomass of maize and to study how it reacts to specific abiotic and biotic stress situations. Within the OPTIMAS systems biology project maize plants were grown under a large set of controlled stress conditions, phenotypically characterised and plant material was harvested to analyse the effect of specific environmental conditions or developmental stages. Transcriptomic, metabolomic, ionomic and proteomic parameters were measured from the same plant material allowing the comparison of results across different omics domains. A data warehouse was developed to store experimental data as well as analysis results of the performed experiments.
The OPTIMAS Data Warehouse (OPTIMAS-DW) is a comprehensive data collection for maize and integrates data from different data domains such as transcriptomics, metabolomics, ionomics, proteomics and phenomics. Within the OPTIMAS project, a 44K oligo chip was designed and annotated to describe the functions of the selected unigenes. Several treatment- and plant growth stage experiments were performed and measured data were filled into data templates and imported into the data warehouse by a Java based import tool. A web interface allows users to browse through all stored experiment data in OPTIMAS-DW including all data domains. Furthermore, the user can filter the data to extract information of particular interest. All data can be exported into different file formats for further data analysis and visualisation. The data analysis integrates data from different data domains and enables the user to find answers to different systems biology questions. Finally, maize specific pathway information is provided.
With OPTIMAS-DW a data warehouse for maize was established, which is able to handle different data domains, comprises several analysis results that will support researchers within their work and supports systems biological research in particular. The system is available at http://www.optimas-bioenergy.org/optimas_dw.
KeywordsMaize Zea mays Database WGCNA Biomass Yield Data integration Transcriptomics Metabolomics Phenomics
Maize is a major crop plant, grown for human and animal nutrition, as well as a renewable resource for bioenergy. Considering that fossil fuels are limited, it is clear that there must be alternative ways of production. Biofuel might be such an alternative. When looking at the large increase of the world’s population it is also obvious that more people need to be provided with food, but in contrast there is less arable land available. The world’s climate change causes more extreme weather conditions all over the world, which means that plants must be more resistant to such conditions. Therefore, it is important to find ways to increase the yield and biomass in maize plants and to study how maize plants react within specific abiotic and biotic stress situations. The OPTIMAS project (OPTImisation of bioMASs in maize) was started in 2009 to find answers to the question of yield and biomass increase and furthermore to obtain useful insights into the distribution of plant resources between vegetative biomass and corn yield (http://www.optimas-bioenergy.org/).
The rapid improvement of analytical methods now enables to extend systems biology approaches directly for crop plant systems. During the project maize plants were grown under a large set of controlled stress conditions, characterised phenotypically, and plant material was harvested to analyse the effect of specific environmental conditions (e.g. cold, drought or nutrient stress) or developmental (e.g. flowering, leaf gradient dependent growth stage) stages. It was anticipated that the collected measurement data of transcriptomics, metabolomics, ionomics, and proteomics from the same plant material would facilitate the comparison of results on different omics domains. A better understanding of metabolic events underlying phenotypic changes will allow to further optimise maize breeding and cultivation for its multiple purposes as food, feed and bioenergy source. A central goal was to store the collected data in a database and to find a concept to link these data from the different domains and finally to provide access to all collected data and analysis results to the users.
There are already several maize databases available such as MaizeGDB  and Panzea . MaizeGDB is a database for storing and curating genetics and genomics related data of maize. It serves also as a community platform comprising maize references and information about persons and organisations. Panzea is a database dealing with molecular and functional diversity in the maize genome. The database includes genotypic, phenotypic and polymorphism data.
Here OPTIMAS-DW, a comprehensive data warehouse containing large amounts of integrated maize-specific data from five domains is presented. A data warehouse is a database which enables a user to integrate and analyse data from different data domains. This includes transcriptomic, metabolomic, proteomic, ionomic and phenomic data as well as metabolic pathways. So far there exists no other database allowing to store data of all these data domains. OPTIMAS-DW is a public information resource which provides researchers with a large collection of data for their own research. This paper describes the structure and usage of OPTIMAS-DW.
Construction and content
The web application itself enables the user to browse, search, filter and download stored data. OPTIMAS-DW supports two ways of data export: (a) the export to a tab delimited file comprising all data selected by the user (e.g. data filtered for a specific treatment or a timepoint) and (b) the export into a special format for data analysis and visualisation in VANTED , a tool for the Visualisation and Analysis of Networks containing Experimental Data. The export tool is a Java Web Start application where the user selects the file format and a target directory where the file will be stored. Using VANTED the user is able to map the experimental data from the OPTIMAS data warehouse onto maize specific pathways stored in MetaCrop , a manually curated repository comprising high quality data about crop plant metabolism. The pathways can also be accessed through the web interface of OPTIMAS-DW. Therefore the MetaCrop database schema provides the relevant information to link the web interface of OPTIMAS-DW directly to the specific pathways in MetaCrop. Besides using VANTED, the data can also be analysed with other tools, such as an R package called WGCNA (Weighted Gene Correlation Network Analysis) . WGCNA is a widely used tool [7–9], allowing the user to find interesting genes responsible for high biomass production in maize plants. The results of these analyses can be accessed through the web interface. Beside the WGCNA analysis results, GeneSpring (version GX11) analysis results are available as well.
Within the OPTIMAS project a 44K Agilent oligo chip was designed to perform different gene expression analyses. Based on experiences from previous works with the POCI array (Potato Oligo Chip Initiative) , the database schema from this project was adapted for the OPTIMAS oligo chip. Therefore at the start of the project the newest available maize genome version 3b.50 (http://maizesequence.org) was used to select the unigenes for the oligo chip. BLAST  analyses provided information about redundancies within the dataset. To overcome the redundancies a MIRA  assembly (Mimicking Intelligent Read Assembly) was performed. Furthermore, 17,723 contigs and 16,881 singletons were selected as unigenes, and four additional sequences, special candidate genes from project partners, which were not present in the selected unigenes, were added. To achieve a number of 42,000 unigenes, 7,392 EST sequences from NCBI maize unigenes, which had no hit on the predicted maize genes, were selected additionally. The eArray-software from Agilent (http://www.genomics.agilent.com) computed a number of 41,780 60-mer oligos from the unigenes, which can be browsed and downloaded at OPTIMAS-DW. When the maize genome was published in November 2009 , the quality of the 44K chip was analysed in comparison to the maize genome version 4a.53. It was discovered, that 52,040 of the 53,764 gene models got a hit in the OPTIMAS unigene set. Therefore, it was decided to generate a mapping from the OPTIMAS identifiers to the 4a.53 version of the maize genome. Beside the availabilty in OPTIMAS-DW the chip data is published in GEO  and is available online (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL14913). As an additional service, OPTIMAS-DW provides ViroBLAST , which enables users to run a BLAST of sequences against the OPTIMAS unigene set, the maize genome (version 3b.50 and 4a.53) or the NCBI maize unigene set.
Overview about BLAST analyses in OPTIMAS-DW
Entries in hit database
Maize Genome 3b.50
NCBI non redundant Peptides
Maize Genome 3b.50
NCBI Zea mays Unigene Build 75
Maize Genome 3b.50
EMBL fungi ESTs
Maize Genome 4a.53
NCBI non redundant Peptides
(used for Blast2Go)
OPTIMAS Oligo Set
Uniref, version 2011-09-21
Overview about experiments and measurements for all data domains stored in OPTIMAS-DW
Cold Stress (A188 and B73)
(study of 2 pairs of maize inbred lines,
each having one line with a good
water-use-efficiency and one line
with a poor one)
Nitrogen Stress (A188 and B73)
Nitrogen Use Efficiency 1
(16 maize inbred lines)
Nitrogen Use Efficiency 4
(C:N ratios of different plant parts)
Mycorrhiza Compartment 2/3
(Physiological, elemental, gene
expression and metabolite analysis
of mycorrhizal maize line B73)
Mycorrhiza Compartment 6/8
(Screening of 27 maize lines for their
responsiveness towards the arbuscular
mycorrhiza fungi by physiological and
Mycorrhiza Compartment 9
(Analysis of 2 closely related pairs
of maize lines for their physiological,
elemental and metabolite profile in
reaction to mycorrhiza infection.)
Field Experiment 2010
(26 inbred lines grown in the field)
13C Disc feeding (13C enrichment)
13C Glucose feeding (13C enrichment)
13CO2 feeding (13CO2 enrichment)
15N Urea feeding (15N enrichment)
Plant growth stage experiments
(Comparison of elemental composition
of maize kernels of line B73 provided
by Regensburg or BASF)
(analysis of 2 pairs of maize inbred
lines to identify transcripts/metabolites
regulating flowering time in maize)
(analysis of the developmental
gradient of the third maize leaf)
Complete Data Warehouse
Another way to navigate through the data starts at the experiment view. Here, the user can browse all experiments and can access the data from all domains available for a particular experiment sample. As described in Figure 1, the metadata concept is realised within this web page. By the selection of a specific experiment sample, data from each data domain can be retreived. When using the data domain specific view instead, the metadata is also visible to the user but only domain specific data can be browsed. The domain specific view therefore enables the user to retrieve and analyse domain specific data while the experiment view has the advantage to retrieve and analyse data of different domains of specific samples. For each experiment also a short description is availble (Figure 3e).
Beside the experimental data the user is able to retrieve analysis results from the web interface. This includes WGCNA results where specific conditions were defined. The user can, for example, detect genes that are highly correlated to fresh weight, growth rate or metabolite profiles in the nitrogen stress experiment . Additionally, the chip ID and the correlation values are listed as well as the annotation. Furthermore, module lists of a WGCNA analysis are available where the transcript data of all experiments were included. A module is a cluster of interconnected nodes representing highly correlated genes. Here, the user can detect which genes are located in the same module. Additionally, a gene expression profile for each module is available visualising the average expression pattern (module eigengene) for each sample of analysed experiments. The WGCNA module overview enables the user to carry out in which module a specific gene is located in the different WGCNA analysis results. With the help of that function the user can detect genes, which will react in a different way in different experiments.
Finally, the user can browse through the maize specific pathways stored in MetaCrop. A table provides a list of these pathways including clickable thumbnail images redirecting the user to MetaCrop. In MetaCrop the user can navigate through the pathway data to get detailed information.
OPTIMAS-DW provides an innovative concept to link data from different data domains through metadata. It is very easy to extend the data warehouse by additional data domains by adjusting the main components of OPTIMAS-DW, such as data templates, the import tool, the database schema, and the web interface. With the web interface the user can extract data very easy either by browsing through a specific data domain of interest or by looking for data related to a specific experiment. Tools like the Gene Specific View or the WGCNA analysis enable the user to get answers to different systems biological questions. With WGCNA it is for example possible to detect genes, which are correlated to the growth rate or the fresh weight of a maize plant. In that case transcript and phenomic data is used by the analysis to detect responsible genes for biomass increase in maize plants. With OPTIMAS-DW the user is able to perform data analysis using different data domains. By using the Gene Specific View the user is, for example able to look at the behavior of genes of interest in different experiments and conditions such as genotype, plant growth stage, treatment or plant anatomy.
OPTIMAS-DW enables us to store more experimental data in future maize related projects to enhance our data collection of maize. By further extending the database content and its functionality OPTIMAS-DW could help the researchers to better understand the systems biological processes in maize plants. Because of the experiences gathered during the development of OPTIMAS-DW, the infrastructure and pipeline could also be used to set up data warehouses for other plant organisms. Furthermore, ways of data analysis could be improved in the future enabling users to start data analyses directly from the web interface and with selected data of their interest.
With OPTIMAS-DW a comprehensive data warehouse for maize was established, which is able to handle different data domains and which comprises several analysis results that will support researchers within further projects. The easy access to transcriptomics, metabolomics, ionomics and proteomics data from plant material with detailed phenotypic description allows the use of the full potential of large scale analysis tools in the future. It is also possible to continuously extend the data warehouse by adding more experimental data, even in data domains which are not already available in OPTIMAS-DW. The concept to combine different data domains by metadata will be used in future projects. We believe that OPTIMAS-DW will be a very valuable public data warehouse for maize related research and supports systems biological research in particular.
Availability and requirements
The OPTIMAS-DW is available with no restrictions at the following http://www.optimas-bioenergy.org/optimas_dw. All datasets are free to use and can be downloaded via the web interface. There are no restrictions on use of the database as well as all stores data sets.
CC developed the data warehouse including the OPTIMAS-DW web application. CC, TC and AH designed and provided the data import template. U. Schlüter, NZ, AB, TRP, PA, MG, SW and HF provided the experimental data. MM performed the WGCNA analyses. ARF, FB, HF, MB, TD, APMW, and U. Sonnewald supervised the biological part of the project. FS and U. Scholz supervised the bioinformatics part of the project. U. Sonnewald headed the project consortium. All authors tested and used OPTIMAS-DW and thereby contributed to the improvement of the web interface usability. All authors read and approved the final manuscript.
We thank Thomas Schmutzer for his help with the sequence assembly, Matthias Klapperstück for his support to get access to the MetaCrop database, Sophia Sonnewald for doing the chip design with Agilent, and Stephan Weise for proof reading of the manuscript. This work was supported by the German Federal Ministry of Education and Research in the frame of OPTIMAS [FKZ 0315430A-G].
- Lawrence C, Dong Q, Polacco M, Seigfried T, Brendel V: MaizeGDB, the community database for maize genetics and genomics. Nucleic Acids Res. 2004, 32: D393-D397. 10.1093/nar/gkh011.PubMedPubMed CentralView Article
- Zhao W, Canaran P, Jurkuta R, Fulton T, Glaubitz J, Buckler E, Doebley J, Gaut B, Goodman M, Holland J, et al: Panzea: a database and resource for molecular and functional diversity in the maize genome. Nucleic Acids Res. 2006, 34: D752-D757. 10.1093/nar/gkj011.PubMedPubMed CentralView Article
- Kuenne C, Grosse I, Matthies I, Scholz U, Sretenovic-Rajicic T, Stein N, Stephanik A, Steuernagel B, Weise S: Using data warehouse technology in crop plant bioinformatics. J Integr Bioinf. 2007, 4: 88.
- Junker BH, Klukas C, Schreiber F: VANTED: A system for advanced data analysis and visualization in the context of biological networks. BMC Bioinf. 2006, 7: e109-10.1186/1471-2105-7-109.View Article
- Schreiber F, Colmsee C, Czauderna T, Grafahrend-Belau E, Hartmann A, Junker A, Junker B, Klapperstück M, Scholz U, Weise S: MetaCrop 2.0: managing and exploring information about crop plant metabolism. Nucleic Acids Res. 2012, 40: D1173-D1177. 10.1093/nar/gkr1004.PubMedPubMed CentralView Article
- Langfelder P, Horvath S: WGCNA: an R package for weighted correlation network analysis. BMC Bioinf. 2008, 9: e559-10.1186/1471-2105-9-559.View Article
- Ficklin S, Feltus F: Gene Coexpression Network Alignment and Conservation of Gene Modules between Two Grass Species: Maize and Rice. Plant Physiol. 2011, 156: 1244-1256. 10.1104/pp.111.173047.PubMedPubMed CentralView Article
- Weston D, Gunter L, Rogers A, Wullschleger S: Connecting genes, coexpression modules, and molecular signatures to environmental stress phenotypes in plants. BMC Syst Biol. 2008, 2: e16-10.1186/1752-0509-2-16.View Article
- DiLeo M, Strahan G, den Bakker, Hoekenga O: Weighted Correlation Network Analysis (WGCNA) Applied to the Tomato Fruit Metabolome. PLoS ONE. 2011, 6: e26683-10.1371/journal.pone.0026683.PubMedPubMed CentralView Article
- Kloosterman B, De Koeyer D, Griffiths R, Flinn B, Steuernagel B, Scholz U, Sonnewald S, Sonnewald U, Bryan G, Prat S, et al: Genes driving potato tuber initiation and growth: identification based on transcriptional changes using the POCI array. Funct Integr Genomics. 2008, 8: 329-340. 10.1007/s10142-008-0083-x.PubMedView Article
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 3: 403-410.View Article
- Chevreux B, Pfisterer T, Drescher B, Driesel A, Müller W, Wetter T, Suhai S: Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs. Genome Res. 2004, 14: 1147-1159. 10.1101/gr.1917404.PubMedPubMed CentralView Article
- Schnable P, Ware D, Fulton R, Stein J, Wei F, Pasternak S, Liang C, Zhang J, Fulton L, Graves T, et al: The B73 Maize Genome: Complexity, Diversity, and Dynamics. Science. 2009, 326: 1112-1115. 10.1126/science.1178534.PubMedView Article
- Barrett T, Troup D, Wilhite S, Ledoux P, Rudnev D, Evangelista C, Kim I, Soboleva A, Tomashevsky M, Marshall K, et al: NCBI GEO: archive for high-throughput functional genomic data. Nucleic Acids Res. 2009, 37: D8885-D890.View Article
- Deng W, Nickle D, Learn G, Maust B, Mullins J: ViroBLAST: a stand-alone BLAST web server for flexible queries of multiple databases and user’s datasets. Bioinformatics. 2007, 23: 2334-2336. 10.1093/bioinformatics/btm331.PubMedView Article
- Conesa A, Götz S, Garcia-Gomez J, Terol J, Talon M, Robles M: Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005, 21: 3674-3676. 10.1093/bioinformatics/bti610.PubMedView Article
- Harris M, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, et al: The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 2004, 32: D258-D261. 10.1093/nar/gkh036.PubMedView Article
- Webb IUoB EC Biology: Nomenclature Committee. (1992) Enzyme nomenclature 1992 : recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology on the nomenclature and classification of enzymes. 1992, San Diego: Academic Press.
- Swarbreck D, Wilks C, Lamesch P, Berardini T, Garcia-Hernandez M, Foerster H, Li D, Meyer T, Muller R, Ploetz L, et al: The Arabidopsis Information Resource (TAIR): gene structure and function annotation. Nucleic Acids Res. 2008, 36: D1009-D1014.PubMedPubMed CentralView Article
- Paterson A, Bowers J, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, Haberer G, Hellsten U, Mitros T, Poliakov A, et al: The Sorghum bicolor genome and the diversification of grasses. Nature. 2009, 457: 551-556. 10.1038/nature07723.PubMedView Article
- Pick T, Bräutigam A, Schlüter U, Denton A, Colmsee C, Scholz U, Fahnenstich H, Pieruschka R, Rascher U, Sonnewald U, et al: Systems Analysis of a Maize Leaf Developmental Gradient Redefines the Current C4 Model and Provides Candidates for Regulation. Plant Cell. 2011, 23: 1-13. 10.1105/tpc.111.230110.View Article
- Junker A, Rohn H, Czauderna T, Klukas C, Hartmann A, Schreiber F: Creating interactive, web-based and data-enriched maps with the Systems Biology Graphical Notation. Nat Protoc. 2012, 7: 579-593. 10.1038/nprot.2012.002.PubMedView Article
- Le Novère N, Hucka M, Mi H, Moodie S, Schreiber F, Sorokin A, Demir E, Wegner K, Aladjem M, Wimalaratne S, et al: The Systems Biology Graphical Notation. Nat Biotechnol. 2009, 27: 735-741. 10.1038/nbt.1558.PubMedView Article
- Schlüter U, Mascher M, Colmsee C, Scholz U, Bräutigam A, Fahnenstich H, Sonnewald U: Maize source leaf adaptation to nitrogen deficiency effects not only N and C metabolism but also control of P homeostasis. Plant Physiol. 2012, 160: 1384-1406. 10.1104/pp.112.204420.PubMedPubMed CentralView Article
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License(http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.