- Database
- Open access
AgriSeqDB: an online RNA-Seq database for functional studies of agriculturally relevant plant species
BMC Plant Biology volume 18, Article number: 200 (2018)
Abstract
Background
The genome-wide expression profile of genes in different tissues/cell types and developmental stages is a vital component of many functional genomic studies. Transcriptome data obtained by RNA-sequencing (RNA-Seq) is often deposited in public databases that are made available via data portals. Data visualization is one of the first steps in assessment and hypothesis generation. However, these databases do not typically include visualization tools, and establishing one is not trivial for users who are not computational experts. This, together with the various formats in which data is commonly deposited, makes data access, sharing and use more difficult. Our goal was to provide a simple and user-friendly repository that meets these needs for data-sets from major agricultural crops.
Description
AgriSeqDB (https://expression.latrobe.edu.au/agriseqdb) is a database for viewing, analysing and interpreting developmental and tissue/cell-specific transcriptome data from several species, including major agricultural crops such as wheat, rice, maize, barley and tomato. The disparate manner in which public transcriptome data is often warehoused and the challenge of visualizing raw data are both major hurdles to data reuse. The popular eFP browser does an excellent job of presenting transcriptome data in an easily interpretable view, but previous implementation has been mostly on a case-by-case basis. Here we present an integrated visualisation database of transcriptome data-sets from six species that did not previously have public-facing visualisations. We combine the eFP browser, for gene-by-gene investigation, with the Degust browser, which enables visualisation of all transcripts across multiple samples. The two visualisation interfaces launch from the same point, enabling users to easily switch between analysis modes. The tools allow users, even those without bioinformatics expertise, to mine into data-sets and understand the behaviour of transcripts of interest across samples and time. We have also incorporated an additional graphic download option to simplify incorporation into presentations or publications.
Conclusion
Powered by the eFP and Degust browsers, AgriSeqDB is a quick and easy-to-use platform for data analysis and visualization in five crops and Arabidopsis. Furthermore, it provides a tool that makes it easy for researchers to share their data-sets, promoting research collaborations and data-set reuse.
Background
RNA-sequencing (RNA-Seq) is currently the preferred technology for genome-wide transcriptional profiling due to its combined ease of use, quality of data and suitability for a diverse range of applications [1, 2]. Recent advances in next generation sequencing (NGS) technologies, coupled with decreases in the cost of sequencing, have resulted in the collection of large volumes of RNA-Seq data from many species [3, 4]. These data are typically deposited in online repositories in formats that are text and/or table-based. Visualization of data is a key early step in transcriptomic analysis for many biologists, allowing examination of data quality, as well as rapid interrogation of leads and hypothesis generation. Many researchers who wish to investigate public transcriptome data are not computational experts, and for them transferring data from the format of online repositories to visualization tools is challenging. This creates a barrier to data reuse. The eFP browser, which was first developed for in silico gene expression analysis in Arabidopsis, is an excellent piece of software for displaying transcriptome data visually [5]. At the time of writing, 20 plant transcriptome data-sets are available publicly in dedicated eFP browsers (http://bar.utoronto.ca, [5,6,7,8,9,10,11,12,13,14,15,16,17]). Degust is a web-based data visualization tool that provides functionality different from that of eFP (https://github.com/drpowell/degust). It enables users to view all transcripts from all samples in an experiment, examine trends between samples, visualize quality-control metrics and drill down into subsets of transcripts with expression patterns of interest. These two data browsers could be integrated to provide users with an easy-to-use tool for accessing and analysing multiple data-sets and, with some enhanced functionality, they could also be used for data download and for generating high-quality images for presentations or publications.
RNA-Seq is often performed at whole plant or organ level using samples that are composed of different tissues and cell types. This approach masks cell- or tissue-specific information about transcripts, which is important to understand spatial-regulation and functions of genes [18, 19]. Spatial resolution is also important to capture transcripts that are expressed at extremely low levels in specific cell types and that are consequently below the limit of detection in bulk samples of tissues [2]. Temporal gene expression data is also an important tool, which can be used to investigate the mechanisms of genome regulation and to understand the relationships between development and gene function [20]. These approaches have been used in functional studies aimed at deciphering regulatory and structural gene networks of diverse plant species, including forest trees and major crops such as wheat (Triticum aestivum), rice (Oryza sativa), maize (Zea mays), barley (Hordeum vulgare), and tomato (Solanum lycopersicum) [21,22,23,24,25,26,27].
Here we present AgriSeqDB (https://expression.latrobe.edu.au/agriseqdb), a web-based resource that can serve as a public portal for accessing, analysing and visualizing tissue and cell-specific transcriptome data-sets from multiple species. Our focus in this implementation is primarily upon transcriptome data-sets during the development of seeds and fruits of agriculturally-relevant species. The database integrates two existing open-source browsers and enhances their functionality. The Degust browser provides access to information on genome-wide expression across samples and data-sets, aiding the discovery of new genes that can contribute to crop improvement. It also provides quality-control information. The eFP browser allows users to visualize the abundance of individual transcripts encoded by genes of interest across different samples.
Construction and content
Database/website architecture
The main structure of AgriSeqDB is described in Fig. 1. It consists of a landing portal that is implemented using an HTML frontend and Python/Django backend to present all data-sets and associated meta-data to users. The landing portal allows the user to discover the data-sets and navigate to data viewers of interest. The existing eFP browser, which has HTML (frontend) and Python (backend) tools, was selected in order to allow users to view expression data on a gene-by-gene basis [5]. Additionally, the existing tool Degust is included to allow viewing of expression profiles across all (or a subset of) genes at once [28]. Degust uses an HTML/JavaScript frontend and Haskell backend. Both tools were linked and wrapped within the landing portal to ensure that users receive a consistent look and feel when using the portal and each viewer (Fig. 2). The source code for the landing portal and integrations with the viewers is available for reuse (https://bitbucket.org/arobinson/agribiohvc). This repository makes use of git submodules to link the source code of the eFP and Degust browsers, each of which was modified slightly from the original version to ensure that they link cleanly; source code for the modified versions is available at https://bitbucket.org/arobinson/efp and https://github.com/andrewjrobinson/degust, respectively. The landing portal and eFP browser use a MySQL database server to store settings and data/meta-data, while Degust uses files on the file system. A central configuration portal was added to ease the loading of data-sets into the database and of the landing portal documentation. It supports organism annotation upload, data-set upload, data-set configuration (such as making a data-set private or public and providing external links and an abstract), and deployment of the data-set to what we refer to as GeneView (eFP) or GeneExplore (Degust).
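The configuration settings listed above can be pictured as a small record per data-set. The following is a minimal sketch only: the class, field and function names are hypothetical and do not reflect the actual AgriSeqDB schema or Django models, which are not described in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Viewer labels as used in the text; everything else here is hypothetical.
VIEWERS = ("GeneView", "GeneExplore")


@dataclass
class DataSetConfig:
    """Illustrative per-data-set settings (not the AgriSeqDB schema)."""
    name: str
    organism: str
    abstract: str = ""
    public: bool = False  # assumed default: private until published
    external_links: Dict[str, str] = field(default_factory=dict)
    deployed_to: List[str] = field(default_factory=list)

    def deploy(self, viewer: str) -> None:
        """Record that the data-set is deployed to one of the two viewers."""
        if viewer not in VIEWERS:
            raise ValueError(f"unknown viewer: {viewer!r}")
        if viewer not in self.deployed_to:
            self.deployed_to.append(viewer)


def visible_on_portal(configs: List[DataSetConfig]) -> List[str]:
    """Names of data-sets that are public and deployed to at least one viewer."""
    return [c.name for c in configs if c.public and c.deployed_to]
```

In this sketch a data-set only appears on the public landing page once it is both marked public and deployed to a viewer; whether AgriSeqDB applies the same rule is an assumption.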
Data sources
All data-sets currently displayed in AgriSeqDB are recently published transcriptomes deposited in public databases (Fig. 2, Table 1). Users of AgriSeqDB can view data directly from the database server without the need to download it and then install/configure a viewer to visualise it. The data-sets were generated in six studies of seeds or fruit. The first is a study we conducted of transcriptome changes in whole Arabidopsis seeds during germination, which provides a useful reference due to this species’ high-quality genome sequence and annotation [20]. Additionally, we display five data-sets from major agricultural crops. These are: a study of transcriptome changes in different tissues (aleurone, starchy endosperm, embryo, scutellum, pericarp, testa, husk and crushed cell layers) of barley grain at different stages of germination [21]; a study of transcriptome changes associated with different cell types of maize endosperm after pollination [22]; a study of transcriptome changes associated with seed germination and coleoptile growth in rice [23]; a study of transcriptome changes associated with fruit development in tomato [26]; and a study of the grain/endosperm transcriptome of bread wheat [24]. The GeneExplore (Degust) component of AgriSeqDB requires RNA-Seq data in raw count format (i.e. the number of reads per gene or transcript, not normalised) for the subsequent analyses it applies. We made use of the raw count data provided by the original authors in the respective GEO/SRA repositories for the Arabidopsis, wheat, rice and barley studies (Table 1). In those cases, mapping and read counting were as described previously [20, 21, 23]. In the case of the maize and tomato data-sets, raw count data were not available from the GEO/SRA repositories, but the original sequence reads were. For these we aligned the reads and quantified counts using Kallisto with the reference tomato or maize transcriptomes (Solanum lycopersicum SL2.50 or Zea mays AGPv4) [29], using the resulting data as the input to AgriSeqDB.
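As an illustration of how per-sample Kallisto output could be assembled into a raw count table of the kind GeneExplore (Degust) expects, the sketch below reads the `target_id` and `est_counts` columns of Kallisto's `abundance.tsv` output and merges several samples into one matrix. The function names, the rounding of estimated counts to whole numbers and the zero-filling of missing transcripts are assumptions of this sketch, not a description of the processing actually used for AgriSeqDB.

```python
import csv
from typing import Dict, Iterable, List


def read_est_counts(lines: Iterable[str]) -> Dict[str, float]:
    """Read target_id -> est_counts from abundance.tsv-style text lines."""
    reader = csv.DictReader(lines, delimiter="\t")
    return {row["target_id"]: float(row["est_counts"]) for row in reader}


def build_count_matrix(samples: Dict[str, Dict[str, float]]) -> List[List[str]]:
    """Merge per-sample counts into a header row plus one row per transcript.

    Estimated counts are rounded to whole numbers and transcripts absent
    from a sample are given a count of 0 (both assumptions of this sketch).
    """
    names = list(samples)
    ids = sorted({t for counts in samples.values() for t in counts})
    rows = [["transcript_id"] + names]
    for t in ids:
        rows.append([t] + [str(round(samples[n].get(t, 0.0))) for n in names])
    return rows
```

Each sample's file can be passed in as an open file handle, since `csv.DictReader` accepts any iterable of lines, and the resulting rows can be written out with `csv.writer` for upload.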
Utility and discussion
Our goal was to develop a publicly accessible transcriptome database that provides simple and readily available tools to perform functional analysis of individual target genes or sets of genes. AgriSeqDB is a highly interactive and multi-view database that can be used for various purposes, including the discovery of genes of interest. Users of AgriSeqDB can view data directly from the database server without the need to download it and then install/configure a viewer to visualise it. However, we provide the option for advanced users to download and install their own local AgriSeqDB for custom data-sets.
GeneView (eFP)
AgriSeqDB allows users to gain a better understanding of individual genes of interest by inspecting them within GeneView (eFP) (Fig. 3). This incorporates the full existing functionality of eFP [5]. Users can visualise expression of transcripts across all samples so that they may consider the relationships between samples (e.g. growth stage, tissue type or treatment). Additionally, we incorporated an image download function that was not previously available. Images may be downloaded in high-resolution .png format for presentations or publications with a single click of the Download button (Fig. 3). We have also enabled cross-species comparisons directly from the GeneView (eFP) records. When users are viewing a gene that interests them within GeneView (eFP), they can click on a button that directly returns a search from the Gramene database (http://www.gramene.org). This returns homologs, orthologs and paralogs drawn from 2,076,020 genes across 53 crop and model plant species, as well as a comparative phylogenetic tree.
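To make the idea of a gene-by-gene pictograph concrete, the sketch below assigns each sample a colour according to the expression of one gene, scaled to that gene's own maximum. The linear yellow (low) to red (high) scale, the function names and the sample names in the usage note are assumptions for illustration and are not taken from the eFP source code.

```python
from typing import Dict, Tuple


def expression_to_rgb(value: float, max_value: float) -> Tuple[int, int, int]:
    """Map an expression value onto a linear yellow (low) to red (high) scale."""
    if max_value <= 0:
        return (255, 255, 0)  # no expression anywhere: plain yellow
    fraction = min(max(value / max_value, 0.0), 1.0)
    green = round(255 * (1.0 - fraction))
    return (255, green, 0)


def colour_samples(expression: Dict[str, float]) -> Dict[str, str]:
    """Return a hex colour per sample, scaled to the gene's own maximum."""
    peak = max(expression.values(), default=0.0)
    return {
        sample: "#%02x%02x%02x" % expression_to_rgb(value, peak)
        for sample, value in expression.items()
    }
```

For example, `colour_samples({"embryo": 0.0, "endosperm": 100.0})` would colour the hypothetical embryo sample yellow and the endosperm sample red.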
GeneExplore (Degust)
Users are presented with a simple interface to query all genes using GeneExplore (Degust) (Fig. 4a). Extensive existing functionality is available to users within Degust [28]. Filters can be created on the data based upon expression levels in individual samples, false discovery rate (FDR) and log2 fold-change cut-off. Sub-sets of samples or transcripts can be selected for analysis, and the sample used as the reference for fold-change can be chosen. MA plots of comparisons between pairs of samples can be displayed (Fig. 4b). Data quality can be assessed by inspecting whether the replicates of each sample group together in the multidimensional scaling (MDS) plot (Fig. 4c). Data tables can also be downloaded for selected transcripts in .csv format for downstream analyses.
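The filtering and MA-plot concepts mentioned above can be sketched in a few lines. In an MA plot each gene is placed at A, the mean log2 expression of the two samples, and M, the log2 fold-change between them. The code below is illustrative only: the names, the pseudo-count of 1 and the default cut-offs are assumptions, and it does not reproduce the statistical methods that Degust itself applies.

```python
import math
from typing import List, NamedTuple, Tuple


class GeneStat(NamedTuple):
    gene: str
    log2_fc: float  # log2 fold-change relative to the reference sample
    fdr: float      # false discovery rate


def ma_point(expr_a: float, expr_b: float, pseudo: float = 1.0) -> Tuple[float, float]:
    """Return (A, M) for one gene from expression values in samples a and b.

    A is the mean of the two log2 values; M is log2(b) - log2(a). The
    pseudo-count avoids log2(0) and is an assumption of this sketch.
    """
    log_a = math.log2(expr_a + pseudo)
    log_b = math.log2(expr_b + pseudo)
    return ((log_a + log_b) / 2.0, log_b - log_a)


def filter_genes(stats: List[GeneStat], max_fdr: float = 0.05,
                 min_abs_log2_fc: float = 1.0) -> List[str]:
    """Keep genes passing both the FDR and absolute log2 fold-change cut-offs."""
    return [s.gene for s in stats
            if s.fdr <= max_fdr and abs(s.log2_fc) >= min_abs_log2_fc]
```

In practice the expression values would be normalised before comparison; the raw values here simply keep the example short.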
Data-set administration (advanced usage case)
One key addition to AgriSeqDB is the data-set administration tool. This tool is available only when users download and install their own local AgriSeqDB, for reasons of server security detailed below. The eFP browser did not contain an interface to upload data, so configuration required much manual interaction. While Degust contained its own administration tool, it was not flexible enough to accommodate eFP and the landing portal. Consequently, we developed a new data upload interface to encompass both eFP and Degust. This allows the user (secured by username/password) to upload new data-sets and deploy them to each of the viewers. Because the tool gives the user the ability to execute custom code on the host server, access to it should be restricted to the local database administrator and trusted users. An example configuration is included in Fig. 5 and Additional file 1: Figure S1. A link is provided on our AgriSeqDB landing portal that takes users to a repository from which all AgriSeqDB code can be downloaded and from where installation/administration instructions and help files can be viewed (Fig. 2). The direct address of the code and help repository is https://bitbucket.org/arobinson/agribiohvc. The repository includes a link from which users can access a Galaxy Project RNA-seq analysis tutorial (https://galaxyproject.org/tutorials/rb_rnaseq/), which users may find useful to prepare data when establishing their own local AgriSeqDB.
Conclusions
We believe AgriSeqDB will be an important resource and data-reuse tool for plant biologists who seek greater insights into the role of individual genes or groups of genes in biological processes, including for comparative studies in crop species of major agricultural importance. The database will be periodically updated with more viewers and data-sets, focusing on additional tissue and cell-specific data-sets from crop species. The database currently contains results of RNA-Seq from different tissues and cell types, and it is planned that transcriptome data from single cell RNA-Seq will be added in the future. In the long term it is envisaged that users will be provided with links to GEO auto-download and view, and will be allowed to upload data-sets at least temporarily. All source code is freely available for reuse by advanced users.
Abbreviations
- eFP: Electronic Fluorescent Pictograph
- FDR: False Discovery Rate
- GEO: Gene Expression Omnibus
- HTML: Hypertext Markup Language
- MDS: Multi-Dimensional Scaling
- NGS: Next Generation Sequencing
- RNA-Seq: RNA Sequencing
References
Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nature Rev Genet. 2009;10:57–63.
Martin LB, Fei Z, Giovannoni JJ, Rose JK. Catalyzing plant science research with RNA-Seq. Front Plant Sci. 2013;4:66.
Petryszak R, Fonseca NA, Füllgrabe A, Huerta L, Keays M, Tang YA, Brazma A. The RNASeq-er API-a gateway to systematically updated analysis of public RNA-Seq data. Bioinformatics. 2017;33:2218–20.
Langmead B, Nellore A. Cloud computing for genomic data analysis and collaboration. Nature Rev Genet. 2018;19:208–19.
Winter D, Vinegar B, Nahal H, Ammar R, Wilson GV, Provart NJ. An “electronic fluorescent pictograph” browser for exploring and analyzing large-scale biological data sets. PLoS One. 2007;2:e718.
Waese J, Fan J, Pasha A, Yu H, Fucile G, Shi R, et al. ePlant: visualizing and exploring multiple levels of data for hypothesis generation in plant biology. Plant Cell. 2017;doi.org/10.1105/tpc.17.00073.
Fucile G, Di Biase D, Nahal H, La G, Khodabandeh S, Chen Y, et al. ePlant and the 3D data display initiative: integrative systems biology on the world wide web. PLoS One. 2011;6:e15237.
Dean G, Cao Y, Xiang D, Provart NJ, Ramsay L, Ahad A, et al. Analysis of gene expression patterns during seed coat development in Arabidopsis. Mol Plant. 2011;4:1074–91.
Mustroph A, Zanetti ME, Jang CJ, Holtan HE, Repetti PP, Galbraith DW, et al. Profiling translatomes of discrete cell populations resolves altered cellular priorities during hypoxia in Arabidopsis. PNAS. 2009;106:18843–8.
Wilkins O, Nahal H, Foong J, Provart NJ, Campbell MM. Expansion and diversification of the Populus R2R3-MYB family of transcription factors. Plant Physiol. 2009;149:981–93.
Tran F, Penniket C, Patel RV, Provart NJ, Laroche A, Rowland O, Robert LS. Developmental transcriptional profiling reveals key insights into Triticeae reproductive development. Plant J. 2013;74:971–88.
Sibout R, Proost S, Hansen BO, Vaid N, Giorgi FM, Ho-Yue-Kuang S, et al. Expression atlas and comparative coexpression network analyses reveal important genes involved in the formation of lignified cell wall in Brachypodium distachyon. New Phytol. 2017;215:1009–25.
Champigny MJ, Sung WW, Catana V, Salwan R, Summers PS, Dudley SA, et al. RNA-Seq effectively monitors gene expression in Eutrema salsugineum plants growing in an extreme natural habitat and in controlled growth cabinet conditions. BMC Genomics. 2013;14:578.
Kagale S, Nixon J, Khedikar Y, Pasha A, Provart NJ, Clarke WE, et al. The developmental transcriptome atlas of the biofuel crop Camelina sativa. Plant J. 2016;88:879–94.
Clevenger J, Chu Y, Scheffler B, Ozias-Akins P. A developmental transcriptome map for allotetraploid Arachis hypogaea. Front Plant Sci. 2016;7:1446.
Fasoli M, Dal Santo S, Zenoni S, Tornielli GB, Farina L, Zamboni A, et al. The grapevine expression atlas reveals a deep transcriptome shift driving the entire plant into a maturation program. Plant Cell. 2012;24:3489–505.
Li P, Ponnala L, Gandotra N, Wang L, Si Y, Tausta SL, et al. The developmental dynamics of the maize leaf transcriptome. Nat Genet. 2010;42:1060.
Slane D, Kong J, Berendzen KW, Kilian J, Henschen A, Kolb M, et al. Cell type-specific transcriptome analysis in the early Arabidopsis thaliana embryo. Development. 2014;141:4831–40.
Chen J, Zeng B, Zhang M, Xie S, Wang G, Hauck A, Lai J. Dynamic transcriptome landscape of maize embryo and endosperm development. Plant Physiol. 2014;166:252–64.
Narsai R, Gouil Q, Secco D, Srivastava A, Karpievitch YV, Liew LC, et al. Extensive transcriptomic and epigenomic remodelling occurs during Arabidopsis thaliana germination. Genome Biol. 2017;18:172.
Betts NS, Berkowitz O, Liu R, Collins HM, Skadhauge B, Dockter C, et al. Isolation of tissues and preservation of RNA from intact, germinated barley grain. Plant J. 2017;91:754–65.
Zhan J, Thakare D, Ma C, Lloyd A, Nixon NM, Arakaki AM, et al. RNA sequencing of laser-capture microdissected compartments of the maize kernel identifies regulatory modules associated with endosperm cell differentiation. Plant Cell. 2015;27:513–31.
Narsai R, Secco D, Schultz MD, Ecker JR, Lister R, Whelan J. Dynamic and rapid changes in the transcriptome and epigenome during germination and in developing rice (Oryza sativa) coleoptiles under anoxia and re-oxygenation. Plant J. 2017;89:805–24.
Pfeifer M, Kugler KG, Sandve SR, Zhan B, Rudi H, Hvidsten TR, et al. Genome interplay in the grain transcriptome of hexaploid bread wheat. Science. 2014;345:1250091.
Celedon JM, Yuen M, Chiang A, Henderson H, Reid KE, Bohlmann J. Cell-type-and tissue-specific transcriptomes of the white spruce (Picea glauca) bark unmask fine-scale spatial patterns of constitutive and induced conifer defense. Plant J. 2017;92:710–26.
D’Esposito D, Ferriello F, Dal Molin A, Diretto G, Sacco A, Minio A, et al. Unraveling the complexity of transcriptomic, metabolomic and quality environmental response of tomato fruit. BMC Plant Biol. 2017;17:66.
Shinozaki Y, Nicolas P, Fernandez-Pozo N, Ma Q, Evanich DJ, Shi Y, et al. High-resolution spatiotemporal transcriptome mapping of tomato fruit development and ripening. Nat Commun. 2018;9:364.
Powell DR. https://github.com/drpowell/degust, 2013.
Bray NL, Pimentel H, Melsted P, Pachter L. Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol. 2016;34:525.
Acknowledgments
We gratefully acknowledge the excellent work of Dr. Nicholas Provart (U. Toronto) and the eFP team past and present as well as the developer of Degust, David Powell (Monash University), whose software we utilised here. We also thank them for their advice when establishing our database. We thank all members of the teams who generated the data we have displayed in AgriSeqDB, who are too numerous to list here. We thank Maoshan Chen (La Trobe University) for his help with data processing.
Funding
This work was supported by a grant from the Australian National Data Service (ANDS) as well as by in-kind contributions from La Trobe University Information and Communication Technology and the La Trobe Genomics Platform.
Availability of data and materials
The database is freely available via https://expression.latrobe.edu.au/agriseqdb. It is compatible with all modern popular web browsers and can be used on tablets and mobile/cell phones. Database source code is available for reuse at https://bitbucket.org/arobinson/agribiohvc. Modified Degust and eFP source code used in this project is available at https://bitbucket.org/arobinson/efp and https://github.com/andrewjrobinson/degust.
Author information
Contributions
AJR developed the data portal, created illustrations (for eFP) and uploaded data. MT contributed to project development. MGL and JW conceived the project and provided scientific direction. MGL conducted tool research. CB annotated meta-data and RDA records. RS, AW, SH, EF and KU participated in project management. AJR, MT, MGL and JW wrote the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
KU declares that she is an employee of the funder, the Australian National Data Service. The authors declare that they have no other competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional file
Additional file 1:
Figure S1. Many of the settings that the user can alter while uploading a data-set to AgriSeqDB, which control how the data-set is displayed in the landing portal and each data-viewer. (PNG 165 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Robinson, A.J., Tamiru, M., Salby, R. et al. AgriSeqDB: an online RNA-Seq database for functional studies of agriculturally relevant plant species. BMC Plant Biol 18, 200 (2018). https://doi.org/10.1186/s12870-018-1406-2
DOI: https://doi.org/10.1186/s12870-018-1406-2