AgriSeqDB: an online RNA-Seq database for functional studies of agriculturally relevant plant species
BMC Plant Biology volume 18, Article number: 200 (2018)
The genome-wide expression profile of genes in different tissues/cell types and developmental stages is a vital component of many functional genomic studies. Transcriptome data obtained by RNA-sequencing (RNA-Seq) is often deposited in public databases that are made available via data portals. Data visualization is one of the first steps in assessment and hypothesis generation. However, these databases do not typically include visualization tools, and setting one up is not trivial for users who are not computational experts. This, together with the variety of formats in which data is commonly deposited, makes data access, sharing and reuse more difficult. Our goal was to provide a simple and user-friendly repository that meets these needs for data-sets from major agricultural crops.
AgriSeqDB (https://expression.latrobe.edu.au/agriseqdb) is a database for viewing, analysing and interpreting developmental and tissue/cell-specific transcriptome data from several species, including major agricultural crops such as wheat, rice, maize, barley and tomato. The disparate manner in which public transcriptome data is often warehoused and the challenge of visualizing raw data are both major hurdles to data reuse. The popular eFP browser does an excellent job of presenting transcriptome data in an easily interpretable view, but previous implementations have mostly been made on a case-by-case basis. Here we present an integrated visualisation database of transcriptome data-sets from six species; none of these data-sets previously had a public-facing visualisation. We combine the eFP browser, for gene-by-gene investigation, with the Degust browser, which enables visualisation of all transcripts across multiple samples. The two visualisation interfaces launch from the same point, enabling users to switch easily between analysis modes. The tools allow users, even those without bioinformatics expertise, to mine the data-sets and understand the behaviour of transcripts of interest across samples and time. We have also added an image download option to simplify incorporation of figures into presentations or publications.
Powered by eFP and Degust browsers, AgriSeqDB is a quick and easy-to-use platform for data analysis and visualization in five crops and Arabidopsis. Furthermore, it provides a tool that makes it easy for researchers to share their data-sets, promoting research collaborations and data-set reuse.
RNA-sequencing (RNA-Seq) is currently the preferred technology for genome-wide transcriptional profiling due to its combined ease of use, quality of data and suitability for a diverse range of applications [1, 2]. Recent advances in next generation sequencing (NGS) technologies, coupled with decreases in the cost of sequencing, have resulted in the collection of large volumes of RNA-Seq data from many species [3, 4]. These data are typically deposited in online repositories in text- and/or table-based formats. Visualization of data is a key early step in transcriptomic analysis for many biologists, allowing examination of data quality, as well as rapid interrogation of leads and hypothesis generation. Many researchers who wish to investigate public transcriptome data are not computational experts, and for them transferring data from the formats of online repositories into visualization tools is challenging. This creates a barrier to data reuse. The eFP browser, which was first developed for in silico gene expression analysis in Arabidopsis, is an excellent piece of software for displaying transcriptome data visually. At the time of writing, 20 plant transcriptome data-sets are available publicly in dedicated eFP browsers (http://bar.utoronto.ca, [5,6,7,8,9,10,11,12,13,14,15,16,17]). Degust is a web-based data visualization tool (https://github.com/drpowell/degust) whose functionality complements that of eFP. It enables users to view all transcripts from all samples in an experiment, examine trends between samples, visualize quality-control metrics and drill down into subsets of transcripts with expression patterns of interest. These two data browsers could be integrated to provide users with an easy-to-use tool for accessing and analysing multiple data-sets and, with some enhanced functionality, they could also be used to download data and to generate high-quality images for presentations or publications.
RNA-Seq is often performed at whole plant or organ level using samples that are composed of different tissues and cell types. This approach masks cell- or tissue-specific information about transcripts, which is important for understanding the spatial regulation and functions of genes [18, 19]. Spatial resolution is also important for capturing transcripts that are expressed at extremely low levels in specific cell types and are consequently below the limit of detection in bulk tissue samples. Temporal gene expression data is another important tool, which can be used to investigate the mechanisms of genome regulation and to understand the relationships between development and gene function. These approaches have been used in functional studies aimed at deciphering regulatory and structural gene networks of diverse plant species, including forest trees and major crops such as wheat (Triticum aestivum), rice (Oryza sativa), maize (Zea mays), barley (Hordeum vulgare), and tomato (Solanum lycopersicum) [21,22,23,24,25,26,27].
Here we present AgriSeqDB (https://expression.latrobe.edu.au/agriseqdb), a web-based resource that can serve as a public portal for accessing, analysing and visualizing tissue- and cell-specific transcriptome data-sets from multiple species. Our focus in this implementation is primarily on transcriptome data-sets covering the development of seeds and fruits of agriculturally relevant species. The database integrates two existing open-source browsers and enhances their functionality. The Degust browser provides access to genome-wide expression information across samples and data-sets, aiding the discovery of new genes that could contribute to crop improvement; it also provides quality-control information. The eFP browser allows users to visualize the abundance of individual transcripts encoded by genes of interest across different samples.
Construction and content
All data-sets currently displayed in AgriSeqDB are recently published transcriptomes deposited in public databases (Fig. 2, Table 1). The data-sets were generated in six studies of seeds or fruit. The first is our own study of transcriptome changes in whole Arabidopsis seeds during germination, which provides a useful reference owing to this species' high-quality genome sequence and annotation. The other five data-sets come from major agricultural crops: a study of transcriptome changes in different tissues (aleurone, starchy endosperm, embryo, scutellum, pericarp, testa, husk and crushed cell layers) of barley grain at different stages of germination; a study of transcriptome changes associated with different cell types of maize endosperm after pollination; a study of transcriptome changes associated with seed germination and coleoptile growth in rice; a study of transcriptome changes associated with tomato fruit development; and a study of the grain/endosperm transcriptome of bread wheat. The GeneExplore (Degust) component of AgriSeqDB requires RNA-Seq data in raw count format (i.e. the number of reads per gene or transcript, not normalised) for the analyses it performs. For the Arabidopsis, wheat, rice and barley studies we used the raw count data provided by the original authors in the respective GEO/SRA repositories (Table 1); in those cases, mapping and read counting were as described previously [20, 21, 23]. For the maize and tomato data-sets, raw count data were not available from the GEO/SRA repositories, but the original sequence reads were. For these we pseudoaligned the reads and quantified counts using Kallisto against the reference tomato or maize transcriptomes (Solanum lycopersicum SL2.50 or Zea mays AGPv4), and used the resulting counts as the input to AgriSeqDB.
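To illustrate the kind of raw-count table GeneExplore (Degust) works from, the Python sketch below merges per-sample Kallisto output into one transcript-by-sample matrix. It assumes each sample was quantified separately and that its `abundance.tsv` has Kallisto's standard columns (`target_id`, `length`, `eff_length`, `est_counts`, `tpm`). Using `est_counts` rather than TPM, rounding to whole numbers, the function name `merge_kallisto_counts` and the sample and transcript names are illustrative assumptions, not the exact procedure used to build AgriSeqDB.

```python
import csv
import io


def merge_kallisto_counts(abundances: dict[str, str]) -> str:
    """Merge per-sample Kallisto abundance.tsv contents into one count CSV.

    `abundances` maps a sample name to the text of that sample's abundance.tsv.
    Returns CSV text: one row per transcript, one rounded-count column per sample.
    """
    samples = list(abundances)
    counts: dict[str, dict[str, int]] = {}
    for sample, text in abundances.items():
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        for row in reader:
            # Keep estimated read counts, not TPM: the viewer normalises counts itself.
            counts.setdefault(row["target_id"], {})[sample] = round(float(row["est_counts"]))

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["transcript_id", *samples])
    for transcript in sorted(counts):
        # A transcript absent from a sample's file is recorded as zero counts.
        writer.writerow([transcript, *(counts[transcript].get(s, 0) for s in samples)])
    return out.getvalue()


# Hypothetical two-replicate example with made-up transcript IDs and values.
header = "target_id\tlength\teff_length\test_counts\ttpm\n"
rep1 = header + "tx1\t1500\t1351\t120.4\t35.2\ntx2\t900\t751\t0\t0\n"
rep2 = header + "tx1\t1500\t1351\t98.6\t30.1\ntx2\t900\t751\t3.2\t1.9\n"
print(merge_kallisto_counts({"stage1_rep1": rep1, "stage1_rep2": rep2}), end="")
# transcript_id,stage1_rep1,stage1_rep2
# tx1,120,99
# tx2,0,3
```

With real data, each file's text would be read from disk (for example with `pathlib.Path(...).read_text()`) before being passed in; keeping the function on strings makes it easy to check in isolation.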
Utility and discussion
Our goal was to develop a publicly accessible transcriptome database that provides simple, readily available tools for functional analysis of individual target genes or sets of genes. AgriSeqDB is a highly interactive, multi-view database that can be used for various purposes, including the discovery of genes of interest. Users of AgriSeqDB can view data directly from the database server without needing to download the data and then install and configure a viewer to visualise it. However, we also provide the option for advanced users to download and install their own local AgriSeqDB for custom data-sets.
AgriSeqDB also allows users to gain a better understanding of individual genes of interest by inspecting them in GeneView (eFP) (Fig. 3), which incorporates the full existing functionality of eFP. Users can visualise expression of transcripts across all samples and so consider the relationships between samples (e.g. growth stage, tissue type, treatment). We also added an image download function that was not previously available: a single click on the Download button (Fig. 3) saves a high-resolution .png image for use in presentations or publications. We have also enabled cross-species comparisons directly from GeneView (eFP) records. When viewing a gene of interest in GeneView (eFP), users can click a button that runs a search of the Gramene database (http://www.gramene.org). This returns homologs, orthologs and paralogs drawn from 2,076,020 genes across 53 crop and model plant species, as well as a comparative phylogenetic tree.
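GeneView (eFP) conveys expression by colouring each tissue or sample region of a pictograph. The idea can be sketched in a few lines of Python, assuming a simple linear scale from yellow (lowest) to red (highest) relative to the transcript's maximum value across samples. The functions `efp_colour` and `colour_samples` and the example values are hypothetical, and the eFP browser's own colour scales and modes may differ.

```python
def efp_colour(value: float, max_value: float) -> str:
    """Return a hex colour on a linear yellow (low) to red (high) scale."""
    if max_value <= 0:
        return "#ffff00"  # no expression anywhere: everything sits at the low end
    # Clamp to [0, 1] so values above max_value stay pure red.
    fraction = min(max(value / max_value, 0.0), 1.0)
    green = round(255 * (1 - fraction))  # red stays at 255 while green fades out
    return f"#ff{green:02x}00"


def colour_samples(expression: dict[str, float]) -> dict[str, str]:
    """Colour every sample relative to the highest value for this transcript."""
    top = max(expression.values(), default=0.0)
    return {sample: efp_colour(v, top) for sample, v in expression.items()}


# Hypothetical normalised expression values for one transcript in barley grain tissues.
print(colour_samples({"embryo": 40.0, "aleurone": 10.0, "testa": 0.0}))
# {'embryo': '#ff0000', 'aleurone': '#ffbf00', 'testa': '#ffff00'}
```

Scaling to the per-transcript maximum makes the within-gene pattern easy to read, at the cost of hiding absolute differences between genes.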
Users are presented with a simple interface to query all genes using GeneExplore (Degust) (Fig. 4a), within which the extensive existing functionality of Degust is available. Filters can be applied to the data based on expression levels in individual samples, false discovery rate (FDR) and a log2 fold-change cut-off. Subsets of samples or transcripts can be selected for analysis, and the reference sample for calculating fold-changes can be chosen. MA plots comparing pairs of samples can be displayed (Fig. 4b). Data quality can be assessed by checking whether the replicates of each sample group together in the multidimensional scaling (MDS) plot (Fig. 4c). Data tables for selected transcripts can also be downloaded in .csv format for downstream analyses.
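The filtering and MA-plot views rest on simple per-transcript quantities. As an illustrative sketch (not Degust's code), the Python below computes MA-plot coordinates for one transcript in two samples, M = log2(a) - log2(b) and A = (log2(a) + log2(b)) / 2, after adding a pseudocount of 1 to avoid log2(0), and applies FDR and absolute log2 fold-change cut-offs to a results table. The pseudocount, the `Result` fields and the default thresholds (FDR at most 0.05, |log2FC| at least 1) are assumptions chosen for the example; the FDR values are taken as given, as they would come from a differential-expression test.

```python
import math
from dataclasses import dataclass


@dataclass
class Result:
    transcript: str
    log2fc: float  # log2 fold-change relative to the chosen reference sample
    fdr: float     # false discovery rate from a differential-expression test


def ma_point(a: float, b: float, pseudocount: float = 1.0) -> tuple[float, float]:
    """Return (M, A) for expression a (sample) versus b (reference)."""
    la = math.log2(a + pseudocount)
    lb = math.log2(b + pseudocount)
    return la - lb, (la + lb) / 2  # M: log-ratio, A: average log-expression


def filter_results(results: list[Result], max_fdr: float = 0.05,
                   min_abs_log2fc: float = 1.0) -> list[str]:
    """Keep transcripts passing both the FDR and the fold-change cut-off."""
    return [r.transcript for r in results
            if r.fdr <= max_fdr and abs(r.log2fc) >= min_abs_log2fc]


# Hypothetical results table.
table = [Result("tx1", 2.3, 0.001), Result("tx2", -1.5, 0.02),
         Result("tx3", 0.4, 0.0001), Result("tx4", 3.1, 0.2)]
print(filter_results(table))  # ['tx1', 'tx2']
print(ma_point(15.0, 3.0))    # (2.0, 3.0)
```

Here tx3 is highly significant but changes too little to pass the fold-change cut-off, while tx4 changes strongly but fails the FDR threshold, which is why the two filters are usually combined.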
Data-set administration (advanced usage case)
A key addition to AgriSeqDB is the data-set administration tool. For reasons of server security detailed below, this tool is available only when users download and install their own local AgriSeqDB. The eFP browser did not include an interface for uploading data, so configuration required extensive manual intervention. While Degust had its own administration tool, it was not flexible enough to accommodate eFP and the landing portal. We therefore developed a new data upload interface that covers both eFP and Degust. It allows an authenticated user (username/password) to upload new data-sets and deploy them to each of the viewers. Because the tool gives the user the ability to execute custom code on the host server, access to it should be restricted to the local database administrator and trusted users. An example configuration is shown in Fig. 5 and Additional file 1: Figure S1. A link on the AgriSeqDB landing portal takes users to a repository from which all AgriSeqDB code can be downloaded and where installation/administration instructions and help files can be viewed (Fig. 2). The direct address of the code and help repository is https://bitbucket.org/arobinson/agribiohvc. The repository includes a link to a Galaxy Project RNA-seq analysis tutorial (https://galaxyproject.org/tutorials/rb_rnaseq/), which users may find useful for preparing data when establishing their own local AgriSeqDB.
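Because GeneExplore (Degust) expects raw counts, it can help to check a table before uploading it to a local AgriSeqDB. The Python below is a hypothetical pre-upload check and is not part of the AgriSeqDB administration tool. It assumes a CSV with an identifier column followed by one column per sample, flags values that are not non-negative whole numbers (which usually indicate normalised data such as TPM or FPKM), and confirms that every replicate named in a user-supplied grouping exists in the header. The function name, grouping format, gene IDs and messages are all illustrative.

```python
import csv
import io


def check_count_table(csv_text: str, groups: dict[str, list[str]]) -> list[str]:
    """Return a list of problems found in a raw-count CSV (empty if none).

    `groups` maps a condition name to the sample columns that are its replicates.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    header, body = rows[0], rows[1:]
    samples = header[1:]  # first column holds gene or transcript IDs
    problems = []
    for condition, replicates in groups.items():
        for name in replicates:
            if name not in samples:
                problems.append(f"{condition}: column '{name}' not found")
    for line_no, row in enumerate(body, start=2):
        if len(row) != len(header):
            problems.append(f"line {line_no}: expected {len(header)} fields, got {len(row)}")
            continue
        for name, value in zip(samples, row[1:]):
            # Raw counts are non-negative whole numbers; decimals suggest normalised data.
            if not value.isdecimal():
                problems.append(f"line {line_no}: {name}='{value}' is not a raw count")
    return problems


# Hypothetical table with one normalised-looking value and one missing replicate column.
table = "gene,dry_1,dry_2,imbibed_1\nG1,10,12,40\nG2,3.5,0,7\n"
print(check_count_table(table, {"dry": ["dry_1", "dry_2"],
                                "imbibed": ["imbibed_1", "imbibed_2"]}))
# ["imbibed: column 'imbibed_2' not found", "line 3: dry_1='3.5' is not a raw count"]
```

Returning a list of messages rather than raising on the first error lets a user fix every problem in one pass before deploying the data-set to the viewers.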
We believe AgriSeqDB will be an important resource and data-reuse tool for plant biologists who seek greater insight into the roles of individual genes or groups of genes in biological processes, including comparative studies in crop species of major agricultural importance. The database will be updated periodically with more viewers and data-sets, focusing on additional tissue- and cell-specific data-sets from crop species. It currently contains RNA-Seq results from different tissues and cell types, and we plan to add transcriptome data from single-cell RNA-Seq in the future. In the long term, we envisage providing links that automatically download and display data-sets from GEO, and allowing users to upload their own data-sets, at least temporarily. All source code is freely available for reuse by advanced users.
Abbreviations
eFP: Electronic Fluorescent Pictograph
FDR: False Discovery Rate
GEO: Gene Expression Omnibus
HTML: Hypertext Markup Language
NGS: Next Generation Sequencing
References
Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009;10:57–63.
Martin LB, Fei Z, Giovannoni JJ, Rose JK. Catalyzing plant science research with RNA-Seq. Front Plant Sci. 2013;4:66.
Petryszak R, Fonseca NA, Füllgrabe A, Huerta L, Keays M, Tang YA, Brazma A. The RNASeq-er API-a gateway to systematically updated analysis of public RNA-Seq data. Bioinformatics. 2017;33:2218–20.
Langmead B, Nellore A. Cloud computing for genomic data analysis and collaboration. Nat Rev Genet. 2018;19:208–19.
Winter D, Vinegar B, Nahal H, Ammar R, Wilson GV, Provart NJ. An “electronic fluorescent pictograph” browser for exploring and analyzing large-scale biological data sets. PLoS One. 2007;2:e718.
Waese J, Fan J, Pasha A, Yu H, Fucile G, Shi R, et al. ePlant: visualizing and exploring multiple levels of data for hypothesis generation in plant biology. Plant Cell. 2017. https://doi.org/10.1105/tpc.17.00073.
Fucile G, Di Biase D, Nahal H, La G, Khodabandeh S, Chen Y, et al. ePlant and the 3D data display initiative: integrative systems biology on the world wide web. PLoS One. 2011;6:e15237.
Dean G, Cao Y, Xiang D, Provart NJ, Ramsay L, Ahad A, et al. Analysis of gene expression patterns during seed coat development in Arabidopsis. Mol Plant. 2011;4:1074–91.
Mustroph A, Zanetti ME, Jang CJ, Holtan HE, Repetti PP, Galbraith DW, et al. Profiling translatomes of discrete cell populations resolves altered cellular priorities during hypoxia in Arabidopsis. PNAS. 2009;106:18843–8.
Wilkins O, Nahal H, Foong J, Provart NJ, Campbell MM. Expansion and diversification of the Populus R2R3-MYB family of transcription factors. Plant Physiol. 2009;149:981–93.
Tran F, Penniket C, Patel RV, Provart NJ, Laroche A, Rowland O, Robert LS. Developmental transcriptional profiling reveals key insights into Triticeae reproductive development. Plant J. 2013;74:971–88.
Sibout R, Proost S, Hansen BO, Vaid N, Giorgi FM, Ho-Yue-Kuang S, et al. Expression atlas and comparative coexpression network analyses reveal important genes involved in the formation of lignified cell wall in Brachypodium distachyon. New Phytol. 2017;215:1009–25.
Champigny MJ, Sung WW, Catana V, Salwan R, Summers PS, Dudley SA, et al. RNA-Seq effectively monitors gene expression in Eutrema salsugineum plants growing in an extreme natural habitat and in controlled growth cabinet conditions. BMC Genomics. 2013;14:578.
Kagale S, Nixon J, Khedikar Y, Pasha A, Provart NJ, Clarke WE, et al. The developmental transcriptome atlas of the biofuel crop Camelina sativa. Plant J. 2016;88:879–94.
Clevenger J, Chu Y, Scheffler B, Ozias-Akins P. A developmental transcriptome map for allotetraploid Arachis hypogaea. Front Plant Sci. 2016;7:1446.
Fasoli M, Dal Santo S, Zenoni S, Tornielli GB, Farina L, Zamboni A, et al. The grapevine expression atlas reveals a deep transcriptome shift driving the entire plant into a maturation program. Plant Cell. 2012;24:3489–505.
Li P, Ponnala L, Gandotra N, Wang L, Si Y, Tausta SL, et al. The developmental dynamics of the maize leaf transcriptome. Nat Genet. 2010;42:1060.
Slane D, Kong J, Berendzen KW, Kilian J, Henschen A, Kolb M, et al. Cell type-specific transcriptome analysis in the early Arabidopsis thaliana embryo. Development. 2014;141:4831–40.
Chen J, Zeng B, Zhang M, Xie S, Wang G, Hauck A, Lai J. Dynamic transcriptome landscape of maize embryo and endosperm development. Plant Physiol. 2014;166:252–64.
Narsai R, Gouil Q, Secco D, Srivastava A, Karpievitch YV, Liew LC, et al. Extensive transcriptomic and epigenomic remodelling occurs during Arabidopsis thaliana germination. Genome Biol. 2017;18:172.
Betts NS, Berkowitz O, Liu R, Collins HM, Skadhauge B, Dockter C, et al. Isolation of tissues and preservation of RNA from intact, germinated barley grain. Plant J. 2017;91:754–65.
Zhan J, Thakare D, Ma C, Lloyd A, Nixon NM, Arakaki AM, et al. RNA sequencing of laser-capture microdissected compartments of the maize kernel identifies regulatory modules associated with endosperm cell differentiation. Plant Cell. 2015;27:513–31.
Narsai R, Secco D, Schultz MD, Ecker JR, Lister R, Whelan J. Dynamic and rapid changes in the transcriptome and epigenome during germination and in developing rice (Oryza sativa) coleoptiles under anoxia and re-oxygenation. Plant J. 2017;89:805–24.
Pfeifer M, Kugler KG, Sandve SR, Zhan B, Rudi H, Hvidsten TR, et al. Genome interplay in the grain transcriptome of hexaploid bread wheat. Science. 2014;345:1250091.
Celedon JM, Yuen M, Chiang A, Henderson H, Reid KE, Bohlmann J. Cell-type-and tissue-specific transcriptomes of the white spruce (Picea glauca) bark unmask fine-scale spatial patterns of constitutive and induced conifer defense. Plant J. 2017;92:710–26.
D’Esposito D, Ferriello F, Dal Molin A, Diretto G, Sacco A, Minio A, et al. Unraveling the complexity of transcriptomic, metabolomic and quality environmental response of tomato fruit. BMC Plant Biol. 2017;17:66.
Shinozaki Y, Nicolas P, Fernandez-Pozo N, Ma Q, Evanich DJ, Shi Y, et al. High-resolution spatiotemporal transcriptome mapping of tomato fruit development and ripening. Nat Commun. 2018;9:364.
Powell DR. https://github.com/drpowell/degust, 2013.
Bray NL, Pimentel H, Melsted P, Pachter L. Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol. 2016;34:525.
We gratefully acknowledge the excellent work of Dr. Nicholas Provart (U. Toronto) and the eFP team past and present as well as the developer of Degust, David Powell (Monash University), whose software we utilised here. We also thank them for their advice when establishing our database. We thank all members of the teams who generated the data we have displayed in AgriSeqDB, who are too numerous to list here. We thank Maoshan Chen (La Trobe University) for his help with data processing.
This work was supported by a grant from the Australian National Data Service (ANDS), as well as by in-kind contributions from La Trobe University Information and Communication Technology and the La Trobe Genomics Platform.
Availability of data and materials
The database is freely available via https://expression.latrobe.edu.au/agriseqdb. It is compatible with all popular modern web browsers and can be used on tablets and mobile phones. Database source code is available for reuse at https://bitbucket.org/arobinson/agribiohvc. The modified Degust and eFP source code used in this project is available at https://bitbucket.org/arobinson/efp and https://github.com/andrewjrobinson/degust.
Ethics approval and consent to participate
Consent for publication
Competing interests
KU declares that she is an employee of the funder, the Australian National Data Service. The authors declare that they have no other competing interests.
Figure S1. Many of the settings the user can alter while uploading a data-set to AgriSeqDB, which control how the data-set is displayed in the landing portal and in each data viewer. (PNG 165 kb)