Skip to main content

AgriSeqDB: an online RNA-Seq database for functional studies of agriculturally relevant plant species



The genome-wide expression profile of genes in different tissues/cell types and developmental stages is a vital component of many functional genomic studies. Transcriptome data obtained by RNA-sequencing (RNA-Seq) is often deposited in public databases that are made available via data portals. Data visualization is one of the first steps in assessment and hypothesis generation. However, these databases do not typically include visualization tools and establishing one is not trivial for users who are not computational experts. This, as well as the various formats in which data is commonly deposited, makes the processes of data access, sharing and utility more difficult. Our goal was to provide a simple and user-friendly repository that meets these needs for data-sets from major agricultural crops.


AgriSeqDB ( is a database for viewing, analysing and interpreting developmental and tissue/cell-specific transcriptome data from several species, including major agricultural crops such as wheat, rice, maize, barley and tomato. The disparate manner in which public transcriptome data is often warehoused and the challenge of visualizing raw data are both major hurdles to data reuse. The popular eFP browser does an excellent job of presenting transcriptome data in an easily interpretable view, but previous implementation has been mostly on a case-by-case basis. Here we present an integrated visualisation database of transcriptome data-sets from six species that did not previously have public-facing visualisations. We combine the eFP browser, for gene-by-gene investigation, with the Degust browser, which enables visualisation of all transcripts across multiple samples. The two visualisation interfaces launch from the same point, enabling users to easily switch between analysis modes. The tools allow users, even those without bioinformatics expertise, to mine into data-sets and understand the behaviour of transcripts of interest across samples and time. We have also incorporated an additional graphic download option to simplify incorporation into presentations or publications.


Powered by eFP and Degust browsers, AgriSeqDB is a quick and easy-to-use platform for data analysis and visualization in five crops and Arabidopsis. Furthermore, it provides a tool that makes it easy for researchers to share their data-sets, promoting research collaborations and data-set reuse.


RNA-sequencing (RNA-Seq) is currently the preferred technology for genome-wide transcriptional profiling due to its combined ease of use, quality of data and suitability for a diverse range of applications [1, 2]. Recent advances in next generation sequencing (NGS) technologies coupled with decreases in the cost of sequencing have resulted in collection of large volumes of RNA-Seq data from many species [3, 4]. These data are typically deposited in online repositories in formats that are text and/or table-based. Visualization of data is a key early step in transcriptomic analysis for many biologists, allowing examination of data quality, as well as rapid interrogation of leads and hypothesis generation. Many researchers who wish to investigate public transcriptome data are not computational experts, for whom transferring data from the format of online repositories to visualization tools is challenging. This creates a barrier to data reuse. The eFP browser, which was first developed for in silico gene expression analysis in Arabidopsis, is an excellent piece of software to display transcriptome data visually [5]. At the time of writing, 20 plant transcriptome data-sets are available publicly in dedicated eFP browsers (, [5,6,7,8,9,10,11,12,13,14,15,16,17]). Degust is a web-based data visualization tool that provides different functionality from eFP functionality ( It enables users to view all transcripts from all samples in an experiment, examine trends between samples, to visualize quality-control metrics and to drill down into subsets of transcripts with expression patterns of interest. These two data browsers could be integrated to provide users with an easy to use tool for accessing and analysing multiple data-sets and, by developing some enhanced functionality, they could be used for data download and to generating quality images for presentations or publications.

RNA-Seq is often performed at whole plant or organ level using samples that are composed of different tissues and cell types. This approach masks cell- or tissue-specific information about transcripts, which is important to understand spatial-regulation and functions of genes [18, 19]. Spatial resolution is also important to capture transcripts that are expressed at extremely low levels in specific cell types and that are consequently below the limit of detection in bulk samples of tissues [2]. Temporal gene expression data is also an important tool, which can be used to investigate the mechanisms of genome regulation and to understand the relationships between development and gene function [20]. These approaches have been used in functional studies aimed at deciphering regulatory and structural gene networks of diverse plant species, including forest trees and major crops such as wheat (Triticum aestivum), rice (Oryza sativa), maize (Zea mays), barley (Hordeum vulgare), and tomato (Solanum lycopersicum) [21,22,23,24,25,26,27].

Here we present AgriSeqDB (, a web-based resource that can serve as a public portal for accessing, analysing and visualizing tissue and cell-specific transcriptome data-sets from multiple species. Our focus in this implementation is primarily upon transcriptome data-sets during the development of seeds and fruits of agriculturally-relevant species. The database integrates two existing open-source browsers and enhances their functionality. The Degust browser provides access to information on genome-wide expression across samples and data-sets, aiding the discovery of new genes that can contribute to crop improvement. It also provides quality-control information. The eFP browser allows users to visualize between different samples the abundance of individual transcripts encoded by genes of interest.

Construction and content

Database/website architecture

The main structure of AgriSeqDB is described in Fig. 1. It consists of a landing portal that is implemented using an HTML frontend and Python/Django backend to present all data-sets and associated meta-data to users. The landing portal allows the user to discover the data-sets and navigate to data viewers of interest. The existing eFP browser, which has HTML (frontend) and Python (backend) tools, was selected in order to allow users to view expression data on a gene-by-gene basis [5]. Additionally, the existing tool Degust is included to allow viewing of expression profiles across all (or a subset of) genes at once [28]. Degust uses an HTML/Javascript frontend and Haskell backend. Both tools were linked and wrapped within the Landing Portal to ensure that users receive a consistent look and feel when using the portal and each viewer (Fig. 2). The source code for the landing portal and integrations with the viewers is available for reuse ( This repository makes use of git submodules to link the source code of eFP and Degust browsers, each of which was modified slightly from original versions to ensure that they link cleanly; source code for modified versions is available at and, respectively. The Landing Portal and eFP browser use a MySQL database server to store settings and data/meta-data, while Degust uses files on the file system. A central configuration portal was added to ease the loading of data-sets into the database and of the landing portal documentation, allowing organism annotation upload, data-set upload, data-set configuration such as making it private/public, providing external links and abstract etc., and deploying the data-set to what we refer to as GeneView (eFP) or GeneExplore (Degust).

Fig. 1
figure 1

High level structure of AgriSeqDB showing the linkage between data browsers and the central Landing Portal. The Landing Portal provides a central place to access all data-sets and provide meta-data that isn’t provided by data browsers. The data browsers provide access to the same data in various forms to enable greater insight

Fig. 2
figure 2

AgriSeqDB home screen showing the six data-sets from species including crops species of major agricultural importance that are currently in the database

Data sources

All data-sets displayed currently in AgriSeqDB are transcriptomes published recently and deposited in public databases (Fig. 2, Table 1). Users of AgriSeqDB can view data directly from database server without the need to download it and then install/configure a viewer to visualise it. The data-sets were generated in six studies of seeds or fruit. The first is a study we conducted of transcriptome changes in whole Arabidopsis seeds during germination, which provides a useful reference due to this species’ high-quality genome sequence and annotation [20]. Additionally, we displayed five data-sets from major agricultural crops. These were: A study of transcriptome changes in different tissues (aleurone, starchy endosperm, embryo, scutellum, pericarp, testa, husk and crushed cell layers) of barley grain at different stages of germination [21]; a study on transcriptome changes associated with different cell types of maize endosperm after pollination [22]; a study on transcriptome changes associated with seed germination and coleoptile growth in rice [23]; a study on transcriptome changes associated with fruit development in tomato fruit [26]; and a study on grain/endosperm transcriptome of bread wheat [24]. The GeneExplore (Degust) component of AgriSeqDB requires RNA-seq data in raw count format (i.e. number of reads per gene or transcript, not normalised) for the subsequent analyses it applies. We made use of the raw count data provided by the original authors on the respective GEO/SRA repositories for the Arabidopsis, wheat, rice and barley studies (Table 1). In those cases, mapping and read counting were consequently as described previously [20, 21, 23]. In the case of the maize and tomato data-sets, raw count data were not available from the GEO/SRA repositories, but the original sequence reads were. For these we aligned and quantified the count data using Kallisto with the reference tomato or maize transcriptomes (Solanum lycopersicum SL2.50 or Zea mays AGPv4), using the resulting data as the input to AgriSeqDB [29].

Table 1 RNA-Seq data-sets included in AgriSeqDB

Utility and discussion

Our goal was to develop a publicly accessible transcriptome database that provides simple and readily available tools to perform functional analysis of individual target genes or sets of genes. AgriSeqDB is a highly interactive and multi-view database that can be used for various purposes, including the discovery of genes of interest. Users of AgriSeqDB can view data directly from database server without the need to download it and then install/configure a viewer to visualise it. However, we provide the option for advanced users to download and install their own local AgriSeqDB for custom data-sets.

GeneView (eFP)

AgriSeqDB also allows users to get a better understanding of individual genes of interest, by inspecting them within GeneView (eFP) (Fig. 3). This incorporates the full existing functionality of eFP [5]. Users can visualise expression of transcripts across all samples so that they may consider the relationships between samples (i.e. growth stage, tissue type, various treatments). Additionally, we incorporated an additional image download function, not previously available. Images may be downloaded in high-resolution .png format for presentations or publications. This is done by single clicking the Download button (Fig. 3). We have also enabled cross-species comparisons directly from the GeneView (eFP) records. When users are viewing a gene that interests them within GeneView (eFP), they can click on a button that directly returns a search from the Gramene database ( This returns homologs, orthologs and paralogs drawn from 2,076,020 genes across 53 crop and model plant species, as well as a comparative phylogenetic tree.

Fig. 3
figure 3

The full screenshot showing AT2G40170 gene expression in GeneView (eFP) browser. The user uses the search form at the top to select the gene of interest and select the mode of operation including: (1) absolute, shows the counts as stored in the database for the primary gene, (2) relative, shows the counts relative to the control for primary gene, and (3) compare, counts as a ratio between the primary and secondary genes. Clicking the view button updates the figure below to show the expression levels of each sample by colour coding the fill area with a scale red-yellow (for absolute) and red-grey-blue (for relative and compare). Alternatively, the user can click the download button (indicated by a green arrowhead) to download the expression image at twice the resolution as shown on-screen (ready for publication). Data is from transcriptome of Arabidopsis seeds during germination [20]

GeneExplore (Degust)

Users are presented with a simple interface to query all genes using GeneExplore (Degust) (Fig. 4a). Extensive existing functionality is available to users within Degust [28]. Filters can be created on the data based upon expression levels in individual samples, false discovery rate (FDR) and Log2 fold-change cut-off. Sub-sets of samples or transcripts can be selected for analysis can be analysed and the sample for referencing fold-change can be selected. MA plots of comparisons between pairs of samples can be displayed (Fig. 4b). Data quality metrics can be assessed by inspecting whether the replicates of each sample group together in the multidimensional scaling (MDS) plot (Fig. 4c). Data tables can also be downloaded for selected transcripts in .csv format for downstream analyses.

Fig. 4
figure 4

Screen shot of the GeneExplore (Degust) browser and subsequent result pages. a the Arabidopsis time-series data-set is shown here as an example, displaying transcripts that are up-regulated in S samples but down-regulated in SL samples (Top panel). The user can select which samples they wish to see with the checkboxes in the top left of screen along with the method of analysis (voom/limma, edgeR, or voom). In the top right, the user can control the rendering and thresholds of using the options dialog. All genes that match filters above are shown in a heat-map, which clusters genes with similar levels of expression (Middle panel). Running the mouse-over each gene highlights it in the plots above. Table showing all matching genes in tabular format with the expression levels for each sample, false discovery rate and any extra annotation columns provided in the data-set (Lower panel). In the top centre the user can limit genes by using 1 of 3 interactive plots, and the parallel coordinates plot allows the user to limit genes by their log fold gene expression (per sample). b Example of an MA plot. Users can limit genes by drawing a box around genes on the on the MA plot; the two samples used for the MA plot are specified in the Options dialog (top right). c An MDS plot showing groupings of the individual replicates of each sample. Data is from transcriptome of Arabidopsis seed during germination [20]

Data-set administration (advanced usage case)

One key inclusion to AgriSeqDB is the data-set administration tool. This tool is available only when users download and install their own local AgriSeqDB, for reasons of server security detailed below. eFP browser did not contain an interface to upload data, so configuration required much manual interaction. While Degust contained its own administration tool, it was not flexible enough to accommodate eFP and the landing portal. Consequently, we developed a new data upload interface to encompass both eFP and Degust. This allows the user (secured by username/password) to upload new data-sets and deploy them to each of the viewers. The tool provides the user the ability to execute custom code on the host server, access to which should be restricted to the local database administrator and trusted users. An example configuration is included in Fig. 5 and Additional file 1: Figure S1. A link is provided on our AgriSeqDB landing portal that takes users to a repository from which all AgriSeqDB code can be downloaded and from where installation/administration instructions and help files can be viewed (Fig. 2). The direct address of the code and help repository is The repository includes a link from which users can access a Galaxy Project RNA-seq analysis tutorial (, which users may find useful to prepare data when establishing their own local AgriSeqDB.

Fig. 5
figure 5

The process of uploading a data-set to a local installation of AgriSeqDB consists of 3-steps. First, the user selects a unique identifier and display name for the data-set (a). Second, the user chooses the count file and various eFP images used to display expression values from their PC to upload (b). Finally, the user can alter many settings that control how the data-set is displayed in the landing portal and each data-viewer as shown in Additional file 1 (see Supplementary Fig. 1). Most settings have sensible defaults where user input is required, time saving tools (such as colour picker & clickable image for eFP settings) or spreadsheet import/export (sample settings)


We believe AgriSeqDB will be an important resource and data-reuse tool for plant biologists who seek greater insights into the role of individual genes or group of genes in biological processes, including for comparative studies in crop species of major agricultural importance. The databases will be periodically updated with more viewers and data-sets, focusing on additional tissue and cell-specific data-sets from crop species. The database currently contains results of RNA-Seq from different tissues and cell types, and it is planned that transcriptome data from single cell RNA-Seq will be added in the future. In the long term it is envisaged that users will be provided with links to GEO auto-download and view as well as allowed to upload data-sets at least temporarily. All source code is freely available for reuse by advanced users.



Electronic Fluorescent Pictograph


False Discovery Rate


Gene Expression Omnibus


Hypertext Markup Language


Multi-Dimensional Scaling


Next Generation Sequencing


RNA Sequencing


  1. Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nature Rev Genet. 2009;10:5–63.

    Article  CAS  Google Scholar 

  2. Martin LB, Fei Z, Giovannoni JJ, Rose JK. Catalyzing plant science research with RNA-Seq. Front Plant Sci. 2013;4:66.

    Article  Google Scholar 

  3. Petryszak R, Fonseca NA, Füllgrabe A, Huerta L, Keays M, Tang YA, Brazma A. The RNASeq-er API-a gateway to systematically updated analysis of public RNA-Seq data. Bioinformatics. 2017;33:2218–20.

    Article  Google Scholar 

  4. Langmead B, Nellore A. Cloud computing for genomic data analysis and collaboration. Nature Rev Genet. 2018;19:208–19.

    Article  CAS  Google Scholar 

  5. Winter D, Vinegar B, Nahal H, Ammar R, Wilson GV, Provart NJ. An “electronic fluorescent pictograph” browser for exploring and analyzing large-scale biological data sets. PLoS One. 2007;2:e718.

    Article  Google Scholar 

  6. Waese J, Fan J, Pasha A, Yu H, Fucile G, Shi R, et al. ePlant: visualizing and exploring multiple levels of data for hypothesis generation in plant biology. Plant Cell. 2017;

  7. Fucile G, Di Biase D, Nahal H, La G, Khodabandeh S, Chen Y, et al. ePlant and the 3D data display initiative: integrative systems biology on the world wide web. PLoS One. 2011;6:e15237.

    Article  CAS  Google Scholar 

  8. Dean G, Cao Y, Xiang D, Provart NJ, Ramsay L, Ahad A, et al. Analysis of gene expression patterns during seed coat development in Arabidopsis. Mol Plant. 2011;4:1074–91.

    Article  CAS  Google Scholar 

  9. Mustroph A, Zanetti ME, Jang CJ, Holtan HE, Repetti PP, Galbraith DW, et al. Profiling translatomes of discrete cell populations resolves altered cellular priorities during hypoxia in Arabidopsis. PNAS. 2009;106:18843–8.

    Article  CAS  Google Scholar 

  10. Wilkins O, Nahal H, Foong J, Provart NJ, Campbell MM. Expansion and diversification of the Populus R2R3-MYB family of transcription factors. Plant Physiol. 2009;149:981–93.

    Article  CAS  Google Scholar 

  11. Tran F, Penniket C, Patel RV, Provart NJ, Laroche A, Rowland O, Robert LS. Developmental transcriptional profiling reveals key insights into Triticeae reproductive development. Plant J. 2013;74:971–88.

    Article  CAS  Google Scholar 

  12. Sibout R, Proost S, Hansen BO, Vaid N, Giorgi FM, Ho-Yue-Kuang S, et al. Expression atlas and comparative coexpression network analyses reveal important genes involved in the formation of lignified cell wall in Brachypodium distachyon. New Phytol. 2017;215:1009–25.

    Article  CAS  Google Scholar 

  13. Champigny MJ, Sung WW, Catana V, Salwan R, Summers PS, Dudley SA, et al. RNA-Seq effectively monitors gene expression in Eutrema salsugineum plants growing in an extreme natural habitat and in controlled growth cabinet conditions. BMC Genomics. 2013;14:578.

    Article  CAS  Google Scholar 

  14. Kagale S, Nixon J, Khedikar Y, Pasha A, Provart NJ, Clarke WE, et al. The developmental transcriptome atlas of the biofuel crop Camelina sativa. Plant J. 2016;88:879–94.

    Article  CAS  Google Scholar 

  15. Clevenger J, Chu Y, Scheffler B, Ozias-Akins P. A developmental transcriptome map for allotetraploid Arachis hypogaea. Front Plant Sci. 2016;7:1446.

    Article  Google Scholar 

  16. Fasoli M, Dal Santo S, Zenoni S, Tornielli GB, Farina L, Zamboni A, et al. The grapevine expression atlas reveals a deep transcriptome shift driving the entire plant into a maturation program. Plant Cell. 2012;24:3489–505.

    Article  CAS  Google Scholar 

  17. Li P, Ponnala L, Gandotra N, Wang L, Si Y, Tausta SL, et al. The developmental dynamics of the maize leaf transcriptome. Nat Genet. 2010;42:1060.

    Article  CAS  Google Scholar 

  18. Slane D, Kong J, Berendzen KW, Kilian J, Henschen A, Kolb M, et al. Cell type-specific transcriptome analysis in the early Arabidopsis thaliana embryo. Development. 2014;41:4831–40.

    Article  Google Scholar 

  19. Chen J, Zeng B, Zhang M, Xie S, Wang G, Hauck A, Lai J. Dynamic transcriptome landscape of maize embryo and endosperm development. Plant Physiol. 2014;166:252–64.

    Article  Google Scholar 

  20. Narsai R, Gouil Q, Secco D, Srivastava A, Karpievitch YV, Liew LC, et al. Extensive transcriptomic and epigenomic remodelling occurs during Arabidopsis thaliana germination. Genome Biol. 2017;18:172.

    Article  Google Scholar 

  21. Betts NS, Berkowitz O, Liu R, Collins HM, Skadhauge B, Dockter C, et al. Isolation of tissues and preservation of RNA from intact, germinated barley grain. Plant J. 2017;91:754–65.

    Article  CAS  Google Scholar 

  22. Zhan J, Thakare D, Ma C, Lloyd A, Nixon NM, Arakaki AM, et al. RNA sequencing of laser-capture microdissected compartments of the maize kernel identifies regulatory modules associated with endosperm cell differentiation. Plant Cell. 2015;27:513–31.

    Article  CAS  Google Scholar 

  23. Narsai R, Secco D, Schultz MD, Ecker JR, Lister R, Whelan J. Dynamic and rapid changes in the transcriptome and epigenome during germination and in developing rice (Oryza sativa) coleoptiles under anoxia and re-oxygenation. Plant J. 2017;89:805–24.

    Article  CAS  Google Scholar 

  24. Pfeifer M, Kugler KG, Sandve SR, Zhan B, Rudi H, Hvidsten TR, et al. Genome interplay in the grain transcriptome of hexaploid bread wheat. Science. 2014;345:1250091.

    Article  Google Scholar 

  25. Celedon JM, Yuen M, Chiang A, Henderson H, Reid KE, Bohlmann J. Cell-type-and tissue-specific transcriptomes of the white spruce (Picea glauca) bark unmask fine-scale spatial patterns of constitutive and induced conifer defense. Plant J. 2017;92:710–26.

    Article  CAS  Google Scholar 

  26. D’Esposito D, Ferriello F, Dal Molin A, Diretto G, Sacco A, Minio A, et al. Unraveling the complexity of transcriptomic, metabolomic and quality environmental response of tomato fruit. BMC Plant Biol. 2017;17:66.

    Article  Google Scholar 

  27. Shinozaki Y, Nicolas P, Fernandez-Pozo N, Ma Q, Evanich DJ, Shi Y, et al. High-resolution spatiotemporal transcriptome mapping of tomato fruit development and ripening. Nat Commun. 2018;9:364.

    Article  Google Scholar 

  28. Powell DR., 2013.

  29. Bray NL, Pimentel H, Melsted P, Pachter L. Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol. 2016;34:525.

    Article  CAS  Google Scholar 

Download references


We gratefully acknowledge the excellent work of Dr. Nicholas Provart (U. Toronto) and the eFP team past and present as well as the developer of Degust, David Powell (Monash University), whose software we utilised here. We also thank them for their advice when establishing our database. We thank all members of the teams who generated the data we have displayed in AgriSeqDB, who are too numerous to list here. We thank Maoshan Chen (La Trobe University) for his help with data processing.


This work was supported by a grant from the Australian National Data Service (ANDS) grant as well as by in-kind contributions from La Trobe University Information and Communication Technology and the La Trobe Genomics Platform.

Availability of data and materials

The database is freely available via It is compatible with all modern popular web browsers and possible to use by tablets and mobile/cell phones. Database source code is available for reuse at Modified Degust and eFP source code used in this project is available at and

Author information

Authors and Affiliations



AJR developed the data portal, created illustrations (for eFP) and uploaded data. MT contributed to project development. MGL and JW conceived the project and provided scientific direction. MGL conducted tool research. CB annotated Meta-data and RDA records. RS, AW, SH, EF and KU participated in project management. AJR, MT, MGL and JW wrote the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Mathew G. Lewsey.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

KU declares that she is an employee of the funder, the Australian National Data Service. The authors declare that they have no other competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional file

Additional file 1:

Figure S1 Many of the setting the user can alter during the process of uploading a data-set to AgriSeqDB to control how the data-set is displayed in the landing portal and each data-viewer. (PNG 165 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Robinson, A.J., Tamiru, M., Salby, R. et al. AgriSeqDB: an online RNA-Seq database for functional studies of agriculturally relevant plant species. BMC Plant Biol 18, 200 (2018).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: