- Research
- Open access
FEAtl: a comprehensive web-based expression atlas for functional genomics in tropical and subtropical fruit crops
BMC Plant Biology volume 24, Article number: 890 (2024)
Abstract
Background
Fruit crops, including tropical and subtropical fruits such as Avocado (Persea americana), Fig (Ficus carica), Date Palm (Phoenix dactylifera), Mango (Mangifera indica), Guava (Psidium guajava), Papaya (Carica papaya), Pineapple (Ananas comosus), and Banana (Musa acuminata), are economically vital and contribute significantly to global agricultural output, as classified by the FAO’s World Programme for the Census of Agriculture. Advancements in next-generation sequencing have transformed fruit crop breeding by providing in-depth genomic and transcriptomic data. RNA sequencing enables high-throughput analysis of gene expression and supports functional genomics, which is crucial for addressing horticultural challenges and enhancing fruit production. However, no comprehensive expression atlas currently exists for the genomic and expression data of key tropical and subtropical fruit crops, leaving horticulturists without a unified platform that brings together diverse datasets across conditions and cultivars.
Results
The Fruit Expression Atlas (FEAtl), available at http://backlin.cabgrid.res.in/FEAtl/, is the first extensive, unified expression atlas for tropical and subtropical fruit crops and is built on a three-tier architecture. Covering the expression of coding and non-coding genes in 2,060 RNA-Seq samples from 91 tissue types and 177 BioProjects, it provides a comprehensive view of gene expression patterns in different tissues under various conditions. FEAtl is organized into several tabs (Home, About, Analyze, Statistics, and Team) and contains seven central functional modules: Transcript Information, Sample Information, Expression Profiles in FPKM and TPM, Functional Analysis, Genes Based on Tau Score, and Search for Specific Gene. The expression of a transcript of interest can be queried by tissue ID and transcript type. Expression data can be displayed as a heat map, together with functional descriptions and Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) annotations.
Conclusions
This atlas brings together a wide array of information on eight fruit crops. It serves as a fundamental resource for comparative analysis across fruit species and as a catalyst for functional genomic studies. Database availability: http://backlin.cabgrid.res.in/FEAtl/.
Background
Fruits, regarded as nature’s most prized offerings to humanity [1], hold a significant position in the global agricultural landscape in terms of their nutritional value and per capita availability [2]. Beyond their edible and nourishing qualities, fruits have also gained immense symbolic and cultural importance [3]. The cultivation of horticultural fruit crops has played a pivotal role in recent advances in health and socioeconomic development [2]. With the integration of genomics into fruit-crop research, this sector is projected to be the fastest-growing in agriculture. The cultivation of fruit crops not only contributes to individual well-being but also affects the overall progress of nations [2, 4]. The production and per capita consumption of fruits directly influence a country’s standard of living [4]. Fruits and vegetables are integral components of a healthy diet and have been linked to a reduced risk of chronic diseases [5]. Despite these health benefits, however, many people do not include enough fruits and vegetables in their daily meals. The World Health Organization recommends a minimum intake of 400 g, or five portions, of fruits and vegetables per day, emphasizing the importance of consuming a diverse range of produce to obtain a variety of essential nutrients [6]. Strategies to promote fruit and vegetable intake are therefore essential for health, and well-planned, behavior-focused nutrition education can be effective in increasing that intake [7].
The WHO and FAO play an important role in identifying health risks and issues through comprehensive surveys and examinations, and in providing guidance on diet and physical activity. The WHO report presents an ambitious policy that aims to combat the escalating rates of chronic diseases in less industrialized nations; consequently, further research is urgently needed to determine effective interventions in resource-poor environments [8]. In recognition of the nutritional and health benefits of fruits and vegetables, the United Nations General Assembly designated 2021 as the International Year of Fruits and Vegetables. This initiative aims to raise awareness of the importance of fruits and vegetables in a balanced, healthy diet and lifestyle. The campaign also focuses on reducing losses and waste within the fruit and vegetable sector, while emphasizing the economic, social, and environmental advantages of increasing their production and consumption. The FAO advocates a comprehensive food systems approach to enhance nutrition and address challenges including urbanization, climate change, and food shortages. This approach examines agriculture, food supply chains, food environments, and consumer behavior, while integrating sustainable practices throughout production, harvesting, postharvest handling, processing, and consumption. By providing a framework and initiating discussions, the FAO highlights the interconnectedness of stakeholders and the key issues to be considered for action during the International Year of Fruits and Vegetables 2021. The primary objective of this initiative is to draw policy attention to the importance of fruits and vegetables in our diets and to facilitate the sharing of successful practices [9].
Fruit crops hold substantial economic value and play a pivotal role in regional and global economies, with high marketable yields and significant contributions to agricultural production. In its World Programme for the Census of Agriculture 2020, the United Nations’ Food and Agriculture Organization (FAO) categorizes key fruits and nuts, highlighting tropical and subtropical fruits such as Avocado (Persea americana), Fig (Ficus carica), Date Palm (Phoenix dactylifera), Mango (Mangifera indica), Guava (Psidium guajava), Papaya (Carica papaya), Pineapple (Ananas comosus), and Banana (Musa acuminata) [10].
The advent of low-cost sequencing machines in the genomic era has provided the scientific community with a tremendous amount of genomic data. The Central Dogma of gene expression involves two main stages: transcription, which converts DNA into RNA, and translation, in which RNA is translated into protein [11]. Transcripts serve as the intermediaries between DNA and protein synthesis [12, 14]. Transcription plays a crucial role in regulating gene activity, essentially determining when genes are activated or deactivated and thereby defining the cell’s identity and condition [13]. The coding region of a transcript, bounded by the start and stop codons, is known as the coding sequence (CDS), while the untranslated regions (UTRs) play a crucial role in post-transcriptional gene regulation [15]. The CDS is the part of the gene that encodes the protein, and its length, codon usage, nucleosome positioning, and post-transcriptional modifications can all affect translation initiation, elongation, and overall protein abundance [16, 17]. During translation, ribosomes read the mRNA sequence of the CDS and use it as a template to assemble the corresponding amino acids into a polypeptide chain, which eventually folds into a functional protein [18]. The UTRs are non-coding regions of mRNA that flank the CDS: the 5’ UTR lies at the 5’ end of the mRNA and is involved in translation initiation, whereas the 3’ UTR lies at the 3’ end and is important for regulating mRNA stability, localization, and translation efficiency [15, 19, 20].
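To make the CDS/UTR layout concrete, here is a minimal Python sketch that splits a transcript sequence into 5’ UTR, CDS, and 3’ UTR. It assumes the CDS begins at the first ATG that has an in-frame stop codon downstream; this is a simplification for illustration only, since in practice CDS boundaries are taken from the reference annotation (GTF/GFF).

```python
# Sketch: split an mRNA into 5' UTR, CDS and 3' UTR.
# Assumption: the CDS starts at the first ATG followed by an in-frame stop codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_transcript(seq: str) -> tuple[str, str, str]:
    """Return (utr5, cds, utr3); cds is empty if no complete ORF is found."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabet
    start = seq.find("ATG")
    while start != -1:
        # Walk codon by codon from the start codon until a stop codon appears.
        for pos in range(start, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                end = pos + 3  # the CDS includes the stop codon
                return seq[:start], seq[start:end], seq[end:]
        start = seq.find("ATG", start + 1)  # no stop in frame: try the next ATG
    return seq, "", ""


utr5, cds, utr3 = split_transcript("GGCAUGGCUUGGUAAGCCA")
print(utr5, cds, utr3)  # GGC ATGGCTTGGTAA GCCA
```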
The study of the transcriptome, including mRNA, miRNA, lncRNA, and small RNA, is essential for understanding biological pathways and disease processes [21].
Tissue-specificity studies in plants are crucial for understanding the contributions of specific tissues to overall metabolism and gene expression [22], and for understanding the molecular basis of plant development, function, and adaptation. Plant tissues consist of many different cell types, each with specific functions, and identifying and characterizing tissue-specific genes can provide valuable insights into the molecular mechanisms that govern these processes. Tissue-specific genes are often associated with specialized cellular functions and can serve as important biomarkers for specific tissues or diseases [23, 24]. Developing and benchmarking tissue-specificity metrics, such as tau, the Gini coefficient, and counts, is crucial for accurately quantifying the tissue specificity of gene expression. Such benchmarks allow different methods to be compared systematically and help identify the most robust and informative approaches for characterizing expression patterns across tissues; in this comparison, tau stands out as the best-performing metric [25]. Tissue-specific gene expression has several applications in agriculture, including the use of tissue-specific promoters in molecular farming to enhance agronomic traits and drive the production of proteins and secondary metabolites in plants [26], improved breeding of crops with desirable traits [27], and genome editing [28].
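As an illustration of two of the alternative metrics named above, the sketch below computes a Gini coefficient and a simple counts score for one gene’s per-tissue expression vector. The rescaling by n/(n-1) and the expression threshold of 1.0 are assumptions chosen for the example, not parameters taken from the benchmark in [25].

```python
# Sketch of two alternative tissue-specificity metrics for one gene.
# Assumptions: values are non-negative per-tissue expression levels (e.g. mean
# FPKM) and a tissue counts as "expressed" above a hypothetical threshold of 1.0.


def gini(values: list[float]) -> float:
    """Gini coefficient rescaled by n/(n-1): 1.0 when only one tissue
    expresses the gene, 0.0 when all tissues express it equally."""
    n = len(values)
    if n < 2:
        return 0.0
    mean = sum(values) / n
    if mean == 0:
        return 0.0  # unexpressed gene: treated as non-specific in this sketch
    abs_diffs = sum(abs(a - b) for a in values for b in values)
    return abs_diffs / (2 * n * n * mean) * n / (n - 1)


def counts(values: list[float], threshold: float = 1.0) -> int:
    """Number of tissues in which the gene is expressed above `threshold`;
    fewer expressing tissues means higher specificity."""
    return sum(1 for v in values if v > threshold)


profile = [10.0, 0.0, 0.0, 0.0]  # expressed in a single tissue
print(gini(profile), counts(profile))  # 1.0 1
print(gini([5.0, 5.0, 5.0, 5.0]))      # 0.0
```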
The advancement of genomic and transcriptomic resources has bridged the gap between sequence information from sequencing projects and functional genomics. Next-generation sequencing (NGS) technologies, including second- and third-generation sequencing, have significantly improved the genome sequencing of fruits [29]. This progress has been instrumental in developing genomics-assisted breeding programs, facilitated by the greater availability of genomic and transcriptomic data for various fruit species [30]. RNA sequencing (RNA-Seq) is a powerful technology for studying gene expression in a high-throughput manner. It has revolutionized genomics by providing a more accurate and comprehensive picture of gene expression than traditional methods, enabling the detection of differentially expressed genes, alternative splicing events, and gene isoforms, and thus valuable insights into gene regulation and function [31, 32]. Moreover, RNA-Seq has a wide dynamic range, making it suitable for detecting rare and lowly expressed transcripts [33].
The analysis of gene expression in fruit crops is crucial for a multitude of reasons. Firstly, it enables the elucidation of genetic factors underlying horticultural and agronomic challenges, which are pivotal for enhancing fruit production and crop improvement strategies [34]. Secondly, such studies are instrumental in pinpointing key functional and regulatory genes linked to vital traits like disease resistance, stress tolerance, fruit quality, and ripening processes [35]. Moreover, comparative analyses of gene expression can shed light on the potential reconfiguration or repurposing of existing genetic pathways, paving the way for the development of novel and varied fruit phenotypes [36]. Transcriptome analysis is particularly valuable in identifying genes that exhibit differential expression associated with alternate bearing, a condition in which fruit trees alternate between high and low yield years [37]. Gene expression analysis facilitates the identification of genes and genetic markers associated with desirable traits such as fruit quality, nutritional content, disease resistance, and pest control. This, in turn, aids in the development of varieties that can better withstand changing environmental conditions, benefiting farmers, consumers, and the environment [38,39,40]. Lastly, gene expression studies can also be utilized to identify potential markers for assessing the physiological ripeness status of fruits [41].
Despite significant progress in next-generation sequencing (NGS) and the benefits it offers for understanding gene expression, research specifically focused on gene expression in fruits remains conspicuously scarce, highlighting a critical need for more research in this domain. The genomic and expression data for fruit crops such as Avocado, Banana, Guava, Date, Figs, Papaya, and Pineapple currently lack a comprehensive, dedicated expression atlas. In contrast, mango is covered by MangoBase, which has analysed 12 datasets from 11 BioProjects comprising 80 samples of mango fruit. These studies examine various stages, including changes in pulp firmness and sweetness, peel coloration, and the effects of hot-water postharvest treatment and infection with C. gloeosporioides. Dedicated solely to mango, this database concentrates on fruit tissue and has identified roughly 340 coding sequences from the transcripts [42]. This limited focus highlights a significant gap in resources for horticulturists, who need a unified platform offering a wide array of datasets that span various tissues and include data from different treatments and conditions for each cultivar of most fruit crops. Tissue-specificity metrics also need to be implemented to broaden research on each tissue. Beyond MangoBase, no dedicated expression atlas exists for any tropical or subtropical fruit crop.
In our study, we have broadened existing horticultural research to include all tropical and subtropical fruit crops identified by the FAO that have completely annotated genomes. Our study covers eight such fruit crops, namely avocados, bananas, guava, dates, figs, mangoes, papayas, and pineapples. This expansion covers 177 BioProjects, an approximately 15-fold increase over MangoBase, representing 2,060 samples, a 26-fold increase. The analysis encompasses every identifiable tissue type within these fruits, with undetermined tissue types categorized as ‘unknown.’ On the genomic side, our study has analysed the expression of, and provided functional annotations for, the coding sequences (CDS) and untranslated regions (UTR) of a large number of transcripts. The resulting dataset provides extensive biological insights, covering a broad spectrum of transcripts and tissue types across eight tropical and subtropical fruit crops. The study also enriches the database with tissue-specific genes by analysing tissue-specificity metrics, including the Tau score and the Tissue-specificity index (TSI). The proposed Fruit Expression Atlas (FEAtl) is a pioneering, comprehensive gene expression database focused on these FAO-recognized tropical and subtropical fruit crops: avocados, bananas, guava, dates, figs, mangoes, papayas, and pineapples. FEAtl supports several Sustainable Development Goals (SDGs). It enhances food security and sustainable agriculture (SDG 2) by aiding breeding programs for improved yield and resilience, and it contributes to good health (SDG 3) by ensuring the availability of nutritious fruits. The database promotes economic growth (SDG 8) by fostering innovation in agriculture and supports responsible consumption and production (SDG 12) through sustainable practices.
Additionally, FEAtl aids climate action (SDG 13) by helping develop climate-resilient crops. This comprehensive resource offers a global perspective on gene expression patterns across all major tissues of these fruit crops. It is a valuable tool for horticulturists, providing deep insights into the coding and non-coding genes of these fruits, including variation across cultivars and under various biotic and abiotic stress conditions. By improving understanding of functional genomics and transcriptomics in these crops, FEAtl can contribute significantly to research on the development of these fruits, facilitate the international exchange of horticultural commodities and resources, and ultimately benefit growers worldwide.
Construction and content
Data retrieval and pre-processing
Raw RNA-Seq data in FASTQ format, from both single- and paired-end libraries generated in various countries, were collected for avocados, bananas, guavas, dates, figs, mangoes, papayas, and pineapples from public repositories, namely the Sequence Read Archive (SRA) [43], the European Nucleotide Archive (ENA) [44], and Ensembl Plants [45]. Corresponding reference genome files (FASTA) and annotation files (GTF/GFF) were obtained from NCBI and Ensembl Plants. The samples were categorized by tissue and then processed collectively. Figure 1 illustrates the global distribution and tissue-wise categorization of the RNA-seq data retrieved for these fruit crop species; pie charts group the data into broad tissue types for each species, giving a comparative view of tissue representation and of the relative composition of tissue-specific sequences within each species’ RNA-seq data (Fig. 1). Table 1 lists the raw dataset used in the study. The Fruit Expression Atlas was constructed using a state-of-the-art workflow. Raw reads first undergo a rigorous quality check, followed by trimming and adapter removal. The processed reads are then aligned to the reference genome, and gene expression levels are quantified as FPKM/TPM. These values form the basis for the subsequent steps: coding potential calculation, abundance counting, functional annotation and pathway analysis, and tissue-specificity scoring, which together yield the comprehensive Fruit Expression Atlas. The overall framework used to develop FEAtl is depicted in Fig. 2.
Each step of the methodology is discussed in detail below.
Prior to analysis, each set of retrieved reads underwent preprocessing. FastQC version 0.11.8 [46] was used to visualize read parameters, including low-quality bases and Illumina adapters, and to assess the distribution of Phred base quality scores and the average base content per read. Reads were then cleaned and trimmed according to the FastQC reports using Trimmomatic v0.39 [47]. Reads shorter than 40 base pairs, reads with a Phred score below 30, and reads exhibiting consistent noise were excluded from downstream analysis. This ensures that only high-quality reads are used for further analysis, which is essential for accurate results.
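The filtering criteria above can be illustrated with a short Python sketch. It assumes Phred+33 quality encoding (the standard for current Illumina FASTQ files) and, as a simplification, keeps or drops whole reads by length and mean quality; Trimmomatic itself trims reads (for example with sliding-window quality trimming) rather than applying exactly this rule, and the “consistent noise” criterion is not modelled.

```python
# Sketch of read-level quality filtering, assuming Phred+33 quality strings.
# Simplification: whole reads are kept or dropped using the thresholds from
# the text (length >= 40 bp, mean Phred >= 30); Trimmomatic trims instead.

MIN_LENGTH = 40
MIN_MEAN_PHRED = 30


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode an ASCII quality string into per-base Phred scores."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: float) -> float:
    """Invert Q = -10 * log10(P): a base with Q30 has a 1-in-1000 error rate."""
    return 10 ** (-q / 10)


def keep_read(seq: str, quality: str) -> bool:
    """True if the read is long enough and its mean base quality is high enough."""
    scores = phred_scores(quality)
    if len(seq) < MIN_LENGTH or not scores:
        return False
    return sum(scores) / len(scores) >= MIN_MEAN_PHRED


good = ("ACGT" * 10, "I" * 40)   # 'I' encodes Q40 under Phred+33
short = ("ACGT" * 5, "I" * 20)   # only 20 bp long
poor = ("ACGT" * 10, "5" * 40)   # '5' encodes Q20
print([keep_read(s, q) for s, q in (good, short, poor)])  # [True, False, False]
```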
Read alignment, mapping and estimation of gene expression profiles
The reference genome was first indexed using HISAT2 [48]. The processed RNA-Seq reads were then mapped to the indexed genome with HISAT2 using the ‘--dta’ option, which reports alignments tailored for downstream transcript assemblers. The resulting SAM files were converted to BAM format using Samtools [49], and the BAM files were sorted and indexed for further analysis.
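A hypothetical Python helper that assembles the alignment commands for one paired-end sample is sketched below. File names, the index prefix, and the thread count are placeholders, and the authors’ exact command lines are not given in the text; the options used are standard HISAT2 and SAMtools options. `samtools sort` reads SAM directly and writes BAM when the output name ends in `.bam`, so conversion and sorting happen in one step here.

```python
import shlex


def alignment_commands(genome_fa: str, index_prefix: str, r1: str, r2: str,
                       sample: str, threads: int = 4) -> list[list[str]]:
    """Return the command lines for indexing, alignment, sorting and BAM indexing."""
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    return [
        # 1. Build the HISAT2 index from the reference FASTA (once per genome).
        ["hisat2-build", genome_fa, index_prefix],
        # 2. Align paired-end reads; --dta tailors alignments for transcript
        #    assemblers such as StringTie.
        ["hisat2", "-p", str(threads), "--dta", "-x", index_prefix,
         "-1", r1, "-2", r2, "-S", sam],
        # 3. Sort alignments by coordinate; the .bam suffix selects BAM output.
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        # 4. Index the sorted BAM (indexing requires coordinate-sorted input).
        ["samtools", "index", bam],
    ]


cmds = alignment_commands("genome.fa", "genome_idx", "s1_R1.fq.gz", "s1_R2.fq.gz", "s1")
for cmd in cmds:
    print(shlex.join(cmd))
# To execute instead of printing: subprocess.run(cmd, check=True) for each cmd.
```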
Mapped reads were quantified and normalized as Fragments Per Kilobase of transcript per Million mapped reads (FPKM) and Transcripts Per Million (TPM) using StringTie [50, 51], following a protocol that omitted the quantification of novel transcripts. StringTie was run with the ‘-A’ option, which writes a table of gene abundance estimates (FPKM and TPM); abundances were estimated solely for known gene models, so novel transcripts were excluded.
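The two normalizations can be sketched in Python from per-transcript fragment counts and transcript lengths. This illustrates the standard FPKM and TPM definitions rather than StringTie’s implementation, which derives abundances from its own coverage estimates; the counts and lengths are made up. The design difference is visible in the code: TPM normalizes by length first and then rescales so that every sample sums to one million, which keeps relative abundances comparable across samples.

```python
# Sketch of FPKM and TPM from fragment counts and transcript lengths (bp).
# Illustrative only: StringTie computes abundances from coverage estimates.


def fpkm(counts: dict[str, float], lengths: dict[str, int]) -> dict[str, float]:
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts.values())  # here: all counted fragments stand in for "mapped"
    return {t: counts[t] * 1e9 / (lengths[t] * total) for t in counts}


def tpm(counts: dict[str, float], lengths: dict[str, int]) -> dict[str, float]:
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rate = {t: counts[t] / (lengths[t] / 1e3) for t in counts}  # fragments per kb
    scale = sum(rate.values())
    return {t: r * 1e6 / scale for t, r in rate.items()}


frag_counts = {"tx1": 500, "tx2": 1500, "tx3": 0}
tx_lengths = {"tx1": 1000, "tx2": 3000, "tx3": 2000}
print(fpkm(frag_counts, tx_lengths))  # tx1 and tx2 both 250000.0, tx3 0.0
print(tpm(frag_counts, tx_lengths))   # values sum to 1e6 within the sample
```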
Coding potential calculation
To annotate genes precisely and minimize false positives, coding potential was computed for each transcript in each fruit crop using CPC2 [52], a tool that classifies RNA transcripts as protein-coding or non-coding on the basis of sequence features, including open reading frames (ORFs).
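CPC2 combines ORF-derived features with other sequence features in a trained classifier, so it cannot be reproduced in a few lines. As a hedged illustration of the ORF part only, the Python sketch below finds the longest forward-strand ORF and whether it ends in a stop codon (ORF integrity). The 300-nt cut-off in `looks_coding` is a hypothetical rule of thumb, not a CPC2 parameter.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> tuple[int, bool]:
    """Return (length in nt, is_complete) of the longest forward-strand ORF.
    An ORF starts at ATG; it is complete if it ends in an in-frame stop codon."""
    seq = seq.upper().replace("U", "T")
    best_len, best_complete = 0, False
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # keep the earliest ATG so the ORF is maximal
            elif start is not None and codon in STOP_CODONS:
                length = pos + 3 - start
                if length > best_len:
                    best_len, best_complete = length, True
                start = None
        if start is not None:  # ran off the end of the sequence without a stop
            length = (len(seq) - start) // 3 * 3
            if length > best_len:
                best_len, best_complete = length, False
    return best_len, best_complete


def looks_coding(seq: str, min_orf_nt: int = 300) -> bool:
    """Hypothetical rule: a complete ORF of at least 100 codons."""
    length, complete = longest_orf(seq)
    return complete and length >= min_orf_nt


print(longest_orf("CCATGAAATTTGGGTAACC"))  # (15, True)
```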
Estimation of tau score
Each abundance file was then grouped by tissue type to calculate the tissue-specificity score known as the tau score [53]. This calculation was performed for each transcript using the R package ‘tispec’ [54].
The tau score is a widely used metric for quantifying the tissue specificity of gene expression. It is computed from the expression level of a gene in each tissue relative to its maximum expression across all tissues, and it is a normalized measure that accounts both for the amplitude of expression differences between tissues and for the number of tissues in which the gene is expressed [25]. The tau index ranges from 0 to 1 and is calculated as:

$$\tau = \frac{\sum_{i=1}^{N} \left(1 - \hat{x}_i\right)}{N - 1}, \qquad \hat{x}_i = \frac{x_i}{\max_{1 \le j \le N} x_j}$$

where $\tau$ is the tau index, $N$ is the total number of tissues, $x_i$ is the expression of the gene in tissue $i$, and $\hat{x}_i$ is that expression normalized by the gene’s maximum expression across tissues. A tau index close to 0 indicates that the gene is expressed broadly across many tissues (low specificity), whereas a tau index close to 1 indicates that it is expressed predominantly in one or a few tissues (high specificity) [53].
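The tau definition, τ = Σ(1 − x_i / max x) / (N − 1), can be sketched in Python as follows for one gene’s per-tissue expression values, used as given. Returning NaN for a gene with no expression in any tissue is an assumption of this sketch, since tau is undefined when the maximum is zero; this is not the implementation in the R package ‘tispec’.

```python
import math


def tau(expression: list[float]) -> float:
    """Tau index of one gene from its expression values in N >= 2 tissues."""
    n = len(expression)
    if n < 2:
        raise ValueError("tau needs expression values for at least two tissues")
    peak = max(expression)
    if peak <= 0:
        return math.nan  # assumption: tau is left undefined for unexpressed genes
    # Normalize each tissue by the maximum, then sum (1 - x_hat) over N - 1.
    return sum(1 - x / peak for x in expression) / (n - 1)


print(tau([50.0, 0.0, 0.0, 0.0]))             # 1.0: expressed in one tissue only
print(tau([10.0, 10.0, 10.0, 10.0]))          # 0.0: uniform expression
print(round(tau([40.0, 20.0, 10.0, 10.0]), 3))  # 0.667
```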
The tau score is used to classify genes by expression level and specificity. Genes with a tau score of 0.85 or greater are regarded as specifically expressed, indicating predominant expression in one or a few tissues, whereas genes with a tau score below 0.85 are regarded as broadly expressed across multiple tissues [55]. Here, we classified genes into specificity categories under two main groups, Pan-tissue categories and Tissue-Specific categories [56], as shown below:
Pan-tissue categories
This analysis assesses the specificity of gene expression across multiple tissues, regardless of expression in any particular tissue. The tau index was calculated for each gene across all samples, representing its overall expression specificity across tissue types.
1. High Tissue Specificity genes: Tau index ranges from 0.8 to 1.0.
2. Intermediate Tissue Specificity genes: Tau index ranges from 0.2 to less than 0.8.
3. Low Tissue Specificity genes: Tau index ranges from 0 to less than 0.2.
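Assigning genes to these pan-tissue categories is a simple thresholding step. The Python sketch below applies the 0.8 and 0.2 cut-offs listed above to a tau value and tallies categories for a few hypothetical genes; the gene names and tau values are invented for the example.

```python
from collections import Counter


def pan_tissue_category(tau: float, high: float = 0.8, low: float = 0.2) -> str:
    """Bin a tau index into the three pan-tissue categories."""
    if not 0.0 <= tau <= 1.0:  # also rejects NaN
        raise ValueError(f"tau must lie in [0, 1], got {tau}")
    if tau >= high:
        return "High Tissue Specificity"
    if tau >= low:
        return "Intermediate Tissue Specificity"
    return "Low Tissue Specificity"


# Hypothetical tau values for four genes.
taus = {"geneA": 0.93, "geneB": 0.45, "geneC": 0.05, "geneD": 0.80}
tally = Counter(pan_tissue_category(t) for t in taus.values())
print(tally)
```

Passing `high=0.85` would apply the stricter specific-expression cut-off cited from [55] instead.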
Tissue-specific categories
Gene expression within specific tissue types was examined using a targeted approach. First, for each species, FPKM values were averaged over the samples of each tissue type to obtain a mean expression level per tissue. The tau index was then calculated from these tissue-level means, allowing genes to be compared and assigned to Tissue-Specific Categories according to their tissue-type expression patterns.
1. Highly or Absolutely specific genes: Tau index ranges from 0.8 to 1.0.
2. Intermediate specific genes: Tau index ranges from 0.2 to less than 0.8.
3. Non-specific or Low-specific genes: Tau index ranges from 0 to less than 0.2.
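The tissue-level procedure can be sketched in Python under one reading of the description above: each gene’s FPKM values are first averaged over the samples of each tissue, tau is computed from those tissue means, and the result is binned with the 0.8 and 0.2 cut-offs. Sample IDs, tissue labels, and FPKM values are hypothetical, and the “undefined” label for unexpressed genes is an assumption of the sketch.

```python
import math
from collections import defaultdict


def tissue_means(sample_fpkm: dict[str, float],
                 sample_tissue: dict[str, str]) -> dict[str, float]:
    """Average one gene's FPKM over the samples of each tissue type."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for sample, value in sample_fpkm.items():
        grouped[sample_tissue[sample]].append(value)
    return {tissue: sum(vals) / len(vals) for tissue, vals in grouped.items()}


def tau(values: list[float]) -> float:
    """Tau index; NaN when fewer than two tissues or no expression at all."""
    if len(values) < 2 or max(values) <= 0:
        return math.nan
    peak = max(values)
    return sum(1 - v / peak for v in values) / (len(values) - 1)


def tissue_specific_category(t: float) -> str:
    """Bin a tau index into the three tissue-specific categories."""
    if math.isnan(t):
        return "undefined"
    if t >= 0.8:
        return "Highly or Absolutely specific"
    if t >= 0.2:
        return "Intermediate specific"
    return "Non-specific or Low-specific"


# Hypothetical samples for one gene: two leaf, two fruit and one root sample.
sample_tissue = {"S1": "leaf", "S2": "leaf", "S3": "fruit", "S4": "fruit", "S5": "root"}
sample_fpkm = {"S1": 2.0, "S2": 4.0, "S3": 60.0, "S4": 40.0, "S5": 0.0}

means = tissue_means(sample_fpkm, sample_tissue)  # leaf 3.0, fruit 50.0, root 0.0
t = tau(list(means.values()))
print(round(t, 2), tissue_specific_category(t))  # 0.97 Highly or Absolutely specific
```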
Functional annotation and pathway analysis
Functional annotation, including Gene Ontology (GO) term annotation for Molecular Function, Biological Process, and Cellular Component, was performed for all coding and non-coding transcripts identified in the fruit crops under study using Blast2GO [57] with default parameters.
KEGG [58] pathway analysis, which maps annotated genes to biochemical pathways, was performed with Blast2GO on a subset of tissue-specific genes from the various tau-based specificity categories.
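Mapping a set of genes to pathways amounts to tallying pathway assignments. The Python sketch below assumes a hypothetical transcript-to-KEGG-pathway table, such as one exported from an annotation step, and counts how many genes in one tau-based category fall into each pathway; the transcript IDs and their pathway assignments are invented for illustration.

```python
from collections import Counter

# Hypothetical annotation output: KEGG pathway map IDs per transcript.
pathway_map = {
    "tx1": ["ko00940", "ko01110"],  # phenylpropanoid biosynthesis; secondary metabolites
    "tx2": ["ko00940"],
    "tx3": ["ko04075"],             # plant hormone signal transduction
    "tx4": [],                      # annotated, but no pathway assigned
}


def pathway_counts(mapping: dict[str, list[str]], genes: set[str]) -> Counter:
    """Count genes from `genes` per pathway; each gene counts once per pathway
    and genes missing from `mapping` are skipped."""
    tally: Counter = Counter()
    for gene in genes:
        tally.update(set(mapping.get(gene, [])))
    return tally


# Hypothetical set of highly tissue-specific transcripts.
high_specificity = {"tx1", "tx2", "tx4", "tx9"}
print(pathway_counts(pathway_map, high_specificity).most_common())
# [('ko00940', 2), ('ko01110', 1)]
```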
Development of fruit expression atlas
The Fruit Expression Atlas (FEAtl) database was developed using a three-tier architecture comprising a presentation layer, an application layer, and a data layer. The presentation layer (user interface) uses HTML for markup, CSS for styling, and JavaScript for interactive elements, providing a responsive and accessible interface. The application layer (server-side logic) is written in PHP (https://www.php.net/); it processes user requests, queries the database, and generates content dynamically from the query results. The data layer is a MySQL database (https://www.mysql.com/) that stores all study results, including expression values for the fruit crops, in structured tables designed to handle the large datasets typical of transcriptomic data. The database is hosted on an Apache server (https://httpd.apache.org/), with webpage design and deployment facilitated by the XAMPP framework. Data retrieval proceeds sequentially: the user sends a request to the web server, a query is sent to the MySQL database, the database’s response is passed to the web interface, and the web server returns the result to the user. This design optimizes the storage, retrieval, and analysis of genomic data.
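To illustrate the request flow between the application and data layers, the following Python sketch uses an in-memory SQLite database as a stand-in for MySQL. The table and column names are hypothetical, not FEAtl’s actual schema. It queries by tissue ID and transcript type with bound parameters; in a PHP application layer the same role would typically be played by prepared statements (for example via PDO).

```python
import sqlite3

# Hypothetical schema; SQLite stands in for MySQL so the sketch runs anywhere.
SCHEMA = """
CREATE TABLE transcript (
    transcript_id   TEXT PRIMARY KEY,
    species         TEXT NOT NULL,
    transcript_type TEXT NOT NULL CHECK (transcript_type IN ('coding', 'non-coding'))
);
CREATE TABLE expression (
    transcript_id TEXT NOT NULL REFERENCES transcript(transcript_id),
    tissue_id     TEXT NOT NULL,
    fpkm          REAL NOT NULL,
    tpm           REAL NOT NULL,
    PRIMARY KEY (transcript_id, tissue_id)
);
"""


def query_expression(conn: sqlite3.Connection, tissue_id: str, transcript_type: str):
    """Application-layer query: expression for one tissue and transcript type.
    The ? placeholders keep user input out of the SQL text."""
    sql = """
        SELECT e.transcript_id, e.fpkm, e.tpm
        FROM expression AS e
        JOIN transcript AS t ON t.transcript_id = e.transcript_id
        WHERE e.tissue_id = ? AND t.transcript_type = ?
        ORDER BY e.fpkm DESC
    """
    return conn.execute(sql, (tissue_id, transcript_type)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO transcript VALUES (?, ?, ?)",
                 [("tx1", "mango", "coding"), ("tx2", "mango", "non-coding")])
conn.executemany("INSERT INTO expression VALUES (?, ?, ?, ?)",
                 [("tx1", "fruit", 12.5, 30.1), ("tx2", "fruit", 3.2, 7.7),
                  ("tx1", "leaf", 1.1, 2.6)])
print(query_expression(conn, "fruit", "coding"))  # [('tx1', 12.5, 30.1)]
```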
Utility and discussion
Read alignment, mapping and estimation of gene expression profiles
The development of the Fruit Expression Atlas involved meticulous quality control of raw RNA-seq reads, including identification of poor-quality scores and adapter sequences, to ensure accurate downstream analyses. Trimming and adapter removal were performed to retain high-quality reads for alignment. Alignment to the reference genomes was optimized to maximize mapping rates and minimize incorrect alignments, ensuring reliable estimation of transcript abundance. Transcript assembly and quantification using FPKM and TPM allow for normalization and comparative analysis. Coding potential analysis is crucial for distinguishing coding from non-coding RNA, aiding in the exploration of transcriptome complexity in tropical and subtropical fruit crops. Transcript annotation links transcriptomic data to biological processes, cellular components, and molecular functions. Pathway analysis enriches biological interpretation by mapping transcripts to metabolic and signalling pathways, identifying targets for the genetic improvement of fruit crops. Tissue-specificity scoring reveals distinctive gene expression patterns in specific tissues or developmental stages.
Our investigation has significantly expanded the current body of horticultural research by incorporating all tropical and subtropical fruit crops recognized by the FAO that have fully annotated genomes. The analysis covers a broad range of genomic data from eight key crops: avocado, banana, guava, date palm, fig, mango, papaya, and pineapple. This comprehensive dataset, obtained from 177 BioProjects and 2,060 samples representing 36 different tissue types, offers a wealth of biological insights. Chromosome-level assemblies were available for avocado, banana, date, fig, mango, and pineapple, whereas scaffold-level assemblies were reported for guava and papaya. Notably, the dataset includes 256,955 transcripts, consisting of 88,887 coding sequences (CDS) and 168,068 untranslated regions (UTRs). For all fruits except fig, coding sequences ranged from 54 to 68.16%; for fig, the proportion was exceptionally high at 74.725%. Non-coding sequences ranged from 31.84 to 45.28% for all fruit crops except fig, where they were as low as 25.275%. Table 2 presents data on coding and non-coding transcripts, genome size, and assembly level across the fruit crop species. Figure 3 illustrates the abundance of coding and non-coding transcripts graphically; all fruit species show a higher number of non-coding transcripts, which is consistent with the general trend observed in living organisms. Alternative splicing and post-transcriptional processing of non-coding transcripts contribute to their higher number [59]. The higher proportion of non-coding than coding transcripts in all these fruit crops indicates regulatory complexity, the significance of structural RNAs, and their crucial roles in cellular processes.
Gene expression categorization, functional annotation and pathway analysis
For the pan-tissue categories based on tau score, the number of genes with intermediate tissue specificity is consistently higher than the numbers with high or low tissue specificity across all fruit species (Fig. 4). Avocado has moderate numbers of genes with high and low tissue specificity but notably more with intermediate specificity, indicating a balance in gene expression patterns. Banana shows a similar trend with an even larger number of intermediate-specificity genes, suggesting a more intricate regulatory framework that supports expression across multiple tissues. Dates have substantial numbers of genes with high and intermediate tissue specificity and very few with low specificity, implying specialized and varied tissue functions. Figs have fewer high-specificity genes than the other fruits, implying a lower degree of specialization, while maintaining a balance between intermediate- and low-specificity genes. Guava has the smallest number of high-specificity genes, suggesting a tendency towards broader expression across tissues. Mango has notable numbers of high- and intermediate-specificity genes, reflecting a balance between specialized and broadly expressed genes. Papaya has fewer high-specificity genes, indicating a lower degree of specialization, yet retains a substantial number of intermediate-specificity genes. Pineapple shows a balanced distribution similar to the other fruits, with a greater proportion of genes in the intermediate-specificity category.
Genes with intermediate tissue specificity provide a balance between the need for specialized functions and the flexibility to be used in various tissues. This balance can be evolutionarily advantageous, allowing organisms to adapt to different environments and functional demands. Intermediate tissue specificity genes are expressed in multiple tissues but at varying levels, which enables them to perform different functions in different contexts. This flexibility is crucial for organisms to respond to changing environments and to maintain homeostasis across different tissues [60].
For the tissue-specific categories of gene expression based on tau score, housekeeping genes predominantly exhibit low tissue specificity, indicating broad expression across tissues in support of essential cellular functions (Fig. 5). This is because housekeeping genes are constitutively expressed in all cells and conditions, regardless of tissue type, developmental stage, cell-cycle state, or external signals. Their low tau scores indicate similar expression levels across tissues and conditions, a key characteristic of these genes [61]. Research on housekeeping genes in Prunus rootstocks identified several candidates that showed medium expression levels and high stability under different stress conditions [62]. Additionally, a study on the selection of housekeeping genes highlighted that a significant fraction of genes is broadly expressed in multiple tissues, emphasizing the importance of understanding tissue-specific expression patterns [63]. These findings underscore the value of identifying housekeeping genes with low tau scores for accurate normalization in gene expression studies across biological contexts. By contrast, genes involved in specialized functions tend to have high tau scores. Genes with high tau scores are considered tissue-specific, meaning they are preferentially expressed in particular tissues or organs; for instance, genes associated with the development and function of the testis exhibit high tau scores, reflecting their pronounced specificity to this organ. Genes that change their expression patterns among tissues or conditions are prone to more evolutionary alterations than genes with steady expression or those vital for basic cellular activities [60].
Tissue specificity plays a crucial role in determining fruit quality. In avocado, the unique lipid composition of the fruit is influenced by tissue-specific expression of lipid-metabolism genes; for example, the mesocarp (flesh) and seed have distinct lipid profiles that contribute to overall fruit quality [64]. Mannose binding-related genes are enriched specifically in avocado and tomato and include cellulase genes that might reflect common ripening processes in these species [65]. Expression of cell wall-modifying genes, such as xyloglucan endotransglycosylase/hydrolase-like (XTH-like) and pectin methylesterase-like (PME-like) genes, which we found in banana, varies with tissue type and helps determine fruit quality; in particular, their activity in the peel as the fruit ripens after harvest can affect firmness and storage life, as previously reported in banana [66]. Gibberellin (GA) metabolism genes, including those encoding GA oxidases and GA biosynthesis enzymes, are differentially expressed across banana tissues. Notably, GA oxidase genes are more highly expressed in young fruits and false stems, indicating heightened GA metabolism in these tissues that may influence the regulation of fruit length, as reported in banana [67]. In our study, the transcriptomic profile of papaya roots showed a distinct enrichment of genes associated with stress response and defense, including hydrogen peroxide catabolism. This indicates that the roots are primed to react quickly to water shortage [68]. Fruit ripening, in turn, is associated with the up-regulation of cell wall-related genes such as polygalacturonase (PG) genes, which break down cell walls and lead to fruit softening [69]. Examination of tissue-specific transcriptomes has also revealed gene clusters that are highly abundant in bracts/sepals, indicating a potential role in plant defense. Other gene clusters were active during the initial stages of fruit growth, suggesting involvement in endocytosis, and another cluster showed heightened activity as the fruit matured, with associations to terpenoid and polyketide biosynthesis [70]. In guava, genes related to ethylene biosynthesis and secondary metabolism, such as the phenylpropanoid and monolignol pathways, are up-regulated throughout fruit development, maturation, and ripening, contributing to the softening of fruit tissue and to desirable traits such as color and flavor [71]. Transcriptomic studies in mango have revealed gene expression patterns associated with aroma biosynthesis pathways across cultivars and developmental stages; differential expression of these genes across tissues and ripening phases plays a crucial role in shaping the distinct flavor of diverse mango varieties [72]. Date palm is considered notably tolerant of high salinity, and transcriptomic analyses have identified genes implicated in salt tolerance, notably those encoding ion transport proteins and auxin-responsive genes [73]. Similar auxin-responsive and ion-associated genes were found in our study.
MYB transcription factors are involved in regulating secondary metabolism, the cell cycle, and stress responses. Exploring the function of MYB34 in avocado could reveal its role in fruit development, flavour, or stress resilience, as reported in Brassica species; the loss of MYB34 function in Capsella rubella was found to contribute to the backward evolution of indolic glucosinolate biosynthesis [74]. Avocado has a high number of endoglucanase genes, which are involved in fruit ripening, consistent with earlier reports of cellulase involvement in avocado fruit ripening [65]. In banana, we identified EIL1 (Ethylene Insensitive 3-like 1), a transcription factor in the ethylene signaling pathway that plays a crucial role in regulating fruit ripening; EIL1 mRNA is detected in all banana tissues but at lower levels in the peel than in the pulp [66]. Ascorbate peroxidase (APX) also plays a crucial role in regulating banana fruit ripening: APX activity is higher during ripening, which is associated with the accumulation of reactive oxygen species (ROS) and the breakdown of cell walls [75]. Alcohol dehydrogenase (ADH) plays a crucial role in the metabolism of ethanol and aldehydes in many organisms, including date palm (Phoenix dactylifera); ADH activity is higher in date palm sap than in other plant species, indicating its importance in the plant’s metabolism [76]. In fig, we found β-amyrin 28-oxidase on chromosome 1. This enzyme is involved in the biosynthesis of triterpenoids and contributes to fruit ripening by catalyzing the oxidation of β-amyrin to oleanolic acid, a precursor of oleanane-type saponins. These saponins are involved in fruit development and ripening, particularly in regulating cell wall degradation and texture changes during ripening, as reported in Eleutherococcus senticosus (Siberian ginseng) [77] and eggplant [78]. The AMY (α-amylase) and BAM (β-amylase) genes found in guava in our research are involved in starch degradation and account for 61.4% of the starch degradation genes in guava, as previously reported [79]. In guava, secoisolariciresinol dehydrogenase (SDH) genes, involved in the biosynthesis of secoisolariciresinol, are highly expressed in the pulp, indicating a role in red color development [71]. In mango, CHS (chalcone synthase) genes are involved in the biosynthesis of urushiols and related phenols, which are responsible for the characteristic flavour and aroma of mangoes. The CHS gene family is highly expanded in mango, with some genes showing universally higher expression in peel than in flesh, and lipoxygenase (LOX) genes, which are involved in aldehyde biosynthesis, are also highly expanded [80]. The LOX gene family is highly expressed in mango compared with other plants, suggesting a significant role in its flavour profile. A similar expansion has been reported in passion fruit [81], and a genome-wide analysis of mango identified multiple LOX genes, including 9-LOX and 13-LOX, which are involved in the biosynthesis of lactones and other flavour compounds [82]. In papaya, neutral invertase genes play a role in sugar production and are crucial for the development of fruit sweetness [83], whereas bHLH transcription factors govern carotenoid biosynthesis during fruit ripening [84]. Peroxygenase and epoxide hydrolase (EH) genes are involved in lactone biosynthesis and are important for the flavour profile of papaya [85].
The pineapple genome contains several significant genes and genetic features related to its domestication and cultivation. Far1-Related Sequence (FRS) genes are expressed in pineapple and rice, are involved in reproductive tissue development, and have evolved specialized functions in pineapple [86]. Transcription factor genes of the GRF (growth-regulating factor) family were also detected in the pineapple dataset; these transcriptional regulators govern various facets of plant development and stress responses, playing key roles in floral organ development, leaf growth, and the modulation of hormonal responses [87].
Development of fruit expression atlas
FEAtl, the Fruit Expression Atlas, available at http://backlin.cabgrid.res.in/FEAtl, is a pioneering, comprehensive gene expression database for tropical and subtropical fruit crops, developed using a three-tier architecture. It focuses on key fruits recognized by the FAO: Avocado, Fig, Date Palm, Mango, Guava, Papaya, Pineapple, and Banana. It houses detailed information on these crops, with the provision to retrieve transcripts by tissue, including transcript information, sample information, expression in FPKM and TPM, functional analysis, genes based on tau score, and search for a specific gene. FEAtl has multiple tabs that cater to different aspects of the dataset, namely Home, About, Analyze, Statistics, and Team. The Home page offers an overview of the dataset, with a Quick Start section containing hyperlinks to the individual fruits and an option to zoom in on images. The About page describes the general materials and methods used, provides references for each tool incorporated, and acknowledges the sources of the pictures used. The Analyze page provides a dropdown menu with hyperlinks for in-depth analysis of each fruit. The Statistics page summarizes the dataset, with graphical illustrations of the BioProjects and samples studied and a breakdown of the tissue samples associated with each fruit. Lastly, the Team page lists the individuals who contributed to the analysis and development of the database, along with contact information. The footer of every page includes a visitor counter, a global location map for geographical reference, and hyperlinks to the funding bodies of this research: the Indian Council of Agricultural Research (ICAR), the ICAR-Indian Agricultural Research Institute (ICAR-IARI), and the ICAR-Indian Agricultural Statistics Research Institute (ICAR-IASRI).
FEAtl is a more comprehensive, dedicated web-genomic resource covering eight fruit crops, in contrast to MangoBase, which is solely for mango and covers 12 datasets from 11 BioProjects [42].
Each fruit-specific page in FEAtl (Avocado, Fig, Date Palm, Mango, Guava, Papaya, Pineapple, and Banana) offers a comprehensive overview, including the botanical name, family, chromosome number, taxonomy ID linked to the NCBI Taxonomy page, the GenBank assembly used in the study with a hyperlink to NCBI, genome size, and other relevant genome details, complemented by a picture of the fruit. FTP links are provided for easy downloading of the genome and annotation files in three formats: .fna (FASTA), .gtf (Gene Transfer Format), and .gff (General Feature Format). Below the descriptive content, a selection box lets users pick their preferred tissue type from radio buttons and choose between coding and non-coding transcripts. Users must make a selection before clicking the submit button; submitting without one triggers a warning message, ensuring a conscious choice before proceeding. After submission, the user lands on the result overview page, which offers the following categories: Transcript Information, Sample Information, Expression (FPKM), Expression (TPM), Functional Analysis, Genes based on Tau score, and Search for specific gene. Each selection leads to a dedicated results page with detailed information and analysis for the chosen category.
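The warning described above amounts to a required-field check on the two selections. Below is a minimal server-side sketch, assuming the form submits a tissue name and a transcript type as plain strings; the function name, option values, and messages are hypothetical, since the text does not describe how FEAtl implements the check.

```python
from typing import Optional

TRANSCRIPT_TYPES = {"coding", "non-coding"}  # hypothetical option values


def validate_query(
    tissue: Optional[str],
    transcript_type: Optional[str],
    allowed_tissues: set[str],
) -> list[str]:
    """Return warning messages; an empty list means the query may proceed."""
    warnings = []
    if not tissue:
        warnings.append("Please select a tissue type.")
    elif tissue not in allowed_tissues:
        warnings.append(f"Unknown tissue type: {tissue}")
    if not transcript_type:
        warnings.append("Please choose coding or non-coding transcripts.")
    elif transcript_type not in TRANSCRIPT_TYPES:
        warnings.append(f"Unknown transcript type: {transcript_type}")
    return warnings


# Hypothetical tissue list for one fruit page.
tissues = {"leaf", "root", "fruit pulp"}
print(validate_query("leaf", None, tissues))
# ['Please choose coding or non-coding transcripts.']
print(validate_query("leaf", "coding", tissues))  # []
```

Repeating the check on the server, in addition to any browser-side warning, would keep malformed requests from reaching the database tier of a three-tier design.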
The Transcript Information page presents key details of each transcript in a structured table. The transcript ID is hyperlinked to the genome annotation of the fruit in databases such as NCBI or Ensembl Plants. The table also gives the genomic location of the transcript, denoted by chromosome number or the shorthand name of the organelle (in some instances, an accession number), the orientation of the transcript (strand), and a reference identifier linked to the corresponding NCBI Nucleotide page. It further lists the start and end positions of the transcript, its length at the nucleotide and peptide levels, its coding probability, and its classification as coding or non-coding, which aids functional interpretation. Beyond this basic information, the table includes Gene Ontology terms for molecular function, biological process, and cellular component, linked directly to the EBI Gene Ontology page, and the associated pathway ID, linked to the KEGG database. To ease navigation, the table is paginated at 50 entries per page and has both vertical and horizontal scrollbars for traversing the extensive dataset.
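Two parts of the Transcript Information table lend themselves to a short sketch: deriving the coding/non-coding label from the coding probability, and slicing the table into pages of 50 rows. The 0.5 probability cut-off is an assumption (a common default for coding-potential classifiers), and the `TranscriptRow` fields are a hypothetical subset of the columns listed above.

```python
import math
from dataclasses import dataclass
from typing import Sequence

PAGE_SIZE = 50       # entries per page, as described for FEAtl
CODING_CUTOFF = 0.5  # assumed coding-probability threshold


@dataclass
class TranscriptRow:
    """Hypothetical subset of the Transcript Information columns."""
    transcript_id: str
    chromosome: str
    strand: str
    start: int
    end: int
    coding_probability: float


def coding_label(row: TranscriptRow) -> str:
    return "coding" if row.coding_probability >= CODING_CUTOFF else "non-coding"


def page_count(n_rows: int, page_size: int = PAGE_SIZE) -> int:
    """Number of pages needed; an empty table still shows one (empty) page."""
    return max(1, math.ceil(n_rows / page_size))


def paginate(rows: Sequence[TranscriptRow], page: int,
             page_size: int = PAGE_SIZE) -> list[TranscriptRow]:
    """Return the rows shown on a 1-based page number."""
    if not 1 <= page <= page_count(len(rows), page_size):
        raise ValueError(f"page {page} is out of range")
    start = (page - 1) * page_size
    return list(rows[start:start + page_size])


# 120 synthetic rows -> pages of 50, 50 and 20 rows.
rows = [
    TranscriptRow(f"tx{i}", "chr1", "+", 1000 * i, 1000 * i + 500, (i % 10) / 10)
    for i in range(120)
]
print(page_count(len(rows)))   # 3
print(len(paginate(rows, 3)))  # 20
print(coding_label(rows[7]))   # coding
```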
The Sample Information page provides details of each sample, hyperlinked to its NCBI SRA page, together with the BioSample and BioProject accessions, each linked to its corresponding NCBI page. A description gives in-depth insight into the specific sample or tissue, along with the cultivar, developmental stage (if relevant), treatment (if applicable), and any other pertinent information. The information is organized in a table with pagination and both vertical and horizontal scrollbars for ease of viewing.
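The accession hyperlinks on this page follow NCBI's public URL patterns for SRA runs, BioSamples, and BioProjects. The sketch below routes an accession to its page by prefix; the prefix lists cover the common INSDC forms and are an assumption about what FEAtl stores, and the accessions in the example are placeholders, not samples from the atlas.

```python
NCBI = "https://www.ncbi.nlm.nih.gov"

# Accession prefixes (INSDC conventions) mapped to NCBI database paths.
PREFIX_TO_DB = {
    ("SRR", "ERR", "DRR"): "sra",               # sequencing runs
    ("SAMN", "SAMEA", "SAMD"): "biosample",     # BioSamples
    ("PRJNA", "PRJEB", "PRJDB"): "bioproject",  # BioProjects
}


def ncbi_link(accession: str) -> str:
    """Build the NCBI page URL for an SRA run, BioSample or BioProject accession."""
    acc = accession.strip().upper()
    for prefixes, db in PREFIX_TO_DB.items():
        if acc.startswith(prefixes):
            return f"{NCBI}/{db}/{acc}"
    raise ValueError(f"unrecognized accession: {accession!r}")


# Placeholder accessions, for illustration only.
for acc in ["SRR0000001", "SAMN00000001", "PRJNA000001"]:
    print(ncbi_link(acc))
```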
The Expression Profiles in FPKM and TPM pages share the same features and differ only in the values displayed. A heatmap represents gene expression as either Fragments Per Kilobase of transcript per Million mapped reads (FPKM) or Transcripts Per Million (TPM). The color gradient ranges from yellow (low expression) to red (high expression). Hovering over the heatmap reveals sample and transcript details, including expression levels. Clicking a transcript ID displays the transcript ID, description, chromosome number, and coding label, while clicking a sample ID displays the sample ID, description, and tissue name. Pagination is provided to facilitate navigation.
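To make the difference between the two units concrete, here are their textbook definitions applied to raw read (or, for paired-end data, fragment) counts. This is a simplified sketch: quantifiers such as StringTie estimate these values from read coverage of assembled transcripts, so the atlas's numbers will not be reproduced exactly this way, and the transcript names and counts below are hypothetical.

```python
from typing import Mapping


def fpkm(counts: Mapping[str, float], lengths_bp: Mapping[str, int]) -> dict[str, float]:
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts.values())
    return {t: counts[t] * 1e9 / (lengths_bp[t] * total) for t in counts}


def tpm(counts: Mapping[str, float], lengths_bp: Mapping[str, int]) -> dict[str, float]:
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rates = {t: counts[t] / (lengths_bp[t] / 1e3) for t in counts}  # reads per kb
    scale = sum(rates.values())
    return {t: rate * 1e6 / scale for t, rate in rates.items()}


# Hypothetical sample with three transcripts.
counts = {"txA": 500, "txB": 1500, "txC": 0}
lengths = {"txA": 1000, "txB": 3000, "txC": 2000}

print(fpkm(counts, lengths))  # txA and txB: 250000.0 each, txC: 0.0
print(tpm(counts, lengths))   # txA and txB: 500000.0 each, txC: 0.0
```

Because TPM values within each sample sum to one million, they are more directly comparable across samples than FPKM values, whose per-sample totals vary.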
In the Functional Analysis page, users can explore detailed information about gene functions. This includes Gene Ontology terms encompassing cellular components, molecular functions, and biological processes, all linked to the EBI Gene Ontology page. Additionally, the pathways in which the genes are involved are linked to KEGG. The information is presented in a tabular format with pagination and vertical and horizontal scrollbars for easy viewing. A single click on a transcript ID reveals the transcript ID name, description, chromosome number, and coding label.
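The Gene Ontology and KEGG links on this page can be generated from the identifiers alone. The sketch below assumes FEAtl links GO terms to EBI's QuickGO term pages and pathway IDs to KEGG's pathway viewer; the exact URLs the site uses are not stated in the text, so these patterns are illustrative.

```python
import re

GO_ID = re.compile(r"GO:\d{7}")
# KEGG pathway IDs: a reference prefix ("map", "ko") or an organism code, then 5 digits.
KEGG_PATHWAY_ID = re.compile(r"(?:map|ko|[a-z]{3,4})\d{5}")


def go_link(term: str) -> str:
    """Assumed QuickGO URL pattern for a GO term page."""
    if not GO_ID.fullmatch(term):
        raise ValueError(f"not a GO term ID: {term!r}")
    return f"https://www.ebi.ac.uk/QuickGO/term/{term}"


def kegg_link(pathway: str) -> str:
    """Assumed KEGG URL pattern for a pathway map page."""
    if not KEGG_PATHWAY_ID.fullmatch(pathway):
        raise ValueError(f"not a KEGG pathway ID: {pathway!r}")
    return f"https://www.kegg.jp/pathway/{pathway}"


print(go_link("GO:0008150"))  # biological_process root term
print(kegg_link("map00500"))  # starch and sucrose metabolism reference map
```

Validating the identifier before building the URL avoids emitting dead links for malformed annotation entries.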
The Genes Based on Tau Score page explains the tissue specificity index (tau score) and its three categories: High/Absolute Specificity, Intermediate Specificity, and Non/Low Specificity (housekeeping genes). A selection panel allows users to choose one of the three categories; upon submission, a table displays the number of transcripts in the selected category. The table lists the transcript ID (clicking it reveals the transcript details), coding label, tau score, and specificity category. Pagination and vertical and horizontal scrollbars facilitate navigation.
On the Search for Specific Gene page, users can enter gene names separated by commas to view their expression levels across samples, choosing either FPKM or TPM as the unit. As on the FPKM and TPM pages, expression is displayed as a heatmap ranging from yellow (low expression) to red (high expression). Hovering over the heatmap shows sample and transcript details, including expression levels; clicking a transcript ID displays its details, and clicking a sample ID shows sample details. Pagination is available for easy navigation.
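The search workflow described above can be sketched in three steps: split the comma-separated query, look the genes up in the matrix for the chosen unit, and map each value onto the yellow-to-red scale. The in-memory `matrix` layout (unit, then gene, then sample), the gene names, and the linear color interpolation are assumptions for illustration; the text does not specify how FEAtl stores or scales its data.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: matrix[unit][gene][sample] -> expression value.
Matrix = Dict[str, Dict[str, Dict[str, float]]]


def parse_gene_query(raw: str) -> List[str]:
    """Split a comma-separated query, trimming blanks and dropping duplicates."""
    genes: List[str] = []
    for token in raw.split(","):
        name = token.strip()
        if name and name not in genes:
            genes.append(name)
    return genes


def select_expression(matrix: Matrix, genes: List[str],
                      unit: str) -> Tuple[Dict[str, Dict[str, float]], List[str]]:
    """Return expression rows for the genes found, plus the genes not found."""
    if unit not in ("FPKM", "TPM"):
        raise ValueError("unit must be 'FPKM' or 'TPM'")
    table = matrix[unit]
    found = {g: table[g] for g in genes if g in table}
    missing = [g for g in genes if g not in table]
    return found, missing


def yellow_to_red(value: float, vmin: float, vmax: float) -> str:
    """Linearly interpolate from yellow (#ffff00, low) to red (#ff0000, high)."""
    frac = 0.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    frac = min(max(frac, 0.0), 1.0)  # clamp values outside the color range
    green = round(255 * (1 - frac))
    return f"#ff{green:02x}00"


matrix: Matrix = {
    "TPM": {"GENE1": {"S1": 2.0, "S2": 40.0}, "GENE2": {"S1": 0.0, "S2": 10.0}},
    "FPKM": {"GENE1": {"S1": 1.5, "S2": 31.0}, "GENE2": {"S1": 0.0, "S2": 7.5}},
}
rows, missing = select_expression(matrix, parse_gene_query("GENE1, GENE3 ,GENE1"), "TPM")
print(missing)                                        # ['GENE3']
print(yellow_to_red(rows["GENE1"]["S2"], 0.0, 40.0))  # #ff0000
```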
Figure 6 gives a detailed representation of the structure of FEAtl, showing its interfaces: the number of species, tissue types, BioProjects, and RNA-Seq samples available for query; the quick start guide to the fruits studied; the detailed information and search page, with general information on the selected fruit, search options by tissue and transcript type, statistical summaries, and tools for expression analysis; access pages for specific tissue types; a description of the pipeline used to develop FEAtl; and the result page, displaying representative results of gene expression queries for specific tissues and transcripts. In FEAtl, users can query genes or gene sets of interest and explore their expression across the eight species or within specific species, tissues, and developmental stages, in a constitutive or differential context. Such an expression atlas can serve as a research-accelerating tool for further knowledge discovery.
Conclusions
In the current investigation, we have developed the first comprehensive, integrated expression atlas for tropical and sub-tropical fruit crops, named the Fruit Expression Atlas (FEAtl). This centralized repository of gene expression data, represented in Fig. 7, compiles a wide array of information on eight fruit crops: Avocado, Fig, Date Palm, Mango, Guava, Papaya, Pineapple, and Banana, addressing the urgent need for enhanced agricultural productivity. FEAtl consolidates 2,060 RNA-Seq samples from 91 unique tissue categories, procured from 177 BioProjects, yielding a total of 256,955 transcripts. To our knowledge, no other expression atlas is dedicated to multiple tropical and sub-tropical fruit crops; FEAtl therefore fills this gap and is a valuable asset to horticultural professionals. It offers a comprehensive view of gene expression across diverse tissue types, developmental stages, growth conditions, and treatments. The platform integrates essential tools such as gene annotation, Gene Ontology, and pathway analysis, which are crucial for decoding the intricate expression patterns observed across samples and genomic elements, and its user-friendly visualization tools further enhance its utility. As a comprehensive transcriptomic resource, FEAtl can play a crucial role in advancing horticultural science. It provides in-depth insights into the genetic makeup and behavior of a range of tropical and sub-tropical fruit crops, which are vital for a deeper understanding of their growth, their responses to environmental stimuli, and potential targets for genetic enhancement. The platform is invaluable for exploring gene expression across growth stages and environmental scenarios, thereby enriching knowledge of the molecular processes that influence fruit traits. FEAtl serves as a fundamental resource for comparative analysis among fruit species and as a catalyst for functional genomic studies.
Data availability
A supplementary file listing the BioProjects used to develop FEAtl is available for reference.
References
Krishnaswamy K, Gayathri R. Nature’s bountiful gift to humankind: vegetables & fruits & their role in cardiovascular disease & diabetes. Indian J Med Res. 2018;148(5):569–95.
Singh KM, Ahmad N, Pandey VL, Sinha DK. Impact of National Horticulture Mission on Vegetable and Fruit sectors of India. Indian J Econ Dev. 2022;66–75.
Savo V, Kumbaric A, Caneva G. Grapevine (Vitis vinifera L.) Symbolism in the ancient Euro-Mediterranean cultures. Econ Bot. 2016;70(2):190–7.
Alae-Carew C, Bird FA, Choudhury S, Harris F, Aleksandrowicz L, Milner J, et al. Future diets in India: a systematic review of food consumption projection studies. Glob Food Sec. 2019;23:182–90.
Boeing H, Bechthold A, Bub A, Ellinger S, Haller D, Kroke A, et al. Critical review: vegetables and fruit in the prevention of chronic diseases. Eur J Nutr. 2012;51(6):637–63.
World Health Organization. 2005. Fruit and vegetables for health: report of the Joint FAO.
Pem D, Jeewon R. Fruit and Vegetable Intake: benefits and Progress of Nutrition Education interventions- Narrative Review article. Iran J Public Health. 2015;44(10):1309–21.
Taren D, Wiseman M. Feedback on WHO/FAO global report on diet, nutrition and non-communicable diseases. Public Health Nutr. 2003;6(5):425–425.
FAO. 2020. Fruit and vegetables – your dietary essentials. The International Year of Fruits and Vegetables, 2021, background paper. Rome.
FAO. [World Programme for the Census of Agriculture 2020]. License: CC BY-NC-SA 3.0 IGO.
Crick F. Central Dogma of Molecular Biology. Nature. 1970;227(5258):561–3.
Cope M. Transcripts (coding and analysis). International Encyclopedia of Human Geography. Elsevier; 2009. pp. 350–4.
Guo J. Transcription: the epicenter of gene expression. J Zhejiang Univ Sci B. 2014;15(5):409–11.
Farrell RE, Bassett CL. Multiple transcript initiation as a mechanism for regulating Gene expression. Regulation of Gene expression in plants. Boston, MA: Springer US; 2007. pp. 39–66.
Mignone F, Gissi C, Liuni S, Pesole G. Untranslated regions of mRNAs. Genome Biol. 2002;3(3):reviews00041.
Lyu X, Yang Q, Zhao F, Liu Y. Codon usage and protein length-dependent feedback from translation elongation regulates translation initiation and elongation speed. Nucleic Acids Res. 2021;49(16):9404–23.
Shamsuzzaman M, Rahman N, Gregory B, Bommakanti A, Zengel JM, Bruno VM, et al. Inhibition of Ribosome Assembly and Ribosome Translation has distinctly different effects on abundance and paralogue composition of ribosomal protein mRNAs in Saccharomyces cerevisiae. mSystems. 2023;8(1):e0109822.
Strunk BS, Loucks CR, Su M, Vashisth H, Cheng S, Schilling J, et al. Ribosome assembly factors prevent premature translation initiation by 40S assembly intermediates. Science. 2011;333(6048):1449–53.
Waititu JK, Zhang C, Liu J, Wang H. Plant non-coding RNAs: Origin, Biogenesis, Mode of Action and their roles in Abiotic Stress. Int J Mol Sci. 2020;21(21):8401.
Barrett LW, Fletcher S, Wilton SD. Regulation of eukaryotic gene expression by the untranslated gene regions and other non-coding elements. Cell Mol Life Sci. 2012;69(21):3613–34.
Raghavachari N, Garcia-Reyero N. In. Overview of Gene expression analysis: Transcriptomics. 2018. pp. 1–6.
Thome M, Skrablin MD, Brandt SP. Tissue-specific mechanical microdissection of higher plants. Physiol Plant. 2006;128(3):383–90.
Yaschenko AE, Fenech M, Mazzoni-Putman S, Alonso JM, Stepanova AN. Deciphering the molecular basis of tissue-specific gene expression in plants: can synthetic biology help? Curr Opin Plant Biol. 2022;68:102241.
Hurgobin B, Lewsey MG. Applications of cell- and tissue-specific ’omics to improve plant productivity. Emerg Top Life Sci. 2022;6(2):163–73.
Kryuchkova-Mostacci N, Robinson-Rechavi M. A benchmark of gene expression tissue-specificity metrics. Brief Bioinform. 2016. bbw008.
Zheng Y, Ma Y, Luo J, Li J, Zheng X, Gong H, et al. Identification and analysis of reference and tissue-specific genes in bitter Gourd based on Transcriptome Data. Horticulturae. 2023;9(12):1262.
Dennis ES, Ellis J, Green A, Llewellyn D, Morell M, Tabe L, et al. Genetic contributions to agricultural sustainability. Philosophical Trans Royal Soc B: Biol Sci. 2008;363(1491):591–609.
Ku HK, Ha SH. Improving Nutritional and Functional Quality by Genome editing of crops: Status and perspectives. Front Plant Sci. 2020;11.
Heather JM, Chain B. The sequence of sequencers: the history of sequencing DNA. Genomics. 2016;107(1):1–8.
Islas-Osuna MA, Tiznado-Hernández ME. Biotechnology and molecular biology of tropical and subtropical fruits. Postharvest Biology and Technology of Tropical and Subtropical fruits. Elsevier; 2011. pp. 315–80.
Deshpande D, Chhugani K, Chang Y, Karlsberg A, Loeffler C, Zhang J et al. RNA-seq data science: from raw data to effective interpretation. Front Genet. 2023;14.
Mekso MM, Feyissa T. RNA-Seq as an effective Tool for Modern Transcriptomics, a review-based study. J Appl Res Plant Sci. 2022;3(02):236–41.
Kukurba KR, Montgomery SB. RNA sequencing and analysis. Cold Spring Harb Protoc. 2015;2015(11):pdbtop084970.
Rai MK, Rathour R, Kaushik S. Recent advances in Transcriptomics: an Assessment of recent progress in Fruit plants. Omics Technologies for Sustainable Agriculture and Global Food Security (vol II). Singapore: Springer Singapore; 2021. pp. 95–122.
Ayala-Doñas A, de Cara-García M, Román B, Gómez P. Gene expression in Zucchini Fruit Development. Horticulturae. 2022;8(4):306.
Carey S, Mendler K, Hall JC. How to build a fruit: transcriptomics of a novel fruit type in the Brassiceae. PLoS ONE. 2019;14(7):e0209535.
Sharma N. Differential Gene expression studies: a possible way to understand bearing habit in Fruit crops. Transcr Open Access. 2015;03(02).
Zhang C, Hao YJ. Advances in genomic, transcriptomic, and Metabolomic Analyses of Fruit Quality in Fruit crops. Hortic Plant J. 2020;6(6):361–71.
Penna S, Jain SM. Fruit Crop Improvement with Genome Editing, in Vitro and transgenic approaches. Horticulturae. 2023;9(1):58.
Escoto-Sandoval C, Ochoa-Alejo N, Martínez O. Inheritance of gene expression throughout fruit development in Chili pepper. Sci Rep. 2021;11(1):22647.
Keller-Przybyłkowicz SE, Rutkowski KP, Kruczyńska DE, Pruski K. Changes in gene expression profile during fruit development determine fruit quality. Hortic Sci. 2016;43(1):1–9.
Gómez-Ollé A, Bullones A, Hormaza JI, Mueller LA, Fernandez-Pozo N. MangoBase: A Genomics Portal and Gene expression Atlas for Mangifera indica. Plants. 2023;12(6):1273.
Leinonen R, Sugawara H, Shumway M. The sequence read Archive. Nucleic Acids Res. 2011;39(Database):D19–21.
Leinonen R, Akhtar R, Birney E, Bower L, Cerdeno-Tarraga A, Cheng Y, et al. The European Nucleotide Archive. Nucleic Acids Res. 2011;39(Database):D28–31.
Kersey PJ, Lawson D, Birney E, Derwent PS, Haimel M, Herrero J, et al. Ensembl genomes: extending Ensembl across the taxonomic space. Nucleic Acids Res. 2010;38(suppl1):D563–9.
Andrews S. 2010. FastQC: a quality control tool for high throughput sequence data.
Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–20.
Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. 2019;37(8):907–15.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078–9.
Pertea M, Kim D, Pertea GM, Leek JT, Salzberg SL. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat Protoc. 2016;11(9):1650–67.
Pertea M, Pertea GM, Antonescu CM, Chang TC, Mendell JT, Salzberg SL. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol. 2015;33(3):290–5.
Kang YJ, Yang DC, Kong L, Hou M, Meng YQ, Wei L, et al. CPC2: a fast and accurate coding potential calculator based on sequence intrinsic features. Nucleic Acids Res. 2017;45(W1):W12–6.
Yanai I, Benjamin H, Shmoish M, Chalifa-Caspi V, Shklar M, Ophir R, et al. Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification. Bioinformatics. 2005;21(5):650–9.
Condon K. 2020. tispec: Calculates tissue specificity from RNA-seq data. (https://github.com/roonysgalbi/tispec).
Lüleci HB, Yılmaz A. Robust and rigorous identification of tissue-specific genes by statistically extending tau score. BioData Min. 2022;15(1):31.
dos Santos GA, Chatsirisupachai K, Avelar RA, de Magalhães JP. Transcriptomic analysis reveals a tissue-specific loss of identity during ageing and cancer. BMC Genomics. 2023;24(1):644.
Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, Robles M. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005;21(18):3674–6.
Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28(1):27–30.
Dhamija S, Menon MB. Non-coding transcript variants of protein-coding genes – what are they good for? RNA Biol. 2018;1–7.
Jiang W, Chen L. Tissue specificity of gene expression evolves across mammal species. J Comput Biol. 2022;29(8):880–91.
Joshi CJ, Ke W, Drangowska-Way A, O’Rourke EJ, Lewis NE. What are housekeeping genes? PLoS Comput Biol. 2022;18(7):e1010295.
Bastias A, Oviedo K, Almada R, Correa F, Sagredo B. Identifying and validating housekeeping hybrid Prunus spp. genes for root gene-expression studies. PLoS ONE. 2020;15(3):e0228403.
Al-Dasooqi N, Bowen JM, Gibson RJ, Logan RM, Stringer AM, Keefe DM. Selection of housekeeping genes for gene expression studies in a rat model of irinotecan-induced mucositis. Chemotherapy. 2011;57(1):43–53.
Ge Y, Dong X, Liu Y, Yang Y, Zhan R. Molecular and biochemical analyses of avocado (Persea americana) reveal differences in the oil accumulation pattern between the mesocarp and seed during the fruit developmental period. Sci Hortic. 2021;276:109717.
Nath O, Fletcher SJ, Hayward A, Shaw LM, Masouleh AK, Furtado A, et al. A haplotype resolved chromosomal level avocado genome allows analysis of novel avocado genes. Hortic Res. 2022;9:uhac157.
Mbéguié-A-Mbéguié D, Hubert O, Baurens FC, Matsumoto T, Chillet M, Fils-Lycaon B, et al. Expression patterns of cell wall-modifying genes from banana during fruit ripening and in relationship with finger drop. J Exp Bot. 2009;60(7):2021–34.
Chen J, Xie J, Duan Y, Hu H, Hu Y, Li W. Genome-wide identification and expression profiling reveal tissue-specific expression and differentially-regulated genes involved in gibberellin metabolism between Williams banana and its dwarf mutant. BMC Plant Biol. 2016;16(1):123.
Gamboa-Tuz SD, Pereira-Santana A, Zamora-Briseño JA, Castano E, Espadas-Gil F, Ayala-Sumuano JT, et al. Transcriptomics and co-expression networks reveal tissue-specific responses and regulatory hubs under mild and severe drought in papaya (Carica papaya L). Sci Rep. 2018;8(1):14539.
Fabi JP, Broetto SG, da Silva SLGL, Zhong S, Lajolo FM, do Nascimento JRO. Analysis of papaya cell wall-related genes during fruit ripening indicates a central role of polygalacturonases during pulp softening. PLoS ONE. 2014;9(8):e105685.
Mao Q, Chen C, Xie T, Luan A, Liu C, He Y. Comprehensive tissue-specific transcriptome profiling of pineapple (Ananas comosus) and building an eFP-browser for further study. PeerJ. 2018;6:e6028.
Mittal A, Yadav IS, Arora NK, Boora RS, Mittal M, Kaur P, et al. RNA-sequencing based gene expression landscape of guava cv. Allahabad Safeda and comparative analysis to colored cultivars. BMC Genomics. 2020;21(1):484.
Pathak G, Dudhagi SS, Raizada S, Sane VA. Transcriptomic insight into aroma pathway genes and effect of ripening difference on expression of aroma genes in different mango cultivars. Plant Mol Biol Rep. 2022.
Yaish MW, Patankar HV, Assaha DVM, Zheng Y, Al-Yahyai R, Sunkar R. Genome-wide expression profiling in leaves and roots of date palm (Phoenix dactylifera L.) exposed to salinity. BMC Genomics. 2017;18(1):246.
Chen D, Chen H, Dai G, Zhang H, Liu Y, Shen W, et al. Genome-wide identification of R2R3-MYB gene family and association with anthocyanin biosynthesis in Brassica species. BMC Genomics. 2022;23(1):441.
Corpas FJ, González-Gordo S, Palma JM. Ascorbate peroxidase in fruits and modulation of its activity by reactive species. J Exp Bot. 2024;75(9):2716–32.
Gupta N, Dubey A, Tewari L. High efficiency alcohol tolerant Saccharomyces isolates of Phoenix dactylifera for bioconversion of sugarcane juice into bioethanol. J Sci Ind Res. 2009;68:401–5.
Jo HJ, Han JY, Hwang HS, Choi YE. β-Amyrin synthase (EsBAS) and β-amyrin 28-oxidase (CYP716A244) in oleanane-type triterpene saponin biosynthesis in Eleutherococcus senticosus. Phytochemistry. 2017;135:53–63.
Sang Z, Zuo J, Wang Q, Fu A, Zheng Y, Ge Y, et al. Determining the effects of light on the fruit peel quality of photosensitive and nonphotosensitive eggplant. Plants. 2022;11(16):2095.
Feng C, Feng C, Lin X, Liu S, Li Y, Kang M. A chromosome-level genome assembly provides insights into ascorbic acid accumulation and fruit softening in guava (Psidium guajava). Plant Biotechnol J. 2021;19(4):717–30.
Wang P, Luo Y, Huang J, Gao S, Zhu G, Dang Z, et al. The genome evolution and domestication of tropical fruit mango. Genome Biol. 2020;21(1):60.
Huang D, Ma F, Wu B, Lv W, Xu Y, Xing W, et al. Genome-wide association and expression analysis of the lipoxygenase gene family in Passiflora edulis revealing PeLOX4 might be involved in fruit ripeness and ester formation. Int J Mol Sci. 2022;23(20):12496.
Deshpande AB, Chidley HG, Oak PS, Pujari KH, Giri AP, Gupta VS. Isolation and characterization of 9-lipoxygenase and epoxide hydrolase 2 genes: insight into lactone biosynthesis in mango fruit (Mangifera indica L). Phytochemistry. 2017;138:65–75.
Zhou Z, Ford R, Bar I, Kanchana-Udomkan C. Papaya (Carica papaya L.) flavour profiling. Genes (Basel). 2021;12(9).
Zhou D, Shen Y, Zhou P, Fatima M, Lin J, Yue J, et al. Papaya CpbHLH1/2 regulate carotenoid biosynthesis-related genes during papaya fruit ripening. Hortic Res. 2019;6:80.
Ming R, Hou S, Feng Y, Yu Q, Dionne-Laporte A, Saw JH, et al. The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus). Nature. 2008;452(7190):991–6.
Yow AG, Bostan H, Castanera R, Ruggieri V, Mengist MF, Curaba J, et al. Improved high-quality genome assembly and annotation of pineapple (Ananas comosus) cultivar MD2 revealed extensive haplotype diversity and diversified FRS/FRF gene family. Genes (Basel). 2021;13(1):52.
Yi W, Luan A, Liu C, Wu J, Zhang W, Zhong Z, et al. Genome-wide identification, phylogeny, and expression analysis of GRF transcription factors in pineapple (Ananas comosus). Front Plant Sci. 2023;14.
Acknowledgements
The authors are thankful to the CABin Scheme (F. no. Agril. Edn. 4–1/2013-A&P), Indian Council of Agricultural Research, Ministry of Agriculture and Farmers’ Welfare, Govt. of India, for providing the infrastructural support to carry out this research and for the creation of the Advanced Super Computing Hub for Omics Knowledge in Agriculture (ASHOKA) facility where the work was carried out. The IARI Merit Scholarship granted to AR is gratefully acknowledged.
Funding
Not applicable.
Author information
Contributions
DK, MAI, SJ, and MS designed the study; AR, BK, and NK carried out the computational experiments; AR, HC, and UB constructed the database; AR, BK, and SJ interpreted the data and wrote the initial draft of the manuscript. SJ and MAI reviewed and revised the manuscript. All authors have read and approved the final version of the manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Roy, A., Chaurasia, H., Kumar, B. et al. FEAtl: a comprehensive web-based expression atlas for functional genomics in tropical and subtropical fruit crops. BMC Plant Biol 24, 890 (2024). https://doi.org/10.1186/s12870-024-05595-3
DOI: https://doi.org/10.1186/s12870-024-05595-3