The landscape of fusion transcripts in plants: a new insight into genome complexity

Abstract

Background

Fusion transcripts (FTs), generated by the fusion of genes at the DNA level or by RNA-level splicing events, contribute significantly to transcriptome diversity. FTs are usually considered unique features of neoplasia and serve as biomarkers and therapeutic targets for multiple cancers. Recent findings also show the presence of FTs in normal human physiology. Several isolated reports have described fusion transcripts in planta, with important roles in stress responses, morphological alterations, or traits (e.g. seed size).

Results

In this study, we identified 169,197 fusion transcripts in 2795 transcriptome datasets of Arabidopsis thaliana, Cicer arietinum, and Oryza sativa using a combination of tools, and confirmed the translational activity of 150 fusion transcripts through proteomic datasets. Analysis of the FT junction sequences and their association with epigenetic factors, as revealed by ChIP-Seq datasets, demonstrated an organised process of fusion formation at the DNA level. We investigated the possible impact of three-dimensional chromatin conformation on intra-chromosomal fusion events by relating Hi-C interaction data to the incidence of fusion transcripts. We further used long-read RNA-Seq datasets to validate the most recurrent fusion transcripts in each plant species, followed by confirmation through RT-PCR and Sanger sequencing.

Conclusions

Our findings suggest that a significant portion of fusion events may be attributed to alternative splicing during transcription, accounting for numerous fusion events without a proportional increase in the number of RNA pairs. Even non-nuclear DNA transcripts from mitochondria and chloroplasts can participate in intra- and inter-chromosomal fusion formation. Genes in close spatial proximity are more prone to fusion formation, especially in intra-chromosomal FTs. Most of the fusion transcripts may not undergo translation and may instead serve as long non-coding RNAs. The low validation rate of FTs in plants indicates that fusion transcripts are expressed at very low levels, as in humans. FTs often originate from parental genes involved in essential biological processes, suggesting their relevance across diverse tissues and stress conditions. This study presents a comprehensive repository of fusion transcripts, offering valuable insights into their roles in vital physiological processes and stress responses.

Introduction

Fusion transcripts (FTs), also referred to as chimeric RNAs, represent a class of transcripts formed through the fusion of two distinct gene/transcript sequences, either at the DNA or RNA level [1, 2]. Chromosome duplication, deletion, inversion, translocation, gene duplication, and recombination contribute to the formation of chimera or fusion genes, which then transcribe to generate fusion transcripts [3, 4]. At the RNA level, fusion transcripts can arise from diverse mechanisms, such as intergenic cis- or trans-RNA splicing and read-through processes, giving rise to transcription-induced chimeras (TIC) or cis-fusion transcripts [5], tandem chimeric RNAs [5, 6], transcription-induced gene fusions (TIGF) [7], or cis-SAGes [8]. Additionally, the contribution of transposon-mediated fusions [9] and retroposition [10] further amplifies the heterogeneity of these fusion transcripts. This intricate RNA-level fusion formation presents an alternative gene fusion mechanism where the fusion occurs between the exons of pre-existing genes. Remarkably, while this process does not increase the number of genes, it significantly enhances the transcriptome complexity, leading to the emergence of new transcription units with potentially novel functions. In some instances, these novel transcriptional units may become fixed as new genes in the genome through secondary evolutionary events [11]. Importantly, fusion transcripts can serve diverse roles, functioning as non-coding RNAs or potentially encoding novel proteins, thereby influencing cellular responses, physiological processes, and signalling events across various organisms [12, 13].

In humans, fusion events have been identified as significant contributors to oncogenicity and potential biomarkers for cancer diagnosis, prognosis, and therapeutic interventions [14,15,16,17,18]. Intriguingly, the presence of FTs has also been detected under normal physiological conditions [19, 20].

In parallel, fusion transcript discovery has expanded to plants, where in vivo trans-splicing has been reported in numerous species [21,22,23]. This led to the identification of fusion transcripts in plant genomes. Over the past decade, there has been substantial progress in identifying FTs in various plant species [24,25,26], including red clover (Trifolium pratense L.) [26], Oryza [25, 27], Arabidopsis thaliana [28], Camellia sinensis [29], and Zea mays [30]. These plant-specific fusion transcripts contribute to the complexity of the plant transcriptome, potentially playing roles in stress responses, morphological alterations, or gene silencing [31,32,33,34,35,36].

For instance, in Solanum lycopersicum, the PFP-LeT6 fusion has been found to induce morphological changes in tomato leaves, shedding light on the evolutionary dynamics of leaf patterning in the tomato family [35, 37]. Fusion transcripts have also been implicated in the regulation of critical metabolic pathways, such as benzylisoquinoline alkaloid (BIA), terpenoid, cyanogenic glycoside, benzoxazinoid, and amino acid metabolism [38]. For example, the norcoclaurine synthase (NCS) gene fusion in Papaver somniferum and its related species plays a pivotal role in the primary step of the BIA pathway [39]. Similarly, in Oryza rufipogon, the “GRAINS NUMBER 2” (GN2) gene, derived from gene fusion, has been linked to critical rice traits, including grain number, plant height, and heading date [40]. These findings collectively emphasize the role of fusion events in generating novel genes with distinct functions across various organisms. These events contribute to adaptation and enhance overall organism functioning, making the study of fusion transcripts valuable. In plants, however, efforts to establish a comprehensive baseline for fusion events have been sporadic at best.

The advent of high-throughput technology has ushered in an era of data accumulation, providing invaluable insights into the molecular processes underlying fusion events. Notably, whole transcriptome sequencing has emerged as a cost-effective and sensitive approach for detecting fusion genes that span exons from two distinct genes, surpassing earlier detection techniques such as RT-PCR-based methods, fluorescence in situ hybridisation (FISH), and microarrays [41].

Interestingly, around 40 tools are available for the identification of fusion transcripts from RNA-Seq datasets, but all are trained and tested on human datasets. Recently, we identified fusion transcripts in the model plant Arabidopsis thaliana using a modified tool (EricScript-Plants) and assessed their tissue-wise distribution [2]. Here, we have used a combination of tools and a meta-caller (FuMa) to identify fusion transcripts in Arabidopsis thaliana, Cicer arietinum, and Oryza sativa (indica and japonica subgroups). The fusion transcripts were further characterized using multi-omics datasets. We have also validated the top recurrent FTs using long-read sequencing datasets, followed by RT-PCR and Sanger sequencing. Furthermore, a database, PFusionDB (www.nipgr.ac.in/PFusionDB), was developed for all the identified fusion transcripts, incorporating tools such as BLAST, the ‘WATER’ utility of the EMBOSS-6.6.0 package, and other search modules. To the best of our knowledge, the fusion catalog and analysis presented in this research represent the most extensive exploration of fusion transcripts in plants to date. The data herein represent a vast repository of fusion events, the relevance and significance of which warrant further investigation in plant biology.

Materials and methods

Data retrieval for study

The latest genome assemblies of Arabidopsis thaliana (TAIR10.1) and Cicer arietinum (ASM33114v1) were obtained from the National Center for Biotechnology Information (NCBI) genome database (https://www.ncbi.nlm.nih.gov/), while the genome assemblies of Oryza sativa (rice) indica (ASM465v1) and japonica (IRGSP-1.0) subgroups were retrieved from Ensembl Plants (https://plants.ensembl.org/).

A total of 2795 Illumina paired-end RNA-Seq datasets from non-mutant studies (i.e. wild type/normal conditions) were retrieved from the NCBI Sequence Read Archive (SRA) database using SRA-Toolkit (v2.11.0) (https://www.ncbi.nlm.nih.gov/sra/docs/toolkitsoft/). Of these, 1280, 181, 697, and 637 paired-end datasets belonged to A. thaliana, C. arietinum, and the O. sativa indica and japonica subgroups, respectively. A total of 596 ChIP-Seq datasets of A. thaliana, 4 of O. sativa indica, and 87 of O. sativa japonica were downloaded from NCBI-SRA; these include both paired-end and single-end libraries. For the Hi-C analysis, only wild-type samples (i.e. normal conditions) with the restriction enzyme specified were selected from NCBI. For this purpose, a total of 24 samples of A. thaliana and eight samples of O. sativa (four each of the indica and japonica subgroups) were retrieved. No ChIP-Seq or Hi-C interaction data were found for C. arietinum.

The mass spectrometry raw data files for wild-type samples, which contain no mutants or genetic modifications, were retrieved from the PRIDE database [42]. Here, a total of 1668, 189, 121, and 825 LCMS data files were downloaded for the A. thaliana, C. arietinum, O. sativa indica, and japonica subgroups, respectively.

Fusion transcripts identification

Five fusion detection tools, namely STAR-Fusion (v1.10.0) [43], TrinityFusion (v0.3.5) [43], SQUID (v1.5) [44], EricScript-Plants (v0.5.5b) [45], and MapSplice (v2.2.1) [46], were used to identify the fusion transcripts in RNA-Seq datasets. These tools were selected based on their ability to run efficiently on non-human datasets. EricScript-Plants builds its genome database from Ensembl genome database files, which were unavailable for C. arietinum; the tool was therefore not used for this species.

First, raw sequencing reads were pre-processed for adaptor trimming and quality control using the FASTP tool (v0.21.0) [47]. Each tool was then run for fusion identification with default parameters (Supplementary File 1). The individual results from each tool were used as input for FuMa [48] to identify the overlapping fusion transcripts. FuMa’s overlap-based matching strategy treats fusion calls from different tools as the same fusion transcript when their parental genes overlap; fusion transcripts matched in this way were used for further analysis.
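The overlap-based consensus step can be illustrated with a minimal sketch. This is not FuMa's code: it assumes each tool's calls have already been reduced to a pair of gene sets (genes annotated at the 5′ breakpoint, genes annotated at the 3′ breakpoint), and the `min_tools` cut-off and the gene and tool labels are hypothetical, since the text does not state how many tools had to agree.

```python
def calls_match(a, b):
    """Two calls match when both fusion partners share at least one gene."""
    return bool(a[0] & b[0]) and bool(a[1] & b[1])


def consensus_calls(calls_by_tool, min_tools=2):
    """Map each call to the set of tools reporting a matching call.

    calls_by_tool: dict of tool name -> list of (frozenset, frozenset) calls.
    min_tools is an assumed threshold, not a value taken from the study.
    """
    support = {}
    for calls in calls_by_tool.values():
        for call in calls:
            tools = {
                other
                for other, other_calls in calls_by_tool.items()
                if any(calls_match(call, oc) for oc in other_calls)
            }
            support.setdefault(call, set()).update(tools)
    return {c: t for c, t in support.items() if len(t) >= min_tools}


# Toy input with placeholder gene and tool names.
toy_calls = {
    "toolA": [(frozenset({"G1"}), frozenset({"G2"}))],
    "toolB": [
        (frozenset({"G1", "G0"}), frozenset({"G2"})),
        (frozenset({"G5"}), frozenset({"G6"})),
    ],
}
shared = consensus_calls(toy_calls)
```

In this toy input, the G1/G2 fusion is reported by both tools (with slightly different gene annotations on the 5′ side) and is kept, while the G5/G6 call seen by a single tool is dropped.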

Motif analysis and scanning of RNA binding domains

The 200 bp regions of 5′ and 3′ parental genes, which have been used for finding DNA-binding proteins, were analysed to find motifs using the Gapped Local Alignment of Motifs (GLAM2) tool of the MEME SUITE (v5.3.3) [49]. These motifs were scanned with Tomtom (https://meme-suite.org/meme/tools/tomtom) against plant motif databases (CISBP-RNA) for RNA-binding protein domains [50].

Identification of DNA binding proteins around the breakpoint region of fusion parental genes

A total of 719 ChIP-Seq samples were aligned to their respective genomes using Bowtie2 (v2.4.2) [51, 52]: 597 for A. thaliana, 10 for O. sativa indica, and 112 for O. sativa japonica. Peaks were called on the aligned reads with MACS3 (v3.0.0a6) [53] after converting SAM files to sorted BAM files using SAMtools [54]. The Bedtools intersect tool (v2.26.0) [55] was then used to identify intersections between the detected fusion-gene coordinates and the peaks. To study protein interactions enriched around the breakpoint positions of the respective fusion partners, 100 bp sequences upstream and downstream of the breakpoint of each gene were analysed for DNA-binding proteins and histone modification sites.
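The intersection of breakpoint windows with called peaks can be sketched in plain Python. The study used Bedtools intersect for this; the snippet below is only an illustrative equivalent, assuming BED-style half-open coordinates and a hypothetical peak tuple layout of (chromosome, start, end, label).

```python
def breakpoint_window(chrom, pos, flank=100):
    """Window of `flank` bp on each side of a breakpoint (half-open)."""
    return (chrom, max(0, pos - flank), pos + flank)


def overlapping_peaks(window, peaks):
    """Return the peaks sharing at least one base with the window."""
    chrom, start, end = window
    return [p for p in peaks if p[0] == chrom and p[1] < end and p[2] > start]


# Toy peaks with hypothetical labels.
toy_peaks = [
    ("Chr1", 950, 1020, "peak_1"),
    ("Chr1", 1100, 1200, "peak_2"),  # starts exactly where the window ends
    ("Chr2", 950, 1020, "peak_3"),   # same coordinates, other chromosome
]
hits = overlapping_peaks(breakpoint_window("Chr1", 1000), toy_peaks)
```

Only `peak_1` overlaps the 900 to 1100 window around the breakpoint; a peak that merely touches the window boundary is not counted under the half-open convention.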

Hi-C interaction data analysis

The downloaded raw Hi-C interaction reads were pre-processed for adapter trimming with the FASTP tool and then used as input for the HiC-Pro tool [56], which generated valid interactions as output. Using an in-house script, the interaction result file obtained from the Hi-C analysis was annotated with the gene annotation file to extract the interactions involving the fusion-forming parental genes. The interactions around the breakpoint junctions were extracted by locating the breakpoints in the highly interacting regions of the genes.

Chimeric peptides validation

For the identification of translationally active fusion transcripts, we searched publicly available proteomic datasets against the peptide formed at the junction of the two parental genes, forming a chimeric transcript. For this, a chimeric peptide database was created from the identified fusion transcripts using 200 bp upstream and downstream regions from the breakpoint positions of the 5’ and 3’ parental genes, respectively, and concatenating them to form putative fusion transcript sequences. Subsequently, three-frame translation was carried out by the “transeq” utility, and then the translated peptides were in-silico digested by trypsin using the “pepdigest” utility of EMBOSS (v6.6.0) [57], because the available proteomic datasets were trypsin digested. The peptides with no amino acids translated from the breakpoint junction were discarded. The rest of the peptides were combined with reviewed protein sequences from UniProt to create a database.
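The construction of junction-spanning peptides can be sketched as follows. This is a simplified stand-in for the transeq and pepdigest steps, not a reimplementation of EMBOSS: it assumes the standard genetic code, forward frames only, and a common trypsin rule (cleave after K or R unless the next residue is P), and it treats stop codons as peptide boundaries. The function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, frame):
    """Translate one forward frame; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3)
    )


def tryptic_peptides(protein):
    """Yield (start, end) residue indices of peptides within the translation."""
    start = 0
    for i, aa in enumerate(protein):
        last = i == len(protein) - 1
        if aa == "*":  # a stop codon ends the current peptide
            if i > start:
                yield start, i
            start = i + 1
        elif last or (aa in "KR" and protein[i + 1] != "P"):
            yield start, i + 1
            start = i + 1


def junction_peptides(left, right):
    """Peptides whose coding span includes bases from both fusion partners."""
    seq = (left + right).upper()
    junction = len(left)  # index of the first base of the 3' partner
    found = set()
    for frame in range(3):
        protein = translate(seq, frame)
        for s, e in tryptic_peptides(protein):
            nt_start, nt_end = frame + 3 * s, frame + 3 * e
            if nt_start < junction < nt_end:
                found.add(protein[s:e])
    return found


# Toy sequences; real input would be the 200 bp flanks described above.
toy_peptides = junction_peptides("ATGGCTGCT", "GCTAAAGGTGGTTAA")
```

Peptides that lie entirely on one side of the junction are discarded, which mirrors the filtering described in the text; the surviving set would then be merged with the reviewed UniProt sequences to form the search database.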

A total of 2800 raw mass spectra files from 12 different stress conditions or treatments and 14 different normal tissue or cell types were downloaded and converted to the standardised mzML format using ThermoRawFileParserGUI (v1.6.0) [58] and MSConvert (ProteoWizard v3.0.2) [59]. The LCMS mzML data files were searched against the chimeric peptide databases for peptide spectrum matches (PSMs) using the MSGFPlus (v2022.01.07) tool [60]. PSMs with a q-value below 0.05 were retained as potential true positives [61].

Long-read RNA-Seq data analysis and fusion validation

RNAbloom2 was used for error correction and de novo transcriptome assembly of long-read PacBio RNA-Seq datasets collected from the NCBI Sequence Read Archive (SRA). Further, the fusion junction sequence was searched against assembled long-read transcriptome by using BLASTn with an e-value threshold of 0.01 to ensure fusion detection in long-read data.

Plant material, growth conditions, and stress treatment

Seeds of Arabidopsis thaliana Columbia 0 (Col-0) ecotype and rice (Oryza sativa L.) genotype IR64, a semi-dwarf indica rice variety, were procured and grown at the NIPGR plant growth facility in controlled conditions. Arabidopsis seeds were surface sterilized and stratified for 48 h at 4 °C, and then grown in an autoclaved mixture (1:1) of agro-peat and vermiculite in plastic pots at 22 °C in a culture room for 3 weeks under 16 h light/8 h dark (long day) conditions. For drought treatment, 21-day-old seedlings were removed from the pot and transferred to folds of tissue paper. For salinity stress, seedlings were transferred to a beaker containing 150 mM NaCl solution at 22 °C. For cold treatment, the potted seedlings were kept at 4 °C. The control seedlings were kept in plastic pots at 22 °C. Each stress was provided for a period of 5 h, and samples were collected for RNA isolation.

Rice seeds were surface-sterilized and allowed to germinate in a soil-less medium. The sterilized seeds were soaked in distilled water for 24 h in a flask at 37 °C. These seeds were then placed on MS agar medium to germinate for a week, and the coleoptiles of the germinated seeds were slotted into Styrofoam and placed in Yoshida nutrient solution. The plant growth chamber was maintained at a 16 h/8 h light/dark photoperiod and a 26 °C/28 °C day/night temperature. 21-day-old seedlings were subjected to drought [20% (w/v) polyethylene glycol (PEG) 6000], salt stress [150 mM NaCl], and cold stress [4 °C]. Tissues were collected from the stressed and control seedlings after 5 h of treatment. Three independent biological replicates of each tissue sample were harvested, frozen in liquid nitrogen, and stored at -80 °C until further use.

RNA isolation and cDNA preparation, quantitative real-time PCR validation

Total RNA was isolated from 100 mg of tissue using the RNeasy Plant Mini Kit (Qiagen) according to the manufacturer’s instructions. The quality and quantity of RNA samples were assessed using the Nanodrop Spectrophotometer (NanoDrop Technologies) and Agilent Bioanalyzer. DNase treatment was applied to 5 µg of total RNA using the RNase-Free DNase Set (Qiagen) following the manufacturer’s instructions. The cDNA was synthesized from 2 µg RNA using the Verso cDNA synthesis kit (Thermo Fisher, USA). The real-time PCR assay was performed on a StepOne™ Real-Time PCR System (Applied Biosystems™) with a reaction mixture (10 µl) consisting of 5 µl of 2X SYBR Green master mix (Applied Biosystems), 10 µM of each primer, and 100 ng of cDNA as the template. The temperature profile was set at 95 °C for 2 min followed by 40 cycles of 95 °C for 15 s and 60 °C for 1 min. All reactions were carried out with at least three biological and three technical replicates, and the data were evaluated using the 2^−ΔΔCt method to determine the relative expression level [62]. Following RT-PCR and gel electrophoresis, DNA bands were extracted and purified using a GeneJET Gel Extraction kit (Thermo Scientific™) and sent for Sanger sequencing at the NIPGR DNA sequencing facility.
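The relative-expression calculation cited above [62] is commonly written as 2^(−ΔΔCt), where ΔCt is the target Ct minus the reference-gene Ct and ΔΔCt is the treated ΔCt minus the control ΔCt. A minimal sketch follows; the Ct values are invented for illustration, and the choice of reference gene is not stated in the text.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target in treated vs control samples (2^-ddCt)."""
    delta_treated = ct_target_treated - ct_ref_treated   # dCt, treated
    delta_control = ct_target_control - ct_ref_control   # dCt, control
    delta_delta = delta_treated - delta_control          # ddCt
    return 2 ** (-delta_delta)


# Illustrative Ct values only: dCt is 6 (treated) and 8 (control),
# so ddCt = -2 and the fold change is 2^2 = 4.
fold_change = relative_expression(24.0, 18.0, 26.0, 18.0)
```

A fold change above 1 means the target is more abundant in the treated sample relative to the reference gene; the formula assumes near-100% amplification efficiency for both amplicons.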

Statistical analysis

All experimental data were expressed as the mean with standard deviation (mean ± SD) of three independent biological replicates. Differences among the means of control and stressed plants were tested using one-way analysis of variance (ANOVA) followed by Student’s t-test; P-values < 0.05 were considered statistically significant.

Results

Fusion transcripts (FTs) identification

In this study, we profiled a total of 113,740 unique fusion transcripts from 2795 paired-end RNA-Seq libraries, spanning 35 diverse tissues or cell types and 63 different conditions or treatments. Specifically, we identified 62,716, 6690, 25,016, and 19,318 overlapping fusion events in Arabidopsis thaliana, Cicer arietinum, and the Oryza sativa indica and japonica subgroups, respectively (Supplementary Table S1). Intriguingly, we observed multiple junction positions for the same parental RNA pairs, resulting in a total of 75,155 unique fusion RNA pairs: 47,795 in A. thaliana, 3727 in C. arietinum, 12,170 in O. sativa indica, and 11,463 in O. sativa japonica. Furthermore, our analysis showed that a diverse set of 18,015 genes contributed to fusion transcript formation in A. thaliana, 3431 in C. arietinum, 13,176 in O. sativa indica, and 12,742 in O. sativa japonica. Notably, although C. arietinum has the largest genome among the studied species, fewer FTs were discovered in it than in A. thaliana and O. sativa (indica and japonica subgroups). This discrepancy can be attributed to the smaller number of datasets profiled and the less extensive genome annotation in C. arietinum.

Conservation of fusion transcripts among different plant species

To facilitate a direct comparison of fusion occurrences, we searched for homologs of parental genes involved in fusion formation across the four plant species using the OrthoFinder tool. Our analysis revealed that 221 fusion transcripts of Arabidopsis thaliana have homologous fusion pairs in Oryza sativa japonica (Supplementary Table S2). Likewise, we identified a total of 205 Arabidopsis thaliana fusion orthologs in Oryza sativa indica, as well as 156 fusion pair orthologs in Cicer arietinum. We hypothesize that the formation of these fusion transcripts holds functional significance for plants and may lead to their evolutionary selection.
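One way to find fusion pairs conserved between two species is to translate each parental gene into its orthogroup and compare the resulting pairs. The sketch below assumes a gene-to-orthogroup mapping has already been parsed from OrthoFinder output; the function name, the gene labels, and the requirement that the 5′/3′ order be preserved are assumptions, not details given in the text.

```python
def conserved_fusion_pairs(pairs_a, pairs_b, orthogroup_of):
    """Return pairs from species A whose orthogroup pair also fuses in B.

    pairs_a, pairs_b: lists of (five_prime_gene, three_prime_gene).
    orthogroup_of: dict of gene ID -> orthogroup ID (both species).
    """
    def to_groups(pair):
        g5, g3 = (orthogroup_of.get(g) for g in pair)
        # Genes without an orthogroup cannot be compared across species.
        return (g5, g3) if g5 is not None and g3 is not None else None

    groups_b = {to_groups(p) for p in pairs_b} - {None}
    return [p for p in pairs_a if to_groups(p) in groups_b]


# Toy mapping with placeholder gene and orthogroup names.
toy_orthogroups = {
    "At_g1": "OG1", "At_g2": "OG2", "At_g3": "OG3",
    "Os_g1": "OG1", "Os_g2": "OG2",
}
conserved = conserved_fusion_pairs(
    [("At_g1", "At_g2"), ("At_g1", "At_g3"), ("At_g1", "At_g9")],
    [("Os_g1", "Os_g2")],
    toy_orthogroups,
)
```

Here only the At_g1/At_g2 fusion has a counterpart in the second species, because both of its parental genes fall in orthogroups that are also fused there.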

Diversity of fusion transcripts across stress and tissue type

The analysis of fusion transcripts across various stress conditions and tissues revealed distinct expression patterns of chimeric RNAs among samples. In some samples, fusion transcripts were rare, such as in samples from H. armigera infection, while in others they were abundant, with hundreds of events recorded, particularly in samples from leaf tissue (Supplementary Figures S1-S2). Interestingly, most fusion transcripts were detected in just a single transcriptomic sample (83% in total). This pattern was even more pronounced in A. thaliana, where almost 90% of FTs were found in a single sample, while in the other species this proportion averaged around 74%. The scarcity of recurrent FTs highlighted their unique expression profiles, specific to tissues or influenced by specific conditions or treatments (Supplementary Figure S3). Conversely, a subset of the 113,740 fusion transcripts exhibited a broader expression spectrum, being detected in more than one tissue or condition type. For instance, the fusion transcript AT1G29920_AT1G29930, whose parental genes are associated with chlorophyll-binding proteins, was expressed in 289 samples spanning 17 different conditions and 13 distinct tissue types (Supplementary Table S3). Only a limited number of fusion transcripts were commonly profiled across different tissues or condition types, underscoring the predominantly unique expression patterns of the FTs (Supplementary Figure S4).
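The share of fusion transcripts seen in only one sample can be computed directly from a per-sample listing of FT identifiers. The following is a minimal sketch with placeholder sample and FT names; it assumes FTs are identified by a stable label such as the parental gene pair.

```python
from collections import Counter


def single_sample_fraction(sample_fts):
    """Fraction of FTs detected in exactly one sample.

    sample_fts: dict of sample ID -> set of FT identifiers.
    """
    counts = Counter(ft for fts in sample_fts.values() for ft in fts)
    if not counts:
        return 0.0
    unique = sum(1 for n in counts.values() if n == 1)
    return unique / len(counts)


# Toy data: FT_a recurs in two samples, the other three are sample-specific.
toy_samples = {
    "sample1": {"FT_a", "FT_b"},
    "sample2": {"FT_a", "FT_c"},
    "sample3": {"FT_d"},
}
fraction = single_sample_fraction(toy_samples)
```

In this toy set, three of the four FTs occur in a single sample, giving 0.75; applied per species, the same count would yield proportions comparable to those reported above.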

Characterization of fusion transcripts

Of the 113,740 identified FTs, approximately 55% were inter-chromosomal junction events, with the 5’ and 3’ parental genes on different chromosomes, and on average 45.01% were intra-chromosomal junction events, with both parental genes on the same chromosome (Supplementary Figure S5). The intra-chromosomal fusion events were further categorised into two groups based on the genomic distance between the two parental genes (Supplementary Figure S5). Fusions whose parental genes lay on the same chromosome at a genomic distance of 200 kb or less were termed ‘proximal’, while those at a distance greater than 200 kb were termed ‘distal’. Around 80% of intra-chromosomal fusion transcripts were of the proximal type (36.4% of all FTs on average), and only 8.59% of all fusion transcripts were of the distal type (Supplementary Table S1). Additionally, we analysed the relative distance between fusion gene partners present on the same strand and chromosome to determine whether a specific intergenic distance was preferred between the two parental genes forming a fusion transcript. For most such fusion transcripts, the parental genes lay less than 200 kb apart. This suggests that the mechanism underlying fusion transcript generation might prefer shorter distances between genes in the case of cis-SAGe fusions (Supplementary Figure S6). A Circos plot was constructed to show the distribution of identified fusion transcripts across the genomes of all four species (Fig. 1). Interestingly, chromosome 1 harboured the maximum number of both inter-chromosomal and intra-chromosomal RNA pairs in A. thaliana, while chromosome 6 contributed the largest number in C. arietinum (Fig. 1). In O. sativa (indica and japonica subgroups), chromosome 3 seems to be a “hotspot” for fusion transcripts, with the maximum number of FTs having one of their parental genes on this chromosome (Fig. 1).

Fig. 1

Circos plots showing the genome-wide linkage of fusion transcript parental genes, where chimeric transcripts are visualized as a line connecting the parental genes. A. Arabidopsis thaliana, B. Cicer arietinum, C. Oryza sativa indica, and D. Oryza sativa japonica. The plots are drawn separately for fusion transcripts classified as proximal type (depicted by blue lines), distal type (depicted by red lines), and inter-type (denoted by multi-colored lines)
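The proximal, distal, and inter-chromosomal categories shown in the plots follow a simple rule that can be sketched as below. The 200 kb threshold comes from the text; representing each parental gene by a single coordinate is an assumption, since the text does not say which gene positions were used to measure distance.

```python
PROXIMAL_MAX_BP = 200_000  # 200 kb threshold stated in the text


def classify_fusion(chrom5, pos5, chrom3, pos3):
    """Classify a fusion by the location of its two parental genes."""
    if chrom5 != chrom3:
        return "inter-chromosomal"
    distance = abs(pos5 - pos3)
    return "proximal" if distance <= PROXIMAL_MAX_BP else "distal"


# Toy coordinates for illustration.
labels = [
    classify_fusion("Chr1", 10_000, "Chr3", 10_000),
    classify_fusion("Chr1", 10_000, "Chr1", 150_000),
    classify_fusion("Chr1", 10_000, "Chr1", 900_000),
]
```

The three toy calls fall into the inter-chromosomal, proximal, and distal categories in that order.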

Fusion can occur in either the coding or the untranslated regions of the parental genes, regulating the translational activity of the RNA pairs [16]. On average, 70% of the chimeric transcripts had both sides of the junction in the middle of known exons in all species studied and were therefore classified as “M/M” (Supplementary Figure S5). In contrast, only 5% of the FTs identified were classified as “E/E”, where both junction breakpoints lay at known exon ends (Supplementary Figure S5), with a breakpoint counted as an exon end when it fell within 1 bp of the annotated end position. The predominance of chimeric transcripts lacking known exon boundaries at the fusion sites could suggest that these transcripts were formed at sites that do not use canonical splice sites and were therefore more likely to be novel exonic sequences produced by transcription of altered genes. This observation may also indicate that such FTs were produced at the DNA level by non-allelic homologous recombination. Additionally, 6% of all overlapping FTs had one breakpoint at an exon end and the other in the middle of an exon (E/M or M/E) (Supplementary Figure S5). A lower percentage of fusions had their junction breakpoints in untranslated or unknown regions, with an average of 2.5% of FTs having either both sides in the 3’ or 5’ UTR (U/U) or one side in a UTR and the other at an exon end or in the middle of an exon (U/E, E/U, U/M, or M/U). Such fusions may preserve the normal coding potential of one gene while changing the other, or they may result in “promoter swapping”, where the promoter of the 5’ gene drives expression of the 3’ gene (junction sequences lying in the 5’ UTR regions). Notably, E/E category FTs were mostly of the inter-chromosomal type (on average 58%), whereas FTs with junction positions in UTRs mostly belonged to the proximal type (Supplementary Figure S7).
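The junction categories above can be reproduced with a small classifier. This is a sketch under stated assumptions: coordinates are 1-based and inclusive, either boundary of an exon counts as an "exon end", UTR membership takes precedence over exon membership, and the label "N" for breakpoints outside any annotated feature is a placeholder introduced here.

```python
def breakpoint_class(pos, exons, utrs=(), tolerance=1):
    """Return 'U', 'E', 'M', or 'N' for one breakpoint position.

    exons, utrs: iterables of (start, end) intervals for the parental gene.
    tolerance: distance in bp within which a breakpoint counts as an exon end.
    """
    if any(start <= pos <= end for start, end in utrs):
        return "U"
    if any(abs(pos - start) <= tolerance or abs(pos - end) <= tolerance
           for start, end in exons):
        return "E"
    if any(start < pos < end for start, end in exons):
        return "M"
    return "N"


def junction_class(pos5, gene5, pos3, gene3):
    """Combine both sides into a label such as 'M/M' or 'E/M'.

    gene5, gene3: dicts with 'exons' and optional 'utrs' interval lists.
    """
    side5 = breakpoint_class(pos5, gene5["exons"], gene5.get("utrs", ()))
    side3 = breakpoint_class(pos3, gene3["exons"], gene3.get("utrs", ()))
    return f"{side5}/{side3}"


# Toy gene models.
gene_a = {"exons": [(100, 200), (300, 400)], "utrs": [(100, 120)]}
gene_b = {"exons": [(1000, 1500)]}
label = junction_class(350, gene_a, 1499, gene_b)
```

With these toy models, position 350 sits in the middle of an exon of the 5′ gene and position 1499 lies within 1 bp of an exon end of the 3′ gene, so the junction is labelled M/E.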

Impact of sequencing depth and genome complexity on fusion transcripts distribution

Our findings revealed a positive correlation between the number of fusion transcripts expressed in a sample and its sequencing depth. This observation suggested that the identification of fusion transcripts is influenced by read depth, with a marked increase in the number of FTs detected as the number of reads in a sample increases (Supplementary Figure S8). Compared to C. arietinum and O. sativa japonica, where the average correlation coefficient was around 0.2, this correlation was notably stronger in A. thaliana and O. sativa indica, with a correlation coefficient of 0.5. This observation can be attributed to the larger number of genes present in these genomes. As the gene count increases, the likelihood of genes participating in fusion events also increases.

The number of fusion-forming genes on a chromosome showed a positive correlation with the total number of genes residing on that chromosome. Notably, A. thaliana had the highest percentage of genes involved in the formation of fusion transcripts, with nearly 57.8% of total annotated genes participating in chimeric RNA formation. In contrast, C. arietinum had the lowest share of annotated genes contributing to fusion formation, approximately 11% of the total. In the O. sativa genomes (indica and japonica subgroups), approximately 30% of the total annotated genes on each chromosome were involved in fusion transcript formation. This distribution pattern suggested that FTs were not created in a biased manner and that every gene on each chromosome has an equal opportunity to take part in fusion events (Supplementary Figure S9).

Specificity of fusion transcripts in tissues and stress conditions

The stress and tissue samples with identified fusion transcripts were visualised using a t-Distributed Stochastic Neighbour Embedding (t-SNE) plot, created from the presence or absence of FTs in each RNA-Seq sample, to investigate the association of FTs with specific cell types or tissue differentiation lineages (Fig. 2). The clustering analysis showed that most samples from similar tissues or conditions grouped together in all four species studied. In A. thaliana, a total of nine clear clusters were observed (Fig. 2A), with samples from different stresses and tissues grouped separately, except for cold stress in aerial tissue and Hyaloperonospora arabidopsidis (HPA) infection in leaves, where the samples from the two different tissues and conditions clustered together. The nine clusters included separate clusters for cytokinin and cadmium treatment in roots; aphid infestation in the rosette; mite infestation in the leaf; gibberellic acid treatment in the shoot; abscisic acid and blue light treatment in the seedling; and Colletotrichum tofieldiae or Colletotrichum incanum (Ct-Ci) infection in the root. In C. arietinum, only five distinct clusters were observed, including 500 ppm and 700 ppm of carbon dioxide stress in the shoot tissue, Fusarium oxysporum f. sp. ciceris race 1 (Foc1) infection in the root, Ascochyta treatment in leaf and stem, Helicoverpa armigera infection in the leaf, and drought stress in the shoot. Clustering was much more scattered in C. arietinum, with a few tissues and conditions ungrouped (root tissue, drought, and salt stress) due to the smaller number of samples analysed (Fig. 2B).

Fig. 2

t-SNE plots, drawn with the Rtsne package from a binary presence/absence profile of FTs in each RNA-Seq sample, used to investigate the association of FTs with specific cell types or tissue differentiation lineages. A. Arabidopsis thaliana, B. Cicer arietinum, C. Oryza sativa indica, and D. Oryza sativa japonica

A total of seven clusters were observed in Oryza sativa indica, with separate clusters for drought stress in samples from various tissues, including panicle, leaf, and seedling. Other clusters formed in O. sativa indica include submergence and salt stress in the leaf, silicon treatment in the root, and Rhizoctonia solani infection in the leaf (Fig. 2C). The highest number of clusters was observed in O. sativa japonica (14 clusters), including Na2CO3 treatment in the leaf; drought, iron (Fe), and arsenic stress in the shoot; RDV infection in the seedling; salt stress in the root, seed, and seedling; Magnaporthe oryzae, Hirschmanniella oryzae, Pyricularia oryzae, mycorrhiza, and Rhizoctonia solani infection in the leaf; and drought stress in the leaf. These results further support the tissue and stress specificity of FTs (Fig. 2D).
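The binary profile underlying the t-SNE plots can be built as a sample-by-FT matrix of 0s and 1s. The sketch below shows only this matrix construction, with placeholder sample and FT names; the embedding itself was drawn with the Rtsne package and is not reproduced here.

```python
def presence_absence_matrix(sample_fts):
    """Build a binary matrix with one row per sample and one column per FT.

    sample_fts: dict of sample ID -> set of FT identifiers.
    Returns (sample_ids, ft_ids, matrix), with rows and columns sorted by ID.
    """
    samples = sorted(sample_fts)
    fts = sorted(set().union(*sample_fts.values()))
    matrix = [
        [1 if ft in sample_fts[sample] else 0 for ft in fts]
        for sample in samples
    ]
    return samples, fts, matrix


# Toy input: two leaf samples sharing an FT and one root sample.
toy_profile = {
    "leaf_1": {"FT_a", "FT_b"},
    "leaf_2": {"FT_a"},
    "root_1": {"FT_c"},
}
row_ids, col_ids, binary = presence_absence_matrix(toy_profile)
```

Each row of `binary` is the feature vector of one sample; samples with similar FT content, such as the two leaf samples here, share more non-zero columns and would be expected to lie closer together after embedding.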

Functional analysis of fusion transcripts

Gene ontology (GO) enrichment analysis was performed to functionally classify the parental genes participating in fusion events. The analysis was carried out separately for parental genes of fusions expressed uniquely in a single sample and for those of recurrent fusion transcripts, to discern potential differences in the functional attributes of the two groups. In the case of Cicer arietinum, where the parental genes lacked annotations, we used homologous sequences from A. thaliana to perform this analysis, which allowed us to infer functional insights despite the absence of direct annotations. Interestingly, several molecular function terms, including “structural molecule activity”, “structural constituent of ribosome”, and “anion binding”, were significantly enriched in the parental genes of both recurrent and unique fusions in all species (Supplementary Figure S10). This could imply that the fusion transcripts are involved in processes fundamental to cells, which would explain why some FTs were expressed in samples from different tissues or cell types. Additionally, “ribosome”, “carbon metabolism”, “carbon fixation in photosynthetic organisms”, and “photosynthesis” were the major pathways commonly enriched in both the recurrent and unique parental genes of A. thaliana and O. sativa japonica FTs. Enrichment maps were then constructed from the GO analysis results to functionally characterise the FT parental genes in the processes where most of them are statistically represented. 
The map further showed the common enrichment of FT parental genes (both recurrent and unique) in processes including “photosynthetic process,” “nucleotide metabolic process,” and “ATP metabolic process”. Conversely, it was found that few GO terms related to stress responses were enriched in genes that were uniquely present in a sample of each species. This analysis also indicated that fusion transcripts were usually involved in metabolic or photosynthetic processes shared by all plant species, but some sample-specific fusion transcripts might also be regulated in stress conditions (Fig. 3).
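The statistical idea behind a GO over-representation test can be sketched as follows. This is a minimal illustration that assumes a one-sided hypergeometric test; the text does not state which enrichment tool or test was used, and the function name and the toy numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int, sample: int, hits: int) -> float:
    """One-sided hypergeometric p-value P(X >= hits).

    population: number of annotated background genes
    annotated:  background genes carrying the GO term
    sample:     number of FT parental genes tested
    hits:       parental genes carrying the GO term
    (Assumed test; not necessarily the one used in the study.)
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers for illustration only; they are not taken from the study.
p_value = hypergeom_enrichment_p(population=20, annotated=5, sample=4, hits=2)
```

In practice such p-values would be corrected for multiple testing across all GO terms before a term is called enriched.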

Fig. 3

GO enrichment maps for fusion transcripts expressed uniquely in one sample and fusion transcripts identified in more than one sample of A. Arabidopsis thaliana, B. Oryza sativa indica, C. Oryza sativa japonica, and D. Cicer arietinum

FT co-expression profiles were also constructed from the expression values in the EricScript results to gain insight into FT functionality, by identifying groups of FTs with correlated expression levels across multiple samples. The analysis reported a total of 61 FTs (11 in A. thaliana, 27 in O. sativa indica, and 23 in O. sativa japonica) that were highly correlated (Pearson correlation coefficient ≥ 0.70) in expression across multiple samples. This suggests that co-expressed fusions may be involved in similar processes and might have similar functionality. For example, for the co-expressed O. sativa japonica FTs Os12g0420200_Os04g0473150 and Os12g0420400_Os04g0692200, all four parental genes were found to be involved in photosynthesis (Fig. 4).
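The co-expression filter described above can be sketched as a pairwise Pearson correlation with a cut-off of 0.70. The sketch assumes that each FT has one expression value per sample, in the same sample order; the FT identifiers and values below are hypothetical placeholders, not EricScript output.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for constant profiles
    return cov / sqrt(var_x * var_y)


def coexpressed_pairs(expression, threshold=0.70):
    """Return (ft_a, ft_b, r) for every FT pair with r >= threshold."""
    pairs = []
    for (ft_a, values_a), (ft_b, values_b) in combinations(sorted(expression.items()), 2):
        r = pearson(values_a, values_b)
        if r >= threshold:  # NaN compares as False and is skipped
            pairs.append((ft_a, ft_b, r))
    return pairs


# Hypothetical expression values across four samples.
toy_expression = {
    "FT_A": [1.0, 2.0, 3.0, 4.0],
    "FT_B": [2.0, 4.0, 6.0, 8.5],
    "FT_C": [4.0, 1.0, 3.0, 2.0],
}
toy_pairs = coexpressed_pairs(toy_expression)
```

Only the FT_A and FT_B profiles rise together in this toy input, so they form the single reported pair.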

Fig. 4

The fusion transcript expression correlation plotted for A. Arabidopsis thaliana, B. Oryza sativa indica, C. Oryza sativa japonica

Correlation analysis between fusion transcripts and their parental genes

Pearson’s correlation analysis was conducted to evaluate the relationship between the expression patterns of fusion transcripts and their parental genes. No significant correlation was found between the expression levels of fusion transcripts and those of their parental genes. This observation implies that the expression of fusion transcripts is likely independent of the expression pattern of their parental genes, and suggests that fusion transcript expression is regulated by mechanisms separate from those controlling parental gene expression (Supplementary Figure S11).

Motif analysis and RNA binding proteins (RBP) domains

The overlap of motifs identified in the 5’ and 3’ parental genes of the different species revealed four common RNA-binding motifs: M198_0.6, M068_0.6, M205_0.6, and M231_0.6. The M198_0.6 motif was prevalent in the 5’ parental genes of the FTs in A. thaliana and in both parental genes of FTs in C. arietinum and O. sativa japonica (Fig. 5). M068_0.6 was present in both the 5’ and 3’ parental genes of A. thaliana FTs and in the 3’ parental genes of O. sativa japonica FTs. The motif M205_0.6 was found in both parental genes of indica rice and in the 5’ parental genes of japonica rice, while motif M231_0.6 was present in the 3’ parental genes of Arabidopsis and in both parental genes of O. sativa indica (Fig. 5). Among these RNA-binding motifs, M198_0.6, M068_0.6, and M205_0.6 bind to the RNP_1 domain, which regulates alternative splicing, RNA stability, and translation [63], while M231_0.6 binds to the S1 domain, which plays a role in translation by interacting with mRNA and ribosomes [64].

Fig. 5

Sequence logo of the most enriched motifs scanned by TomTom identified in upstream and downstream sequences from the chimeric junction of 5’ and 3’ fusion parental genes in A. Arabidopsis thaliana, B. Cicer arietinum, C. Oryza sativa indica, and D. Oryza sativa japonica using GLAM2

DNA binding proteins on breakpoint region

We detected enrichments of 34 DNA-binding proteins and 44 different types of histone modifications near the junction breakpoints on the fusion-forming parental genes in both A. thaliana and O. sativa japonica (Supplementary Table S4). However, we did not observe any DNA-binding proteins or histone modification enrichments in the O. sativa indica subgroup because of the small number of samples analysed. The prevalent histone modification “H3K4me3”, typically found in close proximity to the transcriptional start sites of promoters, and closely associated with transcriptional mechanisms, was abundant around most junction breakpoint sites [65, 66]. Additionally, the histone modification “H3K27me3” was also found to be abundantly enriched around most of the junction breakpoint sites. This histone mark holds significance in transcriptional regulation and plays a role in the regulation of plant adaptation to environmental stresses [67].

Our analysis further revealed enrichments of DNA-binding proteins linked to plant growth and stress responses. Noteworthy examples include PPD2 [68], a regulator of leaf growth, and ABI5, a regulator of abscisic acid signalling and growth [69]; many stress-responsive proteins, including those containing the NAC domain, were also enriched [70]. Intriguingly, a substantial number of fusion transcript pairs in O. sativa japonica exhibited enrichment of stress-related DNA-binding proteins, including SNAC1, ONAC127, and ONAC129. Remarkably, 1372 fusion pairs were associated with the SNAC1 DNA-binding protein near the junction breakpoint sites of the parental genes [71]. The presence of histone modifications and DNA-binding proteins at the breakpoint regions points to as yet unknown mechanisms governing the generation of fusion transcripts and their specific functionalities (Supplementary Table S4). For instance, the DNA-binding protein SPL7 was found in proximity to the junction of the 5’ gene of the most recurrent A. thaliana FT, AT1G29920_AT1G29930. SPL7 plays a pivotal role in copper homeostasis, a requirement for photosynthetic processes, and the parental genes involved in this fusion are associated with chlorophyll A/B-binding. This observation suggests a potential mechanism contributing to the functional characteristics of the FT [72] (Supplementary Figure S12). Similarly, SNAC1, a DNA-binding protein implicated in iron homeostasis and drought-stress responsiveness, was found to be enriched in the 3’ gene of the FT Os11g0106700_Os12g0106000, which is associated with iron ion binding-related metabolic activities [71] (Supplementary Figure S13).

Spatial proximity and fusion breakpoints

The Hi-C data analysis revealed that a significant number of interactions were inter-chromosomal in each species, with averages of 26%, 58%, and 62.5% observed in A. thaliana, O. sativa indica, and O. sativa japonica, respectively. Additionally, stress samples, for example the salt stress samples in O. sativa (indica and japonica subgroups), had fewer interacting regions than the control samples. This observation suggests that the positioning and interaction pattern of genes change under different environmental stress conditions.

The total valid interactions obtained were then annotated and compared with the fusion breakpoints to extract, from the broad chromatin interacting domains, only those interactions involving fusion-forming parental genes. This revealed that most of these interactions occurred between parental genes on the same chromosome, with only a few occurring between genes on different chromosomes, even though inter-chromosomal interactions were frequent among the total interactions in all species (Supplementary Table S5). On average, 5.2% of FTs had their breakpoint near the highly interacting regions of chromatin, with a total of 6583 chromatin interactions associated with the FTs. A total of 3722 FTs were involved in A. thaliana, 1283 in O. sativa indica, and 3135 in O. sativa japonica. It was also found that 12% (on average) of the intra-chromosomal FTs had breakpoints in the highly interacting regions of the fusion-forming genes.
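The comparison of fusion breakpoints with Hi-C interacting regions amounts to an interval lookup, which can be sketched as below. The data layout (coordinate tuples), the optional `flank` window, and all identifiers are assumptions made for illustration and are not taken from the analysis scripts.

```python
from collections import defaultdict


def index_regions(regions):
    """Group (chrom, start, end) interacting regions by chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    return by_chrom


def fts_near_interactions(breakpoints, regions, flank=0):
    """Return the FT ids whose breakpoint lies within `flank` bp of a region.

    breakpoints: dict mapping ft_id -> (chrom, position)
    regions:     iterable of (chrom, start, end), inclusive coordinates
    """
    by_chrom = index_regions(regions)
    hits = set()
    for ft_id, (chrom, position) in breakpoints.items():
        for start, end in by_chrom.get(chrom, []):
            if start - flank <= position <= end + flank:
                hits.add(ft_id)
                break  # one overlapping region is enough
    return hits


# Hypothetical coordinates for illustration only.
toy_regions = [("chr1", 1000, 2000), ("chr2", 500, 900)]
toy_breakpoints = {"FT1": ("chr1", 1500), "FT2": ("chr1", 5000), "FT3": ("chr2", 950)}
inside = fts_near_interactions(toy_breakpoints, toy_regions)
nearby = fts_near_interactions(toy_breakpoints, toy_regions, flank=100)
fraction_inside = len(inside) / len(toy_breakpoints)
```

The linear scan is adequate for a sketch; a genome-scale comparison would normally use a sorted index or an interval tree.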

Translationally active FTs: chimeric peptides

An in-silico peptide database was created by three-frame translation of the identified FT sequences followed by in-silico enzymatic digestion, so that the FTs could be validated against already available proteomic data. Using an in-house script, a total of 46,756 in-silico chimeric peptides spanning the fusion junctions were generated, with 16,662, 4928, 17,156, and 8010 chimeric peptides in A. thaliana, C. arietinum, O. sativa indica, and O. sativa japonica, respectively. Of all the acquired Peptide Spectrum Matches (PSMs), only 4219 PSMs had a q-value higher than the threshold of 0.05. The analysis of these results revealed that only 139 chimeric peptides translated from 150 chimeric transcripts retained the junction sequence, with 40 chimeric peptides found in A. thaliana, five in C. arietinum, 47 in O. sativa indica, and 47 in O. sativa japonica (Supplementary Table S6). A. thaliana showed a higher percentage of validation in FTs identified from shoot tissue, C. arietinum had the highest number of chimeric peptides validated in seed developmental stages, O. sativa indica in drought stress conditions, and O. sativa japonica in leaf tissue. Interestingly, approximately 56% of the FTs translating a valid chimeric peptide had their parental genes on the same chromosome.
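The generation of junction-spanning peptides can be sketched as three-frame translation followed by in-silico digestion. This is not the in-house script: it assumes trypsin-like cleavage (after K or R, but not before P), the standard genetic code, and a hypothetical minimum peptide length, and the toy sequence is invented.

```python
import re

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in T, C, A, G order.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Trypsin-like rule (an assumption): cut after K or R unless followed by P.
TRYPTIC = re.compile(r".*?(?:[KR](?!P)|$)")


def translate(seq, frame):
    """Translate one forward frame; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3))


def junction_peptides(seq, junction, min_len=6):
    """Peptides from the three forward frames that span the fusion junction.

    junction is the 0-based index of the first base contributed by the 3' gene.
    """
    found = set()
    seq = seq.upper()
    for frame in range(3):
        protein = translate(seq, frame)
        for orf in re.finditer(r"[^*]+", protein):  # split at stop codons
            for pep in TRYPTIC.finditer(orf.group()):
                if len(pep.group()) < min_len:
                    continue
                nt_start = frame + 3 * (orf.start() + pep.start())
                nt_end = frame + 3 * (orf.start() + pep.end())
                if nt_start < junction < nt_end:  # bases on both sides of the junction
                    found.add(pep.group())
    return found


# Invented 30-nt toy fusion: 15 nt from the 5' gene followed by 15 nt from the 3' gene.
toy_fusion = "ATGGCTGCTAAAGGT" + "GAATTTCCTCGTTAA"
toy_peptides = junction_peptides(toy_fusion, junction=15, min_len=4)
```

The resulting peptide set would then serve as the search database against which the mass spectra are matched.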

Validation of fusion transcripts with long-read RNA-Seq datasets

We obtained PacBio RNA-Seq data (Supplementary Table S7) for Arabidopsis thaliana and Oryza sativa from the NCBI Sequence Read Archive (SRA). Subsequently, we performed error correction and de-novo transcriptome assembly using RNAbloom2. Following this, we conducted a BLASTn search at an e-value threshold of 0.01 for the top 100 fusion junction sequences identified from Illumina data for each plant species against the long-read transcriptome assembly. Long-reads exhibiting over 90% sequence identity with the fusion junction sequence were classified as true fusions. Using this approach, we validated 44 fusions of Arabidopsis thaliana in long-read transcriptome assembly, 10 fusions of Oryza sativa japonica, and 17 fusions of Oryza sativa indica (Supplementary Table S7).
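The filtering step can be sketched as a pass over BLAST tabular output. The sketch assumes the default 12-column `-outfmt 6` layout (query id in column 1, percent identity in column 3, e-value in column 11); the hit lines and identifiers are hypothetical, and whether identity was assessed over the whole junction sequence is not stated in the text.

```python
def validated_fusions(blast_lines, min_identity=90.0, max_evalue=0.01):
    """Return query ids with at least one hit above the identity cut-off.

    Expects BLAST tabular lines (assumed default -outfmt 6 column order):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    validated = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query_id = fields[0]
        percent_identity = float(fields[2])
        evalue = float(fields[10])
        if percent_identity > min_identity and evalue <= max_evalue:
            validated.add(query_id)
    return validated


# Hypothetical hits for illustration only.
toy_hits = [
    "FT_junction_1\ttranscript_12\t98.50\t200\t3\t0\t1\t200\t501\t700\t1e-95\t350",
    "FT_junction_2\ttranscript_40\t85.00\t120\t18\t0\t1\t120\t10\t129\t2e-20\t90.1",
    "FT_junction_3\ttranscript_77\t95.00\t30\t1\t0\t1\t30\t5\t34\t0.5\t28.0",
]
toy_validated = validated_fusions(toy_hits)
```

A stricter variant could also require the alignment to cover the junction position itself, so that a match to only one parental gene is not counted.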

Validation of fusion transcripts using RT-PCR and Sanger sequencing

We selected 41 recurrent fusion transcripts (23 FTs in Arabidopsis thaliana and 18 in Oryza sativa), identified in more than 50 samples of both species, for validation using RT-PCR and traditional Sanger sequencing. We concentrated on validating the most frequent fusions because they were found in numerous tissues, which could indicate that they play a part in some fundamental function that may be significant for all cell types. The primers were designed to anneal to the putative sequence of the fusion transcripts flanking the fusion junction site, inferred by joining 200 bp upstream of the junction in the 5’ parental gene with 200 bp downstream of the junction in the 3’ parental gene (Supplementary Table S8). We successfully validated 14 fusion transcripts with confirmed junctions: nine in A. thaliana and five in O. sativa (Supplementary Table S8). Except for the fusion transcript ATCG00065_ATCG01230, the experimentally validated junction site of every FT overlapped the site predicted in silico by the fusion identification tools (Supplementary Figure S14). The FTs chosen for validation included 14 proximal, 24 inter, and 3 distal-type fusions. Surprisingly, 50% of the proximal fusions were validated, compared with 29% of the inter-type fusions and none of the distal fusions. Additionally, fusions whose junctions lie at the ends of exons and fusions whose junctions lie in the middle of exons in both parental genes had a nearly equal chance of being validated (41.6% E/E and 41.1% M/M). The relative expression of the validated FTs was examined under drought, salt, and cold stress to understand the expression of the fusion transcripts under these common stresses. Seven FTs (three in A. thaliana and four in O. sativa) were found to be differentially expressed under a variety of stresses, with FTs from A. thaliana expressed more strongly under salt stress, while four of the five FTs from O. sativa showed relative downregulation under drought stress (Fig. 6 and Supplementary Figure S15). For example, FT Os08g0534200_Os02g0305800 was strongly downregulated under drought stress, with a fold change of -3, and FT AT4G06534_AT4G06536 was strongly downregulated under salt stress, with a fold change of -4.

Fig. 6

Experimental validation of fusion transcripts. (A) The expression graph of validated fusion transcripts showed variable expression in drought, salt, and cold stress as examined by quantitative RT-PCR. Error bars denote standard deviation (SD). * p < 0.05, ** p < 0.01, *** p < 0.001 when compared with the control. (B) Agarose gel electrophoresis of the RT-PCR products on a 1.5% gel, where GAPDH_Arabidopsis and GAPDH_rice were the reference genes used for normalisation

Discussion

Transcription-mediated gene fusion in plants was thought to be extremely rare for a long time. However, as the last decade unfolded, genome-wide surveys unveiled a multitude of transcription-induced chimeras [25,26,27, 73]. Nevertheless, the functional and evolutionary potential, as well as the emergence of these fusion transcripts in plants, remains relatively uncharted territory. This study endeavours to provide a comprehensive view of the fusion transcript landscape in four important plant species, namely, Arabidopsis thaliana, Cicer arietinum, Oryza sativa indica, and japonica group. These analyses span different tissues and stress conditions, profiled using paired-end transcriptome data to identify the fusion transcripts with high confidence.

The tools selected for the identification of fusion transcripts in the abovementioned plants yielded varying numbers of fusion transcripts for each sample, owing to the diverse methodologies and filtering criteria employed by each tool [43,44,45,46] (Supplementary Table S9). This diversity introduces the possibility of both missing true positives and including false positives. For instance, MapSplice, which does not rely on canonical splice sites for fusion identification, may report numerous artifacts [46]. Conversely, STARFusion might impose stringent filtering steps that result in the exclusion of potential fusion transcripts [43]. Hence, in this study, overlapping fusions were identified using FuMa, which matches fusions whose parental genes share a common region [48].
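The idea of retaining fusions supported by more than one tool can be sketched as follows. This is a deliberately simplified illustration that matches calls by parental gene identifiers; it is not FuMa's algorithm, which compares the genomic regions of the parental genes, and the `min_tools` cut-off, tool labels, and gene identifiers are assumptions.

```python
from collections import defaultdict


def consensus_fusions(calls_by_tool, min_tools=2):
    """Return fusion gene pairs reported by at least `min_tools` tools.

    calls_by_tool: dict mapping tool name -> iterable of (5' gene, 3' gene)
    The result maps each retained pair to the sorted list of supporting tools.
    """
    support = defaultdict(set)
    for tool, calls in calls_by_tool.items():
        for five_prime, three_prime in calls:
            support[(five_prime, three_prime)].add(tool)
    return {
        pair: sorted(tools)
        for pair, tools in support.items()
        if len(tools) >= min_tools
    }


# Hypothetical calls for illustration only.
toy_calls = {
    "tool_A": [("GENE1", "GENE2"), ("GENE3", "GENE4")],
    "tool_B": [("GENE1", "GENE2")],
    "tool_C": [("GENE1", "GENE2"), ("GENE5", "GENE6")],
}
toy_consensus = consensus_fusions(toy_calls)
```

Requiring agreement between tools trades sensitivity for specificity, which is the balance between missed true positives and false positives discussed above.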

Here, we uncovered an extensive list of fusion transcripts, many of which exhibit multiple fusion breakpoint sites on the same pair of parental genes. These findings suggest that a significant portion of fusion events may be attributed to alternative splicing during transcription, accounting for numerous fusion events without a proportional increase in the number of RNA pairs. This extensive collection of FTs aligns with previous fusion transcript studies in plants. For example, in Oryza, fusion gene origination rates, which contribute to the emergence of new genes, were estimated to be notably high (63 fusion genes per species per million years) [24, 30]. Another study conducted on rice reported that 6.8% of the total spliced isoforms were fusion transcripts [27]. Zhang et al. found that approximately 50% of novel genes on the short arm of chromosome 3 in rice have emerged through chimeric mechanisms [74].

Moreover, a substantial number of the fusion transcripts identified in Arabidopsis thaliana overlapped with the fusions we previously reported in the AtFusionDB database [2]. Interestingly, the top three most recurrent read-through fusion transcripts documented in this database, ATPB_ATPE, RPL22_RPS3, and PSBE_PSBF, were also highly recurrent in the current study, occurring in 107, 98, and 84 out of 1280 samples, respectively. The substantial convergence of the fusion transcripts identified in O. sativa japonica in our study with the fusions identified in rice by Zhang et al. further reinforces our findings [25]. Additionally, some of the fusion transcripts identified in this study have been previously characterised as contributing to the expansion of transcriptome complexity. For instance, the read-through fusion transcript “AT1G02050_AT1G02060” was previously recognised as a fusion transcript spanning two loci with distinct splicing patterns in the intergenic regions. Likewise, “AT4G39361_AT4G39364” was identified as a transcript covering two loci, maintaining the splicing pattern but retaining an unspliced intergenic region. The FT “AT2G23550_AT2G23560” represented transcripts spanning two gene loci with splicing patterns different from those of each annotated transcript [75]. BIO3-BIO1, a chimeric gene encoding a bifunctional protein and described by Muralla et al. as a read-through transcription-induced chimera crucial for biotin biosynthesis, was also observed to be expressed in one RNA-Seq sample of A. thaliana under low-temperature conditions [76].

Our analysis revealed that even non-nuclear DNA transcripts from mitochondria and chloroplasts can participate in intra- and inter-chromosomal fusion formation. Such fusion transcripts constitute a captivating class of functional chimeric transcripts in plants, where non-nuclear genes, predominantly from mitochondria, are relocated to the nuclear genome through RNA intermediates. These relocated genes form chimeras with pre-existing nuclear genes, likely via retroposed gene copies [10, 77]. In many cases, ancestral nuclear genes provide targeting signals for the import of the mitochondrion-derived proteins back into mitochondria [78], enabling the transfer of mitochondrial genes to the nucleus while preserving mitochondrial functions. Fusion transcripts between plastid chromosomal genes and nuclear genes have also been identified. In A. thaliana, fusion transcript “AT1G28420_ATCG00920” features a 5’ parental gene on chromosome 1 and a 3’ fusion partner on the plastid chromosome. Similarly, fusion transcript “AT1G01820_ATMG01280” results from a fusion of transcripts on chromosome 1 and the mitochondrial chromosome.

It is well documented that chimeric products possess the potential to evolve novel functions that are distinct from, yet reminiscent of, those of the parental genes [79]. This characteristic enables researchers to explore gene function and evolution. Our gene ontology (GO) enrichment analysis of parental genes revealed that recurrent parental genes in all species primarily participate in metabolic processes. This underscores the pivotal role of fusion transcripts in the operation of such metabolic pathways. The participation of FTs in such fundamental processes might explain why the same fusion transcripts are detected across diverse conditions, treatments, and tissue samples. Many metabolically active fusion genes involved in the functioning of important pathways have already been identified, such as norcoclaurine synthase (NCS) in benzylisoquinoline alkaloid (BIA) metabolism, the reticuline epimerase (REPI) enzyme in morphine biosynthesis of opium poppy [38, 80], and the aspartate kinase-homoserine dehydrogenase (AK-HSD) fusion in the biosynthesis of aspartate-family amino acids. Additionally, fusion transcripts exhibited significant enrichment in processes linked to photosynthesis, further supporting their relatively higher expression in leaf tissue samples (Supplementary Figure S2). Intriguingly, we observed that a well-known QTL, SPI, short panicle 1 (Os04g0441800), known to affect panicle architecture, formed a fusion with a gene, Os04g0441800, implicated in molecular functions related to transmembrane transporter activity [81].

Using ChIP-Seq and Hi-C data, we explored how surrounding sequences in 3D proximity and epigenetic modifications contribute to chimeric RNA formation. Our analysis revealed the enrichment of numerous DNA-binding proteins and histone modifications near the breakpoint junctions of parental genes involved in fusion formation. For example, the validated fusion transcript Os08g0534200_Os02g0305800, identified in Oryza sativa samples, displayed enrichment of several histone modifications, including lysine butyrylation (Kbu), crotonylation (Kcr), H3K9ac, and H3K9me2, near the breakpoint region of the 5’ parental gene. Notably, Kbu and Kcr, marks associated with active transcription that are enriched at the transcriptional start site (TSS) in rice, may play a role in the formation of this fusion transcript [82]. This fusion transcript was consistently downregulated in all stress conditions and was previously identified by Zhang et al. as “AK066730_AK066926” [25]. The junction sequences of certain parental genes involved in fusion formation featured transcription factors with a predilection for binding to protein-coding gene promoters. For instance, in the fusion “ATCG00065_ATCG01230”, the DNA-binding protein AGO4 was enriched approximately 150 bp upstream of the 5’ gene breakpoint [83]. This FT also exhibited coding potential and translated two chimeric peptides, “YLGPFSGEPPSYLTGEFPGDYGWDTAGLSADPETFAR” and “YLGPSGSPWYGSDR”, which were validated from 792 and 98 different FTs, respectively (Supplementary Table S6).

Additionally, we found that a short genomic distance between the parental genes of intra-chromosomal fusion transcripts appeared to favour fusion transcript formation. The Hi-C analysis revealed that genes in close spatial proximity are more prone to undergoing fusion formation in these species, especially in intra-chromosomal FTs. The higher number of validated proximal intra-chromosomal fusion transcripts, coupled with the supporting Hi-C spatial interaction data, strengthens the notion that fusion transcript generation in plants might be spatially influenced, primarily occurring between genomic regions in close proximity to each other. The observation that chimeric RNAs tend to be intra-chromosomal suggests that the detected chimeric RNAs are authentic transcripts within cells, rather than artefacts generated during cDNA library preparation. If fusion transcripts were solely a consequence of recombination events during cDNA library formation, we would expect the two segments of these artificial chimeric sequences to originate randomly from different chromosomes, rather than consistently from the same chromosome.

Furthermore, fusion RNAs formed by splicing and transcriptional read-through exhibited significantly closer breakpoints than those formed by genomic rearrangements. This is akin to the findings reported by Chen et al. [40] in rice, where approximately 90% of the identified fusion transcripts were intra-chromosomal. Notably, identifying many fusion-forming tandem genes with intergenic genomic distance of less than 200 kb in A. thaliana and O. sativa further supported the idea that most fusion transcripts originate from genes with minimal intergenic distances [84].

The structural diversity of these fusion transcripts is one of their most distinguishing features; their junctions can lie at the terminus of known exons, in the middle of exons, or in UTR regions. This diversity can affect the functionality of the fusion product, allowing it to serve various functions or even resulting in non-coding RNAs. Fusion products formed between coding regions of parental genes with matching reading frames generate in-frame transcripts, which can be translated into functional proteins. Conversely, junction breakpoints in UTR regions may activate, eliminate, or attenuate the functionality of the respective 5’ and 3’ proteins [16]. In this study, we validated 139 chimeric peptides out of 46,756 in silico generated chimeric peptides using mass spectrometry data, affirming that fusion transcripts can indeed possess coding potential. It is worth noting that the relatively low number of validations could be attributed to the limited availability of proteomic datasets. The apparent discrepancy between the number of putative chimeric transcripts and the number of chimeric proteins reported to date suggests that many fusion transcripts may not undergo translation, possibly serving as regulators in the form of long non-coding RNAs.

We chose 41 recurrent events of fusion transcripts for validation using Sanger sequencing and RT-PCR; however, only 14 (approximately 34%) were successfully validated. This could be attributed to errors, such as template switching, that may occur during the reverse transcription step of cDNA library preparation. Additionally, the observed low validation rate could be influenced by the fact that fusion transcripts are expressed at very low levels. Nevertheless, in most cases, the junction sequences observed in the sequenced RT-PCR bands aligned with the in silico predicted junctions, signifying that FT formation is not a stochastic process. Notably, about 50% of the validated FTs exhibited parental genes on opposite strands, implying that inversion or trans-splicing could be a mechanism for these FTs’ formation. Our investigation into the mechanism of FT generation revealed that most of the plant fusions employ non-canonical splicing sites, with a smaller portion having known exon boundaries. The presence of numerous M/M-type and E/E-type FTs suggested that alternative splicing or genomic rearrangement at the DNA level plays a major role in FT formation in plants. Out of the 14 validated FTs, seven were of the proximal type, featuring parental genes located on the same chromosome, suggesting their formation through splicing mechanisms or transcriptional read-through.

Data availability

All the fusion data generated are available in the form of a database named ‘PFusionDB’, www.nipgr.ac.in/PFusionDB. The scripts and code used for analysis in the study are available in the GitHub repository (https://github.com/skbinfo/PFusion).

Abbreviations

FTs: Fusion transcripts

TIC: Transcription-induced chimeras

TIGF: Transcription-induced gene fusions

BIA: Benzylisoquinoline alkaloid

NCS: Norcoclaurine synthase

GN2: GRAINS NUMBER 2

FISH: Fluorescence in situ hybridisation

NCBI: National Center for Biotechnology Information

SRA: Sequence Read Archive

GLAM2: Gapped Local Alignment of Motifs

PSMs: Peptide spectrum matches

GO: Gene ontology

References

  1. Gingeras TR. Implications of chimeric non-collinear transcripts. Nature. 2009;461:206.


  2. Singh A, Zahra S, Das D, Kumar S. AtFusionDB: a database of fusion transcripts in Arabidopsis thaliana. Database (Oxford). 2019;2019.

  3. Frenkel-Morgenstern M, Lacroix V, Ezkurdia I, Levin Y, Gabashvili A, Prilusky J, et al. Chimeras taking shape: potential functions of proteins encoded by chimeric RNA transcripts. Genome Res. 2012;22:1231–42.


  4. Li H, Wang J, Ma X, Sklar J. Gene fusions and RNA trans-splicing in normal and neoplastic human cells. Cell Cycle. 2009;8:218–22.


  5. Parra G, Reymond A, Dabbouseh N, Dermitzakis ET, Castelo R, Thomson TM, et al. Tandem chimerism as a means to increase protein complexity in the human genome. Genome Res. 2006;16:37.


  6. Greger L, Su J, Rung J, Ferreira PG, Lappalainen T, Dermitzakis ET, et al. Tandem RNA chimeras contribute to Transcriptome Diversity in Human Population and are Associated with Intronic Genetic variants. PLoS ONE. 2014;9:104567.


  7. Mertens F, Johansson B, Fioretos T, Mitelman F. The emerging complexity of gene fusions in cancer. Nat Rev Cancer. 2015;15:371–81.


  8. Annala MJ, Parker BC, Zhang W, Nykter M. Fusion genes and their discovery using high throughput sequencing. Cancer Lett. 2013;340:192–200.


  9. Jiang N, Bao Z, Zhang X, Eddy SR, Wessler SR. Pack-MULE transposable elements mediate gene evolution in plants. Nature. 2004;431:569–73.


  10. Wang W, Zheng H, Fan C, Li J, Shi J, Cai Z, et al. High rate of chimeric gene origination by Retroposition in Plant genomes. Plant Cell. 2006;18:1791.


  11. Dorney R, Dhungel BP, Rasko JEJ, Hebbard L, Schmitz U. Recent advances in cancer fusion transcript detection. Brief Bioinform. 2023;24:1–12.


  12. Lei Q, Li C, Zuo Z, Huang C, Cheng H, Zhou R. Evolutionary insights into RNA trans-splicing in vertebrates. Genome Biol Evol. 2016;8:562–77.


  13. Han C, Sun LY, Wang WT, Sun YM, Chen YQ. Non-coding RNAs in cancers with chromosomal rearrangements: the signatures, causes, functions and implications. J Mol Cell Biol. 2019;11:886–98.


  14. Dupain C, Harttrampf AC, Boursin Y, Lebeurrier M, Rondof W, Robert-Siegwald G, et al. Discovery of New Fusion transcripts in a cohort of Pediatric Solid cancers at Relapse and Relevance for Personalized Medicine. Mol Ther. 2019;27:200–18.


  15. Zhang Y, Gong M, Yuan H, Park HG, Frierson HF, Li H. Chimeric transcript generated by cis-splicing of adjacent genes regulates prostate cancer cell proliferation. Cancer Discov. 2012;2:598–607.


  16. Latysheva NS, Babu MM. Discovering and understanding oncogenic gene fusions through data intensive computational approaches. Nucleic Acids Res. 2016;44:4487–503.


  17. Varley KE, Gertz J, Roberts BS, Davis NS, Bowling KM, Kirby MK, et al. Recurrent read-through fusion transcripts in breast cancer. Breast Cancer Res Treat. 2014;146:287–97.


  18. Druker BJ, Tamura S, Buchdunger E, Ohno S, Segal GM, Fanning S, et al. Effects of a selective inhibitor of the abl tyrosine kinase on the growth of bcr-abl positive cells. Nat Med. 1996;2:561–6.


  19. Babiceanu M, Qin F, Xie Z, Jia Y, Lopez K, Janus N, et al. Recurrent chimeric fusion RNAs in non-cancer tissues and cells. Nucleic Acids Res. 2016;44:2859–72.


  20. Chwalenia K, Facemire L, Li H. Chimeric RNAs in cancer and normal physiology. Wiley Interdiscip Rev RNA. 2017;8(6).

  21. Chapdelaine Y, Bonen L. The wheat mitochondrial gene for subunit I of the NADH dehydrogenase complex: a trans-splicing model for this gene-in-pieces. Cell. 1991;65:465–72.


  22. Koller B, Fromm H, Galun E, Edelman M. Evidence for in vivo trans splicing of pre-mRNAs in tobacco chloroplasts. Cell. 1987;48:111–9.


  23. Kück U, Choquet Y, Schneider M, Dron M, Bennoun P. Structural and transcription analysis of two homologous genes for the P700 chlorophyll a-apoproteins in Chlamydomonas reinhardii: evidence for in vivo trans-splicing. EMBO J. 1987;6:2185–95.


  24. Zhou Y, Zhang C. Evolutionary patterns of chimeric retrogenes in Oryza species. Sci Rep. 2019;9:1–12.


  25. Zhang G, Guo G, Hu X, Zhang Y, Li Q, Li R, et al. Deep RNA sequencing at single base-pair resolution reveals high complexity of the rice transcriptome. Genome Res. 2010;20:646–54.


  26. Chao Y, Yuan J, Li S, Jia S, Han L, Xu L. Analysis of transcripts and splice isoforms in red clover (Trifolium pratense L.) by single-molecule long-read sequencing. BMC Plant Biol. 2018;18:1–12.


  27. Hasan S, Huang L, Liu Q, Perlo V, O’Keeffe A, Margarido GRA, et al. The Long read Transcriptome of Rice (Oryza sativa ssp. japonica var. Nipponbare) reveals novel transcripts. Rice. 2022;15:1–17.

  28. Thimmapuram J, Duan H, Liu L, Schuler MA. Bicistronic and fused monocistronic transcripts are derived from adjacent loci in the Arabidopsis genome. RNA. 2005;11:128.

  29. Qiao D, Yang C, Chen J, Guo Y, Li Y, Niu S, et al. Comprehensive identification of the full-length transcripts and alternative splicing related to the secondary metabolism pathways in the tea plant (Camellia sinensis). Sci Rep. 2019;9:2709.

  30. Zhou Y, Zhang C, Zhang L, Ye Q, Liu N, Wang M, et al. Gene fusion as an important mechanism to generate new genes in the genus Oryza. Genome Biol. 2022;23:1–23.

  31. Zhou Y, Lu Q, Zhang J, Zhang S, Weng J, Di H, et al. Genome-wide profiling of alternative splicing and gene fusion during rice black-streaked dwarf virus stress in maize (Zea mays L). Genes (Basel). 2022;13(3):456.

  32. Parakkunnel R, Bhojaraja Naik K, Vanishree G, Susmita C, Purru S, Udaya Bhaskar K, et al. Gene fusions, micro-exons and splice variants define stress signaling by AP2/ERF and WRKY transcription factors in the sesame pan-genome. Front Plant Sci. 2022;13:1076229.

  33. He ZS, Zou HS, Wang YZ, Zhu JB, Yu GQ. Maturation of the nodule-specific transcript MsHSF1c in Medicago sativa may involve interallelic trans-splicing. Genomics. 2008;92:115–21.

  34. Kawasaki T, Okumura S, Kishimoto N, Shimada H, Higo K, Ichikawa N. RNA maturation of the rice SPK gene may involve trans-splicing. Plant J. 1999;18:625–32.

  35. Chen JJ, Janssen BJ, Williams A, Sinha N. A gene fusion at a homeobox locus: alterations in leaf shape and implications for morphological evolution. Plant Cell. 1997;9:1289.

  36. Duc C, Sherstnev A, Cole C, Barton GJ, Simpson GG. Transcription termination and chimeric RNA formation controlled by Arabidopsis thaliana FPA. PLoS Genet. 2013;9(10).

  37. Kim M, Canio W, Kessler S, Sinha N. Developmental changes due to long-distance movement of a homeobox fusion transcript in tomato. Science. 2001;293:287–9.

  38. Hagel JM, Facchini PJ. Tying the knot: occurrence and possible significance of gene fusions in plant metabolism and beyond. J Exp Bot. 2017;68:4029–43.

  39. Li Y, Li S, Thodey K, Trenchard I, Cravens A, Smolke CD. Complete biosynthesis of noscapine and halogenated alkaloids in yeast. Proc Natl Acad Sci U S A. 2018;115:E3922–31.

  40. Chen H, Tang Y, Liu J, Tan L, Jiang J, Wang M, et al. Emergence of a novel chimeric gene underlying grain number in rice. Genetics. 2017;205:993–1002.

  41. Wang Q, Xia J, Jia P, Pao W, Zhao Z. Application of next generation sequencing to human gene fusion detection: computational tools, features and perspectives. Brief Bioinform. 2013;14:506–19.

  42. Perez-Riverol Y, Bai J, Bandla C, García-Seisdedos D, Hewapathirana S, Kamatchinathan S, et al. The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences. Nucleic Acids Res. 2022;50:D543–52.

  43. Haas BJ, Dobin A, Li B, Stransky N, Pochet N, Regev A. Accuracy assessment of fusion transcript detection via read-mapping and de novo fusion transcript assembly-based methods. Genome Biol. 2019;20:1–16.

  44. Ma C, Shao M, Kingsford C. SQUID: transcriptomic structural variation detection from RNA-seq. Genome Biol. 2018;19(1):52.

  45. Benelli M, Pescucci C, Marseglia G, Severgnini M, Torricelli F, Magi A. Discovering chimeric transcripts in paired-end RNA-seq data by using EricScript. Bioinformatics. 2012;28:3232–9.

  46. Wang K, Singh D, Zeng Z, Coleman SJ, Huang Y, Savich GL, et al. MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res. 2010;38(18):e178.

  47. Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34:i884–90.

  48. Hoogstrate Y, Böttcher R, Hiltemann S, Van Der Spek PJ, Jenster G, Stubbs AP. FuMa: reporting overlap in RNA-seq detected fusion genes. Bioinformatics. 2016;32:1226–8.

  49. Frith MC, Saunders NFW, Kobe B, Bailey TL. Discovering sequence motifs with arbitrary insertions and deletions. PLoS Comput Biol. 2008;4:e1000071.

  50. Gupta S, Stamatoyannopoulos JA, Bailey TL, Noble WS. Quantifying similarity between motifs. Genome Biol. 2007;8:1–9.

  51. Langmead B, Wilks C, Antonescu V, Charles R. Scaling read aligners to hundreds of threads on general-purpose processors. Bioinformatics. 2019;35:421–32.

  52. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9:357–9.

  53. Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9:1–9.

  54. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–9.

  55. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–2.

  56. Servant N, Varoquaux N, Lajoie BR, Viara E, Chen CJ, Vert JP, et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 2015;16.

  57. Rice P, Longden L, Bleasby A. EMBOSS: the European Molecular Biology Open Software suite. Trends Genet. 2000;16:276–7.

  58. Hulstaert N, Shofstahl J, Sachsenberg T, Walzer M, Barsnes H, Martens L, et al. ThermoRawFileParser: modular, scalable, and cross-platform RAW file Conversion. J Proteome Res. 2020;19:537–42.

  59. Chambers MC, Maclean B, Burke R, Amodei D, Ruderman DL, Neumann S, et al. A cross-platform toolkit for mass spectrometry and proteomics. Nat Biotechnol. 2012;30:918–20.

  60. Kim S, Pevzner PA. MS-GF+ makes progress towards a universal database search tool for proteomics. Nat Commun. 2014;5:1.

  61. Käll L, Storey JD, Noble WS. QVALITY: non-parametric estimation of q-values and posterior error probabilities. Bioinformatics. 2009;25:964–6.

  62. Livak KJ, Schmittgen TD. Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) method. Methods. 2001;25:402–8.

  63. Landsman D. RNP-1, an RNA-binding motif is conserved in the DNA-binding cold shock domain. Nucleic Acids Res. 1992;20:2861.

  64. McGinness KE, Sauer RT. Ribosomal protein S1 binds mRNA and tmRNA similarly but plays distinct roles in translation of these molecules. Proc Natl Acad Sci U S A. 2004;101:13454–9.

  65. Howe FS, Fischl H, Murray SC, Mellor J. Is H3K4me3 instructive for transcription activation? BioEssays. 2017;39:1–12.

  66. Liang G, Lin JCY, Wei V, Yoo C, Cheng JC, Nguyen CT, et al. Distinct localization of histone H3 acetylation and H3-K4 methylation to the transcription start sites in the human genome. Proc Natl Acad Sci U S A. 2004;101:7357–62.

  67. Shen Q, Lin Y, Li Y, Wang G. Dynamics of H3K27me3 modification on Plant Adaptation to Environmental cues. Plants. 2021;10.

  68. Baekelandt A, Pauwels L, Wang Z, Li N, De Milde L, Natran A, et al. Arabidopsis Leaf flatness is regulated by PPD2 and NINJA through repression of CYCLIN D3 genes. Plant Physiol. 2018;178:217–32.

  69. Skubacz A, Daszkowska-Golec A, Szarejko I. The role and regulation of ABI5 (ABA-insensitive 5) in plant development, abiotic stress responses and phytohormone crosstalk. Front Plant Sci. 2016;7:234140.

  70. Nuruzzaman M, Sharoni AM, Kikuchi S. Roles of NAC transcription factors in the regulation of biotic and abiotic stress responses in plants. Front Microbiol. 2013;4 SEP:55831.

  71. Liu G, Li X, Jin S, Liu X, Zhu L, Nie Y, et al. Overexpression of rice NAC gene SNAC1 improves drought and salt tolerance by enhancing root development and reducing transpiration rate in transgenic cotton. PLoS ONE. 2014;9.

  72. Bernal M, Casero D, Singh V, Wilson GT, Grande A, Yang H, et al. Transcriptome sequencing identifies SPL7-Regulated Copper Acquisition genes FRO4/FRO5 and the Copper Dependence of Iron Homeostasis in Arabidopsis. Plant Cell. 2012;24:738.

  73. Wang M, Wang P, Liang F, Ye Z, Li J, Shen C, et al. A global survey of alternative splicing in allopolyploid cotton: landscape, complexity and regulation. New Phytol. 2018;217:163–78.

  74. Zhang C, Wang J, Marowsky NC, Long M, Wing RA, Fan C. High occurrence of functional new chimeric genes in Survey of Rice chromosome 3 short arm genome sequences. Genome Biol Evol. 2013;5:1038.

  75. Zhang S, Li R, Zhang L, Chen S, Xie M, Yang L, et al. New insights into Arabidopsis transcriptome complexity revealed by direct sequencing of native RNAs. Nucleic Acids Res. 2020;48:7700.

  76. Muralla R, Chen E, Sweeney C, Gray JA, Dickerman A, Nikolau BJ, et al. A bifunctional locus (BIO3-BIO1) required for biotin biosynthesis in Arabidopsis. Plant Physiol. 2008;146:60–73.

  77. Nugent JM, Palmer JD. RNA-mediated transfer of the gene coxII from the mitochondrion to the nucleus during flowering plant evolution. Cell. 1991;66:473–81.

  78. Liu SL, Zhuang Y, Zhang P, Adams KL. Comparative analysis of structural diversity and sequence evolution in plant mitochondrial genes transferred to the nucleus. Mol Biol Evol. 2009;26:875–91.

  79. Yanai I, Derti A, DeLisi C. Genes linked by fusion events are generally of the same functional category: a systematic analysis of 30 microbial genomes. Proc Natl Acad Sci U S A. 2001;98:7940–5.

  80. Farrow SC, Hagel JM, Beaudoin GAW, Burns DC, Facchini PJ. Stereochemical inversion of (S)-reticuline by a cytochrome P450 fusion in opium poppy. Nat Chem Biol. 2015;11(9):728–32.

  81. Li S, Qian Q, Fu Z, Zeng D, Meng X, Kyozuka J, et al. Short panicle1 encodes a putative PTR family transporter and determines rice panicle size. Plant J. 2009;58:592–605.

  82. Lu Y, Xu Q, Liu Y, Yu Y, Cheng ZY, Zhao Y, et al. Dynamics and functional interplay of histone lysine butyrylation, crotonylation, and acetylation in rice under starvation and submergence. Genome Biol. 2018;19(1):144.

  83. Zheng Q, Rowley MJ, Böhmdorfer G, Sandhu D, Gregory BD, Wierzbicki AT. RNA polymerase V targets transcriptional silencing components to promoters of protein-coding genes. Plant J. 2013;73:179.

  84. Shahmuradov IA, Abdulazimova AU, Solovyev VV, Qamar R, Chohan N, et al. Mono- and bi-cistronic chimeric mRNAs in Arabidopsis and rice genomes. Appl Comput Math. 2010;9:66–81.

Acknowledgements

The authors are thankful to the Department of Biotechnology (DBT)-eLibrary Consortium, India, for providing access to e-resources. AS and SZ are thankful to the Council of Scientific and Industrial Research (CSIR), India, for research fellowships. SA and FH are thankful to the University Grants Commission (UGC), India, and the Department of Biotechnology (DBT), India, respectively, for research fellowships. The authors acknowledge the Computational Biology & Bioinformatics Facility (CBBF) of the National Institute of Plant Genome Research (NIPGR).

Funding

This research is supported by project grant BT/PR40146/BTIS/137/4/2020 from the Department of Biotechnology (DBT), Government of India, and by grant EEQ/2019/000231 from the Science and Engineering Research Board (SERB), Department of Science and Technology, Government of India.

Author information

Contributions

Conceptualization: AS, PC, and SK. Data curation: AS, PC, RB, SA, FH, and SC. Formal analysis: PC, AS, RB, SA, FH, NS, and MR. Validation: RG and SZ. Visualization: AA. Funding acquisition, Project administration, Resources, and Supervision: SK. Roles/Writing - Original draft: AS, PC, SZ, RG, and SK. Writing - review & editing: SK.

Corresponding author

Correspondence to Shailesh Kumar.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Chitkara, P., Singh, A., Gangwar, R. et al. The landscape of fusion transcripts in plants: a new insight into genome complexity. BMC Plant Biol 24, 1162 (2024). https://doi.org/10.1186/s12870-024-05900-0

  • DOI: https://doi.org/10.1186/s12870-024-05900-0

Keywords