Transcriptome analysis of ripe and unripe fruit tissue of banana identifies major metabolic networks involved in fruit ripening process

Background: Banana is one of the most important crop plants grown in the tropics and sub-tropics. It is a climacteric fruit and undergoes ethylene-dependent ripening. Once ripening is initiated, it proceeds rapidly, which shortens postharvest life and can result in heavy economic losses. During fruit ripening a number of physiological and biochemical changes take place, and thousands of genes from various metabolic pathways are recruited to produce a ripe and edible fruit. To better understand the underlying mechanism of ripening, we undertook a study to evaluate global changes in the transcriptome of the fruit during the ripening process.

Results: We sequenced the transcriptomes of the unripe and ripe stages of banana (Musa acuminata; Dwarf Cavendish) fruit. The transcriptomes were sequenced on a 454 GS FLX Titanium platform, yielding more than 700,000 high quality (HQ) reads per library. Assembly of the reads resulted in 19,410 contigs and 92,823 singletons. A large number of the differentially expressed genes identified were linked to ripening-dependent processes including ethylene biosynthesis, perception and signalling, cell wall degradation and production of aromatic volatiles. In the banana fruit transcriptomes, we found transcripts included in 120 pathways described in the KEGG database for rice. Members of the expansin and xyloglucan transglycosylase/hydrolase (XTH) gene families were highly up-regulated during ripening, which suggests that they might play important roles in the softening of the fruit. Several genes involved in the synthesis of aromatic volatiles and members of transcription factor families previously reported to be involved in ripening were also identified.

Conclusions: A large number of differentially regulated genes were identified during banana fruit ripening. Many of these are associated with cell wall degradation and synthesis of aromatic volatiles. A large number of differentially expressed genes did not align with any of the databases and might be novel genes in banana. These genes are good candidates for future studies to establish their role in banana fruit ripening. The datasets developed in this study will help in developing strategies to manipulate banana fruit ripening and reduce postharvest losses.

Electronic supplementary material: The online version of this article (doi:10.1186/s12870-014-0316-1) contains supplementary material, which is available to authorized users.


Background
Banana fruit is the staple food for an estimated 400 million people. The banana plant is a large herbaceous, evergreen, flowering monocot belonging to the genus Musa (family Musaceae, order Zingiberales). The majority of cultivated bananas are derived from crosses between Musa acuminata and Musa balbisiana. Fruit development and ripening is a complex process influenced by numerous factors including light, hormones, temperature and genotype. Ripening-associated events in climacteric fruits, including banana, lead to developmentally and physiologically regulated changes in gene expression, which ultimately bring about changes in the color, texture, flavor and aroma of the fruit [1][2][3]. Fruit ripening and softening involve irreversible physiological and biochemical changes that contribute to the perishability of the banana fruit. Premature ripening causes significant losses to farmers and consumers alike. Therefore, there is an urgent need to develop tools to delay the ripening and softening process through genetic engineering approaches.
Recently, the genome of banana was sequenced using DH-Pahang, a doubled haploid (523 Mb) derived from a seedy diploid of the subspecies M. malaccensis, which led to the identification of 36,542 protein coding genes [4]. To support and accelerate genetic and genomic studies of banana, the banana genome hub was recently developed [5]. It has been commonly observed that ripening of banana involves extensive changes in the cell wall [6]. Earlier studies with banana identified multiple families of genes associated with cell wall degradation [7][8][9][10][11]. Apart from softening-associated genes, a few genes have been identified in banana that relate to ethylene biosynthesis, signal transduction and transcription factors [12,13]. Approaches like subtractive hybridization and differential library screening have been employed [11,14,15,16] to identify differentially expressed genes during banana fruit ripening. However, apart from these genes, ripening likely involves the up- and down-regulation of hundreds of genes not yet identified in banana.
Expressed Sequence Tags (ESTs) can be a useful tool for gene discovery, especially in non-model plants for which limited genomic information is available [17,18]. The in-depth generation and comparison of EST datasets provides information about the expressed regions of a genome and can be used to characterize patterns of gene expression during fruit ripening. Using Next-Generation Sequencing (NGS), such databases have been developed and used for the discovery and prediction of genes involved in fruit development and ripening. Transcriptome analyses in Cucumis melo [19,20], citrus [21,22], blueberry [23], capsicum [24], Chinese bayberry [25], sweet orange [26], kiwifruit [27], grape [28,29], tomato [30], watermelon [31] and many others have provided insight into genes and pathways involved in fruit development and ripening [32]. These databases are also a rich source of gene-derived molecular markers (e.g. simple sequence repeats, SSRs), which can be used for germplasm breeding or physical mapping.
The primary objective of our study was to add to the basic understanding of banana fruit ripening at the molecular level. In this study, we established transcriptome datasets of unripe and ripe banana fruit using NGS technology based on the 454 GS FLX Titanium platform. We identified genes involved in ethylene biosynthesis and perception, fruit softening and other processes that initiate ripening to produce an edible banana fruit. The analysis has provided new information about many previously unidentified genes that are expressed during banana fruit ripening. Some of these genes may be potential candidates that can be manipulated to increase the postharvest shelf life of banana and reduce economic losses. As a part of this study, we identified EST-SSR molecular markers that will facilitate marker-assisted breeding of banana. In addition, we mapped our reads to the Musa acuminata genome and also carried out a de novo assembly to account for varietal differences in sequence. The contigs obtained were then mapped again to the banana genome to identify members of different gene families.

Results and discussion
Sequencing, annotation and mapping to the banana genome
To examine global changes occurring during ripening in the banana fruit, cDNA libraries from unripe and ripe banana fruit pulp (cultivar Harichhal) were each sequenced using a half-plate run on a 454 GS FLX Titanium platform. Each transcriptome produced more than 700,000 high quality (HQ) reads (Table 1), which were assembled using the GS Assembler program as described in Material and methods.
To study the differential expression of genes during banana fruit ripening, the reads of the unripe and ripe fruit transcriptomes were tagged, pooled and assembled with the GS Assembler program using the parameters described in Material and methods. A total of 1,483,544 reads were assembled into 19,410 contigs and 92,823 singletons. Within this assembly, 10,715 contigs were considered large contigs, with an average size of 914 bp. The average length of all contigs was 642 bp, with a contig depth of 80 reads. These contigs and singletons were pooled together and are referred to here as the comparative transcripts. The total number of comparative transcripts was 112,233. As many gene families have multiple members, partially assembled transcripts could lead to erroneous results in differential analysis. To rule out this possibility, the combined assembly of unripe and ripe transcriptomes was preferred over the individually assembled ripe and unripe transcriptomes. To annotate the comparative transcripts, they were queried against the NCBI NR database, TAIR proteins and MSU Rice proteins using the BLASTX program, and against CDD using the rpsblast program. The number of comparative transcripts annotated by each database is provided in Additional files 1, 2, 3 and 4.
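The annotation step can be illustrated with a small sketch. The Methods state that only the top hit per sequence was kept, with an E-value cut-off of 10^-5. The code below assumes the BLAST results are available in the standard 12-column tabular layout (query, subject, ..., E-value in column 11, bit score in column 12); the function name and the example rows are hypothetical and are not taken from the study.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-5  # cut-off stated in the Methods


def top_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return {query_id: (subject_id, evalue)}, keeping the lowest-E-value hit per query.

    Assumes 12-column tab-separated BLAST output with the E-value in column 11.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue > cutoff:
            continue  # hit is weaker than the cut-off
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Hypothetical rows for illustration only
example = (
    "contig00001\tAT1G01010.1\t78.5\t200\t43\t0\t1\t600\t10\t209\t1e-50\t180\n"
    "contig00001\tAT2G02020.1\t60.0\t150\t60\t0\t1\t450\t5\t154\t1e-20\t95\n"
    "contig00002\tAT3G03030.1\t35.0\t80\t52\t0\t1\t240\t1\t80\t0.5\t30\n"
)
hits = top_hits(example)
print(hits)
```

In this toy input, the second contig has no hit below the cut-off and would therefore be left unannotated by this database.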
The assembled contigs were also mapped to the Musa genome to annotate the genes and to study differential expression in the two libraries. The 19,410 contigs and 92,823 singletons obtained were mapped to the 36,542 genes currently identified in the Musa genome. Of these, 15,978 contigs and 59,410 singletons mapped to 21,298 genes in the Musa genome, and 8,490 of the mapped genes were common to both contigs and singletons. The remaining 3,432 contigs that did not match the Musa genome were annotated against the NCBI NR database, TAIR proteins and MSU7 Rice proteins using the BLASTX program, and against CDD using the rpsblast program. Of these, 247 contigs were annotated and the remaining 3,185 contigs were unique to the banana transcriptome. The failure of these 3,432 contigs to match the Musa genome may be due to differences between the genomic sequences of the DH-Pahang and Harichhal varieties, transposable elements, experimental artefacts, or mis-prediction of genes in DH-Pahang. In addition, post-transcriptional events such as alternative splicing during ripening, leading to unique transcripts, cannot be ruled out. Such alternative splicing during plant growth and development has been reported in other plants [33,34]. The 15,978 mapped contigs matched 12,315 Musa genes. Of these, 9,809 contigs each had a single CDS match in the Musa genome, whereas 6,169 contigs matched 2,506 Musa CDS, indicating that more than one contig mapped to each of these CDS sequences. This could be due to partial contigs or to alternative splicing of the transcript. To distinguish between these, the 6,169 contigs and 2,506 Musa CDS were analysed as described in Material and methods. It was found that 1,243 contigs that mapped to 402 CDS were alternatively spliced transcripts and 4,926 contigs that mapped to 2,104 Musa CDS were partial transcripts.
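The counts reported above can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text; the number of genes hit by singletons and the genome coverage fraction are derived here by inclusion-exclusion and division, and are not values reported by the authors.

```python
# Counts taken from the text
contigs_total = 19_410
singletons_total = 92_823
comparative_transcripts = 112_233

contigs_mapped = 15_978
contigs_unmapped = 3_432
unmapped_annotated = 247
unmapped_unique = 3_185

contigs_single_cds = 9_809
contigs_shared_cds = 6_169
shared_cds = 2_506
genes_from_contigs = 12_315

spliced_contigs, spliced_cds = 1_243, 402
partial_contigs, partial_cds = 4_926, 2_104

genes_mapped_total = 21_298
genes_common = 8_490
musa_genes = 36_542

# Internal consistency of the reported breakdowns
assert contigs_total + singletons_total == comparative_transcripts
assert contigs_mapped + contigs_unmapped == contigs_total
assert unmapped_annotated + unmapped_unique == contigs_unmapped
assert contigs_single_cds + contigs_shared_cds == contigs_mapped
assert contigs_single_cds + shared_cds == genes_from_contigs
assert spliced_contigs + partial_contigs == contigs_shared_cds
assert spliced_cds + partial_cds == shared_cds

# Derived (not reported): genes hit by singletons, via inclusion-exclusion
genes_from_singletons = genes_mapped_total - genes_from_contigs + genes_common
# Derived (not reported): fraction of annotated Musa genes detected
coverage = genes_mapped_total / musa_genes

print(genes_from_singletons, round(coverage, 3))
```

All of the reported subtotals add up, which suggests the breakdown into single-match, partial and alternatively spliced contigs is internally consistent.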

Comparative transcriptome analysis and differential gene expression
The number of reads in a particular contig is in general a measure of the transcript abundance of that contig; however, differences in read number could also be due to sampling error rather than genuine differences in gene expression. To rule out this possibility, we applied three statistical measures: the P-value, the false discovery rate (FDR) and the R statistic. For the R statistic [35], only contigs with R ≥ 8 were retained, which gave a believability of >99%. In this test, the singletons were statistically insignificant and hence discarded, since the contigs were assembled from reads of the unripe and ripe libraries. Using this statistic, only 1,921 of the 19,410 contigs were significantly differentially regulated. Of these, 653 genes were up-regulated (more than 2-fold) and 837 were down-regulated (more than 2-fold) in ripe fruit in comparison to unripe fruit (Additional file 5). Of these, 107 up-regulated and 83 down-regulated genes did not give hits in any of the databases analysed and could be novel genes involved in different pathways or molecular networks during ripening in banana fruit. When the analysis was carried out using the genes reported as differentially expressed during ripening in the DH-Pahang cultivar by D'Hont et al. [4], 353 genes showed differential expression. A large proportion of genes (98%) had a similar expression pattern in our analysis and in that of D'Hont et al. (2012) [4]. A set of 569 differentially expressed genes had a CDS counterpart in the Musa genome but were not significantly expressed in the earlier study [4]. These 569 differentially expressed genes may play an important role in the ripening of the banana variety Harichhal. To further annotate genes and study metabolic pathways, the KEGG descriptions of TIGR and TAIR gene ids were transferred to the orthologous banana transcripts in our study.
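A minimal sketch of this filtering step follows. It assumes that the R statistic of [35] is the log-likelihood-ratio statistic commonly used for comparing tag counts across cDNA libraries, R = Σ x_i ln(x_i / (N_i f)) with f = Σ x_i / Σ N_i, where x_i is the read count of a contig in library i and N_i is the library size. The natural logarithm, the pseudocount used for the fold change, the library sizes and the read counts below are all illustrative assumptions, not values from the study.

```python
import math


def r_statistic(counts, library_sizes):
    """Log-likelihood-ratio R statistic for one contig across libraries.

    counts[i] is the read count in library i, library_sizes[i] its total reads.
    Terms with a zero count contribute nothing (0 * log 0 is taken as 0).
    """
    total_count = sum(counts)
    if total_count == 0:
        return 0.0
    f = total_count / sum(library_sizes)  # pooled frequency under the null
    return sum(x * math.log(x / (n * f))
               for x, n in zip(counts, library_sizes) if x > 0)


def fold_change(ripe, unripe, n_ripe, n_unripe, pseudocount=1.0):
    """Size-normalised ripe/unripe ratio; the pseudocount avoids division by zero."""
    return ((ripe + pseudocount) / n_ripe) / ((unripe + pseudocount) / n_unripe)


# Hypothetical library sizes and read counts (unripe, ripe)
sizes = (740_000, 740_000)
unripe_reads, ripe_reads = 10, 100

r = r_statistic((unripe_reads, ripe_reads), sizes)
fc = fold_change(ripe_reads, unripe_reads, sizes[1], sizes[0])
is_significant = r >= 8 and (fc >= 2 or fc <= 0.5)
print(round(r, 2), round(fc, 2), is_significant)
```

A contig with identical normalised counts in both libraries gives R = 0, so the R ≥ 8 threshold removes contigs whose read counts differ only by an amount expected from sampling.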

Genes involved in banana ripening
During banana fruit ripening, the pulp tissue loses its turgidity, softens and produces aromatic volatiles. To bring about these changes, a repertoire of genes is differentially expressed. In the following sections, we summarize changes in gene expression based on their predicted role in softening and in aroma and flavor.
Up-regulated genes during banana fruit ripening
Softening of the banana tissue
Cell wall hydrolysis plays an important role in plant growth and development, including ripening and stress responses. Most of the genes involved in cell wall hydrolysis are members of multigene families and many have highly specialized functions in cell wall metabolism [36]. The process of softening begins with the onset of ripening. The ripe tissue collected for this study came from fruit that had already begun to soften. It has been previously reported that the gene families responsible for softening of banana include expansins, pectate lyases and xyloglucan endotransglycosylases [6][7][8][9].
In the present study, several members of these gene families showed significantly higher expression in the ripe fruit compared to unripe fruit, with some members of each family exhibiting more than a 12-fold increase in expression (Table 2). We analysed the expression of genes annotated as cellulase, polygalacturonase (PG), pectin esterase, pectate lyase (PL), XTH and expansin (Figure 1). The greatest increase in gene expression was associated with the PL, XTH and expansin gene families. Five different expansin genes were identified in this study, and four of these were significantly up-regulated in the ripening fruit. From the XTH gene family, 13 members were identified, of which several were significantly up-regulated in the ripening fruit. Since xyloglucan forms a major component of the cell wall in non-graminaceous monocot plants, a role for XTH during ripening in banana is to be expected. Members of the XTH gene family have also been shown to play an important role in the ripening of other fleshy fruits such as tomato and peach [37]. Similarly, 5 members were identified for the PL gene family and all of these were highly expressed during ripening.
Polygalacturonases and cellulases are also present as multigene families in banana. Some members of these families showed significant up-regulation during ripening; however, it was generally not as high as for members of the expansin, XTH and PL gene families. A few members of the pectin methylesterase (PME) gene family were also up-regulated; however, since one of the functions of PME is to modify pectins to make them more accessible to PL and PG, the transcripts for PME may have already declined in the ripe fruit (4 days post ethylene) used in the study. It has been reported that the highest PME activity is observed at 2 days post ethylene exposure and declines significantly by day 3 [6]. Details on the fold change of each gene family are provided in Additional file 6.
The beta glucosidases (GH family 17) are also known to play an important role in the softening of the banana fruit. In our analysis, as many as 7 beta glucosidase genes showed more than two-fold enhanced expression in the ripe banana fruit compared to unripe fruit. Apart from their role in cell wall degradation, beta glucosidases are also known to participate in the hydrolysis of phytohormone conjugates (i.e. glucosides of gibberellins, abscisic acid and cytokinins) and in the metabolism of cyanogenic glucosides. In Gramineae, these glucosides have been shown to be involved in the shikimate as well as aromatic acid biosynthesis pathways [38]. Genes related to cell wall softening were among the top up-regulated genes, indicating that softening of the fruit is a major process during banana fruit ripening at the molecular level.

Genes related to aroma and flavor compounds
The aroma of the banana fruit is attributed to the presence of various volatiles such as isoamyl alcohol, isoamyl acetate, butyl acetate, elemicin and several others [39]. These volatiles are produced primarily by the phenylpropanoid pathway, the fatty acid biosynthesis pathway and the isoleucine biosynthesis pathway [40]. Since the major components of the aroma and flavor volatiles are esters, the expression of genes involved in the biosynthesis of esters from amino acids, fatty acids and unsaturated fatty acids was analysed here. The genes involved in each step were identified (Figure 2) and their differential expression was examined. The conversion of sugars to alcohols is mediated by alcohol dehydrogenase (ADH), and the alcohols are then converted to esters by alcohol acyltransferases (AATs). At least 10 contigs annotated as ADH genes showed more than 2-fold up-regulation in the ripe fruit compared to unripe fruit. Similarly, the lipoxygenase genes were also significantly up-regulated in the ripe fruit. A large number of transferases were up-regulated in the ripe sample and could play a role in the production of the aroma volatiles.
Our analysis also suggested that genes for butyltransferases, acetyltransferases and O-methyltransferases were significantly up-regulated in the ripe fruit compared to unripe fruit (Table 3). Members of the BAHD acyltransferase gene family are known to be involved in the acyl-CoA-dependent acylation of secondary metabolites, resulting in the formation of esters and amides. Hoffmann et al. [41] categorised these into four groups, namely (A) Taxus acyltransferases involved in taxol biosynthesis, (B) anthocyanin acyltransferases involved in anthocyanin biosynthesis, (C) enzymes with unrelated substrates and (D) hydroxycinnamoyl acyltransferases. In the present study, at least 30 acyltransferases were significantly up-regulated in the ripe fruit. A gene annotated as 3-N-debenzoyl-2-deoxytaxol N-benzoyltransferase was one of the most highly up-regulated genes (10-fold) in the ripe fruit. This enzyme family is involved in the acylation that forms the final step of the taxol biosynthesis pathway. The hydroxycinnamoyl acyltransferase also showed a significant increase (5.8-fold) in the ripe fruit (Additional file 6). The significantly higher expression of these genes in the ripe fruit suggests their involvement in the production of banana volatile esters that may contribute to the ripe fruit aroma. The role of AAT in ester formation has already been established [42]. A set of other genes involved in the formation of aromatic volatiles, including 4-coumarate--CoA ligase 1 and peroxisomal-coenzyme A synthetase, were also up-regulated in ripe fruit (Table 2 and Additional file 6). Our analysis indicates that volatile esters are generally synthesized from amino acids and not through the fatty acid degradation pathway (Figure 2).

Down-regulated genes during banana fruit ripening
As the fruit matures and begins to ripen, genes required for growth and development are no longer needed and are therefore down-regulated. We used the comparative transcriptome data to identify such genes. The vacuolar ATP transporters play an important role during fruit development and are known to help create a proton gradient across the tonoplast membrane, which drives the transport of nutrients, metabolites and proteins. As softening starts, these proteins are no longer required, and hence the genes encoding V-ATPases showed a significant decline in expression in ripe fruit compared to unripe fruit. In the present study, the most significantly down-regulated genes were the trans-membrane transporters and antiporters. Among these, expression of AVP1, a gene encoding an ATPase/hydrogen-translocating pyrophosphatase, decreased 12-fold in ripe fruit compared to unripe fruit, the greatest decline of any transcript in our analysis (Table 3). These genes are mainly involved in maintaining the pH balance and in the transport of important metabolites. As ripening proceeds, the fruit vacuolar membrane starts to degenerate and these types of transporters may no longer be required. As many as 112 genes annotated as transporters in various families were down-regulated (Additional file 5). In our analysis, many of the genes responsible for RNA processing and protein synthesis were down-regulated in ripe fruit. In addition, a large number of transcription factors and genes associated with flower and fruit development were down-regulated. We observed a decline in expression of several floral homeotic genes, FT genes and auxin responsive genes in ripe fruit. These regulatory proteins may no longer be required at the ripening stage and hence showed a significant reduction in gene expression in ripe fruit compared to unripe fruit.

Modulated pathways during banana fruit ripening
The KO ids of all the contigs that matched TAIR ids were extracted, and the involvement of genes in different pathways was analysed using the KEGG pathway database. The analysis suggested that the transcriptomes of both the unripe and ripe fruit pulp included genes associated with many different KEGG pathways. The genes from banana were mapped onto KEGG pathways under metabolism, genetic information processing, environmental information processing, cellular processes and organismal systems. The metabolic pathways identified included carbohydrate, lipid, amino acid, nucleotide and energy metabolism. The KEGG pathway database for the rice genome has 120 pathways, and genes for each of these pathways were identified in banana (Additional file 7), indicating comprehensive coverage by the transcriptomes in our study. GO analysis of differentially expressed genes indicated that most of the ripening-associated gene expression was assigned to functional groups for transcription factors, nucleic acid activity and receptor binding activity. More than 50 percent of the transcripts in the transcriptomes were involved in energy pathways, hydrolase activity, response to abiotic and biotic stimulus and other biological processes. These are some of the pathways that were active during ripening, and these data might provide a platform to explore ripening-related genes (Additional file 8).
As ethylene biosynthesis and perception are essential to banana fruit ripening, a comprehensive analysis of the genes involved in ethylene synthesis and signal transduction was carried out. Several contigs were identified as genes related to ethylene biosynthesis, including SAM, ACS and ACO (Figure 3). Various members of each gene family showed differential expression between ripe and unripe fruit. As each of these gene families has several members, expression of some genes was up-regulated while that of others was either down-regulated or remained unchanged. It might be assumed that the genes that were up-regulated were associated with system 2 ethylene biosynthesis, whereas those that were down-regulated were linked to system 1 ethylene biosynthesis or other biological processes [43]. In addition, a large number of genes associated with ethylene signal transduction were also identified in our analysis. Many of these genes have been identified for the first time in banana. As many as 14 members related to CTR1 and CTR1-like were identified in our study. Similarly, genes related to ETR1, ERS, EIN2, EIN3, EIN4 and EIL were also identified in the transcriptome database. In another study, through genome-wide analysis, 25 members of the MAPK family were also identified. Of these, many were differentially regulated [44] and could hold the key to finding the missing members of the ethylene signal transduction pathway during fruit ripening.

Transcription factors and their role in ripening
Gene regulation through transcription factors (TFs) plays an important role in biological and cellular processes. To study a potential role for the transcription factors in banana fruit ripening, all the genes in the plant transcription factor (TF) database [45] were downloaded [3,12,43,46]. At the ripe fruit stage we collected, the most important processes are cell wall degradation and synthesis of aromatic volatiles. The MADS and NAC domain proteins are known to interact with each other and with the promoters of cell wall related genes such as expansin [43]. Since most of these TFs belong to multigene families, many TFs were down-regulated during ripening, indicating their differential roles during various stages of ripening and fruit development.

Novel genes with modulated expression during banana fruit ripening
A large number of genes that did not show hits to any of the databases but were significantly and differentially regulated were identified in this study (Additional file 9). These genes could be involved in various processes such as cell wall softening, production of aromatic volatiles, changes in colour of the peel and development of flavour compounds. A total of 3,185 genes did not show hits to any of the databases (NR, AGIprot, Rice, CDD); of these, 548 and 648 genes were 2-fold up- and down-regulated, respectively.

Validation of differential gene expression
The differential expression of a few selected genes was confirmed by RT-qPCR. These genes were randomly selected from three categories: genes related to ethylene signalling, aroma and softening. The expression of each gene was examined in unripe fruit (day 0) and at 2, 4, 6 and 8 days post ethylene treatment (Figure 4).
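For readers unfamiliar with how relative expression is typically derived from RT-qPCR data, the following is a minimal sketch of the widely used 2^-ΔΔCt calculation, with the unripe (day 0) sample as calibrator. The text does not state which quantification method or reference gene was used, so this method, the function name and all Ct values below are assumptions for illustration only.

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Fold change of a target gene relative to a calibrator sample (2^-ddCt).

    Each Ct is first normalised to a reference gene within the same sample,
    then the calibrator (here: unripe, day 0) is subtracted.
    Assumes roughly 100% amplification efficiency for both genes.
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_cal - ct_reference_cal
    return 2 ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values: (target gene, reference gene) per time point
ct_by_day = {
    0: (25.5, 18.0),  # unripe fruit, used as calibrator
    2: (25.0, 18.0),
    4: (23.0, 18.0),
    6: (22.0, 18.0),
    8: (23.5, 18.0),
}

cal_target, cal_ref = ct_by_day[0]
fold_by_day = {
    day: relative_expression(target, ref, cal_target, cal_ref)
    for day, (target, ref) in ct_by_day.items()
}
for day, fold in sorted(fold_by_day.items()):
    print(day, round(fold, 2))
```

By construction the calibrator has a relative expression of 1, so values above 1 at later time points indicate up-regulation relative to unripe fruit.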
Among the ethylene receptor genes examined, expression of an ERS1-like gene and an EIN4-like gene increased markedly (>10-fold) during ripening. The CTR1 gene, which acts downstream of the ethylene receptors, initially showed a reduction in expression in the early stages of ripening, but showed a significant increase in expression at 6 days post ethylene exposure (Figure 4). Similarly, the ETR1 gene showed a reduction in expression at day 2, followed by an increase at 6 days post ethylene exposure. Of all the genes selected for analysis, one of the ERS1 genes did not show a significant change in expression and the EIN4 gene was down-regulated during the ripening process. The differential expression of these genes as analysed by quantitative real-time PCR was similar to that observed in the comparative transcriptome analysis. The aroma-related GTs and MTs showed a significant increase in expression as ripening progressed; this increase generally began at day 4 and reached a maximum at day 6 of ripening. Expression of the aroma genes appears to correlate with the stage when the fruit emits its characteristic aroma; after this, senescence and over-ripening set in, resulting in a less palatable fruit. The aroma volatiles are no longer needed and hence the expression of these genes starts to decrease.

For the softening-related genes, the expression of selected members of the PE, PL, XTH, cellulase and PG gene families was studied. As observed in the comparative transcriptome data, RT-qPCR analysis also suggested significantly higher expression of XTH and PL genes as ripening progressed. The expression of these genes started to increase sharply at day 4 and continued to increase until senescence of the fruit. The expression of one member of the cellulase and two members of the PG gene families was also studied by RT-qPCR. The expression of these genes increased as ripening progressed; however, the increase was not as pronounced as that of the XTH, PL and PE genes. The results obtained by RT-qPCR verified and extended the differential expression observed in the comparative transcriptome analysis between ripe and unripe fruit.

Table 4 Transcription factor gene families and their members in banana fruit transcriptomes
TF family       Unripe  Ripe
ABI3VP1         34      31
Alfin-like      20      16
CAMTA           18      16
CCAAT           45      39
LFY             0       0
LIM             4       5
SBP             43      54
Sigma70-like    12

SSR markers
EST-derived SSR markers are an important tool for gene mapping. SSR marker studies have been carried out in banana earlier and a banana SSR database is available; however, the SSRs were identified using publicly available ESTs, which were somewhat limited for banana. To enrich the SSR markers in banana, we identified SSRs in the combined assembly of the ripe and unripe transcriptomes using the MISA pipeline (Table 5). The combined transcriptome was screened for the presence of di-, tri-, tetra-, penta- and hexa-nucleotide SSR motifs, and 1,042 SSRs were identified in the supercontigs for the unripe and ripe fruit transcriptomes. Di- and tri-nucleotide repeats formed the major part, accounting for around 70% of the total SSRs identified. The annotation of the contigs associated with different SSRs was extracted using a custom Perl script. Several of the SSRs were in genes up-regulated during ripening. Contig17908 and Contig03660, which contained one SSR each, were annotated as expansin and XTH, respectively, and both were strongly up-regulated during ripening (Additional file 10). The SSRs identified in this study will be useful as genetic markers for breeding improved varieties of banana.
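The kind of scan described above can be sketched with regular expressions. This is not the MISA pipeline itself, only a simplified illustration of detecting perfect di- to hexa-nucleotide repeats; the minimum repeat numbers per motif length are assumed here, since the thresholds used in the study are not given in the text, and the example sequence is hypothetical.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem repeats
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT', 'AAA')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(sequence):
    """Return a sorted list of (1-based start, motif, repeat count) for perfect SSRs."""
    sequence = sequence.upper()
    found = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A unit of unit_len bases followed by at least (min_rep - 1) copies of itself
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_primitive(motif):
                repeats = len(match.group(0)) // unit_len
                found.append((match.start() + 1, motif, repeats))
    return sorted(found)


# Hypothetical contig fragment: (AT)7 and (GAA)5 separated by non-repetitive flanks
example = "GGC" + "AT" * 7 + "CCGTC" + "GAA" * 5 + "TTC"
ssrs = find_ssrs(example)
print(ssrs)
```

The primitive-motif check prevents a dinucleotide repeat such as (AT)n from being reported a second time as a tetranucleotide (ATAT)n repeat.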

Conclusion
Banana is an economically important fruit in many parts of the world; however, huge postharvest losses are incurred by farmers and consumers due to over-ripening. Ethylene-regulated ripening in banana has not been studied in great detail at the molecular level. Most of the studies carried out relate to single genes or a single gene family, although ten gene families related to ethylene biosynthesis and signalling have recently been studied in detail [47]. More global analysis of gene expression in banana has been restricted to subtractive hybridisation and PAGE-DDGE, both of which fail to give a comprehensive picture of the transcriptome. In the present study, we sequenced the transcriptomes of two stages of the banana fruit pulp and identified genes involved in the ripening processes. The two most important processes related to banana fruit ripening were softening and production of aroma volatiles. Both of these processes were studied in detail and many genes related to aroma formation were identified. Several acyltransferases were identified that are likely involved in the synthesis of aromatic volatiles and flavour components. In addition, the present study highlights the importance of expansins, PL and XTH in the softening of the fruit. Apart from enriching the set of banana genes in the database, we have also identified many novel genes that could play an integral part during ripening in banana and may be good candidates for future gene manipulation studies.

Plant material and RNA isolation
Fruits of Musa acuminata (Dwarf Cavendish, Genome AAA, var. Robusta, Harichhal, germplasm code TRY0081 at National Research Centre for Banana, India) were harvested from plants grown in the field of CSIR-National Botanical Research Institute, Lucknow. Fruits were washed, wiped and exposed to 100 μL/L ethylene for 24 h to initiate ripening and stored for four days as described earlier [6]. The selection of fruit, ethylene treatment and RNA isolation were replicated four times using ten fruits in each experiment. Two fruits from each set were randomly chosen, and the pulp was pooled, frozen in liquid nitrogen and stored at −70°C until further use. Frozen tissues from ripe and unripe fruits were ground to a fine powder in liquid nitrogen using a mortar and pestle. Total RNA from unripe and ripe tissues was extracted using a previously described method [48], followed by DNaseI treatment according to the manufacturer's instructions (Ambion, USA). RNA quality was checked on an agarose/EtBr gel and quantity was determined with a spectrophotometer (Nanodrop, Thermo Scientific, USA).

cDNA Library construction and 454 sequencing
An equal amount of total RNA from each of the four different preparations was pooled and used for library preparation. First-strand cDNA was prepared from 5 μg of the pooled RNA using oligo-dT primer and Superscript II reverse transcriptase (Invitrogen, Carlsbad, CA). A double-stranded cDNA library was then synthesized as described in the double-stranded cDNA synthesis kit (Invitrogen, Carlsbad, CA), and the double-stranded cDNA was purified with the GeneChip Sample Cleanup Module (Affymetrix, USA). The quantity and quality of the double-stranded cDNA library were checked on an Agilent 2100 Bioanalyzer DNA chip (Agilent Technologies Inc., Santa Clara, CA). Approximately three micrograms of double-stranded cDNA was sheared by nebulization to produce random fragments of about 250-800 bp in length. The nebulized cDNA was purified further using QIAGEN QIAquick PCR purification spin columns and pooled. Fragments smaller than 300 bp were removed and the purified cDNA samples were assessed on a DNA chip (Agilent 2100 Bioanalyzer, USA) to determine quantity and to confirm the fragment size (350-800 bp). Adapter ligation and purification of the adapter-ligated library were done according to the manufacturer's instructions (Roche, USA). The quality and quantity of the library were evaluated on an Agilent High Sensitivity chip and a spectrofluorometer (Perkin Elmer, USA), respectively. The double-stranded cDNA fragments were then denatured to generate single-stranded cDNA fragments, which were amplified by emulsion PCR for sequencing according to the manufacturer's instructions (454 Life Sciences, Roche, USA). Reads from the unripe and ripe libraries were processed and trimmed to remove low-quality and primer sequences.

De novo sequence assembly and annotation
The raw 454 sequences from ripe and unripe banana fruit libraries were screened and trimmed for weak signals by GS FLX pyrosequencing software to yield high-quality (HQ) sequences (>99.5% accuracy of single-base reads). The primer and adapter sequences were trimmed from the HQ sequences, and sequences shorter than 50 bp were removed before assembly. The trimmed sequences were assembled into unique contigs and singletons using the Roche GS Assembler (version 2.5.3) with a 40 bp overlap and 96% identity. The contigs and singletons were annotated using a standalone version of the NCBI BLASTx program [49] against the Arabidopsis protein database at The Arabidopsis Information Resource (TAIR; http://www.arabidopsis.org; version TAIR9), the MSU Rice genome annotation, the NCBI non-redundant protein (Nr) database (http://www.ncbi.nlm.nih.gov; released on 06/23/2009) and The Banana Genome Hub (http://banana-genome.cirad.fr/), with an E-value cut-off of 10⁻⁵ and retaining only the top hit for each sequence. Annotation against the CDD database (http://www.ncbi.nlm.nih.gov) was done using the rpsblast program of the BLAST suite, and against Pfam using the HMMER v3 program. To determine whether the unigenes contained potential coding regions, they were analysed with ESTScan, an HMM-based program. To analyse partial and alternative transcripts, the contigs were computationally fragmented into 100 bp tags, which were mapped to the banana genome using the bowtie2 program [50]. Contigs in which part of the sequence skipped an exon during mapping were identified as alternatively spliced with respect to the banana genome [4].
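The top-hit selection and the tag fragmentation described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study. It assumes that BLAST results are available in the standard 12-column tabular layout (query in column 1, subject in column 2, E-value in column 11), that tags are consecutive and non-overlapping, and that a trailing piece shorter than 100 bp is discarded. The function names and the sequence identifiers in the example are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

E_VALUE_CUTOFF = 1e-5  # cut-off stated in the text
TAG_LENGTH = 100       # tag length (bp) stated in the text


def top_hits(blast_lines: Iterable[str],
             cutoff: float = E_VALUE_CUTOFF) -> Dict[str, Tuple[str, float]]:
    """Return {query_id: (subject_id, e_value)} keeping the best hit per query.

    Assumes tab-separated lines with the query in column 1, the subject in
    column 2 and the E-value in column 11. Hits above the cut-off are ignored.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, e_value = fields[0], fields[1], float(fields[10])
        if e_value > cutoff:
            continue
        # Keep the hit with the lowest E-value seen so far for this query.
        if query not in best or e_value < best[query][1]:
            best[query] = (subject, e_value)
    return best


def fragment_into_tags(sequence: str,
                       tag_length: int = TAG_LENGTH) -> List[Tuple[int, str]]:
    """Split a contig into consecutive, non-overlapping (offset, tag) pairs.

    Assumption: a trailing piece shorter than tag_length is dropped.
    """
    last_start = len(sequence) - tag_length
    return [(start, sequence[start:start + tag_length])
            for start in range(0, last_start + 1, tag_length)]


if __name__ == "__main__":
    # Hypothetical tabular BLAST lines, for illustration only.
    example = [
        "contig1\tAT1G01010\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-30\t200",
        "contig1\tAT2G02020\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-10\t120",
        "contig2\tAT3G03030\t40.0\t50\t30\t0\t1\t150\t1\t50\t0.5\t30",
    ]
    print(top_hits(example))
    print(len(fragment_into_tags("ACGT" * 60)))  # 240 bp contig
```

Each tag keeps its offset within the contig, so tags that map to non-adjacent exons can later be traced back to their position in the original sequence.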

Functional classification and biological pathways assignment
To gain an understanding of the metabolic and genetic networks operating during ripening, the genes identified in our transcriptome were mapped according to their linkage in the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways database. Enzyme commission (EC) numbers were assigned to unique sequences, based on the BLASTx search of protein databases, using an E-value cut-off of 10⁻⁵. The output of the KEGG analysis includes KEGG orthology (KO) assignments and KEGG pathways (http://www.genome.jp/kegg/) that are populated with the KO assignments. Gene ontology (GO) analysis was also performed using the GO terms identified for banana supercontigs having an E-value of <10⁻⁵ in a BLAST search of Arabidopsis genes in the TAIR databases.

Digital gene expression and pathway analysis
To analyse differential gene expression, the reads per contig were counted and the transcripts per million (TPM) calculated. Differentially expressed genes were identified using the DESeq package [51]. To assess the statistical significance of differential expression, the R statistic [35] was applied, and genes with R ≥ 8 were considered highly significant. To calculate the threshold R value, 1000 datasets for each library were generated according to a random Poisson distribution as previously described [35]. For the comparative expression analysis with the Musa genome, all the unigenes, including singletons, were mapped to the annotated gene models predicted for the Musa genome. Expression levels were calculated using the TPM of contigs and the predicted levels checked again using the DESeq package [51]. Pathway analysis was performed using the KEGG and BioCyc programs for Arabidopsis and rice, and the corresponding contigs were retrieved using custom-made Perl scripts. Clustering of the genes and generation of the heat maps were carried out using the MEV software (http://www.tm4.org/mev.html).
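As an illustration of the two quantities used above, the following Python sketch computes a TPM value and an R statistic for a single contig. It rests on two assumptions that the text does not confirm: TPM is taken as the read count scaled to one million reads in the library, without length normalisation, and the R statistic of [35] is taken to have the log-likelihood-ratio form commonly used for comparing cDNA libraries, R = Σ x_i ln(x_i / (N_i f)), where x_i is the read count in library i, N_i the library size and f the pooled frequency of the contig. The function names and the example counts are hypothetical.

```python
import math
from typing import Sequence


def transcripts_per_million(read_count: int, total_reads: int) -> float:
    """Scale a contig's read count to a library of one million reads.

    Assumption: no length normalisation is applied.
    """
    return read_count * 1_000_000 / total_reads


def r_statistic(counts: Sequence[int], library_sizes: Sequence[int]) -> float:
    """Log-likelihood-ratio style R statistic for one contig across libraries.

    counts[i] is the number of reads assigned to the contig in library i and
    library_sizes[i] is the total number of reads in that library.
    Libraries with a zero count contribute nothing to the sum.
    """
    if len(counts) != len(library_sizes):
        raise ValueError("counts and library_sizes must have the same length")
    total = sum(counts)
    if total == 0:
        return 0.0
    frequency = total / sum(library_sizes)  # pooled frequency of the contig
    r = 0.0
    for x, n in zip(counts, library_sizes):
        if x > 0:
            r += x * math.log(x / (n * frequency))
    return r


if __name__ == "__main__":
    # Hypothetical counts: 10 reads in one library, 40 in the other,
    # each library containing 1000 reads in total.
    r = r_statistic([10, 40], [1000, 1000])
    print(round(r, 2), r >= 8)
```

With these hypothetical counts R is about 9.64, which would exceed the R ≥ 8 threshold. Equal relative abundance in both libraries gives R = 0.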

Designing of oligonucleotide primers and real-time PCR analysis
A set of oligonucleotide primers (Additional file 11) was designed for RT-qPCR on the basis of the sequence information obtained from the sequence analysis. For RT-qPCR, first-strand cDNA was synthesized from total RNA using the RevertAid H Minus first strand cDNA synthesis kit (Fermentas Life Sciences, USA) according to the prescribed protocol. The cDNA was checked by semi-quantitative PCR, followed by agarose gel electrophoresis. The PCR mix for real-time PCR contained 1 μl of diluted cDNA (10 ng), 10 μl of 2× SYBR Green PCR Master Mix (Applied Biosystems, USA), and 200 nM of each gene-specific primer in a final volume of 20 μl. A no-template control was also included for each primer pair. Expression was quantified using the Applied Biosystems 7500 Fast Real-Time PCR System. All PCRs were performed in 96-well optical reaction plates (Applied Biosystems, USA) under the following conditions: 20 sec at 95°C, followed by 40 cycles of 3 sec at 95°C and 30 sec at 60°C. The specificity of the amplicons was verified by melting curve analysis (60°C to 95°C) after 40 cycles. Three technical replicates were performed for each cDNA.
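The reaction composition above can be checked with a short worked calculation, sketched here in Python. The primer stock concentration is not given in the text, so the 10 μM stock used below is purely an assumption, as are the function names. The remaining values (20 μl final volume, 10 μl master mix, 1 μl cDNA, 200 nM of each primer) are taken from the description.

```python
def primer_amount_pmol(final_nM: float, final_volume_ul: float) -> float:
    """Amount of one primer per reaction in pmol (nM x ul = fmol; /1000 = pmol)."""
    return final_nM * final_volume_ul / 1000


def primer_volume_ul(final_nM: float, final_volume_ul: float,
                     stock_uM: float) -> float:
    """Volume of primer stock needed to reach final_nM in final_volume_ul."""
    return final_nM * final_volume_ul / (stock_uM * 1000)


def water_volume_ul(final_volume_ul: float, master_mix_ul: float,
                    cdna_ul: float, primer_ul_each: float) -> float:
    """Water needed to bring the reaction to volume (two primers per reaction)."""
    return final_volume_ul - master_mix_ul - cdna_ul - 2 * primer_ul_each


if __name__ == "__main__":
    STOCK_UM = 10.0  # assumed primer stock concentration, not stated in the text
    per_primer = primer_volume_ul(200, 20, STOCK_UM)
    print(primer_amount_pmol(200, 20))                       # pmol of each primer
    print(per_primer)                                        # ul of each primer stock
    print(round(water_volume_ul(20, 10, 1, per_primer), 2))  # ul of water
```

Under the assumed 10 μM stock, each reaction would take 0.4 μl of each primer (4 pmol) and about 8.2 μl of water. A different stock concentration changes only these two volumes.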