Identification and characterization of NAGNAG alternative splicing in the moss Physcomitrella patens
© Sinha et al; licensee BioMed Central Ltd. 2010
Received: 15 December 2009
Accepted: 28 April 2010
Published: 28 April 2010
Alternative splicing (AS) involving tandem acceptors that are separated by three nucleotides (NAGNAG) is an evolutionarily widespread class of AS, which is well studied in Homo sapiens (human) and Mus musculus (mouse). It has also been shown to be common in the model seed plants Arabidopsis thaliana and Oryza sativa (rice). In one of the first studies involving sequence-based prediction of AS in plants, we performed a genome-wide identification and characterization of NAGNAG AS in the model plant Physcomitrella patens, a moss.
Using Sanger data, we found 295 alternatively used NAGNAG acceptors in P. patens. Using 31 features and training and test datasets of constitutive and alternative NAGNAGs, we trained a classifier to predict the splicing outcome at NAGNAG tandem splice sites (alternative splicing, constitutive at the first acceptor, or constitutive at the second acceptor). Our classifier achieved a balanced specificity and sensitivity of ≥ 89%. Subsequently, a classifier trained exclusively on data well supported by transcript evidence was used to make genome-wide predictions of NAGNAG splicing outcomes. By generation of more transcript evidence from a next-generation sequencing platform (Roche 454), we found additional evidence for NAGNAG AS, with altogether 664 alternative NAGNAGs being detected in P. patens using all currently available transcript evidence. The 454 data also enabled us to validate the predictions of the classifier, with 64% (80/125) of the well-supported cases of AS being predicted correctly.
NAGNAG AS is just as common in the moss P. patens as it is in the seed plants A. thaliana and O. sativa (but not conserved on the level of orthologous introns), and can be predicted with high accuracy. The most informative features are the nucleotides in the NAGNAG and in its immediate vicinity, along with the splice site scores, as found earlier for NAGNAG AS in animals. Our results suggest that the mechanism behind NAGNAG AS in plants is similar to that in animals and is largely dependent on the splice site and its immediate neighborhood.
Eukaryotic primary mRNAs consist of protein-coding regions (exons) and intervening non-coding regions (introns). The mature mRNA transcript, which acts as substrate for translation into protein, is produced by removing introns in a process called splicing. Splicing can be either constitutive, always producing the same mRNA, or alternative, via variable inclusion of parts of the primary transcript. Alternative splicing (AS) is thus a mechanism that enables multiple transcripts and proteins to be encoded by the same gene, thereby promoting transcript and protein diversity. Furthermore, events of AS can provide an additional level of post-transcriptional gene regulation, e.g. by the production of mRNA isoforms with truncated open reading frames that are subject to degradation by the nonsense mediated decay pathway [2, 3]. AS is particularly widespread in higher eukaryotes, especially in mammals - it has been estimated that up to 94% of all multi-exonic H. sapiens genes are alternatively spliced. Large-scale detection of AS usually involves expressed sequence tags (ESTs), microarray, or RNA-seq analysis. However, not all AS events can be detected by these methods. Moreover, genomic sequence data are now being generated at a much faster rate than transcript data, so many genomes have low transcript coverage. Thus, there is a need for independent methods of detecting AS.
While there have been numerous experimental as well as computational studies of AS in animals, the study of AS in plants is still in its early stages. Although AS is commonly observed in plants, the overall abundance of AS seems to be lower than in animals. Several studies have estimated that between 20% and 30% of plant genes undergo AS [14–17], while the current estimate based on deep sequencing of the Arabidopsis thaliana transcriptome is 42%-56% of intron-containing genes. EST-based detection of AS in plants revealed that intron retention appears to be the most common kind of AS event in plants [13–16]. Exon-skipping, which is the most common event in animals, is much less frequent in plants [14, 16]. The two prevalent models for spliceosome assembly are intron-definition, which applies to short introns (thus to a majority of plant introns) and involves the intron as the initial unit of recognition during spliceosome assembly; and exon-definition, which applies to long introns (thus to a majority of animal introns), and involves recognition of the exon as the initial unit for splicing [14, 20–22]. Thus, one would expect inaccurate splicing to result in intron-retention under the intron-definition model, and exon-skipping under the exon-definition model. Hence the results showing that intron-retention is the most common AS event in plants and exon-skipping in animals are consistent with these models of splicing. However, alternative acceptors and donors seem to occur at a comparable frequency. In particular, short-distance or subtle AS events seem to be just as common, and NAGNAG acceptors are widespread and abundant; a study on AS found 953 alternative NAGNAGs in rice and 485 in A. thaliana.
Initial analyses of the model plant P. patens, the first sequenced bryophyte, indicated a distribution of AS events similar to other plants studied so far. Consequently, we here aimed to characterize and predict the extent of NAGNAG AS in P. patens. Analysis of the available transcript data indicates that NAGNAG AS is just as common in the moss P. patens as in seed plants. We achieved a high level of performance in silico, and 64% of the cases of AS that were well supported by independently generated 454 data could be correctly predicted. In agreement with a recent study comparing A. thaliana and O. sativa with mammals, our results suggest that the mechanism of NAGNAG AS is similar in plants and animals.
Results and discussion
Identification of alternative NAGNAGs using Sanger ESTs
Gene ontology enrichment analysis
Table: GO analyses of genes with alternative NAGNAG acceptor site
Column headings: # in test group; # in reference group; # non annot. test; # non annot. reference group
P. patens alternative NAGNAG genes, Sanger support: intracellular membrane-bound organelle
P. patens alternative NAGNAG genes, Sanger and/or 454 support: intracellular membrane-bound organelle
A. thaliana alternative NAGNAG genes combined (Iida et al., 2008 and Schindler et al., 2008): intracellular membrane-bound organelle; nucleic acid binding; nucleobase, nucleoside, nucleotide and nucleic acid metabolic process; transcription factor activity; transcription regulator activity
Evolutionary conservation of NAGNAG splicing among plants?
NAGNAG motifs occurring at conserved positions in A. thaliana and P. patens
Prediction of NAGNAG AS in P. patens
constitutive: ≥ 10 ESTs supporting either E or I variant, 0 for the other;
alternative: ≥ 2 ESTs supporting each variant, ≥ 10% of ESTs supporting minor variant.
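The two labelling criteria above can be sketched as a small function. This is only an illustration, not the authors' script: the function name, the label "unlabeled" for sites that meet neither criterion, and the reading of "≥ 10% of ESTs" as a fraction of all ESTs covering the site are assumptions.

```python
def label_nagnag(e_count: int, i_count: int,
                 min_constitutive: int = 10,
                 min_per_variant: int = 2,
                 min_minor_fraction: float = 0.10) -> str:
    """Label a NAGNAG site from the EST counts of its E and I variants.

    Returns "EI" (alternative), "E" or "I" (constitutive), or
    "unlabeled" when the site meets neither criterion (assumed label).
    """
    total = e_count + i_count
    # alternative: >= 2 ESTs for each variant, minor variant >= 10% of ESTs
    if e_count >= min_per_variant and i_count >= min_per_variant:
        if min(e_count, i_count) / total >= min_minor_fraction:
            return "EI"
        return "unlabeled"
    # constitutive: >= 10 ESTs for one variant, 0 for the other
    if i_count == 0 and e_count >= min_constitutive:
        return "E"
    if e_count == 0 and i_count >= min_constitutive:
        return "I"
    return "unlabeled"
```

For example, a site with 13 E reads and 27 I reads satisfies the alternative criterion, whereas a site with 2 E reads and 30 I reads fails the 10% threshold and stays unlabeled.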
Generation of additional transcript evidence
Table: Summarized NAGNAG coverage
Labels: Sanger-based NAGNAG sites; covered NAGNAG sites; Additional 454-based NAGNAG sites; covered gene models; Total alternative NAGNAG sites
Experimental confirmation of the NAGNAG AS
Experiments were performed on 19 candidate NAGNAGs: 14 as controls (seven with AS according to transcript data, and seven without AS) to see whether the splicing outcomes according to Sanger and 454 reads could be confirmed by a PCR-based approach, and five on the basis of an orthologous alternative NAGNAG intron in A. thaliana (see above). Of the seven candidates with support for AS from Sanger or 454 datasets, three were predicted to be alternatively spliced with p(EI) values > 0.9 (Additional file 2). Using Sanger sequencing of cDNA-based PCR products, all three candidates were indeed verified as being alternatively spliced in P. patens protonema and gametophore tissue, respectively. Eight candidate genes were used as potential negative controls, as their p(EI) predictions were 0.365 and lower. All of these candidates showed support for the single predicted isoform by means of available transcript evidence and consequently only this single isoform could be detected during experimental validation (Additional file 2). Four more candidates, which have support for both variants from either the Sanger or the 454 datasets but a p(EI) < 0.9, were chosen to be validated. NAGNAG AS could be confirmed for the gene product Phypa_161321 by Sanger sequencing of cDNA PCR products, although it has a low p(EI) of 0.181 (Additional file 2). The experimental validation is supported by the Sanger dataset, where 13 "E" variants as well as 27 "I" variants could be identified. This is the only case where the prediction from the Naïve Bayes classifier does not agree with the experimental results. In the case of Phypa_74146 and Phypa_199161, only one of the two isoforms could be detected, reflecting the low p(EI) values.
Twelve of the 19 candidate genes possess a GAG in the NAGNAG motif (Additional file 2). Using the methods described above, all of them were shown not to be alternatively spliced. Therefore, GAG seems not to be used as an alternative acceptor for AS in P. patens in most cases, which is in line with the sequence logos (Figure 2B). Exceptions could be Phypa_199161 and Phypa_228333, for which both isoforms are present in the Sanger and 454 datasets. These two candidates may indeed use GAG as acceptors for AS, but this remains to be proven. Rare usage of GAGs as acceptors in P. patens is in agreement with previous work which shows that functional acceptors are only very rarely GAGs - the order of preference for the nucleotide preceding the AG in functional acceptors is C > T > A > G, which has been shown both by experimental work as well as by in silico analyses of NAGNAG splicing [5, 12]. When we consider the EST and 454 evidence in P. patens, only 4.6% (149/3225) of GAG-containing NAGNAGs are alternative - filtering by transcript support to use only well-supported cases (as described for the preparation of training data in the "Methods" section) further reduces this to 2.6% (14/536). Taken together, this strongly suggests that GAGs function only very rarely as acceptors in P. patens (if at all).
Using 454 data for independent validation of predictions
The classifier was trained on previously existing Sanger evidence, and the additional 454 evidence was used for independent validation. Combining the 454 and Sanger datasets resulted in 296 additional NAGNAG AS events being detected - of these, 66 had strong support for AS in terms of satisfying the criteria used to define the training dataset (≥ 2 reads for each variant, ≥ 10% of the reads for the minor variant). 62% (41/66) of these were predicted to be alternative by the Naïve Bayes classifier. If we require ≥ 4 reads per variant while keeping the threshold of minor variant abundance at ≥ 10%, the correct predictions rise to 75% (9/12). When considering AS according to 454 reads alone, 64% (80/125) of the well-supported cases of AS are predicted correctly, which increases to 79% (30/38) if we require ≥ 4 reads per variant while keeping the threshold of minor variant abundance at ≥ 10%. On the other hand, if we look at cases which are constitutive with a support of ≥ 30 transcripts, according to the combined transcript dataset, only 1/93 E cases and 0/65 I cases are predicted to be alternative. The Naïve Bayes classifier predicts 371 further cases of AS (155 of 2,549 currently labeled E, and 216 of 1,891 currently labeled I) in P. patens - the high specificity shown by nearly no predicted AS in strongly supported constitutive NAGNAGs, combined with the sensitivity of 62% in detecting newly discovered strongly supported cases of AS, suggests that there are potentially several hundred as yet undiscovered cases of NAGNAG AS in P. patens.
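The percentages quoted in this paragraph follow directly from the reported counts. The short sketch below recomputes them; the dictionary keys and the helper name `percent` are illustrative labels, while the counts are the ones given in the text.

```python
def percent(part: int, whole: int) -> int:
    """Percentage of `part` in `whole`, rounded to the nearest integer."""
    return round(100 * part / whole)

# (correctly predicted as alternative, well-supported AS cases), from the text
validation = {
    "combined, >=2 reads per variant": (41, 66),
    "combined, >=4 reads per variant": (9, 12),
    "454 only, >=2 reads per variant": (80, 125),
    "454 only, >=4 reads per variant": (30, 38),
}
sensitivity = {name: percent(hit, n) for name, (hit, n) in validation.items()}

# Strongly supported constitutive sites (>= 30 transcripts) predicted as AS
false_alternative = 1 + 0          # 1 of 93 E cases, 0 of 65 I cases
constitutive_total = 93 + 65

# Further AS cases predicted among sites currently labeled E or I
predicted_new_as = 155 + 216

for name, value in sensitivity.items():
    print(f"{name}: {value}%")
print(f"constitutive sites predicted as AS: {false_alternative}/{constitutive_total}")
print(f"further predicted AS cases: {predicted_new_as}")
```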
Prediction of NAGNAG AS in P. patens by a classifier trained on H. sapiens data
We had earlier shown that a classifier trained on only H. sapiens NAGNAG data could predict NAGNAG splicing outcomes with near-identical accuracy on other vertebrate genomes (mouse, rat, dog, chicken), and with a slight drop in the case of D. melanogaster and Caenorhabditis elegans. Therefore, we also tried to predict NAGNAG AS in P. patens using a Naïve Bayes classifier trained on H. sapiens data and achieved an AUC of 0.90, 0.99 and 0.97 for the EI, E and I forms, respectively. This was achieved using five features (the Ns in the NAGNAG, the two positions immediately upstream and the position immediately downstream) and is similar to that achieved on D. melanogaster earlier, reinforcing the notion that NAGNAG splicing in plants is similar to that in animals.
Here we describe the first computational prediction of alternative splicing (AS) in a non-seed plant and find that NAGNAG AS in P. patens, a moss, can be predicted with high accuracy. Since the extent of NAGNAGs in P. patens had not yet been reported, this work involved both characterization as well as prediction of NAGNAG splicing in P. patens. Using ESTs, we found that NAGNAG AS is as widespread in the bryophyte P. patens as it is in the seed plants A. thaliana and O. sativa. Thus, NAGNAG AS is likely to be a common feature of AS in all land plants, just as it is in animals. Although we detected homologs with NAGNAG events among the two land plants P. patens and A. thaliana, NAGNAG splicing seems not to be conserved at the intron level.
Using carefully constructed training and test datasets, an in silico performance of AUC = 0.96, 0.99 and 0.98 was achieved for the EI, E and I forms, respectively. The most informative features (according to information gain) were the nucleotides in the NAGNAG and its immediate vicinity, and even a relatively simple classifier like the Naïve Bayes classifier could match the more sophisticated Bayesian network and support vector machine. The performance achieved by a Naïve Bayes classifier trained on H. sapiens data (AUC = 0.90, 0.99 and 0.97 for the EI, E and I forms, respectively) was similar to that achieved on D. melanogaster earlier. This indicates that, as in animals, the mechanism behind NAGNAG AS in plants is simple in nature and mostly dependent on the splice site neighborhood. Independent validation of the predictions of the classifier (trained on Sanger EST data alone) using 454 data showed that 64% (80/125) of the well-supported cases of NAGNAG AS could be predicted correctly.
In total, seven candidates were chosen for independent experimental confirmation of the Sanger and 454 evidence of NAGNAG splicing. The experimental confirmation depends on detection of isoforms using sequence electropherograms and is less sensitive than size polymorphism detection using fluorescence-labeled primers. The latter method was used on two of the seven examples and confirmed the results of the former. While there is transcript support for alternative use of GAG acceptors, this could not be proven in our experimental validation. In addition, a further 12 experiments were performed - six as negative controls, all of which agreed with the predictions, and five to check for possible conserved NAGNAG AS with A. thaliana, which could not be detected.
When additional 454 transcript evidence was used to supplement the Sanger EST data, a total of 664 alternative NAGNAGs were found in P. patens. Since the average coverage per constitutive NAGNAG was still only approximately ten ESTs, this number will likely continue to rise with deeper coverage of the transcriptome. Nevertheless, the results provide the first evidence that NAGNAG AS is widespread in P. patens. Our findings are in agreement with a recent study which showed that NAGNAG AS shares common properties in A. thaliana, O. sativa and animals. This indicates that the mechanism behind NAGNAG AS in land plants is similar to that in animals. The pervasiveness of NAGNAG AS suggests that it may be a general feature of splicing in animals and plants, and possibly in all eukaryotes.
Identification of alternative splicing at NAGNAG acceptors using ESTs
Feature design and extraction; classifiers
Feature extraction was done based on annotated data using a Perl script (Additional file 3; see Additional file 4 for example input data. The script produces output which, together with Additional file 5, can be used with standard classifiers). The region used for analysis can be seen in Figure 5. Since the composition of the splice site neighborhood influences splicing in general, the nucleotides at positions -20 to +3 with respect to the NAGNAG were each used as a single feature, as were the two Ns in the NAGNAG motif. The last three positions of the upstream exon were also included, since they can influence the process of splicing as well as reflect the influence of codon usage near the exon boundary. Thus, we had a total of 28 features which each represented a nucleotide, and thus had four possible values (A, C, G, T). A weak polypyrimidine tract (PPT) can contribute to AS, and the number of pyrimidines in the 3' region of the intron is a measure of PPT strength. Therefore, we designed a feature called "Y-content", which refers to the number of pyrimidines in the 20 bp upstream of the NAGNAG. Splice site strength, being one of the most important determinants of splicing outcome, was also included as a feature - the strength of the two possible splice sites for each NAGNAG exon, as computed using SpliceMachine, contributed two more features. In total, 31 features were used. We used the WEKA package with Bayesian networks, Naïve Bayes classifiers, and support vector machines. For feature selection within WEKA, we used the method "CfsSubsetEval". In addition, we also used manual inclusion and exclusion of features.
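The feature layout described above (23 flanking positions, two Ns, three exon positions, Y-content and two splice site scores) can be sketched as follows. This is a minimal illustration and not the Perl script from Additional file 3: the function name, the feature names, and the way the sequence pieces are passed in are all assumptions, and the two splice site scores are taken as precomputed inputs rather than calculated here.

```python
PYRIMIDINES = {"C", "T"}

def extract_features(exon_tail: str, upstream20: str, nagnag: str,
                     downstream3: str, score_first: float,
                     score_second: float) -> dict:
    """Build the 31-feature description of one NAGNAG site.

    exon_tail   : last 3 nt of the upstream exon
    upstream20  : 20 nt immediately upstream of the NAGNAG (positions -20..-1)
    nagnag      : the 6-nt motif itself
    downstream3 : 3 nt immediately downstream (positions +1..+3)
    score_*     : precomputed strengths of the two candidate acceptors
    """
    exon_tail, upstream20, nagnag, downstream3 = (
        s.upper() for s in (exon_tail, upstream20, nagnag, downstream3))
    if (len(exon_tail), len(upstream20), len(nagnag), len(downstream3)) != (3, 20, 6, 3):
        raise ValueError("unexpected sequence lengths")
    if nagnag[1:3] != "AG" or nagnag[4:6] != "AG":
        raise ValueError("motif is not a NAGNAG")

    features = {}
    for k, base in enumerate(upstream20):            # pos_-20 .. pos_-1
        features[f"pos_{k - 20}"] = base
    for k, base in enumerate(downstream3, start=1):  # pos_+1 .. pos_+3
        features[f"pos_+{k}"] = base
    features["N1"] = nagnag[0]                       # N of the first NAG
    features["N2"] = nagnag[3]                       # N of the second NAG
    for k, base in enumerate(exon_tail):             # exon_-3 .. exon_-1
        features[f"exon_{k - 3}"] = base
    # Y-content: number of pyrimidines in the 20 nt upstream of the NAGNAG
    features["y_content"] = sum(base in PYRIMIDINES for base in upstream20)
    features["score_first_acceptor"] = score_first
    features["score_second_acceptor"] = score_second
    return features

# Made-up example site; the scores are placeholder values.
example = extract_features("AAG", "TTTCTTTCCTTTGTTTTCTT", "CAGCAG", "GTA", 0.8, 0.5)
```

The resulting dictionary holds 28 nucleotide-valued entries plus three numeric ones, matching the total of 31 features given in the text.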
The information gain of a feature is defined as

$$\mathrm{InfoGain}(\mathrm{Class}, \mathrm{Feature}) = H(\mathrm{Class}) - H(\mathrm{Class} \mid \mathrm{Feature}),$$

where H(Class) is the entropy of the class variable, and H(Class|Feature) is the conditional entropy of the class variable, given the feature. Information gain is a well-established measure for feature selection in machine learning. We used the WEKA package for computing information gain, in order to rank the features according to how informative they were. We also used it for prediction based on SVMs, as implemented in the SMO option, and for prediction using Naïve Bayes classifiers.
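As an illustration of the quantity described here, information gain can be computed for a single nominal feature as the class entropy minus the conditional entropy of the class given the feature. The sketch below is a plain-Python illustration and not the WEKA implementation used in the study; the function names and the toy data are assumptions.

```python
import math
from collections import Counter

def entropy(labels) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(feature_values, labels) -> float:
    """H(Class) - H(Class | Feature) for one nominal feature."""
    total = len(labels)
    by_value = {}
    for value, label in zip(feature_values, labels):
        by_value.setdefault(value, []).append(label)
    # Conditional entropy: entropy within each feature value, weighted by frequency
    conditional = sum(len(group) / total * entropy(group)
                      for group in by_value.values())
    return entropy(labels) - conditional

# Toy data: one feature separates the classes perfectly, the other not at all.
classes = ["EI", "EI", "E", "E"]
informative = ["C", "C", "G", "G"]
uninformative = ["C", "G", "C", "G"]
gain_informative = information_gain(informative, classes)
gain_uninformative = information_gain(uninformative, classes)
```

Ranking features then amounts to sorting them by this value in descending order.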
Functional annotation and GO enrichment analysis
For every (potential) NAGNAG splicing region an overlapping P. patens gene model was assigned using the start and stop coordinates on the genomic scaffolds. The corresponding predicted protein sequences were subjected to BLAST2GO GO term annotation, which was extended by various subcellular target prediction and homology-based methods (see http://www.cosmoss.org/annotation/references?cosmoss_ref=1 for details). The resulting GO annotation was mapped to GO slim terms using the BLAST2GO internal mapping function with the "goslim_plant.obo" ontology subset. GO enrichment analysis was performed against the complete P. patens with the BLAST2GO internal Fisher's exact test/GOSSIP using the two-tailed test, with false discovery rate (FDR) correction and a q-value cut-off < 0.05. The A. thaliana alternative NAGNAG splicing gene set was constructed using the alternative NAGNAG acceptor cases identified within the A. thaliana genome in [24] and [26]. The resulting alternative NAGNAG acceptor set contains 290 A. thaliana proteins. These proteins were subjected to a GO enrichment analysis as described above for P. patens. The A. thaliana GOA was downloaded from ftp://ftp.arabidopsis.org/home/tair/Ontologies/Gene_Ontology/ATH_GO_GOSLIM.txt (17.11.2009) and mapped to GO slim (goslim_plant.obo) with BLAST2GO.
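The enrichment test described above can be sketched in plain Python: a two-tailed Fisher's exact test on a 2×2 table per GO term, followed by an FDR correction. This is not the BLAST2GO/GOSSIP implementation; the function names are illustrative, and the Benjamini-Hochberg procedure is an assumption, since the text does not state which FDR method was applied.

```python
from math import comb

def fisher_exact_two_tailed(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a/b: test-set genes with/without the GO term;
    c/d: reference genes with/without the GO term.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x term-annotated genes in the test set
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables at least as unlikely as the observed one
    p = sum(px for px in map(prob, range(lo, hi + 1))
            if px <= p_observed * (1 + 1e-9))
    return min(1.0, p)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg q-values (assumed FDR method), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # from largest p-value to smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues

# Made-up counts for two hypothetical GO terms, for illustration only.
p_values = [fisher_exact_two_tailed(8, 2, 1, 5),
            fisher_exact_two_tailed(3, 1, 1, 3)]
q_values = benjamini_hochberg(p_values)
enriched = [q < 0.05 for q in q_values]
```

With the cut-off of q < 0.05 from the text, a term is reported only if its corrected value falls below that threshold.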
Candidate selection for evolutionarily conserved NAGNAG acceptors
P. patens cosmoss v1.2 and A. thaliana TAIR 8 proteins were subjected to a BLAST-based single linkage clustering using BLASTCLUST. The parameters were set to 70% length coverage and 70% alignment identity to obtain only highly conserved homologs. In total, 1,088 clusters with at least one protein each from P. patens and A. thaliana were found. Five candidates out of seven P. patens genes, each sharing a cluster with A. thaliana alternative NAGNAG acceptor containing genes [24, 26], were selected for experimental validation. In addition, these P. patens candidate genes contain a potential NAGNAG acceptor in the same intron as the corresponding A. thaliana homolog.
Experimental confirmation of splice variants
P. patens total RNA was isolated from protonema and gametophore tissue using the RNeasy Plant Mini Kit (Qiagen, Hilden, Germany). cDNA synthesis was carried out with 250 ng total RNA using Superscript III Reverse Transcriptase (Invitrogen, Karlsruhe, Germany) according to the manufacturers' instructions. For validation of different splice variants, PCR was performed from protonema and gametophore RNA, respectively, using native Pfu-Polymerase (Fermentas, St. Leon-Rot, Germany). PCR primers were obtained from Sigma (München, Germany). PCR reactions were carried out using 12 ng cDNA as template. Products were extracted using the QIAquick PCR purification Kit (Qiagen, Hilden, Germany) and directly sequenced (GATC, Konstanz, Germany). Sequences and chromatograms were analysed with ChromasPro Version 1.34. Alternatively, PCR products amplified with carboxyfluorescein (FAM) labeled forward primers were analysed by capillary electrophoresis, where AS was detected as a size difference of three nucleotides in length. PCR products were diluted as appropriate and subjected to capillary electrophoresis for separation and detection. For this purpose, 10 μL HiDi formamide (Applied Biosystems) and 0.5 μL HD400 GS internal size standard were added to each well, and the plate was mounted on a 3100 Genetic Analyzer with Foundation Data Collection software v. 2.0 and Gene Mapper ID software v. 3.2 (Applied Biosystems, Darmstadt, Germany).
Tissue culture and generation of additional transcript evidence
Physcomitrella patens strain Gransden 2004 was cultivated on solidified (1% w/v agar) mineral medium [250 mg L-1 KH2PO4, 250 mg L-1 MgSO4 × 7H2O, 250 mg L-1 KCl, 1000 mg L-1 Ca(NO3)2 × 4H2O, 12.5 mg L-1 FeSO4 × 7H2O, pH 5.8 with KOH] on 9 cm petri dishes enclosed by laboratory film in a Percival cultivation chamber (CLF, Germany) at 22°C with a 16 h light, 8 h dark regime under 70 μmol*s-1*m-2 white light (long day conditions). Gametophore colonies were grown from single gametophores transferred to the dishes from precultured colonies. Induction of gametangia was performed by placing the dishes under inductive conditions, i.e. 20 μmol*s-1*m-2 white light and 15°C with an 8 h light, 16 h dark regime until development of gametangia. After harvesting and freezing, the material was ground under liquid nitrogen and total RNA isolated using the Ambion mirVana miRNA isolation kit (Applied Biosystems, Darmstadt, Germany). RNA isolation and subsequent sequencing pool creation steps were carried out by Vertis Biotechnologie (Freising, Germany). Poly(A)+ RNA was prepared by oligo(dT) chromatography and cDNA was synthesized using an N6 randomized primer. Afterwards, 454 adapters A (CCATCTCATCCCTGCGTGTCTCCGACTCAG) and B (CTGAGACTGCCAAGGCACACAGGGGATAGG) were ligated to the 5' and 3' ends of the cDNA. The resulting N0 cDNA was amplified using PCR (16 cycles) with a proofreading enzyme. Normalization was carried out by one cycle of denaturation and reassociation of the cDNA, resulting in N1-cDNA. Reassociated ds-cDNA was separated from the remaining ss-cDNA (normalized cDNA) by passing the mixture over a hydroxylapatite column. After hydroxylapatite chromatography, the ss-cDNA was amplified with 9 PCR cycles. Finally, the cDNA in the size range of 500-700 bp was eluted from a preparative agarose gel and subjected to GS FLX Titanium sequencing (GATC, Konstanz, Germany), resulting in 631,313 raw reads.
After low-quality and adapter clipping using LUCY and SeqClean (http://compbio.dfci.harvard.edu/tgi/software/), and polyA-tail removal with trimest, 589,283 reads with a mean length of 343 nucleotides remained. The 454 reads (Additional file 1) were mapped against the genome as described above for the P. patens Sanger ESTs and are available at http://www.cosmoss.org for download and in the genome browser track "454 reads sexual gametophores (normalized library)" (http://www.cosmoss.org/cgi/gbrowse/physcome/).
This work was supported by the German Research Foundation (DFG grant Re 837/10-2 to R.R. and S.A.R.), by the Federal Ministry of Education and Research (BMBF grant FRISYS 0313921 to R.B., R.R. and S.A.R.) and by the Excellence Initiative of the German Federal and State Governments (EXC 294 to R.B., R.R. and S.A.R.). We are grateful to S. Richardt and E. Heupel for skillful technical assistance and to M. Heinrich for analysing the FAM-labeled PCR products.
- Graveley BR: Alternative splicing: increasing diversity in the proteomic world. Trends in Genetics. 2001, 17 (2): 100-107. 10.1016/S0168-9525(00)02176-4.
- Hughes TA: Regulation of gene expression by alternative untranslated regions. Trends in Genetics. 2006, 22 (3): 119-122. 10.1016/j.tig.2006.01.001.
- Stalder L, Mühlemann O: The meaning of nonsense. Trends in Cell Biology. 2008, 18 (7): 315-321. 10.1016/j.tcb.2008.04.005.
- Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore SF, Schroth GP, Burge CB: Alternative isoform regulation in human tissue transcriptomes. Nature. 2008, 456 (7221): 470-476. 10.1038/nature07509.
- Hiller M, Huse K, Szafranski K, Jahn N, Hampe J, Schreiber S, Backofen R, Platzer M: Widespread occurrence of alternative splicing at NAGNAG acceptors contributes to proteome plasticity. Nat Genet. 2004, 36 (12): 1255-1257. 10.1038/ng1469.
- Zavolan M, Kondo S, Schonbach C, Adachi J, Hume DA, Group RG, Members GSL, Hayashizaki Y, Gaasterland T: Impact of Alternative Initiation, Splicing, and Termination on the Diversity of the mRNA Transcripts Encoded by the Mouse Transcriptome. Genome Res. 2003, 13 (6b): 1290-1300. 10.1101/gr.1017303.
- Dou Y, Fox-Walsh KL, Baldi PF, Hertel KJ: Genomic splice-site analysis reveals frequent alternative splicing close to the dominant splice site. RNA. 2006, 12 (12): 2047-2056. 10.1261/rna.151106.
- Ermakova EO, Nurtdinov RN, Gelfand MS: Overlapping alternative donor splice sites in the human genome. Journal of Bioinformatics and Computational Biology. 2007, 991-1004. 10.1142/S0219720007003089.
- Sugnet CW, Kent WJ, Jr AM, Haussler D: Transcriptome and Genome Conservation of Alternative Splicing Events in Humans and Mice. Pacific Symposium on Biocomputing. 2004, 9: 66-77.
- Sinha R, Nikolajewa S, Szafranski K, Hiller M, Jahn N, Huse K, Platzer M, Backofen R: Accurate prediction of NAGNAG alternative splicing. Nucl Acids Res. 2009, 37 (11): 3569-3579. 10.1093/nar/gkp220.
- Chern T-M, van Nimwegen E, Kai C, Kawai J, Carninci P, Hayashizaki Y, Zavolan M: A Simple Physical Model Predicts Small Exon Length Variations. PLoS Genetics. 2006, 2 (4): e45. 10.1371/journal.pgen.0020045.
- Akerman M, Mandel-Gutfreund Y: Alternative splicing regulation at tandem 3' splice sites. Nucl Acids Res. 2006, 34 (1): 23-31. 10.1093/nar/gkj408.
- Barbazuk WB, Fu Y, McGinnis KM: Genome-wide analyses of alternative splicing in plants: Opportunities and challenges. Genome Research. 2008, 18 (9): 1381-1392. 10.1101/gr.053678.106.
- Wang B-B, Brendel V: Genomewide comparative analysis of alternative splicing in plants. PNAS. 2006, 103 (18): 7175-7180. 10.1073/pnas.0602039103.
- Wang B-B, O'Toole M, Brendel V, Young N: Cross-species EST alignments reveal novel and conserved alternative splicing events in legumes. BMC Plant Biology. 2008, 8 (1): 17. 10.1186/1471-2229-8-17.
- Campbell M, Haas B, Hamilton J, Mount S, Buell CR: Comprehensive analysis of alternative splicing in rice and comparative analyses with Arabidopsis. BMC Genomics. 2006, 7 (1): 327. 10.1186/1471-2164-7-327.
- Ner-Gaon H, Leviatan N, Rubin E, Fluhr R: Comparative Cross-Species Alternative Splicing in Plants. Plant Physiol. 2007, 144 (3): 1632-1641. 10.1104/pp.107.098640.
- Filichkin SA, Priest HD, Givan SA, Shen R, Bryant DW, Fox SE, Wong W-K, Mockler TC: Genome-wide mapping of alternative splicing in Arabidopsis thaliana. Genome Research. 2009, 20: 45-58. 10.1101/gr.093302.109.
- Kim E, Magen A, Ast G: Different levels of alternative splicing among eukaryotes. Nucl Acids Res. 2007, 35 (1): 125-131. 10.1093/nar/gkl924.
- Berget SM: Exon recognition in vertebrate splicing. J Biol Chem. 1995, 270: 2411-2414.
- Lorkovic ZJ, Kirk DAW, Lambermon MHL, Filipowicz W: Pre-mRNA splicing in higher plants. Trends in Plant Science. 2000, 5 (4): 160-167. 10.1016/S1360-1385(00)01595-8.
- Lim LP, Burge CB: A computational analysis of sequence features involved in recognition of short introns. Proceedings of the National Academy of Sciences of the United States of America. 2001, 98 (20): 11193-11198. 10.1073/pnas.201407298.
- Rensing SA, Lang D, Zimmer AD, Terry A, Salamov A, Shapiro H, Nishiyama T, Perroud P-F, Lindquist EA, Kamisugi Y, et al: The Physcomitrella Genome Reveals Evolutionary Insights into the Conquest of Land by Plants. Science. 2008, 319 (5859): 64-69. 10.1126/science.1150646.
- Iida K, Shionyu M, Suso Y: Alternative Splicing at NAGNAG Acceptor Sites Shares Common Properties in Land Plants and Mammals. Mol Biol Evol. 2008, 25 (4): 709-718. 10.1093/molbev/msn015.
- Bluthgen N, Brand K, Cajavec B, Swat M, Herzel H, Beule D: Biological profiling of gene groups utilizing Gene Ontology. Genome Inform. 2005, 16 (1): 106-115.
- Schindler S, Szafranski K, Hiller M, Ali G, Palusa S, Backofen R, Platzer M, Reddy A: Alternative splicing at NAGNAG acceptors in Arabidopsis thaliana SR and SR-related protein-coding genes. BMC Genomics. 2008, 9 (1): 159. 10.1186/1471-2164-9-159.
- Hiller M, Szafranski K, Sinha R, Huse K, Nikolajewa S, Rosenstiel P, Schreiber S, Backofen R, Platzer M: Assessing the fraction of short-distance tandem splice sites under purifying selection. RNA. 2008, 14 (4): 616-629. 10.1261/rna.883908.
- Lang D, Zimmer AD, Rensing SA, Reski R: Exploring plant biodiversity: the Physcomitrella genome and beyond. Trends in Plant Science. 2008, 13 (10): 542-549. 10.1016/j.tplants.2008.07.002.
- Ling C, Huang J, Zhang H: AUC: a better measure than accuracy in comparing learning algorithms. Canadian Artificial Intelligence Conference 2003. 2003, 329-341.
- Reski R: Development, genetics and molecular biology of mosses. Botanica Acta. 1998, 111: 1-15.
- Hollins C, Zorio DAR, Macmorris M, Blumenthal T: U2AF binding selects for the high conservation of the C. elegans 3' splice site. RNA. 2005, 11 (3): 248-253. 10.1261/rna.7221605.
- Witten IH, Frank E: Data Mining: Practical machine learning tools and techniques. Second edition. Morgan Kaufmann, San Francisco, 2005.
- Gremme G, Brendel V, Sparks ME, Kurtz S: Engineering a software tool for gene structure prediction in higher organisms. Information and Software Technology. 2005, 47 (15): 965-978. 10.1016/j.infsof.2005.09.005.
- Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, Fuellen G, Gilbert JGR, Korf I, Lapp H, et al: The bioperl toolkit: Perl modules for the life sciences. Genome Research. 2002, 12 (10): 1611-1618. 10.1101/gr.361602.
- Crooks GE, Hon G, Chandonia J-M, Brenner SE: WebLogo: A Sequence Logo Generator. Genome Res. 2004, 14 (6): 1188-1190. 10.1101/gr.849004.
- Degroeve S, Saeys Y, De Baets B, Rouze P, Peer Van de Y: SpliceMachine: predicting splice sites from high-dimensional local context representations. Bioinformatics. 2005, 21 (8): 1332-1338. 10.1093/bioinformatics/bti166.
- Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M: Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005, 21 (18): 3674-3676. 10.1093/bioinformatics/bti610.
- Bluethgen N, Brand K, Cajavec B, Swat M, Herzel H, Beule D: Biological profiling of gene groups utilizing Gene Ontology. Genome Inform. 2005, 16 (1): 106-115.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic Local Alignment Search Tool. Journal of Molecular Biology. 1990, 215 (3): 403-410.
- Hohe A, Rensing SA, Mildner M, Lang D, Reski R: Day length and temperature strongly influence sexual reproduction and expression of a novel MADS-box gene in the moss Physcomitrella patens. Plant Biology. 2002, 4 (5): 595-602. 10.1055/s-2002-35440.
- Chou H-H, Holmes MH: DNA sequence quality trimming and vector removal. Bioinformatics. 2001, 17 (12): 1093-1104. 10.1093/bioinformatics/17.12.1093.
- Rice P, Longden I, Bleasby A: EMBOSS: The European Molecular Biology Open Software Suite. Trends in Genetics. 2000, 16 (6): 276-277. 10.1016/S0168-9525(00)02024-2.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.