The developmental and physiological complexity of eukaryotes cannot be explained solely by the number of protein-coding genes. For example, the Drosophila melanogaster genome contains only twice as many genes as some bacterial species, although the former is far more complex in its genome organization than the latter. Similarly, humans and nematodes have very similar numbers of protein-coding genes. Part of this paradox can be resolved through alternative pre-mRNA splicing. In addition, post-translational modifications contribute to the increased complexity and diversity of protein species.
Recent studies suggest that most of the genome is transcribed. Only a small portion of the transcripts encode proteins, whereas a large portion do not; the latter are generally termed non-protein-coding RNAs (npcRNAs). For example, transcriptome profiling in rice (Oryza sativa) indicates that there are about 8400 putative npcRNAs, which do not overlap with any predicted open reading frames (ORFs). These npcRNAs are subdivided into housekeeping npcRNAs (such as transfer and ribosomal RNAs) and regulatory npcRNAs, or riboregulators, with the latter being further divided into short regulatory npcRNAs (<300 bp in length, such as microRNA, siRNA and piwi-RNA) and long regulatory npcRNAs (>300 bp in length). With the identification of microRNAs and siRNAs in diverse organisms, increasing evidence indicates that these short npcRNAs play important roles in development and in responses to biotic and abiotic stresses, acting by cleaving target mRNAs or by interfering with the translation of target genes [5–9].
Long npcRNAs are transcribed by RNA polymerase II, polyadenylated and often spliced. Studies in mice and humans suggest that at least 13% and 26%, respectively, of the unique full-length cDNAs are poly(A) tail-containing long npcRNAs [11–13]. Emerging evidence also suggests that long npcRNAs are developmentally regulated and responsive to external stimuli, and that they play roles in development and stress responses in plants and in disease in humans. For example, some long npcRNAs are regulated under various stresses in plants and animals [9, 14–16]. In Caenorhabditis elegans, 25 npcRNAs are either over- or under-expressed under heat shock or starvation conditions, while in Arabidopsis, the abundance of 22 putative long npcRNAs is regulated by phosphate starvation, salt stress or water stress. Also in Arabidopsis, the long npcRNA COOLAIR (cold induced long antisense intragenic RNA), a set of cold-induced FLC antisense transcripts, has an early role in the epigenetic silencing of FLC and transiently silences FLC transcription. In humans, the long npcRNA HOTAIR is reported to reprogram chromatin state to promote cancer metastasis.
Currently, two computational approaches are employed to identify long npcRNAs: genome-based and transcript-based. Using genomic sequences, more than 200 candidate long npcRNAs were predicted in Escherichia coli, and at least 20 long npcRNA genes have been experimentally confirmed. In Rhizobium etli, 89 candidate npcRNAs were detected by high-resolution tiling array, and 66 were classified as novel. Using cDNA or EST sequences, a large number of long npcRNAs have been detected in Drosophila, mouse and Arabidopsis [12, 18, 24–26].
To date, identification of long npcRNAs has been limited to a few plant species, such as Arabidopsis, rice and maize. To the best of our knowledge, no systematic identification of long npcRNAs has been reported in wheat. Wheat (Triticum aestivum, AABBDD, 2n = 42) is the most widely grown crop plant, occupying 17% of all cultivated land, and provides approximately 55% of the carbohydrates consumed by humans worldwide. Biotic and abiotic stresses are important limiting factors for yield and grain quality in wheat production. For instance, powdery mildew, caused by the obligate biotrophic fungus Blumeria graminis f. sp. tritici (Bgt), is one of the most devastating diseases of wheat in China and worldwide and causes significant yield losses. High temperature, often combined with drought stress, causes yield loss and reduces grain quality. To reduce the damage caused by biotic and abiotic stresses, plants have evolved sophisticated adaptive response mechanisms that reprogram gene expression at the transcriptional, post-transcriptional and post-translational levels. Recently, transcript profiling has been successfully employed to determine the transcriptional responses to powdery mildew infection and heat stress in wheat, and the results revealed that a number of genes were significantly induced or repressed in response to these stresses [31, 32].
In our previous study, we demonstrated that the expression of microRNAs in wheat was regulated by powdery mildew infection and heat stress, which prompted us to explore whether long npcRNAs are also responsive to powdery mildew infection and/or heat stress. In this study, we performed a genome-wide in silico screening of wheat transcripts responsive to powdery mildew infection and heat stress in order to isolate a collection of long npcRNA genes. Combining microarray analysis and high-throughput SBS sequencing, we characterized a total of 125 putative stress-responsive long npcRNAs in wheat; four of them were miRNA precursors, and one was experimentally verified by northern blot. Wheat long npcRNAs displayed tissue-specific expression patterns, and their expression levels were altered in response to powdery mildew infection and/or heat stress, suggesting that at least a subset of these newly identified long npcRNAs play roles in the response to biotic and/or abiotic stresses in wheat.