Identification of nuclear genes controlling chlorophyll synthesis in barley by RNA-seq

Albinism in plants is characterized by lack of chlorophyll and results in photosynthesis impairment, abnormal plant development and premature death. These abnormalities are frequently encountered in interspecific crosses and tissue culture experiments. Analysis of albino mutant phenotypes with full or partial chlorophyll deficiency can shed light on genetic determinants and molecular mechanisms of albinism. Here we report analysis of RNA-seq transcription profiling of barley (Hordeum vulgare L.) near-isogenic lines, one of which is a carrier of mutant allele of the Alm gene for albino lemma and pericarp phenotype (line i:BwAlm). 1221 genome fragments have statistically significant changes in expression levels between lines i:BwAlm and Bowman, with 148 fragments having increased expression levels in line i:BwAlm, and 1073 genome fragments, including 42 plastid operons, having decreased levels of expression in line i:BwAlm. We detected functional dissimilarity between genes with higher and lower levels of expression in i:BwAlm line. Genes with lower level of expression in the i:BwAlm line are mostly associated with photosynthesis and chlorophyll synthesis, while genes with higher expression level are functionally associated with vesicle transport. Differentially expressed genes are shown to be involved in several metabolic pathways; the largest fraction of such genes was observed for the Calvin-Benson-Bassham cycle. Finally, de novo assembly of transcriptome contains several transcripts, not annotated in current H. vulgare genome version. Our results provide the new information about genes which could be involved in formation of albino lemma and pericarp phenotype. They demonstrate the interplay between nuclear and chloroplast genomes in this physiological process.


Background
Chlorophylls form the most abundant and the most important class of plant pigments, as they play central role in photosynthesis process which is a main source of energy for plants as well as a main source of oxygen in atmosphere of the Earth. In eukaryotic plant and algae cells chlorophyll is located in chloroplasts. Chloroplast is an organoid which, like the mitochondria, contains its own genome sometimes called 'plastome' [1]. However, chloroplasts are strongly dependent on the nuclear genome of the plant cell. In angiosperms, plastome contains only approximately 50 protein-coding genes involved in chlorophyll synthesis, photosynthesis and some metabolic processes, as well as tRNA and rRNA genes [2], while the amount nuclear-encoded proteins which are predicted to be targeted to chloroplasts is as much as 2500-3500 (as accessed in Arabidopsis thaliana; [3,4]). In A. thaliana, from 1400 to 1500 cyanobacterial proteins have been identified, of which about a half are targeted to chloroplasts [5]. It was suggested that about 4500 A. thaliana genes descend from cyanobacterial genomes, however, only about 1300 of them are predicted to have their products targeted to chloroplasts [6].
Transcription of chloroplast genome is provided by two types of RNA polymerasea nuclear-encoded plastid RNA-polymerase (NEP) and plastid encoded plastid RNA-polymerase (PEP). PEP is an RNA polymerase of bacterial design, and it uses sigma-factors for transcription initiation, which are nuclear-encoded [7]. Chloroplast genes can be divided into three groupsclass I genes have promoters that are only recognized by PEP, class II have promoters recognized by both enzymes, and class III are only transcribed by NEP [8]. Most genes in plastome have promoters that can be recognized by both types of polymerase [9].
Chloroplast genes control is mostly post-transcriptional. It is regulated by nuclear-encoded proteins, and in some cases a single nuclear-encoded protein is required specifically for controlling of expression of a single chloroplastencoded gene [10].
All of the above suggests that a sophisticated coordination of plastome and nuclear genome is required for proper development and functioning of chloroplasts. Such coordination is formed by anterograde signalingfrom nucleus to chloroplastand retrograde signalingfrom chloroplast to nucleus [11]. Intermediates of chlorophyll synthesis pathway, namely protoporphyrin IX, serve as signaling molecules and are transported from chloroplast to nucleus [12]. On the other hand, a number of authors disagree with the role of protoporphyrin IX as a signaling molecule [13,14]. This leads to a conclusion that mechanisms of nucleus-tochloroplast and chloroplast-to-nucleus signaling are poorly understood.
Although a number of mutant phenotypes with full or partial chlorophyll deficiency have been described, genes responsible for such phenotype formation are largely unknown. Barley (H. vulgare L.) Alm gene (chromosome 3H; [15]) determining albino lemma and pericarp has not been sequenced yet, and the target or regulatory genes for Alm remain unknown. Near-isogenic lines (NILs) are the proper models for isolation of genes underlying phenotypic variation by transcriptomics, proteomics or metabolomics approaches (reviewed in [16]). Availability of barley genome reference sequence [17] facilitates application of WGS-based approaches (such as RNA-seq) for investigating genetic networks responsible for phenotypic variation. RNA-seq approach has been already employed successfully to uncover the history of wild barley domestication and distribution [18], for studying the gene family of barley aquaporines [19] or creating SNP maps of barley [20].
In the current study, RNA-seq was exploited for identification of genes involved in tissue-specific albinism using near-isogenic lines differing by the Alm gene allelic state.

Results
Phenotypic characterization of the near-isogenic lines and pleotropic effect of Alm The i:BwAlm near-isogenic line is characterized by albino lemma (base and the central part), however the upper part of the lemma and awns are green. Bowman lemma and awn have a green color (Fig. 1). Chlorophyll fluorescence is observed in the central part of the lemma in Bowman, while it is absent in the central part of i:BwAlm lemma (Fig. 1d). Chlorophyll fluorescence is also observed in lemma-awn transition point of Bowman (Fig. 1c). In i:BwAlm,the analysis of the fluorescence pattern in transition point between the white and green zones (Fig. 1f ) detected an alternation of cells containing a population of fluorescing chloroplasts and lacking them (Fig. 1g). While moving in the basal direction the number of detected fluorescent cells decreases to extinction (Fig. 1g). The appearance and fluorescent pattern in awns (and other similarly colored parts of plant such as mid-stem and leaf blade) did not reveal differences between Bowman and i:BwAlm.
Pleutropic effect of Alm was observed on auricles (and adjacent leaf blade regions), stem node (and adjacent stem regions; Fig. 2) and sheath of the first leaf of the i:BwAlm line. In Bowman, all these parts of plant are green. Along the surface of stem both lines have the green strands with bright chlorophyll fluorescence ( Fig. 2), however in the i:BwAlm node adjacent region these strands flat out, and a fragmentation of fluorescent strands into the loosely spaced islands is detected (Fig. 2e, f ). The shape of patterns of cell population containing fluorescing chloroplasts in this zone differs between Bowman and i:BwAlm. In the Bowman plants cells arranged in files (Fig. 2c). In i:BwAlm cells with fluorescence form irregular patterns, no cell files are observed (Fig. 2f ).
Thus, the Alm gene results in vanishing of a chloroplasts population with chlorophyll in lemma, pericarp, auricles with adjacent leaf blade region, stem node with adjacent stem region and sheath of the first leaf. The alternation of the patterns of cells containing a population of fluorescing chloroplasts and cells without fluorescence signal is evident in transition region between the white and green zones (Figs. 1g and 2f ). Difference between i:BwAlm and Bowman in the cell arrangement (Fig. 2c, f ) demonstrates that Alm gene, in addition to effects on chloroplasts, results in modification of patterning of cells with active chloroplasts in pericarp and auricles.

qPCR gene differential expression validation
The qRT-PCR-based evaluation of transcription of two genes in the spikelets of the NILs is summarized in Fig. 3. The first is gene MLOC_17002 that encodes Type III Light-harvesting complex II Chlorophyll a/b-binding precursor protein. Its expression level estimated from RNAseq data decreased in i:BwAlm plants more than 45-fold (corrected with Benjamini-Hochberg procedure p =0.002). The second is a XLOC_012413 transcript that, according to Gene Ontology analysis, is functionally connected to vesicle transport. Its expression level estimated from RNA-seq data increased in i:BwAlm plants more than 1000-fold (corrected p = 0.002). According to RT-PCR data Type III LHCII CAB precursor protein gene transcript was abundant in Bowman and significantly decreased (18.5 fold) in albino plants (Fig. 3a). XLOC_012413 transcript was poorly detectable in Bowman, however its expression level increased significantly (121 ford) in spikelets of i:BwAlm plants (Fig. 3b). The difference between expression levels estimated by RT-PCR for these two genes is p = 0.033 for MLOC_17002 gene and p = 0.008 for XLOC_012413 gene according to t-test.
Libraries quality assessment 28481151 short reads in six pooled libraries were produced as raw sequenced data. Each library was examined for quality control, adapter sequences were trimmed and reads with insufficient length or too low mean quality were removed, which resulted in discarding of 7.4 % to 13.4 % of reads in five of six libraries, and removal of 40 % sequences in library Bowman (2) ( Table 1).
Next, we performed analysis of the mapping reads from the six libraries to estimate the optimal value of the mismatch parameter. Mapping reads to the reference genome of H. vulgare was performed with TopHat2 tool ten times, with increasing number of allowed mismatches from 2 to 20 as described in the Methods section. The resulting FPKM counts for each transcribed genome fragment are listed in Additional file 1. We observed an increase in percentage of fragments mapped to genome in each library with increasing mismatch number (Fig. 4). In five of six libraries proportion of mapped reads ranged from 23 % to 38 % (2 mismatches allowed) to 64 % to 79 % (20 mismatches allowed); for the Bowman (1) library percentage changes from 21 % to 39 % for alignment with 2 and 20 mismatches, respectively (Fig. 4).
Additionally we monitored changes of four characteristics of genome coverage quality for six libraries (see the Methods section) with respect to mismatch number. The ratio of these values to their maxima in ten TopHat2 alignments is shown in Fig. 5. This plot (Fig. 5) a b c d g e f Fig. 1 The spike phenotype of Bowman (a-d) and i:BwAlm(e-g). a, b, e, f appearance at visible light. c, d, g chlorophyll fluorescence patterns of selected areas demonstrated that number of mapped reads increases with increase in number of allowed mismatches. However, at mismatch values greater than 12 the number of matched reads saturates its maximal value. Similarly, N uncov drops with increase in number of allowed mismatches, and N orphans is reduced in alignments with six or more allowed mismatches compared to alignments with two and four allowed mismatches. Combination of these three parameters is believed to be optimal for alignment with 18 allowed mismatches. Aligning reads to the reference genome with STAR program resulted in successfully mapping of 72 to 83 % of reads for five libraries out of six, with Bowman (1) library having only 46 % of mapped reads ( Table 2).
After examination of considered parameters, the mapping with 18 allowed mismatches was taken into further analysis. A total of 14.9 million of reads were mapped to the reference genome from all six libraries. Cufflinks pipeline identified 33537 genome fragments that are covered by short reads from at least one library and/or are annotated in the current genome assembly.
Next, we compared the coverage of the H. vulgare transcripts by reads from each of the libraries. We generated tables of counts indicating the transcript coverage for 24158 genomic regions from H. vulgare genome annotation by cuffnorm tool, 18820 of which passed out filtering criteria. Results of clustering of six libraries with respect to the similarity of coverage of these regions are shown in Fig. 6. The tree diagram demonstrates that estimates of coverage for H. vulgare genes form two large clusters corresponding to two contrast genotypes irrespectively to the alignment methods and sample replicate. Detailed analysis of the diagram demonstrates that the read counts estimates performed by TopHat2and STAR tools for the same replicate are closer than those for different replicates are.

Analysis of transcript abundance
One thousand seven hundred and eighty-six genome fragment identified by cufflinks pipeline (5 % of total number of transcribed genome fragments) showed changes in expression levels with |log 2 FC| > 2. List of transcribed fragments and genes with expression level changes greater than four-fold is presented in Additional file 2. One thousand two hundred and eighty-four of those fragments have a lower level of expression in i:BwAlm, while 502 fragments have a higher expression level in i:BwAlmin comparison with a c b f e d Genes with known functions and higher expression levels in i:BwAlm are listed in Table 3 and described below in details.
Genes MLOC_33278 and MLOC_74627 encode a putative homolog of DNA helicase and cystein proteinase, respectively.
Gene MLOC_65942 encodes iron-regulated solute carrier (SLC) family protein. The majority of the SLC transporters are secondary active transporters, such as exchangers, symporters, and antiporters, for which transport is driven by various energy coupling mechanisms [21].
Gene MLOC_66447 encodes pectinesterase protein. This enzyme catalyzes demethylation of pectin. Pectin is a component of cell wall of land plants, and in plant cell pectinesterase is an ubiquitous enzyme associated with cell wall [22]. It is known to participate in a number of processes, including pollen tube growth and pollen tetrad separation [23]. Gene MLOC_22576 encodes a protein of pectin mythylesterase inhibitor superfamily.
The most abundantly transcribed genes in Bowman are chloroplast genes encoding rubisco small chain and ribosomal S12 protein and nuclear genes that encodes rubisco small chain (Table 4). These genes have a lower level of expression in i:BwAlm. Other genes with high level of expression in Bowman include sucrose synthase, tubulin α-3 chain, S-adenosylmethinonine decarobxulase proenzyme, phenylalanine ammonia-lyase, ATP synthase subunit β, translationally-controlled tumor protein homologue and two uncharacterized proteins. These genes also have a very high levels of expression in i:BwAlm line (Table 4).

Differential expression identification
We used two tools to identify differentially expressed genes between i:BwAlm and Bowman lines. Using Cufflinks pipeline we detected 902 genome fragments with differential expression (see methods), of which 119 had a higher level of expression in i:BwAlm plants, and 783 had a higher level of expression in Bowman line plants. Implementing EdgeR package for R, we detected 79 and 802 fragments with higher and lower expression levels in i:BwAlm plants, respectively (Fig. 8). 512 fragments  with increased expression and 50 fragments with decreased expression in line i:BwAlm were detected with both tools. Thus, 1221 genome fragments are revealed to have differential expression (|log 2 FC| > 1, p < 0.05 after correction with Benjamini-Hochberg procedure), which comprises approximately 3.6 % of all transcribed fragments. Among those there are 1073 genome fragments with lower level of expression in i:BwAlm. 694 of those fragments are genes annotated in barley genome current assembly, 115 have a known protein product. 148 genome fragments have differential expression with higher level of expression in i:BwAlm. 67 of those fragments are genes annotated in H. vulgare genome assembly. 4 of those genes have known protein productsgene MLOC_61063 that encodes 3-ketoacyl-CoA synthase, gene MLOC_63089 that encodes asparagine synthetase, gene MLOC_51356 encoding leucine-rich receptor-like protein kinase family protein, and gene MLOC_3822 encoding non-specific lipid transfer protein. Figure 9 shows discrepancy between a set of genes with higher and lower levels of expression in line i:BwAlm.
Furthermore, cufflinks pipeline identified 571 genome fragments as having differential isoform expression (see Methods) with p-value corrected through Benjamini-Hochberg procedure less than 0.05. Of these fragments, 282 are genes annotated in current assembly of barley genome. 83 of these genes have known functions. List of genes with differentially expressed isoforms includes gene MLOC_50938 encoding tetratricopeptide-like superfamily protein, gene MLOC_60122 encoding thylakoid rhodanese-like protein, gene MLOC_3822 encoding non-specific lipid transfer protein and genes MLOC_945 and MLOC_20487 that encode chlorophyll a-b binding proteins. This list also includes 16 fragments localized   Table 2.

Chloroplast gene expression analysis
Dissimilarity between coverage of chloroplast genome fragments was observed in results of cufflinks library reads mappings. According to the results obtained, 34 genome fragments have zero level of expression in both lines. Among those fragments are 23 genes of tRNA, genes that encode two ribosomal proteins, photosystem II protein Z, two RNA-polymerase subunits, four cytochrome proteins, NADH-plastoquinon oxidoreductase subunit 6 and gene ycf15 that encodes hypothetical chloroplast protein. 15 genome fragments that are expressed in Bowman line have zero level of expression in i:BwAlm line. Among these fragments are genes that encode three photosystem I proteins, two photosystem II proteins, ATP synthase CF0 subunit IV, four NADHplastoquinon oxidoreductase subunits, chloroplast envelope membrane protein, photosystem I assembly protein Ycf4, gene of 5S ribosomal RNA and three genes of tRNA. Additional file 3 lists expressed chloroplast genome fragments, respective FPKM values for i:BwAlm libraries and Bowman libraries and log 2 values of expression fold change.
All chloroplast genome fragments that have nonzero coverage in i:BwAlm have higher coverage levels in Bowman line. Figure 10 illustrates a number of operons with one of four levels of expression change: 'Zero transcription' for operons with no transcription observed in both studied lines, 'log(FC) < 2' for operons with less than four-folds change of transcription level, 'log(FC) > 2' for operons with change in transcription level from fourfolds to 368-fold, and 'log(FC) > 10' for operons with zero transcription in line i:BwAlm and sufficiently high level of transcription in line Bowman.
A plastid genome fragment that contains 17 genes has the highest among all plastid fragments level of coverage in i:BwAlm line. Of those 17 genes, 13 are genes of ribosomal proteins. Four other genes encode photosystem II protein N, translational initiation factor 1, RNApolymerase alpha-subunit and NADH-plastoquinon oxidoreductase subunit 2. List of other fragments that have high coverage in i:BwAlm line includes four ribosomal RNA coding genes, three photosystem II protein genes, rubisco large chain gene, ATP synthase CF0 subunits I and III genes, ATP synthase CF1 alpha subunit gene and ribosomal protein S12 gene.

Gene ontology analysis of differentially expressed genes
We analyzed the distribution of the putative protein products GO-terms annotation in three categories: biological function, molecular functions and cellular compartment localization. The most significantly presented gene ontology terms related to molecular function, biological processes and cellular compartment localizations attributed to putative protein products of genes with higher level of expression in Bowman are shown in Table 5.
For genes with higher expression level in i:BwAlm only GO terms associated with cell localization were identified. Nineteen out of 65 genes were associated with following terms: 'vesicle' , 'membrane-bound vesicle' , 'cytoplasmic vesicle' , 'cytoplasmic membrane-bound vesicle' at corrected p = 0.00128 for each of the terms.
Additional files 4 and 5 show GO terms associated with biological processes and cellular components for genes with lower level of expression in i:BwAlm line, respectively. Additional file 6 illustrates GO terms associated with cellular components for genes with higher expression level in line i:BwAlm.

Pathway analysis for differentially expressed genes
Six hundred and ninety-four genes with changes of expression level were analyzed in respect to their involvement in metabolic pathways. 'Omics data mapping' tool of PlantCyc database recognized 132 of these genes. Seventeen metabolic pathways included at least two genes with significant expression level change. Five pathways with highest numbers of genes with different expression levels are listed in Table 6. Other pathways are: The 'Calvin-Benson-Bassham circle' pathway that includes a highest number of differentially expressed genes is shown in the Fig. 11. It is related to 'sucrose biosynthesis I (from photosynthesis)' pathway that has four genes with different levels of expression. For gene MLOC_65956, which encodes an uncharacterized protein, differential expression was not confirmed. The other three genes included in a pathway are MLOC_62317, MLOC_19670 and MLOC_76055, which are also included in Calvin-Benson-Bassham cycle, and have a confirmed differential expression.
The step of Calvin-Benson-Bassham cycle that has the highest number of differentially expressed genes included in it is a rate-limiting [26] CO 2 fixation, which occurs in form of D-ribulose-1,5-bisphosphate carboxylation. This step involves ribulose-bisphosphate carboxylase enzyme. It consists of large and small subunits, each of which is present in eight copies in a functioning polypeptide. The large subunit contains active site of the enzyme and is chloroplast-encoded. In plants, a family of nuclear genes RBCS encodes different forms of small chain subunit that may be involved in regulation of enzyme catalytic activity [27]. It was also shown that translation level of large subunit is suppressed in absence of small subunit proteins [28].
In the current study, expression levels of gene MLOC_61558 that encodes rubisco large chain and is localized on morex_contig_456277 fragment of current genome assembly, as well as genes MLOC_44795, MLOC_21811 and MLOC_64679 that encode three rubisco small chain subunits, are shown to be lower in i:BwAlm in comparison with Bowman. Differential expression was confirmed by EdgeR tool for genes MLOC_61558 and MLOC_21811 and by both EdgeR and cuffdiff tools for genes MLOC_64679 and MLOC_44795.
The following step in the pathway includes gene MLOC_76055, which encodes phosphoglycerate kinase (PGK), an enzyme that catalyzes phosphorilation of 3phospho-D-glycerate. It exists in all living organisms and  Values of expression fold change in form of base 2 logarithm and corrected with Benjamini-Hochberg procedure are provided for each gene has a highly conserved structure. The enzyme functions as a monomer. In plants, PGK is reported to have two isozymes, one cytosolic and one chloroplast, which are encoded by two different nuclear genes and are expressed independently from each other [29]. The next step of Calvin-Benson-Bassham cycle is reduction of the carboxyl of 1,3-bisphosphoglycerate to an aldehyde. It involves enzyme glyceraldehyde-3phosphate dehydrogenase (GAPDH). In plants, GAPDH exist in cytosole as a heterotetramer and in chloroplasts in a different form [30]. Subunits of this enzyme are encoded by nuclear genes [31]. PlantCyc database indicates that two genes are included into this step, one of which is MLOC_44511 gene that shows differential expression between studied lines of barley. Protein product of this gene is referred to as 'predicted protein/uncharacterized protein' in barley genome assembly version 1v28.
Differentially expressed genes MLOC_62317 and MLOC_19670 are included in regeneration of ribulose. This stage includes series of reactions and a variety of enzymes, among which is fructose-1,6-bisphosphatase enzyme, which is also involved in gluconeogenesis and sucrose biosynthesis.
Finally, gene MLOC_58999 that encodes phosphoribulokinase protein is involved in phosphorilation of Dribulose-5-phosphate. Resulting product of the stage is D-ribulose-1,5-bisphosphate that is later used as a substrate by rubisco. However, differential expression for gene MLOC_58999 was not confirmed by either cuffdiff or EdgeR tools.
Furthermore, analysis of several plant pigment biosynthesis pathways demonstrated that they contain a number of differentially expressed genes. Pathways 'Superpathway of anthocyanin biosynthesis (from delphinidin 3-O-glucoside)' and 'Superpathway of anthocyanin biosynthesis (from pelargonidin 3-O-glucoside)' each have three barley genes annotated in PlantCyc database for H. vulgare. None of the genes demonstrated differential expression at the specified significance level. Pathway 'Chlorophyll cycle' contain five genes, three of them have significant changes in expression levels (Fig. 12). Pathway 'zeaxanthin, antheraxanthin and violaxanthin    interconversion' has eight annotated genes; three of them have significant changes in expression level and one has differential isoform expression (Fig. 13). All differentially expressed genes included in these two pathways have a lower level of expression in i:BwAlm line. Table 7 shows the lists of annotated barley genes included in the abovementioned pathways.

De novo transcriptome assembly
De novo assembly of transcriptome was performed to verify transcriptome data for presence of novel transcripts. We merged data from different samples of the same genotype before assembly analysis. Statistics of resulted sequence assemblies provided in Table 8. Alignment of resulted transcripts to H. vulgare reference genome by megablast tool of BLAST package. A total of 565 and 605 transcripts in assemblies of iBwAlm and Bowman, respectively, showed no significant homology to any sequences of reference genome. Transcripts with no significant homology to barley genome were examined in details. They were mapped to Escherichia coli genome in order to remove possible contaminants. 29 transcripts from i:BwAlm assembly and 28 transcripts from Bowman assembly showed homology with E. coli genome sequence. These transcripts were removed from further analysis. After aligning assembled transcripts without H. vulgare similarity to CDS sequences of Oryza sativa and Brachypodium distachion several homologous sequences were found. Alignment to Arabidopsis thaliana CDS sequences revealed no homology in any of the transcripts.
Thus, 564 transcripts from i:BwAlm assembly and 604 transcripts from Bowman assembly showed no significant homology to H. vulgare genome (see additional files 7 and 8). Among those, 320 transcripts are common to both assemblies (Table 9).

Discussion
Differential expression analysis 1221 genome fragments were distinguished as having differential expression, while 1786 genome fragments have changes in expression level greater than fourfolds, based on their RPKM values. In studies of plants comparative transcriptomics numbers of differentially expressed genes are usually presented in thousands: 3096 [32], 4811 [33], 5944 [34]. Furthermore, it is stated that no less than 20 % of plant genes express differentially in darkness and light [35,36]. Thus, we suppose that a number of genes that we identified as differentially expressed should not be considered exaggerated. At the same time, changes in the transcriptional level do not necessarily correlate with changes in levels of translation [37], and not all the observed changes in transcription level of the genes may be functionally associated with phenotypic changes between two barley lines.
Five genes encoding different pentatricopeptide repeat-containing proteins (PPR) show changes in expression level, with four of them having lower level of expression in line i:BwAlm and one having a higher level of expression in that line. PPR proteins are RNA-binding proteins that are especially abundant in land plants [38]. Most PPR proteins in plant cells are targeted either to mitochondria or chloroplast, where they bind RNA and affect gene expression [39]. More than 400 PPR genes are identified in A. thaliana genome [40], however, only 10 such genes are annotated in barley genome, which suggests that actual number of PPR genes that express differentially between Bowman and i:BwAlm lines is much larger.
Gene ontology analysis revealed that genes with lower level of expression in line i:BwAlm are mostly functionally associated with porphyrin pigment biosynthesis. Since chlorophylls a and b are chemical derivatives of porphyrin, it islikelythat these genes are indirectly associated with chlorophyll synthesis. Genes with higher level of expression in line i:BwAlm are functionally associated with cytosol vesicles. Despite TIC and TOC membrane complexes being a primary path of protein transport into plastids [41],vesicular transport is known to play role in this process [42]. Pathway analysis showed that differentially expressed genes participate in several metabolic pathways, including photosynthesis light reaction and Calvin-Benson-Bassham cycle that occurs in chloroplasts. Interconversion of several xanthin derivatives is another biological process that takes place on chloroplast membranes [43]. Different forms of xanthins are included in light-harvesting complexes of chloroplast and participate in nonphotosynthetic quenching process that protects the cell from oxydative stress [44]. Reduction in expression level of genes included in this pathway is in consistent with the general expression level decrease of genes functionally connected with chloroplasts. Likewise, observed lower expression levels of genes included in 'Chlorophyll cycle' pathway is in agreement with lowered amounts of chlorophyll in studied organs.
Variegated mutants that have patches of albino cells in leaves and other organs have been studied in regard of Values of logarithm of fold change and p-values (corrected with Benjamini-Hochberg procedure) are provided for each gene plastid composition of their cells that lack chlorophyll. It was shown that in many variegated mutants chloroplasts lack thylakoids but accumulate vesicles [45,46].

Plastid gene expression patterns in Bowman and i:BwAlm genotypes
Changes in transcript abundance of most plastid genome fragments are observed between two barley lines (suppl. file. 2). We did not observeany plastid genome fragment which abundance in line i:BwAlm is increased compared to Bowman. The only genes that did not change their abundance in i:BwAlm/Bowman are those having zero coverage in both barley lines. Changes in coverage observed for the majority of transcribed plastid genome fragments (20 of 47 transcribed genome fragments) is four-folds to sixteen-folds (2 < log 2 FC < 4). Majority (45 of 55) of genes in operons with changes in expression levels are protein-coding genes, mostly encoding ribosomal proteins (6 genes) and photosystems I and II proteins (16 genes) (Fig. 14). According to the results of microscopic analysis, many cells in studied organs lack chlorophyll. However, observed expression of several plastid genome fragments suggests that these cells still contain plastids, whether these are abnormal chloroplasts that do not contain chlorophyll or some other types of colorless plastids, presumably proplastids or etioplasts. At the same time, these plastids are either less abundant and/or less transcriptionally active than chloroplasts in cells of Bowman plants. This suggestion can explain overall decrease of plastid genome fragments coverage in line i:BwAlm. While most plastid genome fragments have a decrease in transcript abundance ranging from four-fold to sixteen-fold that can be accounted for by reduction in plastid number and/or general transcriptional activity, several genome fragments have a more abrupt decrease in transcription. Three genome fragments have decrease in transcript abundance of approximately 32 times in i:BwAlm compared to Bowman. One of these fragments contains genes psbC and psbD encoding subunits of Fig. 11 Participation of differentially expressed genes in Calvin-Benson-Bassham cycle pathway. The scheme of pathway is taken from PMN database. Genes with higher level of expression in Bowman line are marked in red Fig. 12 Involvement of differentially expressed genes in chlorophyll cycle pathway. The diagram of pathway was taken from BarleyCyc online database. Genes with differential expression are shown in red on the respective stages of their protein products involvement photosystem II, the other fragment contains a tRNA gene, and the third contains gene psaC encoding a photosystem I subunit and genes ndhE and ndhD that encode NADH-plastoquinoneoxydoreductase subunits. Gene rps14 that encodes one of ribosome proteins has an 87-fold transcript reduction. Genes that encode 4.5S, 16S and 23S ribosomal RNAs have decrease in transcript abundance from 155-fold to 368-fold.
Finally, 15 genome fragments that are expressed in Bowman have zero coverage in i:BwAlm. This list contains several genes encoding photosystems I and II subunits, NADH-plastoquinoneoxydoreductase subunits, tRNA genes, 5S rRNA gene and a gene encoding ATP synthase CF0 subunit IV.
It was shown that most plastid genes in H. vulgare belong to type II, i.e., have promoters for both PEP and NEP, with only rpoB gene being transcribed solely by NEP [47]. With this in mind, it looks surprising that we did not observe expression of rpoB and rpoC2genes in both lines. 23 genes encoding plastid tRNAs are not expressed in eitherlines as well. At the same time, plastidal rRNA-encoding genes have high levels of transcript abundance in both lines, except for 5S ribosomal RNA gene that shows no expression in i:BwAlm line while being transcriptionally active in Bowman line.
Etioplast is a plastid that lacks chlorophyll and is colorless. Etioplasts form in leaves deprived of light. A Fig. 13 Involvement of differentially expressed genes in Xanthin cycle pathway. The diagram of pathway was taken from BarleyCyc online database. Genes with differential expression are shown in red on the respective stages of their protein products involvement  comparison of chloroplast and etioplast proteomes revealed that several proteins present in etioplasts do not occur in chloroplasts [48]. Namely, protochlorophyllide reductase is believed to be an indispensable component of etioplast proteome. Gene porA encoding the enzyme is transcriptionally active both in i:BwAlm and Bowman. However, its expression is mainly controlled on translation stage [49]. NAD(P)H dehydrogenase subunits I and J are present in etioplasts as well [48]. At the same time, in i:BwAlm no transcripts of genes ndhI and ndhJ that encode two respective subunits were found, whereas a low level of transcription was observed for both genes in Bowman.
It should be noted, however, that changing the level of transcription for some genes maybe not the primary cause of albinism [50]. Albinism in plants may be due to various reasons, also including spontaneous chromosomal and ptDNA rearrangements, predominantly single or multiple deletions of DNA segments. Most deletions occur in the large single copy region, which is the least stable region in the entire plastome genome. This region primarily contains genes responsible for photosynthesis, genes encoding the proteins of photosystem I and II. Thus, further investigation is needed to clarify possible molecular mechanisms of in i:BwAlm mutants in details.

De novo transcriptome assembly
Three de novo transcriptome assemblies of three pooling combinations of six libraries were performed -(1) all six libraries together, (2) three Bowman libraries together and (3) three i:BwAlm libraries together. As a result, assembly of all six pooled libraries was deemed ineffective, since total length of transcripts and N50 values in that assembly were inferior in comparison to other two assemblies. Therefore, assembly of all six pooled libraries was removed from further analysis.
More than 99 % of assembled transcripts were successfully mapped to current H. vulgare genome assembly. Furthermore, N50 for both assemblies is slightly higher than reported in other works for H. vulgare transcriptome assembly [51,52]. This suggests that quality of assembly is satisfying. Relatively small number of transcripts that show no homology to barley genome and low percentage of short reads that did not map to genome during alignment point out that current version of H. vulgare genome, despite still being in development, is profound enough.
For sequences from de novo assembly that show no homology to genomic sequences of other plant species and have no replicates in the other assembly it can be assumed that they might be assembly errors. On the other hand, it is reported [53,54] that other plant transcriptomes have a fraction of unique sequences that do not align to other plant genomes. A more thorough investigation of assembled transcripts is required to distinguish assembly errors from unique transcripts. The fact that no novel transcripts aligned to A. thaliana CDS suggests that novel transcripts contain sequences specific at least to Poaceae, and possibly even narrower group.

Conclusions
Although peculiarities of genetic mechanisms controlling formation of albino lemma and pericarp phenotype remain unclear and require further investigation, several  Despite being state-of-the-art method, RNA-seq and transcriptome analysis should not be implemented without aid from other techniques of investigation, such as microscopy and proteomic and metabolomic analysis. To uncover the mechanisms of partial albino phenotype formation in this particular case further examination is required, namely microscopic analysis in order to clarify plastid number, structure and location in albino cells. Since expression of most plastid genes is regulated on post-transcriptional stages, proteomic analysis can give insight into actual protein content of the albino cells and shed light into coordination between plastid genes transcription and actual levels of their protein products.
Finally, it is suspected that, since transition from proplastid to chloroplast occurs during organogenesis, transcriptomic analysis of a formed organ is unlikely to reveal genetic background of phenotype formation. To detect genetic mechanisms behind albino lemma and pericarp phenotype formation, earlier stages of organ development, at which chloroplast genesis in plants of Bowman line occurs, should be examined and compared.

Methods
Plant materials, DNA and RNA extraction, genotyping H. vulgare cultivar Bowman and near-isogenic line i:BwAlm (NGB20419) developed at Bowman background but carrying the Alm gene were grown in a hydroponic greenhouse at the shared center "Laboratory of Artificial Cultivation of Plants" of Institute of Cytology and Genetics SB RAS under a 14 h photoperiod. Plants were genotyped and used for RNA-seq analysis. DNA was extracted from leaf material using a procedure described in [55]. Primers amplifying the barley 3H microsatellite loci Xgbms0022, Xgbms0046, Xgbms0048, Xgbms0050, Xgbms0085, Xgbms0102, Xgbms0149, Xgbms0212, Xeb-mac541, Xbmag225, Xbmag877 [56,57] were used in PCRs conducted according to [58]. The resulting amplicons were visualized after separation through a 5 % agarose (ACTGene, Inc., Piscataway, NJ, USA) gel. The scheme of donor's fragment in chromosome 3H of NGB20419 NIL is presented Additional file 9.

Phenotypic characterization
H. vulgare cultivar Bowman and near-isogenic line (NGB20419) were compared visually, using transmitted light and chlorophyll fluorescence microscopy. Confocal images of chlorophyll fluorescence pattern in different plant organs were acquired with a laser scanning system LSM 780 NLO (ZEISS, Germany). Samples of freshly isolated plant fragments were placed on a glass slide and covered coverslip. Fluorescence emission spectra were acquired using the 405 nm ray line of a diode laser for excitation and the emitted fluorescence was detected from 550 to 700 nm. To represent the pattern of chlorophyll fluorescence from volumetric organs were performed maximum-intensity-projection processing using ZEN software (ZEISS, Germany).

DNA and RNA extraction
Total RNA was extracted from i:BwAlm developing spikelets with albino lemma and pericarp and from those of Bowman control using a Plant RNA MiniPrep TM kit (Zymo Research Corporation, Irvine, CA, USA). RNA extracted from several plants was pooled together to exclude possible errors, introduced by deviations of biological material. Three biological replicates were prepared for each genotype.

RNA preparation and sequencing
RNA was pooled into six libraries (three replicates for each line). Pooled libraries of short read were examined for quality of reads. Libraries were incubated with poly-T-tailed beads for poly-A enrichment. ERCC spike-in mix was added to each of libraries to create a form of built-in control. RNA in libraries was fragmented by incubation with nuclease enzymes. Sequencing was carried out with IonTorrent platform.

Read preprocessing and mapping
We used FastQC (http://www.bioinformatics.bbsrc.ac.uk/ projects/fastqc) to estimate sequencing quality, cutadapt [60] to remove adapter sequences and prinseq [61] to filter sequences by quality (minimal mean Phred quality 20) and length (minimal fragment length 50). Resulting filtered libraries were mapped to H. vulgare reference genome assembly 082214v1.28 from Ensembl Plants database.
We used TopHat2 [62] and STAR [63] tools for read mapping. Preliminary analysis was performed for these two mappers to obtain optimal results.
We run TopHat2 ten times increasing values of the 'allowed mismatches' , 'read edit distance' and 'read gap length' parameters by 2 from 2to 20. Resulting alignments of TopHat2 were processed using Cufflinks [64] pipeline to obtain mRNA fragment counts and annotation. The following quality measures we estimated to choose the alignment parameters delivering maximal genome coverage and minimal mapping errors: (1)number of genome segments covered by reads continuously, N cov ; (2)number of genome segments covered by single read only in all six libraries, N orphans ; (3)number of annotated genome regions without library fragments aligned to them, N uncov ; (4)proportion of library fragments aligned to the reference genome, F mapped .
STAR was used to align the libraries four times with the parameter 'outFiletMismatchMax' set at 4, 8, 12 and 18. A proportion of aligned library fragments was examined in order to investigate the influence of allowed number of mismatches. Increase in number of allowed mismatches did not result in significant increase in number of mapped reads. STAR alignments with 18 allowed mismatches were taken into further analysis.
To estimate the consistency of mRNA expression data from different libraries we normalized read counts obtained by Tophat method using cuffnorm. Mapping files of STAR were processed with 'coverageBed' utility of Bedtools [65] program to produce table of counts for each mapped fragment. Gff file based on genome marking produced by Cufflinks pipeline was used as a reference genome annotation in order to assess table of counts from STAR mappings. These estimates of the mRNA fragments expression were used as input to Gen-eCluster3.0 [66] software to cluster six libraries by UPGMA method using average clustering method. Reconstructed tree describing similarity of mRNA expression estimates in all six libraries was visualized using TreeView program [67].

Gene expression analysis
Identification of differentially expressed genes was performed on TopHat2 mapping results by two methods: Cuffldiff and EdgeR [68]. To run Cufflinks pipeline we used default parameters. Genes were distinguished as differentially expressing if change of expression level was greater than two-fold (|log 2 FC| > 1) and p-value corrected through Benjamini-Hochberg procedure was less than 0.05 (q < 0.05). For EdgeR, differential expression detection was performed using Likelihood-ratio test. Genes were distinguished as diffrerentially expressed if the change of expression level was greater than two-fold and false discovery rate was less than 0.05 (|log 2 FC| > 1 and FDR < 0.05). Lists of differentially expressed genes with higher and lower levels of expression in i:BwAlm line were then analyzed separately. Resulting lists of differentially expressed genes obtained by cufflinks pipeline and EdgeR tool were compared to each other. Further analysis was performed on the list of genes with significant differential expression confirmed with either of tools, while keeping tracks for each gene in respect to which tool its differential expression was confirmed with.
Isoform differential expression was computed with Cufflinks pipeline. Cufflinks forms lists of potential isoforms for each transcribed genome fragment based on data on exon-intron structure annotated in the genome assembly. Genes were distinguished as having differential isoform expression if they have differential expression with the criteria listed above of two isoforms in two studied lines.

Transcript functional annotation
Prior to the transcriptome sequence annotation, we assigned Barley affymetrix array sequences to the list of differentially expressed transcripts with higher and lower levels of expression in line i:BwAlm separately by sequence similarity using Blast software. Gene ontology classification and enrichment analysis was performed by AgriGO online service [69].
A list of genes taken for pathway analysis consisted of genes annotated in the H. vulgare genome assembly and having changes of expression level greater than four-fold as derived from their FPKM values across all twelve mappings. The pathway analysis was performed using PlantCyc (http://www.plantcyc.org/) database. 'Overlay experimental data' option was selected for the Cellular Overview of H. vulgare (http://pmn.plantcyc.org/over viewsWeb/celOv.shtml). Experimental data was provided in form of text file with column of gene identifier and column with values of log 2 of FPKM change for respective genes between the two barley lines.
In addition, BarleyCyc database was examined specifically for pathways of plant pigments biosynthesis. Following pathways were studied: 'Superpathway of anthocyanin biosynthesis (from delphinidin 3-O-glucoside)' , 'Superpathway of anthocyanin biosynthesis (from pelargonidin 3-Oglucoside)' , 'Chlorophyll cycle' and 'zeaxanthin, antheraxanthin and violaxanthin interconversion' that is a part of 'superpathway of carotenoid biosynthesis'. 'Customize or Overlay Omics Data on Pathway Diagram' option was implemented, and list of differentially expressed genes was used to overlay on the list of genes included in the pathway.

De novo transcriptome assembly and analysis
De novo assembly of transcriptome was conducted using Trinity tool [70]. All libraries for each biological case were pooled together, and resulting sets of fragments were assembled using default parameters. Each assembly was aligned to H.vulgare reference genome using Blastn tool of blast suite with [71]. Reliability threshold for homology detection was e < 10 −10 . Fragments that had no significant homology with reference genome were aligned to Echerichia coli genome (NCBI reference sequence NC_000913.3) in order to remove possible contaminants. After removing contaminants, remaining contigs were aligned to Brachypodium distachion CDS library version 1.0.28, Oryza sativa CDS library version 1.0.28and Arabidopsis thaliana CDS library of version TAIR10. O. sativa and B. distachion sequences were retrieved from Ensembl Plants database, A. thaliana sequences were obtained from TAIR database [72]