Variation in plastid genomes and transcriptomes in the gynodioecious species Silene vulgaris

Gynodioecious species exist in two genders – male-sterile females and hermaphrodites. Male sterility in higher plants often results from mitonuclear interaction between the CMS (cytoplasmic male sterility) gene(s) encoded by mitochondrial genome and by nuclear-encoded restorer genes. Mitochondrial and nuclear-encoded transcriptomes in females and hermaphrodites are intensively studied, but little is known about gender-specific gene expression in plastids. We have compared plastid transcriptomes between females and hermaphrodites in two haplotypes of a gynodioecious species Silene vulgaris with known CMS candidate genes. We revealed no significant differences between the genders in plastid transcriptomes of two haplotypes of S. vulgaris. It suggests that gene expression of plastid genes is not affected by CMS in flower buds of S. vulgaris, although both genders may still differ in plastid gene expression in specific tissues. We revealed the difference between the plastid transcriptomes of two S. vulgaris haplotypes in editing rate and in the coverage of several antisense transcripts. Our results document the variation in plastid genomes and transcriptomes in S. vulgaris.


Conclusions
We revealed no significant differences between the genders in plastid transcriptomes of two haplotypes of S. vulgaris. It suggests that gene expression of plastid genes is not affected by CMS in flower buds of S. vulgaris, although both genders may still differ in plastid gene expression in specific tissues. We revealed the difference between the plastid transcriptomes of two S. vulgaris haplotypes in editing rate and in the coverage of several antisense transcripts. Our results document the variation in plastid genomes and transcriptomes in S. vulgaris.

Background
Gynodioecy is a plant breeding system, in which hermaphrodite (H) and female (F) individuals co-occur in the same population. It occurs in about 2% of all angiosperm genera [1]. Male sterility in most (but not in all) gynodioecious plant species is encoded by the interaction of mitochondrial-encoded cytoplasmic male sterility (CMS) genes and by nuclear restorer of fertility (Rf) genes [2]. CMS is used in agriculture to produce hybrid seed with high yield and it is therefore extensively studied in crops, e.g., rice [3,4], sunflower [5], maize [6], or sugar beet [7].
Despite a widespread occurrence of gynodioecy among angiosperms, studies of CMS in wild species remain scarce. Silene vulgaris (bladder campion) emerged as a model for the study of gynodioecy and CMS in natural populations more than two decades ago [8]. The genus Silene is rich in species with diverse mating systems -hermaphroditism, gynodioecy and dioecy [9]. The mating system affects DNA sequence variation in organellar loci, which is often higher in gynodioecious Silene species than in their closely related dioecious or hermaphroditic congeners most likely owing to balancing selection acting on CMS loci and the whole organellar genome [10,11].
Besides the mode of selection, substitution rate is another essential factor which influences DNA sequence variation in plant organelles. Whereas the substitution rate is generally low in plant organellar genomes, it is highly elevated in some phylogenetic lineages, including the genus Silene [12]. Particularly two species -Silene noctiflora and Silene conica -achieved extreme rates of sequence and structural evolution in mitochondrial and plastid genomes [13,14].
Silene vulgaris has an organellar genome substitution rate above the angiosperm average but is relatively slowly evolving compared to other Silene species [15,16]. However, intraspecific structural rearrangements in the mitochondrial genome of S. vulgaris are extreme, involving not only frequent losses and gains of intergenic DNA, but also changes in coding sequences [15,17].
The accessibility of completely sequenced mitochondrial genomes of S. vulgaris and progress in RNA-seq methods enable the construction of comprehensive mitochondrial transcriptomes in this species. Such comparisons of transcriptomes between F and H plants in this species have revealed candidate CMS genes in their mitochondrial genomes and found differences in RNA editing rates between the different haplotypes which increased intraspecific protein variation [17,18].
The plastid genome of S. vulgaris has been studied to a lesser extent than the mitochondrial genome of this species. The complete sequence of the plastid genome of only a single haplotype has been published [14] and no comprehensive plastid transcriptome analysis of this species is available.
To gain more detailed insight into plastid genome evolution in S. vulgaris, we assembled complete plastid sequences from five haplotypes of this species, including two accessions from high mountains adapted to high altitudes and genetically distant from the remaining three haplotypes originating from lowland populations [19]. CMS is a complex phenotype, which is associated with profound changes in the expression of some mitochondrial and many nuclear genes [20 -22]. However, plastid transcriptomes are rarely studied in connection with CMS and it is not known, whether the transcription of plastid genes is affected by the male-sterile phenotype in gynodioecious plants. We therefore compared plastid transcriptomes of F and H plants in two of the haplotypes of S. vulgaris, for which mitochondrial transcriptomes are also available [17,18]. We did not detect any significant distinctions between the genders but found interesting differences between transcriptomes of the two S. vulgaris haplotypes.

Plant material
We collected seeds of S. vulgaris from two populations occurring above a timber line in European mountains (Dachstein and Vřesová studánka, the haplotypes D11 and VS1, respectively), and from one population growing in sand dunes just above sea level in Netherlands (Zandvoort, ZE2) ( Table 1). The mountain populations S. vulgaris exhibited morphological traits (floral color, leaf shape, procumbent growth) distinct from lowland plants. These populations were sometimes treated as the separate subspecies S. vulgaris subsp. prostrata or S. vulgaris subsp. glareosa [49], but unrestricted gene flow and clinal variation in floral color and flavonoid production along altitudinal gradient were documented [19,32]. We therefore refer to the mountain populations simply as S. vulgaris in our study.
Seeds were germinated and cultivated in the greenhouse at the Institute of Experimental Botany (IEB) in Prague, as described previously [18]. S. vulgaris collected in Dachstein is procumbent with light violet flowers. It was sometimes classified as S. vulgaris subsp.
prostrata [49]. However, individuals with intermediate phenotypes between higher altitude and lower altitude plants were reported [19], which suggests unlimited gene flow among the populations. The Dachstein plants are very sensitive to moisture, and they grew poorly in the greenhouse.  [50] was adopted to correct long but error-prone SMRT reads with short but accurate MiSeq reads. The proovread output 'trimmed' consisting of error corrected reads was used as a local blast database and the reads homologous to the S. vulgaris plastid genome (JF715057) were identified by blastn search with a cutoff e -20. Canu v 1.3 [51] was applied for the assembly of proovread corrected reads. The resulting two contigs corresponded to a long single copy region (LSC) plus inverted repeat (IR), and a short single copy region (SSC). The complete plastid genomic sequences were deposited under the Genbank accession numbers MK473866-8 (ZE2, VS1,D11).
The complete plastid genomic sequence JF715057 [14] derived from S. vulgaris carrying the mitochondrial haplotype SD2 [15] served as a reference for the annotation of the newly assembled plastid genomes of S. vulgaris.

Complete plastid genomic sequences from S. vulgaris KOV and KRA
The data sets obtained from the Roche 454 GS-FLX platform with Titanium reagents (from constructed 3 kb paired-end libraries) previously used to assemble mitochondrial genomes of the S. vulgaris haplotypes KOV [15] and KRA [17] were utilized for the generation of plastid sequences for both haplotypes. Although the DNA specimens were enriched for mitochondrial DNA, they contained plenty of plastid reads, which provided 10 -20 × coverage of the plastid genome. Roche's GS de novo Assembler v.2.6 ('Newbler') was used for initial assembly. The resulting contigs were mapped against the available chloroplast genome of S. vulgaris (JF715057) and gaps were filled by individual trimmed 454 reads mapping against the same reference. The KOV and KRA plastid genomic sequences were confirmed by re-mapping of the 454 reads against them. Alignments were manually checked for potential SNPs, indels and insertions and edited respectively. Withinindividual variation in A or T homopolymers > 5 was often observed, which might reflect possible heteroplasmy, or the co-existence of two or more variant sequences in the same individual. The resulting KOV and KRA complete plastid sequences (GenBank accession numbers MH890612 and MH890613) were used as the reference genomes for the following transcriptomic analyses.

Chloroplast genome features
The distance matrix with the pairwise comparison of single nucleotide polymorphisms across all different plastid genomes was calculated using snp-dists (v. 0.6, github.com/tseemann/snp-dists), while for the pairwise comparison of indels the dist.dna function of the ape package within R was employed with the "indel" and "indelblock" model. The positions of simple sequence repeats (SSR) were estimated using the microsatellite identification software tool MISA-web [52] with thresholds of the repeat sequence length longer than five nucleotides for mononucleotides, four repeat units for dimer and trinucleotide SSRs, and three repeat units for tetra-, penta-and hexanucleotide SSRs.

Phylogenetic analyses
First, the six plastid genomes of S. vulgaris together with Silene latifolia as outgroup (JF715055) were aligned with MAFFT v.7.388 [53] using the L-INS-i mode with misaligned sites manually edited. The inverted repeat region A (IRA) region was cut from the resulting nucleotide alignment. For the calculation of phylogenetic trees two different alignments were compared; first with non-informative sites such as homopolymer regions masked when longer than five nucleotides in non-coding sequence (CDS); second with CDS only (without tRNAs and rRNAs). Additionally, indel characters were coded using the "simple indel coding" algorithm as described in [54] for both alignments with 2matrix [55]. The phylogenetic trees were calculated using RAxML [56] and MrBayes [57] at the CIPRES portal [57].
The maximum likelihood (ML) method was applied using the CIPRES webportal with RAxML v. 8.2.10 with 1000 bootstraps and the GTRGAMMA model for both bootstrapping and tree inference. Indels were given in a partition file as binary characters describing indel size and distribution throughout the respective sequence alignment. Alternatively, the Bayesian approach for phylogenetic tree construction was employed through MrBayes v. The initial alignment was performed with the assembler GSNAP v. 2017-05-03 [60] in paired-end mode with known splice sites [14]. The plastid genome sequences of the haplotypes KOV and KRA of S. vulgaris were used as the references. IRA within the reference plastid genome was cut away to ensure proper read mapping with GSNAP. The resulting alignments were separated by strand using the view function in SAMtools v. 1.9 [61] by filtering according to read-pair orientation utilizing SAM flags as described in [18] These alignments were filtered for potential mitochondrial reads derived from plastid DNA inserted to the mitochondrial genome of the corresponding haplotype (GenBank accession numbers JQ771300 and MH455602) ( Table 4). At these known regions all reads not matching to the reference sequence and not presenting potential RNA editing were filtered deploying SAMtools view function, a custom AWK script and seqtk v. 1.2 (https://github.com/lh3/seqtk) for subsequent analyses. The mapped reads were visualized by means of the Integrative Genomic Viewer (IGV) [62].
For comparison we also used the ChloroSeq pipeline [24]  The same reference, known splice sites and filtered reads as for the final GSNAP alignments were used with bowtie v. 2.2.6 [63] and tophat v. 2.1.1 [64] for read mapping as described in [24].

RNA editing rates
Initial variant discovery was performed with HaplotypeCaller in GenomeAnalysisTK v. 3.7 [65] on the minus-and plus-stranded alignments of S. vulgaris KOV and KRA with the minimum call and emit threshold (stand_call_conf) set to 20. All variant sites with C-to-T for the plus-and G-to-A alteration in the minus-strand were manually checked and verified for subsequent final calling of variants with SAMtools v. 1.2 mpileup using the DPR output tag (discontinued since v. 1.3) for the number of high-quality bases per observed allele. In cases of low read mapping, but definite RNA editing the DP4 values were used to evaluate editing rates across all individuals. RNA editing rates were calculated based on these values as counts of Ts divided by the sum of Cs and Ts at the specific editing sites. The final editing sites were used as list for calculation of editing rates with ChloroSeq. Editing rates below a threshold of five percent (or 10 edited nucleotides, whatever is smaller), or low coverage (less than 200 reads mapped) are not shown in the final results.

Transcript abundance estimation
The estimation of transcript abundance was done as described in [18] using the coverage function within bedtools and a custom AWK script. The coverage was calculated per-base, averaged over the length of the respective feature of interest, first regardless of strand followed by strand-specific calculations for each sample and normalized as TPM as described in [66]. The average and standard deviation of TPM were calculated for both haplotypes and sexes. Antisense transcripts were recognized if their depth of coverage exceeded 300 -500, which corresponded to the TPM values of the least expressed protein coding genes.

Complete plastid genomes of S. vulgaris
We assembled five complete plastid genomes of S. vulgaris from Eurasia -the haplotypes D11, VS1, ZE2, KRA and KOV ( Table 1). They ranged in length from 151,463 bp to 151,572 bp and contained a long single copy region (LSC), a short single copy region (SSC), and two inverted repeats (IRs) ( Table 2). The boundaries between repeat and single copy regions and gene content were identical to the previously published plastid genome of S. vulgaris SD2 [14].
We identified the positions of simple sequence repeats (SSR), short arrays of tandem repeat units, in the six plastid genomes of S. vulgaris under study. We found 871 mononucleotide repeats longer than five nucleotides, but only 46 of them were longer than nine nucleotides (Additional file 1: Data Set 1). In addition, there were 62 dinucleotides, eight trinucleotides, eight tetranucleotides, and only one pentanucleotide.
SSRs represent useful markers in population genetic studies owing to their variability.
Within-individual variations in the number of mononucleotide units, arisen due to heteroplasmy, was observed in most mononucleotide regions. SSRs with a repeat unit higher than two nucleotides, not affected by heteroplasmy, can therefore be recommended for plastid genotyping in S. vulgaris. We found 19 positions of di-, tri-, or tetranucleotides, which varied among the analyzed plastid genomes of S. vulgaris.

Phylogenetic relationships and sequence polymorphism
Plastid haplotypes VS1 and D11 of the high mountain S. vulgaris populations, occurring at the altitudes above 1200 m a. s. l, diverged first on the phylogenetic tree constructed on the basis of concatenated protein-coding sequences and with S. latifolia as an outgroup (Fig. 1b). The same topology was confirmed when entire plastid sequences except for homopolymers larger than five were used (Fig. 1a). In the latter phylogenetic tree, the position of the KRA haplotype from Siberia and D11 haplotype from the Alps was not resolved. The results obtained by maximum-likelihood (ML) method were consistent with the outputs generated by MrBayes, except that the position of KRA and D11 was always resolved (Additional file 2: Fig. 1, Additional file 3: Fig. 2).
The number of indels in pairwise alignments of individual plastid genomes of S. vulgaris varied from 33 to 112, and nucleotide substitutions ranged from 35 to 212 ( Table 2). The highest number of the polymorphisms was found in the alignments with the haplotype VS1 from the Jeseníky Mts, consistent with its divergence's basal position within phylogenetic tree of S. vulgaris haplotypes. The indels occurred primarily in intergenic regions, but the deletion of nine bp in length, which shortened the rpl20 protein by the last three amino acids was identified in the haplotypes SD2 and ZE2. All other haplotypes (KOV, KRA, D11, VS1) harbored the longer version of the rpl20 protein.
S. vulgaris plastid genes varied in their degree of polymorphism. Thirty of 77 unique protein coding genes were identical, and additional 25 genes carried only synonymous segregating sites and therefore encoded proteins identical among the six plastid haplotypes. Only 22 genes, including accD, matK, rpoB or ycf2, carried at least one nonsynonymous segregating site (Fig 2). Two highly polymorphic genes -ycf1 and ndhF -

Plastid transcriptomes of S. vulgaris KRA and KOV
We generated plastid transcriptomes of two haplotypes of S. vulgaris KRA [17,18] using the data sets previously employed to construct the mitochondrial transcriptomes. We compared gene coverages and RNA editing rates in flower buds between F and H individuals and between the two haplotypes.
We compared depth of coverage of protein coding genes, because rRNA was removed before cDNA library preparation and small RNAs (< 100 nt) including tRNAs were lost in the course of RNA extraction. Depth of coverage was similar in F and H plants in both haplotype KOV and KRA, no gene was differentially expressed between the genders. The depth of coverage could not be directly compared between the KOV and KRA plastid genomes, because Illumina sequencing was performed on different platforms and produced reads of different lengths for each plastid transcriptome. We therefore compared the sets of highly and lowly covered genes between the two haplotypes. The genes psbA, rbcL, psbE, rps14 and rps16 were among the most highly expressed, whereas the ndhF and psbN were the least expressed genes both in KRA and KOV plants (Additional file 1: Data Set 4.), which indicates general similarity between the two plastid transcriptomes.
Introns were covered to a lower extent than adjacent exons in most plastid genes, but their coverage reached levels comparable to exons in some genes, for example in ndhB (Fig. 3a).

Antisense non-coding transcripts in plastid transcriptomes of S. vulgaris KRA and KOV
We identified five antisense non-coding transcripts > 100 nt with transcript abundance comparable to protein coding genes, overlapping trn genes (Additional file 1: Data Set 5.), three of them were found in both KRA and KOV transcriptomes, two were revealed in KOV only. The trnS-GGA and trnW-CCA -trnP-UGG antisense transcripts corresponded to the 3'UTRs of the rps4 gene and petG genes, respectively. The trnS-UGA antisense transcript colocalized with the 5'UTR of the psbZ gene (Fig. 3b, c, d).
The antisense transcript spanning the 5'UTR and the start of the rps14 gene (Fig. 4a), was the most abundant atisense transcript derived from protein coding genes in KOV, but it was absent in the KRA transcriptome. Another antisense transcript derived from the rps19 and rpl2 genes was revealed in both S. vulgaris transcriptomes under study (Fig.   4b). The psbN gene coding for a small transmembrane protein necessary for the assembly of photosystem II [23] was transcribed from minus DNA strand in sense orientation and from the opposite strand in antisense orientation as a part of the longer psbT-psbH transcript. The antisense psbN transcript exhibited much higher depth of coverage than the sense psbN transcript coding for the psbN protein (Fig. 4c).
Similarly with protein-coding genes, no statistically significant differences in antisense transcript levels were found between F and H plants. In contrast, the abundance of antisense transcripts differed between the KOV and KRA transcriptomes of S. vulgaris more than the transcript levels of protein coding genes. The most remarkable distinction was found in the antisense rps14 transcript (Fig. 4a), which was highly covered in KOV, but completely missing in KRA.
The depth of coverage in the KRA and KOV plastid transcriptome estimated by the ChloroSeq pipeline [24] was in agreement with the results stated above, which were obtained using GSNAP according to [18] (Additional file 1: Data Set 4.).

RNA editing positions in plastid genomes of S. vulgaris KRA and KOV
We identified 51 unique C to U editing sites in the plastid genomes of S. vulgaris KRA and KOV, 38 of them located in protein coding regions, 2 of them in introns, and 11 in intergenic regions. Editing sites in rRNAs and tRNAs were not evaluated due to their biased coverage caused by the sample preparation methods. Most edits (95%) in coding sequences were non-synonymous, changing the amino acid composition; only two editing sites were silent. The most frequently edited genes were ndhB (9 edits), ndhD (4 sites), and ndhA (4 sites).
The most remarkable difference was found in the position 50 of the psbZ coding region, which changed leucine for serine. No editing was observed in this position in the KRA haplotype, whereas approximately 6 % of psbZ transcripts, which represented about four hundred reads, were edited in each of six KOV plants ( Table 3) No statistically significant differences of editing rates between F and H individuals were observed between the KOV or KRA haplotypes. The estimates of editing rates provided by the GSNAP [18] and the ChloroSeq pipeline [24] were consistent (Additional file 4: Figure S3).

Variation in six plastid genomes of S. vulgaris
This comparison of six completely sequenced plastid genomes of S. vulgaris revealed identical structures and moderate sequence differences, which is in line with the previously reported slow plastid evolutionary rate of this Silene species, unlike fast evolving S. noctiflora and S. conica [14]. Our study involved two accessions from mountain populations and one accession collected in sandy beach close to the sea. The mountain haplotypes diverged first in the phylogenetic tree, confirming their genetic distance from lowland populations. In contrast, the beach accession ZE2 clustered together with the remaining populations, suggesting it recently diverged likely becoming isolated in the process of adaptation to the ecologically divergent seashore habitat with an increased salinity.
High variation in plastid evolutionary rates exist not only among the Silene species, but also among individual plastid genes [16]. We found prominent distinctions in substitution rate among the plastid genes also at the within-species level in S. vulgaris. Mainly two genes, ycf1 and ndhF, accumulated a high number of non-synonymous substitutions.
Whereas the ycf1 gene exhibited high acceleration in substitution rate across all Silene species and also other angiosperms, the ndhF gene belonged among the more slowly evolving genes [16]. Our study found elevated Ka/Ks in the ndhF pair-wise alignments which included at least one mountain haplotype of S. vulgaris. The Ka/Ks values close to 1 indicate relaxed selection, the values >1 may imply positive selection. Our results suggest distinct selection acting on the ndhF gene in mountain populations compared to lowland plants of S. vulgaris. The habitats above timberline are exposed to intense light including UV. The increased content of flavonoids in the mountain populations of S. vulgaris has been interpreted as a defense against excessive light [32]. It is possible that the nonsynonymous substitutions in the ndhF may be also related to adaptation to a higher light intensity in high altitudes.

RNA editing in plastid transcriptomes of S. vulgaris
We found no significant difference in the plastid transcriptomes of flower buds in terms of coverage and RNA editing rate between F and H plants neither in the KRA nor KOV haplotype of S. vulgaris. This indicates that plastid transcriptomes were not affected by the processes underlying pollen abortion in two haplotypes with distinct CMS types [17,18]. However, as we analyzed total RNA from entire flower buds, it cannot be excluded that the transcription in plastids of some specific tissues, e.g., tapetum is still influenced by CMS.
Thirty eight RNA editing sites were revealed in the protein coding genes in the S. vulgaris transcriptomes. This value is comparable to those reported in other eudicots: 34 edits A. thaliana [28], 40 edits in V. radiata [31], or 51 edits in C. sativus [25]. This number is much lower than 138 editing sites discovered in the plastid protein coding genes of the basal angiosperm A. trichopoda, which is in agreement with a general trend of gradual loss of plastid edits in the course of the evolution of flowering plants [25]. The loss of editing by the replacement of C for T in genomic DNA occurred in parallel to a similar extent, but it affected distinct sites in various lineages ( Table 3).
The vast majority of editing sites in protein coding genes (36) were nonsynonymous, changing the encoded amino acid. With a single exception (low-rate edit introducing a premature stop codon in ndhJ), all non-synonymous sites were conservedthey were either edited or C was replaced with T in at least one species under comparison.
Most, but not all non-synonymous sites were edited more than 80%. An interesting example of a largely unedited essential position was observed in the ndhD gene, where editing established the start codon in only about 15% of transcripts, which strongly reduced the abundance of correct mRNA and might have decreased the production of functional NdhD protein. The same position was low-edited in non-photosynthetic organs (roots, etiolated seedlings) of A. thaliana [33]. As our transcriptomes were derived from young flower buds, which contained both photosynthetic and non-photosynthetic tissues, low editing of the ndhD start codon might have reflected a lack of editing in some floral tissues. Owing to its possible strong impact on the NdhD protein abundance and the function of NDH complex, editing of the start codon of might have been employed to regulate the expression of the ndhD gene. Although the primary function of organellar RNA editing in plants resides in the restoration of conserved amino acids [34], its role in posttranscriptional gene expression control shall be considered [35]. Additional examples of developmental stage-and tissue-specific RNA editing were previously described in plastids of tomato [36] or potato [37].
The two haplotypes of S. vulgaris differed in editing extent of the psbZ gene. One haplotype was edited to a low extent, but consistently across all six individuals of both genders, the other one was not edited at all. Editing rate of the position 50 of psbZ varied across angiosperms under comparison, where all the possibilities, namely editing, replacement of C for T, and zero editing, were observed ( Table 3). No comprehensive plastid transcriptome of Silene has been published, but mining available transcriptomic data from GenBank showed similar pattern of the variation in psbZ editing rate across close relatives of S. vulgaris as across angiosperms. The psbZ protein is an important component the supramolecular architecture of photosystem II [38,39], whose subunits belong among the least divergent genes in Silene [16], most likely owing to the action of purifying selection. The variation in psbZ RNA editing across angiosperms, Caryophyllaceae, and even within a single species S. vulgaris is therefore surprising. The editing of the position 50 replaces S for L in the middle of the first transmembrane domain of psbZ [39]. It is possible that this exchange does not have noticeable impact on the protein function. The variation in RNA editing among Arabidopsis species, affecting functionally less important sites, was recently described [40], but the position 50 of psbZ was uniformly edited in the Arabidopsis species according to this study. As the position 50 of psbZ is completely edited in the model plant A. thaliana, the identification of nuclear factors responsible for the psbZ editing will be possible, which may help to clarify the function of this editing event.

Antisense RNAs in plastid transcriptomes of S. vulgaris
We found the long antisense transcript of the psbN gene, which was more abundant that the sense transcript of this gene. The psbN gene is located on the strand complementary to the psbT-psbH intergenic region, which is a part of the conserved psbB operon. The transcription of the psbN gene in A. thaliana is controlled by a specific promoter recognized by the plastid-encoded RNA polymerase together with nucleusencoded sigma factor SIG3 [41]. The antisense psbN transcript was found to affect the cleavage of the psbT-psbH intercistronic RNA [42] and to influence the translation of psbT mRNA in A. thaliana [43]. It is therefore possible that antisense psbN transcript has a similar regulatory function in S. vulgaris. On the contrary, we found only a very low or zero coverage of the strand complementary to the ndhB gene. The antisense ndhB transcript was observed in A. thaliana, tobacco and poplar and may play a role in mRNA stability control [44]. Its expression is influenced by temperature and developmental stage. It may not be expressed in floral buds, or in S. vulgaris at all.
We did not estimate the expression of small RNAs including tRNAs, owing to a size limitation, but we detected longer antisense RNAs transcribed from the strand complementary to the trn genes. The antisense trnS-GGA and antisense trnW-CCA are located in 3'UTR of rps4 and petG, respectively. They form secondary structures, which may be recognized by RNA-binding proteins that regulate transcription of plastid mRNAs [45,46]. Similarly, the antisense trnS-UGA may stabilize the 5'end of the psbZ transcript and influence its translation. Numerous antisense RNAs were described in plastid transcriptomes, for example 107 putative antisense transcripts in A. thaliana [47], or 137 antisense candidates in Salvia miltiorrhiza [48]. We detected only eight putative long antisense RNAs in S. vulgaris, which might have been caused by two factors. First, we narrowed our search by raising the coverage threshold to the level of protein-coding genes. Second, we carefully eliminated reads derived from plastid inserts in the mitochondrial genome, which can be erroneously recognized as plastid-encoded transcripts.
The accumulation of antisense RNA can be influenced by the environment and developmental stage [43,44], which may explain, why some antisense RNAs were expressed only in one haplotype of S. vulgaris. On the other hand, all the putative antisense RNAs recognized in the S. vulgaris transcriptomes were also found in A. thaliana [47], which suggests their evolutionary conservation and possible functional importance.

Conclusions
We found no significant differences between F and H individuals in the plastid transcriptomes prepared from flower buds (where the differences between both genders may be expected) of two haplotypes of gynodioecious plant S. vulgaris, which suggests that CMS was not associated with the changes in plastid gene expression in this species.
However, we cannot exclude, that differences in plastid transcriptomes exist in specific tissues of floral buds. We observed differences between the two haplotypes of S. vulgaris

Consent to publish
Not applicable

Availability of data and materials
The data of this study data have been deposited in the NCBI with BioProject accession

Competing interests
The authors declare that they have no competing interests.

Funding
This project was funded by the grant of the Grant Agency of the Czech Republic 16-09220S to HŠ. Additional support was provided by European Regional Development Fund-Project   Branches with bootstrap support below 60% were collapsed to polytomies.
Phylogenetic trees were computed through the CIPRES webportal with RAxML v.
8.2.10 using 1000 bootstraps and the GTRGAMMA model.