Tight association of genome rearrangements with gene expression in conifer plastomes

Background Our understanding of plastid transcriptomes is limited to a few model plants whose plastid genomes (plastomes) have a highly conserved gene order. Consequently, little is known about how gene expression changes in response to genomic rearrangements in plastids. This is particularly important in the highly rearranged conifer plastomes. Results We sequenced and reported the plastomes and plastid transcriptomes of six conifer species, representing all six extant families. Strand-specific RNAseq data show a nearly full transcription of both plastomic strands and detect C-to-U RNA-editing sites at both sense and antisense transcripts. We demonstrate that the expression of plastid coding genes is strongly functionally dependent among conifer species. However, the strength of this association declines as the number of plastomic rearrangements increases. This finding indicates that plastomic rearrangement influences gene expression. Conclusions Our data provide the first line of evidence that plastomic rearrangements not only complicate the plastomic architecture but also drive the dynamics of plastid transcriptomes in conifers. Supplementary Information The online version contains supplementary material available at 10.1186/s12870-020-02809-2.

In this study, we sequenced both plastomic DNA and RNA from one representative genus in each of the six extant conifer families. Strand-speci c RNA libraries have the advantage of allowing for the discrimination of sense and antisense transcripts [17]. We took advantage of this to (1) investigate the full transcription capability of both plastomic strands, (2) estimate the relative number of plastid coding and antisense transcripts, and (3) identify plastid C-to-U RNA-editing sites separately at sense and antisense transcripts in conifers. We also compared plastid gene expression levels among conifers and demonstrated a strong association between gene expression and plastomic rearrangements. We discuss possible mechanisms underlying this association.

Both plastomic strands are fully transcribed in conifers
The six newly assembled plastomes are illustrated as linear molecules to facilitate pairwise comparisons ( Fig. 1a). A plastomic inversion was detected in the sampled K. davidiana individual when it was compared to the conspeci c reference (NC_011930; Fig. S1a). This polymorphic inversion is anked by Pinaceae Type I repeats (Wu et al. 2011 [18]), which are capable of triggering homologous recombination to generate predominant and substoichiometric plastomic isomers in K. davidiana (Fig. S1b).
RNAseq coverage across the six sampled conifer plastomes is represented as histograms in Fig. 1. RNAseq coverage of rRNAs is low, indicating effective depletion of rRNA transcripts prior to sequencing.
We also found that over 94.2% of the plastome sequences were covered by RNAs generated from a speci c single DNA strand. The coverage ratio increased to over 99.9% after RNAseq reads from both strands were combined (Fig. S2). Overall, our data reveal almost full transcription of both plastomic strands, indicating that intergenic, intronic, and antisense transcripts are ubiquitous in conifer plastids.
Our data also show that CDS sense transcripts are generally more abundant than their antisense counterparts, although there are several exceptions (Fig. S3). For example, psbN, a photosynthetic system II gene, is located on the strand opposite to the psbB operon, a well-known polycistronic transcription unit that comprises four genes: psbB, psbH, petB, and petD [19]. Therefore, the transcripts antisense to psbN are likely overrepresented due to the strongly expressed psbB operon. This nding suggests that the positions of plastid genes strongly affect antisense RNA expression.
In uence of plastomic rearrangements on CDS expression Among the six conifer plastomes, 31 syntenic blocks were identi ed to estimate plastomic rearrangements (Fig. S4a). Pairwise comparisons reveal 2 − 14 inversions among the sampled conifers (Fig. S4b). To normalize expression levels, RNAseq reads mapped to CDSs were collected and combined to calculate transcripts per million (TPM). Figure 2 compares the TPM scores between orthologous plastid CDSs that retain equivalent functions across conifer species. We found that (1) psbA and rbcL are the two most highly expressed genes in the presence of light and (2) TPM scores of these orthologous genes are signi cantly correlated (Pearson's correlation coe cients = 0.733 to 0.914, all P < 0.001), suggesting that their expression levels are strongly functionally dependent. However, these correlation coe cients are inversely associated with the number of plastomic rearrangements (PR) between species (Pearson's ρ = −0.626, P = 0.013). Taken together, our results demonstrate that plastomic rearrangements reduce the strength of functionally-dependent association of plastid gene expression. In other words, these rearrangements in uence gene expression in conifers.

Plastid RNA editing occurs in both sense and antisense transcripts
We detected 78 C-to-U RNA-editing sites in K. davidiana plastids, 42 in A. dammara, 23 in N. nagi, 35 in S. verticillata, 32 in Ce. wilsoniana, and 21 in Cu. konishii ( Fig. 3a; Table S1). Notably, the majority (76.2-96.9%) of these edited sites cause non-silent editing, introducing non-synonymous changes in amino acid sequences. In contrast, silent-editing sites at synonymous codon positions occur in only 0 − 14.3% of the sites. In addition, editing e ciency at silent-editing sites is nearly always less than 50%, with two exceptions: ndhE of A. dammara and psbA of S. verticillata ( Fig. 3b; Table S1). We also discovered one to three editing sites in antisense transcripts of CDSs from each conifer species. These sites are partially edited, with e ciency less than 50% (Fig. 3b).
We further investigated the intersection among these edited sites based on their alignments. In Fig. 3c, edited sites are designated as "shared" when they appear at the same alignment position in two or more species. Those found only in a single species are designated as "speci c" sites. Most silent and antisense edited sites are species-speci c. Only one site-located in the rps8 transcript-is shared by all conifers, suggesting that it originated in the common ancestor of all conifers more than 300 million years ago [20]. In addition, K. davidiana plastids contain more species-speci c RNA-editing sites than any other species we examined, with the proportion of "speci c" sites exceeding other conifers by more than twofold (Fig.  S5). This nding implies that Pinaceae has evolved a distinctive set of plastid RNA-editing sites after diverging from cupressophytes.

Discussion
We used strand-speci c RNAseq data to explore plastid transcriptomic pro les across all six conifer families. Our data indicate that conifer plastomes transcribe nearly full sequences of both DNA strands, reinforcing the viewpoint that full transcription of plastomic sequences is the norm rather than an exception among seed plants [8]. In addition, we identi ed a number of C-to-U edited sites in sense and antisense transcripts. These results suggest that strand-speci c RNAseq improves the detection of RNAediting sites by not only removing antisense contamination during mapping but also allowing for the exploration of editing events in antisense transcripts. Notably, all antisense sites are edited ine ciently, implying that they are likely accidental or tissue-speci c [21][22][23].
We also discovered numerous RNAseq reads mapping onto introns, intergenic spacers (IGSs), and the regions antisense to CDSs. Plastid non-coding RNAs were proposed to regulate gene expression [24]. In some model plants, plastid CDS and IGS transcripts have similar expression levels [7]. We did not compare transcript abundance between CDSs and IGSs because the latter's transcriptional orientation was uncertain, making it di cult to identify the corresponding RNAseq reads in a strand-speci c manner. Nonetheless, we did observe numerous transcripts antisense to CDSs. In plastids, antisense transcripts were hypothesized to bind to the 3' end of mRNAs and stabilize them [25]. This stabilization mechanism is likely active for all CDS transcripts since their antisense counterparts are prevalent in conifer plastids.
It has long been known that transcription termination of most plastid genes is ine cient as it results in abundant and diverse read-through transcripts that must be post-transcriptionally processed [26]. In a recent study [27], the mechanism of read-through transcription, which affects the transcription of downstream genes, resulted in extreme accumulation of accD transcripts when transcription termination of the upstream gene, rbcL, was inactivated. Here, we propose that read-through transcription also helps interpret our nding that plastomic rearrangements in uence gene expression. Relocating a gene involves recon guring its neighboring loci and thus altering the read-through transcription effect from the upstream gene. This ultimately changes the number of transcripts of the relocated gene and its downstream neighbors. However, without environmental stress treatments, it is di cult to link altered gene expression from plastomic rearrangements with a biological adaption. Fortunately, inter-and intraspeci c plastomic inversions have been documented in several conifer lineages [this study; 18, 28 − 31], providing ideal material to study the association between plastomic rearrangements and biological adaptation in the future.

Conclusion
It has long been known that plastomic rearrangements occur frequently in conifers. However, gene expression dynamics in the relocated plastid gene (after rearrangements) and its downstream neighbors has not been investigated. In this pivotal study, we show that in conifers (1) both plastomic strands are fully transcribed, (2) increased plastomic arrangements reduce the strength of functionally-dependent association of plastid gene expression, (3) RNA editing occurs in both sense and antisense transcripts, and (4) the Pinaceae have evolved a distinctive set of plastid RNA-editing sites after diverging from cupressophytes. The tight association of plastomic rearrangements with gene expression leads us to propose that read-through transcription is likely the key to make this association. Additional studies and molecular biology validation are needed to better understand the biological adaptation of plastomic rearrangements in conifers.

Plant materials, DNA and RNA extraction and sequencing
The six representative conifer species (i.e., Keteleeria davidiana for Pinaceae, Agathis dammara for Araucariaceae, Nageia nagi for Podocarpaceae, Sciadopitys verticillata for Sciadopityaceae, Cephalotaxus wilsoniana for Taxaceae, and Cunninghamia konishii for Cupressaceae) were collected and identi ed by Dr. Chung-Shien Wu (Biodiversity Research Center, Academia Sinica). Permission was not necessary for collecting these plants. The voucher specimens were deposited at the Herbarium, Biodiversity Research Center, Academia Sinica, Taipei (HAST ; Table S2).
For DNA and RNA extraction, approximately 30 cm of fresh young shoots were collected from the six conifer species. To reduce potential variability due to different growth conditions, shoots were grown hydroponically in a growth chamber (GC-550R, Yihder Company, New Taipei City) at 25 °C with a light intensity of 100 µmol m − 2 s − 1 . After 24 hours, fresh leaves on the shoots were harvested for DNA and RNA extraction using the methods described in [32] and [33], respectively. The extracted DNA was sequenced at Genomics BioSci & Tech (New Taipei City, Taiwan) on an Illumina HiSeq 4000 system. We also performed strand-speci c RNAseq using the same system after DNase I (Invitrogen) treatment, rRNA depletion (Illumina Ribo-Zero rRNA Removal kits, Plant Leaf version), and library construction with dUTP and random hexamers. Table S2 details the information on sampling locality, voucher numbers, GenBank accessions, and DNAseq and RNAseq read counts used in this study.
Plastome assembly and RNA mapping analysis Plastome assembly was initially conducted using SPAdes 3.13 [34] with the option of "careful" and a range of k-mer sizes (21, 33, 55, 77, and 99). Plastomic contigs were identi ed using NCBI-blast 2.2.18 [35] against in-house databases. Gaps between contigs were closed using GapCloser 1.12 [36]. This yielded complete plastomes for all sampled conifers, except K. davidiana because of its long Pinaceae Type I repeats [18]. We subsequently designed speci c primers (Fig. S1) to amplify the corresponding regions and perform genome nishing in the latter species.

Identi cation and visualization of RNA-editing sites
To identify C-to-U RNA-editing sites, the "Find Variations" option implemented in Geneious 11.1.5 was employed with the threshold: minimum coverage = 50, minimum variant frequency = 0.1, and maximum variant P-value = 10 − 6 . Editing e ciencies were estimated by calculating the ratio of edited to unedited bases in mapped reads. Intersection among edited sites from the six conifers were evaluated using UpSetR [39]. Plastome maps, transcriptional pro les, and RNA-editing sites were visualized using Circos 0.67 [40].

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.