Copia and Gypsy retrotransposons activity in sunflower (Helianthus annuus L.)

Background: Retrotransposons are heterogeneous sequences, widespread in eukaryotic genomes, that belong to the so-called mobile DNA. They resemble retroviruses both in their structure and in their ability to transpose within the host genome, of which they make up a considerable portion. Copia- and Gypsy-like retrotransposons are the two main classes of retroelements shown to be ubiquitous in plant genomes. Ideally, the retrotransposon life cycle results in the synthesis of a messenger RNA and then of self-encoded proteins that process the retrotransposon mRNA into double-stranded extra-chromosomal cDNA copies, which may integrate into new chromosomal locations. Results: RT-PCR and IRAP protocols were applied to detect Copia and Gypsy retrotransposon transcripts and new integration events in unstressed plants of a sunflower (Helianthus annuus L.) selfed line. The results show that in sunflower, retrotransposon transcription occurs in all the organs analyzed (embryos, leaves, roots, and flowers). In one out of sixty-four individuals analyzed, retrotransposon transcription resulted in the integration of a new element into the genome. Conclusion: These results indicate that the retrotransposon life cycle is tightly controlled at a post-transcriptional level. A possible silencing mechanism is discussed.


Background
The mobile component of the genome is represented by sequences, called transposable elements (TEs), that are potentially able to change their chromosomal location (transposition) through different mechanisms. This feature has cladistic significance, and TEs are subdivided into two main classes according to their mechanism of transposition: retrotransposons (REs, class I) and DNA transposons (class II). Class I elements, which include all REs, transpose through a replicative mechanism that involves transcription of an RNA intermediate by the enzymatic machinery of the host cell, followed by reverse transcription into cDNA and integration into the host genome by enzymes encoded by the retrotransposon RNA. This "copy and paste" mechanism has been highly successful during the evolution of eukaryotes, and class I elements represent the largest portion of higher plant genomes. In Oryza australiensis, for example, the amplification of retrotransposons doubled the genome size [1].
Retrotransposons are divided into autonomous and non-autonomous elements, according to the presence of ORFs that encode RE enzymes. Non-autonomous elements do not carry enough coding capacity to transpose autonomously; nevertheless, they are able to move using enzymes encoded by other elements [2].
Basically, the genome of autonomous REs is organized into two domains: the gag domain, which is devoted to the production of virus-like particles, and the pol domain, whose encoded enzymes process the RE mRNA into a double-stranded DNA to be integrated into the genome. The presence of long terminal repeats (LTRs) flanking the retrotransposon genome divides REs into two main classes, namely LTR and non-LTR retrotransposons. LTRs carry promoter elements, polyadenylation signals and enhancers that regulate the transcription of retroelements.
Gypsy and Copia LTR retrotransposons are two ubiquitous classes [3,4] of plant REs that differ in the order of the genes encoded by pol. Gypsy and Copia elements resemble retroviruses in their structure because of their LTRs and internal ORFs. LTR retrotransposons lacking internal coding domains, such as TRIMs (Terminal-repeat Retrotransposons In Miniature [5]) and LARDs (LArge Retrotransposon Derivatives [6]), have also been described. First discovered in Solanum tuberosum and Arabidopsis, TRIMs have been reported in monocots and dicots, while LARDs, which were shown to be transcribed, have been reported in Triticeae [4,6].
Over the last two decades, several studies have linked the onset of RE activity to stress responses: Tnt1 and Tto1 in Nicotiana and Tos17 in rice showed transcription and transposition induced by stress (tissue culture) [7][8][9][10], whereas these elements were not transcribed under standard culture conditions. Large-scale genome sequencing of grasses showed that REs are responsible for extensive changes in genome structure and, surprisingly, dramatic differences were reported even among individuals of the same species [11,12]. A remarkable example of retrotransposon dynamics acting as an adaptive evolutionary mechanism within an ecological system is offered by BARE1 elements in wild barley [13].
It has been proposed that the restructuring action of REs plays a role in regulating gene expression [14,15], and that allelic variation in non-genic (regulatory) sequences may be involved in heterosis, i.e. the superior performance of hybrids relative to their parents [16]. In this sense, the old epithet of "junk" for such repeated sequences, which have shaped genome structure and function, is becoming obsolete.
Although the interplay between REs and the host genome has allowed genome expansion and the evolution of gene regulatory networks, the vast majority of REs appear to be inactivated by a broad spectrum of mutations. Only a few elements have been shown to transpose autonomously, and data from EST libraries in grasses indicate that most are poorly transcribed [17][18][19]. Indeed, it is conceivable that the host genome limits RE activity because of its potential mutagenic effects.
The control of TE activity is related to RNA interference, a process mediated by small RNAs derived from a number of different precursors, which directs chromatin-specific methylation and condensation as well as RNA degradation [20]. In the fission yeast Schizosaccharomyces pombe, a basal level of transcripts matching centromeric DNA repeats is the substrate for the production of small RNAs that maintain heterochromatin structure through histone methylation [21]. A silencing pathway for REs and repetitive sequences, driven by antisense small RNAs, is well described in Drosophila [22] and in Arabidopsis [23].
In plants, retrotransposon dynamics have mainly been investigated in grasses and other monocotyledons, and in dicotyledons such as Arabidopsis, Gossypium species, Nicotiana, and Lotus japonicus. Recently, genome expansion related to the amplification of REs has been shown to occur in the evolution of three Helianthus hybrid species adapted to extreme environments [24,25].
The cultivated sunflower (Helianthus annuus) has a medium-large genome (3.30 pg DNA per haploid genome [26]). Copia and Gypsy REs were first reported in this species a few years ago [27] and, more recently, the major portion of the sunflower genome was shown to be composed of REs [28]. Analysis of cpDNA suggested that the genus Helianthus originated between 4.75 and 22.7 million years ago, while the extant Helianthus lineages appeared between 1.7 and 8.2 million years ago [29].
With the aim of studying RE activity in a relatively young species with a medium-large genome, we analyzed retrotransposon transcription and the integration of new REs in plants of cultivated sunflower.

RE expression in the Helianthus annuus genome
Sunflower repeated sequences were previously isolated from a Helianthus annuus partial genomic library by hybridization with labeled genomic DNA [27]. Among these, one Copia-like sequence (pHaS211 [EMBL acc. number AJ009967], hereafter called C211) and three Gypsy-like sequences (pHaS13 [AJ532592], pHaS22 [FM208278], and pHaS30 [FM208279], hereafter called G13, G22, and G30, respectively) proved to be moderately repeated, with a copy number per haploid genome ranging from 4,000 to 16,000 (Cavallini, unpublished). The transcription of these sequences was then studied.
Specific PCR primers were designed on conserved domains matching the RNase H gene of Copia elements and the integrase gene of Gypsy elements. RT-PCR experiments were performed to assess the occurrence of RE transcripts in different organs, namely root, leaf, flower (at three different stages) and embryo (at four different stages), collected from plants of the HCM line. For each organ and stage, amplified fragments of the expected length were obtained (Fig. 1), indicating that the retroelement families studied are actively transcribed in all the organs analyzed. Additional PCR products were also obtained, which might have originated from the transcription of retrotransposon remnants and/or related elements.
Three amplified fragments from each RT-PCR product were cloned and sequenced (EMBL acc. numbers FM208268-FM208277). The Copia-like sequences showed no sequence polymorphism (Table 1). This may indicate that only one or a few Copia REs of the C211 family are transcribed, so that polymorphism cannot be detected by sequencing only three PCR products. All nine sequences belonging to Gypsy-like REs were different from one another. Eight of the nine Gypsy sequences, and the Copia ones, contain no stop codons. Despite the low number of sequences analyzed, this result suggests that many of the expressed REs encode functional protein sequences. Within the G13, G22, and G30 families, the ratio of non-synonymous to synonymous substitutions was 0.071, 0.086, and 0.189, respectively, i.e., close to zero (Table 1). Such low ratios are usually found in protein-coding gene sequences and indicate conservative (purifying) selection. A BLAST search of EST databases using sequences of three Gypsy and one Copia RE as queries indicated that RE families related to those analyzed in our experiments are also transcribed in other Asteraceae species (Table 2).
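Checking cloned RT-PCR sequences for in-frame stop codons can be sketched as follows. This is a minimal illustration, assuming the standard nuclear genetic code and clone sequences already oriented on the coding (forward) strand; the function names are hypothetical and not part of any tool used in the study.

```python
# Standard genetic code (NCBI translation table 1); codons are enumerated
# with bases in the order T, C, A, G at each of the three positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def internal_stops(seq: str, frame: int = 0) -> list:
    """Return the codon indices of stop codons in one forward frame (0, 1 or 2)."""
    seq = seq.upper().replace("-", "")  # drop alignment gaps
    stops = []
    for index, pos in enumerate(range(frame, len(seq) - 2, 3)):
        # Codons containing ambiguous bases (e.g. N) are not in the table.
        if CODON_TABLE.get(seq[pos:pos + 3]) == "*":
            stops.append(index)
    return stops


def best_open_frame(seq: str) -> tuple:
    """Return (frame, stops) for the forward frame with the fewest stop codons."""
    return min(((f, internal_stops(seq, f)) for f in range(3)),
               key=lambda item: len(item[1]))
```

A clone is then scored as potentially coding when `best_open_frame` returns an empty stop list; a clone read on the other strand would need to be reverse-complemented first.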

Isolation and analysis of LTRs
Putative full-length Gypsy-like LTRs were isolated by two-step chromosome walking, following the method reported for sunflower Copia-like LTRs [30].
Twelve putative Gypsy LTRs (EMBL acc. numbers FM177929-FM177940) and 18 putative Copia LTRs (FM177911-FM177928 [30]) were aligned, and a consensus tree based on nucleotide sequences was obtained by neighbor-joining analysis (Fig. 2). The tree showed a clear distinction between Gypsy and Copia LTRs. Moreover, Copia LTRs proved to be more uniform than Gypsy LTRs (Table 3), among which three distinct families were observed (Fig. 2). Calculation of nucleotide diversity along the relatively uniform Copia LTRs indicated that diversity is higher in the central region of the LTR than at either end (not shown).
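A sliding-window estimate of nucleotide diversity, of the kind that can reveal higher diversity in the central region of an LTR, can be sketched as below. This is a minimal sketch and not the DnaSP implementation: it assumes a gap-padded alignment of equal-length strings, excludes any column containing a gap, and uses arbitrary window and step sizes (hypothetical defaults).

```python
from itertools import combinations


def window_pi(columns) -> float:
    """Mean pairwise differences per site over the gap-free columns given."""
    usable = [col for col in columns if "-" not in col]
    if not usable:
        return float("nan")
    n_seqs = len(usable[0])
    n_pairs = n_seqs * (n_seqs - 1) / 2
    # Count, column by column, the sequence pairs that carry different bases.
    differences = sum(
        sum(1 for a, b in combinations(col, 2) if a != b) for col in usable
    )
    return differences / (n_pairs * len(usable))


def sliding_window_pi(alignment, window=50, step=25):
    """Yield (window midpoint, pi) along a list of equal-length aligned strings."""
    columns = list(zip(*(s.upper() for s in alignment)))
    for start in range(0, len(columns) - window + 1, step):
        yield start + window // 2, window_pi(columns[start:start + window])
```

Plotting the returned midpoints against π shows directly whether the centre of the aligned LTRs is more variable than their ends.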
CLUSTAL alignment clearly showed a conserved putative TATA box promoter in both the Gypsy (TATAAA) and the Copia (TATATATA) LTRs. To analyze the structure of the putative RE promoter, the isolated Copia and Gypsy LTRs were scanned for cis-elements against the PLACE database [31]. In all the LTRs analyzed, stress-responsive cis-elements such as Myb, Myc, and WRKY motifs were found. Cis-elements typical of constitutively expressed genes, such as Dof-related elements [32] and CACT boxes [33], were also observed. Many putative light-responsive elements, such as GATA boxes and GT1 binding sites [34], and tissue-specific motifs, such as SEF3 binding sites [35], were also found in all the LTRs analyzed. All these elements may account for the observed RE expression. Notably, similar cis-elements can be observed on both strands of the analyzed LTRs.
Table 2 legend: T, total number of EST matches; S, transcripts from stressed tissues; U, transcripts from unstressed tissues. Matches were considered reliable according to their E-value (e-5), with a variable minimum query coverage: 10% for G13 (710 bp long) and G22 (740 bp long), 45% for G30 (467 bp long) and C211 (252 bp long).
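The strand-aware search for cis-elements in the LTRs can be illustrated with a simple motif scanner. This sketch does not reproduce the PLACE database or its search tool; the motif list is an assumption limited to the two TATA variants reported above plus a few short consensus cores (GATA box GATA, Dof core AAAG, CACT box YACT, in IUPAC code) that should be checked against the actual PLACE entries before use. Both strands are searched, since similar elements were observed on both LTR strands.

```python
import re

# IUPAC nucleotide codes used by the motifs below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "W": "[AT]", "S": "[CG]", "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# Illustrative motif set (assumed cores; verify against the PLACE entries).
MOTIFS = {
    "TATA box (Gypsy)": "TATAAA",
    "TATA box (Copia)": "TATATATA",
    "GATA box": "GATA",
    "Dof core": "AAAG",
    "CACT box": "YACT",
}


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def motif_pattern(motif: str):
    # A zero-width lookahead lets overlapping occurrences be reported.
    return re.compile("(?=" + "".join(IUPAC[b] for b in motif) + ")")


def scan_both_strands(seq: str, motifs=MOTIFS) -> dict:
    """Map motif name -> sorted [(strand, 1-based leftmost position on + strand)]."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    hits = {}
    for name, motif in motifs.items():
        pattern = motif_pattern(motif)
        found = [("+", m.start() + 1) for m in pattern.finditer(seq)]
        # A match at index i of the reverse complement covers + strand
        # positions len(seq) - i - len(motif) + 1 .. len(seq) - i (1-based).
        found += [("-", len(seq) - m.start() - len(motif) + 1)
                  for m in pattern.finditer(rc)]
        hits[name] = sorted(found, key=lambda hit: (hit[1], hit[0]))
    return hits
```

Short cores such as AAAG are expected by chance about once every 256 bp per strand in a sequence with equal base frequencies, so hits from a scan like this are candidates rather than demonstrated functional elements.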

Insertion of new REs in the genome
To investigate whether Copia and Gypsy transcriptional activity leads to the integration of daughter copies into the genome of Helianthus annuus, the IRAP protocol [36] was applied to detect polymorphisms within the HCM line. This sunflower line had undergone eighteen self-pollination cycles and can therefore be considered homozygous, as also indicated by its phenotypic uniformity. Because IRAP produces an RE fingerprint from the amplification of the sequence between neighbouring LTRs, a newly integrated copy of an LTR retroelement can produce a polymorphic band if it inserts close enough to a second element for the intervening sequence to be amplified by Taq DNA polymerase.
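The condition under which a new insertion yields an IRAP band can be expressed as a small model. In this sketch, under stated assumptions, elements are placed on one chromosome with coordinates and orientations, each LTR-end primer faces outward from its element, and only adjacent elements separated by less than a hypothetical maximum amplifiable length (`max_len`, default 3,000 bp) give a product. The reported length is the spacer only, ignoring the LTR stretch between each primer and the element end.

```python
from typing import NamedTuple


class Element(NamedTuple):
    start: int   # leftmost coordinate of the element (bp)
    end: int     # rightmost coordinate of the element (bp)
    strand: str  # "+" or "-": orientation of the element


def outward_primer(element: Element, side: str) -> str:
    """Name of the LTR-end primer that points out of the element on one side."""
    if side == "left":
        return "5'LTR" if element.strand == "+" else "3'LTR"
    return "3'LTR" if element.strand == "+" else "5'LTR"


def irap_products(elements, max_len=3000, primers=("5'LTR", "3'LTR")):
    """List (left, right, primer pair, spacer length) for amplifiable neighbours.

    Only adjacent elements are considered, and both outward-facing primers
    must be present in the reaction.
    """
    ordered = sorted(elements, key=lambda e: e.start)
    products = []
    for left, right in zip(ordered, ordered[1:]):
        spacer = right.start - left.end - 1  # bases between the two elements
        pair = (outward_primer(left, "right"), outward_primer(right, "left"))
        if 0 <= spacer <= max_len and all(p in primers for p in pair):
            products.append((left, right, pair, spacer))
    return products
```

Comparing the output for a parental element map with the same map plus one extra element shows the kind of new band sought in the embryo survey: an insertion about 1 kb from a resident copy creates a product, while one far from any other element stays invisible to IRAP.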
To detect RE-related polymorphisms, primers were designed on the 5' and 3' LTR ends of Copia and Gypsy elements. Since RE insertions are mutagenic and may cause plantlet development to abort, new RE integration events were surveyed in sunflower embryos from four inflorescences, one for each developmental stage. Sixty-four embryos (7, 14, 21, or 28 days after pollination) were tested using Copia- and Gypsy-specific LTR primers. Only one combination of LTR-specific primers (FF C-LTR/C-LTR2), matching a Copia retroelement, produced a clear polymorphic band, in a 14-day-old embryo (Fig. 3). This band was recovered from the gel, cloned and sequenced (EMBL acc. number FM209477). The sequence was delimited by the two primers, and both the 3' and 5' Copia LTR ends were present, ruling out nonspecific primer annealing. The sequence between the two LTRs (665 bp long) was isolated from other HCM plants using primers designed on this inter-retrotransposon sequence, and it showed 100% similarity to the same locus in the polymorphic band. To assess whether Copia LTRs flanked this locus in plants of the HCM line, PCRs were performed with primers pointing outward from the genomic locus (Fig. 4). Of the two possible amplified fragments, only one, of the expected size, was obtained; it contained a portion of a 5' LTR, indicating the presence of a Copia element on that side of the locus. This suggests that a new Copia retrotransposon had inserted on the other side of the locus in the embryo.
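Scoring IRAP gels for a band present in only one embryo amounts to finding rare presences in a presence/absence table. The sketch below assumes band sizes have already been binned consistently across lanes; the input format and the `max_carriers` threshold are hypothetical choices, not the scoring procedure used in the study.

```python
def rare_bands(scores: dict, max_carriers: int = 1) -> dict:
    """Return {individual: sorted rare bands} for bands carried by few individuals.

    scores maps an individual ID to the set of (binned) band sizes in bp
    scored in its lane.
    """
    carriers = {}
    for bands in scores.values():
        for band in bands:
            carriers[band] = carriers.get(band, 0) + 1
    rare = {band for band, count in carriers.items() if count <= max_carriers}
    # Keep only individuals that carry at least one rare band.
    return {ind: sorted(bands & rare) for ind, bands in scores.items()
            if bands & rare}
```

With `max_carriers=1`, a band seen in a single lane of an otherwise uniform selfed progeny is flagged, which is the signature expected from a new somatic or germinal insertion.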

Discussion
McClintock [37] regarded the presence and activity of mobile elements within the host as an opportunity for the genome to cope with challenges it is not prepared to meet. The genome-restructuring action of mobile elements would make additional genetic variability available, increasing the chance of overcoming the challenge.
In the present work, RT-PCR experiments demonstrated that the Copia and Gypsy RE families investigated are transcribed in all the Helianthus annuus tissues analyzed, i.e. roots, leaves, flowers and embryos. Transcripts showed very low ratios of non-synonymous to synonymous substitution rates. Similar ratios were reported for Copia, Gypsy, and LINE REs in other Asteraceae, such as Hieracium aurantiacum, Taraxacum officinale and Antennaria parlinii, suggesting conservative selection [38]. Nine out of ten RE-mRNA transcripts did not show any in-frame stop codon, supporting the presence of a potentially functional segment of RE pol proteins. That "parasite" sequences with no apparent function for the host genome tend to maintain their amino acid sequence may be somewhat unexpected. A possible explanation is that only recently inserted elements (i.e., those that have not yet accumulated mutations) are functional.
BLAST screening against EST databases showed that RE families related to those studied here are widely transcribed in different Helianthus species and in other Asteraceae. Interestingly, G13 and G22 matches are distributed across several species belonging to different genera. It should be noted that, in other dicots, retrotransposon ESTs were reported to match only their host species [18].

The isolation and sequencing of a number of full-length Copia and Gypsy 5' LTRs showed the occurrence of a proper TATA box and putative cis-elements in their sequence. Transcripts were isolated using a poly(T) primer targeting the 3' poly(A) tail, showing that the 3' ends of RE transcripts were processed by the host. The occurrence of promoter sequences and functional protein sequences, RE transcription and RE mRNA 3'-end processing are all hints of autonomous retrotransposons.
It is known that REs are subject to inactivation by either mutation or chromatin condensation. Replication allows these elements to survive as genome parasites, but the higher the replication rate, the lower the host fitness and, consequently, the survival of the REs. In sunflower, although RE expression is widespread in all the tissues analyzed, IRAP experiments revealed only one convincing polymorphism, which was attributable to a new integration event. This suggests that, despite substantial transcriptional activity, RE mRNAs are quelled and the insertion of new REs is inhibited at a post-transcriptional level, as shown in other species including humans (see [20]).
The amplification of REs in the genome may serve some functions for the host. For example, putative LTR promoter elements such as those observed in sunflower could be used by the host to regulate the expression of nearby genes. Constitutively active LTR promoters could drive housekeeping expression, while tissue-specific LTR promoters would drive gene expression in those tissues. An example of gene transcription related to the activity of adjacent retrotransposons was reported in wheat [39]. In mouse oocytes, retrotransposon-related transcripts are predominant in the mRNA pool, and LTR promoters are responsible for the transcription of a set of genes [40].

The ubiquity of RE transcripts in sunflower tissues, and the fact that REs are actively transcribed under standard culture conditions, can support the idea that retrotransposons may be integrated into cell metabolism. For instance, a basal level of retrotransposon transcription would provide the "raw material", namely dsRNAs, that can trigger RE silencing via RNA-directed DNA methylation and chromatin remodeling [41] or via a post-transcriptional mechanism [23]. Double-stranded RNA precursors may originate from transcribed nested, head-to-tail oriented LTRs [39], from read-through transcription of two head-to-tail oriented elements, or from antisense strand transcription [42][43][44].
In this sense, low-copy elements would be the most hazardous for the host: head-to-tail arrangements of such elements would be rare in the genome, reducing the efficiency of silencing mechanisms. Accordingly, the few plant elements for which new insertion events have been shown, namely the three Copia-like elements Tnt1, Tto1, and Tos17, are present at relatively low copy numbers (< 1,000) per haploid genome (see [45]).
Previous analyses of sunflower REs revealed that they are highly methylated [28]. The RE families investigated in this study are present in several thousand copies within the genome and are possibly methylated. However, the widespread transcription of these elements suggests that RE silencing in this species also occurs through degradation of RE mRNAs.

Conclusion
Retrotransposon transcription was shown in all the sunflower tissues analyzed in our experiments. RE activity does not appear to be induced by environmental factors or by culture conditions. In one of the 64 embryos surveyed, a new RE insertion occurred, possibly causing a mutation.
We can speculate that in sunflower the rarity of insertion events observed in our experiments, despite the substantial transcriptional activity of the Copia and Gypsy RE families investigated, is linked to post-transcriptional regulation of RE activity, probably through the degradation of target RE mRNAs.

Plant materials, DNA and RNA extraction
Roots, leaves, embryos, and flowers were collected from field-grown plants of the HCM line of Helianthus annuus. The HCM line was developed at the Dept. of Crop Plant Biology of the University of Pisa through 18 self-pollination cycles, starting from an open-pollinated cultivar, and is highly homozygous, as indicated by its phenotypic uniformity. Self-pollination was obtained by covering inflorescences to prevent outcrossing. After sampling, tissues were ground in liquid nitrogen. DNA was extracted from embryos and leaves using a CTAB protocol [46] with minor modifications. For total RNA extraction, a MES-guanidine hydrochloride-containing buffer was used following the protocol described by Logemann et al. [47].
Figure 4 Schematic representation of polymorphism analysis. A polymorphic band was detected in HCM embryo #15 (see Fig. 3). Forward and reverse primers designed within the inter-retrotransposon locus and directed outward (blue arrows) were coupled with specific LTR primers (black arrows). Only one fragment was amplified from plants of the HCM line.
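The expectation that 18 selfing cycles yield a highly homozygous line follows from the halving of heterozygosity at each generation of self-pollination, H_t = H_0 (1/2)^t. The sketch below computes it, assuming no selection favouring heterozygotes and no new mutation; it gives a per-locus expectation, not a measurement for the HCM line.

```python
def residual_heterozygosity(generations: int, h0: float = 1.0) -> float:
    """Expected heterozygosity after repeated selfing: H_t = H_0 * (1/2)**t."""
    return h0 * 0.5 ** generations


def inbreeding_coefficient(generations: int, f0: float = 0.0) -> float:
    """F_t = 1 - (1 - F_0) * (1/2)**t under exclusive self-pollination."""
    return 1 - (1 - f0) * 0.5 ** generations
```

For t = 18 the residual heterozygosity is (1/2)^18 = 1/262,144, about 3.8 × 10^-6 of the starting value, so a new insertion such as the one detected by IRAP would stand out against an essentially uniform genetic background.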

RNA purification
An RNA purification protocol was specifically tailored to avoid contamination by genomic DNA, since DNA remnants would invalidate RT-PCR analyses. Such stringency is especially crucial when analyzing RE expression, because REs are highly abundant in the genome and even trace amounts of DNA template can be amplified.

Expression analyses by RT-PCR
For reverse transcription, total RNA (5 μg) was heated for 3 min at 70°C and reverse-transcribed in a 20 μl reaction containing 400 μM of each deoxynucleoside triphosphate, 0.25 μM poly(T) primer, 1× RT buffer, 1 mM DTT and 200 U SuperScript III Reverse Transcriptase (Life Technologies). The same quantity of RNA was processed as above but without reverse transcriptase and used as a negative control in RT-PCR.
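The logic of the no-reverse-transcriptase control can be written down explicitly: a band of the expected size in the +RT reaction counts as evidence of transcription only if the matching no-RT reaction is blank. This is a hypothetical scoring helper (lane format, size tolerance and labels are assumptions), not software used in the study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lane:
    sample: str                # tissue or stage, e.g. "leaf"
    with_rt: bool              # True: +RT reaction; False: no-RT control
    band_bp: Optional[int]     # size of the scored band in bp, None if no band


def call_expression(lanes, expected_bp: int, tolerance_bp: int = 20) -> dict:
    """Classify each sample as 'expressed', 'not detected' or 'DNA contamination'."""
    def has_expected_band(lane: Lane) -> bool:
        return (lane.band_bp is not None
                and abs(lane.band_bp - expected_bp) <= tolerance_bp)

    calls = {}
    for sample in sorted({lane.sample for lane in lanes}):
        own = [lane for lane in lanes if lane.sample == sample]
        plus_rt = any(has_expected_band(x) for x in own if x.with_rt)
        minus_rt = any(has_expected_band(x) for x in own if not x.with_rt)
        if minus_rt:
            # A product without reverse transcriptase implies a DNA template.
            calls[sample] = "DNA contamination"
        elif plus_rt:
            calls[sample] = "expressed"
        else:
            calls[sample] = "not detected"
    return calls
```

Checking the no-RT lane first means a contaminated sample is never reported as expressed, which matters for repeats present in thousands of genomic copies.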

Sequence analysis
Sequences were aligned using CLUSTAL W [48], with some manual adjustments. Sequence polymorphism statistics were computed using DnaSP version 3.51 [49]. Nucleotide diversity (π, i.e. the average number of nucleotide differences per site) and its sampling variance were calculated according to Nei [50], equations 8.4 and 8.12, replacing 2n by n.
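For reference, the estimator can be written in its standard form for a sample of n sequences (the haploid form, i.e. with n in place of 2n), where \hat{x}_i is the frequency of the i-th sequence in the sample and \pi_{ij} the proportion of differing sites between sequences i and j. Treating every sequence as its own type (\hat{x}_i = 1/n, \pi_{ii} = 0) reduces it to the mean pairwise p-distance. The equation numbering of [50] is not reproduced here, and the variance formula is omitted.

```latex
\hat{\pi} \;=\; \frac{n}{n-1}\sum_{i=1}^{n}\sum_{j=1}^{n}\hat{x}_i\,\hat{x}_j\,\pi_{ij}
\;=\; \frac{n}{n-1}\cdot\frac{1}{n^{2}}\cdot 2\sum_{i<j}\pi_{ij}
\;=\; \binom{n}{2}^{-1}\sum_{i<j}\pi_{ij}
```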
Relationships among LTR sequences were investigated by the neighbour-joining (NJ) method (Kimura distance), using the PHYLIP package version 3.572 [51]: after sequence alignment, 500 versions of the original alignment were generated using the SEQBOOT program; trees were then generated using the PROTDIST (or DNADIST) and NEIGHBOR programs with default options. A strict consensus tree was obtained from the available trees using the CONSENSE program.
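The distance step (Kimura two-parameter model) and the SEQBOOT-style resampling of alignment columns can be sketched as follows. This is a minimal sketch under stated assumptions, not PHYLIP: columns with gaps or ambiguous bases are skipped pairwise, the seed is arbitrary, and building NJ trees from each replicate matrix and their strict consensus (NEIGHBOR, CONSENSE) are left out. The distance is d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), with P and Q the proportions of transitional and transversional differences.

```python
import math
import random


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance; pairs with gaps or ambiguities are skipped."""
    purines = {"A", "G"}
    pyrimidines = {"C", "T"}
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        if {a, b} <= purines or {a, b} <= pyrimidines:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    p, q = transitions / sites, transversions / sites
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def distance_matrix(seqs: dict):
    """Return (names, square K2P matrix) for {name: aligned sequence}."""
    names = list(seqs)
    return names, [[k2p_distance(seqs[a], seqs[b]) for b in names] for a in names]


def bootstrap_replicates(seqs: dict, n_reps: int = 500, seed: int = 1):
    """SEQBOOT-like resampling: redraw alignment columns with replacement."""
    rng = random.Random(seed)
    names = list(seqs)
    length = len(seqs[names[0]])
    for _ in range(n_reps):
        cols = [rng.randrange(length) for _ in range(length)]
        yield {name: "".join(seqs[name][c] for c in cols) for name in names}
```

Each replicate returned by `bootstrap_replicates` can be passed to `distance_matrix`, giving the 500 matrices that a tree-building step would turn into trees. For very divergent pairs the logarithm's argument can become non-positive; the distance is then undefined and `math.log` raises `ValueError`.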

Isolation of Gypsy LTRs
To isolate full-length Gypsy LTRs, a two-step PCR protocol was applied [30]. First, chromosome walking towards the putative partial 3' LTR was performed: specific retrotransposon forward primers designed on a conserved domain of a Gypsy integrase gene (GenBank Acc. Nr. AJ532592) (Gypsy ChWp4, Table 4) were coupled with a random-annealing reverse primer (5'-ACCATCGTCCTCAGGTTAGTCAGG-3', Ra A-P). PCR products were amplified in a 20 μl reaction containing 30 ng DNA, 2.5 mM MgCl2, 0.5 μM primers and 1 U Taq FirePol (Biodyne) DNA polymerase. Thermocycling was performed at 94°C for 30 s, 60°C for 30 s and 72°C for 2 min, for 30 cycles. Products longer than 1,000 bp were cloned and sequenced as above. Clustal analyses were performed to locate the putative polypurine tract (PPT), which usually lies a couple of nucleotides upstream of the beginning of the 3' LTR. Because of the remarkable LTR sequence variability and the lack of large conserved sequence stretches, the 3' boundaries of the 3' LTR within the sequenced clones could not be determined at this stage.
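Locating a putative PPT and the canonical TG at the start of the 3' LTR can be approximated with a simple string search. The sketch below is a heuristic, not the Clustal-based inspection used in the study: it reports uninterrupted runs of at least `min_len` purines (a hypothetical threshold) and the first TG dinucleotide starting within a few bases downstream of each run.

```python
import re


def ppt_candidates(seq: str, min_len: int = 10, search: int = 6):
    """Yield (ppt_start, ppt_end, tg_pos) for runs of >= min_len purines.

    Positions are 0-based and ppt_end is exclusive. tg_pos is the start of the
    first 'TG' beginning within `search` bases after the run, or None.
    """
    seq = seq.upper()
    for match in re.finditer(f"[AG]{{{min_len},}}", seq):
        # str.find's end bound is exclusive, so +2 lets a TG start at +search.
        tg = seq.find("TG", match.end(), match.end() + search + 2)
        yield match.start(), match.end(), (tg if tg != -1 else None)
```

Real PPTs may contain an occasional pyrimidine, so a run requirement this strict will miss some of them; relaxing it (for example, allowing one C or T per window) is a straightforward extension.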
In the second step, complete 5' LTRs were isolated. Because the two LTRs of a retrotransposon are identical at the time of integration, primers designed in the 3' LTR would be expected to match both LTRs. Therefore, specific forward primers were designed downstream of the putative PPT, matching the Gypsy-like 3' LTR (P1FGypsy and P2FGypsy, at bases 51-75 and 84-109 after the canonical 5'-TG, respectively), and coupled with a universal primer designed on the primer binding site (PBS) related to the tRNAMet sequence, pointing towards the 5' LTR (PBSmet, 5'-TAGGTCGGAACAGGCTCTGATACCA-3' [52]). Thermocycling was performed at 94°C for 30 s, 57°C for 30 s and 72°C for 60 s, for 30 cycles. PCR products from a semi-nested PCR with the P2FGypsy and PBS primers were visualized on an EtBr-stained agarose gel, cloned as above and sequenced.
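Whether a forward primer in the 3' LTR and a reverse primer such as PBSmet can prime a product from a given template can be checked in silico. This sketch assumes exact primer matches and a hypothetical maximum product length; real annealing tolerates mismatches, especially towards the primer 5' end, so it only approximates the semi-nested PCR described above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def in_silico_pcr(template: str, fwd: str, rev: str, max_len: int = 5000) -> list:
    """Return sorted product lengths for exact primer matches on `template`.

    `fwd` must occur on the given strand; `rev` (written 5'->3') must occur as
    its reverse complement downstream of the forward site.
    """
    t = template.upper()
    fwd = fwd.upper()
    rev_site = revcomp(rev)
    products = []
    start = t.find(fwd)
    while start != -1:
        end = t.find(rev_site, start + len(fwd))
        while end != -1:
            length = end + len(rev_site) - start
            if length <= max_len:
                products.append(length)
            end = t.find(rev_site, end + 1)
        start = t.find(fwd, start + 1)
    return sorted(products)
```

Running the function with P1FGypsy and then with P2FGypsy against the same template should give two products whose lengths differ by the spacing of the two forward primers, the expected signature of the semi-nested design.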

Polymorphic band analysis
A polymorphic fragment was recovered from the gel, cloned into the pGEM-T Easy vector following the manufacturer's instructions, and sequenced. Specific primers (EMB 1F, EMB 1R, Table 4) were designed to amplify the same genomic locus by PCR in individuals of the HCM line, using 30 ng DNA, 2.5 mM MgCl2, 0.5 μM primers (final concentration) and 1 U Taq FirePol (Biodyne) DNA polymerase in a 20 μl reaction. Thermocycling was performed at 94°C for 30 s, 60°C for 30 s and 72°C for 2 min. Primers EMB 2F and EMB 2R (Table 4) were designed on the isolated polymorphic locus. The LTR primers (FF C-LTR, C-LTR2, Table 4) were coupled with EMB 2F and EMB 2R, and PCR was performed using the same reaction mixture as above, at 94°C for 30 s, 57°C for 30 s and 72°C for 1 min. PCR products were cloned as above and sequenced.
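The per-reaction volumes for a 20 μl PCR mix such as the one above follow from C1V1 = C2V2. The stock concentrations in the example below (25 mM MgCl2, 10 μM primers, 10 ng/μl DNA, 5 U/μl polymerase) are hypothetical, and buffer and dNTPs, which the text does not specify, would have to be added to the component list.

```python
def mix_volumes(total_ul: float, concentrations: dict, amounts: dict) -> dict:
    """Per-reaction volumes (ul); water makes up the remaining volume.

    concentrations: {name: (stock, final)} in the same units (C1*V1 = C2*V2).
    amounts:        {name: (stock per ul, amount per reaction)}, e.g. ng or U.
    """
    volumes = {name: total_ul * final / stock
               for name, (stock, final) in concentrations.items()}
    volumes.update({name: amount / stock
                    for name, (stock, amount) in amounts.items()})
    water = total_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("components exceed the reaction volume")
    volumes["water"] = water
    return volumes
```

For example, `mix_volumes(20, {"MgCl2": (25, 2.5), "primer F": (10, 0.5), "primer R": (10, 0.5)}, {"DNA": (10, 30), "Taq": (5, 1)})` gives 2.0 μl MgCl2, 1.0 μl of each primer, 3.0 μl DNA, 0.2 μl polymerase and about 12.8 μl water.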