Adult and juvenile phases in A. acetabulum differ significantly in gene expression
The expression analysis (Fig. 2) and the fact that there is virtually no overlap between the two libraries suggest that the subtraction succeeded in isolating differentially expressed transcripts.
The 941 ESTs were organized into 675 independent clusters or singletons. Although this number is probably an overestimate – non-overlapping ESTs originating from the same transcript may partition to different clusters or remain singletons – these ESTs only represent a portion of all the ESTs present in the libraries. These data provide strong evidence that the adult and juvenile phases in A. acetabulum differ significantly in gene expression and that a large number of genes are probably phase-specific.
Physiological differences between the two developmental phases
The functions of the transcripts expressed at the different phases partition differently into functional classes (Fig. 4). Given that the libraries were created such that only ESTs specific to one developmental phase would be isolated, the distribution of gene functions among the ESTs is not expected to reflect that of a typical photosynthetic cell, but merely the functions that are specific to one developmental phase or the other.
Juveniles seem to devote much of their unique gene expression to transcription, protein synthesis, transport and storage, consistent with the general idea that juveniles are fast-growing and more dedicated to growth than to morphogenesis. During the juvenile phase, the unicells increase about 10-fold in height (from <1 mm to 1 cm) in 2 weeks and are not competent to make a cap. On the other hand, adults increase in height about 3-fold, another 3 centimeters. Adult cells are much more complex than juvenile cells, with numerous whorls of hair that are highly branched. There is thus an increased requirement for cell wall synthesis during the adult phase. Adults also are competent to execute complex cap morphogenesis, and are preparing for nuclear divisions, nuclei transport and gametogenesis [21]. There is not a large increase in cytoplasmic volume between juveniles and adults because most of the volume within an adult is occupied by a central vacuole (Ngo et al., submitted). This more complex adult development is consistent with a lower percentage of transcripts dedicated to protein synthesis and a higher percentage of transcripts involved in cell structure (e.g. cytoskeletal proteins, enzymes involved in cell wall synthesis or maintenance, histones).
A surprisingly high number of ESTs from both libraries are associated with photosynthesis (Additional file 3, class 2). For example, five ESTs were putative homologs of the rbcS protein. It is possible that these ESTs truly come from different transcripts. It is also possible that they originate from the same transcript but have not been clustered together because they do not overlap or because of regions of poor sequence quality. To address this question, we aligned these sequences to the A. acetabulum rbcS mRNA sequence present in Genbank (Fig. 6). In each EST or cluster, we observed a region of high sequence identity to the Aa-rbcS sequence (identity varied between 84 and 89% at the nucleotide level). Two of the clusters (cn115 and cn116) were identical except for sequence ambiguities, which were frequent enough to prevent these two sequences from being clustered together. With the exception of cn116 and cn115, the sequences were different enough from each other and from the Aa-rbcS sequence to conclude that they did not originate from the same transcript. In support of this hypothesis, regions outside of these fragments of very high sequence identity could not be aligned (<50% identity). At the amino acid level, the sequence identity between these regions (white boxes in Fig. 6) and the Aa-rbcS sequence was almost complete, varying between 94 and 96%. A closer look at the sequences confirmed that most of the nucleotide differences present in the ORFs occurred at the third position of a codon, often resulting in the conservation of the amino acid sequence. Finally, one of the clusters (cn115) ends with a polyA tract, indicating the end of the 3'UTR, while the transcript that generated cluster 146 seems to possess a much longer 3'UTR that contains no polyA tract. Taken together, these results suggest that there may be at least 3 different loci coding for the rbcS protein in Acetabularia (cn169, J538 and Aa-rbcS, Fig. 6).
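The comparison described above (nucleotide identity over an aligned coding region, and the position within the codon at which differences fall) can be illustrated with a minimal sketch. The function names and the toy sequences below are hypothetical and are not the rbcS data. The sketch assumes two ungapped, equal-length sequences that are already aligned and in frame.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two aligned, ungapped, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def mismatches_by_codon_position(a: str, b: str) -> list:
    """Count mismatches at codon positions 1, 2 and 3.

    Assumes both sequences start at the first base of a codon.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be of equal length")
    counts = [0, 0, 0]
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            counts[i % 3] += 1
    return counts


# Toy, made-up sequences (NOT real Acetabularia data).
# Codons: ATG GCT TCC TCA  versus  ATG GCC TCT TCA
reference = "ATGGCTTCCTCA"
est = "ATGGCCTCTTCA"

identity = percent_identity(reference, est)
by_position = mismatches_by_codon_position(reference, est)
# Both differences fall at third codon positions and are synonymous
# (GCT/GCC both encode Ala; TCC/TCT both encode Ser).
print(round(identity, 1), by_position)
```

In this toy case the nucleotide identity is lower than the amino acid identity, which is the pattern expected when differences are concentrated at third codon positions.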
Confirmation of these surprising results with different methods and detailed expression analysis of the different loci will be of great interest. A similar analysis of the chlorophyll a/b binding proteins might also determine whether different proteins are expressed during different phases and whether they are functionally distinct and/or differentially regulated. The LHC (light-harvesting complex) binding proteins form a very large family that has been best characterized in land plants [22]. Analysis of the 22 ESTs with identity to LHC binding proteins from land plants will improve our currently poor knowledge of this gene family in algae.
Putative gene functions of particular interest
Immuno-cytochemistry has been used to visualize tubulin and actin proteins in A. acetabulum during development [23, 24]. Actin microfilaments were found in thalli of all ages forming continuous, parallel bundles along the entire stalk. Microtubules, conversely, could not be detected in the alga prior to meiosis. Microtubules were detected during reproduction, surrounding haploid nuclei as they are transported up into the cap. Consistent with these results, our data suggest that juveniles express actin transcripts (but not tubulin) while adults express both alpha and beta tubulin (but not actin) transcripts, presumably in preparation for reproduction.
Two of our ESTs were putative expansin homologues. Expansins promote the extension of cellulose walls in land plants. Typical of the "mannan weeds", the wall of the diplophase of A. acetabulum is predominantly a para-crystalline mannan framework (Dunn et al., submitted). Only gametangia are enclosed in a cellulosic wall, itself surrounded by the mannan wall of the cap. Consistent with this, our results indicate that the expansin gene was expressed during the adult but not the juvenile phase. So far, expansins have only been found in land plants [25], where expansin acts within the cell wall and is activated by an acidic pH [26]. If these transcripts code for expansin proteins that play a role in loosening walls in A. acetabulum, it would be interesting to see if their mechanism of action is similar to that in land plants, and whether their substrate is also a cellulose wall.
A. acetabulum, for most of its life cycle, contains only one nucleus, which is located in the rhizoid. This nucleus undergoes replication at the end of adult phase, during reproduction [27]. At this juncture, there is a tremendous need for nucleotides and histones to make the millions of haploid nuclei needed for gametogenesis [21]. Consistent with this, histone mRNAs were found only in the adult library (Additional file 3, class 9). It would be interesting to look more deeply into when these transcripts are expressed and how this organism is able to produce histone proteins in such high quantities in such a short period of time.
Finally, one of the ESTs has homology to an argonaute protein (Additional file 3, class 11, E value to Oryza sativa argonaute of 5e-15). Argonaute proteins are highly conserved and play a major role in RNA interference in animals (a.k.a. quelling in fungi or post-transcriptional gene silencing in plants [28]). These processes are involved in the silencing of specific genes via double stranded RNA [29] and their importance in post-transcriptional regulation is just starting to be deciphered. Argonaute proteins have been found in land plants, ciliates, animals and fungi but, to the best of our knowledge, this EST is the first identified algal sequence of an argonaute protein.
Why most of the ESTs do not correspond to any previously described sequences
We can think of three reasons why only 28.6% of the ESTs, a particularly low number, were assigned a putative homolog based on BLAST and InterPro searches. First, these are subtracted libraries, created with the objective of identifying rare, phase-specific transcripts or transcripts involved in morphogenesis, apical growth, or phase change. Hence, these ESTs should include fewer housekeeping transcripts, abundant transcripts, or transcripts common to both phases or to other organisms.
Second, the ESTs were generated by a reverse transcriptase using a poly-T primer that often does not generate full-length cDNAs. Our libraries therefore tend to be enriched in 3' ends of the transcripts, which contain non-coding sequences and which would not be recognized in homology searches. The high percentage of ESTs containing a polyA or polyT stretch supports this hypothesis.
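As an illustration of how ESTs carrying such a tract might be counted, here is a minimal sketch. The function names, the 10-base minimum tract length, and the rule that a polyA tract is sought at the end of the read and a polyT tract at its start are all assumptions made for this example, not the criteria used in this study.

```python
import re


def has_terminal_poly_tract(seq: str, min_len: int = 10) -> bool:
    """Return True if the read ends with >= min_len A's or starts with >= min_len T's.

    min_len is an illustrative threshold, not a value taken from the study.
    """
    s = seq.upper()
    ends_with_polya = re.search(r"A{%d,}$" % min_len, s) is not None
    starts_with_polyt = re.match(r"T{%d,}" % min_len, s) is not None
    return ends_with_polya or starts_with_polyt


def fraction_with_tract(seqs, min_len: int = 10) -> float:
    """Fraction of reads in seqs that carry a terminal polyA or polyT tract."""
    seqs = list(seqs)
    if not seqs:
        raise ValueError("no sequences given")
    return sum(has_terminal_poly_tract(s, min_len) for s in seqs) / len(seqs)


# Made-up reads for illustration only.
reads = [
    "GATTACA" + "A" * 12,   # ends with a polyA tract
    "T" * 12 + "GATTACA",   # starts with a polyT tract (opposite strand)
    "GATTACAAAA",           # only four terminal A's: below the threshold
]
print(fraction_with_tract(reads))
```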
Finally, A. acetabulum belongs to the order Dasycladales, in the green algal class Ulvophyceae, for which very little sequence data is currently available. Before the addition of our ESTs, only 73 DNA sequences from A. acetabulum were available in Genbank, representing just 37 different genes. Although complete genomes of several land plants and green algae are now at least partially available, it is plausible that most of the A. acetabulum sequences are too divergent from those of other algae or land plants to be recognized as orthologs when entered in BLAST searches [30]. To test this hypothesis, we raised the cut-off value for the BLASTN and BLASTX searches against the Genbank databases from 10E-06 to 10E-03. Most of the additional hits obtained originated from algal or land plant sequences, rather than being randomly distributed among the organisms represented in Genbank. This supports the hypothesis that these ESTs are probably homologous to these algal or plant sequences but too divergent for the homologies to be trusted.
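The logic of this test can be sketched as follows: for each EST, take its best hit, keep only those ESTs whose best hit fails the strict cut-off but passes the relaxed one, and tally the taxonomic group of those hits. The `Hit` record, the group labels, the example hits, and the numeric cut-offs used below are all hypothetical placeholders. They are not the study's search output.

```python
from collections import Counter
from typing import NamedTuple


class Hit(NamedTuple):
    query: str     # EST identifier (hypothetical)
    group: str     # taxonomic group of the subject sequence
    evalue: float  # BLAST expectation value


def newly_assigned(hits, strict: float, relaxed: float):
    """Best hit per query, restricted to queries gained by relaxing the cut-off."""
    best = {}
    for h in hits:
        if h.query not in best or h.evalue < best[h.query].evalue:
            best[h.query] = h
    return [h for h in best.values() if strict < h.evalue <= relaxed]


def tally_groups(hits) -> Counter:
    """Count hits per taxonomic group."""
    return Counter(h.group for h in hits)


# Made-up hits for illustration only.
hits = [
    Hit("est1", "land plant", 1e-20),  # already passes the strict cut-off
    Hit("est2", "green alga", 5e-5),
    Hit("est2", "animal", 2e-4),
    Hit("est3", "land plant", 8e-4),
    Hit("est4", "bacterium", 0.5),     # fails even the relaxed cut-off
]
gained = newly_assigned(hits, strict=1e-6, relaxed=1e-3)
print(tally_groups(gained))
```

If the additional hits were spurious, their taxonomic distribution would be expected to resemble that of the database as a whole rather than being concentrated in algae and land plants.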
Do adult and juvenile transcripts differ in structure? Insights into post-transcriptional regulation
Curiously, 40% of the juvenile clones but only 15% of the adult clones end with a polyA or polyT tract. If these tracts correspond to the mRNA polyA tail, then these ESTs contain some or all of the 3' untranslated regions (3' UTR) of the transcript from which they originated. We have diagrammed hypotheses explaining the differential occurrence of these tracts in adult versus juvenile clones (Fig. 7). The first explanation presumes an artifact of the techniques used to create the libraries. ESTs result from the amplification of cDNA fragments that have been digested by RsaI, each RsaI fragment having an equal chance of being amplified and cloned. If the adult cDNAs were more completely digested than the juvenile cDNAs, then the adult cDNAs would have generated a higher number of ESTs, a lower proportion of which would contain polyA tracts (Fig. 7a). A second hypothesis presumes differential mRNA length: if adult cDNAs were, on average, longer than juvenile cDNAs, each adult cDNA would produce more ESTs, yielding a lower proportion of ESTs containing the polyA tract. Adult cDNAs could be longer if on average they have longer coding sequences (Fig. 7b) or longer 3'UTRs (Fig. 7c). If they have longer 3'UTRs, the proportion of coding sequences as well as the proportion of ESTs with polyA tracts will be higher in juvenile ESTs than in adult ESTs, consistent with our findings.
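The arithmetic behind these hypotheses can be made explicit with a deliberately simplified model. Assume every cDNA extends to the polyA tail, is cut into some number of RsaI fragments, and only the 3'-most fragment carries the tract, with every fragment equally likely to be cloned. The expected proportion of ESTs with a polyA tract is then the number of cDNAs divided by the total number of fragments. The function names are hypothetical, and the model ignores incomplete digestion and cloning bias.

```python
def polya_fraction(fragments_per_cdna) -> float:
    """Expected fraction of cloned fragments carrying the polyA tract.

    fragments_per_cdna lists the number of RsaI fragments per cDNA.
    Exactly one fragment per cDNA (the 3'-most) is assumed to carry the tract.
    """
    counts = list(fragments_per_cdna)
    if not counts or any(n < 1 for n in counts):
        raise ValueError("each cDNA must yield at least one fragment")
    return len(counts) / sum(counts)


def implied_mean_fragments(fraction: float) -> float:
    """Mean number of fragments per cDNA implied by an observed polyA fraction."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return 1.0 / fraction


# Observed proportions from the text: 40% (juvenile) and 15% (adult).
juvenile = implied_mean_fragments(0.40)  # 2.5 fragments per cDNA
adult = implied_mean_fragments(0.15)     # about 6.7 fragments per cDNA
print(round(juvenile, 2), round(adult, 2))
```

Under this model, the observed proportions would correspond to adult cDNAs yielding between two and three times as many fragments as juvenile cDNAs, whether because of more complete digestion, longer coding sequences, or longer 3' UTRs. The model alone cannot distinguish between these alternatives.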
Why would 3' UTRs be longer in adult transcripts than in juvenile transcripts? In adult A. acetabulum, growth and morphogenesis occur almost exclusively at the stalk apex, centimeters away from the unique nucleus located in the rhizoid. Therefore, aspects of post-transcriptional regulation, such as mRNA stability and mRNA localization, are probably very important to the regulation of gene expression in these unicells. Indeed, more than half of the transcripts (9/16) studied to date in A. acetabulum are localized to one end or the other of the unicell, most often to its apex [31–33]. To achieve this localization, each transcript must contain cis-acting elements within its sequence, also called 'zipcodes' [34]. In yeast and animal cells, 'zipcodes' are part of the 3' UTR of the localized transcripts [35]. Also, considering the rate at which mRNA molecules move along cytoskeletal elements along the stalk of A. acetabulum [36, 37], any mRNA must be at least three days old by the time it reaches the apex, which places such transcripts among the "ultra-stable" mRNA species [38]. In plants, the cis-acting elements responsible for stability of an mRNA molecule are also located in its 3' UTR [38]. The 3' UTRs of transcripts might therefore play an important role in the regulation of gene expression in this species, especially in adults.
To achieve these stability and localization patterns, adult mRNAs probably contain several post-transcriptional regulatory elements within their 3' UTRs, potentially explaining why these would be longer. What are these regulatory elements? The fact that three conserved elements (Fig. 5) were found within several unrelated ESTs, most of which originate from the adult library, is promising. These three elements also appear to be specific to A. acetabulum and are located in a non-coding region of the carbonic anhydrase gene, whose transcript is apically localized [13]. Five of the ESTs containing the second conserved element also contain a polyA tract, suggesting that these ESTs may derive from 3' UTRs. The first and second conserved elements fall within introns of AaCA1 (Fig. 5). It is possible that these sequence elements of AaCA1 are part of alternatively spliced introns and are sometimes contained in the mature mRNAs produced from this gene. Future research will focus on elucidating the function of these conserved elements and their spatial expression during development.