Genomic expression profiling of mature soybean (Glycine max) pollen

Background: Pollen, the male partner in the reproduction of flowering plants, comprises either two or three cells at maturity. The current knowledge of the pollen transcriptome is limited to the model plant systems Arabidopsis thaliana and Oryza sativa, which have tri-cellular pollen grains at maturity. Comparative studies on pollen of other genera, particularly crop plants, are needed to understand the pollen gene networks that are subject to functional and evolutionary conservation. In this study, we used the Affymetrix Soybean GeneChip® to perform transcriptional profiling on mature bi-cellular soybean pollen.
Results: Compared to the sporophyte transcriptome, the soybean pollen transcriptome revealed a restricted and unique repertoire of genes, with a significantly greater proportion of specifically expressed genes than is found in the sporophyte tissue. Comparative analysis shows that, among the 37,500 soybean transcripts addressed in this study, 10,299 transcripts (27.46%) are expressed in pollen. Of the pollen-expressed sequences, 9,489 (92.13%) are also expressed in sporophytic tissues, and 810 (7.87%) are selectively expressed in pollen. Overall, the soybean pollen transcriptome shows an enrichment of transcription factors (mostly zinc finger family proteins), signal recognition receptors, transporters, heat shock-related proteins and members of the ubiquitin proteasome proteolytic pathway.
Conclusion: This is the first report of a soybean pollen transcriptional profile. These data extend our current knowledge regarding regulatory pathways that govern the gene regulation and development of pollen. A comparison between transcription factors up-regulated in soybean and those in Arabidopsis revealed some divergence in the numbers and kinds of regulatory proteins expressed in both species.


Background
In flowering plants, pollen development occurs in the anthers. The meiotic division of diploid sporogenous cells gives rise to a tetrad of haploid microspores. The microspores then undergo an asymmetric mitotic division, giving rise to a smaller generative cell enveloped within a larger vegetative cell [1]. The generative cell divides once again to give rise to the two haploid sperm cells required for double fertilization. In most plants, the pollen is bicellular at anther dehiscence, with the division of generative cells taking place during pollen tube growth in the female tissues. However, in some cases such as crucifers and grasses, this division takes place while the pollen is still undergoing maturation in the anther.
In the last decade, knowledge of the pollen transcriptome has emerged with the development of large-scale transcriptional profiling techniques. This is exemplified by a number of studies carried out using model species such as Arabidopsis thaliana [2][3][4][5] or Oryza sativa, with a recent report on allergen transcripts [6]. Studies on the Arabidopsis pollen transcriptome showed that 9.7% of the 13,977 pollen-expressed mRNAs were selectively expressed in pollen; among them, many genes had an unknown function or were reported to be functionally associated with signalling pathways and cell wall metabolism [4]. These studies also revealed differences among the cell cycle regulators, cytoskeleton genes, and signalling in pollen as compared to sporophytic tissues [2][3][4][5].
Current knowledge of the pollen transcriptome, however, is limited to Arabidopsis and rice, both of which have tri-cellular pollen grains at maturity. Comparative studies on pollen of other genera, particularly legume crop plants, are needed to understand the pollen gene networks that are subject to functional and evolutionary conservation. In this study, we present the transcript profile of mature, bi-cellular soybean pollen, compared with that of sporophytic tissues, as assayed on the soybean GeneChip®. Among the transcripts up-regulated in pollen relative to the sporophytic tissues, we observed many of unknown function as well as transcripts with putative annotations. This has allowed us to infer pollen regulatory roles for various families of transcription factors as well as for transporters, heat shock-associated proteins and products associated with protein destination and storage and with signal transduction. The data presented here represent a rich source of novel target genes for further studies into the molecular processes that govern the development of pollen.

Detection of differentially expressed transcripts in soybean mature pollen
Using the soybean GeneChip®, we compared the transcript profile of soybean pollen with that of sporophytic tissues, consisting of an equal mix of RNA derived from leaves and stems of 10-day-old soybean seedlings. The raw intensity data generated from the microarray hybridization experiment were imported into AffylmGUI [7] and analysed as outlined in Materials and Methods.
When the normalized data were displayed by plotting the log2-transformed signal intensities of the two samples against each other, the transcript patterns of pollen and sporophytic tissues differed markedly, as indicated by the greater scatter of points compared with a similar plot between sporophytic tissues, i.e. stems, roots and leaves (this study) versus shoot apical meristem (Haerizadeh et al., unpublished) (Figure 1).
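As an illustration of how such a plot is built, the following minimal Python sketch computes the two MA-plot coordinates for a single probe set from a pair of signal intensities. The function name and the example intensities are hypothetical and are not taken from the dataset.

```python
import math


def ma_values(intensity_a, intensity_b):
    """Return (M, A) for one probe set from two positive signal intensities.

    M is the log2 ratio between the two samples and A is the mean
    log2 intensity, the two axes of an MA plot.
    """
    log_a = math.log2(intensity_a)
    log_b = math.log2(intensity_b)
    m = log_a - log_b            # log2 fold change (sample a vs sample b)
    a = (log_a + log_b) / 2.0    # average log2 signal
    return m, a


# Hypothetical intensities for one probe set (pollen vs sporophyte).
m, a = ma_values(1024.0, 4.0)
fold_change = 2 ** m             # convert the log2 ratio back to a fold change
print(m, a, fold_change)
```

A probe set with equal signal in both samples has M = 0, so wider vertical scatter around M = 0 indicates greater divergence between the two transcript profiles.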
The soybean GeneChip® used contains probe sets for 37,500 transcripts, and the resulting analysis revealed that approximately 27% of these are expressed in soybean pollen while 75% are expressed in sporophytic tissues. This difference reflects the specialization of pollen compared to other tissues with respect to providing a specific set of transcripts for specific functions such as germination, pollen tube growth, and the subsequent process of fertilization. Meanwhile, only 7.87% of the pollen-expressed genes are likely to be pollen-specific, as no 'present' calls were detected for the corresponding probe sets in the sporophytic tissues. A total of 8,763 transcripts show statistically significant differential regulation in pollen as compared to sporophytic tissues, with 1,686 of them showing higher expression levels in pollen than in the sporophytic tissues (at adjusted p-value < 0.05; Additional File 1 and Additional File 2). When the expression patterns of chlorophyll a/b binding protein family members, which are expressed in sporophytic tissue, were examined, none of these transcripts was represented in the pollen-expressed dataset, which supports the validity of our experimental approach.
Figure 1 MA plot comparing the transcript profile of pollen against sporophytic tissues (stems, roots and leaves) or of shoot apical meristems (SAM; Haerizadeh et al., unpublished) against stems, roots and leaves (this study).
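The proportions quoted for this dataset can be recomputed from the reported counts (37,500 transcripts on the array, 10,299 expressed in pollen, of which 9,489 are shared with sporophytic tissues and 810 are pollen-selective). The short Python sketch below is only a consistency check; the variable and function names are illustrative. The recomputed values agree with the reported percentages to within about 0.01 percentage point, a difference attributable to rounding.

```python
# Counts as reported in this study.
TOTAL_ON_ARRAY = 37_500
POLLEN_EXPRESSED = 10_299
SHARED_WITH_SPOROPHYTE = 9_489
POLLEN_SELECTIVE = 810


def percent(part, whole):
    """Return part/whole expressed as a percentage (not rounded)."""
    return 100.0 * part / whole


# The shared and pollen-selective sets partition the pollen-expressed set.
assert SHARED_WITH_SPOROPHYTE + POLLEN_SELECTIVE == POLLEN_EXPRESSED

print(f"expressed in pollen: {percent(POLLEN_EXPRESSED, TOTAL_ON_ARRAY):.3f}%")
print(f"also in sporophyte:  {percent(SHARED_WITH_SPOROPHYTE, POLLEN_EXPRESSED):.3f}%")
print(f"pollen-selective:    {percent(POLLEN_SELECTIVE, POLLEN_EXPRESSED):.3f}%")
```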

Functional categories of transcripts differentially expressed in pollen
The transcripts represented by the soybean GeneChip® have been annotated as described in Materials and Methods. This allowed us to examine the functional categories of transcripts that are up- or down-regulated in pollen. As shown in Figure 2, although many of the genes fall into the "unclassified" or "no homology to known protein" categories, the general distribution and the over-representation of categories such as intracellular trafficking, signal transduction and transcription are evident. The up-regulated transcripts in the "no homology to known protein" category provide a valuable starting point for functional analysis experiments aimed at an in-depth understanding of the pollen gene regulatory system and its components, knowledge of which is presently incomplete.
It is interesting to note that none of the significantly up-regulated transcripts encode products related to the small RNA pathways (Additional File 1). A closer inspection of the expression values revealed that all of the small RNA pathway-associated transcripts have signals below the detection threshold. This is consistent with a previous report [3], although a recent study by the same group detected 3 out of 15 genes of the ARGONAUTE family that were previously below the detection limit. The authors attributed this discrepancy to "improved chemistry for sample processing, array hybridization, and staining that resulted in a better signal to noise ratio and thus a higher sensitivity" [8]. It is equally possible that the small RNA pathways are active only in the generative cells, and hence further transcript profiling work on gametes should help resolve this issue.

Top 30 candidates up-regulated in the pollen
The 30 most highly up-regulated transcripts in pollen in comparison to sporophytic tissues include those predicted to encode cell wall-related proteins such as pectate lyase and pectin esterase family proteins, rapid alkalinization factor (RALF), multi-copper oxidase, and some transporters, along with unknown and novel genes (Table 1). RALF, a ubiquitous 5-kDa polypeptide in plants, was first reported in tobacco, where the RALF gene encodes a 115-amino acid protein that is processed into the 5-kDa signaling peptide [9]. The peptide induced a rapid alkalinization of the culture medium of tobacco suspension-cultured cells and a concomitant activation of an intracellular mitogen-activated protein kinase [9]. RALF is considered a potential signaling molecule, and a putative RALF receptor has been detected in plasma membranes [10]. RALF-LIKE 10 is selectively expressed in Arabidopsis pollen [5]. In our soybean pollen data, two RALF isoforms, RALF-LIKE 11 and RALF-LIKE 19, show selective expression in pollen. The conserved up-regulation of genes encoding RALF-like signaling peptides in soybean and Arabidopsis pollen suggests an essential role for these peptides in pollen development. However, further experiments involving gain-of-function or loss-of-function mutants are required to test this hypothesis.
Meanwhile, 9 out of 30 highly abundant transcripts in mature soybean pollen are predicted to encode members of pectin esterase and pectate lyase families of cell-wall loosening enzymes (Table 1). Corresponding genes in Arabidopsis were among those with the highest expression in pollen [2,4,5]. It has been proposed that besides their possible involvement in pollen tube wall modification, these hydrolytic enzymes may be important for the penetration of the stigmatic tissues.
Figure 2 Functional categorization of up- and down-regulated transcripts in the soybean mature pollen in comparison to sporophytic tissues. Red and green bars denote up- and down-regulated categories, respectively.

Transcription factors up-regulated in the soybean pollen
To explore which families of transcription factors are represented among the transcripts up-regulated in pollen, and thus which might have a major role in regulating activities in mature pollen, we searched the Arabidopsis Gene Regulatory Information Server (http://arabidopsis.med.ohio-state.edu/AtTFDB/) using the AGI identifier matching each soybean probe set. Although many of the transcripts are annotated as transcription factors, the corresponding Arabidopsis orthologues have yet to be grouped under the 50 different families in the AtTFDB collection, which is likely due to the lack of functional knowledge of the genes concerned. Nevertheless, at least 16 different families of transcription factors are represented, as listed in Table 2.
Zinc finger transcription factors are prominent in our differentially regulated gene data (25 genes). Zinc finger protein genes were reported as pollen-specific in 1992 [11]; zinc finger proteins are also known to act as master regulators (transcriptional repressors) in neuronal development, animal germ cells, and spermatogenesis [12]. For instance, Blimp1/Prdm1, a zinc finger transcriptional repressor, is the key regulator of early axis formation and primordial germ cell specification in animals [13]. It has also been shown that targeted silencing of Ovol1 (also known as movo1), a zinc-finger transcription factor, leads to germ cell degeneration and defective sperm production in mice [14]. These proteins are also reported to be important regulatory molecules in various plant developmental processes, such as apical meristem development via chromatin remodeling, anther development, and flowering.
It has recently been reported that a class of MYB factors regulates sperm cell formation in plants [15]. We identified three members of the MYB family as up-regulated in soybean pollen (Table 2). Certain MADS box proteins have been identified as pollen-specific in Antirrhinum [16] and have also been reported as an important non-classical transcription factor family in Arabidopsis pollen. Pina et al. reported the over-representation of MADS box genes in the Arabidopsis pollen transcriptome, with 17 genes expressed in pollen and nine showing enrichment in pollen [3].
Plant homeodomain (PHD) finger transcription factors are up-regulated in soybean pollen. The PHD finger may promote both gene expression and repression through interactions with trimethylated lysine 4 on histone H3 (H3K4me3), a universal modification seen at the beginning of active genes [17,18]. PHD proteins are associated with chromatin condensation during mitosis or meiosis, with the general transcriptional machinery, and with transcriptional regulation required for proper development, flowering, and fertility of plants [19,20].
Meanwhile, very little is known about the physiological and developmental roles of WRKY proteins, another family of transcription factors up-regulated in the soybean pollen. Although the DNA binding site of WRKY proteins is well defined, determining the individual roles of WRKY factors remains a challenge [21,22]. Though the function of WRKY proteins in pollen is not clear, our data suggest an important and novel regulatory role for these proteins in soybean pollen.
A member of the basic helix-loop-helix (bHLH) transcription factor family also shows differential expression in soybean pollen; this group shows a similar pattern of expression in Arabidopsis pollen. bHLH proteins are a family of transcription factors that bind to their DNA targets as dimers [23,24]. They have been characterized in non-plant eukaryotes as important regulatory components in diverse biological processes such as the control of cell proliferation and the development of specific cell lineages. It has been shown that Tcfl5, a testis-specific bHLH protein, interacts with the regulatory region of the Calmegin gene promoter as a testis-specific activator of this gene and other testis-specific genes in mouse spermatogenesis [25].
Whether pollen-expressed bHLH transcription factors regulate sperm cell-specific gene expression remains to be determined. Two NAC transcription factor family members are up-regulated in soybean pollen, suggesting a role for this family of proteins in the regulation of pollen genes, a function that to the best of our knowledge has not been reported for this class of genes.

Transcripts associated with the ubiquitin system
Post-translational protein modifications play a critical role in most cellular processes through their unique ability to rapidly and reversibly alter the functions of synthesized proteins, multi-protein complexes, and intracellular structures. In eukaryotes, such modifications frequently occur by attaching a small polypeptide to the target protein. Ubiquitin and small ubiquitin-related modifiers (SUMO) are among those polypeptides [26]. Approximately 5% of Arabidopsis genes encode proteins that are predicted to be involved in the ubiquitin-proteasome system, and the regulation of protein degradation by ubiquitination is important in many plant processes [27].
Ubiquitin ligases that are associated with membrane-enclosed organelles are required for polarized pollen tube growth [28]. Furthermore, there has been a report of the enrichment of ubiquitin family genes in Arabidopsis sperm cells [8]. Our data contain many ubiquitin family genes, suggesting a role for this group of genes in pollen development through ubiquitin-mediated protein turnover (Table 3).

Signal transduction and transporters
Approximately 100 different signalling proteins, such as 14-3-3 proteins and kinases, are up-regulated at the transcript level in the soybean pollen. 14-3-3 proteins are among the most important and versatile proteins in eukaryotes [29]. They interact with many regulatory proteins such as transcription factors (by protein-protein interaction) and alter their activity, in addition to performing regulatory roles by shuttling proteins between various cellular locations. In plants, it has been reported that 14-3-3 proteins regulate the H+-ATPase pumps of the plasma membrane [30]. As expected, calcium-related proteins are enriched in soybean pollen, as they are important regulators of pollen germination and tube growth. Calcium and calcium sensor proteins such as calmodulin (CaM), a universal calcium sensor protein, play important roles in gene regulation, and hence in plant growth and development [31,32]. It has been shown that calcium transporters are key regulators of pollen tube development and fertilization in flowering plants [33]. In addition, CaM binding proteins, such as maize pollen calmodulin-binding protein (MPCBP) and NPG1 (no pollen germination1) in Arabidopsis, are specifically expressed in pollen and regulate pollen germination, as supported by the observation that down-regulation of these genes resulted in the inability of the pollen to germinate [34,35]. Consistent with this, we identified many calcium-related genes in our soybean dataset (Table 4). Some of these proteins are already known to be pollen-specific, and many are highly up-regulated (up to 256-fold) as compared to sporophytic tissues, highlighting the importance of these proteins in pollen biology.
Transport proteins, including membrane pumps, represent one of the largest up-regulated gene sets in the soybean pollen (Additional File 1). Table 5 shows a representative list of transcripts classified under the functional category of "transporter"; this includes those predicted to encode SUGAR TRANSPORTER 4 (STP4), ARABIDOPSIS H(+)-ATPASE 8 (AHA8), AHA9, monosaccharide/H+ symporter (STP), amino acid transporter, Ca2+ pumps and a putative phosphate translocator. Similar categories of transcripts have been reported to be up-regulated in Arabidopsis pollen [36].
Higher plants possess two distinct families of sugar carriers: the disaccharide transporters, which primarily catalyse sucrose transport, and the monosaccharide transporters, which mediate the transport of a variable range of monosaccharides [37]. The STP4 gene encodes a membrane-located monosaccharide/H+ symporter that can catalyse the uptake of various monosaccharides [38]. High expression of a monosaccharide transporter in soybean pollen points towards glucose and fructose as the preferred source of nutrition for pollen germination and tube growth. A similar pollen-specific expression of a putative hexose transporter gene was reported in Arabidopsis and Petunia [39,40]. It has been proposed that in species where monosaccharides are taken up preferentially, sucrose might be hydrolysed to glucose and fructose by a cell-wall invertase before uptake by monosaccharide transporters in the growing pollen tube.
Strong up-regulation of H+-ATPase transcripts, including those encoding AHA8 and AHA9, in soybean pollen points to an essential role similar to that of their Arabidopsis and Nicotiana counterparts. The expression of AHA8 and AHA9 has been shown to be pollen-specific in Arabidopsis [3]. Recently, a pollen H+-ATPase has been shown to be associated with tip growth in Nicotiana pollen tubes [41]. Uptake and translocation of cationic nutrients play essential roles in plant growth, nutrition, signal transduction, and development [42]. The plant cation transporter gene families include potassium transporters and channels, sodium transporters, calcium antiporters, cyclic nucleotide-gated channels and cation diffusion facilitator proteins. Our data show that several members of the cation/proton exchanger family are expressed at a higher level in soybean pollen than in sporophytic tissues. Bock et al. [36] reported that fourteen members of the cation/proton exchanger (CHX) gene family are expressed late in pollen development and raised questions about their roles and multiplicity. The possibility that they are localized to different intracellular compartments was proposed. It is noteworthy that a similar multiplicity of up-regulated cation/proton exchanger family genes is apparent in our soybean pollen data.

WD-40 repeat proteins
WD-40 repeat proteins are defined by the presence of four or more repeating units containing a conserved core of approximately 40 amino acids that usually end with tryptophan-aspartic acid (WD). WD-repeat proteins are conserved in animals and plants, where they participate in complexes involved in chromatin metabolism and gene expression [43][44][45]. They have also been reported to act as transcriptional repressors that interact either with corepressors or in a complex with histone deacetylases, to regulate spermatogenesis, and to function as mitotic checkpoints that ensure accurate chromosome segregation. A number of WD-repeat proteins are up-regulated in the soybean pollen (Table 6), implicating them in the regulation of pollen development.

Heat shock proteins
Heat shock proteins (HSPs; chaperones) are divided into five major families: the HSP70, HSP60, HSP90 and HSP100 families and a small HSP family [46]. The accumulation of HSPs under heat and other abiotic stresses has been suggested to play a key role in the acquisition of thermotolerance in plants and other organisms. At the cellular level these proteins are responsible for protein folding, assembly, and translocation, and can assist in protein re-folding under stress conditions. Some studies could not detect a heat shock response in developing microspores or mature pollen of various species [47,48], while others have shown that many HSPs are expressed in microspores and mature pollen [49].
It is interesting to note that in our present study of the mature soybean pollen transcriptome, there is significant up-regulation of transcripts encoding heat shock proteins as well as the heat shock transcription factors HSFB2A and HSFA5 (Table 7; Figure 3). A recent study on transcriptome changes during pollen germination showed significant up-regulation of HSPs during pollen germination and tube growth, with many of these HSPs undetectable at the expression level in mature pollen [50]. These authors proposed that these HSPs might function as molecular chaperones for protein modification processes during pollen germination and tube growth. Heat shock factors are the primary molecules responsible for activating genes responsive to both heat stress and other stressors [51]. The up-regulation of the heat shock transcription factors HSFB2A and HSFA5 in soybean pollen matches the similar up-regulation of their counterparts in Arabidopsis pollen [51]. The plant HSF family has been reported to comprise more than 20 members, with recent evidence pointing towards unique functions of individual HSFs in signal transduction pathways activated in response to environmental stress and during development. The conserved up-regulation of HSFB2A and HSFA5 in both soybean and Arabidopsis pollen points towards a unique role of these transcription factors in pollen development and possibly in gamete development. It is also interesting to note that heat shock proteins are known for their role in animal spermatogenesis, where they act as molecular chaperones to assist with protein folding [52].

Conclusion
This is the first report on transcriptional profiling of the pollen of a major legume crop. The current knowledge from pollen transcriptome profiling with microarrays is limited to the model plant, Arabidopsis. Our data will extend the current understanding of pollen biology and gene regulation by providing a set of robustly selected, differentially expressed genes in soybean pollen. We also provide a number of genes with unknown functions that are highly expressed in the pollen and could be tested in functional analyses to increase our understanding of gene regulation in pollen. Most of the genes important for sporophytic organs are highly repressed in pollen. Regulation of these genes is probably controlled at the transcriptional level by transcription factors and chromatin remodelling machinery, as pollen contains a variety of
Pollen was collected on coverslips by rubbing isolated anthers together, and anther tissue was removed from the coverslip prior to freezing at -80°C. Pollen purity and viability were assessed by microscopic observation and the fluorescein diacetate test (Figure 4).

RNA isolation and microarray hybridization
Total RNA from pollen or sporophytic tissues (primary stem, primary roots and mature leaves of 10-day-old soybean seedlings) was isolated using the QIAGEN RNeasy Mini Kit (QIAGEN) and eluted with nuclease-free water. Subsequent cDNA labelling and Affymetrix Soybean GeneChip hybridization were carried out by AGRF (Australian Genome Research Facility, Melbourne, Australia) using 3 μg of total RNA according to the protocols outlined in http://www.affymetrix.com/support/downloads/manuals/expression_analysis_technical_manual.pdf.

Analysis of expression data
The GeneChip® Soybean Genome Array (Affymetrix, Inc.) containing probe sets for 37,500 transcripts was used in this study. Three biological replicates for pollen and two biological replicates for sporophytic tissues were used.
Raw numeric values representing the signal of each feature were imported into AffylmGUI (Affymetrix linear modeling Graphical User Interface) [7], which uses the Empirical Bayes linear modeling approach of Smyth (2005) [53] for identifying differentially expressed genes in pollen. The data were normalized using the Robust Multiarray Averaging (RMA) method, and a linear model was then used to average data between replicate arrays and to look for variability between them [7]. The transcripts detected as differentially expressed at an adjusted p-value of < 0.05 were used for all subsequent analyses. All microarray data have been submitted to the Gene Expression Omnibus (GEO) at NCBI (http://www.ncbi.nlm.nih.gov/geo) under the accession GSE12286.
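To illustrate what selecting transcripts on an adjusted p-value involves, the following Python sketch applies a multiple-testing correction to a small set of raw p-values and then applies the 0.05 cut-off. The text does not state which adjustment method was used; the Benjamini-Hochberg procedure is assumed here purely for illustration, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: BH is used only as an example of an adjustment method;
    the source does not specify which method was applied.
    """
    n = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


raw_p = [0.001, 0.008, 0.039, 0.041, 0.6]   # hypothetical raw p-values
adj_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adj_p]     # cut-off used in this study
print(list(zip(raw_p, adj_p, significant)))
```

In this toy example, two probe sets with raw p-values just under 0.05 no longer pass the threshold after adjustment, which is the purpose of correcting for the large number of probe sets tested.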
To obtain the number of pollen-expressed genes (expressed in pollen and sporophytic tissues), we collected the expression signals, average expression values, and present/absent calls from AffylmGUI (RMA data) and sorted the data in Excel. To identify the pollen-specific group of genes, we required that a transcript: 1) showed statistically significant differential expression at adjusted p-value < 0.05; 2) possessed a signal greater than or equal to 100 on each replicate; 3) met a cut-off of a 2-fold change; and 4) had "Absent" calls on all of the sporophytic replicates.
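The four criteria above can be sketched as a simple filter. The Python example below is a hedged illustration rather than the procedure actually run (which was carried out in Excel): the record layout, field names and example values are hypothetical, criterion 2 is assumed to refer to the pollen replicates, and criterion 3 is assumed to mean at least a 2-fold increase in pollen (log2 fold change of 1 or more).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProbeSet:
    """Hypothetical per-probe-set record; field names are illustrative."""
    probe_id: str
    adj_p: float                  # adjusted p-value, pollen vs sporophyte
    pollen_signals: List[float]   # one signal per pollen replicate
    log2_fold_change: float       # pollen relative to sporophyte
    sporophyte_calls: List[str]   # detection call per replicate: "P", "M" or "A"


def is_pollen_specific(ps: ProbeSet) -> bool:
    """Apply the four selection criteria to one probe set."""
    return (
        ps.adj_p < 0.05                                   # criterion 1
        and all(s >= 100 for s in ps.pollen_signals)      # criterion 2 (assumed: pollen replicates)
        and ps.log2_fold_change >= 1.0                    # criterion 3 (assumed: >= 2-fold up)
        and all(c == "A" for c in ps.sporophyte_calls)    # criterion 4
    )


candidates = [
    ProbeSet("example_1", 0.001, [250.0, 310.0, 275.0], 6.5, ["A", "A"]),
    ProbeSet("example_2", 0.001, [250.0, 310.0, 275.0], 6.5, ["A", "P"]),
]
pollen_specific_ids = [ps.probe_id for ps in candidates if is_pollen_specific(ps)]
print(pollen_specific_ids)
```

The second record fails only because one sporophytic replicate carries a "present" call, which shows how criterion 4 separates pollen-specific transcripts from those that are merely up-regulated in pollen.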
The annotation for the transcripts represented by the soybean GeneChip® was downloaded from the Seed Development website http://estdb.biology.ucla.edu/seed/. The annotation is based on the best BLASTX match of the corresponding soybean sequences against the TAIR Arabidopsis protein database or the NCBI non-redundant protein database (expect value < 0.01). Functional categories for these transcripts were assigned based on the EU Arabidopsis sequencing project [54] as described at the Seed Development website http://estdb.biology.ucla.edu/seed/.
Figure 4 Photomicrograph of isolated pollen under light microscopy (left) and fluorescein diacetate viability screen in epifluorescence microscopy (right).
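The best-match annotation rule described in this section can be sketched as follows. This minimal Python example assumes that "best" means the hit with the lowest expect value among those below the 0.01 threshold; the function name, the tuple layout and the hit identifiers are hypothetical, and the annotation used in this study was downloaded rather than recomputed.

```python
def best_annotation(hits, max_evalue=0.01):
    """Return the subject ID of the best hit, or None if no hit qualifies.

    hits: list of (subject_id, evalue) tuples for one query sequence.
    Assumption: the best hit is the one with the lowest expect value.
    """
    # Keep only hits below the expect-value threshold.
    passing = [hit for hit in hits if hit[1] < max_evalue]
    if not passing:
        return None
    # Select the hit with the smallest expect value.
    return min(passing, key=lambda hit: hit[1])[0]


# Hypothetical hits for two query sequences.
print(best_annotation([("hit_A", 1e-30), ("hit_B", 1e-5)]))
print(best_annotation([("hit_C", 0.5)]))
```

A query with no hit under the threshold returns None, which corresponds to the "no homology to known protein" category used in the functional classification.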