Molecular characterization of the SPL gene family in Populus trichocarpa

Background: SPLs, a family of plant-specific transcription factors, play vital roles in plant growth and development by regulating various physiological and biochemical processes. Although Populus trichocarpa is a model forest tree, the PtSPL gene family has not been systematically studied.

Results: Here we report the identification of 28 full-length PtSPLs, which are distributed across 14 P. trichocarpa chromosomes. Based on the phylogenetic relationships of SPLs in P. trichocarpa and Arabidopsis, plant SPLs can be classified into 6 groups. Each group contains at least one PtSPL and one AtSPL. The N-terminal zinc finger 1 (Zn1) of the SBP domain in group 6 SPLs has four cysteine residues (CCCC-type), while Zn1 of SPLs in the other groups mainly contains three cysteine residues and one histidine residue (C2HC-type). Comparative analyses of gene structures, conserved motifs and expression patterns of PtSPLs and AtSPLs revealed that plant SPLs are conserved within a group, whereas SPLs differed significantly among groups. Various conserved motifs were identified in PtSPLs but not found in AtSPLs, suggesting the diversity of plant SPLs. A total of 11 pairs of intrachromosome-duplicated PtSPLs were identified, suggesting the importance of gene duplication in SPL gene expansion in P. trichocarpa. In addition, 18 of the 28 PtSPLs, belonging to G1, G2 and G5, were found to be targets of miR156. Consistently, all of the AtSPLs in these groups are regulated by miR156. This suggests that miR156-mediated posttranscriptional regulation is conserved in plants.

Conclusions: A total of 28 full-length SPLs were identified from the whole genome sequence of P. trichocarpa. Through comprehensive analyses of gene structures, phylogenetic relationships, chromosomal locations, conserved motifs, expression patterns and miR156-mediated posttranscriptional regulation, the PtSPL gene family was characterized. Our results provide useful information on the evolution and biological function of plant SPLs.

Background
SPL proteins constitute a diverse family of transcription factors playing vital roles in plant growth and development. SPLs are specific to plants and have a highly conserved SBP (SQUAMOSA PROMOTER BINDING PROTEIN) domain with approximately 78 amino acid residues. The domain contains three functionally important motifs: zinc finger 1 (Zn1), zinc finger 2 (Zn2), and a nuclear localization signal (NLS) [1,2]. SPL-encoding genes were first identified as SBP1 and SBP2 in Antirrhinum majus [3]. Since then, they have been found in various green plants, including single-celled green algae, mosses, gymnosperms, and angiosperms. These findings show that SPLs exist as a large gene family in plants.
Populus trichocarpa is a model plant with a whole genome sequence available [32]. A total of 352 miRNA precursors, including 12 for miR156, have been identified [33][34][35][36][37][38][39]. However, the regulation of P. trichocarpa PtSPLs by miR156 has not been analyzed. In our previous study [40], 17 PtSPLs, which appeared to be full-length or partial sequences of at least 300 amino acids, were identified from the Populus genome assembly v1.1 (http://genome.jgi-psf.org/Poptr1_1/Poptr1_1.home.html). They were named PtSPL1-PtSPL17, of which PtSPL3 and PtSPL4 had the highest similarity to AtSPL7, which is involved in Cu homeostasis [40]. In order to characterize the whole SPL gene family in P. trichocarpa, we searched the Populus genome assemblies v1.1, v2.2 and v3.0 [32]. This search resulted in the identification of 28 full-length PtSPLs. Gene structures, chromosomal locations, phylogenetic relationships, conserved protein motifs and expression patterns of all identified PtSPLs were systematically analyzed. MiR156-mediated posttranscriptional regulation of PtSPL genes was also investigated. The results provide useful information for elucidating the biological functions of SPLs in P. trichocarpa.

Results
Identification of 28 SPL genes in P. trichocarpa genome
Analysis of the Populus genome assemblies v1.1, v2.2 and v3.0 showed the existence of 28 full-length SPL genes in the P. trichocarpa genome (Table 1). All of the deduced PtSPL proteins contained the conserved SBP domain. The theoretical pI of the deduced PtSPL proteins ranged from 5.87 to 9.49. The length varied between 148 and 1044 amino acids. The molecular weight (Mw) varied from 16.2 to 116.1 kDa (Additional file 1). The distribution of pI is similar to that of AtSPLs (Additional file 2); however, the length and Mw of PtSPLs are larger than those of AtSPLs.
Mapping PtSPLs to the P. trichocarpa genome showed that the 28 PtSPLs were unevenly distributed on 14 chromosomes, with four on Chr2, three on each of Chr1, Chr8, Chr10 and Chr14, two on each of Chr3, Chr11 and Chr15, and one on each of Chr4, Chr5, Chr7, Chr12, Chr16 and Chr18 (Figure 1). Relatively high densities of PtSPLs were observed in the top and bottom regions of Chr8, Chr10, Chr11 and Chr14, the top of Chr1, Chr4, and Chr16, and the bottom of Chr3, Chr5, Chr7, Chr12 and Chr18. Few are located in the central regions of chromosomes. Moreover, 11 pairs of PtSPLs (Ks < 1.0) evolved from intrachromosomal duplication (Table 2), indicating the importance of gene duplication for PtSPL gene expansion.
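
The per-chromosome counts given above can be checked against the reported totals with a short tally. This is only an illustrative sketch: the dictionary simply transcribes the numbers stated in the text, and the variable names are ours.

```python
# Number of PtSPL genes per chromosome, transcribed from the text above.
ptspl_per_chromosome = {
    "Chr2": 4,
    "Chr1": 3, "Chr8": 3, "Chr10": 3, "Chr14": 3,
    "Chr3": 2, "Chr11": 2, "Chr15": 2,
    "Chr4": 1, "Chr5": 1, "Chr7": 1, "Chr12": 1, "Chr16": 1, "Chr18": 1,
}

# Totals that should match the 28 genes on 14 chromosomes reported in the text.
total_genes = sum(ptspl_per_chromosome.values())
n_chromosomes = len(ptspl_per_chromosome)

print(f"{total_genes} PtSPLs on {n_chromosomes} chromosomes")
```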

Phylogenetic analysis of SPLs in P. trichocarpa and Arabidopsis
In order to investigate the evolutionary relationship between P. trichocarpa and A. thaliana SPL proteins, a neighbor-joining (NJ) phylogenetic tree was constructed for 28 PtSPLs and 16 AtSPLs using MEGA5.1. The reliability of branching was assessed by the bootstrap resampling method using 1,000 bootstrap replicates. Only nodes supported by bootstrap values >50% were used for further analysis. The results showed that the 44 SPL proteins clustered into 6 groups (named G1-G6), each of which contained at least one AtSPL and one PtSPL (Figure 2). This is consistent with the results from SmSPLs in Salvia miltiorrhiza [41]. To further confirm that there are 6 groups of SPLs, we also constructed a phylogenetic tree for 28 PtSPLs, 16 AtSPLs, 18 rice OsSPLs and 15 SmSPLs. As shown in Additional file 3, the 77 SPLs also clustered into 6 groups. The difference between the two trees (Figure 2, Additional file 3) is that PtSPL12, PtSPL13, PtSPL28 and AtSPL6, which belong to G1 in Figure 2, are included in G2 in Additional file 3. An intron was found in the SBP domain-encoding region of all SPL genes from P. trichocarpa and Arabidopsis (Figure 3); however, sequence feature analysis showed that the SBP domain of SPLs in G6 (AtSPL7, PtSPL3 and PtSPL4) was divergent from that of the other groups. The N-terminal zinc finger of G6 SPLs has four cysteine residues in the SBP domain, while SPLs in the other groups mainly contain three cysteines and one histidine, indicating the diversification of plant SPL evolution. On the other hand, SPLs within a group have similar intron number, exon-intron structure, and coding sequence length. Consistently, the length, Mw and theoretical pI of deduced SPL proteins within a group are also similar, although they are divergent among groups. This suggests that plant SPLs are conserved within a group.
Phylogenetic analysis showed that PtSPL3 and PtSPL4 had high homology with AtSPL7, an Arabidopsis SPL that can bind CuREs in the MIR398 promoter in vitro and is involved in the response to copper deficiency in Arabidopsis [22]. This is consistent with our previous results for PtSPLs [40]. Based on the phylogenetic tree, PtSPL3 and AtSPL7 are very likely to be orthologous proteins (Figure 2). Additionally, 5 pairs of AtSPLs and 11 pairs of PtSPLs seem to be paralogous proteins (Figure 2). They include AtSPL9/15, AtSPL10/11, PtSPL8/27, PtSPL12/13 and PtSPL11/19 belonging to G1, PtSPL18/22 and PtSPL14/15 from G2, PtSPL21/26 belonging to G3, AtSPL14/16, AtSPL1/12, PtSPL2/9, PtSPL1/5 and PtSPL6/7 included in G4, and AtSPL3/4, PtSPL16/23 and PtSPL20/25 clustering in G5. Thus 62.5% of the 16 AtSPLs (10 genes) and approximately 78.6% of the 28 PtSPLs (22 genes) exist as paralogous pairs. It suggested that the expansion of SPL genes occurred after separation of paralogous genes. The results from paralogous pair identification were consistent with segmental duplications in the P. trichocarpa genome (http://chibba.agtec.uga.edu/duplication/) [32], suggesting that paralogous PtSPLs originated from segmental duplication. Prediction of the potential age of the duplication events using synonymous substitution (Ks) values showed that the segmental duplication events for PtSPLs appeared to occur 9-21 mya (Table 2). This is consistent with the age of P. trichocarpa genome duplication events [32].
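
The conversion from a Ks value to an approximate duplication age is commonly done with T = Ks / (2λ), where λ is the synonymous substitution rate per site per year. The sketch below illustrates that general relation only. The rate constant and the Ks values in it are assumptions chosen for illustration; they are not taken from this section or from Table 2.

```python
def duplication_age_mya(ks: float, rate_per_site_per_year: float) -> float:
    """Approximate duplication age in million years, using T = Ks / (2 * rate)."""
    if ks < 0:
        raise ValueError("Ks must be non-negative")
    if rate_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    years = ks / (2.0 * rate_per_site_per_year)
    return years / 1e6


# ASSUMPTION: illustrative synonymous substitution rate (per site per year).
# The rate actually used for Table 2 is not stated in this section.
ASSUMED_RATE = 9.1e-9

# HYPOTHETICAL Ks values, not values from Table 2.
for ks in (0.20, 0.30):
    age = duplication_age_mya(ks, ASSUMED_RATE)
    print(f"Ks = {ks:.2f} -> about {age:.1f} mya")
```

Because the estimate scales inversely with the assumed rate, a different choice of λ shifts all ages proportionally without changing their order.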

Comparative analysis of PtSPL and AtSPL gene structures
Gene structure analysis showed that the number of introns in the coding region of the 28 PtSPL genes varied from 1 to 10. The number of PtSPLs with 1, 2, 3, 4, 9 and 10 introns is five, ten, four, one, six, and two, respectively (Figure 3, Additional file 1). Similarly, the intron number of the 16 AtSPLs varies between 1 and 9 (Additional file 2). The pattern of intron distribution in PtSPLs is quite similar to that in AtSPLs, with most genes having 2 or 9 introns, followed by 1 and 3 (Figure 3, Additional files 1 and 2) [41]. In addition, the position of the intron in the SBP domain is highly conserved. It is located in the codon for the 48th amino acid of the SBP domain (Additional file 4). These results suggest that exon-intron structures are conserved between PtSPLs and AtSPLs.
The length of introns varies significantly among PtSPL genes, such as those in G1, G2 and G5 (Figure 3). We analyzed the internal exons and introns of PtSPLs and AtSPLs. The results showed that the exons of PtSPLs ranged in size from 43 to 884 bp with an average of 314 bp, which is slightly greater than the 293 bp average length of AtSPL exons. Approximately 59% of PtSPL exons and 63% of AtSPL exons have a size below 300 bp and 71% and 70% of exons are between 60 and 160 bp in PtSPLs and AtSPLs, respectively (Figure 4). Although the size distribution of PtSPL exons is similar to that of AtSPL exons, the intron size distribution is more variable, ranging from 30 bp to 3.0 kb. There are 6 PtSPL introns (5%) with sizes >1.5 kb; however, no such introns exist in AtSPLs. About 55% of PtSPL introns have sizes below 300 bp and 56% of introns are between 60 and 160 bp; however, the majority of AtSPL introns (94%) have sizes below 300 bp. The average size of PtSPL introns is 476 bp, which is much greater than the 120 bp average of AtSPL introns. These results suggest that the exon and intron size distributions differ between PtSPLs and AtSPLs.
In addition to the conserved domains, other conserved motifs could also be important for the function of SPLs [27,43]. We searched for conserved motifs using MEME and applied an e-value cutoff of 1e-10. This resulted in the identification of 25 motifs for the 28 PtSPLs (Figure 6, Table 3). The majority of motifs identified are conserved between PtSPLs and AtSPLs [41], while three, namely motifs 11, 19 and 23, are specific to PtSPLs. This indicates both conservation and diversity between PtSPLs and AtSPLs. The number of motifs in each SPL varies from 1 to 16 (Figure 6). Motif 1 is actually the SBP domain. Consistently, it exists in all SPLs analyzed. Motif 14, present in G1 and G2 SPLs, contains the target sequence of miR156, indicating the posttranscriptional regulation of G1 and G2 SPLs by miR156. In addition to motifs 1 and 14, several motifs are shared by two SPL groups, such as motif 12 found in G1 and G2, and motifs 2, 4, 5, 6, 15 and 16 found in G4 and G6 (Figure 6), indicating the importance of these motifs. We also found several motifs to be group-unique, such as motif 24, which is found only in G6 SPLs, and motifs 7, 9, 10 and 18, which are specific to G4 (Figure 6). These group-unique motifs could be important for specific roles of SPLs in the group. Moreover, PtSPLs and AtSPLs [41] within a group share similar motifs, indicating that they probably play similar roles in plant growth and development.

Expression patterns of SPLs in P. trichocarpa
The expression pattern of a gene is often correlated with its function. In order to preliminarily elucidate the roles of PtSPLs in P. trichocarpa development, we first searched PopGenIE for gene expression data from microarray analysis [44]. Except for PtSPL17, the expression levels of 27 PtSPLs in roots, stems, young leaves and mature leaves were obtained (Figure 7). Next, we examined the relative expression levels of the 28 PtSPLs in young leaves, mature leaves, young stems, young roots and tissues from developing secondary xylem and phloem from the 4th-6th and 12th-25th internodes of one-year-old P. trichocarpa plants using the quantitative real-time RT-PCR method (Figure 8). The results showed that the qRT-PCR data were generally consistent with the microarray data for relative expression of PtSPLs in roots, stems, young leaves and mature leaves (Figures 7 and 8). Although all PtSPLs were expressed in at least one of the tissues examined, differential expression was observed. Many putative paralogous genes, such as PtSPL18/22 in G2, PtSPL21/26 in G3, PtSPL2/9, PtSPL1/5 and PtSPL6/7 in G4 and PtSPL16/23 belonging to G5, show similar expression patterns, suggesting redundant roles of these PtSPL gene pairs. However, the expression patterns of a few gene pairs, including PtSPL12/13 in G1 and PtSPL14/15 belonging to G2, are distinct. This indicates that these PtSPLs may play different roles in P. trichocarpa development, although they are paralogous genes.

MiR156-mediated posttranscriptional regulation of PtSPLs
It has been shown that 10 AtSPLs are regulated by miR156 [11]. The complementary sites of miR156 are in the coding regions or 3' UTRs of AtSPLs. In order to examine miR156-mediated posttranscriptional regulation of PtSPLs, we searched the coding regions and 3' UTRs of all PtSPLs for targets of P. trichocarpa miR156a-miR156j on the psRNATarget server using default parameters [45]. The results showed that 18 PtSPLs were potential targets of miR156 (Figures 9 and 10). The miR156-targeting sites in 13 PtSPLs belonging to G1 and G2 are located in the last exon and encode the conserved peptide ALSLLS. The target sites for the other 5 PtSPLs, which belong to G5, are located in the 3' UTRs close to the stop codons (Figure 10). Consistently, AtSPLs clustering in G1, G2 and G5 are targets of miR156 in Arabidopsis. This suggests that miR156-mediated posttranscriptional regulation of SPLs is conserved in P. trichocarpa and Arabidopsis.
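
To illustrate how a target site within a coding region can encode a short conserved peptide such as ALSLLS, the following sketch translates an in-frame nucleotide segment using the standard genetic code. The 18-nt sequence is a made-up illustrative segment chosen so that it translates to ALSLLS; it is not a sequence reported in this study, and the function name is ours.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides: str) -> str:
    """Translate an in-frame DNA or RNA segment; trailing partial codons are ignored."""
    dna = nucleotides.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# ILLUSTRATIVE segment only (not taken from the paper): six in-frame codons.
illustrative_site = "GCTCTCTCTCTTCTGTCA"
print(translate(illustrative_site))  # ALSLLS
```

Sites located in a 3' UTR, as described for the G5 genes, are not translated, so no peptide constraint of this kind applies to them.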

Discussion
SPLs are plant-specific transcription factors containing a highly conserved SBP (SQUAMOSA PROMOTER BINDING PROTEIN) domain. The domain can specifically bind to the promoters of the floral meristem identity gene SQUAMOSA and its orthologous genes, and SPLs play important regulatory roles in plant growth and development [46][47][48][49]. The genes encoding SPLs have been identified from various plant species, such as Arabidopsis [2,10,23,26], maize [30], Antirrhinum majus [3], rice [50], silver birch [51], and S. miltiorrhiza [41]. SPL genes exist as a large gene family in plants. The number of SPLs in Arabidopsis, rice, P. patens, maize and tomato is 16, 19, 13, 31 and 15, respectively [4-9]. Availability of the whole genome sequence allows us to perform genome-wide identification of SPLs in P. trichocarpa. Analysis of three versions of the annotated P. trichocarpa genome showed the existence of 28 full-length PtSPLs, which are distributed on 14 chromosomes. This is the first attempt to analyze the PtSPL gene family. The results provide a basis for elucidating the functions of SPLs in P. trichocarpa, a model forest tree.
The number of SPL genes in P. trichocarpa is much greater than that in Arabidopsis, rice, P. patens and tomato, although it is similar to the number of maize SPLs [4][5][6][7][8][9]. Sequence homology analysis suggests that gene duplication plays an important role in SPL gene expansion in P. trichocarpa. A total of 11 pairs of intrachromosome-duplicated PtSPLs were identified in this study. All of them clustered together in the phylogenetic tree (Figure 2). This is consistent with previous findings on the generation and maintenance of gene families in other organisms, such as mouse, human and Arabidopsis [52,53]. Gene duplication has been reported for many plant transcription factor gene families, such as MYB, AP2 and MADS [54][55][56], and duplicated SPL gene pairs have been identified in Arabidopsis (AtSPL10/11, AtSPL4/5 and AtSPL1/12) and rice (OsSPL2/19, OsSPL3/12, OsSPL4/11, OsSPL5/10 and OsSPL16/18) [57][58][59][60][61]. However, the number of homologous PtSPL gene pairs is clearly greater than that in Arabidopsis and rice, indicating that more segmental duplication events happened in Populus and that most SPL genes in Arabidopsis and Populus expanded in a species-specific manner [62][63][64].
Comparative analysis of P. trichocarpa PtSPLs and Arabidopsis AtSPLs revealed many conserved sequence features. For instance, all of the deduced proteins contain the highly conserved SBP domain with about 78 amino acid residues. The intron position and intron phase in the SBP-domain-encoding regions are also conserved among all SPL genes in P. trichocarpa and Arabidopsis, indicating that plant SPL genes originate from a common ancestor. Based on the neighbor-joining (NJ) phylogenetic tree constructed using MEGA5.1, 44 SPL proteins from P. trichocarpa and Arabidopsis were found to cluster into 6 groups. Each group includes at least one PtSPL and one AtSPL. The intron number and intron phase are similar for PtSPLs and AtSPLs within a group. The results suggest conservation between P. trichocarpa PtSPLs and Arabidopsis AtSPLs.
It has been shown that AtSPLs play significant regulatory roles in a variety of developmental processes in Arabidopsis. For instance, morphological traits of cauline leaves and flowers are regulated by AtSPL2, AtSPL10 and AtSPL11 [19]. Juvenile-to-adult growth phase transition and leaf initiation rate are controlled by the   [23][24][25]. Cu homeostasis in Arabidopsis is regulated by the member of group 6, AtSPL7 [22]. In this study, we found that many motifs were unique to or mainly existed in a group of SPLs. This is consistent with the redundant roles of AtSPLs in a group and indicates that PtSPLs in the same group may play roles similar to those of their Arabidopsis counterparts. SPLs in different groups could be functionally distinct. On the other hand, three PtSPL-specific motifs, namely motifs 11, 19 and 23, were identified, suggesting that some PtSPLs may play species-specific roles. Consistently, most paralogous PtSPL gene pairs in the same group show similar expression patterns, whereas a few of them exhibit differential patterns. The results indicate subfunctionalisation and neofunctionalisation of SPLs within a plant species and among different species. MiR156-mediated posttranscriptional regulation is important for the function of a subset of SPLs [11,41,65]. Target prediction showed that all PtSPLs in groups 1, 2 and 5 were regulated by miR156. The complementary sites of miR156 are located in the coding regions of G1 and G2 SPLs, whereas they are located in the 3' UTRs of G5 SPLs. This is consistent with the results from Arabidopsis SPLs and suggests the conservation of miR156-mediated posttranscriptional regulation in plants.

Conclusion
In this study, a total of 28 full-length SPLs were identified from the whole genome sequence of P. trichocarpa. Through a comprehensive analysis of gene structures, phylogenetic relationships, chromosomal locations, conserved motifs, expression patterns and miR156-mediated posttranscriptional regulation, the PtSPL gene family was characterized and compared with SPLs in Arabidopsis. The results showed that 28 PtSPLs and 16 AtSPLs clustered into 6 groups. Many PtSPLs and AtSPLs within a group are highly conserved in sequence features, gene structures, motifs, expression patterns and posttranscriptional regulation, suggesting the conservation of plant SPLs within a group. However, significant differences were observed for SPLs among groups. In addition, various motifs were identified in PtSPLs but not in AtSPLs. It suggests the diversity of plant SPLs. The results provide useful information for elucidating the functions of SPLs in P. trichocarpa.

Identification of PtSPL genes
The nucleotide sequences and deduced amino acid sequences of 16 known SPL genes in Arabidopsis [2,4] were obtained from the TAIR database (http://www.arabidopsis.org) (Additional file 2). The SBP domain of AtSPLs was identified using Pfam (http://pfam.sanger.ac.uk). BLAST searches for PtSPLs against Populus trichocarpa v1.1, v2.2 and v3.0 were carried out using the AtSPL SBP domains as the query sequences [32] (http://genome.jgi-psf.org/Poptr1_1/Poptr1_1.home.html, http://www.phytozome.net/poplar.php#B). An e-value cutoff of 1e-5 was applied. We also searched the databases for SBP using the keyword search tools on the web servers. Protein sequences retrieved from Populus trichocarpa v1.1, v2.2 and v3.0 were then aligned and combined based on sequence identities.

Figure legend (fragment): Transcript levels in roots were arbitrarily set to 1 and the levels in other tissues were given relative to this. Error bars represent standard deviations of the mean value from three biological replicates. ANOVA (analysis of variance) was calculated using SPSS. P < 0.05 was considered statistically significant.

Phylogenetic construction and motif analysis
Phylogenetic trees were constructed using the neighbor-joining (NJ) method in MEGA5.1. Branching reliability was assessed by the bootstrap resampling method using 1,000 bootstrap replicates. Only nodes supported by bootstrap values greater than 50% were analyzed. Conserved domains of PtSPLs were identified using Pfam (http://pfam.sanger.ac.uk) and by BLAST analysis of protein sequences against the Conserved Domain Database (CDD, http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) with an expected e-value threshold of 1.0 and a maximum hit size of 500 amino acids [70]. The 78 amino acids of the SBP domain were aligned using ClustalW. Sequence logos were generated using the WebLogo platform (http://weblogo.berkeley.edu/). Potential protein motifs were predicted using the MEME package (http://meme.sdsc.edu/meme/) with the following parameters: distribution of motifs, zero or one per sequence; maximum number of motifs to find, 25; minimum motif width, 8; and maximum motif width, 150. An e-value cutoff of 1e-10 was applied.
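
For readers who prefer a local run to the web server, the MEME settings listed above map roughly onto the command-line options of the MEME Suite. The sketch below only assembles such a command as an argument list and does not execute it. The input and output paths are hypothetical, and treating the 1e-10 cutoff as the `-evt` stopping threshold is our assumption about how the cutoff was applied.

```python
def build_meme_command(fasta_path: str, out_dir: str) -> list[str]:
    """Assemble a MEME command line mirroring the parameters described above."""
    return [
        "meme", fasta_path,
        "-protein",           # input sequences are proteins
        "-mod", "zoops",      # zero or one motif occurrence per sequence
        "-nmotifs", "25",     # maximum number of motifs to find
        "-minw", "8",         # minimum motif width
        "-maxw", "150",       # maximum motif width
        "-evt", "1e-10",      # ASSUMPTION: e-value cutoff applied as a stopping threshold
        "-oc", out_dir,       # output directory (overwritten if it exists)
    ]


# HYPOTHETICAL file names, for illustration only.
command = build_meme_command("PtSPL_proteins.fasta", "meme_out")
print(" ".join(command))
```

The list can be passed to `subprocess.run` on a machine where the MEME Suite is installed.
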

Quantitative real-time reverse transcription-PCR (qRT-PCR)
P. trichocarpa plants were grown in an artificial climate chamber for about one year. Young leaves (2nd-3rd from the top), mature leaves (12th from the top), young stems (1st-3rd from the top), young roots, and tissues of developing secondary xylem and phloem from the 4th-6th and 12th-25th internodes from the top of P. trichocarpa plants were collected. Three biological replicates were carried out. Total RNA was extracted using the plant total RNA extraction kit (Aidlab, China). Genomic DNA contamination was eliminated by pre-treating total RNA with RNase-free DNase (Promega, USA). RNA integrity was analyzed on a 1.2% agarose gel and its quantity was determined using a NanoDrop 2000C Spectrophotometer (Thermo Scientific, USA). Total RNA was reverse-transcribed using Superscript III Reverse Transcriptase (Invitrogen, USA). qRT-PCRs were carried out in triplicate for each tissue sample using gene-specific primers (Additional file 6) as described previously [71]. The program used for qRT-PCR was as follows: predenaturation at 95°C for 30 s, followed by 40 cycles of amplification at 95°C for 5 s, 60°C for 18 s and 72°C for 15 s. The length of amplicons was between 80 bp and 250 bp. Actin was used as a reference gene as described previously [72]. A dissociation curve was used to assess amplification specificity. Relative abundance of transcripts was analyzed using the comparative Ct method [73]. The formula 2^(-ΔΔCq), where Cq represents the threshold cycle, was used for relative quantification. Standardization of gene expression data from three biological replicates was performed as described [74]. For statistical analysis, ANOVA (analysis of variance) was calculated using SPSS (Version 19.0, IBM, USA). P < 0.05 was considered statistically significant.
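
As a worked illustration of the comparative method, the sketch below computes relative expression as 2^(-ΔΔCq), with the target gene normalized to the reference gene (Actin here) and the root sample used as the calibrator. The Cq values are hypothetical and serve only to show the arithmetic; the function and argument names are ours.

```python
def relative_expression(
    cq_target_sample: float,
    cq_reference_sample: float,
    cq_target_calibrator: float,
    cq_reference_calibrator: float,
) -> float:
    """Relative transcript level by the comparative method: 2 ** (-ddCq)."""
    # dCq: target gene normalized to the reference gene within each tissue.
    delta_sample = cq_target_sample - cq_reference_sample
    delta_calibrator = cq_target_calibrator - cq_reference_calibrator
    # ddCq: tissue of interest relative to the calibrator tissue.
    delta_delta = delta_sample - delta_calibrator
    return 2.0 ** (-delta_delta)


# HYPOTHETICAL Cq values: a PtSPL gene and Actin in roots (calibrator) and in a leaf sample.
root = {"target": 26.0, "actin": 18.0}
leaf = {"target": 24.0, "actin": 18.0}

print(relative_expression(root["target"], root["actin"], root["target"], root["actin"]))  # 1.0
print(relative_expression(leaf["target"], leaf["actin"], root["target"], root["actin"]))  # 4.0
```

By construction the calibrator tissue always evaluates to 1, and a Cq that is two cycles lower in the sample corresponds to a fourfold higher relative level, assuming roughly 100% amplification efficiency.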

Microarray data analysis
Microarray data for PtSPLs were obtained using the ePlant tissue expression tool at PopGenIE (http://www.popgenie.org/). The data were gene-wise normalized and then analyzed using the average linkage clustering technique in Cluster 3.0 [75].