Genome-wide analysis and stress-responsive expression of CCCH zinc finger family genes in Brassica rapa

Ubiquitous CCCH nucleic acid-binding motif is found in a wide-variety of organisms. CCCH genes are involved in plant developmental processes and biotic and abiotic stress responses. Brassica rapa is a vital economic crop and classical model plant of polyploidy evolution, but the functions of CCCH genes in B. rapa are unclear. In this study, 103 CCCH genes in B. rapa were identified. A comparative analysis of the chromosomal position, gene structure, domain organization and duplication event between B. rapa and Arabidopsis thaliana were performed. Results showed that CCCH genes could be divided into 18 subfamilies, and segmental duplication might mainly contribute to this family expansion. C-X7/8-C-X5-C3-H was the most commonly found motif, but some novel CCCH motifs were also found, along with some loses of typical CCCH motifs widespread in other plant species. The multifarious gene structures and domain organizations implicated functional diversity of CCCH genes in B. rapa. Evidence also suggested functional redundancy in at least one subfamily due to high conservation between members. Finally, the expression profiles of subfamily-IX genes indicated that they are likely involved in various stress responses. This study provides the first genome-wide characterization of the CCCH genes in B. rapa. The results suggest that B. rapa CCCH genes are likely functionally divergent, but mostly involved in plant development and stress response. These results are expected to facilitate future functional characterization of this potential RNA-binding protein family in Brassica crops.

Zinc finger motif was first found in the Xenopus transcription factor IIIA, and zinc finger protein is now regarded as one of the most abundant protein family in eukaryotic genomes. Zinc finger motif was named based on the zinc-binding amino acids and the requisition of zinc ions to stabilize its structure. Zinc finger motif is a small, functional, independently folded domain [16][17][18]. Zinc finger proteins play a multi-faceted role in numerous biological processes, including DNA recognition, RNA packaging, transcriptional activation or repression, regulation of apoptosis, protein folding and assembly, and lipid binding [18,19]. Fourteen zinc finger families have been found in plants, in which DNA or protein binding proteins are over-represented [20]. Cys2His2 (C2H2) zinc finger is the most common DNA-binding motif found in eukaryotic transcription factors [21]. By contrast, the ubiquitous CCCH motif is preferentially function in RNA-binding and processing. In mammals, a prototypical tandem CCCH zinc finger protein tristetraprolin (TTP) binds to the TNFα (tumor necrosis factor α) ARE (AU-rich element) in 3' UTR region to destabilize TNFα mRNA [22]. In Arabidopsis and rice, different CCCH motifs are characterized by variable number of amino acid spacers between each cysteine and cysteine-histidine (C-X 4-15 -C-X 4-6 -C-X 3 -H) [23]. The CCCH zinc finger proteins usually contain 1-6 CCCH repeated motifs, and C-X 7/8 -C-X 5 -C-X 3 -H is the most abundant motif in Arabidopsis and rice CCCH proteins. Arabidopsis has 68 CCCH protein genes that are divided into 11 subfamilies [23]. AtC3H14 (At1G66810) and AtC3H15 (At1G68200) containing the C-X 8 -C-X 5 -C-X 3 -H-X 18 -C-X 8 -C-X 5 -C-X 3 -H motif are the only two Arabidopsis tandem CCCH zinc finger (TZF) proteins with a conserved TZF motif identical to animal counterparts [24]. AtC3H48 (AT4G25440), AtC3H59 (AT5G40880), AtC3 H62 (AT5G49200), and AtC3H63 (AT5G51980), containing one or two CCCH motif(s) of C-X 7 -C-X 4/5 -C-X 3 -H and additional seven WD40 (WD or beta-transducin repeats) domains, are unique to plants [23,25,26]. Rice is another model plant for CCCH zinc finger protein research. Rice CCCH zinc finger proteins contain 67 members in 8 subfamilies. Rice CCCH proteins are mainly involved in abiotic stress response and development [23].

Identification of CCCH genes in B. rapa
Using Arabidopsis CCCH zinc finger proteins as queries, we screened B. rapa genome by BLASTp tool (http:// brassicadb.org/brad/). We identified and reaffirmed 103 CCCH zinc finger proteins by SMART and NCBI in B. rapa (Additional file 1). 102 of the 103 corresponding CCCH genes were mapped to chromosome A01-A10 ( Fig. 1). Chromosome A09, the first longest chromosome and Chromosome A03, the second longest chromosome, possesses the largest number (16) of CCCH genes followed by ChrA07 with 13 members. ChrA10 is the shortest chromosome, 20.72 M, and carries the highest density of 11 CCCH genes. ChrA02, the third longest chromosome, carries the lowest density of CCCH genes. No CCCH genes were observed in the middle region (longer than 10 M) of ChrA01 and ChrA02. The average density of CCCH genes on each chromosome is lower than that of Arabidopsis, whereas higher than that of rice and maize (Fig. 2).

Duplication of CCCH genes in B. rapa
Gene duplication plays a vital role in the evolution of new gene functions and is one of the primary drivers of adaptive evolution [50]. According to the criteria established previously, some B. rapa CCCH genes have been apparently duplicated [51] Additional file 2). The highest frequency of CCCH gene segmental duplication events occurred between ChrA01 and ChrA03 which contained five segmental duplication events, followed by four segmental duplication events between ChrA03 and ChrA05. Most of the duplicated gene pairs are linked, suggesting that chromosome or segment duplication might occur among ChrA01, ChrA03 and ChrA05 (Fig. 1). Six additional CCCH gene pairs located on ChrA03 and ChrA05 were not regarded as duplicated genes because CDS coverage or protein identify did not meet the criteria of Yang [51] and Sun [52], even though they are orthologous to the corresponding Arabidopsis genes (Additional file 2). All of the Ka/Ks ratios were significantly < 0.5, suggesting a result of negative selection. The gene duplication date was estimated around 0.29-19.58 MYA (Million Years Ago) (Additional file 2).

Phylogenetic analysis of CCCH zinc finger proteins in B. rapa
To further characterize the CCCH genes in B. rapa, 103 CCCH zinc finger protein sequences were subjected to phylogenetic analysis by the Maximum Likelihood (ML) method ( Fig. 3; Additional files 1 and 3; [23]). Based on the Arabidopsis classification system [23], the aforementioned 84 out of 103 CCCH zinc finger proteins could be grouped into 18 distinct subfamilies. Subfamily-I with 19 members is the largest, followed by subfamily IX containing 17, whereas subfamily-III has just one member. Subfamily-XVIII contains the proteins with the longest sequences (Fig. 4). The distribution of CCCH zinc finger proteins in B. rapa phylogenetic tree constructed by the Maximum Likelihood method is similar to that made by Neighbor-joining (NJ) phylogenetic trees of Arabidopsis-B. rapa (Additional file 4) or ML phylogenetic trees of Arabidopsis-B. rapa-rice (Additional file 5).

Structures of B. rapa CCCH genes
To understand the evolution of gene diversification, the gene structures were analyzed (Fig. 5). Results show that the intron/exon number of CCCH genes changed in a wide range, from1-16, but each subfamily was relatively conserved in gene structure. For example, subfamily-I genes always contain five to seven exons except BraA05g011870, BraA05g034410 and BraA07g011120. Subfamily-VI and subfamily-VII genes contain two and three exons, respectively. All gene structures of subfamily-IX have only one exon, not intron. Interestingly, some ungrouped genes had more complex gene structures. Duplicated gene pairs were mainly distributed in subfamily-I and IX, but all of subfamily-VII/XIV members were triplicated genes. Most of the duplicated genes displayed similar gene structure between or among sister genes.   Domain organization of B. rapa CCCH zinc finger proteins In order to determine the evolution and conservation of CCCH zinc finger proteins, full-length proteins were used to decipher the domain organization. There were significant differences in domain organization among subfamilies, even for the type and number of CCCH motif in B. rapa. Altogether, 15 types and 257 CCCH motifs (C-X 3-17 -C-X 4-10 -C-X 1-5 -H) were identified. Among them, C-X 7/8 -C-X 5 -C 3 -H was the most common motif (Figs. 2 and 4). In general, each CCCH zinc finger protein carries 1-6 copies of CCCH motifs (Fig. 4). C-X 7/8 -C-X 5 -C 3 -H motif was mainly found in subfamily-I, VI, VII, VIII, XI and XIII. In contrast to Arabidopsis, C-X 7 -C-X 6 -C-X 3 -H and C-X 10 -C-X 5 -C-X 3 -H motifs were not present in B. rapa, whereas novel motif C-X 17 -C-X 6 -C-X 3 -H was found in BraA07g000550 and BraA09g012200 of subfamily-XIV. The C-X 17 -C-X 6 -C 3 -H motif is not present in Arabidopsis, but does exist in maize genome [20]. The novel motifs C-X 3 -C-X 5 -C 1 -H, C-X 17 -C-X 6 -C 3 -H, C-X 8 -C-X 5 -C 5 -H and C-X 4 -C-X 10 -C 2 -H exist in unassorted proteins BraA10g028450, BraA09g056300, BraA01g000170 and BraA02g003110, respectively. Except CCCH motifs, WD40 (WD or beta-transducin repeats), ANK (Ankyrin repeats), RRM (RNA recognition motif) and RING (Really Interesting New Gene) domains exist in subfamily-IV, IX, X/XI, XVI, respectively.

Conserved stress-responsive subfamily-IX members
Arabidopsis RR-TZF genes (Subfamily-IX) play pivotal roles in plant growth, development and stress response likely by targeting AU-rich RNA elements at 3' UTR and recruiting catabolic machineries to trigger mRNA degradation [34]. To characterize the corresponding subfamily-IX homologs in B. rapa, the proteins sequences were used to constructing the phylogenetic tree and deciphering the domain organization, and their transcriptional responses to ABA, drought, and salt stresses were determined (Figs. 6 and 7). Arabidopsis and B. rapa RR-TZF proteins can be divided into two groups (Fig. 6a), due to highly conserved  Ankyrin repeats (ANK domain) which is essential for all known Notch signaling pathway by mediating proteinprotein interactions [53]. Group-2 members contain both arginine-rich motif (RR) and TZF domains, whereas group-1 members contain an additional two or three ANK repeats. B. rapa RR-TZF proteins contain conserved C-X 7-8 -C-X 5 -C-X 3 -H and C-X 5 -C-X 4 -C-X 3 -H motifs spaced by 16 amino acids and an RR motif which contains a conserved C-X 5 -H-X 4 -C-X 3 -H motif (Fig. 6b). We used a program from Wang et al. [23] to detect Nuclear Export Signal (NES) in B. rapa RR-TZF proteins, and found all members contained a putative NES sequence, indicating that they may be nucleocytoplasmic shuttling proteins involved in signal transduction (Fig. 6c).
The expression profiles of RR-TZF genes in B. rapa leaves at 4 different developmental stages are detected by real-time PCR (Fig. 7). Twelve of the 17 RR-TZF genes showed remarkably elevated expression under NaCl, ABA or mannitol treatment. However, BraA02g004530, BraA0 3g021290, BraA03g022120, BraA04g029550, BraA10g02 5840 showed no significant expression changes, so they might not be involved in responses to these stresses. Most of the RR-TZF genes had much higher expression under ABA than under control (MS) conditions, except for the five aforementioned genes and BraA10g029760 and BraA 05g005940. Among the ABA-responsive genes, BraA07 g001000, BraA01g008400, BraA03g054960 and BraA09 g046730 showed remarkable induction, with higher expression levels than under NaCl or mannitol stress stimulus. BraA10g029760 and BraA05g005940 were easily induced by NaCl, while BraA09g020370 was strongly induced by mannitol stress. Most B. rapa RR-TZF genes showed increasing expression within 3-6 h, after which their expression declined, except for BraA01g008400 and BraA09g046730 under ABA stress, and BraA08g035210 and BraA05g005940 under NaCl stress, which showed expression peaks after only 1 h.

Discussion
Identification and classification of B. rapa CCCH zinc finger proteins Brassica crops are not only vital economic crops, but also classical model plants of polyploidy evolution. B. rapa, one of the most important vegetable crops and genomic model organisms, may be the putative contributor of the A-subgenome (B. oleracea provides C-subgenome, and B. napus, the hybrid offspring B. rapa and B. oleracea, contains A-and C-subgenome) [49]. Previous studies showed that transcription factors of B. rapa were significantly over-retained [49]. CCCH genes, as ubiquitous regulators in a variety of organisms, function in plant development and stress response by interacting with DNA, RNA or proteins [22,23]. Rameneni et al. has been just identified 63 CCCH genes in B. rapa [54], and it was much less than its expected quantity [49]. Here, we identified and confirmed 103 CCCH genes in B. rapa (Fig. 1, Additional file 1), which is much more abundant than 68 in Arabidopsis (n = 5, [23]), 67 in rice (n = 12, [23], 68 in maize (n = 10, [20] and even that of plant species with nearly twice chromosome number, such as 91 in poplar (n = 19, [44]) and 69 in grape (n = 19, [43]), and a similar number to tetraploid switchgrass [48].
B. rapa genes have an average transcript length of 2015 bp, a coding sequence length of 1172 bp and a mean of 5.03 exons per gene [49]. B. rapa CCCH genes intron/exon numbers varies in a wide range, from1-16. Most of the sequences have less than 10 exons, and the average is 5.72 exons per gene (Fig. 5). Moreover, similar to maize, the range of exon length of B. rapa CCCH genes is very wide. Together these results indicate that the gene structure and function are highly diverse among CCCH genes [43]. However, the gene structure and domain organization are relatively conserved in each subfamily, suggesting functional redundancy of the subfamily members.

Duplication and evolution of B. rapa CCCH genes
The CCCH family appears to undergo complicated evolution processes and become one of the largest gene families in plants [23,44]. Brassica genome has undergone an additional whole genome triplication (WGT) since its divergence from the Arabidopsis lineage at least 13-17 MYA. More than 90% of the B. rapa genome is syntenic with that of the Arabidopsis genome [49]. Previous study also showed that the triplicated Brassica genome segments diverged from a common ancestor soon after Arabidopsis and Brassica lineages divergence, and about 35% of these genes have been lost, most likely via a deletion mechanism in an interspersed pattern [61]. However, genes encoding proteins involved in signal transduction or transcriptional regulation are largely well retained [49,61]. Environmental factors may play crucial roles in transcription factors reservation [49]. Similarly, CCCH gene family of B. rapa may have expanded because of genome triplication. After that the gene family might still retain most key members because of their comprehensive and vital functions in response to abiotic or biotic stresses, although there is an overall shrinkage of some members since the total number is just~1.5 fold to that of Arabidopsis ( [49], Fig. 2). Brassica species B. oleracea (2n = 20) and B. nigra (2n = 16) CCCH families also expanded, possessing 75, 92 sequences respectively (Additional files 3, 7 and 8).
It is surprising that all 17 duplicated CCCH gene pairs are segmental duplication and except one group on ChrA 09, and all sister genes are orthologous of Arabidopsis counterparts except for one pair ( Fig. 1; Additional file 2). Ka, Ks, and Ka/Ks ratios of duplicated genes were calculated to explore the mode of selection [20,62]. Generally, a ratio < 1 means negative selection, a ratio = 1 means neutral selection, while a Ka/Ks ratio > 1 means positive selection [20]. Most of the resultant Ka/Ks ratios were significantly < 0.5 (Additional file 2). These results are highly consistent with previous reports, strongly suggesting that most of duplicated CCCH genes undergo purifying selection and the functions of the duplicated genes don't diverge much after the duplication events [20,44]. The relative conserved gene structures and absolute conserved domain organizations in/among sister genes of duplicated gene pairs further demonstrate the retentive function (Figs. 4 and 5).
The estimated dates of duplication indicated that 9 gene-pairs duplication events might occur before the additional whole genome triplication (WGT), and the others might occur after that (Additional file 2; [49]).

Domain organization diversity and putative function analysis
Domain organization directly related to the protein functions [63]. Domain organization is conserved between Arabidopsis and B. rapa. Most of subfamily-I proteins are highly conserved with five C-X 8 -C-X 5 -C-X 3 -H motifs, such as Arabidopsis AtC3H37 (HUA1, AT3G12680) that regulates stamen, carpel, and floral development by preferentially binding to poly rU and poly rG [27]. Subfamily-II is conserved between plant and animal, such as Arabidopsis AtC3H14 (At1G66810) and AtC3H15 (At1G68200) containing the typical C-X 8 -C-X 5 -C-X 3 -H-X 18 -C-X 8 -C-X 5 -C-X 3 -H motif found in animal TZFs. AtC3H14 is involved in secondary wall biosynthesis [24]. Subfamily-III such as Arabidopsis AtC3H1 (AT1G01350) with C-X 8 -C-X 5 -C-X 3 -H motif also belongs to RING-finger family, and is involved in lignin biosynthesis [28,29]. The B. rapa subfamily-III homolog BraA10g000600 has conserved CCCH and RING domain. All subfamily-IV proteins, including Arabidopsis and B. rapa homologs, contain a C-X 7 -C-X 4 -C-X 3 -H or C-X 7 -C-X 5 -C-X 3 -H motif, plus additional six WD40 repeats, which is a protein-protein or protein-DNA interaction domain [56]. The subfamily-IV proteins are plant-specific and may be involved in chlorophyll biosynthesis and light response [23,25]. subfamily-V, X and XI proteins in Arabidopsis contain a C-X 7 -C-X 5 -C-X 3 -H motif and an RNA recognition motif (RRM). RRM is the most common RNA-binding domain in eukaryotes. Plant RRM-containing proteins are involved in the regulation of flowering and adaptation to heat stress [57,58,64]. Subfamily-VI, BraA05g023320 and BraA09g032780 have one C-X 7 -C-X 5 -C-X 3 -H and two C-X 8 -C-X 5 -C-X 3 -H motifs, whereas the BraA01g033820 and BraA03g038850 have two C-X 7 -C-X 5 -C-X 3 -H and one C-X 8 -C-X 5 -C-X 3 -H motif. Subfamily-VII proteins, containing a C-X 7 -C-X 5 -C-X 3 -H, a C-X 8 -C-X 5 -C-X 3 -H and a conserved RNA-binding K homology motif (KH), are conserved in domain organization and may possess transactivation and RNA-binding activities, that are also known to have redundant roles in the regulation of flowering and senescence in Arabidopsis [31,59]. Subfamily-IX members are characterized by two identical C-X 7-8 -C-X 5 -C-X 3 -H and C-X 5 -C-X 4 -C-X 3 -H motifs separated by 16-18 amino acids. The subfamily-IX proteins are involved in plant growth, development, and stress response [33,34]. Subfamily-X and subfamily-XI have similar domain organization. Subfamily-XV proteins have two C-X 7 -C-X 5 -C-X 3 -H and one C-X 8 -C-X 4 -C-X 3 -H. Subfamily-XVI proteins have three C-X 7 -C-X 5 -C-X 3 -H, one C-X 9 -C-X 5 -C-X 3 -H and a protein-binding RING domain. Subfamily-XVII proteins have a C-X 7 -C-X 5 -C-X 3 -H and an auxin-repressed motif. Two subfamily-XVIII proteins carry PHD (Plant Homeo Domain), SWIB (SWI/SNF complex B) and GYF (glycine-tyrosine-phenylalanine) motifs except C-X 7 -C-X 5 -C-X 3 -H. Except CCCH, RING, PHD, RRM, ANK motifs are also responsible for protein-protein or protein-DNA or RNA binding, and it suggests that CCCH protein might involved in multifunction [53,57,58,64,65].

Conclusion
In this study, we identified 103 CCCH genes in B. rapa. Eighty-eight of these genes are categorized into 18 subfamilies based on the results of phylogenetic, gene structure and domain organization analysis. Gene structure and domain organization results reveal that CCCH genes are functional diverged, but highly conserved among members within subfamily. There are nine diploid gene pairs and seven triploid gene pairs, and all duplicated genes are due to segmental duplication. Furthermore, the results of expression profiling suggest that members of subfamily-IX might be involved in ABA, drought, and salt stress response.

Analysis of gene structure, domain organization, and phylogenetic relationship
The gene structures were visualized using the Gene Structure Display Server (http://gsds.cbi.pku.edu.cn/). The site information of the domain organization was used to construct a protein organization sketch map using IBS1.0 [73].
Multiple sequence alignment of CCCH zinc finger proteins was carried out using the MUSCLE (MUltiple Sequence Comparison by Log-Expectation) program [74] and the resulting file was subjected to phylogenic analysis using the MEGA 7.0 program [75]. A tree was constructed based on the full-length protein sequences using the Maximum Likelihood (ML) method with Partial deletion and Poisson model, and a Bootstrap test of 1000 replicates for internal branch reliability.

Duplicated genes encoding CCCH zinc finger proteins
Duplicated genes were defined according to Yang et al. [51]: gene pairs in which both the coverage of the shorter full-length-CDS sequence covering and the identifies of their encoding amino acid > 70% were regarded as duplicated genes. Tandem duplicates were defined following Sun et al. [52]: duplicated genes located within 100 kb that were separated by ten or fewer non-homologues were defined as tandemly duplicated genes. The full-length CDS sequence coverage and amino acid identities were determined using Blastn/Blastp at the NCBI website [71].
The number of nonsynonymous mutations (Ka) and the number of synonymous substitutions (Ks) of duplicated genes were calculated by DnaSP 6.0 [62]. The Ka/Ks ratios between duplicated genes were analyzed to determine the mode of selection. The duplication time (T, million years ago, MYA) was calculated as T = Ks/2λ × 10 − 6 (λ = 1.5 × 10 − 8 for B. rapa [76].
Plant material and stress treatment B. rapa seedlings were grown on 1/2 MS plates at 25°C under a 16 h light/8 h dark photoperiod. Three-week-old seedlings with 2-3 true leaves were placed in a growth chamber for 3 days to acclimatize before treatment with ABA, NaCl and mannitol. The stress treatments were conducted in accordance with Lee et al. [77]. The whole seedlings were harvested and put into 1/2 liquid MS medium with 250 mM NaCl, 100 μM ABA or 300 mM mannitol. The seedlings were sampled to detect gene expression response to stress at 1, 3, 6, and 9 h and untreated seedlings were used as control at the same time points. Triplicate seedling samples were collected. The materials were quickly frozen in liquid nitrogen and stored at − 80°C for further analysis.

RNA extraction and real-time quantitative RT-PCR
RNA extraction and Real-time quantitative RT-PCR were conducted as described previously [71]. Total RNA was extracted from the samples using a TRIzol reagent kit (Invitrogen, Carlsbad, CA, US) according to the manufacturer's specifications. The RNA integrity was evaluated using agarose gel electrophoresis and ethidium bromide staining. The RNA preparation was then treated with Dnase I and first strand synthesis of cDNA was performed by using oligo (dT) primer and RT Enzyme (Thermo Fisher, USA).
The quantitative real-time PCR was carried out with SYBR-green fluorescence using a CF × 96 Real Time System (BIORAD) with a 20 μl PCR reaction mixture that included 8.8 μl of diluted cDNA, 10 μl of 2 × Fas-tStart Universal SYBR Green Master (ROX) (Roche, Switzerland), and 0.6 μl of forward and reverse primer (Additional file 9). The BraA02g003190 gene was used as a reference gene. Each sample was run in triplicate for analysis. At the end of the PCR cycles, melting curve analysis was performed to validate the specific generation of the expected PCR product. The expression levels of RR-TZF genes were calculated with the 2 − ΔΔCT method [78].