Genome-wide identification and analysis of the WRKY gene family in jute (Corchorus capsularis)

Background: WRKY transcription factors play important roles in plant responses to biotic and abiotic stress and in plant growth and development. However, little information has been available about the WRKY genes in jute (Corchorus capsularis). Results: In the present study, 43 jute WRKY (CcWRKY) genes were identified by Pfam domain search and BLAST homology alignment based on jute transcriptome data. Gene structure, phylogeny, conserved domains and three-dimensional protein structure were analyzed with GSDS2.0, MEGA7.0, DNAMAN5.0, WebLogo 3 and SWISS-MODEL. According to the features of the conserved WRKY domain and a phylogenetic analysis with Arabidopsis thaliana, the 43 members were divided into three classes, I, II and III, containing 9, 28 and 6 members, respectively. Class II was further divided into five subclasses: II-a (2), II-b (7), II-c (7), II-d (6) and II-e (6). Gene structure analysis showed that the exon number of CcWRKY genes was highly variable (3-11 exons), even within the same subgroup. Most CcWRKY genes were expressed in different tissues, but they were mainly expressed in stem bark and stem stick. After GA3 stress, the expression of most WRKY genes in the GA3-sensitive variety "Aidianyehuangma" differed significantly from that in the normal variety "Huangma 179". These results indicated that CcWRKY genes play an important role in the gibberellin biosynthesis pathway and fiber development. Conclusions: CcWRKY proteins are highly conserved, whereas gene length and intron number vary widely. All WRKY genes showed a variety of expression patterns in different tissues, and most of them responded to GA3 stress, which play an important role in gibberellin


Background
The WRKY gene family encodes transcription factors that exist only in plants. They are mainly involved in transcriptional regulation and signal transduction in plants [1]. In the transcriptional regulatory network, WRKY transcription factors bind specific DNA sequences to activate or repress transcription of multiple target genes [2,3]. The conserved WRKY domain contains approximately 60 amino acid residues.
In the WRKY domain, a conserved WRKYGQK heptapeptide sequence is usually followed by a C2H2- or C2HC-type zinc finger structure. According to the number of WRKY domains and the type of zinc finger structure, the WRKY family can be divided into three groups: Group I, Group II and Group III [4]. Group II can be further divided into five subgroups: II-a, II-b, II-c, II-d and II-e. Group I contains two WRKY domains and a C2H2 zinc finger structure, Group II contains one WRKY domain and a C2H2 zinc finger structure, and Group III contains one WRKY domain and a C2HC zinc finger structure [5].
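The grouping rule described above can be written as a small lookup. This is only an illustrative sketch: the function name `classify_wrky` and the string labels are hypothetical, and combinations not covered by the rule are reported as unclassified rather than guessed.

```python
def classify_wrky(n_wrky_domains: int, zinc_finger: str) -> str:
    """Return the WRKY group ("I", "II" or "III") for a protein.

    n_wrky_domains: number of WRKY domains found in the protein.
    zinc_finger:    zinc finger type, "C2H2" or "C2HC" (hyphens are ignored).
    """
    zf = zinc_finger.upper().replace("-", "")
    rules = {
        (2, "C2H2"): "I",    # two WRKY domains, C2H2 zinc finger
        (1, "C2H2"): "II",   # one WRKY domain, C2H2 zinc finger
        (1, "C2HC"): "III",  # one WRKY domain, C2HC zinc finger
    }
    # Anything outside the three stated cases is left unclassified.
    return rules.get((n_wrky_domains, zf), "unclassified")


if __name__ == "__main__":
    for case in [(2, "C2H2"), (1, "C2H2"), (1, "C2-H-C")]:
        print(case, "->", classify_wrky(*case))
```

Subgroups II-a to II-e are not covered here, because they are assigned from the phylogeny rather than from domain counts.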
Jute (Corchorus capsularis L., 2n = 14) is a natural fiber plant belonging to the Malvaceae family. It is mainly cultivated in Bangladesh, India and China, among other countries. Although many studies have shown that WRKY genes play important roles in plant growth and development, stress tolerance and fiber development [23], little information has been available about the WRKY genes in jute. This study is therefore the first to identify and analyze the WRKY transcription factors in jute. The aims were: (1) identification, phylogenetic analysis and structure analysis of the WRKY gene family in jute; and (2) expression analysis of the WRKY gene family in different tissues and under GA3 stress.

Plant materials
The tested varieties "Huangma 179" and "Aidianyehuangma" were planted at the Yangzhong Science and Education Base of Fujian Agriculture and Forestry University, Youxi County, Sanming City, Fujian Province, on May 1st, 2017. Ten days after sowing, hypocotyl samples were collected from three individuals separately. On July 1st, 2017 (60 days after sowing), stem bark and stem stick were collected from three individuals separately. On September 1st, 2017 (120 days after sowing), stem bark was collected from three individuals separately. The samples were immediately frozen in liquid nitrogen and stored at -80 °C for subsequent analysis. The three samples of each tissue were stored separately as three biological replicates.
To identify the jute WRKY genes that respond to GA3 stress, six jute plants at 60 days after sowing were treated with GA3. The GA3 stress treatment lasted 4 hours and 3 days, respectively. After the treatment, the stem bark of each plant was collected. In addition, three jute plants of the same age were sampled separately as controls. All samples were immediately frozen in liquid nitrogen and stored at -80 °C for follow-up analysis.

Identification and Analysis of Conserved Domains of CcWRKY
The assembled sequence data have been deposited in the NCBI Sequence Read Archive (SRA, http://www.ncbi.nlm.nih.gov/Traces/sra) under accession SRP215917. Assembly of the CcWRKY genes and ORF analysis followed Islam et al. [24] and Zhang, L. et al. [25]. To obtain the WRKY family genes in the jute genome, a local BLASTP search was performed to identify complete WRKY members, using Arabidopsis WRKY protein sequences as queries. In addition, the conserved domain prediction software Pfam was used to confirm that all candidate genes contain the conserved WRKY domain [26].
Three-dimensional homology models of the 43 jute WRKY transcription factors were built with SWISS-MODEL [27], and the tertiary structures of the WRKY proteins were then identified.

Phylogenetic analysis of CcWRKY protein
Schematic diagrams of the exon-intron structure of the CcWRKY genes were drawn with GSDS2.0 (http://gsds.cbi.pku.edu.cn/). Phylogenetic and molecular evolutionary analyses were conducted in MEGA7 (http://www.megasoftware.net) [28] using pairwise distances and the neighbor-joining algorithm. Evolutionary distances were computed with the p-distance method and are expressed as the number of amino acid differences per site. The reliability of each tree was assessed with 1,000 bootstrap replicates [12]. AtWRKY proteins were included in the phylogenetic analysis to facilitate the subgroup classification of CcWRKY. WRKY family sequences from Arabidopsis were downloaded from TAIR (http://www.arabidopsis.org/).
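For readers unfamiliar with the p-distance, the following minimal sketch shows how it is computed for one pair of aligned protein sequences. It is not the MEGA7 implementation. The function name `p_distance` is hypothetical, and skipping gapped positions pair by pair (pairwise deletion) is an assumption, since the gap treatment used in the analysis is not stated.

```python
def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap are skipped
    (pairwise deletion; an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    different = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            different += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return different / compared


if __name__ == "__main__":
    # Toy example: the canonical heptapeptide versus a single-residue variant.
    print(p_distance("WRKYGQK", "WRKYGKK"))  # one difference in seven sites
```

A full distance matrix is obtained by applying the function to every pair of aligned sequences; the neighbor-joining step then operates on that matrix.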

Expression analysis of CcWRKY genes at different growth stages and under GA3 stress
Based on the FPKM values of the genes in different tissues (hypocotyl at 10 d, leaf at 60 d, root at 60 d, stem bark at 60 d, stem stick at 60 d and stem bark at 120 d) and in response to GA3 stress in stem bark at 60 d, heatmaps and histograms of the WRKY genes were drawn with Excel, SPSS and R.

RNA extraction and quantitative real-time PCR (qRT-PCR)
RNA was extracted from the leaves, roots, stem bark and stem stick of jute at the vegetative stage (60 days after sowing) with the EZNA Plant RNA Kit (OMEGA), and cDNA was synthesized with a reverse transcription kit (Takara). Actin [29] was used as the reference gene for qRT-PCR.
The qRT-PCR reaction system (20 μl in total) contained 10 μl of GoTaq® qPCR Master Mix, 0.4 μl of forward primer (10 μM), 0.4 μl of reverse primer (10 μM), 2 μl of cDNA and 7.2 μl of nuclease-free water. Amplification was performed with an initial step of 10 min at 95 °C, followed by 40 cycles of denaturation at 95 °C for 15 s and primer annealing at 60 °C for 1 min. The melting curve used the default program of the instrument (ABI7500). The qRT-PCR reaction for each gene was repeated three times, with similar results. The qRT-PCR data were analyzed with the 2^-ΔΔCT method. The primers were designed to avoid the conserved WRKY domain.
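The 2^-ΔΔCT calculation can be sketched as follows. The function and argument names are hypothetical, and the Ct values in the example are invented for illustration only; they are not measurements from this study.

```python
def relative_expression(ct_target_treated: float,
                        ct_ref_treated: float,
                        ct_target_control: float,
                        ct_ref_control: float) -> float:
    """Relative expression by the 2^-ddCt method.

    dCt  = Ct(target gene) - Ct(reference gene, here Actin)
    ddCt = dCt(treated sample) - dCt(control sample)
    """
    d_ct_treated = ct_target_treated - ct_ref_treated
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_treated - d_ct_control
    return 2.0 ** (-dd_ct)


if __name__ == "__main__":
    # Invented Ct values: the target crosses the threshold two cycles
    # earlier after treatment, while the reference gene is unchanged.
    fold = relative_expression(24.0, 18.0, 26.0, 18.0)
    print(fold)  # 4.0, i.e. a four-fold increase relative to the control
```

A value above 1 indicates higher expression than in the control, and a value below 1 indicates lower expression. The method assumes comparable amplification efficiency for the target and the reference gene.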

Identification of full-length CcWRKY genes
To obtain the WRKY family genes in the jute genome, a local BLASTP search was performed to identify complete WRKY members, using Arabidopsis WRKY protein sequences as queries. In addition, the Pfam conserved domain prediction software was used, which yielded 43 candidate genes containing the WRKY domain (named CcWRKY), as shown in Table 1.

Analysis of conserved domains of CcWRKY genes
The WRKY domain sequences of the jute WRKY proteins were identified and aligned with DNAMAN5.0, and the conserved structure was then visualized with WebLogo. The results showed that the conserved domains of the jute WRKY gene family could be divided into three groups: I, II and III. Group I had nine members, and its domains could be further divided into I-C and I-N subgroups. Group I members contain two WRKY domains and zinc finger structures, and the zinc finger structure is C-X4-C-X22-23-H-X-H. Group II could be further divided into subgroups II-a, II-b, II-c, II-d and II-e, with 2, 7, 7, 6 and 6 members, respectively. In II-a, II-b, II-d and II-e, the heptapeptide and the zinc finger structure of the C-terminal WRKY domain were WRKYGQK and C-X5-C-X23-H-X-H, while in II-c they were WRKYGQK and C-X4-C-X23-H-X-H. There were six members in Group III, in which the heptapeptide and the zinc finger structure of the C-terminal WRKY domain were WRKYGQK and C-X7-C-X23-H-X-C (Fig 1).
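As a rough illustration of how such motifs can be located in a protein sequence, the sketch below searches for the WRKYGQK heptapeptide and then for a zinc finger pattern downstream of it. It is not the DNAMAN or WebLogo procedure used here. The regular expressions encode one reading of the spacings given above (C, 4 or 5 residues, C, 22 or 23 residues, H, 1 residue, H for the C2H2 type; C, 7 residues, C, 23 residues, H, 1 residue, C for the C2HC type), and the test sequences are synthetic, not jute proteins.

```python
import re

HEPTAPEPTIDE = "WRKYGQK"

# Spacing patterns as interpreted from the text (an assumption of this sketch).
ZINC_FINGERS = {
    "C2H2": re.compile(r"C.{4,5}C.{22,23}H.H"),
    "C2HC": re.compile(r"C.{7}C.{23}H.C"),
}


def scan_wrky_domain(protein: str):
    """Return (heptapeptide_start, zinc_finger_type) or None.

    heptapeptide_start is the 0-based index of the first WRKYGQK match.
    zinc_finger_type is "C2H2", "C2HC", or None when no pattern is found
    downstream of the heptapeptide. Mutated heptapeptides are not detected.
    """
    seq = protein.upper()
    start = seq.find(HEPTAPEPTIDE)
    if start < 0:
        return None
    downstream = seq[start + len(HEPTAPEPTIDE):]
    for name, pattern in ZINC_FINGERS.items():
        if pattern.search(downstream):
            return (start, name)
    return (start, None)


if __name__ == "__main__":
    # Synthetic sequences padded with alanine, for demonstration only.
    c2h2 = "AA" + "WRKYGQK" + "A" * 10 + "C" + "A" * 4 + "C" + "A" * 23 + "HAH"
    c2hc = "WRKYGQK" + "A" * 10 + "C" + "A" * 7 + "C" + "A" * 23 + "HAC"
    print(scan_wrky_domain(c2h2))
    print(scan_wrky_domain(c2hc))
```

Because only the exact heptapeptide is matched, variant domains would need a relaxed search or a profile-based tool such as Pfam.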
In addition, although WRKY transcription factors have a highly conserved WRKY domain, mutations were still found in the jute protein sequences. Among the 43 jute WRKY members, the conserved heptapeptide (WRKYGQK) of one gene and the zinc finger structure of four genes were mutated (Table 2). This indicated that, despite the high structural conservation of the WRKY gene family, some variation still occurs in the WRKY domain, which also illustrates the diversification of the plant WRKY gene family during evolution.

Phylogenetic analysis of CcWRKY protein in diverse species
The WRKY domain sequences of the CcWRKY proteins were compared with the known WRKY domains of Arabidopsis thaliana WRKY proteins and clustered with MEGA7 (Fig 2). The CcWRKY proteins could be divided into three groups: I, II and III. Group II could be further divided into subgroups II-a, II-b, II-c, II-d and II-e. The classification from the phylogenetic tree was consistent with the results in Figure 1 (Table 1).

Structure analysis of intron and exon of WRKY in jute
The numbers of exons and introns of the jute WRKY genes are shown in Figure 3. The number of exons varied from 3 to 11: 21 WRKYs (48.84%) contained 3 exons, 5 WRKYs (11.63%) contained 4 exons, 8 WRKYs (18.60%) contained 5 exons and 6 WRKYs (13.95%) contained 6 exons. Among the groups, Group II c+d+e and Group III were relatively conserved, whereas the structures of Group I and Group II a+b+c differed significantly and varied greatly. Most CcWRKYs in Group II c+d+e and Group III contained 3 exons, except Ccv40151700 (4 exons) and Ccv40018590 (4 exons).
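The reported percentages can be checked with a few lines of arithmetic. The counts below are taken from the text; the number of genes with more than six exons is not stated and is derived here by subtraction, assuming the four listed categories are complete.

```python
TOTAL_GENES = 43

# Number of CcWRKY genes per exon count, as reported in the text.
genes_by_exon_count = {3: 21, 4: 5, 5: 8, 6: 6}

# Share of the family in each category, in percent, to two decimals.
percentages = {
    exons: round(100 * n / TOTAL_GENES, 2)
    for exons, n in genes_by_exon_count.items()
}

# Genes not covered by the four listed categories (derived, not reported).
remaining = TOTAL_GENES - sum(genes_by_exon_count.values())

if __name__ == "__main__":
    for exons, pct in percentages.items():
        print(f"{exons} exons: {genes_by_exon_count[exons]} genes ({pct}%)")
    print(f"more than 6 exons: {remaining} genes (by subtraction)")
```

Under that assumption, three genes fall in the 7 to 11 exon range.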

Analysis of tertiary structure of protein
The tertiary structure of a protein is formed by further coiling and folding of its secondary structure. The tertiary structures of the CcWRKY proteins were modeled with SWISS-MODEL. Most of the 43 amino acid sequences had similar three-dimensional structures. One representative homology model from the CcWRKY family is shown in Figure 4; it consists of several β-sheets. The tertiary structures were quite similar to that of Arabidopsis thaliana WRKY [30], which further supports that the CcWRKY gene family is highly conserved in structure.

Expression analysis of CcWRKY genes in different tissues
Tissue-specific expression of a gene is often considered a marker of its function in that tissue. Since WRKY genes are related to bast fiber development in plants [31,32], we mainly focused on the expression of CcWRKY genes at different stages of stem growth. Based on the RNA-seq data, we used R to draw a heatmap of the expression patterns of CcWRKY genes at different stem growth stages (Fig. 5), in which red represents high expression and blue represents low expression. The results showed that all the CcWRKY genes were expressed in the jute stem and that their expression differed among stem growth stages. This also suggested that none of the 43 genes is a pseudogene. In Figure 5, the 43 genes fall into two categories: 13 genes had lower expression across the jute tissues, and the others had higher expression. In total, 10 WRKYs were highly expressed in leaf (60 d), 3 in hypocotyl (10 d), 2 in stem stick (60 d), 2 in stem bark (60 d), 14 in root (60 d) and 12 in stem bark (120 d). Thus, a large share of the WRKY genes were mainly expressed in the stem bark of jute. As jute grows, bast fiber gradually accumulates in the stem bark. It is therefore reasonable to believe that the WRKY genes are involved in bast fiber development in jute. For example, Ccv40032460 was highly expressed in hypocotyl (10 d), weakly expressed in stem bark (60 d) and not expressed in stem bark (120 d), which suggests that this gene may play a negative regulatory role in jute fiber accumulation.
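One way to produce a per-tissue tally like the one above is to assign each gene to the tissue in which its FPKM is highest. Whether the counts in the text were obtained this way is an assumption; the sketch only illustrates the idea. The gene names and FPKM values are invented, while the `reported` dictionary repeats the counts from the text to confirm that they add up to the 43 family members.

```python
from collections import Counter


def peak_tissue(fpkm_by_tissue: dict) -> str:
    """Return the tissue with the highest FPKM for one gene."""
    return max(fpkm_by_tissue, key=fpkm_by_tissue.get)


# Invented FPKM values for three hypothetical genes.
expression = {
    "gene_A": {"hypocotyl_10d": 12.0, "stem_bark_60d": 3.5, "stem_bark_120d": 0.0},
    "gene_B": {"hypocotyl_10d": 1.0, "stem_bark_60d": 4.0, "stem_bark_120d": 9.5},
    "gene_C": {"hypocotyl_10d": 0.5, "stem_bark_60d": 2.0, "stem_bark_120d": 7.0},
}

tally = Counter(peak_tissue(values) for values in expression.values())

# Counts of highly expressed genes per tissue, as reported in the text.
reported = {
    "leaf_60d": 10, "hypocotyl_10d": 3, "stem_stick_60d": 2,
    "stem_bark_60d": 2, "root_60d": 14, "stem_bark_120d": 12,
}

if __name__ == "__main__":
    print(dict(tally))
    print("reported total:", sum(reported.values()))
```

In this toy data, gene_A follows the pattern described for Ccv40032460: highest in the hypocotyl and absent from stem bark at 120 d.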

GA3 stress analysis of CcWRKY genes involved in cell wall formation
"Huangma 179" and "Aidianyehuangma" plants were treated with GA3 at the vigorous growth stage (60 days after sowing), and stem bark samples were taken after 4 hours and 3 days, respectively. Samples without GA3 treatment were used as controls. We analyzed the RNA-seq results of these materials and drew the corresponding histogram (Fig. 6), in which upward bars indicate up-regulated expression and downward bars indicate down-regulated expression. According to our previous research [33], "Aidianyehuangma" is a dwarf variety that is sensitive to GA3, and its plant height increases after GA3 is sprayed [33]. The expression of most WRKY genes changed significantly under GA3 stress, especially for the down-regulated genes, as shown in Figure 6. Comparing the two treatment times (4 h and 3 d) after GA3 spraying, the expression of most CcWRKY genes (31 genes) changed in the same direction in "Huangma 179", and similar results were found in "Aidianyehuangma". We also found that the expression of 18 CcWRKY genes in "Aidianyehuangma" was sensitive to GA3 stress, differing significantly from the controls. This indicated that these CcWRKY genes play an important role in the gibberellin biosynthesis pathway and fiber development.
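The up/down comparison between treatment times can be sketched with a log2 fold change of treated versus control FPKM. The exact quantity plotted in Fig. 6 is not specified, so the log2 ratio, the pseudocount of 1 and the function names are all assumptions of this sketch, and the numbers are invented.

```python
import math


def log2_fold_change(fpkm_treated: float, fpkm_control: float,
                     pseudocount: float = 1.0) -> float:
    """log2 ratio of treated to control expression.

    The pseudocount avoids division by zero for unexpressed genes
    (an assumption, not a setting taken from the study).
    """
    return math.log2((fpkm_treated + pseudocount) / (fpkm_control + pseudocount))


def same_trend(lfc_4h: float, lfc_3d: float) -> bool:
    """True when a gene is up-regulated, or down-regulated, at both time points."""
    return lfc_4h * lfc_3d > 0


if __name__ == "__main__":
    # Invented FPKM values for one hypothetical gene.
    control, treated_4h, treated_3d = 3.0, 1.0, 0.0
    lfc_4h = log2_fold_change(treated_4h, control)
    lfc_3d = log2_fold_change(treated_3d, control)
    print(lfc_4h, lfc_3d, same_trend(lfc_4h, lfc_3d))
```

Positive values correspond to upward bars and negative values to downward bars. Counting the genes for which `same_trend` is true gives a figure comparable to the 31 genes reported for "Huangma 179".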
Comparing the expression of WRKY genes in "Huangma 179" and "Aidianyehuangma" under the same treatment, the two varieties showed a generally similar trend, but the trend in the former was more pronounced, whether up or down. Because "Aidianyehuangma" is a GA3-sensitive dwarf variety whereas "Huangma 179" is a normal variety, this suggested that these co-regulated CcWRKY genes possibly participate in cross-talk between the signaling pathways that regulate the response to GA3 stress. Since gibberellin is closely related to fiber development [34], the down-regulated CcWRKY genes (such as Ccv40018580, Ccv40001440, Ccv40052670, Ccv40120290, Ccv40136560, Ccv40154680 and Ccv40170160) might positively regulate the synthesis of GA3. This also indicated that CcWRKY genes play an important role in the gibberellin biosynthesis pathway and fiber development.
To verify the accuracy of the gene expression data, nine CcWRKY genes were randomly selected for qRT-PCR analysis (Fig. 7). The qRT-PCR results were consistent with the FPKM results.

Discussion
CcWRKY transcription factors in jute
WRKY transcription factors are one of the largest families of transcriptional regulators in plants. They play important roles in plant growth and development, as well as in defense against biotic and abiotic stresses [1,35-37]. In this study, at least 43 WRKY members were found in the jute genome, and the numbers were similar to those of sugar beet (40) , whose members were more than those of jute. A comparison across species shows that the number of WRKY genes in a species is not proportional to its genome size. Researchers have suggested that gene duplication, segmental duplication and whole-genome duplication play important roles in the expansion of gene families [41]. Unfortunately, because research data on jute are limited and only a draft genome has been published, many questions remain without a satisfactory answer. We suspect that the jute genome contains fewer WRKY genes than other species, perhaps because it did not undergo the whole-genome duplication that other species experienced. However, with improvements in genome sequencing and assembly, updates to search and analysis software, the discovery of alternative splicing in genomes and continued progress in jute research, we believe that additional WRKY members may still be found in the jute genome.
In general, the locations of introns and exons in the genome may provide important evidence of evolutionary relationships. In this study, we systematically analyzed the distributions and lengths of the exons and introns of the WRKY gene family members. Analysis of the gene structures of jute WRKY showed that its members consist of 3 to 11 exons and that nearly half of them have 3 exons. These results provide valuable information for the study of jute and of the evolution of the WRKY gene family in other species. In addition, the members of the WRKY family have similar three-dimensional structures, formed by several β-sheets, which resemble the 3D structure of the Arabidopsis WRKY protein domains in the database [30].

CcWRKY genes involved in cell wall formation
Both jute and cotton are important fiber crops worldwide, so we compared the results for jute with those for cotton. According to the expression analysis of CcWRKY genes in different tissues, most Group III CcWRKY genes (Ccv40018580, Ccv40120290, Ccv40170160 and Ccv40154680) were highly expressed during fiber development, with the exception of two genes (Ccv40018590 and Ccv40064890). This result in jute was consistent with that in cotton [44]. In our study, most CcWRKY genes were expressed differently in the stem bark and the hypocotyl. This interesting phenomenon led us to believe that there is a connection between WRKY transcription factors and bast fiber development. A recent study [45] showed that Arabidopsis WRKY transcription factors are involved in the formation of the secondary cell wall of pith parenchyma cells, which can significantly increase plant biomass. Among them, AtWRKY12 negatively regulates the deposition of lignin, xylan and cellulose in pith cells. In this study, the expression of Ccv40232460 (CcWRKY12) in the stem decreased as the jute plants grew. We speculate that this gene has a negative regulatory effect on the accumulation of secondary wall fibers in jute, which is consistent with previous studies of WRKY12 [45].
It has long been known that cellulose synthesis in plants is regulated by various phytohormones. GA3 is an important hormone for plant growth and development throughout the whole life cycle. Therefore, in this study, the expression patterns of the 43 CcWRKY transcription factor genes under GA3 stress were systematically analyzed. Because of the strong influence of GA3 on plant height, the dwarf variety "Aidianyehuangma" was selected as the target of the GA3 stress treatment. The results showed that the expression of most genes significantly increased or decreased, indicating that these CcWRKY transcription factor genes are likely to be involved in the defense response under GA3 stress. It has been shown that a gibberellin-mediated signaling cascade regulates cellulose synthesis [34]. It will therefore be very interesting to further study the relationships among CcWRKY genes, GA3 stress and fiber development.
In addition, given the economic importance of jute, it will be very important to study the transcriptional regulation of WRKY proteins in jute. These CcWRKY genes have positive effects on fiber development and response to GA3

[Figure caption: qRT-PCR verification of nine randomly selected CcWRKY genes. Test varieties: "Huangma 179" and "Aidianyehuangma"; samples: jute stem bark at 60 days after sowing; CK: control, water treatment; 3d: 3 days after GA3 stress.]
