De novo assembly of transcriptome and genome-wide identification reveal GA3 stress-responsive WRKY transcription factors involved in fiber formation in jute (Corchorus capsularis)

Background WRKY is a group of transcription factors (TFs) that play a vital role in plant growth, development, and stress tolerance. To date, none of jute WRKY (CcWRKY) genes have been identified, even if jute (Corchorus capsularis) is one of the most important natural fiber crops in the world. Little information about the WRKY genes in jute is far from sufficient to understand the molecular mechanism of bast fiber biosynthesis. Results A total of 244,489,479 clean reads were generated using Illumina paired-end sequencing. De novo assembly yielded 90,982 unigenes with an average length of 714 bp. By sequence similarity searching for known proteins, 48,896 (53.74%) unigenes were annotated. To mine the CcWRKY TFs and identify their potential function, the search for CcWRKYs against the transcriptome data of jute was performed, and a total of 43 CcWRKYs were identified in this study. The gene structure, phylogeny, conserved domain and three-dimensional structure of protein were analyzed by bioinformatics tools of GSDS2.0, MEGA7.0, DNAMAN5.0, WebLogo 3 and SWISS-MODEL respectively. Phylogenetic analysis showed that 43 CcWRKYs were divided into three groups: I, II and III, containing 9, 28, and 6 members respectively, according to the WRKY conserved domain features and the evolution analysis with Arabidopsis thaliana. Gene structure analysis indicated that the number of exons of these CcWRKYs varied from 3 to 11. Among the 43 CcWRKYs, 10, 2, 2, and 14 genes showed higher expression in leaves, stem sticks, stem barks, and roots at the vigorous vegetative growth stage, respectively. Moreover, the expression of 21 of 43 CcWRKYs was regulated significantly with secondary cell wall biosynthesis genes using FPKM and RT-qPCR by GA3 stress to a typical GA3 sensitive dwarf germplasm in comparison to an elite cultivar in jute. The Cis-element analysis showed that promoters of these 21 CcWRKYs had 1 to 4 cis-elements involved in gibberellin-responsiveness, suggesting that they might regulate the development of bast fiber in response to GA3 stress. Conclusions A total of 43 CcWRKYs were identified in jute for the first time. Analysis of phylogenetic relationship and gene structure revealed that these CcWRKYs might have a functional diversity. Expression analysis showed 21 TFs as GA3 stress responsive genes. The identification of these CcWRKYs and the characterization of their expression pattern will provide a basis for future clarification of their functions in bast fiber development in jute.


Background
The WRKY gene family are transcription factors that exists only in plants. It is mainly involved in transcriptional regulation and signal transduction processes in plants [1]. In the former one, WRKY transcription factors bind specific DNA sequences to repress or activate transcription of numerous important genes [2,3]. The conserved WRKY domain comprises 60 aa residues and a conserved WRKY GQK hexapeptide sequence within the WRKY domain is normally accompanied by a C2HC-type or C2H2 zincfinger structure. According to the number of WRKY domains and the type of zinc finger structure, the WRKY family can be divided into 3 groups: Group I, Group II and Group III [4]. Group II could be further divided into five subgroups: II-a, II-b, II-c, II-d and II-e. Group I contains two WRKY domains and C 2 H 2 zinc finger structure, Group II contains a WRKY domain and a C 2 H 2 zinc finger structure, and Group III contains a WRKY domain and a C 2 -H-C zinc finger structure [5].
Jute (Corchorus capsularis L) is a diploid (2n = 14) natural fiber plant, which belongs to the Malvaceae family [23]. It is mainly cultivated in Bangladesh, India, and China. The bast fiber of jute is mainly used to make fabrics, ropes and threads, which is an indispensable part in human's life. Due to the reason that plant fibers are renewable, eco-friendly, and 100% biodegradable, more and more people are interested in plant fiber textiles and production technology [24]. Jute gained considerable attention in recent decades.
It has long been known that WRKY genes play an important role in plant growth, development, stress tolerance, especially in fiber development [25]. And bioactive gibberellins (GAs) are also a type of important plant growth regulators, which play the key roles in stem development. However, none of WRKY (CcWRKY) genes has been identified in jute so far. Therefore, it is imperative to identify and analyze the WRKY transcription factors in jute after the release of the draft genome and transcriptome [23]. The aims of this study were to identify the CcWRKYs and analyze their expression pattern in different tissues, especially bast fiber in response to GA 3 stress.

Plant materials and treatments
Two genotypes, a typical GA 3 sensitive dwarf germplasm ("Aidianyehuangma") and an elite cultivar ("Huangma 179"), were used as plant materials in this study. The two genotypes of jute were grown in the farm of Fujian Agriculture and Forestry University on May 1st, 2017. When 10 days after sowing (DAS), hypocotyls were independently sampled from three plants of "Huangma 179". The leaves, roots, stem barks and stem sticks were independently collected from three plants of "Huangma 179" when 60 DAS, a vegetative growth period for jute. And stem barks were independently sampled from three plants of "Aidianyehuangma". When 120 DAS, the stem barks were independently collected from three plants of "Huangma 179".
To explore the effects of GA 3 stress on bast fiber in jute, "Huangma 179" was used as the control variety and "Aidianyehuangma" as the test variety. According to the method of MS medium [26], the standard medium (CK) and 0.1 mg·L − 1 GA 3 (GA3) exogenous hormone medium were prepared. An elite cultivar "Huangma 179" and a typical GA 3 sensitive dwarf germplasm "Aidianyehuangma" were used in this study. Thirty seeds were planted under MS medium in each bottle, and the length of hypocotyl was measured when the first leaf was grown.
To identify the expression of CcWRKYs in bast fiber in response to GA 3 stress, six plants of the two genotypes were treated with 100 mg·L − 1 exogenous GA 3 on 60 DAS. After 4 h and 72 h of the GA 3 stress treatment, the stem barks of each plant were obtained from "Huangma 179" and "Aidianyehuangma" respectively. Also, three plants of the two genotypes were sampled separately as control. Three samples from each tissue were immediately frozen in liquid nitrogen and stored at − 80°C separately as three biological replicates for the following RNA isolation (Additional file 1: Table S1).

Transcriptome sequencing assembly
Total RNA was isolated from three separate collected samples using Plant Total RNA Isolation kit (Tiandz Inc.; Beijing, China), respectively. RNAs of each sample tissue were equally mixed together for libraries construction as well as for Illumina sequencing which was performed by a company, Beijing Novogene Biological Information Technology Co., Ltd., Beijing, China (https:// www.novogene. cn/) . In short, 10 μg of good quality RNA was collected for transcriptome library construction. mRNAs of 12 sequencing libraries (Additional file 1: Table S1) was purified from total RNA of each sample tissues using Poly-T oligoattached magnetic beads. These mRNAs were then fragmented with divalent cations under raised temperature in Illumina patented fragmentation buffer. The superscript II and random oligonucleotides were utilized to synthesize first cDNA strand while RNase H and Polymerase I were used for getting the second cDNA strand. Fragments holding adaptors were selected with gel purification as well as Illumina polymerase chain reaction (PCR) amplification to produce cDNA library. Systems like Agilent Bioanalyzer 2100 and AMPure XP system were applied to quantify and purify the PCR products respectively. Indexcoded clustering of samples was conducted using the cBot Cluster Generation System with Illumina (TruSeq PE Cluster Kit v3-cBot-HS) referring to the manufacturer's recommendations. After cluster production, transcriptome sequencing was carried out using Hiseq 2000 platform Illumina with 90 bp paired-ends reads.
The raw data (sequence) reads were filtered for quality by trimming reads holding adapter (poly-N and lowquality sequence reads) in Perl script house. The assembled sequence data with high quality were de novo assembled into a transcriptome using Trinity (a short read-assembly) program containing all default settings for all considerations and mapped to transcripts. Transcripts that had less than 10× coverage were removed and finally good quality raw data were deposited in the NCBI Sequence Read Archive (SRA) database (BioProject ID: PRJNA555734) with accession codes of SAMN12327429 to SAMN12327440.

Identification, sequence alignment and gene structure analysis of CcWRKYs
The assembly of CcWRKY genes and ORF analysis were performed according to Islam et al. [23] and Zhang et al. [27]. Full-length amino acids sequences of all WRKY proteins in Arabidopsis thaliana (http://www.arabidopsis.org/) were used as query sequences. A BLASTP search was performed with an E-value threshold of 10 − 6 . All the potential CcWRKY genes identified from BLAST searches were only accepted if they contained the WRKY domain. And the conserved domains of predicted WRKY were confirmed using multiple sequence alignments. Further, conservative domain prediction using Pfam was conducted to ensure that all candidate genes contain WRKY conserved domains [28]. Candidate WRKYs were also checked by searching for WRKY domains in the candidate amino acids sequences using SWISS-MOD-ELL (http://swissmodel.expasy.org/) [29]. The intron/ exons information of CcWRKYs were obtained by loading both the full-length protein sequences and their related coding sequences into GSDS (Gene Structure Display Server) version 2.0. Base on homologous proteins in Arabidopsis as query sequences, secondary cell wall biosynthesis genes in jute were also identified by local BLASTP.

Phylogenetic analysis of CcWRKYs
Due to the variable lengths of the complete protein sequences of the WRKY genes, WRKY domains extracted from the predicted proteins were used to draw the phylogenetic tree. A neighbor joining evolutionary tree was constructed by MEGA7 software [30]. The p-distance method was used to compute the evolutionary distances, which were used to estimate the number of amino acid substitutions per site. Conducting 1000 bootstrap sampling steps [12] was used to establish the reliability of each tree. The phylogenetic tree was divided into the different groups on the basis of Arabidopsis annotations and classification. WRKY family sequences from Arabidopsis were downloaded from TAIR (http:// www.arabidopsis.org/).

RNA extraction and gene expression analysis
Total RNA was isolated from three separate collected samples using Plant Total RNA Isolation kit (Tiandz Inc.; Beijing, China), respectively. The cDNAs were transcribed from RNAs using the GoScript™ reverse transcription system (Promega, Madison, USA) referring to manufacturer's directions. The Cc actin gene was used as a reference gene to normalize the expression of CcWRKY [31]. Three independent reactions were performed for putative CcWRKY genes using the Fast Start Universal SYBR® Green Master (ROX) with a 7500 realtime PCR machine (ABI) referring to manufacturer's directions. The reaction systems were: 0.4 μl of right primer (10 μM), 0.4 μl of left primer (10 μM), 10 μl of GoTaq® qPCR Master Mix, 7.2 μl of Nuclease-Free Water, and 2 μl of cDNA. Amplification conditions were programmed as follows: 30 s of denaturation at 95°C, followed by 15 s of 40 cycles at 95°C, and 30 s at 60°C, with a melting curve temperature ranged from 55 to 99°C. Each reaction was carried out in three biological replicates and the relative gene expression was determined with 2 -△△CT method [32]. Three biological repeats  Error bars indicate standard deviation of the means. Gene expression levels were calculated as FPKM according to the length of the gene and reads count mapped to this gene: FPKM = total exon fragments/ [mapped reads (millions) × exon length (kb)]. The expression analysis of CcWRKYs was performed according to the FRKM value of different genes in different tissues, i.e., Hypocotyl-10d, Leaf-60d, Root-60d, Stem bark-60d, Stem stick-60d, and Stem bark-120d (Additional file 1: Table S1). After GA 3 stress, an elite cultivar "Huangma 179" was used as a reference variety to estimate the expression of CcWRKYs and related genes while a typical GA 3 sensitive dwarf germplasm "Aidianyehuangma" as a test germplasm. The heatmaps were created by R language based on the transformed data of log 2 (FPKM + 1) values. The differentially expressed genes among the treated samples were estimated by referring to the standard of FDR < 0.05 & |log2(fold change) | > 1. All data analyses were conducted using SPSS Statistics 20. The cis-elements within the region of 5 kb upstream promoters of CcWRKYs were predicted and analyzed by PlantCARE.

De novo assembly of transcriptome
To obtain and characterize the expression pattern of GA 3 stress-responsive WRKY transcription factors in jute, 12 sequencing libraries was from hypocotyls, leaves, roots, stem bark and stem stick at different growth stages between two genotypes, a typical GA 3 sensitive dwarf germplasm ("Aidianyehuangma") and an elite cultivar ("Huangma 179") (Fig. 1a). By Illumina paired-end sequencing, 244,489,479 clean reads comprised of 44.13% GC ratio, and 93.81% Q30 bases were found (Additional file 2: Table S2). Moreover, 110,677 overall transcripts comprised of N50 length of 1710 bp, and mean length of 862 bp, were de novo assembled by trinity program (Fig. 1b, Additional file 3: Table S3, Additional file 4: Table S4). Of these transcripts, 73.73% were in the length range of 200 bp to 1000 bp. Further, 90,982 unigenes had N50 length of 1357 bp, and a mean length of 714 bp were assembled from transcripts (Fig. 1b, Additional file 3: Table S3, Additional file 4: Table S4). The length of unigenes differed from 201 bp to 15,570 bp. 58,764 unigenes possessed lengths ranged from 201 to 500 bp and 14,489 unigene lengths ranged from 501 to 1000 bp, whereas 17, 729 unigenes (19.49%) lengths were greater than 1000 bp (Additional file 3: Table S3).
To evaluate the quality of identified unigenes, de novo assembled trancriptome sequences by trinity were taken as reference sequence. All the obtained clean sequence reads were mapped to the assembled unigenes by RSEM software. Concerning these assembled unigenes functional annotations, the sequence homology search was conducted against Nr (non-redundant protein sequences) at different databases in NCBI using BLASTx with an Evalue of 10 − 5 (Fig. 1c, Additional file 5: Table S5). Among a total of 90,982 unigenes, 39,238 (43.12%) and 33,410 (36.72%) showed significant homology to common genes available in SwissProt and non-redundant protein databases, respectively. Together, 48,896 (53.74%) unigenes unigenes were annotated in not less than one public databases like GO, Nr, KOG, KO, PFAM, or Swiss Prot. Finally, number of sequences from various species matched to that of Jute were determined from the annotation feature (Additional file 6: Table S6). As indicated in Fig. 1d, the three most abundant species were Theobroma cacao (11,457,46.9%), Hordeum vulgare (11,071, 9.6%), and Gossypium raimondii (8274, 6.8%), representing about 63.3% of the total annotated species.

Identification of full-length CcWRKYs and phylogenetic analysis
To get complete WRKY members in jute (CcWRKY), BLASTp search was conducted using Arabidopsis fulllength WRKY protein sequences from TAIR (http:// www.arabidopsis.org) as query. And Pfam was used to detect their conserved domain. A total of 43 candidate genes containing WRKY domain (named as CcWRKY) were identified, which is shown in Table 1.
To further evaluate these CcWRKYs, the conservative domains were identified using DNAMAN5.0 software and the conservative structure was performed by Weblogo, as shown in Fig. 2 and Additional file 7. The conserved domains of CcWRKYs could be divided into three groups: I, II and III, including 9, 28 and 6 members respectively. Group I, which could be further divided into I-C and I-N subgroups, contains two WRKY  II could be further  divided into subgroups II-a, II-b, II-c, II-d and II-e, with  2, 7, 7, 6 and 6 members, respectively. The heptapeptide domain and zinc finger structure of WRKY at Cterminal in the subgroup of II-a, II-b, II-d and II-e were WRKYGQK and CX 5 C 23 HXH, while the heptapeptide domain and zinc finger structure of WRKY at Cterminal in the subgroup of II-c were WRKYGQK and CX 4 C 23 HXH. Group III contains the heptapeptide domain and the zinc finger structure of WRKY at Cterminal is WRKYGQK and CX 7 C 23 HXC. Moreover, the present results showed that there are still mutations in its protein sequence, even though WRKY transcription factor has a much conserved WRKY domain. Among the 43 CcWRKYs identified in jute, the conserved domain of one gene (WRKYGQK) and the zinc finger structure of four genes were all mutated (Additional file 7: Table S7). These variations indicated that some variations still occur in its WRKY domain despite the structurally high conserved WRKY gene family, which also illustrated that the plant WRKY gene family had a functional diversity in the evolutionary process.
By comparison of the WRKY TFs of A. thaliana with CcWRKYs, analysis of phylogenetic relationship showed that these CcWRKYs could be divided into three groups: I, II and III. And Group II can be divided into II-a, II-b, II-c, II-d and II-e subgroups (Fig. 3). The phylogenetic tree analysis confirmed the results of the conservative domain analysis of CcWRKYs (Fig. 2, Table 1).

Gene structure analysis of CcWRKYs
The number of exons and introns of CcWRKYs were analyzed, as shown in Fig. 4. The number of exons varied from 3 to 11. 21 WRKYs (48.84%) contained 3 exons, 5 WRKYs (11.63%) contained 4 exons, 8 WRKYs (18.60%) contained 5 exons, and 6 WRKYs (13.95%) contained 6 exons. From the different groups, the exon numbers of group II c + d + e and group III were relatively conservative, while the exon numbers of group I and group II a + b + c's were significantly different and varied greatly. Most of CcWRKYs in group II c + d + e and group III contain 3 exons except Ccv40151700 (4 exons) and Ccv40018590 (4 exons).
The tertiary structure of WRKYs is further coiled and folded based on the secondary structure. The tertiary structures of CcWRKYs were conducted by SWISS-MODEL. The majority of the 43 CcWRKYs have the similar three-dimensional structure. One representative homology modeling from CcWRKYs was shown in Additional file 8 (Fig. S1), consisting of several beta folding. This tertiary structure was similar to that of Arabidopsis thaliana [33], which confirmed that the CcWRKYs are highly conserved in the tertiary structure.

Expression analysis of CcWRKYs in different stem growth stages
Tissue-specific expression of genes is often considered as markers of specific gene functions in this tissue. Since WRKY genes are related to the bast fiber development of plants [34,35], the expression of CcWRKYs in different stages of stem growth was analyzed. R language was used to draw the heatmap of the expression patterns of CcWRKYs in different stem growth stages (Additional file 9: Fig. S2) based on the RNA-seq data. The difference of gene expression is generally represented by different colors, that mean the red color represents high expression and the blue one represents low expression. All the CcWRKYs were expressed in different stages of Fig. 4 The gene structures of CcWRKYs. It shows the exon (black box)-intron (black lines) organization of the same genes. Exon-intron structure analyses of CcWRKY genes were performed by the online tool GSDS stem growth, while the expression of CcWRKYs differ in various stem growth stages. It indicates that there were no pseudogenes in 43 CcWRKYs. From the expression pattern of the heatmap, 43 CcWRKYs were divided into two categories (Additional file 9: Fig. S2). The expression of 13 of 43 CcWRKYs was low in the different tissues of jute, while the others were high. Totally, 10 CcWRKYs were highly expressed in Leaf-60d, 3 CcWRKYs were highly expressed in Hypocotyl-10d, 2 CcWRKYs were highly expressed in Stem stick-60d, 2 CcWRKYs were highly expressed in Stem bark-60d, 14 CcWRKYs were highly expressed in Root-60d, and 12 CcWRKYs were highly expressed in Stem bark-120d. It could be seen that these CcWRKYs were mainly expressed in the stem barks of jute. With the continuous stem growth stages of jute, the bast fiber of jute will gradually accumulate in the stem bark. Therefore, it is believed reasonably that the WRKY genes may be involved in bast fiber development in jute. For example, Ccv40032460 was highly expressed in Hypocotyl-10d, lowly expressed in Stem bark-60d, and no expression in Stem bark-120d. It suggests that this gene might play a negative regulatory role in jute fiber accumulation.

Expression pattern of GA 3 stress-responsive CcWRKY genes in the vegetative growth period
After being treated with MS culture media of 0.1 mg·L − 1 exogenous GA 3 , the hypocotyl length of "Aidianyehuangma" showed significant difference compared with "Aidianyehuangma" without being treated with exogenous hormone, and reached that of an elite cultivar "Huangma 179" (Additional file 10: Fig. S3). "Aidianyehuangma" is a typical GA 3 sensitive dwarf germplasm, as shown in our previous study [36]. Since bioactive gibberellins (GAs) play the key roles in stem development, the two genotypes of "Aidianyehuangma" and "Huangma 179", were used to characterize the expression pattern of these CcWRKYs in bast fibers in response to GA 3 stress in the vegetative growth period (60 DAS).
The stem barks were treated with GA 3 stress for "Huangma 179" and "Aidianyehuangma" on 60 DAS. Then, the stem barks were sampled in two different time points of 4 and 72 h later after spraying GA 3 , respectively. The stem barks without GA 3 treatment were used as control. The RNA-seq data of these samples were analyzed, and then drew the corresponding histogram using FPKM (Additional file 11: Fig. S4, Additional file 12: Fig. S5). The above x-axis columns indicated that the gene expressions were up-regulated, while the below xaxis ones showed that the gene expressions were downregulated. The expression of most CcWRKYs was regulated significantly under GA 3 stress, especially for the down regulated genes. By comparison of the expression of CcWRKYs under different time points (4 and 72 h later) after spraying GA 3 , the expression of 33 of 43 CcWRKYs in the stem barks was regulated in the same trend in both of "Huangma 179" and "Aidianyehuangma". Among them, it found that 21 CcWRKYs were regulated significantly under the GA 3 stress in the stem barks of "Aidianyehuangma" compared with "Huangma 179" (Fig. 5 and Additional file 13: Table  S8), except Ccv40020010, Ccv40015220, Ccv40068880, Ccv40154010 and Ccv40090780. Among these 21 CcWRKYs, the transcript levels of 14 CcWRKYs were down-regulated and 7 CcWRKYs were up-regulated in the stem barks under the GA 3 stress. To verify the accuracy of the gene expression, 9 CcWRKYs were randomly selected for RT-qPCR analysis (Additional file 14: Fig. S6). RT-qPCR assays of the expression patterns of the 9 CcWRKYs were similar to the results of FPKM.
The variations of expression of most of these gibberellin biosynthesis genes in "Aidianyehuangma"(test: a typical GA 3 sensitive dwarf germplasm) were more significant than those in "Huangma 179" (control: an elite cultivar) ( Fig. 6 and Additional file 15: Table S9), indicating GA 3 play a certain role in stem development according to the gene expression regulation of these gibberellin biosynthesis genes. Notably, expression of most of these secondary cell wall biosynthesis genes in "Aidianyehuangma" were down-regulated significantly than those in "Huangma 179" (Fig. 6). The lignin synthase genes (CCoAOMT and 4CL) and cellulose synthase genes (CesA4, CesA7 and CesA8) showed downregulation of their expression in "Aidianyehuangma" compared with "Huangma 179". Combined with expression pattern of 21 GA 3 stress-responsive CcWRKYs (Fig.  5), it gives a hint that the 7 up-regulated CcWRKYs may play a positive role in response to GA 3 stress while the 14 down-regulated CcWRKYs might play a negative role in the development of stem barks of "Aidianyehuangma" under the GA 3 stress.
To confirm this hint, cis-element analysis of promoters of these 21 CcWRKYs were analyzed using PlantCARE ( Table 2). The promoters of these 21 CcWRKYs had 1 to 4 cis-elements involved in gibberellin-responsiveness, such as GARE-motif, P-box and TATC-box. These 21 CcWRKYs were responded to the stress of GA 3 and could increase the fiber related traits including plant height of the "Aidianyehuangma", suggesting that they might be involved in the development of bast fiber in jute in response to GA 3 stress.

Discussion
CcWRKY transcription factors in jute WRKY transcription factors are one of the largest families of transcriptional regulators in plants. They play important roles in plant development, biotic and abiotic stresses [1,[37][38][39]. The present study identified a total of 43 WRKY genes in the whole genome of jute. The number of WRKY genes in jute was consistent with that in sugar beet (40) [40], canola (43) [41] and castor (47) [42]. However, it was inconsistent to that in other plants' genomes such as Arabidopsis thaliana (74), cotton (116) [16], soybean (188) [43], poplar (104) [15], tomato (81) [44], cabbage (145) [45], whose members were more than those of jute. By comparing with different species, the numbers of WRKY genes in different species are not proportional to their genome size. Recently, researchers have suggested that gene duplication, segmental duplication and whole genome duplication play important roles in the mass production of gene families [43]. Unfortunately, due to the insufficient data on jute researches, particularly genome data. The only published draft jute genome was not enough to provides satisfactory solutions to many problems. We suspected that jute genome WRKY genes were less than that in other species, perhaps because they did not experience whole genome replication as other species do. However, with the improvement of genome sequencing and assembly, we believed that the new WRKY members also existed in the genome of jute.
In general, the locations of introns and exons in the genome may provide important evidences for their evolutionary relationships. In this study, it was Fig. 6 Expression analyses of several secondary cell wall biosynthesis genes in a typical GA 3 sensitive dwarf germplasm "Aidianyehuangma" in response to GA 3 stress. The transcript levels were determined in stem barks from the data of RNA-seq in response to GA 3 stress. Error bars indicate standard deviation of the means. Data points marked with asterisk (FDR < 0.05 & |log2(fold change) | > 1) show significant differences between the elite cultivar "Huangma 179" and GA 3 sensitive dwarf germplasm "Aidianyehuangma" in response to GA 3 stress. 179: the elite cultivar "Huangma 179", Aidian: a GA 3 sensitive dwarf germplasm "Aidianyehuangma", GA3-4 h: After 4 h of the GA 3 stress treatment, GA3-72 h: After 72 h of the GA 3 stress treatment comprehensively to analyze the distributions and lengths of exons and introns of CcWRKYs. By analyzing the gene structures of CcWRKYs, it was found that its members consisted of 3 to 11 exons, and nearly half of them were 3 exons. Moreover, the members of the WRKY gene family in jute have similar three-dimensional structures, which were formed by several beta folds. It is similar to the 3D structure of the domains of Arabidopsis WRKY protein in the database [33]. These results provided valuable information for the evolutions of the WRKY gene family in various species.

GA 3 stress-responsive CcWRKY genes involved in cell wall formation
Group III genes, one among the WRKY gene family have played a vital role in plant evolution and adaptions, hence it is regarded as the most advanced in respect of evolution [46]. However, according to the expression analysis of CcWRKYs in different tissues, it could be found that most of CcWRKYs in group III (Ccv40018580, Ccv40120290, Ccv40170160 and Ccv40154680) were highly expressed during the fiber development except two genes (Ccv40018590 and Ccv40064890). Theses results in jute were consistent to that in cotton [47]. It is suggested that CcWRKYs in group III in jute may have certain functional differentiation in fiber crops, which may have certain influence on the fiber development of crops. In recent years, the study of Wang et al. [48] has shown that Arabidopsis WRKY transcription factors were involved in the formation of secondary cell wall, which could significantly increase plant biomass, such as AtWRKY12. In this study, most of the CcWRKY genes expressed differently in the stem barks and hypocotyls. This interesting phenomenon leaded us to believe that there was a certain connection between WRKY transcription factors and bast fiber development.
Phytohormones are known to play a significant role in plant cellulose biosynthesis. GA is a key hormone for the growth and development of plant in its entire life cycle. Gibberellin has a great effect on the initiation, differentiation and development of cotton fiber, which can induce more fiber initial cells and stimulate fiber elongation. Therefore, the gene expression patterns of 43 CcWRKYs under GA 3 stress in this study were systematically analyzed. Because of the great influences of GA 3 on plant heights, a typical GA 3 sensitive dwarf germplasm ("Aidianyehuangma") was selected as the target of GA 3 stress treatment. Expression pattern showed that the expression of 21 of 43 CcWRKYs was significantly regulated in "Aidianyehuangma". It indicated that these CcWRKYs were likely to be involved in bast fiber development under GA 3 stress. At present, it has been shown that gibberellin-mediated signaling cascade regulates cellulose synthesis [49]. Therefore, mutations in Gibberellin gene could change transcription of CesAs (cellulose synthase) genes. And GA could regulate secondary wall cellulose synthesis in land plants. Previous studies reported that WRKY transcription factors were involved in epidermal hair development, glucose and gibberellin signaling [50]. Therefore, it will be very interesting to further study about the relationships between CcWRKY genes, GA 3 stress and fiber development.
In addition, due to the economic importance of jute, it will be very important to study the transcriptional regulation of WRKYs for jute improvement. These CcWRKY genes have obvious effects on fiber development and response to GA 3 stress. At present, the research of transcription factors in Arabidopsis thaliana was only a part of them, and it couldn't fully reveal the functions of each transcription factor. And the same transcription factors have different functions in different species because of the different evolutionary directions. Overall, this study provided a theoretical basis for the subsequent study of the CcWRKY genes and their effects on fiber development by identification and bioinformatics analysis of the CcWRKY genes.

Conclusion
De novo assembly of transcriptome of 12 sequencing libraries was conducted to obtain and characterize the expression pattern of GA 3 stress-responsive WRKY transcription factors in jute. Totally, 43 CcWRKYs in jute were identified. The numbers of exons of these CcWRKY genes varied widely, indicating they may have a functional diversity. Expression profiles of these CcWRKYs showed a variety of expression patterns at different stages of stem development. 21 of 43 CcWRKYs was regulated significantly with secondary cell wall biosynthesis genes in response to GA 3 stress. This suggested that these CcWRKYs might be involved in the growth and development of bast fiber in jute.