Evolution, expression and functional analysis of cultivated allotetraploid cotton DIR genes

Dirigent (DIR) proteins mediate regioselectivity and stereoselectivity during lignan biosynthesis and are also involved in lignin, gossypol and pterocarpan biosynthesis. This gene family plays a vital role in enhancing stress resistance and in secondary cell-wall development, but systematical understanding is lacking in cotton. In this study, 107 GbDIRs and 107 GhDIRs were identified in Gossypium barbadense and Gossypium hirsutum, respectively. Most of these genes have a classical gene structure without intron and encode proteins containing a signal peptide. Phylogenetic analysis showed that cotton DIR genes were classified into four distinct subfamilies (a, b/d, e, and f). Of these groups, DIR-a and DIR-e were evolutionarily conserved, and segmental and tandem duplications contributed equally to their formation. In contrast, DIR-b/d mainly expanded by recent tandem duplications, accompanying with a number of gene clusters. With the rapid evolution, DIR-b/d-III was a Gossypium-specific clade involved in atropselective synthesis of gossypol. RNA-seq data highlighted GhDIRs in response to Verticillium dahliae infection and suggested that DIR gene family could confer Verticillium wilt resistance. We also identified candidate DIR genes related to fiber development in G. barbadense and G. hirsutum and revealed their differential expression. To further determine the involvement of DIR genes in fiber development, we overexpressed a fiber length-related gene GbDIR78 in Arabidopsis and validated its function in trichomes and hypocotyls. These findings contribute novel insights towards the evolution of DIR gene family and provide valuable information for further understanding the roles of DIR genes in cotton fiber development as well as in stress responses.

Having the various biochemical functions, DIR proteins play important roles in stress responses, especially in plant defense against pathogens. The expression of PsDRR206 in pea pod tissue was induced by Fusarium solani infection, and its metabolite functioned as phytoalexin [6]. Overexpression of GmDIR22 in soybean can increase total lignan accumulation and enhance plant resistance to Phytophthora sojae [8]. Pepper plants with the silencing of CaDIR7 are more susceptible to Phytophthora capsici, NaCl and mannitol stresses [15]. Cotton plants with the overexpression of GhDIR1 showed an increased lignin content and displayed more resistance to Verticillium dahliae [11]. Besides defense responses, DIR proteins also presented other kinds of physiological functions such as Casparian strip formation [10] and pod dehiscence [16].
Ralph and colleagues performed a phylogenetic analysis of 150 DIR proteins from the seed plant division and suggested the presence of six distinct DIR subfamilies, DIR-a and five DIR-like subfamilies (b/d, c, e, f and g) [17]. After that, DIR gene family has been systematically studied in several vascular plants, and some stressinduced genes [15,[18][19][20] or SCW-related genes [21,22] were identified. Understanding the functions of DIR genes would be a practicable approach to enhance stress resistance of cotton and to improve fiber properties, but no comprehensive understanding has been provided in cotton until now. There are only a few V. dahliae-responsive DIR genes in literatures, and fiber development-related DIR genes remain to be identified. Recently, a series of Gossypium whole-genome sequences has been released, which makes it possible to analyze gene family at a genome-wide level in cotton. In the present study, we firstly identified the DIR gene family members in cultivated allotetraploid cotton, G. barbadense and G. hirsutum, and analyzed their evolution. Their expression patterns in response to V. dahliae infection and during cotton fiber development were systematically investigated. Furthermore, the functional analysis of GbDIR78 in Arabidopsis revealed its role in cell elongation. Our findings will further the understanding of this elusive gene family and provide candidate DIR genes for both defense response and fiber development.
Most cotton DIR genes (especially members in DIR-a, DIR-e and DIR-f) held a classical gene structure without intron (Additional file 1: Figure S1). Notably, the DIR-b/ d clade showed variable gene structures, indicating lower selection constraints. About 77% of GbDIRs and 82% of GhDIRs encoded proteins containing a signal peptide (Additional file 8: Table S1 and S2). As expected, subcellular localization prediction showed that they were mainly located in the extracellular space. In spite of having a signal peptide, several DIR proteins belonging to DIR-b/d were directed to the vacuole. In addition, DIR proteins located in the nucleus or the peroxisome were observed; the unexpected localization might signify novel functions. We also scanned cotton DIR protein sequences for N-glycosylation sites to explore their solubility and stability. Interestingly, DIR-e genes had much fewer N-glycosylation sites (Table 1), displaying the evolution and divergence of cotton DIR gene family.

Chromosomal distribution, duplication and evolution
To determine the chromosomal distribution of cotton DIR genes, we marked their physical locations on chromosomes based on their annotation information (Additional file 2: Figure S2). All the 107 GhDIRs were mapped to 24 of the total 26 chromosomes (i.e. except At06 and Dt06). The gene number varied from 1 to 15 across these chromosomes. Specifically, GhDIRs belonging to DIR-b/d were enriched on chromosomes At01, At04, At11, Dt01, Dt04 and Dt11, while DIR-e genes were mainly located on At10 and Dt10. Gene clusters were frequently observed, indicating a number of tandem duplication events. Because of the short divergence time, GbDIRs and GhDIRs were extremely similar in chromosomal distribution. GbDIRs were located on chromosomes except Dt02, At06 and Dt06, and the gene number ranged from 1 to 13. Six genes (GbDIR29 and GbDIR103-GbDIR107) were not mapped because of the incomplete location information.
To analyze the expansion of cotton DIR gene family, we identified segmental and tandem duplications in G. barbadense and G. hirsutum. The numbers of segmentally duplicated DIR genes and tandemly duplicated DIR genes were 13 and 31, respectively, in G. barbadense and 18 and 41 in G. hirsutum (Table 1; Fig. 2a and b; Additional file 8: Table S4 and S5). Obviously, tandem duplication as the major impetus drove the expansion of cotton DIR gene family, corresponding to the abovementioned gene clusters. Ks (substitution per synonymous site) values can be used to estimate the occurrence time of gene duplications and then to identify wholegenome duplication (WGD) events. Ks ranges of 0.4-0.6 (corresponding to Gossypium-specific WGD around 16.6 MYA) and 1.5-1.9 (corresponding to the paleohexaploidization event shared by eudicots around 130.8 MYA) were observed in G. raimondii [24]. Moreover, Ks peak of 0.03 accounts for the divergence between Agenome and D-genome progenitors [25]. Here we also calculated the Ks values of gene duplications to analyze their occurrence time; the LPB method was selected because its results were just in line with expectations (Additional file 8: Table S4 and S5). As shown in Fig. 2, Ks values of GbDIR14/GbDIR41, GbDIR79/GbDIR99 and GhDIR13/GhDIR41 were 0.60, 0.41 and 0.58, respectively, indicating that these three segmental duplications might be generated from Gossypium-specific polyploidization. The other segmental duplications were To further reveal evolutionary history of cotton DIR gene family, we integrated the duplication events with a phylogenetic tree containing 107 GbDIRs, 107 GhDIRs and 23 reported DIRs ( Fig. 2c; Additional file 8: Table  S6). Track i marked segmental duplications, while tracks ii, iii, iv and v (with artificially-created Ks ranges of 0.0-0.2, 0.2-0.6, 0.6-1.9 and 1.9-3.0, respectively) labeled tandem duplications. The values 0.6 and 1.9 referred to above-mentioned 0.4-0.6 and 1.5-1.9, respectively, and a cluster of Ks values around 0.1 set up the Ks range of 0.0-0.2 (Additional file 8: Table S4 and S5). The DIR-a clade, mainly involved in lignan biosynthesis, was basically established before the Gossypium-specific WGD event (Fig. 2c). Similarly, DIR-e was also a relatively  Table 1).
The Ka/Ks ratio is a measure of selective pressure on proteins, and Ka/Ks > 1, = 1 and < 1 indicate positive selection (or molecular adaptation), neutral evolution, and purifying selection (or selective constraints), respectively. Here we calculated the Ka/Ks ratio for all the duplicated DIR gene pairs (Additional file 8: Table S4 and S5). Interestingly, those three gene pairs generated from Gossypium-specific WGD (i.e. GbDIR14/GbDIR41, GbDIR79/ GbDIR99 and GhDIR13/GhDIR41) showed Ka/Ks ratios > 1, implying that positive selection might contribute to their surviving from gene losses and/or translocations. Ka/ Ks ratios of the other segmentally duplicated DIR gene pairs were fairly low (0.14-0.35 in G. barbadense and 0.14-0.34 in G. hirsutum), suggesting the conserved functions due to purifying selection (Additional file 3: Figure  S3a

Evolutionary history of cotton DIR gene family
To further verify the evolution of cotton DIR gene family, we constructed a phylogenetic tree consisting of 488 DIR genes from eight dicotyledonous species sharing a common paleo-hexaploid ancestor ( Fig. 3a; Additional file 8: Table S7). After the paleo-hexaploidization event, Arabidopsis thaliana and Glycine max have severally undergone two rounds of WGD events, while no WGDs have been identified in Vitis vinifera and Theobroma cacao [26][27][28][29]. Moreover, the Gossypium genus has experienced lineage-specific WGD, divergence and hybridization. As shown in Fig. 3a, we identified 33, 54, 25, 32, 63, 67, 107 and 107 DIR genes in Vitis vinifera, Glycine max, Arabidopsis thaliana, Theobroma cacao, G. arboreum, G. raimondii, G. barbadense and G. hirsutum, respectively. We assigned these DIR genes to distinct clades (Fig. 3b). Although they underwent different rounds of WGDs, these species were quite consistent in the number of DIR-a genes; G. barbadense and G. hirsutum, as tetraploid species, displayed DIR-a genes twice as many as the others. Similarly, apart from soybean, DIR-e was stable across species. Thus, DIR-a and DIR-e might have been set up before the divergence of Rosids, corresponding to the results inferred from the duplication events. In the above analysis, we found that DIR-f was evolutionarily inactive in G. barbadense and G. hirsutum. As expected, G. arboreum, G. raimondii, Arabidopsis thaliana and Theobroma cacao, belonging to Malvids, possessed less DIR-f genes. In particular, Arabidopsis thaliana lost the DIR-f clade during its evolution. DIR-b/d-I existed in all the eight species, DIR-b/d-II was absent in grape and soybean, and DIR-b/d-III was lineage-specific in cotton. Clearly, DIR-b/d-I, −II and -III really appeared in turn in evolution. It has been shown that DIR-b/d-III displayed two distinct clades and that the left clade arose later than the right. As shown in Fig.  3c, most genes in the left clade of DIR-b/d-III were in the collinear blocks, suggesting that DIR-b/d-III was established before the divergence of cotton species. The chromosome reciprocal translocation between At04 and At05 was also observed, which has been confirmed recently [30].

Expression patterns and transcriptional regulation
Studies have shown that DIR proteins are implicated in lignan, lignin and gossypol biosynthesis, which are all part of plant defense responses against pathogens. To investigate the role of cotton DIR genes in disease resistance, the expression patterns of GhDIRs were analyzed in response to V. dahliae and water (the check group) using a Verticillium wilt-resistant cultivar. As a result, about one quarter of GhDIRs were highly expressed in the check group (Fig. 4). For the "gossypol clade" (the left half of DIR-b/d-III), almost half of the members showed a fairly high expression level, indicating its importance in plant pre-formed defense. Once the seedlings were inoculated with V. dahliae, most of the highly expressed GhDIRs were dramatically down-regulated. It seems that V. dahliae can weaken the functions of DIR genes by disturbing their expression to colonize cotton hosts. These down-regulated genes should be an important resource to understand plant-pathogen interaction. To verify the intriguing expression patterns, we also analyzed V. dahliae-and water-responsive expression of GhDIRs at 12 hpi in six other cultivars, and the log 2 (fold-change) values were presented in a heatmap (Additional file 4: Figure S4). After inoculation with V. dahliae, the cultivars S1 and S2 exhibited the largest number of down-regulated DIR genes, corresponding to their lowest Verticillium wilt resistance. In contrast with S1 and S2, the cultivars T3 and T4 showed more upregulated genes in the DIR-a and DIR-e clades, and thus showed higher Verticillium wilt tolerance. Unlike T3, T4, S1 and S2, the cultivars T1 and T2 displayed quite a number of up-regulated DIR genes following inoculation with V. dahliae. Despite having different patterns in different cultivars, DIR genes might have contributed to cotton Verticillium wilt resistance.
Considering that lignin/lignin-like phenolics can affect cotton fiber quality, we analyzed the expression patterns of DIR genes during cotton fiber development (Fig. 5). As shown, the expression profiles in G. barbadense (Pima90-53 and Hai7124) were similar to that in G. hirsutum (HY405 and ND13), exhibiting low expression levels in the DIR-e, DIR-f and DIR-b/d-III clades. Several genes belonging to DIR-b/d-II (GbDIR25, GbDIR71 and GbDIR107 in G. barbadense; GhDIR27, GhDIR28, GhDIR75 and GhDIR76 in G. hirsutum) were preferentially expressed in cotton fibers of 20, 25 and 30 DPA, indicating potential functions in secondary wall development. GbDIR78 and GhDIR35 which were part of DIRb/d-I showed high transcript levels in the fibers of 5, 10 and 15 DPA, suggesting their importance during cotton fiber elongation. Furthermore, GbDIR78 and GhDIR35 were highly homologous but differentially expressed. Two DIR-a genes GhDIR12 and GhDIR36 were highly expressed during secondary wall thickening, whereas their orthologous genes GbDIR13 and GbDIR35 exhibited quite low expression levels. The differential expression might have contributed to the different fiber quality between these two species.
Transcription factor binding sites (TFBS) provide cues for transcriptional regulation. A total of 266 JASPAR matrices were selected and fetched to identify potential TFBS in the promoter regions of GbDIRs and GhDIRs (Additional file 8: Table S8). Despite the biased JASPAR database and the strict threshold, TFBS were widely detected, including hormone-activated signaling pathway (ABA, IAA, ETH, GA, JA and SA), response to abiotic stresses (drought, salt and temperature), response to biotic stresses, and plant cell wall development ( Fig. 6; Additional file 5: Figure S5; Additional file 8: Table S9 and S10). In the DIR-a, DIR-e and DIR-f clades, GhDIR33, GhDIR36, GhDIR78, GhDIR80, GhDIR86 and GhDIR92 had no TFBS related to ABA signal  Table S11 transduction and showed little or no expression in the roots of cotton seedlings (Fig. 4); the others tended to be highly expressed, indicating the probable regulatory roles of ABA signaling in root-specific gene expression. Four DIR-b/d-II genes GhDIR27, GhDIR28, GhDIR75 and GhDIR76 were highly expressed in cotton fibers. Despite having extremely close phylogenetic relationships with these four genes, GhDIR6 and GhDIR64 exhibited quite low expression levels (Fig. 5b). One cause may be the lack of TFBS in their promoter regions (Fig. 6). Similarly, GbDIR6 and GbDIR60 differed from GbDIR25, GbDIR71 and GbDIR107 in TFBS occurrences and in transcript levels ( Fig. 5a; Additional file 5: Figure S5). As another example, GhDIR36 carried more IAA-responsive TFBS than GbDIR35, corresponding to their differential expression in cotton fibers ( Fig. 5; Additional file 6: Figure  S6a). Considering that GbDIR13 and GhDIR12 differed in transcript levels in cotton fibers but owned similar TFBS ( Fig. 5; Additional file 6: Figure S6b), their transacting TFs were analyzed. As shown, some TFs associated with IAA signaling, ETH signaling or plant cell wall development exhibited higher expression levels in G. hirsutum than in G. barbadense.

Functional characterization of GbDIR78 in Arabidopsis
RNA-seq data showed GbDIR78 was preferentially expressed during cotton fiber elongation and that GbDIR78 and GhDIR35 differed in transcript levels, implying the involvement of GbDIR78 in cell elongation. To further identify its functions, the ORF of GbDIR78 driven by a 35S promoter was transformed into A. thaliana plants. Two transgenic T 3 lines OE2 and OE3 were generated, and the stable expression of GbDIR78 was confirmed by Real-time PCR and Western blot ( Fig. 7a  and b). Arabidopsis leaf trichomes can serve as a useful experimental system to dissect cotton fiber development because they partly share regulatory mechanisms [31][32][33][34]. Here, trichomes from the fifth rosette leaves of OE2, OE3 and WT plants were measured, and then we discovered that the transgenic plants owned significantly longer trichomes (Fig. 7c and d). Moreover, dark-grown hypocotyls were utilized to determine the role of GbDIR78 in cell elongation because their growth resulted from cell elongation rather than division [35,36]. As a result, the seedlings of OE2 and OE3, compared with WT seedlings, showed significantly longer hypocotyls ( Fig. 7e and f). As expected, the longer epidermal cells were observed in the hypocotyls of transgenic plants in a microscopic inspection (Fig. 7g). These results indicate that GbDIR78 can promote cell elongation and might have contributed to cotton fiber development.

Discussion
Type b/d group expanded considerably and evolved rapidly in cotton Gene duplication events (tandem, segmental/whole-genome or by transposition) have provided raw evolution materials and meanwhile built up various types of gene families. Among them, tandem and segmental duplications (from unequal crossing-over and infrequent polyploidy, respectively) are fully thought out in the evolution of plant gene families [37,38]. In the present study, the expansion of DIR-b/d was mainly due to tandem duplications (Fig. 2 and Table 1). As a result, a  Table S13 and S14 number of gene clusters were generated, and the chromosomes At01, At04, At11, Dt01, Dt04 and Dt11 contained only DIR-b/d genes (Additional file 2: Figure S2). Interestingly, the heterogeneous gene clusters including DIR-b/d-II and DIR-b/d-III members were observed on chromosomes At01 and Dt01, which provided clues about the initial generation of DIR-b/d-III. DIR-b/d genes accounted for 75.7% in G. barbadense and 76.6% in G. hirsutum, while 34.3, 20.4, 50.0 and 56.8% in spruce, rice, pepper and flax, respectively [15,17,19,21]. Also, as shown in Fig. 3b, the proportion of DIR-b/d genes in cotton was much larger than that in Vitis vinifera, Glycine max, Arabidopsis thaliana and Theobroma cacao. All these results showed a Gossypium-specific expansion of DIR-b/d. Given the rare segmental duplication events in plant genomes, tandem duplication has been proposed as a proper mechanism to cope with rapidly changing environments [39,40]. Therefore, the rapid expansion of DIR-b/d might be a plant adaptive evolution against biotic and abiotic stresses. For example, a cluster of high-density genes located on chromosome At04 was involved in gossypol biosynthesis  Figure S2) and might have contributed to plant defense responses against pathogens and pests. Moreover, tandem duplication acts different contribution across species; tandemly duplicated DIR genes are abundant in Medicago truncatula and Oryza sativa but scarce in Capsicum annuum [15,19,20]. To explore why tandem duplications promoted the rapid expansion of DIR-b/d, we compared the strength of selection acting on tandemly duplicated genes (Additional file 3: Figure  S3c and S3d). Compared with the genes in DIR-a and DIR-e, tandemly duplicated genes in DIR-b/d, especially DIR-b/d-III, showed larger Ka/Ks values, which meant weaker purifying selection and greater evolutionary rates. In other words, the relaxed selective constraints might have accelerated neofunctionalization corresponding to the occurrence of "gossypol clade". Moreover, this may be one reason for the rapid expansion of cotton DIR-b/d subfamily. Considering that the calculated Ka/Ks values were averaged over sites and time, positive selection might have worked at individual sites or in a short period, which benefitted the retention of gene duplicates.
In brief, perhaps to cope with environmental challenges, the DIR-b/d clade expanded considerably and evolved rapidly in allotetraploid cotton. The tandemly arrayed DIR genes should also be candidate resistance resources in breeding programs.

DIRs may substantially affect cotton fiber development
Lignin, deposited mostly in the secondary cell walls of vascular plants, contributes to water transport, mechanical support and plant stress responses. Besides, the deposition of lignin in cell walls can repress cell growth due to the decreased extensibility. In cotton fibers, lignin has been neglected for the low concentrations. However, in recent years, studies suggest that lignin-like phenolics may substantially affect cotton fiber quality [41][42][43]. Given the involvement of DIR genes in lignin biosynthesis, we analyzed their expression patterns during cotton fiber development. A DIR-b/d-II "SCW clade" was observed in G. hirsutum, consisting of GhDIR27, GhDIR28, GhDIR75 and GhDIR76 (Fig. 5b). Being highly expressed during secondary wall thickening, they might affect lignin deposition in cotton fibers. After analyzing QTLs reported for fiber quality traits, GhDIR27 and GhDIR28 fell in qFS-c4-1, a stable QTL across multiple environments controlling fiber strength [44]. Also, two other QTLs for fiber strength (qFS-C4-3 and qFS04.1) and one QTL controlling fiber micronaire (qFM-Chr4-3) were detected at almost the same region [45][46][47]. Interestingly, GhDIR27 and GhDIR28 also fell in qFL04.1, a QTL conferring fiber length [48]. Thus, it is reasonable to speculate that this clade might regulate lignin biosynthesis and affect fiber development. Similarly, "SCW clade" was also observed in G. barbadense (Fig.  5a). However, GbDIR27 and GbDIR70 (orthologous to GhDIR28 and GhDIR75, respectively) showed quite low expression levels, which might explain in part different fiber properties between the two cotton species. DIR-a genes are widely considered to mediate lignan biosynthesis (Additional file 8: Table S6). In particular, a soybean DIR-a protein GmPdh1 was inferred to affect the pattern of lignin deposition by considering its expression patterns and its promotion to pod dehiscence [16]. It reminds us of that the loss-of-function mutation of AtPrR1 (pinoresinol reductase) results in alterations in lignin levels, lignin structure and tissue-specific lignin distribution [49]; there may be an association between lignan biosynthesis and lignin deposition. In the present study, two DIR-a genes GhDIR12 and GhDIR36 broadly similar to GmPdh1 were highly expressed during secondary cell wall thickening (Fig. 5b). However, their orthologous genes (GbDIR13 and GbDIR35, respectively) exhibited little expression in cotton fibers (Fig. 5a). It is interesting to investigate whether GhDIR12 and GhDIR36 can negatively regulate fiber quality by enhancing lignin biosynthesis. Unlike the above-mentioned genes, GhDIR35 was preferentially expressed during fiber elongation. It belonged to the DIR-b/d-I clade and fell in qFL08.1, a stable QTL for fiber length [50]. GbDIR78, highly homologous to GhDIR35, showed a higher transcript level. Moreover, the overexpression of GbDIR78 in Arabidopsis plants can promote cell elongation (Fig. 7). Therefore, GbDIR78 and GhDIR35 may play an important role in cotton fiber elongation, and their differential expression might have contributed to the different fiber properties.
GbDIR78 promotes cell elongation possibly by regulating phenylpropanoid metabolism GbDIR78 can promote cell elongation in Arabidopsis plants, but the mechanisms need to be discussed. Being preferentially expressed during cotton fiber elongation, GbDIR78 is not likely able to participate in lignin biosynthesis. A soybean gene GmDIR22 can effectively direct lignan biosynthesis in vitro and in vivo [8].
GbDIR78 was quite close to GmDIR22 in phylogenetic tree (Fig. 2c), implying that GbDIR78 may also be involved in lignan biosynthesis. Moreover, when transiently expressed in onion epidermal cells, GbDIR78 entered the secretory pathway and was mainly retained in the plasma membrane (Additional file 7: Figure S7), which is partly similar to the subcellular localization of GmDIR22. However, DIR proteins are targeted to the cell wall when involved in lignin biosynthesis [9,10]. Thus, we reasonably speculate that GbDIR78 participates in lignan biosynthesis. Slightly confusingly, the reported cotton GhDIR1 gene shares close evolutional relationships with GbDIR78, but the overexpression of GhDIR1 in cotton can enhance lignification [11]. This could be explained in terms of the association between lignan and lignin biosynthesis [49].
Apoplast ROS (reactive oxygen species) signaling is crucial for cell elongation [51], but at high concentrations ROS become toxic, causing cell wall stiffening [52]. The cell walls of Arabidopsis plants overexpressing GbDIR78 may be at moderate ROS concentrations, because some (neo)lignans can act against oxidative damage [53]. Given that lignans compete with flavonoids for phenylalanine precursors [54,55], the metabolic flux towards lignans can result in a reduction of flavonoid biosynthesis. Flavonoids (especially flavonols) have been shown to inhibit polar auxin transport [56]. Thus, in GbDIR78-overexpressed Arabidopsis plants, auxin transport (in an apical-basal axis) may be elevated compared with WT plants. To sum up, the longer trichomes and hypocotyls in the transgenic Arabidopsis plants might be due to moderate ROS levels and higher auxin accumulation. In cotton fibers, appropriate ROS levels are important for cell elongation [57,58]. Also, some flavonoids play a negative impact on cotton fiber development [59]. Thus, the metabolic flux from flavonoids into lignans should be a novel alternative way to improve cotton fiber quality.

Conclusions
In summary, we performed a genome-wide analysis of DIR gene family in G. barbadense and G. hirsutum. Our study clearly demonstrates how segmental and tandem duplications contribute to the expansion of cotton DIR gene family and highlights a Gossypiumspecific clade involved in atropselective synthesis of gossypol. We also suggest that DIR genes can not only confer Verticillium wilt resistance but also affect cotton fiber development. In addition, the fact that GbDIR78 can promote cell elongation in Arabidopsis plants paves an alternative way to improve cotton fiber properties. Our results provide useful insights into the evolutionary history, expression patterns, transcriptional regulation, and functional analysis of Gossypium DIR genes.

Plant materials and growth conditions
G. barbadense cv. Pima90-53 [60] and Hai7124 [61], with superior fiber quality, and G. hirsutum cv. HY405 and ND13, with moderate fiber quality, were grown at an experimental field in Baoding (38°45′N, 115°29′E) during the growing season (late April to late October). Field management followed routine farming methods. G. hirsutum cultivars with different resistance to V. dahliae infection used in this study are as follows: a resistant cultivar ND601 [62], four tolerant cultivars (AusSiV2, Xinmian33B, Nongdamian7 and Nongdamian8), two susceptible cultivars (Handan333 and Xiangmian18). The Verticillium wilt resistance was assessed on the basis of observations at a disease nursery over several years. In V. dahliae-responsive expression experiments, the seedlings of these cultivars were grown in 50% Hoagland's solution under environmental conditions of 28°C/ 25°C (day/night), 16-h photoperiod, and 80% relative humidity. The solution was changed every four days in order to ensure the healthy growth of seedlings. Fourweek-old cotton seedlings were infected with V. dahliae strain Linxi2-1 as described by Wang et al. [63]. Most of the above-mentioned cultivars were collected and preserved, with the appropriate permissions, by the National Medium-term Gene Bank of Cotton in China, including Pima90-53 (M210080, introduced from USA), Hai7124 (M210054; Jiangsu, China), AusSiV2 (M131662, introduced from Australia), Xinmian33B (M112566, introduced from USA), Nongdamian7 (M110598; Hebei, China), Nongdamian8 (M110599; Hebei, China), Handan333 (M112751; Hebei, China) and Xiangmian18 (M114752; Hunan, China). The other three cultivars HY405, ND13 and ND601 were collected, from Hebei Province in China, by Hebei Agricultural University, and their accession numbers were G100937, G100728 and G100729, respectively. All necessary permissions for planting and investigating these cultivars were obtained from Hebei Agricultural University and the National Medium-term Gene Bank of Cotton in China, and the collection and research of these cultivars have complied with the Convention on the Trade in Endangered Species of Wild Fauna and Flora. A. thaliana Columbia wild-type plants (Col-0) and transgenic plants were cultivated in pots containing sterile vermiculite in a greenhouse (22°C, 16-h photoperiod, and 70% relative humidity). Hoagland's nutrient solution was added weekly.

Identification and characterization of DIR proteins
The sequence alignments of DIR gene family (PF03018) were downloaded from Pfam [69]. The candidate DIR proteins were identified using HMMER 3.0 [70] and confirmed with the Batch CD-Search service [71]. Nglycosylation sites were identified using NetNglyc 1.0 (http://www.cbs.dtu.dk/services/NetNGlyc/). Signal peptide prediction was performed with SignalP 5.0 [72]. YLoc served for predicting subcellular localization [73]. The protein length, molecular weight and isoelectric point were investigated by a native Perl script.

Gene structure, chromosomal distribution, conserved motifs and phylogenetic analysis
The exon-intron information and gene location information were fetched from gene annotation files and subsequently visualized using TBtools [74]. MEME was employed to search conserved motifs, with a limit of 15 motifs and other default parameters [75]. The phylogenetic trees were constructed by MEGA 7.0 using the neighbor-joining (NJ) method with 1000 bootstrap replications [76] and then displayed with the online iTOL tool [77].

Gene duplication and the calculation of Ka, Ks and Ka/Ks values
Segmental and tandem duplications were detected by MCScanX with default parameters [78]. The duplication events were fetched and then displayed with TBtools [74]. Homologous genes between At-and Dtsubgenome were determined using the bidirectional best hit method in BLAST. We used ParaAT [79] to construct protein-coding DNA alignments. Then the paired sequences were used to calculate Ka, Ks and Ka/Ks values using KaKs_Calculator with the LPB method [80].

RNA-seq data
A Verticillium wilt-resistant G. hirsutum cultivar ND601 and a highly aggressive defoliating V. dahliae strain Linxi2-1 were used in V. dahliae-responsive expression analysis. The roots of four-week-old seedlings infected by V. dahliae were collected independently at 2, 6, 12, 24 and 48 h post-inoculation (hpi), while the roots of seedlings inoculated with distilled water, as control, were also collected at the corresponding time points. For each time point, two biological replicates were generated. Frozen roots were ground mechanically to a fine powder in liquid nitrogen. Then total RNA was isolated using an RNAprep pure Plant Kit (TIANGEN, Beijing, China), following the manufacturer's instructions. For RNA-seq, strand-specific cDNA libraries were prepared at the Novogene Bioinformatics Institute, Beijing, China. An Illumina Hiseq 4000 platform was then used for sequencing, and 150 bp paired-end reads were generated. Then gene expression levels were calculated using FPKM (Fragments Per Kilobase of exon model per Million mapped reads). Finally, log 2 (1 + FPKM) values after averaging two replicates were displayed. Similarly, four Verticillium wilt-tolerant G. hirsutum cultivars AusSiV2, Xinmian33B, Nongdamian7 and Nongdamian8 (termed as T1, T2, T3 and T4, respectively, during the current study) and two susceptible cultivars Handan333 and Xiangmian18 (termed as S1 and S2, respectively) were used for expression experiments. RNA-seq samples were generated at 12 hpi in the same way described above. To highlight expression changes, we showed fold-change values with log 2 transformation. The fold change is the ratio of 1 + FPKM (treatment) to 1 + FPKM (control). In addition, G. barbadense cultivars Pima90-53 and Hai7124, and G. hirsutum cultivars HY405 and ND13 were employed in fiber development-related expression analysis. For each cultivar, cotton bolls were harvested independently at 0, 5, 10, 15, 20, 25 and 30 days postanthesis (DPA). For each time point, samples from multiple cotton bolls were collected and pooled to minimize variations. Finally, 28 libraries (generated from ovules of 0 DPA, and fibers of 5, 10, 15, 20, 25 and 30 DPA) were used to produce 125 bp paired-end reads on a HiSeq 2500 platform. RPKM (Reads Per Kilobase of exon model per Million mapped reads) was applied to estimate expression levels, and we finally showed log 2 (1 + RPKM) values.

Promoter analysis
The promoter sequences (2000-bp upstream of ATG) of DIR genes were extracted from genomes of G. barbadense and G. hirsutum. According to Gene Ontology annotations, a total of 266 JASPAR matrices (transcription factor binding profiles) were selected and then fetched, including hormone-activated signaling pathway (ABA, IAA, ETH, GA, JA and SA), response to abiotic stresses (drought, salt and temperature), response to biotic stresses, and plant cell wall development [81]. For each JASPAR matrix, FIMO was used to scan promoter sequences for matches with a strict threshold of p-value < 1E-5 [82]. The potential transcription factor binding sites (TFBS) were counted according to their classification information. Finally, we showed the results using iTOL [77].

Functional analysis of GbDIR78
The coding sequence of GbDIR78 (from Pima90-53) was amplified and inserted into the Gateway pDONR207 vector to form an entry clone. Then the coding sequence was recombined into the Gateway pGWB414 vector to generate an overexpression (OE) construct under the control of CaMV 35S promoter. The OE construct was transformed into A. thaliana (Col-0) through Agrobacterium-mediated plant transformation. Transgenic plants were identified using 50 μg/ml kanamycin screening (1/2 MS medium) and PCR detection. Two transgenic T 3 lines OE2 and OE3 were generated, and the stable expression of GbDIR78 was confirmed by Real-time PCR and Western blot (HA-Tag). Decolorized by ethanol, the fifth rosette leaves of four-week-old wild type (WT) and OE plants were photographed with an Olympus BX51 microscope (Tokyo, Japan), and the longest branch was measured using the ImageJ software for each of about 150 legible trichomes in each line [83]. To observe darkgrown hypocotyls, the seeds of A. thaliana were sterilized and then grown in vertical plates (1/2 MS medium, 0.9% agar and pH 5.8) under the conditions of 22°C and continuous darkness. Five-day-old WT and OE seedlings were harvested and then photographed with a professional Epson V800 scanner (Nagano, Japan). Their hypocotyls were subsequently measured using the ImageJ software [83]. To observe the epidermal cells of the hypocotyls, the seedlings used above were stained with propidium iodide (PI) and then photographed using an Olympus FV10i laser scanning microscope (Tokyo, Japan). Two-tailed t test (p-value) was conducted using the GraphPad Prism software (San Diego, CA, USA). To examine the subcellular localization of GbDIR78, its coding sequence was recombined into the Gateway pEarleyGate103 vector, which can express target protein with a C-terminal GFP fusion. A plasmid expressing GFP alone served as a control. GbDIR78-GFP fusion protein and GFP were transiently expressed in onion epidermal cells by a Bio-Rad PDS-1000/He system (Hercules, CA, USA). Then the transformed cells were monitored with an Olympus BX51 microscope (Tokyo, Japan) after incubating on MS agar medium for 24 h (22°C, continuous darkness).
Additional file 1: Figure S1. Phylogenetic relationships, motif analysis and gene structure of GbDIRs (a) and GhDIRs (b). Fifteen distinct motifs were identified with MEME software. Exons and introns are represented by green boxes and black lines, respectively. The conserved domain regions are colored in yellow Additional file 2: Figure S2. Chromosomal distribution of GbDIRs (a) and GhDIRs (b). Tandemly duplicated genes are colored in red and linked by red lines Additional file 3: Figure S3. Ka/Ks values for segmental duplications among GbDIRs (a) and GhDIRs (b), and for tandem duplications among GbDIRs (c) and GhDIRs (d). Tandemly duplicated DIR-b/d-III genes are colored in red, while DIR-a and DIR-e genes are colored in blue Additional file 4: Figure S4. Expression patterns of GhDIRs from Verticillium wilt tolerant and susceptible cultivars (a-f). Red boxes indicate up-regulated genes after inoculation with V. dahliae (compared with the CK group), while blue boxes indicate down-regulated genes. The figures in boxes represent log 2 (fold-change) values corresponding to color gradients. (g) The disease index at 20 dpi. T and S represent tolerant and susceptible G. hirsutum cultivars, respectively. The original FPKM values were provided as Additional file 8: Table S12 Additional file 5: Figure S5. Identification of TFBS in the promoter regions of GbDIRs Additional file 6: Figure S6. The potential mechanisms causing the differential expression. (a) GhDIR36 carried more IAA-responsive TFBS than GbDIR35. (b) Although GbDIR13 and GhDIR12 carried similar TFBS, the trans-acting TFs exhibited higher expression levels in G. hirsutum than in G. barbadense Additional file 7: Figure S7. Subcellular localization of GFP alone or GbDIR78-GFP fusion protein in onion epidermal cells Additional file 8: Table S1. Characteristics of 107 GbDIR proteins. Table S2. Characteristics of 107 GhDIR proteins. Table S3. List of DIR genes used in the phylogenetic tree.  . Table S6. List of DIRs with biochemical and/ or physiological functions in literatures. Table S7. List of DIR genes identified in Gossypium arboreum, Gossypium raimondii, Glycine max, Theobroma cacao and Vitis vinifera. Table S8. JASPAR matrices used to identify potential TFBS. Table S9. List of motif occurrences in Gossypium barbadense. Table S10. List of motif occurrences in Gossypium hirsutum. Table S11. FPKM values of DIR genes in response to Verticillium dahliae and water in Gossypium hirsutum cultivar ND601. Table S12. FPKM values of DIR genes in response to Verticillium dahliae and water in six Gossypium hirsutum cultivars. Table S13. RPKM values of DIR genes during cotton fiber development in Gossypium barbadense. Table S14.