Characterization of wheat homeodomain-leucine zipper family genes and functional analysis of TaHDZ5-6A in drought tolerance in transgenic Arabidopsis

Background Many studies in Arabidopsis and rice have demonstrated that HD-Zip transcription factors play important roles in plant development and responses to abiotic stresses. Although common wheat (Triticum aestivum L.) is one of the most widely cultivated and consumed food crops in the world, the function of the HD-Zip proteins in wheat is still largely unknown. Results To explore the potential biological functions of HD-Zip genes in wheat, we performed a bioinformatics and gene expression analysis of the HD-Zip family. We identified 113 HD-Zip members from wheat and classified them into four subfamilies (I-IV) based on phylogenic analysis against proteins from Arabidopsis, rice, and maize. Most HD-Zip genes are represented by two to three homeoalleles in wheat, which are named as TaHDZX_ZA, TaHDZX_ZB, or TaHDZX_ZD, where X denotes the gene number and Z the wheat chromosome on which it is located. TaHDZs in the same subfamily have similar protein motifs and intron/exon structures. The expression profiles of TaHDZ genes were analysed in different tissues, at different stages of vegetative growth, during seed development, and under drought stress. We found that most TaHDZ genes, especially those in subfamilies I and II, were induced by drought stress, suggesting the potential importance of subfamily I and II TaHDZ members in the responses to abiotic stress. Compared with wild-type (WT) plants, transgenic Arabidopsis plants overexpressing TaHDZ5-6A displayed enhanced drought tolerance, lower water loss rates, higher survival rates, and higher proline content under drought conditions. Additionally, the transcriptome analysis identified a number of differentially expressed genes between 35S::TaHDZ5-6A transgenic and wild-type plants, many of which are involved in stress response. Conclusions Our results will facilitate further functional analysis of wheat HD-Zip genes, and also indicate that TaHDZ5-6A may participate in regulating the plant response to drought stress. Our experiments show that TaHDZ5-6A holds great potential for genetic improvement of abiotic stress tolerance in crops.


Background
Changes to the transcriptome are achieved through the action of transcription factors (TFs), which repress or activate suites of genes to modulate plant growth and respond to environmental stimuli [1]. The HD-Zip family consists of a large number of transcription factors that seem be unique to the plant kingdom. HD-Zip proteins contain a Homeobox domain (HD) and an adjacent Leucine Zipper (LZ) motif [2]. The HD domain is responsible for specific DNA binding, whereas the LZ motif acts as a mediator to protein dimerization [2]. Based on the additional conserved motifs and their phylogenetic relationships, HD-Zip genes can be classified into four subfamilies (HD-Zip I, II, III, and IV) [2][3][4][5]. All subfamilies cotnain the LZ domain and are characterized by differences in the regions downstream of this domain. HD-Zip II subfamily proteins contain a conserved "CPSCE" motif located in the C-terminus, which is not found in HD-Zip I subfamily proteins [2]. HD-Zip III and IV subfamily proteins uniquely contain the extra conserved START and HD-SAD domains [2]. The HD-Zip III subfamily proteins are distinguished from those of HD-Zip IV by the presence and absence, respectively, of a C-terminal MEKHLA domain [2,6].
In recent years, many efforts have been made to elucidate the functions of HD-Zip genes. Members of the HD-Zip family have been found to play pivotal roles in plant development and the adaption to environmental stresses. HD-Zip I subfamily proteins are mainly involved in the regulation of organ growth and development, de-etiolation, blue light signaling, and also in regulating the response to abiotic stresses [7][8][9][10][11]. For example, ATHB7 and ATHB12 are both sensitive to abscisic acid (ABA) and water deficit, and negatively regulate the ABA response in Arabidopsis [6]. ATHB1 acts as a positive regulator to promote hypocotyl elongation [8] and to mediate the determination of leaf cell fate [9]. The TaHDZipI-2 gene was shown to regulate flowering and spike development and improve frost tolerance in transgenic barley lines [10]. Additionally, wheat TaHD-ZipI-3, − 4 and − 5 genes are differentially expressed in response to abscisic acid (ABA), cold and drought treatment through binding to specific cis-elements [11]. HD-Zip II subfamily proteins participate in embryonic apical development, auxin signaling, and are also involved in light and abiotic stress responses [12][13][14][15]. In Arabidopsis, both ATHB2/HAT4 and HAT2 participate in auxin-mediated morphogenesis, and ATHB2/HAT4 also regulates the leaf cell expansion and shade avoidance under red/far-red light [13,14]. OsHOX11 and OsHOX27 are two rice HD-Zip II genes, and their expression is dramatically decreased upon exposure to drought in a drought-resistant cultivar [12]. Additionally, a sunflower HD-ZIP II gene, HAHB10, participates in the response to biotic stress [15].
HD-Zip III subfamily proteins have been reported to control embryogenesis, apical meristem development, vascular bundle development, morpho-physiological changes in roots and auxin transport, and leaf polarity [16][17][18][19][20]. ATHB8 and ATHB15 are thought to direct vascular development [17,18]. CLV3 has been shown to interact with HD-Zip III members to regulate floral meristem activities [19], and KANADI interacts with HD-Zip III genes to control lateral root development [20]. PopREVOLUTA (PRE), a class III HD-Zip gene in poplar, is involved in the growth of cambia and secondary vascular tissues [16]. HD-Zip IV subfamily proteins are integral to growth and development of trichome, cuticle, and root tissues, as well as epidermal cell differentiation [21][22][23][24]. In Arabidopsis, GL2 regulates trichome expansion and root hair differentiation [22], and PDF2 plays a vital role in epidermal cells to control normal development of the floral organs [21]. OCL4 (OUTER CELL LAYER4) encodes a maize HD-Zip IV transcription factor that inhibits trichome development and influences anther cell division in maize [23]. In addition, recent studies have demonstrated that overexpression of AtHDG11, an HD-Zip IV gene, increases drought tolerance in Arabidopsis, tobacco, rice, sweet potato, and cotton [24].
Bread wheat (Triticum aestivum; 2n = 6x = 42; AABBDD) is an integral global food crop [25,26]. The modern bread wheat genome is the result of two allopolyploidization events with three genomes. First, the A genome donor (T. urartu, AA; 2n = 14) hybridized with the B genome donor (A. speltoides, SS; 2n = 14). This event, which occurred0 .2 Mya, produced the allotetraploid T. turgidum L. (AABB). Second, this AABB donor hybridized with the D genome donor (A. tauschii~9000 ya. This resulted in the allohexaploid wheat T. aestivum (AABBDD) [27,28], which has a large (> 17 Gb) and composite genome, making genomic studies difficult. Because of wheat's importance globally, extensive research has been conducted to sequence and annotate its genome [25,26,[29][30][31][32][33]. Recent efforts have sequenced isolated chromosome arms and constructed a draft sequence of the hexaploid wheat genome (IWGSC, 2018). However, compared with Arabidopsis and rice, there are fewer studies of the HD-Zip family in wheat. To date, only five genes encoding HD-Zip subfamily I members (TaHDZipI-1 to TaHDZipI-5) have been isolated and partially characterized from wheat [10,11,34]. Although some HD-Zip genes have been functional studied in wheat, the reports of their genome organization, structure and phylogenetic relationships are limitted, especially for HD-Zip genes involved in drought stress response.
In a previous study, 46 wheat HD-Zip genes were identified [35], which is not consistent with the large genome of wheat. Thus, a further survey of the HD-Zip gene family should be conducted using the most current version of the wheat genome. Here, we present a genome-wide identification and analysis of the HD-Zip genes from wheat and show the phylogenetic relationships among the wheat genes and to those from Arabidopsis and other plants. We performed gene expression analyses to characterize the expression profiles of HD-Zip genes in various organs/tissues and in response to drought stress. We then performed functional analysis of a drought-induced HD-Zip I gene, TaHDZ5-6A, by investigating drought stress tolerance and physiological traits in transgenic Arabidopsis plants. Finally, we propose a putative mechanism by which TaHDZ5-6A enhances drought tolerance in transgenic Arabidopsis plants. Our results provide a basis for the further functional analysis of the wheat HD-Zip gene family.

Identification of the HD-zip gene family in wheat
Wheat genome data used in this study were downloaded from the Chinese Spring IWGSC RefSeq v1.1 reference genome assembly (https://wheat-urgi.versailles.inra. fr/). We firstly converted the wheat genome into a local BLAST database using the UNIX pipeline. Then, we used 90 Arabidopsis and rice HD-Zip protein sequences to perform a BLAST search (BLASTP) against this local blast database using cut-off E-value <1e-10. After remove the all redundant sequences using CD-hit program, the rest of protein sequences were further subjected to identify the HD domain and LZ motif using the Simple Modular Architecture Research Tool (SMART; http://smart.emblheidelberg.de/smart/set_mode.cgi? NORMAL = 1). In a recently study, a total of 46 HD-Zip genes were identified in wheat by a genome-wide bioinformatic survey [35]. In this study, we further identified 67 additional HD-Zip genes in wheat latest genome and extended the total member to 113. Based on the genomic position information, 113 HD-Zip genes were located across all the 21 wheat chromosomes, ranging from 3 to 8 per chromosome. Chromosome 5A/B/D have the most HD-Zip genes (24 total, 8 per chromosome), followed by chromosome 4A/B/D (18 total, 6 per chromosome) ( Table 1; Additional file 1: Figure S1). Acording to their phylogenetic relationship, the 113 HD-Zip proteins were grouped into 40 homoeologous clusters, and the members in each of 39 clusters were assigned to A, B or D sub-genomes. Finally, We designated wheat HD-Zip genes as TaHDZX_ZA, TaHDZX_ZB, or TaHDZX_ZD, where X denotes the gene number and Z the wheat chromosome where it is located. The detailed information of HD-Zip family genes in wheat, including nomenclature proposed in the previous study [35] was listed in Table 1. As shown in Table 1, the identified HD-Zip genes in wheat encode proteins ranging from 192 (TaHDZ12-6D) to 890 (TaHDZ35-1B) amino acids (aa) in length with an average of 501 aa. Furthermore, the computed molecular weights of these HD-Zip proteins ranged from 20.88 (TaHDZ12-6D) to 96.02 (TaHDZ35-1B) kDa. The theoretical pI of the deduced HD-Zip proteins ranged from 4.59 (TaHDZ5-6A) to 9.79 (TaHDZ12-6D).

Phylogenetic analysis of HD-zip gene family
Our study aimed to understand the phylogenetic relationships between plant HD-Zip proteins. We began by identification of HD-Zip genes from seven other plant species with varying levels of complexity for which entire genomes were accessible, including Chlamydomonas reinhardtii, Physcomitrella patens, the monocotyledonous angiosperms Brachypodium distachyon, Oryza sativa, and Zea mays, and the dicotyledonous angiosperms Arabidopsis thaliana, Populus trichocarpa, and Vitis vinifera. From this analysis, we found that the HD-Zip gene family seems to be restricted to land plants; all genomes except that of the algae contained genes for HD-Zip proteins. We then analyzed their evolutionary relationships using full-length HD-Zip proteins from eight land plant species to construct a neighbour-joining phylogenetic tree. Acordingly, the phylogenetic tree was divided into four well-conserved subfamilies, designated as HD-Zip I to IV (Fig. 1a). The phylogenetic tree also revealed the species-biased distribution of these plant HD-Zip proteins (Fig. 1b). HD-Zip I members consisted of the largest subfamily in the plant species except for Brachypodium distachyon and wheat, where HD-Zip II and IV were the largest respectively. In contrast, HD-Zip III subfamily composed of the fewest HD-Zip members except for moss (Fig. 1c). Subfamily I included 31 TaHDZ genes, grouped into 11 clusters ( TaHDZ1-4A Table 1).
To clarify the paralog and ortholog relationships of wheat HD-Zip members, we further divided each subfamily into subclasses. According to this reshaped phylogenic tree (Fig. 2), each subfamily contain the HD-Zip proteins from Arabidopsis, rice, and wheat, suggesting that these subfamilies were appeared before the dicot-monocot split. Consistent with the nomenclature in previous studies of Arabidopsis and rice [36], HD-Zip I subfamily was divided into seven subclasses, i.e., α, β, γ, δ, ε, φ and ζ (Fig. 2). Clade ε and φ contains only sequences from Arabidopsis. Clade ζ contains sequences from both rice and wheat, with no members in Arabidopsis, suggesting the gene loss in Arabidopsis during the long period of evolution of this group. The HD-Zip II subfamily was divided into ten subclasses, from α to κ, according to Hu et al. (2012) [37]. Clade β contains only sequences from Arabidopsis. Clade α and γ contains sequences from both rice, wheat, and Arabidopsis. While the other clades only contains sequences from rice and wheat. The HD-Zip III subfamily was only divided into three subclades, designated as clade α, β and γ, consistent with the previous studies [37]. Each clade contains sequences from both rice, wheat, and Arabidopsis. The HD-Zip IV subfamily was also divided into six subclades, designated clade α, β, γ, δ, ε and ζ as in a previous study [37]. Clade δ excluded genes from rice and Arabidopsis, while clade ζ included only sequences from rice and wheat. Eudicot-and monocot-specific clustering patterns of HD-Zip genes emerged when tree topology was examined. This pattern may reflect evolutionary history of these subgroups: HD-Zip genes in eudicots were likely retained after they diverged from monocots and then expanded. Exon-intron structural divergence can play an important role in the evolution of multiple gene families [38]. We constructed a phylogenetic tree using only the 113 full-length wheat HD-Zip protein sequences fo further examine patterns in wheat. We found that wheat HD-Zip proteins also fell into the four subfamilies described previously (Fig. 3a). We further mapped the exon/intron organization in the coding regions of each TaHDZ gene. Specifically, 21 TaHDZ genes had two introns, 28 had three introns, 15 had four introns, two had five introns, two had seven introns, 11 had eight introns, 12 had nine introns, three had 10 introns, five had 11 introns, two had 15 introns, two had 16 introns, and 10 had 18 introns (Fig. 3b, c). In general, orthologous genes are highly conserved with respect to gene structure, and this conservation is sufficient to reveal their evolutionary relationships [38]. In wheat, HD-Zip genes within the same subfamily shared similar gene structures (intron number and exon length), especially the members of the HD-Zip I and HD-Zip III subfamilies, i.e., HD- Fig. 2 The phylogenetic tree of HD-Zip proteins from wheat, Arabidopsis, maize and rice. Members of the HD-zip genes from wheat are marked in red. Two-letter prefixes for sequence identifiers indicate species of origin. Ta, Triticum aestivum; At, Arabidopsis thaliana; Os, Oryza sativa; Zm, Zea mays. The tree was constructed using the Neighbor-Joining algorithm with 1000 bootstrap based on the full length sequences of HD-Zip proteins. The HD-Zip proteins are grouped into four distinct groups Zip I genes mainly had two or three introns in their gene regions, and HD-Zip III genes mainly had 18 introns. However, the exon/intron compositions in HD-Zip II and IV genes were more variable, i.e., HD-Zip II members possessed two to four introns, and the number of introns in HD-Zip IV family members varied from 4 to 11 (Fig. 3b, c).
The allohexaploid bread wheat genome is known to have formed by fusion of the T. urartu (subgenome A), Aegilops speltoides (subgenome B), and A. tauschii (subgenome D) genomes prior to several hundred thousand years ago. A majority (60.1-61.3%) of genes in the A, B, and D subgenomes have orthologs in all the related diploid genomes. To deeply understand the intron gain or loss for homeologous TaHDZ genes in wheat, the intron/exon structures of TaHDZ genes that clustered together based on the phylogenetic tree were compared. Among these, fourteen clusters showed changes in their intron/exon structure, including TaHDZ1-4A/B/D, TaHDZ3-4A/B/D, TaHDZ5-  (Fig.  3b). Because there are many orthologs in the wheat A, B, and D sub-genomes, intron gain/loss of these orthologs significantly increases the transcriptome and proteome complexity in wheat.
To further examine the diverse structurse of wheat HD-Zip proteins, the conserved motifs were identified by searching the SALAD database along with subsequent annotation with InterPro (Additional file 2: Figure S2). Seven of these motifs were found to be associated with the functionally defined domains. Motifs 1 and 2 were referred to the HD domain, which is the typical conserved domain found in the middle of all the TaHDZ proteins, and motif 5 was associated with the adjacent LZ domain. Motifs 17 and 34 were specifically made up the MEKHLA domain in subfamily III proteins of wheat (14 members). Motifs 3 and 4 were associated with the START region, which has been identified in subfamily III and IV proteins (Additional file 2: Figure S2). Similar motif compositions are shared by TaHDZ proteins which cluster together, and this indicates that members of a given group possess similar functionalities.
Tissue-specific expression profile of TaHDZ genes Gene family members can exhibit different expression patterns in different tissues to accommodate various physiological processes. To gain insight into the temporal and spatial expression patterns and putative functions of HD-Zip genes in wheat growth and development, the tissuespecific expression patterns of the 113 TaHDZ genes were investigated using RNA-seq data from 10 different tissues. All TaHDZ genes were found to be expressed in at least one of the tissues examined ( Fig. 4; Additional file 3: Table  S1). Subfamily I TaHDZ genes were found to be much more highly expressed in seedling roots, stems, leaves, flag leaves, young spikes, and 5-day-old grains; for example, TaHDZ1-4A/B/D are highly expressed in leaves and 5-dayold grains, TaHDZ8-6A/B/D are highly expressed in leaves and young spikes (15-days-old), and TaHDZ11-2A/B/D are highly expressed in leaves and 5-day-old spikes ( Fig. 4; Additional file 4: Figure S3). Subfamily II TaHDZ genes are more highly expressed in seedling roots, stems, leaves, flag leaves, and young spikes; for example, TaHDZ19-3A/ B/D are highly expressed in young spikes, while TaHDZ20-1A/B/D are highly expressed in seedling stems, leaves, and 5-day-old spikes ( Fig. 4; Additional file 5: Figure  S4). Subfamily III TaHDZ genes showed relatively higher expression levels in seedling stems, leaves, and young spikes; TaHDZ24-3A/B/D are highly expressed in seedling leaves, and TaHDZ27-5A/B/D are highly expressed in seedling stems and leaves (Figure 4; Additional file 6: Figure S5). Subfamily IV TaHDZ genes are highly expressed in seedling stems, young spikes, and grains; TaHDZ29-3A/B/D are highly expressed in 10-day-old grains, TaHDZ32-3A/B/D are highly expressed in 5-20 day-old grains, and TaHDZ38-5A/B/D are highly expressed in seedling stems and young spikes ( Fig. 4; Additional file 7: Figure S6). Thus, genes in the four wheat HD-Zip subfamilies display obvious differences in expression patterns and levels, which indicates that these genes have undergone functional differentiation and redundancy. It is worth mentioning that most homologous genes show similar expression patterns during development. However, it should also be noted that many clustered expression profiles do not reflect gene similarities, and this includes the copies of individual HD-Zip gene types from the subgenomes. Some of them even show the opposite expression patterns. For instance, TaHDZ7, which is located on chromosome 2D, is preferentially expressed in the seedling leaves and flag leaves, whereas the homologous TaHDZ7 gene from 2A is only expressed in the flag leaves, and the TaHDZ7 homolog from 2B is preferentially expressed in flag leaves and 5-day-old spikes ( Fig. 4; Additional file 4: Figure S3). TaHDZ37 on 2A shows relatively higher expression in 10-15 day-old grains, while its homologous TaHDZ37 from 2B is preferentially expressed in seedling leaves and 20-day-old grains, and the homologous from 2D is highly expressed in 15-days-old grains (Figure 4; Additional file 7: Figure S6). The divergences in expression profiles between homologous genes from the different subgenomes reveals that some of them may have lost their function or acquired a new function after polyploidization during the evolution of wheat.

Expression patterns of TaHDZ genes in response to drought stress
Wheat productivity is severely affected by drought stress, and therefore the study of drought responsive genes is important to increase wheat yield. Many studies have shown that the HD-Zip genes play a crucial role in the response to abiotic stresses in plants. To gain more insight into the roles of wheat HD-Zip genes in stress tolerance, we first identified the cis-elements within 2 kb promoter region using online program PlantCARE (http://bioinformatics.psb.ugent.be/webtools/plantcare/ html/). We found a number of cis-acting elements related to stress response in the promoter of TaHDZs. They included DRE (Dehydration-responsive element), ABRE (ABA-responsive element), MBS (MYB binding site involved in drought-inducibility), MYC (MYC recognition site), MYB (MYB recognition site), and LTR (low temperature responsive element) (Additional file 8: Table S2). To further understand the potential role of TaABFs in the drought stress response, we reanalyzed the expression profiles of all wheat HD-Zip genes using RNA-seq data from roots and leaves that were subjected to drought treatment. We found that the wheat HD-Zip genes could be mainly classified into two groups based on their expression patterns (Fig. 5a, b; Fig. 6a, b). In leaves, the expression levels of 45 TaHDZ genes were up-regulated at one or more time point during drought stress treatment; this included 20 genes from the HD-Zip I subfamily ( TaHDZ2-5A (Fig. 5a, c, and d). In contrast, 50 TaHDZ genes showed down-regulated expression under drought stress, including seven genes from subfamily I, six genes from subfamily II, 12 genes from subfamily III, and 25 genes from subfamily IV (Fig. 5a, c, d). In roots, 34 TaHDZ genes were found to be up-regulated in  TaHDZ37-2B and TaHDZ40-2B) (Fig. 6a, c, d). In contrast, 51 TaHDZ genes were down-regulated under drought stress in roots, including 12 genes from subfamily I, 8 genes from subfamily II, 13 genes from subfamily III, and 18 genes from subfamily IV (Fig. 6a, c, d). These results indicate that most TaHDZ genes in subfamilies I and II may play important roles in the response to drought stress.

TaHDZ5-6A confers drought tolerance in Arabidopsis
The phylogenetic analysis and gene expression profiles suggest that TaHDZ5-6A/D may participate in regulating the drought stress response in wheat. Protein sequence analysis revealed that TaHDZ5-6A and TaHDZ5-6D share 95% sequence similarity (Additional file 9: Figure S7). In order to further comfirm the potential role of TaHDZ5 in the drought stress response, we performed quantitative real-time PCR (qRT-PCR) using RNA isolated from different tissues and drought conditions. The PCR primers were designed to amplify the homologous alleles of TaHDZ5. The results showed that TaHDZ5 is expressed at higher levels in the seedling leaves, flag leaves and young spikes, with the highest expression detected in the seedling leaves, and TaHDZ5 was upregulated throughout the testing period by drought stress (Additional file 10: Figure S8). To further investigate the role of TaHDZ5 in the drought stress response, we generated 35S::TaHDZ5-6A transgenic Arabidopsis lines. Three independent transgenic lines (OE1, OE2, and OE3) were chosen for analysis based on their TaHDZ5-6A expression levels (Fig. 7a). WT and 35S::TaHDZ5-6A transgenic plants were grown for 3 weeks in soil before water was withheld for 14 d. There was no obviously phenotypic differences between 35S:: TaHDZ5-6A transgenic and WT plants under normal conditions (Fig. 7c). After the drought treatment and six days of rewatering, 72-88% of the 35S::TaHDZ5-6A plants had survived, whereas only~8% of the WT plants were alive (Fig. 7b, c). Thus, the ectopic of TaHDZ5-6A greatly improved drought tolerance in transgenic Arabidopsis.
The stomatal apertures of leaves from 35S::TaHDZ5-6A and WT plants grown in soil were measured. The stomatal aperture indices of of the OE1, OE2, and OE3 plants were 0.41, 0.42 and 0.41, respectively, while that of the WT plants was 0.40, when grown under normal conditions (Fig. 7d, e). After being subjected to 10 d of  Statistical analysis of survival rates after the drought-stress treatment. The average survival rates and standard errors were calculated based on data obtained from three independent experiments. Significant differences were determined by a t-test. *P < 0.05, **P < 0.01. c Drought tolerance of 35S:TaHDZ5-6A transgenic Arabidopsis. Photographs were taken before and after the drought treatment, and followed by a six-day period of rewatering. D Stomatal aperture of WT and 35S::TaHDZ5-6A transgenic plants under normal and drought conditions. e Statistical analysis of stomatal aperture of WT and 35S::TaHDZ5-6A transgenic plants. Values are mean ratios of width to length. Error bars represent standard errors of three independent experiments (n = 60). Bars, 10 μm. f Water loss from detached rosettes of WT and 35S::TaHDZ5-6A transgenic plants. Water loss was expressed as the percentage of initial fresh weight. Values are means from eight plants for each of three independent experiments. Significant differences were determined by a t-test. *P < 0.05, **P < 0.01. g Free proline content of WT and 35S::TaHDZ5-6A transgenic plants under normal and drought stress treatment drought stress, the stomatal aperture indices of the OE1, OE2, and OE3 plants decreased to 0.22, 0.18, and 0.22, respectively, significantly reduced as compared to that of the WT (Fig. 7d, e). Consistent with these results, the water loss in detached leaves of 35S::TaHDZ5-6A transgenic plants was much more slowly than those of WT plants under dehydration (Fig. 7f). These results indicate that the 35S::TaHDZ5-6A transgenic plants removed water from the soil more slowly than did the WT plants, reducing the rate of wilting. To explore whether TaHDZ5-6A ectopic expression influences proline accumulation, we compared the free proline contents in 35S:: TaHDZ5-6A transgenic and WT plants. Consistent with the drought tolerance phenotype, the proline contents were much higher in transgenic plants than those of the WT plants under drought conditions (Fig. 7g). These findings collectively indicate that TaHDZ5-6A can enhance drought tolerance in transgenic Arabidopsis.
Global gene expression changes in 35S::TaHDZ5-6A transgenic Arabidopsis RNA sequencing allowed us to understand how drought tolerance was conferred by the ectopic of TaHDZ5-6A. The transcriptome of the 35S::TaHDZ5-6A transgenic plants was compared to that of WT plants under normal, non-stress conditions. In transgenic plants, a total of 495 and 111 genes were upregulated and downregulated by at least 2-fold (P < 0.001, FDR < 0.05) as compared with the WT (Fig. 8a, b; Additional file 11: Table  S3). The upregulated genes included genes related to water deprivation, abscisic acid, hormones, and abiotic stimuli, and downregulated pathways included those responsive to auxin stimuli, oxidative stress, and defense responses (Fig. 8c). We then chose 10 genes upregulated in transgenic plants and known to be involved in response to drought: DREB2A [39], RD29A [40], RD29B [40], RD26 [41], RD17 [42], PP2CA [43], RAB18 [42], ANAC019 [44], NCED3 [45], and RD20 [46]. We used qRT-PCR to measure their relative expression levels under normal and drought conditions in transgenic and WT plants (Fig. 8d). The results of qRT-PCR were in alignment with those of RNA-seq, indicating that TaHDZ5-6A may positively regulate the transcription of these 10 genes, and thereby play a role in the response, including rapid stomatal closure and reduction of water loss, of transgenic Arabidopsis plants under drought conditions.

Discussion
Benefit from the whole genome sequencing and the genomic databases, we could explore the gene families in much greater detail, especially in complex genomes, such as wheat [47]. The identification of wheat HD-Zip genes is an essential step towards the further functional characterization of these genes. Although the HD-Zip gene family has been widely studied in both monocots and dicots, their functions remain obscure in wheat. Previous studies have reported the identification of a few individual HD-Zip gene families in wheat, including the TaHDZipI-1 to TaHDZipI-5 genes [10,11,34]. However, a systematic identification and characterization of wheat HD-Zip genes has not been performed until the present. To address this knowledge gap, we performed a comprehensive identification and analysis of wheat HD-Zip genes in this study.
We identified 113 putative HD-Zip genes in wheat based on the Chinese Spring IWGSC RefSeq v1.1 reference genome (https://wheat-urgi.versailles. inra.fr/) ( Table 1). The number of HD-Zip genes is twice than that found in Arabidopsis, rice, maize, or poplar [3,4,37], because wheat is an allohexaploid crop. The further sequence alignment clearly showed the higher sequence divergence of 113 wheat HD-Zip proteins, especially the sequences at the C-terminus, indicating that HD-Zip genes play the diverse roles in wheat growth and development [2,48]. The wheat HD-Zip family members can be further divided into four well-conserved subfamilies (I-IV) based on their phylogenic relationships ( Fig. 1;  Fig. 2), gene structures (Fig. 2), and motif arrangements (Additional file 2: Figure S2). Our results are consistent with previous studies [3,4,37]. The HD-Zip III was the smallest subfamily in our study ( Fig. 1; Fig. 2), which is consistent with the earlier reports that HD-Zip III is the most conserved subfamily with little members among various species [3,37]. Also, the number of HD-Zip II and IV subfamily members vary in different species, which is the main reason that there are different numbers of HD-Zip family genes in various species. The gene structure analysis revealed that genes in each HD-Zip subfamily have similar exon-intron structures with respect to numbers and positions (Fig. 3b). However, the HD-Zip II and HD-Zip IV subfamilies were found to be more divergent (Fig. 3c), indicating that these genes may have different functions in wheat development. In addition, wheat HD-Zip proteins contain specific conserved domains in each subfamily (Additional file 2: Figure S2). The HD and LZ domains, which have been reported to be responsible for protein-DNA and for protein-protein interactions, are conserved in all HD-Zip proteins [36]. Except for subfamily-biased conserved motifs, the HD-Zip proteins target different sequences; for example, the HD-Zip I proteins target the CAAT(A/T) ATTG sequence, HD-Zip II proteins interact with the CAAT(C/G) ATTG sequence, and HD-Zip III and IV proteins recognize GTAAT(G/C) ATTAC and TAAA TG(C/T) A, respectively [10,36].
Recently, there have been efforts to understand the functions of HD-Zip genes in Arabidopsis. To further elucidate the potential functions of wheat HD-Zip genes, the orthologous relationships between wheat and Arabidopsis proteins have been examined in depth. Subfamily I is divided into seven subclasses or clades, i.e., α, β, γ, δ, ε, φ, and ζ (Fig. 2). Clade α includes Arabidopsis ATHB13, a positive regulator of drought, salinity, and cold stresses [49,50], that is a ortholog of the wheat TaHDZ3-4A/B/D genes. Expression of the β-clade members ATHB5 and ATHB6 is also affected by water deficit, and both genes appear to regulate growth in response to ABA and/or drought treatment [51,52], but these genes have no orthologs in wheat. γ-Clade members are typically induced by abiotic stresses, and include Arabidopsis ATHB7 and ATHB12 [7], which are the orthologs of the wheat TaHDZ5-6A/B/D, TaHDZ7-2A/B/D, and TaHDZ 8-6A/B/D genes. Furthermore, the δ-clade genes ATH B21, ATHB40, and ATHB53 are induced by ABA treatment and salt stress; these three TFs are involved in controlling axillary bud development [53], and are the orthologs of the wheat TaHDZ9-4A/B/D, TaHDZ10-2B/ D, and TaHDZ11-2A/B/D genes. The HD-Zip II subfamily is divided into ten subclasses, from α to κ (Fig. 2). Clade γ consists of ATHB2/HAT4 and HAT2, genes that regulate auxin-mediated morphogenesis in Arabidopsis [13,14] and that are the orthologs to wheat TaHDZ21-2A/B/D. The HD-Zip III subfamily is classified into three subclades, designated α, β, and γ (Fig. 2). Clade α corresponds to the REV clade described in previous studies [54], which are orthologs of the wheat TaHDZ25-1A/B/D and TaHDZ26-4B/D genes. Clade β includes Arabidopsis ATHB8 [17] and ATHB15/CNA [19], which are the orthologs of wheat TaHDZ24-3A/B/ D, and clade γ contains PHB and PHV [55,56], which are the orthologs of wheat TaHDZ27-5A/B/D and TaHDZ28-5A/B/D. The HD-Zip IV subfamily also consists of six subclades, designated α, β, γ, δ, ε, and ζ (Fig.  2). Clade α contains Arabidopsis ANL2, a regulator of anthocyanin accumulation in the leaf sub-epidermal layer and of cell identity in the root [57]; ANL2 is orthologous to the wheat TaHDZ36-6A/B/D, TaHDZ37-2A/ B/D, and TaHDZ38-5A/B/D genes. Clade β includes Arabidopsis GL2 [22], which is the ortholog of wheat TaHDZ27-5A/B/D and TaHDZ32-3A/B/D. Clade γ contains trichome formation genes, HDG4, HDG5, and HDG8-12 [58], which are the orthologs of wheat TaHDZ33-6A/B/D, TaHDZ34-7A/B/D, and TaHDZ35-1A/B/D. Clade ε is composed of AtML1 and PDF2 that are responsible for shoot epidermal cell differentiation [21], and are the orthologs of the wheat TaHDZ39-7A/B/ D genes. These results will help us to further understand the function of wheat HD-Zip genes, especially those that are orthologous with Arabidopsis HD-Zip genes.
To better understand the roles of the wheat HD-Zip genes during the life cycle of wheat, we performed an expression analysis of publicly-available RNA-seq data in 10 organs/tissues at different developmental stages. Genes in the HD-Zip family have been reported to be involved in the development of different organs, and expression HD-Zip genes varies widely in different organs (Fig. 4); for example, Genes of the HD-Zip I subfamily play an important role in the development of flowers and leaves, and have been found to control the development of cotyledon, spike, and leaves [8,10,59,60]. We found that the HD-Zip I genes were primarily found in seedling leaves, flag leaves, and young spikes (Additional file 4: Figure S3). We also found that most of the HS-Zip II genes displayed elevated levels of expression in both leaves and young spikes (Additional file 5: Figure S4), while previous research had found that HD-Zip II genes were involved in carpel margin, flower growth [15,61] and leaf polarity [62]. We also observed that HD-Zip III genes were found primarily in the leaves and stems of seedlings (Additional file 5: Figure S5), and might be involved in organ polarity, vascular development, and meristem function [54,63]. Prior research has found that HD-Zip IV genes serve a role in the development of grain, trichome, and anther [23,64], because most of them show higher expression levels in seedling stems, young spikes, and during grain development (Additional file 7: Figure S6). These results suggest the TaHDZ genes may play a variety of roles in wheat development.
The manner in which HD-Zip genes respond to stress strongly indicates that they are involved in adapting to dynamic conditions in their environment. The expression of the HD-Zip family I and II genes is activated or inhibited by drought conditions, which is similar to other plants ( Fig. 5; Fig. 6). Our qRT-PCR analyses further revealed that a noval HD-Zip I gene, TaHDZ5 is highly expressed in seedling leaves and is induced by drought stress (Additional file 10: Figure S8). To investigate the role of TaHDZ5 in the abiotic stress response, We transformed the homologous gene TaHDZ5-6A into Arabidopsis, confirming the expression of TaHDZ5-6A via qRT-PCR (Fig.  7a). Compared to the WT plants, the transgenic lines were significantly more resistant to drought conditions. Environmental stressors can induce physiological changes in plants, which can be measured and used to analyze certain crops' resistance to abiotic stressors. We analyzed the transpiration rate of the detached leaves, and found that the rate of water loss was lower in the transgenic plants than it was in the WT plants (Fig. 7f). We also found that under the stress of drought, the stomata closed at a quicker rate in the transgenic plants than it did in the WT plants (Fig. 7d, e). In addition, the proline content was higher in the transgenic plants compared to WT plants under drought conditions (Fig. 7g). We also found that constitutive expression of TaHDZ5-6A in Arabidopsis significantly increased the transcription of many stressresponsive genes, including RD29A, RD29B, RAB18, DREB2A, NCED3, and RD17 (Fig. 8). These data provide strong evidence that TaHDZ5-6A can enhance drought tolerance in the transgenic arabidopsis plants. Previous studies have reported that overexpression of TF genes may cause the growth retardation in transgenic plants [65][66][67], restricting the applicability of target genes in transgenic breeding. However, in our study, there are no obvious adverse effects were observed of 35S::TaHDZ5-6A transgenic plants (Fig. 7a), indicating the potential for using TaHDZ5-6A in plant breeding.

Conclusions
In conclusion, we performed a comprehensive analysis of the genome organization, evolutionary relationships, and expression profiles of the HD-Zip gene family members in wheat and also functionally characterized TaHDZ5-6A by showing that it confers drought tolerance in transgenic Arabidopsis. The present study has built a foundation and provides an essential framework for the further functional characterization of wheat HD-Zip genes in various physiological processes, including their role and the underlying molecular mechanism in the regulation of drought tolerance in wheat.

Plant materials and drought stress treatments
Wheat (Triticum aestivum) variety Chinese spring (CS) was identified and obtained from the Prof. Zhensheng Kang's Lab (Northwest A&F University, China) and was used to analysis the expression of TaHDZ genes, this wheat variety can also obtained from Chinese Crop Germplasm Resources Information System (http://www.cgris. net/ zhongzhidinggou/index.php). After surface-sterilized with 75% ethanol and washed with deionized water, the seeds were placed on wet filter paper to germinated at 25°C for 3 days. The germinated seeds were placed in a nutrient solution (0. For drought treatment, the three-leaf stage seedlings were placed on a clean bench and subjected to dehydration (25°C, relative humidity of 40-60%). Leaves and roots from three seedlings were collected after 0, 1, 3, 6, 12 and 24 h for drought treatment.
To investigate the tissue-specific expression patterns of TaHDZ5-6A in wheat, field grown wheat cv. Chinese spring were used. Wheat plants were grown during the growing season at the experimental station of the Northwest A & F University, Yangling, Shanxi, China (longitude 108°E, latitude 34°15′N) from 2016 to 2017. Ten tissue/ organ samples including root, stem, leaf of wheat seedling at five-leaf stage, young spike at early booting stage, spike at heading stage, flag leaf at heading stage, and the grain of 5, 10, 15 and 20 DPA. Each sample was collected from at least five individual plants for two repeats. The aforementioned samples were frozen quickly in liquid nitrogen and placed at − 80°C for further RNA extraction.

Identification and annotation of HD-Zip genes in wheat
The HD-Zip domain (PF00046) was downloaded from Pfam (http://pfam.xfam.org/) and used as a query. The HD-Zip genes were identified from the Chinese Spring IWGSC RefSeq v1.0 reference genome assembly (https:// wheat-urgi.versailles.inra.fr/) using HMMER3.1 [68](Evalue <1e-10). After remove all redundant sequences using CD-hit program, the rest of protein sequences were further subjected to identify the HD and LZ domains using the Simple Modular Architecture Research Tool (SMART; http://smart.embl-heidelberg.de/smart/ set_mode.cgi? NORMAL = 1). We further filtered these genes through phylogenetic analysis along with previously identified HD-Zip proteins from Arabidopsis thaliana, Vitis vinifera, Populus trichocarpa, Brachypodium distachyon, Oryza sativa, Zea mays, and Physcomitrella patens [37]. Phylogenetic analysis was also implemented to categorize different HD-Zip subfamilies. Homeologous genes from each of the three wheat subgenomes (A, B, and D genomes) were named TaHDZX_ZA, TaHDZX_ZB, or TaHDZX_ZD, where X denotes the gene number and Z the wheat chromosome where it is located. The theoretical pI (isoelectric point) and Mw (molecular weight) of each putative wheat HD-Zip protein was calculated using compute pI/Mw tool online (http://web.expasy. org/compute_pi/).

Phylogenetic analysis and conserved protein motif/ domain identification
Multiple sequence alignments were generated using the ClustalW program with the default settings [69]. To investigate the evolutionary relationship among HD-Zip proteins, an unrooted phylogenetic tree was obtained by neighborjoining (NJ) method using MEGA6.0 software based on the full-length of HD-Zip protein sequences [70]. The bootstrap probability of each branch was estimated with 10,000 replications to obtain confidence support.

Gene expression analysis by RNA-seq data
To study the expression of TaHDZ genes in different tissues, RNA-seq data from ten tissues, including root, stem, leaf of wheat seedling at five-leaf stage, young spike at early booting stage, spike at heading stage, flag leaf at heading stage, and the grain of 5, 10, 15 and 20 DPA were collected from database (http://genedenovoweb.ticp.net: 81/Wheat_GDR1246/index.php?m = index&f = index). For further analysis the expression of TaHDZ genes in response to drought stress, we harvested the leaves and roots from three-week-old wheat seedlings subjicted to drought treatment at 0, 1, 3, 6, and 12 h to conduct the RNA-seq analysis. Wheat plantation and sampling was mentioned above. TopHat and Cufflinks were used to analyze the genes' expression based on the RNA-seq data [71,72]. The FPKM value (fragments per kilobase of transcript per million fragments mapped) was calculated for each TaHDZ gene, the log10-transformed (FPKM + 1) values of the 113 TaHDZ genes were used for heat map generation.

RNA extraction and quantitative real-time PCR
Total RNA was isolated and purified using Total RNA Rapid Extraction Kit for Polysaccharides Polyphenol Plant (BioTeke) according to the manufacturer's directions. To eliminate genomic DNA contamination, the purified RNA was dosed with RNase-free DNase I (TaKaRa, China). One μg of the total RNA was used to synthesize firststrand cDNA via Recombinant M-MLV reverse transcriptase (Promega, USA). We then performed quantitative real time-PCR (qRT-PCR) using an ABI7300 Thermo-cycler (Applied Biosystems, USA) in optical 96well plates. The reactions were performed in a 10 μl solution (200 nM gene-specific primers, 1 μl diluted cDNA, and 5 μl SYBR Premix Ex Taq II (TaKaRa)) under the following conditions: 10 min at 95°C, then 40 cycles of 15 s at 95°C and 30 s at 60°C. We used a melting curve analysis to verify the specificity of the amplicon for each primer pair. The wheat Actin (Gene ID: 542814) was used as the internal control to detect the expression of TaHDZ5 in wheat, and Arabidopsis Actin2 (AT3G18780) was used to analyze the expression of stress-responsive genes in Arabidopsis. The relative gene expression levels were calculated according to the 2 −ΔΔCt method [73], with the variation in expression being estimated from three biological replicates. The primer pairs used for qRT-PCR analysis are listed in Additional file 12: Table S4.

TaHDZ5-6A isolation and Arabidopsis transformation
Arabidopsis ecotype Columbia was obtained from the Prof. Zhensheng Kang's Lab (Northwest A&F University, China) and was used to transform the TaHDZ5-6A. We amplified the full-length opening reading frame of TaHDZ5-6A with gene specific primers from wheat cDNA (forward: 5′-ATGGAGCCCGGCCGGCTCAT-3′; reverse: 5′ -CTAGTTCCACATCCAGTAGCTGATC-3′), after which we cloned into the pGreen vector [74] via the cauliflower mosaic virus (CaMV) 35S promoter. We then introduced the recombinant vector (35S:: TaHDZ5-6A) into Agrobacterium tumefaciens, creating the ecotype Columbia via the floral dip method. We plated T 1 seeds on an MS medium (2% sucrose, 50 mg/ mL kanamycin) to select the transformants, and used homozygous T 3 plants to analyze the phenotypes.

Drought tolerance assay
We transferred seven-day-old 35S::TaHDZ5-6A plants that were germinated on the MS medium to pots with a 230 g, 2:1 solution of Jiffy mix and vermiculite to perform the drought tolerance assays. We exposed 30 two-day-old plants that had been growing under ideal conditions (relative humidity 60%, and 16/8 h light/dark photoperiod, 22°C) to drought conditions, withholding water from the plants for 14 days. We then resumed watering to allow for recovery, observing the number of plants that survived after six days. We compared at least 64 plants in each line with the wild-type (WT) plants in each experiment. The statistical data was obtained from three independent experiments, while a student's t-test was used to analyze the differences between transgenic and wild-type plants.

Water loss measurement
We measured the rates of water loss in eight plants of the 35S::TaHDZ5-6A transgenic and eight wild-type plants. We detached soil-grown plants that were three weeks old from the roots, and weighed them immediately (fresh weight, FW). The plants were placed on a stable surface (relative humidity 40-45%, 22-24°C) and weighed at predetermined intervals (desiccated weights), after which the proportions of weight loss were calculated relative to the starting weights. The plants were then dried in an oven at 80°C for 24 h to a constant dry weight (DW). The water loss was considered to be the percentage of the starting weight at each predetermined time point. Each line had three replicates performed, and a student's t-test was used to analyze the differences between transgenic plants and wild-type plants.

Stomatal aperture analysis
Stomatal apertures were measured according to previously described procedures [75]. Samples of similarly sized and aged leaves were obtained from 35S::TaHDZ5-6A and WT plants that were subjected to drought conditions for 10 d. The rosette leaves were placed in a solution (10 mM Mes-Tris, 30 mM KCl, pH 6.15) and exposed to light for 3 h. The stomata on the epidermal strips obtained from rosette leaves was examined using a light microscope (Olympus ix71, Tokyo, Japan). The Image J software (http://rsbweb.nih.gov/ij) was used to determine the length and width of the stomatal pores, which was subsequently used to calculate the stomatal apertures, or the ratio of width to length.

Proline content measurement
After exposure to drought conditions, the Arabidopsis leaves were collected at predetermined times to assess the levels of free proline. We obtained leaves of similar size and location from WT plants and 35S::TaHDZ5-6A to keep the samples uniform, and levels of free proline were analyzed according to previous procedures [76]. The samples (~0.1 g) were then homogenized in 3% sulfosalicylic acid and boiled for 10 min. Following the reaction between acid ninhydrin and proline, we measured the absorbance of the sample solutions at 520 nm using a UV-Vis spectrophotometer (NanoDrop 2000c, Thermo Scientific, Wilmington, DE, USA).