Response of phytohormone mediated plant homeodomain (PHD) family to abiotic stress in upland cotton (Gossypium hirsutum spp.)

Background The sequencing and annotations of cotton genomes provide powerful theoretical support to unravel more physiological and functional information. Plant homeodomain (PHD) protein family has been reported to be involved in regulating various biological processes in plants. However, their functional studies have not yet been carried out in cotton. Results In this study, 108, 55, and 52 PHD genes were identified in G. hirsutum, G. raimondii, and G. arboreum, respectively. A total of 297 PHD genes from three cotton species, Arabidopsis, and rice were divided into five groups. We performed chromosomal location, phylogenetic relationship, gene structure, and conserved domain analysis for GhPHD genes. GhPHD genes were unevenly distributed on each chromosome. However, more GhPHD genes were distributed on At_05, Dt_05, and At_07 chromosomes. GhPHD proteins depicted conserved domains, and GhPHD genes exhibiting similar gene structure were clustered together. Further, whole genome duplication (WGD) analysis indicated that purification selection greatly contributed to the functional maintenance of GhPHD gene family. Expression pattern analysis based on RNA-seq data showed that most GhPHD genes showed clear tissue-specific spatiotemporal expression patterns elucidating the multiple functions of GhPHDs in plant growth and development. Moreover, analysis of cis-acting elements revealed that GhPHDs may respond to a variety of abiotic and phytohormonal stresses. In this regard, some GhPHD genes showed good response against abiotic and phytohormonal stresses. Additionally, co-expression network analysis indicated that GhPHDs are essential for plant growth and development, while GhPHD genes response against abiotic and phytohormonal stresses may help to improve plant tolerance in adverse environmental conditions. Conclusion This study will provide useful information to facilitate further research related to the vital roles of GhPHD gene family in plant growth and development.


Background
Plants often face various abiotic and biotic stress conditions. Abiotic stresses include heat, cold, drought, and salinity, whereas biotic stresses mainly come from bacteria, fungi, viruses, and insects. These abiotic and biotic stresses significantly reduce crop quality and productivity world-wide [1,2]. In order to adapt such unfavorable environment, plants have established a comprehensive mechanism to combat stress signals and mitigate their effects on plant growth and development [3]. Phytohormones play significant roles in regulating developmental processes and signal transduction networks, which respond to various abiotic stresses. Brassinosteroid (BR), jasmonate (JA), gibberellin (GA), salicylic acid (SA), auxin, and abscisic acid (ABA) regulate plant growth, development, stress, and defense responses [4][5][6][7][8][9][10][11], but how phytohormones mediate the growth and stress trade-off is unclear.
Zinc finger protein motifs are part of many protein families and widely distributed in eukaryotic organisms. The term "zinc finger" represents the sequence motif in which cysteines and/or histidines coordinate the zinc atom(s) to form the local peptide structure that are required for their specific functions. The "finger" structural motif has been divided into different types, such as TFIIIA-type zinc finger (EPF1, SUPERMAN) [12,13], WRKY family (WRKY1, 2, and 3), GATA1-type protein (NTL1) [14,15], Dof family (Dof1) [16,17], RING-finger type (COP1) [18], PHD-finger family (AtHAT3.1 and ZmHOX1a) [19,20], LIM family (SF3) [21,22], and other uncategorized types. Plant homeodomain (PHD) zinc fingers are small reader domains found in several chromatin-binding proteins. In plants, PHD proteins are usually zinc finger proteins with one or more PHD domains, which have a Cys4-His-Cys3 zinc-binding motif consisting of about 60 amino acids [23]. It is worth noting that the number of amino acids between cysteine and histidine or between cysteine residues in the PHD domain are conserved, while second amino acid (before the penultimate cysteine residue) is usually an aromatic amino acid, such as tryptophan [24].
Since the discovery of the first PHD protein HAT3.1 (Histone acetyltransferase 3.1) in Arabidopsis, more PHD proteins have been identified to participate in many physiological and biochemical processes involved in the structure and transcription of chromatin [25]. In Arabidopsis, PHD protein MMD1 (Male meiocyte death 1)/DUET is specifically expressed in male meiocytes and involved in regulating gene expression during meiosis, mutations of mmd1 gene leads to the death of male meiotic cells [26][27][28]. Epigenetic regulation in eukaryotes is performed through complex signal interactions between chromatin markers and small RNA species. AtVIM1 (Variant in methylation 1) functions in DNA methylation-histone interface to maintain the centromeric heterochromation in Arabidopsis [29]. In addition, PHD proteins are involved in regulating plant response to abiotic stresses and altering plant growth and development [30,31]. In soybean, six Alfin1-type PHD proteins were identified to respond against salt, cold, drought, and ABA treatment. For instance, GmPHD2 improve salt tolerance in transgenic Arabidopsis plants compared with the wild type plants [32]. In Arabidopsis, AtVIN3 (Vernalization insensitive 3) protein binds to modified histone in vitro to change the binding specificity of PHD-finger domain and accelerate the vernalization reaction in vivo [33]. During seed germination, the AL PHD-PRC1 complex affect seed developmental genes from the active state associated with H3K4me3 to the repressive transcriptional state associated with H3K27me3, thereby promote seed germination [34]. PHD protein GSR1 (Germostatin resistance locus 1) is a member of auxin-mediated genetic network for seed germination and form a corepressor with ARF16 (Auxin response factor 16) to regulate seed germination [35]. Therefore, PHD proteins play irreplaceable roles in the biological processes of life.
At present, the PHD protein family has been studied in several plants, such as Arabidopsis thaliana, poplar (Populus trichocarpa) [36], maize (Zea mays) [30], moso bamboo (Phyllostachys edulis) [37], carrot (Daucus carota L.) [38], potato (Solanum tuberosum) [39], and pear (Pyrus bretschneideri) [40]. However, comprehensive identification and characterization of cotton PHD protein family has not been carried out till date. Upland cotton (Gossypium hirsutum) is the most important natural fiber crop in the world. Recently, the availability of the complete genome sequence and annotations of G. hirsutum [41], G. arboreum [42], and G. raimondii [43] provided an excellent opportunity to identify and characterize PHD transcription factors in cotton. In this study, we performed the whole genome-wide analysis, tissue expression pattern analysis, relative expression level analysis under different stresses and phytohormones treatment, and co-expression network analysis of GhPHD genes in upland cotton. Our results indicated that GhPHD genes are involved in various processes of plant growth and development, and phytohormones mediate responses of GhPHD genes against abiotic stresses.

Genome-wide identification of PHD proteins in cotton
Based on the homology of protein sequences, 108, 52, and 55 PHD proteins were identified in three cotton species G. hirsutum, G. arboreum, and G. raimondii, respectively. In addition, 39 and 43 PHD proteins were identified in Arabidopsis and rice, respectively (Table  S1). Among 108 GhPHD proteins, 56 members belong to the At subgenome and 52 members belong to the Dt subgenome. The predicted biophysical characteristic of GhPHDs (Table 1) indicates that the length of GhPHD proteins ranges from 159 aa (GhPHD28) to 2231 aa (GhPHD39) with an average length of 741 aa. Moreover, the molecular weight of GhPHD proteins ranges from 17.76 kD (GhPHD28) to 247.42 kD (GhPHD39) with an average value of 93.09 kD. The isoelectric point (pI) of GhPHD proteins ranges from 4.58 (GhPHD38) to 10.41 (GhPHD103) with an average value of 6.89. Furthermore, the predicted subcellular localization indicated that 93 GhPHD proteins are located in nucleus, ten in cytoplasm, and five are extracellular.

Phylogenetic analysis, chromosomal location, and gene duplication
In order to understand the phylogenetic relationship of PHD proteins in rice, Arabidopsis, and cotton, we constructed a NJ phylogenetic tree and classified PHD proteins into five groups (A-E) (Fig. 1). Among them, most of the orthologous PHD proteins between the diploid and allotetraploid cotton are grouped in same clade exhibiting maximum homology in phylogenetic relationship. Each group contains PHD proteins of these five species, of which group A and D are the first and second largest groups, containing 97 and 79 members, respectively. While, there are relatively few PHD members in groups B, C, and E. Chromosome location analysis showed that 108 GhPHD genes are positioned on 26 chromosomes, including 13 chromosomes from the At subgenome and 13 chromosomes from the Dt subgenome ( Fig. S1 and Table S2). Deeper insights indicated that At_05, At_07, and Dt_05 chromosomes contain more number of genes (eight GhPHD genes on each) and display a dense distribution at the top. However, some chromosomes contain only two GhPHD genes, such as At_10, At_11, Dt_03, and Dt_11.
We further investigated the whole genome duplication (WGD) event experienced by GhPHD genes. As a result, 73 GhPHD gene pairs depict segmental duplication and four gene pairs show tandem duplication events ( Table 2), indicating that WGD is the main contributor of GhPHD gene family expansion. Duplication gene pairs may have undergone three alternative fates during the evolution process, namely non-functionalization, neofunctionalization, and sub-functionalization [44]. In order to study the evolutionary history of GhPHD genes, the Ka/Ks calculator 2.0 is used to calculate the synonymous and non-synonymous substitution rates. The Ka/Ks ratio of 76 duplicated gene pairs is less than 1, indicating that GhPHD genes underwent purification selection pressure with limited functional divergence. However, there is only one gene pair with the Ka/Ks greater than 1, indicating the occurrence of positive selection pressure. Collectively, these results indicated that the great contribution of purification selection pressure in the functional maintenance of GhPHD genes in upland cotton.

Gene structure and conserved motifs analysis
To better understand the similarity and diversity of GhPHD proteins in upland cotton, we analyzed the phylogenetic tree, exon-intron structure, and conserved motif. Phylogenetic tree grouped GhPHD proteins according to protein homology, conserved gene structure, and motif distribution (Fig. 2). GhPHD49 shows the longest genomic sequence with 26 exons, while GhPHD12 displays the shortest genomic sequence with only two exons ( Fig. 2 and Table S3). Furthermore, a total of three motifs are identified in all GhPHD proteins, and all GhPHD proteins have a typical PHD domain (i.e., motif 1). Phylogenetic tree showed that 21 GhPHD proteins are clustered in a clade. Except for GhPHD28, all other GhPHD proteins contain three motifs with similar gene structure and motif distribution (Fig. 2).
Protein sequence alignment shows that GhPHD proteins have a typical Cys4-His-Cys3 motif, which consists of about 60 amino acids and is accompanied by nine conserved amino acid residues (Fig. S2). The conserved histidine (H) is separated from the fourth conserved cysteine (C) by four amino acids and two amino acids from subsequent conserved cysteine (C) residue. The third and fourth conserved cysteine (C) before histidine (H) are separated by one or two amino acids, but the interval number between other conserved amino acids is uncertain. However, GhPHD17, GhPHD27, GhPHD71, and GhPHD81 exhibit maximum homology, but show less conserved PHD domain ( Fig. 2 and Fig. S2).

Cis-acting element analysis
Many studies have showed that PHD genes are involved in various stress responses [30,31,37]. To elucidate the putative function of GhPHDs under different stresses, we first identified the cis-acting elements in the promoter region that respond to stresses and phytohormones. We identified many cis-acting elements that respond to ABA (ABRE), auxin (TGA and AuxRR-core), GA (TATC-box, P-box, CARE, and GARE), ethylene (ERE), SA (TCA), and MeJA (CGTCA). These results indicated that a total of 85 GhPHD genes are responsive to ethylene, followed by ABA, GA, and MeJA. 73 GhPHD genes have cis-acting elements that respond to three or more phytohormones. Interestingly, the promoters of GhPHD5, GhPHD47, GhPHD56, and GhPHD65 genes contain cis-elements that respond to the above six phytohormones. In addition, we found that many abiotic stresses response elements (TC-rich repeat, MBS, and LTR), circadian control elements, and light-responsive  3 and Table S5). These results indicated that GhPHD genes may participate in various signal transduction pathways, such as phytohormones, light response, and abiotic stresses, and play important roles in regulating plant growth and development.

Tissue-specific expression pattern of GhPHD genes
To predict the physiological functions of GhPHD genes in cotton growth and development, we used the online transcriptome data to analyze the tissue-specific expression profile of GhPHD genes in different tissues such as root, stem, leaf, petal, stamen, pistil, ovule, and fiber. According to the expression features and hierarchical clustering (Fig. 4), GhPHD genes are mainly clustered into four groups (A-D). The nine GhPHD genes in group A are highly expressed in all tissues, indicating that they may play important roles in plant growth and development. In particular, GhPHD23 and GhPHD77 show maximum expression levels in ovule and fiber tissues, demonstrating that these two genes may be involved in the development of ovule and fiber. Further, 43 GhPHDs in group B show lower expression levels in all tissues, while six GhPHD genes (GhPHD56, GhPHD108, GhPHD40, GhPHD93, GhPHD19, and GhPHD73) are predominantly expressed in the early stage of ovule development, indicating that they may play important roles in ovule and seed development. Moreover, GhPHD genes in group C show higher expression levels in ovule. However, GhPHD genes in group D show poor expression in all observed tissues. These results indicated that GhPHDs may be involved in regulating cotton growth and development, especially in the development of ovule and fiber.

Identification of stress-related PHD genes in upland cotton
Analysis of the transcriptome data showed that 66 GhPHD genes have higher expression levels under heat, cold, salt, and drought treatments (Fig. S3). In order to further estimate the responses of GhPHDs under abiotic stresses, we treated four-week-old cotton seedlings with heat, cold, salt, and drought, and observed the relative expression level of 12 GhPHD genes (Fig. 5). The relative expression level of GhPHD18 is up-regulated under all stresses, indicating that GhPHD18 may be involved in multiple stresses response mechanisms. GhPHD23 is upregulated only under heat treatment, indicating that GhPHD23 responds positively to heat stimuli. Further, GhPHD34, GhPHD40, and GhPHD43 are up-regulated after heat and salt treatment, while GhPHD80 and GhPHD88 are up-regulated after heat and drought tolerance at various time points. In addition, we found that GhPHD5 is up-regulated against salt and drought, while GhPHD72 and GhPHD107 are up-regulated against salt and heat, respectively. These results indicated that GhPHD genes may be involved in abiotic stress to improve plant tolerance in adverse environments.

Identification of GhPHD genes in response to phytohormones
To further determine whether GhPHD genes respond to phytohormones, we treated four-week-old cotton seedlings with GA, MeJA, IAA, SA, and BL, and identified changes in the relative expression of GhPHD genes (Fig. 6). The relative expression level of GhPHD5 increases significantly after MeJA, IAA, and BL treatment. While GhPHD5 shows higher expression after 0.5 h after SA treatment indicating that GhPHD5 may respond to multiple phytohormones signal transduction pathway, which is consistent with the fact that GhPHD5 promoter contains cis-acting elements related to multiple  Co-expression network with functional modules for G. hirsutum and G. arboreum Gene co-expression network analysis is a network diagram constructed on the basis of similarity of gene expression data, reflecting the relationship of expression regulation between genes [45]. We analyzed the coexpression network of GhPHD genes using ccNET software, and predicted many co-expressed genes and interaction proteins (Table S6). Among these, GhPHD5 is positively co-expressed with a plant-specific DNA ligase, which is related to seed germination and DNA repair. In addition, GhPHD5 is also positively co-expressed with SLOMO protein, which is a F-box protein required for auxin homeostasis and the normal timing of lateral organ initiation at the shoot meristem [46] illustrating that GhPHD5 may be involved in the regulation of auxin signal transduction pathway, and mediates seed germination and organ formation to regulate plant growth and development. Similarly, GhPHD18 interacts with highly hydrophilic proteins that regulate FLC (Flowering locus C) expression [47] and shows positively co-expressed with SHAGGY-related kinases involved in meristem organization, indicating that GhPHD18 may affect the flowering time of meristem. Further, GhPHD34 negatively co-expressed with ERF (Ethylene response factor) subfamily B-1, participating in ethylene signaling pathway and responding to abiotic stresses. GhPHD107 positively co-expressed with ARF-GAP and ERF genes, and may be involved in the signal pathways of auxin and ethylene. More interestingly, we predicted many proteins that interact with GhPHD88, such as leucine-rich repeat protein kinase (LRRK), late embryogenesis abundant (LEA) protein, AP2/B3 transcription factor, R2R3 factor, DREB subfamily A-2, cellulose synthase, gibberellinregulated family protein (GRP), and ethylene response factor (ERF) (Fig. 7a and Table S6), suggesting that GhPHD88 may be involved in many physiological processes such as plant growth and development, phytohormone signal transduction, and stress response. Further, Gene Ontology (GO) analysis of GhPHDs indicated that protein binding and zinc ion binding are the most abundant functional terms (Fig. 7b), which is consistent with the existing results that the cysteine residues exhibit high affinity for zinc ions (Zn 2+ ), and Zn 2+ -cysteine complexes are key medium for protein structure, catalysis, and regulation [48]. In summary, GhPHDs were involved in regulating cotton growth and development, especially ovule and fiber development. Further, GhPHDs not only respond to multiple phytohormones signal transduction pathways, but also improve cotton's tolerance to adverse environments such as heat, salt, and drought. Particularly, GhPHD5, GhPHD80, GhPHD88 are prominent in their responses. Combining the predicted results of co-expressed genes and interacting proteins, we inferred that phytohormones could improve plant tolerance to abiotic stresses through GhPHD genes and their cofactors, but their regulatory mechanism and interaction network still need further research.

Phylogenetic analysis and duplication
Phylogenetic tree was used to analyze the evolutionary relationship between PHD proteins in cotton, rice, and Arabidopsis. A total of 297 PHD proteins were divided into five groups (A-E). The relationship between cotton PHD proteins and AtPHD proteins was closer than that of OsPHD proteins, which is consistent with the evolutionary relationship between cotton, Arabidopsis, and rice. Although the G. arboreum genome is about twice that of the G. raimondii genome, however, more GrPHD proteins were identified than GaPHD proteins. Most PHD proteins from two diploids and one allotetraploid were closely distributed in phylogenetic tree, which is coherent with the fact that upland cotton evolved from the hybridization of A and D genomes [49].
We identified 108 GhPHD proteins in the G. hirsutum genome, which are more than previously identified PHD protein family members in Arabidopsis, maize, potato, and pear [30,39,40]. The main reason for the more number of GhPHDs is that upland cotton underwent polyploidization and promoted gene duplication. Upland cotton is an allotetraploid cotton produced by the hybridization between G. arboreum (A 2 genome) and G. raimondii (D 5 genome) [49]. The At and Dt subgenome donors of upland cotton are orthologous relatives and share the same number of ortholog genes, resulting in the duplication and doubling of GhPHD genes in upland cotton. Therefore, the sum total of GaPHD genes and GrPHD genes was approximately equal to the number of GhPHD genes. Previous studies have reported that gene duplication, including whole genome duplication, Fig. 5 The relative expression levels of 12 GhPHD genes under heat, cold, salt, and drought treatment. The relative expression levels were estimated by RT-qPCR. The error bars represent the standard deviations of three experiments segment duplication, tandem duplication, and transposition events was the main reason for gene family expansion [50,51]. In our study, a total of 77 duplicated gene pairs were identified in GhPHD family, including 73 segmental duplicated pairs and four tandem duplicated pairs ( Table 2). The Ka/Ks values of most GhPHD duplication gene pairs was less than 1, which indicated that GhPHD family experienced strong purification selection pressure. Purification selection dominated the expansion of GhPHD genes, eliminated deleterious loss-of-function mutations at both duplicated loci, increased fixation, and retained the function of the new duplicated genes [52].
Conserved amino acid residues, protein motifs, and gene structure analysis Conserved amino acid residues analysis showed that GhPHD domain was highly conserved during the process of evolution. The amino terminus of GhPHD domain contained the Cys4-His-Cys3 zinc finger motif composed of 50 to 80 amino acids with the regular arrangement of cysteine residues, an important medium for zinc ion binding and protein structure [48]. In addition, a total of three motifs were identified in GhPHD proteins and the motif distribution was relatively conservative, indicating that GhPHD proteins may play different physiological functions, and the subtle differences between GhPHD proteins in different clade may be related to cotton growth, development, and stress tolerance.
Gene structure may be determined by the insertion/deletion events and is an important parameter to predict gene evolution and new function generation [53]. Gene structure analysis indicated that the duplication genes showed similar gene structures with varied intron length indicating that the intron length may play major roles in the functional diversification of GhPHD genes. In this study, we found that the intron number varies from 1 to 25, but most GhPHD genes Fig. 7 Co-expression networks analysis of GhPHD88 and GO enrichment analysis of 108 GhPHDs. a Co-expression network analysis of GhPHD88 with functional modules for G. hirsutum and G. arboreum. Yellow and green colour indicates that query protein and interaction proteins, respectively. There are four interaction lines, red lines indicated ortholog gene pairs in G. hirsutum and G. arboreum; pink lines and blue lines indicate proteins own interaction and positive/negative co-expression relationship with target protein; orange lines indicate proteins own interaction and protein-protein relationship with target protein. b GO enrichment analysis of all GhPHD genes contained 2 to 11 introns which supported the previous research that cotton is a new evolution species that experience a decrease in the number of introns during the early stages of evolution [54].

GhPHD genes expression in tissues, abiotic and phytohormone stresses
Many studies demonstrated that PHD proteins are the main mediators of transcriptional regulation during plant developmental processes such as meiosis and postmeiotic events [55], germination [34], pollen maturation [56], flowering time [57], embryo meristem initiation, and root development [55,58]. Gene's expression profiles showed that GhPHDs may play important regulatory roles in cotton growth and development, especially during the development of ovule and fiber. In addition, we have also identified some GhPHD genes that respond to abiotic stress and phytohormones in upland cotton. The analysis of cis-acting elements and seedlings treatment experiments indicated that GhPHD genes may respond to abiotic stress and participate in the signal transduction of phytohormones.
For example, GhPHD genes (GhPHD5, GhPHD40, GhPHD43, GhPHD80, and GhPHD88) respond positively to heat, salt, and drought and they may be important genetic materials for improving plant tolerance under adverse environments.
Research reports indicated that phytohormones may regulate the response to abiotic stress in plants. Auxin response factors (ARFs) are a type of transcription factors that regulate the expression of auxin-responsive genes [59,60]. The significant up-regulation of the transcription level of ARFs under stress indicates that they are potential mediators for plants to respond to adverse environments [61,62]. Ethylene response factors belong to the ERF subfamily of the AP2/ARF transcription factor family, and are widely involved in plant development, phytohormones response, disease resistance, and adversity response [63,64]. In this study, co-expression network analysis indicated that GhPHD genes may improve plant tolerance to abiotic stresses by phytohormone signaling pathways. For instance, GhPHD5 may improve tolerance to heat, salt, and drought by regulating auxin homeostasis. Similarly, GhPHD34 and GhPHD107 may be involved in auxin and ethylene signal transduction pathways to improve heat tolerance and promote growth and development. GhPHD88 regulates the signal transduction of various phytohormones and abiotic stresses, and promotes growth and development. Although GhPHDs are indispensable in the course of life, the physiological functions of GhPHDs in crosstalk between abiotic stress and phytohormone need further study.

Conclusions
In this study, a total of 297 PHD proteins were identified in total five plant species including G. hirsutum, G. arboreum, G. raimondii, rice, and Arabidopsis. The PHD proteins were divided into five groups based on the phylogenetic analysis. Segmental duplication events were the main contributors toward the expansion of GhPHD gene family in upland cotton. Moreover, duplicated gene pairs of GhPHD gene family might have experienced functional divergence, since their expression patterns were different in different tissues. Tissues specific expression patterns indicated that GhPHDs are very important for growth and development, especially ovule and fiber development. The phytohormones and stresses treatment and co-expression network analysis showed that GhPHDs may improve the tolerance to adverse environments by phytohormones signal transduction pathway. Taken together, our study provides key basic knowledge to understand the functional mechanisms of cotton growth and development, as well as candidate genes for cotton breeding resistant to abiotic stresses and phytohormone stimulation.

Methods
Sequence retrieval, multiple sequence alignment, and phylogenetic analysis The genome sequence and information of cotton (G. hirsutum, G. raimondii, and G. arboreum) were acquired from the CottonFGD (https://cottonfgd.org/) [65]. HMMER (https://www.ebi.ac.uk/Tools/hmmer/) software with default parameters was used to search for the corresponding protein sequences, and used the conserved PHD domain sequence as a query. We used BLAST program to further identify PHD sequences based on homology. The conserved domain of PHD proteins was predicted by Pfam [66] and SMART [67] software. Multiple sequence alignment of PHD proteins were performed using Clustal X [68]. MEGA 6.0 [69] was used to construct phylogenetic trees, using the neighbor-joining (NJ) algorithm with default parameters and 1000 bootstrap replicates. The molecular weight (MW), isoelectric point (pI), and GRAVY value of GhPHD proteins were predicted using ExPASy [70], and the subcellular localization of GhPHD proteins was predicted by the CELLO v2.5 server [71].
Chromosomal location, gene structure, and conserved motif The positional information of GhPHD genes was obtained from the General Feature Format (GFF) file downloaded from the CottonFGD website [65]. GhPHDs were mapped on the chromosome using MapInspect (https://mapinspect.software.informer.com/). For the exon-intron structural analysis of GhPHD genes, the coding sequences were used to align their genomic DNA sequences and the structure diagram was drawn using the online Gene Structure Display Server (GSDS 2.0) program [72]. Conserved motifs of GhPHD proteins were investigated using the online toolkit Multiple Expectation maximization for Motif Elicitation (MEME 5.0.5) [73]. The optimized parameters of MEME are as follows: the number of repetitions, any; the maximum number of motifs, 50; and the optimum width of each motif, between 6 and 300 residues, and retaining only motifs associated with an E value < e − 5 . The identified protein motifs were further annotated with TBtools [74].

Identification of cis-acting elements and gene expression pattern
The 1500 bp promoter sequence before the transcription start site of GhPHD genes were downloaded from the CottonFGD website [65]. The cis-acting elements in the GhPHD promoter regions were predicted using the Plant Cis-Acting Regulatory Element website [75]. The tissue expression patterns of GhPHD genes were analyzed using the online cotton transcriptome data, and heatmap was drawn by TBtools [74]. The transcriptome data of root, stem, leaf, petal, stamen, pistil, ovule (− 3, − 1, 0, 1, 3, 5, 10, 20, 25, 35 DPA) and fiber (5,10,20,25 DPA) was used in this study. The ccNET software [76] was used to analyze the gene co-expression network relationship.

Plant material, abiotic stresses and phytohormones treatment
Upland cotton ZM24 is a short-season cotton variety selected by the Cotton Research Institute of Chinese Academy of Agricultural Sciences. Firstly, ZM24 seeds were pre-germinated in the conical flask filled with water at room temperature for 48 h. Pre-germinated seeds were then transferred to the liquid medium with a cultivation temperature of 30°C, a photoperiod of 16 h light and 8 h dark. Four-week-old cotton seedlings were treated with brassinolide (BL, 10 μM), gibberellin (GA, 100 μM), indole-3-acetic acid (IAA, 100 μM), salicylic acid (SA, 10 μM), and methyl jasmonate (MeJA, 10 μM) for 0.5, 1, 3, and 6 h. Similarly, four-week-old cotton seedlings were treated with heat (38°C), cold (4°C), NaCl (200 mM), and polyethylene glycol (PEG) (20% mass fraction) for 1, 2, 4, and 6 h. In the experiment, the untreated sample was used as the control group. The collected leaves were immediately frozen in liquid nitrogen and stored at − 80°C for RNA extraction and RT-qPCR analysis. For abiotic stresses and phytohormones treatment, a total of 20 cotton seedlings were used for each treatment and three biological replicates were performed for each experiment.

RNA extraction and RT-qPCR analysis
Total RNA of the collected cotton leaves was extracted using the RNAprep Pure Plant Kit (Polysaccharides & Polyphenolics-rich) (TianGen, Beijing, China). In order to synthesize the first-strand cDNA, the EasyScript Allin-One First-strand cDNA synthesis SuperMix for RT-qPCR kit (TransGen, Beijing, China) was used in accordance with the manufacturer's protocol and the cDNA was used as template for subsequent RT-qPCR reaction. RT-qPCR was performed using TransStart Top Green qPCR SuperMix (TransGen, Beijing, China) in LightCycler 480 (Roche, Basel, Switzerland). Each PCR reaction was performed in triplicate, and three biological replicates were quantified. GhHistone 3 (GenBank accession no. AF024716) was used as an internal control [77]. The relative expression level was calculated as described previously [78]. The primers used for RT-qPCR analysis were listed in Table S7. For statistical analysis, the RT-qPCR data was considered as normal distribution and we conducted a two-tailed Student's t-test in Microsoft Excel 2007.