Genome-wide Characterization and Functional Analysis of the Heat Shock Transcription Factor Family in Pumpkin

Background: Crop quality and yield are affected by abiotic and biotic stresses, and heat shock transcription factors (Hsfs) are considered to play important roles in regulating plant tolerance under various stresses. To investigate the response of pumpkin to abiotic stress, we analysed the genome of pumpkin. Results:In this research, a total of 36 Cucurbita moschata Hsf (CmHsf) members were identied and classied into three subfamilies (I, II, and III) according to their amino acid sequence identity. The Hsfs of the same subfamily usually exhibit a similar gene structure (intron-exon distribution) and conserved domains (DNA-binding and other functional domains). Chromosome localization analysis showed that the 36 CmHsfs were unevenly distributed on 18 of the 21 chromosomes (except for Cm_Chr00, Cm_Chr08 and Cm_Chr20), among which 18 genes formed 9 duplicated gene pairs that have undergone segmental duplication events. The Ka/Ks ratio showed that the duplicated CmHsfs have mainly experienced strong purifying selection. High-level synteny was observed between Cucurbita moschata and other Cucurbitaceae species. Conclusions: The expression prole of CmHsfs in the roots, stems, cotyledons and true leaves revealed that the CmHsfs exhibit tissue specicity. The analysis of cis-acting elements and quantitative real-time polymerase chain reaction (qRT-PCR) revealed that some key CmHsf genes were activated by cold stress, heat stress, hormones and salicylic acid. This study lays the foundation for revealing the role of CmHsfs in resistance to various stresses, which is of great signicance for the selection of stress-tolerant pumpkin varieties. pectin, other the gene structural characteristics, gene duplications, chromosomal locations, evolutionary divergence and phylogenetic relationships of 36 pumpkin Hsf genes. analyse the digital expression proles of these 36 genes in response to numerous This study emphasizes the function of the Hsf genes in various stress conditions and improves our understanding of the effects of polyploidization on the evolution of the Hsf family.

Identi cation of CmHsf genes in pumpkin and physical and chemical characteristics of CmHsfs A total of 36 CmHsf genes were identi ed after the removal of redundant sequences (Table 1), and they were designated CmHsf1 to CmHsf36 according to the starting positions of the 36 genes on the chromosomes (from Cmo_Chr00 to Cmo_Chr20, from top to bottom). The physicochemical parameters of each CmHsf were generated, and the open reading frames (ORFs) ranged from 543 bp (CmHsf32) to 4380 bp (CmHsf13), with predicted proteins of 179-1458 amino acids. Furthermore, the molecular weights of these CmHsfs ranged from 20.5642 to 161.5554 kiloDaltons (kDa) ( Table 1). Although the deduced heat shock transcription factors presented diversity in terms of the parameters mentioned above, most of the CmHsfs exhibited low isoelectric points (pI) (average 6.3) ( Table 1). Subcellular localization prediction indicated that only 2 heat shock transcription factor genes (CmHsf12 and CmHsf17) were expressed in the cell membrane, cytoplasm and nucleus, while the remaining CmHsf genes were expressed in the nucleus. Classi cation and conserved domain analysis of 36 CmHsfs in pumpkin To identify the phylogenetic relationships of the 36 CmHsfs, an unrooted phylogenetic tree was produced. These pumpkin CmHsfs can be divided into three subfamilies (subfamily I, subfamily II and subfamily III; Fig. 1A) according to sequence similarity. Subfamily I (containing 21 members) was the largest group, and subfamily III included 13 members, while subfamily II presented the fewest members (2 members) (Fig. 1A). Furthermore, based on the structural characteristics of the conserved DBDs and HR-A/B domains, we can divide the 36 CmHsfs into three groups (A, B, and C) ( Table 2). All CmHsfs contained a DBD and an HR-A/B domain (Table 2), and the DBD was composed of approximately 100 conserved amino acids (Fig. S1). In addition, except for CmHsf27 and CmHsf32, all of the CmHSFs contained an NLS. The CmHsfs in group A contained an AHA domain, while the CmHsfs in groups B and C did not contain an AHA domain, and only the genes in Group B contained an RD (Table 2). To further reveal conserved domains, all CmHsfs were submitted to MEME, and 10 different motifs were identi ed ( Fig. 1B; Fig. S2). Overall, the CmHsfs exhibited 4-9 motifs, and motifs 1, 2 and 4 were present in all CmHsf proteins. Motif 3 was present in all proteins except for CmHsf20 and CmHsf5. Motif 3, motif 1 and motif 2 together formed the DBD, and motif 4 formed the HR-A/B domain. In addition, we found that motif 5 existed only in subfamily I, while motif 9 appeared only in subfamily III (Fig. 1B). It is clear that the CmHsfs from the same clade usually present conserved domains or similar motif compositions, suggesting functional similarities among these proteins.

Exon-intron analysis of 36 Hsfs in pumpkin
An exon-intron organization map of the 36 CmHsf genes was also produced (Fig. 2). Different numbers of exons (from 2 to 26) were found in the 36 CmHsf genes, suggesting that CmHsfs are quite diverse. In subfamily III, except for CmHsf1, CmHsf10 and CmHsf35, which contained 9, 8 and 3 exons, respectively, the other CmHsf genes all contained 2 exons. CmHsf genes on the same branch usually presented similar intron-exon distributions, such as CmHsf26_CmHsf9. Some genes in the same family exhibited signi cantly different intron-exon distributions. For example, CmHsf12 contained 26 exons, which was different from the other CmHsfs, indicating that CmHsf12 may have a special function.

Chromosomal distribution and gene duplication of Hsf genes in pumpkin
Chromosomal distribution analysis in the genome revealed that the 36 CmHsf genes were unevenly distributed on 19 of the 21 chromosomes (Fig. 3). The chromosome Cm_Chr06 exhibited the most CmHsf genes, with 5 genes, followed by chromosome Cm_Chr05, with 4 genes. A total of 3 genes were present on each of chromosomes Cm_Chr03, Cm_Chr07 and Cm_Chr14, and 2 genes were present on each of chromosomes Cm_Chr02, Cm_Chr04, Cm_Chr10, Cm_Chr11 and Cm_Chr16, while no genes were distributed on chromosomes Cm_Chr00, Cm_Chr08 and Cm_Chr20.
Two genes, whose putative amino acid identity is > 80% and gene alignment coverage is > 0.75, were de ned here as a recently duplicated gene pair. A total of 18 duplicated genes were identi ed and divided into nine groups, each of which contained two duplicated genes. Eight duplicated gene pairs were distributed on different chromosomes (Fig. 3), which demonstrated that segmental duplication events were involved in the expansion of the CmHsf genes. CmHsf10 and CmHsf12 were separated by a region of more than 100 kb, indicating that all duplicated gene pairs had undergone segmental duplication events. The Ka and Ks ratios were less than 1.0, which suggested that the pairs had evolved mainly under functional constraints with negative or purifying selection (Table 3). We also calculated evolutionary times and divergence times of the duplicated pumpkin Hsf gene pairs ranging from 10.17 to 65.74 million years ago (Mya), averaging 21.11 Mya (Table 3).

Synteny analysis of Hsf genes in pumpkin
According to the synteny analysis of Hsfs in pumpkin and 5 other species (Arabidopsis thaliana; Lagenaria siceraria; Cucumis sativus; Cucurbita maxima; Charleston gray), we found that Charleston gray exhibited the most Hsf homologous genes (56), followed by Lagenaria siceraria (52), Cucurbita maxima (51) and Cucumis sativus (51). Arabidopsis thaliana presented the fewest (18) homologous genes (Fig. 4). Furthermore, the syntenic genes of the pumpkin Hsfs could be found on all chromosomes of Arabidopsis thaliana, Lagenaria siceraria, Cucumis sativus, Cucurbita maxima, and Charleston gray, indicating that the pumpkin Hsfs have remained closely related to those of these ve species during the process of evolution. In addition, we found that certain Hsf genes on chromosomes Cm_Chr02, Cm_Chr06, Cm_Chr08, and Cm_Chr016 of pumpkin corresponded to two or more Hsf genes in Arabidopsis. This phenomenon was more fully re ected in the collinear diagram of pumpkin with Lagenaria siceraria, Cucumis sativus, Cucurbita maxima and Charleston gray. In general, the collinear relationship between pumpkin and Lagenaria siceraria, Cucumis sativus, Cucurbita maxima or Charleston gray) was closer than that for Arabidopsis, suggesting that these species may have originated from the same ancestor.

Expression pattern of Hsf genes in pumpkin
To understand the physiological role of CmHsfs, we analysed the expression patterns of 36 heat shock transcription factors in the roots, stems, cotyledons and true leaves of pumpkin via quantitative real-time PCR. The transcriptional abundance of 36 pumpkin heat shock transcription factors was obtained from at least one tissue ( Fig. 5; Table S1). Heat map and cluster analyses showed that 21 CmHsf genes were highly expressed in cotyledons and true leaves, such as CmHsf4, CmHsf32, CmHsf35, CmHsf19 and CmHsf15. Two genes (CmHsf9 and CmHsf10) were expressed more highly in the roots and stem than in the cotyledons and true leaves. Some genes were highly expressed only in one tissue. For example, CmHsf23 was mainly expressed in the roots, and its relative expression level was 100-258 times that in other tissues. Based on the above analysis, 36 heat shock transcription factors showed tissue speci city.

Cis-acting element analysis of Hsf genes in pumpkin
To explore the potential function of Hsfs, the cis-elements in the promoters (2 kb upstream) of the 36 Hsf genes in pumpkin were predicted. A total of 429 ciselements were found among all CmHsfs, which responsed to 9 abiotic stresses, including showing salicylic acid responsiveness, defence and stress responsiveness, low-temperature responsiveness, abscisic acid responsiveness, gibberellin responsiveness, MeJA responsiveness, auxin responsiveness, drought inducibility and wound responsiveness ( Fig. 6A; Table S2). A total of 31% of the 429 cis-acting elements were involved in abscisic acid responsiveness, which existed in 32 of the 36 CmHsfs (Fig. 6B, 6C). In addition, 27% and 45% of the cis-acting elements were MeJA response elements (harbouring CGTCA and TGACG motifs) and auxin response elements, respectively (Fig. 6B). Among the 36 heat shock transcription factors, 28 genes were involved in the MeJA response, and 22 genes were involved in the auxin response. A total of 14 heat shock transcription factors exhibited low-temperature response elements. Since the Hsf genes involved in abscisic acid responsiveness, low-temperature responsiveness, MeJA responsiveness and auxin responsiveness account for a high proportion of these genes, we speculated that these genes might play important roles in these stresses.
By analysing the cis-acting elements of individual genes, we found that both CmHsf34 and CmHsf27 contained 12 abscisic acid response elements (Table  S2). In addition, CmHsf17, CmHsf26, CmHsf9 and CmHsf35 contained 8 MeJA response elements, and CmHsf23 and CmHsf35 contained the greatest number (3) of low-temperature response elements, which indicates that these key CmHsfs may play an important role in the corresponding stress response.

Response of CmHsf genes to temperature stress
To explore the response of CmHsfs to temperature stress, we cultured pumpkin seedlings at 4 °C and 38 °C. Under cold treatment, 44% of the CmHsf genes (16 genes) were signi cantly upregulated, and 27% of the CmHsf genes (10 genes) were signi cantly downregulated ( Fig. 7; Table S3). For instance, CmHsf3, CmHsf5, CmHsf23, CmHsf24, CmHsf27, CmHsf35 and CmHsf36 were highly expressed under cold stress, at 23.06-99.04 times the control level. In addition, the CmHsf4, CmHsf15, CmHsf31 and CmHsf32 genes exhibited low expression levels under cold stress. At the same time, two genes (CmHsf28 and CmHsf30) were not expressed under cold stress, indicating that the expression of these genes may be limited in time or space. Under heat treatment, 24 genes were signi cantly upregulated, and 12 genes were signi cantly downregulated ( Fig. 7; Table S3). The expression levels of CmHsf9 and CmHsf31 under heat stress were 128.38 and 66.39 times those in the control plants, respectively, suggesting that these two genes may play important roles under heat stress.
Some genes presented low expression levels under heat treatment, such as CmHsf17, CmHsf11, CmHsf21, CmHsf22, CmHsf23 and CmHsf35. Considering the expression levels of the CmHsf genes under cold and heat stress together, we found that CmHsf9, CmHsf11, CmHsf21, CmHsf23, CmHsf31, CmHsf34 and CmHsf35 showed opposite trends under the two stresses, so we speculate that these genes may play important roles in temperature stress.

Response of CmHsf genes to hormones and salicylic acid
According to the prediction of cis-acting elements in the CmHsfs promoter, a total of 28, 32, and 19 CmHsf genes were found to be involved in the MeJA response, abscisic acid responsiveness and salicylic acid responsiveness, respectively ( Fig. 6; Table S2). Therefore, we analysed the responses of these genes to MeJA, ABA, and SA. The results of qRT-PCR analysis showed that 31 CmHsfs responded to MeJA to varying degrees, and the expression of CmHsf20 was 5.1 times that in the control (Fig. 8; ; Table S4). Under ABA treatment, 21 CmHsfs were signi cantly upregulated, and 15 genes were signi cantly downregulated. The expression levels of CmHsf3, CmHsf4, CmHsf5, CmHsf6, CmHsf7, CmHsf8, CmHsf12, CmHsf25, CmHsf29 and CmHsf31 under ABA stress were 20 ~ 86 times those of the control plants, indicating that these genes play important roles under ABA stress. All CmHsfs responded to SA, among which CmHsf25, CmHsf27, CmHsf29 and CmHsf32 were signi cantly increased under SA treatment, while CmHsf1, CmHsf2, CmHsf23 and CmHsf28 were signi cantly decreased under SA treatment. Based on the above analysis, we conclude that CmHsf family genes are involved in multiple stresses and may play different roles in these stresses.

Discussion
Heat shock transcription factors are broadly present in all plants and are considered to be important regulators of abiotic stress. The Hsf family has been comprehensively and systematically analysed Glycine max [16], Chinese cabbage [17], Pyrus bretschneideri [18], potato [19], grape [20] and Brassica oleracea [21]. However, the Hsf family has not been extensively studied in Cucurbita moschata.
In our study, we identi ed 36 Hsf genes in Cucurbita moschata via genome-wide analysis. Compared with the 25 reported ZmHsfs [14] and 31 PtHsfs [13], we found that the number of Hsf genes in Cucurbita moschata is greater than those in Zea mays and Populus trichocarpa. Genome sizes vary signi cantly in these species; for instance, the genome size of Cucurbita moschata is 197.83 Mb, and that of Zea mays is 2300 Mb. The maize genome size is 11 times that of Cucurbita moschata. However, the number of maize Hsf genes is much lower than the number of Hsf genes in Cucurbita moschata. The reason for this difference might be that although two rounds of gene duplication occurred in the Zea mays genome during its evolution [55,56], the Hsf genes of Zea mays underwent large gene losses. In addition, the genome of Cucurbita moschata also underwent a whole-genome duplication (WGD) event during the phylogeny of the species [40].
Chromosomal segmental duplications and individual gene duplications are a major driving force in the genome evolution process [57]. In this study, all Hsf gene pairs were found to have experienced segmental duplication events, with no tandem duplication events, indicating that segmental duplication has played a important role in the evolution of the Cucurbita moschata Hsf gene family. The Ka and Ks ratios of all duplicated pairs indicated that these gene pairs were under purifying selection. Additionally, the relatively high Ka/Ks ratios for CmHsf12-CmHsf10 suggested that they have experienced rapid evolution.
The CmHsfs were divided into three groups (Groups A, B and C) based on the conserved structural characteristics of the DBD and the HR-A/B domain. In addition, phylogenetic tree analysis indicated that the CmHsfs could be divided into three subfamilies (subfamily I, subfamily II and subfamily III). While subfamily II corresponded to group C and subgroup III corresponds to group B, subgroup I contained not only group A genes but also group C genes. Due to the close homology of the genes on the same branch, we speculate that the evolutionary path of the CmHsfs has been changing.
Cis-acting elements are essential for gene expression, and their numbers are correlated with gene expression intensity [58,59]. CmHsf23 and CmHsf35 contain three low-temperature response elements, which mean that CmHsf23 and CmHsf35 may play key roles under low-temperature stress. The qRT-PCR results showed that CmHsf23 and CmHsf35 were signi cantly upregulated under low temperature, and the expression pro les of these two genes showed opposite trends under high-temperature stress, which further veri ed the response of the two genes to temperature stress. The prediction of cis-acting elements showed that the promoters of 28 CmHsf genes contained MeJA response elements, and qRT-PCR analysis showed that the expression levels of 31 genes changed to varying degrees under MeJA treatment. However, from the relative expression values, we found that the CmHsfs responded less to MeJA than to ABA and SA. Therefore, we concluded that pumpkin Hsf family genes were mainly involved in the responses to ABA and SA.

Conclusions
In summary, we identi ed 36 CmHsfs in the pumpkin genome based on a thorough analysis and provided genetic information such as chromosome locations and exon-intron structures, conserved domains, and duplicated genes. We speci cally examined the expression pro les of these CmHsfs in different tissues. At the same time, we examined the responses of CmHsfs to multiple stresses, and several key genes were found to regulate pumpkin resistance.

Sequence retrieval from the Cucurbita moschata database and physicochemical characterization
To identify the heat shock transcription factor family in pumpkin, the genome of pumpkin was downloaded from the Cucurbita moschata database (CuGenDB, http://cucurbitgenomics.org/) [40]. A total of 25 Arabidopsis thaliana Hsf genes were obtained from the NCBI database by using their gene IDs from of 1 × e − 10 . To eliminate false positives, sequences were discarded if they constituted < 70% of the corresponding Arabidopsis Hsf protein. SMART (http://smart.embl-heidelberg.de/) [41] and MARCOIL (http://toolkit.tuebingen.mpg.de/marcoil) were used to predict the DBDs and HR-A/B domains. After the removal of redundant genes, the remaining genes were identi ed as pumpkin Hsf (CmHsf) family genes.
The physical and chemical characteristics of the heat shock transcription factors, including their molecular weight (Mw), isoelectric point (pI) and the number of amino acids, were analysed with ExPASy (http://web.expasy.org/tools/). Information on CmHsf genes including their chromosomal distribution, their start and the end positions on the chromosomes were extracted from Cucurbita moschata database, and their subcellular locations was predicted with Plant-mPLoc [42].

Phylogenetic tree construction
To reveal the phylogenetic relationships of Hsf genes in pumpkin, an unrooted phylogenetic tree was constructed with MEGA 5.0 [43] according to the similarity of full-length amino acid sequence of 36 CmHsf genes. The bootstrap values were obtained using 1000 replicates with the pairwise deletion option.

Analysis of conserved domains and gene structure
The conserved motifs of Hsf genes in pumpkin were analysed by using Multiple Expectation Maximization or Motif Elicitation (MEME, http://meme-suite.org/) [44], and the LOGOs of the protein motifs were also obtained with MEME. The NLSs and NESs of the heat shock transcription factors were predicted by using cNLS Mapper [45] and the NetNES 1.1 Server [46], respectively. The exon-intron structures of individual CmHsf family genes were obtained from GSDS (Gene Structure Display Server, http: //gsds.cbi. pku.edu.cn/) [47] by comparing the coding domain sequences (CDSs) and the corresponding genomic sequences of pumpkin.

Gene duplication and gene collinearity analysis
The chromosomal locations of the CmHsf genes were mapped and imaged with visualization tools (http://visualization.ritchielab.psu.edu/home/index) based on their initial positional information obtained from Cucurbita moschata (CuGenDB, http://cucurbitgenomics.org/). To identify gene duplications, all CDS sequences of pumpkin Hsf genes were subjected to BLAST searches against each other (identity > 80%, e value < 1e − 10 ) by using the local Blast program. Gene alignment coverage was then acquired by pair-wise alignment using the previously calculated method: Gene alignment coverage = (alignment lengthmismatches)/length of the longer gene. Pairs were considered duplications when the gene alignment coverage was greater than 0.75. Moreover, two genes which were separated by several genes in a 100-kb were named as tandemly duplicated genes [48]. To estimate the divergence of these duplicated CmHsf genes, we used the KaKs calculator to calculate the synonymous substitution ratio (Ks) according to the method of Gojo-bori and Nei [49]. To avoid the saturation of substitutions, we required that Ks values > 2.0 must be discarded [50,51]. The divergence time (T) was computed according to the formulathe (T = Ks/2λ × 10 − 6 million years ago (Mya), λ = 1.5 × 10 − 8 ) in the previous literature [52]. The criteria for identifying gene collinearity were based on previous reports [53], and the synteny relationships between the heat shock transcription factors of pumpkin and those of other species (Arabidopsis thaliana, Cucumis sativus, Cucurbita maxima, Charleston gray, Lagenaria siceraria) were constructed using Dual Synteny Plotter software (https://github.com/CJ-Chen/TBtools) [54].

Analysis of cis-acting elements of CmHsf gene promoters in pumpkin
The promoter sequences (2 kb before the start codon) of all CmHsf genes were extracted from the Cucurbita moschata database, and we predicted the promoter cis-acting elements of CmHsfs by using Dual Synteny Plotter software [54].

Plant material, growth conditions and stress treatment
The pumpkin variety "Tianmi 1" was used as the study material. The seeds were provided by the pumpkin team of School of Horticulture and Landscape Architecture, Henan Institute of Science and Technology. The seeds were sown in a tray containing a matrix-meteorite (3:1) mixture and grown in a plant growth chamber. The arti cial growth conditions were set to 25 ℃ / 16 ℃, 16 h light / 8 h dark and 65% relative humidity. We sampled and analysed different tissues (roots, stems, cotyledons and true leaves) of two-month-old seedlings. In addition, some of the seedlings were transferred to 38 ℃ for 6 h heat treatment, or transferred to 4 ℃ for 6 h cold treatment. Another portion of the seedlings were cultured in 1/2 Hoagland solution, pH 6.5. After 5 days of adaptation, the plants were cultured with the following treatments: (1) control; (2) 1 mM MeJA; (3) 5 mM salicylic acid (SA); (4) 100 µM abscisic acid (ABA).
Leaf samples were collected at 10 h after the above treatments. Three independent biological replications formed one sample. Control and stress-treated samples were frozen in liquid nitrogen and stored at -70 ℃ for further analysis.
RNA extraction, reverse transcription and qRT-PCR analysis    Heat map and hierarchical clustering of 36 CmHsf genes in true leaves under cold stress and heat stress. The relative expression levels of CmHsf genes at 10 h after cold stress (4 ℃) and heat stress (38 ℃)werequantifed against the seedlings without stress (CK), and the reference gene(β-Actin) was used to correct the expression level of target genes, and the expression level of the control was set as 1.The bar on the right of the heat map represents the relative expression levels.

Page 22/22
This is a list of supplementary les associated with this preprint. Click to download.