Genome-wide association and differential expression analysis of salt tolerance in Gossypium hirsutum L at the germination stage

Background Salinity is a major abiotic stress seriously hindering crop yield. Development and utilization of tolerant varieties is the most economical way to address soil salinity. Upland cotton is a major fiber crop and pioneer plant on saline soil and thus its genetic architecture underlying salt tolerance should be extensively explored. Results In this study, genome-wide association analysis and RNA sequencing were employed to detect salt-tolerant qualitative-trait loci (QTLs) and candidate genes in 196 upland cotton genotypes at the germination stage. Using comprehensive evaluation values of salt tolerance in four environments, we identified 33 significant single-nucleotide polymorphisms (SNPs), including 17 and 7 SNPs under at least two and four environments, respectively. The 17 stable SNPs were located within or near 98 candidate genes in 13 QTLs, including 35 genes that were functionally annotated to be involved in salt stress responses. RNA-seq analysis indicated that among the 98 candidate genes, 13 were stably differentially expressed. Furthermore, 12 of the 13 candidate genes were verified by qRT-PCR. RNA-seq analysis detected 6640, 3878, and 6462 differentially expressed genes at three sampling time points, of which 869 were shared. Conclusions These results, including the elite cotton accessions with accurate salt tolerance evaluation, the significant SNP markers, the candidate genes, and the salt-tolerant pathways, could improve our understanding of the molecular regulatory mechanisms under salt stress tolerance and genetic manipulation for cotton improvement. Electronic supplementary material The online version of this article (10.1186/s12870-019-1989-2) contains supplementary material, which is available to authorized users.


Background
Salinity is a significant abiotic stress that reduces crop productivity and quality throughout the world [1]. More than 6% of the world's 800 million agricultural lands are affected by salinity [2]. A comprehensive understanding of the salt-responsive molecular mechanisms and exploring salt-tolerant genes will help increase crop tolerance to salinity [1,3,4]. Upland cotton (Gossypium spp.) is an important source of natural fiber, vegetable oil, and protein and is also a moderately salt-tolerant and pioneer crop that can be grown in saline-alkali land. Nevertheless, its yield will be drastically reduced as the soil salinization level increases [5]. Tolerance to salinity significantly varies among cotton germplasms. Thus, screening elite high salttolerant germplasms is key to breeding salt-tolerant cotton, as well as in identifying molecular mechanisms and key genes associated with salt tolerance.
Several methods have been developed for assessing germplasm. Factor analysis of principal component analysis (PCA) is commonly used in the evaluation of the status of each material in a group by analyzing a large number of samples and major correlation indicators [6][7][8][9].
Subordinate function analysis (SFA) is often used to evaluate stress tolerance [8,10]. However, the evaluation may be one-sided when using only PCA or SFA [10,11]. A comprehensive evaluation value combining PCA and SFA can convert each indicator into independent factors that can be compared with each other while maintaining the original information, thereby providing a more comprehensive assessment of plant tolerance [10]. In addition, this comprehensive evaluation value has higher accuracy and efficiency than grade evaluation with estimating intuitive withered area proportion. This integrated approach has been used to assess stress tolerance in sugarcane [10,12], cucumber [13], tomato [11], alfalfa [8], wheat [14], and cotton [15].
Salinity tolerance is a multigene controlled trait and is susceptible to environmental factors. Association mapping based on linkage disequilibrium (LD) is a common and powerful technique for identifying genomic regions related to specific variants of phenotypic characteristics [16] based on its capability of dissecting a larger number of alleles than linkage mapping. However, studies on identifying salt-tolerant QTLs/genes in cotton using association mapping [7,16,17] or even linkage mapping [18,19] are limited. In addition, the results of a single investigation on association mapping for salt tolerance in cotton using high-density single-nucleotide polymorphism (SNP) markers have been reported. A total of 23 significant SNPs and 280 possible candidate genes, of which most are involved with transcription factors, transporters, and enzymes, were found to be associated with two salt tolerance-related traits [20]. With the maturity and popularization of second-generation sequencing, RNA sequencing (RNA-seq) has become the major approach in excavating candidate genes, as well as in constructing molecular regulatory pathways and potential regulatory networks. Some salt-responsive mRNAs [21,22], miRNAs [3], alternative splicing [23], or long non-coding RNAs (lncRNAs) [24] were detected in cotton using RNA-seq and the potential molecular regulatory pathways or regulatory networks of some genes were preliminarily explored. In cotton, several salt stress-inducible genes have been detected through association or linkage mapping and RNA-seq, including GhNHX1 [25], metallothionein (GhMT3a) [26], GhERF2-GhERF6 [27,28], GhDREB1 [29], CCCH-type zinc finger (GhZFP1) [30], GhNAC1-GhNAC631 [31], GhMPK2 [32], GhMKK1 [33], GhSOD1 and GhCAT1 [34], GhWRKY17 [35], and GhAnn1 [36].
In this study, we combined comprehensive evaluation, association mapping, and RNA-seq to explore salt-tolerant candidate chromosomal regions/genes in cotton at the germination stage. This study provides candidate QTL (qualitative-trait locus) regions and genes for dissecting the genetic mechanisms of salt tolerance and variety breeding in cotton.

Salt tolerance performance and evaluation
At germination stage, 10 traits were measured in 196 upland cotton genotypes, and the results are shown in Table 1. The results indicated that all 10 traits were significantly hindered by salt stress. All traits exhibited significantly lower means and extreme values under salt stress than the normal conditions. Comparatively, GP (germination potential) was the most affected trait, whereas SFM (shoot fresh mass) was least affected. The 0.3% (200 mmol/L) NaCl stress reduced not only the total germination rate but also the germination speed and subsequent growth. Under normal conditions, the CVs (coefficient variations) ranged from 7.1% in RDM (root dry mass) to 20.6% in VI (vigor index). The 0.3% (200 mmol/L) NaCl stress resulted in significantly increased CVs in nearly all 10 traits, except for VI and RFM (root fresh mass). SDM (shoot dry mass), GP, and GR (germination rate) ranked the top three CVs. These results indicated that most of the trait responses of this panel of upland cotton to salt stress are highly diverse.
To reduce the impact of environmental factors and to reach high evaluation accuracy, the BLUPed (BLUP: the best linear unbiased prediction) STI (salt tolerance index) was calculated based on the nine (three years × three replications) STIs in each trait, and the results were shown in Additional file 1: Table S1 [37,38]. No significant differences among STIs among the 3 years and the BLUPed STIs of each trait were observed.    [6], indicating that the STI data were suitable for PCA. The four STI data sets resulted in consistent results in PCA. For each set of STI data, two principal components that accounted for at least 86.85% of the total variance were obtained based on the Eigenvalues-more-than-1.0 rule (Additional file 3: Figure S1 and Additional file 4: Table S3). As shown in PCA plots (Additional file 5: Figure S2), Factor 1 represents all the salt-tolerance traits except for STI_RFM (salt tolerance index of root fresh mass), which was Factor 2.
Furthermore, the comprehensive evaluation value (D) of each genotype was obtained based on the subordinate function value (U) computed with the two principle factors (Additional file 6: Table S4) Table S4).

Association of SNP markers and salt tolerance
In this GWAS (genome-wide association study), the comprehensive evaluation values (D) was used as phenotypic data to detect significant salt tolerance QTLs/ genes. Thus, the covariant Q matrix, which can reduce the negative influence of group structure, was introduced into the mixed linear model (MLM) to reduce false positives. To test the quality of the mixed linear model combined structure Q matrix and kinship matrix (MLM [Q + K]) and association results, we employed a QQ (Quantile quantile) plot. Figure 1 showed that the MLM [Q + K] model was slightly too strict in detecting significant SNP markers, indicating that the probability of false positives was much lower. Consequently, a total of 33 SNPs that were significantly (p < 0.001) associated with salt tolerance were detected using MLM [Q + K] ( Fig. 2 and Table 2). Among these, 17 SNPs were detected in at least two environments (3 years and BLUPed data) and 7 SNPs were identified in all four environments, explaining 5.98 to 10.76% phenotypic variation, with an average of 8.21%. These 17 SNPs were consequently used to identify candidate QTLs. When the distance between the first SNP and neighboring SNPs was less than the LD decay distance at pairwise r 2 = 0.1 level on each chromosome (Additional file 7: Table S5) or r 2 between the first SNP and the neighboring SNP was > 0.1, these SNP markers were regarded as a confidence interval for a candidate QTL. Finally, 13 QTLs were obtained on 10 chromosomes, including 4 QTLs on homologous chromosomes A07-D07.

Candidate genes detected with GWAS
According to the physical positions of the candidate QTLs referenced to the G. hirsutum (TM-1) genome [39], a total of 98 candidate genes were detected (Table  2), and all these genes were annotated in Arabidopsis thaliana (Additional file 8: Table S6). The candidate genes were retrieved in UniProtKB to annotate their biological processes (Additional file 8: Table S6). Functional annotation showed that 35 candidate genes were associated with salt tolerance (Table 3). Of these, 11 genes were involved in eight transcription factor (TF) families, including CO-like, MYB, bZIP, ERF, TALE, SBP, HD-ZIP, and ARR-B; 17 genes were related to "response to stress" or "defense response" such as the stress of acidic pH, heat, hydrogen peroxide, water deprivation, and hyperosmotic salinity; nine genes were associated with "signaling" or "response to signal factors" such as salicylic acid (SA), gibberellic acid (GA), jasmonic acid (JA), calcium-mediated signaling, cytokinin, and abscisic acid (ABA); six genes were involved in "ion homeostasis" or "ion transport", one gene was associated with "fatty acid biosynthetic process", and five genes were related to amino acid synthesis or transport. In addition, 12 of these 35 candidate genes were involved in two or three functional categories (Table 3).
GO (Gene Ontology) enrichment analysis was conducted to further infer the functions of these 98 candidate genes. At a P-value < 0.05 and the number of genes > 3, the 98 candidate genes were categorized into eight GO terms (Fig. 3). Each of the two main categories, namely, biological process (BP) and molecular function (MF), contained four GO terms. The BP category in this study contained "chitin catabolic process", "carbohydrate metabolic process", and "metabolic process and regulation of transcription", whereas the MF category contained "chitinase activity", "hydrolase activity", and "transcription factor activity and transporter activity".
Based on the physical positions, 21 of the 98 candidate genes detected in the GWAS were proximal to significant SNPs, and six (Gh_A07G0622-TM18816, Gh_A07G0729-TM19028, Gh_A12G0874-TM41811, Gh _D07G0251-TM63245, Gh_D10G0640-TM73567, and Gh_D10G0647-TM73579) of them had significant SNPs in their coding sequences (CDS) (Additional file 9: Table S7). Furthermore, based on their functional annotation or biological process (Table 3), three (Gh_A07G0622, Gh_ A07G0729, and Gh_D07G0251) of the six genes were related to the functional categories of "response to stress or defense response", "signaling or response to signal factors" or "transcription factors", implying their involvements in salt tolerance responses.  Table S8). At least 88% of the clean reads, which were raw reads without low-quality reads and adaptor sequences, were mapped to the G. hirsutum (TM-1) genome, in which uniquely mapped reads accounted for about 81% of the clean reads. A total of 6640, 3878, and 6462 DEGs (differentially expressed genes) (P-value < 0.05) were obtained between salt stress and the control at 3, 24, and 72 h post salt stress treatment, respectively, and 3956 (59.6%), 2238 (57.7%), and 3226 (49.9%) were upregulated (Fig. 4). Among the three sampling time points, we identified 869 shared DEGs, including 562 continuously upregulated and 307 continuously downregulated genes.
To verify our DEG results, qRT-PCR was adopted on 20 randomly selected DEGs, including 13 continuously upregulated and 7 downregulated genes (Additional file 10:   Figure S3). Relative expression levels were calculated using the ΔΔCt method. The results of transcriptome sequencing coincided with our qRT-PCR findings.
To monitor salt tolerance gene expression, GO enrichment analysis (P-value corrected by FDR < 0.05) of 562 continuously upregulated genes and 307 continuously downregulated genes was conducted (Fig. 5). For continuously upregulated genes (Fig. 5a), the BP category consisted of "single-organism metabolic process", "oxidationreduction process", "carbohydrate metabolic process", "photosynthesis", "organonitrogen compound catabolic process", "oligosaccharide metabolic process" and "aromatic compound catabolic process"; the cellular component (CC) category was mainly related to "thylakoid" and "photosystem"; and the MF category comprised "oxidoreductase activity" and "O-methyltransferase activity". For continuously downregulated genes (Fig. 5b), the BP category was mainly involved in "response to oxidative stress", "obsolete GTP catabolic process", and "polyol metabolic process"; the CC category was mainly involved in "viral capsid", "integral component of organelle membrane", and "intrinsic component of organelle membrane"; and the MF category was mainly involved in "transporter activity", "peroxidase activity", and "inorganic diphosphatase activity". All the enriched GO terms of the 869 shared DEGs were associated with stress responses.
KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway enrichment of the 869 continuously up/downregulated DEGs was performed (Fig. 6). The continuously upregulated genes were significantly (P-value < 0.05) enriched in nine KEGG pathways (Fig. 6a), including "photosynthesis", "flavonoid biosynthesis", "starch and sucrose metabolism", "photosynthesis-antenna proteins", "pyruvate metabolism", "plant hormone signal transduction", "beta-alanine metabolism", "galactose metabolism", and "arachidonic acid metabolism". The KEGG pathways of continuously upregulated genes correspond to the GO enrichment results and the functions of genes, which were associated with stress response. The continuously downregulated genes were mostly related to nine KEGG pathways (Fig. 6b), including phagosome, phenylalanine metabolism, phenylpropanoid biosynthesis, starch and sucrose metabolism, pentose and glucuronate interconversions, ascorbate and aldarate metabolism,  TFs play pivotal roles in plant stress responses [40]. In this study, 62 (7.13%) genes encoding for TFs were identified as shared DEGs, of which 41 were continuously upregulated and 21 continuously downregulated ( Table 4). All the 62 genes were belonged to 19 TF families, including several key regulatory gene families responding to abiotic and biotic stresses such as ARF, bHLH, bZIP, C2H2, ERF, HD-ZIP, MYB, MYB_related, NAC, and WRKY.

Combination of association analysis and transcriptome sequencing
We combined the GWAS and RNA-Seq results to further screen salt tolerance candidate genes. Of the 98 candidate genes in GWAS, 13 exhibited significantly different expression levels (p < 0.005) at more than one sampling time point in RNA-seq analysis (Table 5). Of the 13 putative DEGs, eight exhibited significantly different expression at only one time point, four (Gh_ A03G1740, Gh_A12G0877, Gh_D07G0263, and Gh_ D07G0500) at two time points, and one, Gh_A07G0622, upregulated at all three time points. In addition, six of the 13 putative DEGs (Gh_A01G1563, Gh_A02G1100, Gh_A07G0622, Gh_A12G0877, Gh_D07G0251, and Gh_ D07G0500) were proximal to significant SNPs in GWAS, and two (Gh_A07G0622 and Gh_D07G0251) of the six genes had significant SNPs within coding sequence regions (Additional file 9: Table S7). In addition, as shown in above, Gh_A07G0622 and Gh_D07G0251 were involved in "response to stress or defense response", "signaling or response to signal factors", and "transcription factors" (Table 3).
To further verify the putative genes, all the 13 putative DEGs except for Gh_A12G0877, which could not be distinguished from a homologous gene, were analyzed by qRT-PCR using a set of three replicates for each sample at 3, 12, 24, 48, and 72 h after salt or water treatment. Relative expression levels were calculated using the ΔΔCt method. Figure 7 showed that the putative genes Gh_A01G1563, Gh_A07G0622, Gh_D07G0243, and Gh_D07G0251 were continuously upregulated after salt treatment. The genes Gh_D07G0623, Gh_ D07G0250, Gh_D07G0258, and Gh_D07G0500 responded to salt stress at the initial stage (3 h or 12 h post treatment). The gene Gh_A07G0623 was responsive to salt at 72 h post treatment. The candidate genes Gh_ A03G1740, Gh_A02G1100, and Gh_D07G0249 were continuously downregulated under salt stress. The qRT-PCR results showed all the 12 DEGs might be involved in salt tolerance responses.

Comprehensive evaluation of salt tolerance
The salt tolerance traits of upland cotton are complex and vary with species, developmental stage, and tissue [41]. Thus, it is very important to evaluate germplasm salt tolerance accurately with a precise method. Several indexes have been used to assess salt tolerance in cotton, including indicators of seed germination (GR, GP, GI (germination index), and VI), plant morphological indexes (PH, SFM, SDM, and RL), physiological and biochemical indexes (Na + , K + , and betaine), and yield traits (boll number, boll weight, and lint yield) [7,20,42]. At present, it is generally believed that the comprehensive evaluation of salt tolerance combined with multi-indexes and multi-methods is more authentic and reliable such as PCA, SFA, and the comprehensive evaluation D value [6][7][8][9][10][11]. The comprehensive evaluation D value, which has the higher accuracy, has been widely used in evaluating stress tolerance in germplasms of different crops or vegetables [8,[10][11][12][13][14][15]. In this research, we investigated 10 traits related to salt tolerance and used their comprehensive evaluation D values in GWAS, which should help to improve salt tolerance evaluation of this panel and therefore the GWAS.

Candidate QTLs/genes detected with GWAS
Salt tolerance is a important and complex trait in cotton. Molecular tagging of salt tolerance has been investigated in several previous researches with different marker types and mapping populations [7, 16-19, 43, 44]. And, a meta analysis of salt tolerance QTLs in cotton was also reported [45]. Compared to the previous reports, there was no co-location QTL with the QTLs detected herein. This low consistency may be related to these factors, including the complexity of salt tolerance mechanisms in cotton, and differences in populations, salt tolerance assessment traits, marker types, and marker densities used in different studies. In this study, a high-density SNP array and four sets of D value (three years and BLUPed D value) were used in GWAS and 13 candidate QTLs controlling salt tolerance at germination stage were detected in no less than two environments. In Fig. 6 Statistics of pathway enrichment analysis of salt-tolerance genes detected in RNA-seq. a Pathway enrichment analysis of 562 continuously upregulated salt-tolerance genes at p-value < 0.05. b Pathway enrichment analysis of 307 continuously downregulated salt tolerance genes at p-value < 0.05 addition, the QTLs regions harbored candidate genes whose functions were considered to be involved in salt tolerance and regulated by salt stress.
Many candidate genes from the 13 QTLs are very likely to be associated with salt tolerance, based on their annotation in Arabidopsis thaliana and functional annotation (Table 3 and Additional file 8: Table S6). Eleven genes are involved in seven TF families, namely, MYB, bZIP, ERF, TALE, SBP, HD-ZIP, and ARR-B. TFs play a major role in plant biotic and abiotic stress responses [46]. MYB [46][47][48], bZIP [49][50][51], and ERF [28,52,53] are also known to be involved in responses of biotic and abiotic stress. The CO-like gene (COL4) positively regulates abiotic (salt and osmotic) stress tolerance in Arabidopsis through an ABA-dependent method [54]. SBP, the plant-specific TF that participates in plant development, may enhance stress tolerance of plants by the growth regulation [55]. The ARR-B gene ARR12 (Gh_D07G0251 in this study) negatively regulates the responses of Arabidopsis to drought [56]. HD-ZIP TFs are involved in both development and stress (drought or salt) responses in A. thaliana [57,58], rice [59], Manihot esculenta Crantz [60] and Craterostigma plantagineum [61]. Furthermore, the HD-ZIP gene (GhHB1) in G. hirsutum responds to root development, abscisic acid, and salt [62], suggesting that HD-ZIP genes play an important role in plant development and stress responses. As shown in Table 3, these 17 candidate genes detected in GWAS respond to acidic pH, heat, hydrogen peroxide, water deprivation, or hyperosmotic salinity stress or are involved in "defense response" and are highly likely to be related to salt tolerance. In addition, the gene Gh_ A01G1564 responds to GA and JA, Gh_A02G1099 and Gh_A02G1100 responds to SA, Gh_D07G0251 and Gh_ D07G0256 respond to cytokinin (CK), Gh_D07G0500 responds to ABA, and Gh_D10G0643 responds to auxin (IAA). Plant hormones such as GA, JA, SA, CK, ABA, and IAA play essential roles in plant adaptation to external stimuli and changes in the environment [3,22,63]. Those genes involved in "response to hormone stimulus" or "signaling" were concordant with the results of previous studies of expression profiling of plants under salt stress [5,64,65]. Gh_A07G0729 is also involved in calcium-mediated signaling, which plays an essential role in adapting to salt stress [3,66,67]. In addition, six genes were involved in "ion homeostasis" or "ion transport", by which plant tolerance to salinity could be enhanced [68][69][70].
Furthermore, the GO terms enriched with candidate genes in this GWAS included carbohydrate metabolic process, metabolic process, hydrolase activity, regulation of transcription, transcription factor activity, and transporter activity, and were also mainly enriched in previous studies of responses to salt (NaCl) stress in cotton roots at the seedling stage [3,21,22,65,71]. Overexpression of chitinases in transgenic tobacco enhanced its tolerance to biotic (fungal and bacterial pathogens) and abiotic (salinity and heavy metals) stress [72], and accumulation of the chitinase isoforms was induced by heavy metal stress in plants [73]. In this GWAS, the enriched GO terms associated with salt stress included "chitin catabolic process" and "chitinase activity".

Candidate genes and pathways associated with salt tolerance in transcriptome sequencing
To date, studies on the mechanism of salt tolerance in cotton are limited, with most investigations focusing on transcriptome sequencing of salt-responsive genes and pathways. Xu et al. (2013) [65] examined variations in gene expression of roots after exposing plants to 200 mM NaCl for 3, 12, 72, or 144 h and revealed that the enriched GO terms were related to cellular components, including "intrinsic to membrane", "cytoplasmic vesicle", and "membrane part". Peng et al. (2014) [3] identified DEGs in cotton leaves at 4 and 24 h post-application of salt stress (200 mM NaCl) and revealed enriched GO terms such as "transcription factor activity", "response to stress", and "regulation of biological process". Guo et al. (2015) [5] reported that the transcripts upregulated in both salt-tolerant and salt-sensitive cultivars under 150 mM NaCl treatment enriched GO terms related to  "response to stimulus", "transcription factor activity", "peroxisome", and "proline metabolic process". Zhang et al. (2016) [22] identified DEGs in G. davidsonii roots and leaves at 12, 24, 48, 96, and 144 h post-salt stress (200 mM NaCl) and identified DEGs that enriched the salt-responsive GO categories, including "response to oxidative stress", "responses to osmotic stress", "ion transport", "response to various hormone stimulus", "response to sucrose stimulus", and "metabolic processes". Additionally, genes related to metabolic processes were involved in "carbohydrate", "hormone", "protein", "lipid", "amino", "oxidation reduction", and "organic substance". Shu et al. (2017) [21] reported that the DEGs of NaCl/ CK were associated with the GO terms of "oxidation-reduction", "oligosaccharide metabolic process", "photosynthesis", "thylakoid", and "oxidoreductase activity". In the current study, we screened out 6640, 3878, and 6462 DEGs in the high salt-tolerance upland accession Han682 at 3, 24, and 72 h post salt stress respectively. Among all the DEGs, 562 were continuously upregulated and 307 were continuously downregulated. The GO terms enriched with continuously upregulated DEGs were related to "metabolic process", "oxidation-reduction process", "carbohydrate metabolic process", "photosynthesis", "oligosaccharide metabolic process", "thylakoid", and "oxidoreductase activity", which were also identified in previous studies [5,21,22,74]. The GO terms enriched with continuously downregulated DEGs included "intrinsic to membrane", "cytoplasmic vesicle" and "membrane part", "response to oxidative stress", and "transporter activity", were agree with the findings of previous studies [3,5,22,65]. The GO term related to "oxidationreduction" was enriched with both continuously up-and downregulated DEGs, suggesting that oxidation-reduction systems elicit more complex responses to stress. In addition, the GO terms related to "membrane", "transporter activity", and "thylakoid" were enriched with significant DEGs in both roots and leaves, suggesting that some mechanisms associated with salt tolerance may be shared in different plant tissues and organs [3,5,21,22,65].
The pathways related to salt tolerance, including flavonoid biosynthesis, starch and sucrose metabolism, plant hormone signal transduction, starch and sucrose metabolism, phenylpropanoid biosynthesis, phenylalanine metabolism, and phagosome, were also identified in previous studies on salt tolerance mechanism of roots of cotton seedlings [5,21,22,74,75]. These pathways may play an essential role in plant adaptation to stress. However, the pathways of pyruvate metabolism, galactose metabolism, and arachidonic acid metabolism are specific to the seedling stage. The pyruvate metabolism pathway was also founded in leaves with salt stress at the seedling stage [75]. The germination of cotton seeds under salt condition require a suitable physiological state such as salt ion homeostasis, sufficient energy supply, and the capacity to remove harmful substances. These specific pathways such as pyruvate metabolism may be able to improve seed survival rate at the seedling stage during salt stress, although the underlying mechanism is unclear. The pathway of flavonoid biosynthesis had the highest rich factor and was very highly significant (P-value < 0.0001) in the pathway enrichment of continuously upregulated DEGs. Flavonoids, which are polyphenolic secondary metabolic compounds, play an important role in growth, development, reproduction, and stress defense [76,77]. Petrussa et al. (2013) [77] have shown that flavonoids constitute a secondary ROSeliminating system in plants under severe or prolonged stress conditions. Petrussa et al. (2013) [77] and Fini et al. (2011) [76] suggest that the key role of vacuoles in ROS homeostasis might be mediated by flavonoids. Ma et al. (2014) [78] have investigated the expression of genes that are involved in the flavonoid pathway and the accumulation of flavonoids related to drought tolerance in wheat. Flavonoids also play key roles in defense responses against biotic stress [79].
Similar MYB-BHLH-WDR (MBW) complexes and a family of small MYB proteins (R3-MYB) have been shown to play a key role in the regulation of flavonoid biosynthesis [47,80]. This confirmed that TFs play an essential role in the biotic and abiotic stress responses in plants. In the current study, 19 TF families that are always related to salt stress response [3,5,22], were continually differentially expressed (Table 4). Among these, three (ARF, ERF, and C2H2) were downregulated, and seven (C3H, COlike, HSF, LBD, M-type_MADS, NF-YA, and NF-YB) were upregulated. The top six TFs were bZIP, MYB (MYB-related), HD-ZIP, NAC, bHLH, and HSF.
The qRT-PCR results also indicated the expression of these putative genes, Gh_A01G1563, Gh_A07G0622, Gh_ D07G0243, Gh_D07G0251, Gh_D07G0623, Gh_D07G0250, Gh_D07G0258, Gh_D07G0500, Gh_A07G0623, Gh_ A03G1740, Gh_A02G1100, and Gh_D07G0249, were regulated by salt stress at the germination stage. These results provide candidate genes for further research on salt tolerance mechanism of cotton. The specific functions and molecular regulation of these genes in salt tolerance of cotton need to be further studied [88].

Conclusions
In the current study, the salt tolerance of 196 accessions was comprehensively evaluated with the comprehensive D values of 10 salt-relevant traits. Based on this, a GWAS for salt tolerance was conducted. In GWAS, 98 candidate genes were obtained in the 13 candidate QTLs from 17 significant SNPs in at least two environments. Functional annotation revealed that 35 of the 98 candidate genes were involved in salt tolerance responses. Furthermore, transcriptome sequencing of a high salt resistant accession, Han682, at three time points after salt or control treatment were conducted to verify the results of GWAS. By combining the results of GWAS and RNA-seq, 13 putative genes were identified and the expressions of 12 of these genes were verified using qRT-PCR. These results will enhance our understanding of the molecular-genetic regulation of salt stress tolerance in cotton and aid in the the modification of salinity tolerance related traits.

Plant materials and SNP markers
A panel of 196 diverse upland cotton accessions, including 169 genotypes from five cotton-growing regions across China and 27 from other countries, were employed in association mapping of salt tolerance at the germination stage. All these 196 accessions were inbred for at least 3 years before use in this study. The detailed information on this panel is described in Additional file 12: Table S9 [89].
The 41,815 polymorphic SNP markers screened from 77,774 SNPs (CottonSNP80K, [90]) were applied in population structure analysis and GWAS. The 196 genotypes were divided into two subpopulations and confirmed using four methods (the UPGMA (unweighted pair group method with arithmetic means) phylogeny, PCA, STRUCTURE, and kinship matrix) (Additional file 13: Figure S4) [89]. The summary of SNPs, polymorphic information content (PIC), gene diversity, and LD decay were also calculated as described by Yuan et al. (2018) (Additional file 7: Table S5) [89].

Salt tolerance assessment at the germination stage
Salt tolerance evaluation at the germination stage, which is the most sensitive development stage to salt stress [91], was performed in a triplex randomized block experiment in 2017, with seeds from three calendar years (2014-2016). Cotton seeds delinted by sulfuric acid were surface-sterilized with 0.1% HgCl for 15 min, then uniform-sized seeds were selected for the germination test, which was performed in an incubator at 28 ± 1°C and 80% ± 2% relative humidity. The seeds (100 seeds) were planted in a germination box (13 × 19 × 12 cm) containing 750 g dry sand and covered evenly with 250 g dry sand above the seeds, then 250 mL 200 mmol/L NaCl solution or distilled water (as control) was added. Every replicate (treatment or control) of each genotype consisted of 200 seeds in two germination boxes.
The number of germinated seeds was recorded from the 3rd day to 10th day after sowing. Then GR, GP, GI, and VI were calculated using the following formulas: GR = G7 TS Â 100% and GP = G3 TS Â 100%, where TS is the total number of seeds in each replicate (200 were used) and G 3 or G 7 is the number of total germinated seeds in two germination boxes from the first day to the third or seventh day after sowing; and GI = P Gt Dt , VI = GI × S, where t is the number of days after planting, Gt is the number of germinated seeds at the tth day after sowing, Dt is the number of days after planting corresponding to Gt, and S is the fresh weight of a single plant seedling.
In addition to germination characteristics, several other salt tolerance-related traits were also investigated. On the  [92,93]. The BLUPed STI were calculated with the nine STIs (three replications × three years) of each trait using the R lmer4 package. Then, four sets of STIs (STIs in three years and the BLUPed STIs) were used in the subsequent calculations. The weights of each principal component factor were calculated as Wi = Ri/ ∑ Ri (i = 1, 2, …, n), and R i means the contribution rate of the ith principal component factor.
Subordinate function analysis was performed as follow: U(Xi) = (Xi − Xmin)/(X max − Xmin) (i = 1, 2, …, n), where X i is the value of the ith principal component factor, and X min and X max indicate the minimum and maximum value of the ith principal component factor [10,11].

Association mapping
Marker-trait association was performed using the software TASSEL version 5.2.40 (RRID:SCR_012837), and the threshold value was set at p < 0.001 for declaring a significant marker-trait association. For each chromosome, the LD decay distance was regarded as the confidence interval for the candidate QTL detected on it. With the G. hirsutum AD1 genome NAU-NBI assembly v1.1 [39], candidate genes were gained for each QTL.

RNA-seq and DEG
Based on the comprehensive evaluation of salt tolerance of the 196 genotypes, the high salt-resistant accession Han682 (namely H1) was selected for RNA-seq analysis under salt stress. After being surface-sterilized with 0.1% HgCl, seeds were sown in wet sand in germination boxes. When the radicle had grown to 2-3 mm long, seeds with uniform radicles were selected and planted in sand beds with 0.3% (weight percentage, approximately 200 mmol/L) or 0% NaCl (the control) in other germination boxes. Root tissues were harvested at 3, 24, and 72 h after planting and immediately stored in liquid nitrogen. The collected root samples were designated as S3, S24, and S72 for the salttreated plants, and H3, H24, and H72 for the control plants, respectively. Each treatment was repeated twice.