Characterization and transcriptome analysis of a dominant genic male sterile cotton mutant

Background Male sterility is an efficient trait for hybrid seed production and germplasm innovation. Until now, most studies on male sterility were on cytoplasmic and recessive genic sterility, with few on dominant genic male sterility, especially in cotton, due to lack of such mutant. Results We discovered a natural male sterile (MS) Sea Island cotton (G. barbadense) mutant. Genetic analysis showed the mutation was caused by a dominant mutation in a single nuclear gene. Comparative cytological observation of anther sections from MS and wild-type (WT) uncovered cellular differences in anther at and after the tetrad stage of pollen mother cells (PMC). In the MS anthers, the outer wall of pollen grains was free of spinules, the tapetum was vacuolated and showed delayed degradation, consequently, no functional pollen grains. Comparison of transcriptomes from meiosis, tetrad, mononuclear and binuclear pollen, and pollen maturation stages identified 13,783 non-redundant differentially expressed genes (DEGs) between MS and WT. Based on the number of DEGs, analyses of enriched GO terms and KEGG pathways, it was evident that significant transcriptomic changes occurred at and after the tetrad stage, consistent with cytological observation, and that the major differences were on metabolism of starch, sucrose, ascorbate, aldarate, alanine, aspartate and glutamate, and biosynthesis of cutin, suberine and wax. WGCNA analysis identified five modules containing 920 genes highly related to anther development, especially the greenyellow module with 54 genes that was highly associated with PMC meiosis and tetrad formation. A NAC transcription factor (Gh_D11G2469) was identified as a hub gene for this module, which warrants further functional characterization. Conclusions We demonstrated that the MS trait was controlled by a single dominant nuclear gene and caused by delayed tapetum degradation at the tetrad stage. Comparative transcriptome analysis and gene network construction identified DEGs, enriched GO terms and metabolic pathways, and hub genes potentially associated with anther development and the MS trait. These results contribute to our understanding of dominant genic male sterility (DGMS) and provided source for innovation of cotton germplasm.


Background
Male sterility generally refers to a biological trait in which monoecious plants maintain fully normal female functions, while male gametes cannot be produced. Adverse growth conditions, diseases or gene mutations can cause male sterility. Male sterility is typically divided into cytoplasmic sterility (CMS) and genic male sterility (GMS) [1]. CMS, caused by dysfunctional mitochondrial genes, shows typically non-Mendelian inheritance [2]. GMS, controlled by nuclear genes, is generally recessive mutations which affect a huge number of biological functions of plants [3]. Male sterility is an effective pollination control system and an important tool for hybrid seed production. Understanding the development of pollen and anther is essential for sexual reproduction. Many studies have probed physiological and biochemical changes during pollen and anther development in different plant species, and investigated the mechanisms of gene regulation and metabolism related to pollen development [4][5][6][7][8]. Some studies have identified key genes involved in pollen and anther development [9], such as MALE STERILITY 2 (MS2), CYP703A2, and CYP704B1 [10][11][12] that are involved in anther cell differentiation and division, pollen cell wall development, and anther dehiscence [13][14][15][16][17][18].
The formation of anthers undergoes a very complicated process. At the microsporocyte stage, the anther wall consists of four layers: epidermis, endothecium, middle layer and tapetum [19]. Proper programmed cell death (PCD) leading to controlled degradation of layers of anther wall is very important for the development of functional pollen grains. Tapetum is the most inner anther wall directly connecting with pollen mother cells (PMC), and its main function is to provide adequate nutrients for pollen development [20], thus, the development and timely degradation of the tapetum layer is critical for the development of pollen wall. Several genes have been shown to be involved in PCD and tapetum degradation. MS1 encodes PHD (plant hemoedomain) transcription factor in Arabidopsis. The ms1 mutant showed vacuolization of tapetum cells and delayed PCD, leading to male sterility [21,22]. Many transcription factors, such as DYT1, TDF1 and AMS, have been found to be involved in the regulation of tapetum degradation and development of anther wall. The dyt1 mutant showed abnormal cell division and vacuolization of tapetum cells, resulting in formation of abnormal tetrads [23]. TDF1 is a R2R3 MYB transcription factor and functions downstream of DYT1. The tdf1 mutant showed vacuolized tapetum cells, leading to abnormal secretion and degradation of callose enzyme [24]. AMS is a basic helix-loop-helix (bHLH) transcription factor. The ams mutant showed vacuolization and delayed degradation of tapetum cells as well as defects in microspore outer wall, indicating that AMS is essential for tapetum cell development and microspore formation [25].
Cotton is the most important fiber crop and has distinct heterosis. However, use of hybrid vigor in cotton production is limited owning to lack of suitable male sterile materials for production of F1 seeds and high cost involved in production of F1 seeds by manual emasculation. Both CMS and GMS are available in cotton. In addition to genetic analysis of the MS mutants, physiological, biochemical, cytological and transcriptomic studies have been applied to investigate the MS mechanism. For instance, genes involved in hormone biogenesis and signaling, carbon and energy metabolism, and pollen wall development were found to be differentially expressed in different developmental stages of anthers from the "Dong A" MS mutant and its corresponding WT [26]. In 1355A, a GMS mutant caused by a single recessive mutation, the pollen wall of the mononuclear stage anther was thickened and had no spinules [27].
Despite the progress on both CMS and GMS studies in cotton, novel MS germplasm is desperately needed, particularly those stable GMS ones. In this study, we reported a new natural GMS mutant, which was found in the field of Sea Island cotton (G. barbadense). The sterile trait of the GMS mutant is controlled by a single dominant nuclear gene and has been introgressed into both elite G. barbadense and upland cotton (G. hirsutum) backgrounds. We further compared anther development and transcripome of Shida 98-6 (the recurrent upland cotton variety) and its nearly isogenic MS line Shida 98-6A to know the cellular phenotype of the MS mutant and the gene networks related to the MS trait.

Genetic analysis of the male sterile trait
After discovery of the male sterile mutant, the male sterile trait has been transferred to Shida 98-6 (G. hirsutum) or Xinhai 53 (G. barbadense) background by five times of backcross using the sterile segregants as the female parent and Shida 98-6 or Xinhai 53 as the pollen donor. In each BC1F1 generation, we observed roughly equal number of sterile and fertile segregants. All progeny of the fertile segergants were fertile. These results suggest that the male sterile trait was caused by a single dominant nuclear gene that can only be maintained in heterozygotes. To further confirm this hypothesis, we generated several segregating populations by crossing sterile (designated Shida 98-6A or Xinhai 53A) or fertile BC5F1 segregant to the recurrent parent (Shida 98-6 or Xinhai 53) or another G. hirsutum variety Xinluzao33 (

Phenotypic characteristics of the male sterile mutant
No morphological difference was observed between MS (Shida 98-6A) and WT (Shida 98-6) during vegetative growth and in the mature plants. During the period of reproductive growth, the MS and WT plants did not show noticeable difference in flowering time, petal color, flower bud size and other floral organs, but showed an obvious difference in anther appearance and development. The WT anthers were light yellow and normally dehisced to produce pollen grains at the mature stage (Fig. 1a). However, the MS anthers were dark yellow, and did not dehisce at the mature stage (Fig. 1b), and consequently no pollen grains were produced in MS (Fig. 1c, d). In order to identify the developmental  differences in anthers between WT and MS, we dissected anthers and compared cytologically anther development between WT and MS at different developmental stages.
We determined the anther developmental stages based on optical microscope observation of paraffin sections according to the previous method [28]. The MS and WT anthers had similar cytological characteristics and no obvious differences in cellular structures at the meiosis stage of PMC (Fig. 1e, i). A clear defect in the MS anthers was first observed at the tetrad stage. At this stage in the WT anthers, microspore was released, the pollen grains were not full but crescent-shaped, and spinules protruding from the exine were formed, which were dyed pink by Safranine O (Fig. 1f), whereas in the MS anthers, microspore also released normally, however, development of microspore exine showed obvious abnormalities, including being unable to form spinules protruding and closely vacuolated tapetum cells (Fig. 1j). At the mononuclear and binuclear pollen stage, pollen grains of WT began to accumulate starch, which could be dyed green by Fast Green FCF, the outer walls of the pollen grains had normal spinules protruding, which were dyed pink by Safranine O, and the tapetum was absorbed and utilized as nutrients (Fig. 1g). In the MS mutant, the tapetum disintegrated and could not be completely absorbed by pollen grains as nutrients and showed a delayed degradation, and the microspores had no material accumulation as indicated by no staining by Fast Green FCF. Furthermore, pollen grains had no spinules protruding, could not be dyed pink by Safranine O, and finally aborted (Fig. 1k). At the mature stage, the WT pollen grains could be colored by Safranine O and Fast Green FCF (Fig. 1h), whereas no pollen grains but only the defective pollen wall cavities were observed in the MS anthers (Fig. 1l).

Transcriptome analysis
To explore the molecular mechanism underlying anther abortion observed in the MS mutant, we compared transcriptomes of WT and MS anthers of the four developmental stages described in the last section, i.e. meiosis, tetrad, mononuclear and binuclear pollen, and pollen maturation stages, considering that the differences of anther cellular phenotypes started to be observed at the tetrad stage. In total, 24 libraries (2 genotypes × 4 developmental stages × 3 biological replicates) were sequenced and a total of 1,161,509,206 raw reads were generated. The sequence of nucleotide mass fraction Q30 greater than 30 was 92.94% in all samples, and the GC content was 43.22%. Raw reads were filtered to remove low quality ones and a total of 1,145,504,498 clean reads were finally used in alignment. Approximately 96.04% of the clean reads could be aligned to the TM-1 reference genome (see methods for details) with8 9.66% of them being uniquely aligned (Additional file 3: Table S1).
In the four anther developmental stages, a total of 21, 789 genes were found to be differentially expressed between MS and WT. Of those differentially expressed genes (DEGs), 17,028 (78.15%) were down-regulated and 4761 (21.85%) were up-regulated (|log 2 (fold change) | ≥2 and p adj < 0.05; Fig. 2a). Compared with WT, MS had a much higher number of down-regulated genes than that of up-regulated genes at all four developmental stages, especially at the two late stages (Fig. 2a). Of the 21,789 DEGs, 13,783 were unique ones with 109, 1041, 2081 and 4333 unique to meiosis, tetrad, mononuclear and binuclear pollen, and mature pollen stage, respectively, and 128 genes showed differential expression at all the four developmental stages (Fig. 2b). Of the upregulated genes (MS vs WT), 2157 (45.31%) had a 2-3 folds change in expression while 874 (18.35%) had an over 5 folds change in gene expression. Among the down-regulated genes, 4343 (25.51%) had a 2-3 folds expression difference while 7317 (42.97%) had a > 5 folds expression difference (Fig. 2c).

Gene ontology analysis of DEGs
Gene Ontology (GO) analysis was performed using the DEGs identified between MS and WT in the four different anther developmental stages. Using the criterion of corrected P Value ≤ 0.05, we found that the 13,783 DEGs were enriched for 118 GO terms (Additional file 4: Table  S2), and identified 5, 95, 125 and 107 GO terms enriched at the meiosis, tetrad, mononuclear and binuclear pollen, and pollen maturation stage, respectively, with 2, 33, 34 and 21 GO terms unique to each of the corresponding stage (Fig. 4a). The number of enriched GO terms were significantly fewer at the meiosis stage (the top 30 GO terms instead of the 5 enriched ones were shown in Fig. 4b) than at the other three stages (Fig. 4c). Three GO terms were enriched at both the meiosis stage and the tetrad stage, 6 GO terms were enriched at both the tetrad stage and the mononuclear and binuclear pollen stage, 33 GO terms were enriched at both the mononuclear and binuclear pollen stage and the pollen maturation stage, and only 1 GO term was enriched at both the tetrad stage and the pollen maturation stage. A total of 52 GO terms were commonly enriched from the tetrad stage to the pollen maturation stage, and no commonly enriched GO term was found in all the four stages (Fig. 4a). The enriched GO terms were quite different between the meiosis stage and the other three stages (Additional file 5: Table S3). For example, the two biological process terms enriched at the meiosis stage were related to asparagine metabolism, whereas those enriched in the other three stages were mainly related to cell differentiation and cell wall organization or biogenesis (Fig. 4b, c). These results together with the number of DEGs identified at the four developmental stages suggest that the significant difference in anther development between MS and WT most likely starts from the tetrad stage, consistent with the cytological observation.

KEGG pathways of DEGs
In order to further analyze the DEGs, we performed pathway analysis. Analysis of all the 13,783 non-redundant DEGs found 109 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways (Additional file 6: Table S4). Many significantly changed pathways were related to metabolism of starch and sucrose, plant-pathogen interaction, galactose metabolism, biosynthesis of endocytosis, sesquiterpenoid and triterpenoid, inositol phosphate metabolism, phenylpropanoid biosynthesis, and phagosome (Additional file 6: Table S4), many of them are associated with tapetum and pollen wall development. The enriched KEGG pathways at the meiosis stage included those involved in metabolism of starch, sucrose, alanine, aspartate and glutamate, interconversions of pentose and glucuronate, and  (Fig. 5a). The DEGs involved in zeatin biosynthesis included genes encoding cytokinin oxidase/dehydrogenase (Gh_ D04G0688) and don-glucosyltransferase (Gh_D08G1604). Among the DEGs involved in the metabolism of cutin, suberine and wax were genes encoding Jojoba acyl CoA reductase-related male sterility protein (Gh_A09G1215, Gh_D09G1221), cytochrome P450 (Gh_D12G2271) and caleosin-related family protein (Gh_D10G1585) (Additional file 6: Table S4, Additional file 1: Fig. S1B). At the tetrad stage, pathways related to metabolism of starch, sucrose, cyanoamino acid, ascorbate and aldarate, and interconversions of pentose and glucuronate were enriched (Fig. 5b), including genes encoding serine transhydroxy methyltransferase (Gh_A11G1879), BS-glucosidase (Gh_D07G2431, Gh_D01G2109) and ascorbate peroxidase (Gh_A13G2003) (Additional file 6: Table S4, Additional file 1: Fig. S1C). The pathways enriched at the mononuclear and binuclear pollen stage were related to metabolism of ether lipid and phenylalanine, including genes that encode phospholipid/glycerol acyltransferase (Gh_A01G0955) and peroxidase (Gh_ A08G2028 and Gh_D11G0463) (Additional file 6: Table S4, Additional file 1: Fig. S1D). The terpene synthase pathway with 21 DEGs, which is related to sesquiterpene and triterpene biosynthesis, was identified at the pollen mature stage (Additional file 6: Table S4, Additional file 1: Fig. S1E). These results suggest that anther development in cotton is controlled by a complex gene network regulating multiple metabolic pathways.

Gene network analysis with WGCNA
Weighted Correlation Network Analysis (WGCNA) is a systematic biological approach for describing patterns of genes association with different samples. To identify the specific genes that are highly correlated with anther development, the co-expression networks were generated by WGCNA using the 13,783 non-redundant DEGs and all biological samples. A total of 11 gene modules associated with the specific expression profiles of different samples were identified (Fig. 6a). Of the 11 modules, 5 (greenyellow, green, purple, yellow and pink) were significantly associated with tetrad, mononuclear and binuclear pollen, or pollen maturation stage in WT or MS, and no gene module was significantly associated with the meiosis stage although the blue module was marginally significant in MS (Fig. 6b). The greenyellow module with 54 genes was highly associated with the tetrad stage of WT (Shida 98-6). The green (191 genes) and turquoise (3262 genes) modules were significantly and marginally significantly associated with the mononuclear and binuclear pollen stage in MS and WT, respectively. There were 78 genes in the purple module that was significantly associated with the pollen maturation stage of WT. The yellow (492 genes) and pink (105 genes) modules were significantly associated with the pollen maturation stage in MS in addition to the marginally significant black module (Fig. 6b).
The 54 DEGs of the greenyellow module were enriched for 10 GO terms, including asparagine synthase  glutamine-hydrolyzing) activity, asparagine metabolic process, asparagine biosynthetic process, aspartate family amino acid biosynthetic process and aspartate family amino acid metabolic process (Additional file 2: Fig.  S2A). At the meiosis and tetrad stages, the greenyellow module was highly associated with alanine, aspartate and glutamate metabolism (Corrected P Value = 9.15 × 10 − 3 , 3 genes) and diterpenoid biosynthesis (Corrected P Value = 1.38 × 10 − 2 , 2 genes) (Additional file 2: Fig. S2B). The KEGG pathway analysis was also performed on the genes in each of other modules (Additional file 7: Table  S5), and their fragments per kilobase of transcript per million mapped reads (FPKM) values were shown in Additional file 8: Table S6.

Construction of co-expression gene networks and identification of hub genes associated with male sterility
According to cytological observation, the difference in anther development between WT and MS first appeared at the tetrad stage, our focus was thus on the greenyellow module as it was significantly associated with the tetrad stage in WT and most of its genes were significantly down-regulated at the tetrad stage in MS (Fig.  6b). Among these down-regulated genes, our focus was further on those that were also down-regulated during meiosis, as they might be related to the degradation of the tapetum and the formation of pollen grains in the early stage. In the greenyellow module, we identified 25 hub genes based on the criteria of eigengene-based connectivity (K ME ) value ≥0.9 and edge weight value ≥0.4. The major hub genes included genes encoding NAC domain containing protein 47 (Gh_D11G2469), MLP-like protein 423 (Gh_A04G0891), myb domain protein 2 (Gh_A11G0981) and glutamine-dependent asparagine synthase 1 (Gh_D09G0861) (Fig. 7, Table 2).

Tapetum and anther development related genes
Tapetum plays important roles in pollen development. Any abnormality in tapetum development will lead to pollen abortion. In Arabidopsis, a number of genes and transcription factors, including DYT1, TDF1, AMS, MYB103 and MS1, regulating development of tapetum have been reported [29]. In our anther transcriptome data, the cotton orthologs of these genes were clustered in the blue module (Fig. 6b), which was marginally associated with the meiosis stage in MS (Fig. 3b). Individually, these tapetum development related genes were not differentially expressed at the meiosis stage between WT and MS, but were significantly up-regulated at the tetrad stage in MS (Fig. 3b), suggesting that these genes might contribute to the MS trait although they might not be the cause of the MS trait. Several cotton orthologs of other Arabidopsis and rice genes related to anther and tapetum development were also differentially expressed at the meiosis and/or tetrad stages. For instance, the MS2 genes (Gh_A09G1215, Gh_D09G1221) encoding the Jojoba acyl CoA reductaserelated male sterility protein (Fig. 3b) were downregulated at the meiosis and tetrad stages in MS compared to WT, while ACOS5 (Gh_A03G1091, Gh_D02G1514) encoding acyl-CoA synthetase 5, and CYP703A1 (Gh_ A12G1506, Gh_D12G2768) and CYP703A2 (Gh_ A12G1014, Gh_D12G1132) encoding cytochrome P450 were up-regulated during the tetrad stage in MS anthers (Fig. 3b).

Discussion
Male sterility prevents autogamy and permits allogamy, and thus reduces the cost of hybrid seed production by eliminating the process of emasculation. The male sterile trait described in this study was controlled by a dominant mutation in a single nuclear gene (Table 1), which adversely affected anther and pollen development, resulting in complete pollen abortion (Fig. 1). The MS mutation was stable in different genetic backgrounds and under variable environmental conditions, it would be a highly valuable trait in cotton heterosis breeding. In breeding, recurrent selection is an effective strategy to improve the genetic structure of crop population and improve the frequency of favorable genes in the population through multiple hybridization and selection. It plays an important role in crop improvement as the favorable genes of different germplasm could be concentrated in one individual by recurrent selection [30][31][32]. We have introgressed the MS trait into different elite cotton genetic backgrounds, such as Shida 98-6 and Xinhai 53. The new MS germplasm provides a powerful tool for creating new germplasm resources because a large number of crosses could be easily performed by The MS mutant did not produce any pollen grains, which could be caused by developmental defects between the meiosis stage and the tetrad stage as the initial anther developmental differences between MS and WT were observed at the tetrad stage (Fig. 1f, j). At the tetrad stage, the tapetum of MS anther became vacuolated. At the mononuclear and binuclear stage, compared to the WT anther, the MS anther had more tapetum residue or remaining tapetum cells, the pollen wall became thickened (Fig. 1k), and pollen grains had no spinules (Fig. 1j, k). Tapetum is the innermost anther wall directly connected with pollen mother cells, playing an important role in pollen development. With the formation of microspores, tapetum cells began to functionally differentiate to form cells with secretory function. The degradation of tapetum provides necessary enzymes and nutrients for microspore development [33,34]. Tapetum degradation has thus to be very precisely regulated, and precocious or delayed degradation of tapetum could lead to microspore abortion [35]. Therefore, we speculate that the delayed tapetum degradation or failure in development of functional pollen grains might be the underlying cause of the MS trait we discovered.
It had been reported that mutations of transcription factor genes regulating tapetum development could lead to tapetum cavitation, tapetum irregular degradation and pollen wall defects in Arabidopsis thaliana [36]. A gene network or cascade, DYT1 -TDF1 -AMS -MYB103 -MS1, has been proposed to be involved in regulation of the development and degradation of tapetum in Arabidopsis [21,34]. TDF1 is a R2R3 MYB transcription factor gene. In A. thaliana, tapetum cells of the tdf1 mutant had serious morphological vacuoles and could not transit to secretory tapetum [37]. Its rice ortholog (OsTDF1) played a role similar to that of A. thaliana [38]. Mutation in OsTDF1 resulted in changes in hydrolytic enzyme, hydrophobic protein, cellulose, hemicellulose and pectin polymer, which play a role in the development of pollen wall. In the MS anther, the expression level of the cotton orthologs of Arabidopsis TDF1 and other genes of the network regulating tapetum development and degradation were up-regulated at the tetrad stage, suggesting ongoing degradation of tapetum at this stage in the MS anther, consistent with the observed phenotype of delayed degradation of tapetum in the MS anther (Fig. 1j, k).
In rice and A. thaliana, Reactive oxygen species (ROS) regulates PCD in the tapetum, thereby affecting the development of male gametes [39,40]. To maintain the dynamic balance of ROS, plants contain enzymes that produce ROS (such as NADPH oxidase and peroxidase) and enzymes that remove ROS (such as ascorbate peroxidase and glutathione peroxidase). Antioxidants, such as ascorbic acid and glutathione mainly act on scavenging hydroxyl radicals and singlet oxygen [41]. Ascorbic acid and glutathione related genes were enriched in DEGs at the tetrad stage based on KEGG analysis (Fig. 5b) and the expression level of genes was lower in MS than in WT (Additional file 1: Fig. S1C), which might lead to the failure of scavenging of ROS in cells of the MS mutant. Consequently the ROS homeostasis could not be maintained in cells of the MS mutant, leading to destroying the critical period of anther development and the occurrence of plant sterility. GO analysis showed that asparagine synthase (glutamine-hydrolyzing) activity, asparagine biosynthetic and metabolic process were enriched at the meiosis stage and the greenyellow module (Fig. 4b, Additional file 2: Fig. S2A), and KEGG analysis found enriched alanine, aspartate and glutamate metabolism in the greenyellow module (Additional file 2: Fig. S2).
Based on the WGCNA analysis (Table 2) and the correlation networks map (Fig. 7), a gene encoding NAC domain containing protein (Gh_D11G2469) was identified as the candidate hub gene highly associated with anther development at the meiosis and tetrad stages. Gh_ D11G2469 is homologous to Arabidopsis genes AT1G61110 and AT3G04070. AT1G61110 is specifically expressed in tapetum based on GUS fusion analysis. Mutation of the gene led to defects in tapetum and pollen development, and finally male sterility [42]. The functionality of AT3G04070 has not been reported. Downregulation of the rice homologue, Os07g37920, of Gh_ D11G2469 reduced the proportion of viable pollen grains and caused male sterility [43]. Considering these demonstrated roles of Arabidopsis and rice homologues in anther and pollen development, the role of Gh_ D11G2469 in cotton anther and pollen development warrants further characterization.

Conclusions
We described a natural male sterile mutation found in Sea Island cotton, which is controlled by a single dominant nuclear gene. Comparison of sterile and fertile anthers demonstrated cytologically that male sterile was due to pollen abortion caused by developmental defects during the transition from meiosis to tetrad formation. Comparative transcriptome analysis supported the cytological observations. Gene network and pathway analyses identified gene modules and candidate genes regulating anther and pollen development. The MS trait has been introgressed into elite G. hirsutum and G. barbadense backgrounds, providing novel MS germplasm for developing cross combinations with hybrid vigor, production of hybrid seeds, and innovation of germplasm resource.

Plant materials
In the summer of 1992, we found a male sterile Sea Island cotton plant (G. barbadense). Flowers of the sterile plant were pollinated with pollens from fertile plants to preserve the sterile trait. The F1 population showed roughly equal number of fertile and sterile plants, and all F2 progeny derived from the fertile F1 plants were fertile, indicating that the male sterile trait is controlled by a single dominant genic mutation. The male sterility mutant used in this study was provided by the Cotton Research Institute of Shihezi University. First, the male sterility of sea island cotton was transferred to upland cotton through backcross generations, the sterile trait was then introgressed into Upland cotton Shida 98-6 (G. hirsutum) and Sea Island cotton Xinhai 53 (G. barbadense) background by backcross (six times) to breed nearly isogenic male sterile line Shida 98-6A and Xinhai 53A, respectively. Several populations, including two derived from cross Shida 98-6A or Xinhai 53A with another Upland cotton variety Xinluzao 33, were generated for genetic analyses ( Table 1). The segregating populations were grown in the experimental field of the Cotton Institute of Shihezi University in Xinjiang, China. The plants (WT and MS) used in microscope observation and transcriptome analysis were grown in artificial climate chamber at 28-30°C with a 16 L: 8 D photoperiod scheme and 60% relative humidity.

Histological analyses
Different sizes of flower buds with a diameter < 9 mm were harvested from MS and WT plants to identify anthers at the following four developmental stages: meiosis, tetrad, mononuclear and binuclear pollen and pollen maturation stages using optical microscopy [28]. The anthers of the four developmental stages were separately collected, frozen in liquid nitrogen and stored at − 80°C for subsequent experiments. For histological observation, the samples were fixed in FAA (5% formalin, 5% glacial acetic acid, 70% alcohol), dehydrated with increasing concentration of alcohol (70, 85, 95, 100%) for 1 h each, and then treated with increasing concentration of dimethylbenzene (1/2 alcohol + 1/2 dimethylbenzene, 1/ 4 alcohol + 3/4 dimethylbenzene and pure dimethylbenzene) for 1 h each, and finally soaked and embedded in wax. After embedding, the samples were sliced to sections with a thickness of 8 μm. A drop of sticky tablet sticks and spreads were dropped on a clean glass slide with the sliced sections, and baked and dried. The samples were dewaxed stained by 1% Safranine O (configured with 85% alcohol) and 1% Fast Green FCF method (configured with 95% alcohol), and then dehydrated, transparent, and mounted for observation using optical microscopy.

RNA extraction, library construction, and RNA-Seq
The anthers collected at different stages were sent to Beijing Novogene bioinformatics technology Co., Ltd. for RNA-Seq. The purity and integrity of the RNA samples were determined by NanoPhotometer Spectrophotometer and Agilent 2100 Bioanalyzer. RNA integrity and possible contamination were further determined using 1% agarose gel electrophoresis. A total of 3 μg total RNA per sample was used in RNA-seq library preparation (each developmental stage with three biological replicates). Sequencing was done using an Illumina HiSeq 4000 sequencing platform with 150 base pair (bp) paired-end (PE) reads.

Data processing
Raw data in FASTq format were processed using Perl scripts to determine the quality of the data, including the percentage of GC content, Q20 and Q30. Lowquality reads were filtered and eliminated to obtain clean reads for subsequent analysis. Indexes of the G. hirsutum reference genome were built by Bowtie v2.2.3 and the gene model annotation files were downloaded from the CottonGen database (http://www.cottongen.org). The generated clean reads were mapped to the G. hirsutum TM-1 genome [44] using TopHat v2.0.12. The Cufflinks software was used to calculate the expression levels of individual genes using FPKM [45]. The DESeq software was used to determine differential gene expression through negative binomial distribution and calculation of false discovery rate (FDR) by the Benjamini and Hochberg method, as well as the adjusted p value . The adjusted p value < 0.05 was used as the threshold for identification of DEGs among samples.

Function annotation of DEGs and KEGG pathways
The functions of DEGs were annotated based on homology with the annotated Arabidopsis genes using TAIR (Arabidopsis information resources). GOseq [46] was used to perform GO functional enrichment analysis of DEGs. KEGG enrichment was done by the KOBAS 2.0 software.

Gene network construction and screening of hub genes
The WGCNA R package [47] was used to construct the co-expression networks associated with anther development. The hub genes were screened on the basis of the module K ME values and high-weight values. The correlation networks were drawn using Cytoscape 3.6.1.

Quantitative RT-PCR validation of DEGs
Quantitative RT-PCR (qRT-PCR) was performed to validate DEGs according to previously reported protocol [48]. Gene-specific primers were designed according to the cDNAs sequences with NCBI Primer-BLAST (https://www.ncbi.nlm.nih.gov/tools/primer-blast/) (Additional file 9: Table S7). Total RNA was extracted from the samples using RNA extraction kit (Takara). And then total RNA (1 μg) was denatured and reverse transcribed by using PrimeScript™ RT reagent Kit with gDNA Eraser (Takara) at 42°C for 15 min, 58°C for 5 s. qRT-PCR was performed on Light Cycler® 480II, using Power SYBR Green PCR Master Mix, and 10 μl PCR reactions contained 1 μl of cDNA, 5 μl of SYBR Green PCR Master Mix, 10 μM of each pair of target primers and 3 μl of H 2 O. PCR conditions were as follows: preincubation at 95°C for 5 min, followed by 40 cycles of 94°C for 15 s, 58°C for 20 s, and 72°C for 20 s, then melting curve with 95°C for 5 s, 65°C for 1 min, 97°C for 1 min, finally cooling at 40°C for 10 s. Three technical replicates from three independent biological experiments were performed for qRT-PCR analyses. The relative expression levels were analyzed according to the 2 −ΔΔC T method and ubiquitin (GhUBI, XM_ 012634824) was used as a reference gene [49].