Analysis of CcGASA family members in Citrus clementina (Hort. ex Tan.) by a genome-wide approach

The Gibberellic Acid Stimulated Arabidopsis (GASA) proteins were investigated in the study to help understand their possible roles in fruit trees, particularly in Citrus. A total of 18 CcGASA proteins were identified and characterized in Citrus clementina via a genome-wide approach. It was shown that the CcGASA proteins structurally shared a conserved GASA domain but varied considerably in primary sequences and motif compositions. Thus, they could be classified into three major phylogenetic groups, G1~G3, and two groups, G1 and G3 could be further classified into subgroups. The cis- elements on all CcGASA promoters were identified and categorized, and the associated transcription factors were predicted. In addition, the possible interactions between the CcGASA proteins and other proteins were predicted. All the clues suggested that these genes should be involved in defense against biotic and abiotic stresses and in growth and development. The notion was further supported by gene expression analysis that showed these genes were more or less responsive to the treatments of plant hormones (GA3, SA, ABA and IAA), and infections of citrus canker pathogen Xanthomonas citri. It was noted that both the segmental and the tandem duplications had played a role in the expansion of the CcGASA gene family in Citrus. Our results showed that the members of the CcGASA gene family should have structurally and functionally diverged to different degrees, and hence, the representative group members should be individually investigated to dissect their specific roles. Supplementary Information The online version contains supplementary material available at 10.1186/s12870-021-03326-6.


Introduction
The Snakin/Gibberellic Acid Stimulated in Arabidopsis (GASA) is a unique multigene family. Since the isolation of GAST1 (Gibberellic Acid Stimulated Transcript 1) from tomato [43], many GASA protein family members have been characterized in different species, such as potato [42], common wheat [13], soybean [2], Arabidopsis [6,38], petunia [8], rice [15], apple [14], grapevine [1] and maize [56]. A comprehensive genome sequence analysis of 33 plant species revealed approximately 445 Snakin/GASA protein encoding genes [45]. Further bioinformatics data mining showed that the Snakin/GASA genes were present in all well-characterized or sequenced plant species but were completely absent in moss and green algae [17]. It is known that the GASA family peptides share a conserved C-terminal domain, designated as GASA. The GASA domain contains 12 cysteine residues (Cys-motif ) arranged in the pattern of "XnCX 3 CX 3 CX 8( GASA proteins are known to play diverse roles in plants. They are involved in the regulation of growth and development processes, including cell division [34], stem elongation [8], floral induction [36], seed germination [38,39], lateral root formation [56] and fruit development [33]. The GASA proteins are also linked to stress responses, such as resistance to heat [26], drought, and paclobutrazol (PBZ) stresses [51], tolerance to salt and oxidative stresses [3], and modulation of reactive oxygen species (ROS) [47]. Moreover, GASA proteins have shown suppressive effects on a wide range of bacterial and fungal pathogens. Purified StSN1 peptide, for example, was found in an in vitro challenge experiment to be toxic to several fungal pathogens like Fusarium solani, Fusarium culmorum, Bipolaris maydis and Botrytis cinerea, and to bacterial pathogens such as Clavibacter michiganensis subsp. Sepedonicus [46]. The GASA family proteins even exhibit anti-viral and anti-nematode activities, as exemplified by GmSN1 that enhances soybean mosaic virus resistance in both Arabidopsis and soybean [19] and by CaSn that promotes nematode-resistance in pepper [31]. The citrus CcGASA4 was shown to be highly induced in citrus leaves following infection of Citrus tristeza virus [54].
GASA family genes are implicated in the responses of plants to hormones such as gibberellin (GA), abscisic acid (ABA), salicylic acid (SA), jasmonic acid (JA) and ethylene (ETH). Most GASA genes including Arabidopsis AtGASA4, 6,7,8 and 13 [39,57], rice OsGASR1 and 2 [15], petunia GIP 1, 2, 4 and 5 [8], maize ZmGSL1, 2, 4, 6 and 9 [56], were induced by exogenous GA treatment. However, some other GASA genes, such as Arabidopsis AtGASA1, 5, 9 and 11 [57], and potato StSN2 [9] were repressed by GA. Interestingly, some GASA genes exhibited tissue-specific responses to GA applications. GsGASA1 expression was induced in leaves but repressed in roots by exogenous application of GA in Glycine soja [30]. AtGASA4 was up-regulated in most, if not all, meristematic regions, presumably in actively dividing cells, but was down-regulated in cotyledons and leaves following GA treatment [6]. ABA was shown to induce the expression of AtGASA2, 3, 5 and 14, and inhibit the expression of AtGASA7 and 9 in Arabidopsis [57]. The expression of StSN2 was induced [9] whereas Snakin-3 was down regulated by ABA treatment in potato [35]. In addition, some members of the GASA family, such as GAST1 [37,43], StSN2 [9] and GASA5 [57], were regulated by GA and ABA antagonism. The GA mediated increase in GAST1 transcripts was partially inhibited by ABA in tomato [43]. AtGASA1 was up-regulated by GA and down-regulated by ABA in Arabidopsis [37]. StSN2 was up-and downregulated by ABA and GA, respectively, in potato [9]. The expression of GASA5 was repressed by GA 3 and enhanced by ABA [57]. GASA genes are also responsive to other hormones. Brassinoesteroid (BR) synthesis was activated by OsGSR1 by directly regulating a BR biosynthetic enzyme [52]. The transcription of HbGASA7-1, 14 and 16 was significantly increased after ETH, SA, or JA treatment in Hevea brasiliensis [4].
Less information about the GASA genes is available for trees, particularly for fruit trees. Nevertheless, 14 VvGASA genes in grapevine (Vitis vinifera L.) [1], 26 MdGASA genes in apple (Malus domestica) [14] and a CcGASA4 gene in citrus [54] were reported. Citrus is one of the most important fruit trees worldwide. A steady increase in global per capita consumption of citrus fruits has been witnessed in the past 30 years [32]. Citrus production is, however, being threatened by numerous biotic and abiotic stresses. Identification and functional analysis of citrus defense-and stress-related genes should deepen our understanding of the molecular mechanisms of stress-responses and lend helps in improving stress tolerance in plants. Considering that a citrus GASA gene was highly responsive to the Citrus tristiza virus, we performed a detailed bioinformatics analysis of the relevant gene family, aiming to provide first tier information for dissecting their exact roles in the defense of citrus against stresses.

Identification of putative Citrus clementina GASA genes
The GASA protein sequences of Arabidopsis, apple [14] and grape [1] were downloaded from the EnsemblPlants online database (http:// plants. ensem bl. org/ info/ data/ ftp/ index. html) and analyzed using the HMMER (https:// www. ebi. ac. uk/ Tools/ hmmer/) to build a model based on the GASA domain (Accession: pfam02704). The model was used to query the entire C. clementina genome to obtain all putative clementine mandarin GASA proteins (Supplementary Table S1). The integrity of the GASA domain of every CcGASA protein sequence was verified using the Simple Modular Architecture Research Tool (SMART: http:// smart. embl. de/) [24]. All non-redundant putative protein sequences with conserved GASA domain were reserved and used for further analysis. GASA proteins from other three citrus species, pommelo (Citrus maxima), sweet orange (Citrus sinensis) and trifoliate orange (Poncirus trifoliata), were also identified following the same protocol.
All CcGASA protein coding sequences, genomic sequences and the associated information such as accession number and chromosomic position, were downloaded from the Phytozome database (https:// phyto zome. jgi. doe. gov/ pz/ portal. html#). In addition, the physical location of each CcGASA gene on the genome was mapped by the Mapchart software.

Promoter analysis of CcGASA genes
Around 1.9-kb long promoter sequence upstream of the start codon (ATG) of each CcGASA gene was downloaded from the Phytozome12.0 database [18]. The cisregulatory elements on the promoters were analyzed by using the Plant Cis-Acting Regulatory DNA Elements (PlantCARE) program (http:// bioin forma tics. psb. ugent. be/ webto ols/ plant care/ html/) [29].

Expression analysis of CcGASA genes
The 4-year-old C. clementina trees had been planted in plastic pots and used as experimental materials. For the biotic stress treatment, mature leaves of similar size and shape were picked and brought to lab in a humidified chamber, and immediately pierced at ten evenly spaced points at the back of each leaf with a syringe needle. The pierced leaves were placed face down in a try lined at the bottom with water-soaked filter papers. Ten micro litter of Xanthomonas citri subsp. Citri (Xcc) cell cultures, diluted to 5 × 10 8 cfu/mL (OD 600 ≈ 0.3), was applied to cover the pierced holes. For controls, 10 μL of water was applied. The trays were then misted with water and covered with plastic films, and sampled at 0 h, 12 h, 24 h, 48 h, and 72 h. Samples were immediately frozen in liquid nitrogen, and then stored in a -80°C refrigerator until use. For hormone treatment, trees were sprayed with 100 μΜ of GA 3 , 2 mΜ of SA, 100 μM of indole-3-acetic acid (IAA) and 200 μΜ of ABA, and leaves were harvested after 3h, 6h, 12h and 24h, flash frozen in liquid nitrogen, and stored at −80°C. Three biological replicates were used in the study.
Total RNA was extracted from frozen leaf samples using the Polysaccharide Polyphenol plant total RNA rapid extraction kit (Bioteke, China) following the manufacturer's instructions. RNA concentrations were determined using the NanoDrop2000C (ThermoFisher Scientific, USA), and RNA quality was evaluated by ratios of OD 260 / OD 280 . cDNA was synthesized from total RNA using the PrimeScript TM RT reagent kit with gDNA Eraser (TaKaRa, Japan). The quantitative real-time PCR (qRT-PCR) experiment was performed on a thermo-cycler, QuantStudio5 (ABI, USA). Each tube of qRT-PCR solution contained 10.0 μL iTaq TM Universal SYBR ® Green Supermix (Bio-Rad, USA), 1.0 μL primer pair F/R, 2.0 μL cDNA, and 6.0 μL dH 2 O. The thermo-cycler was programed as follows: an initial incubation at 95°C for 10 min and followed by 40 cycles of 95°C for 5 s + 60°C for 20 s. For melting-curve analysis, the program was set to 95°C for 15 s, 60°C for 2 min and then the temperature was progressively increased to 95°C at a constant rate of 0.2°C/s. The primers used were shown in Supplementary Table S2. The Actin (Gen-Bank accession: XM_006427792) was used as the reference gene. Since three proteins, CcGASA16, CcGASA17 and CcGASA18, were predicted to be derived from a single gene sequence (Ciclev10006243m.g) via alternative splicing, it was impossible to design primers to separate CcGASA18 from CcGASA16 and CcGASA17, and hence only CcGASA16 and CcGASA17 were eventually analyzed. For quality controls, three technical replicates were also used in addition to three biological replicates. The relative expression level, shown as the ratio of the analyzed gene to the reference gene, was determined by calculating the 2 −ΔΔCt (ΔCt = Ct CcGASA -Ct Actin). Data were statistically analyzed using ANOVA. Duncan's LSD multiple range test (p ≤ 0.05) was performed to reveal significant changes. Figures were drawn by using the Origin 2019b.
The leaves, stems, young fruits and roots from healthy C. clementina trees were subjected to transcriptome profiling in a commercial biotech company, Oebiotech Company (China; https:// www. oebio tech. com/) to reveal possible tissue-specific expression patterns. Briefly, total RNA was isolated from samples and used to prepare RNA-seq libraries. The libraries were then sequenced on Illumina Genome Analyzer platform. Clean reads were obtained by passing the row sequencing data through all quality control procedures, and were mapped to the C. clementina genome sequence using the HISAT2 software. The expression level of each gene in the RNA-seq libraries was calculated as the FPKM (Fragments Per Kilobase of transcript per Million fragments mapped) by using the Cuffquant and Cuffnorm software. The differentially expressed transcripts, defined by a fold change (|log2Fold Change|) of greater than 1 and a P value (false discovery rate, FDR) of less than 0.05, between samples were identified by using the DESeq2 [5] software.

Synteny analysis and calculation of Ka/Ks ratio for duplicated genes
CcGASA gene duplication events were identified according to the criteria proposed by Holub [21]. The Basic Circos function of TBtools software was used to show the interspersed segmental duplications using the data from the PLAZA (https:// bioin forma tics. psb. ugent. be/ plaza/ versi ons/ plaza_ v4_5_ dicots/) [49]. The Ka (non-synonymous substitution rate) and Ks (synonymous substitution rate) between the duplicated genes were calculated using the online tool PAL2NAL (http:// www. bork. embl. de/ pal2n al/ index. cgi? examp le= Yes# RunP2N). Mode of selection acting on the duplicated genes was evaluated through Ka/Ks ratio, and a positive, negative or neutral selection was considered when the ratio was > 1, < 1, or = 1, respectively. The gene loci of GASAs were extracted from the annotation gff3-file on the EnsemblPlants and the Orange (C. sinensis) Genome Annotation Project online database. Collinear pairs were extracted using TBtools [12] to identify syntenic blocks and duplications within the GASAs across the whole genomes of 7 species, C. clementina, C. sinensis, C. maxima, P. trifoliata, A. thaliana, M. domestica and V. vinifera. The collinearity map between these species was drawn with the help of MCScan X (TBtools software) program.

Analysis of transcription factor regulatory network and protein interaction network involving CcGASA proteins
Transcription factor (TFs) network prediction was performed online at the threshold parameter p-value ≤ 1e-5 on the Plant Transcriptional Regulatory Map (PTRM) website (http:// plant regmap. gao-lab. org/ regul ation_ predi ction. php), using all the CcGASA sequences as an input. The Cytoscape 3.8 software was used to visualize the transcription factor regulatory network [27]. The predicted TFs were subjected to GO analyses on the Omicshare cloud platform (https:// www. omics hare. com/ tools/). The functional interacting network models of CcGASA proteins were predicted using the web program STRING 11.0 (http:// string-db. org). The confidence parameter was set at a threshold of 0.40, and for other parameters the default values were used.

Genome-wide distribution of C. clementina GASA genes and features of their deduced proteins
Eighteen putative CcGASA proteins were found by searching the C. clementine protein database against a model built from other known plant GASA proteins (Table 1), and correspondingly, 15 CcGASA genes were identified. The proteins were sequentially designated according to their chromosomal locations as CcGASA1-18 in this study (Fig. 1). Comparison between CcGASA4 and CcGASA5 showed that both are coded by the same gene, Ciclev10033115m.g. Similarly, CcGASA16, 17 and 18 are coded by Ciclev10006243m.g. It should be noted that Ciclev10013454m, previously designated as CcGASA4 [54], was re-designated as CcGASA12 in this study. As shown in Fig. 1, the 15 CcGASA genes were located on 7 scaffolds on the C. clementina genome. More specifically, 4 of them were on scaffold 5, 3 on scaffold 6, 3 on scaffold 9, 2 on Scaffold 3, 1 on scaffold 2 and 1 on scaffold 4.
The deduced protein sequences of the 18 CcGASA transcripts varied in length from 70 to 206 amino acids (Fig. 2). They were low molecular weight peptides, mostly less than 13kDa, although CcGASA7 (15.86 kDa) and CcGASA11 (21.78 kDa) were slightly larger ( Table 2). All CcGASAs were relatively high in their isoelectric point (pI) values, for 7 of them had a pI value of higher than 8, and the others even higher than 9. They were mostly predicted to be unstable since 13 of them, except for CcGASA1, CcGASA2, CcGASA4, CcGASA5 and CcGASA6, had an instability index values of higher than 40. According to the Grand average of hydropathicity (GRAVY), the CcGASA proteins, excluding CcGASA2 and CcGASA8, were hydrophilic. Amino acid preference analysis showed that Cys, Lys, Ser, Leu and Pro were the preferable amino acids although Ala, Gly, Thr, Glu, Argand Tyr were also common. The aliphatic index values of the CcGASA proteins were different, varying from 25.14 to 84.74.
The WOLF PSORT II program predicted that the citrus CcGASA family proteins were mostly extracellular, or vacuole-and chloroplast-localized. A few of them might be located in endoplasmic reticulum, nucleus, mitochondria, cytoplasm, plastid and golgiosome ( Table 2). The 3D model prediction, with a confidence of more than 99.9%, showed that all of them were relatively flexible for possessing random coils (Supplementary Fig. S1). As can be seen, a   small α helix was located at the end of the N-terminal, which was connected next to a larger α helix, but the β-strand was only present on two of them, CcGASA7 and CcGASA17 (Supplementary Table S3). The transmembrane topology analysis showed that there was at least one transmembrane helix on CcGASA1, 3, 4, 5, 7, 8,10, and 12 ( Supplementary Fig. S2).

Phylogenetic relationship of the GASA proteins
An unrooted NJ phylogenetic tree was established by aligning all GASA protein sequences from citrus (56), Arabidopsis (15), apple (26) and grape (14) (Fig. 3 with G1, CcGASA2, 4, 5, 7, 8, 9 and 11 were grouped with G3, whereas CcGASA6 and CcGASA13 were classified with G2. Clearly, all the six species analyzed contained homologs of the three branches. The phylogenetic tree for all citrus GASA proteins was shown in Fig. 4a. A total of 7 conserved protein motifs could be identified from the citrus GASA proteins analyzed (Fig. 4b). They were represented by CLRAC GTC CARCLCVPPGTYGNKEVC (motif1), SGYTRGLLQSIDCGGLCAARCSLHSRPNP (motif2), CYTBMTTKGGKPKCP (motif3), MAFRAALLL-LATLLLVSTSVLSNNEEEYLLEKDTTYPKTPVPA-PAPPKAP (motif4), MASRVFLLLSJLLFC (motif5), PTVTPAPPLKPPTTYPPPVKPPTTTPPPVTPPKTA-PAPQVP (motif6) and IAVIENQDTQRGZEV (motif7), respectively. It was found that the three motifs, motif1, motif2 and motif3, were universally presented on every member analyzed. Motif5 was missing only in G3c and while motif4 and 6 were only found in the same G3c. Motif7 was appeared only on G3d and on three members of G1. Apparently, the branching of the phylogenetic tree was related to the differences in the arrangement of these motifs. For example, the G2 group members contained 4 closely spaced motifs, motif1, motif2, motif3 and motif5. Similarly, most of the G1 members also contained the same 4 motifs but unlike G2, their motif5 and motif2 was interrupted by insertion of motif7 or a non-motif spacer. Clearly, the citrus G3 members were relatively more diverse than those of G1 and G2 as shown by their motif compositions and motif arrangements, and thus could be further classified into 4 subgroups G3a-d (Fig. 4a). The G1group could also be sub-classified into four sub-groups, G1a-d, Some citrus CcGASA proteins should be structurally or functionally impaired for missing one or two motifs as compared to their respective group members. For instance, motif2 and motif5 were deleted from CcGASA14 and 15, and motif5 was missing in CcGASA9, 16 and 17. The arrangements of exons and introns were relatively conserved in the same group (Fig. 4c). Only one intron was found in G2 while 2 to 3 introns were presented on most of the G1 and G3 group members. Uniquely, two trifoliate orange genes, PtGASA8 and 14, and one citrus gene, CcGASA17, did not contain any intron.

CcGASA promoters and their possible activators
The cis-regulatory elements identified on the promoters of CcGASA genes were listed in Fig. 5 and Supplementary Table S5. It was shown that a large number of the elements were stress-related, such as ARE (antioxidant response element), GC-motif (enhancer-like element involved in anoxic specific inducibility), LTR (low-temperature responsiveness), MBS (drought-inducibility), DRE core (cold-and dehydration-responsiveness), TC-rich repeats (defense and stress responsiveness), box S (elicitation, wounding and pathogen responsievness), MYB(abiotic element), MYC(abiotic element), STRE (stress response element), WRE3 (wound-response element 3), WUNmotif (wound responsiveness) and W box (wounding and pathogen responsive). Notably, 2 to 9 MYCs were present on all CcGASA promoters. The ARE, essential for the anaerobic induction, was present on 17 CcGASA promoters but not on the CcGASA15 promoter. Hormone responsive elements were also abundant, including ABRE that is responsive to ABA, P-box, GARE-motif and TATC-box that are responsive to GA, AuxRR-core and TGA-element that are responsive to AUX, ERE that is responsive to ET, TCA and TCA that are responsive to SA, and TGACG-motif and CGTCA-motif that are responsive to MeJA. Light response cis-elements, including 3-AF1 binding site, ACE, AE-box, ATCT-motif, AT1-motif, Box II, Box 4, GATA-motif, G-box, GA-motif, GTGGC-motif, I-box, LAMP-element, GT1-motif, Gap-box, LS7, MRE, Sp1, TCCC-motif, TCT-motif, chs-CMA1a and chs-CMA2a, were also frequently found on CcGASA promoters. In addition, plant growth and development associated cis-elements, including meristem-specific expression elements, CCG TCC -box and CAT-box, plant seed and shoot development-related RY-element, circadian control element, MSA-like cell cycle control element, and the palisade mesophyll cells differentiation element HD-Zip 1, were also identified on CcGASA promoters.
The potential transcriptional regulatory network of the CcGASA gene family was analyzed. It was shown that the TFs that might bind to the above-mentioned cis-elements were numerous ( Supplementary Fig. S3 and Table S6). Most of the TFs were ERF, MYB and MIKC_MADS, for 23, 12 and 9 of them were respectively identified in the study. In addition, 5 Dofs, 4 ARFs, 4 C 2 H 2 s, and 4 HSF were also predicted to be the activators of the CcGASA genes. Functionally, these TFs were mostly associated with abiotic and biotic stresses such as pathogen attacks, heat shock and drought stresses. In addition, some TFs including ERF (Ciclev10009361m.g), ERF (Ciclev10025816m.g), ARF (Ciclev10000183m.g, Ciclev10011065m.g, Ciclev10014391m.g, Ciclev10030860m.g), GRAS (Ciclev10017466m.g), and C2H2 (Ciclev10002297m.g) that are responsive to plant hormones are known to be involved in plant growth and development regulations. Tissue specific TFs were also identified, including the lateral organ boundaries transcription factor LBD (Ciclev10024416m.g) and the root specific transcription factor NAC (Ciclev10010579m.g). Detailed GO function enrichment analysis showed that these putative transcription factors were mainly enriched in the cellular process (GO:0009987), multicellular organismal process (GO:0032501), developmental process (GO:0032502), single-organism process (GO:0044699),   (Fig. 6).

Expression of CcGASAs under treatments of Xcc and plant hormones
The basal expression of all CcGASA genes was analyzed in different organs of C. clementina (Fig. 7). The results showed that CcGASA1, CcGASA3, CcGASA7, CcGASA12, CcGASA16, CcGASA17 and CcGASA18 were preferably expressed in leaves. CcGASA6, CcGASA8, CcGASA9 and CcGASA11 were highly expressed in fruits. CcGASA2 was mainly expressed in roots. CcGASA4, CcGASA5 and CcGASA10 were most abundantly expressed in stems. CcGASA14 was predominantly expressed in fruits and, to a lesser extent, in roots. Similarly, CcGASA13 was most highly expressed in leaves and next highly expressed in stems.
The inducibility of the CcGASA genes by Xcc was investigated. As shown in Supplementary Fig. S4, the detached C. clementina leaf explants developed watersoaked symptoms and white spots around the pinholes within 7d following inoculation with Xcc, the citrus canker pathogen. As expected, no such symptom was observed on control leaves. The qRT-PCR analysis showed that the CcGASA genes were mostly induced by Xcc (Fig. 8). Ten of the genes, CcGASA1, 2,4,5,8,9,11,12,13,16 were highly induced by Xcc, as exemplified by peak induction of more than 1000-fold for CcGASA1 and CcGASA16, and 300-fold for CcGASA13. Three clear patterns, gradual induction, later-stage induction and middle-stage induction, were observed. Gradual induction occurred to CcGASA1, 2,9,13. Later-stage induction occurred to 4 genes, CcGASA8, 11, 12, 16, for their expression was mostly peaked at 72 h. The expression of two genes, CcGASA4 and CcGASA5, was peaked at 24h, i.e., in the middle of the treatment. Two genes, CcGASA14 and CcGASA15, was shown to be moderately repressed or not induced by Xcc infection (Fig. 8).
The expression patterns of CcGASA genes were found to be modulated by hormone treatments (Fig. 9, Supplementary Figs. S5 and S6). It was shown that CcGASA3, 4,5,10,11,12 and CcGASA14 were significantly up-regulated by IAA treatment. The highest upregulation, about 12-fold, was shown by CcGASA3. The second highest induction, about 10-folds, was shown by CcGASA12. CcGASA11 was up-regulated by about 9-folds. In contrast, CcGASA13, 15, 16 and CcGASA17 were significantly down-regulated by IAA. Under SA treatment, the expression of CcGASA2, 3,4,5,10,11,12 and CcGASA13 followed an inverted V-shape pattern   whereas CcGASA16 and CcGASA17 were significantly down-regulated. Under GA 3 treatment, the expression of CcGASA1, 2, 11 and CcGASA12 was increased at the beginning, peaked in the middle and subdued at the end of the treatment while the expression of CcGASA3, 4 and CcGASA5 were decreased at 3h, but reversed to increase at 6h and 12h, and then decreased again at the end of the treatment. Notably, 3 genes, CcGASA7, 10 and CcGASA15 were always up-regulated whereas 3 other genes, CcGAS13, 16 and CcGASA17 were always down-regulated under GA 3 treatment. Under ABA treatment, CcGASA8, 10 and 14 were up-regulated, CcGASA2, 7 and 9 were initially increased and then decreased, whereas the other remaining genes were always down-regulated.

Evolution of citrus CcGASA genes
The possible tandem duplication events were inferred from analyzing the sequences of all CcGASA genes according to Holub et al. [21]. It was shown that a recent whole gene duplication event might be responsible for the cluster of two genes, CcGASA8 and CcGASA9, on scaffold 5. A more ancient whole gene duplication event should have generated another cluster on scaffold 9, CcGASA14 and CcGASA15. Interspersed segmental duplications were also detected across different scaffolds as shown in Fig. S7. They should be responsible for the generation of 4 pairs, CcGASA2-CcGASA5, CcGASA3-CcGASA12, CcGASA5-CcGASA11 and CcGASA6-CcGASA13 (Table 3). Ka/Ks ratios were calculated to measure the selection pressures between each of the 6 gene pairs ( Table 3). The results showed that the Ka/Ks ratios were all less than 1, suggesting that the duplicates had experienced purifying selection.
A synergy analysis of the orthologous GASA genes from C. clementina, C. sinensis, C. maxima, P. trifoliata, A. thaliana, M. domestica and V. vinifera genomes identified 96 collinearity events between C. clementina and the other six species. It was found that 6, 12, 11, 11, 11 and 10 CcGASA genes had synonyms in A. thaliana, C. maxima, C. sinensis, P. trifoliata, apple, and grape, respectively ( Fig. 10 and Supplementary Table S7). In addition, we found that CcGASA8, 13, 12 were collinearity with the GASA genes of the other 6 species, indicating that these 3 genes play a very important role in the expansion of the GASA family.

Discussion
The low molecular-weight GASA proteins have been identified in different plants, and a large number of functional studies have shown their crucial roles in regulating plant growth, development and defense against pathogens [46]. However, the detailed regulation mechanisms through which GASAs operate have not yet been established. We previously found that a Citrus GASA gene was induced by Citrus tristiza virus infection [54]. To go a step deeper in understanding the role of the gene and, in a broad sense, other GASA genes in citrus, we set out to conduct a genome-wide bioinformatics study on the C. clementina GASA gene family. In this study, 15 CcGASA genes were identified from the whole-genome sequence of C. clementina, including the gene previous reported by us [54]. Their genomic DNA and deduced protein sequences were compared with each other and with their homologous genes from other plant species, allowing the establishment of the inter-genomic and the intragenic phylogenetic trees. Detailed analysis on gene structures, cis-elements, chromosomal locations was performed. Primary sequences, physiochemical properties, subcellular localizations, 3D structures, transmembrane domains, motifs of the proteins were also analyzed. In addition, evolution of the CcGASA genes was investigated and possible duplication events were thus identified. Collinear relationships were revealed between the CcGASA genes and those from other citrus species, arabidopsis, apple and grape. The transcription factors possibly binding to CcGASA promoters, and the proteins possibly interacting with CcGASAs were also predicted and their associated functions were analyzed.
Our results indicated that CcGASA might be involved in plant defense processes. Firstly, these small hydrophilic and unstable proteins were mostly predicted to be in the extracellular space (Table 2), strongly indicating their association with plant cell walls that constitute the first line of defense in plants [44]. Secondly, their involvement in stress regulation was also suggested by the presence of many stress responsive cis-acting elements on their promoters (Fig. 5), such as ABRE involved in ABA-regulated osmotic stress [25], CGTCA-motif and TGACG-motif involved in the regulation in seed germination, senescence, and stress responses [10], ERE required for the expression of most ethylene-induced genes [23]. In addition, anaerobic (ARE and/or GC-motif ), abiotic (MYB, MYC and/or STRE), low temperature (LTR), drought (MBS/DRE core), pathogens (TC-rich repeats/box S/W box) and wounding (WUN-motif/W box) induced/ responsive elements were also very abundant. Furthermore, analysis for TFs interacting with these elements also indicated that many of the TFs were associated with stress responses (Supplementary Fig. S3). Moreover, the possible involvement of some CcGASA proteins in defense was also shown by their predicted interactions with defense-related proteins (Supplementary Fig. S8). Thirdly, the expression of the CcGASAs was induced by treatments of stress-related hormones and citrus canker bacterium Xcc (Figs. 8 and 9 and Supplementary Figs. S5 and S6).
The GASA genes have also been known to play important roles in plant growth and development [34]. In this regard, the nature of some cis-elements found on their promoters (Supplementary Table S5), such as tissue specific expression elements, AAGAA-motif (driving endosperm-specific negative expression), CAT-box (specifying meristem expression) and many light responsive elements, among others, implied that the CcGASA genes should indeed play important roles in the growth and development of citrus. Moreover, GA (GARE-motif/P-box/TATC-box), SA (TCA/TCA-element) and auxin (AuxRR-core/TGA-element) responsive elements were found in the promoter regions of most GASA genes. GO analysis also showed that some TFs identified by bioinformatics method in the study are associated with plant development (GO: 0032502), biological regulation (GO: 0050789 and GO: 0065007), responses to stimuli (GO: 0050896) (Fig. 6). That several flowering-related TFs were identified indicated that the citrus CcGASA genes or at least some of them should play a role in reproduction processes in Citrus, which is similar to Arabidopsis AtGASA4, a flowering promotion gene [38], and AtGASA5, a flowering delaying gene [58]. It should be noted that both Arabidopsis genes are actually involved in GA-mediated flowering.
The above notions that CcGASA proteins should play multiple roles in citrus were further supported by phylogenetic classification results. As shown in Fig. 4, the proteins were classified into 3 large groups and the classification was apparently related to their motif compositions (Fig. 4b) and primary sequences (Fig 2). It has long been known that structure and function of a protein is interrelated [20]. Thus, variations occurred to GASA protein structures should have allowed them to evolve different functions although they still share the highly conserved GASA domain. This could be exemplified by two G1d genes, CcGASA14 and 15 that have lost two motifs in their primary sequences (Fig. 4b). The possible consequence of such a large structural variation might be far-reaching, i.e., the pathogen inducibility they shared with other paralogs in the same G1 group was lost forever (Fig. 8). Structural differentiations among GASA proteins should also have resulted in functional differentiations. In this respect, the Xcc induction of G1a members, represented by CcGASA1 and CcGASA16, was 2 and 3 orders of magnitude higher than that of G2 and G3 members (Fig. 8), respectively, which strongly suggested that the G1a members have been specialized to cope with biotic stresses. However, we need to do more investigations to demonstrate this speculation.
The expansion of the GASA gene family was shown to be mainly through DNA duplication, either interspersed segmental duplication or tandem duplication [40]. In this study a total of 15 CcGASA genes were identified in C. clementina. Comparatively, there are about 15, 14, 26, and 37 GASA genes in Arabidopsis, grape, apple and soybean, respectively [1,2,14]. As can be seen, there are approximately two times more GASA genes in apple and soybean than in citrus, grape and Arabidopsis. Such a large discrepancy in GASA gene numbers between different species could be better explained by that only apple and soybean respectively experienced a recent whole genome duplication (WGD) while the others did not [41,50]. Similarly, it was found that the EIN3/EIL genes were doubled from 4~5 in Arabidopsis, tobacco, tomato, rice, peach, mei and strawberry that did not have a recent WGD, to 10 in pear that shared a recent WGD with apple [11,53,55]. The Ka/ Ks ratios between the 6 paralogous GASA gene pairs were all less than 1 ( Table 3), indicating that they have been undergoing a purifying selection rather than a positive or neutral selection. Comparison of the two tandem duplication originated pairs, CcGASA14 / 15 and CcGASA8 / 9, with other CcGASA genes revealed that both CcGASA14 and 15 lost two motifs while CcGASA9 lost one motif (Fig. 4b), indicating that the three genes might either undergo degeneration or evolve new functions.

Conclusions
Eighteen CcGASA proteins from the C. clementina genome were identified and analyzed in this study, with emphasis on their possible roles in defense in citrus. Results from bioinformatics analysis showed that the members of the gene family have structurally and functionally diverged to different degrees and thus may play different roles in the growth and development of Citrus. Experimental evidence showed that the expression of the G1a subgroup members was highly sensitive to bacterial infection, strongly suggesting that they may play an important role in the responses of citrus to biotic stresses.