Genome-wide identification and expression analysis of aquaporin family in Canavalia rosea and their roles in the adaptation to saline-alkaline soils and drought stress

Background Canavalia rosea (Sw.) DC. (bay bean) is an extremophile halophyte that is widely distributed in coastal areas of the tropics and subtropics. Seawater and drought tolerance in this species may be facilitated by aquaporins (AQPs), channel proteins that transport water and small molecules across cell membranes and thereby maintain cellular water homeostasis in the face of abiotic stress. In C. rosea, AQP diversity, protein features, and their biological functions are still largely unknown. Results We describe the action of AQPs in C. rosea using evolutionary analyses coupled with promoter and expression analyses. A total of 37 AQPs were identified in the C. rosea genome and classified into five subgroups: 11 plasma membrane intrinsic proteins, 10 tonoplast intrinsic proteins, 11 Nod26-like intrinsic proteins, 4 small and basic intrinsic proteins, and 1 X-intrinsic protein. Analysis of RNA-Seq data and targeted qPCR revealed organ-specific expression of aquaporin genes and the involvement of some AQP members in adaptation of C. rosea to extreme coral reef environments. We also analyzed C. rosea sequences for phylogeny reconstruction, protein modeling, cellular localizations, and promoter analysis. Furthermore, one of PIP1 gene, CrPIP1;5, was identified as functional using a yeast expression system and transgenic overexpression in Arabidopsis. Conclusions Our results indicate that AQPs play an important role in C. rosea responses to saline-alkaline soils and drought stress. These findings not only increase our understanding of the role AQPs play in mediating C. rosea adaptation to extreme environments, but also improve our knowledge of plant aquaporin evolution more generally. Supplementary Information The online version contains supplementary material available at 10.1186/s12870-021-03034-1.

native species, thereby plays basic and pioneering roles in island greening, sand fixation, and ecological restoration of tropical and subtropical coral islands and coastal zones [2]. Sandy soils, salinization, and seasonal drought are factors that limit growth for many plants in coastal areas or coral reefs. Canavalia rosea belongs to the "mangrove associates" group, in which some elaborate mechanisms for adapting to highly saline and alkaline soils and drought stress have evolved at both the morphological and physiological-molecular levels. Understanding the molecular and evolutionary mechanisms of C. rosea's adaptation to the special habitats would help to illuminate extremophile adaptations to adverse conditions. Saline-alkaline soils and drought stress both cause plant cellular water deficits [3] and result in water imbalances, from root water uptake to leaf transpiration [4]. Identification of genes involved in responding to water-deficit stress in C. rosea may be valuable for molecular breeding improvement of saline-alkaline and drought-related traits through genetic engineering.
Water is an essential component of any biological system and plants exhibit elaborate adaptations to maintain survival in the presence of water stress. Aquaporins (AQPs) are transmembrane proteins that play critical roles in controlling transmembrane water transport in and out of plant cells by forming water channels [5]. In addition to water transport, AQPs facilitate the transport of small molecules such as urea, H 2 O 2 , and NO, and elements such as boron and silicon across cell membranes [6]. Aquaporins are found in a wide variety of taxa, including microbes, animals, and plants, and are the oldest family of major intrinsic proteins (MIPs). Aquaporins have been traditionally classified into four major subfamilies: plasma membrane intrinsic proteins (PIPs), tonoplast intrinsic proteins (TIPs), Nod26-like intrinsic proteins (NIPs), and small and basic intrinsic proteins (SIPs) [7]. Additionally, in some plant genomes, a small number of AQPs have been identified as a fifth subfamily called X-intrinsic proteins (XIPs), which are absent from monocots and Brassicaceae [8]. Furthermore, GlpF-like intrinsic proteins (GIPs) isolated from a moss (Physcomitrella patens) and hybrid intrinsic proteins (HIPs) found in a fern (Selaginella moellendorffii) and a moss (P. patens), which are rare in most plants, are both classified into the AQP family [9,10].
Structurally, almost all AQPs consist of six transmembrane domains (α-helices, H1 to H6) with N and C termini facing the cytosol [11]. The six transmembrane domains are joined by five interhelical loops (A-E). The conserved loops (B and E) show extremely hydrophobic characteristic, often containing internal repeats of asparagine-proline-alanine residues (NPA motifs). These conserved, hydrophobic loops seem to be the most important features maintaining AQP function by forming short helices [12,13]. Aromatic/Arginine regions (ar/R) and Froger's positions are also conserved in most of AQPs [14]. Generally, AQPs are inserted into membranes in a tetrameric structure comprising four independent pores created by AQP monomers [15]. Besides being water channel proteins, some AQPs are also involved in facilitating the transport of CO 2 [16], NO [17], glycerol [18], H 2 O 2 [19], some trivalent elements [20], and a wide range of small uncharged solutes [21]. It is clear that AQPs show versatile functions in water uptake, nutrient balancing, long-distance signal transfer, nutrient/ heavy metal acquisition in plant development, and stress responses [22].
Unlike the AQP members in yeast (only two genes, AQY1 and AQY2) [23] or animals (only 13 AQPs in mammals) [24], plants AQPs comprised large, highly diverse gene families that may be linked to plants' greater adaptability to local conditions given their sessile nature [11,25]. Many AQP gene families have been identified using cDNA and whole-genome analyses in a wide variety of plant species, including Arabidopsis (35 members) [26], maize (31 members) [27], and rice (34 members) [28]. Given advances in whole genome sequencing, AQP-related research has recently gained traction in studies of plant adaptation, especially for halophyte and drought-tolerant plants. As a close relative of Arabidopsis, Eutrema salsugineum has been considered a model extremophile used to identify mechanisms of salt tolerance. The AQP family in E. salsugineum has been characterized by assessing differential gene expression patterns, with research mostly focused on assessing responses to salt, cold, and drought stress [29,30]. Chickpea (Cicer arietinum) has better drought tolerance than most of leguminous species and its AQP gene family has been characterized to further investigate its adaptability to water deficit [31,32]. Furthermore the AQP gene family of cassava (Manihot esculenta), a drought-tolerant tuber that is an important food resource in many African countries, has been characterized in terms of its evolution, structure, and expression patterns [33]. Canavalia rosea is more tolerant to drought, high salinity, heat, and low nitrogen and phosphorous than most of leguminous plants. It is therefore of particular interest to identify the complete set of AQPs within C. rosea (CrAQPs) and to perform comparative analyses to understand their evolutionary relationships, particularly regarding the adaptability of this species to coastal and coral reef habitats.
In our study, the availability of whole genome sequence data for C. rosea facilitated genome-wide analysis to identify the evolutionary relationships between C. rosea AQPs and those of related leguminous species. We characterized the structure of CrAQPs and their chromosomal locations. We also investigated the expression profiles of CrAQP genes in various tissues, in response to different abiotic stressors, and in different habitats, along with promoter analyses. Additionally, a single plasma membrane intrinsic protein gene, CrPIP1;5, was functionally identified using heterogeneous transgenic assays.

Identification of the C. rosea AQP family
Base on the protein BLAST research and Hidden Markov model profile (Pfam ID: PF00230) search, a total of 37 CrAQP members were identified and annotated in the C. rosea genome database. The set of CrAQPs includes 11 NIPs, 11 PIPs, 10 TIPs, 4 SIPs, and 1 XIP (Table 1), which were named according to their phylogenetic and  (Table 1). Based on multiple alignments, a neighbor-joining phylogenetic tree was constructed with the amino acid sequences of AQPs from C. rosea, Arabidopsis, and soybean (Fig. 1). The clustering results clearly showed that there was only one sequence encoding for the XIP protein in C. rosea. In addition, the SIP subfamily in C. rosea (CrSIP) had a smaller but more conserved cluster than other three subfamilies in C. rosea (CrNIP, CrPIP, and CrTIP). We also compared the number of AQP genes in C. rosea with other plant genomes ( which might be due to a whole-genome duplication event in the distant past [34]. The length of CrAQP proteins ranged from 155 aa (CrSIP1;3) to 709 aa (CrNIP4;1), while most were between 230 and 320 aa. The predicted molecular weight and isoelectric points of the CrAQPs ranged from 17.13 kDa to 78.97 kDa and 4.8 to 9.92, respectively (Tables 1 and 3). Thirty four of the 37 CrAQPs included six transmembrane domains and the remaining three members (CrNIP4;1, CrSIP1;3, and CrXIP1;1) possessed seven, three, and five transmembrane domains, respectively ( Table 3). The identification of transmembrane regions of CrAQPs is shown in Figure S1.

Features of AQP proteins
The phosphorylation state of AQP proteins is a key factor regulating the transport of water and other small molecules, or affecting of the protein subcellular localization [14,22]. In this study, we predicted the possible phosphorylation sites of CrAQPs. In brief, all CrAQPs except for CrXIP1;1 contained all three phosphorylation sites (Ser, Thr, and  Table 1). We also predicted the subcellular localization of CrAQPs. The two programs used (WoLF_PSORT and Plant-mPLoc) had similar results and most CrAQPs were located in the plasma membrane, although some were located in vacuoles, plastids, and the endoplasmic reticulum ( Table 1). The subcellular localizations of CrAQPs showed diverse and broad patterns, indicating that the in vivo compartmentation of CrAQPs is highly variable for each member to regulate transport of water and/or solutes across the plasma membrane and intracellular membrane systems, thereby exercising unique biological functions.
The NPA motifs, ar/R filter, and Froger's positions of AQPs are critical for their substrate selectivity. A multiple alignment between CrAQPs and other plant AQPs was performed and the conserved NPA motifs and amino acids in ar/R filter and Froger's positions are characterized in Table 3 and Figure S1. Except for CrPIP2;1, the other 36 CrAQPs all contained two NPA motifs, one in loop B and one in loop E, and most of them were conserved. However, some CrAQPs, such as in CrTIP4;1 and four CrSIPs, displayed a variable third residue in the LB NPA motif, in which the A residue was replaced by S/T/L. In addition, the CrXIP1;1 protein had variable first and third residues in the LB NPA motif (SPV). In loop E, this NPA motif was more conserved and only CrNIP1;3, CrNIP3;1, CrNIP3;2, and CrNIP3;3 showed substitutions of A by V. In CrSIP1;3, the LE NPA motif degenerated into NLG and showed a greater divergence in residues of the two NPA motifs than the other CrAQPs ( Table 3). The space between the two conserved NPA motifs varied from 79 to 127 aa and most were between 108 and 119 aa (Table 3). At the ar/R selectivity filters and Froger's positions, the CrAQPs displayed more differences than in NPA motifs (Table 3). These variabilities determined the substrate specificity of CrAQPs.

Chromosomal locations and evolutionary characterization of CrAQPs
To investigate the evolutionary relationship among CrAQP genes, chromosome maps were constructed (Fig. 2a). There are eleven chromosomes in the C. rosea genome and CrAQP genes were found on all except chromosome 5. On the other ten chromosomes, the CrAQPs were unevenly distributed. Among them, chromosome 3 had seven CrAQP genes, chromosome 8 had six, chromosome 2 had five, chromosome 4 had four, chromosomes 1, 6, and 9 had three, and chromosomes 7, 10, and 11 had two.
Gene duplication events of CrAQPs were also investigated. A total of eighteen and four CrAQP genes were found to be segmentally and tandemly duplicated, respectively ( Table 4). The distribution of segmental duplication of CrAQPs in C. rosea chromosomes was simply showed in Fig. 2b.The selection pressure acting on CrAQP genes was inferred from the ratio of non-synonymous (Ka) to synonymous (Ks) substitution values. Our data indicate that all CrAQP genes were under evolutionary pressure, with an average Ka/Ks ratio of 0.1523. All Ka/Ks ratios were well below one (range: 0.0989-0.2738) ( Table 4). These results suggest that CrAQPs experienced strong purifying selection pressure with limited functional divergence after duplication.

Gene structures and protein motif compositions
Gene structure analyses performed using the GSDS tool revealed relatively large variation in the number and length of introns/exons that resulted in CrAQPs length variation (720-14,816 bp) across five different CrAQP subfamilies ( Fig. 3a and b). The number of introns ranged from zero (CrSIP1;2) to eleven (CrNIP4;1). Most CrNIPs and CrPIPs possessed three to four introns and most CrTIPs had two introns, except for CrTIP4;1, which had three introns. Three of four CrSIPs had two introns, except for CrSIP1;2, which was intronless. The only CrXIP1;1 also had two introns.
To further investigate the function of AQPs in C. rosea, MEME was used to identify the conserved domains of CrAQP proteins. Among the ten motifs identified, motifs 1, 2, 4, 6, 7, and 8 were widely found in CrNIP and CrTIP subfamilies. Most of the CrPIP members shared conserved motifs 1, 2, 3, 4, 5, 6, 9, and 10. Three of the four CrSIPs shared conserved motifs 4 and 6, all except CrSIP2;1. The CrXIP1;1 protein possessed only motif 6. In general, the motif compositions were similar within each CrAQP protein subfamily (Fig. 3c).  21:333 Cis-acting regulatory elements The regulation of CrAQP expression remains a key mediator of CrAQP function, especially in response to stress and plant growth and development. The cis-acting regulatory elements are a series of nucleotide motifs that bind to specific transcription factors, thereby The promoter analyses of all 37 CrAQPs identified 68 putative cis-acting elements, including 25 light responsive elements, 4 ABA responsive elements, 3 gibberellin-responsive elements, 2 MeJA-responsive elements, 2 auxin responsive elements, 1 ethylene-responsive element, 22 abiotic or biotic stress-related responsive elements, and 18 development-related responsive elements (Table S3). We characterized these elements into 12 categories: light responsive elements, gibberellin-responsive elements, MeJA-responsive elements, auxin-responsive elements, salicylic acid responsive elements, ABRE-, ERE-, MYC-, MYB-, MBS-, and TCrich repeats, and LTR. The numbers of these elements in each CrAQP promoter region are summarized in Fig. 4a. In addition, because PIPs play an important roles in maintaining water balance in plant cells, we summarized the abiotic stress-related cis-acting elements (including ABRE, ERE, MYB, MBS, TC-rich repeats, and MYC) within 11 CrPIP promoter regions (Fig. 4b). The categories and numbers of these elements suggest that mechanisms regulating CrPIP expression are involved in stress responses. However, further functional studies are still necessary to confirm the functions of these cis-acting CrAQP elements.  Fig. 3 Phylogenetic relationships, genes' structure, and motif compositions of the AQP genes in C. rosea. a The phylogenetic tree on the left side is constructed using MEGA 6.0. The five major groups are marked with different color backgrounds. b The exon-intron organization of the CrAQPs is constructed using GSDS 2.0 (in the middle). c The conserved motifs of each group on the right side are identified by the MEME web server. Different motifs are represented by different colored boxes, and the motif sequences are provided in Table S4

Expression profiles of CrAQPs in different tissues and plants residing in different habitats
Tissue-and habitat-specific expression profiles of CrAQPs were assessed by examining their Illumina RNA-Seq data representing seven tissue types: roots, vines, young leaves, flowering buds, and young fruits gathered from SCBG, and two mature leaf samples gathered from SCBG and YX Island respectively. Expression of all CrAQPs was detected in at least one of the examined tissues, though the transcript level was diverse. Overall, the CrPIP members had relatively higher expression in all tissues. The subfamilies CrPIP and CrTIP also produced abundant transcripts in most examined tissues (Fig. 5). Young flowering buds and young fruits tended to have high levels of CrAQP expression across the whole family (Fig. 5a). We also focused on the comparison of CrAQPs' expression levels in adult C. rosea leaves collected from various habitats (YX Island and SCBG), and the results indicated that the most of CrAQPs expressed higher in the YX sample than in the SCBG sample, particularly the CrPIP members (Fig. 5b). These results suggest that CrAQPs might play diverse roles in the growth and development of C. rosea, and in this extremophile halophyte's adaptation to coral reef habitats.

Expression profiles of CrPIPs in response to different stressors and the ABA treatment
We performed a gene expression analysis on different C. rosea tissues to examine the expression patterns of CrPIP genes under various abiotic stress conditions and an ABA hormone treatment. The purpose of these treatments were to mimic reef and coast adversity as much as possible. We performed qRT-PCR to detect the transcript levels of these subfamily genes. As shown in Fig. 6, expression of all CrPIPs was affected by the stressors and hormone application. We also found several CrPIP members that showed relatively stable expression patterns, even under the various stressors. These genes included CrPIP1;5, CrPIP2;2, CrPIP2;3, and CrPIP2;5 (Fig. 6). Combining these results with the RNA-Seq data (Fig. 5), it is evident that these genes maintained higher expression levels than the other CrAQP genes across different tissues and habitats, suggesting that they may be involved in maintaining basic and primary water homeostasis during C. rosea growth and development.  were slightly upregulated in aerial tissues. High osmotic stress increased the expression of CrPIP1;1, CrPIP2;4, and CrPIP2;6, and the ABA treatment increased the expression of CrPIP1;2, CrPIP1;4, CrPIP2;1, and CrPIP2;6 ( Fig. 6). These results indicate the role these genes play in multiple abiotic stress responses and ABA signaling response in C. rosea.

Interactions between CrPIP1;5 and CrPIP2;3
A previous study indicated that plant PIP1 and PIP2 members can associate together in heterodimers and tetramers [35]. In our previous study, we have functional identified a C. rosea PIP2 gene, CrPIP2;3, being involved in drought tolerance in transgenic plant [36]. CrPIP1;5 and CrPIP2;3 were both initially isolated from C. rosea cDNA library, and these two members showed much higher expression level in different tissues of C. rosea than other CrAQPs (Fig. 5a), which might indicate their relative importance of maintaining water balance in vivo. In this study we analyzed two PIP members, CrPIP1;5 and CrPIP2;3, to confirm that CrAQPs could form homodimers or heterodimers. To explore CrPIP1;5-CrPIP2;3 interactions, a series of DNA constructs were prepared for a yeast two-hybrid assay (Fig. 7a). BD and AD vectors were co-transformed into yeast AH109. Both CrPIP1;5 and CrPIP2;3 did not self-activate, but both can form homodimers through direct interactions with themselves (Fig. 7b). Furthermore, CrPIP1;5 and CrPIP2;3 can interact with each other (Fig. 7c). Together, these results indicate that, at least in yeast cells, these two CrPIP members can interact with themselves and each other to form active pores for water and small molecule transport across membranes.

Abiotic stress tolerance of yeast and Arabidopsis heterologously expressing CrPIP1;5
We performed functional identification of CrPIP1;5 using a yeast expression system, constructing with a CrPIP1;5-pYES-DEST52 recombinant vector (Fig. 8a). As seen in Fig. 8b, W303 transformed with either CrPIP1;5 or pYES2 developed normally and did not differ in growth rate from the SDG control plate. However, with the addition of PEG8000 or sorbitol, W303 transformed with CrPIP1;5 showed an obvious growth lag compared to yeast containing the pYES2 control. When NaCl was added to the SDG medium, the W303 yeast containing CrPIP1;5 showed better growth than the control (Fig. 8b).
We also checked H 2 O 2 transport activity using the yeast expression system. CrPIP1;5 resulted in increased H 2 O 2 sensitivity of yeast and lower growth rates, while both the BY4741 strain and the H 2 O 2 -sensitive mutant strain skn7Δ showed similar growth performance to the SDG control plate (Fig. 8c). These results indicate that, at least in yeast cells, CrPIP1;5 is an active H 2 O and H 2 O 2 transporter.
To further assess the effects of CrPIP1;5, we generated transgenic Arabidopsis plants that ectopically expressed CrPIP1;5 under the control of 35S promoters. The plants were confirmed as transgenic using genomic PCR, RT-PCR, and qRT-PCR ( Figure S2). Plants from three homozygous T3 lines (OX 1#, OX 5#, and OX 10#) were subjected to the salt, salt-alkaline, high osmotic, and drought tolerance tests. Although our seed germinating assays indicated that CrPIP1;5 OX lines showed no significant difference on germination rates under salt, saltalkaline, or high osmotic challenges compared with WT seeds ( Figure S3), while in the seedling growth assays, CrPIP1;5 OX line seedlings showed slightly growth retardation on the salt and salt-alkaline MS plates compared with WT control ( Figure S4).
The seeds of WT and CrPIP1;5 OX lines were grown in well-watered conditions for 30 days, and prior to the salt, drought, and alkaline stress treatments, the growth rates of adult plants (WT and three CrPIP1;5 OX lines) were relatively consistent. There was no difference in tolerance between WT and transgenic plants (OX 1#, OX 5#, and OX 10#) under salt (200 mM NaCl) and salt-alkaline (100 mM NaHCO 3 , pH 8.2) stressors ( Figure S5). Apparently, CrPIP1;5 resulted in weak sensitivity to drought (Fig. 9a). After 10 days of water withdrawal, all plants wilted to some degree in both WT and the three CrPIP1;5 OX lines. After re-watering and growing for another 7 days, most of the CrPIP1;5 OX plants did not recover, while most of the WT plants did recover and had a higher survival rate than CrPIP1;5 OX plants (Fig. 9b). This result indicates that overexpression of CrPIP1;5 decreased plant resistance to drought.
We found that GFP-fused CrPIP1;5 was constitutively expressed (under the control of CaMV 35S) in transgenic Arabidopsis plants. Root tip fluorescence of roughly three-to four-day-old transgenic Arabidopsis seedlings was easily discerned by confocal microscopy; the GFP-CrPIP1;5 protein was visible in the plasma membranes of transgenic plants, while in control plant roots, the GFP signal was distributed evenly in the whole cytoplasm ( Figure S6). These results suggest that subcellular localization of CrPIP1;5 was consistent across the PIP1 subfamily and was predominantly localized to the plasma membrane, as well as partly in endomembrane system. Within the plasma membrane, CrPIP1;5 was folded into a specific transmembrane channel and functioned as a water transporter.

Discussion
Water deficit-caused by drought, high salinity/alkaline, high temperature, cold/freezing conditions or other abiotic stressors-can negatively affect plant growth and survival. However, plants have developed intricate mechanisms to cope with this type stress, including alterations to signal perception and transduction and differential expression of stress responsive genes through complex networks. Aquaporins are a class of integral membrane proteins that facilitate the diffusion of water and other small solutes. Plants often maintain large and diverse AQP families compared to animals and microorganisms. Aquaporins have been reported to play crucial roles in plant water balance and homeostasis under adverse growing conditions [5,21] and in response to specific biotic challenges [37,38]. In this study, we performed genome-wide identification and characterization of AQPs in C. rosea to understand the evolution of this family and its molecular roles. We were particularly interested in resolving the molecular mechanisms underlying this extremophile halophyte's adaptation to coral reef habitats and its responses to acute salt, alkaline, and drought stressors. The AQP protein family within the C. rosea genome was characterized and 37 putatively functional CrAQP isoforms (based on Pfam domain sequences) were identified, belonging to the PIP (11 isoforms), TIP (10), NIP (11), SIP (4), and XIP (1) subfamilies (Table 1). We performed whole genome sequencing of C. rosea, and our result indicates that this species is diploid, with a 534.94 Mbp genome size (data not published). The number of AQPs was similar to other diploid plant species (Table 2) and their protein sequences were highly similar. This indicates that the number of AQPs and sequence specificity may not be directly related to the adaptation of C. rosea to extreme environments. The roles that CrAQPs play in stress tolerance needs to be further studied from other perspectives, such as transcriptional regulation, protein modification, and the regulation of AQP transmembrane transport activities. Although numerous studied have identified AQPs in model plant species, research on this gene family has increasingly focused on plants that inhabit novel environments. This is largely because AQP genes are seen as candidates for use in genetic modification of crops to increase agricultural productivity [39,40]. The saltbush Atriplex canescens is highly tolerant of saline-alkaline soils, drought, heavy metals, and cold, and the AQP genes AcPIP2 and AcNIP5;1 have been shown to be involved in abiotic stress tolerance in this species, and their overexpression in transgenic Arabidopsis caused altered tolerance to drought and salt [41,42]. Compared with cultivated soybean, the wild Glycine soja is relatively saltalkaline tolerant. Two AQP genes from G. soja, GsTIP2;1 and GsPIP2;1, minimized tolerance to salt and dehydration stress when overexpressed in Arabidopsis, implying they have negative impacts on stress tolerance by regulating water potential [43,44]. In most functional analyses conducted in transgenic plants, the overexpression of AQP genes caused elevated tolerance to salt and drought, such as in Malus zumi (gene MzPIP2;1) [45], Sesuvium portulacastrum (SpAQP1) [46], Stipa purpurea (SpPIP1) [47], Simmondsia chinensis (ScPIP1) [48], Thellungiella salsuginea (TsPIP1;1) [49], and Phoenix dactylifera (PdPIP1;2) [50]. The elevated expression of AQP genes in plants can lead to cellular changes in water potential, which cause alterations in water uptake and transpiration, and ultimately modify tolerance to water deficit stress. In this respect, understanding the distribution, expansion, regulation, phylogenetic diversity, and evolutionary selection of AQP genes in extremophile plants like C. rosea is an important step toward potentially improving the water utilization abilities and drought adaptations of other plant species, including agricultural crops.
Plant AQPs play versatile physiological roles in combatting abiotic stress, not only by regulating water content and potential, but also by transporting certain signaling molecules and nutrients. Generally, AQPs consist of six transmembrane helices connected by five loops (A-E) and cytosolic N-and C-termini. Loops B (cytosolic) and E (non-cytosolic) both contain the highly conserved NPA (asparagine-proline-alanine) motifs that form part of the core of these proteins. The aromatic/ arginine (ar/R) constriction is located at the non-cytosolic end of the pore. The substrate specificity of AQPs is closely related to several different signature sequences, including NPA motifs, the ar/R filter, and Froger's positions (FPs) [51]. In all CrAQP NPA motifs, the first two residues were the most conserved, except for CrSIP1;3 and CrXIP1;1, in which the loop B and loop E NPA motifs degenerated into NLG and SPV. The third residue of NPA motifs was more variable, in which A was frequently replaced by either L, S, T, or V. However compared to the NPA motifs, the 10 amino acid residues at the ar/R filter and Froger's positions were more variable in all CrAQPs (Table 3). In some subfamilies, the ar/R selectivity filter sequences were similar, such as in CrPIPs (F-H-T-R), CrTIP1s (H-I-A-V), and CrNIP1s (W-V-A-R). We also analyzed Froger's positions (P1-P5), five conserved amino acid residues that are related to glycerol transport in water-conducting AQPs. The P2, P3, P4, and P5 Froger's positions in CrPIPs were relatively conserved (S-A-F-W), and in CrTIPs, they were less conserved (S-A-Y/F-W). In CrNIPs and CrSIPs, the P3 and P4 positions mostly stayed A and Y. It is supposed that plant TIPs may transport various small solutes, including H 2 O 2 , NH 4 + , and urea, in addition to water [40]. As with other plant TIPs, CrTIPs are mainly located in vacuolar membranes and may be involved in the regulation of water flow across subcellular compartments of organelles [52]. The variation of CrTIPs in ar/R selectivity filter sequences may contribute to their multiple transport functions, and their NPA spacing varies from 79 to 127 amino acid residues, which indicates that CrTIPs might also be involved in the transmembrane transport of multiple small molecules.
Gene structure organization, gene expansion, and gene diversity are critical indicators of the evolution of gene families. The CrPIP and CrTIP subfamilies exhibit relatively stable gene structure in comparison with other subfamilies (Fig. 3). Most of them possess three (CrPIPs) or two (CrTIPs) introns, suggesting that they might share a common ancestral origin. Similar to previous reports showing very few or no intronless AQP genes in other plant species [30,31,53], only one intronless AQP was identified in C. rosea. The intronless gene, CrSIP1;2, might have evolved recently through a retrotransposon process. The CrAQP family has undergone a number of duplication events consistent with the highly duplicated nature of plant genomes ( Fig. 2; Table 4). The duplication events concerning segmental and tandem duplications identified in this study have also been reported in other plant species [33,34]. In the present study, some duplicated CrAQPs have distinct patterns of expression in different tissues and habitats, and under different stressors and hormone exposure (Figs. 5 and 6). It is likely that these duplicated gene pairs have similar protein functions yet function in different biological processes, probably mediated by transcriptional regulation or posttranscriptional modification.
Canavalia rosea is a salt-and alkaline-tolerant and drought-adapted halophyte, and abiotic stressors, such as saline-alkaline soil, seasonal drought, strong solar irradiance, and high temperatures, are the main limiting factors that induce osmotic stress and disturb water balance for this species and other tropical seaside plants. For regulating the water transport under different abiotic stresses efficiently, plant often changes the expression of aquaporin genes through a series of complex steps to control the water uptake or management. However, the water management effectiveness and patterns could always depending on the plant growth conditions or tissue type, as well as the different types or degrees of water stress and the specificity of AQP members [54]. In plant, the PIP isoforms were supposed to playing major roles in maintaining plant water homeostasis and responses to abiotic stress [11,55]. Gene transcript levels are dependent upon the structures of their promoters. Therefore, the cis-acting elements in promoter regions might provide the key to understanding genetic factors influencing the responses of signal molecules and environmental elicitors. We summarized the abiotic stress-related cisacting elements in CrAQP promoters (Fig. 4) and our findings suggest diversity in CrAQP expression patterns, which could be also considered as a part of the adaptation mechanisms to stress conditions. Chickpea (Cicer arietinum L.) is an important food legume crop with good drought and salinity tolerance than most of other crops [31,32]. The promoter analyses of CaAQPs also identified a number of cis-regulatory elements, including defense and stress responsive elements. This promoter analysis was also partly support by the experimental expression analyses some of selected CaAQPs, while was not absolutely consistent with the expression patterns of CaAQPs [32]. Furthermore, the expression profiles of CrAQPs in different tissues revealed by RNA-Seq indicate that some of the CrPIP and CrTIP subfamilies had higher expression levels than other subfamilies (Fig. 5a), and habitat-specific RNA-Seq data acquired from leaves further indicated the most of the CrPIP members had greater expression levels in coastal C. rosea (YX) than in inland C. rosea (SCBG; Fig. 5b). Similar results were also observed in leguminous plant, chickpea (Cicer arietinum L.) [32]. In the drought-tolerant genotype chickpea, two subfamilies (CaPIPs and CaTIPs) showed high expression in all tissues as compared to other subfamilies, which indicated these two subfamilies were essential for water transportation and carried drought tolerant. The qRT-PCR results also showed some of PIP members showed much higher expression level in drought-tolerant genotype chickpea than in drought-tolerant genotype chickpea [32]. Our results suggest that differential expression of CrPIPs might be associated with different water use strategies in different habitats, and the higher expression level of CrAQPs in coastal C. rosea plants might be an adaptive mechanism to deal with intracellular and extracellular water-deficit signals. Due to the material limitation, and we cannot collect enough and intact root tissues from YX Island for RNA-seq analysis, so we mimicked the stresses in lab, and analyzed the gene expression with qRT-PCR. The expression patterns of CrPIPs under salt, alkaline, and drought stress and the ABA hormone treatment were further investigated (Fig. 6). The results showed that CrPIP expression was most affected under the saline-alkaline, high osmotic stress, and ABA treatments. Furthermore, some CrPIPs showed clearly different and even opposite expression patterns in roots, vines, and leaves. This can be attributed to the fact that in roots the PIP proteins mainly facilitate water absorption from external environments, while in vines and leaves, the PIPs may play a larger role in transpiration. Broadly, our results suggest a role for PIPs in regulating C. rosea hydraulics and probably adaptation to the challenging environmental conditions found on tropical coral reefs and islands.
In our previous research, we have characterized CrPIP2;3 as a salt/drought stress related gene [36]. Here we performed protein-protein interaction studies using yeast two-hybrid assays and found that two CrPIP members, encoded by CrPIP1;5 and CrPIP2;3, that were highly expressed in all tested tissues and almost constitutively expressed under the abiotic stress challenges and ABA treatment (Figs. 5a and 6). These two CrPIP members could bind to themselves and each other to form homodimers and heterodimers (Fig. 7). This is consistent with previous findings that some PIP1 and PIP2 members could assemble as homotetramers and heterotetramers, thereby triggering channel activities, influencing substrate specificity, and regulating PIP trafficking [56]. Here, our results on the expression patterns of CrPIP1;5 and CrPIP2;3 provide a detailed understanding of their regulatory modes and help to illuminate CrAQP functions. These data are especially helpful for characterizing AQP-interacting protein complexes involved in C. rosea's adaptations to harsh environmental conditions such as low water availability and saline-alkaline soils.
Our results from the yeast overexpression system indicate that CrPIP1;5 is an active transmembrane H 2 O and H 2 O 2 transporter (Fig. 8). We assessed the overexpression of CrPIP1;5 in transgenic Arabidopsis, and CrPIP1;5 lead to slightly reduced saline-alkaline and drought tolerance, which showed the exact opposite phenotype compared with CrPIP2;3's overexpression in Arabidopsis [36]. This also suggests that CrPIP1;5 could play a key role in water transport. Here we speculated that as a foreign AQP gene, CrPIP1;5 might be involved in modifying the function of endogenous PIPs of Arabidopsis, or function more in water flowing out than water absorption in transgenic Arabidopsis roots, thereby resulted in sensitivity to salt and drought stresses. We also found that high levels of salt, alkaline, and ABA slightly decreased the expression of CrPIP1;5 in C. rosea, this further suggests that this gene is highly important for water movement between cells and tissues, and is indeed involved in a stress response pathway that protects yeast cells or plants from water loss under high salinity conditions and promotes water release under high osmotic stress or drought (Figs. 8b and 9). The overexpression of CrPIP1;5 in transgenic Arabidopsis is in contrast to most previous findings [40,54], suggesting that overexpression of plant PIPs results in specificities of abiotic stress tolerance or sensitivity, which also might be attributed to the protein interaction, post-translational modifications (PTM), protein trafficking of the foreign AQP. While in C. rosea, our results about the expression pattern of CrPIP1;5 regarding to different tissues or habitats (Fig. 5), as well as the transcriptional changes responding to salinity/alkaline, high osmotic stress, and ABA treatment (Fig. 6a), indicated that CrPIP1;5 might be boiled down to a "housekeeping gene" for basic water homeostasis, to some degree. Even the CrPIP1;5 showed higher expression level in YX sample than in SCBG sample (leaves) (Fig. 5b), which might be probably due to the long-term adaptive mechanism that C. rosea plants on YX island have to deal with much tougher waterdeficit adversities than plants in SCBG, including water absorption from the external environment and water transportation in vivo. The slightly decreased expression of CrPIP1;5 in C. rosea seedlings caused by stress factors or ABA (Fig. 6a), which might be an emergency protection for alleviating the damages caused by water disturbances, since we only checked the CrPIP1;5's expression changes in 24 h challenged by different factors. Many studies also proved that the overexpression of plant PIPs could increase sensitivity to drought stress. For example, tobacco PIP1 member, NtAQP1, caused a decline of root hydraulic conductivity and decreased resistance of plants to water stress [57]. Transgenic tobacco (Nicotiana tabacum) plants overexpressing AtPIP1;4 and AtPIP2;5 displayed rapid water loss under dehydration stress and showed enhanced water flow under drought stress [58]. The Glycine soja gene GsPIP2;1 negatively impact salt and drought stress tolerance by regulating water potential when overexpressed in transgenic Arabidopsis [44]. In addition, Arabidopsis plants overexpressing AcPIP2 (a PIP gene from saltbush A. canescens) exhibited drought-sensitive phenotypes [41]. This can be explained by the nature and intensity of stresses, combining with the cooperation between the over-expressed foreign AQPs and the endogenous AQPs. When the water-deficit stress signals were perceived, the foreign AQPs' over-expression might affect the expression patterns and distribution of endogenous AQPs, and as a result the plant stress-responsive effects might vary among plant species. In our study, the overexpression of CrPIP1;5 showed some sensitivity both in yeast and in Arabidopsis, which only reflect CrPIP1;5 did involve in water stress responses in vivo. That, combined with our previous related research about CrPIP2;3 [36], suggested that overexpression of these two PIP genes in transgenic Arabidopsis promoted plant responses to abiotic stressors by maintaining water homeostasis, as well as decreasing the damage caused by water-deficit stress. Overall, as an active water channel protein with high expression levels in C. rosea different tissues or challenged by different factors, the CrPIP1;5, as well as the CrPIP2;3, might be basic sustainers for water homeostasis during the development of C. rosea plants.

Conclusions
The leguminous nitrogen-fixing plant, C. rosea, presents extreme saline-alkaline and drought resistance and is used as pioneer species on islands and reefs for artificial vegetation construction. In the present study, we conducted a genome-wide analysis and characterization of AQPs in C. rosea. Our results will be helpful for understanding the involvement of this gene family in adaptation to stressful abiotic conditions, particularly through its impact on water balance. We determined that the CrAQP family consists of 37 members distributed across five subfamilies. Each member had subtle variations in gene and protein structures, transcriptional regulation, subcellular localization, substrate-specificity, and post-translational regulatory mechanisms. Expression profiling of CrAQPs revealed higher expression of PIPassociated genes in almost every tissue of C. rosea plants, suggesting that this subfamily likely plays important roles in developmental processes and abiotic stress responses. As predicted, the two PIP1 and PIP2 members, CrPIP1;5 and CrPIP2;3, formed homodimers and heterodimers through protein interactions. We also functionally identified one of the CrPIP1 members, CrPIP1;5, given its highest expression levels in different tissues of C. rosea. Although our results showed that overexpression of CrPIP1;5 could increase sensitivity to saline-alkaline and drought conditions in yeast and plants, combining the high and regulatable expression of CrPIP1;5 in C. rosea, it can be inferred that CrPIP1;5 is one of basic and important functional genes for facilitating water transportation in vivo, and might be involved in the C. rosea ecological adaptation to tropical coral reef. The identification of CrAQPs in this study will be useful for further investigation of the roles that AQPs play in the various developmental stages and physiological processes of C. rosea, as well as elucidating the possible ecological adaptation mechanisms of C. rosea to extreme environments, and identification candidate genes for potential introduction into transgenic agricultural crops.

Plant materials
Canavalia rosea plants growing on Yongxing Island (YX, 16˚83′93′′ N, 112˚34′00′′ E) and in the South China Botanical Garden (SCBG, 23˚18′76′′ N, 113˚37′02′′ E) were used in this study. The C. rosea plants growing in SCBG were introduced gradually from coastal area in Hainan Province since 2012, and grew steadily for one to seven years with regular water and fertilizer supply in Guangzhou, China. To analyze tissue-specific transcriptional patterns of the identified CrAQPs, roots, stems, leaves, flowers, and fruits were gathered from C. rosea plants grown in SCBG. In addition, to investigate the involvement of the CrAQPs in adaptation to different habitats, adult leaves were gathered from C. rosea plants growing in both YX and SCBG. In brief, the C. rosea tissues were collected outdoors and immediately frozen in liquid nitrogen, then the samples were stored at − 80 °C for subsequent RNA-seq analysis. Three independent biological replicates were used.

Identification of CrAQP genes and gene duplication analysis of the CrAQP family
To identify all putative CrAQP genes, the genome database of C. rosea (data not published) was used to obtain DNA and protein sequences (Table S1). In brief, DIAMOND [59] and InterProscan (https:// www. ebi. ac. uk/ inter pro/ search/ seque nce/) were used to identify all C. rosea proteins with conserved domains and motifs (e < 1e −5 ), and all proteins were annotated using InterPro and Pfam databases (http:// pfam. xfam. org/). The Pfam ID (MIP, PF00230) was used to search the CrAQP protein family, and putative sequences of CrAQP proteins were identified and submitted to SMART (http:// smart. embl-heide lberg. de/) and the NCBI Conserved Domain Database (https:// www. ncbi. nlm. nih. gov/ Struc ture/ cdd/ wrpsb. cgi) to confirm presence of the AQP domain. Next, the selected CrAQPs were named based on their sequence homology with known AQPs and C. rosea genome annotation.
Gene segmental and tandem duplications were assessed using MCScanX software (http:// chibba. pgml. uga. edu/ mcsca n2/), and tandem duplications were also checked manually according their gene loci. The number of synonymous substitutions per synonymous site (Ka), the number of non-synonymous substitutions per nonsynonymous site (Ks), and the P-value from a Fisher's exact test of neutrality were calculated using the Nei-Gojobori model with 1,000-bootstrap replicates [60]. A Ka/Ks ratio < 1 indicates purifying selection, a Ka/Ks ratio = 1 indicates neutral selection, and a Ka/Ks ratio > 1 indicates positive selection.

Multiple sequence alignment and phylogenetic analysis of CrAQP family proteins
A bootstrap neighbor-joining phylogenetic tree was constructed based on multiple alignments of the identified AQPs from C. rosea with AQPs from Arabidopsis and soybean using MEGA 6.0 with 1,000 bootstraps. The sequences of GmAQPs (from soybean; Glycine max) and AtAQPs (from Arabidopsis; Arabidopsis thaliana) were downloaded from the phytozome database (https:// phyto zome-next. jgi. doe. gov/). Aquaporins were mapped on C. rosea chromosomes according to positional information of the CrAQP genes in the C. rosea genome database and displayed using MapInspect software (http:// mapin spect. appon ic. com/).

Expression analysis of CrAQPs
A transcriptome database of C. rosea was constructed using Illumina HiSeq X sequencing technology. The quality of the RNA-Seq datasets created from seven different tissues (roots, vines, young leaves, flowers, and young siliques collected from C. rosea growing in SCBG; mature leaves from C. rosea growing in SCBG and on YX Island) was examined using FastQC (http:// www. bioin forma tics. babra ham. ac. uk/ proje cts/ fastqc/), which produced 40 Gb clean reads. In brief, five different tissue samples were collected from similar young C. rosea plants (two-yearold with flowers and young siliques planted in SCBG), then these samples were cleaned and frozen quickly with liquid nitrogen for organ-specific RNA-Seq analysis. As to the habitat-specific RNA-Seq analysis, the mature leaves gathered separately from C. rosea plants growing in SCBG (seven-year-old plants) or native plants in YX Island, then the mature leaf samples were frozen with liquid nitrogen for future use. The samples mentioned above were collected with 3 independent replicates for each group. Clean reads were mapped to the C. rosea reference genome using Tophat v.2.0.10 (http:// tophat. cbcb. umd. edu/). Gene expression levels were calculated as fragments per kilobase of transcript per million mapped reads (FPKM) according to the length of the gene and the read counts mapped to the gene: FPKM = total exon fragments/[mapped reads (millions) × exon length (kb)]. Expression levels (log2) of CrAQPs were visualised as clustered heatmaps using TBtools.
To investigate the involvement of the CrAQP genes in abscisic acid (ABA) and in various stress responses, C. rosea was germinated from seed and 30-day-old seedlings were exposed to stressors. In brief, for the high osmotic stress treatment, seedlings were removed from their pots and carefully washed with distilled water to remove soil from the roots, and then transferred into a 300 mM mannitol solution. For high salt stress, seedlings were soaked in a 600 mM NaCl solution. For alkaline stress, seedlings were soaked in a 150 mM NaHCO 3 (pH 8.2) solution. For ABA treatment, a freshly prepared working solution of 100 μM exogenous ABA was sprayed on the leaves of seedlings. The second and/or third mature leaves from the seedling apexes were collected at 0, 2, and 24 h during the previously described stress treatments, with the 0-hourtime point used as the control. All samples were immediately frozen in liquid nitrogen and stored at − 80 °C for subsequent gene expression analysis. Three independent biological replicates were used. Transcript abundance of several CrAQPs' transcript was investigated using a qRT-PCR assay. The extraction and isolation of RNA and the synthesis of first strand cDNAs was performed as the previous report [36], with 3 replicates for each group. In brief, total RNA was extracted from C. rosea seedling tissues under the stress/ABA treatments and reverse transcribed to cDNA. Quantitative RT-PCR was conducted using the LightCycler480 system (Roche, Basel, Switzerland) and TransStart Tip Green qPCR SuperMix (TransGen Biotech, Beijing, China). All of the gene expression data obtained via qRT-PCR was normalized to the expression of CrEF-α (Table S2). The primers used for qRT-PCR (CrEF-αRTF/CrEF-αRTR for the reference gene and other CrAQP-specific primer pairs) are listed in Table S2.

Functional identification of CrPIP1;5 in transgenic Arabidopsis plants
The coding sequence (CDS) of the CrPIP1;5 cDNA was PCR-amplified using the primer pair PIP1-5OXGF/ PIP1-5OXGR (Table S2) and then inserted into plant expression vector pEGAD to generate CrPIP1;5-pEGAD. Thus, transgenic Arabidopsis plants (col-0 genotype, three overexpression lines, OX 1#, OX 5#, and OX 10#) were generated. After confirmation with genomic PCR and quantitative RT-PCR, these T3 homozygous transgenic lines were tested for their stress tolerance according to their seed germination rate as well as seedling and adult plant growth rate. These tests were thereby meant to evaluate the biological functions of CrPIP1;5.
In brief, seed germination rate of the CrPIP1;5 transgenic Arabidopsis (OX 1#, OX 5#, OX 10#, and WT) was measured under the following stress treatments: NaCl (175 mM, 200 mM, and 225 mM; salt stress); 5 mmol/L NaHCO 3 plus 95 mmol/L NaCl (pH 8.2), 7.5 mmol/L NaHCO 3 plus 92.5 mmol/L NaCl (pH 8.2), and 10 mmol/L NaHCO 3 plus 90 mmol/L NaCl (pH 8.2; alkaline stress); mannitol (200 mM, 300 mM, and 400 mM) stress. The goal of these treatments was to detect the effect of the overexpression of CrPIP1;5 on affecting the salt/alkaline/osmotic tolerance of transgenic Arabidopsis seeds during germination. Additionally, root length was calculated to evaluate the influence of the overexpression of CrPIP1;5 on transgenic Arabidopsis seedlings under abiotic stress (100 mM, 150 mM, and 200 mM NaCl for salt stress; 0.5 mmol/L NaHCO 3 plus 99.5 mmol/L NaCl, 0.75 mmol/L NaHCO 3 plus 99.25 mmol/L NaCl, 1 mmol/L NaHCO 3 plus 99 mmol/L NaCl, pH 8.2 for alkaline stress; 200 mM, 300 mM, and 400 mM mannitol for osmotic stress. Wild-type Arabidopsis and Murashige&Skoog medium (MS) or MS plus 100 mM NaCl (pH 8.2) medium were used as controls. The seed germination and seedling growth experiments were both performed on MS plates with or without stress factors, in the same greenhouse environment used to grow the Arabidopsis plants. Drought and salt/alkaline tolerance assays were also performed on transgenic adult Arabidopsis plants. Both WT and transgenic seeds (OX 1#, OX 5#, and OX 10#) were grown on MS medium. Ten-dayold seedlings were transplanted into square pots filled with nutrient solution soaked vermiculite, with identical soil moisture. For drought tolerance assay, thirty to forty plants of each OX lines and WT control were cultured in growth chamber as described above without watering for another 20 days, since the original vermiculite humidity could ensure regular growth of Arabidopsis plants. The water content of vermiculite in the pots was reduced over this timeframe but did not induce drought stress. The plants were then subjected to a drought tolerance assay by continuous withdrawing water, whereby WT and transgenic plants (OX 1#, OX 5#, and OX 10#) turned to continuous drought conditions for 10 days and then rewatered adequately and recovered for another 7 days. Survival rates were then calculated according to the number of living plants at the end of the experiment. For salt tolerance assay, the ten-day-old seedlings of WT and transgenic Arabidopsis (OX 1#, OX 5#, and OX 10#) growing in square pots were irrigated with 200 mM NaCl solution (50 mL each pot) and the phenotype was recorded after 7 days. For alkaline tolerance assay, the same seedlings were irrigated with 100 mM NaHCO 3 solution (50 mL each pot) and the phenotype was recorded after 4 days. Subcellular localization of CrPIP1;5 in Arabidopsis was also detected using GFP fusion protein in seedling roots. The OX homozygous lines of CrPIP1;5-pEGAD and the control (pEGAD) transgenic plants were sterilized and spotted in MS plates to generate seedlings. Then, threeto four-day-old seedlings were detected using a camera fitted to a confocal laser scanning microscope to record the GFP fluorescence of different tissues. To confirm it was the cell membrane that was fluorescing, seedling roots were stained by using a propidium iodide solution (1 mg/mL in phosphate buffer solution).

Statistical analysis
All the experiments in this study were repeated three times independently and results are shown as mean ± SD (n ≥ 3). Pairwise differences between means were analyzed using Student's t-tests in Microsoft Excel 2010.