Fasciclin-like arabinogalactan gene family in Nicotiana benthamiana: genome-wide identification, classification and expression in response to pathogens

Background: Nicotiana benthamiana is widely used as a model plant to study plant-pathogen interactions. Fasciclin-like arabinogalactan proteins (FLAs), a subclass of arabinogalactan proteins (AGPs), participate in mediating plant growth, development and response to abiotic stress. However, the members of the FLA family in N. benthamiana and their response to plant pathogens are unknown.
Results: A total of 38 NbFLAs were identified in a genome-wide search. The NbFLAs could be divided into four subclasses, and their gene structure and motif composition were conserved within each subclass. NbFLAs may be regulated by cis-acting elements such as STRE and MBS, and may be the targets of transcription factors like C2H2. Quantitative real-time polymerase chain reaction (RT-qPCR) results showed that selected NbFLAs were differentially expressed in different tissues. All of the selected NbFLAs were significantly downregulated following infection by turnip mosaic virus (TuMV) and most of them also by Pseudomonas syringae pv tomato strain DC3000 (Pst DC3000), suggesting possible roles in response to pathogenic infection.
Conclusions: This study systematically identified FLAs in N. benthamiana and indicates their potential roles in response to biotic stress. The identification of NbFLAs will facilitate further studies of their role in plant immunity in N. benthamiana.


Background
The plant cell wall is a dynamic and complex organelle, which is mainly composed of cellulose, hemicellulose, pectins, glycans and proteins. It is not only involved in mechanical protection and structural support, but also in signal transduction, intercellular communication and immunity [1][2][3].
Hydroxyproline-rich glycoproteins (HRGPs) are typical cell-wall proteins that participate in plant growth, development and immunity [4,5]. HRGPs have a few repetitive glycosylation motifs containing hydroxyproline (Hyp) residues that are glycosylation sites. Based on the different levels of O-glycosylation, the HRGP superfamily can be classified into three subfamilies: the hyperglycosylated arabinogalactan proteins (AGPs), the minimally glycosylated Pro-rich proteins (PRPs) and the moderately glycosylated extensins (EXTs) [5]. AGPs are abundant in plants, and can themselves be subdivided into six main subclasses: the classical AGPs, AG peptides, Lys-rich AGPs, FLAs, nonclassical AGPs and chimeric AGPs [6]. FLAs generally have one or two fasciclin domains, and have been discovered in fruit flies, mammals, sea urchins, plants, yeast and bacteria. Besides fasciclin domains, FLAs often contain an N-terminal signal peptide as well as a C-terminal glycosylphosphatidylinositol (GPI) anchor signal peptide. The GPI and fasciclin domains are functionally important and are believed to mediate cell adhesion [7,8].
So far, FLA family members have been identified in several plant species: 21 in Arabidopsis thaliana [8], 27 in rice (Oryza sativa) [9,10], 34 in wheat (Triticum aestivum) [10], 35 in poplar (Populus trichocarpa) [11], 19 in cotton (Gossypium hirsutum) [12], 33 in Chinese cabbage (Brassica rapa) [13], 18 in Eucalyptus grandis [14] and 23 in textile hemp (Cannabis sativa) [15]. FLAs are cell wall structural glycoproteins that mediate cellulose deposition and cell wall development. They are believed to participate in fiber development, elongation and stem dynamics, affecting the quality of fiber and wood in cotton and woody plants like poplar and eucalyptus [16], and are abundant in the xylem [17]. Knockdown of PtFLA6 resulted in a decrease in stem hardness and in xylem cellulose and lignin, and in down-regulation of genes involved in cell wall synthesis [18]. Overexpression of GhGalT1 promoted cotton fiber development by controlling the glycosylation of FLAs [19], and in plants where GhAGP4 was knocked down, fiber initiation and elongation were strongly inhibited and the cytoskeleton network and cellulose deposition in fiber cells were suppressed [20]. During cell wall regeneration from cotton protoplasts, there is upregulation of proline-rich protein (PRPL), glycine-rich protein (GRP), and extensin (EPR1) but also of FLA2, which may mediate the construction and modification of the cell wall [21]. In addition, AtFLA11, AtFLA12, EgrFLA2 and EgrFLA3 have similar functions [14,22]. FLAs can also regulate pollen development. In Arabidopsis and maize, AtFLA9 and ZmFLA7 showed negative correlation with abortion, and reductions in the expression of FLAs increased the abortion of fertilized ovaries [23]. AtFLA3-silenced Arabidopsis had abnormal pollen grains, also suggesting a function in pollen formation [24].
FLAs have also been implicated in cell-to-cell communication [13], shoot development [25,26], seed mucilage adherence [27], glycan stabilization [28] and in response to stresses from salt [29][30][31], cold [32] and hydrogen peroxide [33].
Although FLAs have multiple roles in plant growth and development, very little is known about any involvement they may have in the response to pathogens. N. benthamiana is a model plant for studying plant immunity, but the structure, function and expression of its FLA gene family members are unknown. In this study, we identified and characterized the members of the FLA gene family in N. benthamiana and report their subcellular localization, expression patterns, and response to viral and bacterial pathogens.

Identification of members of the NbFLA family
Based on previous studies [8], FLAs have an AGP-like glycosylated region, a fasciclin domain and an N-terminal signal peptide. We followed these criteria to identify putative FLAs in N. benthamiana. The sequences of the 21 identified AtFLAs were downloaded [8] and the N. benthamiana genome was downloaded from the Sol Genomics Network (https://solgenomics.net/) [34]. A total of 38 NbFLAs were identified by two rounds of BLASTP followed by signal peptide prediction (Table 1 and Additional file 1: Table S1). Most of these (66%) are 200-300 aa long, while the largest (NbFLA10) has 495 aa and the smallest (NbFLA26) has only 182 aa. The predicted isoelectric points range from 4.29 to 9.77, and the molecular weights (MWs) derived only from the amino acid sequences (not including glycans) are in the range 19.68-52.32 kDa. The protein properties of the NbFLAs are similar to those of FLAs from other plant species [8,11].

Phylogenetic analysis and multiple sequence alignment of NbFLAs
To better reveal their evolutionary relationships and to help classify the NbFLAs, the sequences of all 21 AtFLAs and 38 NbFLAs were used to construct a phylogenetic tree (Fig. 1). Because of the low sequence similarity between some FLAs, phylogenetic analysis alone could be misleading and therefore pair-wise sequence similarity and the presence and number of fasciclin domains and GPI anchors were also used to create a classification, as previously described [8]. Most NbFLAs were adequately classified by phylogenetic analysis, but for a few (NbFLA8/15 and NbFLA10/14) their protein properties, including the presence and number of fasciclin domains and GPI, also had to be taken into account. The 38 NbFLAs we identified could be divided into the same four subclasses previously reported for the AtFLAs [8], named I to IV (Fig. 1).
We also constructed separate phylogenetic trees for each subclass of NbFLAs, including the sequences from the other 8 plant species in which FLAs have been identified (Arabidopsis, rice, wheat, poplar, cotton, Chinese cabbage, Eucalyptus grandis and textile hemp) (Additional file 2: Fig. S1). In general, FLAs have a relatively high homology among closely related species, like AtFLAs/BrFLAs and OsFLAs/TaFLAs. FLAs from the same species often exist in pairs, like NbFLA26/29 and TaFLA19/27, suggesting that they may be paralogous genes. Subclasses I and III are the two largest groups and their clustering patterns are complicated. FLAs from the same species do not generally group together, and there are some closely related pairs from different species, suggesting that they are orthologous genes (e.g. NbFLA12/BrFLA22 and TaFLA2/OsFLA2). In subclasses II and IV, most FLAs from the same species group together (e.g. NbFLA6/9/16/17 and TaFLA6/7/8/29). Subclass II has the fewest members and most of them are not GPI anchored, but the OsFLAs are a significant exception.
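The pair-wise sequence similarity used alongside the tree can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned to equal length with `-` as the gap character and reports percent identity over ungapped columns. The function name and the toy sequences are hypothetical, and the source does not state which similarity measure was actually used.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences.

    Columns where either sequence has a gap are skipped (an assumption;
    other conventions count gaps as mismatches).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # ignore gapped columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy aligned fragments (hypothetical, not real NbFLA sequences)
identity = percent_identity("MKT-AYH", "MRTLAYH")
print(f"{identity:.1f}% identity")
```

Skipping gapped columns keeps the measure independent of length differences between proteins with one and two fasciclin domains, at the cost of ignoring insertions and deletions.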
Previously reported fasciclin domains contain about 110-150 amino acid residues and have two highly conserved regions (H1 and H2) and a [Phe/Tyr]-His ([Y/F]H) motif [12]. An alignment of the amino acid sequences of the fasciclin domains of the NbFLAs constructed using MUSCLE and some manual analysis showed a similar pattern (Fig. 2). The Thr residue in the H1 region is highly conserved and is followed by other conserved residues such as Val/Ile (one position after Thr) and Asn/Asp (six positions after Thr). These residues may play a role in maintaining the structure of the fasciclin domain and/or cell adhesion [12]. As reported for other fasciclin domains [11,31,35], small hydrophobic amino acids such as Leu, Val and Ile are abundant in the H2 region. In the [Y/F]H motif, His and Pro residues are also relatively conserved.
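As an illustration of how a [Y/F]H motif can be located in a protein sequence, the following Python sketch scans a sequence with a regular expression. It is only a sketch: a two-residue pattern matches by chance in many proteins, so in practice the search would be restricted to a fasciclin domain (here via the optional, hypothetical `start` and `end` arguments). The example sequence is a toy string, not a real NbFLA.

```python
import re
from typing import List, Optional

# Tyr or Phe immediately followed by His
YFH_PATTERN = re.compile(r"[YF]H")


def find_yfh(sequence: str, start: int = 0, end: Optional[int] = None) -> List[int]:
    """Return 0-based positions of [Y/F]H matches within sequence[start:end]."""
    seq = sequence.upper()
    stop = len(seq) if end is None else end
    return [m.start() for m in YFH_PATTERN.finditer(seq, start, stop)]


# Toy sequence (hypothetical)
print(find_yfh("MKTAYHLLFHP"))
```

Positions are 0-based; add 1 to compare them with residue numbering in an alignment figure.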

Analysis of the structural and conserved motifs of NbFLAs
Further analysis of gene structure and motifs of the NbFLAs is shown in Fig. 3. The phylogenetic tree confirmed that NbFLAs could be grouped into four subclasses (Fig. 3a). Analysis of the genomic DNA sequences showed that NbFLAs usually had 0, 1 or 2 introns (Fig. 3b). All of the members of subclass II have one or two introns while most members of subclasses I and III have none (Fig. 3b). The most closely related members of each subclass usually have a similar exon/intron structure, with little difference in the length of introns and exons. However, a few NbFLA gene pairs showed different intron/exon arrangements. For example, NbFLA1 and NbFLA31 have high sequence similarity, but NbFLA1 has no introns while NbFLA31 has one.
An online MEME analysis was performed to identify additional motifs among the 38 NbFLAs. Twenty conserved motifs were predicted (Fig. 3c and Additional file 3: Table S2) and each NbFLA contained between five and ten of these. Some motifs were common to most members, while others were unique to one or a few subclasses. For example, most NbFLAs (84%) contained motif 17. Motifs 10 and 11 were present only in subclass III and motifs 9, 16, 18 and 19 were found only in subclass II. Motif 7 was unique to subclasses II and IV, and

Prediction of cis-acting elements and transcription factors among the NbFLAs
The cis-acting elements in the promoter regions of the NbFLAs were analyzed and a total of 105 cis-acting elements were predicted (Fig. 4 and Additional file 4: Table S3). These cis-acting elements were related to environmental stress, hormone response, development, light response, promoter, site binding and other functions (Fig. 4a). The most abundant elements were light-responsive elements, including G-box, GT1-motif and GATA-motif. Fifteen hormone-responsive elements were identified, mainly involved in the response to abscisic acid (ABA) or methyl jasmonate (MeJA) (Fig. 4b). Among the predicted environmental stress-related elements, STRE, MBS and ARE were the most abundant (Fig. 4c). Several abundant predicted cis-acting elements are known to mediate plant immunity. For example, VdMYB1 binds to the MBS in the VdSTS2 gene promoter, thus activating VdSTS2 transcription and positively regulating defense responses [36]. Machi3-1 and TaRIM1 also bind MBS cis-acting elements to increase host resistance [37,38].
By binding to transcription factors (TFs), cis-acting elements regulate the precise initiation and efficiency of gene transcription. We therefore predicted potential TFs which may regulate the transcription of NbFLAs (Fig. 5 and Additional file 5: Table S4). The NbFLAs had an average of five TFs, but it appears that NbFLA4 and NbFLA27 may be regulated by more TFs, including specific TFs like RAV and CPP, while NbFLA8/15/38 may each be regulated by only two TFs. In total, 25 TFs were predicted, of which C2H2, BBR-BPC, Dof, Myb and MIKC were the most abundant. Previous studies have demonstrated the role of TFs in regulating plant immunity. NbCZF1, a novel C2H2-type zinc finger protein, is a regulator of plant defense [39] and VvDOF3 enhances powdery mildew resistance in Vitis vinifera [40]. In addition, AtMyb15 and MdMyb30 also participate in enhancing disease resistance [41,42].
Fig. 5 Regulation network between NbFLAs and potential TFs. Green hexagons represent transcription factors, blue rectangles represent NbFLAs, and black lines represent potential regulatory relationships.

Subcellular localization analysis of NbFLAs
Bioinformatics analysis based on the NbFLA amino acid sequences suggested that all of them could localize to membranes, and only NbFLA4 was predicted to localize to both the nucleus and membranes (Table 1). To validate these predictions, we selected one NbFLA from each subclass (NbFLA4/6/31/32) and analyzed their localization by laser confocal microscopy. AtPIP2A-GFP was used as a membrane marker [43]. The results showed that while NbFLA6 and NbFLA32 were located only in membranes, NbFLA4 was present in both membranes and the nucleus, consistent with the predictions (Fig. 6).
A GPI anchor signal is vital for membrane localization and is predicted in about two thirds of AtFLAs and PtrFLAs and in 20 of 38 (53%) NbFLAs (Table 1). Among the four selected NbFLAs, only NbFLA31 was not GPI anchored. Correspondingly, although a plasmolysis experiment confirmed the membrane localization of NbFLA31, a diffuse red fluorescence could also be observed in the cytoplasm (Fig. 6 and Additional file 6: Fig. S2).

Tissue-specific expression of NbFLAs
To better understand the functions of NbFLAs, two or three NbFLAs from each subclass were randomly selected to analyze their expression in five different tissues (root, stem, young leaf, mature leaf and flower) by RT-qPCR (Fig. 7 and Additional file 7: Fig. S3). The expression level of all selected NbFLAs (except NbFLA4) was higher in young leaves than in mature ones. NbFLA11/18/31/32/34 were highly expressed in young leaves, and NbFLA4 was highly expressed in flowers. It was earlier reported that PtFLA6 is specifically expressed in tension wood (TW) and that decreased transcripts of PtFLA6 influenced stem dynamics [18]. In this study, NbFLA2/6/15/17, belonging to subclasses I and II, were highly expressed in stems, suggesting that they may play a role in stem dynamics.

Expression of NbFLAs under biotic stress
To investigate whether NbFLAs participate in the response to pathogens, leaves of N. benthamiana were inoculated with turnip mosaic virus (TuMV), potato virus X (PVX), pepper mild mottle virus (PMMoV) or the bacterial pathogen Pseudomonas syringae pv tomato strain DC3000 (Pst DC3000). At 5 days post virus inoculation (dpi), or 2 days post Pst DC3000 infection, leaves were collected to study the expression pattern of 11 NbFLA genes by RT-qPCR (Fig. 8).
TuMV infection led to a marked reduction in expression of all the NbFLAs tested, especially NbFLA15/18/32/34, which all decreased by more than 99%. PVX or PMMoV infection usually induced a modest reduction in expression, although NbFLA6 was slightly upregulated by PVX. The bacterial pathogen Pst DC3000 decreased expression of most NbFLAs by 73-99% but, in contrast, NbFLA4 and NbFLA7 were substantially upregulated. These results show that the expression of most NbFLAs is substantially affected by TuMV and Pst DC3000, and that NbFLAs may therefore play roles in post-infection responses.

Discussion
FLA families have been identified and characterized in several plants including Arabidopsis [8], rice [9,10], wheat [10], poplar [11], cotton [12], Chinese cabbage [13], Eucalyptus grandis [14] and textile hemp [15]. In this study, we identified 38 FLAs in N. benthamiana and found that their structural domains were conserved by studying phylogenetic trees, gene structure and conserved motifs (Fig. 3). In general, NbFLAs could be divided into four subclasses and NbFLAs in each subclass had similar gene structure, motifs and conserved domains. Consistent with the FLAs in Arabidopsis [8], subclass II contained fewest NbFLAs and NbFLAs in subclass IV were the most variable. The FLAs of other dicotyledonous plant species had similar properties in each subclass, but while dicot members of subclass II have no GPI, most OsFLAs and TaFLAs in the subclass are GPI anchored [10]. In addition, OsFLAs in subclass II have only one fasciclin domain, unlike the FLAs of the dicotyledonous species [10]. Thus a different classification of FLAs in monocotyledonous plants may be required.
Twenty-five of the 38 NbFLAs had a single fasciclin domain, 13 had two domains, and 20 of the 38 were GPI anchored. A GPI anchor signal together with a fasciclin domain is known to be important for cell adhesion, for membrane localization and for enabling more stable interactions between adhesion complexes. It has been suggested that plants may have GPI-anchored FLAs for maintaining the integrity of the plasma membrane and FLAs that are not GPI-anchored for mediating cell expansion [8].
Previous studies have shown different expression patterns of FLAs in the tissues of other plants. For example, AtFLA11/12 were highly expressed in stems [22], as were BrFLA6/9/22 (homologous to AtFLA11). Some EgrFLAs were also highly expressed in stems [14,22] and 10 Pop-FLAs were highly expressed in poplar tension wood [35].
Some biotic and abiotic stresses lead to significant changes in the transcription of FLAs. For example, under H2O2 stress, the expression levels of wheat FLA proteins were increased, which may contribute to H2O2 tolerance [33]. Similarly, AtFLA3 was expressed more highly under cold stress [32]. Under salt stress, OsFLA10/18 expression was reduced [9] while PtrFLA2/12/20/21/24/30 were upregulated [11]. In addition, TaFLA3/4/9 were downregulated after heat, ABA or NaCl treatment [10]. OsFLA24 and AtFLA1/2/8 were also significantly reduced following ABA treatment [8,9]. Many of the frequently predicted TFs in the NbFLAs, including C2H2, Dof and Myb, have been reported to play a role in the ABA pathway [45][46][47][48] and therefore, as in other species, NbFLAs may be regulated by the ABA pathway. While the function of FLAs in the signaling pathway during abiotic stresses has been investigated, little is known about their potential role in response to pathogens. AtFLA1/2/8 were decreased by pathogen challenge, oxidative stress and in ascorbate-deficient vtc mutants [49]. The fungus Ophiostoma novo-ulmi reduced the expression of FLAs in English elm ramets [50]. Our results show that almost all NbFLAs were specifically downregulated by TuMV and Pst DC3000 infection and this suggests that NbFLAs may have specific roles in pathogen infection.
Because of their role in cell adhesion and their membrane localization, AGPs (including FLAs) may interact with receptor-like kinases such as wall-associated kinases and thus be involved in signal transduction [51]. For example, AtFLA4 (SOS5) mediated root growth and seed adhesion through the cell wall receptor-like kinases FEI1/2 [27], and modulated ABA signaling to regulate cell wall biosynthesis and root growth [25,27]. The known functions of GPI and the fasciclin domain suggest that NbFLAs might be involved in host-pathogen interactions. Thus, a further role of NbFLAs in plant resistance is worth exploring.

Conclusion
In this study, 38 NbFLAs were identified and could be divided into four subclasses. In general, the closest members of NbFLAs from the same subclass have similar structure and conserved motifs. The expression patterns of selected NbFLAs in different tissues were diverse and selected NbFLAs were downregulated following infection by TuMV or Pst DC3000. Our results will help to lay the foundation for understanding of the structure and characteristics of the FLA family and for exploring the relationship between FLAs and immunity in N. benthamiana.

Identification of the NbFLA family
The sequences of the 21 identified AtFLAs were downloaded and the N. benthamiana genome was downloaded from the Sol Genomics Network (https://solgenomics.net/) [34]. NbFLAs were identified by two rounds of BLASTP. First, all AtFLAs were used to search for possible NbFLAs using TBtools [52]. Then NCBI Batch CD-Search [53,54] was used to confirm whether candidate NbFLAs contained a fasciclin domain, including FAS1 (smart00554), Fasciclin superfamily (cl02663) or Fasciclin (pfam02469). Next, we predicted the N-terminal signal peptide with SignalP 5.0 [55], the C-terminal GPI anchor addition signal with big-PI Plant Predictor [56], and the glycosylation sites with NetGlycate 1.0 [57]. Finally, using previously established criteria, sequences that contained an AGP-like glycosylated region, fasciclin domains and an N-terminal signal peptide were considered to be NbFLAs [11]. The CDS length, pI and molecular weight (MW) of all predicted NbFLAs were then determined by ExPASy [58] and their subcellular localization was predicted by Plant-mPLoc [59].
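The final selection step described above amounts to requiring three features at once. A minimal Python sketch follows, assuming the outputs of the domain, signal peptide and glycosylation predictions have already been summarized per candidate. The `Candidate` record, its field names and the example entries are hypothetical and do not reflect the output formats of the tools named above.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Candidate:
    """Hypothetical per-protein summary of the prediction results."""
    name: str
    fasciclin_domains: int    # number of fasciclin domains found by the domain search
    has_signal_peptide: bool  # N-terminal signal peptide predicted
    has_agp_region: bool      # AGP-like glycosylated region predicted
    has_gpi_anchor: bool      # GPI anchor signal predicted (recorded, not required)


def is_putative_fla(c: Candidate) -> bool:
    """Apply the three criteria; a GPI anchor is not a requirement."""
    return c.fasciclin_domains >= 1 and c.has_signal_peptide and c.has_agp_region


def select_flas(candidates: Iterable[Candidate]) -> List[Candidate]:
    """Keep only the candidates that satisfy all three criteria."""
    return [c for c in candidates if is_putative_fla(c)]


# Illustrative entries only (not real predictions)
example = [
    Candidate("cand_1", 1, True, True, True),
    Candidate("cand_2", 2, True, True, False),  # kept: GPI anchor is optional
    Candidate("cand_3", 0, True, True, True),   # dropped: no fasciclin domain
    Candidate("cand_4", 1, False, True, False), # dropped: no signal peptide
]
print([c.name for c in select_flas(example)])
```

Recording the GPI prediction without filtering on it matches the criteria in the text, where GPI anchoring is used later for classification rather than for identification.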

Phylogenetic analysis and multiple sequence alignment
Sequences of AtFLA proteins were obtained from the NCBI protein database (http://www.ncbi.nlm.nih.gov/protein/). A neighbor-joining (NJ) phylogenetic tree of full-length sequences of AtFLAs and NbFLAs was constructed with 1000 bootstrap replicates using MEGA7.0. A multiple sequence alignment of all NbFLAs was also created by Clustal X 2.0 [60]. Gene structure and conserved domains were analyzed and visualized using NCBI Batch CD-Search [53,54] and TBtools [52]. Conserved motifs of the genes were analyzed by the MEME program [61] with the following parameters: optimum motif width was set to 30-70, the number of repetitions was set to zero or one, the maximum number of motifs was set to identify 15 motifs.

Promoter cis-acting elements and TFs prediction
The promoter cis-acting elements were predicted by PlantCARE [62] and transcription factors were predicted by PlantRegMap [63], with N. sylvestris as the target species.

Plasmid construction and Agroinfection assays in N. benthamiana
Based on the sequences above, we cloned the CDS sequences of NbFLA4/6/31/32 and inserted them into a transient expression vector carrying a red fluorescent tag. All primers used for plasmid construction are listed in Additional file 8: Table S5. Agroinfection assays were conducted as previously described [64]. Briefly, the constructs were transformed into A. tumefaciens (strain GV3101) by electroporation. The transformants were cultured and re-suspended in the inoculation buffer [10 mM MgCl2, 2 mM acetosyringone, 100 mM MES (pH 5.7)] for 3-5 h at room temperature. The suspensions were then adjusted to OD600 = 0.1 and infiltrated into leaves of 4- to 6-week-old N. benthamiana plants with needleless syringes.

Plant growth and pathogen inoculation
N. benthamiana seeds were donated by Dr. Yule Liu (Tsinghua University, China) and plants were grown in a mixed soil matrix (peat: vermiculite = 1:1) under a 16-h light (2000 lx)/8-h dark photoperiod at 26 ± 2°C with relative humidity 60 ± 5%. A TuMV infectious clone was kindly provided by Dr. Fernando Ponz (INIA, Laboratorio de Virología Vegetal, Spain), a PVX infectious clone was kindly provided by Dr. Stuart MacFarlane (James Hutton Institute, UK) and a PMMoV infectious clone was created in our lab. The Pst DC3000 strain was kindly provided by Dr. Yule Liu (Tsinghua University, China). TuMV, PVX and PMMoV were inoculated onto the newly expanded leaves of N. benthamiana. Inoculum was obtained by homogenizing virus-infected leaves in phosphate buffer, and phosphate buffer alone was used as the mock control. Pst DC3000 was cultured in King's B medium at 28°C. Leaves of N. benthamiana were infiltrated with a suspension of Pst DC3000 (OD600 = 10⁻⁵) in 10 mM MgCl2, while plants infiltrated only with 10 mM MgCl2 were used as the negative control, as previously described [65].

Expression analysis by RT-qPCR
RT-qPCR analysis was performed to confirm the expression of representative NbFLA genes. We used at least three independent biological replicates and three technical replicates. First-strand cDNA was synthesized from 0.5 μg of RNA with the PrimeScript RT reagent kit (TaKaRa). RT-qPCR was carried out by SYBR-green fluorescence using the Roche LightCycler®480 Real-Time PCR System. Relative gene expression levels were calculated according to the ΔΔCT method [66] and visualized in a heat map with TBtools [52]. All primers used for RT-qPCR are listed in Additional file 8: Table S5.
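For readers unfamiliar with the ΔΔCT calculation, a minimal Python sketch is given here. It assumes a single reference gene, roughly 100% amplification efficiency (so each cycle corresponds to a two-fold difference), and technical replicates averaged before the calculation. The function names and all CT values are hypothetical; the reference gene used in this study is not named in this section.

```python
from statistics import mean
from typing import Sequence


def relative_expression(
    ct_target_treated: Sequence[float],
    ct_ref_treated: Sequence[float],
    ct_target_control: Sequence[float],
    ct_ref_control: Sequence[float],
) -> float:
    """Fold change of the target gene (treated vs control) as 2^-(ΔΔCT)."""
    # ΔCT: target normalized to the reference gene within each condition
    delta_treated = mean(ct_target_treated) - mean(ct_ref_treated)
    delta_control = mean(ct_target_control) - mean(ct_ref_control)
    # ΔΔCT: treated condition relative to control
    delta_delta = delta_treated - delta_control
    return 2.0 ** (-delta_delta)


def percent_change(fold_change: float) -> float:
    """Convert a fold change to a percent change (negative = reduction)."""
    return (fold_change - 1.0) * 100.0


# Hypothetical technical replicates (CT values), not data from this study
fold = relative_expression(
    ct_target_treated=[27.1, 26.9, 27.0],
    ct_ref_treated=[18.0, 18.1, 17.9],
    ct_target_control=[20.0, 20.1, 19.9],
    ct_ref_control=[18.0, 17.9, 18.1],
)
print(f"fold change = {fold:.4f}, change = {percent_change(fold):.1f}%")
```

Under these assumptions a reduction of more than 99% corresponds to a fold change below 0.01, that is, a ΔΔCT greater than about 6.6 cycles.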