Genome-wide identification and evolutionary analysis of RLKs involved in the response to aluminium stress in peanut

Background Peanut is an important cash crop whose yield is influenced by soil acidification and pathogen infection. Receptor-like protein kinases (RLKs) play important roles in plant growth, development and stress responses. However, little is known about the number, location, structure, molecular phylogeny, and expression of RLKs in peanut, and no comprehensive analysis of RLKs in the Al stress response in peanut has been reported. Results A total of 1311 AhRLKs were identified from the peanut genome. The AhLRR-RLKs and AhLecRLKs were further divided into 24 and 35 subfamilies, respectively. The AhRLKs were randomly distributed across all 20 chromosomes in the peanut. Among these AhRLKs, 9.53% and 61.78% originated from tandem duplications and segmental duplications, respectively. The Ka/Ks ratios of 96.97% (96/99) of tandem duplication gene pairs and 98.78% (646/654) of segmental duplication gene pairs were less than 1. Among the tested tandem duplication clusters, there were 28 gene conversion events. Moreover, a total of 90 Al-responsive AhRLKs were identified by mining transcriptome data, and they were divided into 7 groups. Most of the Al-responsive AhRLKs that clustered together had similar motifs and evolutionarily conserved structures. The expression patterns of these genes in different tissues were further analysed, and tissue-specifically expressed genes, including 14 root-specific Al-responsive AhRLKs, were found. In addition, all 90 Al-responsive AhRLKs, which were distributed unevenly among the subfamilies of AhRLKs, showed different expression patterns between the two peanut varieties (Al-sensitive and Al-tolerant) under Al stress. Conclusions In this study, we analysed the RLK gene family in the peanut genome. Segmental duplication events were the main driving force for AhRLK evolution, and most AhRLKs were subject to purifying selection. 
A total of 90 genes were identified as Al-responsive AhRLKs, and the classification, conserved motifs, structures, tissue expression patterns and predicted functions of Al-responsive AhRLKs were further analysed and discussed, revealing their putative roles. This study provides a better understanding of the structures and functions of AhRLKs and Al-responsive AhRLKs. Supplementary Information The online version contains supplementary material available at 10.1186/s12870-021-03031-4.


Background
Aluminium (Al) is one of the most harmful factors for plant growth in acidic soils, and Al can cause 25% to 80% yield losses depending on the crop [1,2]. Al signalling induces a series of physiological events in plant cells. The most obvious symptoms of Al toxicity are inhibition of cell elongation in the apical region and induction of programmed cell death (PCD) [3-5]. PCD is an active, orderly, and genetically controlled form of cell death that occurs in plants throughout development and in response to environmental stresses [6]. Early studies found that Al treatment can enhance Fe2+-induced lipid peroxidation and PCD in tobacco cells [7]. Over the decades, Al-induced PCD has been demonstrated in many plant species, including soybean (Glycine max) [8], maize (Zea mays) [9], barley (Hordeum vulgare) [10], tomato (Lycopersicon esculentum) [11] and peanut (Arachis hypogaea) [12]. Al-induced PCD is mediated through two cell signal transduction pathways: a mitochondrial-dependent pathway and a nuclear-dominated mitochondrial-independent pathway [5]. However, how the Al signal is perceived and transduced across the membrane remains unknown. Both pathways rely on plasma membrane- and/or cell wall-localized receptors, which sense environmental stimuli, transduce signals between cells, and modulate gene expression and/or enzyme activity as well as motility [13]. Receptor-like protein kinases (RLKs) play important roles in cell signal transduction and are involved in a variety of plant physiological processes, including self-incompatibility [14], environmental signal processing [15], organ shape and meristem activity [16], hormone signal transduction [17], PCD [18], and tolerance to oxidative stress [19]. RLKs sense and transduce signals through protein interactions and phosphorylation [20]. 
Based on the structure of the extracellular domain, RLKs have been classified into several families, such as S-RLKs, LRR-RLKs, EGF-RLKs, LecRLKs, TNFR-RLKs and PR5K-RLKs [21]. While many RLKs involved in environmental stress responses have been found, few RLKs have been reported to be involved in the Al stress response. WAK1, which mediates the interaction between the cell wall and cytoplasm and may participate in cell elongation and morphogenesis [22], was the first RLK found to be involved in the Al stress response. The overexpression of WAK1 was reported to enhance Al tolerance in Arabidopsis [23]. These findings suggest that RLKs play an important role in Al-induced PCD, but the mechanism by which RLKs regulate Al-induced PCD is unknown.
Peanut is an important oil crop worldwide. Al-dependent inhibition of growth reduces peanut yield in acidic soil. No comprehensive analysis of the RLK gene family in the peanut has been reported. In the present study, recently released peanut whole genome sequence data (http://peanutgr.fafu.edu.cn/index.php) were utilized to analyse the RLK gene family in peanut. A total of 1311 AhRLKs were identified. The LRR-RLKs and LecRLKs were further divided into 24 and 35 subfamilies, respectively, based on a phylogenetic analysis. The evolution and collinearity of AhRLKs were investigated, and the evolutionary patterns of the RLK gene family were tested by investigating gene duplication events in the peanut. In addition, 90 AhRLKs responsive to Al stress were identified by transcriptomic analysis, and the expression profiles of these AhRLKs at different Al treatment time points were comprehensively determined. These results provide a basis for further research on the evolution and physiological functions of AhRLKs in response to Al stress in the peanut.

Identification of AhRLKs in the peanut
To identify the AhRLKs in the peanut, we downloaded publicly available peanut genome sequence data and used the Arabidopsis RLK sequences as queries to perform a genome-wide similarity search. After filtering the sequences, a total of 1311 AhRLKs that contained at least one kinase domain were initially identified, including 548 LRR-RLKs, 274 LecRLKs, 83 cysteine-rich RLKs, 76 EGF RLKs, 49 proline-rich RLKs, 46 S-domain RLKs, 22 TMK-RLKs, 2 TNFR-RLKs, 1 RRO-RICH RLK, 28 RLCK-RLKs, 24 LysM-RLKs, and 158 RLKs with no obvious domains (Additional files 1 and 2). LRR-RLKs and LecRLKs were considered for further analyses.

Chromosomal location and gene duplication of AhRLKs
Physical positions of AhRLKs obtained from the "Peanut Genome Resource" (http://peanutgr.fafu.edu.cn/) [26] were used to map them onto peanut chromosomes. Chromosome location information demonstrated that the AhRLKs were unevenly distributed among the 20 chromosomes of the peanut, while 1.14% (15/1311) had no assembly information (Fig. 3). Many AhRLKs were located on chromosomes 14 (111, 8.47%) and 13 (106, 8.09%), while only 31 (2.36%) AhRLKs were located on chromosome 6. Regarding LRR-RLKs, subfamilies LRR-XI and LRR-III were present on all chromosomes, while others were found only on some chromosomes. The majority of the LRR-RLKs and LecRLKs were located on chr 3, 13, 8 and 18 (Additional file 3). In particular, all members of the G-LecRLKs-XVII and G-LecRLKs-VIa subfamilies were distributed on chr 8 and 18 (Additional file 4, Fig. 4).
Gene duplication events play an important role in the evolution of new protein functions and the expansion of genomes. Segmental duplication and tandem duplication are the main causes of the expansion of gene families in plants [27]. Two or more AhRLKs located within 100 kb of each other on a chromosome were considered a tandem duplication cluster. The results showed that approximately 9.53% (125/1311) of the genes were located in tandem duplication regions and constituted 52 clusters (Additional file 5). Among these genes, 5

Phylogenetic analysis of Al-responsive AhRLKs
In a previous study, we performed a transcriptome analysis to identify differentially expressed genes (DEGs) and pathways between two peanut cultivars under Al stress [29]. In this study, we scrutinized the transcriptome data to detect the AhRLKs involved in the Al response. Genes with log2-transformed FPKM ratios greater than 1 or less than -1 were defined as differentially expressed genes. A total of 90 Al-responsive AhRLKs were identified. To reveal the evolutionary relationships of these proteins, a phylogenetic tree was constructed using the ML method (Fig. 6). Phylogenetic analysis of all 90 AhRLKs revealed that the Al-responsive AhRLKs could be further classified into 7 groups, including 48.9% LRR-RLKs, 21.1% LecRLKs and 8.9% CRKs. The phylogenetic tree showed that most of these genes belonged to LRR-RLKs and LecRLKs, covering the main subfamilies of both families. Interestingly, these Al-responsive AhRLKs were evenly distributed across the LecRLK family but unevenly distributed across the LRR-RLK subfamilies, concentrating in LRR-III, LRR-XI, LRR-XII, LRR-VIII-1, and LRR-VIII-2.
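The differential-expression criterion described above can be illustrated with a minimal Python sketch. It assumes one FPKM value per gene for the treated and control samples; the pseudocount added to avoid division by zero and the function names are assumptions made for illustration, not details reported in this study.

```python
import math


def log2_ratio(treated_fpkm, control_fpkm, pseudocount=1.0):
    """Return the log2-transformed FPKM ratio (treated / control).

    The pseudocount is an assumption of this sketch; it keeps the ratio
    defined when a gene has zero FPKM in one of the samples.
    """
    return math.log2((treated_fpkm + pseudocount) / (control_fpkm + pseudocount))


def is_differentially_expressed(treated_fpkm, control_fpkm, threshold=1.0):
    """Apply the cutoff from the text: log2 ratio > 1 or < -1."""
    return abs(log2_ratio(treated_fpkm, control_fpkm)) > threshold


if __name__ == "__main__":
    # Hypothetical FPKM values, for illustration only.
    print(is_differentially_expressed(7.0, 1.0))   # log2(8/2) = 2
    print(is_differentially_expressed(3.0, 1.0))   # log2(4/2) = 1, not above the cutoff
```

With this cutoff a gene must change by more than twofold in either direction to be counted, so a gene sitting exactly at a twofold change is excluded.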

Characterization of the amino acid sequences and gene structure of Al stress-related AhRLKs
As shown in Fig. 7, 90 Al stress-related AhRLKs were divided into 7 groups. The diversification of exons/ introns has been reported to be an important reason for the evolution of certain gene families [30]. The distribution of exons/introns of AhRLKs was further analysed. The results showed that 7.8% of Al stress-related AhRLKs (7/90) had no introns. One, two and three introns were found in 30% (   introns, of which 70.6% (36/51) were LRR-RLKs, and 7.8% (4/51) were LecRLKs. This result was similar to the study in which most LRR-RLKs in Arabidopsis had fewer than three introns [31]. Moreover, to analyse the diversity of the Al stress-related AhRLKs, the MEME tool was used to predict putative motifs of these proteins. A total of 5 different motifs were detected in Al stress-related AhRLKs and named motifs 1 to 5 (Additional file 8

Expression profiles of Al-responsive AhRLKs in different tissues
To further understand the role of Al-responsive AhRLKs in peanut growth and development, the expression profiles of Al-responsive AhRLKs from different organs, including leaves, stems, florescence, roots and root tips, were tested in a cultivated variety (A. hypogaea L.) using transcriptomic data (Fig. 8

Expression patterns of Al-responsive AhRLKs under Al stress
To further investigate the putative functions of Al-responsive AhRLKs, an RNA-Seq dataset generated from different Al treatment time points was utilized to reveal the expression profiles of these genes under Al stress. The expression profiles of Al-responsive AhRLKs are shown in histograms (Fig. 9).
As shown in Fig. 9

Segmental duplication events played an important role in AhRLK family evolution
RLKs are involved in a variety of plant physiological processes and various abiotic and biotic stress responses [32,33]. The 548 LRR-RLKs, twice the number of Arabidopsis LRR-RLKs, were classified into 24 subfamilies (I to XXIV) based on their phylogenetic relationship with Arabidopsis (Fig. 1). In general, most peanut subfamilies contained twice as many LRR-RLKs as the corresponding Arabidopsis subfamilies, except LRR-XII, LRR-XIV, LRR-XV and LRR-XVI, which had more than three times as many members as in Arabidopsis. Only one subfamily, LRR-V, had fewer members than in Arabidopsis. The number of LecRLKs was over 3 times the number of AtLecRLKs (Fig. 2). Peanut subfamilies such as L-LecRLK-VII, L-LecRLKs-IX and G-LecRLKs-VIa were much larger than the corresponding subfamilies in Arabidopsis, while some subfamilies, including G-LecRLKs-VIb, G-LecRLKs-VIII, G-LecRLKs-VII, G-LecRLKs-X, G-LecRLKs-III, L-LecRLKs-VI, L-LecRLKs-I, L-LecRLKs-II, L-LecRLKs-III and L-LecRLKs-V, were not found in the peanut (Tables 1 and 2). Polyploidy may have increased the number of genes in the peanut genome. In recent research, a total of 309, 379, 467, 531, and 543 LRR-RLKs have been identified in diploid rice [34], diploid poplar [35], tetraploid soybean [36], allohexaploid wheat [37] and tetraploid cotton [38], respectively, indicating that larger gene families are present in polyploid plants. Other mechanisms, such as duplication and shuffling, also contribute to the expansion of gene families. Gene duplication is the main mechanism underlying such evolutionary events [39]. The gene duplication results revealed that 9.53% (125/1311) of AhRLKs were located in regions with tandem duplications, and 61.78% (810/1311) were located in regions with segmental duplications, which indicated that segmental duplication played a major role in the evolution of AhRLKs (Additional file 5). 
Among the AhLRR-RLKs, 5.66% (31/548) and 66.60% (365/548) were located in tandem duplication regions and segmental duplication regions, respectively. This finding is consistent with work in soybean showing that segmental duplication may be the main mechanism of LRR-RLK amplification [36]. In addition, the Ka/Ks ratios of 94.9% (1290/1360) of AhRLKs were less than 1, which suggested that most AhRLKs were under purifying selection (Fig. 5). The Ka/Ks ratios of six gene pairs, including a pair involving AH16G29500.1 and the pair AH14G36690.1 and AH14G43630.1, were more than 1, which indicated that these genes were under positive selection in peanut, were evolving rapidly, and might be very important for the evolution of the peanut. We also calculated the divergence time, and the results showed that many tandem duplication events appeared to have occurred during a relatively recent period, 0-10 MYA, and many segmental duplication events appeared to have occurred during 0-30 MYA (Fig. 5b; Additional file 6), illustrating that these AhRLKs were generated by recent gene duplication events in Arachis hypogaea L. Moreover, 28 gene conversion events were detected among the genes in the 52 tandem duplication clusters, and 44 genes were involved in at least one gene conversion event, which suggested that gene conversion had taken place between the duplicated AhRLKs. Gene conversion is implicated in the concerted evolution of multigene families, which helps gene evolution by allowing more time for duplicated genes to obtain selectable differences [40,41]. As changes in expression patterns are an important factor that causes genes to gain selectable differences [40,42], studying the temporal and spatial expression patterns of these genes would be of interest.
Table 1 Total number of receptors distributed in the different subfamilies of LRR-RLKs

Conservation of the AhRLKs in response to Al stress
In this study, a total of 90 AhRLKs were identified as Al stress-related genes, which were divided into 7 groups (Fig. 7). Most of the subgroups showed a regular exon-intron structure. For instance, all genes in subgroups I, II and VII contained more than three introns. Members belonging to the same subgroup had similar exon/intron organization. Furthermore, 5 conserved motifs were identified in these AhRLKs, and the motif compositions among subgroups were consistent with the phylogenetic classification. These results indicated that the members within each subgroup were evolutionarily conserved.

Diverse roles of Al-responsive AhRLKs in different subgroups
To further understand the Al-responsive AhRLKs in the peanut, we investigated the potential functions of each subgroup (Table 3). In subgroup I, PERK1 has been reported to regulate ABA signalling pathways and modulate the expression of genes related to cell elongation and ABA signalling during root growth [43], implying that the genes in subgroup I are essential to plant signalling and growth. The inhibition of root elongation is known to be the primary symptom of Al toxicity, and the members of subgroup I may take part in the Al response by influencing cell elongation. The characterized genes in subgroup II were reported to play roles in plant signal transduction, plant growth and the biotic stress response. For instance, PXC1 and CRCK1 played a role in signal transduction [44,45], PRK1 was essential for the postmeiotic development of pollen [46], FLS2 was involved in preinvasive immunity against bacterial infection [47], and RCH1 was critical to resistance to the hemibiotrophic fungal pathogen Colletotrichum higginsianum [48]. In subgroup III, ANXUR1/ANXUR2 were involved in controlling pollen tube rupture during the fertilization process and regulating signal transduction [49]. FERONIA was required for cell elongation during vegetative growth [50], suggesting that the genes in subgroup III might play an important role in plant morphology. In subgroup IV, TMK1 was an essential enzyme for DNA synthesis in bacteria [51], which indicated that the genes of subgroup IV might play a critical role in cell expansion and proliferation regulation. The subgroup V gene RLK1 was reported to increase tolerance to salinity, heavy metal stresses, and Botrytis cinerea infection [52], suggesting that the genes of subgroup V are implicated in biotic and abiotic stress responses. 
In subgroup VI, CRK5 was reported to respond to drought and salt stresses [53], and CRK45 was a potential positive regulator of ABA signalling in early seedling growth [54] and stomatal movement [55], indicating that the genes of subgroup VI are critical to the abiotic stress response and related to plant morphology. The reported genes in subgroup VII, such as GsSRK, were shown to be positive regulators of plant tolerance to salt stress [56], and SD1-29 improved plant resistance to bacteria [57], showing that the genes of subgroup VII have critical roles in the response to biotic and abiotic stresses. In general, Al-responsive AhRLKs in different subgroups take part in the Al response through different pathways. Subgroups I and II are related to signal transduction, subgroup II is implicated in the biotic stress response, subgroups III and VI play an essential role in plant morphology, subgroup IV plays a critical role in cell expansion and proliferation regulation, and subgroups V and VII are critical to the biotic and abiotic stress responses (Table 3).
The AtRLK gene family plays a role in plant growth and development processes [63]. As shown in the histograms in Fig. 8, the expression patterns of the Al-responsive AhRLKs exhibited tissue specificity, and approximately 2.2% (2/90, AH07G04000.1 and AH16G09430.1) of Al-responsive AhRLKs were expressed in all four tested organs with high expression levels (value > 5) in the peanut, implying that these genes might play essential roles in plant growth and development. Approximately 2.2% (2/90, AH16G41130.1 and AH07G24540.1) of Al-responsive AhRLKs were expressed specifically and at a high level in aerial organs. About 8.9% (8/90, AH14G07810.1, AH03G21680.1, AH19G41030.1, AH13G57290.1, AH10G29990.1, AH08G20520.1, AH08G06390.1, and AH01G04120.1) of Al-responsive AhRLKs were expressed specifically and at a high level in roots or root tips. 
The tissue specificity of these Al-responsive AhRLKs indicates their key roles in tissue development or tissue functions. Additionally, 6 tissue-nonspecific genes (AH07G04000.1, AH03G13700.1, AH10G03910.1, AH08G04680.1, AH08G04640.1, and AH16G09430.1) that were nevertheless expressed at a high level in roots are also worth considering. As shown in the histograms in Fig. 9, the majority of the Al-responsive RLKs were upregulated after 8 h of Al treatment in 99-1507, while only moderate changes were detected in some Al-responsive RLKs in ZH2, which suggested that Al-responsive RLKs responded rapidly to Al stress in the Al-tolerant variety. Although the genes had different expression profiles under Al stress in different varieties, 12 genes (AH04G28680.1, AH16G41130.1, AH01G21880.1, AH10G16100.1, AH08G24070.1,  1), whose homologues have been reported to be involved in early seedling growth regulation, early flower primordia and stamen development, lateral root emergence, abiotic stress responses and plant defence signalling in Arabidopsis thaliana, were important Al-responsive genes that may be suitable candidates for interpreting the mechanisms underlying the Al response in peanut in future work.

Conclusions
In this study, a total of 1311 RLKs were identified in the peanut genome, 2 times the number of Arabidopsis RLKs, including 548 LRR-RLKs and 274 LecRLKs. LRR-RLKs represented the largest RLK gene family identified in plants. These AhRLKs were unevenly distributed among the 20 chromosomes of peanut. Compared with tandem duplication, segmental duplication might have played a more critical role in the expansion of AhRLKs. Furthermore, we identified a total of 90 Al-responsive AhRLKs by mining the transcriptome database. The exon/intron compositions and motif arrangements were considerably conserved among members of the same groups or subgroups. Analysis of transcriptome data revealed the tissue expression patterns of the 90 Al-responsive AhRLKs, and tissue-specifically expressed genes were found. Among them, root-specific genes might play a key role in Al sensing and response in the peanut. The close phylogenetic relationship between Al-responsive AhRLKs and characterized AhRLKs in the same subgroup provided insight into their putative functions. Overall, this systematic analysis provides valuable information for understanding the biological functions of the AhRLK genes under Al stress in peanut.

The resources of peanut AhRLKs
All full-length RLK amino acid sequences of Arabidopsis were downloaded from UniProt (https://www.uniprot.org/), and these sequences were used as queries to perform a BLASTP search against A. duranensis at NCBI (https://www.ncbi.nlm.nih.gov/). The resulting sequences were then used as new queries to conduct a second BLASTP search in the Peanut Genome Resource (http://peanutgr.fafu.edu.cn/) to avoid missing potential members. Redundant entries were removed manually. The resulting unique sequences were then analysed with both SMART (http://smart.embl-heidelberg.de) [65] and NCBI's Conserved Domains Database (CDD; http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) to ensure the presence of the RLK domains in newly identified members. Only proteins containing at least one kinase domain were considered putative AhRLKs, and 1311 AhRLKs were finally obtained. The number of amino acid residues and the molecular weight were predicted with the ExPASy ProtParam tool (https://web.expasy.org/protparam/). The genome sequence, protein sequences and genome annotation of the peanut were obtained from the Peanut Genome Resource (http://peanutgr.fafu.edu.cn/).
Table 3 (continued)
Receptor-like serine/threonine-protein kinase SD1-8; homologue SD1-29; resistances to bacteria in crop species [64]
Subgroup VII; AH01G24170.1; G-type lectin S-receptor-like serine/threonine-protein kinase B120; homologue GsSRK; a positive regulator of plant tolerance to salt stress [56]
Note: only the Al-responsive AhRLKs with characterized homologs were listed in the table

Multiple sequence alignments and phylogenetic tree construction of AhRLKs
The full-length amino acid sequences of LRR-AhRLKs, LecRLKs and 90 Al-responsive AhRLKs defined in the previous section were aligned using ClustalX in MEGA 7 with default parameters [66]. The phylogenetic tree based on the multiple sequence alignments of peanut LRR-RLKs (Fig. 1), LecRLKs (Fig. 2) and 90 AhRLKs in response to Al stress (Fig. 6) was generated by MEGA 7. A Poisson correction model was used to account for multiple substitutions, while alignment gaps were removed with partial deletion. The statistical strength was estimated by bootstrap resampling using 1000 replicates. Based on the multiple sequence alignment and the previously reported classification of Arabidopsis thaliana, the peanut RLKs were assigned to different subfamilies and subgroups [24,67].
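As an illustration of the Poisson correction mentioned above, the following Python sketch computes the distance d = -ln(1 - p) for one pair of aligned sequences, where p is the proportion of differing amino acid sites. It is a simplified stand-in rather than MEGA's implementation: sites with a gap in either sequence are simply skipped, which only approximates the partial-deletion option, and the function names and example sequences are hypothetical.

```python
import math


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap are ignored (a simplification
    of the partial-deletion treatment described in the text).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        raise ValueError("distance undefined when all sites differ")
    return -math.log(1.0 - p)


if __name__ == "__main__":
    # Hypothetical aligned fragments, for illustration only.
    print(poisson_distance("MKTL-A", "MKSLGA"))
```

The correction matters most for divergent pairs: as p grows, the corrected distance rises faster than p because repeated substitutions at the same site are increasingly likely.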

Chromosomal locations and duplication analysis for peanut RLKs
The physical locations of AhRLKs on the chromosomes were obtained from the Peanut Genome Resource database (http://peanutgr.fafu.edu.cn/). All AhRLKs were mapped onto peanut chromosomes based on their physical positions, and chromosomal location images were produced with the online software Map Gene 2 Chromosome v2 (MG2C: http://mg2c.iask.in/mg2c_v2.0/). The chromosome location information of the peanut was extracted from the GFF files that contain the peanut genome annotation. BLASTP was performed to search for potential homologous gene pairs (E-value < 1e-5) across genomes. Information on homologous pairs was used as input to identify syntenic chains with MCScanX [68]. MCScanX was also used to identify tandem and segmental duplications in the AhRLK gene family. RLKs clustered together within 100 kb were regarded as tandem duplicated genes, following the criteria used in other plants. The diagram was generated with TBtools [69]. The nonsynonymous (Ka) and synonymous (Ks) substitution rates and their ratios were calculated with the Simple Ka/Ks Calculator in TBtools. The divergence time was calculated with the formula T = Ks/2r, where the rate r for dicotyledonous plants was 1.5 × 10^-8 synonymous substitutions per site per year [70]. We used the GENECONV program with default parameters to search for evidence of gene conversion within tandem duplication clusters (http://www.math.wustl.edu/~sawyer/geneconv/) [71]. Since GENECONV requires at least three sequences to detect gene conversion events, only tandem duplication clusters that contained at least 3 genes were tested. For this program, the ClustalW alignment of the coding sequences (CDS) was used as the input. GENECONV can detect candidate fragments of directed gene conversion between gene pairs (allowing mismatches). Gene conversion events were considered statistically significant when P < 0.05.
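The 100 kb tandem-cluster criterion, the divergence-time formula T = Ks/2r and the Ka/Ks interpretation used in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the MCScanX or TBtools procedure: distance is measured between the start coordinates of neighbouring family members on the same chromosome, neighbours are chained into one cluster, and all gene identifiers and coordinates are hypothetical.

```python
from collections import defaultdict

# Rate r from the text: synonymous substitutions per site per year.
SUBSTITUTION_RATE = 1.5e-8


def divergence_time_mya(ks, rate=SUBSTITUTION_RATE):
    """Divergence time T = Ks / (2r), returned in million years (MYA)."""
    return ks / (2 * rate) / 1e6


def classify_selection(ka, ks):
    """Interpret Ka/Ks: < 1 purifying, > 1 positive, = 1 neutral."""
    if ks <= 0:
        raise ValueError("Ks must be positive")
    ratio = ka / ks
    if ratio < 1:
        return "purifying"
    if ratio > 1:
        return "positive"
    return "neutral"


def tandem_clusters(genes, max_gap=100_000):
    """Group family members lying within max_gap bp of their neighbour.

    genes: iterable of (gene_id, chromosome, start) tuples.
    Returns a list of clusters (lists of gene IDs) with at least 2 genes.
    Using start-to-start distance and chaining is an assumption here.
    """
    by_chromosome = defaultdict(list)
    for gene_id, chromosome, start in genes:
        by_chromosome[chromosome].append((start, gene_id))

    clusters = []
    for chromosome in sorted(by_chromosome):
        current = []
        previous_start = None
        for start, gene_id in sorted(by_chromosome[chromosome]):
            if current and start - previous_start > max_gap:
                if len(current) >= 2:
                    clusters.append(current)
                current = []
            current.append(gene_id)
            previous_start = start
        if len(current) >= 2:
            clusters.append(current)
    return clusters


if __name__ == "__main__":
    # Hypothetical genes and coordinates, for illustration only.
    example = [
        ("g1", "Chr01", 1_000), ("g2", "Chr01", 50_000),
        ("g3", "Chr01", 400_000), ("g4", "Chr02", 10),
        ("g5", "Chr02", 90_000), ("g6", "Chr02", 150_000),
    ]
    print(tandem_clusters(example))
    print(divergence_time_mya(0.3))
    print(classify_selection(0.1, 0.5))
```

Under this rate, a Ks of 0.3 corresponds to a divergence time of about 10 MYA, which is the upper bound of the window reported for many tandem duplication events.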

Gene structure and motif analysis of AhRLKs in response to Al stress
The exon-intron structures of the 90 peanut Al-responsive AhRLKs were determined based on alignments of their coding sequences with their respective genomic sequences, and diagrams were obtained from the online program Gene Structure Display Server with default parameters (http://gsds.cbi.pku.edu.cn/) [72]. To identify the conserved motifs of the Al-responsive AhRLKs, the MEME (Multiple Em for Motif Elicitation) tool was used to predict putative motifs of these proteins (http://meme-suite.org/) [73]. The combined display of the phylogenetic tree, gene structures and protein motifs was generated using TBtools.

Expression pattern analysis for Al-responsive AhRLKs
By scrutinizing the existing transcriptome data, the expression profiles of Al-responsive AhRLKs in different tissues under normal conditions and in the root tips of different peanut varieties under Al stress were analysed.