Mining candidate gene for rice aluminum tolerance through genome wide association study and transcriptomic analysis

Background The genetic mechanism of aluminum (Al) tolerance in rice is great complicated. Uncovering genetic mechanism of Al tolerance in rice is the premise for Al tolerance improvement. Mining elite genes within rice landrace is of importance for improvement of Al tolerance in rice. Results Genome-wide association study (GWAS) performed in EMMAX for rice Al tolerance was carried out using 150 varieties of Ting’s core collection constructed from 2262 Ting’s collections with more than 3.8 million SNPs. Within Ting’s core collection of clear population structure and kinship relatedness as well as high rate of linkage disequilibrium (LD) decay, 17 genes relating to rice Al tolerance including cloned genes like NRAT1, ART1 and STAR1 were identified in this study. Moreover, 13 new candidate regions with high LD and 69 new candidate genes were detected. Furthermore, 20 of 69 new candidate genes were detected with significant difference between Al treatment and without Al toxicity by transcriptome sequencing. Interestingly, both qRT-PCR and sequence analysis in CDS region demonstrated that the candidate genes in present study might play important roles in rice Al tolerance. Conclusions The present study provided important information for further using these elite genes existing in Ting’s core collection for improvement of rice Al tolerance.


Background
Over 50% of the world's arable land is acidic and about 13% of global rice were produced on acidic soils [1][2][3]. Rice (Oryza sativa L.) is one of the most important crops in the world, nearly half of the world's population feed on rice. Aluminum (Al) which is the most abundant metal in the earth's crust could be solubilized into trivalent Al (Al 3+ ) in acidic soils when pH value is below 5.0. Root growth might be inhibited and rice yield could be reduced significantly when Al 3+ is at high concentration [4,5] while beneficial at low level [6]. Therefore, improving Al tolerance is significantly useful for rice production.
There are several limitations of linkage mapping: 1. only two alleles at any given locus can be studied in biparental crosses; 2. high cost; 3. poor mapping resolution [29], whereas association mapping based on linkage disequilibrium (LD) could overcome these limitations [30] and enable researchers to use modern technologies to exploit natural variations [31]. A genome wide association study (GWAS) was reported in rice Al tolerance using about 36 thousand SNPs [9], and an association analysis was performed in rice Al tolerance in our previous study in which only 274 SSRs markers were used in association mapping [17]. However, no higher resolution GWAS on Al tolerance within natural population was performed in previous studies on Al tolerance in rice.
Ting's rice collection which is one of the earliest rice collection in China consists of 150 varieties constructed from 2262 of 7128 original landraces [32]. Ting's core collection had been used for association mapping on rice agronomic traits [33]. Moreover, Ting's core collection had been reported that it possessed a wide-range of phenotypic variation for Al tolerance and was used to perform association mapping on Al tolerance [17]. Therefore, Ting's core collection could be an appropriate population for GWAS on rice Al tolerance.
In the present study, a GWAS for rice Al tolerance based on relative root elongation (RRE) in seedling stage was carried out using Ting's core collection with more than 3.8 million high quality SNPs. Candidate regions identified by GWAS were compared with regions identified as QTL in previous studies and with Al sensitive mutants and/or candidate genes. This study would provide important information of candidate genes for Al tolerance improvement in rice.

Identification of phenotypic variations for Al tolerance
The rice landraces in Ting's core collection revealed a wide range of phenotypic variation and indicated a normal distribution for Al tolerance which was identified by RRE (Fig. 1a, Additional file 5: Table S1) as well as showed strong Al tolerance with an average of RRE > 0.50 (Additional file 1: Figure S1).

Genome re-sequencing and SNPs getting
Ting's core collection was re-sequenced with whole genome re-sequencing method. High quality SNPs (Additional file 7: Table S3) distribution along position on each chromosome were shown in Additional file 2: Figure S2. 3,808,730 SNPs were generated totally, while there were 386,562 SNPs found in the CDS region (Additional file 7: Table S3).

Population structure and LD estimation
PCA were performed with all SNPs for identifying the population structure of Ting's core collection, and two subpopulations were observed (Fig. 1b). Moreover, we measured LD through r 2 for Ting's core collection using the SNP data. We found that LD dropped to half of its maximum value at 100~350 kb on 12 chromosomes (Fig. 1c).

GWAS
A total of 3,808,730 SNPs were applied into GWAS for Al tolerance using GLM and MLM. We used P = 1.07 × 10 − 5 and P = 2.63 × 10 − 7 as the significant threshold for RRE in our study for MLM and GLM, respectively ( Fig. 1d and Additional file 3: Figure S3). 5 and 38 significant loci distributing on 12 chromosomes were identified totally for Al tolerance in GLM and MLM, respectively ( Table 1). The highest significant signal on each chromosome were shown in Table 1 in GLM and MLM. Our results indicated that the false positives were well controlled in this study (Additional file 4: Figure S4).
In our study, we identified 17 genes relating to rice Al tolerance which were reported in previous studies (Fig.  1d) with a modest significant threshold (P < 0.001). Besides, 13 new candidate regions with high LD and 69 new candidate genes for Al tolerance were identified firstly using MLM in our study (Additional file 8: Table  S4). New significant association regions with smaller P value and higher consecutive peak detected by MLM for Al tolerance were shown in Fig. 1d. And detailed distribution of these new gene-based association signals were included in Additional file 9: Table S5.

Effects of allele variations
Allele analysis was performed for 13 high significant SNPs and frequency of alleles existing in Ting's core collection were higher than those in reference (Table 2). Furthermore, top five significant SNPs at the gene-based location were selected for detecting the effects of allelic variations on Al tolerance ( Table 2). RRE of varieties with alternative alleles for above five SNPs were not larger than those with reference (Nipponbare) alleles in Ting's core collection (Fig. 2a). Moreover, staining by hematoxylin of Nipponbare (Al-tolerant) with allele C was lighter than Youzhan (an Al-sensitive variety) with allele T at 10,953,291 on chromosome 1 after Al toxicity treatment ( Fig. 2b and c).

Transcriptomic analysis
6948 genes were up-regulated for Al tolerant variety between treatment under (T) and without (CK) Al toxicity, while 1063 genes were up-regulated for Al sensitive variety and totally 796 genes were up-regulated both in Al tolerant and sensitive varieties (Fig. 3a). There were 4910, 544 and 322 genes which were down-regulated for Al tolerant, sensitive and both, respectively (Fig. 3b). Among 69 candidate genes identified through GWAS, 20 genes were also detected with significant difference between treatment under and without Al toxicity by transcriptome sequencing (Additional file 10: Table S6).

qRT-PCR and sequence analysis in CDS region
The expression of eight in above candidate genes detected by transcriptome sequencing were significant different in root tip between Nipponbare (Al tolerant) and Kasalath (Al sensitive) ( Fig. 4). At the meanwhile, 20 candidate genes were chosen for sequencing analysis in CDS region and sequence variations including substitutions, deletions and insertions could be found in seven genes CDS region between Al-tolerant and Al-sensitive varieties, which lead to amino acid variations (Table 3).

Discussion
Al toxicity is one of the most limits for rice production. QTL mapping based on segregating population is the main method for most of previous studies for Al tolerance. Linkage mapping which were constructed using typical Al tolerant and sensitive varieties. However, only two alleles at any given locus can be studied in linkage Table 1 Summary of association mapping results for relative root elongation using GLM and MLM Chr.
Number of significant loci SNP with the highest -log 10 P (  mapping. In this study, Ting's core collection consisted of rice landraces which represents an intermediate stage in domestication between wild and elite cultivars [34] was used for association mapping for Al tolerance. A wide-range of phenotypic variations and a normal distribution for Al tolerance were indicated in Ting's core collection. Therefore, Ting's core collection could be used as for GWAS on Al tolerance. There were two researches on rice Al tolerance using low resolution GWAS [9,17]. More than 3.8 million SNPs were applied into GWAS for Al tolerance in the present study and much higher in resolution than above  two studies. Moreover, LD dropped to half of its maximum value at 100~350 kb ( Fig. 1c) which is agreement with the previous measurements [35][36][37].
The research of Famoso et al. (2011) is the first GWAS on Al tolerance in rice [9]. In this study, 48 regions associated with Al tolerance were identified by GWAS. In total, three regions/loci were detected in both study of Famoso et al. (2011) and present. Ting's core collection was used to perform GWAS in our previous study using 274 SSR markers [17], and 23 significant trait-marker associations were discovered. However, no identical region/locus was identified in both our previous and present study. There might be three reasons for explaining this: 1. Different markers with distinct resolutions used in two researches; 2. No identical significant threshold used in two researches. In our previous study, P < 0.05 was applied to judge a significant region/locus. The significant trait-marker associations might not be detected in present study; 3. RRE of 18 varieties were missing in present study while there were only three missing in our previous studies.
In the present study, we found that there was no signals with a strong significant threshold in previously reported genes or regions. It is probably because: 1. Different materials were used in previous studies; 2. No same statistical model was used in previous studies; 3. Most of QTL were identified with mutations in previous studies; 4. Al tolerance were affected by numerous alleles with minor effects, so previous QTL/genes cannot be detected in the present study. Thus, a modest significant threshold (P < 0.001) was used in the present study which is same to the study of Huang et al. (2015) for grain number [38]. We identified 17 genes (P < 0.001) relating to rice Al tolerance which were reported in previous studies (Fig. 1d). Significant association signals close to OsCDT3 on chromosome 1 [39], Alt TRG 1.2 on chromosome 1 [9], OsFRDL4 on chromosome 1 [15], NRAT1 on chromosome 2 [13], ASR1 on chromosome 2 [7], LOC_Os02g49790 on chromosome 2 (Rice Genome Annotation Project, RGAP), LOC_Os03g55290 on chromosome 3 (RGAP), LOC_Os04g41750 on chromosome 4  (RGAP), STAR2 on chromosome 5 [10], LOC_Os06g22600 on chromosome 6 (RGAP), STAR1 on chromosome 6 [10], Alt LRG 9.1 on chromosome 9 [9], LOC_Os10g42780 on chromosome 10 (RGAP), ASR5 on chromosome 11 [20], LOC_Os11g29680 on chromosome 11 (RGAP), ART1 on chromosome 12 [14] and Alt TRG 12.2 on chromosome 12 [9] were detected in our study. It is necessary to notice that -log 10 (P) value of these significant signals both in our study and the first as well as the only one GWAS research [9] for rice Al tolerance are not large as those in GWAS researches of other rice traits. In our opinion, this might be due to high complexity of rice Al tolerance mechanism. In our opinion, those peaks with consecutive loci which are below the significant threshold (−log10(P) = 4.97) are also valuable to be analyzed deeply because the reported genes for Al tolerance are below 4.97, hence, we set -log10(P) = 3.0 as the threshold for above peaks. 13 new candidate regions with high LD and 69 new candidate genes (Additional file 8: Table S4) for Al tolerance were identified firstly using MLM in our study. Among 69 candidate genes for Al tolerance, 48 genes could be not predicted their performance on Al detoxification through annotation, while the remaining 21 genes could be speculated their detoxifying Al mechanism with annotation information and previous studies (Additional file 8: Table S4). For instance, LOC_Os01g09370.1 encodes an ankyrin repeat domaincontaining protein 28, and ankyrin repeat domaincontaining protein was reported that it is probably involved in the regulation of antioxidation metabolism shared by stress responses [40]. Therefore, LOC_Os01g09370.1 might be one gene which is related to Al stress. LOC_ Os01g57350.1, LOC_Os01g57360.1, LOC_Os01g57420.1 and LOC_Os07g29220.1 encode a diacylglycerol kinase, an acyltransferase, a diacylglycerol kinase as well as a cyclopropane-fatty-acyl-phospholipid synthase respectively which are necessary to lipid metabolism in plant [41]. Moreover, lipids were reported that they might participate in rice Al detoxification [42,43]. Interestingly, few scientists paid attention to study on lipid in rice Al tolerance. Pentatricopeptide repeat domain containing protein encoded by LOC_Os01g57410.1 was identified that this protein was a regulator for abiotic stress [44]. LOC_Os01g25090.1, LOC_ Os01g74160.1 and LOC_Os03g30080.1 encode a transposon protein, putative, CACTA, En/Spm sub-class [45], a carboxyl-terminal peptidase [46] as well as a transposon protein, putative, CACTA, En/Spm sub-class [44] which were associated with metal accumulation. LOC_ Os09g33559.1 encodes a proline-rich family protein, while proline was reported it relating to Al metabolism [47].
To further test positive capability of candidate genes identified in our study, we chose two varieties (Al tolerant and sensitive) to perform transcriptome sequencing. Among 69 candidate genes identified through GWAS, 20 genes were also detected with significant differences between treatments under and without Al toxicity by transcriptome sequencing (Additional file 10: Table S6). There were three candidate genes (i.e. LOC_Os01g57350.1, LOC_Os01g57420.1 and LOC_Os07g29220.1) which were up regulated (T versus CK) obviously in Al tolerant variety while no significant difference in Al sensitive variety among candidate genes in group lipid metabolism, and another candidate gene (i.e. LOC_Os01g57360.1) was up regulated in Al sensitive variety. Both LOC_Os01g57450.1 and LOC_Os01g57480.1 which are in group abiotic stress response were up regulated in Al tolerant variety while no significant difference in Al sensitive variety. In group membrane protein, both LOC_Os01g74180.1 and LOC_ Os06g14030.1 were up regulated in Al tolerant variety while no significant difference in Al sensitive variety. Expression of candidate gene LOC_Os09g33559.1 was down regulated in Al tolerant variety while no significant difference in Al sensitive variety. Moreover, remaining 11 candidate genes in group unknown were up or down regulated for two varieties. Furthermore, qRT-PCR and sequence analysis in CDS region also gave more supports to our candidate genes.
In addition, after GWAS experiment, we agree that it is a better way to carry out further studies by using varieties with extreme values in Ting's core collection. In present study, after GWAS transcriptome analysis, histological observation, sequence comparisons of CDS and qRT-PCR were used to support the results of GWAS. Histological observation was performed to identify the effects of allele variations of significant SNPs from GWAS. Transcriptome analysis was to try to find some identical genes to GWAS. Both sequence comparisons of CDS and qRT-PCR were designed to try to support the candidate genes identified by GWAS and transcriptome. We chose different materials for further study because the candidate genes in present study were supposed to be supported by different materials. And we tried to prove the candidate genes from GWAS which are natural variations exiting in Ting's core collection would have effects to rice Al tolerance in other materials.

Conclusions
In this study, 69 new candidate genes for Al tolerance were identified using GWAS and some among 69 candidate genes were also detected through transcriptome sequencing, qRT-PCR and checking of sequence variation of CDS. It is worth to perform deeper research on them.

Plant material
Ting's core collection with 150 rice landraces [48] was used in present study. The information of 150 landraces are shown in Additional file 5: Table S1.

Phenotyping for Al tolerance
Seeds of Ting's core collection were harvested from the farm of China National Rice Research Institute, Hangzhou (30°3 N, 120°2E), during the late season (May-October) in 2016. Relative root elongation (RRE) was used to evaluate Al tolerance of all varieties, please see more details in our previous study [17]. RRE of 10 seedlings of each variety growing in parallel were measured with a ruler before and after the treatments (24 h) in one replicate, and six replications were performed. Fu et al. (2010) referenced that RRE ≥ 0.5 should be used as a criterion to find out Al tolerant varieties [49].
The root tips (1~2 cm) of rice seedlings cultivated for 24 h in the presence or absence of Al were gently shaken in a 0.5 mmol l − 1 CaCl 2 (pH = 4.0) solution for 10 min. The root tips were submerged into hematoxylin solution (0.2% hematoxylin and 0.02% potassium iodide). After 45 min, the root tips were soaked into deionized water till no clear color on root tips. Then the root tips were photographed using stereoscopic and light microscopes. We chose Youzhan as one of the varieties which had different alleles (alternative allele) to Nipponbare (reference allele) for some SNPs to be stained by hematoxylin solution.
Genome re-sequencing and SNP calling Genomic DNA from a single plant was used for sequencing. Re-sequencing was performed by Illumina HiSeq™ 4000 with 6~7 folds of genome coverage. We performed the mapping to the rice reference genome (IRGSP-1.0) for the 150-bp reads by using bwamem with the -M option of BWA software [50], and the mapped reads were realigned by using RealignerTargetCreator in GATK [51]. SNPs were labeled with the −glm BOTH option of UnifiedGenotyper in GATK. After filtering the SNPs with low minor-allele frequency (5%), a total of 3,808, 730 SNPs was remained for GWAS in our study. Please read more details in our previous study [52].

Genetic analyses
Genetic analyses were performed following our previous study [52]. Principal component analysis (PCA), and LD decay analysis among Ting's core collection were performed based on all SNPs. SPSS 17.0 was used for PCA analysis. The LD was evaluated using squared Pearson's correlation coefficient (r 2 ) calculated with the −r 2 command of the software PLINK [53]. The Q-matrix was obtained with the membership probability of each variety using ADMIX-TURE Version 1.22. The Loiselle algorithm was chosen for calculating kinship matrix by software SPAGeDi [54].

GWAS
A total of 3,808,730 SNPs were used for GWAS. General linear model (GLM, without Q-matrix) and mixed linear model (MLM, K + Q) were performed using TASSEL software (www.maizegenetics.net/tassel). P ≤ 2.63 × 10 − 7 (P = 1/n; n = total markers, which is a rough Bonferroni correction corresponding to -log 10 (P) = 6.58) should be defined as the suggestive threshold which is same to that in our previous study [52] in present study, but significant locus can not be identified according to this threshold. Thus, another significance threshold was calculated, i.e., a minimum Bayes factor (mBF). The mBF was calculated using the following formula: mBF = − e * P * ln(P) [55], thus, the significance threshold in present study was -log10(P) = 4.97.

Transcriptomic analysis
Bashizi, one Al-tolerant variety (RRE = 0.65) and Heikedanuo, one Al-sensitive (RRE = 0.34) from Ting's core collection (Additional file 5: Table S1) were chosen for transcriptome analysis. Total RNA of root tip after one day under 100 μmol l − 1 AlCl 3 was extracted with the miRNeasy Kit (QIAGEN, USA). The RNA samples were evaluated on agarose gels, quantified in a spectrophotometer and stored at − 80°C. RNA sequencing was performed by Illumina NextSeq 500 for 2 × 125-bp pairedend sequencing. Samples under Al toxic treatment and without Al toxic in three times as well as one time respectively were collected for RNA sequencing [56].

Real-time qRT-PCR
Nipponbare (RRE = 0.85), one well known Al-tolerant variety and Kasalath (RRE = 0.30), one well known Alsensitive were chosen for real-time qRT-PCR analysis. Total RNA of root tip after one day without and under 100 μmol l − 1 AlCl 3 was extracted with the miRNeasy Kit (QIAGEN, USA). RNA was converted to cDNA using the protocol supplied by the manufacturer of ReverTra Ace qPCR RT Master Mix with gDNA remover (TOYOBO). The expression was determined with THUNDERBIRD™ SYBR® qPCR Mix without ROX (TOYOBO) by Roche Light Cycler® 480II. The primer sequences for qRT-PCR were listed in Additional file 6: Table S2. Ubiquitin was used as an internal control. Relative expression levels were calculated by 2 -ΔΔCt method. Three independent biological replicates were made for each treatment. The volume of the qRT-PCR reaction system was 10 μl: SYBR Premix Ex Taq II, 2x, 5 μl; PCR Primer (Forward+Reverse, 10uM), 2 μl; cDNA, 2 μl; ddH 2 O, 1 μl. The profile of PCR program is: 95°C for 3 min; 45 cycles of 95°C for 10 s, 58°C for 15 s, 72°C for 25 s; 95°C 10 s; 65°C 1 min; 40°C 1 min.

CDS sequence variation analysis of candidate genes
Nipponbare, one well known Al-tolerant variety. Kasalath and IR64 (RRE = 0.29), two well known Al-sensitive varieties were chosen for sequence variation analysis. Genomic DNA was extracted using modified SDS