Detection and validation of single feature polymorphisms using RNA expression data from a rice genome array
© Kim et al; licensee BioMed Central Ltd. 2009
Received: 23 October 2008
Accepted: 29 May 2009
Published: 29 May 2009
A large number of genetic variations have been identified in rice. Such variations must in many cases control phenotypic differences in abiotic stress tolerance and other traits. A single feature polymorphism (SFP) is an oligonucleotide array-based polymorphism which can be used for identification of SNPs or insertions/deletions (INDELs) for high throughput genotyping and high density mapping. Here we applied SFP markers to a lingering question about the source of salt tolerance in a particular rice recombinant inbred line (RIL) derived from a cross between a salt tolerant and a salt sensitive parent.
Expression data obtained by hybridizing RNA to an oligonucleotide array were analyzed using a statistical method called robustified projection pursuit (RPP). By applying the RPP method, a total of 1208 SFP probes were detected between two presumed parental genotypes (Pokkali and IR29) of a RIL population segregating for salt tolerance. We focused on the Saltol region, a major salt tolerance QTL. Analysis of FL478, a salt tolerant RIL, revealed a small (< 1 Mb) region carrying alleles from the presumed salt tolerant parent, flanked by alleles matching the salt sensitive parent IR29. Sequencing of putative SFP-containing amplicons from this region and other positions in the genome yielded a validation rate of more than 95%.
Recombinant inbred line FL478 contains a small (< 1 Mb) segment from the salt tolerant parent in the Saltol region. The Affymetrix rice genome array provides a satisfactory platform for high resolution mapping in rice using RNA hybridization and the RPP method of SFP analysis.
An SFP is a polymorphism detected by a single probe in an oligonucleotide array. SFPs represent SNPs, INDELs or both. A polymorphism within a transcribed sequence might reflect a biologically pertinent variation within the encoded protein or a regulatory element located in an untranslated region. Therefore, SFPs detected using oligonucleotide microarrays designed for expression analysis can provide function-associated genetic markers.
We initially developed the RPP method of SFP discovery using the Affymetrix barley genome array and then applied this method to rice. A distinguishing component of our method is the use of complex RNA as a surrogate for rice genomic DNA, eliminating genome size and interference from highly repetitive DNA as technical impediments to SFP detection. Another distinguishing element of our method is that RPP first utilizes a probe set level analysis to identify SFP-containing probe sets and then chooses only the one or two most discriminatory probes from within each SFP-containing probe set.
SFPs have been identified using oligonucleotide microarrays in several species. In yeast and Arabidopsis, SFPs were detected by hybridization of genomic DNA to oligonucleotide microarrays. SFP genotyping was also accomplished by hybridization of mRNA to an oligonucleotide expression array in yeast. More recently, SFPs were identified in rice using hybridization of genomic DNA to an oligonucleotide microarray [6, 7].
Here we analyzed RNA expression data using the RPP method to detect SFPs among a salt-tolerant rice recombinant inbred line (RIL), FL478, and its presumed parental rice genotypes, Pokkali and IR29, as described previously [2, 3]. FL478 was developed from an indica cross between salt-tolerant Pokkali and salt-susceptible IR29 [8–10]. Gregorio et al. (1997) identified salt-tolerant and salt-sensitive RILs. One of the RILs, FL478 (F2-derived F8), was among the most salt tolerant.
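The two-stage idea described above (screen whole probe sets first, then keep only the one or two most discriminatory probes) can be illustrated with a deliberately simplified sketch. This is not the RPP statistic of Cui et al. (2005): it substitutes a median-centred residual and a median-based scale for the projection pursuit step, and the function names, the cutoff of 5.0 and the 0.5 contribution share are illustrative assumptions rather than values from this study.

```python
from statistics import mean, median


def probe_residuals(geno_a, geno_b):
    """Per-probe mean log2 difference between genotypes, centred on the probe set.

    geno_a, geno_b: one inner list per probe, one log2 intensity per chip.
    Subtracting the median difference removes the expression-level shift
    that affects all probes of the set alike.
    """
    diffs = [mean(a) - mean(b) for a, b in zip(geno_a, geno_b)]
    shift = median(diffs)
    return [d - shift for d in diffs]


def outlying_score(residuals):
    """Stage 1: largest absolute residual relative to the median absolute residual."""
    abs_res = [abs(r) for r in residuals]
    scale = median(abs_res) or 1e-9  # guard against division by zero
    return max(abs_res) / scale


def candidate_sfp_probes(geno_a, geno_b, score_cutoff=5.0, max_probes=2, share=0.5):
    """Stage 2: indices of at most max_probes probes that drive a high score."""
    res = probe_residuals(geno_a, geno_b)
    if outlying_score(res) < score_cutoff:
        return []  # probe set not flagged, so no probe-level analysis
    biggest = max(abs(r) for r in res)
    ranked = sorted(range(len(res)), key=lambda i: abs(res[i]), reverse=True)
    return [i for i in ranked[:max_probes] if abs(res[i]) >= share * biggest]


# Hypothetical probe set: 5 probes, 2 chips per genotype. Probe index 3
# hybridizes poorly in genotype B, as a sequence mismatch would cause.
geno_a = [[8.0, 8.2], [9.0, 9.1], [7.5, 7.4], [10.0, 10.1], [8.8, 8.9]]
geno_b = [[8.1, 8.1], [9.0, 9.2], [7.4, 7.6], [7.9, 8.0], [8.9, 8.8]]
print(candidate_sfp_probes(geno_a, geno_b))
```

In this toy input only probe index 3 is returned, because the other four probes differ between genotypes by at most 0.05 log2 units. Centring on the median keeps a genuine expression difference, which shifts all probes together, from being mistaken for a polymorphism.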
Our purpose in the present study was to apply higher density SFP analysis to a lingering question about the nature of salt tolerance in RIL FL478, following our previous report that the only SFP markers that we were aware of in the vicinity of the Saltol locus in FL478 originated from the salt sensitive parent.
Results and discussion
SFP detection and validation
SFPs detected in Saltol region by RPP method
We explored the source of the Saltol region in FL478 because several reports demonstrated the importance of this region for salt tolerance, and because our prior report suggested that the Saltol region of FL478 may have originated from the salt sensitive parent. Bonilla et al. (2002) initially delimited Saltol as a QTL controlling three traits (low Na+ absorption, high K+ absorption and low Na+/K+ ratio) within a 15 cM segment of the rice genetic map with peak LOD score > 6.7 (Figure 4). A major QTL for high shoot K+ concentration under salt stress was also identified in the same region. More recently, Ren et al. (2005) identified the SKC1 gene encoding a sodium transporter and demonstrated that it is a determinant of salt tolerance in the Saltol region.
Rice SFP probe sets in the Saltol region
Probe set name
Gene model a
Position of 5' end
Putative ADP-ribosylation factor protein
Putative calmodulin protein
Actin family protein
Glutamyl-tRNA synthetase family protein
Putative ubiquitin-conjugating enzyme X protein
SNF7 family protein
Transferase family protein
Protein kinase domain containing protein
Putative dual specificity protein phosphatase family protein
Peroxidase family protein
Transcription initiation factor IID, 18kD subunit family protein
Putative Importin alpha-1b subunit protein
Putative PPR986-12 protein
Putative HASTY protein
Putative transposon protein, unclassified
Validation of SFPs in Saltol region by amplicon sequencing
Correct SFP call rate by RPP method
We examined a total of 64 putative SFPs by amplicon sequencing (Additional file 2). Among them, 62 were found to cover polymorphisms (~97% validation). Among these 62 confirmed SFPs, 51 (82.3%) were positioned over a single SNP, seven (11.3%) were positioned over an INDEL, two (3.2%) spanned one SNP and one INDEL, one (1.6%) spanned > 1 SNP and no INDEL, and one (1.6%) spanned > 1 SNP and > 1 INDEL. From this we assert that, at the threshold of the top 20 percentile of outlying scores, our detection method is correct about 97% of the time (2 false positives in 64) in a priori identification of SFPs from Affymetrix rice genome array data using RNA-based datasets. Winzeler et al. (1998) identified more than 3,000 polymorphisms between two yeast strains at a 5% error rate using DNA hybridization. Also, about 1,000 SFPs were identified at error rates of 3 to 7% in yeast using mRNA hybridization. In Arabidopsis, among 3,806 predicted SFPs, 97% of known polymorphisms were detected, which established a false negative rate of 3%. Rostoks et al. (2005) used a probe level analysis of transcriptome data in barley to identify 10,504 putative SFPs, which included ~40% false positives. More recently, rice genomic DNA was hybridized to an oligonucleotide microarray to detect SFPs with a false discovery rate of up to 20%. The 97% validation rate (3% false positives) from our method of RNA-based SFP detection by RPP compares favourably to these other performance metrics.
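The validation figures follow directly from the sequencing counts. As a quick check, the short script below recomputes the rates from the counts reported in the text; the category labels are shorthand introduced here for illustration.

```python
# Counts as reported in the text for the sequenced putative SFPs
total_tested = 64
confirmed = 62
categories = {
    "single SNP": 51,
    "INDEL": 7,
    "one SNP and one INDEL": 2,
    "> 1 SNP, no INDEL": 1,
    "> 1 SNP and > 1 INDEL": 1,
}

# The five categories should account for every confirmed SFP
assert sum(categories.values()) == confirmed

validation_rate = 100 * confirmed / total_tested
false_positive_rate = 100 * (total_tested - confirmed) / total_tested
print(f"validation rate: {validation_rate:.1f}%")
print(f"false positive rate: {false_positive_rate:.1f}%")
for name, count in categories.items():
    print(f"{name}: {count} ({100 * count / confirmed:.1f}%)")
```

Note that the per-category percentages use the 62 confirmed SFPs as the denominator, whereas the validation and false positive rates use all 64 tested SFPs.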
In the single nucleotide polymorphism database (dbSNP) of the National Center for Biotechnology Information (NCBI), more than 5 million polymorphisms including SNPs, small INDELs and microsatellite repeat variations have been catalogued. Also, the International Rice Research Institute has initiated a project to identify a large fraction of the SNPs in germplasm pertinent to cultivated rice through whole-genome comparisons . This will provide additional millions of rice SNPs. Our work has shown that the existing Affymetrix rice genome array can be used to provide some thousands of SFP markers from a pairwise rice genotype comparison. Because a number of researchers have been using Affymetrix microarrays for transcriptome analyses in a range of rice RILs, NILs and germplasm accessions, existing data files provide abundant opportunities for the identification of additional SFP markers and resolution of trait determinants without additional expenditure on materials or data acquisition. Therefore, application of the RPP method to existing data could augment, or sometimes obviate the need for, other markers to meet objectives such as map-based cloning and sub-Mb resolution of the position of trait determinants. Examples of such applications would be to define introgressed regions in NILs or to generate moderate density linkage maps from RIL populations. Also, SFPs can provide a reliable discovery component in the development of markers for other detection systems including SNPs, CAPS, DArT, and SSRs.
We identified a small (< 1 Mb) segment from the salt tolerant parent, presumably a Pokkali accession, in the Saltol region of RIL FL478 using SFP analysis with confirmation by amplicon sequencing. This small segment is flanked by alleles identical to those in the salt sensitive parent IR29. This study shows that the Affymetrix rice genome array, designed for expression analysis, provides a satisfactory genetic marker system for mapping in rice using RNA hybridization and the RPP method of SFP analysis.
Seeds of rice (Oryza sativa) genotypes Pokkali, IR29 and FL478 were obtained from G. B. Gregorio at the International Rice Research Institute in the Philippines and then propagated at the USDA/ARS George E. Brown, Jr., US Salinity Laboratory in Riverside, CA. Seedlings of the three genotypes were grown, harvested and stored at -80°C until DNA extraction.
Genomic DNA isolation
Genomic DNA was extracted from seedlings of the three genotypes using a DNeasy Plant Mini Kit (Qiagen, USA) according to the manufacturer's protocol. For each genotype, more than seven seedlings were ground and about 0.1 g of pulverized tissue was processed. Purified genomic DNA was quantified at 260 nm using a spectrophotometer.
SFP identification by RPP method
We produced RNA expression data using the Affymetrix rice GeneChip hybridized with cRNA synthesized from shoot tissue RNA of young seedlings of three rice genotypes with and without salt stress, essentially as described previously. The dataset was from seven chips with Pokkali RNA, five chips with IR29 and six chips with FL478. The Affymetrix rice GeneChip consists of probe sets designed for 48,564 japonica and 1,260 indica sequences (http://www.affymetrix.com/). For SFP detection, we applied the RPP method to each probe set that had a "present" call in all chip samples from each pair of genotypes under comparison: (1) Pokkali versus IR29, (2) Pokkali versus FL478, (3) IR29 versus FL478. SFP probes were compiled using the top 20 percentile of all overall outlying scores as a cutoff. FL478 alleles presumed to be inherited from IR29 were then obtained as the SFPs detected in comparisons (1) and (2) but not (3). Similarly, FL478 alleles presumed to be from Pokkali were obtained as the SFPs detected in (1) and (3) but not (2). As described in Cui et al. (2005), the RPP method first measures the overall outlyingness of each probe set. Probe sets with significantly high outlying scores are then analyzed at the probe level, and the probes that make a sufficiently large contribution to the overall outlyingness of the probe set are identified as SFP probes.
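The allele-assignment rule above amounts to set operations on the three pairwise SFP lists. The following is a minimal sketch, assuming each comparison has already been reduced to a set of SFP probe identifiers; the function name `assign_parental_origin` and the probe identifiers are hypothetical and are not taken from the study's data.

```python
def assign_parental_origin(pok_vs_ir29, pok_vs_fl478, ir29_vs_fl478):
    """Classify FL478 alleles from three pairwise sets of SFP probe IDs.

    pok_vs_ir29:   comparison (1), Pokkali versus IR29
    pok_vs_fl478:  comparison (2), Pokkali versus FL478
    ir29_vs_fl478: comparison (3), IR29 versus FL478
    """
    # FL478 differs from Pokkali but not from IR29: IR29-like allele
    from_ir29 = (pok_vs_ir29 & pok_vs_fl478) - ir29_vs_fl478
    # FL478 differs from IR29 but not from Pokkali: Pokkali-like allele
    from_pokkali = (pok_vs_ir29 & ir29_vs_fl478) - pok_vs_fl478
    return {"IR29": from_ir29, "Pokkali": from_pokkali}


# Hypothetical probe identifiers for illustration only
comparison_1 = {"p1", "p2", "p3", "p4"}
comparison_2 = {"p1", "p2", "p5"}
comparison_3 = {"p3", "p6"}

origin = assign_parental_origin(comparison_1, comparison_2, comparison_3)
print(sorted(origin["IR29"]))
print(sorted(origin["Pokkali"]))
```

Probes such as the hypothetical `p4`, detected only in comparison (1), are left unassigned by this rule. So are probes detected only against FL478 (`p5`, `p6`), since they are not parental SFPs under the stated criteria.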
We obtained the target sequence of each probe set from the sequence information file (SIF) for the Affymetrix rice genome array (http://www.affymetrix.com/). The target sequence extends from the 5' end of the 5'-most probe to the 3' end of the 3'-most probe. To obtain the corresponding indica rice genomic sequences, each target sequence was searched using BLASTN against the indica rice whole genome shotgun sequences in the NCBI database (http://www.ncbi.nlm.nih.gov/BLAST/Genome/PlantBlast.shtml?10). The indica sequences (cv. 93-11) were aligned with the target sequence using AlignX in Vector NTI Advance 10 (Invitrogen, USA). HarvEST:RiceChip was used to check the position of SFP probes in each target sequence. Primers were designed using Primer3 (http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi/). The primers are listed in Additional file 3.
PCR was performed in 20 μl containing 25~50 ng of genomic DNA, 0.1 μM of specific primers, 0.2 mM dNTPs, and 1 unit of Taq DNA polymerase (GenScript Corp., USA). The reaction included a 5 min denaturation at 95°C followed by 35 cycles of PCR (94°C, 30 sec; 55~65°C, 70 sec; 72°C, 60 sec), and a final 5 min at 72°C. Aliquots (4 μl) of the PCR products were separated on a 1.2% agarose gel to check the band size and quantity. PCR products were purified using the QIAquick PCR Purification Kit (Qiagen, USA) to prepare for sequencing.
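The definition of the target sequence, and the later check that an SFP probe falls inside a sequenced amplicon, can be expressed as simple coordinate arithmetic. The sketch below assumes 25-mer probes and 0-based, end-exclusive coordinates along the transcript; the function names and all positions are hypothetical.

```python
PROBE_LENGTH = 25  # assumption: probes on the expression array are 25-mers


def target_span(probe_starts, probe_length=PROBE_LENGTH):
    """Return (start, end) from the 5'-most probe start to the 3'-most probe end."""
    if not probe_starts:
        raise ValueError("probe set has no probes")
    return min(probe_starts), max(probe_starts) + probe_length


def probe_in_amplicon(probe_start, amplicon_start, amplicon_end,
                      probe_length=PROBE_LENGTH):
    """True if the whole probe lies inside the amplicon (end-exclusive)."""
    return (amplicon_start <= probe_start
            and probe_start + probe_length <= amplicon_end)


# Hypothetical probe start positions within one probe set
starts = [120, 310, 45, 502]
span = target_span(starts)
print(span)
print(probe_in_amplicon(310, 250, 400))
```

A check of this kind is useful before sequencing, because a primer pair that does not fully bracket the SFP probe cannot confirm or refute the polymorphism call.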
DNA sequence analysis
DNA sequencing was performed by the dideoxynucleotide chain termination method. The amplified PCR products (amplicons) were sequenced with an ABI PRISM 3730xl Autosequencer (ABI, USA). These sequences were then compared with the target sequence of each probe set using AlignX (Invitrogen, USA). Comparisons of nucleotide sequence similarity were displayed using GeneDoc. Rice genomic amplicon sequences have been deposited in the GenBank Data Library under accession numbers [GenBank:EF589163–EF589342 and EU099042–EU099056].
Current address of JX is Department of Statistics and Actuarial Science, East China Normal University, Shanghai 200241, China. Current address of HW is Department of Plant Pathology, University of California, Davis, CA 95616, USA.
The authors thank Dr. Jan T. Svensson and Dr. Livia Tommasini for helpful discussions and technical assistance. This work was supported by a grant from the International Rice Research Institute under the USAID Linkage Program to AMI and in part by the Korea Research Foundation Grant funded by the Korean Government (MOEHRD) (KRF-2005-214-C00229) to SHK.
- Borevitz JO, Liang D, Plouffe D, Chang HS, Zhu T, Weigel D, Berry CC, Winzeler E, Chory J: Large-scale identification of single-feature polymorphisms in complex genomes. Genome Res. 2003, 13: 513-523. 10.1101/gr.541303.
- Cui X, Xu J, Asghar R, Condamine P, Svensson JT, Wanamaker S, Stein N, Roose M, Close TJ: Detecting single-feature polymorphisms using oligonucleotide arrays and robustified projection pursuit. Bioinformatics. 2005, 21: 3852-3858. 10.1093/bioinformatics/bti640.
- Walia H, Wilson C, Condamine P, Liu X, Ismail AM, Zeng LH, Wanamaker SI, Mandal J, Xu J, Cui XP, Close TJ: Comparative transcriptional profiling of two contrasting rice genotypes under salinity stress during the vegetative growth stage. Plant Physiol. 2005, 139: 822-835. 10.1104/pp.105.065961.
- Winzeler EA, Richards DR, Conway AR, Goldstein AL, Kalman S, McCullough MJ, McCusker JH, Stevens DA, Wodicka L, Lockhart DJ, Davis RW: Direct allelic variation scanning of the yeast genome. Science. 1998, 281: 1194-1197. 10.1126/science.281.5380.1194.
- Ronald J, Akey JM, Whittle J, Smith EN, Yvert G, Kruglyak L: Simultaneous genotyping, gene-expression measurement, and detection of allele-specific expression with oligonucleotide arrays. Genome Res. 2005, 15: 284-291. 10.1101/gr.2850605.
- Kumar R, Qiu J, Joshi T, Valliyodan B, Xu D, Nguyen HT: Single feature polymorphism discovery in rice. PLoS ONE. 2007, 2: e284. 10.1371/journal.pone.0000284.
- Edwards JD, Janda J, Sweeney MT, Gaikwad AB, Liu B, Leung H, Galbraith DW: Development and evaluation of a high-throughput, low-cost genotyping platform based on oligonucleotide microarrays in rice. Plant Methods. 2008, 4: 13. 10.1186/1746-4811-4-13.
- Bonilla P, Dvorak J, Mackill D, Deal K, Gregorio G: RFLP and SSLP mapping of salinity tolerance genes in chromosome 1 of rice (Oryza sativa L.) using recombinant inbred lines. Philipp Agric Scientist. 2002, 85: 68-76.
- Gregorio GB, Senadhira D, Mendoza RD: Screening rice for salinity tolerance. IRRI Discussion Paper Series Number 22. International Rice Research Institute, Manila, Philippines; 1997.
- Gregorio GB, Senadhira D, Mendoza RD, Manigbas NL, Roxas JP, Guerta CQ: Progress in breeding for salinity tolerance and associated abiotic stresses in rice. Field Crops Res. 2002, 76: 91-101. 10.1016/S0378-4290(02)00031-X.
- Lin HX, Zhu MZ, Yano M, Gao JP, Liang ZW, Su WA, Hu XH, Ren ZH, Chao DY: QTLs for Na+ and K+ uptake of the shoots and roots controlling rice salt tolerance. Theor Appl Genet. 2004, 108: 253-260. 10.1007/s00122-003-1421-y.
- Ren ZH, Gao JP, Li LG, Cai XL, Huang W, Chao DY, Zhu MZ, Wang ZY, Luan S, Lin HX: A rice quantitative trait locus for salt tolerance encodes a sodium transporter. Nat Genet. 2005, 37: 1141-1146. 10.1038/ng1643.
- Rostoks N, Borevitz JO, Hedley PE, Russell J, Mudie S, Morris J, Cardle L, Marshall DF, Waugh R: Single-feature polymorphism discovery in the barley transcriptome. Genome Biol. 2005, 6: R54. 10.1186/gb-2005-6-6-r54.
- McNally KL, Bruskiewich R, Mackill D, Buell CR, Leach JE, Leung H: Sequencing multiple and diverse rice varieties. Connecting whole-genome variation with phenotypes. Plant Physiol. 2006, 141: 26-31. 10.1104/pp.106.077313.
- HarvEST: Affymetrix Rice version 1.01. [web version: http://www.harvest-web.org; download from: http://harvest.ucr.edu/].
- Rozen S, Skaletsky HJ: Primer3 on the WWW for general users and for biologist programmers. Bioinformatics Methods and Protocols (Methods in Molecular Biology). Edited by: Krawetz S, Misener S. Totowa, NJ: Humana Press; 2000:365-386.
- Sanger F, Nicklen S, Coulson AR: DNA sequencing with chain terminating inhibitors. Proc Natl Acad Sci USA. 1977, 74: 5463-5467. 10.1073/pnas.74.12.5463.
- Nicholas KB, Nicholas HBJ, Deerfield DW: GeneDoc: analysis and visualization of genetic variation. EMBNEW NEWS. 1997, 4: 14.